2017-04-23

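Split a data frame column into new columns whose values depend on the original data

I often work with data frames that have columns of character strings that need to be separated. This comes from a "select multiple" option in the data entry program (which, unfortunately, I can't change). I have tried tidyr::separate, but it doesn't line the results up properly: values are assigned to the new columns by position, not by name. An example: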

require(tidyr) 
df = data.frame(
    x = 1:3, 
    sick = c(NA, "malaria", "diarrhoea malaria")) 

df <- df %>% 
    separate(sick, c("diarrhoea", "cough", "malaria"), 
      sep = " ", fill = "right", remove = FALSE) 

But I would like the result to look like this:

df2 = data.frame(
    x = 1:3, 
    sick = c(NA, "malaria", "diarrhoea malaria"), 
    diarrhoea = c(NA, NA, "diarrhoea"), 
    cough = c(NA, NA, NA), 
    malaria = c(NA, "malaria", "malaria")) 

Any help in the right direction would be much appreciated.

Answers


We can use separate_rows together with dcast:

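library(tidyr) 
library(reshape2) 
library(dplyr) 
# One row per illness; the factor levels fix the set and order of new columns
separate_rows(df, sick) %>% 
    mutate(sick = factor(sick, levels = c("diarrhoea", "cough", "malaria")), sick1 = sick) %>% 
    # Cast back to wide; drop = FALSE keeps the unused "cough" level as a column
    dcast(., x~sick, value.var = "sick1", drop=FALSE) %>% 
    # Re-attach the original sick column, then choose the final column order
    bind_cols(., df[2]) %>% 
    select(x, sick, diarrhoea, cough, malaria) 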
#   x              sick diarrhoea cough malaria
# 1 1              <NA>      <NA>  <NA>    <NA>
# 2 2           malaria      <NA>  <NA> malaria
# 3 3 diarrhoea malaria diarrhoea  <NA> malaria
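
To see what the first step of that pipeline does on its own, here is a minimal sketch of separate_rows applied to the example data. The name long is only illustrative, and sep = " " is passed explicitly here as an assumption about the delimiter (the answer relies on the default separator).

```r
library(tidyr)

df <- data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"),
  stringsAsFactors = FALSE
)

# Each space-delimited value in `sick` gets its own row;
# the x = 3 record is therefore repeated once per illness.
long <- separate_rows(df, sick, sep = " ")
print(long)
```

Once the data is in this long form, every illness is a single value in one column, which is what lets dcast spread the values into columns by name.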

Or another option is to use cSplit from splitstackshape with dcast from data.table:

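library(splitstackshape) 
library(data.table)  # attached explicitly so dcast and := are available
# Split to long form, set the factor levels, cast wide, restore the original column
dcast(cSplit(df, "sick", " ", "long")[, sick:= factor(sick, levels = 
    c("diarrhoea", "cough", "malaria"))], x~sick, value.var = "sick", drop = FALSE)[, 
     sick := df$sick][] 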
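
For comparison, the same layout can be sketched in base R without extra packages. This is not the method from either answer, only an illustrative alternative. The names choices and tokens are hypothetical, and the sketch assumes the full list of possible options is known in advance and that values are separated by single spaces.

```r
df <- data.frame(
  x = 1:3,
  sick = c(NA, "malaria", "diarrhoea malaria"),
  stringsAsFactors = FALSE
)

# Assumed full set of possible answers, in the desired column order
choices <- c("diarrhoea", "cough", "malaria")

# Split each entry on spaces; a missing entry yields a single NA token
tokens <- strsplit(df$sick, " ", fixed = TRUE)

# For each choice, store its name where it was selected and NA otherwise
for (choice in choices) {
  df[[choice]] <- vapply(
    tokens,
    function(tok) if (choice %in% tok) choice else NA_character_,
    character(1)
  )
}

print(df)
```

Because each new column is filled by membership, not by position, an option that nobody selected (cough here) still gets its own all-NA column.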

Thanks @akrun, this works just as I had hoped.