2017-04-21 93 views

Generating dummy data: columns with unique values

library(data.table)

MainID <- c('A1','A1','B2','C1','C1','C1','D2','D2')
HouseholdID <- c('Ab1','Ab1','cb2','Ca2','cb2','cb3','Da1','db2')
relation <- c('Spouse','Spouse','Child','Spouse','Child','Mother','Brother','Spouse')

df <- data.table(MainID, HouseholdID, relation)
head(df)

   MainID HouseholdID relation
1:     A1         Ab1   Spouse
2:     A1         Ab1   Spouse
3:     B2         cb2    Child
4:     C1         Ca2   Spouse
5:     C1         cb2    Child
6:     C1         cb3   Mother

Reshaping the data: I need to reshape the data from long to wide, one row per MainID, as shown below.

Expected result

MainID  Household1  Relation1  Household2  Relation2  Household3  Relation3
A1      Ab1         Spouse     NA          NA         NA          NA
B2      cb2         Child      NA          NA         NA          NA
C1      Ca2         Spouse     cb2         Child      cb3         Mother
D2      Da1         Brother    db2         Spouse     NA          NA

What is the best way to do this using dplyr, reshape, tidyverse, or any other method/package?

Answers


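Since you are already using "data.table", you can take just the unique rows, add an indicator variable that numbers the rows within each MainID, and finally dcast to wide format: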

library(data.table)
# unique(df) drops the duplicated A1 row; rowid(MainID) numbers rows within each MainID
dcast(unique(df)[, ind := rowid(MainID)],
      MainID ~ ind, value.var = c("HouseholdID", "relation"))
#    MainID HouseholdID_1 HouseholdID_2 HouseholdID_3 relation_1 relation_2 relation_3
# 1:     A1           Ab1            NA            NA     Spouse         NA         NA
# 2:     B2           cb2            NA            NA      Child         NA         NA
# 3:     C1           Ca2           cb2           cb3     Spouse      Child     Mother
# 4:     D2           Da1           db2            NA    Brother     Spouse         NA
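
The question also allows "any other method", so here is a minimal base R sketch of the same three steps (unique rows, a within-group counter, long to wide), offered as an alternative rather than as part of the answer above. It assumes a plain data.frame instead of a data.table. The names `u`, `ind`, and `wide` are illustrative only. One difference from the dcast result: `reshape()` names the columns with a "." separator and interleaves them (HouseholdID.1, relation.1, HouseholdID.2, ...), which is closer to the column order in the expected result.

```r
# Same dummy data as in the question, built as a plain data.frame
df <- data.frame(
  MainID      = c("A1", "A1", "B2", "C1", "C1", "C1", "D2", "D2"),
  HouseholdID = c("Ab1", "Ab1", "cb2", "Ca2", "cb2", "cb3", "Da1", "db2"),
  relation    = c("Spouse", "Spouse", "Child", "Spouse", "Child", "Mother",
                  "Brother", "Spouse"),
  stringsAsFactors = FALSE
)

# Step 1: drop fully duplicated rows (the repeated A1 / Ab1 / Spouse row)
u <- unique(df)

# Step 2: number the rows within each MainID (1, 2, 3, ...)
u$ind <- ave(seq_len(nrow(u)), u$MainID, FUN = seq_along)

# Step 3: spread to wide format, one row per MainID
wide <- reshape(u,
                idvar     = "MainID",
                timevar   = "ind",
                v.names   = c("HouseholdID", "relation"),
                direction = "wide")

print(wide)
```

Groups with fewer than three household members are padded with NA, as in the dcast output.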