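Collapse only some variables when reshaping from long to wide in R

I'm relatively new to R, and I get confused every time I need to "reshape" data. I have data that looks like this: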
Have:
  ID ever_smoked alcoholic        medication dosage
1  1          no        no humira/adalimumab   40mg
2  1          no        no        prednisone   15mg
3  1          no        no      azathioprine   30mg
4  1          no        no            rowasa    9mg
5  2         yes        no            lialda   20mg
6  2         yes        no    mercaptopurine     1g
7  2         yes        no            asacol 1600mg
Want:
  ID ever_smoked alcoholic                                          medication
1  1          no        no humira/adalimumab, prednisone, azathioprine, rowasa
2  2         yes        no                      lialda, mercaptopurine, asacol
                 dosage most_recent_med most_recent_dose
1 40mg, 15mg, 30mg, 9mg          rowasa              9mg
2      20mg, 1g, 1600mg          asacol           1600mg
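(Note that the result should have 2 observations of 7 variables.) Essentially, I want to (1) collapse only some of the variables, (2) keep the first row of the other variables, and (3) create 2 new variables from the last entry of certain variables.
Code to reproduce: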
have <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2),
                   ever_smoked = c("no", "no", "no", "no", "yes", "yes", "yes"),
                   alcoholic = c("no", "no", "no", "no", "no", "no", "no"),
                   medication = c("humira/adalimumab", "prednisone", "azathioprine", "rowasa", "lialda", "mercaptopurine", "asacol"),
                   dosage = c("40mg", "15mg", "30mg", "9mg", "20mg", "1g", "1600mg"), stringsAsFactors = FALSE)
want <- data.frame(ID = c(1, 2),
                   ever_smoked = c("no", "yes"),
                   alcoholic = c("no", "no"),
                   medication = c("humira/adalimumab, prednisone, azathioprine, rowasa", "lialda, mercaptopurine, asacol"),
                   dosage = c("40mg, 15mg, 30mg, 9mg", "20mg, 1g, 1600mg"),
                   most_recent_med = c("rowasa", "asacol"),
                   most_recent_dose = c("9mg", "1600mg"), stringsAsFactors = FALSE)
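For reference, here is a minimal base R sketch of the three steps described above (collapse some columns, keep the first value of others, take the last entry for the new columns). It is only one possible approach, and it assumes the rows within each ID are already ordered oldest to newest, so that the last row is the most recent medication. The names `collapse_one` and `result` are hypothetical, introduced just for this illustration.

```r
# Same data as `have` in the question, repeated so this block runs on its own.
have <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2),
  ever_smoked = c("no", "no", "no", "no", "yes", "yes", "yes"),
  alcoholic = rep("no", 7),
  medication = c("humira/adalimumab", "prednisone", "azathioprine", "rowasa",
                 "lialda", "mercaptopurine", "asacol"),
  dosage = c("40mg", "15mg", "30mg", "9mg", "20mg", "1g", "1600mg"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn all rows for one ID into a single row.
# Assumption: rows in `d` are ordered oldest to newest.
collapse_one <- function(d) {
  n <- nrow(d)
  data.frame(
    ID = d$ID[1],                                      # (2) keep first value
    ever_smoked = d$ever_smoked[1],
    alcoholic = d$alcoholic[1],
    medication = paste(d$medication, collapse = ", "), # (1) collapse
    dosage = paste(d$dosage, collapse = ", "),
    most_recent_med = d$medication[n],                 # (3) last entry
    most_recent_dose = d$dosage[n],
    stringsAsFactors = FALSE
  )
}

# Split by ID, collapse each piece, then stack the one-row pieces.
result <- do.call(rbind, lapply(split(have, have$ID), collapse_one))
rownames(result) <- NULL  # drop the ID-derived row names left by rbind
```

Running `all.equal(result, want)` against the `want` data frame from the question is a quick way to check whether the sketch produces the desired shape.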
Thanks.