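Select multiple columns and reshape to long

I have a wide dataset relating to cases and their contacts. (This is a made-up example; the real dataset is much larger.)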
structure(list(record_id = structure(1:4, .Label = c("01-001",
"01-002", "01-003", "01-004"), class = "factor"), place = structure(c(1L,
2L, 1L, 1L), .Label = c("a", "b"), class = "factor"), sex = structure(c(2L,
2L, 1L, 2L), .Label = c("F", "M"), class = "factor"), age = c(4L,
13L, 28L, 44L), d02_1 = c(2L, 2L, NA, 2L), d02_2 = structure(c(3L,
2L, 1L, 3L), .Label = c("", "F", "M"), class = "factor"), d02_3 = c(27L,
16L, NA, 66L), d03_1 = c(3L, 3L, NA, 3L), d03_2 = structure(c(3L,
3L, 1L, 2L), .Label = c("", "F", "M"), class = "factor"), d03_3 = c(14L,
55L, NA, 12L), d04_1 = c(4L, NA, NA, NA), d04_2 = structure(c(2L,
1L, 1L, 1L), .Label = c("", "M"), class = "factor"), d04_3 = c(7L,
NA, NA, NA)), .Names = c("record_id", "place", "sex", "age",
"d02_1", "d02_2", "d02_3", "d03_1", "d03_2", "d03_3", "d04_1",
"d04_2", "d04_3"), row.names = c(NA, -4L), class = "data.frame")
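where:
- record_id is the unique identifier of the case
- place is where the case lives
- age is the age of the case
- sex is the sex of the case
- d02_1, d03_1, d04_1 ... d0j_1 are the contacts' ID numbers
- d02_2, d03_2, d04_2 ... d0j_2 are the contacts' sex
- d02_3, d03_3, d04_3 ... d0j_3 are the contacts' age
In the real dataset there are many more potential contacts per case, and many more variables describing the contacts' characteristics. Not all cases will have contacts.
I want to reshape the dataset into a tidy format with one row per case/contact, i.e.: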
         id case place sex age
1    01-001    1     a   M   4
2  01-001-2    0     a   M  27
3  01-001-3    0     a   M  14
4  01-001-4    0     a   M   7
5    01-002    1     b   M  13
6  01-002-2    0     b   F  16
7  01-002-3    0     b   M  55
8    01-003    1     a   F  28
9    01-004    1     a   M  44
10 01-004-2    0     a   M  66
11 01-004-3    0     a   F  12
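I think I need to create a vector of the column names for each contact (perhaps using character matching on the column names), select those columns in turn, and append them to each other (as well as concatenating the case/contact IDs), but I am really struggling to do this without copying many lines of code. There must be a more efficient way?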
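For illustration, the approach described above (match the contact columns by name, pull out each contact block in turn, then stack the blocks under the cases) could be sketched in base R roughly as follows. This is only one possible sketch, not a definitive solution. It assumes that the `_1`, `_2` and `_3` suffixes always mean contact ID, sex and age, as in the example. The names `df`, `prefixes`, `cases`, `contacts`, `long` and the helper columns `rec` and `num` are made up for this example.

```r
# Rebuild the example data with character columns instead of factors
df <- data.frame(
  record_id = c("01-001", "01-002", "01-003", "01-004"),
  place = c("a", "b", "a", "a"),
  sex   = c("M", "M", "F", "M"),
  age   = c(4L, 13L, 28L, 44L),
  d02_1 = c(2L, 2L, NA, 2L), d02_2 = c("M", "F", "", "M"), d02_3 = c(27L, 16L, NA, 66L),
  d03_1 = c(3L, 3L, NA, 3L), d03_2 = c("M", "M", "", "F"), d03_3 = c(14L, 55L, NA, 12L),
  d04_1 = c(4L, NA, NA, NA), d04_2 = c("M", "", "", ""),   d04_3 = c(7L, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Contact prefixes ("d02", "d03", ...) found by pattern matching on the column names
contact_cols <- grep("^d[0-9]+_[0-9]+$", names(df), value = TRUE)
prefixes <- unique(sub("_[0-9]+$", "", contact_cols))

# One row per case; rec and num are helper columns used only for sorting
cases <- data.frame(id = df$record_id, case = 1L, place = df$place,
                    sex = df$sex, age = df$age,
                    rec = df$record_id, num = 0L,
                    stringsAsFactors = FALSE)

# One block of rows per contact prefix, keeping only rows where a contact exists
contacts <- do.call(rbind, lapply(prefixes, function(p) {
  cid  <- df[[paste0(p, "_1")]]          # assumed: suffix _1 holds the contact ID
  keep <- !is.na(cid)
  data.frame(id    = paste(df$record_id[keep], cid[keep], sep = "-"),
             case  = rep(0L, sum(keep)),
             place = df$place[keep],     # contacts inherit the case's place
             sex   = df[[paste0(p, "_2")]][keep],
             age   = df[[paste0(p, "_3")]][keep],
             rec   = df$record_id[keep],
             num   = cid[keep],
             stringsAsFactors = FALSE)
}))

# Stack cases and contacts, then sort each case ahead of its own contacts
long <- rbind(cases, contacts)
ord  <- order(match(long$rec, df$record_id), long$num)
long <- long[ord, c("id", "case", "place", "sex", "age")]
rownames(long) <- NULL
long
```

Because the prefixes are derived from the column names, the same loop should cover additional contacts (d05, d06, ...) without any copied code. Extra contact variables would each need one more line inside the `data.frame()` call.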
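Does this not help: http://stackoverflow.com/questions/40229114/tidyrgather-multiple-columns-of-varying-types?rq=1. It seems to be basically the same thing. – MrFlick
You should set `na.strings = ""` when reading the data in. Having blanks in there makes little sense and makes everything harder... – Frank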
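As a small illustration of that suggestion, the sketch below reads the same inline CSV twice, once with the defaults and once with `na.strings = ""`. The CSV text and the names `csv_text`, `raw` and `clean` are invented for this example; how the real file is read is not shown in the question.

```r
# A tiny made-up CSV in which the second record has no contact
csv_text <- "record_id,d02_1,d02_2
01-001,2,M
01-003,,"

# Default settings: the blank text field is kept as an empty string
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With na.strings = "": the blank text field is read as NA instead
clean <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

raw$d02_2           # contains "" for the missing contact
is.na(clean$d02_2)  # the missing contact is now a proper NA
```

With blanks read as `NA`, missing contacts can be dropped with a single `is.na()` check rather than separate tests for `NA` and `""`.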