There may be more elegant ways to do this, but here is a solution in R using the dplyr package.
library(dplyr)
lapply(B$Class, function(x) {
  mask <- B$Class == x
  data.frame(Class = x,
             Students = unlist(strsplit(B$Students[mask], ',')),
             stringsAsFactors = FALSE)
}) %>%
  bind_rows() %>%
  full_join(A, by = 'Students') %>%
  group_by(Class) %>%
  summarize(`Mean Score` = mean(Test.Score)) %>%
  full_join(B, by = 'Class')
Step by step

The dplyr package helps with each of these data-manipulation steps. Here is a reproducible example.
library(dplyr)
A <- read.csv(text = 'Students,Test Score
A, 100
B, 81
C, 92
D, 88', stringsAsFactors = FALSE)
B <- read.csv(text = 'Class, Students
1,"{A,D}"
2,"{B,C}"', stringsAsFactors = FALSE) %>%
  # Strip the braces so "{A,D}" becomes "A,D"
  mutate(Students = gsub('\\{|\\}', '', Students))
str(A)
# 'data.frame': 4 obs. of 2 variables:
# $ Students : chr "A" "B" "C" "D"
# $ Test.Score: int 100 81 92 88
str(B)
# 'data.frame': 2 obs. of 2 variables:
# $ Class : int 1 2
# $ Students: chr "A,D" "B,C"
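The brace-stripping regex from the setup can be checked on its own. This sketch applies the same pattern, `\\{|\\}` (a literal `{` or a literal `}`), next to an equivalent bracket expression `[{}]`; the input vector is made up to mirror the sample file.

```r
# Raw values as they appear in the (hypothetical) source file
raw <- c("{A,D}", "{B,C}")

# Alternation: match a literal "{" or a literal "}" and delete it
stripped <- gsub("\\{|\\}", "", raw)

# Equivalent bracket expression: braces are literal inside [ ]
stripped2 <- gsub("[{}]", "", raw)

print(stripped)
identical(stripped, stripped2)
```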
Do some string manipulation to convert your B table into "long" format (one row per student).
C <- lapply(B$Class, function(x) {
  mask <- B$Class == x
  data.frame(Class = x,
             Students = unlist(strsplit(B$Students[mask], ',')),
             stringsAsFactors = FALSE)
}) %>%
  bind_rows()
str(C)
# 'data.frame': 4 obs. of 2 variables:
# $ Class : int 1 1 2 2
# $ Students: chr "A" "D" "B" "C"
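A vectorized alternative to the `lapply()`/mask loop, sketched in base R only (not the answer's original method): split every class's string once, then repeat each `Class` value as many times as it has students. `B` is rebuilt inline so the snippet runs on its own. Unlike the loop, this does not emit duplicate rows if a `Class` value ever appears twice in `B`.

```r
B <- data.frame(Class = c(1L, 2L),
                Students = c("A,D", "B,C"),
                stringsAsFactors = FALSE)

# Split each comma-separated string into one character vector per class
parts <- strsplit(B$Students, ",")

# Repeat each Class once per student it contains, then flatten the students
C <- data.frame(Class = rep(B$Class, lengths(parts)),
                Students = unlist(parts),
                stringsAsFactors = FALSE)
str(C)
```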
Add each student's test score to the "long" table.
D <- full_join(A, C, by = 'Students')
str(D)
# 'data.frame': 4 obs. of 3 variables:
# $ Students : chr "A" "B" "C" "D"
# $ Test.Score: int 100 81 92 88
# $ Class : int 1 2 2 1
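Because a full join keeps unmatched rows from both sides, a student who has a score but no class ends up with `Class = NA`, and a student listed in a class but missing from `A` gets `Test.Score = NA` (which would make that class's `mean()` return `NA`). A quick base-R check, using `merge(..., all = TRUE)` as the full outer join and a hypothetical extra student `E` who belongs to no class:

```r
# Hypothetical data: student "E" has a score but is in no class
A <- data.frame(Students = c("A", "B", "C", "D", "E"),
                Test.Score = c(100L, 81L, 92L, 88L, 75L),
                stringsAsFactors = FALSE)
C <- data.frame(Class = c(1L, 1L, 2L, 2L),
                Students = c("A", "D", "B", "C"),
                stringsAsFactors = FALSE)

# Students with a score but no class (get Class = NA in a full join)
no_class <- setdiff(A$Students, C$Students)
# Students in a class but with no score (get Test.Score = NA)
no_score <- setdiff(C$Students, A$Students)

# Base-R full outer join, equivalent in spirit to dplyr::full_join()
D <- merge(A, C, by = "Students", all = TRUE)
print(no_class)
```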
Group the students by class and compute each class's mean score. Then join back a column listing which students are in each class.
E <- D %>%
  group_by(Class) %>%
  summarize(`Mean Score` = mean(Test.Score)) %>%
  full_join(B, by = 'Class')
str(E)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ Class : int 1 2
# $ Mean Score: num 94 86.5
# $ Students : chr "A,D" "B,C"
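As a cross-check on the `Mean Score` column, the same per-class means can be computed in base R with `tapply()`, using the joined table rebuilt inline: class 1 is (100 + 88) / 2 = 94 and class 2 is (81 + 92) / 2 = 86.5.

```r
D <- data.frame(Students = c("A", "B", "C", "D"),
                Test.Score = c(100L, 81L, 92L, 88L),
                Class = c(1L, 2L, 2L, 1L))

# Mean score per class; the result's names are the class labels
class_means <- tapply(D$Test.Score, D$Class, mean)
print(class_means)
```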
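What class is the "Students" column in the second table? A vector or a list? – www
It's actually a factor, because the source file formats that column as "{A,D}" and so on. – user7729135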
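If the column does come in as a factor (the `read.csv()` default before R 4.0.0 was `stringsAsFactors = TRUE`), `strsplit()` refuses it with a "non-character argument" error, so convert it with `as.character()` first. Passing `stringsAsFactors = FALSE`, as the answer does, avoids the issue. A minimal sketch with a hypothetical factor column:

```r
# Hypothetical: Students read in as a factor, as older read.csv defaults did
B <- data.frame(Class = c(1L, 2L),
                Students = factor(c("{A,D}", "{B,C}")))

# strsplit() only accepts character vectors, so convert the factor first
students_chr <- as.character(B$Students)
# Strip the braces with the same pattern the answer uses
students_chr <- gsub("\\{|\\}", "", students_chr)

split_students <- strsplit(students_chr, ",")
str(split_students)
```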