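Reshape a long data frame to a wide data frame using the merge function, grouped by two factors

I want to reshape a long data frame into a wide data frame using two grouping variables (resp & company) and three numeric response variables (Quality, Amount, Sense). I tried to do this with the dcast function, but it does not let me group by two variables. Can anyone help?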
#Current long dataframe: two grouping variables (resp & company), three numerical response variables (Quality, Amount, Sense)
resp <- c(1325851107,1325851108,1325851109,1325851107,1325851108,1325851109,1325851107,1325851108,1325851109,1325851107,1325851108,1325851109)
company <- c("Dark.nl","Dark.nl","Dark.nl","Dark.nl","Dark.nl","Dark.nl","Manual.nl","Manual.nl","Manual.nl","Dark.nl","Dark.nl","Dark.nl")
question <- c("Quality","Quality","Quality","Amount","Amount","Amount","Quality","Quality","Quality","Sense","Sense","Sense")
score <- c(4,1,2,6,8,10,5,5,7,4,6,7)
current <- data.frame(resp,company,question,score); current  # 'answer' was never defined, so it is dropped here
#Desired wide dataframe
resp2 <- c(1325851107,1325851107,1325851108,1325851108,1325851109,1325851109)
company2 <- c("Dark.nl","Manual.nl","Dark.nl","Manual.nl","Dark.nl","Manual.nl")
Quality <- c(4,5,1,5,2,7)
Amount <- c(6,NA,8,NA,10,NA)
Sense <- c(4,NA,6,NA,7,NA)
desired <- data.frame(resp2,company2,Quality,Amount,Sense); desired
#Using dcast function to reshape
library("reshape2")
dcast(current, resp + company ~ question, value.var="score")
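For comparison, here is a minimal sketch of the same long-to-wide step using base R's reshape(), which accepts both grouping variables in idvar and needs no extra package. The data frame is rebuilt inside the snippet so it runs on its own; the name `wide` and the final renaming step are my own additions, not part of the original post.

```r
# Rebuild the long data frame from the question (same values, written compactly)
current <- data.frame(
  resp     = rep(c(1325851107, 1325851108, 1325851109), times = 4),
  company  = rep(c("Dark.nl", "Dark.nl", "Manual.nl", "Dark.nl"), each = 3),
  question = rep(c("Quality", "Amount", "Quality", "Sense"), each = 3),
  score    = c(4, 1, 2, 6, 8, 10, 5, 5, 7, 4, 6, 7),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per (resp, company), one column per question
wide <- reshape(current,
                idvar     = c("resp", "company"),
                timevar   = "question",
                direction = "wide")

# reshape() names the new columns score.Quality, score.Amount, ...;
# strip the "score." prefix so the names match the desired layout
names(wide) <- sub("^score\\.", "", names(wide))
wide
```

Combinations that have no score for a question (here every Manual.nl row for Amount and Sense) are filled with NA.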
The merge approach provided by Parfait works. I am including the script that did the trick here (thanks Parfait ;)).
cols2keep <- c("resp", "company", "score")
df <- merge(current[current$question=='Quality', cols2keep], # merge two data frames
            current[current$question=='Amount', cols2keep],
            by=c("resp", "company"), all=TRUE)
df <- merge(df, current[current$question=='Sense', cols2keep], # merge the third response variable into the new data frame
            by=c("resp", "company"), all=TRUE)
colnames(df) <- c("resp","company","quality","amount","sense")
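This solution works, but my real dataset has 53 response variables, so this approach is very time-consuming. I tried Parfait's iterative approach, but I get the following error.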
dfList <- lapply(unique(current$question), function(i){
temp <- setNames(current[current$question==i, c("resp", "company", "score")],
c("resp", "company", paste0(i)))
})
finaldf <- Reduce(function(...) merge(..., y=c("resp", "company"), all=T), dfList)
Error in fix.by(by.x, x) :
'by' must specify one or more columns as numbers, names or logical
I am fairly new to R and cannot figure out what I did wrong. I am happy with the current solution, but I am open to a more efficient one.
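A likely cause of the error, as far as I can tell: the Reduce call passes the key columns as y= instead of by=. Because y is then taken by name, the second data frame appears to be matched positionally to by, and fix.by() rejects a data frame as a column specification. Below is a minimal sketch of the iterative version with by=; the helper name `keys` is my own, and the data frame is rebuilt so the snippet is self-contained.

```r
# Rebuild the long data frame from the question (same values, written compactly)
current <- data.frame(
  resp     = rep(c(1325851107, 1325851108, 1325851109), times = 4),
  company  = rep(c("Dark.nl", "Dark.nl", "Manual.nl", "Dark.nl"), each = 3),
  question = rep(c("Quality", "Amount", "Quality", "Sense"), each = 3),
  score    = c(4, 1, 2, 6, 8, 10, 5, 5, 7, 4, 6, 7),
  stringsAsFactors = FALSE
)

keys <- c("resp", "company")  # hypothetical helper: the two grouping columns

# One data frame per question, with the score column renamed after the question
dfList <- lapply(unique(current$question), function(q) {
  setNames(current[current$question == q, c(keys, "score")], c(keys, q))
})

# Merge the list pairwise; the key columns go in `by`, not `y`
finaldf <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), dfList)
finaldf
```

Since the list is built from unique(current$question), the same code should scale to 53 response variables without writing one merge per variable.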
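Thank you very much, Parfait. This script is easy to use and produces the data frame I had in mind. – SHW
Great to hear! Glad to help. Please accept the answer to confirm the solution. Happy coding! – Parfait
I am now running into trouble when one of my grouping variables (company) has more than two levels (see the additional code I added to the original post: #Grouping variable with more than two levels, including "Senses"). I get this error: Error in fix.by(by.x, x) : 'by' must specify one or more columns as numbers, names or logical. Any idea what is going wrong here? – SHW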