Subset and summarize a data frame

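My goal: given a data frame of binary responses (e.g., 0 and 1), how can I produce a summary matrix that 1) has two columns (one counting people who answered the first question correctly, the other counting those who answered it incorrectly), and 2) has one row per total score, giving the number of individuals who obtained that score?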

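For example, say I have 50 respondents and 5 questions. That means there are 6 possible total scores (all incorrect/0, then 1, 2, 3, and 4 correct, and finally all correct/1). I would like the resulting matrix object to look like this: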

      INCORRECT  CORRECT   <-- a 0 or a 1, respectively, on the first item
[1]          10        0   <-- 10 people who answered 0 on the first question and 0 on all the others (five zeros)
[2]           8        2   <-- 10 people who got 1 correct (8 got the first question wrong, 2 got it right)
[3]           4        8   <-- 12 people who got 2 correct (4 got the first question wrong but 2 others right; 8 got the first question and 1 other right)
[4]           6        3   <-- 9 people who got 3 correct
[5]           3        4   <-- 7 people who got 4 correct
[6]           0        8   <-- the 8 people who answered all 5 questions correctly (so they necessarily got the first question right)
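As a quick sanity check of this layout (a sketch, not part of the original question), one can enumerate all 2^5 possible response patterns in base R and cross-tabulate the total score against the first item. This counts patterns rather than respondents, but it confirms that 5 items give 6 possible totals (0 to 5) and that the cells "total 0, first item correct" and "total 5, first item incorrect" are always empty.

```r
k <- 5  # number of items, as in the example

# Every possible 0/1 response pattern over k items (2^k rows)
patterns <- expand.grid(rep(list(0:1), k))
names(patterns) <- paste0("x", 1:k)

# Total score per pattern, cross-tabulated against the first item
total <- rowSums(patterns)
tab <- table(total = total, x1 = patterns$x1)
tab
```

Each row of tab sums to choose(5, t) patterns for total t, so the row totals are 1, 5, 10, 10, 5, 1.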

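My thinking is that I need to split the data frame by performance on the first question (working one column at a time), find the total score for each row (participant), and tabulate those into the first column; then do the same thing for the second?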

This is going to be built into a package, so I'm trying to figure out how to do it using only base functions.
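The plan above (total per row, split by one item, tabulate, then repeat for the next item) can be sketched in base R roughly like this. The toy data and the helper name count_by_item are made up for illustration; this is not code from the thread.

```r
# Hypothetical toy data (not from the question): 6 respondents, 3 items
data <- data.frame(x1 = c(0, 1, 1, 0, 1, 0),
                   x2 = c(0, 1, 0, 1, 1, 0),
                   x3 = c(0, 1, 0, 0, 1, 1))
k <- ncol(data)        # number of items, so totals run from 0 to k
total <- rowSums(data) # total score per row (participant)

# Hypothetical helper: split the totals by the answer to one item and
# count how often each total 0..k occurs in the 0 and 1 groups
count_by_item <- function(item, total, k) {
  # fixed levels keep both groups even if nobody answered one way
  groups <- split(total, factor(item, levels = c(0, 1)))
  # tabulate() counts positive integers, hence the +1 shift
  counts <- sapply(groups, function(s) tabulate(s + 1, nbins = k + 1))
  dimnames(counts) <- list(total = 0:k, answer = c("incorrect", "correct"))
  counts
}

# "Then do the same for the second": one summary matrix per item
summaries <- lapply(data, count_by_item, total = total, k = k)
summaries$x1
```

Because tabulate() uses a fixed number of bins, every matrix has all k + 1 rows, including totals nobody obtained (here, a total of 2).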

Here is an example data set similar to the one I'll be working with:

n <- 50 
z <- c(0, 1) 
samp.fun <- function(x, n){ 
    sample(x, n, replace = TRUE) 
} 

data <- data.frame(0) 
for (i in 1:5){ 
    data[1:n, i] <- samp.fun(z, n) 
} 
names(data)[1:5] <- c("x1", "x2", "x3", "x4", "x5")

Any ideas would be greatly appreciated!


2013-03-16 Twitch_City

Using @alexwhan's data, here's a data.table solution:

require(data.table) 
dt <- data.table(data) 

dt[, list(x1.incorrect=sum(x1==0), x1.correct=sum(x1==1)), keyby=total] 
# total x1.incorrect x1.correct 
# 1:  0   2   0 
# 2:  1   7   1 
# 3:  2   9   8 
# 4:  3   7   6 
# 5:  4   0   7 
# 6:  5   0   3
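For comparison, a rough base-R counterpart of this grouped sum uses aggregate() on a data frame of logical indicators, since sum() over TRUE/FALSE counts the TRUEs. The toy data below is made up; the column names simply mirror the data.table output.

```r
# Hypothetical toy data (not from the question): 6 respondents, 3 items
data <- data.frame(x1 = c(0, 1, 1, 0, 1, 0),
                   x2 = c(0, 1, 0, 1, 1, 0),
                   x3 = c(0, 1, 0, 0, 1, 1))
data$total <- rowSums(data[c("x1", "x2", "x3")])

# Count 0s and 1s on x1 within each observed total score
res <- aggregate(data.frame(x1.incorrect = data$x1 == 0,
                            x1.correct   = data$x1 == 1),
                 by = list(total = data$total), FUN = sum)
res
```

Like the keyby output, this only has rows for totals that actually occur (a total of 2 is absent here); as.matrix(res[-1]) gives a plain count matrix if one is needed.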

Equivalently, if you don't mind setting the column names afterwards, you can get the result more directly by using table with as.list:

dt[, as.list(table(factor(x1, levels=c(0,1)))), keyby=total] 
# total 0 1 
# 1:  0 2 0 
# 2:  1 7 1 
# 3:  2 9 8 
# 4:  3 7 6 
# 5:  4 0 7 
# 6:  5 0 3
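For readers who, like the asker, want to stay in base R, here is a rough equivalent of the per-group table() idea, using split() and vapply() in place of data.table grouping. The toy data is made up for illustration. As in the data.table output, only totals that actually occur get a row.

```r
# Hypothetical toy data (not from the question): 6 respondents, 3 items
data <- data.frame(x1 = c(0, 1, 1, 0, 1, 0),
                   x2 = c(0, 1, 0, 1, 1, 0),
                   x3 = c(0, 1, 0, 0, 1, 1))
total <- rowSums(data)

# Group the x1 answers by total score (groups come out sorted by total)
groups <- split(data$x1, total)

# Count 0s and 1s in each group; fixing the factor levels guarantees
# two counts per group even when a group contains only 0s or only 1s
counts <- vapply(groups,
                 function(v) as.vector(table(factor(v, levels = c(0, 1)))),
                 FUN.VALUE = integer(2))

# vapply() returns one column per group; transpose to one row per total
out <- t(counts)
colnames(out) <- c("x1.incorrect", "x1.correct")
out
```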

Note: you can also set the column names in one go by wrapping as.list(.) in setNames():

dt[, setNames(as.list(table(factor(x1, levels=c(0,1)))), 
      c("x1.incorrect", "x1.correct")), keyby = total]
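To see what that j expression produces for a single group, it can be evaluated on a plain vector in base R; data.table then turns the returned named list into columns, one row per value of total. The x1 values below are a made-up stand-in for one group.

```r
# Hypothetical x1 values within one total-score group
x1 <- c(1, 0, 0)

# The same expression used inside dt[...], evaluated outside data.table:
# a named list with one count per answer, i.e. one row of the result
row <- setNames(as.list(table(factor(x1, levels = c(0, 1)))),
                c("x1.incorrect", "x1.correct"))
str(row)
```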


2013-03-16 08:26:18 Arun

Every time one of these gets posted it frustrates me that I still haven't gotten around to learning data.table – alexwhan 2013-03-16 10:56:57

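@alexwhan, your 'ddply' solution could simply be: 'ddply(data, .(total), summarise, N = sum(x1 == 0), y = sum(x1 == 1))', no? And why not give 'data.table' a try? If there's anything to improve, I can help correct it based on what I know... – Arun 2013-03-16 13:55:03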

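I'm going to mark this as solved, although I was hoping to find a solution that doesn't require external packages (such as data.table, reshape2, or plyr). Thanks for your help! – 2013-03-22 00:49:27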

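Because you didn't use set.seed when creating your data, I can't check this solution against your example, but I think it's what you're after. I use functions from reshape2 and plyr to get the summaries of the data.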

library(reshape2) 
library(plyr) 
#create data 
set.seed(1234) 
n <- 50 
z <- c(0, 1) 
samp.fun <- function(x, n){ 
    sample(x, n, replace = TRUE) 
} 

data <- data.frame(0) 
for (i in 1:5){ 
    data[1:n, i] <- samp.fun(z, n) 
} 
names(data)[1:5] <- c("x1", "x2", "x3", "x4", "x5") 
data$id <- 1:50 

#First get the long form to make summaries on 
data.m <- melt(data, id.vars="id") 

#Get summary to find total correct answers 
data.sum <- ddply(data.m, .(id), summarise, 
        total = sum(value)) 

#merge back with original data to associate with id 
data <- merge(data, data.sum) 
data$total <- factor(data$total) 

#summarise again to get difference between patterns 
data.sum2 <- ddply(data, .(total), summarise, 
       x1.incorrect = length(total) - sum(x1), 
       x1.correct = sum(x1)) 
data.sum2 
# total x1.incorrect x1.correct 
# 1  0   2   0 
# 2  1   7   1 
# 3  2   9   8 
# 4  3   7   6 
# 5  4   0   7 
# 6  5   0   3


2013-03-16 04:44:45 alexwhan

Why not just use 'rowSums' to compute 'total'? And why group by 'id' when every id is unique? – Arun 2013-03-16 23:28:57
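Following that suggestion, the melt()/ddply()/merge() steps that build total can be replaced by a single vectorized rowSums() call over the item columns. This is a sketch on made-up data laid out like the answer's (items plus an id column); the grep() pattern assumes the items are named x1, x2, ... as in the question.

```r
# Hypothetical toy data in the answer's layout: an id plus items x1..x3
data <- data.frame(id = 1:4,
                   x1 = c(0, 1, 1, 0),
                   x2 = c(1, 1, 0, 0),
                   x3 = c(0, 1, 1, 0))

# Select only the item columns so that id is not added into the score
items <- grep("^x[0-9]+$", names(data), value = TRUE)

# One call per data frame instead of reshaping, summarising and merging
data$total <- rowSums(data[items])
data$total
```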


Nice puzzle - if I've understood it correctly, this should also do it:

table(rowSums(data),data[,1])
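A base-R variant of this one-liner, sketched on made-up data. Two small additions may matter for the matrix the question describes: wrapping both arguments in factor() with explicit levels keeps a row for every possible total (0 to k) and both answer columns even when some are never observed, and unclass() drops the "table" class to leave a plain matrix. Note that rowSums(data) should only see the item columns; if id or total columns have been added to data, select the items first.

```r
# Hypothetical toy data (not from the question): 6 respondents, 3 items
data <- data.frame(x1 = c(0, 1, 1, 0, 1, 0),
                   x2 = c(0, 1, 0, 1, 1, 0),
                   x3 = c(0, 1, 0, 0, 1, 1))
k <- ncol(data)

# Cross-tabulate total score against the first item, keeping all levels
tab <- table(total = factor(rowSums(data), levels = 0:k),
             x1    = factor(data$x1, levels = c(0, 1)))

# Drop the "table" class to get a plain integer matrix with dimnames
m <- unclass(tab)
m
```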


2013-03-16 19:06:55 texb

Unfortunately, this doesn't fit my needs. Thanks for trying! – 2013-03-22 00:50:03

Thanks for the feedback, but can you see what seems to be wrong with it? (Using alexwhan's data created with 'set.seed(1234); n <- 50 ....', my output matches Arun's.) – texb 2013-03-22 09:34:18
