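Frequency table with several variables in R

I want to reproduce a table that is commonly used in official statistics, but so far without success. Given a data frame like this: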

d1 <- data.frame(StudentID = c("x1", "x10", "x2", 
          "x3", "x4", "x5", "x6", "x7", "x8", "x9"), 
      StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'), 
      ExamenYear = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'), 
      Exam   = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'), 
      participated = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'), 
      passed  = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'), 
      stringsAsFactors = FALSE)

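I would like to create a table showing, for each year, the number of all students (All), those who are female, those who participated, and those who passed. Note that "of which" below refers to all students.

The table I have in mind would look like this: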

cbind(All = table(d1$ExamenYear), 
    participated  = table(d1$ExamenYear, d1$participated)[,2], 
    ofwhichFemale  = table(d1$ExamenYear, d1$StudentGender)[,1], 
    ofwhichpassed  = table(d1$ExamenYear, d1$passed)[,2])

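I am sure there is a better way to do this kind of thing in R.

Note: I have seen LaTeX solutions, but I am not sure they will work for me, because I need to export the table to Excel.

Thanks in advance.

Source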

2012-08-07 user1043144

Using plyr:

require(plyr) 
ddply(d1, .(ExamenYear), summarize, 
     All=length(ExamenYear), 
     participated=sum(participated=="yes"), 
     ofwhichFemale=sum(StudentGender=="F"), 
     ofWhichPassed=sum(passed=="yes"))

Which gives:

  ExamenYear All participated ofwhichFemale ofWhichPassed
1       2007   3            2             2             2
2       2008   4            3             2             3
3       2009   3            3             0             2

Source

2012-08-07 19:13:18 Andy

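Thank you. Much appreciated. I will definitely learn plyr. – user1043144 2012-08-07 19:18:53

Nice answer, but one minute later than @csgillespie. – 2012-08-07 19:20:44

@Jilber, I think you mean *one minute earlier*. There shouldn't be a "but" in your comment. – A5C1D2H2I1M1N2O1R2T1 2012-08-07 19:22:51

The plyr package is well suited to this kind of task. First, load the package: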

library(plyr)

Then we use the ddply function:

ddply(d1, "ExamenYear", summarise, 
     All = length(passed),  # any column works for this count
     participated = sum(participated=="yes"), 
     ofwhichFemale = sum(StudentGender=="F"), 
     ofwhichpassed = sum(passed=="yes"))

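Basically, ddply expects a data frame as input and returns a data frame. It splits the input data frame by ExamenYear, and on each sub-table we compute some summary statistics. Note that inside ddply we don't have to use the $ notation when referring to columns.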
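The split/apply/combine steps described above can be sketched in base R. This is only an illustration of the idea, not how plyr is implemented internally; the names `pieces`, `summaries`, and `result` are made up for this example, and the data frame is the one from the question reduced to the columns used.

```r
# Sample data from the question, reduced to the columns used here.
d1 <- data.frame(
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008",
                    "2008", "2008", "2009", "2009", "2009"),
  participated  = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Split: one sub-data-frame per exam year.
pieces <- split(d1, d1$ExamenYear)

# Apply: compute a one-row summary for each piece.
summaries <- lapply(names(pieces), function(year) {
  piece <- pieces[[year]]
  data.frame(
    ExamenYear    = year,
    All           = nrow(piece),
    participated  = sum(piece$participated == "yes"),
    ofwhichFemale = sum(piece$StudentGender == "F"),
    ofwhichpassed = sum(piece$passed == "yes"),
    stringsAsFactors = FALSE
  )
})

# Combine: stack the one-row summaries into a single data frame.
result <- do.call(rbind, summaries)
print(result)
```

ddply does the same three steps in one call, which is why the column names can be used directly inside summarise.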

Source

2012-08-07 19:14:21 csgillespie

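Thanks. You both made my day – user1043144 2012-08-07 19:19:12

A few modifications to your own code (using with to reduce the number of d1$ references, and using character indexing to make it more self-documenting) would make it easier to read and a worthy competitor to the ddply solutions: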

with(d1, cbind(All = table(ExamenYear), 
    participated  = table(ExamenYear, participated)[,"yes"], 
    ofwhichFemale  = table(ExamenYear, StudentGender)[,"F"], 
    ofwhichpassed  = table(ExamenYear, passed)[,"yes"]) 
    ) 

     All participated ofwhichFemale ofwhichpassed
2007   3            2             2             2
2008   4            3             2             3
2009   3            3             0             2

I would expect this to be faster than the ddply solution, but the difference will only be noticeable if you are working with large datasets.
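Since the question mentions exporting the table to Excel, one possible way to finish is to write the result to a CSV file, which Excel can open. This is a minimal sketch using only base R; the object names `tab` and `out` and the file name `exam_summary.csv` are hypothetical, and the file is written to a temporary directory just for illustration.

```r
# Sample data from the question, reduced to the columns used here.
d1 <- data.frame(
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008",
                    "2008", "2008", "2009", "2009", "2009"),
  participated  = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Build the summary matrix with the with()/table() approach.
tab <- with(d1, cbind(
  All           = table(ExamenYear),
  participated  = table(ExamenYear, participated)[, "yes"],
  ofwhichFemale = table(ExamenYear, StudentGender)[, "F"],
  ofwhichpassed = table(ExamenYear, passed)[, "yes"]
))

# Move the row names (the years) into a regular column so they
# end up as the first column of the exported file.
out <- data.frame(ExamenYear = rownames(tab), tab, row.names = NULL)

# Hypothetical output location; replace with a real path as needed.
out_file <- file.path(tempdir(), "exam_summary.csv")
write.csv(out, out_file, row.names = FALSE)
```

Writing a native .xlsx file would need an add-on package; CSV is used here only because it requires nothing beyond base R.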

Source

2012-08-07 19:28:11

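You may also want to take a look at the next iteration of plyr: dplyr

It brings a ggplot-like syntax and provides fast performance, with key pieces written in C++.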

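library(dplyr)

# %.% was dplyr's original chaining operator; current releases use %>%.
d1 %>%
  group_by(ExamenYear) %>%
  summarise(ALL = length(ExamenYear),
            participated = sum(participated == "yes"),
            ofwhichFemale = sum(StudentGender == "F"),
            ofWhichPassed = sum(passed == "yes"))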

Source

2014-01-26 07:24:42
