Aggregate a data frame into a frequency table

I am looking to reshape a data frame that looks something like this, with the variables:

Year, University, Degree, Gender

where each row describes one student entry, such as:

2017, University College London, Social Science, Male 

2017, University of Leeds, Social science, Non-Binary

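I would like to create a frequency table from this data in order to condense the number of rows, so that each university has 19 rows, one per degree category, and each degree row shows the count/frequency for each gender, looking like this.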

Year University Degree [Gender (Male, Female, Non-Binary)] 

2017 UCL Biological Sciences 1 0 2

I hope this makes sense. Thanks for your help.

Edit: I would now like to plot this data as a line graph using a subset of the data. At the moment I subset outside the plotting function, like this:

subsetucl <- TFtab[which(TFtab$University == 'University College London'), ]
ggplot(data = subsetucl, aes(x = Year, y = Female, group = Degree, color = Degree)) +
    geom_line() +
    geom_point(size = 0.8) +
    xlab("Year of application") +
    ylab("Frequency of Females") +
    ggtitle("UCL Applications by Degree (2011-2017)") +
    theme_bw()

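What is the best way to subset the data inside the plotting function, and how can I best show lines for all genders rather than only the female frequency? Thanks.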

Source

2017-07-19 James Todd

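Here's a solution with dplyr (spread() comes from tidyr, so that package needs to be loaded as well).

library("dplyr")
library("tidyr")  # provides spread()
data %>%
    group_by(University, Degree, Gender) %>%
    count() %>%  # one row per group, with the count in column n
    spread(key = Gender, value = n, fill = 0)  # one column per gender; missing cells become 0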

But seriously, use the Stack Overflow search function. Here's a book to help with R
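If rows are also wanted for university/degree combinations that have no students at all, one possible approach is tidyr's complete(), which expands the counts to every combination before widening. The sketch below is not from the original answer: the `students` data frame is made up for illustration, Year is added to the grouping to match the layout asked for, and pivot_wider() is used as the newer equivalent of spread().

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data in the shape described in the question
students <- data.frame(
  Year = c(2017, 2017, 2017),
  University = c("University College London", "University of Leeds",
                 "University of Leeds"),
  Degree = c("Social Science", "Social Science", "Biological Sciences"),
  Gender = c("Male", "Non-Binary", "Female")
)

freq <- students %>%
  count(Year, University, Degree, Gender) %>%          # observed combinations only
  complete(Year, University, Degree, Gender,
           fill = list(n = 0L)) %>%                    # add unobserved combinations as 0
  pivot_wider(names_from = Gender, values_from = n)    # one column per gender

print(freq)
```

Note that complete() crosses every value seen in the data, so a degree that a university never offers would still get an all-zero row.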

Source

2017-07-19 14:27:46 svenhalvorson

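This is helpful, but the gender frequencies are all in one column rather than in a separate column for each gender level. It also drops the 0 values. Is there a way to keep the 0 values?

'data %>% group_by(University, Degree, Gender) %>% count() %>% spread(key = Gender, value = n, fill = 0)' – svenhalvorson

This adds a 0 where a row is missing a value, but where an entire row would be 0 there is no row at all. Is there a way to do that? @svenhalvorson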

1) aggregate/model.matrix Try this one-line aggregate solution. No packages are used.

aggregate(model.matrix(~ Gender + 0) ~ Year + University + Degree, DF, sum)

giving:

  Year                University         Degree GenderFemale GenderMale GenderNon-Binary
1 2017       University of Leeds Social science            1          0                1
2 2017 University College London Social Science            0          1                0
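To see why this works, it can help to look at the model.matrix() part on its own: with the intercept removed (`+ 0`) it builds one 0/1 indicator column per gender, so summing those columns within each group yields the counts. A minimal illustration, using a made-up `Gender` vector rather than the poster's data:

```r
# Hypothetical gender values for four students
Gender <- c("Male", "Non-Binary", "Female", "Female")

# One indicator column per gender level; "+ 0" drops the intercept
ind <- model.matrix(~ Gender + 0)

print(colnames(ind))  # column names are "Gender" followed by the level name
print(colSums(ind))   # summing the indicators gives the count per gender
```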

2) aggregate/cbind It would also be possible to write out the model.matrix(...) part using cbind(...), which may be clearer, although more tedious:

aggregate(cbind(Female = Gender == "Female", Male = Gender == "Male", 
      `Non-Binary` = Gender == "Non-Binary") ~ Year + University + Degree, DF, sum)

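giving the following, which is the same as above except for slightly different column names:

  Year                University         Degree Female Male Non-Binary
1 2017       University of Leeds Social science      1    0          1
2 2017 University College London Social Science      0    1          0

Note: The input used in the examples above, in reproducible form, is: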

Lines <- "Year, University, Degree, Gender 
2017, University College London, Social Science, Male 
2017, University of Leeds, Social science, Non-Binary 
2017, University of Leeds, Social science, Female" 
DF <- read.csv(text = Lines, strip.white = TRUE)
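As for the follow-up in the question about plotting every gender rather than only Female: a wide table like the one produced by 2) can be reshaped to long format so that gender becomes a variable ggplot2 can map to colour. The following is only a sketch under assumptions: the `TFtab` values are invented for illustration, tidyr's pivot_longer() is used for the reshape, and faceting by degree is one layout choice among several.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide frequency table with one column per gender
TFtab <- data.frame(
  Year = rep(2016:2017, each = 2),
  University = "University College London",
  Degree = rep(c("Social Science", "Biological Sciences"), times = 2),
  Female = c(3, 5, 4, 6),
  Male = c(2, 1, 3, 2),
  `Non-Binary` = c(0, 1, 1, 0),
  check.names = FALSE  # keep the hyphen in "Non-Binary"
)

# Long format: one row per Year/University/Degree/Gender
long <- pivot_longer(TFtab,
                     cols = c("Female", "Male", "Non-Binary"),
                     names_to = "Gender", values_to = "Count")

# Subset inside the ggplot() call; one line per gender, one panel per degree
p <- ggplot(subset(long, University == "University College London"),
            aes(x = Year, y = Count, colour = Gender, group = Gender)) +
  geom_line() +
  geom_point(size = 0.8) +
  facet_wrap(~ Degree) +
  xlab("Year of application") +
  ylab("Frequency") +
  theme_bw()

print(p)
```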

Source

2017-07-19 16:17:48

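This doesn't seem to work for me

Assuming that "doesn't seem to work" means you just want to aggregate the values rather than have a complete n-way table, I have revised the answer to use aggregate.

This works now, thank you very much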
