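dplyr/tidyr - summarise data with conditions

Problem: I would like to use dplyr & tidyr to produce an output table (a contingency table, I think) that summarises the data as frequencies (e.g. counts of negative, neutral and positive values for title, description & body). I have tried a number of different approaches; the closest example I could find is "Using Tidyr/Dplyr to summarise counts of groups of strings", but it doesn't quite fit.

Example data: The data looks a bit like this...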

df <- data.frame("story_title"=c(0.0,0.0,0.0,-1.0,1.0), 
        "story_description"=c(-0.3,-0.3,-0.3,0.5,0.3), 
        "story_body"=c(-0.3,0.2,0.4,0.2,0))

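Desired output: The output would hopefully look something like this, showing summary frequencies for each part of the story.

                  Negative Neutral Positive
story_title              1       3        1
story_description        3       0        2
story_body               1       1        3

(Edited the totals for story_body - thanks Akrun)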

Attempted approach

If I'm right, the first step is to use gather to reshape the data, like so...

df <- df %>% gather(type,score,starts_with("story")) 

> df
                type score
1        story_title   0.0
2        story_title   0.0
3        story_title   0.0
4        story_title  -1.0
5        story_title   1.0
6  story_description  -0.3
7  story_description  -0.3
8  story_description  -0.3
9  story_description   0.5
10 story_description   0.3
11        story_body  -0.3
12        story_body   0.2
13        story_body   0.4
14        story_body   0.2
15        story_body   0.0

From here I think it's a combination of group_by and summarise. I tried...

df %>% group_by(sentiment) %>% 
      summarise(Negative = count("sentiment_title"<0), 
        Neutral = count("sentiment_title"=0), 
        Positive = count("sentiment_title">0) 
        )

Obviously this doesn't work.

Can anyone help with a dplyr/tidyr solution (a base table answer would also be useful as an example)?
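
For reference, the attempt above fails for several reasons: the gathered data has no `sentiment` column (the columns are `type` and `score`), `"sentiment_title"` in quotes is a string rather than a column, `=` should be `==`, and `dplyr::count()` expects a data frame rather than a logical vector. A minimal sketch of what the attempt appears to be aiming at, assuming the example data above and counting with `sum()` over logical conditions:

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(story_title       = c(0, 0, 0, -1, 1),
                 story_description = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 story_body        = c(-0.3, 0.2, 0.4, 0.2, 0))

# Reshape to long form: one row per (type, score) pair
long <- gather(df, type, score, starts_with("story"))

# sum() of a logical vector counts the TRUE values
counts <- long %>%
  group_by(type) %>%
  summarise(Negative = sum(score < 0),
            Neutral  = sum(score == 0),
            Positive = sum(score > 0))
counts
```

With a character `type` column the rows come out sorted alphabetically, not in the original column order.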

Source

2015-11-06 BarneyC

I think 'story_body' should be '1 1 3' – akrun

Try

library(dplyr) 
library(tidyr) 
gather(df) %>% 
     group_by(key,value= sign(value))%>% 
     tally() %>% 
     mutate(ind= factor(value, levels=c(-1,0,1), 
        labels=c('Negative', 'Neutral', 'Positive'))) %>% 
     select(-value) %>% 
     spread(ind, n, fill=0)
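
In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The same idea can be sketched with the newer functions, assuming the example data from the question. `sign()` maps each score to -1, 0 or 1, which is then labelled as a factor; the column names `type`, `score` and `sentiment` are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(story_title       = c(0, 0, 0, -1, 1),
                 story_description = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 story_body        = c(-0.3, 0.2, 0.4, 0.2, 0))

wide <- df %>%
  # long form: one row per (type, score) pair
  pivot_longer(everything(), names_to = "type", values_to = "score") %>%
  # sign() returns -1, 0 or 1; label those three values
  mutate(sentiment = factor(sign(score), levels = c(-1, 0, 1),
                            labels = c("Negative", "Neutral", "Positive"))) %>%
  # count rows per type and sentiment (result column is n)
  count(type, sentiment) %>%
  # one column per sentiment; combinations that never occur become 0
  pivot_wider(names_from = sentiment, values_from = n,
              values_fill = list(n = 0L))
wide
```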

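Source

2015-11-06 11:17:10 akrun

I like the idea of using 'sign'. I could use it to shorten mine. –

Definitely not as simple as I had originally thought; agreed, the sign() step is a cheeky little move. When I'm free – BarneyC

@BarneyC Will update with some explanation later. – akrun

Try using cut to relabel the scores into three categories. After that, it is just a matter of melting the data with gather and reshaping it to "wide" with dcast.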

library(tidyr) 
library(reshape2) 
df[] <- lapply(df, function(x) {cut(x, c(-Inf,-1e-4,0,Inf), c("Negative", "Neutral", "Positive"))}) 
dcast(gather(df), key~value) 
#                 key Negative Neutral Positive
# 1       story_title        1       3        1
# 2 story_description        3       0        2
# 3        story_body        1       1        3
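
To see how the breaks in the `cut` call above classify values, here is a small sketch on a hypothetical vector `x`. By default `cut` uses right-closed intervals, so the breaks give (-Inf, -1e-4], (-1e-4, 0] and (0, Inf]. One consequence is that a tiny negative value between -1e-4 and 0 would be labelled "Neutral".

```r
# Hypothetical scores covering all three categories
x <- c(-1, -0.3, 0, 0.2, 1)

# Same breaks and labels as in the answer above
sentiment <- cut(x, breaks = c(-Inf, -1e-4, 0, Inf),
                 labels = c("Negative", "Neutral", "Positive"))
as.character(sentiment)
```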

Source

2015-11-06 11:22:38

Why not use base R's xtabs?

Continuing from your code:

> df <- df %>% gather(type, score, starts_with("story"))
> df$movement <- ifelse(df$score == 0, "Neutral", ifelse(df$score < 0, "Negative", "Positive"))
> xtabs(~ df$type + df$movement)

                   df$movement
df$type             Negative Neutral Positive
  story_title              1       3        1
  story_description        3       0        2
  story_body               1       1        3
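
Since a base answer was also requested, here is a sketch of the same tabulation without dplyr or tidyr, assuming the original wide example data. `stack()` produces the long form (columns `values` and `ind`), and passing `data =` to `xtabs` keeps the `df$` prefix out of the dimension names. The names `long`, `movement` and `tab` are illustrative.

```r
# Example data from the question (wide form)
df <- data.frame(story_title       = c(0, 0, 0, -1, 1),
                 story_description = c(-0.3, -0.3, -0.3, 0.5, 0.3),
                 story_body        = c(-0.3, 0.2, 0.4, 0.2, 0))

# stack() returns a data frame with columns `values` and `ind`;
# `ind` is a factor whose levels follow the original column order
long <- stack(df)

# Label the sign of each score
long$movement <- factor(sign(long$values), levels = c(-1, 0, 1),
                        labels = c("Negative", "Neutral", "Positive"))

# Cross-tabulate story part against movement
tab <- xtabs(~ ind + movement, data = long)
tab
```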

Source

2015-11-06 11:43:20 Pash101
