Group by ID and keep only the group with the highest mean

I have a data frame as follows. I want to group it by ID and keep only the group with the highest mean.

a <- data.frame(group =c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5), count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L,114L, 134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L)) 

   group count
1      1    12
2      1    80
3      1   102
4      1    97
5      2   118
6      2   115
7      2     4
8      2    13
9      3   136
10     3   114
11     3   134
12     3   126
13     4   128
14     4    63
15     4   118
16     4     1
17     5    28
18     5    18
19     5    18
20     5    23

I used the following command:

library(dplyr)
a %>% group_by(group) %>% summarise(mean(count))

  group mean(count)
  (dbl)       (dbl)
1     1       72.75
2     2       62.50
3     3      127.50
4     4       77.50
5     5       21.75

I want to keep only the rows that belong to the group with the highest mean. Here group 3 has the highest mean, so my output should be:

  group count
1     3   136
2     3   114
3     3   134
4     3   126

Can anyone give me some ideas on how to do this?

Source

2016-06-08 haimen

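There are lots of options already. Starting from your existing approach (with the summary column named 'mc', i.e. 'summarise(mc = mean(count))'), you would only need to add '%>% slice(which.max(mc)) %>% semi_join(a, ., "group")' –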
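A runnable sketch of that comment's suggestion on the question's data. It assumes the summary column is named `mc`; with the question's `summarise(mean(count))` the column would literally be called `mean(count)` and `which.max(mc)` would fail. The `.` inside `semi_join()` is the magrittr placeholder for the one-row summary.

```r
library(dplyr)

# the question's data
a <- data.frame(group = rep(c(1, 2, 3, 4, 5), each = 4),
                count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
                          134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))

res <- a %>%
  group_by(group) %>%
  summarise(mc = mean(count)) %>%   # one row per group
  slice(which.max(mc)) %>%          # keep only the summary row with the largest mean
  semi_join(a, ., by = "group")     # keep the rows of `a` whose group matches it

res
```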

If you would like to see a base R solution, you can use which.max and aggregate:

# calculate means by group 
myMeans <- aggregate(count~group, a, FUN=mean) 

# keep the rows of the group with the max mean
maxMeanGroup <- a[a$group == myMeans[which.max(myMeans$count),]$group, ]
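`which.max()` returns only the first maximum, so if two groups tied for the highest mean only one of them would be kept. A base R sketch that keeps every tied group by comparing against `max()` instead (`topGroups` and `topRows` are illustrative names, not from the answer):

```r
# the question's data
a <- data.frame(group = rep(c(1, 2, 3, 4, 5), each = 4),
                count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
                          134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))

# mean count per group (columns: group, count)
myMeans <- aggregate(count ~ group, a, FUN = mean)

# every group whose mean equals the maximum, so ties are all kept
topGroups <- myMeans$group[myMeans$count == max(myMeans$count)]

# rows of `a` belonging to those groups
topRows <- a[a$group %in% topGroups, ]
topRows
```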

As a second approach, you could try data.table:

library(data.table)
setDT(a)

# per-group means, then take the group label (not the row position) of the largest mean
a[group == a[, list(count = mean(count)), by = group][which.max(count), group]]

   group count
1:     3   136
2:     3   114
3:     3   134
4:     3   126
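`which.max()` on the summary table gives a row position, not a group label; in the question the two coincide only because the groups are 1 to 5 in order. A small data.table sketch with hypothetical labels 10, 20 and 30 (invented for illustration) shows the difference:

```r
library(data.table)

# hypothetical data whose group labels are not 1..k
dt <- data.table(group = c(10, 10, 20, 20, 30, 30),
                 count = c(1, 3, 50, 70, 5, 7))

# group means in first-appearance order: 10 -> 2, 20 -> 60, 30 -> 6
means <- dt[, .(count = mean(count)), by = group]

pos   <- means[, which.max(count)]       # a row position: 2
label <- means[which.max(count), group]  # the label stored in that row: 20

dt[group == label]  # the two rows of group 20
# dt[group == pos] would match nothing, because no group is labelled 2
```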


2016-06-08 19:08:56 lmo

Your base R approach could be rewritten as 'subset(a, group %in% with(aggregate(count ~ group, a, mean), group[which.max(count)]))' for those who like 'subset'. –

@docendodiscimus Using 'with' on the output of 'aggregate' is a cool idea that I had never seen. Thanks for the tip. – lmo
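A runnable sketch of the `subset()` plus `with()` idea from these comments. `with()` evaluates `group[which.max(count)]` inside the data frame returned by `aggregate()`, so there `group` and `count` refer to the per-group means rather than to `a`; `best` and `res` are illustrative names.

```r
# the question's data
a <- data.frame(group = rep(c(1, 2, 3, 4, 5), each = 4),
                count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
                          134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))

# label of the group with the largest mean, looked up inside aggregate()'s result
best <- with(aggregate(count ~ group, a, mean), group[which.max(count)])

# keep only that group's rows
res <- subset(a, group == best)
res
```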

Using dplyr:

library(dplyr)
a %>% group_by(group) %>%
    mutate(mc = mean(count)) %>% ungroup() %>% 
    filter(mc == max(mc)) %>% select(-mc) 

Source: local data frame [4 x 2] 

  group count
  (dbl) (int)
1     3   136
2     3   114
3     3   134
4     3   126

Another option with data.table:

# assumes `a` is already a data.table (e.g. after setDT(a))
a[a[, .(mc = mean(count)), by = .(group)][mc == max(mc), -"mc", with = FALSE], on = "group"]
   group count
1:     3   136
2:     3   114
3:     3   134
4:     3   126


2016-06-08 19:02:49 Psidom

You want mutate rather than summarise, so that you keep all the observations (rows) in your data.frame.

library(dplyr)
new_data <- a %>% group_by(group) %>%
    ##compute average count within groups 
    mutate(AvgCt = mean(count)) %>% 
    ungroup() %>% 
    ##filter, looking for the maximum of the created variable 
    filter(AvgCt == max(AvgCt))

Then your final output is:

> new_data 
Source: local data frame [4 x 3] 

  group count AvgCt
  (dbl) (int) (dbl)
1     3   136 127.5
2     3   114 127.5
3     3   134 127.5
4     3   126 127.5

And, if you would like to drop the computed variable:

new_data <- new_data %>% select(-AvgCt) 

> new_data 
Source: local data frame [4 x 2] 

  group count
  (dbl) (int)
1     3   136
2     3   114
3     3   134
4     3   126
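To see why mutate rather than summarise matters here, a minimal sketch comparing the row counts the two verbs return on the question's data (`s` and `m` are illustrative names):

```r
library(dplyr)

# the question's data
a <- data.frame(group = rep(c(1, 2, 3, 4, 5), each = 4),
                count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
                          134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))

# summarise collapses each group to a single row: 5 rows, original counts are gone
s <- a %>% group_by(group) %>% summarise(AvgCt = mean(count))

# mutate keeps all 20 rows and repeats each group's mean on every row of that group
m <- a %>% group_by(group) %>% mutate(AvgCt = mean(count)) %>% ungroup()

nrow(s)
nrow(m)
```

Because `m` still holds every original row, filtering it on `AvgCt == max(AvgCt)` returns the full set of rows for the winning group.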


2016-06-08 19:03:09 BarkleyBG

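Perhaps something fun with xtabs/tabulate as well (if the groups are not simply the numbers 1, 2, ..., you would need to add names() around the which.max call, since which.max returns a position rather than a label):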

a[a$group == which.max(xtabs(count ~ group, a)/tabulate(a$group)),] 
#    group count
# 9      3   136
# 10     3   114
# 11     3   134
# 12     3   126
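A sketch of the `names()` variant with hypothetical character labels (`b`, `grp` and the values are invented for illustration). It also passes a factor to `tabulate()`, which counts positive integers or factor levels rather than strings; `xtabs()` and `factor()` both order the groups by sorted label here, so the sums and counts line up.

```r
# hypothetical data with character group labels
b <- data.frame(grp = c("x", "x", "y", "y", "z", "z"),
                count = c(1, 3, 50, 70, 5, 7))

# per-group sums divided by per-group sizes = per-group means, named by label
means <- xtabs(count ~ grp, b) / tabulate(factor(b$grp))

# which.max() gives a position; names() turns it into the group label ("y")
best <- names(which.max(means))

b[b$grp == best, ]
```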

Or combined with rowsum:

a[a$group == which.max(rowsum.default(a$count, a$group)/tabulate(a$group)), ] 
#    group count
# 9      3   136
# 10     3   114
# 11     3   134
# 12     3   126
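The ratio in both one-liners is just "sum per group / rows per group", i.e. the group mean. A quick base R check that `rowsum()` / `tabulate()` reproduces the means shown in the question's summarise output (`tapply()` is used only as an independent comparison):

```r
# the question's data
a <- data.frame(group = rep(c(1, 2, 3, 4, 5), each = 4),
                count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
                          134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))

sums  <- rowsum(a$count, a$group)  # 5 x 1 matrix of per-group sums, rownames "1".."5"
sizes <- tabulate(a$group)         # rows per group: 4 4 4 4 4
ratio <- as.vector(sums / sizes)   # values: 72.75, 62.5, 127.5, 77.5, 21.75

# same result as computing the means directly
all.equal(ratio, as.vector(tapply(a$count, a$group, mean)))
```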


2016-06-08 19:34:31
