Aggregating rows in a data.frame by an assignment column

I have this example data.frame:

set.seed(1) 
df <- data.frame(id = letters[1:10], a = sample(100,10), b = sample(100,10), 
       aggregate_with = c(rep(NA,6),"y","b","b","e"), aggregate_order = c(rep(NA,6),"a,b","a,b","b,a","a,b")) 

> df
   id  a  b aggregate_with aggregate_order
1   a 27 21           <NA>            <NA>
2   b 37 18           <NA>            <NA>
3   c 57 68           <NA>            <NA>
4   d 89 38           <NA>            <NA>
5   e 20 74           <NA>            <NA>
6   f 86 48           <NA>            <NA>
7   g 97 98              y             a,b
8   h 62 93              b             a,b
9   i 58 35              b             b,a
10  j  6 71              e             a,b

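I want to aggregate rows whose aggregate_with value matches the id value of another row (a row is not a valid candidate if its aggregate_with value matches its own id). The function I want to apply sums their a and b values according to the assignment in the aggregate_order column: with "a,b" the row's a is added to the target's a and its b to the target's b, while with "b,a" the two are swapped. The id, aggregate_with and aggregate_order of the aggregated row should keep the values of the row indicated by the aggregate_with column.

Here is what the resulting data.frame should look like: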

> aggregated.df
  id   a   b aggregate_with aggregate_order
1  a  27  21           <NA>            <NA>
2  b 134 169           <NA>            <NA>
3  c  57  68           <NA>            <NA>
4  d  89  38           <NA>            <NA>
5  e  26 145           <NA>            <NA>
6  f  86  48           <NA>            <NA>
7  g  97  98              y             a,b

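As you can see, column a in row 2 of aggregated.df is the sum of column a of rows 2 and 8 and column b of row 9 in df, and vice versa for column b. Columns a and b in row 5 of aggregated.df are the sums of a and b, respectively, of rows 5 and 10 in df. Although row 7 in df has a value for aggregate_with, that value does not exist as an id in df, so the row is not aggregated.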
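The arithmetic behind rows 2 and 5 can be checked directly. This is only a small sketch: the values are copied by hand from df as printed above, and the variable names (a_vals, b_vals, a_row2 and so on) are made up for illustration.

```r
# Values copied from the printed df; names are the row ids
a_vals <- c(b = 37, h = 62, i = 58, e = 20, j = 6)
b_vals <- c(b = 18, h = 93, i = 35, e = 74, j = 71)

# Row 2 (id "b"): h contributes in order "a,b", i in order "b,a" (swapped)
a_row2 <- a_vals[["b"]] + a_vals[["h"]] + b_vals[["i"]]  # 37 + 62 + 35
b_row2 <- b_vals[["b"]] + b_vals[["h"]] + a_vals[["i"]]  # 18 + 93 + 58

# Row 5 (id "e"): j contributes in order "a,b" (no swap)
a_row5 <- a_vals[["e"]] + a_vals[["j"]]                  # 20 + 6
b_row5 <- b_vals[["e"]] + b_vals[["j"]]                  # 74 + 71

c(a_row2, b_row2, a_row5, b_row5)
```

These match the 134, 169, 26 and 145 shown in aggregated.df.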

Source

2016-02-29 user1701545

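Loops, but I think there is a more elegant solution. – user1701545

You should edit the question to include what you already have, so people don't spend a lot of effort getting to where you already are. – alistaire

I'm using the data.table library.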

library(data.table) 
dt <- as.data.table(df) 

#a table to join with 
dt2 <- dt[, list(id = aggregate_with, a, b, aggregate_order)] 
#set the right order 
dt2[, c('a', 'b') := list(ifelse(aggregate_order == 'a,b', a, b), ifelse(aggregate_order == 'a,b', b, a))] 
setkey(dt2, id) 

#joining tables 
res <- dt2[dt] 

#replacing NA's with 0 and summing 
for (j in c('a', 'b')) set(res, which(is.na(res[[j]])), j, 0) 
res[!aggregate_with %in% id, list(a = sum(a) + i.a[1], b = sum(b) + i.b[1]), by = id]
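Note that the last expression above returns only id, a and b. If the full layout of aggregated.df is wanted, including aggregate_with and aggregate_order, one possible base R sketch is shown below. It is not part of the original answer: aggregate_rows is a hypothetical helper name, df is rebuilt from the printed values so the result does not depend on the R version's random number generator, and it assumes that a row being folded into another row is not itself the target of a third row.

```r
# Literal copy of df as printed in the question (assumption: typed in by hand)
df <- data.frame(
  id = letters[1:10],
  a  = c(27, 37, 57, 89, 20, 86, 97, 62, 58, 6),
  b  = c(21, 18, 68, 38, 74, 48, 98, 93, 35, 71),
  aggregate_with  = c(rep(NA, 6), "y", "b", "b", "e"),
  aggregate_order = c(rep(NA, 6), "a,b", "a,b", "b,a", "a,b"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fold rows into the row named by aggregate_with
aggregate_rows <- function(df) {
  # Row index of each row's target in df$id (NA if the target id is absent)
  target <- match(df$aggregate_with, df$id)
  # Fold a row only if its target exists and is a different row
  fold <- !is.na(target) & target != seq_len(nrow(df))
  # "b,a" means the row's a and b are swapped before being added
  swap <- fold & df$aggregate_order %in% "b,a"
  add_a <- ifelse(swap, df$b, df$a)
  add_b <- ifelse(swap, df$a, df$b)

  out <- df
  for (i in which(fold)) {
    out$a[target[i]] <- out$a[target[i]] + add_a[i]
    out$b[target[i]] <- out$b[target[i]] + add_b[i]
  }
  # Drop the folded rows; unmatched rows (such as id "g") are kept as-is
  out <- out[!fold, ]
  rownames(out) <- NULL
  out
}

aggregated.df <- aggregate_rows(df)
aggregated.df
```

The loop only visits the folded rows, so it stays cheap for data of this shape; for large inputs the join-based data.table approach above is likely the better fit.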

Source

2016-02-29 08:31:19
