2016-09-20

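R: count and list the unique ids in each column that satisfy a condition

I have been going crazy over something basic. For each column of a data frame, I would like to count the unique ids for which a condition holds and list them as a comma-separated string. The data frame looks like this: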

df <- data.frame(id = as.character(c("a", "a", "a", "b", "c", "d", "d", "e", "f")),
                 x1 = c(3,1,1,1,4,2,3,3,3),
                 x2 = c(6,1,1,1,3,2,3,3,1),
                 x3 = c(1,1,1,1,1,2,3,3,2))

> df
  id x1 x2 x3
1  a  3  6  1
2  a  1  1  1
3  a  1  1  1
4  b  1  1  1
5  c  4  3  1
6  d  2  2  2
7  d  3  3  3
8  e  3  3  3
9  f  3  1  2

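For each column, I would like to get the number of unique ids that satisfy the condition (value > 1), together with the ids themselves: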

res = data.frame(x1_counts = 5, x1_names = "a,c,d,e,f", x2_counts = 4, x2_names = "a,c,d,e", x3_counts = 3, x3_names = "d,e,f")

> res
  x1_counts  x1_names x2_counts x2_names x3_counts x3_names
1         5 a,c,d,e,f         4  a,c,d,e         3    d,e,f

I tried with data.table, but it seems very convoluted, i.e.

DT = as.data.table(df)
res <- DT[, list(x1 = length(unique(id[which(x1 > 1)])), x2 = length(unique(id[which(x2 > 1)]))), by = id]

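But I can't get it right, and I don't see what I need to do with data.table, because what I am looking for is not really a grouping by id. Can you point me in the right direction? Many thanks!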

Answer


You can reshape your data to long format and then summarize it:

library(data.table) 
(melt(setDT(df), id.vars = "id")[value > 1] 
    [, .(counts = uniqueN(id), names = list(unique(id))), variable]) 
    # Replace list(...) with toString(...) if you want the names as a single string instead of a list

#    variable counts     names
# 1:       x1      5 a,c,d,e,f
# 2:       x2      4   a,c,d,e
# 3:       x3      3     d,e,f
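As a minimal sketch of the string variant mentioned in the code comment, assuming the `df` from the question: `toString()` joins with ", " (comma plus space), so `paste(..., collapse = ",")` is used here to reproduce the exact "a,c,d,e,f" format asked for. The names `long` and `res_long` are illustrative and not part of the original answer.

```r
library(data.table)

df <- data.frame(id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
                 x1 = c(3,1,1,1,4,2,3,3,3),
                 x2 = c(6,1,1,1,3,2,3,3,1),
                 x3 = c(1,1,1,1,1,2,3,3,2))

# Long format: one row per (id, variable, value).
# as.data.table() copies, so df itself is left unchanged (setDT() would convert it in place).
long <- melt(as.data.table(df), id.vars = "id")

# Keep only values > 1, then summarise once per original column.
res_long <- long[value > 1,
                 .(counts = uniqueN(id),
                   names  = paste(unique(id), collapse = ",")),
                 by = variable]

print(res_long)
```

Because `names` is now a plain character column rather than a list column, the result can be written out or compared against strings directly.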

To get exactly what you need, reshape back to wide format:

dcast(1~variable, 
     data = (melt(setDT(df), id.vars = "id")[value > 1] 
       [, .(counts = uniqueN(id), names = list(unique(id))), variable]), 
     value.var = c('counts', 'names')) 

#    . counts_x1 counts_x2 counts_x3  names_x1 names_x2 names_x3
# 1: .         5         4         3 a,c,d,e,f  a,c,d,e    d,e,f
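The same reshape can be written in two steps, which may be easier to follow. This is only a sketch under a few assumptions: it uses the conventional argument order `dcast(data, formula, ...)`, the `. ~ variable` formula (where `.` stands for "no row variable"), and string-valued names built with `paste()`. The intermediate names `long`, `res_long` and `wide` are illustrative, not from the original answer.

```r
library(data.table)

df <- data.frame(id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
                 x1 = c(3,1,1,1,4,2,3,3,3),
                 x2 = c(6,1,1,1,3,2,3,3,1),
                 x3 = c(1,1,1,1,1,2,3,3,2))

# Step 1: long format, filter on the condition, summarise per original column.
long <- melt(as.data.table(df), id.vars = "id")
res_long <- long[value > 1,
                 .(counts = uniqueN(id),
                   names  = paste(unique(id), collapse = ",")),
                 by = variable]

# Step 2: spread both summary columns back to a single wide row.
# With several value.var columns, data.table names the results
# <value.var>_<variable>, e.g. counts_x1 and names_x1.
wide <- dcast(res_long, . ~ variable, value.var = c("counts", "names"))

print(wide)
```

If the `x1_counts` / `x1_names` naming from the question is preferred, the columns can be renamed afterwards, for example with `setnames()`.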

Thank you!!!! I was way off; it never occurred to me that I had to melt the data! Thanks again, Psidon! – user971102