2016-12-15 59 views
2

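R: counting and subtracting events from a data frame

I am trying to compute household size from a data frame that also records two kinds of events: family members who died and family members who left the family. I want to take both into account to get the actual household size. Here is a reproducible example of my problem, with only 3 families: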

family <- factor(rep(c("001","002","003"), c(10,8,15)), levels=c("001","002","003"), labels=c("001","002","003"), ordered=TRUE) 
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0) 
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0) 
DF <- data.frame(family, dead, left) ; DF 

I can count N = the total number of family members (per family) in a second data frame DF2, simply by using table():

DF2 <- with(DF, data.frame(table(family))) 
colnames(DF2)[2] <- "N" ; DF2 
family N 
1 001 10 
2 002 8 
3 003 15 

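But I cannot find a proper way to get the actual number of members (for example, as a new variable N2 in DF2) by subtracting from N the number of members who died or left the family. I suppose I have to link the two data frames DF and DF2 somehow. I have looked at other related questions on this site but could not find the right answer. If anyone has a good idea, that would be great! Thanks in advance. Deni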

+0

'library(dplyr); DF%>%group_by(family)%>%summarize(n() - sum(dead)-sum(left))' –

Answers

2

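Logic: first group_by(family), then compute two numbers for each group: 1) the total number of observations, and 2) that total minus (sum(dead) + sum(left)).

dplyr: n() returns the number of observations in each group.

data.table: .N does the same job.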

library(dplyr) 
DF %>% group_by(family) %>% summarise(total = n(), current = n()-sum(dead,left, na.rm = TRUE)) 
# family total current 
# (fctr) (int) (dbl) 
#1 001 10  6 
#2 002  8  4 
#3 003 15  7 


library(data.table) 
# setDT() converts DF to a data.table by reference if it is a data.frame; if DF is already a data.table, use DF directly.
setDT(DF)[, .(total = .N, current = .N - sum(dead, left, na.rm = TRUE)), by = family] 
# family total current 
#1: 001 10  6 
#2: 002  8  4 
#3: 003 15  7 
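
The question asks for the result as a new column N2 in DF2. A minimal base R sketch of that, assuming dead and left are coded 0/1 as in the example and that no member is flagged in both columns; the helper name events is illustrative and not part of the original post:

```r
# Rebuild the example data from the question
family <- factor(rep(c("001", "002", "003"), c(10, 8, 15)))
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)

# Per-family totals, as in the question
DF2 <- with(DF, data.frame(table(family)))
colnames(DF2)[2] <- "N"

# Deaths plus departures per family; tapply() returns one named value per level
events <- tapply(DF$dead + DF$left, DF$family, sum)

# Look the events up by family label so the result does not depend on row order
DF2$N2 <- DF2$N - as.vector(events[as.character(DF2$family)])
DF2
# N2 is 6, 4, 7 for families 001, 002, 003
```

Indexing events by the family label is what links DF and DF2 here, so no explicit merge is needed.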
+1

Thank you Joel for the two solutions. This is a big step forward for me, thanks – den

+1

Please don't post [code-only answers](http://meta.stackexchange.com/questions/148272/is-there-any-benefit-to-allowing-code-only-answers-while-blocking-code-only-ques); they are not helpful to anyone other than the OP and his/her specific problem –

+0

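This works fine in the example above, but not in my real database, where I have to count particular values of some variables (not only 0 or 1): 'DF %>% group_by(family) %>% summarize(total = n(), current = n() - sum(dead == 1) - sum(left == 1))'. I got the following error message: Error in mutate_impl(.data, dots): wrong result size (3853), expected 33 or 1 ... Any idea how to fix this? Thanks – den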
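
The real data behind this comment is not shown, so the cause of the error cannot be confirmed here. As a hedged illustration of counting specific values rather than summing 0/1 flags, the sketch below uses a hypothetical column status with made-up values "present", "dead" and "left". It is written in base R so it runs without extra packages; the same logical expressions can be placed inside summarise().

```r
# Hypothetical data: `status` and its values are assumptions for illustration
DF_status <- data.frame(
  family = rep(c("001", "002"), c(4, 3)),
  status = c("present", "dead", "left", "present", "left", NA, "present")
)

# TRUE for members who died or left; %in% treats NA as FALSE,
# so a missing status counts as still present
gone <- DF_status$status %in% c("dead", "left")

# Group size and number of remaining members per family
total <- tapply(gone, DF_status$family, length)
current <- total - tapply(gone, DF_status$family, sum)

result <- data.frame(family = names(total),
                     total = as.vector(total),
                     current = as.vector(current))
result
```

Each expression returns exactly one value per family, which is the shape a grouped summary expects.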

2

Here is a base R option:

do.call(data.frame, aggregate(dl~family, transform(DF, dl = dead + left), 
     FUN = function(x) c(total=length(x), current=length(x) - sum(x)))) 

Or a modified version:

transform(aggregate(. ~ family, transform(DF, total = 1, 
    current = dead + left)[c(1,4:5)], FUN = sum), current = total - current) 
#  family total current 
#1 001 10  6 
#2 002  8  4 
#3 003 15  7 
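
All of the approaches above subtract dead + left, which would count a member twice if both flags were ever set on the same row. That does not happen in the example data, but as a cautious variant (an assumption about how such rows should be treated, not something stated in the question), each row can be flagged once with a logical OR; the column name gone is illustrative:

```r
# Rebuild the example data from the question
family <- factor(rep(c("001", "002", "003"), c(10, 8, 15)))
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)

# One flag per member: TRUE if the member died or left (or both)
DF$gone <- DF$dead == 1 | DF$left == 1

# Per family: group size and number of members not flagged
res <- aggregate(gone ~ family, data = DF,
                 FUN = function(x) c(total = length(x), current = sum(!x)))
res <- do.call(data.frame, res)  # flatten the matrix column into two columns
res
```

On the example data this gives the same current values as before (6, 4, 7); the two methods differ only when a row has both flags set.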
0

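I finally found another approach that works well (from another post) and lets me compute everything within the original DF table. It uses the ddply function from the plyr package:

library(plyr)
# Add the family size to every row
DF <- ddply(DF, .(family), transform, total = length(family))
# Add the actual size: total minus members who died or left
DF <- ddply(DF, .(family), transform, actual = length(family) - sum(dead == "1") - sum(left == "1"))
DF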
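
For readers who prefer not to load plyr, the same per-row columns can be sketched in base R with ave(), which computes a group statistic and repeats it for every row of the group. This is an alternative under the same 0/1 coding assumption, not the method used in the post:

```r
# Rebuild the example data from the question
family <- factor(rep(c("001", "002", "003"), c(10, 8, 15)))
dead <- c(0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0)
left <- c(0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0)
DF <- data.frame(family, dead, left)

# Family size repeated on every row of the family
DF$total <- ave(DF$dead, DF$family, FUN = length)

# Family size minus deaths and departures, also repeated per row
DF$actual <- DF$total - ave(DF$dead + DF$left, DF$family, FUN = sum)

head(DF)
```

The data frame keeps all 33 rows, with total and actual constant within each family.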

Many thanks to everyone who helped! Deni