2013-03-05

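Multiple functions in a single tapply or aggregate statement

Is it possible to include two functions within a single tapply or aggregate statement?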

Below I use two tapply statements and two aggregate statements: one for the mean and one for the SD.
I would rather combine the statements.

my.Data = read.table(text = " 
    animal age  sex weight 
     1 adult female  100 
     2 young male  75 
     3 adult male  90 
     4 adult female  95 
     5 young female  80 
", sep = "", header = TRUE) 

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x)})) 
with(my.Data, tapply(weight, list(age, sex), function(x) {sd(x) })) 

with(my.Data, aggregate(weight ~ age + sex, FUN = mean))
with(my.Data, aggregate(weight ~ age + sex, FUN = sd))

# this does not work: 

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x) ; sd(x)})) 

# I would also prefer that the output be formatted something similar to that 
# show below. `aggregate` formats the output perfectly. I just cannot figure 
# out how to implement two functions in one statement. 

  age    sex mean       sd
adult female 97.5 3.535534
adult   male   90       NA
young female 80.0       NA
young   male   75       NA

I can always run two separate statements and merge the output. I was just hoping there might be a more convenient solution.
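For reference, the two-statement fallback could look like the following minimal base-R sketch: one `aggregate()` call per statistic, then `merge()` on the grouping columns. The object names `means`, `sds` and `combined` are illustrative, not from the question.

```r
# Sample data from the question
my.Data <- read.table(text = "
animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80
", header = TRUE)

# One aggregate() call per statistic
means <- aggregate(weight ~ age + sex, data = my.Data, FUN = mean)
sds   <- aggregate(weight ~ age + sex, data = my.Data, FUN = sd)

# Join on the grouping columns; the shared 'weight' column gets a suffix per source
combined <- merge(means, sds, by = c("age", "sex"), suffixes = c(".mean", ".sd"))
print(combined)
```

Groups with a single animal get `NA` for the SD, exactly as with the separate `aggregate(..., FUN = sd)` call.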

I found the answer below, posted here: Apply multiple functions to column using tapply

f <- function(x) c(mean(x), sd(x)) 
do.call(rbind, with(my.Data, tapply(weight, list(age, sex), f))) 

However, neither the rows nor the columns are labeled.

     [,1]     [,2]
[1,] 97.5 3.535534
[2,] 80.0       NA
[3,] 90.0       NA
[4,] 75.0       NA

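I would prefer a solution in base R. A solution using the plyr package is posted at the link above. If I could add the correct row and column headings to the output above, it would be perfect.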
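One way to add the labels in base R, offered as a sketch: `tapply` fills its list-matrix in column-major order, and `do.call(rbind, ...)` stacks the cells in that same order. Naming the `INDEX` list gives the result named dimnames, and `expand.grid()` on those dimnames (first factor varying fastest) rebuilds matching label columns. The names `res`, `stats`, `labels` and `out` are illustrative.

```r
# Sample data from the question
my.Data <- read.table(text = "
animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80
", header = TRUE)

# Return one length-2 vector per group
f <- function(x) c(mean(x), sd(x))

# Naming the INDEX list gives the result named dimnames (age, sex)
res <- with(my.Data, tapply(weight, list(age = age, sex = sex), f))

# Stack the cells into a 4 x 2 matrix; rbind() receives them in
# column-major order (first index varying fastest)
stats <- do.call(rbind, res)
colnames(stats) <- c("mean", "sd")

# expand.grid() also varies its first factor fastest, so its rows line up
labels <- expand.grid(dimnames(res))

out <- cbind(labels, stats)
print(out)
```

This gives the `age`, `sex`, `mean`, `sd` layout from the question; `out[order(out$age, out$sex), ]` should reorder the rows to match the desired output exactly.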

Answers

But these should work:

with(my.Data, aggregate(weight, list(age, sex), function(x) { c(MEAN=mean(x), SD=sd(x))})) 

with(my.Data, tapply(weight, list(age, sex), function(x) { c(mean(x) , sd(x))})) 
# Not a nice structure but the results are in there 

with(my.Data, aggregate(weight ~ age + sex, FUN = function(x) c(SD = sd(x), MN= mean(x)))) 
    age    sex weight.SD weight.MN
1 adult female  3.535534 97.500000
2 young female        NA 80.000000
3 adult   male        NA 90.000000
4 young   male        NA 75.000000
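The `weight.SD` and `weight.MN` headers come from a single matrix column named `weight` inside the returned data frame. If ordinary separate columns are preferred, one common base-R idiom (shown here as a sketch) is to pass the result through `do.call(data.frame, ...)`, which expands a matrix argument into individual columns. The names `agg` and `flat` are illustrative.

```r
# Sample data from the question
my.Data <- read.table(text = "
animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80
", header = TRUE)

agg <- aggregate(weight ~ age + sex, data = my.Data,
                 FUN = function(x) c(SD = sd(x), MN = mean(x)))

# 'weight' is one matrix column with sub-columns SD and MN
str(agg)

# data.frame() splits the matrix into columns named weight.SD and weight.MN
flat <- do.call(data.frame, agg)
str(flat)
```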

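The principle to follow is to make your function return "one thing", which can be a vector or a list, but not two successive function calls: a braced body such as `{mean(x); sd(x)}` returns only the value of its last expression.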
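A quick illustration of that principle, using two hypothetical helpers (`two_calls`, `one_vector`) on the two adult-female weights: the braced version computes `mean(x)` and then discards it, so only `sd(x)` comes back, while the `c()` version returns both values as one named vector.

```r
x <- c(100, 95)  # the two adult-female weights from the question

# Evaluates mean(x), throws it away, and returns only sd(x)
two_calls <- function(x) { mean(x); sd(x) }

# Returns a single named vector holding both statistics
one_vector <- function(x) c(mean = mean(x), sd = sd(x))

two_calls(x)
one_vector(x)
```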

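Thanks! Both aggregate statements work. The tapply statement does not seem to work, but I can use the aggregate approach. – 2013-03-05 03:22:46

Well, I think it "works"; it just doesn't give you something that prints nicely. Try `with(my.Data, tapply(weight, list(age, sex), function(x) {c(mean(x), sd(x))}))[1,1]` to see the indexing inside a list-matrix. – 2013-03-05 03:29:20

I see. Thanks. If I wrap the whole statement in colnames() or rownames(), I can get the labels. – 2013-03-05 03:35:55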
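A minimal sketch of what these comments describe: the `tapply` result is a 2 x 2 list-matrix whose cells each hold a length-2 numeric vector. Single brackets return a sub-list, double brackets return the vector stored in a cell, and `rownames()`/`colnames()` return the group labels. The variable name `res` is illustrative.

```r
# Sample data from the question
my.Data <- read.table(text = "
animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80
", header = TRUE)

res <- with(my.Data, tapply(weight, list(age, sex), function(x) c(mean(x), sd(x))))

dim(res)                  # 2 x 2, one cell per age/sex combination
res[1, 1]                 # a length-1 list wrapping the adult/female cell
res[["adult", "female"]]  # the numeric vector itself: mean, then sd
rownames(res)             # age levels
colnames(res)             # sex levels
```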

If you want to use data.table, it has `with` and `by` built into it:

library(data.table)
myDT <- data.table(my.Data, key = "animal")

# add the per-group mean and sd as new columns of myDT (by reference)
myDT[, c("mean", "sd") := list(mean(weight), sd(weight)), by = list(age, sex)]

# or return one summary row per group
myDT[, list(mean_Aggr = mean(weight), sd_Aggr = sd(weight)), by = list(age, sex)]
     age    sex mean_Aggr   sd_Aggr
1: adult female      96.0 3.6055513
2: young   male      76.5 2.1213203
3: adult   male      91.0 1.4142136
4: young female      84.5 0.7071068

I used a slightly different dataset so that the SD would not have NA values.

reshape lets you pass 2 functions; reshape2 does not.

library(reshape) 
my.Data = read.table(text = " 
    animal age  sex weight 
     1 adult female  100 
     2 young male  75 
     3 adult male  90 
     4 adult female  95 
     5 young female  80 
", sep = "", header = TRUE) 
my.Data$animal <- NULL  # drop the animal id column
(a1 <- melt(my.Data, id.vars = c("age", "sex"), measure.vars = "weight"))
(cast(a1, age + sex ~ variable, c(mean, sd), fill=NA)) 

#     age    sex weight_mean weight_sd
# 1 adult female        97.5  3.535534
# 2 adult   male        90.0        NA
# 3 young female        80.0        NA
# 4 young   male        75.0        NA

I owe this to @Ramnath, who pointed this out yesterday.

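In the spirit of sharing, if you are familiar with SQL, you might also consider the "sqldf" package. (Emphasis on "familiar", because you need to know, for example, that R's `mean` is `avg` in SQL to get the result you want.) Note also that, in the output below, `stdev` reports 0.000000 rather than NA for the single-observation groups.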

library(sqldf)
sqldf("select age, sex,
     avg(weight) `Wt.Mean`,
     stdev(weight) `Wt.SD`
     from `my.Data`
     group by age, sex")
    age    sex Wt.Mean    Wt.SD
1 adult female    97.5 3.535534
2 adult   male    90.0 0.000000
3 young female    80.0 0.000000
4 young   male    75.0 0.000000