2013-04-24 430 views
1

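How to plot weighted means on a boxplot

After quite a long time searching for a solution and tinkering, I am still trying to display weighted means on a boxplot. (I thought I had submitted this query to the ggplot2 mailing list, but that was more than 4 hours ago and my question has not appeared, so I worry that I made a mistake in my post. I am posting here because my question is rather urgent.)

I provide a toy example below.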

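# packages used below: Hmisc (wtd.mean), plyr (ddply), ggplot2 (plots)
library(Hmisc)
library(plyr)
library(ggplot2)

#data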

value <- c(5, 7, 8, 6, 7, 9, 10, 6, 7, 10) 
category <- c("one", "one", "one", "two", "two", "two", 
       "three", "three", "three","three") 
weight <- c(1, 1.2, 2, 3, 2.2, 2.5, 1.8, 1.9, 2.2, 1.5) 
df <- data.frame(value, category, weight) 

#unweighted means by category 
ddply(df, .(category), summarize, mean=round(mean(value, na.rm=TRUE), 2)) 

    category mean 
1  one 6.67 
2 three 8.25 
3  two 7.33 

#weighted means by category 
ddply(df, .(category), summarize, 
      wmean=round(wtd.mean(value, weight, na.rm=TRUE), 2)) 

    category wmean 
1  one 7.00 
2 three 8.08 
3  two 7.26 

#unweighted means added to boxplot (which works fine) 
ggplot(df, aes(x = category, y = value, weight = weight)) + 
    geom_boxplot(width=0.6, colour = I("#3366FF")) + 
    stat_summary(fun.y ="mean", geom ="point", shape = 23, 
       size = 3, fill ="white") 

My question is: how can I display the weighted means on the boxplot instead of the unweighted means?

Answer

4

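You can save the weighted means in a new data frame and then use that data frame in geom_point(). The argument inherit.aes=FALSE ensures that the points do not inherit the aesthetics supplied in the ggplot() call.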

library(Hmisc) 
library(plyr) 
library(ggplot2) 
df.wm<-ddply(df, .(category), summarize, 
      wmean=round(wtd.mean(value, weight, na.rm=TRUE), 2)) 

ggplot(df, aes(x = category, y = value, weight = weight)) + 
    geom_boxplot(width=0.6, colour = I("#3366FF")) + 
    geom_point(data=df.wm,aes(x=category,y=wmean),shape = 23, 
      size = 3, fill ="white",inherit.aes=FALSE) 
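
As a variation on the approach above, here is a minimal sketch that builds the same summary with base R only, using stats::weighted.mean() in place of Hmisc's wtd.mean() and split()/sapply() in place of ddply(). The name df.wm.base is hypothetical and chosen here only for illustration; the data are the toy values from the question.

```r
library(ggplot2)

# Toy data from the question
df <- data.frame(
  value    = c(5, 7, 8, 6, 7, 9, 10, 6, 7, 10),
  category = c("one", "one", "one", "two", "two", "two",
               "three", "three", "three", "three"),
  weight   = c(1, 1.2, 2, 3, 2.2, 2.5, 1.8, 1.9, 2.2, 1.5)
)

# Weighted mean of value within each category, via stats::weighted.mean()
wm <- sapply(split(df, df$category),
             function(d) weighted.mean(d$value, d$weight, na.rm = TRUE))

# One row per category (df.wm.base is a hypothetical name for this sketch)
df.wm.base <- data.frame(category = names(wm),
                         wmean    = round(as.numeric(wm), 2))

# Boxplot with the weighted means drawn as diamonds from the summary data
p <- ggplot(df, aes(x = category, y = value, weight = weight)) +
  geom_boxplot(width = 0.6, colour = "#3366FF") +
  geom_point(data = df.wm.base, aes(x = category, y = wmean),
             shape = 23, size = 3, fill = "white", inherit.aes = FALSE)
print(p)
```

This version needs only ggplot2 to be installed, which may be convenient if Hmisc or plyr are not available; the plotting call itself is unchanged apart from the data frame name.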


+1

That is just what the doctor ordered. Thank you very much! This is very helpful. – user2317662 2013-04-25 05:57:38

+0

For some reason I got an error with this code, but the code in [this question](http://stackoverflow.com/questions/3277326/group-by-in-r-ddply-with-weighted-mean) works. – Tom 2013-09-27 07:47:20

+0

@Tom This code still runs without any errors for me. – 2013-09-27 07:55:44