2017-09-29

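R: Smoothing binned data in a ggplot2 density plot

The data describe the distribution of commodities (apples and bananas) on trees along a road more than 4000 m long between two villages, Villariba and Villabajo. The data are already binned (i.e. one summary is given roughly every 500 m), or come with a large location error, so 500 m binning is natural. We want to kernel-smooth them and plot them as smoothed a posteriori distributions. There are two obvious ways to do this in the ggplot2 package. First, read the data (long format).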

library(ggplot2)
databas <- read.csv(text = "dist,stuff,val
500,apples,10
1250,apples,25
1750,apples,55
2250,apples,45
2750,apples,25
3250,apples,10
3750,apples,5
500,bananas,7
1250,bananas,14
1750,bananas,20
2250,bananas,17
2750,bananas,10
3250,bananas,30
3750,bananas,20")

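The first attempt is a boring bar plot with geom_col(). Next, we can use two tools included in ggplot2: a density plot (geom_density()) and a smoothed curve (stat_smooth() or the equivalent geom_smooth()). The three approaches are implemented as follows: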

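# Bar plot of the raw binned values (alpha is set on the geom, not inside aes())
p1 <- ggplot(databas, aes(dist, val, fill = stuff)) +
    geom_col(alpha = 0.5, position = "dodge")
# GAM-smoothed curves; se = FALSE suppresses the confidence band
p2 <- ggplot(databas, aes(dist, val, fill = stuff)) +
    stat_smooth(method = "gam", se = FALSE, formula = y ~ s(x, k = 7))
# Density geom drawn directly from the binned values
p3 <- ggplot(databas, aes(dist, val, fill = stuff)) +
    geom_density(stat = "identity", alpha = 0.5)

library(gridExtra)
grid.arrange(p1, p2, p3, nrow = 3)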

[Figure: three plots with density smoothing in ggplot2]

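Each method has its drawbacks. The overlaid density plot (bottom panel) is the design I want most, but the option stat="identity" (needed because the data are binned) prevents geom_density() from producing the nicely smoothed distributions it usually does. stat_smooth() gives almost perfect curves, but they are only curves. So how can the coloring of the density plot be combined with the smoothing of the smoothing function? In other words, is it possible either to smooth the data inside geom_density(), or to fill the space under the stat_smooth() curves with a semi-transparent color?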

Answers

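If you like your gam fit, you can draw the curves using geom_ribbon with stat = "smooth". The trick is to set ymin to 0 and ymax to ..y.., a special variable created by stat_smooth that holds the predicted line. (In ggplot2 3.3.0 and later the same variable can be written as after_stat(y).)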

ggplot(databas, aes(x = dist, y = val, fill = stuff)) + 
    geom_ribbon(stat = "smooth", aes(ymin = 0, ymax = ..y..), alpha = .5, 
       method = "gam", se=FALSE, formula = y ~ s(x, k = 7)) 
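For readers on a recent ggplot2, the same ribbon trick can be sketched with after_stat(y) in place of ..y... This is only an illustration and assumes ggplot2 3.3.0 or later; the object name p_ribbon is made up for the example, and databas is rebuilt with the values from the question so that the snippet runs on its own.

```r
library(ggplot2)

# Same values as `databas` in the question, rebuilt so the snippet is self-contained.
databas <- data.frame(
  dist  = rep(c(500, 1250, 1750, 2250, 2750, 3250, 3750), times = 2),
  stuff = rep(c("apples", "bananas"), each = 7),
  val   = c(10, 25, 55, 45, 25, 10, 5, 7, 14, 20, 17, 10, 30, 20)
)

# after_stat(y) is the newer spelling of ..y.. (assumption: ggplot2 >= 3.3.0).
# ymin = 0 pins the lower edge of the ribbon to the x axis;
# ymax follows the smoothed values computed by the "smooth" stat.
p_ribbon <- ggplot(databas, aes(x = dist, y = val, fill = stuff)) +
  geom_ribbon(stat = "smooth", aes(ymin = 0, ymax = after_stat(y)), alpha = 0.5,
              method = "gam", se = FALSE, formula = y ~ s(x, k = 7))

p_ribbon
```

Because the fill is mapped to stuff in a single layer, the legend and colors come for free and no per-group layers are needed.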

Here is another approach:

library(ggplot2) 
p2 <- ggplot(databas, aes(dist, val, fill = stuff)) +
    stat_smooth(aes(y = val, x = dist), method = "gam", se = FALSE, formula = y ~ s(x, k = 7))

Extract the curves with ggplot_build:

p2_build <- ggplot_build(p2)
# data.frame() is used because data_frame() comes from tibble/dplyr, which are not loaded here
p2_fill <- data.frame(
    x = p2_build$data[[1]]$x,
    y = p2_build$data[[1]]$y,
    group = factor(p2_build$data[[1]]$group, levels = c(1, 2), labels = c("apples", "bananas")))

Add the fill with geom_area:

p2 + geom_area(data = p2_fill[p2_fill$group == "apples", ], 
        aes(x=x, y=y), fill = "red", alpha = 0.2)+ 
    geom_area(data = p2_fill[p2_fill$group == "bananas", ], 
      aes(x=x, y=y), fill = "blue", alpha = 0.2) 

The complete answer:

ggplot(databas, aes(dist, val, color = stuff)) +
    stat_smooth(aes(y = val, x = dist), method = "gam", se = FALSE, formula = y ~ s(x, k = 7)) +
    # inherit.aes = FALSE keeps the color = stuff mapping out of the area layers
    geom_area(data = p2_fill[p2_fill$group == "apples", ],
      aes(x = x, y = y), fill = "#F8766D", alpha = 0.2, inherit.aes = FALSE) +
    geom_area(data = p2_fill[p2_fill$group == "bananas", ],
      aes(x = x, y = y), fill = "#00BFC4", alpha = 0.2, inherit.aes = FALSE) +
    theme_classic()
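As a possible variation on the steps above, the smoothed values could also be computed outside ggplot2 instead of being pulled out with ggplot_build. The sketch below fits one GAM per commodity with mgcv::gam and predicts on a grid. It is not the method used in the answer: the names grid, curves and p_direct are made up for the example, and the curves may differ slightly from those drawn by stat_smooth, which may pass its own fitting options to gam.

```r
library(ggplot2)
library(mgcv)

# Same values as `databas` in the question, rebuilt so the snippet is self-contained.
databas <- data.frame(
  dist  = rep(c(500, 1250, 1750, 2250, 2750, 3250, 3750), times = 2),
  stuff = rep(c("apples", "bananas"), each = 7),
  val   = c(10, 25, 55, 45, 25, 10, 5, 7, 14, 20, 17, 10, 30, 20)
)

# Fine grid of distances on which the fitted curves are evaluated.
grid <- seq(min(databas$dist), max(databas$dist), length.out = 100)

# Fit one GAM per commodity and collect the predictions in long format.
curves <- do.call(rbind, lapply(split(databas, databas$stuff), function(d) {
  fit <- gam(val ~ s(dist, k = 7), data = d)
  data.frame(
    dist  = grid,
    stuff = d$stuff[1],
    fit   = as.numeric(predict(fit, newdata = data.frame(dist = grid)))
  )
}))

# position = "identity" overlays the two areas; geom_area stacks them by default.
p_direct <- ggplot(curves, aes(dist, fit)) +
  geom_area(aes(fill = stuff), position = "identity", alpha = 0.2) +
  geom_line(aes(colour = stuff)) +
  theme_classic()

p_direct
```

Keeping the predictions in one long data frame means the fill can be mapped to stuff in a single layer, so the colors do not have to be hard-coded per group.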
