This question concerns: Create custom geom to compute summary statistics and display them *outside* the plotting region; ggplot2: add sample size information to x-axis tick labels


stripchart_w_n_ver1 <- function(data, x.var, y.var) {
    x <- factor(data[, x.var])
    y <- data[, y.var]
# Need to call plot.default() instead of plot because
# plot() produces boxplots when x is a factor.
    plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
    levels.x <- levels(x)
    x.ticks <- seq_along(levels.x)
    axis(1, at = x.ticks, labels = levels.x)
    # Count observations per level and write "N=..." on margin line 2, below the tick labels
    n <- sapply(split(y, x), length)
    mtext(paste0("N=", n), side = 1, line = 2, at = x.ticks)
}

stripchart_w_n_ver1(mtcars, "cyl", "mpg") 
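Note that `sapply(split(y, x), length)` counts every row in each group, including rows where y is NA. If missing values should not count toward N, a minimal base R sketch (the helper name `count_non_missing` is hypothetical) is:

```r
# Hypothetical helper: number of non-missing y-values for each level of x
count_non_missing <- function(y, x) {
  x <- factor(x)
  # tapply() applies the function to y within each level of x
  tapply(y, x, function(v) sum(!is.na(v)))
}

y <- c(1, NA, 3, 4, 5)
x <- c("a", "a", "b", "b", "b")
sapply(split(y, factor(x)), length)  # counts the NA row: a = 2, b = 3
count_non_missing(y, x)              # ignores the NA row: a = 1, b = 3
```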


stripchart_w_n_ver2 <- function(data, x.var, y.var) {
    x <- factor(data[, x.var])
    y <- data[, y.var]
# Need to set the second element of mgp to 1.5
# to allow room for two lines for the x-axis tick labels.
    o.par <- par(mgp = c(3, 1.5, 0))
    # Restore the previous graphics parameters when the function exits
    on.exit(par(o.par))
# Need to call plot.default() instead of plot because
# plot() produces boxplots when x is a factor.
    plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
    n <- sapply(split(y, x), length)
    levels.x <- levels(x)
    # Two-line tick labels: the level on the first line, "N=..." on the second
    axis(1, at = seq_along(levels.x), labels = paste0(levels.x, "\nN=", n))
}

stripchart_w_n_ver2(mtcars, "cyl", "mpg") 
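The tick labels passed to axis() in the second version are plain strings with an embedded newline. A small sketch (the helper `n_tick_labels` is hypothetical) that isolates that step so it can be checked without drawing anything:

```r
# Hypothetical helper: "level\nN=count" tick labels, built as in stripchart_w_n_ver2()
n_tick_labels <- function(x, y) {
  x <- factor(x)
  n <- sapply(split(y, x), length)
  paste0(levels(x), "\nN=", n)
}

tick_labels <- n_tick_labels(mtcars$cyl, mtcars$mpg)
tick_labels  # "4\nN=11" "6\nN=7" "8\nN=14"
```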

Example using axis()

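While this is easy to do in base R, it is maddeningly complicated in ggplot2, because it is hard to get at the data used to produce the plot. Although there are functions equivalent to axis() (e.g., scale_x_discrete()), there is no equivalent of mtext() that lets you easily place text at specified coordinates in the margins.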

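I tried using the built-in functions of stat_summary() to compute the sample sizes (i.e., fun.y = "length") and then put that information on the x-axis tick labels. As far as I can tell, however, you cannot extract the sample sizes and then add them to the x-axis tick labels with scale_x_discrete(). Instead, you have to tell stat_summary() which geom to use. You can set geom = "text", but then you must supply the labels, and the whole point is that the labels should be the sample sizes: the values stat_summary() is computing, which you cannot get at. (You would also have to specify where to place the text, and it is hard to work out where to put it so that it sits directly below the x-axis tick labels.)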
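For comparison, one workaround often used with stat_summary() is to pass a `fun.data` function that returns both a y position and a `label` column, so that geom = "text" receives the count directly. This is a sketch assuming a ggplot2 version whose stat_summary() accepts `fun.data` returning a data frame; the helper name `give_n` is hypothetical. It still places the label inside the panel (at the group minimum), not under the tick labels, so it does not solve the positioning problem described above.

```r
# Hypothetical summary function: label position (group minimum) plus sample size
give_n <- function(y) {
  y <- y[!is.na(y)]
  data.frame(y = min(y), label = paste0("n=", length(y)),
             stringsAsFactors = FALSE)
}

give_n(c(3, 1, NA, 2))  # y = 1, label = "n=3"

# Only build the plot if ggplot2 is installed; print(p_n) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p_n <- ggplot2::ggplot(mtcars, ggplot2::aes(x = factor(cyl), y = mpg)) +
    ggplot2::geom_point() +
    ggplot2::stat_summary(fun.data = give_n, geom = "text", vjust = 1.5)
}
```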


library(ggplot2)

StatN <- ggproto("StatN", Stat,
    required_aes = c("x", "y"),
    compute_group = function(data, scales) {
        # Drop missing y-values, then place an "n=..." label at the group's minimum y
        y <- data$y
        y <- y[!is.na(y)]
        n <- length(y)
        data.frame(x = data$x[1], y = min(y), label = paste0("n=", n))
    }
)

stat_n <- function(mapping = NULL, data = NULL, geom = "text",
    position = "identity", inherit.aes = TRUE, show.legend = NA,
    na.rm = FALSE, ...) {
    ggplot2::layer(stat = StatN, mapping = mapping, data = data, geom = geom,
        position = position, inherit.aes = inherit.aes, show.legend = show.legend,
        params = list(na.rm = na.rm, ...))
}

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_point() + stat_n() 
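The per-group computation in StatN's compute_group() can be exercised without ggplot2 by applying the same body to each group with split(). This is only a check of the arithmetic, assuming each group arrives as a data frame with `x` and `y` columns; inside ggplot2 the Stat machinery does the splitting. The function name `n_label_group` is hypothetical.

```r
# Same logic as StatN$compute_group, as a plain function (hypothetical name)
n_label_group <- function(data) {
  y <- data$y
  y <- y[!is.na(y)]
  data.frame(x = data$x[1], y = min(y), label = paste0("n=", length(y)),
             stringsAsFactors = FALSE)
}

df <- data.frame(x = factor(mtcars$cyl), y = mtcars$mpg)
# One row per level of x: the position (minimum y) and the "n=..." label
res <- do.call(rbind, lapply(split(df, df$x), n_label_group))
res
```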

ggstripchart <- function(data, x.name, y.name,
    point.params = list(),
    x.axis.params = list(labels = levels(x)),
    y.axis.params = list(), ...) {
    # The default for x.axis.params refers to x, which is only created below;
    # this works because R evaluates default arguments lazily.
    if (!is.factor(data[, x.name]))
        data[, x.name] <- factor(data[, x.name])
    x <- data[, x.name]
    y <- data[, y.name]
    params <- list(...)
    point.params <- modifyList(params, point.params)
    x.axis.params <- modifyList(params, x.axis.params)
    y.axis.params <- modifyList(params, y.axis.params)

    point <- do.call("geom_point", point.params)

    stripchart.list <- list(
        point,
        theme(legend.position = "none")
    )

    # Append "N=<count>" under each x-axis tick label (counts are over the whole data set)
    n <- sapply(split(y, x), length)
    x.axis.params$labels <- paste0(x.axis.params$labels, "\nN=", n)
    x.axis <- do.call("scale_x_discrete", x.axis.params)
    y.axis <- do.call("scale_y_continuous", y.axis.params)
    stripchart.list <- c(stripchart.list, list(x.axis, y.axis))

    ggplot(data = data, mapping = aes_string(x = x.name, y = y.name)) + stripchart.list
}

ggstripchart(mtcars, "cyl", "mpg") 

ggstripchart(mtcars, "cyl", "mpg") + facet_wrap(~am) 
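Because ggstripchart() computes n from the whole data set and facet_wrap() uses one shared x scale by default, the tick labels in the faceted version still show the overall counts rather than the counts within each panel. A quick base R check of how the per-panel counts differ:

```r
# Rows: transmission (am), columns: number of cylinders
tab <- table(am = mtcars$am, cyl = mtcars$cyl)
tab            # counts within each facet panel
colSums(tab)   # overall counts, which are what the tick labels show
```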


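I have updated the EnvStats package to include a stat called stat_n_text, which adds the sample size (the number of unique y-values) below each unique x-value. See the help file for stat_n_text for more information and a list of examples. Below is a simple example: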


library(ggplot2)
library(EnvStats)

p <- ggplot(mtcars,
    aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
    theme(legend.position = "none")

p + geom_point() + 
    stat_n_text() + 
    labs(x = "Number of Cylinders", y = "Miles per Gallon") 
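The description above defines the sample size as the number of unique y-values. When the y variable contains ties, that number differs from the number of observations. A base R sketch that computes both per level of x (the variable names are illustrative only, and this does not reproduce EnvStats internals):

```r
x <- factor(mtcars$cyl)
y <- mtcars$mpg
# Number of non-missing observations per level
n_obs <- tapply(y, x, function(v) sum(!is.na(v)))
# Number of distinct non-missing values per level (ties counted once)
n_unique <- tapply(y, x, function(v) length(unique(v[!is.na(v)])))
rbind(n_obs, n_unique)
```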

library(ggplot2)
library(dplyr)
library(grid)

pgg = function(dat, x, y, facet=NULL, nudge=0.17) {

    # Convert x-variable to a factor
    dat[,x] = as.factor(dat[,x])

    # Plot points
    p = ggplot(dat, aes_string(x, y)) +
        geom_point(position=position_jitter(width=0.3, height=0)) + theme_bw()

    # Summarise data to get counts by x-variable and (if present) facet variables
    nn = dat %>% group_by(across(all_of(c(facet, x)))) %>% tally()

    # If there are facets, add them to the plot
    if (!is.null(facet)) {
        p = p + facet_grid(as.formula(paste("~", paste(facet, collapse="+"))))
    }

    # Add counts as text labels, placed below the lowest data value
    p = p + geom_text(data=nn, aes(label=paste0("N = ", n)),
        y=min(dat[,y]) - nudge*1.05*diff(range(dat[,y])),
        colour="grey20", size=3.5)

    # Turn off clipping for every panel so the labels outside it are drawn, then draw
    p <- ggplot_gtable(ggplot_build(p))
    p$layout$clip[grepl("^panel", p$layout$name)] <- "off"
    grid.newpage()
    grid.draw(p)
    invisible(p)
}


pgg(mtcars, "cyl", "mpg") 
pgg(mtcars, "cyl", "mpg", facet=c("am","vs")) 
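group_by_() and its `.dots` argument are deprecated in current dplyr. If you would rather not depend on dplyr for the counting step, here is a base R sketch that produces the same kind of count table, one row per observed combination with an `n` column (the helper `count_by` is hypothetical):

```r
count_by <- function(dat, vars) {
  # Cross-tabulate the grouping columns, then flatten to a data frame
  counts <- as.data.frame(table(dat[vars]), stringsAsFactors = FALSE)
  # table() also lists empty combinations; keep only those that occur
  counts <- counts[counts$Freq > 0, , drop = FALSE]
  names(counts)[names(counts) == "Freq"] <- "n"
  rownames(counts) <- NULL
  counts
}

nn <- count_by(mtcars, c("am", "vs", "cyl"))
nn
```

The grouping columns come back as character vectors rather than numbers. Separately, ggplot2 3.0.0 and later accept coord_cartesian(clip = "off"), which turns off panel clipping without editing the gtable layout.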

pgg = function(dat, x, y, facet_r=NULL, facet_c=NULL) {

    # Convert x-variable to a factor
    dat[,x] = as.factor(dat[,x])

    # Plot points
    p = ggplot(dat, aes_string(x, y)) +
        geom_point(position=position_jitter(width=0.3, height=0)) + theme_bw()

    # Summarise data to get counts by x-variable and (if present) facet variables
    nn = dat %>% group_by(across(all_of(c(facet_r, facet_c, x)))) %>% tally()

    # If there are facets, add them to the plot
    if (!is.null(facet_r) || !is.null(facet_c)) {
        # Use "." for a missing row or column facet, e.g. ". ~ am"
        facets = paste(if (is.null(facet_r)) "." else facet_r, "~",
                       if (is.null(facet_c)) "." else facet_c)
        p = p + facet_grid(as.formula(facets))
    }

    # Add counts as text labels below the data, and lower the y-axis limit to leave room for them
    p + geom_text(data=nn, aes(label=paste0("N = ", n)),
        y=min(dat[,y]) - 0.15*min(dat[,y]), colour="grey20", size=3) +
        scale_y_continuous(limits=range(dat[,y]) + c(-0.1*min(dat[,y]), 0.01*max(dat[,y])))
}

pgg(mtcars, "cyl", "mpg") 
pgg(mtcars, "cyl", "mpg", facet_c="am") 
pgg(mtcars, "cyl", "mpg", facet_c="am", facet_r="vs") 
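In this version the labels sit at 0.85 times the minimum y, which is below the lower limit set in scale_y_continuous(). They still fall inside the visible range because ggplot2 pads continuous scales, by default by 5% of the range on each side. The arithmetic for mtcars$mpg, assuming that default padding:

```r
y <- mtcars$mpg
label_y <- min(y) - 0.15 * min(y)                     # 8.84
limits <- range(y) + c(-0.1 * min(y), 0.01 * max(y))  # 9.36, 34.239
# Assumed default expansion of a continuous scale: 5% of the range per side
expanded_lower <- limits[1] - 0.05 * diff(limits)     # about 8.116
c(label = label_y, limit = limits[1], visible_from = expanded_lower)
```

Because the offsets scale with min(y) rather than with the range, the same reasoning does not carry over to data whose minimum is near zero or negative.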

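Many thanks, Ben, for your help! After posting my question, I figured out how to place the sample sizes at the bottom of the plotting panel following your second suggestion. I have almost finished the new stat functions and geoms that will do what I want, and I will incorporate them into the next version of my EnvStats package (I'll post here when I do). Thanks again for your help and suggestions!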