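How to visualize the difference between probability distribution functions?
I am trying to visualize the difference between two histograms of distribution functions, such as the difference between the following two curves:
When the difference is large, you can simply plot the two curves on top of each other and fill the area between them, as shown above, but this becomes cumbersome when the difference is very small. Another way to plot this is to plot the difference itself, as follows:
However, such a graph seems very hard to read for anyone seeing it for the first time, so I was wondering: what other ways are there to visualize the difference between two distribution functions?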
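I thought it might be an option to simply combine your two proposals, while scaling up the difference to make it visible.
What follows is an attempt to do this with ggplot2. It turned out to be a bit more involved than I initially thought, and I am definitely not fully satisfied with the result, but maybe it helps anyway. Comments and improvements are very welcome.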
library(ggplot2)
library(dplyr)

## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample = 1, x = x1), data.frame(sample = 2, x = x2)) %>%
  mutate(sample = as.factor(sample))

## Calculate density estimates
g1 <- ggplot(df, aes(x = x, group = sample, colour = sample)) +
  geom_density(data = df) + xlim(0, 10)
gg1 <- ggplot_build(g1)

## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
x <- gg1$data[[1]]$x[gg1$data[[1]]$group == 1]
y1 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 1]
y2 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2),
                  side = (y1 < y2), ydiff = y2 - y1)

## Ribbon between the two curves, plus the difference scaled up by 5
g2 <- ggplot(df2) +
  geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side, alpha = 0.5)) +
  geom_line(aes(x = x, y = 5 * abs(ydiff), colour = side)) +
  geom_area(aes(x = x, y = 5 * abs(ydiff), fill = side, alpha = 0.4))

## Overlay the density curves and tidy up legends and colors
g3 <- g2 +
  geom_density(data = df, size = 1, aes(x = x, group = sample, colour = sample)) +
  xlim(0, 10) +
  guides(alpha = "none", colour = "none") +
  ylab("Curves: density\n Shaded area: 5 * difference of densities") +
  scale_fill_manual(name = "samples", labels = 1:2, values = gg_color_hue(2)) +
  scale_colour_manual(limits = list(1, 2, FALSE, TRUE), values = rep(gg_color_hue(2), 2))
print(g3)
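As a side note, the pointwise differences can also be computed without extracting them from ggplot_build(). The following is only a minimal sketch, assuming base R's density() evaluated on a shared grid; the names grid_from, grid_to, grid_n and dens_diff are made up for illustration. Unlike xlim(0, 10) above, which drops observations outside the range before estimating, density() here uses all observations, so the numbers will not match the plot exactly.

```r
## Sketch only: compute both density estimates on one shared grid
## with base R, instead of pulling them out of ggplot_build().
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)

## Hypothetical grid settings (chosen to mirror xlim(0, 10) above)
grid_from <- 0
grid_to <- 10
grid_n <- 512

## Same from/to/n for both samples, so both are evaluated at the same x
d1 <- density(x1, from = grid_from, to = grid_to, n = grid_n)
d2 <- density(x2, from = grid_from, to = grid_to, n = grid_n)

## Guard against accidentally mismatched grids
stopifnot(isTRUE(all.equal(d1$x, d2$x)))

## One row per grid point: both densities and their signed difference
dens_diff <- data.frame(x = d1$x, y1 = d1$y, y2 = d2$y,
                        ydiff = d2$y - d1$y)
head(dens_diff)
```

The resulting data frame has the same role as df2 above and could be passed to geom_ribbon() or geom_line() in the same way.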
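As suggested by @Gregor in the comments, here is a version that draws two separate plots, one below the other, sharing the same x-axis scaling. At the very least, the legends obviously still need tweaking.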
library(ggplot2)
library(dplyr)
library(grid)

## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample = 1, x = x1), data.frame(sample = 2, x = x2)) %>%
  mutate(sample = as.factor(sample))

## Calculate density estimates
g1 <- ggplot(df, aes(x = x, group = sample, colour = sample)) +
  geom_density(data = df) + xlim(0, 10)
gg1 <- ggplot_build(g1)

## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
x <- gg1$data[[1]]$x[gg1$data[[1]]$group == 1]
y1 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 1]
y2 <- gg1$data[[1]]$y[gg1$data[[1]]$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2),
                  side = (y1 < y2), ydiff = y2 - y1)

## Upper plot: the two density curves with the area between them filled
g2 <- ggplot(df2) +
  geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side, alpha = 0.5)) +
  geom_density(data = df, size = 1, aes(x = x, group = sample, colour = sample)) +
  xlim(0, 10) +
  guides(alpha = "none", fill = "none")

## Lower plot: absolute difference of the two densities
g3 <- ggplot(df2) +
  geom_line(aes(x = x, y = abs(ydiff), colour = side)) +
  geom_area(aes(x = x, y = abs(ydiff), fill = side, alpha = 0.4)) +
  guides(alpha = "none", fill = "none")

## Stack the two plots with aligned x axes
## See [3]
grid.draw(rbind(ggplotGrob(g2), ggplotGrob(g3), size = "last"))
...or with abs(ydiff) replaced by ydiff in the construction of the second plot, so that the signed difference is shown.
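For illustration, a signed-difference panel could look roughly like the sketch below. This is not the code from the answer: it assumes the densities are estimated with base R's density() on a shared grid, and the names sd_df and p_signed as well as the fill colors are arbitrary choices made here.

```r
library(ggplot2)

## Sketch only: signed difference of two density estimates.
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)

## Evaluate both densities at the same 512 points on [0, 10]
d1 <- density(x1, from = 0, to = 10, n = 512)
d2 <- density(x2, from = 0, to = 10, n = 512)
sd_df <- data.frame(x = d1$x, ydiff = d2$y - d1$y)

## Positive and negative parts are drawn as separate ribbons so that
## each sign gets its own color on either side of the zero line.
p_signed <- ggplot(sd_df, aes(x = x)) +
  geom_ribbon(aes(ymin = 0, ymax = pmax(ydiff, 0)),
              fill = "steelblue", alpha = 0.4) +
  geom_ribbon(aes(ymin = pmin(ydiff, 0), ymax = 0),
              fill = "firebrick", alpha = 0.4) +
  geom_line(aes(y = ydiff)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  ylab("density of sample 2 minus density of sample 1")
print(p_signed)
```

Keeping the sign shows which sample has the higher density at a given x, at the cost of a plot that extends below zero.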
Source: SO answer 3
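I think this is an interesting question, but it is too open-ended and opinion-based for SO. (And it is also not really about programming.) Perhaps it would be on-topic at Cross Validated? – Gregor 2015-03-31 21:53:08
Just to make sure we are talking about the same thing: you want to visualize probability density functions by considering histograms of realizations of said probability distribution, right? Because cumulative distribution functions are something quite different... – jhin 2015-03-31 22:34:59
An example data set would be nice. – jhin 2015-03-31 22:35:09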