2016-06-07

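Can you change the order of factors in a faceted bar chart in ggplot2?

I am trying to make a chart showing the proportion of men and women in different age groups who have kids under 18. I want two bars side by side for each age group (one for men, one for women), and I want each bar to be stacked so that the proportion with kids is at the bottom and the proportion without is on top. I can't figure out how to make such a chart in ggplot2 and would appreciate suggestions.

I calculated my grouped statistics using dplyr: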

library(dplyr)

kid18summary <- marsub %>%
    group_by(AgeGroup, sex, kid_under_18) %>%
    summarise(n = n()) %>%
    mutate(freq = n / sum(n))
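The `freq` column sums to 1 within each age group and sex because `summarise()` drops the last grouping variable (`kid_under_18`), so the following `mutate()` divides within `AgeGroup` and `sex`. A minimal sketch on made-up data, assuming dplyr 1.0 or later for the `.groups` argument; the `toy` data frame is hypothetical and only stands in for `marsub`:

```r
library(dplyr)

# Hypothetical stand-in for marsub: one row per respondent
toy <- data.frame(
  AgeGroup     = c("Age<40", "Age<40", "Age<40", "Age<40", "Age41-49", "Age41-49"),
  sex          = c("Male", "Male", "Male", "Female", "Female", "Female"),
  kid_under_18 = c("Yes", "No", "No", "Yes", "Yes", "No")
)

toy_summary <- toy %>%
  group_by(AgeGroup, sex, kid_under_18) %>%
  # "drop_last" keeps AgeGroup and sex as groups, which is also the default
  summarise(n = n(), .groups = "drop_last") %>%
  # sum(n) is therefore taken within each AgeGroup/sex combination
  mutate(freq = n / sum(n)) %>%
  ungroup()

toy_summary
```

Making `.groups = "drop_last"` explicit only documents the default behaviour; it does not change the result.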

which produces this:

dput(kid18summary) 
structure(list(AgeGroup = c("Age<40", "Age<40", "Age<40", "Age<40", 
"Age41-49", "Age41-49", "Age41-49", "Age41-49", "Age50-64", "Age50-64", 
"Age50-64", "Age50-64"), sex = structure(c(1L, 1L, 2L, 2L, 1L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("Male", "Female"), class = "factor"), 
    kid_under_18 = c("No", "Yes", "No", "Yes", "No", "Yes", "No", 
    "Yes", "No", "Yes", "No", "Yes"), freq = c(0.625, 0.375, 
    0.636833046471601, 0.363166953528399, 0.349557522123894, 
    0.650442477876106, 0.444897959183673, 0.555102040816327, 
    0.724852071005917, 0.275147928994083, 0.819548872180451, 
    0.180451127819549)), .Names = c("AgeGroup", "sex", "kid_under_18", 
"freq"), class = c("grouped_df", "tbl_df", "tbl", "data.frame" 
), row.names = c(NA, -12L), vars = list(AgeGroup, sex), drop = TRUE, indices = list(
    0:1, 2:3, 4:5, 6:7, 8:9, 10:11), group_sizes = c(2L, 2L, 
2L, 2L, 2L, 2L), biggest_group_size = 2L, labels = structure(list(
    AgeGroup = c("Age<40", "Age<40", "Age41-49", "Age41-49", 
    "Age50-64", "Age50-64"), sex = structure(c(1L, 2L, 1L, 2L, 
    1L, 2L), .Label = c("Male", "Female"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L), vars = list(AgeGroup, sex), drop = TRUE, .Names = c("AgeGroup", 
"sex"))) 
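The `dput()` output above comes from an older dplyr `grouped_df` and may not read back cleanly in a current R session (for instance, `vars = list(AgeGroup, sex)` refers to bare symbols). A minimal sketch that rebuilds the same twelve rows as a plain data frame; the name `kid18` is hypothetical and not from the original post:

```r
# Plain data frame with the same values as the dput() output above
kid18 <- data.frame(
  AgeGroup = rep(c("Age<40", "Age41-49", "Age50-64"), each = 4),
  sex = factor(rep(rep(c("Male", "Female"), each = 2), times = 3),
               levels = c("Male", "Female")),
  kid_under_18 = rep(c("No", "Yes"), times = 6),
  freq = c(0.625, 0.375,
           0.636833046471601, 0.363166953528399,
           0.349557522123894, 0.650442477876106,
           0.444897959183673, 0.555102040816327,
           0.724852071005917, 0.275147928994083,
           0.819548872180451, 0.180451127819549),
  stringsAsFactors = FALSE
)

# Each AgeGroup/sex pair should sum to 1
tapply(kid18$freq, interaction(kid18$AgeGroup, kid18$sex), sum)
```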

I can plot the proportion of people in each age group without kids under 18, by sex:

ggplot(kid18summary, aes(x = factor(AgeGroup), y = freq, fill = factor(sex))) +
    geom_bar(position = "dodge", stat = "identity") +
    scale_y_continuous(labels = scales::percent)

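Or I can make a faceted bar chart, which is closer to what I want. I like showing both "Yes" and "No" even though the percentages sum to 100, because I find colored bars easier to compare than negative space. The only problem is that, no matter what I do, "No" is at the bottom and "Yes" is on top, and I want it the other way around. (Ideally, I would also like different colors for men and women: dark blue for men with kids and light blue for men without; dark red for women with kids and light red for women without.)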

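I have tried to change the order of the factor in various ways, all completely unsuccessful.

As the ggplot2 documentation suggests, I tried changing the order of the factor levels directly: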

kid18summary$kid_under_18 <- as.factor(kid18summary$kid_under_18)
o <- c("Yes", "No") # which I've also changed to ("No", "Yes"), which makes no difference; the order of the Yes and No in the legend changes, but the "Yes" bars stay on top 
kid18summary$kid_under_18 <- factor(kid18summary$kid_under_18, levels = o) 

kid18summary$kid_under_18 <- factor(kid18summary$kid_under_18, levels(kid18summary$kid_under_18)[c("Yes", "No")]) # changing to [c("No", "Yes")] also only changes the order of the legend
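One possible reading of why only the legend moved: releveling a factor changes its `levels` attribute but leaves the rows where they were, and the comment further down suggests the stack followed row order in the ggplot2 version used here. A small sketch in base R, with a hypothetical vector `x`:

```r
x <- c("No", "Yes", "No", "Yes")

# Relevel so that "Yes" comes first
f <- factor(x, levels = c("Yes", "No"))

levels(f)        # level order, which drives the legend
as.character(f)  # the values are still in their original row order
order(f)         # row indices that would sort the data by level order
```

Sorting the data with `order(f)` (or `dplyr::arrange()`) is what makes the row order agree with the level order.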

I tried the answer suggested in another question, making the factor ordered:

kid18summary <- transform(kid18summary, stack.ord = factor(kid_under_18, levels = c("Yes", "No"), ordered = TRUE))
ggplot(kid18summary, aes(x = factor(sex), y = freq, fill = factor(stack.ord))) +
    geom_bar(stat = "identity") +
    scale_y_continuous(labels = scales::percent) +
    facet_wrap(~AgeGroup, nrow = 1)

Or simply adding a dummy variable:

kid18summary$orderfactor <- NA
kid18summary$orderfactor[kid18summary$kid_under_18 == "Yes"] <- 0
kid18summary$orderfactor[kid18summary$kid_under_18 == "No"] <- 1
ggplot(kid18summary, aes(x = factor(sex), y = freq, fill = factor(orderfactor))) +
    geom_bar(stat = "identity") +
    scale_y_continuous(labels = scales::percent) +
    facet_wrap(~AgeGroup, nrow = 1)

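All of this gave me many different ways to switch the colors of the "Yes" and "No" groups in the bars, but none of them actually changes which one is on top. [Plot 1] [Plot 2]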


After setting the level order of your fill factor, you need to order the dataset by that factor. See [this answer](http://stackoverflow.com/a/34637703/2461552) – aosmith


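Also, different colors for the different combinations are likely doable. If you are still interested, you could ask a separate question after looking at [this question/answer](http://stackoverflow.com/questions/16026215/generate-ggplot2-boxplot-with-different-colours-for-multiple-groups) – aosmith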
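One way the per-combination colors might look is a fill mapped to `interaction(sex, kid_under_18)` with a named palette passed to `scale_fill_manual()`. This is only a sketch: the data frame `kids` holds rounded values from the `dput()` output, and the color names are assumptions chosen to match the dark/light blue and red scheme described in the question.

```r
library(ggplot2)

# Rounded values from the dput() output in the question
kids <- data.frame(
  AgeGroup = rep(c("Age<40", "Age41-49", "Age50-64"), each = 4),
  sex = factor(rep(rep(c("Male", "Female"), each = 2), times = 3),
               levels = c("Male", "Female")),
  kid_under_18 = rep(c("No", "Yes"), times = 6),
  freq = c(0.625, 0.375, 0.637, 0.363, 0.350, 0.650,
           0.445, 0.555, 0.725, 0.275, 0.820, 0.180)
)

# interaction() builds level names such as "Male.Yes"
kids$sex_kid <- interaction(kids$sex, kids$kid_under_18)

# Named palette, so the match is by level name rather than by position
pal <- c(Male.Yes = "darkblue", Male.No = "lightblue",
         Female.Yes = "darkred", Female.No = "lightcoral")

p <- ggplot(kids, aes(x = sex, y = freq, fill = sex_kid)) +
  geom_col() +
  scale_fill_manual(values = pal) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~AgeGroup, nrow = 1)
p
```

Naming the palette entries avoids having to remember the order in which `interaction()` generates its levels.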

Answer


Using the answer aosmith suggested, I ended up with the following, which does exactly what I want:

ggplot(arrange(kid18summary, kid_under_18),
       aes(x = factor(sex), y = freq, fill = interaction(sex, factor(kid_under_18)))) +
    geom_bar(stat = "identity") +
    scale_y_continuous(labels = scales::percent) +
    facet_wrap(~AgeGroup, nrow = 1)
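For readers on a newer ggplot2: from version 2.2.0 onward the stacking order follows the grouping (factor level) order rather than the row order, so sorting the data should no longer be necessary there. A minimal sketch, assuming ggplot2 2.2.0 or later; `kids` is a hypothetical data frame with rounded values from the question, and the grey fills are placeholders:

```r
library(ggplot2)

# Rounded values from the dput() output in the question
kids <- data.frame(
  AgeGroup = rep(c("Age<40", "Age41-49", "Age50-64"), each = 4),
  sex = factor(rep(rep(c("Male", "Female"), each = 2), times = 3),
               levels = c("Male", "Female")),
  kid_under_18 = rep(c("No", "Yes"), times = 6),
  freq = c(0.625, 0.375, 0.637, 0.363, 0.350, 0.650,
           0.445, 0.555, 0.725, 0.275, 0.820, 0.180)
)

# "Yes" first, so it leads the legend
kids$kid_under_18 <- factor(kids$kid_under_18, levels = c("Yes", "No"))

p <- ggplot(kids, aes(x = sex, y = freq, fill = kid_under_18)) +
  # By default the first level is stacked on top;
  # reverse = TRUE puts the first level ("Yes") at the bottom
  geom_col(position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = c(Yes = "grey30", No = "grey80")) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~AgeGroup, nrow = 1)
p
```

An alternative with the same visual result is to keep the default `position_stack()` and set the levels to `c("No", "Yes")`, at the cost of the legend listing "No" first.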