2017-06-26 56 views

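R ggplot2: stacked barplot by percentage with several categorical variables

This is a simple question, but I'm having trouble understanding the format that ggplot2 requires.

I have the following data.table in R: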

print(dt)
   ID category  A  B C totalABC
1: 10   group1  1  3 0        4
2: 11   group1  1 11 1       13
3: 12   group2 15 20 2       37
4: 13   group2  6 12 2       20
5: 14   group2 17 83 6      106
...

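My goal is to create a proportional (100%) stacked bar chart, as in this example: https://rpubs.com/escott8908/RGC_Ch3_Gar_Graphs

The percentages are X/totalABC, where X is any of the category_type values A, B, or C. I would also like to compute these percentages by category, i.e. the x-axis values should be group1, group2, etc.

As a concrete example, in the case of group1 there are 4 + 13 = 17 elements in total.

The percentages are percent_A = 11.8%, percent_B = 82.4%, percent_C = 5.9% (that is, 2/17, 14/17, and 1/17).

The correct ggplot2 solution appears to be: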
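
The arithmetic above can be checked with a short data.table sketch. It assumes only the five rows printed in the question (the real table has more), and the names `sums` and `pct` are illustrative, not part of the original post.

```r
library(data.table)

# Assumption: only the five rows shown in the question
dt <- data.table(
  ID       = 10:14,
  category = c("group1", "group1", "group2", "group2", "group2"),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L),
  totalABC = c(4L, 13L, 37L, 20L, 106L)
)

# Sum every count column within each category
sums <- dt[, lapply(.SD, sum), by = category,
           .SDcols = c("A", "B", "C", "totalABC")]

# Percentage of each type within its category
pct <- sums[, .(category,
                percent_A = 100 * A / totalABC,
                percent_B = 100 * B / totalABC,
                percent_C = 100 * C / totalABC)]
print(pct)
```

The group1 row should match the figures quoted above up to rounding.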

library(ggplot2) 
pp = ggplot(dt, aes(x=category, y=percentage, fill=category_type)) +                                                        
      geom_bar(position="dodge", stat="identity") 

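My confusion: how would I create a single percentage column that corresponds to the three categorical values?

If the above is wrong, how should I format my data.table to create the stacked barplot?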


Use `position = "fill"` instead of `position = "dodge"`.

Answers


Here is one solution:

library(data.table)
library(ggplot2)
library(dplyr)

# Reshape A, B, C into long format, then let position = "fill" scale each bar to 100%
melt(dt, measure.vars = c("A", "B", "C"), variable.name = "groups", value.name = "nobs") %>%
  ggplot(aes(x = category, y = nobs, fill = groups)) +
  geom_bar(stat = "identity", position = "fill")
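
To see why raw counts are enough here, it may help to look at the long format that `melt` returns and at the shares that `position = "fill"` ends up drawing. This is only a sketch on the five rows shown in the question; `long`, `agg`, and `props` are illustrative names.

```r
library(data.table)

# Assumption: only the five rows shown in the question
dt <- data.table(
  ID       = 10:14,
  category = c("group1", "group1", "group2", "group2", "group2"),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L),
  totalABC = c(4L, 13L, 37L, 20L, 106L)
)

# Long format: one row per (ID, groups) pair, counts in `nobs`
long <- melt(dt, measure.vars = c("A", "B", "C"),
             variable.name = "groups", value.name = "nobs")

# Total count per category and group
agg <- long[, .(nobs = sum(nobs)), by = .(category, groups)]

# Share of each group within its category; these are the proportions
# that a bar scaled to a total height of 1 represents
props <- agg[, .(groups, share = nobs / sum(nobs)), by = category]
print(props)
```

Because `position = "fill"` stacks the values at each x position and rescales the stack to a constant height, no percentage column has to be computed beforehand.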

You can use the following code:

dt[, -1] %>%                                      # get rid of ID
  group_by(category) %>%                          # group by category
  summarise(across(everything(), sum)) %>%        # sum each variable per category
  as.data.table() %>%                             # so that data.table's melt method applies
  melt(id.vars = c("category", "totalABC")) %>%   # one row per variable (A, B, C)
  ggplot(aes(x = category, y = value / totalABC, fill = variable)) +  # define x and y
  geom_bar(stat = "identity", position = "fill") +                    # make the stacked bars
  scale_y_continuous(labels = scales::percent)                        # show the y axis as %

This will plot:

[Image: the resulting 100% stacked bar chart, one bar per category]

Data:

dt <- data.table::data.table(
    ID = 10:14,
    category = factor(c("group1", "group1", "group2", "group2", "group2")),
    A = c(1L, 1L, 15L, 6L, 17L),
    B = c(3L, 11L, 20L, 12L, 83L),
    C = c(0L, 1L, 2L, 2L, 6L),
    totalABC = c(4L, 13L, 37L, 20L, 106L))

What if you want to stick with your own plotting code?

In that case, you can use this to get the percentages:

df <- dt[, -1] %>%                                # get rid of ID
  group_by(category) %>%                          # group by category
  summarise(across(everything(), sum)) %>%        # sum each variable per category
  as.data.table() %>%                             # so that data.table's melt method applies
  melt(id.vars = c("category", "totalABC")) %>%   # one row per variable (A, B, C)
  mutate(percentage = value * 100 / totalABC)     # percentage within each category

But you need to modify the ggplot call to get a proper stacked bar chart:

# `variable` is the column carrying category_type (A, B, or C)
# position = "dodge" plots the bars next to each other,
# while position = "fill" stacks them and scales each bar to 100%
ggplot(df, aes(x = category, y = percentage, fill = variable)) +
  geom_bar(position = "fill", stat = "identity")
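
As a side note on where the `variable` column comes from: when `melt` is called without `variable.name` and `value.name`, data.table names the two new columns `variable` and `value`. The sketch below, which assumes the five rows from the question and uses the illustrative name `long`, shows that `variable` holds the former column names, so `fill = variable` gives each of A, B, and C its own colour.

```r
library(data.table)

# Assumption: only the five rows shown in the question
dt <- data.table(
  ID       = 10:14,
  category = c("group1", "group1", "group2", "group2", "group2"),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L),
  totalABC = c(4L, 13L, 37L, 20L, 106L)
)

# Drop ID, keep category and totalABC as identifiers, melt the rest
long <- melt(dt[, !"ID"], id.vars = c("category", "totalABC"))

print(names(long))            # default names of the melted columns
print(levels(long$variable))  # the former column names
```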

Thank you for the explanatory comments! – ShanZhengYang


Oh, I had defined the data.table as `dt`, not `df`. Noting this to keep things consistent for future readers. – ShanZhengYang


What does `fill = variable` mean in this case? – ShanZhengYang