2016-11-08 164 views
4

也许这是一个愚蠢的想法,或者它可能是一个脑波。我有4种不同物种的脂类的数据集。数据是成比例的,总和是1000.我想要显示每个物种中每个类别的比例差异。一般来说,一个堆叠的酒吧将成为这里的途径,但有几个类,并且它变得无法解释,因为只有底层类共享一个基线(见下文)。 Traditional stacked bar小提琴情节形状的堆积条形图

这似乎是一个糟糕的群体的最佳选择,馅饼和甜甜圈的图表是冷不防的。 然后,我受到创作Symmetrical, violin plot-like histogram?的启发,创建了一种堆叠分布式小提琴剧情(见下文)。 Stacked distribution violin

我想知道这是否可以以某种方式转换为堆叠小提琴,使每个段代表整个变量。就我的数据而言,物种'A和D在TAG细分市场将是'肥',而在STEROL细分市场则是'瘦'。这样的比例水平描绘,并始终有一个共同的基准。思考?

数据:

structure(list(Sample = c("A", "A", "A", "B", "B", "B", "C", 
"C", "C", "D", "D"), WAX = c(83.7179798600773, 317.364310355766, 
20.0147496567679, 93.0194886619568, 78.7886829173726, 79.3445694220837, 
91.0020522660375, 88.1542855137005, 78.3313314713951, 78.4449591023115, 
236.150030864875), TAG = c(67.4640254081232, 313.243238213156, 
451.287867136276, 76.308508343969, 40.127554151831, 91.1910102221636, 
61.658394708941, 104.617259648364, 60.7502685224869, 80.8373642262043, 
485.88633863193), FFA = c(41.0963382465756, 149.264019576272, 
129.672579626868, 51.049208042632, 13.7282635713804, 30.0088572108344, 
47.8878116348504, 47.9564218319094, 30.3836532949481, 34.8474205480686, 
10.9218910757234), `DAG1,2` = c(140.35876401479, 42.4556176551009, 
0, 0, 144.993393432366, 136.722412691012, 0, 140.027443968931, 
137.579074961889, 129.935353616471, 46.6128854387559), STEROL = c(73.0144390122309, 
24.1680929257195, 41.8258704279641, 78.906816661241, 67.5678558060943, 
66.7150537517493, 82.4794113296791, 76.7443442992891, 68.9357008866253, 
64.5444668132533, 29.8342694785768), AMPL = c(251.446564854412, 
57.8713327050339, 306.155806819949, 238.853696442419, 201.783872969561, 
175.935515655693, 234.169038776536, 211.986239116884, 196.931330316831, 
222.658181144794, 73.8944654414811), PE = c(167.99718650752, 
43.3839497916674, 22.1937177530762, 150.315149187176, 153.632530721031, 
141.580725482114, 164.215442147509, 155.113323256627, 143.349000132624, 
128.504657216928, 50.6281347160092), PC = c(174.904702096271, 
52.2494387772846, 28.8494085790995, 191.038328534942, 190.183655117756, 
175.33290326259, 199.2632149392, 175.400682364295, 176.64926273487, 
163.075864395099, 66.071984352649), LPC = c(0, 0, 0, 120.508804125665, 
109.194191312608, 103.16895230176, 119.324634197247, 0, 107.09037767833, 
97.151732936871, 0)), class = c("tbl_df", "tbl", "data.frame" 
), row.names = c(NA, -11L), .Names = c("Sample", "WAX", "TAG", 
"FFA", "DAG1,2", "STEROL", "AMPL", "PE", "PC", "LPC")) 
+0

您对每个样品多行。你想用这些做什么?把它们加起来,或者在每个样本 - 变量组合中显示这些值的分布? –

+0

@Jan van der Laan展现出像堆叠酒吧一样的手段。 –

+0

我已经专注于我的回答中的阴谋,甚至没有意识到你有多重价值。您必须首先在ggplot2之外进行聚合。 – Roland

回答

2

这基本上是一个水平条形图:

library(reshape2) 
DFm <- melt(DF, id.vars = "Sample") 
DFm1 <- DFm 
DFm1$value <- -DFm1$value 
DFm <- rbind(DFm, DFm1) 


ggplot(DFm, aes(x = "A", y = value/10, fill = variable, color = variable)) + 
    geom_bar(stat = "identity", position = "dodge") + 
    coord_flip() + 
    theme_minimal() + 
    facet_wrap(~ Sample, nrow = 1, switch = "x") + 
    theme(axis.text = element_blank(), 
     axis.title = element_blank(), 
     panel.grid = element_blank()) 

resulting plot

+0

只是一个想法(也许不是OP想要的),从最大到最小的堆叠会不会更好? – zx8754

+1

@ zx8754我们可以假设这样的订单对所有样品都是一样的吗?如果我们不能,我不会那样做。无论如何,这很容易做到(可以留给读者作为练习)。 – Roland

+0

不,我希望他们按不同的顺序排列,这样它就可以显示每个组中哪个var更“重要”,更像“堆叠玩具”。我知道如果不按顺序比较增值会很难,那么我们可以放一些线。同意,把它作为读者的练习。 – zx8754