2017-09-05 82 views

Following the plotly directions for plotting distributions, I would like to plot something like the code below, but with groups of unequal length:

# Note: in plotly >= 4 this module moved to chart_studio (import chart_studio.plotly as py)
import plotly.plotly as py
import plotly.figure_factory as ff 

import numpy as np 

# Add histogram data 
x1 = np.random.randn(200) - 2 
x2 = np.random.randn(200) 
x3 = np.random.randn(200) + 2 
x4 = np.random.randn(200) + 4 


# Group data together 
hist_data = [x1, x2, x3, x4] 

group_labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4'] 

# Create distplot with custom bin_size 
fig = ff.create_distplot(hist_data, group_labels, bin_size = [.1, .25, .5, 1]) 

# Plot! 
py.iplot(fig, filename = 'Distplot with Multiple Bin Sizes') 

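However, my real-world dataset has unequal sample sizes (i.e. the count in group 1 differs from the count in group 2, and so on). It is also in name-value pair format.

Here is some fake data to illustrate: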

import pandas as pd

# Add histogram data
x1 = pd.DataFrame(np.random.randn(100)) 
x1['name'] = 'x1' 

x2 = pd.DataFrame(np.random.randn(200) + 1) 
x2['name'] = 'x2' 

x3 = pd.DataFrame(np.random.randn(300) - 1) 
x3['name'] = 'x3' 

df = pd.concat([x1, x2, x3]) 
df = df.reset_index(drop = True) 
df.columns = ['value', 'names'] 

df 

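As you can see, each name (x1, x2, x3) has a different count, and the 'names' column is what I want to use as the color.

Does anyone know how I can plot this?

FYI, in R this is very simple: I just call ggplot and set aes(fill = names).

Any help would be greatly appreciated, thanks!

Answers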

2

You can try slicing your DataFrame and then passing the slices to Plotly.

fig = ff.create_distplot([df[df.names == a].value for a in df.names.unique()], df.names.unique(), bin_size=[.1, .25, .5])
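The same slicing can also be written with `groupby`, which avoids filtering the frame once per name. This is only a minimal sketch: the helper names `split_by_name` and `distplot_from_long` are made up for illustration, the column names `value` and `names` are taken from the fake data in the question, and the single `bin_size=0.25` is an arbitrary choice.

```python
import numpy as np
import pandas as pd


def split_by_name(df, value_col="value", name_col="names"):
    """Split a long name-value frame into one array per group (hypothetical helper)."""
    # sort=False keeps the groups in order of first appearance
    grouped = df.groupby(name_col, sort=False)[value_col]
    group_labels = [str(name) for name, _ in grouped]
    hist_data = [values.to_numpy() for _, values in grouped]
    return hist_data, group_labels


def distplot_from_long(df, bin_size=0.25):
    """Build a distplot figure from the long frame (hypothetical helper)."""
    # Imported lazily so the splitting step can be tried without plotly installed
    import plotly.figure_factory as ff

    hist_data, group_labels = split_by_name(df)
    return ff.create_distplot(hist_data, group_labels, bin_size=bin_size)


# Fake data shaped like the question's: three groups of unequal size
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": np.concatenate([
        rng.normal(0, 1, 100),
        rng.normal(1, 1, 200),
        rng.normal(-1, 1, 300),
    ]),
    "names": ["x1"] * 100 + ["x2"] * 200 + ["x3"] * 300,
})

hist_data, group_labels = split_by_name(df)
print(group_labels, [len(h) for h in hist_data])
```

Calling `distplot_from_long(df)` should then give a figure that can be shown with `plotly.offline.iplot` or `plotly.offline.plot`, as in the full example in this answer.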


import numpy as np
import pandas as pd
import plotly
import plotly.figure_factory as ff

# Only needed when running inside a Jupyter notebook
plotly.offline.init_notebook_mode()

# Fake name-value data with a different number of rows per name
x1 = pd.DataFrame(np.random.randn(100))
x1['name'] = 'x1'

x2 = pd.DataFrame(np.random.randn(200) + 1)
x2['name'] = 'x2'

x3 = pd.DataFrame(np.random.randn(300) - 1)
x3['name'] = 'x3'

df = pd.concat([x1, x2, x3])
df = df.reset_index(drop=True)
df.columns = ['value', 'names']

# One slice of values per unique name, and one bin size per group
fig = ff.create_distplot([df[df.names == a].value for a in df.names.unique()], df.names.unique(), bin_size=[.1, .25, .5])
plotly.offline.iplot(fig, filename='Distplot with Multiple Bin Sizes')

Thanks for the perfect solution.

1

The example from plotly's documentation works for unequal sample sizes too:

#!/usr/bin/env python 

import plotly 
import plotly.figure_factory as ff 
# Only needed inside a Jupyter notebook; can be dropped when run as a script
plotly.offline.init_notebook_mode()
import numpy as np 

# data with different sizes 
x1 = np.random.randn(300)-2 
x2 = np.random.randn(200) 
x3 = np.random.randn(4000)+2 
x4 = np.random.randn(50)+4 

# Group data together 
hist_data = [x1, x2, x3, x4] 

# use custom names 
group_labels = ['x1', 'x2', 'x3', 'x4'] 

# Create distplot with custom bin_size 
fig = ff.create_distplot(hist_data, group_labels, bin_size=.2) 

# change that if you don't want to plot offline 
plotly.offline.plot(fig, filename='Distplot with Multiple Datasets') 

The script above produces the following result:

[image: distplot of the four groups]
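One likely reason unequal sample sizes are not a problem: as far as I know, `create_distplot` defaults to `histnorm='probability density'`, so each group's bars are scaled to a total area of 1 regardless of how many samples it has. The sketch below illustrates that normalisation with plain NumPy only. It is not plotly's own binning code, and `density_hist` is a hypothetical helper.

```python
import numpy as np


def density_hist(values, bin_size):
    """Histogram with fixed-width bins, normalised to unit area (illustrative helper)."""
    edges = np.arange(values.min(), values.max() + bin_size, bin_size)
    heights, edges = np.histogram(values, bins=edges, density=True)
    return heights, edges


rng = np.random.default_rng(0)
groups = {
    "x3": rng.normal(2, 1, 4000),  # large group
    "x4": rng.normal(4, 1, 50),    # small group
}

areas = {}
for name, values in groups.items():
    heights, edges = density_hist(values, bin_size=0.2)
    # Area under the bars = sum of height * width over all bins
    areas[name] = float(np.sum(heights * np.diff(edges)))
    print(name, len(values), round(areas[name], 6))
```

Both groups come out with the same total area, so a 50-sample group and a 4000-sample group are drawn on a comparable vertical scale.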