2017-12-27 1003 views
2

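Python: line plot of values grouped by multiple columns

I have a DataFrame with two columns, genre and release_year. Each year contains multiple genres. The format is as follows: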

genre release_year 
Action 2015 
Action 2015 
Adventure 2015 
Action 2015 
Action 2015 

I need to plot how all of the genres change over time using Pandas/Python.

import pandas as pd

df = pd.read_csv('genres.csv') 

df.shape 
(53975, 2) 


df_new = df.groupby(['release_year', 'genre'])['genre'].count() 

This produces the following grouping.

release_year genre   
1960  Action    8 
      Adventure   5 
      Comedy    8 
      Crime    2 
      Drama    13 
      Family    3 
      Fantasy    2 
      Foreign    1 
      History    5 
      Horror    7 
      Music    1 
      Romance    6 
      Science Fiction  3 
      Thriller    6 
      War     2 
      Western    6 
1961  Action    7 
      Adventure   6 
      Animation   1 
      Comedy    10 
      Crime    2 
      Drama    16 
      Family    5 
      Fantasy    2 
      Foreign    1 
      History    3 
      Horror    3 
      Music    2 
      Mystery    1 
      Romance    7 
          ... 

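I need to draw line plots showing how each genre changes over the years, i.e. I need a loop that plots the various genres across the years. For example,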

df_action = df.query('genre == "Action"') 
result_plot = df_action.groupby(['release_year','genre'])['genre'].count() 
result_plot.plot(figsize=(10,10)); 

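shows the plot for the genre "Action". Instead of plotting each genre separately like this, I need a loop that does the same for all of them.

How can I do this? Can anyone help me?

I tried the following, but it does not work.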

genres = ["Action", "Adventure", "Western", "Science Fiction", "Drama", 
    "Family", "Comedy", "Crime", "Romance", "War", "Mystery", 
    "Thriller", "Fantasy", "History", "Animation", "Horror", "Music", 
    "Documentary", "TV Movie", "Foreign"] 

for g in genres: 
    #df_new = df.query('genre == "g"') 
    result_plot = df.groupby(['release_year','genre'])['genre'].count() 
    result_plot.plot(figsize=(10,10)); 

Answers

2

How about unstacking your Series and plotting everything with a single command:

In [36]: s 
Out[36]: 
release_year genre 
1960.0  Action  8 
       Adventure  5 
       Comedy  8 
       Crime   2 
       Drama  13 
       Family  3 
       Fantasy  2 
       Foreign  1 
       History  5 
       Horror  7 
          .. 
1961.0  Crime   2 
       Drama  16 
       Family  5 
       Fantasy  2 
       Foreign  1 
       History  3 
       Horror  3 
       Music   2 
       Mystery  1 
       Romance  7 
Name: count, Length: 30, dtype: int64 

In [37]: s.unstack() 
Out[37]: 
genre   Action Adventure Animation Comedy Crime Drama Family Fantasy Foreign History Horror Music Mystery Romance \ 
release_year 
1960.0   8.0  5.0  NaN  8.0 2.0 13.0  3.0  2.0  1.0  5.0  7.0 1.0  NaN  6.0 
1961.0   7.0  6.0  1.0 10.0 2.0 16.0  5.0  2.0  1.0  3.0  3.0 2.0  1.0  7.0 

genre   Science Fiction Thriller War Western 
release_year 
1960.0     3.0  6.0 2.0  6.0 
1961.0     NaN  NaN NaN  NaN 

Plotting:

s.unstack().plot() 
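
For readers who still want the explicit loop from the question, here is a minimal sketch of the same idea. It assumes a small made-up DataFrame in place of genres.csv, and the helper name plot_genres is hypothetical. It uses size() to count rows per group, which should give the same numbers as the count() call in the question when no genre values are missing.

```python
import pandas as pd

# Hypothetical sample standing in for genres.csv (not the asker's data).
df = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})

# Count rows per (year, genre), then pivot the genre level into columns.
# fill_value=0 replaces missing year/genre pairs with 0 instead of NaN.
counts = df.groupby(["release_year", "genre"]).size()
wide = counts.unstack(fill_value=0)


def plot_genres(wide, genres=None):
    """Draw one line per genre on a shared axis (needs matplotlib)."""
    ax = None
    for g in (genres if genres is not None else wide.columns):
        # Select the column for this genre so each pass plots different data.
        ax = wide[g].plot(ax=ax, label=g)
    ax.legend()
    return ax
```

Calling plot_genres(wide) draws every genre, and plot_genres(wide, ["Action", "Drama"]) draws a subset. The loop in the question never used g inside its body, so every iteration plotted the same full groupby result; selecting wide[g] is what makes each pass specific to one genre.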
2
df_new.unstack().T.plot(kind='bar') 

I chose a bar chart here; you can change it to whatever kind you need.

PS: you could consider crosstab instead of groupby

pd.crosstab(df.genre,df.release_year).plot(kind='bar') 
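
If the goal is a line per genre with years on the x-axis, one option is to pass release_year as the first argument so that it becomes the row index. The sketch below assumes a small made-up DataFrame rather than the real file, and the helper name plot_lines is hypothetical; it also checks that crosstab gives the same counts as the groupby route.

```python
import pandas as pd

# Hypothetical sample standing in for genres.csv (not the asker's data).
df = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Action", "Drama", "Comedy"],
    "release_year": [1960, 1960, 1960, 1961, 1961, 1961],
})

# Rows are years, columns are genres; absent combinations are counted as 0.
table = pd.crosstab(df["release_year"], df["genre"])

# The same counts obtained through groupby + unstack, for comparison.
via_groupby = (
    df.groupby(["release_year", "genre"]).size().unstack(fill_value=0)
)


def plot_lines(table):
    """Line plot with one line per column (needs matplotlib)."""
    return table.plot(kind="line", figsize=(10, 10))
```

Calling plot_lines(table) then draws all genres over the years in one figure.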


0

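I recommend seaborn, which helps you avoid reshaping the DataFrame before plotting. You can install it by running pip install seaborn. It has a simple API for a variety of standard plots:

release_year vs genre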

import seaborn as sns 
sns.countplot(x='release_year', hue='genre', data=df) 


genre vs release_year

import seaborn as sns 
sns.countplot(x='genre', hue='release_year', data=df) 
