2017-05-07 79 views
0

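I have two columns, categorical and year, that I am trying to plot. I want to plot the number of occurrences of each category per year as a multi-series time series plot. How to plot by category over time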

ax = data[data.categorical=="cat1"]["categorical"].plot(label='cat1') 
data[data.categorical=="cat2"]["categorical"].plot(ax=ax, label='cat2') 
data[data.categorical=="cat3"]["categorical"].plot(ax=ax, label='cat3') 
plt.xlabel("Year") 
plt.ylabel("Number per category") 
sns.despine() 

However, I get an error stating that there is no numeric data to plot. I am looking for something similar to the above, perhaps with data[data.categorical=="cat3"]["categorical"].lambda x : (1 for x in data.categorical)

I will use the following lists as an example.

categorical = ["cat1","cat1","cat2","cat3","cat2","cat1","cat3","cat2","cat1","cat3","cat3","cat3","cat2","cat1","cat2","cat3","cat2","cat2","cat3","cat1","cat1","cat1","cat3"] 

year = [2013,2014,2013,2015,2014,2014,2013,2014,2014,2015,2015,2013,2014,2014,2013,2014,2015,2015,2015,2013,2014,2015,2013] 

My goal is to get something similar to the picture below.

+1

Can you provide the full error traceback and some sample data? – Chuck

+0

Yes, sorry. It should be clearer now. – Min

+0

It doesn't make sense: 'data[data.categorical=="cat2"]["categorical"]' is a series of strings with only '"cat2"' as values. You can't plot that. – IanS

Answers

0

Have you tried something with groupby?

df.groupby(["year","categorical"]).count() 
+0

Yes, I did that earlier, but plt.plot(df.groupby(["year","categorical"]).count()) returns "'tuple' object is not callable" – Min

+0

But you should get a data frame you can work with. For example, add df["count"] = df["categorical"] before the groupby, then after the groupby select cat1 via loc and try to print that – Herka

1

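I hesitate to call this a "solution", since it is basically a summary of basic Pandas functionality, which is explained in the same documentation where you found the time series plot you placed in your post. But seeing as there is some confusion around groupby and plotting, a demo may help clear things up.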

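We can use two calls to groupby().
The first groupby() gets a count of category appearances per year, using the count aggregation.
The second groupby() is used to plot the time series for each category.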

To start, generate a sample data frame:

import pandas as pd 
categorical = ["cat1","cat1","cat2","cat3","cat2","cat1","cat3","cat2", 
       "cat1","cat3","cat3","cat3","cat2","cat1","cat2","cat3", 
       "cat2","cat2","cat3","cat1","cat1","cat1","cat3"] 
year = [2013,2014,2013,2015,2014,2014,2013,2014,2014,2015,2015,2013, 
     2014,2014,2013,2014,2015,2015,2015,2013,2014,2015,2013] 
df = pd.DataFrame({'categorical':categorical, 
        'year':year}) 

   categorical  year
0         cat1  2013
1         cat1  2014
       ...
21        cat1  2015
22        cat3  2013

Now get the counts per category, per year:

# reset_index() gives a column for counting, after groupby uses year and category 
ctdf = (df.reset_index() 
      .groupby(['year','categorical'], as_index=False) 
      .count() 
      # rename isn't strictly necessary here, it's just for readability 
      .rename(columns={'index':'ct'}) 
     ) 

   year categorical  ct
0  2013        cat1   2
1  2013        cat2   2
2  2013        cat3   3
3  2014        cat1   5
4  2014        cat2   3
5  2014        cat3   1
6  2015        cat1   1
7  2015        cat2   2
8  2015        cat3   4
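
For comparison, the same count table can probably be built without the reset_index()/rename() step. This is only a sketch of an alternative, not part of the original answer: it uses groupby(...).size(), which counts rows per group, and the name ct_alt is made up for illustration.

```python
import pandas as pd

categorical = ["cat1","cat1","cat2","cat3","cat2","cat1","cat3","cat2",
               "cat1","cat3","cat3","cat3","cat2","cat1","cat2","cat3",
               "cat2","cat2","cat3","cat1","cat1","cat1","cat3"]
year = [2013,2014,2013,2015,2014,2014,2013,2014,2014,2015,2015,2013,
        2014,2014,2013,2014,2015,2015,2015,2013,2014,2015,2013]
df = pd.DataFrame({'categorical': categorical, 'year': year})

# size() returns the number of rows in each (year, categorical) group as a Series;
# reset_index(name='ct') turns the group keys back into columns and names the counts.
# ct_alt is a hypothetical name, chosen to avoid clashing with ctdf above.
ct_alt = (df.groupby(['year', 'categorical'])
            .size()
            .reset_index(name='ct'))
print(ct_alt)
```

One difference worth keeping in mind: count() counts non-null values per column, while size() counts rows, so the two only agree when the counted column has no missing values, as is the case here.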

Finally, plot the time series for each category, keyed by color:

from matplotlib import pyplot as plt 
fig, ax = plt.subplots() 

# key gives the group name (i.e. category), data gives the actual values 
for key, data in ctdf.groupby('categorical'): 
    data.plot(x='year', y='ct', ax=ax, label=key) 

time series plot by category
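
If the loop over groups feels heavy, a wide table with one column per category can be plotted in a single call. The following is a sketch of that alternative, not the method used in the answer above: it assumes pd.crosstab is acceptable for counting, and the name wide and the axis labels (taken from the question) are illustrative choices.

```python
import pandas as pd
from matplotlib import pyplot as plt

categorical = ["cat1","cat1","cat2","cat3","cat2","cat1","cat3","cat2",
               "cat1","cat3","cat3","cat3","cat2","cat1","cat2","cat3",
               "cat2","cat2","cat3","cat1","cat1","cat1","cat3"]
year = [2013,2014,2013,2015,2014,2014,2013,2014,2014,2015,2015,2013,
        2014,2014,2013,2014,2015,2015,2015,2013,2014,2015,2013]
df = pd.DataFrame({'categorical': categorical, 'year': year})

# crosstab counts how often each (year, category) pair occurs:
# rows are years, columns are categories. "wide" is a hypothetical name.
wide = pd.crosstab(df['year'], df['categorical'])

fig, ax = plt.subplots()
# DataFrame.plot draws one line per column, using the index (year) as x
wide.plot(ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("Number per category")
# show only the years that actually occur, instead of fractional ticks
ax.set_xticks(list(wide.index))
```

Call plt.show() afterwards when running this as a script. The wide layout also fills in 0 for any year in which a category does not occur, whereas the long table simply has no row for that pair.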
