2016-11-05 88 views
0

Pandas KeyError exception for a DataFrame read from a CSV file

I have a DataFrame similar to this:

BirthYear Sex Area Count 
2015   W  Dhaka 6 
2015   M  Dhaka 3 
2015   W  Khulna 1 
2015   M  Khulna 8 
2014   M  Dhaka 13 
2014   W  Dhaka 20 
2014   M  Khulna 9 
2014   W  Khulna 6 
2013   W  Dhaka 11 
2013   M  Dhaka 2 
2013   W  Khulna 8 
2013   M  Khulna 5 
2012   M  Dhaka 12 
2012   W  Dhaka 4 
2012   W  Khulna 7 
2012   M  Khulna 1 

Now I want to create a pandas bar chart that shows only the males and females born in 2015. Code:

df = pd.read_csv('out.csv') 
df=df.reset_index() 
df=df.loc[df["BirthYear"]==2015] 
agg_df = df.groupby(['Sex']).sum() 
agg_df.reset_index(inplace=True) 
piv_df = agg_df.pivot(columns='Sex', values='Count') 
piv_df.plot.bar(stacked=True) 
plt.show() 

After running it, IDLE shows this error:

Traceback (most recent call last): 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc 
    return self._engine.get_loc(key) 
    File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066) 
    File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930) 
    File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408) 
    File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359) 
KeyError: 'BirthYear' 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
    File "C:/Users/sabid/Dropbox/Freelancing/data visualization python/pie.py", line 8, in <module> 
    df=df.loc[df["StichtagDatJahr"]==2015] 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__ 
    return self._getitem_column(key) 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column 
    return self._get_item_cache(key) 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache 
    values = self._data.get(item) 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get 
    loc = self.items.get_loc(item) 
    File "C:\Users\sabid\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc 
    return self._engine.get_loc(self._maybe_cast_indexer(key)) 
    File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066) 
    File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930) 
    File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408) 
    File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359) 
KeyError: 'BirthYear' 

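From this link I learned that this happens because the "BirthYear" column name is picking up some extra header content in front of it. But I can't figure out how to remove that header content and make the code work. Is there a workable solution?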

+0

The [solution](http://stackoverflow.com/a/23733522/5741205) is already in the link you posted. Have you tried it? – MaxU

+0

What do you mean by "some header before it"? Do you mean there is a space at the start of the string? – Batman

+0

@MaxU I tried it, but it didn't really work... the error keeps coming back. –
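One way to check what the header really contains, as the comments above ask, is to print the `repr()` of each column name. The sketch below is only an illustration: the DataFrame is built by hand with hypothetical "dirty" names (a leading BOM character and stray spaces) to stand in for whatever `out.csv` actually produced.

```python
import pandas as pd

# Hypothetical stand-in for the frame read from out.csv: the first column
# name carries an invisible BOM character, two others have stray spaces.
df = pd.DataFrame(
    [[2015, "W", "Dhaka", 6], [2015, "M", "Dhaka", 3]],
    columns=["\ufeffBirthYear", " Sex", "Area", "Count "],
)

# repr() makes invisible characters and surrounding spaces visible.
print([repr(c) for c in df.columns])

# Remove the BOM character and trim whitespace from every column name.
df.columns = df.columns.str.replace("\ufeff", "", regex=False).str.strip()

# The lookup that raised KeyError now finds the column.
born_2015 = df.loc[df["BirthYear"] == 2015]
```

If the printed names look clean, the cause is probably something else, such as a different delimiter in the file.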

Answers

1

You can rename the columns.

df.rename(columns=["BirthYear", "Sex", "Area", "Count"], inplace=True) 
+2

I would also replace `df = df.reset_index()` with `df.reset_index(drop=True, inplace=True)`. Leaving drop set to False makes the previous index show up as a new column named index. – Jakub
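A small sketch of the difference described in the comment above, using a hypothetical three-row frame rather than the real CSV:

```python
import pandas as pd

# Hypothetical sample data; filtering leaves a non-contiguous index (0 and 2).
df = pd.DataFrame({"BirthYear": [2015, 2014, 2015], "Count": [6, 13, 3]})
filtered = df.loc[df["BirthYear"] == 2015]

# Default drop=False: the old index is kept as a new column named 'index'.
kept = filtered.reset_index()

# drop=True: the old index is discarded and rows are renumbered from 0.
dropped = filtered.reset_index(drop=True)

print(kept.columns.tolist())
print(dropped.columns.tolist())
```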

+0

Yes. Good point. – Batman

+0

@Batman, `df.rename(columns=list)` will raise `TypeError: 'list' object is not callable`. Test this: `df = pd.DataFrame(np.random.rand(3,2), columns=list('ab')); df.rename(columns=['X','Y'])` – MaxU
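Following up on the comment above, here is a sketch of two ways of renaming columns that do work. The placeholder column names (`a`, `b`, ...) and the BOM-prefixed name are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical frame with placeholder column names.
df = pd.DataFrame([[2015, "W", "Dhaka", 6]], columns=["a", "b", "c", "d"])

# Option 1: assign a full list of names; its length must match the column count.
df.columns = ["BirthYear", "Sex", "Area", "Count"]

# Option 2: rename() takes a mapping (or a function), not a plain list.
df2 = pd.DataFrame([[2015, "W"]], columns=["\ufeffBirthYear", "Sex"])
df2 = df2.rename(columns={"\ufeffBirthYear": "BirthYear"})
```

Assigning to `df.columns` overwrites every name by position, so it only makes sense when the column order is known. The mapping form touches only the names listed.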

0

I assume you want output like this:

Barplot

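I'm not certain, but I think the pivot method is what is tripping you up. You don't need pivot, because agg_df is essentially already a pivot table. Here is the code I used to create the chart: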

import matplotlib.pyplot as plt
import pandas as pd

# I made this to approximate your CSV file.
table = {
    'BirthYear': [2015, 2015, 2015, 2015, 2014, 2014],
    'Sex': ['W', 'M', 'W', 'M', 'M', 'W'],
    'Area': ['Dhaka', 'Dhaka', 'Khulna', 'Khulna', 'Dhaka', 'Dhaka'],
    'Count': [6, 3, 1, 8, 13, 20]
}

df = pd.DataFrame(table)
df = df.reset_index(drop=True)

# Select people born in 2015.
df = df.loc[df["BirthYear"] == 2015]

# Sum the counts per sex; the result is already shaped like a pivot table.
agg_df = df.groupby(['Sex'])[['Count']].sum()

# Make the plot and display it.
agg_df['Count'].plot.bar()
plt.show()
+0

The trick here is to specify the correct BOM encoding. –
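A minimal sketch of what the comment above suggests, assuming the file is UTF-8 with a byte order mark. The temporary file and its contents are made up for the demonstration. `utf-8-sig` is the standard Python codec name that consumes a leading BOM; whether the default encoding also strips it depends on the pandas version, so passing it explicitly is the safer choice.

```python
import os
import tempfile

import pandas as pd

# Write a small hypothetical CSV that starts with a UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "out.csv")
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write("BirthYear,Sex,Area,Count\n"
            "2015,W,Dhaka,6\n"
            "2015,M,Dhaka,3\n"
            "2014,M,Dhaka,13\n")

# 'utf-8-sig' strips the BOM, so the first column name is exactly 'BirthYear'.
df = pd.read_csv(path, encoding="utf-8-sig")

# Filter to 2015 and total the counts per sex.
counts_2015 = df.loc[df["BirthYear"] == 2015].groupby("Sex")["Count"].sum()
print(counts_2015)
```

From there, `counts_2015.plot.bar()` followed by `plt.show()` (with `matplotlib.pyplot` imported as `plt`) would draw the bar chart.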

+0

Thanks for your effort. –