2016-09-20 85 views
4

I ran into a Python error when calling a column from a pandas DataFrame while practicing importing Google Finance stock market data.

import pandas as pd 
from pandas import Series 

path = 'http://www.google.com/finance/historical?cid=542029859096076&startdate=Sep+22%2C+2001&enddate=Sep+20%2C+2016&num=30&ei=3HvhV4n3D8XGmAGp4q74Ag&output=csv' 
df = pd.read_csv(path) 

So far so good: df shows the complete dataset I need.

However, when I select a specific column, for example

df['Date'] 

Python raises the following error:

Traceback (most recent call last): 

    File "<ipython-input-31-cb486dd31fbc>", line 1, in <module> 
    df['Date'] 

    File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/frame.py", line 1997, in __getitem__ 
    return self._getitem_column(key) 

    File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/frame.py", line 2004, in _getitem_column 
    return self._get_item_cache(key) 

    File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache 
    values = self._data.get(item) 

    File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/internals.py", line 3290, in get 
    loc = self.items.get_loc(item) 

    File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc 
    return self._engine.get_loc(self._maybe_cast_indexer(key)) 

    File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154) 

    File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018) 

    File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368) 

    File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322) 

KeyError: 'Date' 

On the other hand, other columns, e.g. df['High'], work fine. Is there any way I can fix this?

+1

When I try it, it works fine and parses correctly. – ayhan

+0

(Based on MaxU's answer, it probably works for me because I'm using Python 3.5.) – ayhan

+0

@ayhan, did df['Date'] work for you? It shouldn't work under Python 3.5 either... – MaxU

Answers

5

This CSV file starts with a BOM (Byte Order Mark) signature, so try reading it this way:

df = pd.read_csv(path, encoding='utf-8-sig') 

Here is how to spot this problem easily (thanks to @jezrael's hint):

In [11]: print(df.columns.tolist()) 
['\ufeffDate', 'Open', 'High', 'Low', 'Close', 'Volume'] 

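Pay attention to the first column name: it is '\ufeffDate', not 'Date', which is why df['Date'] raises a KeyError.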
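To illustrate why the encoding argument makes a difference, here is a minimal sketch that uses only the standard library. The sample bytes and the header() helper are hypothetical stand-ins for the downloaded file, not the real Google Finance data. Decoding as plain utf-8 keeps the BOM as the character U+FEFF at the front of the first header name, while utf-8-sig strips it.

```python
import csv
import io

# Hypothetical sample mimicking the start of the downloaded file:
# a UTF-8 BOM (EF BB BF), a header row, and one made-up data row.
raw = b"\xef\xbb\xbfDate,Open,High\n20-Sep-16,1.0,2.0\n"


def header(data, encoding):
    """Decode the bytes with the given codec and return the CSV header row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


plain = header(raw, "utf-8")      # BOM survives as U+FEFF in the first name
clean = header(raw, "utf-8-sig")  # the codec drops the BOM while decoding

print(plain)
print(clean)
print("Date" in plain, "Date" in clean)
```

The same mismatch is what makes the lookup fail in the question: the key 'Date' is simply not among the column names until the BOM is removed.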

NOTE: as @ayhan has already noticed, starting from version 0.19.0 pandas will take care of it automatically:

Bug in pd.read_csv() where the BOM in files with a BOM was not ignored (GH4793)
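If the data has already been loaded with an older pandas version, one possible workaround (not part of the original answer) is to strip the BOM from the column names afterwards. The sketch below uses a plain list in place of df.columns, and strip_bom is a hypothetical helper name.

```python
BOM = "\ufeff"  # the BOM as it appears after decoding as plain UTF-8


def strip_bom(names):
    """Return the names with any leading BOM character removed."""
    return [name.lstrip(BOM) for name in names]


# Column names as shown in the diagnostic output of the answer.
columns = ["\ufeffDate", "Open", "High", "Low", "Close", "Volume"]
fixed = strip_bom(columns)
print(fixed)
```

With a real DataFrame the result would be assigned back, e.g. df.columns = strip_bom(df.columns). Passing encoding='utf-8-sig' when reading remains the simpler fix.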

+0

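Hey, thanks! This works nicely. Could you explain in more detail why it makes a difference, or point me to some sources about the BOM signature? Thanks again. –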

+3

It looks better if you use 'print(df.columns.tolist())', +1 – jezrael