2016-09-27 253 views
2

pandas: DataFrame column not found

I am reading a CSV file, from which I select these columns:

import pandas as pd

encoding = "UTF-8-SIG"
csv_file = "my/path/to/file.csv" 
fields_cols_mapping = { 
    'brand_id': 'Brand', 
    'custom_dashboard': 'Custom Dashboard LO', 
    'custom_dashboard_isfeatured': 'Custom Dashboard LO - Is Featured', 
    'description': 'LODescription', 
    'is_active': 'TrainingIsActive', 
    'lo_id': 'LOID', 
    'lo_type_id': 'LOType', 
    'timestamp': 'Timestamp', 
    'title': 'LOTitle', 
    'training_version_id': 'TrainingVersion' 
} 

dataframe = pd.read_csv(
    csv_file,
    encoding=encoding,
    sep='|',
    usecols=[unicode(v) for v in fields_cols_mapping.values()],
    dtype={k: object for k in fields_cols_mapping.keys()},
)

However, when I inspected the call with ipdb, I found that the parser invoked by read_csv does not convert the column name Custom Dashboard LO – Is Featured:

# debug 
> /../../venvs/myvenv/lib/python2.7/site-packages/pandas/io/parsers.py(1140)__init__() 
   1138         col_indices = []
   1139         for u in self.usecols:
-> 1140             if isinstance(u, string_types):
   1141                 col_indices.append(self.names.index(u))
   1142             else:

ipdb> self 
<pandas.io.parsers.CParserWrapper object at 0x10b134710> 
ipdb> self.names 
[u'LOType', u'LOID', u'LOTitle', u'TrainingVersion', u'LODescription', u'TrainingIsActive', u'Custom Dashboard LO', u'Brand',  u'Custom Dashboard LO \u2013 Is Featured', u'Timestamp'] 

Does anyone have a suggestion for what I should do?

Answers

0

Thanks. I changed the dictionary value, but:

In [130]: dataframe = pd.read_csv(
    ...:    lo_csv_path, 
    ...:    encoding=encoding_l, 
    ...:    sep='|', 
    ...:    usecols=[unicode(v) for v in fields_cols_mapping.values()], 
    ...:    dtype={ k: object for k in fields_cols_mapping.keys() }, 
    ...:   ) 
--------------------------------------------------------------------------- 
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-130-670241506984> in <module>() 
     3    encoding=encoding_l, 
     4    sep='|', 
----> 5    usecols=[unicode(v) for v in fields_cols_mapping.values()], 
     6    dtype={ k: object for k in fields_cols_mapping.keys() }, 
     7  ) 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 20: ordinal not in range(128) 
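
One possible reading of this traceback, assuming Python 2 and a UTF-8 encoded source: the dictionary value is a byte string, and unicode(v) decodes byte strings with the default 'ascii' codec. Byte 0xe2 at position 20 is consistent with the first byte of a UTF-8 encoded dash sitting right after "Custom Dashboard LO ". The sketch below reproduces the message and shows an explicit UTF-8 decode; the byte string `raw` is an assumption about what the mapping holds, not something taken from the post.

```python
# -*- coding: utf-8 -*-
# Assumption: the mapping value is a byte string holding UTF-8, as a plain
# '...' literal would be in a UTF-8 encoded Python 2 source file.
raw = b"Custom Dashboard LO \xe2\x80\x93 Is Featured"

try:
    raw.decode("ascii")  # the implicit step behind unicode(v) in Python 2
except UnicodeDecodeError as exc:
    print(exc)  # reports byte 0xe2 at position 20

# Decoding explicitly (or writing a u'...' literal) gives the expected text.
name = raw.decode("utf-8")
print(name == u"Custom Dashboard LO \u2013 Is Featured")
```

Writing the value in the mapping as u'Custom Dashboard LO \u2013 Is Featured' should have the same effect, since unicode(v) leaves a value that is already unicode unchanged.
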
1

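Your problem is that the dash in the DataFrame is not the same character as the dash in your dictionary. The one in the DataFrame is an en dash (\u2013), while the one in the dictionary is a hyphen (\u2010). They look similar, but they are different characters, so the strings do not match.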
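
A minimal sketch of how the mismatch could be checked and worked around, using only the standard library. The helper names (`non_ascii_chars`, `normalize_dashes`, `resolve_columns`) and the `DASHES` set are hypothetical, introduced here for illustration. The dictionary value is assumed to contain the ASCII hyphen-minus as it is displayed in the question; the same approach applies if it is \u2010.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Dash-like characters that are easy to confuse. This set is an assumption
# made for the sketch, not an exhaustive list.
DASHES = u"\u2010\u2011\u2012\u2013\u2014\u2212"


def non_ascii_chars(text):
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


def normalize_dashes(text):
    """Replace the dash variants in DASHES with the ASCII hyphen-minus."""
    return u"".join(u"-" if ch in DASHES else ch for ch in text)


def resolve_columns(wanted, available):
    """Map each wanted name to the header name that matches it once dashes are normalized."""
    by_key = {normalize_dashes(name): name for name in available}
    return [by_key[normalize_dashes(name)] for name in wanted]


in_file = u"Custom Dashboard LO \u2013 Is Featured"  # as shown in self.names
in_dict = u"Custom Dashboard LO - Is Featured"       # assumed: ASCII hyphen-minus

print(in_file == in_dict)        # False: the two dashes differ
print(non_ascii_chars(in_file))  # locates the en dash at index 20
print(resolve_columns([in_dict, u"Brand"], [u"Brand", in_file]))
```

The list returned by `resolve_columns` holds the names exactly as they appear in the file header, so it could be passed to `usecols` in place of the raw dictionary values. The simpler fix is to correct the dictionary value itself so that it contains \u2013.
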