2016-11-26

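I'm working with some data that I downloaded from the web in CSV format. The raw data looks like the sample below. How can I call pandas read_csv() without it parsing the date strings?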

Test Data 
"Date","T1","T2","T3","T4","T5","T6","T7","T8" 
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60", 
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78", 
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42", 

I read it with the following code:

import pandas as pd 
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5]) 

I also tried:

import pandas as pd 
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], keep_date_col=True) 

I always get the following result:

  Date T3 T4 T5 
105/11/01 9.30 9.36 9.27 NaN 
105/11/02 9.26 9.42 9.23 NaN 
105/11/03 9.30 9.30 9.20 NaN 

This is what I want to get:

 Date T3 T4 T5 
105/11/01 9.30 9.36 9.27 
105/11/02 9.26 9.42 9.23 
105/11/03 9.30 9.30 9.20 

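As you can see, pandas does not treat the date strings as part of the data. It uses them as the index and shifts every column one position to the left, which leaves the last column as NaN.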

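I have read the pandas documentation for read_csv() and found that it can parse dates with the parse_dates and keep_date_col parameters, but is there any way to not parse the dates?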
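A minimal reproduction sketch helps check whether date parsing is involved at all. It assumes the sample above is inlined as a string (called SAMPLE here purely for illustration, trailing spaces dropped) and read through io.StringIO instead of test.csv, and it leaves out usecols so the effect is easy to see. Passing parse_dates=False, which is already the default, changes nothing: the dates still end up as the index.

```python
import io

import pandas as pd

# The question's sample, including the "Test Data" title line that skiprows=[0] skips.
SAMPLE = '''Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

# parse_dates=False is the default; stating it explicitly does not fix anything.
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=[0], parse_dates=False)

# Each data row has one more field than the header, so pandas takes the first
# field (the date) as the row index and every value lands one column to the left.
print(df.index.tolist())      # ['105/11/01', '105/11/02', '105/11/03']
print(df["Date"].tolist())    # ['123,855', '114,385', '71,350']
print(df["T8"].isna().all())  # True: the empty trailing field ends up in T8
```

The "Date" column holding the T1 values suggests this is a field-count mismatch rather than a date-parsing problem.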


I think your problem is entirely that the data rows have a trailing delimiter but the header does not. See http://stackoverflow.com/questions/13719946/python-pandas-trailing-delimiter-confuses-read-csv
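The trailing-delimiter diagnosis can be checked with the standard library alone. This sketch uses Python's csv module as an independent tokenizer (an illustration, not something the original posters used) on the header and data rows, inlined as a string named SAMPLE for illustration. It counts 9 fields in the header and 10 in each data row, the tenth being the empty string left by the trailing comma.

```python
import csv
import io

# Header and data rows from the question (title line and trailing spaces omitted).
SAMPLE = '''"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, data = rows[0], rows[1:]

print(len(header))             # 9
print([len(r) for r in data])  # [10, 10, 10]
print(repr(data[0][-1]))       # '' (the field after the trailing comma)
```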

Answer


This seems to work well (based on the help docs):

import pandas as pd 
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], index_col=False) 

df 
#  Date  T3  T4  T5 
#0 105/11/01 9.30 9.36 9.27 
#1 105/11/02 9.26 9.42 9.23 
#2 105/11/03 9.30 9.30 9.20 
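Here is a self-contained sketch of the same call, assuming the sample is inlined as a string (the SAMPLE name is only for illustration) and read through io.StringIO in place of test.csv. With index_col=False, pandas no longer infers an implicit index from the extra trailing field, so the date strings stay in the Date column as plain, unparsed strings.

```python
import io

import pandas as pd

# The question's sample, including the "Test Data" title line that skiprows=[0] skips.
SAMPLE = '''Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

# index_col=False: do not treat the first column as the index, even though
# each data row has one more (empty) field than the header.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    skiprows=[0],
    usecols=[0, 3, 4, 5],
    index_col=False,
)

print(df.columns.tolist())  # ['Date', 'T3', 'T4', 'T5']
print(df["Date"].tolist())  # ['105/11/01', '105/11/02', '105/11/03']
```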

And this, from the read_csv help docs:

index_col : int or sequence or False, default None 
    Column to use as the row labels of the DataFrame. If a sequence is given, a 
    MultiIndex is used. If you have a malformed file with delimiters at the end 
    of each line, you might consider index_col=False to force pandas to _not_ 
    use the first column as the index (row names)
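As the docstring describes, index_col=False also works without usecols. In that case pandas discards the empty trailing field and keeps all nine header columns. This variant is a sketch under the same assumption as above (sample inlined as a string named SAMPLE for illustration), and selecting the wanted columns by name afterward is a suggested alternative, not part of the original answer.

```python
import io

import pandas as pd

# The question's sample, including the "Test Data" title line that skiprows=[0] skips.
SAMPLE = '''Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

# No implicit index: the empty field after each trailing comma is dropped.
df_all = pd.read_csv(io.StringIO(SAMPLE), skiprows=[0], index_col=False)
print(df_all.shape)             # (3, 9)
print(df_all["T8"].tolist())    # [60, 78, 42]

# Pick the wanted columns by name instead of by position.
subset = df_all[["Date", "T3", "T4", "T5"]]
print(subset.columns.tolist())  # ['Date', 'T3', 'T4', 'T5']
```

Selecting by name after reading keeps the code independent of column positions, at the cost of parsing every column first.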