2015-11-05

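How do I change the header row of a pandas DataFrame?

I'm having trouble changing the header row of an existing DataFrame with pandas in Python. After importing pandas and reading the csv file, I read it with header=None so that I could drop duplicate dates after transposing. However, this leaves a row header (really an index column) that I don't want.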

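import pandas as pd

df = pd.read_csv(spreadfile, header=None)  # header=None: the first row is read as data

# transpose, then keep the last row for each duplicated date in column 0
# (take_last=True was the old spelling of keep='last')
df2 = df.T.drop_duplicates([0], keep='last')
del df2[1]

indcol = df2.iloc[:, 0]  # .ix has been removed from pandas; .iloc is the positional equivalent
df3 = df2.reindex(indcol)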

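However, the code above only half works: the index column is now what I want, but every entry is NaN. I don't understand Python well enough to see what it is doing. The desired output below is what I need; any help would be greatly appreciated!

df2 before reindexing:
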
           0             2             3             4             5
0        NaN  XS0089553282  XS0089773484  XS0092157600  XS0092541969
1  01-May-14         131.7         165.1         151.8          88.9
3  02-May-14           131         164.9         151.7          88.5
5  05-May-14         131.1           165         151.8          88.6
7  06-May-14         129.9         163.4         151.2          87.1

After reindexing:

             0    2    3    4    5
0
NaN        NaN  NaN  NaN  NaN  NaN
01-May-14  NaN  NaN  NaN  NaN  NaN
02-May-14  NaN  NaN  NaN  NaN  NaN
05-May-14  NaN  NaN  NaN  NaN  NaN
06-May-14  NaN  NaN  NaN  NaN  NaN

Desired df2:

           XS0089553282  XS0089773484  XS0092157600  XS0092541969
01-May-14         131.7         165.1         151.8          88.9
02-May-14           131         164.9         151.7          88.5
05-May-14         131.1           165         151.8          88.6
06-May-14         129.9         163.4         151.2          87.1
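For reference, the transpose-and-deduplicate step described in the question can be sketched on a tiny CSV. The data below is made up and only mimics the layout implied by the printouts (dates along the first row, one bond per following row, one date repeated); keep='last' is the current spelling of take_last=True.

```python
import io

import pandas as pd

# Hypothetical wide CSV: first row = dates (02-May-14 repeated),
# each following row = one bond whose first cell is its ISIN.
raw = io.StringIO(
    ",01-May-14,02-May-14,02-May-14,05-May-14\n"
    "XS0089553282,131.7,130.5,131.0,131.1\n"
    "XS0089773484,165.1,164.0,164.9,165.0\n"
)

# header=None keeps the date row as ordinary data (row 0)
df = pd.read_csv(raw, header=None)

# After transposing, column 0 holds the dates; keep="last" keeps the
# last occurrence of each repeated date
df2 = df.T.drop_duplicates(subset=[0], keep="last")
print(df2)
```

Because the date row forces every column to object dtype, the prices come through as strings at this stage.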

Answer

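Assign the index directly instead of reindexing:

indcol = df2.iloc[:, 0]  # .ix is removed in current pandas; .iloc is the positional equivalent
df2.index = indcol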

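The problem with reindex is that it conforms the DataFrame to the labels you pass by looking them up in the existing index (and column) values. The new values you passed in don't exist in the current index, which is why you get all NaNs.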
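A minimal sketch of the difference, using a made-up three-row frame: reindex looks each new label up in the existing index, while assigning to .index simply renames the rows that are already there.

```python
import pandas as pd

# Small frame whose index is still the default 0..2 (values are made up)
df2 = pd.DataFrame({0: ["01-May-14", "02-May-14", "05-May-14"],
                    2: [131.7, 131.0, 131.1]})
indcol = df2.iloc[:, 0]

# reindex() looks up each date string in the existing integer index;
# none of them exist there, so every cell comes back as NaN
reindexed = df2.reindex(indcol)

# Assigning to .index relabels the existing rows instead
relabelled = df2.copy()
relabelled.index = indcol
relabelled = relabelled.drop(columns=0)  # the dates now live in the index

print(reindexed)
print(relabelled)
```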

A simpler way to do what you're trying to do:

In [147]: 
# take the cols and index values of interest 
cols = df.loc[0, '2':] 
idx = df['0'].iloc[1:] 
print(cols) 
print(idx) 

2 XS0089553282 
3 XS0089773484 
4 XS0092157600 
5 XS0092541969 
Name: 0, dtype: object 

1 01-May-14 
3 02-May-14 
5 05-May-14 
7 06-May-14 
Name: 0, dtype: object 

In [157]: 
# drop the first row and the first column 
df2 = df.drop('0', axis=1).drop(0) 
# overwrite the index values 
df2.index = idx.values 
df2 

Out[157]: 
               2      3      4     5
01-May-14  131.7  165.1  151.8  88.9
02-May-14    131  164.9  151.7  88.5
05-May-14  131.1    165  151.8  88.6
06-May-14  129.9  163.4  151.2  87.1

In [158]: 
# now overwrite the column values  
df2.columns = cols.values 
df2 

Out[158]: 
           XS0089553282  XS0089773484  XS0092157600  XS0092541969
01-May-14         131.7         165.1         151.8          88.9
02-May-14           131         164.9         151.7          88.5
05-May-14         131.1           165         151.8          88.6
06-May-14         129.9         163.4         151.2          87.1
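The same idea can be written positionally with iloc, which avoids depending on the column labels being the strings '0', '2', and so on. This sketch rebuilds a cut-down, hypothetical version of the question's df2 inline (two bonds, two dates) so it runs on its own; the final astype(float) is an extra step, assuming you want numeric columns, because the cells are object dtype while the ISIN row is present.

```python
import pandas as pd

# Hypothetical stand-in for the question's df2: row 0 holds the ISIN codes,
# column 0 holds the dates (values taken from the question's printout)
df2 = pd.DataFrame(
    [[None, "XS0089553282", "XS0089773484"],
     ["01-May-14", 131.7, 165.1],
     ["02-May-14", 131.0, 164.9]],
    index=[0, 1, 3],
)

header = df2.iloc[0, 1:]  # first row minus the date column -> new column labels
dates = df2.iloc[1:, 0]   # first column minus the header row -> new index

result = df2.iloc[1:, 1:].copy()
result.index = dates.values
result.columns = header.values
result = result.astype(float)  # cells were object dtype because row 0 held strings
print(result)
```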

Answer

In [310]: 
cols = df.iloc[0, 1:]
cols 
Out[310]: 
1 XS0089553282 
2 XS0089773484 
3 XS0092157600 
4 XS0092541969 
Name: 0, dtype: object 

In [311]: 
df.drop(0, inplace=True)
df 
Out[311]: 
           0      1      2      3     4
1  01-May-14  131.7  165.1  151.8  88.9
2  02-May-14    131  164.9  151.7  88.5
3  05-May-14  131.1    165  151.8  88.6
4  06-May-14  129.9  163.4  151.2  87.1

In [312]: 
df.set_index(0, inplace=True)
df 

Out[312]: 
               1      2      3     4
0
01-May-14  131.7  165.1  151.8  88.9
02-May-14    131  164.9  151.7  88.5
05-May-14  131.1    165  151.8  88.6
06-May-14  129.9  163.4  151.2  87.1

In [315]:
df.columns = cols
df
Out[315]: 
           XS0089553282  XS0089773484  XS0092157600  XS0092541969
01-May-14         131.7         165.1         151.8          88.9
02-May-14           131         164.9         151.7          88.5
05-May-14         131.1           165         151.8          88.6
06-May-14         129.9         163.4         151.2          87.1
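If inplace=True is a problem in your setup, the steps from this answer can be chained and reassigned instead. The frame below is a made-up stand-in shaped like this answer's df (integer labels everywhere, as read_csv(header=None) produces); using cols.values and clearing the index name are small additions so the result carries no leftover 0 labels.

```python
import pandas as pd

# Made-up frame shaped like the answer's df: row 0 holds the ISINs,
# column 0 the dates, integer row and column labels
df = pd.DataFrame(
    [[None, "XS0089553282", "XS0089773484"],
     ["01-May-14", "131.7", "165.1"],
     ["02-May-14", "131", "164.9"]]
)

cols = df.iloc[0, 1:]  # ISIN codes from the first row

# Same steps as the answer, chained and reassigned rather than inplace=True
out = df.drop(0).set_index(0)
out.columns = cols.values  # .values avoids carrying the Series name (0) over
out.index.name = None      # drop the leftover "0" index name
print(out)
```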
inplace=True gives me an error: TypeError: drop() got an unexpected keyword argument 'inplace' – Oaka13

That error says drop has no parameter named 'inplace', which is certainly not the case, so I'm not sure what is happening. Are you sure you followed the same steps? –

I tried the same thing again. It's strange, because set_index has an inplace parameter, but the only parameters of df.drop are labels, axis and level. – Oaka13
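One plausible explanation for the TypeError (an assumption; the thread does not confirm it) is that the installed pandas is older than the version the answer was written against. A quick way to see what the installed DataFrame.drop accepts is to inspect its signature; if inplace is missing, reassigning (df = df.drop(0)) does the same job.

```python
import inspect

import pandas as pd

# Check whether the installed DataFrame.drop accepts an ``inplace`` keyword
params = inspect.signature(pd.DataFrame.drop).parameters
has_inplace = "inplace" in params
print(pd.__version__, "drop() supports inplace:", has_inplace)
```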
