Initializing pandas DataFrames with and without index and columns arguments produces different results

If I construct a pandas.DataFrame the following way, I get output that (I think) is peculiar:

import pandas, numpy 

df = pandas.DataFrame(
    numpy.random.rand(100,2), index = numpy.arange(100), columns = ['s1','s2']) 
smoothed = pandas.DataFrame(
    pandas.ewma(df, span = 21), index = df.index, columns = ['smooth1','smooth2'])

When I look at the smoothed values, I get:

>>> smoothed.tail()
    smooth1  smooth2
95      NaN      NaN
96      NaN      NaN
97      NaN      NaN
98      NaN      NaN
99      NaN      NaN

That single call looks like it should be equivalent to the following separate calls, yet they produce a different result:

smoothed2 = pandas.DataFrame(pandas.ewma(df, span = 21)) 
smoothed2.index = df.index 
smoothed2.columns = ['smooth1','smooth2']

Calling DataFrame.tail() again, I get:

>>> smoothed2.tail()
     smooth1   smooth2
95  0.496021  0.501153
96  0.506118  0.507541
97  0.516655  0.544621
98  0.520212  0.543751
99  0.518170  0.572429

Can anyone explain why these two ways of constructing the DataFrame should give different results?

Source

2012-02-23 benjaminmgross

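The result of ewma(df, span=21) is already a DataFrame, so when you pass it to the DataFrame constructor along with a list of columns, the constructor "selects" the columns you passed by label. The result only has columns 's1' and 's2', so selecting 'smooth1' and 'smooth2' yields columns filled with NaN. In this particular case it is difficult to break the link between labels and data. If you had done this instead: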

In [23]: smoothed = DataFrame(ewma(df, span = 21).values, index=df.index, columns = ['smooth1','smooth2']) 
In [24]: smoothed.head() 
Out[24]: 
    smooth1 smooth2 
0 0.218350 0.877693 
1 0.400214 0.813499 
2 0.308564 0.739426 
3 0.433341 0.641891 
4 0.525260 0.620541

it works without a problem. Of course,

smoothed = ewma(df, span=21) 
smoothed.columns = ['smooth1', 'smooth2']

works perfectly well too.
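For readers on a recent pandas release, here is a minimal sketch of the same behaviour. It assumes a version where the top-level pandas.ewma function is no longer available and uses df.ewm(span=21).mean() in its place; the variable names (ewm, selected, relabeled, renamed) and the seeded random generator are illustrative choices, not part of the original post.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 2)), index=np.arange(100), columns=["s1", "s2"])

# Stand-in for the older pandas.ewma(df, span=21) call used in the question.
ewm = df.ewm(span=21).mean()

# Passing a DataFrame plus column labels selects by label.
# 'smooth1' and 'smooth2' do not exist in ewm, so every value comes back NaN.
selected = pd.DataFrame(ewm, index=df.index, columns=["smooth1", "smooth2"])

# Passing the underlying array drops the labels, so the new names
# are attached by position and the data is preserved.
relabeled = pd.DataFrame(ewm.to_numpy(), index=df.index, columns=["smooth1", "smooth2"])

# Renaming the existing columns avoids rebuilding the frame at all.
renamed = ewm.rename(columns={"s1": "smooth1", "s2": "smooth2"})

print(selected.tail())
print(relabeled.tail())
```

If the goal is only to change the labels, renaming is probably the least surprising option, since it never goes back through the constructor's label alignment.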

Source

2012-02-23 21:25:24

Wes, you're amazing. Thank you for building such an amazing abstraction, and thanks for such a quick response! – benjaminmgross 2012-02-23 21:32:49
