Pandas rolling apply with a custom function

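I have been following a similar answer here, but I have run into problems when using sklearn together with rolling apply. I want to compute z-scores and run PCA with rolling apply, but I keep getting an 'only length-1 arrays can be converted to Python scalars' error.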

Following the earlier example, I create a DataFrame:

from sklearn.preprocessing import StandardScaler 
import pandas as pd 
import numpy as np 
sc=StandardScaler() 
tmp=pd.DataFrame(np.random.randn(2000,2)/10000,index=pd.date_range('2001-01-01',periods=2000),columns=['A','B'])

If I use the rolling command:

tmp.rolling(window=5,center=False).apply(lambda x: sc.fit_transform(x)) 
TypeError: only length-1 arrays can be converted to Python scalars

I get this error. However, I can create functions using the mean and the standard deviation without any problem.

def test(df): 
    return np.mean(df) 
tmp.rolling(window=5,center=False).apply(lambda x: test(x))

I believe the error occurs when I try to subtract the mean from the current values for the z-score.

def test2(df): 
    return df-np.mean(df) 
tmp.rolling(window=5,center=False).apply(lambda x: test2(x)) 
only length-1 arrays can be converted to Python scalars

How can I use sklearn to create a custom rolling function that first standardizes and then runs PCA?

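EDIT: I realize my question was not entirely clear, so I will try again. I want to standardize my values and then run PCA to get the amount of variance explained by each factor. Doing this without rolling is fairly straightforward.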

from sklearn import decomposition 
testing=sc.fit_transform(tmp) 
pca=decomposition.PCA() #run pca 
pca.fit(testing) 
pca.explained_variance_ratio_ 
array([ 0.50967441, 0.49032559])

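I cannot use this same process when rolling. Using @piRSquared's rolling z-score function gives me the z-scores. It seems that sklearn's PCA is incompatible with a custom function passed to rolling apply. (In fact, I think this is the case for most sklearn modules.) I am only trying to get the explained variance, which is a one-dimensional item, but the code below returns a bunch of NaNs.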

def test3(df): 
    pca.fit(df) 
    return pca.explained_variance_ratio_ 
tmp.rolling(window=5,center=False).apply(lambda x: test3(x))

I could instead write my own explained-variance function, but that does not work either.

def test4(df): 
    cov_mat=np.cov(df.T) #need covariance of features, not observations 
    eigen_vals,eigen_vecs=np.linalg.eig(cov_mat) 
    tot=sum(eigen_vals) 
    var_exp=[(i/tot) for i in sorted(eigen_vals,reverse=True)] 
    return var_exp 
tmp.rolling(window=5,center=False).apply(lambda x: test4(x))

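I get this error: '0-dimensional array given. Array must be at least two-dimensional'. To recap, I want to compute rolling z-scores and then output the explained variance at each roll. I have the rolling z-scores working, but not the explained variance.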


2016-12-04 Bobe Kryant

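What is your expected output? Pandas rolling functions are supposed to produce a single scalar value from a bunch of input. If you want to do more complex operations on the chunks, you will have to "roll your own". – BrenBarn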
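To make the comment above concrete, here is a minimal sketch that records what rolling().apply actually hands to the function. The helper name record_shape and the small 20-row frame are illustrative assumptions, not part of the original question. With raw=True each call receives a 1-D NumPy array taken from a single column, and the function is expected to return one scalar for it.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data, not the asker's).
tmp = pd.DataFrame(np.random.randn(20, 2), columns=['A', 'B'])

seen_shapes = set()

def record_shape(window):
    # With raw=True, `window` is a 1-D NumPy array from ONE column.
    seen_shapes.add(window.shape)
    # Returning a single scalar is what rolling().apply expects.
    return window.mean()

result = tmp.rolling(window=5).apply(record_shape, raw=True)

# The function never sees a 2-D block containing both columns.
print(seen_shapes)
```

Because the function only ever sees one column at a time, anything that needs both columns together (a covariance matrix, a PCA fit) cannot be computed inside this kind of apply.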

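As @BrenBarn commented, the rolling function needs to reduce a vector to a single number. The following is equivalent to what you are trying to do and helps highlight the problem.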

zscore = lambda x: (x - x.mean())/x.std() 
tmp.rolling(5).apply(zscore)

TypeError: only length-1 arrays can be converted to Python scalars

In the zscore function, x.mean() reduces and x.std() reduces, but x itself is an array. Therefore the whole expression is an array.

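The way around this is to apply the roll only to the parts of the z-score calculation that need it, and not to the part that causes the problem.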

(tmp - tmp.rolling(5).mean())/tmp.rolling(5).std()
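As a rough check of the expression above, the sketch below compares it with an apply-based version that does reduce each window to a single number: the z-score of the window's last value. The helper name last_zscore and the seeded sample data are assumptions made for illustration. Note that NumPy's std defaults to ddof=0 while the pandas rolling std uses ddof=1, so ddof=1 is passed explicitly to make the two match.

```python
import numpy as np
import pandas as pd

# Seeded sample data so the comparison is repeatable (illustrative only).
rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((50, 2)),
                   index=pd.date_range('2001-01-01', periods=50),
                   columns=['A', 'B'])

# Vectorized form: only the mean and std are rolled.
z_vec = (tmp - tmp.rolling(5).mean()) / tmp.rolling(5).std()

def last_zscore(window):
    # Reduce the window to ONE number: the z-score of its last value.
    # ddof=1 matches the default used by pandas' rolling std.
    return (window[-1] - window.mean()) / window.std(ddof=1)

z_apply = tmp.rolling(5).apply(last_zscore, raw=True)

# True when both approaches agree (NaNs in the warm-up rows included).
print(np.allclose(z_vec.to_numpy(), z_apply.to_numpy(), equal_nan=True))
```

The vectorized form is usually the better choice, since it avoids calling a Python function once per window.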


2016-12-04 06:31:59 piRSquared

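Thanks for the z-score part. I tried to do something similar for the PCA part, to no avail. Is the lambda messing up the PCA because I am feeding it multiple rows instead of just one? –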
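One possible way to handle the PCA part is to "roll your own" window loop, as suggested in the comment on the question, because rolling().apply passes each column to the function separately and PCA needs all columns of a window at once. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: the helper name rolling_explained_variance and the PC1/PC2 column labels are hypothetical, the data is standardized within each window (which may differ from the standardization the question intends), and the window is assumed to have at least as many rows as there are columns.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Seeded data shaped like the question's frame (fewer rows for speed).
rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((200, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

def rolling_explained_variance(df, window):
    """Hypothetical helper: standardize and fit PCA on each full window."""
    values = df.to_numpy()
    n_rows, n_cols = values.shape
    # Rows before the first full window stay NaN, like pandas rolling output.
    out = np.full((n_rows, n_cols), np.nan)
    for end in range(window, n_rows + 1):
        block = values[end - window:end]            # 2-D: (window, n_cols)
        scaled = StandardScaler().fit_transform(block)
        out[end - 1] = PCA().fit(scaled).explained_variance_ratio_
    cols = ['PC%d' % (i + 1) for i in range(n_cols)]
    return pd.DataFrame(out, index=df.index, columns=cols)

evr = rolling_explained_variance(tmp, window=5)
print(evr.tail())
```

Each row of the result holds the explained variance ratios for the window ending on that date. The explicit loop is slower than a built-in rolling method, but it gives the function the full two-dimensional block that sklearn expects.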
