2016-09-18 40 views
1

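Unable to delete the first row of a matrix

I got stuck trying to delete the first row from the log_returns matrix. Basically, I want to get rid of the first row because it contains NaN values. I tried isnan() with no joy and finally landed on numpy.delete(), which sounded the most promising, but it still does not do the job.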

import pandas as pd 
from pandas_datareader import data as web 
import numpy as np 

symbols = ['XOM', 'CVX', 'SLB', 'PXD', 'EOG', 'OXY', 'HAL', 'KMI', 'SE', 'PSX', 'VLO','COP','APC','TSO','WMB','BHI','APA','COG','DVN','MPC','NBL','CXO','NOV','HES','MRO','EQT','XEC','FTI','RRC','OKE','SWN','NFX','HP','MUR','CHK','RIG','DO'] 

try: 
    h9 = pd.HDFStore('port.h9') 
    data = h9['norm'] 
    h9.close() 
except: 
    data = pd.DataFrame() 
    for sym in symbols: 
        data[sym] = web.DataReader(sym, data_source='yahoo',
                                   start='1/1/2010')['Adj Close']
    data = data.dropna() 
    h9 = pd.HDFStore('port.h9') 
    h9['norm'] = data 
    h9.close() 

data.info() 
log_returns = np.log(data/data.shift(1)) 
log_returns.head() 
np.delete(log_returns, 0, 0) 

The last line (the delete) raises the following exception, which makes no sense to me: row 0 is certainly not out of range for the log_returns matrix, whose shape is (1116, 37).

ValueError: Shape of passed values is (37, 1115), indices imply (37, 1116) 
+3

What about `log_returns = log_returns.iloc[1:]`? – MaxU

+0

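The second argument of [`np.delete()`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html) may not be what you think it is. If you only need to throw away the first row, @MaxU's suggestion is the way to go. Also, `np.nan != np.nan` makes the job harder for `np.delete`. –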

+0

MaxU - the iloc approach works great! Thanks a lot. Thanks also to Andras for the response. – skafetaur
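As a rough sketch of the two points raised in the comments: `np.nan` never compares equal to itself, so NaN rows have to be detected with `np.isnan` (or the pandas equivalents) rather than `==`, and `np.delete` works on plain arrays, so applying it to the DataFrame's underlying values returns a bare ndarray without the index or column labels. The data and the name `prices` below are made up for illustration; they are not the poster's data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the price table (hypothetical values).
prices = pd.DataFrame({"A": [10.0, 11.0, 12.1], "B": [20.0, 19.0, 19.95]})
log_returns = np.log(prices / prices.shift(1))

# NaN is not equal to itself, so an equality test cannot find it.
nan_equals_itself = np.nan == np.nan
first_row_is_nan = bool(np.isnan(log_returns.iloc[0]).all())

# np.delete on the raw array drops row 0 but returns a bare ndarray.
trimmed_array = np.delete(log_returns.to_numpy(), 0, axis=0)

# Staying in pandas keeps the index and column labels.
trimmed_frame = log_returns.iloc[1:]
```

Both results hold the same numbers; only `trimmed_frame` still carries the labels, which is one reason to prefer the pandas route here.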

Answers

0

Demo:

In [202]: from pandas_datareader import data as web 

In [218]: df = web.DataReader('XOM', 'yahoo', start='1/1/2010')['Adj Close'] 

In [219]: pd.options.display.max_rows = 10 

In [220]: df 
Out[220]: 
Date 
2010-01-04 57.203028 
2010-01-05 57.426378 
2010-01-06 57.922715 
2010-01-07 57.740730 
2010-01-08 57.509100 
       ... 
2016-09-12 87.290001 
2016-09-13 85.209999 
2016-09-14 84.599998 
2016-09-15 85.080002 
2016-09-16 84.029999 
Name: Adj Close, dtype: float64 

In [221]: np.log(df.head(10).pct_change() + 1) 
Out[221]: 
Date 
2010-01-04   NaN 
2010-01-05 0.003897 
2010-01-06 0.008606 
2010-01-07 -0.003147 
2010-01-08 -0.004020 
2010-01-11 0.011157 
2010-01-12 -0.004991 
2010-01-13 -0.004011 
2010-01-14 0.000144 
2010-01-15 -0.008214 
Name: Adj Close, dtype: float64 
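The demo computes log returns as `np.log(pct_change() + 1)`, while the question uses `np.log(data / data.shift(1))`. The two should give the same values, since `pct_change()` is the ratio to the previous row minus one. A small check, using a handful of illustrative prices rounded from the output above (not a live download):

```python
import numpy as np
import pandas as pd

# Illustrative prices only, rounded from the demo output.
prices = pd.Series([57.20, 57.43, 57.92, 57.74, 57.51], name="Adj Close")

via_shift = np.log(prices / prices.shift(1))       # formula from the question
via_pct_change = np.log(prices.pct_change() + 1)   # formula from the demo

# Compare everything after the first row, which is NaN in both cases.
same_values = np.allclose(via_shift.iloc[1:], via_pct_change.iloc[1:])
both_start_with_nan = bool(
    np.isnan(via_shift.iloc[0]) and np.isnan(via_pct_change.iloc[0])
)
```

In both forms the first row is NaN because there is no earlier price to compare against, which is the row the question wants to drop.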

Solutions:

In [224]: np.log(df.pct_change() + 1).dropna() 
Out[224]: 
Date 
2010-01-05 0.003897 
2010-01-06 0.008606 
2010-01-07 -0.003147 
2010-01-08 -0.004020 
2010-01-11 0.011157 
       ... 
2016-09-12 0.005169 
2016-09-13 -0.024117 
2016-09-14 -0.007185 
2016-09-15 0.005658 
2016-09-16 -0.012418 
Name: Adj Close, dtype: float64 

Or:

In [225]: np.log(df.pct_change() + 1).iloc[1:] 
Out[225]: 
Date 
2010-01-05 0.003897 
2010-01-06 0.008606 
2010-01-07 -0.003147 
2010-01-08 -0.004020 
2010-01-11 0.011157 
       ... 
2016-09-12 0.005169 
2016-09-13 -0.024117 
2016-09-14 -0.007185 
2016-09-15 0.005658 
2016-09-16 -0.012418 
Name: Adj Close, dtype: float64 

Or:

In [227]: np.log(df.pct_change() + 1).drop(df.index[0]) 
Out[227]: 
Date 
2010-01-05 0.003897 
2010-01-06 0.008606 
2010-01-07 -0.003147 
2010-01-08 -0.004020 
2010-01-11 0.011157 
       ... 
2016-09-12 0.005169 
2016-09-13 -0.024117 
2016-09-14 -0.007185 
2016-09-15 0.005658 
2016-09-16 -0.012418 
Name: Adj Close, dtype: float64
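For anyone who wants to try the three variants without downloading prices, here is a minimal sketch on a small DataFrame shaped like the one in the question. The prices are randomly generated and the three column names are borrowed from the question's symbol list purely as labels; this is an assumption-laden illustration, not the original data.

```python
import numpy as np
import pandas as pd

# Synthetic, strictly positive prices on business days (hypothetical data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2010-01-04", periods=6)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(6, 3)), axis=0)),
    index=dates,
    columns=["XOM", "CVX", "SLB"],
)

log_returns = np.log(prices / prices.shift(1))  # first row is all NaN

by_dropna = log_returns.dropna()                      # drop rows containing NaN
by_iloc = log_returns.iloc[1:]                        # drop by position
by_drop = log_returns.drop(log_returns.index[0])      # drop by index label
```

On data like this all three give the same result. They differ in intent: `dropna()` removes every row that contains a NaN, so it would also drop later rows with gaps, whereas `iloc[1:]` and `drop(...)` remove only the first row.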