2016-11-10

Delete rows from a DataFrame when the index (DateTime) falls on a Sunday

Sample data:

                              Open     High      Low    Close
DateTime
2016-01-03 00:00:00+00:00  1.08701  1.08723  1.08451  1.08515
2016-01-04 00:00:00+00:00  1.08701  1.09464  1.07811  1.08239
2016-01-05 00:00:00+00:00  1.08238  1.08388  1.07106  1.07502
2016-01-06 00:00:00+00:00  1.07504  1.07994  1.07185  1.07766
2016-01-07 00:00:00+00:00  1.07767  1.09401  1.07710  1.09256
2016-01-08 00:00:00+00:00  1.09255  1.09300  1.08030  1.09218

DateTime is the index. I need to delete the rows whose DateTime falls on a Sunday or a Saturday (e.g. 2016-01-03, which is a Sunday).

I read this data from a CSV file:

import pandas as pd

df = pd.read_csv(filename, names=['DateTime', 'Open', 'High', 'Low', 'Close'],
                 parse_dates=[0], index_col='DateTime')
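As a minimal sketch of that read, assuming the file has no header row, the snippet below feeds two of the sample rows through io.StringIO as a hypothetical stand-in for filename. It checks that parse_dates plus index_col produces a DatetimeIndex, which is what exposes the .weekday and .dayofweek attributes used in the comments further down.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: no header row, comma-separated.
csv_text = (
    "2016-01-03 00:00:00+00:00,1.08701,1.08723,1.08451,1.08515\n"
    "2016-01-04 00:00:00+00:00,1.08701,1.09464,1.07811,1.08239\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    names=['DateTime', 'Open', 'High', 'Low', 'Close'],
    parse_dates=[0],        # parse the first column as datetimes
    index_col='DateTime',   # and use that column as the index
)

# The index is a DatetimeIndex, so weekday information lives on df.index.
print(type(df.index).__name__)
print(df.index.weekday.tolist())  # Monday=0 ... Sunday=6
```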

I tried something like the following, but it didn't work:

df = df.drop(df[df.weekday() == 6].index) #delete Sundays 

You can do 'df = df[df.index.weekday != 6]'. What you tried won't work because 'drop' looks up index labels to drop, and you passed a boolean Series instead. – EdChum
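A small sketch of both patterns on a frame built from the sample's dates and Close values. Boolean indexing keeps the rows where the mask is True, whereas drop expects index labels, so a working drop variant passes the labels selected by the mask rather than the mask itself. Note also that the original attempt calls df.weekday(); DataFrame has no such method, and the weekday attribute lives on the DatetimeIndex (df.index.weekday).

```python
import pandas as pd

# Dates and Close values from the sample data (2016-01-03 is a Sunday).
idx = pd.date_range('2016-01-03', periods=6, freq='D', tz='UTC', name='DateTime')
df = pd.DataFrame({'Close': [1.08515, 1.08239, 1.07502, 1.07766, 1.09256, 1.09218]},
                  index=idx)

# 1) Boolean indexing: keep every row whose weekday is not Sunday (6).
kept = df[df.index.weekday != 6]

# 2) drop(): pass the *labels* to remove, selected with the same test.
sundays = df.index[df.index.weekday == 6]
dropped = df.drop(sundays)

print(kept.index.strftime('%Y-%m-%d').tolist())
print(kept.equals(dropped))
```

Both give the same five rows (Monday 2016-01-04 through Friday 2016-01-08); the boolean-mask form is the shorter of the two.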


I'd say: 'df = df.loc[df.index.dayofweek < 5]' – MaxU
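dayofweek uses the same numbering as weekday (Monday=0 through Sunday=6), so '< 5' keeps Monday to Friday and removes both Saturdays and Sundays, which matches the original requirement. The sketch below checks this on a hypothetical daily range spanning two weekends; the 'value' column is placeholder data, not from the question.

```python
import pandas as pd

# Hypothetical daily index: Fri 2016-01-01 through Sun 2016-01-10.
idx = pd.date_range('2016-01-01', '2016-01-10', freq='D', name='DateTime')
df = pd.DataFrame({'value': range(len(idx))}, index=idx)  # placeholder values

# Monday=0 ... Friday=4, Saturday=5, Sunday=6: keep weekdays only.
weekdays_only = df.loc[df.index.dayofweek < 5]

print(weekdays_only.index.day_name().tolist())
```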


'df = df[df.index.weekday != 6]' works. –

Answer


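You could use asfreq('B') to reindex df so that it contains only business-day (Monday to Friday) rows. Note, however, that if a business day is missing from df.index, asfreq will return a DataFrame with a row of NaNs for that date to indicate the missing row. Also note that df.index must be a DatetimeIndex.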

In [106]: df.asfreq('B') 
Out[106]: 
               Open     High      Low    Close
2016-01-04  1.08701  1.09464  1.07811  1.08239
2016-01-05  1.08238  1.08388  1.07106  1.07502
2016-01-06  1.07504  1.07994  1.07185  1.07766
2016-01-07  1.07767  1.09401  1.07710  1.09256
2016-01-08  1.09255  1.09300  1.08030  1.09218

Here is the setup used to produce the result above:

import pandas as pd

df = pd.DataFrame({
    'DateTime': ['2016-01-03 00:00:00+00:00', '2016-01-04 00:00:00+00:00', '2016-01-05 00:00:00+00:00',
                 '2016-01-06 00:00:00+00:00', '2016-01-07 00:00:00+00:00', '2016-01-08 00:00:00+00:00'],
    'Open':  [1.08701, 1.08701, 1.08238, 1.07504, 1.07767, 1.09255],
    'High':  [1.08723, 1.09464, 1.08388, 1.07994, 1.09401, 1.09300],
    'Low':   [1.08451, 1.07811, 1.07106, 1.07185, 1.07710, 1.08030],
    'Close': [1.08515, 1.08239, 1.07502, 1.07766, 1.09256, 1.09218]})
df['DateTime'] = pd.to_datetime(df['DateTime'])  # parse into timezone-aware timestamps
df = df.set_index('DateTime')                    # asfreq needs a DatetimeIndex
print(df.asfreq('B'))                            # Sunday 2016-01-03 is dropped
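To illustrate the NaN caveat from the answer, here is a sketch that removes one business day (Wednesday 2016-01-06, chosen arbitrarily) from a frame built from the sample's Close values and then calls asfreq('B'). The missing day comes back as a row of NaNs. dropna(how='all') is one way to discard such placeholder rows, while a weekday mask like the ones in the comments never inserts them in the first place.

```python
import pandas as pd

# Dates and Close values from the sample (2016-01-03 is a Sunday).
idx = pd.date_range('2016-01-03', periods=6, freq='D', tz='UTC', name='DateTime')
df = pd.DataFrame({'Close': [1.08515, 1.08239, 1.07502, 1.07766, 1.09256, 1.09218]},
                  index=idx)

# Simulate a gap: remove Wednesday 2016-01-06, which is a business day.
gappy = df.drop(pd.Timestamp('2016-01-06', tz='UTC'))

# asfreq('B') reindexes onto every business day between the first and last
# timestamps, so the missing Wednesday comes back as a NaN row.
b = gappy.asfreq('B')
print(b)

# Option 1: discard rows that are entirely NaN after asfreq.
b_clean = b.dropna(how='all')

# Option 2: filter by weekday; this never inserts placeholder rows.
masked = gappy[gappy.index.dayofweek < 5]

print(b_clean.index.strftime('%Y-%m-%d').tolist())
print(masked.index.strftime('%Y-%m-%d').tolist())
```

One design caveat: dropna(how='all') would also discard rows that were already entirely NaN in the original data, so the weekday mask is the more direct way to express "remove weekends" when gaps are possible.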

'df = df.asfreq('B')' works. Thanks, unutbu. –