2015-11-08 65 views

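How do I slice a minute-resolution pandas df (freq = 1min) with a day-resolution indexer (freq = 1D)?

I want to keep the data at "1 minute" resolution and return the subset of the df whose dates match one of the three dates in the indexer.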

import pandas as pd 
import numpy as np 

df=pd.DataFrame(index=pd.date_range("2013-10-08 00:00:00","2015-10-08 00:00:00", freq="1min",tz='UTC')) 
df['data']=np.random.randint(0,2,len(df)) 
indexer=["2013-12-24","2014-01-16","2015-02-19"] 

The following does not work:

df.loc[pd.DatetimeIndex(indexer)] 

Answer


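You need to get the integer positions with the function get_loc; then you can select the data by position with df.iloc (df.ix in older pandas versions; it was removed in pandas 1.0).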

However, you need not only the midnight timestamp but all 1440 minutes of each day.
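To illustrate the point, here is a minimal sketch on a hypothetical two-day index (not the df from the question). It assumes a UTC-aware, 1-minute index: get_loc returns only the position of midnight, and the full day is the 1440 positions starting there.

```python
import pandas as pd

# Hypothetical small index: two days of 1-minute UTC timestamps.
index = pd.date_range("2013-12-24", periods=2 * 1440, freq="1min", tz="UTC")

# A bare date parses to midnight of that day.
day = pd.Timestamp("2013-12-25", tz="UTC")
start = index.get_loc(day)  # integer position of 2013-12-25 00:00:00+00:00

# One calendar day at 1-minute frequency spans 1440 consecutive positions.
minutes_per_day = 24 * 60
day_slice = index[start:start + minutes_per_day]

print(start)           # 1440
print(len(day_slice))  # 1440
print(day_slice[-1])   # 2013-12-25 23:59:00+00:00
```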

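I use a list comprehension again, this time with the function range, to cover each day from midnight 00:00:00+00:00 to 23:59:00+00:00. Finally, the result is flattened, because the comprehension over range returns a list of lists.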

print(df.head()) 
#       data 
#2013-10-08 00:00:00+00:00  0 
#2013-10-08 00:01:00+00:00  0 
#2013-10-08 00:02:00+00:00  1 
#2013-10-08 00:03:00+00:00  0 
#2013-10-08 00:04:00+00:00  0 

#list comprehension - get the integer position of each date (midnight, UTC) 
idx = [df.index.get_loc(pd.to_datetime(i, utc=True)) for i in indexer] 
print(idx) 
#[110880, 144000, 718560] 

#extend each position to the whole day: range(i, i+1440) covers 1440 minutes, because the end is exclusive 
idx = [range(i, i+1440) for i in idx] 
#flatten list 
idx = [y for x in idx for y in x] 

#select df by integer positions (df.ix was removed in pandas 1.0) 
df = df.iloc[idx] 

print(df.head()) 
#       data 
#2013-12-24 00:00:00+00:00  1 
#2013-12-24 00:01:00+00:00  0 
#2013-12-24 00:02:00+00:00  0 
#2013-12-24 00:03:00+00:00  0 
#2013-12-24 00:04:00+00:00  1 
print(df.tail()) 
#       data 
#2015-02-19 23:55:00+00:00  1 
#2015-02-19 23:56:00+00:00  0 
#2015-02-19 23:57:00+00:00  0 
#2015-02-19 23:58:00+00:00  0 
#2015-02-19 23:59:00+00:00  0
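
As a possible alternative to the positional approach above (not part of the original answer), the same subset can probably be obtained with a boolean mask. This is a sketch that assumes the index is UTC-aware as in the question: it floors every timestamp to midnight with normalize() and keeps the rows whose day is in the indexer. The names days, mask and subset are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same setup as in the question (randint stands in for the deprecated random_integers).
df = pd.DataFrame(index=pd.date_range("2013-10-08", "2015-10-08", freq="1min", tz="UTC"))
df["data"] = np.random.randint(0, 2, len(df))
indexer = ["2013-12-24", "2014-01-16", "2015-02-19"]

# Parse the wanted dates as UTC midnights so they are comparable with the tz-aware index.
days = pd.to_datetime(indexer, utc=True)

# normalize() floors each timestamp to midnight; isin() marks the rows that fall on a wanted day.
mask = df.index.normalize().isin(days)
subset = df[mask]

print(len(subset))  # 4320, i.e. 3 days * 1440 minutes
```

Unlike the get_loc version, this does not rely on every day having exactly 1440 consecutive rows, so it should also tolerate gaps in the index.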