2017-05-24 30 views
4

How to sum a Python pandas DataFrame within a certain time range

I have a DataFrame like this:

df

order_date amount 
0 2015-10-02  1 
1 2015-12-21  15 
2 2015-12-24  3 
3 2015-12-26  4 
4 2015-12-27  5 
5 2015-12-28  10 

For each row, I want to sum df['amount'] over the rows whose order_date falls between that row's df['order_date'] and df['order_date'] + 6 days:

order_date amount sum 
0 2015-10-02  1  1 
1 2015-12-21  15  27 //comes from 15 + 3 + 4 + 5 
2 2015-12-24  3  22 //comes from 3 + 4 + 5 + 10 
3 2015-12-26  4  19 
4 2015-12-27  5  15 
5 2015-12-28  10  10 

The dtype of order_date is datetime. I tried using iloc, but it did not work well. If anyone has an idea or a working example, please let me know.

+0

If 'order_date' is some kind of datetime object, you can try this: 'from datetime import timedelta' and then 'df['order_date'] + timedelta(days=6)' – Taylor
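As a minimal sketch of what the comment suggests, the snippet below rebuilds the question's sample data (the construction itself is an assumption, since the thread never shows how df was created) and adds six days to every order_date to get the end of each row's window.

```python
from datetime import timedelta

import pandas as pd

# Sample data copied from the question; how the original df was built is assumed.
df = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2015-10-02", "2015-12-21", "2015-12-24",
        "2015-12-26", "2015-12-27", "2015-12-28",
    ]),
    "amount": [1, 15, 3, 4, 5, 10],
})

# Adding a timedelta to a datetime column shifts every row by six days,
# giving the inclusive end of each row's window.
window_end = df["order_date"] + timedelta(days=6)
print(window_end)
```

This only computes the window boundaries; the sums still have to be taken over the rows that fall inside each window.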

Answers

0

Expanding on my comment:

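from datetime import timedelta

df['sum'] = 0
for i in df.index:
    dt1 = df.loc[i, 'order_date']
    dt2 = dt1 + timedelta(days=6)
    # Rows whose date falls in [dt1, dt2], both ends inclusive
    in_window = (df['order_date'] >= dt1) & (df['order_date'] <= dt2)
    # Assign through .loc; df['sum'][i] = ... is chained assignment and may not update df
    df.loc[i, 'sum'] = df.loc[in_window, 'amount'].sum()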

There may be a better way to do this, but it works.
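One possible way to avoid the Python-level loop, sketched here as an alternative rather than as this answer's method, is to combine a prefix sum with np.searchsorted. The helper name forward_window_sum is hypothetical, and the sketch assumes the frame is already sorted by order_date.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2015-10-02", "2015-12-21", "2015-12-24",
        "2015-12-26", "2015-12-27", "2015-12-28",
    ]),
    "amount": [1, 15, 3, 4, 5, 10],
})


def forward_window_sum(frame, days=6):
    """Sum `amount` over [order_date, order_date + days] for each row.

    Hypothetical helper; assumes `frame` is sorted by order_date.
    """
    dates = frame["order_date"].to_numpy()
    # Prefix sums with a leading 0: the sum of rows i..j-1 is csum[j] - csum[i].
    csum = np.concatenate(([0], frame["amount"].to_numpy().cumsum()))
    # First row of each window (side="left" keeps rows sharing the same date).
    start = np.searchsorted(dates, dates, side="left")
    # One past the last row whose date is <= order_date + days.
    end = np.searchsorted(dates, dates + np.timedelta64(days, "D"), side="right")
    return csum[end] - csum[start]


df["sum"] = forward_window_sum(df)
print(df)
```

The trade-off is readability: the loop states the condition directly, while this version does two binary searches per row and no per-row filtering.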

0

Here is my approach to this problem. It works. (I think there should be a better way to do it.)

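import pandas as pd

df['order_date'] = pd.to_datetime(df['order_date'])
# Build a continuous daily calendar so that a 7-row window equals 7 days
Temp = pd.DataFrame(pd.date_range(start='2015-10-02', end='2017-01-01'), columns=['STDate'])
Temp = Temp.merge(df, left_on='STDate', right_on='order_date', how='left')
Temp['amount'] = Temp['amount'].fillna(0)
# Sort newest first so the backward-looking window covers the following days
Temp = Temp.sort_values('STDate', ascending=False)
# DataFrame.sort and pd.rolling_sum were removed from pandas; use sort_values and .rolling().sum()
Temp['rolls'] = Temp['amount'].rolling(window=7, min_periods=1).sum()
Temp.loc[Temp['STDate'].isin(df['order_date']), :].sort_values('STDate')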


    STDate Unnamed: 0 order_date amount rolls
0 2015-10-02   0.0 2015-10-02  1.0 1.0 
80 2015-12-21   1.0 2015-12-21 15.0 27.0 
83 2015-12-24   2.0 2015-12-24  3.0 22.0 
85 2015-12-26   3.0 2015-12-26  4.0 19.0 
86 2015-12-27   4.0 2015-12-27  5.0 15.0 
87 2015-12-28   5.0 2015-12-28 10.0 10.0 
0

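Set order_date as a DatetimeIndex so that you can use df.loc[time1:time2] (df.ix in older pandas versions, since removed) to get the rows in a time range, then select the amount column and sum it.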
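To illustrate the slicing idea on its own, here is a small sketch that uses the question's sample values as a Series with a DatetimeIndex. Label-based slicing with .loc on a sorted DatetimeIndex includes both endpoints, which is what makes a "date to date + 6 days" window work. The variable names are illustrative only.

```python
import pandas as pd

# Amounts from the question, indexed by order date (sorted ascending).
s = pd.Series(
    [1, 15, 3, 4, 5, 10],
    index=pd.to_datetime([
        "2015-10-02", "2015-12-21", "2015-12-24",
        "2015-12-26", "2015-12-27", "2015-12-28",
    ]),
    name="amount",
)

start = pd.Timestamp("2015-12-21")
# Both ends of a label slice are inclusive, so 2015-12-27 is part of the window.
window = s.loc[start:start + pd.Timedelta(days=6)]
print(window.sum())  # 27
```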

You can try:

from datetime import timedelta
import pandas as pd

df = pd.read_fwf('test2.csv')
df['order_date'] = pd.to_datetime(df['order_date'])
df = df.set_index(pd.DatetimeIndex(df['order_date']))
sum_list = []
for start in df['order_date']:
    # .loc replaces the removed .ix; label slices on a sorted DatetimeIndex include both ends
    sum_list.append(df.loc[start:start + timedelta(days=6), 'amount'].sum())
df['sum'] = sum_list
df

Output:

  order_date amount sum 
2015-10-02 2015-10-02 1  1 
2015-12-21 2015-12-21 15  27 
2015-12-24 2015-12-24 3  22 
2015-12-26 2015-12-26 4  19 
2015-12-27 2015-12-27 5  15 
2015-12-28 2015-12-28 10  10 
+0

Thanks, this is better than my solution. – Wen

0
import datetime
import pandas as pd

df['order_date'] = pd.to_datetime(df['order_date'], format='%Y-%m-%d') 
df.set_index(['order_date'], inplace=True) 

# Sum rows within the range of six days in the future 
d = {t: df[(df.index >= t) & (df.index <= t + datetime.timedelta(days=6))]['amount'].sum() 
     for t in df.index} 

# Assign the summed values back to the dataframe 
df['amount_sum'] = [d[t] for t in df.index] 

df is now:

  amount amount_sum 
order_date      
2015-10-02  1.0   1.0 
2015-12-21 15.0  27.0 
2015-12-24  3.0  22.0 
2015-12-26  4.0  19.0 
2015-12-27  5.0  15.0 
2015-12-28 10.0  10.0 
3

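If pandas rolling allowed left-aligned windows (the default is right-aligned), the answer would be a simple one-liner: df.set_index('order_date').amount.rolling('7d',min_periods=1,align='left').sum(). However, forward-looking windows are not implemented yet (i.e., rolling does not accept an align parameter). So the trick I came up with is to temporarily "reverse" the dates. Solution: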

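# Mirror the dates around the latest one so that later dates come first
# (pd.datetime was removed from pandas, so a fixed reference date is used instead of now())
ref = df['order_date'].max()
df.index = pd.DatetimeIndex(ref + (ref - df['order_date']))
df['sum'] = df.sort_index()['amount'].rolling('7D', min_periods=1).sum()
df = df.reset_index(drop=True)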

Output:

order_date amount sum 
0 2015-10-02  1 1.0 
1 2015-12-21  15 27.0 
2 2015-12-24  3 22.0 
3 2015-12-26  4 19.0 
4 2015-12-27  5 15.0 
5 2015-12-28  10 10.0
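The date-reversal trick can be wrapped in a small helper, sketched below under two assumptions: the dates are unique, and the function name forward_rolling_sum is hypothetical. Mirroring each date around a reference point turns "the next 7 days" into "the previous 7 days", which an ordinary right-aligned rolling window can handle.

```python
import pandas as pd


def forward_rolling_sum(dates, values, days=7):
    """Forward-looking sum over [date, date + days) using a backward rolling window.

    Hypothetical helper; assumes `dates` is a Series of unique datetimes.
    """
    ref = dates.max()
    # Mirror each date around `ref`, so later dates become earlier ones.
    mirrored = pd.DatetimeIndex(ref + (ref - dates))
    s = pd.Series(values.to_numpy(), index=mirrored).sort_index()
    rolled = s.rolling(f"{days}D", min_periods=1).sum()
    # Map each result back to its original row through the mirrored date.
    return pd.Series(rolled.reindex(mirrored).to_numpy(), index=dates.index)


# Sample data copied from the question.
df = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2015-10-02", "2015-12-21", "2015-12-24",
        "2015-12-26", "2015-12-27", "2015-12-28",
    ]),
    "amount": [1, 15, 3, 4, 5, 10],
})
df["sum"] = forward_rolling_sum(df["order_date"], df["amount"])
print(df)
```

A 7-day window that is open on one side covers the same rows as "date through date + 6 days" when the timestamps have no time-of-day component, which is the case for the sample data.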