2017-02-11 59 views
4

我是熊猫新手,作为练习,我正在移动一些旧代码/解决方案以从中学习。在这种情况下,我试图计算一个产品价格的综合指数,它在SQL中计算之前。用自定义函数计算熊猫每日总量

这是我在数据帧的论坛中的数据:

id weight  date  price 
0 1 0.002796 2005-11-15 0.998298 
1 1 0.002796 2005-11-16 1.014242 
2 1 0.002796 2005-11-17 1.016452 
3 1 0.002796 2005-11-18 1.026396 
4 1 0.002796 2005-11-19 1.026047 
5 1 0.002796 2005-11-20 1.024285 
6 1 0.002796 2005-11-21 1.018764 
7 1 0.002796 2005-11-22 1.033175 
8 1 0.002796 2005-11-23 1.058509 
9 1 0.002796 2005-11-24 1.061231 
10 1 0.002796 2005-11-25 1.058137 
11 1 0.002796 2005-11-26 0.999380 
12 1 0.002796 2005-11-27 0.990504 
13 1 0.002796 2005-11-28 0.993764 
14 1 0.002796 2005-11-29 0.978754 
15 1 0.002796 2005-11-30 0.992070 
...  ...  ...  ...  ... 
4085 1 0.002796 2017-01-21 0.857420 
4086 1 0.002796 2017-01-22 0.848195 
4087 1 0.002796 2017-01-23 0.791784 
4088 1 0.002796 2017-01-24 0.846603 
4089 1 0.002796 2017-01-25 0.878104 
4090 1 0.002796 2017-01-26 0.806651 
4091 1 0.002796 2017-01-27 0.849316 
4092 1 0.002796 2017-01-28 0.826550 
4093 1 0.002796 2017-01-29 0.848651 
4094 1 0.002796 2017-01-30 0.829643 
4095 1 0.002796 2017-01-31 0.837094 
4096 1 0.002796 2017-02-01 0.846572 
4097 1 0.002796 2017-02-02 0.800163 
4098 1 0.002796 2017-02-03 0.820356 
4099 1 0.002796 2017-02-04 0.818924 
4100 1 0.002796 2017-02-05 0.822157 
4101 1 0.002796 2017-02-06 0.787123 
4102 1 0.002796 2017-02-07 0.796264 
4103 1 0.002796 2017-02-08 0.797241 
4104 1 0.002796 2017-02-09 0.818499 
4105 1 0.002796 2017-02-10 0.810928 

合成指数的计算与日收益,作为当天返回:

保留时间=(Price_day/Price_day_before) - 1

我一直在阅读熊猫,时间系列等,但我很努力去理解在这里执行的具体操作;这是一个滚动?如何获取给定日期和日期前的数据?

回答

2

IIUC要使用pct_change()方法:

In [196]: x 
Out[196]: 
    id weight  date  price 
0 1 0.002796 2005-11-15 0.998298 
1 1 0.002796 2005-11-16 1.014242 
2 1 0.002796 2005-11-17 1.016452 
3 1 0.002796 2005-11-18 1.026396 
4 1 0.002796 2005-11-19 1.026047 
5 1 0.002796 2005-11-20 1.024285 
6 1 0.002796 2005-11-21 1.018764 
7 1 0.002796 2005-11-22 1.033175 
8 1 0.002796 2005-11-23 1.058509 
9 1 0.002796 2005-11-24 1.061231 
10 1 0.002796 2005-11-25 1.058137 
11 1 0.002796 2005-11-26 0.999380 
12 1 0.002796 2005-11-27 0.990504 
13 1 0.002796 2005-11-28 0.993764 
14 1 0.002796 2005-11-29 0.978754 
15 1 0.002796 2005-11-30 0.992070 

In [197]: x['Rt'] = x['price'].pct_change() 

In [198]: x 
Out[198]: 
    id weight  date  price  Rt 
0 1 0.002796 2005-11-15 0.998298  NaN 
1 1 0.002796 2005-11-16 1.014242 0.015971 
2 1 0.002796 2005-11-17 1.016452 0.002179 
3 1 0.002796 2005-11-18 1.026396 0.009783 
4 1 0.002796 2005-11-19 1.026047 -0.000340 
5 1 0.002796 2005-11-20 1.024285 -0.001717 
6 1 0.002796 2005-11-21 1.018764 -0.005390 
7 1 0.002796 2005-11-22 1.033175 0.014146 
8 1 0.002796 2005-11-23 1.058509 0.024521 
9 1 0.002796 2005-11-24 1.061231 0.002572 
10 1 0.002796 2005-11-25 1.058137 -0.002915 
11 1 0.002796 2005-11-26 0.999380 -0.055529 
12 1 0.002796 2005-11-27 0.990504 -0.008882 
13 1 0.002796 2005-11-28 0.993764 0.003291 
14 1 0.002796 2005-11-29 0.978754 -0.015104 
15 1 0.002796 2005-11-30 0.992070 0.013605 

另一种解决方案(使用shift()法):

In [199]: x['Rt2'] = x['price']/x['price'].shift() - 1 

In [200]: x 
Out[200]: 
    id weight  date  price  Rt  Rt2 
0 1 0.002796 2005-11-15 0.998298  NaN  NaN 
1 1 0.002796 2005-11-16 1.014242 0.015971 0.015971 
2 1 0.002796 2005-11-17 1.016452 0.002179 0.002179 
3 1 0.002796 2005-11-18 1.026396 0.009783 0.009783 
4 1 0.002796 2005-11-19 1.026047 -0.000340 -0.000340 
5 1 0.002796 2005-11-20 1.024285 -0.001717 -0.001717 
6 1 0.002796 2005-11-21 1.018764 -0.005390 -0.005390 
7 1 0.002796 2005-11-22 1.033175 0.014146 0.014146 
8 1 0.002796 2005-11-23 1.058509 0.024521 0.024521 
9 1 0.002796 2005-11-24 1.061231 0.002572 0.002572 
10 1 0.002796 2005-11-25 1.058137 -0.002915 -0.002915 
11 1 0.002796 2005-11-26 0.999380 -0.055529 -0.055529 
12 1 0.002796 2005-11-27 0.990504 -0.008882 -0.008882 
13 1 0.002796 2005-11-28 0.993764 0.003291 0.003291 
14 1 0.002796 2005-11-29 0.978754 -0.015104 -0.015104 
15 1 0.002796 2005-11-30 0.992070 0.013605 0.013605 
+0

哇,甚至根本不知道pct_change的()。该文档对确切的计算并不是很好,但结果与操作本身是一致的。 – mydaemon

+0

一旦你添加了shift(),我给你最好的答案,这正是我想到的那种操作。高超! – mydaemon

+0

@mydaemon,很高兴我能帮到你:) – MaxU

3

代码:

df['return'] = df['price']/df['price'].shift(1) - 1

df.shift(1)功能一世n在Python中,Python被用作LEAD()LEG()函数。

2

以生成合成指数从100

df['PrcRtnIdx'] = df.price.pct_change().fillna(0).add(1).cumprod().mul(100) 

print(df) 

    id weight  date  price PrcRtnIdx 
0 1 0.002796 2005-11-15 0.998298 100.000000 
1 1 0.002796 2005-11-16 1.014242 101.597118 
2 1 0.002796 2005-11-17 1.016452 101.818495 
3 1 0.002796 2005-11-18 1.026396 102.814590 
4 1 0.002796 2005-11-19 1.026047 102.779631 
5 1 0.002796 2005-11-20 1.024285 102.603131 
6 1 0.002796 2005-11-21 1.018764 102.050089 
7 1 0.002796 2005-11-22 1.033175 103.493646 
8 1 0.002796 2005-11-23 1.058509 106.031365 
9 1 0.002796 2005-11-24 1.061231 106.304029 
10 1 0.002796 2005-11-25 1.058137 105.994102 
11 1 0.002796 2005-11-26 0.999380 100.108384 
12 1 0.002796 2005-11-27 0.990504 99.219271 
13 1 0.002796 2005-11-28 0.993764 99.545827 
14 1 0.002796 2005-11-29 0.978754 98.042268 
15 1 0.002796 2005-11-30 0.992070 99.376138 

响应开始@mydaemon

df['PrcRtnIdx'] = \ 
    df.groupby('id').price.apply(
     lambda x: x.pct_change().fillna(0).add(1).cumprod().mul(100)) 
+0

你,先生,我睁大眼睛:-O。 因此,这使得每个合成指数的整个计算为P0 = 100?每个产品ID是什么? – mydaemon

+0

@mydaemon请参阅编辑。 – piRSquared

+0

非常棒,谢谢! – mydaemon