2017-06-18 78 views
0

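Calculating the sum of cumulative products over a window frame

I want to estimate how far my seasonal forecast deviates from the actual data. I have the following dataset: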

day   real_revenue historical_coeff 
01/01/2017 100    1.1 
01/02/2017 105    0.98 
01/03/2017 109    1.05 
01/04/2017 107    1.07 
01/05/2017 90    1 
01/06/2017 120    0.95 
01/07/2017 98    0.99 

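On 01/01/2017 revenue = 100, and the seasonal forecast takes each day's day-over-day coefficient and applies it to the current revenue. So it predicts that revenue on 01/02/2017 will be 100*1.1 = 110, on 01/03/2017 it will be 110*0.98 = 107.8, and so on. The forecasted remaining revenue is then the sum over all forecasted days. For example, for 01/01/2017, after applying the day-over-day coefficients, the sum is 688.274235.

For the next day, 01/02/2017, we start from the value 105. So we predict 105*0.98 = 102.9 for 01/03/2017, then 102.9*1.05 = 108.045 for 01/04/2017, and so on. The total forecasted remaining revenue will be 531.2557215.

The final table I would like to get looks like this: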

day   forecasted_total_remaining_revenue 
01/01/2017 688.274235 
01/02/2017 531.2557 
01/03/2017 ... 
01/04/2017 ... 
01/05/2017 ... 
01/06/2017 ... 
01/07/2017 ... 

Essentially, for each day I need the sum of cumulative products, i.e. a + a*b + a*b*c + a*b*c*d + ...
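To make the target concrete, here is a minimal sketch in Python (not SQL) of the arithmetic described above for a single starting day. The function name `forecast_remaining` is hypothetical, and the sketch assumes that the last day's coefficient is never applied, since there is no following day inside the 7-day data set.

```python
# Sample data from the question: (day, real_revenue, historical_coeff).
rows = [
    ("01/01/2017", 100, 1.10),
    ("01/02/2017", 105, 0.98),
    ("01/03/2017", 109, 1.05),
    ("01/04/2017", 107, 1.07),
    ("01/05/2017", 90, 1.00),
    ("01/06/2017", 120, 0.95),
    ("01/07/2017", 98, 0.99),
]


def forecast_remaining(rows, start):
    """Sum of the forecasts for the days that follow rows[start].

    Assumption: the coefficient of the last row is skipped, because
    it would forecast a day outside the data set.
    """
    value = rows[start][1]  # real revenue on the starting day
    total = 0.0
    for _, _, coeff in rows[start:-1]:
        value *= coeff  # forecast for the next day
        total += value  # running sum of the cumulative products
    return total


first_day_total = forecast_remaining(rows, 0)
print(round(first_day_total, 6))
```

For the first day this should reproduce the 688.274235 quoted in the question.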

Is it possible to write such a query in Vertica or SQL?

+0

Shouldn't the result for '01/01/2017' be '802.18129365' according to the logic explained? –

+0

You also get 802 if you include the last coefficient. In my case I described only 7 days, so the last coefficient is not used. –

+0

What does "only 7 days" mean? The question doesn't mention this. –

Answers

1

You can use ln() and exp() to get the product of the remaining values:

select t.*, 
     exp(sum(ln(historical_coeff)) over (order by day desc)) as factor 
from t; 

Of course, the expression is more complicated if historical_coeff is ever negative or zero.
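As a rough illustration of that caveat, the following Python sketch (not part of the original answer) shows one common way to extend the log/exp product to zero and negative values: take the logarithm of the absolute values, track the sign separately, and short-circuit on zero. The helper name `product_via_logs` is hypothetical.

```python
import math


def product_via_logs(values):
    """Product computed as sign * exp(sum(ln|x|)); zero is handled separately."""
    if any(v == 0 for v in values):
        return 0.0  # ln(0) is undefined, and the product is 0 anyway
    negatives = sum(1 for v in values if v < 0)
    sign = -1.0 if negatives % 2 else 1.0  # an odd count of negatives flips the sign
    return sign * math.exp(sum(math.log(abs(v)) for v in values))


coeffs = [1.1, 0.98, 1.05, 1.07, 1, 0.95, 0.99]
print(product_via_logs(coeffs))
print(product_via_logs([2.0, -3.0, 0.5]))
```

In SQL the same idea would need a conditional expression around `ln()` plus a separate count of negative values, which is why the simple form above only works for strictly positive coefficients.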

Then you can take a cumulative sum of this factor to get the desired total:

select t.*,
     real_revenue * sum(factor) over (order by day desc) as forecasted_total_remaining_revenue
from (select t.*,
      exp(sum(ln(historical_coeff)) over (order by day desc)) as factor
     from t
    ) t
+0

You must add 'ROWS UNBOUNDED PRECEDING', because the default 'RANGE' returns a wrong answer when consecutive rows have the same 'historical_coeff' (and it is less efficient). – dnoeth

+0

Gordon.. I don't think this gives the required sum of cumulative products. E.g. you would get '1.1 * 0.98 * 1.05 * 1.07 * 1 * 0.95 * 0.99' for the date 01/01/2017, '0.98 * 1.05 * 1.07 * 1 * 0.95 * 0.99' for 01/02/2017, and so on.. but the required sum is '1.1 + 1.1 * 0.98 + 1.1 * 0.98 * 1.05 + 1.1 * 0.98 * 1.05 * 1.07 + 1.1 * 0.98 * 1.05 * 1.07 * 1 + 1.1 * 0.98 * 1.05 * 1.07 * 1 * 0.95 + 1.1 * 0.98 * 1.05 * 1.07 * 1 * 0.95 * 0.99' –

+0

'01/01/01'. –

0

In regular SQL (the syntax shown here is SQL Server), this can be done with a recursive CTE (as long as the DBMS supports them).

with rownums as (
    select t.*, row_number() over (order by dt) as rn
    from tbl t
),
cte as (
    select rn, dt, real_revenue, historical_coeff,
           cast(real_revenue * historical_coeff as decimal(38,10)) as res
    from rownums
    where rn = 1
    union all
    select t.rn, t.dt, t.real_revenue, t.historical_coeff,
           cast(c.res * t.historical_coeff as decimal(38,10))
    from rownums t
    join cte c on t.rn = c.rn + 1
)
select dt, sum(res) over (order by dt desc) as forecasted_remaining_revenue
from cte

The logic for excluding the last coefficient is not clear. This sums all the cumulative products from a given date to the last date.
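For checking results outside the database, the sum of cumulative products from each day through the last date can be sketched in Python with a single backward pass. This is an illustration under stated assumptions, not a translation of the query above: it restarts from each day's own real_revenue, as the question describes, it includes the last coefficient, and the function name `remaining_totals` is hypothetical.

```python
# Sample data from the question: (day, real_revenue, historical_coeff).
rows = [
    ("01/01/2017", 100, 1.10),
    ("01/02/2017", 105, 0.98),
    ("01/03/2017", 109, 1.05),
    ("01/04/2017", 107, 1.07),
    ("01/05/2017", 90, 1.00),
    ("01/06/2017", 120, 0.95),
    ("01/07/2017", 98, 0.99),
]


def remaining_totals(rows):
    """For each day i: revenue_i * (c_i + c_i*c_(i+1) + ... through the last day)."""
    totals = []
    tail = 0.0  # sum of cumulative coefficient products starting at the following day
    for day, revenue, coeff in reversed(rows):
        # b + b*c + b*c*d + ... = b * (1 + (c + c*d + ...))
        tail = coeff * (1.0 + tail)
        totals.append((day, revenue * tail))
    totals.reverse()  # restore chronological order
    return totals


totals = remaining_totals(rows)
for day, total in totals:
    print(day, round(total, 6))
```

With the last coefficient included, the first row should come out at the 802.18129365 mentioned in the comments on the question. Starting the pass with `tail = 0.0` one row earlier (that is, skipping the last coefficient) would give the 7-day variant instead.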


0

I think you are looking for something like this (you may need to adjust the interval of days):

SELECT 
    day, 
    SUM (frev) OVER (ORDER BY day 
     RANGE BETWEEN CURRENT ROW AND INTERVAL '5 DAYS' FOLLOWING 
    ) AS forecasted_total_remaining_revenue 
FROM (
    SELECT 
     day, 
     real_revenue * 
      EXP(SUM (LN(historical_coeff)) OVER(
       ORDER BY day 
       RANGE BETWEEN CURRENT ROW AND INTERVAL '5 DAYS' FOLLOWING 
       ) 
      ) AS frev 
    FROM 
     public.t1 
) a 
;