2016-11-14

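How can I reuse aggregate expressions in PostgreSQL without a slowdown?

I often run queries that use the same combination of aggregate functions, e.g.: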

SELECT 
    my_id, 
    sum(a * weight)/nullif(sum(CASE WHEN a IS NOT NULL THEN weight END), 0) AS a, 
    sum(b * weight)/nullif(sum(CASE WHEN b IS NOT NULL THEN weight END), 0) AS b 
FROM my_table 
GROUP BY my_id 

I'd like to avoid repeating the same expression over and over. It would be great to have a new function, weighted_avg, that gives the same result:

SELECT 
    my_id, 
    weighted_avg(a, weight) AS a, 
    weighted_avg(b, weight) AS b 
FROM my_table 
GROUP BY my_id 

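The only way I know to do this is CREATE AGGREGATE with an intermediate state and an SFUNC that is called for every row. Unfortunately, this is much slower than the original query, which makes it unusable in my case.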

I imagine my ideal solution would look something like this:

CREATE AGGREGATE FUNCTION weighted_avg(x float, weight float) 
RETURNS float AS $$ 
    SELECT sum(x * weight)/nullif(sum(CASE WHEN x IS NOT NULL THEN weight END), 0) 
$$ language SQL IMMUTABLE; 

and it would be inlined when the query is executed. However, I couldn't find anything like this in Postgres.

Using a function will probably always be somewhat slower than writing the expression directly in the query.

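I can live with some overhead, but the plpgsql implementation using 'CREATE AGGREGATE' takes 4 times as long to execute. So I'll keep the original expressions, which is acceptable, but I was hoping for a better solution.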

Use a subquery in 'FROM' to compute the input expressions only once.
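A minimal sketch of that suggestion, assuming float columns and a few hypothetical sample rows in place of the real my_table (supplied through a CTE so the snippet runs on its own). The per-row expressions are written once in the subquery, and the outer query only aggregates the precomputed columns. This gathers the repeated expressions in one place rather than eliminating them, and its speed has not been measured here.

```sql
-- Hypothetical sample rows standing in for the question's my_table.
WITH my_table (my_id, a, b, weight) AS (
    VALUES
        (1, 2.0::float, NULL::float, 1.0::float),
        (1, 4.0,        3.0,         3.0),
        (2, NULL,       5.0,         2.0)
)
SELECT
    my_id,
    -- The outer query only sums the precomputed columns.
    sum(a_num) / nullif(sum(a_den), 0) AS a,
    sum(b_num) / nullif(sum(b_den), 0) AS b
FROM (
    -- Each per-row input expression is computed exactly once here.
    SELECT
        my_id,
        a * weight AS a_num,                               -- weighted value of a
        CASE WHEN a IS NOT NULL THEN weight END AS a_den,  -- weight counted only where a is present
        b * weight AS b_num,
        CASE WHEN b IS NOT NULL THEN weight END AS b_den
    FROM my_table
) AS s
GROUP BY my_id
ORDER BY my_id;
```

For these sample rows the query should return my_id 1 with a = 3.5 and b = 3, and my_id 2 with a = NULL (no non-NULL a values) and b = 5.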

Answer

You didn't show the aggregate function you tested. This is how I would create it:

create function weighted_avg_acumm (fa float[], x float, weight float) 
returns float[] as $$ 
    select array[ 
     fa[1] + x * weight, 
     fa[2] + weight 
    ]::float[] 
$$ language sql immutable strict; 

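create function weighted_avg_acumm_final (fa float[]) 
returns float as $$ 
    -- nullif returns NULL instead of raising a division-by-zero error
    -- when no non-NULL rows reached the state (it is still '{0,0}')
    select fa[1] / nullif(fa[2], 0) 
$$ language sql immutable strict; 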

create aggregate weighted_avg (x float, weight float)(
    sfunc = weighted_avg_acumm, 
    finalfunc = weighted_avg_acumm_final, 
    stype = float[], 
    initcond = '{0,0}' 
); 

Update

I tested it, and it is much slower for me too:

create table t (a int, weight int); 
insert into t (a, weight) 
select 
    nullif(round(random() * 10), 0), 
    trunc(random() * 10) + 1 
from generate_series(1,1000000) 
; 

explain analyze 
select weighted_avg(a, weight) 
from t; 
                QUERY PLAN              
------------------------------------------------------------------------------------------------------------------- 
Aggregate (cost=269425.25..269425.26 rows=1 width=8) (actual time=7933.440..7933.440 rows=1 loops=1) 
    -> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.018..241.571 rows=1000000 loops=1) 
Planning time: 0.189 ms 
Execution time: 7933.508 ms 

explain analyze 
select 
    sum(a::numeric * weight)/
    nullif(sum(case when a is not null then weight end), 0) 
from t; 
                QUERY PLAN              
------------------------------------------------------------------------------------------------------------------- 
Aggregate (cost=26925.00..26925.02 rows=1 width=8) (actual time=904.852..904.852 rows=1 loops=1) 
    -> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.010..127.264 rows=1000000 loops=1) 
Planning time: 0.048 ms 
Execution time: 904.891 ms 

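It's mostly the same as mine (with some differences in how zeros and NULLs are handled). Unfortunately, it is about 4 times slower than the native expression.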