2016-07-06 83 views
2

I currently have this query, which gets the daily, weekly, and monthly averages of event occurrences from PostgreSQL in a single query.

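  1. The WITH clauses take a fairly large query and aggregate the daily, weekly, and monthly counts into intermediate tables, using count() of events grouped by event name and date.
  2. I select the average count from each intermediate table with avg() grouped by event and union the results. Because I want a separate column for each of daily, weekly, and monthly, I fill the empty columns with the placeholder value 0.
  3. Then I sum all the columns, with 0 basically acting as a no-op, which gives me a single value per event.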

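The query is pretty big, though, and I feel like I'm doing a lot of repetitive work. Is there any way to make this query perform better or make it smaller? I haven't really written queries like this before, so I'm not sure.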

WITH monthly_counts as (
    SELECT 
    event, 
    count(*) as count 
    FROM tracking_stuff 
    WHERE 
    event = 'thing' 
    OR event = 'thing2' 
    OR event = 'thing3' 
    GROUP BY event, date_trunc('month', created_at) 
), 
weekly_counts as (
    SELECT 
    event, 
    count(*) as count 
    FROM tracking_stuff 
    WHERE 
    event = 'thing' 
    OR event = 'thing2' 
    OR event = 'thing3' 
    GROUP BY event, date_trunc('week', created_at) 
), 
daily_counts as (
    SELECT 
    event, 
    count(*) as count 
    FROM tracking_stuff 
    WHERE 
    event = 'thing' 
    OR event = 'thing2' 
    OR event = 'thing3' 
    GROUP BY event, date_trunc('day', created_at) 
), 
query as (
    SELECT 
    event, 
    0 as daily_avg, 
    0 as weekly_avg, 
    avg(count) as monthly_avg 
    FROM monthly_counts 
    GROUP BY event 
    UNION 
    SELECT 
    event, 
    0 as daily_avg, 
    avg(count) as weekly_avg, 
    0 as monthly_avg 
    FROM weekly_counts 
    GROUP BY event 
    UNION 
    SELECT 
    event, 
    avg(count) as daily_avg, 
    0 as weekly_avg, 
    0 as monthly_avg 
    FROM daily_counts 
    GROUP BY event 
) 
SELECT 
    event, 
    sum(daily_avg) as daily_avg, 
    sum(weekly_avg) as weekly_avg, 
    sum(monthly_avg) as monthly_avg 
FROM query 
GROUP BY event; 

Answers

1

I would write the query this way:

select event, daily_avg, weekly_avg, monthly_avg 
from (
    select event, avg(count) monthly_avg 
    from (
     select event, count(*) 
     from tracking_stuff 
     where event in ('thing1', 'thing2', 'thing3') 
     group by event, date_trunc('month', created_at) 
    ) s 
    group by 1 
) monthly 
join (
    select event, avg(count) weekly_avg 
    from (
     select event, count(*) 
     from tracking_stuff 
     where event in ('thing1', 'thing2', 'thing3') 
     group by event, date_trunc('week', created_at) 
    ) s 
    group by 1 
) weekly using(event) 
join (
    select event, avg(count) daily_avg 
    from (
     select event, count(*) 
     from tracking_stuff 
     where event in ('thing1', 'thing2', 'thing3') 
     group by event, date_trunc('day', created_at) 
    ) s 
    group by 1 
) daily using(event) 
order by 1; 

If the where condition eliminates a significant part of the data (say, more than half), using a cte may slightly speed up query execution:

with the_data as (
    select event, created_at 
    from tracking_stuff 
    where event in ('thing1', 'thing2', 'thing3') 
    ) 

select event, daily_avg, weekly_avg, monthly_avg 
from (
    select event, avg(count) monthly_avg 
    from (
     select event, count(*) 
     from the_data 
     group by event, date_trunc('month', created_at) 
    ) s 
    group by 1 
) monthly 
-- etc ... 
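
The elided part presumably repeats the weekly and daily subqueries from the first query, reading from the_data instead of tracking_stuff. A sketch of the full statement under that assumption (the column alias `n` is introduced here for readability and is not in the original):

```sql
-- Sketch only: assumes the elided part mirrors the weekly and daily
-- subqueries of the query without a cte.
with the_data as (
    select event, created_at
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
)
select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(n) as monthly_avg
    from (
        select event, count(*) as n
        from the_data
        group by event, date_trunc('month', created_at)
    ) s
    group by event
) monthly
join (
    select event, avg(n) as weekly_avg
    from (
        select event, count(*) as n
        from the_data
        group by event, date_trunc('week', created_at)
    ) s
    group by event
) weekly using (event)
join (
    select event, avg(n) as daily_avg
    from (
        select event, count(*) as n
        from the_data
        group by event, date_trunc('day', created_at)
    ) s
    group by event
) daily using (event)
order by event;
```

The filter is written once and each of the three aggregations reads the already reduced row set.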

Just out of curiosity, I ran a test on some generated data:

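create table tracking_stuff (event text, created_at timestamp); 
-- random_int(n) in the original is not a PostgreSQL built-in; it is assumed here
-- to return a uniform random integer in 1..n and is replaced by floor(random() * n + 1)::int.
insert into tracking_stuff 
    select 'thing' || floor(random() * 9 + 1)::int, 
           '2016-01-01'::date + floor(random() * 365 + 1)::int 
    from generate_series(1, 1000000); 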

In every query I replaced thing with thing1, so each query excludes about 2/3 of the rows.

Average execution time over 10 tests:

Original query          1106 ms 
My query without cte    1077 ms 
My query with cte        902 ms 
Clodoaldo's query       5187 ms 
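
How the timings above were collected is not stated. One way to get comparable numbers, as a sketch, is to run each candidate under explain (analyze), which executes the statement and reports the actual execution time along with the plan. The statement below is only the daily part of the query, used as a stand-in:

```sql
-- Sketch: measure one candidate statement. explain (analyze) really runs the
-- query, so repeat it several times and average the reported execution time.
-- In psql, "\timing on" is an alternative that prints client-side elapsed time.
explain (analyze, buffers)
select event, avg(n) as daily_avg
from (
    select event, count(*) as n
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
    group by event, date_trunc('day', created_at)
) s
group by event;
```

The plan output also shows whether tracking_stuff is scanned once or several times, which is the main difference between the variants compared here.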
+0

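Just a really quick question, without having checked any facts... aren't joins more expensive than unions? Also, apart from preference, is there any reason not to use 'with'? – m0meni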

+1

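In this case the difference between 'union' and 'join' should be imperceptible. A similar remark probably applies to using a 'cte'. I usually use 'with' when I need recursion. – klin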

+0

A 'CTE' is an optimization fence for the planner. It may or may not make a difference. –
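
The fence remark describes the PostgreSQL versions current when this thread was written. From PostgreSQL 12 on, a non-recursive, side-effect-free CTE that is referenced only once is inlined by default, and the behavior can be chosen explicitly with materialized or not materialized. A minimal sketch, assuming PostgreSQL 12 or later (it will not parse on older versions):

```sql
-- PostgreSQL 12+ only. "materialized" computes the CTE once and keeps the old
-- fence behavior; "not materialized" lets the planner fold it into the outer query.
with the_data as materialized (
    select event, created_at
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
)
select event, count(*) as n
from the_data
group by event, date_trunc('day', created_at);
```

A CTE that is referenced several times, as the_data is in the answer above, is still materialized by default on those versions, so the filtered rows are computed once.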

3

On 9.5+ use grouping sets:

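The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates are computed for each group just as for simple GROUP BY clauses, and then the results are returned.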

select event, 
    avg(total) filter (where day is not null) as avg_day, 
    avg(total) filter (where week is not null) as avg_week, 
    avg(total) filter (where month is not null) as avg_month  
from (
    select 
     event, 
     date_trunc('day', created_at) as day, 
     date_trunc('week', created_at) as week, 
     date_trunc('month', created_at) as month, 
     count(*) as total 
    from tracking_stuff 
    where event in ('thing','thing2','thing3') 
    group by grouping sets ((event, 2), (event, 3), (event, 4)) 
) s 
group by event 
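
To see why the filter (where ... is not null) clauses pick out the right rows, here is a small sketch on three made-up rows (the values list is hypothetical sample data, and the grouping expressions are spelled out instead of using column positions). Columns that are not part of the current grouping set come back as null:

```sql
-- Sketch with made-up data: two events on 2016-01-01 and one on 2016-01-02.
select
    event,
    date_trunc('day', created_at)   as day,
    date_trunc('month', created_at) as month,
    count(*)                        as total
from (values
    ('thing', timestamp '2016-01-01 10:00'),
    ('thing', timestamp '2016-01-01 12:00'),
    ('thing', timestamp '2016-01-02 09:00')
) as t(event, created_at)
group by grouping sets (
    (event, date_trunc('day', created_at)),
    (event, date_trunc('month', created_at))
);
```

The day grouping set yields one row per day with month set to null, and the month grouping set yields one row per month with day set to null. Filtering on day is not null therefore averages only the per-day counts, and likewise for the other columns.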
+0

That's a very interesting hint! Though my intuition tells me this query should be quite expensive. – klin