2017-08-25
0

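Getting a daily rolling sum of MAU with PostgreSQL

I have daily login log data stored in a Postgres database. Each row consists of an id and a date (a timestamp), so a user who logs in several times has multiple rows in the table.

To visualize: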

| id   | timestamp           |
|------|---------------------|
| 0099 | 2004-10-19 10:23:54 |
| 1029 | 2004-10-01 10:23:54 |
| 2353 | 2004-10-20 8:23:54  |

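Say MAU ("monthly active users") is defined as the number of unique ids that log in during a given calendar month. I would like a rolling sum of MAU for every day of a month, i.e. how MAU grows at different points in time. For example, if we look at October 2014: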

| date       | MAU   |
|------------|-------|
| 2014-10-01 | 10000 |
| 2014-10-02 | 12948 |
| 2014-10-03 | 13465 |

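And so on until the end of the month. I have heard that window functions might be one way to approach this. Any ideas on how to use them to get the rolling MAU sum?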


Could you add the table definition and some sample data/expected results to your question? – Marth

Answers

0

For a given month, you can calculate this by adding each user in on the first day they are seen in that month:

-- Inner query: each id's first login within the month.
-- Outer query: new users per day plus their running total.
select date_trunc('day', mints) as login_day,
       count(*) as usersOnDay,
       sum(count(*)) over (order by date_trunc('day', mints)) as cume_users
from (select id, min(timestamp) as mints
      from log
      where timestamp >= '2004-10-01'::date and timestamp < '2004-11-01'::date
      group by id
     ) l
group by date_trunc('day', mints)
order by login_day;
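To sanity-check the logic of the query above on a handful of rows, here is a minimal sketch in Python rather than SQL. The function name `rolling_mau` is made up for illustration, and the last two sample rows are invented. The first three come from the question's table. It mirrors the query's two steps: find each id's first login within the month, then keep a running total per day.

```python
from collections import Counter
from datetime import date, datetime
from itertools import accumulate

def rolling_mau(rows, month_start, next_month_start):
    """Return [(day, new_users, cumulative_users)] for one calendar month.

    rows: iterable of (id, datetime) login events (hypothetical sample shape).
    """
    first_seen = {}
    for user_id, ts in rows:
        day = ts.date()
        # Same filter as the WHERE clause: month_start <= ts < next_month_start
        if month_start <= day < next_month_start:
            if user_id not in first_seen or day < first_seen[user_id]:
                first_seen[user_id] = day
    # Number of ids whose first login falls on each day
    new_per_day = Counter(first_seen.values())
    days = sorted(new_per_day)
    totals = accumulate(new_per_day[d] for d in days)
    return [(d, new_per_day[d], t) for d, t in zip(days, totals)]

rows = [
    ("0099", datetime(2004, 10, 19, 10, 23, 54)),
    ("1029", datetime(2004, 10, 1, 10, 23, 54)),
    ("2353", datetime(2004, 10, 20, 8, 23, 54)),
    ("1029", datetime(2004, 10, 19, 9, 0, 0)),   # invented: repeat login, not counted again
    ("7777", datetime(2004, 11, 2, 9, 0, 0)),    # invented: outside October, ignored
]
result = rolling_mau(rows, date(2004, 10, 1), date(2004, 11, 1))
for day, new_users, total in result:
    print(day, new_users, total)
```

As in the SQL version, days on which no new user appears are simply absent from the output. If every calendar day is needed, the result would have to be joined against a list of days.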

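Note: this query answers your question for a single month. It can be extended to more calendar months, where you count the unique users on the first day of each month and then add the increments.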
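One way to picture the multi-month extension is sketched below in Python, with invented sample data and a hypothetical function name `rolling_mau_by_month`. The assumption is that each calendar month starts again from zero, so an id is counted once per month on its first login day in that month. In SQL this would roughly correspond to grouping the inner query by id and month and partitioning the window by month.

```python
from collections import Counter, defaultdict
from datetime import datetime

def rolling_mau_by_month(rows):
    """Return {(year, month): [(day, cumulative_users), ...]}."""
    first_seen = {}  # (id, year, month) -> earliest login date in that month
    for user_id, ts in rows:
        key = (user_id, ts.year, ts.month)
        day = ts.date()
        if key not in first_seen or day < first_seen[key]:
            first_seen[key] = day

    # (year, month) -> Counter mapping day -> number of newly seen ids
    new_per_day = defaultdict(Counter)
    for (_, year, month), day in first_seen.items():
        new_per_day[(year, month)][day] += 1

    result = {}
    for month_key in sorted(new_per_day):
        running = 0  # the running total restarts for every calendar month
        series = []
        for day in sorted(new_per_day[month_key]):
            running += new_per_day[month_key][day]
            series.append((day, running))
        result[month_key] = series
    return result

# Invented sample data spanning two months
rows = [
    ("a", datetime(2004, 10, 1, 9, 0)),
    ("b", datetime(2004, 10, 3, 9, 0)),
    ("a", datetime(2004, 10, 3, 18, 0)),
    ("a", datetime(2004, 11, 2, 9, 0)),
    ("c", datetime(2004, 11, 2, 10, 0)),
    ("b", datetime(2004, 11, 5, 9, 0)),
]
months = rolling_mau_by_month(rows)
for month_key, series in months.items():
    print(month_key, series)
```

Here user "a" is counted once in October and once again in November, which matches the calendar-month definition of MAU given in the question.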

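If your problem is one where the cumulative period crosses month boundaries, then ask another question and explain what "a month" means in that case.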

1

After reading the documentation for Postgres window functions, here is a solution that gets the rolling MAU sum for the current month:

-- First, get id and date of each timestamp within the current month
WITH raw_data AS (SELECT id, date_trunc('day', timestamp) AS timestamp
    FROM user_logs
    WHERE date_trunc('month', timestamp) = date_trunc('month', current_timestamp)),

-- Since we only want to count the earliest login in the month
-- for a given id, use MIN() to aggregate
month_data AS (SELECT id, MIN(timestamp) AS timestamp_day FROM raw_data GROUP BY id)

-- Postgres doesn't support DISTINCT for window functions, so take the
-- running count per row and collapse it to one row per day

SELECT timestamp_day AS date, MAX(running_count) AS MAU
    FROM (SELECT timestamp_day, COUNT(id) OVER (ORDER BY timestamp_day) AS running_count
          FROM month_data) foo
    GROUP BY timestamp_day
    ORDER BY timestamp_day;
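The MIN() step stands in for a distinct count, which may not be obvious at first glance. The sketch below, in Python with invented single-month sample data and hypothetical function names, compares the first-login running count against a brute-force distinct count of ids seen up to each day.

```python
from datetime import datetime

def mau_first_login(rows):
    """Running count of ids by the day of their first login (the MIN() approach)."""
    first_seen = {}
    for user_id, ts in rows:
        day = ts.date()
        first_seen[user_id] = min(day, first_seen.get(user_id, day))
    days = sorted(set(first_seen.values()))
    return {d: sum(1 for f in first_seen.values() if f <= d) for d in days}

def mau_distinct(rows):
    """Brute force: for each day with a login, count distinct ids seen up to that day."""
    days = sorted({ts.date() for _, ts in rows})
    return {d: len({uid for uid, ts in rows if ts.date() <= d}) for d in days}

# Invented sample data, all within one month
rows = [
    ("a", datetime(2004, 10, 1, 9, 0)),
    ("b", datetime(2004, 10, 1, 12, 0)),
    ("a", datetime(2004, 10, 2, 8, 0)),   # repeat login on a day with no new ids
    ("c", datetime(2004, 10, 3, 10, 0)),
    ("b", datetime(2004, 10, 3, 11, 0)),
]
by_first = mau_first_login(rows)
by_distinct = mau_distinct(rows)

# Both approaches agree on every day that the first-login approach reports
agree = all(by_first[d] == by_distinct[d] for d in by_first)
print(agree, len(by_first), len(by_distinct))
```

The two results agree wherever both report a day. The only difference is that the first-login approach has no row for a day such as 2004-10-02, where logins occurred but no new id appeared.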