2017-11-10 309 views
0

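tl;dr: I want to generate a dates table in Redshift to make a report easier to build, preferably without needing large tables already in Redshift or having to upload a CSV file. How do I create a dates table in Redshift?

Long version: I am working on a report where I have to average the number of new items created per day of the week. The date range can span months or more, so there may be 5 Mondays but only 4 Sundays, which makes the math a little tricky. I am also not guaranteed an instance of an item on every day, especially once a user starts slicing the data. This is tripping up the BI tool.

The best way to solve this is most likely a dates table. However, most tutorials for dates tables use SQL commands that Redshift does not provide or does not fully support (I'm looking at you, generate_series).

Is there an easy way to generate a dates table in Redshift?

The code I tried to use (based on this recommendation, which also does not work: http://elliot.land/post/building-a-date-dimension-table-in-redshift):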

CREATE TABLE facts.dates (
    "date_id"    INTEGER      NOT NULL PRIMARY KEY, 

    -- DATE 
    "full_date"   DATE      NOT NULL, 

    -- YEAR 
    "year_number"   SMALLINT     NOT NULL, 
    "year_week_number"  SMALLINT     NOT NULL, 
    "year_day_number"  SMALLINT     NOT NULL, 

    -- QUARTER 
    "qtr_number"   SMALLINT     NOT NULL, 

    -- MONTH 
    "month_number"   SMALLINT     NOT NULL, 
    "month_name"   CHAR(9)      NOT NULL, 
    "month_day_number"  SMALLINT     NOT NULL, 

    -- WEEK 
    "week_day_number"  SMALLINT     NOT NULL, 

    -- DAY 
    "day_name"    CHAR(9)      NOT NULL, 
    "day_is_weekday"  SMALLINT     NOT NULL, 
    "day_is_last_of_month" SMALLINT     NOT NULL 
) DISTSTYLE ALL SORTKEY (date_id) 
; 


INSERT INTO facts.dates 
(
    "date_id" 
    ,"full_date" 
    ,"year_number" 
    ,"year_week_number" 
    ,"year_day_number" 

    -- QUARTER 
    ,"qtr_number" 

    -- MONTH 
    ,"month_number" 
    ,"month_name" 
    ,"month_day_number" 

    -- WEEK 
    ,"week_day_number" 

    -- DAY 
    ,"day_name" 
    ,"day_is_weekday" 
    ,"day_is_last_of_month" 
) 
    SELECT 
    cast(seq + 1 AS INTEGER)          AS date_id, 

    -- DATE 
    datum               AS full_date, 

    -- YEAR 
    cast(extract(YEAR FROM datum) AS SMALLINT)     AS year_number, 
    cast(extract(WEEK FROM datum) AS SMALLINT)     AS year_week_number, 
    cast(extract(DOY FROM datum) AS SMALLINT)      AS year_day_number, 

    -- QUARTER 
    cast(to_char(datum, 'Q') AS SMALLINT)       AS qtr_number, 

    -- MONTH 
    cast(extract(MONTH FROM datum) AS SMALLINT)     AS month_number, 
    to_char(datum, 'Month')          AS month_name, 
    cast(extract(DAY FROM datum) AS SMALLINT)      AS month_day_number, 

    -- WEEK 
    cast(to_char(datum, 'D') AS SMALLINT)       AS week_day_number, 

    -- DAY 
    to_char(datum, 'Day')           AS day_name, 
    CASE WHEN to_char(datum, 'D') IN ('1', '7') 
     THEN 0 
    ELSE 1 END             AS day_is_weekday, 
    CASE WHEN 
     extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER + 
         INTERVAL '1' MONTH) :: DATE - 
         INTERVAL '1' DAY) = extract(DAY FROM datum) 
     THEN 1 
    ELSE 0 END             AS day_is_last_of_month 
    FROM 
    -- Generate days for 81 years starting from 2000. 
    (
     SELECT 
     '2000-01-01' :: DATE + generate_series AS datum, 
     generate_series      AS seq 
     FROM generate_series(0,81 * 365 + 20,1) 
    ) DQ 
    ORDER BY 1; 

This throws the following error:

[Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.; 
1 statement failed. 

...because, I assume, INSERT and generate_series are not allowed in the same command in Redshift.

+0

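As you have discovered, `generate_series()` cannot be used with actual table data because it only executes on the leader node. Your approach of generating a numbers table and then joining to it works well. Alternatively, create the source file in Excel and just import the result. A dates table like this is great for reporting. Other columns you might want to add: a public holiday flag, a last-day-of-quarter flag, and a last-day-of-year flag (useful for reports that group by the last date in a period). –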

+0

I love those additional columns. Thanks John! – Phillip

Answers

1

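As a workaround, you can spin up a Postgres instance on your local machine, run the code there, export the result to CSV, then run only the CREATE TABLE part in Redshift and load the data from the CSV. Since this is a one-time operation, that is acceptable, and it is what I actually do for new Redshift deployments.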
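If installing Postgres just for this feels heavy, the same CSV could also be produced with a short script. The following is only a sketch under assumptions, not the method used in this answer: the function names `date_row` and `write_dates_csv` are made up for illustration, the English month and day names are hard-coded rather than blank-padded the way `to_char` would return them, and the week number uses ISO numbering, which is assumed to match what `extract(WEEK ...)` produces.

```python
import calendar
import csv
import io
from datetime import date, timedelta

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
# Indexed by date.weekday(), where Monday is 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def date_row(seq, d):
    """Build one row in the column order of facts.dates (hypothetical helper)."""
    # 1 = Sunday ... 7 = Saturday, mirroring to_char(datum, 'D').
    week_day_number = d.isoweekday() % 7 + 1
    last_day = calendar.monthrange(d.year, d.month)[1]
    return [
        seq + 1,                                # date_id
        d.isoformat(),                          # full_date
        d.year,                                 # year_number
        d.isocalendar()[1],                     # year_week_number (ISO week, assumed)
        d.timetuple().tm_yday,                  # year_day_number
        (d.month - 1) // 3 + 1,                 # qtr_number
        d.month,                                # month_number
        MONTH_NAMES[d.month - 1],               # month_name
        d.day,                                  # month_day_number
        week_day_number,                        # week_day_number
        DAY_NAMES[d.weekday()],                 # day_name
        0 if week_day_number in (1, 7) else 1,  # day_is_weekday
        1 if d.day == last_day else 0,          # day_is_last_of_month
    ]


def write_dates_csv(out, start, end):
    """Write one CSV row per day from start to end (both inclusive)."""
    writer = csv.writer(out)
    for seq in range((end - start).days + 1):
        writer.writerow(date_row(seq, start + timedelta(days=seq)))


# Small demonstration: the first week of 2000, written to an in-memory buffer.
buf = io.StringIO()
write_dates_csv(buf, date(2000, 1, 1), date(2000, 1, 7))
print(buf.getvalue())
```

Passing an open file instead of the `StringIO` buffer, with the end date set to 2080-12-31, would cover the same range as the SQL in the question. The file still has to be loaded into Redshift afterwards (for example with COPY), so this replaces only the local Postgres step.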

+0

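Very good idea, but I figured out a way to do it without uploading a CSV. Unfortunately, it takes some copy-paste magic. I posted my solution below, in case you have any improvements. – Phillip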

0

While asking this question, I figured it out. Oops.

I started with a "facts" schema.

CREATE SCHEMA facts; 

Run the following to create the numbers table:

create table facts.numbers 
(
    number int PRIMARY KEY 
) 
; 

Use this to generate your list of numbers. I used a million to get started:

SELECT ',(' || generate_series(0,1000000,1) || ')' 
; 

Then copy the numbers from the result and paste them into the query below, after VALUES:

INSERT INTO facts.numbers 
VALUES 
(0) 
,(1) 
,(2) 
,(3) 
,(4) 
,(5) 
,(6) 
,(7) 
,(8) 
,(9) 
-- etc 

^ Make sure to remove the leading comma from the copy-pasted list of numbers.
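The copy-paste step, including the leading-comma cleanup, could also be scripted. The following is a rough sketch rather than part of the original solution: `numbers_insert_sql` is a made-up helper name, and it assumes you run the generated statement yourself in your SQL client. It joins the values with `\n,` so no leading comma is ever produced.

```python
def numbers_insert_sql(upper, table="facts.numbers"):
    """Return an INSERT statement with the values (0) through (upper)."""
    # Joining with "\n," places a comma before every value except the first.
    values = "\n,".join(f"({n})" for n in range(upper + 1))
    return f"INSERT INTO {table}\nVALUES\n{values}\n;"


# Small demonstration with ten numbers; the thread uses 1000000 as the upper bound.
print(numbers_insert_sql(9))
```

For a million rows the statement becomes very long, so it may need to be split into several smaller INSERTs if your client or Redshift rejects it for size.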

Once you have a numbers table, you can generate a dates table (again, code stolen from elliot land: http://elliot.land/post/building-a-date-dimension-table-in-redshift):

CREATE TABLE facts.dates (
    "date_id"    INTEGER      NOT NULL PRIMARY KEY, 

    -- DATE 
    "full_date"   DATE      NOT NULL, 

    -- YEAR 
    "year_number"   SMALLINT     NOT NULL, 
    "year_week_number"  SMALLINT     NOT NULL, 
    "year_day_number"  SMALLINT     NOT NULL, 

    -- QUARTER 
    "qtr_number"   SMALLINT     NOT NULL, 

    -- MONTH 
    "month_number"   SMALLINT     NOT NULL, 
    "month_name"   CHAR(9)      NOT NULL, 
    "month_day_number"  SMALLINT     NOT NULL, 

    -- WEEK 
    "week_day_number"  SMALLINT     NOT NULL, 

    -- DAY 
    "day_name"    CHAR(9)      NOT NULL, 
    "day_is_weekday"  SMALLINT     NOT NULL, 
    "day_is_last_of_month" SMALLINT     NOT NULL 
) DISTSTYLE ALL SORTKEY (date_id) 
; 


INSERT INTO facts.dates 
(
    "date_id" 
    ,"full_date" 
    ,"year_number" 
    ,"year_week_number" 
    ,"year_day_number" 

    -- QUARTER 
    ,"qtr_number" 

    -- MONTH 
    ,"month_number" 
    ,"month_name" 
    ,"month_day_number" 

    -- WEEK 
    ,"week_day_number" 

    -- DAY 
    ,"day_name" 
    ,"day_is_weekday" 
    ,"day_is_last_of_month" 
) 
    SELECT 
    cast(seq + 1 AS INTEGER)          AS date_id, 

    -- DATE 
    datum               AS full_date, 

    -- YEAR 
    cast(extract(YEAR FROM datum) AS SMALLINT)     AS year_number, 
    cast(extract(WEEK FROM datum) AS SMALLINT)     AS year_week_number, 
    cast(extract(DOY FROM datum) AS SMALLINT)      AS year_day_number, 

    -- QUARTER 
    cast(to_char(datum, 'Q') AS SMALLINT)       AS qtr_number, 

    -- MONTH 
    cast(extract(MONTH FROM datum) AS SMALLINT)     AS month_number, 
    to_char(datum, 'Month')          AS month_name, 
    cast(extract(DAY FROM datum) AS SMALLINT)      AS month_day_number, 

    -- WEEK 
    cast(to_char(datum, 'D') AS SMALLINT)       AS week_day_number, 

    -- DAY 
    to_char(datum, 'Day')           AS day_name, 
    CASE WHEN to_char(datum, 'D') IN ('1', '7') 
     THEN 0 
    ELSE 1 END             AS day_is_weekday, 
    CASE WHEN 
     extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER + 
         INTERVAL '1' MONTH) :: DATE - 
         INTERVAL '1' DAY) = extract(DAY FROM datum) 
     THEN 1 
    ELSE 0 END             AS day_is_last_of_month 
    FROM 
    -- Generate days for 81 years starting from 2000. 
    (
     SELECT 
     '2000-01-01' :: DATE + number AS datum, 
     number      AS seq 
     FROM facts.numbers 
     WHERE number between 0 and 81 * 365 + 20 
    ) DQ 
    ORDER BY 1; 

^ Be sure to set the upper bound in the final WHERE clause to cover the date range you need.
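The expression `81 * 365 + 20` is the day offset of the last date, counted from 2000-01-01. As a small sketch (the helper name `max_offset` is made up for illustration), the bound for any other end date can be computed instead of counting leap years by hand:

```python
from datetime import date


def max_offset(start, end):
    """Largest value of `number` needed so that start + number reaches end."""
    return (end - start).days


# 2000 through 2080 contains 21 leap days, so the last offset is 81 * 365 + 20.
print(max_offset(date(2000, 1, 1), date(2080, 12, 31)))  # prints 29585
```

With the bound from the query above, the table therefore runs from 2000-01-01 through 2080-12-31 inclusive.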
