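tl;dr: I want to generate a date table in Redshift to make a report easier to build. Ideally this would not require large tables already in Redshift or uploading a CSV file. How do I create a date table in Redshift?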
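Long version: I'm working on a report where I have to average the number of new items created per week. The date range could span months or more, so there could be five Mondays but only four Sundays, which makes the math a little tricky. Also, I can't guarantee one instance of an item per day, especially once users start slicing the data. That is what's throwing off the BI tool.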
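The best way to tackle this is most likely a date table. However, most date-table tutorials use SQL commands that Redshift doesn't provide or doesn't fully support (I'm looking at you, generate_series).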
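To see why a date table helps with the averaging problem in the long version, here is a minimal sketch using hypothetical temp tables `calendar` and `items`. The table names, columns, and sample rows are invented for illustration. The LEFT JOIN from the calendar keeps days with no items as zero, so each weekday's average is divided by how often that weekday actually occurs in the range (five Mondays versus four Sundays), not just by the days that happen to have items. The syntax is intended to be valid in both PostgreSQL and Redshift. The CAST to DOUBLE PRECISION is there because Redshift's AVG over integer input can return an integer.

```sql
-- Hypothetical two-week calendar (2024-01-01 is a Monday), built from a
-- small digits list instead of generate_series().
CREATE TEMP TABLE calendar AS
WITH digits AS (
    SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
    SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
    SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
SELECT '2024-01-01'::DATE + (ones.d + 10 * tens.d) AS full_date
FROM digits ones
CROSS JOIN digits tens
WHERE ones.d + 10 * tens.d < 14;

-- Hypothetical items table: one row per created item.
CREATE TEMP TABLE items (item_id INTEGER, created_date DATE);
INSERT INTO items VALUES
    (1, '2024-01-01'), (2, '2024-01-01'),  -- first Monday only
    (3, '2024-01-03'), (4, '2024-01-10');  -- both Wednesdays

-- Items per calendar day; the LEFT JOIN keeps empty days as 0,
-- then the daily counts are averaged per day of week.
CREATE TEMP TABLE weekday_avg AS
WITH daily AS (
    SELECT c.full_date, COUNT(i.item_id) AS items_created
    FROM calendar c
    LEFT JOIN items i ON i.created_date = c.full_date
    GROUP BY c.full_date
)
SELECT EXTRACT(DOW FROM full_date) AS dow,  -- 0 = Sunday ... 6 = Saturday
       COUNT(*) AS days_in_range,
       AVG(CAST(items_created AS DOUBLE PRECISION)) AS avg_items_per_day
FROM daily
GROUP BY 1;
```

Here the Monday average comes out as 1 (2 items over 2 Mondays) rather than the 2 you would get by averaging only over days that have items.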
Is there an easy way to generate a date table in Redshift?
The code I was trying to use (based on this suggestion, which also doesn't work: http://elliot.land/post/building-a-date-dimension-table-in-redshift):
CREATE TABLE facts.dates (
    "date_id" INTEGER NOT NULL PRIMARY KEY,
    -- DATE
    "full_date" DATE NOT NULL,
    -- YEAR
    "year_number" SMALLINT NOT NULL,
    "year_week_number" SMALLINT NOT NULL,
    "year_day_number" SMALLINT NOT NULL,
    -- QUARTER
    "qtr_number" SMALLINT NOT NULL,
    -- MONTH
    "month_number" SMALLINT NOT NULL,
    "month_name" CHAR(9) NOT NULL,
    "month_day_number" SMALLINT NOT NULL,
    -- WEEK
    "week_day_number" SMALLINT NOT NULL,
    -- DAY
    "day_name" CHAR(9) NOT NULL,
    "day_is_weekday" SMALLINT NOT NULL,
    "day_is_last_of_month" SMALLINT NOT NULL
) DISTSTYLE ALL SORTKEY (date_id);
INSERT INTO facts.dates
(
    "date_id"
    ,"full_date"
    ,"year_number"
    ,"year_week_number"
    ,"year_day_number"
    -- QUARTER
    ,"qtr_number"
    -- MONTH
    ,"month_number"
    ,"month_name"
    ,"month_day_number"
    -- WEEK
    ,"week_day_number"
    -- DAY
    ,"day_name"
    ,"day_is_weekday"
    ,"day_is_last_of_month"
)
SELECT
    cast(seq + 1 AS INTEGER) AS date_id,
    -- DATE
    datum AS full_date,
    -- YEAR
    cast(extract(YEAR FROM datum) AS SMALLINT) AS year_number,
    cast(extract(WEEK FROM datum) AS SMALLINT) AS year_week_number,
    cast(extract(DOY FROM datum) AS SMALLINT) AS year_day_number,
    -- QUARTER
    cast(to_char(datum, 'Q') AS SMALLINT) AS qtr_number,
    -- MONTH
    cast(extract(MONTH FROM datum) AS SMALLINT) AS month_number,
    to_char(datum, 'Month') AS month_name,
    cast(extract(DAY FROM datum) AS SMALLINT) AS month_day_number,
    -- WEEK
    cast(to_char(datum, 'D') AS SMALLINT) AS week_day_number,
    -- DAY
    to_char(datum, 'Day') AS day_name,
    CASE WHEN to_char(datum, 'D') IN ('1', '7')
        THEN 0
        ELSE 1 END AS day_is_weekday,
    CASE WHEN
        extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
                          INTERVAL '1' MONTH) :: DATE -
                         INTERVAL '1' DAY) = extract(DAY FROM datum)
        THEN 1
        ELSE 0 END AS day_is_last_of_month
FROM
    -- Generate days for 81 years starting from 2000.
    (
        SELECT
            '2000-01-01' :: DATE + generate_series AS datum,
            generate_series AS seq
        FROM generate_series(0, 81 * 365 + 20, 1)
    ) DQ
ORDER BY 1;
throws this error:
[Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
1 statement failed.
...which, I assume, is because INSERT and generate_series can't be used in the same command in Redshift.
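As you've discovered, generate_series() can't be used with actual table data because it only executes on the leader node. Your approach of generating a numbers table and then joining to it works well. Alternatively, create the source file in Excel and import only the result. A date table like this is great for reporting. Other columns you might want to add: a public holiday flag, a last-day-of-quarter flag, and a last-day-of-year flag (useful for reports that group by the last date of a period).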
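The numbers-table approach mentioned in the comment might look roughly like the following minimal sketch. It builds the integers 0..99999 by cross-joining a ten-row digits list, so generate_series() is never called and the leader-node-only error should not be triggered. The temp table names `numbers` and `dates_sketch` are hypothetical. In the question's INSERT, the `DQ` subquery could then select from `numbers` instead, with `n` taking the place of `generate_series` and `seq`. For a permanent table you would use CREATE TABLE with the question's DISTSTYLE ALL / SORTKEY options rather than a temp table.

```sql
-- Numbers 0..99999 from five cross-joined copies of a digits list;
-- no generate_series() involved.
CREATE TEMP TABLE numbers AS
WITH digits AS (
    SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
    SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
    SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
)
SELECT ones.d
       + 10 * tens.d
       + 100 * hundreds.d
       + 1000 * thousands.d
       + 10000 * ten_thousands.d AS n
FROM digits ones
CROSS JOIN digits tens
CROSS JOIN digits hundreds
CROSS JOIN digits thousands
CROSS JOIN digits ten_thousands;

-- One row per day over the same range as generate_series(0, 81 * 365 + 20, 1):
-- offsets 0..29585, i.e. 2000-01-01 through 2080-12-31.
CREATE TEMP TABLE dates_sketch AS
SELECT CAST(n + 1 AS INTEGER) AS date_id,
       '2000-01-01'::DATE + n AS full_date
FROM numbers
WHERE n <= 81 * 365 + 20;
```

Five digit columns give 100,000 rows, comfortably more than the 29,586 days in the range. Adding or removing a CROSS JOIN changes the size of the numbers table by a factor of ten.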
I like those extra columns. Thanks, John! – Phillip
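The period-end flags suggested in the comment could be derived with expressions like the ones in this sketch. The sketch runs over a few hypothetical sample dates stored in a temp table named `date_flags`. Each flag looks at the following day (`full_date + 1`). If that day is the 1st of a month, the 1st of a quarter-starting month, or day 1 of a year, then `full_date` is the last day of that period. The month flag is a simpler equivalent of the question's `day_is_last_of_month` expression. A public holiday flag can't be computed this way. It would normally come from a hand-maintained list of holiday dates joined onto the date table.

```sql
-- Hypothetical sample dates chosen to exercise each flag,
-- including February in a leap year.
CREATE TEMP TABLE date_flags AS
WITH sample_dates AS (
    SELECT '2023-03-30'::DATE AS full_date UNION ALL
    SELECT '2023-03-31'::DATE UNION ALL
    SELECT '2023-12-31'::DATE UNION ALL
    SELECT '2024-02-28'::DATE UNION ALL
    SELECT '2024-02-29'::DATE
)
SELECT full_date,
       -- Last day of the month: the next day is the 1st.
       CASE WHEN EXTRACT(DAY FROM full_date + 1) = 1
            THEN 1 ELSE 0 END AS day_is_last_of_month,
       -- Last day of the quarter: the next day is 1 Jan, Apr, Jul or Oct.
       CASE WHEN EXTRACT(DAY FROM full_date + 1) = 1
             AND EXTRACT(MONTH FROM full_date + 1) IN (1, 4, 7, 10)
            THEN 1 ELSE 0 END AS day_is_last_of_quarter,
       -- Last day of the year: the next day is day 1 of a year.
       CASE WHEN EXTRACT(DOY FROM full_date + 1) = 1
            THEN 1 ELSE 0 END AS day_is_last_of_year
FROM sample_dates;
```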