2017-06-20

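BigQuery standard SQL: pivoting structs and summing over non-overlapping windows

I want to query a table that uses a repeated field to store data like this: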

+---+----------+------------+
| i | data.key | data.value |
+---+----------+------------+
| 0 | a        |          1 |
|   | b        |          2 |
| 1 | a        |          3 |
|   | b        |          4 |
| 2 | a        |          5 |
|   | b        |          6 |
| 3 | a        |          7 |
|   | b        |          8 |
+---+----------+------------+

I am trying to figure out how to write a query that produces a result like

+---+----+----+ 
| i | a | b | 
+---+----+----+ 
| 1 | 4 | 6 | 
| 3 | 12 | 14 | 
+---+----+----+ 

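where each row represents a non-overlapping sum (i.e. the row i=1 is the sum of rows i=0 and i=1) and the data has been pivoted so that each data.key value is now a column.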

Question 1:

I did my best to convert this answer to standard SQL and ended up with:

SELECT
    i,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') AS `a`,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') AS `b`
FROM
    `dataset.testing.dummy`

This works, but I wonder whether there is a better way to do it, especially once analytic functions are involved, because the query becomes particularly verbose:

SELECT
    i,
    SUM(a) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `a`,
    SUM(b) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `b`
FROM (
    SELECT
        i,
        (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') AS `a`,
        (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') AS `b`
    FROM
        `dataset.testing.dummy`)
ORDER BY
    i;

Question 2:

How do I write the ROWS/RANGE clause so that the resulting windows do not overlap? The last query gives me a rolling sum of the data, which is not what I want:

+---+----+----+ 
| i | a | b | 
+---+----+----+ 
| 0 | 1 | 2 | 
| 1 | 4 | 6 | 
| 2 | 8 | 10 | 
| 3 | 12 | 14 | 
+---+----+----+ 

The rolling sum produces a result for every row, whereas I am trying to reduce the number of rows returned.

Answer

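Using a temporary SQL function plus a named window helps with the verbosity. I did have to wrap the query in another subselect, though, in order to filter on i afterward. Here is a self-contained example: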

#standardSQL 
CREATE TEMP FUNCTION SumKey(
    data ARRAY<STRUCT<key STRING, value INT64>>, 
    target_key STRING) AS (
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = target_key) 
); 

WITH Input AS (
    SELECT 
    0 AS i, 
    ARRAY<STRUCT<key STRING, value INT64>>[('a', 1), ('b', 2)] AS data UNION ALL 
    SELECT 1, ARRAY<STRUCT<key STRING, value INT64>>[('a', 3), ('b', 4)] UNION ALL 
    SELECT 2, ARRAY<STRUCT<key STRING, value INT64>>[('a', 5), ('b', 6)] UNION ALL 
    SELECT 3, ARRAY<STRUCT<key STRING, value INT64>>[('a', 7), ('b', 8)] 
) 
SELECT * FROM (
    SELECT
        i,
        SUM(a) OVER W AS a,
        SUM(b) OVER W AS b
    FROM (
        SELECT
            i,
            SumKey(data, 'a') AS a,
            SumKey(data, 'b') AS b
        FROM Input
    )
    WINDOW W AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
)
WHERE MOD(i, 2) = 1
ORDER BY i;

This results in:

+---+----+----+ 
| i | a | b | 
+---+----+----+ 
| 1 | 4 | 6 | 
| 3 | 12 | 14 | 
+---+----+----+
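
As an alternative sketch, not part of the answer above: because a window function always returns one row per input row, the row reduction can also be done with a plain GROUP BY. The query below flattens the repeated field, assigns each row to a two-row bucket with DIV(i, 2), and aggregates per bucket. It assumes that i is a gap-free, non-negative integer, as in the sample data. The names Flattened, bucket, and last_i are hypothetical and chosen only for illustration.

```sql
#standardSQL
WITH Input AS (
    SELECT
        0 AS i,
        ARRAY<STRUCT<key STRING, value INT64>>[('a', 1), ('b', 2)] AS data UNION ALL
    SELECT 1, ARRAY<STRUCT<key STRING, value INT64>>[('a', 3), ('b', 4)] UNION ALL
    SELECT 2, ARRAY<STRUCT<key STRING, value INT64>>[('a', 5), ('b', 6)] UNION ALL
    SELECT 3, ARRAY<STRUCT<key STRING, value INT64>>[('a', 7), ('b', 8)]
),
-- One row per (i, key) pair, tagged with the bucket it belongs to.
-- Rows i = 0 and i = 1 fall into bucket 0, rows i = 2 and i = 3 into bucket 1.
Flattened AS (
    SELECT
        DIV(i, 2) AS bucket,
        i,
        d.key AS key,
        d.value AS value
    FROM Input, UNNEST(data) AS d
)
SELECT
    MAX(i) AS last_i,                        -- last i covered by the bucket
    SUM(IF(key = 'a', value, NULL)) AS a,    -- pivot key 'a' into a column
    SUM(IF(key = 'b', value, NULL)) AS b     -- pivot key 'b' into a column
FROM Flattened
GROUP BY bucket
ORDER BY last_i;
```

On the sample data this should give the same two rows as the window-based query, with the first column named last_i instead of i. The design trade-off is that the bucket expression depends on the values of i, whereas the window version depends only on row order, so the window version may be preferable when i has gaps.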