2016-09-20 100 views
2

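BigQuery join of three tables

I want to join three tables in BigQuery: table 1 has the records of one event (that is, each row is one record), table 2 has the records of a second event, and table 3 has the category names.

I want to produce a final table with the table 1 and table 2 counts by category and device platform. However, every time I run the query I get an error saying that joined.t3.category is not a field of either table in the join.

Here is my current code: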

Select count(distinct joined.t1.Id) as t1_events, count(distinct t2.Id) as t2_events, joined.t1.Origin as platform, joined.t3.category as category
from (
    SELECT Id, Origin, CatId
    FROM [testing.table_1] as t1
    JOIN (SELECT category, CategoryID FROM [testing.table_3]) as t3
    on t1.CatId = t3.CategoryID
) AS joined
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) as t2
ON (joined.t1.CatId = t2.CategoryId)
Group by platform, category;

For reference, a simple join between table 1 and table 2 works perfectly:

Select count(distinct t1.Id) as t1_event, count(distinct t2.Id) as t2_events, t1.Origin as platform
from testing.table_1 as t1
JOIN testing.table_2 as t2
on t1.CatId = t2.CategoryId
Group by platform;

Answers

1

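The simple fix is to add the category field to the first inner SELECT. Otherwise it is not visible to the outermost SELECT, hence the error. That is the problem!

Also, in BigQuery legacy SQL you can use EXACT_COUNT_DISTINCT; otherwise COUNT(DISTINCT ...) can return a statistical approximation. See the documentation for COUNT([DISTINCT]) for more.

So, for legacy SQL the query can look like this: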

SELECT 
    EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events, 
    EXACT_COUNT_DISTINCT(t2.Id) AS t2_events, 
    joined.t1.Origin AS platform, 
    joined.t3.category AS category 
FROM (
    SELECT 
    Id, Origin, CatId, category 
    FROM [testing.table_1] AS t1 
    JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3 
    ON t1.CatId = t3.CategoryID 
) AS joined 
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2 
ON joined.t1.CatId = t2.CategoryId 
GROUP BY platform, category 
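
To make the visibility rule concrete, here is a minimal sketch that runs locally. It assumes SQLite (through Python's sqlite3 module) as a stand-in for BigQuery, and the toy rows are made up for illustration. SQLite does not support the joined.t1.Id prefix style of legacy SQL, so the outer query refers to columns as joined.<column>; the point is only that an outer query can see nothing beyond what the subquery projects.

```python
import sqlite3

# In-memory database with made-up rows (illustrative data, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Id INTEGER, Origin TEXT, CatId INTEGER);
    CREATE TABLE table_3 (CategoryID INTEGER, category TEXT);
    INSERT INTO table_1 VALUES (1, 'ios', 10), (2, 'android', 10), (3, 'ios', 20);
    INSERT INTO table_3 VALUES (10, 'news'), (20, 'sports');
""")

# The subquery joins table_3 but does NOT project category,
# so the outer SELECT has no joined.category to read.
missing_category = """
    SELECT joined.Origin, joined.category
    FROM (
        SELECT t1.Id, t1.Origin, t1.CatId
        FROM table_1 AS t1
        JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID
    ) AS joined
"""
try:
    conn.execute(missing_category)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Same query, but the subquery now projects category as well.
with_category = """
    SELECT joined.Origin, joined.category
    FROM (
        SELECT t1.Id, t1.Origin, t1.CatId, t3.category
        FROM table_1 AS t1
        JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID
    ) AS joined
    ORDER BY joined.Id
"""
rows = conn.execute(with_category).fetchall()

print("without category in the subquery:", error)
print("with category in the subquery:", rows)
```

The first query fails and the second one succeeds, which is the same shape of problem as in the question, only in a different SQL engine.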

Also, I feel you can simplify it further (assuming there are no ambiguous fields):

SELECT 
    EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events, 
    EXACT_COUNT_DISTINCT(t2.Id) AS t2_events, 
    joined.t1.Origin AS platform, 
    joined.t3.category AS category 
FROM (
    SELECT 
    Id, Origin, CatId, category 
    FROM [testing.table_1] AS t1 
    JOIN [testing.table_3] AS t3 
    ON t1.CatId = t3.CategoryID 
) AS joined 
JOIN [testing.table_2] AS t2 
ON joined.t1.CatId = t2.CategoryId 
GROUP BY platform, category 

And of course, if you use the standard SQL version (as Elliott suggests), you need the following:

-- Standard SQL does not keep the inner t1/t3 prefixes on subquery columns,
-- so the outer query refers to them as joined.<column>.
SELECT
    COUNT(DISTINCT joined.Id) AS t1_events,
    COUNT(DISTINCT t2.Id) AS t2_events,
    joined.Origin AS platform,
    joined.category AS category
FROM (
    SELECT t1.Id, t1.Origin, t1.CatId, t3.category
    FROM `testing.table_1` AS t1
    JOIN `testing.table_3` AS t3
    ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
ON joined.CatId = t2.CategoryId
GROUP BY platform, category
+0

You are the real MVP. This works perfectly.

0

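I don't know Google BigQuery, but my SQL knowledge says that having two aliases in front of a column name will cause problems. Try removing the t- alias that follows joined, for example use joined.category instead of joined.t3.category.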
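
As a rough illustration of this point outside BigQuery, the sketch below assumes SQLite (via Python's sqlite3 module) with made-up rows; the helper try_query is hypothetical and exists only for this example. SQLite reads a three-part name as database.table.column, so the doubly prefixed reference is rejected while the single prefix works. BigQuery legacy SQL treats such prefixes differently, so this only shows the general SQL behaviour the answer has in mind.

```python
import sqlite3

# In-memory database with made-up rows (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Id INTEGER, Origin TEXT, CatId INTEGER);
    CREATE TABLE table_3 (CategoryID INTEGER, category TEXT);
    INSERT INTO table_1 VALUES (1, 'ios', 10), (2, 'android', 10), (3, 'ios', 20);
    INSERT INTO table_3 VALUES (10, 'news'), (20, 'sports');
""")

# Derived table that already projects category under the alias "joined".
SUBQUERY = """
    (SELECT t1.Id, t3.category
     FROM table_1 AS t1
     JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID) AS joined
"""


def try_query(column_ref):
    """Hypothetical helper: return the rows for column_ref, or the error text."""
    try:
        sql = f"SELECT {column_ref} FROM {SUBQUERY} ORDER BY 1"
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


double_alias = try_query("joined.t3.category")  # two prefixes before the column
single_alias = try_query("joined.category")     # only the subquery alias

print(double_alias)
print(single_alias)
```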

1

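Could you try using standard SQL for your query? It has better alias handling, and COUNT(DISTINCT ...) gives you exact results rather than an approximation as in legacy SQL. If it helps, the main modification you need to make to the query is to use backticks to escape your table names instead of brackets. The subquery also has to select category, and the outer query refers to the subquery's columns as joined.<column>. For example:

SELECT
    COUNT(DISTINCT joined.Id) AS t1_events,
    COUNT(DISTINCT t2.Id) AS t2_events,
    joined.Origin AS platform,
    joined.category AS category
FROM (
    SELECT
        t1.Id,
        t1.Origin,
        t1.CatId,
        t3.category  -- selected here so the outer query can see it
    FROM `testing.table_1` AS t1
    JOIN (
        SELECT
            category,
            CategoryID
        FROM `testing.table_3`
    ) AS t3
    ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
    SELECT
        Id,
        CategoryId
    FROM `testing.table_2`
) AS t2
ON joined.CatId = t2.CategoryId
GROUP BY platform, category;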
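
For anyone who wants to check the shape of the result without a BigQuery project, here is a small sketch that assumes SQLite (via Python's sqlite3 module) and made-up rows. It writes the three-way join flat, without the subquery, which is an alternative formulation and not the query from the question. It also returns COUNT(*) next to the distinct counts, because joining the two event tables on category alone pairs every table 1 event with every table 2 event of that category, and the DISTINCT is what keeps the counts from being inflated.

```python
import sqlite3

# In-memory database with made-up rows (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (Id INTEGER, Origin TEXT, CatId INTEGER);
    CREATE TABLE table_2 (Id INTEGER, CategoryId INTEGER);
    CREATE TABLE table_3 (CategoryID INTEGER, category TEXT);
    INSERT INTO table_1 VALUES (1, 'ios', 10), (2, 'android', 10), (3, 'ios', 20);
    INSERT INTO table_2 VALUES (100, 10), (101, 10), (102, 20);
    INSERT INTO table_3 VALUES (10, 'news'), (20, 'sports');
""")

# Flat three-way join: each table is joined directly, no nested SELECT.
query = """
    SELECT
        t1.Origin             AS platform,
        t3.category           AS category,
        COUNT(DISTINCT t1.Id) AS t1_events,
        COUNT(DISTINCT t2.Id) AS t2_events,
        COUNT(*)              AS joined_rows
    FROM table_1 AS t1
    JOIN table_3 AS t3 ON t1.CatId = t3.CategoryID
    JOIN table_2 AS t2 ON t1.CatId = t2.CategoryId
    GROUP BY t1.Origin, t3.category
    ORDER BY platform, category
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

With this toy data the ios/news group has one table 1 event and two table 2 events, but two joined rows, so a plain COUNT(*) would not match either event count. Note also that t2_events counts all table 2 events of the category regardless of platform, since table 2 is joined on category only.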