2017-07-26 61 views

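Recreating a GA funnel in BigQuery

I'm trying to recreate a GA funnel (a custom report in Google 360) using BigQuery. The funnel in GA uses a unique count of the events fired on each page. I found this query online, and it works for the most part: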

SELECT
  COUNT(s0.firstHit) AS Landing_Page,
  COUNT(s1.firstHit) AS Model_Selection
FROM (
  SELECT
    s0.fullvisitorID,
    s0.firstHit,
    s1.firstHit
  FROM (
    # Begin subquery #1, aka s0: first 'landing_page' hit per visitor
    SELECT
      fullvisitorID,
      MIN(hits.hitNumber) AS firstHit
    FROM [64269470.ga_sessions_20170720]
    WHERE
      hits.eventInfo.eventAction IN ('landing_page')
      AND totals.visits = 1
    GROUP BY
      fullvisitorID
  ) s0
  # End subquery #1, aka s0

  LEFT JOIN (
    # Begin subquery #2, aka s1: first 'model_selection_page' hit per visitor
    SELECT
      fullvisitorID,
      MIN(hits.hitNumber) AS firstHit
    FROM [64269470.ga_sessions_20170720]
    WHERE
      hits.eventInfo.eventAction IN ('model_selection_page')
      AND totals.visits = 1
    GROUP BY
      fullvisitorID
  ) s1
  ON s0.fullvisitorID = s1.fullvisitorID
)

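The query runs fine, and the Landing_Page value matches what I get in GA, but Model_Selection is about 10% higher. The discrepancy also grows further down the funnel (for clarity, I've posted only two steps). Any idea what I'm missing here?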

Answers


This query does what you need, but in Standard SQL:

#standardSQL 
SELECT 
    SUM((SELECT COUNTIF(eventInfo.eventAction = 'landing_page') FROM UNNEST(hits))) Landing_Page, 
    SUM((SELECT COUNTIF(eventInfo.eventAction = 'model_selection_page') FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventAction = 'landing_page'))) Model_Selection 
FROM `64269470.ga_sessions_20170720` 

That's it. Four lines, and it's much faster and cheaper.

You can also play with simulated data, like this:

#standardSQL 
WITH data AS(
    SELECT '1' AS fullvisitorid, ARRAY<STRUCT<eventInfo STRUCT<eventAction STRING > >> [STRUCT(STRUCT('landing_page' AS eventAction) AS eventInfo)] AS hits UNION ALL 
    SELECT '1' AS fullvisitorid, ARRAY<STRUCT<eventInfo STRUCT<eventAction STRING > >> [STRUCT(STRUCT('landing_page' AS eventAction) AS eventInfo), STRUCT(STRUCT('landing_page' AS eventAction) AS eventInfo)] AS hits UNION ALL 
    SELECT '1' AS fullvisitorid, ARRAY<STRUCT<eventInfo STRUCT<eventAction STRING > >> [STRUCT(STRUCT('landing_page' AS eventAction) AS eventInfo), STRUCT(STRUCT('model_selection_page' AS eventAction) AS eventInfo)] AS hits UNION ALL 
    SELECT '1' AS fullvisitorid, ARRAY<STRUCT<eventInfo STRUCT<eventAction STRING > >> [STRUCT(STRUCT('model_selection_page' AS eventAction) AS eventInfo), STRUCT(STRUCT('model_selection_page' AS eventAction) AS eventInfo)] AS hits 
) 

SELECT 
    SUM((SELECT COUNTIF(eventInfo.eventAction = 'landing_page') FROM UNNEST(hits))) Landing_Page, 
    SUM((SELECT COUNTIF(eventInfo.eventAction = 'model_selection_page') FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventAction = 'landing_page'))) Model_Selection 
FROM data 
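
To see what the query above counts without running it in BigQuery, here is a minimal plain-Python sketch of the same logic on the same four mock sessions. It is only an illustration: the `sessions` list and the `hit_level_funnel` function are hypothetical names, and each hit is reduced to its `eventAction` string.

```python
# Plain-Python stand-in for the mock `data` CTE: one dict per session row,
# with each hit reduced to its eventAction string (a simplification).
sessions = [
    {"fullvisitorid": "1", "hits": ["landing_page"]},
    {"fullvisitorid": "1", "hits": ["landing_page", "landing_page"]},
    {"fullvisitorid": "1", "hits": ["landing_page", "model_selection_page"]},
    {"fullvisitorid": "1", "hits": ["model_selection_page", "model_selection_page"]},
]


def hit_level_funnel(rows):
    """Return (landing hits, model-selection hits in sessions with a landing hit)."""
    # SUM over sessions of COUNTIF(eventAction = 'landing_page').
    landing = sum(row["hits"].count("landing_page") for row in rows)
    # Same for 'model_selection_page', but only in sessions where the
    # EXISTS(...) condition holds, i.e. the session has a landing hit.
    model_selection = sum(
        row["hits"].count("model_selection_page")
        for row in rows
        if "landing_page" in row["hits"]
    )
    return landing, model_selection


print(hit_level_funnel(sessions))  # (4, 1)
```

Both numbers are totals of hits, not of sessions or users. Also note that the EXISTS test only asks whether the session contains a 'landing_page' hit anywhere; it does not check that the hit came before 'model_selection_page'.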

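Note that building this type of report in GA can be a bit tricky, because you need to select visitors who fired the event 'landing_page' at least once and then fired the event 'model_selection_page'. Make sure you've built this report correctly in GA (one way might be to first build a custom report containing only the customers who fired 'landing_page', and then apply a second filter looking for 'model_selection_page').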

[EDIT]:

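In your comment you asked about doing this count at the session level and at the user level. For a per-session count, you can limit the result of each subquery evaluation to 1, like this: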

SELECT 
    SUM((SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventAction = 'landing_page' LIMIT 1)) Landing_Page, 
    SUM((SELECT 1 FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventAction = 'landing_page') AND eventInfo.eventAction = 'model_selection_page' LIMIT 1)) Model_Selection 
FROM data 
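
As a rough check of the per-session version, the following plain-Python sketch applies the same rule to the four mock sessions used earlier. The names `sessions` and `session_level_funnel` are hypothetical, and hits are again reduced to their `eventAction` strings.

```python
# Same four mock sessions as the `data` CTE, one dict per session row.
sessions = [
    {"fullvisitorid": "1", "hits": ["landing_page"]},
    {"fullvisitorid": "1", "hits": ["landing_page", "landing_page"]},
    {"fullvisitorid": "1", "hits": ["landing_page", "model_selection_page"]},
    {"fullvisitorid": "1", "hits": ["model_selection_page", "model_selection_page"]},
]


def session_level_funnel(rows):
    """Return (sessions with a landing hit, sessions with both events)."""
    # LIMIT 1 makes each session contribute at most 1; a session without a
    # match yields NULL in SQL, which SUM ignores, so it contributes nothing.
    landing = sum(1 for row in rows if "landing_page" in row["hits"])
    model_selection = sum(
        1
        for row in rows
        if "landing_page" in row["hits"] and "model_selection_page" in row["hits"]
    )
    return landing, model_selection


print(session_level_funnel(sessions))  # (3, 1)
```

Compared with the hit-level totals, the second session now counts once even though it contains two 'landing_page' hits.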

The idea for counting distinct users is the same, but you have to apply a COUNT(DISTINCT) operation, like this:

SELECT 
    COUNT(DISTINCT(SELECT fullvisitorid FROM UNNEST(hits) WHERE eventInfo.eventAction = 'landing_page' LIMIT 1)) Landing_Page, 
    COUNT(DISTINCT(SELECT fullvisitorid FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventAction = 'landing_page') AND eventInfo.eventAction = 'model_selection_page' LIMIT 1)) Model_Selection 
FROM data 
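
The user-level count can be sketched in plain Python as well. Because every row of the mock `data` has the same `fullvisitorid`, the example below uses a different, made-up data set with two visitors; the data and the names `sessions` and `user_level_funnel` are assumptions for illustration only.

```python
# Hypothetical sessions for two visitors (not the original mock data).
sessions = [
    {"fullvisitorid": "1", "hits": ["landing_page"]},
    {"fullvisitorid": "1", "hits": ["landing_page", "model_selection_page"]},
    {"fullvisitorid": "2", "hits": ["landing_page"]},
    {"fullvisitorid": "2", "hits": ["model_selection_page"]},
]


def user_level_funnel(rows):
    """Return (distinct visitors with a landing hit, distinct visitors with
    both events inside the same session)."""
    # COUNT(DISTINCT ...) becomes the size of a set of visitor ids.
    landing_users = {
        row["fullvisitorid"] for row in rows if "landing_page" in row["hits"]
    }
    model_selection_users = {
        row["fullvisitorid"]
        for row in rows
        if "landing_page" in row["hits"] and "model_selection_page" in row["hits"]
    }
    return len(landing_users), len(model_selection_users)


print(user_level_funnel(sessions))  # (2, 1)
```

Visitor "2" fired both events, but in different sessions, so they are not counted in the second step: the EXISTS condition is evaluated per session row. A query that instead joins on `fullvisitorID` across the whole day, as the one in the question does, would count that visitor.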

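Hi Willian, thanks for your answer. That's an interesting approach you've used. Quick question, though: how would I use this structure to distinguish between users and sessions? It looks like it's counting totals. Thanks! – Jacob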

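@Jacob Thanks to another question that referenced this one, I found your comment. Sorry it took so long to reply. I've edited my answer; hopefully it's what you were looking for. Let me know if it works :) –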