2017-06-04 305 views

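For each fullvisitorId, I am trying to get all visitIds between date_1 and date_2. The range is of course different for each user, so what I need is the visitIds that fall within each user's own date range.

Can anyone give me any pointers on how to do this?

For example:

  • user_1: I want all visitIds between June 1 and June 20
  • user_2: I want all visitIds between June 12 and June 27 ... and so on

date_1 and date_2 correspond to important actions the user took on the site (matched as events): downloading a trial and making a purchase.

Thanks in advance for any leads.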

Answers


One possible way to solve this is to use analytic functions. For example:

#standardSQL 
WITH data AS(
    select '1' as user, '1' as visitid, '20170520' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event1' as eventCategory) as eventInfo)] hits UNION ALL 
    select '1' as user, '2' as visitid, '20170521' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL 
    select '1' as user, '3' as visitid, '20170522' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event2' as eventCategory) as eventInfo)] hits UNION ALL 
    select '1' as user, '4' as visitid, '20170523' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL 

    select '2' as user, '1' as visitid, '20170520' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event1' as eventCategory) as eventInfo)] hits UNION ALL 
    select '2' as user, '2' as visitid, '20170521' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event2' as eventCategory) as eventInfo)] hits UNION ALL 
    select '2' as user, '3' as visitid, '20170522' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL 

    select '3' as user, '1' as visitid, '20170520' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event1' as eventCategory) as eventInfo)] hits UNION ALL 
    select '3' as user, '2' as visitid, '20170521' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL 
    select '3' as user, '3' as visitid, '20170522' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits 
) 

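-- Keep only the visits that fall inside each user's own date window.
SELECT
    user,
    visitid,
    date
FROM (
    SELECT
        user,
        visitid,
        date,
        -- earliest date on which the user triggered 'event1' (start of the window)
        MIN(CASE WHEN hits.eventInfo.eventCategory = 'event1' THEN date END) OVER(PARTITION BY user) min_date,
        -- latest date on which the user triggered 'event2' (end of the window)
        MAX(CASE WHEN hits.eventInfo.eventCategory = 'event2' THEN date END) OVER(PARTITION BY user) max_date
    -- the comma join with UNNEST yields one row per hit, so a visit with several hits appears once per hit
    FROM data,
    UNNEST(hits) hits
)
WHERE date BETWEEN min_date AND max_date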

Here, data is a mock of your ga_sessions data (I renamed 'fullvisitorid' to 'user').

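This assumes that a given user can trigger the date_1 and date_2 events several times (which is why the query takes the MIN and the MAX respectively), and that the events are stored in the eventCategory field. If your "download" and "purchase" events are defined at the session level, I would suggest using the customDimensions field instead of hits.eventInfo.eventCategory.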
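As a rough sketch of that customDimensions suggestion, the query below first flags each session from a session-level custom dimension and then applies the same window logic. It assumes, purely for illustration, that custom dimension index 1 stores the action name and that the values are 'download' and 'purchase'. Neither of those comes from the question, and the mock rows are made up.

```sql
#standardSQL
-- Assumption: custom dimension index 1 holds the action name ('download' / 'purchase').
WITH data AS (
    SELECT '1' AS user, '1' AS visitid, '20170520' AS date,
        ARRAY<STRUCT<index INT64, value STRING>>[STRUCT(1 AS index, 'download' AS value)] AS customDimensions UNION ALL
    SELECT '1', '2', '20170521', ARRAY<STRUCT<index INT64, value STRING>>[] UNION ALL
    SELECT '1', '3', '20170522', ARRAY<STRUCT<index INT64, value STRING>>[STRUCT(1 AS index, 'purchase' AS value)] UNION ALL
    SELECT '1', '4', '20170523', ARRAY<STRUCT<index INT64, value STRING>>[]
),
flagged AS (
    -- One row per session; no join against hits is needed here.
    SELECT
        user,
        visitid,
        date,
        EXISTS(SELECT 1 FROM UNNEST(customDimensions) AS cd
               WHERE cd.index = 1 AND cd.value = 'download') AS is_download,
        EXISTS(SELECT 1 FROM UNNEST(customDimensions) AS cd
               WHERE cd.index = 1 AND cd.value = 'purchase') AS is_purchase
    FROM data
)
SELECT
    user,
    visitid,
    date
FROM (
    SELECT
        user,
        visitid,
        date,
        MIN(IF(is_download, date, NULL)) OVER (PARTITION BY user) AS min_date,
        MAX(IF(is_purchase, date, NULL)) OVER (PARTITION BY user) AS max_date
    FROM flagged
)
WHERE date BETWEEN min_date AND max_date
```

With these mock rows, min_date is 20170520 and max_date is 20170522 for user 1, so visits 1 to 3 should come back and visit 4 should be filtered out. Because the flag is computed per session, each visit appears at most once.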

Besides analytic functions, you can also work with ARRAYs and STRUCTs in Standard SQL:

-- Reuses the data CTE defined in the first query; returns one row per user with an array of matching visits.
SELECT
    user,
    ARRAY(SELECT AS STRUCT visitid, date FROM UNNEST(user_data) WHERE date BETWEEN min_date AND max_date) user_data
FROM (
    SELECT
        user,
        ARRAY_AGG((SELECT AS STRUCT visitid, date)) user_data,
        MIN(CASE WHEN EXISTS(SELECT 1 FROM UNNEST(hits) hits WHERE hits.eventInfo.eventCategory = 'event1') THEN date END) min_date,
        MAX(CASE WHEN EXISTS(SELECT 1 FROM UNNEST(hits) hits WHERE hits.eventInfo.eventCategory = 'event2') THEN date END) max_date
    FROM data
    GROUP BY user
)
-- drop users with no visit inside their window
WHERE ARRAY_LENGTH(ARRAY(SELECT AS STRUCT visitid, date FROM UNNEST(user_data) WHERE date BETWEEN min_date AND max_date)) > 0

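If the assumptions I made do not match your data, you can adapt these techniques to query what you need. You can also use the mock data for testing (and adjust it so that it resembles your dataset more closely).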
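For instance, pointing the same idea at a real export table might look roughly like the sketch below. The table name `my-project.my_dataset.ga_sessions_*` and the June 2017 suffix range are placeholders, and 'event1' / 'event2' are still the stand-in event names from the mock data; replace them with whatever identifies your download and purchase events.

```sql
#standardSQL
-- Placeholder table name and date range; adjust both to your own export.
WITH flagged AS (
    SELECT
        fullVisitorId,
        visitId,
        date,
        -- TRUE when any hit in the session matches the event
        EXISTS(SELECT 1 FROM UNNEST(hits) AS h
               WHERE h.eventInfo.eventCategory = 'event1') AS has_event1,
        EXISTS(SELECT 1 FROM UNNEST(hits) AS h
               WHERE h.eventInfo.eventCategory = 'event2') AS has_event2
    FROM `my-project.my_dataset.ga_sessions_*`
    WHERE _TABLE_SUFFIX BETWEEN '20170601' AND '20170630'
)
SELECT
    fullVisitorId,
    visitId,
    date
FROM (
    SELECT
        fullVisitorId,
        visitId,
        date,
        MIN(IF(has_event1, date, NULL)) OVER (PARTITION BY fullVisitorId) AS min_date,
        MAX(IF(has_event2, date, NULL)) OVER (PARTITION BY fullVisitorId) AS max_date
    FROM flagged
)
WHERE date BETWEEN min_date AND max_date
```

Users who never triggered one of the two events end up with a NULL bound, so the BETWEEN filter drops all of their visits.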


Thanks @Will, this helps! :)
