2017-05-09 86 views
0

Delete duplicate rows in Google BigQuery SQL

I have a table called result. I am querying Google Analytics (GA) property data in BigQuery:

SELECT 
    Date, 
    totals.pageviews, 
    h.transaction.transactionId, 
    h.item.itemQuantity, 
    h.transaction.transactionRevenue, 
    totals.bounces, 
    fullvisitorid, 
    totals.timeOnSite, 
    device.browser, 
    device.deviceCategory, 
    trafficSource.source, 
    channelGrouping, 
    h.page.pagePath, 
    h.eventInfo.eventCategory, 
    device.operatingSystem 
FROM 
    `atomic-life-148403.126959513.ga_sessions_*`, 
    UNNEST(hits) AS h 
WHERE 
    _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','') 
    AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-','')) 
ORDER BY 
    date DESC 


The selected data contains some duplicate records. How can I delete the duplicate records from the table?

I would like to get a result like the following. [image]

+1

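Do you actually want to find and delete the rows, or just hide them from the query results? If the latter, use DISTINCT. If the former, it gets a bit more complicated. – ADyson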

+0

How do I select only the distinct rows? The frequency and the revenue are separate from each other. – bob90937

+0

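Important on SO: you can mark an answer as accepted by using the tick to the left of the posted answer, below the voting. See http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235 for why it is important! Voting on answers is important too. Vote up the answers that are helpful. There is more: you can check what to do when someone answers your question at http://stackoverflow.com/help/someone-answers. –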

Answers

0
SELECT DISTINCT * 
FROM [YourTable] 
+0

I can't just use that. – bob90937

+0

@bob90937, how come? – jarlh

+0

You need to use [standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/) in order to use SELECT DISTINCT. –
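To illustrate the comment above, here is a minimal sketch in BigQuery standard SQL. The inline rows, the CTE name `sample`, and the column values are made up for illustration; in practice the data would come from your own table.

```sql
#standardSQL
-- Hypothetical inline data standing in for a real table.
WITH sample AS (
  SELECT 'T1' AS transactionId, 2 AS itemQuantity UNION ALL
  SELECT 'T1', 2 UNION ALL   -- exact duplicate of the first row
  SELECT 'T2', 1
)
-- DISTINCT collapses rows that are identical in every selected column.
SELECT DISTINCT *
FROM sample;
```

This returns one row for T1 and one for T2. Note that DISTINCT only hides duplicates in the query result; the underlying table is unchanged.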

0

You can use the ROW_NUMBER() analytic function, like this:

select * from (
select *, 
ROW_NUMBER() OVER(PARTITION BY transactionid ORDER BY transactionid) rownum 
from result) xxx 
where rownum = 1; 
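The same idea can be tried on inline data in BigQuery standard SQL. This is only a sketch: the `revenue` column and the sample values are assumptions, and the ORDER BY inside the window decides which copy is kept.

```sql
#standardSQL
-- Hypothetical inline data named after the table in the question.
WITH result AS (
  SELECT 'T1' AS transactionid, 100 AS revenue UNION ALL
  SELECT 'T1', 100 UNION ALL
  SELECT 'T2', 250
)
SELECT * EXCEPT (rownum)   -- drop the helper column from the output
FROM (
  SELECT
    *,
    -- Number the rows within each transactionid, starting at 1.
    ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY revenue DESC) AS rownum
  FROM result
)
WHERE rownum = 1;          -- keep exactly one row per transactionid
```

Unlike DISTINCT, this keeps one row per transactionid even when the other columns differ, so it can discard rows that are not true duplicates.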
0

You can select the unique rows and delete the others:

DELETE FROM MyTable 
LEFT OUTER JOIN (
    SELECT DISTINCT * FROM MyTable 
) as UniqueRows ON 
    MyTable.KeyField= UniqueRows.KeyField 
WHERE 
    UniqueRows.KeyField IS NULL; 
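In BigQuery specifically, DELETE takes only a target table and a WHERE condition, with no JOIN clause, so a common way to physically drop exact duplicates is to rewrite the table from its distinct rows. A minimal sketch, assuming a hypothetical table `mydataset.result` whose columns are all scalar (SELECT DISTINCT does not accept STRUCT or ARRAY columns):

```sql
#standardSQL
-- `mydataset.result` is a hypothetical name; replace it with your own table.
-- This overwrites the table with one copy of each distinct row.
CREATE OR REPLACE TABLE mydataset.result AS
SELECT DISTINCT *
FROM mydataset.result;
```

Because the statement overwrites the table, it is worth copying the table somewhere else first.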
0

Using a GROUP BY with all of your selected columns should remove any truly duplicate rows from the result:

SELECT 
    Date, 
    totals.pageviews, 
    h.transaction.transactionId, 
    h.item.itemQuantity, 
    h.transaction.transactionRevenue, 
    totals.bounces, 
    fullvisitorid, 
    totals.timeOnSite, 
    device.browser, 
    device.deviceCategory, 
    trafficSource.source, 
    channelGrouping, 
    h.page.pagePath, 
    h.eventInfo.eventCategory, 
    device.operatingSystem 
FROM 
    `atomic-life-148403.126959513.ga_sessions_*`, 
    UNNEST(hits) AS h 
WHERE 
    _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','') 
    AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-','')) 
GROUP BY 
    Date, 
    totals.pageviews, 
    h.transaction.transactionId, 
    h.item.itemQuantity, 
    h.transaction.transactionRevenue, 
    totals.bounces, 
    fullvisitorid, 
    totals.timeOnSite, 
    device.browser, 
    device.deviceCategory, 
    trafficSource.source, 
    channelGrouping, 
    h.page.pagePath, 
    h.eventInfo.eventCategory, 
    device.operatingSystem 
ORDER BY 
    date DESC; 
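Grouping by every column also makes it easy to see which rows are duplicated and how often. A small sketch on made-up inline data (the CTE name `sample`, the values, and the `copies` alias are assumptions for illustration):

```sql
#standardSQL
-- Hypothetical inline data standing in for the query result.
WITH sample AS (
  SELECT 'T1' AS transactionId, 2 AS itemQuantity UNION ALL
  SELECT 'T1', 2 UNION ALL
  SELECT 'T2', 1
)
SELECT
  transactionId,
  itemQuantity,
  COUNT(*) AS copies        -- how many identical rows were collapsed
FROM sample
GROUP BY transactionId, itemQuantity
HAVING COUNT(*) > 1;        -- show only the rows that occur more than once
```

Removing the HAVING clause gives the deduplicated result itself, with the copy count alongside each row.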
0

You can use ROW_NUMBER:

-- SQL Server (T-SQL) syntax: deleting through the CTE removes the underlying rows.
WITH CTE AS 
(SELECT *, ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY transactionid) AS rn 
 FROM [YourTable]) 
DELETE FROM CTE 
WHERE rn > 1;  -- keep the first row of each transactionid, delete the rest
1
Below is for BigQuery Standard SQL:

#standardSQL 
SELECT DISTINCT 
    Date, 
    totals.pageviews, 
    h.transaction.transactionId, 
    h.item.itemQuantity, 
    h.transaction.transactionRevenue, 
    totals.bounces, 
    fullvisitorid, 
    totals.timeOnSite, 
    device.browser, 
    device.deviceCategory, 
    trafficSource.source, 
    channelGrouping, 
    h.page.pagePath, 
    h.eventInfo.eventCategory, 
    device.operatingSystem 
FROM 
    `atomic-life-148403.126959513.ga_sessions_*`, 
    UNNEST(hits) AS h 
WHERE 
    _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','') 
    AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-','')) 
ORDER BY 
    date DESC 

As you can see, I just added DISTINCT to your SELECT. See more about SELECT and its modifiers available in BigQuery Standard SQL.