2016-04-23 94 views
0

SQL/Hive query to find the third distinct item bought by all users?

I have a table structured as follows:

UserID  itemName  action
------------------------
1       a         bought
2       b         viewed
3       c         bought
1       b         bought
2       c         bought
1       c         bought
3       b         viewed

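Now I want to find the third distinct item (ranked by number of purchases) that was bought (action = 'bought') by all users. Can you help me solve this? Sorry the table formatting is off.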

+0

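How do you define the order in which items (itemName) were bought? Do you want to identify the "third" of all the items bought by every user, or do you want to know, for each user, which item was the "third" one they bought? – collapsar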

+0

You need another field that contains a timestamp. This may be a duplicate of [this question](http://stackoverflow.com/questions/400712/how-to-do-equivalent-of-limit-distinct) – 4castle

+0

@collapsar By count, i.e. the item with the third-highest count among the items that everyone bought – user3396729

Answers

1

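From your description, I think this approach should work: first select the top 3 by count, grouping by item and sorting by count descending. Then, from that set, select the top 1 sorted by count ascending. Keep in mind that I'm not 100% familiar with HiveSQL, but this SQL should be very close to standard: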

SELECT TOP 1 itemName
FROM (
     SELECT TOP 3 itemName, COUNT(*) AS boughtCount
     FROM MyTable
     WHERE action = 'bought'
     GROUP BY itemName
     ORDER BY boughtCount DESC
    ) AS Top3
ORDER BY boughtCount
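Here is a minimal sketch of the reverse-sort trick above. It assumes SQLite, via Python's standard `sqlite3` module, as a stand-in for Hive/MSSQL, so `TOP n` becomes `LIMIT n`. The rows are copied from the question, and the helper names `make_db` and `third_most_bought` are hypothetical. One detail matters: items a and b are tied at one purchase each. The outer sort therefore has to be the exact reverse of the inner one, tie-breaker included, or the "third" item is arbitrary.

```python
import sqlite3

# Sample rows copied from the question: (UserID, itemName, action)
ROWS = [
    (1, "a", "bought"), (2, "b", "viewed"), (3, "c", "bought"),
    (1, "b", "bought"), (2, "c", "bought"), (1, "c", "bought"),
    (3, "b", "viewed"),
]

def make_db():
    """Hypothetical helper: build an in-memory table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (UserID INTEGER, itemName TEXT, action TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", ROWS)
    return conn

THIRD_BY_COUNT = """
SELECT itemName
FROM (
    SELECT itemName, COUNT(*) AS boughtCount
    FROM MyTable
    WHERE action = 'bought'
    GROUP BY itemName
    ORDER BY boughtCount DESC, itemName ASC   -- top 3, ties broken by name
    LIMIT 3
) AS Top3
ORDER BY boughtCount ASC, itemName DESC       -- exact reverse of the inner order
LIMIT 1
"""

def third_most_bought(conn):
    """Return the third item by purchase count, or None if the query returns no row."""
    row = conn.execute(THIRD_BY_COUNT).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    print(third_most_bought(make_db()))  # prints b
```

With the counts c=3, a=1, b=1, the inner query yields c, a, b, and reversing that order puts b first.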

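Edit: following the clarification in the comments:

Edit 2: This has been tested and works in MSSQL; some of the syntax may need adjusting for HiveSQL.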

SELECT TOP 1 itemName
FROM (
     -- Keep only items bought by every distinct userId in the table (one ItemCountByUser
     -- row per buyer), then take the top 3 by total purchases, descending.
     SELECT TOP 3 itemName, SUM(boughtCount) AS totalBought
     FROM (
       -- Count purchases per (item, user) pair
       SELECT itemName, userId, COUNT(*) AS boughtCount
       FROM MyTable
       WHERE action = 'bought'
       GROUP BY itemName, userId
      ) AS ItemCountByUser
     GROUP BY itemName
     HAVING COUNT(*) = (SELECT COUNT(*) FROM (SELECT DISTINCT userId FROM MyTable) AS UserCount)
     ORDER BY totalBought DESC
    ) AS MostBought
ORDER BY totalBought
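Here is a sketch of the Edit 2 logic in SQLite, used as an assumed stand-in. CTEs replace the nested derived tables, and `make_db`, `top3_then_last` and `third_via_offset` are hypothetical names. On the question's sample data only item c was bought by all three users. That exposes a caveat of the TOP 3 / TOP 1 pattern: when fewer than three items qualify, it returns the last one that does rather than nothing. Skipping two rows with SQLite's `LIMIT 1 OFFSET 2` returns no row instead. Check whether your Hive version accepts an offset in `LIMIT` before relying on it.

```python
import sqlite3

# Sample rows copied from the question: (UserID, itemName, action)
ROWS = [
    (1, "a", "bought"), (2, "b", "viewed"), (3, "c", "bought"),
    (1, "b", "bought"), (2, "c", "bought"), (1, "c", "bought"),
    (3, "b", "viewed"),
]

def make_db():
    """Hypothetical helper: build an in-memory table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (UserID INTEGER, itemName TEXT, action TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", ROWS)
    return conn

# Items bought by every distinct user in the table, with their total purchases.
BOUGHT_BY_ALL = """
WITH ItemCountByUser AS (
    SELECT itemName, UserID, COUNT(*) AS boughtCount
    FROM MyTable
    WHERE action = 'bought'
    GROUP BY itemName, UserID
), BoughtByAll AS (
    SELECT itemName, SUM(boughtCount) AS totalBought
    FROM ItemCountByUser
    GROUP BY itemName
    HAVING COUNT(*) = (SELECT COUNT(DISTINCT UserID) FROM MyTable)
)
"""

# Same shape as Edit 2: top 3 descending, then the first row of the reversed order.
NESTED = BOUGHT_BY_ALL + """
SELECT itemName FROM (
    SELECT itemName, totalBought FROM BoughtByAll
    ORDER BY totalBought DESC, itemName ASC LIMIT 3
) ORDER BY totalBought ASC, itemName DESC LIMIT 1
"""

# Skip exactly two rows; yields no row when fewer than three items qualify.
OFFSET = BOUGHT_BY_ALL + """
SELECT itemName FROM BoughtByAll
ORDER BY totalBought DESC, itemName ASC LIMIT 1 OFFSET 2
"""

def _first(conn, sql):
    row = conn.execute(sql).fetchone()
    return row[0] if row else None

def top3_then_last(conn):
    return _first(conn, NESTED)

def third_via_offset(conn):
    return _first(conn, OFFSET)

if __name__ == "__main__":
    conn = make_db()
    print(top3_then_last(conn), third_via_offset(conn))  # prints c None
```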
+0

You don't need to order the outer select, since it only contains one record. – 4castle

+0

But this doesn't guarantee that everyone bought the item it returns. I want an item that was bought by every user. – user3396729

+0

Right. It's mainly there to make the logic clear for the OP. –

0

My understanding is that you want to show the itemNames that have been bought by users 3 or more times...?

SELECT counts.itemName FROM
    (SELECT
     flags.itemName AS itemName,
     SUM(flags.isBought) AS boughtCount
    FROM
     (SELECT
      t.itemName AS itemName,
      CASE
       WHEN (t.action = 'bought')
        THEN (1)
       ELSE (0)
      END AS isBought
     FROM yourTableName AS t) AS flags
    GROUP BY
     flags.itemName) AS counts
WHERE boughtCount > 2;

I haven't tested this...

Please let me know whether this is the solution you need; if not, I can explore other options.
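The nested `CASE` / `SUM` in this answer can be folded into a single `GROUP BY` with a `HAVING` clause; the filter is the same, only written more compactly. Below is a minimal sketch, assuming SQLite through Python's `sqlite3` as a stand-in for Hive and using the question's sample rows. The names `make_db` and `items_bought_more_than_twice` are hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (UserID, itemName, action)
ROWS = [
    (1, "a", "bought"), (2, "b", "viewed"), (3, "c", "bought"),
    (1, "b", "bought"), (2, "c", "bought"), (1, "c", "bought"),
    (3, "b", "viewed"),
]

def make_db():
    """Hypothetical helper: build an in-memory table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTableName (UserID INTEGER, itemName TEXT, action TEXT)")
    conn.executemany("INSERT INTO yourTableName VALUES (?, ?, ?)", ROWS)
    return conn

# Conditional aggregation: count only the 'bought' rows per item, keep counts above 2.
BOUGHT_MORE_THAN_TWICE = """
SELECT itemName
FROM yourTableName
GROUP BY itemName
HAVING SUM(CASE WHEN action = 'bought' THEN 1 ELSE 0 END) > 2
ORDER BY itemName
"""

def items_bought_more_than_twice(conn):
    return [name for (name,) in conn.execute(BOUGHT_MORE_THAN_TWICE)]

if __name__ == "__main__":
    print(items_bought_more_than_twice(make_db()))  # prints ['c']
```

Note that this counts purchases across all users, so on its own it does not check that every user bought the item.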

0

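Please try the following query, which lists the items bought by all users and picks the one in 3rd-highest position:

SELECT t.itemname
FROM (
  SELECT a.itemname, COUNT(a.action) AS boughtcount
  FROM data a
  JOIN (SELECT DISTINCT userid AS id FROM data WHERE action = 'bought') b
    ON a.userid = b.id
  WHERE a.action = 'bought'
  GROUP BY a.itemname
  ORDER BY boughtcount DESC
  LIMIT 3
) t
ORDER BY t.boughtcount ASC
LIMIT 1;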
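This is a quick check of the join in this query. It assumes SQLite via Python's `sqlite3` as a stand-in for Hive, with the question's sample rows in a table named `data`; `make_db` and `counts` are hypothetical helpers. Every row with `action = 'bought'` belongs to a user who appears in the distinct-buyers subquery. The join therefore keeps all of those rows, and the per-item counts come out the same as with the plain `WHERE` filter. On its own, the join does not restrict results to items bought by every user; a condition like the `HAVING COUNT(*) = ...` in the Edit 2 query would.

```python
import sqlite3

# Sample rows copied from the question: (userid, itemname, action)
ROWS = [
    (1, "a", "bought"), (2, "b", "viewed"), (3, "c", "bought"),
    (1, "b", "bought"), (2, "c", "bought"), (1, "c", "bought"),
    (3, "b", "viewed"),
]

def make_db():
    """Hypothetical helper: build an in-memory table named data with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (userid INTEGER, itemname TEXT, action TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)
    return conn

# Per-item counts using the answer's join against distinct buyers.
JOINED = """
SELECT a.itemname, COUNT(a.action) AS boughtcount
FROM data a
JOIN (SELECT DISTINCT userid AS id FROM data WHERE action = 'bought') b
  ON a.userid = b.id
WHERE a.action = 'bought'
GROUP BY a.itemname
ORDER BY a.itemname
"""

# Per-item counts with only the WHERE filter, for comparison.
PLAIN = """
SELECT itemname, COUNT(*) AS boughtcount
FROM data
WHERE action = 'bought'
GROUP BY itemname
ORDER BY itemname
"""

def counts(conn, sql):
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = make_db()
    print(counts(conn, JOINED))
    print(counts(conn, PLAIN))
```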
