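I think the approach in your description sounds right. First select the top 3 by count, grouping by item and sorting by count descending. Then, from that set, select the top 1 sorted by count ascending. Keep in mind that I'm not 100% familiar with HiveSQL, but this SQL code should be pretty close to standard: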
SELECT TOP 1 itemName
FROM (
    -- Top 3 items by number of 'bought' rows
    SELECT TOP 3 itemName, COUNT(*) AS boughtCount
    FROM MyTable
    WHERE action = 'bought'
    GROUP BY itemName
    ORDER BY boughtCount DESC
) AS TopThree  -- a derived table needs an alias in MSSQL
ORDER BY boughtCount
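As far as I know, Hive does not accept TOP and uses LIMIT instead, so a rough HiveQL version of the query above might look like the sketch below. It is untested; the alias TopThree is just an illustrative name, and boughtCount is kept in the outer SELECT on the assumption that some Hive versions will not sort by a column that is not selected.

```sql
-- Untested HiveQL sketch: LIMIT replaces TOP.
SELECT itemName, boughtCount
FROM (
    -- Top 3 items by number of 'bought' rows
    SELECT itemName, COUNT(*) AS boughtCount
    FROM MyTable
    WHERE action = 'bought'
    GROUP BY itemName
    ORDER BY boughtCount DESC
    LIMIT 3
) TopThree
-- Lowest of those three, i.e. the third-highest count
ORDER BY boughtCount ASC
LIMIT 1;
```

If fewer than three items have been bought, this returns the item with the lowest count among those that exist.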
Edit: Following the clarification in the comments:
Edit 2: This is tested and working in MSSQL; some of the syntax may need adjusting for HiveSQL.
SELECT TOP 1 itemId
FROM (
    -- Get the top 3 items that have as many ItemCountByUser entries as there are
    -- distinct userIds in the table (i.e. items every user bought), grouped by
    -- item and sorted by total bought, descending.
    SELECT TOP 3 itemId, SUM(boughtCount) AS totalBought
    FROM (
        -- Count how many times each user bought each item
        SELECT itemId, userId, COUNT(*) AS boughtCount
        FROM MyTable
        WHERE action = 'bought'
        GROUP BY itemId, userId
    ) AS ItemCountByUser
    GROUP BY itemId
    HAVING COUNT(*) = (SELECT COUNT(*) FROM (SELECT DISTINCT userId FROM MyTable) AS UserCount)
    ORDER BY totalBought DESC
) AS MostBought
ORDER BY totalBought
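How do you define an order of purchase on the items bought by ITEMNAME? Do you want to determine the "third" among all items every user has bought, or do you want to know, for each user, which is the "third" item they bought? – collapsar
You will need another field that contains a timestamp. This may be a duplicate of [this question](http://stackoverflow.com/questions/400712/how-to-do-equivalent-of-limit-distinct) – 4castle
@collapsar By count, i.e. the item with the third-highest count among the items that everyone bought – user3396729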