2017-05-09

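I have the following data, which lists all transactions in which each customer bought items from several categories. I need to find pairs of customers who do not share even a single category. How can I compare sets in Hive?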

Customer_id category_id 
    21   3 
    21   5 
    31   4 
    31   1 
    24   3 
    24   6 
    22   6 
    22   5 

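I thought of first using collect_set and then cross joining the resulting sets to compare them, but I don't know of any such set-comparison function in Hive. Is there a simpler way to do this? For the data above, the output should be (21,31), (31,24), (31,22), i.e. the pairs that do not share any category_id.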

SELECT 
    customer_id, COLLECT_SET(category_id) AS aggr_set  -- distinct categories per customer
FROM 
    tablename 
GROUP BY 
    customer_id;
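To check the expected output, the collect_set idea can be sketched in plain Python rather than HiveQL. The sketch builds one set of categories per customer (the analogue of `GROUP BY customer_id` with `COLLECT_SET`) and then tests every unordered pair with `set.isdisjoint`. This is an in-memory illustration only. `ROWS` and `disjoint_pairs` are hypothetical names, and the data is the sample from the question.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows from the question: (customer_id, category_id)
ROWS = [(21, 3), (21, 5), (31, 4), (31, 1),
        (24, 3), (24, 6), (22, 6), (22, 5)]


def disjoint_pairs(rows):
    """Return customer pairs whose category sets share no element (hypothetical helper)."""
    # Analogue of GROUP BY customer_id + COLLECT_SET(category_id)
    sets = defaultdict(set)
    for customer, category in rows:
        sets[customer].add(category)
    # Compare every unordered pair of customers exactly once
    return [(a, b) for a, b in combinations(sorted(sets), 2)
            if sets[a].isdisjoint(sets[b])]


print(disjoint_pairs(ROWS))
```

With the sample data this prints `[(21, 31), (22, 31), (24, 31)]`. These are the three expected pairs, each written with the smaller id first, so (31,24) appears as (24, 31).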

Answers


You can get all pairs of customers with a cross join, then aggregate and keep only the pairs with no matching category:

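select t1.customer_id as c1, t2.customer_id as c2 
from t t1 cross join 
    t t2 
where t1.customer_id < t2.customer_id  -- each unordered pair once, no self-pairs
group by t1.customer_id, t2.customer_id 
having sum(case when t1.category_id = t2.category_id then 1 else 0 end) = 0;  -- no shared category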
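The cross-join answer can be simulated row by row in Python to show why it works. Every (t1 row, t2 row) combination for a customer pair adds 1 to the sum when the categories are equal, so a sum of 0 means the two customers share nothing. This is a sketch, not Hive code. `pairs_via_cross_join` is a hypothetical helper, and its `c1 < c2` check corresponds to a `t1.customer_id < t2.customer_id` filter. Without that filter, every qualifying pair would be reported twice, as (a, b) and (b, a). Self-pairs drop out either way, because a customer always matches itself.

```python
from collections import defaultdict

# Sample rows from the question: (customer_id, category_id)
ROWS = [(21, 3), (21, 5), (31, 4), (31, 1),
        (24, 3), (24, 6), (22, 6), (22, 5)]


def pairs_via_cross_join(rows):
    """Simulate CROSS JOIN + GROUP BY + HAVING SUM(match) = 0 (hypothetical helper)."""
    matches = defaultdict(int)  # (c1, c2) -> number of row combinations with equal category
    for c1, cat1 in rows:
        for c2, cat2 in rows:
            if c1 < c2:  # keep each unordered customer pair once
                matches[(c1, c2)] += 1 if cat1 == cat2 else 0
    # HAVING sum(case when ... then 1 else 0 end) = 0
    return sorted(pair for pair, n in matches.items() if n == 0)


print(pairs_via_cross_join(ROWS))
```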

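Use a self-join to get the pairs of customers. For each pair, count the rows whose categories do not match and the total number of rows. If the two counts are equal, none of the pair's category_ids match.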

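select c1, c2 
from (
    select t1.customer_id as c1, t2.customer_id as c2,
           -- row combinations whose categories differ
           sum(case when t1.category_id = t2.category_id then 0 else 1 end) as mismatches,
           -- all row combinations for this customer pair
           count(*) as combinations 
    from tablename t1 
    -- non-equality ON condition needs Hive 2.2.0+; older versions can use CROSS JOIN ... WHERE
    join tablename t2 on t1.customer_id < t2.customer_id 
    group by t1.customer_id, t2.customer_id 
) t 
where combinations = mismatches;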
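The counting condition in this answer can be sketched the same way. For each customer pair, `count(*)` is the number of row combinations and `mismatches` is the number of combinations whose categories differ. The two are equal exactly when no combination matches, which is the same test as a match sum of 0. `pairs_via_mismatch_count` is a hypothetical in-memory helper, not Hive code.

```python
from collections import defaultdict

# Sample rows from the question: (customer_id, category_id)
ROWS = [(21, 3), (21, 5), (31, 4), (31, 1),
        (24, 3), (24, 6), (22, 6), (22, 5)]


def pairs_via_mismatch_count(rows):
    """Simulate the self-join with mismatches = combinations (hypothetical helper)."""
    mismatches = defaultdict(int)    # sum(case when equal then 0 else 1 end)
    combinations = defaultdict(int)  # count(*)
    for c1, cat1 in rows:
        for c2, cat2 in rows:
            if c1 < c2:  # join condition t1.customer_id < t2.customer_id
                combinations[(c1, c2)] += 1
                mismatches[(c1, c2)] += 0 if cat1 == cat2 else 1
    # where combinations = mismatches
    return sorted(p for p in combinations if combinations[p] == mismatches[p])


print(pairs_via_mismatch_count(ROWS))
```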