2017-08-04

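Aggregating information from two different levels in one query

Is it possible to aggregate information at two different levels using a single query? For example, I have the table below and want to get the number of unique customers who bought a particular item, and also the number of times each customer_id bought a given item_id, divided by the total number of customers.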

Table
customer_id  item_id  bought_date
abc          12       2017-01-01
def          23       2017-01-08
abc          12       2017-01-02
abc          13       2017-01-02
ghi          23       2017-01-02

This is the output I want:

item_id  customer_id  item_count_per_customer  customers_count_per_item  total_customers
12       abc          2                        1                         3
13       abc          1                        1                         3
23       def          1                        2                         3
23       ghi          1                        2                         3

I can get the item_count_per_customer column on its own as follows:

select item_id, customer_id, count(1) as item_count_per_customer 
from table 
group by item_id, customer_id 

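I can also get the customers_count_per_item column on its own as follows:

select item_id, count(distinct customer_id) as customers_count_per_item
from table
group by item_id

I also need the total number of unique customers, as follows:

select count(distinct customer_id) as total_customers
from table

So I need all of this information in one row. Is the only way to do this to combine these 3 queries (probably as subqueries), or is there a more efficient way?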

Answer

Window functions:

select  item_id 
      ,customer_id 
      ,count(*)            as item_count_per_customer 
      ,count(distinct customer_id) over (partition by item_id) as customers_count_per_item 
      ,count(distinct customer_id) over()      as total_customers 

from  mytable 

group by item_id 
      ,customer_id 
; 

+---------+-------------+-------------------------+--------------------------+-----------------+
| item_id | customer_id | item_count_per_customer | customers_count_per_item | total_customers |
+---------+-------------+-------------------------+--------------------------+-----------------+
| 23      | ghi         | 1                       | 2                        | 3               |
| 23      | def         | 1                       | 2                        | 3               |
| 13      | abc         | 1                       | 1                        | 3               |
| 12      | abc         | 2                       | 1                        | 3               |
+---------+-------------+-------------------------+--------------------------+-----------------+
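
If you want to try the idea without a Hive cluster, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. It is not the query above verbatim: SQLite does not accept DISTINCT inside a window function, so this sketch swaps in count(*) over the already-grouped rows (each row is one distinct customer per item after the GROUP BY) and a scalar subquery for total_customers. The table name mytable and the column types are assumptions, and it assumes an SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Sample rows from the question: (customer_id, item_id, bought_date).
rows = [
    ("abc", 12, "2017-01-01"),
    ("def", 23, "2017-01-08"),
    ("abc", 12, "2017-01-02"),
    ("abc", 13, "2017-01-02"),
    ("ghi", 23, "2017-01-02"),
]

conn = sqlite3.connect(":memory:")
# Column types are an assumption; the question does not state them.
conn.execute(
    "create table mytable (customer_id text, item_id integer, bought_date text)"
)
conn.executemany("insert into mytable values (?, ?, ?)", rows)

query = """
select  item_id
       ,customer_id
       -- purchases of this item by this customer
       ,count(*)                                          as item_count_per_customer
       -- after grouping there is one row per (item, customer),
       -- so counting rows per item counts distinct customers
       ,count(*) over (partition by item_id)              as customers_count_per_item
       -- SQLite has no count(distinct ...) over (), so use a scalar subquery
       ,(select count(distinct customer_id) from mytable) as total_customers
from    mytable
group by item_id
       ,customer_id
order by item_id
       ,customer_id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The rows come back ordered by item_id and customer_id, so the order differs from the Hive output above, but the values per row should match.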
Does this work in Hive? – vkaul11

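What you see here is copy-pasted from code that was actually executed, so the answer is "yes". In any case, you should test it on your own system with your own Hive version. –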
