2012-02-12 90 views
16

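Selecting count distinct using Pig Latin

I need help with this Pig script. I am only getting a single record back. I am selecting 2 columns and doing a count(distinct) on another column, while also using a where-like clause to find a particular description (desc).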

Here is the SQL, followed by the Pig script I am trying to write.

/* 
    For example in sql: 
    select domain, count(distinct(segment)) as segment_cnt 
    from table 
    where desc='ABC123' 
    group by domain 
    order by segment_cnt desc; 
    */ 

    A = LOAD 'myoutputfile' USING PigStorage('\u0005') 
      AS (
       domain:chararray, 
       segment:chararray, 
       desc:chararray 
       ); 
B = filter A by (desc=='ABC123'); 
C = foreach B generate domain, segment; 
D = DISTINCT C; 
E = group D all; 
F = foreach E generate group, COUNT(D) as segment_cnt; 
G = order F by segment_cnt DESC; 

Answers

30

You can group on each domain and then use the nested FOREACH syntax to count the number of distinct elements in each group:

D = group C by domain; 
E = foreach D { 
    unique_segments = DISTINCT C.segment; 
    generate group, COUNT(unique_segments) as segment_cnt; 
}; 
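
For readers less familiar with Pig, here is a minimal plain-Python sketch of what the grouped version computes and why the question's GROUP ... ALL script yields only one record. The sample rows and the function name `distinct_segments_per_domain` are hypothetical, made up for illustration; this models the semantics only and is not Pig code.

```python
from collections import defaultdict

# Hypothetical sample rows: (domain, segment, desc)
rows = [
    ("a.com", "s1", "ABC123"),
    ("a.com", "s1", "ABC123"),  # duplicate segment, counted once
    ("a.com", "s2", "ABC123"),
    ("b.com", "s1", "ABC123"),
    ("b.com", "s3", "OTHER"),   # dropped by the filter
]

def distinct_segments_per_domain(rows, wanted_desc):
    """Count distinct segments per domain for rows whose desc matches."""
    segments_by_domain = defaultdict(set)
    for domain, segment, desc in rows:
        if desc == wanted_desc:                      # like FILTER ... BY desc == '...'
            segments_by_domain[domain].add(segment)  # like GROUP BY domain + nested DISTINCT
    counts = [(domain, len(segs)) for domain, segs in segments_by_domain.items()]
    # like ORDER ... BY segment_cnt DESC
    return sorted(counts, key=lambda pair: pair[1], reverse=True)

per_domain = distinct_segments_per_domain(rows, "ABC123")

# Roughly what the question's script does: DISTINCT over (domain, segment)
# pairs, then GROUP ... ALL, which collapses everything into one count.
single_total = len({(d, s) for d, s, desc in rows if desc == "ABC123"})

print(per_domain)    # one row per domain
print(single_total)  # a single number
```

Grouping by `domain` keeps one output row per domain, whereas grouping everything together can only ever produce one row.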
+5

I think this is perfect; it should be unique_segments = DISTINCT C.segment; – 2014-02-03 15:23:30

1

It may be cleaner to define this as a macro:

DEFINE DISTINCT_COUNT(A, c) RETURNS dist { 
    temp = FOREACH $A GENERATE $c;
    dist = DISTINCT temp;
    groupAll = GROUP dist ALL;
    $dist = FOREACH groupAll GENERATE COUNT(dist);
} 

Usage:

X = LOAD 'data' AS (x: int);

Y = DISTINCT_COUNT(X, x);

If you need to use it inside a FOREACH instead, the simplest way is something like this:

...GENERATE COUNT(Distinct(x))...

Tested on Pig 12.
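
As a rough illustration of what the macro returns, the following plain-Python sketch counts the distinct values of one column across a whole relation. The helper name `distinct_count` and the sample data are hypothetical. Skipping `None` reflects my understanding that Pig's COUNT ignores nulls; treat that detail as an assumption and check it against your Pig version.

```python
def distinct_count(rows, column):
    """Number of distinct non-null values of `column` across all rows."""
    # Project the column, de-duplicate with a set, then count;
    # this mirrors FOREACH ... GENERATE, DISTINCT, GROUP ... ALL, COUNT.
    values = {row[column] for row in rows if row[column] is not None}
    return len(values)

# Hypothetical data standing in for: X = LOAD 'data' AS (x: int);
data = [{"x": 1}, {"x": 2}, {"x": 2}, {"x": 3}, {"x": None}]

count = distinct_count(data, "x")
print(count)
```

The result is a single number for the whole relation, with no per-group breakdown.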

0

If you don't want to count over any particular group, use this:

G = FOREACH (GROUP A ALL) { 
    unique = DISTINCT A.field; 
    GENERATE COUNT(unique) AS ct; 
}; 

This will give you just a single number.