2013-08-29 54 views

How can I filter an (a, b) relation by (b, a)?

I have a generic relation like this:

DUMP A; 
(a, b) 
(a, c) 
(a, d) 
(b, a) 
(d, a) 
(d, b) 

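Notice that (a, b) has a matching reversed tuple (b, a), but (d, b) has no such partner. I want to filter out these "unpaired" tuples.

The final result should look like this: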

DUMP R; 
(a, b) 
(a, d) 
(b, a) 
(d, a) 

How can I write this in Pig?

I can solve it with the following code, but the CROSS operation is too expensive:

A_cp = FOREACH A GENERATE u1, u2; 
X = CROSS A, A_cp; 
F = FILTER X BY ($0 == $3 AND $1 == $2); 
R = FOREACH F GENERATE $0, $1; 
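
To make the cost of the CROSS version concrete, here is a rough local model in plain Python (not Pig). It is only a sketch: the list `A` mirrors the sample data above, and `candidates` and `matches` are illustrative names. CROSS pairs every row with every row, so the filter has to scan n * n candidate tuples to find a handful of matches.

```python
from itertools import product

# Sample data copied from DUMP A above.
A = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("d", "a"), ("d", "b")]

# Model of CROSS A, A_cp: every row paired with every row.
candidates = list(product(A, A))

# Model of FILTER X BY ($0 == $3 AND $1 == $2), keeping only ($0, $1).
matches = [left for (left, right) in candidates
           if left[0] == right[1] and left[1] == right[0]]

print(len(candidates), len(matches))  # 36 candidates scanned, 4 rows kept
```

With 6 input rows this is harmless, but the candidate count grows quadratically with the size of the relation.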

Answer


This is the output of DESCRIBE A ; DUMP A ; on my side:

A: {first: chararray,second: chararray} 
(a,b) 
(a,c) 
(a,d) 
(b,a) 
(d,a) 
(d,b) 

Here is one way you can solve this:

A = LOAD 'foo.in' AS (first:chararray, second:chararray) ; 
-- A relation can't be joined with itself directly, so duplicate A first. 
A2 = FOREACH A GENERATE * ; 

-- Join A to its copy on A.second == A2.first, giving rows such as (b,a,a,c). 
B = JOIN A BY second, A2 BY first ; 

-- Keep only the rows where the first field equals the last field, e.g. (b,a,a,b). 
C = FOREACH (FILTER B BY A::first == A2::second) 
    -- Now we project out just one side of the pair. 
    GENERATE A::first AS first, A::second AS second ; 

Output:

C: {first: chararray,second: chararray} 
(b,a) 
(d,a) 
(a,b) 
(a,d) 

Update: As WinnieNicklaus pointed out, this can be shortened to:

B = FOREACH (JOIN A BY (first, second), A2 BY (second, first)) 
    GENERATE A::first AS first, A::second AS second ; 
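
For anyone who wants to sanity-check the expected result outside Pig, the following is a minimal Python sketch of what the two-key join computes. It is not Pig code and not how Pig executes the join; `keep_reciprocal` is a hypothetical helper name, and the set lookup merely stands in for matching on the (first, second) / (second, first) keys.

```python
def keep_reciprocal(pairs):
    """Return the pairs (x, y) whose reverse (y, x) is also present, in input order."""
    seen = set(pairs)  # hash lookup stands in for the join key match
    return [(x, y) for (x, y) in pairs if (y, x) in seen]


# Sample data copied from DUMP A above.
A = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("d", "a"), ("d", "b")]
R = keep_reciprocal(A)
print(R)  # [('a', 'b'), ('a', 'd'), ('b', 'a'), ('d', 'a')]
```

One difference to keep in mind: this sketch emits each input row at most once, whereas a join emits one row per matching combination, so if A contains duplicate tuples the Pig result will contain repeated rows. Applying DISTINCT to the result is one way to collapse them.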

Thanks, I'll try your code. I was able to get the job done with the code below, but the CROSS operation is too expensive: A_cp = FOREACH A GENERATE u1, u2; X = CROSS A, A_cp; F = FILTER X BY ($0 == $3 AND $1 == $2); R = FOREACH F GENERATE $0, $1; – user2730009


@user2730009 An inner join should be noticeably cheaper. – mr2ert


It works fine! Thx – user2730009
