I have this code, which builds a huge filter in Pig:
large = load 'a super large file';
CC = FILTER large BY $19 == 'abc' OR $20 == 'abc'
OR $19 == 'def' OR $20 == 'def' ....;
The number of OR conditions can grow to 100 or even thousands.
Is there a better way to do this?
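Yes. Put the values you are comparing against in a separate file, load that file into a relation, and join the two relations on the column. If you have to filter on several columns, create one filter file per column. Here is the version for 2 columns:
large = load 'a super large file';
-- one value per line: the values that $19 must match
filter1 = load 'file with values needed to compare with $19';
-- one value per line: the values that $20 must match
filter2 = load 'file with values needed to compare with $20';
-- an inner join keeps only the rows of large whose key appears in the filter file
f1 = JOIN large BY $19, filter1 BY $0;
f2 = JOIN large BY $20, filter2 BY $0;
-- UNION keeps duplicates: a row matching on both $19 and $20 appears twice
final = UNION f1, f2;
DUMP final;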
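To check that the join-based rewrite selects the same rows as the long OR chain, here is a small in-memory emulation in Python. It is only a sketch of the semantics, not Pig itself: the helper names (`make_row`, `or_chain_filter`, `join_union_filter`) and the sample data are hypothetical, each row of `large` is modelled as a tuple indexed like Pig's positional fields, and the filter files are plain lists of values. It also makes the UNION caveat visible: a row that matches on both columns comes out twice.

```python
# Hypothetical emulation of the Pig script: tuples stand in for rows of `large`,
# and lists stand in for the single-column filter files filter1 and filter2.

def make_row(id_, c19, c20):
    """Build a 21-field row with an id in $0 and the compared values in $19/$20."""
    row = [None] * 21
    row[0] = id_
    row[19] = c19
    row[20] = c20
    return tuple(row)

def or_chain_filter(rows, values19, values20):
    """What FILTER large BY $19 == ... OR $20 == ... computes."""
    return [r for r in rows if r[19] in values19 or r[20] in values20]

def join_union_filter(rows, filter1, filter2):
    """JOIN on $19, JOIN on $20, then UNION (keeping only the large columns)."""
    # An inner join emits one output row per matching (row, filter value) pair.
    f1 = [r for r in rows for v in filter1 if r[19] == v]
    f2 = [r for r in rows for v in filter2 if r[20] == v]
    # Like Pig's UNION, plain concatenation: duplicates are not removed.
    return f1 + f2

rows = [
    make_row(1, "abc", "xyz"),
    make_row(2, "xyz", "def"),
    make_row(3, "abc", "def"),  # matches on both $19 and $20
    make_row(4, "xyz", "xyz"),  # matches nothing
]
values = ["abc", "def"]

print(sorted(r[0] for r in or_chain_filter(rows, set(values), set(values))))  # [1, 2, 3]
print(sorted(r[0] for r in join_union_filter(rows, values, values)))         # [1, 2, 3, 3]
```

The two results contain the same rows, but row 3 appears twice after the union. If that matters, one option in Pig is to project `final` back to the columns of `large` and apply `DISTINCT`. Duplicate values inside a filter file would multiply output rows in the same way, so it is worth keeping those files free of repeats.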
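Alternatively, you could use a single filter file with one column per filtered field, join on each of its columns to get the separate filtered results, and then union those relations.
large = load 'a super large file';
-- column $0 holds the values for $19, column $1 holds the values for $20
filter_file = load 'file with values in different columns';
f1 = JOIN large BY $19, filter_file BY $0;
f2 = JOIN large BY $20, filter_file BY $1;
final = UNION f1, f2;
DUMP final;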
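The single-file variant can be emulated the same way. In this sketch (again hypothetical names and data, not Pig code), each filter-file line is a pair: the first field is matched against $19, the second against $20, and `None` stands in for an empty cell. Pig does not match null join keys, so the emulation skips `None` values. It shows that the two columns act independently: a value listed only in the $20 column does not select a row through $19.

```python
# Hypothetical emulation of the single-filter-file variant.
# filter_rows[i][0] is compared with $19 and filter_rows[i][1] with $20.

def make_row(id_, c19, c20):
    """Build a 21-field row with an id in $0 and the compared values in $19/$20."""
    row = [None] * 21
    row[0] = id_
    row[19] = c19
    row[20] = c20
    return tuple(row)

def join_union_single_file(rows, filter_rows):
    """JOIN large BY $19 with column $0, JOIN large BY $20 with column $1, then UNION."""
    # None models an empty cell; null keys never match in a Pig join.
    f1 = [r for r in rows for fr in filter_rows if fr[0] is not None and r[19] == fr[0]]
    f2 = [r for r in rows for fr in filter_rows if fr[1] is not None and r[20] == fr[1]]
    return f1 + f2

filter_rows = [
    ("abc", "def"),
    ("ghi", None),  # the $20 column has fewer values than the $19 column
]
rows = [
    make_row(1, "abc", "zzz"),  # selected via $19 == 'abc'
    make_row(2, "zzz", "def"),  # selected via $20 == 'def'
    make_row(3, "ghi", "zzz"),  # selected via $19 == 'ghi'
    make_row(4, "def", "abc"),  # not selected: 'def' is only listed for $20
]

print(sorted(r[0] for r in join_union_single_file(rows, filter_rows)))  # [1, 2, 3]
```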