Pig - join not working
I have a problem with a join in Pig. I'll start by giving you some context. Here is my code:
-- START file loading
start_file = LOAD 'dir/start_file.csv' USING PigStorage(';') as (PARTRANGE:chararray, COD_IPUSER:chararray);
-- trim
A = FOREACH start_file GENERATE TRIM(PARTRANGE) AS PARTRANGE, TRIM(COD_IPUSER) AS COD_IPUSER;
dump A;
This gives the output:
(79.92.147.88,20140310)
(79.92.147.88,20140310)
(109.31.67.3,20140310)
(109.31.67.3,20140310)
(109.7.229.143,20140310)
(109.8.114.133,20140310)
(77.198.79.99,20140310)
(77.200.174.171,20140310)
(77.200.174.171,20140310)
(109.17.117.212,20140310)
Loading the other file:
-- Load the Hadopi search file
file2 = LOAD 'dir/file2.csv' USING PigStorage(';') as (IP_RECHERCHEE:chararray, DATE_HADO:chararray);
dump file2;
The output looks like this:
(2014/03/10 00:00:00,79.92.147.88)
(2014/03/10 00:00:01,79.92.147.88)
(2014/03/10 00:00:00,192.168.2.67)
Now I want to do a left outer join. Here is the code:
result = JOIN file2 by IP_RECHERCHEE LEFT OUTER, A by COD_IPUSER;
dump result;
The output looks like this:
(2014/03/10 00:00:00,79.92.147.88,,)
(2014/03/10 00:00:00,192.168.2.67,,)
(2014/03/10 00:00:01,79.92.147.88,,)
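All the records from file2 are there, which is good, but none of the start_file fields are filled in. It is as if the join had failed.
Do you know where the problem is?
Thanks.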
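
A possible lead, judging only from the dumps above: the values in IP_RECHERCHEE look like timestamps and the values in COD_IPUSER look like dates (20140310), while the IP addresses appear in DATE_HADO and PARTRANGE. If the field names in the two LOAD schemas are simply listed in the wrong order (an assumption, not something confirmed here), the join keys never hold comparable values. Below is a minimal sketch of the same join keyed on the columns that actually contain the IPs. The field names ip, day and ts and the alias result_by_ip are hypothetical, chosen only for illustration:

```pig
-- Assumption: in start_file.csv the first column is the IP, the second a date.
start_file = LOAD 'dir/start_file.csv' USING PigStorage(';') AS (ip:chararray, day:chararray);
A = FOREACH start_file GENERATE TRIM(ip) AS ip, TRIM(day) AS day;

-- Assumption: in file2.csv the first column is the timestamp, the second the IP.
file2 = LOAD 'dir/file2.csv' USING PigStorage(';') AS (ts:chararray, ip:chararray);

-- Left outer join on the IP column of both relations.
result_by_ip = JOIN file2 BY ip LEFT OUTER, A BY ip;
DUMP result_by_ip;
```

If the IPs in file2 can carry stray whitespace, applying TRIM to that column before the join, as is already done for start_file, may also be worth trying.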