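Pig - load with different schemas
I am working with Pig and loading multiple files from Hadoop by passing a comma-separated list of files/folders (see this question on how to load multiple files in pig).
The problem is that each folder has a different schema file (located alongside that folder). Is it possible to pass multiple schema files as well?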
If your schema file is located outside the data folder, you have to declare the schema when you perform the load.
For example:
dataset_A = LOAD '/data/A' using PigStorage('\t') as (id:int, project:chararray, org:chararray);
dataset_B = LOAD '/data/B' using PigStorage(',') as (id:int, beta:chararray, delta:chararray, echo:int);
If the schema is declared in a .pig_schema file inside the directory, you can simply perform the load without declaring the schema:
dataset_A = LOAD '/data/A' using PigStorage('\t');
dataset_B = LOAD '/data/B' using PigStorage(',');
/data/A/.pig_schema:
{"fields":
[{"name":"id","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"project","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"org","type":55,"description":"autogenerated from Pig Field Schema","schema":null}],
"version":0,"sortKeys":[],"sortKeyOrders":[]}
/data/B/.pig_schema:
{"fields":
[{"name":"id","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"beta","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"delta","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"echo","type":10,"description":"autogenerated from Pig Field Schema","schema":null}],
"version":0,"sortKeys":[],"sortKeyOrders":[]}
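If a folder does not have a .pig_schema file yet, one way to get one is to let PigStorage write it. The following is a minimal sketch, assuming a Pig version whose PigStorage accepts the '-schema' option; the output path '/data/A_with_schema' and the alias reloaded_A are hypothetical names used only for illustration.

```pig
-- Script 1: load once with an explicit schema (same declaration as above).
dataset_A = LOAD '/data/A' USING PigStorage('\t') AS (id:int, project:chararray, org:chararray);

-- '-schema' asks PigStorage to write a hidden .pig_schema file
-- into the output directory together with the data.
-- '/data/A_with_schema' is a hypothetical output path.
STORE dataset_A INTO '/data/A_with_schema' USING PigStorage('\t', '-schema');

-- Script 2 (run separately, after the store has finished):
-- the schema is picked up from .pig_schema, so no AS clause is needed.
reloaded_A = LOAD '/data/A_with_schema' USING PigStorage('\t');
DESCRIBE reloaded_A;
```

The two parts are meant to run as separate scripts, because the schema of a LOAD is resolved before the job that writes the file has executed. If a stored schema should be ignored for a particular load, PigStorage also has a '-noschema' option.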