2017-07-25
1

Pig script does not work with MapReduce

I am trying out Hadoop and Apache Pig. I have my data in a .txt file and my script in a .pig file:

student = LOAD '/home/srv-hadoop/data.txt' USING PigStorage(',') 
    as (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray); 

student_order = ORDER student BY firstname ASC; 

Dump student_order; 

Here is my .txt file:

001,Rajiv,Reddy,21,9848022337,Hyderabad 
002,siddarth,Battacharya,22,9848022338,Kolkata 
003,Rajesh,Khanna,22,9848022339,Delhi 
004,Preethi,Agarwal,21,9848022330,Pune 
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 
006,Archana,Mishra,23,9848022335,Chennai 
007,Komal,Nayak,24,9848022334,trivendram 
008,Bharathi,Nambiayar,24,9848022333,Chennai 

But when I execute: pig -x mapreduce data.pig

17/07/25 17:04:59 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL 
17/07/25 17:04:59 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE 
17/07/25 17:04:59 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType 
2017-07-25 17:04:59,399 [main] INFO org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58 
2017-07-25 17:04:59,399 [main] INFO org.apache.pig.Main - Logging error messages to: /home/srv-hadoop/pig_1500995099397.log 
2017-07-25 17:04:59,749 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
2017-07-25 17:04:59,930 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/srv-hadoop/.pigbootup not found 
2017-07-25 17:05:00,062 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 
2017-07-25 17:05:00,066 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:54310 
2017-07-25 17:05:00,470 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:54311 
2017-07-25 17:05:00,489 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-data.pig-2bb2e75c-41a7-42bf-926f-05354b881211 
2017-07-25 17:05:00,489 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false 
2017-07-25 17:05:01,230 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: ORDER_BY 
2017-07-25 17:05:01,279 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code. 
2017-07-25 17:05:01,308 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]} 
2017-07-25 17:05:01,362 [main] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128 
2017-07-25 17:05:01,411 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 
2017-07-25 17:05:01,452 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizerMR - Using Secondary Key Optimization for MapReduce node scope-23 
2017-07-25 17:05:01,462 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3 
2017-07-25 17:05:01,462 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3 
2017-07-25 17:05:01,515 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id 
2017-07-25 17:05:01,516 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId= 
2017-07-25 17:05:01,548 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 
2017-07-25 17:05:01,552 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent 
2017-07-25 17:05:01,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 
2017-07-25 17:05:01,555 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress 
2017-07-25 17:05:01,558 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process 
2017-07-25 17:05:01,570 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication 
2017-07-25 17:05:01,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp1676993497/tmp-1698368733/pig-0.17.0-core-h2.jar 
2017-07-25 17:05:01,932 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1676993497/tmp885160047/automaton-1.11-8.jar 
2017-07-25 17:05:01,975 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1676993497/tmp-1346471388/antlr-runtime-3.4.jar 
2017-07-25 17:05:02,012 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp1676993497/tmp32088650/joda-time-2.9.3.jar 
2017-07-25 17:05:02,023 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 
2017-07-25 17:05:02,031 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code. 
2017-07-25 17:05:02,031 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche 
2017-07-25 17:05:02,031 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize [] 
2017-07-25 17:05:02,093 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 
2017-07-25 17:05:02,095 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address 
2017-07-25 17:05:02,095 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 
2017-07-25 17:05:02,104 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
2017-07-25 17:05:02,113 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 
2017-07-25 17:05:02,178 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String). 
2017-07-25 17:05:02,207 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat 
2017-07-25 17:05:02,213 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area file:/home/srv-hadoop/hadoop-2.6.2/tmp/mapred/staging/srv-hadoop1897657638/.staging/job_local1897657638_0001 
2017-07-25 17:05:02,214 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:data.pig got an error while submitting 
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:294) 
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:302) 
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:319) 
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) 
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) 
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) 
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) 
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128) 
    at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205) 
    at java.lang.Thread.run(Thread.java:748) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:301) 
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280) 
    ... 18 more 
2017-07-25 17:05:02,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1897657638_0001 
2017-07-25 17:05:02,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases student 
2017-07-25 17:05:02,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: student[1,10],student[-1,-1] C: R: 
2017-07-25 17:05:02,600 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 
2017-07-25 17:05:07,608 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. 
2017-07-25 17:05:07,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1897657638_0001 has failed! Stop running all dependent jobs 
2017-07-25 17:05:07,609 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 
2017-07-25 17:05:07,619 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
2017-07-25 17:05:07,620 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING 
2017-07-25 17:05:07,620 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed! 
2017-07-25 17:05:07,622 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion PigVersion UserId StartedAt FinishedAt Features 
2.6.2 0.17.0 srv-hadoop 2017-07-25 17:05:01 2017-07-25 17:05:07 ORDER_BY 

Failed! 

Failed Jobs: 
JobId Alias Feature Message Outputs 
job_local1897657638_0001 student MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:294) 
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:302) 
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:319) 
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) 
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) 
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) 
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) 
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128) 
    at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205) 
    at java.lang.Thread.run(Thread.java:748) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:301) 
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280) 
    ... 18 more 


Input(s): 
Failed to read data from "/home/srv-hadoop/data.txt" 

Output(s): 

Counters: 
Total records written : 0 
Total bytes written : 0 
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0 
Total records proactively spilled: 0 

Job DAG: 
job_local1897657638_0001 -> null, 
null -> null, 
null 


2017-07-25 17:05:07,622 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 
2017-07-25 17:05:07,624 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias student_order 
Details at logfile: /home/srv-hadoop/pig_1500995099397.log 
2017-07-25 17:05:07,648 [main] INFO org.apache.pig.Main - Pig script completed in 8 seconds and 442 milliseconds (8442 ms) 

I get:

Input(s): 
Failed to read data from "/home/srv-hadoop/data.txt" 

Output(s): 

Counters: 
Total records written : 0 
Total bytes written : 0 
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0 
Total records proactively spilled: 0 

Job DAG: 
job_local1897657638_0001 -> null, 
null -> null, 
null 

But if I execute: pig -x local data.pig -> it works fine.

What am I missing?

+0

Check the path of the file. What delimiter is used between the fields in data.txt? –

+0

The file path is fine, and the fields are separated by commas. I updated my question with the data.txt file. Do I have to put data.txt in the datanode directory and data.pig in the namenode? – Deadpool

Answers

1

Hey, your 'data.txt' seems to be on your local file system. When you run 'pig -x mapreduce', it expects the input to be in HDFS.

Since the file '/home/srv-hadoop/data.txt' is on the local file system, 'pig -x local' works.

Make the directory in the Hadoop file system:

  1. hadoop fs -mkdir -p /home/srv-hadoop/

Copy your data.txt file from the local file system to Hadoop:

  2. hadoop fs -put /home/srv-hadoop/data.txt /home/srv-hadoop/
  3. Now run your Pig script in mapreduce mode. It will work fine.
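To make the distinction concrete, here is a small Python sketch (illustrative only, not Pig's or Hadoop's actual code) of how a scheme-less path is qualified against the default filesystem. This is why the same path string points at the local disk in local mode but at HDFS in mapreduce mode:

```python
def qualify(default_fs: str, path: str) -> str:
    """Qualify a scheme-less path against a default filesystem URI.

    Simplified sketch of the behavior described above; the default_fs
    values mirror the ones visible in the question's log output.
    """
    if "://" in path:
        return path  # already fully qualified, left as-is
    return default_fs + path

# Local mode: the default filesystem is the local one -> the file is found.
print(qualify("file://", "/home/srv-hadoop/data.txt"))
# -> file:///home/srv-hadoop/data.txt

# Mapreduce mode: the default filesystem is HDFS -> the path is looked up
# in HDFS, where it does not exist, hence ERROR 2118.
print(qualify("hdfs://localhost:54310", "/home/srv-hadoop/data.txt"))
# -> hdfs://localhost:54310/home/srv-hadoop/data.txt
```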

    0

    Your file has 6 fields, but the schema you specified has only 5. You probably have another field after the last name, most likely the age. Modify the load statement as shown below.

    student = LOAD '/home/srv-hadoop/data.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray); 
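As a quick sanity check, the mismatch this answer points out can be confirmed by counting the comma-separated values in a sample record (an illustrative Python sketch, independent of Pig):

```python
# Count the fields in one record from data.txt and compare against the
# field names declared in the question's original LOAD schema.
sample = "001,Rajiv,Reddy,21,9848022337,Hyderabad"
fields = sample.split(",")
print(len(fields))  # 6 values per record

schema = ["id", "firstname", "lastname", "phone", "city"]
print(len(schema))  # only 5 declared names: the 4th value (21) has no column
```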
    
    +0

    Yes, but that is not the reason behind the error –

    0

    You can try this: pig -x mapreduce file://data.pig