
Cascading 2.0.0 job on Hadoop fails with FileNotFoundException job.split

When I run my job on a larger dataset, a large number of mappers/reducers fail and bring the whole job down. Below is the error I see on many of the mappers:

java.io.FileNotFoundException: File does not exist: /mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201405050818_0001/job.split 
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1933) 
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1924) 
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:608) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154) 
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:429) 
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:385) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132) 
    at org.apache.hadoop.mapred.Child.main(Child.java:249) 

Can anyone solve this problem? I found another person suffering the same pain (here), but unfortunately he was never rescued in time.

Answer


After hours of debugging, I found nothing useful in the Hadoop logs (as usual). I then tried the following changes:

  • Increased the cluster size to 10
  • Increased the failure limits (see the sketch after this list):
    1. mapred.map.max.attempts = 20
    2. mapred.reduce.max.attempts = 20
    3. mapred.max.tracker.failures = 20
    4. mapred.max.map.failures.percent = 20
    5. mapred.max.reduce.failures.percent = 20
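
For reference, a minimal sketch of how these limits can be applied in Cascading 2.x, assuming the standard pattern of passing a java.util.Properties into HadoopFlowConnector, which forwards the entries into the JobConf of each job the flow spawns. The class name FailureLimits and the commented-out connect(...) call are placeholders, not from the original post:

    import java.util.Properties;

    import cascading.flow.hadoop.HadoopFlowConnector;

    public class FailureLimits {
        public static void main(String[] args) {
            Properties properties = new Properties();
            // Retry each map/reduce task up to 20 times before giving up on it.
            properties.setProperty("mapred.map.max.attempts", "20");
            properties.setProperty("mapred.reduce.max.attempts", "20");
            // Only blacklist a TaskTracker for this job after 20 task failures.
            properties.setProperty("mapred.max.tracker.failures", "20");
            // Tolerate up to 20% failed map/reduce tasks before failing the job.
            properties.setProperty("mapred.max.map.failures.percent", "20");
            properties.setProperty("mapred.max.reduce.failures.percent", "20");

            // Cascading copies these Properties into the JobConf of every
            // MapReduce job the flow launches.
            HadoopFlowConnector connector = new HadoopFlowConnector(properties);
            // connector.connect(source, sink, pipe).complete(); // wire up taps/pipes as usual
        }
    }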

After that I was able to run my Cascading job on the large dataset successfully. This appears to be a problem caused by Cascading.