Why does my Hadoop program not proceed any further?

Hello everyone! I have a Hadoop program that I developed in Eclipse. The source code is:

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] oargs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (oargs.length != 2) {
            System.err.println("Usage: word count <in> <out>");
        }
        System.out.println("input: " + oargs[0]);
        System.out.println("output: " + oargs[1]);
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(oargs[0]));
        FileOutputFormat.setOutputPath(job, new Path(oargs[1]));
        System.out.println("==============================");
        System.out.println("start ...");
        boolean flag = job.waitForCompletion(true);
        System.out.println(flag);
        System.out.println("end ...");
        System.out.println("==============================");
    }
}

And here is the result; please see the log:

[email protected] /cygdrive/f/develop/hadoop/hadoop-1.0.3 
$ ./bin/hadoop jar ./jar/wordcount.jar /tmp/input /tmp/output 
input: /tmp/input 
output: /tmp/output 
============================== 
start ... 
12/07/25 14:59:17 INFO input.FileInputFormat: Total input paths to process : 2 
12/07/25 14:59:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
12/07/25 14:59:17 WARN snappy.LoadSnappy: Snappy native library not loaded 
12/07/25 14:59:17 INFO mapred.JobClient: Running job: job_201207251447_0001 
12/07/25 14:59:18 INFO mapred.JobClient: map 0% reduce 0%

The log does not go any further; it stays there forever. Why?

I am running the code in local mode, using Cygwin on Windows XP.

2012-07-23 rory

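What do you mean by "do it next"? What do you expect it to do? Normally you have to wait until your cluster has processed the job and returned to you; that is what 'waitForCompletion' means. If your job is not successful, you exit the JVM. – 2012-07-23 08:03:57

Can you post the task logs for either of the two map tasks that should be running? You can reach them through the JobTracker web UI, http://localhost:50030 – 2012-07-25 10:27:48

@rory, as Thomas asked, can you be more specific about "do it next"? Is this the entire stack trace you get on screen? Do you mean that you compiled once, got a result, and cannot run it again? Have you specified the correct input arguments, i.e. the input and output directories, for the program in the Eclipse IDE?

If you mean that you cannot run the program a second time, it may be that you did not specify a different output directory. But having seen the stack trace, I do not think that is the case.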
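On the question of input arguments: a minimal sketch of validating the two directory arguments before they are used, so that a missing argument stops the program with the usage message instead of failing later on args[0]. The class name ArgCheckDemo, the helper checkArgs and the exit code 2 are illustrative assumptions, not part of the original program or of Hadoop.

```java
public class ArgCheckDemo {
    // Hypothetical helper: returns 0 when exactly <in> and <out> are given,
    // otherwise prints the usage message and returns 2 (an assumed convention).
    static int checkArgs(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: word count <in> <out>");
            return 2;
        }
        return 0;
    }

    public static void main(String[] args) {
        int status = checkArgs(args);
        if (status != 0) {
            // Stop here so args[0] and args[1] are never read when missing.
            System.exit(status);
        }
        System.out.println("input: " + args[0]);
        System.out.println("output: " + args[1]);
    }
}
```

When running from Eclipse, the two arguments would go into the run configuration's program arguments; the check only confirms that both were passed, not that the directories exist.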

2012-07-23 13:37:13

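Thanks Arun. I mean that when I debug my code up to 'job.waitForCompletion(true)', my code does not continue and stays there forever. – rory 2012-07-24 08:12:58

I think that if you are asking why you never see the "end ====================" println output, then check your code: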

System.exit(job.waitForCompletion(true)?0:1); 
System.out.println("end ..."); 
System.out.println("==============================");

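You wrap the job.waitForCompletion(true) call in System.exit, so the JVM terminates before the last two System.out calls can execute.

Edit

The log appender/logger message here is a clue that any other exceptions may be getting swallowed. You should amend the signature of your code to make use of the ToolRunner utility: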
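A minimal sketch of the reordering this implies: keep the outcome in a variable, print the closing lines, and only then call System.exit. The runJob method is a hypothetical stand-in for job.waitForCompletion(true), and the class and helper names are assumptions made for illustration.

```java
public class ExitOrderDemo {
    // Maps a job outcome to a process exit code: 0 on success, 1 on failure.
    static int exitCodeFor(boolean succeeded) {
        return succeeded ? 0 : 1;
    }

    // Hypothetical stand-in for job.waitForCompletion(true).
    static boolean runJob() {
        return true;
    }

    public static void main(String[] args) {
        // Capture the result first instead of passing it straight to System.exit.
        int exitCode = exitCodeFor(runJob());
        System.out.println("end ...");
        System.out.println("==============================");
        // Exit last, so the two lines above are printed before the JVM stops.
        System.exit(exitCode);
    }
}
```

This only changes what is printed after the job returns; it does not help if the blocking call itself never returns.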

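import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Extending Configured and implementing Tool lets ToolRunner parse the
// generic Hadoop options and hand the resulting Configuration to getConf().
public class WordCount extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // Propagate the value returned by run() as the process exit code.
        System.exit(ToolRunner.run(new WordCount(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: word count <in> <out>");
            return 2; // stop instead of reading missing arguments
        }
        System.out.println("input: " + args[0]);
        System.out.println("output: " + args[1]);
        Job job = new Job(getConf(), "word count");
        // Handle for setting any further job properties.
        Configuration conf = job.getConfiguration();

        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.out.println("==============================");
        System.out.println("start ...");
        int result = job.waitForCompletion(true) ? 0 : 1;
        System.out.println("end ...");
        System.out.println("==============================");

        return result;
    }
}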

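And you should use the $HADOOP_HOME/bin/hadoop script to submit your job to the cluster (as below; you need to substitute the name of your jar and the fully qualified name of the WordCount class):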

#> hadoop jar wordcount.jar WordCount input output

2012-07-24 00:48:30

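Thanks Chris! I want to see the "end ====================" println, but my problem is not the "System.exit"; when I debug my code up to "job.waitForCompletion(true)", my code does not continue. – rory 2012-07-24 08:06:40

Thanks Chris! I want to see the 'end ====================' println, but my problem is not due to 'System.exit'; when I debug my code up to 'job.waitForCompletion(true)', my code does not continue and stays there forever – rory 2012-07-24 08:15:41

Is your job even being submitted? The warning messages look suspicious - you should not be seeing messages about appenders/loggers – 2012-07-24 10:39:46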
