Error in using output of one MapReduce as input of another MapReduce
2011-04-16 87 views
0

I have two Map/Reduce classes, named MyMapper1/MyReducer1 and MyMapper2/MyReducer2, and I want to use the output of MyReducer1 as the input of MyMapper2 by setting the input path of job2 to the output path of job1.

The types are as follows:

public class MyMapper1 extends Mapper<LongWritable, Text, IntWritable, IntArrayWritable> 
public class MyReducer1 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable> 
public class MyMapper2 extends Mapper<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable> 
public class MyReducer2 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntWritable> 

public class IntArrayWritable extends ArrayWritable { 
    public IntArrayWritable() { 
     super(IntWritable.class); 
    } 
} 

The code for setting the input/output paths looks like this:

Path temppath = new Path("temp-dir-" + temp_time); 

    FileOutputFormat.setOutputPath(job1, temppath); 

      ........... 

    FileInputFormat.addInputPath(job2, temppath); 

The code for setting the input/output formats is as follows:

job1.setOutputFormatClass(TextOutputFormat.class); 
      .......... 
    job2.setInputFormatClass(KeyValueTextInputFormat.class); 

But when I run job2, I always get this exception:

11/04/16 12:34:09 WARN mapred.LocalJobRunner: job_local_0002 
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable 
    at ligon.MyMapper2.map(MyMapper2.java:1) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) 

I have tried changing the InputFormat and OutputFormat, but without success; a similar (though not identical) exception occurs in job2.

My complete code package is: http://dl.dropbox.com/u/7361939/HW2_Q1.zip

Thank you very much!

Answers

0

The problem is that in job2, KeyValueTextInputFormat produces key/value pairs of type <Text, Text>, and you are trying to process them with a Mapper that accepts <IntWritable, IntArrayWritable>, which causes the ClassCastException. Your best option is to change your Mapper to accept <Text, Text> and convert the Text values to integers yourself.
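A minimal sketch of the parsing such a mapper would need to do, written as plain Java without the Hadoop classes. It assumes job1's TextOutputFormat wrote lines of the form `key<TAB>value` with the value serialized as comma-separated integers; that serialization is an assumption (the question's IntArrayWritable does not produce it by default), and the class name `KeyValueParser` is hypothetical:

```java
// Hypothetical helper showing the Text -> int conversion a <Text, Text>
// mapper would perform. KeyValueTextInputFormat hands the mapper the part
// of each line before the first tab as the key and the rest as the value.
public class KeyValueParser {
    // Parse the key field (e.g. "42") into an int.
    public static int parseKey(String key) {
        return Integer.parseInt(key.trim());
    }

    // Parse a comma-separated value field (e.g. "1, 2,3") into an int array.
    // Assumes job1 wrote the array in this text form.
    public static int[] parseValues(String value) {
        String[] parts = value.split(",");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i].trim());
        }
        return out;
    }
}
```

Inside the real mapper, `map(Text key, Text value, Context context)` would call these on `key.toString()` and `value.toString()` before wrapping the results back into IntWritable/IntArrayWritable.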

+1

Thanks. The problem now is that the ArrayWritable output by the first reducer looks like the following — without any element values. How can I make the second mapper accept this and convert the string back into an object? [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] b21 – 2011-04-17 14:31:07
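What the comment shows is the default `Object.toString()` output (class name plus hash code), because ArrayWritable does not override `toString()`. If text output is kept, one fix — not from the thread, and shown here as a plain-Java stand-in for the question's Hadoop class — is to override `toString()` so TextOutputFormat writes the element values:

```java
// Stand-in sketch for the question's IntArrayWritable (which really
// extends ArrayWritable). Without a toString() override, TextOutputFormat
// writes the default "ClassName@hashcode" form seen in the comment above.
public class IntArrayWritable {
    private final int[] values;

    public IntArrayWritable(int[] values) {
        this.values = values;
    }

    // Emit the elements as comma-separated integers so the text output
    // is actually parseable by a downstream job.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(",");
            sb.append(values[i]);
        }
        return sb.toString();
    }
}
```

In the real Hadoop class, the loop would iterate over the `Writable[]` returned by `get()` instead of a plain `int[]`. The binary SequenceFile approach in the answer below this one avoids the text round-trip entirely.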

+0

I have the same problem and get the same error. I want to use the output of one Hadoop job as the input of a second Hadoop job. The output of the first job has a MapWritable as its value. The solution for the second job is job.setInputFormatClass(). But which argument should I use? – Yeameen 2012-04-20 07:21:51

0

I just faced the same problem and figured out the solution a while ago. Since you use IntArrayWritable as the output of your Reducer, it is easy to write the data in binary form and read it back later.

For the first job:

job1.setOutputFormatClass(SequenceFileOutputFormat.class); 

    job1.setOutputKeyClass(IntWritable.class); 
    job1.setOutputValueClass(IntArrayWritable.class); 

For the second job:

job2.setInputFormatClass(SequenceFileInputFormat.class); 

This should work in your case.
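Putting the settings from this answer together with the question's code, a driver chaining the two jobs through a SequenceFile might look like the following sketch. MyMapper1/MyReducer1, MyMapper2/MyReducer2, IntArrayWritable, and the temp path are from the question; the class name `ChainDriver` and the rest of the wiring are standard new-API job setup, shown here as an untested outline:

```java
// Sketch: chaining job1 -> job2 via a binary SequenceFile, so the
// (IntWritable, IntArrayWritable) pairs keep their types between jobs.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path temppath = new Path("temp-dir-" + System.currentTimeMillis());

        Job job1 = new Job(conf, "job1");
        job1.setJarByClass(ChainDriver.class);
        job1.setMapperClass(MyMapper1.class);
        job1.setReducerClass(MyReducer1.class);
        job1.setOutputKeyClass(IntWritable.class);
        job1.setOutputValueClass(IntArrayWritable.class);
        // Binary output: no lossy toString() round-trip.
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, temppath);
        if (!job1.waitForCompletion(true)) System.exit(1);

        Job job2 = new Job(conf, "job2");
        job2.setJarByClass(ChainDriver.class);
        job2.setMapperClass(MyMapper2.class);
        job2.setReducerClass(MyReducer2.class);
        // job2's map output value (IntArrayWritable) differs from its
        // final reduce output value (IntWritable), so set it explicitly.
        job2.setMapOutputValueClass(IntArrayWritable.class);
        job2.setOutputKeyClass(IntWritable.class);
        job2.setOutputValueClass(IntWritable.class);
        // Reads job1's pairs back with their original writable types.
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job2, temppath);
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}
```

With SequenceFileInputFormat, MyMapper2's declared input types `<IntWritable, IntArrayWritable>` match what the reader produces, which removes the ClassCastException from the question.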
