2014-09-27 55 views

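I have just started learning Hadoop and am running a Hadoop MapReduce program with a custom partitioner and comparator. The problem I am facing is that primary and secondary sorting is not being done on the composite key. On top of that, part of one composite key is being replaced by part of another composite key. Composite key changing in Hadoop MapReduce?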

For example, I create the following keys inside the mapper:

key1 -> tagA,1 
key2 -> tagA,1 
key3 -> tagA,1 
key4 -> tagA,1 
key5 -> tagA,2 
key6 -> tagA,2 
key7 -> tagB,1 
key8 -> tagB,1 
key9 -> tagB,1 
key10 -> tagB,1 
key11 -> tagB,2 
key12 -> tagB,2 

The partitioner and the grouping comparator for these keys are as follows:

// Partitioner: routes every key with the same tag to the same reducer
public static class TaggedJoiningPartitioner implements Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // The composite key has the form "tag,number"; partition on the tag only
        String[] tokens = key.toString().split(",");
        return (tokens[0].hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void configure(JobConf conf) {
        // No configuration needed
    }
}

// Comparator: treats two keys with the same tag as equal
public static class TaggedJoiningGroupingComparator extends WritableComparator {

    public TaggedJoiningGroupingComparator() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        String[] taggedKey1 = ((Text) a).toString().split(",");
        String[] taggedKey2 = ((Text) b).toString().split(",");
        return taggedKey1[0].compareTo(taggedKey2[0]);
    }
}
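
To make it easier to see what these two classes do with the keys, here is a minimal sketch of the same arithmetic on plain Java strings, without any Hadoop types. The class name TagPartitionSketch and the reducer count of 2 are assumptions for illustration only and are not part of the job above.

```java
class TagPartitionSketch {
    // Same arithmetic as getPartition above, applied to a plain String key.
    public static int partitionFor(String compositeKey, int numPartitions) {
        String tag = compositeKey.split(",")[0];
        return (tag.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Same comparison as the comparator above: only the tag is compared.
    public static int compareByTag(String a, String b) {
        return a.split(",")[0].compareTo(b.split(",")[0]);
    }

    public static void main(String[] args) {
        String[] keys = {"tagA,1", "tagA,2", "tagB,1", "tagB,2"};
        int numPartitions = 2; // hypothetical number of reducers
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
        // Keys that differ only in the number compare as equal.
        System.out.println(compareByTag("tagA,1", "tagA,2") == 0);
    }
}
```

Keys with the same tag always land in the same partition, and the comparator reports "tagA,1" and "tagA,2" as equal. If such a comparator is used for sorting, the numeric part therefore has no influence on the order.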

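In the reducer, the keys are grouped correctly by tag, but they are not sorted properly. The order and content of the keys in the reducer are as follows: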

//REDUCER 1 
key1 -> tagA,1 
key2 -> tagA,1 
key3 -> tagA,1 
key5 -> tagA,1 // 2 was changed to 1 here
key6 -> tagA,1 // 2 was changed to 1 here
key4 -> tagA,1 

//REDUCER 2 
key7 -> tagB,1 
key11 -> tagB,1 // 2 was changed to 1 here
key12 -> tagB,1 // 2 was changed to 1 here
key8 -> tagB,1 
key9 -> tagB,1 
key10 -> tagB,1 

I have been trying to solve this for a long time without success. Any help is appreciated.

I don't see any secondary sort here. Where does the secondary sort happen? – 2014-09-27 19:14:19

I am using Hadoop's old API, so nothing like job.setSortComparatorClass(CompositeKeyComparator.class); is available. Could you provide the equivalent for the old API? – 2014-09-27 21:28:42

Also, I am setting the partitioner and the comparator on the JobConf object as follows: conf.setPartitionerClass(TaggedJoiningPartitioner.class); conf.setOutputKeyComparatorClass(TaggedJoiningGroupingComparator.class); – 2014-09-27 21:36:59
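
In the old API, the comparator registered with JobConf.setOutputKeyComparatorClass is the one used to sort keys, so a comparator meant for secondary sort has to look at the whole composite key there. As a rough sketch of the ordering such a comparator could implement, here is the logic on plain strings. The class name CompositeKeyOrderSketch and the constant TAG_THEN_NUMBER are hypothetical, and the sketch assumes every key has exactly the form "tag,number".

```java
import java.util.Arrays;
import java.util.Comparator;

class CompositeKeyOrderSketch {
    // Orders "tag,number" strings by tag first, then by the number as an integer.
    public static final Comparator<String> TAG_THEN_NUMBER = (a, b) -> {
        String[] left = a.split(",");
        String[] right = b.split(",");
        int byTag = left[0].compareTo(right[0]);
        if (byTag != 0) {
            return byTag; // primary sort: the tag
        }
        // secondary sort: the numeric part, compared as a number, not as text
        return Integer.compare(Integer.parseInt(left[1].trim()),
                               Integer.parseInt(right[1].trim()));
    };

    public static void main(String[] args) {
        String[] keys = {"tagB,2", "tagA,10", "tagA,2", "tagB,1"};
        Arrays.sort(keys, TAG_THEN_NUMBER);
        System.out.println(String.join(" ", keys));
    }
}
```

Comparing the second field as an integer keeps "tagA,2" ahead of "tagA,10", which a plain text comparison of the whole key would not do.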

Answer

Finally got it working. I changed

conf.setOutputKeyComparatorClass(TaggedJoiningGroupingComparator.class);

to

conf.setOutputValueGroupingComparator(TaggedJoiningGroupingComparator.class);

The Hadoop API documentation also says:

setOutputValueGroupingComparator(Class<? extends RawComparator> theClass) 
Set the user defined RawComparator comparator for grouping keys in the input to the reduce.
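
To illustrate why the change helps, here is a small simulation of the two steps on plain Java strings: keys are first sorted on the full composite key and only then grouped by tag. It is a sketch under assumptions, not Hadoop code. The class name SecondarySortSketch is made up, and ordinary String ordering stands in for the default ordering of Text keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    static String tagOf(String compositeKey) {
        return compositeKey.split(",")[0];
    }

    // Sort on the full composite key, then collect the keys per tag in that order.
    public static Map<String, List<String>> sortThenGroup(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted); // stand-in for the sort comparator
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String key : sorted) {
            // stand-in for the grouping comparator: same tag, same group
            groups.computeIfAbsent(tagOf(key), t -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("tagB,2", "tagA,2", "tagA,1", "tagB,1", "tagA,1");
        sortThenGroup(keys).forEach((tag, group) -> System.out.println(tag + " -> " + group));
    }
}
```

Each group keeps its keys in sorted order because the sort step saw the whole key and only the grouping step ignored the number. Note that plain text ordering puts "tagA,10" before "tagA,2", so numbers with more than one digit would need a comparator that parses the numeric part.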