Hadoop MapReduce: how to reduce custom objects?

I'm new to Hadoop and I'm trying to use the Reducer class. I found an online tutorial, and its reducer class looks like this:

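import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class mapReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
    private final IntWritable total = new IntWritable(); // reused output object
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
      Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException{
     int sum = 0; // start from zero for every key
     for (IntWritable value: values){
      sum += value.get();
     }
     total.set(sum);
     context.write(key, total);
    }
}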

So I want to replace total with a myCustomObj. Based on the example above, something like:

//..
myCustomObj total = new myCustomObj();
@Override
protected void reduce(Text key, Iterable<myCustomObj> values,
     Reducer<Text, myCustomObj, Text, IntWritable>.Context context) throws IOException, InterruptedException{
    for (myCustomObj value: values){
     total.add(value);
    }
    context.write(key, total.getPrimaryAttribute());
}

Goal: after Hadoop has finished reducing, I want a list of key -> total, where total is the whole object. I think the code above will only output key -> primaryAttribute.
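To emit key -> whole object instead of one attribute, the custom class would also have to be the reducer's output value type, e.g. Reducer<Text, myCustomObj, Text, myCustomObj> together with job.setOutputValueClass(myCustomObj.class), and the reducer would write total itself. With the default TextOutputFormat each output line is the key's toString(), a tab, then the value's toString(), so overriding toString() decides what the line looks like. The sketch below only simulates the per-key loop in plain Java so it runs without Hadoop; Total, its count/sum fields and ReduceSketch are hypothetical stand-ins, not the asker's actual class. Note that it also creates a fresh total for each key, which the snippet above does not do.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for myCustomObj; in Hadoop it would also implement Writable.
final class Total {
    long count;
    long sum;

    Total() {}

    Total(long count, long sum) {
        this.count = count;
        this.sum = sum;
    }

    // Copies field values, so it stays correct even if 'other' is a reused instance.
    void add(Total other) {
        count += other.count;
        sum += other.sum;
    }

    // TextOutputFormat would print this string after the key and a tab.
    @Override
    public String toString() {
        return count + "," + sum;
    }
}

final class ReduceSketch {
    // Simulates one reduce() call: fresh total per key, then write(key, total).
    static String reduce(String key, Iterable<Total> values) {
        Total total = new Total(); // reset for every key
        for (Total value : values) {
            total.add(value);
        }
        return key + "\t" + total; // shape of the resulting output line
    }

    public static void main(String[] args) {
        Map<String, List<Total>> grouped = new LinkedHashMap<>();
        grouped.put("a", List.of(new Total(1, 5), new Total(2, 7)));
        grouped.put("b", List.of(new Total(1, 3)));
        for (Map.Entry<String, List<Total>> e : grouped.entrySet()) {
            System.out.println(reduce(e.getKey(), e.getValue()));
        }
    }
}
```

Running main prints one line per key, "a" followed by 3,12 and "b" followed by 1,3, each separated from its key by a tab.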

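Alternative: if this is too cumbersome, I have another idea: I need to store the details on disk in XML format. However, I'm not sure about the theory behind MapReduce: does the reducer run on the server or on the client machines (where the mapping happens)? If it runs on the client machines, I'll end up with a small piece of my XML file on every client machine. I just want to gather all the information on one server.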
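On the placement question: in a cluster, both map and reduce tasks run on worker nodes, not on the client machine that submits the job, and the shuffle moves each key's map output over the network to the reduce task responsible for that key. Each reduce task writes its own output file (part-r-00000, part-r-00001, ...) into the job's output directory, typically on HDFS. Which reducer receives a key is decided by the partitioner; Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The plain-Java sketch below applies that formula to String keys (Hadoop's Text has its own hashCode, so real assignments for Text keys will differ) to illustrate that with job.setNumReduceTasks(1) every key lands in partition 0, i.e. a single output file. PartitionSketch is a hypothetical name.

```java
// Mirrors the formula of Hadoop's default HashPartitioner, using String keys.
final class PartitionSketch {
    static int partitionFor(String key, int numReduceTasks) {
        // Masking the sign bit keeps negative hash codes in the valid index range.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"alpha", "beta", "gamma", "delta"};
        for (int reducers : new int[] {1, 3}) {
            StringBuilder line = new StringBuilder(reducers + " reducer(s):");
            for (String k : keys) {
                line.append(' ').append(k).append("->part-r-")
                    .append(String.format("%05d", partitionFor(k, reducers)));
            }
            System.out.println(line);
        }
    }
}
```

Forcing a single reducer funnels all data through one task, so it only suits outputs small enough for one machine to handle.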

I hope I've made my question clear. Thanks.

Edit: I tried searching online, but there are many customized Hadoop setups out there and I don't know what I should be looking at.


2017-04-01 user859385

It's unclear what your question is. What does the implementation of "myCustomObj" look like?

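To reduce custom objects, the mapper first has to emit the object as its output value. Assuming your object's class is named CustomObject, the mapper definition should look like this: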

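import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, CustomObject> {
    @Override
    protected void map(LongWritable key, Text value,
      Mapper<LongWritable, Text, Text, CustomObject>.Context context) throws IOException, InterruptedException {
     // do your stuff here: parse value, build a CustomObject, then context.write(outKey, obj)
    }
}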

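Next, CustomObject itself should implement the WritableComparable interface with all three required methods (needed mainly for the shuffle phase). Strictly speaking, a class used only as a value needs just the Writable interface (write and readFields); WritableComparable is required for key classes.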

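write - defines how your object is serialized (written to disk or sent over the network)
readFields - defines how a serialized object is read back
compareTo - defines how objects are ordered (for a value class the body can be trivial, e.g. return 0, because only keys are sorted during the shuffle phase)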
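A minimal sketch of those three methods, assuming a hypothetical CustomObject with a name and a count field. To compile without Hadoop on the classpath it implements only java.lang.Comparable; in a real job it would be declared as implements WritableComparable<CustomObject> (or just Writable for a value), and the write/readFields signatures below match that interface. Two details matter: readFields must read the fields in exactly the order write wrote them, and the class needs a public no-argument constructor because Hadoop instantiates it by reflection before calling readFields. The roundTrip helper is hypothetical and only imitates what the shuffle does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical value class; in Hadoop: implements WritableComparable<CustomObject>.
final class CustomObject implements Comparable<CustomObject> {
    private String name = "";
    private long count;

    public CustomObject() {} // Hadoop creates instances through a no-arg constructor

    public CustomObject(String name, long count) {
        this.name = name;
        this.count = count;
    }

    // Same signature as Writable.write(DataOutput).
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeLong(count);
    }

    // Same signature as Writable.readFields(DataInput); same field order as write.
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        count = in.readLong();
    }

    @Override
    public int compareTo(CustomObject other) {
        return Long.compare(count, other.count);
    }

    public String getName() { return name; }
    public long getCount() { return count; }

    // Serializes to bytes, then fills a fresh instance from them.
    static CustomObject roundTrip(CustomObject original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        CustomObject copy = new CustomObject();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}
```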

The reducer signature should look like this:

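import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, CustomObject, Text, IntWritable>{
    @Override
    protected void reduce(Text key, Iterable<CustomObject> values,
      Reducer<Text, CustomObject, Text, IntWritable>.Context context) throws IOException, InterruptedException{
     // reducer code
    }
}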
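One pitfall when filling in the reducer body: Hadoop reuses a single value instance while iterating over values, deserializing each record into the same object. Accumulating copied field values is therefore safe, but storing the references themselves is not; if values must be kept, copy them first. The sketch below imitates that reuse with a plain-Java Iterable so it runs without Hadoop; Counter and ReuseSketch are hypothetical names, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable value standing in for CustomObject.
final class Counter {
    long count;
}

final class ReuseSketch {
    // Mimics the reduce-side values iterator: one instance, refilled per record.
    static Iterable<Counter> reusingValues(long... raw) {
        Counter shared = new Counter();
        return () -> new Iterator<Counter>() {
            private int i = 0;

            @Override
            public boolean hasNext() { return i < raw.length; }

            @Override
            public Counter next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.count = raw[i++];
                return shared;
            }
        };
    }

    // Safe reduce body: copies field values out, never keeps the reference.
    static long sum(Iterable<Counter> values) {
        long total = 0;
        for (Counter value : values) {
            total += value.count;
        }
        return total;
    }

    // Unsafe pattern: every list entry ends up pointing at the same object.
    static List<Counter> collectReferences(Iterable<Counter> values) {
        List<Counter> kept = new ArrayList<>();
        for (Counter value : values) {
            kept.add(value);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(sum(reusingValues(1, 2, 3)));                 // 6
        List<Counter> kept = collectReferences(reusingValues(1, 2, 3));
        System.out.println(kept.get(0).count + " " + kept.get(2).count); // 3 3
    }
}
```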

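Finally, when configuring the job, specify matching key/value classes for both the map output and the final (reducer) output: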

job.setMapOutputKeyClass(Text.class); 
job.setMapOutputValueClass(CustomObject.class); 
job.setOutputKeyClass(Text.class); 
job.setOutputValueClass(IntWritable.class); 
job.setMapperClass(MyMapper.class); 
job.setReducerClass(MyReducer.class);

That should do it.


2017-04-03 14:11:36 Serhiy
