Improving the identity mapper in WordCount

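I created a map method that reads the map output of the wordcount example [1]. This approach does not use the IdentityMapper.class that MapReduce provides, but it is the only way I found to build a working IdentityMapper for Wordcount. The only problem is that this Mapper takes much longer than I would like, and I am starting to think I am doing something redundant. Can anyone help me improve my WordCountIdentityMapper code?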

[1] Identity mapper

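import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// MyMapper is the asker's own base class (not shown); it presumably extends
// org.apache.hadoop.mapreduce.Mapper, which provides Context, setup() and cleanup().
public class WordCountIdentityMapper extends MyMapper<LongWritable, Text, Text, IntWritable> {
    // Reused for every record instead of allocating new Writables each time.
    private final Text word = new Text();
    private final IntWritable count = new IntWritable();

    // Each input line is "word<TAB>count", as written by the WordCount job.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        word.set(itr.nextToken());
        count.set(Integer.parseInt(itr.nextToken()));
        context.write(word, count);
    }

    // Same loop as before, now also calling setup() and cleanup() like Mapper.run().
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}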

The map output it reads is generated by the following Map class:

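// Static nested class inside the WordCount job; needs the same imports as [1]
// plus org.apache.hadoop.mapreduce.Mapper.
public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (token, 1) for every whitespace-separated token in the line.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }

    // Equivalent to the default Mapper.run() (setup() was missing before),
    // so this override could also be removed entirely.
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}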

[2] Map class


2016-08-21 xeon
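To make the identity mapper's input concrete, here is a minimal in-memory sketch (no Hadoop) of the data flow it depends on. It assumes the job uses a standard summing reducer (like WordCount's IntSumReducer) and the default TextOutputFormat tab separator; the class name `WordCountFlow` is hypothetical. Each line it produces is one record that WordCountIdentityMapper has to parse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of: MyMap emits (word, 1), a summing reducer
 * (assumed) adds the ones, and the output is written as "word<TAB>count".
 */
class WordCountFlow {

    static List<String> wordCountLines(List<String> inputLines) {
        // TreeMap mimics the shuffle's sort by key (String order matches
        // Text's byte order for ASCII words).
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : inputLines) {
            // Same tokenization as MyMap.map(): split on whitespace.
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Map emits (word, 1); the reducer sums those ones.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Assumed default TextOutputFormat separator: a tab.
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = wordCountLines(List.of("to be or", "not to be"));
        lines.forEach(System.out::println);
    }
}
```

Every record therefore has exactly two fields, a word and a count, which is why the identity mapper only ever needs to find one separator per line.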

I solved this by replacing StringTokenizer with the indexOf() method. It works better, and I got better performance.


2016-08-25 20:21:52 xeon
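A minimal sketch of the indexOf() approach from the answer, written as plain Java so it can be tried outside Hadoop. It assumes each line is "word<TAB>count" (TextOutputFormat's default separator); `TabLineParser` and `parse` are hypothetical names, not part of Hadoop. Instead of tokenizing the whole line, it locates the single tab and splits on it, and it uses Integer.parseInt rather than Integer.valueOf to skip the boxed Integer.

```java
/**
 * Hypothetical helper: parses one "word<TAB>count" line with indexOf()
 * instead of StringTokenizer.
 */
class TabLineParser {

    /** A parsed (word, count) pair. */
    static final class WordCount {
        final String word;
        final int count;

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    static WordCount parse(String line) {
        // Find the one separator rather than tokenizing the entire line.
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("no tab separator in: " + line);
        }
        String word = line.substring(0, tab);
        // parseInt returns a primitive int; trim() drops a trailing '\r' or spaces.
        int count = Integer.parseInt(line.substring(tab + 1).trim());
        return new WordCount(word, count);
    }

    public static void main(String[] args) {
        WordCount wc = parse("hadoop\t42");
        System.out.println(wc.word + " -> " + wc.count);
    }
}
```

Inside WordCountIdentityMapper.map(), the same steps would replace the two nextToken() calls: find the tab in value.toString(), then set the reused word and count Writables from the two substrings.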
