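Duplicates in MapReduce program output?

I was getting a lot of duplicate values in my output, so I implemented a reduce function, shown below. However, this reduce function still behaves like an identity function: having the reducer makes no difference. What is wrong with my reduce function?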

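 import java.io.IOException; 
 import java.util.Iterator; 
 import java.util.StringTokenizer; 
 import org.apache.hadoop.fs.Path; 
 import org.apache.hadoop.io.LongWritable; 
 import org.apache.hadoop.io.Text; 
 import org.apache.hadoop.mapred.*; 

 public class search 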
{  
    public static String str="And"; 
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> 
    { 
     String mname=""; 
     public void configure(JobConf job) 
     { 
      mname=job.get(str); 
      job.set(mname,str); 
     } 

     private Text word = new Text(); 
     public Text Uinput =new Text(""); 
     public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
     { 

      String mapstr=mname; 
      Uinput.set(mapstr); 
      String line = value.toString(); 
      Text fdata = new Text(); 

      StringTokenizer tokenizer = new StringTokenizer(line); 
      while (tokenizer.hasMoreTokens()) 
      { 
       word.set(tokenizer.nextToken()); 
       fdata.set(line); 

       if(word.equals(Uinput)) 
       output.collect(fdata,new Text("")); 
      } 

     } 
    } 

    public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> 
    { 
     public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
     { 

      boolean start = true; 
      //System.out.println("inside reduce :"+input); 
      StringBuilder sb = new StringBuilder(); 
      while (values.hasNext()) 
      { 
       if(!start) 

       start=false; 
       sb.append(values.next().toString()); 

      } 
      //output.collect(key, new IntWritable(sum)); 
      output.collect(key, new Text(sb.toString())); 
     } 
    }

    public static void main(String[] args) throws Exception {

    JobConf conf = new JobConf(search.class); 
    conf.setJobName("QueryIndex"); 
    //JobConf conf = new JobConf(getConf(), WordCount.class); 
    conf.set(str,args[0]); 

    conf.setOutputKeyClass(Text.class); 
    conf.setOutputValueClass(Text.class); 

    conf.setMapperClass(Map.class); 
    //conf.setCombinerClass(SReducer.class); 
    conf.setReducerClass(SReducer.class); 

    conf.setInputFormat(TextInputFormat.class); 
    conf.setOutputFormat(TextOutputFormat.class); 



    FileInputFormat.setInputPaths(conf, new Path("IIndexOut")); 
    FileOutputFormat.setOutputPath(conf, new Path("searchOut")); 

    JobClient.runJob(conf); 
}

}

2012-04-26 Karan Rekhi

Possible duplicate: http://stackoverflow.com/questions/10305435/hadoop-inverted-index-without-recurrence-of-file-names – 2012-04-26 20:15:11

Hi Matt, I have gone through that post, but it did not solve my problem. That is why I posted my own question. – 2012-04-26 20:18:07

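Perhaps you have not actually set the reducer class for the job to use? This is done with

job.setReducerClass().

If you do not set it to your own class, the default reducer is used. You should do the following:

job.setReducerClass(SReducer.class)

Please post your main function so we can verify.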

2012-04-26 20:02:15 Chaos

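I did that, and I have also posted it above. Please check. – 2012-04-26 20:07:53

Are you sure you are reading the latest output? I suggest you delete all previous output files and re-run the job. By the way, what does your job do? – Chaos 2012-04-26 20:10:55

It is a search engine program, so IIndexOut is the output of the inverted index implementation. In this search step I only need to search for a keyword and display the results (which is where I am getting the duplicates). – 2012-04-26 20:13:15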

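I have not gone through the code thoroughly, but one thing I am sure of is that the boolean variable start is useless here. The statements following if(!start) should be enclosed in braces to de-duplicate the data; otherwise you end up writing out all the data that the reducer receives from the mapper.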

public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> 
{ 
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
    { 
     boolean start = true; 
     StringBuilder sb = new StringBuilder(); 
     while (values.hasNext()) 
     { 
      // Always advance the iterator; if next() is skipped the loop never terminates. 
      Text value = values.next(); 
      // The test is `start`, not `!start`: with `!start` the body would never run. 
      if(start) 
      { 
       start=false; 
       sb.append(value.toString()); 
      } 
     } 
     output.collect(key, new Text(sb.toString())); 
    } 
}

Or, the simplest reduce method would be just:

public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> 
{ 
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
    { 
     // Emit only the first value for this key and ignore the rest. 
     StringBuilder sb = new StringBuilder(); 
     sb.append(values.next().toString()); 
     output.collect(key, new Text(sb.toString())); 
    } 
}

since you only care about the first value from the iterator.
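To make the point about the braces concrete, here is a minimal sketch in plain Java that needs no Hadoop on the classpath. The names `DanglingIfDemo`, `withoutBraces` and `firstOnly` are made up for illustration, and a plain `Iterator<String>` stands in for the reducer's `Iterator<Text>`. It only shows how the unbraced `if` behaves; it is not the asker's actual job.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class DanglingIfDemo {

    // Mirrors the loop from the question: the `if` guards only `start = false`,
    // so append() runs on every iteration and all values are concatenated.
    static String withoutBraces(Iterator<String> values) {
        boolean start = true;
        StringBuilder sb = new StringBuilder();
        while (values.hasNext()) {
            if (!start)
                start = false;
            sb.append(values.next());
        }
        return sb.toString();
    }

    // Keeps only the first value. next() is called on every pass, so the
    // iterator always advances and the loop terminates.
    static String firstOnly(Iterator<String> values) {
        boolean start = true;
        StringBuilder sb = new StringBuilder();
        while (values.hasNext()) {
            String value = values.next();
            if (start) {
                start = false;
                sb.append(value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c");
        System.out.println(withoutBraces(values.iterator())); // abc
        System.out.println(firstOnly(values.iterator()));     // a
    }
}
```

Note that in the question's mapper every emitted value is an empty string, so either variant would write the same text for a given key; the difference only shows up once the values carry data.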

2012-04-27 00:14:29 sulabhc

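Use the @Override annotation on the map and reduce functions. That way you can be sure that you are actually overriding the base class methods.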
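As a rough illustration of why the annotation helps, the sketch below uses a hypothetical `BaseReducer` class (not a Hadoop type) whose default `reduce` forwards every value unchanged. A subclass with a slightly different parameter list compiles fine but only adds an overload, so the caller still gets the identity behaviour. This matters most when the reducer extends a base class with a default implementation; in the question's code `SReducer` implements the `Reducer` interface, where a mismatched signature would already fail to compile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a framework base class whose default
// reduce() is an identity function: it forwards every value unchanged.
class BaseReducer {
    public void reduce(String key, Iterable<String> values, List<String> out) {
        for (String v : values) {
            out.add(key + "\t" + v);
        }
    }
}

// Intended to override reduce(), but the second parameter is Iterator
// instead of Iterable, so this is an overload the caller never reaches.
// Adding @Override here would turn the mistake into a compile error.
class WrongSignatureReducer extends BaseReducer {
    public void reduce(String key, Iterator<String> values, List<String> out) {
        out.add(key + "\t" + values.next());
    }
}

// Same parameter types as the base method, so @Override compiles and
// the framework-style call below reaches this implementation.
class FirstValueReducer extends BaseReducer {
    @Override
    public void reduce(String key, Iterable<String> values, List<String> out) {
        out.add(key + "\t" + values.iterator().next());
    }
}

class OverrideDemo {
    // Calls reduce() through the base type, as a framework would.
    static List<String> run(BaseReducer reducer) {
        List<String> out = new ArrayList<>();
        reducer.reduce("k", Arrays.asList("v1", "v1", "v1"), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new WrongSignatureReducer()).size()); // 3
        System.out.println(run(new FirstValueReducer()).size());     // 1
    }
}
```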

2013-08-20 21:46:17
