2014-11-05 49 views
1

Shuffle memory pool to be free: Spark with Java

I want to join two lists (noHeaderIndexed and noFirstIndexed) by key, like this:

final Broadcast<JavaPairRDD<Long, Tuple2<String, String>>> c = ctx.broadcast(noHeaderIndexed);
JavaPairRDD<Tuple2<Tuple2<String, String>, Long>, Tuple2<Tuple2<String, String>, Long>> rs = noFirstIndexed.mapToPair(new PairFunction<Tuple2<Long, Tuple2<String, String>>, Tuple2<Tuple2<String, String>, Long>, Tuple2<Tuple2<String, String>, Long>>() {
    @Override
    public Tuple2<Tuple2<Tuple2<String, String>, Long>, Tuple2<Tuple2<String, String>, Long>> call(Tuple2<Long, Tuple2<String, String>> longTuple2Tuple2) throws Exception {
        String s1 = "";
        if (c.value().lookup(longTuple2Tuple2._1).get(0)._1 != null)
            s1 = c.value().lookup(longTuple2Tuple2._1).get(0)._1;

        String s2 = "";
        if (c.value().lookup(longTuple2Tuple2._1).get(0)._2 != null)
            s2 = c.value().lookup(longTuple2Tuple2._1).get(0)._2;

        return new Tuple2<Tuple2<Tuple2<String, String>, Long>, Tuple2<Tuple2<String, String>, Long>>(
            new Tuple2<Tuple2<String, String>, Long>(new Tuple2<String, String>(longTuple2Tuple2._2._1, longTuple2Tuple2._2._2), longTuple2Tuple2._1),
            new Tuple2<Tuple2<String, String>, Long>(new Tuple2<String, String>(s1, s2), longTuple2Tuple2._1));
    }
});

//writeResult(rs, "rs.txt");
rs.coalesce(1, true).saveAsTextFile(path + "rs");

But when I try to execute it, it shows this:

INFO ShuffleMemoryManager: Thread 61 waiting for at least 1/2N of shuffle memory pool to be free  

The job never terminates. Can you explain what this message means and how I can solve the problem?

Thanks in advance.

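Answers

1

In this command:

rs.coalesce(1, true).saveAsTextFile(path + "rs");

you create only one partition, so all of the data ends up on a single node. You need to increase the number of partitions. Try something like this, depending on the size of your data:

rs.coalesce(10, true).saveAsTextFile(path + "rs");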
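Apart from the partition count, it may be worth looking at how the lookup itself is done. The question's code broadcasts an RDD and calls lookup on it inside mapToPair, and Spark does not support invoking RDD operations from inside another RDD's transformation. A possible alternative, assuming noHeaderIndexed is small enough to fit in memory, is to broadcast a plain Map (for example one obtained with collectAsMap()) and look keys up in that, or to use noFirstIndexed.join(noHeaderIndexed) instead. The sketch below shows only the lookup logic in plain Java with no Spark dependency. The class name KeyJoinSketch, the method joinByKey, the String[] row type, and the sample data are all hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a key-based lookup join against an in-memory map.
// The "left" map stands in for what, in Spark, might be a broadcast Map.
class KeyJoinSketch {

    // Pairs each right-side row with the left-side row that has the same key.
    // A missing key or a null field becomes an empty string.
    static List<String> joinByKey(Map<Long, String[]> left, Map<Long, String[]> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String[]> e : right.entrySet()) {
            String[] match = left.get(e.getKey()); // one in-memory lookup per row
            String s1 = (match != null && match[0] != null) ? match[0] : "";
            String s2 = (match != null && match[1] != null) ? match[1] : "";
            out.add(e.getKey() + ": (" + e.getValue()[0] + "," + e.getValue()[1]
                    + ") <-> (" + s1 + "," + s2 + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, keyed by index.
        Map<Long, String[]> noHeaderIndexed = new HashMap<>();
        noHeaderIndexed.put(0L, new String[] {"a", "b"});
        noHeaderIndexed.put(1L, new String[] {"c", null});

        Map<Long, String[]> noFirstIndexed = new HashMap<>();
        noFirstIndexed.put(0L, new String[] {"x", "y"});
        noFirstIndexed.put(2L, new String[] {"z", "w"});

        for (String row : joinByKey(noHeaderIndexed, noFirstIndexed)) {
            System.out.println(row);
        }
    }
}
```

Unlike the question's code, which assumes every key is present in noHeaderIndexed, this version also tolerates a missing key by emitting empty strings. Whether that is the desired behavior depends on your data.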

+0

Thank you for your answer, but I have the same problem with rs.coalesce(10, true).saveAsTextFile(path + "rs"); – 2014-11-06 09:11:42

+1

Try rs.saveAsTextFile(path + "rs"); – user1989252 2014-11-06 16:11:25

+0

Now I get several files as the result, but the same problem still occurs. – 2014-11-12 10:15:52