
Scala - Spark: returning vertex attributes from a specific node

I have a graph and I want to compute the maximum degree. In particular, I want to know all the attributes of the vertex with the maximum degree. Here is the code snippet:

def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = { 
    if (a._2 > b._2) a else b 
} 

val maxDegrees : (VertexId, Int) = graphX.degrees.reduce(max) 
max: (a: (org.apache.spark.graphx.VertexId, Int), b: (org.apache.spark.graphx.VertexId, Int))(org.apache.spark.graphx.VertexId, Int) 
maxDegrees: (org.apache.spark.graphx.VertexId, Int) = (2063726182,56387) 

val startVertexRDD = graphX.vertices.filter{case (hash_id, (id, state)) => hash_id == maxDegrees._1} 
startVertexRDD.collect() 

But it returns this exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 145.0 failed 1 times, most recent failure: Lost task 0.0 in stage 145.0 (TID 5380, localhost, executor driver): scala.MatchError: (1009147972,null) (of class scala.Tuple2) 

How can I fix this?

Answer


I think this is where the problem is. Here:

val startVertexRDD = graphX.vertices.filter{case (hash_id, (id, state)) => hash_id == maxDegrees._1} 

This tries to match a tuple like this:

(2063726182,56387) 

against a pattern expecting something like this:

(hash_id, (id, state)) 

which raises a scala.MatchError, because a Tuple2 of (VertexId, Int) is being matched against a pattern for a Tuple2 of (VertexId, Tuple2(id, state)).
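As a hedged sketch (not the original poster's exact code), one way to avoid the MatchError is to match only on the VertexId and bind the attribute with a wildcard, so the pattern is total regardless of the attribute's shape:

// Match only on the id; `_` accepts any attribute, including null 
val startVertexRDD = graphX.vertices.filter { case (hash_id, _) => hash_id == maxDegrees._1 } 
startVertexRDD.collect() 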

Be careful with this:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 145.0 failed 1 times, most recent failure: Lost task 0.0 in stage 145.0 (TID 5380, localhost, executor driver): scala.MatchError: (1009147972,null) (of class scala.Tuple2) 

specifically this part:

scala.MatchError: (1009147972,null) 

No degree has been computed for vertex 1009147972, so when the comparison reaches it, it can raise issues there as well.
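A quick, hedged check suggested by the MatchError value (1009147972,null): assuming the vertex attribute is supposed to be an (id, state) tuple, this counts how many vertices carry a null attribute instead:

// Vertices whose attribute is null rather than an (id, state) tuple 
val nullAttrVertices = graphX.vertices.filter { case (vid, attr) => attr == null } 
nullAttrVertices.count() 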

Hope this helps.


I checked whether there are nodes detached from the graph with this snippet: val vertexDegree: VertexRDD[Int] = graphX.degrees; val vertexNoDegree = vertexDegree.filter { case (id, degree) => degree == null }; vertexNoDegree.isEmpty() res6: Boolean = true. There are no isolated nodes... I don't know what to do – alukard990