2016-12-16 171 views

I'm running the code below to create a graph with GraphX in Apache Spark. Building the VertexRDD gives me a type mismatch error.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.VertexId

// loads the file from HDFS
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/data/google-plus/2309.graph");

// maps each line to its first 20 characters, which is the node ID
val result = lines.map(line => line.substring(0,20))

// pairs each node with a Long
val result2 = result.map(word => (word,1L).toLong)

// this is where I get the error

val vertexRDD: RDD[(Long,Long)] = sc.parallelize(result2) 

I get the following error:

error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Long, Long)]
 required: Seq[?]
Error occurred in an application involving default arguments.
     val vertexRDD: RDD[(Long, Long)] = sc.parallelize(result2)

Answer


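First, your two maps can be simplified to the following:

val vertexRDD: RDD[(Long, Long)] =
    lines.map(line => (line.substring(0, 17).toLong, 1L))

Now, your error: you cannot call sc.parallelize on an RDD, because it expects a local Seq (hence "required: Seq[?]"). result2 is already an RDD, so it can serve as your vertex RDD directly. You can then create your graph from result2 and your edgesRDD: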

val g = Graph(result2, edgesRDD) 

Or, if you adopt my suggestion:

val g = Graph(vertexRDD, edgesRDD) 

When I run the code I get the following:
(0 + 9) / 59] 16/12/16 18:12:26 WARN TaskSetManager: Lost task 8.0 in stage 3.0 (TID 126, moon07.eecs.qmul.ac.uk): java.lang.NumberFormatException: For input string: "10867043655226952823"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:592)
    at java.lang.Long.parseLong(Long.java:631)
    at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230)
    ......


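@RhysCopperthwaite Oh, of course: the maximum Long value (9223372036854775807) has 19 digits, so your substring should be limited to 18 characters. GraphX does not support vertices with string IDs, so you need numeric IDs that fit in a Long. If needed, you could also try 'line.hashCode()' instead of 'line.substring()'.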
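To make the 18-digit limit concrete, here is a small stdlib-only Scala sketch (no Spark needed). The names `LongIdCheck` and `prefixAsLong` are hypothetical, chosen for illustration. It counts the digits of `Long.MaxValue` and parses a prefix of the failing input inside `Try`, so an overflow yields `None` instead of a `NumberFormatException` that kills the task.

```scala
import scala.util.Try

object LongIdCheck {
  // Number of decimal digits in Long.MaxValue (9223372036854775807)
  val maxDigits: Int = Long.MaxValue.toString.length

  // Parses the first n characters of a line as a Long.
  // Returns None instead of throwing NumberFormatException on overflow or non-digits.
  // `take` (unlike substring) does not throw when the line is shorter than n.
  def prefixAsLong(line: String, n: Int): Option[Long] =
    Try(line.take(n).toLong).toOption

  def main(args: Array[String]): Unit = {
    // The 20-digit input string from the NumberFormatException in the comment above
    val id = "10867043655226952823"
    println(maxDigits)            // 19
    println(prefixAsLong(id, 20)) // None
    println(prefixAsLong(id, 18)) // Some(108670436552269528)
  }
}
```

Any string of at most 18 digits is below 10^18 and therefore always fits in a Long. Keep in mind that truncating IDs this way merges two distinct nodes whose IDs share the same first 18 digits, so it is only safe if those prefixes are known to be unique.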


@RhysCopperthwaite Using 'hashCode()' may not be the best way to define IDs. You need to make sure every node has a unique numeric ID that fits in a Long.
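One way to get IDs that are guaranteed unique is to number the distinct node names instead of hashing them. The sketch below does this on a local Scala collection; `VertexIds` and `assignIds` are hypothetical names. On an RDD, `distinct()` followed by `zipWithIndex()` (or `zipWithUniqueId()`) produces a comparable name-to-Long mapping. The example also shows why `hashCode()` is risky: it returns a 32-bit Int, and different strings can share a hash.

```scala
object VertexIds {
  // Assigns every distinct node name a unique Long ID (0, 1, 2, ...) in first-seen order.
  def assignIds(names: Seq[String]): Map[String, Long] =
    names.distinct.zipWithIndex.map { case (name, idx) => name -> idx.toLong }.toMap

  def main(args: Array[String]): Unit = {
    // Two different strings with the same String.hashCode: a hash-based ID would merge them
    println("Aa".hashCode == "BB".hashCode) // true (both are 2112)

    val ids = assignIds(Seq("Aa", "BB", "Aa"))
    println(ids("Aa")) // 0
    println(ids("BB")) // 1
  }
}
```

With such a mapping you would still need to replace each node name in your edge list with its numeric ID before building the `Edge` objects; in Spark that is typically done by joining the edges against the mapping.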