I am running the code below to try to create a graph with GraphX in Apache Spark. The VertexRDD line gives me a type mismatch error.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.VertexId
// load the file from HDFS
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/data/google-plus/2309.graph")
// map over the lines and take the first 20 characters of each line, which is the node ID
val result = lines.map(line => line.substring(0, 20))
// pair each node ID (converted to a Long) with a Long value of 1
val result2 = result.map(word => (word.toLong, 1L))
// this is where I get the error
val vertexRDD: RDD[(Long,Long)] = sc.parallelize(result2)
I get the following error:
error: type mismatch;
found : org.apache.spark.rdd.RDD[(Long, Long)]
required: Seq[?]
Error occurred in an application involving default arguments.
val vertexRDD: RDD[(Long, Long)] = sc.parallelize(result2)
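The compiler message points at the cause: `sc.parallelize` expects a local `Seq`, while the result of `map` on an RDD is already an RDD, so it can be assigned directly. Below is a minimal sketch of that difference. The object name, the `toVertices` helper, the sample IDs, and the local master setting are all assumptions made for illustration; in spark-shell `sc` already exists.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object VertexRddSketch {
  // Hypothetical helper: pairs each numeric ID string with a placeholder attribute of 1L.
  def toVertices(ids: RDD[String]): RDD[(Long, Long)] =
    ids.map(id => (id.trim.toLong, 1L))

  def main(args: Array[String]): Unit = {
    // A local master is assumed here only so the sketch can run on its own.
    val conf = new SparkConf().setAppName("vertex-rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // parallelize is meant for local collections: Seq[String] => RDD[String].
      val ids: RDD[String] = sc.parallelize(Seq("101", "102", "103"))
      // map already returns an RDD, so no second parallelize call is needed.
      val vertexRDD: RDD[(Long, Long)] = toVertices(ids)
      vertexRDD.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Applied to the code in the question, this would amount to assigning `result2` to `vertexRDD` without wrapping it in `sc.parallelize`.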
When I run the code I get the following: (0 + 9) / 59] 16/12/16 18:12:26 WARN TaskSetManager: Lost task 8.0 in stage 3.0 (TID 126, moon07.eecs.qmul.ac.uk): java.lang.NumberFormatException: For input string: "10867043655226952823" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:592) at java.lang.Long.parseLong(Long.java:631) at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230) ......
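@RhysCopperthwaite Oh, of course: the maximum Long value has 19 digits, so your substring should be limited to 18 characters. GraphX does not support vertices with strings as IDs, so you need numeric IDs that fit in a Long. If needed, you could also try 'line.hashCode()' instead of 'line.substring()'.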
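To make the size limit concrete, here is a small plain-Scala sketch (no Spark needed) that checks the ID from the stack trace against `Long.MaxValue`. The object name and the `parseId` helper are hypothetical, introduced only for this illustration.

```scala
import scala.util.Try

object LongIdCheck {
  // Hypothetical helper: parses a decimal string, returning None if it does not fit in a Long.
  def parseId(s: String): Option[Long] = Try(s.toLong).toOption

  def main(args: Array[String]): Unit = {
    val tooBig = "10867043655226952823" // 20 digits, taken from the stack trace above

    println(Long.MaxValue)                          // 9223372036854775807 (19 digits)
    println(BigInt(tooBig) > BigInt(Long.MaxValue)) // true
    println(parseId(tooBig))                        // None
    println(parseId(tooBig.substring(0, 18)))       // Some(108670436552269528)
  }
}
```

Note that truncating to 18 digits only guarantees the value parses; two IDs that differ only in their last digits would end up with the same truncated value.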
@RhysCopperthwaite Using 'hashCode()' is probably not the best way to define IDs. You need to make sure every node has a unique numeric ID that fits in a Long variable.
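One way to see the concern with hash codes, and one possible alternative, is sketched below in plain Scala. It assigns each distinct key its position as a Long ID; the object name and the `assignIds` helper are hypothetical, and this is only one of several ways to obtain unique IDs. On an RDD, `zipWithIndex` offers a comparable index-based pairing.

```scala
object UniqueIdSketch {
  // Hypothetical helper: gives each distinct string key a unique Long ID based on its position.
  def assignIds(keys: Seq[String]): Map[String, Long] =
    keys.distinct.zipWithIndex.map { case (key, idx) => key -> idx.toLong }.toMap

  def main(args: Array[String]): Unit = {
    // Two different strings can share a hash code, so hashCode alone is not a safe ID.
    println("Aa".hashCode == "BB".hashCode) // true

    // Position-based IDs stay distinct for distinct keys.
    val ids = assignIds(Seq("Aa", "BB", "Aa"))
    println(ids("Aa") != ids("BB")) // true
  }
}
```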