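Merging two RDDs of different types
Hi everyone, I want to combine an RDD[Vector] and an RDD[Int] into a single RDD[Vector]. I use KMeans to predict the clusters, and the idea is to attach the corresponding cluster to each vector. Here is what I did: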
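import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val data = spark.sparkContext.textFile("C:/spark/data/mllib/kmeans_data.txt")
// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
// Parse each space-separated line into a dense vector: RDD[Vector]
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
val clusters = KMeans.train(parsedData, numClusters, numIterations)
// Predicted cluster index for each point: RDD[Int]
val resultatOfprediction = clusters.predict(parsedData)
// Pair each vector with its predicted cluster: RDD[(Vector, Int)]
val finalData = parsedData.zip(resultatOfprediction)
finalData.collect().foreach(println)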
The result is:
([0.0,0.0,0.0],0)
([0.1,0.1,0.1],0)
([0.2,0.2,0.2],0)
([9.0,9.0,9.0],1)
([9.1,9.1,9.1],1)
([9.2,9.2,9.2],1)
The output I want:
[0.0,0.0,0.0,1.0]
[0.1,0.1,0.1,1.0]
[0.2,0.2,0.2,1.0]
[9.0,9.0,9.0,0.0]
[9.1,9.1,9.1,0.0]
[9.2,9.2,9.2,0.0]
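One possible way to get from the zipped pairs to the desired shape is to map each (Vector, Int) pair to a new dense vector with the cluster index appended as a Double. The sketch below assumes the mllib `Vector` type used in the question; the helper name `appendCluster` is made up for illustration. Note that KMeans cluster numbering is arbitrary, so the appended values may be 0.0 where the desired output shows 1.0 and vice versa.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of Spark): appends each cluster index
// to the end of its feature vector, returning an RDD[Vector].
// zip requires both RDDs to have the same partitioning and element counts,
// which holds when clusterIds was derived from features via predict/map.
def appendCluster(features: RDD[Vector], clusterIds: RDD[Int]): RDD[Vector] =
  features.zip(clusterIds).map { case (v, c) =>
    Vectors.dense(v.toArray :+ c.toDouble)
  }
```

With the names from the question, this could be used as `val finalData = appendCluster(parsedData, resultatOfprediction)`. To write the result as plain text, something like `finalData.map(_.toArray.mkString(" ")).saveAsTextFile(outputDir)` should work, where `outputDir` is a placeholder for an output directory that does not exist yet.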
The goal is to end up with a final RDD[Vector] that I can save to a txt file for a grid, but the result you provided is not an RDD[Vector] –
Please check the update, thanks –
Check the updated answer –
I did not get a correct answer. Could you please provide another one? Thanks –