2014-11-14 86 views

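Print ClusterID and its elements using the Spark KMeans algorithm

I have this program that prints the WSSSE (within set sum of squared errors) of the KMeans algorithm on Apache Spark. It generates 20 clusters. I am trying to print each cluster ID together with the elements assigned to it. How do I iterate over the cluster IDs to print their elements?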

Thanks, guys!

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val sc = new SparkContext("local", "KMeansExample", "/usr/local/spark/", List("target/scala-2.10/kmeans_2.10-1.0.jar"))

    // Load and parse the data: one comma-separated point per line
    val data = sc.textFile("kmeans.csv")
    val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

    // Cluster the data into 20 clusters using KMeans
    val numIterations = 20
    val numClusters = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Render each center as text; concatenating the array itself would print only its reference
    val clusterCenters = clusters.clusterCenters.map(_.toArray.mkString("[", ", ", "]"))
    println("The Cluster Centers are = " + clusterCenters.mkString(", "))

    // Evaluate clustering by computing Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

Answer


As far as I know, you should run predict on each element.

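    // parsedData is assumed to be a JavaRDD<Vector>; KMeans.train expects the underlying RDD
    KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

    // collect() pulls every point to the driver, so this suits small data sets only
    List<Vector> vectors = parsedData.collect();
    for (Vector vector : vectors) {
        // predict() returns the index of the closest cluster center
        System.out.println("cluster " + clusters.predict(vector) + " " + vector.toString());
    }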
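
The same per-element predict idea could be written in Scala, the language of the question, with the points grouped by cluster ID so that each cluster is printed once with all of its members. This is only a sketch: the helper name `printClusterMembers` is made up for illustration, and it assumes the data is small enough to collect on the driver.

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on older Spark versions
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of MLlib
def printClusterMembers(model: KMeansModel, points: RDD[Vector]): Unit = {
  // Pair each point with the ID of its nearest cluster center, then group by that ID
  val byCluster = points.map(p => (model.predict(p), p)).groupByKey()

  // collect() brings every group to the driver; assumed acceptable for small data
  byCluster.collect().sortBy(_._1).foreach { case (clusterId, members) =>
    println(s"cluster $clusterId:")
    members.foreach(v => println(s"  $v"))
  }
}
```

With the names from the question, the call would be `printClusterMembers(clusters, parsedData)`. For larger data sets, writing the grouped RDD out with `saveAsTextFile` instead of collecting it may be the safer choice.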