2014-10-18

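Spark: How to run prediction using a trained data set (MLlib: SVMWithSGD)

I am new to Spark. I am able to train on my data set, but I cannot use the trained model to make predictions.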

Here is the training code. The training data is an 1800x4000 matrix.

import org.apache.spark.mllib.classification.SVMWithSGD 
import org.apache.spark.mllib.regression.LinearRegressionWithSGD 
import org.apache.spark.mllib.regression.LabeledPoint 
import org.apache.spark.mllib.linalg.Vectors 

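// Load and parse the data
val data = sc.textFile("data/mllib/ridge-data/myfile.txt")
// Assumes each line is "label f1 f2 ..." separated by single spaces
val parsedData = data.map { line =>
    val parts = line.split(' ')
    // The first token is the label; all remaining tokens are the features
    LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}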

val firstDataPoint = parsedData.take(1)(0) 

// Building the model 
val numIterations = 100 
val model = SVMWithSGD.train(parsedData, numIterations) 
//val model = LinearRegressionWithSGD.train(parsedData,numIterations) 


val labelAndPreds = parsedData.map { point => 
    val prediction = model.predict(point.features) 
    (point.label, prediction) 
} 
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble/parsedData.count 
println("Training Error = " + trainErr) 

Now I load the data I want to run the prediction on. This data is 1800 values:

val test = sc.textFile("data/mllib/ridge-data/data.txt") 

But I don't know how to turn this data into a vector and run the prediction on it. Please help.

Answers

First load the labeled points from a text file (remember, you must have saved the RDD with saveAsTextFile):

JavaRDD<LabeledPoint> test = MLUtils.loadLabeledPoints(init.context, "hdfs://../test/", 30).toJavaRDD();
JavaRDD<Tuple2<Object, Object>> scoreAndLabels = test.map(
    new Function<LabeledPoint, Tuple2<Object, Object>>() {
        public Tuple2<Object, Object> call(LabeledPoint p) {
            Double score = model.predict(p.features());
            return new Tuple2<Object, Object>(score, p.label());
        }
    }
);

Now collect the scores and iterate over them:

List<Tuple2<Object, Object>> scores = scoreAndLabels.collect();
for (Tuple2<Object, Object> score : scores) {
    System.out.println(score._1 + " \t" + score._2);
}

This is in Java, but maybe you can convert it :)
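For readers following the Scala code in the question, here is a rough Scala sketch of the same idea. It is an illustration under assumptions, not the answerer's code: `PredictSketch` and its method names are made up, the labeled data is assumed to have been written by saveAsTextFile on an RDD[LabeledPoint], and the unlabeled file is assumed to hold one whitespace-separated feature vector per line.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.SVMModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.rdd.RDDFunctions
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

// Hypothetical helper object; the names are illustrative only.
object PredictSketch {

  // Assumption: one feature vector per line, values separated by whitespace.
  def parseFeatures(line: String): Vector =
    Vectors.dense(line.trim.split("\\s+").map(_.toDouble))

  // Labeled test data: pair each prediction with the known label,
  // mirroring the Java answer.
  def scoreAndLabels(sc: SparkContext, model: SVMModel, dir: String): RDD[(Double, Double)] =
    MLUtils.loadLabeledPoints(sc, dir).map(p => (model.predict(p.features), p.label))

  // Unlabeled data: parse every line into a vector, then predict on the whole RDD.
  def predictUnlabeled(sc: SparkContext, model: SVMModel, path: String): RDD[Double] =
    model.predict(sc.textFile(path).map(line => parseFeatures(line)))
}
```

In spark-shell, with `sc` and the trained `model` from the question in scope, something like `PredictSketch.predictUnlabeled(sc, model, "data/mllib/ridge-data/data.txt").collect().foreach(println)` should print one prediction per input line. The vectors must have the same number of features as the training data.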

But the predicted values don't make sense: -18.841544889249917 0.0 16 8.32916035523283 1.0 420.67763915879794 1.0 -974.1942589201286 0.0 71.73602841256813 1.0 233.13636224524993 1.0 -1000.5902168199027 0.0 Does anyone know what they mean?
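One plausible reading, offered as an assumption because the answer does not show how the model was configured: each pair is (score, label), and the score is the raw margin, weights · features + intercept. MLlib's SVMModel.predict returns that raw margin once clearThreshold() has been called. While a threshold is set (0.0 by default), it returns 1.0 if the margin is above the threshold and 0.0 otherwise. The plain Scala sketch below imitates that rule without Spark; `MarginDemo` and `decide` are made-up names.

```scala
// Illustration only: mimics how a thresholded linear classifier turns a
// raw margin into a 0/1 label. This is not the MLlib source.
object MarginDemo {

  // threshold = Some(t): return 1.0 if margin > t, else 0.0.
  // threshold = None: return the raw margin unchanged (threshold cleared).
  def decide(margin: Double, threshold: Option[Double]): Double = threshold match {
    case Some(t) => if (margin > t) 1.0 else 0.0
    case None    => margin
  }

  def main(args: Array[String]): Unit = {
    // A few (score, label) pairs copied from the output quoted above.
    val pairs = Seq(
      (-18.841544889249917, 0.0),
      (420.67763915879794, 1.0),
      (-974.1942589201286, 0.0),
      (71.73602841256813, 1.0)
    )
    for ((score, label) <- pairs) {
      val predicted = decide(score, Some(0.0))
      println(s"score=$score predicted=$predicted actual=$label")
    }
  }
}
```

Under this reading, the negative scores line up with label 0.0 and the positive ones with label 1.0. To get 0/1 predictions directly from the real model, leave the threshold in place or restore it with `model.setThreshold(0.0)`.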
