我对Python很熟悉,我正在学习Spark-Scala。在Spark-Scala中,如何将数组列表复制到DataFrame中?
我想建立具有由这种语法desribed结构的数据帧:
// Prepare training data from a list of (label, features) tuples.
val training = spark.createDataFrame(Seq(
(1.1, Vectors.dense(1.1, 0.1)),
(0.2, Vectors.dense(1.0, -1.0)),
(3.0, Vectors.dense(1.3, 1.0)),
(1.0, Vectors.dense(1.2, -0.5))
)).toDF("label", "features")
我从这个网址上面的语法: http://spark.apache.org/docs/latest/ml-pipeline.html
目前我的数据是数组,我已经退出出了DF的:
val my_a = gspc17_df.collect().map{row => Seq(row(2),Vectors.dense(row(3).asInstanceOf[Double],row(4).asInstanceOf[Double]))}
我的阵列的结构非常类似于上述DF:
my_a: Array[Seq[Any]] =
Array(
List(-1.4830674013266898, [-0.004192832940431825,-0.003170667657263393]),
List(-0.05876766500768526, [-0.008462913654529357,-0.006880595828929472]),
List(1.0109273250546658, [-3.1816797620416693E-4,-0.006502619326182358]))
如何将数据从我的数组复制到具有上述结构的DataFrame?
我想这句法:
val my_df = spark.createDataFrame(my_a).toDF("label","features")
星火我吼道:
<console>:105: error: inferred type arguments [Seq[Any]] do not conform to method createDataFrame's type parameter bounds [A <: Product]
val my_df = spark.createDataFrame(my_a).toDF("label","features")
^
<console>:105: error: type mismatch;
found : scala.collection.mutable.WrappedArray[Seq[Any]]
required: Seq[A]
val my_df = spark.createDataFrame(my_a).toDF("label","features")
^
scala>