VectorAssembler does not support the StringType type (Scala/Spark conversion)

I have a DataFrame containing string columns that I intend to use as input to k-means with Spark and Scala. I am converting the string-typed columns of my DataFrame using the method below:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.functions.udf
val toDouble = udf[Double, String](_.toDouble)
val analysisData = dataframe_mysql
    .withColumn("Event", toDouble(dataframe_mysql("event")))
    .withColumn("Execution", toDouble(dataframe_mysql("execution")))
    .withColumn("Info", toDouble(dataframe_mysql("info")))
val assembler = new VectorAssembler()
    .setInputCols(Array("execution", "event", "info"))
    .setOutputCol("features")
val output = assembler.transform(analysisData)
println(output.select("features", "execution").first())

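When I print the schema of analysisData, the conversion looks correct. But I get an exception: VectorAssembler does not support the StringType type. That means my values are still strings! How can I convert the values themselves, and not just the schema type?

Thanks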

Source: 2016-05-30, Kratos

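Indeed, the VectorAssembler transformer does not accept strings. You need to make sure your input columns are of numeric, boolean, or vector type. Check that your UDF does the right thing, and make sure none of the input columns is of StringType.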

To convert a column in a Spark DataFrame to another type, simply use the cast() function of the DSL, like this:

import org.apache.spark.sql.types.DoubleType
val analysisData = dataframe_mysql.withColumn("Event", dataframe_mysql("Event").cast(DoubleType))

It should work!
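For reference, here is a minimal end-to-end sketch of the cast-based approach, assuming Spark 2.x or later. The local SparkSession, the sample rows, and the helper name castToDouble are invented for illustration; only the column names execution, event, and info come from the question.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StringType}

object CastThenAssemble {
  // Hypothetical helper: casts each named column to Double in place,
  // keeping the same column name so later stages see the converted column.
  def castToDouble(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, col(name).cast(DoubleType))
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-then-assemble")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for dataframe_mysql: every column is a string.
    val raw = Seq(("1.0", "2.0", "3.0"), ("4.0", "5.0", "6.0"))
      .toDF("execution", "event", "info")

    val featureCols = Array("execution", "event", "info")
    val analysisData = castToDouble(raw, featureCols.toSeq)

    // Sanity check: no feature column should still be StringType.
    val stillStrings = analysisData.schema.fields
      .filter(f => featureCols.contains(f.name) && f.dataType == StringType)
      .map(_.name)
    require(stillStrings.isEmpty, s"Still StringType: ${stillStrings.mkString(", ")}")

    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")

    val output = assembler.transform(analysisData)
    output.select("features", "execution").show(false)

    spark.stop()
  }
}
```

With default (non-ANSI) settings, cast yields null for strings that cannot be parsed as numbers, so it may be worth filtering or filling nulls before assembling.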

Source: 2016-08-10 10:27:02

What would you do if it were not just one or a few columns, but say 50, 100, or 300 that needed to be converted to float?

Hey @EvanZamir, you could try something like `df.selectExpr("cast(col1 as float) col1", "cast(col2 as float) col2")`
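For dozens or hundreds of columns, the selectExpr suggestion above can be generated rather than typed out. The following is a sketch only: the object and function names are hypothetical, and the code builds the expression strings without touching Spark, so the DataFrame call appears only in a comment.

```scala
object CastExpressions {
  // Builds one "cast(`name` as type) as `name`" expression per column.
  // Backticks keep names containing spaces or dots intact; names that
  // themselves contain a backtick are not handled by this sketch.
  def castExprs(columns: Seq[String], targetType: String): Seq[String] =
    columns.map(c => s"cast(`$c` as $targetType) as `$c`")

  def main(args: Array[String]): Unit = {
    // Hypothetical column names col1 .. col3.
    val cols = (1 to 3).map(i => s"col$i")
    castExprs(cols, "float").foreach(println)

    // With an existing DataFrame df (not constructed here), the
    // expressions would be passed as varargs:
    //   val converted = df.selectExpr(castExprs(df.columns.toSeq, "float"): _*)
  }
}
```

Since selectExpr is a projection, any column not included in the expression list is dropped from the result, so pass every column you want to keep.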
