Apply a UDF to multiple columns in a Spark DataFrame

I have a DataFrame that looks like this:
+---+----+------+-----+---+---+-----+---+---+--------------+
| id| age|   rbc|  bgr| dm|cad|appet| pe|ane|classification|
+---+----+------+-----+---+---+-----+---+---+--------------+
|  3|48.0|normal|117.0| no| no| poor|yes|yes|           ckd|
....
....
....
I wrote a UDF to convert categorical values such as yes, no, poor and normal into binary 0s and 1s:
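import org.apache.spark.sql.functions.{col, udf}

// Maps categorical labels to 1/0; any other value (or null) becomes None,
// which Spark stores as null instead of failing with a scala.MatchError
def stringToBinary(stringValue: String): Option[Int] = stringValue match {
  case "yes" | "present" | "normal"     => Some(1)
  case "no" | "notpresent" | "abnormal" => Some(0)
  case _                                => None
}

val stringToBinaryUDF = udf(stringToBinary _)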
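A lookup table is one way to keep the mapping in a single place so it is easy to extend as more categorical columns are added. The pure-Scala sketch below (no Spark needed) returns None for null or unknown values instead of throwing. The names `BinaryMapping` and `binaryCodes` are hypothetical, and the `"good" -> 1` / `"poor" -> 0` entries are an assumption for the `appet` column; they are not part of the original UDF.

```scala
object BinaryMapping {
  // Lookup table from categorical strings to 0/1.
  // ASSUMPTION: "good" -> 1 and "poor" -> 0 are not in the original UDF;
  // they are included only to illustrate covering the appet column.
  val binaryCodes: Map[String, Int] = Map(
    "yes" -> 1, "no" -> 0,
    "present" -> 1, "notpresent" -> 0,
    "normal" -> 1, "abnormal" -> 0,
    "good" -> 1, "poor" -> 0
  )

  // Option(value) turns null into None; unknown keys also yield None
  def stringToBinary(value: String): Option[Int] =
    Option(value).flatMap(binaryCodes.get)

  def main(args: Array[String]): Unit = {
    println(stringToBinary("yes"))   // Some(1)
    println(stringToBinary("poor"))  // Some(0)
    println(stringToBinary(null))    // None
  }
}
```

Because the function returns `Option[Int]`, wrapping it with `udf(BinaryMapping.stringToBinary _)` should give a nullable integer column rather than a failing task.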
I apply it to the DataFrame as follows:
val newCol = stringToBinaryUDF(col("pc")) // builds a column expression with the converted values
val refined1 = noZeroDF.withColumn("dm", newCol) // returns a new DataFrame in which "dm" is replaced (or added if absent)
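For a mapping this simple, a UDF is not strictly required. The sketch below swaps in Spark's built-in `when` and `isin` column expressions instead of the question's UDF; built-in expressions stay visible to the Catalyst optimizer, whereas a UDF is opaque to it. It assumes a local `SparkSession`; the `toBinary` helper and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object BuiltinBinary {
  // Built-in equivalent of the binary mapping: 1 for positive labels,
  // 0 for negative ones, null for anything else (no otherwise() clause)
  def toBinary(c: Column): Column =
    when(c.isin("yes", "present", "normal"), lit(1))
      .when(c.isin("no", "notpresent", "abnormal"), lit(0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("builtin-binary").getOrCreate()
    import spark.implicits._

    // Hypothetical rows mirroring a few columns from the table in the question
    val df: DataFrame = Seq((3, "normal", "no"), (4, "abnormal", "yes")).toDF("id", "rbc", "dm")

    // withColumn with an existing column name replaces that column
    df.withColumn("dm", toBinary(col("dm"))).show()
    spark.stop()
  }
}
```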
How can I apply the UDF to multiple columns so that I don't have to repeat myself for each of the other categorical columns?
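One common way to avoid repeating the `withColumn` call is to fold over the list of categorical column names and apply the same UDF to each one. A minimal sketch, assuming Spark 3.x with a local `SparkSession`; `binarize`, the column list and the sample row are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object MultiColumnUdf {
  // Same labels as the question's UDF; null or unknown values become None (null)
  def stringToBinary(value: String): Option[Int] = value match {
    case "yes" | "present" | "normal"     => Some(1)
    case "no" | "notpresent" | "abnormal" => Some(0)
    case _                                => None
  }

  val stringToBinaryUDF = udf(stringToBinary _)

  // Applies the UDF to every column in `columns`, replacing each one in place
  def binarize(df: DataFrame, columns: Seq[String]): DataFrame =
    columns.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, stringToBinaryUDF(col(name)))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("multi-column-udf").getOrCreate()
    import spark.implicits._

    // Hypothetical row with a few of the categorical columns from the question
    val df = Seq((3, "normal", "no", "no", "yes")).toDF("id", "rbc", "dm", "cad", "pe")
    binarize(df, Seq("rbc", "dm", "cad", "pe")).show()
    spark.stop()
  }
}
```

Each `withColumn` call in the fold adds another projection to the plan. With many columns, a single `select` that maps over `df.columns` (wrapping the listed ones in the UDF and aliasing them back to their names) produces the same result in one step, and on Spark 3.3+ `withColumns` accepts a `Map[String, Column]` for the same purpose.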
What is your problem, @Giridhar? Why do you keep accepting and un-accepting answers? If an answer helped you, accept it; otherwise leave a comment. :)