What about the following?
scala> val diddy = Seq(
| ("2017/03/07", 4),
| ("2016/12/09", 2)).toDF("event_date", "no_of_days_gap")
diddy: org.apache.spark.sql.DataFrame = [event_date: string, no_of_days_gap: int]
scala> diddy.flatMap(r => Seq.fill(r.getInt(1))(r.getString(0))).show
+----------+
| value|
+----------+
|2017/03/07|
|2017/03/07|
|2017/03/07|
|2017/03/07|
|2016/12/09|
|2016/12/09|
+----------+
// use explode instead
scala> diddy.explode("no_of_days_gap", "events") { n: Int => 0 until n }.show
warning: there was one deprecation warning; re-run with -deprecation for details
+----------+--------------+------+
|event_date|no_of_days_gap|events|
+----------+--------------+------+
|2017/03/07| 4| 0|
|2017/03/07| 4| 1|
|2017/03/07| 4| 2|
|2017/03/07| 4| 3|
|2016/12/09| 2| 0|
|2016/12/09| 2| 1|
+----------+--------------+------+
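Since the `Dataset.explode` method is deprecated (as the warning above shows), a non-deprecated equivalent can be sketched with the `explode` and `sequence` standard functions. This is an assumption on my part that you are on Spark 2.4+ (where `sequence` was added) in a spark-shell session with `spark.implicits._` in scope:

```scala
import org.apache.spark.sql.functions.{explode, lit, sequence}

// sequence(0, n - 1) builds an array [0, 1, ..., n-1] per row;
// explode then produces one output row per array element.
diddy
  .withColumn("events", explode(sequence(lit(0), $"no_of_days_gap" - 1)))
  .show
```

This yields the same three-column result as the deprecated `explode` call above, without the deprecation warning.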
If you insist on `withColumn`, then... so... be it! Buckle up!
diddy
.withColumn("concat", concat($"event_date", lit(",")))
.withColumn("repeat", expr("repeat(concat, no_of_days_gap)"))
.withColumn("split", split($"repeat", ","))
.withColumn("explode", explode($"split"))
In spark-shell this returns an error:

scala> val range = udf((i: Int) => (0 until i).toSeq)
scala.MatchError: scala.collection.immutable.Range (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:692)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:671)
  at org.apache.spark.sql.functions$.udf(functions.scala:3072)
  ... 48 elided

– Diddy
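Regarding that `MatchError`: Spark's `ScalaReflection` has no schema mapping for `Range`, so a UDF whose inferred return type is a `Range` fails at definition time. A commonly suggested workaround (a sketch, not verified against every Spark version) is to return a concrete collection type that Spark does know how to encode, such as `List[Int]`:

```scala
import org.apache.spark.sql.functions.udf

// Workaround sketch: materialize the Range into a List so Spark can
// map the UDF's return type to ArrayType(IntegerType).
val range = udf { (i: Int) => (0 until i).toList }
```

The UDF can then be combined with `explode` the same way as the built-in functions above.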