How to split a column into multiple rows (using a pipe as the delimiter)?

I have a DataFrame with the following content:

movieId  movieName  genre
1        example1   action|thriller|romance
2        example2   fantastic|action

I want to derive a second DataFrame from the first one, with the following content:

movieId  movieName  genre
1        example1   action
1        example1   thriller
1        example1   romance
2        example2   fantastic
2        example2   action

How can I do this?


2017-05-14 Lechucico

Why do the other answers suggest a UDF when `split` is a native function in Spark SQL? See the `functions` object.

Given the other two answers, I think the simplest solution is the following:

scala> movies.show(truncate = false) 
+-------+---------+-----------------------+
|movieId|movieName|genre                  |
+-------+---------+-----------------------+
|1      |example1 |action|thriller|romance|
|2      |example2 |fantastic|action       |
+-------+---------+-----------------------+

scala> movies.withColumn("genre", explode(split($"genre", "[|]"))).show 
+-------+---------+---------+
|movieId|movieName|    genre|
+-------+---------+---------+
|      1| example1|   action|
|      1| example1| thriller|
|      1| example1|  romance|
|      2| example2|fantastic|
|      2| example2|   action|
+-------+---------+---------+
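To make the row-by-row effect of `explode(split(...))` concrete, here is a minimal sketch of the same flattening on plain Scala collections, with no Spark involved. The `Movie` case class and the `explodeGenres` helper are hypothetical names used only for illustration; the sketch reuses the `"[|]"` pattern because `split` treats its argument as a regular expression, in which a bare `|` would mean alternation rather than a literal pipe.

```scala
// Hypothetical row type mirroring the movies DataFrame (movieId, movieName, genre).
case class Movie(movieId: Int, movieName: String, genre: String)

// Produce one output row per pipe-separated genre, copying the other columns.
// This is a plain-collections analogue of
// movies.withColumn("genre", explode(split($"genre", "[|]"))), not Spark itself.
def explodeGenres(movies: Seq[Movie]): Seq[Movie] =
  movies.flatMap { m =>
    // "[|]" is a regex character class that matches a literal pipe
    m.genre.split("[|]").toSeq.map(g => m.copy(genre = g))
  }

val movies = Seq(
  Movie(1, "example1", "action|thriller|romance"),
  Movie(2, "example2", "fantastic|action")
)

// Print one line per exploded row
explodeGenres(movies).foreach(m => println(s"${m.movieId} ${m.movieName} ${m.genre}"))
```

Each input row fans out into as many rows as it has genres, while `movieId` and `movieName` are repeated, which is the same shape as the `show` output above.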


2017-05-14 18:05:42

You can `explode` an array into multiple rows, and use a UDF to convert the pipe-delimited string into an array. Here is the Scala code:

import spark.implicits._ // provides toDF and $"..." (imported automatically in spark-shell)
val data = Seq(("1", "example1", "action|thriller|romance"),
    ("2", "example2", "fantastic|action")).toDF("movieId", "movieName", "genre")

Convert the `genre` column to an array with a simple UDF:

import org.apache.spark.sql.functions.{explode, udf}
val stringtoArray = udf((genre: String) => genre.split('|')) // Char overload: splits on a literal '|'

and then explode it:

data.withColumn("genre", explode(stringtoArray($"genre"))).show
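The UDF works because `genre.split('|')` calls the `Char` overload of `split`, which splits on a literal pipe. The `String` overload interprets its argument as a regular expression, where a bare `|` is an alternation of two empty patterns that matches the empty string, so it splits between characters instead of at the pipes. That is also why the built-in Spark `split` in the other answer is given `"[|]"`. A small plain-Scala check of the variants (the variable names are illustrative only):

```scala
val genres = "action|thriller|romance"

// Char overload: the pipe is taken literally
val byChar = genres.split('|')
// Regex overloads: the pipe must be wrapped in a character class or escaped
val byClass = genres.split("[|]")
val byEscape = genres.split("\\|")
// Unescaped regex "|" matches the empty string, so this does NOT split on pipes
val byBareRegex = genres.split("|")

println(byChar.mkString(", "))  // action, thriller, romance
println(byBareRegex.length)     // many single-character pieces, not 3
```

Either `"[|]"` or `"\\|"` is a safe pattern to pass to a regex-based `split`, whether in plain Scala or in Spark SQL.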


2017-05-14 14:52:24
