2017-08-16 40 views

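Scala DataFrame: combining different columns into new rows

I am currently using Scala and would like to know whether several different columns can be merged into a single column. For example, if I have: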

+------+--------+-------+----------+-----+ 
| User | family | phone | location | raz | 
+------+--------+-------+----------+-----+ 
| u1 | f1  | p1 | l1  | r1 | 
+------+--------+-------+----------+-----+ 
| u2 | f2  | p2 | l2  | r2 | 
+------+--------+-------+----------+-----+ 
| u3 | f3  | p3 | l3  | r3 | 
+------+--------+-------+----------+-----+ 

How can I combine phone, location, and raz into a single column, with each value on its own row?

+------+--------+-------+ 
| User | family | new | 
+------+--------+-------+ 
| u1 | f1  | p1 | 
+------+--------+-------+ 
| u1 | f1  | l1 | 
+------+--------+-------+ 
| u1 | f1  | r1 | 
+------+--------+-------+ 
| u2 | f2  | p2 | 
+------+--------+-------+ 
| u2 | f2  | l2 | 
+------+--------+-------+ 
| u2 | f2  | r2 | 
+------+--------+-------+ 
| u3 | f3  | p3 | 
+------+--------+-------+ 
| u3 | f3  | l3 | 
+------+--------+-------+ 
| u3 | f3  | r3 | 
+------+--------+-------+ 

Thanks

Answer


One approach is to gather the columns to be flattened into an array column and then explode it:

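// Assumes spark-shell, where `spark` and its implicits (toDF, $"...") are already in scope;
// in a standalone application, add `import spark.implicits._` after creating the SparkSession.
import org.apache.spark.sql.functions.{array, explode}

val df = Seq(
    ("u1", "f1", "p1", "l1", "r1"),
    ("u2", "f2", "p2", "l2", "r2"),
    ("u3", "f3", "p3", "l3", "r3")
).toDF("User", "family", "phone", "location", "raz")

// Pack the three columns into one array column, emit one row per array element,
// then keep only the key columns and the exploded value.
val df2 = df.
    withColumn("plr", array($"phone", $"location", $"raz")).
    withColumn("new", explode($"plr")).
    select("User", "family", "new")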

df2.show 
+----+------+---+ 
|User|family|new| 
+----+------+---+ 
| u1| f1| p1| 
| u1| f1| l1| 
| u1| f1| r1| 
| u2| f2| p2| 
| u2| f2| l2| 
| u2| f2| r2| 
| u3| f3| p3| 
| u3| f3| l3| 
| u3| f3| r3| 
+----+------+---+ 
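
If you also need to know which source column each value came from, the same array-and-explode idea can be extended by exploding an array of structs instead of an array of plain values. The following is only a sketch under a few assumptions: it is written as a standalone script that creates its own local SparkSession (in spark-shell, skip those lines), the names `valueCols`, `pairs`, `labelled`, `kv`, and `source` are made up for illustration, and all of the exploded columns are assumed to share one type (strings here).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-with-source-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("u1", "f1", "p1", "l1", "r1"),
  ("u2", "f2", "p2", "l2", "r2"),
  ("u3", "f3", "p3", "l3", "r3")
).toDF("User", "family", "phone", "location", "raz")

// Columns to turn into rows (illustrative helper, not part of the original answer).
val valueCols = Seq("phone", "location", "raz")

// One struct per source column: (name of the column, value of the column).
val pairs = array(valueCols.map(c => struct(lit(c).as("source"), col(c).as("new"))): _*)

// Explode the array of structs, then pull the struct fields back out as columns.
val labelled = df
  .select(col("User"), col("family"), explode(pairs).as("kv"))
  .select(col("User"), col("family"), col("kv.source"), col("kv.new"))

labelled.show()
```

Each input row should still yield three output rows, with the extra `source` column recording whether the value came from phone, location, or raz. Dropping `source` gives the same shape as `df2` above.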

It works! Thank you! So simple. I had been looking into doing multiple joins and unions to get the final result; this way is much better.