Transpose in Spark
I have a DataFrame that looks like this:
+---------+-------------+--------------------+--------+
|       ID|      reg_num|             reg_typ|reg_code|
+---------+-------------+--------------------+--------+
|523528690| 134886307000|   Chamber of Commer|   14246|
|523528690|  2015/369956|   Government Gazett|   14225|
|523528690|    997253630|    Tax Registration|   14259|
|523528691|    997253633|             Tax Doc|   14250|
|523528691|    997253634|            Tax File|   14251|
|523528691|    997253635|            Tax Data|   14252|
|523528691|    997253636|         Tax Monitor|   14253|
+---------+-------------+--------------------+--------+
Now I am trying to produce output in this format:
+---------+-------------+--------------------+--------+-------------+-------------+-------------+-------------+
|       ID|      reg_num|             reg_typ|reg_code|        reg_1|        reg_2|        reg_3|        reg_4|
+---------+-------------+--------------------+--------+-------------+-------------+-------------+-------------+
|523528690| 134886307000|   Chamber of Commer|   14246| 134886307000|  2015/369956|    997253630|         null|
|523528690|  2015/369956|   Government Gazett|   14225| 134886307000|  2015/369956|    997253630|         null|
|523528690|    997253630|    Tax Registration|   14259| 134886307000|  2015/369956|    997253630|         null|
|523528691|    997253633|             Tax Doc|   14250|    997253633|    997253634|    997253635|    997253636|
|523528691|    997253634|            Tax File|   14251|    997253633|    997253634|    997253635|    997253636|
|523528691|    997253635|            Tax Data|   14252|    997253633|    997253634|    997253635|    997253636|
|523528691|    997253636|         Tax Monitor|   14253|    997253633|    997253634|    997253635|    997253636|
+---------+-------------+--------------------+--------+-------------+-------------+-------------+-------------+
I have looked at predefined functions such as pivot, but they don't seem to fit my case.
I am using Spark 1.6 and Scala 2.10.5.
Any help is appreciated!
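The reshape being asked for can be modelled without Spark. The following is a minimal plain-Scala sketch using the sample rows from the question: it collects each ID's reg_num values, pads them to a fixed number of slots with None (printed as null in the Spark output), and attaches that list to every row of the same ID. The names Reg, SpreadRegNums and maxRegs are hypothetical, and the limit of four slots is an assumption taken from the sample data.

```scala
// Plain-Scala model of the desired reshape; no Spark involved.
// Hypothetical names: Reg, SpreadRegNums, maxRegs.
case class Reg(id: Long, regNum: String, regTyp: String, regCode: Int)

object SpreadRegNums {
  // For every input row, return the row together with its ID's reg_1..reg_n values.
  // Values follow the order in which rows appear in `rows`; unused slots are None.
  def spread(rows: Seq[Reg], maxRegs: Int): Seq[(Reg, Seq[Option[String]])] = {
    // "pivot" step: collect reg_num per ID (groupBy keeps input order within each group)
    val numsById: Map[Long, Seq[String]] =
      rows.groupBy(_.id).map { case (id, rs) => id -> rs.map(_.regNum) }

    // "join back" step: attach the padded list to every row of the same ID
    rows.map { r =>
      val slots = numsById(r.id).map(n => Option(n)).padTo(maxRegs, None).take(maxRegs)
      (r, slots)
    }
  }

  // Sample rows copied from the question's input table
  val sample: Seq[Reg] = Seq(
    Reg(523528690L, "134886307000", "Chamber of Commer", 14246),
    Reg(523528690L, "2015/369956", "Government Gazett", 14225),
    Reg(523528690L, "997253630", "Tax Registration", 14259),
    Reg(523528691L, "997253633", "Tax Doc", 14250),
    Reg(523528691L, "997253634", "Tax File", 14251),
    Reg(523528691L, "997253635", "Tax Data", 14252),
    Reg(523528691L, "997253636", "Tax Monitor", 14253)
  )
}
```

In Spark 1.6 the corresponding pieces would be a window that numbers the rows within each ID (`row_number()` over `Window.partitionBy("ID")`), `groupBy("ID").pivot(...)` on that number, and a join back to the original DataFrame on `ID`. Note that in Spark 1.6 window functions require a `HiveContext`.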
@eliasah The solution solved the problem and works as required. Thanks :) – Svk
Glad to hear it! – eliasah
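@eliasah Just one question: when I run it on a large dataset, the reg_1 .. reg_4 columns are not in the order of the original DataFrame; for example, the first reg_num does not correspond to reg_1. Is that because the window function uses an order by clause? – Svk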
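A Spark DataFrame has no guaranteed row order, so whatever column the window is ordered by decides which value lands in reg_1. In the sample data, ordering ID 523528690 by reg_code puts 2015/369956 (14225) before 134886307000 (14246), which does not match the desired output. The plain-Scala sketch below illustrates this under one assumption: a hypothetical loadSeq column recorded when the data is first read. In Spark, one way to get such a column is `monotonically_increasing_id()`, whose values increase with partition index and position but are not consecutive. The names Rec, OrderKeyDemo and loadSeq are hypothetical.

```scala
// Hypothetical names: Rec, OrderKeyDemo, loadSeq.
// loadSeq stands in for a sequence number captured when the data was first loaded.
case class Rec(loadSeq: Long, id: Long, regNum: String, regCode: Int)

object OrderKeyDemo {
  // Rows of ID 523528690 from the question, numbered in their original order
  val recs: Seq[Rec] = Seq(
    Rec(0L, 523528690L, "134886307000", 14246),
    Rec(1L, 523528690L, "2015/369956", 14225),
    Rec(2L, 523528690L, "997253630", 14259)
  )

  // reg_num values of one ID sorted by `key`: what reg_1, reg_2, ... would hold
  // if the window were ordered by that key.
  def slotsFor(rows: Seq[Rec], id: Long, key: Rec => Long): Seq[String] =
    rows.filter(_.id == id).sortBy(key).map(_.regNum)
}
```

Ordering by reg_code reproduces the mismatch, while ordering by the captured sequence recovers the original order even if the rows arrive shuffled (simulated here by reversing them).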