PySpark: How to convert rows to vectors?

I'm working with a DataFrame that has three columns, colA, colB and colC:
+---+-----+-----+-----+
|id |colA |colB |colC |
+---+-----+-----+-----+
| 1 | 5   | 8   | 3   |
| 2 | 9   | 7   | 4   |
| 3 | 3   | 0   | 6   |
| 4 | 1   | 6   | 7   |
+---+-----+-----+-----+
I need to merge the colA, colB and colC columns to get a new DataFrame like this:
+---+--------------+
|id | colD         |
+---+--------------+
| 1 | [5, 8, 3]    |
| 2 | [9, 7, 4]    |
| 3 | [3, 0, 6]    |
| 4 | [1, 6, 7]    |
+---+--------------+
Here is the PySpark code that creates the first DataFrame:
l=[(1,5,8,3),(2,9,7,4), (3,3,0,6), (4,1,6,7)]
names=["id","colA","colB","colC"]
db=sqlContext.createDataFrame(l,names)
db.show()
How can I convert the rows to vectors? Can anyone help me? Thanks.