pyspark: convert RDD[DenseVector] to DataFrame

I have the following RDD; rdd.take(5) gives me:
[DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699]),
DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699]),
DenseVector([5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0, 4.0, 9.0]),
DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699]),
DenseVector([9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699])]
I want to turn it into a DataFrame that should look like:
-------------------------------------------------------------------
| features |
-------------------------------------------------------------------
| [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
| [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
| [5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0, 4.0, 9.0] |
|-----------------------------------------------------------------|
| [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
| [9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
Is this possible? I tried using df_new = sqlContext.createDataFrame(rdd, ['features']),
but it didn't work. Does anyone have any suggestions? Thanks!
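The usual fix (suggested in the comments below) is that createDataFrame expects row-like objects such as tuples, not bare DenseVectors, so each vector must first be wrapped in a one-element tuple. A minimal sketch of the idea, with plain Python lists standing in for DenseVector so it runs without a Spark session; the Spark one-liner it corresponds to is shown in the comments:

```python
# With Spark, the fix would be:
#   df_new = rdd.map(lambda x: (x,)).toDF(["features"])
# i.e. map each DenseVector x to the single-element tuple (x,) so that
# each RDD element becomes a one-column row.

# Plain-Python stand-in: lists play the role of DenseVector here.
vectors = [
    [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699],
    [5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0, 4.0, 9.0],
]

# The same transformation map(lambda x: (x,)) performs on the RDD:
rows = [(v,) for v in vectors]  # each row is a 1-tuple -> one "features" column
```

Passing `rows` (instead of the bare vectors) to `createDataFrame` with the schema `['features']` then yields one vector per row under a single `features` column.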
Thanks! map(lambda x: (x,)) looks mysterious — could you elaborate on it? Thanks! – Edamame
`(x,)` is a single-element `tuple`. The map is required because only [certain objects can be converted to a `Row`](http://stackoverflow.com/a/32742294/1560062). – zero323