2016-11-24

How to convert an RDD of Maps to a DataFrame

I have an RDD of Maps and I want to convert it to a DataFrame. Here is the RDD:

val mapRDD: RDD[Map[String, String]] = sc.parallelize(Seq(
    Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"), 
    Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"), 
    Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"), 
    Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"), 
    Map("empid" -> "16", "empName" -> "John", "depId" -> "701"))) 

Is there any way to convert this input format into a DataFrame, like the following?

val df = mapRDD.toDF

df.show

empid  empName  depId
12     Rohan    201
13     Ross     201
14     Richard  401
15     Michale  501
16     John     701

"I have RDD of Map" - more precisely, you have "an RDD of Maps".

Answer


You can easily convert it into a Spark DataFrame.

Here is the code that will do the trick:

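// Needed for .toDF. spark-shell imports this automatically;
// on Spark 1.x use `import sqlContext.implicits._` instead.
import spark.implicits._

val mapRDD = sc.parallelize(Seq(
    Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
    Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
    Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
    Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
    Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

// Take the column names from the keys of the first Map
val columns = mapRDD.take(1).flatMap(a => a.keys)

// Look each value up by column name, so the tuple order always matches
// `columns` instead of depending on the Map's iteration order
val resultantDF = mapRDD.map { value =>
    (value(columns(0)), value(columns(1)), value(columns(2)))
}.toDF(columns: _*)

resultantDF.show()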

The output is:

+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
|   12|  Rohan|  201|
|   13|   Ross|  201|
|   14|Richard|  401|
|   15|Michale|  501|
|   16|   John|  701|
+-----+-------+-----+
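
The tuple-based approach above only works for a fixed number of keys (three here). A possible generalization, sketched below, builds a Row per Map and passes an explicit schema to createDataFrame, so it works for any number of columns. This is a minimal sketch and not part of the original answer: it assumes Spark 2.x or later (SparkSession), the helper name mapsToDataFrame and the object name are hypothetical, every column is treated as a nullable string, and a key missing from a Map becomes null.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object MapRddToDataFrame {

  // Hypothetical helper (not a Spark API): turns an RDD of string Maps into a
  // DataFrame with one nullable string column per entry in `columns`.
  def mapsToDataFrame(spark: SparkSession,
                      rdd: RDD[Map[String, String]],
                      columns: Seq[String]): DataFrame = {
    // Explicit schema: every column is a nullable string (an assumption)
    val schema = StructType(columns.map(name => StructField(name, StringType, nullable = true)))

    // Build each Row by looking values up by column name, so the order
    // always follows `columns`; absent keys become null
    val rows = rdd.map(m => Row.fromSeq(columns.map(c => m.getOrElse(c, null))))

    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MapRddToDataFrame")
      .getOrCreate()

    val mapRDD = spark.sparkContext.parallelize(Seq(
      Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
      Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201")))

    // Column names are given explicitly so the column order is deterministic
    val df = mapsToDataFrame(spark, mapRDD, Seq("empid", "empName", "depId"))
    df.show()

    spark.stop()
  }
}
```

Passing the column names explicitly avoids relying on the key order of the first Map. If the keys are not known in advance, they could be collected first, for example with mapRDD.flatMap(_.keys).distinct().collect(), at the cost of an extra pass over the data.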