Sample JSON for the JavaRDD
{"name": "dev", "salary": 10000, "occupation": "ENGG", "address": "Noida"}
{"name": "KARTHIK", "salary": 20000, "occupation": "ENGG", "address": "Noida"}
Relevant code:
final List<Map<String,String>> jsonData = new ArrayList<>();
DataFrame df = sqlContext.read().json("file:///home/dev/data-json/emp.json");
JavaRDD<String> rdd = df.repartition(1).toJSON().toJavaRDD();
rdd.foreach(new VoidFunction<String>() {
    @Override
    public void call(String line) {
        try {
            jsonData.add(new ObjectMapper().readValue(line, Map.class));
            System.out.println(Thread.currentThread().getName());
            System.out.println("List size: " + jsonData.size());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
});
System.out.println(Thread.currentThread().getName());
System.out.println("List size: " + jsonData.size());
jsonData is empty at the end.
Output:
Executor task launch worker-1
List size: 1
Executor task launch worker-1
List size: 2
Executor task launch worker-1
List size: 3
.
.
.
Executor task launch worker-1
List size: 100
main
List size: 0
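Since the list seems to be empty at the start, could it be that the object mapper cannot parse the lines it receives? Can you provide a [mcve]? – Thomas
What is 'rdd'? – khelwood
Maybe 'System.out.println' runs before the foreach finishes its work (or even before it starts)? – freedev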
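One likely explanation for `List size: 0` on the `main` thread, not confirmed anywhere in this thread: Spark serializes the function passed to `foreach` and runs a deserialized copy inside the executor task. The copy carries its own copy of the captured list, so that copy grows to 100 while the driver's `jsonData` is never touched. The sketch below reproduces only that copy-on-serialization effect with plain Java serialization and no Spark at all. `ClosureCopyDemo`, `AddTask` and `runTaskOnSerializedCopy` are hypothetical names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ClosureCopyDemo {

    /** Hypothetical stand-in for the anonymous VoidFunction: it captures a list. */
    static class AddTask implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> captured;

        AddTask(List<String> captured) {
            this.captured = captured;
        }

        void call(String line) {
            captured.add(line);
        }
    }

    /**
     * Serializes the task, deserializes it (as if it had been shipped to a worker),
     * calls the copy once per record, and returns the size of the copy's list.
     */
    static int runTaskOnSerializedCopy(List<String> driverList, int records) {
        try {
            AddTask original = new AddTask(driverList);

            // "Ship" the task: write it to bytes...
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }

            // ...and read it back. The result is a separate object graph.
            AddTask shipped;
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                shipped = (AddTask) in.readObject();
            }

            // Only the deserialized copy is mutated, never driverList.
            for (int i = 0; i < records; i++) {
                shipped.call("record-" + i);
            }
            return shipped.captured.size();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> jsonData = new ArrayList<>();
        int workerSide = runTaskOnSerializedCopy(jsonData, 100);
        System.out.println("worker copy size: " + workerSide);       // 100
        System.out.println("driver list size: " + jsonData.size());  // 0
    }
}
```

If this is indeed what happens in the question's code, the usual way out is to bring the records back to the driver, for example with `rdd.collect()`, and parse them there instead of mutating a captured list inside `foreach`.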