// Join the four DataFrames on batchId and nest the columns into the
// Station > SetofValues > Value struct hierarchy used in the XML.
val final_df = sqlContext.sql("""
  select _xmlns, `md:Date`, `md:Creator`, struct(_ngr, _region, SetofValues) as Station
  from (select _xmlns, `md:Date`, `md:Creator`, _ngr, _region, struct(_dataType, _period, Value) as SetofValues
        from (select _xmlns, `md:Date`, `md:Creator`, _ngr, _region, _dataType, _period, struct(_VALUE, _time) as Value
              from df_h a
              left outer join df_ds b on a.batchId = b.batchId
              left outer join df_dsv c on b.batchId = c.batchId
              left outer join df_nv d on c.batchId = d.batchId))
""")
// Write a single XML file; spark-xml emits one <NewTag> element per row.
final_df.repartition(1).write.format("xml").option("rowTag","NewTag").save(output_path)
The schema of the DataFrame above, which I save as XML from Spark SQL, is:
root
|-- _xmlns: string (nullable = true)
|-- md:Date: string (nullable = true)
|-- md:Creator: string (nullable = true)
|-- Station: struct (nullable = false)
| |-- _ngr: string (nullable = true)
| |-- _region: string (nullable = true)
| |-- SetofValues: struct (nullable = false)
| | |-- _dataType: string (nullable = true)
| | |-- _period: string (nullable = true)
| | |-- Value: struct (nullable = false)
| | | |-- _VALUE: double (nullable = true)
| | | |-- _time: string (nullable = true)
When I save the DataFrame with the command above, I get the following XML file:
<ROWS>
<NewTag xmlns="testing">
<md:Date>2016-10-30</md:Date>
<md:Creator>USER_1</md:Creator>
<Station ngr="123456" region="North East">
<SetofValues dataType="Total" period="15 min">
<Value 3.509" time="05:30:00"></Value>
</SetofValues>
</Station>
</NewTag>
<NewTag xmlns="testing">
<md:Date>2016-10-30</md:Date>
<md:Creator>USER_1</md:Creator>
<Station ngr="123456" region="North East">
<SetofValues dataType="Total" period="15 min">
<Value 2.6" time="05:45:00"></Value>
</SetofValues>
</Station>
</NewTag>
<NewTag xmlns="testing">
<md:Date>2016-10-30</md:Date>
<md:Creator>USER_1</md:Creator>
<Station ngr="123456" region="North East">
<SetofValues dataType="Total" period="15 min">
<Value 1.111" time="06:00:00"></Value>
</SetofValues>
</Station>
</NewTag>
</ROWS>
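spark-xml writes every DataFrame row as its own rowTag element inside the root element (ROWS by default), which is why the three joined rows above come out as three <NewTag> elements. Only an array column inside a single row is written as repeated child elements. The plain-Scala sketch below models that layout for one already-grouped row. It assumes spark-xml's default settings (fields prefixed with _ become attributes, the _VALUE field becomes element text, each array entry becomes one element of the same name); it is an illustration, not spark-xml's actual writer, and the case class names are hypothetical.

```scala
// Hypothetical model of one grouped row; NOT spark-xml's real writer.
// Assumed spark-xml defaults: "_" fields -> attributes, "_VALUE" -> element text,
// an array field -> one child element per entry.
case class ValueEntry(value: Double, time: String)
case class SetofValues(dataType: String, period: String, values: Seq[ValueEntry])
case class Station(ngr: String, region: String, setofValues: SetofValues)
case class NewTag(xmlns: String, date: String, creator: String, station: Station)

// Renders one row as one <NewTag> element (no XML escaping; illustration only).
def render(t: NewTag): String = {
  val s = t.station
  val sv = s.setofValues
  // One <Value> element per array entry: this is where the repetition comes from.
  val valueLines = sv.values.map(v => s"""      <Value time="${v.time}">${v.value}</Value>""")
  (Seq(
    s"""<NewTag xmlns="${t.xmlns}">""",
    s"  <md:Date>${t.date}</md:Date>",
    s"  <md:Creator>${t.creator}</md:Creator>",
    s"""  <Station ngr="${s.ngr}" region="${s.region}">""",
    s"""    <SetofValues dataType="${sv.dataType}" period="${sv.period}">"""
  ) ++ valueLines ++ Seq(
    "    </SetofValues>",
    "  </Station>",
    "</NewTag>"
  )).mkString("\n")
}

// Sample row built from the values shown in the question.
val example = NewTag("testing", "2016-10-30", "USER_1",
  Station("123456", "North East",
    SetofValues("Total", "15 min", Seq(
      ValueEntry(3.509, "05:30:00"),
      ValueEntry(2.6, "05:45:00"),
      ValueEntry(1.111, "06:00:00")))))

println(render(example))
```

Because all three values live in one row, they end up under a single <SetofValues> element, which is the shape of the desired output.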
How can I get the output below instead, i.e. by collecting the rows into an array?
<NewTag xmlns="testing">
<md:Date>2016-10-30</md:Date>
<md:Creator>USER_1</md:Creator>
<Station ngr="123456" region="North East">
<SetofValues dataType="Total" period="15 min">
<Value time="05:30:00">3.509</Value>
<Value time="05:45:00">2.6</Value>
<Value time="06:00:00">1.111</Value>
</SetofValues>
</Station>
</NewTag>
I am not able to convert the separate rows into an array so that they appear as repeated elements in the XML.
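Your data itself is not in the right shape; that is why it is printed this way. Run final_df.show and look at it. Transform the data properly, group it the way you want, and then save it. –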
@AbhishekAnand, could you help with converting the rows into an array? – Naveen
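The grouping suggested in the comment can be sketched without Spark: treat every column except _VALUE and _time as the grouping key and collect the (value, time) pairs of each group into one sequence, which is what an array column holds after collect_list. The case classes and sample rows below are hypothetical stand-ins built from the data shown in the question. In Spark itself the analogous step would be a groupBy on the key columns followed by agg(collect_list(struct(...))), with the Station and SetofValues structs rebuilt afterwards; that Spark form is only an untested outline.

```scala
// Plain-Scala model of the grouping step; no Spark required.
// FlatRow mirrors one row of the joined DataFrame (one value per row).
case class FlatRow(xmlns: String, date: String, creator: String,
                   ngr: String, region: String,
                   dataType: String, period: String,
                   value: Double, time: String)

// The fields that stay constant within one <NewTag> element.
case class GroupKey(xmlns: String, date: String, creator: String,
                    ngr: String, region: String,
                    dataType: String, period: String)

// One output record: the key plus all (value, time) pairs collected into a Seq,
// the counterpart of an array column produced by collect_list.
case class GroupedRow(key: GroupKey, values: Seq[(Double, String)])

val flatRows = Seq(
  FlatRow("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 3.509, "05:30:00"),
  FlatRow("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 2.6, "05:45:00"),
  FlatRow("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 1.111, "06:00:00")
)

// Group on everything except the value columns, then collect each group's values.
// Rough Spark analogue (untested): df.groupBy(<key columns>)
//   .agg(collect_list(struct(col("_VALUE"), col("_time"))).as("Value"))
val grouped: Seq[GroupedRow] =
  flatRows
    .groupBy(r => GroupKey(r.xmlns, r.date, r.creator, r.ngr, r.region, r.dataType, r.period))
    .toSeq
    .map { case (key, rows) =>
      // Sort by time explicitly; collect_list does not guarantee any order.
      GroupedRow(key, rows.map(r => (r.value, r.time)).sortBy(_._2))
    }

grouped.foreach(g => println(s"${g.key.ngr} ${g.key.period}: ${g.values}"))
```

Sorting by time is deliberate: Spark documents collect_list as non-deterministic because the order of rows after a shuffle is not fixed, so the sketch does not rely on input order.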