2016-12-03 106 views
// Join the four tables on batchId and nest the columns into Station > SetofValues > Value.
val final_df = sqlContext.sql("""
  select _xmlns, `md:Date`, `md:Creator`, struct(_ngr, _region, SetofValues) as Station
  from (select _xmlns, `md:Date`, `md:Creator`, _ngr, _region,
               struct(_dataType, _period, Value) as SetofValues
        from (select _xmlns, `md:Date`, `md:Creator`, _ngr, _region, _dataType, _period,
                     struct(_VALUE, _time) as Value
              from df_h a
              left outer join df_ds b on a.batchId = b.batchId
              left outer join df_dsv c on b.batchId = c.batchId
              left outer join df_nv d on c.batchId = d.batchId))
""")
final_df.repartition(1).write.format("xml").option("rowTag","NewTag").save(output_path) 

The schema of the DataFrame above, which I save as XML from Spark SQL, is as follows:

root 
|-- _xmlns: string (nullable = true) 
|-- md:Date: string (nullable = true) 
|-- md:Creator: string (nullable = true) 
|-- Station: struct (nullable = false) 
| |-- _ngr: string (nullable = true) 
| |-- _region: string (nullable = true) 
| |-- SetofValues: struct (nullable = false) 
| | |-- _dataType: string (nullable = true) 
| | |-- _period: string (nullable = true) 
| | |-- Value: struct (nullable = false) 
| | | |-- _VALUE: double (nullable = true) 
| | | |-- _time: string (nullable = true) 

When I save the DataFrame with the command above, I get the following XML file.

<ROWS> 
<NewTag xmlns="testing"> 
    <md:Date>2016-10-30</md:Date> 
    <md:Creator>USER_1</md:Creator> 
    <Station ngr="123456" region="North East"> 
     <SetofValues dataType="Total" period="15 min"> 
      <Value 3.509" time="05:30:00"></Value> 
     </SetofValues> 
    </Station> 
</NewTag> 
<NewTag xmlns="testing"> 
    <md:Date>2016-10-30</md:Date> 
    <md:Creator>USER_1</md:Creator> 
    <Station ngr="123456" region="North East"> 
     <SetofValues dataType="Total" period="15 min"> 
      <Value 2.6" time="05:45:00"></Value> 
     </SetofValues> 
    </Station> 
</NewTag> 
<NewTag xmlns="testing"> 
    <md:Date>2016-10-30</md:Date> 
    <md:Creator>USER_1</md:Creator> 
    <Station ngr="123456" region="North East"> 
     <SetofValues dataType="Total" period="15 min"> 
      <Value 1.111" time="06:00:00"></Value> 
     </SetofValues> 
    </Station> 
</NewTag> 
</ROWS> 

How can I achieve the following output, by creating an array out of the rows?

<NewTag xmlns="testing"> 
<md:Date>2016-10-30</md:Date> 
<md:Creator>USER_1</md:Creator> 
<Station ngr="123456" region="North East"> 
    <SetofValues dataType="Total" period="15 min"> 
     <Value time="05:30:00">3.509</Value> 
     <Value time="05:45:00">2.6</Value> 
     <Value time="06:00:00">1.111</Value> 
    </SetofValues> 
</Station> 
</NewTag> 

I am not able to convert the separate rows into an array column so that the XML contains an array of Value elements.

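Your data itself is not in the right format. That is why it is printed like this. Run final_df.show and look at it. Transform the data correctly, group it the way you want, and then save it. –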

@AbhishekAnand Can you help with converting the rows into an array? – Naveen

Answer

Late to the party, but in case anyone is wondering: your schema contains one Value for each Set for each Station for each Root, like...

Root Station Set Value 
Root Station Set Value 
Root Station Set Value 
Root Station Set Value 

If you want the output you describe, you need to reduce by key so that "Value" becomes an array.

So after reducing by the three keys, your DataFrame would look like...

Root Station Set [Value, Value, Value, ...]
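The reduce-by-key step described above could be sketched with the DataFrame API roughly as follows. This is only a sketch under assumptions, not necessarily what the asker ended up using: it assumes Spark 2.0 or later (where `collect_list` accepts struct columns), the `flat` DataFrame and its sample rows are hypothetical stand-ins for the joined data before any nesting, and the column names are taken from the schema in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Local session for the sketch; in spark-shell a `spark` session already exists.
val spark = SparkSession.builder().master("local[*]").appName("group-values-sketch").getOrCreate()
import spark.implicits._

// Hypothetical flat input: one row per Value, as the joins in the question produce.
val flat = Seq(
  ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 3.509, "05:30:00"),
  ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 2.6, "05:45:00"),
  ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 1.111, "06:00:00")
).toDF("_xmlns", "md:Date", "md:Creator", "_ngr", "_region", "_dataType", "_period", "_VALUE", "_time")

// Reduce by the Root, Station and Set keys: collect every (_VALUE, _time) pair into one array.
// Note: collect_list does not guarantee the order of the collected elements.
val grouped = flat
  .groupBy("_xmlns", "md:Date", "md:Creator", "_ngr", "_region", "_dataType", "_period")
  .agg(collect_list(struct(col("_VALUE"), col("_time"))).as("Value"))

// Rebuild the nesting from the question, with Value now an array of structs.
val nested = grouped.select(
  col("_xmlns"), col("md:Date"), col("md:Creator"),
  struct(
    col("_ngr"), col("_region"),
    struct(col("_dataType"), col("_period"), col("Value")).as("SetofValues")
  ).as("Station")
)

nested.printSchema()
// Saving would then use the same call as in the question (requires the spark-xml package):
// nested.repartition(1).write.format("xml").option("rowTag", "NewTag").save(output_path)
```

With `Value` as an array column, spark-xml should write one `<Value>` element per array entry inside a single `<SetofValues>`, which is the shape of the desired output. If the order of the values matters, sort the array explicitly before saving, since the grouping step does not preserve row order.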