2017-10-05

Python: retrieve XML from a URL and write it to CSV

I'm trying to write a Python script that dynamically reads XML data from a URL (e.g. http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72). The XML is formatted as follows:

<station id="KCQT" name="Los Angeles/USC Campus Downtown" elev="179" lat="34.02355" lon="-118.29122" provider="NWS/FAA"> 
<ob time="04 Oct 7:10 pm" utime="1507169400"> 
<variable var="T" description="Temp" unit="F" value="61"/> 
<variable var="TD" description="Dewp" unit="F" value="39"/> 
<variable var="RH" description="Relh" unit="%" value="45"/> 
</ob> 
<ob time="04 Oct 7:05 pm" utime="1507169100"> 
<variable var="T" description="Temp" unit="F" value="61"/> 
<variable var="TD" description="Dewp" unit="F" value="39"/> 
<variable var="RH" description="Relh" unit="%" value="45"/> 
</ob> 
<ob time="04 Oct 7:00 pm" utime="1507168800"> 
<variable var="T" description="Temp" unit="F" value="61"/> 
<variable var="TD" description="Dewp" unit="F" value="39"/> 
<variable var="RH" description="Relh" unit="%" value="45"/> 
</ob> 
<ob time="04 Oct 6:55 pm" utime="1507168500"> 
<variable var="T" description="Temp" unit="F" value="61"/> 
<variable var="TD" description="Dewp" unit="F" value="39"/> 
<variable var="RH" description="Relh" unit="%" value="45"/> 
</ob> 
</station> 

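I only want to retrieve the timestamp and the temperature value ("Temp") for every available observation (there are more than the 4 shown here).

The output should be a CSV text file with one timestamp/temperature pair per row.

Here is my attempt (it's terrible and doesn't work at all):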

import requests 

weatherXML = requests.get("http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72") 

import xml.etree.ElementTree as ET 
import csv 

tree = ET.parse(weatherXML) 
root = tree.getroot() 

# open file for writing 
Time_Temp = open('timestamp_temp.csv', 'w') 

#csv writer object 
csvwriter = csv.writer(Time_Temp) 
time_temp = [] 

count = 0 
for member in root.findall('ob'): 
    if count == 0: 
     temperature = member.find('T').var 
     time_temp.append(temperature) 
     csvwriter.writerow(time_temp) 
     count = count + 1 

    temperature = member.find('T').text 
    time_temp.append(temperature) 

Time_Temp.close() 

Please help.

How come I don't see the "year, month, day, minutes, seconds and zone offset of the time" represented in the XML file?

@BillBell Sorry, I've edited the requirement. The timestamp will now follow the format used in the XML file. Thanks. – WandaW

"Doesn't work"... what error did you get? It should already have blown up just parsing the file. Use 'ET.fromstring(weatherXML.text)' instead.
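
To illustrate the comment above: ET.parse expects a filename or a file object, and a requests response object is neither, so the downloaded text has to go through ET.fromstring (or be wrapped in a file-like object first). A second problem is that member.find('T') searches for a child element whose tag is T, but T is only the value of a var attribute. A minimal sketch, assuming a trimmed copy of the question's sample XML in place of the live URL:

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML from the question (used instead of a live request)
SAMPLE_XML = """<station id="KCQT" name="Los Angeles/USC Campus Downtown">
<ob time="04 Oct 7:10 pm" utime="1507169400">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
</ob>
<ob time="04 Oct 7:05 pm" utime="1507169100">
<variable var="T" description="Temp" unit="F" value="61"/>
<variable var="TD" description="Dewp" unit="F" value="39"/>
</ob>
</station>"""

# fromstring() parses a string and returns the root Element directly
root = ET.fromstring(SAMPLE_XML)

# parse() wants a path or a file object; wrapping the text in StringIO also works
tree = ET.parse(io.StringIO(SAMPLE_XML))
assert tree.getroot().tag == root.tag == "station"

first_ob = root.find("ob")
# find('T') looks for a child element tagged <T>, which does not exist
print(first_ob.find("T"))  # None
# An XPath predicate on the var attribute selects the temperature element
print(first_ob.find("variable[@var='T']").get("value"))  # 61
```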

Answers

Assuming Python 3, this will work. I've noted the difference in case you need Python 2:

import csv
import xml.etree.ElementTree as ET

import requests

weatherXML = requests.get("http://www.wrh.noaa.gov/mesowest/getobextXml.php?sid=KCQT&num=72")
weatherXML.raise_for_status()  # stop on an HTTP error instead of parsing an error page
root = ET.fromstring(weatherXML.text)  # parse the response body, not the Response object

# Use this with Python 2
# with open('timestamp_temp.csv', 'wb') as Time_Temp:

with open('timestamp_temp.csv', 'w', newline='') as Time_Temp:
    csvwriter = csv.writer(Time_Temp)
    csvwriter.writerow(['Time', 'Temp'])
    for member in root.iterfind('ob'):
        date = member.attrib['time']  # e.g. "04 Oct 7:10 pm"
        # the <variable> child whose var attribute is "T" holds the temperature
        temp = member.find("variable[@var='T']").attrib['value']
        csvwriter.writerow([date, temp])

Output:

Time,Temp 
04 Oct 11:47 pm,65 
04 Oct 10:47 pm,66 
04 Oct 9:47 pm,68 
04 Oct 8:47 pm,68 
04 Oct 7:47 pm,68 
04 Oct 6:47 pm,70 
04 Oct 5:47 pm,74 
04 Oct 4:47 pm,75 
...
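
The extraction in the answer above can also be separated from the download and the file handling, which makes it possible to check it against the sample XML without network access. A minimal sketch; the function name extract_time_temp is hypothetical, and skipping ob elements that have no T variable is an assumption about how missing readings should be handled (the answer's version would raise AttributeError in that case):

```python
import csv
import io
import xml.etree.ElementTree as ET


def extract_time_temp(xml_text):
    """Return a list of (time, temp) string pairs, one per <ob> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for ob in root.iterfind("ob"):
        var_t = ob.find("variable[@var='T']")
        if var_t is None:  # assumption: skip observations without a temperature
            continue
        rows.append((ob.get("time"), var_t.get("value")))
    return rows


sample = """<station id="KCQT">
<ob time="04 Oct 7:10 pm" utime="1507169400">
<variable var="T" description="Temp" unit="F" value="61"/>
</ob>
<ob time="04 Oct 7:05 pm" utime="1507169100">
<variable var="TD" description="Dewp" unit="F" value="39"/>
</ob>
</station>"""

rows = extract_time_temp(sample)

# Write to an in-memory buffer; for a real file use open(path, 'w', newline='')
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Time", "Temp"])
writer.writerows(rows)
print(buf.getvalue())
```

With the live feed, passing requests.get(url).text to extract_time_temp should yield the same rows the answer writes, minus any observations that lack a temperature.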

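You can iterate over the ob elements, get each ob element's time attribute, find its variable element whose var is T, and read that element's value attribute (the temperature). Add the two to a list and write it to the CSV file: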

import csv
import xml.etree.ElementTree as ET

# parse a local copy of the XML downloaded from the URL
tree = ET.parse('getobextXml.php.xml')
root = tree.getroot()
# open file for writing (Python 3; on Python 2 use open('timestamp_temp.csv', 'wb'))
with open('timestamp_temp.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(["Time", "Temp"])
    for ob in root.iter('ob'):
        time_temp = []
        timestamp = ob.get('time')  # get the time attribute of the ob element
        # find the variable element whose var is "T", and get its value attribute
        temp = ob.find("./variable[@var='T']").get('value')
        time_temp.append(timestamp)
        time_temp.append(temp)
        csvwriter.writerow(time_temp)

Afterwards, timestamp_temp.csv will contain the result:

Time,Temp 
04 Oct 8:47 pm,68 
04 Oct 7:47 pm,68 
04 Oct 6:47 pm,70 
04 Oct 5:47 pm,74 
04 Oct 4:47 pm,75 
04 Oct 3:47 pm,75 
04 Oct 2:47 pm,77 
04 Oct 1:47 pm,78 
04 Oct 12:47 pm,78 
04 Oct 11:47 am,76 
04 Oct 10:47 am,74 
04 Oct 9:47 am,72 
...
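
On the earlier comment that the time attribute carries no year or zone offset: each ob element also has a utime attribute, which appears to be a Unix timestamp in seconds. For example, 1507169400 corresponds to 2017-10-05 02:10 UTC, which is 7:10 pm on 4 Oct at UTC-7, matching the time attribute of the same element. If a full timestamp were ever needed, a sketch under that assumption:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# One <ob> element copied from the question's sample XML
ob = ET.fromstring(
    '<ob time="04 Oct 7:10 pm" utime="1507169400">'
    '<variable var="T" description="Temp" unit="F" value="61"/>'
    '</ob>'
)

# Assumption: utime is seconds since the Unix epoch (UTC)
stamp = datetime.fromtimestamp(int(ob.get("utime")), tz=timezone.utc)
print(stamp.isoformat())  # 2017-10-05T02:10:00+00:00
```

The resulting string could replace ob.get('time') in either answer's writerow call.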