2017-07-27 77 views
1

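Parsing XML using lxml (a correct way to parse XML with lxml)

I'm fetching a response in Python with the requests module, and the response is XML. I want to parse it and get the details from each 'dt' tag, but I can't get this to work with lxml.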

Here is the XML response:

<?xml version="1.0" encoding="utf-8" ?> 
    <entry_list version="1.0"> 
     <entry id="harsh"> 
      <ew>harsh</ew><subj>MD-2</subj><hw>harsh</hw> 
      <sound><wav>harsh001.wav</wav><wpr>[email protected]</wpr></sound> 
      <pr>ˈhärsh</pr> 
      <fl>adjective</fl> 
      <et>Middle English <it>harsk,</it> of Scandinavian origin; akin to Norwegian <it>harsk</it> harsh</et> 
      <def> 
       <date>14th century</date> 
       <sn>1</sn> 
       <dt>:having a coarse uneven surface that is rough or unpleasant to the touch</dt> 
       <sn>2 a</sn> 
       <dt>:causing a disagreeable or painful sensory reaction :<sx>irritating</sx></dt> 
       <sn>b</sn> 
       <dt>:physically discomforting :<sx>painful</sx></dt> 
       <sn>3</sn> 
       <dt>:unduly exacting :<sx>severe</sx></dt> 
       <sn>4</sn> 
       <dt>:lacking in aesthetic appeal or refinement :<sx>crude</sx></dt> 
       <ss>rough</ss> 
      </def> 
      <uro><ure>harsh*ly</ure> <fl>adverb</fl></uro> 
      <uro><ure>harsh*ness</ure> <fl>noun</fl></uro> 
     </entry> 
    </entry_list> 

Answers

1

A simple approach is to walk down the hierarchy of the XML document with an XPath expression.

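import requests 
from lxml import etree 

# url must hold the address of the service that returns the XML shown in the question 
response = requests.get(url) 
# Parse the raw bytes so lxml can honour the encoding declared in the XML prolog 
root = etree.fromstring(response.content) 
# Select the text nodes that sit directly inside every <dt> element 
print(root.xpath('//entry_list/entry/def/dt/text()')) 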

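This gives you the text of every 'dt' tag in the XML document. Note that text() selects only the text nodes directly inside each 'dt', so the content of nested elements such as 'sx' is left out.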
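
One option for collecting the complete text of each 'dt', including nested elements such as 'sx', is itertext(). The sketch below uses the standard library's xml.etree.ElementTree so that it runs without extra packages; lxml.etree provides the same fromstring, iter and itertext calls, so the import can be swapped. SAMPLE_XML is a trimmed copy of the response from the question, and dt_texts is a hypothetical helper name, not part of either library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question, inlined so the sketch runs offline.
SAMPLE_XML = """<entry_list version="1.0">
  <entry id="harsh">
    <def>
      <sn>1</sn>
      <dt>:having a coarse uneven surface that is rough or unpleasant to the touch</dt>
      <sn>2 a</sn>
      <dt>:causing a disagreeable or painful sensory reaction :<sx>irritating</sx></dt>
    </def>
  </entry>
</entry_list>"""


def dt_texts(xml_text):
    """Return the full text of each <dt>, including text inside nested elements."""
    root = ET.fromstring(xml_text)
    # itertext() yields the element's own text plus the text of all descendants, in order.
    return ["".join(dt.itertext()).strip() for dt in root.iter("dt")]


definitions = dt_texts(SAMPLE_XML)
print(definitions)
```

With a live response, passing response.content to the helper instead of SAMPLE_XML should work the same way, assuming the service returns well-formed XML.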

0
from xml.dom import minidom 

# List with dt values 
dt_elems = [] 

# Process xml getting elements by tag name 
xmldoc = minidom.parse('text.xml') 
itemlist = xmldoc.getElementsByTagName('dt') 

# Get the values 
for i in itemlist: 
    dt_elems.append(" ".join(t.nodeValue for t in i.childNodes if t.nodeType==t.TEXT_NODE)) 

# Print the list result 
print(dt_elems)
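
Since the question starts from an HTTP response rather than a file on disk, a variant of the minidom approach can parse the data in memory with minidom.parseString. This is only a sketch: xml_bytes is a hypothetical stand-in for the response body (for example response.content), and node_text is a made-up helper that also descends into nested elements such as 'sx'.

```python
from xml.dom import minidom

# Hypothetical stand-in for the body of the HTTP response (e.g. response.content).
xml_bytes = b"""<?xml version="1.0" encoding="utf-8" ?>
<entry_list version="1.0">
  <entry id="harsh">
    <def>
      <dt>:physically discomforting :<sx>painful</sx></dt>
      <dt>:unduly exacting :<sx>severe</sx></dt>
    </def>
  </entry>
</entry_list>"""


def node_text(node):
    """Recursively join the text of a node and all of its descendants."""
    if node.nodeType == node.TEXT_NODE:
        return node.nodeValue
    return "".join(node_text(child) for child in node.childNodes)


# parseString works on in-memory data, so no temporary file is needed.
xmldoc = minidom.parseString(xml_bytes)
dt_values = [node_text(dt).strip() for dt in xmldoc.getElementsByTagName("dt")]
print(dt_values)
```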