2016-06-08 75 views

Parse an XML file with lxml based on a text string

I have an XML file and want to retrieve the text of elements that contain a given string.

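In the example below, I want to find all subject elements that contain the string home (two elements). Once I have those elements, I can retrieve their text values.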

<?xml version="1.0" ?> 
<zAppointments reminder="15"> 
    <appointment> 
     <subject>Bring pizza home</subject> 
     <shape>circle</shape> 
    </appointment> 
    <appointment> 
     <subject>Bring hamburger home</subject> 
     <shape>box</shape> 
    </appointment> 
    <appointment> 
     <subject>Bring banana homes</subject> 
    </appointment> 
    <appointment> 
     <subject>Check MS Office website for updates</subject> 
    </appointment> 
</zAppointments> 

Answer


Use the contains() XPath function:

//subject[contains(., 'home')]/text() 

Demo:

>>> import lxml.etree as ET 
>>> 
>>> data = """<?xml version="1.0" ?> 
... <zAppointments reminder="15"> 
...  <appointment> 
...   <subject>Bring pizza home</subject> 
...  </appointment> 
...  <appointment> 
...   <subject>Bring hamburger home</subject> 
...  </appointment> 
...  <appointment> 
...   <subject>Check MS Office website for updates</subject> 
...  </appointment> 
... </zAppointments>""" 
>>> root = ET.fromstring(data) 
>>> root.xpath("//subject[contains(., 'home')]/text()") 
['Bring pizza home', 'Bring hamburger home'] 
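
One caveat: contains() does plain substring matching, so a subject such as "Bring banana homes" from the question's sample file would match 'home' as well. Below is a minimal sketch of one possible way to match only the whole word in XPath 1.0, by padding the normalized text with spaces. The compact sample data and the variable names are illustrative assumptions, and the trick does not handle punctuation next to the word (for example "home.").

```python
import lxml.etree as ET

# Compact version of the question's sample data, including the "homes" entry.
data = """<zAppointments reminder="15">
    <appointment><subject>Bring pizza home</subject></appointment>
    <appointment><subject>Bring hamburger home</subject></appointment>
    <appointment><subject>Bring banana homes</subject></appointment>
    <appointment><subject>Check MS Office website for updates</subject></appointment>
</zAppointments>"""
root = ET.fromstring(data)

# Plain substring match: "homes" is matched too.
substring_hits = root.xpath("//subject[contains(., 'home')]/text()")

# Whole-word match: wrap the whitespace-normalized text in spaces and
# look for ' home ' so that "homes" no longer qualifies.
whole_word = (
    "//subject[contains(concat(' ', normalize-space(.), ' '), ' home ')]/text()"
)
word_hits = root.xpath(whole_word)

print(substring_hits)  # three subjects, including 'Bring banana homes'
print(word_hits)       # ['Bring pizza home', 'Bring hamburger home']
```

If substring matching is what you actually want, the original expression is sufficient.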

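Thanks for the answer. Is it possible to return the element itself instead of its text? I want to set the value of shape in each appointment element where the string home is found. – Eagle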


@Eagle yes, you can get the elements themselves with the //subject[contains(., 'home')] expression. Then read the text from each element's .text attribute. – alecxe
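
A minimal sketch of what this comment describes, extended to the follow-up goal of setting shape on the matching appointments. The value "triangle", the shortened sample data, and the choice to create a missing shape element are assumptions made for illustration, not something specified in the thread.

```python
import lxml.etree as ET

data = """<zAppointments reminder="15">
    <appointment>
        <subject>Bring pizza home</subject>
        <shape>circle</shape>
    </appointment>
    <appointment>
        <subject>Bring hamburger home</subject>
    </appointment>
    <appointment>
        <subject>Check MS Office website for updates</subject>
    </appointment>
</zAppointments>"""
root = ET.fromstring(data)

# Without the trailing /text(), the query returns the <subject> elements.
subjects = root.xpath("//subject[contains(., 'home')]")
texts = [subject.text for subject in subjects]

for subject in subjects:
    appointment = subject.getparent()   # enclosing <appointment>
    shape = appointment.find("shape")
    if shape is None:
        # Assumption: create <shape> when the appointment has none.
        shape = ET.SubElement(appointment, "shape")
    shape.text = "triangle"             # hypothetical new value

# Shape text per appointment, None where no <shape> exists.
shapes = [a.findtext("shape") for a in root.findall("appointment")]
print(texts)   # ['Bring pizza home', 'Bring hamburger home']
print(shapes)  # ['triangle', 'triangle', None]
```

Working with elements rather than text strings keeps access to getparent(), which is what makes it possible to reach the sibling shape element.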