2011-10-07 78 views
3

Tips for finding prefixed tags in Python lxml?

I want to use lxml's ElementTree implementation (etree) to find a specific tag in my XML document. The tags look like this:

<text:ageInformation> 
    <text:statedAge>12</text:statedAge> 
</text:ageInformation> 

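I was hoping to use etree.find("text:statedAge"), but that method does not like the "text" prefix. It says I should add "text" to the prefix map, but I am not sure how to do that. Any tips?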

Edit: I would like to be able to write to the hr4e-prefixed tags. Here is the important part of the file:

<?xml version="1.0" encoding="utf-8"?> 
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd"> 
    <header> 
    <documentID root="18c41e51-5f4d-4d15-993e-2a932fed720a" /> 
    <title>Health Records for Everyone Continuity of Care Document</title> 
    <version> 
    <number>1</number> 
</version> 
<confidentiality codeSystem="2.16.840.1.113883.5.25" code="N" /> 
<documentTimestamp value="201105300211+0800" /> 
<personalInformation> 
    <patientInformation> 
    <personID root="2.16.840.1.113883.3.881.PI13023911" /> 
    <personAddress> 
     <streetAddressLine nullFlavor="NI" /> 
     <city>Santa Cruz</city> 
     <state nullFlavor="NI" /> 
     <postalCode nullFlavor="NI" /> 
    </personAddress> 
    <personPhone nullFlavor="NI" /> 
    <personInformation> 
     <personName> 
     <given>Benjamin</given> 
     <family>Keidan</family> 
     </personName> 
     <gender codeSystem="2.16.840.1.113883.5.1" code="M" /> 
     <personDateOfBirth value="NI" /> 
     <hr4e:ageInformation> 
     <hr4e:statedAge>9424</hr4e:statedAge> 
     <hr4e:estimatedAge>0912</hr4e:estimatedAge> 
     <hr4e:yearInSchool>1</hr4e:yearInSchool> 
     <hr4e:statusInSchool>attending</hr4e:statusInSchool> 
     </hr4e:ageInformation> 
    </personInformation> 
    <hr4e:livingSituation> 
     <hr4e:homeVillage>Putney</hr4e:homeVillage> 
     <hr4e:tribe>Oromo</hr4e:tribe> 
    </hr4e:livingSituation> 
    </patientInformation> 
</personalInformation> 

Answers

7

A namespace prefix must be declared (mapped to a URI). Once it is, you can find text:statedAge and other elements using the {URI}localname notation. Like this:

from lxml import etree 

XML = """ 
<root xmlns:text="http://example.com"> 
<text:ageInformation> 
    <text:statedAge>12</text:statedAge> 
</text:ageInformation> 
</root>""" 

root = etree.fromstring(XML) 

ageinfo = root.find("{http://example.com}ageInformation") 
age = ageinfo.find("{http://example.com}statedAge") 
print(age.text)

This prints "12".

Another way to do it is to pass a prefix-to-URI map:

ageinfo = root.find("text:ageInformation", 
        namespaces={"text": "http://example.com"}) 
age = ageinfo.find("text:statedAge", 
        namespaces={"text": "http://example.com"}) 
print(age.text)

You can also use XPath:

age = root.xpath("//text:statedAge", 
       namespaces={"text": "http://example.com"})[0] 
print(age.text)
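
One point worth spelling out about the prefix map: the lookup is keyed on the namespace URI, not on the prefix text. The following is a minimal sketch under a few assumptions: the `NS` dictionary and the `t` prefix are names made up for illustration, and the import falls back to the standard library's ElementTree, which accepts the same `find` call. It shows a query using a prefix that does not appear in the document but maps to the same URI.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for portability: the stdlib ElementTree supports the calls used here.
    import xml.etree.ElementTree as etree

XML = """
<root xmlns:text="http://example.com">
<text:ageInformation>
    <text:statedAge>12</text:statedAge>
</text:ageInformation>
</root>"""

# Hypothetical prefix map: "t" is not used in the document, but its URI matches.
NS = {"t": "http://example.com"}

root = etree.fromstring(XML)

# Each "t:" step is expanded to {http://example.com}localname before matching.
age = root.find("t:ageInformation/t:statedAge", namespaces=NS)
print(age.text)

# An unqualified name only matches elements in no namespace, so nothing is found.
missing = root.find("ageInformation/statedAge")
print(missing)
```

The second lookup returning `None` is the usual symptom of a missing or wrong namespace in the path.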
+0

I keep getting NoneTypes. .. is my root document. I tried ageInfo = root.find("{hr4e::patientdata}ageInformation") – super

+0

@super: It would help if you provided a complete sample XML document (update the question). – mzjn

+0

kk. I included it. – super

1

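I ended up having to use nested prefixes, because personInformation is in the document's default namespace while ageInformation is in the hr4e namespace: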

from lxml import etree 

XML = """ 
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd"> 
<personInformation> 
<hr4e:ageInformation> 
    <hr4e:statedAge>12</hr4e:statedAge> 
</hr4e:ageInformation> 
</personInformation> 
</greenCCD>""" 

root = etree.fromstring(XML) 
#root = etree.parse("hr4e_patient.xml") 

ageinfo = root.find("{AlschulerAssociates::GreenCDA}personInformation/{hr4e::patientdata}ageInformation") 
age = ageinfo.find("{hr4e::patientdata}statedAge") 
print(age.text)
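
The same lookup can also be written with a prefix map instead of spelled-out URIs, which covers the follow-up in the question about writing to the hr4e tags. This is only a sketch: the `ccd` prefix is a made-up label for the document's default namespace, the new age value is arbitrary, and the import falls back to the standard library's ElementTree, which accepts the same calls.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for portability: the stdlib ElementTree supports the calls used here.
    import xml.etree.ElementTree as etree

XML = """
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:hr4e="hr4e::patientdata">
<personInformation>
<hr4e:ageInformation>
    <hr4e:statedAge>12</hr4e:statedAge>
</hr4e:ageInformation>
</personInformation>
</greenCCD>"""

NS = {
    "ccd": "AlschulerAssociates::GreenCDA",  # hypothetical prefix for the default namespace
    "hr4e": "hr4e::patientdata",
}

root = etree.fromstring(XML)

# Unprefixed elements in the document still live in the default namespace,
# so the path needs a prefix ("ccd" here) for them as well.
age = root.find("ccd:personInformation/hr4e:ageInformation/hr4e:statedAge",
                namespaces=NS)

# Writing is just an assignment to .text on the element that was found.
age.text = "13"  # arbitrary illustrative value

print(etree.tostring(root).decode("utf-8"))
```

With the standard-library fallback, the serialized output may use generated prefixes such as ns0 in place of hr4e; the namespace URIs stay the same.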
+0

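Great that it works for you (though I think I gave a good answer to the original question, considering that important information about the actual namespaces was left out). – mzjn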

+0

I would not have found my solution without your help. Thank you very much, kind sir. – super