Tips for finding prefixed tags in Python lxml?

I want to use lxml's etree to find specific tags in my XML document. The tags look like this:

<text:ageInformation> 
    <text:statedAge>12</text:statedAge> 
</text:ageInformation>

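I was hoping to use etree.find("text:statedAge"), but that method doesn't like the "text" prefix. The error says I should add "text" to the prefix map, but I'm not sure how to do that. Any tips?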

Edit: I want to be able to write to the hr4e-prefixed tags. Here is the important part of the XML document:

<?xml version="1.0" encoding="utf-8"?> 
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd"> 
    <header> 
    <documentID root="18c41e51-5f4d-4d15-993e-2a932fed720a" /> 
    <title>Health Records for Everyone Continuity of Care Document</title> 
    <version> 
    <number>1</number> 
</version> 
<confidentiality codeSystem="2.16.840.1.113883.5.25" code="N" /> 
<documentTimestamp value="201105300211+0800" /> 
<personalInformation> 
    <patientInformation> 
    <personID root="2.16.840.1.113883.3.881.PI13023911" /> 
    <personAddress> 
     <streetAddressLine nullFlavor="NI" /> 
     <city>Santa Cruz</city> 
     <state nullFlavor="NI" /> 
     <postalCode nullFlavor="NI" /> 
    </personAddress> 
    <personPhone nullFlavor="NI" /> 
    <personInformation> 
     <personName> 
     <given>Benjamin</given> 
     <family>Keidan</family> 
     </personName> 
     <gender codeSystem="2.16.840.1.113883.5.1" code="M" /> 
     <personDateOfBirth value="NI" /> 
     <hr4e:ageInformation> 
     <hr4e:statedAge>9424</hr4e:statedAge> 
     <hr4e:estimatedAge>0912</hr4e:estimatedAge> 
     <hr4e:yearInSchool>1</hr4e:yearInSchool> 
     <hr4e:statusInSchool>attending</hr4e:statusInSchool> 
     </hr4e:ageInformation> 
    </personInformation> 
    <hr4e:livingSituation> 
     <hr4e:homeVillage>Putney</hr4e:homeVillage> 
     <hr4e:tribe>Oromo</hr4e:tribe> 
    </hr4e:livingSituation> 
    </patientInformation> 
</personalInformation>

2011-10-07 super

The namespace prefix must be declared (mapped to a URI). You can then use the {URI}localname notation to find text:statedAge and other elements, like this:

from lxml import etree 

XML = """ 
<root xmlns:text="http://example.com"> 
<text:ageInformation> 
    <text:statedAge>12</text:statedAge> 
</text:ageInformation> 
</root>""" 

root = etree.fromstring(XML) 

ageinfo = root.find("{http://example.com}ageInformation") 
age = ageinfo.find("{http://example.com}statedAge") 
print(age.text)

This prints "12".

Another way to do it is to pass a prefix-to-URI mapping:

ageinfo = root.find("text:ageInformation", 
        namespaces={"text": "http://example.com"}) 
age = ageinfo.find("text:statedAge", 
        namespaces={"text": "http://example.com"}) 
print(age.text)

You can also use XPath:

age = root.xpath("//text:statedAge", 
       namespaces={"text": "http://example.com"})[0] 
print(age.text)
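
The find() calls above only match direct children of the element they are called on, while the XPath "//" form searches the whole tree. The following is a minimal sketch of the same distinction with find(), assuming the sample document from this answer. The name NS and the prefix "t" are made up for illustration: the prefix in the mapping only has to agree with the path string, not with the prefix used in the document. The fallback import is an assumption for environments without lxml; the standard library's ElementTree accepts the same find() arguments.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library if lxml is not installed.
    import xml.etree.ElementTree as etree

XML = """
<root xmlns:text="http://example.com">
<text:ageInformation>
    <text:statedAge>12</text:statedAge>
</text:ageInformation>
</root>"""

# Hypothetical prefix map, defined once and reused for every lookup.
NS = {"t": "http://example.com"}

root = etree.fromstring(XML)

# A bare "t:statedAge" only inspects the direct children of root,
# so it returns None because statedAge is a grandchild.
direct = root.find("t:statedAge", namespaces=NS)

# ".//" searches all descendants of root.
nested = root.find(".//t:statedAge", namespaces=NS)

print(direct)
print(nested.text)
```

A None result from find() therefore does not necessarily mean the namespace is wrong; the element may simply sit deeper in the tree than the path reaches.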

2011-10-08 09:06:28 mzjn

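I keep getting NoneTypes. .. is my root document. I tried ageInfo = root.find("{hr4e::patientdata}ageInformation") – super

@super: It would help if you provided a complete sample XML document (update the question). – mzjn

kk. I included it. – super

I ended up having to spell out the namespace of each nested element in the path: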

from lxml import etree 

XML = """ 
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd"> 
<personInformation> 
<hr4e:ageInformation> 
    <hr4e:statedAge>12</hr4e:statedAge> 
</hr4e:ageInformation> 
</personInformation> 
</greenCCD>""" 

root = etree.fromstring(XML) 
#root = etree.parse("hr4e_patient.xml") 

ageinfo = root.find("{AlschulerAssociates::GreenCDA}personInformation/{hr4e::patientdata}ageInformation") 
age = ageinfo.find("{hr4e::patientdata}statedAge") 
print(age.text)
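
The question also asked about writing to the hr4e-prefixed tags. Here is a rough sketch of one way to do that, assuming a trimmed version of the document above. The mapping name NS and the prefix "ccd" are invented for illustration, and the fallback import is an assumption for environments without lxml. It also shows why the earlier attempt returned None: ageInformation is not a direct child of the root, and its parent personInformation lives in the default namespace.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library if lxml is not installed.
    import xml.etree.ElementTree as etree

XML = """
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:hr4e="hr4e::patientdata">
<personInformation>
<hr4e:ageInformation>
    <hr4e:statedAge>12</hr4e:statedAge>
</hr4e:ageInformation>
</personInformation>
</greenCCD>"""

# Hypothetical prefix map; "ccd" stands in for the document's default namespace.
NS = {"ccd": "AlschulerAssociates::GreenCDA", "hr4e": "hr4e::patientdata"}

root = etree.fromstring(XML)

# Returns None: ageInformation is a grandchild of root, not a child.
missing = root.find("{hr4e::patientdata}ageInformation")

# Unprefixed elements still need their (default) namespace in the path.
ageinfo = root.find("ccd:personInformation/hr4e:ageInformation", namespaces=NS)

# Alternatively, search all descendants and skip the intermediate elements.
age = root.find(".//hr4e:statedAge", namespaces=NS)

# Writing is just an assignment to .text on the element that was found.
age.text = "13"

print(missing)
print(etree.tostring(root).decode())
```

To save the change to a file rather than print it, the tree can be serialized with etree.tostring() and written out, or written via ElementTree.write() when the document was loaded with etree.parse().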

2011-10-11 22:53:26 super

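Great that it works for you (I think I gave a good answer to the original question, considering that important information about the actual namespaces was omitted). – mzjn

I wouldn't have found my solution without your help. Thank you very much, kind sir. – super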
