Pretty printing a subnode without namespace declarations

I have an XML document, and I want to extract a subnode (boundedBy) and pretty_print it exactly as it looks in the original document (apart from the pretty formatting).

<?xml version="1.0" encoding="UTF-8" ?> 
<wfs:FeatureCollection 
    xmlns:sei="https://somedomain.com/namespace" 
    xmlns:wfs="http://www.opengis.net/wfs" 
    xmlns:gml="http://www.opengis.net/gml" 
    xmlns:ogc="http://www.opengis.net/ogc" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://www.opengis.net/wfs http://schemas.opengis.net/wfs/1.1.0/wfs.xsd 
         https://somedomain.com/schemas/wfsnamespace some.xsd"> 
     <gml:boundedBy> 
     <gml:Box srsName="EPSG:4326"> 
      <gml:coordinates>-10.934396,-139.997120 77.396455,-53.627763</gml:coordinates> 
     </gml:Box> 
     </gml:boundedBy> 
    <gml:featureMember> 
     <sei:HUB_HEIGHT_FCST> 
     <!-- This is the section I want -->
     <gml:boundedBy> 
      <gml:Box srsName="EPSG:4326"> 
       <gml:coordinates>14.574435,-139.997120 14.574435,-139.997120</gml:coordinates> 
      </gml:Box> 
     </gml:boundedBy> 
     <!-- This is the section I want -->
     <sei:geometry_4326> 
     <gml:Point srsName="EPSG:4326"> 
      <gml:coordinates>14.574435,-139.997120</gml:coordinates> 
     </gml:Point> 
     </sei:geometry_4326> 
     <sei:rundatetime>2017-09-26 00:00:00</sei:rundatetime> 
     <sei:validdatetime>2017-09-26 17:00:00</sei:validdatetime> 
     </sei:HUB_HEIGHT_FCST> 
    </gml:featureMember> 
</wfs:FeatureCollection>

Here is how I am extracting the subnode:

from lxml import etree

# parse the XML string
parser = etree.XMLParser(remove_blank_text=True, remove_comments=True, recover=False, strip_cdata=False)
root = etree.fromstring(xmlstr, parser=parser)
# find the subnode I want
subnodes = root.xpath("./gml:boundedBy", namespaces={'gml': 'http://www.opengis.net/gml'})
subnode = subnodes[0]
# make a pretty output (tostring returns bytes when given an encoding name, so decode for printing)
xmlstr = etree.tostring(subnode, xml_declaration=False, encoding="UTF-8", pretty_print=True).decode("UTF-8")
print(xmlstr)
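Note that `./gml:boundedBy` only matches direct children of the root element, which would explain why the output shown further down is the collection-level box rather than the one inside the feature. A minimal sketch of selecting the nested node instead, assuming the element layout from the sample document (`xml_doc` is a trimmed, hypothetical stand-in for it):

```python
from lxml import etree

# Trimmed, hypothetical stand-in for the document in the question.
xml_doc = b"""<wfs:FeatureCollection
    xmlns:sei="https://somedomain.com/namespace"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:boundedBy><gml:Box srsName="EPSG:4326"><gml:coordinates>-10.934396,-139.997120 77.396455,-53.627763</gml:coordinates></gml:Box></gml:boundedBy>
  <gml:featureMember>
    <sei:HUB_HEIGHT_FCST>
      <gml:boundedBy><gml:Box srsName="EPSG:4326"><gml:coordinates>14.574435,-139.997120 14.574435,-139.997120</gml:coordinates></gml:Box></gml:boundedBy>
    </sei:HUB_HEIGHT_FCST>
  </gml:featureMember>
</wfs:FeatureCollection>"""

NS = {
    "gml": "http://www.opengis.net/gml",
    "sei": "https://somedomain.com/namespace",
}

root = etree.fromstring(xml_doc)

# "./gml:boundedBy" looks only at direct children of the root.
top_level = root.xpath("./gml:boundedBy", namespaces=NS)

# Spelling out the path reaches the boundedBy inside the feature member.
nested = root.xpath(
    "./gml:featureMember/sei:HUB_HEIGHT_FCST/gml:boundedBy", namespaces=NS
)

top_coords = top_level[0].findtext(".//gml:coordinates", namespaces=NS)
nested_coords = nested[0].findtext(".//gml:coordinates", namespaces=NS)
print(top_coords)
print(nested_coords)
```

This changes only which node is selected; serializing it with `etree.tostring` still emits the in-scope namespace declarations.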

This gives me the following. Unfortunately, lxml adds the namespace declarations to the boundedBy node (which makes sense for the completeness of the XML).

<gml:boundedBy xmlns:gml="http://www.opengis.net/gml" xmlns:sei="https://somedomain.com/namespace" xmlns:wfs="http://www.opengis.net/wfs" xmlns:ogc="http://www.opengis.net/ogc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
    <gml:Box srsName="EPSG:4326"> 
    <gml:coordinates>-10.934396,-139.997120 77.396455,-53.627763</gml:coordinates> 
    </gml:Box> 
</gml:boundedBy>

I only want the subnode as it looks in the original document.

<gml:boundedBy> 
    <gml:Box srsName="EPSG:4326"> 
     <gml:coordinates>14.574435,-139.997120 14.574435,-139.997120</gml:coordinates> 
    </gml:Box> 
</gml:boundedBy>

I'm flexible about not using lxml, but either way, I haven't found an option for doing this.

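Edit: Since it was pointed out, I should explain why I want to do this...

I'm trying to log snippets of the XML without altering their original structure. The automated tests I'm building check certain nodes for correctness. In the process, I'm logging the snippet and want to make it easier to read. Some snippets can get rather large, which is why pretty_print is so nice.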

Source

2017-09-26 Marcel Wilson

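You're asking a library to help you create "XML" that is not *[**namespace-well-formed**](https://stackoverflow.com/a/25830482/290085)*. It won't help you do that, and you shouldn't be trying to do that. – kjhughes

...but if you really just want *unused* namespace declarations not to be included, then your request would be more reasonable. They're not wrong to be there, just unnecessary and arguably ugly. – kjhughes

I'm well aware that lxml isn't wrong to add them. That's not what I asked. I want to print a fragment of the original document. The point of this is not valid XML but printing portions of the XML. –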
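For the milder variant raised in the comments (dropping only the *unused* declarations), lxml offers `etree.cleanup_namespaces`. A minimal sketch, assuming it is acceptable to work on a deep copy of the subnode; `xml_doc` is a trimmed, hypothetical stand-in for the original document. The declaration for `gml` stays because the fragment actually uses that prefix, so the result is still namespace-well-formed:

```python
import copy

from lxml import etree

# Trimmed, hypothetical stand-in for the document in the question.
xml_doc = b"""<wfs:FeatureCollection
    xmlns:sei="https://somedomain.com/namespace"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <sei:HUB_HEIGHT_FCST>
      <gml:boundedBy><gml:Box srsName="EPSG:4326"><gml:coordinates>14.574435,-139.997120 14.574435,-139.997120</gml:coordinates></gml:Box></gml:boundedBy>
    </sei:HUB_HEIGHT_FCST>
  </gml:featureMember>
</wfs:FeatureCollection>"""

NS = {"gml": "http://www.opengis.net/gml"}

parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(xml_doc, parser=parser)
subnode = root.xpath(".//gml:featureMember//gml:boundedBy", namespaces=NS)[0]

# Copy first so the parsed document itself is left untouched.
fragment = copy.deepcopy(subnode)

# Drop declarations that no element or attribute in the fragment uses.
etree.cleanup_namespaces(fragment)

cleaned = etree.tostring(fragment, encoding="unicode", pretty_print=True)
print(cleaned)
```

This does not produce the declaration-free output asked for in the question, only a less noisy start tag.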

You can use Python's regular expression module (re). It has a function for substitution, so you can replace the namespace declarations with an empty string.

import re

# remove every prefixed namespace declaration (xmlns:prefix="...") from the serialized string
print(re.sub(r' xmlns:\w+="[^"]+"', '', xmlstr))
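A slightly more defensive take on the same idea, offered as a sketch rather than a definitive implementation: it touches only the first start tag, and it also covers a default `xmlns="..."` declaration and single-quoted values. The helper name `strip_root_xmlns` and the sample string are hypothetical. It assumes no literal `>` appears inside an attribute value of that start tag, which should hold for serializer output where `>` is escaped. The result is meant for logging only, since it is no longer namespace-well-formed.

```python
import re

# Matches xmlns="..." and xmlns:prefix="..." with either quote style.
XMLNS_DECL = re.compile(r"""\s+xmlns(?::[\w.-]+)?=(?:"[^"]*"|'[^']*')""")


def strip_root_xmlns(fragment):
    """Remove namespace declarations from the first start tag only."""
    start_tag_end = fragment.find(">")
    if start_tag_end == -1:
        # No start tag found; return the input unchanged.
        return fragment
    head, rest = fragment[:start_tag_end], fragment[start_tag_end:]
    return XMLNS_DECL.sub("", head) + rest


# Hypothetical serialized fragment, shaped like the lxml output in the question.
serialized = (
    '<gml:boundedBy xmlns:gml="http://www.opengis.net/gml" '
    'xmlns:wfs="http://www.opengis.net/wfs">\n'
    '  <gml:Box srsName="EPSG:4326">\n'
    '    <gml:coordinates>14.574435,-139.997120 14.574435,-139.997120</gml:coordinates>\n'
    '  </gml:Box>\n'
    '</gml:boundedBy>\n'
)

stripped = strip_root_xmlns(serialized)
print(stripped)
```

Limiting the substitution to the start tag avoids altering element text that happens to contain something resembling a declaration.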

Source

2017-09-26 22:46:56

I thought about doing that. It feels a bit dirty. –
