Extract each XML node into a separate txt file

I have an XML file like this:

<root> 
    <article> 
     <article_taxonomy></article_taxonomy> 
     <article_place>Somewhere</article_place> 
     <article_number>1</article_number> 
     <article_date>2001</article_date> 
     <article_body>Blah blah balh</article_body> 
    </article> 

    <article> 
     <article_taxonomy></article_taxonomy> 
     <article_place>Somewhere</article_place> 
     <article_number>2</article_number> 
     <article_date>2001</article_date> 
     <article_body>Blah blah balh</article_body> 
    </article> 

    ... 
    ... 
    more nodes 

</root>

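What I am trying to do is extract each node (from the <article> tag to the </article> tag) and write it to a separate TXT or XML file. I want to keep the tags as well.

Is it possible to do this without regular expressions? Any suggestions?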

Source

2014-09-03 anarchos78

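I suggest using an XML module rather than regular expressions; it does the job the right way. By the way, your XML does not appear to have a root node, which it needs in order to be valid. – 2014-09-03 14:09:50

Here is one way to do this using ElementTree: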

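import xml.etree.ElementTree as ElementTree

def main():
    # parse() accepts a filename directly and returns the whole tree
    et = ElementTree.parse('data.xml')
    # findall('article') matches the <article> children of the root element
    for index, article in enumerate(et.findall('article'), start=1):
        # tostring() returns bytes and keeps the <article> tags
        xml_string = ElementTree.tostring(article)
        # Name the files sequentially: article_1.xml, article_2.xml, ...
        with open('article_{}.xml'.format(index), 'wb') as out:
            out.write(xml_string)

if __name__ == '__main__':
    main()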

Source

2014-09-03 14:20:22

Try this:

from xml.dom import minidom 
xmlfile = minidom.parse('yourfile.xml') 
#for example for 'article_body' 
article_body = xmlfile.getElementsByTagName('article_body')
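The minidom snippet above only collects the article_body nodes. To write each whole article to its own file with the tags kept, one possible approach is to serialize every article node with toxml(). This is a minimal sketch, not a definitive implementation: the function name split_articles_minidom, the out_dir parameter, and the article_N.xml file naming are assumptions made for illustration.

```python
import os
from xml.dom import minidom


def split_articles_minidom(xml_text, out_dir):
    """Write each <article> element of xml_text to its own file in out_dir.

    The function name and the article_N.xml naming are illustrative only.
    """
    doc = minidom.parseString(xml_text)
    paths = []
    # getElementsByTagName searches the whole tree, not only direct children
    for index, node in enumerate(doc.getElementsByTagName('article'), start=1):
        path = os.path.join(out_dir, 'article_{}.xml'.format(index))
        with open(path, 'w', encoding='utf-8') as out:
            # toxml() serializes the node together with its own tags
            out.write(node.toxml())
        paths.append(path)
    return paths
```

Reading yourfile.xml into a string and passing it with a target directory should produce one file per article, in document order.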

Or

import xml.etree.ElementTree as ET 
xmlfile = ET.parse('yourfile.xml') 
root_tag = xmlfile.getroot() 
for each_article in root_tag.findall('article'): 
    article_taxonomy = each_article.find('article_taxonomy').text 
    article_place = each_article.find('article_place').text 
    # etc etc
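Building on the ElementTree loop above, the text of article_number could be used to name the output files instead of a running counter. The following is a sketch under stated assumptions: the function name split_articles_by_number, the out_dir parameter, and the article_N.txt naming are hypothetical, and it assumes the article numbers are unique and safe to use in file names.

```python
import os
import xml.etree.ElementTree as ET


def split_articles_by_number(xml_text, out_dir):
    """Write each <article> to out_dir, named after its <article_number>.

    The function name and the article_N.txt naming are illustrative only.
    """
    root = ET.fromstring(xml_text)
    paths = []
    for position, article in enumerate(root.findall('article'), start=1):
        # findtext() returns None if the child is missing and '' if it is
        # empty; in both cases fall back to the article's position
        number = (article.findtext('article_number') or '').strip() or str(position)
        path = os.path.join(out_dir, 'article_{}.txt'.format(number))
        with open(path, 'w', encoding='utf-8') as out:
            # encoding='unicode' makes tostring() return str instead of bytes;
            # strip() drops the whitespace that follows the closing tag
            out.write(ET.tostring(article, encoding='unicode').strip())
        paths.append(path)
    return paths
```

If two articles share the same number, the later one would overwrite the earlier file, so a sequential counter is the safer choice when uniqueness is not guaranteed.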

Source

2014-09-03 14:17:33 doniyor
