Parsing an XML file when tags are missing

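I am trying to parse an XML file. The text inside the tags is parsed successfully (or so it seems), but I also want to output the text that is not contained in any tag, and the program below simply ignores it.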

from xml.etree.ElementTree import XMLParser  # current name for what older versions exposed as XMLTreeBuilder

class HtmlLatex:      # The target object of the parser
    def __init__(self):
        self.out = ''
        self.var = ''

    def start(self, tag, attrib):  # Called for each opening tag.
        pass

    def end(self, tag):  # Called for each closing tag.
        if tag == 'i':
            self.out += self.var
        elif tag == 'sub':
            self.out += '_{' + self.var + '}'
        elif tag == 'sup':
            self.out += '^{' + self.var + '}'
        else:
            self.out += self.var

    def data(self, data):  # Called for each run of character data.
        self.var = data

    def close(self):  # Called once the whole input has been parsed.
        print(self.out)


if __name__ == '__main__':
    target = HtmlLatex()
    parser = XMLParser(target=target)

    with open('input.txt') as f1:
        text = f1.read()

    print(text)

    parser.feed(text)
    parser.close()

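Part of the input I want to parse:

<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+(2<i>l</i><sub>2</sub>+<i>l</i><sub>1</sub>) <i>m</i><sup>2</sup>+(<i>l</i><sub>2</sub><sup>2</sup>+2<i>l</i><sub>1</sub> <i>l</i><sub>2</sub>+<i>l</i><sub>1</sub><sup>2</sup>) <i>m</i>) /(<i>m</i><sup>3</sup>+(3<i>l</i><sub>2</sub>+2<i>l</i><sub>1</sub>)) }.</p>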


2010-01-02 Dimitris Leventeas

This doesn't look like any XML I've seen. Don't you actually want an _html_ parser? – James 2010-01-02 15:08:25

It is generated here: http://wims.unice.fr/wims/en_tool~linear~linsolver.en.html When you get a solution and look at the page source, you will see something similar. – 2010-01-02 15:28:46

Just edited out the LaTeX tag. ??? – 2010-01-02 17:03:53
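The target in the question loses text because data() replaces self.var on every call: text that sits between a closing tag and the next opening tag (for example " = (" after <sub>0</sub>) is overwritten by the next data() call before any end() adds it to the output. A minimal sketch of a target that avoids this, assuming well-formed input with a single root element such as <p>: buffer the character data and flush it on every start/end event. The helper name parse_to_latex is hypothetical, and in this version <i> and any other tags simply disappear while their text is kept.

```python
from xml.etree.ElementTree import XMLParser


class HtmlLatex:
    """Parser target that also keeps text outside <i>/<sub>/<sup>.

    Character data is buffered and flushed on every tag event, so text
    between a closing tag and the next opening tag is no longer overwritten.
    """

    def __init__(self):
        self.parts = []  # finished output fragments
        self.buf = []    # character data seen since the last tag event

    def _flush(self):
        text = ''.join(self.buf)
        self.buf = []
        return text

    def start(self, tag, attrib):
        # Text that precedes an opening tag goes to the output unchanged.
        self.parts.append(self._flush())
        if tag == 'sub':
            self.parts.append('_{')
        elif tag == 'sup':
            self.parts.append('^{')

    def end(self, tag):
        self.parts.append(self._flush())
        if tag in ('sub', 'sup'):
            self.parts.append('}')

    def data(self, data):
        # data() may be called several times for one text run, so append.
        self.buf.append(data)

    def close(self):
        self.parts.append(self._flush())
        return ''.join(self.parts)


def parse_to_latex(text):
    """Hypothetical helper: feed text to XMLParser with the target above."""
    parser = XMLParser(target=HtmlLatex())
    parser.feed(text)
    return parser.close()  # XMLParser.close() returns target.close()'s result


if __name__ == '__main__':
    src = '<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+<i>l</i><sub>1</sub>) }.</p>'
    print(parse_to_latex(src))
```

Returning the string from close() instead of printing it lets the caller decide what to do with the result.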

Here is a pyparsing version; I hope the comments explain it well enough.

from pyparsing import makeHTMLTags, anyOpenTag, anyCloseTag, Suppress, replaceWith

src = ("<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+(2<i>l</i><sub>2</sub>+<i>l</i><sub>1</sub>) "
       "<i>m</i><sup>2</sup>+(<i>l</i><sub>2</sub><sup>2</sup>+2<i>l</i><sub>1</sub> <i>l</i><sub>2</sub>+"
       "<i>l</i><sub>1</sub><sup>2</sup>) <i>m</i>) /(<i>m</i><sup>3</sup>+(3<i>l</i><sub>2</sub>+"
       "2<i>l</i><sub>1</sub>)) }.</p>")

# set up tag matching for <sub> and <sup> tags
SUB, endSUB = makeHTMLTags("sub")
SUP, endSUP = makeHTMLTags("sup")

# all other tags will be suppressed from the output
ANY, endANY = map(Suppress, (anyOpenTag, anyCloseTag))

# opening tags become "_{" or "^{", closing tags become "}"
SUB.setParseAction(replaceWith("_{"))
SUP.setParseAction(replaceWith("^{"))
endSUB.setParseAction(replaceWith("}"))
endSUP.setParseAction(replaceWith("}"))

# alternatives are tried in order, so <sub>/<sup> match before the catch-all ANY/endANY
transformer = (SUB | endSUB | SUP | endSUP | ANY | endANY)

# now use the transformer to apply these transforms to the input string
print(transformer.transformString(src))

gives

p_{0} = (m^{3}+(2l_{2}+l_{1}) m^{2}+(l_{2}^{2}+2l_{1} l_{2}+l_{1}^{2}) m) /(m^{3}+(3l_{2}+2l_{1})) }.


2010-01-02 15:32:02 PaulMcG
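For comparison, the standard library's tree API makes the lost text explicit: ElementTree stores the text before an element's first child in that element's text attribute, and the text after its closing tag in its tail attribute. A minimal recursive sketch, assuming well-formed input; WRAPPERS and to_latex are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from tag name to the LaTeX wrapper it produces;
# tags not listed here (p, i, ...) are dropped but their text is kept.
WRAPPERS = {'sub': ('_{', '}'), 'sup': ('^{', '}')}


def to_latex(elem):
    """Render elem's own text and children; the caller adds elem.tail."""
    before, after = WRAPPERS.get(elem.tag, ('', ''))
    parts = [before, elem.text or '']
    for child in elem:
        parts.append(to_latex(child))
        # Text after a child's closing tag is stored on the child as .tail.
        parts.append(child.tail or '')
    parts.append(after)
    return ''.join(parts)


src = '<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+<i>l</i><sub>1</sub>) }.</p>'
print(to_latex(ET.fromstring(src)))
```

Forgetting child.tail here would reproduce the problem from the question: everything between a closing tag and the next opening tag would vanish.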

Take a look at BeautifulSoup, a Python library for parsing, navigating, and manipulating HTML and XML. It has a convenient interface that may solve your problem...


2010-01-02 15:16:09 miku

Thanks for the suggestion. I'll take a look at it. – 2010-01-02 16:07:50
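BeautifulSoup is a third-party package. If a dependency-free option in the same spirit is preferable, the standard library's html.parser.HTMLParser can do the same conversion, and unlike the XML parsers it does not reject markup that is not well-formed, such as unclosed tags or a missing root element. A minimal sketch; LatexHTMLParser and html_to_latex are hypothetical names, not part of any library:

```python
from html.parser import HTMLParser


class LatexHTMLParser(HTMLParser):
    """Hypothetical HTMLParser subclass: <sub>/<sup> become _{...}/^{...},
    every other tag is dropped, and all text (inside or outside tags) is kept."""

    OPEN = {'sub': '_{', 'sup': '^{'}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags such as <p> or <i> contribute nothing.
        self.parts.append(self.OPEN.get(tag, ''))

    def handle_endtag(self, tag):
        if tag in self.OPEN:
            self.parts.append('}')

    def handle_data(self, data):
        # Called for every text run, whether or not it is inside a tag.
        self.parts.append(data)


def html_to_latex(src):
    parser = LatexHTMLParser()
    parser.feed(src)
    parser.close()  # flushes any text still buffered at the end of input
    return ''.join(parser.parts)


if __name__ == '__main__':
    print(html_to_latex('<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+<i>l</i><sub>1</sub>) }.</p>'))
```

For example, html_to_latex('x<sup>2</sup>+1') gives 'x^{2}+1' even though the input has no root element.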
