Some element.tail attributes are empty although they should not be

I am trying to parse a large XML file (containing one book of the Bible) using xml.etree.ElementTree in Python 3.4 (I would like to stick to standard-library modules for Windows compatibility). The relevant class is shown here:

class BibleTree:
    def __init__(self, file_name: str) -> None:
        self.root = ET.parse(file_name).getroot()

    @staticmethod
    def _list_to_clean_text(str_in: str) -> str:
        out = re.sub(r'[\s\n]+', ' ', str_in, flags=re.DOTALL)
        return out.strip()

    @staticmethod
    def _clean_text(intext: Optional[str]) -> str:
        return intext if intext is not None else ''

    def __iter__(self) -> Tuple[int, int, str]:
        collected = None
        cur_chap = 0
        cur_verse = 0

        for child in self.root:
            if child.tag in ['kap', 'vers']:
                if collected and collected.strip():
                    yield cur_chap, cur_verse, self._list_to_clean_text(collected)
                if child.tag == 'kap':
                    cur_chap = int(child.attrib['n'])
                elif child.tag == 'vers':
                    cur_verse = int(child.attrib['n'])
                collected = self._clean_text(child.tail)
            else:
                if collected is not None:
                    collected += self._clean_text(child.text)
                    collected += self._clean_text(child.tail)

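The problem is that in some cases (for example, the <odkazo/> element on line 54) the tail attribute of the variable child is None, although IMHO it should contain text.

Any ideas what I am doing wrong?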

2016-08-18 mcepl

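What is the code supposed to do? The code is incomplete, so I cannot run it; for example, there is a reference to 'Optional', which is not defined. It would help if you could provide a clear [mcve]. – mzjn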
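For illustration only, here is a minimal sketch of how text and tail behave in ElementTree. The XML sample is hypothetical (the real file is not shown in the question); only the tag names kap, vers and odkazo are taken from the post. It shows two things: an element that is immediately followed by another tag has tail None, and iterating over the root visits direct children only, so a vers nested inside another element is never seen.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, NOT the real file: <book> and <i> are made-up tags,
# and the second <vers/> milestone is deliberately nested inside <i>.
SAMPLE = (
    '<book><kap n="1"/><vers n="1"/>In the beginning <odkazo/>'
    '<i>God<vers n="2"/> created</i> the heavens.</book>'
)

root = ET.fromstring(SAMPLE)

# Iterating an element yields its direct children only.
direct = [(child.tag, child.tail) for child in root]

# Element.iter() walks the whole subtree, so it also finds the nested <vers/>.
nested = [(v.attrib['n'], v.tail) for v in root.iter('vers')]

print(direct)
print(nested)
```

In this sample the tail of odkazo is None simply because no text sits between it and the next tag, and the text " created" belongs to the tail of the nested vers, not to any direct child of the root. Whether the same holds for line 54 of the real file is an assumption.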

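This is PEBKAC ... I had assumed that milestone elements never occur inside other elements. So I need to rewrite the whole function as a recursive one. Oh well.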
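A recursive rewrite along those lines might look roughly like the sketch below. It is not the code that was finally used: the function name iter_verses and the MILESTONES constant are made up for illustration, and it assumes that kap and vers are empty milestone elements carrying an integer n attribute, as in the question.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

MILESTONES = ('kap', 'vers')  # hypothetical name; tags taken from the question


def iter_verses(root: ET.Element) -> Iterator[Tuple[int, int, str]]:
    """Yield (chapter, verse, text), descending into nested elements."""
    chap = 0
    verse = 0
    parts = []  # type: List[str]  # text seen since the last milestone

    def flush() -> str:
        # Join the collected pieces and collapse runs of whitespace.
        text = re.sub(r'\s+', ' ', ''.join(parts)).strip()
        parts.clear()
        return text

    def walk(elem: ET.Element) -> Iterator[Tuple[int, int, str]]:
        nonlocal chap, verse
        for child in elem:
            if child.tag in MILESTONES:
                # A milestone closes the verse collected so far.
                text = flush()
                if text:
                    yield chap, verse, text
                if child.tag == 'kap':
                    chap = int(child.attrib['n'])
                else:
                    verse = int(child.attrib['n'])
            else:
                # Ordinary element: keep its text, then look inside it.
                parts.append(child.text or '')
                yield from walk(child)
            # The tail follows the element at any nesting depth.
            parts.append(child.tail or '')

    yield from walk(root)
    # Emit whatever follows the last milestone.
    text = flush()
    if text:
        yield chap, verse, text
```

With a made-up input such as `<book><kap n="1"/><vers n="1"/>In the beginning <i>God<vers n="2"/> created</i> the heavens.</book>`, `list(iter_verses(ET.fromstring(...)))` gives `[(1, 1, 'In the beginning God'), (1, 2, 'created the heavens.')]`. Note that any text appearing before the first milestone would be reported under chapter 0, verse 0.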

2016-08-20 18:59:12 mcepl
