XmlSlurper never finds the node

I'm trying to page-scrape some DOM that sometimes looks like this:

<span>text</span>

and sometimes looks like this:

<span><p>text</p></span>

However, I just can't figure out how to get text in the second case. I've tried several approaches; here is the one I think should work:

def html = slurper.parse(reader)
// Note: the attribute name was garbled in the source; `class` is assumed here.
Collection<NodeChild> nodes = html.'**'.findAll { it.name() == 'span' && it.@class == 'style2' }
...
def descriptionNode = html.'**'.find { it.name() == 'span' && it.@class == 'style20' }
def innerNode = descriptionNode.'**'.find { it.name() == 'p' }
def description
if (innerNode?.size() > 0)
{
    // A nested <p> was found, so take its text
    description = innerNode.text()
}
else
{
    // No nested <p>; fall back to the span's own text
    description = descriptionNode.text()
}

Any ideas on how I need to use XmlSlurper to get the behavior I need?

Source

2011-01-24 Stefan Kendall

It turns out the HTML must have been invalid. TagSoup creates

<div> 
<span> 
</span> 
<p></p> 
</div>

but Firebug shows

<div> 
<span> 
<p></p> 
</span> 
</div>

What a horrible bug.
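To illustrate why the descendant search came back empty, here is a minimal sketch that parses the TagSoup-shaped markup directly. It assumes Groovy 3 or later for the import; the `text` content, the variable names, and the sibling fallback are illustrative additions, not part of the original code.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on older versions XmlSlurper lives in groovy.util and needs no import

// The shape TagSoup reportedly produced: <p> is a sibling of <span>, not a child.
// The 'text' content is added here for illustration.
def repaired = '<div><span></span><p>text</p></div>'

def div = new XmlSlurper().parseText(repaired)
def span = div.'**'.find { it.name() == 'span' }

// Searching below the span finds no <p>, because it is not a descendant here.
def nested = span.'**'.find { it.name() == 'p' }
assert nested == null

// Hypothetical fallback: look for a <p> among the span's siblings instead.
def sibling = span.parent().children().find { it.name() == 'p' }
def description = nested ? nested.text() : sibling.text()
assert description == 'text'
```

Whether a sibling fallback is safe depends on the page; it could pick up an unrelated paragraph if the markup differs from the shape shown above.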

Source

2011-01-25 02:25:20

Have you tried XPath: //span/text()? You may need to query twice to account for the p tag.

Source
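XmlSlurper itself does not evaluate XPath strings, so one way to try this is the JDK's javax.xml.xpath API. A minimal sketch, assuming well-formed input (real-world HTML would first need a tidying parser); the sample document and variable names are made up for illustration. A union expression covers both cases in a single query instead of two:

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Illustrative sample with both shapes: bare text and text wrapped in <p>.
def xml = '<test><span>test1</span><span><p>test2</p></span></test>'

// Build a W3C DOM, since the JDK XPath engine works on DOM nodes.
def dom = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))

// Match text directly under a span, or under a <p> inside a span.
def xpath = XPathFactory.newInstance().newXPath()
def nodes = xpath.evaluate('//span/text() | //span/p/text()', dom, XPathConstants.NODESET)

// Copy the NodeList into a plain Groovy list of strings.
def texts = (0..<nodes.length).collect { nodes.item(it).nodeValue }
assert texts == ['test1', 'test2']
```

If the markup contains whitespace between a span and its nested p, `//span/text()` will also match those whitespace-only text nodes, so the results may need trimming and filtering.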

2011-01-24 06:08:28 Steven

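It sounds like you want to check whether a given span contains a nested p. You can iterate over the span node's children to check for that case. For example: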

def xml = """ 
<test> 
    <span>test1</span> 
    <span><p>test2</p></span> 
    <other><span>test3</span></other> 
    <other><span><p>test4</p></span></other> 
</test> 
""" 

def doc = new XmlSlurper().parseText(xml) 
def descriptions = [] 
doc.'**'.findAll { it.name() == 'span' }.each { node -> 
    if (node.children().find { it.name() == 'p' }) {
        descriptions << node.p.text()
    } else {
        descriptions << node.text()
    }
} 
assert descriptions == ['test1', 'test2', 'test3', 'test4']

Source

2011-01-24 06:35:15 ataylor
