Extracting the sibling nodes between two nodes with BeautifulSoup

I have a document like this:

<p class="top">I don't want this</p> 

<p>I want this</p> 
<table> 
    <!-- ... --> 
</table> 

<img ... /> 

<p> and all that stuff too</p> 

<p class="end">But not this and nothing after it</p>

I want to extract everything between the p[class=top] and p[class=end] paragraphs.

Is there a nice way to do this with BeautifulSoup?


2010-03-24 Oli

The node.nextSibling attribute is your solution:

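from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import path

soup = BeautifulSoup(html)  # html holds the document from the question

# Start at the sibling right after <p class="top">
nextNode = soup.find('p', {'class': 'top'}).nextSibling
while nextNode is not None:
    # Stop at <p class="end">; string nodes have no tag name
    if getattr(nextNode, 'name', None) == 'p' and nextNode.get('class', None) == 'end':
        break
    # process nextNode here
    nextNode = nextNode.nextSibling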

The condition is this involved so that you can be sure you are accessing the attributes of an HTML tag and not of a string node.
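
For anyone on the newer bs4 package, the same sibling walk might look roughly like the sketch below. It is not the code from the answer above: it assumes bs4 is installed and uses Python's built-in html.parser, the helper name siblings_between is made up for illustration, and the sample markup leaves out the elided img attributes. In bs4 the class attribute comes back as a list, so the stop check uses membership instead of equality.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Sample markup modelled on the question (img attributes omitted).
html = """
<p class="top">I don't want this</p>
<p>I want this</p>
<table><!-- ... --></table>
<img />
<p> and all that stuff too</p>
<p class="end">But not this and nothing after it</p>
"""


def siblings_between(start, end_class):
    """Hypothetical helper: collect the siblings that follow `start`,
    stopping before the first <p> whose class list contains `end_class`."""
    collected = []
    node = start.next_sibling
    while node is not None:
        # Only Tag objects carry attributes; string nodes are skipped by the check.
        if isinstance(node, Tag) and node.name == "p" and end_class in node.get("class", []):
            break
        collected.append(node)
        node = node.next_sibling
    return collected


soup = BeautifulSoup(html, "html.parser")
top = soup.find("p", class_="top")
between = siblings_between(top, "end")

# Names of the tags found between the two marker paragraphs.
tag_names = [node.name for node in between if isinstance(node, Tag)]
print(tag_names)
```

The returned list also contains the whitespace strings that sit between the tags, so filter on Tag (as tag_names does) if only elements are wanted. If no p[class=end] exists, the loop simply runs to the last sibling.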


2010-03-24 12:03:45
