2017-08-07 45 views
0

Hello, when I fetch a tag, including its contents, with BeautifulSoup 4 like this:

soup.find('div', id='id1') 

I get this:

<div id="id1"> 
<p id="ptag"> hello this is "p" tag</p> 
<span id="spantag"> hello this is "p" tag</span> 
<div id="divtag"> hello this is "p" tag</div> 
<h1 id="htag"> hello this is "p" tag</h1> 
</div> 

whereas I only need this:

<p id="ptag"> hello this is "p" tag</p> 
<span id="spantag"> hello this is "p" tag</span> 
<div id="divtag"> hello this is "p" tag</div> 
<h1 id="htag"> hello this is "p" tag</h1> 

Is there any way to get just the inner content, as shown above? I tried using .contents, but it did not give me what I need.

Thanks

Answers

1
from bs4 import BeautifulSoup 

html = """<div id="id1"> 
<p id="ptag"> hello this is "p" tag</p> 
<span id="spantag"> hello this is "p" tag</span> 
<div id="divtag"> hello this is "p" tag</div> 
<h1 id="htag"> hello this is "p" tag</h1> 
</div>""" 

soup = BeautifulSoup(html, 'html.parser') 
el = soup.find('div', id='id1') 
print(el.decode_contents(formatter="html"))

Output:

<p id="ptag"> hello this is "p" tag</p> 
<span id="spantag"> hello this is "p" tag</span> 
<div id="divtag"> hello this is "p" tag</div> 
<h1 id="htag"> hello this is "p" tag</h1> 

Thank you so much!!!! Worked for me :)
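To make the difference concrete, here is a minimal sketch comparing str(el), which keeps the wrapping tag, with el.decode_contents(), which returns only the markup inside it. The shortened markup and the names outer and inner are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup in the same shape as the question's HTML.
html = '<div id="id1"><p id="ptag">hello</p><span id="spantag">world</span></div>'

soup = BeautifulSoup(html, "html.parser")
el = soup.find("div", id="id1")

outer = str(el)               # includes the wrapping <div id="id1"> ... </div>
inner = el.decode_contents()  # only the markup between the div's tags

print(outer)
print(inner)
```

Both values are plain strings, so the inner markup can be written to a file or passed on without further conversion.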

0

Using contents I got the following:

[u'\n', <p id="ptag"> hello this is "p" tag</p>, u'\n', <span id="spantag"> hello this is "p" tag</span>, u'\n', <div id="divtag"> hello this is "p" tag</div>, u'\n', <h1 id="htag"> hello this is "p" tag</h1>, u'\n'] 

By iterating over the list, you can easily get the output you want (skipping the \n elements).
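A minimal sketch of that iteration, assuming only the child tags are wanted. The names children and inner_html are illustrative, and the markup is a shortened variant of the question's HTML.

```python
from bs4 import BeautifulSoup, Tag

html = """<div id="id1">
<p id="ptag">hello this is "p" tag</p>
<span id="spantag">hello this is "span" tag</span>
</div>"""

soup = BeautifulSoup(html, "html.parser")
el = soup.find("div", id="id1")

# Keep only child tags; this skips the "\n" strings between them.
# Caveat: it also skips any real text placed directly inside the div.
children = [str(child) for child in el.contents if isinstance(child, Tag)]

inner_html = "\n".join(children)
print(inner_html)
```

If text nodes that sit directly inside the div also matter, filtering on whether str(child).strip() is non-empty would keep them while still dropping the bare newlines.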

0

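Assuming you work with the element's HTML as a string (soup.find itself is a method, so convert its result with str() first), then:

import re
inner = re.sub(r"^\s*<div[^>]*>(.*)</div>\s*$", r"\1", str(soup.find('div', id='id1')), flags=re.DOTALL)

might work.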

0

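BeautifulSoup has a specific function that does exactly what you need: unwrap()

Tag.unwrap() is the opposite of wrap(). It replaces a tag with whatever is inside that tag. It is good for stripping out markup.

Working example: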

from bs4 import BeautifulSoup 


data = """ 
<div id="id1"> 
<p id="ptag"> hello this is "p" tag</p> 
<span id="spantag"> hello this is "p" tag</span> 
<div id="divtag"> hello this is "p" tag</div> 
<h1 id="htag"> hello this is "p" tag</h1> 
</div> 
""" 
soup = BeautifulSoup(data, 'html.parser') 
soup.div.unwrap() 

print(soup) 

This prints:

<p id="ptag"> hello this is "p" tag</p> 
<span id="spantag"> hello this is "p" tag</span> 
<div id="divtag"> hello this is "p" tag</div> 
<h1 id="htag"> hello this is "p" tag</h1>
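Note that soup.div refers to the first div in the document, and unwrap() modifies the tree in place. As a hedged sketch, assuming the wrapper is not the first div on the page, the element can be selected by id before unwrapping. The surrounding markup and the names target and result are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page where the wrapper of interest is not the first <div>.
html = '<body><div id="other">keep me</div><div id="id1"><p id="ptag">hello</p></div></body>'

soup = BeautifulSoup(html, "html.parser")

# Select the wrapper explicitly instead of relying on soup.div.
target = soup.find("div", id="id1")
if target is not None:
    target.unwrap()  # removes the wrapper, keeps its children in the tree

result = str(soup)
print(result)
```

Because the tree is changed in place, parse the HTML again (or use decode_contents() instead) if the original structure is still needed afterwards.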