2017-05-26 67 views
0

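How to find tags that are not surrounded by a specific tag and wrap them in that tag

Using BeautifulSoup, I want to find <a> tags that are not enclosed in <p> and wrap them in <p>, but I don't know how to do it.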

<p><a href="example1.com">example1.com</a></p> 
<p><a href="example2.com">example2.com</a></p> 
<a href="example3.com">example3.com</a> 
<p><a href="example3.com">example3.com</a></p> 

I want to change the HTML above into this:

<p><a href="example1.com">example1.com</a></p> 
<p><a href="example2.com">example2.com</a></p> 
<p><a href="example3.com">example3.com</a></p> <-here 
<p><a href="example3.com">example3.com</a></p> 

What have you tried? Can you show your code?

Answers

2

You need to use a CSS selector to select those anchors, then wrap each of them in a p tag.

In [2]: from bs4 import BeautifulSoup as BS 

In [3]: html = """<p><a href="example1.com">example1.com</a></p> 
    ...: <p><a href="example2.com">example2.com</a></p> 
    ...: <a href="example3.com">example3.com</a> 
    ...: <p><a href="example3.com">example3.com</a></p>""" 

In [4]: soup = BS(html, "html.parser") 

In [5]: for a in soup.select("p ~ a"):
    ...:     a.wrap(soup.new_tag("p"))
    ...:

In [6]: soup 
Out[6]: 
<p><a href="example1.com">example1.com</a></p> 
<p><a href="example2.com">example2.com</a></p> 
<p><a href="example3.com">example3.com</a></p> 
<p><a href="example3.com">example3.com</a></p> 
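
One caveat worth checking: `p ~ a` is the general sibling combinator, so it only matches an `<a>` that comes after a sibling `<p>`. A bare anchor at the very start of the document would be missed. The sketch below uses a made-up three-line input (not the HTML from the question) to compare the selector with a plain parent check.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the first bare anchor has no <p> sibling before it.
html = (
    '<a href="example0.com">example0.com</a>\n'
    '<p><a href="example1.com">example1.com</a></p>\n'
    '<a href="example2.com">example2.com</a>'
)
soup = BeautifulSoup(html, "html.parser")

# Anchors matched by the sibling selector from the answer above.
matched = [a["href"] for a in soup.select("p ~ a")]

# Anchors whose direct parent is not a <p>, regardless of position.
unwrapped = [a["href"] for a in soup.find_all("a") if a.parent.name != "p"]

print(matched)
print(unwrapped)
```

For the HTML in the question both approaches select the same anchor, so the selector is fine there. The parent check is the safer choice when a bare anchor can come first.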
1
from bs4 import BeautifulSoup

soup = BeautifulSoup(...)
items = soup.find_all('a')
for item in items:
    # wrap only the anchors whose direct parent is not a <p>
    if item.parent.name != 'p':
        item.wrap(soup.new_tag('p'))
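
Note that checking `parent.name` only looks at the direct parent. If an anchor might sit deeper inside a paragraph (for example `<p><b><a>`), a variant using `find_parent` avoids wrapping it a second time. This is a minimal sketch on a made-up input, not the HTML from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical input: the first anchor is nested inside <p> via <b>.
html = (
    '<p><b><a href="example1.com">example1.com</a></b></p>'
    '<a href="example2.com">example2.com</a>'
)
soup = BeautifulSoup(html, "html.parser")

for a in soup.find_all("a"):
    # find_parent walks up the ancestors and returns None when no <p> encloses the anchor
    if a.find_parent("p") is None:
        a.wrap(soup.new_tag("p"))

result = str(soup)
print(result)
```

Only the second anchor gets wrapped here. The first one already has a `<p>` ancestor and is left alone.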
0

Try this:

from bs4 import BeautifulSoup

data = """
<p><a href="example1.com">example1.com</a></p>
<p><a href="example2.com">example2.com</a></p>
<a href="example3.com">example3.com</a>
<p><a href="example3.com">example3.com</a></p>
"""

soup = BeautifulSoup(data, 'html.parser')
for a in soup('a'):  # shortcut for soup.find_all('a')
    # wrap only the anchors whose direct parent is not a <p>
    if a.parent.name != 'p':
        a.wrap(soup.new_tag('p'))
print(soup)