Replace <a></a> with href in BeautifulSoup
2016-01-22
content='<p>Hello, the web site is <a href="https://www.google.com">Google</a></p>. <p>The search engine is <a href="https://www.baidu.com">Baidu</a></p>.' 
soup = BeautifulSoup(content, 'html.parser') 

Now I want to replace each whole <a> </a> element with the URL in its href attribute, so the expected result is:

Hello, the web site is https://www.google.com. The search engine is https://www.baidu.com. 

Can anyone provide a solution?


And what is the problem? First use BS to find `<a>` and get its `href`. – furas

Answers


First find each `a` and get its `href`; then you can append the `href` to the previous sibling and remove the `a`.

from bs4 import BeautifulSoup 

content='<p>Hello, the web site is <a href="https://www.google.com">Google</a></p>. <p>The search engine is <a href="https://www.baidu.com">Baidu</a></p>.' 
soup = BeautifulSoup(content, 'html.parser') 

# find all `a` tags
all_a = soup.find_all('a')

for a in all_a:
    # get the `href` attribute of `a`
    href = a['href']

    #print('--- before ---')
    #print(soup)

    # append `href` to the text node just before `a`
    # (assumes `a` is preceded by a text node, as in this input)
    a.previous_sibling.replace_with(a.previous_sibling + href)

    # remove `a` from the tree
    a.extract()

    #print('--- after ---') 
    #print(soup) 

print(soup) 

'<p>Hello, the web site is https://www.google.com</p>. <p>The search engine is https://www.baidu.com</p>.'
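
The answer above relies on each `a` having a text node right before it. As a possible alternative, here is a minimal sketch that swaps each `a` for its `href` directly with `replace_with`, so it should also work when the link is the first thing in its parent. The helper name `links_to_urls` is made up for this example, and the `href=True` filter is an added assumption that anchors without an `href` should be left alone. Calling `get_text()` at the end drops the `<p>` tags, which gives the plain string asked for in the question.

```python
from bs4 import BeautifulSoup


def links_to_urls(html):
    """Return a soup in which every <a href=...> is replaced by its URL text.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # href=True keeps only anchors that actually carry an href attribute
    for a in soup.find_all('a', href=True):
        # swap the whole tag (and its link text) for the URL string
        a.replace_with(a['href'])
    return soup


content = '<p>Hello, the web site is <a href="https://www.google.com">Google</a></p>. <p>The search engine is <a href="https://www.baidu.com">Baidu</a></p>.'
soup = links_to_urls(content)

print(soup)             # markup with the links replaced, <p> tags kept
print(soup.get_text())  # plain text only, <p> tags dropped
```

`find_all` returns a list-like result, so replacing tags inside the loop does not disturb the iteration.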