2009-05-06

Answer

Suppose the variable test_html holds the following HTML:

<html> 
<head><title>Test title</title></head> 
<body> 
<p>Some paragraph</p> 
Useless Text 
<a href="http://stackoverflow.com">Some link</a>not a link 
<a href="http://python.org">Another link</a> 
</body></html> 

Then just do this:

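from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)

test_html = load_html_from_above()  # placeholder: returns the HTML string shown above
soup = BeautifulSoup(test_html)

# findAll(text=True) yields the text nodes (NavigableStrings), not the tags
for t in soup.findAll(text=True):
    text = unicode(t)
    # strip lowercase vowels only; capitals such as 'U' and 'A' are kept
    for vowel in u'aeiou':
        text = text.replace(vowel, u'')
    # swap the original text node for the modified string
    t.replaceWith(text)

print soup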

which prints:

<html> 
<head><title>Tst ttl</title></head> 
<body> 
<p>Sm prgrph</p> 
Uslss Txt 
<a href="http://stackoverflow.com">Sm lnk</a>nt  lnk 
<a href="http://python.org">Anthr lnk</a> 
</body></html> 

Note that the tags and attributes are left unchanged; only the text content is modified.
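If you are on Python 3 and would rather not depend on BeautifulSoup at all, the same idea (rewrite only the text nodes, re-emit every tag as written) can be sketched with the standard library's html.parser. This is a swapped-in technique, not the approach above; the class name VowelStripper and the function strip_vowels are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

VOWELS = "aeiou"  # lowercase only, matching the BeautifulSoup loop above
_DROP_VOWELS = {ord(v): None for v in VOWELS}


class VowelStripper(HTMLParser):
    """Hypothetical helper: re-emits HTML, removing vowels from text nodes only."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; out of
        # handle_data, so they can be passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Emit the start tag exactly as written, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are also emitted verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # The only place the document is modified: plain text content.
        self.out.append(data.translate(_DROP_VOWELS))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)

    def unknown_decl(self, data):
        self.out.append("<![%s]>" % data)


def strip_vowels(html):
    """Return html with lowercase vowels removed from its text content."""
    parser = VowelStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_vowels('<a href="http://python.org">Another link</a>'))
```

In this sketch, start tags come back byte-for-byte via get_starttag_text(), while end tags are rebuilt from the lowercased tag name the parser reports. Comments are passed through untouched rather than being treated as text.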