BeautifulSoup - Finding Logos

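I am working on an automated program that identifies a website's logo using BeautifulSoup and Python 3. As a first step, I look for images whose image name contains the term "logo", and that works well. However, I would like to extend this to images that mention the term somewhere else. For example, an image might sit inside a link with a class/id/attribute of "logo", or be buried even deeper inside an element with a class of "logo". For example: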

<div id="logo"> 
    <a href="http://www.mexgrocer.com/"> 
     <img src="http://ep.yimg.com/ca/I/mex-grocer_2269_22595" width="122" height="72" border="0" hspace="0" vspace="0" alt="Mexican Food"> 
    </a> 
</div>

My current code is:

import re
img = soup.find("img", src=re.compile(r"logo", re.I))  # first <img> whose src contains "logo", case-insensitively

How can I extend this to search through the attributes of all parent tags?


2014-11-01 user2694306
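One way to look beyond the <img> tag itself is sketched below. It is a minimal sketch, assuming that a case-insensitive "logo" anywhere in an attribute value is a good enough signal. For every <img>, it checks the tag's own attribute values and then those of each ancestor via .parents. The helper names attrs_mention_logo and find_logo_images are hypothetical. The sample markup is the question's example plus one unrelated image.

```python
import re
from bs4 import BeautifulSoup

# Assumption: "logo" anywhere in an attribute value, case-insensitive, marks a candidate
LOGO_RE = re.compile(r"logo", re.I)

def attrs_mention_logo(tag):
    """Hypothetical helper: True if any attribute value of `tag` contains 'logo'."""
    for value in tag.attrs.values():
        # Multi-valued attributes such as class come back as lists of strings
        values = value if isinstance(value, list) else [value]
        if any(LOGO_RE.search(v) for v in values):
            return True
    return False

def find_logo_images(soup):
    """Hypothetical helper: <img> tags that mention 'logo' themselves or in any ancestor."""
    results = []
    for img in soup.find_all("img"):
        # img.parents walks upward through every enclosing tag to the document root
        if attrs_mention_logo(img) or any(attrs_mention_logo(p) for p in img.parents):
            results.append(img)
    return results

html = """
<div id="logo">
    <a href="http://www.mexgrocer.com/">
     <img src="http://ep.yimg.com/ca/I/mex-grocer_2269_22595" alt="Mexican Food">
    </a>
</div>
<img src="/banner.png" alt="Sale">
"""
soup = BeautifulSoup(html, "html.parser")
for img in find_logo_images(soup):
    print(img["src"])  # only the mex-grocer image, found through <div id="logo">
```

Checking every ancestor is deliberately broad. A <body class="logo-page"> would flag every image on the page, and a substring test also matches words such as "blogosphere". A real detector would probably cap the search depth or score candidates instead.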

Use find_all to find every matching tag in the whole document. You can try something like this:

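from urllib.request import urlopen
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('your_url').read(), 'html.parser')
for x in soup.find_all(id='logo'):
    # the match may be the <img> itself or a container such as <div id="logo">
    imgs = [x] if x.name == 'img' else x.find_all('img')
    for img in imgs:
        if img.has_attr('src'):
            print(img['src'])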

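If you want to search by class instead, just use class_='logo' (with a trailing underscore, because class is a reserved word in Python).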


2014-11-01 19:05:59 Hackaholic

But how do you search across all possible attributes? – user2694306 2014-11-01 19:41:31

By "all possible", do you mean id, class, style, name, and so on? – Hackaholic 2014-11-01 19:47:12

You can use 'attrs'. Try something like 'soup.find_all(attrs={'id': 'id_value', 'name': 'name_value', 'class': 'class_name'})' – Hackaholic 2014-11-01 19:54:43
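Two details from this answer and its comments are illustrated below on made-up markup. First, keyword filters such as class_ and id also accept a compiled regex. That finds containers whose class or id merely contains "logo", whereas class_='logo' needs an exact class name. Second, an attrs dictionary requires every listed attribute to match. It therefore narrows a search rather than covering "any attribute". This is a sketch assuming the built-in html.parser.

```python
import re
from bs4 import BeautifulSoup

html = """
<div class="header site-Logo"><img src="/a.png"></div>
<div id="main-logo" class="brand"><a href="/"><img src="/b.png"></a></div>
<div class="content"><img src="/c.png"></div>
"""
soup = BeautifulSoup(html, "html.parser")
logo_re = re.compile("logo", re.I)

# A regex filter is searched anywhere in the value; for class_ it is tried per class name
by_class = soup.find_all(class_=logo_re)
by_id = soup.find_all(id=logo_re)

# Collect the <img> tags nested anywhere inside the matching containers
srcs = [img["src"] for tag in by_class + by_id for img in tag.find_all("img")]
print(srcs)  # ['/a.png', '/b.png']

# attrs={...} combines its keys with AND: every listed attribute must match
both = soup.find_all(attrs={"id": logo_re, "class": "brand"})
no_match = soup.find_all(attrs={"id": logo_re, "name": "header"})
print(len(both), len(no_match))  # 1 0
```

To match "logo" in any attribute at all, find_all also accepts a function that receives each tag and returns True or False.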

You can use find_all(tag, attribute), for example:

from bs4 import BeautifulSoup
soup = BeautifulSoup(f, 'html.parser')  # f: an open HTML file or an HTML string

var = soup.find_all("font", color="#990000")   # all <font color="#990000"> elements
var2 = soup.find_all("a", class_="LinkIndex")  # all <a> elements with the class LinkIndex


2014-11-01 19:18:06 Krraskl13
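Below is a self-contained version of this answer, with an inline HTML string standing in for the undefined f. The markup is invented for illustration. Note that class_ matches a tag if any one of its classes is LinkIndex.

```python
from bs4 import BeautifulSoup

# Invented stand-in markup; in the answer, f would be an open file or a downloaded page
html = """
<font color="#990000">Sale</font>
<font color="#000000">Normal</font>
<a class="LinkIndex" href="/1">One</a>
<a class="LinkIndex other" href="/2">Two</a>
<a href="/3">Three</a>
"""
soup = BeautifulSoup(html, "html.parser")

red_fonts = soup.find_all("font", color="#990000")
index_links = soup.find_all("a", class_="LinkIndex")  # also matches class="LinkIndex other"

print([t.get_text() for t in red_fonts])  # ['Sale']
print([a["href"] for a in index_links])   # ['/1', '/2']
```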


