2016-07-14 40 views

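Parsing HTML image information with BS4 in Python 3.5

I am trying to get the img alt part of an element, but for some reason it does not register as an attribute of the element, even though it appears in the printed string. How can I also pull the img alt? (I need the text that says "Boxed Warning", but I don't know how to get my code to retrieve it.)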

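import bs4
import requests

page=requests.get('http://www.drugs.com/labeling-changes/May-2016.html') 
noStarchSoup=bs4.BeautifulSoup(page.text, "html.parser") 
# Select every <a> that is a direct child of a <td> inside a <tr>
elems=noStarchSoup.select('tr td > a') 
print(elems[0].getText()) 
print(str(elems[1])) 
print(elems[1].get('alt'))  # None: alt belongs to the nested <img>, not to the <a>
print(elems[1].attrs) 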

Anaprox 
<a href="/labeling-changes/May-2016/anaprox-naproxen-4321.html" rel="nofollow"><img alt="Boxed Warning" height="16" src="/img/icons/exclamation.png" title="Changes have been made to the Boxed Warning section of the safety label." width="16"/></a> 
None 
{'href': '/labeling-changes/May-2016/anaprox-naproxen-4321.html', 'rel':['nofollow']} 

Possible duplicate of [Extracting an attribute value with beautifulsoup](http://stackoverflow.com/questions/2612548/extracting-an-attribute-value-with-beautifulsoup)

Answers

Printing elems[1] outputs this (shown here with line breaks added for readability):

print(elems[1]) 
    <a href="/labeling-changes/May-2016/anaprox-naproxen-4321.html" rel="nofollow"> 
    <img alt="Boxed Warning" height="16" src="/img/icons/exclamation.png" title="Changes have been made to the Boxed Warning section of the safety label." width="16"/> 
    </a> 

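So the a element has a child named img, and alt is an attribute of that child, not of the a element itself. You therefore need to retrieve the alt attribute from the child: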

print(elems[1].img.get('alt', '')) 
Boxed Warning 
print(elems[1].img.get('width', '')) 
16 
print(elems[1].img.get('height', '')) 
16 

Note that the width and height attributes do not necessarily give the image's original size.
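One caveat when applying this to every matched link: in the question's output, elems[0] contains only text ("Anaprox") and no img child, so elems[0].img would be None and calling .get on it would raise an AttributeError. The following is a minimal sketch of one way to handle both cases. The SAMPLE_HTML string and the link_labels helper are hypothetical, modeled on the markup shown in the question. They are not taken from the real page, and the first link's href is a placeholder.

```python
import bs4

# Hypothetical sample markup modeled on the rows shown in the question;
# the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td><a href="/placeholder.html">Anaprox</a></td>
    <td><a href="/labeling-changes/May-2016/anaprox-naproxen-4321.html" rel="nofollow"><img alt="Boxed Warning" height="16" src="/img/icons/exclamation.png" width="16"/></a></td>
  </tr>
</table>
"""


def link_labels(html):
    """Return one label per matched link: the img alt if present, else the link text."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    labels = []
    for a in soup.select("tr td > a"):
        img = a.find("img")  # None when the link has no <img> descendant
        if img is not None:
            labels.append(img.get("alt", ""))
        else:
            labels.append(a.get_text(strip=True))
    return labels


print(link_labels(SAMPLE_HTML))
```

Checking for None before reading the attribute keeps the loop from failing on text-only links. Whether falling back to the link text is the right choice depends on what you want to collect.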

Thanks! This is exactly what I needed. – TheSplicer