Problem accessing attributes in BeautifulSoup

I'm having a problem in Python (2.7). The code basically consists of:

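from BeautifulSoup import BeautifulStoneSoup

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulStoneSoup(str)

for x in z.findAll('el'):
    if 'at' in x:              # first attempt: never true
    # if hasattr(x, 'at'):     # second attempt (swap in for the line above): always true
        print x['at']
    else:
        print 'nothing'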

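I expected the first if statement to work correctly (i.e., print "nothing" when at doesn't exist), but it always prints "nothing" (i.e., the test is always False). The second if, on the other hand, is always True, so the code raises a KeyError when it tries to access at on the second <el> element, which of course doesn't have it.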

2011-05-01 NullUserException

'str' is not the best choice for a variable name, since it shadows the built-in str type. How about something like 'xmltext'? – PaulMcG 2011-05-01 20:51:01

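The in operator is for sequence and mapping types. What makes you think the object returned by BeautifulSoup is supposed to implement it that way? According to the BeautifulSoup documentation, you should use the [] syntax to access attributes.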

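Re hasattr, I think you're confusing HTML/XML attributes with Python object attributes. hasattr is for the latter, and BeautifulSoup, AFAIK, does not reflect the HTML/XML attributes it parsed in its own object attributes.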
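To make the distinction concrete, here is a minimal sketch of why hasattr can come back True for every name. The Element class is a hypothetical stand-in, not BeautifulSoup's Tag. It assumes the parsed object defines a __getattr__ fallback that returns a value instead of raising AttributeError, which I believe is roughly what BeautifulSoup 3 does when it treats unknown attribute names as child-tag lookups, but treat that as an assumption.

```python
class Element(object):
    """Hypothetical stand-in for a parsed tag; NOT BeautifulSoup's Tag class."""

    def __init__(self, name, attrs):
        self.name = name          # a real Python object attribute
        self.attrs = dict(attrs)  # markup attributes live in a mapping

    def __getattr__(self, item):
        # Called only when normal attribute lookup fails. Returning a value
        # (even None) instead of raising AttributeError makes hasattr() true.
        if item.startswith('__'):
            raise AttributeError(item)
        return None


el = Element('el', {})  # stands for an <el> with no markup attributes

has_python_attr = hasattr(el, 'at')  # the __getattr__ fallback answers for 'at'
has_markup_attr = 'at' in el.attrs   # the attribute mapping has no such key

print(has_python_attr)
print(has_markup_attr)
```

The single-argument print(...) form is used so the sketch runs on both Python 2.7 and Python 3.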

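P.S. Note that the Tag object in BeautifulSoup does implement __contains__, so maybe you're trying this on the wrong object? Can you show a complete but minimal example that demonstrates the problem?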

Running this:

from BeautifulSoup import BeautifulSoup 

str = '<el at="some">ABC</el><el>DEF</el>' 
z = BeautifulSoup(str) 

for x in z.findAll('el'): 
    print type(x) 
    print x['at']

I get:

<class 'BeautifulSoup.Tag'> 
some 
<class 'BeautifulSoup.Tag'> 
Traceback (most recent call last): 
    File "soup4.py", line 8, in <module> 
    print x['at'] 
    File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 601, in __getitem__ 
    return self._getAttrMap()[key] 
KeyError: 'at'

This is what I expected. The first el has an at attribute and the second doesn't, so it throws a KeyError.

Update 2: BeautifulSoup.Tag.__contains__ looks inside the contents of the tag, not at its attributes. To check whether an attribute exists, use get rather than in:

for x in z.findAll('el'): 
    print x.get('at', 'nothing')
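For anyone who wants to see the three behaviours side by side without installing anything, here is a rough sketch. FakeTag is a hypothetical stand-in written for illustration, not BeautifulSoup's implementation. It only mimics what is described above: membership tests look at the tag's contents, [] raises KeyError for a missing attribute, and get accepts a default.

```python
class FakeTag(object):
    """Hypothetical stand-in for illustration; NOT BeautifulSoup's Tag."""

    def __init__(self, attrs, contents):
        self.attrs = dict(attrs)
        self.contents = list(contents)

    def __contains__(self, item):
        # Membership is tested against child contents, not attribute names.
        return item in self.contents

    def __getitem__(self, key):
        # Plain dict lookup, so a missing attribute raises KeyError.
        return self.attrs[key]

    def get(self, key, default=None):
        # Like dict.get: returns the default instead of raising.
        return self.attrs.get(key, default)


# Stand-ins for <el at="some">ABC</el> and <el>DEF</el>
tags = [FakeTag({'at': 'some'}, ['ABC']), FakeTag({}, ['DEF'])]

results = []
for tag in tags:
    results.append(('at' in tag, tag.get('at', 'nothing')))

print(results)
```

With this stand-in, 'at' in tag is False for both elements even though the first one has the attribute, which matches the behaviour reported in the question.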

2011-05-01 12:25:36

'in' is also part of the mapping interface, where it refers to keys. – delnan 2011-05-01 12:26:54

@delnan: Thanks, I added a note about this to the answer – 2011-05-01 12:28:06

The problem is that accessing a nonexistent element raises a 'KeyError'. – NullUserException 2011-05-01 12:28:40

I usually use the get() method to access attributes:

link = soup.find('a') 
href = link.get('href') 
name = link.get('name') 

if name: 
    print 'anchor' 
if href: 
    print 'link'
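One caveat with the truthiness checks above: a test like `if href:` treats an attribute that is present but empty (for example href="") the same as a missing one. If that difference matters, comparing against None is safer. The sketch below uses a plain dict as a stand-in for the tag's attribute mapping, on the assumption that the tag's get() behaves like dict.get; classify is a hypothetical helper name, not part of any library.

```python
def classify(attrs):
    """Label an anchor from its attribute mapping (hypothetical helper)."""
    labels = []
    # 'is not None' keeps attributes that are present but empty.
    if attrs.get('name') is not None:
        labels.append('anchor')
    if attrs.get('href') is not None:
        labels.append('link')
    return labels


empty_href = classify({'href': ''})     # attribute present, value empty
named_only = classify({'name': 'top'})  # no href at all

print(empty_href)
print(named_only)
```

A plain `if attrs.get('href'):` would skip the first case, because the empty string is falsy.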

2011-05-01 13:07:47

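If your code is as simple as what you've shown, you can solve this compactly: to just scan for elements by tag name, a pyparsing solution may be more readable (and it doesn't use deprecated APIs such as has_key):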

from pyparsing import makeXMLTags

# same input as in the question, renamed so it no longer shadows str
xmltext = '<el at="some">ABC</el><el>DEF</el>'

# makeXMLTags creates a pyparsing expression that matches tags with
# variations in whitespace, attributes, etc.
el,elEnd = makeXMLTags('el')

# scan the input text and work with elTags
for elTag, tagstart, tagend in el.scanString(xmltext):
    if elTag.at:
        print elTag.at

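As an added refinement, pyparsing lets you define a filtering parse action, so that a tag only matches if a particular attribute value (or the attribute with any value) is found: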

# import parse action that will filter by attribute 
from pyparsing import withAttribute 

# only match el tags having the 'at' attribute, with any value 
el.setParseAction(withAttribute(at=withAttribute.ANY_VALUE)) 

# now loop again, but no need to test for presence of 'at' 
# attribute - there will be no match if 'at' is not present 
for elTag, tagstart, tagend in el.scanString(xmltext): 
    print elTag.at

2011-05-01 15:47:43 DzinX


2011-05-01 21:04:02 PaulMcG
