Python's BeautifulSoup is unnecessarily slow

for olay in soup("li", {"class":"textb"}):
    tanim = olay("strong")
    try:
        print tanim[0]
    except IndexError:
        pass

Getting the string attribute like this makes the code much slower:

for olay in soup("li", {"class":"textb"}):
    tanim = olay("strong")
    try:
        print tanim[0].string
    except IndexError:
        pass

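My question is: what am I doing wrong? Shouldn't I be getting the string attribute like that? Should I use some other method to get a plain-text version of the object?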

Update: This also runs fast, so I think the slowness is unique to the string attribute?

for olay in soup("li", {"class":"textb"}):
    tanim = olay("strong")
    try:
        print tanim[0].text
    except IndexError:
        pass

2011-12-17 yasar

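I can't reproduce this. For me, `.text` is the slowest and the other two are about the same. Even so, the overall difference isn't large. So the question is: what are you testing, and how? – ekhumoro 2011-12-17 20:10:57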
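One way to answer the "what are you testing, and how" question is to time each accessor separately with `timeit`, so that printing and parsing don't skew the numbers. The following is only a rough sketch: the helper names `time_accessors` and `bs4_demo` and the sample HTML are made up for illustration, and it assumes Python 3 with the `beautifulsoup4` package rather than the Python 2 setup used in the question.

```python
import timeit


def time_accessors(accessors, repeat=5, number=1000):
    """Return {name: best seconds per call} for zero-argument callables."""
    results = {}
    for name, func in accessors.items():
        # timeit.repeat returns one total time per repetition; keep the best.
        timings = timeit.repeat(func, repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


def bs4_demo():
    """Hypothetical benchmark on made-up markup; needs beautifulsoup4."""
    from bs4 import BeautifulSoup  # assumption: bs4 is installed

    html = "<ul>" + '<li class="textb"><strong>item</strong> tail</li>' * 200 + "</ul>"
    soup = BeautifulSoup(html, "html.parser")
    # Collect the <strong> tags once so only the attribute access is timed.
    tags = [li("strong")[0] for li in soup("li", {"class": "textb"})]
    accessors = {
        "str(tag)": lambda: [str(t) for t in tags],
        "tag.string": lambda: [t.string for t in tags],
        "tag.text": lambda: [t.text for t in tags],
    }
    return time_accessors(accessors, repeat=3, number=20)
```

Calling `bs4_demo()` returns a dict of per-pass timings that can be compared directly. The numbers will vary by machine, parser, and library version, so they are best posted together with the HTML being tested.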

If you just want to print the string representation of tanim[0], you should do this: print str(tanim[0]). Also, run dir(tanim[0]) to check whether it has an attribute called string.

for olay in soup("li", {"class":"textb"}):
    tanim = olay("strong")
    try:
        print str(tanim[0])
    except IndexError:
        pass
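Note that `str(tanim[0])`, `.string`, and `.text` do not return the same thing, which matters if the goal is plain text. Here is a small sketch of the difference, assuming Python 3 and the `beautifulsoup4` package; the helper names `text_views` and `demo` and the sample markup are hypothetical.

```python
def text_views(tag):
    """Collect the three representations discussed in this thread."""
    return {
        "str(tag)": str(tag),      # full markup, including the tags themselves
        "tag.string": tag.string,  # the lone string inside the tag, or None
                                   # when the tag has several children
        "tag.text": tag.text,      # all nested text concatenated
    }


def demo():
    """Hypothetical example on made-up markup; needs beautifulsoup4."""
    from bs4 import BeautifulSoup  # assumption: bs4 is installed

    html = '<li class="textb"><strong>Hello <em>world</em></strong></li>'
    soup = BeautifulSoup(html, "html.parser")
    views = []
    for olay in soup("li", {"class": "textb"}):
        tanim = olay("strong")
        if tanim:  # skip <li> elements without a <strong> child
            views.append(text_views(tanim[0]))
    return views
```

For the sample markup, `str(tag)` keeps the `<strong>` and `<em>` tags, `tag.string` is `None` because `<strong>` has two children, and `tag.text` is `Hello world`.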

To get a better answer, you could also post the target HTML or URI and mention what you are trying to extract from it.

2011-12-17 11:09:10 gsbabil

Answers