Working with the output of BeautifulSoup in Python

Hey, I'm using BeautifulSoup (after two days of struggling unsuccessfully with Scrapy) to scrape StarCraft 2 league data, but I've run into a problem.

I have this table, and I want the string content of all the tags in it, which I get like this:

from BeautifulSoup import *
from urllib import urlopen

def parseWithSoup(url):
    print "Reading:", url
    html = urlopen(url).read().lower()
    bs = BeautifulSoup(html)
    # find the table whose id is "tblt_table"
    table = bs.find(lambda tag: tag.name=='table' and tag.has_key('id') and tag['id']=="tblt_table")
    rows = table.findAll(lambda tag: tag.name=='tr')

    rows.pop(0) #first row is header
    for row in rows:
        # collect the text of every link in this row
        tags = row.findAll(lambda tag: tag.name=='a')
        content = []
        for tagcontent in tags:
            content.append(tagcontent.string)
        print content

if __name__ == '__main__':
    content = "http://www.teamliquid.net/tlpd/sc2-international/games#tblt-5018-1-1-DESC"
    metSoup = parseWithSoup(content)

However, the output looks like this:

[u'+', u'gadget show live i..', u'crevasse', u'naniwa', u'socke'] 
[u'+', u'gadget show live i..', u'metalopolis 1.1', u'naniwa', u'socke'] 
[u'+', u'gadget show live i..', u'shakuras plateau 2.0', u'socke', u'select'] 
etc...

My question is: where does the u come from (is it from unicode?), and how can I remove it? I just need the strings that are inside the u'...'.

2011-04-17 Javaaaa

>>> l = [u'+', u'gadget show live i..', u'crevasse', u'naniwa', u'socke']
>>> l[1]
u'gadget show live i..'
>>> print l[1]  # still unicode, but with print there is no u
gadget show live i..
– snippsat 2011-04-17 15:36:03

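The u means the string is a Unicode string. It doesn't change anything for you as a programmer, and you should just ignore it. Treat these strings like normal strings. You actually want the u there.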

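Be aware that all Beautiful Soup output is unicode. That's a good thing, because if you run into any Unicode characters while scraping, you won't have any problems. If you really want to get rid of the u (I don't recommend it), you can use the unicode string's encode() method, which returns a byte string in the encoding you name.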
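To illustrate that the u is only notation and not part of the data, here is a minimal sketch using the first row of output from the question. It assumes Python 3, which still accepts the u'' literal (the question itself uses Python 2); the variable names are made up for the example.

```python
# Row copied from the output in the question; u'' literals are still valid on Python 3.
row = [u'+', u'gadget show live i..', u'crevasse', u'naniwa', u'socke']

# The prefix belongs to the literal notation, not to the string data:
# the first character of each item is the first character of the text.
first_chars = [item[0] for item in row]
print(first_chars)

# Unicode strings support the usual string operations.
line = u' | '.join(item.upper() for item in row)
print(line)
```

Nothing in the joined line carries a u, because the prefix never was part of the strings.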

2011-04-17 15:10:56

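Just curious: why don't you recommend it? Eventually I want to output to a .csv file, and I don't want the u in there (or will that be handled automatically?) – Javaaaa 2011-04-17 15:13:48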

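@Javaaa The u is only part of the representation Python uses when it shows you a data structure. If you write the string to standard output or to a file, the u doesn't actually appear. – 2011-04-17 15:15:29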
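For the .csv case, a rough sketch of what that looks like, assuming Python 3, where the csv module accepts text strings directly. On the Python 2 used in the question, fields containing non-ASCII characters would need to be encoded before writing. The helper name rows_to_csv and the in-memory buffer are only for illustration; a real script would write to a file opened with newline=''.

```python
import csv
import io

# Two rows copied from the output in the question.
rows = [
    [u'+', u'gadget show live i..', u'crevasse', u'naniwa', u'socke'],
    [u'+', u'gadget show live i..', u'metalopolis 1.1', u'naniwa', u'socke'],
]

def rows_to_csv(rows):
    """Return the given rows as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(rows)
print(csv_text)
```

The CSV text holds only the field values, so there is nothing to strip.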

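You can't use str() to convert a unicode string to a standard string; it fails as soon as the string contains a non-ASCII character. That's typical advice from people who only know ASCII. To properly convert a unicode string to a byte string, you need to use the some_unicode_string.encode(encoding) method. Calling str() on a unicode string is never appropriate. – 2011-04-17 15:15:50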
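A small sketch of the point about explicit encodings. The value u'caf\xe9' is a made-up non-ASCII example, not something from the scraped table, and the code assumes Python 3. On Python 2, str() on a unicode string implicitly uses the ASCII codec, so it hits the same error as the explicit encode('ascii') call shown here.

```python
# Hypothetical non-ASCII value, not taken from the scraped data.
text = u'caf\xe9'

# Encoding with an explicit codec turns the text into bytes.
utf8_bytes = text.encode('utf-8')
print(repr(utf8_bytes))

# The ASCII codec cannot represent the accented character.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)

# Decoding with the same codec gives the original text back.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == text)
```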

What you are seeing are Python unicode strings.

Check the Python documentation

http://docs.python.org/howto/unicode.html

for how to deal with unicode strings correctly.

2011-04-17 15:16:52

This all makes a lot of sense now. I had the same problem in another project using tweepy, thanks! – Javaaaa 2011-04-17 15:24:42
