Parsing HTML in Python 3.5 returns a strange type

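I am running Python 3.5 and trying to extract the BINGO data from this web page, and I have run into some problems. When I split the HTML response, I keep getting a list of strings prefixed with the letter b, which makes them impossible to check. I inspected the HTML output and found that it is of class bytes, which I am not familiar with. Why is this b in front of all my strings, and second, how can I parse an HTML page more cleanly?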

import urllib.request 
with urllib.request.urlopen('http://www.executiveadministrator.com/cgi-local/inoutPROhosted4/inoutPRO.pl?refresh=1&ID=AFTCO') as response: 
    html = response.read() 

htmllist = html.split() 

print(htmllist) 
for i in htmllist: 
    #if i == 'BINGO': 
    print(i)

Sample output: b'class="colorlinkbody">Renew' b'Board' b'Contract' b'Copyright' b'1996-2013' b''

Source

2017-02-24 M4dW0r1d

Because response.read() returns 'bytes', no longer 'str'. Use 'decode()'.

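As mentioned in the comment, response.read() returns bytes rather than str, so to get a string value out of a bytes object you have to call its decode(encoding) method. Change your print loop to: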

for i in htmllist: 
    print(i.decode('utf-8'))
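As an alternative to decoding every item inside the loop, the whole response can be decoded once before it is split. The following is only a minimal sketch: `raw` is a made-up sample standing in for the value of `response.read()`, and UTF-8 is an assumption about the page's encoding, not something confirmed for this site.

```python
# Hypothetical sample standing in for the bytes returned by response.read()
raw = b'<a class="colorlinkbody">Renew</a> Board Contract BINGO'

# Splitting bytes yields bytes objects, which print with a b'' prefix
byte_words = raw.split()

# Decode once (assuming the page is UTF-8), then split to get ordinary str objects
text = raw.decode('utf-8')
words = text.split()

print(type(byte_words[0]).__name__, type(words[0]).__name__)
print('BINGO' in words)        # str compared with str
print(b'BINGO' in byte_words)  # bytes compared with bytes
```

This also suggests why a check such as `i == 'BINGO'` would never match on the undecoded list: a bytes object does not compare equal to a str, so the comparison has to be made either against `b'BINGO'` or after decoding.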

Source

2017-02-24 15:04:33 metame

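Thanks. This seems like a clunky way to get a list of strings out of HTML. Is there a better way, meaning something other than urllib.request? In case it matters, I am on Windows. – M4dW0r1d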

It depends on what you want to do with them, but you should take a closer look at HTML parsing libraries such as 'lxml' or 'BeautifulSoup' (a.k.a. 'bs4'). – metame
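To illustrate what a parser buys you over splitting the raw markup, here is a rough sketch that uses the standard library's html.parser.HTMLParser rather than the lxml or bs4 libraries named above, so that it runs without installing anything. The class name `TextCollector` and the `sample_html` string are invented for illustration; in practice the input would be the decoded page.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the whitespace-separated words found between tags."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        # Called with the text between tags; tag names and attributes never arrive here
        self.words.extend(data.split())


# Hypothetical markup standing in for the decoded page
sample_html = '<td><a class="colorlinkbody" href="#">Renew</a></td><td>BINGO Board</td>'

collector = TextCollector()
collector.feed(sample_html)
collector.close()
print(collector.words)  # ['Renew', 'BINGO', 'Board']
```

Unlike html.split(), the result contains no fragments such as class="colorlinkbody">Renew, because only text nodes are collected. With bs4 installed, BeautifulSoup(html, 'html.parser').get_text() would be a comparable starting point.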
