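UnicodeEncodeError when fetching a URL
I'm using urlfetch to fetch a URL. When I try to pass the result to the html2text function (which strips out all the HTML tags), I get the following message: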
UnicodeEncodeError: 'charmap' codec can't encode characters in position ... character maps to <undefined>
I've been trying to call encode('utf-8', 'ignore') on the string, but I keep getting this error.
Any ideas?
Thanks,
Joel
Some code:
result = urlfetch.fetch(url="http://www.google.com")
html2text(result.content.encode('utf-8', 'ignore'))
And the error message:
File "C:\Python26\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 159-165: character maps to <undefined>
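The traceback ends inside cp1252.py, which suggests the failure happens when text is encoded to the Windows cp1252 code page (for example while writing to a console), not during the fetch itself. The following is a minimal sketch that reproduces the same kind of error without any network access. The Hebrew sample string and the choice of windows-1255 for the page bytes are assumptions made for illustration, not data from the original page.

```python
# Hypothetical sample text: four Hebrew letters, none of which exist in cp1252.
hebrew = u"\u05e9\u05dc\u05d5\u05dd"

# Simulate the raw bytes a server might send for a windows-1255 page.
raw = hebrew.encode("windows-1255")

# Decoding with the matching charset succeeds and gives back the Unicode text.
text = raw.decode("windows-1255")

# Encoding that text to cp1252 fails, because cp1252 has no Hebrew letters.
try:
    text.encode("cp1252")
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

print(failed)
print(message)
```

If this is what happens in the real program, then passing 'ignore' to an earlier encode('utf-8', ...) call would not help, because the failing encode is a different, later one that targets cp1252.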
Please add 'content_type = result.headers.getheader('Content-Type'); print(content_type)' to your code (after 'result = urlfetch.fetch(...)') and tell us the result. – unutbu 2010-09-12 17:01:32
The output is: "windows-1255". I tried switching to html2text(result.content.decode('windows-1255', 'ignore')), but I still get "UnicodeEncodeError: 'charmap' codec can't encode characters in position 2-8: character maps to <undefined>" – Joel 2010-09-12 17:14:34
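One possible way to handle this, sketched below, is to decode the fetched bytes with the charset named in the Content-Type header and then use an error handler only at the point where the text is encoded for output. The helper names charset_from_content_type, decode_body and safe_for_console are hypothetical, the header is assumed to have the usual "text/html; charset=..." form, and the output encoding is assumed to be cp1252 as in the traceback. The sample body is made up, and no urlfetch call is made.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type header, or a default."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


def decode_body(raw_bytes, content_type):
    """Decode raw response bytes using the charset declared in the header."""
    charset = charset_from_content_type(content_type)
    # 'replace' substitutes U+FFFD for undecodable bytes instead of raising.
    return raw_bytes.decode(charset, "replace")


def safe_for_console(text, console_encoding="cp1252"):
    """Replace characters the console encoding cannot represent with '?'."""
    return text.encode(console_encoding, "replace").decode(console_encoding)


# Made-up stand-in for result.content and the Content-Type header.
body = u"<p>\u05e9\u05dc\u05d5\u05dd</p>".encode("windows-1255")
text = decode_body(body, "text/html; charset=windows-1255")

printable = safe_for_console(text)
print(printable)
```

The decoded text keeps the Hebrew characters intact for further processing, and only the copy prepared for a cp1252 console loses them. Writing the text to a file opened with UTF-8 encoding would avoid the loss entirely.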