lxml Unicode output problem

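I'm new to Python and lxml, so please bear with me. Right now I'm stuck on what looks like a Unicode problem. I've tried .encode and Beautiful Soup's UnicodeDammit, with no luck. I've searched forums and the web, but my limited Python skills have kept me from applying the suggested solutions to my particular code. Any help is appreciated, thanks.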

Code:

import requests 
import lxml.html 

sourceUrl = "http://www.hkex.com.hk/eng/market/sec_tradinfo/stockcode/eisdeqty.htm" 

sourceHtml = requests.get(sourceUrl) 

htmlTree = lxml.html.fromstring(sourceHtml.text) 

for stockCodes in htmlTree.xpath('''/html/body/printfriendly/table/tr/td/table/tr/td/table/tr/table/tr/td'''): 
    string = stockCodes.text 
    print string

Error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)
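The character in the message, u'\xa0', is a non-breaking space (typically what &nbsp; in the page's HTML becomes after parsing). The error appears when that Unicode text has to be encoded with the ASCII codec on its way out. The following Python 3 sketch reproduces the same failure without any network access, assuming an in-memory stream as a stand-in for the console or a redirected file; write_through and the cell text are illustrative, not part of the original code:

```python
import io


def write_through(text, encoding):
    """Write `text` to an in-memory text stream that encodes with `encoding`.

    Returns the bytes that reached the underlying buffer. Raises
    UnicodeEncodeError if the codec cannot represent a character,
    just as printing u'\\xa0' to an ASCII-encoded stream does.
    """
    buffer = io.BytesIO()  # stands in for the console or a redirected file
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    data = buffer.getvalue()
    stream.detach()  # leave the buffer open; its contents are already copied
    return data


cell = "\xa0" + "0001"  # hypothetical cell text: NBSP followed by a stock code
print(write_through(cell, "utf-8"))  # UTF-8 can represent U+00A0

try:
    write_through(cell, "ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc)
```

The ASCII case fails with the same "'ascii' codec can't encode character" message as in the question, while UTF-8 succeeds.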


2013-04-07 Om Nom

Could you give more details about the error? Or add a line 'print type(string)' just before 'print string' to see what's going on. – iceout 2013-04-07 14:46:04
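The type check suggested here can be taken one step further with repr(), which escapes invisible characters, so a non-breaking space at the start of a cell becomes obvious. A small Python 3 sketch; the helper name describe is hypothetical:

```python
def describe(value):
    """Return the type name and repr of a value, for quick debugging.

    repr() escapes non-printable characters, so a non-breaking space
    shows up as \\xa0 instead of looking like an ordinary space.
    """
    return "%s %r" % (type(value).__name__, value)


print(describe("\xa00001"))        # text, as lxml returns it
print(describe(b"\xc2\xa00001"))   # the same text encoded as UTF-8 bytes
```

Under Python 2, the same helper reports the lxml result as unicode with a u'...' literal.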

When I run the code as python lx.py, I don't get this error. It only happens when I redirect the output to a file with python lx.py > output.txt. So try this:

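# -*- coding: utf-8 -*-
import requests
import lxml.html
import sys

# Python 2 only: site.py deletes sys.setdefaultencoding at startup,
# and reload(sys) makes it available again.
reload(sys)
# Use UTF-8 instead of ASCII for implicit unicode <-> str conversions.
sys.setdefaultencoding('utf-8')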

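This changes the default encoding from ASCII to UTF-8. The Python runtime uses the default encoding whenever it implicitly converts between byte strings and Unicode.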
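Changing the default encoding affects every implicit conversion in the whole process. A narrower alternative that is commonly suggested is to encode explicitly at the point of output; in Python 2 that would be print string.encode('utf-8'). The sketch below shows the same idea in Python 3, writing UTF-8 bytes to a binary stream (sys.stdout.buffer in a real script, an io.BytesIO here so it runs anywhere). The helper print_encoded and the sample cell values are hypothetical:

```python
import io
import sys


def print_encoded(text, stream=None, encoding="utf-8"):
    """Encode `text` explicitly and write it, plus a newline, as bytes.

    Writing bytes means the stream's own (possibly ASCII) text encoding
    is never involved. None, which lxml's .text returns for an element
    with no leading text, is written as an empty line.
    """
    if stream is None:
        stream = sys.stdout.buffer  # the binary layer under sys.stdout
    if text is None:
        text = ""
    stream.write(text.encode(encoding) + b"\n")


out = io.BytesIO()
for cell in ["\xa00001", None, "0002"]:  # hypothetical cell texts
    print_encoded(cell, out)
print(out.getvalue())
```

The None check matters for the question's loop too: td elements whose text sits inside a child element have .text set to None, and calling .encode() on None would fail.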


2013-04-07 08:06:05 iceout

Thanks. So you don't see the error when the output goes to the screen? May I ask which Python version you're using? I'm running 2.7.3. – 2013-04-07 08:38:04

Also, I tried your suggestion, but no joy. – 2013-04-07 08:38:33

I'm using 2.6. Which OS are you on, Linux or Windows? – iceout 2013-04-07 09:36:44
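The screen-versus-file difference discussed in these comments comes from how Python 2 chooses an encoding for print: a terminal reports its encoding through sys.stdout.encoding, while a redirected stdout reports None, and print then falls back to the default encoding (ASCII unless changed). A Python 3 sketch modelling that rule; print_encoding and RedirectedStdout are hypothetical stand-ins, not real Python 2 internals:

```python
import io


def print_encoding(stream, default="ascii"):
    """Model of the encoding Python 2's print uses for unicode text on `stream`.

    A stream that reports an encoding (e.g. a terminal) uses it; one that
    reports None (e.g. stdout redirected to a file) falls back to `default`,
    which is ASCII unless sys.setdefaultencoding() changed it.
    """
    return getattr(stream, "encoding", None) or default


class RedirectedStdout:
    """Hypothetical stand-in for a Python 2 stdout redirected to a file."""

    encoding = None


terminal = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")  # stand-in terminal
print(print_encoding(terminal))
print(print_encoding(RedirectedStdout()))
print(print_encoding(RedirectedStdout(), default="utf-8"))
```

The last call corresponds to the setdefaultencoding('utf-8') workaround: only the fallback changes, so it helps the redirected case and leaves terminal output alone.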

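In requests, the text attribute returns the body already decoded to Unicode, while the content attribute holds the raw bytes. You could also try sourceHtml.text.encode('utf-8') or sourceHtml.text.encode('ascii'), but I'm fairly sure the latter will raise the same exception.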
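The relationship between the two requests attributes can be sketched without a network call. As a rough model, Response.text is Response.content decoded with Response.encoding, with undecodable bytes replaced. The body below is a hypothetical UTF-8 fragment, and decode_body is an illustrative helper, not part of requests:

```python
def decode_body(content, encoding="utf-8"):
    """Rough model of how requests builds Response.text from Response.content."""
    return content.decode(encoding, errors="replace")


raw = b"<td>\xc2\xa00001</td>"  # hypothetical body bytes: UTF-8 NBSP + "0001"
text = decode_body(raw)

print(repr(text))                   # Unicode text, NBSP included
print(text.encode("utf-8") == raw)  # encoding to UTF-8 restores the bytes

try:
    text.encode("ascii")            # the variant expected to fail
except UnicodeEncodeError as exc:
    print("ascii failed:", exc)
```

This also shows why .encode('ascii') cannot help here: ASCII has no code for U+00A0.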


2013-04-08 17:02:35
