Encoding gives "'ascii' codec can't encode character ... ordinal not in range(128)"

I am working through the Django RSS reader project here. Encoding gives "'ascii' codec can't encode character ... ordinal not in range(128)"

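The RSS feed reads something like "OKLAHOMA CITY (AP) - James Harden let". The feed declares encoding="UTF-8", so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes.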

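I get the Django error "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)", which is a UnicodeEncodeError. In the variables being passed, I see "OKLAHOMA CITY (AP) \u2014 James Harden". The line of code that doesn't work is: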

content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")
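A minimal way to reproduce the encode error outside Django, as a sketch in Python 3 (where text is 'str' and byte strings are 'bytes', unlike the 'unicode'/'str' split in the Python 2.4 used here). The headline is an assumed sample modeled on the feed text above:

```python
# Python 3 sketch: encoding an em dash with the strict ASCII codec fails.
headline = "OKLAHOMA CITY (AP) \u2014 James Harden"

try:
    headline.encode("ascii")  # "strict" error handling is the default
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; 'exc' is unbound after the except block
    print(exc.reason)                                  # ordinal not in range(128)
    # exc.start is the index of the first character ASCII cannot represent
    print(exc.start, hex(ord(headline[exc.start])))    # 19 0x2014
```

The position in the real error (109) differs only because the real content is longer.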

I am using markdown 2.0, django 1.1, and python 2.4.

What is the magic sequence of encoding and decoding I need to make this work?

(In response to Prometheus's request. I agree the formatting helps.)

So in my view, I added a smart_unicode line above the parsed_feed encoding line...

content = smart_unicode(content, encoding='utf-8', strings_only=False, errors='strict')
content = content.encode(parsed_feed.encoding, "xmlcharrefreplace")

This pushes the problem into my models.py, where I have

def save(self, force_insert=False, force_update=False):
    if self.excerpt:
        self.excerpt_html = markdown(self.excerpt)
        # super save after this

If I change the save method to...

def save(self, force_insert=False, force_update=False):
    if self.excerpt:
        encoded_excerpt_html = self.excerpt.encode('utf-8')
        self.excerpt_html = markdown(encoded_excerpt_html)

I get the error "'ascii' codec can't decode byte 0xe2 in position 141: ordinal not in range(128)" because now it reads "\xe2\x80\x94" where the em dash was.
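Those three bytes are the UTF-8 encoding of the em dash. A Python 3 sketch of why they fail: decoding them as ASCII, which is roughly what Python 2 does implicitly when a byte string is mixed with unicode text (presumably inside markdown here), rejects the very first byte, 0xe2:

```python
# Python 3 sketch: the em dash after .encode('utf-8'), and why an ASCII decode fails.
dash = "\u2014"
utf8_bytes = dash.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x94'

try:
    # Explicit here; Python 2 would do this implicitly with its ASCII default encoding.
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc
    print(hex(exc.object[exc.start]))  # 0xe2, the first byte ASCII rejects

# Decoding with the encoding the bytes were actually written in round-trips cleanly.
assert utf8_bytes.decode("utf-8") == dash
```

Python 3 does not perform that implicit decode; mixing 'bytes' and 'str' raises a TypeError instead, which makes the mistake easier to spot.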


2010-03-25 user140314

Could you please post the traceback as-is? – tzot 2010-03-26 12:50:20

Basically, what is the value of 'parsed_feed.encoding'? Is it by any chance 'ascii'? (That would explain your error.) – tzot 2010-03-26 12:52:30

Django provides a couple of useful functions for converting back and forth between Unicode and bytestrings:

from django.utils.encoding import smart_unicode, smart_str
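The pattern behind helpers like these is to convert to Unicode once where data comes in, work on text, and convert back to bytes once where it goes out. Django's own implementation is not reproduced here; the Python 3 sketch below uses two hypothetical helpers, to_text and to_bytes, purely to illustrate that pattern:

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Hypothetical helper: return text (str), decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return str(value)


def to_bytes(value, encoding="utf-8", errors="strict"):
    """Hypothetical helper: return bytes, encoding text if needed."""
    if isinstance(value, bytes):
        return value
    return str(value).encode(encoding, errors)


# Decode once at the boundary, work on text, encode once on the way out.
raw = b"OKLAHOMA CITY (AP) \xe2\x80\x94 James Harden"
text = to_text(raw)
print(text.endswith("James Harden"))  # True
print(to_bytes(text) == raw)          # True
```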


2010-03-25 04:24:12 nikola

Using... content = smart_unicode(content, encoding='utf-8', strings_only=False, errors='strict') content = content.encode(parsed_feed.encoding, "xmlcharrefreplace") pushes the problem to my models.py, where I have def save(self, force_insert=False, force_update=False): if self.excerpt: self.excerpt_html = markdown(self.excerpt) # super save after this. If I change the save method to have encoded_excerpt_html = (self.excerpt).encode('utf-8') self.excerpt_html = markdown(encoded_excerpt_html) – user140314 2010-03-25 05:00:55

Part 2: I get the error "'ascii' codec can't decode byte 0xe2 in position 141: ordinal not in range(128)" because now it reads "\xe2\x80\x94" where the em dash is. – user140314 2010-03-25 05:01:21

Could you edit your original post with the above? It's very hard to read without proper formatting. – nikola 2010-03-25 08:00:51

If the data you are receiving is actually encoded in UTF-8, then it should be a sequence of bytes: a Python 'str' object in Python 2.X.

You can verify this with an assertion:

assert isinstance(content, str)

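Once you know that's true, you can move on to the actual encoding. Python doesn't transcode, for example directly from UTF-8 to ASCII. You first need to turn your byte sequence into a Unicode string by decoding it: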

unicode_content = content.decode('utf-8')

(If you can trust parsed_feed.encoding, use it instead of the literal 'utf-8'. Either way, be prepared for errors.)
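The same check and decode step in Python 3, where the byte type is 'bytes' rather than 'str'. The content value is an assumed sample, not real feed data:

```python
# Python 3 sketch: content arriving from the feed is assumed to be UTF-8 bytes.
content = b"OKLAHOMA CITY (AP) \xe2\x80\x94 James Harden"
assert isinstance(content, bytes)  # Python 3 counterpart of isinstance(content, str)

unicode_content = content.decode("utf-8")
# The three UTF-8 bytes of the em dash collapse into a single character.
print(len(content), len(unicode_content))  # 35 33
```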

You can then take that string and encode it in ASCII, replacing high characters with their XML character reference equivalents:

xml_content = unicode_content.encode('ascii', 'xmlcharrefreplace')
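A quick Python 3 check of what the 'xmlcharrefreplace' handler produces for the em dash (U+2014, decimal 8212), using an assumed sample headline:

```python
unicode_content = "OKLAHOMA CITY (AP) \u2014 James Harden"

# Characters outside ASCII become decimal XML character references.
xml_content = unicode_content.encode("ascii", "xmlcharrefreplace")
print(xml_content)  # b'OKLAHOMA CITY (AP) &#8212; James Harden'
```

The result is pure ASCII, so it survives any later ASCII-only handling, and an HTML or XML consumer renders '&#8212;' as the em dash.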

The full method, then, would look something like this:

try:
    content = content.decode(parsed_feed.encoding).encode('ascii', 'xmlcharrefreplace')
except UnicodeDecodeError:
    # Couldn't decode the incoming string -- possibly not encoded in the declared encoding
    # Do something here to report the error; re-raise until that is in place
    raise
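A Python 3 sketch of the full method as a standalone function. The name feed_bytes_to_ascii_xml and the choice to raise ValueError are hypothetical. It also validates the declared encoding name with codecs.lookup, which covers the case raised in the comments where parsed_feed.encoding turns out to be wrong or unknown:

```python
import codecs


def feed_bytes_to_ascii_xml(raw, declared_encoding):
    """Hypothetical helper: decode feed bytes with the declared encoding, then
    re-encode as ASCII with XML character references for everything else."""
    try:
        # codecs.lookup normalizes aliases such as "UTF-8" or "us-ascii"
        # and raises LookupError for names Python does not know.
        codec_name = codecs.lookup(declared_encoding).name
    except LookupError:
        raise ValueError("unknown feed encoding: %r" % declared_encoding) from None

    try:
        text = raw.decode(codec_name)
    except UnicodeDecodeError as exc:
        # The bytes do not match the declared encoding; report where it failed.
        raise ValueError(
            "feed is not valid %s at byte %d" % (codec_name, exc.start)
        ) from exc

    return text.encode("ascii", "xmlcharrefreplace")


print(feed_bytes_to_ascii_xml(b"(AP) \xe2\x80\x94 Harden", "UTF-8"))  # b'(AP) &#8212; Harden'
```

If a feed declares 'us-ascii' but actually sends UTF-8, this version reports the offending byte offset in its error message rather than a bare codec error.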


2011-12-30 23:14:10

I ran into this error while writing file names into a zip file. The following failed

ZipFile.write(root+'/%s'%file, newRoot + '/%s'%file)

and the following worked

ZipFile.write(str(root+'/%s'%file), str(newRoot + '/%s'%file))
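In Python 3, zipfile accepts str archive names directly, including non-ASCII ones, so the str() conversion is unnecessary there; in Python 2, str() only helps when the names are pure ASCII. A minimal sketch with an assumed file name, using an in-memory archive and writestr so no file on disk is needed:

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # A non-ASCII archive name, passed as plain text with no conversion.
    archive.writestr("reports/caf\u00e9.txt", "hello")

# Closing the ZipFile leaves the BytesIO open, so it can be read back.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    data = archive.read("reports/caf\u00e9.txt")

print(names, data)
```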


2012-09-07 02:33:20 highvelcty

Calling 'str()' on a unicode value containing non-ASCII characters results in exactly the same error the OP saw. – 2012-09-25 15:00:53

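@MartijnPieters: Hi, that's a very important point. I couldn't find what 'str()' actually does in the [fine manual](http://docs.python.org/2/library/functions.html#str), but I attribute that to my being a Python noob rather than to any fault of the manual. Where is it documented what 'str()' does to its argument and what 'str()' returns? Thanks! – dotancohen 2013-06-12 07:59:36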

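'str()' returns a *byte string*: characters with values between 0 and 255, where 0-127 are usually interpreted and displayed as ASCII characters. A 'unicode()' value, on the other hand, can represent any code point in the Unicode standard, between 0 and 1114111. So converting unicode to a byte string with 'str(unicodevalue)' necessarily involves *some* conversion. – 2013-06-12 12:29:27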
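The same distinction in Python 3 terms ('bytes' for byte strings, 'str' for text), as a small sketch; sys.maxunicode is 1114111 on Python 3.3 and later:

```python
import sys

# A byte string holds values 0-255; ASCII only defines 0-127.
print(bytes([0, 127, 255]))   # b'\x00\x7f\xff'

# Text can hold any code point from 0 up to sys.maxunicode (0x10FFFF).
print(sys.maxunicode)         # 1114111

# Turning text into bytes means picking an encoding: the em dash (code point
# 8212) needs three bytes in UTF-8 and has no representation in ASCII at all.
dash = "\u2014"
print(ord(dash), dash.encode("utf-8"))   # 8212 b'\xe2\x80\x94'
```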
