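How does Python 2 handle str and unicode internally?

I am confused about how Python handles unicode and str. I ran into the following cases in Python 2. How does Python 2 handle str and unicode internally?

The following statements were written to a .py file saved with UTF-8 encoding in the PyCharm IDE.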

print "hello! %s" % u"中国"
print "hello! %s" % "中国"
print u"hello! %s" % "中国"

Only the third case raises a decode error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128).

Can someone tell me how Python handles these statements, and why I get these results?


2016-03-04 user3876484

You may find this article helpful: Pragmatic Unicode (http://nedbatchelder.com/text/unipain.html), written by SO veteran Ned Batchelder.

If you drop the print statements, you can see more detail:

>>> "hello! %s" % u"中国" 
u'hello! \u4e2d\u56fd' 
>>> "hello! %s" % "中国" 
'hello! \xe4\xb8\xad\xe5\x9b\xbd' 
>>> u"hello! %s" % "中国" 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

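This gives us a clue about what is going on. Whenever a unicode string is involved, Python tries to convert the other operand to unicode; and, as usual, absent any indication to the contrary, it assumes the encoding is ASCII.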
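A minimal sketch of that implicit step, assuming the default encoding has not been changed from ASCII. The function name `implicit_coerce` is hypothetical and only stands in for what the interpreter does on its own; the byte values are the UTF-8 bytes shown in the REPL output above. The snippet uses `b"..."` literals so that it behaves the same on Python 2 and Python 3.

```python
# UTF-8 bytes of the two-character string used in the question.
utf8_bytes = b"\xe4\xb8\xad\xe5\x9b\xbd"


def implicit_coerce(byte_string):
    """Hypothetical stand-in for Python 2's automatic coercion:
    decode the byte string as ASCII (the assumed default encoding)."""
    return byte_string.decode("ascii")


# Pure-ASCII bytes decode without trouble, as in the first case.
ascii_ok = implicit_coerce(b"hello! %s")

# Non-ASCII bytes fail in the same way as the third case.
try:
    implicit_coerce(utf8_bytes)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The message printed here names the same codec, byte, and position as the traceback in the question, which is why the error mentions 'ascii' even though the file itself is UTF-8.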

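In the first case, it tries to convert the "hello" string to unicode; since that string contains no non-ASCII characters, the conversion works fine, and the existing unicode string can then be safely interpolated into the result.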

In the second case, both sides are byte strings, so no conversion is attempted; the result is still a byte string.
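As a rough illustration of the second case, the sketch below formats one byte string into another and checks that no decoding happens along the way. It spells the literals with a `b` prefix so that it also runs on Python 3 (where `%` on bytes needs 3.5 or later); in Python 2 a plain `"..."` literal is already a byte string, so this matches the original statement.

```python
template = b"hello! %s"
utf8_bytes = b"\xe4\xb8\xad\xe5\x9b\xbd"  # UTF-8 bytes from the REPL output

# Bytes formatted into bytes: the raw bytes are copied through untouched.
result = template % utf8_bytes

print(repr(result))
print(len(result))
```

The length counts bytes, not characters: seven for "hello! " plus six for the two encoded characters.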

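In the third case, "hello" is already unicode, so it tries to convert the other side; but since that side contains non-ASCII bytes, the conversion fails. Specifying the encoding explicitly does work, though: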

>>> u"hello! %s" % "中国".decode('utf-8') 
u'hello! \u4e2d\u56fd'
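One way to apply the explicit decode consistently is a small helper that turns byte strings into unicode before formatting. This is only a sketch: `to_unicode` is a hypothetical name, and UTF-8 is assumed as the encoding because that is how the question's file was saved.

```python
def to_unicode(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding.

    The utf-8 default is an assumption; pass the real encoding of your data.
    """
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


utf8_bytes = b"\xe4\xb8\xad\xe5\x9b\xbd"

# Both operands are unicode now, so no implicit ASCII decode is triggered.
message = u"hello! %s" % to_unicode(utf8_bytes)

print(repr(message))
```

Values that are already unicode pass through unchanged, so the helper can be applied to every operand without checking its type first.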


2016-03-04 10:12:02

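OK, thanks. I took two things from your answer. 1. When both operands are byte strings, Python does not convert them to unicode. Is this documented anywhere? 2. In the third case, the file is marked as UTF-8 encoded. How and why does Python decide it is ASCII encoded? – user3876484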
