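How is the source encoding applied to string literals?

PEP 263 specifies that the encoding declared in the source file is applied in the following order: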

1. read the file

2. decode it into Unicode assuming a fixed per-file encoding

3. convert it into a UTF-8 byte string

4. tokenize the UTF-8 content

5. compile it, creating Unicode objects from the given Unicode data and creating string objects from the Unicode literal data by first reencoding the UTF-8 data into 8-bit string data using the given file encoding
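
For an ordinary encoding, the steps above can be sketched roughly as follows. This is only a simulation written in Python 3, not the interpreter's actual code; the latin-1 sample file and the variable names are assumptions made for illustration.

```python
# Simulation of the PEP 263 steps for a Python 2 source file.
# The file content and all names here are hypothetical.

# Step 1: the raw bytes as read from disk (latin-1 encoded, so e-acute is 0xE9).
raw = b"# coding: latin-1\nprint '\xe9'\n"

# Step 2: decode into Unicode using the declared per-file encoding.
text = raw.decode("latin-1")

# Step 3: convert into a UTF-8 byte string (e-acute becomes 0xC3 0xA9).
utf8_source = text.encode("utf-8")

# Step 4: the tokenizer works on the UTF-8 form, so the literal it
# sees is two bytes long rather than one.
literal_as_tokenized = "\xe9".encode("utf-8")

# Step 5: a plain (8-bit) string literal is re-encoded back into the
# file encoding, so it ends up with the same bytes as in the file.
literal_as_compiled = literal_as_tokenized.decode("utf-8").encode("latin-1")

print(repr(literal_as_tokenized), repr(literal_as_compiled))
```

With a conventional encoding the round trip in step 5 is invisible, because the 8-bit string simply gets back the bytes that were in the file.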

So, if I take this code:

print 'abcdefgh' 
print u'abcdefgh'

and convert it to ROT-13:

# coding: rot13 

cevag 'nopqrstu' 
cevag h'nopqrstu'

I would expect it to be decoded first, become identical to the original, and print:

abcdefgh 
abcdefgh

But instead, it prints:

nopqrstu 
abcdefgh

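So the unicode literal works as expected, but the str literal is left unconverted. Why?

To eliminate some possibilities:

I verified that the problem is not in a later stage (printing to the console) but occurs right at parse time, because this code produces "ValueError: unsupported format character 'q' (0x71) at index 1":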

x = '%q' % 1 # that is %d !
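
The same check can be reproduced without a ROT-13 source file. The following is a small sketch in Python 3 (an assumption; the question itself uses Python 2), where `codecs.encode` with the `rot13` codec stands in for what happens to the literal:

```python
import codecs

# ROT-13 leaves '%' untouched and maps 'd' to 'q',
# so the format string '%d' turns into '%q'.
fmt = codecs.encode("%d", "rot13")

try:
    fmt % 1
    error = None
except ValueError as exc:
    # Formatting fails because 'q' is not a valid conversion type.
    error = str(exc)

print(fmt, error)
```

The failure happens during string formatting, before anything is printed, which is consistent with the literal itself holding the ROT-13 text.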


2016-10-04 zvone

I think the last point actually explains quite precisely what happens:

compile it, creating Unicode objects from the given Unicode data and creating string objects from the Unicode literal data by first reencoding the UTF-8 data into 8-bit string data using the given file encoding

After the first 4 steps, the content of the source file is the tokenized Unicode version of the following text:

print 'abcdefgh' 
print u'abcdefgh'

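After that, in step 5, the string object 'abcdefgh' is re-encoded using the given file encoding (which is ROT13), so its content becomes the following 8-bit string data: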

print 'nopqrstu' 
print u'abcdefgh'
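
The explanation above can be imitated with a few lines of Python 3. This is only a sketch of the described behaviour, assuming the `rot13` codec from the standard `codecs` module behaves like the declared file encoding; it is not how the Python 2 compiler is actually implemented, and the variable names are made up for illustration.

```python
import codecs

# File contents after the "# coding: rot13" line.
source = "cevag 'nopqrstu'\ncevag h'nopqrstu'"

# Steps 2-4: the whole file, literals included, is decoded
# from the declared encoding before it is tokenized.
decoded = codecs.decode(source, "rot13")

# Step 5: the plain str literal is re-encoded with the file
# encoding, while the u'' literal stays as decoded Unicode.
str_literal = codecs.encode("abcdefgh", "rot13")
unicode_literal = "abcdefgh"

print(decoded)
print(str_literal)
print(unicode_literal)
```

The last two printed lines match the output reported in the question: the str literal comes out as ROT-13 text and the unicode literal as plain text.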


2018-02-25 16:39:39 zvone

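Did you answer 2 years later because I added your question to my favorites just today? I was thinking about this problem a few hours ago and testing it in my terminal, and then boom! An answer. Anyway, thank you. I came here in the first place because I was trying to understand what step 5 means. – Maggyero

@Maggyero Yes, I got a notification about this question and decided that I knew the answer :D – zvone

So to summarize (correct me if I am wrong): the tokenizer only takes UTF-8 strings as input. Therefore the source code needs to be transcoded to UTF-8, that is, decoded from the declared source encoding and encoded to UTF-8. However, *byte string literals* in the source code are also decoded in that process, although they should not be, because they are literal (at that stage their type is not yet known). That is why in step 5 they are re-encoded to the declared source encoding. By the way, one last thing is still unclear: what is the output of the tokenizer, Unicode or UTF-8 strings? – Maggyero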
