Python formats Cyrillic text incorrectly

def inp(text): 
    tmp = str() 
    arr = ['.' for x in range(1, 40 - len(text))] 
    tmp += text + ''.join(arr) 
    print tmp 

s=['tester', 'om', 'sup', 'jope'] 
sr=['тестер', 'ом', 'суп', 'жопа'] 
for i in s: 
    inp(i) 
for i in sr: 
    inp(i)

Output:

tester................................. 
om..................................... 
sup.................................... 
jope................................... 

тестер........................... 
ом................................... 
суп................................. 
жопа...............................

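Why does Python handle Cyrillic incorrectly? The ends of the lines do not line up, which looks bad. Using format gives the same result. How can this be fixed? Thanks.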

Asked 2013-03-03 by Spouk

Read this:

http://docs.python.org/2/howto/unicode.html

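Basically, the text parameter of your inp function is a string, and in Python 2.7 strings are byte strings by default. Cyrillic characters do not map one-to-one to bytes: in UTF-8, for example, each one needs more than one byte (usually 2). So when you call len(text), you get the number of bytes, not the number of characters.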
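To make the difference between bytes and characters concrete, here is a minimal sketch. The variable names are mine, not from the question, and it is written so that it should run on both Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
# Illustrative only: `word` and `encoded` are hypothetical names.
word = u'суп'                    # a unicode string of three Cyrillic letters
encoded = word.encode('utf-8')   # the same text as a UTF-8 byte string

print(len(word))     # counts characters
print(len(encoded))  # counts bytes; each of these letters takes 2 bytes in UTF-8
```

The padding code in the question computes its width from the second kind of length, which is why the Cyrillic rows come out shorter.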

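To get the number of characters, you need to know which encoding your text is in. Assuming it is UTF-8, you can decode the text from that encoding, and then it prints correctly: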

#!/usr/bin/python 
# coding=utf-8 
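def inp(text):
    # text is a UTF-8 encoded byte string (a Python 2 str)
    tmp = str()
    utext = text.decode('utf-8')  # bytes -> unicode string
    l = len(utext)                # length in characters, not bytes
    arr = ['.' for x in range(1, 40 - l)]
    tmp += text + ''.join(arr)
    print tmp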

s=['tester', 'om', 'sup', 'jope'] 
sr=['тестер', 'ом', 'суп', 'жопа'] 
for i in s: 
    inp(i) 
for i in sr: 
    inp(i)

The important lines are these two:

    utext = text.decode('utf-8')
    l = len(utext)

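Here you first decode the text, which gives you a unicode string. After that, you can use the built-in len to get the length in characters, which is exactly what you want.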
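The same idea can be packaged as a small helper that decodes first and then pads. This is only a sketch under my own assumptions: pad_with_dots is a hypothetical name, the default width of 39 matches the total line width the code above produces, and str.ljust is swapped in for the list of dots. It should run on both Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
def pad_with_dots(text, width=39, encoding='utf-8'):
    """Return text padded on the right with dots to `width` characters.

    Hypothetical helper: accepts either a byte string (assumed to be
    encoded with `encoding`) or a unicode string.
    """
    if isinstance(text, bytes):
        # Decode so that the padding is computed per character, not per byte.
        text = text.decode(encoding)
    return text.ljust(width, u'.')


for word in [u'tester', u'суп', u'тестер'.encode('utf-8')]:
    print(pad_with_dots(word))
```

Because the padding is computed on the decoded string, the Latin and Cyrillic rows end up with the same number of characters.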

Hope this helps.

Answered 2013-03-03 03:48:14

Thank you very much. An accurate and detailed response. Thanks again. – Spouk 2013-03-03 10:49:40

@Spouk Sure, glad to help! – 2013-03-03 21:39:39
