2012-05-07 135 views
0
>>> print('\ufeff') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
UnicodeEncodeError: 'gbk' codec can't encode character '\ufeff' in position 0: illegal multibyte sequence 

I know how to make Python 3 print '\ufeff':

>>> stdout = open(1, 'w', encoding='gb2312', errors='ignore') 
>>> print('\ufeff', file=stdout) 

or

>>> print(repr('\ufeff')) 
'\ufeff' 

but both are too long. Is there a simpler way to do this?

Writing in English is really hard, isn't it? Are there any Chinese speakers on this forum? Please help a fellow out.

Answers

0

You seem to be trying to print a Unicode character to a terminal whose encoding does not support that character. That is essentially impossible. It may also be that the character in question should be part of the GBK encoding and that the Python implementation has a bug.

Your first solution, where you open stdout using gb2312, suggests that the terminal itself does support the character if you just change its encoding. That should be doable as a setting in the operating system somehow, and it is probably the best solution for you. If you can, switch to UTF-8 or UTF-16. They can encode all Unicode characters.

Otherwise, all you can do is filter the character out of what you are printing before you print it, or encode the text to bytes with errors='ignore' or errors='replace'.
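A minimal sketch of both fallbacks, assuming the terminal encoding is 'gbk' as in the traceback in the question. The helper name to_printable is made up for this illustration and is not part of any library.

```python
def to_printable(text, encoding='gbk', errors='replace'):
    """Round-trip text through `encoding` (hypothetical helper).

    Characters the codec cannot encode are replaced with '?'
    (errors='replace') or dropped (errors='ignore').
    """
    return text.encode(encoding, errors=errors).decode(encoding)


text = '\ufeffhello'

# Option 1: filter the character out before printing.
filtered = text.replace('\ufeff', '')

# Option 2: let the codec's error handler deal with it.
replaced = to_printable(text)                   # '\ufeff' becomes '?'
ignored = to_printable(text, errors='ignore')   # '\ufeff' is dropped

print(filtered)
print(replaced)
print(ignored)
```

The round trip through bytes is only there so the result can still be passed to print() as a str; writing the encoded bytes to sys.stdout.buffer would work as well.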

+0

Thank you very much! – beeang

1

'\ufeff' is a non-printable Unicode character with a special meaning. It is used as the UTF-16 BOM (Byte Order Mark) to detect the byte order used when two-byte integers are stored in memory (and later written to a file). When found at the beginning of a file, it should only help to detect how the hardware stores small integers, and then it should be ignored.

Have a look at http://en.wikipedia.org/wiki/Byte_order_mark for more details.
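To make the byte-order point concrete, here is a short sketch using only the standard codecs module. The sample payload 'hi' is arbitrary and chosen just for illustration.

```python
import codecs

bom = '\ufeff'

# The same character yields different byte sequences depending on byte order.
le = bom.encode('utf-16-le')   # little-endian
be = bom.encode('utf-16-be')   # big-endian
print(le == codecs.BOM_UTF16_LE, be == codecs.BOM_UTF16_BE)

# The generic 'utf-16' decoder reads the BOM, picks the byte order,
# and drops the BOM from the decoded text.
data = codecs.BOM_UTF16_BE + 'hi'.encode('utf-16-be')
decoded = data.decode('utf-16')
print(decoded)

# For UTF-8 files that start with a BOM, 'utf-8-sig' strips it,
# while plain 'utf-8' leaves '\ufeff' as the first character.
utf8_data = codecs.BOM_UTF8 + b'hi'
stripped = utf8_data.decode('utf-8-sig')
kept = utf8_data.decode('utf-8')
print(ascii(stripped), ascii(kept))
```

If the '\ufeff' in the question comes from reading a file, decoding with the matching codec (for example 'utf-8-sig') may remove it before it ever reaches print().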

+0

Thank you. That seems to be all I know how to say, sorry. – beeang

+0

Oh, *now* I notice the "errors='ignore'" in the second example. That's the difference, not the encoding. D'oh! –

+0

Actually, the question is whether the file is stored in UTF-16. If so, the file should not be opened with 'encoding='gb2312''. Another problem is that the console may not be capable of converting the Unicode string to the console encoding. Also, 'repr(unicodestr)' does not convert non-ASCII characters to ASCII escape sequences in Python 3 (Python 2 does that). Use the 'ascii()' function to get escape sequences when needed. – pepr
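A small sketch of the difference between repr() and ascii() in Python 3. The sample character 'é' is just an illustration. Note that repr() does still escape '\ufeff', because that character is non-printable, which is why the repr() call in the question works.

```python
printable = 'é'       # non-ASCII but printable
bom = '\ufeff'        # non-ASCII and non-printable

# repr() keeps printable non-ASCII characters as they are...
repr_printable = repr(printable)
# ...while ascii() always produces pure-ASCII escape sequences.
ascii_printable = ascii(printable)

# Both escape the non-printable BOM character.
repr_bom = repr(bom)
ascii_bom = ascii(bom)

print(ascii_printable)   # safe on any console: output is ASCII only
print(ascii_bom)
```

Printing the result of ascii() cannot raise UnicodeEncodeError on an ASCII-compatible console encoding such as GBK, since the output contains only ASCII characters.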