>>> teststring = 'aõ'
>>> type(teststring)
<type 'str'>
>>> teststring
'a\xf5'
>>> print teststring
aõ
>>> teststring.decode("ascii", "ignore")
u'a'
>>> teststring.decode("ascii", "ignore").encode("ascii")
'a'
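This is what I really wanted it to store internally, since I am removing non-ASCII characters. But why did decode("ascii", "ignore") give back a unicode string? What I am after is removing non-ASCII characters from any given string type in Python.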
>>> teststringUni = u'aõ'
>>> type(teststringUni)
<type 'unicode'>
>>> print teststringUni
aõ
>>> teststringUni.decode("ascii" , "ignore")
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    teststringUni.decode("ascii" , "ignore")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf5' in position 1: ordinal not in range(128)
>>> teststringUni.decode("utf-8" , "ignore")
Traceback (most recent call last):
  File "<pyshell#81>", line 1, in <module>
    teststringUni.decode("utf-8" , "ignore")
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf5' in position 1: ordinal not in range(128)
>>> teststringUni.encode("ascii" , "ignore")
'a'
Which is again what I wanted. I don't understand this behavior. Can someone explain to me what is happening here?
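For what it's worth, both transcripts are consistent with how Python 2 treats the two types: `decode` goes from a byte string (`str`) to `unicode`, and `encode` goes the other way. Calling `decode` on a `unicode` object makes Python 2 implicitly encode it with ASCII first, which is where the `UnicodeEncodeError` comes from. Below is a minimal sketch of a helper that accepts either type. The name `strip_non_ascii` is made up for illustration, and it is written so that it should also run on Python 3, where `bytes` and `str` play the roles of `str` and `unicode`.

```python
def strip_non_ascii(value):
    """Return `value` as an ASCII-only byte string, dropping everything else.

    Hypothetical helper for illustration; not part of any library.
    """
    if isinstance(value, bytes):
        # Byte string (`str` on Python 2): decode to text first,
        # silently skipping any byte outside the ASCII range.
        value = value.decode("ascii", "ignore")
    # Text (`unicode` on Python 2): encode back to bytes,
    # silently skipping any character outside the ASCII range.
    return value.encode("ascii", "ignore")


# Same two inputs as in the transcripts, written with escapes so the
# result does not depend on the console or source-file encoding.
print(repr(strip_non_ascii(b"a\xf5")))
print(repr(strip_non_ascii(u"a\xf5")))
```

Both calls return a byte string containing only `a`; on Python 2 that is a plain `str`, which seems to be the internal form wanted here.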
Edit: I thought this would help me understand things so I could solve my real program problem, which I describe here: Converting Unicode objects with non-ASCII symbols in them into strings objects (in Python)
This angle actually solved it =), thanks – fullmooninu 2010-09-08 15:57:56
If this doesn't work, also try using BeautifulSoup(html).encode for HTML, or the regex module – 2014-09-03 14:57:27
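The regex route mentioned in the last comment could look roughly like this. It is only a sketch using the standard `re` module; `remove_non_ascii` is a made-up name, and it assumes the input is already text (a `unicode` object on Python 2, `str` on Python 3) rather than raw bytes.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
_NON_ASCII = re.compile(r"[^\x00-\x7f]")


def remove_non_ascii(text):
    """Return `text` with every non-ASCII character removed.

    Hypothetical helper for illustration; the result is still text,
    not a byte string.
    """
    return _NON_ASCII.sub(u"", text)


print(repr(remove_non_ascii(u"a\xf5b")))
```

Unlike the encode/decode round trip, this keeps the value as text, so it only helps if a text result is acceptable.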