How do I remove escape characters (escaped Unicode characters) from a unicode string in Python 2.x?

>>> test 
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"' 
>>> test2 
'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"' 
>>> print test 
"Hello," he said‏‎. 
     "I am nine years oldâ" 
>>> print test2 
"Hello," he\u200b said\u200f\u200e. 
     "I\u200b am\u200b nine years old"

So how do I convert test2 into test (that is, get the Unicode characters printed rather than their escape sequences)? .decode('utf-8') does not do it.


2017-06-25 kawakaze

You can use the unicode-escape encoding to decode '\\u200b' into u'\u200b'.

>>> test1 = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"' 
>>> test2 = '"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"' 
>>> test2.decode('unicode-escape') 
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old"' 
>>> print test2.decode('unicode-escape') 
"Hello," he said‏‎. 
    "I am nine years old"

Note: even so, test2 cannot be decoded into an exact match for test1, because test1 contains a u'\xe2' just before the closing quote (").

>>> test1 == test2.decode('unicode-escape') 
False 
>>> test1.replace(u'\xe2', '') == test2.decode('unicode-escape') 
True
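
The same steps can be collected into a small helper. This is only a sketch: the name decode_escapes is made up for illustration, and it assumes the input is a byte string (str on Python 2, bytes on Python 3) that is plain ASCII apart from the backslash escapes. unicode-escape treats any non-ASCII byte as Latin-1, so UTF-8 encoded input would come out garbled.

```python
def decode_escapes(raw):
    """Turn literal backslash escapes such as \\u200b into real characters.

    `raw` is assumed to be an ASCII-only byte string
    (str on Python 2, bytes on Python 3).
    """
    return raw.decode('unicode-escape')


test1 = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
test2 = b'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'

decoded = decode_escapes(test2)

# test1 carries an extra u'\xe2' before the closing quote, so the two only
# match once that character is dropped from test1.
matches_exactly = (decoded == test1)
matches_without_circumflex = (decoded == test1.replace(u'\xe2', u''))
```

The b'' prefix keeps the literal a byte string under both Python 2.6+ and Python 3, which is why .decode() is available on it in either version.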


2017-06-25 04:47:25 falsetru

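Are the escaped Unicode characters actually being printed? Given the zero-width spaces in this example, I can't tell. I assume u'\xe2' can't be printed because it isn't Unicode? – kawakaze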

@kawakaze, u'\xe2' is 'LATIN SMALL LETTER A WITH CIRCUMFLEX'. You can check that with unicodedata.name(u'\xe2'). – falsetru

Thank you very much! – kawakaze
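
Since zero-width characters are invisible when printed, one way to confirm they are really there is to list the name of every non-ASCII character, using the unicodedata.name check mentioned in the comments. A minimal sketch; describe_non_ascii and the shortened sample string are illustrative and not part of the original thread:

```python
import unicodedata


def describe_non_ascii(text):
    """Return (code point, Unicode name) pairs for each non-ASCII character."""
    return [('U+%04X' % ord(ch), unicodedata.name(ch, 'UNKNOWN'))
            for ch in text if ord(ch) > 127]


# Shortened version of the string from the question.
sample = u'he\u200b said\u200f\u200e old\xe2'

for code_point, name in describe_non_ascii(sample):
    print('%s %s' % (code_point, name))
# U+200B ZERO WIDTH SPACE
# U+200F RIGHT-TO-LEFT MARK
# U+200E LEFT-TO-RIGHT MARK
# U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
```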
