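Usage of unicode() and encode() functions in Python

I have a problem with the encoding of the path variable when inserting it into an SQLite database. I tried to solve it with the encode("utf-8") function, which didn't help. Then I used the unicode() function, which gives me the type unicode.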

print type(path)     # <type 'unicode'> 
path = path.replace("one", "two") # <type 'str'> 
path = path.encode("utf-8")  # <type 'str'> strange 
path = unicode(path)    # <type 'unicode'>

In the end I got the unicode type, but I still get the same error that appeared when the type of the path variable was str:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

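Could you help me solve this error and explain the correct usage of the encode("utf-8") and unicode() functions? I fight with this often.

EDIT:

This execute() statement raised the error: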

cur.execute("update docs set path = :fullFilePath where path = :path", locals())

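I forgot to change the encoding of the fullFilePath variable, which suffers from the same problem, but now I'm quite confused. Should I use only unicode(), only encode("utf-8"), or both?

I can't use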

fullFilePath = unicode(fullFilePath.encode("utf-8"))

because it raises this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 32: ordinal not in range(128)

The Python version is 2.7.2.

Source

2012-04-23 xralf

Where is the code that raises the error? – newtover 2012-04-23 20:48:04

Your exact problem has already been answered: http://stackoverflow.com/questions/2392732/sqlite-python-unicode-and-non-utf-data – garnertb 2012-04-23 20:51:25

@newtover I edited the question. – xralf 2012-04-23 20:55:58

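You are using encode("utf-8") incorrectly. Python byte strings (the str type) have an encoding; Unicode strings do not. You can convert a Unicode string to a Python byte string with uni.encode(encoding), and you can convert a byte string to a Unicode string with s.decode(encoding) (or, equivalently, unicode(s, encoding)).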
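To make this concrete, here is a minimal sketch of why unicode(fullFilePath.encode("utf-8")) fails. It is written to run on both Python 2.7 and Python 3, so it spells out the ASCII decode that Python 2's unicode(s) performs implicitly when no encoding is passed. The sample path is made up for illustration.

```python
# Hypothetical path containing U+0159, which UTF-8 encodes as 0xC5 0x99.
text = u"/home/user/p\u0159\u00edklad.txt"

# Unicode -> bytes: always name the encoding explicitly.
raw = text.encode("utf-8")

# bytes -> Unicode with the matching encoding restores the original text.
assert raw.decode("utf-8") == text

# Decoding as ASCII is what Python 2's unicode(raw) does when no encoding
# is given, and it fails on the first byte >= 0x80.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(ascii_failed)
```

So encoding and then calling unicode() without an encoding cannot work for non-ASCII paths; the value either has to stay Unicode or be decoded with the encoding it was written in.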

If fullFilePath and path are currently of type str, you should figure out how they are encoded. For example, if the current encoding is UTF-8, you would use:

path = path.decode('utf-8') 
fullFilePath = fullFilePath.decode('utf-8')

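If this doesn't fix it, the actual problem may be that you are not passing a Unicode string in your execute() call. Try changing it to the following: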

cur.execute(u"update docs set path = :fullFilePath where path = :path", locals())
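For reference, a self-contained sketch of the same update against an in-memory database. The single-column docs table and the sample paths are assumptions (the real schema is not shown in the question), and an explicit dict replaces locals() so it is obvious which values get bound.

```python
import sqlite3

# In-memory database; the table layout here is an assumption.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(u"create table docs (path text)")

# Hypothetical non-ASCII path, kept as Unicode text throughout.
old_path = u"/tmp/p\u0159\u00edklad/one.txt"
cur.execute(u"insert into docs (path) values (:path)", {"path": old_path})

# No encode() before handing the values to sqlite3.
new_path = old_path.replace(u"one", u"two")
cur.execute(
    u"update docs set path = :fullFilePath where path = :path",
    {"fullFilePath": new_path, "path": old_path},
)
conn.commit()

stored = cur.execute(u"select path from docs").fetchone()[0]
print(stored == new_path)
```

The point of the sketch is only that both bound values are Unicode strings when they reach execute(); any byte strings read from elsewhere would need a decode() first.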

Source

2012-04-23 21:15:32

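The statement `fullFilePath = fullFilePath.decode("utf-8")` still raises the error `UnicodeEncodeError: 'ascii' codec can't encode characters in position 32-34: ordinal not in range(128)`. fullFilePath is a combination of a string of type *str* and a string taken from a *text* column of a db table, which should be UTF-8 encoded. – xralf 2012-04-23 21:25:40

But according to [this](http://www.sqlite.org/datatype3.html) it can be UTF-8, UTF-16BE or UTF-16LE. Can I find out which one somehow? – xralf 2012-04-23 21:31:27

@xralf, if you are combining different `str` objects, you may be mixing encodings. Can you show the result of `print repr(fullFilePath)`? – 2012-04-23 21:34:40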

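str is a representation of text in bytes; unicode is a representation of text in characters.

You decode text from bytes to unicode, and you encode unicode into bytes using some encoding.

That is: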

>>> 'abc'.decode('utf-8') # str to unicode 
u'abc' 
>>> u'abc'.encode('utf-8') # unicode to str 
'abc'
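With pure ASCII the two forms look identical, so a non-ASCII character shows the difference better. A small sketch (the sample word is arbitrary, and the code runs on both Python 2.7 and 3):

```python
# U+00E9 needs two bytes in UTF-8 but only one in Latin-1.
text = u"caf\u00e9"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(len(text))          # number of characters
print(len(utf8_bytes))    # number of bytes in UTF-8
print(len(latin1_bytes))  # number of bytes in Latin-1

# Decoding with a different encoding than the one used to encode fails here.
try:
    latin1_bytes.decode("utf-8")
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

print(mismatch_failed)
```

This is why you need to know how a byte string was encoded before you can decode it.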

Source

2012-04-23 21:08:53 newtover

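Thanks. Very useful information. – Rupam 2016-06-13 15:05:08

Very clean answer, very smart, thanks – 2017-07-18 14:41:47

Very good answer, straight to the point. I would add that `unicode` stands for letters or symbols, or more generally: **runes**, whereas `str` represents a byte string in a certain encoding, which you must `decode` (in the proper encoding, obviously) to get the specific runes – arainone 2017-08-28 14:44:05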

Make sure your locale settings are set correctly before running the script from the shell, e.g.

$ locale -a | grep "^en_.\+UTF-8" 
en_GB.UTF-8 
en_US.UTF-8 
$ export LC_ALL=en_GB.UTF-8 
$ export LANG=en_GB.UTF-8

Docs: man locale, man setlocale.
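To see which encodings the interpreter derived from those settings, something like the following sketch can be run from the same shell. The helper name describe_encodings is made up for illustration; the calls it wraps are from the standard locale and sys modules.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python derived from the environment."""
    return {
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        # stdout may be replaced by an object without an encoding attribute
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in sorted(describe_encodings().items()):
    print("%s: %s" % (name, value))
```

The values depend on the machine, so no particular output is expected. Changing the locale only affects these defaults; it does not change how existing byte strings were encoded.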

Source

2017-09-26 11:56:15 kenorb
