2010-05-14

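Django encoding problem with MySQL

OK, so I have a MySQL database set up. Most of the tables are latin1, and Django handles them fine. Some of them, however, are UTF-8, and Django does not handle those correctly.

Here is a sample table (these tables all come from the Django geonames app):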

DROP TABLE IF EXISTS `geoname`; 
SET @saved_cs_client  = @@character_set_client; 
SET character_set_client = utf8; 
CREATE TABLE `geoname` (
    `id` int(11) NOT NULL, 
    `name` varchar(200) NOT NULL, 
    `ascii_name` varchar(200) NOT NULL, 
    `latitude` decimal(20,17) NOT NULL, 
    `longitude` decimal(20,17) NOT NULL, 
    `point` point default NULL, 
    `fclass` varchar(1) NOT NULL, 
    `fcode` varchar(7) NOT NULL, 
    `country_id` varchar(2) NOT NULL, 
    `cc2` varchar(60) NOT NULL, 
    `admin1_id` int(11) default NULL, 
    `admin2_id` int(11) default NULL, 
    `admin3_id` int(11) default NULL, 
    `admin4_id` int(11) default NULL, 
    `population` int(11) NOT NULL, 
    `elevation` int(11) NOT NULL, 
    `gtopo30` int(11) NOT NULL, 
    `timezone_id` int(11) default NULL, 
    `moddate` date NOT NULL, 
    PRIMARY KEY (`id`), 
    KEY `country_id_refs_iso_alpha2_e2614807` (`country_id`), 
    KEY `admin1_id_refs_id_a28cd057` (`admin1_id`), 
    KEY `admin2_id_refs_id_4f9a0f7e` (`admin2_id`), 
    KEY `admin3_id_refs_id_f8a5e181` (`admin3_id`), 
    KEY `admin4_id_refs_id_9cc00ec8` (`admin4_id`), 
    KEY `fcode_refs_code_977fe2ec` (`fcode`), 
    KEY `timezone_id_refs_id_5b46c585` (`timezone_id`), 
    KEY `geoname_52094d6e` (`name`) 
) ENGINE=MyISAM DEFAULT CHARSET=utf8; 
SET character_set_client = @saved_cs_client; 

Now, if I fetch data from the table directly with MySQLdb and a cursor, I get the text in the correct encoding:

>>> import MySQLdb 
>>> from django.conf import settings 
>>> 
>>> conn = MySQLdb.connect (host = "localhost", 
... user = settings.DATABASES['default']['USER'], 
... passwd = settings.DATABASES['default']['PASSWORD'], 
... db = settings.DATABASES['default']['NAME']) 
>>> cursor = conn.cursor() 
>>> cursor.execute("select name from geoname where name like 'Uni%Hidalgo'"); 
1L 
>>> g = cursor.fetchone() 
>>> g[0] 
'Uni\xc3\xb3n Hidalgo' 
>>> print g[0] 
Unión Hidalgo 

However, if I use the Geoname model (which is actually a django.contrib.gis.db.models.Model), it fails:

>>> from geonames.models import Geoname 
>>> g = Geoname.objects.get(name__istartswith='Uni',name__icontains='Hidalgo') 
>>> g.name 
u'Uni\xc3\xb3n Hidalgo' 
>>> print g.name 
UniÃ³n Hidalgo

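There is a pretty obvious encoding error here. In both cases the database returns 'Uni\xc3\xb3n Hidalgo', but Django (wrongly?) treats the two bytes '\xc3\xb3' as the two characters 'Ã³' instead of the single character 'ó'.

What can I do to fix this?

Update

OK, this is strange: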

>>> c = unicode('Uni\xc3\xb3n Hidalgo','utf-8') 
>>> c 
u'Uni\xf3n Hidalgo' 
>>> print c 
Unión Hidalgo 

If I have Python decode the string from UTF-8 into Unicode, it works. Decoding it as latin1, however, reproduces the error:

>>> c = unicode('Unión Hidalgo','latin1') 
>>> c 
u'Uni\xc3\xb3n Hidalgo' 
>>> print c 
UniÃ³n Hidalgo

So, I guess MySQL is sending UTF-8 but telling Python it is latin1?

Answers
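The mismatch guessed at above can be reproduced without a database. The following is a minimal sketch in Python 3 (the sessions in this thread are Python 2); the variable names are illustrative only. It shows that the same UTF-8 bytes produce the correct text when decoded as UTF-8 and the garbled text when decoded as latin1.

```python
# Python 3 sketch: the same bytes, decoded two different ways.
raw = "Unión Hidalgo".encode("utf-8")   # the bytes as MySQL stores and sends them

correct = raw.decode("utf-8")     # bytes interpreted with the right charset
garbled = raw.decode("latin-1")   # bytes interpreted as latin1 instead

print(repr(raw))   # b'Uni\xc3\xb3n Hidalgo'
print(correct)     # Unión Hidalgo
print(garbled)     # UniÃ³n Hidalgo
```

Because 'ó' takes two bytes in UTF-8 and latin1 maps every byte to one character, the garbled string is one character longer than the correct one.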

0

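It looks like the problem was in MySQL after all. I dropped the tables, recreated them as UTF-8 with an explicit charset and collation, and then re-imported all the data.

It works now.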

0

You can use this:

>>> print g.name.encode('latin1') 
Unión Hidalgo 
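If this workaround has to be applied in more than one place, it could be wrapped in a small helper. The sketch below is Python 3 and assumes the values are UTF-8 text that was decoded as latin1; `fix_mojibake` is a hypothetical name, not part of Django or MySQLdb. It returns the input unchanged when the round trip does not apply, so text that is already correct is not damaged.

```python
def fix_mojibake(text):
    """Undo a UTF-8-read-as-latin1 mix-up.

    Returns the text unchanged if it cannot be encoded as latin1 or if
    the resulting bytes are not valid UTF-8.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(fix_mojibake("UniÃ³n Hidalgo"))  # garbled input is repaired
print(fix_mojibake("Unión Hidalgo"))    # correct input passes through
```

This only hides the symptom on the Python side; the stored data and the connection settings stay as they were.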

I seriously cannot do that; I need this value everywhere. Is there anything I can do to MySQL or Django to make them behave? – 2010-05-14 15:32:15

0

Django 1.10, MariaDB 5.5.47

One very important thing is to set the database's character set when you create the database:

CREATE DATABASE `my_database` CHARACTER SET utf8; 

Then check your MySQL configuration file, /etc/my.cnf (I am using MariaDB):

[client] 
default-character-set=utf8 

[mysql] 
default-character-set=utf8 

[mysqld] 
datadir=/var/lib/mysql 
socket=/var/lib/mysql/mysql.sock 
# Disabling symbolic-links is recommended to prevent assorted security risks 
symbolic-links=0 
# Settings user and group are ignored when systemd is used. 
# If you need to run mysqld under a different user or group, 
# customize your systemd unit file for mariadb according to the 
# instructions in http://fedoraproject.org/wiki/Systemd 
collation-server=utf8_unicode_ci 
init-connect='SET NAMES utf8' 
character-set-server=utf8 

Also remember to restart your SQL server:

sudo systemctl restart mariadb.service 
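After the restart, one way to confirm that the settings took effect is to ask the server for its character set variables. The sketch below is Python 3 and works with any DB-API cursor connected to MySQL or MariaDB; `charset_variables` and `non_utf8_settings` are hypothetical helper names introduced here for illustration.

```python
def charset_variables(cursor):
    """Return the server's character_set* variables as a dict."""
    cursor.execute("SHOW VARIABLES LIKE 'character_set%'")
    return {name: value for name, value in cursor.fetchall()}


def non_utf8_settings(variables):
    """Return the variables whose value is not a utf8 variant.

    character_set_filesystem and character_sets_dir are skipped because
    they are not expected to hold a utf8 charset name.
    """
    skipped = {"character_set_filesystem", "character_sets_dir"}
    return {
        name: value
        for name, value in variables.items()
        if name not in skipped and not value.startswith("utf8")
    }


# Possible usage inside a Django shell (assumes a configured project):
#   from django.db import connection
#   with connection.cursor() as cursor:
#       print(non_utf8_settings(charset_variables(cursor)))
```

An empty result suggests that the client, connection, results, database, and server character sets are all utf8 variants.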

References:

https://docs.djangoproject.com/en/1.10/ref/databases/#creating-your-database
https://mariadb.com/kb/en/the-mariadb-library/setting-character-sets-and-collations/