2013-03-11

Neo4j, Bulbs and UTF-8

I'm having some trouble using the Python library Bulbs to insert and look up data in Neo4j. The problem is related to character encoding. I get:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 22: ordinal not in range(128) 

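when trying to look up a node in the index. I've searched for a way to change the character encoding that Bulbs uses for Neo4j, but I can't find one.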

EDIT: Here is code that reproduces the error:

from bulbs.model import Node 
from bulbs.neo4jserver import Graph 
from bulbs.property import String 
import MySQLdb 
import sys 


class Topic(Node): 
    element_type = 'node' 
    name = String(nullable=False) 


g = Graph() 
g.add_proxy('topics', Topic) 

con = MySQLdb.connect(host='127.0.0.1', user='root', db='wiki_new', charset='utf8') 
cur = con.cursor() 
cur.execute('SELECT page_title FROM page') 
while True:
    row = cur.fetchone()
    if not row:
        break

    sys.stdout.write(row[0] + '\n')
    nds = g.topics.index.lookup(name=row[0])
    if not nds:
        g.topics.create(name=row[0])

The string that causes the error is: Xóõ language.

UPDATE

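I'm now getting the data from an XML file (a Wikipedia page dump), using Python's SAX parser. The code is basically the same, and this is the error I get for the title "ATP-toernooi van Montréal/Toronto":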

  File "graph.py", line 197, in <module>
    build_wikipedia_graph(WIKI_DUMP_PATH)
  File "graph.py", line 195, in build_wikipedia_graph
    filter_handler.parse(open(wiki_dump_path))
  File "/usr/lib/python2.7/xml/sax/saxutils.py", line 255, in parse
    self._parent.parse(source)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 304, in end_element
    self._cont_handler.endElement(name)
  File "/home/pedro/wiki/1.0/page_parser.py", line 55, in method
    getattr(self._downstream, method_name)(*a, **k)
  File "/home/pedro/wiki/1.0/page_parser.py", line 87, in endElement
    self.pageCallBack(self.currentPage, self.callbackArgs)
  File "graph.py", line 181, in _callback
    kgraph.set_links_to(page.title, target)
  File "graph.py", line 59, in set_links_to
    topic_dst = self._g.topics.get_or_create('name', topic_dst, name=topic_dst)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/element.py", line 607, in get_or_create
    vertex = self.index.get_unique(key, value)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/neo4jserver/index.py", line 335, in get_unique
    resp = lookup(self.index_name,key,value)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/neo4jserver/client.py", line 878, in lookup_vertex
    path = build_path(index_path, vertex_path, index_name, key, value)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/utils.py", line 126, in build_path
    segments = [quote(str(segment), safe='') for segment in args if segment is not None]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 22: ordinal not in range(128) 

The error occurs when I try to create a node with that name.

ANOTHER UPDATE: After updating the Bulbs library, I get a different error:

  File "/usr/local/lib/python2.7/dist-packages/bulbs/utils.py", line 129, in build_path
    segments = [quote(unicode(segment), safe='') for segment in args if segment is not None]
  File "/usr/lib/python2.7/urllib.py", line 1238, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe9' 

Any help?

Thanks!

Please provide example code so I can see what's going on. – espeed 2013-03-12 09:59:40

@espeed I added some code that causes the same error when creating a node. Thanks. – user1491915 2013-03-12 12:19:54

Thanks. Please post the full error message so I can see where in the stack it occurs. – espeed 2013-03-12 14:34:03

Answers

Bulbs stores strings as Unicode in Neo4j Server. Note that the String property type coerces values to Unicode (Unicode strings are the default in Python 3).

See the Python Unicode guide:

http://docs.python.org/2/howto/unicode.html#python-2-x-s-unicode-support
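
As a small illustration of the distinction that guide describes (a sketch only, not Bulbs code; the variable names are made up for the example): bytes read from a UTF-8 source need to be decoded to a Unicode string once, and encoding that string with the ASCII codec fails with the same kind of error as the one in the question.

```python
# -*- coding: utf-8 -*-
# Illustrative only: the names below are hypothetical, not part of Bulbs or MySQLdb.

raw_title = b'ATP-toernooi van Montr\xc3\xa9al/Toronto'  # UTF-8 encoded bytes

# Decode once at the input boundary so the rest of the program handles Unicode.
title = raw_title.decode('utf-8')

# The ASCII codec cannot represent the e-acute character, so encoding fails
# with UnicodeEncodeError, the error class reported in the question.
try:
    title.encode('ascii')
    ascii_failed = False
    failed_position = None
except UnicodeEncodeError as exc:
    ascii_failed = True
    failed_position = exc.start  # index of the first character that failed
```

The position recorded in `failed_position` is the index of the first non-ASCII character in the decoded title.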

First, confirm that your MySQL server has UTF-8 support:

mysql> show character set like 'utf%';

Also, note my changes and comments...

from bulbs.model import Node 
from bulbs.neo4jserver import Graph 
from bulbs.property import String 
import MySQLdb 
import sys 


class Topic(Node): 
    element_type = 'node'   # by convention name this 'topic' 
    name = String(nullable=False) 


g = Graph() 
g.add_proxy('topics', Topic) 

# Make sure use_unicode is set to True
con = MySQLdb.connect(host='127.0.0.1', user='root', db='wiki_new', use_unicode=True, charset='utf8') 
cur = con.cursor() 
cur.execute('SELECT page_title FROM page') 
while True:
    row = cur.fetchone()
    if not row:
        break

    sys.stdout.write(row[0] + '\n')

    # Use Bulbs' get_or_create method to simplify your code.
    # The first argument is the index key as a string, followed by the value to look up.
    nds = g.topics.get_or_create('name', row[0], name=row[0])
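
Both tracebacks in the question end in `build_path`, where each path segment is passed to `quote`. In Python 2, `str(segment)` implicitly encodes a Unicode string with the ASCII codec, and `urllib.quote` raises `KeyError` when given a Unicode string containing non-ASCII characters. One possible workaround, sketched here under the assumption that the index URL should carry percent-encoded UTF-8, is to encode each segment to UTF-8 bytes before quoting. `quote_segment` is a hypothetical helper for illustration, not part of the Bulbs API.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote  # Python 3
    text_type = str
except ImportError:
    from urllib import quote  # Python 2
    text_type = unicode  # noqa: F821 (only defined on Python 2)


def quote_segment(segment):
    """Percent-encode one URL path segment as UTF-8.

    Hypothetical helper for illustration; it is not a Bulbs function.
    """
    if isinstance(segment, text_type):
        # Unicode text: encode explicitly instead of relying on the ASCII default.
        segment = segment.encode('utf-8')
    elif not isinstance(segment, bytes):
        # Numbers and other values: stringify first, then encode.
        segment = str(segment).encode('utf-8')
    return quote(segment, safe='')


quoted = quote_segment(u'ATP-toernooi van Montr\xe9al/Toronto')
```

With `safe=''`, the slash in the title is escaped as well, so the whole title stays a single path segment.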

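Thanks for the reply. I didn't try that solution because my data source changed to an XML file, but I'm still getting the same error. Please see the original post. Thanks for the tips! – user1491915 2013-03-14 17:15:09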