2015-08-21 85 views
1

Elasticsearch: create or update documents using Python

I am using elasticsearch-py for Elasticsearch operations.

I am trying to use elasticsearch.helpers.bulk to create or update multiple records.

from elasticsearch import Elasticsearch 
from elasticsearch import helpers 
es = Elasticsearch() 

data = [ 
    { 
     "_index": "customer", 
     "_type": "external", 
     "_op_type": "create", 
     "_id": 3, 
     "doc" : {"name": "test"} 
    }, 
    { 
     "_index": "customer", 
     "_type": "external", 
     "_op_type": "create", 
     "_id": 4, 
     "doc" : {"name": "test"} 
    }, 
    { 
     "_index": "customer", 
     "_type": "external", 
     "_op_type": "create", 
     "_id": 5, 
     "doc" : {"name": "test"} 
    }, 
    { 
     "_index": "customer", 
     "_type": "external", 
     "_op_type": "create", 
     "_id": 6, 
     "doc" : {"name": "test"} 
    }, 
] 


print(helpers.bulk(es, data))

Is there any way to do this?

Right now we can only give _op_type as create or update. If we give update and the record does not exist, it raises an error.

Traceback (most recent call last): 
    File "/tmp/test.py", line 37, in <module> 
    print helpers.bulk(es, data) 
    File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk 
    for ok, item in streaming_bulk(client, actions, **kwargs): 
    File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 155, in streaming_bulk 
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors) 
elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{u'update': {u'status': 404, u'_type': u'external', u'_id': u'3', u'error': u'DocumentMissingException[[customer][-1] [external][3]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'4', u'error': u'DocumentMissingException[[customer][-1] [external][4]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'5', u'error': u'DocumentMissingException[[customer][-1] [external][5]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'6', u'error': u'DocumentMissingException[[customer][-1] [external][6]: document missing]', u'_index': u'customer'}}]) 
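
The error list in the traceback above has a regular shape: one dict per failed action, keyed by the operation name. Below is a minimal sketch of pulling out the ids that failed with status 404, assuming that shape. `missing_ids` is a hypothetical helper name, not part of elasticsearch-py, and the sample data is copied from the first two entries of the error above.

```python
def missing_ids(errors):
    """Return the _id of every bulk item that failed with HTTP status 404."""
    ids = []
    for item in errors:
        # Each item looks like {"update": {"status": 404, "_id": "3", ...}}
        for op_type, info in item.items():
            if info.get("status") == 404:
                ids.append(info["_id"])
    return ids


# Sample shaped like the first two entries of the BulkIndexError above.
sample_errors = [
    {"update": {"status": 404, "_type": "external", "_id": "3", "_index": "customer"}},
    {"update": {"status": 404, "_type": "external", "_id": "4", "_index": "customer"}},
]
print(missing_ids(sample_errors))
```

Against a live cluster, the list would come from `success, errors = helpers.bulk(es, data, raise_on_error=False)`, which returns the failures instead of raising BulkIndexError.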

Have you tried using 'index' as the op_type instead of 'create' and 'update'? – Val

@Val, according to the 'helpers.bulk' documentation we have to give 'index'. I also tried your solution and it gives a validation error: 'elasticsearch.exceptions.TransportError: TransportError(500, u'ActionRequestValidationException[Validation Failed: 1: no requests added;]')' – Nilesh

That's strange... are you sure you have '"_op_type": "index"'? – Val

Answers

2

According to the _bulk endpoint documentation, you can and should use the index action for this, provided your documents always have the same identifier.

create is useful when a document is created for the first time, while update is better suited to partial and/or scripted updates.

You can also leave out _op_type entirely, in which case index is used by default.
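
The point about leaving out _op_type can be sketched as follows. This is only an illustration: `make_index_actions` and `index_all` are hypothetical names, and the optional `doc_type` argument is there only because the question uses `"_type": "external"` (mapping types exist only on older clusters). The document body is passed under `_source`. With the index action, the bulk helper stores whatever is left besides the metadata keys, so a wrapper key such as `doc` would itself become a field of the stored document.

```python
def make_index_actions(index, docs, doc_type=None):
    """Yield one bulk action per (doc_id, body) pair, with no _op_type set."""
    for doc_id, body in docs:
        # No "_op_type" key: the bulk helper falls back to "index",
        # which creates the document or replaces an existing one.
        action = {"_index": index, "_id": doc_id, "_source": body}
        if doc_type is not None:
            # Mapping types such as "external" only exist on older clusters.
            action["_type"] = doc_type
        yield action


def index_all(es, index, docs, doc_type=None):
    """Send the actions through helpers.bulk; needs a reachable cluster."""
    # Imported lazily so the action builder can be tried without the client.
    from elasticsearch import helpers
    return helpers.bulk(es, make_index_actions(index, docs, doc_type))


docs = [(3, {"name": "test"}), (4, {"name": "test"})]
actions = list(make_index_actions("customer", docs, doc_type="external"))
print(actions[0])
```

Calling `index_all(Elasticsearch(), "customer", docs, doc_type="external")` twice with the same ids should overwrite the documents rather than fail, which is the create-or-update behaviour asked for.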

2

I tried the solution suggested by @Val and it works like a charm.

from elasticsearch import Elasticsearch 
from elasticsearch import helpers 
es = Elasticsearch() 

data = [ 
    { 
     "_index": "customer", 
     "_type": "external", 
     "_id": 3, 
     "doc" : {"name": "test"} 
    }, 
    { 
     "_index": "customer", 
     "_type": "external", 
     "_id": 4, 
     "doc" : {"name": "test"} 
    }, 
    { 
     "_index": "customer", 
     "_type": "external", 
     "_id": 5, 
     "doc" : {"name": "test"} 
    }, 
    { 
     "_index": "customer", 
     "_type": "external", 
     "_id": 6, 
     "doc" : {"name": "test"} 
    }, 
] 


print(helpers.bulk(es, data))
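
If a partial update is needed instead of a full replace, the Update API's `doc_as_upsert` flag is one way to avoid the DocumentMissingException from the question. The sketch below assumes the bulk helper passes non-metadata keys (`doc`, `doc_as_upsert`) through as the update body, which matches the `doc` key used in the question. `make_upsert_actions` is a hypothetical name, and this has not been checked against every client version.

```python
def make_upsert_actions(index, docs, doc_type=None):
    """Yield bulk 'update' actions that fall back to creating the document."""
    for doc_id, partial in docs:
        action = {
            "_op_type": "update",
            "_index": index,
            "_id": doc_id,
            "doc": partial,         # fields to merge into an existing document
            "doc_as_upsert": True,  # use "doc" as the new document if the id is missing
        }
        if doc_type is not None:
            # Only meaningful on older clusters that still have mapping types.
            action["_type"] = doc_type
        yield action


upserts = list(make_upsert_actions("customer", [(3, {"name": "test"})], doc_type="external"))
print(upserts[0])
```

The resulting list can be passed to `helpers.bulk(es, upserts)` in the same way as `data` above. Unlike index, this merges the given fields into an existing document instead of replacing it.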