2017-04-12 117 views
0

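Indexing JSON with the Elasticsearch Python client

I'm having a problem with the Elasticsearch Python client. I have a file called test.json that contains (valid!) JSON, and I want to index that JSON in Elasticsearch. I followed this little tutorial to check that I can connect to my local Elasticsearch instance, and that worked, so I believe the problem is not my connection to Elasticsearch.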

When I run this small piece of code:

from elasticsearch import Elasticsearch 
import json 

es = Elasticsearch([{'host': 'localhost', 'port': 9200}]) 

with open('test.json') as json_data: 
    es.index(index='testdata', doc_type='generated', id=1, body=json.load(json_data)) 

I get this exception on the command line (mapper_parsing_exception?):

Traceback (most recent call last): 
    File "app.py", line 13, in <module> 
    es.index(index='testdata', doc_type='generated', id=1, body=json.load(json_data)) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped 
    return func(*args, params=params, **kwargs) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index 
    _make_path(index, doc_type, id), params=params, body=body) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request 
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request 
    self._raise_error(response.status, raw_data) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 124, in _raise_error 
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info) 
elasticsearch.exceptions.RequestError: TransportError(400, u'mapper_parsing_exception', u'failed to parse') 

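Can you point me in the right direction as to what the problem might be?

Oh, and yes, I printed json.load(json_data) and it worked perfectly, which means that loading the JSON from the file is not the problem.

Thanks for your help! Greez

Update: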

with open('test.json') as json_data: 
    #d = json.load(json_data) 
    print(json_data) 
    es.index(index='testdata', doc_type='generated', id=1, body=json_data) 

This code does not work either; now I cannot even print the JSON on the command line.

Error:

<open file 'test.json', mode 'r' at 0x7f8329340c00> 
Traceback (most recent call last): 
    File "app.py", line 14, in <module> 
    es.index(index='testdata', doc_type='generated', id=1, body=json_data) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped 
    return func(*args, params=params, **kwargs) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index 
    _make_path(index, doc_type, id), params=params, body=body) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 284, in perform_request 
    body = self.serializer.dumps(body) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/serializer.py", line 50, in dumps 
    raise SerializationError(data, e) 
elasticsearch.exceptions.SerializationError: (<closed file 'test.json', mode 'r' at 0x7f8329340c00>, TypeError("Unable to serialize <open file 'test.json', mode 'r' at 0x7f8329340c00> (type: <type 'file'>)",)) 

This is the content of the test.json file (just some randomly generated JSON):

[ 
    { 
     "_id": "58ee19e75ffc814d4dff17da", 
     "index": 0, 
     "guid": "45476739-80b3-49de-8f00-9923f84f56ce", 
     "isActive": true, 
     "balance": "$2,882.08", 
     "picture": "http://placehold.it/32x32", 
     "age": 31, 
     "eyeColor": "blue", 
     "name": "Liliana Odom", 
     "gender": "female", 
     "company": "PLASTO", 
     "email": "[email protected]", 
     "phone": "+1 (983) 474-3785", 
     "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593", 
     "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.", 
     "registered": "2015-05-07T05:40:28 -02:00", 
     "latitude": -46.141522, 
     "longitude": -157.943368, 
     "tags": [ 
      "labore", 
      "quis" 
     ], 
     "friends": [ 
      { 
      "id": 0, 
      "name": "Earline Bass" 
      } 
     ], 
     "greeting": "Hello, Liliana Odom! You have 5 unread messages.", 
     "favoriteFruit": "apple" 
     } 
    ] 

Update 2:

I am trying this now:

id = 1 
with open('test.json') as json_data: 
    data = json.load(json_data) 
    for dat in data: 
        print(json.dumps(dat)) 
        es.index(index='testdata', doc_type='generated', id=id, body=json.dumps(dat)) 
        id += 1 

print(json.dumps(dat)) works, but now I get an IllegalArgumentException:

Traceback (most recent call last): 
    File "app.py", line 15, in <module> 
    es.index(index='testdata', doc_type='generated', id=id, body=json.dumps(dat)) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped 
    return func(*args, params=params, **kwargs) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index 
    _make_path(index, doc_type, id), params=params, body=body) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request 
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request 
    self._raise_error(response.status, raw_data) 
    File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 124, in _raise_error 
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info) 
elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'[Bloodstorm][127.0.0.1:9300][indices:data/write/index[p]]') 

Update 3: Here is the ES log; it looks like the _id field is defined twice in this index.

[2017-04-12 17:43:07,847][DEBUG][action.index    ] [Bloodstorm] failed to execute [index {[testdata][generated][AVti1SY7fn4azWzi8gyQ], source[{"guid": "45476739-80b3-49de-8f00-9923f84f56ce", "index": 0, "favoriteFruit": "apple", "latitude": -46.141522, "company": "PLASTO", "email": "[email protected]", "picture": "http://placehold.it/32x32", "tags": ["labore", "quis"], "registered": "2015-05-07T05:40:28 -02:00", "eyeColor": "blue", "phone": "+1 (983) 474-3785", "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593", "friends": [{"id": 0, "name": "Earline Bass"}], "isActive": true, "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.", "balance": "$2,882.08", "name": "Liliana Odom", "gender": "female", "age": 31, "greeting": "Hello, Liliana Odom! You have 5 unread messages.", "longitude": -157.943368, "_id": "58ee19e75ffc814d4dff17da"}]}] on [[testdata][3]] 
java.lang.IllegalArgumentException: Field [_id] is defined twice in [generated] 
     at org.elasticsearch.index.mapper.MapperService.checkFieldUniqueness(MapperService.java:496) 
     at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:376) 
     at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320) 
     at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:306) 
     at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:230) 
     at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:480) 
     at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:784) 
     at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231) 
     at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 
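The log line "Field [_id] is defined twice in [generated]" suggests that the document body carries a top-level key with the same name as an Elasticsearch metadata field. As a rough diagnostic, a minimal sketch like the following could list such keys before indexing. The helper name clashing_keys and the METADATA_FIELDS set are hypothetical, and the set is only an illustrative subset, not an exhaustive list.

```python
import json

# Illustrative subset of Elasticsearch metadata field names.
# This set is an assumption for the sketch, not an exhaustive list.
METADATA_FIELDS = {"_id", "_index", "_type", "_source"}


def clashing_keys(doc):
    """Return the top-level keys of doc that match a metadata field name."""
    return sorted(key for key in doc if key in METADATA_FIELDS)


# A shortened version of the record from test.json.
sample = json.loads(
    '{"_id": "58ee19e75ffc814d4dff17da", "index": 0, "name": "Liliana Odom"}'
)
print(clashing_keys(sample))
```

For the shortened record, only the _id key is reported; the plain "index" key (without underscore) is not flagged.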

Answers

2

Given the structure of your test.json file, you need to parse it and then iterate over each document in the array:

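with open('test.json') as raw_data: 
    # json.load reads from a file object; json.loads expects a string
    json_docs = json.load(raw_data) 
    for json_doc in json_docs: 
        # take _id out of the body so it is not defined twice
        my_id = json_doc.pop('_id', None) 
        es.index(index='testdata', doc_type='generated', id=my_id, body=json.dumps(json_doc)) 

+0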

It seems I have to use: 'with open('test.json') as json_data: #d = json.load(json_data) print(json_data) es.index(index='testdata', doc_type='generated', id=1, body=json_data)', which gives me this new error: 'elasticsearch.exceptions.SerializationError: ((type: ))'. It seems backticks do not work for marking inline code – PouletFreak

+0

You should update your question with that error so it is clearer. Could you also share the contents of your 'test.json' file? – Val

+0

Sorry, I'm fairly new here ;-), I have updated my question – PouletFreak
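The loop in the answer above sends one request per document. For files with many records, one option is to prepare the same documents as bulk actions instead. The following is only a sketch under assumptions: build_actions is a hypothetical helper, the index and type names are taken from the question, and the "_type" key applies to older Elasticsearch versions such as the one in the question.

```python
def build_actions(docs, index="testdata", doc_type="generated"):
    """Yield one bulk action per document, moving `_id` out of the body."""
    for doc in docs:
        source = dict(doc)                # shallow copy, caller's dict stays intact
        doc_id = source.pop("_id", None)  # _id must not remain inside the body
        action = {"_index": index, "_type": doc_type, "_source": source}
        if doc_id is not None:
            action["_id"] = doc_id        # reuse the original id as document id
        yield action


# A shortened version of the record from test.json.
docs = [{"_id": "58ee19e75ffc814d4dff17da", "index": 0, "name": "Liliana Odom"}]
actions = list(build_actions(docs))
print(actions[0]["_id"])
```

With a connected client, the generator could be handed to the bulk helper of the Python client, for example elasticsearch.helpers.bulk(es, build_actions(json_docs)). Documents without an _id key get no "_id" entry, so Elasticsearch would assign one itself.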

0

You can remove the square brackets from your test.json file and try again.

{ 
     "_id": "58ee19e75ffc814d4dff17da", 
     "index": 0, 
     "guid": "45476739-80b3-49de-8f00-9923f84f56ce", 
     "isActive": true, 
     "balance": "$2,882.08", 
     "picture": "http://placehold.it/32x32", 
     "age": 31, 
     "eyeColor": "blue", 
     "name": "Liliana Odom", 
     "gender": "female", 
     "company": "PLASTO", 
     "email": "[email protected]", 
     "phone": "+1 (983) 474-3785", 
     "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593", 
     "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.", 
     "registered": "2015-05-07T05:40:28 -02:00", 
     "latitude": -46.141522, 
     "longitude": -157.943368, 
     "tags": [ 
      "labore", 
      "quis" 
     ], 
     "friends": [ 
      { 
      "id": 0, 
      "name": "Earline Bass" 
      } 
     ], 
     "greeting": "Hello, Liliana Odom! You have 5 unread messages.", 
     "favoriteFruit": "apple" 
     } 
+1

His JSON file may contain several records, and it is valid as it is – Val

+0

Yes, my other JSON files contain more records. – PouletFreak
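Since some files hold a single object and others an array of records, one way to avoid editing the files by hand is to normalize the parsed JSON to a list before indexing. This is a minimal sketch; as_documents is a hypothetical helper name, not part of the Elasticsearch client.

```python
import json


def as_documents(parsed):
    """Return a list of documents for either a single object or an array."""
    if isinstance(parsed, dict):
        return [parsed]      # single record: wrap it in a list
    if isinstance(parsed, list):
        return parsed        # array of records: use as-is
    raise TypeError("expected a JSON object or an array of objects")


single = as_documents(json.loads('{"name": "Liliana Odom"}'))
many = as_documents(
    json.loads('[{"name": "Liliana Odom"}, {"name": "Earline Bass"}]')
)
print(len(single), len(many))
```

Each element of the returned list can then be indexed on its own, which keeps the top-level array from being sent as a single document body.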
