2016-09-30 1546 views
0

Elasticsearch MapperParsingException on bulk import

I get a MapperParsingException when I try to upload a large JSON file. Below is the full error returned by Elasticsearch:

on [[sample][4]] 
MapperParsingException[failed to parse]; nested: IllegalArgumentException[Malformed content, found extra data after parsing: START_OBJECT]; 
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:156) 
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309) 
    at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:529) 
    at org.elasticsearch.index.shard.IndexShard.prepareCreateOnPrimary(IndexShard.java:506) 
    at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:214) 
    at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:223) 
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:157) 
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:66) 
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:657) 
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) 
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:287) 
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279) 
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77) 
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376) 
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.IllegalArgumentException: Malformed content, found extra data after parsing: START_OBJECT 
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:141) 
    ... 17 more 

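I'd like to understand exactly why the data I'm trying to feed in is considered malformed, and what I can do to debug this kind of situation more effectively.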

Edit: this is a huge file with 200 million entries, but here is a sample data point: {"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]}

+0

Could you give us some more information? A snippet of the JSON would be useful. –

+1

@SimonLudwig The file has 200 million entries, and not every entry has all of its fields filled in. I can give a few examples. – TheM00s3

Answers

0

Make sure every odd-numbered line is an action line (this is where a unique ID would go):

{ "index": {}} 

and that every even-numbered line is the data:

{ "index": {}} 
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]} 

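Then use _bulk when adding the data to Elasticsearch. Your log says "found extra data after parsing: START_OBJECT", which suggests that more than one JSON object was sent as a single document: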

POST /index/type/_bulk 
{ "index": {}} 
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]} 
{ "index": {}} 
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]} 
{ "index": {}} 
{"company":"E-Corp","title":"Sith lord","people":[{"id":"12345","name":"Darth Vader","title":"The Sith Lord"}]} 

This is only a guess at the cause, based on the error message.
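The alternating action/data layout described above can be produced with a short script. This is a minimal sketch, assuming the documents are already available as Python dicts; the function names to_bulk_lines and to_bulk_body are made up for illustration, and the empty {"index": {}} action relies on the index and type being given in the URL, as in the POST example.

```python
import json


def to_bulk_lines(docs):
    """Yield bulk-API lines: one action line, then one source line, per document."""
    for doc in docs:
        # Action line; index and type are assumed to come from the request URL.
        yield json.dumps({"index": {}})
        # json.dumps without indent emits a single line, as the bulk format requires.
        yield json.dumps(doc)


def to_bulk_body(docs):
    """Join the lines and terminate the body with a final newline."""
    return "\n".join(to_bulk_lines(docs)) + "\n"


sample = {
    "company": "E-Corp",
    "title": "Sith lord",
    "people": [{"id": "12345", "name": "Darth Vader", "title": "The Sith Lord"}],
}
body = to_bulk_body([sample, sample])
print(body)
```

For a file with 200 million entries, the lines would have to be written out in batches rather than built as one string; the batch size is left open here.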

+0

Yes, that was the error message. My index lives at 'localhost:9200/sample', so does that mean my curl statement should look like 'curl -XPOST localhost:9200/sample/_bulk --data-binary @output.json'? – TheM00s3

+0

'curl -XPOST http://server:port/index/type/_bulk --data-binary @filename.json' – Anuga

0

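Have you specified a mapping? If not, Elasticsearch will create one based on the first document. If any later document then contains values that do not fit the types chosen for those fields, you may get an error.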

https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html

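For example, company will probably be mapped as a string, but if a document comes along with a number or a date in that field, an error may be thrown.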

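You also have nested documents (people), which I would look into as well. You could try taking a few sample documents, say the first 10, and see whether you can index them using the bulk API.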
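Before sending a sample to the bulk API, it may help to check that each of the first few lines holds exactly one JSON object, since the reported error mentions extra data after parsing. The following is a rough sketch using only the Python standard library; check_line and check_first_lines are hypothetical helper names, and the input is assumed to have one document per line.

```python
import json
from itertools import islice


def check_line(line):
    """Return None if the line holds exactly one JSON object, else a problem description."""
    text = line.strip()
    if not text:
        return "empty line"
    try:
        # raw_decode parses the first JSON value and reports where it ended.
        value, end = json.JSONDecoder().raw_decode(text)
    except ValueError as exc:
        return "invalid JSON: %s" % exc
    if not isinstance(value, dict):
        return "not a JSON object"
    if text[end:].strip():
        return "extra data after the first object"
    return None


def check_first_lines(lines, limit=10):
    """Check up to `limit` lines and return (line_number, problem) pairs."""
    problems = []
    for number, line in enumerate(islice(lines, limit), start=1):
        problem = check_line(line)
        if problem:
            problems.append((number, problem))
    return problems


demo = ['{"company":"E-Corp"}', '{"company":"E-Corp"}{"company":"E-Corp"}']
print(check_first_lines(demo))
```

Because check_first_lines accepts any iterable of lines, an open file object can be passed in directly, so the large file does not need to be read into memory.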

Alternatively, you could create your own mapping for each of these fields, since you don't seem to have many fields per document.
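As an illustration of what such an explicit mapping might look like for the sample data point, here is a sketch that builds the request body as a Python dict. It assumes Elasticsearch 2.x syntax (where text fields use the "string" type), a placeholder type name "doc", and that people should be mapped as "nested"; none of these are confirmed by the question, and a plain object mapping for people may be all that is needed.

```python
import json

# Hypothetical explicit mapping for the sample document (Elasticsearch 2.x style).
# "doc" is a placeholder type name; "nested" for people is an assumption.
mapping = {
    "mappings": {
        "doc": {
            "properties": {
                "company": {"type": "string"},
                "title": {"type": "string"},
                "people": {
                    "type": "nested",
                    "properties": {
                        "id": {"type": "string"},
                        "name": {"type": "string"},
                        "title": {"type": "string"},
                    },
                },
            }
        }
    }
}

# This JSON would be the body of the request that creates the index.
print(json.dumps(mapping, indent=2))
```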