2017-08-09 400 views

Elasticsearch data does not match the mapping

I am migrating Elasticsearch production data from 1.4.3 to 5.5, and I am using reindex to do it. When I try to reindex the old ES index into the new ES index, indexing fails with this exception:

Failed Reason: mapper [THROUGHPUT_ROWS_PER_SEC] cannot be changed from type [long] to [float]. Failed Type: illegal_argument_exception

ES mapping for the task_history index in ES 1.4.3:

{
    "task_history": {
        "mappings": {
            "task_run_hist": {
                "_all": {
                    "enabled": false
                },
                "_routing": {
                    "required": true,
                    "path": "org_id"
                },
                "properties": {
                    "RUN_TIME_IN_MINS": {
                        "type": "double"
                    },
                    "THROUGHPUT_ROWS_PER_SEC": {
                        "type": "long"
                    },
                    "account_name": {
                        "type": "string",
                        "index": "not_analyzed",
                        "store": true
                    }
                }
            }
        }
    }
}

ES mapping for the task_history index in ES 5.5 (this mapping was created as part of the reindexing):

{
    "task_history": {
        "mappings": {
            "task_run_hist": {
                "_all": {
                    "enabled": false
                },
                "_routing": {
                    "required": true
                },
                "properties": {
                    "RUN_TIME_IN_MINS": {
                        "type": "float"
                    },
                    "THROUGHPUT_ROWS_PER_SEC": {
                        "type": "long"
                    },
                    "account_name": {
                        "type": "keyword",
                        "store": true
                    }
                }
            }
        }
    }
}

Sample data

{
    "_index": "task_history",
    "_type": "task_run_hist",
    "_id": "1421955143",
    "_score": 1,
    "_source": {
        "RUN_TIME_IN_MINS": 0.47,
        "THROUGHPUT_ROWS_PER_SEC": 46,
        "org_id": "xxxxxx",
        "account_name": "Soma Acc1"
    }
},
{
    "_index": "task_history",
    "_type": "task_run_hist",
    "_id": "1421943738",
    "_score": 1,
    "_source": {
        "RUN_TIME_IN_MINS": 1.02,
        "THROUGHPUT_ROWS_PER_SEC": 65.28,
        "org_id": "yyyyyy",
        "account_name": "Choma Acc1"
    }
}

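2 questions

  1. How is Elasticsearch 1.4.3 storing floating-point numbers when the mapping type of THROUGHPUT_ROWS_PER_SEC is long?
  2. If this is a data problem in the old ES, how can I remove all the floating-point numbers before starting the reindex process?

For the second option, I wanted to list the affected documents with the query below, so that I could verify them once and then delete every document that has a floating-point number. However, the query below still lists documents whose THROUGHPUT_ROWS_PER_SEC is not a floating-point number.

Note: Groovy scripting is enabled.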

GET task_history/task_run_hist/_search?size=100
{
    "filter": {
        "script": {
            "script": "doc['THROUGHPUT_ROWS_PER_SEC'].value % 1 == 0"
        }
    }
}

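Update

When I tried the reindex script below, based on the solution provided by Val, I got a runtime error, listed further down. Any clue about what is going wrong here? I added an extra statement that casts RUN_TIME_IN_MINS to float, because the original script failed with an error on the RUN_TIME_IN_MINS field: "mapper [RUN_TIME_IN_MINS] cannot be changed from type [long] to [float]"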

POST _reindex?wait_for_completion=false
{
    "source": {
        "remote": {
            "host": "http://esip:15000"
        },
        "index": "task_history"
    },
    "dest": {
        "index": "task_history"
    },
    "script": {
        "inline": "if (ctx._source.THROUGHPUT_ROWS_PER_SEC % 1 != 0) { ctx.op = 'noop' } ctx._source.RUN_TIME_IN_MINS = (float) ctx._source.RUN_TIME_IN_MINS;",
        "lang": "painless"
    }
}

Runtime error

{
    "completed": true,
    "task": {
        "node": "wZOzypYlSayIRlhp9y3lVA",
        "id": 645528,
        "type": "transport",
        "action": "indices:data/write/reindex",
        "status": {
            "total": 18249521,
            "updated": 4691,
            "created": 181721,
            "deleted": 0,
            "batches": 37,
            "version_conflicts": 0,
            "noops": 67076,
            "retries": {
                "bulk": 0,
                "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
        },
        "description": """
reindex from [host=esip port=15000 query={
    "match_all" : {
        "boost" : 1.0
    }
}][task_history] updated with Script{type=inline, lang='painless', idOrCode='if (ctx._source.THROUGHPUT_ROWS_PER_SEC % 1 != 0) { ctx.op = 'noop' } ctx._source.RUN_TIME_IN_MINS = (float) ctx._source.RUN_TIME_IN_MINS;', options={}, params={}} to [task_history]
""",
        "start_time_in_millis": 1502336063507,
        "running_time_in_nanos": 93094657751,
        "cancellable": true
    },
    "error": {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [],
        "script": "if (ctx._source.THROUGHPUT_ROWS_PER_SEC % 1 != 0) { ctx.op = 'noop' } ctx._source.RUN_TIME_IN_MINS = (float) ctx._source.RUN_TIME_IN_MINS;",
        "lang": "painless",
        "caused_by": {
            "type": "null_pointer_exception",
            "reason": null
        }
    }
}
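It is very likely that the first document you created in ES 1.x had a long value (see '"THROUGHPUT_ROWS_PER_SEC": 46') and the mapping was created on that basis. All subsequent values, floating-point or not, were then coerced to long. You need to create the mapping in ES 5 before starting the reindex process. – Val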
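
The coercion described in this comment can be illustrated with a small Python simulation. It is only a sketch: it assumes the default numeric coercion was in effect on the 1.4.3 index, and `index_as_long`, `matches_doc_value_filter` and `has_fraction_in_source` are hypothetical helper names, not Elasticsearch APIs. The point is that `_source` keeps the original JSON number while the indexed value of a `long` field has lost its fraction, which would explain why a filter on `doc['THROUGHPUT_ROWS_PER_SEC'].value % 1 == 0` matches every document.

```python
import math


def index_as_long(value):
    """Mimic coercing a JSON number into a `long` field: the fraction is dropped."""
    return math.trunc(value)


def matches_doc_value_filter(source):
    """Mimic the Groovy filter, which reads the indexed (already truncated) value."""
    indexed = index_as_long(source["THROUGHPUT_ROWS_PER_SEC"])
    return indexed % 1 == 0


def has_fraction_in_source(source):
    """Check the original JSON value as kept in _source."""
    return source["THROUGHPUT_ROWS_PER_SEC"] % 1 != 0


# The two sample documents from the question, reduced to the relevant field.
sources = [
    {"THROUGHPUT_ROWS_PER_SEC": 46},
    {"THROUGHPUT_ROWS_PER_SEC": 65.28},
]

for src in sources:
    print(
        src["THROUGHPUT_ROWS_PER_SEC"],
        "indexed as", index_as_long(src["THROUGHPUT_ROWS_PER_SEC"]),
        "| doc-value filter matches:", matches_doc_value_filter(src),
        "| fraction in _source:", has_fraction_in_source(src),
    )
```

Under these assumptions, only a check against `_source` (as the reindex script does through `ctx._source`) can tell the two documents apart.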

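@Val: In that case, documents with floating-point numbers will throw an exception and stop the reindex process, and the mapping is correct. It has to be of type 'long'. – abi1964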

You obviously need to set 'double' in the ES 5.x mapping to accommodate your different values – Val
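
One way to prepare such a mapping ahead of the reindex is to derive it from the 1.4.3 mapping shown in the question. The Python sketch below only builds and prints the JSON body; it sends nothing. `to_5x_field` and `build_5x_index_body` are hypothetical helpers, and the choice of `double` for THROUGHPUT_ROWS_PER_SEC is the suggestion from this comment, not something the asker accepted. It assumes the usual 5.x conventions: a `string` field with `"index": "not_analyzed"` becomes `keyword`, and `_routing` no longer takes a `path`.

```python
import json

# Field definitions copied from the 1.4.3 mapping in the question.
old_properties = {
    "RUN_TIME_IN_MINS": {"type": "double"},
    "THROUGHPUT_ROWS_PER_SEC": {"type": "long"},
    "account_name": {"type": "string", "index": "not_analyzed", "store": True},
}


def to_5x_field(name, field, overrides):
    """Translate one 1.x field definition, applying an optional type override."""
    field = dict(field)
    if field.get("type") == "string" and field.get("index") == "not_analyzed":
        # Not-analyzed strings map to the keyword type in 5.x.
        field["type"] = "keyword"
        del field["index"]
    if name in overrides:
        field["type"] = overrides[name]
    return field


def build_5x_index_body(properties, overrides):
    """Build an index-creation body for the task_run_hist type."""
    return {
        "mappings": {
            "task_run_hist": {
                "_all": {"enabled": False},
                "_routing": {"required": True},  # "path" is dropped
                "properties": {
                    name: to_5x_field(name, field, overrides)
                    for name, field in properties.items()
                },
            }
        }
    }


body = build_5x_index_body(old_properties, {"THROUGHPUT_ROWS_PER_SEC": "double"})
print(json.dumps(body, indent=4))
```

The printed body would be used to create task_history on the 5.5 cluster before starting the reindex, so that the field types are fixed up front instead of being inferred from the first document that arrives.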

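Answer

You apparently want to keep the existing ES 5.x mapping with long, so all you need to do is add a script to your reindex call that deals with THROUGHPUT_ROWS_PER_SEC values that do not fit a long. The script below skips any document whose value is not a whole number. Something like this should do: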

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://es1host:9200"
        },
        "index": "task_history"
    },
    "dest": {
        "index": "task_history"
    },
    "script": {
        "inline": "if (ctx._source.THROUGHPUT_ROWS_PER_SEC % 1 != 0) { ctx.op = 'noop' }",
        "lang": "painless"
    }
}
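But then the data would be wrong. Instead, if I could drop all the floating-point numbers and reindex only the long numbers, very little data would be lost and nothing would be misleading. – abi1964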

In that case, you can ignore the documents for which your modulo condition holds. See my updated answer – Val

After some modifications I get a runtime exception; I have updated my question with more information. – abi1964
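
The thread does not resolve the null_pointer_exception. One plausible cause, stated here as an assumption, is that some documents lack THROUGHPUT_ROWS_PER_SEC or RUN_TIME_IN_MINS, so the modulo or the `(float)` cast runs on a null. The Python sketch below models the per-document decision with explicit null guards; `reindex_decision` is a hypothetical name, and this is an illustration of the logic rather than the Painless script itself.

```python
def reindex_decision(source):
    """Return (op, source) for one document, guarding against missing fields."""
    throughput = source.get("THROUGHPUT_ROWS_PER_SEC")
    run_time = source.get("RUN_TIME_IN_MINS")

    # Skip documents whose throughput has a fractional part; a missing
    # value is not treated as fractional.
    if throughput is not None and throughput % 1 != 0:
        return "noop", source

    updated = dict(source)
    # Only convert the run time when the field is actually present.
    if run_time is not None:
        updated["RUN_TIME_IN_MINS"] = float(run_time)
    return "index", updated


docs = [
    {"RUN_TIME_IN_MINS": 0.47, "THROUGHPUT_ROWS_PER_SEC": 46},
    {"RUN_TIME_IN_MINS": 1.02, "THROUGHPUT_ROWS_PER_SEC": 65.28},
    {"account_name": "no numeric fields"},  # hypothetical document
]

for doc in docs:
    print(reindex_decision(doc))
```

In the Painless script, the equivalent guards would be `!= null` checks on `ctx._source.THROUGHPUT_ROWS_PER_SEC` and `ctx._source.RUN_TIME_IN_MINS` before the modulo and the cast. Whether missing fields are really the cause would need to be confirmed against the source index.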