Nested sorting of Japanese text with icu_collation causes a CircuitBreakingException

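I'm using Elasticsearch 2.4 with the icu_analysis plugin added to provide sorting of Japanese text. It works well enough in my local environment, which has a limited number of documents, but when I try it on a more realistic data set, the query fails with the following CircuitBreakingException: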

"CircuitBreakingException[[fielddata] Data too large, data for [translations.name.jp_sort] would be larger than limit of [10239895142/9.5gb]]"

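As far as I understand, this happens when sorting on fielddata across a large number of documents, and doc values should be used instead. However, I'm not sure whether that can be done in this case, or why it isn't already happening.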
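To put the number in the error message in context, here is a rough back-of-the-envelope calculation. It assumes that `indices.breaker.fielddata.limit` is still at its Elasticsearch 2.x default of 60% of the JVM heap; if that setting has been changed on the cluster, the implied heap size below does not hold.

```python
# Limit reported in the exception message, in bytes.
limit_bytes = 10239895142

# Assumption: indices.breaker.fielddata.limit is at its 2.x default of 60%.
assumed_breaker_fraction = 0.60

GIB = 1024 ** 3

# The limit expressed in GiB (the message rounds this to "9.5gb").
limit_gib = limit_bytes / GIB

# JVM heap size implied by the limit under the 60% assumption.
implied_heap_gib = limit_bytes / assumed_breaker_fraction / GIB

print(f"fielddata limit:  {limit_gib:.2f} GiB")
print(f"implied JVM heap: {implied_heap_gib:.2f} GiB")
```

In other words, loading fielddata for `translations.name.jp_sort` alone would exceed the whole fielddata budget of a node, so raising the limit is unlikely to be a practical fix.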

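There are about 470 million documents in the index, and they store translations as nested documents. Only about 35 million of the full set contain Japanese translations. Here is the mapping for the documents: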

{ 
    "settings" : { 
    "number_of_shards" : 6, 
    "number_of_replicas": 0, 
    "analysis": { 
     "filter": { 
      "trigrams_filter": { 
       "type":  "ngram", 
       "min_gram": 3, 
       "max_gram": 3 
      }, 
      "japanese_ordering": { 
      "type":  "icu_collation", 
      "language": "ja", 
      "country": "JP" 
      } 
     }, 
     "analyzer": { 
     "trigrams": { 
      "tokenizer": "my_ngram_tokenizer", 
      "filter": "lowercase" 
     }, 
     "japanese_ordering": { 
      "tokenizer": "keyword", 
      "filter": [ "japanese_ordering" ] 
     } 
     }, 
     "tokenizer": { 
     "my_ngram_tokenizer": { 
      "type": "nGram", 
      "min_gram": "3", 
      "max_gram": "3", 
      "token_chars": [ 
      "letter", 
      "digit", 
      "symbol", 
      "punctuation" 
      ] 
     } 
     } 
    } 
    }, 
    "mappings" : { 
    "product" : { 
     "_all" : { 
     "enabled" : false 
     }, 
     "properties" : { 
     "name" : { 
      "type" : "string", 
      "analyzer": "trigrams", 
      "fields": { 
      "value" : { 
       "type": "string", 
       "index": "not_analyzed" 
      } 
      } 
     }, 
     "record_status" : { 
      "type" : "integer" 
     }, 
     "categories" : { 
      "type" : "integer" 
     }, 
     "variant_status" : { 
      "type" : "integer" 
     }, 
     "visit_count" : { 
      "type" : "integer" 
     }, 
     "translations": { 
      "type": "nested", 
      "properties": { 
      "name": { 
       "type": "string", 
       "fields": { 
       "jp_sort": { 
        "type":  "string", 
        "analyzer": "japanese_ordering" 
       } 
       } 
      }, 
      "language_id": { 
       "type": "short" 
      } 
      } 
     } 
     } 
    } 
    } 
}

And this is the query that trips the circuit breaker:

{ 
    "from": 0, 
    "size": 20, 
    "query": { 
     "bool": { 
      "should": [], 
      "must_not": [], 
      "must": [{ 
       "nested": { 
        "path": "translations", 
        "score_mode": "max", 
        "query": { 
         "bool": { 
          "must": [{ 
           "match": { 
            "translations.name": { 
             "query": "\u30C6\u30B9\u30C8", 
             "boost": 5 
            } 
           } 
          }] 
         } 
        } 
       } 
      }] 
     } 
    }, 
    "filter": { 
     "bool": { 
      "must": [{ 
       "terms": { 
        "variant_status": ["1"], 
        "_cache": true 
       } 
      }, { 
       "nested": { 
        "path": "translations", 
        "query": { 
         "bool": { 
          "must": [{ 
           "term": { 
            "translations.language_id": 9, 
            "_cache": true 
           } 
          }] 
         } 
        } 
       } 
      }, { 
       "term": { 
        "record_status": 1, 
        "_cache": true 
       } 
      }], 
      "must_not": [{ 
       "term": { 
        "product_collections": 0 
       } 
      }] 
     } 
    }, 
    "sort": [{ 
     "translations.name.jp_sort": { 
      "order": "asc", 
      "nested_path": "translations" 
     } 
    }] 
}

Source

2017-07-10 Chris Barcroft

ES 5.5 introduced a new field type called 'icu_collation_keyword' that solves the issue you're facing. You can read more about it here: https://www.elastic.co/blog/elasticsearch-5-5-0-released – Val

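Indeed, that did solve it. I spent a few hours updating my queries and indexer for the version change, and then tried icu_collation_keyword. It works well, and it's very fast! If you'd like to submit your comment as an answer, I'll mark it as accepted. Thanks! –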

ES 5.5 introduced a new field type called icu_collation_keyword that solves the issue you're facing.

You can read more about it here: https://www.elastic.co/blog/elasticsearch-5-5-0-released
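As a minimal sketch of what that could look like for the mapping in the question, the snippet below builds the nested `translations` mapping with `jp_sort` as an `icu_collation_keyword` sub-field and prints it as JSON. It assumes Elasticsearch 5.5 or later with the analysis-icu plugin installed; the field names are taken from the question, while switching `name` to the `text` type and setting `"index": false` on the sort field are illustrative choices, not something confirmed by the original poster.

```python
import json

# Sketch of the nested "translations" mapping for Elasticsearch 5.5+,
# assuming the analysis-icu plugin is installed on every node.
translations_mapping = {
    "type": "nested",
    "properties": {
        "name": {
            "type": "text",  # 5.x replacement for an analyzed "string"
            "fields": {
                "jp_sort": {
                    # Collation keys are stored in doc values, so sorting
                    # on this field does not need to load fielddata.
                    "type": "icu_collation_keyword",
                    "index": False,  # assumption: only sorted on, never searched
                    "language": "ja",
                    "country": "JP",
                }
            },
        },
        "language_id": {"type": "short"},
    },
}

# The sort clause from the question, unchanged.
sort_clause = [
    {
        "translations.name.jp_sort": {
            "order": "asc",
            "nested_path": "translations",
        }
    }
]

print(json.dumps({"translations": translations_mapping}, indent=2))
print(json.dumps({"sort": sort_clause}, indent=2))
```

With a field of this type, the custom `japanese_ordering` filter and analyzer from the 2.4 settings should no longer be needed for sorting, since the language and country are set on the field itself.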

Source

2017-07-11 15:59:10 Val
