I think the Edge NGram tokenizer/filter will help here.
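You can use one analyzer at index time and a different one at search time. The index analyzer only lowercases the text and produces edge n-grams. The search analyzer adds a Word Delimiter filter, which takes care of splitting up your query. Note that you could drop the Word Delimiter filter and use the Standard tokenizer instead of Whitespace; it would split on both whitespace and commas. The Word Delimiter filter simply gives you finer control over how tokens are split.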
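To make the difference between the two chains concrete, here is a rough pure-Python imitation of them. It is not Elasticsearch code: the word delimiter step is simplified to splitting on non-alphanumeric characters while keeping the original token (mimicking preserve_original), and the catenate_* options are ignored. Also note that because lowercase runs before word_delimiter_filter in the search chain, split_on_case_change has no uppercase letters left to act on.

```python
import re

def whitespace_tokenize(text):
    # Like the whitespace tokenizer: split on whitespace only, punctuation stays attached.
    return text.split()

def edge_ngrams(token, min_gram=3, max_gram=3):
    # Prefixes whose length is in [min_gram, max_gram]; shorter tokens yield nothing.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def simplified_word_delimiter(token):
    # Simplified stand-in for word_delimiter: split on non-alphanumerics and,
    # like preserve_original, also keep the untouched token when it was split.
    parts = re.findall(r"[a-z0-9]+", token)
    return parts if parts == [token] else [token] + parts

def index_analyzer(text):
    # my_edge_ngram_analyzer: whitespace -> lowercase -> edge_ngram(3, 3)
    return [g for tok in whitespace_tokenize(text) for g in edge_ngrams(tok.lower())]

def search_analyzer(text):
    # my_edge_ngram_search_analyzer: whitespace -> lowercase -> word delimiter -> edge_ngram(3, 3)
    terms = []
    for tok in whitespace_tokenize(text):
        for part in simplified_word_delimiter(tok.lower()):
            terms.extend(edge_ngrams(part))
    return terms

print(index_analyzer("Rodriguez"))  # ['rod']
print(sorted(set(search_analyzer("Smith, Rodriguez, ROBERTS, doe"))))  # ['doe', 'rob', 'rod', 'smi']
```

The search-side terms (smi, rod, rob, doe) line up with what the index analyzer produces for each stored surname, which is what lets every name in the comma-separated query hit its document.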
You can always use the _analyze API to test how your tokenization works.
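For example, a request like the one below should list the terms the search analyzer emits for the sample query. This sketch only builds the request with Python's standard urllib and does not send it; the cluster address http://localhost:9200 is an assumption, while the index name edge_test and the analyzer name come from this answer. It also assumes an Elasticsearch version that accepts the analyzer and text parameters in a JSON request body.

```python
import json
import urllib.request

def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a POST /<index>/_analyze request."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_analyze_request(
    "http://localhost:9200",  # assumed cluster address
    "edge_test",
    "my_edge_ngram_search_analyzer",
    "Smith, Rodriguez, ROBERTS, doe",
)

# Sending it requires a running cluster; the response lists terms under "tokens":
# with urllib.request.urlopen(req) as resp:
#     print([t["token"] for t in json.load(resp)["tokens"]])
```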
Index settings:
{
  "settings": {
    "analysis": {
      "filter": {
        "word_delimiter_filter": {
          "preserve_original": "true",
          "catenate_words": "true",
          "catenate_all": "true",
          "split_on_case_change": "true",
          "type": "word_delimiter",
          "catenate_numbers": "true",
          "stem_english_possessive": "false"
        },
        "edgengram_filter": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      },
      "analyzer": {
        "my_edge_ngram_analyzer": {
          "filter": [
            "lowercase",
            "edgengram_filter"
          ],
          "type": "custom",
          "tokenizer": "whitespace"
        },
        "my_edge_ngram_search_analyzer": {
          "filter": [
            "lowercase",
            "word_delimiter_filter",
            "edgengram_filter"
          ],
          "type": "custom",
          "tokenizer": "whitespace"
        }
      }
    }
  }
}
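With min_gram and max_gram both set to 3, every token is reduced to its first three characters, and tokens shorter than three characters produce no term at all. The plain-Python sketch below (not Elasticsearch code) shows both effects; "Robinson" and "do" are hypothetical inputs that are not part of the test data. If three characters turn out to be too coarse, raising max_gram keeps longer prefixes as well.

```python
from collections import defaultdict

def edge_ngrams(token, min_gram, max_gram):
    # Prefixes with length in [min_gram, max_gram]; tokens shorter than min_gram yield nothing.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def collisions(surnames, min_gram, max_gram):
    # Group surnames by each edge n-gram they produce and keep only grams shared by 2+ names.
    by_gram = defaultdict(set)
    for name in surnames:
        for gram in edge_ngrams(name.lower(), min_gram, max_gram):
            by_gram[gram].add(name)
    return {g: sorted(names) for g, names in by_gram.items() if len(names) > 1}

names = ["Smith", "Rodriguez", "Roberts", "Doe", "Robinson"]  # "Robinson" is hypothetical
print(collisions(names, 3, 3))        # {'rob': ['Roberts', 'Robinson']}
print(edge_ngrams("do", 3, 3))        # []
print(edge_ngrams("roberts", 3, 5))   # ['rob', 'robe', 'rober']
```

So with this configuration a query for Roberts would also match a Robinson document, and a two-letter query such as "do" would not match anything through surname_edgengrams.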
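Mapping (this uses the pre-5.0 string type; on Elasticsearch 5.x and later, surname_edgengrams would be a text field and surname a keyword field):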
{
  "properties": {
    "surname_edgengrams": {
      "type": "string",
      "analyzer": "my_edge_ngram_analyzer",
      "search_analyzer": "my_edge_ngram_search_analyzer"
    },
    "surname": {
      "type": "string",
      "index": "not_analyzed",
      "copy_to": [
        "surname_edgengrams"
      ]
    }
  }
}
I indexed a few documents using the bulk API:
{ "index" : { "_index" : "edge_test", "_type" : "test_mapping", "_id" : "1" } }
{ "surname" : "Smith" }
{ "index" : { "_index" : "edge_test", "_type" : "test_mapping", "_id" : "2" } }
{ "surname" : "Rodriguez" }
{ "index" : { "_index" : "edge_test", "_type" : "test_mapping", "_id" : "3" } }
{ "surname" : "Roberts" }
{ "index" : { "_index" : "edge_test", "_type" : "test_mapping", "_id" : "4" } }
{ "surname" : "Doe" }
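If you have more than a handful of surnames, the bulk body above can be generated instead of typed. A minimal sketch, assuming the same index and type names as above (the _type field matches the older Elasticsearch versions this answer targets) and sequential string IDs starting at 1:

```python
import json

def bulk_body(surnames, index="edge_test", doc_type="test_mapping"):
    """Build an NDJSON body for the _bulk API: one action line plus one source line per doc."""
    lines = []
    for doc_id, surname in enumerate(surnames, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"surname": surname}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body(["Smith", "Rodriguez", "Roberts", "Doe"])
print(body)
```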
Then I ran the following search query:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "surname_edgengrams": {
              "query": "Smith, Rodriguez, ROBERTS, doe",
              "boost": 3
            }
          }
        }
      ]
    }
  }
}
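A match query runs the query string through the field's search analyzer and, with its default or operator, returns any document containing at least one of the resulting terms. The rough simulation below (plain Python, with both analyzers reduced to "lowercase, split on non-alphanumerics, keep the first three characters of each piece") illustrates why all four documents come back. It says nothing about the exact scores, which come from Lucene's similarity scoring.

```python
import re

def terms(text):
    # Simplified stand-in for both analyzers: lowercase, split on non-alphanumerics,
    # then a single 3-character edge n-gram (pieces shorter than 3 chars are dropped).
    return {piece[:3] for piece in re.findall(r"[a-z0-9]+", text.lower()) if len(piece) >= 3}

docs = {"1": "Smith", "2": "Rodriguez", "3": "Roberts", "4": "Doe"}
query_terms = terms("Smith, Rodriguez, ROBERTS, doe")

# OR semantics: a document matches if it shares at least one term with the query.
matches = {doc_id: sorted(terms(surname) & query_terms)
           for doc_id, surname in docs.items()
           if terms(surname) & query_terms}
print(matches)  # {'1': ['smi'], '2': ['rod'], '3': ['rob'], '4': ['doe']}
```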
Results:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.14085768,
    "hits": [
      {
        "_index": "edge_test",
        "_type": "test_mapping",
        "_id": "1",
        "_score": 0.14085768,
        "_source": {
          "surname": "Smith"
        }
      },
      {
        "_index": "edge_test",
        "_type": "test_mapping",
        "_id": "3",
        "_score": 0.14085768,
        "_source": {
          "surname": "Roberts"
        }
      },
      {
        "_index": "edge_test",
        "_type": "test_mapping",
        "_id": "2",
        "_score": 0.13145615,
        "_source": {
          "surname": "Rodriguez"
        }
      },
      {
        "_index": "edge_test",
        "_type": "test_mapping",
        "_id": "4",
        "_score": 0.065728076,
        "_source": {
          "surname": "Doe"
        }
      }
    ]
  }
}