2016-11-18 79 views
Elasticsearch: using a shingle filter with synonyms

I have the following documents:

  • south africa
  • north africa

I want to retrieve my "south africa" document with any of these queries:

  • s africa (a)
  • southafrica (b)
  • safrica (c)

I have defined the following filters and analyzers:

POST test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "south,s",
            "north,n"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "token_separator": ""
        }
      },
      "analyzer": {
        "my_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["shingle_filter"]
        },
        "my_shingle_synonym": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["shingle_filter", "synonym_filter"]
        },
        "my_synonym_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["synonym_filter", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {}
}

1) With my_shingle, south africa is indexed as: south, southafrica, africa

2) With my_shingle_synonym, south africa is indexed as: south, s, southafrica, africa

3) With my_synonym_shingle, south africa is indexed as: south, souths, southsafrica, s, safrica, africa
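
The three token streams above can be reproduced with a small simulation, which makes it easier to see why the filter order matters. This is only a sketch under stated assumptions: whitespace splitting stands in for the standard tokenizer, synonyms are emitted as a flat stream right after the original token, and the names synonym_filter, shingle_filter, analyze and ANALYZERS are illustrative Python helpers, not Elasticsearch or Lucene APIs.

```python
from typing import Callable, Dict, List

# Synonym groups taken from the question's synonym_filter ("south,s", "north,n").
SYNONYMS: Dict[str, List[str]] = {
    "south": ["s"], "s": ["south"],
    "north": ["n"], "n": ["north"],
}


def synonym_filter(tokens: List[str]) -> List[str]:
    """Emit each token followed by its synonyms as one flat stream (simplification)."""
    out: List[str] = []
    for tok in tokens:
        out.append(tok)
        out.extend(SYNONYMS.get(tok, []))
    return out


def shingle_filter(tokens: List[str], min_size: int = 2, max_size: int = 3) -> List[str]:
    """Emit each unigram, then the shingles starting at it, joined with an empty separator."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append("".join(tokens[i:i + size]))
    return out


def analyze(text: str, filters: List[Callable[[List[str]], List[str]]]) -> List[str]:
    """Split on whitespace (stand-in for the standard tokenizer), then apply filters in order."""
    tokens = text.split()
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


# Filter chains mirroring the three analyzers defined in the index settings.
ANALYZERS: Dict[str, List[Callable[[List[str]], List[str]]]] = {
    "my_shingle": [shingle_filter],
    "my_shingle_synonym": [shingle_filter, synonym_filter],
    "my_synonym_shingle": [synonym_filter, shingle_filter],
}

if __name__ == "__main__":
    for name, filters in ANALYZERS.items():
        print(name, analyze("south africa", filters))
```

In this model, applying synonyms after shingling only expands the unigram south, so safrica is never produced, while applying synonyms first feeds s into the shingle filter as if it were an ordinary neighbouring token, which yields souths and southsafrica and drops southafrica.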

Therefore, with

  • (1) I will only find b

  • (2) I will find a, b

  • (3) I will find a, c

I want south africa to be indexed as: south, s, southafrica, safrica, africa

Answer

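To match every variant, all the possible tokens you listed need to be indexed. Your problem can be solved by using different analyzers on multi fields.

You can define the mapping for the required field like this.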

"mappings": {
  "your_mapping": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "my_shingle",
        "fields": {
          "synonym": {
            "type": "string",
            "analyzer": "my_synonym_shingle"
          }
        }
      }
    }
  }
}

Index a sample document:

PUT test_index/your_mapping/1 
{ 
    "name" : "south africa" 
} 

Then query all the variants of the name field by using a wildcard expression in the field list.

GET test_index/your_mapping/_search
{
  "query": {
    "query_string": {
      "fields": [
        "name*"
      ],
      "query": "safrica"
    }
  }
}
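
As a rough check of why searching both sub-fields covers the three spellings, the sketch below hardcodes the token lists the question reports for my_shingle (field name) and my_synonym_shingle (field name.synonym) and tests which field a query would hit. It is a simplification, not Elasticsearch behaviour in full: query terms are only split on whitespace and combined with OR, no query-side shingles or synonyms are applied, and INDEXED and matching_fields are hypothetical names used for illustration.

```python
from typing import Dict, List, Set

# Tokens indexed for "south africa", as listed in the question for each analyzer.
INDEXED: Dict[str, Set[str]] = {
    "name": {"south", "southafrica", "africa"},  # my_shingle
    "name.synonym": {"south", "souths", "southsafrica", "s", "safrica", "africa"},  # my_synonym_shingle
}


def matching_fields(query: str) -> List[str]:
    """Return the fields in which at least one whitespace-separated query term is indexed."""
    terms = query.split()
    return sorted(
        field for field, tokens in INDEXED.items()
        if any(term in tokens for term in terms)
    )


if __name__ == "__main__":
    for q in ("s africa", "southafrica", "safrica"):
        print(q, "->", matching_fields(q))
```

Under these assumptions, southafrica is found only through name and safrica only through name.synonym, which is why the query targets name* rather than a single field.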