2016-12-04 80 views

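Elasticsearch: filtering characters in keyword and text data types

I need to replace some characters in indexed fields of both the keyword and text data types. To do this, I want to add the same character mapping filter to both the keyword analyzer and the language-specific analyzer in my project. The solution I chose is to create two custom analyzers that extend the language-specific analyzer and the keyword analyzer, and then use them on my fields. Here is my implementation: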

"analysis": {
  "analyzer": {
    "persian_text_analyzer": {
      "type": "persian",
      "char_filter": [
        "arabic_to_persian_filter"
      ]
    },
    "persian_keyword_analyzer": {
      "type": "keyword",
      "char_filter": [
        "arabic_to_persian_filter"
      ]
    }
  },
  "char_filter": {
    "arabic_to_persian_filter": {
      "type": "mapping",
      "mappings": [
        "\u0660 => 0",
        "\u0661 => 1",
        "\u0662 => 2",
        "\u0663 => 3",
        "\u0664 => 4",
        "\u0665 => 5",
        "\u0666 => 6",
        "\u0667 => 7",
        "\u0668 => 8",
        "\u0669 => 9",
        "\u064a => \u06cc",
        "\u0643 => \u06a9"
      ]
    }
  }
}

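But it doesn't work. Is it possible to extend an existing analyzer and add some filters without overriding its existing filters?

If not, what can I do to solve my problem?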

What does '$endpoint/$index/_analyze?analyzer=persian_keyword_analyzer&text=0' say? – jasonz

@jasonz Here is the result: {"tokens":[{"token":"0","start_offset":0,"end_offset":1,"type":"word","position":0}]} –

Did you update the index settings? Because the result should be '{"tokens":[{"token":"0","start_offset":0,"end_offset":1,"type":"word","position":0}]}', at least on ES v2.3.5. – jasonz
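
The check suggested in the comments can also be scripted. Below is a minimal sketch in Python that only builds the `_analyze` request, assuming an Elasticsearch version whose `_analyze` endpoint accepts a JSON body with `analyzer` and `text`. The base URL `http://localhost:9200`, the index name `my_index`, and the helper `build_analyze_request` are hypothetical and not part of the original post.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a POST request to the index's _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and index name; "\u0660" is the Arabic-Indic digit zero.
    req = build_analyze_request(
        "http://localhost:9200", "my_index", "persian_keyword_analyzer", "\u0660"
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # Sending it requires a running cluster, for example:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Sending an Arabic-Indic digit instead of an ASCII `0` makes it visible whether the char filter is actually applied, because the returned token should then differ from the input.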

Answer

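Thanks to @jasonz, I have solved my problem. I had 2 mistakes:

1- In this definition:

"persian_keyword_analyzer": {
  "type": "keyword",

the type must be custom, since only a custom analyzer accepts char_filter, tokenizer, and filter settings.
2- A field of the keyword data type cannot have an analyzer.

So here is my final configuration: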

"settings": {
  "analysis": {
    "char_filter": {
      "zero_width_spaces": {
        "type": "mapping",
        "mappings": [ "\u200c => \u0020" ]
      },
      "arabic_to_persian_filter": {
        "type": "mapping",
        "mappings": [
          "\u0660 => 0",
          "\u0661 => 1",
          "\u0662 => 2",
          "\u0663 => 3",
          "\u0664 => 4",
          "\u0665 => 5",
          "\u0666 => 6",
          "\u0667 => 7",
          "\u0668 => 8",
          "\u0669 => 9",
          "\u064a => \u06cc",
          "\u0643 => \u06a9"
        ]
      }
    },
    "filter": {
      "persian_stop": {
        "type": "stop",
        "stopwords": "_persian_"
      }
    },
    "analyzer": {
      "persian_text_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "char_filter": [
          "zero_width_spaces",
          "arabic_to_persian_filter"
        ],
        "filter": [
          "lowercase",
          "arabic_normalization",
          "persian_normalization",
          "persian_stop"
        ]
      },
      "persian_keyword_analyzer": {
        "type": "custom",
        "tokenizer": "keyword",
        "char_filter": [
          "zero_width_spaces",
          "arabic_to_persian_filter"
        ],
        "filter": [
          "lowercase",
          "arabic_normalization",
          "persian_normalization"
        ]
      }
    }
  }
}
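
To see what the two char filters in this configuration are expected to do to a string, their mappings can be emulated outside Elasticsearch. The following is a rough Python sketch for illustration only. The function name `apply_char_filters` is hypothetical, and the emulation covers just the single-character mappings listed above, not the tokenizer or the token filters.

```python
# Emulates the two "mapping" char filters from the configuration above.
# Illustration only: Elasticsearch applies the real filters at index/search time.

# zero_width_spaces: zero-width non-joiner (U+200C) -> space
ZERO_WIDTH_SPACES = {"\u200c": "\u0020"}

# arabic_to_persian_filter: Arabic-Indic digits and two Arabic letters
ARABIC_TO_PERSIAN = {
    **{chr(0x0660 + d): str(d) for d in range(10)},  # U+0660..U+0669 -> 0..9
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian keheh
}


def apply_char_filters(text: str) -> str:
    """Apply both mappings in the order the analyzers list them."""
    for mapping in (ZERO_WIDTH_SPACES, ARABIC_TO_PERSIAN):
        text = text.translate(str.maketrans(mapping))
    return text


if __name__ == "__main__":
    # Arabic kaf, Arabic yeh, then the Arabic-Indic digits 1, 2, 3
    sample = "\u0643\u064a \u0661\u0662\u0663"
    print(apply_char_filters(sample))
```

Because every mapping key here is a single character, a plain character translation is enough. Multi-character mapping keys would need a different approach.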