2015-03-30 84 views
6

Elasticsearch synonym analyzer not working

Edit: to add to this, synonyms do seem to work with a basic query_string query.

"query_string" : { 
    "default_field" : "location.region.name.raw", 
    "query" : "nh" 
} 

This returns all the results for New Hampshire, but a "match" query for "nh" returns no results.


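I am trying to add synonyms to the location field in my Elasticsearch index, so that a location search for "Mass", "Ma" or "Massachusetts" gives me the same results every time. I added the synonym filter to my settings and changed the mapping for location. Here are my settings: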

"analysis":{ 
    "analyzer":{ 
     "synonyms":{ 
      "filter":[ 
       "lowercase", 
       "synonym_filter" 
      ], 
     "tokenizer": "standard" 
    } 
}, 
"filter":{ 
    "synonym_filter":{ 
     "type": "synonym", 
     "synonyms":[ 
      "United States,US,USA,USA=>usa", 
      "Alabama,Al,Ala,Ala", 
      "Alaska,Ak,Alas,Alas", 
      "Arizona,Az,Ariz", 
      "Arkansas,Ar,Ark", 
      "California,Ca,Calif,Cal", 
      "Colorado,Co,Colo,Col", 
      "Connecticut,Ct,Conn", 
      "Deleware,De,Del", 
      "District of Columbia,Dc,Wash Dc,Washington Dc=>Dc", 
      "Florida,Fl,Fla,Flor", 
      "Georgia,Ga", 
      "Hawaii,Hi", 
      "Idaho,Id,Ida", 
      "Illinois,Il,Ill,Ills", 
      "Indiana,In,Ind", 
      "Iowa,Ia,Ioa", 
      "Kansas,Kans,Kan,Ks", 
      "Kentucky,Ky,Ken,Kent", 
      "Louisiana,La", 
      "Maine,Me", 
      "Maryland,Md", 
      "Massachusetts,Ma,Mass", 
      "Michigan,Mi,Mich", 
      "Minnesota,Mn,Minn", 
      "Mississippi,Ms,Miss", 
      "Missouri,Mo", 
      "Montana,Mt,Mont", 
      "Nebraska,Ne,Neb,Nebr", 
      "Nevada,Nv,Nev", 
      "New Hampshire,Nh=>Nh", 
      "New Jersey,Nj=>Nj", 
      "New Mexico,Nm,N Mex,New M=>Nm", 
      "New York,Ny=>Ny", 
      "North Carolina,Nc,N Car=>Nc", 
      "North Dakota,Nd,N Dak, NoDak=>Nd", 
      "Ohio,Oh,O", 
      "Oklahoma,Ok,Okla", 
      "Oregon,Or,Oreg,Ore", 
      "Pennsylvania,Pa,Penn,Penna", 
      "Rhode Island,Ri,Ri & PP,R Isl=>Ri", 
      "South Carolina,Sc,S Car=>Sc", 
      "South Dakota,Sd,S Dak,SoDak=>Sd", 
      "Tennessee,Te,Tenn", 
      "Texas,Tx,Tex", 
      "Utah,Ut", 
      "Vermont,Vt", 
      "Virginia,Va,Virg", 
      "Washington,Wa,Wash,Wn", 
      "West Virginia,Wv,W Va, W Virg=>Wv", 
      "Wisconsin,Wi,Wis,Wisc", 
      "Wyomin,Wi,Wyo" 
     ] 
    } 
} 
} 

And the mapping for the location.region field:

"region":{ 
    "properties":{ 
     "id":{"type": "long"}, 
     "name":{ 
      "type": "string", 
      "analyzer": "synonyms", 
      "fields":{"raw":{"type": "string", "index": "not_analyzed" }} 
     } 
    } 
} 

But the synonyms analyzer doesn't seem to be doing anything. Take this query, for example:

"match" : { 
    "location.region.name" : { 
     "query" : "Massachusetts", 
     "type" : "phrase", 
     "analyzer" : "synonyms" 
    } 
} 

This returns hundreds of results, but if I replace "Massachusetts" with "Ma" or "Mass" I get 0 results. Why isn't it working?

Answers

10

The order of the filters is

"filter":[ 
    "lowercase", 
    "synonym_filter" 
] 

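So Elasticsearch lowercases the tokens first. By the time it runs the second step, synonym_filter, a lowercased token such as "ma" no longer matches any of the defined entries, because those entries are written with capital letters ("Ma", "Mass").

To fix this, I would define the synonyms in lowercase.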
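A minimal sketch of that fix, assuming the rules are kept in a Python list before they are written into the index settings. The helper name `lowercase_rules` is hypothetical and not part of Elasticsearch or its clients. It lowercases each rule as a whole string, which leaves the `,` and `=>` syntax untouched because those contain no letters.

```python
import json


def lowercase_rules(rules):
    """Return the synonym rules with every term lowercased.

    Hypothetical helper, not an Elasticsearch API. Lowercasing the whole
    rule string is safe because ',' and '=>' contain no letters.
    """
    return [rule.lower() for rule in rules]


# A few rules copied from the question's settings.
rules = [
    "Massachusetts,Ma,Mass",
    "New Hampshire,Nh=>Nh",
    "United States,US,USA,USA=>usa",
]

lowered = lowercase_rules(rules)

# Print a JSON array that can be pasted into the "synonyms" setting.
print(json.dumps(lowered, indent=2))
```

The index has to be recreated or reindexed after the settings change, since documents indexed with the old analyzer keep their old tokens.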

+0

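I'd like to ask a question about how the filters work. How does an analyzer use its filters? In this example, the lowercase filter runs and returns tokens, those tokens are picked up by synonym_filter, and synonym_filter runs and returns the newly filtered tokens. Is that scenario correct, or how does it work? – hkulekci 2015-04-30 07:09:16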

+0

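Yes, the scenario you describe is correct :) In general, the tokenizer (in this case the standard tokenizer) is executed first, and then the tokens are filtered in the order defined (in this case lowercase first, then synonym_filter). The documentation explains it well: http://www.elastic.co/guide/en/elasticsearch/reference/1.5/analysis-analyzers.html – moliware 2015-04-30 07:28:42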

+0

OK, thanks. I had read them :) I just wanted to be sure. – hkulekci 2015-04-30 07:34:44
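The tokenizer-then-filters order described in these comments can be illustrated with a toy model. This is only a sketch and not Elasticsearch's implementation: `tokenize`, `lowercase`, `synonym_filter` and `analyze` are hypothetical names, the tokenizer just splits on whitespace, and the synonym filter is assumed to handle single-word groups with an exact, case-sensitive lookup.

```python
def tokenize(text):
    # Crude stand-in for the standard tokenizer: split on whitespace only.
    return text.split()


def lowercase(tokens):
    return [token.lower() for token in tokens]


def synonym_filter(groups):
    """Build a case-sensitive synonym filter from single-word groups."""
    table = {}
    for group in groups:
        terms = group.split(",")
        for term in terms:
            table[term] = terms

    def apply(tokens):
        expanded = []
        for token in tokens:
            # Exact, case-sensitive lookup; unknown tokens pass through.
            expanded.extend(table.get(token, [token]))
        return expanded

    return apply


def analyze(text, filters):
    tokens = tokenize(text)
    for token_filter in filters:  # filters run in the order listed
        tokens = token_filter(tokens)
    return tokens


mixed_case = synonym_filter(["Massachusetts,Ma,Mass"])
lower_case = synonym_filter(["massachusetts,ma,mass"])

print(analyze("Ma", [lowercase, mixed_case]))  # ['ma']
print(analyze("Ma", [lowercase, lower_case]))  # ['massachusetts', 'ma', 'mass']
```

With the mixed-case group, the token has already been lowercased to "ma" when the synonym lookup runs, so nothing is expanded. With the lowercase group the same input expands to all three terms.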

0

You can also define the synonym filter to be case-insensitive:

 

    "filter":{ 
     "synonym_filter":{ 
      "type": "synonym", 
      "ignore_case" : "true", 
      "synonyms":[ 
       ... 
      ] 
     } 
    }