2013-04-25 182 views
9

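Symbols in query string for Elasticsearch

I have "documents" (ActiveRecords) with an attribute called deviations. The attribute has values like "Bin X", "Bin $", "Bin q", "Bin %", etc.

I am trying to search the attribute using Tire/Elasticsearch. I am using the whitespace analyzer to index the deviation attribute. Here is my code for creating the indexes: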

settings :analysis => {
  :filter => {
    :ngram_filter => {
      :type => "nGram",
      :min_gram => 2,
      :max_gram => 255
    },
    :deviation_filter => {
      :type => "word_delimiter",
      :type_table => ['$ => ALPHA']
    }
  },
  :analyzer => {
    :ngram_analyzer => {
      :type => "custom",
      :tokenizer => "standard",
      :filter => ["lowercase", "ngram_filter"]
    },
    :deviation_analyzer => {
      :type => "custom",
      :tokenizer => "whitespace",
      :filter => ["lowercase"]
    }
  }
} do
  mapping do
    indexes :id, :type => 'integer'
    [:equipment, :step, :recipe, :details, :description].each do |attribute|
      indexes attribute, :type => 'string', :analyzer => 'ngram_analyzer'
    end
    indexes :deviation, :analyzer => 'whitespace'
  end
end

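The search seems to work fine when the query string contains no special characters. For example, Bin X returns only those records that contain both of the words Bin AND X. However, searching for something like Bin $ or Bin % returns every result that contains the word Bin, almost ignoring the symbol (results with the symbol do rank higher than results without it).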
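One way to picture the symptom is the rough plain-Ruby simulation below. It is not Elasticsearch's actual analysis code, and the helper names `whitespace_tokens` and `alnum_tokens` are hypothetical. It assumes, without confirmation from this thread, that the query text ends up going through an analyzer that discards symbols, while the indexed field was split on whitespace only.

```ruby
# Whitespace-style tokenization: split on whitespace only, so symbols survive.
def whitespace_tokens(text)
  text.split
end

# Hypothetical symbol-dropping tokenization: keep only runs of letters/digits.
# This merely imitates an analyzer that throws punctuation away.
def alnum_tokens(text)
  text.scan(/[[:alnum:]]+/)
end

p whitespace_tokens("Bin $")  # ["Bin", "$"]
p alnum_tokens("Bin $")       # ["Bin"]
p whitespace_tokens("Bin X")  # ["Bin", "X"]
p alnum_tokens("Bin X")       # ["Bin", "X"]
```

Under that assumption, `Bin X` keeps both terms on either path, whereas `Bin $` degrades to a search for `Bin` alone, which would match the behavior described.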

Here is the search method I created:

def self.search(params)
  tire.search(load: true) do
    query { string "#{params[:term].downcase}:#{params[:query]}", default_operator: "AND" }
    size 1000
  end
end

And here is how I built the search form:

<div>
  <%= form_tag issues_path, :class => "formtastic issue", method: :get do %>
    <fieldset class="inputs">
      <ol>
        <li class="string input medium search query optional stringish inline">
          <% opts = ["Description", "Detail", "Deviation", "Equipment", "Recipe", "Step"] %>
          <%= select_tag :term, options_for_select(opts, params[:term]) %>
          <%= text_field_tag :query, params[:query] %>
          <%= submit_tag "Search", name: nil, class: "btn" %>
        </li>
      </ol>
    </fieldset>
  <% end %>
</div>
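The search method above interpolates the selected field and the raw query text into a single Lucene query string. In Lucene query syntax a field prefix applies only to the term directly after it, so in `deviation:Bin $` only `Bin` is scoped to `deviation` and `$` is left to the default field. Whether that accounts for the behavior seen here is an assumption. The sketch below uses two hypothetical helpers (neither is part of Tire) to show the string that gets built and a grouped variant that scopes every word to the chosen field.

```ruby
# Hypothetical helper mirroring the interpolation used in self.search.
def build_query_string(term, query)
  "#{term.downcase}:#{query}"
end

# Hypothetical variant: parentheses apply the field to every word in the query.
# Not something proposed in the thread; shown only for comparison.
def build_grouped_query_string(term, query)
  "#{term.downcase}:(#{query})"
end

puts build_query_string("Deviation", "Bin $")          # deviation:Bin $
puts build_grouped_query_string("Deviation", "Bin $")  # deviation:(Bin $)
```

Any special characters in the query text would still need escaping before being placed inside the parentheses.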
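Don't you just escape the characters that have meaning in Lucene with a backslash? Of course, in a Ruby string you need a double backslash \\ so that it is escaped in Ruby before the character reaches the Elasticsearch API. I haven't tried Tire, so I don't know whether it applies in your world. FYI, here is a quick reference of the affected characters: http://docs.lucidworks.com/display/lweug/Escaping+Special+Syntax+Characters – Phil 2013-04-26 13:39:26

I don't think that is the problem, because queries for Bin $ or Bin % are affected, yet those characters are not listed as special characters at the link above. – Arnob 2013-04-26 17:48:15

I know from my own database full-text searching (Oracle, I think it was, and MySQL for LIKE tests on varchar or text fields) that % is a 'match all' character. Maybe the link above is incomplete, or unrelated to your problem. Have you tried escaping to see whether it solves the problem? – Phil 2013-04-27 18:34:36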

Answers

24

You should sanitize your query string. Here is a sanitizer that has worked for everything I have tried throwing at it:

def sanitize_string_for_elasticsearch_string_query(str)
  # Escape special characters
  # http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html#Escaping Special Characters
  escaped_characters = Regexp.escape('\\/+-&|!(){}[]^~*?:')
  str = str.gsub(/([#{escaped_characters}])/, '\\\\\1')

  # AND, OR and NOT are used by lucene as logical operators. We need
  # to escape them
  ['AND', 'OR', 'NOT'].each do |word|
    escaped_word = word.split('').map { |char| "\\#{char}" }.join('')
    str = str.gsub(/\s*\b(#{word.upcase})\b\s*/, " #{escaped_word} ")
  end

  # Escape odd quotes: with an odd number of double quotes, escape the
  # last one. The regex has only two capture groups, so the text after
  # the quote is \2 (\3 would silently drop it).
  quote_count = str.count '"'
  str = str.gsub(/(.*)"(.*)/, '\1\"\2') if quote_count % 2 == 1

  str
end

params[:query] = sanitize_string_for_elasticsearch_string_query(params[:query]) 
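To see what the first step of the sanitizer does on its own, here is a minimal sketch of just the special-character escaping. The helper name `escape_specials` and the constant `SPECIAL_CHARACTERS` are hypothetical, and the block form of `gsub` is used only to avoid counting backslashes in the replacement string.

```ruby
# Same character list as in the sanitizer above.
SPECIAL_CHARACTERS = Regexp.escape('\\/+-&|!(){}[]^~*?:')

# Hypothetical helper: prefix every listed character with one backslash.
def escape_specials(str)
  str.gsub(/([#{SPECIAL_CHARACTERS}])/) { "\\#{$1}" }
end

puts escape_specials("123/345")  # 123\/345
puts escape_specials("Bin (X)")  # Bin \(X\)
puts escape_specials("Bin $")    # Bin $
```

Note that `$` and `%` are not in the list, so `Bin $` comes back unchanged; this step alone does not alter the queries from the question.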
I needed to add the forward slash to the 'escaped_characters' list as well: 'escaped_characters = Regexp.escape('\\+-&|!(){}[]^~*?:\/')', because it was breaking on strings containing a forward slash. – rubyprince 2013-06-27 12:19:14

That's odd, because '/' is not a special character in Lucene: http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html#Escaping%20Special%20Characters – 2013-06-27 13:19:56

Hi, please see http://50.16.250.253:9200/locations/location/_search?q=123%2F345 .. I think the error occurs because the '/' is inside the string... when I escaped it with '\\', the error was resolved: http://50.16.250.253:9200/locations/location/_search?q=123%5C%2F345 – rubyprince 2013-07-01 11:58:30
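The two URLs in the last comment differ only by a percent-encoded backslash (`%5C`) in front of the encoded slash (`%2F`). As a small sketch using Ruby's standard `uri` library (the variable names are illustrative only), this is how the raw and backslash-escaped values map to those `q=` parameters:

```ruby
require 'uri'

raw     = "123/345"
escaped = "123\\/345"   # Ruby literal for 123\/345 (one backslash)

# Percent-encode each value for use as a query parameter.
puts URI.encode_www_form_component(raw)      # 123%2F345
puts URI.encode_www_form_component(escaped)  # 123%5C%2F345

# Decoding goes the other way and recovers the escaped query text.
puts URI.decode_www_form_component("123%5C%2F345")  # 123\/345
```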