How to make Elasticsearch rank exact string matches first when sorting / scoring

I'm using the default analyzer and index settings. So let's say I have this simple mapping (this is just an example, sorry if there are typos):

"question": { 
    "properties": { 
     "title": { 
      "type": "string" 
     }, 
     "answer": { 
      "properties": { 
       "text": { 
        "type": "string" 
       } 
      } 
     } 
    } 
}

Now I run the following search:

GET _search 
{ 
    "query": { 
     "query_string": { 
      "query": "yes correct", 
      "fields": ["answer.text"] 
     } 
    } 
}

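The results rank a text value like "yes correct." (doc ID 1) above a plain "yes correct" (no period, doc ID 181). Both hits have exactly the same score, but the hits array lists the smaller doc ID first. My understanding is that the default index options include sorting by doc ID, so how do I exclude that and still use the rest of the default options?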

I haven't set up any custom analyzers, so everything uses the Elasticsearch 2.0 defaults.
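To make the tie concrete, here is a minimal Python sketch of how equal scores end up ordered. It assumes, as a simplification, that hits are sorted by descending score and that ties fall back to ascending internal document number (Lucene's tie-break within a single shard; that internal number is not necessarily the same as the _id). The Hit type and the score values are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    doc: int      # internal document number (hypothetical values)
    text: str
    score: float


def rank(hits):
    # Descending score first; equal scores fall back to ascending doc number.
    return sorted(hits, key=lambda h: (-h.score, h.doc))


hits = [
    Hit(doc=181, text="yes correct", score=0.5),
    Hit(doc=1, text="yes correct.", score=0.5),
]
for h in rank(hits):
    print(h.doc, repr(h.text), h.score)
```

Because the doc-number order is only a fallback for ties, the way to put the exact match first is to give it a strictly higher score, not to change the sort.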


2015-11-03 user5243421

Note that "fields" should be "default_field", otherwise the query won't work. Both get exactly the same score on my end. Can you show the sample documents you're basing this on? – Val

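Sorry, I think I had a typo in my code. Using "fields" works for me, and changing it to "default_field" doesn't change the hit scores. I also hadn't noticed that the scores were exactly the same. *oops* – user5243421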

My bad, sorry, "fields" is fine of course. I need some coffee :) – Val

This could be a use case for the Dis Max Query:

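A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.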
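The scoring rule in that description can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, boost is treated as a plain multiplier (a simplification), and the subquery scores are made-up numbers rather than real Lucene scores.

```python
def dis_max_score(sub_scores, tie_breaker=0.0, boost=1.0):
    """Best matching subquery score, plus tie_breaker times the scores of
    the other subqueries that also matched, scaled by boost."""
    matching = [s for s in sub_scores if s > 0.0]
    if not matching:
        return 0.0  # no subquery matched, so the document is not a hit
    best = max(matching)
    others = sum(matching) - best
    return boost * (best + tie_breaker * others)


# Hypothetical subquery scores: [exact-match clause, full-text clause]
exact_doc = dis_max_score([0.30, 0.05], tie_breaker=0.7)
loose_doc = dis_max_score([0.00, 0.09], tie_breaker=0.7)
print(exact_doc, loose_doc)
```

With tie_breaker at 0 only the best clause counts; at 1 the scores of all matching clauses are simply summed.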

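So you need to score the answer as an exact match and give that the highest boost. You have to use a custom analyzer for this. Your mapping would be: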

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "question": {
      "properties": {
        "title": {
          "type": "string"
        },
        "answer": {
          "type": "object",
          "properties": {
            "text": {
              "type": "string",
              "analyzer": "my_keyword",
              "fields": {
                "stemmed": {
                  "type": "string",
                  "analyzer": "standard"
                }
              }
            }
          }
        }
      }
    }
  }
}
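To see why the two fields behave differently, here is a rough Python approximation of what each analyzer emits for the two test values. The helpers are hypothetical stand-ins, not Elasticsearch code: asciifolding is approximated by stripping Unicode combining marks, and the standard tokenizer by a simple word regex. Note that the "stemmed" subfield uses the standard analyzer, which does not actually stem.

```python
import re
import unicodedata


def my_keyword_like(text):
    """Approximates the custom my_keyword analyzer: keyword tokenizer
    (whole input is one token), then asciifolding and lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [folded.lower()]


def standard_like(text):
    """Rough stand-in for the standard analyzer: runs of word characters,
    lowercased. The real Unicode tokenizer has more rules."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


for value in ["yes correct.", "yes correct"]:
    print(repr(value), my_keyword_like(value), standard_like(value))
```

The exact-match field keeps the trailing period inside a single token, so "yes correct." and "yes correct" become different terms, while the standard analyzer reduces both to the same two terms.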

Your test data:

PUT /test/question/1 
{ 
    "title": "title nr1", 
    "answer": [ 
    { 
     "text": "yes correct." 
    } 
    ] 
} 

PUT /test/question/2 
{ 
    "title": "title nr2", 
    "answer": [ 
    { 
     "text": "yes correct" 
    } 
    ] 
}

Now, when you search for "yes correct." with a query like this:

POST /test/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.2,
      "queries": [
        {
          "match": {
            "answer.text": {
              "query": "yes correct.",
              "type": "phrase"
            }
          }
        },
        {
          "match": {
            "answer.text.stemmed": {
              "query": "yes correct.",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}

you get output like this:

{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
     "total": 5, 
     "successful": 5, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0.37919715, 
     "hits": [ 
     { 
      "_index": "test", 
      "_type": "question", 
      "_id": "1", 
      "_score": 0.37919715, 
      "_source": { 
       "title": "title nr1", 
       "answer": [ 
        { 
        "text": "yes correct." 
        } 
       ] 
      } 
     }, 
     { 
      "_index": "test", 
      "_type": "question", 
      "_id": "2", 
      "_score": 0.11261705, 
      "_source": { 
       "title": "title nr2", 
       "answer": [ 
        { 
        "text": "yes correct" 
        } 
       ] 
      } 
     } 
     ] 
    } 
}

If you run the same query without the trailing dot, i.e. "yes correct", you get this result:

{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
     "total": 5, 
     "successful": 5, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0.37919715, 
     "hits": [ 
     { 
      "_index": "test", 
      "_type": "question", 
      "_id": "2", 
      "_score": 0.37919715, 
      "_source": { 
       "title": "title nr2", 
       "answer": [ 
        { 
        "text": "yes correct" 
        } 
       ] 
      } 
     }, 
     { 
      "_index": "test", 
      "_type": "question", 
      "_id": "1", 
      "_score": 0.11261705, 
      "_source": { 
       "title": "title nr1", 
       "answer": [ 
        { 
        "text": "yes correct." 
        } 
       ] 
      } 
     } 
     ] 
    } 
}
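The reversal between the two outputs above can be reproduced with a toy model of the dis_max query. Everything here is a hypothetical simplification: the exact clause scores 1.0 on a whole-value match, the full-text clause gives a constant 0.1 when all query terms are present, and the outer boost of 1.2 is left out because it scales every hit equally. Only the ordering, not the numbers, is meant to correspond to the Elasticsearch output.

```python
import re

# Hypothetical stand-ins for the two test documents indexed above
DOCS = {"1": "yes correct.", "2": "yes correct"}


def exact_tokens(text):
    # keyword tokenizer + lowercase: the whole value is a single token
    return [text.lower()]


def word_tokens(text):
    # crude stand-in for the standard analyzer used by answer.text.stemmed
    return [t.lower() for t in re.findall(r"\w+", text)]


def exact_clause(query, text):
    # phrase match on answer.text: with one keyword token this is equality
    return 1.0 if exact_tokens(query) == exact_tokens(text) else 0.0


def full_text_clause(query, text):
    # match with operator "and": every query term must be present (toy score)
    return 0.1 if set(word_tokens(query)) <= set(word_tokens(text)) else 0.0


def dis_max(scores, tie_breaker):
    # best matching clause plus tie_breaker times the other matching clauses
    matching = [s for s in scores if s > 0.0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


def search(query, tie_breaker=0.7):
    hits = []
    for doc_id, text in DOCS.items():
        score = dis_max(
            [exact_clause(query, text), full_text_clause(query, text)], tie_breaker
        )
        if score > 0.0:
            hits.append((doc_id, score))
    # highest score first; equal scores fall back to the id
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


print(search("yes correct."))
print(search("yes correct"))
```

Both documents always satisfy the full-text clause, so it is the exact clause alone that decides which one comes first.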

Hope this is what you're looking for.

By the way, I'd recommend always using the match query for text search. From the documentation:

Comparison to query_string / field

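The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does). Also, the phrase_prefix type can provide a great "as you type" behavior to automatically load search results.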
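As a sketch of that recommendation, the Python helpers below build match-based request bodies as plain dicts. The helper names are hypothetical; the query types themselves (match with an operator, and match_phrase_prefix for the "as you type" case) are standard Elasticsearch query DSL. The first body is the match counterpart of the question's query_string search.

```python
import json


def match_query(field, text, operator="or"):
    """Request body for a match query: the text is analyzed and run as-is,
    with no query-string syntax parsing."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


def as_you_type_query(field, prefix_text):
    """Request body for a match_phrase_prefix query (the last term is
    treated as a prefix), suited to search-as-you-type boxes."""
    return {"query": {"match_phrase_prefix": {field: prefix_text}}}


# The question's search, expressed as a match query on answer.text
print(json.dumps(match_query("answer.text", "yes correct"), indent=2))
print(json.dumps(as_you_type_query("answer.text", "yes cor"), indent=2))
```

Since match never parses operators or wildcards out of the input, user text such as "yes correct." can be passed straight through without escaping.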


2015-11-03 07:30:41

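Thanks for recommending "match". I'm just getting started with Elasticsearch and there's a lot to learn. I thought it would be as simple as setting up an index and starting to query! Haha – user5243421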

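I'm not sure I follow what I would have to use as "something else". I just want an exact match on "yes correct" to sort and/or score higher than something similar like "yes correct.". – user5243421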

I misunderstood you a bit. I'll update my answer shortly. –

Elasticsearch, or more precisely Lucene, scoring does not take the relative positions of tokens into account. It uses 3 different criteria to compute the score:

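Term frequency - how often the search term appears in the document
Inverse document frequency - how often the search term appears across the whole index. The more often a term occurs, the more common it is and the less weight it carries
Field-length norm - the number of tokens in the target field; shorter fields weigh more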

You can read more about it here.
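The three factors above can be sketched with the classic Lucene TF-IDF formulas (the DefaultSimilarity used by default in Elasticsearch 2.x): tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(numTerms). This is a simplified, hypothetical re-implementation: it omits queryNorm, coord, boosts and the lossy one-byte encoding of norms, and the tokenizer is a crude regex stand-in for the standard analyzer. It also shows why the question saw a tie: both values reduce to the same two tokens, so every factor is identical.

```python
import math
import re


def tokens(text):
    # crude stand-in for the standard analyzer
    return [t.lower() for t in re.findall(r"\w+", text)]


def tf(freq):
    return math.sqrt(freq)  # term frequency: more occurrences, higher score


def idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))  # rarer terms weigh more


def length_norm(num_terms):
    return 1.0 / math.sqrt(num_terms)  # shorter fields weigh more


def score(query, doc, corpus):
    """Sum over query terms of tf * idf^2 * norm (simplified classic scoring)."""
    doc_tokens = tokens(doc)
    num_docs = len(corpus)
    total = 0.0
    for term in set(tokens(query)):
        freq = doc_tokens.count(term)
        if freq == 0:
            continue
        doc_freq = sum(1 for d in corpus if term in tokens(d))
        total += tf(freq) * idf(doc_freq, num_docs) ** 2 * length_norm(len(doc_tokens))
    return total


corpus = ["yes correct.", "yes correct"]
print([score("yes correct", d, corpus) for d in corpus])
```

Nothing in these factors looks at token order or punctuation that the analyzer has already stripped, which is why a separate exact-match field is needed to tell the two values apart.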


2015-11-03 04:23:31

This is confusing, because the docs say "string" fields are indexed with positions by default: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/index-options.html – user5243421

Positions are stored too, but they are not used to compute relevance. –

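So can we tell Elasticsearch to use positions when computing relevance? I feel like my situation should be common enough that I could find an answer somewhere, but I'm having a lot of trouble finding the right terms to search for... – user5243421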
