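Highlighting on ElasticSearch autocomplete
I have the following data indexed in ElasticSearch.
I want to implement an autocomplete feature and also highlight why a particular document matched the query.
These are the settings of my index: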
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 15
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}
Index analysis:
- Splits the text on word boundaries.
- Removes punctuation.
- Lowercases.
- Edge n-grams each token.
So the inverted index looks like:
This is how I defined the mapping for the name field:
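As a rough illustration of the analysis steps listed above, the following Python sketch emulates the `autocomplete` analyzer and builds a tiny inverted index. It does not call Elasticsearch: the regex tokenizer is only a crude stand-in for the standard tokenizer, and the document ids `a`, `b`, `c` are hypothetical (only the names are taken from this question).

```python
import re
from collections import defaultdict


def analyze_for_index(text, min_gram=1, max_gram=15):
    """Emulate the index-time analyzer: tokenize, lowercase, edge n-gram."""
    # Crude stand-in for the standard tokenizer: runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    grams = []
    for token in tokens:
        # Emit every prefix of the token between min_gram and max_gram characters.
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            grams.append(token[:size])
    return grams


def build_inverted_index(docs):
    """Map each gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in analyze_for_index(text):
            index[gram].add(doc_id)
    return index


# Hypothetical ids; the names come from the search results in this question.
docs = {
    "a": "Software AG",
    "b": "is soft ware ok",
    "c": "SoftwareRocks everytime",
}
index = build_inverted_index(docs)

print(analyze_for_index("Software AG"))
print(sorted(index["soft"]))
```

Because every prefix of every word is stored at index time, a plain lookup of the term `soft` (analyzed with the standard analyzer at search time) finds all three names without any wildcard or prefix query.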
{
  "index_type": {
    "properties": {
      "name": {
        "type": "string",
        "index_analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
When I query:
GET http://localhost:9200/index/type/_search
{
  "query": {
    "match": {
      "name": "soft"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
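Search: soft
Applying the standard analyzer, the term "soft" is looked up in the inverted index. This search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I would expect the highlighted part to be only "soft", not the whole word: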
{
  "hits": [
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG2"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "is soft ware ok"
      },
      "highlight": {
        "name": [
          "is <em>soft</em> ware ok"
        ]
      }
    }
  ]
}
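Search: software ag
Applying the standard tokenizer, "software ag" is split into "software" and "ag", which are then looked up in the inverted index. This search matches documents 1, 3, 4, 5 and 6, which is correct, but I would expect only "software" and "ag" to be highlighted, not the whole words that contain them: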
{
  "hits": [
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG2</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em> good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    }
  ]
}
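I have read the Elasticsearch highlighting documentation, but I don't understand how the highlighting is performed. For the two examples above, I expect only the token that matched in the inverted index to be highlighted, not the whole word. Can anyone help me highlight only the value that was passed in?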
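One possible explanation, offered as an assumption rather than a confirmed diagnosis: when edge n-grams are produced by a token filter, each gram may keep the character offsets of the whole word it was derived from, and a highlighter that works from those offsets would then wrap the whole word. The Python sketch below only emulates that behaviour (it does not call Elasticsearch, and the function names are hypothetical).

```python
import re


def edge_ngram_filter_tokens(text, min_gram=1, max_gram=15):
    """Return (gram, start, end) tuples.

    Assumption being illustrated: every gram inherits the start/end
    offsets of the full source word, not of the prefix itself.
    """
    tokens = []
    for match in re.finditer(r"\w+", text):
        word = match.group().lower()
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append((word[:size], match.start(), match.end()))
    return tokens


def highlight_by_offsets(text, query_term):
    """Wrap the offset span of every token equal to query_term in <em> tags."""
    spans = sorted(
        {(start, end)
         for gram, start, end in edge_ngram_filter_tokens(text)
         if gram == query_term}
    )
    parts, pos = [], 0
    for start, end in spans:
        parts.append(text[pos:start])
        parts.append("<em>" + text[start:end] + "</em>")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


print(highlight_by_offsets("Software AG", "soft"))
print(highlight_by_offsets("is soft ware ok", "soft"))
```

Under that assumption the emulation reproduces the highlights shown above: the gram `soft` matches, but the only span available for it is the full word `Software`.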
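Update
It seems that the server-side autocomplete on the ElasticSearch website is similar to my implementation. However, they appear to highlight the matched query on the client. If that is what they do, I am starting to think there is no proper solution on the ElasticSearch side, so I implemented the highlighting myself, in my application's server-side code rather than on the client (as they seem to do).
My server-side implementation (in PHP) is: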
public function search($term)
{
    $params = [
        'index' => $this->getIndexName(),
        'type'  => $this->getIndexType(),
        'body'  => [
            'query' => [
                'match' => [
                    'name' => $term
                ]
            ]
        ]
    ];
    $results = $this->client->search($params);
    $hits = $results['hits']['hits'];
    $data = [];
    $wrapBefore = '<strong>';
    $wrapAfter = '</strong>';
    // Escape the term so regex metacharacters in user input are matched literally.
    $pattern = '/(' . preg_quote($term, '/') . ')/i';
    foreach ($hits as $hit) {
        $data[] = [
            $hit['_source']['id'],
            $hit['_source']['name'],
            // Wrap every case-insensitive occurrence of the term in the tag-stripped name.
            preg_replace($pattern, $wrapBefore . '$1' . $wrapAfter, strip_tags($hit['_source']['name']))
        ];
    }
    return $data;
}
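This outputs what I was aiming for in this question:
I have added a bounty to see whether there is a solution at the ElasticSearch level that achieves what I described above.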
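For comparison, here is a minimal Python sketch of the same application-side highlighting idea. It is not the PHP implementation above: as illustrative choices of my own, it treats each whitespace-separated query word separately, matches only at the start of a word (to mirror edge n-gram matching), and HTML-escapes the output. The function name `highlight_prefixes` is hypothetical.

```python
import html
import re


def highlight_prefixes(name, query, before="<strong>", after="</strong>"):
    """Wrap each query word in tags wherever it starts a word in name."""
    # Escape each query word so user input is matched literally.
    terms = [re.escape(word) for word in query.split()]
    if not terms:
        return html.escape(name)
    # Try longer terms first so "software" is preferred over "soft".
    terms.sort(key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(terms) + ")", re.IGNORECASE)

    parts, pos = [], 0
    for match in pattern.finditer(name):
        # Escape unmatched and matched pieces separately so the inserted
        # tags are never escaped and entities are never matched.
        parts.append(html.escape(name[pos:match.start()]))
        parts.append(before + html.escape(match.group()) + after)
        pos = match.end()
    parts.append(html.escape(name[pos:]))
    return "".join(parts)


print(highlight_prefixes("SoftwareRocks everytime", "soft"))
print(highlight_prefixes("Software AG2", "software ag"))
```

Anchoring at word starts means a query such as `ware` would not be highlighted inside `Software`, which is consistent with what the edge n-gram index can match in the first place.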
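Have you tried double quotes? Something like `"\"soft\""`. I suspect the kind of search you are doing there might make a difference. – gerosalesc
Have you tried [forcing highlighting on source](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html#_force_highlighting_on_source) by specifying `"force_source": true`? – Val
Also, you might want to look at the [prefix query](https://www.elastic.co/guide/en/elasticsearch/guide/current/prefix-query.html) for the "soft" example. – gerosalesc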