Highlighting on ElasticSearch autocomplete

I have the following data indexed in ElasticSearch.

I want to implement autocomplete and also highlight why a particular document matched the query.

These are my index settings:

{ 
    "settings": { 
     "number_of_shards": 1, 
     "analysis": { 
      "filter": { 
       "autocomplete_filter": { 
        "type":  "edge_ngram", 
        "min_gram": 1, 
        "max_gram": 15 
       } 
      }, 
      "analyzer": { 
       "autocomplete": { 
        "type":  "custom", 
        "tokenizer": "standard", 
        "filter": [
         "lowercase",
         "autocomplete_filter"
        ]
       } 
      } 
     } 
    } 
}

The index analyzer does the following:

Splits the text on word boundaries.
Removes punctuation.
Lowercases.
Edge n-grams each token.

So the inverted index looks like this:
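As a rough sketch of that inverted index, the index-time analysis can be imitated in plain Python. Assumptions: a `\w+` regex stands in for the standard tokenizer (it is only an approximation of Unicode word segmentation), `min_gram`/`max_gram` mirror the settings above, and the document ids `a` to `f` are hypothetical labels for the six names that appear in the hits further down.

```python
import re
from collections import defaultdict


def autocomplete_analyze(text, min_gram=1, max_gram=15):
    """Approximate the 'autocomplete' analyzer: word tokens, lowercased, edge n-grams."""
    grams = []
    for token in re.findall(r"\w+", text):
        token = token.lower()
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            grams.append(token[:n])
    return grams


def build_inverted_index(docs):
    """Map every edge n-gram to the set of document ids that produced it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in autocomplete_analyze(text):
            index[gram].add(doc_id)
    return index


# Hypothetical ids; the names are the ones returned in the search hits.
docs = {
    "a": "SoftwareRocks everytime",
    "b": "Software AG",
    "c": "Software AG2",
    "d": "Op Software AG good software better",
    "e": "Op Software AG",
    "f": "is soft ware ok",
}
index = build_inverted_index(docs)
print(sorted(index["soft"]))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(sorted(index["ag"]))    # ['b', 'c', 'd', 'e']
```

Every prefix of every word becomes its own term, which is why a search for "soft" can hit "SoftwareRocks" at all.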

This is how I defined the mapping for the name field:

{ 
    "index_type": { 
     "properties": { 
      "name": { 
       "type":  "string", 
       "index_analyzer": "autocomplete", 
       "search_analyzer": "standard" 
      } 
     } 
    } 
}

When I run this query:

GET http://localhost:9200/index/type/_search 

{ 
    "query": { 
     "match": { 
      "name": "soft" 
     } 
    }, 
    "highlight": { 
     "fields" : { 
      "name" : {} 
     } 
    } 
}

Search: soft

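With the standard search analyzer, the term "soft" is looked up in the inverted index. The search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I want only "soft" to be highlighted, not the whole word: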

{ 
    "hits": [ 
    { 
     "_source": { 
     "name": "SoftwareRocks everytime" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>SoftwareRocks</em> everytime" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Software AG" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>Software</em> AG" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Software AG2" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>Software</em> AG2" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Op Software AG good software better" 
     }, 
     "highlight": { 
     "name": [ 
      "Op <em>Software</em> AG good <em>software</em> better" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Op Software AG" 
     }, 
     "highlight": { 
     "name": [ 
      "Op <em>Software</em> AG" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "is soft ware ok" 
     }, 
     "highlight": { 
     "name": [ 
      "is <em>soft</em> ware ok" 
     ] 
     } 
    } 
    ] 
}
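A likely explanation for the whole-word highlight, as far as I know from Lucene's behavior: the `edge_ngram` token *filter* emits every gram with the start/end offsets of the original word, and the highlighter wraps whatever span those offsets cover. The sketch below imitates that in plain Python; it is not Elasticsearch's highlighter, and `highlight` and `autocomplete_tokens_with_offsets` are hypothetical helpers built on the same regex approximation of the standard tokenizer.

```python
import re


def autocomplete_tokens_with_offsets(text, min_gram=1, max_gram=15):
    """Edge n-grams as a token filter emits them: each gram keeps the
    start/end offsets of the whole word it was cut from (assumed behavior)."""
    tokens = []
    for m in re.finditer(r"\w+", text):
        word = m.group().lower()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append((word[:n], m.start(), m.end()))
    return tokens


def highlight(text, query_terms, pre="<em>", post="</em>"):
    """Wrap the offset span of every indexed token equal to a query term."""
    spans = sorted({(s, e) for gram, s, e in autocomplete_tokens_with_offsets(text)
                    if gram in query_terms})
    out, last = [], 0
    for s, e in spans:
        out.append(text[last:s] + pre + text[s:e] + post)
        last = e
    out.append(text[last:])
    return "".join(out)


print(highlight("Software AG", {"soft"}))      # <em>Software</em> AG
print(highlight("is soft ware ok", {"soft"}))  # is <em>soft</em> ware ok
```

Because the matching gram "soft" carries the offsets of "Software", the wrapped span is the full word, which reproduces the results shown above.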

Search: software ag

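With the standard search analyzer, "software ag" is split into "software" and "ag", which are looked up in the inverted index. The search matches documents 1, 3, 4, 5 and 6, which is correct, but I want only "software" and "ag" to be highlighted, not the whole words that contain them: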

{ 
    "hits": [ 
    { 
     "_source": { 
     "name": "Software AG" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>Software</em> <em>AG</em>" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Software AG2" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>Software</em> <em>AG2</em>" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Op Software AG" 
     }, 
     "highlight": { 
     "name": [ 
      "Op <em>Software</em> <em>AG</em>" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Op Software AG good software better" 
     }, 
     "highlight": { 
     "name": [ 
      "Op <em>Software</em> <em>AG</em> good <em>software</em> better" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "SoftwareRocks everytime" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>SoftwareRocks</em> everytime" 
     ] 
     } 
    } 
    ] 
}
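The two-word case follows from the `match` query's default `operator` of `or`: a document matches if any search-side token equals one of its indexed grams. A small sketch of that, again with a regex approximation of the standard analyzer and hypothetical ids for the six document names:

```python
import re
from collections import defaultdict


def standard_analyze(text):
    """Rough stand-in for the 'standard' search analyzer: word tokens, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def edge_ngrams(text, min_gram=1, max_gram=15):
    """Index-time analysis: edge n-grams of each lowercased word token."""
    return [t[:n] for t in standard_analyze(text)
            for n in range(min_gram, min(max_gram, len(t)) + 1)]


def match_or(docs, query):
    """Imitate a 'match' query with operator OR over an edge n-gram index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in edge_ngrams(text):
            index[gram].add(doc_id)
    hits = set()
    for term in standard_analyze(query):
        hits |= index.get(term, set())
    return hits


# Hypothetical ids for the document names seen in the results.
docs = {
    "a": "SoftwareRocks everytime",
    "b": "Software AG",
    "c": "Software AG2",
    "d": "Op Software AG good software better",
    "e": "Op Software AG",
    "f": "is soft ware ok",
}
print(standard_analyze("software ag"))  # ['software', 'ag']
print(sorted(match_or(docs, "software ag")))  # ['a', 'b', 'c', 'd', 'e']
```

"is soft ware ok" drops out because none of its words has "software" or "ag" as a prefix, while "AG2" matches through its gram "ag".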

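I read the Elasticsearch highlighting documentation, but I don't understand how to achieve this. For both examples above, I only want to highlight the matching tokens from the inverted index, not the whole words. Can anyone help me highlight only the value that was passed in?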

Update

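Judging by the ElasticSearch website, their server-side autocomplete seems similar to my implementation. However, they appear to highlight the matched query on the client side. If that is how they do it, I'm starting to think there is no proper solution on the ElasticSearch side, so I implemented the highlighting myself, on the server side rather than on the client side as they appear to do.

My server-side implementation (in PHP) is: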

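public function search($term)
{
    // Split the input into words; an empty match query returns no hits anyway.
    $words = preg_split('/\s+/', trim($term), -1, PREG_SPLIT_NO_EMPTY);
    if (empty($words)) {
        return [];
    }

    $params = [
     'index' => $this->getIndexName(),
     'type' => $this->getIndexType(),
     'body' => [
      'query' => [
       'match' => [
        'name' => $term
       ]
      ]
     ]
    ];

    $results = $this->client->search($params);

    $hits = $results['hits']['hits'];

    $data = [];

    $wrapBefore = '<strong>';
    $wrapAfter = '</strong>';

    // Longest words first so "software" wins over "soft" in the alternation;
    // preg_quote() escapes regex metacharacters in the user's input.
    usort($words, function ($a, $b) { return strlen($b) - strlen($a); });
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    $pattern = '/(' . implode('|', $quoted) . ')/i';

    foreach ($hits as $hit) {
     $data[] = [
      $hit['_source']['id'],
      $hit['_source']['name'],
      // Wrap every case-insensitive occurrence of any search word.
      preg_replace($pattern, "$wrapBefore$1$wrapAfter", strip_tags($hit['_source']['name']))
     ];
    }

    return $data;
}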

The output I get with this approach:

I added a bounty to see whether there is a solution at the ElasticSearch level that achieves what I described above.


2016-11-11 João Alves

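Have you tried double quotes, like '"\"soft\""'? I suspect the kind of search you are running there might make a difference. – gerosalesc

Have you tried [forcing source highlighting](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html#_force_highlighting_on_source) by specifying '"force_source": true'? – Val

Also, you may want to look at the [prefix query](https://www.elastic.co/guide/en/elasticsearch/guide/current/prefix-query.html) for the "soft" example. – gerosalesc

As of the latest version of Elastic this is not possible, since the highlighting documentation does not mention any setting or query for it. I inspected the Elastic autocomplete example in the browser console under the XHR requests tab and found the autocomplete response for the keyword "att", shown below.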

url - https://search.elastic.co/suggest?q=att 
    { 
     "current_page": 1, 
     "last_page": 4, 
     "total_hits": 49, 
     "hits": [ 
      { 
       "tags": [], 
       "url": "/elasticon/tour/2016/jp/not-attending", 
       "section": "Elasticon", 
       "title": "Not <em>Attending</em> - JP" 
      }, 
      { 
       "section": "Elasticon", 
       "title": "<em>Attending</em> from Training - JP", 
       "tags": [], 
       "url": "/elasticon/tour/2016/jp/attending-training" 
      }, 
      { 
       "tags": [], 
       "url": "/elasticon/tour/2016/jp/attending-keynote", 
       "title": "<em>Attending</em> from Keynote - JP", 
       "section": "Elasticon" 
      }, 
      { 
       "tags": [], 
       "url": "/elasticon/tour/2016/not-attending", 
       "section": "Elasticon", 
       "title": "Thank You - Not <em>Attending</em>" 
      }, 
      { 
       "tags": [], 
       "url": "/elasticon/tour/2016/attending", 
       "section": "Elasticon", 
       "title": "Thank You - <em>Attending</em>" 
      }, 
      { 
       "section": "Blog", 
       "title": "What It's Like to <em>Attend</em> Elastic Training", 
       "tags": [], 
       "url": "/blog/what-its-like-to-attend-elastic-training" 
      }, 
      { 
       "tags": "Elasticsearch", 
       "url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html", 
       "section": "Docs/", 
       "title": "Highlighting <em>attachments</em>" 
      }, 
      { 
       "title": "<em>attachments</em> » email", 
       "section": "Docs/", 
       "tags": "Logstash", 
       "url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments" 
      }, 
      { 
       "section": "Docs/", 
       "title": "Configuring Email <em>Attachments</em> » Actions", 
       "tags": "Watcher", 
       "url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments" 
      }, 
      { 
       "url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes", 
       "tags": "Watcher", 
       "title": "HipChat Action <em>Attributes</em> » Actions", 
       "section": "Docs/" 
      }, 
      { 
       "title": "Slack Action <em>Attributes</em> » Actions", 
       "section": "Docs/", 
       "tags": "Watcher", 
       "url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes" 
      } 
     ], 
     "aggs": { 
      "sections": [ 
       { 
        "Elasticon": 5 
       }, 
       { 
        "Blog": 1 
       }, 
       { 
        "Docs/": 43 
       } 
      ], 
      "top_tags": [ 
       { 
        "XPack": 14 
       }, 
       { 
        "Elasticsearch": 12 
       }, 
       { 
        "Watcher": 9 
       }, 
       { 
        "Logstash": 4 
       }, 
       { 
        "Clients": 3 
       }, 
       { 
        "Shield": 1 
       } 
      ] 
     } 
    }

On the front end, however, only "att" is highlighted in the autocomplete results. So they are doing the highlighting in the browser layer.
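A minimal sketch of that browser-layer step, written in Python to match the other examples (a real front end would do the equivalent in JavaScript). Assumptions: the titles contain no markup other than the server's `<em>` tags, and the typed query should be highlighted only at the start of a word, mirroring edge n-gram matching. Unlike the PHP `preg_replace` above, it does not wrap matches in the middle of a word. `highlight_prefix` is a hypothetical helper.

```python
import re


def highlight_prefix(title, query, pre="<em>", post="</em>"):
    """Strip the server's <em> tags, then wrap only the typed prefix at the
    start of each word, case-insensitively."""
    plain = re.sub(r"</?em>", "", title)
    if not query:
        return plain
    # re.escape keeps user input from being interpreted as regex syntax.
    pattern = re.compile(r"\b(" + re.escape(query) + ")", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(1) + post, plain)


print(highlight_prefix("Not <em>Attending</em> - JP", "att"))
# Not <em>Att</em>ending - JP
```

The original casing of the title is preserved because the replacement reuses the matched text rather than the query string.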


2016-11-11 17:39:07 user3775217

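Thanks for your help. I checked the autocomplete on the Elastic website and, yes, they seem to change the highlighted words in the browser layer, which seems strange to me. I was expecting a different approach, but if this is the way to go... –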
