2016-11-11 54 views
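Highlighting in ElasticSearch autocomplete

I have the following data indexed in ElasticSearch.

[image: the indexed documents]

I want to implement autocomplete and also highlight why a particular document matched the query.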

These are my index settings:

{ 
    "settings": { 
     "number_of_shards": 1, 
     "analysis": { 
      "filter": { 
       "autocomplete_filter": { 
        "type":  "edge_ngram", 
        "min_gram": 1, 
        "max_gram": 15 
       } 
      }, 
      "analyzer": { 
       "autocomplete": { 
        "type":  "custom", 
        "tokenizer": "standard", 
        "filter": [ 
         "lowercase", 
         "autocomplete_filter" 
        ] 
       } 
      } 
     } 
    } 
} 

The index-time analysis:

  • splits the text on word boundaries;
  • removes punctuation;
  • lowercases;
  • produces edge n-grams of each token.
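
To make these steps concrete, here is a rough Python approximation of what ends up in the index for one name. It is only a sketch: the `\w+` regular expression is a stand-in for the standard tokenizer's word-boundary rules, and `edge_ngrams` and `analyze` are hypothetical helper names, not Elasticsearch APIs. The `min_gram` and `max_gram` defaults mirror the settings above.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of `token` with lengths from min_gram to max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=1, max_gram=15):
    """Approximate the analysis: split, drop punctuation, lowercase, edge n-gram."""
    # \w+ keeps runs of word characters, which also discards punctuation
    tokens = re.findall(r"\w+", text.lower())
    terms = []
    for token in tokens:
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


print(analyze("Software AG"))
# ['s', 'so', 'sof', 'soft', 'softw', 'softwa', 'softwar', 'software', 'a', 'ag']
```

Because "soft" is one of the prefixes stored for "Software", a search for "soft" analysed with the standard analyzer can match that document.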

So the inverted index looks like this:

[image: the inverted index]

This is how I defined the mapping for the name field:

{ 
    "index_type": { 
     "properties": { 
      "name": { 
       "type":  "string", 
       "index_analyzer": "autocomplete", 
       "search_analyzer": "standard" 
      } 
     } 
    } 
} 

When I query:

GET http://localhost:9200/index/type/_search 

{ 
    "query": { 
     "match": { 
      "name": "soft" 
     } 
    }, 
    "highlight": { 
     "fields" : { 
      "name" : {} 
     } 
    } 
} 

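Search: soft

Applying the standard tokenizer, the term "soft" is looked up in the inverted index. The search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I expected the highlighted part to be "soft" rather than the whole word: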

{ 
    "hits": [ 
    { 
     "_source": { 
     "name": "SoftwareRocks everytime" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>SoftwareRocks</em> everytime" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Software AG" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>Software</em> AG" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Software AG2" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>Software</em> AG2" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Op Software AG good software better" 
     }, 
     "highlight": { 
     "name": [ 
      "Op <em>Software</em> AG good <em>software</em> better" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Op Software AG" 
     }, 
     "highlight": { 
     "name": [ 
      "Op <em>Software</em> AG" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "is soft ware ok" 
     }, 
     "highlight": { 
     "name": [ 
      "is <em>soft</em> ware ok" 
     ] 
     } 
    } 
    ] 
} 

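Search: software ag

Applying the standard tokenizer, "software ag" is split into "software" and "ag", which are looked up in the inverted index. The search matches documents 1, 3, 4, 5 and 6, which is correct, but I expected the highlighted parts to be only "software" and "ag", not the whole words that contain them: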

{ 
    "hits": [ 
    { 
     "_source": { 
     "name": "Software AG" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>Software</em> <em>AG</em>" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Software AG2" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>Software</em> <em>AG2</em>" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Op Software AG" 
     }, 
     "highlight": { 
     "name": [ 
      "Op <em>Software</em> <em>AG</em>" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "Op Software AG good software better" 
     }, 
     "highlight": { 
     "name": [ 
      "Op <em>Software</em> <em>AG</em> good <em>software</em> better" 
     ] 
     } 
    }, 
    { 
     "_source": { 
     "name": "SoftwareRocks everytime" 
     }, 
     "highlight": { 
     "name": [ 
      "<em>SoftwareRocks</em> everytime" 
     ] 
     } 
    } 
    ] 
} 

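I read the Elasticsearch highlighting documentation, but I could not work out how the highlighting is performed. For the two examples above I want only the token that matched in the inverted index to be highlighted, not the whole word. Can anyone help me highlight only the value that was passed in?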
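
One possible explanation for the whole-word highlights, offered as an assumption rather than something the settings above confirm: an edge n-gram token filter appears to keep the character offsets of the token it was derived from, so the highlighter only knows where the full word starts and ends. The Python sketch below imitates that behaviour. `tokens_with_offsets`, `index_terms` and `highlight` are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def tokens_with_offsets(text):
    """Yield (lowercased token, start, end) for each word in `text`."""
    for m in re.finditer(r"\w+", text):
        yield m.group().lower(), m.start(), m.end()


def index_terms(text, min_gram=1, max_gram=15):
    """Map every edge n-gram to the offsets of the WHOLE word it came from.

    Keeping the parent word's offsets is the assumed filter behaviour.
    """
    terms = {}
    for token, start, end in tokens_with_offsets(text):
        for n in range(min_gram, min(len(token), max_gram) + 1):
            terms.setdefault(token[:n], []).append((start, end))
    return terms


def highlight(text, query_term):
    """Wrap every span recorded for `query_term` in <em> tags."""
    spans = index_terms(text).get(query_term.lower(), [])
    out = text
    # Work from the end so earlier offsets stay valid after each insertion
    for start, end in sorted(set(spans), reverse=True):
        out = out[:start] + "<em>" + out[start:end] + "</em>" + out[end:]
    return out


print(highlight("SoftwareRocks everytime", "soft"))
# <em>SoftwareRocks</em> everytime
```

Under that assumption the sketch reproduces the responses shown above: the term "soft" matches, but the only span available for wrapping is the full word.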

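Update

It seems that the autocomplete on the ElasticSearch website works on the server side much like my implementation. However, they appear to highlight the matched query on the client. If that is what they do, I start to think there is no proper solution on the ElasticSearch side, so I implemented the highlighting myself, on the server side rather than on the client side (as they seem to do).

My server-side implementation (in PHP) is: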

public function search($term) 
{ 
    $params = [ 
     'index' => $this->getIndexName(), 
     'type' => $this->getIndexType(), 
     'body' => [ 
      'query' => [ 
       'match' => [ 
        'name' => $term 
       ] 
      ] 
     ] 
    ]; 

    $results = $this->client->search($params); 

    $hits = $results['hits']['hits']; 

    $data = []; 

    $wrapBefore = '<strong>'; 
    $wrapAfter = '</strong>'; 

    foreach ($hits as $hit) { 
     $data[] = [ 
      $hit['_source']['id'], 
      $hit['_source']['name'], 
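      // preg_quote() stops regex metacharacters in the user-supplied term from breaking the pattern
      preg_replace('/(' . preg_quote($term, '/') . ')/i', $wrapBefore . '$1' . $wrapAfter, strip_tags($hit['_source']['name'])) 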
     ]; 
    } 

    return $data; 
} 

This outputs what I was aiming for in this question:

[image: autocomplete results with only the typed text highlighted]
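
The PHP version wraps the term wherever it occurs as one contiguous string, so a two-word input such as "software ag" is only highlighted when both words appear together in that exact form. As a hedged variation, the Python sketch below highlights each typed word separately and only at the start of a word, which mirrors how the edge n-gram index matches. `highlight_prefixes` is a hypothetical helper, and escaping the output with `html.escape` is an addition that the PHP code does not make.

```python
import html
import re


def highlight_prefixes(name, term, before="<strong>", after="</strong>"):
    """Wrap each typed word where it starts a word in `name`; escape the rest."""
    # Longest word first, so "software" wins over "soft" if both were typed
    words = sorted(term.split(), key=len, reverse=True)
    if not words:
        return html.escape(name)
    # re.escape keeps user input from being read as regex syntax
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternatives + ")", re.IGNORECASE)

    parts = []
    last = 0
    for m in pattern.finditer(name):
        parts.append(html.escape(name[last:m.start()]))
        parts.append(before + html.escape(m.group(1)) + after)
        last = m.end()
    parts.append(html.escape(name[last:]))
    return "".join(parts)


print(highlight_prefixes("Software AG2", "software ag"))
# <strong>Software</strong> <strong>AG</strong>2
```

Anchoring on `\b` is a design choice: it keeps "soft" from being highlighted in the middle of a word that the index would not have matched on.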

I added a bounty to see whether there is a solution at the ElasticSearch level that achieves what I described above.

+0

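Have you tried double quotes? Something like '"\"soft\""'. I suspect the kind of search you are running there might make the difference – gerosalesc

+0

Have you tried [forcing highlighting on source](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html#_force_highlighting_on_source) by specifying '"force_source": true'? – Val

+0

Also, you may want to look at the "prefix query" (https://www.elastic.co/guide/en/elasticsearch/guide/current/prefix-query.html) for the "soft" example – gerosalesc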

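Answer

As of now, with the latest version of Elastic, this is not possible: the highlighting documentation does not mention any setting or query for it. I inspected the Elastic autocomplete example under the XHR requests tab of the browser console and found the following autocomplete response for the keyword "att".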

url - https://search.elastic.co/suggest?q=att 
    { 
     "current_page": 1, 
     "last_page": 4, 
     "total_hits": 49, 
     "hits": [ 
      { 
       "tags": [], 
       "url": "/elasticon/tour/2016/jp/not-attending", 
       "section": "Elasticon", 
       "title": "Not <em>Attending</em> - JP" 
      }, 
      { 
       "section": "Elasticon", 
       "title": "<em>Attending</em> from Training - JP", 
       "tags": [], 
       "url": "/elasticon/tour/2016/jp/attending-training" 
      }, 
      { 
       "tags": [], 
       "url": "/elasticon/tour/2016/jp/attending-keynote", 
       "title": "<em>Attending</em> from Keynote - JP", 
       "section": "Elasticon" 
      }, 
      { 
       "tags": [], 
       "url": "/elasticon/tour/2016/not-attending", 
       "section": "Elasticon", 
       "title": "Thank You - Not <em>Attending</em>" 
      }, 
      { 
       "tags": [], 
       "url": "/elasticon/tour/2016/attending", 
       "section": "Elasticon", 
       "title": "Thank You - <em>Attending</em>" 
      }, 
      { 
       "section": "Blog", 
       "title": "What It's Like to <em>Attend</em> Elastic Training", 
       "tags": [], 
       "url": "/blog/what-its-like-to-attend-elastic-training" 
      }, 
      { 
       "tags": "Elasticsearch", 
       "url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html", 
       "section": "Docs/", 
       "title": "Highlighting <em>attachments</em>" 
      }, 
      { 
       "title": "<em>attachments</em> » email", 
       "section": "Docs/", 
       "tags": "Logstash", 
       "url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments" 
      }, 
      { 
       "section": "Docs/", 
       "title": "Configuring Email <em>Attachments</em> » Actions", 
       "tags": "Watcher", 
       "url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments" 
      }, 
      { 
       "url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes", 
       "tags": "Watcher", 
       "title": "HipChat Action <em>Attributes</em> » Actions", 
       "section": "Docs/" 
      }, 
      { 
       "title": "Slack Action <em>Attributes</em> » Actions", 
       "section": "Docs/", 
       "tags": "Watcher", 
       "url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes" 
      } 
     ], 
     "aggs": { 
      "sections": [ 
       { 
        "Elasticon": 5 
       }, 
       { 
        "Blog": 1 
       }, 
       { 
        "Docs/": 43 
       } 
      ], 
      "top_tags": [ 
       { 
        "XPack": 14 
       }, 
       { 
        "Elasticsearch": 12 
       }, 
       { 
        "Watcher": 9 
       }, 
       { 
        "Logstash": 4 
       }, 
       { 
        "Clients": 3 
       }, 
       { 
        "Shield": 1 
       } 
      ] 
     } 
    } 

But on the front end, only "att" is shown highlighted in the autosuggest results. So they are handling the highlighting at the browser layer.
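
As an illustration of that kind of post-processing, the sketch below narrows each `<em>...</em>` span in a response like the one above so that only the typed prefix stays emphasised. It is written in Python for brevity, whereas a browser would do the equivalent in JavaScript, and `narrow_highlight` is a hypothetical helper. How the Elastic site actually does it is not known from the response alone.

```python
import re

# Non-greedy so each <em>...</em> pair is handled on its own
EM_SPAN = re.compile(r"<em>(.*?)</em>")


def narrow_highlight(fragment, typed):
    """Shrink each <em> span to the typed prefix it starts with, if any."""
    # Longest typed word first, so the most specific prefix is used
    prefixes = sorted(typed.lower().split(), key=len, reverse=True)

    def shrink(match):
        word = match.group(1)
        for prefix in prefixes:
            if word.lower().startswith(prefix):
                n = len(prefix)
                return "<em>" + word[:n] + "</em>" + word[n:]
        # The word matched for some other reason; leave the span untouched
        return match.group(0)

    return EM_SPAN.sub(shrink, fragment)


print(narrow_highlight("Not <em>Attending</em> - JP", "att"))
# Not <em>Att</em>ending - JP
```

The same function applied to the "software ag" example turns `<em>Software</em> <em>AG2</em>` into `<em>Software</em> <em>AG</em>2`.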

+0

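Thanks for your help. I checked the autocomplete on the Elastic website and yes, they do seem to change the highlighted words in the browser layer, which seems odd to me. I was expecting a different approach, but if that is the way to go... –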