Getting distinct values using the NEST ElasticSearch client

I am building a product search engine with Elastic Search in my .NET application, using the NEST client, and there is one thing I'm having trouble with: getting a distinct set of values.

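I'm searching for products, of which there are thousands, but of course I can only return 10 or 20 at a time to the user. Paging works fine for this. But besides this main result, I want to show my users the list of brands found within the complete search, so they can filter on them.

I have read that I should use a terms aggregation for this. But I couldn't get anything better than the code below, and it still doesn't give me what I want, because it splits values like "20th Century Fox" into 3 separate values.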

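// Run the search and request a terms aggregation on the brand field.
var brandResults = client.Search<Product>(s => s
    .Query(query)
    .Aggregations(a => a
        .Terms("my_terms_agg", t => t.Field(p => p.BrandName).Size(250))
    )
);

// Read the buckets back from the same response variable.
var agg = brandResults.Aggs.Terms("my_terms_agg");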

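Is this the right approach? Or should I use something completely different? And how can I get the correct, complete values? (Not split by spaces... but I guess that is what you get when you ask for a list of 'terms'?)

What I'm looking for is what you would get if you did this in MS SQL: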

SELECT DISTINCT BrandName FROM [Table To Search] WHERE [Where clause without paging]


2015-02-23 Bart

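You are correct that what you want is a terms aggregation. The problem you're running into is that ES is splitting the "BrandName" field in the results it returns. This is the expected default behavior of a field in ES, because string fields are analyzed (tokenized) by default.

What I recommend is that you change BrandName into a "multi-field". This will allow you to search on all the individual parts, as well as run a terms aggregation on the "not analyzed" (i.e. the full "20th Century Fox") term.

Here is the documentation from ES.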

https://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html

[Update] If you are using ES version 1.4 or newer, the syntax for multi-fields is a little different now.

https://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields

Here is a complete working sample illustrating the point in ES 1.4.4. Note that the mapping specifies a "not_analyzed" version of the field.

PUT hilden1 

PUT hilden1/type1/_mapping 
{
    "properties": {
        "brandName": {
            "type": "string",
            "fields": {
                "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

POST hilden1/type1 
{ 
    "brandName": "foo" 
} 

POST hilden1/type1 
{ 
    "brandName": "bar" 
} 

POST hilden1/type1 
{ 
    "brandName": "20th Century Fox" 
} 

POST hilden1/type1 
{ 
    "brandName": "20th Century Fox" 
} 

POST hilden1/type1 
{ 
    "brandName": "foo bar" 
} 

GET hilden1/type1/_search 
{
    "size": 0,
    "aggs": {
        "analyzed_field": {
            "terms": {
                "field": "brandName",
                "size": 10
            }
        },
        "non_analyzed_field": {
            "terms": {
                "field": "brandName.raw",
                "size": 10
            }
        }
    }
}

Results of the last query:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "non_analyzed_field": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "20th Century Fox",
                    "doc_count": 2
                },
                {
                    "key": "bar",
                    "doc_count": 1
                },
                {
                    "key": "foo",
                    "doc_count": 1
                },
                {
                    "key": "foo bar",
                    "doc_count": 1
                }
            ]
        },
        "analyzed_field": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "20th",
                    "doc_count": 2
                },
                {
                    "key": "bar",
                    "doc_count": 2
                },
                {
                    "key": "century",
                    "doc_count": 2
                },
                {
                    "key": "foo",
                    "doc_count": 2
                },
                {
                    "key": "fox",
                    "doc_count": 2
                }
            ]
        }
    }
}

Notice that the not-analyzed field keeps "20th Century Fox" and "foo bar" together, whereas the analyzed field breaks them up.
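The contrast between the two aggregations can be reproduced without a cluster. The following is only a rough sketch: `analyze` is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which approximates but is not the real standard analyzer, and `terms_agg` is an illustrative helper, not an Elasticsearch or NEST API.

```python
import re
from collections import Counter


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def terms_agg(docs, analyzed):
    """Count documents per term, similar to the doc_count of a terms aggregation."""
    counts = Counter()
    for value in docs:
        # A document is counted once per distinct term it contains.
        terms = set(analyze(value)) if analyzed else {value}
        counts.update(terms)
    # Sort by count descending, then by term, which matches the listing above.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# The same five documents that were indexed in the sample.
docs = ["foo", "bar", "20th Century Fox", "20th Century Fox", "foo bar"]

analyzed_buckets = terms_agg(docs, analyzed=True)
raw_buckets = terms_agg(docs, analyzed=False)

print(analyzed_buckets)
print(raw_buckets)
```

With `analyzed=True` every word becomes its own bucket, while `analyzed=False` treats each whole value as a single term, which is the behavior wanted for a distinct brand list.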


2015-02-23 15:42:49 jhilden

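I only started a week ago, so I'm working with the latest version, 1.4.4. – Bart 2015-02-23 15:52:43

What do you mean by changing BrandName? Updating the database schema? Or changing it inline in my query? – Bart 2015-02-23 15:59:26

Change the ES (database) indexer. – jhilden 2015-02-23 16:40:36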

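I had a similar problem. I was displaying search results and wanted to show counts for categories and subcategories.

You are right to use aggregations. I also ran into the issue of strings being tokenized (i.e. "20th Century Fox" being split). This happens because the field is analyzed. In my case, I added the following mapping (i.e. telling ES not to analyze these fields):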

"category": {
    "type": "nested",
    "properties": {
        "CategoryNameAndSlug": {
            "type": "string",
            "index": "not_analyzed"
        },
        "SubCategoryNameAndSlug": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}

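As jhilden suggested, if you use this field for more than one purpose (e.g. search and aggregation), you can set it up as a multi-field. That way it can be analyzed for searching on the one hand, and not analyzed for aggregation on the other.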
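To connect this back to the original question (one page of hits plus the full brand list), here is a minimal sketch of a request body and of reading the bucket keys. It assumes the `brandName.raw` multi-field from jhilden's sample, a zero-based page index, and the hypothetical helper names `build_search_body` and `distinct_values`. It builds plain dictionaries only and does not call any client library.

```python
def build_search_body(query, page, page_size,
                      brand_field="brandName.raw", max_brands=250):
    """Build a search body returning one page of hits plus brand buckets."""
    return {
        "query": query,
        # from/size page the hits only; the aggregation below is still
        # computed over every document that matches the query.
        "from": page * page_size,
        "size": page_size,
        "aggs": {
            "brands": {
                # Aggregate on the not_analyzed sub-field so values stay whole.
                "terms": {"field": brand_field, "size": max_brands}
            }
        },
    }


def distinct_values(response, agg_name="brands"):
    """Return the bucket keys of a terms aggregation from a response dict."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Third page (zero-based index 2) of ten products per page.
body = build_search_body({"match_all": {}}, page=2, page_size=10)

# Hand-written response fragment in the shape shown in jhilden's sample output.
sample_response = {
    "aggregations": {
        "brands": {
            "buckets": [
                {"key": "20th Century Fox", "doc_count": 2},
                {"key": "bar", "doc_count": 1},
            ]
        }
    }
}
brands = distinct_values(sample_response)
print(brands)
```

Note that the `size` inside the terms aggregation caps the number of buckets returned, so the list only behaves like `SELECT DISTINCT` as long as there are no more distinct brands than that limit.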


2015-02-23 16:18:49 Ali

That's clear now. Thanks for the info! – Bart 2015-02-23 16:52:26

What do your distinct and count queries look like? Sounds interesting. – Bart 2015-02-23 17:06:52
