2017-04-20 103 views

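Elasticsearch: terms aggregation on a nested field has duplicate values

I have a problem with nested aggregations in Elasticsearch. I have a mapping with a nested field: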

POST my_index/my_type/_mapping
{
  "properties": {
    "name": {
      "type": "keyword"
    },
    "nested_fields": {
      "type": "nested",
      "properties": {
        "key": {
          "type": "keyword"
        },
        "value": {
          "type": "keyword"
        }
      }
    }
  }
}

Then I index a document:

POST my_index/my_type
{
  "name": "object1",
  "nested_fields": [
    {
      "key": "key1",
      "value": "value1"
    },
    {
      "key": "key1",
      "value": "value2"
    }
  ]
}

As you can see, the nested array contains two items that have the same key field but different value fields. I then want to run this query:

GET /my_index/my_type/_search
{
  "query": {
    "nested": {
      "path": "nested_fields",
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "nested_fields.key": {
                  "value": "key1"
                }
              }
            },
            {
              "terms": {
                "nested_fields.value": [
                  "value1",
                  "value2"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "agg_nested_fields": {
      "nested": {
        "path": "nested_fields"
      },
      "aggs": {
        "agg_nested_fields_key": {
          "terms": {
            "field": "nested_fields.key",
            "size": 10
          }
        }
      }
    }
  }
}
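
One way to read the query above: a nested query matches a root document when at least one single nested object satisfies the whole inner bool clause. A rough Python sketch of that rule follows. The `matches` helper is hypothetical and only mimics the matching logic on plain dictionaries; it is not an Elasticsearch API.

```python
# Hypothetical helper that mimics how a `nested` query with a `bool`/`must`
# clause decides whether a root document matches. Not an Elasticsearch API.
def matches(doc, key, values):
    # Both conditions must hold on the SAME nested object.
    return any(
        item["key"] == key and item["value"] in values
        for item in doc.get("nested_fields", [])
    )


# The document indexed in the question.
doc = {
    "name": "object1",
    "nested_fields": [
        {"key": "key1", "value": "value1"},
        {"key": "key1", "value": "value2"},
    ],
}

print(matches(doc, "key1", {"value1", "value2"}))  # True
print(matches(doc, "key2", {"value1"}))  # False
```

The root document is returned once as a hit, no matter how many of its nested objects satisfy the clause.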

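As you can see, I want to find all documents that have at least one object in the nested_fields array whose key property equals key1 and whose value is one of the given values (value1, value2). I then want to group the matching documents by nested_fields.key. However, I get this response: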

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.87546873,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "AVuLNXxiryKmA7VEwOfV",
        "_score": 0.87546873,
        "_source": {
          "name": "object1",
          "nested_fields": [
            {
              "key": "key1",
              "value": "value1"
            },
            {
              "key": "key1",
              "value": "value2"
            }
          ]
        }
      }
    ]
  },
  "aggregations": {
    "agg_nested_fields": {
      "doc_count": 2,
      "agg_nested_fields_key": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "key1",
            "doc_count": 2
          }
        ]
      }
    }
  }
}

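As you can see from the response, I have one hit (which is correct), but the document is counted twice in the aggregation (see doc_count: 2) because it has two items with the key1 value in the nested_fields array. How can I get the correct count in the aggregation?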


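This is the correct count, because each nested element is a document in its own right. So you really do have two nested documents, both with key1 as the key and 'value1' or 'value2' as the value. – Val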
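
To illustrate the comment above, here is a small Python sketch that contrasts the two ways of counting. It works on plain dictionaries only; `nested_counts` and `root_counts` are hypothetical helpers written for this illustration, not Elasticsearch functions. The first counts every nested object, which is what the terms aggregation inside the nested aggregation reports; the second counts each root document once per key, which is what the question asks for.

```python
from collections import Counter

# The single document indexed in the question.
docs = [
    {
        "name": "object1",
        "nested_fields": [
            {"key": "key1", "value": "value1"},
            {"key": "key1", "value": "value2"},
        ],
    }
]


def nested_counts(docs):
    """Count one per nested object, like a terms agg inside a nested agg."""
    counts = Counter()
    for doc in docs:
        for item in doc["nested_fields"]:
            counts[item["key"]] += 1
    return counts


def root_counts(docs):
    """Count each root document at most once per distinct key."""
    counts = Counter()
    for doc in docs:
        for key in {item["key"] for item in doc["nested_fields"]}:
            counts[key] += 1
    return counts


print(nested_counts(docs)["key1"])  # 2
print(root_counts(docs)["key1"])  # 1
```

Note that the nested aggregation runs over all nested objects of the matching root documents, not only over the nested objects that satisfied the query.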


Yes, I need this. How can I solve this problem? – Stalso


Does this help? https://stackoverflow.com/a/27578607/7379424 –

Answers


You will have to use a reverse_nested aggregation inside the nested aggregation to get the aggregation counts back on the root documents.

{ 
    "query": { 
     "nested": { 
      "path": "nested_fields", 
      "query": { 
       "bool": { 
        "must": [{ 
          "term": { 
           "nested_fields.key": { 
            "value": "key1" 
           } 
          } 
         }, 
         { 
          "terms": { 
           "nested_fields.value": [ 
            "value1", 
            "value2" 
           ] 
          } 
         } 
        ] 
       } 
      } 
     } 
    }, 
    "aggs": { 
     "agg_nested_fields": { 
      "nested": { 
       "path": "nested_fields" 
      }, 
      "aggs": { 
       "agg_nested_fields_key": { 
        "terms": { 
         "field": "nested_fields.key", 
         "size": 10 
        }, 
        "aggs": { 
         "back_to_root": { 
          "reverse_nested": {}
         } 
        } 
       } 
      } 
     } 
    } 
} 
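
A hedged sketch of how the aggregation part could be built and read from Python. It assumes, as the Elasticsearch documentation describes, that reverse_nested accepts an optional path which must itself be a nested path, and that an empty body joins back to the root document. The names `build_aggs` and `root_doc_counts` are hypothetical helpers, and `sample_aggregations` is a hand-written illustration of the assumed response shape, not output captured from a real cluster.

```python
import json


def build_aggs(path="nested_fields", field="nested_fields.key", size=10):
    """Build the aggs section with a reverse_nested sub-aggregation."""
    return {
        "agg_nested_fields": {
            "nested": {"path": path},
            "aggs": {
                "agg_nested_fields_key": {
                    "terms": {"field": field, "size": size},
                    # Empty body: assumed to join back to the root document.
                    "aggs": {"back_to_root": {"reverse_nested": {}}},
                }
            },
        }
    }


def root_doc_counts(aggregations):
    """Map each terms bucket key to its root-document count."""
    buckets = aggregations["agg_nested_fields"]["agg_nested_fields_key"]["buckets"]
    return {b["key"]: b["back_to_root"]["doc_count"] for b in buckets}


# Hand-written illustration of the assumed response shape (not real output).
sample_aggregations = {
    "agg_nested_fields": {
        "doc_count": 2,
        "agg_nested_fields_key": {
            "buckets": [
                {"key": "key1", "doc_count": 2, "back_to_root": {"doc_count": 1}}
            ]
        },
    }
}

print(json.dumps(build_aggs(), indent=2))
print(root_doc_counts(sample_aggregations))
```

With this layout the bucket's own doc_count still counts nested objects; the per-key number of root documents is read from back_to_root.doc_count.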

This is also incorrect. – Stalso


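How so? You wanted the count of parent/root docs. At least, that is what I understood from this: 'As you can see from the response, I have one hit (which is correct), but the document is counted twice in the aggregation (see doc_count: 2) because it has two items with the key1 value in the nested_fields array. How can I get the correct count in the aggregation?' Do you have more information to add about what you are trying to achieve? – user3775217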
