
I want to be able to query for text, but also only retrieve the results that have the maximum value of a particular integer field in my data. I've read the documentation on aggregations and on filters, and I don't quite understand what I'm looking for. How do I make an Elasticsearch query that filters on the maximum value of a field?

For example, I have some duplicate data indexed that is identical except for an integer field; let's call that field lastseen.

So, as an example, given this data indexed into Elasticsearch:

# these two the same except "lastseen" field 
    curl -XPOST localhost:9200/myindex/myobject -d '{ 
    "field1": "dinner carrot potato broccoli", 
    "field2": "something here", 
    "lastseen": 1000 
    }' 

    curl -XPOST localhost:9200/myindex/myobject -d '{ 
    "field1": "dinner carrot potato broccoli", 
    "field2": "something here", 
    "somevalue": 100 
    }' 

    # and these two the same except "lastseen" field 
    curl -XPOST localhost:9200/myindex/myobject -d '{ 
    "field1": "fish chicken something", 
    "field2": "dinner", 
    "lastseen": 2000 
    }' 

    curl -XPOST localhost:9200/myindex/myobject -d '{ 
    "field1": "fish chicken something", 
    "field2": "dinner", 
    "lastseen": 200 
    }' 

If I query for "dinner":

curl -XPOST localhost:9200/myindex/_search -d '{ 
    "query": { 
     "query_string": { 
      "query": "dinner" 
     } 
    } 
    }' 

I get 4 results back. I would like a filter so that I only get two results back: only the items with the largest lastseen field.

This is obviously wrong, but hopefully it gives you an idea of what I'm after:

{ 
    "query": { 
     "query_string": { 
      "query": "dinner" 
     } 
    }, 
    "filter": { 
      "max": "lastseen" 
     } 

} 

The results would look something like this:

"hits": [ 
     { 
     ... 
     "_source": { 
      "field1": "dinner carrot potato broccoli", 
      "field2": "something here", 
      "lastseen": 1000 
     } 
     }, 
     { 
     ... 
     "_source": { 
      "field1": "fish chicken something", 
      "field2": "dinner", 
      "lastseen": 2000 
     } 
     } 
    ] 

Update 1: I tried creating a mapping that excludes lastseen from being indexed. That didn't work; I still get 4 results.

curl -XPOST localhost:9200/myindex -d '{ 
    "mappings": { 
     "myobject": { 
     "properties": { 
      "lastseen": { 
      "type": "long", 
      "store": "yes", 
      "include_in_all": false 
      } 
     } 
     } 
    } 
}' 

Update 2: I tried the deduplication approach with aggregations listed here, and it didn't work, but more importantly I don't see a way to combine it with a keyword search.


If you had two documents with 'lastseen: 2000', would you want both of them returned, or one with 'lastseen: 2000' and one with 'lastseen: 1000'?


Also, what do you consider duplicate documents? From your example it looks like documents of this type are duplicates when they have the same 'field1'.


@AndreiStefan Duplicate documents will have the same field1 and field2.

Answer


Not ideal, but I think it gets you what you need.

Change the mapping of your field1 field, assuming this is the one used to identify "duplicate" documents, to something like this:

PUT /lastseen 
{ 
    "mappings": { 
    "test": { 
     "properties": { 
     "field1": { 
      "type": "string", 
      "fields": { 
      "raw": { 
       "type": "string", 
       "index": "not_analyzed" 
      } 
      } 
     }, 
     "field2": { 
      "type": "string" 
     }, 
     "lastseen": { 
      "type": "long" 
     } 
     } 
    } 
    } 
} 

Meaning, you add a .raw sub-field that is not_analyzed, which means it is indexed exactly as it is, without being analyzed and split into tokens. This is what makes the "duplicate document detection" possible.
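For completeness, here's a sketch of how the sample documents from the question would be indexed into this new index (the lastseen index and test type are the ones defined above; the document bodies are taken from the question):

    # index the sample documents into the new index/type
    curl -XPOST localhost:9200/lastseen/test -d '{ 
    "field1": "dinner carrot potato broccoli", 
    "field2": "something here", 
    "lastseen": 1000 
    }' 

    curl -XPOST localhost:9200/lastseen/test -d '{ 
    "field1": "fish chicken something", 
    "field2": "dinner", 
    "lastseen": 2000 
    }' 

    # ... and likewise for the other two documents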

Then you need a terms aggregation on field1.raw (for finding the duplicates) with a top_hits sub-aggregation, to get a single document per field1 value:

GET /lastseen/test/_search 
{ 
    "size": 0, 
    "query": { 
    "query_string": { 
     "query": "dinner" 
    } 
    }, 
    "aggs": { 
    "field1_unique": { 
     "terms": { 
     "field": "field1.raw", 
     "size": 2 
     }, 
     "aggs": { 
     "first_one": { 
      "top_hits": { 
      "size": 1, 
      "sort": [{"lastseen": {"order":"desc"}}] 
      } 
     } 
     } 
    } 
    } 
} 

Also, the single document returned by top_hits is the one with the highest lastseen (courtesy of "sort": [{"lastseen": {"order":"desc"}}]).
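Side note: the terms size of 2 only matches this small sample; with more distinct field1 values you would likely want to raise it (otherwise only the top buckets by document count are kept), e.g. as a drop-in replacement for the terms block above:

    "terms": { 
     "field": "field1.raw", 
     "size": 100 
    } 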

The results you get are these (under aggregations, not hits):

... 
    "aggregations": { 
     "field1_unique": { 
     "doc_count_error_upper_bound": 0, 
     "sum_other_doc_count": 0, 
     "buckets": [ 
      { 
       "key": "dinner carrot potato broccoli", 
       "doc_count": 2, 
       "first_one": { 
        "hits": { 
        "total": 2, 
        "max_score": null, 
        "hits": [ 
         { 
          "_index": "lastseen", 
          "_type": "test", 
          "_id": "AU60ZObtjKWeJgeyudI-", 
          "_score": null, 
          "_source": { 
           "field1": "dinner carrot potato broccoli", 
           "field2": "something here", 
           "lastseen": 1000 
          }, 
          "sort": [ 
           1000 
          ] 
         } 
        ] 
        } 
       } 
      }, 
      { 
       "key": "fish chicken something", 
       "doc_count": 2, 
       "first_one": { 
        "hits": { 
        "total": 2, 
        "max_score": null, 
        "hits": [ 
         { 
          "_index": "lastseen", 
          "_type": "test", 
          "_id": "AU60ZObtjKWeJgeyudJA", 
          "_score": null, 
          "_source": { 
           "field1": "fish chicken something", 
           "field2": "dinner", 
           "lastseen": 2000 
          }, 
          "sort": [ 
           2000 
          ] 
         } 
        ] 
        } 
       } 
      } 
     ] 
     } 
    } 
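
For reference, against the original myindex/myobject from the question the same search would presumably look something like the sketch below, assuming field1 in myindex has been given the same .raw sub-field (and the existing documents reindexed so the sub-field is populated):

    # same query/aggregation, but aimed at the question's index and type
    curl -XPOST localhost:9200/myindex/myobject/_search -d '{ 
     "size": 0, 
     "query": { 
      "query_string": { "query": "dinner" } 
     }, 
     "aggs": { 
      "field1_unique": { 
       "terms": { "field": "field1.raw", "size": 2 }, 
       "aggs": { 
        "first_one": { 
         "top_hits": { 
          "size": 1, 
          "sort": [{ "lastseen": { "order": "desc" } }] 
         } 
        } 
       } 
      } 
     } 
    }' 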

Thanks. This is exactly what I was looking for.