2015-07-22 61 views

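Elasticsearch copy_to field not behaving as expected with aggregation

I have an index mapping with two string fields, field1 and field2, both declared as copy_to to another field called all_fields. all_fields is indexed as "not_analyzed".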

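When I create a bucket aggregation on all_fields, I expect distinct buckets whose keys are the values of field1 and field2 concatenated together. Instead, I get separate buckets whose keys are the individual, unconcatenated values of field1 and field2.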

Example:

Mapping:

{
  "mappings": {
    "myobject": {
      "properties": {
        "field1": {
          "type": "string",
          "index": "analyzed",
          "copy_to": "all_fields"
        },
        "field2": {
          "type": "string",
          "index": "analyzed",
          "copy_to": "all_fields"
        },
        "all_fields": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

Data:

{
  "field1": "dinner carrot potato broccoli",
  "field2": "something here"
}

{
  "field1": "fish chicken something",
  "field2": "dinner"
}

Aggregation:

{
  "aggs": {
    "t": {
      "terms": {
        "field": "all_fields"
      }
    }
  }
}

Result:

...
"aggregations": {
  "t": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "dinner",
        "doc_count": 1
      },
      {
        "key": "dinner carrot potato broccoli",
        "doc_count": 1
      },
      {
        "key": "fish chicken something",
        "doc_count": 1
      },
      {
        "key": "something here",
        "doc_count": 1
      }
    ]
  }
}

I was expecting only 2 buckets: "fish chicken somethingdinner" and "dinner carrot potato broccolisomething here".

What am I doing wrong?

Answer


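What you are looking for is a concatenation of the two strings. copy_to looks like it does that, but it does not. With copy_to you conceptually create a set of values taken from field1 and field2; the values are never joined into one string. Because all_fields is not_analyzed, each copied value is indexed as a single whole term, which is why you get four buckets.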
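
The difference can be illustrated outside Elasticsearch with a small Python model. This is only a sketch of the behaviour described above, not how Elasticsearch is implemented: `copy_to_values`, `concatenated_value` and `terms_buckets` are hypothetical helpers, and the single-space separator is an assumption borrowed from the scripts further down.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]


def copy_to_values(doc):
    """Model all_fields as copy_to fills it: one separate value per source field."""
    return {doc["field1"], doc["field2"]}


def concatenated_value(doc):
    """Model what the question expected: a single joined value per document."""
    return doc["field1"] + " " + doc["field2"]


def terms_buckets(values_per_doc):
    """Toy terms aggregation: count the documents containing each distinct value."""
    counts = Counter()
    for values in values_per_doc:
        counts.update(set(values))
    return dict(counts)


copy_to_buckets = terms_buckets(copy_to_values(d) for d in docs)
concat_buckets = terms_buckets([concatenated_value(d)] for d in docs)

print(len(copy_to_buckets))  # 4 buckets, one per copied value
print(len(concat_buckets))   # 2 buckets, one per document
```

In this model each document contributes two values to all_fields, so the aggregation sees four distinct keys; only a value that is joined before indexing (or computed by a script at query time) gives one bucket per document.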

For your use case, you have two options:

  1. Use a _source transformation
  2. Use a script aggregation

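I would suggest the _source transformation, because I think it is more efficient than scripting: you pay a small price once at indexing time instead of running a heavy script aggregation on every query.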

For the _source transformation:

PUT /lastseen
{
  "mappings": {
    "test": {
      "transform": {
        "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']"
      },
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        },
        "lastseen": {
          "type": "long"
        },
        "all_fields": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

And the query:

GET /lastseen/test/_search
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "all_fields",
        "size": 10
      }
    }
  }
}
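
If the indexing code is under your control, the same index-time idea could also be applied on the client, before the document is sent. The sketch below is an assumption-laden illustration, not part of the mapping above: `with_all_fields` is a hypothetical helper (it belongs to no Elasticsearch client), and skipping missing fields is a choice made here, not something the transform script does.

```python
import json


def with_all_fields(doc, sources=("field1", "field2"), target="all_fields", sep=" "):
    """Return a copy of doc with `target` set to the joined source fields.

    Hypothetical helper: source fields that are missing or None are skipped.
    """
    parts = [doc[name] for name in sources if doc.get(name) is not None]
    enriched = dict(doc)  # shallow copy, the caller's document is left untouched
    enriched[target] = sep.join(parts)
    return enriched


doc = {"field1": "fish chicken something", "field2": "dinner"}

# JSON body that would be sent to the index request instead of the raw document.
body = json.dumps(with_all_fields(doc))
print(body)
```

With this approach all_fields still needs to be mapped as not_analyzed, as in the mapping above, so that the terms aggregation sees the joined string as one term.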

For the script aggregation, the easier approach (that is, using doc['field'].value instead of the more expensive _source.field) is to add a .raw sub-field to field1 and field2:

PUT /lastseen
{
  "mappings": {
    "test": {
      "properties": {
        "field1": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "field2": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "lastseen": {
          "type": "long"
        }
      }
    }
  }
}

The script then uses these .raw sub-fields:

{
  "aggs": {
    "NAME": {
      "terms": {
        "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value",
        "size": 10,
        "lang": "groovy"
      }
    }
  }
}

Without the .raw sub-fields (which are deliberately not_analyzed), you would need to do something like the following, which is more expensive:

{
  "aggs": {
    "NAME": {
      "terms": {
        "script": "_source.field1 + ' ' + _source.field2",
        "size": 10,
        "lang": "groovy"
      }
    }
  }
}