Elasticsearch copy_to field not behaving as expected with aggregation

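I have an index mapping with two string fields, field1 and field2, both declared with copy_to pointing to another field called all_fields. all_fields is indexed as "not_analyzed".

When I create a bucket aggregation on all_fields, I expect distinct buckets whose keys are the values of field1 and field2 concatenated together. Instead, I get separate buckets whose keys are the individual, unconcatenated values of field1 and field2.

Example mapping: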

{ 
    "mappings": { 
     "myobject": { 
     "properties": { 
      "field1": { 
      "type": "string", 
      "index": "analyzed", 
      "copy_to": "all_fields" 
      }, 
      "field2": { 
      "type": "string", 
      "index": "analyzed", 
      "copy_to": "all_fields" 
      }, 
      "all_fields": { 
      "type": "string", 
      "index": "not_analyzed" 
      } 
     } 
     } 
    } 
    }

Data:

{
    "field1": "dinner carrot potato broccoli",
    "field2": "something here"
}

and

{
    "field1": "fish chicken something",
    "field2": "dinner"
}

Aggregation:

{ 
    "aggs": { 
    "t": { 
     "terms": { 
     "field": "all_fields" 
     } 
    } 
    } 
}

Result:

... 
"aggregations": { 
    "t": { 
     "doc_count_error_upper_bound": 0, 
     "sum_other_doc_count": 0, 
     "buckets": [ 
      { 
       "key": "dinner", 
       "doc_count": 1 
      }, 
      { 
       "key": "dinner carrot potato broccoli", 
       "doc_count": 1 
      }, 
      { 
       "key": "fish chicken something", 
       "doc_count": 1 
      }, 
      { 
       "key": "something here", 
       "doc_count": 1 
      } 
     ] 
    } 
}

I was expecting only 2 buckets: "fish chicken somethingdinner" and "dinner carrot potato broccolisomething here".

What am I doing wrong?

Source

2015-07-22 adapt-dev

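What you are looking for is the concatenation of two strings. copy_to may look like it does that, but it does not. With copy_to you are, conceptually, creating a set of values drawn from field1 and field2, not concatenating them. Each copied value stays a separate term in all_fields, which is why the aggregation returns four buckets.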
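The difference can be sketched outside Elasticsearch. The following is a minimal Python model, not Elasticsearch code: `copy_to_values` and `terms_buckets` are hypothetical helpers that only imitate how a not_analyzed target field and a terms aggregation treat the two sample documents from the question.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

def copy_to_values(doc, sources=("field1", "field2")):
    """Model copy_to: the target field receives each source value separately."""
    return [doc[name] for name in sources]

def terms_buckets(values_per_doc):
    """Model a terms aggregation: one bucket per distinct value, counted per document."""
    counts = Counter()
    for values in values_per_doc:
        counts.update(set(values))  # a document counts once per distinct term
    return dict(counts)

# copy_to behaves like a multi-valued field: four separate terms.
copied = terms_buckets(copy_to_values(d) for d in docs)

# What the question expected: one concatenated term per document.
concatenated = terms_buckets([d["field1"] + d["field2"]] for d in docs)

print(copied)
print(concatenated)
```

In this model `copied` has four keys with a count of 1 each, matching the result shown in the question, while `concatenated` has the two keys the question expected.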

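For your use case, you have two options:

use a _source transformation
perform a script aggregation

I would recommend the _source transformation, because I think it is more efficient than scripting: you pay a small price at indexing time instead of running a heavy script aggregation on every search.

For the _source transformation: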

PUT /lastseen 
{ 
    "mappings": { 
    "test": { 
     "transform": { 
     "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']" 
     }, 
     "properties": { 
     "field1": { 
      "type": "string" 
     }, 
     "field2": { 
      "type": "string" 
     }, 
     "lastseen": { 
      "type": "long" 
     }, 
     "all_fields": { 
      "type": "string", 
      "index": "not_analyzed" 
     } 
     } 
    } 
    } 
}

And the query:

GET /lastseen/test/_search 
{ 
    "aggs": { 
    "NAME": { 
     "terms": { 
     "field": "all_fields", 
     "size": 10 
     } 
    } 
    } 
}
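To make the effect of the transform concrete, here is a small Python stand-in for the Groovy script in the mapping above. It is only a sketch: `transform` is a hypothetical function that mirrors the script's logic, and the `Counter` plays the role of the terms aggregation on all_fields. Note that the script joins the two values with a space, so the bucket keys differ slightly from the ones written in the question.

```python
from collections import Counter

def transform(source):
    """Python stand-in for the transform script: add a concatenated all_fields."""
    indexed = dict(source)  # work on a copy so the input document is left untouched
    indexed["all_fields"] = source["field1"] + " " + source["field2"]
    return indexed

# The two sample documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

indexed_docs = [transform(d) for d in docs]

# all_fields is not_analyzed, so each document contributes exactly one term.
buckets = Counter(d["all_fields"] for d in indexed_docs)

for key, doc_count in buckets.items():
    print(key, doc_count)
```

Each document now yields a single all_fields value, so the aggregation produces two buckets instead of four.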

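For the script aggregation, the easier approach (meaning one that uses doc['field'].value rather than the more expensive _source.field) is to add .raw sub-fields to field1 and field2: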

PUT /lastseen 
{ 
    "mappings": { 
    "test": { 
     "properties": { 
     "field1": { 
      "type": "string", 
      "fields": { 
      "raw": { 
       "type": "string", 
       "index": "not_analyzed" 
      } 
      } 
     }, 
     "field2": { 
      "type": "string", 
      "fields": { 
      "raw": { 
       "type": "string", 
       "index": "not_analyzed" 
      } 
      } 
     }, 
     "lastseen": { 
      "type": "long" 
     } 
     } 
    } 
    } 
}

And the script will use these .raw sub-fields:

{ 
    "aggs": { 
    "NAME": { 
     "terms": { 
     "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value", 
     "size": 10, 
     "lang": "groovy" 
     } 
    } 
    } 
}

Without the .raw sub-fields (which are deliberately made not_analyzed), you would need to do something like this, which is more expensive:

{ 
    "aggs": { 
    "NAME": { 
     "terms": { 
     "script": "_source.field1 + ' ' + _source.field2", 
     "size": 10, 
     "lang": "groovy" 
     } 
    } 
    } 
}
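Why the .raw sub-fields are needed for the doc[...] variant can be illustrated with a rough Python model. It assumes that doc['field'] on an analyzed string exposes the individual tokens rather than the original text, and that a not_analyzed field exposes the whole string as one term; `analyze` and `field_values` are hypothetical helpers, and the whitespace tokenizer is only an approximation of the real analyzer.

```python
def analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def field_values(text, analyzed):
    """Model what a script sees through doc[...]: tokens if analyzed, else the whole string."""
    return sorted(set(analyze(text))) if analyzed else [text]

doc = {"field1": "dinner carrot potato broccoli", "field2": "something here"}

# Analyzed field: the original string is gone, only tokens remain.
analyzed_view = field_values(doc["field1"], analyzed=True)

# not_analyzed .raw sub-field: the original string survives as a single term.
raw_view = field_values(doc["field1"], analyzed=False)

# Equivalent of doc['field1.raw'].value + ' ' + doc['field2.raw'].value
script_key = raw_view[0] + " " + field_values(doc["field2"], analyzed=False)[0]

print(analyzed_view)
print(raw_view)
print(script_key)
```

Under these assumptions only the .raw view (or _source) can give the script the full original string to concatenate.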

Source

2015-07-22 07:07:03
