7

Hierarchical faceting with Elasticsearch

I am using Elasticsearch and need to implement faceted search over hierarchical objects, like this:

  • Category 1 (10)
    • Subcategory 1 (4)
    • Subcategory 2 (6)
  • Category 2 (X)
    • ...

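So I need facets over two related fields. The documentation says this kind of facet is possible for numeric values, but I need it for strings: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-stats-facet.html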

Here is another interesting thread, but unfortunately it is old: http://elasticsearch-users.115913.n3.nabble.com/Pivot-facets-td2981519.html

Is this possible with Elasticsearch? If so, how can I do it?

Answers

3

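Currently, Elasticsearch does not support hierarchical faceting out of the box. However, the upcoming 1.0 release comes with a new aggregations module that can be used to get this kind of facet (more like pivot facets than hierarchical facets). Version 1.0 is currently in beta; you can download the second beta and try aggregations yourself. Your example might look like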

curl -XPOST 'localhost:9200/_search?pretty' -d '
{
    "aggregations": {
        "main category": {
            "terms": {
                "field": "cat_1",
                "order": {"_term": "asc"}
            },
            "aggregations": {
                "sub category": {
                    "terms": {
                        "field": "cat_2",
                        "order": {"_term": "asc"}
                    }
                }
            }
        }
    }
}'

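The idea is to have a separate field for each facet level and to bucket your facets on the terms of the first level (cat_1). Each of those buckets then gets sub-buckets based on the terms of the second level (cat_2). The result might look like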

{
    "aggregations" : {
        "main category" : {
            "buckets" : [ {
                "key" : "category 1",
                "doc_count" : 10,
                "sub category" : {
                    "buckets" : [ {
                        "key" : "subcategory 1",
                        "doc_count" : 4
                    }, {
                        "key" : "subcategory 2",
                        "doc_count" : 6
                    } ]
                }
            }, {
                "key" : "category 2",
                "doc_count" : 7,
                "sub category" : {
                    "buckets" : [ {
                        "key" : "subcategory 1",
                        "doc_count" : 3
                    }, {
                        "key" : "subcategory 2",
                        "doc_count" : 4
                    } ]
                }
            } ]
        }
    }
}
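
To get from a response of this shape back to the indented list in the question, the buckets can be walked recursively on the client. The following is only a sketch in Python: the helper names `facet_tree` and `render` are made up for illustration, and the sample `response` is the one shown above, not output from a live cluster.

```python
def facet_tree(aggregations, levels):
    """Convert nested terms buckets into (key, doc_count, children) tuples.

    `levels` lists the aggregation names from outermost to innermost.
    """
    if not levels:
        return []
    name, rest = levels[0], levels[1:]
    buckets = aggregations.get(name, {}).get("buckets", [])
    # Each bucket holds its sub-aggregation under the sub-aggregation's name.
    return [(b["key"], b["doc_count"], facet_tree(b, rest)) for b in buckets]


def render(tree, indent=0):
    """Format the tree as indented 'key (count)' lines."""
    lines = []
    for key, count, children in tree:
        lines.append("  " * indent + f"{key} ({count})")
        lines.extend(render(children, indent + 1))
    return lines


# Sample response copied from the answer above.
response = {
    "aggregations": {
        "main category": {
            "buckets": [
                {"key": "category 1", "doc_count": 10, "sub category": {"buckets": [
                    {"key": "subcategory 1", "doc_count": 4},
                    {"key": "subcategory 2", "doc_count": 6},
                ]}},
                {"key": "category 2", "doc_count": 7, "sub category": {"buckets": [
                    {"key": "subcategory 1", "doc_count": 3},
                    {"key": "subcategory 2", "doc_count": 4},
                ]}},
            ]
        }
    }
}

tree = facet_tree(response["aggregations"], ["main category", "sub category"])
print("\n".join(render(tree)))
```

Passing the level names explicitly keeps the walker independent of how many levels the query defines.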

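Thanks! I also found a bug report on GitHub and related posts saying this will be fixed in ES 1.0. The implementation is already available in beta 2. Playing with it now :) Thanks! – zonder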

5

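The previous solution works really well as long as each document has no more than one multi-level tag. Once a document has several, simple aggregations do not work, because the flat structure of Lucene fields mixes up the results of the inner aggregation. See the example below: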

DELETE /test_category 
POST /test_category 

# Insert a doc with 2 hierarchical tags 
POST /test_category/test/1 
{
    "categories": [
        {
            "cat_1": "1",
            "cat_2": "1.1"
        },
        {
            "cat_1": "2",
            "cat_2": "2.2"
        }
    ]
}

# Simple two-levels aggregations query 
GET /test_category/test/_search?search_type=count 
{
    "aggs": {
        "main_category": {
            "terms": {
                "field": "categories.cat_1"
            },
            "aggs": {
                "sub_category": {
                    "terms": {
                        "field": "categories.cat_2"
                    }
                }
            }
        }
    }
}

This is the wrong response I got on ES 1.4, where the fields used by the inner aggregation are mixed at the document level:

{
    ...
    "aggregations": {
        "main_category": {
            "buckets": [
                {
                    "key": "1",
                    "doc_count": 1,
                    "sub_category": {
                        "buckets": [
                            {
                                "key": "1.1",
                                "doc_count": 1
                            },
                            {
                                "key": "2.2", <= WRONG
                                "doc_count": 1
                            }
                        ]
                    }
                },
                {
                    "key": "2",
                    "doc_count": 1,
                    "sub_category": {
                        "buckets": [
                            {
                                "key": "1.1", <= WRONG
                                "doc_count": 1
                            },
                            {
                                "key": "2.2",
                                "doc_count": 1
                            }
                        ]
                    }
                }
            ]
        }
    }
}
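
The mixing can be reproduced without a cluster. The sketch below is a plain-Python simulation, not Elasticsearch code: it assumes an array of ordinary objects ends up as one multi-valued field per key, and the function names are invented for illustration. Counting over the flattened fields yields every combination of cat_1 and cat_2, while counting per object keeps the pairs intact.

```python
from collections import Counter, defaultdict

# The document indexed above, with two hierarchical tags.
doc = {"categories": [{"cat_1": "1", "cat_2": "1.1"},
                      {"cat_1": "2", "cat_2": "2.2"}]}


def flatten(document, field):
    """Mimic flat indexing: one multi-valued field per object key."""
    flat = defaultdict(list)
    for obj in document[field]:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def counts_flat(docs, field, outer, inner):
    """Count (outer, inner) pairs as seen through the flattened fields."""
    counts = Counter()
    for d in docs:
        flat = flatten(d, field)
        # The link between the two values of one object is lost here,
        # so every outer term is paired with every inner term.
        for o in set(flat.get(f"{field}.{outer}", [])):
            for i in set(flat.get(f"{field}.{inner}", [])):
                counts[(o, i)] += 1
    return counts


def counts_per_object(docs, field, outer, inner):
    """Count (outer, inner) pairs with each object kept separate."""
    counts = Counter()
    for d in docs:
        for obj in d[field]:
            counts[(obj[outer], obj[inner])] += 1
    return counts


flat_counts = counts_flat([doc], "categories", "cat_1", "cat_2")
object_counts = counts_per_object([doc], "categories", "cat_1", "cat_2")
print(sorted(flat_counts))    # includes the spurious ('1', '2.2') and ('2', '1.1')
print(sorted(object_counts))  # only the pairs that exist in the document
```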

The solution is to use nested objects. These are the steps:

1) Define a new type in the schema with nested objects

POST /test_category/test2/_mapping 
{
    "test2": {
        "properties": {
            "categories": {
                "type": "nested",
                "properties": {
                    "cat_1": {
                        "type": "string"
                    },
                    "cat_2": {
                        "type": "string"
                    }
                }
            }
        }
    }
}

# Insert a single document 
POST /test_category/test2/1 
{"categories":[{"cat_1":"1","cat_2":"1.1"},{"cat_1":"2","cat_2":"2.2"}]} 

2) Run a nested aggregation query:

GET /test_category/test2/_search?search_type=count 
{
    "aggs": {
        "categories": {
            "nested": {
                "path": "categories"
            },
            "aggs": {
                "main_category": {
                    "terms": {
                        "field": "categories.cat_1"
                    },
                    "aggs": {
                        "sub_category": {
                            "terms": {
                                "field": "categories.cat_2"
                            }
                        }
                    }
                }
            }
        }
    }
}

This is the response I got, which is now correct:

{
    ...
    "aggregations": {
        "categories": {
            "doc_count": 2,
            "main_category": {
                "buckets": [
                    {
                        "key": "1",
                        "doc_count": 1,
                        "sub_category": {
                            "buckets": [
                                {
                                    "key": "1.1",
                                    "doc_count": 1
                                }
                            ]
                        }
                    },
                    {
                        "key": "2",
                        "doc_count": 1,
                        "sub_category": {
                            "buckets": [
                                {
                                    "key": "2.2",
                                    "doc_count": 1
                                }
                            ]
                        }
                    }
                ]
            }
        }
    }
}

The same solution can be extended to facets with more than two levels.
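
One way to go beyond two levels is to generate the query body rather than write it by hand. This is a rough Python sketch that only builds the JSON body; the helper name `nested_terms_aggs`, the third field `cat_3` and the name `sub_sub_category` are hypothetical and would need a matching mapping.

```python
import json


def nested_terms_aggs(path, fields, names=None):
    """Build a `nested` aggregation wrapping one `terms` aggregation per level."""
    if not fields:
        raise ValueError("at least one field is required")
    names = names or [f"level_{i + 1}" for i in range(len(fields))]
    if len(names) != len(fields):
        raise ValueError("names and fields must have the same length")
    inner = None
    # Build from the innermost level outwards so each level wraps the next.
    for name, field in reversed(list(zip(names, fields))):
        level = {"terms": {"field": f"{path}.{field}"}}
        if inner is not None:
            level["aggs"] = inner
        inner = {name: level}
    return {"aggs": {path: {"nested": {"path": path}, "aggs": inner}}}


# Three levels; cat_3 is an assumed extra field, not part of the example above.
body = nested_terms_aggs(
    "categories",
    ["cat_1", "cat_2", "cat_3"],
    ["main_category", "sub_category", "sub_sub_category"],
)
print(json.dumps(body, indent=4))
```

Called with only cat_1 and cat_2, the helper produces the same body as the nested aggregation query in step 2.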