2013-02-18

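My problem: the Elasticsearch count does not match my database count when I use an AND filter on a nested resource.

I have indexed a "users" table, where each user can have one or more apps_events: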

curl localhost:9200/users/_count 
{"count":190291,"_shards":{"total":5,"successful":5,"failed":0}} 

SELECT COUNT(*) FROM users 
count : 190291 

=> Same count, all good!

But when I run a search with two filters (one terms, one term) on a nested resource:

curl -X GET 'http://localhost:9200/users/user/_search?load=&size=10&pretty' -d ' 
{ 
"query": { 
    "match_all": { 
    } 
}, 
"filter": { 
    "and": [ 
    { 
     "terms": { 
     "apps_events.type": [ 
      "sale" 
     ] 
     } 
    }, 
    { 
     "term": { 
     "apps_events.status": "active" 
     } 
    } 
    ] 
}, 
"size": 10 
}'

total : 63756 

And in my database:

SELECT 
    COUNT(DISTINCT(users_id)) 
FROM 
    apps_event 
WHERE 
    apps_event_state_id = 1 AND apps_event_project_id = 2; 

count : 63340 

That is because the SQL equivalent of the Elasticsearch query is actually:

SELECT 
    COUNT(DISTINCT(users_id)) 
FROM apps_event 
WHERE apps_event_state_id = 1 
AND users_id IN 
    (SELECT DISTINCT(users_id) FROM apps_event WHERE apps_event_project_id = 2) 

count : 63756 
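
To make the difference between the two SQL queries concrete, here is a minimal Python sketch. The rows are made-up sample data (not taken from the real tables), and the field names `type` and `status` simply mirror the filter above. It contrasts "the user has some matching event for each condition" with "the user has one event matching both conditions":

```python
# Hypothetical sample rows: (users_id, type, status). Not real data.
events = [
    (1, "sale", "active"),     # a single event satisfies both conditions
    (2, "sale", "inactive"),   # user 2 has "sale" on one event...
    (2, "signup", "active"),   # ...and "active" on a different event
    (3, "signup", "active"),   # user 3 never has a "sale" event
]

def count_flattened(rows):
    """Users with any 'sale' event AND any 'active' event,
    not necessarily the same event (the IN-subquery behaviour)."""
    sale_users = {user for user, type_, status in rows if type_ == "sale"}
    active_users = {user for user, type_, status in rows if status == "active"}
    return len(sale_users & active_users)

def count_per_event(rows):
    """Users with at least one event that is both 'sale' and 'active'
    (the plain WHERE ... AND ... behaviour)."""
    return len({user for user, type_, status in rows
                if type_ == "sale" and status == "active"})

print(count_flattened(events))  # 2: users 1 and 2
print(count_per_event(events))  # 1: only user 1
```

User 2 is counted by the first function but not by the second, which is the same kind of gap as 63756 versus 63340.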

===> How can I apply a simple "AND" within each nested resource?

Thanks

Answers


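You may have checked this already, but is apps_event_project_id really the right counterpart of apps_events.type? They don't look the same on the surface, but you would know for sure. Also, does users_id map directly to the ES _id? The inflated count could be caused by duplicate documents in your index.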


There are no duplicates, but I finally found out that apps_events is a nested resource. With an Elasticsearch search like this one, the query that actually runs is equivalent to: SELECT COUNT(DISTINCT(users_id)) FROM apps_event WHERE apps_event_state_id = 1 AND users_id IN (SELECT DISTINCT(users_id) FROM apps_event WHERE apps_event_project_id = 2); total: 63756 – zywx 2013-02-18 16:30:54


Thanks for the follow-up. I hadn't even thought about nested! – drewr 2013-02-18 19:05:04
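
Following the nested observation in the comments, one way to ask for a per-event AND is probably to wrap both conditions in a `nested` filter, so that they are evaluated against the same apps_events entry. The sketch below only builds and prints the request body in Python. It assumes that `apps_events` is mapped with type `nested`, and that the Elasticsearch version in use accepts a `nested` filter with `path` and an inner `filter`. The thread confirms neither point, and the helper name `nested_and_filter` is made up for illustration:

```python
import json

def nested_and_filter(path, conditions):
    """Build a nested filter that ANDs term conditions on the same nested object.

    `path` is the nested field (assumed to be mapped as type "nested");
    `conditions` maps sub-field names to the exact value each must have.
    """
    return {
        "nested": {
            "path": path,
            "filter": {
                "and": [
                    {"term": {f"{path}.{field}": value}}
                    for field, value in conditions.items()
                ]
            },
        }
    }

# Same search as in the question, with both conditions moved inside "nested".
body = {
    "query": {"match_all": {}},
    "filter": nested_and_filter(
        "apps_events", {"type": "sale", "status": "active"}
    ),
    "size": 10,
}

print(json.dumps(body, indent=2))
```

The printed JSON could be sent with the same curl command as in the question. If the mapping assumption holds, the total should then be comparable to the 63340 returned by the plain SQL AND query, though this has not been verified here.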