2017-03-07

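Elasticsearch scroll/scan query does not return all documents, the first set is missing

I am trying to scroll through my ES index and fetch all documents, but I always seem to be missing the first set of documents, the one returned by the initial scroll request. For example, if my scroll size is 10 and my query matches 100 documents in total, I end up with only 90 documents after scrolling. Any suggestions on what I am missing?

This is what I have tried so far: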

$json = '{"query":{"bool":{"must":[{"match_all":{}}]}}}'; 

$params = [ 
    "scroll" => "1m", 
    "size" => 50, 
    "index" => "myindex", 
    "type" => "mytype", 
    "body" => $json 
]; 

$results = $client->search($params); 
$scroll_size = $results['hits']['total']; // returns total docs that match query 
$s_id = $results['_scroll_id']; 

print " total results: " . $scroll_size; 

//scroll 
$count = 0; 
while ($scroll_size > 0) { 
    print " SCROLLING..."; 
    $scroll_results = $client->scroll([ 
     'scroll_id' => $s_id, 
     'scroll' => '1m' 
    ]); 

    // get number of results returned in the last scroll 
    $scroll_size = sizeof($scroll_results['hits']['hits']); 
    print " scroll size: " . $scroll_size; 

    // do something with results 
    for ($i=0; $i<$scroll_size; $i++) { 
     $count++; 
    } 
} 
print " total id count: " . $count; 

Answers


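The first query you run returns the total document count, but it also returns documents. That first query both sets up the scroll and fetches the first set of documents. After you have processed that first set of results, you can use the scroll_id to fetch the next page, and so on.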
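A minimal sketch of that order of operations, assuming the response shape shown in the question (`hits.hits`, `_scroll_id`). `FakeScrollClient` and `countAllHits` are hypothetical names introduced only so the example runs without a cluster; they are not part of the Elasticsearch PHP client.

```php
<?php
// Hypothetical in-memory stand-in for the Elasticsearch client. It only
// mimics the search()/scroll() response shape used in the question.
class FakeScrollClient
{
    private $docs;
    private $size = 10;
    private $offset = 0;

    public function __construct(array $docs)
    {
        $this->docs = $docs;
    }

    // The initial search already returns the first page of hits.
    public function search(array $params): array
    {
        $this->size = $params['size'];
        $this->offset = 0;
        return $this->page();
    }

    // Each scroll call returns the next page, or no hits when exhausted.
    public function scroll(array $params): array
    {
        return $this->page();
    }

    private function page(): array
    {
        $hits = array_slice($this->docs, $this->offset, $this->size);
        $this->offset += count($hits);
        return [
            '_scroll_id' => 'scroll-' . $this->offset,
            'hits' => ['total' => count($this->docs), 'hits' => $hits],
        ];
    }
}

// Process the page returned by search() before requesting the next one.
function countAllHits($client, array $params): int
{
    $count = 0;
    $response = $client->search($params);
    while (count($response['hits']['hits']) > 0) {
        foreach ($response['hits']['hits'] as $hit) {
            $count++; // do something with $hit here
        }
        $response = $client->scroll([
            'scroll_id' => $response['_scroll_id'], // id from the latest response
            'scroll' => $params['scroll'],
        ]);
    }
    return $count;
}

$client = new FakeScrollClient(range(1, 100));
print countAllHits($client, ['scroll' => '1m', 'size' => 10]);
```

Because the loop handles the current response first and only then calls `scroll`, the hits from the initial `search` are counted like every other page, and an empty result set ends the loop immediately.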


Thanks @Ramdev. Yes, I realized that after a bit of digging. For anyone else, this is what ended up working for me:

$json = '{"query":{"bool":{"must":[{"match_all":{}}]}}}'; 
$count = 0; 
$params = [ 
    "scroll" => "1m", 
    "size" => 50, 
    "index" => "myindex", 
    "type" => "mytype", 
    "body" => $json 
]; 

$results = $client->search($params); 
$scroll_size = $results['hits']['total']; // returns total docs that match query 
$s_id = $results['_scroll_id']; 

print " total results: " . $scroll_size; 

// first set of results, returned by the initial search itself 
$scroll_size = sizeof($results['hits']['hits']); 
for ($i=0; $i<$scroll_size; $i++) { 
    $count++; 
} 
//scroll 
while ($scroll_size > 0) { 
    print " SCROLLING..."; 
    $scroll_results = $client->scroll([ 
     'scroll_id' => $s_id, 
     'scroll' => '1m' 
    ]); 

    // get number of results returned in the last scroll 
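    $scroll_size = sizeof($scroll_results['hits']['hits']); 
    print " scroll size: " . $scroll_size; 

    // use the scroll id from the latest response for the next request 
    $s_id = $scroll_results['_scroll_id']; 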

    // do something with results 
    for ($i=0; $i<$scroll_size; $i++) { 
     $count++; 
    } 
} 
print " total id count: " . $count; 