2014-10-27

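MongoDB complex subdocument query

I have a collection of 100,000 documents containing multiple nested arrays. I need to query on a property located at the lowest level and return only the objects at the bottom of the arrays.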

Document structure:

{ 
    _id: 12345, 
    type: "employee", 
    people: [ 
     { 
      name: "Rob", 
      items: [ 
       { 
        itemName: "RobsItemOne", 
        value: "$10.00", 
        description: "some description about the item" 
       }, 
       { 
        itemName: "RobsItemTwo", 
        value: "$15.00", 
        description: "some description about the item" 
       } 
      ] 
     } 
    ] 
} 

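I have been using the aggregation pipeline to get the expected result, which does work, but the performance is pretty terrible. Here is my query: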

db.collection.aggregate([ 
      { 
       $match: { 
        "type": "employee" 
       } 
      }, 

      {$unwind: "$people"}, 
      {$unwind: "$people.items"}, 
      {$match: {$or: [ //There could be dozens of items included in this $match 
          {"people.items.itemName": "RobsItemOne"}, 
          {"people.items.itemName": "RobsItemTwo"} 
          ] 
        } 
      }, 
      { 
       $project: { 
        _id: 0,// This is because of the $out 
        systemID: "$_id", 
        type: "$type", 
        item: "$people.items.itemName", 
        value: "$people.items.value" 
       } 
      }, 
      {$out: "tempCollection"} //Would like to avoid this, but was exceeding max document size 
     ]) 

The result is:

[ 
    { 
     "type" : "employee", 
     "systemID" : 12345, 
     "item" : "RobsItemOne", 
     "value" : "$10.00" 
    }, 
    { 
     "type" : "employee", 
     "systemID" : 12345, 
     "item" : "RobsItemTwo", 
     "value" : "$15.00" 
    } 
] 

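What can I do to make this query faster? I have tried using indexes, but according to the Mongo documentation, indexes are ignored for any stage past the initial $match.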
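To see where the time goes, it can help to trace what the stages above do to the sample document. The following is a minimal plain JavaScript sketch that imitates `$match`, both `$unwind` stages and `$project` on in-memory arrays; it is not the MongoDB API, the helper names (`unwindPeople`, `unwindItems`, `runQuestionPipeline`) are hypothetical, and it assumes every document has `people` and `items` arrays. The point it shows is that every person and every item becomes its own intermediate document before the item filter runs.

```javascript
// In-memory imitation of the question's pipeline (NOT the MongoDB driver API).
const sampleDocs = [
  {
    _id: 12345,
    type: "employee",
    people: [
      {
        name: "Rob",
        items: [
          { itemName: "RobsItemOne", value: "$10.00", description: "some description about the item" },
          { itemName: "RobsItemTwo", value: "$15.00", description: "some description about the item" },
        ],
      },
    ],
  },
];

// Imitates {$unwind: "$people"}: one output document per element of `people`.
function unwindPeople(docs) {
  return docs.flatMap((doc) => doc.people.map((person) => ({ ...doc, people: person })));
}

// Imitates {$unwind: "$people.items"}: one output document per element of `people.items`.
function unwindItems(docs) {
  return docs.flatMap((doc) =>
    doc.people.items.map((item) => ({ ...doc, people: { ...doc.people, items: item } }))
  );
}

function runQuestionPipeline(docs, wantedNames) {
  const matchedType = docs.filter((doc) => doc.type === "employee"); // first $match
  const unwound = unwindItems(unwindPeople(matchedType)); // both $unwind stages
  // Equivalent of the $or of equality conditions on itemName.
  const matched = unwound.filter((doc) => wantedNames.includes(doc.people.items.itemName));
  // $project: flatten the fields the question keeps.
  return matched.map((doc) => ({
    systemID: doc._id,
    type: doc.type,
    item: doc.people.items.itemName,
    value: doc.people.items.value,
  }));
}

console.log(runQuestionPipeline(sampleDocs, ["RobsItemOne", "RobsItemTwo"]));
```

With 100,000 documents, the intermediate array after the second unwind has one entry per item across the whole collection, which is the work the answers below try to cut down.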

Answers


Another thing you can try is adding a $match operator right after your first $unwind.

...{$unwind: "$people"}, 
{$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}}, 
{$unwind: "$people.items"}, .... 

This reduces the number of records that the following $unwind and $match operators have to process.
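Why this helps: after the first `$unwind`, `people` is a single object but `people.items` is still an array, and a dotted-path query such as `{"people.items.itemName": {$in: [...]}}` matches when any element of that array satisfies it. Here is a small plain JavaScript sketch of that behaviour; the data (including the second person "Ann") and the names `matchesAnyItem` and `countItems` are hypothetical and only imitate the query semantics, not the server.

```javascript
// Imitates how {"people.items.itemName": {$in: names}} is evaluated after the
// first $unwind: `people` is one object, `people.items` is still an array, and
// the document matches if ANY element's itemName is in the list.
function matchesAnyItem(unwoundDoc, names) {
  return unwoundDoc.people.items.some((item) => names.includes(item.itemName));
}

// Hypothetical data: two people, only one of whom owns a wanted item.
const afterFirstUnwind = [
  { _id: 1, type: "employee", people: { name: "Rob", items: [{ itemName: "RobsItemOne" }, { itemName: "RobsItemTwo" }] } },
  { _id: 1, type: "employee", people: { name: "Ann", items: [{ itemName: "AnnsItemOne" }, { itemName: "AnnsItemTwo" }, { itemName: "AnnsItemThree" }] } },
];

const names = ["RobsItemOne", "RobsItemTwo"];
const kept = afterFirstUnwind.filter((doc) => matchesAnyItem(doc, names));

// Number of documents the second $unwind would emit, without and with the early $match.
const countItems = (docs) => docs.reduce((n, doc) => n + doc.people.items.length, 0);
console.log(countItems(afterFirstUnwind), countItems(kept));
```

The early match only drops whole people; a kept person's non-matching items are still unwound, which is why the final $match after the second $unwind is still needed.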

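Since you have a large number of records, you can also make use of the {allowDiskUse:true} option, which, per the documentation:

Enables writing to temporary files. When set to true, aggregation stages can write data to the _tmp subdirectory in the dbPath directory.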

So your final query would look like this:

db.collection.aggregate([ 
     { 
      $match: { 
       "type": "employee" 
      } 
     }, 

     {$unwind: "$people"}, 
     {$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}}, 
     {$unwind: "$people.items"}, 
     {$match: {$or: [ //There could be dozens of items included in this $match 
         {"people.items.itemName": "RobsItemOne"}, 
         {"people.items.itemName": "RobsItemTwo"} 
         ] 
       } 
     }, 
     { 
      $project: { 
       _id: 0,// This is because of the $out 
       systemID: "$_id", 
       type: "$type", 
       item: "$people.items.itemName", 
       value: "$people.items.value" 
      } 
     } 

    ], {allowDiskUse:true}) 

I'll give it a shot. Is the aggregation pipeline, rather than map-reduce, the right choice here? – Rob 2014-10-27 19:42:14


Please take a look at this: http://stackoverflow.com/questions/16310730/mongodb-use-aggregation-framework-or-mapreduce-for-matching-array-of-strings-w – BatScream 2014-10-27 23:53:17


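In the example above, every document has a unique key, so the reduce function is not called for every document. Even if you emit a common key for all documents, the reduce function would have to take a very large number of documents as input, and processing would be much slower than the aggregation pipeline, because the pipeline eliminates documents in its $match stage. – BatScream 2014-10-28 00:07:29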
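The point about reduce calls can be illustrated with a toy runner. This is a simplified, hypothetical sketch (`toyMapReduce` is not a MongoDB function, and real map functions use `this` and a global `emit`); it only mirrors the documented behaviour that reduce is not called for a key with a single emitted value. Real mapReduce may also re-reduce partial results, which this sketch ignores.

```javascript
// Toy map-reduce runner: groups emitted values by key and, like MongoDB's
// mapReduce, skips calling reduce for keys that have only a single value.
function toyMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));

  let reduceCalls = 0;
  const results = [];
  for (const [key, values] of groups) {
    if (values.length === 1) {
      results.push({ _id: key, value: values[0] }); // single value: reduce skipped
    } else {
      reduceCalls += 1; // reduce receives every value emitted for this key
      results.push({ _id: key, value: reduce(key, values) });
    }
  }
  return { results, reduceCalls };
}

const docs = [{ _id: 1, n: 1 }, { _id: 2, n: 2 }, { _id: 3, n: 3 }];
const sum = (key, values) => values.reduce((a, b) => a + b, 0);

const uniqueKeys = toyMapReduce(docs, (doc, emit) => emit(doc._id, doc.n), sum);
const commonKey = toyMapReduce(docs, (doc, emit) => emit("all", doc.n), sum);
console.log(uniqueKeys.reduceCalls, commonKey.reduceCalls, commonKey.results);
```

With unique keys reduce never runs; with one common key a single reduce call gets every value, and nothing is filtered out beforehand the way a $match stage would.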


Building on @BatScream's improvements, I found something else worth trying. You can give it a try.

// if the final result set is relatively small, this index will be helpful. 
db.collection.ensureIndex({type : 1, "people.items.itemName" : 1 }); 

var itemCriteria = { 
    $in : [ "RobsItemOne", "RobsItemTwo" ] 
}; 

db.collection.aggregate([ { 
    $match : { 
     "type" : "employee", 
     "people.items.itemName" : itemCriteria  // add this criteria to narrow source range further 
    } 
}, { 
    $unwind : "$people" 
}, { 
    $match : { 
     "people.items.itemName" : itemCriteria  // narrow data range further 
    } 
}, { 
    $unwind : "$people.items" 
}, { 
    $match : { 
     "people.items.itemName" : itemCriteria  // final match; $in instead of the $or operator 
    } 
}, { 
    $project : { 
     _id : 0,         // This is because of the $out 
     systemID : "$_id", 
     type : "$type", 
     item : "$people.items.itemName", 
     value : "$people.items.value" 
    } 
}, { 
    $out: "tempCollection"       // optional 
} ], { 
    allowDiskUse : true 
});
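Since the question mentions there could be dozens of item names, it may be convenient to generate this pipeline from a list. The sketch below is a hypothetical helper (`buildItemPipeline` is not part of MongoDB) that builds the same stages as the answer above. It relies on the fact that `$in` over a list of values on one field matches the same documents as an `$or` of equality conditions on that field, so the list can grow without adding `$or` branches.

```javascript
// Hypothetical helper: builds the pipeline from the answer above for any list
// of item names, so dozens of names go into a single $in instead of an $or.
function buildItemPipeline(itemNames, outCollection) {
  const itemCriteria = { $in: itemNames };
  const pipeline = [
    // First stage: this $match can use the {type: 1, "people.items.itemName": 1} index.
    { $match: { type: "employee", "people.items.itemName": itemCriteria } },
    { $unwind: "$people" },
    { $match: { "people.items.itemName": itemCriteria } }, // drop people without a wanted item
    { $unwind: "$people.items" },
    { $match: { "people.items.itemName": itemCriteria } }, // keep only the wanted items
    {
      $project: {
        _id: 0,
        systemID: "$_id",
        type: "$type",
        item: "$people.items.itemName",
        value: "$people.items.value",
      },
    },
  ];
  if (outCollection) {
    pipeline.push({ $out: outCollection }); // optional, as in the answer
  }
  return pipeline;
}

// In the mongo shell this could then be run as (not executed here):
// db.collection.aggregate(buildItemPipeline(["RobsItemOne", "RobsItemTwo"], "tempCollection"), { allowDiskUse: true });
const pipeline = buildItemPipeline(["RobsItemOne", "RobsItemTwo"]);
console.log(JSON.stringify(pipeline[0]));
```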