MongoDB complex sub-document query

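I have a collection of 100,000 documents that contain several nested arrays. I need to query based on a property located at the lowest level and return only the objects at the bottom of the arrays.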

Document structure:

{ 
    _id: 12345, 
    type: "employee", 
    people: [ 
     { 
      name: "Rob", 
      items: [ 
       { 
        itemName: "RobsItemOne", 
        value: "$10.00", 
        description: "some description about the item" 
       }, 
       { 
        itemName: "RobsItemTwo", 
        value: "$15.00", 
        description: "some description about the item" 
       } 
      ] 
     } 
    ] 
}

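I have been using the aggregation pipeline to get the expected results, and it does work, but the performance is pretty terrible. Here is my query: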

db.collection.aggregate([ 
      { 
       $match: { 
        "type": "employee" 
       } 
      }, 

      {$unwind: "$people"}, 
      {$unwind: "$people.items"}, 
      {$match: {$or: [ //There could be dozens of items included in this $match 
          {"people.items.itemName": "RobsItemOne"}, 
          {"people.items.itemName": "RobsItemTwo"} 
          ] 
        } 
      }, 
      { 
       $project: { 
        _id: 0,// This is because of the $out 
        systemID: "$_id", 
        type: "$type", 
        item: "$people.items.itemName", 
        value: "$people.items.value" 
       } 
      }, 
      {$out: tempCollection} //Would like to avoid this, but was exceeding max document size 
     ])

The result is:

[ 
    { 
     "type" : "employee", 
     "systemID" : 12345, 
     "item" : "RobsItemOne", 
     "value" : "$10.00" 
    }, 
    { 
     "type" : "employee", 
     "systemID" : 12345, 
     "item" : "RobsItemTwo", 
     "value" : "$15.00" 
    } 
]

What can I do to make this query faster? I have tried using indexes, but per the Mongo docs, indexes are ignored after the initial $match.


2014-10-27 Rob

Something else you could try is adding a $match stage right after your $unwind on people.

...{$unwind: "$people"}, 
{$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}}, 
{$unwind: "$people.items"}, ....

This reduces the number of records that the following $unwind and $match stages have to process.
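To illustrate the pruning described above, here is a minimal sketch in plain JavaScript that models the stages on an in-memory array. It is not MongoDB driver code: the helper functions `unwindPeople`, `matchPeople`, and `unwindItems` are hypothetical stand-ins for the pipeline stages, and the second person ("Ann") with her items is made-up sample data added so that there is something to prune.

```javascript
// Plain-JavaScript model of the pipeline stages; not MongoDB driver code.
// The second person ("Ann") and her items are made-up sample data.
const docs = [{
  _id: 12345,
  type: "employee",
  people: [
    { name: "Rob", items: [
      { itemName: "RobsItemOne", value: "$10.00" },
      { itemName: "RobsItemTwo", value: "$15.00" }
    ] },
    { name: "Ann", items: [
      { itemName: "AnnsItemOne", value: "$5.00" },
      { itemName: "AnnsItemTwo", value: "$7.00" }
    ] }
  ]
}];
const wanted = ["RobsItemOne", "RobsItemTwo"];

// Models {$unwind: "$people"}: one output record per element of doc.people.
function unwindPeople(input) {
  return input.flatMap(doc => doc.people.map(person => ({ ...doc, people: person })));
}

// Models {$match: {"people.items.itemName": {$in: wanted}}} on unwound people:
// a record passes if any element of its items array has a wanted name.
function matchPeople(input) {
  return input.filter(rec =>
    rec.people.items.some(item => wanted.includes(item.itemName)));
}

// Models {$unwind: "$people.items"}: one output record per item.
function unwindItems(input) {
  return input.flatMap(rec =>
    rec.people.items.map(item => ({ ...rec, people: { ...rec.people, items: item } })));
}

const withoutEarlyMatch = unwindItems(unwindPeople(docs));
const withEarlyMatch = unwindItems(matchPeople(unwindPeople(docs)));

console.log(withoutEarlyMatch.length); // 4 records reach the final $match
console.log(withEarlyMatch.length);    // 2 records reach the final $match
```

In this toy data the early match halves the records produced by the second unwind. The real saving depends on how many people per document have no matching items at all.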

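Since you have a large number of records, you can make use of the {allowDiskUse:true} option, which enables writing to temporary files. When set to true, aggregation stages can write data to the _tmp subdirectory in the dbPath directory.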

So your final query would look like this:

db.collection.aggregate([ 
     { 
      $match: { 
       "type": "employee" 
      } 
     }, 

     {$unwind: "$people"}, 
     {$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}}, 
     {$unwind: "$people.items"}, 
     {$match: {$or: [ //There could be dozens of items included in this $match 
         {"people.items.itemName": "RobsItemOne"}, 
         {"people.items.itemName": "RobsItemTwo"} 
         ] 
       } 
     }, 
     { 
      $project: { 
       _id: 0,// This is because of the $out 
       systemID: "$_id", 
       type: "$type", 
       item: "$people.items.itemName", 
       value: "$people.items.value" 
      } 
     } 

    ], {allowDiskUse:true})


2014-10-27 19:13:47 BatScream

I'll give that a shot. Is the aggregation pipeline, rather than map-reduce, the appropriate choice here? – Rob 2014-10-27 19:42:14

Please take a look at this: http://stackoverflow.com/questions/16310730/mongodb-use-aggregation-framework-or-mapreduce-for-matching-array-of-strings-w – BatScream 2014-10-27 23:53:17

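In the above example, all documents have unique keys, so the reduce function would not be called for all the documents. Even if you emit a common key for all documents, the reduce function would have to take a large number of documents as input, and processing would be much slower than the aggregation pipeline, because the pipeline eliminates documents in the $match stage. – BatScream 2014-10-28 00:07:29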

Building on @BatScream's efforts, I found something else that can be improved. You can give it a try.

// if the final result set is relatively small, this index will be helpful. 
db.collection.ensureIndex({type : 1, "people.items.itemName" : 1 }); 

var itemCriteria = { 
    $in : [ "RobsItemOne", "RobsItemTwo" ] 
}; 

db.collection.aggregate([ { 
    $match : { 
     "type" : "employee", 
     "people.items.itemName" : itemCriteria  // add this criteria to narrow source range further 
    } 
}, { 
    $unwind : "$people" 
}, { 
    $match : { 
     "people.items.itemName" : itemCriteria  // narrow data range further 
    } 
}, { 
    $unwind : "$people.items" 
}, { 
    $match : { 
     "people.items.itemName" : itemCriteria  // final match; avoids the $or operator 
    } 
}, { 
    $project : { 
     _id : 0,         // This is because of the $out 
     systemID : "$_id", 
     type : "$type", 
     item : "$people.items.itemName", 
     value : "$people.items.value" 
    } 
}, { 
    $out: tempCollection       // optional 
} ], { 
    allowDiskUse : true 
});
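The effect of adding the item criteria to the very first $match can be modeled in plain JavaScript. This is only a sketch, not MongoDB driver code: the three sample documents and the names `collection`, `itemNames`, `typeOnly`, and `typeAndItems` are made up for illustration, and the filter merely mimics how a dotted path such as "people.items.itemName" reaches through both nested arrays.

```javascript
// Plain-JavaScript model of the first $match stage; not MongoDB driver code.
// All three sample documents are hypothetical.
const collection = [
  { _id: 1, type: "employee",
    people: [{ name: "Rob", items: [{ itemName: "RobsItemOne" }] }] },
  { _id: 2, type: "employee",
    people: [{ name: "Ann", items: [{ itemName: "AnnsItemOne" }] }] },
  { _id: 3, type: "contractor",
    people: [{ name: "Bob", items: [{ itemName: "RobsItemTwo" }] }] }
];
const itemNames = ["RobsItemOne", "RobsItemTwo"];

// Models {$match: {type: "employee"}}.
const typeOnly = collection.filter(doc => doc.type === "employee");

// Models {$match: {type: "employee", "people.items.itemName": {$in: itemNames}}}:
// a document passes if any item of any person has one of the wanted names.
const typeAndItems = collection.filter(doc =>
  doc.type === "employee" &&
  doc.people.some(p => p.items.some(i => itemNames.includes(i.itemName))));

console.log(typeOnly.map(d => d._id));     // [ 1, 2 ]
console.log(typeAndItems.map(d => d._id)); // [ 1 ]
```

Because this narrowing happens in the first stage, it is the one place where the compound index created above can help: whole documents without any matching item never enter the $unwind stages.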


2014-10-28 01:31:47 Wizard
