I have two collections, say A and B.
Example of A:
[
{"Account": "99", "Cat_1": "Losses", "Cat_2": "Marketing"},
{"Account": "89", "Cat_1": "Losses", "Cat_2": "Consultancy"},
{"Account": "79", "Cat_1": "Losses", "Cat_2": "Marketing"},
{"Account": "69", "Cat_1": "Losses", "Cat_2": "Consultancy"},
{"Account": "59", "Cat_1": "Profits", "Cat_2": "Marketing"},
{"Account": "49", "Cat_1": "Profits", "Cat_2": "Consultancy"},
{"Account": "29", "Cat_1": "Profits", "Cat_2": "Marketing"},
{"Account": "00", "Cat_1": "Profits", "Cat_2": "Consultancy"}
...
]
Example of B:
[
{"Name": "Example A", "Year": 2014, "Account": "99", "Amount": -5000},
{"Name": "Example A", "Year": 2015, "Account": "99", "Amount": -5000},
{"Name": "Example A", "Year": 2014, "Account": "89", "Amount": -2000},
{"Name": "Example A", "Year": 2015, "Account": "79", "Amount": -3000},
{"Name": "Example A", "Year": 2014, "Account": "69", "Amount": 0},
{"Name": "Example A", "Year": 2015, "Account": "59", "Amount": 100},
{"Name": "Example A", "Year": 2016, "Account": "49", "Amount": 5000},
{"Name": "Example A", "Year": 2014, "Account": "29", "Amount": 4000},
{"Name": "Example A", "Year": 2015, "Account": "00", "Amount": 900},
{"Name": "Example B", "Year": 2013, "Account": "99", "Amount": -500},
{"Name": "Example B", "Year": 2011, "Account": "89", "Amount": -10000},
...
]
Now I want, for example, to get all accounts for each "Cat_1" type, ending up with this:
[
{"cat": "Losses", "Accounts": ["99", "89", "79", "69"]},
{"cat": "Profits", "Accounts": ["59", "49", "29", "00"]}
]
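For reference, a minimal sketch of a pipeline that could produce this grouping. The collection name `collection_A` is an assumption (only `collection_B` is named in the query further down), and `Cat_1` can be swapped for any other `Cat_n` field.

```javascript
// Hypothetical collection name: collection_A (not named in the question).
const accountsByCategory = [
  // One group per distinct Cat_1 value, collecting the account numbers in it.
  { $group: { _id: "$Cat_1", Accounts: { $push: "$Account" } } },
  // Rename _id to "cat" so the documents match the shape shown above.
  { $project: { _id: 0, cat: "$_id", Accounts: 1 } }
];

// In the mongo shell this would be run as:
//   db.collection_A.aggregate(accountsByCategory)
```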
Or I might group by some other category field, Cat_n, and get a similar result.
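Next, I unwind the accounts and perform a lookup on collection B. This is where it goes wrong: the maximum document size is exceeded. I should point out that I am only interested in one user at a time, so my query currently looks like this: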
...
{
  "$lookup": {
    "from": "collection_B",
    "localField": "Account",
    "foreignField": "Account",
    "as": "results"
  }
},
{
  "$addFields": {
    "results": {
      "$filter": {
        "input": "$results",
        "as": "comp",
        "cond": {
          "$eq": [
            "$$comp.Name", "Example A"
          ]
        }
      }
    }
  }
},
...
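I use $addFields to overwrite the original results field after the lookup, because I do not want most of the joined documents: I am only interested in one specific user.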
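Note that in the stages above the filter only runs after `$lookup` has already built the full `results` array, which holds the matching documents of every user for that account. A hedged sketch of one alternative, assuming MongoDB 3.6 or later (where `$lookup` accepts `let` and `pipeline`), is to filter by user inside the lookup itself. The `userName` constant is illustrative only.

```javascript
// Assumption: MongoDB 3.6+; userName is an illustrative constant.
const userName = "Example A";

const lookupForOneUser = {
  $lookup: {
    from: "collection_B",
    // Expose the outer document's Account to the inner pipeline as $$account.
    let: { account: "$Account" },
    pipeline: [
      // Keep only this user's documents for this account, so the
      // joined array never contains other users' data.
      { $match: { Name: userName, $expr: { $eq: ["$Account", "$$account"] } } },
      // Carry over only the fields needed later.
      { $project: { _id: 0, Year: 1, Amount: 1 } }
    ],
    as: "results"
  }
};
```

This stage would replace both the `$lookup` and the `$addFields`/`$filter` stages shown above.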
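The second collection contains about 10M documents, roughly 300k per user, so after the lookup there should be no more than 300k documents in the results. When requesting the Cat_1 categories, the result would be two arrays, "Losses" and "Profits", each containing about 800 accounts.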
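I reduce the document size with $project so that it only contains the fields I actually need. I also use $match as early as possible to eliminate documents the aggregation does not need.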
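None of this helps, though: the document keeps exceeding the 16MB BSON limit. Only when I use $limit with a value of around 300 is a result returned, and then information is missing.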
What I want to produce is something like this, for a given user and Cat_n:
{
"Name": "Example A",
"Losses": [
{"Year": 2014, "Amount": ...},
{"Year": 2015, "Amount": ...},
{"Year": 2016, "Amount": ...}
],
"Profits": [
{"Year": 2014, "Amount": ...},
{"Year": 2015, "Amount": ...},
{"Year": 2016, "Amount": ...}
]
}
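One possible way to reach this shape, sketched under several assumptions: the aggregation starts from collection B instead of A, amounts are summed per category and year (the question does not say how they should be combined), the first collection is called `collection_A`, and the server is MongoDB 3.6 or later (for `$mergeObjects` and `$arrayToObject`). Starting from B means each `$lookup` joins a single small document from A, so no intermediate document grows large.

```javascript
// Assumptions: collection_A is a hypothetical name; amounts are summed; MongoDB 3.6+.
const userName = "Example A";

const totalsPerCategory = [
  // Restrict to one user first, before any join.
  { $match: { Name: userName } },
  // Attach the category information for each document's account.
  { $lookup: { from: "collection_A", localField: "Account", foreignField: "Account", as: "account" } },
  { $unwind: "$account" },
  // Sum the amounts per category and year.
  { $group: { _id: { cat: "$account.Cat_1", Year: "$Year" }, Amount: { $sum: "$Amount" } } },
  // Collect the yearly totals of each category into one array.
  { $group: { _id: "$_id.cat", totals: { $push: { Year: "$_id.Year", Amount: "$Amount" } } } },
  // Turn the categories into keys of a single result document.
  { $group: { _id: null, cats: { $push: { k: "$_id", v: "$totals" } } } },
  { $replaceRoot: { newRoot: { $mergeObjects: [{ Name: userName }, { $arrayToObject: "$cats" }] } } }
];

// In the mongo shell this would be run as:
//   db.collection_B.aggregate(totalsPerCategory)
```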
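I have been thinking of just creating two separate aggregations: one to get the accounts of the categories I am ultimately interested in, and one to aggregate the results from collection B. However, I would then have to check every document to find out which category it belongs to, which does not seem efficient. Alternatively, I could create a third collection that merges the documents from both collections and aggregate there, but I would rather avoid that if possible, since it adds extra complexity when maintaining or reviewing this data later.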
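To illustrate the two-aggregation idea, here is a minimal sketch of the client-side step that assigns each result to its category. It assumes the second aggregation has already summed amounts per account and year, so only those grouped rows (not all 300k documents) need a category check, and each check is a single `Map` access. The function name `foldByCategory` and the input shapes are hypothetical.

```javascript
// accountDocs: documents from collection A for the wanted categories.
// totals: hypothetical output of an aggregation on collection B such as
//   [{ $match: { Name: name, Account: { $in: accounts } } },
//    { $group: { _id: { Account: "$Account", Year: "$Year" },
//                Amount: { $sum: "$Amount" } } }]
function foldByCategory(name, accountDocs, totals, catField = "Cat_1") {
  // Account number -> category, built once.
  const catOf = new Map(accountDocs.map((d) => [d.Account, d[catField]]));
  const sums = new Map(); // category -> Map(year -> summed amount)
  for (const t of totals) {
    const cat = catOf.get(t._id.Account);
    if (cat === undefined) continue; // account outside the wanted categories
    if (!sums.has(cat)) sums.set(cat, new Map());
    const byYear = sums.get(cat);
    byYear.set(t._id.Year, (byYear.get(t._id.Year) || 0) + t.Amount);
  }
  // Build the result document: one array of yearly totals per category.
  const out = { Name: name };
  for (const [cat, byYear] of sums) {
    out[cat] = [...byYear.entries()]
      .sort((a, b) => a[0] - b[0])
      .map(([Year, Amount]) => ({ Year, Amount }));
  }
  return out;
}
```

The trade-off is one extra round trip to the server in exchange for keeping both aggregations small and independent.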
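Are you looking for something like this? "Changed in version 2.6: The db.collection.aggregate() method returns a cursor and can return result sets of any size. Previous versions returned all results in a single document, and the result set was limited to a size of 16 megabytes." https://docs.mongodb.com/manual/reference/method/db.collection.aggregate/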
@DanieleTassone, interesting feature! Although it does not solve my problem right now, I have a feeling I might use it somewhere – kbao