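MapReduce on a MongoDB collection turning up empty
I've been trying to combine a number of large data collections into a single collection, but I'm having trouble writing the MapReduce function to get there.
This is what my data looks like (17 rows are shown here; in reality I have 4+ million):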
{"user": 1, "day": 1, "type": "a", "sum": 10}
{"user": 1, "day": 2, "type": "a", "sum": 32}
{"user": 1, "day": 1, "type": "b", "sum": 11}
{"user": 2, "day": 4, "type": "b", "sum": 2}
{"user": 1, "day": 2, "type": "b", "sum": 1}
{"user": 1, "day": 3, "type": "b", "sum": 9}
{"user": 1, "day": 4, "type": "b", "sum": 12}
{"user": 2, "day": 2, "type": "a", "sum": 3}
{"user": 3, "day": 2, "type": "b", "sum": 81}
{"user": 1, "day": 4, "type": "a", "sum": 22}
{"user": 1, "day": 5, "type": "a", "sum": 39}
{"user": 2, "day": 5, "type": "a", "sum": 8}
{"user": 2, "day": 3, "type": "b", "sum": 1}
{"user": 3, "day": 3, "type": "b", "sum": 99}
{"user": 2, "day": 3, "type": "a", "sum": 5}
{"user": 1, "day": 3, "type": "a", "sum": 41}
{"user": 3, "day": 4, "type": "b", "sum": 106}
...
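I'm trying to get it to look like this in the end (one array per type, where each position is determined by the day and holds that day's sum; if there is no document of that type for a given day, the value at that index is just 0):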
{"user": 1, "type_a_sums": [10, 32, 41, 22, 39], "type_b_sums": [11, 1, 9, 12, 0]}
{"user": 2, "type_a_sums": [0, 3, 5, 0, 8], "type_b_sums": [0, 0, 1, 2, 0]}
{"user": 3, "type_a_sums": [0, 0, 0, 0, 0], "type_b_sums": [0, 81, 99, 106, 0]}
...
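To pin down the target shape, here is a minimal plain-JavaScript sketch of the transformation on the sample rows for user 2. It is only meant to illustrate the intended result, not to scale to 4+ million documents. The names `zeros`, `pivotSums`, and `numDays` are hypothetical, and the sketch assumes days run from 1 to 5 and the only types are "a" and "b".

```javascript
// Build an array of n zeros (hypothetical helper).
function zeros(n) {
  var a = [];
  for (var i = 0; i < n; i++) a.push(0);
  return a;
}

// Group rows by user and place each sum at index day-1 of the array for its type.
function pivotSums(rows, numDays) {
  var byUser = {};
  rows.forEach(function (row) {
    if (!byUser[row.user]) {
      byUser[row.user] = {
        user: row.user,
        type_a_sums: zeros(numDays),
        type_b_sums: zeros(numDays)
      };
    }
    if (row.type === "a") byUser[row.user].type_a_sums[row.day - 1] = row.sum;
    if (row.type === "b") byUser[row.user].type_b_sums[row.day - 1] = row.sum;
  });
  return Object.keys(byUser).map(function (k) { return byUser[k]; });
}

// The five sample rows belonging to user 2.
var sampleRows = [
  {user: 2, day: 4, type: "b", sum: 2},
  {user: 2, day: 2, type: "a", sum: 3},
  {user: 2, day: 5, type: "a", sum: 8},
  {user: 2, day: 3, type: "b", sum: 1},
  {user: 2, day: 3, type: "a", sum: 5}
];

var pivoted = pivotSums(sampleRows, 5);
console.log(JSON.stringify(pivoted));
// [{"user":2,"type_a_sums":[0,3,5,0,8],"type_b_sums":[0,0,1,2,0]}]
```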
This is the MapReduce I've been trying:
var mapsum = function(){
    var output = {user: this.user, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0], tempType: this.type, tempSum: this.sum, tempDay: this.day};
    if(this.type == "a") {
        output.type_a_sums[this.day-1] = this.sum;
    }
    if(this.type == "b") {
        output.type_b_sums[this.day-1] = this.sum;
    }
    emit(this.user, output);
};
var r = function(key, values) {
    var outs = {user: 0, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0], tempType: -1, tempSum: -1, tempDay: -1};
    values.forEach(function(v){
        outs.user = v.user;
        if(v.tempType == "a") {
            outs.type_a_sums[v.tempDay-1] = v.tempSum;
        }
        if(v.tempType == "b") {
            outs.type_b_sums[v.tempDay-1] = v.tempSum;
        }
    });
    return outs;
};
res = db.sums.mapReduce(mapsum, r, {out: 'joined_sums'})
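This gives me the output I'm looking for on a small sample, but when I run it on all 4 million documents I get a ton of output that looks like this: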
{"user": 1, "type_a_sums": [0, 0, 0, 0, 0], "type_b_sums": [0, 0, 0, 0, 0]}
{"user": 2, "type_a_sums": [0, 3, 5, 0, 8], "type_b_sums": [0, 0, 1, 2, 0]}
{"user": 3, "type_a_sums": [0, 0, 0, 0, 0], "type_b_sums": [0, 0, 0, 0, 0]}
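A large portion of the users that should have sums in their arrays are instead just filled with the 0s from the dummy arrays in the reduce function's outs object, as it looks before it gets filled in with actual values.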
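What's really strange is that if I run the exact same functions on the same collection but restrict the query to a single user with res = db.sums.mapReduce(mapsum, r, {query: {user: 1}, out: 'joined_sums'}), a user whose arrays I know should contain sums but who came out as all 0s before, I get exactly the output I want for that user. When I run it again over all 4 million, I'm back to 0s everywhere. It's as if it is overwriting all the work it did with the dummy filler arrays.
Do I have too much data? Shouldn't it be able to get through it given enough time? Or am I hitting some limit I don't know about?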
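Any chance you're hitting a bug in an outdated version of MongoDB? – maerics 2012-03-26 04:32:39
I think it has to do with 'reduce()' being called more than once per key. I'm trying to use 'finalize', but I'm confused about how it works. – TFX 2012-03-26 05:15:53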
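The suspicion in the last comment can be checked without touching the database by calling the question's reduce function by hand in two stages, which is what MongoDB may do when a key has many values (the output of one reduce call can be fed back in as an input value). The sketch below does that in plain JavaScript. `mapOne` is a hypothetical stand-in that builds the same value `mapsum` emits, and `mergeReduce` is an illustrative alternative, not a confirmed fix for this dataset; it assumes five days and at most one document per user, day, and type.

```javascript
// Hypothetical stand-in for mapsum: builds the value that would be emitted for one document.
function mapOne(doc) {
  var output = {user: doc.user, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0],
                tempType: doc.type, tempSum: doc.sum, tempDay: doc.day};
  if (doc.type == "a") output.type_a_sums[doc.day - 1] = doc.sum;
  if (doc.type == "b") output.type_b_sums[doc.day - 1] = doc.sum;
  return output;
}

// The reduce function from the question, unchanged in behavior.
var r = function (key, values) {
  var outs = {user: 0, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0],
              tempType: -1, tempSum: -1, tempDay: -1};
  values.forEach(function (v) {
    outs.user = v.user;
    if (v.tempType == "a") outs.type_a_sums[v.tempDay - 1] = v.tempSum;
    if (v.tempType == "b") outs.type_b_sums[v.tempDay - 1] = v.tempSum;
  });
  return outs;
};

// Illustrative alternative (an assumption, not the author's code): merge the arrays
// element-wise, so the result is the same whether an input came from map or from reduce.
var mergeReduce = function (key, values) {
  var outs = {user: key, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0]};
  values.forEach(function (v) {
    for (var i = 0; i < 5; i++) {
      outs.type_a_sums[i] += v.type_a_sums[i];
      outs.type_b_sums[i] += v.type_b_sums[i];
    }
  });
  return outs;
};

// Four sample documents for user 1, mapped by hand.
var mapped = [
  {user: 1, day: 1, type: "a", sum: 10},
  {user: 1, day: 2, type: "a", sum: 32},
  {user: 1, day: 1, type: "b", sum: 11},
  {user: 1, day: 5, type: "a", sum: 39}
].map(mapOne);

// One reduce call over everything: the question's r looks fine here.
var onePass = r(1, mapped);

// Two partial reduces, then a reduce over the partial results (a re-reduce).
// The partials carry tempType -1, so r ignores the arrays they already hold.
var rereduced = r(1, [r(1, mapped.slice(0, 2)), r(1, mapped.slice(2))]);

// The same two-stage call with the merge-based reduce keeps the sums.
var merged = mergeReduce(1, [mergeReduce(1, mapped.slice(0, 2)), mergeReduce(1, mapped.slice(2))]);

console.log(JSON.stringify(onePass.type_a_sums));   // [10,32,0,0,39]
console.log(JSON.stringify(rereduced.type_a_sums)); // [0,0,0,0,0]
console.log(JSON.stringify(merged.type_a_sums));    // [10,32,0,0,39]
```

If something like `mergeReduce` were used, the call would keep the same form, for example db.sums.mapReduce(mapsum, mergeReduce, {out: 'joined_sums'}). The `finalize` option takes a function of (key, reducedValue) that runs once per key after all reducing is done, so it would be a possible place to drop the temp fields rather than a place to do the merging.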