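MapReduce on a MongoDB collection comes out empty

I've been trying to combine a number of large datasets into a single collection, but I'm having trouble writing the MapReduce functions.

Here is what my data looks like (17 rows shown here; in reality I have 4+ million):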

{"user": 1, "day": 1, "type": "a", "sum": 10} 
{"user": 1, "day": 2, "type": "a", "sum": 32} 
{"user": 1, "day": 1, "type": "b", "sum": 11} 
{"user": 2, "day": 4, "type": "b", "sum": 2} 
{"user": 1, "day": 2, "type": "b", "sum": 1} 
{"user": 1, "day": 3, "type": "b", "sum": 9} 
{"user": 1, "day": 4, "type": "b", "sum": 12} 
{"user": 2, "day": 2, "type": "a", "sum": 3} 
{"user": 3, "day": 2, "type": "b", "sum": 81} 
{"user": 1, "day": 4, "type": "a", "sum": 22} 
{"user": 1, "day": 5, "type": "a", "sum": 39} 
{"user": 2, "day": 5, "type": "a", "sum": 8} 
{"user": 2, "day": 3, "type": "b", "sum": 1} 
{"user": 3, "day": 3, "type": "b", "sum": 99} 
{"user": 2, "day": 3, "type": "a", "sum": 5} 
{"user": 1, "day": 3, "type": "a", "sum": 41} 
{"user": 3, "day": 4, "type": "b", "sum": 106} 
...

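I'm trying to get it to look like this in the end (one array per type, where the position in the array is determined by the day; if there is no sum for that type on that day, the entry at that index is just 0):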

{"user": 1, "type_a_sums": [10, 32, 41, 22, 39], "type_b_sums": [11, 1, 9, 12, 0]} 
{"user": 2, "type_a_sums": [0, 3, 5, 0, 8], "type_b_sums": [0, 0, 1, 2, 0]} 
{"user": 3, "type_a_sums": [0, 0, 0, 0, 0], "type_b_sums": [0, 81, 99, 106, 0]} 
...

This is the MapReduce I've been trying:

var mapsum = function(){ 
    var output = {user: this.user, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0], tempType: this.type, tempSum: this.sum, tempDay: this.day} 

    if(this.type == "a") { 
     output.type_a_sums[this.day-1] = this.sum; 
    } 

    if(this.type == "b") { 
     output.type_b_sums[this.day-1] = this.sum; 
    } 

    emit(this.user, output); 
}; 

var r = function(key, values) { 
    var outs = {user: 0, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0], tempType: -1, tempSum: -1, tempDay: -1} 

    values.forEach(function(v){ 

     outs.user = v.user; 

     if(v.tempType == "a") { 
      outs.type_a_sums[v.tempDay-1] = v.tempSum; 
     } 

     if(v.tempType == "b") { 
      outs.type_b_sums[v.tempDay-1] = v.tempSum; 
     } 

    }); 

    return outs; 
}; 


res = db.sums.mapReduce(mapsum, r, {out: 'joined_sums'})

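This gives me the output I'm after on a small sample, but when I run it over all 4 million documents I get a ton of output that looks like this: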

{"user": 1, "type_a_sums": [0, 0, 0, 0, 0], "type_b_sums": [0, 0, 0, 0, 0]} 
{"user": 2, "type_a_sums": [0, 3, 5, 0, 8], "type_b_sums": [0, 0, 1, 2, 0]} 
{"user": 3, "type_a_sums": [0, 0, 0, 0, 0], "type_b_sums": [0, 0, 0, 0, 0]}

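A large portion of the users who should have sums in their arrays instead just have the 0s from the dummy arrays in the reduce function's outs object, as they are before I fill them with the actual values.

What's really strange is that if I run the exact same functions on the same collection but restrict them to a single user, res = db.sums.mapReduce(mapsum, r, {query: {user: 1}, out: 'joined_sums'}), a user who I know should have sums in their arrays but who previously came out as all 0s, I actually get exactly the output I want for that user. When I run it over all 4 million again, I'm back to 0s everywhere. It's as if it just overwrote all the work it had done with the dummy filler arrays.

Do I have too much data? Shouldn't it be able to get through it, given enough time? Or am I hitting some barrier I don't know about?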


2012-03-25 TFX

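Any chance you've hit a bug in an outdated MongoDB version? – maerics 2012-03-26 04:32:39

I think it has to do with 'reduce()' being called more than once per key. I'm trying to use 'finalize', but I'm confused about how it works. – TFX 2012-03-26 05:15:53

Thanks for providing so much detail. There are a few issues here.

Let's start from the top.

I'm trying to get it to look like this in the end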

{"user": 2, "type_a_sums": [0, 3, 5, 0, 8], "type_b_sums": [0, 0, 1, 2, 0]}

It will actually look like this:

{ _id: { "user": 2 }, value: { "type_a_sums": [0, 3, 5, 0, 8], "type_b_sums": [0, 0, 1, 2, 0] } }

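Note that _id is kind of like your "group by" and value is kind of like your "sum" columns.

So issue #1 is that you emit user as your key, but it is also part of your value. That isn't necessary. Reduce only ever combines values that share the same key, so you don't need this line either: outs.user = v.user;

You also have issue #2: your reduce is incorrect.

I think it has to do with reduce() being called more than once per key.

reduce() is designed to be called multiple times. It is supposed to scale across servers, so one server may call it a few times, and those results may then be merged and sent to another server.

Here's a different way to look at it. Reduce takes an array of value objects and reduces them to a single value object.

There are some corollaries here:

If I do reduce([a, b]), it should be the same as reduce([b, a]).
If I do reduce([a, reduce([b, c])]), it should be the same as reduce([reduce([a, b]), c]).

So it shouldn't matter what order I run them in or how many times a value gets reduced; the output is always the same.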
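One way to see the failure is to run the question's reduce function outside MongoDB and feed it its own output, which is roughly what happens when a key is reduced in more than one pass. The following is a minimal sketch in plain JavaScript, not a reproduction of MongoDB internals: originalReduce is the question's r under a different name, while emitted and the split into two passes are hypothetical, chosen only for illustration.

```javascript
// The reduce function from the question, unchanged apart from its name.
function originalReduce(key, values) {
  var outs = {user: 0, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0],
              tempType: -1, tempSum: -1, tempDay: -1};
  values.forEach(function (v) {
    outs.user = v.user;
    if (v.tempType == "a") { outs.type_a_sums[v.tempDay - 1] = v.tempSum; }
    if (v.tempType == "b") { outs.type_b_sums[v.tempDay - 1] = v.tempSum; }
  });
  return outs;
}

// Hypothetical helper: builds what the question's map emits for one document.
function emitted(user, day, type, sum) {
  var out = {user: user, type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0],
             tempType: type, tempSum: sum, tempDay: day};
  if (type == "a") { out.type_a_sums[day - 1] = sum; }
  if (type == "b") { out.type_b_sums[day - 1] = sum; }
  return out;
}

var a = emitted(1, 1, "a", 10);
var b = emitted(1, 2, "a", 32);
var c = emitted(1, 3, "a", 41);

// One pass over all three values gives the expected array.
var singlePass = originalReduce(1, [a, b, c]);

// Two passes: reduce a batch first, then reduce that result with the rest.
// The partial result carries tempType -1, so its sums are silently dropped.
var partial = originalReduce(1, [a, b]);
var twoPass = originalReduce(1, [partial, c]);

console.log(singlePass.type_a_sums); // [ 10, 32, 41, 0, 0 ]
console.log(twoPass.type_a_sums);    // [ 0, 0, 41, 0, 0 ]
```

If every value in the final pass is itself a partial result, nothing matches "a" or "b" and the arrays stay all 0, which is consistent with the output shown in the question.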

If you look at your code, that's not what is happening. Just look at type_a_sums. What happens if I get the following two values coming in to be reduced?

reduce([ [0,0,1,0,0], [0,2,0,0,0] ]) => ???

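To me, it looks like the output should be [0,2,1,0,0]. If that's true, then you don't need all of those temp_X fields. Instead, you need to focus on emitting the correct arrays and then merging those arrays correctly.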
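As a rough illustration of that advice, here is one possible rewrite, offered as a sketch rather than the definitive fix. It assumes days always run from 1 to 5 as in the sample data, and it merges by element-wise addition, so if the same (user, day, type) combination appears more than once the sums are added rather than overwritten. The local emit function and the three sample documents exist only so the sketch runs outside the mongo shell.

```javascript
// Map: emit only the two arrays, with this document's sum in its day slot.
// Assumes days run 1..5, as in the sample data.
var mapsum = function () {
  var output = {type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0]};
  if (this.type == "a") { output.type_a_sums[this.day - 1] = this.sum; }
  if (this.type == "b") { output.type_b_sums[this.day - 1] = this.sum; }
  emit(this.user, output);
};

// Reduce: add the arrays element by element, so the result has the same
// shape as the inputs and can safely be reduced again.
var r = function (key, values) {
  var outs = {type_a_sums: [0, 0, 0, 0, 0], type_b_sums: [0, 0, 0, 0, 0]};
  values.forEach(function (v) {
    for (var i = 0; i < 5; i++) {
      outs.type_a_sums[i] += v.type_a_sums[i];
      outs.type_b_sums[i] += v.type_b_sums[i];
    }
  });
  return outs;
};

// Local stand-in for the server's emit(), only so the sketch runs outside the shell.
var collected = [];
function emit(key, value) { collected.push(value); }

[{user: 2, day: 2, type: "a", sum: 3},
 {user: 2, day: 3, type: "a", sum: 5},
 {user: 2, day: 3, type: "b", sum: 1}].forEach(function (doc) { mapsum.call(doc); });

// Reducing everything at once and reducing in a different order, with a
// re-reduce in the middle, should give the same result.
var onePass = r(2, collected);
var rereduced = r(2, [collected[2], r(2, [collected[1], collected[0]])]);

console.log(JSON.stringify(onePass));
console.log(JSON.stringify(onePass) === JSON.stringify(rereduced)); // true
```

In the mongo shell the stand-in emit and the sample documents would be dropped, and the two functions passed to db.sums.mapReduce(mapsum, r, {out: 'joined_sums'}) exactly as in the question.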


2012-03-26 07:12:41

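Thanks! I've changed the code (http://pastie.org/private/dc9gizrsrzckzq6hvjkqq), but I won't be able to run it until the morning. I removed the temp values and instead changed the code in 'reduce()' to add each element of the 'emit()' arrays to the corresponding element of the 'reduce()' output arrays. Does that make more sense? I don't understand why using only the arrays helps, though. Why wouldn't adding the elements from the 'temp' variables work just as well? This way I'm just using an element of an array instead of an 'int', but it feels like the same thing to me. Do I need a 'finalize'? – TFX 2012-03-26 07:42:56

That *looks* much better. The reason you don't need the temps is that the output of the *emit* may be the final result. It may not be, but it could be. Given that what you really want at the end of the day is the arrays, you want the arrays to be correct at every stage. – 2012-03-26 18:21:17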
