
I've written a MapReduce in MongoDB and would like to use a global variable as a cache to write to/read from. I know it is not possible to have global variables across map function instances - I just want a global variable within each function instance. This type of functionality exists in Hadoop's MapReduce so I was expecting it to be there in MongoDB. But the following does not seem to work:

var cache = {}; // Does not seem to work!
function () {
  var hashValue = this.varValue1 + this.varValue2;
  if(typeof(cache[hashValue])!= 'undefined') {
    // Do nothing, we've processed at least one input record with this hash
  } else {
    // Process the input record
    // Cache the record
    cache[hashValue] = '1';
  }
}

Is this not allowed in MongoDB's MapReduce implementation, or am I doing something wrong in JavaScript (not experienced in JS)?

2
  • OK, I've flipped through this again and there is a point of confusion here. Is this function your map or your reduce? If you want an "ad-hoc" cache, you can simply create a temporary collection in Mongo and reference that from the map or reduce. However, without knowing both the map() and reduce() functions it's hard to say if you can't just solve this problem in the reduce phase. Commented Jun 11, 2010 at 15:46
  • This is the map function. I could do this at the reduce function but I have other stuff I need to do at that point, i.e. aggregate some values. I could also create a collection in MongoDB to serve as a cache - in fact that's what I did in the first place. However this is not an ideal solution (locking issues if there are multiple map function instances may slow things down) + this is a feature that already exists in Hadoop's MapReduce, so was expecting it here as well. Feel free to call me a nitpicker, but I believe it's something that needs to be fixed in MongoDB. Commented Jun 16, 2010 at 9:17

2 Answers

5

Looking at the docs, I'm finding the following:

db.runCommand(
 { mapreduce : <collection>,
   map : <mapfunction>,
   reduce : <reducefunction>
   [, scope : <object where fields go into javascript global scope >]
 }
);

I think that "scope" variable is what you need.

There's a test/example on GitHub that uses the "scope" variable.

I'm still new to this stuff, but hopefully that's enough to get you started.


3 Comments

Thanks - but no, scope is not global read/write, only read. By the way, this example seems not to be using the variable xx that scope is passing to the map/reduce functions.
The variables in the scope are perfectly writable.
The GitHub link is dead.
1

As Gates VP said, you need to add cache to the global scope. To give a complete answer based on your script, this is what you'll need to do:

db.runCommand(
 { mapreduce : <your collection>,
   map : <your map function, or reference to it>,
   reduce : <your reduce function, or reference to it>,
   scope : { cache : {} }
 }
);

The command injects the fields of the 'scope' object into the JavaScript global scope, so cache becomes visible inside your map function and the caching works exactly as you already use it there. I've tested this.
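To make the behaviour concrete, here is a minimal sketch that can be run in plain JavaScript outside MongoDB. It only simulates the idea: the top-level `cache` variable stands in for what `scope : { cache : {} }` would provide, `emit` is a stand-in that records pairs, and the `docs` array and the choice to emit `1` per new hash are assumptions made for illustration, not part of the original script.

```javascript
// Stand-in for the variable that `scope : { cache : {} }` would inject.
var cache = {};

// Stand-in for MongoDB's emit(); here it only records what was emitted.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The map function from the question. The "process the input record"
// step is filled in with a simple emit (an assumption for this sketch).
function map() {
  var hashValue = this.varValue1 + this.varValue2;
  if (typeof cache[hashValue] !== 'undefined') {
    // Already processed a record with this hash: do nothing.
    return;
  }
  emit(hashValue, 1);      // process the input record
  cache[hashValue] = '1';  // cache the record
}

// Hypothetical input documents; the first two share the same hash.
var docs = [
  { varValue1: 'a', varValue2: 'b' },
  { varValue1: 'a', varValue2: 'b' },
  { varValue1: 'c', varValue2: 'd' }
];

// MongoDB calls map with `this` bound to each document; mimic that here.
docs.forEach(function (doc) {
  map.call(doc);
});

console.log(emitted.length); // 2
```

The duplicate document is skipped because its hash is already in `cache`. This says nothing about how a real deployment shares or resets the scope object between threads or shards, so the cache should be treated as local to wherever the map function happens to run.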
