
I'm working on a complicated map-reduce process for a MongoDB database. I've split some of the more complex code off into modules, which I then make available to my map/reduce/finalize functions by including them in my scopeObj like so:

  const scopeObj = {
    userCalculations: require('../lib/userCalculations')
  }

  function myMapFn() {
    let userScore = userCalculations.overallScoreForUser(this)
    emit({
      'Key': this.userGroup
    }, {
      'UserCount': 1,
      'Score': userScore
    })
  }

  function myReduceFn(key, objArr) { /*...*/ }

  db.collection('userdocs').mapReduce(
    myMapFn,
    myReduceFn,
    {
      scope: scopeObj,
      query: {},
      out: {
        merge: 'userstats'
      }
    },
    function (err, stats) {
      return cb(err, stats);
    }
  )

...This all works fine. I had until recently thought it wasn't possible to include module code into a map-reduce scopeObj, but it turns out that was just because the modules I was trying to include all had dependencies on other modules. Completely standalone modules appear to work just fine.
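
By "completely standalone" I mean something shaped like this (a simplified, hypothetical version of my userCalculations.js -- the actual scoring logic doesn't matter here):

  // lib/userCalculations.js -- no require() calls anywhere,
  // so the whole module can be serialized into the scope object
  module.exports = {
    overallScoreForUser: function (userDoc) {
      // hypothetical stand-in for the real calculation
      return (userDoc.wins || 0) * 2 - (userDoc.losses || 0)
    }
  }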

Which brings me (finally) to my question. How can I -- or, for that matter, should I -- incorporate more complex modules, including things I've pulled from npm, into my map-reduce code? One thought I had was using Browserify or something similar to pull all my dependencies into a single file, then include it somehow... but I'm not sure what the right way to do that would be. And I'm also not sure of the extent to which I'm risking severely bloating my map-reduce code, which (for obvious reasons) has got to be efficient.

Does anyone have experience doing something like this? How did it work out, if at all? Am I going down a bad path here?

UPDATE: A clarification of the issue I'm trying to overcome: In the above code, require('../lib/userCalculations') is executed by Node -- it reads in the file ../lib/userCalculations.js and assigns the contents of that file's module.exports object to scopeObj.userCalculations. But let's say there's a call to require(...) somewhere within userCalculations.js. That call isn't actually executed yet. So, when I try to call userCalculations.overallScoreForUser() within the map function, MongoDB attempts to execute the require function -- and require isn't defined on mongo.

Browserify, for example, deals with this by compiling all the code from all the required modules into a single JavaScript file with no require calls, so it can be run in the browser. But that doesn't exactly work here, because I need the resulting code to itself be a module that I can use the way I use userCalculations in the code sample. Maybe there's a weird way to run browserify that I'm not aware of? Or some other tool that just "flattens" a whole hierarchy of modules into a single module?
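
To illustrate the kind of "flattening" I mean -- browserify's --standalone flag looks like it might be in the right direction, since it wraps the bundle in UMD so the output can itself be required as a module, but I haven't verified that it works for this case (paths and names here are hypothetical):

  // hypothetical shell invocation, bundling userCalculations plus all of
  // its dependencies into one file with no require() calls left inside:
  //
  //   browserify ../lib/userCalculations.js --standalone userCalculations \
  //     -o ../lib/userCalculations.bundle.js

  // then, in theory, the UMD bundle is usable like any other module:
  const scopeObj = {
    userCalculations: require('../lib/userCalculations.bundle.js')
  }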

Hopefully that clarifies a bit.

  • I don't know the answer about accessing the modules here, but would you be willing to consider an alternative, which is to rewrite the map-reduce code using the aggregation framework? If yes, see if you can post the relevant map and reduce code from the other modules. More here Commented Mar 14, 2018 at 16:39
  • @Veeram Unless I'm missing something, I don't think the Aggregation Framework will work for me -- I need to be able to do some pretty complex calculations in the reduce stage, and I also need to be able to do incremental updates (i.e., "merge" style output). Commented Mar 14, 2018 at 17:24
  • How much 'control' do you have over the dependency hierarchy? Meaning, are there any hidden dependencies / third-party code? I am not sure if this is the cause of your problem, but if it is, it could be tackled. Commented Mar 15, 2018 at 11:37
  • @Jankapunkt Please see my update above. Commented Mar 15, 2018 at 13:48
  • Okay, I see. What if you wrap the require in a self-executing function, like so: userCalculations: (function(){ return require('../lib/userCalculations') })()? It should resolve the required modules first. The only problem would then be to make sure that this function itself is not executed before its expected turn at runtime. Commented Mar 15, 2018 at 13:52

1 Answer


As a generic response, the answer to your question ("How can I -- or, for that matter, should I -- incorporate more complex modules, including things I've pulled from npm, into my map-reduce code?") is no: you cannot safely include complex modules in Node code you plan to send to MongoDB for mapReduce jobs.

You mentioned the problem yourself: nested require statements. require itself is synchronous, but if your module contains functions with further require calls inside them, those calls are not executed until the functions are invoked -- and the MongoDB VM will throw at that point, because require isn't defined there.

Consider the following example of three files: data.json, dep.js and main.js.

// data.json - just something we require "lazily"
true

// dep.js -- equivalent of your userCalculations
module.exports = {
  isValueTrue() {
    // The problem: nested require
    return require('./data.json');
  }
}


// main.js - from here you send your mapReduce to MongoDB.
// require the dependency eagerly
const calc = require('./dep.js');
// require is synchronous; the effect is the same if you do:
//   const calc = (function () { return require('./dep.js') })();

console.log('Calc is loaded.');
// Let's mess with unwary devs
require('fs').writeFileSync('./data.json', 'false');

// Is calc.isValueTrue() true or false here? It prints false: the nested
// require only runs now, at call time, after data.json has been rewritten.
console.log(calc.isValueTrue());

As a general solution, this is not feasible. While the vast majority of modules will likely not have nested require statements, HTTP calls, internal service calls, global variables, and the like, there are those that do. You cannot guarantee that this would work.

Now, as your local implementation -- e.g. if you require exact, specific versions of npm modules that you have tested well with this technique and know will work, or that you published yourself -- it is somewhat feasible.

However, even in this case, if this is a team effort, there's bound to be a developer down the line who will not know where or how your dependency is used, who will use globals (not on purpose, but by omission, e.g. by getting `this` wrong), or who simply won't know the implications of what they are doing. With a strong integration testing suite you could guard against this, but the thing is, it's unpredictable. Personally, I think that when you can choose between unpredictable and predictable, you should almost always pick predictable.

Now, if a certain library has the explicitly stated purpose of being used in MongoDB mapReduce, this would work. You would have to guard well against omissions and problems, and have strong testing infrastructure, but I would make certain the purpose is explicit before feeling safe enough to do this. And of course, if what you're doing is so complex that you need several npm packages for it, maybe you can put those functions directly on the MongoDB server, do your mapReducing in something better suited for the purpose, or similar.
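
For example, one way to put such functions "directly on the MongoDB server" is stored JavaScript in the system.js collection -- if I recall correctly, functions stored there are visible to mapReduce and $where. A rough sketch from the mongo shell (the function body is hypothetical, and the usual warnings against storing application logic in the database apply):

// save a helper on the server; its _id becomes the global function name
db.system.js.save({
  _id: 'overallScoreForUser',
  value: function (userDoc) {
    // hypothetical stand-in for the real calculation
    return (userDoc.wins || 0) * 2 - (userDoc.losses || 0);
  }
});

// a map function could then call it without passing anything in `scope`:
function myMapFn() {
  emit({ Key: this.userGroup }, { UserCount: 1, Score: overallScoreForUser(this) });
}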

To conclude: for a purpose-built library with the explicit mission statement that it is to be used with Node and MongoDB mapReduce, I would make sure my tests cover all mission-critical functionality, and then import such an npm package. Otherwise I would not use nor recommend this approach.


3 Comments

  • Thanks for all the information here (I was away for the weekend, hence the late response). You haven't addressed the concept of using something like browserify or webpack -- i.e. tools which compile a bunch of JS files, including required dependencies, into a single file. Do you have any thoughts on that, as relates to your answer? Thanks!
  • The problem comes down to the same thing. Webpack doesn't, and should not, actually call your methods. It basically says "if the code requires 'some-module', I will load this file, or serve the already-loaded copy". So it cannot resolve these nested requires for you. What you could in theory do is write something like a babel/webpack plugin that walks the AST and, whenever it finds a require or import node, executes it right away and inlines its content instead. Maybe such a thing already exists. Writing such a plugin would be a super interesting task. Being accountable for it when it goes into production? Nope.
  • That very last point -- being accountable for it in production code -- is a very good one. Going to mark this as correct; you've convinced me to rethink this idea.
