24

I'd like to initialize module in asynchronous way and come up with couple of ideas. I need DB object with list of collections from Mongo and other data, but list of files in ./ will do for brevity.

I can't export function or class because I need require('db') to return same object everytime.


First and simplest what came to my mind is to assign module.exports to Object and populate it later:

var exports = {};
module.exports = exports;

require('fs').readdir('.', function(err, files) {
  exports.error = err;
  exports.files = files;
});

Bad thing — I don't really know from outside when list is ready and no good way to check for errors.


Second way I've comed up with is to inherit EventEmitter and notify everyone that DB is ready or error occured. If everything ok - keep going.

var events = require('events');
var util = require('util');

function Db() {
  events.EventEmitter.call(this);
  this.ready = false;
  this.files = null;
  this.initialize();
}

util.inherits(Db, events.EventEmitter);

Db.prototype.initialize = function() {
  if (this.ready)
    return this.emit('ready');

  var self = this;
  require('fs').readdir('.', function(err, files) {
    if (err)
      return self.emit('error', err);

    self.files = files;
    self.ready = true;
    self.emit('ready');
  });
};

module.exports = new Db();

And now I think that's more reasonable:

// db.js
var exports = {init: init};
module.exports = exports;

function init(callback) {
  callback = (typeof callback === 'function') ? callback : function() {};
  require('fs').readdir('.', function(err, files) {
    delete exports.init;
    exports.result = files; // that's pretty much what I need,
                            // so don't mind result slightly differs
                            // from previous cases
    callback(err);
  });
}
// main.js
var db = require('./db');

// check for `db.init` presence maybe...

db.init(function(err) {
  return err ? console.error('Bad!')
             : console.log(db); // It works!
});

What should I pick and why? How bad is such idea in general and my options in particular?

Thanks for feedback.

3 Answers 3

27
+50

TL;DR: Use readdirSync() instead of readdir() if you're just planning to read local files at startup time. If you're planning to actually read data from remote database or do any I/O at runtime, use your option #2 - the callback. Explanation and code examples below.

Detailed explanation:

While at first this might seem like a module/dependecy/require-related question, it's really not. It's a generic question of how to handle asynchronous code. Let me explain:

require() is basically the only synchronous function widely used throughout node that deals with I/O (it requires other modules from filesystem). Synchronous means it actually returns it's data as return value, instead of calling a callback.

The most basic 101 rule in asynchronous programming is:

You can never take an asynchronous piece of code and create a synchronous API for it.

require uses a special synchronous version of readFile called readFileSync. Since modules are really only loaded at the start of the program, the fact that it blocks the node.js execution while it's reading the module is not a problem.

In your example however, you try to perform additional asynchronous I/O - readdir() done during the require stage. Thus, you either need to use synchronous version of this command or the API needs to change...

So there's the background to your problem.

You identified the two basic options:

  1. using a promise (which is essentially the same as your EventEmitter example)
  2. using a callback (your second example shows this well) and a third is:
  3. using a synchronous version of the readdir() command called readdirSync()

I would use the option #3 for simplicity reason - but only if you're planning to just read a couple files at startup time as your example implies. If later your DB module is actually going to connect to a database - or if you're planning to do any of this at runtime, jump the boat now and go with async API.

Not many people remember this anymore, but promises were actually the original default of how to handle async in node.js. In node 0.1.30 however promisses were removed and replaced by a standardized callback with the function(err, result) signature. This was done largely for simplicity reasons.

These days, vast majority of your async calls takes this standard callback as the last parameter. Your database driver does it, your web framework does it - it's everywhere. You should stay with the prevalent design and use it too.

The only reason to prefer promises or events is if you have multiple different results that can happen. For example a socket can be opened, receive data, be closed, flushed etc.

This is not your case. Your module always does the same (reads some files). So option #2 it is (unless you can stay synchronous).

Finally, here are the two winning options rewritten slightly:

Synchronous option:
good just for local filesystem at startup time

// db.js
var fs = require('fs');
exports = fs.readdirSync('.');

// main.js
var db = require('./db');
// insert rest of your main.js code here

Asynchronous option:
for when you want to use DBs etc.

// db.js
var fs = require('fs'), cached_files;

exports.init = function(callback) {
  if (cached_files) {
    callback(null, cached_files);
  } else {
    fs.readdir('.', function(err, files) {
      if (!err) {
        cached_files = files;
      }
      callback(err, files);
    });
  }
};

// main.js
require('./db').init(function(err, files) {
  // insert rest of your main.js code here
});
Sign up to request clarification or add additional context in comments.

1 Comment

Promises don't help you with "multiple different results". Promises help you synchronize both errors and proper results when you have multiple asynchronous results that something else needs to use the results of. So I would edit the promises part out of that quote, because its not correct
5

In general it's very bad idea to have any state in module. Modules should expose functions, not data (yes, this requires to change your code structure a bit). Just pass references to your data to a modules functions as parameters.

(edit: just realised that this is approach from your last example. My vote for it)

module1:

module.exports = function(params, callback) { ... }

module2:

var createSomething = require('module1');
module.exports = function(params, callback) { 
   ...
   var db = createSomething(params, function(err, res) {
       ...
       callback(err, res);
   }
}

main code:

var createSomethingOther = require('module2');
createSomethingOther(err, result) {
    // do stuff
}

Comments

1

On my side such module is a function that takes callback (and if internally configured with promises also returns promise (see https://github.com/medikoo/deferred));

The only problem with callback is that by convention it always should be invoked in nextTick, so even when you call your module function when all the data is gathered you should still call your callback in next tick with result set.

8 Comments

Thanks. So, lazy initialization? BTW, why nextTick, what convention?
@elmigranto your module should resemble convention of typical asynchronous function in Node.js, so it should be function, that takes callback as last argument. By convention callback should always be called in next tick (never immediately).
I've got your point, what I'm asking is why callback should always be nextTicked even if it can be invoked immediately. Also, what is “always” exactly? Is it “ultimately always” like in function stuff(callback){ async.parallel([/*…*/], function done(err){/*process.nextTick(callback); here to?*/}); }. If that's not just your own POV, good question could be born.
It's rule of the API, on which programmers often depend writing their code. e.g. they run asynchronous function, then they execute the code which they expect to be run before callback is executed. Executing callback immediately can lead to unexpected errors
I think his point is more than just a personal POV. It breaks out of the current tick and defers it to the next tick - essentially treating the operation as asynchronous even though it could have been returned immediately. This oftentimes has implications on race-conditions (perhaps the developer using your module hooked into your events after initializing, expecting it to be processed on next tick)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.