
I have a Node.js application that basically caches data from a web service. I also have a queue which receives approximately 500 items that need to be processed as quickly as possible. By processed, I mean that each one of them represents one HTTP request to be made, with its response to be cached.

Now, the single-threaded architecture of Node is not ideal for this scenario. Ideally, I would like to spawn 5-10 "threads" to process the queue as quickly as possible. I read there is a child_process module that can fork processes, but I have never used it. Could this module help?

Can anyone suggest a solution for this problem?

  • Any asynchronous IO in Node is done using background threads. Also, Node can handle a bigger volume of data than Apache, for example, so only when you reach an absurd number of requests should you think about forking processes :) Commented Oct 15, 2013 at 15:34
  • I agree with @gustavohenke. For the most part you should be fine with just using setTimeout and setInterval in order to prevent blocking. If you must create a child process for whatever reason, look at the following: nodejs.org/api/child_process.html. Also, the latest Node.js version has a beta implementation of a cluster of Node.js processes that listen on the same ports; for more info look here: nodejs.org/api/cluster.html Commented Oct 15, 2013 at 15:40
  • Hi guys. The issue here is not concurrency; it is about speed of processing. I already use setTimeout. The thing is that I need the items processed as fast as possible, which is why I am looking into "threads". If, for instance, I have 5 threads processing the queue, it will finish way faster than what I have now. Commented Oct 15, 2013 at 18:00
  • You probably want to follow StMotorSpark's answer regarding the child_process and cluster methods. What you're describing is very doable, you just have to look at it as a multi-process model rather than a multi-threaded model. Commented Oct 15, 2013 at 19:37
  • @Thomas are you making assumptions about how fast Node will handle this sort of workload, or have you actually benchmarked and come to the conclusion that you need multithreading/-processing? What speed (in terms of requests/sec) are you aiming for? Commented Oct 15, 2013 at 19:39
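What the commenters suggest — staying in one process and just capping how many requests are in flight at once — can be sketched roughly like this. `runQueue` and `processItem` are hypothetical names, and the `setImmediate` call merely stands in for the real HTTP request and cache write:

```javascript
// Hypothetical sketch: process `items` with at most `concurrency`
// callbacks in flight at once, inside a single Node process.
var maxInFlight = 0; // only tracked to demonstrate the cap below

function runQueue(items, concurrency, processItem, done) {
  var index = 0, inFlight = 0, finished = 0;
  function next() {
    while (inFlight < concurrency && index < items.length) {
      inFlight++;
      if (inFlight > maxInFlight) maxInFlight = inFlight;
      processItem(items[index++], function () {
        inFlight--;
        finished++;
        if (finished === items.length) return done();
        next(); // a slot freed up: pull the next item from the queue
      });
    }
  }
  next();
}

var items = [];
for (var i = 0; i < 500; i++) items.push(i);

var results = [];
runQueue(items, 5, function (item, cb) {
  // stand-in for "make one HTTP request, cache the response"
  setImmediate(function () {
    results.push(item);
    cb();
  });
}, function () {
  console.log('processed ' + results.length + ' items, max in flight: ' + maxInFlight);
  // → processed 500 items, max in flight: 5
});
```

Each completion immediately pulls the next item, so the pool stays full until the queue drains — no extra processes needed as long as the work is IO-bound.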

2 Answers


child_processes are simply forks of a new Node process running the same or a different script. You can use that API to spawn system processes as well, but that's not what I will describe here.

They behave like true Node.js processes, because that's what they are.

There is one big downside:

You need to keep in mind that spawning a Node process takes a lot of time and resources, so it is usually faster to compute data within one Node process, OR to spawn worker children once and delegate work to them. As you can see in the documentation, you are able to send and receive data to and from the child process, which lets you delegate work to already-spawned children.

Child processes usually share the same stdin and stdout as the process that spawned them unless you change it. Just take a look at the documentation; it's very well documented and easy to work with.

child_process documentation

I've never made worker children, but I've made stuff like this, which you may find useful.

// parent: if we were not started with the "child" argument, act as the
// supervisor that spawns (and respawns) the child
if (process.argv.indexOf("child") === -1) {
  process.chdir(module.filename.replace(/\/[^\/]+$/, ""));
  var child;
  var spawn = function () {
    console.log("spawning child process " + new Date());
    child = require("child_process").fork(module.filename, ["child"]);
    child.on("close", function () {
      spawn(); // respawn the child whenever it exits
    });
  };
  spawn();

  process.on("exit", function () {
    child.kill(); // don't leave an orphaned child behind
  });
  return;
}

// child code begins here

var fs = require("fs");

// exit when this file changes on disk; the parent will respawn us
// with the new code
fs.watch(process.argv[1], function () {
  process.exit();
});


The child_process module will somewhat do what you want.

The only issue is that you literally spawn new processes, so there is a memory overhead to consider. Assuming you want the elegance of defining your subroutines within the same file, you can pass a JavaScript string to the node command.

So this is exactly what we will do. But first, let's create a function that accepts a JSON-compatible object and a function, and then runs that function in a new "thread":

var child_process = require('child_process');

function startThread(data, fn, callback) {
  // serialize the function and its argument into a self-invoking expression
  var fnStr = '(' + fn.toString() + ')(' + JSON.stringify(data) + ');';

  // run that expression in a fresh node process via the -e (eval) flag
  var node = child_process.spawn('node', ['-e', fnStr]);

  // collect everything the child prints on stdout and stderr
  var output = [];

  var onData = function (data) {
    output.push(data.toString('utf8').trim());
  };

  node.stdout.on('data', onData);
  node.stderr.on('data', onData);

  node.on('close', function (code) {
    callback(code, output);
  });
}

And as an example, we are going to spawn a new "thread" to generate the lyrics of the "99 bottles of beer" song:

startThread({ doFor: '99' }, function (data) {
  var str = '';
  while (data.doFor) {
    str += data.doFor + ' bottles of beer on the wall ' + data.doFor +
    ' bottles of beer. You take one out, toss it around, ';
    data.doFor--;
    str += data.doFor + ' bottles of beer on the wall\n';
  }
  console.log(str.trim());
}, function (code, outputs) {
  console.log(outputs.join(''));
});

Unfortunately, the function that runs in the other "thread" doesn't have access to variables in the parent process. Also, data is passed back only through STDOUT and STDERR, as plain text.

