
I recently read about Node's worker_threads module, which allows parallel execution of JavaScript code in multiple threads and is useful for CPU-intensive operations. (Note: these are not the Web Workers provided by browsers.)

I'm building a feature where I need to do a massive number of Postgres INSERTs without blocking the rest of the application.

The problem is that in the JavaScript files where I instantiate the worker, I can't import anything: neither native Node modules nor NPM libraries like Knex.js, which I need in order to run database queries. As soon as the file is executed, I get the error: Cannot use import statement outside a module.

I've tried putting the worker code in a separate file with an import statement at the top (same error). I've also tried passing the Knex object via workerData, but it can't clone a non-native JS object.

I'm out of ideas. Does anyone know how to interact with a database from a worker thread if we can't import any NPM libraries?

// mainThread.js

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

import knex from 'knex'; // --> *** UNCAUGHT EXCEPTION: Cannot use import statement outside a module ***

if (isMainThread) {
  module.exports = async function runWorker (rowsToInsert = []) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { rowsToInsert } });

      worker.on('message', (returningRows) => resolve(returningRows));
      worker.on('error', reject);
      worker.on('exit', (code) => {
        if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
      });
    });
  };
} else {
  const { rowsToInsert } = workerData;

  return knex('table').insert(rowsToInsert)
    .then((returningRows) => {
      parentPort.postMessage({ data: returningRows });
    });
}

I am following a tutorial from this webpage: https://blog.logrocket.com/use-cases-for-node-workers/

5 Comments
  • Yes, you can use worker threads for that, but it is probably not required or useful. The database is, by itself, its own process, so all the Node.js process is doing is sending it network requests in a non-blocking, asynchronous fashion. Node.js is unlikely to be doing anything CPU-intensive that would benefit from worker threads. Commented Jul 15, 2020 at 23:08
  • FYI, when writing worker-thread code, I find it 200% cleaner, clearer, and easier to debug if you put your worker-thread code in its own file and do not mix main-thread code with worker-thread code in the same file. You should not need if (isMainThread) {} when doing it that way. Plus, the whole intellectual understanding of what code runs where is a lot simpler to follow when you separate them cleanly into two files (see the sketch after these comments). Commented Jul 15, 2020 at 23:10
  • Thanks for this explanation! Makes sense to me! @jfriend00 Commented Jul 16, 2020 at 16:33
  • @jfriend00 By the way, if you have any other insight: what other tools are at my disposal when trying to do a high-volume insert using Node (e.g. up to 1 million rows per ingestion)? Commented Jul 16, 2020 at 16:57
  • Node.js is not your challenge there; the challenge will be doing 1 million inserts into a database, and the database will be the bottleneck, not Node.js. For further help, you'd probably want to ask a new question about the specific database you're using and what exactly you're trying to insert, and then someone who really knows that database can talk about the most efficient ways to get all that data in. Again, it's not really a Node.js issue, but a question of how to use the database optimally. Commented Jul 16, 2020 at 21:50
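
A minimal sketch of the two-file layout @jfriend00 describes (the file names, the Postgres connection settings, and the table name are assumptions, not part of the original post):

// main.js -- main-thread code only
const { Worker } = require('worker_threads');

module.exports = function runWorker (rowsToInsert = []) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`${__dirname}/worker.js`, { workerData: { rowsToInsert } });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
};

// worker.js -- worker-thread code only, so no isMainThread check is needed
const { parentPort, workerData } = require('worker_threads');
const knex = require('knex')({ client: 'pg', connection: process.env.DATABASE_URL });

knex('table').insert(workerData.rowsToInsert)
  .then((returningRows) => parentPort.postMessage({ data: returningRows }))
  .finally(() => knex.destroy());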

2 Answers


It is of course possible, but it's a very bad idea.

Database drivers are already asynchronous and non-blocking of the JavaScript thread. Moving your insert calls to a separate thread as you propose will not only get you no performance gains, it will actually decrease overall performance because of the overhead involved with interthread communication:

  • Synchronization and message passing is not free
  • JavaScript uses structured cloning when moving data between threads. This means all your rowsToInsert must be copied, which is (relatively) expensive.
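
That structured-clone step is also why passing the Knex object through workerData fails: values containing functions cannot be cloned. A tiny sketch (the worker file path is hypothetical):

const { Worker } = require('worker_threads');

// Plain data (objects, arrays, strings, numbers) is copied across fine...
new Worker('./worker.js', { workerData: { rows: [{ id: 1 }] } });

// ...but anything holding functions, like a Knex instance, throws a
// DataCloneError because it cannot be structured-cloned.
new Worker('./worker.js', { workerData: { db: () => {} } });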

Generally, the only time it's really appropriate to use JS threads is when your JavaScript code is performing CPU-intensive work. The node docs say as much right at the top:

Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be.

This means if you're doing a lot of parsing, math, or similar, it may be appropriate to do the work in a thread. However, simply shoveling data from one place to another (i.e., I/O) is not a good candidate for a thread; after all, node's design is tuned to be efficient at exactly this kind of work.

You don't say where your rowsToInsert come from, but if they're coming in from HTTP request(s), a thread is the wrong thing to use. However, if you're parsing, say, a CSV or JSON file on the server, it may be worthwhile to do that in a thread, but it's important that the thread does all the work (so memory need not be moved between threads). The message you post to the worker should just be "process the file located at /foo/bar.csv", and the worker thread does the rest, as in the sketch below.
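
A sketch of that pattern, assuming the rows come from a CSV file and a knex connection configured via an environment variable (the file name, table name, and the deliberately naive parsing are all placeholders):

// csv-worker.js -- the worker does all the heavy lifting; only a short
// file path crosses the thread boundary.
// Started with: new Worker('./csv-worker.js', { workerData: { filePath: '/foo/bar.csv' } })
const { parentPort, workerData } = require('worker_threads');
const fs = require('fs');
const knex = require('knex')({ client: 'pg', connection: process.env.DATABASE_URL });

const text = fs.readFileSync(workerData.filePath, 'utf8');

// Naive CSV parsing for illustration only -- use a real parser in practice.
const [header, ...lines] = text.trim().split('\n');
const columns = header.split(',');
const rows = lines.map((line) => {
  const values = line.split(',');
  return Object.fromEntries(columns.map((col, i) => [col, values[i]]));
});

knex('table').insert(rows)
  .then((returningRows) => parentPort.postMessage({ inserted: rows.length, data: returningRows }))
  .finally(() => knex.destroy());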


The error you're getting is the same that you'd get without worker threads: you're trying to use import in a regular, non-module JS file. Either rename the worker file to *.mjs or use require('knex') instead.

node's ES module documentation goes into detail about how import differs from require.
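
For reference, the *.mjs route looks like this (a sketch; worker_threads exposes the same exports under ESM, and top-level await requires Node 14.8+):

// worker.mjs -- the .mjs extension makes this an ES module, so import is allowed
import { parentPort, workerData } from 'worker_threads';
import knexFactory from 'knex';

const knex = knexFactory({ client: 'pg', connection: process.env.DATABASE_URL });

const returningRows = await knex('table').insert(workerData.rowsToInsert);
parentPort.postMessage({ data: returningRows });
await knex.destroy();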


Comments

Thanks for this amazing answer! It's super clear and your tip to use require worked. So if I wanted to do a high-volume insert, what tools are available? Should I use Node clusters? I'm googling "high volume inserts node" or "node cluster postgres inserts" but not getting a clear answer.
It depends on where your data comes from. Where does it come from?
The data comes from parsing multiple CSV files in Node. I iterate over the data and create an array of JS objects for insertion. Currently I'm chunking the inserts into 10K-row chunks and running the insert Promises against Postgres one after another (as opposed to concurrently with Promise.all).
@BruceWang: If your input data is already well-formed, your best bet might be COPY, which natively imports CSV into a table; pg-copy-streams will help (see the sketch after these comments). However, if you're doing a lot of parsing and transformation in Node, then it might make sense to do the job in a worker thread or process, but I'd profile to find the real bottleneck(s) first. Also look into prepared statements.
…and generally, read the pg docs' performance tips.
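
A sketch of that COPY approach with pg and pg-copy-streams (the table name, file path, and CSV options are assumptions):

const fs = require('fs');
const { pipeline } = require('stream/promises');
const { Pool } = require('pg');
const { from: copyFrom } = require('pg-copy-streams');

async function copyCsvIntoTable (filePath) {
  const pool = new Pool(); // connection settings come from the PG* env vars
  const client = await pool.connect();
  try {
    // Stream the file straight into Postgres's native CSV importer
    const ingest = client.query(copyFrom('COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true)'));
    await pipeline(fs.createReadStream(filePath), ingest);
  } finally {
    client.release();
    await pool.end();
  }
}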

Cannot use import statement outside a module

This is just a complaint that your code is using the "import" style of imports, but "import" is only supported inside ES modules and your code is in a CommonJS context. In CommonJS code you need to use require() instead:

const knex = require('knex');

