I was testing streams with Node and set up a program that reads a large file and writes it back out using streams. The problem is that when the program runs, Node's memory usage climbs to 1.3 GB, which is exactly the size of the file being read. It is as if it doesn't stream the file at all: either it buffers everything and writes it in one go, or the garbage collector never frees the chunk variables. This is the program:

const {  createReadStream, createWriteStream } = require('fs');

const readStream = createReadStream('../movie.mp4', {
    highWaterMark: 10000
});
const writeStream = createWriteStream('./copy.mp4', {
    highWaterMark: 10000
});

readStream.on('data', function (chunk) {
    writeStream.write(chunk);
})

readStream.on('end', function () {
    console.log("reading done");
    writeStream.end();
});

writeStream.on('close', function () {
    console.log("Writing done.");
})

And the weird thing is that if I pipe these streams instead, it works as expected and memory usage stays below 20 MB. Like this:

const {  createReadStream, createWriteStream } = require('fs');

const readStream = createReadStream('../movie.mp4', {
    highWaterMark: 10000
});
const writeStream = createWriteStream('./copy.mp4', {
    highWaterMark: 10000
});

readStream.pipe(writeStream);

What could cause such behavior?

Node version: v14.15.4

1 Answer


Well, I found the problem. There is a condition called backpressure. In my case it happens because the read stream is much faster than the write stream: the readStream keeps buffering the data it has read in memory until the writeStream has written it out. So the solution is to pause the readStream temporarily until the writeStream finishes writing, and then feed it more chunks of data. This is the corrected program:

const {  createReadStream, createWriteStream } = require('fs');

const readStream = createReadStream('../movie.mp4', {
    highWaterMark: 10000
});

const writeStream = createWriteStream('./copy.mp4', {
    highWaterMark: 10000
});


readStream.on('data', function (chunk) {
    // According to the docs, write() returns false if the stream wishes
    // for the calling code to wait for the 'drain' event to be emitted
    // before continuing to write additional data; otherwise true.
    const result = writeStream.write(chunk);

    if(!result) {
        console.log("BACKPRESSURE");
        readStream.pause();
    }
});

writeStream.on('drain', () => {
    console.log("DRAINED");
    readStream.resume();
});

readStream.on('end', function () {
    console.log("reading done");
    writeStream.end();
});

writeStream.on('close', function () {
    console.log("Writing done.");
})

The docs on the 'drain' event are here.

1 Comment

Very nice. I was just reading about backpressure in the official docs: nodejs.org/es/docs/guides/backpressuring-in-streams. Your problem/answer fits my searches precisely.
