
I'm building an app which, among other things, can upload files to an existing API. This API takes both file metadata and contents in a single JSON object, so I need to convert the binary contents of the files to Base64-encoded strings.

Since this is a potentially heavy operation, I moved the functionality into a web worker. The worker takes an ArrayBuffer with the binary file contents (as returned from FileReader.readAsArrayBuffer()) and returns a Base64-encoded string.

This works fine for smaller files, but for the largest files I need to support (~40 MB) it causes out-of-memory exceptions in my worker (8007000E in Internet Explorer). On rare occasions it goes through, but most of the time the worker just dies. The same happened before I moved it into the worker, except then the entire browser page crashed (in both IE and Chrome). Chrome seems to be a bit more resilient to the memory strain in workers than IE is, but I still have to make it work properly in IE (10+).

My worker:

onmessage = e => {
  const bytes = new Uint8Array(e.data);
  const l = bytes.length;
  const chars = new Array(l);
  // Fill from both ends toward the middle, halving the iteration count
  for (let i = 0, j = l - 1; i <= j; ++i, --j) {
    chars[i] = String.fromCharCode(bytes[i]);
    chars[j] = String.fromCharCode(bytes[j]);
  }
  const byteString = chars.join('');
  const base64bytes = btoa(byteString);

  try {
    // Strings aren't transferable, so this may throw;
    // fall back to a plain structured clone below
    postMessage(base64bytes, [base64bytes]);
  } catch (e) {
    postMessage(base64bytes);
  }
};

Am I making some big no-nos here? Are there any ways to reduce the memory consumption? One solution I've thought about would be to process the contents in chunks rather than the whole file, then concatenate the resulting strings and encode it on the outside. Would that be viable, or will that cause problems of its own? Are there any other magical functions I don't know about? I had a glimmer of hope with FileReader.readAsBinaryString(), but it's now removed from the standard (and not supported in IE10 anyway) so I can't use it.

(I realize this question could be relevant at Code Review too, but since my code is actually crashing, I figured SO was the correct place)

  • Not sure if it's related to a solution to your issue, but why fill chars starting from each end and finishing in the middle? Commented Mar 23, 2016 at 21:31
  • It halves the number of iterations (from 40M to 20M for a 40MB file) so it was an attempt at optimization. It increased the size it could handle before going oom a bit, but it's still not enough for the largest files. Commented Mar 24, 2016 at 9:02

1 Answer


One solution I've thought about would be to process the contents in chunks rather than the whole file, then concatenate the resulting strings and encode it on the outside. Would that be viable, or will that cause problems of its own?

This is what https://github.com/beatgammit/base64-js appears to do, processing ~16 KB at a time. Using it without transferables (as IE 10 doesn't support them), on my computer Chrome manages to encode a 190 MB ArrayBuffer (beyond that it complains about an invalid string length), and IE 11 manages 40 MB (beyond that I get an out-of-memory exception).
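As a rough sketch of that chunked style (my own illustration, not the library's actual code), one can build the intermediate binary string in ~16 KB pieces with String.fromCharCode.apply, which avoids one function call per byte while staying under the engine's argument-count limits, and then call btoa once on the joined result:

```javascript
// Sketch, not base64-js itself: convert a Uint8Array to a binary string
// in 16 KB chunks, then Base64-encode the whole string in one btoa call.
function uint8ToBase64(bytes) {
  var CHUNK = 0x4000; // 16384 bytes per fromCharCode.apply call
  var parts = [];
  for (var i = 0; i < bytes.length; i += CHUNK) {
    // apply() spreads the chunk out as individual arguments, so the
    // chunk must stay below the engine's maximum argument count
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK)));
  }
  return btoa(parts.join(''));
}

// uint8ToBase64(new Uint8Array([72, 101, 108, 108, 111])) === 'SGVsbG8='
```

This keeps peak memory lower than a 40-million-element `chars` array, though the full intermediate string still has to exist before btoa runs.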

You can see this at https://plnkr.co/edit/SShi1PE4DuMATcyqTRPx?p=preview, where the worker has the code

var exports = {};
importScripts('b64.js'); // fromByteArray is provided by b64.js

onmessage = function(e) {
  var base64Bytes = fromByteArray(new Uint8Array(e.data));
  postMessage(base64Bytes);
};

and the main thread

var worker = new Worker('worker.js');
var length = 1024 * 1024 * 40;
worker.postMessage(new ArrayBuffer(length));

worker.onmessage = function(e) {
  console.log('Received Base64 in UI thread', e.data.length, 'characters');
};

To go beyond the 40 MB limit, one approach that seems promising is to pass only a smaller slice to the worker at a time (say 1 MB), encode it, return the result, and only then pass the next slice, concatenating all the results at the end. I've managed to use this to encode larger buffers (up to 250 MB in IE 11). My suspicion is that the asynchronicity allows the garbage collector to run between invocations.

For example, at https://plnkr.co/edit/un7TXeHwYu8eBltfYAII?p=preview, with the same worker code as above, the UI thread runs:

var worker = new Worker('worker.js');
var length = 1024 * 1024 * 60;
var buffer = new ArrayBuffer(length);

// Use a slice length that is a multiple of 3, so each slice encodes to
// Base64 without '=' padding and the pieces concatenate into a valid string
var maxMessageLength = 1024 * 1023;
var i = 0;
function next() {
  var end = Math.min(i + maxMessageLength, length);
  var copy = buffer.slice(i, end);
  worker.postMessage(copy);
  i = end;
}

var results = [];
worker.onmessage = function(e) {
  // Kick off the next slice only after the previous result arrives
  results.push(e.data);
  if (i < length) {
    next();
  } else {
    results = results.join('');
    alert('done ' + results.length);
  }
};

next();
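One caveat worth spelling out about encoding each slice independently (my own aside, not part of the plunker): btoa and base64-js's fromByteArray both pad their output with '=' to a multiple of four characters, so per-slice results only concatenate into valid Base64 when every slice except possibly the last is a multiple of 3 bytes long; 1024 * 1024 is not. A minimal check, using btoa directly rather than the library:

```javascript
// Sketch: per-chunk Base64 concatenates into the same string as encoding
// the whole buffer only when each chunk (except possibly the last) is a
// multiple of 3 bytes, because btoa pads shorter groups with '='.
function encodeInChunks(bytes, chunkLen) {
  var parts = [];
  for (var i = 0; i < bytes.length; i += chunkLen) {
    var chunk = bytes.subarray(i, i + chunkLen);
    var binary = '';
    for (var j = 0; j < chunk.length; j++) {
      binary += String.fromCharCode(chunk[j]);
    }
    parts.push(btoa(binary));
  }
  return parts.join('');
}

var data = new Uint8Array([1, 2, 3, 4, 5, 6, 7]);
var whole = btoa(String.fromCharCode.apply(null, data));
// 3-byte-aligned chunks reproduce the whole-buffer encoding:
//   encodeInChunks(data, 3) === whole
// 4-byte chunks do not, because '=' padding lands mid-stream:
//   encodeInChunks(data, 4) !== whole
```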