
To be brief: I have written a file uploader that uses the HTML5 FileReader API and XHR POSTs to upload user-selected files to a server. The client-side code has a few essential tasks: reading values from the selected files' headers (these are DICOM image files) and displaying them before the files are sent, updating a progress bar, and so on. It also has some extra features going on, such as zipping the files when that speeds things up.

Fairly quickly, I noticed that large files ate up a ton of memory (this is Chrome-specific). Given a large enough data set, Chrome "Aw, Snap!"s and crashes entirely. I've implemented countless fixes: exhaustive searches for memory leaks, delayed reading and sending of files using callbacks and a small queue, reading only n-byte chunks of each file at a time, and so on. As you can imagine, this has led to some pretty hefty client-side JavaScript (CoffeeScript, actually). In the following fiddle, a coworker and I have pared it down to the bare essentials: reading all selected files in chunks and setting a variable to the resulting binary data (sparing everyone the code that parses the headers, zips when necessary, and sends each chunk).

https://jsfiddle.net/3nails4you/gsqzrk9g/8/, or see below:

HTML:

<input id="file" type="file" onchange="slice()" multiple />

JavaScript:

function slice() {

    var filesArr = document.getElementById('file').files;
    for (var index = 0; index < filesArr.length; index++) {
        readFile(filesArr[index]);
    }
}

function readFile(file) {

    var fr = new FileReader(),
        chunkSize = 2097152,
        chunks = Math.ceil(file.size / chunkSize),
        chunk = 0;

    function loadNext() {
        var start, end, blob;

        start = chunk * chunkSize;
        end = start + chunkSize >= file.size ? file.size : start + chunkSize;

        fr.onload = function (e) {
            // get file content
            var filestream = e.target.result;
            if (++chunk < chunks) {
                console.info(chunk);
                loadNext();
            }
        };
        blob = file.slice(start, end);
        fr.readAsBinaryString(blob);
    }
    loadNext();
}

I have tried different methods of reading (readAsArrayBuffer, readAsDataURL), many different variable-scope structures (e.g., declaring only one FileReader and reusing it), and many different chunk sizes for optimization. When I select a particular data set that is ~1 GB, spread across 16 files, memory usage looks like this:
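One variation worth trying, sketched below under the assumption that the browser supports `Blob.prototype.arrayBuffer()` (available in current Chrome): await each chunk so only one buffer is referenced at a time, letting the previous one become collectable before the next read begins. `processChunk` is a hypothetical callback standing in for the header parsing and uploading; the 2 MiB chunk size matches the question's code.

```javascript
// Sketch: sequential chunked reads via Blob.prototype.arrayBuffer(),
// awaiting each chunk so only one buffer is alive at a time.
// processChunk is a hypothetical stand-in for header parsing/uploading.
async function readFileInChunks(file, processChunk, chunkSize = 2097152) {
    const chunks = Math.ceil(file.size / chunkSize);
    for (let i = 0; i < chunks; i++) {
        const start = i * chunkSize;
        const end = Math.min(start + chunkSize, file.size);
        // slice() creates a lightweight reference, not a copy;
        // arrayBuffer() reads just this chunk into memory
        const buffer = await file.slice(start, end).arrayBuffer();
        processChunk(buffer, i);
        // buffer goes out of scope here, so it is collectable
        // before the next chunk is read
    }
    return chunks;
}
```

Whether Chrome actually collects the dropped buffers promptly is exactly what's in question here, but this structure at least guarantees nothing in the code keeps old chunks reachable.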

[EDIT] I'm not able to post images yet, so I'll just describe it: in the Windows Task Manager, the Chrome process is using about 625,000 K of memory.

Notably, if I wait for the reading to finish (the console log stops outputting), the memory usage becomes static. If, at that point, I open the JavaScript console, memory usage drops to what it was before the file reading began. My suspicion is that opening the console triggers Chrome's garbage collection, or something along those lines, but I'm uncertain.

I've found other questions about somewhat similar issues, but all of them are answered under the assumption that the client does not actually need the binary data of the file. I absolutely do - any suggestions? Is this simply a bug to report to the Chromium project? Is there a glaring error in my code that I've missed? I usually suspect the latter, but the "opening the console clears the memory" observation continues to irk me: if there were a memory leak, would that really be the case? Thanks for reading - I appreciate any suggestions!

  • You could try processing one file at a time instead of loading them all at once. You could also try Workers to create a disposable runtime. Commented Mar 27, 2015 at 17:49
  • I've done one file at a time - the problem still persists. I've even set timeouts that allow for a 60 second pause between reading each file - the memory would just jump up every 60 seconds and never decrease (just as in the example, but obviously, remarkably slower). The workers are an interesting idea - I've never worked with them before, I'm going to go take a look into that! Thanks! Commented Mar 27, 2015 at 18:03
  • Workers have FileReaderSync(), which might make this better, and a worker can always call self.close() (or the page can call worker.terminate()) to stop the program and recover the RAM. You can also pass the files to the worker(s) as blobs, so you should be able to do all the same things as before. Commented Mar 27, 2015 at 18:05
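The worker idea from these comments could be sketched roughly as follows. This is an illustration, not tested against the asker's setup: the worker body is built from a Blob URL so the example is self-contained, FileReaderSync is used inside the worker (it is only available in worker scope), and `onChunk`/`onDone` are hypothetical callbacks.

```javascript
// Sketch of the commenters' Worker idea: read each file with
// FileReaderSync inside a worker, then shut the worker down so the
// browser can reclaim its entire heap.
const workerSource = `
    self.onmessage = function (e) {
        var reader = new FileReaderSync(),   // worker-scope-only API
            file = e.data.file,
            chunkSize = e.data.chunkSize;
        for (var start = 0; start < file.size; start += chunkSize) {
            var end = Math.min(start + chunkSize, file.size);
            var buffer = reader.readAsArrayBuffer(file.slice(start, end));
            // Transfer the buffer to avoid a structured-clone copy
            self.postMessage({ buffer: buffer, start: start }, [buffer]);
        }
        self.postMessage({ done: true });
        self.close();                        // free the worker's runtime
    };
`;

function readFileInWorker(file, chunkSize, onChunk, onDone) {
    var worker = new Worker(URL.createObjectURL(new Blob([workerSource])));
    worker.onmessage = function (e) {
        if (e.data.done) {
            worker.terminate();              // belt and braces with self.close()
            onDone();
        } else {
            onChunk(e.data.buffer, e.data.start);
        }
    };
    worker.postMessage({ file: file, chunkSize: chunkSize });
}
```

File objects are structured-cloneable, so posting them into the worker does not copy the file contents; only the chunk buffers ever occupy memory, and they live in a heap that is discarded wholesale on terminate.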

1 Answer


In case anyone stumbles onto this question with the same problem, I thought I'd share what we found in order to alleviate this.

I ended up purchasing a license and incorporating plupload into my CoffeeScript. It helps solve the memory issue in this way:

First, I create a new plupload object and set its event handlers (BeforeUpload, UploadProgress, etc.). Its Destroy handler calls a JavaScript function, nextUploader(), which creates another uploader object and queues up the next portion of files. After the destroy occurs, the plupload object's memory is successfully reclaimed, so the browser's memory usage stays within a reasonable range.
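The create-use-destroy cycle described above might look roughly like this against the plupload 2.x API. The endpoint URL, the `browse_button` value, the batch size, and the `remainingFiles` queue are all assumptions for illustration, not details from the answer.

```javascript
// Sketch of the create-use-destroy cycle: each uploader handles one
// batch, then destroys itself; the Destroy handler starts the next one.
// url, browse_button, and the batching are assumed, not from the answer.
var remainingFiles = [];   // hypothetical queue of native File objects

function nextUploader() {
    if (remainingFiles.length === 0) return;
    var batch = remainingFiles.splice(0, 4);   // next portion of files

    var uploader = new plupload.Uploader({
        runtimes: 'html5',
        url: '/upload',               // assumed server endpoint
        browse_button: 'file',        // required setting; unused here
        chunk_size: '2mb'
    });

    uploader.bind('Destroy', function () {
        // By the time Destroy fires, this uploader's buffers are
        // unreferenced, so the browser can reclaim them.
        nextUploader();
    });
    uploader.bind('UploadComplete', function () {
        uploader.destroy();           // triggers the Destroy handler
    });

    uploader.init();
    uploader.addFile(batch);          // plupload accepts native Files
    uploader.start();
}
```

The key design point is that no single uploader object ever references more than one batch's worth of buffers, so destroying it caps the working set regardless of how many files are queued.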

If anyone is looking to do HTML5 file reading and uploading, I highly recommend exploring plupload - it's quite easy to use, and we found that Dropbox uses it as well.

