To be brief: I have written a file uploader that uses the HTML5 FileReader API and XHR POSTs to upload user-selected files to a server. The client-side code has a few essential tasks, including reading values from the selected files' headers (these are DICOM image files) and displaying them before the files are sent, updating a progress bar, and so on. There are some further features as well, such as zipping the files when that will speed things up.
Fairly quickly, I noticed that large files ate up a ton of memory (this is Chrome-specific). Given a large enough data set, Chrome "Aw, Snap!"s and crashes entirely. I've implemented countless fixes: exhaustive searches for memory leaks, delaying the reading and sending of files using callbacks and a small queue, reading only n-byte chunks of each file at a time, and so on. As you can imagine, this has led to some pretty hefty client-side JavaScript (CoffeeScript, actually). In the following fiddle, a coworker and I have pared it down to the bare essentials: reading all selected files in chunks and setting a variable to that binary data (sparing everyone the code that parses the headers, zips when necessary, and sends each chunk).
https://jsfiddle.net/3nails4you/gsqzrk9g/8/, or see below:
HTML:
<input id="file" type="file" onchange="slice()" multiple="" />
JavaScript:
function slice() {
    var filesArr = document.getElementById('file').files;
    var index;
    for (index = 0; index < filesArr.length; index++) {
        readFile(filesArr[index]);
    }
}
function readFile(file) {
    var fr = new FileReader(),
        chunkSize = 2097152,
        chunks = Math.ceil(file.size / chunkSize),
        chunk = 0;

    function loadNext() {
        var start, end, blob;
        start = chunk * chunkSize;
        end = start + chunkSize >= file.size ? file.size : start + chunkSize;

        fr.onload = function (e) {
            // get file content
            var filestream = e.target.result;
            if (++chunk < chunks) {
                console.info(chunk);
                loadNext();
            }
        };

        blob = file.slice(start, end);
        fr.readAsBinaryString(blob);
    }

    loadNext();
}
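For context, the "small queue" mentioned above is not shown in the fiddle; a minimal sketch of the idea (reading files one at a time, rather than kicking off every read at once inside the for loop) might look like this. All names here are my own, for illustration:

```javascript
// A minimal sketch of processing files one at a time instead of
// starting every read simultaneously (names are illustrative).
function processQueue(files, readOne, done) {
    var index = 0;
    function next() {
        if (index >= files.length) {
            done();
            return;
        }
        // readOne must invoke its callback when the file is finished.
        readOne(files[index++], next);
    }
    next();
}
```

The slice() handler would then call something like processQueue(filesArr, readFile, onAllDone), with a readFile that accepts a completion callback.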
I have tried different read methods (readAsArrayBuffer, readAsDataURL), several different scoping structures (e.g. declaring a single FileReader and reusing it), and many different chunk sizes for optimization. When I select a particular data set that is ~1 GB across 16 files, memory usage looks like this:
[EDIT] I'm not able to post images yet, so I'll describe instead: in the Windows Task Manager, the Chrome process is using about 625,000 K of memory.
Notably, if I wait for the reading to finish (the console log stops outputting), the memory usage becomes static. If, at that point, I open the JavaScript console, the memory usage drops back to what it was before the file reading began. My suspicion is that opening the console triggers Chrome's garbage collector, or something along those lines, but I'm uncertain.
I've found other questions about somewhat similar issues, but all of them are answered with the assumption that the client does not actually need to use the binary data of the file. I absolutely do - any suggestions? Is this simply a bug to report on the Chromium projects? Is there a glaring error in my code that I've simply missed? I usually tend to suspect the latter, but the "opening the console clears the memory" point continues to irk me - if there was a memory leak, would that really be the case? Thanks for reading, I appreciate any suggestions!
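For completeness, the readAsArrayBuffer variant mentioned above can be sketched like this; chunkRanges, readFileAsBuffers, and onChunk are names I've made up for illustration. An ArrayBuffer result avoids materializing each chunk as a large JavaScript string the way readAsBinaryString does:

```javascript
// Pure helper: compute the [start, end) byte ranges for a given file size.
function chunkRanges(size, chunkSize) {
    var ranges = [], start;
    for (start = 0; start < size; start += chunkSize) {
        ranges.push([start, Math.min(start + chunkSize, size)]);
    }
    return ranges;
}

// Chunked reader that hands each chunk to onChunk as an ArrayBuffer.
function readFileAsBuffers(file, onChunk) {
    var fr = new FileReader(),
        ranges = chunkRanges(file.size, 2097152),
        i = 0;
    function loadNext() {
        fr.onload = function (e) {
            onChunk(e.target.result); // an ArrayBuffer, not a string
            if (++i < ranges.length) {
                loadNext();
            }
        };
        fr.readAsArrayBuffer(file.slice(ranges[i][0], ranges[i][1]));
    }
    if (ranges.length > 0) {
        loadNext();
    }
}
```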
Try FileReaderSync() inside a Web Worker, which might make things better; and even if it doesn't, you can always call self.close() from inside the worker (or terminate() on it from the main thread) to stop the program and recover the RAM. You can also pass the files to the worker(s) as Blobs, so you should be able to do all the same things as before.
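To make the worker suggestion concrete, here is a rough sketch. The file name reader-worker.js, the function names, and the 2 MB chunk size are my assumptions, not tested code:

```javascript
// reader-worker.js -- runs inside the worker, where FileReaderSync exists.
// In the actual worker file you would wire this up with:
//   self.onmessage = workerMain;
function workerMain(e) {
    var reader = new FileReaderSync(),
        file = e.data, // the File arrives as a Blob via structured clone
        chunkSize = 2097152,
        start, buffer;
    for (start = 0; start < file.size; start += chunkSize) {
        buffer = reader.readAsArrayBuffer(file.slice(start, start + chunkSize));
        // Transfer the ArrayBuffer to the main thread instead of copying it.
        self.postMessage(buffer, [buffer]);
    }
    // Closing the worker from inside releases all of its memory.
    self.close();
}

// Main thread: hand the File straight to the worker.
function uploadViaWorker(file) {
    var worker = new Worker('reader-worker.js');
    worker.onmessage = function (e) {
        // e.data is one ArrayBuffer chunk; parse headers / send it here.
    };
    worker.postMessage(file);
}
```

Because the worker's entire heap goes away when it closes, this sidesteps the "memory only drops when I open the console" behavior described in the question.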