43

I am trying to find a solution to stream files to Amazon S3 from a Node.js server, with these requirements:

  • Don't store a temp file on the server or the complete file in memory; buffering up to some limit (but not the whole file) is acceptable during upload.
  • No restriction on uploaded file size.
  • Don't block the server until the complete file is uploaded, because with a heavy file upload the waiting time of other requests would increase unexpectedly.

I don't want to use direct file upload from the browser, because in that case the S3 credentials would need to be shared with the client. Another reason to upload the file from the Node.js server is that some authentication may also need to be applied before uploading.

I tried to achieve this using node-multiparty, but it was not working as expected. You can see my solution and the issue at https://github.com/andrewrk/node-multiparty/issues/49. It works fine for small files but fails for a file of size 15 MB.

Any solution or alternative?

6 Answers

46

You can now use streaming with the official Amazon SDK for Node.js; see the section "Uploading a File to an Amazon S3 Bucket" in its documentation, or see their example on GitHub.

What's even more awesome, you can finally do so without knowing the file size in advance. Simply pass the stream as the Body:

var AWS = require('aws-sdk');
var fs = require('fs');
var zlib = require('zlib');

// Stream the file through gzip; the total size is not known in advance.
var body = fs.createReadStream('bigfile').pipe(zlib.createGzip());
var s3obj = new AWS.S3({params: {Bucket: 'myBucket', Key: 'myKey'}});
s3obj.upload({Body: body})
  .on('httpUploadProgress', function(evt) { console.log(evt); })
  .send(function(err, data) { console.log(err, data); });

6 Comments

This isn't working with my output stream from a yazl zip object?
Brilliant! You can also pipe Buffers to zlib.createGzip() by transforming them into a Stream: const { Duplex } = require('stream');
Does anyone know how this works? If each part is a fixed size, how do they fill in the last part if it doesn't exactly match the full size?
Can you update the link Johann? It appears to have changed.
@anon58192932 thanks for catching that, the link is now updated!
8

For your information, the v3 SDK was published with a dedicated module to handle this use case: https://www.npmjs.com/package/@aws-sdk/lib-storage

Took me a while to find it.
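
For reference, a minimal sketch of how this module is typically used (the bucket, key, region, and file name below are placeholders, not values from this thread):

const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');
const fs = require('fs');

async function uploadStream() {
  const upload = new Upload({
    client: new S3Client({ region: 'us-east-1' }), // placeholder region
    params: {
      Bucket: 'myBucket',
      Key: 'myKey',
      Body: fs.createReadStream('bigfile'), // any readable stream works; size not needed
    },
    queueSize: 4,              // number of parts uploaded concurrently
    partSize: 5 * 1024 * 1024, // 5 MB, the S3 minimum part size
  });

  upload.on('httpUploadProgress', (progress) => console.log(progress));

  return upload.done();
}

uploadStream().then(console.log).catch(console.error);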

1 Comment

I ran into issues with this where the stream passed in is transformed into a GeoJSON feature collection.
2

Give https://www.npmjs.org/package/streaming-s3 a try.

I used it for uploading several big files in parallel (> 500 MB), and it worked very well. It is very configurable and also allows you to track upload statistics. You don't need to know the total size of the object, and nothing is written to disk.


1

If it helps anyone, I was able to stream from the client to S3 successfully (without memory or disk storage):

https://gist.github.com/mattlockyer/532291b6194f6d9ca40cb82564db9d2a

The server endpoint assumes req is a stream object. I sent a File object from the client, which modern browsers can send as binary data, with the file info set in the headers.

const fileUploadStream = (req, res) => {
  //get "body" args from header
  const { id, fn } = JSON.parse(req.get('body'));
  const Key = id + '/' + fn; //upload to s3 folder "id" with filename === fn
  const params = {
    Key,
    Bucket: bucketName, //set somewhere
    Body: req, //req is a stream
  };
  s3.upload(params, (err, data) => {
    if (err) {
      res.send('Error Uploading Data: ' + JSON.stringify(err) + '\n' + JSON.stringify(err.stack));
    } else {
      res.send(Key);
    }
  });
};

Yes, putting the file info in the headers breaks convention, but if you look at the gist it's much cleaner than anything else I found using streaming libraries, multer, busboy, etc.
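
For illustration, a hypothetical client-side counterpart could look like the sketch below; the endpoint path and the "body" header layout are assumptions inferred from the server code above, not taken from the gist:

// Hypothetical client-side upload (endpoint path is an assumption).
async function uploadToServer(file, id) {
  const res = await fetch('/upload', {
    method: 'POST',
    // File info goes in a custom "body" header, matching req.get('body') on the server.
    headers: { body: JSON.stringify({ id, fn: file.name }) },
    body: file, // browsers send a File/Blob as the binary request body
  });
  return res.text(); // the endpoint responds with the S3 Key on success
}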

+1 for pragmatism and thanks to @SalehenRahman for his help.


0

I'm using the s3-upload-stream module in a working project here.
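
For anyone who wants a starting point, here is a minimal sketch of how s3-upload-stream is typically wired up (the bucket, key, and file name are placeholders):

var AWS = require('aws-sdk');
var fs = require('fs');
var s3Stream = require('s3-upload-stream')(new AWS.S3());

// upload() returns a writable stream that performs a multipart upload to S3.
var upload = s3Stream.upload({ Bucket: 'myBucket', Key: 'myKey' });

upload.on('error', function (err) { console.log(err); });
upload.on('part', function (details) { console.log(details); });     // progress per uploaded part
upload.on('uploaded', function (details) { console.log(details); }); // multipart upload complete

fs.createReadStream('bigfile').pipe(upload);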

There are also some good examples from @raynos in his http-framework repository.


0

Alternatively, you can look at https://github.com/minio/minio-js. It has a minimal set of abstracted APIs implementing the most commonly used S3 calls.

Here is an example of a streaming upload.

$ npm install minio
$ cat >> put-object.js << EOF

var Minio = require('minio')
var fs = require('fs')

// find out your s3 end point here:
// http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

var s3Client = new Minio({
  url: 'https://<your-s3-endpoint>',
  accessKey: 'YOUR-ACCESSKEYID',
  secretKey: 'YOUR-SECRETACCESSKEY'
})

// Read the local file as a stream and pass its size along with the stream.
var file = 'your_localfile.zip'
var fileStream = fs.createReadStream(file)

fs.stat(file, function(e, stat) {
  if (e) {
    return console.log(e)
  }
  s3Client.putObject('mybucket', 'hello/remote_file.zip', 'application/octet-stream', stat.size, fileStream, function(e) {
    return console.log(e) // should be null
  })
})
EOF

putObject() here is a fully managed single function call; for file sizes over 5 MB it automatically does a multipart upload internally. You can resume a failed upload as well, and it will start from where it left off by verifying the previously uploaded parts.

Additionally, this library is isomorphic and can be used in browsers as well.

2 Comments

Can this library stream-upload a file from an uploading user, instead of me having to buffer it on my server first (whether in memory or on disk)?
It takes an input stream, which can be a file stream or any other stream. It will upload to the server automatically until the stream closes.
