Upload a file directly into S3 using Python

Question

I want to download a file from an HTTP URL directly into an Amazon S3 bucket instead of onto the local system. I run Python on 64-bit Windows.

I tried passing the Amazon S3 bucket URL as the second argument of Python's urlretrieve function when downloading the file:

urllib.request.urlretrieve(url, amazon s3 bucket url)

I expected it to upload the file directly to S3. However, it fails with a FileNotFoundError, which, after some thought, makes sense: urlretrieve treats its second argument as a local file path.

John Rotenstein · Accepted Answer · 2019-04-03 11:32:13Z


It appears that you want to run a command on a Windows computer (either local or running on Amazon EC2) that will copy the contents of a page identified by a URL directly onto Amazon S3.

This is not possible. There is no API call for Amazon S3 that retrieves content from a different location.

You will need to download the file from the Internet and then upload it to Amazon S3. The code would look something like:

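import os
import tempfile
import urllib.request

import boto3

# Download to the OS temp directory ('/tmp' usually does not exist on Windows)
local_path = os.path.join(tempfile.gettempdir(), 'hello.txt')
urllib.request.urlretrieve('http://example.com/hello.txt', local_path)

# Upload the local file to the bucket under the key 'hello.txt'
s3 = boto3.client('s3')
s3.upload_file(local_path, 'mybucket', 'hello.txt')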
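If the goal is only to avoid writing a temporary file to the local disk, one possible variation is to stream the HTTP response straight into boto3's `upload_fileobj`. This is a sketch, not a way around the limitation above: the bytes still pass through the machine running the script, because S3 cannot fetch the URL itself. `copy_url_to_s3` is a hypothetical helper name, and the URL, bucket and key are placeholders. The client and opener can be injected, which makes the function easy to test without network access.

```python
import urllib.request


def copy_url_to_s3(url, bucket, key, s3_client=None, opener=urllib.request.urlopen):
    """Hypothetical helper: stream the body of `url` into s3://bucket/key.

    No local file is written, but the data still flows through this
    process; S3 never downloads the URL on its own.
    """
    if s3_client is None:
        import boto3  # imported lazily so a client can be injected instead
        s3_client = boto3.client('s3')
    # urlopen returns a file-like response; upload_fileobj reads it in
    # chunks and switches to a multipart upload for large bodies
    with opener(url) as response:
        s3_client.upload_fileobj(response, bucket, key)


if __name__ == '__main__':
    # Placeholder URL, bucket and key
    copy_url_to_s3('http://example.com/hello.txt', 'mybucket', 'hello.txt')
```

The trade-off versus the temp-file version is that a failed upload cannot be retried without downloading again, since the response stream is consumed only once.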




4 Comments

j raj Over a year ago

Thank you. Seeing "not possible" settles my mind a bit. :) Nevertheless, is it possible to run the same code using AWS Glue and get the file onto S3? I just want to avoid the double hop onto S3.

John Rotenstein Over a year ago

I think that Glue only processes data coming in from Amazon S3, and it would be quite a bit of overhead compared to the simple example above. Another option is to upload a "manifest file", which is simply a list of URLs. You could configure Amazon S3 event notifications to trigger an AWS Lambda function that reads the file, then runs the above code for each URL listed. So, you just upload a list of URLs and they magically load into S3 (but you'd have to write the Lambda code to do so). This would be a good approach if you commonly need to upload dozens or hundreds of files.

j raj Over a year ago

I cannot fully follow your approach. I ran the above code on AWS Glue as a Python Shell job, and it failed with a syntax error. I guess I need to run the above code on some other external machine and push the file onto S3.

John Rotenstein Over a year ago

Correct. AWS Glue transforms data as it "flows" through. It is not intended for running bits of code that do external functions.
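The manifest-file approach suggested in the comments could be sketched as a Lambda handler like the one below. Several details are assumptions, not part of the original suggestion: the manifest is a UTF-8 text file with one URL per line; downloaded files go into the same bucket under a hypothetical `downloaded/` prefix; and the S3 trigger is assumed to be limited (via a prefix filter such as `manifests/`) so that the uploaded files do not re-trigger the function. `key_for_url` and `process_manifest` are hypothetical helper names. The Lambda execution role would also need `s3:GetObject` and `s3:PutObject` on the bucket.

```python
import posixpath
import urllib.parse
import urllib.request


def key_for_url(url, prefix='downloaded/'):
    """Hypothetical naming scheme: prefix + last path segment of the URL."""
    name = posixpath.basename(urllib.parse.urlparse(url).path) or 'index.html'
    return prefix + name


def process_manifest(event, s3_client, opener=urllib.request.urlopen):
    """Copy every URL listed in each manifest object named in an S3 event."""
    copied = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        manifest_key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        body = s3_client.get_object(Bucket=bucket, Key=manifest_key)['Body'].read()
        for line in body.decode('utf-8').splitlines():
            url = line.strip()
            if not url:
                continue  # skip blank lines in the manifest
            key = key_for_url(url)
            # Stream each download straight into S3 (no /tmp file needed)
            with opener(url) as response:
                s3_client.upload_fileobj(response, bucket, key)
            copied.append(key)
    return copied


def lambda_handler(event, context):
    import boto3  # available in the AWS Lambda Python runtime
    return {'copied': process_manifest(event, boto3.client('s3'))}
```

Each URL is processed sequentially, so a very long manifest could exceed the Lambda timeout (15 minutes at most); splitting large lists into several manifests is one way to stay within it.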
