
I want to download a file from an HTTP URL directly into an Amazon S3 bucket instead of onto my local system. I run Python on 64-bit Windows.

I tried passing the Amazon S3 bucket URL as the second argument of Python's urlretrieve function:

urllib.request.urlretrieve(url, s3_bucket_url)  # s3_bucket_url holds the Amazon S3 bucket URL

I expected it to upload the file directly to S3; however, it fails with a FileNotFoundError, which, after some thought, makes sense: urlretrieve treats its second argument as a local file path.

1 Answer


It appears that you want to run a command on a Windows computer (either local or running on Amazon EC2) that will copy the contents of a page identified by a URL directly onto Amazon S3.

This is not possible. Amazon S3 has no API call that fetches content from an external URL; it only stores data that is sent to it (CopyObject can copy only from another S3 object).

You will need to download the file from the Internet and then upload it to Amazon S3. The code would look something like:

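import os
import tempfile
import urllib.request

import boto3

# Download to the OS temp folder (/tmp does not exist on a default Windows install)
local_path = os.path.join(tempfile.gettempdir(), 'hello.txt')
urllib.request.urlretrieve('http://example.com/hello.txt', local_path)

# Upload the downloaded file to the bucket 'mybucket' under the key 'hello.txt'
s3 = boto3.client('s3')
s3.upload_file(local_path, 'mybucket', 'hello.txt')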
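If the main goal is to avoid writing the file to the local disk, a minimal sketch (assuming boto3 is installed and AWS credentials are configured) can stream the HTTP response straight into boto3's upload_fileobj. The data still passes through the machine running the code, so this does not remove the hop; it only skips the temporary file. The function name copy_url_to_s3 and its optional s3_client parameter are illustrative, the latter added so a stand-in client can be passed in.

```python
import urllib.request


def copy_url_to_s3(url, bucket, key, s3_client=None):
    """Stream the body of `url` into s3://bucket/key without saving a local file.

    The bytes still travel through this machine; they are just never written to disk.
    """
    if s3_client is None:
        import boto3  # imported lazily so a fake client can be injected for testing
        s3_client = boto3.client('s3')
    with urllib.request.urlopen(url) as response:
        # upload_fileobj reads from any binary file-like object in chunks
        # and switches to a multipart upload for large bodies
        s3_client.upload_fileobj(response, bucket, key)
```

For example, copy_url_to_s3('http://example.com/hello.txt', 'mybucket', 'hello.txt') would copy the same file as the code above without creating hello.txt locally.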

4 Comments

Thank you. Seeing "not possible" settles my mind a bit. :) Nevertheless, is it possible to run the same code using AWS Glue and get the file onto S3? I just want to avoid the double hop to S3.
I think that Glue only processes data coming in from Amazon S3, and it would be quite a bit of overhead compared to the simple example above. Another option is to upload a "manifest file", which is simply a list of URLs. You could configure Amazon S3 Events to trigger an AWS Lambda function that reads the file and then runs the above code for each URL it lists. So you just upload a list of URLs and the files magically load into S3 (but you'd have to write the Lambda code to do so). This would be a good approach if you commonly need to upload dozens or hundreds of files.
I cannot fully comprehend your approach. I ran the above code on AWS Glue as a Python Shell job, and it failed with a syntax error. I guess I need to run the above code on some other external machine and push the file onto S3.
Correct. AWS Glue transforms data as it "flows" through. It is not intended for running arbitrary code that performs external tasks such as downloading files.
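The manifest-file approach suggested in the comments could look roughly like the following sketch. It assumes an S3 event notification (for example, on object creation under a hypothetical manifests/ prefix) invokes the Lambda function, that each manifest is a UTF-8 text file with one URL per line, and that downloaded files go into the same bucket under a hypothetical downloads/ prefix, named after the last path segment of each URL. The prefixes, the naming rule, and the optional s3_client parameter are illustrative choices, not part of the original answer.

```python
import posixpath
import urllib.parse
import urllib.request

# Hypothetical destination prefix; keep it outside the prefix that triggers
# this function so the uploads below do not re-invoke it.
DEST_PREFIX = 'downloads/'


def lambda_handler(event, context, s3_client=None):
    """Read each uploaded manifest (one URL per line) and copy every URL into S3."""
    if s3_client is None:
        import boto3  # included in the AWS Lambda Python runtime
        s3_client = boto3.client('s3')
    copied = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        manifest_key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        body = s3_client.get_object(Bucket=bucket, Key=manifest_key)['Body'].read()
        for line in body.decode('utf-8').splitlines():
            url = line.strip()
            if not url:
                continue  # skip blank lines in the manifest
            # Name the object after the last path segment of the URL
            filename = posixpath.basename(urllib.parse.urlparse(url).path) or 'index'
            dest_key = DEST_PREFIX + filename
            # Stream the download straight into S3, as in the earlier examples
            with urllib.request.urlopen(url) as response:
                s3_client.upload_fileobj(response, bucket, dest_key)
            copied.append(dest_key)
    return {'copied': copied}
```

Each URL is fetched sequentially inside a single invocation, so a manifest with many large files may need a longer Lambda timeout (or smaller manifests).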
