Import CSV from AWS S3 instance to Numpy

Question

I've been trying to read a CSV file directly from AWS S3 into NumPy. I've used:

import boto3

s3 = boto3.client(service_name='s3')

def s3_read(filename):
    # Fetch the object and return its full contents as bytes
    s3_obj = s3.get_object(Bucket='bucket-name', Key=filename)
    body = s3_obj['Body']
    return body.read()

in an attempt to pull the data, but I'm running into a formatting issue from AWS that I don't know how to handle.

When I print the data that is returned, there is a weird header before the data:

b'{\n "name":"filename",\n "data":{\n "type":"Buffer",\n "data":[\n 114,\n 97,...]}}'

So there's a bunch of \n's and the weird header. Would this have something to do with the way I uploaded the file to AWS or is there something I'm messing up with the reading of the file?

Anna Nevison · Accepted Answer · 2020-06-21 23:13:32Z


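body.read() returns bytes. Judging by the output you printed, those bytes hold a JSON document rather than raw CSV text, so decode and parse them first: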

import json
j = json.loads(s3_obj['Body'].read().decode('utf-8'))

decode turns the bytes into a string, and json.loads parses that string into a dictionary.
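To get from that dictionary to a NumPy array, something like the following might work. It is only a sketch, and it assumes the stored object really has the layout printed in the question (a "data" key holding {"type": "Buffer", "data": [...]}, with the list being the byte values of the original CSV). The helper name buffer_json_to_array and the sample payload are made up for illustration, and the body is simulated so it runs without AWS credentials.

```python
import io
import json

import numpy as np


def buffer_json_to_array(payload: bytes, delimiter: str = ",", skip_header: int = 0) -> np.ndarray:
    """Hypothetical helper: JSON-wrapped byte list -> NumPy array."""
    doc = json.loads(payload.decode("utf-8"))
    # Assumed layout, copied from the output shown in the question:
    # {"name": ..., "data": {"type": "Buffer", "data": [114, 97, ...]}}
    raw_csv = bytes(doc["data"]["data"])
    # Decode the recovered CSV bytes and let NumPy parse the text.
    text = io.StringIO(raw_csv.decode("utf-8"))
    return np.genfromtxt(text, delimiter=delimiter, skip_header=skip_header)


# Simulated S3 body; in practice this would be s3_obj['Body'].read().
csv_text = "1,2,3\n4,5,6\n"
fake_body = json.dumps(
    {"name": "example.csv", "data": {"type": "Buffer", "data": list(csv_text.encode("utf-8"))}}
).encode("utf-8")

arr = buffer_json_to_array(fake_body)
print(arr.shape)
```

If the CSV starts with a header row, passing skip_header=1 keeps genfromtxt from turning that row into NaN values.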

edited Jun 21, 2020 at 23:13

answered Jun 21, 2020 at 22:44
