
I am trying to read a CSV file located in an AWS S3 bucket into memory as a pandas dataframe using the following code:

import pandas as pd
import boto

data = pd.read_csv('s3:/example_bucket.s3-website-ap-southeast-2.amazonaws.com/data_1.csv')

In order to give complete access I have set the bucket policy on the S3 bucket as follows:

{
"Version": "2012-10-17",
"Id": "statement1",
"Statement": [
    {
        "Sid": "statement1",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": "arn:aws:s3:::example_bucket"
    }
  ]
}

Unfortunately I still get the following error in python:

boto.exception.S3ResponseError: S3ResponseError: 405 Method Not Allowed

Wondering if someone could help explain how to either correctly set the permissions in AWS S3 or configure pandas correctly to import the file. Thanks!

  • Shouldn't there be a double slash after s3? Commented Jun 13, 2015 at 12:00
  • Yes, you're right, there should be. I also had to change the location of the bucket and file: tripData = pd.read_csv('https://s3-ap-southeast-2.amazonaws.com/example_bucket/data.csv'), and I had to update the permissions on the individual file. But it works now, cheers. Commented Jun 13, 2015 at 23:05
  • Please add your solution as an answer to help other Stack Overflow users. Commented Jun 15, 2015 at 5:19
  • When using read_csv to read files from S3, does pandas first download the file to disk and then load it into memory, or does it stream from the network directly into memory? Commented Apr 5, 2016 at 21:46

8 Answers


Using pandas 0.20.3

import boto3
import pandas as pd
import sys

if sys.version_info[0] < 3: 
    from StringIO import StringIO # Python 2.x
else:
    from io import StringIO # Python 3.x

client = boto3.client('s3')

bucket_name = 'my_bucket'

object_key = 'my_file.csv'
csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
body = csv_obj['Body']
csv_string = body.read().decode('utf-8')

df = pd.read_csv(StringIO(csv_string))
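If your credentials aren't being picked up automatically (for example from ~/.aws/credentials or environment variables), you can also pass them to the client explicitly. A minimal sketch, assuming the keys are stored in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables:

import os
import boto3

# boto3 reads these environment variables on its own; passing them explicitly
# is shown here only to make the credential flow visible
client = boto3.client(
    's3',
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
)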

4 Comments

When I import it this way the df's columns do not appear?
I'm trying this and I'm getting errors in the id and secret key calls to os.environ -- is that something I have to set up in terminal or something?
@ZachOakes Yes, that's something you would have needed to set up. Those two lines assume that your ID and SECRET were previously saved as environment variables, but you don't need to pull them from environment variables. Instead, you can replace those two lines with whatever method you like to get your ID and SECRET into your code.
Also works for DictReader: reader = csv.DictReader(io.StringIO(body), fieldnames=fieldnames)

Based on this answer that suggested using smart_open for reading from S3, this is how I used it with Pandas:

import os
import pandas as pd
from smart_open import smart_open

aws_key = os.environ['AWS_ACCESS_KEY']
aws_secret = os.environ['AWS_SECRET_ACCESS_KEY']

bucket_name = 'my_bucket'
object_key = 'my_file.csv'

path = 's3://{}:{}@{}/{}'.format(aws_key, aws_secret, bucket_name, object_key)

df = pd.read_csv(smart_open(path))
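Note that newer releases of smart_open deprecate the smart_open() function in favour of open(), and credentials are passed via transport_params rather than embedded in the URI. A rough sketch, assuming a recent smart_open version (check the docs for the one you have installed):

import boto3
import pandas as pd
from smart_open import open  # replaces smart_open() in newer releases

session = boto3.Session()  # picks up credentials from the environment or ~/.aws
df = pd.read_csv(open('s3://my_bucket/my_file.csv', transport_params={'client': session.client('s3')}))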

Comments


You don't need pandas; you can just use Python's built-in csv library:

import csv
import boto.s3
from boto.s3.key import Key

def read_file(bucket_name, region, remote_file_name, aws_access_key_id, aws_secret_access_key):
    # reads a csv from AWS S3 and returns the rows as a list of lists

    # first establish a connection with your credentials and region
    conn = boto.s3.connect_to_region(
        region,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key)

    # next obtain the key of the csv you want to read;
    # you will need the bucket name and the csv file name
    bucket = conn.get_bucket(bucket_name, validate=False)
    key = Key(bucket)
    key.key = remote_file_name
    data = key.get_contents_as_string()
    key.close()

    # the contents come back as a single string, so split it into lines;
    # the separator is usually '\r\n' -- if not, inspect the file and
    # split on whatever it actually uses
    reader = csv.reader(data.split('\r\n'))
    data = []
    header = next(reader)  # skip the header row
    for row in reader:
        data.append(row)

    return data
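A hypothetical call might look like the following (the bucket, region, file name and credential values are placeholders), and if you do want a DataFrame after all, the rows can be handed straight to pandas:

rows = read_file(
    bucket_name='my_bucket',
    region='ap-southeast-2',
    remote_file_name='data_1.csv',
    aws_access_key_id='MY_KEY_ID',
    aws_secret_access_key='MY_SECRET')

import pandas as pd
df = pd.DataFrame(rows)  # note: read_file() skips the header row, so no column names here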

hope it solved your problem, good luck! :)

Comments


Without pandas (it's a big dependency just to read a CSV file, folks):

import csv
import boto3
from io import StringIO
client = boto3.client("s3", region_name="eu-west-2")
data = client.get_object(Bucket=bucket, Key=_file)
reader = csv.DictReader(StringIO(data['Body'].read().decode('utf-8')))

1 Comment

How can I load only a fraction of the csv?
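Regarding the comment about loading only a fraction of the CSV: one simple option (a sketch, not the only way) is to stop iterating early, for example with itertools.islice; this still downloads the whole object, so for very large files you could instead pass a Range argument to get_object and accept that the final line may be cut off.

import csv
import boto3
from io import StringIO
from itertools import islice

client = boto3.client("s3", region_name="eu-west-2")
data = client.get_object(Bucket=bucket, Key=_file)
reader = csv.DictReader(StringIO(data['Body'].read().decode('utf-8')))
first_100_rows = list(islice(reader, 100))  # materialise only the first 100 rows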

I eventually realised that you also need to set the permissions on each individual object within the bucket in order to read it: the bucket policy above only names arn:aws:s3:::example_bucket, not the objects inside it (which a resource like arn:aws:s3:::example_bucket/* would cover). I made the object public with the following code:

from boto.s3.key import Key
k = Key(bucket)
k.key = 'data_1.csv'
k.set_canned_acl('public-read')

And I also had to modify the address of the bucket in the pd.read_csv command to the region's REST endpoint (the original URL pointed at the s3-website static-hosting endpoint, which appears to be what caused the 405 Method Not Allowed):

data = pd.read_csv('https://s3-ap-southeast-2.amazonaws.com/example_bucket/data_1.csv')

2 Comments

How do you modify the address so it becomes a URL that pandas can read?
You've made this file readable by anyone in the world, which most people should probably avoid doing. @jpobst's answer above, which provides the correct credentials to read the file, is what most folks should do.

You can use the AWS SDK for pandas (awswrangler), a library that extends pandas to work smoothly with AWS data stores such as S3.

import awswrangler as wr
df = wr.s3.read_csv("s3://bucket/file.csv")
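If you need non-default credentials, awswrangler functions accept a boto3 session, so you don't have to rely on the environment. A small sketch (the profile name is just a placeholder):

import boto3
import awswrangler as wr

session = boto3.Session(profile_name="my_profile")  # hypothetical named profile
df = wr.s3.read_csv("s3://bucket/file.csv", boto3_session=session)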

Comments


Pandas (starting with version 1.2.0) supports reading and writing files stored in S3 via the s3fs Python package. S3Fs is a Pythonic file interface to S3; it builds on top of botocore.

pip install s3fs

Use an S3 URI.

To read a file:

import pandas as pd

df = pd.read_csv("s3://my-bucket-name/sample.csv")

To write a file:

import pandas as pd

df.to_csv("s3://my-bucket-name/sample.csv")
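If your credentials aren't available through the usual lookup (environment variables, ~/.aws/credentials, an instance role), pandas 1.2.0+ can forward them to s3fs via storage_options; the "key"/"secret" names below follow the fsspec/s3fs convention, and the values are placeholders:

import pandas as pd

df = pd.read_csv(
    "s3://my-bucket-name/sample.csv",
    storage_options={
        "key": "MY_ACCESS_KEY_ID",        # placeholder credentials
        "secret": "MY_SECRET_ACCESS_KEY",
    },
)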

Comments


You can try this:

import boto3
import pandas as pd

s3_client = boto3.client(
    "s3",
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=SECRET_ACCESS_KEY,
    endpoint_url=ENDPOINT_URL,
)

response = s3_client.get_object(Bucket=BUCKET_NAME, Key=OBJECT_KEY)
status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")

if status == 200:
    df = pd.read_csv(response.get("Body"))
    print('Successfully read dataframe from S3')
else:
    print(f"Unsuccessful S3 get_object. Status: {status}")

Comments
