
I have a simple Lambda function that reads a csv file from an S3 bucket. That part works fine, but when I try to load the csv data into a pandas data frame, I get the error string indices must be integers.

My code is bog-standard, but I need the csv as a data frame for further manipulation. The line marked with a comment in the code below is the source of the error. I can print data with no problems, so the bucket and file details are configured properly.

Updated code:

import json
import pandas as pd
import numpy as np
import requests
import glob
import time
import os
from datetime import datetime
from csv import reader
import boto3
import traceback
import io

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]
        resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)
        
        data = resp['Body'].read().decode('utf-8')
        df = pd.DataFrame(list(reader(data)))  # <-- this is the line that raises the error
        print(df.head())

    except Exception as err:
        print(err)
        traceback.print_exc()

    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
  • Please include the traceback message so that we can easily spot the errant line. Commented Oct 27, 2020 at 21:25
  • Did you try pd.read_csv(data)? Commented Oct 27, 2020 at 21:28
  • When you have something like event["Records"][0]["s3"]["bucket"]["name"] giving you a problem, you can toss in some throwaway code to narrow it down: event["Records"], followed by event["Records"][0]["s3"], and event["Records"][0]["s3"]["bucket"]. Whichever one blows up will let you know the problem. (A sketch of this appears after the comments.) Commented Oct 27, 2020 at 21:29
  • You could import traceback and in your exception handler add traceback.print_exc(). Commented Oct 27, 2020 at 21:30
  • You say the code works now, but that it doesn't work? What exactly is the issue and what is the expected result? Can you clarify please? Commented Nov 1, 2020 at 21:34
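A quick way to apply the narrowing suggestion above: if the handler is being tested with a payload that is still a JSON string rather than a dict, every ["..."] lookup raises exactly "string indices must be integers". A minimal sketch, assuming that failure mode:

import json

def lambda_handler(event, context):
    # Assumption: the test payload arrives as a JSON string; indexing a str
    # with a str key raises "string indices must be integers".
    if isinstance(event, str):
        event = json.loads(event)
    # Narrow down level by level, as suggested in the comments:
    print(event["Records"])
    print(event["Records"][0]["s3"])
    print(event["Records"][0]["s3"]["bucket"]["name"])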

3 Answers


I believe that your problem is likely tied to this line in your function: df=pd.DataFrame( list(reader(data))). The code below should let you read the csv file into a pandas dataframe for processing; a short demonstration of why csv.reader misbehaves on a plain string follows it.

import boto3
import pandas as pd
from io import StringIO

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]
        resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)

        ###########################################
        # One of these methods should work for you;
        # uncomment one before using df_s3_data.
        #
        # Method 1: pass the StreamingBody straight to pandas
        # df_s3_data = pd.read_csv(resp['Body'], sep=',')
        #
        # Method 2: read, decode, and wrap the text in StringIO
        # df_s3_data = pd.read_csv(StringIO(resp['Body'].read().decode('utf-8')))
        ###########################################
        print(df_s3_data.head())

    except Exception as err:
        print(err)
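For reference, csv.reader expects an iterable of lines. Handing it the whole decoded string makes it iterate character by character, so every "row" comes out as a single character. A minimal demonstration of that failure mode:

from csv import reader

data = "col1,col2\n1,2\n"

# Iterating a plain string yields one character at a time:
print(list(reader(data)))                # [['c'], ['o'], ['l'], ['1'], ['', ''], ...]

# csv.reader wants an iterable of lines instead:
print(list(reader(data.splitlines())))   # [['col1', 'col2'], ['1', '2']]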

3 Comments

Nope. It gives me No columns to parse from file
@Kalenji Did you try the second df_s3_data that is commented out?
Method 1 worked! One additional question: did you manage to get files from S3 to Lambda based on part of the file name?
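Regarding that follow-up: S3 event notifications can be configured with prefix/suffix filters on the trigger itself, or you can guard inside the handler. A hypothetical sketch of the in-code guard; the 'incoming/' prefix and '.csv' suffix are assumptions for illustration:

# Hypothetical guard near the top of lambda_handler, assuming keys
# like 'incoming/report.csv'; skip anything that doesn't match.
if not (s3_file_name.startswith('incoming/') and s3_file_name.endswith('.csv')):
    print(f'Skipping {s3_file_name}')
    return {'statusCode': 200, 'body': json.dumps('skipped')}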
You can pass the streaming body returned by get_object straight to pd.read_csv:
import json
import pandas as pd
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
            
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]
        obj = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)
        df = pd.read_csv(obj['Body'])  # 'Body' is a StreamingBody; pd.read_csv accepts file-like objects
        print(df.head())

    except Exception as err:
        print(err)
        
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

1 Comment

This answer worked for me. The additional .read() call used in the accepted answer was giving me errors.
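If you do need the decoded text before parsing (for example, to inspect the first lines), wrap it in StringIO rather than BytesIO; note that a StreamingBody can only be read once. A minimal sketch, assuming obj comes from the get_object call above:

from io import StringIO

body_text = obj['Body'].read().decode('utf-8')  # the stream can only be read once
print(body_text[:200])                          # peek at the start of the file
df = pd.read_csv(StringIO(body_text))           # StringIO, not BytesIO, for decoded text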

You can also read the S3 file directly with pandas using read_csv:

import boto3
import pandas as pd

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]

        # This 'magic' needs s3fs (https://pypi.org/project/s3fs/)
        df = pd.read_csv(f's3://{bucket_name}/{s3_file_name}', sep=',')

        print(df.head())

    except Exception as err:
        print(err)

Things to remember:

# Track memory usage at the cost of CPU. Great for troubleshooting. Use wisely.
print(df.info(verbose=True, memory_usage='deep'))
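If the default credential chain is not what you want, newer pandas (1.2+) can forward options to s3fs through storage_options. A minimal sketch; the bucket, key, and profile name below are hypothetical:

import pandas as pd

# Assumes pandas >= 1.2 with s3fs installed; all names below are placeholders.
df = pd.read_csv(
    's3://my-bucket/data/my-file.csv',
    storage_options={'profile': 'my-aws-profile'},
)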

