
I have a simple Lambda function that reads a csv file from an S3 bucket. That part works fine, but when I try to load the csv data into a pandas data frame, I get the error string indices must be integers.

My code is bog-standard, but I need the csv as a data frame for further manipulation. The line marked with a comment in the code below is the source of the error. I can print data with no problems, so the bucket and file details are configured properly.

Updated code:

import json
import pandas as pd
import numpy as np
import requests
import glob
import time
import os
from datetime import datetime
from csv import reader
import boto3
import traceback
import io

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]
        resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)
        
        data = resp['Body'].read().decode('utf-8')
        df = pd.DataFrame(list(reader(data)))  # <-- this is the line that raises the error
        print(df.head())

    except Exception as err:
        print(err)
        traceback.print_exc()

    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
  • Please include the traceback message so that we can easily spot the errant line. Commented Oct 27, 2020 at 21:25
  • Did you try pd.read_csv(data)? Commented Oct 27, 2020 at 21:28
  • When you have something like event["Records"][0]["s3"]["bucket"]["name"] giving you a problem, you can toss in some throwaway code to narrow it down: event["Records"], followed by event["Records"][0]["s3"], and event["Records"][0]["s3"]["bucket"]. Whichever one blows up will let you know the problem. (A sketch of this appears after the comments.) Commented Oct 27, 2020 at 21:29
  • You could import traceback and in your exception handler add traceback.print_exc(). Commented Oct 27, 2020 at 21:30
  • You say the code works now, but that it doesn't work? What exactly is the issue and what is the expected result? Can you clarify please? Commented Nov 1, 2020 at 21:34
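A quick way to apply the narrowing suggestion above: if the handler is being tested with a payload that is still a JSON string rather than a dict, every ["..."] lookup raises exactly "string indices must be integers". A minimal sketch, assuming that failure mode:

import json

def lambda_handler(event, context):
    # Assumption: the test payload arrives as a JSON string; indexing a str
    # with a str key raises "string indices must be integers".
    if isinstance(event, str):
        event = json.loads(event)
    # Narrow down level by level, as suggested in the comments:
    print(event["Records"])
    print(event["Records"][0]["s3"])
    print(event["Records"][0]["s3"]["bucket"]["name"])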

3 Answers


I believe that your problem is likely tied to this line in your function: df=pd.DataFrame( list(reader(data))). The code below should let you read the csv file into a pandas dataframe for processing; a short demonstration of why csv.reader misbehaves on a plain string follows it.

import boto3
import pandas as pd
from io import StringIO

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]
        resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)

        ###########################################
        # One of these methods should work for you;
        # uncomment one before using df_s3_data.
        #
        # Method 1: pass the StreamingBody straight to pandas
        # df_s3_data = pd.read_csv(resp['Body'], sep=',')
        #
        # Method 2: read, decode, and wrap the text in StringIO
        # df_s3_data = pd.read_csv(StringIO(resp['Body'].read().decode('utf-8')))
        ###########################################
        print(df_s3_data.head())

    except Exception as err:
        print(err)
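For reference, csv.reader expects an iterable of lines. Handing it the whole decoded string makes it iterate character by character, so every "row" comes out as a single character. A minimal demonstration of that failure mode:

from csv import reader

data = "col1,col2\n1,2\n"

# Iterating a plain string yields one character at a time:
print(list(reader(data)))                # [['c'], ['o'], ['l'], ['1'], ['', ''], ...]

# csv.reader wants an iterable of lines instead:
print(list(reader(data.splitlines())))   # [['col1', 'col2'], ['1', '2']]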

3 Comments

Nope. It gives me No columns to parse from file
@Kalenji Did you try the second df_s3_data that is commented out?
Method 1 worked! One additional question: did you manage to get files from S3 to Lambda based on part of the file name?
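Regarding that follow-up: S3 event notifications can be configured with prefix/suffix filters on the trigger itself, or you can guard inside the handler. A hypothetical sketch of the in-code guard; the 'incoming/' prefix and '.csv' suffix are assumptions for illustration:

# Hypothetical guard near the top of lambda_handler, assuming keys
# like 'incoming/report.csv'; skip anything that doesn't match.
if not (s3_file_name.startswith('incoming/') and s3_file_name.endswith('.csv')):
    print(f'Skipping {s3_file_name}')
    return {'statusCode': 200, 'body': json.dumps('skipped')}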
You can pass the streaming body returned by get_object straight to pd.read_csv:
import json
import pandas as pd
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
            
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]
        obj = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)
        df = pd.read_csv(obj['Body'])  # 'Body' is a StreamingBody; pd.read_csv accepts file-like objects
        print(df.head())

    except Exception as err:
        print(err)
        
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

1 Comment

This answer worked for me. The additional .read() call used in the accepted answer was giving me errors.
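If you do need the decoded text before parsing (for example, to inspect the first lines), wrap it in StringIO rather than BytesIO; note that a StreamingBody can only be read once. A minimal sketch, assuming obj comes from the get_object call above:

from io import StringIO

body_text = obj['Body'].read().decode('utf-8')  # the stream can only be read once
print(body_text[:200])                          # peek at the start of the file
df = pd.read_csv(StringIO(body_text))           # StringIO, not BytesIO, for decoded text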

You can also read the S3 file directly with pandas using read_csv:

import boto3
import pandas as pd

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]

        # This 'magic' needs s3fs (https://pypi.org/project/s3fs/)
        df = pd.read_csv(f's3://{bucket_name}/{s3_file_name}', sep=',')

        print(df.head())

    except Exception as err:
        print(err)

Things to remember:

# Track memory usage at the cost of CPU. Great for troubleshooting. Use wisely.
print(df.info(verbose=True, memory_usage='deep'))
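If the default credential chain is not what you want, newer pandas (1.2+) can forward options to s3fs through storage_options. A minimal sketch; the bucket, key, and profile name below are hypothetical:

import pandas as pd

# Assumes pandas >= 1.2 with s3fs installed; all names below are placeholders.
df = pd.read_csv(
    's3://my-bucket/data/my-file.csv',
    storage_options={'profile': 'my-aws-profile'},
)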

