
I am trying to create a Lambda function that automatically cleans CSV files from an S3 bucket. The bucket receives files every 5 minutes, so I have set up an S3 trigger for the Lambda function. To clean the CSV files I use the pandas library to create a DataFrame, and I have already installed a pandas layer. However, creating the DataFrame fails with an error. This is my code:

import json
import boto3
import pandas as pd
from io import StringIO


# create the S3 client
client = boto3.client('s3')

def lambda_handler(event, context):
    
    #define bucket_name and object_name
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    object_name = event['Records'][0]['s3']['object']['key']
    
    #create a df from the object
    df = pd.read_csv(object_name)
    

This is the error message:

[ERROR] FileNotFoundError: [Errno 2] No such file or directory: 'object_name'

On CloudWatch it additionally says:

OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k

Has anyone experienced the same issues? Thanks in advance for all your help!

  • read_csv("object_name") - I hope you noticed that "object_name" is a string here and not the actual variable declared 2 lines above. Commented Jun 4, 2022 at 12:30
  • Thanks for your answer! I have changed it to read_csv(object_name), and I get the following error message: "errorMessage": "[Errno 2] No such file or directory: 'test%2Fkey'", Commented Jun 4, 2022 at 12:37
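The test%2Fkey in that follow-up error is the key exactly as S3 delivers it in event notifications: object keys are URL-encoded, so a / in the key arrives as %2F (and spaces as +). A quick stdlib check of the decoding, using just the key from the error message above:

```python
from urllib.parse import unquote_plus

# S3 event notifications URL-encode object keys:
# '/' becomes '%2F' and ' ' becomes '+'.
encoded_key = 'test%2Fkey'            # key as it appears in the event
decoded_key = unquote_plus(encoded_key)
print(decoded_key)                    # test/key
```

So even once the key is read from the event correctly, it still has to be decoded before it can be used with the S3 API.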

3 Answers


You have to use the S3 client to download the file from S3 before handing it to pandas. Something like:

response = client.get_object(Bucket=bucket_name, Key=object_name)
df = pd.read_csv(response["Body"])

You'll also have to make sure the Lambda's execution role has the right permissions to read from the S3 bucket.
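Put together with the key decoding discussed in the comments, a complete handler might look like the sketch below. The parse_s3_event helper is a name I've introduced here to keep the event parsing separate; the actual cleaning step is left out:

```python
from urllib.parse import unquote_plus

def parse_s3_event(event):
    # S3 put-notification events carry the bucket name and the
    # URL-encoded object key; decode the key before using it.
    record = event['Records'][0]['s3']
    bucket_name = record['bucket']['name']
    object_name = unquote_plus(record['object']['key'])
    return bucket_name, object_name

def lambda_handler(event, context):
    import boto3         # provided by the Lambda Python runtime
    import pandas as pd  # provided by the pandas layer

    client = boto3.client('s3')
    bucket_name, object_name = parse_s3_event(event)

    # Download the object and feed its body (a file-like stream) to pandas
    response = client.get_object(Bucket=bucket_name, Key=object_name)
    df = pd.read_csv(response['Body'])

    # ... clean df and write the result back to S3 ...
    return {'rows': len(df)}
```

(The boto3 and pandas imports are kept inside the handler here only so the event-parsing helper stands on its own; in the real function they would normally sit at the top of the module, as in the question.)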



Change this line:

df = pd.read_csv("object_name")

to this:

df = pd.read_csv(object_name)

Comments

  • Thank you! I just updated it, but I still get an error message.
  • I changed it in the post, thanks for your insight!

Cause of error

object_name is just the object's key, i.e. a path relative to the bucket, and it has no significance without bucket_name. When you pass it to pd.read_csv, pandas looks for a local file with that name and raises FileNotFoundError.

Solution for the error

To refer to the S3 object properly, construct the fully qualified s3:// path from bucket_name and object_name. Also notice that the object key arrives URL-encoded (hence the %2F in your error), so unquote it before building the path. Note that pandas resolves s3:// URLs through the optional s3fs package, which must also be available to the Lambda function (e.g. via a layer).

import pandas as pd
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    
    #define bucket_name and object_name
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    object_name = event['Records'][0]['s3']['object']['key']
    
    #create a df from the fully qualified s3 path
    filepath = f's3://{bucket_name}/{unquote_plus(object_name)}'
    df = pd.read_csv(filepath)

