
I am trying to process a .csv file (30 MB) that is in an S3 bucket, using AWS Lambda (Python). I wrote my Python code locally to process the file and am now trying to execute it in Lambda, but I am having a hard time reading the file line by line.

Please let me know how I can traverse the file line by line using boto3 or S3 methods. Thanks.

In Lambda:

s3 = boto3.client("s3")
file_obj = event["Records"][0]
filename = str(file_obj['s3']['object']['key'])
# print('file name is:', filename)
fileObj = s3.get_object(Bucket=<mybucket>, Key=filename)
file_content = fileObj["Body"].read().decode('utf-8')

My Original code:

import csv
import datetime

with open('sample.csv', 'r') as file_name:

    csv_reader = csv.reader(file_name, delimiter=',')
    Time = []
    Latitude=[]
    Longitude= []
    Org_Units=[]
    Org_Unit_Type =[]
    Variable_Name=[]
    #New columns
    Year=[]
    Month= []
    Day =[]
    Celsius=[]
    Far=[]
    Conv_Units=[]
    Conv_Unit_Type=[]
    header = ['Time','Latitude', 'Longitude','Org_Units','Org_Unit_Type','Conv_Units','Conv_Unit_Type','Variable_Name']
    out_filename = 'Write' + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") #need to rename based on the org file name

    with open(out_filename +'.csv', 'w') as csvFile:
        outputwriter = csv.writer(csvFile, delimiter=',')
        outputwriter.writerow(header)
        next(csv_reader, None)  # skip the header row

        for row in csv_reader:
           # print(row)
            Time = row[0]
            Org_Lat=row[1]
            Org_Long=row[2]
            Org_Units=row[3]
            Org_Unit_Type =row[4]
            Variable_Name=row[5]
            # print(Time,Org_Lat,Org_Long,Org_Units,Org_Unit_Type,Variable_Name)

            if Org_Unit_Type == 'm s-1':
                Conv_Units =round(float(Org_Units) * 1.151,2)
                Conv_Unit_Type = 'miles'
            if Org_Unit_Type == 'm':
                Conv_Units =round(float(Org_Units) / 1609.344,2)
                # print(Org_Units, Conv_Units)
                Conv_Unit_Type = 'miles'
            if Org_Unit_Type == 'Pa':
                Conv_Units =round(float(Org_Units) / 6894.757,2)
                Conv_Unit_Type = 'Psi'
                #print(type(Time))
            date_time_obj = datetime.datetime.strptime(Time, '%m-%d-%Y, %H:%M')
             #  Year = time.strptime(date_time_obj, "%B")
            #print(date_time_obj)
            f_row = [Time, Org_Lat, Org_Long, Org_Units, Org_Unit_Type, Conv_Units, Conv_Unit_Type, Variable_Name]
            outputwriter.writerow(f_row)
print("done")  # both files are closed automatically by the with blocks
    Hi, can you explain a little bit more about what the issue with your code is, i.e. what error are you hitting, as well as provide a minimal reproducible example? Commented May 4, 2019 at 14:14

2 Answers


I think this should work. The only thing you need to check is that your Lambda has a role with a policy that grants read access to the S3 bucket. Initially, for testing, I would give the Lambda full access to S3 via the AmazonS3FullAccess managed policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}
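For production, a policy scoped to read-only access on just the relevant bucket is safer than `s3:*`. A sketch, where `my-bucket` is a hypothetical stand-in for your actual bucket name:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}
```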

Python code:

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the bucket and key from the S3 event record
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    obj = s3.get_object(Bucket=bucket, Key=key)
    # Body is a stream of bytes; decode before splitting into lines
    rows = obj['Body'].read().decode('utf-8').splitlines()
    print("rows:", rows)
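To then step through the file row by row, as the question asks, the decoded string can be fed to `csv.reader` via `io.StringIO`. A minimal sketch of just the parsing step, using a hypothetical `iter_csv_rows` helper (in the handler you would pass it `obj['Body'].read().decode('utf-8')` instead of the sample string):

```python
import csv
import io

def iter_csv_rows(file_content):
    """Yield parsed rows from a CSV string, skipping the header row."""
    reader = csv.reader(io.StringIO(file_content))
    next(reader, None)  # skip header
    yield from reader

# Sample data standing in for the decoded S3 object body
sample = "Time,Org_Units\n05-04-2019 10:00,3.2\n05-04-2019 11:00,4.8\n"
for row in iter_csv_rows(sample):
    print(row[0], row[1])  # each row is a list of column values
```

Each yielded `row` is a list of strings, so the original `row[0]`, `row[1]`, ... indexing carries over unchanged.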

3 Comments

I am able to get the data now, but my question remains the same: how can I iterate through and pick each row and column? For example, in Python I am doing:

for row in csv_reader:
    # print(row)
    Time = row[0]
    Org_Lat = row[1]
    Org_Long = row[2]

My current Lambda code looks like this:

import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    if event:
        # print("Event is:", event)
        file_obj = event['Records'][0]
        filename = str(file_obj['s3']['object']['key'])
        print('file is ---', filename)
        fileobj = s3.get_object(Bucket='my bucket name', Key=filename)
        print('file object is--', fileobj)
        file_content = fileobj['Body'].read().decode('utf-8')
        header = ['Time', 'Latitude', 'Longitude', 'Org_Units', 'Org_Unit_Type', 'Conv_Units', 'Conv_Unit_Type', 'Variable_Name']
        for row in file_content:
            print(row)

Please let me know how I can go through each line and pick the values of the attributes for my calculations. Thanks.

Rather than using .read() to read the object as a stream, you might find it easier to download the object to local storage:

s3_client = boto3.client('s3', region_name='ap-southeast-2')
s3_client.download_file(bucket, key, '/tmp/local_file.csv')

You can then use your original program to process the file.
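The approach above can be sketched as a complete handler. The row processing is split into its own function so the original per-row logic can slot in; names like `process_csv` and `/tmp/input.csv` are illustrative choices, not anything the answer prescribes:

```python
import csv
import os

def process_csv(path):
    """Read a downloaded CSV and return its data rows (header skipped)."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip header
        return list(reader)

def lambda_handler(event, context):
    import boto3  # provided by the Lambda runtime
    s3_client = boto3.client('s3')
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    local_path = '/tmp/input.csv'
    s3_client.download_file(bucket, key, local_path)
    try:
        rows = process_csv(local_path)
        print('processed', len(rows), 'rows')
    finally:
        os.remove(local_path)  # free /tmp, since containers are reused
```

The `try`/`finally` ensures the temporary file is removed even if row processing raises, which matters because a reused container keeps whatever was left in `/tmp`.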

Once you have finished, be sure to delete the temporary file, because the AWS Lambda container might be reused and there is only 512 MB of disk space available in /tmp.

3 Comments

Hi John, the file size is around 30-40 MB.
Hi John, what is the maximum data file size I can process using Lambda?
AWS Lambda provides 512 MB of storage in /tmp/. Individual file size limits would be determined by Linux, and are much larger than that.
