
I am trying to process a .csv file (30 MB) that is in an S3 bucket, using AWS Lambda (Python). I wrote my Python code locally to process the file and am now trying to execute it in Lambda, but I am having a hard time reading the file line by line.

Please let me know how I can traverse the file line by line using boto3 or S3 methods. Thanks.

In Lambda:

s3 = boto3.client("s3")
file_obj = event["Records"][0]
filename = str(file_obj['s3']['object']['key'])
# print('file name is:', filename)
fileObj = s3.get_object(Bucket=<mybucket>, Key=filename)
file_content = fileObj["Body"].read().decode('utf-8')

My Original code:

import csv
import datetime

with open('sample.csv', 'r') as file_name:

    csv_reader = csv.reader(file_name, delimiter=',')
    Time = []
    Latitude=[]
    Longitude= []
    Org_Units=[]
    Org_Unit_Type =[]
    Variable_Name=[]
    #New columns
    Year=[]
    Month= []
    Day =[]
    Celsius=[]
    Far=[]
    Conv_Units=[]
    Conv_Unit_Type=[]
    header = ['Time','Latitude', 'Longitude','Org_Units','Org_Unit_Type','Conv_Units','Conv_Unit_Type','Variable_Name']
    out_filename = 'Write' + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") #need to rename based on the org file name

    with open(out_filename +'.csv', 'w') as csvFile:
        outputwriter = csv.writer(csvFile, delimiter=',')
        outputwriter.writerow(header)
        next(csv_reader, None)  # skip the header row

        for row in csv_reader:
           # print(row)
            Time = row[0]
            Org_Lat=row[1]
            Org_Long=row[2]
            Org_Units=row[3]
            Org_Unit_Type =row[4]
            Variable_Name=row[5]
            # print(Time,Org_Lat,Org_Long,Org_Units,Org_Unit_Type,Variable_Name)

            if Org_Unit_Type == 'm s-1':
                Conv_Units =round(float(Org_Units) * 1.151,2)
                Conv_Unit_Type = 'miles'
            if Org_Unit_Type == 'm':
                Conv_Units =round(float(Org_Units) / 1609.344,2)
                # print(Org_Units, Conv_Units)
                Conv_Unit_Type = 'miles'
            if Org_Unit_Type == 'Pa':
                Conv_Units =round(float(Org_Units) / 6894.757,2)
                Conv_Unit_Type = 'Psi'
                #print(type(Time))
            date_time_obj = datetime.datetime.strptime(Time, '%m-%d-%Y, %H:%M')
             #  Year = time.strptime(date_time_obj, "%B")
            #print(date_time_obj)
            f_row = [Time, Org_Lat, Org_Long, Org_Units, Org_Unit_Type, Conv_Units, Conv_Unit_Type, Variable_Name]
            outputwriter.writerow(f_row)
print("done")  # both files are closed automatically by the with blocks
    Hi, can you explain a little bit more about what the issue with your code is, i.e. what error are you hitting, as well as provide a minimal reproducible example? Commented May 4, 2019 at 14:14

2 Answers


I think this should work. The only thing you need to check is that your Lambda has a role with a policy that grants read access to the S3 bucket. Initially, for testing, I would give the Lambda full access to S3 via the AmazonS3FullAccess managed policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}
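For production, a policy scoped to read-only access on just the relevant bucket is safer than `s3:*`. A sketch, where `my-bucket` is a hypothetical stand-in for your actual bucket name:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}
```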

Python code:

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the bucket and key from the S3 event record
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    obj = s3.get_object(Bucket=bucket, Key=key)
    # Body is a stream of bytes; decode before splitting into lines
    rows = obj['Body'].read().decode('utf-8').splitlines()
    print("rows:", rows)
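To then step through the file row by row, as the question asks, the decoded string can be fed to `csv.reader` via `io.StringIO`. A minimal sketch of just the parsing step, using a hypothetical `iter_csv_rows` helper (in the handler you would pass it `obj['Body'].read().decode('utf-8')` instead of the sample string):

```python
import csv
import io

def iter_csv_rows(file_content):
    """Yield parsed rows from a CSV string, skipping the header row."""
    reader = csv.reader(io.StringIO(file_content))
    next(reader, None)  # skip header
    yield from reader

# Sample data standing in for the decoded S3 object body
sample = "Time,Org_Units\n05-04-2019 10:00,3.2\n05-04-2019 11:00,4.8\n"
for row in iter_csv_rows(sample):
    print(row[0], row[1])  # each row is a list of column values
```

Each yielded `row` is a list of strings, so the original `row[0]`, `row[1]`, ... indexing carries over unchanged.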

3 Comments

I am able to get the data now, but my question remains the same: how can I iterate through and pick each row and column? For example, in Python I am doing:

for row in csv_reader:
    # print(row)
    Time = row[0]
    Org_Lat = row[1]
    Org_Long = row[2]

My current Lambda code looks like this:

import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    if event:
        # print("Event is:", event)
        file_obj = event['Records'][0]
        filename = str(file_obj['s3']['object']['key'])
        print('file is ---', filename)
        fileobj = s3.get_object(Bucket='my bucket name', Key=filename)
        print('file object is--', fileobj)
        file_content = fileobj['Body'].read().decode('utf-8')
        header = ['Time', 'Latitude', 'Longitude', 'Org_Units', 'Org_Unit_Type', 'Conv_Units', 'Conv_Unit_Type', 'Variable_Name']
        for row in file_content:
            print(row)

Please let me know how I can go through each line and pick the values of the attributes for my calculations. Thanks.

Rather than using .read() to read the object as a stream, you might find it easier to download the object to local storage:

s3_client = boto3.client('s3', region_name='ap-southeast-2')
s3_client.download_file(bucket, key, '/tmp/local_file.csv')

You can then use your original program to process the file.
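The approach above can be sketched as a complete handler. The row processing is split into its own function so the original per-row logic can slot in; names like `process_csv` and `/tmp/input.csv` are illustrative choices, not anything the answer prescribes:

```python
import csv
import os

def process_csv(path):
    """Read a downloaded CSV and return its data rows (header skipped)."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip header
        return list(reader)

def lambda_handler(event, context):
    import boto3  # provided by the Lambda runtime
    s3_client = boto3.client('s3')
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    local_path = '/tmp/input.csv'
    s3_client.download_file(bucket, key, local_path)
    try:
        rows = process_csv(local_path)
        print('processed', len(rows), 'rows')
    finally:
        os.remove(local_path)  # free /tmp, since containers are reused
```

The `try`/`finally` ensures the temporary file is removed even if row processing raises, which matters because a reused container keeps whatever was left in `/tmp`.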

Once you have finished, be sure to delete the temporary file, because the AWS Lambda container might be reused and there is only 512 MB of disk space available in /tmp.

3 Comments

Hi John, the file size is around 30-40 MB.
Hi John, what is the maximum data file size I can process using Lambda?
AWS Lambda provides 512 MB of storage in /tmp/. Individual file size limits would be determined by Linux, and are much larger than that.
