
I'm trying the following, but when I overwrite a file in the same folder that invoked the Lambda, the function is triggered again and it goes into a loop. Can anyone please help me? I have also pasted the piece of code I am using for the Lambda below.

Task

  1. Read a file in a folder called 'FolderA' when it is uploaded to this folder
  2. Then truncate a particular column whose values are more than 10 characters long
  3. Then upload this file back to the same folder; unfortunately it goes into a loop because of the Lambda invocation
  4. I tried moving the output to a different folder called TrimmedFile, and then it works fine without any loops.

Can someone tell me how to read, edit, and save the file back to the same folder that invoked the Lambda?

    import urllib.parse
    import boto3
    import csv

    print('Loading function')
    s3 = boto3.client('s3')

    def lambda_handler(event, context):
        # Get the bucket and key from the S3 event record
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        try:
            print("file name " + key)
            file_key = key
            csvfile = s3.get_object(Bucket=bucket, Key=file_key)
            csvcontent = csvfile["Body"].read().decode("utf-8")
            file = csvcontent.split("\n")
            csv_reader = csv.reader(file)
            line_count = 0
            colindex = ''
            content = []
            contentstring = ''
            s33 = boto3.resource('s3')
            copy_source = {
                'Bucket': bucket,
                'Key': file_key
            }
            new_bucket = s33.Bucket(bucket)
            print(file_key)
            print(bucket)
            src_folder = "FolderA/"
            new_filekey = file_key.replace(src_folder, "")
            print(new_filekey)
            # Back up the original file before editing it
            new_bucket.copy(copy_source, 'BKP/' + new_filekey)
            for row in csv_reader:
                if row:
                    row = list(map(str.strip, row))
                    if line_count == 0:
                        # Header row: locate the column to truncate
                        if 'ColToTruncate' in row:
                            colindex = row.index('ColToTruncate')
                            line_count += 1
                        else:
                            print('No ColToTruncate column found in ' + file_key)
                            return 'No ColToTruncate column found in ' + file_key
                    else:
                        if len(row[colindex]) >= 10:
                            row[colindex] = row[colindex][0:2]
                        line_count += 1
                    content.append(row)
                    contentstring += ', '.join(row)
                    contentstring += '\n'
            uploadByteStream = bytes(contentstring.encode('utf-8'))
            # Writing back to the same key raises another ObjectCreated event,
            # which re-invokes this function and causes the loop
            s3.put_object(Bucket=bucket, Key=file_key, Body=uploadByteStream)
            return True
        except Exception as e:
            print(e)
            print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
            raise e

2 Answers


I believe you have created an event trigger on S3 and associated it with the Lambda, so when you replace the file the Lambda is triggered again and it becomes a loop.

There could be 2 ways to handle it:

1. Configure a PUT or POST event type (whichever suits your case) to trigger the Lambda. Now save the updated file at another location and then copy it to the original one. Doing this, S3 will generate an "s3:ObjectCreated:Copy" event, which will not invoke the Lambda again.

    # Copying the file from the secondary location to the original location
    copy_sr = {
        "Bucket": bucket,
        "Key": file_key_copy
    }

    s3_resource.meta.client.copy(copy_sr, final_bucket, file_key_copy)

    # Deleting the file from the secondary location
    s3_client.delete_object(Bucket=bucket, Key=file_key_copy)

2. Use an SQS queue and configure it not to process any message received twice within a specified period of time (depending on how frequently the file gets updated).


1 Comment

The first option was really cool. It worked. Thanks.

This demonstrates how to read a file and replace it after editing. It can act as skeleton code.

import boto3
import io


client = boto3.client('s3')
res = boto3.resource('s3')

def lambda_handler(event, context):

    file_key = event['file_key']
    file_obj = res.Object("bucket_name", file_key)

    content_obj = file_obj.get()['Body'].read().decode('utf-8')  # fetching the data

    res.Object("bucket_name", file_key).delete()  # here you are deleting the old file

    ###### Perform your operation and save the result in new_data ######

    new_file = io.BytesIO(new_data.encode())

    client.upload_fileobj(new_file, "bucket_name", file_key)  # uploading the file at the exact same location

4 Comments

The last line uploads to the same location. Will it not invoke the Lambda again and go into an infinite loop?
@Jeeva In the skeleton code, I am receiving the key as input from the event, using that key to read the file from S3 and store its contents in content_obj, and then deleting the file. Once a certain operation on content_obj is done and the result is stored in new_data, I convert that data to bytes and upload it to the exact same location in S3.
I tried your skeleton code and it still goes into a loop.
@Jeeva It seems the problem is in the update part of the file.
