
I am writing an AWS Lambda function that deletes 100,000 objects from an S3 bucket per invocation. I am trying to see if I can run the deletions on background threads. I have the following code.

import boto3
import boto3.session
from threading import Thread

http_response = []
MAX = 999
threads = []

class myThread(Thread):
    def __init__(self, objects_to_delete, bucket_name):
        Thread.__init__(self)
        self.objects_to_delete = objects_to_delete
        self.bucket_name = bucket_name

    def run(self):
        session = boto3.session.Session()
        s3 = session.client('s3')

        ####
        # COMES HERE AND PRINTS THE NAME OF THE BUCKET.
        ####
        print(self.bucket_name)

        response = s3.delete_objects(Bucket=self.bucket_name, Delete={'Objects': self.objects_to_delete[0:MAX]})

        ####
        # THIS IS NOT GETTING PRINTED. MEANING, delete_objects IS BREAKING / NOT EXECUTING.
        ####
        print(response)


def handler(event, context):
    keys = event['keys']
    bucket_name = event["bucket"]

    if len(keys) == 0 or len(bucket_name) == 0:
        return {
            "message": http_response
        }

    objects_to_delete = [{'Key': key} for key in keys]

    try:
        t = myThread(objects_to_delete[0:MAX], bucket_name)
        t.start()
        threads.append(t)
    except:
        print("Something went wrong!!! " + str(objects_to_delete))

    del keys[0:MAX]

    # hand the remaining keys to the next batch of threads
    handler({'keys': keys, 'bucket': bucket_name}, context)

Is there anything wrong I am doing here? It seems like the thread is starting; however, it's not making the delete_objects call. It's not even returning any error message that would help me diagnose the problem. Any thoughts or ideas?

One more thing: when I run this function locally on my computer, it runs just fine without any problem.

  • Why are you using threading? You can delete up to 1,000 keys at once, and you're making a single request to delete 999. If you were deleting more, I don't see any iteration logic or the boto3 paginator. Commented Nov 10, 2017 at 2:45
  • Sorry, I will be deleting up to 100,000 objects per every Lambda function call. Commented Nov 10, 2017 at 2:50
  • Side-note: An efficient way to delete lots of objects from Amazon S3 is to use a Lifecycle rule that applies to a particular path. Deletions will not be immediate, but they will be done for free. Commented Nov 10, 2017 at 3:27
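The lifecycle-rule approach mentioned in the last comment can be sketched roughly as below. The rule ID and the `exports/` prefix are hypothetical placeholders; the request shape follows boto3's `put_bucket_lifecycle_configuration` API:

```python
# Hypothetical rule: expire every object under the "exports/" prefix
# one day after creation. S3 applies the rule asynchronously, at no cost.
lifecycle = {
    'Rules': [{
        'ID': 'expire-exports',          # hypothetical rule name
        'Filter': {'Prefix': 'exports/'},
        'Status': 'Enabled',
        'Expiration': {'Days': 1},
    }]
}

def apply_expiry_rule(bucket_name):
    # boto3 is imported here so the rule dict above can be inspected
    # without boto3 installed; requires s3:PutLifecycleConfiguration.
    import boto3
    s3 = boto3.client('s3')
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=lifecycle,
    )
```

The trade-off is latency: objects are removed within about a day of becoming eligible rather than at the moment the Lambda runs.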

1 Answer


Turns out that after starting the threads, you should join them, because once the process quits, the threads die as well. So I did the following:

import boto3
from threading import Thread

MAX = 999
threads = []

class myThread(Thread):
    def __init__(self, bucket_name, objects):
        Thread.__init__(self)
        self.bucket_name = bucket_name
        self.objects = objects

    def run(self):
        s3 = boto3.client('s3', region_name="us-east-1")
        response = s3.delete_objects(Bucket=self.bucket_name, Delete={'Objects': self.objects})
        print(response)


def handler(event, context):
    keys = event["keys"]
    bucket_name = event["bucket"]

    # up to 100,000 {'Key': ...} dicts built from the incoming keys
    objects_to_delete = [{'Key': key} for key in keys]

    while len(objects_to_delete) != 0:
        t = myThread(bucket_name, objects_to_delete[0:MAX])
        threads.append(t)
        del objects_to_delete[0:MAX]

    for thread in threads:
        thread.start()

    for thread in threads:
        thread.join()

    return {
        "message": "Success Message."
    }
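The same batch-then-join pattern can be sketched with `concurrent.futures`, which joins implicitly when the executor shuts down. The helper names here are mine, not from the answer, and the S3 client is passed in as a parameter so the batching logic can be exercised without touching AWS:

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(seq, size):
    # Yield successive slices of at most `size` items.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def delete_all(s3, bucket, keys, batch_size=1000, workers=8):
    # delete_objects accepts at most 1,000 keys per request, so the
    # keys are wrapped as {'Key': ...} dicts and sent in batches.
    batches = [[{'Key': k} for k in chunk]
               for chunk in chunked(keys, batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Leaving the `with` block waits for every submitted call,
        # so the handler cannot return before the deletes finish.
        return list(pool.map(
            lambda objs: s3.delete_objects(Bucket=bucket,
                                           Delete={'Objects': objs}),
            batches,
        ))
```

In the handler this would replace the manual thread bookkeeping: `s3 = boto3.client('s3')` followed by `delete_all(s3, bucket_name, keys)`.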

1 Comment

Will the lambda provide a response only after the deletion is complete in your use case?
