How to trigger AWS Lambda function when multiple files in S3 are ready

Question

I am trying to build a service with AWS Lambda/S3 that takes a user's email as input and replies with an email carrying a PDF attachment. The final PDF I send to the user is generated by merging two types of PDFs that I generate earlier in the process based on the input email. The full architecture is shown in the diagram below.

Diagram of Architecture

The issue I am encountering concerns the Merge PDFs Lambda function, which takes in the type 1 and type 2 PDFs and produces a type 3 PDF. I need it to trigger once a complete set of type 1 and 2 PDFs is ready and waiting in S3. For example, a user sends an email and the Parse Email function kicks off the production of one type 2 PDF and fifty type 1 PDFs; as soon as these 51 PDFs are generated, I want the Merge PDFs function to run. How do I get an AWS Lambda function to trigger once a set of multiple files in S3 is ready?

Can it be assumed that the single type 2 PDF will always be ready before the full set of type 1 PDFs is?
– K Mo, Commented Nov 18, 2019 at 21:27
It is highly likely that the single type 2 PDF will be ready before the full set of type 1 PDFs, because it is significantly lighter and there is only one. That said, there is no rigorous guarantee that it finishes before the full set of type 1 PDFs. In other words, yeah, probably, but relying on that feels kind of hacky.
– Caleb Schoepp, Commented Nov 18, 2019 at 21:54
OK, second question: are the type 1 PDFs conveniently numbered, e.g. typeone1.pdf through typeone50.pdf? Or can they be?
– K Mo, Commented Nov 18, 2019 at 22:04
Yeah, I can name any of the PDFs whatever I need to in order to make this work. I would probably also want to include the initial email request name in each PDF's name to scope it.
– Caleb Schoepp, Commented Nov 18, 2019 at 23:18
Also, I should add that there is a chance that type 1 PDFs could fail to generate (broken URLs etc.), and this should be handled gracefully.
– Caleb Schoepp, Commented Nov 18, 2019 at 23:19

K Mo · Accepted Answer · 2019-11-19 00:34:06Z

There is no trigger that I am aware of that waits for several things to be put into S3 in one or more buckets before raising an event.

I originally thought about using an S3 trigger that fires when a file with the suffix '50.pdf' is created, but that leaves a lot of issues around what finishes first and what happens if something50.pdf fails to generate. If you do want to go down that route, AWS has good documentation on filtering S3 event notifications by key suffix.
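For reference, a minimal sketch of what the suffix-filtered trigger could look like. The helper name `build_suffix_trigger_config` and the ARN are hypothetical; the dictionary follows the notification configuration shape that boto3's `put_bucket_notification_configuration` accepts. It inherits the ordering and failure problems described above.

```python
def build_suffix_trigger_config(lambda_arn, suffix):
    """Build an S3 notification config that fires a Lambda only for
    newly created objects whose key ends with `suffix`."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }


# Hypothetical ARN, for illustration only.
config = build_suffix_trigger_config(
    "arn:aws:lambda:us-east-1:123456789012:function:merge-pdfs", "50.pdf"
)

# Applying it would look roughly like this (requires boto3 and credentials):
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="my-type1-bucket", NotificationConfiguration=config)
```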

An alternative would be to have the Lambdas that generate the type 1 and 2 PDFs invoke the Merge PDF Lambda once they have finished their processing.

You would need some external state held somewhere (like a db) that recorded an id (which could be included in the names of the type 1 and 2 PDFs), whether type 1 PDF generation was complete, and whether type 2 PDF generation was complete.

So the Parse Email Lambda would need to seed a db with a reference before doing its work. The URL to PDF Lambda would then record in the db that it had finished and check whether the HTML to PDF Lambda had also finished. If so, it would invoke the Merge PDF Lambda (probably via SNS); if not, it would simply exit. The HTML to PDF Lambda would do the same thing in reverse: record that it had finished, then check whether the URL to PDF Lambda had finished before either starting the merge or exiting.
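The bookkeeping described here could be sketched roughly as follows. This is only an illustration under several assumptions: a plain dict stands in for the db, the names `JobState`, `seed_job` and `record_finished` are hypothetical, and completion is tracked per PDF so that fifty type 1 PDFs can be counted. Failed PDFs still count as finished, so a broken URL does not block the merge forever. A real db would need the update and the check to happen atomically (for example, a conditional write) so that two Lambdas finishing at the same moment cannot both start the merge.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobState:
    expected: Dict[str, int]                      # e.g. {"type1": 50, "type2": 1}
    finished: Dict[str, int] = field(default_factory=dict)
    failed_keys: List[str] = field(default_factory=list)
    merge_started: bool = False


def seed_job(store, job_id, expected):
    """Called by the Parse Email step before any PDFs are generated."""
    store[job_id] = JobState(
        expected=dict(expected), finished={t: 0 for t in expected}
    )


def record_finished(store, job_id, pdf_type, key, succeeded=True):
    """Called by a generator Lambda when one PDF is done (or has failed).

    Returns True exactly once per job: for the caller that completes the
    set and should therefore invoke the merge step.
    """
    job = store[job_id]
    job.finished[pdf_type] += 1
    if not succeeded:
        job.failed_keys.append(key)  # merge step can skip or report these
    all_done = all(job.finished[t] >= n for t, n in job.expected.items())
    if all_done and not job.merge_started:
        job.merge_started = True     # claim the merge so it runs only once
        return True
    return False


# Usage: one type 2 PDF and three type 1 PDFs, one of which fails.
store = {}
seed_job(store, "email-123", {"type1": 3, "type2": 1})
results = [
    record_finished(store, "email-123", "type2", "email-123/typetwo.pdf"),
    record_finished(store, "email-123", "type1", "email-123/typeone1.pdf"),
    record_finished(store, "email-123", "type1", "email-123/typeone2.pdf", succeeded=False),
    record_finished(store, "email-123", "type1", "email-123/typeone3.pdf"),
]
```

Only the last call returns True, so whichever generator finishes last is the one that kicks off the merge, regardless of whether type 1 or type 2 completes first.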

On a slightly separate note, I'd probably trigger the Clean Buckets Lambda at the end of the Merge PDF Lambda. That way you could have a Check For Unprocessed Work Lambda that ran every hour and sent some form of notification if it found anything in the buckets older than x.
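The age check in such a Check For Unprocessed Work Lambda might look something like the sketch below. The function name `find_stale_keys` is hypothetical, and the input is assumed to have the shape of the `Contents` entries returned by boto3's `list_objects_v2` (a `Key` and a timezone-aware `LastModified`).

```python
from datetime import datetime, timedelta, timezone


def find_stale_keys(objects, now, max_age):
    """Return keys of objects last modified more than `max_age` ago."""
    return [
        obj["Key"] for obj in objects if now - obj["LastModified"] > max_age
    ]


# Usage with hand-made listings standing in for a real bucket listing.
now = datetime(2019, 11, 19, 12, 0, tzinfo=timezone.utc)
listing = [
    {"Key": "email-123/typeone1.pdf", "LastModified": now - timedelta(hours=3)},
    {"Key": "email-456/typeone1.pdf", "LastModified": now - timedelta(minutes=10)},
]
stale = find_stale_keys(listing, now, max_age=timedelta(hours=1))
```

Anything returned here was left behind by a job that never reached the merge and cleanup steps, which is what the notification would report.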
