AWS Lambda, retries on timeout, Python SDK

Question

I was invoking a Lambda function synchronously through the Python SDK from a Jupyter notebook. The event I send takes longer to process than the maximum possible timeout limit (15 min).

I noticed that the event is sometimes (not always) re-sent to the Lambda after the timeout error. This goes on and on until I shut down the Lambda by setting its concurrency to 0. It never happens if I lower the timeout limit (e.g., to 10 minutes): the event is never re-sent, and the log shows a single invocation, a single error, and no activity afterwards.

What is going on? How do I rationalize these observations?

Hi, have you looked at setting a maximum retry allocation as well as configuration of a DLQ? More information here: aws.amazon.com/about-aws/whats-new/2019/11/…
– Chris Williams, Commented Jun 3, 2020 at 17:04
@mokugo-devops Yes, of course. But that applies only to asynchronous invocations in my understanding.
– slava-kohut, Commented Jun 3, 2020 at 17:06
Alternatively can you try looking at putting it into a step function and invoking that instead?
– Chris Williams, Commented Jun 3, 2020 at 17:08

Community · Accepted Answer · 2020-06-20 09:12:55Z

1

I recommend turning on DEBUG-level logging and examining the CloudWatch logs when you see the function get executed more than once. I've seen this sometimes, and when I do I usually see log entries from the SDK code itself that tell me it has built-in retry logic that is executing. If the call to invoke the Lambda doesn't get a proper response, the SDK may retry the call. It is possible that the service received the original request and executed it, yet something went wrong with the response, and so the caller re-attempts it.
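One way to surface those SDK log entries on the calling side (for example, in the notebook) is to raise the level of the SDK's loggers with the standard `logging` module. This is a minimal sketch, not the only approach; the helper name `enable_sdk_debug_logging` is made up for illustration, and the logger names assume the usual `boto3` / `botocore` / `urllib3` logger hierarchy.

```python
import logging

# Logger names used by the Python SDK and its HTTP layer (assumed here).
SDK_LOGGER_NAMES = ("boto3", "botocore", "urllib3")


def enable_sdk_debug_logging():
    """Hypothetical helper: emit DEBUG records from the SDK loggers."""
    # Attach a handler to the root logger if none is configured yet.
    logging.basicConfig(
        format="%(asctime)s %(name)s %(levelname)s %(message)s"
    )
    for name in SDK_LOGGER_NAMES:
        logging.getLogger(name).setLevel(logging.DEBUG)


enable_sdk_debug_logging()
```

With this enabled before the `invoke` call, any retry performed by the SDK should show up as additional request attempts in the caller's log output.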

Check out what is said at this link: https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-retry-timeout-sdk/

Note: API calls can take longer than expected when network connection issues occur. Network issues can also cause retries and duplicated API requests. To prepare for these occurrences, your Lambda function must always be idempotent.

If you make an API call using an AWS SDK and the call fails, the SDK automatically retries the call. How long and how many times the SDK retries is determined by settings that vary among each SDK.

That article has tips for troubleshooting or changing config settings.

Also see https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html

edited Jun 20, 2020 at 9:12

CommunityBot


answered Jun 3, 2020 at 22:38

Shawn


2 Comments

slava-kohut Over a year ago

Great info, thanks so much - investigating. I will leave the question open for the time being in case someone else contributes.

Shawn Over a year ago

Are you still waiting for more answers on this one?

Chris Williams · Accepted Answer · 2020-06-03 17:15:51Z

1

Try looking at Step Functions. With a state machine you can control the retry logic for the Lambda invocation and mark the execution as a failure.

If your Lambda function is taking 15 minutes, determine whether you can break it down into smaller Lambda functions and invoke each of these in turn.
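A rough sketch of such a state machine, written as a Python dict that serializes to an Amazon States Language definition. The state names, the error label, and the function ARN are placeholders; the `Retry` entry with `MaxAttempts` set to 0 turns off retries for the task, and the `Catch` entry routes any error to a `Fail` state.

```python
import json


def build_state_machine_definition(function_arn):
    """Hypothetical helper: one Lambda task, no retries, explicit failure."""
    return {
        "StartAt": "InvokeWorker",
        "States": {
            "InvokeWorker": {
                "Type": "Task",
                "Resource": function_arn,
                "TimeoutSeconds": 900,
                # Never retry, whatever the error is.
                "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 0}],
                # Send every error to the Fail state below.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "MarkFailed"}],
                "End": True,
            },
            "MarkFailed": {
                "Type": "Fail",
                "Error": "WorkerFailed",
                "Cause": "The Lambda task failed or timed out.",
            },
        },
    }


# Placeholder ARN, for illustration only.
definition_json = json.dumps(
    build_state_machine_definition(
        "arn:aws:lambda:us-east-1:123456789012:function:my-function"
    ),
    indent=2,
)
```

The resulting JSON string is what would be supplied as the state machine definition; splitting the work into several smaller functions would add one `Task` state per function, chained with `Next`.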

answered Jun 3, 2020 at 17:15

Chris Williams
