AWS Lambda Retry Behaviors

Jun 17, 2022

In this article, I'll explain when and why a Lambda function is retried.

#1: When will Lambda retry?

https://docs.aws.amazon.com/lambda/latest/operatorguide/sqs-retries.html

Before answering that question, we need to understand how AWS Lambda retries and which kinds of event sources support retries. There are three types of event source, each with its own retry behavior:

1) Synchronous events (API Gateway, Cognito, CloudFormation, CloudFront)

When Lambda retries: Lambda does not retry a synchronous invocation on its own. When you invoke a function directly, you check the response for errors and decide whether to retry. The AWS CLI and AWS SDK also automatically retry on client timeouts, throttling (429), and service errors (500-series).
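For a synchronous invocation the caller owns the retry loop. Here is a minimal sketch of that idea in Python. The `invoke` callable is hypothetical (in real code it would wrap an SDK or HTTP call and return a status code and payload), and the helper name, attempt count, and delays are illustrative assumptions, not AWS defaults.

```python
import time
from typing import Callable, Tuple


def is_retriable(status: int) -> bool:
    """Throttling (429) and service errors (5xx) are worth retrying."""
    return status == 429 or 500 <= status <= 599


def invoke_with_retry(
    invoke: Callable[[], Tuple[int, str]],  # hypothetical: returns (status, payload)
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `invoke` until it succeeds, fails permanently, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, payload = invoke()
    for attempt in range(1, max_attempts):
        if not is_retriable(status):
            break  # success or a client error: retrying will not help
        # Exponential backoff between attempts: 0.5s, 1s, 2s, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, payload = invoke()
    return status, payload
```

A 4xx response other than 429 is returned immediately, because a bad request will fail the same way on every attempt.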

2) Asynchronous events (S3, SNS, SES, CloudWatch Events, ...)

When you invoke a function asynchronously, you don’t wait for a response from the function code. You hand off the event to Lambda and Lambda handles the rest. You can configure how Lambda handles errors and can send invocation records to a downstream resource to chain together components of your application.

When Lambda retries: Lambda manages the function's asynchronous event queue and attempts to retry on errors. If the function returns an error, Lambda attempts to run it two more times, with a one-minute wait between the first two attempts, and two minutes between the second and third attempts. Function errors include errors returned by the function's code and errors returned by the function's runtime, such as timeouts.
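To make the timing concrete, here is a small sketch that computes when each attempt starts, relative to the first one. The function name is my own, and the calculation ignores how long the function itself runs, so real gaps will be slightly longer.

```python
from typing import List

# Waits described above: 1 minute before the first retry, 2 minutes before the second.
RETRY_WAITS_SECONDS = [60, 120]


def attempt_offsets(retry_attempts: int = 2) -> List[int]:
    """Seconds after the first attempt at which each attempt starts."""
    if not 0 <= retry_attempts <= 2:
        raise ValueError("retry_attempts must be between 0 and 2")
    offsets = [0]
    for wait in RETRY_WAITS_SECONDS[:retry_attempts]:
        offsets.append(offsets[-1] + wait)
    return offsets


print(attempt_offsets(2))  # [0, 60, 180]
```

With the default of two retries, a failing event is attempted three times over roughly three minutes.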

If the function doesn't have enough concurrency available to process all events, additional requests are throttled. For throttling errors (429) and system errors (500-series), Lambda returns the event to the queue and attempts to run the function again for up to 6 hours. The retry interval increases exponentially from 1 second after the first attempt to a maximum of 5 minutes. If the queue contains many entries, Lambda increases the retry interval and reduces the rate at which it reads events from the queue.
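The backoff for throttling and system errors can be sketched roughly as follows. The 1-second start, 5-minute cap, and 6-hour limit come from the description above. The doubling factor is my assumption, since only "exponentially" is stated, and the sketch ignores the extra slowdown Lambda applies when the queue is long.

```python
from typing import List

MAX_EVENT_AGE_SECONDS = 6 * 60 * 60  # events are retried for up to 6 hours
FIRST_INTERVAL_SECONDS = 1
MAX_INTERVAL_SECONDS = 5 * 60


def backoff_intervals(factor: float = 2.0) -> List[float]:
    """Retry intervals until the next wait would exceed the maximum event age.

    `factor` is an assumption: only the start and the cap are documented here.
    """
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    intervals: List[float] = []
    elapsed = 0.0
    interval = float(FIRST_INTERVAL_SECONDS)
    while elapsed + interval <= MAX_EVENT_AGE_SECONDS:
        intervals.append(interval)
        elapsed += interval
        # Grow the wait, but never beyond the 5-minute cap.
        interval = min(interval * factor, MAX_INTERVAL_SECONDS)
    return intervals
```

Under these assumptions the interval reaches the cap after about nine retries, and almost all of the 6 hours is spent retrying once every 5 minutes.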

Configuring error handling for asynchronous invocation: you can set these two options when you create the function, or change them later.

Maximum age of event: the maximum amount of time Lambda retains an event in the asynchronous event queue, up to 6 hours.
Retry attempts: the number of times Lambda retries when the function returns an error, between 0 and 2.
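As a sketch, these two settings can be validated and then applied with boto3's `put_function_event_invoke_config`. The helper names and the function name `my-function` are hypothetical. The 60-second lower bound is the minimum the AWS API accepts as far as I know, and boto3 is imported lazily so the validation part runs without it.

```python
from typing import Any, Dict


def async_invoke_config(
    function_name: str,
    max_event_age_seconds: int = 21600,  # 6 hours, the upper limit
    max_retry_attempts: int = 2,
) -> Dict[str, Any]:
    """Validate the two settings and build the request parameters."""
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("maximum event age must be 60..21600 seconds")
    if not 0 <= max_retry_attempts <= 2:
        raise ValueError("retry attempts must be 0..2")
    return {
        "FunctionName": function_name,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
        "MaximumRetryAttempts": max_retry_attempts,
    }


def apply_async_invoke_config(params: Dict[str, Any]) -> None:
    """Send the settings to AWS; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the validation above works without it

    boto3.client("lambda").put_function_event_invoke_config(**params)
```

For example, `apply_async_invoke_config(async_invoke_config("my-function", 3600, 0))` would keep events for at most one hour and turn off retries on function errors.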

3) Stream- and queue-based events (DynamoDB Streams, SQS, …)

When Lambda retries: Lambda invokes the function again with the same records until the function returns successfully or the data expires. While a batch keeps failing, it blocks the event source: later records are not processed until the batch succeeds or expires.

With stream-based events, you can configure the batch size, the number of retries, and some other settings:

MaximumRetryAttempts: (Streams only) Discard records after the specified number of retries.

BatchSize: The maximum number of records in each batch that Lambda pulls from your stream or queue and sends to your function. Lambda passes all of the records in the batch to the function in a single call, up to the payload limit for synchronous invocation (6 MB).

Default value: Varies by service. For Amazon SQS, the default is 10. For all other services, the default is 100.

Related setting: For Amazon SQS event sources, when you set BatchSize to a value greater than 10, you must set MaximumBatchingWindowInSeconds to at least 1.
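The settings above interact, so it can help to check them before creating the mapping. The sketch below builds a parameter dict whose keys match the keyword arguments of boto3's `create_event_source_mapping`. The helper name is hypothetical, and it applies the batching-window rule only to SQS sources, which is my reading of the AWS documentation.

```python
from typing import Any, Dict, Optional


def event_source_mapping_params(
    event_source_arn: str,
    function_name: str,
    batch_size: int = 10,
    batching_window_seconds: int = 0,
    max_retry_attempts: Optional[int] = None,  # streams only
) -> Dict[str, Any]:
    """Check the related settings and build the request parameters."""
    # ARN layout: arn:partition:service:region:account:resource
    is_sqs = event_source_arn.split(":")[2] == "sqs"
    if batch_size < 1:
        raise ValueError("BatchSize must be at least 1")
    if is_sqs and batch_size > 10 and batching_window_seconds < 1:
        raise ValueError(
            "BatchSize > 10 requires MaximumBatchingWindowInSeconds >= 1"
        )
    if is_sqs and max_retry_attempts is not None:
        raise ValueError("MaximumRetryAttempts applies to streams only")
    params: Dict[str, Any] = {
        "EventSourceArn": event_source_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": batching_window_seconds,
    }
    if max_retry_attempts is not None:
        params["MaximumRetryAttempts"] = max_retry_attempts
    return params
```

The result could then be passed as `boto3.client("lambda").create_event_source_mapping(**params)`.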

If the function throws an error, the entire batch is reprocessed until it runs successfully or until all messages in the event source (queue or stream) have expired.
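The effect of whole-batch reprocessing is easy to see in a small local simulation. This is not how Lambda is implemented. `deliver_batch` is a hypothetical stand-in for the event source mapping, and the record values are made up.

```python
from collections import Counter
from typing import Callable, List


def deliver_batch(
    handler: Callable[[List[str]], None],
    batch: List[str],
    max_retry_attempts: int,
) -> int:
    """Invoke `handler` with the whole batch until it succeeds or retries run out.

    Returns the number of invocations made.
    """
    invocations = 0
    for _ in range(max_retry_attempts + 1):  # first try plus retries
        invocations += 1
        try:
            handler(batch)
            return invocations
        except Exception:
            continue  # any error sends the entire batch back for another try
    return invocations  # retries exhausted: the batch is discarded


seen = Counter()


def handler(records: List[str]) -> None:
    for record in records:
        seen[record] += 1  # the side effect happens before the failure
        if record == "bad":
            raise ValueError("cannot process record")


attempts = deliver_batch(handler, ["a", "bad", "c"], max_retry_attempts=2)
print(attempts, dict(seen))  # 3 {'a': 3, 'bad': 3}
```

One bad record causes the healthy record "a" to be processed three times, while "c" is never reached, which is why handlers for batched sources should tolerate seeing the same record more than once.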

#2: What problems in your code make Lambda retry?

Your Lambda function can fail in several ways, and each of the errors below causes Lambda to invoke it again:

Out of memory: the function needs more memory than its configured limit, for example when it reads a large file from S3 or an external service and processes it in memory. (With the Serverless Framework, you set the memory size in the serverless.yml file.)

In CloudWatch you will see a log line like: Memory Size 384 MB Max Memory Used: 400 MB

Timeout exception: your code waits on some process and the processing time exceeds the configured timeout (example: Task timed out after 30.00 seconds)
Coding mistake: an unhandled exception caused by a bug in your code also counts as a function error, so Lambda retries it as well (which can be surprising)
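When reading logs, the first two failure types can be told apart from the messages quoted above. The sketch below is a heuristic of my own, not an AWS tool: it treats "Max Memory Used" at or above "Memory Size" as a sign of memory exhaustion, and the function name and return labels are hypothetical.

```python
import re
from typing import Optional

# Patterns based on the log messages quoted in this article.
TIMEOUT_RE = re.compile(r"Task timed out after (\d+(?:\.\d+)?) seconds")
MEMORY_RE = re.compile(r"Memory Size:? (\d+) MB\s+Max Memory Used: (\d+) MB")


def classify_failure(log_line: str) -> Optional[str]:
    """Return 'timeout', 'out_of_memory', or None if the line matches neither."""
    if TIMEOUT_RE.search(log_line):
        return "timeout"
    match = MEMORY_RE.search(log_line)
    # Heuristic: memory used at or above the configured size suggests exhaustion.
    if match and int(match.group(2)) >= int(match.group(1)):
        return "out_of_memory"
    return None
```

Anything that returns `None` here but still failed is most likely an exception raised by the code itself, so the stack trace in the same log stream is the next place to look.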

In the next article, I'll show how to handle errors in Lambda and avoid unnecessary retries.