How To Handle Failure In AWS Lambda Using DLQ (Mechanism And Setup)

2021.12.20

This post explains how AWS Lambda handles failures and automatic retries for asynchronous invocations using a dead-letter queue (DLQ), and walks through setting up a Lambda function with a DLQ.

Mechanism for handling Lambda function execution failures

  • A DLQ (dead-letter queue) helps prevent message loss when a Lambda function invocation fails.
  • An SNS topic is used to publish messages. Each time you publish a message to the topic, SNS delivers it asynchronously to the topic's subscribers.
  • SNS invokes the Lambda function asynchronously. Lambda first places the event in an internal queue that only the Lambda service can see and communicate with.
  • Lambda then tries to process the events in the internal queue. If the execution fails, for example because of a runtime exception or a timeout, the event is returned to the internal queue and retried. By default Lambda makes up to 3 attempts in total (the initial attempt plus 2 retries), waiting 1 minute before the first retry and 2 minutes before the second.
  • After the final failed attempt, Lambda abandons the request and sends it to the DLQ. Every message that the function fails to process is eventually delivered to the DLQ.
  • The DLQ target can be either an SNS topic or an SQS queue. With SNS you can broadcast the failed message, for example to an email address or a mobile device. Use SQS if you need to hold the messages for reprocessing: after fixing the problem in the Lambda function, or changing its configuration, you can reprocess them later.
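
The mechanism above can be illustrated with a small local simulation. This is only a sketch of the behaviour, not how the Lambda service is implemented: the `AsyncInvoker` class, its fields, and the `always_times_out` handler are hypothetical names made up for this example, and the retry delays are recorded rather than actually waited for.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Delays Lambda waits before each retry of an asynchronous invocation
# (1 minute, then 2 minutes). The simulation only logs them.
RETRY_DELAYS_SECONDS = [60, 120]


@dataclass
class AsyncInvoker:
    """Hypothetical stand-in for Lambda's internal queue plus retry logic."""
    handler: Callable[[dict], None]
    max_retries: int = 2  # assumed default: 2 retries, so 3 attempts in total
    dead_letter_queue: List[dict] = field(default_factory=list)
    attempts_log: List[str] = field(default_factory=list)

    def invoke(self, event: dict) -> bool:
        """Run the handler; return True on success, False if the event hit the DLQ."""
        for attempt in range(1 + self.max_retries):
            try:
                self.handler(event)
                self.attempts_log.append(f"attempt {attempt + 1}: ok")
                return True
            except Exception as exc:
                self.attempts_log.append(f"attempt {attempt + 1}: failed ({exc})")
                if attempt < self.max_retries:
                    index = min(attempt, len(RETRY_DELAYS_SECONDS) - 1)
                    delay = RETRY_DELAYS_SECONDS[index]
                    self.attempts_log.append(f"waiting {delay}s before retry")
        # Every attempt failed: hand the event over to the dead-letter queue.
        self.dead_letter_queue.append(event)
        return False


def always_times_out(event: dict) -> None:
    """Hypothetical handler that fails on every call, like a function that times out."""
    raise TimeoutError("Task timed out after 2.00 seconds")
```

Calling `AsyncInvoker(handler=always_times_out).invoke({"Message": "hello"})` returns `False` and leaves the event in `dead_letter_queue`, which mirrors what the real DLQ receives once the retries are exhausted.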

Setup

  1. First, go to the AWS Management Console and create the IAM role for the Lambda function. Attach the following permission policies.

  2. Create the Lambda function.

  3. Write the Lambda code to test the failure condition. Deploy the code and test it.

    • To simulate a failure, add a sleep of 3 seconds to the code, so that Lambda sleeps for 3 seconds before it processes the message.
    • Go to General configuration and edit the basic settings of the Lambda function. Set the timeout to 2 seconds and select the existing execution role (the IAM role created earlier for this function).
    • Lambda reports a timeout error, because the code needs 3 seconds to run but the configured timeout is 2 seconds. The invocation therefore always fails.
  4. Create an SNS topic to publish the message and click Create topic.

    • The topic is created.
    • Now create the subscription. Select AWS Lambda as the protocol (the type of endpoint), and select the ARN of the Lambda function created earlier as the endpoint. Then click Create subscription.
  5. Create the SQS dead-letter queue (DLQ).
    • Go to the Amazon SQS service and create a queue. Give the queue a name and keep the remaining settings as they are.
  6. Go back to the Lambda function. You can see that SNS has been added to the function as a trigger. Because SNS invokes Lambda asynchronously every time a message is published to the topic, the next step is to configure asynchronous invocation for the function.
    • Go to Configuration, select Asynchronous invocation, and click Edit. To observe the retry behaviour, set the maximum number of retry attempts Lambda makes when the function returns an error.
    • Unprocessed messages can be sent to an SQS queue or an SNS topic. To deliver them to the SQS dead-letter queue, select Amazon SQS as the dead-letter queue service and choose the DLQ created earlier.
  7. Go to SNS and publish a message.

  8. With this configuration, Lambda retries the message up to 2 times. If it still cannot process the message, the message is eventually delivered to the DLQ. To check that it arrived, go to the DLQ, click Send and receive messages, and start polling for messages. The message appears in the DLQ. You can also check the CloudWatch logs for the log details.
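
For readers who prefer code to console clicks, here is a rough Python sketch of steps 3 and 6. It is not the exact code used in this post. The `SLEEP_SECONDS` environment variable and the `configure_failure_handling` helper are hypothetical names introduced for illustration. The helper expects a boto3 Lambda client to be passed in and assumes the function and the queue already exist.

```python
import os
import time

# --- Part 1: function code deployed to Lambda (step 3) ---
# Sleeping longer than the 2-second timeout forces every invocation to fail.
# SLEEP_SECONDS is a hypothetical environment variable used for illustration.
SLEEP_SECONDS = float(os.environ.get("SLEEP_SECONDS", "3"))


def lambda_handler(event, context):
    """Sleep, then read the message text from each SNS record in the event."""
    time.sleep(SLEEP_SECONDS)
    records = event.get("Records", [])
    messages = [record["Sns"]["Message"] for record in records]
    print(f"processed {len(messages)} message(s)")
    return {"processed": messages}


# --- Part 2: helper run from your own machine (steps 3 and 6) ---
def configure_failure_handling(lambda_client, function_name, dlq_arn,
                               timeout_seconds=2, max_retries=2):
    """Set the timeout, the DLQ target, and the async retry limit.

    lambda_client is assumed to be a boto3 Lambda client,
    e.g. boto3.client("lambda").
    """
    # Timeout and dead-letter queue are part of the function configuration.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
        DeadLetterConfig={"TargetArn": dlq_arn},
    )
    # The retry limit belongs to the asynchronous invocation settings.
    lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=max_retries,
    )
```

A call would look like `configure_failure_handling(boto3.client("lambda"), "my-function", dlq_arn)`, where `my-function` is a placeholder name and `dlq_arn` is the ARN of the queue from step 5. The function's execution role needs `sqs:SendMessage` permission on that queue, otherwise Lambda cannot deliver failed events to it.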

Conclusion

Failures do happen occasionally: sometimes something goes wrong with the Lambda function code or with the events it receives. To cope with this, you need a plan for retrieving the messages that failed to process. Setting up a DLQ is therefore a good way to avoid the worst-case situations and make the system more reliable.
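
As one possible shape for such a plan, the sketch below drains an SQS dead-letter queue and hands each message body to a reprocessing callback. The `drain_dlq` function and its `reprocess` parameter are hypothetical names. `sqs_client` is assumed to be a boto3 SQS client, and the batch limit is an arbitrary choice for this example.

```python
def drain_dlq(sqs_client, queue_url, reprocess, max_batches=10):
    """Receive messages from the DLQ, reprocess them, and delete the ones that succeed.

    sqs_client is assumed to be a boto3 SQS client, e.g. boto3.client("sqs").
    reprocess is any callable that takes the message body as a string.
    Returns the number of messages handled.
    """
    handled = 0
    for _ in range(max_batches):
        response = sqs_client.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
            WaitTimeSeconds=5,       # long polling to avoid empty responses
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # queue looks empty, stop polling
        for message in messages:
            # If reprocess raises, the message is not deleted and becomes
            # visible in the queue again after its visibility timeout.
            reprocess(message["Body"])
            sqs_client.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
            handled += 1
    return handled
```

Deleting a message only after `reprocess` returns keeps a failed reprocessing attempt from losing the message a second time.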