Lifecycle of a Message in Amazon SQS – A Detailed Coverage

In case you haven’t been following this blog series on Amazon SQS: one earlier post looked at the mechanics of consuming messages from Amazon SQS, and another covered the concepts and implementation details of Delivery Delays with Amazon SQS.

As you may agree, the most important part of messaging is the messages themselves and their reliable delivery and processing. When a message is sent, it should reach the appropriate consumer in time, the consumer should be able to read and process it within the allotted time, and the queuing system, acting as the broker, should stay healthy while it continuously facilitates the sending and receiving of messages. The messaging system (broker/provider) therefore carries a lot of responsibility: it has to provide reliable infrastructure to move a message from source to destination, manage and maintain state about each message along the way, and offer mechanisms to triage messages that cannot be processed. Much of this is taken care of automatically by the messaging provider, in this case Amazon SQS, but it has to be configured appropriately to suit the needs of the business. After all, we do not want our messages to get lost or go unprocessed.

It is therefore crucial to understand which stages/states a message goes through (or can go through) from the time it is sent to the time it is consumed and/or discarded, and which factors influence that state. In this post, I hope to provide comprehensive coverage of the complete lifecycle of a message when using Amazon SQS as a queuing system. The following sections describe the various stages in the lifecycle of a message, followed by a graphical representation of the complete flow in the form of a flowchart.

A New Message

It all starts when a message is sent. The sender sends the message to an Amazon SQS queue, where it waits for potential consumers. Amazon SQS queues follow the point-to-point messaging model, so a message can be consumed by only a single consumer at any given point in time (the ending clause ‘at any given point in time’ is important; more on this later when we talk about Message Visibility Timeout).

Delivery of Messages can be delayed…!

Alright, so a message is sent but we do not want the message to be consumed by any of its consumers right away. This may be due to the fact that the message may make more sense when it is consumed with a delay, for example, to avoid a race condition or to cater to another dependency.

One way to achieve this is to ask the SQS queue consumers not to pick up the message(s) for some time, but that may not be practical and is not in our control, hence error prone. So, how about controlling this behavior at the message broker (Amazon SQS) level itself?

Yes, that is possible, and there are a couple of options to do that in Amazon SQS. The behavior of Delivery Delay varies by queue type (standard queues and FIFO queues), existing messages are handled differently, and there are multiple mechanisms with subtle differences and customization options for introducing a delay. The topic warranted a complete blog post of its own, so it is not repeated here. Please feel free to read through the Delivery Delay blog post to understand the details.

Messages have an expiry too…!

Like everything in life, messages in Amazon SQS have a lifetime too, and it is enforced (you can’t get away from it). Amazon SQS is a cloud-based, high-throughput messaging system, so it cannot let messages pile up in queues without being consumed, or, to be more precise, without being drained from the queue. This is also important from the perspective that with Amazon SQS standard queues, strict message ordering is not guaranteed.

Key points to note:

  • A queue can be configured so that all messages sent to that queue have a certain expiry period. This can be done while creating the queue, or at a later point by changing the queue configuration through any of the mechanisms supported by Amazon SQS, including the AWS console, CLI, CloudFormation and SDKs.
  • The queue attribute for this setting is called MessageRetentionPeriod, its value is specified in seconds, and can range from 60 seconds (1 minute) to 1209600 seconds (14 days).
  • The default value for a newly created queue is 345,600 seconds (4 days), unless modified.
  • There is no way to set the expiry period at a message level.

Let us take a look at an example of setting this value to 14 days for a queue using the Java SDK.
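A minimal sketch of this, using only the JDK so it runs without AWS credentials: `RetentionAttributes` is a hypothetical helper (not part of any AWS SDK) that validates the range listed above and builds the attribute map for `MessageRetentionPeriod`. The comment in `main` shows how that map could be handed to the AWS SDK for Java 1.x, assuming an `AmazonSQS` client named `sqs` and a `queueUrl` of your own.

```java
import java.time.Duration;
import java.util.Map;

/** Hypothetical helper (not part of any AWS SDK) that prepares the retention attribute. */
final class RetentionAttributes {
    // Bounds stated above: 60 seconds (1 minute) to 1209600 seconds (14 days).
    static final long MIN_SECONDS = 60;
    static final long MAX_SECONDS = 1_209_600;

    /** Returns the queue attribute map for the given retention period, or throws if out of range. */
    static Map<String, String> forRetention(Duration retention) {
        long seconds = retention.getSeconds();
        if (seconds < MIN_SECONDS || seconds > MAX_SECONDS) {
            throw new IllegalArgumentException(
                "MessageRetentionPeriod must be between 60 and 1209600 seconds, got " + seconds);
        }
        // SQS queue attribute values are passed as strings.
        return Map.of("MessageRetentionPeriod", Long.toString(seconds));
    }

    public static void main(String[] args) {
        Map<String, String> attributes = forRetention(Duration.ofDays(14));
        System.out.println(attributes); // {MessageRetentionPeriod=1209600}
        // With the AWS SDK for Java 1.x on the classpath, the map could be applied with:
        //   sqs.setQueueAttributes(new SetQueueAttributesRequest()
        //           .withQueueUrl(queueUrl)
        //           .withAttributes(attributes));
    }
}
```

Validating on the client side is optional, since SQS rejects out-of-range values anyway, but it surfaces a bad configuration before any network call is made.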

 

I can’t process a message, what do I do…!

There could be situations where a queue contains a message that none of its consumers can process, maybe because they do not recognize it at all, or maybe because the message format has evolved over time and there is a mismatch between what is in the queue and what can be processed. There could be other reasons as well, such as an application logic error, a race condition, or infrastructure issues.

Obviously, you cannot stop processing the queue and bring the whole system down for one such poison message. The queue may have thousands of other messages that can be processed and are waiting to be processed.

In such a case, the standard messaging pattern is to move the message to a Dead Letter Queue after a certain number of retries. The Dead Letter Queue is a separate queue that is monitored separately for such messages and can be used for alerting and notifications about such events.

Amazon SQS supports configuring a Dead Letter Queue for every SQS queue you create, and lets you configure a threshold in terms of the number of times a message is retried for processing, failing which, the message is moved to the Dead Letter Queue. This configuration itself is known as Redrive policy and the threshold setting is known as maxReceiveCount within Amazon SQS. Thus, for example, if you have configured the maxReceiveCount setting as 100, Amazon SQS would move a message to the configured Dead Letter Queue only after the message has been read 100 times but still not deleted from the original queue, indicating the consumer(s) are unable to process this message.
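The redrive policy is set on the source queue as a `RedrivePolicy` attribute whose value is a small JSON document. Here is a sketch of building that value with plain Java; `RedrivePolicySketch` is a hypothetical helper, and the ARN used in `main` is a made-up placeholder rather than a real queue.

```java
import java.util.Map;

/** Hypothetical helper that renders the RedrivePolicy queue attribute as a JSON string. */
final class RedrivePolicySketch {

    /** Builds the attribute map to set on the source queue. */
    static Map<String, String> redriveAttributes(String deadLetterQueueArn, int maxReceiveCount) {
        if (maxReceiveCount < 1) {
            throw new IllegalArgumentException("maxReceiveCount must be at least 1");
        }
        // The policy names the Dead Letter Queue by ARN and carries the receive threshold.
        String policy = String.format(
            "{\"deadLetterTargetArn\":\"%s\",\"maxReceiveCount\":\"%d\"}",
            deadLetterQueueArn, maxReceiveCount);
        return Map.of("RedrivePolicy", policy);
    }

    public static void main(String[] args) {
        // Placeholder ARN for illustration only.
        String dlqArn = "arn:aws:sqs:us-east-1:123456789012:orders-dlq";
        System.out.println(redriveAttributes(dlqArn, 5).get("RedrivePolicy"));
        // {"deadLetterTargetArn":"arn:aws:sqs:us-east-1:123456789012:orders-dlq","maxReceiveCount":"5"}
    }
}
```

A small `maxReceiveCount` such as 5 sends poison messages to the Dead Letter Queue quickly; a larger value tolerates more transient failures before giving up. Which is appropriate depends on how often your consumers fail for recoverable reasons.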

I need isolation while processing my messages…!

Finally, the message has passed the initial hurdles and is ready to be made available to consumers for processing. But what if there are multiple consumers looking to consume the message? In most cases, multiple consumers take the form of multiple load-balanced instances of the same consumer application, so that it can process a large number of messages. We would not want the same message to be processed by more than one instance of the same application, each not knowing that the other is also processing it. To overcome this problem, JMS-based messaging brokers let the client/consuming application use CLIENT_ACKNOWLEDGE as the acknowledgement mode. There is no concept of acknowledgement modes within Amazon SQS; however, SQS provides a similar concept called message visibility timeout.

Message Visibility Timeout

So, imagine the case where multiple consumers are reading the queue to consume messages. Whenever a message is returned in response to a read request made by a particular consumer, that message is hidden from other consumers for a specific period of time. This time window is called the Visibility Timeout. The basic idea is that once that particular consumer is done processing the message, it makes a delete call to delete the message. Yes, delete is not automatic; Amazon SQS requires you to delete the message. The mechanism itself is very similar to CLIENT_ACKNOWLEDGE within JMS. The difference is that in the JMS world the consumer acknowledges that it is done processing and the JMS broker takes care of removing the message from the queue, while in Amazon SQS the consumer directly makes a delete call to SQS to indicate that it is done processing.

One key point to take care of is that the delete should happen within the message visibility timeout period, because it is only during this period that the message is unavailable to other consumers. Once the message is visible again, it may be read by other consumers and hence reprocessed. The message is said to be Inflight once it has been read by a consumer and while it is still within the message visibility timeout period.
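To make the in-flight behavior concrete, here is a toy in-memory model of a queue with a visibility timeout. It is only an illustration of the rules described above, not how SQS is implemented: the class name is hypothetical, time is passed in as a plain number of seconds, and messages are deleted by body for brevity, whereas real SQS deletes use the receipt handle returned by the receive call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Toy in-memory model of visibility timeout; an illustration, not the SQS implementation. */
final class VisibilityTimeoutModel {
    private static final class Entry {
        final String body;
        long invisibleUntil; // logical time (seconds) before which the message stays hidden
        Entry(String body) { this.body = body; }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final long visibilityTimeoutSeconds;

    VisibilityTimeoutModel(long visibilityTimeoutSeconds) {
        this.visibilityTimeoutSeconds = visibilityTimeoutSeconds;
    }

    void send(String body) {
        entries.add(new Entry(body));
    }

    /** Returns the first visible message and hides it, which puts it in flight. */
    Optional<String> receive(long now) {
        for (Entry e : entries) {
            if (now >= e.invisibleUntil) {
                e.invisibleUntil = now + visibilityTimeoutSeconds;
                return Optional.of(e.body);
            }
        }
        return Optional.empty();
    }

    /** Consumer-initiated delete; returns true if a message was removed. */
    boolean delete(String body) {
        return entries.removeIf(e -> e.body.equals(body));
    }

    public static void main(String[] args) {
        VisibilityTimeoutModel queue = new VisibilityTimeoutModel(30);
        queue.send("order-1");
        System.out.println(queue.receive(0));   // Optional[order-1]  consumer A reads; message is in flight
        System.out.println(queue.receive(10));  // Optional.empty     hidden from consumer B
        System.out.println(queue.receive(30));  // Optional[order-1]  timeout elapsed without a delete
        queue.delete("order-1");
        System.out.println(queue.receive(100)); // Optional.empty     deleted, gone for good
    }
}
```

The third call is the case to watch for: consumer A did not delete the message in time, so it became visible again and was handed out a second time.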

The following diagram illustrates the concept of Message Visibility Timeout.

In terms of implementation, the visibility timeout can be set at two levels – at the queue level as well as while submitting a read request.

Key points to note:

  • All queues have a default visibility timeout of 30 seconds, which can be changed at the queue level. This can be done when the queue is created, or at a later point by changing the queue attributes using any of the mechanisms supported by Amazon SQS.
  • The maximum period supported is 12 hours.
  • Visibility timeout can also be specified by consumers in the receive message request. A timeout specified this way does not change the queue-level setting; it applies only to the messages returned in response to that particular receive message request.
  • In fact, the visibility timeout of a specific message (or a set of messages) can be extended using the ChangeMessageVisibility action/API to get more time to process a message in case the existing visibility timeout of the read message is not sufficient.

An example to set the Message Visibility Timeout at a queue level for an existing queue is given below. The Visibility Timeout at queue level applies to all messages.

An example to set the Message Visibility Timeout at the request level, for the messages read in a particular read request, is given below.
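Both levels can be sketched with plain Java. `VisibilityTimeoutSettings` is a hypothetical helper: it builds the queue-level `VisibilityTimeout` attribute and resolves which timeout applies to a given read request. The valid range is assumed to be 0 seconds to the 12-hour maximum noted above. The comments show how the values could be passed to the AWS SDK for Java 1.x, assuming an `AmazonSQS` client named `sqs` and your own `queueUrl`.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical helper for the two levels at which visibility timeout can be set. */
final class VisibilityTimeoutSettings {
    static final int MAX_SECONDS = 12 * 60 * 60; // 12 hours = 43200 seconds

    static int validate(int seconds) {
        if (seconds < 0 || seconds > MAX_SECONDS) {
            throw new IllegalArgumentException(
                "VisibilityTimeout must be between 0 and " + MAX_SECONDS + " seconds, got " + seconds);
        }
        return seconds;
    }

    /** Queue level: the attribute applies to all messages read from the queue. */
    static Map<String, String> queueLevelAttributes(int seconds) {
        // SDK 1.x: sqs.setQueueAttributes(new SetQueueAttributesRequest()
        //              .withQueueUrl(queueUrl).withAttributes(attributes));
        return Map.of("VisibilityTimeout", Integer.toString(validate(seconds)));
    }

    /** Request level: an override applies only to messages returned by that one request. */
    static int effectiveTimeout(int queueDefaultSeconds, OptionalInt requestOverrideSeconds) {
        // SDK 1.x: sqs.receiveMessage(new ReceiveMessageRequest(queueUrl)
        //              .withVisibilityTimeout(overrideSeconds));
        return requestOverrideSeconds.isPresent()
            ? validate(requestOverrideSeconds.getAsInt())
            : queueDefaultSeconds;
    }

    public static void main(String[] args) {
        System.out.println(queueLevelAttributes(60));                    // {VisibilityTimeout=60}
        System.out.println(effectiveTimeout(60, OptionalInt.empty()));   // 60
        System.out.println(effectiveTimeout(60, OptionalInt.of(300)));   // 300
    }
}
```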

A picture is worth a thousand words

Alright, so we have looked at so many aspects around the states/stages that a message goes through from the time it is sent to the time it is consumed or discarded and in the process, learnt concepts, implementation details, workarounds and much more. The following diagram tries to depict all of the above information through a flowchart for easier understanding and reference.

In this diagram, the green boxes denote external actions taken by producers or consumers of the message. The structures in blue denote Message states and the amber ones denote internal SQS processes.

We have covered a lot through this blog, and hopefully it has helped you visualize the complete picture in terms of what states a message goes through (or can go through) when you are using Amazon SQS for messaging.

Feedback and suggestions are most welcome, as always.

Happy learning, Happy sharing!!!

– Amit


5 Comments

  1. Hey Amit,

    Thanks for the detailed explanation. I would like to understand if we can retrieve messages based on a certain key/attribute at the consumers.

    For example: consumer 1 receives only the messages with key = ‘a’ and consumer 2 receives only the messages with key = ‘b’ etc

    Any solution would be highly appreciated.

    Kr,
    Ravi

    1. Hi Ravi,

      Thanks for your feedback.
      In terms of fetching messages selectively based on filters, the SQS service itself does not support this feature. One can build such filtering logic within the consumers, or use an Enterprise Application Integration framework such as Spring Integration or Apache Camel. These frameworks do the same thing (implement filtering on top of the SQS API), but using one frees the consumers from implementing the filtering on their own.

      Based on the use case and the nature of the filters (fixed/limited or completely dynamic/uncontrolled), one could also use an SNS-SQS combination to achieve out-of-the-box message filtering. Essentially, set up an SNS topic and have multiple SQS queues subscribe to that topic with subscription filters. The producer sends the message to the SNS topic, and the message automatically gets routed to the appropriate SQS queue(s) based on the subscription filter definitions. The consumers continue to consume from their respective SQS queues, without having to worry about applying filters while consuming.

      Hope this helps!
      Amit

  2. Hi Amit,
    Thank you for such a fruitful piece of information. I am using an AWS SQS FIFO queue with JMS in a Spring Boot application, and I need to add some delay if there is any exception in the @JmsListener, meaning some delay before the retry. Once the retries are exhausted, the message should be moved to the Dead Letter Queue.

    Thanks
