
Kafka Retries and Maintaining the Order of Retry Events

In a microservices system, retries are crucial, especially when a downstream service goes down. In an event-driven architecture where services consume events from Kafka, retry strategies become more complex, because a retried event can be overtaken by later events for the same entity. This article explores the problem of maintaining the order of retry events and proposes a non-blocking retry approach.

The article applies this approach to Altitude, a hotel digital platform that uses Kafka as a stream processor for automatic key management. Altitude's key service consumes events from Kafka and sends requests to the key server. Because the key server can go down for a few seconds at a time, a retry policy is essential to keep reservations managed seamlessly. The article compares the simple retry and non-blocking retry strategies, shows how non-blocking retry reduces the waiting time of events in the worst-case scenario, and concludes with practical tips on implementing non-blocking retry in Kafka consumers.
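To make the idea concrete before the details, here is a minimal in-memory sketch of one common way to retry without blocking while keeping per-key order. It is an illustration under assumptions, not the Altitude implementation: the function names `consume_non_blocking` and `make_flaky_send`, the `(key, payload)` event shape, and the `max_rounds` limit are all hypothetical, and the per-key `pending` queues stand in for what would typically be a retry topic in a real Kafka deployment.

```python
from collections import deque


def make_flaky_send(fail_key, failures):
    """Hypothetical stand-in for the key server: fails `failures` times for one key."""
    remaining = {"n": failures}

    def send(key, payload):
        if key == fail_key and remaining["n"] > 0:
            remaining["n"] -= 1
            return False  # simulated outage
        return True

    return send


def consume_non_blocking(events, send, max_rounds=3):
    """Process (key, payload) events without letting one failing key block the others.

    Returns (delivered, dead_letter), both lists of (key, payload).
    """
    delivered = []
    pending = {}  # key -> deque of payloads awaiting retry, in arrival order

    # Main pass: a failed event is parked, and so is every later event
    # with the same key, so that it cannot overtake the failed one.
    for key, payload in events:
        if key in pending:
            pending[key].append(payload)
        elif send(key, payload):
            delivered.append((key, payload))
        else:
            pending[key] = deque([payload])

    # Retry passes: for each key, retry strictly from the head of its queue
    # and stop at the first failure to preserve order within that key.
    for _ in range(max_rounds):
        for key in list(pending):
            queue = pending[key]
            while queue and send(key, queue[0]):
                delivered.append((key, queue.popleft()))
            if not queue:
                del pending[key]
        if not pending:
            break

    # Anything still pending after max_rounds is handed off as dead letters.
    dead_letter = [(k, p) for k, q in pending.items() for p in q]
    return delivered, dead_letter
```

With events for two reservations interleaved and the simulated server failing twice for one of them, the healthy reservation's events are delivered immediately, while the failing reservation's events are delivered later but still in their original order. A blocking retry would instead hold every event behind the failed one until it succeeds.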