Kafka: How to handle duplicate messages



Kafka is "at least once" by default.



What is "At Least Once" in Kafka? (Simple Explanation)

"At least once" means:

  • Kafka guarantees that every message will be delivered,

  • But sometimes the same message might be delivered more than once (duplicate).


🔁 Why it happens:

  • If a consumer crashes or times out, the message gets redelivered on the next poll.

           (By "redelivered" I mean that Kafka allows the consumer to pull the same message again. If you know Kafka's architecture, Kafka does not push/send messages to consumers; the consumer is the one that pulls them, starting from the last committed offset.)
  • If the system didn't record that it already processed the message, it may process it again.


🛠 Example:

  • You (Consumer) receive a payment event.

  • You process it and update the database.

  • But before you commit the offset (telling Kafka "I have successfully processed this message"), the consumer crashes.

  • The offset was never committed, so Kafka still considers the message unconsumed, and the restarted consumer pulls it again.

  • You process the same payment again → duplicate!
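This failure sequence can be sketched as a toy Python simulation (not real Kafka client code; the message shape and names here are made up for illustration):

```python
# Toy simulation of at-least-once delivery: the consumer crashes after
# processing but before committing its offset, so on restart it re-reads
# (and re-processes) the same message.

messages = [{"offset": 0, "payment_id": "p-1", "amount": 100}]
committed_offset = 0          # the offset Kafka will hand out next
processed = []                # stands in for "update the database"

def poll():
    """Return the message at the last committed offset (Kafka-style pull)."""
    return messages[committed_offset]

# First attempt: process, then crash BEFORE committing the offset.
msg = poll()
processed.append(msg["payment_id"])   # database updated...
# (crash here -- committed_offset is never advanced)

# After restart, the consumer polls again from the committed offset:
msg = poll()
processed.append(msg["payment_id"])   # same payment processed twice!
committed_offset += 1                 # offset finally committed

print(processed)  # ['p-1', 'p-1'] -> duplicate processing
```

The duplicate appears purely because the side effect (the database write) happened before the offset commit, which is exactly the window the rest of this post is about closing.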


💡 To Handle It:

  • Make your processing idempotent (safe to repeat).

  • Or track processed messages (deduplication).

    Below are some ways to guarantee idempotent processing with Kafka:

  • Use message keys

    • Set a unique key (like order_id) so Kafka routes messages with the same key to the same partition.

🔍 What It Does:

  • In Kafka, a message key determines the partition it goes to.

  • Kafka uses a hash of the key to decide the partition.

  • So:
    → Messages with the same key always go to the same partition.
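The routing can be illustrated with a simplified hash. (Kafka's default partitioner actually uses murmur2; deterministic CRC32 is just a stand-in here to show the idea.)

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Kafka hashes the message key and takes it modulo the partition
    # count, so the same key always lands on the same partition.
    # (Real Kafka uses murmur2; crc32 is a deterministic stand-in.)
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

print(partition_for("order-123") == partition_for("order-123"))  # True
```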

💡 Why That Helps:

  • Kafka guarantees ordering within a partition, not across partitions.

  • If all messages about a specific entity (e.g., an order_id) go to one partition, then:

    • You can safely deduplicate or replace older messages with newer ones.

    • Consumers can track state per key, making it easier to detect and skip duplicates.

🧾 Example:

Imagine you get two messages for the same order:

  • order_id = 123, status = "paid"

  • order_id = 123, status = "shipped"

If these messages go to the same partition, your consumer can:

  • Keep a record of the last status.

  • Detect if a duplicate "paid" status appears again and skip it.
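A consumer reading that partition could track the last seen status per key, along these lines (a hypothetical sketch; the order_id/status fields come from the example above):

```python
# Per-key state tracking: because all messages for order 123 arrive on
# the same partition, one consumer sees them in order and can skip a
# duplicate by remembering the last status per order_id.

last_status = {}   # order_id -> last processed status
applied = []       # stands in for the real side effect

def handle(msg):
    key, status = msg["order_id"], msg["status"]
    if last_status.get(key) == status:
        return  # duplicate of the message we just processed -- skip it
    last_status[key] = status
    applied.append((key, status))

for msg in [
    {"order_id": 123, "status": "paid"},
    {"order_id": 123, "status": "paid"},     # duplicate delivery
    {"order_id": 123, "status": "shipped"},
]:
    handle(msg)

print(applied)  # [(123, 'paid'), (123, 'shipped')]
```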


  • Enable idempotence in producers

Kafka producer setting:
enable.idempotence = true

Ensures retries don't create duplicates.

🔍 What It Does:

  • This makes the Kafka producer smart enough to avoid writing the same message twice, even if a retry happens.

🧠 Why retries happen:

  • A producer sends a message.

  • Kafka receives it, but the acknowledgment gets lost (e.g., network issue).

  • Producer thinks it failed and retries sending the same message.

Without idempotence:

  • Kafka writes the same message again → duplicate.

With idempotence enabled:

  • Kafka knows it's the same message, and only writes it once.


🔍 How it works under the hood:

When enable.idempotence=true, Kafka uses:

  • Producer ID (PID) → identifies the producer.

  • Sequence numbers per topic-partition.

Kafka tracks:

“Hey, producer X already sent message #7 to this partition — don’t store it again.”

This makes retries safe and duplicate-free.
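The PID + sequence-number bookkeeping can be modeled as a toy broker (heavily simplified: real Kafka also enforces that sequences arrive in order and handles producer restarts):

```python
# Toy model of broker-side idempotence: the broker remembers the highest
# sequence number it has stored per (producer_id, partition) and drops
# any retry carrying a sequence number it has already seen.

log = []          # the partition's message log
last_seq = {}     # (producer_id, partition) -> highest stored sequence

def append(producer_id, partition, seq, value):
    key = (producer_id, partition)
    if seq <= last_seq.get(key, -1):
        return "duplicate-ignored"   # retry of an already-stored message
    last_seq[key] = seq
    log.append(value)
    return "stored"

append(42, 0, seq=0, value="payment-1")
# The ack is lost, so the producer retries the SAME message with the
# same sequence number -- the broker drops it instead of storing twice:
result = append(42, 0, seq=0, value="payment-1")

print(log, result)  # ['payment-1'] duplicate-ignored
```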


⚠️ Important Notes:

  • Idempotence only prevents duplicates caused by producer retries — it guarantees no duplicates in the Kafka log for the same produce attempt.

  • If your application sends the same logical message twice (two separate send() calls), each send gets its own sequence number, so Kafka won't treat them as duplicates.


  • Use idempotent consumers / processing

    • Design your system so repeated processing doesn't cause problems (e.g., check if a record already exists before inserting).
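A common shape for an idempotent consumer is an insert that is a no-op when the row already exists. A sketch with SQLite (the payments table and columns are made up for illustration; Postgres's ON CONFLICT DO NOTHING gives the same effect):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (payment_id TEXT PRIMARY KEY, amount INTEGER)"
)

def process(event):
    # INSERT OR IGNORE makes re-processing the same event harmless:
    # the second insert for payment_id 'p-1' simply does nothing.
    conn.execute(
        "INSERT OR IGNORE INTO payments (payment_id, amount) VALUES (?, ?)",
        (event["payment_id"], event["amount"]),
    )
    conn.commit()

process({"payment_id": "p-1", "amount": 100})
process({"payment_id": "p-1", "amount": 100})  # redelivered duplicate

count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(count)  # 1 -- processed "twice", stored once
```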

  • Use deduplication logic

    • Store processed message IDs (in DB or cache like Redis) and skip if already processed.
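With an external store, the same idea becomes "set if not exists". A sketch using a plain Python set in place of Redis (in real Redis you would use SET with the NX flag and a TTL so the check-and-add is atomic; the message_id field is a made-up example):

```python
# Deduplication by tracking processed message IDs. A set stands in for
# Redis here; in a real system the check-and-add must be atomic
# (Redis "SET <id> 1 NX EX <ttl>" gives you exactly that).

seen_ids = set()
handled = []

def handle_once(msg):
    if msg["message_id"] in seen_ids:
        return False            # already processed -- skip
    seen_ids.add(msg["message_id"])
    handled.append(msg["message_id"])
    return True

handle_once({"message_id": "evt-1"})
handle_once({"message_id": "evt-1"})   # duplicate delivery -> skipped
handle_once({"message_id": "evt-2"})

print(handled)  # ['evt-1', 'evt-2']
```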
