Kafka Log Compaction
Master log compaction in Kafka to efficiently retain the latest state for each key while reducing storage costs and maintaining data integrity in your streaming applications.
Understanding Kafka Log Compaction
Kafka log compaction is a retention mechanism that ensures Kafka retains at least the last known value for each message key within a partition. Unlike time-based or size-based retention that deletes old segments entirely, log compaction selectively removes records while preserving the most recent update for each key.
How Kafka Compaction Works
The log cleaner thread periodically scans log segments and:
1. Identifies duplicate keys across segments
2. Retains only the latest value for each key
3. Removes older records with the same key (tombstones excepted)
4. Rewrites cleaned segments to reclaim disk space
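The cleaning pass above can be sketched in plain Java. This is a minimal in-memory model: the real cleaner works on on-disk segments and tracks a hash of each key in a dedupe buffer, but the keep-only-the-latest-value logic is the same. Note that tombstones (null values) survive this pass; they are only dropped later, after delete.retention.ms.

```java
import java.util.*;

// Minimal sketch of a compaction pass: keep only the latest record per key.
// Tombstones (null values) survive this pass and are removed by a later
// pass once delete.retention.ms has elapsed. List order stands in for offsets.
public class CompactionSketch {
    record Rec(String key, String value) {}  // value == null => tombstone

    static List<Rec> compact(List<Rec> log) {
        // One entry per key; re-inserting makes ordering follow the last write.
        Map<String, Rec> latest = new LinkedHashMap<>();
        for (Rec r : log) {
            latest.remove(r.key());
            latest.put(r.key(), r);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Rec> log = List.of(
            new Rec("user-1", "v1"),
            new Rec("user-2", "v1"),
            new Rec("user-1", "v2"),
            new Rec("user-2", null));    // tombstone: user-2 deleted
        for (Rec r : compact(log))
            System.out.println(r.key() + "=" + r.value());
    }
}
```

Running this prints `user-1=v2` and `user-2=null`: the stale `user-1` value is gone, while the tombstone for `user-2` is still present for downstream consumers.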
When to Use Kafka Log Compaction
Change Data Capture (CDC)
Replicate database state to Kafka. Compaction ensures you always have the latest row state for each primary key without storing every historical change.
Kafka Streams State Stores
Changelog topics for KTable state stores use compaction to rebuild state efficiently after application restarts or rebalances.
Configuration Management
Store application configuration where you only need the latest setting value for each configuration key.
User Profile/Session State
Maintain current user profiles or session data where historical versions are not needed after updates.
Log Compaction Configuration
Topic-Level Configuration
```bash
# Enable compaction on an existing topic (broker address is an example)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-compacted-topic \
  --alter --add-config cleanup.policy=compact
```

Set cleanup.policy=compact to enable log compaction. Use cleanup.policy=compact,delete for both compaction and time-based deletion.
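For a new topic, compaction can also be enabled at creation time. A sketch, assuming a broker at localhost:9092 and illustrative partition and replication settings:

```bash
# Create a topic that is compacted from the start
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic my-compacted-topic \
  --partitions 3 --replication-factor 1 \
  --config cleanup.policy=compact
```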
Key Configuration Parameters
| Parameter | Default | Description |
|---|---|---|
| min.cleanable.dirty.ratio | 0.5 | Minimum ratio of dirty log to total log required to trigger compaction |
| min.compaction.lag.ms | 0 | Minimum time before a message becomes eligible for compaction |
| max.compaction.lag.ms | Long.MAX_VALUE (effectively unlimited) | Maximum time a message can remain uncompacted |
| delete.retention.ms | 86400000 | Time to retain tombstone markers (24 hours) |
| segment.ms | 604800000 | Time before rolling a new segment (7 days) |
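These parameters can be tuned per topic at runtime with kafka-configs.sh. A sketch, where the broker address, topic name, and values are illustrative:

```bash
# Compact sooner, but never a record younger than one minute
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-compacted-topic \
  --alter --add-config min.cleanable.dirty.ratio=0.3,min.compaction.lag.ms=60000
```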
Broker-Level Configuration
```properties
# server.properties
log.cleaner.enable=true
log.cleaner.threads=2
# 128 MB dedupe buffer, shared across cleaner threads
log.cleaner.dedupe.buffer.size=134217728
log.cleaner.io.buffer.size=524288
# Double.MAX_VALUE: cleaner I/O is unthrottled by default
log.cleaner.io.max.bytes.per.second=1.7976931348623157E308
```

These broker settings control log cleaner behavior across all compacted topics.
Tombstones: Deleting Data in Compacted Topics
In compacted topics, you cannot simply delete a key by omitting it. Instead, you must publish a tombstone—a message with the same key but a null value. The compactor will eventually remove all records for that key.
Publishing a Tombstone
```java
// Java producer: publish a tombstone (null value) for a key
producer.send(new ProducerRecord<>("my-topic", "key-to-delete", null));
```

Tombstones are retained for delete.retention.ms (default 24 hours) to ensure downstream consumers see the delete before the marker itself is removed.
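On the consumer side, a null value must be interpreted as a delete when materializing state. A minimal sketch, using a plain map to stand in for a consumer's state store (the record list here is illustrative, not the Kafka client API):

```java
import java.util.*;

// Rebuild key -> value state from a compacted topic's records.
// A null value (tombstone) removes the key from the materialized state.
public class StateMaterializer {
    static Map<String, String> materialize(List<Map.Entry<String, String>> records) {
        Map<String, String> state = new HashMap<>();
        for (Map.Entry<String, String> rec : records) {
            if (rec.getValue() == null) state.remove(rec.getKey());  // tombstone
            else state.put(rec.getKey(), rec.getValue());
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        records.add(new AbstractMap.SimpleEntry<>("cfg.timeout", "30"));
        records.add(new AbstractMap.SimpleEntry<>("cfg.retries", "5"));
        records.add(new AbstractMap.SimpleEntry<>("cfg.timeout", null)); // delete
        System.out.println(materialize(records)); // prints {cfg.retries=5}
    }
}
```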
Important: Tombstone Timing
If a consumer is offline for longer than delete.retention.ms, it may miss tombstone records and not see that a key was deleted. Consider increasing this value for systems with infrequent consumers.
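For example, if some consumers only catch up weekly, a delete.retention.ms above seven days keeps tombstones visible to them. A per-topic config sketch with an illustrative value:

```properties
# 8 days in milliseconds: 8 * 24 * 60 * 60 * 1000
delete.retention.ms=691200000
```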
Log Compaction Best Practices
1. Design Keys Carefully
Keys are the unit of compaction. Ensure your key design matches your deduplication needs.
2. Set Appropriate min.compaction.lag.ms
Prevent premature compaction of recent messages that consumers may still need.
3. Monitor Compaction Lag
Track how far behind the cleaner is to ensure compaction keeps up with data ingestion.
Watch the JMX metric kafka.log:type=LogCleanerManager,name=max-dirty-percent to detect compaction falling behind.

4. Size Your Dedup Buffer
The log cleaner uses an in-memory buffer to track keys during compaction.
Increase log.cleaner.dedupe.buffer.size for topics with many unique keys to avoid multiple compaction passes. Each tracked key costs about 24 bytes (a 16-byte hash plus an 8-byte offset), so the default 128 MB buffer covers roughly 5.6 million keys.

5. Use compact,delete for Bounded Retention
Combine compaction with time-based deletion for storage-bounded use cases.
With cleanup.policy=compact,delete, the log is compacted within the retention window, and segments older than the retention limit are deleted entirely.

Monitor Kafka Compaction with KLogic
KLogic provides comprehensive monitoring for Kafka log compaction, helping you track cleaner progress, identify compaction bottlenecks, and optimize your topic configurations.