
Kafka Log Compaction

Master log compaction in Kafka to efficiently retain the latest state for each key while reducing storage costs and maintaining data integrity in your streaming applications.

Published: January 10, 2026 • 16 min read • Data Management Guide

Understanding Kafka Log Compaction

Kafka log compaction is a retention mechanism that ensures Kafka retains at least the last known value for each message key within a partition. Unlike time-based or size-based retention that deletes old segments entirely, log compaction selectively removes records while preserving the most recent update for each key.

How Kafka Compaction Works

The log cleaner thread periodically scans log segments and:

  1. Identifies duplicate keys across segments
  2. Retains only the latest value for each key
  3. Removes older records with the same key (tombstones excepted)
  4. Rewrites cleaned segments to reclaim disk space
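
Conceptually, the cleaner keeps the highest-offset record for each key. The following sketch models that "latest value wins" semantics with a plain Java map; the keys and values are hypothetical:

import java.util.LinkedHashMap;
import java.util.Map;

public class CompactionSketch {
    public static void main(String[] args) {
        // Records in offset order; later values for a key supersede earlier ones.
        String[][] log = {
            {"user-1", "v1"}, {"user-2", "v1"}, {"user-1", "v2"},
            {"user-3", "v1"}, {"user-1", "v3"}
        };
        Map<String, String> compacted = new LinkedHashMap<>();
        for (String[] record : log) {
            compacted.put(record[0], record[1]);  // only the latest value per key survives
        }
        System.out.println(compacted);  // {user-1=v3, user-2=v1, user-3=v1}
    }
}

Note that real compaction also preserves the offsets and ordering of the surviving records; only the set of retained values is modeled here.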

When to Use Kafka Log Compaction

Change Data Capture (CDC)

Replicate database state to Kafka. Compaction ensures you always have the latest row state for each primary key without storing every historical change.

Kafka Streams State Stores

Changelog topics for KTable state stores use compaction to rebuild state efficiently after application restarts or rebalances.
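
For instance, reading a topic as a KTable materializes the latest value per key into a local state store whose changelog topic is compacted. A minimal sketch, assuming String serdes are configured and a hypothetical user-profiles topic:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;

// builder.table() creates a state store backed by a compacted changelog topic;
// after a restart or rebalance, the store is rebuilt by replaying only the
// current value for each key rather than the full update history.
StreamsBuilder builder = new StreamsBuilder();
KTable<String, String> profiles = builder.table("user-profiles");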

Configuration Management

Store application configuration where you only need the latest setting value for each configuration key.

User Profile/Session State

Maintain current user profiles or session data where historical versions are not needed after updates.

Log Compaction Configuration

Topic-Level Configuration

# Enable compaction for an existing topic
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-compacted-topic \
  --add-config cleanup.policy=compact

Set cleanup.policy=compact to enable log compaction. Use cleanup.policy=compact,delete for both compaction and time-based deletion.

Key Configuration Parameters

Parameter                 | Default             | Description
min.cleanable.dirty.ratio | 0.5                 | Minimum ratio of dirty (uncleaned) log to total log before compaction triggers
min.compaction.lag.ms     | 0                   | Minimum time a message must remain uncompacted
max.compaction.lag.ms     | 9223372036854775807 | Maximum time a message can remain uncompacted (default: effectively unlimited)
delete.retention.ms       | 86400000            | Time to retain tombstone markers (24 hours)
segment.ms                | 604800000           | Time before rolling a new segment (7 days)
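
Putting several of these together, a hypothetical topic creation with tuned compaction settings (broker address, topic name, and values are placeholders):

# Create a compacted topic with tuned cleaner settings
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic user-profiles --partitions 6 --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.3 \
  --config min.compaction.lag.ms=3600000 \
  --config delete.retention.ms=172800000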

Broker-Level Configuration

# server.properties
log.cleaner.enable=true                                      # enabled by default since Kafka 0.9.0.1
log.cleaner.threads=2                                        # background cleaner threads (default: 1)
log.cleaner.dedupe.buffer.size=134217728                     # 128 MB for key deduplication, shared across threads
log.cleaner.io.buffer.size=524288                            # 512 KB total for cleaner I/O buffers
log.cleaner.io.max.bytes.per.second=1.7976931348623157E308   # Double.MAX_VALUE: effectively unthrottled

These broker settings control the log cleaner behavior across all compacted topics.

Tombstones: Deleting Data in Compacted Topics

In compacted topics, you cannot simply delete a key by omitting it. Instead, you must publish a tombstone—a message with the same key but a null value. The compactor will eventually remove all records for that key.

Publishing a Tombstone

// Java producer: publish a tombstone (same key, null value)
producer.send(new ProducerRecord<>("my-topic", "key-to-delete", null));

Tombstones are retained for delete.retention.ms (default 24 hours) to ensure downstream consumers see the delete before it's removed.
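
On the consumer side, a tombstone arrives as a record with a null value. A minimal sketch of rebuilding a state map from a compacted topic, assuming String serdes and hypothetical topic and group names:

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "state-rebuilder");
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

Map<String, String> state = new HashMap<>();
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("my-topic"));
    while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
            if (record.value() == null) {
                state.remove(record.key());               // tombstone: key was deleted
            } else {
                state.put(record.key(), record.value());  // latest value wins
            }
        }
    }
}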

Important: Tombstone Timing

If a consumer is offline for longer than delete.retention.ms, it may miss tombstone records and not see that a key was deleted. Consider increasing this value for systems with infrequent consumers.
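
If slow consumers are a concern, the retention window can be raised per topic; a hypothetical example extending it to 7 days:

# Retain tombstones for 7 days (604800000 ms)
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-topic \
  --add-config delete.retention.ms=604800000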

Log Compaction Best Practices

1. Design Keys Carefully

Keys are the unit of compaction. Ensure your key design matches your deduplication needs.

Use entity identifiers (user_id, order_id) as keys for state management use cases.
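
As a sketch of the difference, compare keying by a unique per-event ID (no two records share a key, so nothing ever compacts) with keying by the entity ID (stale versions of each entity compact away). Topic and variable names here are hypothetical:

// Anti-pattern: unique event IDs mean every record has a distinct key,
// so compaction can never remove anything.
producer.send(new ProducerRecord<>("user-profiles", eventId, profileJson));

// Better: key by the entity identifier so only the latest profile per user survives.
producer.send(new ProducerRecord<>("user-profiles", userId, profileJson));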

2. Set Appropriate min.compaction.lag.ms

Prevent premature compaction of recent messages that consumers may still need.

Set to at least your maximum consumer downtime to ensure consumers see all intermediate states if needed.

3. Monitor Compaction Lag

Track how far behind the cleaner is to ensure compaction keeps up with data ingestion.

Monitor kafka.log:type=LogCleaner,name=max-dirty-percent to detect compaction issues.
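
One way to read that metric programmatically is over JMX. A minimal sketch, assuming the broker exposes JMX on port 9999 and that the gauge's reading is available through its Value attribute:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Connect to the broker's JMX endpoint and read the cleaner's max dirty ratio.
JMXServiceURL url = new JMXServiceURL(
    "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
    MBeanServerConnection mbs = connector.getMBeanServerConnection();
    ObjectName cleaner = new ObjectName("kafka.log:type=LogCleaner,name=max-dirty-percent");
    System.out.println("max-dirty-percent: " + mbs.getAttribute(cleaner, "Value"));
}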

4. Size Your Dedup Buffer

The log cleaner uses an in-memory buffer to track keys during compaction.

Increase log.cleaner.dedupe.buffer.size for topics with many unique keys to avoid multiple compaction passes.
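
As a rough sizing guide: per the Kafka documentation, each entry in the dedupe buffer costs about 24 bytes (a 16-byte hash of the key plus an 8-byte offset), so the default 128 MB buffer tracks roughly 134217728 / 24 ≈ 5.6 million keys in a single pass; partitions with more unique keys than that require multiple passes. Note the buffer is shared across all cleaner threads.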

5. Use compact,delete for Bounded Retention

Combine compaction with time-based deletion for storage-bounded use cases.

cleanup.policy=compact,delete compacts within the retention window, then deletes old segments entirely.
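
A hypothetical example enabling both policies with a 7-day retention window (square brackets escape the comma inside the list value):

# Compact within the window; delete segments older than 7 days
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name click-events \
  --add-config cleanup.policy=[compact,delete],retention.ms=604800000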

Monitor Kafka Compaction with KLogic

KLogic provides comprehensive monitoring for Kafka log compaction, helping you track cleaner progress, identify compaction bottlenecks, and optimize your topic configurations.

  • Real-time compaction progress tracking
  • Dirty ratio monitoring and alerting
  • Storage savings visualization
  • Configuration recommendations
Try KLogic Free