Kafka Cluster Configuration
Master Kafka cluster configuration with this comprehensive guide. Learn essential broker settings, topic configurations, and production-ready setup strategies for reliable streaming infrastructure.
Kafka Cluster Architecture Basics
A Kafka cluster consists of multiple brokers working together to provide fault-tolerant, scalable message streaming. Proper cluster configuration is essential for performance, reliability, and operational efficiency.
Cluster Sizing Guidelines
Development: 1-3 brokers, minimal resources
Staging: 3-5 brokers, mirrors production
Production: 5+ brokers, high availability
Essential Broker Configuration
Core Broker Settings
# server.properties - Essential settings
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://broker-1.example.com:9092
# Cluster coordination (KRaft mode - Kafka 3.x+)
process.roles=broker,controller
node.id=1
controller.listener.names=CONTROLLER
controller.quorum.voters=1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
# Or ZooKeeper mode (legacy): drop the KRaft settings and the CONTROLLER listener
# broker.id=1
# zookeeper.connect=zk-1:2181,zk-2:2181,zk-3:2181/kafka
Each broker needs a unique ID (node.id in KRaft mode, broker.id in ZooKeeper mode) and properly configured listeners for client connections.
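To make the controller.quorum.voters format concrete, here is a minimal Python sketch that parses the id@host:port entries from the example above and estimates how many controller failures a majority quorum of that size can absorb. The function names are hypothetical helpers written for illustration; they are not part of Kafka or any client library.

```python
def parse_quorum_voters(value):
    """Parse 'id@host:port,...' into a list of (node_id, host, port) tuples."""
    voters = []
    for entry in value.split(","):
        node_id, _, endpoint = entry.strip().partition("@")
        host, _, port = endpoint.rpartition(":")
        voters.append((int(node_id), host, int(port)))
    ids = [voter[0] for voter in voters]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate node id in controller.quorum.voters")
    return voters


def tolerated_controller_failures(voter_count):
    """A majority quorum of n voters stays available with (n - 1) // 2 voters down."""
    return (voter_count - 1) // 2


voters = parse_quorum_voters(
    "1@controller-1:9093,2@controller-2:9093,3@controller-3:9093"
)
print(voters)
print(tolerated_controller_failures(len(voters)))
```

Under this majority rule, three voters survive one controller failure and five survive two, which is why quorums are usually sized with an odd number of voters.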
Log and Storage Settings
# Log directories - use multiple disks for performance
log.dirs=/data/kafka-logs-1,/data/kafka-logs-2
# Retention settings
# 7 days (the default)
log.retention.hours=168
# No size limit (retention is time-based)
log.retention.bytes=-1
# 1GB segments
log.segment.bytes=1073741824
# Cleanup policy: delete, compact, or delete,compact
log.cleanup.policy=delete
log.cleaner.enable=true
Replication Settings
# Default replication for new topics
default.replication.factor=3
min.insync.replicas=2
# Replication performance
num.replica.fetchers=4
replica.fetch.max.bytes=1048576
replica.socket.receive.buffer.bytes=65536
# Leadership
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
For production, always use a replication factor of 3 and min.insync.replicas=2 for durability.
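The trade-off between these two settings can be sketched with a little arithmetic. Assuming producers use acks=all, a write succeeds only while the in-sync replica set has at least min.insync.replicas members, so the difference between the two values is the number of replica failures a partition can absorb before writes are rejected. The helper name below is hypothetical.

```python
def tolerated_broker_failures(replication_factor, min_insync_replicas):
    """Replica failures a partition can absorb while acks=all writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError(
            "min.insync.replicas must be between 1 and the replication factor"
        )
    return replication_factor - min_insync_replicas


for rf, min_isr in [(3, 2), (3, 3), (3, 1)]:
    failures = tolerated_broker_failures(rf, min_isr)
    print(f"RF={rf}, min.insync.replicas={min_isr}: tolerates {failures} failure(s)")
```

With a replication factor of 3, min.insync.replicas=3 blocks writes as soon as one replica falls behind, while min.insync.replicas=1 accepts writes that exist on a single replica. The 3/2 combination sits between those extremes.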
Network and Threading
# Network threads (handle network requests)
num.network.threads=8
# I/O threads (process requests, including disk I/O)
num.io.threads=16
# Socket settings
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
# Maximum request size: 100MB
socket.request.max.bytes=104857600
# Request handling
queued.max.requests=500
request.timeout.ms=30000
Topic Configuration Best Practices
Creating Production Topics
# Create topic with production settings
kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--topic orders \
--partitions 12 \
--replication-factor 3 \
--config min.insync.replicas=2 \
--config retention.ms=604800000 \
--config segment.bytes=1073741824
Partition Count Guidelines
| Throughput | Partitions | Considerations |
|---|---|---|
| Low (<10MB/s) | 3-6 | Minimum for HA with 3 brokers |
| Medium (10-100MB/s) | 6-24 | Match to consumer parallelism |
| High (100MB-1GB/s) | 24-100 | Scale with brokers and consumers |
| Very High (>1GB/s) | 100+ | Consider multiple topics |
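One common rule of thumb for picking a number inside these ranges is to divide the target throughput by the throughput a single partition sustains on the producer side and on the consumer side, then take the larger result. The sketch below assumes you have measured those per-partition figures on your own hardware; the numbers in the example call are placeholders, not benchmarks, and the function name is hypothetical.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition):
    """Return the larger of the producer-side and consumer-side partition counts."""
    by_producer = math.ceil(target_mb_s / producer_mb_s_per_partition)
    by_consumer = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    return max(by_producer, by_consumer)


# Placeholder figures: 100 MB/s target, 10 MB/s per partition when producing,
# 5 MB/s per partition when consuming.
print(estimate_partitions(100, 10, 5))
```

In this example the consumer side is the bottleneck, so it determines the partition count.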
Warning: Partition Count is Hard to Change
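Adding partitions to an existing topic changes which partition a given key maps to, so records for the same key can end up spread across partitions and key-based ordering is no longer guaranteed. Partition counts also cannot be decreased. Plan partition counts carefully based on expected throughput growth; moderate over-provisioning up front is usually safer than increasing later.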
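The effect on keyed data can be illustrated with a toy partitioner. Keyed records are typically assigned by hashing the key and taking the result modulo the partition count, so changing the count changes the assignment for many keys. The sketch below uses CRC32 as a stand-in hash (Kafka's Java client uses a different hash function), and the key names are made up.

```python
import zlib


def partition_for(key, num_partitions):
    """Stand-in partitioner: CRC32 of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical keys; compare assignments before and after growing 12 -> 18 partitions.
keys = [f"customer-{i}" for i in range(1000)]
moved = [k for k in keys if partition_for(k, 12) != partition_for(k, 18)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Any key in the moved list would have older records in one partition and newer records in another, so a consumer can no longer rely on seeing that key's records in order.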
Hardware Configuration
CPU
- 8-16 cores per broker (production)
- Kafka is not CPU-intensive normally
- More cores needed for compression/encryption
- Prefer higher clock speed over more cores
Memory
- 32-64GB RAM minimum for production
- 6-8GB for JVM heap (don't go higher)
- Rest used for OS page cache
- Page cache is critical for performance
Storage
- SSDs strongly recommended (NVMe preferred)
- Multiple disks with JBOD (no RAID)
- Size: retention period × throughput × replication
- XFS filesystem recommended
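The sizing formula in the list above can be worked through as a short calculation. This sketch assumes throughput is the average write rate as stored on disk, uses decimal units (1 TB = 1,000,000 MB), and interprets the 30% headroom from the setup checklist as 30% extra on top of the raw figure. The function name and the example inputs are illustrative only.

```python
def cluster_storage_tb(throughput_mb_s, retention_days, replication_factor,
                       headroom=0.30):
    """Estimate total cluster storage: throughput x retention x replication, plus headroom."""
    retention_seconds = retention_days * 24 * 60 * 60
    raw_mb = throughput_mb_s * retention_seconds * replication_factor
    return raw_mb * (1 + headroom) / 1_000_000  # decimal terabytes


# Illustrative inputs: 50 MB/s average ingest, 7 days retention, replication factor 3.
total_tb = cluster_storage_tb(50, 7, 3)
per_broker_tb = total_tb / 6  # assuming data is spread evenly over 6 brokers
print(f"cluster: {total_tb:.1f} TB, per broker: {per_broker_tb:.1f} TB")
```

Compression, uneven partition distribution, and traffic growth all shift the real number, so treat the result as a starting point for capacity planning.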
Network
- 10Gbps minimum for production
- Low latency between brokers (<1ms)
- Separate network for replication (optional)
- Consider cross-AZ bandwidth costs
JVM Configuration
Recommended JVM Settings
# KAFKA_HEAP_OPTS
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
# KAFKA_JVM_PERFORMANCE_OPTS
export KAFKA_JVM_PERFORMANCE_OPTS="-server \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=20 \
-XX:InitiatingHeapOccupancyPercent=35 \
-XX:+ExplicitGCInvokesConcurrent \
-XX:+ParallelRefProcEnabled \
-Djava.awt.headless=true"
Keep heap size at 6-8GB. Larger heaps lead to longer GC pauses. Kafka relies heavily on OS page cache, not JVM heap.
Pro Tip: Monitor GC Metrics
Enable GC logging with -Xlog:gc*:file=/var/log/kafka/gc.log:time,tags:filecount=10,filesize=100M and monitor for pause times exceeding 100ms. Consider ZGC for Java 17+ deployments.
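A rough way to check a GC log for long pauses is to scan for pause lines and compare the reported duration against the 100ms threshold. The sketch below assumes pause lines contain the word "Pause" and end with a duration such as 142.730ms. The sample lines are hand-written to resemble JDK unified logging output and are not taken from a real broker, so verify the pattern against your own log before relying on it.

```python
import re

# Matches lines that mention a pause and end with a duration in milliseconds.
PAUSE_RE = re.compile(r"\bPause\b.*?(\d+(?:\.\d+)?)ms\s*$")


def long_pauses(lines, threshold_ms=100.0):
    """Return (pause_ms, line) pairs for pause lines above the threshold."""
    hits = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match and float(match.group(1)) > threshold_ms:
            hits.append((float(match.group(1)), line.strip()))
    return hits


# Hand-written sample lines (assumed format, not real output).
sample = [
    "[2024-05-01T10:00:00.123+0000][gc] GC(41) Pause Young (Normal) (G1 Evacuation Pause) 3072M->1024M(6144M) 18.402ms",
    "[2024-05-01T10:00:07.456+0000][gc] GC(42) Pause Young (Concurrent Start) (G1 Humongous Allocation) 4096M->2048M(6144M) 142.730ms",
    "[2024-05-01T10:00:07.900+0000][gc] GC(43) Concurrent Mark Cycle 310.115ms",
]

for pause_ms, line in long_pauses(sample):
    print(f"{pause_ms} ms: {line}")
```

To scan a real log, pass an open file object as lines, for example the /var/log/kafka/gc.log path used in the -Xlog option above.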
Kafka Cluster Setup Checklist
Minimum 3 brokers for production
Enables replication factor of 3 with fault tolerance.
Brokers in different failure domains
Use rack awareness to spread replicas across AZs or racks.
Configure min.insync.replicas=2
Prevents data loss when combined with acks=all producers.
Set unclean.leader.election.enable=false
Prevents data loss from out-of-sync replicas becoming leader.
Set up monitoring and alerting
Monitor broker health, under-replicated partitions, and disk usage.
Configure authentication and authorization
Enable SASL authentication over TLS (SASL_SSL) and ACLs for secure operations.
Plan storage capacity with 30% headroom
Account for retention, replication, and traffic growth.
Document runbooks for common operations
Broker restarts, partition reassignment, and emergency procedures.