Kafka Streams Monitoring
Master Kafka Streams monitoring with this comprehensive guide. Learn essential metrics, state store health monitoring, processing latency tracking, and production best practices for stream processing applications.
Understanding Kafka Streams Monitoring
Kafka Streams applications are stateful, distributed stream processing engines. Monitoring them requires visibility into processing rates, state store health, consumer lag, and application-specific metrics that go beyond basic Kafka monitoring.
Key Monitoring Areas
Essential Kafka Streams Metrics
Processing Metrics
| Metric | Description | What to Watch |
|---|---|---|
process-rate | Records processed per second | Throughput drops |
process-latency-avg | Average processing time per record | > 100ms concerning |
process-latency-max | Maximum processing latency | Latency spikes |
records-lag | Consumer lag for input topics | Growing lag |
State Store Metrics
| Metric | Description | Importance |
|---|---|---|
put-rate | Records written to state store per second | Write throughput |
get-rate | Records read from state store per second | Read throughput |
put-latency-avg | Average write latency | Storage performance |
restore-rate | Records restored from changelog per second | Startup performance |
Thread and Task Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
alive-stream-threads | Number of running stream threads | < expected count |
rebalance-rate | Consumer rebalance frequency | > 1 per minute |
failed-stream-threads | Threads that crashed | > 0 |
Enabling Kafka Streams Metrics
Configure Metrics Recording Level
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Enable detailed metrics (DEBUG level for all metrics)
props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "DEBUG");
// Or use INFO for production (less overhead)
// props.put(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "INFO");
KafkaStreams streams = new KafkaStreams(topology, props);Use DEBUG for development and INFOfor production to balance observability with performance.
Exposing Metrics to Prometheus
// Add JMX Exporter to your application
// Download jmx_prometheus_javaagent
// Run with: -javaagent:jmx_prometheus_javaagent.jar=8080:kafka-streams.yml
// kafka-streams.yml example:
---
lowercaseOutputName: true
lowercaseOutputLabelNames: true
whitelistObjectNames:
- "kafka.streams:*"
- "kafka.consumer:*"
- "kafka.producer:*"
rules:
- pattern: kafka.streams<type=(.+), thread-id=(.+), task-id=(.+)><>(.+)
name: kafka_streams_$1_$4
labels:
thread_id: "$2"
task_id: "$3"Programmatic Metrics Access
// Access metrics programmatically
for (Metric metric : streams.metrics().values()) {
MetricName name = metric.metricName();
if (name.group().equals("stream-processor-node-metrics")) {
System.out.println(name.name() + ": " + metric.metricValue());
}
}
// Or use streams.localThreadsMetadata() for thread info
for (ThreadMetadata thread : streams.localThreadsMetadata()) {
System.out.println("Thread: " + thread.threadName());
System.out.println("State: " + thread.threadState());
for (TaskMetadata task : thread.activeTasks()) {
System.out.println("Task: " + task.taskId());
}
}State Store Health Monitoring
State stores are critical to Kafka Streams applications. Monitoring their health ensures reliable stream processing and fast recovery from failures.
Restoration Progress
When a Streams application starts or rebalances, state stores must be restored from changelog topics. Monitor restoration progress to understand startup time.
restore-remaining-records andrestore-rate to estimate restoration time.RocksDB Metrics
If using the default RocksDB state store, monitor these RocksDB-specific metrics:
- •
rocksdb.bytes-read-rate- Bytes read per second - •
rocksdb.bytes-written-rate- Bytes written per second - •
rocksdb.memtable-bytes-flushed-rate- Memtable flush activity - •
rocksdb.compaction-time-avg- Compaction duration
Watch for State Store Size Growth
Unbounded state store growth can exhaust disk space. Monitor state store size and implement proper retention strategies using windowed stores or session windows with appropriate grace periods.
Kafka Streams Alerting Strategies
Recommended Alerts
Critical: Thread Failure
Alert immediately when alive-stream-threads drops below expected count or failed-stream-threads > 0.
Warning: High Processing Latency
Alert when process-latency-avg exceeds your SLA threshold (e.g., > 500ms for 5 minutes).
Warning: Growing Consumer Lag
Alert when records-lag is consistently increasing, indicating the application can't keep up with input rate.
Info: Frequent Rebalances
Alert when rebalance-rate exceeds 2 per hour, which may indicate stability issues.
Monitor Kafka Streams with KLogic
KLogic provides comprehensive Kafka Streams monitoring with built-in dashboards, intelligent alerting, and deep visibility into stream processing performance.