Apache Kafka is a distributed event streaming platform for publishing, storing, and processing continuous data streams. Three key capabilities: (1) publish/subscribe to event streams, (2) store streams durably with configurable retention, (3) process streams in real time or retrospectively. Key features: high throughput (millions of messages/sec), low latency (sub-10ms achievable with tuning), horizontal scalability via partitions, fault tolerance through replication, exactly-once semantics, KRaft mode (ZooKeeper-free since Kafka 4.0, March 2025), tiered storage (production-ready since 3.9.0), stream processing (Kafka Streams with RocksDB state stores). Use cases: event sourcing, real-time analytics, log aggregation, CDC pipelines, microservices messaging. Architecture: a distributed, partitioned, replicated commit log with a producer-consumer model.
Kafka FAQ & Answers
40 expert Kafka answers researched from official documentation. Every answer cites authoritative sources you can verify.
Topics are Kafka's fundamental storage units representing categories or feeds of events. Each topic has a unique name (e.g., 'orders.created', 'user.login') and is divided into partitions for parallelism. Topics persist messages durably based on retention policies: time-based (retention.ms=604800000 for 7 days) or size-based (retention.bytes=1073741824). Naming best practices: use lowercase with hyphens, pattern like 'team.domain.event-type', avoid spaces/special characters. Topics support different configurations: cleanup.policy=delete (time/size retention) or compact (keep latest per key), replication.factor for durability, segment.bytes for segment size. Consumers subscribe to topics via group.id for parallel processing.
Create topics using kafka-topics.sh or AdminClient API. CLI example: kafka-topics.sh --create --topic user-events --bootstrap-server localhost:9092 --partitions 10 --replication-factor 3 --config retention.ms=604800000 --config segment.bytes=1073741824 --config cleanup.policy=delete --config min.insync.replicas=2. Best practices (2025): use 10 partitions per topic default (rule of 10), replication-factor=3 for production, min.insync.replicas=2 for durability with acks=all. Modify topics: kafka-topics.sh --alter --topic user-events --partitions 20 --bootstrap-server localhost:9092 (partitions can only increase). AdminClient example (Java): AdminClient.create(props).createTopics(Collections.singleton(new NewTopic("events", 10, (short) 3))).all().get(). Monitor: kafka-topics.sh --describe --topic user-events.
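For reference, a multi-line version of the AdminClient call above, written as a hedged sketch (assumes a broker at localhost:9092 and reuses the topic settings from the CLI example):

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 10 partitions, replication factor 3, plus per-topic overrides
            NewTopic topic = new NewTopic("user-events", 10, (short) 3)
                    .configs(Map.of(
                            "retention.ms", "604800000",   // 7 days
                            "min.insync.replicas", "2",
                            "cleanup.policy", "delete"));
            admin.createTopics(Set.of(topic)).all().get(); // blocks until the broker confirms
        }
    }
}
```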
Partitions are ordered, immutable sequences of records within a topic, each with unique ID starting from 0. Partitions enable parallelism: topic with 10 partitions supports 10 concurrent consumers. Key guarantee: messages with same key (hash(key) % num_partitions) always route to same partition, preserving per-key ordering. Each partition is replicated across replication-factor brokers for fault tolerance. Leader partition handles all reads/writes; followers replicate data. Offset identifies message position within partition (0, 1, 2...). Partitions distributed across brokers for horizontal scaling. ISR (In-Sync Replicas) tracks replicas caught up with leader. Best practice: set partitions = max(target_throughput_MB/s / 10MB/s per partition, expected_max_consumers). Over-partition for future growth (1-2 year projection).
Calculate partitions using formula: partitions = max(target_throughput / partition_throughput_produce, target_throughput / partition_throughput_consume). Example: for 1GB/s target with 100MB/s per-partition produce throughput and 50MB/s per-partition consume throughput, need max(10, 20) = 20 partitions. Industry rule (2025): 10 partitions per topic default, 10,000 partitions per cluster maximum. Single partition typical throughput: 1-10MB/s (depends on message size, compression). Best practices: over-partition for 1-2 year growth projection, test with actual workload, monitor consumer lag via kafka-consumer-groups.sh. More partitions = more parallelism but higher rebalancing overhead and metadata. Use topicpartitions.com calculator for sizing. Avoid <3 partitions (limits parallelism) and >1000/topic (metadata overhead).
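A minimal sketch of the sizing formula above, using the answer's example numbers (1GB/s target, 100MB/s produce and 50MB/s consume per partition):

```java
public class PartitionSizing {
    // partitions = max(target/producePerPartition, target/consumePerPartition), rounded up
    static int requiredPartitions(double targetMBps, double producePerPartitionMBps, double consumePerPartitionMBps) {
        int forProduce = (int) Math.ceil(targetMBps / producePerPartitionMBps);
        int forConsume = (int) Math.ceil(targetMBps / consumePerPartitionMBps);
        return Math.max(forProduce, forConsume);
    }

    public static void main(String[] args) {
        // 1 GB/s target, 100 MB/s produce and 50 MB/s consume per partition
        System.out.println(requiredPartitions(1000, 100, 50)); // prints 20
    }
}
```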
Producers batch messages to improve throughput via batch.size (bytes per partition) and linger.ms (wait time). Batching logic: accumulate messages until batch.size=16384 (default 16KB) or linger.ms=0 (default, no wait) elapsed, then send. High-throughput config: batch.size=131072, linger.ms=10-20, compression.type=lz4 achieves 3400 msg/s. Compression with batching: LZ4 (594 MB/s compression, 2428 MB/s decompression, 40.7% ratio) for speed; zstd level 1 (409 MB/s, 844 MB/s, 26.4% ratio) for better compression. Low-latency config: batch.size=16384, linger.ms=0, compression.type=none. Monitor: batch-size-avg, compression-rate-avg, request-latency-avg metrics. Larger batches reduce network requests and improve compression efficiency. Batching per partition, not per topic.
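Illustrative high-throughput producer settings assembled from the values above; the broker address and serializers are assumptions for the sketch:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class HighThroughputProducerConfig {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072);      // 128 KB batches per partition
        props.put(ProducerConfig.LINGER_MS_CONFIG, 15);           // wait up to 15 ms to fill a batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compress whole batches
        return new KafkaProducer<>(props);
    }
}
```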
Acks control durability-latency trade-off: acks=0 (no confirmation, 0ms latency, data loss on broker failure), acks=1 (leader confirmation only, ~5ms latency, data loss if leader fails before replication), acks=all or acks=-1 (wait for min.insync.replicas, ~10-50ms latency, no data loss). Production config: acks=all, min.insync.replicas=2, replication.factor=3 ensures durability (survives 1 broker failure). Exactly-once config: acks=all, enable.idempotence=true, max.in.flight.requests.per.connection=5, retries=2147483647. Use acks=1 for logs/metrics (tolerate loss), acks=all for transactions/orders (zero loss). Monitor: request-latency-avg, record-error-rate. With acks=all, writes committed only after ISR replication confirmation.
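A hedged fragment of the durability-oriented producer settings described above; note that min.insync.replicas is a topic/broker setting, not a producer property:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class DurableProducerConfig {
    public static Properties durable() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(ProducerConfig.ACKS_CONFIG, "all");                          // wait for min.insync.replicas
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);             // no duplicates on retry
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);           // retry transient errors
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);    // ordering kept with idempotence
        return props;  // min.insync.replicas=2 must be set on the topic or broker
    }
}
```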
Enable idempotent producers (Kafka 0.11+, enabled by default since 3.0) to prevent duplicates during retries: Properties props = new Properties(); props.put("bootstrap.servers", "localhost:9092"); props.put("enable.idempotence", "true");. Required settings (defaulted/validated when idempotence is on): acks=all, max.in.flight.requests.per.connection <= 5, retries > 0 (default Integer.MAX_VALUE). Mechanism: broker assigns Producer ID (PID) and tracks sequence numbers per partition. Duplicate writes (same PID + sequence) rejected. Example: network error causes retry, broker detects duplicate sequence number, returns success without re-writing. Use cases: financial transactions, inventory updates, event sourcing where duplicates break correctness. Idempotence adds negligible overhead and is recommended for all production producers. Compatible with transactions for exactly-once end-to-end.
Consumers read events from Kafka topics via subscribe() and poll() loop. Processing pattern: consumer.subscribe(Arrays.asList("orders")); while (true) { ConsumerRecords<K,V> records = consumer.poll(Duration.ofMillis(100)); for (ConsumerRecord<K,V> record : records) { process(record); } consumer.commitSync(); }. Essential configs: bootstrap.servers=localhost:9092, group.id=order-processor, key.deserializer=StringDeserializer, value.deserializer=JsonDeserializer, auto.offset.reset=earliest (start from beginning) or latest (only new messages); auto.offset.reset applies only when the group has no committed offset. Consumer tracks position via offsets stored in __consumer_offsets topic. Each partition assigned to one consumer in group. Automatic broker failover and partition rebalancing. Monitor: consumer-lag, records-consumed-rate, fetch-latency-avg.
Consumer groups enable parallel processing by assigning topic partitions across consumers with same group.id. Each partition assigned to exactly one consumer in group (exclusive assignment). Example: topic with 12 partitions + 3 consumers in group = 4 partitions per consumer. Add 4th consumer: rebalance assigns 3 partitions each. Rule: max consumers = partition count (extra consumers idle). Benefits: horizontal scaling (add consumers for throughput), fault tolerance (failed consumer triggers rebalance, partitions reassigned), load balancing (automatic partition distribution). Kafka 4.0 (2025) introduced enhanced consumer group protocol for faster rebalancing. Monitor: kafka-consumer-groups.sh --describe --group order-processor shows partition assignments and consumer lag. Partition ordering preserved; total ordering requires single partition.
Offsets are sequential integers (0, 1, 2...) identifying message position within partition. Each message has unique offset per partition. Consumers commit offsets to __consumer_offsets internal topic to track progress. Commit stores: consumer group, topic, partition, offset. On restart, consumer resumes from last committed offset + 1. Manual seek operations: consumer.seekToBeginning(partitions) (replay all), consumer.seekToEnd(partitions) (skip to latest), consumer.seek(partition, offset) (specific position). Use cases: replay messages (seekToBeginning), skip corrupt messages (seek forward), time-based replay (offsetsForTimes). Monitor lag: current offset (high water mark) - consumer offset = lag. Example: partition at offset 1000, consumer committed 950 = 50 messages lag.
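A sketch of time-based replay using offsetsForTimes and seek, assuming an already-configured consumer with assigned partitions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromTimestamp {
    // Rewind every assigned partition to the first offset at or after timestampMs
    static void replayFrom(KafkaConsumer<String, String> consumer, long timestampMs) {
        Map<TopicPartition, Long> query = new HashMap<>();
        for (TopicPartition tp : consumer.assignment()) {
            query.put(tp, timestampMs);
        }
        Map<TopicPartition, OffsetAndTimestamp> found = consumer.offsetsForTimes(query);
        found.forEach((tp, offsetAndTs) -> {
            if (offsetAndTs != null) {
                consumer.seek(tp, offsetAndTs.offset()); // jump to the timestamp
            } else {
                consumer.seekToEnd(Set.of(tp));          // nothing after the timestamp: skip to latest
            }
        });
    }
}
```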
Two commit strategies: auto-commit and manual commit. Auto-commit (default): enable.auto.commit=true, auto.commit.interval.ms=5000 commits every 5 seconds during poll(). Risk: duplicate processing if consumer crashes between poll and processing. Manual commit provides at-least-once guarantee: enable.auto.commit=false; while(true) { ConsumerRecords<K,V> records = consumer.poll(Duration.ofMillis(100)); for (ConsumerRecord<K,V> record : records) { process(record); } consumer.commitSync(); }. Sync commit (commitSync): blocks until committed, throws exception on failure. Async commit (commitAsync): non-blocking with callback: consumer.commitAsync((offsets, ex) -> { if (ex != null) log.error("Commit failed", ex); }). Production pattern: commitAsync in loop, commitSync on shutdown for guaranteed final commit. Batch commits improve throughput.
Key-based partitioning ensures messages with same key route to same partition via murmur2 hash: partition = hash(key) % num_partitions. Guarantees per-key ordering. Example: ProducerRecord<String, Order> record = new ProducerRecord<>("orders", "customer-123", orderData); producer.send(record);. All messages for customer-123 always go to same partition. Null key uses round-robin (sticky partitioner in Kafka 2.4+). Use cases: user events (preserve user action sequence), IoT devices (device ID as key), database CDC (table row ID as key), session tracking (session ID as key). Key distribution matters: poor key choices (e.g., boolean) cause partition skew. Monitor partition sizes: kafka-log-dirs.sh detects hotspots. Change partition count requires key migration (hash changes).
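To see where a key lands, this sketch reproduces the default partitioner's murmur2 hashing using Kafka's Utils helper (an internal utility, so treat it as illustrative); it also shows why changing the partition count remaps existing keys:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyToPartition {
    public static void main(String[] args) {
        String key = "customer-123";
        int numPartitions = 10;  // assumed partition count for the "orders" topic
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Same hashing as the default partitioner for keyed records
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        System.out.println(key + " -> partition " + partition);
        // Changing numPartitions changes the result, which is why adding
        // partitions breaks per-key locality for existing keys.
    }
}
```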
Implement the Partitioner interface for custom routing logic: public class GeoPartitioner implements Partitioner { public void configure(Map<String, ?> configs) {} public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) { String region = extractRegion((String)key); return Math.floorMod(region.hashCode(), cluster.partitionCountForTopic(topic)); } public void close() {} }. Configure producer: props.put("partitioner.class", "com.example.GeoPartitioner");. Use cases: geo-routing (US-EAST/US-WEST to partitions), priority lanes (VIP customers to low-latency partitions), load balancing (distribute hot keys). Ensure deterministic logic: same input always returns same partition (Math.floorMod avoids the negative-result edge case of Math.abs with hashCode). Test distribution: run kafka-producer-perf-test and check per-partition sizes (kafka-log-dirs.sh) to confirm no partition skew.
Time-based retention deletes log segments after retention period expires. Configure via retention.ms=604800000 (7 days) at broker or topic level. Retention process: closed segments (segment.bytes=1073741824 default 1GB) eligible for deletion when age exceeds retention.ms. Active segment never deleted. Broker checks every log.retention.check.interval.ms=300000 (5 mins). Topic override: kafka-configs.sh --alter --topic events --add-config retention.ms=86400000 (1 day). Kafka 3.9+ tiered storage: local retention (hot data) + remote retention (cold data in S3). Monitor: kafka-log-dirs.sh --describe --bootstrap-server localhost:9092 shows size per partition. Retention vs compaction: cleanup.policy=delete (time/size based) vs compact (key-based).
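The programmatic equivalent of the kafka-configs.sh override above, sketched with AdminClient.incrementalAlterConfigs (broker address assumed):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "86400000"), AlterConfigOp.OpType.SET); // 1 day
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();
        }
    }
}
```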
Log compaction retains latest value per key, deleting older versions. Configure: cleanup.policy=compact or compact,delete (both time and compact). Compaction keeps: last value per key in each segment, all records in active segment, tombstones (null value) for delete.retention.ms=86400000 (24 hrs). Process: cleaner thread scans segments, builds key→latest_offset map, copies latest records to cleaned segment. Config: min.cleanable.dirty.ratio=0.5 (50% old records triggers compaction), segment.ms=604800000 (7 days) controls compaction frequency. Use cases: database changelog (user-profile topic), KV store (config-updates), state snapshots. Example: kafka-configs.sh --alter --topic user-state --add-config cleanup.policy=compact,min.cleanable.dirty.ratio=0.3. Monitor: kafka.log:type=LogCleaner metrics.
Replication factor controls fault tolerance: RF=1 (no durability, 1 broker failure = data loss), RF=2 (tolerate 1 failure), RF=3 (tolerate 2 failures, production standard). Storage impact: RF=3 requires 3x disk capacity. Network impact: leader replicates to RF-1 followers. Production config: replication.factor=3, min.insync.replicas=2, acks=all ensures no data loss if 1 broker fails. Rule: RF ≤ broker count (cannot exceed available brokers). Monitor: kafka-topics.sh --describe --under-replicated-partitions detects replication lag. ISR (In-Sync Replicas) must include min.insync.replicas for writes to succeed. Use RF=3 for production critical data, RF=2 for dev/test, RF=1 only for ephemeral data. Cost trade-off: RF=3 = 3x storage but 2-broker fault tolerance.
Topic naming conventions (2025 standards): use pattern team.domain.event-type (e.g., payments.orders.created, auth.users.login-failed). Use lowercase; separate hierarchy levels with dots and words within a level with hyphens, and never mix '.' and '_' across topic names in the same cluster (Kafka warns that they collide in metric names). Avoid: spaces, special characters (@#$%), and dots in Kafka Connect contexts where they conflict with subject names. Include versioning for breaking schema changes: orders-v2, events-v1. Length: 50 characters maximum for readability. Environment prefixes: prod-orders, dev-orders for multi-environment clusters. Lifecycle tags: temp-, archive- for temporary topics. Pattern examples: <env>.<team>.<domain>.<entity>.<action> = prod.payments.billing.invoice.generated. Document naming policy in team wiki. Good names enable topic discovery, filtering (kafka-topics.sh --list | grep payments), and operational clarity.
Optimize partitioning using throughput-based calculation: partitions = target_throughput_MB_s / single_partition_throughput_MB_s. Single partition throughput: 1-10MB/s (depends on message size, network, disk). Example: 500MB/s target with 10MB/s per partition = 50 partitions. Industry guideline (2025): 10 partitions per topic default, 10,000 partitions per cluster max, 1000 partitions per broker max. Considerations: more partitions = better parallelism but higher rebalancing overhead (20-60s rebalance for 1000+ partitions). Consumer scaling: max consumers = partition count. Over-partition for future growth (1-2 year projection). Monitor: kafka-consumer-groups.sh shows consumer lag per partition. Test throughput: kafka-producer-perf-test measures MB/s per partition. Avoid: <3 partitions (no parallelism), >1000/topic (rebalancing overhead).
Rebalancing triggers: (1) consumer joins group (new instance started), (2) consumer leaves gracefully (consumer.close()), (3) consumer heartbeat timeout (session.timeout.ms, default 45000 since Kafka 3.0), (4) poll() timeout (max.poll.interval.ms=300000 default 5 mins), (5) partitions added to subscribed topic, (6) topic deleted, (7) regex subscription matches new topics. Heartbeat failure: consumer doesn't send heartbeat within session.timeout.ms = considered dead. Poll timeout: processing takes longer than max.poll.interval.ms = consumer kicked from group. Rebalance process: stop-the-world (eager) or incremental (cooperative, Kafka 2.4+). Duration: 10-60 seconds depending on group size and partition count. Minimize rebalances: keep consumers alive, process messages within max.poll.interval.ms, avoid frequent deployments. Monitor: kafka.consumer:type=consumer-coordinator-metrics,rebalance-latency-avg.
Rebalancing timeout configs: session.timeout.ms (heartbeat timeout; default 45s since Kafka 3.0, previously 10s), heartbeat.interval.ms=3000 (heartbeat frequency, 3s default, should be ≤ session.timeout.ms/3), max.poll.interval.ms=300000 (processing timeout, 5 mins default), max.poll.records=500 (messages per poll, affects processing time). Tuning guidelines: session.timeout.ms=6000-30000 (lower = faster failure detection, higher = tolerate GC pauses), heartbeat.interval.ms=session.timeout.ms/3, max.poll.interval.ms=2x expected processing time. Slow processing example: max.poll.interval.ms=600000, max.poll.records=100 (10 mins, 100 records). Fast rebalancing: session.timeout.ms=6000, heartbeat.interval.ms=2000. Monitor: rebalance-latency-max, rebalance-rate-per-hour JMX metrics. Kafka 4.0 (2025): enhanced protocol reduces rebalance time by 50-80%.
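Example consumer timeouts for the slow-processing case above, as an illustrative fragment (group name and broker address are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class SlowProcessingConsumerConfig {
    public static Properties timeouts() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processor");         // group name from the answer
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);           // tolerate pauses up to ~30 s
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 10000);        // ~1/3 of the session timeout
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600000);        // allow 10 min between poll() calls
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);               // smaller batches finish sooner
        return props;
    }
}
```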
Replication copies partition data across brokers for fault tolerance. Each partition has: 1 leader (handles reads/writes), N-1 followers (replicate from leader). ISR (In-Sync Replicas): followers caught up within replica.lag.time.max.ms=10000 (10s). Write flow: (1) producer sends to leader, (2) leader appends to local log, (3) followers fetch from leader (replica.fetch.min.bytes=1), (4) leader tracks ISR, (5) acks=all waits for min.insync.replicas=2. Leader failure: controller elects new leader from ISR (unclean.leader.election.enable=false prevents non-ISR election). Config: replication.factor=3, min.insync.replicas=2, num.replica.fetchers=4. Monitor: kafka-topics.sh --describe shows leader, ISR, under-replicated partitions. Replication ensures: durability (data survives failures), availability (service continues), read scalability (Kafka 2.4+ follower fetching).
KRaft (Kafka Raft) is Kafka's native metadata consensus protocol, mandatory in Kafka 4.0 (March 2025). Replaces ZooKeeper for cluster coordination. Architecture: 3-5 controller nodes in KRaft quorum manage metadata via Raft protocol. Benefits: simpler operations (no external ZooKeeper), faster controller failover (<1s vs 10-30s), higher partition scalability (millions vs 200k), unified security model. KRaft stores metadata in internal __cluster_metadata topic. Controller quorum config: process.roles=controller, controller.quorum.voters=1@host1:9093,2@host2:9093,3@host3:9093. Production setup (2025): 3 dedicated controller nodes + N broker nodes (isolated mode). Migration: Kafka 3.6+ supports ZooKeeper→KRaft migration via dual-write mode. Monitor: kafka-metadata-quorum.sh --describe, kafka.controller JMX metrics. KRaft is production-ready (KIP-833) and mandatory for Kafka 4.0+.
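A minimal controller.properties sketch for a dedicated KRaft controller, assuming the placeholder hostnames and paths shown in the quorum example above:

```properties
# Dedicated KRaft controller sketch (controller.properties); hostnames/paths are placeholders.
# Format storage once before first start: kafka-storage.sh format -t <cluster-uuid> -c controller.properties
process.roles=controller
node.id=1
controller.quorum.voters=1@ctrl1:9093,2@ctrl2:9093,3@ctrl3:9093
listeners=CONTROLLER://:9093
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT
log.dirs=/var/kafka/kraft-metadata
```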
Three delivery guarantees: (1) At-most-once: acks=0, retries=0 - messages may be lost (fastest, no durability). (2) At-least-once: acks=all, retries=2147483647, enable.auto.commit=false - no loss, possible duplicates (default, idempotent consumer handles duplicates). (3) Exactly-once: enable.idempotence=true, transactional.id=unique-id, isolation.level=read_committed - no loss, no duplicates (highest latency). Exactly-once producer config: props.put("enable.idempotence", "true"); props.put("transactional.id", "txn-1");. Exactly-once consumer: props.put("isolation.level", "read_committed");. Use at-least-once for logs/events (idempotent processing), exactly-once for financial transactions, payments, inventory. Trade-offs: at-most-once (lowest latency), at-least-once (balanced), exactly-once (2-3x latency, requires transactions).
Kafka Streams is a Java library for stream processing with exactly-once semantics. Architecture: stateless operations (map, filter) and stateful operations (aggregate, join) backed by RocksDB state stores. Example: StreamsBuilder builder = new StreamsBuilder(); KStream<String, String> source = builder.stream("input"); KTable<String, Long> counts = source.groupByKey().count(Materialized.as("counts-store")); counts.toStream().to("output");. Key features (2025): RocksDB-backed state stores with changelog topics for fault tolerance, exactly-once processing via transactions, windowing (tumbling, hopping, session), interactive queries (query state stores from your app, typically exposed via your own REST layer), auto-scaling via partition assignment. Config: processing.guarantee=exactly_once_v2 (available since 3.0; the default remains at_least_once), state.dir=/tmp/kafka-streams, num.stream.threads=4. Recent releases: versioned state stores (added in 3.5), further performance improvements in 4.0 (2025). Deploy as microservice or standalone app.
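A short windowing sketch building on the example above: a per-key count over 5-minute tumbling windows (topic name and serdes are assumptions):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WindowedCountExample {
    // Counts records per key in 5-minute tumbling windows and prints each update
    static void build(StreamsBuilder builder) {
        KStream<String, String> clicks =
                builder.stream("clicks", Consumed.with(Serdes.String(), Serdes.String()));
        clicks.groupByKey()
              .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
              .count()
              .toStream()
              .foreach((windowedKey, count) -> System.out.println(
                      windowedKey.key() + " @ " + windowedKey.window().startTime() + " = " + count));
    }
}
```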
Brokers are Kafka server processes that store partitions and handle client requests. Cluster formation: multiple brokers with unique broker.id connect to controller (KRaft quorum in Kafka 4.0, ZooKeeper in 3.x). Controller: manages metadata (partition leaders, ISR, topic configs). Broker config: broker.id=1, listeners=PLAINTEXT://:9092, advertised.listeners=PLAINTEXT://broker1:9092, log.dirs=/var/kafka-logs. KRaft mode (2025): process.roles=broker, controller.quorum.voters=1@ctrl1:9093,2@ctrl2:9093,3@ctrl3:9093. Partition distribution: controller assigns partitions across brokers for load balancing. Failover: if broker fails, controller reassigns leader partitions to ISR replicas. Scaling: add brokers, reassign partitions via kafka-reassign-partitions.sh. Monitor: kafka.server JMX metrics, UnderReplicatedPartitions, OfflinePartitionsCount.
Producer optimization (2025 benchmarks): high-throughput config: batch.size=131072, linger.ms=10-20, compression.type=lz4, acks=1, buffer.memory=134217728, max.in.flight.requests.per.connection=5. LZ4 achieves 3400 msg/s, 594 MB/s compression (40.7% ratio). Low-latency config: batch.size=16384, linger.ms=0, compression.type=none, acks=1. Idempotent producer (recommended): enable.idempotence=true, acks=all, retries=2147483647. Partitioning: use keys for parallel writes across partitions. Async sends: producer.send(record, callback) for non-blocking. Monitor JMX: record-send-rate, batch-size-avg, compression-rate-avg, request-latency-avg. Hardware: 10Gb+ network, SSD for logs, 32GB+ RAM. Throughput target: 100MB/s per producer thread achievable with tuning.
Broker optimization (2025 production config): num.network.threads=8 (request handlers, 1 per 8 cores), num.io.threads=16 (disk threads, 2x disk count), num.replica.fetchers=8 (replication threads), socket.send.buffer.bytes=1048576, socket.receive.buffer.bytes=1048576, log.segment.bytes=1073741824 (1GB segments), log.retention.check.interval.ms=300000 (5 min cleanup). Hardware: NVMe SSDs (10k+ IOPS), 64-128GB RAM (heap=8GB, page cache=rest), 10-25Gb network, 12+ cores. JVM tuning: -Xmx8g -Xms8g, -XX:+UseG1GC, -XX:MaxGCPauseMillis=20. KRaft mode (2025): faster metadata operations than ZooKeeper. Monitor: CPU <70%, disk I/O wait <20%, GC pause <20ms, network throughput. Tiered storage (Kafka 3.9+): offload cold data to S3, reduce local disk. Scale horizontally: add brokers for >1GB/s sustained throughput.
Consumer optimization (2025 benchmarks): high-throughput config: fetch.min.bytes=1048576 (1MB min), fetch.max.wait.ms=500, max.partition.fetch.bytes=10485760 (10MB per partition), max.poll.records=1000, enable.auto.commit=false (manual batch commits). Processing pattern: while(true) { ConsumerRecords<K,V> records = consumer.poll(Duration.ofMillis(100)); processInParallel(records); consumer.commitSync(); }. Parallelization: consumer instances = partition count for max throughput. Monitor: consumer-lag (critical), fetch-latency-avg, records-consumed-rate. Lag alerts: lag > 10000 messages triggers autoscaling. Cooperative rebalancing (Kafka 2.4+): partition.assignment.strategy=CooperativeStickyAssignor reduces rebalance downtime by 80%. Hardware: sufficient CPU/RAM for processing logic. Throughput target: 50-100MB/s per consumer achievable.
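The high-throughput consumer settings above as an illustrative fragment (group name and broker address are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class BulkConsumerConfig {
    public static Properties highThroughput() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "bulk-processor");           // placeholder group name
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1048576);             // wait for at least 1 MB per fetch
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);               // or 500 ms, whichever comes first
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 10485760);  // up to 10 MB per partition
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1000);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);            // commit manually after processing
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }
}
```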
Kafka Connect is a framework for streaming data between Kafka and external systems without code. Source connectors: import data (JDBC, MongoDB, Debezium CDC, S3, Salesforce) into Kafka. Sink connectors: export data to systems (Elasticsearch, Snowflake, PostgreSQL, S3, HDFS). Architecture: distributed mode (scalable, fault-tolerant) or standalone mode (dev/test). Example JDBC source: {"name":"jdbc-source", "connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector", "connection.url":"jdbc:postgresql://db:5432/mydb", "mode":"incrementing", "incrementing.column.name":"id", "table.whitelist":"users", "topic.prefix":"db-"}. Deploy: connect-distributed.sh, auto-balances tasks across workers. Connector plugins: download from Confluent Hub (400+ pre-built connectors). SMT (Single Message Transforms): modify messages in-flight. Use for: ETL pipelines, CDC (change data capture), data lake ingestion, zero-code integrations.
Monitoring strategy (2025 production standards): Key metrics: (1) Consumer lag (consumer offset - log end offset, alert >10k messages), (2) UnderReplicatedPartitions (ISR < replication factor, alert >0), (3) OfflinePartitionsCount (no leader, critical alert), (4) broker CPU/disk/network (alert >80%), (5) request latency (p99 <100ms), (6) throughput (bytes-in/out per sec). Tools: Prometheus + Grafana (open source, JMX exporter), Confluent Control Center (commercial UI), Burrow (consumer lag), Cruise Control (autoscaling). JMX metrics: kafka.server:type=BrokerTopicMetrics, kafka.consumer:type=consumer-fetch-manager-metrics. Commands: kafka-consumer-groups.sh --describe --group mygroup, kafka-topics.sh --describe --under-replicated-partitions. Alerts: consumer lag >threshold, broker offline, disk >85%, rebalancing rate >10/hour. Dashboard: lag trends, throughput graphs, partition distribution heatmap.
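Consumer lag (log-end offset minus committed offset) can also be computed programmatically; a hedged AdminClient sketch, assuming the group name order-processor and a local broker:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the group
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("order-processor")
                         .partitionsToOffsetAndMetadata().get();

            // Log-end offsets for the same partitions
            Map<TopicPartition, OffsetSpec> latest = new HashMap<>();
            committed.keySet().forEach(tp -> latest.put(tp, OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResultInfo> ends = admin.listOffsets(latest).all().get();

            committed.forEach((tp, offset) -> {
                if (offset == null) return;                          // no committed offset yet
                long lag = ends.get(tp).offset() - offset.offset();  // lag = log end - committed
                System.out.println(tp + " lag=" + lag);
            });
        }
    }
}
```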
Transactions provide exactly-once semantics for atomic multi-partition writes. Producer setup: props.put("transactional.id", "my-txn-id"); props.put("enable.idempotence", "true"); producer.initTransactions();. Transaction API: try { producer.beginTransaction(); producer.send(new ProducerRecord<>("topic1", key, value)); producer.send(new ProducerRecord<>("topic2", key, value)); producer.commitTransaction(); } catch (Exception e) { producer.abortTransaction(); }. Consumer: props.put("isolation.level", "read_committed"); (only reads committed messages). Use cases: exactly-once stream processing (Kafka Streams), ETL with multi-topic writes, read-process-write patterns. State stored in __transaction_state topic. Performance: 2-3x latency overhead vs non-transactional. Limitations: single producer per transactional.id, transactions timeout after transaction.timeout.ms=60000 (1 min default). Kafka Streams uses transactions internally for exactly-once processing.
Schema Registry (Confluent, self-hosted or cloud) centralizes schema management for Avro, Protobuf, JSON Schema. Producer flow: (1) serialize message with schema, (2) Schema Registry assigns schema ID, (3) producer sends: [magic-byte][schema-id][message]. Consumer: (1) reads schema ID from message, (2) fetches schema from registry (cached), (3) deserializes. Compatibility modes: BACKWARD (consumers with new schema read old data), FORWARD (old consumers read new data), FULL (both directions), NONE (no checks). Example: KafkaAvroSerializer serializer = new KafkaAvroSerializer(schemaRegistryClient); props.put("schema.registry.url", "http://localhost:8081"); props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");. Schema evolution: add optional fields (backward), remove fields (forward). Schema Registry 7.7.4+ (2025): auto-retries on 429, Protobuf/JSON Schema full support. Use for: type safety, version control, documentation, preventing bad data.
Cooperative rebalancing (Kafka 2.4+) performs incremental partition reassignment, avoiding stop-the-world pauses. Config: partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor (available since Kafka 2.4; the default strategy list still leads with RangeAssignor, so set CooperativeStickyAssignor explicitly to get the cooperative protocol). Benefits: 50-80% faster rebalancing, consumers continue processing non-revoked partitions during rebalance, better throughput during scaling. Process: (1) consumer joins, (2) coordinator identifies partitions to revoke, (3) consumer revokes only those partitions, (4) continues processing others, (5) gets new assignments. Eager (old) vs cooperative: eager stops all consumers, cooperative stops only affected consumers. Enable: props.put("partition.assignment.strategy", "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");. Monitor: rebalance-latency-avg metric (should be <5s with cooperative). Kafka 4.0 (2025): enhanced consumer group protocol further improves rebalancing.
Kafka security (2025 best practices): (1) Encryption: listeners=SSL://:9093, ssl.keystore.location=/path/server.keystore.jks, ssl.keystore.password=secret, ssl.key.password=secret, ssl.truststore.location=/path/truststore.jks. (2) Authentication: SASL/SCRAM-SHA-256 (recommended) or mTLS. SASL config: listeners=SASL_SSL://:9092, sasl.mechanism.inter.broker.protocol=SCRAM-SHA-256, sasl.enabled.mechanisms=SCRAM-SHA-256. (3) Authorization ACLs: kafka-acls.sh --bootstrap-server localhost:9092 --add --allow-principal User:alice --operation Read --topic orders --group consumer-group. (4) Quotas: kafka-configs.sh --alter --add-config 'producer_byte_rate=10485760' --entity-type users --entity-name bob. Best practices: prefer SCRAM over PLAIN, use mTLS for inter-broker, set allow.everyone.if.no.acl.found=false, rotate credentials quarterly, use ACL prefixes for topic groups, store secrets in vault (HashiCorp, AWS Secrets Manager).
Kafka vs RabbitMQ/ActiveMQ architecture differences: Kafka = distributed append-only log (persistent, immutable, replayable), MQ = mutable queue (delete after consumption, no replay). Kafka: multi-subscriber (many consumers read same data), retention (days/weeks configurable), ordering per partition, horizontal scaling (add partitions/brokers), throughput (millions/sec), pull model. MQ: single subscriber per message (queue deletion), no retention (immediate deletion), ordering per queue, vertical scaling limits, throughput (thousands/sec), push model. Use Kafka for: event streaming, log aggregation, CDC pipelines, event sourcing, big data ingestion, replay scenarios, analytics. Use MQ for: task queues (work distribution), complex routing (topic exchanges), RPC patterns, low latency (<1ms), transient messages, traditional pub/sub. Choose based on: durability (Kafka wins), routing complexity (MQ wins), throughput (Kafka wins), replay (Kafka only), simplicity (MQ simpler for small scale).
Headers are key-value metadata (both byte arrays) attached to Kafka messages without affecting partitioning. Producer example: Headers headers = new RecordHeaders(); headers.add("trace-id", "abc123".getBytes(StandardCharsets.UTF_8)); headers.add("source-system", "mobile-app".getBytes()); headers.add("schema-version", "v2".getBytes()); ProducerRecord<String, String> record = new ProducerRecord<>("orders", null, "key", "value", headers); producer.send(record);. Consumer: for (ConsumerRecord<K,V> record : records) { Headers hdrs = record.headers(); Header traceId = hdrs.lastHeader("trace-id"); String trace = new String(traceId.value()); }. Use cases: distributed tracing (correlation IDs), routing hints (region, priority), metadata (schema version, timestamp), conditional processing (feature flags), auditing (user-id, request-id). Headers don't affect: partitioning, ordering, message size limit. Available Kafka 0.11+. Performance: minimal overhead (<1% for typical headers).
Ultra-low latency config (<10ms end-to-end): Producer: linger.ms=0, batch.size=16384, acks=1, compression.type=none, buffer.memory=33554432. Broker: num.replica.fetchers=8, replica.lag.time.max.ms=5000, use NVMe SSDs, leave log.flush.interval.messages at its default (forcing per-message fsync adds latency; rely on replication plus the OS page cache). Consumer: fetch.min.bytes=1, fetch.max.wait.ms=0, max.poll.records=100. Network: co-locate producer/broker/consumer in same AZ/region, use 10-25Gb network, disable Nagle's algorithm. Hardware: NVMe SSDs (latency <100μs), 10+ cores, 64GB+ RAM, low-latency network cards. Monitoring: measure p50/p95/p99 end-to-end latency with timestamps in headers. Trade-offs: latency vs throughput (no batching), latency vs durability (acks=1). Achievable: <5ms p99 with co-location and NVMe, <10ms p99 in production. Not suitable for: high throughput (use batching), durability critical (use acks=all).
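A sketch of the header-timestamp latency measurement mentioned above; the header name sent-at is arbitrary, and sender/receiver clocks must be NTP-synchronized:

```java
import java.nio.ByteBuffer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class LatencyProbe {
    // Producer side: stamp the send time into a header before producer.send(record)
    static ProducerRecord<String, String> stamped(String topic, String key, String value) {
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, key, value);
        record.headers().add("sent-at",
                ByteBuffer.allocate(Long.BYTES).putLong(System.currentTimeMillis()).array());
        return record;
    }

    // Consumer side: end-to-end latency = now - sent-at
    static long latencyMs(ConsumerRecord<String, String> consumed) {
        Header sentAt = consumed.headers().lastHeader("sent-at");
        long sentMs = ByteBuffer.wrap(sentAt.value()).getLong();
        return System.currentTimeMillis() - sentMs;
    }
}
```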
Quotas prevent resource exhaustion in multi-tenant clusters by rate-limiting clients. Types: (1) produce quota (bytes/sec), (2) fetch quota (bytes/sec), (3) request quota (% of broker capacity). Configure per-user: kafka-configs.sh --bootstrap-server localhost:9092 --alter --add-config 'producer_byte_rate=10485760,consumer_byte_rate=20971520,request_percentage=50' --entity-type users --entity-name alice. Per-client-id: --entity-type clients --entity-name my-app. Default quotas: --entity-type users --entity-default --add-config 'producer_byte_rate=1048576'. Exemptions: --entity-type users --entity-name admin --delete-config 'producer_byte_rate'. Behavior: quotas throttle requests (add latency) rather than reject. Monitor: kafka.server:type=Quota JMX metrics, throttle-time-avg client metric. Use for: multi-tenant SaaS, preventing noisy neighbors, enforcing SLAs, cost control. Best practices: set defaults, exempt monitoring/ops users, alert on throttling >10%.
Kafka anti-patterns (2025 edition): (1) Using Kafka as database - no point queries, updates, deletes; use PostgreSQL/MongoDB + CDC to Kafka. (2) Too many small topics - metadata overhead; consolidate into multi-tenant topics with filtering. (3) Single partition - no parallelism; use ≥10 partitions. (4) Large messages (>1MB) - broker memory pressure; store in S3, send reference in Kafka. (5) Synchronous request-reply - blocks producer; use async callbacks. (6) No consumer lag monitoring - lag grows unbounded; alert on lag >10k messages. (7) Ignoring rebalancing - 60s+ outages; use cooperative rebalancing, tune timeouts. (8) No schema management - breaking changes; use Schema Registry with compatibility checks. (9) Manual offset management without understanding - data loss/duplicates; use auto-commit or manual commit after processing. (10) No backpressure - consumer overwhelmed; use max.poll.records limits, rate limiting, autoscaling. Avoid these for production reliability.
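For anti-pattern (4), a minimal claim-check sketch: the large payload goes to object storage and only a reference is published (uploadToObjectStore is a hypothetical helper standing in for an S3/GCS client):

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClaimCheckProducer {
    // Publish a reference to the payload instead of the payload itself
    static void publishReference(KafkaProducer<String, String> producer, String orderId, byte[] largePayload) {
        String objectKey = uploadToObjectStore("orders/" + orderId + ".bin", largePayload); // hypothetical helper
        ProducerRecord<String, String> record = new ProducerRecord<>("orders", orderId, objectKey);
        record.headers().add("payload-location", objectKey.getBytes(StandardCharsets.UTF_8));
        producer.send(record); // consumers fetch the payload from object storage using the key
    }

    static String uploadToObjectStore(String key, byte[] payload) {
        // Placeholder: call an S3/MinIO/GCS SDK here and return the stored object's key or URL.
        return key;
    }
}
```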