Apache Kafka: Complete Guide to Distributed Event Streaming
Table of Contents
- Introduction
- What is Apache Kafka?
- Core Concepts
- Kafka Architecture
- Installing and Running Kafka
- Advanced Producer Patterns
- Advanced Consumer Patterns
- Kafka Streams
- Kafka Connect
- Monitoring and Operations
- Production Best Practices
- Security
- Troubleshooting
- Real-World Example: E-Commerce Event Pipeline
- Resources and Next Steps
- Conclusion
- Related Topics
Introduction
Apache Kafka is the world's most popular distributed event streaming platform, processing trillions of events daily at companies like LinkedIn, Netflix, Uber, and Airbnb. It powers real-time data pipelines, streaming analytics, event-driven microservices, and log aggregation at massive scale.
This comprehensive guide covers everything you need to master Kafka: core concepts, architecture, producers, consumers, Kafka Streams, Kafka Connect, and production deployment patterns.
What is Apache Kafka?
Apache Kafka is a distributed event streaming platform that enables you to:
- Publish and subscribe to streams of records (like a message queue)
- Store streams of records durably and reliably
- Process streams of records in real-time
Key Characteristics
- Distributed: Scales horizontally across clusters
- Fault-Tolerant: Replicates data for high availability
- High-Throughput: Handles millions of events per second
- Low-Latency: Delivers messages with end-to-end latencies as low as a few milliseconds
- Durable: Persists messages to disk
- Scalable: Linearly scales with cluster size
Use Cases
- Real-Time Data Pipelines: Move data between systems reliably
- Stream Processing: Transform and enrich data streams in real-time
- Event-Driven Architecture: Decouple microservices with events
- Log Aggregation: Centralize logs from distributed applications
- Metrics and Monitoring: Collect and process operational telemetry
- CDC (Change Data Capture): Stream database changes
- IoT Data Ingestion: Handle millions of device events
Core Concepts
Topics
A topic is a category or feed name to which records are published. Topics are partitioned and replicated.
# Create a topic
kafka-topics.sh --create \
--topic orders \
--partitions 3 \
--replication-factor 3 \
--bootstrap-server localhost:9092
Key Properties:
- Name: Unique identifier (e.g., user-signups, payment-events)
- Partitions: Parallel processing units
- Replication Factor: Number of copies for fault tolerance
- Retention: How long messages are stored (time or size-based)
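Topics can also be created programmatically. A minimal sketch using kafka-python's admin client, reusing the broker address and orders topic from the CLI example above:
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers=['localhost:9092'])

# Mirror of the CLI command above: 3 partitions, replication factor 3,
# plus a 7-day time-based retention policy
admin.create_topics([
    NewTopic(
        name='orders',
        num_partitions=3,
        replication_factor=3,
        topic_configs={'retention.ms': '604800000'},  # 7 days
    )
])
admin.close()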
Partitions
Partitions enable parallel processing and scalability.
Topic: orders (3 partitions)
Partition 0: [msg1] [msg5] [msg9]
Partition 1: [msg2] [msg6] [msg10]
Partition 2: [msg3] [msg7] [msg11]
Key Points:
- Each partition is an ordered, immutable sequence
- Messages within a partition are ordered
- Partitions are distributed across brokers
- Number of partitions determines max parallelism
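How a keyed message lands in a partition is deterministic: the default partitioner hashes the key and takes the result modulo the partition count, so the same key always maps to the same partition. An illustrative sketch of that idea (md5 is only a stand-in for Kafka's actual murmur2 hash):
import hashlib

def assign_partition(key: bytes, num_partitions: int) -> int:
    # Toy version of "hash the key, then modulo the partition count".
    # Kafka's default partitioner uses murmur2; md5 here just illustrates the idea.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], 'big') % num_partitions

# The same key always maps to the same partition
print(assign_partition(b'customer-42', 3))  # e.g. 1
print(assign_partition(b'customer-42', 3))  # same value again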
Producers
Producers publish messages to topics.
Python Example:
from kafka import KafkaProducer
import json
from datetime import datetime
# Create producer
producer = KafkaProducer(
bootstrap_servers=['localhost:9092'],
value_serializer=lambda v: json.dumps(v).encode('utf-8'),
key_serializer=lambda k: k.encode('utf-8') if k else None,
acks='all', # Wait for all replicas to acknowledge
retries=3,
max_in_flight_requests_per_connection=1 # Ensure ordering
)
# Send message
order = {
'order_id': '12345',
'customer_id': 'CUST-001',
'amount': 99.99,
'timestamp': datetime.now().isoformat()
}
# Asynchronous send
future = producer.send(
topic='orders',
key='12345', # Messages with same key go to same partition
value=order
)
# Synchronous send (wait for confirmation)
try:
record_metadata = future.get(timeout=10)
print(f"Message sent to {record_metadata.topic} "
f"partition {record_metadata.partition} "
f"offset {record_metadata.offset}")
except Exception as e:
print(f"Error sending message: {e}")
producer.close()
Key Configuration:
- acks: Acknowledgment level (0, 1, all)
- retries: Number of retry attempts
- compression.type: Compression algorithm (gzip, snappy, lz4, zstd)
- batch.size: Batch size in bytes for efficiency
- linger.ms: Wait time to batch messages
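In kafka-python these settings map to snake_case constructor arguments. A sketch combining them (the values are illustrative starting points, not tuned recommendations):
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    acks='all',               # acks
    retries=5,                # retries
    compression_type='lz4',   # compression.type
    batch_size=32768,         # batch.size (32 KB)
    linger_ms=10,             # linger.ms
)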
Consumers
Consumers read messages from topics. Consumers are organized into consumer groups.
Python Example:
from kafka import KafkaConsumer
import json
# Create consumer
consumer = KafkaConsumer(
'orders',
bootstrap_servers=['localhost:9092'],
group_id='order-processing-service',
auto_offset_reset='earliest', # Start from beginning if no offset
    enable_auto_commit=False, # Manual commit for at-least-once processing
value_deserializer=lambda m: json.loads(m.decode('utf-8')),
max_poll_records=100
)
# Process messages
try:
for message in consumer:
order = message.value
print(f"Processing order: {order['order_id']}")
# Process order
process_order(order)
# Manual commit after successful processing
consumer.commit()
except KeyboardInterrupt:
print("Shutting down consumer")
finally:
consumer.close()
Consumer Groups
Consumer groups enable parallel processing and load balancing.
Topic: orders (3 partitions)
Consumer Group: order-processors
├── Consumer 1 → Partition 0
├── Consumer 2 → Partition 1
└── Consumer 3 → Partition 2
Key Points:
- Each partition is consumed by exactly one consumer in a group
- Multiple consumer groups can read the same topic independently
- Adding consumers scales throughput (up to # of partitions)
- Kafka handles rebalancing automatically
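To see the assignment in action, here is a minimal sketch using the 3-partition orders topic from earlier: run it in two terminals and the group coordinator splits the partitions between the two processes.
from kafka import KafkaConsumer

# Run this script in two terminals. Both processes join the same group,
# so Kafka assigns each one a subset of the topic's partitions.
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    group_id='order-processors',  # same group => partitions are shared
)

consumer.poll(timeout_ms=5000)  # first poll triggers the group join / rebalance
print('Assigned partitions:', consumer.assignment())

for message in consumer:
    print(f'partition={message.partition} offset={message.offset}')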
Offsets
Offsets track consumer position in a partition.
Partition 0: [0] [1] [2] [3] [4] [5] [6] [7]
↑
Current Offset
Offset Management:
- Auto-commit: Kafka commits offsets periodically
- Manual commit: Application commits after processing
- Stored in Kafka: the __consumer_offsets topic
# Manual offset management
consumer = KafkaConsumer(
'orders',
enable_auto_commit=False
)
for message in consumer:
process(message)
# Commit synchronously
consumer.commit()
# Or commit asynchronously
consumer.commit_async()
Kafka Architecture
Cluster Components
┌─────────────────────────────────────────┐
│ ZooKeeper Ensemble │
│ (Coordination & Configuration) │
└─────────────────────────────────────────┘
↕
┌─────────────────────────────────────────┐
│ Kafka Cluster │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐│
│ │ Broker 1 │ │ Broker 2 │ │ Broker 3 ││
│ │(Leader P0)│ │(Leader P1)│ │(Leader P2)││
│ │(Follower)│ │(Follower)│ │(Follower)││
│ └──────────┘ └──────────┘ └──────────┘│
└─────────────────────────────────────────┘
↑ ↓
Producers Consumers
Brokers
Brokers are Kafka servers that store data and serve clients.
Responsibilities:
- Receive messages from producers
- Serve messages to consumers
- Replicate partitions
- Handle leader election
Configuration Example (server.properties):
# Broker ID (unique in cluster)
broker.id=1
# Listeners
listeners=PLAINTEXT://broker1:9092
# Log directory
log.dirs=/var/kafka-logs
# Zookeeper connection
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
# Number of threads
num.network.threads=8
num.io.threads=8
# Retention (7 days; 1 GB segments)
log.retention.hours=168
log.segment.bytes=1073741824
# Replication
default.replication.factor=3
min.insync.replicas=2
Replication
Replication provides fault tolerance.
Topic: orders, Partition 0, Replication Factor: 3
Broker 1 (Leader): [msg1] [msg2] [msg3] [msg4]
Broker 2 (Follower): [msg1] [msg2] [msg3] [msg4]
Broker 3 (Follower): [msg1] [msg2] [msg3]
↑ In-Sync
Key Concepts:
- Leader: Handles all reads and writes for a partition
- Followers: Replicate leader's data
- ISR (In-Sync Replicas): Followers that are caught up
- min.insync.replicas: Minimum ISR required for write
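The interplay between acks and min.insync.replicas is what turns replication into a durability guarantee: with acks='all' and min.insync.replicas=2, a write succeeds only once at least two replicas have it. A hedged sketch of the producer side when the ISR shrinks below that threshold (exception names as exposed by kafka-python):
from kafka import KafkaProducer
from kafka.errors import NotEnoughReplicasError, KafkaTimeoutError

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    acks='all',  # wait for all in-sync replicas
)

try:
    # If the ISR has fewer than min.insync.replicas members,
    # the broker rejects the write instead of silently losing durability.
    producer.send('orders', value=b'{"order_id": "12345"}').get(timeout=10)
except NotEnoughReplicasError:
    print('ISR below min.insync.replicas, write rejected; retry later')
except KafkaTimeoutError:
    print('Broker did not acknowledge in time')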
ZooKeeper (Legacy) vs KRaft
ZooKeeper (Traditional):
- Manages cluster metadata
- Handles leader election
- Stores configuration
KRaft (Modern - Kafka 3.0+):
- Kafka's native consensus protocol
- Removes ZooKeeper dependency
- Simpler operations
- Better scalability
KRaft Configuration:
# KRaft mode
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
# Before first start, format the storage directory with a cluster ID:
# bin/kafka-storage.sh format -t $(bin/kafka-storage.sh random-uuid) -c config/kraft/server.properties
Installing and Running Kafka
Docker Compose Setup (Easiest)
docker-compose.yml:
version: '3.8'
services:
zookeeper:
image: confluentinc/cp-zookeeper:7.5.0
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
ports:
- '2181:2181'
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - '9092:9092'
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Two listeners: one for containers on the Docker network, one for the host
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - '8080:8080'
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
Start Kafka:
docker-compose up -d
Access Kafka UI: http://localhost:8080
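Once the containers are up, a quick smoke test from the host confirms the broker is reachable (assuming the compose file above, with the host listener on localhost:9092):
from kafka import KafkaProducer, KafkaConsumer

# Produce one message...
producer = KafkaProducer(bootstrap_servers=['localhost:9092'])
producer.send('smoke-test', b'hello kafka')
producer.flush()

# ...and read it back (topic is auto-created; read from the beginning)
consumer = KafkaConsumer(
    'smoke-test',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    consumer_timeout_ms=5000,  # stop iterating after 5s of silence
)
for message in consumer:
    print(message.value)  # b'hello kafka'
    break
consumer.close()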
Manual Installation (Linux/macOS)
# Download Kafka
wget https://downloads.apache.org/kafka/3.6.0/kafka_2.13-3.6.0.tgz
tar -xzf kafka_2.13-3.6.0.tgz
cd kafka_2.13-3.6.0
# Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
# Start Kafka broker (in another terminal)
bin/kafka-server-start.sh config/server.properties
# Create topic
bin/kafka-topics.sh --create \
--topic test \
--bootstrap-server localhost:9092 \
--partitions 3 \
--replication-factor 1
# Send messages
bin/kafka-console-producer.sh \
--topic test \
--bootstrap-server localhost:9092
# Consume messages
bin/kafka-console-consumer.sh \
--topic test \
--from-beginning \
--bootstrap-server localhost:9092
Advanced Producer Patterns
Idempotent Producer
Ensures exactly-once delivery within a producer session.
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    enable_idempotence=True, # Exactly-once writes per partition within the session
    acks='all',
    retries=2147483647, # effectively infinite retries (Integer.MAX_VALUE)
    max_in_flight_requests_per_connection=5
)
Transactional Producer
Atomic writes across multiple topics/partitions.
from kafka import KafkaProducer
producer = KafkaProducer(
bootstrap_servers=['localhost:9092'],
transactional_id='order-processor-1',
enable_idempotence=True
)
# Initialize transactions
producer.init_transactions()
try:
# Begin transaction
producer.begin_transaction()
# Send multiple messages atomically
producer.send('orders', value=b'order1')
producer.send('inventory', value=b'reserved')
producer.send('notifications', value=b'email-sent')
# Commit transaction
producer.commit_transaction()
except Exception as e:
# Abort on error
producer.abort_transaction()
raise e
Custom Partitioner
Control partition assignment logic.
from kafka import KafkaProducer

def customer_partitioner(key_bytes, all_partitions, available_partitions):
    """Route VIP customers to partition 0; hash everyone else.

    kafka-python calls the partitioner with the already-serialized key bytes
    plus the lists of all and available partition ids.
    """
    if key_bytes and key_bytes.startswith(b'VIP'):
        return all_partitions[0]  # VIP partition
    # Hash-based for others (note: Python's hash() of bytes is randomized per
    # process; use a stable hash such as murmur2 or crc32 in production)
    return all_partitions[hash(key_bytes) % len(all_partitions)]

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    partitioner=customer_partitioner
)
Batching and Compression
Optimize throughput with batching and compression.
producer = KafkaProducer(
bootstrap_servers=['localhost:9092'],
compression_type='gzip', # or 'snappy', 'lz4', 'zstd'
batch_size=32768, # 32KB batches
linger_ms=10, # Wait 10ms to batch
buffer_memory=67108864 # 64MB buffer
)
Advanced Consumer Patterns
Manual Partition Assignment
Bypass consumer groups for specific use cases.
from kafka import KafkaConsumer, TopicPartition
consumer = KafkaConsumer(
bootstrap_servers=['localhost:9092'],
auto_offset_reset='earliest'
)
# Manually assign partitions
partitions = [TopicPartition('orders', 0), TopicPartition('orders', 1)]
consumer.assign(partitions)
# Seek to specific offset
consumer.seek(TopicPartition('orders', 0), 100)
for message in consumer:
print(f"Partition: {message.partition}, Offset: {message.offset}")
Exactly-Once Consumer
Combine with transactional producer for end-to-end exactly-once.
from kafka import KafkaConsumer, KafkaProducer
consumer = KafkaConsumer(
'input-topic',
bootstrap_servers=['localhost:9092'],
group_id='processor',
enable_auto_commit=False,
isolation_level='read_committed' # Only read committed messages
)
producer = KafkaProducer(
bootstrap_servers=['localhost:9092'],
transactional_id='processor-1',
enable_idempotence=True
)
producer.init_transactions()
for message in consumer:
try:
producer.begin_transaction()
# Process message
result = process(message.value)
# Send result
producer.send('output-topic', value=result)
# Commit consumer offset within transaction
producer.send_offsets_to_transaction(
{TopicPartition(message.topic, message.partition):
message.offset + 1},
consumer.config['group_id']
)
producer.commit_transaction()
except Exception as e:
producer.abort_transaction()
raise e
Parallel Consumer
Process messages in parallel while maintaining partition ordering.
from kafka import KafkaConsumer
from concurrent.futures import ThreadPoolExecutor
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    enable_auto_commit=False,  # commit manually after each batch
    max_poll_records=100
)
executor = ThreadPoolExecutor(max_workers=10)
def process_message(message):
# Heavy processing
result = expensive_operation(message.value)
return result
while True:
    # poll() returns {TopicPartition: [messages]}, not single messages
    records = consumer.poll(timeout_ms=1000, max_records=100)
    if not records:
        continue
    # Process the batch in parallel
    futures = [executor.submit(process_message, msg)
               for messages in records.values() for msg in messages]
    # Wait for all to complete
    for future in futures:
        result = future.result()
    # Commit after batch processed
    consumer.commit()
Kafka Streams
Kafka Streams is a client library for building real-time stream processing applications.
Word Count Example (Java)
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.*;
import java.util.Arrays;
import java.util.Properties;

public class WordCountApp {
    public static void main(String[] args) {
        // Streams configuration
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Input stream
        KStream<String, String> textLines = builder.stream("text-input");

        // Processing logic
        KTable<String, Long> wordCounts = textLines
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word)
            .count();

        // Output stream (counts are longs, so override the value serde)
        wordCounts.toStream()
            .to("word-count-output", Produced.with(Serdes.String(), Serdes.Long()));

        // Start application
        KafkaStreams streams = new KafkaStreams(builder.build(), config);
        streams.start();
    }
}
Python Alternative: Faust
import faust
app = faust.App('word-count', broker='kafka://localhost:9092')
# Input topic
text_topic = app.topic('text-input', value_type=str)
# Output topic
word_count_topic = app.topic('word-count-output')
# Table for counts
word_counts = app.Table('word-counts', default=int)
@app.agent(text_topic)
async def count_words(stream):
async for text in stream:
for word in text.lower().split():
word_counts[word] += 1
await word_count_topic.send(value={'word': word, 'count': word_counts[word]})
if __name__ == '__main__':
app.main()
Windowed Aggregations
from datetime import timedelta

# Tumbling-window table: revenue is bucketed into 1-hour windows
revenue_table = app.Table('hourly-revenue', default=float).tumbling(
    timedelta(hours=1), expires=timedelta(days=1)
)

@app.agent(orders_topic)
async def hourly_revenue(stream):
    async for order in stream:
        # Accumulate into the window covering this event's timestamp
        revenue_table['total'] += order.amount
        print(f"Revenue this hour: ${revenue_table['total'].current():.2f}")
Kafka Connect
Kafka Connect is a framework for integrating Kafka with external systems.
Source Connector (Database → Kafka)
Debezium MySQL Connector (CDC):
{
"name": "mysql-source-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "debezium",
"database.password": "secret",
"database.server.id": "184054",
"database.server.name": "production",
"database.include.list": "inventory",
"table.include.list": "inventory.customers,inventory.orders",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes"
}
}
Sink Connector (Kafka → Database)
JDBC Sink Connector:
{
"name": "postgres-sink-connector",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "orders",
"connection.url": "jdbc:postgresql://postgres:5432/analytics",
"connection.user": "kafka",
"connection.password": "secret",
"auto.create": "true",
"auto.evolve": "true",
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "order_id"
}
}
Deploy Connector
# REST API
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d @mysql-source-connector.json
# List connectors
curl http://localhost:8083/connectors
# Check status
curl http://localhost:8083/connectors/mysql-source-connector/status
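The same REST calls are easy to script. A sketch using the requests library against the Connect REST API on localhost:8083, reusing the connector name and JSON file from the example above:
import json
import requests

CONNECT_URL = 'http://localhost:8083'

# Create the connector from the JSON config above
with open('mysql-source-connector.json') as f:
    connector = json.load(f)

resp = requests.post(f'{CONNECT_URL}/connectors', json=connector)
resp.raise_for_status()

# List connectors and check the new one's status
print(requests.get(f'{CONNECT_URL}/connectors').json())
status = requests.get(f'{CONNECT_URL}/connectors/mysql-source-connector/status').json()
print(status['connector']['state'])  # e.g. RUNNING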
Monitoring and Operations
Key Metrics
Broker Metrics:
- UnderReplicatedPartitions: Partitions not fully replicated
- OfflinePartitionsCount: Partitions without a leader
- ActiveControllerCount: Should be 1
- BytesInPerSec: Incoming throughput
- BytesOutPerSec: Outgoing throughput
Producer Metrics:
- record-send-rate: Messages sent per second
- record-error-rate: Failed sends per second
- request-latency-avg: Average request latency
Consumer Metrics:
- records-lag-max: Maximum consumer lag
- fetch-rate: Fetch requests per second
Monitoring with JMX
# Enable JMX
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=9999 \
-Dcom.sun.management.jmxremote.authenticate=false"
# Query metrics
kafka-run-class.sh kafka.tools.JmxTool \
--object-name kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
Consumer Lag Monitoring
# Check consumer group lag
kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--describe \
--group order-processors
# Output:
# GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
# order-processors orders 0 1500 1550 50
# order-processors orders 1 2300 2300 0
# order-processors orders 2 1800 1900 100
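Lag can also be computed programmatically by comparing the group's committed offsets with each partition's log-end offset. A sketch using kafka-python, with the group and topic names from the output above:
from kafka import KafkaConsumer, TopicPartition

# Use the same group_id as the group whose lag we want to inspect
consumer = KafkaConsumer(
    bootstrap_servers=['localhost:9092'],
    group_id='order-processors',
    enable_auto_commit=False,
)

partitions = [TopicPartition('orders', p) for p in consumer.partitions_for_topic('orders')]
consumer.assign(partitions)

end_offsets = consumer.end_offsets(partitions)   # log-end offset per partition
for tp in partitions:
    committed = consumer.committed(tp) or 0      # last committed offset for the group
    print(f'{tp.topic}[{tp.partition}] lag={end_offsets[tp] - committed}')

consumer.close()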
Production Best Practices
Topic Configuration
# Create production topic (7-day retention, 1 GB segments)
kafka-topics.sh --create \
  --topic production-events \
  --partitions 12 \
  --replication-factor 3 \
  --config retention.ms=604800000 \
  --config compression.type=lz4 \
  --config min.insync.replicas=2 \
  --config segment.bytes=1073741824 \
  --bootstrap-server kafka:9092
Durability vs Throughput Trade-offs
Maximum Durability:
producer = KafkaProducer(
    acks='all', # Wait for all ISR
    retries=2147483647, # effectively infinite retries (Integer.MAX_VALUE)
    max_in_flight_requests_per_connection=1,
    enable_idempotence=True
)
Maximum Throughput:
producer = KafkaProducer(
acks=1, # Wait only for leader
compression_type='lz4',
batch_size=65536, # 64KB
linger_ms=100,
buffer_memory=134217728 # 128MB
)
Capacity Planning
Throughput Calculation:
Message Size: 1 KB
Target Throughput: 100,000 msg/sec
Replication Factor: 3
Network Bandwidth Required:
= 1 KB × 100,000 msg/sec × 3 replicas
= 300 MB/sec = 2.4 Gbps
Disk Space (7-day retention):
= 1 KB × 100,000 msg/sec × 86,400 sec/day × 7 days × 3 replicas
= 181 TB
Partition Count:
Target Throughput: 100,000 msg/sec
Per-Partition Throughput: 10,000 msg/sec
Partitions Needed: 100,000 / 10,000 = 10
Add buffer: 12-15 partitions
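The same arithmetic is easy to wrap in a small helper and replay for your own workload. A sketch that reproduces the numbers above from its inputs (message size, throughput, replication, retention, and per-partition throughput):
def capacity_plan(msg_bytes, msgs_per_sec, replication, retention_days,
                  per_partition_msgs_per_sec):
    """Back-of-the-envelope sizing, mirroring the calculation above."""
    bandwidth_mb_s = msg_bytes * msgs_per_sec * replication / 1_000_000
    disk_tb = (msg_bytes * msgs_per_sec * 86_400 * retention_days
               * replication) / 1_000_000_000_000
    partitions = -(-msgs_per_sec // per_partition_msgs_per_sec)  # ceiling division
    return bandwidth_mb_s, disk_tb, partitions

bw, disk, parts = capacity_plan(
    msg_bytes=1_000, msgs_per_sec=100_000, replication=3,
    retention_days=7, per_partition_msgs_per_sec=10_000,
)
print(f'{bw:.0f} MB/s network, {disk:.0f} TB disk, {parts} partitions (add a 20-50% buffer)')
# -> 300 MB/s network, 181 TB disk, 10 partitions (add a 20-50% buffer)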
Security
SSL/TLS Encryption
server.properties:
listeners=SSL://broker1:9093
advertised.listeners=SSL://broker1:9093
ssl.keystore.location=/var/ssl/kafka.server.keystore.jks
ssl.keystore.password=secret
ssl.key.password=secret
ssl.truststore.location=/var/ssl/kafka.server.truststore.jks
ssl.truststore.password=secret
Client Configuration:
producer = KafkaProducer(
bootstrap_servers=['broker1:9093'],
security_protocol='SSL',
ssl_cafile='/path/to/ca-cert',
ssl_certfile='/path/to/client-cert',
ssl_keyfile='/path/to/client-key'
)
SASL Authentication
listeners=SASL_SSL://broker1:9093
security.inter.broker.protocol=SASL_SSL
sasl.mechanism.inter.broker.protocol=PLAIN
sasl.enabled.mechanisms=PLAIN
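On the client side, the matching kafka-python settings look roughly like this (a sketch assuming the PLAIN mechanism over SASL_SSL, with placeholder credentials):
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['broker1:9093'],
    security_protocol='SASL_SSL',
    sasl_mechanism='PLAIN',
    sasl_plain_username='alice',        # placeholder credentials
    sasl_plain_password='alice-secret',
    ssl_cafile='/path/to/ca-cert',      # trust the broker's certificate
)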
ACLs (Access Control Lists)
# Grant produce permission
kafka-acls.sh --bootstrap-server kafka:9092 \
--add \
--allow-principal User:alice \
--operation Write \
--topic orders
# Grant consume permission
kafka-acls.sh --bootstrap-server kafka:9092 \
--add \
--allow-principal User:bob \
--operation Read \
--topic orders \
--group order-processors
Troubleshooting
Issue 1: High Consumer Lag
Diagnosis:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group
Solutions:
- Add more consumers (up to # of partitions)
- Optimize processing logic
- Increase max.poll.records to process larger batches per poll
- Increase max.poll.interval.ms if slow processing is triggering rebalances
Issue 2: UnderReplicated Partitions
Diagnosis:
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions
Solutions:
- Check broker health
- Increase replica.lag.time.max.ms
- Add more disk I/O capacity
- Check network connectivity
Issue 3: Out of Memory
Symptoms: Brokers crashing, slow performance
Solutions:
# Increase heap size
export KAFKA_HEAP_OPTS="-Xmx6g -Xms6g"
# Tune GC
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20"
Real-World Example: E-Commerce Event Pipeline
Architecture
┌─────────────┐
│ Web App │─────→ orders-topic ─────→ ┌──────────────┐
└─────────────┘ │ Order Service│
└──────────────┘
┌─────────────┐ ↓
│ Mobile App │─────→ clicks-topic ↓
└─────────────┘ ┌──────────────┐
│ Inventory │
┌─────────────┐ └──────────────┘
│ API │─────→ payments-topic ↓
└─────────────┘ ↓
┌──────────────┐
│ Notification │
└──────────────┘
Order Service (Consumer + Producer)
from kafka import KafkaConsumer, KafkaProducer
import json
# Initialize consumer and producer
consumer = KafkaConsumer(
'orders-topic',
bootstrap_servers=['kafka:9092'],
group_id='order-service',
enable_auto_commit=False,
value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)
producer = KafkaProducer(
bootstrap_servers=['kafka:9092'],
value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
# Process orders
for message in consumer:
order = message.value
try:
# Validate order
if validate_order(order):
# Reserve inventory
producer.send('inventory-topic', {
'action': 'reserve',
'order_id': order['order_id'],
'items': order['items']
})
# Send notification
producer.send('notification-topic', {
'type': 'order_confirmation',
'customer_id': order['customer_id'],
'order_id': order['order_id']
})
# Commit offset
consumer.commit()
except Exception as e:
print(f"Error processing order: {e}")
# Don't commit - will retry
Resources and Next Steps
Ecosystem Tools
- Schema Registry: Manage Avro/Protobuf/JSON schemas
- KSQL: SQL interface for stream processing
- Kafka Manager: Web-based cluster management
- Cruise Control: Automated cluster operations
Conclusion
Apache Kafka has become the backbone of modern data infrastructure, enabling real-time data pipelines, event-driven architectures, and stream processing at massive scale.
Key Takeaways:
- Kafka provides distributed, fault-tolerant event streaming
- Topics and partitions enable horizontal scalability
- Producer/consumer APIs are simple yet powerful
- Kafka Streams and Kafka Connect extend capabilities
- Production deployment requires careful capacity planning and monitoring
Whether you're building real-time analytics, microservices communication, or log aggregation systems, Kafka provides the scalability, durability, and performance needed for modern data-intensive applications.
Start with a local Docker setup, experiment with producers and consumers, and gradually explore advanced features like Kafka Streams and Kafka Connect as your use cases evolve.
Related Topics
- Apache Spark - Process Kafka streams with Spark Structured Streaming
- Apache Airflow - Orchestrate Kafka-based data pipelines
- Databricks - Stream processing with Delta Lake and Kafka