Apache Kafka: Event-Driven Architecture Without the PhD
Learn Apache Kafka from scratch -- topics, partitions, producers, consumers, and real-world patterns with Python examples and Docker setup.
Every backend developer eventually hits a wall. You have Service A that needs to tell Service B something happened. So you make an HTTP call. Then Service C needs to know too. And Service D. Now you have a web of synchronous calls, and when Service C goes down for 30 seconds, everything backs up.
This is where event-driven architecture enters the picture, and Apache Kafka is the tool most teams reach for. It's a distributed event streaming platform -- think of it as a very durable, very fast message highway between your services.
Kafka has a reputation for being complex. And at scale, it absolutely is. But getting started, understanding the core concepts, and building useful things with it? That's more approachable than you'd think.
What Event-Driven Architecture Actually Means
In a traditional request-response architecture, services talk directly to each other. Service A calls Service B's API, waits for a response, and continues. This is simple and works well until it doesn't.
In an event-driven architecture, services communicate by publishing and subscribing to events. Service A publishes an "order-placed" event. Services B, C, and D each independently consume that event and do their own thing. Service A doesn't know or care what happens next.
The benefits:
- Decoupling -- Services don't need to know about each other
- Resilience -- If Service C is down, the event waits. No data loss.
- Scalability -- Add more consumers without changing the producer
- Audit trail -- Every event is stored and replayable
Kafka is the backbone that makes this work at scale.
Kafka Core Concepts
Before touching any code, you need to understand five things:
Topics
A topic is a named stream of events. Think of it as a category or channel. user-signups, order-events, payment-transactions -- each is a topic. Producers write to topics. Consumers read from topics.
Partitions
Each topic is split into partitions -- numbered, ordered sequences of events. Partitions are Kafka's secret weapon for parallelism. If your topic has 4 partitions, you can have 4 consumers reading in parallel, each handling a different partition.
Events within a partition are strictly ordered. Events across partitions are not. This matters when ordering is important (e.g., all events for user #42 should be processed in order). You control which partition an event goes to by setting a key.
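A toy model makes key-based partitioning concrete. This sketch uses a simple CRC32-modulo scheme purely for illustration -- real clients ship their own partitioners (the Java client hashes keys with murmur2; librdkafka defaults to a consistent CRC32-based hash) -- but the principle is the same: hash the key, take it modulo the partition count.

```python
# Toy model of key-based partitioning. Real Kafka clients use murmur2
# (Java) or a consistent CRC32 hash (librdkafka); zlib.crc32 here just
# illustrates the idea.
import zlib

NUM_PARTITIONS = 4

def pick_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash the key and map it onto a partition number."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# The same key always lands on the same partition...
assert pick_partition("user-42") == pick_partition("user-42")

# ...so all events for one user stay ordered, while different keys
# spread across partitions for parallelism.
for key in ["user-42", "user-7", "user-99"]:
    print(f"{key} -> partition {pick_partition(key)}")
```

This is also why you can't grow a topic's partition count without consequences: changing `num_partitions` changes where existing keys map.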
Producers
Producers publish events to topics. They're the writers. A producer sends a message (a key-value pair plus optional headers) to a specific topic. If you provide a key, Kafka hashes it to determine the partition. Same key always goes to the same partition, which guarantees ordering for that key.
Consumers and Consumer Groups
Consumers read events from topics. They're the readers. But here's the important part: consumers belong to consumer groups.
Within a consumer group, each partition is assigned to exactly one consumer. If you have 4 partitions and 2 consumers in the same group, each consumer handles 2 partitions. If you have 4 consumers, each handles 1 partition. If you have 5 consumers... one sits idle (you can't have more consumers than partitions).
Different consumer groups independently track their own position (offset) in the topic. So your analytics service and your notification service can each consume the same events at their own pace.
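The partition-to-consumer arithmetic above is easy to simulate. This is a toy round-robin assignment, not Kafka's actual group protocol (Kafka ships several assignment strategies -- range, round-robin, cooperative-sticky -- and handles rebalancing when consumers join or leave), but it shows how partitions map to members of one group:

```python
# Toy version of consumer-group partition assignment. Kafka's group
# coordinator does this (plus rebalancing) for real.

def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread partitions round-robin over the consumers in one group."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        consumer = consumers[partition % len(consumers)]
        assignment[consumer].append(partition)
    return assignment

print(assign_partitions(4, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}

print(assign_partitions(4, ["c1", "c2", "c3", "c4", "c5"]))
# c5 gets no partitions -- it sits idle
```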
Offsets
Every event in a partition has a sequential number called an offset. Consumers track which offset they've processed. If a consumer crashes and restarts, it picks up from its last committed offset. No events lost.
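The commit-and-resume contract can be modeled in a few lines. A simplified sketch -- real committed offsets live in Kafka's internal __consumer_offsets topic, per group and partition -- but the behavior is the same:

```python
# Toy model of offset tracking: a consumer group commits the offset of
# the next event to read, and resumes from there after a restart.
committed: dict = {}  # (group, partition) -> next offset to read

def consume(group: str, partition: int, log: list, batch: int) -> list:
    """Read up to `batch` events starting at the last committed offset."""
    start = committed.get((group, partition), 0)
    events = log[start:start + batch]
    committed[(group, partition)] = start + len(events)  # commit
    return events

log = ["e0", "e1", "e2", "e3", "e4"]
print(consume("analytics", 0, log, 3))  # ['e0', 'e1', 'e2']
# Simulated crash + restart: the next poll resumes at offset 3. No events lost.
print(consume("analytics", 0, log, 3))  # ['e3', 'e4']
```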
Setting Up Kafka with Docker
Running Kafka locally used to be painful. Docker makes it reasonable:
# docker-compose.yml
version: '3.8'
services:
  kafka:
    image: confluentinc/confluent-local:7.6.0
    hostname: kafka
    container_name: kafka
    ports:
      - "9092:9092"
      - "9101:9101"
    environment:
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka:29093'
      KAFKA_LISTENERS: 'PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:29093'
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_NODE_ID: 1
      KAFKA_CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
This uses KRaft mode (no ZooKeeper needed -- Kafka manages its own metadata since version 3.3).
docker-compose up -d
Verify it's running:
docker exec -it kafka kafka-topics --bootstrap-server localhost:9092 --list
Create your first topic:
docker exec -it kafka kafka-topics --bootstrap-server localhost:9092 \
--create --topic user-events --partitions 3 --replication-factor 1
Producing and Consuming with Python
Install the confluent-kafka library -- it's a Python wrapper around Kafka's C client (librdkafka), so it's fast:
pip install confluent-kafka
Producing Events
# producer.py
import json
from confluent_kafka import Producer

def delivery_report(err, msg):
    """Callback for message delivery confirmation."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}")

producer = Producer({
    'bootstrap.servers': 'localhost:9092',
    'client.id': 'my-producer'
})

# Produce some events
events = [
    {"user_id": "u001", "action": "signup", "email": "alice@example.com"},
    {"user_id": "u002", "action": "signup", "email": "bob@example.com"},
    {"user_id": "u001", "action": "login", "ip": "192.168.1.10"},
    {"user_id": "u003", "action": "signup", "email": "carol@example.com"},
    {"user_id": "u002", "action": "login", "ip": "10.0.0.5"},
]

for event in events:
    producer.produce(
        topic='user-events',
        key=event["user_id"],  # Same user -> same partition -> ordered
        value=json.dumps(event),
        callback=delivery_report
    )
    producer.poll(0)  # Trigger delivery callbacks

# Wait for all messages to be delivered
producer.flush()
print("All messages delivered")
Key points:
- The key determines the partition. All events for u001 go to the same partition, so they're processed in order.
- poll(0) triggers delivery callbacks without blocking.
- flush() blocks until all queued messages are delivered. Always call this before exiting.
Consuming Events
# consumer.py
import json
from confluent_kafka import Consumer, KafkaError

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'user-event-processors',
    'auto.offset.reset': 'earliest',  # Start from beginning if no committed offset
    'enable.auto.commit': True,
})

consumer.subscribe(['user-events'])
print("Waiting for events... (Ctrl+C to stop)")

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                print(f"Reached end of partition {msg.partition()}")
            else:
                print(f"Error: {msg.error()}")
            continue

        event = json.loads(msg.value().decode('utf-8'))
        print(f"Received: partition={msg.partition()}, offset={msg.offset()}, "
              f"key={msg.key().decode('utf-8')}, event={event}")

        # Process the event
        if event["action"] == "signup":
            print(f"  -> New user: {event.get('email', 'unknown')}")
        elif event["action"] == "login":
            print(f"  -> User login from {event.get('ip', 'unknown')}")
except KeyboardInterrupt:
    print("Shutting down...")
finally:
    consumer.close()
Run the consumer in one terminal, then run the producer in another. You'll see events flowing through.
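One caveat before moving on: json.loads raises on a malformed payload, and a single bad message would kill a loop like the one above. A small guard you might wrap around deserialization -- a sketch only, since in production you'd typically route bad messages to a dead-letter topic rather than just logging them:

```python
import json

def safe_deserialize(raw: bytes):
    """Return the parsed event, or None if the payload is malformed."""
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as e:
        # In production, send the raw bytes to a dead-letter topic instead.
        print(f"Skipping malformed message: {e}")
        return None

assert safe_deserialize(b'{"action": "signup"}') == {"action": "signup"}
assert safe_deserialize(b"not json at all") is None
```

In the consumer loop, a None result just means `continue` to the next message instead of crashing.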
Real-World Pattern: Order Processing Pipeline
Let's build something closer to reality -- an order processing system with multiple consumers:
# order_producer.py
import json
import uuid
import random
import time
from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:9092'})

products = ["Widget A", "Gadget B", "Doohickey C", "Thingamajig D"]

def produce_order():
    order = {
        "order_id": str(uuid.uuid4())[:8],
        "customer_id": f"cust-{random.randint(100, 999)}",
        "product": random.choice(products),
        "quantity": random.randint(1, 5),
        "price": round(random.uniform(9.99, 99.99), 2),
        "timestamp": time.time()
    }
    producer.produce(
        topic='orders',
        key=order["customer_id"],
        value=json.dumps(order)
    )
    producer.poll(0)
    return order

# Simulate incoming orders
for i in range(20):
    order = produce_order()
    print(f"Order placed: {order['order_id']} - {order['product']} x{order['quantity']}")
    time.sleep(0.5)

producer.flush()
Now create separate consumers for different concerns:
# inventory_consumer.py
import json
from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'inventory-service',  # Different group = independent consumption
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['orders'])
print("[Inventory Service] Listening for orders...")

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        order = json.loads(msg.value().decode('utf-8'))
        print(f"[Inventory] Reducing stock: {order['product']} by {order['quantity']}")
except KeyboardInterrupt:
    consumer.close()
# notification_consumer.py
import json
from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'notification-service',  # Different group!
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['orders'])
print("[Notification Service] Listening for orders...")

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        order = json.loads(msg.value().decode('utf-8'))
        print(f"[Notify] Sending confirmation to {order['customer_id']}: "
              f"Order {order['order_id']} placed")
except KeyboardInterrupt:
    consumer.close()
Run both consumers, then the producer. Each consumer processes every order independently. The inventory service and notification service don't know about each other. Add an analytics service next week? Just create a new consumer with a new group ID. No changes to anything else.
Manual Offset Management
Auto-commit is convenient but risky. Offsets are committed in the background on a timer, so a consumer can commit an offset before it has actually finished processing the message. If it crashes in that window, the group resumes past the message and it's effectively lost. For critical data, commit manually:
consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'careful-consumers',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,  # We'll handle this ourselves
})
consumer.subscribe(['orders'])

batch = []
BATCH_SIZE = 10

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        batch.append(msg)

        if len(batch) >= BATCH_SIZE:
            # Process the batch
            for m in batch:
                order = json.loads(m.value().decode('utf-8'))
                process_order(order)  # Your business logic
            # Commit AFTER successful processing
            consumer.commit(asynchronous=False)
            print(f"Committed batch of {len(batch)} messages")
            batch = []
except KeyboardInterrupt:
    # Process remaining messages before shutting down
    if batch:
        for m in batch:
            process_order(json.loads(m.value().decode('utf-8')))
        consumer.commit(asynchronous=False)
finally:
    consumer.close()
Kafka Admin Operations with Python
You can manage topics programmatically:
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({'bootstrap.servers': 'localhost:9092'})

# Create topics
new_topics = [
    NewTopic("analytics-events", num_partitions=6, replication_factor=1),
    NewTopic("error-logs", num_partitions=3, replication_factor=1),
]
fs = admin.create_topics(new_topics)
for topic, f in fs.items():
    try:
        f.result()  # Raises on failure
        print(f"Topic '{topic}' created")
    except Exception as e:
        print(f"Failed to create '{topic}': {e}")

# List topics
metadata = admin.list_topics(timeout=10)
for topic in metadata.topics:
    print(f"Topic: {topic}, Partitions: {len(metadata.topics[topic].partitions)}")
When Kafka Is Overkill
Kafka is powerful, but it's not always the right tool. Don't use Kafka when:
- You have 2-3 services and simple HTTP calls work fine. Adding Kafka to a small system introduces complexity without much benefit.
- You need request-response semantics. Kafka is fire-and-forget by design. If you need a response from the consumer, you're fighting the abstraction.
- Your volume is low. Processing 100 events per day? A simple database queue or even Redis Pub/Sub will do.
- You don't have the operational capacity. Kafka clusters need monitoring, disk management, and partition rebalancing. If you're a solo developer, consider managed alternatives (Confluent Cloud, AWS MSK, or Upstash Kafka).

Lighter alternatives worth knowing:
- Redis Streams -- Lightweight, fast, good for moderate volumes
- RabbitMQ -- Better for task queues and request-reply patterns
- AWS SQS/SNS -- Fully managed, pay-per-message, zero ops
- PostgreSQL LISTEN/NOTIFY -- If your events are already in Postgres
Common Mistakes
- Not setting a message key -- Without a key, events are round-robined across partitions, which breaks ordering guarantees. If order matters (and for user events, it usually does), set a key.
- Too many or too few partitions -- You can't reduce partitions after creation. Start with 3-6 for development. In production, partition count roughly equals your maximum consumer parallelism.
- Ignoring consumer lag -- If your consumers fall behind, events pile up. Monitor lag and either scale consumers or fix slow processing.
- Not handling deserialization errors -- A single malformed message can crash your consumer loop. Always wrap deserialization in try/except.
- Treating Kafka like a database -- Kafka retains messages for a configurable period (default 7 days), not forever. It's a stream, not a data store. If you need permanent storage, consume events and write them to a database.
What's Next
This tutorial covers the fundamentals, but Kafka's ecosystem goes deep:
- Kafka Connect -- Pre-built connectors for databases, S3, Elasticsearch, and dozens more
- Schema Registry -- Enforce message schemas with Avro or Protobuf
- Kafka Streams / ksqlDB -- Stream processing directly on Kafka data
- Exactly-once semantics -- Transactional producers and idempotent consumers
- Multi-cluster replication -- MirrorMaker 2 for geo-distributed setups
For more backend architecture and Python tutorials, check out CodeUp.