Event-driven architecture (EDA) has a reputation problem. Mention it to a small team and they picture Kafka clusters, schema registries, complex consumer groups, and a distributed systems PhD on the payroll. The enterprise literature does not help — it assumes you have a platform team, a dedicated infrastructure budget, and services counted in dozens.
But event-driven patterns can dramatically simplify small applications too. The key is choosing the right level of complexity for your team size. A 5-person startup does not need Kafka. They need patterns that decouple their services, handle failures gracefully, and can scale when the time comes.
This article presents event-driven architecture at three levels of complexity, with real implementation patterns for each. Start at level one and graduate to level two or three only when you hit specific scaling walls.
Why Events Matter for Small Teams
Consider a typical SaaS application. A user signs up. What happens next?
- Create the user record in the database
- Send a welcome email
- Create a Stripe customer
- Set up default project and workspace
- Notify the sales team in Slack
- Track the event in analytics
In a synchronous architecture, all of this happens in the signup handler:
// The synchronous nightmare
async function handleSignup(req, res) {
  const user = await db.users.create(req.body); // 50ms
  await sendWelcomeEmail(user); // 800ms
  await stripe.customers.create({ email: user.email }); // 400ms
  await createDefaultWorkspace(user.id); // 200ms
  await notifySlack(`New signup: ${user.email}`); // 300ms
  await analytics.track("user.signup", user); // 150ms
  // Total: ~1900ms — user waits almost 2 seconds
  // If Stripe is down, signup fails entirely
  res.json({ user });
}
This has three critical problems: the user waits for all downstream operations, any single failure blocks the entire signup, and every new side effect makes the handler slower and more fragile.
The event-driven version is different: the signup handler creates the user, publishes an event, and returns. Everything else happens asynchronously:
// Event-driven: clean, fast, resilient
async function handleSignup(req, res) {
  const user = await db.users.create(req.body); // 50ms
  await events.publish("user.signup", { userId: user.id, email: user.email });
  res.json({ user }); // Total: ~60ms
}

// Each side effect is an independent handler
events.on("user.signup", sendWelcomeEmail);
events.on("user.signup", createStripeCustomer);
events.on("user.signup", createDefaultWorkspace);
events.on("user.signup", notifySlackSales);
events.on("user.signup", trackAnalytics);
Response time drops from 1900ms to 60ms. If Stripe is down, the user still signs up — the Stripe handler retries independently. Adding a new side effect is adding one handler, not modifying a critical code path.
Level 1: In-Process Events (0-3 Services)
For a single application or a monolith, you do not need a message broker. An in-process event emitter with a persistent job queue handles most event-driven patterns.
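Stripped to its essence, the emitter side is small. Here is a minimal, non-persistent sketch of the idea (the class and its names are ours, not from any library) — events vanish if the process crashes, which is exactly why the BullMQ implementation that follows persists jobs in Redis:

```typescript
// Minimal in-process event bus — illustration only, no persistence.
// Failures are isolated: one failing subscriber never blocks the others.
type Handler = (data: unknown) => Promise<void> | void;

class EventBus {
  private handlers = new Map<string, Handler[]>();

  on(eventName: string, handler: Handler): void {
    const list = this.handlers.get(eventName) ?? [];
    list.push(handler);
    this.handlers.set(eventName, list);
  }

  // Run every handler, collecting failures instead of throwing on the first.
  async publish(
    eventName: string,
    data: unknown
  ): Promise<{ ok: number; failed: number }> {
    const list = this.handlers.get(eventName) ?? [];
    // async wrapper turns synchronous throws into rejected promises
    const results = await Promise.allSettled(list.map(async (h) => h(data)));
    const failed = results.filter((r) => r.status === "rejected").length;
    return { ok: results.length - failed, failed };
  }
}
```

This already buys the decoupling shown above; what it cannot do is retry a handler after a crash, which is the job queue's contribution.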
Implementation with BullMQ (Node.js)
// events/emitter.ts
import { Queue, Worker } from "bullmq";
import Redis from "ioredis";

// BullMQ workers use blocking Redis commands, so ioredis must be created
// with maxRetriesPerRequest: null (BullMQ rejects the connection otherwise)
const connection = new Redis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null,
});

// One queue per event type
const queues = new Map<string, Queue>();

export function getQueue(eventName: string): Queue {
  if (!queues.has(eventName)) {
    queues.set(eventName, new Queue(eventName, { connection }));
  }
  return queues.get(eventName)!;
}

export async function publish(eventName: string, data: Record<string, unknown>) {
  const queue = getQueue(eventName);
  await queue.add(eventName, data, {
    attempts: 3,
    backoff: { type: "exponential", delay: 1000 },
    removeOnComplete: { count: 1000 },
    removeOnFail: { count: 5000 },
  });
}

// Register a handler for an event
export function on(
  eventName: string,
  handler: (data: Record<string, unknown>) => Promise<void>,
  options?: { concurrency?: number }
) {
  const worker = new Worker(
    eventName,
    async (job) => {
      await handler(job.data);
    },
    {
      connection,
      concurrency: options?.concurrency ?? 5,
      limiter: { max: 10, duration: 1000 }, // Rate limit
    }
  );
  worker.on("failed", (job, err) => {
    console.error(`Handler for ${eventName} failed:`, {
      jobId: job?.id,
      attempt: job?.attemptsMade,
      error: err.message,
    });
  });
  return worker;
}
// handlers/signup.ts — Clean, isolated handlers
import { on } from "../events/emitter";

on("user.signup", async (data) => {
  const { userId, email } = data;
  await emailService.send({
    to: email,
    template: "welcome",
    data: { userId },
  });
}, { concurrency: 10 });

on("user.signup", async (data) => {
  const { userId, email } = data;
  const customer = await stripe.customers.create({
    email,
    metadata: { userId },
  });
  await db.users.update(userId, { stripeCustomerId: customer.id });
}, { concurrency: 3 }); // Lower concurrency for Stripe rate limits
This gives you retry logic, concurrency control, rate limiting, and failure tracking — all with Redis as the only infrastructure dependency (which you likely already have for caching).
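The retry behavior is worth making concrete. BullMQ's built-in "exponential" backoff computes the wait before attempt *n* as `delay * 2^(n - 1)`, so the settings above (three attempts, 1000ms base delay) retry after 1s and then 2s before the job lands in the failed set:

```typescript
// BullMQ's built-in exponential backoff: delay * 2^(attemptsMade - 1).
// With attempts: 3 and delay: 1000, a job is tried, retried after 1s,
// and retried once more after 2s before being marked failed.
function backoffDelay(attemptsMade: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attemptsMade - 1);
}
```

Tune the base delay per queue: a flaky email provider tolerates aggressive retries, while a rate-limited API like Stripe deserves a longer base.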
Level 2: Message Broker (3-10 Services)
When you have multiple services that need to react to events, an in-process emitter is not enough — you need a message broker. For small teams, two options cover nearly every case: Redis Streams to start, and RabbitMQ or NATS when routing gets complex.
Redis Streams
If you already run Redis, Redis Streams is the path of least resistance. It provides pub/sub with persistence, consumer groups, and acknowledgment — essentially a lightweight Kafka.
// Producer: publish events to a Redis Stream
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);

async function publishEvent(stream: string, event: { type: string }) {
  const id = await redis.xadd(
    stream,
    "*", // Auto-generate ID
    "data", JSON.stringify(event),
    "type", event.type,
    "timestamp", Date.now().toString()
  );
  return id;
}
// Consumer group: each service gets its own group
async function createConsumerGroup(stream: string, group: string) {
  try {
    await redis.xgroup("CREATE", stream, group, "0", "MKSTREAM");
  } catch (err) {
    // Group already exists, that's fine
    if (!(err instanceof Error) || !err.message.includes("BUSYGROUP")) throw err;
  }
}
// Consumer: read events with acknowledgment
async function consumeEvents(
  stream: string,
  group: string,
  consumer: string,
  handler: (event: object) => Promise<void>
) {
  while (true) {
    const results = await redis.xreadgroup(
      "GROUP", group, consumer,
      "COUNT", 10,
      "BLOCK", 5000, // Wait up to 5s for new events
      "STREAMS", stream, ">"
    );
    if (!results) continue;
    for (const [, messages] of results) {
      for (const [id, fields] of messages) {
        const data = JSON.parse(fields[1]); // "data" field
        try {
          await handler(data);
          await redis.xack(stream, group, id);
        } catch (err) {
          console.error(`Failed to process ${id}:`, err);
          // Unacked messages stay in the group's pending list; a periodic
          // XAUTOCLAIM sweep is needed to reclaim and retry them
        }
      }
    }
  }
}
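One fragile spot above is `fields[1]`, which assumes `data` is always the first field in the entry. Redis returns stream fields as a flat `[key, value, key, value, ...]` array, so a small helper (ours, not part of ioredis) that folds it into an object is sturdier:

```typescript
// Redis Stream entries arrive as a flat array: ["data", "{...}", "type", ...].
// Fold the pairs into a plain object so consumers look fields up by name
// instead of by position.
function parseStreamFields(fields: string[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (let i = 0; i + 1 < fields.length; i += 2) {
    out[fields[i]] = fields[i + 1];
  }
  return out;
}
```

With this in place, the consumer reads `parseStreamFields(fields).data` and keeps working even if the producer reorders or adds fields.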
When to Graduate to RabbitMQ or NATS
Redis Streams works well for simple event routing. When you need:
- Complex routing (topic-based, header-based)
- Dead letter queues with sophisticated retry policies
- Priority queues
- Large message payloads (Redis caps any value at 512MB, and performance suffers long before that)
Then RabbitMQ or NATS is the right step. Both are dramatically simpler to operate than Kafka and handle millions of messages per day on a single node.
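To make "complex routing" concrete: RabbitMQ topic exchanges match dot-separated routing keys against binding patterns where `*` matches exactly one segment and `#` matches zero or more. A sketch of that matching logic (our own implementation of the rule, not RabbitMQ code):

```typescript
// RabbitMQ-style topic matching: "*" = exactly one segment,
// "#" = zero or more segments.
function topicMatches(pattern: string, routingKey: string): boolean {
  const pat = pattern.split(".");
  const key = routingKey.split(".");

  // match(i, j): does pat[i..] match key[j..]?
  function match(i: number, j: number): boolean {
    if (i === pat.length) return j === key.length;
    if (pat[i] === "#") {
      // "#" may match zero segments (skip it) or consume one key segment
      return match(i + 1, j) || (j < key.length && match(i, j + 1));
    }
    if (j === key.length) return false;
    if (pat[i] === "*" || pat[i] === key[j]) return match(i + 1, j + 1);
    return false;
  }
  return match(0, 0);
}
```

A service bound to `order.#` receives every order event, while one bound to `order.*.failed` sees only failures — routing logic that lives in the broker rather than in every consumer.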
Level 3: Event Streaming (10+ Services or Real-Time Needs)
Kafka or Redpanda enter the picture when you need:
- Event replay (reprocess historical events)
- Event sourcing (derive state from event log)
- Real-time stream processing
- Guaranteed ordering within partitions
For small teams that genuinely need streaming, Redpanda is the better choice. It is API-compatible with Kafka but ships as a single binary, with no JVM and no separate ZooKeeper ensemble to operate.
# docker-compose.yml for Redpanda (single-node, development)
services:
  redpanda:
    image: redpandadata/redpanda:v24.1.1
    command:
      - redpanda
      - start
      - --smp 1
      - --memory 1G
      - --overprovisioned
      - --kafka-addr PLAINTEXT://0.0.0.0:9092
    ports:
      - "9092:9092"
      - "8081:8081" # Schema registry
      - "9644:9644" # Admin API
Event Design Patterns
Regardless of which level you operate at, these patterns determine whether your event-driven architecture is maintainable:
1. Event Naming Conventions
// Use past-tense, domain-specific names
// Good:
"user.signup.completed"
"order.payment.failed"
"project.member.invited"
// Bad:
"createUser" // Imperative — this is a command, not an event
"USER_CREATED" // No domain context
"handle-signup" // Implementation detail in the name
2. Event Schema with Versioning
// Every event has a consistent envelope
interface EventEnvelope<T> {
  id: string; // Unique event ID (UUID)
  type: string; // Event name
  version: number; // Schema version
  timestamp: string; // ISO 8601
  source: string; // Which service produced this
  correlationId: string; // For tracing across services
  data: T; // The actual payload
}

// Example
const event: EventEnvelope<UserSignupData> = {
  id: "evt_abc123",
  type: "user.signup.completed",
  version: 2,
  timestamp: "2026-03-30T10:00:00Z",
  source: "auth-service",
  correlationId: "req_xyz789",
  data: {
    userId: "usr_456",
    email: "developer@example.com",
    plan: "starter",
  },
};
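The `version` field earns its keep when a schema changes: consumers can upcast old events to the current shape on read, so handlers only ever deal with one format. A sketch, assuming hypothetically that v1 signup events predate the `plan` field (both the scenario and the `"starter"` default are illustrative):

```typescript
interface UserSignupDataV2 {
  userId: string;
  email: string;
  plan: string;
}

// Upcast older payloads to the current (v2) schema on read. The "starter"
// default for missing plans is a made-up business rule for this sketch.
function upcastUserSignup(
  version: number,
  data: Record<string, unknown>
): UserSignupDataV2 {
  if (version >= 2) return data as unknown as UserSignupDataV2;
  // v1 events had no "plan" field — backfill the default
  return {
    userId: String(data.userId),
    email: String(data.email),
    plan: "starter",
  };
}
```

Upcasting on read means you never have to rewrite events already sitting in a stream — old producers and new consumers coexist.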
3. Idempotent Handlers
Events will be delivered more than once. Every handler must be idempotent:
// Idempotent handler using event ID deduplication
async function handleUserSignup(event: EventEnvelope<UserSignupData>) {
  // Check if we already processed this event
  const processed = await db.processedEvents.findUnique({
    where: { eventId: event.id },
  });
  if (processed) {
    console.log(`Event ${event.id} already processed, skipping`);
    return;
  }
  // Process the event
  await stripe.customers.create({
    email: event.data.email,
    metadata: { userId: event.data.userId },
  });
  // Mark as processed — a unique constraint on eventId closes the race
  // between two concurrent deliveries of the same event
  await db.processedEvents.create({
    data: { eventId: event.id, processedAt: new Date() },
  });
}
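Rather than repeating the check-then-act dance in every handler, the pattern can be wrapped once. An in-memory sketch of such a wrapper (our own helper; a production version would back the seen-set with the `processedEvents` table and its unique constraint):

```typescript
// Wrap any handler so a given event ID is processed at most once.
// In-memory Set for illustration only — it resets on restart, so real
// deployments need durable storage with a unique constraint.
function idempotent<T extends { id: string }>(
  handler: (event: T) => Promise<void>
): (event: T) => Promise<void> {
  const seen = new Set<string>();
  return async (event: T) => {
    if (seen.has(event.id)) return; // duplicate delivery — skip
    await handler(event);
    seen.add(event.id); // mark only after success, so failures still retry
  };
}
```

Note the ordering: marking the event *after* the handler succeeds means a crash mid-handler leads to a retry, not a silently dropped event.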
Monitoring Event-Driven Systems
The biggest operational challenge with EDA is visibility. When a signup handler fails silently, nobody knows until a user complains about not receiving their welcome email.
Essential metrics to track:
- Queue depth: How many unprocessed events are waiting? A growing queue indicates a consumer cannot keep up.
- Processing latency: Time from event publication to handler completion. Spikes indicate downstream service issues.
- Error rate per handler: Which handlers are failing, and how often?
- Dead letter queue size: How many events have exhausted all retries?
// Minimal monitoring with BullMQ
import { Queue } from "bullmq";

// "connection" is the shared ioredis instance from the emitter module,
// and "app" is the Express app
async function getQueueMetrics(queueName: string) {
  const queue = new Queue(queueName, { connection });
  const [waiting, active, completed, failed, delayed] = await Promise.all([
    queue.getWaitingCount(),
    queue.getActiveCount(),
    queue.getCompletedCount(),
    queue.getFailedCount(),
    queue.getDelayedCount(),
  ]);
  return { queueName, waiting, active, completed, failed, delayed };
}

// Expose as a /metrics endpoint for Prometheus
app.get("/metrics", async (req, res) => {
  const queues = ["user.signup", "order.created", "payment.processed"];
  const metrics = await Promise.all(queues.map(getQueueMetrics));
  let output = "";
  for (const m of metrics) {
    const name = m.queueName.replace(/\./g, "_");
    output += `queue_waiting{queue="${name}"} ${m.waiting}\n`;
    output += `queue_failed{queue="${name}"} ${m.failed}\n`;
    output += `queue_active{queue="${name}"} ${m.active}\n`;
  }
  res.type("text/plain").send(output);
});
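Metrics are only useful with thresholds attached. A simple health rule over the counts above can drive alerts before users notice anything (the helper and its default limits are illustrative, not recommendations — tune them per queue):

```typescript
interface QueueMetrics {
  queueName: string;
  waiting: number;
  active: number;
  failed: number;
}

// Flag queues that look unhealthy: a deep backlog means consumers can't
// keep up; accumulating failures mean a handler or downstream service is
// broken. Thresholds here are illustrative defaults.
function unhealthyQueues(
  metrics: QueueMetrics[],
  limits = { maxWaiting: 1000, maxFailed: 50 }
): string[] {
  return metrics
    .filter((m) => m.waiting > limits.maxWaiting || m.failed > limits.maxFailed)
    .map((m) => m.queueName);
}
```

Run this on a schedule and page when the list is non-empty — that one alert catches the "signup handler fails silently" scenario described above.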
When Not to Use Events
Event-driven architecture adds indirection. Not every operation benefits from it:
- Synchronous user flows: If the user needs an immediate response that depends on the result (e.g., payment confirmation before showing a receipt), keep it synchronous.
- Simple CRUD with no side effects: Creating a note in a note-taking app does not need an event system.
- Operations requiring strong consistency: If two operations must succeed or fail together, use a database transaction, not events.
The right amount of event-driven architecture for most small teams is selective: use events for side effects, keep core business logic synchronous. You get 80% of the benefits with 20% of the complexity.
