Database Replication Is More Interesting Than You Think
Replication is how databases stay available when hardware fails, distribute read load across multiple servers, and recover from disasters. Most developers know the basics — primary/replica, read replicas, failover. But the field has evolved significantly. Logical replication, Change Data Capture (CDC), and active-active setups have moved from enterprise-only features to something available on a $50/month database instance. Understanding the full picture helps you make better architectural decisions and avoid expensive mistakes.
Physical Replication: The Foundation
Physical (streaming) replication copies the raw binary data changes from primary to replica — it replicates everything at the storage level, regardless of database objects. In PostgreSQL, this is WAL (Write-Ahead Log) shipping.
# postgresql.conf on primary
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
hot_standby = on
# pg_hba.conf on primary — allow replica to connect
host replication replicator 10.0.1.5/32 md5
# On the replica: create a base backup from primary
pg_basebackup \
-h primary-db \
-U replicator \
-D /var/lib/postgresql/16/main \
-P \
-Xs \
-R # --write-recovery-conf: automatically creates standby.signal
The replica then streams WAL changes continuously. With PostgreSQL’s hot standby, replicas can serve read queries while following the primary.
Physical Replication Limitations
- Replica must run the same PostgreSQL major version as primary
- Replicas are read-only by default (hot standby mode)
- You can’t replicate to a different database engine
- You can’t replicate a subset of tables
- The entire cluster is replicated — can’t exclude high-churn tables
Logical Replication: Precision and Flexibility
Logical replication decodes WAL changes into row-level operations (INSERT, UPDATE, DELETE) and replays them on the subscriber. This gives you far more flexibility:
- Replicate specific tables, not the whole cluster
- Replicate to different PostgreSQL major versions
- Filter rows being replicated
- Transform data during replication
- Replicate to non-Postgres targets (via CDC consumers)
-- On the publisher (primary)
-- postgresql.conf: wal_level = logical
-- Create a publication for specific tables
CREATE PUBLICATION user_data_pub
FOR TABLE users, orders, payments
WITH (publish = 'insert, update, delete');
-- Create a publication for all tables
CREATE PUBLICATION all_tables_pub FOR ALL TABLES;
-- On the subscriber (replica)
CREATE SUBSCRIPTION user_data_sub
CONNECTION 'host=primary-db port=5432 dbname=myapp user=replicator password=secret'
PUBLICATION user_data_pub
WITH (
connect = true,
enabled = true,
copy_data = true, -- Initial data sync
create_slot = true
);
-- Check replication status
SELECT * FROM pg_stat_subscription;
SELECT * FROM pg_replication_slots;
Near-Zero-Downtime Major Version Upgrades
Logical replication enables a practical migration path between PostgreSQL major versions with minimal downtime:
- Set up a new cluster running PG 17 as a logical subscriber to your PG 16 primary
- Let it sync and catch up (can take hours/days depending on data volume)
- During a maintenance window: stop writes to PG 16, verify PG 17 is fully caught up, promote PG 17, update app connection strings
- Downtime: typically under 1 minute
Change Data Capture: Turning Your Database into an Event Stream
CDC takes logical replication further by treating every database change as an event that other systems can consume. Instead of application code explicitly publishing events to Kafka after a database write, the database itself becomes the source of truth for events.
The canonical setup uses Debezium, an open-source CDC tool that reads PostgreSQL WAL and publishes to Kafka:
-- postgresql.conf: wal_level = logical
-- Create a replication slot for Debezium
SELECT pg_create_logical_replication_slot(
'debezium_slot',
'pgoutput'
);
# Debezium connector configuration (Kafka Connect)
{
"name": "postgres-source-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "primary-db",
"database.port": "5432",
"database.user": "debezium",
"database.password": "secret",
"database.dbname": "myapp",
"database.server.name": "myapp",
"plugin.name": "pgoutput",
"slot.name": "debezium_slot",
"table.include.list": "public.orders,public.payments",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.delete.handling.mode": "rewrite",
"topic.prefix": "myapp"
}
}
Each row change in the orders table becomes a Kafka message on the myapp.public.orders topic:
{
"before": null,
"after": {
"id": 9821,
"user_id": 4821,
"status": "completed",
"total_cents": 4999,
"updated_at": 1742041234000000
},
"op": "c", // c=create, u=update, d=delete
"ts_ms": 1742041234123,
"transaction": {
"id": "7:45982012",
"total_order": 1,
"data_collection_order": 1
}
}
CDC Use Cases That Justify the Setup Complexity
- Cache invalidation: Invalidate Redis cache entries when the underlying database rows change — without polluting application code with cache logic
- Search index updates: Stream changes to Elasticsearch or Typesense automatically
- Event sourcing without event sourcing overhead: Use your existing database as the source of truth, derive events from changes
- Audit logs: Every change to sensitive tables captured with full before/after values
- Cross-service data sync: Replicate specific tables to a separate analytics database without dual-writes in application code
Active-Active Replication: The Hard Problem
Physical and logical replication are both active-passive: one primary accepts writes, replicas follow. Active-active (multi-primary) replication allows writes on multiple nodes, with changes replicated in both directions. It sounds appealing but comes with a fundamental problem: write conflicts.
What happens when Node A and Node B both update the same row at the same time? Conflict resolution strategies:
- Last-write-wins: The change with the latest timestamp wins. Simple, but can silently discard writes.
- First-write-wins: The change that arrived first wins. Safer for some use cases.
- Application-level resolution: Your application defines merge logic. Complex but correct.
Citus (for horizontal sharding) and CockroachDB (for geo-distributed active-active) solve this with different trade-offs:
-- CockroachDB: Active-active across regions
-- No configuration needed for multi-region — built into the SQL layer
-- Set a table's primary region and survive region failures
ALTER TABLE orders SET LOCALITY REGIONAL BY ROW;
-- Queries automatically route to the nearest region
-- Writes involving rows in "us-east" go to us-east nodes
Monitoring Replication Lag: The Metric That Matters
Replication lag is the delay between a write on the primary and its appearance on replicas. For read replicas serving user-facing queries, stale data is a correctness problem. Monitor it:
-- On primary: check replication lag for each replica
SELECT
application_name,
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
write_lag,
flush_lag,
replay_lag,
sync_state
FROM pg_stat_replication;
-- On replica: check how far behind it is
SELECT
now() - pg_last_xact_replay_timestamp() AS replication_lag,
pg_is_in_recovery() AS is_replica;
# Prometheus alert rule for replication lag
- alert: PostgresReplicationLagHigh
expr: pg_replication_lag_seconds > 30
for: 5m
labels:
severity: warning
annotations:
summary: "Replication lag is {{ $value }}s on {{ $labels.instance }}"
Choosing the Right Replication Strategy
A decision framework based on your requirements:
- Read scaling + HA failover: Physical streaming replication. Simple, proven, built into Postgres.
- Major version upgrade with minimal downtime: Logical replication between versions.
- Selective table replication: Logical replication with publications.
- Database as event source: CDC with Debezium + Kafka.
- Multi-region active-active: CockroachDB or Spanner if you need it; most teams don’t.
- Analytics workload separation: CDC to a read-optimized store (Redshift, BigQuery, ClickHouse).
Most small-to-medium applications need exactly one thing: streaming physical replication to a hot standby for HA. The complexity of logical replication and CDC is justified when you have specific multi-system consistency requirements. Start simple, add complexity only when you have a concrete problem that simpler approaches can’t solve.
