Every developer has been burned by a backup strategy that looked solid on paper until the moment it needed to work. The database that restored from a week-old dump because nobody tested the incremental chain. The file backup that turned out to be syncing deletions upstream, so the ransomware attack took out both the source and the backup simultaneously. The off-site copy sitting in a storage bucket nobody had checked access credentials on in two years. A backup strategy is not a checkbox activity — it is a system with measurable properties, and most teams have not actually measured theirs. The 3-2-1 rule gives you a framework for thinking clearly about what you have and what you’re missing, but the rule itself is not enough. You also need to understand why well-intentioned backup implementations fail in practice and what it takes to build something that holds up under real conditions.

The 3-2-1 Rule: What It Actually Means

The 3-2-1 rule is simple: keep three copies of your data, on two different types of storage media, with one copy stored off-site. The formulation has been around since the film photography era, adapted for digital infrastructure, and it has survived because the underlying logic is sound. Three copies means that any two simultaneous failures still leave you one intact copy — assuming the copies are truly independent. Two media types protect against media-specific failure modes: a storage array controller bug that corrupts all volumes attached to it doesn’t take out your tape archive. One off-site copy means that a physical disaster (fire, flood, theft) at your primary location doesn’t end the game.

Where the rule falls short is that it says nothing about recovery time, recovery point, or restore reliability. You can satisfy 3-2-1 perfectly and still find yourself unable to recover production in any reasonable window because nobody tested the restore process, the encryption key is missing, or the backup agent silently stopped running eight weeks ago. The rule describes a topology. What you actually need is a topology plus a tested procedure plus monitoring that proves the procedure ran.

A more complete way to think about it: 3-2-1 is the minimum structure. Recovery Point Objective (RPO) — how much data you can afford to lose — and Recovery Time Objective (RTO) — how long recovery can take before it becomes a business problem — are the properties your backup system needs to satisfy. Get those numbers from whoever owns the business side of the application, even if that person is you. Then design backward from those numbers, not forward from whatever backup tool you installed first.

Why Most Backup Strategies Fail in Practice

The failure modes are predictable enough that you can audit for them before disaster strikes. The most dangerous is the untested restore. Backup software that writes files successfully is not the same as backup software that produces restorable files. Corrupted archives, incomplete transaction logs, missing dependencies, and configuration drift between backup time and restore time are all real. If you have never actually restored from your backups to a staging environment and verified that the application ran correctly, you do not know whether your backups work. You have strong circumstantial evidence that they might work, which is not the same thing.

The second failure mode is missing automation. A manual backup process — even a well-designed one — degrades over time as the person who owns it gets busy, changes roles, or leaves the team. Backups that depend on human memory are not backups. They are intentions. Automation via cron or a dedicated job scheduler is not optional; it is the baseline. And automation without monitoring is hardly better than nothing: your cron job silently failing at 2 AM three weeks ago means you have been operating without a backup for three weeks while believing you had one.

The third failure mode is backup-sync confusion. Cloud storage sync tools — rsync, rclone in sync mode, Dropbox, even AWS S3 replication — propagate changes bidirectionally or from source to destination without retention. If you delete a file, the deletion replicates. If ransomware encrypts your source, the encrypted files overwrite your copies. Sync is not backup. Backup requires retention: the ability to recover from a point in the past, not just a copy of the current state.

The fourth failure mode is unverified encryption key storage. Encrypted backups are better than unencrypted ones in every threat scenario except one: the scenario where the key is inaccessible at restore time. Keys stored only on the server being backed up, in the same secrets manager whose credentials are now lost, or in an email nobody can access anymore represent a category of failure distinct from data loss but equally final.

Database Backups: Doing It Right for PostgreSQL and MySQL

Databases are not just files. Taking a filesystem-level snapshot of a live database without coordinating with the database engine produces an inconsistent copy — one that may technically contain all the bytes but represents no point in time the database was actually in. Do not back up database files with rsync or filesystem snapshots alone unless you understand exactly what coordination is required for your engine.

PostgreSQL: pg_dump plus WAL Archiving

pg_dump is the correct tool for logical backups of individual PostgreSQL databases. It produces a dump that is consistent as of the moment the dump started, without blocking concurrent writes, and portable across PostgreSQL versions. For a complete cluster backup including roles and tablespace configuration, use pg_dumpall. Both ship with every PostgreSQL installation. Prefer pg_dump’s custom format (-Fc): it is compressed and restorable with pg_restore, which supports selective and parallel restores, while plain SQL dumps (including pg_dumpall output) are restored by feeding them to psql.

The limitation of pg_dump alone is RPO. If you run nightly dumps and your database has a failure at 5 PM, you lose a day’s writes. For workloads where that is acceptable, nightly pg_dump compressed with gzip and shipped off-site covers the basics. For anything requiring RPO measured in minutes rather than hours, add Write-Ahead Log (WAL) archiving.
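The nightly-dump tier can be sketched in a few lines. The database name, paths, retention window, and the rclone remote below are placeholders, not values from this article:

```shell
#!/bin/bash
# Nightly pg_dump, compressed and shipped off-site -- a sketch.
# "appdb", /backups, and the "offsite" rclone remote are placeholders.
set -euo pipefail
STAMP=$(date +%F)
OUT="/backups/appdb-${STAMP}.sql.gz"

# Consistent logical dump, compressed on the way to disk
pg_dump appdb | gzip > "$OUT"

# Ship the dump off-site (assumes an rclone remote named "offsite"
# has already been configured)
rclone copy "$OUT" offsite:db-backups/

# Keep two weeks of local dumps
find /backups -name 'appdb-*.sql.gz' -mtime +14 -delete
```

The off-site copy is what turns a nightly dump into the “1” of 3-2-1; without the ship step this is just a second copy on the same machine.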

WAL archiving captures every database change in sequence. Combined with a base backup, it lets you recover to any point in time — not just the last dump. The configuration is straightforward: set archive_mode = on and archive_command in postgresql.conf to a command that copies WAL segments to your backup destination as they complete. Tools like pgBackRest and Barman wrap this into a complete backup management solution with parallelism, retention management, and restore orchestration that is significantly more reliable than hand-rolled scripts.

MySQL: mysqldump plus Binary Log Backup

The MySQL equivalent of pg_dump is mysqldump with the --single-transaction flag, which uses InnoDB’s transactional consistency for a non-blocking consistent dump. Include --master-data=2 (renamed --source-data=2 in MySQL 8.0.26) to record the binary log position at dump time — this is the coordinate you need to apply subsequent binary logs for point-in-time recovery. Without it, you have a consistent snapshot but no ability to replay incremental changes.
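Put together, the dump invocation looks like this sketch — the database name and output path are placeholders:

```shell
# Consistent InnoDB dump with the binlog coordinate recorded as a
# comment in the output. "appdb" and the path are placeholders; on
# MySQL 8.0.26+ the flag is spelled --source-data=2.
mysqldump --single-transaction --master-data=2 \
  --routines --triggers appdb | gzip > "/backups/appdb-$(date +%F).sql.gz"
```

The recorded coordinate appears near the top of the dump as a commented CHANGE MASTER / CHANGE REPLICATION SOURCE statement, which is where you read the starting position for binlog replay.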

MySQL’s binary logs serve the same role as PostgreSQL’s WAL: they record every data-modifying statement in sequence. With log_bin enabled and binary logs archived, you can recover to any point after your last full dump by replaying the relevant binary log segments. The mysqlbinlog utility handles this, but the operational complexity of doing it correctly under pressure is significant. For production MySQL, consider Percona XtraBackup for physical backups (faster, lower overhead for large databases) combined with binary log archiving, or a managed service that handles this transparently.
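The replay sequence itself is short, even if getting the coordinates right under pressure is not. A sketch — every file name, position, and timestamp below is a placeholder:

```shell
# Point-in-time recovery sketch: restore the full dump, then replay
# binlogs from the position recorded in the dump up to just before
# the incident. All names, positions, and timestamps are placeholders.
gunzip < /backups/appdb-2026-01-10.sql.gz | mysql appdb

mysqlbinlog --start-position=154 \
  --stop-datetime="2026-01-11 16:55:00" \
  /var/lib/mysql/binlog.000042 /var/lib/mysql/binlog.000043 \
  | mysql appdb
```

The --stop-datetime cutoff is what lets you stop replay just before a bad DROP or a ransomware write burst, which is the whole point of keeping the binlogs.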

Filesystem Backups: restic, BorgBackup, and rsync

For application files, configuration, and assets, the modern choice for most small teams is between restic and BorgBackup. Both support deduplication (so you’re not storing the same blocks repeatedly across snapshots), encryption, and efficient incremental backups. Both are actively maintained and support a wide range of storage backends.

Restic’s main advantages are simplicity and backend flexibility. A single binary, straightforward commands, and native support for S3, Backblaze B2, SFTP, and local storage make it easy to get working quickly. Restic’s deduplication is content-addressed at the chunk level, which means it handles file renames and moves more efficiently than block-level tools. The restore experience is clean: restic restore works, and the repo format is well-documented enough that the project has published recovery tools for use without the restic binary itself.
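A minimal restic lifecycle, from repository creation through a test restore, fits in a handful of commands. The repository location and backed-up paths are placeholders:

```shell
# Minimal restic lifecycle -- repository location and paths are
# placeholders. Exporting these avoids repeating -r/--password-file.
export RESTIC_REPOSITORY="b2:my-backup-bucket"
export RESTIC_PASSWORD_FILE="/etc/restic-password"

restic init                      # once; creates the encrypted repo
restic backup /var/www /etc      # incremental after the first run
restic snapshots                 # list what the repo holds
restic restore latest --target /tmp/restore-test   # prove it restores
```

That last command is the one most setups never run; making it part of routine usage is cheaper than discovering a broken repo during an incident.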

BorgBackup (Borg) is faster for large repositories, uses a slightly different compression and deduplication approach that can be more efficient for certain workloads, and has a mature ecosystem including BorgBase (a hosted storage service designed for Borg repositories). Its main friction point is that the Borg repository format is less portable — you need a compatible Borg version to read it, which is usually not a problem but is worth knowing.

Plain rsync deserves mention because it remains the tool most people already have and understand. For simple cases — syncing a directory tree to a remote server — rsync with --archive --delete works correctly. For backup use cases requiring versioning and retention, you need either rsync with hard-link snapshot patterns (time-machine style), or you should use restic or Borg instead. The snapshot pattern is: rsync to a dated directory, hard-linking unchanged files to the previous snapshot. This gives you a browsable history where each snapshot looks like a full copy but only stores new or changed data. Tools like rsnapshot automate this pattern.

Cloud Backup Targets: What to Use and What It Actually Costs

The economics of cloud object storage for backup have improved substantially. At the prices that have stabilized through 2025 and into 2026, the cost argument for not having off-site cloud backups is hard to make for any production system.

Backblaze B2 is the clearest choice for pure backup workloads where you are optimizing for storage cost. At roughly $6 per terabyte per month, with free egress for downloads via the Cloudflare CDN partnership and reasonable API call pricing, B2 is the lowest-cost option for the 1-in-the-cloud requirement of 3-2-1. Restic and Borg both support B2 natively. The one caveat is that B2 is a smaller company than AWS or Google, and if you need the absolute maximum in durability guarantees and SLA commitments, that matters.

Wasabi positions at roughly $7 per terabyte per month with free egress (subject to a minimum storage duration of 90 days per object — store data you plan to keep). It is S3-compatible, which means any tool that speaks S3 works with it, and it has multiple geographic regions. For teams already using S3-compatible tooling who want cost savings over AWS without moving to B2, Wasabi is the natural choice.

AWS S3 Glacier and Glacier Deep Archive exist at price points of roughly $4 and $1 per terabyte per month respectively, but the retrieval costs and latency make them suitable specifically for archive use cases where you are hoping never to retrieve but need to maintain compliance or long-term retention. For backups where you expect to test restores monthly and might need emergency recovery on short notice, standard S3 (or S3 Intelligent-Tiering) is a better fit than Glacier. The economics only favor Glacier if retrieval is genuinely rare and retrieval latency of hours is acceptable.

If you are already running AWS infrastructure, S3 Standard-IA (Infrequent Access) at roughly $12.50 per terabyte per month with lower retrieval costs than Glacier is a reasonable middle ground — higher than B2 or Wasabi, but with the operational simplicity of staying within your existing AWS account and IAM model.

Backup Automation and Monitoring

A backup job that runs unmonitored is not reliably a backup job. The setup that works at small scale is cron for scheduling plus a dead man’s switch service for monitoring. Healthchecks.io is the canonical tool for this: each backup job sends an HTTP ping to a project-specific URL when it completes successfully. If the ping doesn’t arrive within the expected window, Healthchecks.io sends you an alert. The free tier, which includes email notifications, covers most small-team use cases.

The pattern is straightforward. A backup script ends with:

#!/bin/bash
set -euo pipefail

# Repository and key shared by both restic commands below; the S3
# backend also needs AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY set
export RESTIC_REPOSITORY="s3:s3.wasabisys.com/my-backup-bucket"
export RESTIC_PASSWORD_FILE="/etc/restic-password"

# Run backup
restic backup /var/www /etc /home

# Forget old snapshots per retention policy
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune

# Signal successful completion
curl -fsS --retry 3 https://hc-ping.com/your-check-uuid > /dev/null

The set -e at the top means any command failure exits the script immediately, so the ping URL never gets called if the backup or prune step fails. That is the correct behavior: you want the monitoring service to notice the absence of a successful ping, not receive a ping that was sent before the failure. Add --retry 3 to the curl call to handle transient network issues without masking real backup failures.

For PostgreSQL backup automation, the same pattern applies: wrap your pg_dump or pgBackRest stanza in a script, add the healthcheck ping at the end, schedule with cron, and verify the check fires as expected. Check the healthchecks.io dashboard after the first few scheduled runs to confirm timing expectations match reality.
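The PostgreSQL wrapper has the same shape. The database name, output path, and check UUID below are placeholders:

```shell
#!/bin/bash
# pg_dump wrapped with a dead man's switch ping -- "appdb", the
# output path, and the check UUID are placeholders.
set -euo pipefail

pg_dump --format=custom appdb > "/backups/appdb-$(date +%F).dump"

# Only reached if the dump succeeded; silence triggers the alert
curl -fsS --retry 3 https://hc-ping.com/your-pg-check-uuid > /dev/null
```
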

Encryption: At Rest and in Transit

Backup data sent to a cloud provider or any remote storage that you do not fully control should be encrypted before it leaves your infrastructure. This is not about distrusting specific cloud providers — it is about the threat surface of storage credentials being compromised, provider employees, legal discovery, and the general principle that your backup data should be readable only by you.

Restic and BorgBackup both encrypt repository contents before writing to any backend. The encryption key (or passphrase) never leaves your infrastructure. A restic repository stored on Backblaze B2 is a collection of encrypted blobs: the storage provider sees only ciphertext. This is the correct model. Tools that rely on cloud provider-side encryption (SSE in S3 terms) protect you from some threats but not from provider-level access or credential compromise.

Key management is the hard part. The passphrase that decrypts your restic repository needs to be stored somewhere that is: accessible at restore time (including disaster scenarios where your primary infrastructure is gone), not stored only in the thing being backed up, and protected from unauthorized access. A password manager with offline backup, a physical document in a physically secure location, or a secrets management service with sufficient durability covers this. Document the key storage approach explicitly, and verify that whoever is on call for incidents knows where to find it.

For transit encryption: both restic and Borg use TLS when connecting to HTTPS-based storage backends. For SFTP targets, the SSH connection encrypts transit. The main risk area is custom scripts using rclone, aws s3 cp, or similar tools — verify that connections are HTTPS and that certificate validation is not disabled.

The Restore Test: Schedule Monthly Drills

Monthly restore drills are not optional if you want to know whether your backup system works. The drill does not need to be a full production cutover — it needs to verify that the restore process produces a working application from backup data without assistance from systems that might not exist in a real disaster.

A minimal monthly drill for a web application looks like: spin up a clean VM or container, restore the latest database backup, restore the latest file backup, run the application startup sequence, verify that the application responds correctly to a test request, confirm the data is recent. Document the time this takes. That is your actual RTO floor — the minimum time a recovery would require even under ideal conditions. If it takes four hours in a calm drill, plan for longer under real incident conditions.
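Scripted, that drill might look like the sketch below. It assumes Docker, restic, and a Postgres-backed application; every image name, port, and path is a placeholder:

```shell
#!/bin/bash
# Monthly restore drill sketch -- assumes Docker, restic, and a
# Postgres-backed app. All names, ports, and paths are placeholders.
set -euo pipefail
START=$(date +%s)

# Throwaway database instance, isolated from production
docker run -d --rm --name drill-db -e POSTGRES_PASSWORD=drill \
  -p 5433:5432 postgres:16
sleep 10   # crude wait; a real script polls pg_isready

# Pull the latest backups into a scratch directory
restic restore latest --target /tmp/drill

# Load the database dump into the throwaway instance
PGPASSWORD=drill pg_restore -h 127.0.0.1 -p 5433 -U postgres \
  -d postgres --create /tmp/drill/backups/appdb.dump

# Smoke test: the app, started against the drill DB, must answer
curl -fsS http://127.0.0.1:8080/healthz > /dev/null

echo "drill completed in $(( $(date +%s) - START ))s"   # your RTO floor
docker stop drill-db
```

The elapsed-time line is the useful artifact: logged over months, it tells you whether your RTO floor is drifting as data grows.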

Automate the drill if you can. A script that provisions infrastructure, restores from backup, runs smoke tests, and reports results is significantly more reliable than a calendar reminder to manually test. It also runs faster, which means you can run it weekly instead of monthly without consuming a day of engineering time each time. Tools that support this automation exist in every major infrastructure-as-code ecosystem.

When the drill fails — and it will, at some point — that is the system working correctly. You found the gap in a controlled context rather than at 2 AM with production down. Treat failed drills as the most valuable output of your backup program.

WordPress Backup Specifics

WordPress has a backup problem that is specific to its architecture: the application state is split between the filesystem (themes, plugins, uploads) and the database (content, settings, user data), and a backup that captures one but not the other at a matching point in time produces a restore that is inconsistent. Plugin versions in the database that don’t match plugin files on disk, or uploads referenced in the database that don’t exist in the filesystem, are common restore failure modes.

The cleanest approach for WordPress is to back up the database and filesystem together in a coordinated way, or to accept that restores from slightly different-time backups require manual reconciliation. For the database, mysqldump or a tool like WP-CLI’s db export produces a clean logical export. For the filesystem, back up wp-content (themes, plugins, uploads) — you do not need to back up WordPress core files since those are reproducible from WordPress.org. A wp-content backup plus a database dump with matching timestamps is a sufficient restore unit for most WordPress sites.
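A coordinated pair with matching timestamps can be produced in a short script. The site path is a placeholder and WP-CLI is assumed to be installed:

```shell
#!/bin/bash
# Matching-timestamp WordPress backup -- the site path is a
# placeholder and WP-CLI is assumed. Core files are skipped; they
# are reproducible from WordPress.org.
set -euo pipefail
STAMP=$(date +%F-%H%M)
cd /var/www/example-site

wp db export "/backups/wp-db-${STAMP}.sql"
tar czf "/backups/wp-content-${STAMP}.tar.gz" wp-content

echo "backup pair: ${STAMP}"
```

Running both steps back-to-back in one script is what keeps the database and wp-content close enough in time that a restore is internally consistent.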

Managed WordPress hosting typically includes backups, but verify the retention period, verify that you can actually download the backup files (not just restore in-place with the same host), and verify that database backups are included and not just file backups. The ability to move your backup to a different host is a real property you should test before you need it.

When Managed Backups Are Worth the Cost

RDS automated snapshots, Supabase’s built-in backup, PlanetScale’s point-in-time recovery, and equivalents from other managed database services are genuine products with real value propositions. The question is not whether they are overpriced relative to DIY — they often are, at scale — but whether the operational cost of running your own backup infrastructure exceeds the premium.

RDS automated snapshots are on by default and require almost no configuration. They provide point-in-time recovery within the retention window, automatic storage management, and cross-region replication if configured. For a two-person team running a production application on RDS, the value proposition is strong: the backup system is maintained by AWS, tested continuously at scale, and requires no engineering attention beyond setting the retention period and occasionally verifying that snapshots exist. The cost (roughly 10-20% overhead on storage costs for most workloads) is worth it if the alternative is a self-managed backup solution that gets less attention than it should.

The case against managed-only backups is the single-provider dependency. If your RDS account is compromised, region-level snapshots may be in scope for an attacker. If your account is suspended or there is a billing dispute, access to snapshots may be interrupted. And RDS snapshots cannot easily be used to restore to a different database engine or to a self-hosted Postgres instance if you decide to migrate. For compliance, operational independence, or disaster scenarios that include provider failure, a secondary off-platform backup — even just weekly pg_dump exports to a different cloud account — significantly improves your position.

Disaster Recovery Planning for Small Teams

A disaster recovery plan is a document that answers specific questions under incident conditions when the people who know the system well are unavailable, stressed, or both. It does not need to be fifty pages. It needs to be accurate, findable, and usable by someone who knows the infrastructure category but may not know your specific application.

The minimum viable DR document for a small team includes: what needs to be recovered and in what priority order; where backups are stored and how to access them (including credentials or key locations, stored out-of-band); step-by-step recovery procedures for each critical system, written assuming the reader has general competence but no specific institutional knowledge; the RTO and RPO targets and how to verify you’ve met them; and contact information for dependencies (cloud providers, DNS registrars, CDN, any service your recovery depends on).

Test the document by having someone other than the author attempt a restore using only the document and the backups. The gaps that emerge are the most valuable output of the exercise. Update the document every time your infrastructure changes significantly. A DR plan that was accurate six months ago but predates a major infrastructure change is misleading in a disaster scenario — which is arguably worse than no plan, because it creates false confidence.

The 3-2-1 backup rule in practice is not a deployment task you complete and file away. It is an ongoing system with measurable properties that you verify regularly. Three copies, two media types, one off-site is the topology. Tested restores, automated jobs with dead man’s switch monitoring, documented procedures, and scheduled drills are what make the topology into a system that actually works when you need it.

By Michael Sun

Founder and Editor-in-Chief of NovVista. Software engineer with hands-on experience in cloud infrastructure, full-stack development, and DevOps. Writes about AI tools, developer workflows, server architecture, and the practical side of technology. Based in China.
