The conventional wisdom says Docker Compose is for development and Kubernetes is for production. Like most conventional wisdom in infrastructure, this is partly true, partly cargo cult, and mostly an oversimplification that costs small teams real money and operational complexity they don’t need. Running Compose in production is not a shortcut or a compromise — for the right workloads and team sizes, it is the correct choice. The question is knowing which patterns make it reliable and which habits will cause you problems at 2 AM. Docker Compose production patterns are well-established enough that “we just use Compose” is a defensible answer for a wide range of real deployments. Understanding what makes those deployments stable is what separates teams that run Compose confidently from teams that deployed it halfway and are permanently anxious about it.
When Compose Belongs in Production
Before discussing how to run Compose in production, it is worth being precise about when it makes sense. Compose is a single-host orchestrator. That sentence contains both its strength and its ceiling. On a single server — even a well-provisioned one — Compose offers simplicity, predictability, and virtually no operational overhead. You can understand the entire deployment by reading one file. There is no distributed state, no control plane, no etcd to back up and restore.
The workloads where Compose genuinely excels: small to mid-traffic web applications, internal tooling, staging environments that mirror production, self-hosted services (Gitea, Nextcloud, monitoring stacks), and SaaS products in early growth phases where one strong server handles all traffic comfortably. Teams of one to five people without dedicated infrastructure engineers are often better served by mastering Compose than by partially implementing Kubernetes. A well-run Compose deployment on a $40/month VPS with proper backups and monitoring will outperform a poorly managed Kubernetes cluster at ten times the cost.
The signals that you have outgrown Compose are specific: you need workloads distributed across multiple nodes for either capacity or availability; you need automated horizontal scaling in response to traffic spikes; you have a team large enough that deployment coordination across services becomes its own problem; or you are running in a regulated environment where Kubernetes abstractions provide meaningful compliance benefits. If none of those describe your situation, the pressure to adopt Kubernetes is cultural, not technical.
The Foundation: Health Checks and Restart Policies
The two most impactful changes between a development Compose file and a production-ready one are health checks and restart policies. Most tutorials skip both entirely.
Restart policies determine what Docker does when a container exits. In development, this almost never matters — you restart things manually. In production, container crashes are inevitable: out-of-memory kills, application bugs, dependency timeouts. Without a restart policy, your container is just gone until someone notices.
The two policies that matter for production are unless-stopped and always. The difference is narrow but meaningful: always restarts the container even if you manually stopped it with docker compose stop, which makes intentional maintenance operations frustrating. unless-stopped respects manual stops while still restarting on crashes and after host reboots. Use unless-stopped as your default for application services.
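As a sketch, with hypothetical service and image names (note that `restart: "no"` must be quoted, because a bare `no` is parsed as a YAML boolean):

```yaml
services:
  app:
    image: registry.example.com/myapp:1.4.2   # hypothetical image
    restart: unless-stopped   # restart on crash and after host reboot; respect manual stops
  db-migrate:
    image: registry.example.com/myapp:1.4.2
    restart: "no"             # one-shot tasks should be allowed to exit cleanly
```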
Health checks are the mechanism by which Docker determines whether a container is actually functioning, not merely running. A container can be “up” in the Docker sense — the process is alive — while the application inside is deadlocked, misconfigured, or in the middle of a crash loop. Without a health check, Docker Compose has no way to distinguish a healthy container from a broken one that happens to still have a running process.
A production health check for a web service looks like this in your compose.yml:
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```
The start_period field is frequently overlooked. It tells Docker not to count failures during the initial startup window, which matters for services that take time to initialize: database migrations, cache warming, JVM startup. Without it, a service that needs 30 seconds to initialize will be marked unhealthy before it has had a chance to start properly.
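One practical note: the check runs inside the container, so the probe binary must exist in the image. Many slim and Alpine-based images ship BusyBox wget but not curl; a sketch of the equivalent check under that assumption:

```yaml
healthcheck:
  # BusyBox wget: --spider makes the request without saving the response body
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```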
Resource Limits: The Configuration Most Teams Skip Until It’s Too Late
Running containers without resource limits on a shared host is an operational liability. A memory leak in one service, a runaway query, a traffic spike hitting your application — any of these can consume all available RAM on the host and trigger the kernel OOM killer, which will start terminating processes with no regard for which ones matter most. Without limits, the wrong process will be killed at the worst possible time.
Resource limits in Compose live under the deploy.resources key. This is a point of regular confusion: the older top-level mem_limit key comes from the version 2 file format, and the deploy section was originally a Swarm-only concept that classic docker-compose ignored unless you ran with the --compatibility flag. Modern Compose (the docker compose plugin, implementing the Compose Specification) applies deploy.resources limits directly to containers, and it is the form to standardize on:
```yaml
deploy:
  resources:
    limits:
      memory: 512M
      cpus: "0.5"
    reservations:
      memory: 256M
      cpus: "0.25"
```
The limits values are hard ceilings — the container cannot exceed them. The reservations values communicate to Docker what this container needs as a minimum, which matters for scheduling decisions if you later move to Swarm mode. Reservations without limits are advisory; limits without reservations are the more common and more immediately useful configuration.
Setting memory limits forces you to think carefully about what each service actually needs, which is itself useful. A sidecar container running a log shipper probably needs 64MB. Your database might legitimately need 2GB. Applying the same generous limit to everything wastes resources and defeats the purpose. Instrument first, then set limits at roughly 150% of observed peak usage to give room for legitimate spikes without letting runaway processes consume the host.
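As a worked example of that rule of thumb, the arithmetic fits in a small shell helper; the peak figure would come from watching `docker stats` (or cAdvisor) over a representative period, and the function name is made up for illustration:

```shell
# suggested_limit_mib: given an observed peak in MiB, print a limit at 150%.
# The observed peak would come from e.g.:
#   docker stats --no-stream --format '{{.Name}}\t{{.MemUsage}}'
suggested_limit_mib() {
  awk -v peak="$1" 'BEGIN { printf "%d\n", peak * 1.5 }'
}

suggested_limit_mib 340   # a service peaking at 340 MiB gets a 510 MiB limit
```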
Persistent Data: Named Volumes Over Bind Mounts
Bind mounts — mapping a host directory into a container with a path like ./data:/var/lib/postgresql/data — are convenient in development because you can see and manipulate the files directly. In production they create several problems. Host path dependencies make your Compose configuration non-portable. File ownership and permission mismatches between host user IDs and container user IDs cause hard-to-debug access errors. Docker itself has less visibility into the lifecycle of bind-mounted data, making volume management and cleanup more fragile.
Named volumes solve all of this. Docker manages the storage location, handles ownership correctly within the container lifecycle, and provides proper isolation from the host filesystem:
```yaml
volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local
```
These volume definitions at the bottom of your compose.yml, referenced in service configurations as - postgres_data:/var/lib/postgresql/data, give you volumes that persist across container replacements and are properly namespaced by project. Note that Compose prefixes volume names with the project name, so when you address a volume with plain docker commands the on-disk name is something like myproject_postgres_data, and a backup looks like docker run --rm -v myproject_postgres_data:/data -v $(pwd):/backup alpine tar czf /backup/postgres_data.tar.gz /data.
The one legitimate use case for bind mounts in production is configuration files that need to be edited on the host and reflected immediately in the container — Nginx config, for instance, where you want to reload configuration without rebuilding the image. Even then, prefer using Docker configs or environment-variable-driven configuration where possible.
Network Isolation: Expose Only What Needs to Be Exposed
The default Docker network model puts all containers in a shared bridge network where every container can reach every other container. For a single application this is probably fine. As soon as you are running multiple unrelated services on the same host, or services with different security requirements, this default creates unnecessary exposure.
The correct production pattern uses explicit named networks with internal services not connected to any externally facing network. A typical stack might look like this:
```yaml
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true
```
The internal: true flag on the backend network means containers on that network have no outbound internet access — they can only communicate with other containers on the same network. Your Postgres database, Redis cache, and internal API services belong on the backend network. Only your reverse proxy belongs on the frontend network, with an explicit connection to the backend to proxy requests through.
This pattern limits the blast radius of a container compromise. If your application is compromised, the attacker has network access to whatever that container is connected to, not to your entire host network. It also makes your architecture explicit in the Compose file itself, which is documentation that stays in sync with reality because it is the configuration that creates reality.
Environment Variable Management
Storing secrets in your compose.yml is the infrastructure equivalent of committing passwords to a public repository. It happens, it is a bad idea, and the alternatives are not complicated.
The .env file pattern is the lowest friction starting point: Docker Compose automatically reads a .env file in the same directory as your compose.yml and makes those variables available for interpolation. You commit a .env.example with dummy values to your repository and add .env to your .gitignore. On the server, the real .env file lives outside version control, readable only by the deploy user.
```
# .env.example — commit this
POSTGRES_PASSWORD=change_me
POSTGRES_USER=appuser
SECRET_KEY=change_me_to_a_long_random_string
REDIS_PASSWORD=change_me
```
Docker Secrets, available when running in Swarm mode, provides a more sophisticated mechanism where secret values are mounted as files inside containers and never appear in environment variables or process listings. This is appropriate for higher-security deployments where environment variable inspection via docker inspect is a concern. For most small-team Compose deployments, a properly permissioned .env file with strict file permissions (chmod 600 .env) is a reasonable and practical choice.
What is not a reasonable choice: hardcoded values in compose.yml, passing secrets on the command line where they appear in shell history, or using overly permissive file permissions on your secrets files. These are the mistakes that turn a minor breach into a complete compromise.
Logging Configuration That Does Not Fill Your Disk
By default, Docker logs container output to JSON files on the host with no size limit. On a busy application, this will fill your disk. It is one of those things that doesn’t matter for weeks or months and then matters catastrophically when your root partition hits 100% at an inconvenient moment.
The production fix is simple — configure log rotation in either your daemon.json or per-service in compose.yml:
```yaml
logging:
  driver: json-file
  options:
    max-size: "50m"
    max-file: "5"
```
This caps each service’s logs at 250MB total (five files of 50MB each) with automatic rotation. Adjust the values based on your log volume and available disk space. A low-traffic internal service might warrant smaller values; a high-traffic API might need larger ones.
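The same rotation can also be set host-wide in /etc/docker/daemon.json, which covers containers started outside Compose as well (it applies to newly created containers, after a daemon restart):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
```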
For production deployments where you want centralized log aggregation, replace the json-file driver with a shipping driver. Loki with the Docker plugin is a popular choice for teams already running Grafana. The Fluentd driver works well with ELK stacks. The key point is that making this decision and configuring it explicitly is far better than discovering that your logging strategy was “fill the disk” when you need to diagnose an incident.
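As a sketch of what the Loki option looks like, assuming the Grafana Loki Docker driver plugin is installed (docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions) and a Loki instance is reachable at the URL shown; verify the option names against the current plugin documentation:

```yaml
logging:
  driver: loki
  options:
    loki-url: "http://localhost:3100/loki/api/v1/push"
```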
A Production-Ready Stack: The Full Pattern
Abstract principles are more useful with a concrete example. Here is a representative production compose.yml for a standard web application stack — Nginx reverse proxy, a Node.js application, Postgres, and Redis — with the patterns discussed above applied throughout:
```yaml
services:
  nginx:
    image: nginx:1.27-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - certbot_data:/etc/letsencrypt
    networks:
      - frontend
      - backend
    depends_on:
      app:
        condition: service_healthy
    logging:
      driver: json-file
      options:
        max-size: "20m"
        max-file: "3"
    deploy:
      resources:
        limits:
          memory: 128M
          cpus: "0.25"

  app:
    image: registry.example.com/myapp:${APP_VERSION}
    restart: unless-stopped
    environment:
      DATABASE_URL: postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      REDIS_URL: redis://:${REDIS_PASSWORD}@redis:6379
      SECRET_KEY: ${SECRET_KEY}
    networks:
      - backend
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    logging:
      driver: json-file
      options:
        max-size: "50m"
        max-file: "5"
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "0.75"

  postgres:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - backend
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: "1.0"

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD} --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    networks:
      - backend
    healthcheck:
      test: ["CMD", "redis-cli", "--pass", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 384M
          cpus: "0.5"

volumes:
  postgres_data:
  redis_data:
  certbot_data:

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true
```
Several decisions in this configuration are worth calling out explicitly. The APP_VERSION variable in the image tag means you never deploy :latest — every deployment is pinned to a specific version. The depends_on conditions use service_healthy rather than just service_started, so startup ordering respects actual readiness rather than mere process existence. The backend network is internal, meaning Postgres and Redis have no outbound internet access. Redis is configured with a memory limit and an eviction policy that prevent it from growing without bound.
The Deployment Workflow
A reliable deployment sequence for Compose on a single server is straightforward but requires discipline to execute consistently. The sequence that minimizes downtime and maximizes confidence:
- Pull the new image before touching the running stack: `docker compose pull`
- Apply the update: `docker compose up -d`
- Verify health: `docker compose ps` and check that all services show as healthy
- Tail logs briefly to catch immediate errors: `docker compose logs -f --tail=50`
- Clean up old images: `docker image prune -f`
docker compose up -d performs a rolling replacement — it recreates only containers whose configuration has changed, leaving others untouched. For most Compose deployments this is close enough to zero-downtime for practical purposes, especially with a reverse proxy in front that can absorb the brief reconnection. If your application requires strict zero-downtime (new container fully healthy before old one dies), this is one of the legitimate cases where Swarm mode’s update configuration or a proper orchestrator offers meaningful improvement over plain Compose.
One often-overlooked improvement: run deployments from a script, not by hand. Even a ten-line shell script that runs these commands in sequence, checks exit codes, and sends a notification on failure is dramatically better than manual deployment. Manual deployments accumulate drift; scripted deployments stay consistent.
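A minimal sketch of such a script, wrapping the five-step sequence above. The DRY_RUN flag is an illustration device so the script can be exercised without touching Docker, and notify() is a placeholder hook to point at Slack, email, or whatever you use:

```shell
#!/usr/bin/env bash
# deploy.sh: the deployment sequence above as a script.
# Usage: ./deploy.sh run        (DRY_RUN=1 ./deploy.sh run prints commands only)
set -eu

DRY_RUN="${DRY_RUN:-0}"

notify() { echo "DEPLOY FAILED: $*" >&2; }   # placeholder notification hook

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@" || { notify "'$*' exited non-zero"; exit 1; }
  fi
}

deploy() {
  run docker compose pull
  run docker compose up -d
  run docker compose ps
  run docker compose logs --tail=50   # no -f here: don't block the script
  run docker image prune -f
}

if [ "${1:-}" = "run" ]; then
  deploy
fi
```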
Backup Strategies for Compose Volumes
Named volumes in production need a backup strategy. “The data is on the server” is not a backup strategy.
For Postgres, the most reliable approach combines pg_dump for logical backups with volume snapshots for point-in-time recovery. A daily cron job that runs pg_dump inside the container, compresses the output, and ships it to object storage (S3, Backblaze B2, or equivalent) handles the database layer:
```shell
docker exec postgres pg_dump -U $POSTGRES_USER $POSTGRES_DB | gzip > backup_$(date +%Y%m%d).sql.gz
```
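Wrapped into a small script suitable for cron, with off-host shipping and local retention. The compose service is assumed to be named postgres, the rclone remote "b2" and all paths are placeholders, and POSTGRES_USER/POSTGRES_DB are expected in the environment (sourced from your .env):

```shell
#!/usr/bin/env bash
# Nightly logical backup: dump, compress, ship off-host, prune old copies.
# Usage: ./backup.sh run
set -eu

BACKUP_DIR="${BACKUP_DIR:-/var/backups/myapp}"   # placeholder path

backup_name() { date +backup_%Y%m%d.sql.gz; }

do_backup() {
  out="$BACKUP_DIR/$(backup_name)"
  mkdir -p "$BACKUP_DIR"
  docker compose exec -T postgres pg_dump -U "$POSTGRES_USER" "$POSTGRES_DB" \
    | gzip > "$out"
  rclone copy "$out" "b2:myapp-backups/"   # ship to object storage; remote is a placeholder
  find "$BACKUP_DIR" -name 'backup_*.sql.gz' -mtime +14 -delete   # keep 14 days locally
}

if [ "${1:-}" = "run" ]; then
  do_backup
fi
```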
For the volume itself, stopping the database container, snapshotting the volume directory, and restarting provides a filesystem-level backup that can recover from corruption that logical dumps cannot catch. Doing both is not redundant — they protect against different failure modes.
Test your backups. An untested backup is a backup that might not work when you need it. A monthly restore drill into a test environment is not excessive. It is the only way to know whether your backup strategy actually functions.
Monitoring: Portainer, cAdvisor, and the Watchtower Question
Three tools come up consistently in conversations about Compose production monitoring, and they serve different needs.
Portainer provides a web UI for managing Docker containers, stacks, and volumes. Its value is primarily visibility — seeing at a glance which containers are running, their resource usage, and recent log output without needing SSH access. For teams where multiple people occasionally need to check deployment status, Portainer is worth the operational overhead of running it. For solo operators who are comfortable on the command line, it adds complexity without proportional benefit.
cAdvisor (Container Advisor) exports per-container metrics in a Prometheus-compatible format, covering CPU usage, memory consumption, network I/O, and disk I/O over time. If you are running a Prometheus/Grafana stack, adding cAdvisor gives you time-series visibility into container resource behavior that docker stats cannot provide. It is lightweight, requires no configuration, and is the correct tool for building capacity planning baselines.
Watchtower automatically checks for updated container images and restarts containers with the new versions. The appeal is obvious — automatic updates without any deployment workflow. The risk is equally obvious: a broken upstream image will automatically deploy itself to production with no human review. Watchtower in fully automatic mode on production application containers is not a good idea. The appropriate use is limited scope: automatic updates for low-risk infrastructure containers (the Watchtower container itself, perhaps a simple proxy), with monitoring notifications enabled so you see when it acts. For application containers that change frequently or have real traffic, use a deliberate deployment workflow instead.
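A sketch of that limited-scope pattern, assuming Watchtower's label-gating option (WATCHTOWER_LABEL_ENABLE) and its shoutrrr-style notification URL variable; treat the exact option names as something to verify against the Watchtower documentation:

```yaml
services:
  watchtower:
    image: containrrr/watchtower
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      WATCHTOWER_LABEL_ENABLE: "true"   # only touch containers that opt in via label
      WATCHTOWER_NOTIFICATION_URL: ${WATCHTOWER_NOTIFICATION_URL}

  proxy:
    image: nginx:1.27-alpine
    labels:
      com.centurylinklabs.watchtower.enable: "true"   # low-risk container, opted in
```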
Common Mistakes That Will Cost You
The failure modes in Compose production deployments cluster around a handful of specific mistakes that appear repeatedly across teams.
Running as root inside containers is the most consequential default to override. Most base images run as root unless you specify otherwise. If a vulnerability in your application allows container escape or code execution, a root-running container provides significantly more access than a least-privilege user would. Add a non-root user to your Dockerfiles: RUN addgroup -S app && adduser -S app -G app and USER app. For official images that don’t provide a non-root user, specify one in your compose.yml with the user key.
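A sketch of the compose.yml side, assuming the numeric UID/GID (arbitrary here) matches the ownership of any volumes the service mounts:

```yaml
services:
  app:
    image: registry.example.com/myapp:${APP_VERSION}
    user: "1000:1000"   # arbitrary non-root UID:GID; must own any mounted data paths
```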
Using the latest tag in production creates an unpredictable deployment surface. latest is whatever the image publisher decided to tag last. A routine docker compose pull can pull a completely different version of your software than what was running before. Pin specific versions — not just major versions, but full patch versions like postgres:16.2-alpine — and upgrade deliberately rather than accidentally.
Not setting memory limits has been discussed, but the underlying habit is worth naming explicitly: skipping resource limits is an act of optimism, not configuration. It assumes that nothing will misbehave. Production environments are precisely where things misbehave, because they are running real traffic under real conditions with real data edge cases. Set limits.
Storing secrets in compose.yml creates a version control and access control problem simultaneously. Every person with repository access has your production credentials. Every deployment history shows credential values. Use environment variable interpolation from a .env file or, for higher-security requirements, Docker Secrets.
The Honest Assessment
Docker Compose production deployments are not second-class infrastructure. They are appropriate infrastructure for a specific and common set of workloads, and the teams running them successfully are not making a compromise — they are making a deliberate choice that trades Kubernetes’s operational complexity for Compose’s simplicity and their team’s bandwidth for more important problems.
The patterns that make Compose production-worthy are not exotic. Health checks, restart policies, resource limits, network isolation, proper secret management, log rotation, and a disciplined deployment workflow — none of these are difficult to implement. The gap between a Compose file thrown together for development and one that can run reliably in production for years is a few focused hours of configuration work.
Do that work. Pin your image versions. Set your memory limits. Configure your health checks. Test your backups. The result is a deployment stack you can understand completely, debug quickly, and operate confidently — which, for teams without dedicated infrastructure engineers, is worth more than a thousand lines of Kubernetes YAML.