Storing Postgres container databases as files for easier backups

2023-03-21

In my server setup, each service gets its own directory in /opt with a Docker Compose file. Static configuration files go in /opt/$SERVICE/config. When a service produces data that it needs to maintain its state, that data is stored in /opt/$SERVICE/data. This way, each service is encapsulated and can, for example, be moved to another host simply by moving its directory (and running docker compose up). More importantly, services can easily be backed up.
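
A minimal sketch of such a backup, assuming plain tar is good enough for the job. The names backup_service and OPT_ROOT and the archive path are made up for this example:

```shell
#!/bin/sh
# Sketch: archive one service directory (compose file, config and data).
# backup_service and OPT_ROOT are hypothetical names for this example.
backup_service() {
	service=$1               # e.g. miniflux
	archive=$2               # e.g. /backups/miniflux.tar.gz
	root=${OPT_ROOT:-/opt}   # parent directory of all services

	# -C makes the paths inside the archive relative to $root,
	# so the archive can be unpacked under /opt on any other host.
	tar -czf "$archive" -C "$root" "$service"
}
```

It could be called as backup_service miniflux /backups/miniflux.tar.gz.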

Most services I deploy (like changedetection.io or TVHeadend) use flat files to serialize and store their state. I purposefully try to avoid services that need much state, and I often refrain from deploying a service when a database management system is required, or I use SQLite if possible.

When choosing my current feed reader to self-host, however, I decided to go with Miniflux, which works only with Postgres. To still make the service portable and easy to back up, I came up with this setup:

every time the backing Postgres container is stopped (and additionally, on a schedule), it dumps its databases into a dump.sql.gz file,
when the Postgres service is started afresh, it automatically restores from dump.sql.gz.

The Setup

First, the compose.yaml (or equivalent) file has to be adapted so that the postgres service has the following keys:

volumes,
entrypoint,
stop_signal,
stop_grace_period,
healthcheck.

services:
  miniflux:
    image: miniflux/miniflux:2.0.43
    environment:
      DATABASE_URL: postgres://miniflux:internal@postgres/miniflux?sslmode=disable
      RUN_MIGRATIONS: 1
      CREATE_ADMIN: 1
      ADMIN_USERNAME: admin
      ADMIN_PASSWORD: miniflux
    restart: unless-stopped

  postgres:
    image: postgres:15.1-alpine
    environment:
      POSTGRES_USER: miniflux
      POSTGRES_PASSWORD: internal
    volumes:
      - ./data:/docker-entrypoint-initdb.d
      - ./utils:/utils:ro
    entrypoint: /utils/entrypoint
    stop_signal: SIGTERM
    stop_grace_period: 5m
    healthcheck:
      test: [CMD, /utils/dump]
      start_period: 3m
      interval: 1h
    restart: unless-stopped

Notice how the usual /var/lib/postgresql/data is not mounted in the container.

Then, two executables have to be created in the ./utils directory of the host, next to the compose.yaml file.

An executable ./utils/dump invokes pg_dump and unconditionally exits successfully:

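#!/bin/sh

# Write a gzip-compressed SQL dump into the init directory,
# where the postgres image looks for files to restore from.
pg_dump --username="$POSTGRES_USER" --file=/docker-entrypoint-initdb.d/dump.sql.gz --compress=9

# Always report success, so a failed dump never marks the container unhealthy.
exit 0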
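
One weakness of writing straight to dump.sql.gz is that pg_dump truncates the file when it opens it, so a dump interrupted halfway would replace the last good dump with a partial one. A possible variant, sketched here under the assumption that a rename within the same directory is atomic, writes to a temporary file first. The names dump_atomically, DUMP_FILE and DUMP_CMD are hypothetical. The .tmp suffix is deliberate: the image's entrypoint skips files with extensions it does not recognize.

```shell
#!/bin/sh
# Sketch of a more defensive ./utils/dump (hypothetical variant).
DUMP_FILE=${DUMP_FILE:-/docker-entrypoint-initdb.d/dump.sql.gz}
DUMP_CMD=${DUMP_CMD:-pg_dump}

dump_atomically() {
	# Must not end in .sql or .sql.gz, or it would be run on initialization.
	tmp="$DUMP_FILE.tmp"

	if "$DUMP_CMD" --username="$POSTGRES_USER" --file="$tmp" --compress=9; then
		# Only a completed dump replaces the previous one.
		mv -f "$tmp" "$DUMP_FILE"
	else
		# Keep the previous dump and discard the partial file.
		rm -f "$tmp"
		return 1
	fi
}
```

The script would then end with a call to dump_atomically followed by exit 0, as before.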

An executable ./utils/entrypoint sets up a signal handler to be run before the container is stopped, and then invokes the original postgres entrypoint:

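#!/bin/sh

# The dump script lives next to this entrypoint.
DUMP_PATH=$(dirname "$0")/dump

# Runs on SIGTERM: dump while Postgres is still up, then shut it down.
on_exit() {
	"$DUMP_PATH"

	kill -TERM "$POSTGRES_PID"
	wait "$POSTGRES_PID"

	# 143 = 128 + 15, the conventional exit status after SIGTERM.
	exit 143
}

# Signal names without the SIG prefix are the portable spelling.
trap on_exit TERM

# Run the image's original entrypoint in the background, so that this
# shell stays PID 1 and is the process that receives the stop signal.
/usr/local/bin/docker-entrypoint.sh postgres "$@" &
POSTGRES_PID=$!

# Block in an interruptible wait until the trap fires.
tail -f /dev/null &
wait $!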

Explanation

When the Postgres container is created, any *.sh, *.sql, or *.sql.gz file in /docker-entrypoint-initdb.d (such as our dump.sql.gz) is automatically executed, as explained in the documentation for the official postgres Docker image.
(If the container is restarted and a database already exists, Postgres ignores the initialization directory and continues from where it left off, using its existing storage.)
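
For a *.sql.gz file, that initialization step amounts to decompressing the dump and feeding it to psql. The following is a simplified sketch of the equivalent manual restore, not the image's actual code. restore_dump is a hypothetical helper, and it assumes the target database is named after POSTGRES_USER unless POSTGRES_DB is set:

```shell
#!/bin/sh
# Sketch: restore a gzip-compressed SQL dump by hand.
# restore_dump is a hypothetical helper, not part of the postgres image.
restore_dump() {
	dump=$1

	# ON_ERROR_STOP makes psql abort on the first failing statement
	# instead of carrying on with a half-restored database.
	gunzip -c "$dump" |
		psql -v ON_ERROR_STOP=1 \
			--username="$POSTGRES_USER" \
			--dbname="${POSTGRES_DB:-$POSTGRES_USER}"
}
```

Inside the running container, this could be invoked as restore_dump /docker-entrypoint-initdb.d/dump.sql.gz.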

When the container is stopped, the overridden entrypoint invokes the dump script, which generates /docker-entrypoint-initdb.d/dump.sql.gz (from the container's perspective), also known as ./data/dump.sql.gz (from the host's perspective).
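
The entrypoint ends in a wait because a shell runs a trap handler only once the command it is currently waiting for has returned, whereas the wait builtin is interrupted by trapped signals. The pattern can be tried without Postgres. In this sketch, sleep stands in for the database and a marker file for the dump; demo_entrypoint is a hypothetical name:

```shell
#!/bin/sh
# Sketch of the trap-and-wait pattern, with stand-ins for Postgres and the dump.
demo_entrypoint() {
	marker=$1

	on_exit() {
		echo "dumped" > "$marker"   # stand-in for running the dump script
		kill -TERM "$child"         # then stop the "database"
		wait "$child"
		exit 143
	}
	trap on_exit TERM

	sleep 300 &                     # stand-in for the Postgres process
	child=$!

	# wait returns as soon as a trapped signal arrives,
	# after which the shell runs on_exit.
	wait "$child"
}
```

Running ( demo_entrypoint /tmp/marker ) & and then sending SIGTERM to that background job should leave the marker file behind before the job exits.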

Finally, the container healthcheck is misused to run a job on a schedule. In this case, it is configured to:

create a scheduled dump every hour,
not create scheduled dumps within the first 3 minutes of each container start,
when stopping the container, wait up to 5 minutes for a dump to finish before letting the container engine forcefully kill the container.

The scheduled dump via healthcheck is performed so that in case of abrupt termination of the container (such as a power loss), where the signal handler defined in the overridden entrypoint doesn’t have a chance to run, a relatively up-to-date dump is still available.
Such a job can alternatively be set up with cron on the host, for example, to avoid misusing the healthcheck functionality.
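
A rough sketch of the cron alternative, assuming the service lives in /opt/miniflux as in the layout described above. The wrapper path, the SERVICE_DIR variable and the scheduled_dump function are hypothetical:

```shell
#!/bin/sh
# Hypothetical host-side wrapper, e.g. saved as /opt/miniflux/scheduled-dump
# and run hourly from the host's crontab:
#   0 * * * * /opt/miniflux/scheduled-dump
SERVICE_DIR=${SERVICE_DIR:-/opt/miniflux}

# The body runs in a subshell, so the cd does not affect the caller.
scheduled_dump() (
	cd "$SERVICE_DIR" || exit 1

	# -T disables pseudo-TTY allocation; cron does not provide a terminal.
	docker compose exec -T postgres /utils/dump
)
```

The wrapper itself would end with a call to scheduled_dump. With such a job in place, the healthcheck key could be dropped from compose.yaml.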

This setup is appropriate only for small databases, since regularly dumping Postgres can get expensive, time- and CPU-wise, with bigger databases. Moreover, there is no guarantee that the dump actually runs, so I would be cautious about relying on this setup for important data. However, it is always possible to check the timestamp of dump.sql.gz to be reassured before proceeding with a docker compose down.
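
That timestamp check could be scripted along these lines. This is only a sketch: dump_is_fresh is a hypothetical helper, and it relies on the -mmin option of find, which is widely available (GNU, BSD, BusyBox) but not part of POSIX:

```shell
#!/bin/sh
# Sketch: succeed only if the dump was modified recently enough.
# dump_is_fresh is a hypothetical helper name.
dump_is_fresh() {
	file=$1
	max_age_min=${2:-60}   # maximum accepted age in minutes

	# find prints the path only if the file exists and was modified
	# less than $max_age_min minutes ago.
	[ -n "$(find "$file" -mmin "-$max_age_min" 2>/dev/null)" ]
}
```

For example: dump_is_fresh ./data/dump.sql.gz 60 && docker compose down.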