
What is the best practice for backing up a Postgres database running on Google Cloud Container Engine?

My plan is to store the backups in Google Cloud Storage, but I am unsure how to connect the Disk/Pod to a Storage Bucket.

I am running Postgres in a Kubernetes cluster using the following configuration:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: postgres-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - image: postgres:9.6.2-alpine
          imagePullPolicy: IfNotPresent
          env:
            - name: PGDATA
              value: /var/lib/postgresql/data
            - name: POSTGRES_DB
              value: my-database-name
            - name: POSTGRES_PASSWORD
              value: my-password
            - name: POSTGRES_USER
              value: my-database-user
          name: postgres-container
          ports:
            - containerPort: 5432
          volumeMounts:
            - mountPath: /var/lib/postgresql
              name: my-postgres-volume
      volumes:
        - gcePersistentDisk:
            fsType: ext4
            pdName: my-postgres-disk
          name: my-postgres-volume

I have attempted to create a Job to run a backup:

apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-dump-job
spec:
  template:
    metadata:
      labels:
        app: postgres-dump
    spec:
      containers:
        - command:
            - pg_dump
            - my-database-name
          # `env` value matches `env` from previous configuration.
          image: postgres:9.6.2-alpine
          imagePullPolicy: IfNotPresent
          name: my-postgres-dump-container
          volumeMounts:
            - mountPath: /var/lib/postgresql
              name: my-postgres-volume
              readOnly: true
      restartPolicy: Never
      volumes:
        - gcePersistentDisk:
            fsType: ext4
            pdName: my-postgres-disk
          name: my-postgres-volume

(As far as I understand) this should run the pg_dump command and output the backup data to stdout (which should appear in the kubectl logs).

As an aside, when I inspect the Pods (with kubectl get pods), the Job's Pod never gets out of the "Pending" state, which I gather is due to there not being enough resources to start the Job.

Is it correct to run this process as a Job? How do I connect the Job to Google Cloud Storage? Or should I be doing something completely different?

I'm guessing it would be unwise to run pg_dump in the database Container (with kubectl exec) due to a performance hit, but maybe this is ok in a dev/staging server?

  • This is exactly my situation. Did you find a working solution you could share? Commented Jul 2, 2020 at 7:21

6 Answers


As @Marco Lamina said, you can run pg_dump on the postgres pod like this:

DUMP
# pod-name         name of the postgres pod
# postgres-user    database user that is able to access the database
# database-name    name of the database
kubectl exec [pod-name] -- bash -c "pg_dump -U [postgres-user] [database-name]" > database.sql


RESTORE
# pod-name         name of the postgres pod
# postgres-user    database user that is able to access the database
# database-name    name of the database
cat database.sql | kubectl exec -i [pod-name] -- psql -U [postgres-user] -d [database-name]

You can have a Job pod that runs this command and exports the dump to a file storage system such as AWS S3.
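As a sketch of that idea aimed at the asker's GCS bucket: a Job whose container pipes pg_dump straight to gsutil, so the dump travels over the network and never touches the persistent disk. The image name, host, and bucket below are assumptions (you would build an image containing both pg_dump and gsutil, e.g. from google/cloud-sdk with postgresql-client installed):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-dump-to-gcs
spec:
  template:
    spec:
      containers:
        - name: dump
          # Hypothetical image bundling pg_dump and gsutil
          image: my-postgres-gsutil-image
          env:
            - name: PGPASSWORD
              value: my-password
          command: ["/bin/sh", "-c"]
          args:
            - >
              pg_dump -U my-database-user -h postgres-service my-database-name |
              gzip | gsutil cp - gs://my-backup-bucket/backup.sql.gz
      restartPolicy: Never
```

Note that this Job does not mount the GCE persistent disk at all, which also sidesteps the attach conflict with the running database pod.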


1 Comment

The pg_dump process is the easy part. The difficulty is copying it to a GCS bucket, which is why I have left this question open.

The easiest way to dump without storing any additional copies on your pod:

kubectl -n [namespace] exec -it [pod name] -- bash -c "export PGPASSWORD='[db password]'; pg_dump -U [db user] [db name]" > [database].sql



The reason the Job's pod stays in the Pending state is that it forever tries to attach/mount the GCE persistent disk and fails, because the disk is already attached/mounted to another pod.

Attaching a persistent disk to multiple pods is only supported if all of them attach/mount the volume in ReadOnly mode. This is of course not a viable solution for you.

I have never worked with GCE, but it should be possible to easily create a snapshot of the PD from within GCE. This would not give a very clean backup, more like something in the state of "crashed in the middle, but recoverable", but this is probably acceptable for you.

Running pg_dump inside the database pod is a viable solution, with a few drawbacks as you already noticed, especially performance. You would also have to move the resulting backup out of the pod afterwards, e.g. by using kubectl cp, and run another exec to clean up the backup in the pod.

2 Comments

Thank you for your suggestion but, as you say, while snapshots of the disk are possible, they are far from an ideal backup solution. I am looking for something stable that can be used in production, so I'll be leaving this question open for other solutions. Interesting about the disks, I had thought that connecting as read-only would be possible, but seemingly not.
As said, only if all pods attach/mount it as read-only. Otherwise consistency could not be guaranteed for the read-only mounts.

I think running pg_dump as a job is a good idea, but connecting directly to your DB's persistent disk is not. Try having pg_dump connect to your DB over the network! You could then have a second disk onto which your pg_dump command dumps the backups. To be on the safe side, you can create regular snapshots of this second disk.
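For pg_dump to reach the database over the network, the Deployment from the question would need a Service in front of it. A minimal sketch (the name postgres-service is an assumption):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
spec:
  selector:
    app: postgres   # matches the Deployment's pod labels
  ports:
    - port: 5432
      targetPort: 5432
```

A backup Job can then run pg_dump -h postgres-service against the database without mounting the persistent disk, avoiding the attach conflict described above.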

4 Comments

May I ask why you need the backups to be in Google Cloud Storage? I think the easiest way would be to just write a little script that executes pg_dump and pushes the data directly to Google Cloud Storage. Containerize the script, run it as a K8 job and you're done!
Buckets seem to be more scalable (no size management) and offer tools for mounting locally. Disks appear in the Compute section, so they seem targeted at running instances rather than long-term storage. This entire process needs to be automatic (no manual disk creation or snapshots), so help writing the script (connected to appropriate storage) is exactly what I was after with this question.
@MarcoLamina Can you elaborate "Containerize the script, run it as a K8 job" ?
@void A job is a K8 resource for running containers once (or x times) till completion: kubernetes.io/docs/concepts/workloads/controllers/…

A lot of tutorials use kubectl cp or transfer the file inside the pod, but you can also pipe the pg_dump container output directly to another process.

kubectl run --env=PGPASSWORD=$PASSWORD --image=bitnami/postgresql postgresql -it --rm -- \
  bash -c "pg_dump -U $USER -h $HOST -d $DATABASE" |\
  gzip > backup.sql.gz



You can use the MinIO Client (mc).

First, use a simple Dockerfile to build a Docker image that contains postgres along with the MinIO client (let's name this image postgres_backup):

FROM postgres

RUN apt-get update && apt-get install -y wget

RUN wget https://dl.min.io/client/mc/release/linux-amd64/mc

RUN chmod +x mc

RUN ./mc alias set gcs  https://storage.googleapis.com BKIKJAA5BMMU2RHO6IBB V8f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12

Now you can use the postgres_backup image in your CronJob (I assume you have created a backups bucket in your Google storage):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: backup-job
spec:
  # Backup the database every day at 2AM
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: postgres-backup
            image: postgres_backup
            env:
            - name: POSTGRES_HOST_AUTH_METHOD
              value: trust
            command: ["/bin/sh"]
            # Note: pg_dump's -W flag only forces a password prompt and takes no argument;
            # pass the password via the PGPASSWORD environment variable instead.
            args: ["-c", 'PGPASSWORD=[Your Postgres Password] pg_dump -Fc -U [Your Postgres Username] -h [Your Postgres Host] [Your Postgres Database] | ./mc pipe gcs/backups/$(date -Iseconds).dump']
          restartPolicy: Never

