The Problem
We have a large cloud-native application running in Google Kubernetes Engine (GKE) that uses Microsoft SQL Server (mssql) for transactional persistence. The SQL Server .bak files live inside the cluster, which makes them difficult to access for the terminal-averse; if you have any experience with GKE or Kubernetes, you know the GUI doesn’t compare to the power of the CLI.
The Situation
The SQL Server instance is a click-to-deploy application from Google Cloud Marketplace, and all of the customization is done through the YAML deployment manifest. It has persistent volume claims (PVCs) for data storage and for backups, and SQL Server writes regular backups to the backup PVC. The bound persistent volumes (PVs) have daily snapshots taken and stored in Google Cloud Storage (GCS). We wanted another, easier-to-access copy of the backups that can simply be downloaded for on-prem testing, and we wanted 30 days of them without an oversized PV.
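For context, the backup claim in the manifest is just an ordinary PVC. The sketch below is illustrative only; the real names, size, and storage class come from the click-to-deploy defaults rather than from anything shown here.

[codesyntax lang="yaml"]
# Illustrative sketch of the backup PVC; the name and size are placeholders,
# not the actual marketplace defaults.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssql-backup
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
[/codesyntax]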
Too Many Backups
Backups are like bandwidth: there’s no such thing as too much if you can afford it. GCS regional Nearline storage currently costs low single-digit pennies per GB per month, so a few hundred GB stored for a month costs less than the average lunch for one.
The Path To A Solution
We could have created a custom multistage Dockerfile using the same SqlServer:latest base and installing kubectl and gsutil, but we didn’t want to lose our click-to-deploy application (and SQL license) if we could help it. We reached out to Google, and their amazing support team pointed us towards the beta CronJob feature. That started a flurry of searching and learning. Originally, we tried to execute commands in the context of the SQL Server pod. We could script out the commands we wanted, but without gsutil installed in the SQL Server image, they were all doomed to fail.
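To illustrate the dead end, the first attempt looked roughly like this (a reconstruction, not our actual script; the app=mssql label and backup path match the manifest further down):

[codesyntax lang="bash"]
# Reconstruction of the doomed approach: run everything inside the SQL Server pod.
POD=$(kubectl get pods --selector=app=mssql --output=jsonpath={.items[0].metadata.name})

# Listing the backups works fine...
kubectl exec "$POD" -- ls /var/opt/bkp/data

# ...but the upload step fails, because the mssql image has no gsutil.
kubectl exec "$POD" -- bash -c 'gsutil cp /var/opt/bkp/data/*.bak gs://my-bucket-name/'
[/codesyntax]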
We could copy the files pod-to-pod and run gsutil from our CronJob pod. That didn’t feel like the best option, so we kept searching. As it turns out, you can mount a GCS bucket as a volume on a pod using FUSE. The downside is that FUSE requires the container to run with elevated privileges, namely the SYS_ADMIN capability. We tried it anyway and… it failed with the large backup files.
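For reference, the FUSE experiment looked something like the sketch below (presumably via gcsfuse or similar; the image name and mount path are placeholders, and the image would need gcsfuse baked in):

[codesyntax lang="yaml"]
# Sketch of the abandoned FUSE approach; image and paths are illustrative.
containers:
- name: sql-backup-fuse
  image: my-image-with-gcsfuse:latest       # needs gcsfuse installed
  securityContext:
    capabilities:
      add: ["SYS_ADMIN"]                    # required for FUSE mounts in the container
  command: ["bash", "-c"]
  args:
  # assumes the backup PVC is also mounted at /var/opt/bkp/data in this container
  - gcsfuse my-bucket-name /mnt/gcs && cp /var/opt/bkp/data/*.bak /mnt/gcs/
[/codesyntax]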
The Current Solution
…is not the best solution in our opinion. We will keep searching for a better option and will update this post when we have one. For now, we use kubectl to copy the files from the SQL Server pod onto the CronJob pod’s ephemeral storage, then use tar to compress the backups before sending them to GCS. The bucket as a whole has a lifecycle rule that deletes anything older than 30 days, and we save each run’s backup files into a dated subfolder of the form /backups/YYYY-mm-dd.
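The lifecycle rule lives on the bucket itself rather than in Kubernetes. One way to set it is with gsutil and the standard GCS lifecycle JSON (the bucket name is the same placeholder used in the manifest below):

[codesyntax lang="bash"]
# Delete objects older than 30 days.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30}
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://my-bucket-name
[/codesyntax]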
The YAML below defines a ClusterRole that lets a service account list pods and create pods/exec, which the default service account needs in order to find the SQL Server pod and run the copy. The role is bound to the default service account with this command:
[codesyntax lang="bash"]
kubectl create clusterrolebinding pod-reader --clusterrole=pod-reader --serviceaccount=default:default
[/codesyntax]
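Once the ClusterRole in the manifest below has been applied and the binding created, you can confirm the default service account has what it needs:

[codesyntax lang="bash"]
# Verify the default service account can list pods and exec into them.
kubectl auth can-i list pods --as=system:serviceaccount:default:default
kubectl auth can-i create pods --subresource=exec --as=system:serviceaccount:default:default
[/codesyntax]

The full manifest, the ClusterRole plus the CronJob itself, looks like this: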
[codesyntax lang="yaml"]
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: sql-backup-gcs
spec:
  schedule: "0 8 * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: sql-backup-gcs
        spec:
          containers:
          - name: sql-backup-gcs
            image: gcr.io/google.com/cloudsdktool/cloud-sdk:latest
            command:
            - bash
            args:
            - -xc
            - >-
              mkdir backups;
              cd backups;
              kubectl cp $(kubectl get pods --selector=app=mssql --output=jsonpath={.items[0].metadata.name}):/var/opt/bkp/data/ .;
              for i in *.bak; do tar -czf $i.tar.gz $i; done;
              find . -type f -iname "*.tar.gz" | gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp -I gs://my-bucket-name/backups/$(date +"%Y-%m-%d")/;
          restartPolicy: OnFailure
[/codesyntax]
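To deploy and smoke-test it without waiting for the 08:00 schedule (the file name here is just whatever you saved the manifest as):

[codesyntax lang="bash"]
# Apply the ClusterRole and CronJob, then trigger a one-off run.
kubectl apply -f sql-backup-gcs.yaml
kubectl create job --from=cronjob/sql-backup-gcs sql-backup-gcs-manual

# Watch the run; the job's pod inherits the app=sql-backup-gcs label from the template.
kubectl logs -l app=sql-backup-gcs --follow
[/codesyntax]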