In this edition: longer lasting data!


If you’re coming here from part one, you should have a single-node Rancher Kubernetes node just waiting to be explored. This part will go into more detail about some important components and how to use them to your advantage.

Storage in a Nutshell

Kubernetes, just like Docker, treats containers as ephemeral: as soon as a pod is deleted or rescheduled, the data it contained is gone forever. This is only really a bad thing if you’re running something that needs persistent storage, like a database or a wiki.

Keep in mind that most microservices are designed to be configured through ConfigMaps, Secrets and environment variables without needing to store data, but there are easy ways to provide more permanent storage when you do need it.

Persistent Volumes

Storage in Kubernetes is centered around the idea of Persistent Volumes (PVs) fulfilling claims for storage (PersistentVolumeClaims, or PVCs). There are many built-in PV types that can be useful depending on the architecture and cloud environment of your system, such as NFS or vSphere volumes. Take a look at the official docs to see a list of all supported PV types.

As this is a standalone server rather than part of a larger cluster (more on this in another part), you probably won’t have access to many of these offerings. Luckily, you can specify that a PV should simply use the local server’s disk to store its files. For instance:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: testing-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/testing-pv"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: testing-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

This will create a 5Gi volume backed by /mnt/testing-pv on the server, and a PVC that will be bound to it. You can then reference testing-pvc from any pod or deployment that needs the storage.
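
As a minimal sketch of what that looks like in practice, here’s a pod that mounts the claim (the pod name and nginx image are just placeholders):

kind: Pod
apiVersion: v1
metadata:
  name: testing-pod
spec:
  containers:
  - name: web
    image: nginx:1.15          # any image will do; nginx is only an example
    volumeMounts:
    - mountPath: /usr/share/nginx/html
      name: testing-storage
  volumes:
  - name: testing-storage
    persistentVolumeClaim:
      claimName: testing-pvc   # the claim defined above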

Storage Provisioners

The above example isn’t ideal in the real world: every time you deploy a workload, you’ll have to manually create a PV for each PVC that needs to be fulfilled.

Kubernetes has the concept of storage classes, which are a way to provide different kinds of storage to your cluster.

When you have a class set up, you can use dynamic volume provisioning to automatically provision exactly as much storage as any PVC requests. By setting the reclaimPolicy on the storage class, you can also ensure that destroyed PVCs clean up after themselves by removing the underlying PV and the data on disk.
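
To give a rough idea of the shape of one, here’s a sketch of a storage class; the provisioner value depends entirely on which provisioner you deploy, so example.com/nfs below is a placeholder:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: example-nfs
provisioner: example.com/nfs   # placeholder; use whatever your provisioner registers
reclaimPolicy: Delete          # remove the PV (and its data) when the PVC is deleted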

Example: NFS Provisioner

As this describes a single-node cluster, it makes sense to use the local disk as a storage source for containers running on it. There is a dead-simple NFS provisioner available that does the job well.

Installing via Rancher

As the previous part dealt with getting Rancher up and running, you can deploy the provisioner via a Helm chart in the GUI, which makes the process extremely easy:

  1. Log into the Rancher UI and ensure you’re at the Global scope
  2. Click on Catalogs, and make sure Helm Stable is set to Enabled
  3. Drop into the Cluster: local scope and click on Storage ▶️ Persistent Volumes
  4. Add a new volume
  5. Choose Local Node Path and a sane path (e.g. /mnt/ganesha), ensuring the directory is created
  6. Drop into the System scope and click Catalog Apps at the top
  7. Launch, and select nfs-provisioner from Library
  8. Enable persistent volume, and ensure the volume size matches the above PV

Watch the deployment and ensure everything sets itself up properly (it should!).
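
Once the chart has finished deploying, a quick sanity check from the command line doesn’t hurt. The pod and storage class names depend on how the chart was installed, so treat these as a rough guide:

kubectl get pods --all-namespaces | grep nfs-provisioner
kubectl get storageclass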

Dynamic Provisioning in Action

Now that a dynamic provisioner and default storage class exist, the above example can be simplified to this:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: testing-pvc
spec:
  storageClassName: nfs-provisioner
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Because the storage class is specified (and even if it isn’t, as long as it’s set as the cluster default), Kubernetes will automatically create a PV on the NFS “server” to match the PVC. Pretty slick!
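
If you want to watch it happen, apply the claim and look for the freshly provisioned PV; the file name here is simply whatever you saved the claim as:

kubectl apply -f testing-pvc.yaml
kubectl get pvc testing-pvc
kubectl get pv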

Backups

So, your containers can now tap into the local storage on your node. Great! But what happens if that node, or its disk, fails?

Stash and restic

Luckily, AppsCode have open-sourced a product called Stash: a backup system for Kubernetes that’s built on top of the excellent restic binary (which you should check out for your other projects).

In a nutshell, restic provides a very, very simple way to back up to a wide range of destinations, from plain SFTP locations to S3 or OpenStack buckets. Stash builds on top of this by injecting sidecars into your deployments that access your containers’ persistent storage while they’re running; the data gets snapshotted, deduplicated, encrypted and sent offsite.
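
If you haven’t used restic before, this is roughly what it looks like on its own, outside of Kubernetes. The bucket name and credentials below are placeholders:

export B2_ACCOUNT_ID=your_b2_account_id
export B2_ACCOUNT_KEY=your_b2_account_key
export RESTIC_PASSWORD=a_long_random_password

restic -r b2:my-bucket:backups init          # create an encrypted repository in a B2 bucket
restic -r b2:my-bucket:backups backup /data  # snapshot a directory (deduplicated and encrypted)
restic -r b2:my-bucket:backups snapshots     # list the snapshots stored in the repository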

Installing Stash

Just like the above nfs-provisioner, Stash provides a Helm chart to make it easy to install in Kubernetes. In this example I’ll step through doing it via the command line instead.

There’s not much to the process; these steps are largely taken from the official guide:

helm repo add appscode https://charts.appscode.com/stable/
helm repo update
helm install appscode/stash --name stash-operator --version 0.8.3 --namespace kube-system

It’s very important that this is installed into the kube-system namespace. If you’ve followed the last part, you will most certainly have Role-Based Access Control (RBAC) in effect, so Stash needs to be able to create its own role bindings in order to operate.
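
Once the chart is installed, it’s worth confirming that the operator is running and that its CRDs are registered. The deployment should be named after the Helm release (stash-operator here), though the chart may add a suffix:

kubectl get deployment -n kube-system stash-operator
kubectl get crd | grep stash.appscode.com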

Configuring Backups

Prerequisites

First and foremost, you are going to need a location to back up to. The restic docs contain a list of all supported back-ends.

Secondly, you’ll need the API credentials (e.g. AWS_SECRET_ACCESS_KEY) for your chosen service. I’m using Backblaze B2 in this example, as that’s what my services are configured to use.

Choosing a Target

Let’s take a look at an example workload to find out what’s needed to get Stash running. You can pull out the deployment’s YAML with kubectl get -n your-namespace deployment foobar -o yaml if you don’t have it handy.

For instance, my Gitea deployment looks something like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    git.service: gitea-web
  name: gitea-web
  namespace: git
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        git.service: gitea-web
    spec:
      containers:
      - image: gitea/gitea:1.6
        name: gitea-web
        ports:
        - containerPort: 3000
        - containerPort: 22
        resources: {}
        volumeMounts:
        - mountPath: /data
          name: gitea-web-claim0
      restartPolicy: Always
      volumes:
      - name: gitea-web-claim0
        persistentVolumeClaim:
          claimName: gitea-web-claim0

As shown in the YAML, the deployment is mounting gitea-web-claim0 to /data, and is running in the git namespace. The workload can be found using the label git.service: gitea-web.
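
Before pointing Stash at it, you can confirm that the label selector actually picks up the workload:

kubectl get deployment -n git -l git.service=gitea-web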

Telling Stash What to Do

Next, you should make a secret containing your API keys and a password used to encrypt your repository:

pwgen -B 16 1 | tr -d '\n' > RESTIC_PASSWORD
echo -n 'your_b2_account_id' > B2_ACCOUNT_ID
echo -n 'your_b2_account_key' > B2_ACCOUNT_KEY
kubectl create secret generic -n git gitea-restic-secret \
    --from-file=./RESTIC_PASSWORD \
    --from-file=./B2_ACCOUNT_ID \
    --from-file=./B2_ACCOUNT_KEY

It’s very important that all trailing whitespace is removed from these files.
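
A quick way to check the files before creating the secret is to dump them with od, which makes any stray trailing newline or space obvious:

od -c RESTIC_PASSWORD
od -c B2_ACCOUNT_ID
od -c B2_ACCOUNT_KEY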

You should then create the spec for the Restic object that Stash will look after:

apiVersion: stash.appscode.com/v1alpha1
kind: Restic
metadata:
  name: gitea-backups
  namespace: git
spec:
  selector:
    matchLabels:
      git.service: gitea-web
  fileGroups:
  - path: /data
    retentionPolicyName: 'keep-last-14'
  backend:
    b2:
      bucket: git-data
    storageSecretName: gitea-restic-secret
  schedule: '30 1 * * *'
  volumeMounts:
  - mountPath: /data
    name: gitea-web-claim0
  retentionPolicies:
  - name: 'keep-last-14'
    keepLast: 14
    prune: true

It’s in the Restic spec that you dictate the backup schedule and the number of backups to keep. You can be creative here: each fileGroup can reference its own retention policy, and you can create separate Restic objects if you need different schedules. This is also where you specify that the destination is B2, and that the bucket name is git-data.

Be warned: applying the Stash spec will cause the running container to restart as the sidecar is created!

To finish, apply your restic spec with kubectl apply -n git -f git-restic.yaml.
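
You can verify that the sidecar was injected by listing the container names in the pod; expect an extra container alongside gitea-web (in my experience it’s named stash, but treat that as an assumption about the operator’s naming):

kubectl get pods -n git -l git.service=gitea-web \
    -o jsonpath='{.items[*].spec.containers[*].name}'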

Confirming Your Backups

Simply run the following:

kubectl get -n git repo
NAME                   BACKUP-COUNT   LAST-SUCCESSFUL-BACKUP   AGE
deployment.gitea-db    1349           51m                      56d
deployment.gitea-web   1342           21m                      56d
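
If the backup count isn’t moving, the sidecar’s logs are the first place to look. Again, the container name stash is an assumption about what the operator calls its sidecar:

kubectl logs -n git deploy/gitea-web -c stash --tail=50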

Conclusion

By adding persistent storage into your Kubernetes node, and ensuring that the data is properly backed up, you can get a lot more mileage out of just a single node. While I would certainly recommend against running any kind of production workloads on just one server, for personal services this is a great compromise and ensures your data is at least kept safe.

In future posts I’ll be exploring more ways to add security and redundancy to Kubernetes, but for now I hope you find this information helpful.