Resilient storage for your k8s cluster with Longhorn

In a previous article I talked about setting up a k3s cluster for your homelab. Now it’s time to discuss storage for your workloads.

Default behavior

If you followed my article, you now have a pretty vanilla k3s cluster. If you run a workload that requires persistent storage, say a database instance, you request it from the cluster by defining something like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb-data
  namespace: vikunja
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi

Here we are defining a PersistentVolumeClaim of 500 MiB, which can subsequently be used as a volume for some workload (there’s a sketch of that right after the list below). But where does this storage come from? By default, the k3s cluster will provide it from the OS disk of the node running the workload. This has a few issues:

  • It’s not resilient. If your disk breaks, the data goes bye-bye.
  • You cannot move the workload. If you want to perform maintenance on a node, you will have downtime: you can’t drain the node and start its workloads on other nodes, because the storage lives on the node that went down.
  • Different workloads running on different nodes cannot share a volume. This also includes the case where you want multiple pods of the same application to access the same storage (for example multiple instances of a web server accessing a database).
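
For context, here is a minimal sketch of how such a claim is consumed by a workload; the pod name and image are placeholders, not taken from my actual setup:

apiVersion: v1
kind: Pod
metadata:
  name: mariadb            # placeholder name
  namespace: vikunja
spec:
  containers:
    - name: mariadb
      image: mariadb:11    # placeholder image
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mariadb-data   # the claim defined above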

Longhorn to the rescue

Longhorn makes it possible to have highly available persistent storage in your k8s cluster, and it also gives you the possibility to easily perform backups and incremental snapshots of your volumes.

I will mention there is a notable alternative in MicroCeph, which is a lightweight version of Ceph, a professional-grade distributed storage system for people running their own k8s clusters in production. But for a small setup like a home lab, Longhorn seems to be the more popular option. I haven’t tried MicroCeph, but I have to say that once you get Longhorn set up, it is extremely user friendly.

Installation

You should definitely follow the official docs, but the TL;DR is:

$ helm repo add longhorn https://charts.longhorn.io
$ helm repo update
$ helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.8.1

Check that it started successfully by running this command:

$ kubectl -n longhorn-system get pod
NAME                                                READY   STATUS    RESTARTS   AGE
csi-attacher-5d68b48d9-8hg9z                        1/1     Running   0          27s
csi-attacher-5d68b48d9-fkjnj                        1/1     Running   0          27s
csi-attacher-5d68b48d9-kdgff                        1/1     Running   0          27s
csi-provisioner-6fcc6478db-mbb5c                    1/1     Running   0          27s
csi-provisioner-6fcc6478db-x6nhk                    1/1     Running   0          27s
csi-provisioner-6fcc6478db-z7csq                    1/1     Running   0          27s
csi-resizer-6c558c9fbc-btwcs                        1/1     Running   0          27s
csi-resizer-6c558c9fbc-fgw9g                        1/1     Running   0          27s
csi-resizer-6c558c9fbc-ggpc4                        1/1     Running   0          27s
csi-snapshotter-874b9f887-2n6lx                     1/1     Running   0          27s
csi-snapshotter-874b9f887-gb2rm                     1/1     Running   0          27s
csi-snapshotter-874b9f887-jqhl4                     1/1     Running   0          27s
engine-image-ei-db6c2b6f-8lptt                      1/1     Running   0          84s
engine-image-ei-db6c2b6f-krtkt                      1/1     Running   0          84s
engine-image-ei-db6c2b6f-vrlh7                      1/1     Running   0          84s
instance-manager-1a1bab1ec71bdfa611ed414eec69bb5c   1/1     Running   0          45s
instance-manager-768de505cd7376cc6d5ec438199607b9   1/1     Running   0          54s
instance-manager-77d0fd072369dff7dc6a6198ad62847e   1/1     Running   0          45s
longhorn-csi-plugin-jz2sq                           3/3     Running   0          27s
longhorn-csi-plugin-ldn7c                           3/3     Running   0          27s
longhorn-csi-plugin-q2xfj                           3/3     Running   0          27s
longhorn-driver-deployer-7f95558b85-qcqrb           1/1     Running   0          106s
longhorn-manager-5d2rr                              2/2     Running   0          106s
longhorn-manager-qjgpj                              2/2     Running   0          106s
longhorn-manager-s4667                              2/2     Running   0          106s
longhorn-ui-7ff79dfb4-kjjsm                         1/1     Running   0          106s
longhorn-ui-7ff79dfb4-rkkt9                         1/1     Running   0          106s

Once it’s all up and running, you can check that the k8s storage classes have been created. You should see the Longhorn-specific ones, as below:

$ kubectl get sc
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  145m
longhorn (default)     driver.longhorn.io      Delete          Immediate              true                   2m13s
longhorn-static        driver.longhorn.io      Delete          Immediate              true                   2m11s

Set default storage class

You can see in the above output that the default storage class is still local-path, meaning the local node disk will be used. You can leave it like this and request the longhorn storage class explicitly when defining a PersistentVolumeClaim, or you can set it as the default and stop caring about it, which is what I did:

$ kubectl patch storageclass longhorn -p \
  '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
$ kubectl patch storageclass local-path -p \
  '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'

Replication factor

The storage class called longhorn that was just created during the installation has a replication factor of 3. You can check by running:

$ kubectl get storageclass longhorn -o yaml
...
  numberOfReplicas: "3"
...

Among other output, you will see a numberOfReplicas parameter, whose value is 3.

If you, like me, are running a 3 node cluster and allocate one disk on each node to the Longhorn storage pool, 3 is the maximum useful value. If you have more disks you can of course go higher, but honestly, how many disks are gonna fail at the same time? 3 should be enough.

However, a replication factor of 3 means that you will use 3x as much storage for your data. If you are constrained on disk space, you might want to go lower, so I will show you how to create a storage class with a replication factor of 2. Create a file called longhorn-2-replicas.yml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2-replicas
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  fromBackup: ""
  fsType: "ext4"
  dataLocality: "disabled"
  unmapMarkSnapChainRemoved: "ignored"
  disableRevisionCounter: "true"
  dataEngine: "v1"
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true

And then run:

$ kubectl apply -f longhorn-2-replicas.yml

If you want, you can set it as default by following the process I presented earlier.
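
If you don’t make it the default, you can also select it per claim by naming it in the PVC. A minimal sketch, reusing the claim from the beginning of the article:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb-data
  namespace: vikunja
spec:
  storageClassName: longhorn-2-replicas   # use the 2-replica class for this claim only
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi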

Accessing the UI

For now you can access the interface by creating a local port-forward:

$ kubectl port-forward -n longhorn-system svc/longhorn-frontend 8080:80

Then open http://localhost:8080 in a browser.

In a future article, after we also install MetalLB on the cluster, I will talk about exposing it through an ingress.

Adding disks in the OS

By default, each node already has one disk configured, which is actually a folder Longhorn created on the OS disk. You can see this in the Node section of the UI.

This will work fine, but if you want to use dedicated disks for Longhorn, here’s what you need to do. Note that you will have to do this on each of your nodes. It goes without saying that you will lose any data that already exists on these drives.

For me the extra drive is recognized as nvme0n1, so in the commands below replace it with your own. You can find out the device name by running the lsblk command.

First, we wipe any existing partitions by running:

$ sudo wipefs -a /dev/nvme0n1
$ sudo sgdisk --zap-all /dev/nvme0n1

Afterwards it should look something like this:

$ lsblk /dev/nvme0n1
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
nvme0n1 259:0    0 238.5G  0 disk 

Next, format the disk as ext4:

$ sudo mkfs.ext4 /dev/nvme0n1

Create a mount point:

$ sudo mkdir -p /mnt/longhorn

Add the device to /etc/fstab and mount it:

$ echo '/dev/nvme0n1 /mnt/longhorn ext4 defaults 0 0' | sudo tee -a /etc/fstab
$ sudo mount -a

Check everything went well:

$ lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
...
nvme0n1 259:0    0 238.5G  0 disk /mnt/longhorn
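
As an optional refinement, if you’d rather not rely on the device keeping the nvme0n1 name across reboots, you can reference the filesystem by its UUID in the /etc/fstab entry instead of the device path. A sketch, with <uuid> standing in for whatever blkid prints for your disk:

$ sudo blkid /dev/nvme0n1
$ echo 'UUID=<uuid> /mnt/longhorn ext4 defaults 0 0' | sudo tee -a /etc/fstab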

Adding disks in Longhorn

Open the Longhorn UI and go to the Node section. On the right of each node, in the Operation column, go to Edit node and disks. Click Add disk at the bottom of that form. Fill in the information: the Path is the mount point we created earlier (/mnt/longhorn) and Scheduling should be enabled.

If you, like me, don’t want the OS drive used (the folder it uses is /var/lib/longhorn/), you can remove that disk by clicking Disable on its Scheduling setting and then the Delete icon at the bottom.

Do this on all your nodes.
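
If you prefer the command line over the UI, the same configuration lives in Longhorn’s node custom resource, so you can declare the disk there instead. A rough sketch of the relevant spec.disks entry, assuming current Longhorn field names (the disk name itself is arbitrary):

$ kubectl -n longhorn-system edit nodes.longhorn.io <node-name>

spec:
  disks:
    nvme-longhorn:                 # arbitrary disk name
      path: /mnt/longhorn          # the mount point created earlier
      allowScheduling: true
      storageReserved: 0           # bytes to keep free on this disk
      tags: []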

Solving reported issues

This is completely optional; none of these issues actually prevents your cluster from running just fine. But I don’t like seeing red, so I solved a couple of them.

In the Edit node and disks form there is also a Conditions section where some issues are reported: KernelModulesLoaded, Multipathd and RequiredPackages.

KernelModulesLoaded

Message: ‘Kernel modules [dm_crypt] are not loaded’.

The reason for it is that Longhorn optionally supports volume encryption, and it needs this kernel module in the OS for that. If you don’t plan to use encryption you can ignore it, but if you want to fix it you can run:

$ sudo modprobe dm_crypt
$ echo dm_crypt | sudo tee -a /etc/modules
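
You can verify the module is loaded now (the /etc/modules entry takes care of loading it again after a reboot):

$ lsmod | grep dm_crypt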

Multipathd

Message: multipathd is running with a known issue that affects Longhorn. See description and solution at…

You’re probably not using multipathd, so the solution for this one is to disable it:

$ sudo systemctl disable multipathd --now
$ sudo systemctl disable multipathd.socket --now
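
You can double-check that the service is stopped and won’t come back after a reboot:

$ systemctl is-enabled multipathd
$ systemctl is-active multipathd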

RequiredPackages

Message: Missing packages: [nfs-common].

To fix this one, run:

$ sudo apt install -y nfs-common

Conclusion

You are now ready to schedule workloads with Persistent Volume Claims in your k3s cluster and your data will be replicated across all 3 nodes. In case a drive goes bad, you’re completely fine. You’re also fine if you want to take a node down for maintenance.

Hope this helps, have fun clickity-clacking.
