Backup and recovery of Longhorn – Kubernetes Cluster
With the release of Longhorn v1.3, CloudCasa is happy to announce that it fully supports the backup and recovery of Longhorn persistent volumes (PVs) on Kubernetes clusters. While previous versions of Longhorn supported volume snapshots and the CSI interface, Longhorn v1.3 introduced full support for the CSI snapshot interface so it can now be used to trigger volume snapshots in a cluster.
CloudCasa makes use of this welcome new functionality for backup and recovery of clusters with Longhorn PVs, using either local snapshots or snapshots plus copies to remote storage. Backups can easily be managed across many clusters, and restores can be performed to the same cluster, across clusters, or even across cloud accounts, regions, and cloud providers.
What is Longhorn?
A Persistent Volume (PV) is a storage resource created and managed by the Kubernetes API that can exist beyond the lifetime of an individual pod. PVs are often used in conjunction with the aptly named StatefulSets. In AKS, for example, you can use Azure Disks or Azure Files to provide the Persistent Volume. The choice of Disks or Files is often determined by the need for concurrent access to the data from multiple nodes or pods (lean towards Files) or the need for a performance tier (lean towards Disks).
Longhorn can seamlessly convert a large block of storage into thousands of volumes distributed as PVs – effectively delivering storage as a microservice. Replicas, snapshots, and backups are core features that Longhorn has long supported.
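As a minimal sketch of how a workload would request one of these volumes, the claim below assumes the StorageClass created by a default Longhorn install is named `longhorn`. The claim name and size are hypothetical, and the namespace is the one used in the walkthrough later in this post.

```yaml
# Hypothetical PersistentVolumeClaim for a Longhorn-backed volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data               # hypothetical name
  namespace: longhornworkload   # namespace used later in this post
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn    # assumption: default Longhorn StorageClass name
  resources:
    requests:
      storage: 2Gi              # hypothetical size
```

Longhorn then provisions a PV for the claim and places replicas of the volume on the cluster's nodes.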
Longhorn v1.3 now supports two types of data protection:
- Snapshots: Snapshots are stored locally, as a part of each replica of a volume. They are stored on the disk of the nodes within the Kubernetes cluster.
- Backups: Backups are objects stored in the backup store (BackupStore), which is an NFS or S3-compatible object store external to the Kubernetes cluster.
What is Container Storage Interface (CSI)?
Apart from provisioning PVs, a key functionality in CSI drivers is to enable snapshots and recoveries. Both Azure Managed Disks and Azure Files support snapshots of Persistent Volumes. However, not all CSI drivers are created equal. While they all support provisioning, several cloud and storage vendors do not support CSI for some of the more advanced functionality, such as resizing, recovery, and snapshots. Microsoft itself supports CSI drivers more extensively for Azure Managed Disks when compared to Azure Files.
Longhorn has supported CSI snapshots since 2020. However, until the release of Longhorn v1.3 there was an important intricacy to be aware of. As part of its Kubernetes VolumeSnapshot implementation, the Longhorn CSI driver only supported volume backups to a target outside of the cluster, even though the Longhorn UI allowed both volume backups and snapshots to be executed. In other words, for CSI Snapshotter requests, the Longhorn CSI driver invoked its backup workflow for the volume. A further complication was that a user first had to manually configure an out-of-cluster BackupStore before this behavior could be invoked.
What changed in Longhorn v1.3 CSI Snapshots?
As described in this Longhorn GitHub issue, the new behavior in Longhorn v1.3 allows in-cluster snapshots to be created through the CSI API. Longhorn v1.3 introduced a type parameter that allows you to request either a backup or a snapshot when a CSI snapshot is triggered, as shown below.
# To request a backup
# To request a snapshot
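A minimal sketch of the two requests follows. It assumes the `type` parameter is set on a VolumeSnapshotClass and takes the values `bak` and `snap`, as we understand the Longhorn v1.3 documentation. The class names are hypothetical, and the `apiVersion` should match the snapshot CRDs installed in your cluster (older installs may use `v1beta1`).

```yaml
# To request a backup (copied to the external BackupStore)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-backup-vsc     # hypothetical name
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak                     # assumption: value per the Longhorn v1.3 docs
---
# To request a snapshot (kept in-cluster with the volume replicas)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot-vsc   # hypothetical name
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: snap                    # assumption: value per the Longhorn v1.3 docs
```

A VolumeSnapshot that references the second class would then produce an in-cluster snapshot, with no BackupStore required.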
This change in CSI snapshot behavior is important for the following reasons:
- Consistency: As discussed above, the Longhorn UI already supported both backup and snapshot options. Most of the popular persistent storage systems behave similarly to the new implementation.
- Ease of Use: Previously, users had to configure an out-of-cluster BackupStore, even for taking a snapshot. Now there is no such need, because snapshots are stored in-cluster.
- Compatibility: The most common side effect of the old behavior was that it broke third-party backup solutions that use snapshots. In CloudCasa, for example, PV backups often failed with a time-out, since snapshots weren’t expected to run for hours, and the jobs would often end up as partially successful.
Now purpose-built backup solutions can manage the snapshots and backup copies, setting retention policies for compliance and immutability to protect against ransomware, tampering, or accidental deletion.
Community Participation with Longhorn Team
At the KubeCon NA 2021 conference in LA last October, we engaged with the Longhorn team about the need to support in-cluster CSI snapshots. We were happy to provide some external testing, which was well received by the Longhorn engineering team. Using CloudCasa, we verified this functionality in the master branch as well as in recent Longhorn release candidates. We at Catalogic congratulate the Longhorn maintainers on the release of v1.3, and we thank the community for the work that went into it.
What is CloudCasa?
CloudCasa by Catalogic is a cyber-resilient backup service to protect Kubernetes workloads. CloudCasa integrates natively with all flavors of Kubernetes as well as Kubernetes management platforms like SUSE Rancher, and managed Kubernetes services such as AKS and Amazon EKS. CloudCasa relies on CSI-compliant storage platforms like Longhorn to take and manage snapshots to back up and restore Kubernetes Persistent Volumes (PVs) from recovery points.
Premium service plans provide PV backups along with cluster and cloud metadata backups, to ensure your data is safe and protected with unlimited retention times, and immutable recovery points. The saving of resource data and metadata enables advanced migration and recovery use cases, allowing organizations to easily restore data across clusters, regions, cloud accounts and cloud providers. This is important for disaster recovery scenarios, for cluster migration, and for providing replicas for Dev/Test environments.
The free service plan for CloudCasa has no limits on the number of snapshots managed, worker nodes, or clusters supported, and it provides up to 30 days of local PV snapshot retention and cluster resource data backups on secure, encrypted S3 storage. The premium plans are priced based on the amount of data you protect, not on the number of clusters you have or the number of worker nodes running.
How does CloudCasa back up Longhorn PVs?
Let’s start with a Rancher cluster already configured with Longhorn v1.3. In a previous blog we covered the process for installing the CloudCasa agent from the SUSE Rancher Apps & Marketplace. The Helm chart for CloudCasa orchestrates installation of the CloudCasa backup agent containers on Rancher managed clusters and connects to the CloudCasa data protection service to register the clusters.
The screenshot below shows a registered cluster, “Longhorn1.3Cluster”, which has a namespace “longhornworkload” with a PV provisioned by Longhorn v1.3.
Next, we add a backup job through the CloudCasa UI, where you can choose to back up either the full cluster or specific namespaces and/or resources tagged with specific labels. In the screenshots below, we added a backup job and selected Full Cluster and all PVs attached to it. To demonstrate the new CSI snapshot process, we selected “Snapshot only” for the job.
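For readers curious what a snapshot-only request looks like at the Kubernetes level, here is a generic sketch of a CSI VolumeSnapshot object. It is not taken from CloudCasa itself, and every name in it (the snapshot, the snapshot class, and the claim) is hypothetical. It assumes a VolumeSnapshotClass configured for in-cluster Longhorn snapshots already exists.

```yaml
# Generic CSI snapshot request; not CloudCasa's actual object.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: demo-data-snap                              # hypothetical name
  namespace: longhornworkload
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc    # hypothetical class for in-cluster snapshots
  source:
    persistentVolumeClaimName: demo-data            # hypothetical PVC to snapshot
```

The snapshot is usable for restores once its `status.readyToUse` field reports `true`.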
The activity details of the backup job can be viewed in real time, and the PV details can be viewed once the job has completed.
Now that we have a successful backup, we can delete the namespace and restore it to the same cluster to show a selective restore of a Longhorn v1.3 PV snapshot that is stored in-cluster. The restore job completes in almost the same time it took us to back up.
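At the Kubernetes level, restoring from a CSI snapshot generally means creating a new claim whose `dataSource` points at the snapshot. The sketch below shows that generic mechanism only; it is not a description of CloudCasa's internal restore steps, and the claim name, snapshot name, size, and StorageClass name are all assumptions.

```yaml
# Generic restore of a PVC from a CSI VolumeSnapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data-restored      # hypothetical name
  namespace: longhornworkload
spec:
  storageClassName: longhorn    # assumption: default Longhorn StorageClass name
  accessModes:
    - ReadWriteOnce
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: demo-data-snap        # hypothetical snapshot in the same namespace
  resources:
    requests:
      storage: 2Gi              # must be at least the size of the snapshotted volume
```

Because the snapshot lives in-cluster, the driver can create the new volume from local replica data rather than pulling it from remote storage.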
In Summary and Your Next Steps
With CloudCasa, you can now easily use Longhorn v1.3 snapshots as your data protection method. You can set up backups of Longhorn data to any S3 storage, either self-managed or managed by CloudCasa. These snapshots can also be restored to alternate clusters or cloud providers, as well as mapped to different storage classes via the advanced cluster restore capabilities of CloudCasa. But that process didn’t change with this new version of Longhorn, so we will leave that for another day and another blog.
Until then, feel free to create a free account at cloudcasa.io/signup and start taking and managing snapshots to protect your clusters. We are confident that you’ll be done before your next coffee run.