CloudCasa by Catalogic and SUSE have partnered to deliver a new disaster recovery solution for Kubernetes and SUSE Virtualization environments. This joint approach combines CloudCasa’s Kubernetes-native backup and recovery platform with SUSE Storage (a.k.a. Longhorn) and its Disaster Recovery Volumes feature. This provides a resilient, storage-driven DR strategy that reduces cost, complexity, and recovery times.
The solution separates the backup and restore of Kubernetes resources, which is managed by CloudCasa, from the replication of persistent volumes, which is handled by the storage layer. By leveraging Longhorn’s DR replication, the combined solution provides low-RTO disaster recovery that is simpler and more cost-effective than the alternatives.
The Challenges: Overcoming Traditional DR Pain Points
Imagine your team is running Kubernetes and SUSE Virtualization clusters, and you’re struggling to find a disaster recovery solution that meets your needs for low RPO and RTO. You’re likely dealing with traditional solutions that are not only prohibitively expensive and complex but also limit flexibility and create vendor lock-in. For large hybrid and edge environments, constraints on bandwidth, cost, and supportability make it even more difficult to find an appropriate solution, causing a constant sense of anxiety.
The Solution: CloudCasa and SUSE Storage
The CloudCasa and SUSE Longhorn solution eliminates much of the cost and complexity by aligning application recovery with storage-based volume replication. CloudCasa protects Kubernetes application configurations, metadata, and associated Kubernetes resources. Longhorn maintains disaster recovery volumes at the destination cluster, which are periodically updated from the source cluster. In the event of a DR failover, CloudCasa restores the application and automatically maps and activates the corresponding Longhorn DR volumes. This ensures that persistent volumes are instantly available to the application without manual intervention. The result is a streamlined, repeatable workflow that minimizes downtime, reduces the amount of data that must be transferred during recovery, and removes the need to duplicate replication functionality.
Understanding RPO and RTO
- A recovery point objective (RPO) is the amount of data loss that can be tolerated in the event of a failure, measured in time. For example, if you are willing to lose at most the last 30 minutes’ worth of activity in a given system in the event of a disaster, the RPO for that system would be 30 minutes.
- A recovery time objective (RTO) is the maximum amount of downtime that can be tolerated for an application. For example, if an application must be back online within 60 minutes after a failure, the RTO would be 60 minutes.
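To make the two definitions concrete, here is a minimal Python sketch that measures a single incident against both objectives. The `Incident` class, the `meets_objectives` function, and the timestamps are hypothetical illustrations, not part of CloudCasa or Longhorn; the sketch only applies the definitions above to three points in time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical record of one failure, used only for illustration."""
    last_replicated: datetime  # most recent point in time safely copied to the DR site
    failure: datetime          # moment the source cluster was lost
    restored: datetime         # moment the application was serving again


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> dict:
    # Data loss is everything written after the last replicated point.
    data_loss = incident.failure - incident.last_replicated
    # Downtime runs from the failure until the application is back.
    downtime = incident.restored - incident.failure
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


incident = Incident(
    last_replicated=datetime(2025, 1, 1, 9, 40),
    failure=datetime(2025, 1, 1, 10, 0),
    restored=datetime(2025, 1, 1, 10, 45),
)
result = meets_objectives(incident, rpo=timedelta(minutes=30), rto=timedelta(minutes=60))
print(result["rpo_met"], result["rto_met"])  # True True
```

In this made-up incident, 20 minutes of data are lost and the application is down for 45 minutes, so both the 30-minute RPO and the 60-minute RTO from the examples above are met.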
How It Works: The Seamless Workflow
- Preparation: Longhorn DR volumes are configured to ensure volume replication between your source and destination clusters. The replication frequency should be based on your RPO.
- Protection: CloudCasa runs periodic application-level backups, capturing Kubernetes resources and metadata, while relying on Longhorn to handle persistent volume replication. The frequency of CloudCasa backups should also be based on your RPO. Persistent volume (PV) data can be excluded from CloudCasa backups or, preferably, backed up less frequently to protect against logical failures.
- Disaster Recovery: In a disaster recovery scenario, CloudCasa restores the application resource data, referencing the existing Longhorn DR volumes. The DR volumes are automatically activated and made available as persistent volumes on the destination cluster as part of the restore. Because the amount of data restored by CloudCasa is very small, the recovery process can be completed and the applications made available within minutes.
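The advice to base replication and backup frequency on your RPO can be sketched as simple arithmetic. The Python below is a rough illustration under stated assumptions, not a CloudCasa or Longhorn API: `pick_interval`, `to_cron`, and the 5-minute transfer time are hypothetical, and the worst case is assumed to be one full interval plus the time a copy takes to reach the destination.

```python
from datetime import timedelta

# Intervals (in minutes) that divide an hour evenly, so "*/N" in the cron
# minute field fires at a constant spacing.
CRON_FRIENDLY_MINUTES = (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60)


def pick_interval(rpo: timedelta, transfer_time: timedelta) -> int:
    """Return the longest interval (minutes) whose worst case stays within the RPO.

    Assumed worst case: a failure strikes just before the next copy finishes,
    so the data at risk spans one full interval plus the transfer time.
    """
    budget = (rpo - transfer_time).total_seconds() / 60
    candidates = [m for m in CRON_FRIENDLY_MINUTES if m <= budget]
    if not candidates:
        raise ValueError("RPO is too tight for the assumed transfer time")
    return max(candidates)


def to_cron(minutes: int) -> str:
    # Standard five-field cron: minute, hour, day of month, month, day of week.
    return "0 * * * *" if minutes == 60 else f"*/{minutes} * * * *"


interval = pick_interval(rpo=timedelta(minutes=30), transfer_time=timedelta(minutes=5))
print(interval, to_cron(interval))  # 20 */20 * * * *
```

With a 30-minute RPO and an assumed 5-minute transfer, a 20-minute schedule leaves some headroom, whereas a 30-minute schedule could exceed the objective in the worst case. Real transfer times vary with change rate and bandwidth, so the margin should be measured rather than guessed.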