CloudCasa and Data Movement
When we began developing CloudCasa, a Software as a Service (SaaS) platform for protecting Kubernetes applications, we looked at the data protection landscape and focused on areas where we could improve on what existed and give back to the user community. We wanted to give users a quick and efficient way to start protecting this infrastructure with minimal effort, minimal overhead, and, most importantly, minimal cost.
With CloudCasa, users can register a cluster or scan in a cloud account and immediately begin snapshotting stateful workloads for free, simply by signing up for a CloudCasa user account. Users also have the option of backing up and moving that data to our object storage repositories (or object storage of their choosing) as part of CloudCasa Pro plans, which are based on the amount of data you are backing up.
Velero and the Choice Between Restic vs Kopia
Velero is a popular backup and recovery tool for Kubernetes infrastructure, and with the proper know-how, users can perform Kubernetes backup and recovery using command line operations. Velero focuses on orchestrating Kubernetes interactions and relies on external open source tools to move data between Kubernetes clusters and the object storage repository. Restic has been the tool most widely used in Velero deployments for this transfer of data. However, another open source tool, Kopia, whose libraries CloudCasa has integrated into its data movement framework, presents a strong alternative. In fact, in a Restic vs Kopia comparison, Kopia is strong enough that the Velero community is considering deprecating Restic as the tool of choice for Kubernetes persistent volume data transfers. In this blog we share the analysis we did within CloudCasa to support the choice we made between Restic and Kopia.
Data Movement Performance
From a performance perspective, Kopia is the clear winner when it comes to moving persistent volume data out of your Kubernetes cluster. Velero and CloudCasa both support backups to S3-compatible object storage, and these backup storage locations are often in the cloud. Hence, it is critical that the persistent data from your Kubernetes clusters is moved as quickly and efficiently as possible to an object storage destination of your choice.
While deciding which data movement library CloudCasa would adopt, Restic or Kopia, our developers noted that the main advantage Kopia has over Restic is parallelism, i.e. the ability to send multiple streams concurrently to the object storage destination. With minor tweaking, we were able to sustain a 4x increase in performance in our lab environment when backing up to an Amazon S3 bucket.
Test Environment
Our environment was configured as follows:
Amazon EC2
- m5.xlarge
- 4x 3.1 GHz Xeon
- “up to” 10 Gbps network
- “up to” 4.75 Gbps EBS
Amazon EBS
- 12 TB of General Purpose SSD
- 2 TB extra is for incrementals
Amazon S3
- 12 TB of S3 Standard storage
Test Results
We used aws s3 cp as a benchmark to compare data transfer throughput between Restic vs Kopia.
In our initial test it seemed that Restic would win hands down, more than doubling the performance of Kopia in a like-for-like test. Restic finished a 1 TB base backup in 4 hours, 6 minutes, and 11 seconds (4:06:11) at a rate of 68 MB/s, about 50% of the throughput one could achieve through aws s3 cp. The same test run with Kopia took more than twice as long, 10 hours, 33 minutes, and 51 seconds (10:33:51), at a paltry rate of 26 MB/s, only 19% of the throughput that can be achieved with aws s3 cp.
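The reported rates can be cross-checked from the elapsed times. The snippet below is a minimal sketch that assumes a decimal terabyte (10^12 bytes) and decimal megabytes; the function names are ours, chosen for illustration.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def throughput_mb_per_s(payload_bytes: int, elapsed_hms: str) -> float:
    """Average throughput in MB/s (1 MB = 1,000,000 bytes)."""
    return payload_bytes / hms_to_seconds(elapsed_hms) / 1_000_000


ONE_TB = 10**12  # assumption: the 1 TB payload is a decimal terabyte

for tool, elapsed in (("Restic", "4:06:11"), ("Kopia", "10:33:51")):
    print(f"{tool}: {throughput_mb_per_s(ONE_TB, elapsed):.1f} MB/s")
```

Under those assumptions the results round to the 68 MB/s and 26 MB/s quoted above.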
Note that aws s3 cp uses a multipart upload. It will fragment one large file into multiple segments and upload them in parallel. Kopia does not have this feature and relies on uploading multiple files in parallel to achieve its concurrency.
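To make the idea of a multipart upload concrete, here is a rough sketch of splitting one large object into byte ranges and sending the ranges concurrently. It is not the AWS CLI's implementation: `upload_part` is a hypothetical stand-in that only reports how many bytes it would send, and the part size and worker count are arbitrary illustrative values.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_parts(total_size: int, part_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover total_size bytes in order."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    return [
        (offset, min(part_size, total_size - offset))
        for offset in range(0, total_size, part_size)
    ]


def upload_part(part: tuple[int, int]) -> int:
    """Hypothetical stand-in for a real part upload; returns bytes 'sent'."""
    _offset, length = part
    return length


def multipart_upload(total_size: int, part_size: int, max_workers: int = 4) -> int:
    """Send every part of a single file through a pool of worker threads."""
    parts = plan_parts(total_size, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(upload_part, parts))


if __name__ == "__main__":
    # A 10-byte "file" in 4-byte parts: two full parts and one short tail.
    print(plan_parts(10, 4))
    print(multipart_upload(10, 4))
```

The point of the sketch is that the unit of parallelism here is a part of a file, so even a single large file can keep several streams busy.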
So game over, right? Restic wins hands down and we can all call it a day? Not quite, because as the saying goes, the “devil is always in the details.”
In this case Kopia cannot use parallelism to improve performance by sending multiple streams to the storage destination. By default, Kopia uses 4 parallel threads when sending data. However, in this test the 1 TB of backup data was contained in a single file, and only one thread can be created per file.
For a single-threaded process like Restic, backing up one large file is the favorable case. A framework like Kopia, by contrast, cannot take advantage of its multithreaded capabilities there.
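The effect of one thread per file can be illustrated with a toy scheduling model. This is a sketch under strong assumptions, not a reproduction of our measurements: every stream is given the same fixed rate (the 25 MB/s used below is a made-up figure), files are never split, and network or storage ceilings are ignored.

```python
import heapq


def estimated_wall_time(file_sizes_mb, threads, per_stream_mb_s):
    """Estimate seconds to send all files when each file occupies one thread.

    Each file is handed to whichever thread becomes free first and is
    never split across threads.
    """
    free_at = [0.0] * threads  # time at which each thread becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for size_mb in file_sizes_mb:
        start = heapq.heappop(free_at)
        end = start + size_mb / per_stream_mb_s
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


RATE = 25.0  # hypothetical per-stream rate in MB/s, for illustration only

single_file = [1_000_000]      # 1 TB in one file
many_files = [10_000] * 100    # 1 TB spread over 100 files

for threads in (1, 4, 16):
    print(
        threads,
        estimated_wall_time(single_file, threads, RATE),
        estimated_wall_time(many_files, threads, RATE),
    )
```

In this model the single-file time does not change no matter how many threads are configured, while the many-file time shrinks as threads are added, because the usable concurrency is capped by the number of files.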
Kopia Wins on Real-Life Workloads
Real persistent volumes, however, consist of many files, and this is where Kopia can really begin to shine.
CloudCasa performs a Kubernetes backup of persistent volumes by taking a base backup and then backing up only those files that have changed since the last backup.
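As a rough illustration of "base backup, then only changed files", the sketch below compares a manifest from the previous backup with the current state of a directory. How CloudCasa or Kopia actually detects changes is not described here; using file size and modification time as the change signal, and the names `scan` and `files_to_back_up`, are assumptions made for the example.

```python
import os


def scan(root: str) -> dict[str, tuple[int, int]]:
    """Map each file's path (relative to root) to (size, mtime in ns)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            manifest[os.path.relpath(path, root)] = (info.st_size, info.st_mtime_ns)
    return manifest


def files_to_back_up(previous: dict, current: dict) -> list[str]:
    """Return files that are new or whose metadata differs from last time.

    With an empty previous manifest, every file is returned, which
    corresponds to the base backup.
    """
    return sorted(
        path for path, meta in current.items() if previous.get(path) != meta
    )


if __name__ == "__main__":
    previous = {"db/data.0": (100, 10), "db/wal.log": (20, 20)}
    current = {"db/data.0": (100, 10), "db/wal.log": (35, 30), "db/data.1": (50, 50)}
    print(files_to_back_up({}, current))        # base backup: all files
    print(files_to_back_up(previous, current))  # incremental: changed and new
```

In practice `scan` would be run against the volume's mount point before each backup, and the resulting manifest stored alongside the backup for the next comparison.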
This backup may consist of thousands of files of varying sizes, but since Kopia is multithreaded, it is not restricted to sending one file at a time. We can use this multithreaded operation to send files in parallel, and this is where the difference really shows. Kopia’s data transfer throughput surpassed that of Restic by 4x when the test data was spread across many files. The 1 TB test, performed with a total of 16 parallel threads and the 1 TB payload divided across 100 files, ran in 1 hour, 15 minutes, and 44 seconds (1:15:44) at 220 MB/s, roughly 1.6x the throughput of an aws s3 cp operation. The gain becomes even more pronounced when the number of parallel threads is increased from the default of 4 to 16, a setting that can be adjusted dynamically within CloudCasa to reach an optimal level of performance.
Conclusion
So what have we learned? In Kubernetes data movement, the two most popular libraries for transporting data are clearly Restic and Kopia. While each has its own pros and cons, we decided to implement Kopia as the data movement framework in CloudCasa so that Kubernetes backup performance scales through Kopia’s multithreaded design when sending datasets with a large number of files.
CloudCasa announced a new service called “CloudCasa for Velero” in April 2023, which offers importing configuration from existing Velero installs and centralized management through a SaaS platform. CloudCasa for Velero also offers advanced, guided recovery to on-premises and cloud environments hosted in EKS, AKS, and GKE, the Big 3 cloud Kubernetes engines. Sign up for the free service plan and give it a try!