Homegrown ZFS-based Cloud Backup

I started using ZFS a few years ago, and it’s been nothing short of amazing. However, one feature I wanted to take better advantage of is that zfs send/recv doubles as a very convenient incremental backup system. After some research, I found that very few providers natively support this, and the ones that do didn’t seem to have competitive pricing for home-tier usage (i.e. enterprise-grade redundancy and reliability not required). With a pool of about 7TB and growing, most of the options just ended up being too pricey. After looking at my options, I decided to apply a bit of DIY.

First off, the obvious option: rsync.net. They were the only provider I found that natively supported zfs send/recv as a backup scheme. However, their pricing comes out to $25/mo/TB. As someone who is cheap, and used to paying $6 a month for unlimited backup on Backblaze, I didn’t feel like paying $175 a month.

Amazon Glacier has great pricing, at $4/mo/TB. However, you’d still need something to actually bridge ZFS to Glacier. Most of the pre-canned solutions I found for this had deficiencies compared to having direct access to the receiving ZFS system. For example, if I simply take the output of zfs send -R and put it in the cloud, I would lose the ability to free up space by deleting old snapshots that I no longer need. The lifecycle management of the backups would become the problem. You’d basically be stuck with either one full backup and endless incremental snapshots, trying to do a weird hybrid of differential+incremental backups, or having to occasionally do a new full backup (which, given the size, is not an option).
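To make the lifecycle problem concrete, here is a minimal Python sketch of a bucket that holds nothing but send streams. It models plain incremental streams rather than the exact contents of a zfs send -R stream, and the snapshot names and the `streams_needed` helper are hypothetical, made up for illustration.

```python
# Each stored object is one send stream: (base snapshot, target snapshot).
# base=None marks the initial full stream. All names are hypothetical.
streams = [
    (None, "jan"),
    ("jan", "feb"),
    ("feb", "mar"),
    ("mar", "apr"),
]

def streams_needed(streams, target):
    """Walk back from `target` to the full stream; return the chain in replay order."""
    by_target = {snap: base for base, snap in streams}
    chain = []
    snap = target
    while snap is not None:
        if snap not in by_target:
            raise KeyError(f"no stored stream produces {snap!r}")
        base = by_target[snap]
        chain.append((base, snap))
        snap = base
    return list(reversed(chain))

# Restoring the newest snapshot needs every stream ever uploaded.
chain_for_apr = streams_needed(streams, "apr")
print(chain_for_apr)
```

Dropping the ("jan", "feb") entry from the list makes `streams_needed(streams, "apr")` raise, which is the point: with opaque streams, nothing in the middle of the chain can be deleted to free space.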

What I ended up doing was simply paying for a block storage slab from my usual VPS provider, BuyVM/Frantech. At $5/mo/TB, it comes in slightly more expensive than Glacier, but it checks all the boxes – I can just attach a VPS node to the storage slab, and I’m good to go. I still get one of the big advantages of cloud storage, in that I can grow the filesystem as needed (by purchasing additional slabs), but I get all the native ZFS goodies. If I no longer need an intermediate snapshot, I can just delete it without worrying about whether any incrementals are dependent on it. Do I get all the data integrity guarantees of an enterprise cloud provider? No, but I’ve had them lose my data too. I’m willing to accept the tiny risk that my main copy, local backups, and cloud backup could all be wiped out at the same time.
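For a side-by-side view of the three per-TB prices quoted so far, the arithmetic can be sketched in a few lines of Python. This assumes a flat 7TB pool and counts storage only; the cost of the VPS node itself and any transfer or retrieval fees are left out.

```python
POOL_TB = 7  # approximate pool size from the post

# USD per TB per month, as quoted in the post
PRICE_PER_TB = {
    "rsync.net": 25,
    "Amazon Glacier": 4,
    "BuyVM block storage": 5,
}

def monthly_cost(pool_tb, price_per_tb):
    """Storage-only monthly cost in USD."""
    return pool_tb * price_per_tb

costs = {name: monthly_cost(POOL_TB, price) for name, price in PRICE_PER_TB.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost}/mo")
```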

The biggest challenge with a cloud backup is actually getting the data there. It will take a while, so you need a solution that allows an interrupted transfer to be resumed (which ZFS can do). There’s also the issue of bandwidth caps. Comcast has a 1.2TB cap on combined upload+download – but you get a “courtesy month” once every 12 months where you can use however much you want. At 40 Mbps up, I can upload about 13TB in a single month. If it were any company other than Comcast, I might have felt bad about that. However, this means that I basically have one single chance to get it right.
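The 13TB figure is easy to sanity-check. The Python sketch below assumes a 30-day month, decimal units (1TB = 10^12 bytes), and an uplink that stays saturated the whole time with no protocol overhead, so real numbers will come in somewhat lower.

```python
SECONDS_PER_DAY = 86_400

def upload_capacity_tb(mbps, days):
    """Decimal terabytes that fit through a sustained `mbps` uplink in `days`."""
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY * days / 1e12

def days_to_upload(size_tb, mbps):
    """Days needed to push `size_tb` decimal terabytes at a sustained `mbps`."""
    bytes_per_second = mbps * 1_000_000 / 8
    return size_tb * 1e12 / bytes_per_second / SECONDS_PER_DAY

print(round(upload_capacity_tb(40, 30), 2))  # capacity of one 30-day month
print(round(days_to_upload(7, 40), 1))       # time for the ~7TB pool
```

Under those assumptions a 7TB pool takes a little over 16 days, which leaves some slack within the month for an interrupted and resumed transfer.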

I went with zrepl for managing snapshots and replication. It’s nice in that it will do as little or as much as you want, so you can fit it into an existing backup scheme or use it for everything. After initial setup, it’s very self-sufficient and any problems will usually work themselves out with minimal poking and prodding. Apart from a ZFS version compatibility issue which necessitated an upgrade, it just works.

It even takes care of a common annoyance when doing a direct zfs recv in that it creates placeholder filesystems beforehand. This avoids issues where the new filesystems get every single setting of the original, including mountpoint, resulting in a bunch of stuff getting mounted on your backup receiver. There are ways around it, but this seems like a rather clean solution.

One disadvantage is that there’s no built-in way to limit bandwidth. With a plain send/recv, you can use a program like pv to limit the transfer speed. However, zrepl does not provide a way to hook this in. Thus, you will have to rely on another solution, such as firewall-based throttling.
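For comparison, this is roughly what the pv approach looks like with a plain send/recv. The Python sketch only builds the shell pipeline as a string so it can be inspected before running; the dataset names, snapshot names, host name, and the 4m rate are all hypothetical. It relies on pv's -L (rate limit) option and on zfs recv -s, which keeps partial state so an interrupted receive can be resumed.

```python
import shlex

def throttled_send_command(prev_snap, new_snap, remote_host, remote_dataset, rate):
    """Build a `zfs send | pv | ssh zfs recv` pipeline as a shell string."""
    send = ["zfs", "send", "-i", prev_snap, new_snap]
    limit = ["pv", "-L", rate]  # -L caps throughput; suffixes such as k and m are accepted
    # The remote command is passed to ssh as one quoted argument.
    recv = ["ssh", remote_host, shlex.join(["zfs", "recv", "-s", remote_dataset])]
    return " | ".join(shlex.join(part) for part in (send, limit, recv))

cmd = throttled_send_command(
    "tank/data@monday",   # hypothetical previous snapshot
    "tank/data@tuesday",  # hypothetical new snapshot
    "backup-host",        # hypothetical receiver
    "backup/data",        # hypothetical target dataset
    "4m",                 # hypothetical cap, in bytes per second
)
print(cmd)
```

None of this applies once zrepl owns the transfer, since it runs the send and receive itself; the sketch is only meant to show what the manual pipeline would look like.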

Overall, it’s a nice backup solution. If you’re already using ZFS, I can definitely recommend this setup.
