A quick-start guide to OpenZFS native encryption

23 Giugno 2021 News

Enlarge / On-disk encryption is a complex topic, but this article should give you a solid handle on OpenZFS’ implementation.

One of the many features OpenZFS brings to the table is ZFS native encryption. First introduced in OpenZFS 0.8, native encryption allows a system administrator to transparently encrypt data at rest within ZFS itself. This obviates the need for separate tools like LUKS, VeraCrypt, or BitLocker.

OpenZFS’ encryption algorithm defaults to either aes-256-ccm (prior to 0.8.4) or aes-256-gcm (>= 0.8.4) when encryption=on is set. But it may also be specified directly. Currently supported algorithms are:

aes-128-ccm
aes-192-ccm
aes-256-ccm (default in OpenZFS < 0.8.4)
aes-128-gcm
aes-192-gcm
aes-256-gcm (default in OpenZFS >= 0.8.4)

There’s more to OpenZFS native encryption than the algorithms used, though—so we’ll try to give you a brief but solid grounding in the sysadmin’s-eye perspective on the “why” and “what” as well as the simple “how.”

Why (or why not) OpenZFS native encryption?

A clever sysadmin who wants to provide at-rest encryption doesn’t actually need OpenZFS native encryption, obviously. As mentioned in the introduction, LUKS, VeraCrypt, and many other schemes are available and can be layered either beneath or atop OpenZFS itself.

First, the “why not”

Putting something like Linux’s LUKS underneath OpenZFS has an advantage—with the entire disk encrypted, an enterprising attacker can no longer see the names, sizes, or properties of ZFS datasets and zvols without access to the key. In fact, the attacker can’t necessarily see that ZFS is in use at all!

But there are significant disadvantages to putting LUKS (or similar) beneath OpenZFS. One of the gnarliest is that each individual disk that will be part of the pool must be encrypted, with each volume loaded and decrypted prior to the ZFS pool import stage. This can be a noticeable challenge for ZFS systems with many disks—in some cases, many tens of disks. Another problem with encryption-beneath-ZFS is that the extra layer is an extra thing to go wrong—and it’s in a position to undo all of ZFS’ normal integrity guarantees.

Putting LUKS or similar atop OpenZFS gets rid of the aforementioned problems—a LUKS encrypted zvol only needs one key regardless of how many disks are involved, and the LUKS layer cannot undo OpenZFS’ integrity guarantees from here. Unfortunately, encryption-atop-ZFS introduces a new problem—it effectively nerfs OpenZFS inline compression, since encrypted data is generally incompressible. This approach also requires the use of one zvol per encrypted filesystem, along with a guest filesystem (e.g., ext4) to format the LUKS volume itself with.

Now, the “why”

OpenZFS native encryption splits the difference: it operates atop the normal ZFS storage layers and therefore doesn’t nerf ZFS’ own integrity guarantees. But it also doesn’t interfere with ZFS compression—data is compressed prior to being saved to an encrypted dataset or zvol.

There’s an even more compelling reason to choose OpenZFS native encryption, though—something called “raw send.” ZFS replication is ridiculously fast and efficient—frequently several orders of magnitude faster than filesystem-neutral tools like rsync—and raw send makes it possible not only to replicate encrypted datasets and zvols, but to do so without exposing the key to the remote system.

This means that you can use ZFS replication to back up your data to an untrusted location, without concerns about your private data being read. With raw send, your data is replicated without ever being decrypted—and without the backup target ever being able to decrypt it at all. This means you can replicate your offsite backups to a friend’s house or at a commercial service like rsync.net or zfs.rent without compromising your privacy, even if the service (or friend) is itself compromised.

In the event that you need to recover your offsite backup, you can simply replicate it back to your own location—then, and only then, loading the decryption key to actually access the data. This works for either full replication (moving every single block across the wire) or asynchronous incremental replication (beginning from a commonly held snapshot and only moving the blocks that have changed since that snapshot).

What’s encrypted—and what isn’t?

OpenZFS native encryption isn’t a full-disk encryption scheme—it’s enabled or disabled on a per-dataset/per-zvol basis, and it cannot be turned on for entire pools as a whole. The contents of encrypted datasets or zvols are protected from at-rest spying—but the metadata describing the datasets/zvols themselves is not.

Let’s say we create an encrypted dataset named pool/encrypted, and beneath it we create several more child datasets. The encryption property for the children is inherited by default from the parent dataset, so we can see the following:

root@banshee:~# zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase banshee/encrypted
Enter passphrase: Re-enter passphrase: root@banshee:~# zfs create banshee/encrypted/child1
root@banshee:~# zfs create banshee/encrypted/child2
root@banshee:~# zfs create banshee/encrypted/child3 root@banshee:~# zfs list -r banshee/encrypted
NAME USED AVAIL REFER MOUNTPOINT
banshee/encrypted 1.58M 848G 432K /banshee/encrypted
banshee/encrypted/child1 320K 848G 320K /banshee/encrypted/child1
banshee/encrypted/child2 320K 848G 320K /banshee/encrypted/child2
banshee/encrypted/child3 320K 848G 320K /banshee/encrypted/child3 root@banshee:~# zfs get encryption banshee/encrypted/child1
NAME PROPERTY VALUE SOURCE
banshee/encrypted/child1 encryption aes-256-gcm -

At the moment, our encrypted datasets are all mounted. But even if we unmount them and unload the encryption key—making them inaccessible—we can still see that they exist, along with their properties:

root@banshee:~# wget -qO /banshee/encrypted/child2/HuckFinn.txt http://textfiles.com/etext/AUTHORS/TWAIN/huck_finn root@banshee:~# zfs unmount banshee/encrypted
root@banshee:~# zfs unload-key -r banshee/encrypted
1 / 1 key(s) successfully unloaded root@banshee:~# zfs mount banshee/encrypted
cannot mount 'banshee/encrypted': encryption key not loaded root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory root@banshee:~# zfs list -r banshee/encrypted
NAME USED AVAIL REFER MOUNTPOINT
banshee/encrypted 2.19M 848G 432K /banshee/encrypted
banshee/encrypted/child1 320K 848G 320K /banshee/encrypted/child1
banshee/encrypted/child2 944K 848G 720K /banshee/encrypted/child2
banshee/encrypted/child3 320K 848G 320K /banshee/encrypted/child3

As we can see above, after unloading the encryption key, we can no longer see our freshly downloaded copy of Huckleberry Finn in /banshee/encrypted/child2/. What we can still see is the existence—and structure—of our entire ZFS-encrypted tree. We can also see each encrypted dataset’s properties, including but not limited to the USED, AVAIL, and REFER of each dataset.

It’s worth noting that trying to ls an encrypted dataset that doesn’t have its key loaded won’t necessarily produce an error:

root@banshee:~# zfs get keystatus banshee/encrypted
NAME PROPERTY VALUE SOURCE
banshee/encrypted keystatus unavailable -
root@banshee:~# ls /banshee/encrypted
root@banshee:~#

This is because a naked directory exists on the host, even when the actual dataset is not mounted. Reloading the key doesn’t automatically remount the dataset, either:

root@banshee:~# zfs load-key -r banshee/encrypted
Enter passphrase for 'banshee/encrypted': 1 / 1 key(s) successfully loaded
root@banshee:~# zfs mount | grep encr
root@banshee:~# ls /banshee/encrypted
root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory

In order to access our fresh copy of Huckleberry Finn, we’ll also need to actually mount the freshly key-reloaded datasets:

root@banshee:~# zfs get keystatus banshee/encrypted/child2
NAME PROPERTY VALUE SOURCE
banshee/encrypted/child2 keystatus available - root@banshee:~# ls -l /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory root@banshee:~# zfs mount -a
root@banshee:~# ls -lh /banshee/encrypted/child2
total 401K
-rw-r--r-- 1 root root 554K Jun 13 2002 HuckFinn.txt

Now that we’ve both loaded the necessary key and mounted the datasets, we can see our encrypted data again.

https://arstechnica.com/?p=1775690