Last Processed / June 4, 2026 7 min read

ZFS Housekeeping

Hello! It’s been a little while since I did anything ZFS-related.

This past Memorial Day weekend I decided to do some office reorganizing, a daunting task I’ve been putting off because I needed to shut down the whole homelab for an undetermined amount of time. This includes my desktop workstation, NAS, backup servers, their storage pools, and UPS. Not something I take lightly. But I’m happy to announce that I finally did it! And it went without a hitch! Well, mostly.

Uh oh

One of two issues I encountered came when I went to replace my UPS battery. Since the UPS powers all of the above machines, I had planned to handle this while I rearranged everything. Except, much to my dismay, after unplugging everything and opening up both the UPS and the packaging for the replacement battery, it was obvious I had bought the wrong one, even though it was listed as a replacement for my model of UPS. That’s what I get for not doing my due diligence, and for buying one off eBay. So that will have to wait for another day.

The other thing that didn’t go as planned was my intention to move one of my backup zpools to another machine. After building my NAS earlier this year to upgrade from the Intel NUC I was using, I left the main zpool on its old Proxmox host and made it a replication target for the NAS. I figured I’d keep the zpool running until the day finally came to move things around.

I had tried once before to move it to its intended new destination, a 2018 Mac Mini running Debian with the T2 Linux kernel as a dedicated backup server. With two pools attached via Thunderbolt (not ideal with ZFS, hence the NAS), it was already handling backups for said NAS and the daily LXC container backups from Proxmox. I figured I could simply export the zpool, plug it in, import it, and keep things rolling. In practice, I didn’t get very far. With data actively moving around, the server didn’t like it when I plugged in a new pool, and ZFS threw checksum errors on my pools, which resulted in scrubs that luckily fixed the issue without loss of data.

Having learned my lesson, this time I took care to shut down the server first. Unfortunately, issues persisted. My best guess, after looking for anything resembling a clue in dmesg or journalctl, was that the bus for Thunderbolt management simply couldn’t handle three zpools, and it kept falling over. I visualize a sort of digital traffic jam.

So I moved it back to where it started, my Arch Linux desktop. All things considered, this seems to be an acceptable option. OpenZFS has been keeping up to date with the kernel pretty well lately, and I have an LTS kernel installed. The only real difference is that the NAS is the obvious source of truth now. I know it’s all a little disparate, but I’m working with what I’ve got. Shit’s expensive nowadays and this works.

I still had an issue with the backup server falling over with only two pools connected. This wasn’t happening before. After some more digging, I ended up disabling ASPM (Active State Power Management) in GRUB. It’s a power management feature that puts PCIe links into a low-power state, which ZFS did not seem to like. I haven’t had an issue since disabling it. I’m hesitant to try putting the desktop zpool on it again, especially since I have everything configured so nicely now.

The ASPM settings were changed with the following:

sudo vim /etc/default/grub

# then edit
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"

sudo update-grub
sudo reboot
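
After the reboot it is worth confirming that the parameter actually reached the kernel. The following is a minimal sketch rather than anything from my original setup: the function name aspm_off_in_cmdline is made up for illustration, and it only inspects the kernel command line, not the state of individual PCIe links.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line
# contains the pcie_aspm=off parameter as a whole word.
aspm_off_in_cmdline() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the currently running kernel.
if aspm_off_in_cmdline "$(cat /proc/cmdline)"; then
    echo "pcie_aspm=off is active"
else
    echo "pcie_aspm=off is NOT on the kernel command line"
fi
```

If the second message shows up, the usual suspects are a typo in /etc/default/grub or a forgotten update-grub.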

Syncoid & Sanoid

Once I brought everything back online, I needed to reassess my backup strategy and update my syncoid script. Sanoid & Syncoid are the tools for ZFS snapshot & replication management made by the exalted Jim Salter (the host of one of my favorite podcasts, 2.5 Admins). I have only a little clue what I’m doing here, and I’m surprised I’ve managed to get as far as I have. The update was a simple change to the host variable in my script. But when testing, I noticed something peculiar, and potentially catastrophic if I ever need to restore from backup.

My script kept causing my backup pools’ datasets to roll back in time to a certain snapshot, and then would replicate everything past that snapshot. It seems the issue was an incorrect setting in sanoid.conf, and that I was using the flag --delete-target-snapshots in my syncoid command. Switching to --no-sync-snap and reviewing and fixing my sanoid.conf snapshot policy seems to have solved the problem. Syncoid now only sends snapshots Sanoid makes, instead of sending its own.

2026-06-02 Update: It did not fix the problem. It continued sending the latest daily snapshot 4 times a day instead of the latest hourly. Which isn’t what I wanted to do. I simply removed the --no-sync-snap flag and let syncoid send and manage its own snapshots as the new incremental and that (fingers crossed) seems to have solved the problem. That’s what I get for trying to mess with a tool that already knows what it’s doing.
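
For reference, the shape of a pull-style syncoid call with neither of the two flags looks roughly like the sketch below. This is not my actual script: the host backup@nas, the datasets tank/data and backup/data, and the --recursive flag are all placeholders and assumptions. With --no-sync-snap left out, syncoid falls back to its default of creating its own sync snapshot on the source for each run.

```shell
#!/bin/sh
# Hypothetical names: replace with your own host, pool, and datasets.
SRC_HOST="backup@nas"
SRC_DATASET="tank/data"
DST_DATASET="backup/data"

# Pull from the source host into the local pool.
# Defaults to a dry run that only prints the command; set DRY_RUN=0 to execute it.
syncoid_pull() {
    # Deliberately no --no-sync-snap and no --delete-target-snapshots.
    set -- syncoid --recursive "${SRC_HOST}:${SRC_DATASET}" "${DST_DATASET}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

syncoid_pull
```

Printing the command first is a cheap way to eyeball the flags before anything touches a pool.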

I also had Sanoid taking and managing its own snapshots on each machine, when I should have set ‘autosnap=no’ on the receiving ends. Duh. Pools on the receiving end shouldn’t be making snapshots; they should only be pruning what syncoid delivers. Thus, syncoid trying to delete target snapshots was throwing things off. I’m truly a terrible sysadmin. My logic was off that day I suppose. I will definitely be keeping a closer eye on my zpools, which feels difficult because I thought I already was! The deeper I get into ZFS the more I realize how many moving parts there are to get and keep everything working together seamlessly. Simply managing my own homelab makes me appreciate how expeditiously content is stored and delivered all across the web.
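
A receiving-side policy along those lines could look something like the sketch below, which writes an example sanoid.conf to a scratch file for review instead of touching /etc/sanoid. Only autosnap = no and the idea of prune-only targets come from this post. The dataset name backup/data, the template name, and the retention counts are illustrative assumptions, not my real values.

```shell
#!/bin/sh
# Write an example prune-only policy for a replication target to the given path.
write_target_policy() {
    cat > "$1" <<'EOF'
# Hypothetical dataset on the receiving pool
[backup/data]
        use_template = backup
        recursive = yes

[template_backup]
        # The target never takes snapshots of its own...
        autosnap = no
        # ...it only prunes what replication delivers.
        autoprune = yes
        # Illustrative retention counts
        hourly = 30
        daily = 90
        monthly = 12
EOF
}

OUT="$(mktemp)"
write_target_policy "$OUT"
cat "$OUT"
```

The source machine keeps its normal snapshot-taking policy; only the targets get the prune-only template.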

I figured I’d keep tweaking. Previously my backup server pulled from my NAS and then pushed from its own storage to the storage pool on my desktop. Instead, I restarted syncoid on my desktop so it pulls directly from my NAS. I figured that way, if my backup server goes down, my desktop machine is still kept up to date.

OpenZFS & macOS

I also added a manual sync script to a cronjob on an M4 Mac Mini I have. Attached to it is a 10 TB single-disk zpool. The script pulls the most important and irreplaceable datasets from my NAS onto that pool on a daily schedule, and Backblaze then sends the data offsite. And here we have reached the cliff of an edge case. OpenZFS on Mac is finicky, especially when macOS decides to automatically update and the installed version of OpenZFS is no longer compatible with the updated OS. And to add insult to injury, Backblaze doesn’t seem to like the ZFS filesystem anymore. Every time I go to the settings page for Backblaze the application crashes. I confirmed that new data is still being sent and can be downloaded from my online account portal, but when I contacted support they told me explicitly that ZFS is not recommended. Eventually I may have to format the drive to be compatible with their software and change my ZFS send/receive to an rsync, which is something I’d really like to avoid, but for now it works. I added a snapshot retention script and things are humming.
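
The retention script itself isn’t shown here, but the core of a "keep the newest N" policy can be sketched as below. The function name snapshots_to_prune and the snapshot names are hypothetical. It uses awk rather than head with a negative count, on the assumption that the stock macOS head does not accept one.

```shell
#!/bin/sh
# Read snapshot names on stdin, oldest first, and print every name
# except the newest KEEP entries (the ones that would be destroyed).
snapshots_to_prune() {
    awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Demo with made-up names: keep the newest two, list the rest.
printf '%s\n' \
    'tank/photos@2026-06-01' \
    'tank/photos@2026-06-02' \
    'tank/photos@2026-06-03' | snapshots_to_prune 2
# prints tank/photos@2026-06-01

# On a real pool the input would come from something like:
#   zfs list -H -t snapshot -o name -s creation -d 1 tank/photos
# and each printed name would be passed to zfs destroy after review.
```

Keeping the selection logic separate from the zfs destroy call makes it easy to dry-run the list before deleting anything.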

Now that things are tightened up a bit more, the confidence I have in my data and backup strategy is better than ever. I know I need to set up a better monitoring system than the zpool status script whose output I get in a daily email, but that’s a future project. I wonder what new surprise lurks around the next corner? Who knows! I have also been on a 4-year endeavor to move my photography workflow to Linux, and so far it hasn’t worked. Perhaps I’ll make my next post about getting my Canon Pixma Pro-10 photo printer working on Linux (if I ever get there, I’m this close!). So stay tuned!