Reducing the number of ZFS user filesystems
Okay, so switching to ZFS and using a filesystem per user seemed like a good idea at the time...
However, when you have roughly a zillion users AND you are using a snapshot-based backup mechanism (znapzend), you wind up with around 10 zillion filesystems. Even simple commands like zfs list can take minutes in this environment. We also ran into an obscure limit in ZFS send/recv that caps the number of datasets that can be synced at somewhere in the 10,000-20,000 range.
So let's see if we can get this back under a zillion.
We already have a mechanism to "inactivate" users who have not logged in within some period of time. This causes their user homedir filesystem to no longer be mounted, and thus no longer exported to boss or other machines. Currently about 6,000 of our 10,000 users are inactive based on a 24-month activity window. This was our first attempt to reduce the impact of zillions of filesystems, since that many filesystems badly affected mountd and exports.
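A minimal sketch of the selection step, assuming each user's last-login date is available in a Python dict. The helpers select_inactive and months_before are hypothetical and are not our actual inactivation script; they only show how a calendar-month activity window (24 months here) can be applied.

```python
from __future__ import annotations

import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clamping the day
    to the end of the target month (Mar 31, 2024 minus 1 month -> Feb 29, 2024)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def select_inactive(last_login: dict[str, date | None], today: date,
                    window_months: int = 24) -> list[str]:
    """Return users whose last login is older than the activity window.

    Users with no recorded login (None) are treated as inactive."""
    cutoff = months_before(today, window_months)
    return sorted(user for user, seen in last_login.items()
                  if seen is None or seen < cutoff)


logins = {"alice": date(2024, 5, 1), "bob": date(2021, 1, 15), "carol": None}
print(select_inactive(logins, today=date(2024, 6, 1)))  # ['bob', 'carol']
```

Using calendar months rather than a fixed number of days keeps the cutoff aligned with how the window is described ("24 months").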
What I would now like to do is move all inactive user homedirs to a single filesystem, say /users/inactive. That would eliminate all the distinct filesystems and snapshots. Since we don't need to export these user homedirs individually to any machine, there is no export issue. Since they will be inaccessible to the users, they cannot put stuff out there, and thus we don't need individual quotas either. We could also export these to boss if we want, using a single static line in /etc/exports.head. One downside is that we have to physically copy the contents of such directories (we cannot just rename them, since each one would be moving from its own filesystem into a different one). That is not an issue with the serialized bulk-inactivate CLI script we use, but it may be a concern if we dynamically re-activate users via a web page, because of the highly variable amount of time it will take to copy files back to a per-user filesystem.
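A dry-run sketch of the per-user move and its reverse, as a bulk script might perform them. The dataset layout (z/users/<user> mounted at /users/<user>), the constants, and the function names are all hypothetical; quotas and ownership on reactivation are left out. Each plan is a list of argv lists, so it can be printed and reviewed before anything touches the pool.

```python
import shlex
import subprocess

# Hypothetical layout: each active user has dataset z/users/<user> mounted
# at /users/<user>; inactive homedirs are plain directories under /users/inactive.
POOL_PREFIX = "z/users"
ACTIVE_ROOT = "/users"
INACTIVE_ROOT = "/users/inactive"


def inactivate_plan(user: str) -> list[list[str]]:
    dataset = f"{POOL_PREFIX}/{user}"
    return [
        # Inactive homedirs are normally unmounted, so mount to read them.
        ["zfs", "mount", dataset],
        # Trailing slashes: copy the *contents* of the homedir into the target dir.
        ["rsync", "-aH", f"{ACTIVE_ROOT}/{user}/", f"{INACTIVE_ROOT}/{user}/"],
        # -r also destroys the dataset's snapshots, which is the point of the move.
        ["zfs", "destroy", "-r", dataset],
    ]


def reactivate_plan(user: str) -> list[list[str]]:
    dataset = f"{POOL_PREFIX}/{user}"
    return [
        # Assumes the new dataset inherits a mountpoint of /users/<user>; quota omitted.
        ["zfs", "create", dataset],
        ["rsync", "-aH", f"{INACTIVE_ROOT}/{user}/", f"{ACTIVE_ROOT}/{user}/"],
        ["rm", "-rf", f"{INACTIVE_ROOT}/{user}"],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises on failure, so a failed copy stops the plan
            # before the source dataset or directory is removed.
            subprocess.run(argv, check=True)


run_plan(inactivate_plan("alice"))
```

The rsync step is the one whose duration depends on how much the user left behind, which is why the reactivate path is awkward to run synchronously from a web page.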
One variant of this would be, rather than rsync'ing their directory when inactivating, to tar up the contents, maybe excluding dotfiles. Then when we restore their directory, we just give them back their dotfiles and a tarball. This would probably be slightly faster to restore, and we can encourage them to get rid of the old tarball if they don't need any of their previous stuff.
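A minimal sketch of the tarball variant using Python's standard tarfile and shutil modules. The archive layout (a dotfiles/ subdirectory plus one tarball per user) and the name old-homedir.tar.gz are hypothetical. Only top-level entries whose names start with "." are treated as dotfiles. Running this against real homedirs would need root, and restoring ownership (chown) to the user is omitted here.

```python
import shutil
import tarfile
from pathlib import Path

TARBALL_NAME = "old-homedir.tar.gz"  # hypothetical name handed back to the user


def _copy_entry(src: Path, dst: Path) -> None:
    """Copy a file, symlink, or directory tree, keeping symlinks as symlinks."""
    if src.is_dir() and not src.is_symlink():
        shutil.copytree(src, dst, symlinks=True)
    else:
        shutil.copy2(src, dst, follow_symlinks=False)


def archive_homedir(home: Path, dest: Path) -> None:
    """Keep top-level dotfiles as-is and put everything else in one tarball."""
    dotdir = dest / "dotfiles"
    dotdir.mkdir(parents=True)
    with tarfile.open(dest / TARBALL_NAME, "w:gz") as tar:
        for entry in sorted(home.iterdir()):
            if entry.name.startswith("."):
                _copy_entry(entry, dotdir / entry.name)
            else:
                tar.add(entry, arcname=entry.name)  # recurses into directories


def restore_homedir(dest: Path, new_home: Path) -> None:
    """Give the user back their dotfiles plus the tarball of old content."""
    new_home.mkdir(parents=True, exist_ok=True)
    for entry in (dest / "dotfiles").iterdir():
        _copy_entry(entry, new_home / entry.name)
    shutil.copy2(dest / TARBALL_NAME, new_home / TARBALL_NAME)
```

Restoring then copies a handful of small dotfiles plus a single compressed file, instead of walking the user's whole tree file by file.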