January 29, 2009

ZFS Install

Note: This was brought across from old blog. Textile -> Markdown did not go so well so there could be a few ugly formatting problems.

I have rapidly become a FreeBSD convert. From day zero I have been impressed with its consistency, code and quality (just compare it to most of the GNU code!). Like any good code-base built on a solid foundation, it enables you to build on it with radically advanced and powerful features. One such feature is ZFS. ZFS is one of many next-generation file-systems. It originated in the OpenSolaris camp but has been ported to FreeBSD, OS X and Linux (although it is hamstrung in Linux due to some licensing issues).

I am not going to try to do too much of a sales pitch for ZFS; it pretty much sells itself. Basically, it breaks a bunch of rules to deliver a stunning file system. At a high level ZFS offers mirroring (RAID1 analogue), raidz1, which is striping with single parity (RAID5 analogue), and raidz2, which is striping with double parity (RAID6 analogue). Some important differences from traditional RAID are the transactional copy-on-write semantics and variable stripe width. These combine to help eliminate the RAID5 write hole.
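To make the capacity trade-off between these layouts concrete, here is a rough sketch that computes usable space for a set of equal disks. The function name `usable_gb` is made up for this example, and the numbers ignore ZFS metadata overhead.

```shell
#!/bin/sh
# usable_gb LAYOUT NDISKS SIZE_GB
# Rough usable capacity for NDISKS equal disks; ignores ZFS metadata overhead.
usable_gb() {
    layout=$1 n=$2 size=$3
    case "$layout" in
        mirror) echo "$size" ;;                    # every disk holds a full copy
        raidz1) echo $(( (n - 1) * size )) ;;      # one disk's worth of parity
        raidz2) echo $(( (n - 2) * size )) ;;      # two disks' worth of parity
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

usable_gb raidz1 3 500   # three 500G disks as raidz1
```

With three disks, raidz1 gives you two disks' worth of space, where a three-way mirror gives you one.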

I have been involved with a number of ZFS installs; the following documents the setup and process for getting up and running. These steps have been verified for FreeBSD 7.1 and 7.2. Many thanks must go to How to install FreeBSD 7.0 under ZFS and ZfsForReplication. These steps are heavily based on those two articles, but my final layout has some subtle differences from both, so I thought it would be worth documenting again.

Preamble.

The following process is for the raidz1 setup; there is very little difference for mirroring.

A slight forewarning that this may not be the ‘optimal’ install process. I am aware that a number of these steps could be compressed into a single step. I am also aware that things could be done in a different order; in particular, you could set up the zpool first and worry about mirroring the boot partition later. However, I have been pretty cautious, moving a single step at a time, and this process has served me well for a number of installs now.

WARNING: Proceed through these steps slowly. I recommend you read it all before you start. Ordering of the steps is very important. In a number of cases there are 2 copies of the same file, so be very aware of what you are editing and understand where it should be going. If you want to ask me a question shoot an email to mark@markh.id.au.

The boot problem.

The FreeBSD 7.x boot loader does not grok ZFS, so a layer of indirection is required. This is achieved by using a small UFS partition to bootstrap (note: this is no longer true for 8-CURRENT). In the case of failure you need a strategy for restoring the bootstrap OS. The simplest is to have a separate disk (or array of disks) for the OS and use ZFS just for the file-store. A more advanced solution, and what I have gone with, is to allocate a (small) mirrored partition at the start of each disk for the bootstrap.

The layout.

The target layout has three identical disks (although it is possible to use different size disks if you are willing to waste some space). Each disk has a single slice partitioned like:

| Type | Size      |
|------|-----------|
| UFS  | 1G        |
| SWAP | 2G        |
| ZFS  | REMAINDER |

The install.

| Partition | Size      | Type | Mount |
|-----------|-----------|------|-------|
| A         | 1G        | UFS  | /     |
| B         | 2G        | SWAP |       |
| D         | REMAINDER |      |       |

Preparing other disks.

Boot into single user mode, select /bin/sh as the shell and mount / as writable:

# mount -w /

fdisk and label the disks the same as ad0

# fdisk -BI ad1
# fdisk -BI ad2
# bsdlabel ad0s1 > /tmp/label
# bsdlabel -RB ad1s1 /tmp/label
# bsdlabel -RB ad2s1 /tmp/label

Note: This process can be followed with n disks and is the same whether doing mirroring, raidz1 or raidz2.
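For more than a couple of disks, the same steps can be generated with a loop. The sketch below only prints the commands so they can be reviewed before anything touches a disk; `label_cmds` is a made-up helper name, and it assumes every target disk uses slice 1 as in the layout above.

```shell
#!/bin/sh
# label_cmds SOURCE_DISK TARGET_DISK...
# Prints (does not run) the commands that clone SOURCE_DISK's label
# onto slice 1 of each target disk.
label_cmds() {
    src=$1; shift
    echo "bsdlabel ${src}s1 > /tmp/label"
    for disk in "$@"; do
        echo "fdisk -BI $disk"
        echo "bsdlabel -RB ${disk}s1 /tmp/label"
    done
}

label_cmds ad0 ad1 ad2
```

Once the printed commands look right, they can be pasted in one at a time, which fits the one-step-at-a-time approach used throughout.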

Mirroring bootage.

Setting up the mirror is pretty straightforward. Following the simplest path, we are going to create the mirror on one of the spare disks (ad1), copy everything across and then add back in the first disk and the rest of the extras. The mirror is going to be called bootage, but it could be anything.

# gmirror label bootage ad1s1a      # the spare disk; ad0 still holds the running install
# kldload geom_mirror             # should get some messages about the mirror booting
# newfs /dev/mirror/bootage
# mount /dev/mirror/bootage /mnt

Update /boot/loader.conf

# echo 'geom_mirror_load="YES"' >> /boot/loader.conf
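One thing to watch with the `echo ... >>` form is that it appends a new line every time it is run, so repeating a step leaves duplicates in the file. A small sketch of an append that only happens once follows; `append_once` is a hypothetical helper, and the demonstration writes to a scratch file rather than the real /boot/loader.conf.

```shell
#!/bin/sh
# append_once FILE LINE
# Appends LINE to FILE unless an identical line is already present.
append_once() {
    file=$1 line=$2
    # -x: match the whole line, -F: treat LINE as a fixed string, -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file, not the real loader.conf.
conf=$(mktemp)
append_once "$conf" 'geom_mirror_load="YES"'
append_once "$conf" 'geom_mirror_load="YES"'   # second call is a no-op
cat "$conf"
```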

Update /etc/fstab, changing / mount from /dev/ad0s1a to /dev/mirror/bootage. It should look something like:

/dev/mirror/bootage     /               ufs     rw              1       1
/dev/ad0s1b             none            swap    sw              0       0
/dev/acd0               /cdrom          cd9660  ro,noauto       0       0

Now copy across the install on / to the new mirror.

# find -x / | cpio -pmd /mnt

Chances are you will get 2 or 3 errors from files in /var/ that could not be copied. It is safe to ignore these errors.

Reboot onto your mirror.

Now to add the redundancy. Simply perform a gmirror insert to add the original disk plus the extras.

# gmirror insert bootage ad0s1a     # the original install disk
# gmirror insert bootage ad2s1a     # the empty third disk.

These disks should sync up pretty quick. Check out the progress with gmirror status. When it is finished it should look something like:

# gmirror status
          Name    Status  Components
mirror/bootage  COMPLETE  ad0s1a
                          ad1s1a
                          ad2s1a
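If you would rather script the check than eyeball it, the status column can be picked out with awk. This is a sketch only: `mirror_complete` is a made-up name, it reads text in the format shown above on stdin, and it treats anything other than COMPLETE as not yet synced.

```shell
#!/bin/sh
# mirror_complete NAME
# Reads `gmirror status` output on stdin; succeeds if mirror/NAME is COMPLETE.
mirror_complete() {
    awk -v want="mirror/$1" '
        $1 == want { found = 1; ok = ($2 == "COMPLETE") }
        END { if (found && ok) exit 0; exit 1 }
    '
}

# Fed with the sample output from above instead of a live system.
mirror_complete bootage <<'EOF' && echo "bootage is in sync"
          Name    Status  Components
mirror/bootage  COMPLETE  ad0s1a
                          ad1s1a
                          ad2s1a
EOF
```

On a live system the input would come from `gmirror status | mirror_complete bootage`.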

Adding the extra swap.

For a three disk system there is no benefit in doing anything fancy like RAID3 for the swap partitions. Just add them to /etc/fstab.

/dev/mirror/bootage     /               ufs     rw              1       1
/dev/ad0s1b             none            swap    sw              0       0
/dev/ad1s1b             none            swap    sw              0       0
/dev/ad2s1b             none            swap    sw              0       0
/dev/acd0               /cdrom          cd9660  ro,noauto       0       0

The main game. ZFS.

Reboot into single user mode. Select /bin/sh for shell. Mount file-system as writable.

# mount -w /

Create the zpool. A potential gotcha using raidz is that all disks must be added at the same time. When mirroring you can add one at a time.

# zpool create tank0 raidz1 ad0s1d ad1s1d ad2s1d     # spelled out, /bin/sh does not expand ad{0,1,2}s1d

Note 1: If you wanted mirroring instead of raidz, replace raidz1 with mirror in the command. Simply dropping raidz1 would create a plain stripe with no redundancy. Note 2: tank0 can be any name for your pool. Most examples use tank. I prefer the numeric identifier. It is a convention that can be extended to multiple pools in a way that makes sense.
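Brace expansion is a feature of some shells rather than of zpool, so it can help to build the device list explicitly. The sketch below prints the command for review instead of running it; `zpool_create_cmd` is a hypothetical helper, and the `s1d` partition suffix is taken from the layout used in this post.

```shell
#!/bin/sh
# zpool_create_cmd POOL VDEV_KIND PARTITION DISK...
# Prints (does not run) a zpool create command with one device per disk.
zpool_create_cmd() {
    pool=$1 kind=$2 part=$3; shift 3
    devs=""
    for disk in "$@"; do
        devs="$devs ${disk}${part}"
    done
    echo "zpool create $pool $kind$devs"
}

zpool_create_cmd tank0 raidz1 s1d ad0 ad1 ad2
zpool_create_cmd tank0 mirror s1d ad0 ad1 ad2
```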

Prevent ZFS from creating a mountpoint.

# zfs set mountpoint=none tank0

Create the partitions you would normally configure your system with.

# zfs create tank0/root
# zfs create tank0/home
# zfs create tank0/usr
# zfs create tank0/var
# zfs create tank0/tmp
# zfs create tank0/stash      # this is just a dumping ground I put on all my systems.

Set some temporary mounts.

# zfs set mountpoint=/tank0 tank0/root
# zfs set mountpoint=/tank0/home tank0/home
# zfs set mountpoint=/tank0/usr tank0/usr
# zfs set mountpoint=/tank0/var tank0/var
# zfs set mountpoint=/tank0/tmp tank0/tmp
# zfs set mountpoint=/tank0/stash tank0/stash
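The mountpoint commands follow one pattern per dataset, so they can be generated rather than typed. This sketch prints them for review; `mount_cmds` is a made-up name, and tank0/root is deliberately left out because its mountpoint does not follow the same pattern.

```shell
#!/bin/sh
# mount_cmds POOL PREFIX DATASET...
# Prints (does not run) a `zfs set mountpoint=` command for each dataset.
# PREFIX is prepended to the mount path; pass "" for mounts directly under /.
mount_cmds() {
    pool=$1 prefix=$2; shift 2
    for ds in "$@"; do
        echo "zfs set mountpoint=$prefix/$ds $pool/$ds"
    done
}

mount_cmds tank0 /tank0 home usr var tmp stash   # temporary mounts under /tank0
mount_cmds tank0 ""     home usr var tmp stash   # mounts directly under /
```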

Enable zfs in /etc/rc.conf.

# echo 'zfs_enable="YES"' >> /etc/rc.conf

Now copy your installation to your ZFS layout in much the same way as you did for the mirror.

# find -x / | cpio -pmd /tank0

You can ignore errors again.

As discussed earlier, we need to do some bootstrapping to get the system up under ZFS. To achieve this we use a bit of trickery and have the UFS partition mounted under the boot dir.

# rm -rf /tank0/boot
# mkdir /tank0/bootdir
# cd /tank0
# ln -s bootdir/boot boot
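If the symlink trick is not obvious, it can be tried out safely in a scratch directory. In this sketch the temporary directory stands in for /tank0, and a plain `mkdir` stands in for mounting the UFS partition (which carries its own boot directory) on bootdir.

```shell
#!/bin/sh
# Reproduce the layout in a scratch directory instead of /tank0.
root=$(mktemp -d)
mkdir "$root/bootdir"
( cd "$root" && ln -s bootdir/boot boot )

# Until something is mounted on bootdir, the link dangles.
[ -e "$root/boot" ] || echo "boot is dangling"

# Stand-in for mounting the UFS partition, which has its own boot directory.
mkdir "$root/bootdir/boot"
[ -d "$root/boot" ] && echo "boot now resolves to bootdir/boot"
```

The link is relative, so it keeps working when the dataset is later mounted somewhere other than /tank0.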

Enable zfs in /boot/loader.conf.

# echo 'zfs_load="YES"' >>  /boot/loader.conf
# echo 'vfs.root.mountfrom="zfs:tank0/root"'  >>  /boot/loader.conf

Edit /etc/fstab on the ZFS root (that is, /tank0/etc/fstab, not the copy on the UFS partition) to mount the UFS partition at /bootdir instead of /. It should now look like:

/dev/mirror/bootage     /bootdir        ufs     rw              1       1
/dev/ad0s1b             none            swap    sw              0       0
/dev/ad1s1b             none            swap    sw              0       0
/dev/ad2s1b             none            swap    sw              0       0
/dev/acd0               /cdrom          cd9660  ro,noauto       0       0

Yep, there is no entry for /. This is correct. Bootloader and ZFS will take care of that.

The last thing that needs to be done is to configure the final ZFS mounts.

# zfs set mountpoint=/home tank0/home
# zfs set mountpoint=/usr tank0/usr
# zfs set mountpoint=/var tank0/var
# zfs set mountpoint=/tmp tank0/tmp
# zfs set mountpoint=/stash tank0/stash

Set the tank0/root mountpoint to legacy. The bootloader will have already mounted it.

# cd /
# zfs set mountpoint=legacy tank0/root

Done. Reboot.

Checking things out.

Login as root.

To get some idea of what is going on:

# df -h
# zfs list
# zpool status

Tuning.

Check out the references for more details here. ZFS is still experimental in FreeBSD and as such needs a bit of love to get it running just right. There is also a known issue caused by the fact that ZFS caches inside kernel memory. This is ok on Solaris as kernel memory can be expanded to some pretty large numbers, but on FreeBSD kernel memory is currently limited to ~1500M (at least at the time of my installs; I think there are some changes in CURRENT to work around this).

My /boot/loader.conf tuning:

# magic zfs donuts, be careful.
vm.kmem_size="1500M"
vm.kmem_size_max="1500M"
vfs.zfs.arc_max="512M"
vfs.zfs.prefetch_disable=1
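Since the cache lives inside kernel memory, one sanity check worth scripting is that the ARC limit stays below the kernel memory size. The following is a sketch under that assumption; `to_mb` and `arc_fits` are made-up helpers and only understand the M and G suffixes.

```shell
#!/bin/sh
# to_mb SIZE: converts a size such as 512M or 2G to whole megabytes.
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# arc_fits ARC_MAX KMEM_SIZE: succeeds if the ARC limit is below kernel memory.
arc_fits() {
    [ "$(to_mb "$1")" -lt "$(to_mb "$2")" ]
}

arc_fits 512M 1500M && echo "arc_max fits inside kmem_size"
```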

Recovery.

I can not improve on what is specified in this article, http://www.ish.com.au/solutions/articles/freebsdzfs, so I am just going to pass you on until I get some more time to write up my process.

Things I didn’t know.

Things to improve on for next time.

References.

How to install FreeBSD 7.0 under ZFS

[ZfsForReplication](http://nzfug.nz.freebsd.org/nzfug/AndrewThompson/ZfsForReplication)