Note: This was brought across from old blog. Textile -> Markdown did not go so well so there could be a few ugly formatting problems.
I have rapidly become a FreeBSD convert. From day zero I have been impressed with its consistency, code and quality (just compare it to most of the GNU code!). Like any good code-base built on a solid foundation, it enables you to build on it with radically advanced and powerful features. One such feature is ZFS. ZFS is one of many next-generation file-systems. It originated in the OpenSolaris camp but has been ported to FreeBSD, OS X and Linux (although it is hamstrung in Linux due to some licensing issues).
I am not going to try to do too much of a sales pitch for ZFS; it pretty much sells itself. Basically, it breaks a bunch of rules to deliver a stunning file system. At a high level ZFS offers mirroring (RAID1 analogue), raidz1, which is striping with single parity (RAID5 analogue), and raidz2, which is striping with double parity (RAID6 analogue). Some important differences from traditional RAID are the transactional copy-on-write semantics and variable stripe width. These combine to help eliminate the RAID5 write hole.
I have been involved with a number of ZFS installs; the following documents the setup and process for getting up and running. These steps have been verified for FreeBSD 7.1 and 7.2. Many thanks must go to How to install FreeBSD 7.0 under ZFS and ZfsForReplication. These steps are heavily based on those two articles, but my final layout has some subtle differences from both, so I thought it would be worth documenting again.
Preamble.
The following process is for the raidz1 setup; there is very little difference for mirroring.
A slight forewarning that this may not be the ‘optimal’ install process. I am aware that a number of these steps could be compressed into a single step. I am also aware that things could be done in a different order; in particular, you could set up the zpool first and worry about mirroring the boot partition later. However, I have been pretty cautious, moving a single step at a time, and this process has served me well for a number of installs now.
WARNING: Proceed through these steps slowly. I recommend you read it all before you start. Ordering of the steps is very important. In a number of cases there are 2 copies of the same file, so be very aware of what you are editing and understand where it should be going. If you want to ask me a question, shoot an email to mark@markh.id.au.
The boot problem.
The FreeBSD 7.x boot loader does not grok ZFS, so a layer of indirection is required. This is achieved by using a small UFS partition to bootstrap (note: this is no longer true for 8-CURRENT). In the case of failure you need a strategy for restoring the bootstrap OS. The simplest is to have a separate disk (or array of disks) for the OS and use ZFS just for the file-store. A more advanced solution, and what I have gone with, is to allocate a (small) mirrored partition at the start of each disk for the bootstrap.
The layout.
The target layout has three identical disks (although it is possible to use different size disks if you are willing to waste some space). Each disk has a single slice partitioned like:
| Type | Size |
|------|------|
| UFS | 1G |
| SWAP | 2G |
| ZFS | REMAINDER |
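To get a feel for what this layout leaves for the pool, here is a rough back-of-the-envelope sketch. The helper name `raidz1_usable_gb` and the 500G disk size are made up for illustration; it assumes the 1G + 2G reserved per disk as in the table, one disk's worth of parity for raidz1, and it ignores ZFS's own metadata overhead.

```shell
#!/bin/sh
# raidz1_usable_gb is a hypothetical helper, not part of any tool.
# Arguments: number of disks, size of each disk in GB.
# Assumes 1G UFS + 2G swap are reserved on every disk and that raidz1
# spends one disk's worth of space on parity. Metadata overhead is ignored.
raidz1_usable_gb() {
    echo $(( ($1 - 1) * ($2 - 3) ))
}

# Three hypothetical 500G disks.
raidz1_usable_gb 3 500   # prints 994
```

Treat the number as an upper bound; what zfs list reports will be somewhat lower.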
The install.
- Boot from DVD and start the installation as per normal.
- Select standard install.
- Take note of your disk names. For the remainder I will use disks ad0, ad1, ad2 (however it is unlikely that the disk numbering will be that nice; in reality mine are 4, 6 and 10).
- Select the first disk and, using fdisk, select auto for a single slice covering the whole disk.
    - Note: At this point you could also set up the remaining disks in sysinstall, but it is easier to use bsdlabel at a later stage.
- Continue to disklabel. Label as per the following table.
    - To create ‘D’, create the partition with some arbitrary mount point. Then edit the mount point (key ‘M’ in disklabel), clear it out and hit OK.
| Partition | Size | Type | Mount |
|-----------|------|------|-------|
| A | 1G | UFS | / |
| B | 2G | SWAP | |
| D | REMAINDER | | |
- Save layout.
- Select minimal install (no ports, src, man pages etc…).
- Do post install config and reboot into single user mode.
Preparing other disks.
Having booted into single user mode, select /bin/sh as the shell and mount / as writable:
# mount -w /
fdisk and label the disks the same as ad0
# fdisk -BI ad1
# fdisk -BI ad2
# bsdlabel ad0s1 > /tmp/label
# bsdlabel -RB ad1s1 /tmp/label
# bsdlabel -RB ad2s1 /tmp/label
Note: This process can be followed with n disks and is the same whether doing mirroring, raidz1 or raidz2.
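With more than a couple of extra disks, typing these by hand gets tedious. A minimal sketch of generating the same commands in a loop follows. `prep_disk_cmds` is a name I made up, and it only prints the commands (a dry run) so you can review them before running anything.

```shell
#!/bin/sh
# prep_disk_cmds is a hypothetical helper. First argument is the disk
# that was set up by the installer; the rest are the disks to prepare.
# It prints the commands rather than running them.
prep_disk_cmds() {
    src="$1"; shift
    echo "bsdlabel ${src}s1 > /tmp/label"
    for disk in "$@"; do
        echo "fdisk -BI ${disk}"
        echo "bsdlabel -RB ${disk}s1 /tmp/label"
    done
}

prep_disk_cmds ad0 ad1 ad2
```

Once the printed list looks right, run the commands one at a time. That fits the cautious, single-step approach of this write-up.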
Mirroring bootage.
Setting up the mirror is pretty straightforward. Following the simplest path, we are going to create the mirror on one of the spare disks (ad1), copy everything across and then add back in the first disk and the rest of the extras. The mirror is going to be called bootage, but it could be anything.
# gmirror label bootage ad1s1a
# kldload geom_mirror # should get some messages about the mirror booting
# newfs /dev/mirror/bootage
# mount /dev/mirror/bootage /mnt
Update /boot/loader.conf
# echo 'geom_mirror_load="YES"' >> /boot/loader.conf
Update /etc/fstab, changing / mount from /dev/ad0s1a to /dev/mirror/bootage. It should look something like:
/dev/mirror/bootage / ufs rw 1 1
/dev/ad0s1b none swap sw 0 0
/dev/acd0 /cdrom cd9660 ro,noauto 0 0
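If you would rather not hand-edit the file, the change can be sketched as a small sed filter. `retarget_root` is a hypothetical name; it reads an fstab on stdin and writes the edited copy to stdout, so nothing is modified in place.

```shell
#!/bin/sh
# retarget_root is a hypothetical helper: point the root entry at the
# mirror. It only rewrites the device field at the start of a line.
retarget_root() {
    sed 's|^/dev/ad0s1a\([[:space:]]\)|/dev/mirror/bootage\1|'
}

# Example on a two-line sample rather than the real /etc/fstab.
printf '/dev/ad0s1a / ufs rw 1 1\n/dev/ad0s1b none swap sw 0 0\n' | retarget_root
```

To use it for real, write the result to a scratch file (for example `retarget_root < /etc/fstab > /tmp/fstab.new`), read it over, and only then copy it into place.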
Now copy across the install on / to the new mirror.
# find -x / | cpio -pmd /mnt
Chances are you will get 2 or 3 errors from files in /var/ that could not be copied. It is safe to ignore these errors.
Reboot onto your mirror.
Now to add the redundancy. Simply perform a gmirror insert to add the original disk plus the extras.
# gmirror insert bootage ad0s1a # the original install disk
# gmirror insert bootage ad2s1a # the empty third disk
These disks should sync up pretty quick. Check out the progress with gmirror status. When it is finished it should look something like:
# gmirror status
Name Status Components
mirror/bootage COMPLETE ad0s1a
ad1s1a
ad2s1a
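If you want to script the "is it finished yet" check, here is a minimal sketch that parses the status output. `mirror_complete` is a hypothetical helper; it assumes the output has the mirror name in the first column and the state in the second, as in the listing above.

```shell
#!/bin/sh
# mirror_complete is a hypothetical helper: it reads `gmirror status`
# output on stdin and succeeds only if the named mirror is COMPLETE.
mirror_complete() {
    awk -v name="mirror/$1" '
        $1 == name { found = 1; ok = ($2 == "COMPLETE") }
        END { exit !(found && ok) }'
}

# Intended use on the real system (not run here):
#   gmirror status | mirror_complete bootage && echo "bootage is in sync"
```

Any state other than COMPLETE, or a missing mirror, makes the helper return non-zero.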
Adding the extra swap.
For a three disk system there is no benefit in doing anything fancy like RAID3 for the swap partitions. Just add them to /etc/fstab.
/dev/mirror/bootage / ufs rw 1 1
/dev/ad0s1b none swap sw 0 0
/dev/ad1s1b none swap sw 0 0
/dev/ad2s1b none swap sw 0 0
/dev/acd0 /cdrom cd9660 ro,noauto 0 0
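For n disks the swap entries follow an obvious pattern, so they can be generated. A small sketch follows. `swap_fstab_lines` is a made-up name, and it assumes swap always lives on partition b of slice 1, as in my layout.

```shell
#!/bin/sh
# swap_fstab_lines is a hypothetical helper: print one fstab swap entry
# per disk, assuming the swap partition is <disk>s1b.
swap_fstab_lines() {
    for disk in "$@"; do
        printf '/dev/%ss1b none swap sw 0 0\n' "$disk"
    done
}

swap_fstab_lines ad0 ad1 ad2
```

After the next reboot, swapinfo should list one device per entry.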
The main game. ZFS.
Reboot into single user mode. Select /bin/sh for shell. Mount file-system as writable.
# mount -w /
Create the zpool. A potential gotcha using raidz is that all disks must be added at the same time. When mirroring you can add one at a time.
# zpool create tank0 raidz1 ad0s1d ad1s1d ad2s1d
Note 1: If you wanted mirroring instead of raidz, use mirror in place of raidz1.
Note 2: tank0 can be any name for your pool. Most examples use tank. I prefer the numeric identifier; it is a convention that can be extended to multiple pools in a way that makes sense.
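To make the raidz versus mirror difference concrete, here is a dry-run sketch that builds the zpool create command line for a given vdev type. `zpool_create_cmd` is a hypothetical helper; it assumes the ZFS partition is always <disk>s1d and it only prints the command.

```shell
#!/bin/sh
# zpool_create_cmd is a hypothetical helper; it prints, never runs.
# Usage: zpool_create_cmd POOL VDEV_TYPE DISK...
zpool_create_cmd() {
    pool="$1"; vdev="$2"; shift 2
    cmd="zpool create ${pool} ${vdev}"
    for disk in "$@"; do
        cmd="${cmd} ${disk}s1d"
    done
    echo "$cmd"
}

zpool_create_cmd tank0 raidz1 ad0 ad1 ad2
zpool_create_cmd tank0 mirror ad0 ad1
```

Spelling the devices out one by one also sidesteps any dependence on the shell expanding braces, which /bin/sh does not do.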
Prevent ZFS from creating a mountpoint.
# zfs set mountpoint=none tank0
Create the partitions you would normally configure your system with.
# zfs create tank0/root
# zfs create tank0/home
# zfs create tank0/usr
# zfs create tank0/var
# zfs create tank0/tmp
# zfs create tank0/stash # this is just a dumping ground I put on all my systems.
Set some temporary mounts.
# zfs set mountpoint=/tank0 tank0/root
# zfs set mountpoint=/tank0/home tank0/home
# zfs set mountpoint=/tank0/usr tank0/usr
# zfs set mountpoint=/tank0/var tank0/var
# zfs set mountpoint=/tank0/tmp tank0/tmp
# zfs set mountpoint=/tank0/stash tank0/stash
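The create and temporary-mount steps are regular enough to generate. Here is a dry-run sketch. `zfs_layout_cmds` is a name I invented, and it prints the commands instead of running them. Note that it handles root first so the other file systems end up mounted beneath it.

```shell
#!/bin/sh
# zfs_layout_cmds is a hypothetical helper (dry run only).
# Usage: zfs_layout_cmds POOL FILESYSTEM...
zfs_layout_cmds() {
    pool="$1"; shift
    # root goes first: everything else is mounted underneath it.
    echo "zfs create ${pool}/root"
    echo "zfs set mountpoint=/${pool} ${pool}/root"
    for fs in "$@"; do
        echo "zfs create ${pool}/${fs}"
        echo "zfs set mountpoint=/${pool}/${fs} ${pool}/${fs}"
    done
}

zfs_layout_cmds tank0 home usr var tmp stash
```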
Enable zfs in /etc/rc.conf.
# echo 'zfs_enable="YES"' >> /etc/rc.conf
Now copy your installation to your ZFS layout in much the same way as you did for the mirror.
# find -x / | cpio -pmd /tank0
You can ignore errors again.
As discussed earlier, we need to do some bootstrapping to get the system up under ZFS. To achieve this we use a bit of trickery and have the UFS partition mounted under the boot dir.
# rm -rf /tank0/boot
# mkdir /tank0/bootdir
# cd /tank0
# ln -s bootdir/boot boot
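If the symlink trick is not obvious, this throwaway sketch reproduces it in a scratch directory. Everything here is for illustration only: `demo_boot_link` is a made-up name, and a plain mkdir stands in for mounting the UFS partition at bootdir.

```shell
#!/bin/sh
# demo_boot_link is a hypothetical demonstration, run in a temporary
# directory rather than on the real /tank0.
demo_boot_link() {
    scratch="$(mktemp -d)" || return 1
    mkdir "$scratch/bootdir"
    ( cd "$scratch" && ln -s bootdir/boot boot )
    # Nothing is "mounted" at bootdir yet, so the link points nowhere.
    [ -e "$scratch/boot" ] || echo "boot is dangling"
    # Stand-in for the mounted UFS partition, which has its own boot dir.
    mkdir "$scratch/bootdir/boot"
    touch "$scratch/bootdir/boot/loader.conf"
    # The link now resolves to the boot directory on the "UFS" side.
    ls "$scratch/boot/"
    rm -rf "$scratch"
}

demo_boot_link
```

The point is that /boot on the ZFS root is only a pointer. The real files live on the mirrored UFS partition once it is mounted.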
Enable zfs in /boot/loader.conf.
# echo 'zfs_load="YES"' >> /boot/loader.conf
# echo 'vfs.root.mountfrom="zfs:tank0/root"' >> /boot/loader.conf
Edit /etc/fstab to mount UFS partition at /bootdir instead of /. It should now look like:
/dev/mirror/bootage /bootdir ufs rw 1 1
/dev/ad0s1b none swap sw 0 0
/dev/ad1s1b none swap sw 0 0
/dev/ad2s1b none swap sw 0 0
/dev/acd0 /cdrom cd9660 ro,noauto 0 0
Yep, there is no entry for /. This is correct. Bootloader and ZFS will take care of that.
The last thing that needs to be done is to configure the final ZFS mounts.
# zfs set mountpoint=/home tank0/home
# zfs set mountpoint=/usr tank0/usr
# zfs set mountpoint=/var tank0/var
# zfs set mountpoint=/tmp tank0/tmp
# zfs set mountpoint=/stash tank0/stash
Set the tank0/root mountpoint to legacy. The bootloader will have already mounted it.
# cd /
# zfs set mountpoint=legacy tank0/root
Done. Reboot.
Checking things out.
Login as root.
To get some idea of what is going on:
# df -h
# zfs list
# zpool status
Tuning.
Check out the references for more details here. ZFS is still experimental in FreeBSD and as such needs a bit of love to get it running just right. There is also a known issue caused by the fact that ZFS caches inside kernel memory. This is ok on Solaris as kernel memory can be expanded to some pretty large numbers, but on FreeBSD kernel memory is currently limited to ~1500M (at least at the time of my installs, I think there are some changes in CURRENT to work around this).
My /boot/loader.conf tuning:
# magic zfs donuts, be careful.
vm.kmem_size="1500M"
vm.kmem_size_max="1500M"
vfs.zfs.arc_max="512M"
vfs.zfs.prefetch_disable=1
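Since there can be more than one copy of loader.conf around during this process, it helps to confirm which values a given file actually carries. Here is a minimal sketch. `loader_get` is a hypothetical helper that assumes simple key=value lines like the ones above.

```shell
#!/bin/sh
# loader_get is a hypothetical helper: print the value of KEY from a
# loader.conf-style FILE, with surrounding double quotes removed.
# Comment lines never match because their first field is not the key.
loader_get() {
    awk -F= -v key="$1" '$1 == key { v = $2; gsub(/"/, "", v); print v }' "$2"
}

# Intended use on the real system (not run here):
#   loader_get vfs.zfs.arc_max /boot/loader.conf
#   loader_get vm.kmem_size /bootdir/boot/loader.conf
```

No output means the key is not set in that file.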
Recovery.
I can not improve on what is specified in this article, http://www.ish.com.au/solutions/articles/freebsdzfs, so I am just going to pass you on until I get some more time to write up my process.
Things I didn’t know.
- The zfs metadata is stored in /boot/zfs/zpool.cache
Things to improve on for next time.
- Only a minimal install is required for the boot partition, so 1G is a bit excessive. I originally laid out the disk like this to give me a bit of a buffer in case I needed to do any running repairs. It turned out to be unnecessary.
- It is possible to use labels rather than raw disk names when creating zpool. This helps if you need to go re-cabling your drives and the disk numbering changes.
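As a sketch of what the label approach might look like, here is a dry run using glabel. The label names disk0, disk1, ... and the helper `label_cmds` are my own inventions, and I have not run this variant as part of the installs described above.

```shell
#!/bin/sh
# label_cmds is a hypothetical helper (dry run only): label each ZFS
# partition with glabel, then build the pool from the label/ names
# instead of the raw adN names.
label_cmds() {
    pool="$1"; shift
    n=0
    vdevs=""
    for disk in "$@"; do
        echo "glabel label disk${n} ${disk}s1d"
        vdevs="${vdevs} label/disk${n}"
        n=$((n + 1))
    done
    echo "zpool create ${pool} raidz1${vdevs}"
}

label_cmds tank0 ad0 ad1 ad2
```

Because the pool then refers to label/disk0 and friends, it should not care which controller port a drive ends up on after re-cabling.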
References.
- How to install FreeBSD 7.0 under ZFS
- [ZfsForReplication](http://nzfug.nz.freebsd.org/nzfug/AndrewThompson/ZfsForReplication)