Upgrading Emulab servers from FreeBSD 11.1 to 11.2.

These are fairly specific, but not always exact instructions for the process.
They are also oriented toward the CloudLab family of clusters, hence the
references to mothership, Clemson, Wisconsin, Apt, etc.

Start with the boss node, and then you will repeat the instructions for ops.
Note that there are a couple of steps below that you only do on the boss or
the ops node, so pay attention!

[XXX these would benefit from breaking ops out from boss as they are different
enough that describing them with a lather-rinse-repeat process is confusing...]

A. Things to do in advance of shutting down Emulab.

   These first few steps can be done in advance of shutting down your site.
   These include making a backup, fetching the new release files and merging
   in local changes, building a custom kernel (if you use one), and stashing
   away state about your current packages.


1. Make a backup.

   If your boss and ops are VMs on Xen, you can create shadows of the disks
   that you can roll back to. You really only need to back up the root disk,
   which has all the FreeBSD stuff. Log in to the control node and:

   # apt
   sudo lvcreate -s -L 33g -n boss.backup xen-vg/boss
   sudo lvcreate -s -L 33g -n ops.backup xen-vg/ops

   # cloudlab utah/clemson
   sudo lvcreate -s -L 17g -n boss.backup xen-vg/boss
   sudo lvcreate -s -L 17g -n ops.backup xen-vg/ops

   This will seriously degrade the performance of the upgrade process due
   to the inefficiencies of disk writes when shadows are present, but it is
   worth it to avoid a total screw up.

2. Fetch the new release with freebsd-update.

   This will not install anything, it will just fetch the new files and merge
   local changes in. You can do this on both boss and ops simultaneously.

   Do not do it too far (i.e., more than a day) in advance, since the base
   system changes and your local mods may change as well. For example, new
   users might be added in the interim, which would invalidate your merged
   changes.

   Before fetching, make sure your /etc/freebsd-update.conf is correct,
   in particular the "Components" line.

   By default it will want to update your kernel ("kernel") and source tree
   ("src") as well as the binaries ("world"). Life will be much easier if you
   go with the flow and just let it do that. However, if you have a custom
   source tree (or update it yourself with svn or git) then remove "src"
   from the line:

     Components world kernel # don't update src

   If you have a custom kernel, then remove "kernel":

     Components world # don't update src or kernel

   However, because you are changing releases, rebuilding your
   custom kernel (next step) will require rebuilding the entire world first,
   which takes a long time and pretty much eliminates the advantages of
   using the binary update system. So, you might reconsider why you have a
   custom kernel and move back to the default kernel instead. If you opt
   for the default GENERIC kernel, make sure to leave "kernel" in the
   components above.
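
   As a quick sanity check before fetching, the following sketch prints the
   active "Components" line and tests whether a given component is listed.
   The helper names show_components and conf_has are made up for this
   example; they are not part of freebsd-update. Anything after a "#" on the
   line is treated as a comment and ignored.

```shell
# Print the active (uncommented) Components line of a freebsd-update.conf.
# The path defaults to /etc/freebsd-update.conf; helper names are hypothetical.
show_components() {
    grep -E '^[[:space:]]*Components[[:space:]]' "${1:-/etc/freebsd-update.conf}"
}

# Succeed if component $1 (e.g. "kernel") is listed in the file $2,
# ignoring anything after a '#' comment marker.
conf_has() {
    show_components "$2" | sed 's/#.*//' | grep -qw "$1"
}

# Example use on the live config, if it exists on this machine.
if [ -r /etc/freebsd-update.conf ]; then
    show_components
    if conf_has kernel /etc/freebsd-update.conf; then
        echo "freebsd-update will update the kernel"
    else
        echo "kernel is NOT listed; you are expected to install your own"
    fi
fi
```

   If you intend to run the GENERIC kernel, the check should report that
   "kernel" is listed.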

   Once you have /etc/freebsd-update.conf squared away, do the "fetch"
   part of the upgrade:

     sudo freebsd-update -r 11.2-RELEASE upgrade

   Since this will ask you to merge a bunch of local changes into various
   files and will want to fire up an editor, you might want to make sure
   you get a *real* editor by doing:

     sudo -E EDITOR=emacs freebsd-update -r 11.2-RELEASE upgrade

   instead. Otherwise you will probably wind up with vi.

   It will crunch for a long time and then probably want you to merge
   some conflicts. Here are a couple to take note of:

     * /etc/ssh/sshd_config: make sure Protocol does not include 1,
       otherwise it will spit out constant warnings to the console.

     * /etc/ttys: for Xen VMs make sure the getty on ttyu0 is "off"
       and not "onifconsole". Otherwise you will have competing gettys
       on /dev/console.

   NOTE: if you built and installed your system from sources originally,
   you may also get some conflicts with other files where it calls out diffs
   in the RCS header or in comments. Favor the newer versions of those to
   hopefully avoid future conflicts.

   REALLY IMPORTANT NOTE: if it shows you a diff and asks you if something
   "looks reasonable" and you answer "no", it will dump you out of the update
   entirely and you have to start over. It will *not* just let you fire up
   the editor and fix things!

   It will then show you several potentially long lists of files that it
   will be adding, deleting, etc. It uses "more" to display them, so you
   can 'q' out of those without dumping out of the update entirely (the
   last one will exit the update, but that is because it is done).

3. (Optional) Upgrade your custom kernel

   If you have a custom kernel config, then you should build and install
   a new kernel first. As mentioned in the last step, this will take a long
   time because you must build (but not install) the entire world before
   building the kernel. You can again do this on boss and ops simultaneously.

   Clone the FreeBSD 11.2 source repo:

   cd /usr
   sudo mv src Osrc
   sudo svn checkout -q svn:// src
   <copy over your custom config file from Osrc/sys/amd64/conf/CUSTOM>

   cd src
   sudo make -j 8 buildworld
   sudo make -j 8 buildkernel KERNCONF=CUSTOM

4. Stash away the current set of packages you have installed.

   This will allow you to figure out the extra ports you have installed so
   that you can update them later. Do this on boss and then on ops. First
   make a list of everything installed. For boss:

   mkdir ~/upgrade
   cd ~/upgrade
   pkg query "%n-%v %R" > boss.pkg.list

   This is mostly to keep track of any ports you may have installed locally.
   One way to determine local installs is to see which ports did NOT come
   from the Emulab repository:

   grep -v 'Emulab$' boss.pkg.list | awk '{ print $1; }' > boss.pkg.local

   This will give you the list of packages that you may need to reinstall.
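
   To see what the filter does, here is the same grep/awk pipeline run on a
   small made-up package list. The package names, versions, and the
   "FreeBSD" repository name are invented for the example; on a real system
   the list comes from the "pkg query" command above.

```shell
# Hypothetical sample of 'pkg query "%n-%v %R"' output (entries invented).
workdir=$(mktemp -d)
cat > "$workdir/boss.pkg.list" <<'EOF'
perl5-5.26.2 Emulab
emacs-nox-25.3 FreeBSD
mysql57-server-5.7.22 Emulab
tmux-2.7 FreeBSD
EOF

# Same filter as above: drop lines whose repository column is "Emulab",
# then keep only the name-version column.
grep -v 'Emulab$' "$workdir/boss.pkg.list" | awk '{ print $1; }' \
    > "$workdir/boss.pkg.local"

# Show the packages that did not come from the Emulab repository.
cat "$workdir/boss.pkg.local"
```

   With this sample, only the two packages that are not from the Emulab
   repository end up in boss.pkg.local.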

   You may want to list the dependencies of each to see what the top-level
   packages are and just install those.

   pkg query -x "%n %v usedby=%#r" `cat boss.pkg.local` | \
       grep 'usedby=0' | awk '{ print $1; }' > boss.pkg.reinstall

   Now login to ops and do the same thing:

   cd ~/upgrade
   pkg query "%n-%v %R" > ops.pkg.list
   grep -v 'Emulab$' ops.pkg.list | awk '{ print $1; }' > ops.pkg.local
   pkg query -x "%n %v usedby=%#r" `cat ops.pkg.local` | \
       grep 'usedby=0' | awk '{ print $1; }' > ops.pkg.reinstall

B. Updating the base FreeBSD system

1. If you are on the boss node, shutdown the testbed and some other services
   right off the bat.

     sudo /usr/testbed/sbin/testbed-control shutdown
     sudo /usr/local/etc/rc.d/ stop
     sudo /usr/local/etc/rc.d/apache24 stop
     sudo /usr/local/etc/rc.d/capture stop

     sudo /usr/local/etc/rc.d/ stop
     sudo /usr/local/etc/rc.d/apache24 stop
     sudo /usr/local/etc/rc.d/capture stop

2. Before installing the new binaries/libraries/etc., you might want to back
   up the files that have Emulab changes just in case. Those files are:

     /etc/ntp.conf       # if you have customized it
     /etc/ttys           # if you have configured a serial console

   The easiest thing to do is just:

     sudo cp -rp /etc /Oetc

3. Install the new system binaries/libraries/etc:

   If it has been more than a day or so since you did the "upgrade"
   command back in step A2, then you might consider doing it again.
   Doing it again basically throws away everything it built up on the
   previous run and you will have to go through all the manual merging
   again. Once you are satisfied, do the install of the new binaries:

    sudo /usr/sbin/freebsd-update install

   After a while it will want you to reboot into the new kernel. Before you
   reboot, if you built a custom kernel back in step A3, install it now:

   cd /usr/src
   sudo make installkernel KERNCONF=CUSTOM

   When I did the custom kernel install, I saw errors of the form:

      kldxref /boot/kernel
      kldxref: unknown metadata record 4 in file atacard.ko
      kldxref: unknown metadata record 4 in file atp.ko

   They did not seem to affect the following boot.

   NOTE: I have noticed a couple of times on VM-based elabinelab boss/ops
   upgrades that the root filesystem has some issues after the upgrade,
   so it is good to run an fsck. I prefer to do this while shutting down.
   Before you do this, make sure you first have access to the console!

   sudo shutdown now

   umount -at nfs
   umount -at ufs
   accton	# turn off accounting that has a file open on /
   mount -o ro -u /
   fsck -y /

   NOTE: when rebooting boss, mysqlcheck might take a while.

   When it comes back up, you should login and shutdown services that
   restarted, including some that won't work right.

     sudo /usr/testbed/sbin/testbed-control shutdown
     sudo /usr/local/etc/rc.d/apache24 stop
     sudo /usr/local/etc/rc.d/ stop
     sudo /usr/local/etc/rc.d/ stop
     sudo /usr/local/etc/rc.d/capture stop

     sudo /usr/local/etc/rc.d/apache24 stop
     sudo /usr/local/etc/rc.d/ stop

   and then again run freebsd-update to finish:

    sudo /usr/sbin/freebsd-update install

   NOTE that it will tell you to rebuild all third-party packages and
   run freebsd-update again. We do this later below, so don't worry.

   Now you can compare against the files you saved to make sure all the
   Emulab changes were propagated; e.g.:

     sudo diff -r /Oetc /etc

   Of course, this will show you every change made by the update as well,
   so you might just want to focus on the files listed in B2 above. When
   you are happy:

     sudo rm -rf /Oetc

   LATE BREAKING NEWS: we have noticed that the changes to the password
   file (adding user _ypldap and changing the "games" homedir) don't always
   seem to take effect, as if the .db files didn't get properly recreated.
   You can check with "echo ~games": if it shows "/usr/games" instead of
   "/", the DB files are stale. Remake the DB files to be certain:

     sudo pwd_mkdb -p /etc/master.passwd

   If you don't get this sorted out now, it may cause problems when you
   add users to the testbed later. In particular, when adding user "foo"
   it might spit out messages:

     pw: user 'foo' disappeared during update

4. The mothership may need some additional local hacks to some standard
   utilities, in particular "mountd" and "pw". Both should have a patch
   in the Emulab source tree patches subdir.

5. How did that work out for ya?

   If all went well, skip to C (Updating ports/packages).

   If that didn't work, see ~mike/upgrade-from-10.0.txt and follow steps
   A1 - A10. Return here for upgrading your ports.

C. Updating ports/packages

   Updating the core ports from 11.1 to 11.2 is pretty easy. However, if
   you installed extra ports, that will require a bit more work.

0. If you forgot to save off your package info back in A4, or it has been
   awhile, then you might want to go back and do that now.

1. Modify your /etc/pkg/Emulab.conf file, replacing "11.1" with "11.2" in
   the "url" line:

      sudo sed -i .bak -e 's;/11.1/;/11.2/;' /etc/pkg/Emulab.conf
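
   Before editing in place, it can help to preview what the substitution
   will change. The sketch below runs the same sed expression on a made-up
   copy of the config and diffs the result against the input. The file
   contents and the url value are placeholders invented for this example,
   not the real repository settings.

```shell
# Placeholder config; the url below is invented for illustration only.
conf=$(mktemp)
cat > "$conf" <<'EOF'
Emulab: {
  url: "https://pkg.example.invalid/FreeBSD/11.1/packages",
  enabled: yes
}
EOF

# Same expression as above, but written to stdout instead of in place.
preview=$(sed -e 's;/11.1/;/11.2/;' "$conf")

# Show what would change. diff exits non-zero when the inputs differ,
# so do not treat that as a failure.
printf '%s\n' "$preview" | diff "$conf" - || true
```

   On the real system, point it at /etc/pkg/Emulab.conf instead of the
   temporary file. Note that the "." in the pattern matches any character,
   which is harmless here but worth knowing if the url contains similar
   looking path components.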

2. Unlock the pkg tool and install new packages:

    sudo pkg unlock pkg
    sudo -E ASSUME_ALWAYS_YES=true pkg upgrade -r Emulab

3. Tweak package installs:

   REALLY, REALLY IMPORTANT: at some point, the perl port stopped installing
   the /usr/bin/perl link which we use heavily in Emulab scripts. Ditto for
   python and the /usr/local/bin/python link. Make sure those two symlinks
   exist, e.g.:

      ls -la /usr/bin/perl /usr/local/bin/python

   If not, get them back with:

      sudo ln -sf /usr/local/bin/perl5 /usr/bin/perl
      sudo ln -sf /usr/local/bin/python2 /usr/local/bin/python
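
   If you want something more explicit than eyeballing the "ls" output, a
   small check along these lines reports whether each link resolves, is
   dangling, or is missing. The check_link helper is a made-up name for this
   sketch and is not part of Emulab.

```shell
# Report the state of one path: resolves, dangling symlink, or missing.
# check_link is a hypothetical helper written for this example.
check_link() {
    if [ -e "$1" ]; then
        # Path resolves; show the symlink target if it is a symlink.
        target=$(readlink "$1" 2>/dev/null) || target="(not a symlink)"
        echo "ok: $1 -> $target"
    elif [ -L "$1" ]; then
        # Symlink exists but its target does not.
        echo "DANGLING: $1 -> $(readlink "$1")"
        return 1
    else
        echo "MISSING: $1"
        return 1
    fi
}

# The two links that Emulab scripts depend on.
for link in /usr/bin/perl /usr/local/bin/python; do
    check_link "$link" || true
done
```

   Anything reported as DANGLING or MISSING should be recreated with the
   "ln -sf" commands above.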

   REALLY, REALLY IMPORTANT PART 2: Because perl changed, you will need
   to make sure that the event library SWIG-generated perl module is rebuilt,
   and then all the event clients. Otherwise you will get bus errors when
   they all try to start. So do not skip step E2 below!

   REALLY, REALLY IMPORTANT PART 3: For those with Moonshot chassis,
   you cannot use an ipmitool port *newer* than 1.8.15 due to issues with
   "double bridged" requests. Either ipmitool or HPE got it wrong and it
   doesn't behave like ipmitool expects as of commit 6dec83ff on
   Sat Jul 25 13:15:41 2015. Anyway, you will need to replace the standard
   ipmitool install with the "emulab-ipmitool-old-1.8.15_1" package from
   the emulab repository:

     sudo pkg delete ipmitool
     sudo pkg install -r Emulab emulab-ipmitool-old

   But ONLY do this if you have Moonshot chassis.

4. Reinstall local ports.

   To find ports that are installed but that are not part of the Emulab
   repository:

   pkg query "%t %n-%v %R" `cat boss.pkg.reinstall` |\
       grep -v Emulab | sort -n

   These will be sorted by install time. You can see ones that are old
   and attempt to reinstall them with "pkg install". Note that just because
   they are old that doesn't mean they need to be reinstalled.

D. Repeat steps B and C for ops.

E. Update Emulab software

1. Make sure your Emulab sources are up to date.

   You must use the emulab-devel repository at this point as only it has
   the necessary changes to support FreeBSD 11.2. If you don't already
   have an emulab-devel repo, clone it with:

   git clone git://
   git clone

   Make sure to copy over your existing defs-* file to the new source

2. Reconfigure, rebuild, and reinstall the software.

   You want everything to be built against the new ports and libraries
   anyway though, so just rebuild and install everything.

   For this upgrade, you will also need to reinstall apache config files
   and move over the certs.

   In your build tree, look at config.log to see how it was configured
   and then:

      # on both (in different build trees!)
      cd <builddir>
      head config.log	# see what the configure line is
      sudo rm -rf *
      <run the configure line>

      # on ops -- do this first
      sudo gmake opsfs-install

      # on boss -- do this after ops
      sudo /usr/local/etc/rc.d/ start
      sudo gmake all boss-install 

   The reason for the ops install is that, while boss-install updates
   most of the ops binaries/libraries via NFS, there are some that it
   doesn't. So by doing a separate build/install on ops, you are
   guaranteed to catch everything.

   If the boss install tells you that there are updates to install,
   run the command like it says:

      sudo gmake update-testbed

   This will actually turn the testbed back on at the end so you will
   not have to do #3 below. Note also that this command may take awhile
   and provide no feedback.

3. Re-enable the testbed on boss.

   sudo /usr/local/etc/rc.d/apache24 start
   sudo /usr/local/etc/rc.d/ start
   sudo /usr/testbed/sbin/testbed-control boot

4. Run freebsd-update again to remove old shared libraries.

   Now that everything has been rebuilt:

   sudo freebsd-update install

5. Reboot boss and ops again!

   NOTE: if you reboot ops after boss, you may need to restart all the
   event schedulers from boss:

   sudo /usr/testbed/sbin/eventsys_start

F. Update the MFSes

   This is not strictly part of updating the OS, but it would be good to
   do this if you have not for awhile. See the instructions in