Upgrading Emulab servers from FreeBSD 10.3 to 11.2.

These are fairly specific, but not always exact, instructions for the process.
They are also oriented toward the CloudLab family of clusters, hence the
references to mothership, Clemson, Wisconsin, Apt, etc.

Start with the boss node, and then you will repeat the instructions for ops.
Note that there are a couple of steps below that you only do on the boss or
the ops node, so pay attention!

[XXX these would benefit from breaking ops out from boss as they are different
enough that describing them with a lather-rinse-repeat process is confusing...]

A. Things to do in advance of shutting down Emulab.

   These first few steps can be done in advance of shutting down your site.
   These include making a backup, fetching the new release files and merging
   in local changes, building a custom kernel (if you use one), and stashing
   away state about your current packages.

1. BACKUP IF YOU CAN!

   If your boss and ops are VMs on Xen, you can create LVM snapshots of the
   disks that you can roll back to. You really only need to back up the root
   disk, which has all the FreeBSD stuff. Log in to the control node and:

   # apt
   sudo lvcreate -s -L 33g -n boss.backup xen-vg/boss
   sudo lvcreate -s -L 33g -n ops.backup xen-vg/ops

   # cloudlab utah/clemson
   sudo lvcreate -s -L 17g -n boss.backup xen-vg/boss
   sudo lvcreate -s -L 17g -n ops.backup xen-vg/ops

   This will seriously degrade the performance of the upgrade process due
   to the inefficiencies of disk writes when snapshots are present, but it is
   worth it to avoid a total screw-up.
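   Classic (non-thin) LVM snapshots become invalid if they fill up, and the
   upgrade writes a lot of data, so it may be worth watching how full they
   get while it runs. A minimal sketch for the control node, assuming the
   xen-vg volume group used above; check_snapshots and its 80% default
   threshold are made up for illustration:

```shell
# Hypothetical helper (not part of Emulab or LVM): warn about snapshots in
# a volume group that are getting full.
# Usage: check_snapshots [VG] [PERCENT]   (defaults: xen-vg, 80)
check_snapshots() {
    vg=${1:-xen-vg}
    limit=${2:-80}
    # snap_percent is blank for ordinary LVs, so only snapshots have 2 fields.
    # Returns non-zero if any snapshot is at or above the limit.
    sudo lvs --noheadings -o lv_name,snap_percent "$vg" |
        awk -v limit="$limit" '
            NF == 2 && $2 + 0 >= limit {
                printf "WARNING: %s is %s%% full\n", $1, $2
                bad = 1
            }
            END { exit bad }'
}
```

   Run it periodically (e.g. "check_snapshots xen-vg 80") during the upgrade.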

2. Fetch the new release with freebsd-update.

   This will not install anything; it will just fetch the new files and merge
   local changes in. You can do this on both boss and ops simultaneously.

   Do not do it too far (i.e., more than a day) in advance, since the base
   system and your local mods may change in the meantime. For example, new
   users might be added in the interim, which would invalidate your merged
   changes.

   Before fetching, make sure your /etc/freebsd-update.conf is correct,
   in particular the "Components" line.

   By default it will want to update your kernel ("kernel") and source tree
   ("src") as well as the binaries ("world"). Life will be much easier if you
   go with the flow and just let it do that. However, if you have a custom
   source tree (or update it yourself with svn or git), then remove "src"
   from the line:

     Components world kernel # don't update src

   If you have a custom kernel, then remove "kernel":

     Components world # don't update src or kernel

   However, because you are changing major releases, rebuilding your
   custom kernel (next step) will require rebuilding the entire world first,
   which takes a long time and pretty much eliminates the advantages of
   using the binary update system. So you might reconsider why you have a
   custom kernel and move back to the default kernel instead. If you opt
   for the default GENERIC kernel, make sure to leave "kernel" in the
   Components line above.
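   Before running the fetch, one way to double-check what freebsd-update
   will act on is to print the words on the Components line. A small
   sketch; fbsd_components is a hypothetical helper, and it looks only at
   the first uncommented Components line:

```shell
# Hypothetical helper: print the components freebsd-update is configured
# to update, one per line. Usage: fbsd_components [CONF]
fbsd_components() {
    conf=${1:-/etc/freebsd-update.conf}
    awk '
        { sub(/#.*/, "") }          # drop comments (whole-line or trailing)
        $1 == "Components" {
            for (i = 2; i <= NF; i++) print $i
            exit                    # ignore any later Components lines
        }' "$conf"
}
```

   With both a custom kernel and a custom source tree, for example, you
   would expect it to print only "world".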

   Once you have /etc/freebsd-update.conf squared away, do the "fetch"
   part of the upgrade:

     sudo freebsd-update -r 11.2-RELEASE upgrade

   Since this will ask you to merge a bunch of local changes into various
   files and will want to fire up an editor, you might want to make sure
   you get a *real* editor by doing:

     sudo -E EDITOR=emacs freebsd-update -r 11.2-RELEASE upgrade

   instead. Otherwise you will probably wind up with vi.

   It will crunch for a long time and then probably want you to merge
   some conflicts. Here are a couple to take note of:

     * /etc/ssh/sshd_config: make sure Protocol does not include 1,
       otherwise it will spit out constant warnings to the console.

     * /etc/ttys: for Xen VMs make sure the getty on ttyu0 is "off"
       and not "onifconsole". Otherwise you will have competing gettys
       on /dev/console.
   
   NOTE: if you built and installed your system from sources originally,
   you may also get some conflicts with other files where it calls out diffs
   in the RCS header or in comments. Favor the newer versions of those to
   hopefully avoid future conflicts.

   NOTE: if it shows you a diff and asks you if something "looks reasonable"
   and you answer "no", it will dump you out of the update entirely and
   you have to start over. It will *not* just let you fire up the editor
   and fix things!

   It will then show you several potentially long lists of files that it
   will be adding, deleting, etc. It uses "more" to display them, so you
   can 'q' out of those without dumping out of the update entirely (the
   last one will exit the update, but that is because it is done).
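   Once the new files are actually installed (step B3), the two merges
   called out above can be spot-checked with grep. A sketch only;
   check_merges is a hypothetical helper, ETCDIR defaults to /etc, and the
   ttys test matters only on Xen VMs:

```shell
# Hypothetical helper: flag the two merge problems described above.
# Returns non-zero if either is found. Usage: check_merges [ETCDIR]
check_merges() {
    etc=${1:-/etc}
    rc=0
    # An uncommented Protocol line that still lists version 1.
    if grep -Eqs '^[[:space:]]*Protocol[[:space:]].*1' "$etc/ssh/sshd_config"; then
        echo "WARNING: $etc/ssh/sshd_config still enables SSH protocol 1"
        rc=1
    fi
    # On Xen VMs the ttyu0 getty should be "off", not "onifconsole".
    if grep -Eqs '^ttyu0[[:space:]].*onifconsole' "$etc/ttys"; then
        echo "WARNING: $etc/ttys has ttyu0 set to onifconsole"
        rc=1
    fi
    return $rc
}
```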

3. (Optional) Upgrade your custom kernel.

   If you have a custom kernel config, then you should build a new kernel
   now (it gets installed later, in step B3). As mentioned in the last step,
   this will take a long time because you must build (but not install) the
   entire world before building the kernel. You can again do this on boss
   and ops simultaneously.
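   If you are not sure whether a node runs a custom kernel, FreeBSD's
   "uname -i" prints the ident of the running kernel, i.e. the name of the
   config it was built from. A tiny sketch that turns that into a reminder;
   kernel_ident and its messages are illustrative only:

```shell
# Hypothetical helper: report whether the running kernel is GENERIC.
# Usage: kernel_ident [IDENT]   (defaults to the output of "uname -i")
kernel_ident() {
    # On FreeBSD, "uname -i" prints the ident of the running kernel.
    ident=${1:-$(uname -i)}
    if [ "$ident" = GENERIC ]; then
        echo "GENERIC kernel: skip this step and keep \"kernel\" in Components"
    else
        echo "custom kernel \"$ident\": build it with KERNCONF=$ident"
    fi
}
```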

   Check out the FreeBSD 11.2 source tree (use svnlite from the base system
   if you do not have the subversion package installed):

   cd /usr
   sudo mv src Osrc
   sudo svn checkout -q svn://svn0.us-west.freebsd.org/base/releng/11.2 src
   sudo cp Osrc/sys/amd64/conf/CUSTOM src/sys/amd64/conf/   # your config

   cd src
   sudo make -j 8 buildworld
   sudo make -j 8 buildkernel KERNCONF=CUSTOM

4. Stash away the current set of packages you have installed.

   This will allow you to figure out the extra ports you have installed so
   that you can update them later. First make a list of everything installed.
   Do this on boss and then on ops. For boss:

   mkdir ~/upgrade
   cd ~/upgrade
   pkg query "%n-%v %R" > boss.pkg.list

   This is mostly to keep track of any ports you may have installed locally.
   One way to determine local installs is to see which ports did NOT come
   from the Emulab repository:

   grep -v 'Emulab$' boss.pkg.list | awk '{ print $1; }' > boss.pkg.local

   This will give you the list of packages that you may need to reinstall.

   You may want to check which of these are top-level packages (ones that
   no other installed package depends on) and just reinstall those, since
   installing them will pull their dependencies back in:

   pkg query -x "%n %v usedby=%#r" `cat boss.pkg.local` | \
       grep 'usedby=0' | awk '{ print $1; }' > boss.pkg.reinstall

   Now log in to ops and do the same thing:

   cd ~/upgrade
   pkg query "%n-%v %R" > ops.pkg.list
   grep -v 'Emulab$' ops.pkg.list | awk '{ print $1; }' > ops.pkg.local
   pkg query -x "%n %v usedby=%#r" `cat ops.pkg.local` | \
       grep 'usedby=0' | awk '{ print $1; }' > ops.pkg.reinstall
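   Since the same commands run on both nodes with only the file prefix
   changing, they can be wrapped in a function. A sketch, assuming a POSIX
   shell and that you are already in ~/upgrade; save_pkg_lists is a
   hypothetical name. Unlike the one-liners, it skips the last query when
   there are no local packages, so pkg query is never run without any
   package names.

```shell
# Hypothetical helper: write HOST.pkg.list, HOST.pkg.local and
# HOST.pkg.reinstall in the current directory. Usage: save_pkg_lists boss|ops
save_pkg_lists() {
    host=${1:?usage: save_pkg_lists boss|ops}
    # Everything installed, with the repository it came from.
    pkg query "%n-%v %R" > "$host.pkg.list"
    # Packages that did not come from the Emulab repository.
    grep -v 'Emulab$' "$host.pkg.list" | awk '{ print $1 }' > "$host.pkg.local"
    # Of those, the top-level ones (no other package depends on them).
    if [ -s "$host.pkg.local" ]; then
        pkg query -x "%n %v usedby=%#r" $(cat "$host.pkg.local") |
            grep 'usedby=0' | awk '{ print $1 }' > "$host.pkg.reinstall"
    else
        : > "$host.pkg.reinstall"
    fi
}
```

   Run "save_pkg_lists boss" on boss and "save_pkg_lists ops" on ops.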

B. Updating the base FreeBSD system

1. Shut down the testbed (boss only) and some other services right off
   the bat.

   boss:
     sudo /usr/testbed/sbin/testbed-control shutdown
     sudo /usr/local/etc/rc.d/2.mysql-server.sh stop
     sudo /usr/local/etc/rc.d/apache22 stop
     sudo /usr/local/etc/rc.d/capture stop

   ops:
     sudo /usr/local/etc/rc.d/1.mysql-server.sh stop
     sudo /usr/local/etc/rc.d/apache22 stop
     sudo /usr/local/etc/rc.d/capture stop
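   The same rc.d stop commands come up again after the reboot in B3, so a
   small loop can save some typing. A sketch; stop_services is a
   hypothetical helper, and RCDIR is an override added only so it can be
   tested outside /usr/local/etc/rc.d:

```shell
# Hypothetical helper: stop each named rc.d service, skipping any that
# are not installed. Usage: stop_services NAME...
stop_services() {
    rcdir=${RCDIR:-/usr/local/etc/rc.d}
    for svc in "$@"; do
        if [ -x "$rcdir/$svc" ]; then
            sudo "$rcdir/$svc" stop
        else
            echo "skipping $svc: $rcdir/$svc not found" >&2
        fi
    done
}
```

   For example, after running testbed-control shutdown on boss:

      stop_services 2.mysql-server.sh apache22 capture    # boss
      stop_services 1.mysql-server.sh apache22 capture    # ops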
   
2. Before installing the new binaries/libraries/etc., you might want to back
   up the files that have Emulab changes just in case. Those files are:

     /etc/hosts
     /etc/ntp.conf       # if you have customized it
     /etc/ssh/sshd_config
     /etc/ttys           # if you have configured a serial console

   The easiest thing to do is just:

     sudo cp -rp /etc /Oetc
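   After the install in B3, you can compare just these files against the
   saved copies instead of the whole tree. A sketch, assuming the copy is
   in /Oetc as above; diff_emulab_etc and EMULAB_ETC_FILES are hypothetical
   names, and you should trim the list to the files you actually customized:

```shell
# Hypothetical list of files (relative to /etc) that carry Emulab changes.
EMULAB_ETC_FILES="hosts ntp.conf ssh/sshd_config ttys"

# Hypothetical helper: diff the saved copies against the live ones.
# Returns non-zero if any file differs or is missing.
# Usage: diff_emulab_etc [OLD] [NEW]   (defaults: /Oetc /etc)
diff_emulab_etc() {
    old=${1:-/Oetc}
    new=${2:-/etc}
    rc=0
    for f in $EMULAB_ETC_FILES; do
        echo "=== $f"
        diff -u "$old/$f" "$new/$f" || rc=1
    done
    return $rc
}
```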

3. Install the new system binaries/libraries/etc:

   If it has been more than a day or so since you did the "upgrade"
   command back in step A2, then you might consider doing it again.
   Doing it again basically throws away everything it built up on the
   previous run and you will have to go through all the manual merging
   again. Once you are satisfied, do the install of the new binaries:

    sudo /usr/sbin/freebsd-update install

   After a while it will want you to reboot into the new kernel. Before you
   reboot, if you built a custom kernel back in step A3, install it now:

   cd /usr/src
   sudo make installkernel KERNCONF=CUSTOM

   When I did the custom kernel install, I saw errors of the form:

      kldxref /boot/kernel
      kldxref: unknown metadata record 4 in file atacard.ko
      kldxref: unknown metadata record 4 in file atp.ko
      ...

   They did not seem to affect the following boot.

   NOTE: I have noticed a couple of times on VM-based elabinelab boss/ops
   upgrades that the root filesystem has some issues after the upgrade,
   so it is good to run an fsck. I prefer to do this while shutting down.
   Before you do this, make sure you first have access to the console!

   sudo shutdown now

   This drops the system to single-user mode. Then, at the console shell:

   umount -at nfs
   umount -at ufs
   accton	# turn off accounting that has a file open on /
   mount -o ro -u /
   fsck -y /
   reboot

   NOTE: when boss reboots, mysqlcheck might take a while.

   When it comes back up, you should log in and shut down services that
   restarted, including some that won't work right.

   boss:
     sudo /usr/testbed/sbin/testbed-control shutdown
     sudo /usr/local/etc/rc.d/apache22 stop
     sudo /usr/local/etc/rc.d/2.dhcpd.sh stop
     sudo /usr/local/etc/rc.d/2.mysql-server.sh stop
     sudo /usr/local/etc/rc.d/capture stop

   ops:
     sudo /usr/local/etc/rc.d/apache22 stop
     sudo /usr/local/etc/rc.d/1.mysql-server.sh stop

   and then run freebsd-update again to finish:

    sudo /usr/sbin/freebsd-update install

   NOTE that it will tell you to rebuild all third-party packages and
   run freebsd-update again. We do this later below, so don't worry.

   Now you can compare against the files you saved to make sure all the
   Emulab changes were propagated; e.g.:

     sudo diff -r /Oetc /etc

   Of course, this will show you every change made by the update as well,
   so you might just want to focus on the files listed in B2 above. When
   you are happy:

     sudo rm -rf /Oetc

   LATE BREAKING NEWS: we have noticed that the changes to the password
   file (adding user _ypldap and changing the "games" home directory) don't
   seem to be reflected, as if the .db files were not properly recreated.
   You can tell if "echo ~games" shows "/usr/games" instead of "/". Remake
   the DB files to be certain:

     sudo pwd_mkdb -p /etc/master.passwd

   If you don't get this sorted out now, it may cause problems when you
   add users to the testbed later. In particular, when adding user "foo"
   it might spit out messages:

     pw: user 'foo' disappeared during update
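   A check for the two symptoms described above, to see whether the rebuild
   is needed (or whether it worked). A sketch; passwd_db_stale is a
   hypothetical helper, and its optional arguments exist only so the logic
   can be exercised without touching the real password databases:

```shell
# Hypothetical helper: succeed (return 0) if the password databases look
# stale, i.e. ~games is not "/" or the _ypldap user is unknown.
# Usage: passwd_db_stale [GAMES_HOME] [USER]
passwd_db_stale() {
    games_home=${1:-$(echo ~games)}
    user=${2:-_ypldap}
    [ "$games_home" != "/" ] && return 0
    id "$user" >/dev/null 2>&1 || return 0
    return 1
}
```

   For example: passwd_db_stale && sudo pwd_mkdb -p /etc/master.passwd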

4. The mothership may need some additional local hacks to some standard
   utilities, in particular "mountd" and "pw". Both should have a patch
   in the Emulab source tree patches subdir.

5. How did that work out for ya?

   If all went well, skip to C (Updating ports/packages).

   If that didn't work, see ~mike/upgrade-from-10.0.txt and follow steps
   A1 - A10. Return here for upgrading your ports.


C. Updating ports/packages

   Updating the core ports from 10.3 to 11.2 is pretty easy. However, if
   you installed extra ports, that will require a bit more work.

0. If you forgot to save off your package info back in A4, or it has been
   a while, then you might want to go back and do that now.

1. Modify your /etc/pkg/Emulab.conf file, replacing "10.3" with "11.2" in
   the "url" line:

      sudo sed -i .bak -e 's;/10.3/;/11.2/;' /etc/pkg/Emulab.conf
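   To confirm the edit took (the sed leaves the old file as
   Emulab.conf.bak), a check like this may help. A sketch;
   check_pkg_repo is a hypothetical helper:

```shell
# Hypothetical helper: verify the Emulab pkg repo config points at 11.2.
# Prints the url line on success. Usage: check_pkg_repo [CONF]
check_pkg_repo() {
    conf=${1:-/etc/pkg/Emulab.conf}
    if grep -q '/10\.3/' "$conf"; then
        echo "$conf still refers to 10.3" >&2
        return 1
    fi
    if ! grep -q '/11\.2/' "$conf"; then
        echo "$conf does not mention 11.2" >&2
        return 1
    fi
    grep 'url' "$conf"
}
```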

2. Unlock the pkg tool and install new packages:

    sudo pkg unlock pkg
    sudo -E ASSUME_ALWAYS_YES=true pkg upgrade -r Emulab

3. Tweak package installs:

   REALLY, REALLY IMPORTANT: at some point, the perl port stopped installing
   the /usr/bin/perl link, which we use heavily in Emulab scripts. Ditto for
   python and the /usr/local/bin/python link. Make sure those two symlinks
   exist, e.g.:

      ls -la /usr/bin/perl /usr/local/bin/python

   If not, get them back with:

      sudo ln -sf /usr/local/bin/perl5 /usr/bin/perl
      sudo ln -sf /usr/local/bin/python2 /usr/local/bin/python
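   An idempotent version of the check-and-fix above, which only creates a
   link when it is missing or dangling. A sketch; ensure_link is a
   hypothetical helper, and for these system paths it must run as root
   (for example, paste it into a shell started with "sudo -s"):

```shell
# Hypothetical helper: make LINK a symlink to TARGET unless LINK already
# exists. Usage: ensure_link TARGET LINK
ensure_link() {
    target=$1
    link=$2
    if [ -e "$link" ]; then
        # Already present (a dangling symlink fails -e and is recreated below).
        echo "ok: $link"
    elif [ -x "$target" ]; then
        ln -sf "$target" "$link" && echo "created: $link -> $target"
    else
        echo "cannot link $link: $target is missing" >&2
        return 1
    fi
}
```

   For example:

      ensure_link /usr/local/bin/perl5 /usr/bin/perl
      ensure_link /usr/local/bin/python2 /usr/local/bin/python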

   REALLY, REALLY IMPORTANT PART 2: Because perl changed, you will need
   to make sure that the event library SWIG-generated perl module is rebuilt,
   and then all the event clients. Otherwise you will get bus errors when
   they all try to start. So do not skip step E2 below!

   REALLY, REALLY IMPORTANT PART 3: For those with Moonshot chassis,
   you cannot use an ipmitool port *newer* than 1.8.15 due to issues with
   "double bridged" requests. Either ipmitool or HPE got it wrong, but as of
   ipmitool commit 6dec83ff (Sat Jul 25 13:15:41 2015) the chassis does not
   behave like ipmitool expects. Anyway, you will need to replace the
   standard ipmitool install with the "emulab-ipmitool-old-1.8.15_1"
   package from the emulab repository:

     sudo pkg delete ipmitool
     sudo pkg install -r Emulab emulab-ipmitool-old

   But ONLY do this if you have Moonshot chassis.

4. Fix apache setup.

   We upgraded from apache 2.2 to apache 2.4 this go round, so you will have
   to tweak your /etc/rc.conf file, changing any instances of "apache22" to
   "apache24", and copy over your certificates from the 2.2 install. The
   {2,4} shorthand below relies on brace expansion, so run the cp commands
   from csh/tcsh or bash rather than /bin/sh:

     sudo sed -i .bak -e 's;apache22;apache24;g' /etc/rc.conf
     sudo cp -r /usr/local/etc/apache2{2,4}/ssl.crt
     sudo cp -r /usr/local/etc/apache2{2,4}/ssl.key
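   A quick check that both parts took effect. A sketch; check_apache24 is
   a hypothetical helper, and it assumes ssl.crt and ssl.key are
   directories, as the cp -r above implies:

```shell
# Hypothetical helper: confirm rc.conf no longer mentions apache22 and the
# certificate directories exist for apache24.
# Usage: check_apache24 [RCCONF] [ETCDIR]
check_apache24() {
    rcconf=${1:-/etc/rc.conf}
    etc=${2:-/usr/local/etc}
    rc=0
    if grep -q apache22 "$rcconf"; then
        echo "$rcconf still mentions apache22"
        rc=1
    fi
    for d in ssl.crt ssl.key; do
        if [ ! -d "$etc/apache24/$d" ]; then
            echo "missing $etc/apache24/$d"
            rc=1
        fi
    done
    return $rc
}
```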

   This is just the first part. One more thing will need to be done later
   when updating the Emulab software.

5. Reinstall local ports.

   To find ports that are installed but that are not part of the Emulab
   repository (use ops.pkg.reinstall when doing this on ops):

   pkg query "%t %n-%v %R" `cat boss.pkg.reinstall` |\
       grep -v Emulab | sort -n

   These will be sorted by install time, oldest first. Old ones are
   candidates to reinstall with "pkg install", but being old does not by
   itself mean a package needs to be reinstalled.
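   One heuristic for narrowing the list: compare install times against that
   of pkg itself, which was presumably reinstalled by the upgrade in step C2,
   so anything older was not touched by the upgrade. A sketch under that
   assumption; stale_local_pkgs is a hypothetical helper:

```shell
# Hypothetical helper: list non-Emulab packages named in FILE that were
# installed before the pkg package itself (assumed upgraded in step C2).
# Usage: stale_local_pkgs boss.pkg.reinstall   (or ops.pkg.reinstall)
stale_local_pkgs() {
    list=${1:?usage: stale_local_pkgs FILE}
    [ -s "$list" ] || return 0
    cutoff=$(pkg query %t pkg)
    # %t is the install timestamp in seconds, so a numeric compare works.
    pkg query "%t %n-%v %R" $(cat "$list") |
        awk -v cutoff="$cutoff" '$1 < cutoff && $3 != "Emulab" { print $2 }'
}
```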

6. Update your /etc/make.conf file in case you need to build a port from
   source in the future. Make sure your DEFAULT_VERSIONS line(s) look
   like:

   DEFAULT_VERSIONS=perl5=5.26 python=2.7 php=5.6 mysql=5.7 apache=2.4 tcltk=8.6
   DEFAULT_VERSIONS+=ssl=base

D. Repeat steps B and C for ops.

E. Update Emulab software

1. Make sure your Emulab sources are up to date.

   You must use the emulab-devel repository at this point, as only it has
   the necessary changes to support FreeBSD 11.2. If you don't already
   have an emulab-devel repo, clone it with:

   git clone git://git-public.flux.utah.edu/emulab-devel.git
     or
   git clone http://git-public.flux.utah.edu/git/emulab-devel.git

   Make sure to copy over your existing defs-* file to the new source
   tree.

2. Reconfigure, rebuild, and reinstall the software.

   You want everything to be built against the new ports and libraries
   anyway, so just rebuild and install everything.

   For this upgrade, you will also need to reinstall apache config files
   and move over the certs.

   In your build tree, look at config.log to see how it was configured
   and then:

      # on both (in different build trees!)
      cd <builddir>
      head config.log	# see what the configure line is
      sudo rm -rf *
      <run the configure line>

      # on ops -- do this first
      sudo gmake opsfs-install
      cd apache ; sudo gmake control-install

      # on boss -- do this after ops
      sudo /usr/local/etc/rc.d/2.mysql-server.sh start
      sudo gmake all boss-install
      cd apache ; sudo gmake install
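   Because "sudo rm -rf *" in the block above also deletes config.log, it
   helps to capture the configure command before wiping the build tree.
   Autoconf records the invocation in config.log on a line starting with
   "  $ ". A sketch; configure_cmd is a hypothetical helper:

```shell
# Hypothetical helper: print the configure command recorded in an autoconf
# config.log (the line of the form "  $ <command>").
# Usage: configure_cmd [LOGFILE]
configure_cmd() {
    sed -n 's/^  \$ //p' "${1:-config.log}" | head -n 1
}
```

   For example:

      cd <builddir>
      cmd=$(configure_cmd) && echo "$cmd"    # check it looks right
      sudo rm -rf *
      eval "$cmd"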

   The reason for the ops install is that, while boss-install updates
   most of the ops binaries/libraries via NFS, there are some that it
   doesn't. So by doing a separate build/install on ops, you are
   guaranteed to catch everything.

   If the boss install tells you that there are updates to install,
   run the command like it says:

      sudo gmake update-testbed
      
   This will actually turn the testbed back on at the end, so you will
   not have to do #3 below. Note also that this command may take a while
   and provide no feedback.

3. Re-enable the testbed on boss.

   sudo /usr/local/etc/rc.d/apache24 start    # apache22 was replaced in C4
   sudo /usr/local/etc/rc.d/2.dhcpd.sh start
   sudo /usr/testbed/sbin/testbed-control boot

4. Re-run freebsd-update to remove old shared libraries.

   Now that everything has been rebuilt:

   sudo freebsd-update install

5. Reboot boss and ops again!

   NOTE: if you reboot ops after boss, you may need to restart all the
   event schedulers from boss:

   sudo /usr/testbed/sbin/eventsys_start

F. Update the MFSes

   This is not strictly part of updating the OS, but it would be good to
   do this if you have not for a while. See the instructions in
   ops.emulab.net:~mike/upgrade-mfs.txt.