Update Using the Recovery MFS, authored by David Johnson
Using the Linux Recovery MFS
============================
The Linux Recovery MFS is a memory-based, network-booted environment
that can be used to examine and/or fix a non-booting experiment node.
Booting into the Recovery MFS
-----------------------------
When looking at your Emulab/CloudLab/POWDER experiment status page, you
can click on the troubled node, and select the `Recovery` option from
the popup menu. You will then be given more information and a `Confirm`
button in a dialogue that pops up. Click `Confirm`, and your node will
be rebooted and/or power cycled, and then network-booted into this
environment. Your account and ssh public keys will be installed as the
environment boots and configures, so once the Recovery MFS has booted,
log in to your node via `ssh`. Note that we do not install account
passwords, so you cannot log in on the serial console as your own uid; but
you can log in to the console via the root account and its random
password. This random password is available once you've opened the
console from the experiment status page.
Using the Recovery MFS
----------------------
There are many reasons why your node may not boot. We'll cover the most
common ones in this document, but you may have to do some of
your own research based on the information here. It's best if you can
look at the console logs for your node (an option on the experiment
status page), and use any error information or timeouts to point you to
the right solution. Consider also what packages you may have
installed---note carefully whether you upgraded the kernel or grub.
Because the Linux MFS is network-booted via PXE/TFTP, it is optimized
for space. It has a carefully-chosen set of utilities that will help
you diagnose and repair problems related to your root filesystem and
bootloader. Its Linux kernel configuration is small, so it may not have
support for all things you might want to try, even if you've used
`chroot` to enter your root filesystem and run programs (and the goal of
this guide is to get you back up and running your own kernel and on-disk
environment, in any case).
It is non-trivial to *add* software to the MFS, due to its optimization
for space. The MFS is based on `buildroot`, and we use `uclibc` to
provide the standard C library, and customize its configuration for
space. This means that (at best) you should only expect
statically-linked executables to run in the MFS, and only if those
executables were built against something approximating the Linux ~4.14
ABI. If your statically-linked binary depends at all on the GNU libc
libraries (e.g. if your glibc was not built with `--enable-static-nss` and
your program needed NSS support), you won't be able to use it in this
context. The best solution would be to build your software against our
build environment, but this is non-trivial, so we don't currently
support it.
Finding the Partition Containing the Root Filesystem
----------------------------------------------------
We have two primary standard Linux image types. The newest are GPT-based, dual BIOS/UEFI-bootable (https://gitlab.flux.utah.edu/emulab/emulab-devel/-/wikis/Dual-boot-UEFI-BIOS-Images), and place the rootfs in the 3rd partition. Older BIOS bootable-only images have an MBR partition table, and place the rootfs in the 1st partition.
Our newer dual BIOS/UEFI-bootable images install both a UEFI loader in the EFI System Partition (GPT partition 1) and a grub loader in the MBR (note also the presence of the BIOS boot partition, GPT partition 2). Older BIOS-bootable images require that the bootloader (typically grub) be installed to
the first partition, alongside the root filesystem, *not* to the MBR.
You can list the detected filesystems and non-empty partitions by
running the `blkid` command. For instance, `/dev/sda1` holds the root
filesystem in the output below:
```
# blkid
/dev/sda1: UUID="5fac4c69-51f6-47af-9773-a0d722426942" TYPE="ext3"
/dev/sda3: UUID="71e2e601-c555-466b-b925-e9d672f72835" TYPE="swap"
```
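If `blkid` reports many devices, a small filter can narrow the list to likely root-filesystem candidates. The sketch below assumes the `DEVICE: KEY="value" ...` line format shown above and that `sed` is available in the MFS; `list_ext_candidates` is a hypothetical helper name, not something we ship.

```shell
# Print the device names of ext2/ext3/ext4 filesystems found in
# blkid-style output read from stdin.
# (list_ext_candidates is a hypothetical helper, not part of the MFS.)
list_ext_candidates() {
    sed -n 's/^\([^:]*\):.* TYPE="ext[234]".*/\1/p'
}

# On the node itself you might run:
#   blkid | list_ext_candidates
```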
On nodes with NVMe devices, the root partition would instead be named
`/dev/nvme0n1p1`, or similar. Do not assume that `/dev/sda1` is the
root.
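
The naming difference follows a general Linux rule: when the whole-disk name ends in a digit, a `p` is placed before the partition number. As a rough sketch, the hypothetical helper `partition_path` below builds a partition path from a disk path and a partition number; the disk names in the comments are examples only.

```shell
# Build a partition device path from a whole-disk path and a partition number.
# Linux inserts a "p" before the number when the disk name ends in a digit
# (as NVMe namespace names do). partition_path is a hypothetical helper.
partition_path() {
    disk=$1
    num=$2
    case "$disk" in
        *[0-9]) echo "${disk}p${num}" ;;
        *)      echo "${disk}${num}" ;;
    esac
}

# Per the layouts described above, the root filesystem is partition 3 on the
# newer GPT images and partition 1 on the older MBR images:
#   partition_path /dev/sda 1        # older MBR image, SATA/SAS disk
#   partition_path /dev/nvme0n1 3    # newer GPT image, NVMe disk
```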
Checking for Root Filesystem Errors
-----------------------------------
If your node will not boot and you see on the console that the root
filesystem failed to mount due to unfixable errors, you will need to use
the `fsck` program to correct these errors:
```
# fsck /dev/sda1
fsck 1.44.1 (24-Mar-2018)
e2fsck 1.44.1 (24-Mar-2018)
/dev/sda1: clean, 124348/1048576 files, 659357/4194304 blocks
```
You can run `fsck -h` to learn more about specific options, depending on
how badly your filesystem may be damaged.
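
`fsck` and `e2fsck` report their result through the exit status, as a sum of bit flags documented in their manual pages. The sketch below decodes such a status into words; `describe_fsck_status` is a hypothetical helper name. The usage comment shows `e2fsck -n`, which opens the filesystem read-only and answers "no" to every question, so it is a safe first pass before attempting repairs.

```shell
# Translate an fsck/e2fsck exit status (a sum of bit flags) into text.
# describe_fsck_status is a hypothetical helper, not part of the MFS.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in "1:errors corrected" "2:reboot recommended" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:canceled by user" \
                "128:shared-library error"; do
        bit=${pair%%:*}
        text=${pair#*:}
        if [ $((status & bit)) -ne 0 ]; then
            out="${out:+$out, }$text"
        fi
    done
    echo "$out"
}

# Example (replace /dev/sda1 with your root partition):
#   e2fsck -n /dev/sda1; describe_fsck_status $?
```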
Checking Filesystem Metadata
----------------------------
Our standard Linux images' root partitions are formatted using `ext3`
or `ext4`. You can use a program called `tune2fs` to view (and change)
metadata and filesystem options. Be careful when using `tune2fs`---know
what you are doing before using it to make changes! Here is an example
of listing root filesystem metadata on the root partition of our
`UBUNTU18-64-STD` image:
```
# tune2fs -l /dev/sda1
tune2fs 1.44.1 (24-Mar-2018)
Filesystem volume name: <none>
Last mounted on: /
Filesystem UUID: 5fac4c69-51f6-47af-9773-a0d722426942
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1048576
Block count: 4194304
Reserved block count: 209715
Free blocks: 3534947
Free inodes: 924228
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1023
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Filesystem created: Wed May 9 17:38:11 2018
Last mount time: Thu Mar 5 09:19:03 2020
Last write time: Thu Mar 5 10:05:48 2020
Mount count: 140
Maximum mount count: -1
Last checked: Wed May 9 17:38:11 2018
Check interval: 0 (<none>)
Lifetime writes: 280 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 91b8dc1a-0f8b-4236-8b4e-a6bfe93db18b
Journal backup: inode blocks
```
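When you only care about one or two values, such as `Filesystem state` or `Last mount time`, it can help to pull a single field out of the listing. The following is a minimal sketch, assuming `awk` is available in the MFS and that each line has the `Label: value` shape shown above; `tune2fs_field` is a hypothetical helper name.

```shell
# Print the value of one field from `tune2fs -l` output read on stdin.
# $1 is the label exactly as printed before the colon.
# (tune2fs_field is a hypothetical helper, not part of the MFS.)
tune2fs_field() {
    awk -v key="$1" '
        index($0, key ":") == 1 {
            value = substr($0, length(key) + 2)
            sub(/^[ \t]+/, "", value)   # drop the padding after the colon
            print value
            exit
        }'
}

# Example (replace /dev/sda1 with your root partition):
#   tune2fs -l /dev/sda1 | tune2fs_field "Filesystem state"
```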
Mounting the Root Filesystem
----------------------------
Once you've verified that your root filesystem is intact, you'll want to
`mount` it at a mountpoint (typically `/mnt`). As described above, our
standard Linux image root partitions are formatted with the `ext3` or
`ext4` filesystems. The Recovery MFS does not support all filesystem
types, so YMMV if you're trying to examine a different kind of
filesystem. A straightforward use of `mount` should work (of course
change `/dev/sda1` to the device that hosts your root filesystem):
```
mount /dev/sda1 /mnt
```
If `mount` complains, you may need to manually specify the filesystem
type (and/or first run `fsck` on the partition as described above):
```
mount -t ext3 /dev/sda1 /mnt
```
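Rather than guessing the type, you can read it from `blkid`. The sketch below extracts the `TYPE` value from a `blkid` output line; `blkid_type` is a hypothetical helper name, and the read-only mount in the usage comment is a suggestion for a cautious first look, not a required step.

```shell
# Print the TYPE value from a blkid-style line read on stdin.
# (blkid_type is a hypothetical helper, not part of the MFS.)
blkid_type() {
    sed -n 's/.* TYPE="\([^"]*\)".*/\1/p'
}

# Example (replace /dev/sda1 with your root partition):
#   fstype=$(blkid /dev/sda1 | blkid_type)
#   mount -t "$fstype" -o ro /dev/sda1 /mnt
```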
Make sure you haven't already mounted something else at `/mnt`. If you
need more than one partition mounted at once, you'll need to create more
directories to serve as additional mountpoints. Always make sure to
`umount` *everything* you mount before rebooting or exiting the Recovery
MFS!
At this point, you can browse filesystem contents and make any necessary
changes. For instance, you might want to view the syslog:
```
less /mnt/var/log/messages
```
If you forget whether or where you've mounted the root filesystem, you
can simply invoke `mount` with no options:
```
mount
```
The final line should tell you if and where you've mounted the root
filesystem.
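
If the listing is long, you can ask about one device directly. This sketch assumes the kernel's mount table is readable at `/proc/mounts` in the MFS (it is on standard Linux systems) and that `awk` is available; `mountpoint_of` is a hypothetical helper name.

```shell
# Print every mountpoint of a device, given /proc/mounts-style lines on stdin.
# Prints nothing if the device is not mounted.
# (mountpoint_of is a hypothetical helper, not part of the MFS.)
mountpoint_of() {
    awk -v dev="$1" '$1 == dev { print $2 }'
}

# Example (replace /dev/sda1 with your root partition):
#   mountpoint_of /dev/sda1 < /proc/mounts
```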
Using `chroot` to Run On-Disk Programs
--------------------------------------
Once you have mounted your root partition, you may want or need to run
on-disk programs to make repairs. To do this, you need to mount
additional special filesystems and files into the mountpoint containing
your root filesystem. The details of these special filesystems are
beyond the scope of this document, and not all programs require all of
these filesystems to be mounted; other more comprehensive guides are
easy to find. Not all on-disk software will run reliably in this
configuration, but most tools will.
Again, make sure you already have your root filesystem
mounted at `/mnt`. Then:
```
mount -o bind /proc /mnt/proc
mount -o bind /dev /mnt/dev
mount -o bind /dev/pts /mnt/dev/pts
mount -o bind /sys /mnt/sys
# If you need internet access and name resolution:
mount -o bind /etc/resolv.conf /mnt/etc/resolv.conf
# Or, if the above command fails because /mnt/etc/resolv.conf
# is a symlink to the systemd stub resolver, try
mount -o bind /etc/resolv.conf /mnt/run/systemd/resolve/stub-resolv.conf
# Change your root to be the root of the on-disk filesystem:
chroot /mnt /bin/bash
# Run whatever programs you need, such as apt-get or grub-install.
# e.g., $ apt-get update && apt-get install ...
# e.g., $ grub-install --force /dev/sda1
# Exit the chroot environment
exit
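# Unmount special filesystems (in reverse order from above).
# If you bind-mounted onto stub-resolv.conf instead, unmount that path
# in place of /mnt/etc/resolv.conf:
umount /mnt/etc/resolv.conf /mnt/sys /mnt/dev/pts /mnt/dev /mnt/proc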
# Unmount the root filesystem:
umount /mnt
```
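If your root filesystem is mounted somewhere other than `/mnt`, it is easy to get the unmount order wrong. As an illustration only, the hypothetical helper `print_chroot_plan` below prints the bind mounts for a given mountpoint, followed by the matching `umount` commands in reverse order. It only prints commands for you to review and run by hand, and it leaves out the optional `resolv.conf` bind mount.

```shell
# Print the bind-mount, chroot, and unmount commands for a mounted root
# filesystem. Nothing is executed; review the output and run it by hand.
# (print_chroot_plan is a hypothetical helper, not part of the MFS.)
print_chroot_plan() {
    root=${1:-/mnt}
    mounts="/proc /dev /dev/pts /sys"
    undo=""
    for m in $mounts; do
        echo "mount -o bind $m $root$m"
        # Prepend, so the unmount list ends up in reverse mount order.
        undo="umount $root$m
$undo"
    done
    echo "chroot $root /bin/bash"
    printf '%s' "$undo"
}

# Example:
#   print_chroot_plan /mnt
```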
Reinstalling the Bootloader (grub2)
-----------------------------------
If you don't see any messages from the on-disk bootloader (usually
`grub2`) in the console log of your failing node, you may well have
broken your bootloader installation or its configuration. As mentioned
earlier, the bootloader for our older BIOS-bootable standard Linux images *must* be
installed to the first partition of the boot device, and this partition
also contains the root filesystem. Do not reinstall to the MBR on these images; this
will not help you unless your disk image is a whole-disk image, and this
is almost never the case.
We do not provide `grub2` binaries in the MFS, so you will need to use
the `chroot` strategy above to run `grub-install` or `grub2-install`
from within your on-disk root filesystem, once you've mounted it as
described above. Once you've entered the `chroot` environment, and know
the device containing your root filesystem, you can run:
```
grub-install --force /dev/sda1
```
(Replace `/dev/sda1` with the path of the partition containing your root
filesystem.)
Make sure to `umount` special filesystems and the root filesystem when
you're finished, before rebooting!
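
Which command name exists depends on the distribution installed on disk. Inside the `chroot`, a sketch like the following picks whichever one is present; `find_grub_install` is a hypothetical helper name, and the partition in the usage comment is only an example.

```shell
# Print the path of grub-install or grub2-install, whichever is found first
# in PATH. Returns non-zero if neither exists.
# (find_grub_install is a hypothetical helper, not part of any image.)
find_grub_install() {
    for candidate in grub-install grub2-install; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    echo "no grub-install or grub2-install found in PATH" >&2
    return 1
}

# Example, inside the chroot (replace /dev/sda1 with your root partition):
#   grub=$(find_grub_install) && "$grub" --force /dev/sda1
```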
Advanced Filesystem Configurations
----------------------------------
The Recovery MFS does provide LVM (e.g., `lvdisplay` et al.) and software
RAID (e.g., `dmraid`) tools, so you could use them to examine, expose,
and mount logical volumes or software RAID devices---but these advanced
configurations are beyond the scope of this document. You will need to
read and understand either toolset to proceed.
Recovering Non-Linux Filesystems
--------------------------------
If your node is running FreeBSD rather than Ubuntu or CentOS Linux, then the Linux Recovery MFS won't help you right now. Contact testbed-ops for help.