Commit 95e7bded authored by Leigh Stoller

Working Mellanox user alloc switch support (issue #445):

* The primary problem with the Mellanox is that the install image does a
  kexec out of ONIE into Linux, spends 30+ minutes doing stuff, and then
  reboots. This throws the reload state machine out of whack because we
  do not get a chance to send the RELOADDONE state. So ... some changes
  to rc.testbed and rc.reload on the USB dongle: the ONIE MFS sends
  RELOADING and writes a flag file to the ONIE partition on the
  "disk" (not the USB). Then comes the kexec into MLNX, the install
  happens, and the switch reboots. The next boot into ONIE sees the flag
  file, erases it, and sends RELOADDONE. It waits for a bit, and then
  continues on the normal path. This abuses stated in that there are
  whiny messages in the stated log file, but I am immune to stated
  whining.

* Another item of note is that the switch DHCPs, but only to get the IP
  info; there is no ability to give it an initial config file like we
  can with the Dell switches. The main problem here is that the switch
  comes up with its default login/password, which is obviously well known
  because it's in the manual. That means there is a window where the
  switch is vulnerable, but since we block the switches from the public
  side, this is not a serious problem. As soon as we can get in (sshd is
  running) we log in and update the config with passwords, keys,
  etc.

* Other changes to the machine dependent osload library module. I had
  done some of this before switching to the Dells way back when, but it
  needed to be updated/completed.
parent 11074445
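
The marker-file handshake described in the first bullet can be tried out without a switch. The following is only a sketch under stated assumptions: `send_state`, `before_install`, and `after_reboot` are hypothetical names, `send_state` is a stub standing in for `tmcc state`, and a temporary directory stands in for the ONIE partition. It is not the code from rc.reload or rc.testbed.

```shell
#!/bin/sh
# Simulation of the marker-file handshake. All names below are hypothetical.

MARKER_DIR=$(mktemp -d)            # stands in for /mnt/onie-boot
MARKER="$MARKER_DIR/mlnxreload"    # stands in for the MLNXRELOAD flag file
STATES=""                          # states "sent" so far, space separated

# Stub for "tmcc state <STATE>": record the state instead of sending it.
send_state() {
    STATES="${STATES:+$STATES }$1"
}

# First ONIE boot: report RELOADING and drop the marker. On the real
# switch the installer would now kexec away and never return here.
before_install() {
    send_state RELOADING
    touch "$MARKER" || return 1
}

# Any later ONIE boot: if the marker survived the install, consume it
# and report RELOADDONE; otherwise do nothing special.
after_reboot() {
    if [ -e "$MARKER" ]; then
        rm -f "$MARKER"
        send_state RELOADDONE
    fi
}

before_install
after_reboot    # boot right after the install: marker present
after_reboot    # an ordinary later boot: marker gone, nothing sent

echo "states sent: $STATES"
rm -rf "$MARKER_DIR"
```

Because the marker is erased before RELOADDONE is reported, a later ordinary boot does not report it a second time.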
......@@ -25,6 +25,12 @@
# The device is always the same in ONIE.
DISKDEV=/dev/sda
# Special grub env file for our flipping.
EMULABENV=/mnt/onie-boot/emulabenv
# Marker file for MLNX reload. See below.
MLNXRELOAD=/mnt/onie-boot/mlnxreload
if [ -r /etc/emulab/paths.sh ]; then
. /etc/emulab/paths.sh
else
......@@ -33,8 +39,8 @@ else
ETCDIR=/etc/testbed
fi
PLATFORM=`onie-sysinfo -b`
TMCC="$BINDIR/tmcc"
BOSSNAME=`$TMCC bossinfo | cut -d ' ' -f 2`
#
......@@ -112,9 +118,25 @@ handle_loadinfo()
zap_flash
# See ./rc.testbed for an explanation.
if [ "$PLATFORM" = "mlnx_x86" ]; then
/bin/touch $MLNXRELOAD
rc=$?
if [ $rc -ne 0 ]; then
echo "Failed to create $MLNXRELOAD"
return 1
fi
fi
write_image $IMAGEPATH || {
return 1
}
if [ "$PLATFORM" = "mlnx_x86" ]; then
# Ah, we loaded an image that does not kexec, so we returned.
/bin/rm -f $MLNXRELOAD
fi
echo "Image load complete at `date`"
return 0
}
......
......@@ -28,6 +28,9 @@ DISKDEV=/dev/sda
# Special grub env file for our flipping.
EMULABENV=/mnt/onie-boot/emulabenv
# Marker file for MLNX reload. See below.
MLNXRELOAD=/mnt/onie-boot/mlnxreload
if [ -r /etc/emulab/paths.sh ]; then
. /etc/emulab/paths.sh
else
......@@ -36,8 +39,19 @@ else
ETCDIR=/etc/testbed
fi
PLATFORM=`onie-sysinfo -b`
TMCC="$BINDIR/tmcc"
# Make sure this exists.
if [ ! -s $EMULABENV ]; then
grub-editenv $EMULABENV create
rc=$?
if [ $rc -ne 0 ]; then
echo "Failed to create new grub env"
exit 1
fi
fi
#
# Extract a variable of interest from the VAR=VALUE string and return value.
# If variable does not exist, return the given default (if provided).
......@@ -127,21 +141,13 @@ boot_nos()
{
echo "Setting up to boot the NOS"
if [ ! -s $EMULABENV ]; then
grub-editenv $EMULABENV create
rc=$?
if [ $rc -ne 0 ]; then
echo "Failed to create new grub env"
return 1
fi
fi
grub-editenv $EMULABENV set bootnos=yes
rc=$?
if [ $rc -ne 0 ]; then
echo "Failed to update grub env with bootnos=yes"
return 1
fi
# Tell boss we are booting into reload MFS.
# Tell boss we are booting into the NOS.
$TMCC state BOOTING
sleep 5
......@@ -150,6 +156,23 @@ boot_nos()
exit 0;
}
#
# Special case; we just did a reload on an MLNX switch that did a kexec
# did the install and then rebooted. So we get here and this file exists
# (see rc.reload). Remove the file, send the RELOADDONE event and keep
# going.
#
if [ "$PLATFORM" = "mlnx_x86" ]; then
if [ -e $MLNXRELOAD ]; then
/bin/rm -f $MLNXRELOAD
echo "sending RELOADDONE"
$TMCC state RELOADDONE
echo "waiting a bit for server to react"
sleep 15
fi
fi
#
# We might need to wait for something to do, so loop.
#
......
......@@ -85,16 +85,11 @@ sub createExpectObject($$)
return -1
if (!defined($admin_pswd));
if (0 && !exists($INC{'libtblog.pm'})) {
close(SOUT);
close(SERR);
print "Closing SOUT\n";
}
$self->dprint(0,"$self createExpectObject($node_id):\n");
# Host keys change every reload, do not want to save them.
my $spawn_cmd = "ssh -o userknownhostsfile=/dev/null ".
"-l $admin_user $node_id";
$self->dprint(0,"$self createExpectObject($node_id): $spawn_cmd\n");
# Create Expect object and initialize it:
my $exp = new Expect();
......
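
The commit message says the password update happens "as soon as we can get in (sshd is running)". One way to express that kind of waiting is a bounded retry loop. This is a sketch in shell rather than the Perl used by the library module, and it is not taken from the commit: `wait_until` and `fake_sshd_probe` are hypothetical names, and the probe is a stub that succeeds on its third call instead of touching the network.

```shell
#!/bin/sh
# Bounded retry loop. wait_until and fake_sshd_probe are hypothetical.

# wait_until MAX_TRIES DELAY CMD [ARGS...]
# Run CMD until it succeeds or MAX_TRIES attempts have been made,
# sleeping DELAY seconds after each failed attempt.
wait_until() {
    max=$1
    delay=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$max" ]; do
        tries=$((tries + 1))
        if "$@"; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Stub probe: pretends sshd starts answering on the third attempt.
PROBE_CALLS=0
fake_sshd_probe() {
    PROBE_CALLS=$((PROBE_CALLS + 1))
    [ "$PROBE_CALLS" -ge 3 ]
}

if wait_until 5 0 fake_sshd_probe; then
    RESULT=up
else
    RESULT=timeout
fi
echo "$RESULT after $PROBE_CALLS probes"
```

In a real deployment the stub would be replaced by an actual reachability check against the switch, and the delay would be nonzero. The bound on attempts keeps the caller from hanging forever if the switch never comes up.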
This diff is collapsed.