1. 14 Nov, 2016 7 commits
    • Ignore state transitions from NORMALv2/ISUP -> BOOTING. · 990fcc0b
      Mike Hibler authored
      In the !BOOTINFO_EVENTS world, someone making a random DHCP request would
      cause a state transition to BOOTING, which started a timeout; that timeout
      would most likely expire within a couple of minutes and reboot the node.
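The fix can be pictured as an ignore list consulted before a transition is applied. The Python sketch below is a hypothetical model, not the actual state machine code: the `Node` class, the `IGNORED` set and the `timeout_armed` flag are illustrative names, and it assumes the transition is dropped unconditionally, as the commit title suggests.

```python
# Hypothetical ignore list: (op_mode, current state, requested state).
IGNORED = {("NORMALv2", "ISUP", "BOOTING")}

class Node:
    def __init__(self, op_mode, state):
        self.op_mode = op_mode
        self.state = state
        self.timeout_armed = False

    def transition(self, new_state):
        """Apply new_state unless the transition is on the ignore list."""
        if (self.op_mode, self.state, new_state) in IGNORED:
            # e.g. a stray DHCP request from a node that is already up
            return False
        self.state = new_state
        # Entering BOOTING arms a timeout; if it expires the node is rebooted.
        self.timeout_armed = new_state == "BOOTING"
        return True

node = Node("NORMALv2", "ISUP")
print(node.transition("BOOTING"), node.state)   # False ISUP
```

Other transitions into BOOTING (say, from a node that is actually rebooting) still arm the timeout in this model.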
    • Fix obscure error with (not) invalidating disks during reload. · 4925a276
      Mike Hibler authored
      Our MBR/superblock/LVM/ZFS smashing code in rc.frisbee relied on dmesg
      output to determine which local disks to call zapdisk on. However, the
      regular expression we used assumed well-ordered output like:
      
        da0 at mpt0 bus 0 scbus0 target 0 lun 0
        da0: <ATA WDC WD5003ABYZ-0 1S03> Fixed Direct Access SPC-3 SCSI device
        da0: Serial Number      WD-WMAYP0DPNFLM
        da0: 300.000MB/s transfers
        da0: Command Queueing enabled
        da0: 476940MB (976773168 512 byte sectors)
      
      where we matched on that last line. However, because disk initialization is
      asynchronous (and probably because of some soon-to-be-failing disks on the
      d710s), the last line was delayed and came out interleaved with the da1 output:
      
        da1: <ATA WDC WD5003ABYX-1 1S02> Fixed Direct Access SPC-3 SCSI device
        da1: Serial Number      WD-WMAYP4939538
        da1: 300.000MB/s transfersda0: 476940MB (976773168 512 byte sectors)
      
      so we didn't see da0 and didn't call zapdisk on it. As a result, LVM
      metadata on /dev/sda4 leaked through to a new experiment, and if that
      experiment tried to set up LVM (e.g., a vnode host), it would blow up.
      
      Now we use a sysctl call (kern.disks) to get the disk names.
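A short Python illustration of why matching the capacity line fails on interleaved output, and why a single `sysctl -n kern.disks` query avoids the problem. The commit does not show the original RE, so `CAPACITY_RE` is a hypothetical stand-in that, like the original, only matches when the capacity line starts a line; the `"da1 da0"` value is a made-up sample of the space-separated disk list that `kern.disks` reports.

```python
import re

# Hypothetical pattern: anchored at line start on the per-disk capacity line.
CAPACITY_RE = re.compile(r"^(da\d+): \d+MB \(\d+ \d+ byte sectors\)", re.MULTILINE)

def disks_from_dmesg(text):
    """Old approach (sketch): disks whose capacity line begins a line."""
    return CAPACITY_RE.findall(text)

def disks_from_kern_disks(value):
    """New approach (sketch): split the output of `sysctl -n kern.disks`."""
    return value.split()

ordered = (
    "da0 at mpt0 bus 0 scbus0 target 0 lun 0\n"
    "da0: 300.000MB/s transfers\n"
    "da0: 476940MB (976773168 512 byte sectors)\n"
)
mashed = "da1: 300.000MB/s transfersda0: 476940MB (976773168 512 byte sectors)\n"

print(disks_from_dmesg(ordered))          # ['da0']
print(disks_from_dmesg(mashed))           # []  (da0 is missed)
print(disks_from_kern_disks("da1 da0"))   # ['da1', 'da0']
```

The sysctl value does not depend on how the kernel's console messages happen to interleave, which is what makes it the more robust source.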
    • David Johnson
    • Jonathon Duerig
    • All portals hosted from the main site should use the 'emulab.net' domain when... · 3d66b3f4
      Jonathon Duerig authored
      All portals hosted from the main site should use the 'emulab.net' domain when displaying URNs for images.
    • Tweaks to the agreement between mkextrafs and the blockstore system. · 939b0ae7
      Mike Hibler authored
      For the case in which mkextrafs is used to create local homedirs/projdirs:
      
      Look for the desired mount point (/local) in /etc/fstab and use it if
      it exists (i.e., that FS was already set up by the blockstore system or by
      a previous mkextrafs).
      
      Otherwise, look for /var/emulab/boot/extrafs, which should contain info
      left behind by the local blockstore setup code indicating an FS or an unused
      device to use. For an unused device, rc.storage will identify the largest
      available device that is at least 10MB.
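The lookup order described above can be sketched in Python as follows. This is a hypothetical model rather than the real mkextrafs or rc.storage code: the format of /var/emulab/boot/extrafs is not documented here, so the hint is represented as a `(kind, value)` tuple, and device sizes are passed in as a dict of megabytes instead of being probed.

```python
MIN_DEVICE_MB = 10   # rc.storage only considers devices of at least 10MB

def fstab_has_mountpoint(fstab_text, mountpoint):
    """True if a non-comment /etc/fstab entry mounts at `mountpoint`."""
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#") and fields[1] == mountpoint:
            return True
    return False

def rc_storage_hint(device_sizes_mb):
    """Hypothetical model of the hint rc.storage leaves behind: the largest
    available device of at least 10MB, or None if there is none."""
    eligible = {d: s for d, s in device_sizes_mb.items() if s >= MIN_DEVICE_MB}
    if not eligible:
        return None
    return ("device", max(eligible, key=eligible.get))

def choose_extrafs(fstab_text, extrafs_hint, mountpoint="/local"):
    """Existing fstab entry first, then the blockstore hint (FS or device)."""
    if fstab_has_mountpoint(fstab_text, mountpoint):
        return ("fstab", mountpoint)
    return extrafs_hint   # None if the blockstore code left nothing behind

fstab = "/dev/da0s1a / ufs rw 1 1\n/dev/da0s4e /local ufs rw 2 2\n"
print(choose_extrafs(fstab, None))   # ('fstab', '/local')
```

Checking /etc/fstab first means a filesystem already created by the blockstore system or a previous mkextrafs run is reused rather than recreated.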
    • Leigh B Stoller
  2. 13 Nov, 2016 4 commits
    • Add portal_monitor startup.x · f91bd4c6
      Leigh B Stoller authored
    • Minor tweak to make schemacheck happy. · 459fce68
      Leigh B Stoller authored
    • Leigh B Stoller
      b95550af
    • Bring the cluster monitor "in-house", rather than depending on the jfed monitoring system. · d7c4230e
      Leigh B Stoller authored
      
      The new portal_monitor daemon makes a GetVersion/ListResources call to
      each of the clusters every five minutes and updates a new DB table,
      apt_aggregate_status. We calculate free/inuse counts for physical nodes
      and a free count for VMs. Failure to contact an aggregate for more than
      10 minutes marks the aggregate as down, since from our perspective, if
      we cannot reach it, the cluster is down.
      
      Unlike the jfed monitoring system, we are not going to try to
      instantiate a new experiment or ssh into it. We will wait and see
      whether that is necessary in our context.
      
      On the instantiate page, generate a JSON structure for each cluster,
      similar to the one described by Keith in issue #172. This way we can
      easily switch the existing code over to the new system, but fall back to
      the old mechanism if this turns out to be a bust.
      
      Some other related changes to how we pass cluster information into the
      various web pages.
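The up/down rule can be sketched as follows: remember when the last successful GetVersion/ListResources round happened and mark the aggregate down once more than ten minutes pass without one. The `AggregateStatus` fields and `record_poll` are hypothetical and do not reflect the actual apt_aggregate_status schema; the counts are whatever the caller extracted from the ListResources result.

```python
from dataclasses import dataclass

POLL_INTERVAL = 5 * 60   # seconds between polls (drives the daemon loop, not shown)
DOWN_AFTER = 10 * 60     # seconds without contact before marking "down"

@dataclass
class AggregateStatus:
    # Hypothetical fields; not the real apt_aggregate_status columns.
    name: str
    last_success: float      # time of last successful poll (or daemon start)
    status: str = "up"
    pnode_free: int = 0
    pnode_inuse: int = 0
    vm_free: int = 0

def record_poll(agg, now, result):
    """Update `agg` after one poll at time `now` (seconds).

    `result` is None if the aggregate could not be reached, otherwise a
    (pnode_free, pnode_inuse, vm_free) tuple derived from ListResources.
    """
    if result is not None:
        agg.last_success = now
        agg.pnode_free, agg.pnode_inuse, agg.vm_free = result
        agg.status = "up"
    elif now - agg.last_success > DOWN_AFTER:
        # Unreachable for more than ten minutes: from the portal's point
        # of view the cluster is down.
        agg.status = "down"
    return agg.status
```

With five-minute polls, a single missed poll leaves the status unchanged; only a gap longer than two intervals flips the aggregate to down.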
  3. 12 Nov, 2016 1 commit
  4. 11 Nov, 2016 8 commits
  5. 10 Nov, 2016 2 commits
    • Add atomic updating of existing reservations. · 21413c23
      Gary Wong authored
    • Fix two bugs in skb processing in Linux ipod module. · 0e3d8b99
      David Johnson authored
      One was minor: we were not including the IP options length in the
      pskb_may_pull check. The second was not minor: we weren't calling
      pskb_may_pull to check whether the iph + icmph + ipod secret was in the
      linear buffer, and we finally ran across a driver for which the ipod
      secret did not fully fit in the first skb buffer chunk, so linearization
      was actually necessary.
      
      Another suggested fix for the potential bugs that arise from
      linearization, using skb_header_pointer, isn't the most desirable option
      in this case, since it costs more stack memory *for each* incoming ICMP
      packet (and nearly 100% of the time, the packet isn't an ipod and we
      don't care).
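To make the linearization problem concrete, here is a toy Python model of an skb with a linear head and paged data. It is not kernel code: `FakeSkb` and `extract_secret` are hypothetical, `FakeSkb.may_pull` only mimics the contract of `pskb_may_pull()` (ensure the first n bytes are in the linear area, pulling from paged data if needed, and fail if the packet is shorter), and the 32-byte secret is an arbitrary example length. The offset is computed from the IP header length field, so IP options are included, which addresses the first of the two bugs.

```python
ICMP_HLEN = 8   # sizeof(struct icmphdr)

class FakeSkb:
    """Toy skb: a linear head plus paged (non-linear) data."""
    def __init__(self, linear: bytes, paged: bytes):
        self.linear = linear
        self.paged = paged

    def may_pull(self, n: int) -> bool:
        """Mimic pskb_may_pull(): make sure the first n bytes are linear."""
        if n <= len(self.linear):
            return True
        if n > len(self.linear) + len(self.paged):
            return False  # packet is too short
        take = n - len(self.linear)
        self.linear += self.paged[:take]
        self.paged = self.paged[take:]
        return True

def extract_secret(skb: FakeSkb, secret_len: int):
    """Return the ipod secret bytes, or None if the packet is too short."""
    ihl = skb.linear[0] & 0x0F        # IP header length in 32-bit words
    off = ihl * 4 + ICMP_HLEN         # includes any IP options
    if not skb.may_pull(off + secret_len):
        return None
    return skb.linear[off:off + secret_len]

# 20-byte IP header (IHL=5), 8-byte ICMP header, 32-byte example secret,
# split so the secret straddles the linear/paged boundary.
pkt = bytes([0x45]) + bytes(19) + bytes(ICMP_HLEN) + b"S" * 32
skb = FakeSkb(pkt[:40], pkt[40:])
print(len(skb.linear[28:60]))                 # 12: most of the secret is not linear
print(extract_secret(skb, 32) == b"S" * 32)   # True
```

Calling the pull only up to the end of the secret keeps the common non-ipod case cheap, which matches the commit's reasoning for preferring pskb_may_pull over copying headers onto the stack.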
  6. 09 Nov, 2016 3 commits
  7. 08 Nov, 2016 5 commits
  8. 07 Nov, 2016 8 commits
  9. 06 Nov, 2016 2 commits