1. 29 Mar, 2019 1 commit
  2. 27 Mar, 2019 1 commit
  3. 25 Mar, 2019 1 commit
  4. 13 Mar, 2019 1 commit
  5. 06 Mar, 2019 5 commits
  6. 04 Mar, 2019 1 commit
  7. 28 Feb, 2019 1 commit
  8. 19 Feb, 2019 1 commit
  9. 15 Feb, 2019 1 commit
    • Fix the empty password on the user account we create on the Dell
      switches; I must have forgotten to deal with it. · 69c75587
      Leigh Stoller authored

      Basically, the FTOS manual says you can give it an MD5 or sha256 hash of
      the password. Well, FTOS must come from another dimension, because the
      MD5 and sha256 hashes it generates are different from what boss
      generates for the same input string. That makes it hard to set up an
      account with a password unless I send over the cleartext. Which is okay;
      the switch stores it internally as the hash (which I cannot seem to
      generate). So we send over the cleartext per-node root password, which
      is regenerated each time the node is allocated to an experiment. Not a
      big deal: we load the user's ssh keys, so they should not need to use
      passwords.
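Two systems often disagree on the digest of "the same" string because they are not hashing the same bytes (a trailing newline, a salt, a different encoding). The commit does not say whether any of these applies to FTOS. The sketch below only illustrates the effect using Python's `hashlib`. The function names are hypothetical and are not part of Emulab or FTOS.

```python
import hashlib
import secrets


def plain_digests(password: str) -> dict:
    """Return unsalted MD5 and SHA-256 hex digests of the UTF-8 password bytes."""
    data = password.encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def new_root_password(nbytes: int = 12) -> str:
    """Generate a fresh random password (hypothetical stand-in for the
    per-allocation root password mentioned in the commit)."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    pw = new_root_password()
    # A single trailing newline is enough to change both digests.
    print(plain_digests(pw) == plain_digests(pw + "\n"))
```

If the receiving side salts the password or uses a different digest format, no unsalted digest computed elsewhere will match, which is consistent with the decision to send the cleartext instead.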
  10. 12 Feb, 2019 1 commit
    • Recovery mode: · bde6c94d
      Leigh Stoller authored
      * Add a new Portal context menu option to nodes, to boot into "recovery"
        mode, which will be a Linux MFS (rather than the FreeBSD MFS, which
        99% of users will not know what to do with).

      * Plumb it all through to the Geni RPC interface, which invokes
        node_admin with a new option to use the recovery MFS nodetype
        attribute.

      * recoverymfs_osid is a distinct osid from adminmfs_osid; we use that in
        the CM to add an Emulab namespace attribute to the manifest that
        tells the Portal that a node supports recovery mode (and thus gets a
        context menu option).

      * Add an inrecovery flag to the sliver status blob, which the Portal
        uses to determine that a node is currently in recovery mode, so that
        we can indicate that in the topology and list tabs.
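As a rough illustration of the last point, a consumer of the sliver status blob could pick out the nodes that are currently in recovery mode as sketched below. Only the `inrecovery` flag name comes from the commit. The blob layout (a mapping from node id to a details dictionary) and the function name are assumptions made for this example.

```python
def nodes_in_recovery(sliver_status: dict) -> list:
    """Return the sorted ids of nodes whose status entry has a truthy
    'inrecovery' flag. The blob layout is assumed, not taken from Emulab."""
    return sorted(
        node_id
        for node_id, details in sliver_status.items()
        if details.get("inrecovery")
    )


if __name__ == "__main__":
    status = {
        "node1": {"status": "ready", "inrecovery": 1},
        "node2": {"status": "ready"},
    }
    print(nodes_in_recovery(status))
```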
  11. 07 Feb, 2019 1 commit
  12. 03 Jan, 2019 1 commit
  13. 02 Jan, 2019 1 commit
  14. 14 Dec, 2018 1 commit
  15. 13 Dec, 2018 1 commit
  16. 06 Dec, 2018 1 commit
    • Various fixes for ualloc switches: · cdcbedc7
      Leigh Stoller authored
      * Stop using the ALWAYSUP state machine for switches; it causes ISUP
        to always get sent, which in certain cases results in stated
        rebooting the switch!

        Added a new ONIE state machine, which handles the way switches
        actually boot: into ONIE first, and then either the bootinfo/grub
        dance, a reload, or admin mode.

      * Do not send PXEBOOTING from ONIE; this was a mistake, since it throws
        us into the PXEKERNEL state machine, which sometimes results in stated
        rebooting the switch!

        We still use PXEWAIT (it is sent by bootinfod), since that is the
        "waiting" state that is wired into a lot of Emulab; it just happens to
        now be a state in the ONIE state machine, so it's legal.

      * Fix a bug in libossetup that was fooling libossetup_switch into
        thinking the wrong thing.

      * Add some timeouts to the libosload_mlnx code; sshd sometimes refuses to
        answer after a failed login. Strange.

      * Fix a fork() problem in the switch reload code; gotta call exit, not
        return! This was wreaking subtle (okay, not so subtle) havoc in
        libossetup.
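The fork() item describes a classic pitfall: if the child returns instead of exiting, it keeps running the caller's code as a second copy of the parent. The sketch below shows the general pattern in Python rather than the actual Emulab code. `run_in_child` is a hypothetical helper, and it assumes a Unix platform where `os.fork` is available.

```python
import os


def run_in_child(task) -> int:
    """Run task() in a forked child and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: always leave via os._exit(). Returning here would let the
        # child continue executing the parent's code path after the call.
        status = 1
        try:
            status = int(task())
        finally:
            os._exit(status)
    # Parent: wait for the child and decode its exit status.
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status)


if __name__ == "__main__":
    print(run_in_child(lambda: 0))
```

Using `os._exit` in the `finally` clause means the child terminates even when `task` raises, in which case the default status of 1 is reported to the parent.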
  17. 03 Dec, 2018 1 commit
  18. 29 Nov, 2018 1 commit
  19. 28 Nov, 2018 2 commits
  20. 26 Nov, 2018 1 commit
  21. 16 Nov, 2018 1 commit
  22. 15 Nov, 2018 1 commit
  23. 14 Nov, 2018 1 commit
    • Use sunlink flag to prevent users from removing critical directories in
      /proj. · 982f3f59
      Leigh Stoller authored
      Applied to the top level only for now, since that was reasonably easy
      to do: project and group stuff is all done on ops already (where the
      chflags has to run). We could apply this to experiment and image
      directories too, but we all know the better approach is to stop mounting
      /proj on experimental nodes, right?

      Also a new script, mkprojdirs, to create/recreate missing project
      directories and do the chflags (it calls over to ops and uses the
      existing proxy script).
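For readers unfamiliar with the flag, here is a minimal sketch of setting a no-unlink flag from Python. It assumes that `sunlink` in the commit refers to the BSD system no-unlink flag, which Python exposes as `stat.SF_NOUNLINK`. The helper names are hypothetical, `os.chflags` exists only on platforms that support file flags (for example FreeBSD), and setting system flags normally requires superuser privileges.

```python
import os
import stat


def with_nounlink(flags: int) -> int:
    """Return flags with the system no-unlink bit added, keeping other bits."""
    return flags | stat.SF_NOUNLINK


def protect_directory(path: str) -> None:
    """Mark path as undeletable, preserving any flags that are already set."""
    if not hasattr(os, "chflags"):
        raise NotImplementedError("os.chflags is not available on this platform")
    current = getattr(os.stat(path), "st_flags", 0)
    os.chflags(path, with_nounlink(current))
```

Preserving the existing flags matters because `os.chflags` replaces the whole flag word rather than adding a single bit.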
  24. 10 Nov, 2018 1 commit
  25. 08 Nov, 2018 1 commit
  26. 07 Nov, 2018 2 commits
  27. 05 Nov, 2018 2 commits
    • Working Mellanox user alloc switch support (issue #445): · 95e7bded
      Leigh Stoller authored
      * The primary problem with the Mellanox is that the install image does a
        kexec out of ONIE into Linux, spends 30+ minutes doing stuff, and then
        reboots. This throws the reload state machine out of whack because we
        do not get a chance to send the RELOADDONE state. So ... some changes
        to rc.testbed and rc.reload on the USB dongle: the ONIE MFS sends
        RELOADING and writes a flag file to the ONIE partition on the
        "disk" (not the USB). Then it kexecs into MLNX, the install happens,
        and the switch reboots. The next boot into ONIE sees the flag file,
        erases it, and sends RELOADDONE. It waits for a bit, and then
        continues on the normal path. This abuses stated in that there are
        whiny messages in the stated log file, but I am immune to stated
        whining.

      * Another item of note is that the switch DHCPs, but only to get the IP
        info; there is no ability to give it an initial config file like we
        can with the Dell switches. The main problem here is that the switch
        comes up with its default login/password, which is obviously well
        known because it's in the manual. That means there is a window where
        the switch is vulnerable, but since we block the switches from the
        public side, this is not a serious problem. As soon as we can get in
        (sshd is running) we log in and update the config with passwords,
        keys, etc.

      * Other changes to the machine dependent osload library module; I had
        done some of this before switching to the Dells way back when, but it
        needed to be updated/completed.
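The flag-file handshake in the first item can be sketched as follows. This is only an illustration of the idea in Python, not the rc.testbed or rc.reload code. `onie_boot_step`, the flag file contents, and the `send_state` callback are hypothetical names chosen for the example.

```python
import os
import tempfile


def onie_boot_step(flag_path: str, send_state) -> str:
    """Decide what an ONIE boot should report, based on a flag file that
    survives the install and reboot. Returns the state that was sent."""
    if os.path.exists(flag_path):
        # Second boot: the install finished and rebooted; clear the flag first.
        os.remove(flag_path)
        send_state("RELOADDONE")
        return "RELOADDONE"
    # First boot: announce the reload, then leave a marker for the next boot.
    send_state("RELOADING")
    with open(flag_path, "w") as fh:
        fh.write("reload in progress\n")
    return "RELOADING"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        flag = os.path.join(tmp, "reload.flag")
        onie_boot_step(flag, print)  # first boot into ONIE
        onie_boot_step(flag, print)  # boot after the install has rebooted
```

Keeping the marker on persistent storage is what lets the second ONIE boot tell "install just finished" apart from an ordinary boot.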
    • Changes to how we handle/report mapping failures that also fail the
      empty testbed test. · 11074445
      Leigh Stoller authored

      Prior to this commit, we were not invoking the empty testbed case
      consistently. Now we do, but that exposed another problem: reporting
      the error to the Portal in a meaningful way. Basically, we can report
      a different error code for an impossible-to-map error, but then we lose
      the info we store now about what the actual failure was (which we show
      to the user with additional helpful info). Since we cannot (easily)
      change the Geni API for CreateSliver(), I have elected to continue the
      practice of returning the specific error codes (which also go into the
      database for long-term historical info), and to add more helpful text
      for the Portal user that explains clearly that the mapping is impossible
      on the target cluster. This extra text also goes into the database in
      the attached message field, so we can come back later and post-process
      if we decide to do something different.
  28. 30 Oct, 2018 1 commit
  29. 26 Oct, 2018 1 commit
  30. 11 Oct, 2018 1 commit
  31. 10 Oct, 2018 1 commit
  32. 27 Sep, 2018 1 commit
  33. 19 Sep, 2018 1 commit