
  1. 11 Apr, 2018 1 commit
    • Initial checkin of ONIE clientside. · 72d6a8e6
      Leigh B Stoller authored
      * Add onie-dongle and onie-dongle-install targets, which build and
        install (DESTDIR required) the bits and pieces we need. The install
        is intended to update the initramfs. ONIE serves as both the admin
        MFS and the "frisbee" MFS; bootinfoclient is used to emulate the
        PXEWAIT waitmode.
      
      * Must be built in the ONIE cross-compiler environment; see ftos.env
        and mlnx.env for the environment variables to set before configuring
        and building.
      
      * Basic operation is like the old CDROM; use bootinfoclient and tmcc
        bootwhat to drop into "admin" or "frisbee" mode, or boot the NOS. Use
        tmcc loadinfo and call onie-nos-install. Use a grub environment
        variable to tell grub to either boot the NOS (and then clear the
        variable) or boot into ONIE.
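A minimal sketch of the boot decision described in the last bullet, assuming a simplified `bootwhat` answer of `"admin"`, `"frisbee"`, or anything else (meaning "boot the NOS"), and modelling the grub environment as a plain dict. The names `BootAction`, `decide_boot`, `request_nos_boot`, `grub_choice` and the variable `onie_boot_nos` are hypothetical illustrations, not the actual clientside code or grub configuration.

```python
from enum import Enum


class BootAction(Enum):
    ADMIN = "admin"      # stay in ONIE acting as the admin MFS
    FRISBEE = "frisbee"  # stay in ONIE acting as the frisbee MFS
    BOOT_NOS = "nos"     # boot the installed NOS


def decide_boot(bootwhat: str) -> BootAction:
    """Map a simplified, hypothetical tmcc bootwhat answer onto an action."""
    if bootwhat == "admin":
        return BootAction.ADMIN
    if bootwhat == "frisbee":
        return BootAction.FRISBEE
    return BootAction.BOOT_NOS


# Hypothetical grub environment variable name (not taken from the commit).
BOOT_NOS_VAR = "onie_boot_nos"


def request_nos_boot(grubenv: dict) -> None:
    """Ask grub to boot the NOS on the next reboot."""
    grubenv[BOOT_NOS_VAR] = "1"


def grub_choice(grubenv: dict) -> str:
    """One-shot NOS boot: if the variable is set, clear it and boot the NOS;
    otherwise boot into ONIE."""
    if grubenv.get(BOOT_NOS_VAR) == "1":
        del grubenv[BOOT_NOS_VAR]  # cleared so the following boot returns to ONIE
        return "nos"
    return "onie"
```

After `request_nos_boot(env)`, `grub_choice(env)` picks the NOS exactly once and then falls back to ONIE, which is what makes the variable a one-shot flag rather than a persistent setting.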
  2. 05 Apr, 2018 1 commit
  3. 02 Apr, 2018 2 commits
    • Leigh B Stoller authored
      b13459fe
    • Fix a race in kill/restart of pubsubd in rc.bootsetup. · a3b1a555
      David Johnson authored
      pubsubd wasn't restarting, presumably because the existing pubsubd was
      still running and/or its socket state was still live in the kernel even
      after its putative death.  This took a long time to manifest, and it's
      not clear exactly what the problem was, but making sure pubsubd is dead
      (and is no longer holding its specific port) is appropriate even if we
      assume SO_REUSEADDR is working, and it fixes the current problem.  This
      was only observable on the pc3000s and c220g2s, as far as I saw.
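A minimal sketch of "make sure it is dead and no longer holding its port before restarting", assuming the daemon's pid and TCP port are known. The helper names (`pid_alive`, `port_free`, `wait_until_dead`, `stop_daemon`) and the SIGTERM-then-SIGKILL escalation are illustrative assumptions; the commit does not say how rc.bootsetup actually checks.

```python
import os
import signal
import socket
import time


def pid_alive(pid: int) -> bool:
    """True if a process with this pid exists (signal 0 only checks).
    Note: an unreaped zombie child still counts as alive; a daemonized
    pubsubd is reparented to init, so that is not an issue there."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def port_free(port: int, host: str = "") -> bool:
    """True if we can bind the port right now, even with SO_REUSEADDR set."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def wait_until_dead(pid: int, port: int, timeout: float = 10.0,
                    host: str = "", poll: float = 0.1) -> bool:
    """Poll until the old daemon is gone AND its port can be bound."""
    deadline = time.monotonic() + timeout
    while True:
        if not pid_alive(pid) and port_free(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def stop_daemon(pid: int, port: int, timeout: float = 10.0) -> bool:
    """SIGTERM and wait; escalate to SIGKILL if needed. Restart only on True."""
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # already gone; still confirm the port is free
        if wait_until_dead(pid, port, timeout):
            return True
    return False
```

Checking the port as well as the pid matters because, as the message notes, socket state can outlive the process from the kernel's point of view.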
  4. 30 Mar, 2018 2 commits
    • Install the right file, Mike. · 396431a0
      Mike Hibler authored
    • Support for frisbee direct image upload to fs node. · 99943a19
      Mike Hibler authored
      We have had issues with uploading images to boss, where they are then
      written across NFS to ops. That seems to be a network hop too far on
      CloudLab Utah, where we have a 10Gb control network. We get occasional
      transient timeouts from somewhere in the TCP code. With the convoluted
      path through real and virtual NICs, some with offloading and some
      without, packets wind up out of order and one side gets far enough
      behind to cause problems.
      
      So we work around it.
      
      If IMAGEUPLOADTOFS is defined in the defs-* file, we will run a frisbee
      master server on the fs (ops) node, and the image creation path directs
      the nodes to use that server. There is a new, hacky "upload-only"
      configuration for the master server which is extremely specific to ops:
      it validates the upload with the boss master server and, if allowed,
      fires up an upload server for the client to talk to. The image is thus
      uploaded directly to the local (ZFS) /proj or /groups filesystems on
      ops. This seems to be enough to get around the problem.
      
      Note that we could allow this master server to serve downloads as well,
      to avoid the analogous problem in that direction, but to date that has
      not been a problem.
      
      NOTE: the ops node must be in the nodes table in the DB or else boss will
      not validate proxied requests from it. The standard install procedure is
      supposed to add ops, but we have a couple of clusters where it is not in
      the table!
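A minimal sketch of the routing and validation flow described above, assuming the defs-* file has been parsed into a dict and that boss's permission check can be modelled as a callback. All names here (`UploadRequest`, `pick_master_server`, `boss_validate`, `upload_only_master`, `may_upload`, `start_upload_server`) are hypothetical; they illustrate the logic, not frisbee's actual master-server interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class UploadRequest:
    node_id: str     # node creating the image
    image_path: str  # destination, e.g. somewhere under /proj or /groups


def pick_master_server(defs: dict, boss: str, fs: str) -> str:
    """Image-creation path: use the fs (ops) master server only when
    IMAGEUPLOADTOFS is defined in the site defs."""
    return fs if "IMAGEUPLOADTOFS" in defs else boss


def boss_validate(req: UploadRequest, proxy: str, nodes_table: Set[str],
                  may_upload: Callable[[str, str], bool]) -> bool:
    """Boss side of a proxied request: a proxy missing from the nodes
    table (e.g. ops on some clusters) is rejected outright."""
    if proxy not in nodes_table:
        return False
    return may_upload(req.node_id, req.image_path)


def upload_only_master(req: UploadRequest,
                       validate: Callable[[UploadRequest], bool],
                       start_upload_server: Callable[[str], int]) -> Optional[int]:
    """'upload-only' mode on ops: validate with boss; if allowed, start an
    upload server and return the port the client should talk to."""
    if not validate(req):
        return None
    return start_upload_server(req.image_path)
```

The `boss_validate` check shows why a missing ops entry in the nodes table breaks the scheme: every proxied request fails before the per-image permission check is even reached.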
  5. 29 Mar, 2018 2 commits
  6. 27 Mar, 2018 1 commit
  7. 26 Mar, 2018 3 commits
  8. 22 Mar, 2018 2 commits
  9. 07 Mar, 2018 1 commit
    • On Linux clients, fix eth speed via ethtool; fallback to autoneg on failure. · 9337f7d4
      David Johnson authored
      This is now our strategy for every speed except 0 (which means
      autonegotiate) and 1 Gbps (which requires autoneg anyway).
      
      Not all cards allow ethtool to "fix" speeds, but some (still) require
      it.  For instance, this commit has no effect when applied to an Intel
      X710 card that should be at 10Gbps; apparently ethtool cannot fix speeds
      for that card and/or driver.  On the other hand, Mellanox 10/25Gbps
      cards sometimes require the speed to be set manually (i.e., if they are
      directly connected to each other via a layer-1 switch).  Even on those
      cards, it's not really setting it; it's just hinting to the autoneg
      process which speed you really want (and I'd guess that is true for all
      modern high-speed Ethernet chips).
      
      Anyway, we don't know when to force (set/suggest) the speed and when not
      to unless we track more data on the server side, so the current strategy
      is to always attempt to set/suggest the speed we want, and fall back to
      setting autoneg if that fails.  Note that ethtool sometimes returns
      success even when the settings fail (I'm looking at you, Intel X710), so
      this strategy is already sort of doomed to failure (but it hasn't made
      anything not work!).  If it causes problems for any card/driver/speed
      combinations, we'll revisit this, obviously.
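A minimal sketch of the "try to set the speed, fall back to autoneg" strategy, assuming speeds are given in Mb/s and that the command runner returns an exit status (injected so the logic can be exercised without real NICs). The exact ethtool arguments used by the clientside scripts are not in the commit; `speed N duplex full autoneg off` and `autoneg on` are standard `ethtool -s` options chosen here as an assumption, and `configure_speed`/`run_cmd` are hypothetical names.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def run_cmd(cmd: List[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(cmd, check=False).returncode


def configure_speed(iface: str, speed_mbps: int,
                    runner: Runner = run_cmd) -> List[List[str]]:
    """Try to set/suggest the wanted speed, falling back to autoneg.
    Returns the commands issued, in order."""
    autoneg = ["ethtool", "-s", iface, "autoneg", "on"]
    # 0 means "autonegotiate"; 1 Gbps requires autoneg anyway.
    if speed_mbps in (0, 1000):
        runner(autoneg)
        return [autoneg]
    fixed = ["ethtool", "-s", iface, "speed", str(speed_mbps),
             "duplex", "full", "autoneg", "off"]
    if runner(fixed) == 0:
        # Exit status 0 is not proof the card honored it (see the X710 note).
        return [fixed]
    runner(autoneg)
    return [fixed, autoneg]
```

Injecting `runner` keeps the policy separate from process execution, which also makes the "ethtool lied about success" case easy to reason about: the policy can only react to the exit status it is given.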
  10. 27 Feb, 2018 1 commit
    • Updating alpine container to include tcsh · 4e7e0fe0
      Elijah Grubb authored
      I was able to get the alpine package maintainers
      to merge in a patch for tcsh that fixes the
      out-of-memory issues seen when attempting to use it;
      see https://github.com/alpinelinux/aports/pull/3302.
      This commit has been lightly hand-tested, and basic
      tcsh features appear to work. There are some failures
      on alpine with the tcsh regression test suite, which
      can be seen here: http://tpaste.us/L6R4.
      
      Associated symlinks that replaced tcsh with bash have
      been removed as a result of this update.
      
      Squashed commit of the following:
      
      commit dc2fc6ddbb1b1815907fabbf40764884a2805761
      Author: Elijah Grubb <u0894728@utah.edu>
      Date:   Tue Feb 27 15:08:14 2018 -0700
      
          Removed unnecessary symlink
      
      commit 0d974c183f6e27a8c069221f3a898b1edf490d8f
      Author: Elijah Grubb <u0894728@utah.edu>
      Date:   Tue Feb 27 13:54:54 2018 -0700
      
          Readding hopefully fixed tcsh
  11. 24 Feb, 2018 1 commit
  12. 22 Feb, 2018 1 commit
    • Support for 25/40Gbps interfaces via ethtool for Linux clientside. · 8bc7e96c
      David Johnson authored
      Also, rework the generated rc.ifc code a bit.  We were "falling back"
      to 100Mbps if we didn't recognize the speed; now we autoconfig (as we
      do for FreeBSD).  We also now try autoconfig for mii-tool (which of
      course will never be used again, but gotta be correct).  We now
      explicitly warn about speeds that mii-tool does not support, instead
      of silently doing something.
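A minimal sketch of the generator behavior described above, assuming speeds in Mb/s and emitting one shell line per interface. The recognized-speed set, the ethtool arguments, and falling back to autoconfig after the mii-tool warning are assumptions for illustration; `ifc_lines`, `ETHTOOL_SPEEDS` and `MIITOOL_MEDIA` are hypothetical names, not the real rc.ifc generator. The mii-tool media names and the `-F`/`-r` flags are standard mii-tool options.

```python
from typing import Callable, List

# Assumed set of speeds (Mb/s) the generator recognizes for ethtool.
ETHTOOL_SPEEDS = {10, 100, 1000, 10000, 25000, 40000}
# mii-tool can only force 10/100 media (full duplex shown here).
MIITOOL_MEDIA = {10: "10baseT-FD", 100: "100baseTx-FD"}


def ifc_lines(iface: str, speed: int, tool: str = "ethtool",
              warn: Callable[[str], None] = print) -> List[str]:
    """Return the shell lines to emit into a generated rc.ifc-style script."""
    if tool == "ethtool":
        if speed in ETHTOOL_SPEEDS:
            return [f"ethtool -s {iface} speed {speed} duplex full autoneg off"]
        # Unrecognized (or 0): autoconfig instead of the old 100Mbps fallback.
        return [f"ethtool -s {iface} autoneg on"]
    if speed in MIITOOL_MEDIA:
        return [f"mii-tool -F {MIITOOL_MEDIA[speed]} {iface}"]
    if speed != 0:
        warn(f"mii-tool does not support {speed} Mbps on {iface}; autoconfiguring")
    return [f"mii-tool -r {iface}"]  # -r restarts autonegotiation
```

Passing `warn` as a callback keeps the warning explicit and testable rather than letting an unsupported speed silently map to some arbitrary setting.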
  13. 21 Feb, 2018 1 commit
  14. 20 Feb, 2018 1 commit
  15. 16 Feb, 2018 1 commit
  16. 14 Feb, 2018 1 commit
  17. 09 Feb, 2018 3 commits
  18. 08 Feb, 2018 4 commits
  19. 07 Feb, 2018 2 commits
  20. 06 Feb, 2018 3 commits
  21. 05 Feb, 2018 1 commit
  22. 03 Feb, 2018 1 commit
  23. 02 Feb, 2018 4 commits