1. 14 Oct, 2019 1 commit
      Finish off the "copystatus" blockstore server command. · 2c0cdccf
      Mike Hibler authored
      A command to get the status of an ongoing copy. It will also let you
      know if a copy has aborted for some reason. However, lots more work
      will be required before we can gracefully recover (i.e., continue)
      one of those, as any failure on the server side causes the boss scripts
      to either tear down all the DB state or leave the blockstore in a "failed"
      state that requires a great deal of manual effort to resurrect.
  2. 30 Sep, 2019 1 commit
      Implement an "on server" strategy for copying persistent datasets. · fce7c7c7
      Mike Hibler authored
      This is implemented as a variant of createdataset. If you do:
          createdataset -F pid/old pid/new
      It will create a new dataset, initializing it with the contents of old.
      The new dataset will of course have the same size, type, and filesystem type
      (if any). Right now the old and new both have to be in the same project, and
      new gets placed in the same pool on the same server (i.e., this is a local
      "zfs send | zfs recv" pipeline).
      Implementing copy as a variant of create will hopefully make it easy for
      Leigh in the portal interface, as he doesn't have to treat it any differently
      than a normal create: fire it off in the background and wait until the lease
      state becomes "valid".
      Since a copy could take hours or even days, there are plenty of opportunities
      for failure that I have not considered too much yet, e.g., the storage server
      rebooting in the middle or boss rebooting in the middle. These things could
      happen already, but we have just made the window of opportunity much larger.
      Anyway, this mechanism can serve as the basis for creating persistent datasets
      from clones or other ephemeral datasets.
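As a rough illustration of the local "zfs send | zfs recv" pipeline mentioned above, the sketch below only builds the command strings for a same-pool copy. The function name, the snapshot name "copy", and the mapping of pid/old onto a pool/pid/old ZFS path are assumptions made for the example, not the layout the blockstore server actually uses.

```python
import shlex


def local_copy_commands(pool, old, new, snap="copy"):
    """Return shell commands for a same-pool snapshot plus send/recv copy.

    Hypothetical helper: it only formats strings and runs nothing.
    """
    src = f"{pool}/{old}@{snap}"   # snapshot of the source dataset
    dst = f"{pool}/{new}"          # new dataset in the same pool
    return [
        f"zfs snapshot {shlex.quote(src)}",
        f"zfs send {shlex.quote(src)} | zfs recv {shlex.quote(dst)}",
    ]


if __name__ == "__main__":
    # "tank" is a placeholder pool name.
    for cmd in local_copy_commands("tank", "pid/old", "pid/new"):
        print(cmd)
```

Because both ends of the pipe are on the same server, no network transport is involved; a cross-server copy would need something between the send and the recv, which this sketch does not cover.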
  3. 19 Sep, 2019 2 commits
  4. 16 Sep, 2019 1 commit
  5. 10 Sep, 2019 1 commit
  6. 29 Aug, 2019 1 commit
  7. 19 Aug, 2019 1 commit
  8. 12 Aug, 2019 1 commit
  9. 06 Aug, 2019 1 commit
  10. 01 Aug, 2019 1 commit
      Changes to the reservation to support reserving specific nodes: · f21a3123
      Leigh B Stoller authored
      A new flag in the nodes table marks a node as being independently
      reservable by the reservation system. In general, the reservation system
      treats the node type as an opaque string, so why not make it a node_id?
      The nodes table flag is used in various queries to distinguish between
      nodes that are reserved as a type and nodes that are individually reserved.
      Everything else pretty much falls into place.
      Minor changes to mapper admission control to look for the use of a
      specific node that is reserved to someone else. Also minor changes in
      ptopgen to remove reserved nodes from the ptop file when they are reserved
      to a different project.
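The admission check and the ptop filtering described above can be pictured with a small sketch. Everything here is hypothetical: the function names, the set standing in for the nodes table flag, and the dict standing in for the reservation data are illustrations only, and type-level reservation accounting is deliberately left out.

```python
def node_available(node_id, project, reservations, reservable_nodes):
    """Return True if `project` may use the specific node `node_id`.

    reservable_nodes: set of node_ids flagged as individually reservable.
    reservations:     dict mapping node_id -> project holding the reservation.
    """
    if node_id not in reservable_nodes:
        # Not individually reservable; type-level accounting is not modeled.
        return True
    holder = reservations.get(node_id)
    return holder is None or holder == project


def filter_ptop(nodes, project, reservations, reservable_nodes):
    """Drop nodes that are reserved to a different project."""
    return [
        n for n in nodes
        if node_available(n, project, reservations, reservable_nodes)
    ]


if __name__ == "__main__":
    reserved = {"pc1": "projA"}
    flagged = {"pc1", "pc2"}
    print(filter_ptop(["pc1", "pc2", "pc3"], "projB", reserved, flagged))
```

The same predicate serves both places in this sketch: admission control would reject a request naming a node for which it returns False, and ptop generation would simply omit that node.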
  11. 26 Jun, 2019 1 commit
      add power support for IBM BladeCenter chassis · f66517f5
      chuck cranor authored
      Add power support for the IBM BladeCenter chassis (power_ibmbc.pm).
      This is the chassis used in the old Roadrunner cluster at LANL.  Each
      chassis has 14 blades in it.  The management IP API is accessed from
      boss via ssh.  An ssh keypair should be set up to allow for passwordless
      ssh access.  We assume the admin has installed the keypair on boss
      (in /usr/testbed/etc/{ibmbc,ibmbc.pub}) and on each chassis for
      the standard "USERID" account.  The key files should be owned by
      an account like "operator" to avoid ssh complaining about key file
      permissions in some cases.
      The module will end up running commands like:
            ssh USERID@chassis-mm power -on -T 'blade[1]'
            ssh USERID@chassis-mm power -off -T 'blade[1]'
            ssh USERID@chassis-mm power -cycle -T 'blade[1]'
      (We'll add "-i /usr/testbed/etc/ibmbc" to "ssh" if the key file is present.)
      Using this requires the following "mysql tbdb" commands on boss:
        one-time operation:
           insert into node_types (class,type) values ('power', 'ibmbc');
        per-chassis operations:
           # assumes that "rr1" is blade1 of chassis "bch1"
           insert into nodes (node_id,type,phys_nodeid,role,priority,status,
           values ('bch1', 'ibmbc', 'bch1', 'powerctrl', 10001, 'down',
                              'ISUP', 'NONE', 'FREE_DIRTY');
           # adds IP of the chassis management module
           insert into interfaces (node_id,IP,mask,interface_type,iface,role) values
                ('bch1', '', '','','eth0','other');
        per-blade operation:
           insert into outlets (node_id,power_id,outlet)
                  values ('rr1', 'bch1', 1);   # outlet 1==blade1, etc.
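To make the command construction above concrete, here is a minimal sketch of how the ssh argument list could be assembled, including the optional "-i" key file. The real module is Perl (power_ibmbc.pm); this Python function and its name are illustrative assumptions, and it only builds the argument list without running ssh.

```python
import os


def ibmbc_power_argv(chassis, blade, action,
                     keyfile="/usr/testbed/etc/ibmbc", user="USERID"):
    """Build the ssh argv for a BladeCenter power operation.

    Hypothetical helper mirroring the commands shown in the commit message.
    """
    if action not in ("on", "off", "cycle"):
        raise ValueError(f"unsupported power action: {action}")
    argv = ["ssh"]
    if os.path.exists(keyfile):
        # Only pass -i when the key file is actually present.
        argv += ["-i", keyfile]
    argv += [f"{user}@{chassis}", "power", f"-{action}", "-T", f"blade[{blade}]"]
    return argv


if __name__ == "__main__":
    # The outlets table maps outlet N to blade N, so outlet 1 is blade[1].
    print(ibmbc_power_argv("chassis-mm", 1, "cycle"))
```

Passing the arguments as a list avoids local shell globbing of "blade[1]", which is what the single quotes in the commands above guard against.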
  12. 19 Jun, 2019 2 commits
      Further tweaks to jumbo frames code. · 571b4a14
      Mike Hibler authored
      Now use a sitevar, general/allowjumboframes, rather than MAINSITE
      to determine whether we should even attempt any jumbo frames magic.
      Use a per-link/lan setting rather than the hacky per-experiment
      setting to let the user decide if they want to use jumbos. In NS
      world, we already had a link/lan method (set-settings) to specify
      virt_lan_settings which is where it winds up now.
      Client-side fixes to make jumbos work with vnodes.
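The two-level decision described above (a site-wide sitevar gate, then a per-link/lan opt-in) might be sketched as follows. Only the sitevar name general/allowjumboframes comes from the commit message; the "jumboframes" settings key, the helper names, and the 9000-byte MTU are assumptions chosen for illustration.

```python
STANDARD_MTU = 1500
JUMBO_MTU = 9000  # assumed jumbo size; the real value is site dependent


def _enabled(value):
    """Interpret a sitevar or setting value as a boolean."""
    return str(value).strip().lower() in {"1", "true", "yes"}


def link_mtu(sitevars, lan_settings):
    """Pick an MTU for one link/lan.

    sitevars:     dict of site variables, e.g. {"general/allowjumboframes": "1"}
    lan_settings: dict of per-link/lan settings (hypothetical "jumboframes" key)
    """
    site_ok = _enabled(sitevars.get("general/allowjumboframes", "0"))
    user_wants = _enabled(lan_settings.get("jumboframes", "0"))
    return JUMBO_MTU if site_ok and user_wants else STANDARD_MTU


if __name__ == "__main__":
    print(link_mtu({"general/allowjumboframes": "1"}, {"jumboframes": "1"}))
```

Keeping the site gate separate from the per-link request means a user setting is harmless on sites that have not enabled jumbo frames.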
  13. 13 Jun, 2019 1 commit
  14. 12 Jun, 2019 1 commit
      Small set of changes for os_setup on sdr nodes. · 58f2b014
      Leigh B Stoller authored
      SDR nodes (type=sdr, but this applies to other similar types) are in the
      "pc" class, but really they are not pcs; they are more like blackboxes
      that can be power cycled and are always ISUP.
      So, I added an "sdr" package to libossetup that basically just does a
      power cycle to put them into a known state and makes sure the
      eventstate is ISUP.
      I added "blackbox" to the sdr type definition. Aside: when something is
      a blackbox, we should bypass all image/osinfo handling, but that's a
      tale for another day.
      I added an isblackbox() check in power, to skip any eventstate
      handling. Aside: node_reboot should possibly skip right to power cycle
      for blackbox nodes, instead of trying to ping it or ssh into it.
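The blackbox handling described above can be modeled in a few lines. This is a toy sketch, not the libossetup or power code: the Node class, the function names, and the use of "SHUTDOWN" as the state an ordinary node passes through are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    blackbox: bool = False
    eventstate: str = "ISUP"
    log: list = field(default_factory=list)  # records actions taken


def power_cycle(node):
    """Cycle power; skip eventstate handling for blackbox nodes."""
    node.log.append("cycle")
    if not node.blackbox:
        # An ordinary node leaves ISUP and reports its state as it boots.
        node.eventstate = "SHUTDOWN"


def os_setup(node):
    """Blackbox nodes only get a power cycle and are then marked ISUP."""
    if node.blackbox:
        power_cycle(node)
        node.eventstate = "ISUP"
        return "ready"
    return "needs-image-load"  # normal image/osinfo path, not modeled here


if __name__ == "__main__":
    sdr = Node("sdr1", blackbox=True)
    print(os_setup(sdr), sdr.eventstate, sdr.log)
```

The point of the sketch is the asymmetry: a blackbox node never runs software that could report a state transition, so the setup path asserts ISUP itself rather than waiting for an event.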
  15. 07 Jun, 2019 2 commits
  16. 04 Jun, 2019 1 commit
  17. 03 Jun, 2019 2 commits
  18. 01 Jun, 2019 1 commit
  19. 21 May, 2019 1 commit
  20. 07 May, 2019 1 commit
  21. 02 May, 2019 1 commit
  22. 30 Apr, 2019 1 commit
  23. 26 Apr, 2019 2 commits
  24. 15 Apr, 2019 2 commits
  25. 09 Apr, 2019 3 commits
  26. 29 Mar, 2019 1 commit
  27. 27 Mar, 2019 1 commit
  28. 25 Mar, 2019 1 commit
  29. 13 Mar, 2019 1 commit
  30. 06 Mar, 2019 3 commits