1. 12 Feb, 2019 1 commit
    • Leigh Stoller's avatar
      Recovery mode: · bde6c94d
      Leigh Stoller authored
      * Add a new Portal context menu option to nodes, to boot into "recovery"
        mode, which will be a Linux MFS (rather then the FreeBSD MFS, which
        99% of user will not know what to do with).
      
      * Plumb all through to the Geni RPC interface, which invokes node_admin
        with a new option, to use the recovery mfs nodetype attribute.
      
      * recoverymfs_osid is a distinct osid from adminmfs_osid, we use that in
        the CM to add an Emulab name space attribute to the manifest, that
        tells the Portal that a node supports recovery mode (and thus gets a
        context menu option).
      
      * Add an inrecovery flag to the sliver status blob, which the Portal
        uses to determine that a node is currently in recovery mode, so that
        we can indicate that in the topology and list tabs.
      bde6c94d
  2. 08 Feb, 2019 2 commits
  3. 04 Feb, 2019 2 commits
  4. 02 Jan, 2019 1 commit
  5. 13 Dec, 2018 1 commit
  6. 06 Dec, 2018 1 commit
  7. 30 Nov, 2018 1 commit
  8. 28 Nov, 2018 1 commit
  9. 16 Nov, 2018 1 commit
  10. 07 Nov, 2018 1 commit
    • Leigh Stoller's avatar
      Quick fix for watchdog/backup interaction; use a script lock. · 72b4ba32
      Leigh Stoller authored
      From Slack:
      
      What I notice is that mysqldump is read locking all of the tables for a
      long time. This time gets longer and longer of course as the DB gets
      bigger. Last night enough stuff backed up (trying to get various write
      locks) that we hit the 500 thread limit. I only know this cause mysql
      prints "killing 501" threads at 2:03am. Which makes me wonder if our
      thread limit is too small (but seems like it would have to be much
      bigger) or if our backup strategy is inappropriate for how big the DB is
      and how busy the system is. But to be clear, I am not even sure if
      mysqld throws in the towel when it hits 500 threads, I am in the midst
      of reading obtuse mysql documentation. (edited) There a bunch of other
      error messages that I do not understand yet.
      
      I can reproduce this in my elabinelab with a 10 line perl script. Two
      problems; one is that we do not use the permission system, so we cannot
      use dynamic permissions, which means that the single thread that is left
      for just this case, can be used by anyone, and so the server is fully
      out of threads. And 2) then the Emulab mysql watchdog cannot perform its
      query, and so it thinks mysqld has gone catatonic and kills it, right in
      the middle of the backup. Yuck * 2. (edited)
      
      And if anyone is curious about a more typical approach: "If you want to
      do this for MyISAM or mixed tables without any downtime from locking the
      tables, you can set up a slave database, and take your snapshots from
      there. Setting up the slave database, unfortunately, causes some
      downtime to export the live database, but once it's running, you should
      be able to lock it's tables, and export using the methods others have
      described. When this is happening, it will lag behind the master, but
      won't stop the master from updating it's tables, and will catch up as
      soon as the backup is complete"
      72b4ba32
  11. 05 Nov, 2018 1 commit
  12. 30 Oct, 2018 1 commit
  13. 29 Oct, 2018 1 commit
  14. 23 Oct, 2018 1 commit
  15. 11 Oct, 2018 1 commit
  16. 24 Sep, 2018 1 commit
    • David Johnson's avatar
      Fix an IsFeasible bug where start and end of different reservations overlap. · a7089186
      David Johnson authored
      When IsFeasible processes the list of events (i.e. reservation
      start/end, expt start/end), it processes them in sorted order of event
      time, but if times are equal, there is no secondary sort, and thus the
      additive (incoming) reservation might be processed before the reductive
      (outgoing) reservation), which would create a false negative hole in the
      forecast.  This commit adds the secondary sort.
      a7089186
  17. 12 Sep, 2018 1 commit
  18. 29 Aug, 2018 3 commits
  19. 30 Jul, 2018 2 commits
  20. 20 Jul, 2018 1 commit
  21. 16 Jul, 2018 1 commit
  22. 12 Jul, 2018 1 commit
  23. 09 Jul, 2018 1 commit
  24. 18 Jun, 2018 1 commit
    • Leigh Stoller's avatar
      Add automated cancellation of reservations that are not used: · 67a0e58e
      Leigh Stoller authored
      * If unused at six hours, schedule for cancel in three hours and send
        email.
      
      * If reservation becomes used within those three hours, rescind the
        cancellation.
      
      * Add an override bit so that cancel/uncancel on the command line
        supercedes (so explicit cancel or rescinding a cancel, means do not
        make any more automated checks for unused).
      
      * Rework cancel to be more library friendly.
      67a0e58e
  25. 11 Jun, 2018 1 commit
  26. 08 Jun, 2018 1 commit
  27. 07 Jun, 2018 1 commit
  28. 04 Jun, 2018 2 commits
    • David Johnson's avatar
      Docker server-side core, esp new libimageops support for Docker images. · 66366489
      David Johnson authored
      The docker VM server-side goo is mostly identical to Xen, with slightly
      different handling for parent images.  We also support loading external
      Docker images (i.e. those without a real imageid in our DB; in that
      case, user has to set a specific stub image, and some extra per-vnode
      metadata (a URI that points to a Docker registry/image repo/tag);
      the Docker clientside handles the rest.
      
      Emulab Docker images map to a Emulab imageid:version pretty seamlessly.
      For instance, the Emulab `emulab-ops/docker-foo-bar:1` image would map
      to `<local-registry-URI>/emulab-ops/emulab-ops/docker-foo-bar:1`; the
      mapping is `<local-registry-URI>/pid/gid/imagename:version`.  Docker
      repository names are lowercase-only, so we handle that for the user; but
      I would prefer that users use lowercase Emulab imagenames for all Docker
      images; that will help us.  That is not enforced in the code; it will
      appear in the documentation, and we'll see.
      
      Full Docker imaging relies on several other libraries
      (https://gitlab.flux.utah.edu/emulab/pydockerauth,
      https://gitlab.flux.utah.edu/emulab/docker-registry-py).  Each
      Emulab-based cluster must currently run its own private registry to
      support image loading/capture (note however that if capture is
      unnecessary, users can use the external images path instead).  The
      pydockerauth library is a JWT token server that runs out of boss's
      Apache and implements authn/authz for the per-Emulab Docker registry
      (probably running on ops, but could be anywhere) that stores images and
      arbitrates upload/download access.  For instance, nodes in an experiment
      securely pull images using their pid/eid eventkey; and the pydockerauth
      emulab authz module knows what images the node is allowed to pull
      (i.e. sched_reloads, the current image the node is running, etc).  Real
      users can also pull images via user/pass, or bogus user/pass + Emulab
      SSL cert.  GENI credential-based authn/z was way too much work, sadly.
      There are other auth/z paths (i.e. for admins, temp tokens for secure
      operations) as well.
      
      As far as Docker image distribution in the federation, we use the same
      model as for regular ndz images.  Remote images are pulled in to the
      local cluster's Docker registry on-demand from their source cluster via
      admin token auth (note that all clusters in the federation have
      read-only access to the entire registries of any other cluster in the
      federation, so they can pull images).  Emulab imageid handling is the
      same as the existing ndz case.  For instance, image versions are lazily
      imported, on-demand; local version numbers may not match the remote
      image source cluster's version numbers.  This will potentially be a
      bigger problem in the Docker universe; Docker users expect to be able to
      reference any image version at any time anywhere.  But that is of course
      handleable with some ex post facto synchronization flag day, at least
      for the Docker images.
      
      The big new thing supporting native Docker image usage is the guts of a
      refactor of the utils/image* scripts into a new library, libimageops;
      this is necessary to support Docker images, which are stored in their
      own registry using their own custom protocols, so not amenable to our
      file-based storage.  Note: the utils/image* scripts currently call out
      to libimageops *only if* the image format is docker; all other images
      continue on the old paths in utils/image*, which all still remain
      intact, or minorly-changed to support libimageops.
      
      libimageops->New is the factory-style mechanism to get a libimageops
      that works for your image format or node type.  Once you have a
      libimageops instance, you can invoke normal image logical operations
      (CreateImage, ImageValidate, ImageRelease, et al).  I didn't do every
      single operation (for instance, I haven't yet dealt with image_import
      beyond essentially generalizing DownLoadImage by image format).
      Finally, each libimageops is stateless; another design would have been
      some statefulness for more complicated operations.   You will see that
      CreateImage, for instance, is written in a helper-subclass style that
      blurs some statefulness; however, it was the best match for the existing
      body of code.  We can revisit that later if the current argument-passing
      convention isn't loved.
      
      There are a couple outstanding issues.  Part of the security model here
      is that some utils/image* scripts are setuid, so direct libimageops
      library calls are not possible from a non-setuid context for some
      operations.  This is non-trivial to resolve, and might not be worthwhile
      to resolve any time soon.  Also, some of the scripts write meaningful,
      traditional content to stdout/stderr, and this creates a tension for
      direct library calls that is not entirely resolved yet.  Not hard, just
      only partly resolved.
      
      Note that tbsetup/libimageops_ndz.pm.in is still incomplete; it needs
      imagevalidate support.  Thus, I have not even featurized this yet; I
      will get to that as I have cycles.
      66366489
    • Leigh Stoller's avatar
      Minor changes. · ed2dce21
      Leigh Stoller authored
      ed2dce21
  29. 01 Jun, 2018 1 commit
  30. 31 May, 2018 2 commits
  31. 30 May, 2018 3 commits