1. 08 Jul, 2019 2 commits
  2. 06 Jun, 2019 1 commit
  3. 23 May, 2019 1 commit
    • Leigh Stoller's avatar
      Changed related to parameter sets and experiment bindings: · 03e4d8bc
      Leigh Stoller authored
      * Show the parameter bindings on the status page for an experiment, and
        on the memlane page. This is strictly informational so that users can
        quickly see the parameters that are/were chosen at the time the
        experiment was created.
      
      * Add a Save Parameters button on the memlane and status pages. This
        will generate a json structure and store it in the DB for that profile
        and user. Optionally, mark the parameter set as specific to a profile
        version or repo hash, so a user can quickly link to that version/hash
        and apply the parameter set.
      
      * On the instantiate page, the parameters step include new buttons to
        1) reset the form to default, 2) apply the parameters used in the most
        recent experiment (current, then history), 3) choose from a dropdown
        of parameters the users has saved for that profile, and 4) take the
        user to their activation history for the profile, to pick one to run
        again or save parameters.
      
      * Add a new tab to the user dashboard to show the user's saved parameter
        sets.
      
      * Lots of changes to the new version of the ppwizard for apply
        parameter sets and showing warnings about them. This code has NOT been
        applied to the old ppwizard.
      03e4d8bc
  4. 02 Apr, 2019 1 commit
  5. 27 Mar, 2019 1 commit
  6. 18 Mar, 2019 1 commit
  7. 17 Mar, 2019 1 commit
  8. 14 Mar, 2019 1 commit
  9. 13 Mar, 2019 3 commits
  10. 06 Mar, 2019 2 commits
  11. 19 Feb, 2019 1 commit
  12. 12 Feb, 2019 1 commit
    • Leigh Stoller's avatar
      Recovery mode: · bde6c94d
      Leigh Stoller authored
      * Add a new Portal context menu option to nodes, to boot into "recovery"
        mode, which will be a Linux MFS (rather then the FreeBSD MFS, which
        99% of user will not know what to do with).
      
      * Plumb all through to the Geni RPC interface, which invokes node_admin
        with a new option, to use the recovery mfs nodetype attribute.
      
      * recoverymfs_osid is a distinct osid from adminmfs_osid, we use that in
        the CM to add an Emulab name space attribute to the manifest, that
        tells the Portal that a node supports recovery mode (and thus gets a
        context menu option).
      
      * Add an inrecovery flag to the sliver status blob, which the Portal
        uses to determine that a node is currently in recovery mode, so that
        we can indicate that in the topology and list tabs.
      bde6c94d
  13. 28 Sep, 2018 1 commit
  14. 08 Aug, 2018 1 commit
    • Leigh Stoller's avatar
      Big set of changes for deferred/scheduled/offline aggregates: · 6f17de73
      Leigh Stoller authored
      * I started out to add just deferred aggregates; those that are offline
        when starting an experiment (and marked in the apt_aggregates table as
        being deferable). When an aggregate is offline, we add an entry to the
        new apt_deferred_aggregates table, and periodically retry to start the
        missing slivers. In order to accomplish this, I split create_instance
        into two scripts, first part to create the instance in the DB, and the
        second (create_slivers) to create slivers for the instance. The daemon
        calls create_slivers for any instances in the deferred table, until
        all deferred aggregates are resolved.
      
        On the UI side, there are various changes to deal with allowing
        experiments to be partially create. For example used to wait till we
        have all the manifests until showing the topology. Now we show the
        topo on the first manifest, and then add them as they come in. Various
        parts of the UI had to change to deal with missing aggregates, I am
        sure I did not get them all.
      
      * And then once I had that, I realized that "scheduled" experiments was
        an "easy" addition, its just a degenerate case of deferred. For this I
        added some new slots to the tables to hold the scheduled start time,
        and added a started stamp so we can distinguish between the time it
        was created and the time it was actually started. Lots of data.
      
        On the UI side, there is a new fourth step on the instantiate page to
        give the user a choice of immediate or scheduled start. I moved the
        experiment duration to this step. I was originally going to add a
        calendar choice for termination, but I did not want to change the
        existing 16 hour max duration policy, yet.
      6f17de73
  15. 07 Aug, 2018 1 commit
  16. 16 Jul, 2018 1 commit
    • Leigh Stoller's avatar
      Image handling changes: · fe8cc493
      Leigh Stoller authored
      1. The primary change is to the Create Image modal; we now allow users
         to optionally specify a description for the image. This needed to be
         plumbed through all the way to the GeniCM CreateImage() API. Since
         the modal is getting kinda overloaded, I rearranged things a bit and
         changed the argument checking and error handling. I think this is the
         limit of what we want to do on this modal, need a better UI in the
         future.
      
      2. Of course, if we let users set descriptions, lets show them on the
         image listing page. While I was there, I made the list look more like
         the classic image list; show the image name and project, and put the
         URN in a tooltip, since in general the URN is noisy to look at.
      
      3. And while I was messing with the image listing, I noticed that we
         were not deleting profiles like we said we would. The problem is that
         when we form the image list, we know the profile versions that can be
         deleted, but when the user actually clicks to delete, I was trying to
         regen that decision, but without asking the cluster for the info
         again. So instead, just pass through the version list from the web
         UI.
      fe8cc493
  17. 18 Jun, 2018 1 commit
  18. 04 Jun, 2018 1 commit
    • David Johnson's avatar
      Docker server-side core, esp new libimageops support for Docker images. · 66366489
      David Johnson authored
      The docker VM server-side goo is mostly identical to Xen, with slightly
      different handling for parent images.  We also support loading external
      Docker images (i.e. those without a real imageid in our DB; in that
      case, user has to set a specific stub image, and some extra per-vnode
      metadata (a URI that points to a Docker registry/image repo/tag);
      the Docker clientside handles the rest.
      
      Emulab Docker images map to a Emulab imageid:version pretty seamlessly.
      For instance, the Emulab `emulab-ops/docker-foo-bar:1` image would map
      to `<local-registry-URI>/emulab-ops/emulab-ops/docker-foo-bar:1`; the
      mapping is `<local-registry-URI>/pid/gid/imagename:version`.  Docker
      repository names are lowercase-only, so we handle that for the user; but
      I would prefer that users use lowercase Emulab imagenames for all Docker
      images; that will help us.  That is not enforced in the code; it will
      appear in the documentation, and we'll see.
      
      Full Docker imaging relies on several other libraries
      (https://gitlab.flux.utah.edu/emulab/pydockerauth,
      https://gitlab.flux.utah.edu/emulab/docker-registry-py).  Each
      Emulab-based cluster must currently run its own private registry to
      support image loading/capture (note however that if capture is
      unnecessary, users can use the external images path instead).  The
      pydockerauth library is a JWT token server that runs out of boss's
      Apache and implements authn/authz for the per-Emulab Docker registry
      (probably running on ops, but could be anywhere) that stores images and
      arbitrates upload/download access.  For instance, nodes in an experiment
      securely pull images using their pid/eid eventkey; and the pydockerauth
      emulab authz module knows what images the node is allowed to pull
      (i.e. sched_reloads, the current image the node is running, etc).  Real
      users can also pull images via user/pass, or bogus user/pass + Emulab
      SSL cert.  GENI credential-based authn/z was way too much work, sadly.
      There are other auth/z paths (i.e. for admins, temp tokens for secure
      operations) as well.
      
      As far as Docker image distribution in the federation, we use the same
      model as for regular ndz images.  Remote images are pulled in to the
      local cluster's Docker registry on-demand from their source cluster via
      admin token auth (note that all clusters in the federation have
      read-only access to the entire registries of any other cluster in the
      federation, so they can pull images).  Emulab imageid handling is the
      same as the existing ndz case.  For instance, image versions are lazily
      imported, on-demand; local version numbers may not match the remote
      image source cluster's version numbers.  This will potentially be a
      bigger problem in the Docker universe; Docker users expect to be able to
      reference any image version at any time anywhere.  But that is of course
      handleable with some ex post facto synchronization flag day, at least
      for the Docker images.
      
      The big new thing supporting native Docker image usage is the guts of a
      refactor of the utils/image* scripts into a new library, libimageops;
      this is necessary to support Docker images, which are stored in their
      own registry using their own custom protocols, so not amenable to our
      file-based storage.  Note: the utils/image* scripts currently call out
      to libimageops *only if* the image format is docker; all other images
      continue on the old paths in utils/image*, which all still remain
      intact, or minorly-changed to support libimageops.
      
      libimageops->New is the factory-style mechanism to get a libimageops
      that works for your image format or node type.  Once you have a
      libimageops instance, you can invoke normal image logical operations
      (CreateImage, ImageValidate, ImageRelease, et al).  I didn't do every
      single operation (for instance, I haven't yet dealt with image_import
      beyond essentially generalizing DownLoadImage by image format).
      Finally, each libimageops is stateless; another design would have been
      some statefulness for more complicated operations.   You will see that
      CreateImage, for instance, is written in a helper-subclass style that
      blurs some statefulness; however, it was the best match for the existing
      body of code.  We can revisit that later if the current argument-passing
      convention isn't loved.
      
      There are a couple outstanding issues.  Part of the security model here
      is that some utils/image* scripts are setuid, so direct libimageops
      library calls are not possible from a non-setuid context for some
      operations.  This is non-trivial to resolve, and might not be worthwhile
      to resolve any time soon.  Also, some of the scripts write meaningful,
      traditional content to stdout/stderr, and this creates a tension for
      direct library calls that is not entirely resolved yet.  Not hard, just
      only partly resolved.
      
      Note that tbsetup/libimageops_ndz.pm.in is still incomplete; it needs
      imagevalidate support.  Thus, I have not even featurized this yet; I
      will get to that as I have cycles.
      66366489
  19. 20 Feb, 2018 1 commit
  20. 16 Feb, 2018 2 commits
    • Leigh Stoller's avatar
    • Leigh Stoller's avatar
      A lot of work on the RPC code, among other things. · 56f6d601
      Leigh Stoller authored
      I spent a fair amount of improving error handling along the RPC path,
      as well making the code more consistent across the various files. Also
      be more consistent in how the web interface invokes the backend and gets
      errors back, specifically for errors that are generated when taking to a
      remote cluster.
      
      Add checks before every RPC to make sure the cluster is not disabled in
      the database. Also check that we can actually reach the cluster, and
      that the cluster is not offline (NoLogins()) before we try to do
      anything. I might have to relax this a bit, but in general it takes a
      couple of seconds to check, which is a small fraction of what most RPCs
      take. Return precise errors for clusters that are not available, to the
      web interface and show them to user.
      
      Use webtasks more consistently between the web interface and backend
      scripts. Watch specifically for scripts that exit abnormally (exit
      before setting the exitcode in the webtask) which always means an
      internal failure, do not show those to users.
      
      Show just those RPC errors that would make sense users, stop spewing
      script output to the user, send it just to tbops via the email that is
      already generated when a backend script fails fatally.
      
      But do not spew email for clusters that are not reachable or are
      offline. Ditto for several other cases that were generating mail to
      tbops instead of just showing the user a meaningful error message.
      
      Stop using ParRun for single site experiments; 99% of experiments.
      
      For create_instance, a new "async" mode that tells CreateSliver() to
      return before the first mapper run, which is typically very quickly.
      Then watch for errors or for the manifest with Resolve or for the slice
      to disappear. I expect this to be bounded and so we do not need to worry
      so much about timing this wait out (which is a problem on very big
      topologies). When we see the manifest, the RedeemTicket() part of the
      CreateSliver is done and now we are into the StartSliver() phase.
      
      For the StartSliver phase, watch for errors and show them to users,
      previously we mostly lost those errors and just sent the experiment into
      the failed state. I am still working on this.
      56f6d601
  21. 22 Jan, 2018 3 commits
  22. 01 Jan, 2018 1 commit
  23. 11 Dec, 2017 2 commits
    • Leigh Stoller's avatar
      Minor wording change. · 0e063f87
      Leigh Stoller authored
      0e063f87
    • Leigh Stoller's avatar
      Add extension limiting to manage_extensions and request extension paths. · f71d7d95
      Leigh Stoller authored
      The limit is the number of hours since the experiment is created, so a
      limit of 10 days really just means that experiments can not live past 10
      days. I think this makes more sense then anything else. There is an
      associated flag with extension limiting that controls whether the user
      can even request another extension after the limit. The normal case is
      that the user cannot request any more extensions, but when set, the user
      is granted no free time and goes through need admin approval path.
      
      Some changes to the email, so that both the user and admin email days
      how many days/hours were both requested and granted.
      
      Also UI change; explicitly tell the user when extensions are disabled,
      and also when no time is granted (so that the users is more clearly
      aware).
      f71d7d95
  24. 04 Dec, 2017 2 commits
    • Leigh Stoller's avatar
      Extension policy changes: · bd7d9d05
      Leigh Stoller authored
      * New tables to store policies for users and projects/groups. At the
        moment, there is only one policy (with associated reason); disabled.
        This allows us to mark projects/groups/users with enable/disable
        flags. Note that policies are applied consecutively, so you can
        disable extensions for a project, but enable them for a user in that
        project.
      
      * Apply extensions when experiments are created, send mail to the audit
        log when policies cause extensions to be disabled.
      
      * New driver script (manage_extensions) to change the policy tables.
      bd7d9d05
    • Leigh Stoller's avatar
      Changes related to extensions: · e1b6076f
      Leigh Stoller authored
      * Change the units of extension from days to hours along the extension
        path. The user does not see this directly, but it allows us to extend
        experiments to the hour before they are needed by a different
        reservation, both on the user extend modal and the admin extend modal.
      
        On the admin extend page, the input box still defaults to days, but
        you can also use xDyH to specify days and hours. Or just yH for just
        hours.
      
        But to make things easier, there is also a new "max" checkbox to
        extend an experiment out to the maximum allowed by the reservation
        system.
      
      * Changes to "lockout" (disabling extensions). Add a reason field to the
        database, clicking the lockout checkbox will prompt for an optional
        reason.
      
        The user no longer sees the extension modal when extensions are
        disabled, we show an alert instead telling them extensions are
        disabled, and the reason.
      
        On the admin extend page there is a new checkbox to disable extensions
        when denying an extension or scheduling termination.
      
        Log extension disable/enable to the audit log.
      
      * Clear out a bunch of old extension code that is no longer used (since
        the extension code was moved from php to perl).
      e1b6076f
  25. 20 Nov, 2017 1 commit
  26. 03 Nov, 2017 1 commit
    • Leigh Stoller's avatar
      Fixes/Changes for reservations: · 79d99fa8
      Leigh Stoller authored
      1. Fix the user extend modal to show the proper number of days they can
         extend.
      
      2. Fix the admin extend modal warning when the extension would violate
         max extension, it was not showing. Add new alerts when we cannot get
         max extension from the cluster or no extension at all allowed.
      
      3. Reduce number of days in the box to max allowed. Warn loudly if you
         type a different number and its greater then max extension.
      
      4. Add "force" box to override. Use with caution. Added the plumbing
         through to the back end as new force option to RenewSliver().
      
      5. Add check in RenewSliver() to ask the reservation system if extension
         allowed before doing it. This was missing, should solve some of the
         over book problems.
      79d99fa8
  27. 13 Oct, 2017 1 commit
    • Leigh Stoller's avatar
      Changes for automatic lockdown of experiments: · 8f4e3191
      Leigh Stoller authored
      1. First off, we no longer do automatic lockdown of experiments when
         granting an extension longer then 10 days.
      
      2. Instead, we will lockdown experiments on case by case basis.
      
      3. Changes to the lockdown path that ask the reservation system at the
         target cluster if locking down would throw the reservation system
         into chaos. If so, return a refused error and give admin the choice
         to override. When we do override, send email to local tbops informing
         that the reservation system is in chaos state.
      8f4e3191
  28. 04 Oct, 2017 1 commit
  29. 08 Aug, 2017 2 commits
  30. 07 Jul, 2017 1 commit
    • Leigh Stoller's avatar
      Deal with user privs (issue #309): · d1516912
      Leigh Stoller authored
      * Make user privs work across remote clusters (including stitching). I
        took a severe shortcut on this; I do not expect the Cloudlab portal
        will ever talk to anything but an Emulab based aggregate, so I just
        added the priv indicator to the user keys array we send over. If I am
        ever proved wrong on this, I will come out of retirement and fix
        it (for a nominal fee of course).
      
      * Do not show the root password for the console to users with user
        privs.
      
      * Make sure users with user privs cannot start experiments.
      
      * Do show the user trust values on the user dashboard membership tab.
      
      * Update tmcd to use the new privs slot in the nonlocal_user_accounts
        table.
      
      This closes issue #309.
      d1516912