    Leigh B Stoller authored · bc8afd40
      Anytime the state for a slice or sliver changes, inject a geni style event
      into the local event stream. These events differ from normal emulab
      events in that the SITE is set to the URN of the aggregate, and there is
      a JSON representation of the slice/sliver status. There are other fields as
      well that are not in normal emulab events. These events can safely mix with
      emulab events on the local boss; nothing will care about them. But they
      will get forwarded to pubsubd at the portal if CLUSTER_PORTAL is set in the
      defs file.
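As a rough illustration of the event shape described above, here is a minimal sketch. Only SITE, the JSON status payload, and CLUSTER_PORTAL come from the commit message; the function names, the "slice" field, and the example URNs are hypothetical, and the real event format is not shown here.

```python
import json


def make_geni_event(aggregate_urn, slice_urn, status):
    """Build a dict standing in for a geni style event (hypothetical layout)."""
    return {
        "SITE": aggregate_urn,  # URN of the aggregate, as the message describes
        "slice": slice_urn,     # hypothetical extra field, not a real field name
        "status": json.dumps(status, sort_keys=True),  # JSON slice/sliver status
    }


def should_forward(defs):
    """Forward to the portal's pubsubd only when CLUSTER_PORTAL is set."""
    return bool(defs.get("CLUSTER_PORTAL"))


event = make_geni_event(
    "urn:publicid:IDN+example.net+authority+cm",
    "urn:publicid:IDN+example.net+slice+demo",
    {"state": "ready", "slivers": {"node1": "ready"}},
)
```

Keeping the status as a JSON string inside the event means consumers that do not understand it can pass it along untouched.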
    Leigh B Stoller authored · 5bd9ad1a
      Add cancel support. The idea is that a DeleteSlice() with our internal
      cancel option will stop a CreateSliver() in its tracks. We stop the
      monitor, then clean up the slice. I also added an optimization for tearing
      down large numbers of VMs on shared nodes; previously we were doing them
      one at a time. Note that only the Portal is going to use this option, since
      it loosely depends on code in the XEN clientside (described in another
    Leigh B Stoller authored · 0c653722
      Change how all geni images are imported by url; always import as geniuser
      and always import into the GeniSlices project. Previously, images were
      being imported into the project of the slice experiment, by the geniuser.
      When PROTOGENI_LOCALUSER is turned off, this change does not affect
      anything, since it is still geniuser doing the import, and all imported
      images are considered global and thus usable across projects. So where we
      stick the image is not really important, but putting all geni imported
      images in one place is more convenient (it sure makes them easier to find).
      More importantly, this change is backwards compatible with existing imports.
      Later, if the source image is updated, and a new user (in another project)
      uses that image, the update (pulling the updated image across) is done by
      geniuser (who is the leader of all geni holding projects), who has write
      access to the image whatever project it is in.
      What about when PROTOGENI_LOCALUSER is turned on? There are actually two
      sub cases here.
      1. The user is using an aggregate in a different domain than their SA. Say,
         when a Cloudlab Portal user is creating an experiment at the Clemson
         cluster (which has PROTOGENI_LOCALUSER=1). In this case, Clemson does
         not know anything about the user anyway, and so it is pretty much like
         the case described above, since everything is done by the geniuser in
         holding projects owned by the geniuser.
      2. The user is using the same aggregate as their SA. Say, when a Cloudlab
         Portal user is creating an experiment at the Emulab cluster. In this
         case Emulab knows the user and project, and everything is done as that
         user in the actual project (there is no geni holding project).
         If we import the image into that project as the actual user, we are okay
         at first; as above, all images are global and cross-project, so anyone
         can use it. But what if the source image changes and then a different
         user in a different project tries to use it? The backend is going to try
         to import the new version, but that fails because the current user does
         not have write access to the image.
         Hence the real reason for this change: if we always import into
         GeniSlices as geniuser, we do not get into this permission problem.
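The permission problem in sub-case 2 can be sketched with a toy model. The names geniuser and GeniSlices come from the message above; the functions and the single-owner write check are simplifying assumptions, not the backend's actual access logic.

```python
GENI_USER = "geniuser"
GENI_PROJECT = "GeniSlices"


def import_target(requesting_user, slice_project):
    """Return the (user, project) that a URL import runs as.

    The arguments are deliberately ignored: every import lands in one
    project under one account, whoever asked for it.
    """
    return GENI_USER, GENI_PROJECT


def can_update(image_owner, updater):
    """Toy write check (assumption): only the importing account may update."""
    return image_owner == updater


# Old behaviour: the first real user owns the image, so a later user in
# another project cannot pull the updated version.
old_owner = "alice"
blocked = not can_update(old_owner, "bob")

# New behaviour: the same account performs the import and every later update.
first_importer, _ = import_target("alice", "projA")
later_updater, _ = import_target("bob", "projB")
allowed = can_update(first_importer, later_updater)
```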
    Leigh B Stoller authored · c3339c9d
      Add AddNodes and DeleteNodes, which are convenience functions for the HPC
      AddNodes($slice_urn, $credentials, $nodes):
      The "nodes" argument is a hash that looks like:
        {"node45" : {"diskimage" : "urn...",
                     "startup"   : "/bin/echo",
                     "tarballs"  : ["tarball1", "tarball2", ...],
                     "lans"      : ["lan1", "lan2", ...],
                     "node"      : "pc189"},
         "nodeXX" : {...}}
      DeleteNodes($slice_urn, $credentials, $nodes):
      The "nodes" argument is a list like:
        ["node45", ...]
      Any node can be deleted, but it is not yet clear what happens if all the
      nodes of a lan are removed. I probably need to do some work there, but
      David can start with this.
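A caller might sanity-check its arguments before invoking these functions. The sketch below assumes the five keys shown in the example are the only ones accepted, which the message does not actually state. The helper names are hypothetical. The second helper flags the open question raised above: lans that would lose all of their nodes.

```python
# Assumption: the keys in the example hash are the complete set.
ALLOWED_KEYS = {"diskimage", "startup", "tarballs", "lans", "node"}


def check_add_nodes_arg(nodes):
    """Return a list of problems found in an AddNodes "nodes" hash."""
    problems = []
    for name, spec in nodes.items():
        unknown = set(spec) - ALLOWED_KEYS
        if unknown:
            problems.append(f"{name}: unknown keys {sorted(unknown)}")
        for key in ("tarballs", "lans"):
            if key in spec and not isinstance(spec[key], list):
                problems.append(f"{name}: {key} must be a list")
    return problems


def lans_left_empty(topology, doomed):
    """Name the lans that would lose every member if `doomed` were deleted."""
    members = {}
    for node, spec in topology.items():
        for lan in spec.get("lans", []):
            members.setdefault(lan, set()).add(node)
    doomed = set(doomed)
    return sorted(lan for lan, nodes in members.items() if nodes <= doomed)


topology = {
    "node45": {"lans": ["lan1"], "node": "pc189"},
    "node46": {"lans": ["lan1", "lan2"]},
}
```

Here deleting "node46" alone would empty "lan2", which is the case whose behavior is not yet defined.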
    Leigh B Stoller authored · 4fa9d2ea
      Add a new table image_boot_status to record boot success/failure each time
      an image is loaded on a node. We want to know both success and failure over
      time so that we can determine when an image works or does not work on a
      particular node/type. This is primarily for the image tracker to determine
      what images work on what node types, but might be useful in other
      situations. I realize this duplicates some info we already have in the
      image_history table, but that does not record failure, only success, and it
      is mostly concerned with who is using what images.
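To show the kind of question such a table can answer, here is a small sketch using an in-memory SQLite database. Only the table name comes from the message; the column names, types, and helper functions are assumptions, and the real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE image_boot_status (
        imageid   INTEGER NOT NULL,  -- hypothetical column names throughout
        node_type TEXT    NOT NULL,
        node_id   TEXT    NOT NULL,
        stamp     INTEGER NOT NULL,
        success   INTEGER NOT NULL   -- 1 = booted, 0 = failed
    )
    """
)


def record_boot(imageid, node_type, node_id, stamp, success):
    """Record one boot attempt, whether it succeeded or failed."""
    conn.execute(
        "INSERT INTO image_boot_status VALUES (?, ?, ?, ?, ?)",
        (imageid, node_type, node_id, stamp, int(success)),
    )


def success_rate(imageid, node_type):
    """Fraction of successful boots, or None when there is no history."""
    avg, count = conn.execute(
        "SELECT AVG(success), COUNT(*) FROM image_boot_status "
        "WHERE imageid = ? AND node_type = ?",
        (imageid, node_type),
    ).fetchone()
    return avg if count else None
```

Recording failures alongside successes is what makes a rate computable at all; a success-only history cannot distinguish "never tried" from "always fails".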
    Leigh B Stoller authored · 850b5ab7
      Latest attempt to improve vnode booting. See below.
      1. Change hackwaitandexit on the client to return zero if the guest
         has not finished setting up. We used to treat anything longer than
         30 seconds as a failure, but this is really not the case, especially
         on busy machines.
      2. Fix up vnode_setup exit code handling; we were losing non-zero
         status because we were not shifting it down, and so failures were
         never being reported.
         New: If the vnode setup does return failure, set its event state to
         TBFAILED to cut short the wait in os_setup and the IG monitor
         process. On the surface this seems like an obviously good idea, but
         I'm sure it will come and bite me when I least expect it.
      3. Change GeniAggregate Start/Restart to ignore vnode_setup failures,
         and let the monitor watch for TBFAILED or timeout. There are just
         too many ways for it to fail, and we want to allow vnodes that did
         not fail to set up normally, and give the user the choice to
         restart the ones that failed.
      4. Don't let frisbee run forever; protect it with a timeout. I need to use
         Mike's new -T option, but not till I actually get new frisbee
         pushed out.
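Items 2 and 4 touch two general pitfalls that are easy to show in isolation. The sketch below is not the vnode_setup or frisbee code. It only illustrates, under the usual POSIX convention, how an unshifted wait status can hide a failure, and how a child process can be bounded by a timeout. The function names are hypothetical.

```python
import subprocess
import sys


def exit_code_from_wait_status(status):
    """Decode a 16-bit POSIX wait status into the child's exit code.

    The exit code lives in the high byte. A child that exited with 1
    produces status 256, whose low byte is 0, so truncating the raw
    status to a byte makes the failure look like success.
    """
    return (status >> 8) & 0xFF


def run_with_timeout(argv, seconds):
    """Run a command; return its exit code, or None if it timed out."""
    try:
        return subprocess.run(argv, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return None
```

For example, `run_with_timeout([sys.executable, "-c", "import sys; sys.exit(3)"], 10)` reports the child's own exit code, while a command that outlives its limit yields None, so the caller can tell "failed" from "hung".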
    Leigh B Stoller authored · cfd1974a
      I added two new actions to PerformOperationalAction, which appear to
      work fine when the nodes are behaving themselves.
      1) geni_update_users: Takes a slice credential and a keys argument. Can
        only be invoked when the sliver is in the started/geni_ready state.
        Moves the slice to the geni_updating_users state until all of the
        nodes have completed the update, at which time the sliver moves back
        to started/geni_ready.
      2) geni_updating_users_cancel: We can assume that some nodes will be wacky
        and will not perform the update when told to. This cancels the
        update and moves the sliver back to started/geni_ready.
      A couple of notes:
      * The current emulab node update time is about three minutes; the
        sliver is in this new state for that time and cannot be restarted or
        stopped. It can of course be deleted.
      * Should we allow restart while in the updating phase? We could, but
        then I need more bookkeeping.
      * Some nodes might not be running the watchdog, or might not even be
        an emulab image, so the operation will never end unless it is
        canceled. I could add a timeout, but that will require a monitor or
        adding DB state to store the start time.
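The state transitions described above can be modeled as a small state machine. This is a sketch, not the aggregate's implementation. The class, its methods, and the per-node bookkeeping are hypothetical, and the "started" half of the started/geni_ready state is left out for brevity.

```python
READY = "geni_ready"
UPDATING = "geni_updating_users"


class Sliver:
    """Toy model of a sliver's operational state (hypothetical class)."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.pending = set()
        self.state = READY

    def perform(self, action):
        if action == "geni_update_users":
            if self.state != READY:
                raise ValueError("sliver must be in geni_ready")
            self.pending = set(self.nodes)
            # With no nodes to wait for, the update finishes immediately.
            self.state = UPDATING if self.pending else READY
        elif action == "geni_updating_users_cancel":
            if self.state != UPDATING:
                raise ValueError("no update in progress")
            self.pending.clear()
            self.state = READY
        elif action in ("geni_restart", "geni_stop"):
            # Restart and stop are refused while an update is running.
            if self.state == UPDATING:
                raise ValueError("sliver is updating users")
        else:
            raise ValueError("unknown action: " + action)

    def node_done(self, node):
        """Called when one node reports that it finished the update."""
        self.pending.discard(node)
        if self.state == UPDATING and not self.pending:
            self.state = READY
```

A node that never reports leaves the sliver in geni_updating_users until the cancel action is used, which matches the missing-timeout caveat in the notes.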
    Leigh B Stoller authored · 8d53b3fd
      Implement speaksfor (non-abac) support.
      CM V2 (and thus the AM) now accepts a type=speaksfor credential along
      with regular credentials. When supplied, the speaksfor caller must be
      equal to the owner of the speaksfor credential and the target must be
      equal to the owner of the regular credential(s). All operations take
      place in the context of the spokenfor user.
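The two equality checks above can be written out as a short sketch. The Credential class and its field names are hypothetical stand-ins for the real credential objects. Signature and expiry verification are omitted, and rejecting more than one speaksfor credential is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    type: str    # "speaksfor", or a regular type such as "slice"
    owner: str   # URN of the credential's owner
    target: str  # URN the credential is about


def effective_user(caller_urn, credentials):
    """Return the URN whose context the operation should run in."""
    speaksfor = [c for c in credentials if c.type == "speaksfor"]
    regular = [c for c in credentials if c.type != "speaksfor"]
    if not speaksfor:
        return caller_urn  # ordinary call, nothing to check here
    if len(speaksfor) != 1:
        raise PermissionError("expected exactly one speaksfor credential")
    sf = speaksfor[0]
    if sf.owner != caller_urn:
        raise PermissionError("caller does not own the speaksfor credential")
    if not regular or any(c.owner != sf.target for c in regular):
        raise PermissionError("regular credentials not owned by the target")
    return sf.target  # the spoken-for user


STOLLER = "urn:publicid:IDN+emulab.net+user+stoller"
LEEBEE = "urn:publicid:IDN+emulab.net+user+leebee"
SLICE = "urn:publicid:IDN+emulab.net+slice+demo"

creds = [
    Credential("speaksfor", owner=STOLLER, target=LEEBEE),
    Credential("slice", owner=LEEBEE, target=SLICE),
]
```

With these credentials, a call made by stoller runs as leebee, while the same credentials presented by any other caller are rejected.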
      Added speaksfor slots to geni_slices, geni_aggregates and geni_tickets.
      Also to the history table. But these are just the most recent data.
      Each transaction is logged as normal, and the metadata now includes
      the speaksfor data and the log always includes all of the credentials.
      For testing, there is a new script in the scripts directory to
      generate a speaksfor credential. Not installed since it is really
      a hack. But to create one:
        perl genspeaksfor urn:publicid:IDN+emulab.net+user+leebee \
      which generates a speaksfor credential that says stoller is speaking
      for leebee.
      Given a slice credential issued to leebee, the test scripts can be
      invoked as follows (by stoller):
        createsliver.py -S speaksfor.cred -s slice.cred -c leebee.cred
      A copy of leebee's self credential is needed simply because of the test
      script's desire to talk to the SA (which does not support speaksfor).
      Not otherwise needed.
      Oh, not tested on the AM interface yet.
