1. 08 Aug, 2017 2 commits
  2. 07 Jul, 2017 1 commit
    • Deal with user privs (issue #309): · d1516912
      Leigh B Stoller authored
      * Make user privs work across remote clusters (including stitching). I
        took a severe shortcut on this; I do not expect the Cloudlab portal
        will ever talk to anything but an Emulab-based aggregate, so I just
        added the priv indicator to the user keys array we send over (a rough
        sketch follows this list). If I am ever proved wrong on this, I will
        come out of retirement and fix it (for a nominal fee, of course).
      
      * Do not show the root password for the console to users with user
        privs.
      
      * Make sure users with user privs cannot start experiments.
      
      * Do show the user trust values on the user dashboard membership tab.
      
      * Update tmcd to use the new privs slot in the nonlocal_user_accounts
        table.
      
      This closes issue #309.
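      A rough sketch of the shortcut in the first item above, assuming a
      hypothetical build_user_keys_blob() helper and field names; the real
      portal code and the nonlocal_user_accounts schema may differ:

      # Hypothetical sketch: annotate each entry in the user keys array that
      # the portal ships to a remote (Emulab-based) aggregate with that
      # user's trust level, so the receiving side can record it.
      def build_user_keys_blob(project_members):
          """project_members: list of dicts like
             {"urn": ..., "ssh_keys": [...], "trust": "user" or "local_root"}."""
          blob = []
          for member in project_members:
              blob.append({
                  "urn": member["urn"],
                  "keys": member["ssh_keys"],
                  # The shortcut: ride the priv indicator along with the keys.
                  "privs": member.get("trust", "local_root"),
              })
          return blob

      if __name__ == "__main__":
          members = [
              {"urn": "urn:publicid:IDN+example.net+user+alice",
               "ssh_keys": ["ssh-rsa AAAA... alice"], "trust": "local_root"},
              {"urn": "urn:publicid:IDN+example.net+user+bob",
               "ssh_keys": ["ssh-rsa AAAA... bob"], "trust": "user"},
          ]
          for entry in build_user_keys_blob(members):
              print(entry["urn"], "->", entry["privs"])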
  3. 26 Jun, 2017 1 commit
  4. 30 May, 2017 1 commit
    • Rework how we store the sliver/slice status from the clusters: · e5d36e0d
      Leigh B Stoller authored
      In the beginning, the number and size of experiments were small, and so
      storing the entire slice/sliver status blob as JSON in the web task was
      fine, even though we had to lock tables to prevent races between the
      event updates and the local polling.
      
      But lately the size of those JSON blobs has gotten huge, and the lock is
      bogging things down; we cannot keep up with the number of events coming
      from all the clusters and get really far behind.
      
      So I have moved the status blobs out of the per-instance web task and
      into new tables, one per slice and one per node (sliver). This keeps
      the blobs very small and thus the lock time very short, so now we can
      keep up with the event stream.
      
      If we grow enough that this problem becomes big again, we can switch
      to InnoDB for the per-sliver table and do row locking instead of table
      locking, but I do not think that will happen.
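      A rough sketch of the new layout, in Python with sqlite3 so it runs
      stand-alone; the real tables live in MySQL, and the table and column
      names here are hypothetical. The point is that each sliver's status is
      its own small row, so an event update touches very little data under
      the lock:

      import json
      import sqlite3

      db = sqlite3.connect(":memory:")
      db.executescript("""
      CREATE TABLE slice_status (
          instance_uuid TEXT PRIMARY KEY,
          status        TEXT              -- small per-slice JSON blob
      );
      CREATE TABLE sliver_status (
          instance_uuid TEXT NOT NULL,
          sliver_urn    TEXT NOT NULL,
          status        TEXT,             -- small per-node JSON blob
          PRIMARY KEY (instance_uuid, sliver_urn)
      );
      """)

      def update_sliver(instance, sliver, blob):
          # Each event rewrites one tiny row instead of one giant
          # per-instance webtask blob held under a table lock.
          db.execute("INSERT OR REPLACE INTO sliver_status VALUES (?, ?, ?)",
                     (instance, sliver, json.dumps(blob)))
          db.commit()

      update_sliver("inst-1234", "urn:publicid:IDN+example.net+sliver+node1",
                    {"state": "ready", "frisbee": "done"})
      print(db.execute("SELECT * FROM sliver_status").fetchall())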
  5. 04 May, 2017 1 commit
  6. 17 Apr, 2017 1 commit
  7. 03 Mar, 2017 1 commit
  8. 01 Mar, 2017 1 commit
  9. 28 Feb, 2017 1 commit
  10. 01 Feb, 2017 2 commits
    • Checkpoint the portal side of frisbee events. · 2faf5fd1
      Leigh B Stoller authored
      The igevent_daemon now also forwards frisbee events for slices to the
      Portal pubsubd over the SSL channel.
      
      The aptevent_daemon gets those and adds them to sliverstatus stored in
      the webtask for the instance.
      
      The timeout code in create_instance watches for frisbee events and uses
      them as another indicator of progress (or the lack of it). The hope is
      that we fail sooner, or avoid failing too soon (say, because of a giant
      image-backed dataset).
      
      As an added bonus, the status page will display frisbee progress (image
      name and MB written) in the node status hover popover. I mention this
      because otherwise I would go to my grave without anyone ever noticing
      and giving me a pat on the back or a smiley face in Slack.
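      A minimal sketch of the timeout idea described above, assuming a
      hypothetical get_status() callback rebuilt from the frisbee events that
      aptevent_daemon folds into sliverstatus; the state names and field
      layout are made up:

      import time

      def wait_with_frisbee_progress(get_status, timeout=600, poll_interval=15):
          """get_status() -> ("ready", {}) when the instance is up, otherwise
          ("provisioning", {node_id: MB_written_so_far})."""
          last, deadline = {}, time.time() + timeout
          while time.time() < deadline:
              state, written = get_status()
              if state == "ready":
                  return True
              if any(written.get(n, 0) > last.get(n, 0) for n in written):
                  # Frisbee is still writing an image somewhere (maybe a giant
                  # dataset), so push the deadline out rather than failing.
                  deadline = time.time() + timeout
              last = written
              time.sleep(poll_interval)
          return False    # no progress within the window; give up sooner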
    • Another tweak to frisbee event code. · ff395072
      Leigh B Stoller authored
  11. 31 Jan, 2017 1 commit
  12. 25 Jan, 2017 1 commit
  13. 27 Dec, 2016 1 commit
  14. 19 Dec, 2016 2 commits
  15. 01 Dec, 2016 1 commit
  16. 29 Nov, 2016 1 commit
  17. 13 Nov, 2016 1 commit
  18. 12 Nov, 2016 2 commits
  19. 07 Nov, 2016 2 commits
    • Some work on restarting (rebooting) nodes. Presently, there is a bit of · 18cdfa8b
      Leigh B Stoller authored
      an inconsistency in SliverAction(); when operating on the entire slice
      we do the whole thing in the background, returning (almost) immediately.
      That makes sense; we expect the caller to poll for status afterward.
      
      But when operating on a subset of slivers (nodes), we do it
      synchronously, which means the caller is left waiting until we get
      through rebooting all the nodes. As David pointed out, when rebooting
      nodes in the openstack profile, this can take a long time as the VMs are
      torn down. This leaves the user looking at a spinner modal for a long
      time, which is not a nice UI feature.
      
      So I added a local option to do slivers in the background and return
      immediately. I am doing this for restart and reload at the moment since
      that is primarily what we use from the Portal.
      
      Note that this has to be pushed out to all clusters.
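      A rough sketch of the new background option, assuming hypothetical
      sliver_action() and do_action() helpers; the real code lives in
      SliverAction() and is not Python, but the fork-and-return pattern is
      the same:

      import os
      import time

      def sliver_action(action, node_ids, background=True):
          if not background:
              return do_action(action, node_ids)    # old synchronous path
          pid = os.fork()
          if pid:                                   # parent: return right away;
              return 0                              # the caller polls status later
          try:                                      # child: do the slow work
              do_action(action, node_ids)
          finally:
              os._exit(0)

      def do_action(action, node_ids):
          for node in node_ids:
              print(action, node)
              time.sleep(1)          # stand-in for a slow reboot/teardown
          return 0

      if __name__ == "__main__":
          sliver_action("restart", ["node1", "node2"])
          print("returned immediately; reboots continue in the background")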
  20. 06 Nov, 2016 1 commit
  21. 20 Sep, 2016 1 commit
  22. 11 Jul, 2016 3 commits
  23. 01 Jun, 2016 2 commits
    • Kill debugging. · 5d960d84
      Leigh B Stoller authored
    • Several sets of changes scattered across all these files. · 0f4a4dfb
      Leigh B Stoller authored
      * More on issue #54; watch for openstack experiments and try to download
        the new openstack stats file via the fast XMLRPC path (a rough sketch
        follows this list). Show this as a text blob in a new tab on the
        status page; we still need to graph the data. The apt_daemon handles
        the periodic request for the data (every 10 minutes), which we store
        in the apt_instances table.
      
      * An addition for Rob on the admin extend page: add a "more info" button
        that sends the contents of the text box as an email message requesting
        more info and stores that in the ongoing interaction log. Responses
        from the user are not stored, though; we might look at that someday.
      
      * Another addition for Rob; on the extensions list page, also show
        expired, locked-down experiments. Note the sorting: at the top of the
        list are actual extension requests (status='ready'), while at the
        bottom are those with status='expired'.
      
      * Add a "graphs" tab to the status page, which shows the same idle stats
        graphs that were added to the admin extend page. Most of this change is
        refactoring the code and sharing it between the two pages.
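      A rough sketch of the periodic stats fetch from the first item above;
      the aggregate URL, the GetOpenstackStats method, and the in-memory
      store are hypothetical stand-ins for the real apt_daemon code and the
      apt_instances table:

      import time
      import xmlrpc.client

      POLL_INTERVAL = 10 * 60                  # every 10 minutes
      stats_by_instance = {}                   # stand-in for apt_instances

      def poll_openstack_stats(instances,
                               url="https://boss.example.net:3069/usr/testbed"):
          proxy = xmlrpc.client.ServerProxy(url)
          for uuid in instances:
              try:
                  # Hypothetical method name for the fast XMLRPC path.
                  stats_by_instance[uuid] = proxy.GetOpenstackStats(uuid)
              except (OSError, xmlrpc.client.Error) as err:
                  print("stats fetch failed for", uuid, ":", err)

      if __name__ == "__main__":
          while True:
              poll_openstack_stats(["inst-1234"])
              time.sleep(POLL_INTERVAL)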
  24. 25 May, 2016 1 commit
  25. 18 May, 2016 1 commit
  26. 25 Apr, 2016 1 commit
  27. 12 Apr, 2016 1 commit
  28. 06 Apr, 2016 2 commits
  29. 26 Mar, 2016 1 commit
  30. 01 Mar, 2016 1 commit
    • Some tweaks to credential handling: · 3ebffb34
      Leigh B Stoller authored
      1) Anytime we need to generate a slice credential and the slice has
         expired, bump the slice expiration so we can create a valid
         credential, and then reset the expiration (a rough sketch follows
         this list). Consider the case where the slice expires but we missed
         it and it is still active; we have to be able to control it.
      
      2) From the beginning, we have done almost all RPC operations as the
         creator of the experiment. That made sense when the portal interface
         was not project aware, but now other users in the project can see and
         mess with experiments in their project. We are still doing all the
         RPC operations as the creator of the experiment, which will need to
         change at some point, but in the short term I am seeing a lot of
         credential errors caused by an expired speaks-for credential for that
         creator (if they have not logged into the portal in a while). When
         this happens, let's generate a plain slice credential, issued to the
         SA, so that we can complete the operation. Eventually we have to make
         the backend project aware and issue the operations as the web user
         doing the driving, maybe as part of the larger portalization project.
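      A rough sketch of tweak 1), assuming a hypothetical slice record and
      create_credential() callback; the real code talks to the SA and the
      Emulab database:

      from datetime import datetime, timedelta

      def get_slice_credential(slice_record, create_credential):
          """slice_record: dict with an 'expires' datetime (UTC).
          create_credential(slice_record) -> credential string."""
          now = datetime.utcnow()
          saved = slice_record["expires"]
          bumped = saved <= now
          if bumped:
              # Slice expired but may still be active at the cluster; we need
              # a valid credential to be able to control it (and clean it up).
              slice_record["expires"] = now + timedelta(hours=1)
          try:
              return create_credential(slice_record)
          finally:
              if bumped:
                  slice_record["expires"] = saved    # reset the expiration

      if __name__ == "__main__":
          record = {"urn": "urn:publicid:IDN+example.net+slice+demo",
                    "expires": datetime.utcnow() - timedelta(days=1)}
          cred = get_slice_credential(
              record, lambda s: "cred-until-" + s["expires"].isoformat())
          print(cred, "| expiration restored to", record["expires"])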
  31. 17 Feb, 2016 1 commit