1. 20 Jun, 2017 1 commit
  2. 12 Jun, 2017 1 commit
  3. 06 Jun, 2017 2 commits
  4. 05 Jun, 2017 1 commit
  5. 30 May, 2017 1 commit
      Rework how we store the sliver/slice status from the clusters: · e5d36e0d
      Leigh B Stoller authored
      In the beginning, the number and size of experiments were small, and
      so storing the entire slice/sliver status blob as json in the web
      task was fine, even though we had to lock tables to prevent races
      between the event updates and the local polling.
      
      But lately the size of those json blobs has gotten huge, and the lock
      is bogging things down; we cannot keep up with the number of events
      coming from all the clusters and fall really far behind.
      
      So I have moved the status blobs out of the per-instance web task and
      into new tables, one per slice and one per node (sliver). This keeps
      the blobs very small and thus the lock time very short, so now we can
      keep up with the event stream.
      
      If we grow enough that this becomes a problem again, we can switch to
      innodb for the per-sliver table and do row locking instead of table
      locking, but I do not think that will happen.
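      
      A minimal sketch of what the split might look like, assuming MyISAM
      tables (table-level locking, as the commit implies); the table and
      column names are illustrative guesses, not the actual schema:
      
        -- One small row per slice; an update locks only this tiny table.
        CREATE TABLE apt_instance_slice_status (
            uuid    varchar(40) NOT NULL,  -- instance (slice) identifier
            status  varchar(32) NOT NULL,  -- e.g. 'provisioned', 'ready'
            updated datetime    NOT NULL,
            PRIMARY KEY (uuid)
        ) ENGINE=MyISAM;
      
        -- One small row per node (sliver) in the slice.
        CREATE TABLE apt_instance_sliver_status (
            uuid    varchar(40) NOT NULL,  -- owning slice
            node_id varchar(32) NOT NULL,  -- the sliver's node
            status  varchar(32) NOT NULL,
            updated datetime    NOT NULL,
            PRIMARY KEY (uuid, node_id)
        ) ENGINE=MyISAM;
      
      Because each row and each table stay small, the table lock is held
      only briefly; moving the per-sliver table to innodb would simply
      trade that table lock for a row lock.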
  6. 16 May, 2017 1 commit
  7. 04 May, 2017 1 commit
  8. 02 May, 2017 1 commit
      Speed up the instantiate page response time, it was taking forever! · af8cc34f
      Leigh B Stoller authored
      1. Okay, 10-15 seconds for me, which is the same as forever.
      
      2. Do not sort in PHP; sort in javascript and let the client burn
         those cycles instead of poor, overworked boss.
      
      3. Store global lastused/usecount in the apt_profiles table so that
         we do not have to compute it every time for each profile.
      
      4. Compute the user's lastused/usecount for each profile in a single
         query and create a local array; this cuts out 100s of queries (see
         the sketch after this list).
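      
      A rough sketch of what that single aggregate query could look like;
      the history table and its column names are assumptions made for
      illustration, not the actual schema:
      
        -- One query for all of a user's profiles, instead of one query
        -- per profile; the result is loaded into a local array in PHP,
        -- keyed by profile.
        SELECT profile_id,
               COUNT(*)     AS usecount,
               MAX(created) AS lastused
          FROM apt_instance_history
         WHERE creator_uid = 'someuser'   -- the logged-in user
         GROUP BY profile_id;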
  9. 19 Apr, 2017 1 commit
  10. 17 Apr, 2017 1 commit
  11. 22 Mar, 2017 1 commit
  12. 17 Mar, 2017 1 commit
  13. 07 Mar, 2017 2 commits
  14. 03 Mar, 2017 1 commit
      Allow the gid in a group_policy to be '*', for example: · f61bab43
      Leigh B Stoller authored
      +---------+---------+---------+---------+--------+---------------+--------+
      | pid     | gid     | pid_idx | gid_idx | policy | auxdata       | count  |
      +---------+---------+---------+---------+--------+---------------+--------+
      | testbed | *       |   10345 |       0 | type   | d430          |     10 |
      +---------+---------+---------+---------+--------+---------------+--------+
      
      which says to apply the policy to all subgroups, using the current
      count for the project.
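      
      A hypothetical lookup honoring the wildcard; the table and column
      names follow the row shown above, but the exact query is a guess:
      
        -- A policy row whose gid is '*' applies to every subgroup of
        -- the project, so match either the specific group or '*'.
        SELECT policy, auxdata, count
          FROM group_policies
         WHERE pid = 'testbed'
           AND (gid = 'somegroup' OR gid = '*');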
  15. 27 Feb, 2017 1 commit
  16. 22 Feb, 2017 1 commit
  17. 10 Feb, 2017 2 commits
  18. 06 Feb, 2017 1 commit
  19. 27 Jan, 2017 1 commit
  20. 19 Jan, 2017 2 commits
  21. 09 Jan, 2017 2 commits
  22. 06 Jan, 2017 2 commits
  23. 04 Jan, 2017 3 commits
  24. 27 Dec, 2016 1 commit
  25. 19 Dec, 2016 3 commits
  26. 15 Dec, 2016 1 commit
  27. 08 Dec, 2016 1 commit
  28. 07 Dec, 2016 1 commit
  29. 12 Nov, 2016 2 commits
      Minor tweak to make schemacheck happy. · 459fce68
      Leigh B Stoller authored
      Bring the cluster monitor "in-house", rather than depending on the jfed · d7c4230e
      Leigh B Stoller authored
      monitoring system.
      
      The new portal_monitor daemon does a GetVersion/ListResources call at
      each of the clusters every five minutes, and updates the new table in
      the DB called apt_aggregate_status. We calculate free/inuse counts
      for physical nodes and a free count for VMs. Failure to contact the
      aggregate for more than 10 minutes marks the aggregate as down, since
      from our perspective, if we cannot get to it, the cluster is down.
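      
      A plausible shape for the new apt_aggregate_status table, inferred
      from the description above; the column names are guesses, not the
      actual schema:
      
        CREATE TABLE apt_aggregate_status (
            urn          varchar(128) NOT NULL,  -- aggregate identifier
            status       varchar(16)  NOT NULL,  -- 'up' or 'down'
            last_success datetime     DEFAULT NULL,
            last_attempt datetime     DEFAULT NULL,
            pfree        int          DEFAULT 0,  -- physical nodes free
            pinuse       int          DEFAULT 0,  -- physical nodes in use
            vfree        int          DEFAULT 0,  -- VMs free
            PRIMARY KEY (urn)
        ) ENGINE=MyISAM;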
      
      Unlike the jfed monitoring system, we are not going to try to
      instantiate a new experiment or ssh into it. We will wait and see
      whether that is necessary in our context.
      
      On the instantiate page, generate a json structure for each cluster,
      similar to the one described in issue #172 by Keith. This way we can
      easily switch the existing code over to this new system, but fall
      back to the old mechanism if this turns out to be a bust.
      
      Some other related changes to how we hand cluster information to the
      several web pages.