1. 14 Dec, 2018 1 commit
  2. 13 Dec, 2018 1 commit
  3. 30 Nov, 2018 2 commits
  4. 28 Nov, 2018 1 commit
  5. 08 Nov, 2018 1 commit
  6. 26 Oct, 2018 1 commit
    • Changes to repo-based profiles: · c40bf355
      Leigh Stoller authored
      * Respect the default branch at the origin; gitlab/github allows you to
        set the default branch on a repo, which we were ignoring, always using
        master. Now we ask the remote for the default branch when we
        clone/update the repo and set that locally (see the sketch below).
      
        Like gitlab/github, mark the default branch in the branch list with a
        "default" badge so the user knows.
      
      * Changes to the timer that asks whether the repohash has changed (via a
        push hook): this has a race in it, and I have solved part of it. It is
        not a serious problem, just a UI annoyance I am working on fixing.
        Added a cheesy mechanism to make sure the timer is not running at the
        same time the user clicks on Update().
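
      A minimal sketch of the remote default-branch query, in Python for
      illustration (not the portal's own code; `git ls-remote --symref` is
      the underlying git command, and the helper name is made up):

      ```python
      import subprocess

      def remote_default_branch(clone_dir, remote="origin"):
          """Ask the remote which branch its HEAD points at (the default branch)."""
          # "git ls-remote --symref <remote> HEAD" prints a line like:
          #   ref: refs/heads/main    HEAD
          out = subprocess.check_output(
              ["git", "ls-remote", "--symref", remote, "HEAD"],
              cwd=clone_dir, text=True)
          for line in out.splitlines():
              if line.startswith("ref:"):
                  return line.split()[1].replace("refs/heads/", "", 1)
          return "master"  # fall back to the old hard-coded assumption
      ```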
  7. 25 Oct, 2018 1 commit
  8. 23 Oct, 2018 5 commits
    • 238fcb83
    • New version of the portal monitor that is specific to the Mothership. · 2a5cbb2a
      Leigh Stoller authored
      This version is intended to replace the old autostatus monitor on bas,
      except for monitoring the Mothership itself. We also notify the Slack
      channel like the autostatus version. Driven from the apt_aggregates
      table in the DB, we do the following.
      
      1. fping all the boss nodes.
      
      2. fping all the ops nodes and dboxen. Aside: there are two special
         cases for now that will eventually come from the database. 1) The
         powder wireless aggregates do not have a public ops node, and 2) the
         dboxen are hardwired into a table at the top of the file.
      
      3. Check all the DNS servers. Unlike autostatus (which just checks
         that port 53 is listening), we do an actual lookup at the server,
         done with dig @ the boss node with recursion turned off. At the
         moment this is a serialized test of all the DNS servers; we might
         need to change that later. I've lowered the timeout, and if things
         are operational 99% of the time (which I expect), then this will be
         okay until we get a couple of dozen aggregates to test.
      
         Note that this test is skipped if the boss is not pingable in the
         first step, so in general this test will not be a bottleneck.
      
      4. Check all the CMs with a GetVersion() call. As with the DNS check, we
         skip this if the boss does not ping. This test *is* done in parallel
         using ParRun(), since it is slower and the most likely to time out
         when the CM is busy. The timeout is 20 seconds. This seems to be the
         best balance between too much email and not hanging for too long on
         any one aggregate.
      
      5. Send email and Slack notifications. The current loop runs every 60
         seconds, and each test has to fail twice in a row before we mark it
         as a failure and send a notification. We also send a 24-hour update
         for anything that is still down.
      
      At the moment, the full set of tests takes 15 seconds on our seven
      aggregates when they are all up. Will need more tuning later, as the
      number of aggregates goes up.
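
      A minimal Python sketch of one pass of the monitor, under stated
      assumptions: the GetVersion() probe is simplified (the real call
      authenticates with SSL client certificates), the real code serializes
      the DNS checks and uses ParRun() only for the CM calls, and the
      aggregate dicts and notify() callback are made-up shapes:

      ```python
      import socket
      import subprocess
      import xmlrpc.client
      from concurrent.futures import ThreadPoolExecutor

      FAIL_THRESHOLD = 2   # a test must fail twice in a row before we notify
      strikes = {}         # (aggregate, test) -> consecutive failure count

      def pingable(host):
          """fping exits 0 when the host answers."""
          return subprocess.run(["fping", "-q", host],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL).returncode == 0

      def dns_ok(boss):
          """Do a real lookup at the boss nameserver: recursion off, short timeout."""
          result = subprocess.run(
              ["dig", "@" + boss, "+norecurse", "+time=3", "+tries=1", boss],
              capture_output=True, text=True)
          return result.returncode == 0 and "status: NOERROR" in result.stdout

      def getversion_ok(cm_url, timeout=20):
          """Simplified GetVersion() probe against the CM's XML-RPC endpoint."""
          socket.setdefaulttimeout(timeout)
          try:
              xmlrpc.client.ServerProxy(cm_url).GetVersion()
              return True
          except Exception:
              return False

      def check_aggregate(agg):
          results = {"boss": pingable(agg["boss"]), "ops": pingable(agg["ops"])}
          if results["boss"]:              # skip DNS/CM tests when boss is down
              results["dns"] = dns_ok(agg["boss"])
              results["cm"] = getversion_ok(agg["cm_url"])
          return agg["name"], results

      def run_once(aggregates, notify):
          # The CM calls are the slow ones, so test the aggregates in parallel.
          with ThreadPoolExecutor() as pool:
              for name, results in pool.map(check_aggregate, aggregates):
                  for test, ok in results.items():
                      key = (name, test)
                      strikes[key] = 0 if ok else strikes.get(key, 0) + 1
                      if strikes[key] == FAIL_THRESHOLD:
                          notify(f"{name}: {test} test is failing")
      ```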
    • Add timeout override to PingAggregate(). · 076547b6
      Leigh Stoller authored
    • When searching for an IP on the history page, let's also show a matching · 10383734
      Leigh Stoller authored
      current experiment if there is one. This is convenient.
    • Allow HTML in warn/kill message to user. · 74258700
      Leigh Stoller authored
  9. 08 Oct, 2018 1 commit
  10. 01 Oct, 2018 2 commits
  11. 28 Sep, 2018 6 commits
  12. 26 Sep, 2018 1 commit
  13. 21 Sep, 2018 2 commits
  14. 17 Sep, 2018 2 commits
  15. 04 Sep, 2018 2 commits
  16. 13 Aug, 2018 1 commit
  17. 09 Aug, 2018 1 commit
  18. 08 Aug, 2018 2 commits
    • Left this out of previous commit. · ef517168
      Leigh Stoller authored
    • Big set of changes for deferred/scheduled/offline aggregates: · 6f17de73
      Leigh Stoller authored
      * I started out to add just deferred aggregates: those that are offline
        when starting an experiment (and marked in the apt_aggregates table as
        being deferrable). When an aggregate is offline, we add an entry to the
        new apt_deferred_aggregates table and periodically retry starting the
        missing slivers. To accomplish this, I split create_instance into two
        scripts: the first to create the instance in the DB, and the second
        (create_slivers) to create slivers for the instance. The daemon calls
        create_slivers for any instances in the deferred table until all
        deferred aggregates are resolved (see the sketch below).
      
        On the UI side, there are various changes to deal with allowing
        experiments to be partially created. For example, we used to wait
        until we had all the manifests before showing the topology. Now we
        show the topo on the first manifest and then add the others as they
        come in. Various parts of the UI had to change to deal with missing
        aggregates; I am sure I did not get them all.
      
      * And then once I had that, I realized that "scheduled" experiments were
        an "easy" addition; it is just a degenerate case of deferred. For this
        I added some new slots to the tables to hold the scheduled start time,
        and added a started stamp so we can distinguish between the time an
        experiment was created and the time it was actually started. Lots of
        data.
      
        On the UI side, there is a new fourth step on the instantiate page to
        give the user a choice of immediate or scheduled start. I moved the
        experiment duration to this step. I was originally going to add a
        calendar choice for termination, but I did not want to change the
        existing 16-hour max duration policy yet.
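
      A minimal sketch of the daemon side, assuming a db handle, a
      run_script() helper, and a start_at column for the scheduled start
      (all hypothetical names, not the actual portal schema or scripts):

      ```python
      import time

      def deferred_daemon(db, run_script, interval=300):
          """Retry sliver creation for deferred and scheduled instances."""
          while True:
              now = time.time()
              for row in db.query(
                      "select distinct instance_uuid, start_at"
                      "  from apt_deferred_aggregates"):
                  if row.start_at and row.start_at > now:
                      continue          # scheduled start is still in the future
                  # create_slivers retries only the aggregates still missing
                  # slivers, clearing each deferred entry as it succeeds.
                  run_script("create_slivers", row.instance_uuid)
              time.sleep(interval)
      ```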
  19. 07 Aug, 2018 2 commits
  20. 30 Jul, 2018 2 commits
  21. 16 Jul, 2018 2 commits
    • Image handling changes: · fe8cc493
      Leigh Stoller authored
      1. The primary change is to the Create Image modal; we now allow users
         to optionally specify a description for the image. This needed to be
         plumbed all the way through to the GeniCM CreateImage() API. Since
         the modal is getting kinda overloaded, I rearranged things a bit and
         changed the argument checking and error handling. I think this is the
         limit of what we want to do on this modal; we need a better UI in the
         future.
      
      2. Of course, if we let users set descriptions, let's show them on the
         image listing page. While I was there, I made the list look more like
         the classic image list: show the image name and project, and put the
         URN in a tooltip, since in general the URN is noisy to look at.
      
      3. And while I was messing with the image listing, I noticed that we
         were not deleting profiles like we said we would. The problem is that
         when we form the image list, we know the profile versions that can be
         deleted, but when the user actually clicks to delete, I was trying to
         regenerate that decision without asking the cluster for the info
         again. So instead, we just pass the version list through from the web
         UI (sketched below).
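
      A sketch of the fix; the helper names are hypothetical, and the point
      is only that the deletable-version list computed for the listing page
      rides along with the delete request:

      ```python
      def delete_image_and_profiles(image_urn, deletable_versions,
                                    delete_profile_version, delete_image):
          """Delete an image plus the profile versions the web UI said were safe.

          deletable_versions was computed when the image list was built,
          while we still had fresh info from the cluster, so we trust it
          here rather than recomputing the decision without that info.
          """
          for version in deletable_versions:
              delete_profile_version(image_urn, version)
          delete_image(image_urn)
      ```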
    • Add ReadFile() convenience function. · fc8b83bb
      Leigh Stoller authored
  22. 09 Jul, 2018 1 commit
    • Various bits of support for issue #408: · b7fb16a8
      Leigh Stoller authored
      * Add the portal URL to the existing emulab extension that tells the CM
        that the CreateSliver() is coming from the Portal. Always send this
        info, not just for the Emulab Portal.
      
      * Stash that info in the geni slice data structure so we can add links
        back to the portal status page for current slices.
      
      * Add routines to generate a portal URL for the history entries, since
        we will not have those links for historical slices. Add links back to
        the portal on the showslice and slice history pages.
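
      A sketch of the URL generation, assuming the history page takes a
      slice URN as a query parameter (the page name and parameter are
      guesses, not the portal's actual routes):

      ```python
      from urllib.parse import quote

      def portal_history_url(portal_base, slice_urn):
          """Build a link back to the portal page for a (possibly historical) slice."""
          return f"{portal_base}/memlane.php?slice_urn={quote(slice_urn, safe='')}"

      # e.g. portal_history_url("https://www.emulab.net/portal",
      #                         "urn:publicid:IDN+emulab.net+slice+myexp")
      ```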