- 27 Sep, 2018 1 commit
-
-
Leigh B Stoller authored
-
- 21 Sep, 2018 1 commit
-
-
Leigh B Stoller authored
with cause and optionally freeze the user. "Cause" means you can paste in a block of text that is emailed to the user.
-
- 08 Aug, 2018 1 commit
-
-
Leigh B Stoller authored
* I started out to add just deferred aggregates: those that are offline when starting an experiment (and marked in the apt_aggregates table as being deferable). When an aggregate is offline, we add an entry to the new apt_deferred_aggregates table, and periodically retry to start the missing slivers. In order to accomplish this, I split create_instance into two scripts: the first part creates the instance in the DB, and the second (create_slivers) creates slivers for the instance. The daemon calls create_slivers for any instances in the deferred table, until all deferred aggregates are resolved. On the UI side, there are various changes to deal with allowing experiments to be partially created. For example, we used to wait until we had all the manifests before showing the topology. Now we show the topo on the first manifest, and then add the others as they come in. Various parts of the UI had to change to deal with missing aggregates; I am sure I did not get them all.
* And then once I had that, I realized that "scheduled" experiments were an "easy" addition; it's just a degenerate case of deferred. For this I added some new slots to the tables to hold the scheduled start time, and added a started stamp so we can distinguish between the time it was created and the time it was actually started. Lots of data. On the UI side, there is a new fourth step on the instantiate page to give the user a choice of immediate or scheduled start. I moved the experiment duration to this step. I was originally going to add a calendar choice for termination, but I did not want to change the existing 16 hour max duration policy, yet.
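The deferred-aggregate retry described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: the `Instance` class, `retry_deferred`, and the `is_online` / `create_slivers` callables are hypothetical stand-ins for the database rows and the create_slivers script.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for an instance and its deferred-table rows."""
    name: str
    deferred: set = field(default_factory=set)  # aggregates still waiting
    started: set = field(default_factory=set)   # aggregates that have slivers


def retry_deferred(instances, is_online, create_slivers):
    """One daemon pass over instances that have deferred aggregates.

    is_online(aggregate) -> bool and create_slivers(instance, aggregate) -> bool
    are caller-supplied callables (assumptions, not the real scripts).
    Returns the instances that still have unresolved deferred aggregates.
    """
    still_waiting = []
    for inst in instances:
        # sorted() copies the set, so it is safe to modify inst.deferred here.
        for agg in sorted(inst.deferred):
            if is_online(agg) and create_slivers(inst, agg):
                inst.deferred.discard(agg)
                inst.started.add(agg)
        if inst.deferred:
            still_waiting.append(inst)
    return still_waiting
```

Calling `retry_deferred` periodically until it returns an empty list mirrors the "until all deferred aggregates are resolved" behavior; a scheduled experiment would then be the case where every aggregate starts out deferred until the start time.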
-
- 16 Jul, 2018 1 commit
-
-
Leigh B Stoller authored
1. The primary change is to the Create Image modal; we now allow users to optionally specify a description for the image. This needed to be plumbed through all the way to the GeniCM CreateImage() API. Since the modal is getting kinda overloaded, I rearranged things a bit and changed the argument checking and error handling. I think this is the limit of what we want to do on this modal; we need a better UI in the future.
2. Of course, if we let users set descriptions, let's show them on the image listing page. While I was there, I made the list look more like the classic image list: show the image name and project, and put the URN in a tooltip, since in general the URN is noisy to look at.
3. And while I was messing with the image listing, I noticed that we were not deleting profiles like we said we would. The problem is that when we form the image list, we know the profile versions that can be deleted, but when the user actually clicks to delete, I was trying to regenerate that decision without asking the cluster for the info again. So instead, just pass through the version list from the web UI.
-
- 22 Jun, 2018 1 commit
-
-
Leigh B Stoller authored
-
- 21 Jun, 2018 1 commit
-
-
Leigh B Stoller authored
for each node in the list. Also, add a new column to show what image is running,
-
- 09 Apr, 2018 1 commit
-
-
Leigh B Stoller authored
images.
-
- 14 Mar, 2018 1 commit
-
-
Leigh B Stoller authored
something different in the Portal. Ditto when we fail on an empty testbed, although it appears we never get that anymore.
-
- 16 Feb, 2018 2 commits
-
-
Leigh B Stoller authored
-
Leigh B Stoller authored
I spent a fair amount of time improving error handling along the RPC path, as well as making the code more consistent across the various files. Also be more consistent in how the web interface invokes the backend and gets errors back, specifically for errors that are generated when talking to a remote cluster.
Add checks before every RPC to make sure the cluster is not disabled in the database. Also check that we can actually reach the cluster, and that the cluster is not offline (NoLogins()) before we try to do anything. I might have to relax this a bit, but in general it takes a couple of seconds to check, which is a small fraction of what most RPCs take. Return precise errors for clusters that are not available to the web interface, and show them to the user.
Use webtasks more consistently between the web interface and backend scripts. Watch specifically for scripts that exit abnormally (exit before setting the exitcode in the webtask), which always means an internal failure; do not show those to users.
Show just those RPC errors that would make sense to users, and stop spewing script output to the user; send it just to tbops via the email that is already generated when a backend script fails fatally. But do not spew email for clusters that are not reachable or are offline. Ditto for several other cases that were generating mail to tbops instead of just showing the user a meaningful error message.
Stop using ParRun for single site experiments, which are 99% of experiments.
For create_instance, add a new "async" mode that tells CreateSliver() to return before the first mapper run, which is typically very quick. Then watch for errors, or for the manifest with Resolve, or for the slice to disappear. I expect this to be bounded, so we do not need to worry so much about timing this wait out (which is a problem on very big topologies). When we see the manifest, the RedeemTicket() part of the CreateSliver is done and now we are into the StartSliver() phase.
For the StartSliver phase, watch for errors and show them to users; previously we mostly lost those errors and just sent the experiment into the failed state. I am still working on this.
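A rough sketch of the pre-RPC checks described above, assuming the order given in the message (database flag first, then reachability, then the offline check). The names `ClusterError` and `check_cluster` are hypothetical; `reachable` and `no_logins` are caller-supplied callables standing in for the real network probes.

```python
from enum import Enum


class ClusterError(Enum):
    """Hypothetical precise errors to hand back to the web interface."""
    DISABLED = "cluster is disabled"
    UNREACHABLE = "cluster is not reachable"
    OFFLINE = "cluster is offline"


def check_cluster(disabled, reachable, no_logins):
    """Return a ClusterError, or None when it is OK to issue the RPC.

    disabled  -- bool, the flag from the database (cheap, so checked first)
    reachable -- callable returning True if the cluster answers at all
    no_logins -- callable returning True if the cluster reports it is offline
    The callables are only invoked when the earlier checks pass, since the
    remote checks are the part that takes a couple of seconds.
    """
    if disabled:
        return ClusterError.DISABLED
    if not reachable():
        return ClusterError.UNREACHABLE
    if no_logins():
        return ClusterError.OFFLINE
    return None
```

Returning a distinct value per condition is what lets the caller show the user a meaningful message instead of raw script output.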
-
- 11 Dec, 2017 1 commit
-
-
Leigh B Stoller authored
The limit is the number of hours since the experiment was created, so a limit of 10 days really just means that experiments cannot live past 10 days. I think this makes more sense than anything else. There is an associated flag with extension limiting that controls whether the user can even request another extension after the limit. The normal case is that the user cannot request any more extensions, but when the flag is set, the user is granted no free time and goes through the "need admin approval" path. Some changes to the email, so that both the user and admin email say how many days/hours were requested and granted. Also a UI change: explicitly tell the user when extensions are disabled, and also when no time is granted (so that the user is more clearly aware).
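To make the "hours since creation" rule concrete, here is a small sketch. It assumes that a request is simply clipped to whatever room is left under the limit; the function name `granted_hours` and that clipping behavior are illustrative assumptions, not the actual extension code.

```python
from datetime import datetime, timedelta


def granted_hours(created, expires, requested_hours, limit_hours):
    """Hours of a requested extension that fit under the lifetime limit.

    The limit counts from the creation time, so the latest allowed
    expiration is created + limit_hours, regardless of prior extensions.
    Clipping the request to the remaining room is an assumption.
    """
    latest = created + timedelta(hours=limit_hours)
    room = int((latest - expires).total_seconds() // 3600)
    return max(0, min(requested_hours, room))


created = datetime(2017, 12, 1)
expires = datetime(2017, 12, 9)
# With a 10 day (240 hour) limit, only 48 hours of room remain.
print(granted_hours(created, expires, 72, 240))
```

A result of zero corresponds to the "no time is granted" case that the UI now reports explicitly.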
-
- 04 Dec, 2017 1 commit
-
-
Leigh B Stoller authored
* Change the units of extension from days to hours along the extension path. The user does not see this directly, but it allows us to extend experiments to the hour before they are needed by a different reservation, both on the user extend modal and the admin extend modal. On the admin extend page, the input box still defaults to days, but you can also use xDyH to specify days and hours. Or just yH for just hours. But to make things easier, there is also a new "max" checkbox to extend an experiment out to the maximum allowed by the reservation system.
* Changes to "lockout" (disabling extensions). Add a reason field to the database; clicking the lockout checkbox will prompt for an optional reason. The user no longer sees the extension modal when extensions are disabled; we show an alert instead telling them extensions are disabled, and the reason. On the admin extend page there is a new checkbox to disable extensions when denying an extension or scheduling termination. Log extension disable/enable to the audit log.
* Clear out a bunch of old extension code that is no longer used (since the extension code was moved from php to perl).
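The xDyH input format mentioned above could be parsed along these lines. This is a sketch only: `parse_extension` is a hypothetical name, and accepting lowercase letters and surrounding whitespace are assumptions not stated in the commit message.

```python
import re

# Optional "<x>D" followed by optional "<y>H"; case-insensitivity is an assumption.
_SPEC = re.compile(r"^(?:(\d+)D)?(?:(\d+)H)?$", re.IGNORECASE)


def parse_extension(text):
    """Convert an admin extend input into hours.

    A bare number defaults to days, "xDyH" is days plus hours,
    and "yH" is hours only.
    """
    text = text.strip()
    if text.isdigit():
        return int(text) * 24
    match = _SPEC.match(text)
    if not match or not (match.group(1) or match.group(2)):
        raise ValueError(f"bad extension spec: {text!r}")
    days = int(match.group(1) or 0)
    hours = int(match.group(2) or 0)
    return days * 24 + hours
```

For example, `parse_extension("3")` gives 72 hours, `parse_extension("2D6H")` gives 54, and `parse_extension("5H")` gives 5.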
-
- 27 Oct, 2017 1 commit
-
-
Leigh B Stoller authored
do that, always intended to.
-
- 24 Oct, 2017 1 commit
-
-
Leigh B Stoller authored
error is a resource shortage.
-
- 13 Oct, 2017 1 commit
-
-
Leigh B Stoller authored
1. First off, we no longer do automatic lockdown of experiments when granting an extension longer than 10 days.
2. Instead, we will lock down experiments on a case-by-case basis.
3. Changes to the lockdown path that ask the reservation system at the target cluster if locking down would throw the reservation system into chaos. If so, return a refused error and give the admin the choice to override. When we do override, send email to local tbops informing them that the reservation system is in a chaos state.
-
- 06 Sep, 2017 1 commit
-
-
Leigh B Stoller authored
-
- 08 Aug, 2017 2 commits
-
-
Leigh B Stoller authored
1. The Geni path already supported this; it just needed to be added to the web UI and plumbed through.
2. This is featurized, so on the Mothership only users with the feature will see this; it is not something we want mere users to do.
-
Leigh B Stoller authored
-
- 06 Jul, 2017 2 commits
-
-
Leigh B Stoller authored
message instead of silently failing. The message points to the accept certificate page on ops, since typically this is the cause of the error (a self-signed certificate). Unfortunately there is no way to know why it failed; the browser tells you nothing.
-
Leigh B Stoller authored
-
- 26 Jun, 2017 1 commit
-
-
Leigh B Stoller authored
Rearrange things a bit to make the control flow smoother.
-
- 07 Jun, 2017 1 commit
-
-
Leigh B Stoller authored
-
- 28 Apr, 2017 1 commit
-
-
Leigh B Stoller authored
-
- 17 Apr, 2017 1 commit
-
-
Leigh B Stoller authored
User lockdown is as before; the user can override that on the terminate page. Admin lockdown is like Classic lockdown: the flag must be cleared before the experiment can be terminated, and there is no override on the termination page. UI changes on the status and admin extend pages for the additional flag (instead of a single lockdown, there are now two).
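The difference between the two flags can be summarized in a few lines. This is an illustrative sketch; `can_terminate` and its parameter names are hypothetical, not taken from the portal code.

```python
def can_terminate(user_lockdown, admin_lockdown, override=False):
    """Decide whether a terminate request may proceed.

    Admin lockdown cannot be overridden on the terminate page; the flag
    has to be cleared first. User lockdown only blocks termination until
    the user explicitly overrides it.
    """
    if admin_lockdown:
        return False
    if user_lockdown:
        return override
    return True
```

Checking the admin flag first means an override never has any effect while that flag is set.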
-
- 20 Jan, 2017 2 commits
-
-
Leigh B Stoller authored
-
Leigh B Stoller authored
colors. I also reduced line padding a bit to make the table take up a bit less room.
-
- 12 Jan, 2017 1 commit
-
-
Leigh B Stoller authored
-
- 03 Jan, 2017 1 commit
-
-
Leigh B Stoller authored
-
- 28 Dec, 2016 1 commit
-
-
Leigh B Stoller authored
-
- 09 Nov, 2016 1 commit
-
-
Leigh B Stoller authored
only right now.
-
- 20 Jul, 2016 1 commit
-
-
Leigh B Stoller authored
-
- 17 Jun, 2016 1 commit
-
-
Leigh B Stoller authored
-
- 13 Jun, 2016 1 commit
-
-
Leigh B Stoller authored
1. Add a Reload icon on the Graphs tab, to reload the cluster data and redraw the graphs.
2. Implement a Reload Topology function, with a button on the status page, to sync the portal topology with the clusters' current manifests. Currently available to studly users only. This closes ticket #92.
-
- 10 Jun, 2016 1 commit
-
-
Leigh B Stoller authored
intervals in the first 24 hours.
-
- 06 Jun, 2016 3 commits
-
-
Leigh B Stoller authored
-
Leigh B Stoller authored
* Add a "lockdown" checkbox to the status page (red-dot) next to the lockout checkbox.
* Change Lockdown on the adminextend page to an active checkbox (like lockout).
* Add openstackgraphs.js, not in use yet.
-
Leigh B Stoller authored
expiration #days in the future, clears the lockdown, sets the lockout (no free extensions) and sends the text in the box to the user.
-
- 01 Jun, 2016 1 commit
-
-
Leigh B Stoller authored
* More on issue #54; watch for openstack experiments and try to download the new openstack stats file via the fast XMLRPC path. Show this as a text blob in a new tab on the status page; still need to graph the data. The apt_daemon handles the periodic request for the data (every 10 minutes), which we store in the apt_instances table.
* Addition for Rob on the admin extend page: add a "more info" button that sends the contents of the text box as an email message requesting more info and stores that in the ongoing interaction log. Responses from the user are not stored though; might look at that someday.
* Another addition for Rob: on the extensions list page, also show expired, locked down experiments. Note the sorting: at the top of the list are actual extension requests (status='ready') while at the bottom of the list are status='expired'.
* Add a "graphs" tab to the status page, which shows the same idle stats graphs that were added to the admin extend page. Most of this change is refactoring the code and sharing it between the two pages.
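The list ordering noted above (status='ready' on top, status='expired' at the bottom) amounts to a stable sort on a status rank. A minimal sketch, assuming each row is a dict with a `status` key; the row shape, `STATUS_ORDER`, and `order_extension_list` are hypothetical.

```python
# Rank for each status; any other status sorts after these (an assumption).
STATUS_ORDER = {"ready": 0, "expired": 1}


def order_extension_list(rows):
    """Return rows with 'ready' requests first and 'expired' ones after.

    sorted() is stable, so rows with the same status keep their
    original relative order.
    """
    return sorted(
        rows,
        key=lambda row: STATUS_ORDER.get(row["status"], len(STATUS_ORDER)),
    )
```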
-
- 18 May, 2016 1 commit
-
-
Leigh B Stoller authored
clicking on their buttons. Also some fixes to hopefully prevent node context menus from getting left behind (because Jacks is not passing out the event I need).
-
- 05 May, 2016 1 commit
-
-
Leigh B Stoller authored
-