- 11 Oct, 2018 2 commits
-
-
Leigh B Stoller authored
both send an event for stated to pick up, but the latter also immediately updates the DB while the former does not. This can lead to lost or out-of-sync stated events.
-
Leigh B Stoller authored
many profiles, so many queries. Fixed!
-
- 10 Oct, 2018 5 commits
-
-
Mike Hibler authored
...and an option to specify whether to consider logical CPUs (hyperthreading), and an option to specify an absolute minimum load average to use when doing a percentage. The latter covers cases such as a node with 1 CPU (pc3000), where a load average > 1 would not be uncommon even if nothing special is going on.
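A minimal sketch of how a percentage threshold with an absolute floor could behave. It assumes the percentage is taken relative to the CPU count, which the message does not state; the function and parameter names are hypothetical and are not taken from the actual script.

```python
def load_threshold(percent: float, ncpus: int, min_load: float = 0.0) -> float:
    """Threshold as a percentage of the CPU count, never below min_load.

    Assumption: "percentage" means percent of ncpus. Pass the logical
    CPU count as ncpus to take hyperthreading into account.
    """
    return max(min_load, (percent / 100.0) * ncpus)


def is_busy(load_avg: float, percent: float, ncpus: int,
            min_load: float = 0.0) -> bool:
    """True if the load average exceeds the computed threshold."""
    return load_avg > load_threshold(percent, ncpus, min_load)
```

On a single-CPU node with a 50% setting, a load of 1.2 counts as busy when no floor is set, but not when the floor is 2.0.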
-
Leigh B Stoller authored
(reservation_autoapprove_limit) that overrides the site variable. It is also in node hours, and zero means zero instead of unlimited.
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
get left in /tmp.
-
- 09 Oct, 2018 1 commit
-
-
Mike Hibler authored
Periodically looks at the slothd RRD files collected on boss. This is just an initial attempt to see if doing this is feasible or if the false positive rate is just too high.
-
- 08 Oct, 2018 2 commits
-
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
- 02 Oct, 2018 3 commits
-
-
Leigh B Stoller authored
-
David Johnson authored
(Also link the dbus machine-id file to the one systemd will generate on the next boot. This seems safe and correct.) Certain things (like systemd's dhcp client) use the machine-id as a seed for derived values. For instance, systemd's dhcp client offers a ClientIdentifier in the new client style, and some servers will return the same address to *all* requesting clients, instead of returning only based on source MAC. Can't have any of that confusion.
-
Leigh B Stoller authored
-
- 01 Oct, 2018 4 commits
-
-
Leigh B Stoller authored
1. Split the resource stuff (where we ask for an advertisement and process it) into a separate script, since that takes a long time to cycle through because of the size of the ads from the big clusters.
2. On the monitor, distinguish offline (nologins) from actually being down.
3. Add a table to store changes in status so we can see over time how much time the aggregates are usable.
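As an illustration of item 3, here is a rough sketch of how usable time could be derived from a log of status changes. The record layout and the status names ("up", "offline", "down") are assumptions made for the example, not the actual table schema.

```python
from typing import List, Tuple

# Hypothetical status-change record: (unix_time, new_status), sorted by time.
Change = Tuple[int, str]


def usable_seconds(changes: List[Change], end_time: int) -> int:
    """Sum the time spent in the "up" state between the first change and end_time."""
    total = 0
    # Pair each change with the time of the next one; the last interval
    # is closed by end_time.
    next_times = [t for t, _ in changes[1:]] + [end_time]
    for (start, status), stop in zip(changes, next_times):
        if status == "up":
            total += stop - start
    return total


changes = [(0, "up"), (100, "offline"), (160, "up"), (300, "down")]
```

With these sample records and an end time of 400, the aggregate is usable for 240 seconds; the offline and down intervals are both excluded but remain distinguishable in the log.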
-
Leigh B Stoller authored
and return REFUSED.
-
Leigh B Stoller authored
terminate instead.
-
Leigh B Stoller authored
-
- 28 Sep, 2018 12 commits
-
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
to work on the new aggregates.
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
like the edit page does.
-
Leigh B Stoller authored
Terminate and/or Freeze. So now we can send email to users about an experiment that comes from the system and not from us personally.
-
Leigh B Stoller authored
-
Leigh B Stoller authored
like a long time, but let's try to avoid flapping, especially on the POWDER fixed nodes. Might revisit with a per-aggregate period setting. Send mail only once per day (and when the daemon starts), and send email when the aggregate is alive again. This closes issue #425.
-
- 27 Sep, 2018 4 commits
-
-
Leigh B Stoller authored
the cord, these routes break contact with the Mothership
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
- 26 Sep, 2018 4 commits
-
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
- 25 Sep, 2018 1 commit
-
-
David Johnson authored
-
- 24 Sep, 2018 1 commit
-
-
David Johnson authored
When IsFeasible processes the list of events (i.e., reservation start/end, expt start/end), it processes them in sorted order of event time. If times are equal, however, there is no secondary sort, so the additive (incoming) reservation might be processed before the reductive (outgoing) reservation, which would create a false negative hole in the forecast. This commit adds the secondary sort.
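The tie-breaking problem can be sketched as follows. This is not the actual IsFeasible code; the event tuple layout, the RELEASE/ACQUIRE constants, and both function names are hypothetical and only illustrate why a secondary sort key matters when two events share a timestamp.

```python
from typing import List, Tuple

# Hypothetical event kinds; the lower value sorts first when times are equal.
RELEASE = 0  # reductive: nodes are freed (a reservation or experiment ends)
ACQUIRE = 1  # additive: nodes are claimed (a reservation or experiment starts)

Event = Tuple[int, int, int]  # (time, kind, node_count)


def is_feasible(events: List[Event], capacity: int) -> bool:
    """Return True if node usage never exceeds capacity."""
    in_use = 0
    # Primary key: event time. Secondary key: kind, so releases at a
    # given time are applied before acquisitions at that same time.
    for _time, kind, count in sorted(events, key=lambda e: (e[0], e[1])):
        in_use += count if kind == ACQUIRE else -count
        if in_use > capacity:
            return False
    return True


def is_feasible_time_only(events: List[Event], capacity: int) -> bool:
    """Same check, sorting on time alone, to show the failure mode."""
    in_use = 0
    for _time, kind, count in sorted(events, key=lambda e: e[0]):
        in_use += count if kind == ACQUIRE else -count
        if in_use > capacity:
            return False
    return True


events = [
    (0, ACQUIRE, 4),
    (10, ACQUIRE, 4),  # listed before the release at the same time
    (10, RELEASE, 4),
    (20, RELEASE, 4),
]
```

With a capacity of 4, the time-only sort keeps the input order at time 10, briefly counts 8 nodes in use, and rejects a schedule that is actually fine. The two-key sort applies the release first and accepts it.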
-
- 21 Sep, 2018 1 commit
-
-
Leigh B Stoller authored
-