sql/updates/4/545 · d7c4230e67835cd84b36bd7cc7ed9cb7fec95255 · emulab / emulab-stable

Bring the cluster monitor "inhouse", rather than depending on the jfed
monitoring system.

Leigh B Stoller authored Nov 12, 2016

New portal_monitor daemon does a GetVersion/ListResources call at each
of the clusters every five minutes, and updates the new table in the
DB called apt_aggregate_status. We calculate free/inuse counts for
physical nodes and a free count for VMs. Failure to contact the
aggregate for more than 10 minutes marks the aggregate as down, since
from our perspective, if we cannot get to it, the cluster is down.
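
The down-detection rule above can be sketched as follows. This is a
minimal illustration in Python, not the portal_monitor code itself: the
AggregateStatus class, its field names, and the record_poll helper are
hypothetical stand-ins for a row in apt_aggregate_status, and the poll
result is assumed to arrive as a plain dict of counts (or None when the
aggregate could not be contacted).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the commit message.
POLL_INTERVAL = timedelta(minutes=5)
DOWN_AFTER = timedelta(minutes=10)


@dataclass
class AggregateStatus:
    """Hypothetical stand-in for one apt_aggregate_status row."""
    name: str
    last_contact: datetime   # time of the last successful poll
    status: str = "up"
    free_nodes: int = 0      # free physical nodes
    inuse_nodes: int = 0     # physical nodes in use
    free_vms: int = 0        # free VM slots


def record_poll(row: AggregateStatus, now: datetime,
                result: Optional[dict]) -> AggregateStatus:
    """Apply the outcome of one poll; result is None on contact failure."""
    if result is not None:
        # Successful contact: refresh counts and mark the aggregate up.
        row.last_contact = now
        row.status = "up"
        row.free_nodes = result["free_nodes"]
        row.inuse_nodes = result["inuse_nodes"]
        row.free_vms = result["free_vms"]
    elif now - row.last_contact > DOWN_AFTER:
        # No contact for more than 10 minutes: treat the cluster as down.
        row.status = "down"
    return row
```

With a five-minute poll interval, a single missed poll leaves the
aggregate marked up under this sketch; it is marked down only once the
gap since the last successful contact exceeds 10 minutes.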

Unlike the jfed monitoring system, we are not going to try to
instantiate a new experiment or ssh into it. We will wait and see
whether that is necessary in our context.

On the instantiate page, generate a JSON structure for each cluster,
similar to the one described in issue #172 by Keith. This way we can
easily switch the existing code over to this new system, but fall back
to the old mechanism if this turns out to be a bust.
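
As a rough illustration of a per-cluster JSON structure, here is a short
Python sketch. The layout in issue #172 is not reproduced in this
message, so every key name below (status, free_nodes, inuse_nodes,
free_vms) and the cluster_status_json helper are assumptions made for
the example, not the format the instantiate page actually emits.

```python
import json


def cluster_status_json(rows: list) -> str:
    """Build one JSON object keyed by cluster name.

    Each row is a dict with hypothetical keys mirroring the counts the
    monitor collects; the real column and key names may differ.
    """
    clusters = {}
    for row in rows:
        clusters[row["name"]] = {
            "status": row["status"],            # "up" or "down"
            "free_nodes": row["free_nodes"],    # free physical nodes
            "inuse_nodes": row["inuse_nodes"],  # physical nodes in use
            "free_vms": row["free_vms"],        # free VM slots
        }
    return json.dumps(clusters, sort_keys=True)
```

Keying the object by cluster name would let page code look up a single
cluster directly, which is one plausible way to keep the existing code
path and the new one interchangeable.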

Some other related changes to how we hand clusters to the several web
pages.
