Commit 9e7c0a66 authored by Mike Hibler's avatar Mike Hibler

Thoughts on improving the power command.

Started a while back, just updated. Basically how to deal with N-to-1
and 1-to-N mappings of nodes to power outlets.
parent a861dd6c
Experiences from the DDC:
1. Support multiple levels of power control.
Apt and CloudLab machines are normally controlled by IPMI, but are
also connected to managed PDUs. We should be able to first try
cycling via IPMI and, if that fails, fall back on the PDU. We could
add a "priority" column to the outlets table for this. Would also
need to allow multiple rows per node_id (i.e., it cannot be the
primary key).
2. Support multiple outlets per node.
Heavy power nodes, or nodes with redundant power supplies might
need this. Allowing multiple rows per (node_id,priority) would do
the trick. Power would just have to gather all outlets and act on
them atomically. For simplicity and better atomicity, require that
all outlets be on the same power_id.
3. Support multiple nodes per outlet.
In situations where a single outlet can easily support more than
one node (e.g., Apt). We could add a "shared" column or maybe a
"shared_idx" column where each node sharing the outlet would have
a unique shared_idx value (so <power_id,outlet,shared_idx> becomes
a unique key). Not sure what we can do with this info at the level
of the power command. Maybe it can refuse to power cycle a node
if it is sharing an outlet with another node unless a force option
is given. Or maybe just if it is sharing with a node that is
already allocated. But I'm not sure it is a good idea to have
the low-level power command poking around in the experiments table;
it should maybe be the responsibility of the caller to ensure
"sanity" at that level.
4. Support scheduled power cycles
Another way to make #3 more usable would be to allow scheduling
of power cycles that affect multiple nodes. When an outlet is
identified as such, power could sched_reserve the nodes into
hwdown or some other experiment and, once all the nodes arrive,
perform the power cycle and free up the nodes.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment