power.TODO 1.9 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
Experiences from the DDC:

1. Support multiple levels of power control.

   Apt and CloudLab machines are normally controlled by IPMI, but are
   also connected to managed PDUs. We should be able to first try
   cycling via IPMI and, if that fails, fall back on the PDU. We could
   add a "priority" column to the outlets table for this. Would also
   need to allow multiple rows per node_id (i.e., it cannot be the
   primary key).

2. Support multiple outlets per node.

   Heavy power nodes, or nodes with redundant power supplies might
   need this. Allowing multiple rows per (node_id,priority) would do
   the trick. Power would just have to gather all outlets and act on
   them atomically. For simplicity and better atomicity, require that
   all outlets be on the same power_id.

3. Support multiple nodes per outlet.

   In situations where a single outlet can easily support more than
   one node (e.g., Apt). We could add a "shared" column or maybe a
   "shared_idx" column where each node sharing the outlet would have
   a unique shared_idx value (so <power_id,outlet,shared_idx> becomes
   a unique key). Not sure what we can do with this info at the level
   of the power command. Maybe it can refuse to power cycle a node
   if it is sharing an outlet with another node unless a force option
   is given. Or maybe just if it is sharing with a node that is
   already allocated. But I'm not sure it is a good idea to have
   the low-level power command poking around in the experiments table;
   it should maybe be the responsibility of the caller to ensure
   "sanity" at that level.

4. Support scheduled power cycles

   Another way to make #3 more usable would be to allow scheduling
   of power cycles that affect multiple nodes. When an outlet is
   identified as such, power could sched_reserve the nodes into
   hwdown or some other experiment and, once all the nodes arrive,
   perform the power cycle and free up the nodes.