Admission control for future reservations
We want an admission control system that lets us determine whether we can satisfy future promises about nodes. Promises are of two types: slice expiration (max. duration), i.e., "we won't kick you off before Tuesday"; and future reservations, i.e., "there are 10 d710s we won't allocate to anybody else on April 1".
Admission control is handled locally at each cluster, and is performed strictly per node type.
Normal users will be subject to admission control. Requests to swap in an experiment/slice, extend a slice or lengthen max. duration of a swapped-in experiment, or to schedule a future reservation will be refused if admission control fails. Admins can have an option to forcibly ignore the violation. (Maybe also exclusive permission to reserve the last few nodes, kind of like a filesystem percentage of blocks reserved for root.)
Some other useful information might fall out of the admission control checks. The same engine lets us ask questions like "how busy will the d710 pool be on April 1, assuming no new requests or voluntary swapouts?"; "which project/experiment requests can we not satisfy simultaneously?" if we are oversubscribed in the future (e.g., because of hwdown nodes or admin override); and "what is the max size/duration I could swap in right now?".
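As a rough illustration of two of these questions ("how busy will the pool be" and "what is the max size I could swap in right now"), here is a minimal Python sketch. It is not the planned implementation: the names peak_usage and max_swapin_size, the (start_time, end_time, num_nodes) tuple layout, the integer timestamps, and the half-open [start, end) interval convention are all assumptions made for the example.

```python
from typing import Iterable, Tuple

# Hypothetical layout: (start_time, end_time, num_nodes), end_time exclusive.
Commitment = Tuple[int, int, int]


def peak_usage(commitments: Iterable[Commitment], t0: int, t1: int) -> int:
    """Largest number of promised nodes at any instant in [t0, t1)."""
    events = []
    for start, end, n in commitments:
        lo, hi = max(start, t0), min(end, t1)  # clip to the query window
        if lo < hi:                            # keep only overlapping promises
            events.append((lo, n))
            events.append((hi, -n))
    # Sorting by (time, delta) applies releases before acquisitions that
    # happen at the same instant, so back-to-back promises do not stack.
    events.sort()
    peak = in_use = 0
    for _, delta in events:
        in_use += delta
        peak = max(peak, in_use)
    return peak


def max_swapin_size(pool_size: int, commitments: Iterable[Commitment],
                    now: int, duration: int) -> int:
    """Most nodes that could be held from now for `duration` without
    breaking any existing promise (assumed model: one pool, one node type)."""
    return max(0, pool_size - peak_usage(commitments, now, now + duration))
```

With a pool of 10 nodes and promises of 6 nodes over [0, 100) and 3 nodes over [50, 150), a request starting at time 0 could get 4 nodes for a duration of 40 but only 1 node for a duration of 60, because the longer request overlaps the second promise.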
There will be one fundamental API call, which asks "is it feasible for us to satisfy all of these reservations?" (for one node type at one cluster). It requires a node_type and also accepts a set of zero or more (pid,eid,start_time,end_time,num_nodes) tuples representing hypothetical reservations/extensions/swapmods. (Any swapped-in experiments and previously made reservations are implicit inputs.) It returns a Boolean indicating whether the schedule, after editing to reflect the hypothetical changes, would be satisfiable. There may also be extra outputs for the "useful information" described above.
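To make the shape of this call concrete, here is a minimal Python sketch of a feasibility predicate for a single node type. It is only an illustration under stated assumptions, not the planned library: the Reservation class, the is_feasible name, and the explicit pool_size and existing arguments (standing in for the implicit database inputs) are hypothetical. The sketch also treats hypothetical tuples as purely additive, so an extension would be expressed as a tuple covering only the added time, and it treats intervals as half-open [start_time, end_time).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Reservation:
    """Hypothetical record mirroring the (pid,eid,start_time,end_time,num_nodes) tuple."""
    pid: str          # project id
    eid: str          # experiment id
    start_time: int   # inclusive, e.g. seconds since the epoch
    end_time: int     # exclusive
    num_nodes: int


def is_feasible(pool_size: int,
                existing: Iterable[Reservation],
                hypothetical: Iterable[Reservation] = ()) -> bool:
    """Return True if promised nodes never exceed pool_size at any instant."""
    events = []
    for r in list(existing) + list(hypothetical):
        if r.end_time <= r.start_time or r.num_nodes <= 0:
            continue  # empty interval or zero nodes: nothing is promised
        events.append((r.start_time, r.num_nodes))
        events.append((r.end_time, -r.num_nodes))
    # Sorting by (time, delta) applies releases before acquisitions at the
    # same instant, so one promise may start exactly when another ends.
    events.sort()
    in_use = 0
    for _, delta in events:
        in_use += delta
        if in_use > pool_size:
            return False
    return True


if __name__ == "__main__":
    running = [Reservation("projA", "exp1", 0, 100, 6)]
    print(is_feasible(10, running, [Reservation("projB", "exp2", 50, 150, 4)]))
    print(is_feasible(10, running, [Reservation("projB", "exp2", 50, 150, 5)]))
```

The sweep over sorted start/end events takes O(n log n) time for n promises. Whether that is fast enough for interactive use depends on how many promises there are and on the cost of loading them, which this sketch does not model.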
Limitations:
Promises about future reservations are ONLY that "there are n nodes of type x we won't give to someone else between times t0 and t1". A reservation is NOT a promise that an experiment will swap in and boot successfully, that the nodes won't be in hwdown, that there will be sufficient inter-switch bandwidth, that our machine room will have power, that boss will be up, etc.
We CANNOT make automatic future reservations for specific physical nodes ("you can have pc678 starting on April 1" or "you can have an entire Moonshot chassis starting on April 1"). The best we can do with specific physical nodes is to stick corresponding entries in the next_reserve table -- that'll be handled properly.
To be determined:
Hopefully this will be fast enough (<= a few seconds) that it can be implemented as an RPC service so that the portal can give interactive feedback about (e.g.) possible experiment extension requests. We'll find out when we try it...
Work to be done:
I'll implement the "is this schedule feasible" predicate as a Perl library, as well as all its dependencies (e.g., db representations of reservations).
Everything else (e.g., integrating with the back end, figuring out the transition to the time when admission control is turned on, portal user interface, ...) is outside the scope of this issue...