Can workflow agents (WFAs) recover from a crash without recreating or re-obtaining all the caps they held?
Right now, if one crashes, we effectively have to kill and restart the controller, for several reasons: the controller can't handle the WFA's port dying, and the flows installed in the switch as a result of whatever the WFA has done cap-wise remain in place. That in turn requires restarting all the other WFAs and recreating the entire cap world, including any node state that was changed during cap operations. This is truly a pain.
Perhaps, if the WFAs journaled bits of metadata each time they recv() a cap from an RP or a membrane, they could "resume" much more easily. This seems straightforward.
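A minimal sketch of what such a journal might look like, assuming a WFA written in Python and an append-only file of JSON lines. Everything here is hypothetical: `CapJournal`, `record()`, `replay()`, and the `cap_id`/`source`/`kind` fields are illustrative names, not part of any existing API. It also assumes the controller keeps the caps themselves alive across a WFA crash, which is not the case today.

```python
import json
import os


class CapJournal:
    """Append-only journal of cap metadata (hypothetical sketch)."""

    def __init__(self, path):
        self.path = path

    def record(self, cap_id, source, kind):
        """Log one received cap; call this right after each recv()."""
        entry = {"cap_id": cap_id, "source": source, "kind": kind}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            # Force the entry to disk so it survives a crash of the WFA.
            os.fsync(f.fileno())

    def replay(self):
        """Return the journaled entries in order, for use on restart."""
        entries = []
        try:
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        entries.append(json.loads(line))
                    except json.JSONDecodeError:
                        # A torn final write from the crash: stop here.
                        break
        except FileNotFoundError:
            # No journal yet means a fresh start, not a resume.
            pass
        return entries
```

On restart, a WFA would call `replay()` and use the entries to look its caps up again rather than re-obtaining them from the RP or membrane. How that lookup would work on the controller side is left open here.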