What happens on controller failure (e.g. a crash)? The easy answer is to have the switch fail shut when it loses connectivity with the controller.
But then how do WFAs recover? Right now, we have to reboot the world.
Conceptually, it is "easy" to journal all capability operations sequentially and replay the log into the controller to restore the working set, with the cptr identifiers remaining the same (so that WFAs can continue after the interruption). The only real problem is that the log is large, and operations that elide parts of it (e.g. when caps are revoked or nodes are removed) are more complex (and probably slower and/or need more locking).
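A rough sketch of the journal-and-replay idea, in Python for brevity. Everything here is hypothetical: the `Controller` class, the `mint`/`revoke` operation names, the tuple log format and the `watermark` entry are assumptions made for illustration, not our actual controller API. It also ignores derivation trees, node removal, durability and locking, which is exactly where the real complexity would be.

```python
class Controller:
    """Toy controller: cptr -> (object, rights). Hypothetical, not the real API."""

    def __init__(self):
        self.caps = {}        # live capabilities, keyed by cptr
        self.next_cptr = 1    # next identifier to hand out
        self.journal = []     # sequential log of every operation applied

    def _apply(self, entry):
        op, cptr = entry[0], entry[1]
        if op == "mint":
            self.caps[cptr] = (entry[2], entry[3])
            self.next_cptr = max(self.next_cptr, cptr + 1)
        elif op == "revoke":
            self.caps.pop(cptr, None)
        elif op == "watermark":
            # Written by compact(): keeps cptrs of elided caps from being reused.
            self.next_cptr = max(self.next_cptr, cptr)
        else:
            raise ValueError(f"unknown journal op: {op!r}")

    def _record(self, entry):
        # A real controller would make this append durable before applying.
        self.journal.append(entry)
        self._apply(entry)

    def mint(self, obj, rights):
        cptr = self.next_cptr
        self._record(("mint", cptr, obj, rights))
        return cptr

    def revoke(self, cptr):
        self._record(("revoke", cptr))

    @classmethod
    def replay(cls, journal):
        """Rebuild a controller from a log; cptrs come from the log, not reallocated."""
        restored = cls()
        for entry in journal:
            restored._record(entry)
        return restored


def compact(journal):
    """Naive elision: drop every entry for a revoked cap, keep the high-water mark."""
    revoked = {e[1] for e in journal if e[0] == "revoke"}
    high = 1
    for e in journal:
        if e[0] == "mint":
            high = max(high, e[1] + 1)
        elif e[0] == "watermark":
            high = max(high, e[1])
    live = [e for e in journal if e[0] == "mint" and e[1] not in revoked]
    return live + [("watermark", high)]
```

Replaying either the full log or the compacted one gives back the same cptr-to-capability mapping, so a WFA holding a cptr from before the crash would still resolve it. Note that even this toy `compact` needs a full pass over the log and would have to exclude concurrent appends, which hints at the cost mentioned above.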
So we'll leave this issue here as a low-priority one. It will only rear its head if we have longer-running WFAs that work through a longer, more complicated example and are a time-consuming, costly pain to test. We've been lucky so far on that front.