capnet issues
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues

Issue #13: WFA failure
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/13
(2017-08-07, David Johnson <johnsond@flux.utah.edu>)

Can workflow agents recover from a crash without recreating or re-obtaining all caps they'd had?
Right now, if one crashes, we effectively must kill and restart the controller, for several reasons (when the WFA port dies, the controller can't handle that; and flows in the switch as a result of whatever the WFA has done cap-wise remain). This also requires all other WFAs to be restarted... and the entire cap world (and any node state that was changed during cap operations!) to be recreated. This is truly a pain.
Perhaps, if the WFAs journaled bits of metadata each time they recv() a cap from an RP or a membrane, they could "resume" much more easily. This seems straightforward.

Issue #1: Handle port_delete OF events
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/1
(2017-08-07, David Johnson <johnsond@flux.utah.edu>)

This is quite complicated... we need to handle port-flap events, or worse things where a VM reboots and then gets plugged into a different ofport. What do we call a flap, vs. a time to revoke all caps owned by that port, or to that port? Stuff like
that. Do we keep a shadow copy of switch, port, and node objects that is detached from the mul (OF) objects when they temporarily disappear, and then "reconnect" them when they reappear? We also need to add liveness information to these objects, so that if someone performs a cap op on a node cap they hold to a node that has actually gone away, we can deal with it. Same with flows...
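The shadow-object idea above could look something like the following sketch (all names invented for illustration; this is not the real controller code). The shadow node outlives the underlying OF port object: on a port_delete we detach and mark it not-live, on reappearance we reattach, and cap ops on a dead node fail gracefully instead of dereferencing a dangling OF object.

```python
# Hypothetical sketch of shadow node objects with liveness tracking.
# None of these names exist in Capnet; they illustrate the proposal.

class ShadowNode:
    def __init__(self, name, of_port):
        self.name = name
        self.of_port = of_port   # handle into the mul (OF) layer
        self.live = True

    def detach(self):
        # port_delete event: drop the OF handle but keep the shadow.
        self.of_port = None
        self.live = False

    def reattach(self, of_port):
        # VM came back, possibly on a different ofport.
        self.of_port = of_port
        self.live = True

def cap_op(node):
    # A cap operation on a dead node returns an error to the WFA
    # rather than crashing the controller.
    if not node.live:
        return "ENODEV"
    return "OK"

n = ShadowNode("vm0", of_port=3)
assert cap_op(n) == "OK"
n.detach()                 # port_delete
assert cap_op(n) == "ENODEV"
n.reattach(of_port=7)      # reappeared on a different ofport
assert cap_op(n) == "OK"
```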
This is a combination of @johnsond and @joshkunz ...

Issue #34: reframe broker object lookup method to use "instance" RP
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/34
(2017-07-26, David Johnson <johnsond@flux.utah.edu>)

As I thought through @u1082186's proposed multi-cloud broker some more, I realized our current broker is broken. It's of course my fault, because I always thought of brokers as being peer to peer (one side calls `send()`, the other calls `recv()`, and vice versa) to exchange individual caps. However, of course multiple parties can all hold a cap to a single RP, and exchange caps amongst themselves with it (although they obviously need some way to coordinate that free-for-all).
Thus, even the local broker interface should change. Right now the broker has two methods:
* `register(service_name,rp)`
* `lookup(service_name)`
and the expected use case is a service that registers an RP, and a consumer that looks it up and is then given a cap to the registered RP, on which to send a cap to the service (presumably the first and only cap sent is a membrane; everything else would flow through the membrane).
However, because the consumer has a cap to the service RP, it *could* also `recv()` on the RP... meaning the consumer could instead just keep calling `recv()`, intercept caps `send()`-ed by other consumers, and provide a malicious service.
Consequently, I propose we could change the API to
* `register(service_name,rp)`
* `lookup(service_name,crp)`
where the controller would `send()` the `lookup` `crp` argument to the `register` `rp` argument -- so that the consumer does not actually ever get a cap to the service's `rp`.
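The proposed semantics could be sketched roughly as follows (a toy in-memory model with invented names; not the actual Capnet API). The key point is that `lookup` causes the broker to `send()` the consumer's RP into the service's RP, so the consumer never holds a cap to the service RP itself.

```python
# Hypothetical sketch of the proposed broker flow. RP and Broker are
# toy stand-ins for the real capability objects.

class RP:
    """Toy rendezvous point: a FIFO of capabilities."""
    def __init__(self, name):
        self.name = name
        self.queue = []

    def send(self, cap):
        self.queue.append(cap)

    def recv(self):
        return self.queue.pop(0)

class Broker:
    def __init__(self):
        self.services = {}

    def register(self, service_name, rp):
        self.services[service_name] = rp

    def lookup(self, service_name, crp):
        # The broker send()s the consumer's RP into the service's RP.
        # The consumer cannot recv() on the service RP, so it cannot
        # intercept caps sent by other consumers.
        self.services[service_name].send(crp)

broker = Broker()
service_rp = RP("svc")
broker.register("storage", service_rp)

consumer_rp = RP("consumer")
broker.lookup("storage", consumer_rp)

# The service recv()s the consumer's RP and can now send caps
# (presumably a membrane first) directly to that consumer.
got = service_rp.recv()
assert got is consumer_rp
```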
I suppose that the current API could be useful for some multi-tenant RP exchanges, but we don't have those use cases right now.

Issue #33: change node_grant to node_lease or just lease
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/33
(2017-07-26, David Johnson <johnsond@flux.utah.edu>; assigned to Josh Kunz)

Issue #32: bind real nodes to cn_node_lease instead of cn_node
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/32
(2017-06-30, David Johnson <johnsond@flux.utah.edu>)

Eventually, when we add VMCreate() in the API, it will create a cn_node and return a capability, but the actual VM will be lazily created only when node.reset() is invoked. Thus VM OpenStack UUIDs are only ever associated with cn_node_lease objects. Further, when node.reset() is called, the current VM is destroyed and a new one created.

Issue #29: Add autoflow caps
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/29
(2017-08-07, David Johnson <johnsond@flux.utah.edu>)

Suppose a server hands multiple clients the ability to send to the server. However, almost nothing is unidirectional these days. If the client wants to tightly control the reverse-direction flow to it (i.e.,
the server should only be able to send to the port the client is listening on for a response) --- rather than rewriting the application to choose a non-random port, set up a flow cap to it, and send it to the server's agent --- what if the client could just mark the cap to the server it had already received as an "autoresponse" cap? It is then trivial to install a flow match rule, for any connection-based protocol like TCP, to watch for connection start, create an autoresponse cap owned by the owner of the original flow cap, and install it to allow reverse-path communication.
This allows the client to implicitly grant the server a cap to respond to it on whatever port the application happens to choose. Because of the connection-oriented nature, the capability can be revoked at connection close or reset (again by an installed flow match rule).
Of course the owner could revoke the autocreated response flow cap whenever it would like.
As far as things like connection drops, well, it is up to the client to ensure those are handled, if it wants to use this mechanism. We could also explore timeouts in the match rules.
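The mechanism described above could be sketched like this (a simplified model with invented names; not the Capnet controller). On seeing a new TCP connection that matches a cap marked autoresponse, the controller installs a reverse-path flow cap scoped to the client's ephemeral port and owned by the original cap's owner.

```python
# Hypothetical sketch of "autoresponse" flow caps.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowCap:
    owner: str          # who holds (and may revoke) the cap
    src: str
    dst: str
    dst_port: int
    autoresponse: bool = False

class Controller:
    def __init__(self):
        self.flows = set()

    def install(self, cap):
        self.flows.add(cap)

    def on_tcp_syn(self, src, src_port, dst, dst_port):
        """Connection-start hook: look for a matching autoresponse cap."""
        for cap in list(self.flows):
            if cap.autoresponse and cap.src == src and cap.dst == dst \
                    and cap.dst_port == dst_port:
                # Reverse-path cap, limited to the client's ephemeral
                # port, owned by the original cap's owner. It would be
                # revoked again on FIN/RST (not shown).
                rev = FlowCap(owner=cap.owner, src=dst, dst=src,
                              dst_port=src_port)
                self.install(rev)
                return rev
        return None

ctl = Controller()
# The client marks its cap to the server as autoresponse.
ctl.install(FlowCap(owner="client", src="10.0.0.2", dst="10.0.0.1",
                    dst_port=80, autoresponse=True))
rev = ctl.on_tcp_syn("10.0.0.2", 49152, "10.0.0.1", 80)
assert rev.dst_port == 49152 and rev.owner == "client"
```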
I don't know why we didn't think of this sugar, but it's definitely worthwhile and important.

Issue #24: Add "drain_wait" method to RPs
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/24
(2017-08-07, Josh Kunz)

Workflow applications need a reliable way to delay execution until a rendezvous point is empty. A `drain_wait` method on an RP would send a notification to a client when the RP they invoked the method on is empty.

Issue #20: Convert membranes from send/recv to wrap interface
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/20
(2017-08-07, Josh Kunz)

Having membranes serve not only as wrapping/unwrapping objects, but also as message-passing objects, complicates the model needlessly. We should instead have membranes implement a generic `wrap` interface (which works differently depending on whether they are internal or external membranes) that performs the wrapping procedure on objects.
To get the same effect as membrane send/recv, we just need to wrap an RP and have it behave normally.

Issue #12: Controller failure
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/12
(2017-08-07, David Johnson <johnsond@flux.utah.edu>)

What happens on controller failure (i.e., a crash)? The easy answer is to just have the switch fail shut when it loses connectivity with the controller.

But then how do WFAs recover? Right now, we have to reboot the world.
Conceptually, it is "easy" to journal all capability operations sequentially, and replay the log into the controller to restore the working set, *with* the cptr identifiers remaining the same (so that WFAs can continue after the interruption). The only real problem is that the log is large... and operations on the log to elide parts of it (i.e. when caps are revoked or nodes are removed) are more complex (and probably slower and/or need more locking).
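The journal-and-replay idea could be sketched as follows (invented names and a toy in-memory log; not the actual controller code). The essential property is that replay reuses the logged cptr identifiers, so WFAs can continue after the interruption with the caps they already hold.

```python
# Hypothetical sketch of journaling cap operations for replay.

class CapTable:
    def __init__(self):
        self.caps = {}        # cptr -> object description
        self.next_cptr = 1
        self.log = []         # append-only journal (a file in practice)

    def _record(self, op, cptr, obj):
        self.log.append({"op": op, "cptr": cptr, "obj": obj})

    def grant(self, obj):
        cptr = self.next_cptr
        self.next_cptr += 1
        self.caps[cptr] = obj
        self._record("grant", cptr, obj)
        return cptr

    def revoke(self, cptr):
        del self.caps[cptr]
        self._record("revoke", cptr, None)

    @classmethod
    def replay(cls, log):
        """Rebuild the cap table from the journal after a crash."""
        t = cls()
        for entry in log:
            if entry["op"] == "grant":
                # Reuse the logged cptr so WFAs' identifiers stay valid.
                t.caps[entry["cptr"]] = entry["obj"]
                t.next_cptr = max(t.next_cptr, entry["cptr"] + 1)
            elif entry["op"] == "revoke":
                t.caps.pop(entry["cptr"], None)
        t.log = list(log)
        return t

t = CapTable()
a = t.grant("node0")
b = t.grant("flow:node0->node1")
t.revoke(a)

restored = CapTable.replay(t.log)
assert restored.caps == t.caps
assert restored.next_cptr == t.next_cptr
```

Eliding revoked entries from the log (the compaction concern above) would mean rewriting it under a lock, which is exactly the complexity the issue worries about.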
So... we'll just leave this issue here as a low-prio one. It will only rear its head if we have longer-term WFAs that do a longer, complicated example, and are a timely, costly pain to test. We've been lucky so far on that front.

Issue #11: Fine-grained flows
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/11
(2017-08-07, David Johnson <johnsond@flux.utah.edu>; assigned to Josh Kunz)

Issue #10: Abstract nodeinfo/protocol/fine-grained flow interface
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/10
(2017-08-07, David Johnson <johnsond@flux.utah.edu>)

Right now it is strongly tied to Ethernet MAC/IPv4. We should relax it.

Issue #8: Joint computation example
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/8
(2017-08-07, David Johnson <johnsond@flux.utah.edu>; assigned to Anmol Vatsa)

Hopefully something with Hadoop and genomics.

Issue #7: Implement real reset()
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/7
(2017-08-07, David Johnson <johnsond@flux.utah.edu>)

Issue #5: Native openstack/controller metadata interface
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/5
(2017-08-07, David Johnson <johnsond@flux.utah.edu>)

Actually implement the metadata_openstack metadata interface, using a C AMQP listener. This is *probably* the right thing to do, but I have to think about how it relates to the controller needing to be able to call out to OpenStack to "reset()" a node... maybe this interface does that too. Maybe it also sends OpenStack API calls/replies.
Maybe that's a different interface.

Issue #2: API access to Openstack for WFAs *and* controller
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/2
(2017-08-07, David Johnson <johnsond@flux.utah.edu>)

Both the WFAs and the controller need some kind of API access to OpenStack to support real reset (that reboots a VM), and createVM. Think carefully about the relationship between capabilities and the OpenStack security mechanisms. We should consider whether we can just wrap and proxy full API access to any service in a reasonable way, or whether we have to special-case everything. Obviously the former is ideal... but proxying all API calls over the capability protocol isn't going to be super nice. We'll certainly need protocol support for ack'd packet fragments, because the raw cap proto can't do more than an MTU right now. Sigh...
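The fragmentation support mentioned above might look roughly like this (invented framing, constants, and function names; not the actual cap protocol). Each fragment carries a (msg_id, seq, total) header; a real sender would resend each fragment until acked, which is not shown here.

```python
# Hypothetical sketch of fragmenting a message over an MTU-limited
# protocol and reassembling it on the far side.
import struct

MTU = 1024
HDR = 12  # msg_id, seq, total packed as three 4-byte big-endian ints

def fragment(msg_id, payload):
    """Split payload into MTU-sized frames with a small header."""
    chunk = MTU - HDR
    parts = [payload[i:i + chunk]
             for i in range(0, len(payload), chunk)] or [b""]
    total = len(parts)
    return [struct.pack("!III", msg_id, seq, total) + p
            for seq, p in enumerate(parts)]

def reassemble(frames):
    """Return the full payload once all fragments have arrived."""
    got = {}
    for f in frames:
        msg_id, seq, total = struct.unpack("!III", f[:HDR])
        got[seq] = f[HDR:]
        if len(got) == total:
            return b"".join(got[i] for i in range(total))
    return None  # still waiting; sender keeps resending until acked

data = b"x" * 5000
frames = fragment(7, data)
assert all(len(f) <= MTU for f in frames)
assert reassemble(frames) == data
```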
Again, this is a combination of @joshkunz and @johnsond .

Issue #28: Add referential equality operator to the capability protocol
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/28
(2017-08-07, Josh Kunz)

Often, even if only for testing, we want the ability to see if two capabilities point to the same object. There should be a capability operator to do this.

Issue #27: Create a capability object for use in debugging
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/27
(2017-08-07, Josh Kunz)

It would be really useful to have an easily create-able dummy object that is enabled in debug builds.

Issue #17: Write up description of seal/unseal model
https://gitlab.flux.utah.edu/tcloud/capnet/-/issues/17
(2017-08-07, David Johnson <johnsond@flux.utah.edu>; assigned to Josh Kunz)