Leigh B. Stoller authored
* Parser: Added a new tb command to set the name of the sync server:

      tb-set-sync-server <node>

  This initializes the sync_server slot of the experiment entry to the
  *vname* of the node that should run the sync server for that
  experiment. In other words, the sync server is per-experiment, runs on
  a node in the experiment, and the user gets to choose which node it
  runs on.

* tmcd and client side setup: Added a new syncserver command, which
  returns the name of the sync server and whether the requesting node is
  the lucky one to run the daemon:

      SYNCSERVER SERVER='nodeG.syncserver.testbed.emulab.net' ISSERVER=1

  The name of the sync server is written to /var/emulab/boot/syncserver
  on the nodes so that clients can easily figure out where the server
  is.

  Aside: The ready bits are now ignored (no DB accesses are made) for
  virtual nodes; they are forced to use the new sync server.

* New os/syncd directory containing the daemon and the client.

  The daemon is pretty simple. It waits for TCP connections (and UDP,
  although that path is not complete yet) and reads in a little
  structure that gives the name of the "barrier" to wait for and an
  optional count of clients in the group (this would be used by the
  "master" who initializes barriers for clients). The socket is saved
  (no reply is made, so the client is blocked) until the count reaches
  zero. Then all clients are released by writing back to the sockets,
  and the sockets are closed. Obviously, the number of clients is
  limited by the number of FDs (open sockets), hence the need for a UDP
  variant, but that will take more work.
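
For illustration, the SYNCSERVER reply shown in the tmcd item above could be picked apart along these lines. This is a minimal sketch, not the actual client-side setup code: the function name parse_syncserver_reply is hypothetical, and it assumes the reply is a single line of KEY=value fields, with any token lacking an "=" (such as a leading SYNCSERVER tag) ignored.

```python
import shlex


def parse_syncserver_reply(line):
    """Return (server_name, is_server) from a SYNCSERVER reply line.

    Hypothetical helper: assumes whitespace-separated KEY=value fields,
    with values optionally wrapped in single quotes.
    """
    # shlex.split strips the quotes around the SERVER value.
    tokens = shlex.split(line)
    # Skip tokens without '=' (for example a leading "SYNCSERVER" tag).
    fields = dict(tok.split("=", 1) for tok in tokens if "=" in tok)
    return fields["SERVER"], fields.get("ISSERVER") == "1"


reply = "SYNCSERVER SERVER='nodeG.syncserver.testbed.emulab.net' ISSERVER=1"
server, is_server = parse_syncserver_reply(reply)
```

A node for which is_server is true would be the one expected to start the daemon; every node would record the server name for later use by the client.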
The client has a simple command line interface:

    usage: emulab-sync [options]
      -n <name>   Optional barrier name; must be less than 64 bytes long
      -d          Turn on debugging
      -s server   Specify a sync server to connect to
      -p portnum  Specify a port number to connect to
      -i count    Initialize named barrier to count waiters
      -u          Use UDP instead of TCP

The client figures out the server by looking for the file created above
by libsetup (/var/emulab/boot/syncserver). If you do not specify a
barrier "name", it uses an internal default. Yes, the server can handle
multiple barriers (differently named, of course) at once (with
non-overlapping clients, obviously).

Clients can wait before a barrier is "initialized." The count on the
barrier just goes negative until someone initializes the barrier using
the -i option, which increments the barrier's count by the given count.
Therefore, the master does not have to arrange to get there "first."
As an example, consider a master and one client:

    nodeA> /usr/local/etc/emulab/emulab-sync -n mybarrier
    nodeB> /usr/local/etc/emulab/emulab-sync -n mybarrier -i 1

Node A waits until Node B initializes the barrier (gives it a count).
The count is the number of *waiters*, not including the master. The
master is also blocked until all of the waiters have checked in.

I have not made any provision for timeouts or crashed clients. Let's
see how it goes.
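
The counting rule described above can be modeled in a few lines. This is only a sketch of the bookkeeping, not the daemon itself: the BarrierModel class and its arrive method are hypothetical names, client names stand in for saved sockets, and it assumes each plain waiter decrements the count by one while the master adds its -i count.

```python
class BarrierModel:
    """Toy model of one named barrier; tracks names instead of sockets."""

    def __init__(self):
        self.count = 0      # goes negative while waiters precede the master
        self.blocked = []   # clients currently held at the barrier

    def arrive(self, who, init=None):
        """Record an arrival and return the clients released by it.

        init is the master's -i count (number of waiters, not including
        the master); plain waiters pass None.
        """
        if init is None:
            self.count -= 1      # a waiter checks in
        else:
            self.count += init   # the master initializes the barrier
        self.blocked.append(who)
        if self.count == 0:
            # Everyone has checked in: release all, master included.
            released, self.blocked = self.blocked, []
            return released
        return []


# The nodeA/nodeB example: the waiter arrives before the master.
barrier = BarrierModel()
first = barrier.arrive("nodeA")            # count is now -1, nodeA blocks
second = barrier.arrive("nodeB", init=1)   # count returns to 0, both released
```

In this model the arrival order does not matter: if the master arrives first, the count starts positive and drops to zero as the waiters check in. A waiter that never arrives leaves the count nonzero indefinitely, which matches the lack of timeout handling noted above.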
212cc781