  • Leigh B. Stoller authored · 9d021a07
    Checkpoint my dynamic event stuff, crude as it is. The idea for this first
    draft is that the user will, at the end of an experiment run, log into one
    of his nodes and perform some analysis that is intended to be repeated at
    the end of the next run and in future instantiations of the template.
    
    A new table called experiment_template_events holds the dynamic events for
    the template. Right now I am supporting just program events, but it will be
    easy to support arbitrary events later. As an absurd example:
    
    	node6> /usr/local/bin/template_analyze ~/data_analyze arg arg ...
    
    The user is currently responsible for making sure the output goes into a
    file in the archive. I plan to make the template_analyze wrapper handle
    that automatically later, but for now what you really want is to invoke a
    script that encapsulates that, redirecting output to $ARCHIVE (this
    variable is installed in the environment by template_analyze).
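
    As a rough illustration of such an encapsulating script, here is a
    minimal Python sketch. It assumes that $ARCHIVE names a directory (the
    text above does not say whether it is a directory or a file), and the
    function name run_into_archive and the default file name analysis.out
    are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def run_into_archive(argv, output_name="analysis.out"):
    """Run argv, sending stdout and stderr to a file under $ARCHIVE."""
    # Assumption: ARCHIVE holds a directory path set by template_analyze.
    archive_dir = Path(os.environ["ARCHIVE"])
    out_path = archive_dir / output_name

    # Capture both streams in one file so the archived record is complete.
    with open(out_path, "wb") as out:
        status = subprocess.run(
            list(argv), stdout=out, stderr=subprocess.STDOUT
        ).returncode

    # Hand back the exit status so the caller can propagate it unchanged.
    return status, out_path
```

    A script built this way would exit with the returned status, so that
    template_analyze still sees whether the analysis succeeded.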
    
    The wrapper script will save the current time, and then run the program.
    If the program terminates with a zero exit status, the wrapper will ssh
    over to ops and invoke an xmlrpc routine to tell boss to add a program
    event both to the eventlist for the current instance and to the
    template_eventlist for future instances. The time of the event is the
    relative start time that was
    saved above (remember, each experiment run replays the event stream from
    time zero).
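
    The wrapper's control flow described above can be sketched in Python as
    follows. This is only an illustration, not the actual template_analyze
    code: the function name run_and_record, the run_start parameter, the
    notify callback (standing in for the ssh-to-ops xmlrpc call), and the
    shape of the event record are all assumptions.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_and_record(
    argv: Sequence[str],
    run_start: float,
    notify: Callable[[dict], None],
) -> int:
    """Run argv; on a zero exit status, report a program event via notify.

    run_start is the time (seconds since the epoch) at which the current
    experiment run began, so the recorded event time is relative to time
    zero of the run.
    """
    # Save the time before running the program: the event should replay at
    # the moment the analysis started, not when it finished.
    relative_start = time.time() - run_start

    status = subprocess.run(list(argv)).returncode

    if status == 0:
        # Hypothetical event record; the real schema is not given here.
        notify({
            "type": "program",
            "command": list(argv),
            "time": relative_start,
        })
    return status
```

    Passing the notification step in as a callback keeps the sketch
    independent of how the request actually reaches boss, whether through
    ssh to ops or a local xmlrpc client.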
    
    For the future, we want to allow this to be done on ops as well, but
    that will take more infrastructure, to run "program agents" on ops.
    
    It would be nice to install the ssl xmlrpc client side on our images so
    that we do not have to ssh to ops to invoke the client.