1. 27 Nov, 2006 1 commit
  2. 03 Nov, 2006 1 commit
    • Leigh B. Stoller's avatar
      Big set of changes intended to solve a couple of problems with long · ff9061d4
      Leigh B. Stoller authored
      term archiving of firstclass objects like users, projects, and of
      course templates.
      
      * Projects, Users, and Groups are now uniquely identified inside the
        DB by a index value that will not be reused. If necessary, this
        could easily be a globally unique identifier, but without federation
        there is no reason to do that yet.
      
      * Currently, pid, gid, and uid still need to be locally unique until
        all of the changes are in place (which is going to take a fairly
        long time since the entire system operates in terms of those, except
        for the few places that I had to change to get the ball rolling).
      
      * We currently archive deleted users to the deleted_users table (their
        user_stats are kept forever since they are indexed by the new index
        column). Eventually do the same with projects (not sure about
        groups) but since we rarely if ever delete a project, there is no
        rush on this one.
      
      * At the same time, I have started a large reorg of th...
      ff9061d4
  3. 26 Oct, 2006 3 commits
    • Kevin Atkinson's avatar
      · a364f40c
      Kevin Atkinson authored
      Give explicit widths to integers in sql/tbreport.sql as per Keith
      Sklower:
        The sql/tbreport.sql file does not have the same width of integers
        as database_create.sql.  This causes schema_check to burp when you have
        updgraded an existing database, but at least the output can be fed back in.
      a364f40c
    • Kevin Atkinson's avatar
      Various tblog changes: · 95a3a6a7
      Kevin Atkinson authored
        Make an attempt to discover what the error was before an swap-* was
        canceled, if any.  Both the main error (canceled) and the other error
        are stored in the error table.  To support this a new column in the
        error table is added "rank".  The primary error has a rank 0 while the
        other error has a rank 1.
      
        Make an attempt to determine when an error is a "me too" error or the
        real cause of the problem.  "Me too" errors are errors which are
        generally reported when the callie script determined that the caller
        script fails.  The caller script should have reported the error, but
        in some cases the error didn't make it into the database.  Thus if a
        "me too" is reported as the cause of a "swap-*" more info is needed to
        determine the true cause.  When a "me too" error is reported it is
        followed by a "..." on it's own line.  It is also recorded in the
        errors table under the new column "need_more_info".
      
        Add inferred column to the errors table.  This is the same value as
        the inferred variable in tblog_find_error.
      
        Add revision column to errors table to make it easy to tell which
        algorithm was used to determine the error.
      95a3a6a7
    • Leigh B. Stoller's avatar
      Bug fixes from Keith Sklower. · 4a21ca5b
      Leigh B. Stoller authored
      4a21ca5b
  4. 24 Oct, 2006 1 commit
  5. 18 Oct, 2006 1 commit
  6. 12 Oct, 2006 2 commits
  7. 08 Oct, 2006 1 commit
  8. 05 Oct, 2006 2 commits
  9. 28 Sep, 2006 1 commit
  10. 27 Sep, 2006 2 commits
  11. 26 Sep, 2006 1 commit
    • Leigh B. Stoller's avatar
      Add a table to record node failures during start/stop run. This is so · c0fc3d6e
      Leigh B. Stoller authored
      we have a record of what nodes failed to respond during the
      synchronous parts of the start and stop experiment run. Data is stored
      for now, but not sure how it will be used later. The main goal is to
      avoid the problems we see with plab nodes where they do not respond,
      and start/stop run hang and fail.
      c0fc3d6e
  12. 25 Sep, 2006 2 commits
  13. 20 Sep, 2006 1 commit
    • Kirk Webb's avatar
      · 6887cc00
      Kirk Webb authored
      Add bwlimit column to widearea_nodeinfo table.
      6887cc00
  14. 12 Sep, 2006 1 commit
    • Leigh B. Stoller's avatar
      This started out as a simple little hack to add a StopRun "ns" event, but · cbdc4178
      Leigh B. Stoller authored
      it got more complicated as it progressed.
      
      The bulk of the change was changing template_exprun so that it can take a
      pid/eid as an alternative to eid/guid. This is a big convenience since its
      easy to find the template from a running experiment, and it makes it
      possible to invoke from the event scheduler, which has never heard of a
      template before (and its not something I wanted to teach it about).  Its
      also easier on users.
      
      Anyway, back to the stoprun event. You can now do this:
      
      	$ns at 100 "$ns stoprun"
      or
      	tevc -e pid/eid now ns stoprun
      
      You can add the -w option to wait for the completion event that is sent,
      but this brings me to the glaring problems with this whole thing.
      
      * First, the scheduler has to fire off the stoprun in the background,
        since if it waits, we get deadlock. Why? Cause the implementation of
        stoprun uses the event system (SNAPSHOT event, other things), and if
        the scheduler is sitting and waiting, nothing happens.
      
        Okay, the solution to this was to generate a COMPLETION event from
        template_exprun once the stop operation is complete. This brings me
        to the second problem ...
      
      * Worse, is that the "ns" events that are sent to implement stoprun (like
        snapshot) send their own completion events, and that confuses anyone
        waiting on the original stoprun event (it returns early).
      
        So what to do about this? There is a "token" field in the completion
        event structure, which I presume is to allow you to match things up.  But
        there is no way to set this token using tevc (and then wait for it), and
        besides, the event scheduler makes them up anyway and sticks them into
        the event. So, the seed of a fix are already germinating in my mind, but
        I wanted to get this commit in so that Mike would have fun reading this
        commit log.
      cbdc4178
  15. 05 Sep, 2006 1 commit
    • Leigh B. Stoller's avatar
      A bunch of template changes resulting from meetings last week. · 087dbfff
      Leigh B. Stoller authored
      * Add XMLRPC interface for template swapin,stoprun,startrun,swapout and
        add the appropriate wrappers to the script_wrapper on ops.
      
      * Allow parameter descriptions in NS files. This is probably not in its
        final form since its a bit confusing as to what has priority; something
        in the NS file or a metadata item. Anyway, you can do this in your NS
        file:
      
      	$ns define-template-parameter GUID "0/0" "The GUID to be analyzed"
      
        The rules are currently that the NS file description has priority and
        is copied to child templates, unless the user has modified a description
        via the web interface, in which case the NS file description is ignored.
        I know, sounds awful, but for the most part people are going to use the
        NS file anyway.
      
      * Add "clear" option when starting a new experiment run; the per
        experiment DB at the logholes are cleared. Note that this is *not* the
        default behaviour; you have to either check the checkbox on the web form
        or use the -c option to the script wrapper, or clear=yes if talking
        directly to the XMLRPC server.
      
      * Fix up how email is generated for template_swapin and template_create,
        so that Kevin can debug tblog/tbreport stuff, but also so that we maintain
        mail logs as before. I have made some improvements to libaudit so as to
        centralize the mail goo, and avoid duplicating all that stuff.
      
      * Minor fixes to the program agent so that the new environment strings are
        sent before the program agent exits and reloads them!
      
      * Other minor little things.
      087dbfff
  16. 31 Aug, 2006 1 commit
    • Kirk Webb's avatar
      · a48210ac
      Kirk Webb authored
      Change types of latitude and longitude columns in widearea_nodeinfo to
      float (instead of float(6,5)) to fix problem with values getting adjusted
      improperly during insert.  Apparently the definition of the float(n,m)
      type is non-standard, and has changed since sql 3.x.  Regular float
      columns will do fine here.
      a48210ac
  17. 30 Aug, 2006 1 commit
  18. 25 Aug, 2006 3 commits
  19. 14 Aug, 2006 3 commits
    • Kevin Atkinson's avatar
      · 07dda0d8
      Kevin Atkinson authored
      Prep for Mike Kasick report code.  Updated database schema and
      installed hooks for his code.
      
      Cleaned up how errors were handled in tblog(...).
      
      Allow SENDMAIL to be called before the path is untained in '-T' scripts.
      
      Other small changes.
      07dda0d8
    • Kevin Atkinson's avatar
      commit.log · 09d78ac6
      Kevin Atkinson authored
      09d78ac6
    • Leigh B. Stoller's avatar
      Checkpoint my dynamic event stuff, crude as it is. The idea for this first · 9d021a07
      Leigh B. Stoller authored
      draft is that the user will at the end of an experiment run, log into one
      of his nodes and perform some analysis which is intended to be repeated at
      the end of the next run, and in future instantiations of the template.
      
      A new table called experiment_template_events holds the dynamic events for
      the template. Right now I am supporting just program events, but it will be
      easy to support arbitrary events later. As an absurd example:
      
      	node6> /usr/local/bin/template_analyze ~/data_analyze arg arg ...
      
      The user is currently responsible for making sure the output goes into a
      file in the archive. I plan to make the template_analyze wrapper handle
      that automatically later, but for now what you really want is to invoke a
      script that encapsulates that, redirecting output to $ARCHIVE (this
      variable is installed in the environment template_analyze.
      
      The wrapper script will save the current time, and then run the program.
      If the program terminates with a zero exit status, it will ssh over to ops
      and invoke an xmlrpc routine to tell boss to add a program event to both
      the eventlist for the current instance, and to the template_eventlist for
      future instances. The time of the event is the relative start time that was
      saved above (remember, each experiment run replays the event stream from
      time zero).
      
      For the future, we want to allow this to be done on ops as well, but
      that will take more infrastructure, to run "program agents" on ops.
      
      It would be nice to install the ssl xmlrpc client side on our images so
      that we do not have to ssh to ops to invoke the client.
      9d021a07
  20. 09 Aug, 2006 1 commit
  21. 08 Aug, 2006 2 commits
  22. 03 Aug, 2006 1 commit
    • Leigh B. Stoller's avatar
      Support for capturing the trace data that is stored in the pcal files · 4ce9c421
      Leigh B. Stoller authored
      into per-experiment databases on ops. Additional support for reconsituting
      those databases back into temporary databases on ops, for post processing.
      
      * This revision relies on the "snort" port (/usr/ports/security/snort)
        to read the pcap files and load them into a database. The schema is
        probably not ideal, but its better then nothing. See the file
        ops:/usr/local/share/examples/snort/create_mysql for the schema.
      
      * For simplicity, I have hooked into loghole, which already had all
        the code for downloading the trace data. I added some new methods to
        the XMLRPC server for loghole to use, to get the users DB password
        and the name of the per-experiment database. There is a new slot in
        the traces table that indicates that the trace should be snorted to
        its DB. In case you forgot, at the end of a run or when the instance
        is swapped out, loghole is run to download the trace data.
      
      * For reconsituting, there are lots of additions to opsdb_control and
        opsdb_control.proxy to create "temporary" databases and load them
        from a dump file that is stored in the archive. I've added a button
        to the Template Record page, inappropriately called "Analyze" since
        right now all it does is reconsitute the trace data into a DB on
        ops.
      
        Currently, the only indication of what has been done (the name of
        the DBs created on ops) is the log email that the user gets. A
        future project is tell the user this info in the web interface.
      
      * To turn on database capturing of trace data, do this in your NS
        file:
      
      	set link0 ...
      	$link0 trace
      	$link0 trace_snaplen 128
      	$link0 trace_db 1
      
         the increase in snaplen is optional, but a good idea if you want
         snort to undertand more then just ip headers.
      
      * Also some changes to the parser to allow plain experiments to take
        advantage of all this stuff. To simple get yourself a per-experiment
        DB, put this in your NS file:
      
      	tb-set-dpdb 1
      
        however, anytime you turn trace_db on for a link or lan, you
        automatically get a per-experiment DB.
      
      * To capture the trace data to the DB, you can run loghole by hand:
      
      	loghole sync -s
      
        the -s option turns on the "post-process" phase of loghole.
      4ce9c421
  23. 02 Aug, 2006 1 commit
  24. 27 Jul, 2006 1 commit
  25. 20 Jul, 2006 2 commits
    • Leigh B. Stoller's avatar
      Minor change to previous revision; need the right number of 0s after · 1a564e6d
      Leigh B. Stoller authored
      the decimal point in decimal slots.
      1a564e6d
    • Leigh B. Stoller's avatar
      This start out as: · 5dfa4a47
      Leigh B. Stoller authored
      * add an "active" flag to the template record, which will be used by
        the user to indicate what templates he wants listed (rather then the
        roots). Basically, the current working templates, rather then a big
        graph.
      
      But I never actually finished that cause it sorta morphed into:
      
      * Added a vis_graphs table to cache the last generated visualization
        rendering in the database so that we do not have to wait so long for:
      
      * Add new buttons to showexp and template_show pages, to display in the
        same page either the settings (current view), the NS file, or the
        visualization (along with zoom in/out buttons).
      
      And now I can go back to that "active" thing I mentioned up above ...
      5dfa4a47
  26. 18 Jul, 2006 1 commit
    • Leigh B. Stoller's avatar
      Changes necessary for moving most of the stuff in the node_types · 624a0364
      Leigh B. Stoller authored
      table, into a new table called node_type_attributes, which is intended
      to be a more extensible way of describing nodes.
      
      The only things left in the node_types table will be type,class and the
      various isXXX boolean flags, since we use those in numerous joins all over
      the system (ie: when discriminating amongst nodes).
      
      For the most part, all of that other stuff is rarely used, or used in
      contexts where the information is needed, but not for type descrimination.
      Still, it made for a lot of queries to change!
      
      Along the way I added a NodeType library module that represents the type
      info as a perl object. I also beefed up the existing Node module, and
      started using it in more places. I also added an Interfaces module, but I
      have not done much with that yet.
      
      I have not yet removed all the slots from the node_types table; I plan to
      run the new code for a few days and then remove the slots.
      
      Example using the new NodeType object:
      
      	use NodeType;
      
      	my $typeinfo = NodeType->Lookup($type);
      
              if ($typeinfo->control_interface(\$control_iface) ||
                  !$control_iface) {
        	    warn "No control interface for $type is defined in the DB!\n";
              }
      
      or using the Node:
      
      	use Node;
      
              my $nodeobject = Node->Lookup($node_id);
              my $imageable  = $nodeobject->NodeTypeInfo()->imageable();
      or
              my $rebootable = $nodeobject->isrebootable();
      or
              $nodeobject->NodeTypeAttribute("control_interface", \$control_iface);
      
      Lots of way to accomplish the same thing, but the main point is that the
      Node is able to override the NodeType (if it wants to), which I think is
      necessary for flexibly describing one/two of a kind things like switches, etc.
      624a0364
  27. 11 Jul, 2006 1 commit
    • Leigh B. Stoller's avatar
      Add new table, node_type_attributes, which will eventually encompass · 9bcb29bb
      Leigh B. Stoller authored
      most of what is in the node_types table (and node_types_auxtypes).
      The goal is to make most of the type information free form key/value
      pairs so that type information is more extensible.
      
      This commit simply creates the table and populates it with initial
      info using init_nodeattrs.pl. I expect this script to change, but it
      can be rerun later so no big deal.
      9bcb29bb
  28. 05 Jul, 2006 1 commit
    • Kevin Atkinson's avatar
      · 183040de
      Kevin Atkinson authored
      Many changes to tblog code.  Database update needed:
      
      1) Added summary of failed nodes is os_setup.  The cause of the error is now
      classified as "user" if it is only user images that failed and the user
      image failed on every pc of a particular type.  Otherwise I leave the cause
      as "unknown" since it is really hard to tell what the real cause is.
      
      2) Raised the confidence threshold for most errors so that they will appear
      on the top.
      
      3) Added a special error when an experiment is canceled.  The cause is
      "canceled" and testbed-ops won't see these errors.
      
      4) Fixed a bug in assign_wrapper where it will incorrectly report "This
      experiment cannot be instantiated on this testbed..." when really the user
      canceled the swapin.
      
      5) Fixed a bug where os_setup errors where being incorrectly reported as
      assign errors.  This happens when os_setup fails for some reason and
      tbswap tries again, but the second time around there are not enough nodes.
      So the last error is coming from assign even though the true cause of the
      error is due to failed nodes.  The fix for this involved added a new column
      to the log table, "attempt", which will be 1 for the first attempt and then
      incremented for each new attempt.  tblog_find_error will then simply ignore
      any errors with "attempt > 1".
      
      6) Also fixed a potential problem when there is an error during the cleanup
      phase by adding another column "cleanup".  tblog_find_error will
      also ignore any errors with the cleanup bit set.
      183040de