1. 08 Mar, 2004 1 commit
      Converted os_load and node_reboot into libraries. Basically that meant · 9bfe3d61
      Leigh B. Stoller authored
      splitting the existing code between a frontend script that parses arguments
      and does taint checking, and a backend library where all the work is done
      (including permission checks). The interface to the libraries is simple
      right now (didn't want to spend a lot of time on designing interface
      without knowing if the approach would work long term).
      
              use libreboot;
              use libosload;
      
              nodereboot(\%reboot_args, \%reboot_results);
              osload(\%reload_args, \%reload_results);
      
      Arguments are passed to the libraries in the form of a hash. For example,
      in os_setup:
      
      	$reload_args{'debug'}     = $dbg;
      	$reload_args{'asyncmode'} = 1;
      	$reload_args{'imageid'}   = $imageid;
      	$reload_args{'nodelist'}  = [ @nodelist ];
      
      Results are passed back both as a return code (-1 means total failure right
      away, while a positive value indicates the number of nodes that failed),
      and in the results hash, which gives the status for each individual node. At
      the moment it is just success or failure (0 or 1), but in the future it might
      be something more meaningful.
      
      os_setup can now find out about individual failures, both in reboot and
      reload, and alter how it operates afterwards. The main thing is to not wait
      for nodes that fail to reboot/reload, and to terminate with no retry when
      this happens, since at the moment it indicates an unusual failure, and it
      is better to terminate early. In the past an os_load failure would result
      in a tbswap retry, and another failure (multiple times). I have already
      tested this by trying to load images that have no file on disk; it is nice
      to see those failures caught early and the experiment fail much more
      quickly!
      
      A note about "asyncmode" above. In order to promote parallelism in
      os_setup, asyncmode tells the library to fork off a child and return
      immediately. Later, os_setup can block and wait for status by calling
      back into the library:
      
      	my $foo = nodereboot(\%reboot_args, \%reboot_results);
      	nodereboot_wait($foo);
      
      If you are wondering how the child reports individual node status back to
      the parent (so it can fill in the results hash), Perl really is a kitchen
      sink. I create a pipe with Perl's pipe function and then fork a child to do
      the work; the child writes the results to the pipe (status for each node),
      and the parent reads that back later when nodereboot_wait() is called,
      moving the results into the %reboot_results hash. The parent meanwhile can
      go on and, in the case of os_setup, make more calls to reboot/reload other
      nodes, later calling the wait() routines once all have been initiated.
      
      Also worth noting is that in order to make the libraries "reentrant" I had
      to do some cleaning up and reorganizing of the code. Nothing too major
      though, just removal of lots of global variables. I also did some mild
      unrelated cleanup of code that had been run over once too many times with
      a tank.
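The pipe-and-fork scheme described above can be sketched as follows. This is an illustration in Python, not the Perl of libreboot/libosload: the names `start_async` and `wait_async`, the one-line-per-node format written to the pipe, and the convention that 0 means success are all assumptions made for the example.

```python
import os

def start_async(nodes, work):
    """Fork a child that runs work(node) for each node and return at once.

    Returns a (pid, read_fd) handle to pass to wait_async() later.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report one "node status" line per node, then exit.
        os.close(read_fd)
        code = 0
        try:
            with os.fdopen(write_fd, "w") as out:
                for node in nodes:
                    out.write(f"{node} {work(node)}\n")
        except Exception:
            code = 1
        os._exit(code)
    # Parent: keep only the read end of the pipe.
    os.close(write_fd)
    return pid, read_fd

def wait_async(handle, results):
    """Block until the child is done and fill in the results dict.

    Returns -1 if the child itself failed, else the number of failed nodes.
    """
    pid, read_fd = handle
    with os.fdopen(read_fd) as inp:
        for line in inp:
            node, status = line.split()
            results[node] = int(status)
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        return -1
    return sum(1 for status in results.values() if status != 0)
```

A caller can start several such children, one per batch of nodes, and only then call `wait_async` on each handle, which is the parallelism the asyncmode option is meant to allow.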
      
      So how did this work out? Well, for os_setup/os_load it works rather
      nicely!
      
      node_reboot is another story. I probably should have left it alone, but
      since I had already climbed the curve on osload, I decided to go ahead and
      do reboot. The problem is that node_reboot needs to run as root (it's a
      setuid script), which means it can only be used as a library from something
      that is already setuid. os_setup and os_load run as the user. However,
      having a consistent library interface and the ability to cleanly figure out
      which individual nodes failed is a very nice thing.
      
      So I came up with a suitable approach that is hidden in the library. When the
      library is entered without proper privs, it silently execs an instance of
      node_reboot (the setuid script), and then uses the same trick mentioned
      above to read back individual node status. I create the pipe in the parent
      before the exec, and clear its close-on-exec flag so that it stays open
      across the exec. I pass the fileno along in an environment variable, and
      the library uses that to write the results, just like above. The result is
      that os_setup sees the same interface for both os_load and node_reboot,
      without having to worry that one or the other needs to be run setuid.
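The exec variant, where the pipe's file descriptor number is handed to a new program through the environment, might look like the sketch below. It is written in Python for illustration only; the environment variable names `RESULT_FD` and `NODES`, the inline child program, and the line format are hypothetical and do not reflect what node_reboot actually uses.

```python
import os
import subprocess
import sys

# Stand-in for the exec'ed program: it finds the pipe through the
# environment and writes one "node status" line per node to it.
CHILD = r"""
import os
fd = int(os.environ["RESULT_FD"])
for node in os.environ["NODES"].split(","):
    os.write(fd, f"{node} 0\n".encode())
"""

def run_via_exec(nodes):
    """Exec a separate program and read per-node status back over a pipe."""
    read_fd, write_fd = os.pipe()
    env = dict(os.environ, RESULT_FD=str(write_fd), NODES=",".join(nodes))
    # pass_fds keeps write_fd open (same number) in the exec'ed program.
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            env=env, pass_fds=(write_fd,))
    os.close(write_fd)  # parent keeps only the read end
    results = {}
    with os.fdopen(read_fd) as inp:
        for line in inp:
            node, status = line.split()
            results[node] = int(status)
    proc.wait()
    return results
```

The parent must close its own copy of the write end, otherwise the read loop never sees end-of-file.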
  2. 12 Feb, 2004 1 commit
      * Removed startexp, and merged its contents into batchexp. There has been · aef08532
      Leigh B. Stoller authored
        no reason for the separation for a long time, and it made maintenance more
        difficult because of duplication between batchexp and startexp (batch was
        the sole user of startexp). Cleaner solution.
      
      * Check argument processing for batchexp, swapexp, endexp to make sure the
        taint checks are correct. All three of these scripts will now be
        available from ops. I especially watched the filename processing, which
        was pretty loose before and could allow someone to grab a file on boss by
        trying to use it as an NS file (the scripts all run as the user, of
        course). The web interface generates filenames that are hard to guess, so
        rather than wrapping these scripts when invoked from ops, just allow the
        usual paths (/proj, /groups, /users) but also the /tmp/$uid-XXXXXX.nsfile
        pattern, which should be hard enough to guess that users will not be able
        to get anything they are not supposed to.
      
      * Add a -w (waitmode) option to all three scripts. In waitmode, the backend
        detaches, but the parent remains waiting for the child to finish so it
        can exit with the appropriate status (for scripting). The user can
        interrupt (^C), but it has no effect on the backend; it just kills the
        parent side that is waiting (backend is in a new session ID). Log output
        still goes to the file (available from web page) and is emailed.
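The filename check described in the second item above could be sketched as below. This is Python for illustration, not the scripts' Perl taint checks; the function name `nsfile_allowed`, the rejection of `..` components, and the reading of XXXXXX as six alphanumeric characters are assumptions.

```python
import re

# Directory trees from which a user-supplied NS file may come.
ALLOWED_PREFIXES = ("/proj/", "/groups/", "/users/")

def nsfile_allowed(path, uid):
    """Return True if path looks like an acceptable NS file for user uid."""
    # Refuse NUL bytes and any attempt to climb out with "..".
    if "\0" in path or ".." in path.split("/"):
        return False
    if path.startswith(ALLOWED_PREFIXES):
        return True
    # Web-generated temporary file: /tmp/<uid>-XXXXXX.nsfile
    pattern = rf"/tmp/{re.escape(uid)}-[A-Za-z0-9]{{6}}\.nsfile"
    return re.fullmatch(pattern, path) is not None
```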
  3. 10 Feb, 2004 1 commit
  4. 02 Feb, 2004 1 commit
  5. 15 Jan, 2004 1 commit
  6. 16 Dec, 2003 2 commits
  7. 15 Dec, 2003 1 commit
      Distributed NSE changes. In other words, simulation resources are · d266bd71
      Shashi Guruprasad authored
      now mapped to more than one PC if required. The simnode_capacity
      column in the node_types table determines how many sim nodes can
      be packed on one PC. The packing factor can also be controlled via
      tb-set-colocate-factor to be smaller than simnode_capacity.
      
      - No frontend code changes. To summarize:
        $ns make-simulated {
          ...
        }
        is still the easy way to put a whole block of Tcl code into
        simulation.
        One unrelated fix in the frontend code is to fix the
        xmlencode() function which prior to this would knock off
        newlines from columns in the XML output. This affected
        nseconfigs since it is one of the few columns with embedded
        newlines. Also changed the event type and event object type
        in traffic.tcl from TRAFGEN/MODIFY to NSE/NSEEVENT.
      
      - More Tcl code in a new directory tbsetup/nseparse
        -> Runs on ops, similar to the main parser. This is invoked
           from assign_wrapper at the end if there are simnodes
        -> Partitions the Tcl code into multiple Tcl specifications
           and updates the nseconfigs table via xmlconvert
        -> Comes with a lot of caveats. Arbitrary Tcl code such as user
           specified objects or procedures will not be re-generated. For
           example, if a user wanted a procedure to be included in Tcl
           code for all partitions, there is no way for code in nseparse
           to do that. Besides that, it needs to be tested more thoroughly.
      
      - xmlconvert has a new option -s. When invoked with this option,
        the experiments table is not allowed to be modified. Also,
        virtual tables are just updated (as opposed to deleting
        all rows in the first invocation before inserting new rows)
      
      - nse.patch has all the IP address related changes committed in
        version 1.11 + 2 other changes: 1) MTU discovery support in
        the ICMP agent 2) "$ns rlink" mechanism for sim node to real
        node links
      
      - nseinput.tcl includes several client side changes to add IP
        routes in NSE and the kernel routing table for packets crossing
        pnodes. Also made the parsing of tmcc command output more robust
        to new changes. Other client side changes in libsetup.pm and other
        scripts to run nse are also in this commit
      
      - Besides the expected changes in assign_wrapper for simulated nodes,
        the interfaces and veth_interfaces tables are updated with
        routing table identifiers (rtabid). The tmcd changes are already
        committed. This field is used only by sim hosts on the client side.
        Of course, they can be used by jails as well if desired.
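As a rough illustration of how simnode_capacity and the colocate factor bound the packing, the sketch below computes the minimum number of PCs for a given number of sim nodes. It is a Python simplification with a hypothetical function name; the real mapping is done by assign and depends on the topology, so the actual count may be higher.

```python
import math

def min_pcs_needed(num_simnodes, simnode_capacity, colocate_factor=None):
    """Lower bound on the PCs needed to host num_simnodes sim nodes."""
    per_pc = simnode_capacity
    if colocate_factor is not None:
        # The colocate factor can only lower the packing, never raise it.
        per_pc = min(per_pc, colocate_factor)
    return math.ceil(num_simnodes / per_pc)
```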
  8. 02 Dec, 2003 1 commit
  9. 01 Dec, 2003 1 commit
      New scripts: tarfiles_setup, fetchtar.proxy, and webtarfiles_setup. · c0c6547c
      Robert Ricci authored
      The idea is to give us hooks for grabbing experimenters' tarballs (and
      RPMs) from locations other than files on ops. Mainly, to remove
      another dependence on users having shells on ops.
      
      tarfiles_setup supports fetching files from http and ftp URLs right
      now, through wget. It places them into the experiment directory, so
      that they'll go away when the experiment is terminated, and the rest
      of the chain (i.e. downloading to clients and os_setup's checks)
      remains unchanged.  It is now tarfiles_setup's job to copy tarballs and
      RPMs from the virt_nodes table to the nodes table for allocated nodes.
      This way, it can translate URLs into the local filenames it
      constructs. It gets invoked from tbswap.
      
      Does the actual fetching over on ops, running as the user, with
      fetchtar.proxy.
      
      Should be idempotent, so we should be able to give the user a button
      to run webtarfiles_setup (none exists yet) to 'freshen' their
      tarballs. (We'd also have to somehow let the experiment's nodes know
      they need to re-fetch their tarballs.)
      
      One funny side effect of this is that the separator in
      virt_nodes.tarfiles is now ';' instead of ':' like nodes.tarballs,
      since we can now put URLs in the former. Making these consistent is a
      project for another day.
  10. 18 Nov, 2003 1 commit
  11. 13 Nov, 2003 1 commit
  12. 09 Oct, 2003 1 commit
      Reorg of two aspects of node update. · 2641af4d
      Leigh B. Stoller authored
      * install-rpm, install-tarfile, spewrpmtar.php3, spewrpmtar.in: Pumped
        up even more! The db file we store in /var/db now records both the
        timestamp (of the file, or if remote the install time) and the MD5
        of the file that was installed. Locally, we can get this info when
        accessing the file via NFS (copymode on or off). Remote, we use wget
        to get the file, and so pass the timestamp along in the URL request,
        and let spewrpmtar.in determine if the file has changed. If the
        timestamp it gets is >= the timestamp of the file, an error code
        of 304 (Not Modified) is returned. Otherwise the file is returned.
      
        If the timestamps are different (remote, server sends back an actual
        file), the MD5 of the file is compared against the value stored. If
        they are equal, update the timestamp in the db file to avoid
        repeated MD5s (or server downloads) in the future. If the MD5 is
        different, then reinstall the tarball or rpm, and update the db file
        with the new timestamp and MD5. Presto, we have auto update capability!
      
        Caveat: I pass along the old MD5 in the URL, but it is currently
        ignored. I do not know if doing the MD5 on the server is a good
        idea, but obviously it is easy to add later. At the moment it
        happens on the node, which means wasted bandwidth when the timestamp
        has changed, but the file has not (probably not something that will
        happen in typical usage).
      
        Caveat: The timestamp used on remote nodes is the time the tarfile
        is installed (GM time of course). We could arrange to return the
        timestamp of the local file back to the node, but that would mean
        complicating the protocol (or using an http header) and I was not in
        the mood for that. In typical usage, I do not think that people will
        be changing tarfiles and rpms so rapidly that this will make a
        difference, but if it does, we can change it.
      
      * node_update.in, client side watchdog, and various web pages:
        Deflated node_update, removing all of the older ssh code. We now
        assume that all nodes will auto update on a periodic basis, via the
        watchdog that runs on all client nodes, including plab nodes.
      
        Changed the permission check to look for new UPDATE permission (used
        to be UPDATEACCOUNT). As before, it requires local_root or better.
        The reason for this is that node_update now implies more than just
        updating the accounts/mounts. The web pages have been changed to
        explain that in addition to mounts/accounts, rpms and tarfiles will
        also be updated. At the moment, this is still tied to a single
        variable (update_accounts) in the nodes table, but as Kirk requested
        at the meeting, it will probably be nice to split these out in the
        future.
      
        Added the ability to node_update a single node in an experiment (in
        addition to all nodes option on the showexp page). This has been
        added to the shownode webpage menu options.
      
        Changed locking code to use the newer wrapper states, and to move
        the experiment to RUNNING_LOCKED until the update completes. This is
        to prevent mayhem in the rest of the system (which could be dealt
        with, but is not worth the trouble; people have to wait until their
        initiated update is complete, before they can swap out the
        experiment).
      
        Added "short" mode to shownode routine, equiv to the recently added
        short mode for showexp. I use this on the confirmation page for
        updating a single node, giving the user a couple of pertinent (feel
        good) facts before they confirm.
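The timestamp and MD5 logic from the first item above can be sketched as two small functions. This is Python for illustration, not the install-rpm/install-tarfile or spewrpmtar code; the function names and the layout of the db entry as a dict with `timestamp` and `md5` keys are assumptions.

```python
import hashlib

def server_status(request_timestamp, file_mtime):
    """Server side: 304 if the client's timestamp is >= the file's, else 200."""
    return 304 if request_timestamp >= file_mtime else 200

def handle_download(db_entry, data, new_timestamp):
    """Node side: decide what to do once a file body has been received.

    Returns "skip" if the contents match the recorded MD5 (only the
    timestamp is refreshed) or "reinstall" if the contents changed.
    """
    digest = hashlib.md5(data).hexdigest()
    db_entry["timestamp"] = new_timestamp
    if digest == db_entry["md5"]:
        return "skip"
    db_entry["md5"] = digest
    return "reinstall"
```

Refreshing the timestamp in the "skip" case is what avoids repeated downloads and MD5 computations on later checks.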
  13. 25 Sep, 2003 1 commit
  14. 18 Sep, 2003 1 commit
  15. 22 Aug, 2003 3 commits
  16. 06 Aug, 2003 1 commit
  17. 28 Jul, 2003 1 commit
      Add newnode_reboot, which is used on nodes that aren't fully in the · 1f5419ae
      Robert Ricci authored
      database yet, so using node_reboot on them would be catastrophic.
      Uses the special newnode MFS's ssh key.
      
      Also, when a node has booted for the first time, it may be up on a
      temporary IP address rather than its permanent one, so we pass the
      node's IP rather than node_id on the command line.
      
      Only tries ssh.
  18. 25 Jul, 2003 1 commit
      Commit my version of assign_wrapper as assign_wrapper-new, and change · 62e38deb
      Leigh B. Stoller authored
      tbswap to use this version inside the testbed project only! All other
      projects will see the old version for now; there are just too many
      things to test, and the testsuite gets just a fraction of them. Some
      highlights (which I will expand on later when I commit this version to
      the main version):
      
      * New -t option to create the TOP file, and then exit. The only other
        side effect of this is to update the min/max nodes for the
        experiment in the DB, unless new option -n (impotent mode) is given.
      
      * New -n option to operate in impotent mode; do not allocate nodes and
        do not modify the DB. Okay, so this option is not as great as it
        sounds. I eventually hit the point of diminishing returns, with
        trying to make things work right without DB modification. At some
        point I just throw in the towel and exit. This currently happens after
        interpolating the link results of assign. But, I have found it very
        useful, and could get better with time. Being able to run assign on
        the main DB without sucking up the nodes is nice for debugging.
      
      * Lots of data structure organization, mostly on the virtual topology
        side of assign (you can think of assign as two sections, the part
        that interprets the DB tables and creates the TOP file, and the part
        that reads the results of assign and sets up all the physical stuff
        in the DB). I removed numerous global hashes, and combined them into
        aggregate data structures, such as they are in Perl. My approach for
        this was to read the tables from the DB, and keep them handy,
        extending them as needed with stuff that assign_wrapper generates as
        it proceeds. This has the side effect of cutting down on the number
        of queries as well.
      
        The next task is to do the physical side reorg, but not up for that
        yet.
  19. 14 Jul, 2003 1 commit
  20. 08 Jul, 2003 1 commit
  21. 30 Jun, 2003 1 commit
      Make the new parser live on mini. New parser ssh'es over to ops to · 2202fc5a
      Leigh B. Stoller authored
      do the actual parse. The parser now spits out XML instead of DB
      queries, and the wrapper on boss converts that to DB insertions after
      verification. There are some makefile changes as well to install the
      new parser on ops via NFS, since otherwise the parser could be
      intolerably out of date on ops!
  22. 21 Apr, 2003 1 commit
      New script: switchmac. Lists all MACs that have been learned by all · 5f8fea31
      Robert Ricci authored
      the experimental switches. The idea is to be able to auto-detect
      where a node has been plugged in, so that we fill out the wires table
      without any manual intervention! This is a step towards being able
      to automate the adding of nodes.
      
      Has a runtime linear in the number of VLANs on the experimental
      switches, so it should run pretty fast on a new testbed, but can
      be kinda slow on, say, ours.
  23. 16 Apr, 2003 1 commit
      Add support for idleswapping an experiment as the creator of the · ff5a57de
      Leigh B. Stoller authored
      experiment, rather than as an administrator, which presents group
      permission problems when the experiment is in a subgroup (requires two
      additional groups, whereas suexec adds only one group). That aside, the
      correct approach is to run the swap as the creator. To do that, must
      flip to the user (from the admin person) in the backend using the new
      idleswap script, and then run the normal swapexp. Add new option to
      swapexp (-i) which changes the email slightly to make it clear that
      the experiment was idleswapped, and so that the From: is tbops not the
      user (again, to make it more clear).
  24. 04 Apr, 2003 1 commit
      · c5333324
      Chad Barb authored
      tbswapin and tbswapout are no more.
  25. 03 Apr, 2003 1 commit
  26. 11 Mar, 2003 1 commit
      · caad3a35
      Chad Barb authored
      New version of unified tbswap in/out.
      startexp/endexp/swapexp have been changed to use new script.
      
      tbswapin and tbswapout have been replaced with a script which
      spits out a warning message, then calls tbswap appropriately.
      
      The README has also been modified.
  27. 07 Mar, 2003 1 commit
  28. 13 Feb, 2003 1 commit
  29. 24 Jan, 2003 1 commit
  30. 19 Dec, 2002 1 commit
  31. 24 Oct, 2002 1 commit
      Add stuff to update the SFS keys on the fileserver after someone uses · cc1c4e54
      Leigh B. Stoller authored
      the web page to add/delete a key! Nodes were getting updated, but
      the SFS server was not because there was no program to fire the new keys
      over there.
      
      The operation is currently simple. sfskey_update on boss constructs a
      new sfs_users file. Then it runs sfskey_update.proxy on ops (via ssh
      of course), and gives it the new file via stdin. The proxy creates the
      .pub version from that file, and then moves the two new files into
      place in /etc/sfs. I employ the same locking stuff that Rob did in
      exports_setup and named_setup to prevent multiple updates from
      stacking up. Not likely, but might as well. Also note that the entire
      file is regenerated. When we get 5000 users this might have to change
      a little bit!
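The "read the whole file from stdin, then move it into place" step can be sketched as a write-to-temporary-then-rename helper, so that readers never see a half-written file. This is a generic Python illustration with a hypothetical function name; it does not show how the proxy derives the .pub file or the locking borrowed from exports_setup and named_setup.

```python
import os
import tempfile

def install_atomically(data, dest_path):
    """Write data to a temp file beside dest_path, then rename it into place."""
    dest_dir = os.path.dirname(dest_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Rename within one directory replaces the old file in one step.
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A proxy built this way would call it with the bytes read from standard input, for example `install_atomically(sys.stdin.buffer.read(), path)` for whatever destination path applies.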
      
      Also changed mkacct slightly. Instead of doing a "sfskey register" on
      ops after generating the new key, just add it to the DB. Then fire off
      an sfskey_update to push the new keys over. Also add a -f flag to
      mkacct for use from the web page to indicate that the user has changed
      his SFS keys. Note that mkacct should probably take a series of flags
      since we have it as a wrapper for several things. Or maybe split all
      this stuff up.
  32. 18 Oct, 2002 1 commit
      Merge the newstated branch with the main tree. · 5c961517
      Mac Newbold authored
      Changes to watch out for:
      
      - db calls that change boot info in nodes table are now calls to os_select
      
      - whenever you want to change a node's pxe boot info, or def or next boot
      osids or paths, use os_select.
      
      - when you need to wait for a node to reach some point in the boot process
      (like ISUP), check the state in the database using the lib calls
      
      - Proxydhcp now sends a BOOTING state for each node that it talks to.
      
      - OSs that don't send ISUP will have one generated for them by stated
      either when they ping (if they support ping) or immediately after they get
      to BOOTING.
      
      - States now have timeouts. Actions aren't currently carried out, but they
      will be soon. If you notice problems here, let me know... we're still
      tuning it. (Before, all timeouts were set to "none" in the db.)
      
      One temporary change:
      
      - While I make our new free node manager daemon (freed), all nodes are
      forced into reloading when they're nfreed and the calls to reset the os
      are disabled (that will move into freed).
  33. 09 Oct, 2002 1 commit
      Add a new script: tbresize · 0dd12dba
      Mac Newbold authored
      (installs into /usr/testbed/bin/tbresize but isn't avail. on ops yet)
      
      Usage: tbresize [-d] -a -e pid,eid -n num -t type [-p prefix]
             tbresize [-d] -r -e pid,eid <node> [<node> ...]
             tbresize -h
      Use -h to show this usage message.
      Use -d to enable extra debugging output.
      Use -a to add nodes to an experiment.
      Use -r to remove nodes from an experiment.
      Use -e pid,eid to specify the experiment to resize.
      Use -n to specify the number of nodes to add.
      Use -t to specify the type of the nodes to be added (pc, pc850, pc600,
      etc).
      Use -p to specify a prefix for vnames (i.e. "node" => node0 .. nodeN).
      With -r, specify a list of one or more nodes to be removed (i.e. pcXX).
      
      Can even resize an expt down to no nodes then back up again. If it has
      one LAN/link in the expt, it adds the new nodes to it. If it has zero or
      more than one, it doesn't connect the new nodes to the topology.
      
      After finding and reserving (or before freeing) it fixes up the right
      places in the db and reruns snmpit, then reruns exports_setup and
      named_setup and reboots all the nodes that are now in the expt so they get
      updated configuration data.
      
      Even visualizes properly after being resized, the only catch is that the
      ns file is the original one, not one generated from the db.
      
      Use it, abuse it, have fun with it, and let me know what breaks.
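The usage message above maps naturally onto a small argument parser. The sketch below mirrors the documented flags in Python's argparse purely for illustration; tbresize itself is not written this way, and the `parse` helper and its validation messages are assumptions.

```python
import argparse

def build_parser():
    """Build a parser mirroring the tbresize usage message."""
    parser = argparse.ArgumentParser(prog="tbresize")
    parser.add_argument("-d", action="store_true", help="extra debugging output")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-a", action="store_true", help="add nodes")
    mode.add_argument("-r", action="store_true", help="remove nodes")
    parser.add_argument("-e", required=True, metavar="pid,eid",
                        help="experiment to resize")
    parser.add_argument("-n", type=int, metavar="num", help="nodes to add")
    parser.add_argument("-t", metavar="type", help="type of nodes to add")
    parser.add_argument("-p", metavar="prefix", help="prefix for vnames")
    parser.add_argument("nodes", nargs="*", help="nodes to remove (with -r)")
    return parser

def parse(argv):
    """Parse argv and return (args, pid, eid), enforcing the per-mode rules."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.a and (args.n is None or args.t is None):
        parser.error("-a requires -n num and -t type")
    if args.r and not args.nodes:
        parser.error("-r requires at least one node")
    pid, _, eid = args.e.partition(",")
    if not pid or not eid:
        parser.error("-e expects pid,eid")
    return args, pid, eid
```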
  34. 02 Oct, 2002 1 commit
      Initial version of delay web control. · dd27f82a
      Chad Barb authored
      Functional, but needs some work.
      Won't allow non-admins to use it (since it doesn't do "proper" permission checking yet.)
      Input is aggressively checked for bad mojo before being pasted into any command line.
      
      Run from /delaycontrol.php3?eid=exptname&pid=projname
      Admin bit must be on.
  35. 10 Sep, 2002 1 commit
  36. 06 Sep, 2002 1 commit
  37. 11 Jul, 2002 1 commit