1. 13 Dec, 2010 1 commit
  2. 08 Dec, 2010 2 commits
  3. 07 Dec, 2010 3 commits
  4. 20 Nov, 2010 1 commit
    • Fixed segfault in event/new_sched · 15fc15b7
      Ryan Jackson authored
      Fixed a segfault in the new event scheduler
      
      Cleaned up some code that dealt with converting
      xmlrpc_c::value_strings to C string constants to avoid further memory
      corruption issues.
      
      Also converted AddAgent(), AddGroup(), AddUserEnv() to take 'const
      char *' pointers instead of 'char *'s to silence GCC's warnings about
      invalid conversions from 'const char *' to 'char *'.
  5. 17 Nov, 2010 2 commits
  6. 12 Nov, 2010 1 commit
  7. 10 Nov, 2010 1 commit
  8. 09 Nov, 2010 2 commits
  9. 03 Nov, 2010 1 commit
  10. 28 Oct, 2010 2 commits
  11. 20 Oct, 2010 1 commit
    • Support for no shared filesystem (unsupport for shared filesystem?) and · c1c1bce2
      Mike Hibler authored
      (eventual) support for NFS servers without race conditions!
      
      This means no NFS between nodes and ops/fs. There are, however, still NFS
      mounts of ops on boss.
      
      Added a new defs-* variable, NOSHAREDFS, which, when set non-zero, disables
      the export of NFS filesystems to nodes.  This involved lots of little changes:
      
       * /users, /proj, and /share filesystems are not exported to nodes.
      
       * Returned mount info now includes an FSTYPE key which will be set to "LOCAL"
         if NOSHAREDFS is in effect (by default it is set to "NFS-RACY"; more on
         this later).  In the case where it is set to LOCAL, the other mount lines
         no longer contain REMOTE=foo settings.  Because of this change,
         THE TMCD VERSION NUMBER HAS BEEN BUMPED TO 32.
      
       * The client rc.mounts script will now create local versions of /users/*,
         /proj/<pid>, and /share when FSTYPE=LOCAL.  It first runs mkextrafs to
         create a large partition for these, since someday we will likely want
         to pre-populate them with a non-trivial amount of data.  Right now,
         the only things put in the user's homedir are the standard dotfiles
         for the OS and the Emulab authorized_keys file (so you can log in).
      
       * Linktest had to be modified to fetch the various results files (via
         loghole) rather than just assuming they were in /proj.  It was also
         changed to invoke tevc with the local copy of the event key so it won't
         try to read the key over NFS.
      
       * create_image was modified to ssh to the node and run the imagezip
         command, capturing the output of ssh.  This is controlled via the "-s"
         option which defaults to on for a NOSHAREDFS system, but can also be
         used on a normal system.
      
       * Elabinelab experiments can be configured with or without a shared FS via
         the CONFIG_SHAREDFS attribute (note the polarity change), which defaults to 1.
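
      The client-side FSTYPE/REMOTE handling described above can be sketched as
      follows. This is a hypothetical Python illustration, not the actual
      rc.mounts or tmcd code: the reply layout (an FSTYPE=... line plus
      whitespace-separated KEY=VALUE mount lines), the helper names parse_mounts
      and plan_mounts, and the default used when no FSTYPE key is present are
      all assumptions.

```python
def parse_mounts(reply: str) -> tuple[str, list[dict[str, str]]]:
    """Split a (hypothetical) mounts reply into its FSTYPE and per-mount records."""
    fstype = "NFS-RACY"  # assumed value when no FSTYPE key is present
    mounts: list[dict[str, str]] = []
    for line in reply.splitlines():
        # Each whitespace-separated KEY=VALUE token becomes one dict entry.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "FSTYPE" in fields:
            fstype = fields["FSTYPE"]
        elif fields:
            mounts.append(fields)
    return fstype, mounts


def plan_mounts(reply: str) -> list[str]:
    """Describe what a client would do with the reply, without touching the disk."""
    fstype, mounts = parse_mounts(reply)
    # In LOCAL mode the client first makes a large local partition (mkextrafs).
    actions = ["mkextrafs"] if fstype == "LOCAL" and mounts else []
    for m in mounts:
        if fstype == "LOCAL":
            # No REMOTE= key in this mode: create the directory locally instead.
            actions.append(f"mkdir {m['LOCAL']}")
        else:
            actions.append(f"mount {m['REMOTE']} {m['LOCAL']}")
    return actions
```

      For example, a reply of "FSTYPE=LOCAL" followed by "LOCAL=/users/alice"
      yields mkextrafs followed by a mkdir, while an NFS-RACY reply with a
      REMOTE= key yields a mount.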
      
      Another new defs-* variable, NFSRACY, will some day allow you to specify
      (by setting to 0) that your NFS server does NOT have the nefarious mountd
      race condition when changing /etc/exports.  Currently, this defaults to 1
      since all versions of FreeBSD supported as an "fs" node have this "feature."
      Rumor has it that FreeBSD 8 does not have this problem nor, presumably,
      would a Linux NFS server.
      
      The only use of this variable right now is to set the FSTYPE returned by the
      tmcd "mounts" call, which in turn is used by one client script, rc.topomap
      (via a libsetup function) to determine whether it should try copying
      the topo file multiple times.
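
      The decision rc.topomap makes from FSTYPE amounts to choosing a retry
      budget. A minimal Python sketch, assuming a caller-supplied copy function
      that returns True on success; the names copy_attempts and copy_topo, the
      attempt count of 5, the optional delay, and treating only "NFS-RACY" as
      racy are illustrative assumptions, not the libsetup implementation.

```python
import time
from typing import Callable


def copy_attempts(fstype: str, racy_attempts: int = 5) -> int:
    """Retry only when the server may hit the mountd/exports race (count is made up)."""
    return racy_attempts if fstype == "NFS-RACY" else 1


def copy_topo(copy_once: Callable[[], bool], fstype: str, delay: float = 0.0) -> bool:
    """Call copy_once until it succeeds or the attempt budget for fstype runs out."""
    attempts = copy_attempts(fstype)
    for attempt in range(attempts):
        if copy_once():
            return True
        if delay and attempt + 1 < attempts:
            time.sleep(delay)  # give mountd time to finish reloading its exports
    return False
```

      Passing the copy step in as a function keeps the retry policy separate
      from how the file is actually fetched.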
      
      Random: added python2.6 to the list of Pythons checked for in configure.
      Random: resync defs-example-privatecnet with defs-example.
      Random: did a little code-pissin here and there.
  12. 19 Oct, 2010 1 commit
    • Minor tweaks to "linktest unplugged". · c3008371
      Mike Hibler authored
      In elab_linktest.pl, put explicit command line arguments after the defaults
      so that they will override them in linktest.pl.  Also, don't conflate LOGDIR
      and PROJDIR.
      
      Half-assed attempt to free the script from the notion of a pid/eid by
      defining an EVENTID param.  However, we still wind up reading a pid/eid
      from a file, though this could also be eliminated.
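
      Why the argument order matters: if the callee keeps the last value it
      sees for each parameter, appending the explicit arguments after the
      defaults lets them win. A hypothetical Python sketch of that
      last-one-wins rule; the KEY=VALUE form and the names build_argv and
      parse_argv are assumptions (the real elab_linktest.pl and linktest.pl are
      Perl).

```python
def build_argv(defaults: dict[str, str], explicit: list[str]) -> list[str]:
    """Defaults go first and caller-supplied KEY=VALUE arguments go last."""
    return [f"{key}={value}" for key, value in defaults.items()] + explicit


def parse_argv(argv: list[str]) -> dict[str, str]:
    """Collect KEY=VALUE arguments; a later occurrence overwrites an earlier one."""
    params: dict[str, str] = {}
    for arg in argv:
        key, _, value = arg.partition("=")
        params[key] = value
    return params
```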
  13. 12 Oct, 2010 2 commits
  14. 11 Oct, 2010 2 commits
  15. 29 Sep, 2010 2 commits
    • 5ae92284
      Mike Hibler authored
    • Handle a common failure on the node reload path. · 4dc57d48
      Mike Hibler authored
      Under load, nodes that have just entered reloading and have just rebooted
      might fail to get bootinfo.  The default behavior in this case is for the
      node to boot from disk (dubious, but that is the topic for another day).
      This causes the node to fall off the RELOAD path, winding up in either
      TBFAILED or ISUP.  Worse, if the node makes it to ISUP, its reload state
      is cleared and even if the reload_daemon reboots the node, it will still
      not go through the reloading process.
      
      The result is a bunch of nodes left in reloading.  Now if a node makes an
      invalid transition to TBFAILED or ISUP while in the RELOAD state machine,
      it fires the new REBOOT trigger which does...well, you figure it out.
      Note that in the ISUP case, this trigger overrides the default that would
      otherwise clear the reload state, so a reboot is sufficient to get the
      machine back on the RELOAD track.
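
      A toy Python model of the new rule, not the actual state-machine code.
      The dict fields op_mode, state and reload_pending and the function name
      on_transition are hypothetical; only the RELOAD, TBFAILED, ISUP and
      REBOOT names come from the message above.

```python
def on_transition(node: dict, new_state: str) -> list[str]:
    """Apply a state change and return any triggers it fires (hypothetical model)."""
    triggers: list[str] = []
    if node["op_mode"] == "RELOAD" and new_state in ("TBFAILED", "ISUP"):
        # The node fell off the RELOAD path: reboot it and keep the reload
        # state, overriding the ISUP default below.
        triggers.append("REBOOT")
    elif new_state == "ISUP":
        # Default ISUP behaviour outside RELOAD: the reload state is cleared.
        node["reload_pending"] = False
    node["state"] = new_state
    return triggers
```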
  16. 18 Sep, 2010 2 commits
    • Lint. · 36d88387
      Mike Hibler authored
    • Add -ltb to link line. · 0d1f2b6a
      Mike Hibler authored
      This is part of cleaning up the name space of pubsub.  Previously, this
      program was getting the "info" call from libpubsub and that version is going
      to become private.  So use the one in libtb instead.
  17. 14 Sep, 2010 3 commits
  18. 23 Aug, 2010 1 commit
    • Don't build delay-agent on FreeBSD 8. · 250a633c
      Mike Hibler authored
      Delay agent won't build on FreeBSD 8.x right now due to dummynet API changes.
      Not even sure we will bother to fix this, since we have a newer, more
      OS-independent agent.
  19. 20 Aug, 2010 3 commits
  20. 18 Aug, 2010 4 commits
  21. 22 Jul, 2010 1 commit
  22. 17 Jul, 2010 1 commit
  23. 14 Jul, 2010 1 commit
    • Version 1 of the event clients. See attached message. · 2364ad56
      Leigh B Stoller authored
      From: Leigh Stoller <stoller@flux.utah.edu>
      Date: July 1, 2010 11:35:08 AM MDT
      Subject: Re: Event System Issues
      
      Mike and I exchanged a couple of email messages. Mike has indicated
      that we should drop all Elvin support. A nice goal, but not possible
      because of the basic mistake I made in "version 0" of the event code.
      
      What we *can* do is stop *generating* the elvin hashes completely in
      Version 1. We also drop the elvin_gateway, thus no longer supporting
      ancient images that are still using the real elvin libraries. I am
      okay with this. Comment if you have objections.
      
      Version 0 elvin-compat (these are pubsub clients with ELVIN_COMPAT=1)
      binaries will work because of the magic ___elvin_ordered___ flag, which
      actually tells the client that the hmac is a pubsub ordered hmac. (I
      know, I am just great at naming things). I just add this little flag
      to events in version 1.
      
      Version 0 non-elvin-compat binaries will not work, but this is okay.
      The only case where this matters right now is Protogeni, where we need to be
      able to talk to non-elvin-compat binaries at remote sites. I have
      solved this with a version0 gateway as described in the previous
      message. ops will run a secondary pubsubd on another port, and the
      protogeni client startup code will have clients connect to that
      pubsubd instead. This is mostly tested, and it can roll out to other
      sites as needed, once we roll out cooked mode.
      
      Version 1 clients are fully interoperable.
      
      Lastly, we still need to be able to compare elvin HMACs coming from
      existing version 0 elvin-compat binaries (from our many, many system
      and custom images). That's because all those images are still going to be
      generating the HMACs in elvin order, and so the server programs on ops
      (event scheduler, tevc, etc.) need that elvin hashing code built into
      them. See first paragraph.
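
      To make the flag's role concrete: a receiver picks the field ordering for
      the HMAC check based on whether ___elvin_ordered___ is present. A minimal
      Python sketch; the two orderings, the serialization format, the choice of
      SHA-1, and the names canonical/sign/verify are all assumptions, since the
      message does not say how either ordering is actually defined.

```python
import hashlib
import hmac

FLAG = "___elvin_ordered___"


def canonical(attrs: dict[str, str], order: str) -> bytes:
    """Serialize attributes in the requested order, leaving the flag itself out."""
    items = [(k, v) for k, v in attrs.items() if k != FLAG]
    if order == "pubsub":
        items.sort()  # assumption: "pubsub order" = sorted by attribute name
    # assumption: "elvin order" = insertion order, so the list is left as-is
    return b"".join(f"{k}={v};".encode() for k, v in items)


def sign(attrs: dict[str, str], key: bytes, order: str) -> str:
    """HMAC over the attributes serialized in the given order."""
    return hmac.new(key, canonical(attrs, order), hashlib.sha1).hexdigest()


def verify(attrs: dict[str, str], key: bytes, digest: str) -> bool:
    """The flag means "pubsub ordered"; without it, fall back to elvin order."""
    order = "pubsub" if FLAG in attrs else "elvin"
    return hmac.compare_digest(sign(attrs, key, order), digest)
```

      Keeping both orderings on the receiving side is what lets old
      elvin-ordered senders and new flagged senders be checked by the same code.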
      
      Anyway, I have all this done and tested on my elabinelab. I had wanted
      to make it in time for code freeze, but there was not enough time for proper
      debugging, so I will push once code freeze is over.
      
      Lbs