1. 20 Oct, 2010 1 commit
    • Support for no shared filesystem (unsupport for shared filesystem?) and · c1c1bce2
      Mike Hibler authored
      (eventual) support for NFS servers without race conditions!
      This means no NFS between nodes and ops/fs. There are still NFS mounts of
      ops on boss however.
      Added new defs-* variable NOSHAREDFS, which when set non-zero will disable
      the export of NFS filesystems to nodes.  Involved lots of little changes:
       * /users, /proj, and /share filesystems are not exported to nodes.
       * Returned mount info now includes an FSTYPE key which will be set to "LOCAL"
         if NOSHAREDFS is in effect (by default it is set to "NFS-RACY"; more on
         this later).  In the case where it is set to LOCAL, the other mount lines
         no longer contain REMOTE=foo settings.  Because of this change,
       * The client rc.mounts script will now create local versions of /users/*,
         /proj/<pid>, and /share when FSTYPE=LOCAL.  It first runs mkextrafs to
         create a large partition for these, since someday we will likely want
         to pre-populate these with a non-trivial amount of data.  Right now,
         the only thing that is put in the user's homedir is the standard dotfiles
         for the OS and the Emulab authorized_keys file (so you can login).
       * Linktest had to be modified to fetch the various results files (via
         loghole) rather than just assuming they were in /proj.  And also changed
         to invoke tevc with the local copy of the event key so it won't try to
         read it over NFS.
       * create_image was modified to ssh to the node and run the imagezip
         command, capturing the output of ssh.  This is controlled via the "-s"
         option which defaults to on for a NOSHAREDFS system, but can also be
         used on a normal system.
       * elabinelab's can be configured with/without a shared FS via the
         CONFIG_SHAREDFS attribute (note polarity change) which defaults to 1.
      Another new defs-* variable, NFSRACY, will some day allow you to specify
      (by setting to 0) that your NFS server does NOT have the nefarious mountd
      race condition when changing /etc/exports.  Currently, this defaults to 1
      since all versions of FreeBSD supported as an "fs" node have this "feature."
      Rumor has it that FreeBSD 8 does not have this problem nor, presumably,
      would a Linux NFS server.
      The only use of this variable right now is to set the FSTYPE returned by the
      tmcd "mounts" call, which in turn is used by one client script, rc.topomap
      (via a libsetup function) to determine whether it should try copying
      the topo file multiple times.
      Random: add python2.6 to the list of pythons checked for in configure.
      Random: resync defs-example-privatecnet with defs-example.
      Random: did a little code-pissin here and there.
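The FSTYPE handling described in this commit can be sketched roughly as below. This is a minimal illustration, not the actual tmcd or rc.topomap code: the KEY=VALUE line layout, the sample reply lines, the function names, and the retry count of 5 are all assumptions made for the example. Only the FSTYPE key and its "LOCAL" and "NFS-RACY" values come from the commit message.

```python
def mount_fstype(lines, default="NFS-RACY"):
    """Return the FSTYPE value found in a (hypothetical) mounts reply.

    Each line is assumed to hold space-separated KEY=VALUE tokens.
    Falls back to "NFS-RACY", which the commit says is the default.
    """
    for line in lines:
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep and key == "FSTYPE":
                return value
    return default


def copy_attempts(fstype, racy_attempts=5):
    """How many times a client might try copying a file such as the topo map.

    Only a racy NFS server justifies retries; racy_attempts is an
    illustrative number, not a value taken from the real scripts.
    """
    return racy_attempts if fstype == "NFS-RACY" else 1


# Hypothetical reply lines, for illustration only.
racy_reply = ["FSTYPE=NFS-RACY", "REMOTE=fs:/proj/example LOCAL=/proj/example"]
local_reply = ["FSTYPE=LOCAL", "LOCAL=/proj/example"]

racy_tries = copy_attempts(mount_fstype(racy_reply))
local_tries = copy_attempts(mount_fstype(local_reply))
```

Keeping the decision in one small function mirrors the commit's note that a single libsetup function makes this choice for the client script.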
  2. 19 Oct, 2010 1 commit
    • Minor tweaks to "linktest unplugged". · c3008371
      Mike Hibler authored
      In elab_linktest.pl, put explicit command line arguments after the defaults
      so that they will override them in linktest.pl.  Also, don't conflate LOGDIR
      and PROJDIR.
      Half-assed attempt to free the script from the notion of a pid/eid by
      defining an EVENTID param.  However, we still wind up reading a pid/eid
      from a file, though this could also be eliminated.
  3. 12 Oct, 2010 2 commits
  4. 11 Oct, 2010 2 commits
  5. 29 Sep, 2010 2 commits
    • Handle a common failure on the node reload path. · 4dc57d48
      Mike Hibler authored
      Under load, nodes that have just entered reloading and have just rebooted
      might fail to get bootinfo.  The default behavior in this case is for the
      node to boot from disk (dubious, but that is the topic for another day).
      This causes the node to fall off the RELOAD path, winding up in either
      TBFAILED or ISUP.  Worse, if the node makes it to ISUP, its reload state
      is cleared and even if the reload_daemon reboots the node, it will still
      not go through the reloading process.
      The result is a bunch of nodes left in reloading.  Now if a node makes an
      invalid transition to TBFAILED or ISUP while in the RELOAD state machine,
      it fires the new REBOOT trigger which does...well, you figure it out.
      Note that in the ISUP case, this trigger overrides the default that would
      otherwise clear the reload state--so reboot is sufficient to get the machine
      back on the RELOAD track.
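The trigger behavior described in this commit can be sketched as a small state-machine handler. This is only an illustration under stated assumptions: the transition table, the intermediate state names (SHUTDOWN, BOOTING, RELOADSETUP, RELOADING, RELOADDONE), the Node class, and the function name are hypothetical. Only TBFAILED, ISUP, the RELOAD state machine, and the REBOOT trigger come from the commit message.

```python
from dataclasses import dataclass, field

# Hypothetical set of valid (old, new) transitions in the RELOAD machine.
VALID_RELOAD_TRANSITIONS = {
    ("SHUTDOWN", "BOOTING"),
    ("BOOTING", "RELOADSETUP"),
    ("RELOADSETUP", "RELOADING"),
    ("RELOADING", "RELOADDONE"),
}


@dataclass
class Node:
    state: str
    in_reload_machine: bool = True
    reload_pending: bool = True
    fired: list = field(default_factory=list)  # triggers fired so far


def handle_transition(node, new_state):
    """Apply a state change and fire triggers as the commit describes."""
    old_state, node.state = node.state, new_state
    invalid = (old_state, new_state) not in VALID_RELOAD_TRANSITIONS
    if node.in_reload_machine and invalid and new_state in ("TBFAILED", "ISUP"):
        # The node fell off the RELOAD path: reboot it. The reload state is
        # deliberately left alone so the reboot puts it back on track.
        node.fired.append("REBOOT")
        return
    if new_state == "ISUP":
        # Default ISUP behavior outside this case: clear the reload state.
        node.reload_pending = False


stray = Node(state="BOOTING")
handle_transition(stray, "ISUP")       # booted from disk by mistake

normal = Node(state="BOOTING", in_reload_machine=False)
handle_transition(normal, "ISUP")      # ordinary boot, default handling
```

The early return is the point of the sketch: the REBOOT trigger replaces the default ISUP handling rather than running in addition to it.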
  6. 18 Sep, 2010 2 commits
    • Lint. · 36d88387
      Mike Hibler authored
    • Add -ltb to link line. · 0d1f2b6a
      Mike Hibler authored
      This is part of cleaning up the name space of pubsub.  Previously, this
      program was getting the "info" call from libpubsub and that version is going
      to become private.  So use the one in libtb instead.
  7. 14 Sep, 2010 3 commits
  8. 23 Aug, 2010 1 commit
    • Don't build delay-agent on FreeBSD 8. · 250a633c
      Mike Hibler authored
      Delay agent won't build on FreeBSD 8.x right now due to dummynet API changes.
      Not even sure we will bother to fix this since we have a newer, more OS
      independent agent.
  9. 20 Aug, 2010 3 commits
  10. 18 Aug, 2010 4 commits
  11. 22 Jul, 2010 1 commit
  12. 17 Jul, 2010 1 commit
  13. 14 Jul, 2010 2 commits
    • Version 1 of the event clients. See attached message. · 2364ad56
      Leigh B Stoller authored
      From: Leigh Stoller <stoller@flux.utah.edu>
      Date: July 1, 2010 11:35:08 AM MDT
      Subject: Re: Event System Issues
      Mike and I exchanged a couple of email messages. Mike has indicated
      that we should drop all Elvin support. A nice goal, but not possible
      because of the basic mistake I made in "version 0" of the event code.
      What we *can* do is stop *generating* the elvin hashes completely in
      Version 1. We also drop the elvin_gateway, thus no longer supporting
      ancient images that are still using the real elvin libraries. I am
      okay with this. Comment if you have objections.
      Version 0 elvin-compat (these are pubsub clients with ELVIN_COMPAT=1)
      binaries will work because of the magic ___elvin_ordered___ flag, which
      actually tells the client that the hmac is a pubsub ordered hmac. (I
      know, I am just great at naming things). I just add this little flag
      to events in version 1.
      Version 0 non-elvin-compat binaries will not work, but this is okay.
      The only case this matters right now is Protogeni, where we need to be
      able to talk to non-elvin-compat binaries at remote sites. I have
      solved this with a version0 gateway as described in the previous
      message. ops will run a secondary pubsubd on another port, and the
      protogeni client startup code will have clients connect to that
      pubsubd instead. This is mostly tested, and it can roll out to other
      sites as needed, once we roll out cooked mode.
      Version 1 clients are fully interoperable.
      Lastly, we still need to be able to compare elvin HMACs coming from
      existing version 0 elvin-compat binaries (from our many many system
      and custom images). That's because all those images are still going to be
      generating the HMACs in elvin order, and so the server programs on ops
      (event scheduler, tevc, etc) need that elvin hashing code built into
      them. See first paragraph.
      Anyway, I have all this done and tested on my elabinelab. I had wanted
      to make it in time for code freeze, but not enough time to get proper
      debugging, so I will push once code freeze is over.
  14. 13 Jul, 2010 1 commit
  15. 07 Jun, 2010 1 commit
  16. 04 Jun, 2010 1 commit
  17. 17 May, 2010 1 commit
  18. 27 Apr, 2010 1 commit
  19. 15 Apr, 2010 1 commit
  20. 24 Mar, 2010 1 commit
  21. 23 Mar, 2010 1 commit
  22. 23 Feb, 2010 4 commits
  23. 19 Feb, 2010 1 commit
    • Okay, I think I have solved the problem with interoperability between · 350bc54e
      Leigh B Stoller authored
      event clients and servers that do not have the same value of
      ELVIN_COMPAT compiled in. Most of us know this as the dreaded "HMAC
      mismatch" failure that clients spit out.
      Bottom line: My tests so far have ELVIN_COMPAT and non ELVIN_COMPAT
      clients sending and receiving events to/from each other okay. Events
      work in both directions.
      More details:
      The basic problem is that non ELVIN_COMPAT clients generate the hmac
      by traversing the notification in linear order. Very simple.
      When ELVIN_COMPAT is compiled in, we want to be able to take a
      notification that started in an actual elvin client, was converted to
      a pubsub notification, and then passed along. In this case the hmac
      that was in the original elvin packet was generated using the elvin
      traversal, which is not linear but a hash function. Annoying.
      To deal with this, our ELVIN_COMPAT clients take the notification and
      compute what the hash ordering is supposed to be, and then generate
      the HMAC for comparison. Ditto for outgoing notifications, which is
      why we have the interoperability problems.
      If I had been thinking ahead, I would have put a version number into
      pubsub notifications. Oh wait, that's another story ... let's get back
      to ELVIN_COMPAT ...  Instead of computing an order, I should have just
      put them into the notification in the proper elvin order, so that all
      clients only had to traverse it in linear order to compute the hmac.
      This is essentially what my changes have done. All notifications go
      out with the hmac computed from the linear ordering. When ELVIN_COMPAT
      is on, I use a hash table to generate the proper ordering, and then
      insert them into the notification.
      By adding a version number to the notification (refer to
      Mike's Rule of Version Numbers), I can tell old clients from new
      clients, and so new clients know what to do with a notification from
      an old client.
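The approach described above (the sender fixes the field order, every receiver computes the HMAC by plain linear traversal) can be sketched as follows. This is a minimal illustration and not the pubsub or event library code: the field encoding, the SHA-1 choice, the "version" field name, and all function names are assumptions. In particular, stand_in_elvin_order is a made-up ordering used only to stand in for the real elvin hash ordering, which the commit message does not specify.

```python
import hashlib
import hmac


def linear_hmac(key, fields):
    """HMAC over (name, value) pairs in exactly the order they appear."""
    mac = hmac.new(key, digestmod=hashlib.sha1)
    for name, value in fields:
        mac.update(name.encode() + b"\0" + value.encode() + b"\0")
    return mac.hexdigest()


def stand_in_elvin_order(fields):
    """Hypothetical stand-in for the elvin hash ordering (NOT the real one)."""
    return sorted(fields, key=lambda nv: hashlib.sha1(nv[0].encode()).digest())


def build_notification(key, fields, elvin_compat):
    """Sender side: choose the order once, then sign it linearly."""
    ordered = stand_in_elvin_order(fields) if elvin_compat else list(fields)
    # The version field lets new clients tell old senders from new ones.
    ordered.insert(0, ("version", "1"))
    return {"fields": ordered, "hmac": linear_hmac(key, ordered)}


def verify(key, notification):
    """Receiver side: no ordering logic needed, just a linear traversal."""
    expected = linear_hmac(key, notification["fields"])
    return hmac.compare_digest(expected, notification["hmac"])


key = b"example-event-key"
fields = [("objname", "link0"), ("eventtype", "UP"), ("expt", "pid/eid")]

compat_note = build_notification(key, fields, elvin_compat=True)
plain_note = build_notification(key, fields, elvin_compat=False)
both_verify = verify(key, compat_note) and verify(key, plain_note)
```

Because verify never looks at how the sender was compiled, clients built with and without ELVIN_COMPAT can check each other's notifications with the same code path.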
  24. 09 Feb, 2010 1 commit
  25. 25 Jan, 2010 1 commit
    • Fixes for Fedora 10. · 9e9c658b
      Mike Hibler authored
      Fix obvious typo in liblocsetup.pm which was getting perl5.10 all cranky.
      Stop statically linking a couple of proxy pieces.  In general, it is/was
      a bad idea, and Fedora 10 doesn't have a static libz anyway.