1. 12 Oct, 2010 2 commits
  2. 11 Oct, 2010 2 commits
  3. 29 Sep, 2010 2 commits
    • Mike Hibler's avatar
      Handle a common failure on the node reload path. · 4dc57d48
      Mike Hibler authored
      Under load, nodes that have just entered reloading and have just rebooted
      might fail to get bootinfo.  The default behavior in this case is for the
      node to boot from disk (dubious, but that is the topic for another day).
      This causes the node to fall off the RELOAD path, winding up in either
      TBFAILED or ISUP.  Worse, if the node makes it to ISUP, its reload state
      is cleared and even if the reload_daemon reboots the node, it will still
      not go through the reloading process.
      The result is a bunch of nodes left in reloading.  Now if a node makes an
      invalid transition to TBFAILED or ISUP while in the RELOAD state machine,
      it fires the new REBOOT trigger which does...well, you figure it out.
      Note that in the ISUP case, this trigger overrides the default that would
      otherwise clear the reload state--so reboot is sufficient to get the machine
      back on the RELOAD track.
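      A minimal sketch of the behavior described above, not the actual
      state daemon code. The state names RELOAD, TBFAILED and ISUP and the
      REBOOT trigger come from the message; the Node class, the function
      name and the CLEAR_RELOAD_STATE label are hypothetical.

```python
from dataclasses import dataclass

# States that are invalid while a node is in the RELOAD state machine
# (taken from the commit message).
INVALID_IN_RELOAD = {"TBFAILED", "ISUP"}


@dataclass
class Node:
    name: str
    op_mode: str                 # e.g. "RELOAD" or "NORMAL" (names assumed)
    reload_pending: bool = True  # stand-in for the node's reload state
    reboots: int = 0


def handle_transition(node: Node, new_state: str) -> list:
    """Return the triggers fired when `node` reports `new_state`."""
    if node.op_mode == "RELOAD" and new_state in INVALID_IN_RELOAD:
        # The REBOOT trigger replaces the default handling, so in the
        # ISUP case the reload state is NOT cleared and a reboot is
        # enough to put the node back on the RELOAD path.
        node.reboots += 1
        return ["REBOOT"]
    if new_state == "ISUP":
        # Default behavior outside the RELOAD machine: clear reload state.
        node.reload_pending = False
        return ["CLEAR_RELOAD_STATE"]
    return []
```

      With this sketch, a node in RELOAD that reports ISUP keeps
      reload_pending set and is rebooted; a node in any other mode has its
      reload state cleared as before.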
  4. 18 Sep, 2010 2 commits
    • Mike Hibler's avatar
      Lint. · 36d88387
      Mike Hibler authored
    • Mike Hibler's avatar
      Add -ltb to link line. · 0d1f2b6a
      Mike Hibler authored
      This is part of cleaning up the name space of pubsub.  Previously, this
      program was getting the "info" call from libpubsub and that version is going
      to become private.  So use the one in libtb instead.
  5. 14 Sep, 2010 3 commits
  6. 23 Aug, 2010 1 commit
    • Mike Hibler's avatar
      Don't build delay-agent on FreeBSD 8. · 250a633c
      Mike Hibler authored
      Delay agent won't build on FreeBSD 8.x right now due to dummynet API changes.
      Not even sure we will bother to fix this since we have a newer, more
      OS-independent agent.
  7. 20 Aug, 2010 3 commits
  8. 18 Aug, 2010 4 commits
  9. 22 Jul, 2010 1 commit
  10. 17 Jul, 2010 1 commit
  11. 14 Jul, 2010 2 commits
    • Leigh Stoller's avatar
      Version 1 of the event clients. See attached message. · 2364ad56
      Leigh Stoller authored
      From: Leigh Stoller <stoller@flux.utah.edu>
      Date: July 1, 2010 11:35:08 AM MDT
      Subject: Re: Event System Issues
      Mike and I exchanged a couple of email messages. Mike has indicated
      that we should drop all Elvin support. A nice goal, but not possible
      because of the basic mistake I made in "version 0" of the event code.
      What we *can* do is stop *generating* the elvin hashes completely in
      Version 1. We also drop the elvin_gateway, thus no longer supporting
      ancient images that are still using the real elvin libraries. I am
      okay with this. Comment if you have objections.
      Version 0 elvin-compat (these are pubsub clients with ELVIN_COMPAT=1)
      binaries will work because of the magic ___elvin_ordered___ flag, which
      actually tells the client that the hmac is a pubsub ordered hmac. (I
      know, I am just great at naming things). I just add this little flag
      to events in version 1.
      Version 0 non-elvin-compat binaries will not work, but this is okay.
      The only case this matters right now is Protogeni, where we need to be
      able to talk to non-elvin-compat binaries at remote sites. I have
      solved this with a version0 gateway as described in the previous
      message. ops will run a secondary pubsubd on another port, and the
      protogeni client startup code will have clients connect to that
      pubsubd instead. This is mostly tested, and it can roll out to other
      sites as needed, once we roll out cooked mode.
      Version 1 clients are fully interoperable.
      Lastly, we still need to be able to compare elvin HMACs coming from
      existing version 0 elvin-compat binaries (from our many many system
      and custom images). That's because all those images are still going to be
      generating the HMACs in elvin order, and so the server programs on ops
      (event scheduler, tevc, etc) need that elvin hashing code built into
      them. See first paragraph.
      Anyway, I have all this done and tested on my elabinelab. I had wanted
      to make it in time for code freeze, but not enough time to get proper
      debugging, so I will push once code freeze is over.
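      One way to read the compatibility rules above, as a rough sketch and
      not the actual libevent code: a receiver that still carries the elvin
      hashing code picks the HMAC traversal from the presence of the
      ___elvin_ordered___ flag. The flag name is from the message; the
      dict-based notification and the function name are hypothetical.

```python
ELVIN_ORDERED_FLAG = "___elvin_ordered___"


def hmac_traversal(notification: dict) -> str:
    """Pick how the receiver should walk the fields to check the HMAC.

    "linear": the sender (version 1) computed the HMAC over the fields
              in the order they appear in the notification.
    "elvin":  the sender (a version 0 elvin-compat binary) computed the
              HMAC in elvin hash order, so the receiver must recompute
              that order before comparing.
    """
    if ELVIN_ORDERED_FLAG in notification:
        return "linear"
    return "elvin"
```

      Under this reading, version 0 non-elvin-compat senders fall into the
      "elvin" branch even though they hashed linearly, which matches the
      statement that they will not interoperate without the version0
      gateway.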
  12. 13 Jul, 2010 1 commit
  13. 07 Jun, 2010 1 commit
  14. 04 Jun, 2010 1 commit
  15. 17 May, 2010 1 commit
  16. 27 Apr, 2010 1 commit
  17. 15 Apr, 2010 1 commit
  18. 24 Mar, 2010 1 commit
  19. 23 Mar, 2010 1 commit
  20. 23 Feb, 2010 4 commits
  21. 19 Feb, 2010 1 commit
    • Leigh Stoller's avatar
      Okay, I think I have solved the problem with interoperability between · 350bc54e
      Leigh Stoller authored
      event clients and servers that do not have the same value of
      ELVIN_COMPAT compiled in. Most of us know this as the dreaded "HMAC
      mismatch" failure that clients spit out.
      Bottom line: My tests so far have ELVIN_COMPAT and non ELVIN_COMPAT
      clients sending and receiving events to/from each other okay. Events
      work in both directions.
      More details:
      The basic problem is that non ELVIN_COMPAT clients generate the hmac
      by traversing the notification in linear order. Very simple.
      When ELVIN_COMPAT is compiled in, we want to be able to take a
      notification that started in an actual elvin client, was converted to
      a pubsub notification, and then passed along. In this case the hmac
      that was in the original elvin packet was generated using the elvin
      traversal, which is not linear but a hash function. Annoying.
      To deal with this, our ELVIN_COMPAT clients take the notification and
      compute what the hash ordering is supposed to be, and then generate
      the HMAC for comparison. Ditto for outgoing notifications, which is
      why we have the interoperability problems.
      If I had been thinking ahead, I would have put a version number into
      pubsub notifications. Oh wait, that's another story ... let's get back
      to ELVIN_COMPAT ...  Instead of computing an order, I should have just
      put them into the notification in the proper elvin order, so that all
      clients only had to traverse it in linear order to compute the hmac.
      This is essentially what my changes have done. All notifications go
      out with the hmac computed from the linear ordering. When ELVIN_COMPAT
      is on, I use a hash table to generate the proper ordering, and then
      insert them into the notification.
      By adding a version number into the notification (refer to
      Mike's Rule of Version Numbers), I can tell old clients from new
      clients, and so new clients know what to do with a notification from
      an old client.
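      The scheme described above can be sketched as follows. This is an
      illustration under stated assumptions, not the pubsub or event
      library code: the CRC32-based ordering is only a stand-in for the
      real elvin hash traversal, and SHA-1, the shared key, the field
      names and all function names are hypothetical.

```python
import hashlib
import hmac
import zlib

KEY = b"shared-secret"  # assumed shared key, for illustration only


def linear_hmac(fields, key=KEY):
    """HMAC over (name, value) pairs in the order they appear."""
    mac = hmac.new(key, digestmod=hashlib.sha1)
    for name, value in fields:
        mac.update(name.encode())
        mac.update(value.encode())
    return mac.hexdigest()


def stand_in_elvin_order(fields):
    """Stand-in for the elvin hash ordering (NOT the real elvin hash)."""
    return sorted(fields, key=lambda f: zlib.crc32(f[0].encode()))


def build_notification(fields, elvin_compat, version=1):
    # With ELVIN_COMPAT on, the sender inserts the fields in "elvin"
    # order up front, so the HMAC is still a plain linear traversal.
    ordered = stand_in_elvin_order(fields) if elvin_compat else list(fields)
    return {"version": version, "fields": ordered,
            "hmac": linear_hmac(ordered)}


def verify(notification):
    # Every receiver walks the fields linearly, whatever its own
    # ELVIN_COMPAT setting is.
    expected = linear_hmac(notification["fields"])
    return hmac.compare_digest(notification["hmac"], expected)
```

      Because the ordering work is done once by the sender, a receiver
      needs no knowledge of the sender's ELVIN_COMPAT setting; the version
      field is what would let a new client recognize a notification from
      an old one.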
  22. 09 Feb, 2010 1 commit
  23. 25 Jan, 2010 1 commit
    • Mike Hibler's avatar
      Fixes for Fedora 10. · 9e9c658b
      Mike Hibler authored
      Fix obvious typo in liblocsetup.pm which was getting perl5.10 all cranky.
      Stop statically linking a couple of proxy pieces.  In general, it is/was
      a bad idea, and Fedora 10 doesn't have a static libz anyway.
  24. 05 Jan, 2010 2 commits