1. 29 Mar, 2002 9 commits
  2. 28 Mar, 2002 13 commits
    • New script: stated · 447bb8a5
      Robert Ricci authored
      Watches for events sent by TMCD regarding the state of nodes and records
      this information in the database. Also watches for nodes that undergo
      invalid state transitions, or stay in the same state for too long.
      Right now, the only action it takes is to send email, but in the
      future, it will take action to 'unstick' nodes.
      
      Not yet installed by default.
    • Added two tables. · 1d4dd4ef
      Robert Ricci authored
      state_timeouts records the maximum amount of time (in seconds) that a
      node should be in a given state (0 means no timeout).
      The contents of the action column are not yet well-defined; in the
      future, it may contain commands for dealing with stuck nodes, or
      perhaps a keyword to indicate to the watchdog daemon what action
      should be taken.
      
      state_transitions contains a list of valid state transitions.
    • Added code in dostate() to chomp whitespace off the end of the new state string. · a4921502
      Robert Ricci authored
      This was causing (tremendously frustrating) problems elsewhere.
    • Robert Ricci's avatar
      5600aa42
    • Fixed up target to make event_wrap.c (generated by SWIG). · 25ec7e2c
      Robert Ricci authored
      This is made awkward by the fact that SWIG insists on putting its
      generated C file in the source directory, not the object directory.
    • Fix up os_load -l query. · bc88f075
      Leigh B. Stoller authored
    • Add toor user as backdoor in case root password is changed. · 69c47c74
      Leigh B. Stoller authored
      Not installed anyplace yet ...
    • Minor fix to previous revision. · a82d73a7
      Leigh B. Stoller authored
    • Add versioning support. · 2d522296
      Leigh B. Stoller authored
      This has been a minor problem, and is going to be a worse problem
      with remote nodes, where we will not be able to keep everyone up to
      date like we can in the local testbed case. I ran into this yesterday
      with the key distribution stuff for RON nodes, which requires
      incompatible changes to the accounts info that is returned. So, tmcc
      now takes a [-v version] argument, which is passed through to tmcd in
      the request field. tmcd passes that version number (assumed to be an
      int) down, and the routines should look at that. We will need to make
      some structural changes in tmcd as we get more version skew, but for
      now this is fine. Anyway, tmcd/tmcc have a compiled-in DEFAULT_VERSION
      (see decls.h). If no version is supplied, assume DEFAULT_VERSION (2),
      which covers all of the old images and yet-to-be-updated current
      images. As the new tmcc makes it out, versions will be sent through.
      VERY IMPORTANT: The current version is placed in libsetup.pm. When you
      make incompatible changes, bump the version number in decls.h and
      libsetup.pm, recompile and install a new tmcc and the new libsetup.pm
      on the clients (and of course, tmcd on the server).
      
      Fixes to termination: add signal handlers for HUP, INT, and TERM, and
      make sure all the children get killed off before exiting. We still
      have some problems though; I think the children should wait until the
      current request is completed before exiting. I'll give that some more
      thought though, since it is easy to mess that stuff up (leave zombies).
      
      Add build_info[] to startup message to syslog. Good for debugging.
      Some minor cleanup and restructuring. Mike is gonna hate it.
    • Add a sleep 1 between killall and restart, to give all the children a chance to react. · 89209d50
      Leigh B. Stoller authored
      Otherwise, we sometimes try to start the new one before all the
      children have released the port number (socket).
    • Fix some -Wall warnings. · 8d0caae2
      Leigh B. Stoller authored
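
The version fallback described in commit 2d522296 can be sketched roughly as follows. This is an illustration in Python, not the tmcd code itself (which is C). The `VERSION=<int>` request token, the helper names `parse_request` and `accounts_reply`, the reply fields, and version 3 are all assumptions made for the example. Only DEFAULT_VERSION = 2 and the rule "no version supplied means DEFAULT_VERSION" come from the commit message.

```python
DEFAULT_VERSION = 2  # from the commit message: assumed when no version is sent


def parse_request(line):
    """Split a request line into (version, command).

    Hypothetical wire format: an optional leading "VERSION=<int>" token
    followed by the command. The real tmcc/tmcd format may differ.
    """
    tokens = line.split()
    version = DEFAULT_VERSION
    if tokens and tokens[0].startswith("VERSION="):
        version = int(tokens[0].split("=", 1)[1])
        tokens = tokens[1:]
    return version, " ".join(tokens)


def accounts_reply(version, login, pubkey):
    """Hypothetical handler whose reply format changed incompatibly in version 3."""
    if version >= 3:
        # Newer clients understand the extra key field.
        return "LOGIN=%s PUBKEY=%s" % (login, pubkey)
    # Old images (and anything that sent no version) get the old format.
    return "LOGIN=%s" % login


if __name__ == "__main__":
    for request in ("accounts", "VERSION=3 accounts"):
        version, command = parse_request(request)
        print(command, "->", accounts_reply(version, "alice", "ssh-rsa AAAA"))
```

Branching on the version inside each handler keeps old clients working without a separate code path per release, at the cost of the version skew the commit message warns about.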
  3. 27 Mar, 2002 7 commits
  4. 26 Mar, 2002 4 commits
  5. 25 Mar, 2002 7 commits
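
The checks described in commits 447bb8a5 and 1d4dd4ef (valid transitions, per-state timeouts with 0 meaning no timeout) and the whitespace fix in a4921502 can be sketched as below. This is a minimal Python sketch under stated assumptions, not the stated script itself. The state names, the timeout values, the in-memory dictionaries standing in for the state_timeouts and state_transitions tables, and the function names are all hypothetical.

```python
import time

# Hypothetical stand-in for the state_timeouts table:
# state -> maximum seconds in that state (0 means no timeout).
STATE_TIMEOUTS = {"BOOTING": 180, "ISUP": 0}

# Hypothetical stand-in for the state_transitions table:
# the set of allowed (old_state, new_state) pairs.
STATE_TRANSITIONS = {
    ("SHUTDOWN", "BOOTING"),
    ("BOOTING", "ISUP"),
    ("ISUP", "SHUTDOWN"),
}


def clean_state(raw):
    """Strip trailing whitespace from a reported state string."""
    return raw.rstrip()


def is_valid_transition(old_state, new_state):
    """Return True if the transition appears in the allowed set."""
    return (old_state, new_state) in STATE_TRANSITIONS


def is_stuck(state, entered_at, now=None):
    """Return True if a node has exceeded the timeout for its state."""
    limit = STATE_TIMEOUTS.get(state, 0)
    if limit == 0:
        return False  # 0 (or an unknown state) means no timeout applies
    if now is None:
        now = time.time()
    return now - entered_at > limit


if __name__ == "__main__":
    new_state = clean_state("ISUP\n")
    print(is_valid_transition("BOOTING", new_state))
    print(is_stuck("BOOTING", entered_at=0, now=181))
```

Treating an unknown state as having no timeout is a design choice of this sketch; a watchdog could equally report unknown states as errors.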