  1. 14 Sep, 2017 2 commits
  2. 13 Sep, 2017 5 commits
  3. 12 Sep, 2017 9 commits
    • Mike Hibler's avatar
      Introduce sitevars to control the sensitivity of alerts. · 2962b32f
      Mike Hibler authored
      The sitevars are a bit obscure:
        # cnetwatch/check_interval
        #   Interval at which to collect info.
        #   Zero means don't run cnetwatch (exit immediately).
        # cnetwatch/alert_interval
        #   Interval over which to calculate packet/bit rates and to log alerts.
        #   Should be an integer multiple of the check_interval.
        # cnetwatch/pps_threshold
        #   Packet rate (packets/sec) in excess of which to log an alert.
        #   Zero means don't generate packet rate alerts.
        # cnetwatch/bps_threshold
        #   Data rate (bits/sec) in excess of which to log an alert.
        #   Zero means don't generate data rate alerts.
        # cnetwatch/mail_interval
        #   Interval at which to send email for all alerts logged during the interval.
        #   Zero means don't ever send email.
        # cnetwatch/mail_max
        #   Maximum number of alert emails to send; after this alerts are only logged.
        #   Zero means no limit to the emails.
      Basically you can tweak pps_threshold and bps_threshold to define what you
      think an unusual "burst" of cnet traffic is and then alert_interval to
      determine how long a burst has to last before you will send an alert.
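
      The threshold logic described above can be sketched roughly as follows.
      This is an illustration only, not the cnetwatch code: the Sitevars
      class, the function names, the use of seconds for the intervals, and
      the assumption that the switch reports octet counters are all
      hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sitevars:
    # Hypothetical container mirroring the cnetwatch/* sitevars.
    check_interval: int    # assumed seconds; 0 means don't run at all
    alert_interval: int    # assumed seconds; window for rate calculation
    pps_threshold: float   # packets/sec; 0 disables packet rate alerts
    bps_threshold: float   # bits/sec; 0 disables data rate alerts


def samples_per_alert(sv: Sitevars) -> int:
    """Number of collection passes that make up one alert window."""
    if sv.check_interval == 0:
        return 0  # cnetwatch would exit immediately
    if sv.alert_interval % sv.check_interval != 0:
        raise ValueError("alert_interval should be a multiple of check_interval")
    return sv.alert_interval // sv.check_interval


def rate_alerts(port: str, pkts_delta: int, octets_delta: int, sv: Sitevars) -> list:
    """Compare rates averaged over one alert window against the thresholds."""
    pps = pkts_delta / sv.alert_interval
    bps = octets_delta * 8 / sv.alert_interval  # assumes octet counters
    alerts = []
    if sv.pps_threshold > 0 and pps > sv.pps_threshold:
        alerts.append(f"{port}: {pps:.0f} pps exceeds {sv.pps_threshold:.0f}")
    if sv.bps_threshold > 0 and bps > sv.bps_threshold:
        alerts.append(f"{port}: {bps:.0f} bps exceeds {sv.bps_threshold:.0f}")
    return alerts
```

      Because the rate is averaged over the whole alert_interval, a short
      spike inside a long window may not cross the threshold, which is how
      alert_interval sets the minimum burst length.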
      Why would you have check_interval less than alert_interval? You probably
      wouldn't unless you want to record finer-grained port stats using the -l
      option to write stats to a logfile. We do it on the mothership as a data
      source for some student machine learning projects. Note that in an environment
      with lots of control net switches, a single instance of gathering port
      counters from the switches could take 30 seconds or longer (on the mothership
      it can take minutes). So don't set check_interval too low.
      The mail_* variables are paranoia about sending too much email due to runaway
      nodes. The mail_interval just coalesces alerts to reduce messages, and
      mail_max is the maximum number of emails that one instance of cnetwatch will
      send. The latter is a pretty silly mechanism as a long-running cnetwatch will
      probably hit the limit legitimately after 6 months or so and you will have to
      restart it.
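
      A minimal sketch of how mail_interval and mail_max might interact,
      assuming alerts are queued as they are logged and flushed into one
      message per interval. The MailLimiter class and its methods are
      hypothetical names, not part of cnetwatch.

```python
class MailLimiter:
    """Coalesce logged alerts into at most one email per mail_interval."""

    def __init__(self, mail_interval: float, mail_max: int):
        self.mail_interval = mail_interval  # 0 means never send email
        self.mail_max = mail_max            # 0 means no limit on emails
        self.pending = []
        self.sent = 0
        self.last_flush = 0.0

    def log_alert(self, msg: str) -> None:
        # Every alert is logged; here it is also queued for the next email.
        self.pending.append(msg)

    def flush(self, now: float):
        """Return the email body to send at time `now`, or None."""
        if self.mail_interval == 0:
            self.pending.clear()
            return None
        if now - self.last_flush < self.mail_interval:
            return None  # interval not over yet, keep coalescing
        self.last_flush = now
        if not self.pending:
            return None
        body = "\n".join(self.pending)
        self.pending.clear()
        if self.mail_max > 0 and self.sent >= self.mail_max:
            return None  # limit reached: alerts are only logged from now on
        self.sent += 1
        return body
```

      The sent counter lives only in the running process, which is why a
      restart is what resets the limit.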
    • Leigh B Stoller's avatar
      Very temporary patch for the image type problem at Utah Cloudlab: · e3f94313
      Leigh B Stoller authored
      If the metadata does not include an architecture then we fall back to
      the old method of allowing an imported image to run on all local node
      types. This works pretty much all the time on all clusters, but not on
      Cloudlab Utah which has ARM and x86 nodes. So for now, I have pushed out
      a change to all our clusters that forces an architecture into the
      metadata, based on the fact that if the local testbed does not have any
      m400 nodes, then the image is 99.9 percent certain to be x86_64.
      I will think about a proper solution on the way to Italy.
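
      The two halves of this workaround could be sketched like this. All
      names here are assumptions made for illustration: the "architecture"
      metadata key, the "aarch64" label, the example node types, and both
      function names. The commit also does not say what an exporting cluster
      that does have m400 nodes should do, so the sketch returns None there.

```python
def forced_architecture(local_node_types):
    """Exporting side: guess an architecture to force into the metadata."""
    if "m400" not in local_node_types:
        return "x86_64"  # no ARM nodes locally, so almost certainly x86_64
    return None          # not covered by the commit message


def runnable_node_types(metadata, types_by_arch):
    """Importing side: node types an imported image may run on."""
    arch = metadata.get("architecture")  # hypothetical key name
    if arch:
        return sorted(types_by_arch.get(arch, []))
    # Old method: no architecture recorded, allow every local node type.
    return sorted(t for types in types_by_arch.values() for t in types)
```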
    • Mike Hibler's avatar
      "Inline" a withadminprivs call by setting the environment variable. · 4a668e80
      Mike Hibler authored
      Also, add -1 option to run once and exit (to help debug portstats issues).
    • Leigh B Stoller's avatar
      Export $NODEROLE_TESTNODE · e06101ab
      Leigh B Stoller authored
    • Leigh B Stoller's avatar
      Minor change to dynamic load of device specific modules; watch for a · 9acba539
      Leigh B Stoller authored
      load error other than "not found", and report that rather than
      continuing on without the module. This causes really obscure problems
      that take a long time to figure out, especially with my old brain.
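
      As an analogy only (the actual change is to the testbed's own module
      loader, which is not shown here), the same distinction between "not
      found" and any other load error looks like this in Python. The
      function name load_device_module is hypothetical.

```python
import importlib


def load_device_module(name: str):
    """Load an optional device-specific module.

    Returns None when the module itself does not exist; any other load
    failure is reported instead of silently continuing without the module.
    """
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        if err.name == name:
            return None  # plain "not found": carry on without the module
        raise            # a dependency of the module is missing: real error
    except Exception as err:
        # Syntax errors, failures at import time, and so on.
        raise RuntimeError(f"failed to load device module {name!r}: {err}") from err
```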
    • Leigh B Stoller's avatar
      Kill stack arg to portstats, portstats does not grok that, and it is not · 2cac9b11
      Leigh B Stoller authored
      really necessary since the apcon does not have port counters, but we
      will end up clearing the port counters for the per-exp switches. Will
      need to revisit if the next layer one switch does port counters.
  4. 11 Sep, 2017 2 commits
  5. 10 Sep, 2017 1 commit
  6. 08 Sep, 2017 2 commits
  7. 07 Sep, 2017 2 commits
  8. 06 Sep, 2017 5 commits
  9. 05 Sep, 2017 3 commits
  10. 01 Sep, 2017 5 commits
  11. 31 Aug, 2017 4 commits