utils/cnetwatch.in · 2962b32f7e385af06fa9a2ff7d53f3ff24522e91 · emulab / emulab-devel

Introduce sitevars to control the sensitivity of alerts. · 2962b32f
Mike Hibler authored Sep 12, 2017
The sitevars are a bit obscure:

  # cnetwatch/check_interval
  #   Interval at which to collect info.
  #   Zero means don't run cnetwatch (exit immediately).
  #
  # cnetwatch/alert_interval
  #   Interval over which to calculate packet/bit rates and to log alerts.
  #   Should be an integer multiple of the check_interval.
  #
  # cnetwatch/pps_threshold
  #   Packet rate (packets/sec) in excess of which to log an alert.
  #   Zero means don't generate packet rate alerts.
  #
  # cnetwatch/bps_threshold
  #   Data rate (bits/sec) in excess of which to log an alert.
  #   Zero means don't generate data rate alerts.
  #
  # cnetwatch/mail_interval
  #   Interval at which to send email for all alerts logged during the interval.
  #   Zero means don't ever send email.
  #
  # cnetwatch/mail_max
  #   Maximum number of alert emails to send; after this alerts are only logged.
  #   Zero means no limit to the emails.

Basically you can tweak pps_threshold and bps_threshold to define what you
think an unusual "burst" of cnet traffic is and then alert_interval to
determine how long a burst has to last before you will send an alert.
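
As a rough illustration of how the thresholds and alert_interval interact, here is a minimal Python sketch. It is not the actual cnetwatch code: the `Counters` type and `rate_alerts` function are hypothetical names, the interval is assumed to be in seconds, and counter wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical snapshot of one port's cumulative counters."""
    packets: int
    bits: int


def rate_alerts(start, end, alert_interval, pps_threshold, bps_threshold):
    """Return alert strings for one port over one alert interval.

    Rates are averages over alert_interval (assumed to be in seconds).
    A threshold of zero disables that kind of alert.
    """
    alerts = []
    pps = (end.packets - start.packets) / alert_interval
    bps = (end.bits - start.bits) / alert_interval
    if pps_threshold and pps > pps_threshold:
        alerts.append(f"pps {pps:.0f} > {pps_threshold}")
    if bps_threshold and bps > bps_threshold:
        alerts.append(f"bps {bps:.0f} > {bps_threshold}")
    return alerts
```

Because the rate is averaged over the whole alert_interval, a short spike only triggers an alert if it is large enough to push the average over the threshold; a longer alert_interval therefore requires a more sustained burst.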

Why would you have check_interval less than alert_interval? You probably
wouldn't unless you want to record finer-grained port stats using the -l
option to write stats to a logfile. We do it on the mothership as a data
source for some student machine learning projects. Note that in an environment
with lots of control net switches, a single instance of gathering port
counters from the switches could take 30 seconds or longer (on the mothership
it can take minutes). So don't set check_interval too low.
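
The relationship between the two intervals can be sketched as below. This is an assumption-laden illustration, not code from cnetwatch: `samples_per_alert` and `collection_fits` are hypothetical helpers, intervals are assumed to be in seconds, and rejecting a non-multiple outright is stricter than the "should be" wording above.

```python
def samples_per_alert(check_interval, alert_interval):
    """Number of counter collections that make up one alert interval.

    Returns 0 when check_interval is zero (cnetwatch does not run).
    """
    if check_interval == 0:
        return 0
    if alert_interval % check_interval != 0:
        # Illustrative choice: treat a non-multiple as a config error.
        raise ValueError("alert_interval should be a multiple of check_interval")
    return alert_interval // check_interval


def collection_fits(check_interval, collect_seconds):
    """True if one pass over the switches finishes within check_interval."""
    return collect_seconds < check_interval
```

With equal intervals there is one collection per alert calculation; a smaller check_interval only adds finer-grained samples for the logfile, and it has to stay larger than the time one collection pass takes.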

The mail_* variables are paranoia about sending too much email due to runaway
nodes. The mail_interval just coalesces alerts to reduce messages, and
mail_max is the maximum number of emails that one instance of cnetwatch will
send. The latter is a pretty silly mechanism, as a long-running cnetwatch will
probably hit the limit legitimately after 6 months or so and you will have to
restart it.
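
The coalescing and capping behavior described above might look roughly like this in Python. This is a sketch under assumptions, not the cnetwatch implementation: the `AlertMailer` class and its methods are hypothetical, times are assumed to be in seconds, and the caller is assumed to do the actual logging and mail delivery.

```python
class AlertMailer:
    """Hypothetical batcher: coalesce alerts per mail_interval, cap at mail_max."""

    def __init__(self, mail_interval, mail_max):
        self.mail_interval = mail_interval  # zero means never send email
        self.mail_max = mail_max            # zero means no limit
        self.pending = []
        self.sent = 0
        self.last_flush = 0.0

    def log_alert(self, text):
        """Record an alert; it is always logged, and maybe mailed later."""
        self.pending.append(text)

    def flush(self, now):
        """Return the body of one coalesced email, or None if nothing is sent."""
        if self.mail_interval == 0:
            self.pending.clear()
            return None
        if now - self.last_flush < self.mail_interval:
            return None
        self.last_flush = now
        if not self.pending:
            return None
        body = "\n".join(self.pending)
        self.pending.clear()
        if self.mail_max and self.sent >= self.mail_max:
            # Limit reached: alerts remain log-only until a restart.
            return None
        self.sent += 1
        return body
```

The `sent` counter lives for the lifetime of the instance, which is why a long-running process eventually hits mail_max and needs a restart to resume sending.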