- 13 Jul, 2018 1 commit
Mike Hibler authored
Cuz you can never have too many sitevars!
- 09 Jul, 2018 3 commits
Leigh Stoller authored
Leigh Stoller authored
hand). Also add an enable sitevar, since we run this only on clusters that support portstats on the control network.
Leigh Stoller authored
easily get to the experiment (or portal status page).
- 25 Jun, 2018 1 commit
Mike Hibler authored
- 20 Oct, 2017 1 commit
Mike Hibler authored
This is a hack to get cnetwatch to shut up about dbox3 already! It should be generalized to support both node_type_ and node_ attributes to configure the various (currently global) parameters.
- 15 Sep, 2017 1 commit
Mike Hibler authored
- 13 Sep, 2017 1 commit
Mike Hibler authored
- 12 Sep, 2017 2 commits
Mike Hibler authored
The sitevars are a bit obscure:

    # cnetwatch/check_interval
    #   Interval at which to collect info.
    #   Zero means don't run cnetwatch (exit immediately).
    #
    # cnetwatch/alert_interval
    #   Interval over which to calculate packet/bit rates and to log alerts.
    #   Should be an integer multiple of the check_interval.
    #
    # cnetwatch/pps_threshold
    #   Packet rate (packets/sec) in excess of which to log an alert.
    #   Zero means don't generate packet rate alerts.
    #
    # cnetwatch/bps_threshold
    #   Data rate (bits/sec) in excess of which to log an alert.
    #   Zero means don't generate data rate alerts.
    #
    # cnetwatch/mail_interval
    #   Interval at which to send email for all alerts logged during the interval.
    #   Zero means don't ever send email.
    #
    # cnetwatch/mail_max
    #   Maximum number of alert emails to send; after this, alerts are only logged.
    #   Zero means no limit to the emails.

Basically, you can tweak pps_threshold and bps_threshold to define what you consider an unusual "burst" of cnet traffic, and then alert_interval to determine how long a burst has to last before an alert is sent.

Why would you set check_interval lower than alert_interval? You probably wouldn't, unless you want to record finer-grained port stats using the -l option to write stats to a logfile. We do this on the mothership as a data source for some student machine learning projects. Note that in an environment with many control net switches, a single pass of gathering port counters from the switches could take 30 seconds or longer (on the mothership it can take minutes), so don't set check_interval too low.

The mail_* variables are paranoia about sending too much email due to runaway nodes. The mail_interval just coalesces alerts to reduce the number of messages, and mail_max is the maximum number of emails that one instance of cnetwatch will send. The latter is a pretty silly mechanism, as a long-running cnetwatch will probably hit the limit legitimately after 6 months or so, and you will have to restart it.
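As a rough illustration of how these sitevars interact, here is a minimal Python sketch. It is not cnetwatch's code: the names Settings, validate, rate_alerts and Mailer are hypothetical, and it assumes the switch port counters are packet and octet counts (so bits = octets * 8) and ignores counter wraparound. It models only the documented semantics: rates averaged over alert_interval, a zero threshold disabling that alert type, and mail_interval/mail_max coalescing and capping email.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Settings:
    """Hypothetical mirror of the cnetwatch/* sitevars (seconds, pkts/sec, bits/sec)."""
    check_interval: int
    alert_interval: int
    pps_threshold: float
    bps_threshold: float
    mail_interval: int
    mail_max: int


def validate(s: Settings) -> None:
    # check_interval == 0 means "don't run", so nothing else matters.
    if s.check_interval == 0:
        return
    if s.alert_interval % s.check_interval != 0:
        raise ValueError("alert_interval should be an integer multiple of check_interval")


def rate_alerts(port: str, pkts_delta: int, octets_delta: int,
                elapsed: float, s: Settings) -> List[str]:
    """Alerts for one port, from counter deltas accumulated over alert_interval.

    Assumes octet counters, so bits = octets * 8 (an assumption, see above)."""
    pps = pkts_delta / elapsed
    bps = octets_delta * 8 / elapsed
    alerts = []
    # A threshold of zero disables that kind of alert.
    if s.pps_threshold and pps > s.pps_threshold:
        alerts.append(f"{port}: {pps:.0f} pps > {s.pps_threshold:.0f}")
    if s.bps_threshold and bps > s.bps_threshold:
        alerts.append(f"{port}: {bps:.0f} bps > {s.bps_threshold:.0f}")
    return alerts


@dataclass
class Mailer:
    """Coalesces logged alerts into at most one email per mail_interval."""
    s: Settings
    pending: List[str] = field(default_factory=list)
    sent: int = 0

    def log(self, alert: str) -> None:
        # Alerts are always logged, whether or not they are ever mailed.
        self.pending.append(alert)

    def flush(self) -> Optional[str]:
        """Call once per mail_interval; returns the email body to send, if any."""
        batch, self.pending = self.pending, []
        if self.s.mail_interval == 0 or not batch:
            return None  # email disabled, or nothing logged this interval
        if self.s.mail_max and self.sent >= self.s.mail_max:
            return None  # limit reached: alerts are only logged from now on
        self.sent += 1
        return "\n".join(batch)
```

Averaging over alert_interval rather than over each check_interval means a burst has to be sustained (or very large) to push the average over a threshold, which is the "how long a burst has to last" knob described above.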
Mike Hibler authored
Also, add -1 option to run once and exit (to help debug portstats issues).
- 23 Aug, 2017 2 commits
Mike Hibler authored
Leigh Stoller authored
iface syntax and prints its output in iface syntax (-i option).
- 26 Jul, 2017 1 commit
Mike Hibler authored
- 02 Jun, 2017 1 commit
Mike Hibler authored
- 19 Apr, 2017 1 commit
Mike Hibler authored
- 31 Mar, 2017 2 commits
Mike Hibler authored
Mike Hibler authored
A variant of node_traffic.