1. 29 Aug, 2018 2 commits
  2. 18 Jun, 2018 1 commit
  3. 04 Jun, 2018 1 commit
  4. 16 Apr, 2018 1 commit
  5. 21 Feb, 2018 1 commit
  6. 27 Nov, 2017 1 commit
  7. 13 Oct, 2017 1 commit
  8. 12 Sep, 2017 1 commit
    • Introduce sitevars to control the sensitivity of alerts. · 2962b32f
      Mike Hibler authored
      The sitevars are a bit obscure:
      
        # cnetwatch/check_interval
        #   Interval at which to collect info.
        #   Zero means don't run cnetwatch (exit immediately).
        #
        # cnetwatch/alert_interval
        #   Interval over which to calculate packet/bit rates and to log alerts.
        #   Should be an integer multiple of the check_interval.
        #
        # cnetwatch/pps_threshold
        #   Packet rate (packets/sec) in excess of which to log an alert.
        #   Zero means don't generate packet rate alerts.
        #
        # cnetwatch/bps_threshold
        #   Data rate (bits/sec) in excess of which to log an alert.
        #   Zero means don't generate data rate alerts.
        #
        # cnetwatch/mail_interval
        #   Interval at which to send email for all alerts logged during the interval.
        #   Zero means don't ever send email.
        #
        # cnetwatch/mail_max
        #   Maximum number of alert emails to send; after this alerts are only logged.
        #   Zero means no limit to the emails.
      
      Basically, you can tweak pps_threshold and bps_threshold to define what you
      consider an unusual "burst" of cnet traffic, and alert_interval to determine
      how long a burst has to last before an alert is logged.
      
      Why would you have check_interval less than alert_interval? You probably
      wouldn't, unless you want to record finer-grained port stats using the -l
      option to write stats to a logfile. We do this on the mothership as a data
      source for some student machine-learning projects. Note that in an environment
      with lots of control net switches, a single pass of gathering port
      counters from the switches could take 30 seconds or longer (on the mothership
      it can take minutes), so don't set check_interval too low.
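
      For example, to alert on sustained bursts above 10,000 packets/sec or
      100 Mbit/sec that last at least 5 minutes, something along these lines
      should work (a sketch assuming the usual setsitevar utility and illustrative
      values; the exact invocation may differ on your boss):

          boss> wap setsitevar cnetwatch/check_interval 60
          boss> wap setsitevar cnetwatch/alert_interval 300
          boss> wap setsitevar cnetwatch/pps_threshold 10000
          boss> wap setsitevar cnetwatch/bps_threshold 100000000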
      
      The mail_* variables are paranoia about sending too much email due to runaway
      nodes. The mail_interval just coalesces alerts to reduce the number of
      messages, and mail_max is the maximum number of emails that one instance of
      cnetwatch will send. The latter is a pretty silly mechanism, as a long-running
      cnetwatch will probably hit the limit legitimately after 6 months or so and
      you will have to restart it.
  9. 30 Aug, 2017 1 commit
  10. 23 Aug, 2017 1 commit
  11. 18 Aug, 2017 1 commit
  12. 26 Jul, 2017 1 commit
    • Support for per-experiment root keypairs (Round 1). See issue #302. · c6150425
      Mike Hibler authored
      Provide automated setup of an ssh keypair enabling root to log in without
      a password between nodes. The biggest challenge here is to get the private
      key onto nodes in such a way that a non-root user on those nodes cannot
      obtain it. Otherwise that user would be able to ssh as root to any node.
      This precludes simple distribution of the private key using tmcd/tmcc as
      any user can do a tmcc (tmcd authentication is based on the node, not the
      user).
      
      This version does a post-imaging "push" of the private key from boss using
      ssh. The key is pushed from tbswap after nodes are imaged but before the
      event system, and thus any user startup scripts, are started. We actually
      use "pssh" (really "pscp") to scale a bit better, so YOU MUST HAVE THE
      PSSH PACKAGE INSTALLED. So be sure to do a:
      
          pkg install -r Emulab pssh
      
      on your boss node. See the new utils/pushrootkeys.in script for more.
      
      The public key is distributed via the "tmcc localization" command which
      was already designed to handle adding multiple public keys to root's
      authorized_keys file on a node.
      
      This approach should be backward compatible with old images. I BUMPED THE
      VERSION NUMBER OF TMCD so that newer clients can also get back (via
      rc.localize) a list of keys and the names of the files they should be stashed
      in. This is used to allow us to pass along the SSL and SSH versions of the
      public key so that they can be placed in /root/.ssl/<node>.pub and
      /root/.ssh/id_rsa.pub respectively. Note that this step is not necessary for
      inter-node ssh to work.
      
      Also passed along is an indication of whether the returned key is encrypted.
      This might be used in Round 2 if we securely implant a shared secret on every
      node at imaging time and then use that to encrypt the ssh private key such
      that we can return it via rc.localize. But the client-side script currently
      does not implement any decryption, so the client side would need to be changed
      again in the future.
      
      The per-experiment root keypair mechanism has been exposed to the user via
      old-school NS experiments right now by adding a node "rootkey" method. To
      export the private key to "nodeA" and the public key to "nodeB" do:
      
          $nodeA rootkey private 1
          $nodeB rootkey public 1
      
      This enables an asymmetric relationship such that "nodeA" can ssh into
      "nodeB" as root but not vice-versa. For a symmetric relationship you would do:
      
          $nodeA rootkey private 1
          $nodeB rootkey private 1
          $nodeA rootkey public 1
          $nodeB rootkey public 1
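
      Either way, a quick sanity check once the experiment is swapped in is to try
      a root ssh from one node to the other. With the asymmetric setup above,
      something like this (hypothetical node names) should succeed from nodeA
      without a password prompt, but not in the other direction:

          nodeA> sudo ssh root@nodeB true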
      
      These user specifications will be overridden by hardwired Emulab restrictions.
      The current restrictions are that we do *not* distribute a root pubkey to
      tainted nodes (as it opens a path to root on a node where no one should be
      root) or any keys to firewall nodes, virtnode hosts, delay nodes, subbosses,
      storagehosts, etc., which are not really part of the user topology.
      
      For more on how we got here and what might happen in Round 2, see:
      
          #302
  13. 05 Jun, 2017 1 commit
    • Working on issue #269 ... · ad2a3e70
      Leigh Stoller authored
      Add new script to "deprecate" images:
      
      	boss> wap deprecate_image
      	Usage: deprecate_image [-e|-w] <image> [warning message to users]
      	Options:
      	       -e     Use of image is an error; default is warning
      	       -w     Use of image is a warning
      
      When an image is deprecated with just warnings, new classic experiments
      generate warnings in the output. Swapping in an experiment also
      generates warnings in the output, but additionally sends email to the user.
      When the image is set for error, both new experiments and swapins will fail
      with prejudice.
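
      For example (hypothetical image name and message):

        boss> wap deprecate_image -e FBSD83-STD "This image is no longer maintained"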
      
      Same deal on the Geni path; we generate warnings/errors and send email.
      Errors are reflected back in the Portal interface.
      
      At the moment the image server knows nothing about deprecated images, so
      the Portal constraint checker will not be bothered, nor will it tell the
      user until later when the cluster throws an error. As a result, when we
      deprecate an image, we need to do it on all clusters. Need to think
      about this a bit more.
  14. 30 May, 2017 1 commit
    • Sort out ZFS refquota/quota settings, part 2. · 2202163e
      Mike Hibler authored
      Add setzfsquotas script to handle fixup of existing quotas, add update
      script to do a one-time invocation of this script at boss-install time,
      and fix accountsetup so it will properly set both quotas going forward.
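
      For reference, "both quotas" means the dataset-only refquota and the
      snapshot-inclusive quota; at the ZFS level the by-hand equivalent would be
      something like this (hypothetical dataset name and sizes):

          zfs set refquota=100G z/users/joe
          zfs set quota=110G z/users/joe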
  15. 04 May, 2017 1 commit
  16. 14 Mar, 2017 1 commit
  17. 20 Jan, 2017 1 commit
  18. 06 Jan, 2017 1 commit
  19. 07 Nov, 2016 1 commit
  20. 12 Oct, 2016 1 commit
  21. 17 Jun, 2016 1 commit
  22. 26 May, 2016 1 commit
  23. 25 May, 2016 1 commit
    • Add future reservations and admission control. · 294fade1
      Gary Wong authored
      Right now this is strictly advisory.  In particular, swap-ins go through
      the normal path and are NOT forced to comply with admission control
      wrt future reservations; therefore, reservations don't yet come with
      any guarantees at all.
  24. 12 Apr, 2016 1 commit
  25. 11 Apr, 2016 1 commit
  26. 23 Mar, 2016 1 commit
  27. 21 Mar, 2016 1 commit
    • New (test) version of image_import that can import the entire image history · aef647a6
      Leigh Stoller authored
      from the server, keeping it in sync with the server as new versions of the
      image are added. Also handles importing deltas if the metadata says there
      is a delta.
      
      Note that downloading the image files is still lazy; we will not import all
      15 versions of an image unless they actually are needed.
      
      Lots of work still to do. This is a bit of a nightmare because of client/server
      (backward) compatibility issues wrt provenance/noprovenance and
      deltas/nodeltas. I might change my mind and say the hell with
      compatibility!
      
      Along these same lines, there is the issue of what to do when a site that is
      running with provenance turned on gets this new code. Up to now, the
      client and server never tried to stay in sync, but now they have to (because
      of deltas), and so the client image descriptors have to be upgraded. That
      will be a hassle too.
  28. 08 Dec, 2015 1 commit
  29. 15 May, 2015 1 commit
    • Directory based image paths. · 3a21f39e
      Leigh Stoller authored
      Soon, we will have images with both full images and deltas, for the same
      image version. To make this possible, the image path will now be a
      directory instead of a file, and all of the version files (ndz, sig, sha1,
      delta) will reside in the directory.
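
      As a rough sketch (hypothetical image name and path), a directory-based
      image might end up looking like:

          /proj/myproj/images/myimage/
              myimage.ndz
              myimage.ndz.sig
              myimage.ndz.sha1
              (plus delta and per-version files as they are created)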
      
      A new config variable IMAGEDIRECTORIES turns this on; there is also a check
      for the ImageDirectories feature. This is applied only when a brand new
      image is created; a clone version of the image inherits the path it started
      with. Yes, you can have a mix of directory-based and file-based image
      descriptors.
      
      When it is time to convert all images over, there is a script called
      imagetodir that will go through all image descriptors, create the
      directory, move/rename all the files, and update the descriptors.
      Ultimately, we will not support file-based image paths.
      
      I also added versioning to the image metadata descriptors so that going
      forward, old clients can handle a descriptor from a new server.
  30. 05 Mar, 2015 1 commit
  31. 27 Jan, 2015 1 commit
    • Two co-mingled sets of changes: · 85cb063b
      Leigh Stoller authored
      1) Implement the latest dataset read/write access settings from frontend to
         backend. Also updates for simultaneous read-only usage.
      
      2) New configure options: PROTOGENI_LOCALUSER and PROTOGENI_GENIWEBLOGIN.
      
         The first changes the way that projects and users are treated at the
         CM. When set, we create real accounts (marked as nonlocal) for users and
         also create real projects (also marked as nonlocal). Users are added to
         those projects according to their credentials. The underlying experiment
         is thus owned by the user and in the project, although all the work is
         still done by the geniuser pseudo user. The advantage of this approach
         is that we can use standard emulab access checks to control access to
         objects like datasets. Maybe images too at some point.
      
         NOTE: Users are not removed from projects once they are added; we are
         going to need to deal with this, perhaps by adding an expiration stamp
         to the groups_membership tables, and using the credential expiration to
         mark it.
      
         The second new configure option turns on the web login via the geni
         trusted signer. So, if I create a sliver on a backend cluster when both
         options are set, I can use the trusted signer to log into my newly
         created account on the cluster, and see it (via the emulab classic web
         interface).
      
         All this is in flux, might end up being a bogus approach in the end.
  32. 15 Dec, 2014 1 commit
  33. 25 Nov, 2014 1 commit
  34. 04 Nov, 2014 1 commit
    • Add runsonxen script to set the bits of DB state required. · 04c35b0b
      Leigh Stoller authored
      	usage: runsonxen [-p <parent>] <imageid>
      	usage: runsonxen -a [-p <parent>]
      	usage: runsonxen -c <imageid>
      	Options:
      	 -n      - Impotent mode
      	 -c      - Clear XEN parent settings completely
      	 -a      - Operate on all current XEN capable images
      	 -p      - Set default parent; currently XEN43-64-STD
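
      For example, to mark a single (hypothetical) image as running on XEN with
      the current default parent:

          boss> wap runsonxen MYIMAGE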
  35. 28 Oct, 2014 1 commit
  36. 09 Jul, 2014 1 commit
  37. 01 Jul, 2014 1 commit
  38. 13 Jun, 2014 1 commit
  39. 06 Jun, 2014 1 commit
    • New script, analogous to Mike's node_traffic script. Basically, it · b885ce89
      Leigh Stoller authored
      was driving me nuts that we do not have an easy way to see what is
      going on *inside* the fabric.
      
      So this one reports on traffic across trunk links and interconnects
      out of the fabric.  Basic operation is pretty simple:
      
      	Usage: switch_traffic [-rs] [-i seconds] [switch[:switch] ...]
      	Reports traffic across trunk links and interconnects
      	-h          This message
      	-i seconds  Show stats over a <seconds>-period interval
      
      So with no arguments it will give portstats-style output of all trunk
      links and interconnects in the database. Trunk link numbers are aggregates
      over all of the trunk wires that connect two switches.
      
      The -i option gives traffic over an interval, which is much more
      useful than the raw packet numbers, since on most of our switches
      those numbers have probably rolled over a few times.
      
      You can optionally specify specific switches and interconnects on the
      command line. For example:
      
      boss> wap switch_traffic -i 10 cisco3 ion
      Trunk                    InOctets      InUpkts   InNUpkts   ...
      ----------------------------------------------------------- ...
      cisco3:cisco10                128            0          1   ...
      cisco3:cisco8                2681            7          4   ...
      cisco3:cisco1                4493           25          7   ...
      cisco3:cisco9                 192            0          1   ...
      cisco3:cisco4                 128            0          2   ...
      pg-atla:ion                     0            0          0   ...
      pg-hous:ion                     0            0          0   ...
      pg-losa:ion                     0            0          0   ...
      pg-salt:ion                  2952            0         42   ...
      pg-wash:ion                     0            0          0   ...
      
      NOTE that the above output is abbreviated so it does not wrap in the
      git log, but you get the idea.
      
      Or you can specify a specific trunk link:
      
      	boss> wap switch_traffic -i 10 cisco3:cisco8
      
      Okay, this is all pretty basic, and eventually it would be nice to take
      these numbers and feed them into mrtg or rrdtool so we can view pretty
      graphs, but this is as far as I can take it for now.
      
      Maybe in the short term it would be enough to record the numbers every
      5 minutes or so and put the results into a file.
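
      A crude sketch of that (paths are illustrative, assuming the usual
      /usr/testbed install prefix on boss; adjust to taste):

          # root crontab entry: append raw counters to a file every 5 minutes
          */5 * * * * /usr/testbed/bin/wap switch_traffic >> /var/tmp/switch_traffic.log 2>&1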