1. 06 Jul, 2016 1 commit
    • Add a couple missing fallback-to-EOL wgets. · d74be89c
      David Johnson authored
      OpenStack renames its stable/<release> branches to <release>-eol
      variants at end of life. So that tarballs keep working in perpetuity
      even though we need to download some things, we first try the
      stable/<release> ref, then fall back to the <release>-eol tag.  Sigh...
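
The try-then-fall-back order described above can be sketched in Python as follows. This is only an illustration, not the wget logic from the actual setup scripts: `fetch_with_fallback` and the `fetch` callable are hypothetical names, and `fetch` is assumed to return None when a ref cannot be downloaded.

```python
from typing import Callable, List, Optional, Tuple


def candidate_refs(release: str) -> List[str]:
    """Refs to try, in order: the stable ref first, then the EOL tag."""
    return ["stable/%s" % release, "%s-eol" % release]


def fetch_with_fallback(
    release: str, fetch: Callable[[str], Optional[bytes]]
) -> Tuple[str, bytes]:
    """Return (ref, payload) for the first ref that `fetch` can download.

    `fetch` is a hypothetical callable that returns the payload, or None
    when the ref does not exist (the analogue of a failed wget).
    """
    for ref in candidate_refs(release):
        payload = fetch(ref)
        if payload is not None:
            return ref, payload
    raise LookupError("no stable or EOL ref found for %r" % release)
```

Passing the downloader in as a callable keeps the ordering logic separate from the network code, so the fallback can be exercised with a plain dict lookup.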
  2. 17 Jun, 2016 1 commit
    • Several changes to openstack slothd gatherer. · 3b8bde10
      David Johnson authored
      Added the entire epoch as a period named '__EPOCH__'
      
      Stopped collecting the cpu_util and network.(incoming,outgoing).bytes.rate
      meters for periods, and now collect 'instance' meters per period (these
      just show how many instances were on each hypervisor during the period).
      
      Stopped collecting port.(delete,create,update) events -- far too many
      and they essentially relate n-to-1 with VMs (where n is usually 1, 1
      port per VM).
      
      Handle some weird inconsistencies in resource metadata (it seemed that
      for some VMs, some of the metadata wasn't set and thus we were not
      including said VMs in the primary VM info dict -- and this seemed to
      cause problems for the javascript graph stuff).
      
      Also, if a resource/meter value isn't associated with a hostname, use
      the 'UNKNOWN' hostname instead of None/null.  For whatever reason, this
      occurs with image.update now; funny.
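
A minimal sketch of the 'UNKNOWN' hostname substitution, assuming each sample is a dict with an optional "host" key and a numeric "value" key. That record shape and the `group_by_host` name are hypothetical; only the 'UNKNOWN' placeholder comes from this commit.

```python
from collections import defaultdict

UNKNOWN_HOST = "UNKNOWN"


def group_by_host(samples):
    """Sum sample values per hostname.

    Samples whose host is missing or None are filed under 'UNKNOWN'
    rather than under a None key.
    """
    totals = defaultdict(float)
    for sample in samples:
        host = sample.get("host") or UNKNOWN_HOST
        totals[host] += sample["value"]
    return dict(totals)
```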
      
      Get rid of the __FLATTEN__ stuff; that was just a lingering turd.
  3. 09 Jun, 2016 1 commit
  4. 01 Jun, 2016 2 commits
  5. 31 May, 2016 3 commits
  6. 26 May, 2016 1 commit
  7. 21 May, 2016 2 commits
    • Reorg top level; add more resource info; add timestamp/runtime metadata. · acbf5767
      David Johnson authored
      Now the top-level keys are:
        * 'META': metadata about the collection run, so that whoever pulls
          this file back to boss doesn't have to check its ctime/mtime to
          know how stale the data is (times in GMT).
        * 'info': has keys like 'images', 'vms', 'networks', 'subnets',
          'ports', 'routers', each holding a UUID->dict where the dict has
          a 'name' field (HRN), 'status' (if the resource has status; all
          do), and 'deleted' (True or False).
        * 'periods': the periods, which were previously top-level keys.
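
As an illustration of this layout, here is a sketch that assembles such a document. The keys 'META', 'info', 'periods', 'name', 'status', and 'deleted' come from the commit message. The `collected_at` field, the UUID, the period name, and the assumption that each resource type maps UUIDs to dicts are hypothetical.

```python
import json
import time


def build_stats(info, periods, now=None):
    """Assemble the top-level document: 'META', 'info', 'periods'."""
    now = time.time() if now is None else now
    return {
        "META": {
            # Hypothetical field name; the commit only says times are GMT.
            "collected_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now)),
        },
        "info": info,
        "periods": periods,
    }


example_info = {
    "vms": {
        # Hypothetical UUID and values, for illustration only.
        "hypothetical-uuid-1": {"name": "vm-1", "status": "ACTIVE", "deleted": False},
    },
    "networks": {},
}
doc = build_stats(example_info, {"hypothetical-period": {}}, now=0)
print(json.dumps(doc, indent=2, sort_keys=True))
```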
    • Add network in/out byte rate meters, and API meters. · f77aac78
      David Johnson authored
      Openstack reports in/out byte rates for each vm and for each device
      on those VMs, but I aggregate the per-device stats into per-VM in/out
      totals.
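
The per-device to per-VM aggregation can be sketched like this, assuming each sample is a dict with "vm", "direction" ("incoming" or "outgoing"), and "rate" keys. That record shape and the `per_vm_totals` name are hypothetical, not the gatherer's real data model.

```python
from collections import defaultdict


def per_vm_totals(device_samples):
    """Sum per-device byte rates into one incoming/outgoing pair per VM."""
    totals = defaultdict(lambda: {"incoming": 0.0, "outgoing": 0.0})
    for sample in device_samples:
        totals[sample["vm"]][sample["direction"]] += sample["rate"]
    # Convert to plain dicts so the result serializes cleanly to JSON.
    return {vm: dict(rates) for vm, rates in totals.items()}
```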
      
      Currently, I'm reporting these API calls:
        * (network,subnet,port,router).(create,update,delete)
        * (image).(upload,update)
      API calls are reported by the "host" from which they were issued (I
      think); if there is no host info logged (like for images), the hostname
      is "null".
  8. 20 May, 2016 1 commit
    • A simple cpu_util statistics gatherer. · 5300a37f
      David Johnson authored
      This collects openstack cpu_util stats, grouped by hypervisor, and dumps
      them into a JSON file.  The JSON file will be written into
      /root/setup/cloudlab-openstack-stats.json . Currently it gets written
      every 2 minutes (however, openstack by default collects CPU stats only
      every 600 seconds...).
      
      The format is quite simple. It's a dict of time periods -- currently
      the last 10 minutes, last hour, last 6 hours, last day, and last
      week. Each period is also a dict, currently with two keys: vm_info and
      cpu_util. vm_info contains a dict for each physical hypervisor node, and
      that dict contains a mapping of openstack VM uuid to VM
      shortname. cpu_util also contains a dict for each physical hypervisor
      node, and that dict contains two keys: a total of the average cpu utils
      for all the VMs on that node; and a "vms" dict containing the avg cpu
      util for each VM.
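
To make the per-period structure concrete, here is a sketch that builds one period from a list of samples. The keys vm_info, cpu_util, and "vms" come from the description above. The "total" key name and the (hypervisor, uuid, name, util) tuple shape are assumptions made for illustration.

```python
def summarize_cpu_util(samples):
    """Build one period dict from (hypervisor, vm_uuid, vm_name, avg_util) tuples."""
    vm_info = {}
    cpu_util = {}
    for host, uuid, name, util in samples:
        # vm_info: hypervisor -> {VM uuid -> VM shortname}
        vm_info.setdefault(host, {})[uuid] = name
        # cpu_util: hypervisor -> {"total": sum of averages, "vms": {uuid -> avg}}
        node = cpu_util.setdefault(host, {"total": 0.0, "vms": {}})
        node["vms"][uuid] = util
        node["total"] += util
    return {"vm_info": vm_info, "cpu_util": cpu_util}
```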
  9. 17 May, 2016 1 commit
    • Fix ctl node reboot races on Liberty/Ubuntu 15.10. · 74df028a
      David Johnson authored
      Reboots of the ctl node for the Liberty version would result in
      failures to start mysql, which renders all openstack services
      inoperable.
      
      Recall that in the common case (because we have many testbeds whose
      nodes only have one expt interface), we set up the openstack mgmt lan as
      a VPN over the control net between all the nodes, served from the nm
      node.
      
      Well, mysql binds to and listens on the ip addr of the mgmt net device,
      and when the ctl node is rebooted, mysql starts long before openvpn can
      bring up the vpn client net device.  Moreover, rabbitmq would fail to
      start for the same reason, and rabbitmq is the AMQP messaging service
      that underlies all openstack RPC.
      
      For various reasons, it's not sufficient to just make the mysql
      initscript (which on 15.10 is still legacy LSB!) depend on the openvpn
      legacy LSB initscript.
      
      So I wrote a little initscript (embedded in setup-controller.sh) that
      spins in a sleep 1; loop, looking for the mgmt net to get its known IP
      from the openvpn client.  It has a reverse dependency on mysql, so it
      runs to completion before mysql starts.
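
The wait loop is a shell initscript in the real setup. The same polling idea, rendered in Python purely as a sketch, looks like the following. `wait_for_address` and `list_addresses` are hypothetical names, and the optional timeout is an addition not described in the commit.

```python
import time


def wait_for_address(expected_ip, list_addresses, interval=1.0,
                     timeout=None, sleep=time.sleep):
    """Poll until expected_ip shows up in list_addresses(); return the poll count.

    list_addresses is a hypothetical callable returning the addresses
    currently assigned to the mgmt net device. With timeout=None the loop
    spins indefinitely, like a plain `sleep 1` loop.
    """
    polls = 0
    waited = 0.0
    while True:
        polls += 1
        if expected_ip in list_addresses():
            return polls
        if timeout is not None and waited >= timeout:
            raise TimeoutError("%s did not appear within %ss" % (expected_ip, timeout))
        sleep(interval)
        waited += interval
```

Injecting `sleep` and `list_addresses` lets the loop be tested without touching real network devices or waiting in real time.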
      
      Then, we had to handle the rabbitmq case... but rabbitmq has a modern
      systemd unit file, not an LSB initscript.  So I wrote a systemd unit
      file that invokes my mgmt net LSB initscript to wait for the mgmt net
      IP... and that has a reverse dep on rabbitmq-server.service.
      
      Now all is good.  mysql and rabbitmq-server are certainly blocked for a
      few extra seconds, while the VPN comes up, but all the openstack
      services themselves are written defensively to handle RPC server
      disconnects, or database disconnects (doh).
  10. 04 May, 2016 1 commit
  11. 03 May, 2016 1 commit
  12. 21 Apr, 2016 1 commit
  13. 19 Apr, 2016 2 commits
  14. 25 Mar, 2016 1 commit
  15. 03 Mar, 2016 2 commits
  16. 27 Feb, 2016 2 commits
  17. 26 Feb, 2016 1 commit
  18. 25 Feb, 2016 3 commits
  19. 22 Feb, 2016 2 commits
  20. 19 Feb, 2016 2 commits
    • On Liberty aarch64, enable vnc using vga driver disabling usb input. · fde6f527
      David Johnson authored
      Liberty didn't seem to like the disable_vnc flag to Nova on aarch64 that
      we relied on so that images would boot.  Fortuitously, qemu/libvirt have
      been upgraded enough so that you can actually attach a VGA adapter to an
      aarch64 KVM qemu instance.  So we do that, and now we mark images with a
      specific flag that says to use the vga display driver instead of the
      'cirrus' default, which qemu/libvirt aarch64 does *not* support.
      
      Probably I should just find a way to fix the vnc disablement :).
    • 27c65db3
  21. 17 Feb, 2016 9 commits