1. 12 Sep, 2016 2 commits
  2. 09 Sep, 2016 3 commits
    • Minor bugfix. · 25525036
      David Johnson authored
    • Fix a couple params; change image strategy back to the same-name-hack... · 18ba7a10
      David Johnson authored
      ... which will just continue to work in the alias world when it arrives.
    • Refactor VM image setup; support extra images; add user import script. · 3b3a8d50
      David Johnson authored
      This cleanly refactors everything we do to VM images (setting the
      random passwd, disabling the root passwd, changing the sshd config,
      etc.).  This allows us to support adding extra images based on a
      URL/name the user provides in params, and to let users call our
      script after profile instantiation to add an image.  It's fairly
      comprehensive, and it certainly works for the common cloud images
      from the various linux vendors.
      
      It also rolls multi-nic support into each image.  We do this via
      boot-time udev scripts and dhcp hooks that avoid adding routes for
      interfaces other than eth0, so that the default gateway is always
      attached to eth0.  The old, hacky, sometimes-broken multi-nic
      support is gone, as is the special image.  I have no idea why cloud
      images don't just include this feature by default... it's not hard
      at all.
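
      The hook logic is simple; here is a minimal sketch of the idea
      (the function name and interface are hypothetical, not the
      profile's actual script):

```shell
# Decide which DHCP-offered gateway to install for a given interface.
# Only eth0 keeps its gateway, so the default route always lands there;
# every other NIC gets its gateway suppressed.
filter_routers() {
    # $1 = interface name, $2 = gateway offered by DHCP
    if [ "$1" = "eth0" ]; then
        echo "$2"
    else
        echo ""
    fi
}
```

      A dhcp client hook would call something like this and install only
      what it prints.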
      
      We support Ubuntu and Fedora/CentOS.  We support basically the
      image formats that qemu-nbd supports (i.e., qcow/qcow2, vmdk, vdi,
      raw), plus gz or xz compression.  That seems to cover the core
      spectrum.
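
      As a sketch of the compression handling, assuming the script
      dispatches on the file extension (the function name is made up):

```shell
# Print the decompressor needed for a downloaded image file, or nothing
# if the image is uncompressed; gz and xz are the supported compressions.
image_unpack_cmd() {
    case "$1" in
        *.gz) echo "gunzip" ;;
        *.xz) echo "unxz" ;;
        *)    echo "" ;;
    esac
}
```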
      
      On aarch64, we yank the kernel and initrd out of the image's /boot and
      create an AMI/AKI/ARI image tuple, instead of uploading the raw disk
      image.  I have never figured out how to boot a raw Ubuntu cloud image on
      KVM/aarch64, and the HP guys never got back to me.  So this is the only
      way I know (well, there's UEFI, and there's a UEFI aarch64 BIOS, so the
      UEFI cloud images might work... but life is way too short for all that
      fun).
  3. 19 Aug, 2016 1 commit
    • Add Mitaka; unified controller/networkmanager; Manila; linuxbridge. · 6d23a989
      David Johnson authored
      The feature notes:
      
        * Mitaka is now the default OpenStack release configured by this
          profile.  Kilo and Juno are deprecated, and we are no longer testing
          the profile's functionality under those versions (although we have
          no concrete plans to remove the code at this point).  They may
          continue to work, or they may not.  You should update to Mitaka if
          possible, of course.
      
        * The default topology is now down to two nodes: a controller (`ctl`)
          node and a compute (`cp-1`) node; the networkmanager node's
          functionality has been moved to the controller, as is the default in
          the OpenStack Ubuntu/Apt documentation.  You can return to the old
          three-node configuration by changing the name of the
          "networkmanager" node in the Advanced Parameters from `ctl` to `nm`.
      
        * One of the bigger Mitaka features is shared filesystem support
          (Manila).  We download a shared filesystem image and configure
          Manila so that you can immediately create a share and connect it to
          guests.
      
        * We have added support for the Neutron ML2 "Linuxbridge"
          driver, although we continue to install the "OpenVSwitch" ML2
          driver by default.  The Linuxbridge driver is not as
          well-tested as the OpenVSwitch driver across all possible
          configurations of this profile.  Although OpenStack has
          switched to the linuxbridge driver as its default, we have no
          plans to follow suit yet.
      
        * You can now choose an Apt mirror and set a custom mirror path if you
          require fast localized access to a mirror.
      
        * The MTU that dnsmasq pushes to your OpenStack VMs has been reduced
          from 1454 bytes to 1450 bytes.  1454 is an adequate setting for GRE
          tunnels, of course, but not for VXLAN networks, which require 1450
          on a normal physical network with 1500-byte MTU.  Somehow this
          mistake escaped prior testing.
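
      (For the record, the VXLAN arithmetic: the encapsulation adds 50
      bytes -- outer Ethernet 14 + IP 20 + UDP 8 + VXLAN header 8 -- so
      a 1500-byte physical MTU leaves 1450 for the instance.  The
      dnsmasq side amounts to a single DHCP option, e.g. in a config
      file handed to Neutron's DHCP agent; this fragment is
      illustrative, not the profile's exact file.)

```ini
# DHCP option 26 is the interface MTU; push 1450 to the instances.
dhcp-option-force=26,1450
```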
      
      A few details:
      
        * I refactored the Neutron ML2 plugin setup code, since all nodes
          have to be configured essentially the same way.  Moreover, it
          supports either openvswitch or linuxbridge.
      
        * I haven't set up Manila for aarch64 because there is no Manila
          service image available for aarch64.  I'll have to build one
          of my own.
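
      For illustration, the shared ML2 config shape that makes a single
      setup path practical looks roughly like this (values are examples,
      not the profile's exact settings):

```ini
[ml2]
tenant_network_types = vxlan
mechanism_drivers = openvswitch
# ...or, with the new support:
#mechanism_drivers = linuxbridge
```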
  4. 14 Jul, 2016 2 commits
    • Use a different mechanism to tell dpkg automatic conffile settings. · 4ace7c90
      David Johnson authored
      Now I see why I hadn't enabled the
      
        -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"
      
      directly on the apt-get command lines.  apt-get must have a bug,
      because when you specify these options in noninteractive mode (and
      without a pty, I assume, since this runs via a startup command and
      then ssh), at least one of the dpkg commands invoked by apt-get
      takes no dpkg action.
      
      So, put these two options into /etc/dpkg/dpkg.cfg.d/cloudlab, and
      then there are no problems.
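
      dpkg config files take the same options as the command line, minus
      the leading dashes, so the dropped-in file body is just (a sketch,
      assuming the path above):

```
force-confdef
force-confold
```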
      
      Of course, this means the same behavior will apply when users run
      apt-get or dpkg themselves later on.  On the one hand this is
      preferable, because then they can't possibly screw up openstack
      config files through package upgrades.  On the other hand, they
      might be fooled when upgrading some other package.
      
      Probably will just document this and call it good :).
  5. 13 Jul, 2016 3 commits
  6. 06 Jul, 2016 1 commit
    • Add a couple missing fallback-to-EOL wgets. · d74be89c
      David Johnson authored
      The openstack people rename their branches to -eol variants when a
      release goes end-of-life.  So that our tarball downloads keep
      working in perpetuity, we first try the stable/<release> tag, then
      fall back to the <release>-eol tag.  Sigh...
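
      The fallback pattern, sketched as a generic helper (the fetch
      command and ref layout here are stand-ins, not the script's real
      URLs):

```shell
# Try the stable/<release> ref first; if that fetch fails, fall back to
# the renamed <release>-eol ref.
fetch_with_eol_fallback() {
    # $1 = fetch command, $2 = release name
    "$1" "stable/$2" || "$1" "$2-eol"
}
```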
  7. 17 Jun, 2016 1 commit
    • Several changes to openstack slothd gatherer. · 3b8bde10
      David Johnson authored
      Added the entire epoch as a period named '__EPOCH__'.
      
      Stopped collecting the cpu_util and
      network.(incoming,outgoing).bytes.rate meters for periods, and now
      collect 'instance' meters per period (these just show how many
      instances were on each hypervisor during the period).
      
      Stopped collecting port.(delete,create,update) events -- far too many
      and they essentially relate n-to-1 with VMs (where n is usually 1, 1
      port per VM).
      
      Handle some weird inconsistencies in resource metadata (it seemed that
      for some VMs, some of the metadata wasn't set and thus we were not
      including said VMs in the primary VM info dict -- and this seemed to
      cause problems for the javascript graph stuff).
      
      Also, if a resource/meter value isn't associated with a hostname, use
      the 'UNKNOWN' hostname instead of None/null.  For whatever reason, this
      occurs with image.update now; funny.
      
      Get rid of the __FLATTEN__ stuff; that was just a lingering turd.
  8. 09 Jun, 2016 1 commit
  9. 01 Jun, 2016 2 commits
  10. 31 May, 2016 3 commits
  11. 26 May, 2016 1 commit
  12. 21 May, 2016 2 commits
    • Reorg top level; add more resource info; add timestamp/runtime metadata. · acbf5767
      David Johnson authored
      Now the top-level keys are: 'META' (metadata about the collection
      run, so that whoever pulls this file back to boss doesn't have to
      check its ctime/mtime to know how stale the data is; times are in
      GMT); 'info', which has keys like 'images', 'vms', 'networks',
      'subnets', 'ports', and 'routers', each a UUID->dict where the
      dict has a 'name' field (HRN), a 'status' field (if the resource
      has status; all do), and a 'deleted' field (True or False); and
      'periods', a dict whose keys are the periods that were previously
      top-level keys.
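
      Putting that together, the file now has roughly this shape (the
      key names inside META and all placeholder values are illustrative):

```json
{
  "META": { "collected_at": "<GMT time>", "runtime": "<seconds>" },
  "info": {
    "vms": {
      "<uuid>": { "name": "<HRN>", "status": "ACTIVE", "deleted": false }
    },
    "images": {}, "networks": {}, "subnets": {}, "ports": {}, "routers": {}
  },
  "periods": { "<period-name>": {} }
}
```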
    • Add network in/out byte rate meters, and API meters. · f77aac78
      David Johnson authored
      Openstack reports in/out byte rates for each vm and for each device
      on those VMs, but I aggregate the per-device stats into per-VM in/out
      totals.
      
      Currently, I'm reporting these API calls:
        * (network,subnet,port,router).(create,update,delete)
        * (image).(upload,update)
      API calls are reported by the "host" from which they were issued
      (I think); if there is no host info logged (as for images), the
      hostname is "null".
  13. 20 May, 2016 1 commit
    • A simple cpu_util statistics gatherer. · 5300a37f
      David Johnson authored
      This collects openstack cpu_util stats, grouped by hypervisor, and
      dumps them into a JSON file, written to
      /root/setup/cloudlab-openstack-stats.json.  Currently it gets
      written every 2 minutes (however, openstack by default collects
      CPU stats only every 600 seconds...).
      
      The format is quite simple.  It's a dict of time periods --
      currently the last 10 minutes, last hour, last 6 hours, last day,
      and last week.  Each period is also a dict, currently with two
      keys: vm_info and cpu_util.  vm_info contains a dict for each
      physical hypervisor node, and that dict maps each openstack VM
      uuid to its VM shortname.  cpu_util also contains a dict for each
      physical hypervisor node, and that dict contains two keys: a
      total of the average cpu utils for all the VMs on that node, and
      a "vms" dict containing the avg cpu util for each VM.
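
      A sketch of that structure (period and node names here are
      placeholders):

```json
{
  "<period>": {
    "vm_info": {
      "<hypervisor-node>": { "<vm-uuid>": "<vm-shortname>" }
    },
    "cpu_util": {
      "<hypervisor-node>": {
        "total": 42.0,
        "vms": { "<vm-uuid>": 42.0 }
      }
    }
  }
}
```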
  14. 17 May, 2016 1 commit
    • Fix ctl node reboot races on Liberty/Ubuntu 15.10. · 74df028a
      David Johnson authored
      Reboots of the ctl node for the Liberty version would result in
      failures to start up mysql, which renders all openstack services
      inoperable.
      
      Recall that in the common case (because we have many testbeds
      whose nodes have only one expt interface), we set up the openstack
      mgmt lan as a VPN over the control net between all the nodes,
      served from the nm node.
      
      Well, mysql binds to and listens on the ip addr of the mgmt net device,
      and when the ctl node is rebooted, mysql starts long before openvpn can
      bring up the vpn client net device.  Moreover, rabbitmq would fail to
      start for the same reason, and rabbitmq is the AMQP messaging service
      that underlies all openstack RPC.
      
      For various reasons, it's not sufficient to just make the mysql
      initscript (which on 15.10 is still legacy LSB!) depend on the openvpn
      legacy LSB initscript.
      
      So I wrote a little initscript (embedded in setup-controller.sh)
      that spins in a `sleep 1` loop, waiting for the mgmt net to get
      its known IP from the openvpn client.  It has a reverse dependency
      on mysql, so it runs to completion before mysql starts.
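
      The loop itself is trivial; a sketch with the polling factored out
      (names are hypothetical -- the real script watches the openvpn
      client device for the node's known mgmt IP):

```shell
# Spin in a 'sleep 1' loop until the given address-listing command
# reports the expected mgmt-net IP.
wait_for_mgmt_ip() {
    # $1 = command that prints current addresses, $2 = IP to wait for
    while ! "$1" | grep -q "$2"; do
        sleep 1
    done
}
```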
      
      Then, we had to handle the rabbitmq case... but rabbitmq has a modern
      systemd unit file, not an LSB initscript.  So I wrote a systemd unit
      file that invokes my mgmt net LSB initscript to wait for the mgmt net
      IP... and that has a reverse dep on rabbitmq-server.service.
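
      The unit might look roughly like this (unit and script names are
      made up; the key parts are Before= plus the reverse dependency
      installed via RequiredBy=):

```ini
[Unit]
Description=Wait for the mgmt net VPN address
Before=rabbitmq-server.service

[Service]
Type=oneshot
ExecStart=/etc/init.d/mgmt-net-wait start
RemainAfterExit=yes

[Install]
RequiredBy=rabbitmq-server.service
```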
      
      Now all is good.  mysql and rabbitmq-server are certainly blocked
      for a few extra seconds while the VPN comes up, but the openstack
      services themselves are all written defensively to handle RPC
      server or database disconnects (doh).
  15. 04 May, 2016 1 commit
  16. 03 May, 2016 1 commit
  17. 21 Apr, 2016 1 commit
  18. 19 Apr, 2016 2 commits
  19. 25 Mar, 2016 1 commit
  20. 03 Mar, 2016 2 commits
  21. 27 Feb, 2016 2 commits
  22. 26 Feb, 2016 1 commit
  23. 25 Feb, 2016 3 commits
  24. 22 Feb, 2016 2 commits