- 31 May, 2016 (2 commits)
David Johnson authored
David Johnson authored
- 26 May, 2016 (1 commit)
David Johnson authored
Also add a version field to the META top-level key; currently 1.
- 21 May, 2016 (2 commits)
David Johnson authored
Now the top-level keys are: 'META' (metadata about the collection run, with times in GMT, so that whoever pulls this file back to boss doesn't have to check its ctime/mtime to know how stale the data is); 'info', which has keys like 'images', 'vms', 'networks', 'subnets', 'ports', and 'routers', each mapping a UUID to a dict with a 'name' field (HRN), a 'status' field (if the resource has status; all do), and a 'deleted' field (True or False); and 'periods', so the periods that were previously top-level keys are now keys in the 'periods' top-level dict.
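A minimal sketch of that layout as JSON (the "vm-uuid-1" key, the name, and the "last_hour" period are made-up placeholders; only the key structure comes from the description above):

```shell
#!/bin/sh
# Skeleton of the stats file described above; "vm-uuid-1" and the
# "last_hour" period key are illustrative placeholders.
cat <<'EOF' > /tmp/stats-skeleton.json
{
  "META": { "version": 1 },
  "info": {
    "images": {},
    "vms": {
      "vm-uuid-1": { "name": "ctl.myexpt.myproj", "status": "ACTIVE", "deleted": false }
    },
    "networks": {}, "subnets": {}, "ports": {}, "routers": {}
  },
  "periods": { "last_hour": {} }
}
EOF
# Sanity-check that the skeleton is well-formed JSON.
python3 -m json.tool < /tmp/stats-skeleton.json > /dev/null && echo "well-formed"
```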
David Johnson authored
OpenStack reports in/out byte rates for each VM and for each device on those VMs, but I aggregate the per-device stats into per-VM in/out totals. Currently, I'm reporting these API calls:
* (network,subnet,port,router).(create,update,delete)
* (image).(upload,update)
API calls are reported with the "host" from which they were issued (I think); if there is no host info logged (like for images), the hostname is "null".
- 20 May, 2016 (1 commit)
David Johnson authored
This collects openstack cpu_util stats, grouped by hypervisor, and dumps them into a JSON file, written to /root/setup/cloudlab-openstack-stats.json . Currently it gets written every 2 minutes (however, openstack by default collects CPU stats only every 600 seconds...).

The format is quite simple. It's a dict of time periods -- currently the last 10 minutes, last hour, last 6 hours, last day, and last week. Each period is also a dict, currently with two keys: vm_info and cpu_util. vm_info contains a dict for each physical hypervisor node, mapping each openstack VM UUID to the VM's shortname. cpu_util also contains a dict for each physical hypervisor node, with two keys: a total of the average cpu utils for all the VMs on that node, and a "vms" dict containing the avg cpu util for each VM.
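The per-node aggregation in the cpu_util dict can be sketched like this (the VM names and utilization numbers are fabricated; the real collector pulls the averages from OpenStack):

```shell
#!/bin/sh
# Fabricated per-VM average cpu_util samples for one hypervisor node.
cat <<'EOF' > /tmp/vm-utils.txt
vm-1 12.5
vm-2 30.0
vm-3 7.5
EOF
# Emit a "vms" entry per VM plus the node "total" (sum of the per-VM
# averages), mirroring the two keys of the cpu_util dict above.
awk '{ total += $2; printf "vms[%s] = %s\n", $1, $2 }
     END { printf "total = %s\n", total }' /tmp/vm-utils.txt
```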
- 17 May, 2016 (1 commit)
David Johnson authored
Reboots of the ctl node for the Liberty version would result in failures to start mysql, and this renders all openstack services inoperable. Recall that in the common case (because we have many testbeds whose nodes only have one expt interface), we set up the openstack mgmt lan as a VPN over the control net between all the nodes, served from the nm node. Well, mysql binds to and listens on the IP addr of the mgmt net device, and when the ctl node is rebooted, mysql starts long before openvpn can bring up the VPN client net device. Moreover, rabbitmq would fail to start for the same reason, and rabbitmq is the AMQP messaging service that underlies all openstack RPC.

For various reasons, it's not sufficient to just make the mysql initscript (which on 15.10 is still legacy LSB!) depend on the openvpn legacy LSB initscript. So I wrote a little initscript (embedded in setup-controller.sh) that spins in a sleep 1; loop, looking for the mgmt net to get its known IP from the openvpn client. It has a reverse dependency on mysql, so it runs to completion before mysql starts. Then we had to handle the rabbitmq case... but rabbitmq has a modern systemd unit file, not an LSB initscript. So I wrote a systemd unit file that invokes my mgmt net LSB initscript to wait for the mgmt net IP... and that has a reverse dep on rabbitmq-server.service.

Now all is good. mysql and rabbitmq-server are certainly blocked for a few extra seconds while the VPN comes up, but all the openstack services themselves are written defensively to handle RPC server disconnects, or database disconnects (doh).
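The waiting initscript's core can be sketched as follows (the device name and address are made-up; the real script is embedded in setup-controller.sh):

```shell
#!/bin/sh
# Succeeds once the given device holds the given IPv4 address.
have_mgmt_ip() {
    ip -4 addr show dev "$1" 2>/dev/null | grep -q "inet $2/"
}

# The initscript spins like this until the openvpn client brings up the
# mgmt net device; its reverse deps make mysql and rabbitmq-server wait
# for it to finish.  "tun0" and the address are illustrative:
#
#   while ! have_mgmt_ip tun0 192.168.254.1; do sleep 1; done
#
# Sanity check the helper against the loopback device:
have_mgmt_ip lo 127.0.0.1 && echo "loopback up"
```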
- 04 May, 2016 (1 commit)
David Johnson authored
- 03 May, 2016 (1 commit)
David Johnson authored
- 21 Apr, 2016 (1 commit)
David Johnson authored
(I'm not sure why migrate isn't working... the nodes try to migrate but don't complete the journey :).)
- 19 Apr, 2016 (2 commits)
David Johnson authored
User has to include a tarball containing a single dir, which in turn contains a script called setup.sh . This tarball must be installed into /tmp/setup/ext . Minimal, but who cares; it works for now.
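A sketch of building a conforming tarball (the "myext" directory name and the script contents are arbitrary examples):

```shell
#!/bin/sh
# Build an extension tarball as described above: one top-level dir
# containing setup.sh.  The dir name "myext" is arbitrary.
mkdir -p /tmp/myext-build/myext
cat <<'EOF' > /tmp/myext-build/myext/setup.sh
#!/bin/sh
echo "running extra setup"
EOF
chmod +x /tmp/myext-build/myext/setup.sh
tar -C /tmp/myext-build -czf /tmp/myext.tar.gz myext
# On the node, the tarball gets unpacked under /tmp/setup/ext .
tar -tzf /tmp/myext.tar.gz
```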
David Johnson authored
(This was pushed into the online version of the profile late last week.)
- 25 Mar, 2016 (1 commit)
David Johnson authored
- 03 Mar, 2016 (2 commits)
David Johnson authored
David Johnson authored
- 27 Feb, 2016 (2 commits)
David Johnson authored
(This var just had to be defined for the blockstore case.)
David Johnson authored
- 26 Feb, 2016 (1 commit)
David Johnson authored
- 25 Feb, 2016 (3 commits)
David Johnson authored
David Johnson authored
David Johnson authored
This has support for generating arguments to AddNodes() and DeleteNodes(), which means re-reading the current manifest to know what the original parameters were, so that the appropriate args (like which image, which lans to join, etc) go to AddNodes().
- 22 Feb, 2016 (2 commits)
David Johnson authored
This was to try to solve the performance problem; no luck here.
David Johnson authored
- 19 Feb, 2016 (2 commits)
David Johnson authored
Liberty didn't seem to like the disable_vnc flag to Nova on aarch64 that we relied on so that images would boot. Fortuitously, qemu/libvirt have been upgraded enough that you can actually attach a VGA adapter to an aarch64 KVM qemu instance. So we do that, and now we mark images with a specific flag that says to use the vga display driver instead of the 'cirrus' default, which qemu/libvirt on aarch64 does *not* support. Probably I should just find a way to fix the vnc disablement :).
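For reference, stock Glance exposes the guest display driver as an image property; a hedged sketch of such a flag (whether this commit uses exactly this property is an assumption, and IMAGE-UUID is a placeholder):

```shell
# Tell nova/libvirt to attach a vga display model instead of the
# default cirrus one for instances booted from this image.
glance image-update --property hw_video_model=vga IMAGE-UUID
```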
David Johnson authored
- 17 Feb, 2016 (9 commits)
David Johnson authored
David Johnson authored
David Johnson authored
Apparently oslo.service handles stop signals (like those from systemd) in a bad way. When systemd stops a service, it can drive the service into an exception (in one of its threads, I think), and I assume systemd then waits until its timeout and kills it more firmly. This patch went into oslo.service 1.2.0 or so, but of course we're stuck back in the stone age still, around 0.9 or whatever. So apply the patch, stupid. We start and stop services a lot as we're setting up.
David Johnson authored
David Johnson authored
Memcache + Keystone + WSGI/Apache seems to cause a problem where Keystone is effectively unavailable (internal errors) for about a minute... then it comes back by itself. So we disable it by default. The docs default to using it, but this is far from the first time the doc defaults trigger bugs or are simply bad configuration!
David Johnson authored
David Johnson authored
David Johnson authored
David Johnson authored
- 16 Feb, 2016 (1 commit)
David Johnson authored
Add Liberty support. Add keystone v3 support. Now you can choose which version of keystone to run... all combinations tested except Juno with v3.

Make node type and link speed configurable. Make token and session timeouts much longer by default (so people don't get logged out so quickly), but also configurable.

Keystone is now served by WSGI through Apache on Kilo and Liberty. Memcached keystone token caching is disabled for now; it causes intermittent problems; so using SQL for now.

Add localhost to /etc/hosts file. This doesn't cause problems anymore, if it ever did.

We now use the `openstack' CLI command for >= Kilo, instead of the per-service client CLI tools. Stick with the ovs agent even in Liberty -- even though the default is now linuxbridge, it seems.

In general, get rid of nearly all the rest of the cat <<EOF ... EOF stuff and replace it with crudini --set/--del. A touch slower, but much cleaner. Also in general, improve the Kilo support so that it more closely matches the docs.
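The heredoc-to-crudini switch looks roughly like this (the options shown are real keystone/nova option names used as illustrations, not necessarily ones this commit touches):

```shell
# Before: append whole config sections with a heredoc.
# cat >> /etc/keystone/keystone.conf <<EOF
# [token]
# expiration = 14400
# EOF

# After: set or delete individual options idempotently.
crudini --set /etc/keystone/keystone.conf token expiration 14400
crudini --del /etc/nova/nova.conf DEFAULT verbose
```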
- 01 Feb, 2016 (2 commits)
David Johnson authored
David Johnson authored
If you instantiate a portal expt on Emulab (where you might have a real account), the swapper is you, not geniuser. So, check geniuser via geni-get slice_urn success/failure.
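That check can be sketched as follows (the printed strings are illustrative; geni-get is the client tool available inside portal-instantiated experiments):

```shell
#!/bin/sh
# "geni-get slice_urn" only succeeds inside a portal-instantiated
# experiment; on a classic Emulab swap-in it fails, so the swapper is
# the real user rather than geniuser.
if slice_urn=$(geni-get slice_urn 2>/dev/null) && [ -n "$slice_urn" ]; then
    echo "geniuser"
else
    echo "real-user"
fi
```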
- 23 Dec, 2015 (2 commits)
David Johnson authored
Also, adds a geni-lib script that generates an rspec instead of printing it (although print still works at portal) and generates input for CM::AddNodes() when requested. This generator is stateful; it tries to avoid generating new nodes with previously-used IPs or client_ids; thus it is a separate object. It is designed so that it can be imported into a script, and the importing script can look for special DYNSLICE_GENERATOR variables to use its rspec foo to create a slice and add nodes in some semantic way.
David Johnson authored
- 21 Dec, 2015 (1 commit)
David Johnson authored
We want to use only local AM tmcd info in this case...