    Fix ctl node reboot races on Liberty/Ubuntu 15.10. · 74df028a
    David Johnson authored
    Rebooting the ctl node on the Liberty version would make mysql fail to
    start up, and that renders all openstack services inoperable.
    
    Recall that in the common case (because many of our testbeds have
    nodes with only one expt interface), we set up the openstack mgmt LAN
    as a VPN over the control net between all the nodes, served from the
    nm node.
    
    Well, mysql binds to and listens on the IP address of the mgmt net
    device, and when the ctl node is rebooted, mysql starts long before
    openvpn can bring up the VPN client net device.  rabbitmq, the AMQP
    messaging service that underlies all openstack RPC, would fail to
    start for the same reason.
    
    For various reasons, it's not sufficient to just make the mysql
    initscript (which on 15.10 is still legacy LSB!) depend on the openvpn
    legacy LSB initscript.
    
    So I wrote a little initscript (embedded in setup-controller.sh) that
    spins in a 'sleep 1' loop, waiting for the mgmt net device to get its
    known IP from the openvpn client.  It declares a reverse dependency on
    mysql (it must start before mysql), so it runs to completion before
    mysql starts.
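
A minimal sketch of what such a wait script could look like, assuming Debian/Ubuntu insserv-style LSB header keywords: X-Start-Before: mysql expresses the reverse dependency, and systemd's SysV generator turns it into a Before= ordering. The script name mgmt-net-wait, the MGMT_IP and MGMT_TIMEOUT variables, the has_ipv4_addr/wait_for_mgmt_ip helpers, and the placeholder address 192.0.2.10 are all hypothetical, not taken from setup-controller.sh. The optional timeout is also an addition; its default of 0 keeps the unbounded loop described above.

```shell
#!/bin/sh
### BEGIN INIT INFO
# Provides:          mgmt-net-wait
# Required-Start:    $network
# Required-Stop:
# Should-Start:      openvpn
# X-Start-Before:    mysql
# Default-Start:     2 3 4 5
# Default-Stop:
# Short-Description: Wait until the mgmt net VPN device has its IP
### END INIT INFO

# Hypothetical placeholders: setup-controller.sh would substitute the ctl
# node's real mgmt net IP.  MGMT_TIMEOUT=0 means wait forever.
MGMT_IP="${MGMT_IP:-192.0.2.10}"
MGMT_TIMEOUT="${MGMT_TIMEOUT:-0}"

# Succeeds if the `ip -o -4 addr show` output on stdin lists address $1.
# Handles both "inet A.B.C.D/NN" and point-to-point "inet A.B.C.D peer ...".
has_ipv4_addr() {
    awk -v want="$1" '
        { for (i = 1; i < NF; i++)
              if ($i == "inet") { split($(i + 1), a, "/"); if (a[1] == want) found = 1 } }
        END { exit !found }'
}

# Poll once per second until $1 is configured on some interface.
# If $2 > 0, give up (status 1) after $2 seconds.
wait_for_mgmt_ip() {
    waited=0
    until ip -o -4 addr show | has_ipv4_addr "$1"; do
        if [ "$2" -gt 0 ] && [ "$waited" -ge "$2" ]; then
            echo "mgmt-net-wait: $1 not configured after ${2}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

case "${1:-}" in
    start|restart|force-reload)
        wait_for_mgmt_ip "$MGMT_IP" "$MGMT_TIMEOUT" || exit 1
        ;;
    stop)
        exit 0
        ;;
    status)
        # LSB status codes: 0 = "running" (IP present), 3 = not running.
        ip -o -4 addr show | has_ipv4_addr "$MGMT_IP" && exit 0
        exit 3
        ;;
    "")
        # No action given (e.g. the file is sourced): only define functions.
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|force-reload|status}" >&2
        exit 3
        ;;
esac
```

Because systemd runs a generated SysV unit's start action to completion before starting units ordered after it, mysql only starts once 'start' returns.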
    
    Then we had to handle the rabbitmq case... but rabbitmq has a modern
    systemd unit file, not an LSB initscript.  So I wrote a systemd unit
    file that invokes my mgmt net LSB initscript to wait for the mgmt net
    IP, and that unit has a reverse dependency on rabbitmq-server.service.
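
A matching sketch of the systemd side, written as a shell function in the spirit of embedding it in setup-controller.sh. The unit name mgmt-net-wait-rabbitmq.service, the /etc/init.d/mgmt-net-wait path, and the write_mgmt_wait_unit helper are hypothetical. Before=rabbitmq-server.service orders the wait first, Type=oneshot makes systemd wait for ExecStart to exit before starting rabbitmq-server, and WantedBy= in [Install] supplies the reverse dependency so that rabbitmq-server pulls the unit in.

```shell
#!/bin/sh
# Sketch: generate a oneshot systemd unit that reuses the LSB wait script
# so rabbitmq-server.service starts only after the mgmt net IP is up.
# Hypothetical names: MGMT_WAIT_UNIT and /etc/init.d/mgmt-net-wait.

MGMT_WAIT_UNIT="mgmt-net-wait-rabbitmq.service"

# Write the unit file into directory $1 (normally /etc/systemd/system).
write_mgmt_wait_unit() {
    cat > "$1/$MGMT_WAIT_UNIT" <<'EOF'
[Unit]
Description=Wait for the mgmt net IP before rabbitmq-server
After=network.target
Before=rabbitmq-server.service

[Service]
# oneshot: units ordered after this one start only once ExecStart exits.
Type=oneshot
RemainAfterExit=yes
ExecStart=/etc/init.d/mgmt-net-wait start

[Install]
# Reverse dependency: enabling this makes rabbitmq-server pull it in.
WantedBy=rabbitmq-server.service
EOF
}

# Install only when explicitly asked (e.g. run as root by the setup script).
if [ "${1:-}" = "install" ]; then
    write_mgmt_wait_unit /etc/systemd/system
    systemctl daemon-reload
    systemctl enable "$MGMT_WAIT_UNIT"
fi
```

WantedBy= is a deliberately soft choice here: a failed wait does not by itself veto rabbitmq-server. RequiredBy= would be the stricter alternative.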
    
    Now all is good.  mysql and rabbitmq-server are admittedly blocked for
    a few extra seconds while the VPN comes up, but that is fine: the
    openstack services themselves are written defensively to handle RPC
    server or database disconnects (doh).