    Best effort to decrease dhclient latency on Ubuntu 16. · 5bc94e54
    David Johnson authored
    (AFAIK, nothing has changed about dhclient, etc.  I just noticed that on
    the Emulab d820s (which have BCM5720s, tg3 driver), the driver takes
    anywhere from 7-9 seconds simply to init the card and autoneg with the
    switch (I've seen worse times, too, e.g. 19 and 29 seconds!).)
    
    dhclient is happy to start sending requests on interfaces that have no
    carrier (gee, did it ever seem like a good idea to make that behavior
    optional???).  Thus, if we get stuck with a control net NIC that has a
    horribly long init/autoneg time, dhclient is far into its backoff
    strategy on the control net, when it doesn't need to be!  In addition to
    slow media negotiation, automatic STP features can further delay the
    forwarding state of a switch port (like the ProCurve "auto-edge" port
    setting, which makes the switch wait 3 seconds after media negotiation
    for a BPDU).  So we have to be a little smarter about
    bringing up the control net via DHCP.
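
    To illustrate why a late link hurts, here is a rough Python sketch of a
    client that starts transmitting at time 0 and backs off while the port
    is still unusable.  It is a deliberately simplified, deterministic
    doubling backoff; the real dhclient adds randomness, and the function
    names and the interval/cutoff values passed in are hypothetical, not
    taken from dhclient itself.

```python
def transmit_times(initial_interval, cutoff, horizon):
    """Send times (seconds since start) under a simplified doubling backoff.

    Assumption: the interval doubles after every send and is capped at
    `cutoff`. Real dhclient randomizes its intervals.
    """
    times, t, interval = [], 0.0, float(initial_interval)
    while t <= horizon:
        times.append(t)
        t += interval
        interval = min(interval * 2, cutoff)
    return times


def wasted_wait(link_ready_at, initial_interval, cutoff, horizon=120.0):
    """Seconds between the port becoming usable and the next request sent.

    Returns None if no send falls within the horizon.
    """
    for t in transmit_times(initial_interval, cutoff, horizon):
        if t >= link_ready_at:
            return t - link_ready_at
    return None


if __name__ == "__main__":
    # Link-ready times: autoneg delays from the text plus a 3-second STP wait.
    for ready in (9 + 3, 19 + 3, 29 + 3):
        print(ready, wasted_wait(ready, initial_interval=4, cutoff=64))
```

    The later the port starts forwarding, the longer the gap to the next
    request tends to be, because the client has already backed off.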
    
    So to combat these possible scenarios, we try two main things.
    
    First, we modify findcnet to wait for one of two things to be true
    before we start dhclient at all (or until a 6-second timeout is
    reached): 1) if we have a previous control net device in
    /var/lib/dhcp/dhclient.leases, we wait for that to come up; or 2) if we
    don't have a previous control net device (i.e. first boot of an image),
    we wait for at least one device to obtain carrier.  We could increase
    the 6-second timeout, but this should be good for now.
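
    The wait logic above can be sketched as follows.  This is not the
    actual findcnet code, just a minimal Python illustration.  It assumes
    that "come up" means sysfs reports carrier in
    /sys/class/net/<iface>/carrier, that the most recent "interface" line
    in the leases file names the previous control net device, and that a
    0.25-second poll is acceptable; the function names are hypothetical.

```python
import os
import re
import time

LEASES = "/var/lib/dhcp/dhclient.leases"  # path named in the commit message
SYSFS_NET = "/sys/class/net"              # standard Linux sysfs location


def previous_cnet(leases_path=LEASES):
    """Return the interface named in the most recent lease, or None."""
    try:
        with open(leases_path) as f:
            names = re.findall(r'^\s*interface\s+"([^"]+)";', f.read(), re.M)
    except OSError:
        return None
    return names[-1] if names else None


def has_carrier(iface, sysfs_net=SYSFS_NET):
    """True if sysfs reports carrier; unreadable or missing counts as False."""
    try:
        with open(os.path.join(sysfs_net, iface, "carrier")) as f:
            return f.read().strip() == "1"
    except OSError:
        return False


def wait_for_cnet(timeout=6.0, poll=0.25, leases_path=LEASES,
                  sysfs_net=SYSFS_NET):
    """Wait for the previous cnet device (or any device) to get carrier.

    Returns True if the condition was met, False if the timeout expired.
    Either way the caller would go on to start dhclient.
    """
    prev = previous_cnet(leases_path)
    deadline = time.monotonic() + timeout
    while True:
        if prev is not None:
            ready = has_carrier(prev, sysfs_net)
        else:
            candidates = [i for i in os.listdir(sysfs_net) if i != "lo"]
            ready = any(has_carrier(i, sysfs_net) for i in candidates)
        if ready:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

    Returning False on timeout rather than failing keeps the behavior
    best-effort: dhclient still starts, just without the head start.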
    
    Second, we set initial-delay and initial-interval both to 3 seconds in
    dhclient.conf; hopefully this gives STP protection schemes a chance to
    settle by the time dhclient makes its first retransmit.
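
    In dhclient.conf terms, the two settings would look roughly like the
    fragment below.  This is an assumed form based on the statement names
    given above; the rest of the file is not shown in this commit message.

```
# Assumed form of the two statements described in the commit message
initial-delay 3;
initial-interval 3;
```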
    
    I tried adding a forced 'ifconfig <X> up' to the udev interface handler
    script, just to try to kick the device into autoneg mode ASAP, but of
    course that didn't help anything.
    
    I cannot improve on this unless we move to a split, managed dhclient
    scheme, where we actually run a dhclient for each interface, and control
    the backoff time much more tightly.  For now, I don't want to do either
    of these things.