Commit 5bc94e54 authored by David Johnson

Best effort to decrease dhclient latency on Ubuntu 16.

(AFAIK, nothing has changed about dhclient itself.  I just noticed that on
the Emulab d820s (which have BCM5720 NICs and use the tg3 driver), the
driver takes anywhere from 7 to 9 seconds just to initialize the card and
autonegotiate with the switch; I've seen worse times, too, e.g. 19 and
29 seconds.)

dhclient is happy to start sending requests on interfaces that have no
carrier (gee, did it never seem like a good idea to make that behavior
optional?).  Thus, if we get stuck with a control net NIC that has a
horribly long init/autoneg time, dhclient is already far into its
backoff strategy on the control net by the time the link comes up, when
it doesn't need to be.  In addition to slow media negotiation, automatic
STP features can further delay the point at which a switch port starts
forwarding (like the ProCurve "auto-edge" port setting, which makes the
switch wait 3 seconds after media negotiation for a BPDU).  So we have
to be a little smarter about bringing up the control net via DHCP.

To combat these scenarios, we try two main things.

First, we modify findcnet to wait, before starting dhclient at all,
until one of two things is true (or until a 6-second timeout expires):
1) if /var/lib/dhcp/dhclient.leases names a previous control net
device, that device has come up (has carrier); or 2) if there is no
previous control net device (i.e. first boot of an image), at least one
device has obtained carrier.  The loop also stops early if every
candidate interface already has carrier.  We could increase the
6-second timeout, but this should be good enough for now.
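The wait policy can be sketched as two standalone helpers: one that extracts previous control net devices from a dhclient leases file, and one that decides whether findcnet may stop waiting. This is a hypothetical restatement of the logic, not the committed code. One deliberate difference: the committed script pipes through `uniq`, which only drops adjacent duplicates (a leases file listing eth1, eth0, eth1 yields `eth1 eth0 eth1`); that is harmless for a membership test, but `sort -u` gives a clean list. The sketch also uses POSIX `sed` syntax instead of GNU `sed -r`.

```shell
#!/bin/sh
# Hypothetical helpers mirroring findcnet's wait policy (a sketch only).

# prev_cnet_devs LEASEFILE: print the distinct interface names recorded in
# a dhclient leases file ("interface "eth0";" lines), space separated.
prev_cnet_devs() {
    [ -f "$1" ] || return 0
    sed -n -e 's/^[[:space:]]*interface[[:space:]]*"\([^"]*\)".*$/\1/p' "$1" \
        | sort -u | xargs
}

# cnet_wait_done "PREV_DEVS" "UP_IFACES" "ALL_IFACES": succeed if waiting
# can stop because
#   - a previous control net device has carrier, or
#   - there is no previous device (first boot) and some interface is up, or
#   - every candidate interface is up.
cnet_wait_done() {
    _prev=$1 _up=$2 _all=$3
    for _p in $_prev; do
        for _u in $_up; do
            [ "$_p" = "$_u" ] && return 0
        done
    done
    if [ -z "$_prev" ] && [ -n "$_up" ]; then
        return 0
    fi
    # Count words by hand rather than with wc, whose padding varies by OS.
    _nup=0
    for _u in $_up; do _nup=$((_nup + 1)); done
    _nall=0
    for _a in $_all; do _nall=$((_nall + 1)); done
    [ "$_nup" -eq "$_nall" ]
}
```

In findcnet the equivalent check runs once per second, for at most `MAXUPWAITTIME` iterations, with the up list recomputed each time.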

Second, we set initial-delay and initial-interval both to 3 seconds in
dhclient.conf (initial-delay caps the random wait before the first
transmission; initial-interval is the time between the first and second
attempts).  Hopefully this gives STP protection schemes a chance to
settle by the time dhclient makes its first retransmit.
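To see why the interval matters, here is a rough model of the retransmission schedule as the dhclient.conf(5) man page describes it: the second attempt comes initial-interval seconds after the first (10 seconds by default), each later interval grows by twice the current interval times a random number between 0 and 1, and the interval is clamped at backoff-cutoff (15 seconds by default). The sketch is hypothetical: it replaces the random number with a fixed percentage so the schedule is reproducible, and it ignores initial-delay and the extra randomization dhclient applies to the cutoff, so treat its numbers as illustrative only.

```shell
#!/bin/sh
# send_times INITIAL_INTERVAL BACKOFF_CUTOFF COUNT PCT
# Print the offsets (seconds after the first attempt) of the first COUNT
# DHCP transmissions, using a fixed factor PCT/100 in place of dhclient's
# random number.  A simplified model, not dhclient's actual code.
send_times() {
    _interval=$1 _cutoff=$2 _count=$3 _pct=$4
    _t=0 _n=0 _out=
    while [ "$_n" -lt "$_count" ]; do
        _out="$_out $_t"
        _t=$((_t + _interval))
        # Grow the interval by 2 * interval * factor, then clamp it.
        _interval=$((_interval + 2 * _interval * _pct / 100))
        if [ "$_interval" -gt "$_cutoff" ]; then
            _interval=$_cutoff
        fi
        _n=$((_n + 1))
    done
    # Unquoted on purpose: drops the leading space.
    echo $_out
}
```

With the defaults and the worst-case factor, `send_times 10 15 4 100` prints `0 10 25 40`; with `initial-interval 3`, `send_times 3 15 4 100` prints `0 3 12 27`.  In this model, a port that needs a few extra seconds (e.g. the 3-second ProCurve auto-edge wait) gets its first retry 3 seconds after the first attempt rather than 10.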

I tried adding a forced 'ifconfig <X> up' to the udev interface handler
script, just to kick the device into autoneg mode ASAP, but, of course,
that didn't help.

I don't think this can be improved further without moving to a split,
managed dhclient scheme, where we run one dhclient per interface and
control the backoff time much more tightly.  For now, I don't want to go
that far.
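For reference, the split scheme the commit rules out might start like the following hypothetical dry-run sketch: one dhclient per interface, each with its own config, PID, and lease file, so that backoff settings could be tuned per interface. `-1`, `-q`, `-cf`, `-pf`, and `-lf` are standard ISC dhclient options; the function name and the per-interface file paths are made up for illustration, and the function only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical: print one ISC dhclient invocation per interface.
# -1 tries for a lease once and exits non-zero on failure; -cf/-pf/-lf give
# each instance its own config, PID, and lease file.  The paths are
# illustrative only, not files this commit installs.
per_iface_dhclient_cmds() {
    for _if in "$@"; do
        echo "/sbin/dhclient -1 -q" \
            "-cf /etc/dhcp/dhclient-$_if.conf" \
            "-pf /run/dhclient-$_if.pid" \
            "-lf /var/lib/dhcp/dhclient-$_if.leases $_if"
    done
}
```

A managing script would still have to launch these in parallel, notice the first instance that gets a lease, and kill the others, which is the bookkeeping the commit chooses to avoid for now.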
parent 5b941237
......@@ -106,6 +106,8 @@ sysetc-fixup:
rm -f $(SYSETCDIR)/rc.local
sysetc-install: dir-install
$(INSTALL) -m 755 $(SRCDIR)/dhclient.conf \
$(SYSETCDIR)/dhcp/dhclient.conf
$(INSTALL) -m 755 $(SRCDIR)/dhclient-exit-hooks \
$(SYSETCDIR)/dhcp/dhclient-exit-hooks.d/emulab
$(INSTALL) -m 755 $(SRCDIR)/dhclient-enter-hooks \
......
# Configuration file for /sbin/dhclient.
#
# This is a sample configuration file for dhclient. See dhclient.conf's
# man page for more information about the syntax of this file
# and a more comprehensive list of the parameters understood by
# dhclient.
#
# Normally, if the DHCP server provides reasonable information and does
# not leave anything out (like the domain name, for example), then
# few changes must be made to this file, if any.
#
option rfc3442-classless-static-routes code 121 = array of unsigned integer 8;
send host-name = gethostname();
request subnet-mask, broadcast-address, time-offset, routers,
domain-name, domain-name-servers, domain-search, host-name,
dhcp6.name-servers, dhcp6.domain-search, dhcp6.fqdn, dhcp6.sntp-servers,
netbios-name-servers, netbios-scope, interface-mtu,
rfc3442-classless-static-routes, ntp-servers;
timeout 300;
#retry 60;
#reboot 10;
#select-timeout 5;
initial-interval 3;
initial-delay 3;
......@@ -173,10 +173,67 @@ elif static_widearea_config $iface; then
fi
else
#
# Find a list of candidate interfaces
# Find a list of candidate interfaces and run dhclient on them all.
#
_iflist=`ifconfig -a | grep -E '^(eth|en|sl)' | awk '{ print $1 }'`
echo "`date`: $iface: findcnet running dhclient on: $_iflist"
# Also wait until either the previous control net device is up, or
# until at least one is up (if this is first boot of the image and
# there is not yet a previous control net device), or a max amount
# of time.
#
MAXUPWAITTIME=6
LEASES="/var/lib/dhcp/dhclient.leases"
prevcnetdevs=
if [ -f $LEASES ]; then
prevcnetdevs=`cat $LEASES | sed -n -r -e 's/^[ \t]*interface[ \t]*"([^"]*)".*$/\1/p' | uniq | xargs`
echo "`date`: $iface: findcnet found '$prevcnetdevs' possible previous control net devices from old DHCP leases" >>$LOGDIR/dhclient.log
fi
_iflist=`ifconfig -a | grep -E '^(eth|en|sl)' | awk '{ print $1 }' | xargs`
lpc=0
while [ $lpc -lt $MAXUPWAITTIME ]; do
_iflist=`ifconfig -a | grep -E '^(eth|en|sl)' | awk '{ print $1 }' | xargs`
echo "`date`: $iface: findcnet checking $lpc for up devices: $_iflist" >>$LOGDIR/dhclient.log
downifs=0
upifs=0
upifaces=
foundprevcnet=0
for _if in $_iflist ; do
ip link show "$_if" | grep -q LOWER_UP
if [ $? -eq 0 ] ; then
upifs=`expr $upifs + 1`
upifaces="$upifaces $_if"
for _previf in $prevcnetdevs ; do
if [ "$_previf" = "$_if" ]; then
echo "`date`: $iface: known cnet device $_if is up; stopping waiting" >>$LOGDIR/dhclient.log
foundprevcnet=1
break
fi
done
if [ $foundprevcnet -eq 1 ]; then
break
fi
else
#ifconfig $_if up
downifs=`expr $downifs + 1`
fi
done
if [ $foundprevcnet -eq 1 ]; then
break
fi
if [ -z "$prevcnetdevs" -a $upifs -gt 0 ]; then
echo "`date`: $iface: at least one iface is up ($upifaces) (no previous cnet yet, must be first boot); stopping waiting" >>$LOGDIR/dhclient.log
break
fi
if [ $downifs -eq 0 ]; then
echo "`date`: $iface: all '$_iflist' are up; stopping waiting" >>$LOGDIR/dhclient.log
break
fi
lpc=`expr $lpc + 1`
sleep 1
done
# Mark that we are running dhclient.
touch /var/run/cnet-dhlient-running
#
# If dhclient returns success, then it has configured the first interface
......@@ -184,6 +241,7 @@ else
# more and just kill it. We also shutdown all the other interfaces (which
# dhclient will leave "up").
#
echo "`date`: $iface: findcnet running dhclient on: $_iflist"
if [ -x /sbin/dhclient ] && /sbin/dhclient -q $_iflist ; then
killall dhclient
rm -f /var/run/dhclient.pid
......@@ -197,6 +255,9 @@ else
done
fi
# We are done with dhclient.
rm -f /var/run/cnet-dhlient-running
# Emit this upstart event to allow boot to continue, even
# if we couldn't get a dhcp lease.
# Otherwise, if this is systemd, we have a special job that
......