
Commit 72fb9e3a authored by David Johnson

Fix up the last commit so newer-style Linux shaping with netem works.

I didn't know this initially, but it turns out that with newer netems,
you can't add another netem qdisc instance (nor an htb instance)
inside a netem instance.  The Linux maintainers removed "classful"
qdisc support from the netem qdisc (classful support is what makes it
possible for another qdisc to be "nested" inside a netem qdisc) for
three reasons: 1) netem couldn't even nest an instance of itself
inside itself (which isn't strictly necessary for us, because we can
do both delay and plr in one netem instance); 2) apparently
non-work-conserving qdiscs already didn't work inside netem (a
work-conserving qdisc is one that, whenever it has a packet queued,
has it ready when its underlying device is ready to transmit; thus, a
bandwidth-shaping qdisc that might not have a packet ready because
it's slowing down the send rate is non-work-conserving); and 3) to
support code cleanups.

So -- what this means for us is that by using modern netem, we are now
doing bandwidth shaping first, then plr and delay.  With our old
custom kernel modules, we were doing plr, delay, then bandwidth.
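
As a rough illustration of the two orderings described above, here is a
dry-run sketch (not the code from this commit) that only prints tc
commands and never runs them.  The function name, the handle numbers
(10, 20, 30), and the example interface and values are assumptions made
for illustration.  The legacy plr and delay qdisc syntax is copied from
the diff and only works with the old custom kernel modules.

```sh
#!/bin/sh
# Dry-run sketch: prints tc commands instead of executing them.
# TC, shaping_cmds, and the handle numbers are illustrative assumptions.
TC=${TC:-tc}

# usage: shaping_cmds MODE IFACE RATE LOSS DELAY_US
shaping_cmds() {
    mode=$1; iface=$2; rate=$3; loss=$4; delay_us=$5
    case "$mode" in
    legacy)
        # Old custom modules: plr at the root, then delay, then htb.
        echo "$TC qdisc add dev $iface handle 10: root plr $loss"
        echo "$TC qdisc add dev $iface handle 20: parent 10:1 delay usecs $delay_us"
        echo "$TC qdisc add dev $iface handle 30: parent 20:1 htb default 1"
        echo "$TC class add dev $iface classid 30:1 parent 30: htb rate $rate ceil $rate"
        ;;
    netem)
        # Modern netem cannot have children, so htb goes at the root and
        # a single netem leaf does both loss and delay.
        echo "$TC qdisc add dev $iface handle 10: root htb default 1"
        echo "$TC class add dev $iface classid 10:1 parent 10: htb rate $rate ceil $rate"
        echo "$TC qdisc add dev $iface handle 20: parent 10:1 netem drop $loss delay ${delay_us}us"
        ;;
    *)
        echo "unknown mode: $mode" >&2
        return 1
        ;;
    esac
}

# Example: 10mbit, 5% loss, 20ms delay on a hypothetical eth1.
shaping_cmds netem eth1 10mbit 5% 20000
```

Running the same call with "legacy" as the first argument prints the
old plr, delay, htb chain, which makes the reversed order easy to
compare side by side.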

I talked this strategy over with Jon (because adding classful support
back to netem is nontrivial and defeats the point of trying to use
what's in the kernel directly without patching it more), and we believe
it's ok to do: first, because it doesn't always change the shaped rate
from the old way we used to do things, and second, because using these
params *in tandem* to do link shaping is kind of a poor man's way
of actually modeling real link behavior, a la flexlab.

So we'll just document it for users, call it beta for now, and test
it against the old way and BSD.  If it looks reasonable, we'll stick
with it; otherwise we'll look at reviving the old style.
parent 23bce373
@@ -367,38 +367,52 @@ sub DelaySetup
$lp2 = $np2; $np2 += $pinc;
}
# next, plr on the link
if (!$DO_NETEM) {
# next, plr on the link
push @upcmds,"$TC qdisc add dev $iface1 handle $np1 $nextparent1 plr $plr1";
push @upcmds,"$TC qdisc add dev $iface2 handle $np2 $nextparent2 plr $plr2";
}
else {
push @upcmds,"$TC qdisc add dev $iface1 handle $np1 $nextparent1 netem drop $plr1";
push @upcmds,"$TC qdisc add dev $iface2 handle $np2 $nextparent2 netem drop $plr2";
}
$lp1 = $np1; $np1 += $pinc;
$lp2 = $np2; $np2 += $pinc;
$lp1 = $np1; $np1 += $pinc;
$lp2 = $np2; $np2 += $pinc;
# next, delay on link
if (!$DO_NETEM) {
# next, delay on link
push @upcmds,"$TC qdisc add dev $iface1 handle $np1 parent $lp1:1 delay usecs $delay1";
push @upcmds,"$TC qdisc add dev $iface2 handle $np2 parent $lp2:1 delay usecs $delay2";
$lp1 = $np1; $np1 += $pinc;
$lp2 = $np2; $np2 += $pinc;
# finally, do the rate limiting
push @upcmds,"$TC qdisc add dev $iface1 handle $np1 parent $lp1:1 htb default 1";
push @upcmds,"$TC class add dev $iface1 classid $np1:1 parent $np1 htb rate $bandw1 ceil $bandw1";
push @upcmds,"$TC qdisc add dev $iface2 handle $np2 parent $lp2:1 htb default 1";
push @upcmds,"$TC class add dev $iface2 classid $np2:1 parent $np2 htb rate $bandw2 ceil $bandw2";
$lp1 = $np1; $np1 += $pinc;
$lp2 = $np2; $np2 += $pinc;
}
else {
push @upcmds,"$TC qdisc add dev $iface1 handle $np1 parent $lp1:1 delay ${delay1}us";
push @upcmds,"$TC qdisc add dev $iface2 handle $np2 parent $lp2:1 delay ${delay2}us";
}
$lp1 = $np1; $np1 += $pinc;
$lp2 = $np2; $np2 += $pinc;
#
# netem cannot have non-work-conserving qdiscs inside of itself,
# and it can't have itself inside itself -- because it uses the
# skbuff's control block and would thus overwrite itself. The
# Linux maintainers removed its classful support for these and
# other reasons, so you can't nest anything inside it.
# So, we have to do bandwidth shaping first, and then the loss
# and delay with the same netem qdisc.
#
# finally, do the rate limiting
push @upcmds,"$TC qdisc add dev $iface1 handle $np1 parent $lp1:1 htb default 1";
push @upcmds,"$TC class add dev $iface1 classid $np1:1 parent $np1 htb rate $bandw1 ceil $bandw1";
push @upcmds,"$TC qdisc add dev $iface2 handle $np2 parent $lp2:1 htb default 1";
push @upcmds,"$TC class add dev $iface2 classid $np2:1 parent $np2 htb rate $bandw2 ceil $bandw2";
$lp1 = $np1; $np1 += $pinc;
$lp2 = $np2; $np2 += $pinc;
# first do the rate limiting
push @upcmds,"$TC qdisc add dev $iface1 handle $np1 $nextparent1 htb default 1";
push @upcmds,"$TC class add dev $iface1 classid $np1:1 parent $np1 htb rate $bandw1 ceil $bandw1";
push @upcmds,"$TC qdisc add dev $iface2 handle $np2 $nextparent2 htb default 1";
push @upcmds,"$TC class add dev $iface2 classid $np2:1 parent $np2 htb rate $bandw2 ceil $bandw2";
$lp1 = $np1; $np1 += $pinc;
$lp2 = $np2; $np2 += $pinc;
# next, plr and delay on the link
push @upcmds,"$TC qdisc add dev $iface1 handle $np1 parent ${lp1}:1 netem drop $plr1 delay ${delay1}us";
push @upcmds,"$TC qdisc add dev $iface2 handle $np2 parent ${lp2}:1 netem drop $plr2 delay ${delay2}us";
$lp1 = $np1; $np1 += $pinc;
$lp2 = $np2; $np2 += $pinc;
}
# and last, add the down commands:
push @downcmds,"$TC qdisc del dev $iface1 root";
@@ -639,19 +653,38 @@ sub LinkDelaySetup()
print DEL "$IFCONFIG $iface txqueuelen $queue\n";
print DEL "$TC qdisc add dev $iface handle $pipeno root ";
print DEL "netem drop $plr\n";
if (!$DO_NETEM) {
print DEL "$TC qdisc add dev $iface handle $pipeno root ";
print DEL "plr $plr\n";
print DEL "$TC qdisc add dev $iface handle ". ($pipeno+10) ." ";
print DEL "parent ${pipeno}:1 netem delay ${delay}us\n";
print DEL "$TC qdisc add dev $iface handle ". ($pipeno+10) ." ";
print DEL "parent ${pipeno}:1 delay usecs $delay\n";
print DEL "$TC qdisc add dev $iface handle ". ($pipeno+20) ." ";
print DEL "parent ". ($pipeno+10) .":1 htb default 1\n";
print DEL "$TC qdisc add dev $iface handle ". ($pipeno+20) ." ";
print DEL "parent ". ($pipeno+10) .":1 htb default 1\n";
if ($bandw != 0) {
print DEL "$TC class add dev $iface classid ". ($pipeno+20) .":1 ";
print DEL "parent ". ($pipeno+20) ." htb rate ${bandw} ";
print DEL "ceil ${bandw}\n";
if ($bandw != 0) {
print DEL "$TC class add dev $iface classid ". ($pipeno+20) .":1 ";
print DEL "parent ". ($pipeno+20) ." htb rate ${bandw} ";
print DEL "ceil ${bandw}\n";
}
}
else {
#
# See comments in DelaySetup for why we have to reverse
# the normal shaping order for netem!
#
print DEL "$TC qdisc add dev $iface handle ". ($pipeno+20) ." root ";
print DEL "htb default 1\n";
if ($bandw != 0) {
print DEL "$TC class add dev $iface classid ". ($pipeno+20) .":1 ";
print DEL "parent ". ($pipeno+20) ." htb rate ${bandw} ";
print DEL "ceil ${bandw}\n";
}
print DEL "$TC qdisc add dev $iface handle ".($pipeno+10)." parent ".($pipeno+20).":1 ";
print DEL "netem drop $plr delay ${delay}us\n";
}
$iface =~ /\D+(\d+)/;
@@ -668,21 +701,37 @@ sub LinkDelaySetup()
die("No such IMQ device: imq${imqnum}");
}
print DEL "$TC qdisc add dev $imqdev handle $pipeno ";
print DEL "root netem drop $rplr\n";
print DEL "$TC qdisc add dev $imqdev handle ";
print DEL "". ($pipeno+10) ." parent ${pipeno}:1 ";
print DEL "netem delay ${rdelay}us\n";
print DEL "$TC qdisc add dev $imqdev handle ";
print DEL "". ($pipeno+20) ." parent ". ($pipeno+10) .":1 ";
print DEL "htb default 1\n";
if (!$DO_NETEM) {
print DEL "$TC qdisc add dev $imqdev handle $pipeno ";
print DEL "root plr $rplr\n";
print DEL "$TC qdisc add dev $imqdev handle ";
print DEL "". ($pipeno+10) ." parent ${pipeno}:1 ";
print DEL "delay ${rdelay}us\n";
print DEL "$TC qdisc add dev $imqdev handle ";
print DEL "". ($pipeno+20) ." parent ". ($pipeno+10) .":1 ";
print DEL "htb default 1\n";
if ($rbandw != 0) {
print DEL "$TC class add dev $imqdev classid ";
print DEL "". ($pipeno+20) .":1 parent ". ($pipeno+20) ." ";
print DEL "htb rate ${rbandw} ceil ${rbandw}\n";
}
}
else {
print DEL "$TC qdisc add dev $imqdev handle ";
print DEL "". ($pipeno+20) ." root ";
print DEL "htb default 1\n";
if ($rbandw != 0) {
print DEL "$TC class add dev $imqdev classid ";
print DEL "". ($pipeno+20) .":1 parent ". ($pipeno+20) ." ";
print DEL "htb rate ${rbandw} ceil ${rbandw}\n";
}
if ($rbandw != 0) {
print DEL "$TC class add dev $imqdev classid ";
print DEL "". ($pipeno+20) .":1 parent ". ($pipeno+20) ." ";
print DEL "htb rate ${rbandw} ceil ${rbandw}\n";
print DEL "$TC qdisc add dev $imqdev handle ".($pipeno+10)." ";
print DEL "parent ".($pipeno+20).":1 netem drop $rplr delay $rdelay\n";
}
print DEL "$IPTABLES -t mangle -A PREROUTING -i $iface ";
......