    Lots of changes: debug; macvlans; details below. · fdf97b51
    David Johnson authored
    I added debug options for each LVM and vzctl call; you can toggle
    them on by touching any of /vz/.lvmdebug, /vz.save/.lvmdebug, or
    /.lvmdebug (for LVM), and /vz/.vzdebug, /vz.save/.vzdebug, or
    /.vzdebug (for vzctl).  I also added dates to debug timestamps for
    debugging longer-term shared-node problems.
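
    A minimal sketch of flipping the toggles (paths taken from the
    message above; which mount point applies depends on the node's
    layout):

    ```shell
    # Enable debug output for LVM calls; the code looks for a flag
    # file at one of these locations.
    touch /vz/.lvmdebug     # or /vz.save/.lvmdebug, or /.lvmdebug

    # Likewise for vzctl calls.
    touch /vz/.vzdebug      # or /vz.save/.vzdebug, or /.vzdebug
    ```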
    
    I added support for using macvlan devices instead of openvz veths
    for experiment interfaces.  Basically, you can add macvlan devices
    atop any other ethernet device to "virtualize" it using fake MAC
    addresses.  We use them like this: if the virtual link/lan needs to
    leave the vhost on a phys device or vlan device, we attach the macvlan
    devices to the appropriate real device.  If the virtlan is completely
    internal to the vhost, we create a dummy ethernet device and attach
    the macvlan devices to that.
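
    The two cases look roughly like this with iproute2 (device names
    eth0, dummy0, mv0, and mv1 are illustrative, not from the code):

    ```shell
    # Case 1: the virtual link/lan leaves the vhost on a physical or
    # vlan device -- stack the macvlan on the real device.
    ip link add link eth0 name mv0 type macvlan mode bridge

    # Case 2: the virtlan is entirely internal to the vhost -- create
    # a dummy ethernet device and stack the macvlans on it.
    ip link add dummy0 type dummy
    ip link set dummy0 up
    ip link add link dummy0 name mv1 type macvlan mode bridge
    ```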
    
    The difference between macvlan devices and veths is that macvlan
    devices are created only in the root context, and are moved into
    the container context when the vnodes boot.  There is no "root
    context" half -- the device is fully in the container's network
    namespace.  BUT, the underlying device is in the root network
    namespace.
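
    Moving the whole device into the container's network namespace
    looks something like this (the $CT_PID variable is hypothetical;
    in practice the vnode-boot path would supply the container's init
    PID or namespace handle):

    ```shell
    # Create the macvlan in the root context...
    ip link add link eth0 name mv0 type macvlan mode bridge
    # ...then move it wholesale into the container's namespace; the
    # underlying eth0 stays behind in the root namespace.
    ip link set mv0 netns $CT_PID
    ```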
    
    We use macvlans in "bridge" mode, so that when one macvlan device sends
    a packet, the device driver checks any other macvlan devices attached
    to the underlying physical, vlan, or dummy device, and delivers the packet
    accordingly.  The difference between this fake bridge and a real bridge
    is that the macvlan driver knows the MAC of each attached interface,
    and does not have to do any learning whatsoever.  I haven't looked at
    the code, but it should be a very, very simple, fast, and zero-copy
    transmit from one macvlan device onto another.
    
    This is essentially the same as the PlanetLab shortbridge, but since
    I haven't looked at the code, I can't say that there aren't more
    opportunities to optimize.  Still, this should hopefully be faster
    than openvz veths.
    
    Oh, and I also added support for using Linux tc's netem modules
    for doing delay and loss shaping, instead of using our custom
    kernel modules.  I got tired of pulling our patches forward and
    adapting to the packet sched API changes in the kernel!  netem is
    more advanced than our stuff, anyway, and should do a fine job.
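
    For reference, netem shaping looks something like this (device
    name and numbers are illustrative, not from the commit):

    ```shell
    # Add delay and random loss on egress of an experiment interface.
    tc qdisc add dev mv0 root netem delay 10ms loss 1%

    # Adjust parameters in place without tearing the qdisc down
    # (here: 50ms delay with 5ms jitter, 0.5% loss).
    tc qdisc change dev mv0 root netem delay 50ms 5ms loss 0.5%
    ```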