Commit 36835af1 authored by Mike Hibler

Integrate some text I had elsewhere about the virtual control net.

parent 7b8b84aa
@@ -466,7 +466,88 @@
up with the correct one. But that shouldn't matter as each vnode should
get a reply eventually and each reply should have the same info.
C5. More about the startup pieces.
C5. The virtual control net.
As mentioned, we implement a "virtual control net" that allows each jail
to have its own unique address and port name space. We use the
172.16/12 unroutable range for this, which we do route internal to Emulab.
External access to these jails is provided via ssh forwarding on ops
(e.g., as encapsulated in the ssh-mime.pl script). The current convention
for naming in the 20 available bits of 172.16/12 is:
12 bits for (up to 4096) physical host id
8 bits for (up to 256) virtual host id
where we might possibly have to reduce the physical host id by a bit if we
want mainbed (say 172.16) and minibed (say 172.24) to exist together.
[ Note that this isn't what we do right now: we mistakenly used 172.17
as a second testbed range. But we ignore the existence of a second testbed
for now... ]
We need 172.16 addresses for the router and for each physical node. The
router is 172.16.0.1 as convention would dictate. We could use virtual
host id 0 in each net to be the physical host so that, for example on pc79:
172.16.79.0 pcvm79-host (i.e., an alias for pc79's control IF)
172.16.79.1 pcvm79-1
...
172.16.79.254 pcvm79-254
Note that we don't use .255. Even though this is not a real /24 net, we
treat it as such for routing internal to the jail (see below).
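
The naming convention above can be sketched in a few lines of Python. This is only an illustration of the 12-bit/8-bit split, not the code Emulab uses; the function name vnode_addr and the constants are hypothetical.

```python
import ipaddress

# Assumed layout, taken from the convention described above:
# 172.16/12 leaves 20 host bits = 12 bits physical id + 8 bits virtual id.
VNET = ipaddress.ip_network("172.16.0.0/12")
PHYS_BITS = 12
VIRT_BITS = 8


def vnode_addr(phys_id: int, virt_id: int) -> str:
    """Return the virtual control net address for a (pnode, vnode) pair.

    virt_id 0 stands for the physical host's own alias; 255 is rejected
    because the text says .255 is not used.
    """
    if not 0 <= phys_id < (1 << PHYS_BITS):
        raise ValueError("physical host id out of range")
    if not 0 <= virt_id < (1 << VIRT_BITS) - 1:
        raise ValueError("virtual host id out of range")
    offset = (phys_id << VIRT_BITS) | virt_id
    return str(VNET.network_address + offset)


if __name__ == "__main__":
    print(vnode_addr(79, 0))    # alias for pc79 itself
    print(vnode_addr(79, 1))    # first vnode on pc79
    print(vnode_addr(79, 254))  # last usable vnode on pc79
```

Under this arithmetic the router address 172.16.0.1 corresponds to physical id 0, virtual id 1, and physical ids of 256 and above spill into 172.17 and beyond.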
We set up a number of special routes for a jail, some set up externally,
some from inside:
Destination        Gateway        Flags  Refs  Use  Netif
default            172.16.0.1     UGSc      2    0  fxp0
127.0.0.1          127.0.0.1      UH        0    0  lo0
155.101.132/22     link#5         UCSc      0    0  fxp0
155.101.132.79     127.0.0.1      UGHS      0    0  lo0
172.16/12          link#1         UCSc      1    0  fxp0
172.17.79/24       lo0            USc       0    0  lo0
The "default" route gets out to "the world", which means the testbed
servers (boss/ops/tipserv). 172.16.0.1 is an alias on the router for
the physical control net. Using a virtual control net address for the
router is not necessary for most applications but was added for gated,
which checks that next hops are accessible via attached interfaces.
Since the control net interface appears internal to the jail as
172.17.x.x/255.255.255.255, this still isn't quite correct, but we use
a config file feature of gated to finesse it. We have to apply more
finesse when setting up routing, as described below.
The loopback route "127.0.0.1" is not as straightforward as it might
appear. Since lo0 is a shared interface, how do we ensure that packets
loopback within a jail and are not received by a different jail? The
kernel jail code takes care of this by changing the source IP address
from the loopback address to the primary jail address. However, since
it still uses the shared loopback device, there are a couple of implications.
First, any jail can see the traffic with tcpdump. Second, since the
interface is not tagged, replies are routed using the primary routing
table. Thus there must be a route for reaching 172.17.x.x in that
routing table. We ensure this as part of the jail setup process.
The "155.101.132" routes ensure we can reach nodes via their physical
control net addresses (e.g., using the canonical "pcXX" names). The
first reaches other hosts, the second is a loopback route for the
local host. Strictly speaking, this is a violation of the virtualization,
but it is a pragmatic one.
The "172.16/12" route is the general virtual control net route. This
route might seem redundant given the default route, but it is actually
needed to set up the default route. If this route didn't exist first,
replies for ARP requests for the gateway would be rejected as "not on
local network" since the control net interface appears as a /32 net
and 172.16.0.1 is technically not reachable via it.
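
The reachability argument can be illustrated with Python's ipaddress module. This is a simplified model of the "is the gateway on an attached network" test, not the kernel's actual logic; the jail address 172.17.79.1 and the helper on_link are assumptions made for the example.

```python
import ipaddress

gateway = ipaddress.ip_address("172.16.0.1")

# Hypothetical jail address; inside the jail the interface looks like a /32.
jail_if = ipaddress.ip_network("172.17.79.1/32")

# The general virtual control net route discussed above.
vnet = ipaddress.ip_network("172.16.0.0/12")


def on_link(addr, connected_nets):
    """True if addr falls inside any directly connected network."""
    return any(addr in net for net in connected_nets)


if __name__ == "__main__":
    # With only the /32, the gateway does not look local.
    print(on_link(gateway, [jail_if]))        # False
    # Once 172.16/12 is attached, the gateway is covered.
    print(on_link(gateway, [jail_if, vnet]))  # True
```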
Finally, "172.17.x/24" is a loopback route used to reach the set of vnodes
on this pnode. Note that this includes the virtual alias of the physical
host (.0). [ Note also that the .17 would be .16 except for the botched
main/mini-bed naming. ]
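
To see how the routes interact, here is a minimal longest-prefix-match sketch over the example table for pc79. It is a simplification under stated assumptions: the abbreviated prefixes are expanded by hand (e.g., 172.16/12 as 172.16.0.0/12), the lookup function is hypothetical, and real kernel route selection also considers flags that are ignored here.

```python
import ipaddress

# (destination prefix, gateway, interface), copied from the example table
# with netstat's abbreviated prefixes written out in full.
ROUTES = [
    ("0.0.0.0/0",         "172.16.0.1", "fxp0"),  # default
    ("127.0.0.1/32",      "127.0.0.1",  "lo0"),
    ("155.101.132.0/22",  "link#5",     "fxp0"),
    ("155.101.132.79/32", "127.0.0.1",  "lo0"),
    ("172.16.0.0/12",     "link#1",     "fxp0"),
    ("172.17.79.0/24",    "lo0",        "lo0"),
]


def lookup(dest: str):
    """Return (prefix, gateway, netif) of the most specific matching route."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, gateway, netif in ROUTES:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, gateway, netif))
    # The default route always matches, so matches is never empty.
    net, gateway, netif = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), gateway, netif


if __name__ == "__main__":
    for dest in ("172.17.79.3", "172.17.80.3", "155.101.132.79",
                 "155.101.133.10", "192.0.2.1"):
        print(dest, "->", lookup(dest))
```

A vnode on the same pnode (172.17.79.3) resolves to the lo0 loopback route, while a vnode on another pnode (172.17.80.3) falls through to the general 172.16/12 route on fxp0.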
C6. More about the startup pieces.
vnodesetup hangs around so that you can signal it and easily
reboot the vnode. I guess the idea is that it is also jail/vserver
......