In order to allow experiments with a very large number of nodes, we provide a multiplexed virtual node implementation. If an experiment application's CPU, memory, and network requirements are modest, multiplexed virtual nodes (hereafter just "virtual nodes") allow an experiment to use 10-20 times as many nodes as there are physical machines available in Emulab. These virtual nodes can currently only run FreeBSD, but Linux support is coming.
Virtual nodes fall between simulated nodes (a la ns) and real, dedicated machines in terms of accuracy of modeling the real world.
A virtual node is just a lightweight virtual machine running on top of
a regular operating system. In particular, our virtual nodes are based
on the FreeBSD jail
mechanism, which allows groups of processes
to be isolated from each other while running on the same physical machine.
Emulab virtual nodes provide isolation of the filesystem, process, network,
and account namespaces. That is to say, each virtual node has its own
private filesystem, process hierarchy, network interfaces and IP addresses,
and set of users and groups. This level of virtualization allows unmodified
applications to run as though they were on a real machine. Virtual network
interfaces are used to form an arbitrary number of virtual network links.
These links may be individually shaped and may be multiplexed over physical
links or used to connect virtual nodes within a single physical node.
With some limitations, virtual nodes can act in any role that a normal Emulab node can: end node, router, or traffic generator. You can run startup commands, ssh into them, run as root, use tcpdump or traceroute, modify routing tables, and even reboot them. You can construct arbitrary topologies of links and LANs, even mixing virtual and real nodes.
The number of virtual nodes that can be multiplexed on a single physical node depends on a variety of factors including the resource requirements of the application, the type of the underlying node, the bandwidths of the links you are emulating and the desired fidelity of the emulation. See the Advanced Issues section for more info.
To use virtual nodes, simply set the hardware type of each node in your NS file to "pcvm":

set nodeA [$ns node]
tb-set-hardware $nodeA pcvm
or, if you want all virtual nodes to be mapped to the same machine type,
say a pc850:
set nodeA [$ns node]
tb-set-hardware $nodeA pcvm850
that is, instead of "pcvm" use "pcvmN" where N is the node type
(600, 850, 1500, 2000).
That's it! With few exceptions, everything you use in an NS file for an Emulab experiment running on physical nodes will work with virtual nodes. The most notable exception is that you cannot specify the operating system for a virtual node; virtual nodes are limited to running our custom version of FreeBSD 4.7 (soon to be FreeBSD 4.9).
As a simple example, we could take the basic NS script used in the tutorial, add the following lines:

tb-set-hardware $nodeA pcvm
tb-set-hardware $nodeB pcvm
tb-set-hardware $nodeC pcvm
tb-set-hardware $nodeD pcvm

and remove the explicit setting of the OS:

# Set the OS on a couple.
tb-set-node-os $nodeA FBSD-STD
tb-set-node-os $nodeC RHL-STD

and the resulting NS file can be submitted to produce the very same topology.
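For reference, here is a minimal sketch of what such an NS file might look like. The topology, link parameters, and names below are illustrative, not necessarily the tutorial's exact script:

set ns [new Simulator]
source tb_compat.tcl

set nodeA [$ns node]
set nodeB [$ns node]
set nodeC [$ns node]
set nodeD [$ns node]

# Make every node a multiplexed virtual node.
tb-set-hardware $nodeA pcvm
tb-set-hardware $nodeB pcvm
tb-set-hardware $nodeC pcvm
tb-set-hardware $nodeD pcvm

# No tb-set-node-os lines: virtual nodes run the custom FreeBSD image.

# An example topology: a shaped link plus a small LAN.
set link0 [$ns duplex-link $nodeA $nodeB 100Mb 10ms DropTail]
set lan0  [$ns make-lan "$nodeB $nodeC $nodeD" 100Mb 0ms]

$ns rtproto Static
$ns run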
Once the experiment has been instantiated, the experiment web page should include a listing of the reserved nodes.
Logging into a virtual node, you see only the processes associated with your jail:
PID TT STAT TIME COMMAND
1846 ?? IJ 0:00.01 injail: pcvm36-5 (injail)
1883 ?? SsJ 0:00.03 /usr/sbin/syslogd -ss
1890 ?? SsJ 0:00.01 /usr/sbin/cron
1892 ?? SsJ 0:00.28 /usr/sbin/sshd
1903 ?? IJ 0:00.01 /usr/bin/perl -w /usr/local/etc/emulab/watchdog start
5386 ?? SJ 0:00.04 sshd: mike@ttyp1 (sshd)
5387 p1 SsJ 0:00.06 -tcsh (tcsh)
5401 p1 R+J 0:00.00 ps ax
The injail process serves the same function as init on a regular node: it is the ''root'' of the process name space. Killing it will kill the entire virtual node. Other standard FreeBSD processes include syslogd, cron, and sshd, along with the Emulab watchdog process. Note that the process IDs are in fact not virtualized; they are in the physical machine's name space. However, a virtual node still cannot kill a process that is part of another jail.
Doing a df, you see:
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/vn5c 507999 1484 496356 0% /
/var/emulab/jails/local/testbed 6903614 73544 6277782 1% /local/testbed
/users/mike 14081094 7657502 5297105 59% /users/mike
...
/dev/vn5c is your private root filesystem, which is a FreeBSD vnode disk (i.e., a regular file in the physical machine's filesystem). /local/projname is ''loopback'' mounted from the physical host and provides some disk space that is shared between all virtual nodes on the same physical node. Also mounted are the usual Emulab-provided shared filesystems. Thus you have considerable flexibility in sharing, ranging from shared by all nodes (/users/yourname and /proj/projname), to shared by all virtual nodes on a physical node (/local/projname), to private to a virtual node (/local).
Doing ifconfig shows the jail's network interfaces; in this example there are two, fxp4 and veth3. fxp4 is the control net interface. Due to limited routable
IP address space, Emulab uses the 172.16/12 unroutable address range to assign
control net addresses to virtual nodes. These addresses are routed within
Emulab, but are not exposed externally. This means that you can access this
node (including using the DNS name ''nodeC.vtest.testbed.emulab.net'') from
ops.emulab.net or from other nodes in your experiment, but not from
outside Emulab. If you need to access a virtual node from outside Emulab,
you will have to proxy the access via ops or a physical node (that is what
the ssh icon in the web page does). veth3
is a virtual ethernet
device (not part of standard FreeBSD, we wrote it at Utah) and is the
experimental interface for this node. There will be one veth
device for every experimental interface. Note the reduced MTU (1484) on the
veth interface. This is because the veth device uses encapsulation to
identify packets which are multiplexed on physical links. Even though this
particular virtual link does not cross a physical wire, the MTU is reduced
anyway so that all virtual links have the same MTU.
Following is a list of the per virtual node resources and how they can be accessed from the physical host:
Filesystem: the virtual node's root filesystem lives on the physical host at /var/emulab/jails/vnodename/root, where vnodename is the "pcvmNN-NN" Emulab name. The regular file that is the disk itself is in the per virtual node directory as root.vnode. The /bin, /sbin, and /usr directories are read-only loopback mounted from the parent, as are the normal shared directories in /users and /proj.
Network interfaces: these can be seen from the physical host with ifconfig. Identifying which interfaces belong to a particular virtual node must be done by hand, most easily by first logging into the virtual node in question and doing ifconfig. You can also look at /var/emulab/jails/vnodename/rc.ifc, which is the startup script used to configure the node's interfaces. In addition to the usual information, ifconfig on a virtual device also shows which route table (rtabid), broadcast domain (vethtag), and parent device (parent) it is associated with. See Technical Details below for what these mean.
Routing tables: per virtual node routing tables can be viewed from the physical host using netstat with the enhanced '-f inet' option:
netstat -ran -f inet
netstat -ran -f inet:3
netstat -ran -f inet:-1
The first form shows IPv4 routes in the "main" (physical host's)
routing table. The second would show routing table 3, and the last
shows all active routing tables. Routing tables may be modified using
the route
command with the new '-rtabid N' option, where
N is the rtabid:
route add -rtabid 3 -net 192.168/16 -interface lo0
Normally, the Emulab resource mapper, assign, will map virtual nodes onto physical nodes in such a way as to achieve the best overall use of physical resources without violating any of the constraints of the virtual nodes or links. In a nutshell, it packs as many virtual nodes onto a physical node as it can without exceeding the node's internal or external network bandwidth capabilities and without exceeding a node-type-specific static packing factor. Internal network bandwidth is an empirically derived value for how much network data can be moved through internally connected virtual ethernet interfaces. External network bandwidth is determined by the number of physical interfaces available on the node. The static packing factor is intended as a coarse metric of the CPU and memory load that a physical node can support; currently it is based strictly on the amount of physical memory in each node type. The current values of these constraints are node-type specific; roughly, each physical node can host 10-20 virtual nodes and about 400Mb/sec of internal network bandwidth.
The mapper generally produces an "unsurprising" mapping of virtual nodes to physical nodes (e.g., mapping small LANs all on the same physical host) and where it doesn't, it is usually because doing so would violate one of the constraints. One exception involves LANs.
One might think that an entire 100Mb LAN, regardless of the number of members, could be located on a single physical host since the internal bandwidth of a host is 400Mb/sec. Alas, this is not the case. A LAN is modeled in Emulab as a set of point-to-point links to a "LAN node." The LAN node will then see 100Mb/sec from every LAN member. For the purposes of bandwidth allocation, a LAN node must be mapped to a physical host just as any other node. The difference is that a LAN node may be mapped to a switch, which has "unlimited" internal bandwidth, as well as to a node.

Now consider the case of a 100Mb/sec LAN with 5 members. If the LAN node is colocated with the other nodes on the same physical host, it is a violation as 500Mb/sec of bandwidth is required for the LAN node. If instead the LAN node is mapped to a switch, it is still a violation because now we need 500Mb/sec from the physical node to the switch, but there is only 400Mb/sec available there as well. Thus you can only have 4 members of a 100Mb/sec LAN on any single physical host. You can however have 4 members on each of many physical hosts to form a large LAN; in this case the LAN node will be located on the switch.

Note that this discussion applies equally to 8 members on a 50Mb/sec LAN, 20 members of a 20Mb LAN, or any LAN where the aggregate bandwidth exceeds 400Mb/sec. And of course, you must take into consideration the bandwidth of all other links and LANs on a node. Now you know why we have a complex program to do this!
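As a rough illustration of this constraint, the sketch below (names and sizes are illustrative) asks for a 100Mb LAN with 12 virtual members. Since at most 4 members of a 100Mb LAN can share one physical host's 400Mb/sec of internal bandwidth, assign will place the LAN node on a switch and spread the members over at least three physical machines:

# Illustrative only: a 100Mb LAN too large for one physical host.
set lanstr ""
for {set i 1} {$i <= 12} {incr i} {
    set vnode($i) [$ns node]
    tb-set-hardware $vnode($i) pcvm
    append lanstr "$vnode($i) "
}
# At 100Mb per member, at most 4 members fit on one physical host
# (4 x 100Mb = the host's 400Mb/sec limit), so 12 members require
# at least 3 physical hosts.
set biglan [$ns make-lan "$lanstr" 100Mb 0ms]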
Anyway, if you are still not deterred and feel you can do a better job of virtual to physical node mapping yourself, there are a few ways to do this. Note carefully though that none of these will allow you to violate the bandwidth and packing constraints listed above.
The NS-extension tb-set-colocate-factor command allows you to globally decrease (not increase!) the maximum number of virtual nodes per physical node. This command is useful if you know that the application load you are running in the vnodes will require more resources per instance (e.g., a Java DHT), and that the Emulab-picked values of 10-20 per physical node are just too high. Note that currently this is not really a "factor"; it is an absolute value. Setting it to 5 will reduce the capacity of all node types to 5, whether they were 10 or 20 by default.
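A minimal sketch of its use (the cap of 5 is arbitrary, chosen only for illustration):

# Allow at most 5 virtual nodes on any physical node in this experiment
# (an absolute cap, not a multiplier on the default packing factor).
tb-set-colocate-factor 5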
If the packing factor is ok, but assign
just won't colocate virtual nodes the way you want,
you can resort to trying to do the mapping by hand using
tb-fix-node
. This technique is not for the faint of heart
(or weak of stomach) as it involves mapping virtual nodes to specific
physical nodes, which you must determine in advance are available.
For example, the following code snippet will allocate 8 nodes in a LAN
and force them all onto the same physical host (pc41). If the host is not available, this will fail. Note again that "fixing"
nodes will still not allow you to violate any of the fundamental
mapping constraints.
set phost pc41        ;# physical node to use
set phosttype 850     ;# type of physical node, e.g. pc850
# Force virtual nodes in a LAN to one physical host
set lanstr ""
for {set j 1} {$j <= 8} {incr j} {
set n($j) [$ns node]
append lanstr "$n($j) "
tb-set-hardware $n($j) pcvm${phosttype}
tb-fix-node $n($j) $phost
}
set lan [$ns make-lan "$lanstr" 10Mb 0ms]
There is one final technique that will allow you to circumvent assign and the bandwidth constraints above. The NS-extension tb-set-noshaping can be used to turn off link shaping for a specific link or LAN, e.g.:

tb-set-noshaping $lan 1

added to the NS snippet above would allow you to specify "1Mb" for the LAN bandwidth and map 20 virtual nodes to the same physical host, but then not be bound by the bandwidth constraint later. In this way assign would map your topology, but no enforcement would be done at runtime. Specifically, this tells Emulab not to set up ipfw rules and dummynet pipes on the specified interfaces.
One semi-legitimate use of this command is in the case where you know that your applications will not exceed a certain bandwidth, and you don't want to incur the ipfw/dummynet overhead associated with explicitly enforcing the limits. Note that, as implied by the name, this turns off all shaping of a link, not just the bandwidth constraint. So if you need delays or packet loss, don't use this.
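Putting this together, a hedged sketch (node count, bandwidth, and variable names are illustrative) of declaring a nominal 1Mb LAN of 20 virtual nodes and then disabling shaping on it:

# Illustrative: declare a low nominal bandwidth so 20 virtual nodes map
# densely, then turn off runtime shaping entirely.
set lanstr ""
for {set j 1} {$j <= 20} {incr j} {
    set n($j) [$ns node]
    append lanstr "$n($j) "
    tb-set-hardware $n($j) pcvm
}
set lan [$ns make-lan "$lanstr" 1Mb 0ms]
# No ipfw rules or dummynet pipes are set up for this LAN, so the 1Mb
# figure affects only mapping; bandwidth, delay, and loss are not enforced.
tb-set-noshaping $lan 1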
One thing to try is to allocate a modest sized version of your experiment, say 40-50 nodes, using just physical nodes and compare that to the same experiment with 40-50 virtual nodes with various packing factors.
We are currently working on techniques that will allow you to specify performance constraints in some fashion, and have the experiment run and self-adjust until it reaches a packing factor that does not violate those constraints.
set lanstr ""
for {set j 1} {$j <= 8} {incr j} {
set n($j) [$ns node]
append lanstr "$n($j) "
if {$j & 1} {
tb-set-hardware $n($j) pcvm
} else {
tb-set-hardware $n($j) pc
tb-set-node-os $n($j) FBSD-STD
}
}
set lan [$ns make-lan "$lanstr" 10Mb 0ms]
The current limitation is that the physical nodes must run FreeBSD because
of the use of the custom encapsulation on virtual ethernet devices. Note
that this also implies that the physical nodes use virtual ethernet devices
and thus the MTU is likewise reduced.
We have implemented, but not yet deployed, a non-encapsulating version of the virtual ethernet interface that will allow virtual nodes to talk directly to physical ethernet interfaces and thus remove the FreeBSD-only and reduced-MTU restrictions.
Implementation details are described in doc/vnode-impl.txt.