<!--
   EMULAB-COPYRIGHT
   Copyright (c) 2000-2005 University of Utah and the Flux Group.
   All rights reserved.
  -->
<center>
<h1>Multiplexed Virtual Nodes in Emulab</h1>
</center>

<h2>Contents</h2>
<ul>
<li> <a href="#Overview">Overview</a>
<li> <a href="#Use">Use</a>
<li> <a href="#AdvancedIssues">Advanced Issues</a>
<ul>
<li> <a href="#AI1">Taking advantage of a virtual node host</a>
<li> <a href="#AI2">Controlling virtual node layout</a>
<li> <a href="#AI3">Determining how many nodes to colocate</a>
<li> <a href="#AI4">Mixing virtual and physical nodes</a>
</ul>
<li> <a href="#Limitations">Limitations</a>
<li> <a href="#KnownBugs">Known Bugs</a>
<li> <a href="#TechDetails">Technical Details</a>
</ul>

<hr>
<a NAME="Overview"></a><h2>Overview</h2>
<p>
In order to allow experiments with a very large number of nodes,
we provide a <i>multiplexed virtual node</i> implementation.
If an experiment application's CPU, memory, and network requirements are
modest, multiplexed virtual nodes (hereafter just "virtual nodes")
allow an experiment to use 10-20 times as many nodes as there are available
physical machines in Emulab.  These virtual nodes can currently only run
FreeBSD, but Linux support is coming.
</p><p>
Virtual nodes fall between simulated nodes (a la <code>
    <a href="docwrapper.php3?docname=nse.html">ns</a></code>)
and real, dedicated machines in terms of accuracy of modeling the real world.
A virtual node is just a lightweight virtual machine running on top of
a regular operating system.  In particular, our virtual nodes are based
on the FreeBSD <code>jail</code> mechanism, which allows groups of processes
to be isolated from each other while running on the same physical machine.
Emulab virtual nodes provide isolation of the filesystem, process, network,
and account namespaces.  That is to say, each virtual node has its own
private filesystem, process hierarchy, network interfaces and IP addresses,
and set of users and groups.  This level of virtualization allows unmodified
applications to run as though they were on a real machine.  Virtual network
interfaces are used to form an arbitrary number of virtual network links.
These links may be individually shaped and may be multiplexed over physical
links or used to connect virtual nodes within a single physical node.
</p><p>
With <a href="#Limitations">some limitations</a>, virtual nodes can act in
any role that a normal Emulab node can: end node, router, or traffic generator.
You can run startup commands, ssh into them, run as root, use tcpdump or
traceroute, modify routing tables, and even reboot them.  You can construct
arbitrary topologies of links and LANs, even mixing virtual and real nodes.
</p><p>
The number of virtual nodes that can be multiplexed on a single physical
node depends on a variety of factors including the resource requirements
of the application, the type of the underlying node, the bandwidths
of the links you are emulating and the desired fidelity of the emulation.
See the <a href="#AdvancedIssues">Advanced Issues</a> section for more info.
</p>
<a NAME="Use"></a><h2>Use</h2>
Multiplexed virtual nodes are specified in an NS description by indicating
that you want the <b>pcvm</b> node type:
	<code><pre>
	set nodeA [$ns node]
	tb-set-hardware $nodeA pcvm
	</code></pre>
or, if you want all virtual nodes to be mapped to the same machine type,
say a pc850:
	<code><pre>
	set nodeA [$ns node]
	tb-set-hardware $nodeA pcvm850
	</code></pre>
that is, instead of "pcvm" use "pcvmN" where N is the node type
(600, 850, 1500, 2000).
That's it!  With few exceptions, everything you use in an NS file for an
Emulab experiment running on physical nodes will work with virtual nodes.
The most notable exception is that you cannot specify the operating system
for a virtual node; they are limited to running our custom version of
FreeBSD 4.10.
</p><p>
As a simple example, we could take the <a href="basic.ns">basic NS script</a>
used in the
<a href="docwrapper.php3?docname=tutorial.html#Designing">tutorial</a>
and add the following lines:
	<code><pre>
	tb-set-hardware $nodeA pcvm
	tb-set-hardware $nodeB pcvm
	tb-set-hardware $nodeC pcvm
	tb-set-hardware $nodeD pcvm
	</code></pre>
and remove the explicit setting of the OS:
	<code><pre>
	# Set the OS on a couple.
	tb-set-node-os $nodeA FBSD-STD
	tb-set-node-os $nodeC RHL-STD         
	</code></pre>
and the <a href="vnode-example.ns">resulting NS file</a>
can be submitted to produce the very same topology.
Once the experiment has been instantiated, the experiment web page should
include a listing of the reserved nodes that looks something like:
<p>
  <br>
  <center>
    <img src="vnodes-list.png"><br><br>
  </center>
  <br>
</p>
By looking at the NodeIDs (pcvm36-NN), you can see that all four virtual
nodes were assigned to the same physical node (pc36).
(At the moment, control over virtual node to physical node mapping is
limited.  The <a href="#AdvancedIssues">Advanced Issues</a> section
discusses ways in which you can affect the mapping.)
Clicking on
the ssh icon will log you in to the virtual node.  Virtual nodes do not have
consoles, so there is no corresponding icon.  Note that there is also an entry
for the "hosting" physical node.  You can log in to it as well, either with
ssh or via the console.
See the <a href="#AdvancedIssues">Advanced Issues</a> section for how you
can use the physical host.
Finally, note that there is no "delay node" associated with the shaped
link.  This is because virtual links always use
<a href="../doc/docwrapper.php3?docname=linkdelays.html#LINKDELAYS">
end node shaping</a>.
</p>
<p>
Logging into a virtual node you see only the processes associated with your
jail:
	<code><pre>
        PID  TT  STAT      TIME COMMAND
        1846  ??  IJ     0:00.01 injail: pcvm36-5 (injail)
        1883  ??  SsJ    0:00.03 /usr/sbin/syslogd -ss
        1890  ??  SsJ    0:00.01 /usr/sbin/cron
        1892  ??  SsJ    0:00.28 /usr/sbin/sshd
        1903  ??  IJ     0:00.01 /usr/bin/perl -w /usr/local/etc/emulab/watchdog start
        5386  ??  SJ     0:00.04 sshd: mike@ttyp1 (sshd)
        5387  p1  SsJ    0:00.06 -tcsh (tcsh)
        5401  p1  R+J    0:00.00 ps ax
	</code></pre>
The <code>injail</code> process serves the same function as <code>init</code>
on a regular node; it is the "root" of the process name space.  Killing it
will kill the entire virtual node.  Other standard FreeBSD processes include
<code>syslog</code>, <code>cron</code>, and <code>sshd</code> along with the
Emulab watchdog process.  Note that the process IDs are in fact <i>not</i>
virtualized; they are in the physical machine's name space.  However,
a virtual node still cannot kill a process that is part of another jail.
</p><p>
Doing a <code>df</code> you see:
	<code><pre>
        Filesystem                      1K-blocks      Used   Avail Capacity  Mounted on
        /dev/vn5c                          507999      1484  496356     0%    /
        /var/emulab/jails/local/testbed   6903614     73544 6277782     1%    /local/testbed
        /users/mike                      14081094   7657502 5297105    59%    /users/mike
        ...
	</code></pre>
<code>/dev/vn5c</code> is your private root filesystem, which is a FreeBSD
vnode disk (i.e., a regular file in the physical machine filesystem).
<code>/local/</code><i>projname</i> is "loopback" mounted from the physical
host and provides some disk space that is shared between all virtual nodes
on the same physical node.  Also mounted are the usual Emulab-provided, shared
filesystems.  Thus you have considerable flexibility in sharing, ranging from
shared by all nodes (<code>/users/</code><i>yourname</i> and
<code>/proj/</code><i>projname</i>), to shared by all virtual nodes on a
physical node (<code>/local/</code><i>projname</i>), to private to a virtual
node (<code>/local</code>).

</p><p>
Doing <code>ifconfig</code> reveals:
	<code><pre>
        fxp4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 rtabid 0
                inet 172.17.36.5 netmask 0xffffffff broadcast 172.17.36.5
                ether 00:d0:b7:14:0f:e2
                media: Ethernet autoselect (100baseTX <full-duplex>)
                status: active
        lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384 rtabid 0
                inet 127.0.0.1 netmask 0xff000000 
        veth3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1484 rtabid 5
                inet 10.1.2.3 netmask 0xffffff00 broadcast 10.1.2.255
                ether 00:00:0a:01:02:03
                vethtag: 513 parent interface: <none>
	</code></pre>
Here <code>fxp4</code> is the control net interface.  Due to limited routable
IP address space, Emulab uses the 172.16/12 unroutable address range to assign
control net addresses to virtual nodes.  These addresses are routed within
Emulab, but are not exposed externally.  This means that you can access this
node (including using the DNS name "nodeC.vtest.testbed.emulab.net") from
ops.emulab.net or from other nodes in your experiment, but <i>not</i> from
outside Emulab.  If you need to access a virtual node from outside Emulab,
you will have to proxy the access via ops or a physical node (that is what
the ssh icon in the web page does).  <code>veth3</code> is a virtual ethernet
device (not part of standard FreeBSD, we wrote it at Utah) and is the
experimental interface for this node.  There will be one <code>veth</code>
device for every experimental interface.  Note the reduced MTU (1484) on the
veth interface.  This is because the veth device uses encapsulation to 
identify packets which are multiplexed on physical links.  Even though this
particular virtual link does not cross a physical wire, the MTU is reduced
anyway so that all virtual links have the same MTU.

</p><p>
<a NAME="AdvancedIssues"></a><h2>Advanced Issues</h2>

<a NAME="AI1"></a><h3>Taking advantage of a virtual node host.</h3>
A physical node hosting one or more virtual nodes is not itself part of
the topology, it exists only to host virtual nodes.  However, the physical
node is still set up with user accounts and shared filesystems just as a
regular node is.  Thus you can log in to, and use, the physical node in a
variety of ways:
<ul>
<li> Since the /usr file system for each node is mounted via a read-only
     loopback mount from the physical host, any files installed on a
     physical host's /usr will automatically be part of every virtual node
     as well.  This allows for a potentially more efficient file distribution
     mechanism:
     install packages in the host's /usr and they are visible in the virtual
     nodes as well.  Unfortunately, there is currently no "handle" for a
     virtual node host in the NS file, so you cannot install tarballs or
     RPMs on it as part of the experiment creation process.  You must install
     them by hand after the experiment has been created, and reboot the
     virtual nodes.  Thereafter, the packages will be available.
<li> The private root filesystem for each virtual node is also accessible
     to the host node (see below).  Thus the host can monitor log files and
     even change files on the fly.
<li> Other forms of monitoring can be done as well since all processes,
     filesystems, network interfaces and routing tables are visible in the
     host.  For instance, you can run tcpdump on a virtual interface outside
     the node rather than inside it.  You can also tcpdump on a physical
     interface on which many virtual nodes' traffic is multiplexed.  The
     installed version of tcpdump understands the veth encapsulation.
</ul>
We should emphasize, however, that virtual nodes are not "performance
isolated" from each other or from the host; i.e., a big CPU hogging
monitor application in the host might affect the performance and behavior
of the hosted virtual nodes.
<p>
Following is a list of the per virtual node resources and how they can be
accessed from the physical host:
<ul>
<li> <b>Processes.</b>
     FreeBSD does not distinguish which processes belong to which jail;
     you can only see that a process belongs to <i>some</i> jail, as
     indicated by the 'J' in a ps listing.  The "injail" process for each
     jail does identify itself on the ps command line, so you can trace
     parent/child relationships from there.
<li> <b>Filesystems.</b>
     The private "file" disk for each virtual node is mounted as
     <code>/var/emulab/jails/</code><i>vnodename</i><code>/root</code>
     where <i>vnodename</i> is the "pcvmNN-NN" Emulab name.  The regular
     file that is the disk itself is in the per virtual node directory
     as <code>root.vnode</code>.  The /bin, /sbin, /usr directories
     are read-only loopback mounted from the parent as are the normal
     shared directories in /users and /proj.
<li> <b>Network Interfaces.</b>
     All virtual network interfaces are visible using <code>ifconfig</code>.
     Identifying which interfaces belong to a particular virtual node must
     be done by hand, most easily by first logging into the virtual node in
     question and doing <code>ifconfig</code>.  You can also look at
     <code>/var/emulab/jails/</code><i>vnodename</i><code>/rc.ifc</code>
     which is the startup script used to configure the node's interfaces.
     In addition to the usual information, <code>ifconfig</code> on a
     virtual device also shows which route table (rtabid), broadcast
     domain (vethtag), and parent device (parent) it is associated with.
     See <a href="#TechDetails">Technical Details</a> below for what
     these mean.
<li> <b>Routing tables.</b>
     Every virtual node has its own IP routing table.  Each table is
     identified by an ID, the "rtabid."  Tables can be viewed in the
     parent using <code>netstat</code> with the enhanced '-f inet' option:
		<code><pre>
		netstat -ran -f inet
		netstat -ran -f inet:3
		netstat -ran -f inet:-1
		</code></pre>
     The first form shows IPv4 routes in the "main" (physical host's)
     routing table.  The second would show routing table 3, and the last
     shows all active routing tables.  Routing tables may be modified using
     the <code>route</code> command with the new '-rtabid N' option, where
     N is the rtabid:
		<code><pre>
		route add -rtabid 3 -net 192.168/16 -interface lo0
		</code></pre>
</ul>

<a NAME="AI2"></a><h3>Controlling virtual node layout.</h3>
<p>
Normally, the Emulab resource mapper, <code>assign</code>,
will map virtual nodes onto physical
nodes in such a way as to achieve the best overall use of physical resources
without violating any of the constraints of the virtual nodes or links.
In a nutshell, it packs as many virtual nodes onto a physical node as it
can without exceeding a node's internal or external network bandwidth
capabilities and without exceeding a node-type specific static packing
factor.  Internal network bandwidth is an empirically derived value for
how much network data can be moved through internally connected virtual
ethernet interfaces.  External network bandwidth is determined by the number
of physical interfaces available on the node.  The static packing factor is
intended as a coarse metric of CPU and memory load that a physical node
can support; currently it is based strictly on the amount of physical memory
in each node type.  The current values for these constraints are:
<ul>
<li>Internal network bandwidth: 400Mb/sec for all node types
<li>External network bandwidth: 400Mb/sec (4 x 100Mb NICs) for all node types
<li>Packing factor: 10 for pc600s and pc1500s, 20 for pc850s and pc2000s
</ul>
</p><p>
The mapper generally produces an "unsurprising" mapping of virtual nodes
to physical nodes (e.g., mapping small LANs all on the same physical host)
and where it doesn't, it is usually because doing so would violate one
of the constraints.  One exception involves LANs.
</p><p>
One might think that an entire 100Mb LAN, regardless of the number of
members, could be located on a single physical host since the internal
bandwidth of a host is 400Mb/sec.  Alas, this is not the case.  A LAN
is modeled in Emulab as a set of point-to-point links to a "LAN node."
The LAN node will then see 100Mb/sec from every LAN member.  For the
purposes of bandwidth allocation, a LAN node must be mapped to a physical
host just as any other node.  The difference is that a LAN node may be
mapped to a switch, which has "unlimited" internal bandwidth,
as well as to a node.  Now consider the case of a 100Mb/sec LAN with 5 members.
If the LAN node is colocated with the other nodes on the same physical
host, it is a violation as 500Mb/sec of bandwidth is required for
the LAN node.  If instead the LAN node is mapped to a switch, it is
still a violation because now we need 500Mb/sec from the physical node
to the switch, but there is only 400Mb/sec available there as well.
Thus you can only have 4 members of a 100Mb/sec LAN on any single physical
host.  You can however have 4 members on each of many physical hosts to
form a large LAN, in this case the LAN node will be located on the switch.
Note that this discussion applies equally to 8 members on a 50Mb/sec LAN,
20 members of a 20Mb LAN, or any LAN where the aggregate bandwidth
exceeds 400Mb/sec.  And of course, you must take into consideration
the bandwidth of all other links and LANs on a node.
Now you know why we have a complex program to do this!
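As a rough illustration of the arithmetic (a sketch, not a tested recipe),
one way to keep a larger LAN colocatable on a single host is to lower the
per-member bandwidth so that the aggregate stays within the 400Mb/sec budget;
here an 8-member LAN at 40Mb/sec yields a 320Mb/sec aggregate:
	<code><pre>
	# Sketch: 8 members x 40Mb/sec = 320Mb/sec aggregate, which fits
	# within a single host's 400Mb/sec internal bandwidth budget.
	set lanstr ""
	for {set j 1} {$j <= 8} {incr j} {
	        set n($j) [$ns node]
	        append lanstr "$n($j) "
	        tb-set-hardware $n($j) pcvm
	}
	set lan [$ns make-lan "$lanstr" 40Mb 0ms]
	</code></pre>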
</p><p>
Anyway, if you are still not deterred and feel you can do a better job
of virtual to physical node mapping yourself, there are a few ways to
do this.  Note carefully though that none of
these will allow you to violate the bandwidth and packing constraints
listed above.
</p><p>
The NS-extension <code>tb-set-colocate-factor</code> command allows you
to globally decrease (not increase!) the maximum number of virtual nodes
per physical node.  This command is useful if you know the application
load you are running in the vnodes is going to require more resources
per instance (e.g., a Java DHT), and that the Emulab-picked values of
10-20 per physical node are just too high.
Note that currently, this is not really a "factor";
it is an absolute value.  Setting it to 5 will reduce the capacity of
all node types to 5, whether they were 10 or 20 by default.
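For example, a minimal sketch of the usage (assuming, as the description
above implies, that the command takes the new cap as its single argument):
	<code><pre>
	# Sketch: cap every physical host at 5 virtual nodes, regardless
	# of whether its default packing factor is 10 or 20.
	tb-set-colocate-factor 5

	for {set j 1} {$j <= 20} {incr j} {
	        set n($j) [$ns node]
	        tb-set-hardware $n($j) pcvm
	}
	</code></pre>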
</p><p>
If the packing factor is ok, but <code>assign</code>
just won't colocate virtual nodes the way you want,
you can resort to trying to do the mapping by hand using
<code>tb-fix-node</code>.  This technique is not for the faint of heart
(or weak of stomach) as it involves mapping virtual nodes to specific
physical nodes, which you must determine in advance are available.
For example, the following code snippet will allocate 8 nodes in a LAN
and force them all onto the same physical host (pc41):
	<code><pre>
        set phost       pc41    ;# physical node to use
        set phosttype   850     ;# type of physical node, e.g. pc850

        # Force virtual nodes in a LAN to one physical host
        set lanstr ""
        for {set j 1} {$j <= 8} {incr j} {
                set n($j) [$ns node]
                append lanstr "$n($j) "
                tb-set-hardware $n($j) pcvm${phosttype}
                tb-fix-node $n($j) $phost
        }
        set lan [$ns make-lan "$lanstr" 10Mb 0ms]
	</code></pre>
If the host is not available, this will fail.  Note again that "fixing"
nodes will still not allow you to violate any of the fundamental
mapping constraints.
</p><p>
There is one final technique that will allow you to circumvent
<code>assign</code> and the bandwidth constraints above.
The NS-extension <code>tb-set-noshaping</code> can be used to turn off
link shaping for a specific link or LAN, e.g.:
	<code><pre>
	tb-set-noshaping $lan 1
	</code></pre>
added to the NS snippet above would allow you to specify "1Mb" for the
LAN bandwidth and map 20 virtual nodes to the same physical host,
but then not be bound by the bandwidth constraint later.
In this way <code>assign</code> would map your topology, but no enforcement
would be done at runtime.  Specifically, this tells Emulab not to set
up ipfw rules and dummynet pipes on the specified interfaces.
One semi-legitimate use
of this command is the case where you know that your applications
will not exceed a certain bandwidth, and you don't want to incur the
ipfw/dummynet overhead associated with explicitly enforcing the limits.
Note that, as implied by the name, this turns off all shaping of a link,
not just the bandwidth constraint.  So if you need delays or packet loss,
don't use this.
<a NAME="AI3"></a><h3>How do I know what the right colocate factor is?</h3>
The hardest issue when using virtual nodes is determining how many
virtual nodes you can colocate on a physical node, without affecting the
fidelity of the experiment.  Ultimately, the experimenter must make
this decision, based on the nature of the applications run and what exactly
is being measured.  We provide some simple limits (e.g., network bandwidth
caps) and coarse-grained aggregate limits (e.g., the default colocation factor)
but these are hardly adequate.
<p>
One thing to try is to allocate a modest-sized version of your experiment,
say 40-50 nodes, using just physical nodes, and compare that to the same
experiment with 40-50 virtual nodes at various packing factors.
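One way to organize such a comparison (a sketch; the <code>usevnodes</code>
and <code>maxcolocate</code> knobs are illustrative names, not Emulab
commands) is to parameterize the NS file so the same topology can be
swapped in with physical nodes or with virtual nodes at a chosen packing
factor:
	<code><pre>
	# Sketch: flip these two knobs between swap-ins to compare a
	# physical-node baseline against virtual nodes at various packings.
	set usevnodes   1     ;# 0 = physical baseline, 1 = virtual nodes
	set maxcolocate 10    ;# packing cap to try when using virtual nodes

	if {$usevnodes} {
	        tb-set-colocate-factor $maxcolocate
	}
	for {set j 1} {$j <= 40} {incr j} {
	        set n($j) [$ns node]
	        if {$usevnodes} {
	                tb-set-hardware $n($j) pcvm
	        } else {
	                tb-set-node-os $n($j) FBSD-STD
	        }
	}
	</code></pre>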
</p><p>
We are currently working on techniques that will allow you to specify
some performance constraints in some fashion, and have the experiment
run and self-adjust until it reaches a packing factor that doesn't violate
those constraints.
<a NAME="AI4"></a><h3>Mixing virtual and physical nodes.</h3>
It is possible to mix virtual nodes and physical nodes in the same
experiment.  For example, we could setup a LAN, similar to the above example,
such that half the nodes were virtual (pcvm) and half physical (pc):
	<code><pre>
        set lanstr ""
	for {set j 1} {$j <= 8} {incr j} {
	        set n($j) [$ns node]
		append lanstr "$n($j) "
		if {$j & 1} {
		        tb-set-hardware $n($j) pcvm
		} else {
		        tb-set-hardware $n($j) pc
                        tb-set-node-os $n($j) FBSD-STD
		}
	}
	set lan [$ns make-lan "$lanstr" 10Mb 0ms]
	</code></pre>
The current limitation is that the physical nodes must run FreeBSD because
of the use of the custom encapsulation on virtual ethernet devices.  Note
that this also implies that the physical nodes use virtual ethernet devices
and thus the MTU is likewise reduced.
<p>
We have also implemented a non-encapsulating version of the
virtual ethernet interface that allows virtual nodes to talk directly to
physical ethernet interfaces and thus remove the reduced-MTU restriction.
To use the non-encapsulating version, put:
	<code><pre>
	tb-set-encapsulate 0
	</code></pre>
in your NS file.
<a NAME="Limitations"></a><h2>Limitations</h2>
Following are the primary limitations of the Emulab virtual node
implementation.
<ul>
<li> <b>Not a complete virtualization of a node.</b>
     We make no claims about being a true x86 or even BSD/Linux
     virtual machine.  We build on an existing mechanism (jail) with
     the primary goal of providing functional transparency to applications.
     We are even more lax in that we assume that all virtual nodes on a
     physical host belong to the same experiment.  This reduces the security
     concerns considerably.  For example, if a virtual node is able to crash
     the physical machine or is able to see data outside its scope, it only
     affects the particular experiment.  This is not to say that we are
     egregious in our violations of isolation.  A particular example is that
     virtual nodes are allowed to read /dev/mem.  This made it much easier,
     as we did not have to either virtualize /dev/mem or rewrite lots of
     system utilities that use it.  The consequence is that virtual nodes
     can spy on each other if they want.  But then, if you cannot trust
     yourself, who can you trust?
<li> <b>Not a complete virtualization of the network.</b>
     This is another aspect of the previous bullet, but bears special note.
     While we have virtual interfaces and routing tables, much of the
     network stack of a physical host remains shared, in particular all
     the resources used by the higher level protocols.  For example, all
     of the statistics reported by "netstat -s" are global to the node.
<li> <b>No resource guarantees for CPU and memory on nodes.</b>
     We also don't provide complete performance isolation.  We currently
     have no virtual node aware CPU scheduling mechanisms.  Processes in
     virtual nodes are just processes on the real machine and are scheduled
     according to the standard BSD scheduler.  There are also no limits
     on virtual or physical memory consumption by a virtual node.
<li> <b>Nodes must run a specific version of FreeBSD.</b>
     We have hacked the FreeBSD 4.10 kernel mightily to support virtual nodes.
     See
     <a href="../doc/docwrapper.php3?docname=jail.html">this document</a>
     for details, but suffice it to say, making these changes to Linux
     or even other versions of FreeBSD would be a huge task.  We will be
     providing a virtual node environment for Linux, likely using Xen.
<li> <b>Will only scale to low 1000s of nodes.</b>
     We currently have a number of scaling issues that make it impractical
     to run experiments of more than 1000-2000 nodes.  These range from
     algorithmic issues in the resource mapper and route calculator, to
     physical issues like too few and too feeble physical nodes, to
     user interface issues like how to present a listing or visualization
     of thousands of nodes in a useful way.
<li> <b>Virtual nodes are not externally visible.</b>
     Due to a lack of routable IP space, virtual nodes are given non-routable
     control net addresses and thus cannot be accessed directly from outside
     Emulab.  You must use a suitable proxy or access them from the Emulab
     user-login server.
<li> <b>Virtual ethernet encapsulation reduces the MTU.</b>
     This is a detail, but of possible importance to people doing network
     experiments.  By default, the veth device reduces
     the MTU by 16 bytes to 1484.  As mentioned, we have a version of the
     interface which does not use encapsulation. 
<li> <b>Only 400Mb of internal "network" bandwidth.</b>
     This falls in the rinky-dink node category.  As most of our nodes
     are based on ancient 100MHz-FSB, sub-GHz technology, they cannot
     host many virtual nodes or high-capacity virtual links.  The next
     wave of cluster machines will be much better in this regard.
<li> <b>No node consoles.</b>
     Virtual nodes do not have a virtual console.  If we discover a need
     for one, we will implement it.
<li> <b>Must use "linkdelays."</b>
     To enable topology-on-a-single-node configurations and to conserve
     physical resources in the face of large topologies, we use on-node
     traffic shaping rather than dedicated traffic shaping nodes.  This
     increases the overhead on the host machine slightly.   To improve
     the fidelity of delays and bandwidth shaping, virtual node hosts
     run their kernel at 1000Hz rather than 100Hz.
     <p>
     One potentially serious
     side-effect of vnode traffic shaping, and linkdelays in general, is
     that dummynet on FreeBSD induces a minimum one clock tick (1ms for a
     1000Hz kernel) delay for <i>any</i> form of traffic shaping.  For
     example, if you had 10 machines connected point to point in a "line",
     you would incur a 10ms delay from one end to the other, even if you
     were only shaping the bandwidth of the links.
</ul>
<a NAME="KnownBugs"></a><h2>Known Bugs</h2>
There is currently a problem with the "loopback" (nullfs) filesystem
mechanism we use to export filesystems to virtual nodes.  It is prone to
deadlock under load.  To be safe, you should do all your logging and
heavy file activity inside the "file" disk (e.g., in /var).
<a NAME="TechDetails"></a><h2>Technical Details</h2>
There is an
<a href="../doc/docwrapper.php3?docname=jail.html">online document</a>
covering some of the details of the FreeBSD implementation of virtual nodes.
There is a more detailed document in the Emulab source code in the file
<code>doc/vnode-impl.txt</code>.