<!--
   EMULAB-COPYRIGHT
   Copyright (c) 2000-2004 University of Utah and the Flux Group.
   All rights reserved.
  -->
<center>
<h1>Multiplexed Virtual Nodes in Emulab</h1>
</center>

<h2>Contents</h2>
<ul>
<li> <a href="#Overview">Overview</a>
<li> <a href="#Use">Use</a>
<li> <a href="#AdvancedIssues">Advanced Issues</a>
<li> <a href="#Limitations">Limitations</a>
<li> <a href="#KnownBugs">Known Bugs</a>
<li> <a href="#TechDetails">Technical Details</a>
</ul>

<hr>
<a NAME="Overview"></a><h2>Overview</h2>
<p>
In order to allow experiments with a very large number of nodes,
we provide a <i>multiplexed virtual node</i> implementation.
If an experiment application's CPU, memory and network requirements are
modest, multiplexed virtual nodes (hereafter known simply as "virtual nodes")
allow an experiment to use 10-20 times as many nodes as there are available
physical machines in Emulab.  These virtual nodes can currently only run
FreeBSD, but Linux support is coming.
</p><p>
Virtual nodes fall between simulated nodes (a la <code>ns</code>)
and real, dedicated machines in terms of accuracy of modeling the real world.
A virtual node is just a lightweight virtual machine running on top of
a regular operating system.  In particular, our virtual nodes are based
on the FreeBSD <code>jail</code> mechanism, which allows groups of processes
to be isolated from each other while running on the same physical machine.
Emulab virtual nodes provide isolation of the filesystem, process, network,
and account namespaces.  That is to say, each virtual node has its own
private filesystem, process hierarchy, network interfaces and IP addresses,
and set of users and groups.  This level of virtualization allows unmodified
applications to run as though they were on a real machine.  Virtual network
interfaces are used to form an arbitrary number of virtual network links.
These links may be individually shaped and may be multiplexed over physical
links or used to connect virtual nodes within a single physical node.
</p><p>
With <a href="#Limitations">some limitations</a>, virtual nodes can act in
any role that a normal Emulab node can: end node, router, or traffic generator.
You can run startup commands, ssh into them, run as root, use tcpdump or
traceroute, modify routing tables, and even reboot them.  You can construct
arbitrary topologies of links and LANs, even mixing virtual and real nodes.
</p><p>
The number of virtual nodes that can be multiplexed on a single physical
node depends on a variety of factors including the resource requirements
of the application, the type of the underlying node, the bandwidths
of the links you are emulating and the desired fidelity of the emulation.
See the <a href="#AdvancedIssues">Advanced Issues</a> section for more info.
</p>
<a NAME="Use"></a><h2>Use</h2>
Multiplexed virtual nodes are specified in an ns description by indicating
that you want the <b>pcvm</b> node type:
	<code><pre>
	set nodeA [$ns node]
	tb-set-hardware $nodeA pcvm
	</code></pre>
That's it!  With few exceptions, everything you use in an NS file for an
Emulab experiment running on physical nodes will work with virtual nodes.
The most notable exception is that you cannot specify the operating system
for a virtual node; virtual nodes are limited to running our custom version of
FreeBSD 4.7 (soon to be FreeBSD 4.9).
</p><p>
As a simple example, we could take the <a href="basic.ns">basic ns script</a>
used in the
<a href="docwrapper.php3?docname=tutorial.html#Designing">tutorial</a>
add the following lines:
	<code><pre>
	tb-set-hardware $nodeA pcvm
	tb-set-hardware $nodeB pcvm
	tb-set-hardware $nodeC pcvm
	tb-set-hardware $nodeD pcvm
	</code></pre>
and remove the explicit setting of the OS:
	<code><pre>
	# Set the OS on a couple.
	tb-set-node-os $nodeA FBSD-STD
	tb-set-node-os $nodeC RHL-STD         
	</code></pre>
and the <a href="vnode-example.ns">resulting ns file</a>
can be submitted to produce the very same topology.
Once the experiment has been instantiated, the experiment web page should
include a listing of the reserved nodes that looks something like:
<p>
  <br>
  <center>
    <img src="vnodes-list.png"><br><br>
  </center>
  <br>
</p>
By looking at the NodeIDs (pcvm36-NN), you can see that all four virtual
nodes were assigned to the same physical node (pc36).
(At the moment, control over virtual node to physical node mapping is
limited.  The <a href="#AdvancedIssues">Advanced Issues</a> section
discusses ways in which you can affect the mapping.)
Clicking on
the ssh icon will log you in to the virtual node.  Virtual nodes do not have
consoles, so there is no corresponding icon.  Note that there is also an entry
for the ''hosting'' physical node.  You can login to it as well, either with
ssh or via the console.
See the <a href="#AdvancedIssues">Advanced Issues</a> section for how you
can use the physical host.
Finally, note that there is no ''delay node'' associated with the shaped
link.  This is because virtual links always use
<a href="../doc/docwrapper.php3?docname=linkdelays.html#LINKDELAYS">
end node shaping</a>.
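For reference, a shaped virtual link is declared exactly as a link between
physical nodes would be; the bandwidth, delay, and loss values below are
purely illustrative:
	<code><pre>
	# Shaping is enforced in the end nodes themselves; no separate
	# delay node is allocated for this link.
	set link0 [$ns duplex-link $nodeA $nodeB 1.5Mb 20ms DropTail]
	tb-set-link-loss $link0 0.01
	</code></pre>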
</p>
<p>
Logging into a virtual node, you see only the processes associated with your
jail:
	<code><pre>
        PID  TT  STAT      TIME COMMAND
        1846  ??  IJ     0:00.01 injail: pcvm36-5 (injail)
        1883  ??  SsJ    0:00.03 /usr/sbin/syslogd -ss
        1890  ??  SsJ    0:00.01 /usr/sbin/cron
        1892  ??  SsJ    0:00.28 /usr/sbin/sshd
        1903  ??  IJ     0:00.01 /usr/bin/perl -w /usr/local/etc/emulab/watchdog start
        5386  ??  SJ     0:00.04 sshd: mike@ttyp1 (sshd)
        5387  p1  SsJ    0:00.06 -tcsh (tcsh)
        5401  p1  R+J    0:00.00 ps ax
	</code></pre>
The <code>injail</code> process serves the same function as <code>init</code>
on a regular node; it is the ''root'' of the process name space.  Killing it
will kill the entire virtual node.  Other standard FreeBSD processes include
<code>syslogd</code>, <code>cron</code>, and <code>sshd</code>, along with the
Emulab watchdog process.  Note that the process IDs are in fact <i>not</i>
virtualized; they are in the physical machine's name space.  However,
a virtual node still cannot kill a process that is part of another jail.
</p><p>
Doing a <code>df</code>, you see:
	<code><pre>
        Filesystem                      1K-blocks      Used   Avail Capacity  Mounted on
        /dev/vn5c                          507999      1484  496356     0%    /
        /var/emulab/jails/local/testbed   6903614     73544 6277782     1%    /local/testbed
        /users/mike                      14081094   7657502 5297105    59%    /users/mike
        ...
	</code></pre>
<code>/dev/vn5c</code> is your private root filesystem, which is a FreeBSD
vnode disk (i.e., a regular file in the physical machine filesystem).
<code>/local/</code><i>projname</i> is ''loopback'' mounted from the physical
host and provides some disk space that is shared between all virtual nodes
on the same physical node.  Also mounted are the usual Emulab-provided, shared
filesystems.  Thus you have considerable flexibility in sharing, ranging from
filesystems shared by all nodes (<code>/users/</code><i>yourname</i> and
<code>/proj/</code><i>projname</i>), to those shared by all virtual nodes on a
physical node (<code>/local/</code><i>projname</i>), to space private to a
virtual node (<code>/local</code>).

</p><p>
Doing <code>ifconfig</code> reveals:
	<code><pre>
        fxp4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 rtabid 0
                inet 172.17.36.5 netmask 0xffffffff broadcast 172.17.36.5
                ether 00:d0:b7:14:0f:e2
                media: Ethernet autoselect (100baseTX <full-duplex>)
                status: active
        lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384 rtabid 0
                inet 127.0.0.1 netmask 0xff000000 
        veth3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1484 rtabid 5
                inet 10.1.2.3 netmask 0xffffff00 broadcast 10.1.2.255
                ether 00:00:0a:01:02:03
                vethtag: 513 parent interface: <none>
	</code></pre>
Here <code>fxp4</code> is the control net interface.  Due to limited routable
IP address space, Emulab uses the 172.16/12 unroutable address range to assign
control net addresses to virtual nodes.  These addresses are routed within
Emulab, but are not exposed externally.  This means that you can access this
node (including using the DNS name ''nodeC.vtest.testbed.emulab.net'') from
ops.emulab.net or from other nodes in your experiment, but <i>not</i> from
outside Emulab.  If you need to access a virtual node from outside Emulab,
you will have to proxy the access via ops or a physical node (that is what
the ssh icon in the web page does).  <code>veth3</code> is a virtual ethernet
device (not part of standard FreeBSD; we wrote it at Utah) and is the
experimental interface for this node.  There will be one <code>veth</code>
device for every experimental interface.  Note the reduced MTU (1484) on the
veth interface.  This is because the veth device uses encapsulation to
identify packets that are multiplexed on physical links.  Even though this
particular virtual link does not cross a physical wire, the MTU is reduced
anyway so that all virtual links have the same MTU.

</p>
<a NAME="AdvancedIssues"></a><h2>Advanced Issues</h2>

<h3>Taking advantage of a virtual node host.</h3>
A physical node hosting one or more virtual nodes is not itself part of
the topology; it exists only to host virtual nodes.  However, the physical
node is still set up with user accounts and shared filesystems just as a
regular node is.  Thus you can log in to, and use, the physical node in a
variety of ways:
<ul>
<li> Since the /usr file system for each node is mounted via a read-only
     loopback mount from the physical host, any files installed on a
     physical host's /usr will automatically be part of every virtual node
     as well.  This allows for a potentially more efficient file distribution
     mechanism:
     install packages in the host's /usr and they are visible in the virtual
     nodes as well.  Unfortunately, there is currently no "handle" for a
     virtual node host in the NS file, so you cannot install tarballs or
     RPMs on it as part of the experiment creation process.  You must install
     them by hand after the experiment has been created, and reboot the
     virtual nodes.  Thereafter, the packages will be available.
<li> The private root filesystem for each virtual node is also accessible
     to the host node in
     <code>/var/emulab/jails/</code><i>vnodename</i><code>/root</code>
     where <i>vnodename</i> is the "pcvmNN-NN" Emulab name.  Thus the host
     can monitor log files and even change files on the fly.
<li> Other forms of monitoring can be done as well since all processes,
     filesystems, network interfaces and routing tables are visible in the
     host.  For instance, you can run tcpdump on a virtual interface outside
     the node rather than inside it.  You can also tcpdump on a physical
     interface on which many virtual nodes' traffic is multiplexed.  The
     installed version of tcpdump understands the veth encapsulation.
</ul>
We should emphasize, however, that virtual nodes are not "performance
isolated" from each other or from the host; i.e., a CPU-hogging
monitor application in the host might affect the performance and behavior
of the hosted virtual nodes.

<h3>Controlling virtual node layout.</h3>
<p>
Normally, the Emulab resource mapper, <code>assign</code>,
will map virtual nodes onto physical
nodes in such a way as to achieve the best overall use of physical resources
without violating any of the constraints of the virtual nodes or links.
In a nutshell, it packs as many virtual nodes onto a physical node as it
can without exceeding a node's internal or external network bandwidth
capabilities and without exceeding a node-type specific static packing
factor.  Internal network bandwidth is an empirically derived value for
how much network data can be moved through internally connected virtual
ethernet interfaces.  External network bandwidth is based on the number
of physical interfaces available on the node.  The static packing factor is
intended as a coarse metric of CPU and memory load that a physical node
can support; currently it is based strictly on the amount of physical memory.
The current values for these constraints are:
<ul>
<li>Internal network bandwidth: 400Mb/sec for all node types
<li>External network bandwidth: 400Mb/sec for all node types
<li>Packing factor: 10 for pc600s and pc1500s, 20 for pc850s and pc2000s
</ul>
</p><p>
The mapper generally produces an "unsurprising" mapping of virtual nodes
to physical nodes (e.g., mapping small LANs all on the same physical host)
and where it doesn't, it is usually because doing so would violate one
of the constraints.  However, there are circumstances in which you might
want to modify or even override the way in which mapping is done.
Currently there are only limited ways in which to do this, and none of
these will allow you to violate the constraints above.
</p><p>
Using the NS-extension <code>tb-set-colocate-factor</code> command, you
can globally reduce (not increase!) the maximum number of virtual nodes
per physical node.  This command is useful if you know that the application
load you are running in the vnodes is going to require more resources
per instance (e.g., a Java DHT).
Note that currently this is not really a "factor"; it is an absolute value.
Setting it to 5 will reduce the capacity of all node types to 5, whether
they were 10 or 20 by default, as in the sketch below.
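For example, assuming the single global value described above, a minimal
sketch would be:
	<code><pre>
	# Allow at most 5 virtual nodes per physical host, regardless of
	# the node type's default capacity.
	tb-set-colocate-factor 5
	</code></pre>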
</p><p>
Since <code>assign</code> uses a heuristic algorithm at its core,
sometimes it just doesn't find the solution that you might think
is obvious.  If assign just won't colocate virtual nodes that you want
colocated, you can resort to doing the mapping by hand using
<code>tb-fix-node</code>, as sketched below.
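Assuming <code>tb-fix-node</code> takes a virtual node and the name of the
desired physical node (the node name <code>pc42</code> below is purely
illustrative), a minimal sketch is:
	<code><pre>
	# Pin both virtual nodes to the same physical host.  The resulting
	# mapping must still satisfy the bandwidth and packing constraints
	# described above.
	tb-fix-node $nodeA pc42
	tb-fix-node $nodeB pc42
	</code></pre>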
<i>TODO:
using tb-set-jail-os,
using tb-set-noshaping,
understanding how bandwidth affects layout.
How do I know what the right colocate factor is?
ENDTODO</i>
<h3>Mixing virtual and physical nodes.</h3>
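As noted in the <a href="#Overview">Overview</a>, virtual and real nodes can
be mixed in the same topology; only the nodes given the <b>pcvm</b> type are
multiplexed.  A minimal sketch (the node names and OS name are illustrative):
	<code><pre>
	set nodeA [$ns node]
	set nodeB [$ns node]
	set nodeC [$ns node]

	# nodeA and nodeB are multiplexed virtual nodes...
	tb-set-hardware $nodeA pcvm
	tb-set-hardware $nodeB pcvm

	# ...while nodeC remains a dedicated physical node running an OS
	# of our choosing.
	tb-set-node-os $nodeC RHL-STD

	# A LAN containing both virtual and physical nodes.
	set lan0 [$ns make-lan "$nodeA $nodeB $nodeC" 100Mb 0ms]
	</code></pre>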
<a NAME="Limitations"></a><h2>Limitations</h2>
<i>TODO:
Must run FreeBSD and a particular version at that.
No resource guarantees for CPU and memory.
veth encapsulation reduces MTU.
Vnode control net not externally visible.
400Mb internal "network" bandwidth.
Only scale to low 1000s of nodes due to various bottlenecks
(assign, NFS, routing).
No consoles.
Always use linkdelays (more overhead, requires 1000Hz kernel).
Not a complete virtualization; many commands "see through".
ENDTODO</i>
<a NAME="KnownBugs"></a><h2>Known Bugs</h2>
<i>TODO:
Deadlocks in loopback mounts.
ENDTODO</i>
<a NAME="TechDetails"></a><h2>Technical Details</h2>
There is an
<a href="../doc/docwrapper.php3?docname=jail.html">online document</a>
covering some of the details of the FreeBSD implementation of virtual nodes.
There is a more detailed document in the Emulab source code in the file
<code>doc/vnode-impl.txt</code>.