<!--
   EMULAB-COPYRIGHT
   Copyright (c) 2000-2004 University of Utah and the Flux Group.
   All rights reserved.
  -->
<center>
<h1>Multiplexed Virtual Nodes in Emulab</h1>
</center>

<h2>Contents</h2>
<ul>
<li> <a href="#Overview">Overview</a>
<li> <a href="#Use">Use</a>
<li> <a href="#AdvancedIssues">Advanced Issues</a>
<li> <a href="#Limitations">Limitations</a>
<li> <a href="#KnownBugs">Known Bugs</a>
<li> <a href="#TechDetails">Technical Details</a>
</ul>

<hr>
<a NAME="Overview"></a><h2>Overview</h2>
<p>
In order to allow experiments with a very large number of nodes,
we provide a <i>multiplexed virtual node</i> implementation.
If an experiment application's CPU, memory, and network requirements are
modest, multiplexed virtual nodes (hereafter just "virtual nodes")
allow an experiment to use 10-20 times as many nodes as there are available
physical machines in Emulab.  These virtual nodes can currently only run
FreeBSD, but Linux support is coming.
</p><p>
Virtual nodes fall between simulated nodes (a la <code>ns</code>)
and real, dedicated machines in terms of accuracy of modeling the real world.
A virtual node is just a lightweight virtual machine running on top of
a regular operating system.  In particular, our virtual nodes are based
on the FreeBSD <code>jail</code> mechanism, which allows groups of processes
to be isolated from each other while running on the same physical machine.
Emulab virtual nodes provide isolation of the filesystem, process, network,
and account namespaces.  That is to say, each virtual node has its own
private filesystem, process hierarchy, network interfaces and IP addresses,
and set of users and groups.  This level of virtualization allows unmodified
applications to run as though they were on a real machine.  Virtual network
interfaces are used to form an arbitrary number of virtual network links.
These links may be individually shaped and may be multiplexed over physical
links or used to connect virtual nodes within a single physical node.
</p><p>
With <a href="#Limitations">some limitations</a>, virtual nodes can act in
any role that a normal Emulab node can: end node, router, or traffic generator.
You can run startup commands, ssh into them, run as root, use tcpdump or
traceroute, modify routing tables, and even reboot them.  You can construct
arbitrary topologies of links and LANs, even mixing virtual and real nodes.
</p><p>
The number of virtual nodes that can be multiplexed on a single physical
node depends on a variety of factors including the resource requirements
of the application, the type of the underlying node, the bandwidths
of the links you are emulating and the desired fidelity of the emulation.
See the <a href="#AdvancedIssues">Advanced Issues</a> section for more info.
</p>
<a NAME="Use"></a><h2>Use</h2>
Multiplexed virtual nodes are specified in an ns description by indicating
that you want the <b>pcvm</b> node type:
	<code><pre>
	set nodeA [$ns node]
	tb-set-hardware $nodeA pcvm
	</pre></code>
That's it!  With few exceptions, everything you use in an NS file for an
Emulab experiment running on physical nodes will work with virtual nodes.
The most notable exception is that you cannot specify the operating system
for a virtual node; virtual nodes are limited to running our custom version
of FreeBSD 4.7 (soon to be FreeBSD 4.9).
</p><p>
As a simple example, we could take the <a href="basic.ns">basic ns script</a>
used in the
<a href="docwrapper.php3?docname=tutorial.html#Designing">tutorial</a>
add the following lines:
	<code><pre>
	tb-set-hardware $nodeA pcvm
	tb-set-hardware $nodeB pcvm
	tb-set-hardware $nodeC pcvm
	tb-set-hardware $nodeD pcvm
	</pre></code>
and remove the explicit setting of the OS:
	<code><pre>
	# Set the OS on a couple.
	tb-set-node-os $nodeA FBSD-STD
	tb-set-node-os $nodeC RHL-STD         
	</pre></code>
and the <a href="vnode-example.ns">resulting ns file</a>
can be submitted to produce the very same topology.
Once the experiment has been instantiated, the experiment web page should
include a listing of the reserved nodes that looks something like:
<p>
  <br>
  <center>
    <img src="vnodes-list.png"><br><br>
  </center>
  <br>
</p>
By looking at the NodeIDs (pcvm36-NN), you can see that all four virtual
nodes were assigned to the same physical node (pc36).
(At the moment, control over virtual node to physical node mapping is
limited.  The <a href="#AdvancedIssues">Advanced Issues</a> section
discusses ways in which you can affect the mapping.)
Clicking on
the ssh icon will log you in to the virtual node.  Virtual nodes do not have
consoles, so there is no corresponding icon.  Note that there is also an entry
for the ''hosting'' physical node.  You can login to it as well, either with
ssh or via the console.
See the <a href="#AdvancedIssues">Advanced Issues</a> section for how you
can use the physical host.
Finally, note that there is no ''delay node'' associated with the shaped
link.  This is because virtual links always use
<a href="../doc/docwrapper.php3?docname=linkdelays.html#LINKDELAYS">
end node shaping</a>.
</p>
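<p>
For completeness, a shaped virtual link is specified exactly as it would be
between physical nodes.  The following is a minimal, self-contained sketch
using the standard ns syntax from the tutorial; the node names and the
bandwidth and delay values are illustrative:
	<code><pre>
	set nodeA [$ns node]
	set nodeB [$ns node]
	tb-set-hardware $nodeA pcvm
	tb-set-hardware $nodeB pcvm
	# Shaped link between two virtual nodes; no delay node is allocated
	# because the shaping is done at the end nodes.
	set link0 [$ns duplex-link $nodeA $nodeB 1.5Mb 20ms DropTail]
	</pre></code>
</p>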
<p>
Logging into a virtual node, you see only the processes associated with your
jail:
	<code><pre>
        PID  TT  STAT      TIME COMMAND
        1846  ??  IJ     0:00.01 injail: pcvm36-5 (injail)
        1883  ??  SsJ    0:00.03 /usr/sbin/syslogd -ss
        1890  ??  SsJ    0:00.01 /usr/sbin/cron
        1892  ??  SsJ    0:00.28 /usr/sbin/sshd
        1903  ??  IJ     0:00.01 /usr/bin/perl -w /usr/local/etc/emulab/watchdog start
        5386  ??  SJ     0:00.04 sshd: mike@ttyp1 (sshd)
        5387  p1  SsJ    0:00.06 -tcsh (tcsh)
        5401  p1  R+J    0:00.00 ps ax
	</pre></code>
The <code>injail</code> process serves the same function as <code>init</code>
on a regular node; it is the ''root'' of the process name space.  Killing it
will kill the entire virtual node.  Other standard FreeBSD processes include
<code>syslog</code>, <code>cron</code>, and <code>sshd</code> along with the
Emulab watchdog process.  Note that the process IDs are in fact <i>not</i>
virtualized; they are in the physical machine's name space.  However,
a virtual node still cannot kill a process that is part of another jail.
</p><p>
Doing a <code>df</code>, you see:
	<code><pre>
        Filesystem                      1K-blocks      Used   Avail Capacity  Mounted on
        /dev/vn5c                          507999      1484  496356     0%    /
        /var/emulab/jails/local/testbed   6903614     73544 6277782     1%    /local/testbed
        /users/mike                      14081094   7657502 5297105    59%    /users/mike
        ...
	</pre></code>
<code>/dev/vn5c</code> is your private root filesystem, which is a FreeBSD
vnode disk (i.e., a regular file in the physical machine filesystem).
<code>/local/</code><i>projname</i> is ''loopback'' mounted from the physical
host and provides some disk space that is shared between all virtual nodes
on the same physical node.  Also mounted are the usual Emulab-provided, shared
filesystems.  Thus you have considerable flexibility in sharing, ranging from
filesystems shared by all nodes (<code>/users/</code><i>yourname</i> and
<code>/proj/</code><i>projname</i>), to those shared by all virtual nodes on a
physical node (<code>/local/</code><i>projname</i>), to those private to a
virtual node (<code>/local</code>).

</p><p>
Doing <code>ifconfig</code> reveals:
	<code><pre>
        fxp4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 rtabid 0
                inet 172.17.36.5 netmask 0xffffffff broadcast 172.17.36.5
                ether 00:d0:b7:14:0f:e2
                media: Ethernet autoselect (100baseTX <full-duplex>)
                status: active
        lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384 rtabid 0
                inet 127.0.0.1 netmask 0xff000000 
        veth3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1484 rtabid 5
                inet 10.1.2.3 netmask 0xffffff00 broadcast 10.1.2.255
                ether 00:00:0a:01:02:03
                vethtag: 513 parent interface: <none>
	</pre></code>
Here <code>fxp4</code> is the control net interface.  Due to limited routable
IP address space, Emulab uses the 172.16/12 unroutable address range to assign
control net addresses to virtual nodes.  These addresses are routed within
Emulab, but are not exposed externally.  This means that you can access this
node (including using the DNS name ''nodeC.vtest.testbed.emulab.net'') from
ops.emulab.net or from other nodes in your experiment, but <i>not</i> from
outside Emulab.  If you need to access a virtual node from outside Emulab,
you will have to proxy the access via ops or a physical node (that is what
the ssh icon in the web page does).  <code>veth3</code> is a virtual ethernet
device (not part of standard FreeBSD; we wrote it at Utah) and is the
experimental interface for this node.  There will be one <code>veth</code>
device for every experimental interface.  Note the reduced MTU (1484) on the
veth interface.  This is because the veth device uses encapsulation to 
identify packets that are multiplexed on physical links.  Even though this
particular virtual link does not cross a physical wire, the MTU is reduced
anyway so that all virtual links have the same MTU.

</p>
<a NAME="AdvancedIssues"></a><h2>Advanced Issues</h2>

<h3>Taking advantage of a virtual node host.</h3>
A physical node hosting one or more virtual nodes is ''invisible'' to the
topology; it exists only to host virtual nodes.  You can, however, use the
physical node in a variety of ways: monitoring, proxying, and efficient
distribution of files.  Useful things to know in this regard include where
the virtual node filesystems live on the host, what is shared between the
host and its virtual nodes, how to monitor the <code>veth</code> and physical
ethernet devices, and how to examine the routing tables.

<h3>Controlling virtual node layout.</h3>
Several NS directives give you control over how virtual nodes are laid out on
physical hosts: <code>tb-set-colocate-factor</code> limits how many virtual
nodes are packed onto a single physical node, <code>tb-fix-node</code> binds
a virtual node to a particular physical node, and <code>tb-set-jail-os</code>
selects the OS image used for the jails.  The bandwidths of your virtual
links also affect the layout, since they determine how many virtual links can
be multiplexed onto a physical link.  There is no single "right" colocate
factor; it depends on the CPU, memory, and bandwidth requirements of your
application and on the fidelity you need (a sketch of these directives
follows).
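The fragment below is only a sketch; the colocate factor, the physical node
name (pc41), and the jail OS identifier (FBSD-JAIL) are illustrative values,
not recommendations:
	<code><pre>
	# Pack at most ten virtual nodes onto any one physical node.
	tb-set-colocate-factor 10

	# Pin a particular virtual node to a specific physical machine
	# (pc41 is an illustrative name).
	tb-fix-node $nodeA pc41

	# Select the OS image used for the jailed nodes (illustrative OSID).
	tb-set-jail-os FBSD-JAIL
	</pre></code>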
<h3>Mixing virtual and physical nodes.</h3>
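As noted in the <a href="#Overview">Overview</a>, virtual and physical nodes
can be combined in a single topology: simply set the <b>pcvm</b> type on the
nodes you want multiplexed and leave the others alone.  A minimal sketch
(node and link names are illustrative):
	<code><pre>
	set real [$ns node]
	set virt [$ns node]
	# Only the second node is a multiplexed virtual node; the first
	# is allocated a dedicated physical machine.
	tb-set-hardware $virt pcvm
	set mixlink [$ns duplex-link $real $virt 10Mb 0ms DropTail]
	</pre></code>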
<a NAME="Limitations"></a><h2>Limitations</h2>
<ul>
<li> Virtual nodes must run FreeBSD, and a particular version at that.
<li> There are no resource guarantees for CPU and memory.
<li> The veth encapsulation reduces the MTU on virtual links.
<li> The virtual node control net is not externally visible.
<li> Internal "network" bandwidth is limited to 400Mb.
<li> Experiments only scale to the low 1000s of nodes due to various
     bottlenecks (assign, NFS, routing).
<li> Virtual nodes have no consoles.
<li> Virtual links always use linkdelays (more overhead, requires a 1000Hz
     kernel).
<li> The virtualization is not complete; many commands "see through" it.
</ul>
<a NAME="KnownBugs"></a><h2>Known Bugs</h2>
Deadlocks in loopback mounts.
<a NAME="TechDetails"></a><h2>Technical Details</h2>
There is an
<a href="../doc/docwrapper.php3?docname=jail.html">online document</a>
covering some of the details of the FreeBSD implementation of virtual nodes.
There is a more detailed document in the Emulab source code in the file
<code>doc/vnode-impl.txt</code>.