[ This file explains how traffic shaping is implemented with the emphasis
on how the delay-agent works. ]


0. Overview

We can shape network links or LANs.  Links can have their characteristics
set either symmetrically (duplex) or asymmetrically (simplex).  LANs can
have characteristics set either uniformly for the entire LAN or individually
per node on the LAN.  Note that shaped LANs are mostly used to emulate
"clouds" where you are modeling the last hop of nodes connected to some
opaque network.  We can shape bandwidth, delay and packet loss rate, and
to a limited extent, queuing behavior.
From the user perspective, links and LANs can be shaped "statically" by
specifying their characteristics once in the NS file, or dynamically by
sending "shaping events" via a web page, client GUI, of the command line
tool "tevc."

Shaping is usually done using a dedicated "delay node" which is interposed
between nodes on a link or LAN.  A single shaping node can shape one link
per two interfaces.  So in Emulab, where nodes typically have four experimental
network interfaces, we can shape two links per shaping node.  For a LAN in
Emulab, one shaping node can handle two nodes connected to the LAN.
More details are given below.

A lower-fidelity method is to shape the links at the end points ("end node
shaping").  Larger networks can be emulated in this way since it doesn't
require a dedicated shaping node for every 1-2 links.

As part of the Flexlab project, Emulab supports shaping LANs in more
specialized ways.  These include the ability to shape traffic between
individual node-pairs and even to shape between individual TCP or UDP
flows among nodes.

Our shaping nodes currently use dummynet, configured via IPFW, running on
FreeBSD.  Much of the terminology below (e.g., "pipe") comes from this
heritage, though hopefully the parameters are general enough for other
implementations.  The particular implementation sets up a layer 2 bridge
between the two interfaces composing a link.  An IPFW rule is setup for
each interface, and a dummynet pipe associated with each rule.  Shaping
characteristics are then applied to those pipes.


1. Specifying shaping.

Shaping can be specified statically in the NS file using (largely) standard
NS commands and syntax.  Some commands were added by us, in particular to
handle LANs.  Commands:

* Create a link between nodes with specified parameters:

	set <link> [$ns duplex-link <node1> <node2> <bw> <del> <q-behavior>]

  and to set loss rate on the link:

	tb-set-link-loss <link> <plr>

  Here the characteristics specified are one-way; i.e., traffic from
  <node1> to <node2> is shaped with the given values, as is traffic from
  <node2> to <node1>.  The result is that a round-trip measurement such
  as ping will see <bw> bandwidth, 2 * <del> delay, and 1-((1-<plr>)**2)
  packet loss rate.
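
  For example, with <plr> = 0.01, a round trip sees a loss rate of
  1 - ((1 - 0.01)**2) = 0.0199, or about 2%.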

* To set simplex (individual direction) parameters on a link:

	tb-set-link-simplex-params <link> <src-node> <del> <bw> <plr>

  As measured from a single node doing a round-trip test, you will observe
  the lesser of the two directional <bw> values, the sum of the directional
  <del> values, and 1 - ((1-<plr1>) * (1-<plr2>)) packet loss.  In effect,
  a duplex link is a degenerate case of the simplex link (duh!).
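
  For example, with simplex parameters of 10ms/1Mb/0.01 in one direction
  and 20ms/2Mb/0.02 in the other (the same values used in the DB examples
  later), a round trip sees 1Mb bandwidth, 30ms delay, and
  1 - (0.99 * 0.98) = 0.0298 packet loss.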

For LANs:

* Create a LAN with N nodes:

	set <lan> [$ns make-lan "<node0> <node1> ... <nodeN>" <bw> <del>]

  and to set loss rate for a LAN:

	tb-set-lan-loss <lan> <loss>

  Here a LAN appears as a set of pairwise links for the purposes of shaping
  characteristics.  Traffic from any node to any other will see the indicated
  values.  Thus, round-trip traffic between any pair of nodes will be the
  same as for an identically shaped link between those nodes.  For the
  remainder of this text, we will generally refer to this as a "symmetrically
  shaped" or simply "symmetric" LAN.

* You can also construct LANs with per-node characteristics:

	set <lan> [$ns make-lan "<n1> <n2> ... <nN>" 100Mb 0ms]
	tb-set-node-lan-params <n1> <lan> <del1> <bw1> <loss1>
	tb-set-node-lan-params <n2> <lan> <del2> <bw2> <loss2>
	...

  However, here the interpretation of the shaping values is slightly different.
  In this case, the characteristics reflect one-way values to and from "the
  LAN."  In other words, it is a duplex-link to the LAN.  In still other words,
  round-trip traffic from <n1> to any other unshaped node on an unshaped LAN
  will see <bw1> bandwidth, 2 * <del1> delay, and 1-((1-<loss1>)**2) packet loss.
  If the other node involved in the round trip is also shaped, then round-trip
  traffic will see:

	bw: lesser of <bw1> and <bw2>
	delay: 2 * <del1> + 2 * <del2>
	plr: 1 - ((1 - <loss1>)**2 * (1 - <loss2>)**2)
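
  For example, with <del1>/<bw1>/<loss1> = 10us/1Mb/0.01 and
  <del2>/<bw2>/<loss2> = 20us/2Mb/0.02, round-trip traffic between <n1>
  and <n2> sees 1Mb bandwidth, 60us of delay, and
  1 - (0.99**2 * 0.98**2) = 0.0587 packet loss.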

  We refer to this case as an "asymmetrically shaped" or "asymmetric" LAN.

      NOTE: This is a bit of a misnomer however, as it is possible
      to use tb-set-node-lan-params to set identical (symmetric)
      characteristics on the node connections, but the observed
      behavior will be different than for a so-called symmetric LAN
      setup with the same characteristics.

  It is also possible for the base LAN to be shaped (characteristics on the
  make-lan method) and for the individual node connections to be shaped (an
  asymmetric symmetric LAN?).  For sanity reasons we won't EVEN go there.

Shaping can also be modified dynamically using the web page or tevc.
No matter the UI, the actual work is done by sending link shaping events
to the shaping client, as described later.

It is possible to force allocation of shaping nodes even for unshaped
links (i.e., links that might be later dynamically shaped) by using the
mustdelay method:

	<link-or-lan> mustdelay

End node shaping can be set globally or per-link/LAN:
	tb-use-endnodeshaping <enable?>
	tb-set-endnodeshaping <link-or-lan> <enable?>
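
  For example, to enable endnode shaping for the entire experiment
  (assuming "1" as the enable value):

	tb-use-endnodeshaping 1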

1a. Is a LAN of two nodes a link?

One interesting issue that arises here, and has implications further down
the line, is whether a LAN of two nodes is equivalent to a link.  The answer,
as you might expect, is "yes and no."  A duplex-link will behave the same
as a symmetric LAN of two nodes.  The two are in fact represented in the DB
and implemented identically.  The same is NOT (quite) true of an asymmetric
LAN of two nodes, because of the different semantics.  The LAN will be
"collapsed" into a link in the DB and the implementation will be the same,
but the characteristics stored in the DB and the resulting observed behavior
will be different, reflecting the differing semantics.


2. Shaping info in the database.

Shaping information is stored in the DB in three tables.  One stores
the virtual ("logical") information, which is essentially as specified in
the NS topology.  The other two store the physical information needed by
either the dedicated shaping node or the experiment nodes themselves (when
endnode shaping is in effect).  This info includes the physical nodes used
for shaping (if any), the interfaces involved, the dummynet pipe numbers
to use, etc.

2a. virt_lans

The logical information is stored in the virt_lans table.  Here, for a
given experiment, there is a row for every endpoint of a link or LAN,
i.e., one row per node involved.  For example, a link between two nodes:

    set link [$ns duplex-link n1 n2 1Mbps 10ms DropTail]
    tb-set-link-loss $link 0.01

would have two rows, one for n1 and one for n2:

    +-------+------+------+---------+------+------+---------+
    | vnode | del  | bw   | loss    | rdel | rbw  | rloss   |
    +-------+------+------+---------+------+------+---------+
    | n1    | 5.00 | 1000 | 0.00501 | 5.00 | 1000 | 0.00501 |
    | n2    | 5.00 | 1000 | 0.00501 | 5.00 | 1000 | 0.00501 |
    +-------+------+------+---------+------+------+---------+

Each row contains the characteristics for "outgoing" or forward traffic on
the endpoint (del/bw/loss) and the characteristics of "incoming" or reverse
traffic on the endpoint (rdel/rbw/rloss).

Since the characteristics specified by the user for a link are for
one-way between the nodes, this DB arrangement requires that the shaping
characteristics be divided up between the nodes (across DB rows) such that
the resulting combination reflects the user-specified values.  For bandwidth,
the value stored in the DB is just what the user gave, since limiting the
BW on both sides is the same as limiting on one side.  For delay, half the
value is associated with each endpoint since delay values are additive.
For loss rate, there is a multiplicative effect so "half" the value means
1 - sqrt(1-<loss>).  Returning to the example above, this means that the
outgoing characteristics for n1, and incoming for n2, will be bw=1000,
delay=5, loss=0.00501.  Since it is a duplex-link, the return path (outgoing
for n2, incoming for n1) will be set the same.
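
A minimal sketch of that split in Tcl (just the arithmetic described
above; this is not Emulab's actual code):

    # Split user-specified duplex-link characteristics into the
    # per-endpoint values stored in virt_lans.
    proc split-characteristics {del bw loss} {
        set edel  [expr {$del / 2.0}]               ;# delay is additive
        set ebw   $bw                               ;# limiting one side is enough
        set eloss [expr {1.0 - sqrt(1.0 - $loss)}]  ;# loss is multiplicative
        return [list $edel $ebw $eloss]
    }
    # split-characteristics 10 1000 0.01  =>  5.0 1000 0.00501...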

For simplex links in which each direction has different characteristics:

    set link [$ns duplex-link $n1 $n2 100Mb 0ms DropTail]
    tb-set-link-simplex-params $link $n1 10ms 1Mb 0.01
    tb-set-link-simplex-params $link $n2 20ms 2Mb 0.02

the characteristics are again split, with the node listed as the source
node uses the "outgoing" fields to store the characteristics for that
direction:

    +-------+-------+------+---------+-------+------+---------+
    | vnode | del   | bw   | loss    | rdel  | rbw  | rloss   |
    +-------+-------+------+---------+-------+------+---------+
    | n1    | 5.00  | 1000 | 0.00501 | 10.00 | 2000 | 0.01005 |
    | n2    | 10.00 | 2000 | 0.01005 | 5.00  | 1000 | 0.00501 |
    +-------+-------+------+---------+-------+------+---------+


For a symmetric delayed LAN (i.e., one in which all node pairs have virtual
duplex links with the indicated characteristics); e.g.:

    set lan [$ns make-lan "n1 n2 n3" 1Mbps 10ms]
    tb-set-lan-loss $lan 0.01

the DB state for the endpoints for each set of nodes is setup as for a
duplex link above:

    +-------+------+------+---------+------+------+---------+
    | vnode | del  | bw   | loss    | rdel | rbw  | rloss   |
    +-------+------+------+---------+------+------+---------+
    | n1    | 5.00 | 1000 | 0.00501 | 5.00 | 1000 | 0.00501 |
    | n2    | 5.00 | 1000 | 0.00501 | 5.00 | 1000 | 0.00501 |
    | n3    | 5.00 | 1000 | 0.00501 | 5.00 | 1000 | 0.00501 |
    +-------+------+------+---------+------+------+---------+

As mentioned in earlier text, a symmetric LAN of two nodes is represented
identically to a duplex link with the same characteristics.

For asymmetric delayed LANs (those with per-node characteristics); e.g.:

    set lan [$ns make-lan "n1 n2 n3" 100Mb 0ms]
    tb-set-node-lan-params $n1 $lan 10us 1Mbps 0.01
    tb-set-node-lan-params $n2 $lan 20us 2Mbps 0.02
    tb-set-node-lan-params $n3 $lan 30us 3Mbps 0.03

the user-specified values are for the "link" between a node and the LAN.
Thus for LANs, the information is not split.  Recalling that the user-
specified values are for traffic both to and from the node, the single row
associated with the connection of the node and the LAN contains those
user-specified values for both the forward and reverse directions:

    +-------+-------+------+---------+-------+------+---------+
    | vnode | del   | bw   | loss    | rdel  | rbw  | rloss   |
    +-------+-------+------+---------+-------+------+---------+
    | n1    | 10.00 | 1000 | 0.01000 | 10.00 | 1000 | 0.01000 |
    | n2    | 20.00 | 2000 | 0.02000 | 20.00 | 2000 | 0.02000 |
    | n3    | 30.00 | 3000 | 0.03000 | 30.00 | 3000 | 0.03000 |
    +-------+-------+------+---------+-------+------+---------+

2b. delays

The delays table stores the "physical" information related to delays when
dedicated shaping nodes are used.  This is information about the physical
instantiation of the virt_lans information and thus only exists when an
experiment is swapped in.  The delays table information is structured for
the benefit of the shaping node for which it is intended.

    NOTE: Shaping nodes do not even exist in the virtual (persistent)
    state of an experiment.  They are assigned when an experiment
    is swapped in, and only physical state tables like delays know
    about them.

For each experiment, there is a single row representing each delayed link
or LAN connection.  Each row has two sets of shaping characteristics, called "pipes".
Each pipe represents traffic flowing in one direction through the shaping
node.  Exactly what that means depends on whether we are shaping a link,
a symmetrically delayed LAN, or an asymmetrically delayed LAN.  Let's look
at some examples.

    NOTE: In the interest of full disclosure, it should be noted that
    the following DB tables were hand-edited for clarity.  In particular,
    many columns are omitted and we currently support only two delayed
    links per physical shaping node.  For the latter, pipe numbers have
    been renumbered to be unique--as though all three LAN nodes were
    delayed by the same shaping node.  In reality the delays table also
    contains the physical node_id of the shaping node, and it is the
    combo of node_id/pipe that is truly unique.

For our example duplex link:

    set link [$ns duplex-link n1 n2 1Mbps 10ms DropTail]
    tb-set-link-loss $link 0.01

we get:

    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | vn1  | p0  | del0  | bw0  | loss0 | vn2  | p1  | del1  | bw1  | loss1 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | n1   | 130 | 10.00 | 1000 | 0.010 | n2   | 140 | 10.00 | 1000 | 0.010 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+

where pipe0 (p0) represents the shaping (del0/bw0/loss0) on the path from
n1 to n2, and pipe1 (p1) represents the shaping (del1/bw1/loss1) on the
path from n2 to n1.  As we see, for the duplex link, both directions are
identical.  For the simplex link:

    set link [$ns duplex-link $n1 $n2 100Mb 0ms DropTail]
    tb-set-link-simplex-params $link $n1 10ms 1Mb 0.01
    tb-set-link-simplex-params $link $n2 20ms 2Mb 0.02

we get:

    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | vn1  | p0  | del0  | bw0  | loss0 | vn2  | p1  | del1  | bw1  | loss1 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | n1   | 130 | 10.00 | 1000 | 0.010 | n2   | 140 | 20.00 | 2000 | 0.020 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+

Here we see the two pipes reflecting the different characteristics.

For our symmetric delayed LAN:

    set lan [$ns make-lan "n1 n2 n3" 1Mbps 10ms]
    tb-set-lan-loss $lan 0.01

we have:

    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | vn1  | p0  | del0  | bw0  | loss0 | vn2  | p1  | del1  | bw1  | loss1 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | n1   | 130 | 5.00  | 1000 | 0.005 | n1   | 140 | 5.00  | 1000 | 0.005 |
    | n2   | 150 | 5.00  | 1000 | 0.005 | n2   | 160 | 5.00  | 1000 | 0.005 |
    | n3   | 110 | 5.00  | 1000 | 0.005 | n3   | 120 | 5.00  | 1000 | 0.005 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+

Notice that this is NOT like the entries in the delays table would be for
duplex links between any two nodes in the LAN.  Instead, the values are
"halved".  Even though the definition of a symmetric shaped LAN leads one
to believe that the connection between any pair of LAN nodes would look
like a duplex link, that isn't the case here.  This is due to the fact that
the implementation of LANs is different than that of links and the delays
table reflects the implementation.  The difference is that, for links,
shaping is between two nodes while, for LANs, the shaping is between a node
and the LAN.  Hence one-way traffic on a link is shaped by a single pipe
(e.g., n1 -> n2 via pipe 130 in the duplex link table) while in a LAN, it
is shaped by two (e.g., n1 -> LAN via pipe 130, LAN -> n2 via pipe 160).
So the values must be different in the two implementations to achieve the
same observed result.
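
For example, one-way traffic from n1 to n2 above traverses pipes 130 and
160, so it sees 5 + 5 = 10ms of delay, a 1000Kbps bandwidth bottleneck, and
1 - ((1 - 0.005)**2) = 0.00998 packet loss--the same observed result as a
link (modulo the rounding of 0.00501 to 0.005 in these hand-edited tables).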

But what if we had a LAN of two nodes; e.g., removing "n3" above?  Then
it is represented exactly like a duplex link.  The two lines you would
get in the delays table by removing "n3" above, are in fact collapsed
into a single entry that looks like the duplex-link example.

For an asymmetric delayed LAN where nodes have individual shaping parameters,
such as:

    set lan [$ns make-lan "n1 n2 n3" 100Mb 0ms]
    tb-set-node-lan-params $n1 $lan 10us 1Mbps 0.01
    tb-set-node-lan-params $n2 $lan 20us 2Mbps 0.02
    tb-set-node-lan-params $n3 $lan 30us 3Mbps 0.03

we get:

    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | vn1  | p0  | del0  | bw0  | loss0 | vn2  | p1  | del1  | bw1  | loss1 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | n1   | 130 | 10.00 | 1000 | 0.010 | n1   | 140 | 10.00 | 1000 | 0.010 |
    | n2   | 110 | 20.00 | 2000 | 0.020 | n2   | 120 | 20.00 | 2000 | 0.020 |
    | n3   | 110 | 30.00 | 3000 | 0.030 | n3   | 120 | 30.00 | 3000 | 0.030 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+

Here the entries are very much like the duplex-link case.  That is because
asymmetric delayed LANs are essentially duplex-links from a node to the LAN.
Thus pipe0 is the path from node to LAN, and pipe1 the path from LAN to node.
Note that since this is a LAN configuration, traffic from one node to another
does traverse two pipes.

If we again remove "n3" to get a LAN of two nodes, the remaining two lines
are again collapsed into one, but the result is NOT the same as for the
simplex-link example.  Instead we get:

    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | vn1  | p0  | del0  | bw0  | loss0 | vn2  | p1  | del1  | bw1  | loss1 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+
    | n1   | 130 | 30.00 | 1000 | 0.030 | n2   | 140 | 30.00 | 1000 | 0.030 |
    +------+-----+-------+------+-------+------+-----+-------+------+-------+

This reflects the behavior that traffic from n1 to n2 will see, for example,
10ms delay from n1 to the LAN and then another 20ms (30ms total) from the
LAN to n2.


2c. linkdelays

The linkdelays table is the analog of the delays table for cases where
endnode shaping is used.  In other words, linkdelays entries exist for
links and LANs that have endnode shaping specified; delays table entries
exist for all others.

The structure of linkdelays is very similar to that of delays.
As with delays, the entries only exist when an experiment is swapped in.
Again, for each experiment, there is a single row representing a delayed
link or LAN connection and each row has two pipes and the associated
characteristics.  However, here the pipes represent traffic out of the
node ("pipe", analogous to delays "pipe0") and traffic into the node
("rpipe" analogous to delays "pipe1").

One interesting difference is in how links are represented.  Instead of
two pipes on the shaping node (and thus one delays table entry), there is
now one pipe on each endnode (and thus two linkdelays table entries).
The entries for our example duplex link look like:

    +-------+----+------+-------+------+-------+-------+------+------+-------+
    | pnode | vn | pipe | del   | bw   | loss  | rpipe | rdel | rbw  | rloss |
    +-------+----+------+-------+------+-------+-------+------+------+-------+
    | pc1   | n1 | 130  | 10.00 | 1000 | 0.010 | 0     | 0.00 | 100  | 0.000 |
    | pc2   | n2 | 130  | 10.00 | 1000 | 0.010 | 0     | 0.00 | 100  | 0.000 |
    +-------+----+------+-------+------+-------+-------+------+------+-------+

Note the odd information for the reverse pipe.  This is because a reverse
pipe is not setup as there is no shaping to do in that direction.  Traffic
from n1 to n2 is shaped on n1 and traffic from n2 to n1 is shaped on n2.

The simplex link example is similar:

    +-------+----+------+-------+------+-------+-------+------+------+-------+
    | pnode | vn | pipe | del   | bw   | loss  | rpipe | rdel | rbw  | rloss |
    +-------+----+------+-------+------+-------+-------+------+------+-------+
    | pc1   | n1 | 130  | 10.00 | 1000 | 0.010 | 0     | 0.00 | 100  | 0.000 |
    | pc2   | n2 | 130  | 20.00 | 2000 | 0.020 | 0     | 0.00 | 100  | 0.000 |
    +-------+----+------+-------+------+-------+-------+------+------+-------+


Information for LANs is the same as in the delays table.  For the symmetric
example, all pipes are used and the characteristics are "halved":

    +-------+----+------+-------+------+-------+-------+------+------+-------+
    | pnode | vn | pipe | del   | bw   | loss  | rpipe | rdel | rbw  | rloss |
    +-------+----+------+-------+------+-------+-------+------+------+-------+
    | pc1   | n1 | 110  | 5.00  | 1000 | 0.005 | 120   | 5.00 | 1000 | 0.005 |
    | pc2   | n2 | 110  | 5.00  | 1000 | 0.005 | 120   | 5.00 | 1000 | 0.005 |
    | pc3   | n3 | 110  | 5.00  | 1000 | 0.005 | 120   | 5.00 | 1000 | 0.005 |
    +-------+----+------+-------+------+-------+-------+------+------+-------+

and for the asymmetric LAN:

    +-------+----+------+-------+------+-------+-------+-------+------+-------+
    | pnode | vn | pipe | del   | bw   | loss  | rpipe | rdel  | rbw  | rloss |
    +-------+----+------+-------+------+-------+-------+-------+------+-------+
    | pc1   | n1 | 110  | 10.00 | 1000 | 0.010 | 120   | 10.00 | 1000 | 0.010 |
    | pc2   | n2 | 110  | 20.00 | 2000 | 0.020 | 120   | 20.00 | 2000 | 0.020 |
    | pc3   | n3 | 110  | 30.00 | 3000 | 0.030 | 120   | 30.00 | 3000 | 0.030 |
    +-------+----+------+-------+------+-------+-------+-------+------+-------+


2d. A few words about queues.

Conspicuously absent from the discussion thus far is the topic of queues
and queue lengths.  The NS specification allows queue types and lengths to
be set on links and LANs, and both the virt_lans and delays tables contain
information about queues--I have just ignored it.  This needs to be
addressed; I have just never taken the time to understand the issues.

However, briefly, the default queue size is 50 packets.  That value can be
adjusted or changed to reflect bytes rather than packets.  There are also
some parameters for describing RED queuing as implemented by dummynet.
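
For illustration, these queue parameters translate into dummynet pipe
configuration roughly as follows (the pipe number and values here are
hypothetical; the RED parameters are w_q/min_th/max_th/max_p):

    ipfw pipe 130 config bw 1000Kbit/s queue 50
    ipfw pipe 130 config bw 1000Kbit/s queue 100Kbytes
    ipfw pipe 130 config bw 1000Kbit/s queue 50 red 0.002/5/40/0.1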

The one anomalous case is for the so-called incoming (reverse) path for LANs
in the delays table.  Here the queue size is hardwired to two because
(I believe) bottleneck queuing for the connection between two nodes on the
LAN should take place only once, and that would be at the outgoing pipe.
This is an area that needs to be better understood and described.


3. Shaping info on the client.

To reiterate, shaping clients are most often dedicated "delay nodes,"
but may also be experiment nodes themselves when endnode shaping is used.
Each shaping client runs some delay configuration scripts at boot time to
handle the initial, static configuration of delays.  The client also runs
an instance of the delay-agent to handle dynamic changes to shaping.

The boot time scripts use database information returned via the "tmcc"
database proxy to perform the initial configuration and also to provide
interface configuration to the delay-agent.

3a. Dedicated shaping node

The DB state is returned in the tmcd "delay" command and is a series of
lines, each line looking like:

DELAY INT0=<mac0> INT1=<mac1> \
 PIPE0=<pipe0> DELAY0=<delay0> BW0=<bw0> PLR0=<plr0> \
 PIPE1=<pipe1> DELAY1=<delay1> BW1=<bw1> PLR1=<plr1> \
 LINKNAME=<link> \
 <queue0 params> <queue1 params> \
 VNODE0=<node0> VNODE1=<node1> NOSHAPING=<0|1>

This is pretty much a direct reflection of the information in the delays
table, one line per row in the table.

<mac0> and <mac1> are used to identify the physical interfaces which are
the endpoints of the link.  The client runs a program called findif to map
a MAC address into an interface to configure.  Identification is done in
this manner since different OSes have different names for interfaces
(e.g., "em0", "eth0") and even different versions of an OS might label
interfaces in different orders.

<pipe0> and <pipe1> identify the two directions of a link, with <delayN>,
<bwN> and <plrN> being the associated characteristics.  How these are used
is explained below.
<link> is the name of the link as given in the NS file and is used to
identify the link in the delay-agent.

<queueN params> are the parameters associated with queuing which I will
continue to gloss over for now.  Suffice it to say, the parameters pretty
much directly translate into dummynet configuration parameters.

<vnode0> and <vnode1> are the names of the nodes at the end points of the
link as given in the NS file.

The NOSHAPING parameter is not used by the delay agent.  It is used for
link monitoring to indicate that a bridge with no pipes should be setup.

This information is used at boot time to create two files.  One is for
the benefit of the delay-agent and is discussed later.  The other,
/var/emulab/boot/rc.delay, is constructed on dedicated shaping nodes.
This file contains shell commands and is run to configure the bridge
and pipes.  For each delayed link/LAN (i.e., each line of the tmcc delays
information) the two interfaces are bridged together using the FreeBSD
bridge code and IPFW is enabled for the bridge.  Then the assorted IPFW
pipes are configured, again using the information from tmcc.  The result
looks something like this for a link:

    sysctl -w net.link.ether.bridge_cfg=<if0>:69,<if1>:69,
    sysctl -w net.link.ether.bridge=1
    sysctl -w net.link.ether.bridge_ipfw=1
    ...
    ipfw add <pipe0> pipe <pipe0> ip from any to any in recv <if0>
    ipfw add <pipe1> pipe <pipe1> ip from any to any in recv <if1>
    ipfw pipe <pipe0> config delay <del0>ms bw <bw0>Kbit/s plr <plr0> <q0-params>
    ipfw pipe <pipe1> config delay <del1>ms bw <bw1>Kbit/s plr <plr1> <q1-params>
    ...
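
Filled in for the example duplex link from section 2b (the interface names
are hypothetical, and the queue parameter shown is just the default of 50
packets), this would be roughly:

    sysctl -w net.link.ether.bridge_cfg=em0:69,em1:69,
    sysctl -w net.link.ether.bridge=1
    sysctl -w net.link.ether.bridge_ipfw=1
    ipfw add 130 pipe 130 ip from any to any in recv em0
    ipfw add 140 pipe 140 ip from any to any in recv em1
    ipfw pipe 130 config delay 10ms bw 1000Kbit/s plr 0.010 queue 50
    ipfw pipe 140 config delay 10ms bw 1000Kbit/s plr 0.010 queue 50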

Or pictorially:

  +-------+                      +-------+                      +-------+
  |       |                +-----+       +-----+                |       |
  | node0 |---- pipe0 ---->| if0 | delay | if1 |<---- pipe1 ----| node1 |
  |       | del0/bw0/plr0  +-----+       +-----+  del1/bw1/plr1 |       |
  +-------+                      +-------+                      +-------+

In terms of physical connectivity, node0's interface and delay's
interface <if0> are in a switch VLAN together while node1's interface and
delay's <if1> are in another VLAN.

For a LAN, consider a shaping node which is handling two nodes from a
potentially larger LAN.  There would be two lines from tmcd:

DELAY INT0=<mac0> INT1=<mac1> \
 PIPE0=<pipe0> DELAY0=<delay0> BW0=<bw0> PLR0=<plr0> \
 PIPE1=<pipe1> DELAY1=<delay1> BW1=<bw1> PLR1=<plr1> \
 LINKNAME=<lan> \
 <queue0 params> <queue1 params> \
 VNODE0=<node0> VNODE1=<node0> NOSHAPING=<0|1>

DELAY INT0=<mac2> INT1=<mac3> \
 PIPE0=<pipe2> DELAY0=<delay2> BW0=<bw2> PLR0=<plr2> \
 PIPE1=<pipe3> DELAY1=<delay3> BW1=<bw3> PLR1=<plr3> \
 LINKNAME=<lan> \
 <queue2 params> <queue3 params> \
 VNODE0=<node1> VNODE1=<node1> NOSHAPING=<0|1>

Note that LINKNAME is the same in both lines since the shaping node is
handling two nodes in the same LAN.  Also, in each line, VNODE0 and VNODE1
are the same since the two pipes describe connections for the same node
to and from the LAN as indicated in the picture below.

    sysctl -w net.link.ether.bridge_cfg=<if0>:69,<if1>:69,<if2>:70,<if3>:70,
    sysctl -w net.link.ether.bridge=1
    sysctl -w net.link.ether.bridge_ipfw=1
    ...
    ipfw add <pipe0> pipe <pipe0> ip from any to any in recv <if0>
    ipfw add <pipe1> pipe <pipe1> ip from any to any in recv <if1>
    ipfw pipe <pipe0> config delay <del0>ms bw <bw0>Kbit/s plr <plr0> <q0-params>
    ipfw pipe <pipe1> config delay <del1>ms bw <bw1>Kbit/s plr <plr1> queue 5
    ipfw add <pipe2> pipe <pipe2> ip from any to any in recv <if2>
    ipfw add <pipe3> pipe <pipe3> ip from any to any in recv <if3>
    ipfw pipe <pipe2> config delay <del2>ms bw <bw2>Kbit/s plr <plr2> <q2-params>
    ipfw pipe <pipe3> config delay <del3>ms bw <bw3>Kbit/s plr <plr3> queue 5

which looks like:

  +-------+                      +-------+                      +-------+
  |       |                +-----+       +-----+                |       |
  | node0 |---- pipe0 ---->| if0 |       | if1 |<---- pipe1 ----|       |
  |       | del0/bw0/plr0  +-----+       +-----+  del1/bw1/plr1 |       |
  +-------+                      |       |                      |       |
                                 | delay |                      | "lan" |
  +-------+                      |       |                      |       |
  |       |                +-----+       +-----+                |       |
  | node1 |---- pipe2 ---->| if2 |       | if3 |<---- pipe3 ----|       |
  |       | del2/bw2/plr2  +-----+       +-----+  del3/bw3/plr3 |       |
  +-------+                      +-------+                      +-------+
                                                                  |   |
                                                    (to other     |   |
                                                  shaping nodes)  V   V


In terms of physical connectivity, node0's interface and delay's
interface <if0> are in a switch VLAN together as are node1's interface
and delay's <if2>.  On the other side, delay's <if1> and <if3> (along with
interfaces for shaping nodes for any other nodes in the LAN) are in a VLAN
together.  Note that traffic between node0 and node1 will not take a
loopback "shortcut" on delay as the bridges between <if0>/<if1> and
<if2>/<if3> ensure that traffic is pushed out onto the LAN.

    NOTE: Looking at the previous two diagrams, you can see the main
    reason that LANs of two nodes are converted into links.  If the
    LAN example just above had only the two nodes shown and was
    implemented as a true LAN, it would require twice as many shaping
    resources as for a link.  That is, a link has one set of shaping
    pipes between nodes while a LAN of two nodes would have two sets.

    NOTE: These diagrams also show a subtle implementation issue
    with respect to switches.  Consider the shaped link between
    node0 and node1 in the first diagram.  Because the shaping node
    is bridging traffic between its if0 and if1, a side effect is
    that traffic with node0's MAC address will arrive at the switch
    not only at the port to which node0 is attached, but also at the
    port to which delay's if1 is attached.  Ditto for node1's MAC
    on its switch port and delay's if0 port.  Some switches cannot
    handle the same MAC address appearing on multiple ports in
    different VLANs.  These switches have a global (as opposed to
    per-VLAN) MAC table used for "learning" which MACs are behind
    which ports and are confused by the same MAC appearing on
    different ports.

3b. Endnode shaping

The DB state is returned in the tmcd "linkdelay" command and is a series of
lines, each line looking like:

LINKDELAY IFACE=<mac> TYPE=<type> LINKNAME=<link> VNODE=<node> \
 INET=<IP> MASK=<netmask> \
 PIPE=<pipe> DELAY=<delay> BW=<bw> PLR=<plr> \
 RPIPE=<rpipe> RDELAY=<rdelay> RBW=<rbw> RPLR=<rplr> \
 <queue params>

This is pretty much a direct reflection of the information in the linkdelays
table, one line per row in the table.

<mac> is used to identify the physical interface which corresponds to
the endpoint of the link on this node.  As with shaping nodes, the client
uses findif to map the MAC address into an interface to configure.

<type> is the type of the link or LAN, either "simplex" or "duplex".
This is used as an indication as to whether the reverse pipe needs to
be setup (duplex) or not (simplex).  This is an unfortunate overloading
of the terms, as a duplex link will be labeled with TYPE=simplex.

<link> is the name of the link as given in the NS file and is used to
identify the link in the delay-agent.

<node> is the node receiving the info (us).  It is not really needed.

<IP> and <netmask> are no longer important.  They were used to enable
endnode shaping on physical links that were multiplexed using IP aliasing.
These were used along with a local modification to IPFW to apply multiple
rules to an interface based on the network of the "next hop".  We no longer
allow this (though the rules are still setup, see below) as it did not
completely work.

<pipe> and <rpipe> identify the two directions of a link, with <(r)delay>,
<(r)bw>, and <(r)plr> being the associated characteristics.

<queue params> are the parameters associated with queuing which I will
continue to gloss over for now.  Suffice it to say, the parameters pretty
much directly translate into dummynet configuration parameters.

As with the shaping node case, the information is used at boot time to
create two files: one for the delay-agent, discussed in the next section,
and the other for boot-time configuration of the shaping pipes.
/var/emulab/boot/rc.linkdelay is analogous to rc.delay for dedicated shaping
nodes.  It contains shell commands and is run to configure IPFW on the shaped
interfaces.  The result looks something like:

    ifconfig <if> media 100baseTX mediaopt full-duplex
    ipfw add <pipe> pipe <pipe> ip from any to any out xmit <if>
    ipfw pipe <pipe> config delay <del>ms bw <bw>Kbit/s plr <plr> <q-params>
    ipfw add <rpipe> pipe <rpipe> ip from any to any in recv <if>
    ipfw pipe <rpipe> config delay <rdel>ms bw <rbw>Kbit/s plr <rplr> queue 5
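
Filled in for the duplex-link linkdelays example above, as run on pc1 (the
interface name is hypothetical and the queue parameter is assumed to be the
50-packet default; since a link is TYPE=simplex, no reverse pipe is
configured):

    ifconfig em0 media 100baseTX mediaopt full-duplex
    ipfw add 130 pipe 130 ip from any to any out xmit em0
    ipfw pipe 130 config delay 10ms bw 1000Kbit/s plr 0.010 queue 50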


4. Dynamic shaping with the delay-agent.

4a. delay-agent running on dedicated shaping nodes.

The delay-agent uses a mapping file created at boot time to determine
what names are associated with what interfaces and delay pipes.
/var/emulab/logs/delay_mapping contains a line describing each link
which is to be shaped by this node.  Lines look like:

  <linkname> <linktype> <node0> <node1> <if0> <if1> <pipe0> <pipe1>

<linkname> is what is specified in the ns file as the link/lan name.
It is used as the ID for operations (events) on the link/lan.

<linktype> is duplex or simplex.

<node0> and <node1> are the NS names of the nodes which are the endpoints
of a link.  For a LAN, they will be the same name.

<if0> and <if1> are the interfaces *on the delay node* for the two sides
of the link.  <if0> is associated with <node0> and <if1> with <node1>.
For a lan, <if0> is associated with <node0> and <if1> with "the lan"
(see below for more info).

Reviewing the diagrams from the previous section, the shaping setup for
a link (or LAN of two nodes) looks like:

  +-------+                   +-------+                   +-------+
  |       |             +-----+       +-----+             |       |
  | node0 |--- pipe0 -->| if0 | delay | if1 |<-- pipe1 ---| node1 |
  |       |             +-----+       +-----+             |       |
  +-------+                   +-------+                   +-------+

So the delay_mapping file provides the necessary context to map events
for the link <linkname> to IPFW operations on dummynet pipes.

A LAN of 3 or more nodes is considerably different.  Each node will have
two pipes again, one between the node and the delay node and one between
the delay node and "the lan".  The delay_mapping file now looks like:

  <linkname> <linktype> <node0> <node0> <if0> <if1> <pipe0> <pipe1>
  <linkname> <linktype> <node1> <node1> <if2> <if3> <pipe2> <pipe3>
  <linkname> <linktype> <node2> <node2> <if4> <if5> <pipe4> <pipe5>

    NOTE: Of course, our shaping nodes can only handle two links
    each since they have only four interfaces, so there will
    actually be two shaping nodes for a LAN of three nodes.
    But for this explanation, we pretend that one shaping node
    has six interfaces and would have the above lines.

This translates into:

  +-------+                   +-------+                   +-------+
  |       |             +-----+       +-----+             |       |
  | node0 |--- pipe0 -->| if0 |       | if1 |<-- pipe1 ---|       |
  |       |             +-----+       +-----+             |       |
  +-------+                   |       |                   |       |
                              |       |                   |       |
  +-------+                   |       |                   |       |
  |       |             +-----+       +-----+             |       |
  | node1 |--- pipe2 -->| if2 | delay | if3 |<-- pipe3 ---| "lan" |
  |       |             +-----+       +-----+             |       |
  +-------+                   |       |                   |       |
                              |       |                   |       |
  +-------+                   |       |                   |       |
  |       |             +-----+       +-----+             |       |
  | node2 |--- pipe4 -->| if4 |       | if5 |<-- pipe5 ---|       |
  |       |             +-----+       +-----+             |       |
  +-------+                   +-------+                   +-------+

4b. delay-agent on end nodes.

Fill me in...

4c. Dynamic configuration via events.

Emulab events are used to communicate and effect changes on links.
delay-agent specific events have the following arguments.

OBJNAME: the link being controlled.
  The name is of the form <linkname> or <linkname>-<nodename>.
  The former is used for duplex links and symmetrically shaped LANs to
  change the global characteristics.  The latter is used to identify
  specific endpoints of a link or LAN to effect simplex-style changes.

OBJTYPE: the event agent type.
  Always "LINK".
EVENTTYPE: operations on links and LANs.
  One of: RESET, UP, DOWN, MODIFY.

  RESET forces a complete re-running of "delaysetup" which tears down
  all existing dummynet and bridging, and sets it up again.  It is currently
  only used as part of the Flexlab infrastructure below, but is not specific
  to it.

  UP, DOWN will take the indicated link up or down.  Taking a link down
  is done by setting the packet loss rate to 1.  Up returns the plr to
  its previous value.

  MODIFY is used for all other changes:
    PIPE=N: shaping pipe to apply changes to,
    BANDWIDTH=N: bandwidth, measured in kilobits per second,
    DELAY=N: one-way delay measured in milliseconds,
    PLR=N.N: a packet loss probability between 0 and 1,
    LIMIT=N: the maximum length of the bandwidth shaping queue in packets,
    QUEUE-IN-BYTES: indicates that LIMIT is in bytes rather than packets,
    MAXTHRESH, THRESH, LINTERM, Q_WEIGHT: assorted dummynet RED params.

The tevc syntax for sending these events is:

    tevc -e pid/eid now <objname> <eventtype> <args...>

Links and lans are identified by the names given to them in the NS file
(e.g., "link1", "biglan").  Appending the node name with a dash (e.g.,
"link1-n2", "biglan-client5") identifies a particular node's connection
to the indicated link/lan and is used for modifying one-way params.
The object type (LINK) is implied by the object and does not need to be
passed via tevc.
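
For example (the pid/eid and object names here are hypothetical):

    tevc -e myproj/myexp now link1 MODIFY BANDWIDTH=500 DELAY=20 PLR=0.05
    tevc -e myproj/myexp now biglan-client5 MODIFY DELAY=100
    tevc -e myproj/myexp now link1 DOWN
    tevc -e myproj/myexp now link1 UP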


5. Flexlab configuration.

Flexlab has a variety of interesting requirements.  In the "application-
centric" Internet Model (ACIM), it wants to be able to perform traffic
shaping on a per-flow basis.  Here a flow is a combination of protocol,
src and dst IP address and port numbers.  These flow pipes are setup and
torn down on demand as the application being monitored opens and closes
network connections.  Different pipes are used for traffic in each direction.
In addition to bandwidth and delay (no loss rate yet), it also indirectly
tracks the maximum queuing delay of the bottleneck router and wants to be
able to emulate that in dummynet.

There are also two "simple," measurement-driven models.  The simple-static
model uses historic data to do a one-time configuration of the shaping of
links between host pairs.  The simple-dynamic model continuously updates
the link shaping parameters based on current values collected from a
background measurement service.  Both of the simple models require two
simplex pipes per host pair in the experiment, one in each direction. 

Finally, there are some improvements on the simple model which attempt
to recognize bottleneck links in the topology of communicating nodes.
In this so-called "hybrid" model, there is not a complete set of node-pair
shaping pipes for all characteristics.  Instead, a node may share a
bandwidth pipe to a set of destination nodes.  The node may also have a
different shared bandwidth pipe to another set of nodes, or may have
per-node destination bandwidth pipes as well:


N5 <----+                     +----> N2
         \  5Mbps    10Mbps  /
N6 <------+<----- N1 ------>+------> N3
                  /          \
N7 <-------------+            +----> N4
       50Mbps

This is the "shared destination" variant of the hybrid model.  There is also
an analogous (and as of yet unimplemented) shared-source model in which
incoming traffic for a node from a set of nodes shares a bandwidth pipe.
In both of the shared models, bandwidth and delay are still per node-pair
attributes.

It is assumed that the ACIM and simple models will not exist within the
same experiment; i.e., a delay node only has to implement one or the
other at any given time.

While all of these models are configured and managed dynamically using
events and the delay-agents, there are some changes to the NS file
necessary to accommodate Flexlab usage.

A Flexlab experiment must have all nodes in a "cloud" created via the
"make-cloud" method instead of "make-lan":

    set cloud [$ns make-cloud "n1 n2 n3" 100Mbps 0ms]

Make-cloud is roughly equivalent to creating a symmetrically shaped LAN
with no initial shaping (i.e., with mustdelay set to force allocation of
shaping nodes), and with special triggering of a CREATE event (described
below) at boot time, e.g.:

    set cloud [$ns make-lan "n1 n2 n3" 100Mbps 0ms]
    $cloud mustdelay
    $ns at 0 "$cloud CREATE"

A cloud must have at least three nodes as LANs of two nodes are optimized
into a link and links do not give us all the pipes we need, as we will see
soon.

The CREATE event is sent to all nodes in the cloud (rather, to the shaping
node responsible for each node's connection to the underlying LAN) and
creates, internal to the delay-agent, "node pair" pipes for each node to
all other nodes on the LAN.  Actual IPFW rules and dummynet pipes are only
created the first time a per-pair pipe's characteristics are set via the
MODIFY event.  This behavior is in part an optimization, but is also
essential for the hybrid model described later.

There is a corresponding CLEAR event which will destroy all the per-pair
pipes, leaving only the standard delayed LAN setup (node to LAN pipes).

Each node-to-LAN connection has two pipes associated with each possible
destination on the LAN (destinations determined from the /etc/hosts file).
The first pipe is used for shaping bandwidth for the pair.  The second
pipe is used for shaping delay (and eventually packet loss).  While it
might seem that the single pipe from a node to the LAN might be sufficient
for shaping both, the split is needed when operating in the hybrid mode
as described below.  Characteristics of these per-pair pipes cannot be
modified unless a CREATE command has first been executed.

Assuming all IPFW/dummynet pipes have been modified, the cloud snippet
above would translate into a physical setup of:

  +----+                        +-------+                        +-------+
  |    |--- to n2 pipe -->+-----+       +-----+<- from n2 pipe --|       |
  | n1 |--- to n3 pipe -->| if0 |       | if1 |<- from n3 pipe --|       |
  |    |-- from n1 pipe ->+-----+       +-----+<-- to n1 pipe ---|       |
  +----+       (BW)             |       |         (del/plr)      |       |
                                |       |                        |       |
  +----+                        |       |                        |       |
  |    |--- to n1 pipe -->+-----+       +-----+<- from n1 pipe --|       |
  | n2 |--- to n3 pipe -->| if2 | delay | if3 |<- from n3 pipe --| "lan" |
  |    |-- from n2 pipe ->+-----+       +-----+<-- to n2 pipe ---|       |
  +----+       (BW)             |       |         (del/plr)      |       |
                                |       |                        |       |
  +----+                        |       |                        |       |
  |    |--- to n1 pipe -->+-----+       +-----+<- from n1 pipe --|       |
  | n3 |--- to n2 pipe -->| if0 |       | if1 |<- from n2 pipe --|       |
  |    |-- from n3 pipe ->+-----+       +-----+<-- to n3 pipe ---|       |
  +----+       (BW)             +-------+         (del/plr)      +-------+


where the top two pipes in each set of three are the new, per-pair pipes
and the final pipe is the standard shaping pipe which can be thought of
as the "default" pipe through which any traffic flows for which there is
not a specific per-pair setup.  In IPFW, the rules associated with the
per-pair pipes are numbered starting at 60000 and decreasing.  This gives
them higher priority than the default pipes which are numbered above 60000.

One important thing to note is that while bandwidth is shaped on the
outgoing pipe, when a delay value is set on n1 for destination n2, it is
imposed on the link *into* n1.  This is different than for regular LAN
shaping (and for the ACIM model below), where bandwidth, delay and loss
are all applied in one direction.  The reason for the split is explained
in the hybrid-model discussion below.

5a. Simple mode setup:

In both the simple-static and simple-dynamic models, tevc commands are
used to assign characteristics to the various per-pair pipes created above.
In the static case, this is done only at boot time.  In the dynamic case,
it is done periodically throughout the lifetime of the experiment.  To
accomplish this, the tevc MODIFY event is augmented with an additional
DEST parameter.  The DEST parameter is used to identify a specific node
pair pipe (the source is implied by the link object targeted by the tevc
command).  If the DEST parameter is not given, then the modification is
applied to the "default" pipe (i.e., the normal shaping behavior).  For
example:

    tevc -e pid/eid now cloud-n1 MODIFY DEST=10.0.0.2 BANDWIDTH=1000 DELAY=10

Assuming 10.0.0.2 is "n2" in the diagram above, this would change n1's
"to n2 pipe" to shape the bandwidth, and change n1's "from n2 pipe" to
handle the delay.  If a more "balanced" shaping is desired, half of each
characteristic could be applied to both sides via:

    tevc -e pid/eid now cloud-n1 MODIFY DEST=10.0.0.2 BANDWIDTH=1000 DELAY=5
    tevc -e pid/eid now cloud-n2 MODIFY DEST=10.0.0.1 BANDWIDTH=1000 DELAY=5

5b. ACIM mode setup:

ACIM mode is again a dynamic shaping feature.  As if per-node pair pipes
were not enough, here we further add per-flow pipes!  For example, in the
diagram above, the six pipes for n1 might also have a seventh pipe for
"n1 TCP port 10345 to n2 TCP port 80" if a monitored web application running
on n1 were to connect to the web server on n2.  That pipe could then have
specific BW, delay and loss characteristics.

Note that only one pipe is created here to serve bandwidth, delay and loss,
unlike the split of BW from the others on per-pair pipes.  The one pipe is
in the node-to-LAN outgoing direction (i.e., on the left-hand side of the
diagram above).

Higher priority is given to per-flow pipes by numbering the IPFW rules
starting from 100 and working up.  Thus the priority is: per-flow pipe,
per-pair pipe, default pipe.

For an application being monitored with ACIM, the flow pipes are created
for each flow on the fly as connections are formed.  Flows from unmonitored
applications will use the node pair pipes.  Note that this would include
return traffic to the monitored application unless the other end were also
monitored.

The tevc command sports even more parameters to support per-flow pipes.
In addition to the DEST parameter, there are three others needed:

PROTOCOL:
    Either "UDP" or "TCP".

SRCPORT:
    The source UDP or TCP port number.

DSTPORT:
    The destination UDP or TCP port number.

An example follows.  First, a flow pipe must be explicitly created:

    tevc -e pid/eid now cloud-n1 CREATE \
	DEST=10.0.0.2 PROTOCOL=TCP SRCPORT=10345 DSTPORT=80

Note that unlike per-pair pipes, the CREATE call here immediately creates
the associated IPFW rule and dummynet pipe.  A flow pipe will inherit its