assign_internals.txt 16.4 KB
Newer Older
Leigh Stoller's avatar
Leigh Stoller committed
1
#
2
# Copyright (c) 2000-2003 University of Utah and the Flux Group.
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
# 
# {{{EMULAB-LICENSE
# 
# This file is part of the Emulab network testbed software.
# 
# This file is free software: you can redistribute it and/or modify it
# under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
# 
# This file is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Affero General Public
# License for more details.
# 
# You should have received a copy of the GNU Affero General Public License
# along with this file.  If not, see <http://www.gnu.org/licenses/>.
# 
# }}}
Leigh Stoller's avatar
Leigh Stoller committed
22 23
#

24
PLEASE NOTE: This file is somewhat out of date.
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68

This document attempts to describe some of the internals of assign.
Much of the simulated annealing code of assign is not documented here
as it was not written by this author.  What is covered is a high level
overview and details of the scoring code.

Overview
--------

assign attempts to solve a graph to graph mapping problem using
simulated annealing.  The 'physical' graph consists of nodes (pnodes),
links (plinks), and switches.  The virtual graph consists of nodes
(vnodes) and links (vlinks).  As assign runs it matches vnodes to
pnodes and vlinks to plinks.  An incremental scoring system keeps
track of the running score, both as a quality (reflected as a low
score) and a number of violations.  A correct solution is a solution
with 0 violations.  A good solution is a correct solution with a low
score.  Assign uses simulated annealing to attempt to minimize both
violations and score.

Simulated Annealing
-------------------

Simulate annealing works by randomly shuffling the state of a problem
around.  In general changes that result in a increase in score are
ignored while those resulting in a decrease are accepted.  The
problem with this approach is that it is easy to fall into a local
minima.  To avoid this in simulated annealing there is a random chance
of accepting a change with an increase in score.  This chance starts
rather high and as the algorithm proceeds steadily decreases.  The
likelihood of accepting a "bad" change is reflected in the
temperature.  Once the temperature has reach a threshold the algorithm
exits and prints out the best solution found.  A good visual metaphor
is to view the solution space as a hilly landscape with the altitude
reflecting the score.  The current solution is a boulder.  During
simulated annealing the landscape is shaken causing the boulder to
bounce about.  As time goes on the landscape shakes less and less.

Changes and Scoring
-------------------

The simulated annealing loop interfaces with the scoring system and
the data via two routines, add_node and remove_node.  Both routines
update the data structures appropriately and modify the score and
69
violations count.  add_node takes a vnode and a pnode to map it to
70 71 72 73 74 75 76 77 78 79 80 81 82 83
and returns whether that match is even remotely valid or not.
remove_node just takes a vnode.  The initial state of the mapping is
assumed to be the null mapping.  I.e., no vnodes have been mapped and
thus no physical resources used.  This starts with a violation count
equal to the number of vnodes as each unmapped vnode causes one
violation.

The Type System
---------------

Each vnode has a type and a valid solution must match each vnode to a
pnode that can support that type.  A vnode has a single type but a
pnode may be able to support several types and even have multiple
vnodes in a single vnode.  For example, a standard testbed PC with
84 85 86 87 88 89 90 91 92 93 94 95 96 97
four interfaces can either be a test node or up to two delay nodes.
The type system is composed of two classes of types: 'dynamic' types,
and 'static' types. A pnode can always satisfy any of its static
types, but has only one of its dynamic types 'active' at any time.  A
pnode that has not been matched to a vnode using one of its dynamic
types is called virgin and can take on any of its dynamic types.  Once
it has been mapped to a vnode it becomes typed and is locked into a
certain type (that of the vnode).  When add_node is called the first
thing checked is whether the pnode has a suitable type for the vnode.
If the vnode uses a static type, the pnode must simply have available
capacity for that type. Otherwise, dynamic types are checked. If it is
virgin then all its types are checked, otherwise only its current type
is checked.  If the types can not be matched then add_node returns an
invalid condition, otherwise it continues on.
98 99

Each type of a pnode also has a limit.  This is the number of vnodes
100 101
of that type that can fit in the pnode.  A vnode can not be placed in
a pnode unless there is room left.
102 103 104 105 106 107 108 109

Links
-----

Each vlink has the amount of bandwidth is uses.  Each plink has its
bandwidth capacity and two arbitrary strings which are associated with
each end.  These strings are not used for any scoring purposes and are
just passed through, they are their for the use of the user.  A
110
virtual to physical link mapping comes in one of four types.  DIRECT
111 112 113 114 115 116 117 118 119
links represent physical wires connecting one node directly to
another.  INTRASWITCH links are links that go from a node to a switch
and then from that switch to the destination node.  INTERSWITCH links
go from a node to a switch, then through one or more interconnects to
another switch, and finally to the destination node.  DIRECT mappings
are always one to one, i.e. there is only one plink corresponding to
the vlink.  In INTRASWITCH maps there are two plinks for the vlink;
one to the switch, and one from the switch.  In INTERSWITCH maps there
are at least three (to the switch, at least one interconnect, and from
120 121 122 123
the final switch to the destination) and can be any number.  The final
type is a TRIVIAL link.  This type occurs when trying to assign a
vlink between vnodes that are in the same pnode.  TRIVIAL links impose
zero cost and have no data associated with them.
124 125 126 127 128 129 130 131

For interswitch plinks (switch to switch links) any number of vlinks
can use the plink.  For all other plinks only a single vlink is
allowed to use the plink without violations.  Every vlink beyond the
first results in a violation.  Violations also occur for every vlink
that causes the used bandwidth of a plink to exceed the allowed
bandwidth.

132 133 134
A vlink can be marked emulated.  An emulated vlink can share a plink
with other emulated vlinks.

135 136 137 138 139 140 141 142 143
The switches and their interconnects are collectively called the
switch fabric.  The shortest path between any two switches is
precomputed.  This means that every interswitch link between the same
two switches will use the same path.  This also means that any
switches connected by multiple links will only use a single link.  For
bandwidth calculating purposes multiple links between switches should
be collated into a single link.


144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211
PClasses
--------

Assign defines an equivalence relation of physical nodes and uses this
relation to partition the physical space into equivalence classes.
Simulated annealing is then done on the classes rather than the
individual nodes.  In addition a cost is imposed on each pclass used,
causing assign to prefer solutions with similar nodes.

Two pnodes are equivalent if they have identical type information,
identical feature information, and a bijection of links exists which
preserves destination and bandwidth.  I.e. every link on one node can
be matched to a link on the other node such that they have the same
destination and bandwidth.

The equivalence classes are automatically generated during the
initialization step and stored in a dictionary of arrays, with keys as
types.  During annealing a starting pclass is randomly chosen and then
the pclasses are rotated through until a valid class is found.  As the
list of pclasses is chosen by type assign will never look at pclasses
that can not support the type of the node.

A complication arises in pnodes that can contain multiple vnodes.  To
handle this each pclass has a list of all nodes of a given type.  Say
a node of type T is requested from pclass P.  Assign looks at the list
of T nodes in P and chooses the first one.  It then removes that node
from all other type lists in P.  In addition, if the node is now full
it is removed from the T list in P, otherwise it is left at the front.
When a node is unassigned, if it is now empty it is added to the end
of all type lists in its class.  Otherwise, if it still has a vnode
and hence a type, it is only added to the beginning of the list of its
current type.  By this method assign tends to fill nodes completely
before moving on to a new node.


VClasses
--------

VClasses allow the user to specify a class of virtual nodes and then
have the vnodes be of that class.  Each vclass has a weight and a list
of allowable types.  Assign attempts to map nodes such that all nodes
of the same vclass have the same type.

To do this, assign keeps track of the dominant type of a vclass.  This
is the type that the majority of members are.  Every time a member of
a vclass is assigned or unassigned the dominant type and score from
that vclass are recalculated.  The score of a vclass (not the member
nodes) is 0 if all members are of the dominant type, and the weight
otherwise.  A weight of 1 or more also generates a vclass violation.

Due to the nature of assign when a member of a vclass is to be mapped
its type must be chosen before it is fitted.  It would be nice if
assign computed all possible pclasses that could satisfy the member,
based on its possible types, and then loop through that list.  This is
future work.  For the moment, when assign sees a member of a vclass it
chooses a type for it.  A little over half the time it will choose the
dominant class, otherwise it chooses randomly.

Fixed Nodes
-----------

Fixed nodes are a mechanism by which a user can manually specify part
of the mapping themselves.  Fixed nodes show up in assign as a special
case.  If a node is fixed it is marked as such and mapped to the
correct pnode during initialization.  From then on it can never be
unmapped.


212 213 214 215 216 217 218 219 220 221 222 223 224
LEDA
----

Assign makes heavy use of LEDA.  Before getting deeply into the code
it would be a good idea to become familiar with LEDA's parameterized
GRAPHs and other basic data structures.


Overview of the Code
--------------------

CC Files:

225 226 227 228
assign.cc - Contains all initialization code, including main().

anneal.cc - Contains the simulated annealing loop and related
functions.
229 230 231 232 233 234 235 236 237 238

score.cc - Contains init_score, add_node, remove_node and all
associated helper routines.

parse_top.cc - Contains parse_top which parses and sets up the virtual
topology.

parse_ptop.cc - Contains parse_ptop which parses and sets up the
physical topology.

239 240 241 242 243
pclass.cc - Contains the equivalence relation, the code to generate
the pclasses, and the code to maintain the pclasses.

vclass.cc - Contains the code to setup vclasses and calculate score
deltas when nodes are mapped and unmapped..
244

245 246 247
solution.cc - Contains the code to print out solutions and other
graphs.

248 249 250 251 252 253 254 255 256 257 258 259 260
Header Files:

common.h - Contains all the parameters including all the score
amounts.

score.h - Defines the datastructure and prototypes for the scoring
code.

physical.h - Defines the physical elements: pnodes, plinks, and switch
fabric structures.

virtual.h - Defines the virtual elements: vnodes and vlinks.

261 262 263 264
pclass.h - Defines the pclass class.

vclass.h - Defines the vclass class.

265 266 267 268 269 270 271 272 273 274 275 276 277 278

Important Globals
-----------------

switch_pred_array - The predecessor array to represent the shortest
path between any two switches.

PG - The physical topology.

G - The virtual topology.

bestscore, absbest - The bestscore this iteration and the absolute
best score.

279
absnodes - Contains the absolute best solution.
280 281 282

unassigned_nodes - Priority queue of nodes to be assigned.

283 284 285 286 287
type_table - Indexed by type, contains a list of pclasses that can
satisfy that type.

fixed_nodes - Indexed by vname, contains the pname the node is fixed
to.
288 289 290 291 292 293 294 295

Initialization
--------------

During the initialization process assign reads in the top and ptop
files, initializes the random number generator, and calculates the
shortest paths between any two switches.  

296
This next step deserves some comments.  parse_ptop will fill out both
297 298 299 300 301 302 303 304 305 306 307 308 309
the PG, the completely physical topology, and SG, the switch fabric.
The SG graph contains slinks and snodes which exists in one-to-one
correspondence to the switches and interswitch links in PG.  Each
slink and snode has a field, mate, which contains the corresponding
edge or node in PG.  Assign determines edge costs as 10000 minus the
bandwidth of the interswitch link and then uses the DIJKSTRA_T
algorithm from LEDA to compute the shortest paths between all
switches.  These paths are stored in switch_distances.  For a given
switch sw, switch_distances[sw] is a node_array (indexed by node) of
the predecessor node.  This is used by the scoring code to calculate
interswitch paths.  If SCORE_DEBUG is defined the both edge costs and
predecessors are output.

310
Finally assign computes all physical equivalence classes.
311 312 313 314 315 316 317 318

The Main Loop
-------------

A strange sequence of calls that exist for historical reasons
ultimately end up with assign() which is the main simulated annealing
loop.  This particularly code has had at least three and possibly as
many as five different authors and none of them quite understand it.
319 320 321 322 323 324
In essence it randomly chooses an unassigned vnode and then a pclass
that can satisfy the vnode and calls add_node to attempt to map
them.  If add_node succeeds it then compares scores and decides
whether to accept the mapping.  If no vnodes are unassigned then it
randomly chooses an assign vnode, removes it via remove_node and then
randomly maps it to a pnode via add_node.
325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434


The Scoring Code
----------------

init_score()

This initializes the scoring datastructures.  First it sets score and
violations to 0 and blanks the vinfo structure (vinfo stores counts of
each type of violations).  It then looks at all vnodes and add the
cost of an unassigned node to the score and adds a violations.  It
then goes through all vnodes, vlinks, plinks, and pnodes and sets
their data members appropriately for being unused and unassigned.

add_node(vnode,pnode)

If pnode is virgin
	If matching type exists then set type of pnode to T
   Else fail
Else if types match and and pnode is not full then we're ok
Else fail

# At this point we know that we can fit the vnode into the pnode with
# all types ok.  We now go onto links.

Foreach vlink in vnode
	If destination is mapped
		# We assign links when both ends are mapped.  Otherwise we leave
		# them unassigned.
		If direct_link exists then map vlink to single direct plink
		Else if vnode and destination are on same switch then
			Find links from vnode to switch to destination and map to them.
		Else
			If interswitch path exists
				Map vlink to interswitch path
			Else
				Fail
				# It's possible to score this as a violation instead of
				# a failure however it was found to do better when we failed.

# We now have all links set up we just need to do the last few things:
# increment the load, features and desires, count a pnode used if this
# was virgin, count a switch as used if this is the pnode on the
# switch, and reduce the penalty and score for a newly assigned vnode.
Finish setting up pnode


remove_node(vnode)

# This is almost the exist reverse of add_node
Let pnode be what vnode is mapped to.

Foreach vlink in vnode
	Unassociated all plinks
	Remove score for link type and plink costs.
	Remove any violations for bandwidth.

Remove any violations associated with no connections on this node.
Reduce the load and free up pnode is load is now 0.
Free the switch if this was the last node on the switch.
Penalize for an unassigned vnode.
Undo all features and desires.


direct_link(A,B)

This routine looks for a direct link from A to B.  It does this in a
pretty basic fashion.  It looks at all edges from A and looks for the
edge going to B with the least users and less bandwidth used.


find_link_to_switch(A)

Very similar to direct link except is going from A to the switch A is
connected to rather than from A to B.


find_intraswitch_path(src,dst)

Extremely basic.  Just calls find_link_to_switch on both src and dst.


find_interswitch_path(src,dst)

Another very basic routine.  Just reads off the from the switch_preds
datastructure the shortest path between the src switch and the dst
switch.


score_link(edge,interswitchp)

For non interswitch links this penalizes users beyond the first.  

For all links this keeps track of bandwidth and penalizes going over
bandwidth allowance.


unscore_link(edge,interswitch)

Exact opposite of score_link.