

This document attempts to describe some of the internals of assign.
Much of the simulated annealing code of assign is not documented here
as it was not written by this author.  What is covered is a high level
overview and details of the scoring code.


assign attempts to solve a graph to graph mapping problem using
simulated annealing.  The 'physical' graph consists of nodes (pnodes),
links (plinks), and switches.  The virtual graph consists of nodes
(vnodes) and links (vlinks).  As assign runs it matches vnodes to
pnodes and vlinks to plinks.  An incremental scoring system keeps
track of the running score, both as a quality (reflected as a low
score) and a number of violations.  A correct solution is a solution
with 0 violations.  A good solution is a correct solution with a low
score.  Assign uses simulated annealing to attempt to minimize both
violations and score.

Simulated Annealing

Simulated annealing works by randomly shuffling the state of a problem
around.  In general, changes that result in an increase in score are
rejected while those resulting in a decrease are accepted.  The
problem with this approach is that it is easy to fall into a local
minimum.  To avoid this, simulated annealing adds a random chance of
accepting a change that increases the score.  This chance starts
rather high and steadily decreases as the algorithm proceeds.  The
likelihood of accepting a "bad" change is reflected in the
temperature.  Once the temperature has reached a threshold the
algorithm exits and prints out the best solution found.  A good visual
metaphor is to view the solution space as a hilly landscape with the
altitude reflecting the score.  The current solution is a boulder.
During simulated annealing the landscape is shaken, causing the
boulder to bounce about.  As time goes on the landscape shakes less
and less.
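The acceptance rule described above can be sketched as follows.  This is an illustrative Metropolis-style criterion, not assign's actual code, and the names are hypothetical:

```cpp
#include <cmath>
#include <cstdlib>

// Accept or reject a proposed change.  Improvements are always kept;
// a worse score is kept with probability exp(-delta / temperature),
// which shrinks as the temperature drops.
bool accept_change(double old_score, double new_score, double temperature) {
    double delta = new_score - old_score;
    if (delta <= 0)
        return true;                            // score went down: accept
    double p = std::exp(-delta / temperature);  // chance of keeping a "bad" move
    return std::rand() / (double)RAND_MAX < p;
}
```

As the temperature falls toward the exit threshold, p approaches zero and the loop effectively becomes a pure downhill search.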

Changes and Scoring

The simulated annealing loop interfaces with the scoring system and
the data via two routines, add_node and remove_node.  Both routines
update the data structures appropriately and modify the score and
violations count.  add_node takes a vnode and a pnode to map it to
and returns whether that match is even remotely valid or not.
remove_node just takes a vnode.  The initial state of the mapping is
assumed to be the null mapping; i.e., no vnodes have been mapped and
thus no physical resources are used.  This starts with a violation
count equal to the number of vnodes, as each unmapped vnode causes one
violation.
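A minimal sketch of this interface, with hypothetical types (assign's real routines also validate types and update link scores):

```cpp
#include <map>
#include <string>

// Toy mapping state: the null mapping starts with one violation per
// unmapped vnode; add_node clears one, remove_node restores one.
struct Mapping {
    int violations;
    std::map<std::string, std::string> vnode_to_pnode;

    explicit Mapping(int num_vnodes) : violations(num_vnodes) {}

    bool add_node(const std::string& vnode, const std::string& pnode) {
        vnode_to_pnode[vnode] = pnode;
        --violations;        // one fewer unassigned vnode
        return true;         // the real add_node can reject invalid matches
    }

    void remove_node(const std::string& vnode) {
        vnode_to_pnode.erase(vnode);
        ++violations;        // the vnode is unassigned again
    }
};
```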

The Type System

Each vnode has a type and a valid solution must match each vnode to a
pnode that can support that type.  A vnode has a single type but a
pnode may be able to support several types and even hold several
vnodes at once.  For example, a standard testbed PC with
four interfaces can either be a test node or up to two delay nodes.  A
pnode that has not been matched to a vnode is called virgin and can
take on any of its types.  Once it has been mapped to a vnode it
becomes typed and is locked into a certain type (that of the vnode).
When add_node is called the first thing checked is whether the pnode
has a suitable type for the vnode.  If it is virgin then all its types
are checked, otherwise only its current type is checked.  If the types
can not be matched then add_node returns an invalid condition,
otherwise it continues on.

Each type of a pnode also has a limit.  This is the number of vnodes
of that type that can fit in the pnode.  A vnode can not be placed in a pnode unless there is room left.

Under no conditions is a vnode mapped to a switch.
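The virgin/typed distinction and the per-type limits can be sketched like this (illustrative structures, not assign's real ones):

```cpp
#include <map>
#include <string>

// A pnode supports several types, each with a limit on how many vnodes
// of that type fit.  An empty current_type means the pnode is virgin.
struct Pnode {
    std::map<std::string, int> type_limits;  // type -> max vnodes of that type
    std::string current_type;                // "" while virgin
    int load = 0;                            // vnodes currently mapped here

    bool can_take(const std::string& vtype) const {
        auto it = type_limits.find(vtype);
        if (it == type_limits.end())
            return false;                    // type not supported at all
        if (!current_type.empty() && current_type != vtype)
            return false;                    // locked into a different type
        return load < it->second;            // room left under the limit?
    }

    void take(const std::string& vtype) {    // becomes typed on first use
        current_type = vtype;
        ++load;
    }
};
```

Modeling the testbed PC example: a pnode with limits {pc: 1, delay: 2} can take two delay vnodes, but after the first delay vnode it is locked and can no longer become a pc.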


Links

Each vlink has the amount of bandwidth it uses.  Each plink has its
bandwidth capacity and two arbitrary strings associated with its
ends.  These strings are not used for any scoring purposes; they are
simply passed through for the user's benefit.  A virtual to physical
link mapping comes in one of three types.  DIRECT
links represent physical wires connecting one node directly to
another.  INTRASWITCH links are links that go from a node to a switch
and then from that switch to the destination node.  INTERSWITCH links
go from a node to a switch, then through one or more interconnects to
another switch, and finally to the destination node.  DIRECT mappings
are always one to one, i.e. there is only one plink corresponding to
the vlink.  In INTRASWITCH maps there are two plinks for the vlink;
one to the switch, and one from the switch.  In INTERSWITCH maps there
are at least three (to the switch, at least one interconnect, and from
the final switch to the destination) and can be any number.
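Since the three mapping types differ only in how many plinks carry the vlink, classification can be sketched as (illustrative names, not assign's actual code):

```cpp
#include <cstddef>

enum LinkMapType { DIRECT, INTRASWITCH, INTERSWITCH };

// One plink is a direct wire; two means node-switch-node; three or
// more means node, one or more interconnects, node.
LinkMapType classify(std::size_t num_plinks) {
    if (num_plinks == 1) return DIRECT;
    if (num_plinks == 2) return INTRASWITCH;
    return INTERSWITCH;
}
```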

For interswitch plinks (switch to switch links) any number of vlinks
can use the plink.  For all other plinks only a single vlink is
allowed to use the plink without violations.  Every vlink beyond the
first results in a violation.  Violations also occur for every vlink
that causes the used bandwidth of a plink to exceed its capacity.

The switches and their interconnects are collectively called the
switch fabric.  The shortest path between any two switches is
precomputed.  This means that every interswitch link between the same
two switches will use the same path.  This also means that any
switches connected by multiple links will only use a single link.  For
bandwidth calculating purposes multiple links between switches should
be collated into a single link.


LEDA

Assign makes heavy use of LEDA.  Before getting deeply into the code
it would be a good idea to become familiar with LEDA's parameterized
GRAPHs and other basic data structures.

Overview of the Code

CC Files:

- Contains the simulated annealing loop and all initialization code,
  including main().
- Contains init_score, add_node, remove_node and all associated helper
  routines.
- Contains parse_top which parses and sets up the virtual topology.
- Contains parse_ptop which parses and sets up the physical topology.

Header Files:

common.h - Contains all the parameters, including all the scoring
parameters.

score.h - Defines the datastructures and prototypes for the scoring
code.

physical.h - Defines the physical elements: pnodes, plinks, and switch
fabric structures.

virtual.h - Defines the virtual elements: vnodes and vlinks.

Important Globals

switch_pred_array - The predecessor array to represent the shortest
path between any two switches.

PG - The physical topology.

G - The virtual topology.

bestscore, absbest - The best score this iteration and the absolute
best score.

bestnodes, absnodes - Contains the actual best and absolute best node
assignments.

unassigned_nodes - Priority queue of nodes to be assigned.


Initialization

During the initialization process assign reads in the top and ptop
files, initializes the random number generator, and calculates the
shortest paths between any two switches.  

This last step deserves some comment.  parse_ptop fills out both
PG, the complete physical topology, and SG, the switch fabric.
The SG graph contains slinks and snodes which exist in one-to-one
correspondence with the switches and interswitch links in PG.  Each
slink and snode has a field, mate, which contains the corresponding
edge or node in PG.  Assign sets each edge cost to 10000 minus the
bandwidth of the interswitch link and then uses the DIJKSTRA_T
algorithm from LEDA to compute the shortest paths between all
switches.  These paths are stored in switch_distances.  For a given
switch sw, switch_distances[sw] is a node_array (indexed by node) of
the predecessor node.  This is used by the scoring code to calculate
interswitch paths.  If SCORE_DEBUG is defined then both edge costs and
predecessors are output.
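The precomputation might look like the following sketch in plain STL (the real code uses LEDA's DIJKSTRA_T and its graph types; the names here are illustrative):

```cpp
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Adjacency list: adj[u] holds (neighbor, bandwidth) pairs.  Edge cost
// is 10000 minus the bandwidth, so higher-bandwidth links are cheaper.
// Returns the predecessor of each switch on the shortest path from src.
std::vector<int> dijkstra_preds(
        const std::vector<std::vector<std::pair<int, int>>>& adj, int src) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> dist(adj.size(), INF), pred(adj.size(), -1);
    using Item = std::pair<int, int>;  // (distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;     // stale queue entry
        for (auto [v, bw] : adj[u]) {
            int cost = 10000 - bw;
            if (dist[u] + cost < dist[v]) {
                dist[v] = dist[u] + cost;
                pred[v] = u;           // remember the predecessor
                pq.push({dist[v], v});
            }
        }
    }
    return pred;
}
```

Reading the predecessor array back from dst to src yields the interswitch path, which is why every interswitch link between the same pair of switches follows the same route.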

The Main Loop

A strange sequence of calls that exists for historical reasons
ultimately ends up in assign(), which is the main simulated annealing
loop.  This particular code has had at least three and possibly as
many as five different authors and none of them quite understands it.
In essence it randomly chooses an unassigned vnode and then a pnode
and calls add_node to attempt to map them.  If add_node succeeds it
then compares scores and decides whether to accept the mapping.  If no
vnodes are unassigned then it randomly chooses an assigned vnode,
removes it via remove_node, and then randomly maps it to a pnode via
add_node.
The Scoring Code


init_score

This initializes the scoring datastructures.  First it sets score and
violations to 0 and blanks the vinfo structure (vinfo stores counts of
each type of violation).  It then looks at all vnodes, adding the
cost of an unassigned node to the score and adding a violation for
each.  It then goes through all vnodes, vlinks, plinks, and pnodes and
sets their data members appropriately for being unused and unassigned.


add_node

If pnode is virgin
	If a matching type T exists then set type of pnode to T
	Else fail
Else if types match and pnode is not full then we're ok
Else fail

# At this point we know that we can fit the vnode into the pnode with
# all types ok.  We now go onto links.

Foreach vlink in vnode
	If destination is mapped
		# We assign links when both ends are mapped.  Otherwise we leave
		# them unassigned.
		If direct_link exists then map vlink to single direct plink
		Else if vnode and destination are on same switch then
			Find links from vnode to switch to destination and map to them.
		Else if interswitch path exists
			Map vlink to interswitch path
		Else fail
			# It's possible to score this as a violation instead of
			# a failure however it was found to do better when we failed.

# We now have all links set up we just need to do the last few things:
# increment the load, features and desires, count a pnode used if this
# was virgin, count a switch as used if this is the pnode on the
# switch, and reduce the penalty and score for a newly assigned vnode.
Finish setting up pnode


remove_node

# This is almost the exact reverse of add_node
Let pnode be what vnode is mapped to.

Foreach vlink in vnode
	Unassociate all plinks
	Remove score for link type and plink costs.
	Remove any violations for bandwidth.

Remove any violations associated with no connections on this node.
Reduce the load and free up the pnode if the load is now 0.
Free the switch if this was the last node on the switch.
Penalize for an unassigned vnode.
Undo all features and desires.


This routine looks for a direct link from A to B.  It does this in a
pretty basic fashion.  It looks at all edges from A and picks the
edge going to B with the fewest users and least bandwidth used.


Very similar to the direct link case except it goes from A to the
switch A is connected to rather than from A to B.


Extremely basic.  Just calls find_link_to_switch on both src and dst.


Another very basic routine.  Just reads the shortest path between the
src switch and the dst switch off of the switch_preds datastructure.

For non-interswitch links this penalizes users beyond the first.

For all links this keeps track of bandwidth and penalizes going over
bandwidth allowance.
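A sketch of this penalty rule (illustrative structure; the real score_link also applies per-link-type score costs):

```cpp
// Toy plink state for the two penalties described above.
struct Plink {
    bool interswitch;   // switch-to-switch links allow any number of users
    int users;          // vlinks currently mapped onto this plink
    int bw_used;        // bandwidth consumed so far
    int bw_capacity;    // allowed bandwidth
};

// Returns how many new violations mapping a vlink of the given
// bandwidth onto this plink causes.
int score_link(Plink& pl, int vlink_bw) {
    int violations = 0;
    ++pl.users;
    if (!pl.interswitch && pl.users > 1)
        ++violations;               // every user beyond the first
    pl.bw_used += vlink_bw;
    if (pl.bw_used > pl.bw_capacity)
        ++violations;               // this vlink overruns the capacity
    return violations;
}
```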


Exact opposite of score_link.