assign_internals.txt 15.7 KB
 Leigh B. Stoller committed Jul 03, 2002 1 2 3 4 5 6 ``````# # EMULAB-COPYRIGHT # Copyright (c) 2000-2002 University of Utah and the Flux Group. # All rights reserved. # `````` Christopher Alfeld committed Jul 23, 2001 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 `````` This document attempts to describe some of the internals of assign. Much of the simulated annealing code of assign is not documented here as it was not written by this author. What is covered is a high level overview and details of the scoring code. Overview -------- assign attempts to solve a graph to graph mapping problem using simulated annealing. The 'physical' graph consists of nodes (pnodes), links (plinks), and switches. The virtual graph consists of nodes (vnodes) and links (vlinks). As assign runs it matches vnodes to pnodes and vlinks to plinks. An incremental scoring system keeps track of the running score, both as a quality (reflected as a low score) and a number of violations. A correct solution is a solution with 0 violations. A good solution is a correct solution with a low score. Assign uses simulated annealing to attempt to minimize both violations and score. Simulated Annealing ------------------- Simulate annealing works by randomly shuffling the state of a problem around. In general changes that result in a increase in score are ignored while those resulting in a decrease are accepted. The problem with this approach is that it is easy to fall into a local minima. To avoid this in simulated annealing there is a random chance of accepting a change with an increase in score. This chance starts rather high and as the algorithm proceeds steadily decreases. The likelihood of accepting a "bad" change is reflected in the temperature. Once the temperature has reach a threshold the algorithm exits and prints out the best solution found. A good visual metaphor is to view the solution space as a hilly landscape with the altitude reflecting the score. The current solution is a boulder. During simulated annealing the landscape is shaken causing the boulder to bounce about. As time goes on the landscape shakes less and less. Changes and Scoring ------------------- The simulated annealing loop interfaces with the scoring system and the data via two routines, add_node and remove_node. Both routines update the data structures appropriately and modify the score and `````` Christopher Alfeld committed Jul 26, 2001 51 ``````violations count. add_node takes a vnode and a pnode to map it to `````` Christopher Alfeld committed Jul 23, 2001 52 53 54 55 56 57 58 59 60 61 62 63 64 65 ``````and returns whether that match is even remotely valid or not. remove_node just takes a vnode. The initial state of the mapping is assumed to be the null mapping. I.e., no vnodes have been mapped and thus no physical resources used. This starts with a violation count equal to the number of vnodes as each unmapped vnode causes one violation. The Type System --------------- Each vnode has a type and a valid solution must match each vnode to a pnode that can support that type. A vnode has a single type but a pnode may be able to support several types and even have multiple vnodes in a single vnode. For example, a standard testbed PC with `````` Robert Ricci committed Jun 26, 2003 66 67 68 69 70 71 72 73 74 75 76 77 78 79 ``````four interfaces can either be a test node or up to two delay nodes. The type system is composed of two classes of types: 'dynamic' types, and 'static' types. A pnode can always satisfy any of its static types, but has only one of its dynamic types 'active' at any time. A pnode that has not been matched to a vnode using one of its dynamic types is called virgin and can take on any of its dynamic types. Once it has been mapped to a vnode it becomes typed and is locked into a certain type (that of the vnode). When add_node is called the first thing checked is whether the pnode has a suitable type for the vnode. If the vnode uses a static type, the pnode must simply have available capacity for that type. Otherwise, dynamic types are checked. If it is virgin then all its types are checked, otherwise only its current type is checked. If the types can not be matched then add_node returns an invalid condition, otherwise it continues on. `````` Christopher Alfeld committed Jul 23, 2001 80 81 `````` Each type of a pnode also has a limit. This is the number of vnodes `````` Robert Ricci committed Jun 26, 2003 82 83 ``````of that type that can fit in the pnode. A vnode can not be placed in a pnode unless there is room left. `````` Christopher Alfeld committed Jul 23, 2001 84 85 86 87 88 89 90 91 `````` Links ----- Each vlink has the amount of bandwidth is uses. Each plink has its bandwidth capacity and two arbitrary strings which are associated with each end. These strings are not used for any scoring purposes and are just passed through, they are their for the use of the user. A `````` Christopher Alfeld committed May 29, 2002 92 ``````virtual to physical link mapping comes in one of four types. DIRECT `````` Christopher Alfeld committed Jul 23, 2001 93 94 95 96 97 98 99 100 101 ``````links represent physical wires connecting one node directly to another. INTRASWITCH links are links that go from a node to a switch and then from that switch to the destination node. INTERSWITCH links go from a node to a switch, then through one or more interconnects to another switch, and finally to the destination node. DIRECT mappings are always one to one, i.e. there is only one plink corresponding to the vlink. In INTRASWITCH maps there are two plinks for the vlink; one to the switch, and one from the switch. In INTERSWITCH maps there are at least three (to the switch, at least one interconnect, and from `````` Christopher Alfeld committed May 29, 2002 102 103 104 105 ``````the final switch to the destination) and can be any number. The final type is a TRIVIAL link. This type occurs when trying to assign a vlink between vnodes that are in the same pnode. TRIVIAL links impose zero cost and have no data associated with them. `````` Christopher Alfeld committed Jul 23, 2001 106 107 108 109 110 111 112 113 `````` For interswitch plinks (switch to switch links) any number of vlinks can use the plink. For all other plinks only a single vlink is allowed to use the plink without violations. Every vlink beyond the first results in a violation. Violations also occur for every vlink that causes the used bandwidth of a plink to exceed the allowed bandwidth. `````` Christopher Alfeld committed May 29, 2002 114 115 116 ``````A vlink can be marked emulated. An emulated vlink can share a plink with other emulated vlinks. `````` Christopher Alfeld committed Jul 23, 2001 117 118 119 120 121 122 123 124 125 ``````The switches and their interconnects are collectively called the switch fabric. The shortest path between any two switches is precomputed. This means that every interswitch link between the same two switches will use the same path. This also means that any switches connected by multiple links will only use a single link. For bandwidth calculating purposes multiple links between switches should be collated into a single link. `````` Christopher Alfeld committed Jan 15, 2002 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 ``````PClasses -------- Assign defines an equivalence relation of physical nodes and uses this relation to partition the physical space into equivalence classes. Simulated annealing is then done on the classes rather than the individual nodes. In addition a cost is imposed on each pclass used, causing assign to prefer solutions with similar nodes. Two pnodes are equivalent if they have identical type information, identical feature information, and a bijection of links exists which preserves destination and bandwidth. I.e. every link on one node can be matched to a link on the other node such that they have the same destination and bandwidth. The equivalence classes are automatically generated during the initialization step and stored in a dictionary of arrays, with keys as types. During annealing a starting pclass is randomly chosen and then the pclasses are rotated through until a valid class is found. As the list of pclasses is chosen by type assign will never look at pclasses that can not support the type of the node. A complication arises in pnodes that can contain multiple vnodes. To handle this each pclass has a list of all nodes of a given type. Say a node of type T is requested from pclass P. Assign looks at the list of T nodes in P and chooses the first one. It then removes that node from all other type lists in P. In addition, if the node is now full it is removed from the T list in P, otherwise it is left at the front. When a node is unassigned, if it is now empty it is added to the end of all type lists in its class. Otherwise, if it still has a vnode and hence a type, it is only added to the beginning of the list of its current type. By this method assign tends to fill nodes completely before moving on to a new node. VClasses -------- VClasses allow the user to specify a class of virtual nodes and then have the vnodes be of that class. Each vclass has a weight and a list of allowable types. Assign attempts to map nodes such that all nodes of the same vclass have the same type. To do this, assign keeps track of the dominant type of a vclass. This is the type that the majority of members are. Every time a member of a vclass is assigned or unassigned the dominant type and score from that vclass are recalculated. The score of a vclass (not the member nodes) is 0 if all members are of the dominant type, and the weight otherwise. A weight of 1 or more also generates a vclass violation. Due to the nature of assign when a member of a vclass is to be mapped its type must be chosen before it is fitted. It would be nice if assign computed all possible pclasses that could satisfy the member, based on its possible types, and then loop through that list. This is future work. For the moment, when assign sees a member of a vclass it chooses a type for it. A little over half the time it will choose the dominant class, otherwise it chooses randomly. Fixed Nodes ----------- Fixed nodes are a mechanism by which a user can manually specify part of the mapping themselves. Fixed nodes show up in assign as a special case. If a node is fixed it is marked as such and mapped to the correct pnode during initialization. From then on it can never be unmapped. `````` Christopher Alfeld committed Jul 23, 2001 194 195 196 197 198 199 200 201 202 203 204 205 206 ``````LEDA ---- Assign makes heavy use of LEDA. Before getting deeply into the code it would be a good idea to become familiar with LEDA's parameterized GRAPHs and other basic data structures. Overview of the Code -------------------- CC Files: `````` Robert Ricci committed Jun 20, 2003 207 208 209 210 ``````assign.cc - Contains all initialization code, including main(). anneal.cc - Contains the simulated annealing loop and related functions. `````` Christopher Alfeld committed Jul 23, 2001 211 212 213 214 215 216 217 218 219 220 `````` score.cc - Contains init_score, add_node, remove_node and all associated helper routines. parse_top.cc - Contains parse_top which parses and sets up the virtual topology. parse_ptop.cc - Contains parse_ptop which parses and sets up the physical topology. `````` Christopher Alfeld committed Jan 15, 2002 221 222 223 224 225 ``````pclass.cc - Contains the equivalence relation, the code to generate the pclasses, and the code to maintain the pclasses. vclass.cc - Contains the code to setup vclasses and calculate score deltas when nodes are mapped and unmapped.. `````` Christopher Alfeld committed Jul 23, 2001 226 `````` `````` Robert Ricci committed Jun 20, 2003 227 228 229 ``````solution.cc - Contains the code to print out solutions and other graphs. `````` Christopher Alfeld committed Jul 23, 2001 230 231 232 233 234 235 236 237 238 239 240 241 242 ``````Header Files: common.h - Contains all the parameters including all the score amounts. score.h - Defines the datastructure and prototypes for the scoring code. physical.h - Defines the physical elements: pnodes, plinks, and switch fabric structures. virtual.h - Defines the virtual elements: vnodes and vlinks. `````` Christopher Alfeld committed Jan 15, 2002 243 244 245 246 ``````pclass.h - Defines the pclass class. vclass.h - Defines the vclass class. `````` Christopher Alfeld committed Jul 23, 2001 247 248 249 250 251 252 253 254 255 256 257 258 259 260 `````` Important Globals ----------------- switch_pred_array - The predecessor array to represent the shortest path between any two switches. PG - The physical topology. G - The virtual topology. bestscore, absbest - The bestscore this iteration and the absolute best score. `````` Christopher Alfeld committed Jan 15, 2002 261 ``````absnodes - Contains the absolute best solution. `````` Christopher Alfeld committed Jul 23, 2001 262 263 264 `````` unassigned_nodes - Priority queue of nodes to be assigned. `````` Christopher Alfeld committed Jan 15, 2002 265 266 267 268 269 ``````type_table - Indexed by type, contains a list of pclasses that can satisfy that type. fixed_nodes - Indexed by vname, contains the pname the node is fixed to. `````` Christopher Alfeld committed Jul 23, 2001 270 271 272 273 274 275 276 277 `````` Initialization -------------- During the initialization process assign reads in the top and ptop files, initializes the random number generator, and calculates the shortest paths between any two switches. `````` Christopher Alfeld committed Jan 15, 2002 278 ``````This next step deserves some comments. parse_ptop will fill out both `````` Christopher Alfeld committed Jul 23, 2001 279 280 281 282 283 284 285 286 287 288 289 290 291 ``````the PG, the completely physical topology, and SG, the switch fabric. The SG graph contains slinks and snodes which exists in one-to-one correspondence to the switches and interswitch links in PG. Each slink and snode has a field, mate, which contains the corresponding edge or node in PG. Assign determines edge costs as 10000 minus the bandwidth of the interswitch link and then uses the DIJKSTRA_T algorithm from LEDA to compute the shortest paths between all switches. These paths are stored in switch_distances. For a given switch sw, switch_distances[sw] is a node_array (indexed by node) of the predecessor node. This is used by the scoring code to calculate interswitch paths. If SCORE_DEBUG is defined the both edge costs and predecessors are output. `````` Christopher Alfeld committed Jan 15, 2002 292 ``````Finally assign computes all physical equivalence classes. `````` Christopher Alfeld committed Jul 23, 2001 293 294 295 296 297 298 299 300 `````` The Main Loop ------------- A strange sequence of calls that exist for historical reasons ultimately end up with assign() which is the main simulated annealing loop. This particularly code has had at least three and possibly as many as five different authors and none of them quite understand it. `````` Christopher Alfeld committed Jan 15, 2002 301 302 303 304 305 306 ``````In essence it randomly chooses an unassigned vnode and then a pclass that can satisfy the vnode and calls add_node to attempt to map them. If add_node succeeds it then compares scores and decides whether to accept the mapping. If no vnodes are unassigned then it randomly chooses an assign vnode, removes it via remove_node and then randomly maps it to a pnode via add_node. `````` Christopher Alfeld committed Jul 23, 2001 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 `````` The Scoring Code ---------------- init_score() This initializes the scoring datastructures. First it sets score and violations to 0 and blanks the vinfo structure (vinfo stores counts of each type of violations). It then looks at all vnodes and add the cost of an unassigned node to the score and adds a violations. It then goes through all vnodes, vlinks, plinks, and pnodes and sets their data members appropriately for being unused and unassigned. add_node(vnode,pnode) If pnode is virgin If matching type exists then set type of pnode to T Else fail Else if types match and and pnode is not full then we're ok Else fail # At this point we know that we can fit the vnode into the pnode with # all types ok. We now go onto links. Foreach vlink in vnode If destination is mapped # We assign links when both ends are mapped. Otherwise we leave # them unassigned. If direct_link exists then map vlink to single direct plink Else if vnode and destination are on same switch then Find links from vnode to switch to destination and map to them. Else If interswitch path exists Map vlink to interswitch path Else Fail # It's possible to score this as a violation instead of # a failure however it was found to do better when we failed. # We now have all links set up we just need to do the last few things: # increment the load, features and desires, count a pnode used if this # was virgin, count a switch as used if this is the pnode on the # switch, and reduce the penalty and score for a newly assigned vnode. Finish setting up pnode remove_node(vnode) # This is almost the exist reverse of add_node Let pnode be what vnode is mapped to. Foreach vlink in vnode Unassociated all plinks Remove score for link type and plink costs. Remove any violations for bandwidth. Remove any violations associated with no connections on this node. Reduce the load and free up pnode is load is now 0. Free the switch if this was the last node on the switch. Penalize for an unassigned vnode. Undo all features and desires. direct_link(A,B) This routine looks for a direct link from A to B. It does this in a pretty basic fashion. It looks at all edges from A and looks for the edge going to B with the least users and less bandwidth used. find_link_to_switch(A) Very similar to direct link except is going from A to the switch A is connected to rather than from A to B. find_intraswitch_path(src,dst) Extremely basic. Just calls find_link_to_switch on both src and dst. find_interswitch_path(src,dst) Another very basic routine. Just reads off the from the switch_preds datastructure the shortest path between the src switch and the dst switch. score_link(edge,interswitchp) For non interswitch links this penalizes users beyond the first. For all links this keeps track of bandwidth and penalizes going over bandwidth allowance. unscore_link(edge,interswitch) Exact opposite of score_link. ``````