assign/anneal.cc · d7fb737d78f9f9372362b6977c32da46ff8c3e80 · emulab / emulab-devel

Fix a bug that's over 10 years old · d7fb737d

Robert Ricci authored Nov 09, 2010

I think this must get the award for oldest (frequently excercised) bug
in the Emulab source. Thanks to the magic of the git pickaxe, I
discovered that code with this problem was first committed Monday, May
22, 2000, when it used to be in assign_hw/assign.cc

assign used to use a priority queue to 'randomly' select unassigned
nodes to try assigning. Turns out this is not very random! A node can
get unlucky, get a very low random priority assigned to it, and get
stuck in the queue for a very, very, very long time. As a result, I've
seen assign try to re-assign the same node several thousand times in a
row, when there are others waiting to be re-assigned. Of course, this is
going to result in very poor exploration of the state space.

I changed the unassigned node selection process to use a regular list,
and select an item from the list at random at the time of item removal.
This gets a much more even distribution of nodes, at the small cost of
linear iteration over the list of unassigned nodes. This should be a
pretty minor cost, though, as iteration over this list should be cheap
and the list itself should generally be pretty small.

Also added some debugging statements.

d7fb737d