Mike Hibler authored
I was expanding a global list in a loop for every node. So for each node, I was finding all the delta images in the ever-growing list and adding their dependencies (again!), making the list even larger. In an experiment loading a two-level delta image on 8 nodes, the list included 40+ copies of the same three images to load by the time we got to the last node.

However, no node attempted to load all those images, because tmcd exceeded its reply buffer size on the "loadinfo" call and would not return anything. Of course, by then we had computed a max wait time based on image.max_wait * 45, so the experiment suffered a slow, lingering death even though the nodes were not doing anything.

Beware: I do not know if I got the "access key" code right for remote nodes. I am not even sure we use that path anymore. I attempted the fix in libosload; I did not even try in libosload_new.
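The list growth described above can be illustrated with a small Python sketch. This is not the actual libosload code: the image names, the `DEPENDS_ON` map, and the functions `buggy_plan` and `fixed_plan` are all hypothetical, and the exact growth rate in the real code may differ. It only shows how re-expanding one shared list per node piles up duplicates, and how expanding once (with de-duplication) avoids it.

```python
# Hypothetical dependency map: a two-level delta image chain.
# "delta2" is based on "delta1", which is based on the full image "base".
DEPENDS_ON = {"delta2": "delta1", "delta1": "base"}


def buggy_plan(nodes, requested):
    """Mimic the bug: one shared list is re-expanded for every node."""
    shared = list(requested)
    per_node = {}
    for node in nodes:
        # Each delta image already in the list gets its dependency
        # appended again, so the list grows on every iteration.
        for img in list(shared):
            dep = DEPENDS_ON.get(img)
            if dep is not None:
                shared.append(dep)
        per_node[node] = list(shared)
    return per_node


def expand_once(requested):
    """Return the requested images plus dependencies, each listed once."""
    result = []
    for img in requested:
        # Walk the dependency chain, skipping images already recorded.
        while img is not None and img not in result:
            result.append(img)
            img = DEPENDS_ON.get(img)
    return result


def fixed_plan(nodes, requested):
    """Expand the image list once, then give every node its own copy."""
    expanded = expand_once(requested)
    return {node: list(expanded) for node in nodes}


nodes = [f"node{i}" for i in range(1, 9)]
buggy = buggy_plan(nodes, ["delta2"])
fixed = fixed_plan(nodes, ["delta2"])
```

In this sketch the buggy list for the last node contains many repeats of the same three image names, while the fixed version gives every node exactly three entries.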
a62de08a