    Such a brutal ElabinElab hack ... · 0749ef9c
    Leigh B. Stoller authored
    When trying to swapin an actual experiment from the web interface, I
    ran into another control network
    problem, this time in bootinfo. When a node is sitting free, it waits
    in pxeboot for a bootinfo packet from boss to tell it what to do (this
    is different from when the node is allocated, and bootinfo tells it
    what to do in a reply to the initial request). In the PXEWAIT case, we
    *send* it a packet, addressed to its *control network* address, which,
    in the inner DB, is on the inner control network. But of course PXE is
    really using the outer control network, so packets addressed to the
    inner control network are never seen by pxeboot.
    
    This is the only (known) case of this happening, and rather than try
    for some general, over-engineered solution, I did something unusual,
    and put in a hack, ifdefed for ELABINELAB (meaning, it's an inner
    elab). I know, you're thinking, how could he have done such a thing,
    it's so unlike him!
    
    Well, it was damn easy! Anyway, this little hack checks the DB for an
    interface tagged as role='outer_ctrl' and uses that IP instead of the
    inner control network. When I create the inner DB from the outer DB, I
    was already leaving the outer control network in place so that
    bootinfo could find the proper node (again, because the bootinfo
    request packets come from the outer control network, so otherwise
    their source IP would not match any node in the DB).
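
    The address selection described above can be sketched roughly as
    follows. This is only an illustration, not the actual bootinfo code:
    the struct, the in-memory table standing in for the DB query, the
    function names, and the inner role name "ctrl" are all made up for
    the example. Only the role 'outer_ctrl' and the ELABINELAB ifdef come
    from the change itself. Falling back to the inner address when no
    outer_ctrl interface exists is also an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a row of the interfaces table. */
struct iface {
    const char *node_id;
    const char *role;   /* "ctrl" (assumed inner name) or "outer_ctrl" */
    const char *ip;
};

/* Return the IP of the first interface on node_id with the given role,
 * or NULL if there is no such interface. */
static const char *
find_ip(const struct iface *ifaces, size_t n,
        const char *node_id, const char *role)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ifaces[i].node_id, node_id) == 0 &&
            strcmp(ifaces[i].role, role) == 0)
            return ifaces[i].ip;
    }
    return NULL;
}

/* Pick the address a PXEWAIT wakeup packet should be sent to. */
static const char *
pxewait_dest_ip(const struct iface *ifaces, size_t n, const char *node_id)
{
#ifdef ELABINELAB
    /* Inner elab: pxeboot listens on the outer control network, so
     * prefer the interface tagged outer_ctrl when one exists. */
    const char *outer = find_ip(ifaces, n, node_id, "outer_ctrl");
    if (outer != NULL)
        return outer;
#endif
    /* Normal case (or assumed fallback): the control network address. */
    return find_ip(ifaces, n, node_id, "ctrl");
}
```

    Built without -DELABINELAB, the sketch always returns the "ctrl"
    address, which matches the behavior of a regular (outer) elab.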
    
    I'd like to say that this is the last problem with swapin, but I see
    in my other window that the event scheduler failed to start on inner
    ops with some silly ssh permission denied error. What's that all
    about?