    · 183040de
    Kevin Atkinson authored
    Many changes to tblog code.  Database update needed:
    1) Added a summary of failed nodes in os_setup.  The cause of the error is now
    classified as "user" if it is only user images that failed and the user
    image failed on every pc of a particular type.  Otherwise I leave the cause
    as "unknown" since it is really hard to tell what the real cause is.
    2) Raised the confidence threshold for most errors so that they will appear
    on the top.
    3) Added a special error when an experiment is canceled.  The cause is
    "canceled" and testbed-ops won't see these errors.
    4) Fixed a bug in assign_wrapper where it would incorrectly report "This
    experiment cannot be instantiated on this testbed..." when really the user
    canceled the swapin.
    5) Fixed a bug where os_setup errors were being incorrectly reported as
    assign errors.  This happens when os_setup fails for some reason and
    tbswap tries again, but the second time around there are not enough nodes.
    So the last error comes from assign even though the true cause of the
    error is the failed nodes.  The fix for this involved adding a new column
    to the log table, "attempt", which will be 1 for the first attempt and then
    incremented for each new attempt.  tblog_find_error will then simply ignore
    any errors with "attempt > 1".
    6) Also fixed a potential problem when there is an error during the cleanup
    phase by adding another column "cleanup".  tblog_find_error will
    also ignore any errors with the cleanup bit set.
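
The filtering described in items 5 and 6 can be sketched roughly as follows. This is an illustration only, not the actual tblog code: the `LogEntry` type, its `seq` ordering field, and the `find_error` function are hypothetical names, and the sketch assumes that the most recent remaining error is the one reported.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LogEntry:
    """Hypothetical stand-in for one error row in the log table."""
    seq: int               # assumed ordering key; higher means logged later
    message: str
    attempt: int = 1       # 1 for the first attempt, incremented on each retry
    cleanup: bool = False  # set when the error was logged during cleanup


def find_error(entries: Iterable[LogEntry]) -> Optional[LogEntry]:
    """Return the last error from the first attempt, skipping cleanup errors."""
    candidates = [e for e in entries if e.attempt <= 1 and not e.cleanup]
    if not candidates:
        return None
    # Assumption: among the remaining errors, the latest one is reported.
    return max(candidates, key=lambda e: e.seq)


log = [
    LogEntry(seq=1, message="os_setup: node failed to boot", attempt=1),
    LogEntry(seq=2, message="assign: not enough nodes", attempt=2),
    LogEntry(seq=3, message="error while tearing down", attempt=1, cleanup=True),
]
reported = find_error(log)
```

With these sample rows, the assign error from the second attempt and the cleanup error are both skipped, so the os_setup failure is the one reported.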
batchexp.in 32.4 KB