• Robert Ricci's avatar
    Changes proposed by Jason Shupe and Keith Sklower from DETER. Some of the · 733ebb12
    Robert Ricci authored
    relevant email:
    Date: Wed, 18 Apr 2007 21:20:37 -0700
    From: Jason Shupe <jshupe@ISI.EDU>
    To: Testbed Ops <testbed-ops@emulab.net>
    Subject: [patches] tmcd.c (Jason) and elabinelab.in (Keith)
    Included in this email are my description of the problem, and my patches
    to tmcd.c, followed by more descriptions of the problem and Keith's
    patch to elabinelab.in.  I apologize in advance for misquoting,
    changing and other wise abusing Keith's prose.
    An elab in elab experiment was started from the DeterTest project.  A
    simple inner experiment was then started from the emulab-ops project.
    During experiment swap in the program agent would fail to start.  If the
    same simple inner experiment was started under the DeterTest project or
    on the main testbed it would start normally.
    It turned out that Keith's account (among others) wasn't getting created
    on the inner experimental node.
    tmcd was only sending a subset of accounts to the experimental nodes.
    By digging through the database queries from tmcd.c I noticed one of the
    database responses contained a NULL in the g.unix_gid field.  By
    removing the only user from the emulab-ops sub group 'ops-test' it was
    then possible to successfully swap in the inner experiment.
    I've included two different versions of an untested tmcd.c patch.  Both
    versions include changes only to the mysql statement.  Both versions of
    the modified mysql statements were tested on the elab in elab database
    after the only member of emulab-ops was re-added to the 'ops-test'
    group.  Both queries returned all results of the original statements
    except the offending record with the 'NULL' value for g.unix_gid.
    The first patch directly excludes the offending record(s), and the
    second patch simple changes the _left join_'s to just _join_'s (Keith's
    suggestion) which also produces the same result for the data set tested.
    Ted reminded me that "is not NULL" is better than my initial "!='NULL'",
    which also produces the same results.  Other suggestions on this end
    include specifically using "inner join", and to use both "inner join"
    and "is not NULL".
    Date: Tue, 24 Apr 2007 14:44:08 -0700
    From: Leigh Stoller <stoller@flux.utah.edu>
    Subject: Re: [patches] tmcd.c (Jason) and elabinelab.in (Keith)
    Well, unix_gid is not supposed to be null, so we should fix that problem
    instead, I would think.
    Date: Wed, 25 Apr 2007 00:35:30 -0700 (PDT)
    From: Keith Sklower <sklower@vangogh.CS.Berkeley.EDU>
    Subject: Re: [Deter-ops] [patches] tmcd.c (Jason) and elabinelab.in (Keith)
    It became null because of using an outer join instead of an inner join.
    I'll repeat the condition:
    1.) the DETER emulab-ops has subgroups
    2.) the inner elab group membership table and references to a group
        which was not inherited from the outer boss [pid=emulab-ops,
        gid=test-grup, uid=jhickey]
    So, my initial proposal was to be a bit tidier in specifying what
    group membership entries should be subsetted.
    (the was a phrase which intended to catch the group membership
    for anybody currently active in emulab-ops, but it was too encompassing).
    Date: Wed, 25 Apr 2007 13:15:51 -0600
    From: Robert P Ricci <ricci@cs.utah.edu>
    Subject: Re: [Deter-ops] [patches] tmcd.c (Jason) and elabinelab.in (Keith)
    I guess, then, I will commit both proposed changes to tmcd - both to
    make the existing join more 'correct', and to guard against other ways
    (ie. bad/inconsistent DB state) the gid might show up as null.
tmcd.c 173 KB