Changes proposed by Jason Shupe and Keith Sklower from DETER. Some of the
relevant email:

Date: Wed, 18 Apr 2007 21:20:37 -0700
From: Jason Shupe <jshupe@ISI.EDU>
To: Testbed Ops <firstname.lastname@example.org>
Subject: [patches] tmcd.c (Jason) and elabinelab.in (Keith)

Included in this email are my description of the problem, and my patches
to tmcd.c, followed by more descriptions of the problem and Keith's patch
to elabinelab.in. I apologize in advance for misquoting, changing and
otherwise abusing Keith's prose.

An elab in elab experiment was started from the DeterTest project. A
simple inner experiment was then started from the emulab-ops project.
During experiment swap in the program agent would fail to start. If the
same simple inner experiment was started under the DeterTest project or on
the main testbed it would start normally.

It turned out that Keith's account (among others) wasn't getting created
on the inner experimental node. tmcd was only sending a subset of accounts
to the experimental nodes. By digging through the database queries from
tmcd.c I noticed one of the database responses contained a NULL in the
g.unix_gid field. By removing the only user from the emulab-ops sub group
'ops-test' it was then possible to successfully swap in the inner
experiment.

I've included two different versions of an untested tmcd.c patch. Both
versions include changes only to the mysql statement. Both versions of the
modified mysql statements were tested on the elab in elab database after
the only member of emulab-ops was re-added to the 'ops-test' group. Both
queries returned all results of the original statements except the
offending record with the 'NULL' value for g.unix_gid.

The first patch directly excludes the offending record(s), and the second
patch simply changes the _left join_'s to just _join_'s (Keith's
suggestion) which also produces the same result for the data set tested.
Ted reminded me that "is not NULL" is better than my initial "!='NULL'",
which also produces the same results.
Other suggestions on this end include specifically using "inner join", and
using both "inner join" and "is not NULL".

Date: Tue, 24 Apr 2007 14:44:08 -0700
From: Leigh Stoller <email@example.com>
Subject: Re: [patches] tmcd.c (Jason) and elabinelab.in (Keith)

Well, unix_gid is not supposed to be null, so we should fix that problem
instead, I would think.

Date: Wed, 25 Apr 2007 00:35:30 -0700 (PDT)
From: Keith Sklower <sklower@vangogh.CS.Berkeley.EDU>
Subject: Re: [Deter-ops] [patches] tmcd.c (Jason) and elabinelab.in (Keith)

It became null because of using an outer join instead of an inner join.
I'll repeat the condition:

  1.) the DETER emulab-ops has subgroups
  2.) the inner elab group membership table had references to a group
      which was not inherited from the outer boss
      [pid=emulab-ops, gid=test-grup, uid=jhickey]

So, my initial proposal was to be a bit tidier in specifying what group
membership entries should be subsetted. (There was a phrase which was
intended to catch the group membership for anybody currently active in
emulab-ops, but it was too encompassing.)

Date: Wed, 25 Apr 2007 13:15:51 -0600
From: Robert P Ricci <firstname.lastname@example.org>
Subject: Re: [Deter-ops] [patches] tmcd.c (Jason) and elabinelab.in (Keith)

I guess, then, I will commit both proposed changes to tmcd - both to make
the existing join more 'correct', and to guard against other ways (i.e.
bad/inconsistent DB state) the gid might show up as null.
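
The difference between the query variants discussed in this thread can be
reproduced on a toy schema. The following is a minimal sketch, not the
actual tmcd.c query: it uses SQLite through Python rather than MySQL, and
the table names proj_groups and memberships, the uids alice and bob, and
the gid value 6000 are all made up for illustration. Only the unix_gid
column name and the emulab-ops / ops-test names come from the emails. The
last query is one possible way to look for the inconsistent rows
themselves, under the same assumptions.

```python
import sqlite3

# Hypothetical schema: one project group, and two membership rows, one of
# which points at a subgroup ('ops-test') that has no row in proj_groups.
SCHEMA = """
CREATE TABLE proj_groups (pid TEXT, gid TEXT, unix_gid INTEGER);
CREATE TABLE memberships (uid TEXT, pid TEXT, gid TEXT);
INSERT INTO proj_groups VALUES ('emulab-ops', 'emulab-ops', 6000);
INSERT INTO memberships VALUES ('alice', 'emulab-ops', 'emulab-ops');
INSERT INTO memberships VALUES ('bob', 'emulab-ops', 'ops-test');
"""

# Outer join: every membership row is kept, so the orphaned row comes back
# with NULL in g.unix_gid.
LEFT_JOIN = """
SELECT m.uid, m.gid, g.unix_gid
FROM memberships AS m
LEFT JOIN proj_groups AS g ON g.pid = m.pid AND g.gid = m.gid
ORDER BY m.uid
"""

# First style of fix: keep the outer join but filter out the NULL gid.
LEFT_JOIN_NOT_NULL = """
SELECT m.uid, m.gid, g.unix_gid
FROM memberships AS m
LEFT JOIN proj_groups AS g ON g.pid = m.pid AND g.gid = m.gid
WHERE g.unix_gid IS NOT NULL
ORDER BY m.uid
"""

# Second style of fix: an inner join drops unmatched membership rows.
INNER_JOIN = """
SELECT m.uid, m.gid, g.unix_gid
FROM memberships AS m
INNER JOIN proj_groups AS g ON g.pid = m.pid AND g.gid = m.gid
ORDER BY m.uid
"""

# Consistency check: membership rows whose group does not exist.
ORPHANS = """
SELECT m.uid, m.pid, m.gid
FROM memberships AS m
LEFT JOIN proj_groups AS g ON g.pid = m.pid AND g.gid = m.gid
WHERE g.gid IS NULL
ORDER BY m.uid
"""


def build_db():
    """Return an in-memory SQLite database loaded with the toy data."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def run(db, query):
    """Execute a query and return all rows as a list of tuples."""
    return db.execute(query).fetchall()


if __name__ == "__main__":
    db = build_db()
    for name, query in [("left join", LEFT_JOIN),
                        ("left join + is not NULL", LEFT_JOIN_NOT_NULL),
                        ("inner join", INNER_JOIN),
                        ("orphaned memberships", ORPHANS)]:
        print(name, run(db, query))
```

On this toy data only the plain left join returns the row with the missing
unix_gid; the "is not NULL" filter and the inner join both drop it, and
the last query lists the membership row that caused it.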