- 13 Dec, 2010 1 commit
-
-
Jonathon Duerig authored
Various fixes. Notably, pssh needs to be run in interactive mode or it just hangs mysteriously while you beat your head against your keyboard trying to understand why.
-
- 08 Dec, 2010 2 commits
-
-
Jonathon Duerig authored
-
Jonathon Duerig authored
-
- 07 Dec, 2010 3 commits
-
-
Jonathon Duerig authored
-
Jonathon Duerig authored
-
Jonathon Duerig authored
-
- 20 Nov, 2010 1 commit
-
-
Ryan Jackson authored
Fixed a segfault in the new event scheduler. Cleaned up some code that dealt with converting xmlrpc_c::value_strings to C string constants to avoid further memory corruption issues. Also converted AddAgent(), AddGroup(), and AddUserEnv() to take 'const char *' pointers instead of 'char *' pointers to silence GCC's warnings about invalid conversions from 'const char *' to 'char *'.
-
- 17 Nov, 2010 2 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
This prevents it from powering off nodes and generally being quite so anal about "security violations" in the SECURE boot/load path. We will leave this on til we get all the d710 kinks worked out.
-
- 12 Nov, 2010 1 commit
-
-
Mike Hibler authored
The logic was not quite right. Also, don't send a SHUTDOWN event when powering off a node.
-
- 10 Nov, 2010 1 commit
-
-
Mike Hibler authored
stated POWER* triggers will now actually do something!
-
- 09 Nov, 2010 2 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
-
- 03 Nov, 2010 1 commit
-
-
Ryan Jackson authored
-
- 28 Oct, 2010 2 commits
-
-
Ryan Jackson authored
Modify the event scheduler to use the BSD-licensed xmlrpc-c library instead of the LGPL-licensed ulxmlrpcpp.
-
Ryan Jackson authored
-
- 20 Oct, 2010 1 commit
-
-
Mike Hibler authored
(eventual) support for NFS servers without race conditions! This means no NFS between nodes and ops/fs. There are still NFS mounts of ops on boss however.

Added new defs-* variable NOSHAREDFS, which when set non-zero will disable the export of NFS filesystems to nodes. Involved lots of little changes:

* /users, /proj, and /share filesystems are not exported to nodes.
* Returned mount info now includes an FSTYPE key which will be set to "LOCAL" if NOSHAREDFS is in effect (by default it is set to "NFS-RACY"; more on this later). In the case where it is set to LOCAL, the other mount lines no longer contain REMOTE=foo settings. Because of this change, THE TMCD VERSION NUMBER HAS BEEN BUMPED TO 32.
* The client rc.mounts script will now create local versions of /users/*, /proj/<pid>, and /share when FSTYPE=LOCAL. It first runs mkextrafs to create a large partition for these, since someday we will likely want to pre-populate these with a non-trivial amount of data. Right now, the only thing that is put in the user's homedir is the standard dotfiles for the OS and the Emulab authorized_keys file (so you can login).
* Linktest had to be modified to fetch the various results files (via loghole) rather than just assuming they were in /proj. It was also changed to invoke tevc with the local copy of the event key so it won't try to read it over NFS.
* create_image was modified to ssh to the node and run the imagezip command, capturing the output of ssh. This is controlled via the "-s" option which defaults to on for a NOSHAREDFS system, but can also be used on a normal system.
* elabinelabs can be configured with/without a shared FS via the CONFIG_SHAREDFS attribute (note polarity change) which defaults to 1.

Another new defs-* variable, NFSRACY, will some day allow you to specify (by setting to 0) that your NFS server does NOT have the nefarious mountd race condition when changing /etc/exports. Currently, this defaults to 1 since all versions of FreeBSD supported as an "fs" node have this "feature." Rumor has it that FreeBSD 8 does not have this problem nor, presumably, would a Linux NFS server. The only use of this variable right now is to set the FSTYPE returned by the tmcd "mounts" call, which in turn is used by one client script, rc.topomap (via a libsetup function) to determine whether it should try copying the topo file multiple times.

Random: add python2.6 to list of pythons checked for in configure.
Random: resync defs-example-privatecnet with defs-example.
Random: did a little code-pissin here and there.
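
As a rough illustration of how a client might act on the FSTYPE key described in this commit, here is a minimal Python sketch. The "KEY=value" line format, the function names, and the sample paths are assumptions made for the example; the real tmcd reply format and the rc.mounts/rc.topomap logic are not shown in this log. Only the FSTYPE values "LOCAL" and "NFS-RACY" and the REMOTE key come from the commit message.

```python
def parse_kv(line):
    """Split a 'KEY=value KEY2=value2' line into a dict (assumed format)."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def mount_plan(lines):
    """Decide what a client should do based on the returned mount info.

    This is a hypothetical helper, not the real rc.mounts logic.
    """
    fstype = "NFS-RACY"  # default value named in the commit message
    mounts = []
    for line in lines:
        fields = parse_kv(line)
        if "FSTYPE" in fields:
            fstype = fields["FSTYPE"]
        elif fields:
            mounts.append(fields)
    return {
        "fstype": fstype,
        "mounts": mounts,
        # With FSTYPE=LOCAL the client creates /users/*, /proj/<pid>, /share locally.
        "create_local_dirs": fstype == "LOCAL",
        # Assumption: only a racy NFS server needs repeated topo file copies.
        "retry_topo_copy": fstype == "NFS-RACY",
    }
```

For example, `mount_plan(["FSTYPE=LOCAL", "LOCAL=/users/alice"])` (hypothetical input) reports that local directories should be created and that no retry loop is needed.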
-
- 19 Oct, 2010 1 commit
-
-
Mike Hibler authored
In elab_linktest.pl, put explicit command line arguments after the defaults so that they will override them in linktest.pl. Also, don't conflate LOGDIR and PROJDIR. Half-assed attempt to free the script from the notion of a pid/eid, by defining EVENTID param. However, we still wind up reading a pid/eid from a file though this could also be eliminated.
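
The "explicit arguments after the defaults" ordering relies on the common last-one-wins behaviour of option parsers. The sketch below shows the idea with Python's argparse rather than the Perl scripts themselves; the option spellings `--logdir` and `--projdir` and the sample paths are hypothetical stand-ins for the LOGDIR and PROJDIR parameters.

```python
import argparse


def parse_linktest_args(defaults, explicit):
    """Parse defaults followed by explicit arguments.

    With argparse's default 'store' action, a later occurrence of an
    option replaces an earlier one, so explicit arguments win.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--logdir")   # hypothetical option name
    parser.add_argument("--projdir")  # hypothetical option name
    return parser.parse_args(list(defaults) + list(explicit))


args = parse_linktest_args(
    ["--logdir", "/tmp/default-logs", "--projdir", "/proj/default"],
    ["--logdir", "/var/tmp/my-logs"],
)
```

Reversing the concatenation order would let the defaults clobber the caller's choices, which appears to be the bug this commit addresses.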
-
- 12 Oct, 2010 2 commits
-
-
Mike Hibler authored
This will use the local copy of the keyfile (/var/emulab/boot) instead of the across-NFS copy (/proj/.../tbdata).
-
Mike Hibler authored
-
- 11 Oct, 2010 2 commits
-
-
Jonathon Duerig authored
-
Jonathon Duerig authored
-
- 29 Sep, 2010 2 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
Under load, nodes that have just entered reloading and have just rebooted might fail to get bootinfo. The default behavior in this case is for the node to boot from disk (dubious, but that is the topic for another day). This causes the node to fall off the RELOAD path, winding up in either TBFAILED or ISUP. Worse, if the node makes it to ISUP, its reload state is cleared and even if the reload_daemon reboots the node, it will still not go through the reloading process. The result is a bunch of nodes left in reloading. Now if a node makes an invalid transition to TBFAILED or ISUP while in the RELOAD state machine, it fires the new REBOOT trigger which does...well, you figure it out. Note that in the ISUP case, this trigger overrides the default that would otherwise clear the reload state--so reboot is sufficient to get the machine back on the RELOAD track.
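
The trigger behaviour described above can be sketched as a small transition handler. This is not the actual state daemon code: the node dictionary, the table of valid transitions, and the state names RELOADING and RELOADDONE are assumptions made for the illustration. TBFAILED, ISUP, the RELOAD state machine, and the REBOOT trigger are taken from the commit message.

```python
# Hypothetical table of valid transitions inside the RELOAD state machine.
VALID_NEXT = {"RELOADING": {"RELOADDONE"}}


def on_transition(node, new_state):
    """Apply a state change and return the list of triggers to fire."""
    old_state = node["state"]
    node["state"] = new_state
    if new_state in VALID_NEXT.get(old_state, set()):
        return []
    if node["mode"] == "RELOAD" and new_state in ("TBFAILED", "ISUP"):
        # Invalid transition while reloading: reboot the node and keep its
        # reload state, so the next boot goes back down the RELOAD path.
        return ["REBOOT"]
    if new_state == "ISUP":
        # Default ISUP handling: the reload state is cleared.
        node["reload_pending"] = False
    return []
```

The key point is the ordering of the last two branches: the REBOOT trigger returns before the default ISUP handling can clear the reload state.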
-
- 18 Sep, 2010 2 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
This is part of cleaning up the name space of pubsub. Previously, this program was getting the "info" call from libpubsub and that version is going to become private. So use the one in libtb instead.
-
- 14 Sep, 2010 3 commits
-
-
Mike Hibler authored
-
David Johnson authored
-
David Johnson authored
-
- 23 Aug, 2010 1 commit
-
-
Mike Hibler authored
Delay agent won't build on FreeBSD 8.x right now due to dummynet API changes. Not even sure we will bother to fix this since we have a newer, more OS independent agent.
-
- 20 Aug, 2010 3 commits
-
-
Mike Hibler authored
"make -C dir foo" != "cd dir && make foo" PWD in a shell script evaluates different, presumably because it is inherited from the invoking shell/make.
-
Mike Hibler authored
-
Mike Hibler authored
-
- 18 Aug, 2010 4 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
-
Mike Hibler authored
-
Mike Hibler authored
Use "uint32_t" instead of "unsigned long" for on-the-wire data.
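
The motivation is that "unsigned long" is 4 bytes on some platforms and 8 on others, while a wire format needs one fixed size. A small Python sketch of the same idea using the struct module follows; the helper names are made up, and the choice of network byte order is an assumption, since the commit does not say which byte order the protocol uses.

```python
import struct


def encode_u32(value):
    """Pack an integer as exactly 4 bytes (network byte order assumed)."""
    return struct.pack("!I", value)


def decode_u32(data):
    """Unpack the 4-byte field produced by encode_u32."""
    (value,) = struct.unpack("!I", data)
    return value


# Size of the platform's native C 'unsigned long': varies by ABI.
native_long_size = struct.calcsize("@L")
wire_size = len(encode_u32(0x01020304))
```

Here `wire_size` is 4 on every platform, while `native_long_size` depends on the machine the code runs on, which is exactly why a native long is a poor fit for on-the-wire data.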
-
- 22 Jul, 2010 1 commit
-
-
Leigh B Stoller authored
-
- 17 Jul, 2010 1 commit
-
-
Mike Hibler authored
-
- 14 Jul, 2010 1 commit
-
-
Leigh B Stoller authored
From: Leigh Stoller <stoller@flux.utah.edu>
Date: July 1, 2010 11:35:08 AM MDT
Subject: Re: Event System Issues

Mike and I exchanged a couple of email messages. Mike has indicated that we should drop all Elvin support. A nice goal, but not possible cause of the basic mistake I made in "version 0" of the event code.

What we *can* do is stop *generating* the elvin hashes completely in Version 1. We also drop the elvin_gateway, thus no longer supporting ancient images that are still using the real elvin libraries. I am okay with this. Comment if you have objections.

Version 0 elvin-compat (these are pubsub clients with ELVIN_COMPAT=1) binaries will work cause of the magic ___elvin_ordered___ flag, which actually tells the client that the hmac is a pubsub ordered hmac. (I know, I am just great at naming things). I just add this little flag to events in version 1.

Version 0 non-elvin-compat binaries will not work, but this is okay. The only case this matters right now is Protogeni, where we need to be able to talk to non-elvin-compat binaries at remote sites. I have solved this with a version0 gateway as described in the previous message. ops will run a secondary pubsubd on another port, and the protogeni client startup code will have clients connect to that pubsubd instead. This is mostly tested, and it can roll out to other sites as needed, once we roll out cooked mode.

Version 1 clients are fully interoperable.

Lastly, we still need to be able to compare elvin HMACs coming from existing version 0 elvin-compat binaries (from our many many system and custom images). That's cause all those images are still going to be generating the HMACs in elvin order, and so the server programs on ops (event scheduler, tevc, etc) need that elvin hashing code built into them. See first paragraph.

Anyway, I have all this done and tested on my elabinelab. I had wanted to make it in time for code freeze, but not enough time to get proper debugging, so I will push once code freeze is over.
Lbs
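
To see why the field ordering matters for HMAC comparison, consider this Python sketch. It is only an illustration of the general problem: the field names, the use of SHA-1, the NUL separators, and both orderings are assumptions, not the actual pubsub or elvin hashing code.

```python
import hashlib
import hmac


def event_hmac(key, fields, order):
    """HMAC over the event fields, fed to the hash in the given order."""
    mac = hmac.new(key, digestmod=hashlib.sha1)  # digest choice is an assumption
    for name in order:
        mac.update(name.encode() + b"\0")
        mac.update(str(fields[name]).encode() + b"\0")
    return mac.hexdigest()


def verify(key, fields, digest, candidate_orders):
    """Accept the digest if it matches under any ordering the server knows."""
    return any(
        hmac.compare_digest(event_hmac(key, fields, order), digest)
        for order in candidate_orders
    )


key = b"example-event-key"                      # hypothetical key
fields = {"objname": "a", "eventtype": "b"}     # hypothetical fields
order_a = ["objname", "eventtype"]              # stand-in for one ordering
order_b = sorted(fields)                        # stand-in for the other
```

The same key and fields give different digests under `order_a` and `order_b`, so a server that must accept events from older clients has to keep the older ordering code around, which matches the constraint described in the message.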
-