[This file is not kept entirely up to date.]

* Fix tmcd/fork problem (events out of order).

* Supply form args to beginexp page, and allow inline NS files.

* From: Jay Lepreau
  Subject: expt phase transition durations
  Date: Fri, 18 Jul 2003 14:03:48 MDT

  Some sub-times would be really useful too, like how long assign took
  to run.

* Feed back jail changes to FreeBSD.org.

* I think these swap/modify errors should include the ns file. Well,
  certainly if it's a parsing error.

* From: Jay Lepreau
  Date: Thu, 10 Jul 2003 10:56:37 MDT

  1. I think most of us are in agreement that in the big visualizer the
     host/lan icons should be replaced by plain boxes and circles. I
     think Chad might have even been convinced before he left, after
     seeing how nice the little ones looked.

  2. We definitely need some smaller scale options for the big topos
     arising from vnodes, the ones that Mike was playing with. There
     should be some heuristic to decide what the default should be when
     the big viz page is first displayed. I.e., doing the really big one
     first (100%) is often just a waste of time and annoys the user
     while it loads.

  3. Non-tweaks (i.e., significant work, but maybe not hard): Really big
     topos need something better. The Tamara Munzner/then-CAIDA tool
     that lets you zoom in on an area would have been just the thing for
     that really dense/overlapping one that was out there a week ago.
     To support that we need some kind of export and viewer
     configuration capability. (This fits into my vision for a one-stop
     shop for network expts, where people will use Elab even for big
     pure simulations, because it will be linked in to great
     computational resources and have all sorts of handy tools.)

* From: Eric Eide
  Date: Thu, 10 Jul 2003 11:12:19 -0600

  Something that I would like to be able to do is to run a batch
  experiment that builds a disk image, and at the end of the batch
  script, have the disk image dumped. Any clever ideas about how one
  might go about doing this?
* Fix /etc/named.conf problem on RON image and build new image.

* Run neato and the visualization code on ops because of CPU load on
  big experiments.

* Change ISO version number.

* Adjust web page width to actual display width the user is using.

* From: Mike Hibler
  Date: Tue, 8 Jul 2003 11:01:00 -0600 (MDT)

  BTW: shouldn't we have a "node types" page where people can find out
  info about the various machine types (CPU, memory, etc.)?

* From: Mac Newbold
  Date: Tue, 8 Jul 2003 10:57:22 -0600 (MDT)

  >> mangled because the regular output goes to stdout and the error
  >> messages go to stderr. I've fixed that problem.
  >
  >Well, we list them in the up/down and reservation page, so people probably
  >see the faster ID and try to get them. Maybe they should not be public
  >(add a "public" bit to the nodes table?).

  If you're going to add one, I'd probably vote for doing it per-type
  instead of per-node. So in node_types you could set what things are
  public and what aren't. Then we also wouldn't need holding expts for
  nodes of odd types, and it would be okay to have some of them sitting
  around free.

* From: Timothy Stack
  Subject: shell env for startup scripts
  Date: Mon, 7 Jul 2003 10:43:38 -0600 (MDT)

  The shell environment that the tb-set-node-startup script uses can be
  quite a bit different from the one you get when logged in remotely.
  This can make things kinda complicated when trying to automate
  things, since you don't quite know what you're going to get. I'm
  particularly interested in the ACE_ROOT/TAO_ROOT variables that are
  set by the scripts in /etc/profile.d/.

* The routes table is a problem. Currently 129000 rows, averaging 86
  bytes, for a total of almost 12 megs. That's on main, with no
  virtnodes (1000 node experiments!). The representation is also
  grossly inefficient (like, pid/eid in every one of those rows!), and
  uses a text representation of the IP addresses
  (src,dst,nexthop,mask).
  We could probably cut that back by 60%, but it's still too much stuff
  to keep around for every swapped out experiment. We should run the
  route setup at swapin, and if the user wants to see them when it's
  swapped out, we can run the route calculator on the fly and dump the
  output to the web page.

* Add routes/events option to web page, since those are not in the
  email any longer.

* Other items. We better start saving the thumbnails in the experiment
  archive directory too (expinfo). We should also be re-rendering after
  a modify. We should also save the XML representation to avoid having
  to reparse old NS files, although there are some versioning issues
  with this.

* Auto discovery of new nodes.
  * Rob is working on this.

* Viz can't handle multiple links between nodes.

* Change netbuild to speak XML (in both directions).
  Related: Might require addition of a DTD or Schema to our XML format.

* From: Jay Lepreau
  Subject: Snapping an image - physnode menu item
  Date: Tue, 01 Jul 2003 17:39:36 MDT

  It's always seemed logical to me to have such a menu item on the
  (physical) "Node Information" page. In fact, several times I've
  looked for it there, only to remember it doesn't exist, and I need to
  go to "ImageIDs and OSIDs" (which is not an "action" item, and is not
  associated with an experiment).

* Email archive for project email lists.

* Add NS file history, and links to view/create from old NS files. I
  think this might be pretty cool to do, and it's not that hard. Change
  the nsfiles table to index by the unique experiment idx, instead of
  pid/eid, and add a copy of the original description field to it. Then
  link it off the user's "Show History" link, which has his experiment
  log. We could load up the old NS files, but I think that's enough of
  a pain that I would not bother to.
  From Dave:
  > key by the md5 hash of the ns file and store it by that:
  >
  > table 1:
  >   exp  pid  date  md5_of_file
  >
  > table 2:
  >   md5_of_file  contents_of_file
  >
  > That has the advantage of keeping the commonly accessed
  > table (table 1) small and statically sized, whereas the
  > slow and dynamic TEXT records are then in table 2.

  Could also hang it off the experiment_resources record, which would
  address the swapmod issue, since a new resource record is created at
  each swapmod.

* Secure the event system so that experiments cannot spoof/snoop each
  other.

* Fix case sensitivity of event system (and agents, scheduler).

* Fix idleswap auditing.

* Aren't users required to fill in their first and last names? Also
  note the state.

*** Major:

* Fix event scheduler for experiment modify so that it can add new
  events from the current time index, instead of from time 0.

* Break up emulab into smaller components (for example, split off
  account and group stuff so it's independent).

* Fix the entire nalloc/nfree/reloading mechanism. The state control
  stuff for it, which is scattered around nfree, tmcd, stated, and the
  reload daemon, needs a complete overhaul. Many races, many
  opportunities to fail. Mac is thinking about this.

* Event system startup cost. Abhijeet reported that after ISUP, it
  could take a very long time for events to start. This is because it
  takes a really long time to process the event stream in event-sched
  using Ian's original binary tree stuff. I hacked in a fix, but need
  to look at that algorithm and perhaps change it. Need to decide if
  insertion needs to be optimized over deletion.

* Continuing work on jails for both local and remote nodes.

* Need to default the OS id version (4.3, 7.1) if we are going to delay
  reloading, or else people can get old versions of the OS when in the
  same project (last_reservation). This might be moot depending on what
  we do wrt reloading when experiments are done.

* tmcd does not appear to be scaling with the advent of ssl.
  Rob suggested a combined tmcd command to return the entire node
  configuration in one message. We would still keep the individual
  calls, but provide a way to get all the data at once and save on a
  dozen connections per boot.

  LBS: local nodes are using tmcc-nossl since we get security via the
  switch. Widearea nodes *do* use ssl.

* Complete event system overhaul (per-exp elvind, secure elvind,
  per-node elvind, distribution of event lists to nodes). Cannot
  multicast events to multiple agents at a time.

* Get the program agent working on ron nodes. This is related to (and
  dependent on) securing the event system, since we do not want anyone
  to be able to send an event to the program agent from some random
  node!

* Deal with two ends of a remote link being allocated to the same ron
  node! Need to catch the situation for now (and error), but eventually
  make sure it does not happen, since setting up a tunnel from a node
  to itself sounds rather silly.

* Investigate other types of tunnels for widearea nodes. Perhaps ipsec
  AH tunnels.

* Fix mountd-invalidating-current-mounts problem.

* Automated swapping (with disk saving) support so that we can swap out
  experiments and save per-node images for people. Requires a lot of
  disk space.

*** Medium:

* Change batch system to handle limited-use disk images like timesys.

* CDROM changes:
  1) Add per host certificates.

* Switch rpm/tar file to non-nfs solution. Perhaps a ftpd like daemon
  which does some of the same checks that tmcd does. Or maybe a tmcd
  variant that does nothing but serve up files accordingly. Maybe it
  does not need to be separate, but seems like putting this directly
  into tmcd is a bad idea. Maybe not.

  LBS: This is now done for tar files on widearea nodes, and in jails.
  Needs to be done with RPMs too.

* Clean up osid/imageid mess.

* Need to add a "kill running frisbee" function so that creating new
  images does not get frisbee messed up.

* Front end support for changing delay/bw/plr asymmetrically in events.
  Currently, we can do queue params, but the basic delay, bw, and plr
  can only be done symmetrically.

  Related: Add NS event support for some of the tb- commands. For
  example, tb-set-lan-simplex-params. This is actually harder, since
  lans are not directly controllable at the link level anywhere in the
  system. delay_config chokes on lans at the moment for this reason.

* Support images with more than one slice (but not the entire disk). At
  present, people can make use of the 4th slice, but cannot save it
  with an image, unless they create an entire disk image, and we do not
  allow mere users to do that. The current disk image stuff is not
  flexible enough to support arbitrary slice definitions (save slice
  2-4).

  Related: Kill deltas!

* Add MBR initialization to all images, perhaps as a special Frisbee
  operation, like Mike did for slice 4. This is to prevent problems
  with people messing up the MBR.

* Change hardwired degree 4 for vrons->rons to more flexible DB
  management. Related would be dynamic creation of virtual nodes
  instead of hardwired entries in the nodes table, but that's a lot of
  work, and might not be worth it.

* Script to remove old log files from the mysql directory. Also remove
  old backups from the mysql backup dir. These files are taking up a
  huge amount of space, and /usr/testbed/log has been filling up a lot.

* Add some kind of host table support to RON nodes so that programs can
  figure out IPs. This is going to be a pain.

* Support for protocols other than IP. Mike reported some issues
  related to this in email of Fri, 17 May 2002 10:05:41.

* Bring in a bug tracking system we can use from the web interface.
  Need someone to look around for this. I hate GNATS! Rob mentioned RT
  (http://www.fsck.com/projects/rt). Eric mentioned Bugzilla and
  Jitterbug.

* Noswap bit to prevent users from swapping special experiments that
  have things like SPAN turned on.
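
  Sketch for the log-cleanup item above (old files in the mysql log and
  backup directories). This is only an illustration, not an existing
  testbed script: the function names, the 30-day threshold, and the
  path in the usage note are all assumptions. It defaults to a dry run
  so nothing is deleted until the list has been checked by hand.

```python
import os
import time


def find_stale_files(directory, max_age_days, now=None):
    """Return sorted paths of regular files older than max_age_days.

    Only the top level of `directory` is examined; symlinks and
    subdirectories are skipped. `now` can be passed in for testing.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


def remove_stale_files(directory, max_age_days, dry_run=True, now=None):
    """Delete stale files (or only report them when dry_run is True)."""
    stale = find_stale_files(directory, max_age_days, now)
    if not dry_run:
        for path in stale:
            os.remove(path)
    return stale
```

  Hypothetical usage from a nightly cron job, with a placeholder path:
  remove_stale_files("/path/to/mysql/backups", 30, dry_run=False).
  Files that mysqld still has open (the current log) should be excluded
  by whoever wires this up; the sketch does not check for that.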
*** Minor:

* When I syntax check an ns file, and it fails, it would be handy to
  have a one-click way to check the same file again. (My Tcl isn't so
  good.)

* Change logs for group experiments from `/proj//logs/' to
  `/groups///logs/'.

* Clean up ISADMIN() and ISADMININISTRATOR() calls in php pages.

* > Mapping RHL-STD on pc92 to emulab-ops-RHL73-STD.
  > *** Tarfile /usr/local for node pc96 does not exist!

  Can't we check the validity of these paths during the parse phase and
  fail a lot sooner?

* Macrofy the signature of the email (currently "Testbed Ops").

* FAQ entry for lilo: To access partitions on the disk outside of the
  C:H:S tuple limit (8.4 GB), you must add 'lba32' to the global
  options section. Not a big deal, but requires someone who knows lilo
  to verify and to test it.

* Fix "no networks link warning" to deal with remote node links.

* DB consistency checker; to run at night and as part of flest.

* I'm sitting here looking at the "details" page for an experiment and
  nowhere obvious on this page does it show the name of the experiment.
  If I scroll all the way down to experiment details, there it is. How
  about putting it over the vis image or making it part of the vis
  image? Somewhere right at the top.

* Allow user to specify OSIDs for their delay nodes. Not entirely sure
  how, since delays are chosen late in the game, but at the moment it's
  difficult for people to customize delay nodes.

* Fix up hostnames generation from tmcd for lans. Currently, if you
  have a link and a lan to the same node, you get two aliases.

* Add web interface for generating simple hardwired topologies. I.e.:
  "Just give me some nodes in a lan."

> From: Jay Lepreau
> To: stoller@fast.cs.utah.edu
> Subject: For the 'todo' list -- project and user "archiving"
> Date: Tue, 11 Jun 2002 20:49:44 -0600 (MDT)
>
> I don't want to delete projects and probably not users (typically).
> I want to "retire" them, or move them to "alumni"/archive status.
> I want to keep them in the DB for analysis and statistics reasons, but
> not keep them around forever as active because they clutter things
> up.
>
> Also, the users might be reactivated if they start a new project.
>
> These take a little thought because of name space issues, at least.
> Maybe more issues (eg how does an inactive user get reactivated
> unless their password is still valid?).
>
> Anyway, just something for the list, and keep this message in the
> details part.