PROJECTS

Things that could be done:

1. Monitor temperature sensor on CPU (mike)

	Linux and probably BSD have drivers/software for reading the temp
	sensors.  We should use them, perhaps in conjunction with Dave's
	autostatus, to make sure things like fan failure don't kill a CPU.
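
	A minimal sketch of the monitoring side, assuming the driver exposes
	readings as text lines like "temp1: 52.0" (the exact interface and
	the alarm threshold are assumptions that depend on the driver):

```c
#include <stdlib.h>
#include <string.h>

/* Parse a sensor line like "temp1: 52.0" into degrees C.
 * Returns -1.0 if the line doesn't look like a reading.
 * (The line format is an assumption about the driver's output.) */
double parse_temp(const char *line)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1.0;
    return atof(colon + 1);
}

/* Called from a periodic poll loop (or from autostatus) to decide
 * whether to raise an alarm, e.g., when a fan has failed. */
int temp_alarm(double degc, double threshold)
{
    return degc >= threshold;
}
```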

2. Filesystem compression (mike)

	[ Leigh is working on this. ]

	Jay's Holy Grail of Ultimate Disk Image Compression.  I see three
	ways of doing this:

	- Have the "zip" part effectively do a tar of everything in the FS
	  and make a note of how big the FS is.  Unzip will newfs a filesystem
	  of the same size and untar.  Note that I don't necessarily mean a
	  literal tar/untar as you may want to operate beneath the FS level
	  for speed.  Downside is that you don't create an exact copy of the
	  original.  If something in the original relies on exact location
	  (e.g., LILO), ya be screwed.

	- Modify zlib (or create a new OSKit library) to know about filesystem
	  formats, at least UFS and EXT2FS.  The zipper will recognize free
	  blocks and encode them specially.  The unzipper will note these
	  special runs and just leave room on the disk (rather than zero
	  them).  Note the information security issue here: we need to give
	  the user an option to wipe the disk when they are done so that
	  their old stuff doesn't leak through into a new experiment.

	- A not so obvious solution with beneficial side-effects would be to
	  write a tool to expand filesystems like UFS and EXT2FS.  Then we
	  could make our image filesystems just big enough to hold everything
	  we care about, and create regularly zipped images of those.
	  The unzipper expands the small disk image and then goes in and
	  grows the filesystem to some reasonable size.  A filesystem
	  expansion (contraction?) tool would probably be a nice addition to
	  *BSD and/or Linux.  Note that this assumes we are creating images
	  with only a single partition (OS), otherwise you still have to
	  fill out the first partition no matter how little of it you used.
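
	The free-block-aware zipper in the second option boils down to
	run-length encoding the filesystem's allocation bitmap.  A sketch,
	with an invented record format (a real tool would read the UFS or
	EXT2FS bitmaps):

```c
#include <stddef.h>

struct run {
    int allocated;   /* 1 = copy these blocks, 0 = skip (free) */
    size_t count;    /* number of consecutive blocks in the run */
};

/* Convert a per-block allocation map (one byte per block, nonzero =
 * allocated) into alternating copy/skip runs; returns the number of
 * runs written, at most maxruns. */
size_t map_to_runs(const unsigned char *alloc, size_t nblocks,
                   struct run *out, size_t maxruns)
{
    size_t nruns = 0;
    for (size_t i = 0; i < nblocks; ) {
        size_t j = i;
        while (j < nblocks && alloc[j] == alloc[i])
            j++;
        if (nruns == maxruns)
            break;
        out[nruns].allocated = alloc[i];
        out[nruns].count = j - i;
        nruns++;
        i = j;
    }
    return nruns;
}
```

	The zipper would compress only the "copy" runs; the unzipper seeks
	past the "skip" runs, which is where the wipe-the-disk option comes
	in.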

	FYI: long ago I asked Kirk about expanding an FFS filesystem and this
	is what he said:
	---------

	From: mckusick@chez.Berkeley.EDU (Kirk Mckusick)
	To: mike@cs (Mike Hibler)
	Subject: Re: a UFS question 
	Date: Mon, 15 Jul 91 17:11:17 PDT

	        Date: Fri, 5 Jul 91 17:43:36 -0600
	        From: mike@cs.utah.edu (Mike Hibler)
	        To: mckusick@okeeffe.Berkeley.EDU
	        Subject: a UFS question

	        Occasionally we come across a situation where it would be
	        nice to expand a filesystem.  Rather than create a new
	        filesystem it would be nice if you could extend an existing
	        one.  How practical is that with the 4.2 FS?  It seems like
	        things are fairly encapsulated at the cylinder group level
	        so it might not be that hard to add more groups.  Is there
	        any static-sized global state (e.g. bitmaps in the superblock)
	        that would make this impossible?

	        Thought I would go straight to the source rather than
	        muddling around.

	In theory expanding a filesystem is not too difficult. The basic
	algorithm is to add more cylinder groups (possibly first expanding 
	an existing partial cylinder group at the end of the old filesystem).
	The easiest way to do this is to update the superblock to reflect
	the larger size and zero out the inode blocks in the new cylinder
	groups (if any). Then run fsck to update the bit maps in the new or
	expanded cylinder groups. The only caveat is that the superblock 
	summary maps are allocated in the data fragments immediately
	following the inodes in the first cylinder group. If you add enough
	new cylinder groups to overflow the previously allocated fragments,
	you will have to relocate the immediately following fragment to make
	room for the expanded summary information. On a filesystem with 1K
	fragments, each fragment holds summaries for 64 cylinder groups,
	so you can only easily increase (i.e. without relocating an existing
	file) the total number of cylinder groups to a multiple of 64.

	        ~Kirk
	---------
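
	Kirk's arithmetic spelled out: assuming the per-cylinder-group
	summary (struct csum) is 16 bytes, a 1K fragment holds 1024/16 = 64
	summaries, so easy growth stops at the next multiple of 64 groups:

```c
/* Assumed size of FFS's per-cylinder-group summary (struct csum:
 * four 32-bit counters). */
#define CSUM_SIZE 16

/* Max cylinder groups whose summaries fit in nfrags summary fragments
 * of the given fragment size, i.e., the limit on easy expansion
 * without relocating the file that follows the summary area. */
int max_cgs(int fragsize, int nfrags)
{
    return (fragsize / CSUM_SIZE) * nfrags;
}
```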

3. A new tip (mike)

	[ Mike fixed the easy stuff: ripped out most of the escape sequences
	  and dialer support, reduced to one process, cleanup tty on exit. ]

	Things that are wrong with tip in the environment we use it for
	(remote console access):

	- Inflexible authentication model.  If there is no lock file and you
	  can access the tty, you are in.  We need to verify a user ID and tty
	  combo from a database.

	- Obsolete escape sequences.  Not only are they useless, they
	  interfere with running things like emacs.

	- Ineffective cleanup.  If you kill one half of tip, the other side
	  might get left around, the lock file left behind, the tty left
	  screwed up, you name it.

	- Obsolete or inefficient model.  The two-process input/output model
	  hardly seems worthwhile.  A simple select-based scheme is probably
	  more efficient in the modern age.

	- No direct remote access.  You must run tip on the machine with the
	  serial line.  You can r/slogin to that machine from anywhere, but
	  there is an extra layer of program.

	- Inadequate logging.  We introduced a layer between the tty and tip
	  to provide good logging.  Shouldn't be necessary.

	What we want is something along the lines of our current capture,
	a server per serial port which:

	- always logs to a disk file,

	- allows one (or more) users to connect via sockets and get output
	  from (or provide input to) the serial line,

	- ensures all such connections are access checked, with access checking
	  abstracted to allow for different implementations,

	- supports few (no?) magic escape sequences,

	- possibly encrypts traffic.


4. Make DNARDs useful (mike)

	Have someone produce interesting software for the DNARDs.  If they
	are going to be sources/sinks, make useful standalone OSKit kernels
	to avoid needing NetBSD if possible.  Start with the oskit traffic-gen
	programs and beef them up (i.e., make them work on the DNARDs, make
	them remotely, dynamically controllable).

5. Teach tcpdump about spanning tree and other inter-switch traffic (mike)

	Even if it could just make the traffic easier to identify in the
	output, or better yet, give us a simple handle so that we can
	exclude it (e.g., "tcpdump not switchtraffic").  I am reminded of
	this by seeing the never-ending stream of "Unknown IPX Data" packets.

6. Reduce (or eliminate) logging in default OSes (mike)

	The standard daily/weekly/monthly activities are occurring and
	sending mail to root (at least in our FreeBSD image).  We should
	either eliminate this or, better, reduce it and have it logged
	on plastic or paper.  We should at least do some security checking
	and mail any anomalies to testbed-ops or someplace so we know if
	some machine has been cracked.

7. Reduce power usage (mike)

	We should do what we can to reduce excess power usage.  The most
	obvious thing to do is turn machines off when they are not assigned.
	The benefits are:

		- maximum power savings
		- reduced vulnerability to script kiddies
		- longer component life?
		- no rc5des!

	The downsides are:

		- can't use unassigned machines for other purposes (abone)
		- don't save anything if the testbed is in use 24/7/365
		- frequent transitions would be harder on the HW

	At the very least, we should make our scripts capable of dealing
	with this model.  For example, os_setup would need to check and see
	if the machine is powered down and, if so, turn it on.  Likewise,
	deassignment should turn the machine off.  Note we will need to 
	differentiate "down to save power" from "down cuz it emits large
	billows of black smoke when turned on."  Autostatus would have to
	be taught to show "available/assigned/down" instead of "up/down".
	To avoid excess transitions due to frequent reassignment, we can
	delay shutting down a machine "for a while" after an experiment
	has ended.  The end-experiment algorithm might look like:

		if someone is waiting for an available node,
			immediately reassign/reload this one
		else
			"clean" (reload/zero) the disk
		if the node still isn't needed,
			power it down

	That would give a 5-10 minute window for reclamation.
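
	The policy above as a pair of decision functions (the state and
	action names are invented for illustration):

```c
enum action { REASSIGN_NOW, CLEAN_DISK, POWER_DOWN };

/* What to do the moment an experiment ends: reassign immediately if
 * someone is queued for a node, else clean (reload/zero) the disk. */
enum action node_freed(int someone_waiting)
{
    return someone_waiting ? REASSIGN_NOW : CLEAN_DISK;
}

/* What to do after the 5-10 minute reclamation window: power down
 * only if the node still isn't needed. */
enum action window_expired(int someone_waiting)
{
    return someone_waiting ? REASSIGN_NOW : POWER_DOWN;
}
```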

	What can we do for machines that are turned on?  For assigned
	machines we probably can't dictate much.  The best we can do
	is make our default kernels "green".  Both Linux and FreeBSD
	probably idle the CPU, so we should be ok there.  Not sure
	about spinning down the disk.  Because of daemons and logging it
	might be difficult to keep the disk spun down.  Dave/Kevin/Steve
	worked on this some on the Jaz-disk test machines.  Some keys are:

		- shutdown as many daemons as possible
		- reduce logging and log across NFS if possible
		- move unimportant logging to a memory disk

	Maybe we should disconnect one of those big case fans in each
	machine or get temperature regulated fans.

	Of course, the most important thing to do is determine whether
	we really have a problem and what components suck the most power.

8. Get ARM/Linux running on the DNARDs (mike)

	I spent a couple of days on this, long enough to get a diskless
	system basically working, figured out how to build a kernel, and
	discover all the problems that we need to fix.  For the complete
	lowdown on my long, strange trip into DNARD/Linux, see
	~mike/flux/doc/linux-dnard.txt.

	Here are some of the things we still need to do:

	1. Linux is not well set up for operating a large number
	of diskless clients.  It has provisions for booting each node
	with a root of /tftpboot/<ip-addr> and then all mounting a
	common /usr, but the root filesystem is still on the order
	of 40MB.  10MB of this is /lib, which has lots of shared
	libraries (glibc alone is 4MB) needed for binaries in /sbin.
	14MB is /var, most of which is the RPM database (since this
	disk image was loaded with about every package known to man).
	Even if we go with the 40MB roots, I still had to hack some
	startup files to deal with the NFS root.  In particular, /
	must be in the fstab but fsck will fail if / is an NFS
	filesystem (duh!).  I made a gross hack to deal with that
	(look for .I_am_an_NFS_rootfilesystem) in rc.sysinit.
	Also make sure ONBOOT=no in ifcfg-eth0, else it will hang
	trying to initialize eth0 (which was already inited because
	of NFS root).
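
	The tweaks above, sketched as config fragments (the NFS server name
	and mount options are illustrative placeholders; the rc.sysinit
	excerpt is a reconstruction of the hack described, not the actual
	code):

```shell
# /etc/fstab on the diskless client: / must be listed even though it
# is NFS (server name and options are placeholders):
#
#   server:/tftpboot/<ip-addr>  /  nfs  defaults  0 0

# rc.sysinit excerpt: skip the fsck of / when the NFS-root marker
# file is present.
if [ ! -f /.I_am_an_NFS_rootfilesystem ]; then
	fsck -A
fi

# /etc/sysconfig/network-scripts/ifcfg-eth0: keep boot from trying to
# re-init eth0, which the NFS root already brought up.
ONBOOT=no
```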

	2. There is a known NFS bug in pre-2.4 kernels which causes
	much grief with diskless systems.  Has to do with the old
	open-and-unlink-a-file-but-still-have-access semantic.
	We need a newer kernel.

	3. Apparently you cannot use the PIT to get a periodic
	interrupt on the DNARDs.  Thought this was a Linux problem
	but NetBSD doesn't use it either.  Both use the RTC at 64Hz.
	However, the Netwinder application base we are using doesn't
	recognize 64Hz as a valid value and defaults to 100Hz.
	Probably throwing lots of timing related things off.
	We need to rebuild the appropriate shared library and
	affected static binaries.

	4. Related to #3 is just the general problem that the Linux
	setup relies on a mish-mash of kernel/binary releases.
	We should build our own system from the sources.  The
	kernel may always be a problem since the Shark code is
	bit-rotting in the ARM linux tree.

	5. Reboot doesn't work; it just hangs.