Commit c34c1178 authored by Mike Hibler

First foray into using an MD5 hash to improve speed.

Currently, this only means defining a signature file and creating a
utility to make them and check them against a disk.  The signature file
is not used by frisbee/imageunzip yet.
parent 4461b987
@@ -37,9 +37,11 @@ endif
 #PTHREADCFLAGS += -DCONDVARS_WORK
 
 CFLAGS = $(SUBDIRCFLAGS) -I$(SRCDIR) -static
-LIBS = -lz $(PTHREADLIBS)
+LIBS = -lz
 UNZIPCFLAGS = $(CFLAGS) $(PTHREADCFLAGS) -Wall
 UNZIPLIBS = $(LIBS) $(PTHREADLIBS)
+HASHCFLAGS = $(CFLAGS) $(PTHREADCFLAGS) -Wall
+HASHLIBS = $(LIBS) -lcrypto $(PTHREADLIBS)
 
 # UFS/UFS2
 ifeq ($(WITH_FFS),1)
@@ -75,7 +77,7 @@ SUBDIRS += fat
 FSLIBS += fat/libfat.a
 endif
 
-all: $(SUBDIRS) imagezip imageunzip imagedump
+all: $(SUBDIRS) imagezip imageunzip imagedump imagehash
 
 include $(TESTBED_SRCDIR)/GNUmakerules
 
@@ -91,11 +93,18 @@ imageunzip.o: imageunzip.c
 imagedump: imagedump.o version.o
 	$(CC) $(CFLAGS) imagedump.o version.o $(LIBS) -o imagedump
 
+imagehash: imagehash.o version.o
+	$(CC) $(CFLAGS) imagehash.o version.o $(HASHLIBS) -o imagehash
+
+imagehash.o: imagehash.c
+	$(CC) -c $(HASHCFLAGS) -o imagehash.o $<
+
 ffs extfs ntfs fat:
 	@$(MAKE) SUBDIRCFLAGS="$(SUBDIRCFLAGS)" -C $@ all
 
 imagezip.o: sliceinfo.h imagehdr.h global.h
 imageunzip.o: imagehdr.h
+imagehash.o: imagehdr.h
 
 version.c: imagezip.c imageunzip.c imagedump.c
 	echo >$@ "char build_info[] = \"Built `date +%d-%b-%Y` by `id -nu`@`hostname | sed 's/\..*//'`:`pwd`\";"
Thoughts and initial work on using digest/hashing techniques to improve
frisbee performance.
Create a "signature" file for an image using a collision-resistant hash
like MD5 or SHA-1. When imageunzip (or frisbee) runs, it first loads
in the signature file and uses that information to check the current
contents of the disk/partition. It then fetches/decompresses/writes
only data that are not already correct. For this to be at all practical,
we must meet two criteria:
1. The current contents of the disk must be largely similar to what we
want to load. In the common case in Emulab, this is true: people load
a disk with the standard image (both BSD and Linux), use either BSD or
Linux for a while, and then quit. Most of the OS they used, and all of
the one they didn't use, will be unchanged.
2. It must also be the case that reading/hashing are significantly
faster than decompression/writing. This of course depends on the
processor, memory and disk speeds but is in general true. For our
machines, here are some numbers (all MB/sec):

type      read(1/6)   write(1/6)    md5    inflate
pc600     22.1/18.4   19.6/19.6     71.6    32.5
pc850     26.7/28.0   21.4/21.4     86.4    45.1
pc2000    43.4/43.4   38.9/38.8    242.1    63.2

The two numbers for read/write are for reading 1GB of data
starting at the beginning of the disk (i.e., the first GB) and starting
at 5GB (i.e., the 6th GB). They show a little of the slowdown as we
get further out on the disk (we mostly live in the first 6GB).
MD5 and inflate numbers are from the then-standard FBSD+RHL image.
So we can see that there is some margin. Contrary to my gut feeling,
MD5 hashing is way faster than decompression, likely because the
latter involves not only a memory read for every byte, but multiple
writes as well.
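
As a sanity check, the "md5" column above can be approximated with a
trivial stand-alone micro-benchmark along these lines (just a sketch, not
part of imagehash; it repeatedly hashes a fixed in-memory buffer with
libcrypto's MD5(), so it measures raw hashing speed with no disk IO):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <openssl/md5.h>

#define HASHBLK	(64 * 1024)		/* hash in 64KB pieces */
#define TOTALMB	1024			/* pretend to hash 1GB */

int
main(void)
{
	unsigned char *buf = malloc(HASHBLK);
	unsigned char digest[MD5_DIGEST_LENGTH];
	struct timeval st, et;
	long i, iters = (long)TOTALMB * 1024 * 1024 / HASHBLK;
	double secs;

	memset(buf, 0x5a, HASHBLK);
	gettimeofday(&st, NULL);
	for (i = 0; i < iters; i++)
		MD5(buf, HASHBLK, digest);
	gettimeofday(&et, NULL);
	secs = (et.tv_sec - st.tv_sec) + (et.tv_usec - st.tv_usec) / 1e6;
	printf("MD5: %.1f MB/sec\n", TOTALMB / secs);
	return 0;
}
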
I have already written an "imagehash" utility that can create the
signature files (MD5 or SHA-1) and can be run to check a signature
against a disk. Currently it creates a 16 (or 20) byte hash for every 64KB
of allocated data in every region of every chunk. Thus it is finer-
grained than chunks or regions within chunks. Imagehash overlaps disk reads
with hashing, so it is a good indication of the best we can do. Running
it on a node sitting in the frisbee MFS, we get the following, compared
to the frisbee time to load the disk (in seconds):

type      frisbee   imagehash   save    imagehash serial (R+H)
pc600       93.6      81.1      13.4%    85.1 (62.4 + 21.9)
pc850       82.3      65.4      20.5%    87.3 (68.5 + 18.1)
pc2000      68.0      44.9      34.0%    55.8 (48.7 + 6.5)

"imagehash serial" shows a run without overlapping reading with hashing
along with the broken out time for those two phases. Note that there are
some bizarre effects here that I don't yet understand: 1) pc850s are slower
than pc600s to read the disk when serially imagehashing, yet the disks
are faster, 2) pc850s and pc2000s show extremely good overlap of IO with
hashing, which pc600s show almost no improvement (85.1s to 81.1).
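
For reference, the overlapped mode works roughly like the sketch below
(this is not the imagehash source; the ring size, block size, and device
argument are placeholders).  One thread reads 64KB blocks from the disk
into a small ring of buffers while the main thread MD5s them; the lock is
only held to update the ring state, so the read() and MD5() calls
themselves run in parallel:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>
#include <openssl/md5.h>

#define BSIZE	(64 * 1024)
#define NBUF	8

static struct {
	char	data[BSIZE];
	ssize_t	size;			/* bytes read; 0 marks EOF */
} ring[NBUF];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t notfull = PTHREAD_COND_INITIALIZER;
static pthread_cond_t notempty = PTHREAD_COND_INITIALIZER;

static void *
reader(void *arg)
{
	int fd = *(int *)arg;
	ssize_t cc;

	do {
		/* Wait for a free buffer, then read outside the lock. */
		pthread_mutex_lock(&lock);
		while (count == NBUF)
			pthread_cond_wait(&notfull, &lock);
		pthread_mutex_unlock(&lock);

		cc = read(fd, ring[head].data, BSIZE);
		ring[head].size = (cc > 0) ? cc : 0;
		head = (head + 1) % NBUF;

		pthread_mutex_lock(&lock);
		count++;
		pthread_cond_signal(&notempty);
		pthread_mutex_unlock(&lock);
	} while (cc > 0);
	return NULL;
}

int
main(int argc, char **argv)
{
	unsigned char digest[MD5_DIGEST_LENGTH];
	pthread_t rthread;
	int fd;

	if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
		fprintf(stderr, "usage: hashdisk <device>\n");
		exit(1);
	}
	pthread_create(&rthread, NULL, reader, &fd);

	/* Hash each block as it becomes available. */
	for (;;) {
		ssize_t size;

		pthread_mutex_lock(&lock);
		while (count == 0)
			pthread_cond_wait(&notempty, &lock);
		pthread_mutex_unlock(&lock);

		size = ring[tail].size;
		if (size > 0)
			MD5((unsigned char *)ring[tail].data, size, digest);
		tail = (tail + 1) % NBUF;

		pthread_mutex_lock(&lock);
		count--;
		pthread_cond_signal(&notfull);
		pthread_mutex_unlock(&lock);

		if (size == 0)
			break;
	}
	pthread_join(rthread, NULL);
	close(fd);
	return 0;
}
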
Anyway, we do see that imagehash is faster than frisbee by 13-34%.
However, in that saved time, we need to be able to transfer over, decompress,
and write any actual changes. How much of this standard frisbee action
we can overlap with the hashing is important. If we have to finish the
hashing and only then do the frisbee work, we aren't going to win by any
meaningful margin (I contend that we have to gain at least 10% to make the
complexity worthwhile).

Here are the logical steps involved:
1. transfer signature file to node
2. read data from disk
3. hash data and compare to sig
4. download chunks for data we need
5. decompress chunks
6. write data we need to disk
Steps 2-3 are what imagehash does; steps 4-6 are basically frisbee, but
requesting only specific chunks and possibly writing only selected data
from those chunks.

1. Transfer signature file.
A signature file could be anywhere from around 1KB to multiple megabytes
depending on the granularity of data that we hash and how big the image is.
We could just embed this info in the image file itself, but that at least
partially defeats the purpose of doing this, which is to avoid downloading
data we don't need. We could transfer the info down in advance, possibly
using frisbee, but we might want to transfer it "out of band" for at
least one other reason. If we transfer the .sig over in a secure transfer,
we can also use the .sig to ensure the integrity of the frisbee-distributed
data by adding a step 5.5: compare the hash of the expected data vs. what we
actually received.
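
Step 5.5 would amount to something like the following sketch (the
signature format is still undecided, so the per-piece digest handed in
here is an assumption; only MD5 is shown):

#include <string.h>
#include <openssl/md5.h>

/*
 * After decompressing a piece of a received chunk, recompute its hash
 * and compare it to the digest recorded in the .sig file.  Returns
 * non-zero if the data checks out.
 */
static int
checkhash(unsigned char *data, size_t len, unsigned char *sigdigest)
{
	unsigned char digest[MD5_DIGEST_LENGTH];

	MD5(data, len, digest);
	return memcmp(digest, sigdigest, MD5_DIGEST_LENGTH) == 0;
}
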
As for hashing granularity, there are three possibilities. At the
coarsest granularity, we could compute a hash for every chunk. This
produces the smallest signature and simplifies things, but decreases
the effectiveness: a chunk can represent a huge amount of decompressed
data, and we would have to make a hashing pass over all that data before
we could make a decision about whether we need to request the data or not.
The next level is to hash each region within a chunk, but it isn't clear
that this provides enough additional granularity. In the case of a raw
image or a densely populated filesystem, there will only be a single
region per chunk and we gain nothing over chunk granularity. Finally,
we can break every region into fixed-sized pieces and hash those pieces.
This is what imagehash currently does: every region is split into some
number of 64KB pieces, plus possibly one final smaller piece, each of which
is hashed individually. This allows a much quicker decision about whether
a chunk (or part of it) is needed, at the expense of a larger signature
(825KB for our 460MB combined BSD/Linux image). I haven't played with
making the hash unit larger or smaller.
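
The per-piece hashing is roughly the following (a sketch of the idea
only, not the imagehash source; the sector bookkeeping and writing of
the signature file are omitted):

#include <stdint.h>
#include <openssl/md5.h>
#include <openssl/sha.h>

#define SECSIZE	512
#define HASHBLK	(64 * 1024)	/* current hash-unit size */

/*
 * Hash one allocated region in fixed 64KB pieces plus a final partial
 * piece.  "data" holds the region contents, "start" is its first
 * sector.  Each digest would be recorded in the signature file along
 * with the piece's sector range (not shown).
 */
static void
hash_region(unsigned char *data, uint32_t start, uint32_t nsectors, int usesha1)
{
	unsigned char digest[SHA_DIGEST_LENGTH];	/* 20 bytes; MD5 uses 16 */
	uint32_t off, bytes = nsectors * SECSIZE;

	for (off = 0; off < bytes; off += HASHBLK) {
		uint32_t hsize = (bytes - off > HASHBLK) ? HASHBLK : bytes - off;

		if (usesha1)
			SHA1(data + off, hsize, digest);
		else
			MD5(data + off, hsize, digest);
		/* record (start + off/SECSIZE, hsize, digest) here */
	}
}
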
2 (read) and 6 (write). Disk IO.
Do we want to structure these as separate threads? My gut says no,
that we would wind up thrashing the disk as the reader and writer
compete for distinct areas of the disk. Do we really gain anything by
combining them? After all, they are still using distinct areas of the
disk. Well, we can directly prioritize reads vs. writes (not that I know
what that policy should be...) and maybe we can sort the requests to cluster
disk accesses and reduce contention.
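
If reads and writes do end up in one queue, the sorting could be as
simple as ordering pending requests by disk location (the struct and
names below are hypothetical, not frisbee code):

#include <stdint.h>
#include <stdlib.h>

/* A pending disk request; reads and writes share one queue. */
struct ioreq {
	uint32_t sector;	/* starting sector */
	uint32_t nsectors;
	int	 iswrite;
	void	*buf;
};

/* Order requests by starting sector so the head sweeps one way. */
static int
ioreq_cmp(const void *a, const void *b)
{
	const struct ioreq *ra = a, *rb = b;

	if (ra->sector < rb->sector)
		return -1;
	return ra->sector > rb->sector;
}

/* e.g.:  qsort(queue, nreqs, sizeof(struct ioreq), ioreq_cmp); */
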
3 (hash) and 5 (decompress). CPU-intensive activity.
On a uniprocessor, there is no advantage in parallelizing the two,
but on a multi-processor we would want to. Again, an issue here is
prioritizing them w.r.t. each other. Possibly, this is more easily
done with a single thread or at least a common work queue.
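
A common work queue might look something like this (entirely
hypothetical; the point is just that the enqueue policy is where the
prioritization would live):

#include <pthread.h>

enum worktype { WORK_HASH, WORK_DECOMPRESS };

struct workitem {
	enum worktype	 type;
	int		 chunkno;
	void		*data;
	struct workitem	*next;
};

static struct workitem	*workq;
static pthread_mutex_t	 workq_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Queue a work item.  Hash work is inserted ahead of any decompress
 * work (but stays FIFO among hash items); decompress work goes to the
 * tail.  A single worker thread pulls from the head, so this ordering
 * is the prioritization policy.
 */
static void
work_enqueue(struct workitem *wi)
{
	struct workitem **wp;

	pthread_mutex_lock(&workq_lock);
	for (wp = &workq; *wp != NULL; wp = &(*wp)->next)
		if (wi->type == WORK_HASH && (*wp)->type == WORK_DECOMPRESS)
			break;
	wi->next = *wp;
	*wp = wi;
	pthread_mutex_unlock(&workq_lock);
}
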
4. Download data from the net.
This is still an independent thread, but it has interesting dependencies
on the hashing process in frisbee where we receive "unsolicited" data
blocks. Clearly, if we have hashed everything in one chunk and determined
that we don't need that data, we won't request that chunk and we won't
allocate any resources to collecting that chunk if someone else requests
it. But what if the hasher thread comes to a new chunk to work on, and
the data for that chunk has already been partially received? Do we not
bother to hash data for that chunk and just let the normal frisbee process
blindly overwrite the data? Do we continue to collect the data and do the
hash in parallel, hoping to avoid at least some decompression or
writing? Or do we not collect unsolicited chunks at all? Conversely,
what if we start to receive data for a chunk that we are in the process
of hashing?