Commit 9e55b0b1 authored by Mike Hibler

Frisbee general:

1. Implement PREQUEST message which passes a bit map of desired blocks.
   We still use the REQUEST message (start block + number of blocks) for
   full chunk requests as that is more efficient.  This message also
   includes a flag indicating whether it is a retry of a request we
   originally made or not.  This gives the server more accurate loss info.

2. More stats and tracing goo.
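Item 1 relies on a client being able to turn "blocks I have" into "blocks I want". The sketch below shows one way to do that with a one-bit-per-block map shaped like the BlockMap_t added to decls.h (CHUNKSIZE of 1024 blocks, so 128 bytes). It is not frisbee's code: the helper names and the bit order (least significant bit = lowest block) are assumptions.

```c
#include <assert.h>
#include <limits.h>	/* CHAR_BIT */
#include <string.h>

#define CHUNKSIZE 1024	/* blocks per chunk, as in decls.h */

/* One bit per block, shaped like the BlockMap_t in decls.h. */
typedef struct {
	char map[CHUNKSIZE/CHAR_BIT];
} BlockMap_t;

/* Hypothetical helpers; bit order (LSB = lowest block) is an assumption. */
void
blockmap_clear(BlockMap_t *bm)
{
	memset(bm->map, 0, sizeof(bm->map));
}

void
blockmap_set(BlockMap_t *bm, int block)
{
	unsigned char *m = (unsigned char *)bm->map;

	assert(block >= 0 && block < CHUNKSIZE);
	m[block / CHAR_BIT] |= (unsigned char)(1u << (block % CHAR_BIT));
}

int
blockmap_isset(const BlockMap_t *bm, int block)
{
	const unsigned char *m = (const unsigned char *)bm->map;

	assert(block >= 0 && block < CHUNKSIZE);
	return (m[block / CHAR_BIT] >> (block % CHAR_BIT)) & 1;
}

/*
 * Invert a map of blocks we have into a map of blocks we want and
 * return the number of wanted blocks.  A count of CHUNKSIZE means the
 * whole chunk is missing, in which case a plain REQUEST (start block +
 * count) is the cheaper message to send.
 */
int
blockmap_invert(const BlockMap_t *have, BlockMap_t *want)
{
	const unsigned char *h = (const unsigned char *)have->map;
	unsigned char *w = (unsigned char *)want->map;
	int i, bit, count = 0;

	for (i = 0; i < (int)sizeof(have->map); i++) {
		w[i] = (unsigned char)~h[i];
		for (bit = 0; bit < CHAR_BIT; bit++)
			count += (w[i] >> bit) & 1;
	}
	return count;
}
```

A client holding blocks 0 and 9 of a chunk would get back a map with the other 1022 bits set, which fits easily in one packet.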


Frisbee client:

1. Add 'C' and 'W' command line options to specify amount of memory
   for chunk buffers (network buffering) and for write buffers (disk
   buffering).  The Emulab frisbee startup script uses these to partition
   up all the available memory on a machine.  Previously we were just
   using a fixed ~128MB even though our machines have 256 or 512MB of
   memory.  Also add the 'M' option which specifies the overall memory,
   internally dividing it up between chunk buffers and write buffers.

2. Add 'S' command line option to explicitly specify the server.  This
   allows us to make a feeb...um, "lightweight" authentication check
   on incoming messages.

3. Use the common BlockMap data struct to track which pieces of a chunk
   we have received.  This is easily inverted to make PREQUESTS and it is
   also smaller than the older byte-per-block technique.

4. Allow partial request-ahead.  Previously, we only issued request-ahead
   if there were enough empty chunk buffers for a maximum (2) request-ahead.
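Item 4 is a change in how many chunks to request ahead given the number of empty chunk buffers. A minimal sketch, assuming a fixed maximum request-ahead of 2 chunks (from the note) and a caller that already knows how many chunk buffers are free; both function names are hypothetical.

```c
#include <assert.h>

#define MAXREADAHEAD	2	/* maximum request-ahead, in chunks */

/* Before: only request ahead if there is room for the full readahead. */
int
readahead_before(int freebufs)
{
	assert(freebufs >= 0);
	return (freebufs >= MAXREADAHEAD) ? MAXREADAHEAD : 0;
}

/* After: request as many chunks as we have room for, up to the maximum. */
int
readahead_after(int freebufs)
{
	assert(freebufs >= 0);
	return (freebufs < MAXREADAHEAD) ? freebufs : MAXREADAHEAD;
}
```

With a single free buffer the old rule requests nothing, while the new rule requests one chunk.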

Frisbee server:

1. Use BlockMap for workQ elements.  An easy way to allow a complete merge
   of incoming requests with existing ones.

2. Check for overlap of incoming requests with the request currently
   being serviced.  This happens surprisingly often.

3. Dubious: burst gap becomes burst interval.  The latter takes into
   account the time required to read data, etc.  In other words, we now
   have variable-sized gaps and put out bursts at specific times rather
   than having fixed gaps and putting out bursts at variable times.
   This gives us more accurate pacing over shorter time periods.  I
   thought this might be important for dynamic pacing.

4. Add 'W' command line option to specify a target bandwidth.  Frisbeed
   will use this to calculate a burst size/interval.

5. Rewrote the dynamic pacing code.  It is now easily as bad as before
   if not worse.  But it does have fewer magic constants!  Needs to be
   redone by someone who understands the TCP-friendly rate equation.
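Items 3 and 4 can be illustrated together: a target bandwidth fixes the time between burst starts, and with interval pacing the actual sleep is whatever remains until the next burst is due. The sketch below uses the per-packet wire size from the decls.h comment (BLOCKSIZE + 24 + 42 bytes) and assumes the 'W' value is in bits per second. The function names and the exact calculation are assumptions, not frisbeed's code.

```c
#include <assert.h>

#define BLOCKSIZE	1024		/* bytes of data per packet */
#define PKTOVERHEAD	(24 + 42)	/* per-packet header + wire overhead, per decls.h */

/*
 * Hypothetical: usecs between the starts of consecutive bursts of
 * 'burstsize' packets so that the average rate is 'bps' bits/sec.
 */
unsigned long
burst_interval(unsigned long long bps, int burstsize)
{
	unsigned long long bits;

	assert(bps > 0 && burstsize > 0);
	bits = (unsigned long long)burstsize * (BLOCKSIZE + PKTOVERHEAD) * 8;
	return (unsigned long)(bits * 1000000ULL / bps);
}

/*
 * Interval pacing: the next burst is due at a fixed time, so the gap
 * shrinks by however long reading the data took.  If we are already
 * late, send immediately.
 */
long
burst_gap(long now_us, long due_us)
{
	return (due_us > now_us) ? due_us - now_us : 0;
}
```

At 100Mb/sec with 16-packet bursts this gives one burst every 1395 usec. If a burst is due at t=1395 and reading the data took until t=1000, the server sleeps only 395 usec rather than a full fixed gap.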

Imagezip:

1. Add 'R' option to specify one or more partitions for which to force
   raw (naive) compression even if the FS format is understood.  Useful
   for benchmarking.

2. Add 'D' option to allow "dangerous" writes.  In this mode, we don't
   do the fsyncs or retries of failed writes.  Overrides the hack we put
   in for NFS.  Use this if writing to a local filesystem (or /dev/null).

3. Eliminate an extra copy of every chunk header.
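A rough sketch of what the 'D' option switches off, assuming a bounded retry count for failed or short writes. MAXWRITERETRIES and image_write are hypothetical names, and this is not imagezip's actual write routine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

#define MAXWRITERETRIES	3	/* hypothetical retry limit */

/*
 * Write 'len' bytes to 'fd'.  Normally, failed or short writes are
 * retried and the data is fsync'ed before returning.  With 'dangerous'
 * set, issue one write() and skip both the retries and the fsync.
 * Returns bytes written, or -1 on failure.
 */
ssize_t
image_write(int fd, const char *buf, size_t len, int dangerous)
{
	size_t done = 0;
	int tries = 0;

	assert(fd >= 0 && buf != NULL);
	if (dangerous)
		return write(fd, buf, len);

	while (done < len) {
		ssize_t cc = write(fd, buf + done, len - done);

		if (cc <= 0) {
			if (++tries > MAXWRITERETRIES)
				return -1;
			continue;
		}
		done += (size_t)cc;
	}
	if (fsync(fd) < 0)
		return -1;
	return (ssize_t)done;
}
```

The fsync is what makes the safe path slow on a local disk, which is why skipping it pays off when the output is a local filesystem or /dev/null.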

Imageunzip:

1. Eliminate extra copy of decompressed data that we were doing between
   the single decompression buffer and the disk buffers.  Helps on slow
   machines (like gatech's 300MHz machines with a 66MHz memory bus).

2. Allow dynamic number of variable-sized write buffers.  Total memory
   not to exceed the writebufmem limit.  Previously we had a small number
   of fixed-size (256K) buffers.

3. Add debugging 'C' option to just compute a single CRC of the decompressed
   image.  Back-ported to older imageunzip and used to make sure my write
   buffer changes were correct.  Maybe handy for similar massive changes
   in the future.
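Item 2 of the imageunzip changes describes write buffers of any size whose total may not exceed the writebufmem limit. The single-threaded sketch below shows that accounting; the real writer runs in its own thread and would need locking. The names are hypothetical, and treating a limit of zero as "unlimited" follows the MAXWRITEBUFMEM comment in decls.h.

```c
#include <assert.h>
#include <stdlib.h>

static size_t writebufmem = 64 * 1024 * 1024;	/* limit in bytes, 0 = unlimited */
static size_t writebufused;			/* bytes currently allocated */

typedef struct {
	size_t size;
	char *data;
} WriteBuf_t;

/* Returns NULL if the request would exceed the limit (caller should wait). */
WriteBuf_t *
writebuf_alloc(size_t size)
{
	WriteBuf_t *wb;

	if (writebufmem != 0 && writebufused + size > writebufmem)
		return NULL;
	if ((wb = malloc(sizeof *wb)) == NULL)
		return NULL;
	if ((wb->data = malloc(size)) == NULL) {
		free(wb);
		return NULL;
	}
	wb->size = size;
	writebufused += size;
	return wb;
}

/* Release a buffer once its data has been written to disk. */
void
writebuf_free(WriteBuf_t *wb)
{
	assert(wb != NULL && writebufused >= wb->size);
	writebufused -= wb->size;
	free(wb->data);
	free(wb);
}
```

Sizing each buffer to the data actually being written avoids wasting space that a fixed 256K buffer would leave unused.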
parent 42709260
......@@ -33,8 +33,14 @@ SERVEROBJS = server.o $(SHAREDOBJS)
CFLAGS = -O2 -g -Wall -static $(PTHREADCFLAGS) -DSTATS
LDFLAGS = -static
# Hacky loss rate flag
#CFLAGS += -DDOLOSSRATE
# Define this if your implementation of cond_vars works well
#CFLAGS += -DCONDVARS_WORK
# Define this to a non-zero value to enable recording of trace data
#CFLAGS += -DNEVENTS=20000
#CFLAGS += -DNEVENTS=25000
# Turn on client event handling
#CFLAGS += -DDOEVENTS
......@@ -61,9 +67,10 @@ $(FRISBEEDIR)/imageunzip.c: $(FRISBEEDIR)/imagehdr.h $(FRISBEEDIR)/queue.h
frisbee.o: $(FRISBEEDIR)/imageunzip.c
$(CC) -c $(CFLAGS) -DFRISBEE -I$(FRISBEEDIR) -o frisbee.o $<
client.o: decls.h log.h trace.h
server.o: decls.h log.h trace.h
client.o: decls.h log.h utils.h trace.h
server.o: decls.h log.h utils.h trace.h
log.o: decls.h log.h
network.o: decls.h utils.h
trace.o: decls.h trace.h log.h
install: $(INSTALL_SBINDIR)/frisbeed
......
......@@ -33,3 +33,65 @@
when the rate is too high. Good news: it is symmetric with what the server
currently does. Bad news: harder to map this rate to an adjustment than
it is with the queue-size-estimate method.
2. Auto-adjust readahead on the client.
Similar to #1 the client should track the level of activity on the
server and increase its readahead accordingly. For example, if we are
the only client, we could increase our readahead.
3. Eliminate client-side copy of compressed data.
Right now we read packets into a local packet buffer and then, for
BLOCK messages, copy the data out to the chunk buffers. This results
in a complete copy of the compressed data. If we make a chunk buffer
into an array of pointers to data buffers, we can read packets into
these data buffers and link them straight into the chunk buffers.
The downside is that we must modify the already gruesome decompression
loop to deal with input buffer boundaries in addition to region
and write buffer boundaries.
4. Multi-thread the frisbee server.
We can make our network output intervals more consistent if we
separate the disk reader from the network writer. A similar split
would also benefit the imageunzip program, which currently
combines the reader and decompressor and has only a separate writer
thread.
5. Investigate large block/chunk sizes.
Most important would be to increase the block size from 1024 to something
approaching the 1448 max (given current packet format). Constraint:
number of blocks in a chunk should be a multiple of 8 since we use a
bitmap to track blocks. This is not strictly necessary; it would just
be nice, and the BlockMap routines might otherwise require a little tweaking.
Maybe should be a multiple of 32 to ensure bitmap is a multiple of 4
in size. Large chunk issues: 1) more potential wasted space per chunk,
though mostly only in the last chunk, 2) It takes longer to accumulate
chunks at the client, potentially idling the decompressor and writer,
3) takes more space to accumulate chunks, allowing for fewer in progress
chunks. So maybe 1448B/blk * 768 blks/chunk == 1.06MB/chunk. PREQUEST
BlockMaps come down from 128 bytes to 96.
6. Dynamic rate pacing in the server.
Our attempts to date have been pretty feeble. I think we have a
reasonable loss metric now, just need a smooth weighted decay formula
we can use. Look at the proposed standard TCP-friendly rate equation.
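The arithmetic in item 5 can be checked with a few lines of C. This sketch assumes 8-bit bytes and treats the 1448-byte maximum and the multiple-of-32 preference as the only constraints; the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>	/* CHAR_BIT */

#define MAXPAYLOAD	1448	/* max block size given the current packet format */

/* Bytes of data per chunk. */
unsigned long
chunk_bytes(int blocksize, int chunksize)
{
	return (unsigned long)blocksize * chunksize;
}

/* Bytes in a PREQUEST BlockMap: one bit per block. */
int
blockmap_bytes(int chunksize)
{
	assert(chunksize % CHAR_BIT == 0);
	return chunksize / CHAR_BIT;
}

/* Non-zero if the geometry satisfies the constraints discussed above. */
int
geometry_ok(int blocksize, int chunksize)
{
	/* multiple of 32 blocks => bitmap is a multiple of 4 bytes */
	return blocksize <= MAXPAYLOAD && chunksize % 32 == 0;
}
```

1448 * 768 = 1,112,064 bytes (about 1.06MB) per chunk, and the BlockMap shrinks from 1024/8 = 128 bytes to 768/8 = 96 bytes.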
PROBLEMS:
1. Have seen the clients run out of socket buffer space causing them
to lose packets when still well short of the network bandwidth (at
~70Mb/sec). Not sure why. One thing we know is that the decompress
thread will almost certainly run for a full scheduling interval (1ms)
every time. Thus we have to have enough buffering in the card and socket
buffers to handle 1ms of data. With the default params, we are only
putting out 8 packets every 1ms, so that shouldn't be an issue.
Assuming that we are getting it off the card in time, that means the
network thread is either not running frequently enough, or it is spending
too much time doing other things (like copying packet data, see #3 above).
......@@ -8,6 +8,7 @@
* Shared definitions for frisbee client/server code.
*/
#include <limits.h> /* CHAR_BIT */
#include "log.h"
/*
......@@ -21,15 +22,38 @@
#define CHUNKSIZE 1024
/*
* The number of chunk buffers in the client.
* See if we can represent a bitmap of blocks in a single packet.
* If so, then allow partial request messages.
*/
#define MAXCHUNKBUFS 64
#if (CHUNKSIZE%CHAR_BIT) != 0 || (CHUNKSIZE/CHAR_BIT) > 1450
#error "Invalid chunk size"
#endif
/*
* Chunk buffers and output write buffers constitute most of the memory
* used in the system. These should be sized to fit in the physical memory
* of the client; forcing pieces of frisbee to be paged out to disk (even
* if there is a swap disk to use) is not a very efficient way to load disks.
*
* MAXCHUNKBUFS is the number of BLOCKSIZE*CHUNKSIZE chunk buffers used to
* receive data from the network. With the default values, these are 1MB
* each.
*
* MAXWRITEBUFMEM is the amount, in MB, of write buffer memory in the client.
* This is the amount of queued write data that can be pending. A value of
* zero means unlimited.
*
* The ratio of the number of these two buffer types depends on the ratio
* of network to disk speed and the degree of compression in the image.
*/
#define MAXCHUNKBUFS 64 /* 64MB */
#define MAXWRITEBUFMEM 64 /* 64MB */
/*
* Socket buffer size, used for both send and receive in client and
* server right now.
*/
#define SOCKBUFSIZE (128 * 1024)
#define SOCKBUFSIZE (200 * 1024)
/*
* The number of read-ahead chunks that the client will request
......@@ -77,27 +101,31 @@
* Given the typical scheduling granularity
* of 10ms for most unix systems, this
* will likely be set to either 0 or 10000.
* On FreeBSD we set the clock to 1ms
* granularity.
*
* Together with the BLOCKSIZE, these two params form a theoretical upper
* bound on bandwidth consumption for the server. That upper bound (for
* ethernet) is:
*
* (1000000 / SERVER_BURST_GAP) # bursts per second
* * (BLOCKSIZE + 42) * SERVER_BURST_SIZE # * wire size of a burst
* * (BLOCKSIZE+24+42) * SERVER_BURST_SIZE # * wire size of a burst
*
* which for the default 1k packets, gap of 10ms and burst of 64 packets
* is about 6.8MB/sec. In practice, the server is ultimately throttled by
* clients' ability to generate requests which is limited by their ability
* to decompress and write to disk.
* which for the default 1k packets, gap of 1ms and burst of 16 packets
* is about 17.4MB/sec. That is beyond the capacity of a 100Mb ethernet
* but with a 1ms granularity clock, the average gap size is going to be
* 1.5ms yielding 11.6MB/sec. In practice, the server is ultimately
* throttled by clients' ability to generate requests which is limited by
* their ability to decompress and write to disk.
*/
#define SERVER_BURST_SIZE 16
#define SERVER_BURST_GAP 1000
#define SERVER_BURST_GAP 2000
/*
* Max burst size when doing dynamic bandwidth adjustment.
* Needs to be large enough to induce loss.
*/
#define SERVER_DYNBURST_SIZE (SOCKBUFSIZE/BLOCKSIZE)
#define SERVER_DYNBURST_SIZE 128
/*
* How long (in usecs) to wait before re-requesting a chunk.
......@@ -105,19 +133,21 @@
*
* (CHUNKSIZE/SERVER_BURST_SIZE) * SERVER_BURST_GAP
*
* usec (0.16 sec with defaults) for each chunk it pumps out,
* usec (0.13 sec with defaults) for each chunk it pumps out,
* and we conservatively assume that there are a fair number of other
* chunks that must be processed before it gets to our chunk.
*
* XXX don't like making the client rely on compiled-in server constants,
* let's just set it to 1 second right now.
*/
#define CLIENT_REQUEST_REDO_DELAY \
(10 * ((CHUNKSIZE/SERVER_BURST_SIZE)*SERVER_BURST_GAP))
#define CLIENT_REQUEST_REDO_DELAY 1000000
/*
* How long for the writer to sleep if there are no blocks currently
* ready to write. Allow a full server burst period, assuming that
* something in the next burst will complete a block.
*/
#define CLIENT_WRITER_IDLE_DELAY SERVER_BURST_GAP
#define CLIENT_WRITER_IDLE_DELAY 1000
/*
* Client parameters and statistics.
......@@ -145,17 +175,24 @@ typedef struct {
unsigned long nofreechunks;
unsigned long dupchunk;
unsigned long dupblock;
unsigned long lostblocks;
unsigned long prequests;
unsigned long recvidles;
unsigned long joinattempts;
unsigned long requests;
unsigned long decompidles;
unsigned long decompblocks;
unsigned long writeridles;
int writebufmem;
unsigned long lostblocks;
unsigned long rerequests;
} v1;
unsigned long limit[256];
} u;
} ClientStats_t;
typedef struct {
char map[CHUNKSIZE/CHAR_BIT];
} BlockMap_t;
/*
* Packet defs.
*/
......@@ -201,6 +238,19 @@ typedef struct {
int count; /* Number of blocks */
} request;
/*
* Partial chunk request, a bit map of the desired blocks
* for a chunk. An alternative to issuing multiple standard
* requests. Retries is a hint to the server for congestion
* control, non-zero if this is a retry of an earlier request
* we made.
*/
struct {
int chunk;
int retries;
BlockMap_t blockmap;
} prequest;
/*
* Leave reporting client params/stats
*/
......@@ -219,6 +269,7 @@ typedef struct {
#define PKTSUBTYPE_BLOCK 3
#define PKTSUBTYPE_REQUEST 4
#define PKTSUBTYPE_LEAVE2 5
#define PKTSUBTYPE_PREQUEST 6
/*
* Protos.
......@@ -229,10 +280,7 @@ int PacketReceive(Packet_t *p);
void PacketSend(Packet_t *p, int *resends);
void PacketReply(Packet_t *p);
int PacketValid(Packet_t *p, int nchunks);
char *CurrentTimeString(void);
int sleeptime(unsigned int usecs, char *str);
int fsleep(unsigned int usecs);
void ClientStatsDump(unsigned int id, ClientStats_t *stats);
void dump_network(void);
/*
* Globals
......
/*
* EMULAB-COPYRIGHT
* Copyright (c) 2002 University of Utah and the Flux Group.
* Copyright (c) 2002, 2003 University of Utah and the Flux Group.
* All rights reserved.
*/
......@@ -34,6 +34,30 @@ static event_handle_t ehandle;
static int gotevent;
static int clientnum;
static int
useclient(int clinum, char *buf)
{
char *cp;
int low, high;
cp = buf;
if (cp != NULL) {
while ((cp = strsep(&buf, ",")) != NULL) {
if (sscanf(cp, "%d-%d", &low, &high) == 2) {
if (clinum >= low && clinum <= high)
return 1;
continue;
}
if (sscanf(cp, "%d", &low) == 1) {
if (clinum == low)
return 1;
continue;
}
}
}
return 0;
}
/*
* type==START
* STAGGER=N PKTTIMEOUT=N IDLETIMER=N READAHEAD=N INPROGRESS=N REDODELAY=N
......@@ -45,6 +69,8 @@ parse_event(Event_t *event, char *etype, char *buf)
{
char *cp;
int val;
char str[STRSIZE+1];
int skipping = 0;
memset(event, -1, sizeof *event);
......@@ -58,90 +84,145 @@ parse_event(Event_t *event, char *etype, char *buf)
cp = buf;
if (cp != NULL) {
while ((cp = strsep(&buf, " ")) != NULL) {
/*
* Hideous Hack Alert!
*
* Assume hostname is of the form 'c-<num>.<domain>'
* and use <num> to determine our client number.
* We use that number and compare to the useclients
* string to determine if we should process this event.
* If not, we ignore this event.
*/
if (sscanf(cp, "USECLIENTS=%s", str) == 1) {
if (!useclient(clientnum, str))
skipping = 1;
else
skipping = 0;
continue;
}
if (sscanf(cp, "SKIPCLIENTS=%s", str) == 1) {
if (useclient(clientnum, str)) {
gotevent = 0;
return 0;
}
continue;
}
if (skipping)
continue;
if (sscanf(cp, "STARTDELAY=%d", &val) == 1) {
event->data.start.startdelay = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "STARTAT=%d", &val) == 1) {
event->data.start.startat = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "PKTTIMEOUT=%d", &val) == 1) {
event->data.start.pkttimeout = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "IDLETIMER=%d", &val) == 1) {
event->data.start.idletimer = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "CHUNKBUFS=%d", &val) == 1) {
event->data.start.chunkbufs = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "WRITEBUFMEM=%d", &val) == 1) {
event->data.start.writebufmem = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "MAXMEM=%d", &val) == 1) {
event->data.start.maxmem = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "READAHEAD=%d", &val) == 1) {
event->data.start.readahead = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "INPROGRESS=%d", &val) == 1) {
event->data.start.inprogress = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "REDODELAY=%d", &val) == 1) {
event->data.start.redodelay = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "IDLEDELAY=%d", &val) == 1) {
event->data.start.idledelay = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "SLICE=%d", &val) == 1) {
event->data.start.slice = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "ZEROFILL=%d", &val) == 1) {
event->data.start.zerofill = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "RANDOMIZE=%d", &val) == 1) {
event->data.start.randomize = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "NOTHREADS=%d", &val) == 1) {
event->data.start.nothreads = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "DOSTYPE=%d", &val) == 1) {
event->data.start.dostype = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "DEBUG=%d", &val) == 1) {
event->data.start.debug = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "TRACE=%d", &val) == 1) {
event->data.start.trace = val;
gotevent = 1;
continue;
}
if (sscanf(cp, "TRACEPREFIX=%s", str) == 1) {
strncpy(event->data.start.traceprefix,
str, STRSIZE-1);
gotevent = 1;
continue;
}
if (sscanf(cp, "EXITSTATUS=%d", &val) == 1) {
event->data.stop.exitstatus = val;
gotevent = 1;
continue;
}
/*
* Hideous Hack Alert!
*
* Assume our hostname is of the form 'c-<num>.<domain>'
* and use <num> to determine our client number.
* We use that number and compare to the maxclients
* field to determine if we should process this event.
* If not, we ignore this event.
*/
if (sscanf(cp, "MAXCLIENTS=%d", &val) == 1) {
if (clientnum >= val) {
gotevent = 0;
return 0;
#ifdef DOLOSSRATE
{
double plr;
if (sscanf(cp, "PLR=%lf", &plr) == 1) {
event->data.start.plr = plr;
gotevent = 1;
continue;
}
continue;
}
#endif
}
}
gotevent = 1;
return 0;
}
......
/*
* EMULAB-COPYRIGHT
* Copyright (c) 2002 University of Utah and the Flux Group.
* Copyright (c) 2002, 2003 University of Utah and the Flux Group.
* All rights reserved.
*/
#define STRSIZE 64
/*
* Event defs
*/
......@@ -12,9 +14,12 @@ typedef struct {
union {
struct {
int startdelay; /* range in sec of start interval */
int startat; /* start time (alt to startdelay) */
int pkttimeout; /* packet timeout in usec */
int idletimer; /* idle timer in pkt timeouts */
int chunkbufs; /* max receive buffers */
int writebufmem;/* max disk write buffer memory */
int maxmem; /* max total memory */
int readahead; /* max readahead in packets */
int inprogress; /* max packets in progress */
int redodelay; /* redo delay in usec */
......@@ -26,6 +31,11 @@ typedef struct {
int dostype; /* DOS partition type to set */
int debug; /* debug level */
int trace; /* tracing level */
char traceprefix[STRSIZE];
/* prefix for trace output file */
#ifdef DOLOSSRATE
double plr;
#endif
} start;
struct {
int exitstatus;
......
/*
* EMULAB-COPYRIGHT
* Copyright (c) 2000-2002 University of Utah and the Flux Group.
* Copyright (c) 2000-2003 University of Utah and the Flux Group.
* All rights reserved.
*/
......@@ -20,17 +20,28 @@
#include <signal.h>
#include <errno.h>
#include "decls.h"
#include "utils.h"
#ifdef DOLOSSRATE
#define LOSSONSENDER
#define LOSSONRECVER
#endif
#ifdef DOLOSSRATE
extern int lossrate;
#endif
#ifdef STATS
#ifdef DOLOSSRATE
unsigned long rpackets, rpacketslost;
unsigned long spackets, spacketslost;
#endif
unsigned long nonetbufs;
#define DOSTAT(x) (x)
#else
#define DOSTAT(x)
#endif
/* Time in usec to delay waiting for more buffer space on sends */
#define NOBUF_DELAY 1000
/* Max number of times to attempt bind to port before failing. */
#define MAXBINDATTEMPTS 10
......@@ -42,6 +53,24 @@ struct in_addr myipaddr;
static int nobufdelay = -1;
int broadcast = 0;
void
dump_network(void)
{
#ifdef DOLOSSRATE
if (lossrate == 0)
return;
if (spacketslost)
fprintf(stderr, "Lost %lu of %lu send packets (%.2f%%)\n",
spacketslost, spackets,
(double)spacketslost * 100 / spackets);
if (rpacketslost)
fprintf(stderr, "Lost %lu of %lu recv packets (%.2f%%)\n",
rpacketslost, rpackets,
(double)rpacketslost * 100 / rpackets);
#endif
}
static void
CommonInit(void)
{
......@@ -55,10 +84,14 @@ CommonInit(void)
pfatal("Could not allocate a socket");
i = SOCKBUFSIZE;
setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &i, sizeof(i));
if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &i, sizeof(i)) < 0)
pwarning("Could not increase send socket buffer size to %d",
SOCKBUFSIZE);
i = SOCKBUFSIZE;
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &i, sizeof(i));
if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &i, sizeof(i)) < 0)
pwarning("Could not increase recv socket buffer size to %d",
SOCKBUFSIZE);
name.sin_family = AF_INET;
name.sin_port = htons(portnum);
......@@ -123,13 +156,6 @@ CommonInit(void)
&i, sizeof(i)) < 0)
pfatal("setsockopt(SOL_SOCKET, SO_BROADCAST)");
}
else if (mcastif.s_addr) {
/*
* Overload this. In unicast mode, use this as our
* outgoing interface. Useful when multihomed.
*/
myipaddr.s_addr = mcastif.s_addr;
}
/*
* We use a socket level timeout instead of polling for data.
......@@ -142,11 +168,13 @@ CommonInit(void)
pfatal("setsockopt(SOL_SOCKET, SO_RCVTIMEO)");
/*
* We add our (unicast) IP addr to every outgoing message.
* This is going to be used to return replies to the sender,
* where appropriate.
* If a specific interface IP is specified, use that to
* tag our outgoing packets. Otherwise we use the IP address
* associated with our hostname.
*/
if (!myipaddr.s_addr) {
if (mcastif.s_addr)
myipaddr.s_addr = mcastif.s_addr;
else {
if (gethostname(buf, sizeof(buf)) < 0)
pfatal("gethostname failed");
......@@ -157,11 +185,10 @@ CommonInit(void)
}
/*
* Compute the out of buffer space delay
* Compute the out of buffer space delay.
*/
if (nobufdelay < 0)
nobufdelay = sleeptime(NOBUF_DELAY,
"out of socket buffer space delay");
nobufdelay = sleeptime(100, NULL, 1);
}
int
......@@ -187,13 +214,39 @@ ServerNetInit(void)
*
* The amount of data received is determined from the datalen of the hdr.
* All packets are actually the same size/structure.
*
* Returns 0 for a good packet, 1 for a bad packet, -1 on timeout.
*/
int
PacketReceive(Packet_t *p)
{
struct sockaddr_in from;
int mlen, alen;
#ifdef DOLOSSRATE
#ifdef LOSSONRECVER
struct timeval now, then;
if (lossrate) {
/*
* XXX cannot rely on socket timeout value since we need to
* treat received and dropped packets as though they never
* arrived. This is still not correct as a receive timeout
* could still be up to twice as long as it should be, but
* I don't want to mess with the socket timeout on every
* recv call.
*/
gettimeofday(&then, 0);
if ((then.tv_usec += PKTRCV_TIMEOUT) >= 1000000) {
then.tv_sec++;
then.tv_usec -= 1000000;
}
again:
gettimeofday(&now, 0);
if (timercmp(&now, &then, >=))
return -1;
}
#endif
#endif
alen = sizeof(from);
bzero(&from, alen);
if ((mlen = recvfrom(sock, p, sizeof(*p), 0,
......@@ -203,11 +256,36 @@ PacketReceive(Packet_t *p)
pfatal("PacketReceive(recvfrom)");
}
if (mlen != sizeof(p->hdr) + p->hdr.datalen)
fatal("PacketReceive: Bad message length %d!=%d",
mlen, p->hdr.datalen);
/*
* Basic integrity checks
*/
if (mlen < sizeof(p->hdr) + p->hdr.datalen) {
log("Bad message length (%d != %d)",
mlen, p->hdr.datalen);
return 1;
}
if (p->hdr.srcip != from.sin_addr.s_addr) {
log("Bad message source (%x != %x)",
ntohl(from.sin_addr.s_addr), ntohl(p->hdr.srcip));
return 1;
}
#ifdef DOLOSSRATE
#ifdef LOSSONRECVER
DOSTAT(rpackets++);
if (lossrate && random() < lossrate) {
/* XXX hack: don't lose join/leave messages, screws stats */
if (p->hdr.subtype != PKTSUBTYPE_JOIN &&
p->hdr.subtype != PKTSUBTYPE_LEAVE &&
p->hdr.subtype != PKTSUBTYPE_LEAVE2) {
DOSTAT(rpacketslost++);
goto again;
}
}
#endif
#endif
return p->hdr.datalen;
return 0;
}
/*
......@@ -223,6 +301,21 @@ PacketSend(Packet_t *p, int *resends)
struct sockaddr_in to;
int len, delays;
#ifdef DOLOSSRATE
#ifdef LOSSONSENDER
DOSTAT(spackets++);
if (lossrate && random() < lossrate) {
/* XXX hack: don't lose join/leave messages, screws stats */
if (p->hdr.subtype != PKTSUBTYPE_JOIN &&
p->hdr.subtype != PKTSUBTYPE_LEAVE &&
p->hdr.subtype != PKTSUBTYPE_LEAVE2) {
DOSTAT(spacketslost++);
return;
}
}
#endif
#endif
len = sizeof(p->hdr) + p->hdr.datalen;
p->hdr.srcip = myipaddr.s_addr;
......@@ -314,6 +407,13 @@ PacketValid(Packet_t *p, int nchunks)
p->msg.request.block+p->msg.request.count > CHUNKSIZE)
return 0;
break;
case PKTSUBTYPE_PREQUEST:
if <