- 30 Mar, 2018 1 commit
Mike Hibler authored
We have had issues with uploading images to boss, where they are then written across NFS to ops. That seems to be a network hop too far on CloudLab Utah, where we have a 10Gb control network. We get occasional transient timeouts from somewhere in the TCP code. With the convoluted path through real and virtual NICs, some with offloading and some without, packets wind up out of order and someone gets far enough behind to cause problems. So we work around it. If IMAGEUPLOADTOFS is defined in the defs-* file, we run a frisbee master server on the fs (ops) node, and the image creation path directs the nodes to use that server. There is a new hack configuration for the master server, "upload-only", which is extremely specific to ops: it validates the upload with the boss master server and, if allowed, fires up an upload server for the client to talk to. The image is thus uploaded directly to the local (ZFS) /proj or /groups filesystem on ops. This seems to be enough to get around the problem. Note that we could allow this master server to serve downloads as well, to avoid the analogous problem in that direction, but to date that has not been a problem. NOTE: the ops node must be in the nodes table in the DB, or else boss will not validate proxied requests from it. The standard install procedure is supposed to add ops, but we have a couple of clusters where it is not in the table!
-
- 30 Jan, 2017 1 commit
Mike Hibler authored
I ALMOST got sucked into making big changes that would have required a version change.
-
- 19 Jan, 2017 1 commit
Mike Hibler authored
My "shortcut" to enable a heartbeat via a client-side command line proved to be untenable. There are just too many places where we fire off the client and getting the right heartbeat interval value to all those places would have been...challenging. So back to the original plan of having a server-side command line option and letting the server tell the client when/what to report. This limits the changes to just the frisbee master server, in particular I now just have to get the value to master server instances running on the subbosses (not done yet, just hardwiring a value for now). All this said, I still had to modify the various places we invoke the frisbee client to add an option to enable the heartbeat, but at least I didn't need to know a specific value.
-
- 17 Jan, 2017 1 commit
Mike Hibler authored
There are three pieces here: a change to the frisbee protocol itself, an Emulab event component to get status back to the portal, and the surrounding infrastructure to make it all work.
Frisbee heartbeat messages: Added a new message type to the frisbee protocol, "Progress". In theory, the server sends a multicast progress request to its clients. The request includes an interval at which to report (or "just once") and an indication of what to report (nothing, a progress summary, or full stats). The client then sends unicast "fire and forget" UDP replies according to that schedule. However, I took a shortcut for the moment and just added a command line option to the client to tell it to report a summary at the indicated interval (-H <interval>), so the server never sends requests. This is implemented in the client by a fourth thread, since I wanted it to operate independently of packet reception (which would cause clients to report in a highly synchronized fashion due to multicast). The server instance just logs progress reports into its log. This protocol addition should be fully backward compatible, as both client and server ignore (but log) unknown messages.
Emulab progress report events: When this is compiled in (-DEMULAB_EVENTS) and turned on (-E <server>), the frisbee server instances send a FRISBEEPROGRESS event to the indicated event server for every progress report they receive (in addition to logging the reports to their own log). Right now the event carries key/value pairs for the information in a client summary reply:
- TSTAMP is the client's time at which it sends the report. It could be used by the receiver to determine the latency of the report if it cared (and if it assumed that the clocks are in sync). We don't care about this.
- SEQUENCE is the report number. Again, it could be used by the receiver, in this case to detect loss, if it cared. We don't.
- CHUNKS_RECV is the number of complete chunks that the client has received from the network.
- CHUNKS_DECOMP is the number of chunks decompressed by the client.
- BYTES_WRITTEN is the number of bytes written to disk by the client.
Any of the three counters can be used by the event receiver as an indication of life and/or progress. However, only the last would be a reasonable indicator of time remaining, since writing is the last (and slowest) phase of imaging. To estimate time remaining, we could compare that value to the amount of uncompressed data in the image. This makes the sketchy assumptions that disk write times are uniform and that the number and distance of seeks are uniform, but it is better than a sharp stick in the eye.
Emulab infrastructure: There is a new sitevar "images/frisbee/heartbeat" which can be set to a non-zero value to tell the frisbee MFS to fire off frisbee with -H <value> and thus make reports. The default value of zero means to not make reports. The tmcd "loadinfo" command passes this through via the HEARTBEAT=<value> param. REQUIRED A TMCD VERSION BUMP TO 41.
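The time-remaining estimate described above can be sketched in C. This is a minimal illustration, not frisbee's code. `struct progress_report` and `estimate_seconds_remaining()` are hypothetical names, and the fields only mirror the FRISBEEPROGRESS keys. The linear extrapolation also inherits the same uniform-write-rate assumption.

```c
#include <stdint.h>

/* Hypothetical in-memory form of one client summary report. Field names
 * mirror the FRISBEEPROGRESS event keys, not frisbee's wire format. */
struct progress_report {
    double   tstamp;        /* TSTAMP: client time (seconds) at send */
    uint32_t sequence;      /* SEQUENCE: report number */
    uint32_t chunks_recv;   /* CHUNKS_RECV: complete chunks received */
    uint32_t chunks_decomp; /* CHUNKS_DECOMP: chunks decompressed */
    uint64_t bytes_written; /* BYTES_WRITTEN: bytes written to disk */
};

/*
 * Estimate seconds remaining from two reports of the same client by
 * extrapolating the disk write rate toward the image's uncompressed size.
 * Both timestamps come from the same client clock, so clock sync with the
 * receiver does not matter here. Assumes a uniform write rate.
 * Returns -1.0 when no estimate is possible.
 */
double estimate_seconds_remaining(const struct progress_report *prev,
                                  const struct progress_report *cur,
                                  uint64_t image_uncompressed_bytes)
{
    double dt = cur->tstamp - prev->tstamp;

    if (cur->bytes_written >= image_uncompressed_bytes)
        return 0.0;                              /* all data written */
    if (dt <= 0.0 || cur->bytes_written <= prev->bytes_written)
        return -1.0;                             /* no measurable progress */

    double rate = (double)(cur->bytes_written - prev->bytes_written) / dt;
    double left = (double)(image_uncompressed_bytes - cur->bytes_written);
    return left / rate;
}
```

For example, if 100 MB of a 1000 MB image are written in the first 10 seconds, the estimate for the remaining 900 MB is 90 seconds.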
-
- 27 May, 2015 2 commits
Mike Hibler authored
-
Mike Hibler authored
Two different fixes here.
The first affects frisbeed ("the server") and frisuploadd ("the uploader"). In both, the master server was choosing the port to use as an obscure function of the current value of emulab_indicies.frisbee_index, without regard to whether the port was already in use by someone else. To fix this, the "-p <port>" option of both programs has been changed to allow a value of 0, indicating that the program (rather, the kernel) should choose an available port. It will also take a port range (e.g., "-p 50000-50100"), which says to find the first available port in that range. To communicate The Chosen Port back to the master server, there is a new option to frisbeed and frisuploadd, "-A <file>", which says to write the address info into the indicated file in <IP-addr>:<port> format. Note that we don't care about the <IP-addr> part, since that is just the multicast address (frisbeed) or our unicast address (frisuploadd) that we pass in to the program. The "Emulab configuration" of the master server uses the defs file FRISEBEEMCASTPORT and FRISEBEENUMPORTS vars to determine what to pass via the "-p" option. See the comment in defs-example. The "null configuration" (aka, on a subboss) just passes "-p 0" to frisbeed.
The second fix was an attempt to avoid port conflicts on the client side (frisbee). There is only so much we can do, since all clients of a multicast frisbee session have to use the same port, but we can avoid conflicts with other UDP apps that bind to INADDR_ANY:<port>. We make use of the REUSEADDR socket option and bind specifically to <mcaddr>:<port>. This also requires that the server multicast the JOIN reply that was previously unicast. Note that use of REUSEADDR will also allow multiple frisbee clients on the same host to be in the same session (not that we ever do that).
Since the server is typically updated whenever the Emulab software is, but the client is embedded in images and MFSes, there can be pretty much any combination of {old,new} server and {old,new} client in the field. So backward compatibility was essential, and there are a variety of implementation details related to that. See the comment in network.c::ClientNetInit().
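Here is a minimal sketch of the server-side port selection described above, shown for a UDP socket with POSIX sockets. It tries each port in the range until `bind()` succeeds, with port 0 letting the kernel choose. It then recovers the actual port with `getsockname()` and writes it out in the `<IP-addr>:<port>` form used by `-A <file>`. The helper names `bind_first_free_port()` and `write_addr_file()` are hypothetical, not functions from frisbeed or frisuploadd.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a UDP socket to addr and the first free port in [lo, hi].
 * lo == 0 asks the kernel to pick any free port (the "-p 0" case).
 * On success returns the socket and stores the bound port in *port_out;
 * returns -1 if no port in the range could be bound.
 */
int bind_first_free_port(struct in_addr addr, unsigned lo, unsigned hi,
                         unsigned *port_out)
{
    if (lo == 0)
        hi = 0;                         /* single attempt with port 0 */
    for (unsigned p = lo; p <= hi; p++) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_in sin;
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr = addr;
        sin.sin_port = htons((unsigned short)p);

        if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0) {
            socklen_t len = sizeof(sin);
            /* Learn which port we actually got (matters for port 0). */
            if (getsockname(fd, (struct sockaddr *)&sin, &len) == 0) {
                *port_out = ntohs(sin.sin_port);
                return fd;
            }
        }
        close(fd);                      /* in use or failed: try the next */
    }
    return -1;
}

/* Write "<IP-addr>:<port>" to path, the format described for "-A <file>". */
int write_addr_file(const char *path, struct in_addr addr, unsigned port)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%s:%u\n", inet_ntoa(addr), port);
    return fclose(fp) == 0 ? 0 : -1;
}
```

The master server would then read the file back to learn the chosen port rather than predicting it from a DB index.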
-
- 10 Feb, 2014 2 commits
Mike Hibler authored
It appeared to be before, but wasn't really. The -k option for both client and server will set the max socketbuf size in KB (NOTE: THIS USED TO BE MB!) The actual socketbuf size will then be the min of that and what the system supports. The client stats now include the sockbuf size of the run.
-
Mike Hibler authored
To avoid namespace conflicts (e.g., with libm's "log" function).
-
- 01 Nov, 2013 1 commit
Mike Hibler authored
Two changes: make sendto() call non-blocking (via MSG_DONTWAIT) so we get back EAGAIN if socket buffer is full, and turn on "extended error return reporting" (via IP_RECVERR) so we get back ENOBUFS when NIC send buffers are full.
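Together, the two changes amount to treating "buffer full" conditions as back-off signals instead of blocking. `MSG_DONTWAIT` and `IP_RECVERR` are real socket flags/options, but `IP_RECVERR` exists only on Linux, so this sketch compiles it conditionally. The `send_result` enum and the helper names are hypothetical, and the caller's actual back-off policy is not shown.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Outcome of one send attempt (hypothetical, for illustration). */
enum send_result { SEND_OK, SEND_RETRY, SEND_FAILED };

/* Linux-only: ask for extended error reporting so that a full NIC send
 * queue is reported back as ENOBUFS. A no-op elsewhere. */
int enable_recverr(int fd)
{
#ifdef IP_RECVERR
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
#else
    (void)fd;
    return 0;
#endif
}

/*
 * Send one datagram without blocking. A full socket buffer shows up as
 * EAGAIN/EWOULDBLOCK; full NIC send buffers can show up as ENOBUFS.
 * Both mean "back off and retry later" rather than a fatal error.
 */
enum send_result send_nonblocking(int fd, const void *buf, size_t len,
                                  const struct sockaddr *to, socklen_t tolen)
{
    ssize_t n = sendto(fd, buf, len, MSG_DONTWAIT, to, tolen);

    if (n >= 0)
        return SEND_OK;
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS)
        return SEND_RETRY;
    return SEND_FAILED;
}
```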
-
- 18 May, 2013 1 commit
Mike Hibler authored
The "-k" option (hey, it was the best available letter!) takes a size in MB with which the frisbee client and server will first try to size their UDP socket buffers. They may still wind up with a smaller size, depending on what the OS supports. Fixed the burst logging logic so that we correctly trace overrun conditions.
-
- 16 Nov, 2012 1 commit
Mike Hibler authored
Previously if you reverted an image to an older version, the check would not detect it and we would not update cached copies. Also, improve a couple of info messages.
-
- 24 Sep, 2012 1 commit
Eric Eide authored
This commit is intended to make the license status of Emulab and ProtoGENI source files more clear. It replaces license symbols like "EMULAB-COPYRIGHT" and "GENIPUBLIC-COPYRIGHT" with {{{ }}}-delimited blocks that contain actual license statements. This change was driven by the fact that today, most people acquire and track Emulab and ProtoGENI sources via git. Before the Emulab source code was kept in git, the Flux Research Group at the University of Utah would roll distributions by making tar files. As part of that process, the Flux Group would replace the license symbols in the source files with actual license statements. When the Flux Group moved to git, people outside of the group started to see the source files with the "unexpanded" symbols. This meant that people acquired source files without actual license statements in them. All the relevant files had Utah *copyright* statements in them, but without the expanded *license* statements, the licensing status of the source files was unclear. This commit is intended to clear up that confusion. Most Utah-copyrighted files in the Emulab source tree are distributed under the terms of the GNU Affero General Public License, version 3 (AGPLv3). Most Utah-copyrighted files related to ProtoGENI are distributed under the terms of the GENI Public License, which is a BSD-like open-source license. Some Utah-copyrighted files in the Emulab source tree are distributed under the terms of the GNU Lesser General Public License, version 2.1 (LGPL).
-
- 08 Aug, 2012 1 commit
Mike Hibler authored
-
- 08 Jul, 2012 1 commit
Mike Hibler authored
In at least the Linux 3.2 kernel on Ubuntu 12, setsockopt to set the socket buffer size does not return an error if you try to set a value higher than the kernel max. So we do an immediately following getsockopt to verify. This will prevent the server from over-driving the send socket (leading to re-requests of blocks from clients) for really high bandwidth values (i.e., with large burst sizes).
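Here is a minimal sketch of the set-then-verify pattern, using `SO_SNDBUF` (the receive side would use `SO_RCVBUF` the same way). `set_and_verify_sndbuf()` is a hypothetical name. On Linux, the kernel doubles the value to allow for bookkeeping overhead, and `getsockopt()` returns that doubled value (see socket(7)). A caller comparing the result against its request should allow for that.

```c
#include <sys/socket.h>

/*
 * Ask for a send buffer of `want` bytes, then read back what the kernel
 * actually granted. setsockopt() may clamp the value silently instead of
 * failing, so the getsockopt() result is what should drive burst sizing.
 * Returns the effective size reported by the kernel, or -1 on error.
 */
int set_and_verify_sndbuf(int fd, int want)
{
    int got = 0;
    socklen_t len = sizeof(got);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &got, &len) < 0)
        return -1;
    return got;
}
```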
-
- 19 Jun, 2012 1 commit
Mike Hibler authored
Add a "-Q <interval>" option to the master server to allow it to act as an IGMP v2 querier in environments where there is otherwise not one. It does essentially what the perl-based querier (code.google.com/p/perl-igmp-querier/) does, sending out a v2 membership query at the specified interval. This eliminates the need to run mrouted in some environments (e.g., elabinelab) just to issue IGMP queries.
As a result, all the boss-install and elabinelab setup related to using mrouted to perform this function has been removed. The elabinelab CONFIG_MROUTED option has been changed to CONFIG_QUERIER (the former is still recognized and mapped to the latter). The undocumented defs-* variable NEEDMROUTED has been changed to NEEDMCQUERIER (the former still exists in install/installvars.pm.in but is always set to 0) to more accurately reflect the variable's purpose. If NEEDMCQUERIER is set, then the mfrisbeed startup script is modified to add the "-Q 30" option.
The implementation of the client and server "-K <interval>" keep-alive option has been changed to directly send IGMP v2 membership reports containing the associated MC address. Note that the -K options have always been a hack to work around assorted IGMP-related misconfigurations and incompatibilities, and really should only be used as a last resort. As implemented, they could cause the host machine to be pruned out of other MC groups at the nearest switch, since they only report membership in the frisbee MC group. With the master server acting as an IGMP querier, instances of the frisbee server on that host should no longer need to do keep-alives. We still have one case where it is needed on the client side: a FreeBSD 8.x or later host connected to an IGMPv2-only switch. It appears that the IGMPv3 implementation added in FreeBSD 8.x always sends v3 reports, even when the default is configured (via sysctl or even by recompiling the kernel) as v2.
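For reference, an IGMPv2 message (RFC 2236) is 8 bytes long:

- type
- max response time (in tenths of a second)
- checksum
- group address

A general query has type 0x11 and group 0.0.0.0 and goes to 224.0.0.1. A v2 membership report has type 0x16 and carries the group being reported. The sketch below only builds the message. Actually sending it needs a raw `IPPROTO_IGMP` socket (hence root), TTL 1, and the IP Router Alert option that RFC 2236 calls for, none of which is shown. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IGMP_V2_QUERY  0x11   /* membership query */
#define IGMP_V2_REPORT 0x16   /* v2 membership report */

/* Internet checksum (RFC 1071) over an even-length buffer. */
static uint16_t inet_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);   /* big-endian words */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);        /* fold carries */
    return (uint16_t)~sum;
}

/*
 * Build an 8-byte IGMPv2 message in buf. group_net is the group address
 * in network byte order (0 for a general query). max_resp is in tenths
 * of a second and only meaningful for queries.
 */
void igmp_v2_build(uint8_t buf[8], uint8_t type, uint8_t max_resp,
                   uint32_t group_net)
{
    buf[0] = type;
    buf[1] = max_resp;
    buf[2] = buf[3] = 0;                 /* checksum computed over zero */
    memcpy(&buf[4], &group_net, 4);      /* already in network order */

    uint16_t ck = inet_cksum(buf, 8);
    buf[2] = (uint8_t)(ck >> 8);         /* store checksum big-endian */
    buf[3] = (uint8_t)(ck & 0xff);
}
```

Re-running `inet_cksum()` over a finished message yields 0, which is how a receiver validates it.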
-
- 26 Apr, 2012 1 commit
Mike Hibler authored
I had never completed this. Two things to note:
1. Distribution via broadcast is still disabled by default in the master server. To enable it, see the comment added in 3.mfrisbeed.sh.in. To use broadcast by default in the client, see the comment in rc.frisbee.
2. If you specify broadcast (-b) in either the client or server, then you should use "-m 255.255.255.255". However, this will broadcast to ALL interfaces on the client/server. To limit it to a specific interface, also include "-i <interface-IP>". This tells the client/server to look up that interface and use its subnet broadcast address in place of 255.255.255.255. Since the master server always starts up frisbeed instances with -i, broadcast will always be directed on the server. Since our rc.frisbee script also fires up the client with -i, it will likewise be directed.
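Directing the broadcast to one interface amounts to computing that interface's subnet broadcast address, which is the interface address with all host bits set. This minimal sketch assumes `getifaddrs()` is available, as it is on Linux and the BSDs, to find the netmask for the `-i <interface-IP>` address. The helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Subnet broadcast address: host bits all ones. Works directly on
 * network-byte-order values, since the operation is bitwise. */
struct in_addr subnet_broadcast(struct in_addr ifaddr, struct in_addr netmask)
{
    struct in_addr bcast;

    bcast.s_addr = ifaddr.s_addr | ~netmask.s_addr;
    return bcast;
}

/* Look up the netmask of the local interface that owns ifaddr.
 * Returns 0 and fills *mask_out on success, -1 if not found. */
int netmask_for_addr(struct in_addr ifaddr, struct in_addr *mask_out)
{
    struct ifaddrs *ifap, *ifa;
    int rv = -1;

    if (getifaddrs(&ifap) < 0)
        return -1;
    for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_netmask == NULL ||
            ifa->ifa_addr->sa_family != AF_INET)
            continue;
        struct sockaddr_in *sin = (struct sockaddr_in *)ifa->ifa_addr;
        if (sin->sin_addr.s_addr == ifaddr.s_addr) {
            *mask_out = ((struct sockaddr_in *)ifa->ifa_netmask)->sin_addr;
            rv = 0;
            break;
        }
    }
    freeifaddrs(ifap);
    return rv;
}
```

For example, an interface at 10.1.2.3 with netmask 255.255.255.0 broadcasts to 10.1.2.255.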
-
- 20 Mar, 2012 1 commit
Leigh B Stoller authored
the os subdir into the clientside/os dir. Mike, use the --follow argument to git log to traverse across the mv.
-
- 08 Oct, 2011 2 commits
Mike Hibler authored
This reverts commit fc89eb38. Checked in a bunch of crap that was unrelated.
-
Mike Hibler authored
When downloading an image, start the frisbeed process with the minimum set of gids necessary to access the image. This includes the unix gid of the project that the image is in and, optionally, the unix gid of the project subgroup if the image is part of one. Previously, we just used the gid set of the uid of the swapper of the experiment. Not only was this excessive, but it might also not include the gids needed in the case of a "global" image that is not in the world-readable /usr/testbed/images directory.
-
- 23 Aug, 2011 1 commit
Mike Hibler authored
Use select+recv instead. Hopefully will allow this to work on Cygwin.
-
- 08 Mar, 2011 1 commit
Mike Hibler authored
-
- 13 Jan, 2011 1 commit
Mike Hibler authored
Spending way too much time here, but this does have to work for elabinelab and subbosses, so there!
-
- 11 Jan, 2011 1 commit
Mike Hibler authored
More work on the hierarchical configuration for subboss. When doing host-based authentication, allow client to pass an explicit host (IP) to the mserver. If the mserver is configured to allow it, that IP is used for authenticating the request instead of the caller's IP. Add a default ("null") configuration so the mserver can operate out-of-the-box with no config file. The goal of these two changes is for an mserver instance with the default config and a proxy option to serve the needs of a subboss node (i.e., so no explicit configuration will be needed).
-
- 13 Dec, 2010 1 commit
Mike Hibler authored
proxy, then it will perform its checks against the hostIP provided rather than the IP of the message sender. For the Emulab subboss case, subboss nodes (as determined from the DB) are allowed to explicitly specify hosts that are under their control. The master server on real boss will run with proxying enabled. Also, in a fit of madness, I added a version number to the master server protocol. This way, if at some distant point in the future (say next week) I realize I screwed up the protocol, I can fix it without resorting to creative retrofitting of a version number (see imagezip) or the more Orwellian eradication and denial of past versions (what I am doing now). Furthermore, using an ill-thought-out insight, I made the version number be an ASCII string in case I decide to change to an all-text protocol at some equally distant point in the future.
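The proxy rule comes down to choosing which IP address the access checks are run against. In this minimal sketch all names are hypothetical. `proxy_ok_fn` stands in for whatever policy decides that a caller may speak for a host; in the Emulab case that would be a DB lookup of the nodes a subboss controls. A host IP of 0 means that no explicit host was given.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Policy hook: may `proxy` make requests on behalf of `host`? */
typedef bool (*proxy_ok_fn)(uint32_t proxy, uint32_t host);

/* Trivial example policy that trusts every proxy (for testing only). */
static bool allow_all_proxies(uint32_t proxy, uint32_t host)
{
    (void)proxy;
    (void)host;
    return true;
}

/*
 * Pick the IP used to authenticate a request.
 *   sender: IP the request arrived from
 *   hostip: explicit host IP carried in the request, or 0 if none
 * Returns 0 if the request must be rejected.
 */
uint32_t auth_ip_for_request(uint32_t sender, uint32_t hostip,
                             bool proxying_enabled, proxy_ok_fn proxy_ok)
{
    if (hostip == 0 || hostip == sender)
        return sender;                  /* ordinary, non-proxied request */
    if (!proxying_enabled || proxy_ok == NULL || !proxy_ok(sender, hostip))
        return 0;                       /* proxying not allowed: reject */
    return hostip;                      /* check access as that host */
}
```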
-
- 07 Dec, 2010 1 commit
Mike Hibler authored
The frisbee server (and client) return a special exit code if they cannot bind to the given port. Arrange for server startup to be retried if this happens.
-
- 06 Dec, 2010 1 commit
Mike Hibler authored
There are some hacks involved here right now since a unicast frisbee server currently can only support a single client. So for now, we only allow a single unicast server for any image (i.e., because the right way to do this is to fix the server to support multiple clients, not to start up multiple single-client servers).
-
- 03 Dec, 2010 1 commit
Mike Hibler authored
The master server can be started with a "parent" mserver that it can call if it doesn't have an image. The master server will return an EAGAIN type error to any clients that contact it while it is downloading an image from its parent. The client now has a "-B N" option to tell it to try again every N seconds as long as it gets back that error. This is the "store and forward" mode. The mserver also has a "cut through" style where it will return to clients the mcast info it got from its parent so they can download directly from the parent until the local mserver has it.
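On the client side, store-and-forward mode is a simple retry loop. In this sketch, `get_status`, `get_fn`, and `stub_get()` are hypothetical stand-ins for the GET request to the master server and its EAGAIN-type error. Only the loop structure reflects the "-B N" behavior described above.

```c
#include <unistd.h>

/* Hypothetical outcomes of one GET request to the master server. */
enum get_status { GET_OK, GET_TRYAGAIN, GET_ERROR };

/* Hypothetical: perform one GET request. */
typedef enum get_status (*get_fn)(void *ctx);

/* Example stub: reports TRYAGAIN until *remaining reaches zero. */
static enum get_status stub_get(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- > 0 ? GET_TRYAGAIN : GET_OK;
}

/*
 * While the master server says it is still fetching the image from its
 * parent, wait retry_secs and ask again. retry_secs == 0 disables retry.
 */
enum get_status get_with_backoff(get_fn do_get, void *ctx, unsigned retry_secs)
{
    enum get_status st;

    while ((st = do_get(ctx)) == GET_TRYAGAIN && retry_secs > 0)
        sleep(retry_secs);
    return st;
}
```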
-
- 02 Dec, 2010 1 commit
Mike Hibler authored
Add the ability of the master server to have a "parent" from which it can download an image if it doesn't have it or if the image is out of date. Had to add some more goo to the GET reply, notably a hash so that we can check for up-to-dateness. The actual part where we upcall to the parent isn't done yet, that is why this is "inching toward" and not "leaping and bounding toward"... Also redid the child process management to not use SIGCHLD, no need for that.
-
- 30 Nov, 2010 1 commit
Mike Hibler authored
Experiment: let's see if this helps revive stuck multicast sessions. With "-K <seconds>", the client will send an IGMP-leave/IGMP-join after <seconds> of no received packets.
-
- 24 Nov, 2010 1 commit
Mike Hibler authored
There are a couple of new packet types in the frisbee protocol which are exchanged via TCP with the master server: GETREQUEST and GETREPLY. The client passes to the master server an opaque imageid and a couple of options, and gets back the addr/port to use to actually download the image. The implementation of the master server is fragile and is more of a test framework; Grant is working on a more robust master server. I am mostly doing a backend that communicates with the Emulab DB to do its authentication, and making the client changes.
The client now uses the -S option to specify the IP address of the master server and the -F option to specify an imageid. If no error is returned, the image is downloaded using the returned addr/port. If -Q is used in place of -F, then the client makes a "status only" call, getting back info about whether the named image is accessible to the client and whether a server is currently running.
On the server side, the new master server (mserver.c) has an Emulab configuration "backend" that supports host-based authentication. The IP address of the caller is mapped to a node_id/pid/gid/eid combo that is used to determine access. On a request, the specified imageid is treated either as a pathname (if it starts with '/') or as an image identifier of the form "<pid>/<imagename>". If it is a pathname, we check that the pathname (after running it through "realpath") is contained in one of the directories accessible to that node in its current experiment context; i.e., /share, /proj/<pid>, /groups/<pid>/<gid>, or /users/<swapper-uid>. If it is an image identifier, the DB is queried to ensure that access is allowed to that image; i.e., it must be "global" or in the appropriate project/group.
The master server forks a frisbeed for each valid request, if one is not already running. The multicast address selection is still based on the emulab_indicies.frisbee_index field, but the address/port/server info is no longer stored in the frisbee_blobs table (frisbee_pid, load_address, load_busy are not set). Note that this is not yet integrated in the os_load path. Further work is required to replace frisbeelauncher.
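The two imageid forms and the pathname containment check might look like the following sketch. It makes several assumptions:

- `parse_imageid()` and `path_is_under()` are hypothetical names.
- The `<pid>/<imagename>` form is assumed to contain exactly one '/'.
- The caller is assumed to have already run the pathname through `realpath()`.

Comparing on a '/' boundary matters, because a plain prefix test would accept /proj/foobar as being inside /proj/foo.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Classify an imageid: a leading '/' means a pathname; otherwise expect
 * "<pid>/<imagename>" and copy the pid into pid_out (pid_len bytes).
 * Returns false for a malformed id.
 */
bool parse_imageid(const char *id, bool *is_path, char *pid_out, size_t pid_len)
{
    if (id[0] == '/') {
        *is_path = true;
        return true;
    }
    const char *slash = strchr(id, '/');
    if (slash == NULL || slash == id || slash[1] == '\0' ||
        strchr(slash + 1, '/') != NULL)
        return false;                     /* need exactly "<pid>/<name>" */

    size_t n = (size_t)(slash - id);
    if (n >= pid_len)
        return false;                     /* pid does not fit */
    memcpy(pid_out, id, n);
    pid_out[n] = '\0';
    *is_path = false;
    return true;
}

/* True if `resolved` (already canonicalized) lies inside directory `dir`. */
bool path_is_under(const char *resolved, const char *dir)
{
    size_t n = strlen(dir);

    if (n == 0 || strncmp(resolved, dir, n) != 0)
        return false;
    if (dir[n - 1] == '/')
        return true;                      /* dir given with trailing '/' */
    return resolved[n] == '/' || resolved[n] == '\0';
}
```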
-
- 29 Oct, 2010 1 commit
Mike Hibler authored
Basically, make it possible to transfer a non-imagezip image. Previously you had to wrap a regular file as an image in order to transfer it. The big hang-up was that the frisbee protocol could only transfer files that were a multiple of 1MB (the chunk size). This commit changes the frisbee protocol slightly to allow transfer of non-1MB-multiple files.
The protocol change was to add a new JOIN message that returns the size of the file in bytes rather than in blocks. This allows the client to know that the file in question is not a multiple of 1MB, to request the correct partial number of blocks for the final chunk, and to extract the correct amount of data from the final 1K block (that block is still padded to 1K by the server). On the server side, the new JOIN mostly allows it to do some sanity checking. The fact that the server is started with a file that is not a multiple of 1MB is what tells it about partial chunks. The sanity checking is that the server will not acknowledge clients that attempt to join with a version 1 JOIN message, since nothing good would come of that pairing. On the client side, frisbee must be invoked with the -N (nodecompress) option in order to issue a v2 JOIN. See the comment in the code for the rationale, but it is largely a backward compat feature.
While I was changing the JOIN message, I added a couple of other future features. One is that by passing back a 64-bit value for the size of the image in bytes, we can feed bigger images. However, there is still much to be done to realize this. The other was to add blocksize/chunksize fields to the message so that the server and client can negotiate the transfer parameters, e.g., 1024 blocks of 1024 bytes vs. 256 blocks of 8192 bytes, the latter being for "jumbo" packets on Gb Ethernet. But there is still more to be done to get this working too.
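Once the size is known in bytes, the client can work out the shape of a partial final chunk itself. This sketch shows that arithmetic, assuming the default 1KB blocks and 1024 blocks per 1MB chunk. The struct and function names are hypothetical. Consider a file of 2,621,540 bytes, which is two full chunks plus 524,388 bytes. It yields 3 chunks, 513 blocks to request for the last one, and 100 valid bytes in the final padded block.

```c
#include <stdint.h>

#define BLOCKSIZE   1024u                       /* bytes per block */
#define CHUNKBLOCKS 1024u                       /* blocks per chunk */
#define CHUNKSIZE   (BLOCKSIZE * CHUNKBLOCKS)   /* 1MB */

struct xfer_layout {
    uint64_t nchunks;            /* total chunks, including a partial one */
    uint32_t last_chunk_blocks;  /* blocks to request for the final chunk */
    uint32_t last_block_bytes;   /* valid bytes in the final padded block */
};

/* Derive the transfer layout from an exact file size in bytes (size > 0). */
struct xfer_layout layout_for_size(uint64_t size)
{
    struct xfer_layout l;
    uint64_t tail = size % CHUNKSIZE;           /* bytes in a partial chunk */

    l.nchunks = size / CHUNKSIZE + (tail ? 1 : 0);
    if (tail == 0)
        tail = CHUNKSIZE;                       /* final chunk is full */
    l.last_chunk_blocks = (uint32_t)((tail + BLOCKSIZE - 1) / BLOCKSIZE);
    l.last_block_bytes =
        (uint32_t)(tail - (uint64_t)(l.last_chunk_blocks - 1) * BLOCKSIZE);
    return l;
}
```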
-
- 28 Sep, 2009 1 commit
Mike Hibler authored
Support for jumbo packets. Setting WITH_JUMBO on the make command line will change the image block size to 8192 bytes and reduce the number of blocks per chunk to 256 (to maintain the 1MB chunk size for compat with old images). The default is still 1024.
Added the notion of a "dubious" chunk buffer in the client. If an incoming chunk buffer is marked as CHUNK_DUBIOUS, then its contents can be evicted and the buffer reused for a more promising chunk. This is a crude replacement mechanism that is currently only used in one place: if we miss part of a chunk and the server switches to sending a new chunk for which we have no free buffer, we switch to collecting the new chunk. The reasoning is that it will take a while for the server to switch back to completing the former chunk, during which time it may send one or more complete chunks that we could more fruitfully use (decompress and write out).
Changed the meaning of the "done" field for a chunk. It used to mean either that we have completely processed the chunk or that we are currently collecting it. It took additional work (scanning all chunk buffers) to differentiate these cases, so I made it explicit.
Allow the client and server to dynamically determine the maximum socket buffer size.
Fix a couple more on-the-wire data structure size/alignment issues that showed up on a 64-bit OS.
A few minor speedups to the bitmap handling code. Think: "rearranging deck chairs on the Titanic" here. We need more serious algorithmic changes to scale all this code going forward.
Add some more TRACE events and refine what is already there.
Added some hacks to allow the frisbee client and server to run on the same machine. We had made it remarkably hard to do this. But then again, why would you want to! Look for SAME_HOST_HACK in the makefile.
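The dubious-buffer rule can be sketched as a replacement choice made when a new chunk arrives. Only the CHUNK_DUBIOUS marking comes from the description above. The other state names, the struct layout, and `pick_chunk_buffer()` are hypothetical. Deciding when to mark a buffer dubious, as in the missed-chunk case, is left to the caller.

```c
#include <stdint.h>

/* Hypothetical chunk-buffer states; only CHUNK_DUBIOUS is from the text. */
enum chunk_state { CHUNK_EMPTY, CHUNK_FILLING, CHUNK_DUBIOUS };

struct chunkbuf {
    enum chunk_state state;
    int32_t chunkno;            /* chunk being collected, -1 if none */
};

/*
 * Find a buffer for a newly arriving chunk: prefer an empty buffer,
 * otherwise evict one marked dubious. Returns the index used, or -1 if
 * every buffer is busy with a chunk still worth collecting.
 */
int pick_chunk_buffer(struct chunkbuf *bufs, int nbufs, int32_t newchunk)
{
    int victim = -1;

    for (int i = 0; i < nbufs; i++) {
        if (bufs[i].state == CHUNK_EMPTY) {
            victim = i;                     /* best case: nothing lost */
            break;
        }
        if (bufs[i].state == CHUNK_DUBIOUS && victim < 0)
            victim = i;                     /* remember first dubious one */
    }
    if (victim >= 0) {
        bufs[victim].state = CHUNK_FILLING; /* old contents are discarded */
        bufs[victim].chunkno = newchunk;
    }
    return victim;
}
```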
-
- 21 Nov, 2006 1 commit
Mike Hibler authored
-
- 02 Dec, 2005 1 commit
Mike Hibler authored
and return a distinct error when we fail to get the port
-
- 11 May, 2005 1 commit
Mike Hibler authored
used to force the server to send an IGMP report if it doesn't receive any packets within <seconds> seconds. As long as the server is receiving packets, it won't send the report. What I'm not lovin' here is that to send a report I have to drop membership in the group (socket opt IP_DROP_MEMBERSHIP) and rejoin (IP_ADD_MEMBERSHIP). Simply trying to do an add membership doesn't work, because the kernel thinks you are already in the group and errors out. I'm hoping all the up and down activity doesn't make the switch behave any worse than it already does.
-
- 08 Mar, 2004 1 commit
Mike Hibler authored
-
- 14 Jan, 2004 1 commit
Mike Hibler authored
-
- 09 Apr, 2003 2 commits
Mike Hibler authored
hopefully never to return again!
-
Mike Hibler authored
1. Implement the PREQUEST message, which passes a bit map of desired blocks. We still use the REQUEST message (start block + number of blocks) for full-chunk requests, as that is more efficient. This message also includes a flag indicating whether it is a retry of a request we originally made or not. This gives the server more accurate loss info.
2. More stats and tracing goo.
Frisbee client:
1. Add 'C' and 'W' command line options to specify the amount of memory for chunk buffers (network buffering) and for write buffers (disk buffering). The Emulab frisbee startup script uses these to partition up all the available memory on a machine. Previously we were just using a fixed ~128MB even though our machines have 256 or 512MB of memory. Also add the 'M' option, which specifies the overall memory, internally dividing it up between chunk buffers and write buffers.
2. Add 'S' command line option to explicitly specify the server. This allows us to make a feeb...um, "lightweight" authentication check on incoming messages.
3. Use the common BlockMap data struct to track which pieces of a chunk we have received. This is easily inverted to make PREQUESTs, and it is also smaller than the older byte-per-block technique.
4. Allow partial request-ahead. Previously, we only issued request-ahead if there were enough empty chunk buffers for a maximum (2) request-ahead.
Frisbee server:
1. Use BlockMap for workQ elements. An easy way to allow a complete merge of incoming requests with existing ones.
2. Check for overlap of incoming requests with the request currently being serviced. This happens surprisingly often.
3. Dubious: burst gap becomes burst interval. The latter takes into account the time required to read data, etc.; in other words, we now have variable-sized gaps and put out bursts at specific times, rather than having fixed gaps and putting out bursts at variable times. This gives us more accurate pacing over shorter time periods. I thought this might be important for dynamic pacing.
4. Add 'W' command line option to specify a target bandwidth. Frisbeed will use this to calculate a burst size/interval.
5. Rewrote the dynamic pacing code. It is now easily as bad as before if not worse. But it does have fewer magic constants! Needs to be redone by someone who understands the TCP-friendly rate equation.
Imagezip:
1. Add 'R' option to specify one or more partitions for which to force raw (naive) compression even if the FS format is understood. Useful for benchmarking.
2. Add 'D' option to allow "dangerous" writes. In this mode, we don't do the fsyncs or retries of failed writes. Overrides the hack we put in for NFS. Use this if writing to a local filesystem (or /dev/null).
3. Eliminate an extra copy of every chunk header.
Imageunzip:
1. Eliminate an extra copy of decompressed data that we were doing between the single decompression buffer and the disk buffers. Helps on slow machines (like gatech's 300MHz machines with a 66MHz memory bus).
2. Allow a dynamic number of variable-sized write buffers. Total memory not to exceed the writebufmem limit. Previously we had a small number of fixed-size (256K) buffers.
3. Add debugging 'C' option to just compute a single CRC of the decompressed image. Back-ported to older imageunzip and used to make sure my write buffer changes were correct. Maybe handy for similar massive changes in the future.
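A per-chunk bitmap makes building a PREQUEST cheap, because inverting the "received" map gives the "wanted" map. This minimal sketch assumes one bit per block and 1024 blocks per chunk. It is not the actual BlockMap layout, and `blockmap_t` and its helpers are hypothetical. As noted above, a request for a whole chunk would still go out as a start/count REQUEST, which is more compact.

```c
#include <stdint.h>
#include <string.h>

#define CHUNKBLOCKS 1024                 /* blocks per 1MB chunk */
#define MAPBYTES    (CHUNKBLOCKS / 8)

/* One bit per block in a chunk; set = received. Illustrative layout. */
typedef struct { uint8_t map[MAPBYTES]; } blockmap_t;

static void bm_set(blockmap_t *bm, int block)
{
    bm->map[block / 8] |= (uint8_t)(1u << (block % 8));
}

static int bm_isset(const blockmap_t *bm, int block)
{
    return (bm->map[block / 8] >> (block % 8)) & 1;
}

/*
 * Invert a "received" map into a "wanted" map for a partial request and
 * return how many blocks are still missing.
 */
int bm_invert(const blockmap_t *have, blockmap_t *want)
{
    int missing = 0;

    for (int i = 0; i < MAPBYTES; i++)
        want->map[i] = (uint8_t)~have->map[i];
    for (int b = 0; b < CHUNKBLOCKS; b++)
        missing += bm_isset(want, b);
    return missing;
}
```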
-
- 26 Nov, 2002 1 commit
Mike Hibler authored
-