Forked from emulab / emulab-devel
Mike Hibler authored
We have had issues with uploading images to boss, where they are then written across NFS to ops. That seems to be a network hop too far on CloudLab Utah, where we have a 10Gb control network. We get occasional transient timeouts from somewhere in the TCP code. With the convoluted path through real and virtual NICs, some with offloading and some without, packets wind up arriving out of order and something falls far enough behind to cause problems.

So we work around it. If IMAGEUPLOADTOFS is defined in the defs-* file, we run a frisbee master server on the fs (ops) node, and the image creation path directs the nodes to use that server. There is a new (hack) master server configuration, "upload-only", which is extremely specific to ops: it validates the upload with the boss master server and, if allowed, fires up an upload server for the client to talk to. The image is thus uploaded directly to the local (ZFS) /proj or /groups filesystems on ops. This seems to be enough to get around the problem.

Note that we could allow this master server to serve downloads as well, to avoid the analogous problem in that direction, but to date that has not been a problem.

NOTE: the ops node must be in the nodes table in the DB, or else boss will not validate proxied requests from it. The standard install procedure is supposed to add ops, but we have a couple of clusters where it is not in the table!
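To make the "upload-only" flow easier to follow, here is a minimal Python sketch of the decision an upload-only master server on ops makes. It is NOT the frisbee master server code. Every name in it (Request, handle_upload_only, validate_with_boss, start_upload_server, ALLOWED_PREFIXES, the "method" field) is hypothetical. It assumes that boss returns a destination path only when the proxy is in the nodes table and the upload is allowed. The check restricting paths to /proj or /groups is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    client_ip: str  # node asking to upload its image
    image_id: str   # image being created
    method: str     # hypothetical field: "upload" or "download"


# Hypothetical callbacks (assumptions, not real frisbee APIs):
# ask boss to validate a proxied request; returns destination path or None.
Validator = Callable[[str, str, str], Optional[str]]
# start an upload server writing to `path`; returns the port for the client.
Launcher = Callable[[str], int]

# Assumption: uploads land only on the local ZFS filesystems on ops.
ALLOWED_PREFIXES = ("/proj/", "/groups/")


def handle_upload_only(req: Request, proxy_id: str,
                       validate_with_boss: Validator,
                       start_upload_server: Launcher) -> Optional[int]:
    """Sketch of an upload-only master server: validate via boss, then launch."""
    if req.method != "upload":
        return None  # upload-only: downloads are still served by boss
    path = validate_with_boss(proxy_id, req.client_ip, req.image_id)
    if path is None or not path.startswith(ALLOWED_PREFIXES):
        return None  # boss refused, or destination is not a local fs on ops
    return start_upload_server(path)


if __name__ == "__main__":
    nodes_table = {"ops"}  # boss only trusts proxies listed in the nodes table

    def fake_boss(proxy: str, client: str, image: str) -> Optional[str]:
        if proxy not in nodes_table:
            return None  # mirrors the NOTE: ops must be in the nodes table
        return f"/proj/myproj/images/{image}.ndz"

    def fake_launch(path: str) -> int:
        return 12345  # pretend port of the newly started upload server

    req = Request("10.1.1.2", "myimage", "upload")
    print(handle_upload_only(req, "ops", fake_boss, fake_launch))   # 12345
    print(handle_upload_only(req, "ops2", fake_boss, fake_launch))  # None
```

The second call shows why the nodes-table entry matters in this model: a proxy that boss does not recognize gets no validation, so no upload server is started.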