- 30 Mar, 2018 2 commits
-
-
Mike Hibler authored
We have had issues with uploading images to boss where they are then written across NFS to ops. That seems to be a network hop too far on CloudLab Utah where we have a 10Gb control network. We get occasional transient timeouts from somewhere in the TCP code. With the convoluted path through real and virtual NICs, some with offloading, some without, packets wind up getting out of order and someone gets far enough behind to cause problems. So we work around it. If IMAGEUPLOADTOFS is defined in the defs-* file, we will run a frisbee master server on the fs (ops) node and the image creation path directs the nodes to use that server. There is a new hack configuration for the master server, "upload-only", which is extremely specific to ops: it validates the upload with the boss master server and, if allowed, fires up an upload server for the client to talk to. The image will thus be uploaded directly to the local (ZFS) /proj or /groups filesystems on ops. This seems to be enough to get around the problem. Note that we could allow this master server to serve downloads as well to avoid the analogous problem in that direction, but to date this has not been a problem. NOTE: the ops node must be in the nodes table in the DB or else boss will not validate proxied requests from it. The standard install procedure is supposed to add ops, but we have a couple of clusters where it is not in the table!
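The "upload-only" flow described above can be sketched roughly as follows. This is an illustration only, not the frisbee master server code: `UploadRequest`, `handle_upload_only`, and the two callbacks are hypothetical names, and returning a port number for the client is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UploadRequest:
    """Hypothetical shape of a client upload request seen by the ops master server."""
    node_id: str
    image_path: str


def handle_upload_only(
    request: UploadRequest,
    validate_with_boss: Callable[[UploadRequest], bool],
    start_upload_server: Callable[[str], int],
) -> Optional[int]:
    """Proxy the validation to boss, then start a local upload server if allowed.

    Returns the port of the upload server (an assumed convention), or None
    when boss refuses the request.
    """
    # Step 1: ops does not decide on its own; it asks the boss master server.
    if not validate_with_boss(request):
        return None
    # Step 2: boss allowed it, so serve the upload locally on ops.
    return start_upload_server(request.image_path)
```

With this shape, a request that boss rejects never causes an upload server to be started on ops, which matches the note that boss must be able to validate proxied requests from the ops node.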
-
Leigh B Stoller authored
-
- 29 Mar, 2018 5 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
-
Mike Hibler authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
1) Rework so that instead of relying on swapin_last + autoswap timeout, we set expt_expires for classic experiments at the beginning of swapin. This is because swapin_last is not set until the end of swapin, so during swapin the reservation system is in an inconsistent state, since there is no way to determine when the experiment ends. 2) On the Geni path, simplify expiration handling: do not allow a slice modification and an expiration change at the same time. The bookkeeping and failure rollback are a pain, especially wrt the reservation system, and this rarely happens in practice, so get rid of a lot of complication.
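A minimal sketch of the difference in point 1, assuming the autoswap timeout is expressed in minutes (an assumption; the unit is not stated here). The function names are hypothetical and do not correspond to the actual Emulab code.

```python
from datetime import datetime, timedelta
from typing import Optional


def expiry_from_swapin_last(
    swapin_last: Optional[datetime], autoswap_minutes: int
) -> Optional[datetime]:
    """Old approach: the expiry is derived from swapin_last.

    swapin_last is only set at the end of swapin, so while swapin is
    still running it is None and the expiry cannot be determined.
    """
    if swapin_last is None:
        return None
    return swapin_last + timedelta(minutes=autoswap_minutes)


def expiry_at_swapin_start(
    swapin_start: datetime, autoswap_minutes: int
) -> datetime:
    """New approach: compute expt_expires once, when swapin begins."""
    return swapin_start + timedelta(minutes=autoswap_minutes)
```

The second function always yields a value, so a reservation check made during swapin has a definite end time to work with.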
-
- 28 Mar, 2018 6 commits
-
-
Mike Hibler authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
- 27 Mar, 2018 1 commit
-
-
David Johnson authored
-
- 26 Mar, 2018 14 commits
-
-
David Johnson authored
Also assume debian9 == ubuntu16. Also fix some ubuntu15 install bugs.
-
Leigh B Stoller authored
request (and the https code calls die) we can catch it and return a proper error response.
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
infrastructure switch
-
Leigh B Stoller authored
-
Leigh B Stoller authored
logfile metadata.
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
- 22 Mar, 2018 2 commits
-
-
Mike Hibler authored
At least I think the problem is that we are doing inadvertent TRIM operations on large (480GB) SSDs. For one, when creating an ext4 filesystem on such a blockstore, we specify "nodiscard". I toyed with the idea of turning off "issue_discards" for the lvremove operations when a blockstore is destroyed, but that led to old metadata being seen when the blockstore was re-created. That led to the last change, which was to force metadata zeroing when we do an lvcreate of a blockstore.
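As a rough illustration of the two changes, the sketch below only builds the command lines and does not run them. `-E nodiscard` is a standard mke2fs extended option and `--zero y` is a standard lvcreate option that zeroes the start of a new LV; whether the actual change used `--zero`, `--wipesignatures`, or something else is not stated, so that choice is an assumption, as are the helper names and the example volume names.

```python
from typing import List


def mkfs_ext4_command(device: str) -> List[str]:
    """Build an mkfs.ext4 invocation that skips the discard (TRIM) pass."""
    return ["mkfs.ext4", "-E", "nodiscard", device]


def lvcreate_command(volume_group: str, name: str, size_gb: int) -> List[str]:
    """Build an lvcreate invocation that zeroes the start of the new LV.

    Zeroing on create keeps stale metadata from a previously removed
    blockstore from being seen (assumed mapping to the commit's fix).
    """
    return [
        "lvcreate",
        "--zero", "y",
        "--size", f"{size_gb}g",
        "--name", name,
        volume_group,
    ]


if __name__ == "__main__":
    # Hypothetical volume group and LV names, for illustration only.
    print(" ".join(lvcreate_command("vg0", "bs1", 100)))
    print(" ".join(mkfs_ext4_command("/dev/vg0/bs1")))
```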
-
Mike Hibler authored
-
- 19 Mar, 2018 1 commit
-
-
Leigh B Stoller authored
-
- 18 Mar, 2018 1 commit
-
-
Leigh B Stoller authored
-
- 14 Mar, 2018 3 commits
-
-
Leigh B Stoller authored
something different in the Portal. Ditto when we fail on an empty testbed, although it appears we never get that anymore.
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
- 13 Mar, 2018 1 commit
-
-
Leigh B Stoller authored
-
- 09 Mar, 2018 4 commits
-
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
nodes (no switch in the middle). Handy for testing, not really something we expect people to do.
-
Leigh B Stoller authored
-