emulab-devel issues
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues

Issue #688: Allow explicit user snapshots of persistent blockstores (remote datasets)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/688
2024-02-02T15:31:57-07:00 | Mike Hibler

Right now, snapshots of a blockstore are only made when an experiment containing a node with a RW mapping of that blockstore is terminated. This might be a bit cumbersome for workflows in which the "master" of a dataset is being updated on a regular basis (instantiate, update, terminate).
We could allow users to make an explicit snapshot of a blockstore while a RW mapping exists. The workflow would be: the user clicks a button somewhere in the GUI, which invokes a script on boss; the script `ssh`s over to the node with the RW mapping and unmounts any filesystem associated with the dataset, then makes the DB call on boss to create a snapshot of the zvol, and finally `ssh`s to the node again to remount any filesystem.
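The boss-side sequence might look roughly like the following sketch. The node name, storage host, zvol path, and mount point are all illustrative placeholders, not the real Emulab scripts or layout:

```sh
#!/bin/sh
# Hypothetical sketch of the explicit-snapshot workflow; all names below
# are placeholders for whatever the real blockstore state says.
NODE=pc123                      # node holding the RW mapping
ZVOL=tank/datasets/dataset-42   # backing zvol on the storage server
MNT=/mnt/dataset                # mount point on the node

ssh "$NODE" "umount $MNT"       # quiesce the filesystem on the node
# Replace the single allowed snapshot: the destroy fails (and the new
# snapshot is then skipped, since the name still exists) if the old
# snapshot is in active use by a clone.
ssh storagehost "zfs destroy $ZVOL@latest; zfs snapshot $ZVOL@latest"
ssh "$NODE" "mount $MNT"        # resume use on the node
```

Using one fixed snapshot name maps naturally onto the "only the most recent snapshot" policy: replacement is simply destroy-then-recreate, and it degrades safely when the snapshot is busy.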
I would not allow more than a single snapshot, which would basically be "the most recent snapshot" that new RO and clone mappings use. If the user then creates another snapshot, it replaces the previous one if possible (i.e., if the snapshot is not in active use). I don't want to start accounting for multiple user snapshots and providing ways to map particular ones. I view this as just a shortcut for the terminate/reinstantiate model.

Issue #686: Handle blockstores "correctly" during experiment modify
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/686
2023-10-11T11:45:50-06:00 | Mike Hibler

Since @stoller got experiment modify working via the portal (have I mentioned how awesome this is?) I need to figure out how to handle blockstores in a sane manner.
First, it should be pretty easy to make this work for remote blockstores: we just have to detach from them before the modify and reattach to whatever is in the experiment afterward.
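On a Linux node using open-iscsi, the detach/reattach dance might look roughly like this (the target and portal values are illustrative placeholders, not what our setup scripts actually use):

```sh
# Hypothetical sketch: quiesce and detach a remote blockstore before
# modify, then reattach afterward. Target/portal values are placeholders.
TARGET=iqn.2000-10.net.emulab:dataset-42
PORTAL=10.254.254.1:3260

umount /mnt/dataset
iscsiadm -m node -T "$TARGET" -p "$PORTAL" --logout
# ... experiment modify runs ...
iscsiadm -m node -T "$TARGET" -p "$PORTAL" --login
mount /mnt/dataset
```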
Local blockstores are the issue. The cleanest approach would be to destroy any existing blockstores before the modify and then recreate them afterward. However, this would likely not sit well with users unless they were explicitly changing the blockstore configuration as part of their modify operation. If they were just, say, adding a node, then wiping out the local blockstores on all the other nodes is probably not a reasonable thing to do. Unfortunately, this may be what I do for the current "reconfig" target; I had better go fix that!
There is code in place today to save the current config at boot time and, on reboot, read in that config, adding or removing any blockstores that appear or disappear. But that is not going to work unless the reconfig involves a reboot. It also won't work if the boot disk is reloaded as part of the modify. We either need to store the configuration info in /proj, or maybe take advantage of the fact that volume managers like LVM and ZFS can reconstruct their config from on-disk info.
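For example, both ZFS and LVM can rediscover their configuration purely from metadata on the data disks, so a reloaded boot disk would need no saved state at all. A sketch, assuming the volumes were originally created by the blockstore setup scripts:

```sh
# Hypothetical recovery sketch after a boot-disk reload:
zpool import -a -f   # ZFS: scan all disks and import every pool found
vgscan               # LVM: rebuild the volume group metadata cache from disk
vgchange -a y        # LVM: activate all discovered volume groups
```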
Issue #683: Boss/ops hardware upgrades at Cloudlab Clemson
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/683
2023-06-09T08:45:59-06:00 | Mike Hibler

This is nearly identical to #669 and #670. A related issue is #629.
We need to do everything here:
* [ ] both: figure out hardware, either buying new machines or repurposing new-ish nodes.
* [ ] ops: make preliminary copy of current ops ZFS data over to new ops
* [ ] both: make preliminary copy of /usr/testbed over to new machines, making sure services are disabled on the new machines
* [ ] figure out what if any DB state needs to be updated to reflect the change
* [ ] schedule downtime
* [ ] make sure @stoller and Scott/Dennis are around to share the pain :-)
* [ ] make the final transition

Issue #680: A better grub2pxe
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/680
2023-04-17T11:34:46-06:00 | Mike Hibler

`grub2pxe` is our "Linux MFS" replacement for `pxeboot` from FreeBSD-world. Since it is the future (and has been for at least 10 years now), there are a couple of things that could/should be improved:
* We should get our changes synced up with some suitably recent version of mainstream Grub. In the past I have merged our changes (bootinfo, FreeBSD kernel loading fixes, TFTP improvements). My most recent attempt (last year sometime) resulted in a grub2pxe that didn't work right. So this needs to get sorted.
* Come up with the minimum set of grub2pxe-loaded `grub.cfg` files that will handle loading FreeBSD or Linux MFSes, FreeBSD or Linux on-disk images, and UEFI (GPT) or legacy BIOS (MBR) format images on all our node types with their various quirks. I started down this path as part of #233, meticulously testing all possible combos for each node type, but after about a month of off-and-on work, I gave up and just started tweaking the installed versions until they worked, without going back and integrating the changes and retesting working combos. All in the name of "let's just get this done already!" The result is a trail of highly similar versions that could be reduced and consolidated.
* Grub has support for network transports other than TFTP, in particular HTTP. Once grub2pxe has been loaded, we could download the kernel and MFS image much more efficiently with HTTP. For one, HTTP runs over TCP rather than UDP, so we get some congestion control. For another, there is likely a lot more web server development going on than TFTP server development, in particular for handling lots of simultaneous clients. Finally, putting a random-access disk-based boot loader on top of TFTP results in particularly horrible behavior when it has to seek: seeking backward means starting over at the beginning and doing sequential block-by-block transfers of data (that you throw away) until you reach the correct new location. Changing the boot loader's behavior to avoid backward seeks is one of the highly invasive, custom changes to Grub we have been carrying around.
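Once grub2pxe itself is running, switching the payload transfer to HTTP could be as simple as a `grub.cfg` fragment along these lines (the server address and paths are illustrative, not our actual layout):

```
# Hypothetical grub.cfg fragment: load the kernel and MFS over HTTP
# instead of TFTP. Server IP and paths are placeholders.
menuentry "Linux MFS (HTTP)" {
    linux  (http,198.51.100.10)/images/mfs/vmlinuz boot=live
    initrd (http,198.51.100.10)/images/mfs/initramfs.gz
}
```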
Issue #677: Cloudlab display problem in narrow window
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/677
2023-03-16T12:46:16-06:00 | Dan Reading | assigned to Leigh Stoller

Monitor is 1920x1200 in vertical orientation.

The Experiments, Storage, Docs, and <user> drop-downs are not visible when the window is 1121W x 1283H. If the window width is increased to 1541W, the drop-downs appear. If the browser window is decreased to 75%, the drop-downs also appear.

Issue #674: Unclean shutdown of iSCSI blockstores
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/674
2023-02-13T22:00:01-07:00 | Mike Hibler

When an experiment terminates, iSCSI blockstores with mounted filesystems do not get unmounted cleanly. Since the nodes in an experiment do not get rebooted (I don't think) until the node winds up in `reloading`, the blockstore shutdown script's unmounting of remote blockstores at that time will not work because the experiment VLANs have already been torn down. The result is that not all data may be synced to the storage server.

Issue #671: Enable SEV on Cloudlab AMD nodes
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/671
2022-12-22T12:21:14-07:00 | Mike Hibler

This only applies to Clemson `r6525` nodes right now, but I am creating this ticket to make note of what to do for future nodes.
To enable AMD SEV, enable the following (on Dell machines):
* IOMMU
* Kernel DMA Protection
* Secure Memory Encryption
* Secure Nested Paging
* SNP Memory Coverage
and set "Minimum SEV non-ES ASID" to a value greater than one. It appears to actually be a maximum, based on the description in [the vSphere docs](https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.security.doc/GUID-757E2B37-C9D0-416A-AA38-088009C75C56.html). They say to set it to N+1 if you want N VMs, so we could set it to something like 17 or 33, maybe.

Issue #666: Periodic retries for nodes in `hwdown`
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/666
2022-07-11T11:12:17-06:00 | Mike Hibler

We spend a considerable amount of time on an ongoing basis dealing with nodes in the `hwdown` experiment. By the time we diagnose such a node, quite often the problem has disappeared or we discover the problem is easily fixable. Worse, if we don't notice a node in there quickly enough, it can get pushed down the list by more recent failures and fall off our radar, winding up in `hwdown` for months.
So... @eeide asks, "Can we do better?" Maybe by periodically releasing nodes from `hwdown` to give them a second chance (which is often what we do manually when we don't have time to diagnose).
Note that this is mostly an issue for nodes of the less popular types (e.g., `pc3000`s). `hwdown`ing nodes of frequently used types (e.g., those with GPUs) will cause overbooking problems in the reservation system, or users will complain and we will act quickly.

Issue #665: Expiring projects and users
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/665
2022-06-23T13:59:23-06:00 | Leigh Stoller

Just a place to record some notes about what we want for this. @ricci mentioned that class projects should expire after some period of time. Some questions that come to mind:
- How long?
- Is the project deleted? We do not have project archiving at this time, and for history purposes we would need to add it.
- Are the users deleted or made inactive? Probably not the leader.

Issue #660: A better mechanism for moving data between concurrent experiments
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/660
2022-03-25T11:59:34-06:00 | Mike Hibler

Looking at some of our recent control-net traffic abuse, there seems to be a trend: people moving a large amount of data between nodes in different experiments. In one case, they have a semi-persistent single-node experiment, and as they get GPU nodes one at a time, they copy their data over to those. Then (I think) they copy the data back before the GPU node expires. In another case last night, they were moving data (I think) from the NVMe drives in an older experiment onto the NVMe drives in a newer experiment, I assume because the older experiment was going to expire soon. The former is probably about getting some continuity when they can only get GPU nodes for short periods of time. The latter is maybe because the two experiments overlap and they cannot use a persistent dataset RW in two experiments at once.
Anyway, there does seem to be a desire to move data between concurrent experiments. Right now, the easiest way to do that is over the control net, either directly with `scp` or indirectly with NFS (/proj). I am pondering whether there is a better way. Possibilities:
* Use a shared vlan between the experiments where they could just do their scp over an experiment net.
* Expose a shared filesystem abstraction via the "blockstore" mechanism. Again that would use the experiment fabric, but would put load on shared infrastructure.
* Eliminate the need for multiple experiments by making it easy to add and remove nodes from an experiment. Then they could have something like an `m400` "NFS server" node (with a blockstore) for their data and add/subtract "good" nodes in a LAN to do the actual work.

Issue #659: Fix up Apt node BIOSes
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/659
2022-03-02T08:58:19-07:00 | Mike Hibler

While attempting to collect hwinfo from all Apt nodes, I noticed two anomalies with the Apt (in particular `r320`) nodes.
* No setup password set
* The lifecycle controller is disabled; did we intentionally do that?
Because of the first, we should make a pass over the BIOSes and make sure they are configured as we expect.

Issue #658: Time synchronization at Cloudlab clusters
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/658
2022-02-24T12:14:21-07:00 | Mike Hibler

A recent question on the users list asked about time synchronization between the clusters, which got me thinking about this again.
All of the nodes at a cluster use a local NTP time server (`ntp1`), which by convention is `ops`. We also stash away the "drift" value from each node (via the watchdog) and use the latest saved value to initialize the drift file when a node is imaged. The various cluster NTP servers use a range of upstream servers and NTP pools, but are not directly connected as peers. We seem to keep reasonable time between the cluster NTP servers at least, generally around 1-5ms.
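The client side of this amounts to something like the following `ntp.conf` fragment (a sketch; the server name and drift file path are assumptions, not our deployed config). The saved drift value would be written into the drift file at image time, before ntpd starts:

```
# Hypothetical client ntp.conf fragment: sync to the cluster server and
# persist the frequency-drift estimate that the watchdog reports upstream.
server ntp1 iburst
driftfile /var/db/ntpd.drift
```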
Some questions:
* Is saving/restoring the drift value still a good thing to do?
* Should we be using PTP?
* Any chance of getting a GPS receiver at the main clusters?
* Should we use `chrony`, which is "aimed at ordinary computers, which are unstable, go into sleep mode or have intermittent connection to the Internet. chrony is also designed for virtual machines, a much more unstable environment"? I think current Ubuntu images already use it.
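If we did switch, a minimal `chrony.conf` for a cluster node might look like this (a sketch; the server name and paths are assumptions, not a deployed config):

```
# Hypothetical chrony.conf for a testbed node:
server ntp1 iburst              # the local cluster NTP server
driftfile /var/lib/chrony/drift # persisted frequency estimate
makestep 1.0 3                  # step the clock at boot if offset > 1s
rtcsync                         # keep the hardware clock in sync
```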
At the very least, we should probably move the `ntp1` alias off of `ops` (which is a VM everywhere but Emulab) and onto the control node, where there would be a more stable clock.

Issue #657: Emulab storage servers acting flaky
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/657
2022-02-08T10:21:17-07:00 | Mike Hibler

During the recently completed upgrade of the storage box (#656), both of the SAS-attached storage servers exhibited flaky behavior. At one time or another, both rebooted suddenly, and during reboots (expected or not) both had a tendency to hang as the OS was coming up. Additionally, `dbox2` was showing some
```
Processor #0x2d Asserted IERR.
```
errors. I could find no documentation about this, but there were statements that it is likely an error detected by the processor rather than an error with the processor itself.

Issue #654: Speed up the Emulab database
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/654
2021-12-06T15:08:09-07:00 | Mike Hibler

It is becoming increasingly clear, if it wasn't already, that the database is one of our primary bottlenecks for allowing instantiation of large numbers of experiments at once. The options are either to speed up our mysql setup/schema or to switch to a different DB.
For the former, we can further attempt to optimize our MyISAM tables:
* https://dev.mysql.com/doc/refman/5.7/en/optimizing-myisam.html
which is straightforward but will provide minimal payoff. We could switch to InnoDB, which supports better parallelism, at a non-trivial conversion cost:
* https://dev.mysql.com/doc/refman/5.7/en/converting-tables-to-innodb.html
or we could try clustering or replication:
* https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster.html
* https://dev.mysql.com/doc/refman/5.7/en/replication.html
but I am not sure that those make sense in our environment, which doesn't need to scale _that_ far and has a very small footprint infrastructure-wise (one or two servers).
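The InnoDB path, at least, would be mostly mechanical: something like the following per table (the schema name here is a placeholder for illustration, not necessarily what our DB is called):

```sql
-- Find tables still on MyISAM (schema name is a placeholder):
SELECT table_name FROM information_schema.tables
 WHERE table_schema = 'tbdb' AND engine = 'MyISAM';

-- Convert one table; repeat for each, ideally during a maintenance window,
-- since the ALTER rebuilds the table and locks it while doing so:
ALTER TABLE experiments ENGINE = InnoDB;
```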
Switching databases would be a lot more work, with no guarantee of better performance. MariaDB:
* https://mariadb.org/
is a fork of mysql and claims to be faster/better/stronger. It would probably be the easiest to transition to. PostgreSQL:
* https://www.postgresql.org/
is more featureful and better for very large DBs, but seems like overkill for us. The transition is likely to be extremely painful as well.

Issue #651: Reservations and VTypes problem
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/651
2021-10-28T09:36:58-06:00 | Leigh Stoller

I noticed today that using "powder-compute" (a global vtype) does not play well with the reservation pre-checks in the mapper.

Issue #650: Update storage servers FreeNAS to the latest release (TrueNAS Core)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/650
2022-02-08T09:37:23-07:00 | Mike Hibler

Our current storage servers are running the dead-end FreeNAS 11 (FreeBSD 11 based). We need to update them to TrueNAS Core version 12 (FreeBSD 12 based). The biggest hurdle is that they have done away with REST API v1.0 in favor of 2.0, which is considerably different.
The storage servers that need upgrading:
* [x] Emulab dbox1 and dbox2
* [ ] Cloudlab Utah dbox2
* [ ] Cloudlab Clemson dbox
* [ ] Cloudlab Wisconsin dbox

Issue #649: Improve storage server disk usage
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/649
2022-02-08T14:24:23-07:00 | Mike Hibler

This comes from another ticket (#627) but is not specific to that storage server.
The general problem is that we run out of space well before what we think is the capacity of the zpool. I discovered that the ZFS zvol volume size (`-V` when creating) is a space limit from the perspective of the user, not a limit on how much space the zvol will consume. While knowing this at some level, I failed to appreciate just how much overhead ZFS can introduce on top of that, including RAID parity, checksums, and fragmentation due to misalignment of the various block sizes. Apparently the recommendation for iSCSI on zvols is to not allocate more than 50% of the capacity. In our most extreme example, we have a 15TB dataset that is using 21TB of disk space even though the overlaid Linux filesystem is only using 7TB. Potential improvements here:
* Use "thin" zvols. This really doesn't solve anything, though; it just allows us to overbook storage, and probably everyone will run out of space in a truly ugly way down the road.
* Turn on compression for the zvols. This is one way of implementing that "don't allocate the full disk capacity", but of course depends on the nature of the data and doesn't address the metadata overhead.
* Switch to using iSCSI volumes on top of ZFS filesystem files. This is what a number of people say you should do, as apparently it has fewer "misalignment" problems and performs better overall.
* Turn on the "discard" option when we create ext4 filesystems, so that they TRIM. This will help when the overlayed filesystem has a lot of free space.
* Use something other than ZFS? This would be a lot of work to implement at this point.

Issue #645: Making parameter sets more prominent
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/645
2021-09-29T12:06:43-06:00 | Robert Ricci | assigned to Leigh Stoller

Some ideas to make parameter sets more prominent. Not sure we should do all of them, but here is a list for brainstorming, in rough order of the experiment instantiation process:
- On the first step of the instantiate page and the profile picker, indicate which profiles the user has a paramset for
- In the parameterize step, give them an option to load parameter sets for the profile (and maybe those made by others that they have used recently?)
- On the finalize page, give them a button to save a paramset (probably presented as some kind of save/share option) (of course, only for a parameterized profile)
- On the experiment status page, something similar. This should happen even for failed experiments, to enable a use case where, if there weren't enough nodes or whatever, you have an easy way back later to swap in quickly without re-filling everything.
- A 'recents' menu somewhere (in the main menu in the header?) that takes you right to the middle of the instantiate wizard, with the profile selected and the params filled out

Issue #643: Make profile sharing more prominent
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/643
2021-09-24T14:49:05-06:00 | Robert Ricci | assigned to Leigh Stoller

Though users can share profiles publicly in a few different ways, they seem less aware of this than we would like. I have a few ideas on how we might improve this.
* ![Screenshot_2021-09-24_14-15-34](/uploads/62ccb4f71c836c79f387a99b7109c560/Screenshot_2021-09-24_14-15-34.png) I don't think the Share button is as prominent as it could be - it's currently at the bottom, and the same color as other buttons. I think it could stand to be a more distinct color (the bootstrap success color (green)?). Also, when a profile has a long description, it gets hidden off the bottom of the screen, so I think putting it in the box on the left would be better.
* If you don't own a profile, there is no clear indication that it *can* be made public. In the screenshot above, I'm looking at another project member's profile, and notice that this is not suggested to me at all. If, for example, a student is making profiles and a faculty member is taking care of releasing software, they may not even realize that it's possible to make a profile public. I'm not sure we want to allow project members to make each other's profiles public (though maybe project leads?), but maybe we could at least have a greyed-out 'make public' button with a tooltip explaining that the owner has to do it. Hmmm, actually, now that I look at it, I don't get an indication when looking at my own profiles that I can make them public without clicking "Edit", which is different from how most share UIs work these days.
* Speaking of how share UIs work, here is a comparison between ours and Google Docs. I would not say that I really like the way Google Docs does it, but it does have the advantage that you can change the settings right from the popover. It also does a concise job of explaining what the sharing options mean.
![Screenshot_2021-09-24_14-14-48](/uploads/7a52ca7633ef0b764a139c82d0ba0a51/Screenshot_2021-09-24_14-14-48.png)
![Screenshot_2021-09-24_14-17-50](/uploads/f35171ef753f340935d3adce34e22510/Screenshot_2021-09-24_14-17-50.png)
![Screenshot_2021-09-24_14-18-45](/uploads/388837bdc5c64311745a1bc8b492e650/Screenshot_2021-09-24_14-18-45.png)
* For some reason, people seem to think that they can't make profiles public if they use custom disk images. When they toggle public on something, maybe we can include a message to the effect that this makes their disk image public too.

Issue #642: Unify path that nodes take to get into `hwdown`
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/642
2021-09-22T11:02:32-06:00 | Mike Hibler

Right now, depending on how nodes find their way into `hwdown`, they can be in different states.
For one, years ago I added a sitevar, `reload/hwdownaction`, that defines what we do with a node when a reload fails and we move it into `hwdown`: one of do nothing, reboot the node into the admin MFS, or power it off. But, as the name implies, this is only done if it is the reload daemon that puts the node in `hwdown`. If it gets there via the checknodes daemon, or via an explicit `sched_reserve` or `nalloc`, then nothing special is done.
For another, whether NFS filesystems should be available and mounted likewise depends on how the node gets into `hwdown`, or more accurately, whether `exports_setup` gets run on that path.
So, we should put some code in the `Node.pm` module, or maybe just write an explicit script, that will put a node into `hwdown`, taking care of all the magic necessary to ensure the node is cleanly removed from wherever it is and put into a consistent state.
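A sketch of what such a one-stop script might do, using the existing tools mentioned above. The exact command names and arguments here are assumptions for illustration, not the real implementation:

```sh
#!/bin/sh
# Hypothetical sketch of an explicit "put node in hwdown" script;
# helper names and arguments are illustrative, not the real tools.
node=$1
nalloc emulab-ops hwdown "$node"  # park the node in the hwdown experiment
exports_setup                     # make NFS exports consistent on every path
# Honor the reload/hwdownaction sitevar regardless of how we got here
# ($hwdownaction is assumed to have been fetched from the sitevar):
case "$hwdownaction" in
    adminmfs) node_admin -n on "$node" ;;  # boot into the admin MFS
    poweroff) power off "$node" ;;         # or power it off
esac
```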