emulab-devel issues
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues
2021-10-22T14:44:49-06:00

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/648
Stop using the uuid for sharing profiles.
2021-10-22T14:44:49-06:00
Leigh Stoller

This came up while I was in Utah. We should stop using the uuid of a profile for sharing it (when not public) since that makes it impossible to revoke.

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/647
Include reservation justification text in admin email
2021-10-21T07:01:51-06:00
Kirk Webb

Feature request:
I'd like to see the user's justification text in the email notification messages for reservation requests. For example, this would have helped this past weekend when a user asked if the start time could be moved to sometime Sunday. I saw that there was a reservation request during the weekend, but didn't look at it until this (Monday) morning.

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/646
Deal with CA root certificate expiration fallout
2022-01-06T15:23:44-07:00
Mike Hibler

On 09/30/2021 the root "DST Root CA X3" certificate expired. A new certificate ("ISRG Root X1") was in place well in advance, but OpenSSL 1.0.2 (and others) still try to chain through the old certificate. See [this blog post](https://www.openssl.org/blog/blog/2021/09/13/LetsEncryptRootCertExpire/). This affects not only our servers, but all standard and custom images as well.
Things we gotta do:
* [x] Fix all boss/ops/dbox/whatever nodes that need HTTPS service from anyone.
* [x] Make sure our client images going forward **do not** include the DST certificate and **do** include the replacement.
* [x] Add `slicefix` magic to fix up custom images based on our supported images (Ubuntu 16+, CentOS 7+, FreeBSD 11+).
* [ ] Have a plan for older images (instructions for how users can fix them?).
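
For the checklist above, one way to audit an image's trust store is to list the CA certificates whose validity period has ended. The following is a minimal sketch, not the `slicefix` logic: the function names are hypothetical, the bundle path is whatever the image uses, and the sample dates in the comments are illustrative. It relies only on Python's standard `ssl` module.

```python
import ssl
import time


def load_bundle(cafile):
    """Load a PEM CA bundle and return its certificates as dicts."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    return ctx.get_ca_certs()


def expired_ca_certs(certs, now=None):
    """Return the commonName of every certificate whose notAfter has passed.

    `certs` is a list of dicts shaped like SSLContext.get_ca_certs() output,
    e.g. {"subject": ((("commonName", "DST Root CA X3"),),),
          "notAfter": "Sep 30 14:01:15 2021 GMT"}  # illustrative values
    """
    now = time.time() if now is None else now
    expired = []
    for cert in certs:
        if ssl.cert_time_to_seconds(cert["notAfter"]) < now:
            # subject is a tuple of RDNs, each a tuple of (key, value) pairs
            names = [value
                     for rdn in cert.get("subject", ())
                     for key, value in rdn
                     if key == "commonName"]
            expired.append(names[0] if names else "<no commonName>")
    return expired
```

Calling `expired_ca_certs(load_bundle(path))` on a bundle file would report which entries still need to be removed. It does not check whether the replacement root is present.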
The fix is pretty straightforward for at least Ubuntu and FreeBSD: just remove the invalid certificate from the right places. I will note that Ubuntu 14 does not include the replacement certificate, so a fix is harder...if we choose to try and do something about older images.

Mike Hibler

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/645
Making parameter sets more prominent
2021-09-29T12:06:43-06:00
Robert Ricci <ricci@cs.utah.edu>

Some ideas to make parameter sets more prominent. Not sure we should do all, but a list for brainstorming, in rough order of the experiment instantiation process:
- On the first step of the instantiate page and the profile picker, indicate which profiles the user has a paramset for
- In the parameterize step, give them an option to load parameter sets for the profile (and maybe those made by others that they have used recently?)
- On the finalize page, give them a button to save a paramset (probably presented as some kind of save/share option) (of course, only for a parameterized profile)
- On the experiment status page, something similar. This should happen even for failed experiments, to enable a use case where, if there weren't enough nodes or whatever, you have an easy way back later to swap in quickly without re-filling everything.
- A 'recents' menu somewhere (in the main menu in the header?) that takes you right to the middle of the instantiate wizard, with the profile selected and the params filled out

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/644
Public page listing public profiles
2021-10-21T07:03:53-06:00
Robert Ricci <ricci@cs.utah.edu>

If we remember correctly, we made it possible to view the profile page for public profiles even if you are not logged in. We should similarly make a page that's visible to non-logged-in people that lists all of the public profiles. Maybe with thumbnails of the topos (if we still generate those) and an abbreviated version of the short description. Sorted by all-time instantiations or recent instantiations.

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/643
Make profile sharing more prominent
2021-09-24T14:49:05-06:00
Robert Ricci <ricci@cs.utah.edu>

Though users can share profiles publicly in a few different ways, they seem less aware of this than we would like. I have a few ideas on how we might improve this.
* ![Screenshot_2021-09-24_14-15-34](/uploads/62ccb4f71c836c79f387a99b7109c560/Screenshot_2021-09-24_14-15-34.png) I don't think the Share button is as prominent as it could be - it's currently at the bottom, and the same color as other buttons. I think it could stand to be a more distinct color (the bootstrap success color (green)?). Also, when a profile has a long description, it gets hidden off the bottom of the screen, so I think putting it in the box on the left would be better.
* If you don't own a profile, there is no clear indication that it *can* be made public. In the screenshot above, I'm looking at another project member's profile, and notice that this is not suggested to me at all. If, for example, a student is making profiles, and a faculty member is taking care of releasing software, they may not even realize that it's possible to make it public. I'm not sure we want to allow project members to make each others' profiles public (though maybe project leads?), but maybe we could at least have a greyed-out 'make public' button with a tooltip explaining that the owner has to do it. Hmmm - actually now that I look at it, I don't get an indication when looking at my own profiles that I can make them public, without clicking "Edit" - which is different from how most Share UIs work these days.
* Speaking of how share UIs work, here is a comparison between ours and Google Docs. I would not say that I really like the way Google Docs does it, but it does have the advantage that you can change the settings right from the popover. It also does a concise job of explaining what the sharing options mean.
![Screenshot_2021-09-24_14-14-48](/uploads/7a52ca7633ef0b764a139c82d0ba0a51/Screenshot_2021-09-24_14-14-48.png)
![Screenshot_2021-09-24_14-17-50](/uploads/f35171ef753f340935d3adce34e22510/Screenshot_2021-09-24_14-17-50.png)
![Screenshot_2021-09-24_14-18-45](/uploads/388837bdc5c64311745a1bc8b492e650/Screenshot_2021-09-24_14-18-45.png)
* For some reason, people seem to think that they can't make profiles public if they use custom disk images. When they toggle public on something, maybe we can include a message to the effect that this makes their disk image public too.

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/642
Unify path that nodes take to get into `hwdown`
2021-09-22T11:02:32-06:00
Mike Hibler

Right now, depending on how nodes find their way into `hwdown`, they can be in different states.
For one, years ago I added a sitevar `reload/hwdownaction` that defines what we do with a node when a reload fails and we move it into `hwdown`: do nothing, reboot the node into the admin MFS, or power it off. But, as the name implies, this is only done if it is the reload daemon that puts the node in `hwdown`. If it gets there via the checknodes daemon, or via an explicit `sched_reserve` or `nalloc`, then nothing special is done.
For another, whether NFS filesystems should be available and mounted likewise depends on how the node gets into `hwdown`, or more accurately, whether `exports_setup` gets run on that path.
So, we should put some code in the `Node.pm` module or maybe just write an explicit script that will put a node into `hwdown`, taking care of all the magic necessary to ensure it is cleanly removed from wherever it is and put into a consistent state.

Mike Hibler

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/641
Show/warn users about pending (unapproved) reservation requests
2021-10-21T18:05:31-06:00
Kirk Webb

For scarce/single resources that are in demand, users can easily step on each other with reservation requests. This is because they have no visibility into pending requests. I propose that we show pending reservation requests on the "available resources" views, probably using a different color/marking to distinguish them. Additionally, we should email a warning to users who submit requests that overlap with existing unapproved requests.
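
The overlap warning proposed here comes down to an interval-intersection test against the unapproved requests. The sketch below is only an illustration under stated assumptions: the `Request` fields and function names are hypothetical, time windows are treated as half-open, and node counts are ignored, so any two same-type requests that intersect in time count as a conflict.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Request:
    """Hypothetical reservation request; not the real database schema."""
    user: str
    node_type: str
    start: datetime
    end: datetime
    approved: bool = False


def overlaps(a, b):
    """True if two requests for the same node type intersect in time.

    Windows are half-open [start, end), so back-to-back requests do not
    count as overlapping.
    """
    return a.node_type == b.node_type and a.start < b.end and b.start < a.end


def pending_conflicts(new, existing):
    """Return the unapproved requests in `existing` that overlap `new`."""
    return [r for r in existing if not r.approved and overlaps(new, r)]
```

A non-empty result from `pending_conflicts` would be the trigger for the warning email. The same list could also be used to mark pending windows in the availability view.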
Finally, the "search" button on the reservation request page should take into account pending reservations when looking for an available window.

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/640
Fix clientside scripts to work with python3.
2021-09-03T13:54:54-06:00
Mike Hibler

We have fixed up the server side (#611) along with a few of the client-side scripts that are used on `ops`, but we should finish the job.

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/639
Slow disk IO on Wisconsin boss and ops VMs
2022-12-23T08:43:56-07:00
Mike Hibler

I don't even remember the details here.
Spun this off from (#482) so that I can close that issue.

Mike Hibler

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/638
Reload Topology button broken
2021-09-13T05:02:53-06:00
Leigh Stoller

The reload topology button loses the nodes in the manifest somehow. Okay after page reload, so must be something in the javascript.

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/637
Repo Update button works once but then does nothing, need to page reload
2021-10-21T07:05:14-06:00
Leigh Stoller

No big deal, but I am not home to put a postit on my monitor.

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/636
Mysql performance on boss is terrible!
2021-10-21T07:07:54-06:00
Leigh Stoller

Since the 12.2 upgrade I have noticed a lot of very slow loading web pages. Looking at the mysql slow queries log there are an enormous number of ones like this, that should have been close to instant.
It's bothering me enough that I am going to have to dig into it.
```
# Time: 2021-08-20T19:35:18.826539Z
# User@Host: skip-grants user[instantiate.php] @ localhost [] Id: 31469255
# Query_time: 5.324131 Lock_time: 0.002438 Rows_sent: 1152 Rows_examined: 29629
SET timestamp=1629488118;
select p.uuid,p.name,p.pid,v.creator,p.profileid, p.usecount,f.marked from apt_profiles as p left join apt_profile_versions as v on v.profileid=p.profileid and v.version=p.version left join group_membership as g on g.uid_idx='926619' and g.pid_idx=v.pid_idx and g.pid_idx=g.gid_idx left join apt_profile_favorites as f on f.profileid=p.profileid and f.uid_idx='926619' where locked is null and p.disabled=0 and v.disabled=0 and (p.public=1 or p.shared=1 or v.creator_idx='926619' or g.uid_idx is not null );
```

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/635
`frisbeed` eating up CPU
2021-08-29T20:12:14-06:00
Mike Hibler

New for FreeBSD 12.2!
When `frisbeed` finishes serving clients, it eats up 100% of a CPU til it dies. Doing some mutex op repeatedly:
```
...
72175 frisbeed CALL _umtx_op(0x800289f10,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffffffe488)
72175 frisbeed RET _umtx_op -1 errno 60 Operation timed out
72175 frisbeed CALL _umtx_op(0x800289f10,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffffffe488)
72175 frisbeed RET _umtx_op -1 errno 60 Operation timed out
...
```

Mike Hibler

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/634
FreeBSD: "older" fsck version is incompatible with "newer" versions of UFS
2021-08-11T10:34:07-06:00
Mike Hibler

While we were setting up an elabinelab for testing the new firewall, we used prebuilt full-disk images of 12.2-based boss and ops nodes and put those down on a couple of d430s. Upon booting, we got all kinds of cylinder group checksum errors from both kernels. After many (many) bad theories we discovered that the FreeBSD 10 version of fsck in the MFS will "fix up" what it considers bad summary information, which is actually valid metadata in a FreeBSD 12 filesystem.
I only know for sure that this is a problem between FreeBSD 10 and 12; I haven't tracked this down to see when the incompatibility was actually introduced. Hence the vague "older" and "newer" in the title.
This needs to be fixed in anything that deals with FreeBSD filesystems. Possibly this is as simple as switching to a FreeBSD 12 version of fsck--assuming that version does not screw up filesystems for older versions of FreeBSD. This will affect the FreeBSD and Linux based MFSes as well as the blockstore code in FreeBSD.
We were here once before, with FreeBSD 4 and 5 I think. Didn't really have a satisfactory fix in that case. I think. That was many generations of Mike ago though.

Mike Hibler

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/633
Fix medusa segfaults
2021-09-03T13:42:56-06:00
Mike Hibler

We run `medusa` on boss against experiment nodes to look for authentication issues in VNC. The version we use, 2.2 from 2016, is the latest released version but has core-dump issues. @stoller has tried tracking these down and so have I. The latest upgrade on boss and ops (to FreeBSD 12.2) seems to have exacerbated the problem and put it back on my radar.

Mike Hibler

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/632
Get our content out of the Plone wiki
2021-09-03T13:52:31-06:00
Mike Hibler

We have been keeping Plone on life support through a couple of boss/ops upgrades. After the latest, it is time to pull the plug since nobody wants to convert it to python3. So we need to get our content out of there and loaded somewhere else. (gitlab wiki?) If only I had remembered this *before* we converted ops...
So now we have to move the current installation over to a machine with python2, either an elabinelab or else just move it somewhere like ops.utah.cloudlab.us. Then figure out a way to extract the useful content. Then figure out a way to get that content into something else in a reasonable form.
I expect this falls on @hibler or @stoller.

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/631
Reservation time search can return reservations that don't work
2021-07-15T15:45:11-06:00
Robert Ricci <ricci@cs.utah.edu>

I heard from a user that they used the feature of the reservation request page where they put in a machine type and number of days to search for a start time, but when they clicked to request the reservation, they were told it didn't fit.
I don't know the reason; e.g., it's possible an experiment swapped in or got extended between when the search ran and when they requested it, but it seems like it might be worth taking a look to make sure we don't have any obvious potential bugs.

Leigh Stoller

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/630
Get the 32-port 100Gb Barefoot switch up and running
2021-09-03T13:57:08-06:00
Mike Hibler

Brent wants to use this, so we need to get it integrated in the testbed and wired up to some nodes.
The short-term plan is to connect up 8 of the new c6525-100g nodes directly as they have a second, unused 100Gb port.
* [x] Rack the switch in V05
* [x] Wire up the nodes
* [ ] Apply a feature to the nodes so they can be specified in a profile (and possibly to dis-favor them for normal use)
* [ ] Add the switch to the DB and make the management interface accessible in a safe way
Down the road:
* [ ] Ability to reload the switch OS

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/629
Replace the storage server at Clemson.
2023-06-09T08:43:20-06:00
Mike Hibler

Spinning this one off from #567 as well. That issue says:
```
Clemson: 50TB on one server, but in two zpools of 43 and 7TB. 5.8TB and 1.3TB free. Again a couple of
possibilities. The first would be to commandeer another one of the first-gen storage node, giving us
another 50TB. A more intriguing possibility would be to take over one of the dss7500 nodes with 45 HDDs
and 270TB. That would solve the space issue for some time to come but would take one of only two of
those machines. I only suggest it because those machines are almost never used right now, which is
a waste.
```
This has come to the forefront because the smaller zpool of 1TB disks has a slowly failing disk that ZFS can deal with, but it takes long enough for it to retry an operation that the iSCSI client times out, leading to "disk errors" and a corrupted filesystem. `smartctl` confirms lots of corrected errors and that it is in the "pre-fail" state. Unfortunately, so is every other disk in that zpool, so I am not sure there is much point in replacing the one disk. We need to evacuate that pool.
Short term I am going to clear out dead datasets and move everything left on the small zpool to the larger one. That one has 4TB drives that are just as old, but seem to be holding up better.
But it is time to seriously consider taking over one of the dss7500 nodes.

Mike Hibler