emulab-devel issues
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues

Issue 548: Configure setting for protogeni-errors@flux email (2020-05-19, Leigh Stoller)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/548

We send all CM email on every cluster to that address; we need to make this a configure variable so that sites not part of the Cloudlab do not send us their mail.
Assignee: Leigh Stoller

Issue 549: More error checking on frequency ranges (2020-05-19, Leigh Stoller)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/549

```
In protogeni-wrapper.pl
DB Query failed
Query: insert into interfaces_rf_limit set node_id='cellsdr1-bes', iface='rf0', freq_low='2400', freq_high='2400', power='0'
Error: Duplicate entry 'cellsdr1-bes-rf0-2400.00-2400.00' for key 'PRIMARY' (1062)
at /usr/testbed/lib/emdbi.pm line 781.
emdbi::DBWarn("DB Query failed") called at /usr/testbed/lib/emdbi.pm line 759
emdbi::DBQueryWarnN(0, "insert into interfaces_rf_limit set node_id='cellsdr1-bes',"...) called at /usr/testbed/lib/emdbi.pm line 763
emdbi::DBQueryWarn("insert into interfaces_rf_limit set node_id='cellsdr1-bes',"...) called at /usr/testbed/lib/emdb.pm line 71
emdb::DBQueryWarn("insert into interfaces_rf_limit set node_id='cellsdr1-bes',"...) called at /usr/testbed/lib/Node.pm line 4717
Node::AddSpectrum(Node=HASH(0x80e8f2300), ARRAY(0x80e8b9ee8)) called at /usr/testbed/lib/GeniCM.pm line 5271
GeniCM::SliverWorkAux(HASH(0x80e8a84e0)) called at /usr/testbed/lib/GeniCMV2.pm line 604
eval {...} called at /usr/testbed/lib/GeniCMV2.pm line 604
GeniCMV2::CreateSliver(HASH(0x80d30fcd8)) called at /usr/testbed/suidbin/protogeni-wrapper.pl line 747
eval {...} called at /usr/testbed/suidbin/protogen
```
Assignee: Leigh Stoller

Issue 541: Temporarily stop cnetwatch email on a node or experiment (2020-05-19, Leigh Stoller)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/541

@hibler says:
Since we probably have the motivation right now, this could be a time to see what we can do with the "excessive traffic" messages when the traffic is unavoidable. There is a node and node_type attribute, cnetwatch_disable, that can be set to have cnetwatch ignore a node or type. We could probably add an admin mechanism where we could set that (perhaps on all nodes in an experiment, or maybe just per node). Then change nfree (or wherever the appropriate place would be) to clear that when the node is released from an experiment.
Assignee: Leigh Stoller

Issue 550: Expired credential in GeniCM::UnregisterSliver() (2020-05-14, Leigh Stoller)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/550

```
at 2:41 PM, Geni User <geniuser@boss.emulab.net> wrote:
In cluster-wrapper.pl
DB Query failed
Query: replace into logfile_metadata set logidx='128554738',metakey='Output',metaval='Malformed arguments: func=xmlSecOpenSSLX509StoreVerify:file=x509vfy.c:line=341:obj=x509-store:subj=unknown:error=71:certificate verification failed:X509_verify_cert: subject=/C=US/ST=Utah/O=Cloudlab Cluster/OU=utah.cloudlab.us.CORD-SiaB-Manual/CN=c4c8e2d5-743f-11ea-b9ac-d79bac36fb78/emailAddress=testbed-ops@ops.utah.cloudlab.us; issuer=/C=US/ST=Utah/L=Cloudlab/O=Cloudlab Cluster/OU=Certificate Authority/CN=boss.utah.cloudlab.us/emailAddress=testbed-ops@ops.utah.cloudlab.us; err=10; msg=certificate has expired'
Error: Data too long for column 'metaval' at row 1 (1406)
at /usr/testbed/lib/emdbi.pm line 781.
Record day for bugs. Two things to fix: 1) check the length of the string
before trying to insert it, 2) fix the actual problem, using an expired
sliver credential in GeniCM.pm at line 7654:
$credential = $aggregate->NewCredential($me);
The aggregate credential is only for 30 days. I could make them longer,
but might as well fix the problem.
```
Assignee: Leigh Stoller

Issue 496: Default DNS entries for dynamic IP addresses (2020-04-20, Mike Hibler)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/496

We should probably have default entries ("A" records) for all our allocatable dynamic IPs. Something like "dyn-155-98-37-86".
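As a sketch of the naming scheme (assuming `dyn-A-B-C-D` is just the dotted quad with dashes, as in the example above; the real zone generation lives in the Emulab named scripts and is not shown here):

```python
import ipaddress

def dyn_hostname(ip: str) -> str:
    """Default A-record name for a dynamic IP, e.g. '155.98.37.86' -> 'dyn-155-98-37-86'."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return "dyn-" + str(addr).replace(".", "-")

def dyn_a_records(cidr: str):
    """Yield (name, ip) pairs for every usable host address in an allocatable range."""
    for host in ipaddress.IPv4Network(cidr).hosts():
        yield dyn_hostname(str(host)), str(host)
```

Emitting these pairs into the generated zone file would give every allocatable dynamic address a resolvable default name.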
If we had records in the named DB then "we" (read "mike") would not go allocating these for other purposes just because they are not in the DNS or hosts file. It would also make ISO happier if our active IPs resolved.
Assignee: Leigh Stoller

Issue 522: Should not advertise Powder FE ops node (2020-04-18, Mike Hibler)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/522

On the FE nodes, ops is a jail with an unroutable 10.x.x.x address, but we are still putting it in the external view of the DNS. This can result in undeliverable email from outside (e.g., a reply to a message).
I don't know if such a configuration of the ops node is a general option we support or just a one-off Powder thing, so I am making the issue here.
Assignee: Leigh Stoller

Issue 539: Revised OTA frequency admission control (2020-04-04, Kirk Webb)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/539

Currently we require that users have an approved frequency reservation before they are allowed to instantiate an experiment that uses this spectrum. This has resulted in an awkward and delay-prone human-in-the-loop workflow for outdoor wireless experiments.
We have decided to change to the following strategy for RF frequency admission control:
* Do away with "Emulab features" for specific ranges
* Maintain a list of admin-set ranges allowed for a given project
* Free-form, not required to be from a predefined set
* Have an admin interface for setting these ranges
* Provide a set of predefined ranges to auto-populate for convenience (e.g., the Sprint spectrum)
* Have a "global" list for ranges allowed by anyone
* Do not _require_ frequency reservations anymore
* Users can still make reservations to guarantee future availability
* Keep a history of frequency range use
* A `frequency_history` table is suggested
Assignee: Leigh Stoller

Issue 437: Out of band control for NUCs (2020-03-31, Robert Ricci <ricci@cs.utah.edu>)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/437

Since we will not have the ready physical access we have always had with Emulab/CloudLab resources, we would really, really like to be able to remotely access the control node for base stations and other endpoints. (I'm not going out on the roof of the Medical Tower in a blizzard to power cycle a NUC!) Specifically, we are concerned with maintaining control of NUC control nodes: at the very least the ability to reboot the node, and ideally the ability to diagnose via the console.
On the hardware side, NUCs have watchdog timers and a LAN-based management interface, but it is not yet clear how accessible the watchdog timers are, and in some cases we know the LAN-based management interface will not help us.
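If the NUC watchdog turns out to be reachable through the standard Linux `/dev/watchdog` interface, driving it is straightforward. This is a generic sketch of that kernel interface (device path and timing are illustrative, and we have not verified any of this on these NUCs):

```python
import os
import time

def pet_watchdog(dev: str = "/dev/watchdog", interval: float = 10.0, rounds: int = -1) -> None:
    """Keep a Linux watchdog timer from firing by writing to it periodically.

    If the process dies without cleanly closing the device, the hardware
    reboots the box -- which is exactly the remote-recovery property we want.
    """
    fd = os.open(dev, os.O_WRONLY)
    try:
        n = 0
        while rounds < 0 or n < rounds:
            os.write(fd, b".")   # any write resets the countdown
            time.sleep(interval)
            n += 1
    finally:
        os.write(fd, b"V")       # magic close: tell the driver we exited on purpose
        os.close(fd)
```

The "magic close" byte is part of the documented Linux watchdog API; without it, closing the device still leads to a reset on drivers built with `nowayout` disabled.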
On the software side, we may be able to mitigate problems by running the base CloudLab control servers inside a VM on the NUC. This assumes that the most common cause of failure would be a lock-up/freak-out of the FreeBSD boss/ops and not a hardware failure of the physical box.
Assignee: Mike Hibler

Issue 443: Scripting to add/remove BYOD devices (2020-03-31, Robert Ricci <ricci@cs.utah.edu>)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/443

BYOD devices are typically gray/black-box hardware (and sometimes software). A user shows up with a device that, at minimum, has a control network interface and some ability to be rebooted. We have a variety of support for them: osid opmode/features, image boot/load timeout controls, node/type attributes. We have support for a few different types of management/console interfaces (perhaps we could package/automate this a bit, but not sure it's worth it). Then of course we support a variety of power control methods, including IPMI/iLO/DRAC, etc. These devices may not be imageable and may not have our clientside installed; that's fine and supported. If they support experiment ethernet network interfaces that plug into our fabric, we can isolate those into links.
The goal here is not to add new gray/black-box config/setup modes during experiment runtime -- but rather to better script the addition/removal of (temporary) user BYODs.
Here's what we already have: `addstack`, `addswitch`, `addwire`, `addmanagementiface`, `addinterface`.
There are also `addspecialdevice` and `addspecialiface`, but at least the former should be extended to handle a few more necessary things (osid, OS features); right now it assumes the presence of the GENERIC osid. Finally, there is `newscript`, which can take XML descriptions of new nodes (and their ifaces/ifacetypes/wires).
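For illustration only, here is the kind of reader such a script might use for an XML node description. The element and attribute names below are hypothetical, not the actual `newscript` schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical shape -- the real newscript schema is defined by the Emulab sources.
SAMPLE = """
<node node_id="byod1" type="byod-nuc">
  <interface iface="eth0" role="ctrl" mac="00:11:22:33:44:55"/>
  <interface iface="eth1" role="expt" mac="00:11:22:33:44:56"/>
</node>
"""

def parse_node(xml_text: str):
    """Return (node_id, type, [interface attribute dicts]) from a node description."""
    root = ET.fromstring(xml_text)
    ifaces = [dict(e.attrib) for e in root.findall("interface")]
    return root.get("node_id"), root.get("type"), ifaces
```

The point is just that a single XML blob can carry the node, its interfaces, and their wiring in one add/remove operation.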
My sense is that by extending `addspecialdevice` and `addspecialiface` as mentioned above, and by ensuring that BYOD types can optionally only be used by their owners in their project(s), we can basically call this done for the initial case. (I'm sure when we actually start adding these things more regularly, we'll want to add further options to the `addspecialdevice` script to set various node/type attributes, etc.)
Assignee: David Johnson <johnsond@flux.utah.edu>

Issue 463: Managing and testing of SyncE and PTP (2020-03-31, Robert Ricci <ricci@cs.utah.edu>)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/463

Issue 494: Base-station control node configuration (2020-03-31, Mike Hibler)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/494

There are lots of questions surrounding the control NUC in a roof-top base-station. Whereas the fixed endpoint is a separate aggregate with the control NUC running a boss/ops, a base-station is part of the mothership aggregate and the "control NUC" is not really a boss or a subboss.
Its responsibilities include:
* RF monitoring via an attached NUC
* environmental monitoring via attached sensors
* acting independently to address RF/environmental issues if disconnected from the mothership
The last involves its ability to talk to the power controller and possibly RF frontend to shut things down.
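That disconnected-operation behavior could be sketched roughly as follows. The thresholds and the check/shutdown hooks are placeholders, not a design decision:

```python
import time

def failsafe_loop(mothership_up, shutdown_rf, max_misses=3, poll=60,
                  clock=time.sleep, rounds=None):
    """Shut the RF frontend down after `max_misses` consecutive failed
    reachability checks against the mothership.

    mothership_up: callable returning True if the mothership is reachable
    shutdown_rf:   callable invoked once when we give up and act independently
    """
    misses = 0
    n = 0
    while rounds is None or n < rounds:
        if mothership_up():
            misses = 0          # any success resets the counter
        else:
            misses += 1
            if misses >= max_misses:
                shutdown_rf()   # e.g., power controller / RF frontend off
                return True     # acted independently
        clock(poll)
        n += 1
    return False
```

Injecting the check and the action keeps the policy testable regardless of which OS or power-control path the NUC ends up using.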
Questions:
* What OS does the NUC run? Probably Linux of some flavor, as that is the most comfortable for the monitoring tools. The subset of the power control and Arduino control that we currently do under FreeBSD should work fine under Linux. Or do we run Xen with assorted VMs?
* How does the node appear to Emulab? Is it a manually loaded infrastructure node like boss or ops? Is it an experiment node permanently allocated to an experiment but able to be easily reloaded like a subboss node? Or is it a shared vnode host capable of running vnodes to handle the various responsibilities?
* Where is it on the network? Is it Internet-facing on the control network, or can it just be on the private HW control net? It may need the latter for access to the power controller, switch, or RF frontend.
Assignee: Mike Hibler

Issue 418: Get NUCs off of old Linux-based MFSes (2020-03-26, Mike Hibler)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/418

A couple of things have changed in Ubuntu 18 w.r.t. the kernel/initrd, making the old linux_slicefix.pl script no longer work. I spent half a day and fixed one of the problems, but I don't think it is worth it to continue down this road.
Assignee: Mike Hibler

Issue 518: Represent groups of related reservations (2020-03-26, Robert Ricci <ricci@cs.utah.edu>)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/518

As mentioned in https://gitlab.flux.utah.edu/powderrenew/powder/issues/21 we need to be able to represent groups of reservations that are logically treated as one, though they don't need to be implemented this way down in the reservation system itself.
Some additional thoughts on this can be found in https://gitlab.flux.utah.edu/powderrenew/powder/issues/86 .
Assignee: Leigh Stoller

Issue 471: Geographically-based Jacks view (POWDER map) (2020-03-25, Robert Ricci <ricci@cs.utah.edu>)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/471
Assignee: Jonathon Duerig

Issue 535: os_load -l (2020-03-05, Dan Reading)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/535

I wish this worked again.
os_load -l
DB Query failed:
Query: select distinct i.pid,i.imagename from images as i left join image_versions as v on v.imageid=i.imageid and v.version=i.version left join group_membership as g on g.pid_idx=i.pid_idx where (g.uid='dreading' or v.global) order by i.pid,i.imageid
Error: Expression #2 of ORDER BY clause is not in SELECT list, references column 'tbdb.i.imageid' which is not in SELECT list; this is incompatible with DISTINCT (3065)
Assignee: Mike Hibler

Issue 531: Reconfigure `bighp1` ports for DriveScale chassis (2020-03-02, Mike Hibler)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/531

Related to #526, we needed a couple of those `bighp1` ports to hook up the DriveScale disk box chassis. They break out from 40Gb to 4x10Gb SFP+ (the links, ironically, are aggregated back to a single 40Gb, I think).
So I need to reconfigure ports 1/0/9-10 to be broken out to 10Gb, and this unfortunately requires a reboot of the switch.
Assignee: Mike Hibler

Issue 532: Clientside blockstore code can fail after fsck (2020-02-14, Mike Hibler)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/532

When we are booting up and the blockstore device already exists and has a filesystem, we will `fsck` it to be safe. We return failure if `fsck` exits non-zero. However, it appears that `fsck` exits non-zero even if it fixes whatever problem there was.
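For reference, on Linux `fsck`'s exit status is a bit mask per fsck(8), and status 1 just means "filesystem errors corrected". Assuming that is the failure mode here, the check we likely want looks like this (illustrative Python; the actual clientside code is Perl, and the exit-code conventions of FreeBSD's fsck should be double-checked):

```python
# fsck(8) exit status bits on Linux:
#   1 = filesystem errors corrected, 2 = system should be rebooted,
#   4 = filesystem errors left uncorrected, 8 = operational error, ...
FSCK_CORRECTED = 1

def fsck_succeeded(status: int) -> bool:
    """Treat 'clean' (0) and 'errors corrected' (1) as success; any other bit as failure."""
    return (status & ~FSCK_CORRECTED) == 0
```

That is, mask off the "corrected" bit before deciding the blockstore setup failed.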
Deal with this.
Assignee: Mike Hibler

Issue 526: Free up `bighp1` 40Gb ports (2020-02-04, Mike Hibler)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/526

To link in the CloudLab2 Phase3, we need to free up some (10?) 40Gb ports on `bighp1`.
The chosen strategy will be to convert the 4x40Gb control network LAGGs for the Moonshot ARM chassis to be 2x10Gb instead. Let us start here:
```
Card 0 (24 x 40Gb):
0/0/1-4 BAGG1 (ch4, ctrl)
0/0/5-6 BAGG21 (ch11, ctrl, part1)
0/0/7-10 BAGG2 (ch5, ctrl)
0/0/11-12 BAGG21 (ch11, ctrl, part2) [both DOWN -- unplugged?]
0/0/13-16 BAGG3 (ch6, ctrl)
0/0/17-18 BAGG22 (ch12, ctrl, part1)
0/0/19-22 BAGG4 (ch7, ctrl)
0/0/23-24 BAGG22 (ch12, ctrl, part2)
```
and free up ports 3/4, 9/10, 15/16, and 21/22. Note that these are likely to be a major PITA to get to physically, but at least it is the card on the end! I will work with @kwebb to get this done so that @amaricq can do the wiring.
Assignee: Mike Hibler

Issue 512: Poor 10Gb interface performance on the d820s (2020-01-27, Mike Hibler)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/512

A while back (okay, probably a year ago) a user mentioned that he was only getting ~3Gb/s through the 10Gb interfaces on `pc609`. We figured this was most likely a user error, but we put the node into `hwdown` to investigate. A quick look into this a couple of weeks ago confirmed that the node did indeed exhibit poor performance (though not always) when talking to another d820, `pc616`. I also discovered, however, that at least one other d820, `pc607`, also shows poor performance. `pc616` does not have any performance issues.
Based on this quick testing, it could be that `pc616` is the outlier and is the only one that can achieve 10Gb/sec. It has newer firmware than at least those other two nodes, so one possible avenue is to update the firmware on one of the others and see if the problem persists.
This does not appear to be an issue with BIOS settings on the nodes, or jumbo frames (or other) settings on the Arista switch.
Needs more investigation.

Issue 250: Fix whacky escape sequences on the reserve info page (2020-01-27, Leigh Stoller)
https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/250

Like: ```We would like to reserve nodes to run a 16+1 node experiment to show our system performance under scale out. It will be helpful to run our 1&#x2F;2&#x2F;4&#x2F;8&#x2F;16 node experiments all at once on the same cluster instead of creating a new 1&#x2F;2&#x2F;4&#x2F;8&#x2F;16 node cluster for each experiment. We may not need all of the reserved time, and will return the machines as soon as we are done with the experiment. Thank you.```
Assignee: Leigh Stoller
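The stray sequences are HTML character references that were escaped one time too many before display: `&#x2F;` is just `/`. As a quick illustration (Python stdlib, not the actual web UI code), decoding them back is a single call:

```python
import html

raw = "1&#x2F;2&#x2F;4&#x2F;8&#x2F;16 node experiments"
print(html.unescape(raw))  # prints: 1/2/4/8/16 node experiments
```

The real fix is presumably to escape exactly once at render time rather than escaping again on already-escaped stored text.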