emulab-devel issueshttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues2018-06-21T14:30:33-06:00https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/386Controlling `stated` initiated reboots2018-06-21T14:30:33-06:00Mike HiblerControlling `stated` initiated rebootsWe need to have some way of temporarily putting nodes into an "always up" mode so that `stated` will not try to reboot them at inopportune times, e.g., while the BIOS is being updated.
TL;DR:
`Stated` was designed from the beginning to have a reboot "trigger" action that would allow it to reboot a node if it timed out in a particular state. However, for the first 15 or so years, that particular action was commented out: `stated` would say that it was rebooting a node without actually doing it.
Enter @mike, who a year or so ago decided that `stated` should walk-the-walk and stop being all talk and no action, and uncommented the reboot. For the most part this is a good thing, but if we have to boot a node into the BIOS for an extended period, or boot it from a dongle or CD with something that is not one of our images, then we can get into a state where `stated` thinks the node is booting and starts the timer ticking for it to make progress. It will then proceed, at 10 minute or so intervals depending on the state, to reboot the node up to three times.
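The timeout behavior described here, together with the requested "always up" escape hatch, can be sketched roughly as follows. This is only an illustration and not `stated`'s actual code: the `Node` class, the `timeout_action` function, the mode and state strings, and the constants (a 10-minute timeout and three reboot attempts, taken loosely from the description above) are all assumptions.

```python
from dataclasses import dataclass

# Assumed values, taken loosely from the description above: a timeout of
# roughly 10 minutes per state and at most three reboot attempts.
STATE_TIMEOUT_SECS = 600
MAX_REBOOT_TRIES = 3


@dataclass
class Node:
    name: str
    op_mode: str          # hypothetical values, e.g. "NORMAL" or "ALWAYSUP"
    state: str            # hypothetical value, e.g. "BOOTING"
    state_entered: float  # time (in seconds) at which the node entered `state`
    reboot_tries: int = 0


def timeout_action(node: Node, now: float) -> str:
    """Return the action a state monitor would take for `node` at time `now`."""
    if node.op_mode == "ALWAYSUP":
        return "ignore"            # never reboot a node in the always-up mode
    if now - node.state_entered < STATE_TIMEOUT_SECS:
        return "wait"              # the node is still within its timeout
    if node.reboot_tries >= MAX_REBOOT_TRIES:
        return "give-up"           # stop once the retry limit is reached
    node.reboot_tries += 1
    node.state_entered = now       # restart the timer for the next attempt
    return "reboot"
```

With a check like the first one, a node being flashed with a new BIOS would simply be skipped by the monitor until its mode is changed back.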
You can HUP `stated` and have it re-read the DB, so it is possible to, say, manually change the DB to put a node into the ALWAYSUP op_mode. But HUPing `stated` tends to bring a lot of skeletons out of the closet as it resets timers and starts monitoring old nodes (e.g., wireless nodes) that it had long ago given up on. This is mostly harmless and generally only results in email messages that bring back a feeling of nostalgia in the reader.Mike HiblerMike Hiblerhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/111Give ops an interface in the node control network on the mothership2018-06-21T12:52:40-06:00Robert Ricciricci@cs.utah.eduGive ops an interface in the node control network on the mothershipThe goal here is to avoid going through the router for NFS mounts, to see to what extent this clears up (or doesn't) headaches with NFS.
The basic plan:
* Give `ops` a new (VLAN-based) interface in the node control net
* Firewall that interface so that only NFS is allowed on it
* Point `fs.emulab.net` to the new IP addressKirk WebbKirk Webbhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/322not updating the aliases file2018-06-21T10:54:16-06:00Dan Readingnot updating the aliases fileOn the boss machines something touches /etc/mail/aliases and does not run newaliases.
The generated aliases.db file is much different, at least in size, from the current aliases.db files.https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/264When image name is bogus, instantiate-time error is too vague to be helpful2018-02-01T14:18:48-07:00Eric EideWhen image name is bogus, instantiate-time error is too vague to be helpfulConsider a profile that references a bogus image ID, like "FBSD103-64-STD".
If one tries to instantiate this profile, the instantiate-time error message is:
> This profile will not work on any clusters. Please check your profile or parameters for errors. If you are sure they are correct, you can report the problem to support@cloudlab.us and make sure to link to the problematic profile.
This is true, and it is good to get an error message at this point—but the error message doesn't explain anything about the reason for the error.
The error message would be much more helpful to users if it could explain some reason for the failure, e.g.,
> This profile will not work on any cluster because no cluster has an image named 'FBSD103-64-STD'.
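
As a rough sketch of how such a message could be produced, assuming the portal can obtain the set of image names each cluster offers: the function name `instantiate_error` and both of its arguments are hypothetical, and the check is deliberately simplified to "does any cluster have this image", ignoring whether all of a profile's images exist on the same cluster.

```python
def instantiate_error(requested_images, cluster_images):
    """Build an error message naming images that no cluster provides.

    requested_images: iterable of image names referenced by the profile.
    cluster_images: dict mapping cluster name -> set of image names it has.
    Returns None if every requested image exists on at least one cluster.
    """
    # Union of all image names known to any cluster (empty if no clusters).
    available = set().union(*cluster_images.values())
    missing = [img for img in requested_images if img not in available]
    if not missing:
        return None
    names = ", ".join(f"'{img}'" for img in missing)
    return ("This profile will not work on any cluster because "
            f"no cluster has an image named {names}.")
```

The point is only that the missing name is known at the time the check fails, so it can be carried into the message shown to the user.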
This error message might be further improved, of course, or we might try harder to keep people from putting goofy image names in their RSpecs, but the main point of this ticket is to get more info into the instantiate-time error message.Jonathon DuerigJonathon Duerighttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/377Fix and improve handling for geni users who are not in any project at the GPO2018-01-25T14:12:10-07:00Leigh StollerFix and improve handling for geni users who are not in any project at the GPOThis still needs work, it would be handy if the user could "promote" to a real user, but that path needs work. In the meantime we need to show them more clearly that they cannot do anything if they do not have any project membership.Leigh StollerLeigh Stollerhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/370Jacks slowness (?)2018-01-25T13:04:34-07:00David Johnsonjohnsond@flux.utah.eduJacks slowness (?)I am encountering more Jacks slowness when I attempt to instantiate https://www.cloudlab.us/show-profile.php?uuid=6a1598a1-cef5-11e7-b179-90e2ba22fee4 when I change the number of `VBG VMs per host` to 20; `Physical Hosts` to 10; and set `Number VM hosts plugged into each VBG VM` to 0 instead of 1.
Seems like it takes 4-5 minutes on my lightly-loaded, 32GB RAM desktop, to get to a fully-rendered Finalize frame in the wizard. I didn't try to separate out the cost of the constraint checker vs the renderer. Then there is a long delay on the status page when transitioning from the 'provisioning' state to the 'booting' state, even once CreateSliver is obviously a long way down the road. @stoller suspects that part of that delay is rendering the Topology View.
I need some way around this in the next couple or three weeks, even if we can't look at the root cause prior to that. My ideas are things like a profile parameter/metadata bit, UI option, instantiate URL param, to disable jacks rendering. Of course, if we allow disabling of Jacks rendering, then we need to ensure that the same Actions can be performed from the Node List tab as can be done from the Topology View tab. I wonder, would it also be easy to revert back to the legacy Emulab renderer for large experiments or if Jacks render has been disabled? Presumably we generate the classic experiment picture at the CM, and would just have to get it back to the portal and dump into the Topology View tab. Anyway, maybe something like that could be a stopgap?Jonathon DuerigJonathon Duerighttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/329Fix Image import at Moonshot cluster2018-01-11T14:26:50-07:00Leigh StollerFix Image import at Moonshot clusterIf the imported image does not include an architecture definition in the metadata, we do not know which node types or architecture to assign locally. Normally not a problem, we just assign all local node types, which works everywhere except Moonshot.
We had a bunch of imported images marked to run on m400 and m510. It was easy to fix all the images up and update the image server, since in our universe, all imported images are by definition x86 images. But we need to do two things:
1) Make sure we set the arch for all node types and images on our clusters.
2) Do something for when we just do not get an architecture.Leigh StollerLeigh Stollerhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/301Stuff we install on ops2018-01-03T12:09:27-07:00Mike HiblerStuff we install on opsNot counting the mothership which has its own special demands, do we really need to install:
* mysql-server
* apache
* php*
on an ops node?https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/327Color code based on adjusted free values2017-11-06T09:24:57-07:00Jonathon DuerigColor code based on adjusted free valuesChange color coding in the cluster picker to follow the adjusted free values that don't count reserved nodes as fully 'free'. Consult with @gtw to see where this adjusted information is coming in via the XMLRPC call.Jonathon DuerigJonathon Duerighttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/340Logging console output for boss/ops VMs2017-11-06T09:02:05-07:00Mike HiblerLogging console output for boss/ops VMsFor boss/ops nodes running as VMs, it would be useful to have a way to capture the "xm console" output for diagnostic purposes. This is of particular interest to Hussam at UK since he has all those geniracks to babysit. He says there is a `xend` option to do this with PV domUs, but that it doesn't work for HVM domUs (aka, FreeBSD 10.x VMs). The only way he has found to do it is to set the xm.conf file to say the console is a file. But then you lose interactive console ability.
So I know we can do this by running `capture` on the control node. I had to make some changes so that it would deal with the annoying habit of Xen changing the pty device when an HVM reboots, but it seems to be working okay since then. But then there is the issue of interactive access. `capture` is usually configured so that it reports a secret to boss and then boss mediates access to the capture process via `console`. But there is no obvious "boss" in this situation. However, once upon a time I think we had a version of `console` or some other hack that allowed local connection to the capture process without authentication. That would work in this case, but I will have to resurrect that knowledge...Mike HiblerMike Hiblerhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/254Fix geni-lib link shaping properties to match what Emulab supports2017-07-19T17:55:08-06:00Leigh StollerFix geni-lib link shaping properties to match what Emulab supportsgeni-lib does not support asymmetric link shaping properties.https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/31geni-lib side of encrypted parts2017-07-13T11:02:24-06:00Robert Ricciricci@cs.utah.edugeni-lib side of encrypted partsPortal geni-lib supportDavid Johnsonjohnsond@flux.utah.eduDavid Johnsonjohnsond@flux.utah.eduhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/300Fix default encapsulation style for multiplexed links2017-07-06T13:17:54-06:00Mike HiblerFix default encapsulation style for multiplexed linksRight now our default encapsulation style is `veth-ne` which doesn't work anywhere. Good choice, eh?
But it is a good match for our default VM osid, OPENVZ-STD, which likewise doesn't work anywhere.
History is a bitch.
Anyway, at least @stoller and I talked about switching the default to `vlan` since that is the only thing that works anywhere. At the time, my hour or so of investigation into this revealed that it was going to be more complicated than I thought to do it.
The nasty thing about vlan encapsulation in our environment is, IIRC (unlikely), 1) we reserve VLAN tags even if all the virtual links are on one node and 2) if we do have to put VLANs on the switches it quickly becomes the bottleneck of experiment setup (a couple hundred VLANs can take 20-30 minutes to set up, IIRC (unlikely)).
@johnsond is talking about setting up a 1000 docker experiment and if he has any significant number of virtual links/lans in that, he is going to be one :hurtinpuppy:. So maybe making `vlan` the default in the Age of Docker is not the right thing. But then we are back to SW encapsulation or no encapsulation (the latter meaning no isolation--put everything out in the same broadcast domain and let the nodes themselves sort it out)Mike HiblerMike Hiblerhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/268Jacks: Cannot make firewalled experiments2017-06-22T13:21:54-06:00Eric EideJacks: Cannot make firewalled experimentsUnless I am mistaken, there is no way to make firewalled experiments in Jacks.
The underlying plumbing is there for this, I think. (See closed issue #108.)Jonathon DuerigJonathon Duerighttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/297Removing a project doesn't seem to unmount dirs on boss2017-06-05T11:25:17-06:00Mike HiblerRemoving a project doesn't seem to unmount dirs on bossIf I remove project `bar`, the mounts of `/proj/bar` and `/groups/bar` seem to still exist on boss. Not normally a problem except when you want to immediately reuse the project name.
This will possibly involve flushing of amd or automounter state as well as just doing unmounts.https://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/211"Projects That Have Used CloudLab" page2017-05-04T12:24:32-06:00Robert Ricciricci@cs.utah.edu"Projects That Have Used CloudLab" pageWe should do a version of this:
http://www.emulab.net/projectlist.php3
... that's specifically for CloudLab, and hosted on the cloudlab.us site. We can presumably use the same data source, just filtered for projects that have instantiated at least one experiment on one of the CloudLab clusters.Robert Ricciricci@cs.utah.eduRobert Ricciricci@cs.utah.eduhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/104Support automatic setup of a routing daemon2017-03-29T22:49:13-06:00Robert Ricciricci@cs.utah.eduSupport automatic setup of a routing daemonSupport the equivalent of our old NS "Session" routing, in which we set up a routing daemon (`quagga`?) on the nodes insteadPortalizing EmulabMike HiblerMike Hiblerhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/110Feature-by-feature comparison between Emulab Java GUI and Jacks2017-03-29T22:49:13-06:00Robert Ricciricci@cs.utah.eduFeature-by-feature comparison between Emulab Java GUI and JacksThe goal here is to see what features the old Emulab Java GUI had that do not currently exist in Jacks, and to decide which ones we need to add to Jacks and which ones we are dropping GUI support forPortalizing EmulabJonathon DuerigJonathon Duerighttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/136Emulab Portal: Sign-up page doesn't have links to policies or other docs2017-03-29T22:49:12-06:00Eric EideEmulab Portal: Sign-up page doesn't have links to policies or other docsThe Emulab portal sign-up page at https://www.emulab.net/portal/signup.php does not have any information about what a project is, who we allow to start projects, an acceptable-use policy, etc.
I think that the sign-up page should have obvious links and/or embedded text that explains these things.
The "old" Emulab sign-up process has a lot more information about the policies, including a separate page that warns students that their professor needs to start a project.Robert Ricciricci@cs.utah.eduRobert Ricciricci@cs.utah.eduhttps://gitlab.flux.utah.edu/emulab/emulab-devel/-/issues/146Making the event system great again2017-03-29T22:49:12-06:00Mike HiblerMaking the event system great againWe have had a variety of issues recently affecting the stability and performance of the event system.
Of particular interest is the UPenn peoples' use of rapid `tevc` calls (or even direct-to-pubsub calls) to deliver link change events to 30+ nodes in their experiments. And there are typically 2-3 instances of this experiment workflow going on at once. They have discovered that they are limited to about 50 events/sec (I assume this is in aggregate and not to each node in the experiment). Beyond that they have problems with the event scheduler crashing and/or the entire infrastructure melting down, either immediately or after 30+ minutes.
So maybe it is time to revisit our NCR-era scaling plans. We discussed the problems in the Big Book of Emulab, chapter 7 (there is a copy at bas:~mike/ncr/report-0.6.pdf), but it is not clear whether the goals then were the same as what we care about now.
`clusterd` was part of the proposed solution, though we have not moved very rapidly on that yet. In particular we had plans to use it to replace `evproxy`+`pubsubd` on every experiment node (we have deployed it on vnode hosts) and even talked about adding another level--running an instance on every subboss (though that would violate Mike's First Law of Hierarchy). https://wiki.emulab.net/wiki/clusterd contains old thoughts on `clusterd` in particular.
Some possible actions we can pursue:
* Optimize the UPenn case. What can we do to allow a single client to deliver, say, 100 events/sec to one or more agents? For true, dynamic generation of "do it now" events this might involve bypassing the event scheduler or even bypassing the central `pubsubd` and talking straight to the node(s) pubsubd(s). Or if the event timeline can be worked out in advance, we might be able to pre-stage the events on the end nodes using some bulk event send version of `tevc`.
* Optimize the current architecture for greater system-wide throughput. This is maybe just taking advantage of `clusterd` and doing some engineering on the components (e.g., multithread `pubsubd`).
* (Re)investigate off-the-shelf pubsub solutions. There should be things out there that receive constant attention and are known to scale. `rabbitmq` and `nanomsg` are two packages I have seen in a quick google. They both have pubsub "patterns".
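
To make the pre-staging idea from the first bullet a little more concrete, here is a minimal sketch of how a timeline worked out in advance could be split into one batch per node, so that each node receives a single bulk message rather than many individual events. No bulk-send version of `tevc` exists as far as this ticket says; the function name `stage_by_node` and the tuple layout of the timeline are assumptions made purely for illustration.

```python
from collections import defaultdict


def stage_by_node(timeline):
    """Group a precomputed event timeline into one time-ordered batch per node.

    timeline: iterable of (time_offset_secs, node, event) tuples.
    Returns a dict mapping node -> list of (time_offset_secs, event) pairs,
    sorted by time, suitable for sending to that node in one bulk message.
    """
    batches = defaultdict(list)
    for when, node, event in timeline:
        batches[node].append((when, event))
    for events in batches.values():
        # Sort each node's events by their scheduled time offset.
        events.sort(key=lambda pair: pair[0])
    return dict(batches)
```

With batches like these, the central scheduler would handle one message per node per experiment run instead of one per event, which is the kind of load reduction the UPenn case seems to need.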
What we do here will depend on what we think the event model will look like in the Portalized world of Emulab too.
Maybe we can find something here that is well-formed enough to make it a project for Kobus's class!