- 11 Aug, 2016 1 commit
-
-
Jonathon Duerig authored
Users can now get from the show/manage profile pages to the editor and when in the editor, they can create new profiles or edit existing ones.
-
- 10 Aug, 2016 2 commits
-
-
Mike Hibler authored
There are now some sitevars to control its behavior; the one of interest here is reload/failtime. The way the reload daemon is supposed to work now is that nodes will be started on their reloading adventure with an os_load. If they are still there after reload/retrytime minutes, then they will either be rebooted (if the os_load was successful) or os_load'ed again (if the first os_load failed outright). The logic for either of these is that there might have been some transient condition that caused the failure. If we do have to perform this "retry" then we will send email to testbed-ops if reload/warnonretry is set. If, after another reload/retrytime minutes, a node is still there, then the node will be sent to hwdown, possibly powering it off or booting it into the admin MFS depending on the setting of reload/hwdownaction. So really, reload/failtime should not be needed. All nodes should exit reloading within 2 * reload/retrytime minutes. But it is there as a backstop (and because I didn't understand the logic of the reload daemon at first!) Well, it also comes into play if the reload daemon is restarted after being down for a long period of time. In this case, all nodes in reloading will get moved to hwdown. May need to reconsider this...
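The retry logic described above can be sketched as a small decision function. This is only an illustration of the description, not the reload daemon's code: the function name, the parameter names, and the `Action` enum are all hypothetical, and `retrytime` simply stands in for the reload/retrytime sitevar (in minutes).

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    REBOOT = "reboot"
    RELOAD_AGAIN = "os_load again"
    HWDOWN = "move to hwdown"


def next_action(minutes_in_reloading, retrytime, retried, first_load_ok):
    """Decide what to do with a node that is still in reloading.

    All names here are hypothetical; retrytime mirrors reload/retrytime.
    """
    if minutes_in_reloading < retrytime:
        return Action.WAIT
    if not retried:
        # First timeout: retry once, in case the failure was transient.
        return Action.REBOOT if first_load_ok else Action.RELOAD_AGAIN
    if minutes_in_reloading < 2 * retrytime:
        return Action.WAIT
    # Second timeout: give up and hand the node to hwdown.
    return Action.HWDOWN
```

With a `retrytime` of 10, a node that has been retried and is still present at 20 minutes is moved to hwdown, which matches the 2 * reload/retrytime bound mentioned above.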
-
David Johnson authored
Ok, it seems that sometimes the network.target runs before network devices have fully finished going through udev. I think what goes on here is that udev can "settle" (meaning there are no events), but there will still be some events in the future. So now in the special networking-emulab.service, we settle AND wait for at least one auto, non-lo interface to appear via ifquery.
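A minimal sketch of the "wait for at least one auto, non-lo interface" step, assuming the interface list comes from `ifquery -l` (the command named elsewhere in this log). The function names, timeout, and polling interval are hypothetical, and the `udevadm settle` step is not shown.

```python
import subprocess
import time


def auto_non_lo(ifquery_output):
    """Return interface names from `ifquery -l` style output, minus lo."""
    return [name for name in ifquery_output.split() if name != "lo"]


def wait_for_interface(query=None, timeout=30.0, interval=0.5, sleep=time.sleep):
    """Poll until at least one auto, non-lo interface is listed.

    `query` is a callable returning the listing as text; by default it
    runs `ifquery -l`. Returns the interfaces found, or [] on timeout.
    """
    if query is None:
        def query():
            result = subprocess.run(["ifquery", "-l"], capture_output=True,
                                    text=True, check=False)
            return result.stdout
    waited = 0.0
    while True:
        found = auto_non_lo(query())
        if found:
            return found
        if waited >= timeout:
            return []
        sleep(interval)
        waited += interval
```

Passing `query` and `sleep` as parameters keeps the polling loop testable without a real network stack.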
-
- 09 Aug, 2016 11 commits
-
-
David Johnson authored
systemd.swap is one of its special builtin services. Basically, swap devices are parsed out of fstab, or by examining a disk's GPT. Any such devices are turned into instantiated units. This happens via the systemd-fstab-generator. Generators in systemd are almost uncontrollable. They run immediately, prior to on-disk unit file parsing, and all you can do is disable or replace them. You cannot express dependencies on the resulting units (unless you write your own generator). Generators also run in an impoverished environment (think read-only /etc), so we cannot just add another generator that does basically what fixup-fstab-swaps does. Finally, we cannot write a template unit file for all swap devices (we would use this to inject a blocking dependency so that these swap units don't conflict with us). Lennart has recognized the value in this, but thought the impl effort is pretty hard. This makes sense, because the generators run prior to unit file load from disk (and presumably that would nix templates for generated units)... and I gather there are other problems as well. This is quite problematic for us because we rely on the ability to update /etc/fstab with the name of the real swap device, and to mkswap on it. However, on machines with lots of cores, systemd is at its parallelizing best, and inevitably systemd tries to start up one of its instantiated swap device units at the same time as our fixup-fstab-swaps script is running. So I've done several things to try to deal with this situation. First, this Ubuntu 16-specific version of fixup-fstab-swaps no longer adds a swap line to fstab with options=defaults -- instead it uses options=noauto,x-emulab-auto . 
The noauto causes systemd's instantiated swap units to not automatically run on boot (don't worry, they become active if fixup-fstab-swaps swapons them, and thus they get swapped off prior to umount -- important that happens to avoid hangs); but our script will swapon the noauto,x-emulab-auto swap partitions as if they'd had options=default|auto. What this does break is swapon/off -a --- but who cares. The x-* comment option in fstab is something I didn't know about, I'll admit. Second, I've made emulab-fstab-fixup.service Conflict with swap.target, but also be pulled in by swap.target! The hope was that this would ensure that our service *always* runs successfully, even if it kills off swap.target to "handle" the conflict. Well, the problem is that we need to Conflict with the instantiated swap unit files, not swap.target... so I think that isn't really working. But I left it in -- maybe it is helping us win races. The one thing I cannot block is that systemd looks at the partition types of at least one of our hardware types (d820) and generates swap unit files by the partition UUID. How it is doing this, I have no idea -- that behavior is only supposed to happen if your disk is GPT. So we get failures on the d820s from the systemd instantiated swap units on first boot, but our scripts always do the right thing.
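To illustrate the noauto,x-emulab-auto convention, here is a sketch of how a script might pick out the swap entries it is responsible for. It is not the fixup-fstab-swaps implementation; the function name and the sample device names are made up for the example.

```python
SAMPLE_FSTAB = """\
# /etc/fstab (sample content, hypothetical devices)
/dev/sda1 / ext4 defaults 0 1
/dev/sda3 none swap noauto,x-emulab-auto 0 0
/dev/sdb1 none swap sw 0 0
"""


def emulab_swap_devices(fstab_text, marker="x-emulab-auto"):
    """Return devices of swap entries whose options include the marker."""
    devices = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line; ignore it in this sketch
        device, _mountpoint, fstype, options = fields[:4]
        if fstype == "swap" and marker in options.split(","):
            devices.append(device)
    return devices
```

Only the entry tagged with the marker is returned, so a script could swapon exactly those devices while `swapon -a` continues to skip them because of noauto.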
-
Leigh Stoller authored
we see that kind of interface naming and that is how geni-lib does it. But need to accept that syntax.
-
David Johnson authored
The stock Ubuntu 16 networking.service only runs `udevadm settle` if there are 'auto ...' stanzas in /etc/network/interfaces . Well, we got rid of that a few commits ago, and now let udev rules populate /etc/network/interfaces (really /run/emulab-interfaces.d-auto-added/*). So, it's either hack the networking.service unit file to force udev to settle, and have it blown away on package update; or add a networking-emulab.service that has to run before networking.service to force udev to settle. We *always* want udev to settle on any Emulab node before bringing up interfaces, just in case the control net NIC is slow for whatever reason.
-
Mike Hibler authored
-
David Johnson authored
I cannot find why we called fixup-fstab-swaps with '-E' (which means, don't mkswap/swapon any swaps). The only thing I can think of is that perhaps running swapon manually made the systemd dev-*.swap targets unhappy. However, it is necessary to mkswap if the swap device didn't exist, because systemd will not mkswap for you, AFAIK -- it will only swapon. On Ubuntu 16, the dev-*.swap targets are happy whether they or Emulab does the swapon. If that's not true on Centos 7 or other systemds, we may have to revisit this tweak.
-
David Johnson authored
-
Mike Hibler authored
Put that code in the Linux prepare instead.
-
Mike Hibler authored
The comment line is different.
-
Mike Hibler authored
-
Mike Hibler authored
-
David Johnson authored
The remount-root-fs unit changed names in 16 to systemd-remount-fs , and I didn't see the race in the first round of testing, I guess.
-
- 08 Aug, 2016 2 commits
-
-
David Johnson authored
Prior to this commit, in Ubuntu 16, our control net hook was getting invoked accidentally by udev rules that look for bridge ports or vlan ports via ifquery. Those rules call ifquery -l, but do not add the --no-mappings argument to skip mapping processing --- and thus our mapping hook got run. But it was not getting run via systemd's networking.service, which is where it needs to run. That service guarantees that udev has 'settled' (flushed the event queue it accumulated during boot), which is important for devices with slow firmware/drivers/etc. Sadly, our mapping hook could *not* get run by the normal networking.service, because we cannot predict the control net device name (the possibilities are determined now by hardware and firmware, and could range from enoX to enpXsYfZdA). ifup -a requires that the real device name be present and be set to auto in /etc/network/interfaces. You can run ifup -a --force to bring up a non-existent device, but you cannot bring it down with ifdown. Interestingly enough, ifquery does not require that all 'auto' devices it returns be real devices, and that's why things were working. First, we have to make sure our findcnet hook does not run via the builtin udev rules. That's easy; we fixed up findcnet to look for some udev/systemd env vars, and do nothing in that case. Hopefully we got env vars that are always present... There are basically 3 strategies we can try after that. We can make our own networking-emulab.service that brings up and down the Emulab control net, and make networking.service pull that in. This way, 'service networking restart' or 'systemctl restart networking.service' would still work. However, ifup/ifdown would not work, because the control net iface is not present in /etc/network/interfaces. So nix that.
Two other options require us to dynamically edit /etc/network/interfaces on first boot of a debian/systemd machine, to place all ethernet devices into it along with our mapping hook and set them to auto, *and* to remove those customizations in prepare. This sort of sucks, but it doesn't suck much worse than if prepare fails in some other part of the process. What is more, we can make it suck less by always checking to assure ourselves that the real control net device is present in /etc/network/interfaces, and is present on the system. If we encounter anything to the contrary, we can recreate the Emulab section from scratch. Thus if there are prepare failures, the image will still boot because any inconsistent cruft will get wiped away. We can do this either by adding a networking-emulab.service that runs and finishes prior to networking.service, OR we can add a udev rule that calls a script to ensure all ethernet devices are added to /etc/network/interfaces prior to running. At this point, I favor the latter approach, if we can guarantee that it finishes prior to anything looking at /etc/network/interfaces. We can't guarantee anything about udev events being "finished" for a subsystem, AFAIK. Finally (and the best way), we can use yet another interfaces(5) mechanism and some strategic udev rules of our own! We add udev rules (/etc/udev/rules.d/99-emulab-control-network.rules) that populate the /run/emulab-interfaces.d-auto-added dir (listed as a source dir in /etc/network/interfaces for the ifup/ifdown/ifquery commands below) with files that contain simply 'auto <IFACE>'. Those rules are careful to do only that for certain valid wired Ethernet devices (and deliberately not wireless devices!). Then, once we've got 'auto ...' stanzas for each possible Ethernet device, we can continue to utilize the mapping stanzas below like previous versions of this file did.
And we don't have anything to clean out on reboot or on image capture, because /run is automatically cleared. ifup/ifdown/ifquery are not bothered by the absence of the sourced directory in /run, if that didn't exist for any reason. If you need to add another foo* device name, you'll need to edit the interfaces file (with another mapping stanza) and update the match rules in 99-emulab-control-network.rules .
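As a rough illustration of what a helper invoked from such a udev rule might do, the sketch below writes a one-line 'auto <IFACE>' file into a directory. The function name and the choice to name each file after its interface are assumptions; the actual rules file and any helper it calls are not shown in this log.

```python
import os


def write_auto_stanza(iface, directory):
    """Write a file containing just 'auto <iface>' and return its path.

    Naming the file after the interface is an assumption of this sketch.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, iface)
    with open(path, "w") as f:
        f.write("auto %s\n" % iface)
    return path
```

In the scheme described above the directory would be /run/emulab-interfaces.d-auto-added, so nothing needs to be cleaned up on reboot or image capture.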
-
Gary Wong authored
-
- 06 Aug, 2016 1 commit
-
-
Leigh Stoller authored
This closes issue #131.
-
- 04 Aug, 2016 1 commit
-
-
Gary Wong authored
-
- 03 Aug, 2016 4 commits
-
-
David Johnson authored
In the latest udev world, udev generates predictable device names using firmware info and/or PCI bus info (e.g., eno1 or enps4f0). So, we now try to run dhclient only on real ethernet devices (i.e., eth*, en*, sl*). There are other kinds of ethernet devices (e.g., wireless, wl*, ww*) or virtual devices, but we don't care about finding the control net on those. Might need to add another device name prefix for PV devices in Xen guests... we'll see.
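The prefix filter can be sketched as follows. The prefixes eth, en, and sl come from the message above; the function and constant names are hypothetical, and the real script may match names differently.

```python
# Name prefixes treated as real wired Ethernet devices (from the text above).
WIRED_PREFIXES = ("eth", "en", "sl")


def is_candidate(iface):
    """True if the interface name starts with one of the wired prefixes."""
    return iface.startswith(WIRED_PREFIXES)


def candidates(ifaces):
    """Keep only the interfaces worth trying dhclient on."""
    return [i for i in ifaces if is_candidate(i)]
```

Wireless names such as wlan0 or wwan0 and virtual devices such as virbr0 fall through the filter because none of them start with the listed prefixes.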
-
David Johnson authored
This replaces the first attempt, which just masked the race condition, since I didn't understand what tmcc bossinfo was really doing. This appears to fix it satisfactorily for now; it doesn't seem that we will run into the case where the file exists but has no nameserver. resolvconf on Linux also breaks DNS momentarily via dhclient exit hook, or something. On Ubuntu 16, resolvconf is set up to run via dhclient enter hook (the hook redefines make_resolv_conf, which dhclient-script eventually executes prior to the exit hook execution). For whatever reason, though, sometimes when our exit hook (this script) runs, /etc/resolv.conf is a dangling symlink. I was not able to find the source of the asynch behavior, so I can't say for sure. But sethostname.dhclient is an immediate casualty, because it calls tmcc bossinfo(), and the tmcc binary attempts to use res_init and read the resolver and use that as boss. If there is no /etc/resolv.conf (or it is a broken symlink into /run, as it is on resolvconf systems before resolvconf runs for the first time on boot), res_init will return localhost, and there is no way for us in tmcc to know that is inappropriate (taking the res_init resolver might not be the best choice, but we do not dare to add a special-case rejection of localhost in tmcc).
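One way to detect the two bad states mentioned above (missing or dangling /etc/resolv.conf, or a file with no nameserver) is sketched below. This is not the fix that was committed, which is not shown here; the function name is hypothetical.

```python
import os


def resolv_conf_usable(path="/etc/resolv.conf"):
    """True if path resolves to a readable file with a nameserver line."""
    # os.path.exists follows symlinks, so a dangling symlink yields False.
    if not os.path.exists(path):
        return False
    try:
        with open(path) as f:
            return any(line.split()[:1] == ["nameserver"] for line in f)
    except OSError:
        return False
```

A hook could poll this before calling anything that depends on res_init, rather than letting the resolver fall back to localhost.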
-
Mike Hibler authored
-
Leigh Stoller authored
-
- 02 Aug, 2016 1 commit
-
-
Leigh Stoller authored
-
- 01 Aug, 2016 1 commit
-
-
Leigh Stoller authored
clusters using credentials to provide permission to access the datasets. * Add authority_urn to the images table, which is the urn of the origin dataset (similar to the slice urn, the Portal mints a credential in its namespace, so that the Portal always has permission to do anything it wants to the dataset at the remote cluster). * Add slot to the apt_datasets table to store a credential from the cluster where the dataset lives. This credential gives the owner permission to download the dataset, which the portal will delegate to any cluster that might need to get that dataset.
-
- 29 Jul, 2016 8 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
Even when there were no changes, we were stat'ing every mountpoint. Now we wait til after we have determined that something has changed before we test the mountpoints. Also, just open and append to files directly in perl rather than exec'ing a "cat" to do it.
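The ordering change can be illustrated with a short sketch: compare first, and only touch the filesystem if something differs. The original code is Perl; this Python version uses hypothetical names and is only meant to show the ordering.

```python
import os


def missing_mountpoints_if_changed(old_lines, new_lines, mountpoints,
                                   exists=os.path.exists):
    """Return missing mountpoints, testing them only when content changed.

    Returns None when nothing changed, so no mountpoint is examined.
    """
    if old_lines == new_lines:
        return None
    return [m for m in mountpoints if not exists(m)]
```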
-
Mike Hibler authored
Don't put out the "DO NOT EDIT" comment. We do that in the proxy as well. Print out timestamps around the call to the proxy.
-
Leigh Stoller authored
-
Leigh Stoller authored
-
Leigh Stoller authored
to the web interface. Note that I removed the setgroups from the path cause it is so slow on the Mothership, probably cause of the group file size. Turns out this is okay, we keep the groups of inactive users in sync.
-
Leigh Stoller authored
-
Leigh Stoller authored
that mountd does not need to be hupped. Mike says perl now randomizes its hashes, how silly is that. Still, this takes way too long, almost four seconds cause of the ssh to ops each time. We should keep the files local and do the diffs on boss.
-
- 28 Jul, 2016 5 commits
-
-
Leigh Stoller authored
the value of sitevar protogeni/default_osname when no OS is specified. For Phantomnet, Kirk needs the type specific default OS used instead.
-
Leigh Stoller authored
-
Leigh Stoller authored
-
Mike Hibler authored
-
Mike Hibler authored
-
- 27 Jul, 2016 1 commit
-
-
Leigh Stoller authored
This will prevent bootinfo contact from off-network, as on geni racks.
-
- 26 Jul, 2016 2 commits
-
-
Mike Hibler authored
-
Leigh Stoller authored
-