Commits · stable-20181105 · emulab / emulab-devel

Oct 30, 2018

Bugfix: correct libdb::TBSetNodeEventState return value to match documented API. · 0ba81dc5

David Johnson authored 6 years ago

(No existing code ever checked the return value from TBSetNodeEventState
until libosload_virtnode started to do so, to retry failed event sends
under high load in large-scale vnode experiments.  libevent, Node, and
libdb alternate return conventions; this sets them right.)

0ba81dc5
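
A minimal sketch of the retry pattern described in this commit, assuming a hypothetical `send` callable that returns True on success and False on failure. The real TBSetNodeEventState return convention is not shown in this log, so the convention used here is only an illustration.

```python
import time


def send_with_retry(send, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it reports success or the attempts run out.

    Assumption: send() returns True on success and False on failure.
    This convention is hypothetical, not the documented libdb one.
    """
    for attempt in range(attempts):
        if send():
            return True
        if attempt < attempts - 1:
            # Back off exponentially so a loaded event system can recover.
            sleep(base_delay * (2 ** attempt))
    return False
```

The point of the bugfix is visible here: a retry loop like this only works if the callee's return value reliably distinguishes success from failure.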

Bugfix: don't fail to delete Docker images without paths; skip them. · 122980a9
David Johnson authored 6 years ago

122980a9

Bugfix: use the right magic to tell Docker the control net mac address. · 242bbd16

David Johnson authored 6 years ago

This is necessary for clusters that run an arp lockdown on boss.  This
eluded me for a long time.  None of the documented ways to set the mac
address of an endpoint on container create work (they only work on
post-create network attach).  You have to use some special, weird,
undocumented magic.

242bbd16

Bugfix: clientside Docker address calc and firewall bugs. · 0380e194

David Johnson authored 6 years ago

(Most of these got lost in some other commit storm, I believe.  The
firewall fixes are new, for newer Dockers that drop traffic by default.)

0380e194

Minor fixes to repo based profiles. · 878e590d
Leigh B Stoller authored 6 years ago

878e590d

Oct 29, 2018
- Another weekend project; replace the Edit Site Variables page · 2bf0890c
  Leigh B Stoller authored 6 years ago
  
  with a modern implementation.
  2bf0890c
- Change nightly check for corrupted tables; tables are too big these · 5d202969
  Leigh B Stoller authored 6 years ago
  
  days, the extended check locks up the system for too long, and it never ever finds any problems. So do a "medium" check instead, which runs 5 times faster.
  5d202969
Oct 26, 2018
- Stop using the cs.utah.edu Ubuntu mirror for Docker builds. · 94316e93
  David Johnson authored 6 years ago
  
  94316e93
- Ensure docker vnode tmcc proxies are running in the boothook. · 0d44725e
  David Johnson authored 6 years ago
  
  0d44725e
- Handle Docker image history more carefully; Docker returns different things. · c74bf344
  David Johnson authored 6 years ago
  
  c74bf344
- Fix minor Docker image capture bug. · f5195cb4
  David Johnson authored 6 years ago
  
  f5195cb4
- Add uncommitted image_history dockerclient method. · a73154bb
  David Johnson authored 6 years ago
  
  a73154bb
- Work around a long standing issue with installation of SSL certs. · 918e6f83
  Mike Hibler authored 6 years ago
  
  Turns out we have not been installing (via slicefix) the local site certs on nodes after they have been imaged. We haven't noticed because we don't usually use SSL-enabled tmcd. Leigh noticed because we do use it in the script that locks down ARP entries.
  918e6f83
- Silence the email when the certificate does not verify, we tell the user · a249493d
  Leigh B Stoller authored 6 years ago
  
  that and point them to the Geni portal to fix it.
  a249493d
- Checkpoint ... · 36398d74
  Leigh B Stoller authored 6 years ago
  
  36398d74
- Changes to repo based profiles: · c40bf355
  Leigh B Stoller authored 6 years ago
  
  * Respect the default branch at the origin; gitlab/github allows you to set the default branch on a repo, which we were ignoring, always using master. Now, we ask the remote for the default branch when we clone/update the repo and set that locally. Like gitlab/github, mark the default branch in the branchlist with a "default" badge so the user knows.
  * Changes to the timer that asks if the repohash has changed (via a push hook); this has a race in it, and I have solved part of it. It is not a serious problem, just a UI annoyance I am working on fixing. Added a cheesy mechanism to make sure the timer is not running at the same time the user clicks on Update().
  c40bf355
- Add OPSVM_ENABLE changes; we do not need to do arplockdown on ops when · a6f79ed2
  Leigh B Stoller authored 6 years ago
  
  it is a jail, and its mac is the same as boss.
  a6f79ed2
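
For the repo based profile change above (c40bf355), one way to ask a remote for its default branch is `git ls-remote --symref <url> HEAD`, which prints a `ref: refs/heads/<branch>` line for HEAD. The sketch below parses that output. It is an assumption about how this could be done, not necessarily how the commit does it, and the function names are hypothetical.

```python
import subprocess


def parse_default_branch(ls_remote_output):
    """Extract the branch name from `git ls-remote --symref <url> HEAD` output.

    Returns None if no symbolic ref line for HEAD is present.
    """
    prefix = "refs/heads/"
    for line in ls_remote_output.splitlines():
        if line.startswith("ref: ") and line.rstrip().endswith("\tHEAD"):
            ref = line[len("ref: "):].split("\t", 1)[0]
            if ref.startswith(prefix):
                return ref[len(prefix):]
    return None


def remote_default_branch(url):
    """Run git against the remote and return its default branch, or None."""
    result = subprocess.run(
        ["git", "ls-remote", "--symref", url, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return parse_default_branch(result.stdout)
```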
Oct 25, 2018

Add defs file for amaricq · 2f41610c
Aleksander Maricq authored 6 years ago

2f41610c
pf does not like /255.255.255.248 masks ... has to be /29 · 6584c4c9
Leigh B Stoller authored 6 years ago

6584c4c9
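
As an illustration of the mask conversion that pf requires, the following sketch turns a dotted-quad netmask into a prefix length using Python's standard `ipaddress` module. The helper names are hypothetical and the address is a documentation example, not one taken from the commit.

```python
import ipaddress


def mask_to_prefix(mask):
    """Convert a dotted-quad netmask such as 255.255.255.248 to a prefix length."""
    return ipaddress.IPv4Network("0.0.0.0/" + mask).prefixlen


def to_cidr(addr, mask):
    """Return the network containing addr in CIDR notation, the form pf accepts."""
    # strict=False lets us pass a host address instead of the network address.
    return str(ipaddress.IPv4Network(f"{addr}/{mask}", strict=False))


print(mask_to_prefix("255.255.255.248"))          # 29
print(to_cidr("192.0.2.10", "255.255.255.248"))   # 192.0.2.8/29
```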

Replace the Docker entrypoint/cmd/env implementation for augmented images. · a986a085

David Johnson authored 6 years ago

(Also, add support for user to change container entrypoint at runtime.
Note also that the server side now stores the entrypoint/cmd/env
attributes as base64url-encoded virt_node_attributes, so that we can
just use the existing table_regex for those values.)
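
A small sketch of why base64url helps here: the encoded value uses only letters, digits, `-`, `_` and `=` padding, so an arbitrary entrypoint string can pass a simple character-class check. The function names are hypothetical, and whether the server keeps or strips the padding is not stated in the commit.

```python
import base64


def encode_attr(value):
    """Encode an arbitrary string as base64url text (padding kept)."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


def decode_attr(encoded):
    """Recover the original string from its base64url form."""
    return base64.urlsafe_b64decode(encoded.encode("ascii")).decode("utf-8")
```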

We add a new runit service (/etc/service/dockerentrypoint) to
clientside/tmcc/linux/docker/dockerfiles/common to handle the
entrypoint/cmd/env/workingdir/user emulation.  From the comments:

  Docker's semantics for ENTRYPOINT/CMD vary depending on if those
  values are specified as arrays of string, or simple as single strings
  (which must be interpreted by /bin/sh -c).

  Handling all the quoting possibilities in the shell is a major pain.
  So, this script handles the basic stuff (in particular, sourcing env
  vars, because we want the shell to interpret them!) -- then execs our
  perl companion script (run.pl) to deal with the entrypoint/command
  files that libvnode_docker::emulabizeImage and
  libvnode_docker::vnodeCreate populated.

  libvnode_docker creates these single-line files in /etc/emulab/docker
  as either string:hexstr(<entrypoint-or-cmd-string>), or
  array:hexstr(a[0]),hexstr(a[1])... .  This allows us to preserve the
  original type of the image's entrypoint/cmd as well as the runtime
  entrypoint/cmd, and to preserve the exact bytes for the eventual final
  call to exec.

  The static files builtin to an emulabized image are
  /etc/emulab/docker/{entrypoint.image,cmd.image}, and those created
  dynamically at runtime if user changes the entrypoint or cmd are
  bind-mounted to /etc/emulab/docker/{entrypoint.runtime,cmd.runtime}.

  Given the presence (or absence!) of those files, this script
  implements the emulation, based upon the content in those files.
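
The single-line file format described in the comments above can be sketched as follows. This is an illustration only: it assumes `hexstr` means the lowercase hexadecimal encoding of the UTF-8 bytes, and the function names are hypothetical. The real implementation lives in libvnode_docker and run.pl.

```python
def encode_spec(value):
    """Encode an entrypoint/cmd as string:<hex> or array:<hex>,<hex>,...

    A str keeps the "string" type (to be run via /bin/sh -c); a list keeps
    the "array" type (to be exec'd directly).
    """
    if isinstance(value, str):
        return "string:" + value.encode("utf-8").hex()
    return "array:" + ",".join(item.encode("utf-8").hex() for item in value)


def decode_spec(line):
    """Decode a line produced by encode_spec back to a str or a list of str."""
    kind, _, payload = line.strip().partition(":")
    if kind == "string":
        return bytes.fromhex(payload).decode("utf-8")
    if kind == "array":
        # Note: in this sketch an empty array and [""] encode identically.
        if not payload:
            return []
        return [bytes.fromhex(p).decode("utf-8") for p in payload.split(",")]
    raise ValueError("unknown spec type: %r" % kind)
```

Hex encoding sidesteps shell quoting entirely, which is why the exact bytes survive until the final exec.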

a986a085

Add support for privileged Docker containers. · 993e9f8c
David Johnson authored 6 years ago

993e9f8c
Down our Docker services the proper runit way. · e48155a7
David Johnson authored 6 years ago

e48155a7
Tweaks for 2018Q4 port set. · f3dc1bfe
Mike Hibler authored 6 years ago

f3dc1bfe
Minor fix to repo based profile update. · 671c9a48
Leigh B Stoller authored 6 years ago

671c9a48
Turn on image tracking. · d43e6a81
Leigh B Stoller authored 6 years ago

d43e6a81
Merge branch 'master' of gitlab.flux.utah.edu:emulab/emulab-devel · 8689a1e5
Mike Hibler authored 6 years ago

8689a1e5

Introduce a full port of m2crypto rather than a wrapper. · 7257198b

Mike Hibler authored 6 years ago

The full port is fixed at version 0.29.1. The latest version that was
wrapped, version 0.30.1, has problems with unicode to "string" conversions.
This explicitly caused an exception from the m2crypto SWIG stubs for libssl.
Even after fixing that, we still could not verify a certificate due to apparent
missing chars in strings.

7257198b

Oct 24, 2018

Fixes for DeleteNodes(): · c14472f9

Leigh B Stoller authored 6 years ago

* When deleting a lan and there is only one interface left, need to go
  back and delete the interface from the last node. Else it's a malformed
  rspec (which we have been ignoring), but it was passing through to the
  manifest, which made it a malformed manifest.

* But a later bug was causing that now removed interface to sneak back
  in via the old copy of the manifest in the database.

* Also fix a bug that was causing multiple versions of the site_info
  element to get inserted during an update.

* Remove code that updates the manifest in the DB, use the existing
  Aggregate->UpdateManifest() method instead.

c14472f9

Changes for Arduino I did a while back. · c2387c9b

Mike Hibler authored 6 years ago

Avoid gratuitous serial line signal changes when opening up the USB
device for the Arduino. Otherwise, the Arduino will reset its state.

c2387c9b

Minor fix; we let users delete profiles (or versions) while there is an · e234b170

Leigh B Stoller authored 6 years ago

experiment running that uses that profile. A small bug here prevented
the Terminate button from getting enabled. In general though, I wonder
if we should not allow a profile to be deleted while it's instantiated. :-)

e234b170

Oct 23, 2018

Change geni-lib STRING test to allow any printable ascii character. · 0255b322
Leigh B Stoller authored 6 years ago

0255b322
Use nosighup option to ParRun() since we use that for syslog log · 238fcb83
Leigh B Stoller authored 6 years ago
rolling.
238fcb83

Fix gaping race condition in ParRun() that was causing an infinite loop · 833d0937

Leigh B Stoller authored 6 years ago

when getting a termination signal. Also add an option to not redefine
the HUP handler, which is needed for the portal_monitor, which uses the
HUP signal to reopen the logfile (from syslogd).

833d0937

Minor fix. · 6f628c59
Leigh B Stoller authored 6 years ago

6f628c59

New version of the portal monitor that is specific to the Mothership. · 2a5cbb2a

Leigh B Stoller authored 6 years ago

This version is intended to replace the old autostatus monitor on bas,
except for monitoring the Mothership itself. We also notify the Slack
channel like the autostatus version. Driven from the apt_aggregates
table in the DB, we do the following.

1. fping all the boss nodes.

2. fping all the ops nodes and dboxen. Aside; there are two special
cases for now, that will eventually come from the database. 1)
powder wireless aggregates do not have a public ops node, and 2) the
dboxen are hardwired into a table at the top of the file.

3. Check all the DNS servers. Different from autostatus (which just
checks that port 53 is listening), we do an actual lookup at the
server. This is done with dig @ the boss node with recursion turned
off. At the moment this is a serialized test of all the DNS servers,
might need to change that later. I've lowered the timeout, and if
things are operational 99% of the time (which I expect), then this
will be okay until we get a couple of dozen aggregates to test.

Note that this test is skipped if the boss is not pingable in the
first step, so in general this test will not be a bottleneck.

4. Check all the CMs with a GetVersion() call. As with the DNS check, we
skip this if the boss does not ping. This test *is* done in parallel
using ParRun() since it's slower and the most likely to time out when
the CM is busy. The time out is 20 seconds. This seems to be the best
balance between too much email and not hanging for too long on any
one aggregate.

5. Send email and slack notifications. The current loop is every 60
seconds, and each test has to fail twice in a row before marking a
test as a failure and sending notification. Also send a 24 hour
update for anything that is still down.
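
The notification rule in step 5 (two consecutive failures before notifying, then a reminder every 24 hours) can be sketched as below. The class name, return values, and clock injection are hypothetical choices for illustration; the actual monitor is a Perl daemon driven from the database.

```python
import time


class FailureTracker:
    """Track consecutive failures per target and decide when to notify."""

    def __init__(self, threshold=2, reminder_interval=24 * 3600, clock=time.time):
        self.threshold = threshold
        self.reminder_interval = reminder_interval
        self.clock = clock
        self._failures = {}       # target -> consecutive failure count
        self._last_notified = {}  # target -> time of last notification

    def record(self, target, ok):
        """Record one test result; return "down", "still-down", or None."""
        if ok:
            # Any success resets the state for this target.
            self._failures.pop(target, None)
            self._last_notified.pop(target, None)
            return None
        count = self._failures.get(target, 0) + 1
        self._failures[target] = count
        if count < self.threshold:
            return None
        now = self.clock()
        last = self._last_notified.get(target)
        if last is None:
            self._last_notified[target] = now
            return "down"
        if now - last >= self.reminder_interval:
            self._last_notified[target] = now
            return "still-down"
        return None
```

Requiring two failures in a row trades up to one extra loop interval of detection latency for fewer false alarms from transient timeouts.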

At the moment, the full set of tests takes 15 seconds on our seven
aggregates when they are all up. Will need more tuning later, as the
number of aggregates goes up.
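
To illustrate why the slow checks are run in parallel, here is a sketch that fans a check out over all aggregates at once. It swaps in a Python thread pool for ParRun(), and the `check` callable (taking a name and a timeout and returning a truthy value on success) is a hypothetical stand-in for the GetVersion() call.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(aggregates, check, timeout=20, max_workers=8):
    """Run check(name, timeout) for every aggregate in parallel.

    Returns a dict mapping each aggregate name to True (up) or False (down).
    The check itself is expected to enforce the timeout; an exception from
    it is treated as a failure.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(check, name, timeout) for name in aggregates}
        for name, future in futures.items():
            try:
                results[name] = bool(future.result())
            except Exception:
                results[name] = False
    return results
```

With parallel execution the wall-clock cost of this step is bounded by roughly one timeout rather than one timeout per aggregate, as long as there are enough workers.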

2a5cbb2a

More tweaks to powder fixed node build. · 3dcc45bc
Leigh B Stoller authored 6 years ago

3dcc45bc
Add timeout override to PingAggregate(). · 076547b6
Leigh B Stoller authored 6 years ago

076547b6
When searching for an IP on the history page, lets also show a matching · 10383734
Leigh B Stoller authored 6 years ago
current experiment if there is one. This is convenient.
10383734
Allow HTML in warn/kill message to user. · 74258700
Leigh B Stoller authored 6 years ago

74258700
With Apache 2.4, there is a new option to allow CAs with no CRLS · c1220b25
Leigh B Stoller authored 6 years ago
when CRLS are enabled. This used to be the default but is now an
option we need to turn on.
c1220b25