1. 06 Nov, 2018 1 commit
  2. 05 Nov, 2018 1 commit
    • Changes to how we handle/report mapping failures that also fail the · 11074445
      Leigh Stoller authored
      empty testbed test.
      
      Prior to this commit, we were not invoking the empty testbed case
      consistently. Now we do, but that exposed another problem: reporting
      the error to the Portal in a meaningful way. Basically, we can report
      a different error code for an impossible-to-map error, but then we lose
      the info we store now about what the actual failure was (which we show
      to the user with additional helpful info). Since we cannot (easily)
      change the Geni API for CreateSliver(), I have elected to continue the
      practice of returning the specific error codes (which also go into the
      database for long term historical info), and add more helpful text
      for the Portal user that explains clearly that the mapping is impossible
      on the target cluster. This extra text also goes into the database in the
      attached message field, so we can come back later and post-process if
      we decide to do something different.
  3. 25 Oct, 2018 2 commits
    • Replace the Docker entrypoint/cmd/env implementation for augmented images. · a986a085
      David Johnson authored
      (Also, add support for user to change container entrypoint at runtime.
      Note also that the server side now stores the entrypoint/cmd/env
      attributes as base64url-encoded virt_node_attributes, so that we can
      just use the existing table_regex for those values.)
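
As a rough illustration of the base64url storage described above, the sketch below encodes an entrypoint/cmd/env value so that it uses only a small, regex-friendly alphabet. The function names and the SAFE_VALUE pattern are hypothetical stand-ins; the actual table_regex is not shown in this commit, and whether the server strips padding is not stated here.

```python
import base64
import re

# Hypothetical stand-in for a restrictive column regex; the real table_regex
# used by the server is not shown in this commit.
SAFE_VALUE = re.compile(r"^[A-Za-z0-9_=-]*$")


def encode_attr(value: str) -> str:
    """Encode an arbitrary entrypoint/cmd/env string as base64url text."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


def decode_attr(encoded: str) -> str:
    """Recover the original string from its base64url form."""
    return base64.urlsafe_b64decode(encoded.encode("ascii")).decode("utf-8")


raw = '/bin/sh -c "echo $HOME; exec nginx"'
stored = encode_attr(raw)
print(stored)
```

Quotes, spaces, and shell metacharacters never reach the regex check in raw form, so no new validation pattern is needed for these attributes.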
      
      We add a new runit service (/etc/service/dockerentrypoint) to
      clientside/tmcc/linux/docker/dockerfiles/common to handle the
      entrypoint/cmd/env/workingdir/user emulation.  From the comments:
      
        Docker's semantics for ENTRYPOINT/CMD vary depending on whether those
        values are specified as arrays of strings, or simply as single strings
        (which must be interpreted by /bin/sh -c).
      
        Handling all the quoting possibilities in the shell is a major pain.
        So, this script handles the basic stuff (in particular, sourcing env
        vars, because we want the shell to interpret them!) -- then execs our
        perl companion script (run.pl) to deal with the entrypoint/command
        files that libvnode_docker::emulabizeImage and
        libvnode_docker::vnodeCreate populated.
      
        libvnode_docker creates these single-line files in /etc/emulab/docker
        as either string:hexstr(<entrypoint-or-cmd-string>), or
        array:hexstr(a[0]),hexstr(a[1])... .  This allows us to preserve the
        original type of the image's entrypoint/cmd as well as the runtime
        entrypoint/cmd, and to preserve the exact bytes for the eventual final
        call to exec.
      
        The static files built into an emulabized image are
        /etc/emulab/docker/{entrypoint.image,cmd.image}, and those created
        dynamically at runtime if the user changes the entrypoint or cmd are
        bind-mounted to /etc/emulab/docker/{entrypoint.runtime,cmd.runtime}.
      
        Given the presence (or absence!) of those files, this script
        implements the emulation, based upon the content in those files.
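
The single-line file format described in the comments above can be sketched as follows. This is only an illustration of the string:/array: hex encoding, not the libvnode_docker or run.pl code (both are Perl); the function names are hypothetical, and UTF-8 is assumed for turning strings into bytes.

```python
def encode_value(value):
    """Encode a shell-form string or an exec-form list as one line of text."""
    if isinstance(value, str):
        return "string:" + value.encode("utf-8").hex()
    return "array:" + ",".join(arg.encode("utf-8").hex() for arg in value)


def decode_value(line):
    """Return a str for 'string:' lines and a list of str for 'array:' lines."""
    kind, _, payload = line.strip().partition(":")
    if kind == "string":
        return bytes.fromhex(payload).decode("utf-8")
    if kind == "array":
        if not payload:
            return []
        return [bytes.fromhex(h).decode("utf-8") for h in payload.split(",")]
    raise ValueError("unknown entry type: %r" % kind)


shell_form = encode_value('echo "$HOME"')       # would be run via /bin/sh -c
exec_form = encode_value(["/usr/sbin/nginx", "-g", "daemon off;"])
print(shell_form)
print(exec_form)
```

Because every argument is hex-encoded, quotes, commas, and whitespace inside an argument survive unchanged, and the type prefix records whether the value must go through the shell or directly to exec.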
    • David Johnson's avatar
      993e9f8c
  4. 24 Oct, 2018 1 commit
    • Fixes for DeleteNodes(): · c14472f9
      Leigh Stoller authored
      * When deleting a lan and there is only one interface left, we need to go
        back and delete the interface from the last node. Otherwise it's a
        malformed rspec (which we have been ignoring), but it was passing
        through to the manifest, which made it a malformed manifest.
      
      * But a later bug was causing that now removed interface to sneak back
        in via the old copy of the manifest in the database.
      
      * Also fix a bug that was causing multiple versions of the site_info
        element to get inserted during an update.
      
      * Remove code that updates the manifest in the DB, use the existing
        Aggregate->UpdateManifest() method instead.
  5. 12 Sep, 2018 1 commit
  6. 16 Aug, 2018 1 commit
    • When user supplies external Docker image or Dockerfile, validate them. · fdcd1b4d
      David Johnson authored
      We try to emulate the standard Docker CLI's image handling.  Thus, if
      user specifies an image like 'ubuntu', we turn that into
      'ubuntu:latest'.  If user does not supply a registry host, we try
      'registry.hub.docker.com'.  If they do not specify a registry host or
      specify a registry host that is either registry-1.docker.io or
      registry.hub.docker.com, and their image does not contain a /, we
      prepend 'library/' to it (I *think* this is the right heuristic, but
      it's inference).
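
The image-name heuristic in the paragraph above can be sketched as below. The rule for spotting a registry host (first component contains "." or ":" or is "localhost") is an assumption borrowed from how the Docker CLI is commonly described, not something this commit states, and normalize_image is a hypothetical name.

```python
DEFAULT_REGISTRY = "registry.hub.docker.com"
HUB_HOSTS = {"registry-1.docker.io", "registry.hub.docker.com"}


def normalize_image(ref):
    """Expand a short image reference into host/path:tag form."""
    first, sep, rest = ref.partition("/")
    # Assumption: a leading component that looks like a hostname is a registry.
    if sep and ("." in first or ":" in first or first == "localhost"):
        host, path = first, rest
    else:
        host, path = None, ref

    # Only the last path component can carry a tag or digest.
    last = path.rsplit("/", 1)[-1]
    if ":" not in last and "@" not in last:
        path += ":latest"

    # Official Docker Hub images live under the 'library/' namespace.
    if (host is None or host in HUB_HOSTS) and "/" not in path:
        path = "library/" + path

    return "%s/%s" % (host or DEFAULT_REGISTRY, path)


print(normalize_image("ubuntu"))
```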
      
      For Dockerfiles, we must be able to download it, and it must contain a
      line matching ^\s*FROM (i.e. a valid FROM instruction, which all
      Dockerfiles must have).  We try to support DOS-mode textfiles too, but
      only ASCII.
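
A minimal sketch of the Dockerfile content check described above, assuming the file has already been downloaded into a bytes object. The function name is hypothetical; the pattern is the one quoted in the commit message, applied per line.

```python
import re

# Pattern from the commit message; MULTILINE lets ^ match at each line start.
FROM_LINE = re.compile(r"^\s*FROM\b", re.MULTILINE)


def dockerfile_looks_valid(data: bytes) -> bool:
    """Accept ASCII text (Unix or DOS line endings) with a FROM instruction."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False                      # non-ASCII content is rejected
    text = text.replace("\r\n", "\n")     # tolerate DOS-mode text files
    return FROM_LINE.search(text) is not None


print(dockerfile_looks_valid(b"# base image\r\nFROM ubuntu:18.04\r\nRUN true\r\n"))
```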
      
      Might need to loosen these checks.
  7. 13 Aug, 2018 1 commit
    • Possible fix for using a shared lan and wanting to use the 10G interface: · 7b5eb1bb
      Leigh Stoller authored
      * In the CM we have always ignored the BW settings on a shared lan,
        since there is no way to set the properties in geni-lib for such a
        lan. There is the local hack I added (linkwide properties), but that
        was also ignored for shared vlans. Now I am looking to see if there is
        a bandwidth specification there, and using that. I assume we do not
        care about delay/loss since well, we never have before.
      
      * But even so, the mapper was ignoring it too. But we also have the code
        that tries to not use 10G interfaces unless explicitly asked for a 10G
        link, and that is not in the shared vlan path. So ... I made a few
        changes; the worst that can happen is that I broke shared vlans for
        everyone except this one case.
  8. 16 Jul, 2018 1 commit
  9. 09 Jul, 2018 2 commits
    • Various bits of support for issue #408: · b7fb16a8
      Leigh Stoller authored
      * Add portal url to the existing emulab extension that tells the CM the
        CreateSliver() is coming from the Portal. Always send this info, not
        just for the Emulab Portal.
      
      * Stash that info in the geni slice data structure so we can add links
        back to the portal status page for current slices.
      
      * Add routines to generate a portal URL for the history entries, since
        we will not have those links for historical slices. Add links back to
        the portal on the showslice and slice history pages.
    • Fix for bug uncovered by DeleteNodes(); when the user has not set the · 06453763
      Leigh Stoller authored
      IPs on the lan and we fill them in, we put those IPs into the manifest,
      but we also have to put the netmask in, since without that, the update
      operation will get confused.
  10. 25 Jun, 2018 2 commits
  11. 21 Jun, 2018 1 commit
  12. 07 Jun, 2018 1 commit
  13. 04 Jun, 2018 2 commits
    • Docker server-side core, esp new libimageops support for Docker images. · 66366489
      David Johnson authored
      The docker VM server-side goo is mostly identical to Xen, with slightly
      different handling for parent images.  We also support loading external
      Docker images (i.e. those without a real imageid in our DB); in that
      case, the user has to set a specific stub image and some extra per-vnode
      metadata (a URI that points to a Docker registry/image repo/tag), and
      the Docker clientside handles the rest.
      
      Emulab Docker images map to an Emulab imageid:version pretty seamlessly.
      For instance, the Emulab `emulab-ops/docker-foo-bar:1` image would map
      to `<local-registry-URI>/emulab-ops/emulab-ops/docker-foo-bar:1`; the
      mapping is `<local-registry-URI>/pid/gid/imagename:version`.  Docker
      repository names are lowercase-only, so we handle that for the user; but
      I would prefer that users use lowercase Emulab imagenames for all Docker
      images; that will help us.  That is not enforced in the code; it will
      appear in the documentation, and we'll see.
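
The name mapping above can be sketched as a small helper. The function name and the registry address are hypothetical; the path layout and the lowercasing follow the description in this commit message.

```python
def docker_repo_name(registry, pid, gid, imagename, version):
    """Build <registry>/pid/gid/imagename:version with a lowercase repo path."""
    # Docker repository names must be lowercase, so fold the Emulab names.
    path = "%s/%s/%s" % (pid, gid, imagename)
    return "%s/%s:%s" % (registry, path.lower(), version)


# 'registry.example.net:5000' is a placeholder for <local-registry-URI>.
print(docker_repo_name("registry.example.net:5000",
                       "emulab-ops", "emulab-ops", "docker-foo-bar", 1))
```

Note that two Emulab image names differing only in case would collide after lowercasing, which is one reason lowercase image names are the safer convention.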
      
      Full Docker imaging relies on several other libraries
      (https://gitlab.flux.utah.edu/emulab/pydockerauth,
      https://gitlab.flux.utah.edu/emulab/docker-registry-py).  Each
      Emulab-based cluster must currently run its own private registry to
      support image loading/capture (note however that if capture is
      unnecessary, users can use the external images path instead).  The
      pydockerauth library is a JWT token server that runs out of boss's
      Apache and implements authn/authz for the per-Emulab Docker registry
      (probably running on ops, but could be anywhere) that stores images and
      arbitrates upload/download access.  For instance, nodes in an experiment
      securely pull images using their pid/eid eventkey; and the pydockerauth
      emulab authz module knows what images the node is allowed to pull
      (i.e. sched_reloads, the current image the node is running, etc).  Real
      users can also pull images via user/pass, or bogus user/pass + Emulab
      SSL cert.  GENI credential-based authn/z was way too much work, sadly.
      There are other auth/z paths (i.e. for admins, temp tokens for secure
      operations) as well.
      
      As for Docker image distribution in the federation, we use the same
      model as for regular ndz images.  Remote images are pulled in to the
      local cluster's Docker registry on-demand from their source cluster via
      admin token auth (note that all clusters in the federation have
      read-only access to the entire registries of any other cluster in the
      federation, so they can pull images).  Emulab imageid handling is the
      same as the existing ndz case.  For instance, image versions are lazily
      imported, on-demand; local version numbers may not match the remote
      image source cluster's version numbers.  This will potentially be a
      bigger problem in the Docker universe; Docker users expect to be able to
      reference any image version at any time anywhere.  But that is of course
      handleable with some ex post facto synchronization flag day, at least
      for the Docker images.
      
      The big new thing supporting native Docker image usage is the guts of a
      refactor of the utils/image* scripts into a new library, libimageops;
      this is necessary to support Docker images, which are stored in their
      own registry using their own custom protocols, so not amenable to our
      file-based storage.  Note: the utils/image* scripts currently call out
      to libimageops *only if* the image format is docker; all other images
      continue on the old paths in utils/image*, which all still remain
      intact, or minorly-changed to support libimageops.
      
      libimageops->New is the factory-style mechanism to get a libimageops
      that works for your image format or node type.  Once you have a
      libimageops instance, you can invoke normal image logical operations
      (CreateImage, ImageValidate, ImageRelease, et al).  I didn't do every
      single operation (for instance, I haven't yet dealt with image_import
      beyond essentially generalizing DownLoadImage by image format).
      Finally, each libimageops is stateless; another design would have been
      some statefulness for more complicated operations.   You will see that
      CreateImage, for instance, is written in a helper-subclass style that
      blurs some statefulness; however, it was the best match for the existing
      body of code.  We can revisit that later if the current argument-passing
      convention isn't loved.
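
The factory-plus-stateless-operations pattern described above can be illustrated with a short sketch. The real libimageops is Perl; every class, method, and format name below is hypothetical and only mirrors the shape of the design (pick an implementation by image format, then pass all inputs as arguments on each call).

```python
class ImageOps:
    """Base class: one method per logical image operation, no per-call state."""
    format_name = None

    def create_image(self, image, node):
        raise NotImplementedError

    def validate_image(self, image):
        raise NotImplementedError


class NdzImageOps(ImageOps):
    format_name = "ndz"

    def validate_image(self, image):
        return "checked file-backed image %s" % image


class DockerImageOps(ImageOps):
    format_name = "docker"

    def validate_image(self, image):
        return "checked registry-backed image %s" % image


# Map each supported image format to its implementation class.
_BY_FORMAT = {cls.format_name: cls for cls in (NdzImageOps, DockerImageOps)}


def new_image_ops(image_format):
    """Factory: return an operations object suited to the image format."""
    try:
        return _BY_FORMAT[image_format]()
    except KeyError:
        raise ValueError("no image ops for format %r" % image_format) from None


ops = new_image_ops("docker")
print(ops.validate_image("emulab-ops/docker-foo-bar:1"))
```

Since the instances keep no state, callers can create them freely and share them; the cost is that multi-step operations must thread their context through arguments.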
      
      There are a couple outstanding issues.  Part of the security model here
      is that some utils/image* scripts are setuid, so direct libimageops
      library calls are not possible from a non-setuid context for some
      operations.  This is non-trivial to resolve, and might not be worthwhile
      to resolve any time soon.  Also, some of the scripts write meaningful,
      traditional content to stdout/stderr, and this creates a tension for
      direct library calls that is not entirely resolved yet.  Not hard, just
      only partly resolved.
      
      Note that tbsetup/libimageops_ndz.pm.in is still incomplete; it needs
      imagevalidate support.  Thus, I have not even featurized this yet; I
      will get to that as I have cycles.
    • Fix a bug that was introduced when we shifted to using os_setup · e59fc714
      Leigh Stoller authored
      directly (on the Cloudlab clusters); we were losing a lockout, which
      allowed DeleteSliver() to run while in the middle of a CreateSliver().
      This was resulting in a lot of email about node failures since the nodes
      were getting yanked out from underneath the CreateSliver(). From the
      user perspective, this did not matter much, since they wanted the slice
      gone, but it finally bothered me enough to look more closely.
  14. 31 May, 2018 1 commit
  15. 30 May, 2018 1 commit
    • Add support for linkwide properties which are far more efficient wrt the · aba79edd
      Leigh Stoller authored
      XML size on really big lans. I do not expect this to be used very often,
      but it is handy. On the geni-lib side:
      
      class setProperties(object):
          """Added to a Link or LAN object, this extension tells Emulab based
          clusters to set the symmetrical properties of the entire link/lan to
          the desired characteristics (bandwidth, latency, plr). This produces
          more efficient XML than setting a property on every source/destination
          pair, especially on a very large lan. Bandwidth is in Kbps, latency in
          milliseconds, plr a floating point number between 0 and 1. Use keyword
          based arguments, all arguments are optional:
      
              link.setProperties(bandwidth=100000, latency=10, plr=0.5)
      
          """
      aba79edd
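
To give a feel for the XML savings mentioned in the commit above, the sketch below counts property elements for a LAN. It assumes one property element per ordered source/destination pair of LAN members, which is an assumption about how per-pair properties are emitted and is not stated in the commit; the function names are hypothetical.

```python
def pairwise_property_count(n_members: int) -> int:
    """Property elements when every ordered source/destination pair gets one."""
    return n_members * (n_members - 1)


def linkwide_property_count(n_members: int) -> int:
    """A single linkwide setting covers the whole LAN regardless of size."""
    return 1


for n in (2, 10, 100):
    print(n, pairwise_property_count(n), linkwide_property_count(n))
```

Under that assumption the per-pair form grows quadratically with LAN size, while the linkwide form stays constant.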
  16. 26 Apr, 2018 1 commit
  17. 23 Apr, 2018 1 commit
  18. 18 Apr, 2018 1 commit
    • A tiny little tweak that allows you to set the IPs on layer 1 link · 728bd3bd
      Leigh Stoller authored
      ifaces that correspond to endpoints on nodes. Makes it easier to
      do something like this, if we init the interfaces on the nodes with IP
      and mask and bring it up.
      
      	import geni.portal as portal
      	import geni.rspec.pg as pg
      
      	# Create the portal context and an empty request RSpec.
      	pc = portal.Context()
      	request = pc.makeRequestRSpec()
      
      	# Add a raw PC to the request and give it an interface.
      	node1 = request.RawPC("node1")
      	iface1 = node1.addInterface()
      
      	# Specify the IPv4 address and netmask.
      	iface1.addAddress(pg.IPv4Address("192.168.1.1", "255.255.255.0"))
      
      	# Add another raw PC to the request and give it an interface.
      	node2 = request.RawPC("node2")
      	iface2 = node2.addInterface()
      
      	# Specify the IPv4 address and netmask.
      	iface2.addAddress(pg.IPv4Address("192.168.1.2", "255.255.255.0"))
      
      	# Add an L1 link from node1 to node2.
      	link1 = request.L1Link("link1")
      	link1.addInterface(iface1)
      	link1.addInterface(iface2)
      
      	# Print the generated request RSpec.
      	pc.printRequestRSpec(request)
  19. 13 Apr, 2018 2 commits
  20. 09 Apr, 2018 1 commit
  21. 03 Apr, 2018 1 commit
    • When referring to a specific image version in the disk image URN, and · d83614dc
      Leigh Stoller authored
      that image is an *imported* system image, we are almost certainly
      referring to a version related to the origin image, not the local
      copy (which is not in sync with the origin wrt the history). In that
      case, we can use the hash from the image server to track down the local
      version.
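
The hash-based lookup described above can be sketched as follows. The data layout is hypothetical: the image server's answer is represented by a plain dict from origin version to hash, and the local versions by a list of records; the real code queries the image server and the database.

```python
def find_local_version(local_versions, origin_hash):
    """Return the local version number whose content hash matches, else None."""
    for entry in local_versions:
        if entry["hash"] == origin_hash:
            return entry["version"]
    return None


# Hypothetical data: the origin is at version 7, but the local cluster only
# imported two of those versions, so its own numbering differs.
origin_hashes = {3: "aaa111", 7: "ccc333"}
local_versions = [
    {"version": 0, "hash": "aaa111"},
    {"version": 1, "hash": "ccc333"},
]

# A URN asking for origin version 7 resolves to local version 1.
print(find_local_version(local_versions, origin_hashes[7]))
```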
      
      Aside: if I were going to redo the import mechanism, I would make image
      provenance the default for everyone, and make the option be whether to
      save or delete the image file (that was the entire reason I made
      provenance optional; most sites, especially geni racks, do not have
      enough storage to maintain old images). And image import would always
      import the history too, which is something I did later when we thought
      deltas were going to solve our problems.
  22. 29 Mar, 2018 1 commit
    • Reservations system changes: · df90d7a7
      Leigh Stoller authored
      1) Rework so that instead of relying on swapin_last + autoswap timeout,
         we set expt_expires for classic experiments at the beginning of swapin
         time. This is because swapin_last is not set until the end of swapin,
         and so during swapin the res system is in an inconsistent state since
         there is no way to determine when the experiment ends.
      
      2) On the Geni path, simplify expiration handling; do not allow a slice
         modification and expiration change at the same time; the bookkeeping
         and failure rollback is a pain, especially wrt reservation system,
         and this rarely ever actually happens, so get rid of a lot of
         complication.
  23. 26 Mar, 2018 2 commits
  24. 14 Mar, 2018 2 commits
  25. 08 Mar, 2018 2 commits
  26. 16 Feb, 2018 2 commits
  27. 22 Jan, 2018 5 commits