1. 13 Mar, 2019 1 commit
  2. 08 Nov, 2018 1 commit
  3. 30 May, 2018 1 commit
  4. 22 May, 2018 1 commit
  5. 24 Apr, 2018 1 commit
  6. 30 Mar, 2018 1 commit
  7. 26 Mar, 2018 1 commit
  8. 16 Feb, 2018 1 commit
      A lot of work on the RPC code, among other things. · 56f6d601
      Leigh Stoller authored
      I spent a fair amount of time improving error handling along the RPC
      path, as well as making the code more consistent across the various
      files. Also be more consistent in how the web interface invokes the
      backend and gets errors back, specifically for errors that are
      generated when talking to a remote cluster.
      
      Add checks before every RPC to make sure the cluster is not disabled in
      the database. Also check that we can actually reach the cluster, and
      that the cluster is not offline (NoLogins()) before we try to do
      anything. I might have to relax this a bit, but in general it takes a
      couple of seconds to check, which is a small fraction of what most RPCs
      take. Return precise errors for clusters that are not available, to the
      web interface and show them to the user.
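The pre-RPC availability checks described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Emulab code: the `ClusterUnavailable` exception, the dict-shaped cluster record, and its field names are all stand-ins; only the three conditions (disabled in the database, unreachable, and offline via NoLogins()) come from the text.

```python
# Hypothetical sketch of the checks run before every RPC. The cluster record
# and field names are illustrative, not the real Emulab schema.

class ClusterUnavailable(Exception):
    """Raised with a precise, user-visible reason before any RPC is tried."""

def check_cluster(cluster):
    # 1. Refuse immediately if the cluster is marked disabled in the database.
    if cluster.get("disabled"):
        raise ClusterUnavailable("cluster %s is disabled" % cluster["name"])
    # 2. Make sure we can actually reach the cluster at all.
    if not cluster.get("reachable"):
        raise ClusterUnavailable("cluster %s is unreachable" % cluster["name"])
    # 3. Respect the NoLogins() flag: the cluster is up but offline.
    if cluster.get("nologins"):
        raise ClusterUnavailable("cluster %s is offline" % cluster["name"])
    return True
```

The point of running these cheap checks first is the one the message makes: a couple of seconds of checking is a small fraction of what most RPCs take, and it lets us hand the user a precise error instead of a timeout.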
      
      Use webtasks more consistently between the web interface and backend
      scripts. Watch specifically for scripts that exit abnormally (exit
      before setting the exitcode in the webtask) which always means an
      internal failure, do not show those to users.
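The webtask convention above can be illustrated with a small sketch. This is not the real webtask API; the dict stands in for the actual webtask object, and the return values are invented for illustration. The rule it encodes is the one stated: a script that exits without ever setting the exitcode in its webtask is an internal failure, and that is never shown to users.

```python
# Illustrative sketch of the exit-classification rule described above.
# The webtask is modeled as a plain dict; the real object differs.

def classify_webtask(webtask):
    if "exitcode" not in webtask:
        # Script died before setting an exitcode: always an internal
        # failure. Mail goes to tbops; the user sees a generic error.
        return ("internal-error", "mail tbops, show user a generic error")
    if webtask["exitcode"] == 0:
        return ("ok", None)
    # Normal failure: exitcode was set, so the error is user-visible.
    return ("failed", webtask.get("error", "unknown error"))
```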
      
      Show just those RPC errors that would make sense to users, and stop
      spewing script output at the user; send it just to tbops via the
      email that is already generated when a backend script fails fatally.
      
      But do not spew email for clusters that are not reachable or are
      offline. Ditto for several other cases that were generating mail to
      tbops instead of just showing the user a meaningful error message.
      
      Stop using ParRun for single-site experiments, which are 99% of
      experiments.
      
      For create_instance, a new "async" mode that tells CreateSliver() to
      return before the first mapper run, which typically happens very quickly.
      Then watch for errors or for the manifest with Resolve or for the slice
      to disappear. I expect this to be bounded and so we do not need to worry
      so much about timing this wait out (which is a problem on very big
      topologies). When we see the manifest, the RedeemTicket() part of the
      CreateSliver is done and now we are into the StartSliver() phase.
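The async flow above can be sketched as a polling loop. CreateSliver(), Resolve(), RedeemTicket(), and StartSliver() are the calls named in the message; the polling helper itself and the shape of the status it inspects are hypothetical stand-ins for the real backend code.

```python
# Minimal sketch, assuming resolve() returns None once the slice has
# disappeared, or a dict that may carry an "error" or a "manifest".

import time

def wait_for_manifest(resolve, interval=5, max_tries=100):
    """Poll resolve() until a manifest, an error, or the slice vanishing."""
    for _ in range(max_tries):
        status = resolve()
        if status is None:
            return ("gone", None)            # slice disappeared
        if status.get("error"):
            return ("error", status["error"])
        if status.get("manifest"):
            # Seeing the manifest means the RedeemTicket() part of
            # CreateSliver is done; the StartSliver() phase begins.
            return ("manifest", status["manifest"])
        time.sleep(interval)
    return ("timeout", None)
```

Because CreateSliver() now returns before the first mapper run, this wait is expected to be bounded, which is why timing it out is less of a concern than it was for very big topologies.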
      
      For the StartSliver phase, watch for errors and show them to users;
      previously we mostly lost those errors and just sent the experiment
      into the failed state. I am still working on this.
  9. 16 Jan, 2018 1 commit
  10. 09 Jan, 2018 1 commit
  11. 19 Nov, 2017 1 commit
      Round of changes related to dataset approval: · f431479c
      Leigh Stoller authored
      Previously we forced all Portal datasets to auto approve at the target
      cluster, now we let the local policy settings determine that, and return
      status indicating that the dataset needs to be approved by an admin.
      
      Plumbed through the approval path to the remote cluster.
      
      Fixed up polling to handle unapproved datasets and to watch for the
      new failed state that Mike added to indicate that allocation failed.
  12. 07 Sep, 2017 1 commit
  13. 08 Aug, 2017 2 commits
  14. 25 Jan, 2017 1 commit
  15. 21 Sep, 2016 1 commit
  16. 07 Sep, 2016 1 commit
  17. 29 Aug, 2016 2 commits
  18. 04 May, 2016 1 commit
  19. 01 Mar, 2016 1 commit
      Some tweaks to credential handling: · 3ebffb34
      Leigh Stoller authored
      1) Anytime we need to generate a slice credential, and the slice has
         expired, bump the slice expiration so we can create a valid credential
         and then reset the expiration. Consider: if the slice expires but
         we missed it and it is still active, we have to be able to control it.
      
      2) From the beginning, we have done almost all RPC operations as the
         creator of the experiment. Made sense when the portal interface was not
         project aware, but now other users in the project can see and mess with
         experiments in their project. But we are still doing all the RPC
         operations as the creator of the experiment, which will need to change
         at some point, but in the short term I am seeing a lot of credential
         errors caused by an expired speaks-for credential for that creator (if
         they have not logged into the portal in a while). When this happens,
         let's generate a plain slice credential, issued to the SA, so that we can
         complete the operation. Eventually we have to make the backend project
         aware, and issue the operations as the web user doing the driving.
         Maybe as part of the larger portalization project.
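The bump-then-reset dance in (1) can be sketched as follows. The slice record, the `mint` callback, and the one-hour bump are all hypothetical stand-ins for the real Emulab/ProtoGENI objects; only the idea comes from the message: temporarily extend an expired slice so a valid credential can be minted, then restore the original expiration.

```python
# Hypothetical sketch of (1): extend an expired slice just long enough to
# mint a valid credential, then put the real expiration back.

from datetime import datetime, timedelta

def credential_for(slice_obj, mint):
    """mint(slice_obj) returns a credential valid until the slice expires."""
    saved = slice_obj["expires"]
    try:
        if saved <= datetime.utcnow():
            # Slice already expired; bump it so the credential is valid.
            slice_obj["expires"] = datetime.utcnow() + timedelta(hours=1)
        return mint(slice_obj)
    finally:
        # Always restore the real expiration, even if minting fails.
        slice_obj["expires"] = saved
```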
  20. 27 Jan, 2016 1 commit
  21. 04 Jan, 2016 1 commit
  22. 09 Oct, 2015 1 commit
  23. 23 Sep, 2015 1 commit
  24. 14 Sep, 2015 1 commit
  25. 10 Sep, 2015 1 commit
  26. 21 Aug, 2015 1 commit
  27. 19 Jun, 2015 1 commit
      New support for importing image-backed datasets from other clusters. This · 613d90dd
      Leigh Stoller authored
      is just like importing images (by using a URL instead of a URN), which
      makes sense since image-backed datasets are just images with a flag set.
      
      Key differences:
      
      1. You cannot snapshot a new version of the dataset on a cluster it has
         been imported to. The snapshot has to be done where the dataset was
         created initially. This is slightly inconvenient and will perhaps
         confuse users, but it is far less confusing than the datasets
         getting out of sync.
      
      2. No image versioning of datasets. We can add that later if we want to.
  28. 08 Jun, 2015 1 commit
  29. 22 May, 2015 2 commits
  30. 30 Apr, 2015 1 commit
  31. 18 Mar, 2015 1 commit
  32. 11 Mar, 2015 1 commit
  33. 10 Mar, 2015 1 commit
  34. 06 Mar, 2015 1 commit
  35. 05 Mar, 2015 1 commit
  36. 04 Feb, 2015 1 commit
  37. 27 Jan, 2015 1 commit
      Two co-mingled sets of changes: · 85cb063b
      Leigh Stoller authored
      1) Implement the latest dataset read/write access settings from frontend to
         backend. Also updates for simultaneous read-only usage.
      
      2) New configure options: PROTOGENI_LOCALUSER and PROTOGENI_GENIWEBLOGIN.
      
         The first changes the way that projects and users are treated at the
         CM. When set, we create real accounts (marked as nonlocal) for users and
         also create real projects (also marked as nonlocal). Users are added to
         those projects according to their credentials. The underlying experiment
         is thus owned by the user and in the project, although all the work is
         still done by the geniuser pseudo user. The advantage of this approach
         is that we can use standard emulab access checks to control access to
         objects like datasets. Maybe images too at some point.
      
         NOTE: Users are not removed from projects once they are added; we are
         going to need to deal with this, perhaps by adding an expiration stamp
         to the groups_membership tables, and using the credential expiration to
         mark it.
      
         The second new configure option turns on the web login via the geni
         trusted signer. So, if I create a sliver on a backend cluster when both
         options are set, I can use the trusted signer to log into my newly
         created account on the cluster, and see it (via the emulab classic web
         interface).
      
         All this is in flux, might end up being a bogus approach in the end.