1. 30 May, 2018 1 commit
    • Several backend/RPC changes for reservations: · 8266ae51
      Leigh B Stoller authored
      1. Return current set of reservations (if any) for a user when getting
         the max extension (piggybacking on the call to reduce overhead).
      2. Add RPC to get the reservation history for a user (all past
         reservations that were approved).
         Aside: the reservation_history table was not being updated properly;
         only expired reservations were saved, not deleted (but used)
         reservations, so we lost a lot of history. We could regen some of it
         from the history tables I added at the Portal for Dmitry, but I am
         not sure it is worth the trouble.
      3. The main content of this commit is that, for both of the lists
         above, we also return the experiment usage history for the project
         and the user who created the reservation. This takes the form of a
         timeline of allocation changes so that we can graph node usage
         against the reservation bounds, to show graphically how well utilized
         the reservation is.
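A minimal sketch of how a timeline of allocation changes could be turned into a utilization figure for a reservation. This is an illustration in Python, not the backend's code: the `Change` record, the `utilization` function, and the choice to cap usage at the reserved node count are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One allocation change (hypothetical record layout)."""
    time: int   # epoch seconds at which the change took effect
    nodes: int  # nodes allocated after the change


def utilization(changes, start, end, reserved):
    """Fraction of reserved node-seconds in [start, end) that were used."""
    if end <= start or reserved <= 0:
        return 0.0
    used = 0      # node-seconds consumed so far
    current = 0   # nodes allocated at the cursor
    cursor = start
    for ch in sorted(changes, key=lambda c: c.time):
        if ch.time <= start:
            # Changes before the reservation only set the starting level.
            current = ch.nodes
            continue
        if ch.time >= end:
            break
        # Usage above the reservation does not count as extra utilization.
        used += min(current, reserved) * (ch.time - cursor)
        cursor = ch.time
        current = ch.nodes
    used += min(current, reserved) * (end - cursor)
    return used / (reserved * (end - start))
```

The same step function (time, nodes) is what a graph would plot against the horizontal line at the reserved count between the reservation's start and end.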
  2. 24 Apr, 2018 1 commit
  3. 17 Apr, 2018 1 commit
  4. 26 Mar, 2018 2 commits
  5. 14 Mar, 2018 1 commit
  6. 16 Feb, 2018 2 commits
    • A lot of work on the RPC code, among other things. · 56f6d601
      Leigh B Stoller authored
      I spent a fair amount of time improving error handling along the RPC
      path, as well as making the code more consistent across the various
      files. Also be more consistent in how the web interface invokes the
      backend and gets errors back, specifically for errors that are generated
      when talking to a remote cluster.
      Add checks before every RPC to make sure the cluster is not disabled in
      the database. Also check that we can actually reach the cluster, and
      that the cluster is not offline (NoLogins()) before we try to do
      anything. I might have to relax this a bit, but in general it takes a
      couple of seconds to check, which is a small fraction of what most RPCs
      take. Return precise errors to the web interface for clusters that are
      not available, and show them to the user.
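The pre-RPC checks described above might be sketched as follows. This is a Python illustration only; the `Cluster` record, `precheck`, and `call_rpc` are hypothetical names, and the two callables stand in for the real reachability test and the NoLogins() check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cluster:
    """Hypothetical view of a cluster as the Portal sees it."""
    name: str
    disabled: bool                    # disabled flag from the database
    is_reachable: Callable[[], bool]  # stand-in for a connectivity probe
    no_logins: Callable[[], bool]     # stand-in for the NoLogins() check


def precheck(cluster: Cluster) -> Optional[str]:
    """Return a precise, user-presentable error, or None if the RPC may go."""
    if cluster.disabled:
        return f"{cluster.name} is disabled"
    if not cluster.is_reachable():
        return f"{cluster.name} is not reachable"
    if cluster.no_logins():
        return f"{cluster.name} is offline"
    return None


def call_rpc(cluster: Cluster, rpc: Callable[[], object]) -> dict:
    """Run the checks first; only invoke the RPC when they all pass."""
    error = precheck(cluster)
    if error is not None:
        return {"ok": False, "error": error}
    return {"ok": True, "value": rpc()}
```

Checking the database flag first keeps the cheapest test ahead of the ones that need the network.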
      Use webtasks more consistently between the web interface and backend
      scripts. Watch specifically for scripts that exit abnormally (exit
      before setting the exitcode in the webtask), which always means an
      internal failure; do not show those to users.
      Show just those RPC errors that would make sense to users, stop spewing
      script output to the user, send it just to tbops via the email that is
      already generated when a backend script fails fatally.
      But do not spew email for clusters that are not reachable or are
      offline. Ditto for several other cases that were generating mail to
      tbops instead of just showing the user a meaningful error message.
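One way to picture the routing rules in the paragraphs above, as a Python sketch. The `Outcome` record and `route` function are hypothetical; in particular, treating a missing exitcode as `None` and the wording of the generic message are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Outcome:
    """Hypothetical summary of one backend script run."""
    exitcode: Optional[int]             # None: script died before setting it
    cluster_unavailable: bool = False   # cluster unreachable or offline
    user_message: Optional[str] = None  # error that makes sense to a user
    output: str = ""                    # raw script output


def route(outcome: Outcome) -> Tuple[Optional[str], Optional[str]]:
    """Return (text shown to the user, text mailed to tbops or None)."""
    if outcome.exitcode is None:
        # Abnormal exit always means an internal failure; hide the details.
        return ("Internal error", outcome.output)
    if outcome.exitcode == 0:
        return (None, None)
    if outcome.cluster_unavailable:
        # Show a precise error, but do not generate mail for this case.
        return (outcome.user_message or "Cluster is not available", None)
    if outcome.user_message:
        return (outcome.user_message, None)
    # Anything else: generic message for the user, script output to tbops.
    return ("Internal error", outcome.output)
```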
      Stop using ParRun for single-site experiments, which are 99% of
      experiments.
      For create_instance, add a new "async" mode that tells CreateSliver() to
      return before the first mapper run, which typically happens very quickly.
      Then watch for errors or for the manifest with Resolve or for the slice
      to disappear. I expect this to be bounded and so we do not need to worry
      so much about timing this wait out (which is a problem on very big
      topologies). When we see the manifest, the RedeemTicket() part of the
      CreateSliver is done and now we are into the StartSliver() phase.
      For the StartSliver phase, watch for errors and show them to users;
      previously we mostly lost those errors and just sent the experiment into
      the failed state. I am still working on this.
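The wait that follows an async CreateSliver() could look roughly like the loop below. This is a Python sketch under assumptions: `resolve` is a hypothetical callable standing in for the Resolve call, it is assumed to return `None` once the slice has disappeared and otherwise a dict that may carry `error` or `manifest`, and the optional `max_polls` cap is an addition for safety rather than something the commit describes.

```python
import time


def wait_for_manifest(resolve, interval=5.0, sleep=time.sleep, max_polls=None):
    """Poll until the manifest appears, an error shows up, or the slice is gone.

    Returns a (state, detail) pair where state is one of
    "manifest", "error", "gone", or "timeout".
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        status = resolve()
        if status is None:
            return ("gone", None)          # slice disappeared
        if status.get("error"):
            return ("error", status["error"])
        if status.get("manifest"):
            # RedeemTicket() is done; the StartSliver() phase begins here.
            return ("manifest", status["manifest"])
        polls += 1
        sleep(interval)
    return ("timeout", None)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.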
  7. 25 Jan, 2018 1 commit
  8. 22 Jan, 2018 3 commits
  9. 19 Nov, 2017 2 commits
  10. 04 Oct, 2017 2 commits
  11. 12 Jul, 2017 1 commit
    • Improvements to the protogeni fcgid handler: · aeb3d617
      Leigh B Stoller authored
      * Fix the logging that had been messed up for a while; the logfile
        object was not defined in the children, needed a little reorg.
      * Add changes needed for SecureImageDownload(), which is a little
        messier with fcgid since we have to stream the image back to apache
        which means we need to reconnect the fcgid handler.
      * Add the CH module, seems to work fine.
      * Wrap the calls to cluster-wrapper.pl so that we can set an ENV
        variable indicating which module is being served, and then put this
        in the proc title; it's very annoying that perl (sometimes?) messes
        with the proc title without permission from me, so I don't know what
        each server is serving since the command line options are gone.
      * Some tweaks to the apache config file.
      Note that this is not running live yet, still just in my devel tree.
  12. 23 May, 2017 1 commit
  13. 25 Apr, 2017 1 commit
  14. 17 Apr, 2017 1 commit
    • Attempt to operate in an admin mode for reservations · 188f041f
      Leigh B Stoller authored
      So, one reason the fast RPC path is fast is that we do not normally
      operate with credentials, but with reservations we have to since we want
      the reservation creator to be a real user and of course the project has
      to exist. We need credentials for that. But when an admin is editing or
      creating a reservation in another project, we need the admin user to
      exist too, and we might need the project to be created. That requires
      different credentials. So in an attempt to deal more generally with the
      admin problem, export an entrypoint to create a user (the admin user)
      before trying to create a reservation. Not sure this is the best way to
      go, but it's one way to go.
      In general, I think we need a more explicit user/project management API
      for the Portal. Needs more thought.
  15. 24 Mar, 2017 1 commit
  16. 23 Mar, 2017 1 commit
  17. 20 Mar, 2017 1 commit
  18. 14 Mar, 2017 1 commit
  19. 06 Mar, 2017 1 commit
    • Two changes to reservations: · 5e7e613b
      Leigh B Stoller authored
      1. Plumb through a prediction RPC to return the reservation system
         pressure and outstanding reservations for a list of projects. This is
         invoked from the instantiate page when loaded, using the projects
         the user has permission to create experiments in; the results are
         stored in a script global variable for someone else to make sense of.
      2. When checking to see if a reservation can be accommodated, check with
         the admission control library first to see if there is a project
         limit on the type that would be violated. Need to do a little
         rearranging of the deck chairs in the admission control library.
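The ordering in item 2 (project limit first, capacity second) might be sketched like this in Python. Everything here is hypothetical: `check_project_limit`, the `(project, node_type)` keyed dictionaries, and the error text are illustrations, not the admission control library's interface.

```python
def check_project_limit(limits, reserved, project, node_type, requested):
    """Return (ok, reason) for a reservation request against project limits.

    limits:   {(project, node_type): max nodes}, absent key means no limit
    reserved: {(project, node_type): nodes already reserved}
    """
    limit = limits.get((project, node_type))
    if limit is None:
        return (True, None)  # no per-project limit on this type
    already = reserved.get((project, node_type), 0)
    if already + requested > limit:
        return (False,
                f"{project} is limited to {limit} {node_type} nodes "
                f"({already} already reserved)")
    return (True, None)
```

Only when this check passes would the caller go on to ask whether the reservation can be accommodated at all.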
  20. 16 Feb, 2017 1 commit
  21. 15 Feb, 2017 1 commit
  22. 20 Jan, 2017 1 commit
  23. 03 Nov, 2016 3 commits
  24. 20 Sep, 2016 1 commit
  25. 29 Aug, 2016 3 commits
  26. 20 Jul, 2016 1 commit
    • This change is to support switching urn in data structure from strings · 9d2cc009
      Leigh B Stoller authored
      to objects (see GeniHRN, look for new()). Much easier, less typing.  But
      in order to do that, we have to make sure that if we send one on the
      wire, it gets converted properly (converted to its plain string).
      But Frontier and XML::RPC do not let you hook in so that if you have a
      blessed reference, it will call its stringify method for encoding.  Odd,
      because encode_json (JSON) supports that.
      As it turns out, Frontier is structured so that it is easy to hook into
      it, and it's only mildly sleazy. And Frontier has not changed in years,
      so it is probably not going to change much in the next few years.
      XML::RPC was too messy, so I switched Genixmlrpc CallMethod() to use
      Frontier instead.  Does not seem too risky; we use Frontier on the
      receiving end.
      Let's see how this goes; it's been running in my devel tree for a while.
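For readers who want to see the idea in isolation, here is an analogue in Python rather than Perl: teaching an XML-RPC encoder to send an object as its plain string. It is not the Frontier hook from the commit. The `URN` class and `encode_call` helper are hypothetical, and registering a handler in `xmlrpc.client.Marshaller.dispatch` relies on a CPython implementation detail, which makes it about as mildly sleazy as the original.

```python
import xmlrpc.client


class URN:
    """Hypothetical stand-in for a URN object that stringifies itself."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


def _dump_urn(marshaller, value, write):
    # Encode the object exactly as if the caller had passed its string form.
    marshaller.dump_unicode(str(value), write)


# The dispatch table is shared by all Marshaller instances, so this
# registration is global to the process.
xmlrpc.client.Marshaller.dispatch[URN] = _dump_urn


def encode_call(method, *params):
    """Build the XML-RPC request body for a method call."""
    return xmlrpc.client.dumps(params, methodname=method)
```

The receiving end sees an ordinary string, so nothing changes for servers that already expect URNs as strings.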
  27. 11 Jul, 2016 1 commit
  28. 01 Jun, 2016 1 commit
  29. 25 May, 2016 1 commit
  30. 25 Apr, 2016 1 commit