1. 26 Oct, 2016 26 commits
    • Charlie Jacobsen's avatar
      lcd-create-v2: Guest virtual address space set up. · c28ba3f0
      Charlie Jacobsen authored
      Map everything except the ioremap region as write-back; the ioremap
      region is uncacheable. Only the high 512 GBs of guest virtual are
      mapped, to the first 512 GBs of guest physical.
      No more dynamic build up of the guest virtual address space during
      page walks. That code was just so damn messy.
    • Charlie Jacobsen's avatar
      lcd-create-v2: Physical address space setup (except UTCB). · fc7bf418
      Charlie Jacobsen authored
      Wow, this is so much easier with slightly coarser-grained
    • Charlie Jacobsen's avatar
      lcd-create-v2: Update address space layout for v2. · 39dfe81c
      Charlie Jacobsen authored
      The boot physical address space fits in the first 512 GBs.
      The boot virtual address space maps this to the high 512 GBs. In this
      way, the kernel module code will be loaded at the address it was
      linked for.
      Address spaces are designed for 1 GB huge guest virtual pages. The
      hope is this address space will be big enough for a lot of use cases,
      so the code inside the LCD won't ever need to touch the guest
      virtual address space (just physical).
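The layout above can be sketched as a fixed-offset translation. This is a minimal illustration, not the LCD code: the base address `BOOT_VIRT_BASE` assumes a 48-bit x86-64 virtual address space whose top 512 GBs start at 0xFFFFFF8000000000, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed constants for illustration; not taken from the LCD source. */
#define GIB             (1ULL << 30)
#define BOOT_PHYS_SIZE  (512 * GIB)             /* first 512 GBs of guest physical */
#define BOOT_VIRT_BASE  0xFFFFFF8000000000ULL   /* start of the high 512 GBs */

/* Guest physical -> guest virtual. Returns 0 on success, -1 if out of range. */
static int gpa_to_gva(uint64_t gpa, uint64_t *gva)
{
    if (gpa >= BOOT_PHYS_SIZE)
        return -1;
    *gva = BOOT_VIRT_BASE + gpa;
    return 0;
}

/* Guest virtual -> guest physical. Returns 0 on success, -1 if unmapped. */
static int gva_to_gpa(uint64_t gva, uint64_t *gpa)
{
    if (gva < BOOT_VIRT_BASE)
        return -1;
    *gpa = gva - BOOT_VIRT_BASE;
    return 0;
}

/* Index (0..511) of the 1 GB huge page that covers a guest physical address. */
static unsigned huge_page_index(uint64_t gpa)
{
    return (unsigned)(gpa >> 30);
}
```

With 1 GB pages, the whole 512 GB window needs only 512 mappings, which is why a static setup can replace building the address space during page walks.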
    • Charlie Jacobsen's avatar
      lcd-create-v2: Add low- and high-level vmalloc to kliblcd. · cf3f9e4c
      Charlie Jacobsen authored
      Similar to alloc pages. Call into the microkernel to vmalloc
      memory. Then, add it to the vmalloc resource tree so we can
      do address -> cptr translation.
    • Charlie Jacobsen's avatar
      host-resource-trees: Volunteer host memory. · ac556704
      Charlie Jacobsen authored
      When non-isolated code wants to "volunteer" host memory to the
      microkernel's capability system, it invokes lcd_volunteer_pages,
      lcd_volunteer_dev_mem, or lcd_volunteer_vmalloc_mem, depending on
      the type of memory.
      Internally, these check to see if the memory has already been
      volunteered (so we don't get duplicates -- this is checked via the
      global memory interval tree). If not, it inserts the memory into
      the caller's cspace. The caller can subsequently share the memory
      with e.g. an isolated LCD via the capability mediated interfaces.
      I had support for this before, but there was no checking for
      duplicate inserts (and this is a real problem with the pmfs example
      for string sharing): Non-isolated code has no way of knowing
      (without implementing data structures on its own) whether it inserted
      host memory already or not, or whether some other non-isolated
      code has.
      Furthermore, now we have full support for address -> cptr translation
      in the non-isolated side. This is also needed for the pmfs example
      with string sharing: before, the non-isolated code just always
      inserted memory every time to share strings, even if this led
      to duplicate inserts.
      I think this is one of the "friction points" of embedding a
      capability system inside a kernel: translation from host objects
      to capabilities and back. For some objects, you can just embed
      the cptr in the object itself (our "container structs"). But for
      some things -- like memory -- it's not so easy. (For device memory,
      the host kernel doesn't use a struct page to represent it. So we're
      faced with creating our own giant array of data structures to
      represent each page of device memory, and embedding the cptr in
      that. Or instead -- as I have done -- use a data structure like
      a tree to do a reverse lookup.)
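The check-then-insert flow and the reverse lookup described above can be sketched as follows. This is a userspace illustration under stated assumptions: a small array stands in for the global memory interval tree, `cptr_t` is a plain integer, and `volunteer` and `tree_lookup` are hypothetical names, not the real `lcd_volunteer_*` functions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cptr_t;               /* hypothetical stand-in for a cptr */

struct resource_node {
    uint64_t start, last;              /* inclusive address range */
    cptr_t cptr;                       /* capability for the memory object */
};

#define MAX_NODES 64
struct resource_tree {                 /* array stand-in for an interval tree */
    struct resource_node nodes[MAX_NODES];
    size_t count;
};

/* Address -> cptr translation: find the node containing addr, or NULL. */
static const struct resource_node *
tree_lookup(const struct resource_tree *t, uint64_t addr)
{
    for (size_t i = 0; i < t->count; i++)
        if (addr >= t->nodes[i].start && addr <= t->nodes[i].last)
            return &t->nodes[i];
    return NULL;
}

/*
 * Volunteer [start, last]. Returns 0 if freshly inserted, 1 if the same
 * range was already volunteered (the existing cptr is returned), and -1
 * on a partial overlap or when the table is full.
 */
static int volunteer(struct resource_tree *t, uint64_t start, uint64_t last,
                     cptr_t fresh, cptr_t *out)
{
    for (size_t i = 0; i < t->count; i++) {
        const struct resource_node *n = &t->nodes[i];
        if (n->start == start && n->last == last) {
            *out = n->cptr;            /* duplicate: reuse the capability */
            return 1;
        }
        if (start <= n->last && n->start <= last)
            return -1;                 /* overlaps a different range */
    }
    if (t->count == MAX_NODES)
        return -1;
    t->nodes[t->count++] = (struct resource_node){ start, last, fresh };
    *out = fresh;
    return 0;
}
```

A caller that volunteers the same page twice gets the original cptr back the second time, which is the duplicate-insert case the pmfs string-sharing example ran into.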
    • Charlie Jacobsen's avatar
      host-resource-trees: Add/remove resource nodes from trees in kliblcd. · d39579d0
      Charlie Jacobsen authored
      Motivation: An LCD needs to keep track of address -> cptr
      correspondences. The resource trees fulfill that role. Each
      kLCD has two resource trees: one for physical memory (RAM, dev mem,
      etc.) and one for vmalloc memory (non-contiguous physical
      memory that is contiguous in the virtual address space).
      To mirror isolated code, when the kLCD maps/unmaps a memory
      object in host physical, we update its resource trees. (Of course,
      we don't bother / can't modify the host's physical mappings, so
      this is all that happens.) It gives kliblcd a chance to update
      the trees.
    • Charlie Jacobsen's avatar
      libcap-integration: Resource trees partially integrated into kliblcd. · be7b3caa
      Charlie Jacobsen authored
      Enter/exit code sets up/tears down the thread's tree.
      Fixed a few spotted bugs in allocation and tree code.
    • Charlie Jacobsen's avatar
      generalized-allocator: Clean up interface and documentation. · 8ee3bfac
      Charlie Jacobsen authored
      Few more details to sort out, bugs caught after thinking through
    • Charlie Jacobsen's avatar
      generalized-allocator: Add free page code. · 63293b17
      Charlie Jacobsen authored
      Borrows from Linux's buddy allocator code in mm/page_alloc.c,
      in __free_one_page.
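For readers unfamiliar with buddy coalescing on free, here is a toy sketch of the idea. It is not the commit's code: the `toy_*` names, the 16-page region, and the per-index `free_order` table are assumptions chosen to keep the example small. The buddy of a block is found by flipping the bit for the block's order, which is the usual buddy-allocator trick.

```c
#include <stdint.h>

#define TOY_NPAGES    16   /* toy region size in pages */
#define TOY_MAX_ORDER 4    /* largest block is 2^(TOY_MAX_ORDER - 1) pages */

/* free_order[i] is the order of a free block starting at page i, or -1. */
struct toy_allocator {
    int8_t free_order[TOY_NPAGES];
};

static void toy_init(struct toy_allocator *a)
{
    for (unsigned i = 0; i < TOY_NPAGES; i++)
        a->free_order[i] = -1;
}

/* The buddy of a block differs from it only in bit `order` of its index. */
static uint64_t buddy_index(uint64_t idx, unsigned order)
{
    return idx ^ (1ULL << order);
}

/* Free a block and merge with free buddies; returns the final block order. */
static unsigned toy_free(struct toy_allocator *a, uint64_t idx, unsigned order)
{
    while (order < TOY_MAX_ORDER - 1) {
        uint64_t buddy = buddy_index(idx, order);
        if (buddy >= TOY_NPAGES || a->free_order[buddy] != (int8_t)order)
            break;                     /* buddy is busy or a different size */
        a->free_order[buddy] = -1;     /* take the buddy off the free table */
        idx &= ~(1ULL << order);       /* merged block starts at the lower index */
        order++;
    }
    a->free_order[idx] = (int8_t)order;
    return order;
}
```

Freeing page 0 and then page 1 at order 0 leaves a single free order-1 block at index 0; freeing the order-1 block at index 2 afterwards merges again into an order-2 block.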
    • Charlie Jacobsen's avatar
      generalized-allocator: Add allocator init. · 473d6bab
      Charlie Jacobsen authored
      Most of the initialization is ready. I just need to implement
      some of the core routines for alloc/free that initialization
      depends on.
    • Charlie Jacobsen's avatar
      generalized-allocator: Add code for metadata allocation. · b24caca1
      Charlie Jacobsen authored
      I'm no longer providing the option of embedding resource nodes in
      the metadata. The caller can just reserve a static array of the
      appropriate size (shouldn't be too big in common cases). Makes
      the metadata size calculation simpler. And the caller will most likely
      need to do tuning regardless.
    • Charlie Jacobsen's avatar
      generalized-allocator: Add page allocator metadata size calculation. · 058edf87
      Charlie Jacobsen authored
      This is needed so that the page allocator user knows how much memory
      is needed to set it up. It's also needed for the metadata embedding
    • Charlie Jacobsen's avatar
      generalized-allocator: Add resource tree implementation. · 6c0c9fc0
      Charlie Jacobsen authored
      This is a thin wrapper around Linux's interval tree.
    • Charlie Jacobsen's avatar
      generalized-allocator: Sketch out data structures and interfaces. · 4d114c0e
      Charlie Jacobsen authored
      I'm introducing two new data structures: a page allocator and
      a resource tree.
      page allocator motivation: We need some kind of data structure for
      tracking regions of the guest physical address space. For example,
      you may want to dedicate the portion of physical address space from
      (1 << 20) -- (16 << 20) (from the first megabyte to the sixteenth
      megabyte) for ioremap's. Someone will say: give me 16 pages of
      uncacheable addresses I can use to ioremap this device memory; the
      page allocator will find 16 free pages of guest physical address.
      resource tree motivation/what it is: This data structure is an
      interval tree used to resolve a guest physical address to the
      cptr/capability for the memory object that is mapped at that
      address. For example, you may need the cptr for the page that
      backs a certain guest physical address so that you can share or
      free the page (you need the cptr for the page because the microkernel
      interface only uses cptr's).
      page allocator planned implementation:
      I plan to adapt the buddy allocator algorithm from Linux. After
      reviewing the code, I found the algorithm to be simple enough that this
      is realistic. In addition, the allocator will provide a means for
      doing "microkernel page allocs" in a more coarse-grained fashion. For
      example, the page allocator will call out to the microkernel to get
      chunks of 1 MB machine pages, and then allocate from that at page
      (4 KB) granularity. This means fewer VM exits. (Right now, every page
      alloc/free results in a VM exit; the page allocator calls out to the
      microkernel for every page alloc/free; it doesn't try to do
      coarse-grained alloc/frees and then track those bigger chunks.)
      I also plan to allow the page allocator to "embed" its metadata in the
      address space that it is managing, to cover some heap bootstrap issues.
      (This embedding won't work for some use cases, like tracking uncacheable
      memory - we wouldn't want to embed the RAM that contains the page
      allocator metadata inside the address space region and make it
      Finally, I plan to allow the page allocator to use varying granularity
      for "microkernel allocs" (if applicable) and allocs for higher levels
      (e.g., page allocator allocates 4 MB chunks from the microkernel, but allows
      higher level code inside an LCD to alloc at 4 KB granularity).
      The page allocator data structure (there can be multiple instances)
      will be used exclusively inside an LCD for guest physical address
      space management.
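The effect of coarse-grained microkernel allocs can be sketched with a chunk cache. This is a simplification under stated assumptions: it hands out pages with a bump pointer rather than a buddy allocator, it never frees, and `fake_microkernel_alloc_chunk` is a stub that only counts how often the allocator would have left the LCD. All names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE        4096ULL
#define CHUNK_SIZE       (1ULL << 20)              /* 1 MB per microkernel call */
#define PAGES_PER_CHUNK  (CHUNK_SIZE / PAGE_SIZE)  /* 256 pages */

struct chunk_cache {
    uint64_t next_page;        /* next free page inside the current chunk */
    uint64_t pages_left;       /* pages remaining in the current chunk */
    uint64_t next_chunk_base;  /* where the stub places the next chunk */
    unsigned long exits;       /* how many times we called the "microkernel" */
};

/* Stub for the microkernel call; each call stands for one VM exit. */
static uint64_t fake_microkernel_alloc_chunk(struct chunk_cache *c)
{
    uint64_t base = c->next_chunk_base;
    c->next_chunk_base += CHUNK_SIZE;
    c->exits++;
    return base;
}

/* Allocate one 4 KB page, refilling from the microkernel only when empty. */
static uint64_t cache_alloc_page(struct chunk_cache *c)
{
    if (c->pages_left == 0) {
        c->next_page = fake_microkernel_alloc_chunk(c);
        c->pages_left = PAGES_PER_CHUNK;
    }
    uint64_t addr = c->next_page;
    c->next_page += PAGE_SIZE;
    c->pages_left--;
    return addr;
}
```

Starting from a zeroed `struct chunk_cache`, 1000 page allocations trigger 4 chunk requests instead of 1000 per-page calls.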
      resource tree planned implementation:
      I plan to re-use the interval tree data structure in Linux. Google
      developed a nice API. (It replaced the priority tree that was once
      used in the vma code.)
      The resource tree will be used in isolated and non-isolated environments
      (physical address -> cptr translation is needed in both).
      Some alternatives/discussion:
      I could use an easier bitmap first-fit algorithm for page allocation,
      but this is slow (this is what we use now). I wondered if the majority
      of page allocs will be on the control path, and whether we may be able to
      tolerate this inefficiency (and all data path operations will involve just
      ring buffer operations on shared memory that is set up beforehand). But
      I suspect this won't be the case. There could be some slab allocations that
      happen on the data path for internal data; if the slab needs to shrink
      or grow, this may trigger a call into the page allocator, which could
      be slow (if it triggered a VM exit, a call on the data path could be
      bloated to 2000 cycles). Maybe this is not true and my concerns are
      It may also seem silly to have multiple page allocator instances inside
      an LCD; why not just one allocator that manages the entire address
      space? First, some of the dynamic regions are independent of each other:
      The heap region and the ioremap region are for different purposes; having
      a single allocator for both regions might be complex and error prone.
      Second, you wouldn't want one allocator to track the entire
      address space since it's huge (the amount of allocator metadata could
      be enormous, depending on the design). My intent is to abstract over
      common needs from all regions (tracking free guest physical address space)
      and provide some hooks for specific cases.
      An alternative to the resource tree is to just use a giant array of
      cptr's, one slot for each page (this is what we do now for the heap
      inside an LCD). You would then translate a physical
      address to an offset into the array to get the cptr for the resource
      that contains that physical address (e.g. a page). There are a couple
      issues with this: First, the array could be huge (for a region of
      16 GBs at 4 KB granularity, the array would be 32 MBs). Second, even
      if this is tolerable inside an LCD, the non-isolated code needs the
      same functionality (address -> cptr translation), and setting up a
      giant array for the entire host physical address space is obviously
      dumb (and would need to be done *per thread* since each thread uses
      its own cspace). A tree handles the sparsity a lot better.
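The 32 MB figure can be checked with a short calculation. The sketch assumes each cptr slot is 8 bytes, which the message does not state but which matches the number quoted; the function name is hypothetical.

```c
#include <stdint.h>

/* Size in bytes of a flat cptr array with one slot per page of a region. */
static uint64_t cptr_array_bytes(uint64_t region_bytes, uint64_t page_bytes,
                                 uint64_t cptr_bytes)
{
    return (region_bytes / page_bytes) * cptr_bytes;
}
```

For a 16 GB region at 4 KB granularity there are 4M slots, so `cptr_array_bytes(16ULL << 30, 4096, 8)` comes to 32 MB, paid up front regardless of how sparsely the region is used.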
      Finally, it is worth considering how KVM handles backing guest physical
      memory with real host/machine pages. I believe they use some
      sophisticated demand paging triggered by EPT faults, and all of this
      is hooked into the host's page cache. This seems too scary and
      complex for our humble microkernel (that we want to keep simple).
      I hope you enjoyed this giant commit message.
    • Charlie Jacobsen's avatar
      libcap-integration: Re-factor basic lcd create code in kliblcd. · 0ca33457
      Charlie Jacobsen authored
      Creating empty LCDs, configuring them, running them.
    • Charlie Jacobsen's avatar
      libcap-integration: Re-factor kliblcd page alloc and mapping. · fc7f2e63
      Charlie Jacobsen authored
      Mostly complete except for the bits that need the rb tree I'm
      planning to put in place for translating physical addresses
      to cptr's.
      This may seem like silly refactoring, but it's cleaning up
      and unifying a bunch of crap (including the more recent
      feature for passing strings back and forth).
    • Charlie Jacobsen's avatar
      libcap-integration: Finish liblcd v2 headers for now. · 007e3f0c
      Charlie Jacobsen authored
        -- cap.h: delete, revoke; you may wonder: why do we need this
                  if we have libcap? Answer: an LCD needs to have a way
                  to modify *its own* cspace, rather than cspaces it
        -- console.h: lcd_printk and friends, moved into new file with
                      few changes
        -- enter_exit.h: lcd_enter, exit, etc., moved into new file with
                         few changes
    • Charlie Jacobsen's avatar
      libcap-integration: Add some headers for liblcd v2. · 7dadddf3
      Charlie Jacobsen authored
      I wanted to do this first before re-factoring kliblcd, so I know
      what I need to do.
      This is a step toward unifying the old isolated and non-isolated
      interfaces. The semantics of each function will be a bit different
      depending on the execution context.
        -- address_spaces.h: from old types.h, with few changes
        -- boot_info.h: bootstrap page data; from old types.h; small
                        changes to struct
        -- create.h: LCD and kLCD creation; from old kliblcd.h; doc cleaned
                     up and interface
        -- mem.h: unified memory interface; coalesces functions from old
                  liblcd.h and kliblcd.h
        -- sync_ipc.h: unifies ipc and utcb headers
        -- syscall.h: same as before
      Removes old capability and data store crap.
      Also, fixes small bug for edge case in cap types.