1. 26 Oct, 2016 40 commits
    • Charlie Jacobsen's avatar
      lcd-create-v2: Update address space layout for v2. · 39dfe81c
      Charlie Jacobsen authored
      The boot physical address space fits in the first 512 GBs.
      The boot virtual address space maps this to the high 512 GBs. In
      this way, the kernel module code will be loaded at the address it
      was linked for.
      
      Address spaces are designed for 1 GB huge guest virtual pages. The
      hope is that this address space will be big enough for a lot of use
      cases, so the code inside the LCD won't ever need to touch the guest
      virtual address space (just physical).
      39dfe81c
    • Anton Burtsev's avatar
      Changed regs to be a data structure vs an array · d87f9b5e
      Anton Burtsev authored
      Not tested, can't test because of the PAT problems
      d87f9b5e
    • Charles Jacobsen's avatar
      Fix PAT initialization, shrink ioremap region for now. · 2aedcbde
      Charles Jacobsen authored
      PAT appears to work now with write-back memory. Added an
      arch check to ensure PAT is valid before VM entry.
      
      The ioremap region was too big (the amount of metadata, e.g.
      bitmaps, was too much). I shrank it to 16 MBs for now. Later, we
      can make it slightly larger.
      2aedcbde
    • Charlie Jacobsen's avatar
      1e5c02ad
    • Charlie Jacobsen's avatar
      lcd-create-v2: Add low- and high-level vmalloc to kliblcd. · cf3f9e4c
      Charlie Jacobsen authored
      Similar to alloc pages. Call into the microkernel to vmalloc
      memory. Then, add it to the vmalloc resource tree so we can
      do address -> cptr translation.
      cf3f9e4c
    • Charlie Jacobsen's avatar
      b49acd30
    • Charlie Jacobsen's avatar
      host-resource-trees: Volunteer host memory. · ac556704
      Charlie Jacobsen authored
      When non-isolated code wants to "volunteer" host memory to the
      microkernel's capability system, it invokes lcd_volunteer_pages,
      lcd_volunteer_dev_mem, or lcd_volunteer_vmalloc_mem, depending on
      the type of memory.
      
      Internally, these check to see if the memory has already been
      volunteered (so we don't get duplicates -- this is checked via the
      global memory interval tree). If not, it inserts the memory into
      the caller's cspace. The caller can subsequently share the memory
      with e.g. an isolated LCD via the capability mediated interfaces.
      
      I had support for this before, but there was no checking for
      duplicate inserts (and this is a real problem with the pmfs example
      for string sharing): Non-isolated code has no way of knowing
      (without implementing data structures on its own) whether it inserted
      host memory already or not, or whether some other non-isolated
      code has.
      
      Furthermore, now we have full support for address -> cptr translation
      in the non-isolated side. This is also needed for the pmfs example
      with string sharing: before, the non-isolated code just always
      inserted memory every time to share strings, even if this led
      to duplicate inserts.
      
      I think this is one of the "friction points" of embedding a
      capability system inside a kernel: translation from host objects
      to capabilities and back. For some objects, you can just embed
      the cptr in the object itself (our "container structs"). But for
      some things -- like memory -- it's not so easy. (For device memory,
      the host kernel doesn't use a struct page to represent it. So we're
      faced with creating our own giant array of data structures to
      represent each page of device memory, and embedding the cptr in
      that. Or instead -- as I have done -- use a data structure like
      a tree to do a reverse lookup.)
      ac556704
    • Charlie Jacobsen's avatar
      host-resource-trees: Add/remove resource nodes from trees in kliblcd. · d39579d0
      Charlie Jacobsen authored
      Motivation: An LCD needs to keep track of address -> cptr
      correspondences. The resource trees fulfill that role. Each
      kLCD has two resource trees: one for physical memory (RAM, dev mem,
      etc.) and one for vmalloc memory (non-contiguous physical
      memory that is contiguous in the virtual address space).
      
      To mirror isolated code, when the kLCD maps/unmaps a memory
      object in host physical, we update its resource trees. (Of course,
      we don't bother / can't modify the host's physical mappings, so
      this is all that happens.) It gives kliblcd a chance to update
      the trees.
      d39579d0
    • Abhiram Balasubramanian's avatar
      Fix compilation error · 4acf9cbb
      Abhiram Balasubramanian authored
      
      
      - moved dependencies accordingly
      Signed-off-by: Abhiram Balasubramanian <abhiram@cs.utah.edu>
      4acf9cbb
    • Anton Burtsev's avatar
      Moved LCD_TEST_MODS_PATH to the Makefile · 4e643c28
      Anton Burtsev authored
      This way we don't have to change include/lcd-domains/types.h
      4e643c28
    • Abhiram Balasubramanian's avatar
      Add ioremap support for lcds · be3834c3
      Abhiram Balasubramanian authored
      
      
      - introduce a new memory space as a part of GPA and GVA
      - set PAT memory type to UC so that effective memory type becomes UC
      
      NOTE - implementation needs to be tested with Charlie's revamped code
      Signed-off-by: Abhiram Balasubramanian <abhiram@cs.utah.edu>
      be3834c3
    • Charlie Jacobsen's avatar
      libcap-integration: Resource trees partially integrated into kliblcd. · be7b3caa
      Charlie Jacobsen authored
      Enter/exit code sets up/tears down the thread's tree.
      
      Fixed a few spotted bugs in allocation and tree code.
      be7b3caa
    • Charlie Jacobsen's avatar
      generalized-allocator: Clean up interface and documentation. · 8ee3bfac
      Charlie Jacobsen authored
      A few more details to sort out; some bugs were caught while
      thinking through the documentation.
      8ee3bfac
    • Charlie Jacobsen's avatar
      generalized-allocator: Add free page code. · 63293b17
      Charlie Jacobsen authored
      Borrows from Linux's buddy allocator code in mm/page_alloc.c,
      in __free_one_page.
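
      A minimal sketch of the buddy merge step that this kind of free
      path is built around, under stated assumptions: the free_order
      array, the MAX_ORDER value, and the function names are hypothetical
      and far simpler than the kernel's free lists. Only the buddy index
      computation (idx XOR (1 << order)) and the "merged block starts at
      the lower index" rule are taken from the buddy algorithm itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 4                         /* hypothetical: orders 0..3 */
#define NPAGES    (1u << (MAX_ORDER - 1))   /* 8 pages in this toy region */

/* free_order[i] is the order of the free block starting at page i, or -1. */
static int free_order[NPAGES];

void buddy_init(void)
{
	for (size_t i = 0; i < NPAGES; i++)
		free_order[i] = -1;
}

/* Index of the buddy of the block that starts at idx and has this order. */
size_t buddy_of(size_t idx, unsigned int order)
{
	return idx ^ ((size_t)1 << order);
}

/* Free a block, merging with free buddies; returns the final block order. */
unsigned int buddy_free(size_t idx, unsigned int order)
{
	assert(order < MAX_ORDER && idx < NPAGES);
	assert((idx & (((size_t)1 << order) - 1)) == 0);   /* aligned to order */

	while (order < MAX_ORDER - 1) {
		size_t buddy = buddy_of(idx, order);

		if (free_order[buddy] != (int)order)
			break;                  /* buddy in use or split: stop */
		free_order[buddy] = -1;         /* remove buddy from the free set */
		idx &= buddy;                   /* merged block starts at lower index */
		order++;
	}
	free_order[idx] = (int)order;
	return order;
}
```

      Freeing pages 0 and 1 separately leaves one free order-1 block at
      page 0; freeing the neighbors after that keeps merging until the
      whole toy region is a single block.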
      63293b17
    • Charlie Jacobsen's avatar
      generalized-allocator: Add allocator init. · 473d6bab
      Charlie Jacobsen authored
      Most of the initialization is ready. I just need to implement
      some of the core routines for alloc/free that initialization
      depends on.
      473d6bab
    • Charlie Jacobsen's avatar
      generalized-allocator: Add code for metadata allocation. · b24caca1
      Charlie Jacobsen authored
      I'm no longer providing the option of embedding resource nodes in
      the metadata. The caller can just reserve a static array of the
      appropriate size (shouldn't be too big in common cases). Makes
      the metadata size calculation simpler. And the caller will most likely
      need to do tuning regardless.
      b24caca1
    • Charlie Jacobsen's avatar
      generalized-allocator: Add page allocator metadata size calculation. · 058edf87
      Charlie Jacobsen authored
      This is needed so that the page allocator user knows how much memory
      is needed to set it up. It's also needed for the metadata embedding
      trick.
      058edf87
    • Charlie Jacobsen's avatar
      generalized-allocator: Add resource tree implementation. · 6c0c9fc0
      Charlie Jacobsen authored
      This is a thin wrapper around Linux's interval tree.
      6c0c9fc0
    • Charlie Jacobsen's avatar
      generalized-allocator: Sketch out data structures and interfaces. · 4d114c0e
      Charlie Jacobsen authored
      I'm introducing two new data structures: a page allocator and
      a resource tree.
      
      page allocator motivation: We need some kind of data structure for
      tracking regions of the guest physical address space. For example,
      you may want to dedicate the portion of physical address space from
      (1 << 20) -- (16 << 20) (from the first megabyte to the sixteenth
      megabyte) for ioremap's. Someone will say: give me 16 pages of
      uncacheable addresses I can use to ioremap this device memory; the
      page allocator will find 16 free pages of guest physical address.
      
      resource tree motivation/what it is: This data structure is an
      interval tree used to resolve a guest physical address to the
      cptr/capability for the memory object that is mapped at that
      address. For example, you may need the cptr for the page that
      backs a certain guest physical address so that you can share or
      free the page (you need the cptr for the page because the microkernel
      interface only uses cptr's).
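
      To make the address -> cptr lookup concrete, here is a rough
      sketch. It is not the Linux interval tree API: it swaps in a sorted
      array of non-overlapping intervals with binary search, and the
      names cptr_t, struct resource_node, and resource_lookup are
      hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cptr_t;        /* hypothetical stand-in for the real cptr type */

struct resource_node {
	uint64_t start;         /* first guest physical address covered */
	uint64_t last;          /* last address covered (inclusive) */
	cptr_t   cptr;          /* capability for the memory object mapped there */
};

/*
 * Find the node whose interval contains addr. The array must be sorted by
 * start and the intervals must not overlap. Returns NULL if nothing is
 * mapped at addr.
 */
const struct resource_node *resource_lookup(const struct resource_node *nodes,
					     size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	assert(nodes != NULL || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < nodes[mid].start)
			hi = mid;               /* addr lies before this interval */
		else if (addr > nodes[mid].last)
			lo = mid + 1;           /* addr lies after this interval */
		else
			return &nodes[mid];     /* addr is inside this interval */
	}
	return NULL;
}
```

      A caller that wants to share or free the page backing an address
      would look up the node and hand its cptr to the microkernel
      interface.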
      
      page allocator planned implementation:
      
      I plan to adapt the buddy allocator algorithm from Linux. After
      reviewing the code, I found the algorithm to be simple enough that this
      is realistic. In addition, the allocator will provide a means for
      doing "microkernel page allocs" in a more coarse-grained fashion. For
      example, the page allocator will call out to the microkernel to get
      chunks of 1 MB machine pages, and then allocate from that at page
      (4 KB) granularity. This means fewer VM exits. (Right now, every page
      alloc/free results in a VM exit; the page allocator calls out to the
      microkernel for every page alloc/free; it doesn't try to do
      coarse-grained alloc/frees and then track those bigger chunks.)
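
      The coarse-grained idea can be sketched as follows, assuming the
      1 MB chunk and 4 KB page sizes from the example above. Everything
      else is hypothetical: struct chunk_cache, chunk_alloc_page, and
      fake_microkernel_alloc_chunk (which only counts calls and stands in
      for a real microkernel call that would cost a VM exit). Freeing and
      the buddy bookkeeping are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES      (4u << 10)      /* 4 KB page */
#define CHUNK_BYTES     (1u << 20)      /* 1 MB chunk from the microkernel */
#define PAGES_PER_CHUNK (CHUNK_BYTES / PAGE_BYTES)

struct chunk_cache {
	uint64_t next_gpa;              /* next unused page in the current chunk */
	unsigned int pages_left;        /* pages still available in the chunk */
	unsigned int vm_exits;          /* how many times we called out */
	uint64_t next_chunk_gpa;        /* where the fake microkernel puts chunks */
};

/* Hypothetical stand-in for a microkernel call; each call is one VM exit. */
uint64_t fake_microkernel_alloc_chunk(struct chunk_cache *c)
{
	uint64_t gpa = c->next_chunk_gpa;

	c->next_chunk_gpa += CHUNK_BYTES;
	c->vm_exits++;
	return gpa;
}

/* Hand out one 4 KB page; call out only when the current chunk is used up. */
uint64_t chunk_alloc_page(struct chunk_cache *c)
{
	uint64_t gpa;

	assert(c != NULL);

	if (c->pages_left == 0) {
		c->next_gpa = fake_microkernel_alloc_chunk(c);
		c->pages_left = PAGES_PER_CHUNK;
	}
	gpa = c->next_gpa;
	c->next_gpa += PAGE_BYTES;
	c->pages_left--;
	return gpa;
}
```

      With these sizes, 256 page allocations are served from a single
      call out, instead of one call out per page.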
      
      I also plan to allow the page allocator to "embed" its metadata in the
      address space that it is managing, to cover some heap bootstrap issues.
      (This embedding won't work for some use cases, like tracking uncacheable
      memory - we wouldn't want to embed the RAM that contains the page
      allocator metadata inside the address space region and make it
      uncacheable.)
      
      Finally, I plan to allow the page allocator to use varying granularity
      for "microkernel allocs" (if applicable) and allocs for higher levels
      (e.g., page allocator allocates 4 MB chunks from the microkernel, but allows
      higher level code inside an LCD to alloc at 4 KB granularity).
      
      The page allocator data structure (there can be multiple instances)
      will be used exclusively inside an LCD for guest physical address
      space management.
      
      resource tree planned implementation:
      
      I plan to re-use the interval tree data structure in Linux. Google
      developed a nice API. (It replaced the priority tree that was once
      used in the vma code.)
      
      The resource tree will be used in isolated and non-isolated environments
      (physical address -> cptr translation is needed in both).
      
      Some alternatives/discussion:
      
      I could use an easier bitmap first-fit algorithm for page allocation,
      but this is slow (this is what we use now). I wondered if the majority
      of page allocs will be on the control path, and that we may be able to
      tolerate this inefficiency (and all data path operations will involve just
      ring buffer operations on shared memory that is set up beforehand). But
      I suspect this won't be the case. There could be some slab allocations that
      happen on the data path for internal data; if the slab needs to shrink
      or grow, this may trigger a call into the page allocator, which could
      be slow (if it triggered a VM exit, a call on the data path could be
      bloated to 2000 cycles). Maybe this is not true and my concerns are
      unfounded.
      
      It may also seem silly to have multiple page allocator instances inside
      an LCD; why not just one allocator that manages the entire address
      space? First, some of the dynamic regions are independent of each other:
      The heap region and the ioremap region are for different purposes; having
      a single allocator for both regions might be complex and error prone.
      Second, you wouldn't want one allocator to track the entire
      address space since it's huge (the amount of allocator metadata could
      be enormous, depending on the design). My intent is to abstract over
      common needs from all regions (tracking free guest physical address space)
      and provide some hooks for specific cases.
      
      An alternative to the resource tree is to just use a giant array of
      cptr's, one slot for each page (this is what we do now for the heap
      inside an LCD). You would then translate a physical
      address to an offset into the array to get the cptr for the resource
      that contains that physical address (e.g. a page). There are a couple
      issues with this: First, the array could be huge (for a region of
      16 GBs at 4 KB granularity, the array would be 32 MBs). Second, even
      if this is tolerable inside an LCD, the non-isolated code needs the
      same functionality (address -> cptr translation), and setting up a
      giant array for the entire host physical address space is obviously
      dumb (and would need to be done *per thread* since each thread uses
      its own cspace). A tree handles the sparsity a lot better.
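
      The 32 MB figure above can be checked with a little arithmetic. The
      helper below is a hypothetical illustration, and the 8-byte cptr
      size is an assumption inferred from that figure rather than
      something stated in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a flat cptr array covering a region, one slot per page. */
uint64_t cptr_array_bytes(uint64_t region_bytes, uint64_t page_bytes,
			  uint64_t cptr_bytes)
{
	assert(page_bytes != 0);
	return (region_bytes / page_bytes) * cptr_bytes;
}
```

      For 16 GBs at 4 KB granularity this is 4 Mi slots, and at 8 bytes
      per slot that comes to 32 MBs.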
      
      Finally, it is worth considering how KVM handles backing guest physical
      memory with real host/machine pages. I believe they use some
      sophisticated demand paging triggered by EPT faults, and all of this
      is hooked into the host's page cache. This seems too scary and
      complex for our humble microkernel (that we want to keep simple).
      
      I hope you enjoyed this giant commit message.
      4d114c0e
    • Charlie Jacobsen's avatar
      libcap-integration: Re-factor basic lcd create code in kliblcd. · 0ca33457
      Charlie Jacobsen authored
      Creating empty LCDs, configuring them, running them.
      0ca33457
    • Charlie Jacobsen's avatar
      libcap-integration: Re-factor kliblcd page alloc and mapping. · fc7f2e63
      Charlie Jacobsen authored
      Mostly complete except for the bits that need the rb tree I'm
      planning to put in place for translating physical addresses
      to cptr's.
      
      This may seem like silly refactoring, but it's cleaning up
      and unifying a bunch of crap (including the more recent
      feature for passing strings back and forth).
      fc7f2e63
    • Charlie Jacobsen's avatar
      libcap-integration: Finish liblcd v2 headers for now. · 007e3f0c
      Charlie Jacobsen authored
        -- cap.h: delete, revoke; you may wonder: why do we need this
                  if we have libcap? Answer: an LCD needs to have a way
                  to modify *its own* cspace, rather than cspaces it
                  manages
        -- console.h: lcd_printk and friends, moved into new file with
                      few changes
        -- enter_exit.h: lcd_enter, exit, etc., moved into new file with
                         few changes
      007e3f0c
    • Charlie Jacobsen's avatar
      libcap-integration: Add some headers for liblcd v2. · 7dadddf3
      Charlie Jacobsen authored
      I wanted to do this first before re-factoring kliblcd, so I know
      what I need to do.
      
      This is a step toward unifying the old isolated and non-isolated
      interfaces. The semantics of each function will be a bit different
      depending on the execution context.
      
        -- address_spaces.h: from old types.h, with few changes
        -- boot_info.h: bootstrap page data; from old types.h; small
                        changes to struct
        -- create.h: LCD and kLCD creation; from old kliblcd.h; doc and
                     interface cleaned up
        -- mem.h: unified memory interface; coalesces functions from old
                  liblcd.h and kliblcd.h
        -- sync_ipc.h: unifies ipc and utcb headers
        -- syscall.h: same as before
      
      Removes old capability and data store crap.
      
      Also, fixes small bug for edge case in cap types.
      7dadddf3
    • Charlie Jacobsen's avatar
      libcap-integration: Simplify types.h header. · d125987a
      Charlie Jacobsen authored
      Removes cptr and capability crap. Boot info and LCD address
      spaces will be moved to separate header(s).
      d125987a
    • Charlie Jacobsen's avatar
      libcap-integration: Add revoke syscall, update syscall ids. · 31d6991f
      Charlie Jacobsen authored
      Revoke wasn't available before. Put it in there for completeness.
      31d6991f
    • Charlie Jacobsen's avatar
      libcap-integration: Add syscall arg accessors. · 4fbcfebd
      Charlie Jacobsen authored
      I just use %r8, %r9, etc. for syscall args.
      4fbcfebd
    • Charlie Jacobsen's avatar
      Adds support for passing short strings back and forth. · 92e45ef3
      Charlie Jacobsen authored
      Simple illustrative example in test-mods/string_example. Ping-pongs
      a string 'hello' back and forth a few times.
      
      The string/data needs to be contained inside a single page
      for now. This should hopefully handle most of our immediate
      use cases.
      
      Idea: The sender grants the receiver access to the page that
      contains the data. The receiver maps it in their address space. This
      is clearly too slow for the main data path, and it's a bit hacky.
      But it works for strings that are passed in control plane interactions
      (rather than trying to cram the darn thing inside the message buffer).
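
      A sender could check the single-page requirement with something
      like the sketch below. The function name fits_in_one_page and its
      page_size parameter are hypothetical; nothing here is taken from
      the actual liblcd interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if the len bytes starting at addr all lie within one page.
 * page_size must be a power of two (e.g. 4096).
 */
bool fits_in_one_page(uintptr_t addr, size_t len, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	if (len == 0 || len > page_size)
		return false;
	/* Offset within the page plus length must not run past the page end. */
	return (addr & (page_size - 1)) + len <= page_size;
}
```

      For a C string, len should include the terminating NUL so that the
      receiver can read the whole string from the mapped page.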
      
      Wiki to be updated soon with more details.
      92e45ef3
    • Charles Jacobsen's avatar
      Got async working! · 96995fae
      Charles Jacobsen authored
      96995fae
    • Charlie Jacobsen's avatar
      Updated address space doc. · f37f2c0c
      Charlie Jacobsen authored
      f37f2c0c
    • Charlie Jacobsen's avatar
      Adds example to test async runtime. · 6b12bac4
      Charlie Jacobsen authored
      The example fails right now. Michael and I will debug.
      
      Async (THC) runtime init/exit integrated into
      LCD boot/shutdown. This is working properly.
      
      This commit also fixes a serious bug in the page allocator,
      and fixes some kmalloc configuration.
      
      As I will note on the wiki, the maximum allowed alloc size
      for kmalloc is 32 pages (128 KBs). This was chosen so that
      the async runtime will work (lazy stacks are about 17 pages).
      There is a limit on the max size because of how I allocate
      groups of pages (I use an array allocated on the stack inside
      the page allocator at some point, and this array can't be
      too big). In the future, we may improve on this.
      6b12bac4
    • Charlie Jacobsen's avatar
      Separates utcb (message buffer) and the stack. · 742eca0d
      Charlie Jacobsen authored
      The utcb is in a separate page of memory now. The stack can now
      be made bigger than a page if necessary. This isn't a final
      solution, but it will work for now while trying to get async
      working. (Obviously, calling send/recv async won't work right
      now since there is just one global message buffer.)
      742eca0d
    • Charlie Jacobsen's avatar
      c74b3315
    • Charlie Jacobsen's avatar
      Moves async/thc into liblcd. Removes old files. · 34760078
      Charlie Jacobsen authored
      
      
      Not integrated into liblcd yet.
      
      Conflicts:
      	drivers/Kconfig
      	drivers/Makefile
      	include/linux/sched.h
      	virt/lcd-domains/thcsync.c
      Resolved-by: Vikram Narayanan <vikram186@gmail.com>
      34760078
    • Charlie Jacobsen's avatar
      Major overhaul of build process. · 8198c2fb
      Charlie Jacobsen authored
      Full kernel build no longer required. Yay! This should
      cut down on dev time a lot.
      
      I moved all of the LCD source into $(kernel-src)/lcd-domains/,
      so it's all in one spot. There is now a top-level makefile in
      there that triggers building liblcd, the microkernel, and the
      examples. This is built as an *external* build now, even
      though the directory is in the kernel source. The build now takes
      under a minute to do everything LCD related.
      
      This should also make verification easier in the future (e.g.
      building with clang) if we aren't ensnared in the kernel
      source.
      
      Of course, to use the microkernel and examples, you have to
      build the patched kernel and install it. But now when you
      make a few lines of changes in e.g. an example, you don't have
      to trigger a top-level kernel build to rebuild it. Running
      the full kernel build takes on average about 3 - 4 minutes
      (some files are generated every time, linking is done, and so
      on), and can take upwards of 30 minutes for a full build if you
      re-config'd.
      
      Which brings me to my other change: no more config for LCDs
      in menuconfig. If we create menu entries for every example
      and so on, we end up changing the config too often, and this
      triggers full kernel rebuilds == waste of time. We can use
      macros by setting them via compiler flags (e.g., -DSOME_FLAG).
      Furthermore, it wasn't making sense to me to do conditional
      compilation for LCD support (we always want to compile for that).
      Yes, changes aren't clearly delineated with macros, but you can
      see changes made by just doing 'git diff v3.10.14 some-file-or-dir'.
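
      As a small illustration of the compiler-flag approach, the sketch
      below reuses the placeholder name SOME_FLAG from above. The helper
      some_flag_enabled and the SOME_FLAG_ENABLED macro are hypothetical.
      In a kbuild makefile the flag would typically be added through
      ccflags-y (e.g. ccflags-y += -DSOME_FLAG); where exactly the LCD
      makefiles set such flags is not shown in this log.

```c
#include <stdbool.h>

/*
 * Build with -DSOME_FLAG to enable the optional path; leave the flag off
 * to get the default. No Kconfig entry is involved, so changing the flag
 * does not touch the kernel config.
 */
#ifdef SOME_FLAG
#define SOME_FLAG_ENABLED 1
#else
#define SOME_FLAG_ENABLED 0
#endif

/* Report whether this object file was compiled with -DSOME_FLAG. */
bool some_flag_enabled(void)
{
	return SOME_FLAG_ENABLED != 0;
}
```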
      
      The wiki has been fully updated with instructions for building,
      and other relevant parts (updated paths to files).
      
      I also took the opportunity to clean up some old stuff lying around
      that is dead (like lcdguest). I incorporated all of the documentation
      in Documentation/lcd-domains into the wiki so it's all in one
      spot now (including some helpful debug tips).
      8198c2fb