Commit 35ed57d7 authored by Charlie Jacobsen, committed by Vikram Narayanan

Linux slab allocator and page allocator inside LCDs.

Single threaded, no locks, no fancy NUMA/percpu.

Passing some simple examples. Added a memory management
example module, in test-mods/mm, that exercises a lot
of this new code.


I moved in and adapted our existing guest virtual
paging code from kliblcd.c. I'm using statically
allocated bitmaps and arrays for tracking allocations
in the guest virtual and physical address spaces.
Using identity mapping for ease. (I decided not to
use Linux's page allocator since it's too intertwined
with the boot process - percpu variables, freeing
init mem, boot allocator, all kinds of complexity ...)
It might not be too hard to reimplement the buddy
allocator algorithms, since I had to include a
statically allocated array of struct pages anyway.

I've set aside about 16 MB for dynamic page allocations,
but this can be changed using macros. Pages are allocated
in power-of-two counts (1, 2, 4, 8, and so on), which the
slab allocator requires.
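
A rough sketch of what a bitmap-tracked allocator handing out 1, 2, 4, 8, ...
pages at a time could look like. This is not the code in liblcd/lcd; every
sketch_* name and every size below is made up for illustration, and the
first-fit search over naturally aligned runs is just one simple way to do it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, not the macros used by liblcd. */
#define SKETCH_NR_PAGES  4096UL	/* 4096 pages * 4 KiB = 16 MiB */
#define SKETCH_MAX_ORDER 10	/* largest request: 2^10 pages */

/*
 * One bit per page; a set bit means "allocated". In hosted C this static
 * array starts out zeroed. Inside an LCD it would sit in BSS and would
 * have to be zeroed by hand before first use.
 */
static uint8_t sketch_bitmap[SKETCH_NR_PAGES / 8];

static int sketch_test_bit(size_t pfn)
{
	return (sketch_bitmap[pfn / 8] >> (pfn % 8)) & 1;
}

static void sketch_set_bit(size_t pfn)
{
	sketch_bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

static void sketch_clear_bit(size_t pfn)
{
	sketch_bitmap[pfn / 8] &= (uint8_t)~(1u << (pfn % 8));
}

/*
 * Find the first free, naturally aligned run of 2^order pages and mark it
 * allocated. Returns the index of the first page, or -1 on failure.
 */
static long sketch_alloc_pages(unsigned int order)
{
	size_t n, start, i;

	if (order > SKETCH_MAX_ORDER)
		return -1;
	n = (size_t)1 << order;
	for (start = 0; start + n <= SKETCH_NR_PAGES; start += n) {
		for (i = 0; i < n; i++)
			if (sketch_test_bit(start + i))
				break;
		if (i < n)
			continue;	/* some page in this run is taken */
		for (i = 0; i < n; i++)
			sketch_set_bit(start + i);
		return (long)start;
	}
	return -1;
}

/* Release a run previously returned by sketch_alloc_pages(order). */
static void sketch_free_pages(long pfn, unsigned int order)
{
	size_t n = (size_t)1 << order, i;

	assert(pfn >= 0 && ((size_t)pfn % n) == 0);
	for (i = 0; i < n; i++)
		sketch_clear_bit((size_t)pfn + i);
}
```

With identity mapping, the returned page index times the page size (plus the
base of the region) would serve as both the guest physical and guest virtual
address.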


I finally broke down and set up boot info pages - 4
boot pages right now, can be adjusted with a macro.
Whoever boots the lcd needs to pack in information about the
lcd's address space, initial cspace, and so on. 4 pages
is enough to pack in information for larger modules
like the mm example.


I moved liblcd to a separate directory, and hacked the
build system so that we can build liblcd as a static
library and link it with example modules.

liblcd/ contains lcd/, which has code for interacting
with the microkernel and my simple page allocator.

The Linux slab allocator is inside mm/, and some
needed dependencies are in lib/. I made very few
changes to the source code itself, but used some
preprocessor/compiler hacks to make everything work. See
Documentation/lcd-domains/liblcd.txt. I elided all of
the locking and made it single core, single NUMA node.
It's possible we'll see some bugs in the future, in code I haven't
exercised yet (will probably manifest themselves as
page faults).

Ideally, we should have a separate tree for liblcd
and building modules. That way we can avoid some of these
hacks (maybe not all).


Updated a lot of the documentation in
parent 2af41f80
......@@ -31,3 +31,8 @@ in the upstream kernel after we branched off version 3.10.14. You might
see - watchdog: timeout on eth0 (bnx2) etc. etc. And you may lose connectivity
and possibly a hang (if you're trying to access a file via nfs).
-- There may be bad interactions with KVM code if you load it. This might
be the source of the bad hang, but I'm not sure.
-- See also some of the tips in liblcd.txt: Notes & Suggestions when debugging
page faults, etc. inside an LCD.
......@@ -65,3 +65,29 @@ The following transitions are an error (return non-zero), and have no effect:
Some of these may be too restrictive, and could change in the future (e.g.,
allow re-config).
This code is the most complicated part of kliblcd.c. We package up all of the
context and data for setting up a module LCD inside struct lcd_info. This
contains lists of pages we've mapped in the LCD, the temporary cptr
cache we're using to set up the LCD's cspace, and so on. This is done so
that we can properly boot the LCD and tear everything down later.
There are two main parts: loading the module and setting up the VM.
Loading the module happens in:
Setting up the VM happens in:
lcd_create_module_lcd loads the module and sets up the LCD's address space.
The caller can then finish the boot process by populating the boot info
pages for the LCD, providing it with endpoints, and so on.
See the examples in test-mods/ for usage.
This is the minimal libkernel that should be linked with an LCD. Right now,
it contains code for ipc, kmalloc, and page alloc, but requires whoever
boots the LCD to do some proper boot setup. See the example in
The code is in liblcd/ and is built before we recursively descend into the
other directories. This is so we don't have 10 different recursive make's
trying to build the code and link it into lib.a (from the dependees
in the test-mods folders). I slightly tweaked the top-level Makefile to do
I also slightly tweaked module building to allow for linking libraries with
modules. Most of the time, the build and link just worked, but occasionally,
liblcd was listed first in the command to LD, and hence wasn't linked with the
rest of the objects. (Recall that if a library is listed at the beginning
of the list of files to link with LD, it won't get linked with any of the
files, since there are no outstanding dependencies that require it.)
See the examples in test-mods/. You should include the following three headers
in every source file for a module that will go inside an LCD:
... other headers ...
<lcd-domains/liblcd-hacks.h> /* recommended */
You should also list liblcd as a dependency in the Makefile. Again, see
the examples.
** You may need to do some extra work, see NOTES & SUGGESTIONS. **
IMPORTANT: Do not put liblcd-hacks.h inside header files. Here's a problematic
#include <lcd-domains/liblcd-config.h> /* GOOD */
#include <linux/mm.h>
#include <linux/types.h>
#include <lcd-domains/liblcd-hacks.h> /* BAD */
#define FOO(x) x
static inline int bar(int x) { return x; }
typedef unsigned long gfp_t; /* this will break in file.c */
#include <lcd-domains/liblcd-config.h> /* GOOD */
#include "file.h" /* <<< BAD */
#include <linux/gfp.h> /* you will get a redefine error */
The hacks header file can also undefine symbols that other host kernel
headers are expecting. So, to be safe, only put it in source code (.c) files
after *all* of the other headers.
I will explain by example.
Suppose you want to pull the function foo into liblcd, but you don't want
to reimplement it (imagine foo is something complicated like kmalloc).
First, you figure out that foo is declared in include/linux/foo.h and
defined in mm/foo.c. The first step is to make a copy of foo.c and put it
in liblcd/mm (I use the same directory structure as the kernel).
foo uses a lot of conditional compilation (using CONFIG_* macros), and you
want it to build in a certain way. Define or undefine the correct CONFIG_*
macros in liblcd-config.h, and put <lcd-domains/liblcd-config.h> at the
top of foo.c. This will make all of the code in foo.c and the headers it
includes have the proper configuration.
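
As a small illustration of the config-header idea, here is a single-file
sketch. The first part stands in for liblcd-config.h and the rest stands in
for the copied foo.c. CONFIG_NR_CPUS is taken from the real config header;
CONFIG_SKETCH_DEBUG and all sketch_* names are hypothetical.

```c
#include <assert.h>

/* Stands in for <lcd-domains/liblcd-config.h>; it must come first. */
#define CONFIG_NR_CPUS 1
#undef CONFIG_SKETCH_DEBUG	/* hypothetical option, forced off */

/* Stands in for the copied foo.c and the headers it includes. */
#if CONFIG_NR_CPUS > 1
#define SKETCH_SLOTS CONFIG_NR_CPUS	/* one slot per cpu */
#else
#define SKETCH_SLOTS 1			/* uniprocessor build */
#endif

static int sketch_slots[SKETCH_SLOTS];

static int sketch_foo(int x)
{
#ifdef CONFIG_SKETCH_DEBUG
	assert(x >= 0);		/* debug-only check, compiled out here */
#endif
	sketch_slots[0] = x;
	return SKETCH_SLOTS;	/* number of slots this build uses */
}
```

Because the config definitions are seen before anything else, every
conditional below them takes the uniprocessor, non-debug branch without any
edit to the copied code.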
foo has a number of dependencies. There are five possible types for kernel
1 -- foo calls another function in foo.c
2 -- foo calls another function in a different file that *is not* exported
3 -- foo calls another function in a different file that *is* exported
4 -- foo calls an inline function in a header it includes
5 -- foo uses a macro defined in one of the headers it includes
The strategy depends on the type and how much complexity you want to
Suppose foo calls bar, and bar is in foo.c. The call to bar will work, but
you will need to ensure all of bar's dependencies are fulfilled.
Suppose foo calls bar in bar.c. You have to resolve this dependency or
else linking will fail. You can pull in bar.c into liblcd, or you can
emulate bar in <lcd-domains/hacks.h>, either by eliding it away or
emulating it with other functions that are in liblcd. Pulling in bar.c
may be more complicated, but preferable if other code depends on it and
bar isn't too complicated. Other functions in bar.c may not be needed, and
you can fulfill their dependencies with some major hacks and elision.
Suppose foo calls bar in bar.c. First, bar may never be called for your
scenarios, and if this is the case, it's probably best to elide it by putting
a hack in <lcd-domains/hacks.h>. Alternatively, you can suck in the file
bar.c. <lcd-domains/hacks.h> will elide the EXPORT_SYMBOL macros, so the
build system won't get confused when it sees a double export. Finally, you
can also choose not to resolve this dependency at all - if bar.c is
built for the host kernel, the build system will see that bar is exported,
and it won't complain when it tries to build and link foo.c in a library
or module (it will assume the dependency will be resolved when the module
is installed). Of course, if bar happens to be called unexpectedly inside
the LCD, you would probably get a page fault since bar is not linked.
Except for the lcd/ subdirectory, all of the source code is from the original
kernel, with very few changes (some files just have the two headers added -
liblcd-config.h and liblcd-hacks.h).
liblcd-config.h changes the build configuration so that the code will be
built for a uniprocessor machine with one NUMA node, no debugging, etc. This
was set up until I got it working; it may not be fully correct or work in
all build scenarios.
For macros and inlines, if they don't cause trouble, you don't have to do
anything. But if they contain code that will break things, your only option
is to #undef them and emulate them in the hacks header.
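
Here is a single-file sketch of the #undef-and-emulate approach. The three
parts stand in for a host kernel header, the hacks header, and unmodified
code pulled into liblcd. All sketch_* names are hypothetical, and unlike the
real hacks header this version evaluates its argument with a (void) cast so
the compiler does not warn about an unused variable.

```c
#include <assert.h>

/* --- stands in for a host kernel header --- */
static int sketch_lock_calls;	/* counts calls to the "real" lock */

static inline void sketch_real_lock(int *l)
{
	sketch_lock_calls++;
	*l = 1;
}
#define sketch_lock(l) sketch_real_lock(l)

/* --- stands in for the hacks header, included after everything else --- */
#undef sketch_lock
#define sketch_lock(l) do { (void)(l); } while (0)

/* --- stands in for unmodified code pulled into liblcd --- */
static int sketch_lock_word;
static int sketch_counter;

static void sketch_increment(void)
{
	sketch_lock(&sketch_lock_word);	/* now expands to a no-op */
	sketch_counter++;
}
```

The body of sketch_increment is untouched; only the macro it expands through
has changed, which is the point of keeping the hacks in one header.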
Your goal is to make as few changes as possible - the more changes you make,
the easier it is to introduce bugs. You'll notice in liblcd/mm/slab.c,
I carefully mark where I made changes using /* BEGIN LCD */ and /* END LCD */.
** IMPORTANT: If you have global variables that are uninitialized (in the
BSS section), you will need to manually zero them out at some point before
they are used in the LCD. You can see what those variables are by doing
something like
nm my-module.ko | grep '.* b '
nm my-module.ko | grep '.* B '
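
Once you know which symbols are affected, the zeroing itself can be as simple
as the sketch below. The globals and the function name are hypothetical
stand-ins for whatever nm reports in your module; this is not code from
liblcd.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical globals standing in for symbols nm reports as b or B. */
static unsigned long sketch_table[64];
static struct {
	void *head;
	int count;
} sketch_cache;

/*
 * Call this once, early in the module's init path, before any other code
 * reads these variables.
 */
static void sketch_zero_globals(void)
{
	memset(sketch_table, 0, sizeof(sketch_table));
	memset(&sketch_cache, 0, sizeof(sketch_cache));
}
```

Using sizeof on the object itself (rather than a hand-written byte count)
keeps the call correct if the variable's type changes later.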
Some variables marked as __initdata do not show up as b or B via nm. I am
not sure if these are properly zero'd or not, so beware (I'm zero'ing
some out to be safe). You can see all of the symbols using:
readelf -s my-module.ko
You will see lines like this:
82: 0000000000000000 360 OBJECT LOCAL DEFAULT 19 init_kmem_cache_node
This says init_kmem_cache_node is a local variable that resides in section
19. To list all sections, do:
readelf -S my-module.ko
You will see lines like this (this is section 19):
[19] PROGBITS 0000000000000000 0000a3a0
0000000000000168 0000000000000000 WA 0 0 32
Note that init_kmem_cache_node is marked as __initdata, so it appears in this
While pulling code into liblcd, you can build it and then see what
symbols are unresolved via nm. To be safe, you should go through every
line of the source code to see what the dependencies are, so that you
are using the macros/inlines/etc. that you expect.
When linking with a module, you can make sure all dependencies are resolved
by running nm on it, e.g.,
nm my-module.ko
This is after my-module.ko has been built and linked with liblcd/lib.a.
If you get page faults, you can look at the kernel logs to see where the
module was loaded in the host. Take the faulting address, and subtract off
the starting address of the core code that was loaded. Now objdump the
kernel module, and locate the address in there.
For example, if the faulting address was 0x1234, and the module was loaded
at address 0x1200, the offset into the module is 0x1234 - 0x1200 = 0x34.
objdump -d my-module.ko
and look for 0x34 on the left side. You can also do
objdump -S my-module.ko
to see the source code intermixed with assembly. This will help you pinpoint
the spot in the source code faster.
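
The offset arithmetic above, written out as a tiny helper. The function name
and the size parameter are assumptions added for illustration; the size
check just guards against a faulting address that is not inside the module
at all.

```c
#include <assert.h>
#include <stdint.h>

/*
 * If fault lies inside [base, base + size), store its offset into the
 * module in *off and return 0. Otherwise return -1 and leave *off alone.
 */
static int sketch_fault_offset(uint64_t fault, uint64_t base,
			       uint64_t size, uint64_t *off)
{
	if (fault < base || fault - base >= size)
		return -1;
	*off = fault - base;
	return 0;
}
```

For the example numbers, sketch_fault_offset(0x1234, 0x1200, 0x100, &off)
succeeds and leaves 0x34 in off, which is the address to look for in the
objdump output.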
If the faulting address is low, like 0x8 or 0x34, you most likely are using
a null pointer somewhere - which may imply you have uninitialized data
(globals, e.g.).
If you get a general protection exception on a mov instruction, you might
have a bad non-canonical address - also may mean uninitialized data.
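
To check whether an address from a register dump is canonical, something
like the sketch below works. It assumes x86-64 with 48-bit virtual addresses
(4-level paging), where bits 63 through 47 must all be equal; the function
name is made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns nonzero if addr is canonical under 48-bit virtual addressing:
 * the top 17 bits (63..47) are either all zero or all one.
 */
static int sketch_is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* keeps bits 63..47 */

	return top == 0 || top == 0x1ffff;
}
```

A value that fails this check and is then dereferenced raises a general
protection exception rather than a page fault, which matches the symptom
described above.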
......@@ -64,11 +64,17 @@ kernel source,
-- go into Virtualization (2) and select Lightweight Capability
Domains and Intel Support for LCDs
-- it is recommended you build them as modules, for debugging
-- ** important ** : go into Processor type and features, and
turn off the stack protector feature (-fstack-protector);
otherwise, gcc will compile kernel code (include the modules
that are going inside lcd's) to use a stack protector, and
the lcd's are not configured for that
-- ** important ** : make sure you have these turned off:
-- KVM
-- stack protector (under Processor type and features)
-- tracing (under Kernel Hacking -> Tracing)
These should be off by default, but you might accidentally
kick them on if you turn on some debug features. Tracing
may actually be OK to have on, I'm not completely sure. (Tracing
and the stack protector features affect how code is built,
which could be bad for liblcd code.)
[ 2 ] exit and save the configuration
......@@ -5,28 +5,24 @@ OVERVIEW
In virt/lcd-domains/test-mods, you can put a new group of test modules
for running with the lcd system. You will find a few in there already.
One of them - load - is built automatically and ran during a test when the
One of them - printk - is built automatically and run during a test when the
lcd module is inserted.
Beware! The load test module contains an infinite loop, so you probably don't
want to run it on your host machine (i.e., you should run it inside an
lcd where it will be preempted periodically and then stopped by the test
Follow the examples in the test-mods dir when in doubt.
Follow the examples in the test-mods dir when in doubt. You will most
likely need at least two modules - one of them will run non-isolated and
will boot the lcd, and the other is for the lcd.
Step 1
Create a new sub dir in test-mods with source files. At least one of the
Create a new sub dir in test-mods with source files. Again, at least one of the
modules should probably run non-isolated - it will set up the lcd's.
Important: Modules that will run inside lcd's should textually include
the liblcd file.
Important: Modules that will run inside lcd's should link with liblcd.
Step 2
......@@ -957,6 +957,13 @@ endif
# make sure no implicit rule kicks in
$(sort $(vmlinux-deps)): $(vmlinux-dirs) ;
# We do this before building anything else so that it's done
# and we don't have any problems with recursive make.
liblcd: prepare scripts
$(Q)$(MAKE) $(build)=$@
# Handle descending into subdirectories listed in $(vmlinux-dirs)
# Preset locale variables to speed up the build process. Limit locale
# tweaks to this spot to avoid wrong language settings when running
......@@ -964,7 +971,7 @@ $(sort $(vmlinux-deps)): $(vmlinux-dirs) ;
# Error messages still appears in the original language
PHONY += $(vmlinux-dirs)
$(vmlinux-dirs): prepare scripts
$(vmlinux-dirs): prepare scripts liblcd
$(Q)$(MAKE) $(build)=$@
define filechk_kernel.release
......@@ -7,7 +7,7 @@ source "virt/kvm/Kconfig"
bool "Virtualization"
depends on HAVE_KVM || X86
default y
default n
Say Y here to get to see options for using your Linux host to run other
operating systems inside virtual machines (guests).
......@@ -2282,6 +2282,58 @@ static void vmx_pack_desc(struct desc_struct *desc, u64 base, u64 limit,
/* VMX EXIT HANDLING -------------------------------------------------- */
static void dump_lcd_arch(struct lcd_arch *lcd)
unsigned long flags;
lcd->regs[LCD_ARCH_REGS_RIP] = vmcs_readl(GUEST_RIP);
lcd->regs[LCD_ARCH_REGS_RSP] = vmcs_readl(GUEST_RSP);
flags = vmcs_readl(GUEST_RFLAGS);
printk(KERN_ERR "---- Begin LCD Arch Dump ----\n");
printk(KERN_ERR "CPU %d VPID %d\n", lcd->cpu, lcd->vpid);
printk(KERN_ERR "RIP 0x%016llx RFLAGS 0x%08lx\n",
lcd->regs[LCD_ARCH_REGS_RIP], flags);
printk(KERN_ERR "RAX 0x%016llx RCX 0x%016llx\n",
printk(KERN_ERR "RDX 0x%016llx RBX 0x%016llx\n",
printk(KERN_ERR "RSP 0x%016llx RBP 0x%016llx\n",
printk(KERN_ERR "RSI 0x%016llx RDI 0x%016llx\n",
printk(KERN_ERR "R8 0x%016llx R9 0x%016llx\n",
printk(KERN_ERR "R10 0x%016llx R11 0x%016llx\n",
printk(KERN_ERR "R12 0x%016llx R13 0x%016llx\n",
printk(KERN_ERR "R14 0x%016llx R15 0x%016llx\n",
/* printk(KERN_ERR "Dumping Stack Contents...\n"); */
/* sp = (unsigned long *) vcpu->regs[VCPU_REGS_RSP]; */
/* for (i = 0; i < STACK_DEPTH; i++) */
/* if (get_user(val, &sp[i])) */
/* printk(KERN_INFO "vmx: RSP%+-3ld ?\n", */
/* i * sizeof(long)); */
/* else */
/* printk(KERN_INFO "vmx: RSP%+-3ld 0x%016lx\n", */
/* i * sizeof(long), val); */
printk(KERN_ERR "---- End LCD Arch Dump ----\n");
static inline int vmx_exit_intr(struct lcd_arch *lcd_arch)
return (lcd_arch->exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT) ||
......@@ -2499,8 +2551,9 @@ static int vmx_handle_hard_exception(struct lcd_arch *lcd_arch)
LCD_ARCH_ERR("got a machine check inside vm!");
return -EIO;
LCD_ARCH_ERR("unhandled exception: vector = %x, info = %x",
vector, lcd_arch->exit_intr_info);
LCD_ARCH_ERR("unhandled exception: vector = %x, info = %x, instruction addr = 0x%lx",
vector, lcd_arch->exit_intr_info,
return -EIO;
......@@ -2564,7 +2617,7 @@ static int vmx_handle_exception_interrupt(struct lcd_arch *lcd_arch)
ret = vmx_handle_ext_intr(lcd_arch);
LCD_ARCH_ERR("unexcepted interrupt type %d", type);
LCD_ARCH_ERR("unhandled interrupt type %d", type);
ret = -EIO;
......@@ -2852,6 +2905,12 @@ out:
* If there was an error, dump the lcd's state.
if (ret < 0)
return ret;
* Ensure we use the right config
#define CONFIG_NR_CPUS 1
#undef CONFIG_NODES_SHIFT /* force max numnodes to 1 */
* Set include guards to force using our includes.
#ifndef MM_SLAB_H
#define MM_SLAB_H
#ifndef __MM_INTERNAL_H
#define __MM_INTERNAL_H
#include <lcd-domains/liblcd-config.h>
#include <lcd-domains/liblcd.h>
* Misc macros, etc.
#undef BUG
#define BUG() do { \
lcd_printk("BUG! in %s:%s:%d", __FILE__, __func__, \
__LINE__); \
lcd_exit(-1); \
} while (0)
#undef BUG_ON
#define BUG_ON(x) do { \
if (x) { \
lcd_printk("BUG! in %s:%s:%d", __FILE__, __func__, \
__LINE__); \
lcd_exit(-1); \
} \
} while (0)
#undef VM_BUG_ON
#define VM_BUG_ON(x) do { \
if (x) { \
lcd_printk("VM_BUG! in %s:%s:%d", __FILE__, __func__, \
__LINE__); \
lcd_exit(-1); \
} \
} while (0)
#define WARN_ON_ONCE(x) ({ x; })
#define EXPORT_SYMBOL(x)
#undef panic
#define panic(fmt, args...) do { \
lcd_printk(fmt, args); \
lcd_exit(-1); \
} while(0)
#undef printk
#define printk(fmt, args...) do { \
lcd_printk(fmt, args); \
} while(0)
#undef printk_ratelimit
#define printk_ratelimit() true
#undef kasprintf
#define kasprintf(x, fmt, args...) ({ lcd_printk(fmt, args); (char *)1; });
#undef dump_stack
#define dump_stack() do { } while (0)
#undef unlikely
#define unlikely(x) x
#undef likely
#define likely(x) x
#undef numa_mem_id
#define numa_mem_id() 0
#undef nr_cpus_node
#define nr_cpus_node(x) 1
#undef smp_processor_id
#define smp_processor_id() 0
#undef num_possible_nodes
#define num_possible_nodes() 1
#define DEFINE_PER_CPU(type, name) __typeof__(type) name
#undef percpu
#define percpu(name, cpu) name
#undef __initcall
#define __initcall(x)
#undef on_each_cpu
#define on_each_cpu(func, arg, wait) do { func(arg); } while(0)
* mm
#undef gfp_pfmemalloc_allowed
#define gfp_pfmemalloc_allowed(x) true
#undef sk_memalloc_socks
#define sk_memalloc_socks() false
#undef register_cpu_notifier
#define register_cpu_notifier(x) do { } while(0)
#undef add_zone_page_state
#define add_zone_page_state(x, y, z) do { } while(0)
#undef sub_zone_page_state
#define sub_zone_page_state(x, y, z) do { } while(0)
#undef prefetchw
#define prefetchw(x) do { } while(0)
#undef totalram_pages
#define totalram_pages 0UL
#undef mm_populate
#define mm_populate(x, y) do { } while(0)
#undef gfp_allowed_mask
#define gfp_allowed_mask __GFP_BITS_MASK
#undef do_mmap_pgoff
#define do_mmap_pgoff(x1,x2,x3,x4,x5,x6,x7) 0UL
#undef KSTK_ESP
#define KSTK_ESP(x) 0UL
* Locking, interrupts, etc.
* All locking etc. is elided.
#define DEFINE_MUTEX(x) struct mutex x
#undef get_online_cpus
#define get_online_cpus() do { } while(0)
#undef put_online_cpus
#define put_online_cpus() do { } while(0)
#undef mutex_lock
#define mutex_lock(x) do { } while(0)
#undef mutex_unlock
#define mutex_unlock(x) do { } while(0)
#undef spin_lock_init
#define spin_lock_init(x) do { } while(0)
#undef spin_lock_irqsave
#define spin_lock_irqsave(x,flags) do { } while (0)
#undef spin_lock
#define spin_lock(x) do { } while (0)
#undef spin_lock_irq
#define spin_lock_irq(x) do { } while (0)
#undef spin_lock_irqrestore
#define spin_lock_irqrestore(x,flags) do { } while (0)
#undef spin_unlock
#define spin_unlock(x) do { } while(0)
#undef spin_unlock_irq
#define spin_unlock_irq(x) do { } while(0)
#undef rcu_barrier
#define rcu_barrier() do { smp_mb(); } while(0)
#undef call_rcu
#define call_rcu(arg, func) do { func(arg); } while(0)
#undef local_irq_enable
#define local_irq_enable() do { } while(0)
#undef local_irq_disable
#define local_irq_disable() do { } while(0)
#undef might_sleep_if
#define might_sleep_if(x) do { } while (0)
#undef _raw_spin_lock
#define _raw_spin_lock(x) do { } while (0)
#undef _raw_spin_unlock
#define _raw_spin_unlock(x) do { } while (0)
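
One thing to watch with the printk and panic wrappers above: they forward
`args` as-is, so a call with only a format string expands to a trailing
comma and will not compile. If pulled-in code makes such calls, the GNU C
`##args` form drops the comma when no arguments are given. The sketch below
shows that variant; sketch_lcd_printk is a hypothetical stand-in for
lcd_printk that writes into a buffer so the result can be inspected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char sketch_log[128];	/* last formatted message */

/* Hypothetical stand-in for lcd_printk. */
static void sketch_lcd_printk(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(sketch_log, sizeof(sketch_log), fmt, ap);
	va_end(ap);
}

/*
 * GNU C extension: "##args" removes the preceding comma when args is
 * empty, so both call forms below compile.
 */
#define sketch_printk(fmt, args...) do {		\
		sketch_lcd_printk(fmt, ##args);		\
	} while (0)

static void sketch_demo(int x)
{
	sketch_printk("no arguments");	/* would fail without ## */
	sketch_printk("x=%d", x);
}
```

This relies on a compiler extension, but so does the named `args...` syntax
the existing macros already use.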