Commit 664628a4 authored by Charlie Jacobsen's avatar Charlie Jacobsen Committed by Vikram Narayanan

Except IPC, kliblcd fully tested. Everything is working.

Documentation in Documentation/lcd-domains/...

Loading, mapping, and running a module is working correctly, using
all of the capability code that interposes on each operation (mapping,
freeing pages, etc.).

cptr allocation and indexing into cspaces is working correctly.

IPC testing and debugging is coming next.
parent 5a6d94fd
This diff is collapsed.
========================================
OVERVIEW
========================================
The code is inside virt/lcd-domains/kliblcd.c. The header (for non-isolated
kernel code to use) is in include/lcd-domains/kliblcd.h.
A kernel thread can "enter/exit into lcd mode" (similar to cap_enter in
Capsicum) by invoking klcd_enter/klcd_exit. A kernel thread that has entered
lcd mode is called a *kernel lcd* or *klcd*. The functions you see with
klcd_ instead of lcd_ are only part of the kliblcd interface and only
available to non-isolated lcd's.
Upon entering lcd mode, a kernel thread can invoke the functions in the
kliblcd interface for creating lcd's, allocating pages, loading modules, etc.
A klcd has a cspace and utcb for message passing, but does not have an
underlying hardware vm (the thread runs unisolated).
See the kliblcd header for a detailed description of the interface. See the
test cases for examples.
========================================
LCD STATUS
========================================
To do: I will probably remove the suspend state. This seemed like it would be
simple, but the proper handling of it when combined with ipc may be
too difficult to justify right now.
An lcd can be in one of five states:
E = Embryo - just after it is created, not configured with a starting
stack pointer, etc.
C = Configured - stack pointer, starting program counter configured
R = Running - kthread is runnable or running, and may be running
inside vm
S = Suspended - kthread is asleep or will soon sleep
D = Dead - kthread has stopped or will soon stop; lcd may be in
the process of being torn down
_____________________________________
/ lcd_destroy \
lcd_run | |
lcd_suspend | lcd_run, lcd_config |
.__. ^ .__. |
| | .----------->| | | |
\ | / lcd_destroy | \ | |
\ | / ^ \ | |
\ V / | \ V V
create +---+ lcd_config +---+ lcd_run +---+ lcd_destroy +---+
-------->| E |------------->| C |------------->| R |--------------->| D |
+---+ .->+---+ +---+ +---+
/ / / ^ ^ \
/ / / \ | \
/ / / | | \
'---' lcd_suspend | | lcd_run '---'
lcd_config, \ / lcd_run
lcd_suspend V / lcd_suspend
+---+ lcd_destroy
| S | lcd_config
+---+
^ \
| \
| \
'---'
lcd_config, lcd_suspend
The following transitions are an error (return non-zero), and have no effect:
E: lcd_run, lcd_suspend - you must configure the lcd first
C: lcd_config, lcd_suspend - lcd already configured; cannot suspend either
R: lcd_run, lcd_config - lcd already running and config'd
S: lcd_suspend, lcd_config - lcd already suspended and config'd
D: all - lcd is dead
Some of these may be too restrictive, and could change in the future (e.g.,
allow re-config, allow multiple suspend calls - only first one has effect,
rest are no-ops).
========================================
LCD STATUS
========================================
An lcd can be in one of five states:
E = Embryo - just after it is created, not configured with a starting
stack pointer, etc.
C = Configured - stack pointer, starting program counter configured
R = Running - kthread is runnable or running, and may be running
inside vm
S = Suspended - kthread is asleep or will soon sleep
D = Dead - kthread has stopped or will soon stop; lcd may be in
the process of being torn down
_____________________________________
/ lcd_destroy \
lcd_run | |
lcd_suspend | lcd_run, lcd_config |
.__. ^ .__. |
| | .----------->| | | |
\ | / lcd_destroy | \ | |
\ | / ^ \ | |
\ V / | \ V V
create +---+ lcd_config +---+ lcd_run +---+ lcd_destroy +---+
-------->| E |------------->| C |------------->| R |--------------->| D |
+---+ .->+---+ +---+ +---+
/ / / ^ ^ \
/ / / \ | \
/ / / | | \
'---' lcd_suspend | | lcd_run '---'
lcd_config, \ / lcd_run
lcd_suspend V / lcd_suspend
+---+ lcd_destroy
| S | lcd_config
+---+
^ \
| \
| \
'---'
lcd_config, lcd_suspend
The following transitions are an error (return non-zero), and have no effect:
E: lcd_run, lcd_suspend - you must configure the lcd first
C: lcd_config, lcd_suspend - lcd already configured; cannot suspend either
R: lcd_run, lcd_config - lcd already running and config'd
S: lcd_suspend, lcd_config - lcd already suspended and config'd
D: all - lcd is dead
Some of these may be too restrictive, and could change in the future (e.g.,
allow re-config, allow multiple suspend calls - only first one has effect,
rest are no-ops).
......@@ -3,192 +3,8 @@
OVERVIEW
========================================
This code is in virt/lcd-domains and include/lcd-domains. It is the
arch-independent layer of the LCD microkernel.
The two main objects are struct lcd and struct lcd_thread, defined in
include/lcd-domains/lcd-domains.h.
struct lcd contains the guest physical address space (in
underlying lcd_arch), and a list of lcd_thread's. It will soon contain
the lcd's cspace when that is incorporated.
struct lcd_thread corresponds with a host kernel thread that is running
inside the hardware virtual machine. It contains a pointer to the thread's
utcb (for easy microkernel access), and a pointer to the underlying
lcd_arch_thread (the hardware vm).
Why have one kernel thread / hardware vm for each lcd_thread? Answer: To keep
the microkernel simple. The microkernel could add an additional layer of
virtualization on top of the hardware vm, so that we didn't have so many
hardware vm's floating around. But it would be complicated and we would then
have to write scheduler code in the microkernel.
A struct lcd is created by providing a module name. The module will be loaded
inside the lcd, and an initial lcd_thread will be created (stored in the
struct lcd's init_thread field) that, when started, will execute the module's
init code.
See also the comments in the header lcd-domains.h (above) and the test cases
in virt/lcd-domains/lcd-tests.c.
========================================
SETUP
========================================
Aside from building and installing the kernel code, you will need to do one
extra step, explained in detail below.
Background
----------
We don't want tricky logic for locating modules, so we want to re-use the
request_module facility in the kernel. But this relies on the user space
modprobe tools. So, we did the following:
-- we modified the module loading code in the kernel so that a caller
can safely load a module that is destined for an lcd in the host
(modules destined for an lcd *will not* have their init code executed
when installed in the host, nor their exit code executed when they
are uninstalled from the host)
-- we added an ioctl interface that user code can use to
load a module destined for an lcd; it uses the patched module loading
code
-- we created a patched modprobe that uses this interface
-- we patched request_module to allow kernel code to load a module
destined for an lcd, using the patched modprobe
So, when you call lcd_create, the kernel loads the module using the patched
modprobe.
This means you need to have the patched modprobe properly installed!
Step 1
------
Build and install the kernel and all modules. In the root directory of the
kernel source,
[ 1 ] make menuconfig
-- go into Virtualization (2) and select Lightweight Capability
Domains and Intel Support for LCDs
-- it is recommended you build them as modules, for debugging
[ 2 ] exit and save the configuration
[ 3 ] make
-- use make -j 8 if e.g. you have 8 cores, will go faster
[ 4 ] sudo make modules_install install
-- order is important!
-- this should automatically update the grub boot menu
Step 2 - Patched Modprobe Setup
-------------------------------
The patched version is inside tools/module-init-tools.
To build and install, enter the module-init-tools directory,
and do the following:
[ 1 ] aclocal -I m4 && automake --add-missing --copy && autoconf
[ 2 ] ./configure --prefix=/ --program-prefix=lcd-
[ 3 ] make
[ 4 ] (sudo) make install
This will install the patched /sbin/lcd-modprobe and /sbin/lcd-insmod,
as well as the other init tools that were left untouched. The
request_module will use lcd-modprobe to load the module.
The man pages won't install on emulab (since /share is read only).
You can specify a different man dir via configure if you wish.
[Note: The only changes to init tools are in modprobe.c and insmod.c; only
the changes in modprobe.c are of interest (lcd-insmod is not currently
used/needed). Instead of doing the Linux init_module system call,
lcd-modprobe does an ioctl call to the LCD driver (hence, the LCD driver
must be loaded), with the bytes of the module, its size, and command
line options.]
Step 3 - Reboot and install
---------------------------
After rebooting the machine, select the new kernel to boot it.
After booting, if you built the lcd system as modules, do:
[ 1 ] insmod ${MODULE_PATH}/arch/x86/lcd-domains/lcd-domains-arch.ko
[ 2 ] insmod ${MODULE_PATH}/virt/lcd-domains/lcd-domains.ko
where ${MODULE_PATH} is something like /lib/modules/3.10.14/kernel.
This will install the lcd system.
You can now create an lcd using the example below.
========================================
EXAMPLE
========================================
Here is an example of how to start up an lcd with a module named foo.ko. foo.ko
should already be compiled and installed in the system's module load path.
struct lcd *lcd;
struct lcd_thread *lcd_thread;
int ret;
/*
* Create the lcd
*/
ret = lcd_create("foo.ko", &lcd);
/*
* Start the lcd's init thread (will run foo.ko's init routine)
*/
ret = lcd_thread_start(lcd->init_thread);
/* (...wait for a while, maybe sleep...) */
/*
* Kill the init thread
*/
ret = lcd_thread_kill(lcd->init_thread);
/*
* Tear down the LCD
*/
lcd_destroy(lcd);
========================================
MODULE LOADING
========================================
This one is a real zinger.
========================================
GUEST VIRTUAL ADDRESS SPACE
========================================
A good chunk of the current arch-independent code is for setting up the
boot guest virtual address space for an lcd. We assume that the lcd will take
over managing this, so we've kept allocation logic dirt simple.
Note that the microkernel is protected from what the lcd does to its guest
virtual address space. The microkernel manages the lcd's guest physical
address space, and the host pages the lcd has access to, so it can safely
write to memory without causing a page fault.
This documents the code inside virt/lcd-domains/main.c. This code handles
create/destroy of lcd's, page allocation, and running lcd's. kliblcd calls
into this code to carry out these operations.
See Documentation/lcd-domains/kliblcd.txt for more info.
========================================
OVERVIEW
========================================
The arch-independent code is in virt/lcd-domains. It is the arch-independent
layer of the LCD microkernel.
The main objects are struct cspaces, struct lcds, struct endpoints, defined
in virt/lcd-domains/internal.h.
White and black box test cases are in virt/lcd-domains/tests. (These are just
included at the bottom of the corresponding source file, and the tests are
ran when the microkernel is loaded.)
struct lcd has an associated host kernel thread that is running inside
a hardware virtual machine. It contains a pointer to the lcd's utcb, some
status info, and a cspace.
External code should use kliblcd to interact with the microkernel and create
lcd's; see Documentation/lcd-domains/kliblcd.txt.
See also the comments in the internal.h header and tests.
========================================
SETUP
========================================
Aside from building and installing the kernel code, you will need to do one
extra step, explained in detail below.
Background
----------
We don't want tricky logic for locating modules, so we want to re-use the
request_module facility in the kernel. But this relies on the user space
modprobe tools. So, we did the following:
-- we modified the module loading code in the kernel so that a caller
can safely load a module that is destined for an lcd in the host
(modules destined for an lcd *will not* have their init code executed
when installed in the host, nor their exit code executed when they
are uninstalled from the host)
-- we added an ioctl interface that user code can use to
load a module destined for an lcd; it uses the patched module loading
code
-- we created a patched modprobe that uses this interface
-- we patched request_module to allow kernel code to load a module
destined for an lcd, using the patched modprobe
So, when you call lcd_create, the kernel loads the module using the patched
modprobe.
This means you need to have the patched modprobe properly installed!
Step 1
------
Build and install the kernel and all modules. In the root directory of the
kernel source,
[ 1 ] make menuconfig
-- go into Virtualization (2) and select Lightweight Capability
Domains and Intel Support for LCDs
-- it is recommended you build them as modules, for debugging
[ 2 ] exit and save the configuration
[ 3 ] make
-- use make -j 8 if e.g. you have 8 cores, will go faster
[ 4 ] sudo make modules_install install
-- order is important!
-- this should automatically update the grub boot menu
Step 2 - Patched Modprobe Setup
-------------------------------
The patched version is inside tools/module-init-tools.
To build and install, enter the module-init-tools directory,
and do the following:
[ 1 ] aclocal -I m4 && automake --add-missing --copy && autoconf
[ 2 ] ./configure --prefix=/ --program-prefix=lcd-
[ 3 ] make
[ 4 ] (sudo) make install
This will install the patched /sbin/lcd-modprobe and /sbin/lcd-insmod,
as well as the other init tools that were left untouched. The
request_module will use lcd-modprobe to load the module.
The man pages won't install on emulab (since /share is read only).
You can specify a different man dir via configure if you wish.
[Note: The only changes to init tools are in modprobe.c and insmod.c; only
the changes in modprobe.c are of interest (lcd-insmod is not currently
used/needed). Instead of doing the Linux init_module system call,
lcd-modprobe does an ioctl call to the LCD driver (hence, the LCD driver
must be loaded), with the bytes of the module, its size, and command
line options.]
Step 3 - Reboot and install
---------------------------
After rebooting the machine, select the new kernel to boot it.
After booting, if you built the lcd system as modules, do:
[ 1 ] insmod ${MODULE_PATH}/arch/x86/lcd-domains/lcd-domains-arch.ko
[ 2 ] insmod ${MODULE_PATH}/virt/lcd-domains/lcd-domains.ko
where ${MODULE_PATH} is something like /lib/modules/3.10.14/kernel.
This will install the lcd system.
You can now create an lcd using kliblcd. See
Documentation/lcd-domains/kliblcd.txt.
......@@ -45,7 +45,7 @@ out:
fail_alloc:
return ret;
}
#if 0
static int test03_help(struct lcd_arch *lcd, gpa_t base)
{
hpa_t actual;
......@@ -133,7 +133,7 @@ fail3:
fail1:
return ret;
}
#endif
static int test04(void)
{
struct lcd_arch *lcd;
......@@ -273,8 +273,8 @@ static void lcd_arch_tests(void)
return;
if (test02())
return;
if (test03())
return;
// if (test03())
// return;
if (test04())
return;
if (test05())
......
......@@ -1148,7 +1148,7 @@ int lcd_arch_ept_unmap(struct lcd_arch *lcd, gpa_t a)
{
int ret;
lcd_arch_epte_t *ept_entry;
/*
* Walk ept
*/
......@@ -1240,7 +1240,8 @@ int lcd_arch_ept_gpa_to_hpa(struct lcd_arch *lcd, gpa_t ga, hpa_t *ha_out)
/**
* Recursively frees all present entries in dir at level, and
* the page containing the dir.
* the page containing the dir. The recursion depth is limited to 3 - 4 stack
* frames, so it's reasonable to use.
*
* 0 = pml4
* 1 = pdpt
......@@ -1254,23 +1255,16 @@ static void vmx_free_ept_dir_level(lcd_arch_epte_t *dir, int level)
{
int idx;
if (level == 3) {
/*
* Base case of recursion
*
* Free any mapped host page frames, notify
*
* XXX: This can lead to nasty double frees if we made a
* mistake and just forgot to unmap in the ept.
*/
for (idx = 0; idx < LCD_ARCH_PTRS_PER_EPTE; idx++) {
if (vmx_epte_present(dir[idx])) {
LCD_ARCH_ERR("memory leak at hva %lx",
hva_val(vmx_epte_hva(dir[idx])));
free_page(hva_val(vmx_epte_hva(dir[idx])));
}
}
} else {
/*
* Base case of recursion is when level = 3.
*
* In that case - don't do anything - don't try to free any
* pages that are still mapped. The higher level layers
* should've done that already (but may not have bothered
* unmapping). If we try to free pages that are still mapped,
* we may get bad double free's.
*/
if (level != 3) {
/*
* pml4, pdpt, or page directory
*
......
......@@ -159,7 +159,8 @@ int lcd_run(cptr_t lcd);
* This will not tear it down - this happens when the last capability to the
* lcd goes away. Use lcd_delete or lcd_revoke as necessary.
*
* Blocks until lcd halts.
* Blocks until lcd halts. Warning: This could be a while if the lcd is
* sitting in an endpoint queue.
*/
int lcd_suspend(cptr_t lcd);
......@@ -218,8 +219,6 @@ int lcd_cap_revoke(cptr_t slot);
/* CPTR CACHE -------------------------------------------------- */
#define LCD_MAX_CPTRS 32
/**
* Find an unused cptr (a cptr that refers to an unused cnode).
*/
......@@ -270,10 +269,6 @@ struct lcd_module_info {
* Where to point the program counter to run init
*/
gva_t init;
/*
* Counter to track which cptrs are used
*/
unsigned long cptr_counter;
/*
* List of lcd_module_pages
*/
......
......@@ -12,7 +12,7 @@
*/
/* FIXME: this must be reserved in miscdevice.h */
#define LCD_MINOR 234
#define LCD_MINOR 239
struct lcd_init_module_args {
/* syscall arguments to init_module */
......
......@@ -91,7 +91,7 @@ static int call_modprobe(char *module_name, int wait, int for_lcd)
argv[2] = "--";
argv[3] = module_name; /* check free_modprobe_argv() */
argv[4] = NULL;
info = call_usermodehelper_setup(__modprobe_path, argv, envp,
GFP_KERNEL, NULL, free_modprobe_argv,
NULL);
......
......@@ -1696,7 +1696,7 @@ static void do_softdep(const struct module_softdep *softdep,
}
}
#define DEVICE_NAME "/dev/lcd-prototype"
#define DEVICE_NAME "/dev/lcd"
static int lcd_init_module(void *module_image, unsigned long len,
char *param_values)
{
......
......@@ -187,7 +187,6 @@ static int update_cnode_table(struct cspace *cspace,
* cnode free, invalid, etc.
*/
return -EINVAL; /* signal error in look up */
}
}
......@@ -218,28 +217,34 @@ static int find_cnode(struct cspace *cspace, struct cnode_table *old,
return 1; /* signal we found the slot and are done */
} else {
/*
* invalid indexing, etc.
*/
return -EINVAL; /* signal an error in look up */
}
}
static int get_level_index(int table_level, unsigned long index,
static int get_level_index(int table_level, cptr_t c,
unsigned long *level_id)
{
int more_levels;
/*
* Right shift to set of bits for the level
*/
index >>= (LCD_CNODE_TABLE_NUM_SLOTS * table_level);
/*
* Determine if we need to follow a table pointer
*/
more_levels = index & (LCD_CNODE_TABLE_NUM_SLOTS >> 1);
/*
* Calculate index in this level
* Calculate the depth of the index
*/
*level_id = index & ((LCD_CNODE_TABLE_NUM_SLOTS >> 1) - 1);
return more_levels;
if (lcd_cptr_level(c) == table_level) {
/*
* We're at the final level - we're done, and need to look in
* the cap slots in the cnode table
*/
*level_id = lcd_cptr_slot(c);
return 0; /* signal no more levels to traverse */
} else {
/*
* More levels to go; determine index of next table to
* look at
*/
*level_id = lcd_cptr_fanout(c, table_level);
return 1; /* signal more levels to traverse */
}
}
static int walk_one_level(struct cspace *cspace, cptr_t c, bool alloc,
......@@ -249,7 +254,7 @@ static int walk_one_level(struct cspace *cspace, cptr_t c, bool alloc,
int more_levels;
unsigned long level_id;
more_levels = get_level_index(old->table_level, cptr_val(c), &level_id);
more_levels = get_level_index(old->table_level, c, &level_id);
if (more_levels)
return update_cnode_table(cspace, old, level_id, alloc, new);
else
......@@ -270,9 +275,6 @@ static int __lcd_cnode_lookup(struct cspace *cspace, cptr_t c, bool alloc,
struct cnode_table *old;
struct cnode_table *new;
if(cptr_val(c) >= LCD_MAX_CAPS)
return -EINVAL;
/*
* Initialize to root cnode table
*/
......
......@@ -52,8 +52,59 @@ static inline void __lcd_warn(char *file, int lineno, char *fmt, ...)
* See Documentation/lcd-domains/cap.txt.
*/
#define LCD_MAX_CAPS 32
#define LCD_CNODE_TABLE_NUM_SLOTS 8 /* should be a power of 2 */
#define LCD_CPTR_DEPTH_BITS 2 /* max depth of 3, zero indexed */
#define LCD_CPTR_FANOUT_BITS 2 /* each level fans out by a factor of 4 */
#define LCD_CPTR_SLOT_BITS 2 /* each node contains 4 cap slots */
#define LCD_CNODE_TABLE_NUM_SLOTS ((1 << LCD_CPTR_SLOT_BITS) + \
(1 << LCD_CPTR_FANOUT_BITS))
#define LCD_CPTR_LEVEL_SHIFT (((1 << LCD_CPTR_DEPTH_BITS) - 1) * \
LCD_CPTR_FANOUT_BITS + LCD_CPTR_SLOT_BITS)
static inline unsigned long lcd_cptr_slot(cptr_t c)
{
/*
* Mask off low bits
*/
return cptr_val(c) & ((1 << LCD_CPTR_SLOT_BITS) - 1);
}
/*
* Gives fanout index for going *from* lvl to lvl + 1, where
* 0 <= lvl < 2^LCD_CPTR_DEPTH_BITS - 1 (i.e., we can't go anywhere
* if lvl = 2^LCD_CPTR_DEPTH_BITS - 1, because we are at the deepest
* level).
*/
static inline unsigned long lcd_cptr_fanout(cptr_t c, int lvl)
{
unsigned long i;
BUG_ON(lvl >= (1 << LCD_CPTR_DEPTH_BITS) - 1);
i = cptr_val(c);
/*
* Shift and mask off bits at correct section
*/
i >>= (lvl * LCD_CPTR_FANOUT_BITS + LCD_CPTR_SLOT_BITS);
i &= ((1 << LCD_CPTR_FANOUT_BITS) - 1);
return i;
}
/*
* Gives depth/level of cptr, zero indexed (0 means the root cnode table)
*/
static inline unsigned long lcd_cptr_level(cptr_t c)
{
unsigned long i;
i = cptr_val(c);
/*
* Shift and mask
*/
i >>= LCD_CPTR_LEVEL_SHIFT;
i &= ((1 << LCD_CPTR_DEPTH_BITS) - 1);
return i;
}
/*
* --------------------------------------------------
......@@ -222,13 +273,15 @@ struct lcd_utcb {
*
* LCDs - lightweight capability domain
*
* See Documentation/lcd-domains/lcd.txt
*/
#define LCD_STATUS_EMBRYO 0
#define LCD_STATUS_SUSPENDED 1
#define LCD_STATUS_RUNNING 2
#define LCD_STATUS_DEAD 3 /* an lcd can never come back to life */
#define LCD_STATUS_CONFIGED 1
#define LCD_STATUS_SUSPENDED 2
#define LCD_STATUS_RUNNING 3
#define LCD_STATUS_DEAD 4