Commit 664628a4 authored by Charlie Jacobsen, committed by Vikram Narayanan

Except IPC, kliblcd fully tested. Everything is working.

Documentation in Documentation/lcd-domains/...

Loading, mapping, and running a module is working correctly, using
all of the capability code that interposes on each operation (mapping,
freeing pages, etc.).

cptr allocation and indexing into cspaces is working correctly.

IPC testing and debugging is coming next.
parent 5a6d94fd
......@@ -49,8 +49,13 @@ microkernel will initialize a cdt with the cnode at the root.
grant
-----
During ipc (and only during ipc), LCD A can grant rights to LCD B. This is how
it works:
Grant can occur at two times: when an lcd is being created, and during ipc.
If LCD A is creating LCD B, A can grant capabilities to B using lcd_cap_grant.
LCD A is responsible for notifying B where the capabilities are in B's
cspace, by some agreed upon protocol.
During ipc, LCD A can grant rights to LCD B. This is how it works:
-- Suppose LCD A has a capability to an object already. The capability
is stored in a cnode and referenced by cptr1.
......@@ -116,29 +121,6 @@ The cspace/cdt data structures can also accommodate this weird case too.
(Note: It's not possible for an object to be inserted multiple times - and
lead to multiple cdts - because the microkernel does the insertion.)
[ 2 ]
Two different cptr's can refer to the same cap slot. For example, with
a cnode table size of 8, 0b000011 and 0b001011 refer to the same slot.
The cptr cache allocator will ensure it generates cptr's for unique slots. If
an LCD manipulates a cptr, it does so at its own peril - cptr's are meant to
be opaque.
Cptr `aliasing' still allows for plenty of cptr's - for a cnode table of
size 8, and 64 bit cptr's, you can get roughly:
4 + 4^2 + 4^3 + 4^4 + ...
cptr's (i.e., plenty).
Note that the cptr cache allocator needs to ensure it also doesn't hand out
a cptr like 0xFFFFFFFFFF .... since this cannot resolve to a cap slot (it will
always follow a table pointer).
See the section "Capability Space Radix Tree" for more details about cspace
traversal.
========================================
COMPARISON TO seL4
========================================
......@@ -171,13 +153,22 @@ does, we do the copying on behalf of the LCDs when rights are granted. (As
mentioned in "Warnings", it is still possible for an LCD to have multiple
capabilities to the same object.)
[ 3 ]
In our microkernel, an LCD can grant capabilities to an LCD it is creating,
using lcd_cap_grant.
========================================
CPTR CACHE
========================================
For now, so that the LCD doesn't need to track which cnodes are used, we
give it a cptr cache for getting a fresh cptr to an unused cnode. This should
be moved at some point to liblcd.
kliblcd (and soon, liblcd) contain cptr cache implementations, so that the
other code inside an LCD doesn't need to track which cnodes are used. Not every
integer is a valid cptr, so the allocation is a bit complicated. See the
"Capability Space Radix Tree" section below for an overview of cptr indexing.
The cache contains a bitmap for each level of the cspace tree. This makes it
easy to generate valid cptr's and track allocation/free's.
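A minimal sketch of the bitmap-per-level idea follows. It is not the kliblcd
implementation: every toy_* name is hypothetical, all three LCD_CPTR_*
parameters are assumed to be 2, and reserving cptr 0 as null is an assumption
taken from the "special cptrs" plan. The observation it relies on is that,
within one level, the slot bits plus the meaningful fanout bits form a dense
index, so bit i of a level's bitmap maps directly to the low bits of a cptr.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values; the parameter names come from internal.h per the text. */
#define LCD_CPTR_DEPTH_BITS  2
#define LCD_CPTR_FANOUT_BITS 2
#define LCD_CPTR_SLOT_BITS   2

#define TOY_LEVELS      (1u << LCD_CPTR_DEPTH_BITS)
#define TOY_LEVEL_SHIFT (LCD_CPTR_SLOT_BITS + \
			 LCD_CPTR_FANOUT_BITS * (TOY_LEVELS - 1))
#define TOY_MAX_PER_LEVEL (1u << TOY_LEVEL_SHIFT)	/* deepest level */
#define TOY_WORDS ((TOY_MAX_PER_LEVEL + 63) / 64)

struct toy_cptr_cache {
	uint64_t bits[TOY_LEVELS][TOY_WORDS];	/* one bitmap per level */
};

/* Number of cap slots addressable at a given level. */
static unsigned toy_level_size(unsigned level)
{
	return 1u << (LCD_CPTR_SLOT_BITS + LCD_CPTR_FANOUT_BITS * level);
}

static void toy_cache_init(struct toy_cptr_cache *c)
{
	memset(c, 0, sizeof(*c));
	c->bits[0][0] |= 1;	/* assumption: cptr 0 is null, never handed out */
}

/* Returns 0 and stores a fresh cptr, or -1 if every slot is taken. */
static int toy_cptr_alloc(struct toy_cptr_cache *c, uint64_t *cptr_out)
{
	unsigned level, i;

	for (level = 0; level < TOY_LEVELS; level++) {
		for (i = 0; i < toy_level_size(level); i++) {
			uint64_t mask = 1ull << (i % 64);

			if (c->bits[level][i / 64] & mask)
				continue;
			c->bits[level][i / 64] |= mask;
			/* level bits on top, dense index in the low bits */
			*cptr_out = ((uint64_t)level << TOY_LEVEL_SHIFT) | i;
			return 0;
		}
	}
	return -1;
}

static void toy_cptr_free(struct toy_cptr_cache *c, uint64_t cptr)
{
	unsigned level = (unsigned)(cptr >> TOY_LEVEL_SHIFT) & (TOY_LEVELS - 1);
	unsigned i = (unsigned)cptr & (toy_level_size(level) - 1);

	c->bits[level][i / 64] &= ~(1ull << (i % 64));
}
```

With these assumptions, a fresh cache hands out cptrs 1, 2, 3 from the root
table and then moves on to level 1, so every cptr it generates is valid by
construction.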
========================================
CAPABILITY SPACE RADIX TREE
......@@ -205,68 +196,98 @@ The cspace is built dynamically by the microkernel as slots are referenced by
an LCD. (This is different from seL4 - threads in seL4 are responsible for
building the cspace using the interface provided by the microkernel.)
An index, or cptr, is resolved using a radix-tree-style look up, but from the
least significant bit rather than the most significant bit. If the bits
resolve to a cap slot, the search is done; otherwise, the table pointer is
followed to the next level, and the next set of bits are considered. Some
examples follow. 0b1101 means binary 1101.
[See notes at the end if you're wondering why this is so complicated.]
Suppose the number of slots per table is 8 - so there are 4 cap slots and
4 table slots.
The shape of the cspace is controlled by three parameters (in internal.h):
Index = 3 = 0b000011:
LCD_CPTR_DEPTH_BITS - controls number of levels in cspace
Since there are 8 entries in the root table, we look at the first three
bits - 011. This indexes into the 3rd cap slot (zero indexed) in the
root cnode table:
                      |
                      |
                      V |
        +---+---+---+---+---+---+---+---+
        |   |   |   |011|   |   |   |   |   level 0 (root table)
        +---+---+---+---+---+---+---+---+
                        |
            cap slots      table slots
0 bits : only root table (2^0 levels)
1 bit : root and one more level (2^1 levels)
2 bits : root and three more levels (2^2 levels)
LCD_CPTR_FANOUT_BITS - controls how many table slots are in the
cnode tables - in other words, the fanout
Index = 6 = 0b000110
0 bits : one table slot (2^0)
1 bit : two table slots (2^1)
2 bits : four table slots (2^2)
First three bits are 110 - this is a table slot, so we follow the pointer
to the next level.
LCD_CPTR_SLOT_BITS - controls how many cap slots are in the
cnode tables; similar to table slots above
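To make the effect of the three parameters concrete, here is a small sketch
that derives the cspace geometry from them. The geom_* helpers are
hypothetical and not part of internal.h. The cptr width formula is an
assumption, generalized from the five-pair layout that results when every
parameter is 2.

```c
#include <stdint.h>

/* Number of levels in the cspace, including the root table. */
static unsigned geom_levels(unsigned depth_bits)
{
	return 1u << depth_bits;
}

/*
 * Bits needed by a cptr: the slot index, one fanout field per non-root
 * level, and the level field (assumed layout, see lead-in).
 */
static unsigned geom_cptr_bits(unsigned depth_bits, unsigned fanout_bits,
			       unsigned slot_bits)
{
	return slot_bits + fanout_bits * (geom_levels(depth_bits) - 1) +
		depth_bits;
}

/* Cap slots in a fully populated cspace, summed over every level. */
static uint64_t geom_total_cap_slots(unsigned depth_bits,
				     unsigned fanout_bits,
				     unsigned slot_bits)
{
	uint64_t tables = 1;	/* tables at the current level; one root */
	uint64_t total = 0;
	unsigned level;

	for (level = 0; level < geom_levels(depth_bits); level++) {
		total += tables << slot_bits;	/* 2^slot_bits per table */
		tables <<= fanout_bits;		/* children per table */
	}
	return total;
}
```

For example, with all three parameters set to 2 this gives 4 levels, a 10 bit
cptr, and 4 + 16 + 64 + 256 = 340 cap slots.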
Next three bits are 000 - this is a cap slot - we're done.
An index, or cptr, encodes the location of a cnode in the cspace. The encoding
includes the level of the table that contains the slot; the fanout "path" to
get to that table; and the slot index inside the table.
The lookup is just like a radix tree lookup (starting from LSB instead of MSB),
but the level bits tell us how far to go / how many fanout bits are meaningful.
                        |
        +---+---+---+---+---+---+---+---+
        |   |   |   |   |   |   |110|   |   level 0
        +---+---+---+---+---+---+-|-+---+
                        |         |
                                  |
                                  V
                        |
        +---+---+---+---+---+---+---+---+
        |000|   |   |   |   |   |   |   |   level 1
        +---+---+---+---+---+---+---+---+
          ^             |
          |
          |
      final slot
If all of the parameters are 2 (2 bits each), the cptr bit layout is the
following:
                                         ____ cap slot index bits
                                        /
                                       /
                00    00   00   00   00
               /      |    |    |
              /       |    |    |
  level -----'        fanout path (like a radix tree path)
Index = 11 = 0b001011
So, in LSB order, the slot index bits come first, then the fanout path, then
the level. The level and slot index bits are always in the same position. The
interpretation of the fanout path bits depends on the level.
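The layout can be sketched as a few field extractors. This assumes all three
parameters are 2, as in the layout just described. The fld_* names and FLD_*
macros are hypothetical, not the names used in internal.h.

```c
#include <stdint.h>

/* Assumed values; the parameter names come from internal.h per the text. */
#define LCD_CPTR_DEPTH_BITS  2
#define LCD_CPTR_FANOUT_BITS 2
#define LCD_CPTR_SLOT_BITS   2

/* Fanout fields present in every cptr: one per non-root level. */
#define FLD_FANOUT_FIELDS ((1u << LCD_CPTR_DEPTH_BITS) - 1)
/* The level field sits above the slot bits and all fanout fields. */
#define FLD_LEVEL_SHIFT (LCD_CPTR_SLOT_BITS + \
			 LCD_CPTR_FANOUT_BITS * FLD_FANOUT_FIELDS)

/* Slot index: always the lowest LCD_CPTR_SLOT_BITS bits. */
static unsigned fld_slot(uint64_t cptr)
{
	return (unsigned)(cptr & ((1u << LCD_CPTR_SLOT_BITS) - 1));
}

/* Level of the table that holds the slot: always the highest field. */
static unsigned fld_level(uint64_t cptr)
{
	return (unsigned)((cptr >> FLD_LEVEL_SHIFT) &
			  ((1u << LCD_CPTR_DEPTH_BITS) - 1));
}

/*
 * Fanout index for one traversal step, counted from the LSB side.
 * step 0 selects the table slot to follow out of the root table.
 * Only steps below fld_level(cptr) are meaningful.
 */
static unsigned fld_fanout(uint64_t cptr, unsigned step)
{
	unsigned shift = LCD_CPTR_SLOT_BITS + step * LCD_CPTR_FANOUT_BITS;

	return (unsigned)((cptr >> shift) &
			  ((1u << LCD_CPTR_FANOUT_BITS) - 1));
}
```

For the bit pattern 11 11 00 10 01 (0x3C9), these return slot 1, level 3, and
fanout indexes 2, 0, 3 for steps 0, 1, 2.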
First three bits are 011 - this is the 3rd cap slot again.
For example, if the cptr is
This demonstrates that two different indexes can refer to the same slot!
11 11 00 10 01
                      |
                      |
                      V |
        +---+---+---+---+---+---+---+---+
        |   |   |   |011|   |   |   |   |
        +---+---+---+---+---+---+---+---+
                        |
the slot index is 01 = 1, and the level of the table is 11 = 3. All three
pairs of fanout bits are used to traverse from the root cnode table to the
final cnode table:
[1] From the root cnode table, we look at the first pair of fanout
bits (in LSB order) = 10 = 2; so we follow the 2nd table slot
(zero indexed) in the root cnode table to level 1
[2] From the level 1 cnode table, we look at the next pair of fanout
bits = 00 = 0; so we follow the 0th table slot in this cnode
table to level 2
[3] Finally, from level 2, we look at the next pair of fanout
bits = 11 = 3; so we follow the 3rd table slot to arrive
at the cnode table in level 3
We now use the cap slot index bits = 01 = 1 to look up the capability.
Another example: if the cptr is
01 00 00 01 11
the level is 1, and only the first pair (01) of fanout bits are meaningful.
Starting in the root cnode table,
[1] we see that the level = 01 = 1 > 0, so we look at the first pair of
fanout bits = 01 = 1; we follow the pointer in the 1st table slot
to level 1
[2] we now see that the level of the cptr = the level we are at, 1, so
we now use the slot bits to look up the slot in the table (11 = 3).
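The two walks above can be sketched as a single loop: follow one table slot
per level until the level encoded in the cptr is reached, then index the cap
slots. This is a toy model, not the microkernel code. struct walk_table and
walk_lookup are hypothetical, an int stands in for a cnode, all parameters are
assumed to be 2, and a missing table simply fails the lookup (the real
microkernel builds the cspace dynamically instead).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values; the parameter names come from internal.h per the text. */
#define LCD_CPTR_DEPTH_BITS  2
#define LCD_CPTR_FANOUT_BITS 2
#define LCD_CPTR_SLOT_BITS   2

#define WALK_LEVEL_SHIFT (LCD_CPTR_SLOT_BITS + \
	LCD_CPTR_FANOUT_BITS * ((1u << LCD_CPTR_DEPTH_BITS) - 1))

/* Toy cnode table: cap slots plus table slots pointing one level down. */
struct walk_table {
	int caps[1u << LCD_CPTR_SLOT_BITS];	/* stand-in for cnodes */
	struct walk_table *children[1u << LCD_CPTR_FANOUT_BITS];
};

/* Returns the cap slot named by cptr, or NULL if a table is missing. */
static int *walk_lookup(struct walk_table *root, uint64_t cptr)
{
	unsigned level = (unsigned)(cptr >> WALK_LEVEL_SHIFT) &
			 ((1u << LCD_CPTR_DEPTH_BITS) - 1);
	struct walk_table *t = root;
	unsigned step;

	/* Depth check: only the first `level` fanout fields are meaningful. */
	for (step = 0; step < level; step++) {
		unsigned shift = LCD_CPTR_SLOT_BITS +
				 step * LCD_CPTR_FANOUT_BITS;
		unsigned idx = (unsigned)(cptr >> shift) &
			       ((1u << LCD_CPTR_FANOUT_BITS) - 1);

		t = t->children[idx];
		if (!t)
			return NULL;
	}
	/* Arrived at the table on the right level; pick the cap slot. */
	return &t->caps[cptr & ((1u << LCD_CPTR_SLOT_BITS) - 1)];
}
```

In this model, looking up 01 00 00 01 11 (0x107) follows table slot 1 out of
the root and returns cap slot 3 of the level 1 table, matching the second
example.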
Notes
-----
It looks more complicated than it is. It's just radix tree traversal with
a depth check.
We tried a similar technique, but without using the level bits. This led to
problems: since a cptr may refer to different levels, it became hard to
know when to stop the radix tree-like traversal.
As an alternative, we tried taking triples of bits from LSB to MSB: If the
high bit was set in a triple, this indicated to keep traversing and look at the
next triple; otherwise, stop. But it's hard to convert a number like 15 to an
index of this type. And this makes alloc/free of cptrs hard (talk to Charlie
for more details).
========================================
CAPABILITY DERIVATION TREE
......@@ -360,8 +381,11 @@ containing cdt and cspace from going away while it is in use (they can't go
away until the cnode is locked and removed from the cspace and/or cdt).
========================================
CALL/REPLY CAPABILITIES
SPECIAL CPTRS / CAPABILITIES
========================================
slot 1 = capability to lcd's endpoint for receiving replies
slot 2 = (dynamic) capability to caller's reply endpoint, during call/reply
To be implemented soon:
cptr 0 = null, always invalid
cptr 1 = capability to lcd's endpoint for receiving replies
cptr 2 = (dynamic) capability to caller's reply endpoint, during call/reply
========================================
OVERVIEW
========================================
The code is inside virt/lcd-domains/kliblcd.c. The header (for non-isolated
kernel code to use) is in include/lcd-domains/kliblcd.h.
A kernel thread can "enter/exit into lcd mode" (similar to cap_enter in
Capsicum) by invoking klcd_enter/klcd_exit. A kernel thread that has entered
lcd mode is called a *kernel lcd* or *klcd*. The functions you see with
klcd_ instead of lcd_ are only part of the kliblcd interface and only
available to non-isolated lcd's.
Upon entering lcd mode, a kernel thread can invoke the functions in the
kliblcd interface for creating lcd's, allocating pages, loading modules, etc.
A klcd has a cspace and utcb for message passing, but does not have an
underlying hardware vm (the thread runs unisolated).
See the kliblcd header for a detailed description of the interface. See the
test cases for examples.
========================================
LCD STATUS
========================================
To do: I will probably remove the suspend state. This seemed like it would be
simple, but the proper handling of it when combined with ipc may be
too difficult to justify right now.
An lcd can be in one of five states:
E = Embryo - just after it is created, not configured with a starting
stack pointer, etc.
C = Configured - stack pointer, starting program counter configured
R = Running - kthread is runnable or running, and may be running
inside vm
S = Suspended - kthread is asleep or will soon sleep
D = Dead - kthread has stopped or will soon stop; lcd may be in
the process of being torn down
_____________________________________
/ lcd_destroy \
lcd_run | |
lcd_suspend | lcd_run, lcd_config |
.__. ^ .__. |
| | .----------->| | | |
\ | / lcd_destroy | \ | |
\ | / ^ \ | |
\ V / | \ V V
create +---+ lcd_config +---+ lcd_run +---+ lcd_destroy +---+
-------->| E |------------->| C |------------->| R |--------------->| D |
+---+ .->+---+ +---+ +---+
/ / / ^ ^ \
/ / / \ | \
/ / / | | \
'---' lcd_suspend | | lcd_run '---'
lcd_config, \ / lcd_run
lcd_suspend V / lcd_suspend
+---+ lcd_destroy
| S | lcd_config
+---+
^ \
| \
| \
'---'
lcd_config, lcd_suspend
The following transitions are an error (return non-zero), and have no effect:
E: lcd_run, lcd_suspend - you must configure the lcd first
C: lcd_config, lcd_suspend - lcd already configured; cannot suspend either
R: lcd_run, lcd_config - lcd already running and config'd
S: lcd_suspend, lcd_config - lcd already suspended and config'd
D: all - lcd is dead
Some of these may be too restrictive, and could change in the future (e.g.,
allow re-config, allow multiple suspend calls - only first one has effect,
rest are no-ops).
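The status rules can be summarized as a transition table. The sketch below is
a model only: the sm_* names are hypothetical and this is not how the
microkernel stores status. It encodes the valid transitions and treats the
listed error cases as "return non-zero, no effect". One entry is an
assumption: lcd_destroy from the suspended state is taken to lead to dead,
since it is not in the error list for S.

```c
/* The five states from the text. */
enum sm_status {
	SM_EMBRYO, SM_CONFIGURED, SM_RUNNING, SM_SUSPENDED, SM_DEAD
};

/* The four operations that drive transitions. */
enum sm_op { SM_OP_CONFIG, SM_OP_RUN, SM_OP_SUSPEND, SM_OP_DESTROY };

#define SM_ERR (-1)	/* marks an invalid transition */

/* Apply op to *st. Returns 0 on success; non-zero leaves *st unchanged. */
static int sm_apply(enum sm_status *st, enum sm_op op)
{
	/* rows: current state; columns: config, run, suspend, destroy */
	static const int next[5][4] = {
		[SM_EMBRYO]     = { SM_CONFIGURED, SM_ERR, SM_ERR, SM_DEAD },
		[SM_CONFIGURED] = { SM_ERR, SM_RUNNING, SM_ERR, SM_DEAD },
		[SM_RUNNING]    = { SM_ERR, SM_ERR, SM_SUSPENDED, SM_DEAD },
		/* destroy from suspended -> dead is an assumption */
		[SM_SUSPENDED]  = { SM_ERR, SM_RUNNING, SM_ERR, SM_DEAD },
		[SM_DEAD]       = { SM_ERR, SM_ERR, SM_ERR, SM_ERR },
	};
	int n = next[*st][op];

	if (n == SM_ERR)
		return -1;
	*st = (enum sm_status)n;
	return 0;
}
```

Relaxing a rule later (for example, allowing re-config) would then be a
one-entry change to the table.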
......@@ -3,192 +3,8 @@
OVERVIEW
========================================
This code is in virt/lcd-domains and include/lcd-domains. It is the
arch-independent layer of the LCD microkernel.
The two main objects are struct lcd and struct lcd_thread, defined in
include/lcd-domains/lcd-domains.h.
struct lcd contains the guest physical address space (in
underlying lcd_arch), and a list of lcd_thread's. It will soon contain
the lcd's cspace when that is incorporated.
struct lcd_thread corresponds with a host kernel thread that is running
inside the hardware virtual machine. It contains a pointer to the thread's
utcb (for easy microkernel access), and a pointer to the underlying
lcd_arch_thread (the hardware vm).
Why have one kernel thread / hardware vm for each lcd_thread? Answer: To keep
the microkernel simple. The microkernel could add an additional layer of
virtualization on top of the hardware vm, so that we didn't have so many
hardware vm's floating around. But it would be complicated and we would then
have to write scheduler code in the microkernel.
A struct lcd is created by providing a module name. The module will be loaded
inside the lcd, and an initial lcd_thread will be created (stored in the
struct lcd's init_thread field) that, when started, will execute the module's
init code.
See also the comments in the header lcd-domains.h (above) and the test cases
in virt/lcd-domains/lcd-tests.c.
========================================
SETUP
========================================
Aside from building and installing the kernel code, you will need to do one
extra step, explained in detail below.
Background
----------
We don't want tricky logic for locating modules, so we want to re-use the
request_module facility in the kernel. But this relies on the user space
modprobe tools. So, we did the following:
-- we modified the module loading code in the kernel so that a caller
can safely load a module that is destined for an lcd in the host
(modules destined for an lcd *will not* have their init code executed
when installed in the host, nor their exit code executed when they
are uninstalled from the host)
-- we added an ioctl interface that user code can use to
load a module destined for an lcd; it uses the patched module loading
code
-- we created a patched modprobe that uses this interface
-- we patched request_module to allow kernel code to load a module
destined for an lcd, using the patched modprobe
So, when you call lcd_create, the kernel loads the module using the patched
modprobe.
This means you need to have the patched modprobe properly installed!
Step 1
------
Build and install the kernel and all modules. In the root directory of the
kernel source,
[ 1 ] make menuconfig
-- go into Virtualization (2) and select Lightweight Capability
Domains and Intel Support for LCDs
-- it is recommended you build them as modules, for debugging
[ 2 ] exit and save the configuration
[ 3 ] make
-- use make -j 8 if e.g. you have 8 cores, will go faster
[ 4 ] sudo make modules_install install
-- order is important!
-- this should automatically update the grub boot menu
Step 2 - Patched Modprobe Setup
-------------------------------
The patched version is inside tools/module-init-tools.
To build and install, enter the module-init-tools directory,
and do the following:
[ 1 ] aclocal -I m4 && automake --add-missing --copy && autoconf
[ 2 ] ./configure --prefix=/ --program-prefix=lcd-
[ 3 ] make
[ 4 ] (sudo) make install
This will install the patched /sbin/lcd-modprobe and /sbin/lcd-insmod,
as well as the other init tools that were left untouched. The
request_module will use lcd-modprobe to load the module.
The man pages won't install on emulab (since /share is read only).
You can specify a different man dir via configure if you wish.
[Note: The only changes to init tools are in modprobe.c and insmod.c; only
the changes in modprobe.c are of interest (lcd-insmod is not currently
used/needed). Instead of doing the Linux init_module system call,
lcd-modprobe does an ioctl call to the LCD driver (hence, the LCD driver
must be loaded), with the bytes of the module, its size, and command
line options.]
Step 3 - Reboot and install
---------------------------
After rebooting the machine, select the new kernel to boot it.
After booting, if you built the lcd system as modules, do:
[ 1 ] insmod ${MODULE_PATH}/arch/x86/lcd-domains/lcd-domains-arch.ko
[ 2 ] insmod ${MODULE_PATH}/virt/lcd-domains/lcd-domains.ko
where ${MODULE_PATH} is something like /lib/modules/3.10.14/kernel.
This will install the lcd system.
You can now create an lcd using the example below.
========================================
EXAMPLE
========================================
Here is an example of how to start up an lcd with a module named foo.ko. foo.ko
should already be compiled and installed in the system's module load path.
	struct lcd *lcd;
	int ret;

	/*
	 * Create the lcd
	 */
	ret = lcd_create("foo.ko", &lcd);
	if (ret)
		return ret;

	/*
	 * Start the lcd's init thread (will run foo.ko's init routine)
	 */
	ret = lcd_thread_start(lcd->init_thread);
	if (ret)
		goto out;

	/* (...wait for a while, maybe sleep...) */

	/*
	 * Kill the init thread
	 */
	ret = lcd_thread_kill(lcd->init_thread);
out:
	/*
	 * Tear down the LCD (also done if starting the thread failed)
	 */
	lcd_destroy(lcd);
========================================
MODULE LOADING
========================================
This one is a real zinger.
========================================
GUEST VIRTUAL ADDRESS SPACE
========================================
A good chunk of the current arch-independent code is for setting up the
boot guest virtual address space for an lcd. We assume that the lcd will take
over managing this, so we've kept allocation logic dirt simple.
Note that the microkernel is protected from what the lcd does to its guest
virtual address space. The microkernel manages the lcd's guest physical
address space, and the host pages the lcd has access to, so it can safely
write to memory without causing a page fault.
This documents the code inside virt/lcd-domains/main.c. This code handles
create/destroy of lcd's, page allocation, and running lcd's. kliblcd calls
into this code to carry out these operations.
See Documentation/lcd-domains/kliblcd.txt for more info.
========================================
OVERVIEW
========================================
The arch-independent code is in virt/lcd-domains. It is the arch-independent
layer of the LCD microkernel.
The main objects are struct cspaces, struct lcds, struct endpoints, defined
in virt/lcd-domains/internal.h.
White and black box test cases are in virt/lcd-domains/tests. (These are just
included at the bottom of the corresponding source file, and the tests are
run when the microkernel is loaded.)
struct lcd has an associated host kernel thread that is running inside
a hardware virtual machine. It contains a pointer to the lcd's utcb, some
status info, and a cspace.
External code should use kliblcd to interact with the microkernel and create
lcd's; see Documentation/lcd-domains/kliblcd.txt.
See also the comments in the internal.h header and tests.
========================================
SETUP
========================================
Aside from building and installing the kernel code, you will need to do one
extra step, explained in detail below.
Background
----------
We don't want tricky logic for locating modules, so we want to re-use the
request_module facility in the kernel. But this relies on the user space
modprobe tools. So, we did the following:
-- we modified the module loading code in the kernel so that a caller
can safely load a module that is destined for an lcd in the host
(modules destined for an lcd *will not* have their init code executed
when installed in the host, nor their exit code executed when they
are uninstalled from the host)
-- we added an ioctl interface that user code can use to
load a module destined for an lcd; it uses the patched module loading
code
-- we created a patched modprobe that uses this interface
-- we patched request_module to allow kernel code to load a module
destined for an lcd, using the patched modprobe
So, when you call lcd_create, the kernel loads the module using the patched
modprobe.
This means you need to have the patched modprobe properly installed!
Step 1
------
Build and install the kernel and all modules. In the root directory of the
kernel source,
[ 1 ] make menuconfig
-- go into Virtualization (2) and select Lightweight Capability
Domains and Intel Support for LCDs
-- it is recommended you build them as modules, for debugging
[ 2 ] exit and save the configuration
[ 3 ] make
-- use make -j 8 if e.g. you have 8 cores, will go faster
[ 4 ] sudo make modules_install install
-- order is important!
-- this should automatically update the grub boot menu
Step 2 - Patched Modprobe Setup
-------------------------------
The patched version is inside tools/module-init-tools.
To build and install, enter the module-init-tools directory,
and do the following:
[ 1 ] aclocal -I m4 && automake --add-missing --copy && autoconf
[ 2 ] ./configure --prefix=/ --program-prefix=lcd-
[ 3 ] make
[ 4 ] (sudo) make install