
Commit 8198c2fb authored by Charlie Jacobsen, committed by Vikram Narayanan

Major overhaul of build process.

Full kernel build no longer required. Yay! This should
cut down on dev time a lot.

I moved all of the LCD source into $(kernel-src)/lcd-domains/,
so it's all in one spot. There is now a top-level makefile in
there that triggers building liblcd, the microkernel, and the
examples. This is built as an *external* build now, even
though the directory is in the kernel source. The build now takes
under a minute to do everything LCD related.

This should also make verification easier in the future (e.g.
building with clang) if we aren't ensnared in the kernel

Of course, to use the microkernel and examples, you have to
build the patched kernel and install it. But now when you
make a few lines of changes in e.g. an example, you don't have
to trigger a top-level kernel build to rebuild it. Running
the full kernel build takes on average about 3-4 minutes
(some files are generated every time, linking is done, and so
on), and can take upwards of 30 minutes for a full build if you

Which brings me to my other change: no more config for LCDs
in menuconfig. If we create menu entries for every example
and so on, we end up changing the config too often, and this
triggers full kernel rebuilds == waste of time. We can use
macros by setting them via compiler flags (e.g., -DSOME_FLAG).
Furthermore, it wasn't making sense to me to do conditional
compilation for LCD support (we always want to compile for that).
Yes, changes aren't clearly delineated with macros, but you can
see changes made by just doing 'git diff v3.10.14 some-file-or-dir'.

The wiki has been fully updated with instructions for building,
and other relevant parts (updated paths to files).

I also took the opportunity to clean up some old stuff lying around
that is dead (like lcdguest). I incorporated all of the documentation
in Documentation/lcd-domains into the wiki so it's all in one
spot now (including some helpful debug tips).
parent af74008b
-- If your code fails, but e.g. modprobe or insmod hangs, the cpu may
be stuck in VMX Non-root mode or something. You will need to reboot.
-- If you get a page fault inside the lcd, confirm you put the __init
annotation on your module_init routine. If you don't do that, the
init routine won't be linked with the module.
-- You can rate limit printk so you're e.g. not printing an error message
after every vm exit.
-- If you get vm exits from nmi's, you should be ok - you probably have
the nmi watchdog turned on. nmi's fire periodically, and the nmi watchdog
just does some routine checks. It will print out warnings if there's actually
a problem.
-- Beware of putting printk's inside nmi handlers. Doing a printk inside an
nmi is in general not safe, because printk uses locks - if code takes the
lock and gets interrupted by an nmi, the nmi handler will spin forever trying
to take the same lock. And nmi's won't fire again until that nmi handler
returns, so you get a hard lockup. (More recent kernel versions use safer
printk handling inside nmi's, if I'm not mistaken - deferring the printk until
it's safe to do so.)
-- There may be a bug in the Broadcom bnx2 ethernet driver that was fixed
in the upstream kernel after we branched off version 3.10.14. You might
see messages like 'watchdog: timeout on eth0 (bnx2)', and you may lose
connectivity and possibly hang (if you're trying to access a file via nfs).
-- There may be bad interactions with KVM code if you load it. This might
be the source of the bad hang, but I'm not sure.
-- See also some of the tips in liblcd.txt: Notes & Suggestions when debugging
page faults, etc. inside an LCD.
-- If you get linking errors or redefined symbol errors, you might be using
a different configuration than what I used when I set up liblcd. You will
need to either change your configuration, or modify liblcd to resolve the
symbol errors. (This is one reason why we should build liblcd in a separate
tree, in the future.)
-- If you have lockdep turned on with `proving correctness', you will
get some warnings when you load the LCD module. This is because the code
in main.c and cap.c uses some wild locking that could possibly lead to
deadlocks (it hasn't yet). So lockdep dumps warnings. I haven't bothered
inserting the code to prevent lockdep from complaining.
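For the rate-limiting tip above: inside the kernel the usual tool is printk_ratelimited(). The following is only a userspace sketch of the underlying idea (allow a small burst of messages per time window and drop the rest). The toy_ names, the tick-based clock, and the interval and burst values are all assumptions made for illustration; none of this is kernel or LCD code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a rate-limit state. */
struct toy_ratelimit {
    long interval;      /* window length, in arbitrary ticks */
    int burst;          /* messages allowed per window */
    long window_start;  /* tick at which the current window began */
    int printed;        /* messages already emitted in this window */
};

/* Returns 1 if a message may be printed at tick `now`, 0 otherwise. */
static int toy_ratelimit_ok(struct toy_ratelimit *rl, long now)
{
    assert(rl->interval > 0 && rl->burst > 0);
    if (now - rl->window_start >= rl->interval) {
        /* A new window starts: reset the message budget. */
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed >= rl->burst)
        return 0;
    rl->printed++;
    return 1;
}

/*
 * Simulate `count` failing vm exits, one per tick starting at tick
 * `start`, and return how many error messages got through.
 */
static int toy_simulate_vm_exits(struct toy_ratelimit *rl, long start, int count)
{
    int printed = 0;
    int i;

    for (i = 0; i < count; i++) {
        if (toy_ratelimit_ok(rl, start + i)) {
            printf("vm exit error at tick %ld\n", start + i);
            printed++;
        }
    }
    return printed;
}
```

With interval = 100 and burst = 3, calling toy_simulate_vm_exits(&rl, 0, 250) lets 9 messages through instead of 250.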
Recall that LCDs refer to capabilities in their cspace using
integer identifiers (similar to a file descriptor); these are
capability pointers, or cptr_t's.
An LCD has 8 64-bit general registers and 8 capability pointer (cptr_t)
registers. General registers are for scalar arguments. Capability pointer
registers are for granting capabilities. An LCD accesses its registers via:
u64 lcd_r0(void)
... reading general registers
u64 lcd_r8(void)
void lcd_set_r0(u64 val)
... writing to general registers
void lcd_set_r8(u64 val)
cptr_t lcd_cr0(void)
... reading capability registers
cptr_t lcd_cr8(void)
void lcd_set_cr0(cptr_t val)
... writing to capability registers
void lcd_set_cr8(cptr_t val)
I will explain by example.
Suppose LCD A has:
-- a send capability to a rendezvous point for communicating with LCD B,
referenced by cptr_t c1
-- a capability to a page referenced by cptr_t c2
and that LCD B has:
-- a receive capability on the same rendezvous point, referenced by cptr_t c3
Suppose LCD A wants to grant the page capability to LCD B, and LCD B is
expecting to be granted this capability, and wants to reference the granted
capability via cptr_t c4. A few things need to happen.
First, LCD B needs to allocate a cnode in its cspace:
c4 = lcd_cnode_alloc();
Second, LCD B needs to do a receive, and put c4 in its capability register:
Third, LCD A needs to invoke a send:
The microkernel will match up the send and receive. It will copy the page
capability referred to in *LCD A's cspace* by c2 to the cnode in *LCD B's
cspace* referred to by c4.
LCD A could also pass along scalar arguments to LCD B during the same
send invocation.
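The grant can be pictured with a toy model. This is a sketch of the semantics only: it does not use the real liblcd calls, and every toy_ name, the fixed-size cspace array, the single capability register, and the rule that the receiving slot must be empty are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CSPACE_SLOTS 8

typedef unsigned long toy_cptr_t;   /* stand-in for cptr_t */

enum toy_cap { TOY_CAP_NONE = 0, TOY_CAP_ENDPOINT, TOY_CAP_PAGE };

struct toy_lcd {
    enum toy_cap cspace[TOY_CSPACE_SLOTS]; /* slot index plays the role of a cptr */
    toy_cptr_t cr0;                        /* one capability register */
};

/*
 * What the (toy) microkernel does once it has matched a send with a
 * receive: copy the capability named by the sender's cr0 into the
 * slot named by the receiver's cr0. Returns 0 on success, -1 on error.
 */
static int toy_match_send_recv(struct toy_lcd *sender, struct toy_lcd *receiver)
{
    toy_cptr_t src = sender->cr0;
    toy_cptr_t dst = receiver->cr0;

    if (src >= TOY_CSPACE_SLOTS || dst >= TOY_CSPACE_SLOTS)
        return -1;
    if (sender->cspace[src] == TOY_CAP_NONE)
        return -1;                  /* nothing to grant */
    if (receiver->cspace[dst] != TOY_CAP_NONE)
        return -1;                  /* assumed rule: slot must be empty */
    receiver->cspace[dst] = sender->cspace[src];
    return 0;
}

/* Walk through the scenario from the text; returns 0 if it works. */
static int toy_grant_page_example(void)
{
    struct toy_lcd a = { { TOY_CAP_NONE }, 0 };
    struct toy_lcd b = { { TOY_CAP_NONE }, 0 };
    const toy_cptr_t c1 = 1, c2 = 2, c3 = 3, c4 = 4;

    a.cspace[c1] = TOY_CAP_ENDPOINT;  /* A's send capability */
    a.cspace[c2] = TOY_CAP_PAGE;      /* A's page capability */
    b.cspace[c3] = TOY_CAP_ENDPOINT;  /* B's receive capability */

    b.cr0 = c4;  /* B: empty slot c4 goes in its capability register */
    a.cr0 = c2;  /* A: the capability to grant goes in its register */

    if (toy_match_send_recv(&a, &b) != 0)
        return -1;
    /* The capability was copied: both cspaces now hold the page. */
    return (b.cspace[c4] == TOY_CAP_PAGE && a.cspace[c2] == TOY_CAP_PAGE) ? 0 : -1;
}
```

The point of the model is that c2 and c4 are meaningful only inside their own cspace; the two LCDs never share a cptr value, only the capability it names.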
Call/reply takes the place of two send/recv pairs. Instead of:
lcd_send( ... )
lcd_recv( ... )
lcd_recv( ... )
lcd_send( ... )
the two LCD's can do:
lcd_call( ... )
lcd_reply( ... )
The code is inside virt/lcd-domains/kliblcd.c. The header (for non-isolated
kernel code to use) is in include/lcd-domains/kliblcd.h.
A kernel thread can "enter/exit into lcd mode" (similar to cap_enter in
Capsicum) by invoking klcd_enter/klcd_exit. A kernel thread that has entered
lcd mode is called a *kernel lcd* or *klcd*. The functions you see with
klcd_ instead of lcd_ are only part of the kliblcd interface and only
available to non-isolated lcd's.
Upon entering lcd mode, a kernel thread can invoke the functions in the
kliblcd interface for creating lcd's, allocating pages, loading modules, etc.
A klcd has a cspace and utcb for message passing, but does not have an
underlying hardware vm (the thread runs unisolated).
See the kliblcd header for a detailed description of the interface. See the
test cases for examples.
An lcd can be in one of four states:
E = Embryo - just after it is created, not configured with a starting
stack pointer, etc.
C = Configured - stack pointer, starting program counter configured
R = Running - kthread is runnable or running, and may be running
inside vm
D = Dead - kthread has stopped or will soon stop; lcd may be in
the process of being torn down
                                 lcd_destroy
           .-----------------------------------------------------.
           |                            lcd_destroy              |
           |                .----------------------------------. |
           |                |                                  V V
 create  +---+ lcd_config +---+   lcd_run   +---+ lcd_destroy +---+
-------->| E |----------->| C |------------>| R |------------>| D |
         +---+            +---+             +---+             +---+
(Only state-changing transitions are drawn; the error transitions are listed
below.)
The following transitions are an error (return non-zero), and have no effect:
E: lcd_run - you must configure the lcd first
C: lcd_config - lcd already configured
R: lcd_run, lcd_config - lcd already running and config'd
D: all - lcd is dead
Some of these may be too restrictive, and could change in the future (e.g.,
allow re-config).
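The four-state rules above can be written as a small transition function. This is a sketch, not the kliblcd implementation: the toy_ names and the -1 error value are assumptions (the text only says errors return non-zero).

```c
#include <assert.h>
#include <stddef.h>

enum toy_lcd_state { TOY_EMBRYO, TOY_CONFIGURED, TOY_RUNNING, TOY_DEAD };
enum toy_lcd_op { TOY_OP_CONFIG, TOY_OP_RUN, TOY_OP_DESTROY };

/*
 * Apply `op` to `*state`. Returns 0 and updates the state for a valid
 * transition; returns non-zero and leaves the state untouched for an
 * error transition.
 */
static int toy_lcd_transition(enum toy_lcd_state *state, enum toy_lcd_op op)
{
    assert(state != NULL);
    switch (*state) {
    case TOY_EMBRYO:
        if (op == TOY_OP_CONFIG) { *state = TOY_CONFIGURED; return 0; }
        if (op == TOY_OP_DESTROY) { *state = TOY_DEAD; return 0; }
        return -1;  /* run: you must configure the lcd first */
    case TOY_CONFIGURED:
        if (op == TOY_OP_RUN) { *state = TOY_RUNNING; return 0; }
        if (op == TOY_OP_DESTROY) { *state = TOY_DEAD; return 0; }
        return -1;  /* config: lcd already configured */
    case TOY_RUNNING:
        if (op == TOY_OP_DESTROY) { *state = TOY_DEAD; return 0; }
        return -1;  /* run, config: already running and config'd */
    case TOY_DEAD:
    default:
        return -1;  /* all operations fail on a dead lcd */
    }
}
```

Keeping the error cases as plain "return non-zero, change nothing" makes it easy to relax a rule later (for example, to allow re-config) by editing a single case.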
This code is the most complicated part of kliblcd.c. We package up all of the
context and data for setting up a module LCD inside struct lcd_info. This
contains lists of pages we've mapped in the LCD, the temporary cptr
cache we're using to set up the LCD's cspace, and so on. This is done so
that we can properly boot the LCD and tear everything down later.
There are two main parts: loading the module and setting up the VM.
Loading the module happens in:
Setting up the VM happens in:
lcd_create_module_lcd loads the module and sets up the LCD's address space.
The caller can then finish the boot process by populating the boot info
pages for the LCD, providing it with endpoints, and so on.
See the examples in test-mods/ for usage.
An lcd can be in one of five states:
E = Embryo - just after it is created, not configured with a starting
stack pointer, etc.
C = Configured - stack pointer, starting program counter configured
R = Running - kthread is runnable or running, and may be running
inside vm
S = Suspended - kthread is asleep or will soon sleep
D = Dead - kthread has stopped or will soon stop; lcd may be in
the process of being torn down
                                 lcd_destroy
           .-----------------------------------------------------.
           |                            lcd_destroy              |
           |                .----------------------------------. |
           |                |                                  V V
 create  +---+ lcd_config +---+   lcd_run   +---+ lcd_destroy +---+
-------->| E |----------->| C |------------>| R |------------>| D |
         +---+            +---+             +---+             +---+
                                             | ^                ^
                                 lcd_suspend | | lcd_run        |
                                             V |                |
                                            +---+  lcd_destroy  |
                                            | S |---------------'
                                            +---+
(Only state-changing transitions are drawn; the error transitions are listed
below.)
The following transitions are an error (return non-zero), and have no effect:
E: lcd_run, lcd_suspend - you must configure the lcd first
C: lcd_config, lcd_suspend - lcd already configured; cannot suspend either
R: lcd_run, lcd_config - lcd already running and config'd
S: lcd_suspend, lcd_config - lcd already suspended and config'd
D: all - lcd is dead
Some of these may be too restrictive, and could change in the future (e.g.,
allow re-config, allow multiple suspend calls - only first one has effect,
rest are no-ops).
This is the minimal libkernel that should be linked with an LCD. Right now,
it contains code for ipc, kmalloc, and page alloc, but requires whoever
boots the LCD to do some proper boot setup. See the example in
The code is in liblcd/ and is built before we recursively descend into the
other directories. This is so we don't have 10 different recursive makes
trying to build the code and link it into lib.a (from the dependents
in the test-mods folders). I slightly tweaked the top-level Makefile to do
I also slightly tweaked module building to allow for linking libraries with
modules. Most of the time, the build and link just worked, but occasionally,
liblcd was listed first in the command to LD, and hence wasn't linked with the
rest of the objects. (Recall that if a library is listed at the beginning
of the list of files to link with LD, it won't get linked with any of the
files, since there are no outstanding dependencies that require it.)
See the examples in test-mods/. You should include the following three headers
in every source file for a module that will go inside an LCD:
... other headers ...
<lcd-domains/liblcd-hacks.h> /* recommended */
You should also list liblcd as a dependency in the Makefile. Again, see
the examples.
** You may need to do some extra work, see NOTES & SUGGESTIONS. **
IMPORTANT: Do not put liblcd-hacks.h inside header files. Here's a problematic
file.h:
#include <lcd-domains/liblcd-config.h> /* GOOD */
#include <linux/mm.h>
#include <linux/types.h>
#include <lcd-domains/liblcd-hacks.h> /* BAD */
#define FOO(x) x
static inline int bar(int x) { return x; }
typedef unsigned long gfp_t; /* this will break in file.c */
file.c:
#include <lcd-domains/liblcd-config.h> /* GOOD */
#include "file.h" /* <<< BAD */
#include <linux/gfp.h> /* you will get a redefine error */
The hacks header file can also undefine symbols that other host kernel
headers are expecting. So, to be safe, only put it in source code (.c) files
after *all* of the other headers.
I will explain by example.
Suppose you want to pull the function foo into liblcd, but you don't want
to reimplement it (imagine foo is something complicated like kmalloc).
First, you figure out that foo is declared in include/linux/foo.h and
defined in mm/foo.c. The first step is to make a copy of foo.c and put it
in liblcd/mm (I use the same directory structure as the kernel).
foo uses a lot of conditional compilation (using CONFIG_* macros), and you
want it to build in a certain way. Define or undefine the correct CONFIG_*
macros in liblcd-config.h, and put <lcd-domains/liblcd-config.h> at the
top of foo.c. This will make all of the code in foo.c and the headers it
includes have the proper configuration.
foo has a number of dependencies. There are five possible types for kernel
1 -- foo calls another function in foo.c
2 -- foo calls another function in a different file that *is not* exported
3 -- foo calls another function in a different file that *is* exported
4 -- foo calls an inline function in a header it includes
5 -- foo uses a macro defined in one of the headers it includes
The strategy depends on the type and how much complexity you want to
Suppose foo calls bar, and bar is in foo.c. The call to bar will work, but
you will need to ensure all of bar's dependencies are fulfilled.
Suppose foo calls bar in bar.c, and bar is not exported. You have to resolve
this dependency or else linking will fail. You can pull bar.c into liblcd, or
you can emulate bar in <lcd-domains/hacks.h>, either by eliding it away or
emulating it with other functions that are in liblcd. Pulling in bar.c
may be more complicated, but preferable if other code depends on it and
bar isn't too complicated. Other functions in bar.c may not be needed, and
you can fulfill their dependencies with some major hacks and elision.
Suppose foo calls bar in bar.c, and bar is exported. First, bar may never be
called for your scenarios, and if this is the case, it's probably best to
elide it by putting
a hack in <lcd-domains/hacks.h>. Alternatively, you can suck in the file
bar.c. <lcd-domains/hacks.h> will elide the EXPORT_SYMBOL macros, so the
build system won't get confused when it sees a double export. Finally, you
can also choose not to resolve this dependency at all - if bar.c is
built for the host kernel, the build system will see that bar is exported,
and it won't complain when it tries to build and link foo.c in a library
or module (it will assume the dependency will be resolved when the module
is installed). Of course, if bar happens to be called unexpectedly inside
the LCD, you would probably get a page fault since bar is not linked.
Except for the lcd/ subdirectory, all of the source code is from the original
kernel, with very few changes (some files just have the two headers added -
liblcd-config.h and liblcd-hacks.h).
liblcd-config.h changes the build configuration so that the code will be
built for a uniprocessor machine with one NUMA node, no debugging, etc. This
was set up until I got it working; it may not be fully correct or work in
all build scenarios.
For macros and inlines, if they don't cause trouble, you don't have to do
anything. But if they contain code that will break things, your only option
is to #undef them and emulate them in the hacks header.
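A minimal sketch of the #undef-and-emulate approach, squeezed into one file so it can be compiled on its own. The three marked regions stand in for a host kernel header, the hacks header, and the code pulled into liblcd. All toy_ names are hypothetical; the real macros you need to override depend on what the pulled-in code uses.

```c
#include <assert.h>

/* ---- stand-in for a host kernel header (hypothetical) ---- */
/* Imagine this macro expands to code that only works in the host. */
#define toy_host_check() toy_host_only_function()

/* ---- stand-in for the hacks header, included last ---- */
#undef toy_host_check
#define toy_host_check() do { } while (0)   /* elide the check */

/* ---- code pulled into the library, left unchanged ---- */
static int toy_foo(int x)
{
    toy_host_check();   /* now expands to an empty statement */
    return x + 1;
}
```

Note that toy_host_only_function is never declared or defined here. The file still builds because the override replaces the macro before its first use, which is also why the hacks header has to come after all of the other headers.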
Your goal is to make as few changes as possible - the more changes you make,
the easier it is to introduce bugs. You'll notice in liblcd/mm/slab.c,
I carefully mark where I made changes using /* BEGIN LCD */ and /* END LCD */.
** IMPORTANT: If you have global variables that are uninitialized (in the
BSS section), you will need to manually zero them out at some point before
they are used in the LCD. You can see what those variables are by doing
something like
nm my-module.ko | grep '.* b '
nm my-module.ko | grep '.* B '
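The manual zeroing itself can be as simple as a memset over each affected global before first use. A minimal sketch: the toy_ names are hypothetical stand-ins for whatever uninitialized globals nm reports for your module, and where you call the zeroing function (some early init path in the LCD) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical uninitialized globals; these would land in BSS. */
struct toy_cache_state {
    unsigned long nr_objects;
    void *freelist;
};
static struct toy_cache_state toy_state;
static int toy_counters[16];

/* Call this once, before any code in the LCD touches the globals. */
static void toy_zero_bss_globals(void)
{
    memset(&toy_state, 0, sizeof(toy_state));
    memset(toy_counters, 0, sizeof(toy_counters));
}

/* Helper for checking the result; returns 1 if everything is zero. */
static int toy_bss_is_zero(void)
{
    size_t i;

    if (toy_state.nr_objects != 0 || toy_state.freelist != NULL)
        return 0;
    for (i = 0; i < sizeof(toy_counters) / sizeof(toy_counters[0]); i++)
        if (toy_counters[i] != 0)
            return 0;
    return 1;
}
```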
Some variables marked as __initdata do not show up as b or B via nm. I am
not sure if these are properly zeroed or not, so beware (I'm zeroing
some out to be safe). You can see all of the symbols using:
readelf -s my-module.ko
You will see lines like this:
82: 0000000000000000 360 OBJECT LOCAL DEFAULT 19 init_kmem_cache_node
This says init_kmem_cache_node is a local variable that resides in section
19. To list all sections, do:
readelf -S my-module.ko
You will see lines like this (this is section 19):
[19] PROGBITS 0000000000000000 0000a3a0
0000000000000168 0000000000000000 WA 0 0 32
Note that init_kmem_cache_node is marked as __initdata, so it appears in this
While pulling code into liblcd, you can build it and then see what
symbols are unresolved via nm. To be safe, you should go through every
line of the source code to see what the dependencies are, so that you
are using the macros/inlines/etc. that you expect.
When linking with a module, you can make sure all dependencies are resolved
by running nm on it, e.g.,
nm my-module.ko
This is after my-module.ko has been built and linked with liblcd/lib.a.
If you get page faults, you can look at the kernel logs to see where the
module was loaded in the host. Take the faulting address, and subtract off
the starting address of the core code that was loaded. Now objdump the
kernel module, and locate the address in there.
For example, if the faulting address was 0x1234, and the module was loaded
at address 0x1200, the offset into the module is 0x1234 - 0x1200 = 0x34.
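The same arithmetic as a small helper, in case you want to script it. The function name is hypothetical, and it assumes both addresses come from the same load of the module.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of a faulting address into the module, given the address at
 * which the module's core code was loaded in the host.
 */
static uint64_t toy_module_offset(uint64_t fault_addr, uint64_t load_addr)
{
    assert(fault_addr >= load_addr);
    return fault_addr - load_addr;
}
```

For the example above, toy_module_offset(0x1234, 0x1200) gives 0x34, which is the offset to look for in the objdump output of the module.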