Commit c667fea5 authored by Charlie Jacobsen, committed by Vikram Narayanan

Two simple IPC tests are passing. Still getting mysterious hang.

Working on moving Linux's mm into lcd's.

I gave up trying to debug the hang. Confirmed the pages for the lcd's
vm are the ones I expect. Turned on red zones. All tests are passing.
Hang happens after insmod/rmmod of the lcd module about 10 - 20 times, it
varies. Sometimes one core just silently dies / doesn't even respond to
an NMI. Sometimes the ethernet driver complains (this could be an
unrelated bug that was fixed upstream).

Few things in this commit:

  1
=====

Updated documentation in Documentation/lcd-domains/.

  2
=====

Baby version of lib kernel, inside arch/x86/lcd-domains/liblcd.c.
Unfortunately due to the recursive make, this needs to be textually
included inside the modules destined for lcd's, for now.

  3
=====

Added new test modules and modified directory structure and
build system. See documentation in Documentation/lcd-domains.

  4
=====

A few tweaks to the nmi handler to print a backtrace. May remove that in
the future, as it's probably not safe to do inside an nmi handler (but if
we're in that error state, we might be desperate to know what's happening ...).

  5
=====

Changed interrupt handling in arch-dependent code. The KVM code we were using
is probably wrong for 64-bit - it doesn't properly switch stacks, etc., which
is super important for 64-bit and may be impossible to emulate in
software. I think this could be stale code inside KVM, but not sure. Dune
doesn't use it. KVM doesn't ack external interrupts on vm exit, so I think
this interrupt emulation code is always skipped (at least for non-nested
VMs).

Instead, we're not ack'ing interrupts on exit, and letting the native code
do the right thing, like Dune.

I was thinking this might be the source of the bad hang (stack
overflow, e.g.), but not true.

Conflicts:
	include/linux/sched.h
	kernel/watchdog.c
	virt/lcd-domains/lcd-cspace-tests2.c
Resolved-by: Vikram Narayanan <vikram186@gmail.com>
parent d933576e
......@@ -29,6 +29,21 @@ capabilities in LCD B's and LCD C's cspaces, using the cdt.
So, cspaces are contained in an LCD, but cdts have pointers that span across
cspaces.
========================================
RELEVANT FILES
========================================
include/lcd-domains/types.h - cptr definition, simple functions,
cspace configuration macros
virt/lcd-domains/{internal.h, cap.c} - cspace implementation
virt/lcd-domains/kliblcd.c - cptr cache implementation for
klcd's
arch/x86/lcd-domains/liblcd.c - cptr cache implementation for
regular lcd's
========================================
OPERATIONS
========================================
......@@ -384,8 +399,6 @@ away until the cnode is locked and removed from the cspace and/or cdt).
SPECIAL CPTRS / CAPABILITIES
========================================
To be implemented soon:
cptr 0 = null, always invalid
cptr 1 = capability to lcd's endpoint for receiving replies
cptr 2 = (dynamic) capability to caller's reply endpoint, during call/reply
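A minimal user-space sketch of how the reserved cptr values listed above
might be named and checked. The type and every name here (sketch_cptr_t,
SKETCH_CPTR_*) are hypothetical and are not taken from
include/lcd-domains/types.h; treating 3 as the first generally usable value
is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for cptr_t; the real definition may differ. */
typedef struct { unsigned long v; } sketch_cptr_t;

/* Reserved values, following the list above. */
enum {
    SKETCH_CPTR_NULL       = 0, /* null, always invalid */
    SKETCH_CPTR_REPLY_RECV = 1, /* lcd's endpoint for receiving replies */
    SKETCH_CPTR_CALLER     = 2, /* caller's reply endpoint (call/reply) */
    SKETCH_CPTR_FIRST_FREE = 3, /* assumption: first non-reserved value */
};

/* True if c is the null cptr, which never names a capability. */
static inline bool sketch_cptr_is_null(sketch_cptr_t c)
{
    return c.v == SKETCH_CPTR_NULL;
}

/* True if c is one of the special values and must not be handed out. */
static inline bool sketch_cptr_is_reserved(sketch_cptr_t c)
{
    return c.v < SKETCH_CPTR_FIRST_FREE;
}
```

An allocator built on this would skip every value for which
sketch_cptr_is_reserved() returns true.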
========================================
EXPERIENCES
========================================
-- If your code fails, but e.g. modprobe or insmod hangs, the cpu may
be stuck in VMX Non-root mode or something. You will need to reboot.
-- If you get a page fault inside the lcd, confirm you put the __init
compiler flag on your module_init routine. If you don't do that, the
init routine won't be linked with the module.
-- You can rate limit printk so you're e.g. not printing an error message
after every vm exit.
-- If you get vm exits from nmi's, you should be ok - you probably have
the nmi watchdog turned on. nmi's fire periodically, and the nmi watchdog
just does some routine checks. It will print out warnings if there's actually
a problem.
-- Beware of putting printk's inside nmi handlers. Doing a printk inside an
nmi is in general not safe, because printk uses locks - if code takes the
lock and gets interrupted by an nmi, the nmi will block trying to take the
lock. And nmi's won't fire again until that nmi handler returns (iret), so
you get a hard lockup. (More recent kernel versions use safer printk handling
inside nmi's, if I'm not mistaken - deferring the printk until it's safe to
do so.)
-- There may be a bug in the Broadcom bnx2 ethernet driver that was fixed
in the upstream kernel after we branched off version 3.10.14. You might
see - watchdog: timeout on eth0 (bnx2) etc. etc. You may lose connectivity,
and possibly hang (if you're trying to access a file via nfs).
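For the rate limiting mentioned above, the kernel offers
printk_ratelimited(). The block below is only a user-space model of the
idea (at most `burst` messages per `interval` ticks), so it can be compiled
and tried outside the kernel. All names are hypothetical, and this is not
the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rate limiter: allow at most `burst` messages per
 * `interval` ticks. The caller passes the current time explicitly. */
struct sketch_ratelimit {
    unsigned long interval; /* window length, in ticks */
    unsigned int burst;     /* messages allowed per window */
    unsigned long begin;    /* start of the current window */
    unsigned int printed;   /* messages emitted in this window */
    unsigned int missed;    /* messages suppressed in this window */
};

static void sketch_ratelimit_init(struct sketch_ratelimit *rs,
                                  unsigned long interval,
                                  unsigned int burst)
{
    assert(rs && burst > 0);
    rs->interval = interval;
    rs->burst = burst;
    rs->begin = 0;
    rs->printed = 0;
    rs->missed = 0;
}

/* Returns true if the caller may print at time `now`. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rs,
                                unsigned long now)
{
    /* Start a fresh window once the old one has expired. */
    if (now - rs->begin >= rs->interval) {
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;
    return false;
}
```

A vm exit handler would call sketch_ratelimit_ok() before each error
message and drop the message when it returns false.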
......@@ -25,25 +25,20 @@ test cases for examples.
LCD STATUS
========================================
To do: I will probably remove the suspend state. This seemed like it would be
simple, but the proper handling of it when combined with ipc may be
too difficult to justify right now.
An lcd can be in one of five states:
An lcd can be in one of four states:
E = Embryo - just after it is created, not configured with a starting
stack pointer, etc.
C = Configured - stack pointer, starting program counter configured
R = Running - kthread is runnable or running, and may be running
inside vm
S = Suspended - kthread is asleep or will soon sleep
D = Dead - kthread has stopped or will soon stop; lcd may be in
the process of being torn down
_____________________________________
/ lcd_destroy \
lcd_run | |
lcd_suspend | lcd_run, lcd_config |
| |
lcd_run | lcd_run, lcd_config |
.__. ^ .__. |
| | .----------->| | | |
\ | / lcd_destroy | \ | |
......@@ -52,30 +47,21 @@ An lcd can be in one of five states:
create +---+ lcd_config +---+ lcd_run +---+ lcd_destroy +---+
-------->| E |------------->| C |------------->| R |--------------->| D |
+---+ .->+---+ +---+ +---+
/ / / ^ ^ \
/ / / \ | \
/ / / | | \
'---' lcd_suspend | | lcd_run '---'
lcd_config, \ / lcd_run
lcd_suspend V / lcd_suspend
+---+ lcd_destroy
| S | lcd_config
+---+
^ \
| \
| \
'---'
lcd_config, lcd_suspend
/ / ^ \
/ / | \
/ / | \
'---' '---'
lcd_config lcd_run
lcd_destroy
lcd_config
The following transitions are an error (return non-zero), and have no effect:
E: lcd_run, lcd_suspend - you must configure the lcd first
C: lcd_config, lcd_suspend - lcd already configured; cannot suspend either
E: lcd_run - you must configure the lcd first
C: lcd_config - lcd already configured
R: lcd_run, lcd_config - lcd already running and config'd
S: lcd_suspend, lcd_config - lcd already suspended and config'd
D: all - lcd is dead
Some of these may be too restrictive, and could change in the future (e.g.,
allow re-config, allow multiple suspend calls - only first one has effect,
rest are no-ops).
allow re-config).
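The four-state version of the status rules above can be sketched as a small
user-space transition function. This is a model only: the enum and function
names are hypothetical, and it assumes (from the diagram) that lcd_destroy
is accepted from E, C, and R. Invalid transitions return non-zero and leave
the state untouched, as described above.

```c
#include <assert.h>

enum sketch_lcd_state {
    SKETCH_EMBRYO,
    SKETCH_CONFIGURED,
    SKETCH_RUNNING,
    SKETCH_DEAD
};

enum sketch_lcd_op {
    SKETCH_OP_CONFIG,
    SKETCH_OP_RUN,
    SKETCH_OP_DESTROY
};

/* Returns 0 and updates *state on a valid transition; returns -1 and
 * leaves *state unchanged otherwise. */
static int sketch_lcd_transition(enum sketch_lcd_state *state,
                                 enum sketch_lcd_op op)
{
    assert(state);
    switch (*state) {
    case SKETCH_EMBRYO:
        if (op == SKETCH_OP_CONFIG) {
            *state = SKETCH_CONFIGURED;
            return 0;
        }
        if (op == SKETCH_OP_DESTROY) {
            *state = SKETCH_DEAD;
            return 0;
        }
        return -1; /* lcd_run: must configure first */
    case SKETCH_CONFIGURED:
        if (op == SKETCH_OP_RUN) {
            *state = SKETCH_RUNNING;
            return 0;
        }
        if (op == SKETCH_OP_DESTROY) {
            *state = SKETCH_DEAD;
            return 0;
        }
        return -1; /* lcd_config: already configured */
    case SKETCH_RUNNING:
        if (op == SKETCH_OP_DESTROY) {
            *state = SKETCH_DEAD;
            return 0;
        }
        return -1; /* already running and config'd */
    case SKETCH_DEAD:
    default:
        return -1; /* lcd is dead */
    }
}
```

Allowing re-config, as suggested above, would only mean adding a
SKETCH_OP_CONFIG branch to the SKETCH_CONFIGURED case.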
......@@ -113,15 +113,13 @@ Step 3 - Reboot and install
After rebooting the machine, select the new kernel to boot it.
After booting, if you built the lcd system as modules, do:
After booting, if you built the lcd system as a module, do:
[ 1 ] insmod ${MODULE_PATH}/arch/x86/lcd-domains/lcd-domains-arch.ko
[ 2 ] insmod ${MODULE_PATH}/virt/lcd-domains/lcd-domains.ko
sudo insmod ${MODULE_PATH}/virt/lcd-domains/lcd-domains.ko
where ${MODULE_PATH} is something like /lib/modules/3.10.14/kernel.
This will install the lcd system.
This will install the lcd system. You can also use modprobe.
You can now create an lcd using kliblcd. See
Documentation/lcd-domains/kliblcd.txt.
......
========================================
OVERVIEW
========================================
In virt/lcd-domains/test-mods, you can put a new group of test modules
for running with the lcd system. You will find a few in there already.
One of them - load - is built automatically and run during a test when the
lcd module is inserted.
Beware! The load test module contains an infinite loop, so you probably don't
want to run it on your host machine (i.e., you should run it inside an
lcd where it will be preempted periodically and then stopped by the test
case).
========================================
HOW TO DO A TEST MODULE
========================================
Follow the examples in the test-mods dir when in doubt.
Step 1
------
Create a new sub dir in test-mods with source files. At least one of the
modules should probably run non-isolated - it will set up the lcd's.
Important: Modules that will run inside lcd's should textually include
the liblcd file.
Step 2
------
Set up a makefile.
Step 3
------
Modify the makefile inside virt/lcd-domains/test-mods to include your new sub
dir.
Step 4
------
Modify the Kconfig file inside virt/lcd-domains to include your new
modules.
Step 5
------
Run make menuconfig to turn on building for your modules. It will be under
Virtualization (2) at the bottom.
Step 6
------
Run
make
and then
sudo make modules_install install
to install.
Step 7
------
Ensure the lcd-domains.ko module is insmod'd. Insert your boot module, which
should boot all lcd's, etc.
......@@ -3,51 +3,70 @@
OVERVIEW
==============================
This code is in arch/x86/lcd-domains/ and
arch/x86/include/asm/lcd-domains-arch.h.
This code is in arch/x86/lcd-domains/ and arch/x86/include/asm/lcd-domains/.
The two main objects are struct lcd_arch and struct lcd_arch_thread, defined in
arch/x86/include/asm/lcd-domains-arch.h.
The main object is struct lcd_arch, defined in
arch/x86/include/asm/lcd-domains/lcd-domains.h.
struct lcd_arch contains the extended page table (EPT) guest physical address
space information, and a list of the contained lcd_arch_thread's.
space information, and data for the Intel VT-x virtual machine.
struct lcd_arch_thread corresponds with an Intel VT-x virtual machine, and
contains all state needed to manage it. Each lcd_arch_thread uses the EPT of
its parent lcd_arch.
See the comments in the header lcd-domains.h (above), and the test
cases in arch/x86/lcd-domains/tests/ for more info.
See the comments in the header lcd-domains-arch.h (above), and the test
cases in arch/x86/lcd-domains/lcd-domains-arch-tests.c.
========================================
BUILD
========================================
main.c is built and linked via the arch-independent code make file inside
virt/lcd-domains/Makefile. This is safe for the recursive make because only
one module - lcd-domains.ko - needs it.
No configuration is necessary.
========================================
EXAMPLE
INTERRUPTS
========================================
Here is an example of how to set up one lcd_arch with one lcd_arch_thread, and
run it (without the error checking).
All external interrupts and exceptions cause VM exits. We no longer ack
external interrupts on vm exit (as we did before), because emulating interrupt
handling for 64-bit is too hard and maybe even impossible to do in software.
(We were using KVM's code before, but I think this is stale code that is never
run anymore, because KVM does not ack interrupts on vm exit.)
If you do allow an lcd to handle some interrupts, make sure the reschedule
interrupt still causes a vm exit. We are relying on this (as does KVM) for
forcing an lcd to exit so we can schedule a higher priority thread (notice
the cond_resched inside the arch-independent code). You can see the
IRQ vectors for x86 inside arch/x86/include/asm/irq_vectors.h. The reschedule
vector is a very high priority vector.
struct lcd_arch *lcd_arch;
struct lcd_arch_thread *lcd_arch_thread;
========================================
EXAMPLE
========================================
Here is an example of how to set up one lcd_arch, and run it, without
error checking:
struct lcd_arch *lcd_arch;
/*
* Create the lcd_arch
*/
lcd_arch = lcd_arch_create();
lcd_arch_create(&lcd_arch);
/* (...Allocate and map pages in the LCD...) */
/* (...Allocate and map pages in the LCD using the ept functions...) */
/*
* Create and add a thread
* Set up the lcd_arch's program counter, stack pointer, etc.
*/
lcd_arch_thread = lcd_arch_add_thread(lcd_arch);
/* (...Set up the thread's program counter, etc...) */
lcd_arch_set_pc(lcd_arch, some_gva);
lcd_arch_set_sp(lcd_arch, some_gva);
lcd_arch_set_gva_root(lcd_arch, some_gpa);
/*
* Tear down the thread
* Run it
*/
lcd_arch_destroy_thread(lcd_arch_thread);
lcd_arch_run(lcd_arch);
/*
* Tear down the LCD
......@@ -92,7 +111,7 @@ LOCKING
TODO: Some of the arch code is not thread safe.
Locks are used on an lcd_arch's ept and its list of lcd_arch_threads.
Locks are initialized but not yet used for the lcd_arch's ept.
We use mutexes for now, so some functions are not safe to call from
interrupt context (we can sleep when we lock a mutex).
......
......@@ -10,9 +10,6 @@ obj-$(CONFIG_XEN) += xen/
# lguest paravirtualization support
obj-$(CONFIG_LGUEST_GUEST) += lguest/
# Lightweight Capability Domains (LCD)
obj-$(CONFIG_LCD_INTEL) += lcd-domains/
# LCD paravirtualization support
obj-$(CONFIG_LCD_GUEST) += lcdguest/
......
#ifndef _ASM_X86_LCD_DOMAINS_ARCH_H
#define _ASM_X86_LCD_DOMAINS_ARCH_H
#ifndef _ASM_X86_LCD_DOMAINS_LCD_DOMAINS_H
#define _ASM_X86_LCD_DOMAINS_LCD_DOMAINS_H
#include <asm/vmx.h>
#include <linux/spinlock.h>
#include <linux/bitmap.h>
#include <lcd-domains/types.h>
extern int lcd_on_cpu;
extern int lcd_in_non_root;
/* DEBUGGING -------------------------------------------------- */
#define LCD_ARCH_DEBUG 0
#define LCD_ARCH_ERR(msg...) __lcd_arch_err(__FILE__, __LINE__, msg)
static inline void __lcd_arch_err(char *file, int lineno, char *fmt, ...)
{
va_list args;
printk(KERN_ERR "lcd-vmx: %s:%d: error: ", file, lineno);
va_start(args, fmt);
vprintk(fmt, args);
va_end(args);
}
#define LCD_ARCH_MSG(msg...) __lcd_arch_msg(__FILE__, __LINE__, msg)
static inline void __lcd_arch_msg(char *file, int lineno, char *fmt, ...)
{
va_list args;
printk(KERN_ERR "lcd-vmx: %s:%d: note: ", file, lineno);
va_start(args, fmt);
vprintk(fmt, args);
va_end(args);
}
#define LCD_ARCH_WARN(msg...) __lcd_arch_warn(__FILE__, __LINE__, msg)
static inline void __lcd_arch_warn(char *file, int lineno, char *fmt, ...)
{
va_list args;
printk(KERN_ERR "lcd-vmx: %s:%d: warning: ", file, lineno);
va_start(args, fmt);
vprintk(fmt, args);
va_end(args);
}
/* LCD ARCH DATA STRUCTURES ---------------------------------------- */
struct lcd_arch_vmcs {
......@@ -304,6 +340,13 @@ int lcd_arch_set_gva_root(struct lcd_arch *lcd_arch, gpa_t a);
* Accessor Macro for syscalls
* ===========================
*/
#define LCD_ARCH_GET_SYSCALL_NUM(lcd) (lcd->regs[LCD_ARCH_REGS_RAX])
static inline u64 lcd_arch_get_syscall_num(struct lcd_arch *lcd)
{
return lcd->regs[LCD_ARCH_REGS_RAX];
}
static inline void lcd_arch_set_syscall_ret(struct lcd_arch *lcd, u64 val)
{
lcd->regs[LCD_ARCH_REGS_RAX] = val;
}
#endif /* _ASM_X86_LCD_DOMAINS_LCD_DOMAINS_H */
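The accessors in this header read the syscall number from, and write the
return value to, the same RAX slot. A user-space sketch of a dispatcher
built on that convention follows. The struct, the register indices, and the
syscall numbers are all hypothetical stand-ins, not the values used by the
lcd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical register indices and syscall numbers. */
enum { SKETCH_REGS_RAX, SKETCH_REGS_RCX, SKETCH_NUM_REGS };
enum { SKETCH_SYSCALL_EXIT = 0, SKETCH_SYSCALL_SEND = 1 };

/* Stand-in for struct lcd_arch: only the saved registers. */
struct sketch_lcd {
    uint64_t regs[SKETCH_NUM_REGS];
};

static inline uint64_t sketch_get_syscall_num(struct sketch_lcd *lcd)
{
    return lcd->regs[SKETCH_REGS_RAX];
}

static inline void sketch_set_syscall_ret(struct sketch_lcd *lcd,
                                          uint64_t val)
{
    lcd->regs[SKETCH_REGS_RAX] = val;
}

/* Returns 1 if the lcd asked to exit, 0 if it should be resumed. */
static int sketch_handle_syscall(struct sketch_lcd *lcd)
{
    assert(lcd);
    switch (sketch_get_syscall_num(lcd)) {
    case SKETCH_SYSCALL_EXIT:
        return 1;
    case SKETCH_SYSCALL_SEND:
        /* A real handler would do the ipc here. */
        sketch_set_syscall_ret(lcd, 0);
        return 0;
    default:
        /* Unknown syscall: report -ENOSYS back in RAX. */
        sketch_set_syscall_ret(lcd, (uint64_t)-ENOSYS);
        return 0;
    }
}
```

Because the return value overwrites the syscall number, the number has to
be read before sketch_set_syscall_ret() is called.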
......@@ -36,6 +36,7 @@ void arch_trigger_all_cpu_backtrace(bool include_self)
{
nmi_trigger_all_cpu_backtrace(include_self, nmi_raise_cpu_backtrace);
}
EXPORT_SYMBOL(arch_trigger_all_cpu_backtrace);
static int
arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
......@@ -47,6 +48,7 @@ arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
}
NOKPROBE_SYMBOL(arch_trigger_all_cpu_backtrace_handler);
static int __init register_trigger_all_cpu_backtrace(void)
{
register_nmi_handler(NMI_LOCAL, arch_trigger_all_cpu_backtrace_handler,
......
#
# Makefile for VMX code for LCDs
# Arch-dependent code for LCDs
#
ccflags-y += -Werror -O0
extra-y += main.o
obj-$(CONFIG_LCD_INTEL) += lcd-domains-arch.o
extra-y += liblcd.o
/**
* liblcd.c - Code for microkernel interface for isolated code.
*
* Authors:
* Charlie Jacobsen <charlesj@cs.utah.edu>
*/
#include <asm/lcd-domains/liblcd.h>
#include <lcd-domains/utcb.h>
#include <lcd-domains/types.h>
/* CPTR CACHE -------------------------------------------------- */
/* This is not needed yet. */
#if 0
/*
* XXX: This is hardwired for a depth of 3 (4 levels) so that we
* don't have to use kmalloc. If you change the cspace depth, table size,
* etc., you will need to change this.
*
* XXX: We don't use a lock because this is only used by one thread.
*/
#define LCD_BMAP0_SIZE (1 << (LCD_CPTR_SLOT_BITS + 0 * LCD_CPTR_FANOUT_BITS))
#define LCD_BMAP1_SIZE (1 << (LCD_CPTR_SLOT_BITS + 1 * LCD_CPTR_FANOUT_BITS))
#define LCD_BMAP2_SIZE (1 << (LCD_CPTR_SLOT_BITS + 2 * LCD_CPTR_FANOUT_BITS))
#define LCD_BMAP3_SIZE (1 << (LCD_CPTR_SLOT_BITS + 3 * LCD_CPTR_FANOUT_BITS))
struct cptr_cache {
/* level 0 */
unsigned long bmap0[LCD_BMAP0_SIZE];
/* level 1 */
unsigned long bmap1[LCD_BMAP1_SIZE];
/* level 2 */
unsigned long bmap2[LCD_BMAP2_SIZE];
/* level 3 */
unsigned long bmap3[LCD_BMAP3_SIZE];
};
/*
* XXX: For now, each kernel module has one thread, so we can use one cptr
* cache. (Avoid using kmalloc.)
*/
static struct cptr_cache cptr_cache;
static void cptr_cache_init(struct cptr_cache *cache)
{
/*
* Only need to set the null cptr's bit so we don't
* allocate that
*/
set_bit(0, cache->bmap0);
}
static void cptr_cache_destroy(struct cptr_cache *cache)
{
/* no op for now - no kmalloc */
return;
}
static int __lcd_alloc_cptr_from_bmap(unsigned long *bmap, int size,
unsigned long *out)
{
unsigned long idx;
/*
* Find next zero bit
*/
idx = find_first_zero_bit(bmap, size);
if (idx >= size)
return 0; /* signal we are full */
/*
* Set bit to mark cptr as in use
*/
set_bit(idx, bmap);
*out = idx;
return 1; /* signal we are done */
}
static int __lcd_alloc_cptr(struct cptr_cache *cache, cptr_t *free_cptr)
{
int ret;
int depth;
int done;
unsigned long *bmap;
unsigned long idx;
int size;
/*
* Can't use a loop since we didn't kmalloc the bitmaps
*/
done = __lcd_alloc_cptr_from_bmap(cache->bmap0, LCD_BMAP0_SIZE, &idx);
if (done) {
depth = 0;
goto found;
}
done = __lcd_alloc_cptr_from_bmap(cache->bmap1, LCD_BMAP1_SIZE, &idx);
if (done) {
depth = 1;
goto found;
}
done = __lcd_alloc_cptr_from_bmap(cache->bmap2, LCD_BMAP2_SIZE, &idx);
if (done) {
depth = 2;
goto found;
}
done = __lcd_alloc_cptr_from_bmap(cache->bmap3, LCD_BMAP3_SIZE, &idx);
if (done) {
depth = 3;
goto found;
}
/* Didn't find one */
return -ENOMEM;
found:
/*
* Found one; encode the depth (level) in the cptr
*/
idx |= (depth << LCD_CPTR_LEVEL_SHIFT);
*free_cptr = __cptr(idx);
return 0;
fail2:
fail1:
return ret;
}
void __lcd_free_cptr(struct cptr_cache *cache, cptr_t c)
{
int ret;
unsigned long *bmap;
unsigned long bmap_idx;
unsigned long level;
/*
* Get the correct level bitmap
*/
level = lcd_cptr_level(c);
switch (level) {
case 0:
bmap = cache->bmap0;
break;
case 1:
bmap = cache->bmap1;
break;
case 2:
bmap = cache->bmap2;
break;
case 3:
bmap = cache->bmap3;
break;
default:
/* error shouldn't happen, but if so, just return */
return;
}
/*
* The bitmap index includes all fanout bits and the slot bits
*/
bmap_idx = ((1 << (LCD_CPTR_FANOUT_BITS * level + LCD_CPTR_SLOT_BITS))
- 1) & cptr_val(c);
/*
* Clear the bit in the bitmap
*/
clear_bit(bmap_idx, bmap);
return;
}
int lcd_alloc_cptr(cptr_t *free_slot)
{
return __lcd_alloc_cptr(&cptr_cache, free_slot);
}
void lcd_free_cptr(cptr_t c)
{
__lcd_free_cptr(&cptr_cache, c);
}
#endif
int lcd_alloc_cptr(cptr_t *free_slot)
{
return -ENOSYS;
}
void lcd_free_cptr(cptr_t c)
{
return;
}
/* BOOT INFO FRAME -------------------------------------------------- */
/* GUEST PHYSICAL ALLOC -------------------------------------------------- */
/* GUEST VIRTUAL ALLOC / MAP / ETC ---------------------------------------- */
/* LOW LEVEL PAGE ALLOC -------------------------------------------------- */
int lcd_page_alloc(cptr_t *slot_out, gpa_t gpa)
{
return -ENOSYS;
}
int lcd_gfp(cptr_t *slot_out, gpa_t *gpa_out, gva_t *gva_out)
{
return -ENOSYS;
}
/* LCD ENTER / EXIT -------------------------------------------------- */
int lcd_enter(void)
{
/*
* For now, we don't do anything. In the future, we could set up
* the cptr cache, page allocation, etc.
*/
return 0;
}
void __noreturn lcd_exit(int retval)
{
lcd_set_r0(retval);
for(;;)
LCD_DO_SYSCALL(LCD_SYSCALL_EXIT); /* doesn't return */
}
/* IPC -------------------------------------------------- */
int lcd_create_sync_endpoint(cptr_t *slot_out)
{
return -ENOSYS;
}
int lcd_send(cptr_t endpoint)
{
lcd_set_cr0(endpoint);
return LCD_DO_SYSCALL(LCD_SYSCALL_SEND);
}
int lcd_recv(cptr_t endpoint)
{
lcd_set_cr0(endpoint);
return LCD_DO_SYSCALL(LCD_SYSCALL_RECV);
}
int lcd_call(cptr_t endpoint)
{
lcd_set_cr0(endpoint);
return LCD_DO_SYSCALL(LCD_SYSCALL_CALL);
}
int lcd_reply(void)
{
return LCD_DO_SYSCALL(LCD_SYSCALL_REPLY);
}
/* LCD CREATE / SETUP -------------------------------------------------- */
int lcd_create(cptr_t *slot_out, gpa_t stack)
{
return -ENOSYS;
}
int lcd_config(cptr_t lcd, gva_t pc, gva_t sp, gpa_t gva_root)
{
return -ENOSYS;
}
int lcd_run(cptr_t lcd)
{
return -ENOSYS;
}
int lcd_suspend(cptr_t lcd)
{
return -ENOSYS;
}
/* CAPABILITIES -------------------------------------------------- */
int lcd_cap_grant(cptr_t lcd, cptr_t src, cptr_t dest)
{
return -ENOSYS;
}
int lcd_cap_page_grant_map(cptr_t lcd, cptr_t page, cptr_t dest, gpa_t gpa)
{
return -ENOSYS;
}
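The disabled cptr cache in liblcd.c above keeps one bitmap per cspace level
and encodes the level in the upper bits of the cptr. Below is a user-space
sketch of that scheme that can be compiled and tested outside the kernel.
The constants (SK_SLOT_BITS, SK_FANOUT_BITS, SK_LEVEL_SHIFT) and all names
are hypothetical, not the values from include/lcd-domains/types.h, and
plain unsigned long stands in for cptr_t.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define SK_SLOT_BITS   2  /* hypothetical */
#define SK_FANOUT_BITS 2  /* hypothetical */
#define SK_LEVELS      4
#define SK_LEVEL_SHIFT 16 /* hypothetical: level sits above index bits */

/* Number of cptrs (bits) tracked at a given level. */
#define SK_BMAP_BITS(lvl) (1UL << (SK_SLOT_BITS + (lvl) * SK_FANOUT_BITS))
#define SK_BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)
#define SK_LONGS(nbits)   (((nbits) + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

/* Every level is padded to the size of the largest one, so a loop can
 * walk the levels without any dynamic allocation. */
struct sk_cptr_cache {
    unsigned long bmap[SK_LEVELS][SK_LONGS(SK_BMAP_BITS(SK_LEVELS - 1))];
};

static int sk_test_bit(const unsigned long *bmap, unsigned long i)
{
    return (bmap[i / SK_BITS_PER_LONG] >> (i % SK_BITS_PER_LONG)) & 1UL;
}

static void sk_set_bit(unsigned long *bmap, unsigned long i)
{
    bmap[i / SK_BITS_PER_LONG] |= 1UL << (i % SK_BITS_PER_LONG);
}

static void sk_clear_bit(unsigned long *bmap, unsigned long i)
{
    bmap[i / SK_BITS_PER_LONG] &= ~(1UL << (i % SK_BITS_PER_LONG));
}

static void sk_cache_init(struct sk_cptr_cache *c)
{
    memset(c, 0, sizeof(*c));
    sk_set_bit(c->bmap[0], 0); /* never hand out the null cptr */
}

/* Returns 0 and stores a free cptr in *out, or -1 if every level is full. */
static int sk_alloc_cptr(struct sk_cptr_cache *c, unsigned long *out)
{
    unsigned long lvl, i;

    for (lvl = 0; lvl < SK_LEVELS; lvl++) {
        for (i = 0; i < SK_BMAP_BITS(lvl); i++) {
            if (!sk_test_bit(c->bmap[lvl], i)) {
                sk_set_bit(c->bmap[lvl], i);
                *out = i | (lvl << SK_LEVEL_SHIFT);
                return 0;
            }
        }
    }
    return -1;
}

static void sk_free_cptr(struct sk_cptr_cache *c, unsigned long cptr)
{
    unsigned long lvl = cptr >> SK_LEVEL_SHIFT;

    if (lvl >= SK_LEVELS)
        return; /* malformed cptr; ignore */
    /* The index is the slot bits plus all fanout bits for this level. */
    sk_clear_bit(c->bmap[lvl], cptr & (SK_BMAP_BITS(lvl) - 1));
}
```

Padding the smaller levels costs some memory, but it is one way to avoid
both kmalloc and the unrolled per-level code. With these constants, level 0
holds four cptrs, so the fourth allocation after sk_cache_init() is the
first one to come from level 1.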