Working on moving Linux's mm into lcd's.
I gave up trying to debug the hang. Confirmed the pages for the lcd's
vm are the ones I expect. Turned on red zones. All tests are passing.
The hang shows up after about 10-20 insmod/rmmod cycles of the lcd module
(it varies). Sometimes one core just silently dies and doesn't even respond to
an NMI. Sometimes the ethernet driver complains (this could be an
unrelated bug that was fixed upstream).
A few things in this commit:
Updated documentation in Documentation/lcd-domains/.
Baby version of a lib kernel, in arch/x86/lcd-domains/liblcd.c.
Unfortunately, because of the recursive make, it has to be textually
included in the modules destined for lcd's, for now.
Added new test modules and modified directory structure and
build system. See documentation in Documentation/lcd-domains.
A few tweaks to the nmi handler to print a backtrace. May remove that in
the future, as it's probably not safe to do inside an nmi handler (but if
we're in that error state, we might be desperate to know what's happening ...).
Changed interrupt handling in arch-dependent code. The KVM code we were using
is probably wrong for 64-bit - it doesn't properly switch stacks, etc., which
is super important for 64-bit and may be impossible to emulate in
software. I think this could be stale code inside KVM, but not sure. Dune
doesn't use it. KVM doesn't ack external interrupts on vm exit, so I think
this interrupt emulation code is always skipped (at least for the non-nested
case). Instead, we're not ack'ing interrupts on exit, and letting the native
code do the right thing, like Dune.
I was thinking this might be the source of the bad hang (stack
overflow, e.g.), but that turned out not to be the case.
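
For illustration only (this is not the code in this commit): a minimal
sketch of the "don't ack on exit" choice described above. The helper names
are hypothetical. The bit value is the "acknowledge interrupt on exit"
VM-exit control (bit 15), which Linux names VM_EXIT_ACK_INTR_ON_EXIT in
asm/vmx.h. With the bit clear, the interrupt stays pending after the exit
and the CPU delivers it through the host IDT once interrupts are re-enabled,
so the hardware does the stack switching itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* VM-exit control bit 15: "acknowledge interrupt on exit".
 * Same value as VM_EXIT_ACK_INTR_ON_EXIT in Linux's asm/vmx.h. */
#define VM_EXIT_ACK_INTR_ON_EXIT 0x00008000u

/* Hypothetical helper: return the VM-exit controls with ack-on-exit
 * cleared, so an external interrupt that causes a vm exit is left
 * pending for the host's native handler. */
static uint32_t lcd_exit_controls_no_ack(uint32_t controls)
{
	return controls & ~VM_EXIT_ACK_INTR_ON_EXIT;
}

/* Hypothetical helper: true if the exit path would have to dispatch
 * the interrupt in software (the vector was already acked by the CPU
 * and is only available in the exit interruption info). */
static bool lcd_needs_sw_intr_dispatch(uint32_t controls)
{
	return (controls & VM_EXIT_ACK_INTR_ON_EXIT) != 0;
}
```

With controls produced by lcd_exit_controls_no_ack(), the second helper
always returns false, i.e. the software emulation path is never needed.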
Resolved-by: Vikram Narayanan <firstname.lastname@example.org>