Add emulated breakpoint/singlestep code in Linux personality. · 71753d5e
David Johnson authored Nov 17, 2014
This means that the personality can emulate breakpoints and
singlesteps in userspace threads, on behalf of the generic
OS Process overlay driver.  Thus, the OS Process driver can
now be stacked atop the KVM/QEMU driver with real breakpoints.
Previously, it could be stacked atop KVM/QEMU, but without
userspace breakpoints... so it wasn't especially useful for
active debugging.

The OS Process driver was previously only able to stack atop
the Xen driver, because the Xen driver supported hacked
hypervisors that would divert userspace debug exceptions to
the driver (as well as kernel debug exceptions).  Hacking
QEMU's GDB stub to do this was undesirable and infeasible.
(The QEMU/KVM support was built through the GDB driver, because
QEMU provides a GDB server stub --- but supporting userspace
breakpoints in an overlay driver atop our GDB driver was
infeasible due to the "shared page" breakpoint issue.  The
shared page issue occurs when we place a breakpoint at
a userspace function virtual addr, and that same memory page
is later mmap'd into another process.  Our overlay driver
might only be attached to the first process, and not the
latter --- but because the page is shared, we have to
emulate breakpoint handling for the latter process too.)

We did this in a complicated way for Xen -- we hacked the
hypervisor to siphon debug exceptions for kernel and
userspace -- and then the OS Process overlay driver would
just insert its breakpoints at *physical* addresses.
This allowed the underlying Xen driver to recognize that
a legit exception had occurred, and that it had to
"handle" the exception even though no overlay driver was
attached.

Since we could not do this for QEMU/KVM/GDB server stub (and
because we're tired of having people install our hacked Xen),
we now support the next obvious thing!

The Linux OS personality now places probes atop the kernel's
do_int3 and do_debug interrupt handlers, if the underlying
hypervisor cannot process those userspace-generated interrupts
(like our hacked Xen can and does).  When they are hit, the
personality checks to see if the overlay OS Process driver
had placed a physical page breakpoint at the faulting addr;
and if an overlay target is attached to the faulting thread;
if so, it notifies the overlay to handle the exception (and
similarly for single stepping).  If the OS Process driver
handles the exception, the personality *aborts* the interrupt
handler with a return value of 0, to emulate success.  And the
OS Process driver *has* handled the exception -- it resets
the saved RIP on the interrupt stack, removes the breakpoint
instruction, and sets EF_TF in RFLAGS on the interrupt stack.
Thus, when the IRET happens in the kernel to return to userspace,
the process singlesteps the instruction as it should.  Then
the personality has an immediate debug trap to handle in the
kernel's do_debug function -- and it is handled in the same
manner to emulate the single step.
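
A minimal sketch of the frame fixup described above, under stated
assumptions: struct sketch_iret_frame, struct sketch_bp, and both
function names are hypothetical and are not the identifiers used in
this commit.  The sketch also assumes that the breakpoint is re-armed
and the trap flag cleared once the step completes, which the text
above does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trap flag: bit 8 of RFLAGS (referred to as EF_TF in the text). */
#define SKETCH_RFLAGS_TF (1ULL << 8)
/* One-byte x86 INT3 breakpoint opcode. */
#define SKETCH_INT3 0xCC

/* Hypothetical view of what the CPU pushed on the interrupt stack. */
struct sketch_iret_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Hypothetical breakpoint record: where it lives, and the byte it replaced. */
struct sketch_bp {
    uint8_t *addr;
    uint8_t orig;
};

/*
 * do_int3 side.  After INT3 traps, the saved RIP points one byte past
 * the breakpoint, so back it up, put the original byte back, and set TF
 * so that IRET runs exactly one userspace instruction before trapping
 * into do_debug.
 */
void sketch_emulate_bp(struct sketch_iret_frame *f, struct sketch_bp *bp)
{
    assert(f != NULL && bp != NULL);
    f->rip -= 1;
    *bp->addr = bp->orig;
    f->rflags |= SKETCH_RFLAGS_TF;
}

/*
 * do_debug side (assumed behavior): the step is done, so re-arm the
 * breakpoint and clear TF before the thread resumes.
 */
void sketch_emulate_ss_done(struct sketch_iret_frame *f, struct sketch_bp *bp)
{
    assert(f != NULL && bp != NULL);
    *bp->addr = SKETCH_INT3;
    f->rflags &= ~SKETCH_RFLAGS_TF;
}
```

In both cases the real kernel handler is then aborted, so the kernel
never sees the exception.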

The personality calls the (newly renamed)
target_os_emulate_(bp|ss)_handler functions to handle the
shared page breakpoint cases just like the Xen driver handled
them; it's just slightly different because the personality
now sets up the emulated single step.  The base driver (in this
case, the QEMU/KVM/GDB base driver) can only step at the
kernel level, effectively.
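
The decision the personality makes when one of its handler probes
fires can be sketched roughly as follows.  This is an illustration
only: the enum, the flat array of physical breakpoint addresses, and
sketch_classify are hypothetical stand-ins, not the data structures
behind target_os_emulate_(bp|ss)_handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_action {
    SKETCH_PASS_TO_KERNEL,  /* not our breakpoint; let do_int3 run */
    SKETCH_NOTIFY_OVERLAY,  /* ours, and an overlay is attached */
    SKETCH_EMULATE          /* ours, but hit via a shared page by an
                               unattached process */
};

/*
 * Breakpoints are keyed by *physical* address, so a hit is recognized
 * no matter which process mapped the page.
 */
enum sketch_action sketch_classify(const uint64_t *bp_paddrs, size_t n,
                                   uint64_t fault_paddr,
                                   bool overlay_attached)
{
    for (size_t i = 0; i < n; i++) {
        if (bp_paddrs[i] == fault_paddr)
            return overlay_attached ? SKETCH_NOTIFY_OVERLAY
                                    : SKETCH_EMULATE;
    }
    return SKETCH_PASS_TO_KERNEL;
}
```

Keying on the physical address is what lets the unattached-process
case be told apart from an exception that is not ours at all.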

However, we still support the "old way", where the Xen
driver supports hacked hypervisors that siphon userspace
debug exceptions.  This code should also support unmodified
Xens, *if* the user passes the

  --hypervisor-ignores-userspace-exceptions

argument (or sets it in the Xen config struct) -- but I
haven't tried it.

Other notes:

  * I had to add explicit support for x86_64 interrupt
    stacks.  These are interesting; there are 5 per-cpu
    stacks for different interrupts.  So what I do now
    is check to see if the RSP is within the per-task
    kernel stack pages (2 pages); if it is,
    we assume the stack is 2 pages.  If not, we assume
    we're on an alternate 1-page stack.  This can break
    for debug exceptions, because debug exception stacks
    are 2 pages!  But, since we are overloading the
    do_int3 and do_debug handlers, *our drivers* won't
    break it because the stack won't have grown onto the
    second page.  The second page is probably mostly
    for handling nested debug exceptions, or in-kernel
    KGDB -- that kind of thing.

  * I also hadn't supported writing back userspace regs
    for the current thread, if the thread was a userspace
    thread in the kernel.  That's now there.

  * There are some memmod tweaks, to allow memmods that
    don't actually *write* their changes.  In other words,
    they exist to track modifications made in some
    other way --- like through a GDB server stub writing
    a breakpoint!  You see, Stackdb needs to track those
    GDB breakpoint states, in order to do things like
    insert arbitrary code at a breakpoint.  Stackdb ---
    and particularly this commit --- allows functions to
    be "aborted", meaning instead of executing the real
    instruction at a breakpoint, we insert a RET instr,
    fix the RSP to garbage-collect the frame, and then
    single step.  This commit relies on that support to
    hook the interrupt handlers, but then "obviate" them,
    since we've handled the exception.  There is new
    code in the probe stuff too to handle this case, and
    it's all because it seemed to me in early debugging
    that the QEMU GDB server stub got kinda ticked off
    when I tried to overwrite its breakpoints temporarily.
    This did seem weird to me; maybe it was a side-effect
    of a separate bug.  Anyway, now this situation is
    actually modeled correctly --- the GDB driver yanks
    out the GDB breakpoint before inserting the return
    code --- and then it puts it back afterwards... and
    of course it's appropriately abstracted.

  * The OS Process driver now indirects its single steps
    through the underlying OS personality, via
    target_os_thread_singlestep.  This is the magic that
    allows us to support either 1) hypervisors that
    siphon off debug exceptions like our hacked Xen; and
    2) hypervisors/stubs that do not siphon them off,
    like QEMU's GDB stub.  If we have situation 1, we
    singlestep the base driver directly; if we have 2,
    we set up the userspace thread state to execute a
    single step, by modifying RFLAGS to set the EF_TF
    flag.
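
The stack-size heuristic from the first note above can be sketched
like this.  It assumes 4 KiB pages and that the base of the per-task
kernel stack is already known; the function names and the
page-aligned fallback for the alternate stacks are illustrative
assumptions, not the code in this commit.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/*
 * If RSP lies inside the 2-page per-task kernel stack, report 2 pages;
 * otherwise assume we are on a 1-page alternate interrupt stack.
 */
uint64_t sketch_stack_size(uint64_t rsp, uint64_t task_stack_base)
{
    assert(task_stack_base % SKETCH_PAGE_SIZE == 0);
    if (rsp >= task_stack_base &&
        rsp < task_stack_base + 2 * SKETCH_PAGE_SIZE)
        return 2 * SKETCH_PAGE_SIZE;
    return SKETCH_PAGE_SIZE;
}

/*
 * Lowest address of the stack RSP is assumed to be on.  For the
 * 1-page case this just rounds RSP down to its page, which is why a
 * 2-page debug stack is misjudged once it grows onto its second page.
 */
uint64_t sketch_stack_base(uint64_t rsp, uint64_t task_stack_base)
{
    if (sketch_stack_size(rsp, task_stack_base) == 2 * SKETCH_PAGE_SIZE)
        return task_stack_base;
    return rsp & ~(SKETCH_PAGE_SIZE - 1);
}
```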