    Add emulated breakpoint/singlestep code in Linux personality. · 71753d5e
    David Johnson authored
    This means that the personality can emulate breakpoints and
    singlesteps in userspace threads, on behalf of the generic
    OS Process overlay driver.  Thus, the OS Process driver can
    now be stacked atop the KVM/QEMU driver with real breakpoints.
    Previously, it could be stacked atop KVM/QEMU, but without
    userspace breakpoints... so it wasn't especially useful for
    active debugging.
    
    The OS Process driver was previously only able to stack atop
    the Xen driver, because the Xen driver supported hacked
    hypervisors that would divert userspace debug exceptions to
    the driver (as well as kernel debug exceptions).  Hacking
    QEMU's GDB stub to do this was undesirable and infeasible.
    (The QEMU/KVM support was built through the GDB driver, because
    QEMU provides a GDB server stub --- but supporting userspace
    breakpoints in an overlay driver atop our GDB driver was
    infeasible due to the "shared page" breakpoint issue.  The
    shared page issue occurs when we place a breakpoint at
    a userspace function's virtual address, and that same memory
    page is later mmap'd into another process.  Our overlay driver
    might be attached only to the first process, not the latter;
    but because the page is shared, we still have to emulate
    breakpoint handling when the latter process hits it.)
    
    We did this in a complicated way for Xen -- we hacked the
    hypervisor to siphon debug exceptions for kernel and
    userspace -- and then the OS Process overlay driver would
    just insert its breakpoints at *physical* addresses.
    This allowed the underlying Xen driver to recognize that
    a legit exception had occurred, and that it had to
    "handle" the exception even though no overlay driver was
    attached.
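    The reason physical addresses help can be shown with a small
    lookup sketch.  Everything here is hypothetical and simplified
    (the struct names, a flat table standing in for page tables);
    it only illustrates that keying breakpoints by physical address
    lets a hit in *any* process sharing the page be recognized, even
    one with no overlay attached.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_OFFSET_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)

/* Hypothetical: one virtual page of one process mapped to one physical frame. */
struct mapping {
    int pid;
    uint64_t vpage;   /* page-aligned virtual address */
    uint64_t pframe;  /* page-aligned physical address */
};

/* Hypothetical: a breakpoint recorded by physical address, plus the process
 * whose overlay target asked for it. */
struct phys_bp {
    uint64_t paddr;
    int owner_pid;
};

/* Translate vaddr in process pid to a physical address via the mapping table. */
static bool v2p(const struct mapping *maps, size_t nmaps, int pid,
                uint64_t vaddr, uint64_t *paddr)
{
    for (size_t i = 0; i < nmaps; i++) {
        if (maps[i].pid == pid && maps[i].vpage == (vaddr & ~PAGE_OFFSET_MASK)) {
            *paddr = maps[i].pframe | (vaddr & PAGE_OFFSET_MASK);
            return true;
        }
    }
    return false;
}

/* Find the breakpoint (if any) at the physical address behind (pid, vaddr).
 * Because the lookup is by physical address, a hit in any process that
 * shares the page is found, even if that process is not the owner. */
static const struct phys_bp *find_phys_bp(const struct phys_bp *bps, size_t nbps,
                                          const struct mapping *maps, size_t nmaps,
                                          int pid, uint64_t vaddr)
{
    uint64_t paddr;

    if (!v2p(maps, nmaps, pid, vaddr, &paddr))
        return NULL;
    for (size_t i = 0; i < nbps; i++)
        if (bps[i].paddr == paddr)
            return &bps[i];
    return NULL;
}
```

    If the returned entry's owner differs from the faulting process,
    no overlay is attached to that thread, and the driver must
    emulate the breakpoint itself rather than report it.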
    
    Since we could not do this for the QEMU/KVM GDB server stub
    (and because we're tired of having people install our hacked
    Xen), we now do the next obvious thing: emulate the userspace
    debug exceptions inside the OS personality!
    
    The Linux OS personality now places probes atop the kernel's
    do_int3 and do_debug interrupt handlers, if the underlying
    hypervisor cannot process those userspace-generated interrupts
    (like our hacked Xen can and does).  When they are hit, the
    personality checks whether the overlay OS Process driver
    placed a physical-page breakpoint at the faulting address,
    and whether an overlay target is attached to the faulting
    thread; if both hold, it notifies the overlay to handle the
    exception (and similarly for single stepping).  If the OS Process driver
    handles the exception, the personality *aborts* the interrupt
    handler with a return value of 0, to emulate success.  And the
    OS Process driver *has* handled the exception -- it resets
    the saved RIP on the interrupt stack to the breakpoint address,
    removes the breakpoint instruction, and sets EF_TF in RFLAGS
    on the interrupt stack.
    Thus, when the IRET happens in the kernel to return to userspace,
    the process singlesteps the instruction as it should.  The
    resulting debug trap arrives immediately in the kernel's
    do_debug function, where the personality handles it in the
    same manner to emulate the single step.
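    The int3/debug emulation sequence described above can be
    sketched as two probe callbacks.  This is a minimal sketch under
    stated assumptions, not the Stackdb code: the struct and function
    names are hypothetical, a byte array stands in for the code page,
    the breakpoint is matched by a single virtual address rather than
    a physical-page lookup, and the shared-page path (no overlay
    attached) is omitted.  X86_EFLAGS_TF is the x86 Trap Flag (bit 8
    of RFLAGS), corresponding to EF_TF in the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define X86_EFLAGS_TF UINT64_C(0x100) /* Trap Flag: bit 8 of RFLAGS */
#define INT3_OPCODE 0xCC

/* Hypothetical, simplified view of the user state saved on the interrupt stack. */
struct saved_regs {
    uint64_t rip;
    uint64_t rflags;
};

/* Hypothetical breakpoint record; text[] stands in for the code page. */
struct emul_bp {
    uint64_t addr;      /* virtual address of the 0xCC byte */
    uint8_t orig_byte;  /* instruction byte the 0xCC replaced */
    bool stepping;      /* true while we are single-stepping over it */
};

/* Probe on do_int3: returns true if the trap was ours and is now handled,
 * in which case the real handler would be aborted with a return value of 0. */
static bool emulate_int3(struct saved_regs *regs, uint8_t *text, uint64_t text_base,
                         struct emul_bp *bp, bool overlay_attached)
{
    uint64_t fault_addr = regs->rip - 1; /* int3 traps with RIP just past the 0xCC */

    if (!overlay_attached || fault_addr != bp->addr)
        return false;                             /* not ours: let the kernel handle it */
    text[fault_addr - text_base] = bp->orig_byte; /* remove the breakpoint */
    regs->rip = fault_addr;                       /* re-execute the original instruction */
    regs->rflags |= X86_EFLAGS_TF;                /* trap again after one instruction */
    bp->stepping = true;
    return true;
}

/* Probe on do_debug: finish the emulated single step. */
static bool emulate_debug(struct saved_regs *regs, uint8_t *text, uint64_t text_base,
                          struct emul_bp *bp)
{
    if (!bp->stepping)
        return false;
    regs->rflags &= ~X86_EFLAGS_TF;           /* stop stepping */
    text[bp->addr - text_base] = INT3_OPCODE; /* re-insert the breakpoint */
    bp->stepping = false;
    return true;
}
```

    Because TF is set in the *saved* RFLAGS, the CPU only starts
    trapping after IRET returns to userspace, which is why the step
    completes in do_debug rather than in the base driver.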
    
    The personality calls the (newly renamed)
    target_os_emulate_(bp|ss)_handler functions to handle the
    shared page breakpoint cases just like the Xen driver handled
    them; it's just slightly different because the personality
    now sets up the emulated single step.  The base driver (in this
    case, the QEMU/KVM/GDB base driver) can only step at the
    kernel level, effectively.
    
    However, we still support the "old way", where the Xen
    driver supports hacked hypervisors that siphon userspace
    debug exceptions.  This code should also support unmodified
    Xens, *if* the user passes the
    
      --hypervisor-ignores-userspace-exceptions
    
    argument (or sets it in the Xen config struct) -- but I
    haven't tried it.
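    The choice between the "old way" and the new emulation can be
    sketched as a simple dispatch.  The config struct, field, enum,
    and function names below are hypothetical stand-ins (the field
    merely mirrors the --hypervisor-ignores-userspace-exceptions
    option); the sketch shows only the decision, not how the base
    driver actually steps.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF UINT64_C(0x100) /* Trap Flag: bit 8 of RFLAGS */

/* Hypothetical config mirroring --hypervisor-ignores-userspace-exceptions. */
struct hv_config {
    bool hypervisor_ignores_userspace_exceptions;
};

enum ss_method {
    SS_BASE_DRIVER,  /* hacked hypervisor siphons userspace #DB: step via base driver */
    SS_SET_TRAP_FLAG /* stock hypervisor/stub: set TF in the thread's saved RFLAGS */
};

/* Probes on do_int3/do_debug are needed only when the hypervisor does not
 * deliver userspace debug exceptions to the driver. */
static bool need_kernel_handler_probes(const struct hv_config *cfg)
{
    return cfg->hypervisor_ignores_userspace_exceptions;
}

/* Single-step a userspace thread; user_rflags is its saved RFLAGS. */
static enum ss_method thread_singlestep(const struct hv_config *cfg, uint64_t *user_rflags)
{
    if (need_kernel_handler_probes(cfg)) {
        *user_rflags |= X86_EFLAGS_TF; /* CPU traps after one user instruction */
        return SS_SET_TRAP_FLAG;
    }
    /* A real driver would ask the base driver to step here (not shown). */
    return SS_BASE_DRIVER;
}
```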
    
    Other notes:
    
      * I had to add explicit support for x86_64 interrupt
        stacks.  These are interesting; there are 5 per-cpu
        stacks for different interrupts.  So what I do now
        is check whether RSP is within the per-task kernel
        stack pages (2 pages); if it is, we assume the stack
        is 2 pages.  If not, we assume we're on an alternate
        1-page stack.  This can break for debug exceptions,
        because debug exception stacks are 2 pages!  But,
        since we are overloading the do_int3 and do_debug
        handlers, *our drivers* won't break it, because the
        stack won't have grown onto the second (lower) page.
        The second page is probably mostly for handling
        nested debug exceptions, in-kernel KGDB, and that
        kind of thing.
    
      * I also hadn't supported writing back userspace regs
        for the current thread when it was a userspace thread
        executing in the kernel.  That is now supported.
    
      * There are some memmod tweaks to allow memmods that
        don't actually *write* their changes.  In other words,
        they exist to track modifications made in some
        other way --- like through a GDB server stub writing
        a breakpoint!  You see, Stackdb needs to track those
        GDB breakpoint states, in order to do things like
        insert arbitrary code at a breakpoint.  Stackdb (and
        particularly this commit) allows functions to
        be "aborted", meaning instead of executing the real
        instruction at a breakpoint, we insert a RET instr,
        fix the RSP to garbage-collect the frame, and then
        single step.  This commit relies on that support to
        hook the interrupt handlers, but then "obviate" them,
        since we've handled the exception.  There is new
        code in the probe stuff too to handle this case, and
        it's all because it seemed to me in early debugging
        that the QEMU GDB server stub got kinda ticked off
        when I tried to overwrite its breakpoints temporarily.
        This did seem weird to me; maybe it was a side-effect
        of a separate bug.  Anyway, now this situation is
        actually modeled correctly --- the GDB driver yanks
        out the GDB breakpoint before inserting the return
        code --- and then it puts it back afterwards... and
        of course it's appropriately abstracted.
    
      * The OS Process driver now indirects its single steps
        through the underlying OS personality, via
        target_os_thread_singlestep.  This is the magic that
        allows us to support either 1) hypervisors that
        siphon off debug exceptions like our hacked Xen; and
        2) hypervisors/stubs that do not siphon them off,
        like QEMU's GDB stub.  If we have situation 1, we
        singlestep the base driver directly; if we have 2,
        we set up the userspace thread state to execute a
        single step, by modifying RFLAGS to set the EF_TF
        flag.
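    The x86_64 interrupt-stack heuristic from the first note can be
    sketched as follows.  This is a minimal sketch assuming 4 KiB
    pages, a 2-page per-task stack whose base address is known, and
    page-aligned per-cpu interrupt stacks; the function name is
    hypothetical.  Stacks grow down, so the saved userspace registers
    sit just below the stack top this returns.

```c
#include <stdint.h>

#define PAGE_SIZE UINT64_C(4096)
#define TASK_STACK_PAGES 2 /* per-task kernel stack size described in the commit */

/* Guess the top (exclusive end) of the kernel stack that rsp lies on.
 * task_stack_base is the lowest address of the thread's per-task stack. */
static uint64_t guess_stack_top(uint64_t rsp, uint64_t task_stack_base)
{
    uint64_t task_stack_size = TASK_STACK_PAGES * PAGE_SIZE;

    if (rsp >= task_stack_base && rsp < task_stack_base + task_stack_size)
        return task_stack_base + task_stack_size;

    /* Otherwise assume a 1-page per-cpu interrupt stack: its top is the end
     * of the page that rsp is on. */
    return (rsp & ~(PAGE_SIZE - 1)) + PAGE_SIZE;
}
```

    On a 2-page debug stack, RSP stays in the upper page as long as
    the handler has not grown onto the lower one, so the 1-page guess
    still yields the correct top; only an RSP in the lower page
    produces a wrong answer.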