1. 09 Oct, 2014 3 commits
    • David Johnson's avatar
      Merge branch 'tapref' · 7211c34b
      David Johnson authored
      A quick summary of new features and changes:
        * Split of xen_vm and xen_vm_process drivers into xen_vm driver
          and os_linux_generic personality
        * KVM/QEMU support through a GDB driver, which is pluggable to
          enhance its abilities beyond what GDB stubs can do; naturally there
          is a KVM/QEMU enhancer.
        * Better architecture support, and a register cache data structure.
        * Deliberate support for target events.
        * Improved active probing mechanisms.
        * Expansion of the OS and Process personalities.
        * Mainly speed improvements to dwdebug.
  2. 22 Sep, 2014 1 commit
  3. 20 Sep, 2014 4 commits
    • David Johnson's avatar
      Add some real examples, and an example Makefile, for users. · e0e595f2
      David Johnson authored
      This Makefile uses pkg-config based on the .pc files installed for
      the target and vmilib sub-libraries, and assumes those files were
      installed in @prefix@ -- which is how things should be.
      Users should set up their Makefiles like this to build and link
      against the installed libraries.
      Also, I wrote two quick examples, demonstrating the use of the
      ptrace driver specifically, and the target library generally.
      ptrace.c allows you to attach to a single process and probe it.
      ptrace-multi-process.c allows you to attach to multiple processes,
      and probe them selectively.  This shows how to use target_poll() instead
      of target_monitor(), etc.
      ptrace.c is heavily commented; ptrace-multi-process.c assumes you've
      read the comments in ptrace.c, but it does add a few comments
      explaining some of the differences.
    • David Johnson's avatar
      Install a bunch more header files. · a1e6ad7e
      David Johnson authored
    • David Johnson's avatar
      Add the --stay-paused (-P) option. · 81d03abf
      David Johnson authored
      Not all drivers can support this; the GDB driver makes an attempt.
      Overlay drivers probably can't support it; ours don't right now.
      The Xen and Ptrace drivers work.
      -P used to specify --start-paused, but that means nothing right now;
      that would only change if we ever implement --async (which would
      imply --read-only too).
      Anyway... this feature is long overdue.
    • David Johnson's avatar
      Increase the ptrace driver's poll() capabilities. · 3dc780dd
      David Johnson authored
      Before this, both the _poll() and _monitor() driver backend functions
      were using waitpid(-1,...).  That's great for single-target scenarios,
      even multithreaded ones -- but not when several targets are attached
      at once, because waitpid(-1,...) returns events for any child.
      So we make the best of the situation, and now use waitpid(-pid,...) in
      _poll().  This allows us to wait for any child in pid's process group.
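      A minimal sketch of the waitpid(-pgid,...) idea described above, not
      the driver's actual code.  The function name reap_from_group() is
      hypothetical, and it blocks rather than polls so that the result is
      deterministic; a real _poll() would presumably pass WNOHANG.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, put it in its own process group,
 * and reap it with waitpid(-pgid, ...).  Returns the child's exit code,
 * or -1 on error.
 */
static int reap_from_group(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: become leader of a new process group, then exit. */
        setpgid(0, 0);
        _exit(exit_code);
    }

    /*
     * Parent: also set the group, so it exists even if the child has not
     * run yet.  A failure here is harmless if the child already did it.
     */
    (void)setpgid(child, child);

    /* A pid argument below -1 means "any child whose pgid is -pid". */
    int status = 0;
    pid_t got = waitpid(-child, &status, 0);
    if (got != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

      Because the wait is scoped to one process group, events from children
      in other groups (other targets) are left for their own waiters.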
  4. 19 Sep, 2014 9 commits
    • David Johnson's avatar
      Big speedup of the ELF range handling code. · e6c2f96d
      David Johnson authored
      For quite some time now, the ELF symbol loader has attempted to
      assign sizes to 0-byte length symbols.  We do this to create an
      indexed (by base addr) list of symbol *ranges*, which we can then
      use to search the program's address space.  We can debate the
      merits of this, but anyway...
      Before, I had written some complex code that would call the very
      expensive clrange_update_end to change the range of an already-
      inserted symbol.  This is quite complicated, and it was incredibly
      slow.
      So what this new code does is save 0-byte length symbols until a
      final pass at the end, and it does not add them to the range data
      structure as it is reading the ELF symbol tables, like before.
      Thus there is no update to be done; we can just add the symbols to
      the range structure if we can infer a length based on the following
      symbol or the end of the ELF section.  Note we do this in reverse
      order, of course, because the "following" symbol for any given
      0-byte length symbol X might not have been added yet!  So we have to
      go backwards.
      This is much faster, O(n) instead of O(n**2), if I recall correctly
      what it was.  Silly me.  Saves *seconds* on massive kernel debugfiles.
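      The reverse pass can be sketched as follows.  This is an illustration
      only: it uses a plain address-sorted array in place of the real range
      structure, and struct sym_range, infer_zero_sizes(), and
      demo_inferred_size() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ELF symbol: start address and length. */
struct sym_range {
    uint64_t addr;
    uint64_t size;
};

/*
 * syms must be sorted by ascending addr.  Walk backwards so that the
 * "following" symbol of each 0-byte symbol has already been visited,
 * and give each 0-byte symbol a size that reaches the next higher
 * address (or the end of the section).  One pass, O(n).
 * Returns the number of symbols that received an inferred size.
 */
static size_t infer_zero_sizes(struct sym_range *syms, size_t n,
                               uint64_t section_end)
{
    assert(syms != NULL || n == 0);

    size_t fixed = 0;
    uint64_t next_start = section_end;

    for (size_t i = n; i-- > 0; ) {
        /* Aliases share an address; only a strictly higher one counts. */
        if (i + 1 < n && syms[i + 1].addr > syms[i].addr)
            next_start = syms[i + 1].addr;
        if (syms[i].size == 0 && next_start > syms[i].addr) {
            syms[i].size = next_start - syms[i].addr;
            fixed++;
        }
    }
    return fixed;
}

/* Small demo: returns the size of symbol idx after inference. */
static uint64_t demo_inferred_size(size_t idx)
{
    struct sym_range syms[] = {
        { 0x1000, 0 },     /* ends at the next symbol */
        { 0x1010, 0x20 },  /* already sized, left alone */
        { 0x1040, 0 },     /* alias pair: both run to section end */
        { 0x1040, 0 },
    };
    assert(idx < 4);
    infer_zero_sizes(syms, 4, 0x1100);
    return syms[idx].size;
}
```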
    • David Johnson's avatar
      Fix active thread_entry probing when copy_process is not probe-able. · c0218280
      David Johnson authored
      On some Linuxes, copy_process (the best previous point we found to
      hook to catch the new process after it's been fully created, or nearly
      so) can't be probed the way we need.  Basically, it can get inlined,
      and we don't have function entry/exit probes for inlined functions
      (and we never will, because the inlined functions of course don't
      have well-defined exit points anymore...).
      So, we ifdef out the copy_process stuff, and place probes
      on wake_up_new_task instead.  Not sure why I didn't use this initially,
      but it seems to be a good choice.  wake_up_new_task basically puts the
      task onto the run queues, so it's a great choice for us.
    • David Johnson's avatar
      Make sure threads sleeping in schedule() on x86_64 have an IP. · ce8e9abf
      David Johnson authored
      On 32-bit x86, Linux saves %eip in thread_struct during the context
      switch that is triggered when a thread calls schedule().
      Linux on x86_64 doesn't have to save %rip in context switch after
      schedule(); the context switch is effected by swapping new/old %rsp
      (and of course all the other state too).  This means there is no
      saved %rip for a thread that has "stopped" executing during context
      switch.  This meant we could not backtrace sleeping threads.  Well,
      there is one exception to this rule -- threads that were preempted
      in the kernel will of course have %rip on the stack.  We already
      handle that one.
      So these sleeping threads that are waiting to be context-switched in
      might not have saved %rip, but we know where it is.  Basically, for
      preexisting threads, we choose to set the %rip at the point in context
      switch where the scheduler has just saved its %rsp into thread_struct;
      so the next instruction (when the sleeping thread is resumed) will be
      to set the current %rsp to the value that was saved in thread_struct.
      This seems to be the correct point because we want our backtracer to
      see a "correct" stack (plus, the rest of schedule() and __switch_to
      happen on the stack of the incoming thread).
      But the situation is different for newly-forked threads.  We detect
      those the same way the kernel does -- _TIF_FORK set in
      thread_info->flags.  In this case, the thread immediately returns via
      ret_from_fork as it starts its execution.  So that is what we set the
      IP to in this case.
      A backtrace in every pot.
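      The choice of IP described above boils down to one test on the
      thread's flags.  A sketch, assuming the caller has already resolved
      both candidate addresses; the function name is hypothetical, and the
      bit position 18 for TIF_FORK is an assumption about x86 kernels of
      this era that should be checked against the target's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: TIF_FORK is bit 18 of thread_info->flags on the target. */
#define SKETCH_TIF_FORK (1UL << 18)

/*
 * Pick a synthetic saved IP for a thread that is not currently running.
 *   switch_resume_ip: address in the context switch code just after the
 *                     outgoing thread's %rsp was saved
 *   ret_from_fork_ip: address of ret_from_fork
 */
static uint64_t sleeping_thread_ip(unsigned long ti_flags,
                                   uint64_t switch_resume_ip,
                                   uint64_t ret_from_fork_ip)
{
    assert(switch_resume_ip != 0 && ret_from_fork_ip != 0);

    /* Newly forked threads start life in ret_from_fork. */
    if (ti_flags & SKETCH_TIF_FORK)
        return ret_from_fork_ip;

    /* Everything else resumes inside the context switch. */
    return switch_resume_ip;
}
```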
    • David Johnson's avatar
      Improve our handling of page dir values per thread. · 2c5a15b7
      David Johnson authored
      First, fix up the loading of swapper_pg_dir -- newer kernels don't have
      it.  If it's not there, read init_mm.pgd instead.
      Then, when loading threads from memory, if the thread is a kernel thread,
      instead of leaving the thread's pgd value empty, set it to whatever we
      got from the above detection.
      Finally, also save each thread's current pgd value in our personality
      thread info, and invalidate it at the right time to force it to be
      reloaded.  No sense in re-translating it (v2p) each time we need it,
      even if we're caching the translation in a mem backend...
    • David Johnson's avatar
      Handle a special case where a PV guest is in the Xen hypercall page. · 71a1fd65
      David Johnson authored
      In this case, we can't load the current thread, because (at least
      sometimes!) the current thread_info->task pointer is messed up --
      like 0xffffffff00000000 .  This is strange, but observable.  So
      instead of failing, we just load init_task as the task.  This seems
      to be the right thing to do... we're on that stack in the hypercall.
      At least it allows us to backtrace out of the hypercall stack.
      However, the OS Linux personality has to read the hypercall_page
      symbol to find it; and then the personality sets a special config
      key that the target driver can read (in this case, the Xen driver
      saves it off, but it doesn't currently do anything with it).
      I believe this should fix the behavior we used to see where the
      current thread failed to load on Xen PVMs.  Details supporting
      my analysis:
      My IP was in xen_hypercall_sched_op on a 3.8.x PV.  The current
      thread_info (pointed to by %gs+static_offset because it's percpu data on
      x86_64, and percpu data is indexed off the %gs pointer) in this case is
      the global thread_info associated with the init_task task struct (both
      structs point to each other).  You can check that the default
      thread_info static data is correctly, statically initialized by reading
      the vmlinux binary.
      But by the time I am in xen_hypercall_sched_op (on PV x86_64), the
      default thread_info->task pointer is munged so it no longer points to
      init_task, and instead points to 0xffffffff00000000, and that is not valid.
      I took a fast look in the kernel src, especially the entry.S asm, and
      can't see this manipulation -- but it's the kind of thing that is easy
      to hide.  Maybe this is a "convenient" way of detecting illegal uses of
      current() in interrupt context, or when making a hypercall (current()
      is the current task, basically *(%gs+thread_info_offset)).
      Now, I *did* have a guard against loading *current() in interrupt
      context (it's only valid in process context) -- I check the
      thread_info->preempt_count field to see if we're in an interrupt context
      -- but when making a hypercall we are not in interrupt context,
      apparently (or at least not necessarily in one -- and presumably
      xen_hypercall_sched_op is a yield via the idle path).
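      The fallback described above can be sketched as a range check on the
      IP.  The names are hypothetical, and the sketch assumes the hypercall
      page is a single 4096-byte page whose base address comes from the
      hypercall_page symbol; addresses are passed in as plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the hypercall page is one 4 KiB page. */
#define SKETCH_PAGE_SIZE 4096ULL

/* True if ip lies inside the page starting at hypercall_page. */
static int ip_in_hypercall_page(uint64_t ip, uint64_t hypercall_page)
{
    assert((hypercall_page % SKETCH_PAGE_SIZE) == 0);
    return ip >= hypercall_page && ip - hypercall_page < SKETCH_PAGE_SIZE;
}

/*
 * Choose which task struct address to load as "current".
 * Inside the hypercall page the thread_info->task pointer may be bogus,
 * so fall back to init_task there; otherwise trust thread_info->task.
 */
static uint64_t choose_current_task(uint64_t ip, uint64_t hypercall_page,
                                    uint64_t ti_task, uint64_t init_task)
{
    if (ip_in_hypercall_page(ip, hypercall_page))
        return init_task;
    return ti_task;
}
```

      The addresses used to exercise this are arbitrary; only the
      in-page/out-of-page distinction matters.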
    • David Johnson's avatar
      Note the appearance of tgid in the thread struct in the SPF README. · 29609bea
      David Johnson authored
      It's just a new key to filter on in thread dumps.
  5. 17 Sep, 2014 2 commits
  6. 05 Sep, 2014 5 commits
  7. 04 Sep, 2014 4 commits
  8. 03 Sep, 2014 3 commits
  9. 02 Sep, 2014 3 commits
    • David Johnson's avatar
      Refactor part two: convert xen-process target into generic os-process. · 39819b65
      David Johnson authored
      This commit contains the conversion of the target_xen_vm_process target
      into the target_os_process target.  What this basically means is that
      most of the linux-specific and Xen-specific code that was in the
      previous driver is now moved into the os_linux_generic personality; and
      the logic in the target_xen_vm_process driver is now part of the
      target_os_process driver.  That driver is completely generic; it can sit
      atop any other driver that is paired with an OS personality.  The
      os_process driver gets all its information from loading target_process
      objects from the underlying driver, and using that information to build
      its own model of the world.  Its active probing support is implemented
      by enabling OS_PROCESS active probing in the underlying target, and
      listening for the target-object-events that are generated -- and
      handling them.  There is better generic support for both the OS and
      process level.
      One core piece of this refactor is the use of the full weak reference
      support.  This basically means that many target objects assure the
      validity of backref pointers better -- and that deallocation can happen
      to refcnt'd objects when weak refs are dropped -- not just regular refs.
      I use weak refs as a cyclic dependency breaker, which is all I need.
      Kind of weird, but I don't need a completely generic weak refcnt system!
      Anyway, what this means is that we are now ready for the case where a
      language binding can hold onto C-level objects in crazy ways (i.e.,
      given the standard target object hierarchy of
      target->space->region->range, a Python binding could be able to hold
      onto range, drop the target object, and the whole backref chain will be
      intact, and get dropped when range is dropped).  (One thing the weak ref
      stuff does mean is that the <type>_free deallocators must be more
      careful!  Even if their refcnt has gone to nil, they must hold a temp
      ref if they might free things that hold a weak ref to them -- because if
      both the refcnt and refcntw counts go to 0, the deallocator will be
      called again on an RPUTW.  Hence the use of RWGUARD/RWUNGUARD.)
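      To illustrate why the deallocator needs a temporary reference, here
      is a toy model.  It is not the library's RHOLD/RPUTW/RWGUARD macros:
      struct node and every function below are hypothetical, and nothing is
      actually freed, so the number of deallocator runs can be counted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object with a strong count and a weak count. */
struct node {
    int refcnt;          /* strong references */
    int refcntw;         /* weak references */
    int teardowns;       /* times the deallocator ran to completion */
    struct node *child;  /* strong ref to an owned child */
    struct node *parent; /* weak back-reference to the owner */
};

static void node_teardown(struct node *n, int guard);

/* Drop a strong ref; run the deallocator when it reaches zero. */
static void node_put(struct node *n, int guard)
{
    assert(n->refcnt > 0);
    if (--n->refcnt == 0)
        node_teardown(n, guard);
}

/* Drop a weak ref; run the deallocator if both counts are now zero. */
static void node_putw(struct node *n, int guard)
{
    assert(n->refcntw > 0);
    if (--n->refcntw == 0 && n->refcnt == 0)
        node_teardown(n, guard);
}

static void node_teardown(struct node *n, int guard)
{
    if (guard)
        n->refcnt++;            /* temp ref: blocks re-entry via putw */

    if (n->child) {
        struct node *c = n->child;
        n->child = NULL;
        node_put(c, guard);     /* child drops its weak ref to us */
    }
    if (n->parent) {
        struct node *p = n->parent;
        n->parent = NULL;
        node_putw(p, guard);
    }

    if (guard)
        n->refcnt--;            /* release the temp ref */
    n->teardowns++;             /* a real deallocator would free(n) here */
}

/* Returns how many times the parent's deallocator ran. */
static int demo_parent_teardowns(int guard)
{
    struct node parent = { .refcnt = 1, .refcntw = 1 };
    struct node child = { .refcnt = 1, .parent = &parent };

    parent.child = &child;
    node_put(&parent, guard);
    return parent.teardowns;
}
```

      Without the guard, the parent's deallocator runs twice (the second
      time re-entered from the child's weak put), which in real code would
      be a double free.  With the guard it runs once.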
      Drivers can now generate target events to broadcast new, changed, or
      deleted target objects.  This replaces the state change stuff.  Events
      are now broadcast to relevant parties -- namely underlying and
      overlaying targets.  For instance, active probing support in the
      os_process target is built atop the assumption that the underlying
      target supports active OS_PROCESS probing, and generates OS_PROCESS
      events when things change.
      I also needed to make the active probing mechanism a bit more useful.  So
      rather than just the thread_entry, thread_exit, and memory flags for
      each target, I split them into personality-level-specific flags:
      os_thread_entry, process_memory, etc.  This allowed me to add a set of
      os_process_* active probing flags, which the os_linux_generic
      personality understands, and uses to dynamically track processes and
      their address spaces.  This is the bit that was critical to enabling part of
      the conversion from target_xen_process to target_os_process, because it
      allowed me to have the linux personality track processes and their
      addrspaces actively, and to have the generic os_process driver toggle
      that via active probing.  I'm not sure how active probing will change in
      the future; hopefully this is enough for now...
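      A sketch of what level-specific probing flags could look like as a
      bitmask.  Every identifier here is hypothetical (the real flag names
      and values are not shown in this log); the point is only that an
      overlay can compute which underlying flags to switch on or off.

```c
#include <assert.h>

/* Hypothetical per-level active probing flags. */
enum sketch_probe_flag {
    SKETCH_OS_THREAD_ENTRY      = 1u << 0,
    SKETCH_OS_THREAD_EXIT       = 1u << 1,
    SKETCH_OS_MEMORY            = 1u << 2,
    SKETCH_PROCESS_THREAD_ENTRY = 1u << 3,
    SKETCH_PROCESS_THREAD_EXIT  = 1u << 4,
    SKETCH_PROCESS_MEMORY       = 1u << 5,
};

#define SKETCH_OS_LEVEL \
    (SKETCH_OS_THREAD_ENTRY | SKETCH_OS_THREAD_EXIT | SKETCH_OS_MEMORY)
#define SKETCH_PROCESS_LEVEL \
    (SKETCH_PROCESS_THREAD_ENTRY | SKETCH_PROCESS_THREAD_EXIT | \
     SKETCH_PROCESS_MEMORY)
#define SKETCH_ALL (SKETCH_OS_LEVEL | SKETCH_PROCESS_LEVEL)

/* Flags that are wanted but not yet active. */
static unsigned flags_to_enable(unsigned current, unsigned wanted)
{
    assert(((current | wanted) & ~(unsigned)SKETCH_ALL) == 0);
    return wanted & ~current;
}

/* Flags that are active but no longer wanted. */
static unsigned flags_to_disable(unsigned current, unsigned wanted)
{
    assert(((current | wanted) & ~(unsigned)SKETCH_ALL) == 0);
    return current & ~wanted;
}
```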
      Also, refactor the memory code (addrspaces, memregions, memranges) to
      use GLists instead of the Linux linked lists.  We don't need the
      advantages of the Linux linked lists.  Also, employ weak refs and object
      liveness data and macros throughout, instead of the old custom "new" and
      "updated" (et al) members.  Much improved.  Technically, a user could
      now hold onto a memrange, and the object chain all the way up to the
      containing target or target_process should be intact until they drop the
      memrange; perfect.  This means they're easier to expose to users via a
      language binding, and more amenable to reuse throughout the target
      library.  For example, the target_process object owns a disconnected
      (meaning, not bound to a target) addrspace object, which is populated
      with regions and ranges like a normal addrspace describing a valid
      target.  Thus, we avoid duplicating a lot of code and/or logic, which
      is what we'd get from having silly duplicates like target_process and
      target_os_process, or from having multiple kinds of memregions.  There
      is some waste; memregions and spaces that are not bound to targets
      will never be fully utilized; but lots of code reuse.
      A few more notes:
      This commit introduces target_finalize as the replacement for
      target_free.  Basically, the target library never RHOLDs target objects
      on behalf of users; it only caches them in its global hashtable (iff the
      user has called target_init()), and holds them there.  target_free thus
      becomes the refcnt-callable deallocator for target objects, and thus
      cannot be exposed to users.  target_finalize is a better pairing for
      target_instantiate() anyway.
      target_notify_overlay() now passes target_exception_flags_t to better
      encapsulate the nature of the underlying exception.  We'll see if this
      is a good compromise between abstraction and exception handling, or not.
      Get rid of all the personality wrapper functions and replace them with
      SAFE_* macros.  These are better!
      Make debugfile caching explicit, and expose functions to evict
      unnecessary debugfiles.  This also handles the cached debugfile RHOLDs
      appropriately in dwdebug_fini().  Much better.
    • David Johnson's avatar
      Add helper macros for GLists; etc. · 03048720
      David Johnson authored
  10. 27 Aug, 2014 1 commit
  11. 26 Aug, 2014 1 commit
  12. 22 Aug, 2014 1 commit
  13. 14 Aug, 2014 3 commits
    • Mike Hibler's avatar
      Implement kill_parent_process, fixes to command startup, robustness fixes. · 99078688
      Mike Hibler authored
      Actually recognize the kill_parent_process recovery rule and invoke the
      kernel module correctly.
      Changes to make startup of process work (more) correctly. Argument parsing
      was mishandled in some cases. Start all commands via "sh -c" so that there
      is a "regular" parent process to kill with future kill_parent_process
      invocations. Otherwise, we would kill the kernel user-mode helper thread.
      Make sure we don't ever try to kill process 1 (which can happen for a
      detached process that triggers the kill_parent rules). Otherwise, the
      party ends rather abruptly.
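      A sketch of the two ideas above: launching a command through
      "sh -c" so it has an ordinary parent process, and refusing to kill
      pid 1.  Both function names are hypothetical, /bin/sh is assumed to
      exist, and the sketch waits for the command only so that its exit
      status can be checked.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Guard for a kill_parent_process style rule: never target init (pid 1),
 * and reject 0 or negative values, which kill() treats as process groups.
 */
static int may_kill_parent(pid_t ppid)
{
    return ppid > 1;
}

/*
 * Run cmd as "sh -c cmd" so the command gets a regular shell parent.
 * Returns the shell's exit status, or -1 on error.
 */
static int run_via_shell(const char *cmd)
{
    assert(cmd != NULL);

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```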
      Get rid of all the "exit(0)" calls within functions called when taking
      a snapshot, just return an error so that we can retry the snapshot (after
      letting the guest run briefly). This required further memory cleanup
      code in a (large) number of places. These inconsistent snapshots seem
      to happen quite a lot when running at high frequency and when the guest
      is busy.
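      The return-an-error-and-retry pattern described above can be sketched
      like this.  The callback types, snapshot_with_retry(), and the fake
      guest used for the demo are all hypothetical; the real snapshot code
      and its cleanup paths are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 on a consistent snapshot, -1 on an inconsistent one. */
typedef int (*snapshot_fn)(void *ctx);
/* Lets the guest run briefly before the next attempt. */
typedef void (*resume_fn)(void *ctx);

/*
 * Try to take a snapshot up to max_tries times.  On failure, let the
 * guest run and try again instead of exiting the process.
 * Returns the attempt number that succeeded, or -1 if all failed.
 */
static int snapshot_with_retry(snapshot_fn take, resume_fn let_guest_run,
                               void *ctx, int max_tries)
{
    assert(take != NULL && max_tries > 0);

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (take(ctx) == 0)
            return attempt;
        if (let_guest_run != NULL && attempt < max_tries)
            let_guest_run(ctx);
    }
    return -1;
}

/* Fake guest for the demo: fails a fixed number of times, then succeeds. */
struct fake_guest {
    int failures_left;
    int resumes;
};

static int fake_take(void *ctx)
{
    struct fake_guest *g = ctx;
    if (g->failures_left > 0) {
        g->failures_left--;
        return -1;
    }
    return 0;
}

static void fake_resume(void *ctx)
{
    struct fake_guest *g = ctx;
    g->resumes++;
}

static int demo_retry(int failures, int max_tries)
{
    struct fake_guest g = { failures, 0 };
    return snapshot_with_retry(fake_take, fake_resume, &g, max_tries);
}
```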
    • Mike Hibler's avatar
      Make sure trusted load module gets built/installed. · d0ece4e6
      Mike Hibler authored
      De-lint said module.