1. 11 Oct, 2007 7 commits
  2. 18 Aug, 2007 1 commit
  3. 18 Jul, 2007 1 commit
      xen: Core Xen implementation · 5ead97c8
      Jeremy Fitzhardinge authored
      This patch is a rollup of all the core pieces of the Xen
      implementation, including:
       - booting and setup
       - pagetable setup
       - privileged instructions
       - segmentation
       - interrupt flags
       - upcalls
       - multicall batching
      The vmlinux image is decorated with ELF notes which tell the Xen
      domain builder what the kernel's requirements are; the domain builder
      then constructs the address space accordingly and starts the kernel.
      Xen has its own entrypoint for the kernel (contained in an ELF note).
      The ELF notes are set up by xen-head.S, which is included into head.S.
      In principle it could be linked separately, but it seems to provoke
      lots of binutils bugs.
      Because the domain builder starts the kernel in a fairly sane state
      (32-bit protected mode, paging enabled, flat segments set up), there's
      not a lot of setup needed before starting the kernel proper.  The main
      steps are:
        1. Install the Xen paravirt_ops, which is simply a matter of a
           structure assignment.
        2. Set init_mm to use the Xen-supplied pagetables (analogous to the
           head.S generated pagetables in a native boot).
        3. Reserve address space for Xen, since it takes a chunk at the top
           of the address space for its own use.
        4. Call start_kernel()
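      Step 1 above can be illustrated with a small C sketch. It is not the kernel's code: the struct layout, every sketch_ name and the two operations shown are hypothetical and far smaller than the real paravirt_ops. It only shows that switching implementations is a plain structure assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced operations table (not the kernel's layout). */
struct sketch_pv_ops {
    const char *name;
    void (*irq_disable)(void);
    void (*irq_enable)(void);
};

static int sketch_native_if = 1;  /* stand-in for the EFLAGS IF bit */
static int sketch_xen_mask;       /* stand-in for the Xen event mask */

static void sketch_native_irq_disable(void) { sketch_native_if = 0; }
static void sketch_native_irq_enable(void)  { sketch_native_if = 1; }
static void sketch_xen_irq_disable(void)    { sketch_xen_mask = 1; }
static void sketch_xen_irq_enable(void)     { sketch_xen_mask = 0; }

/* The table the rest of the code calls through; it starts out native. */
static struct sketch_pv_ops sketch_active_ops = {
    .name        = "bare hardware",
    .irq_disable = sketch_native_irq_disable,
    .irq_enable  = sketch_native_irq_enable,
};

static const struct sketch_pv_ops sketch_xen_ops = {
    .name        = "Xen",
    .irq_disable = sketch_xen_irq_disable,
    .irq_enable  = sketch_xen_irq_enable,
};

/* Installing the Xen operations is one structure assignment. */
static void sketch_install_xen_ops(void)
{
    sketch_active_ops = sketch_xen_ops;
    assert(sketch_active_ops.irq_disable != NULL);
    assert(sketch_active_ops.irq_enable != NULL);
}
```

      Callers keep using sketch_active_ops.irq_disable() and never need to know which implementation is installed.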
      Once we hit the main kernel boot sequence, it will end up calling back
      via paravirt_ops to set up various pieces of Xen specific state.  One
      of the critical things which requires a bit of extra care is the
      construction of the initial init_mm pagetable.  Because Xen places
      tight constraints on pagetables (an active pagetable must always be
      valid, and must always be mapped read-only to the guest domain), we
      need to be careful when constructing the new pagetable to keep these
      constraints in mind.  It turns out that the easiest way to do this is
      use the initial Xen-provided pagetable as a template, and then just
      insert new mappings for memory where a mapping doesn't already exist.
      This means that during pagetable setup, it uses a special version of
      xen_set_pte which ignores any attempt to remap a read-only page as
      read-write (since Xen will map its own initial pagetable as RO), but
      lets other changes to the ptes happen, so that things like NX are set
      properly.
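      The rule for the init-time pte setter can be sketched as follows. This is a simplified model under assumptions: the 64-bit pte layout, the SKETCH_ bit names and the function name are hypothetical, and the real code goes through Xen rather than writing memory directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified pte: bit 0 present, bit 1 writable, bit 63 NX. */
typedef uint64_t sketch_pte_t;
#define SKETCH_PTE_PRESENT ((sketch_pte_t)1 << 0)
#define SKETCH_PTE_RW      ((sketch_pte_t)1 << 1)
#define SKETCH_PTE_NX      ((sketch_pte_t)1 << 63)

/*
 * Init-time setter: if a mapping already exists, the new value may only
 * keep the RW bit when the old value had it.  All other bits (NX, for
 * example) are taken from the new value unchanged.
 */
static void sketch_set_pte_init(sketch_pte_t *ptep, sketch_pte_t newval)
{
    assert(ptep != NULL);

    if (*ptep & SKETCH_PTE_PRESENT)
        newval &= (*ptep & SKETCH_PTE_RW) | ~SKETCH_PTE_RW;

    *ptep = newval;
}
```

      An entry that Xen mapped read-only therefore stays read-only, while an empty slot accepts whatever mapping is requested.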
      When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
      This means that it is more privileged than user-mode in ring 3, but it
      still can't run privileged instructions directly.  Non-performance
      critical instructions are dealt with by taking a privilege exception
      and trapping into the hypervisor and emulating the instruction, but
      more performance-critical instructions have their own specific
      paravirt_ops.  In many cases we can avoid having to do any hypercalls
      for these instructions, or the Xen implementation is quite different
      from the normal native version.
      The privileged instructions fall into the broad classes of:
        Segmentation: setting up the GDT and the GDT entries, LDT,
           TLS and so on.  Xen doesn't allow the GDT to be directly
           modified; all GDT updates are done via hypercalls where the new
           entries can be validated.  This is important because Xen uses
           segment limits to prevent the guest kernel from damaging the
           hypervisor itself.
        Traps and exceptions: Xen uses a special format for trap entrypoints,
           so when the kernel wants to set an IDT entry, it needs to be
           converted to the form Xen expects.  Xen sets int 0x80 up specially
           so that the trap goes straight from userspace into the guest kernel
           without going via the hypervisor.  sysenter isn't supported.
        Kernel stack: The esp0 entry is extracted from the tss and provided to
           Xen.
        TLB operations: the various TLB calls are mapped into corresponding
           Xen hypercalls.
        Control registers: all the control registers are privileged.  The most
           important is cr3, which points to the base of the current pagetable,
           and we handle it specially.
      Another instruction we treat specially is CPUID, even though it's not
      privileged.  We want to control what CPU features are visible to the
      rest of the kernel, and so CPUID ends up going into a paravirt_op.
      Xen implements this mainly to disable the ACPI and APIC subsystems.
      Xen maintains its own separate flag for masking events, which is
      contained within the per-cpu vcpu_info structure.  Because the guest
      kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
      ignored (and must be, because even if a guest domain disables
      interrupts for itself, it can't disable them overall).
      (A note on terminology: "events" and interrupts are effectively
      synonymous.  However, rather than using an "enable flag", Xen uses a
      "mask flag", which blocks event delivery when it is non-zero.)
      There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
      are implemented to manage the Xen event mask state.  The only thing
      worth noting is that when events are unmasked, we need to explicitly
      see if there's a pending event and call into the hypervisor to make
      sure it gets delivered.
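      The mask-flag handling described above can be sketched like this. It is a user-space model under assumptions: the struct is a cut-down stand-in for vcpu_info, all sketch_ names are hypothetical, the hypercall is replaced by a counter, and the compiler barrier a real implementation would need between unmasking and checking for pending events is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down, hypothetical stand-in for the per-cpu vcpu_info structure. */
struct sketch_vcpu_info {
    uint8_t evtchn_upcall_pending;  /* non-zero: an event is waiting */
    uint8_t evtchn_upcall_mask;     /* non-zero: delivery is blocked */
};

static struct sketch_vcpu_info sketch_vcpu;
static int sketch_forced_callbacks;  /* counts simulated hypervisor calls */

/* Placeholder for the call that asks the hypervisor to deliver events. */
static void sketch_force_evtchn_callback(void)
{
    sketch_forced_callbacks++;
    sketch_vcpu.evtchn_upcall_pending = 0;
}

/* cli equivalent: block event delivery. */
static void sketch_irq_disable(void)
{
    sketch_vcpu.evtchn_upcall_mask = 1;
}

/* sti equivalent: unmask, then explicitly check for a pending event. */
static void sketch_irq_enable(void)
{
    sketch_vcpu.evtchn_upcall_mask = 0;
    if (sketch_vcpu.evtchn_upcall_pending)
        sketch_force_evtchn_callback();
}

/* save_fl equivalent: report the current mask state. */
static uint8_t sketch_save_fl(void)
{
    return sketch_vcpu.evtchn_upcall_mask;
}

/* restore_fl equivalent: put the mask back, delivering if now unmasked. */
static void sketch_restore_fl(uint8_t saved_mask)
{
    assert(saved_mask <= 1);
    sketch_vcpu.evtchn_upcall_mask = saved_mask;
    if (!saved_mask && sketch_vcpu.evtchn_upcall_pending)
        sketch_force_evtchn_callback();
}
```

      Note the inverted sense compared with EFLAGS: a non-zero value here means events are blocked.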
      Xen needs a couple of upcall (or callback) functions to be implemented
      by each guest.  One is the event upcall, which is how events
      (interrupts, effectively) are delivered to the guests.  The other is
      the failsafe callback, which is used to report errors in either
      reloading a segment register, or caused by iret.  These are
      implemented in i386/kernel/entry.S so they can jump into the normal
      iret_exc path when necessary.
      Xen provides a multicall mechanism, which allows multiple hypercalls
      to be issued at once in order to mitigate the cost of trapping into
      the hypervisor.  This is particularly useful for context switches,
      since the 4-5 hypercalls they would normally need (reload cr3, update
      TLS, maybe update LDT) can be reduced to one.  This patch implements a
      generic batching mechanism for hypercalls, which gets used in many
      places in the Xen code.
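      A generic batching layer of this kind can be sketched as below. Everything here is an assumption made for illustration: the entry layout, the batch size, the sketch_ names and the stubbed-out hypercall are hypothetical and do not reproduce the patch's actual interface.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MC_BATCH 8  /* hypothetical batch size */

/* Hypothetical queued hypercall: an operation number plus one argument. */
struct sketch_mc_entry {
    unsigned long op;
    unsigned long arg;
};

static struct sketch_mc_entry sketch_mc_queue[SKETCH_MC_BATCH];
static size_t sketch_mc_count;
static unsigned int sketch_traps;  /* counts simulated hypervisor entries */

/* Stand-in for the single multicall hypercall that submits a batch. */
static void sketch_multicall(const struct sketch_mc_entry *entries, size_t n)
{
    (void)entries;
    (void)n;
    sketch_traps++;
}

/* Submit everything queued so far with one trap into the hypervisor. */
static void sketch_mc_flush(void)
{
    if (sketch_mc_count == 0)
        return;
    sketch_multicall(sketch_mc_queue, sketch_mc_count);
    sketch_mc_count = 0;
}

/* Queue one operation, flushing first if the batch is already full. */
static void sketch_mc_add(unsigned long op, unsigned long arg)
{
    if (sketch_mc_count == SKETCH_MC_BATCH)
        sketch_mc_flush();
    assert(sketch_mc_count < SKETCH_MC_BATCH);
    sketch_mc_queue[sketch_mc_count].op = op;
    sketch_mc_queue[sketch_mc_count].arg = arg;
    sketch_mc_count++;
}
```

      A context switch would queue its cr3, TLS and LDT updates with sketch_mc_add() and then call sketch_mc_flush() once, paying for a single trap.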
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Cc: Ian Pratt <ian.pratt@xensource.com>
      Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
      Cc: Adrian Bunk <bunk@stusta.de>
  4. 17 Jul, 2007 1 commit
  5. 17 May, 2007 1 commit
  6. 02 May, 2007 1 commit
      [PATCH] x86: Drop cc-options call for all options supported in gcc 3.2+ · c8fdd247
      Andi Kleen authored
      The kernel only supports gcc 3.2+ now so it doesn't make sense
      anymore to explicitly check for options this compiler version
      already has.
      This actually fixes a bug. The -mpreferred-stack-boundary check
      never worked because gcc rightly complains:
        CC      arch/i386/kernel/asm-offsets.s
      cc1: -mpreferred-stack-boundary=2 is not between 4 and 12
      We just never saw the error because of cc-options.
      I changed it to 4 to actually work.
      Tested by compiling i386 and x86-64 defconfig with gcc 3.2.
      Should speed up the build time a tiny bit and improve
      stack usage on i386 slightly.
      Signed-off-by: Andi Kleen <ak@suse.de>
  7. 26 Feb, 2007 1 commit
      [PATCH] x86: add -freg-struct-return to CFLAGS · 25165120
      Ingo Molnar authored
      Jeremy Fitzhardinge suggested the use of -freg-struct-return, which does
      structure-returns (such as when using pte_t) in registers instead of on
      the stack.
      That is indeed so, and this option reduced the kernel size a bit:
          text    data     bss     dec     hex filename
       4799506  543456 3760128 9103090  8ae6f2 vmlinux.before
       4798117  543456 3760128 9101701  8ae185 vmlinux.after
      The resulting kernel booted fine on my testbox. Let's go for it.
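      The kind of function that benefits looks like the sketch below. The type and function names are hypothetical stand-ins for pte_t and its helpers. On i386, gcc normally returns even a one-word struct like this through memory, whereas with -freg-struct-return it comes back in a register. The C source is the same either way; only the generated code changes.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12

/* Hypothetical one-word wrapper type, in the style of pte_t. */
typedef struct {
    unsigned long pte_low;
} sketch_pte_t;

/* Builds a pte from a page frame number and flag bits, returned by value. */
static sketch_pte_t sketch_mk_pte(unsigned long pfn, unsigned long flags)
{
    sketch_pte_t pte;

    /* The wrapper is exactly one word, so it fits in a single register. */
    assert(sizeof(pte) == sizeof(unsigned long));

    pte.pte_low = (pfn << SKETCH_PAGE_SHIFT) | flags;
    return pte;
}
```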
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 06 Dec, 2006 2 commits
  9. 21 Oct, 2006 2 commits
      [PATCH] x86: Use -maccumulate-outgoing-args · cdfce1f5
      Andi Kleen authored
      This avoids some problems with gcc 4.x and earlier generating
      invalid unwind information. In 4.1 the option is default
      when unwind information is enabled.
      And it seems to generate smaller code too, so it's probably
      a good thing on its own. With gcc 4.0:
          text    data     bss     dec     hex filename
       4683198  902112  480868 6066178  5c9002 vmlinux (before)
       4449895  902112  480868 5832875  5900ab vmlinux (after)
       4939761 1449584  648216 7037561  6b6279 vmlinux (before)
       4854193 1449584  648216 6951993  6a1439 vmlinux (after)
      On 4.1 it shouldn't make much difference because it is
      default when unwind is enabled anyways.
      Suggested by Michael Matz and Jan Beulich
      Cc: jbeulich@novell.com
      Signed-off-by: Andi Kleen <ak@suse.de>
      [PATCH] i386: fix .cfi_signal_frame copy-n-paste error · da8604cc
      Andrew Morton authored
      This was copied, pasted but not edited.
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
  10. 26 Sep, 2006 2 commits
      [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder · adf14236
      Jan Beulich authored
      Current gcc generates calls, not jumps, to noreturn functions. When that happens the
      return address can point to the next function, which confuses the unwinder.
      This patch works around it by marking asynchronous exception
      frames, in contrast to normal call frames, in the unwind information, and then
      teaching the unwinder to decode this.
      For normal call frames the unwinder now subtracts one from the address, which avoids
      this problem.  The standard libgcc unwinder uses the same trick.
      It doesn't include adjustment of the printed address (i.e. for the original
      example, it'd still be kernel_math_error+0 that gets displayed), but the
      unwinder wouldn't get confused anymore.
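      The subtract-one trick can be sketched as follows. This is a toy model under assumptions: the function table, the sketch_ names and the boolean flag are hypothetical, and a real unwinder looks up unwind table entries rather than function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical function table entry covering addresses [start, end). */
struct sketch_func {
    unsigned long start;
    unsigned long end;
    const char *name;
};

/* Plain lookup: which function contains this address? */
static const struct sketch_func *
sketch_find(const struct sketch_func *tab, size_t n, unsigned long pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= tab[i].start && pc < tab[i].end)
            return &tab[i];
    return NULL;
}

/*
 * Unwinder lookup.  A return address in a normal call frame points just
 * past the call instruction, which may already be the next function if
 * the call was the caller's last instruction.  Looking up addr - 1 lands
 * inside the call instruction itself.  An asynchronous (signal) frame
 * records the interrupted instruction, so it is used unmodified.
 */
static const struct sketch_func *
sketch_unwind_find(const struct sketch_func *tab, size_t n,
                   unsigned long addr, bool signal_frame)
{
    assert(addr > 0);
    return sketch_find(tab, n, signal_frame ? addr : addr - 1);
}
```

      With a caller occupying 0x100 to 0x11f whose final instruction is a call to a noreturn function, the saved return address is 0x120. The plain lookup attributes it to the following function, while the adjusted lookup finds the real caller.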
      Unfortunately, this only works with binutils 2.6.17+ and some versions of
      H.J.Lu's 2.6.16, because earlier binutils don't support .cfi_signal_frame.
      [AK: added automatic detection of the new binutils and wrote description]
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      [PATCH] x86: Detect CFI support in the assembler at runtime · e2414910
      Andi Kleen authored
      ... instead of using a CONFIG option. The config option still controls
      if the resulting executable actually has unwind information.
      This is useful to prevent compilation errors when users select
      CONFIG_STACK_UNWIND on old binutils, and it also allows CFI to be
      used in the future for non-kernel debugging applications.
      Cc: jbeulich@novell.com
      Cc: sam@ravnborg.org
      Signed-off-by: Andi Kleen <ak@suse.de>
  11. 26 Mar, 2006 1 commit
  12. 25 Mar, 2006 1 commit
      [PATCH] x86_64: Don't define string functions to builtin · 6edfba1b
      Andi Kleen authored
      gcc should handle this anyway, and it causes problems when
      sprintf is turned into strcpy by gcc behind our backs and
      the C fallback version of strcpy is actually defining __builtin_strcpy.
      Then drop -ffreestanding from the main Makefile because it isn't
      needed anymore and implies -fno-builtin, which is wrong now.
      (it was only added for x86-64, so dropping it should be safe)
      Noticed by Roman Zippel
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 05 Mar, 2006 1 commit
  14. 17 Jan, 2006 1 commit
  15. 14 Jan, 2006 1 commit
  16. 09 Jan, 2006 1 commit
  17. 08 Jan, 2006 2 commits
  18. 30 Oct, 2005 1 commit
  19. 09 Sep, 2005 1 commit
      kbuild: full dependency check on asm-offsets.h · 86feeaa8
      Sam Ravnborg authored
      Building asm-offsets.h has been moved to a separate Kbuild file
      located in the top-level directory. This allows us to share the
      functionality across the architectures.
      The old rules in architecture specific Makefiles will die
      in subsequent patches.
      Furthermore, the usual kbuild dependency tracking is now used
      when deciding to rebuild asm-offsets.s, so we no longer risk
      missing a rebuild when asm-offsets.c dependencies are touched.
      With this common rule-set we now force the same name across
      all architectures. Following patches will fix the rest.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
  20. 23 Jun, 2005 1 commit
  21. 05 May, 2005 1 commit
  22. 30 Apr, 2005 1 commit
      [PATCH] kbuild/i386: re-introduce dependency on vmlinux for install target, and add kernel_install · 2cacb3da
      Sam Ravnborg authored
      Removing the dependency on vmlinux for the install target raised a few
      complaints, so instead a new target is added: kernel_install.
      kernel_install will install the kernel just like the ordinary install target.
      The only difference is that install has a dependency on vmlinux,
      kernel_install does not. Therefore kernel_install is the best choice
      when accessing the kernel over an NFS mount or as another user.
      kernel_install is similar to modules_install in the fact that neither does
      a full kernel compile before performing the install.
      In this way they are good for root use. Also added back the
      dependency on vmlinux for the install target so people's scripts are no
      longer broken.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 16 Apr, 2005 1 commit
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      Let it rip!