1. 26 Apr, 2013 1 commit
    • powerpc: Add isync to copy_and_flush · 29ce3c50
      Michael Neuling authored
      In __after_prom_start we copy the kernel down to zero in two calls to
      copy_and_flush.  After the first call (copy from 0 to copy_to_here:)
      we jump to the newly copied code soon after.
      Unfortunately there's no isync between the copy of this code and the
      jump to it.  Hence it's possible that stale instructions could still be
      in the icache or pipeline before we branch to it.
      We've seen this on real machines and it results in no console output after:
        calling quiesce...
        returning from prom_init
      The below adds an isync to ensure that the copy and flushing has
      completed before any branching to the new instructions occurs.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  2. 09 Jan, 2013 2 commits
    • powerpc/kexec: Add kexec "hold" support for Book3e processors · 96f013fe
      Jimi Xenidis authored
      IBM Blue Gene/Q comes with some very strange firmware that I'm trying to get out
      of using in the kernel.  So instead I spin all the threads in the boot wrapper
      (using the firmware) and have them enter the kexec stub, pre-translated at the
      virtual "linear" address, never touching firmware again.
      This strategy works wonderfully, but I need the following patch in the
      kexec stub. I believe it should not affect Book3S, and Book3E does not appear
      to be here yet, so I'd love to get any criticisms up front.
      This patch adds two items:
      1) Book3e requires that GPR4 survive the "hold" process, so we make
         sure that happens.
      2) Book3e has no real mode, and the hold code exploits this.  Since
         these processors are always translated, we arrange for the kexeced
         threads to enter the hold code using the normal kernel linear mapping.
      Signed-off-by: Jimi Xenidis <jimix@pobox.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Build kernel with -mcmodel=medium · 1fbe9cf2
      Anton Blanchard authored
      Finally remove the two level TOC and build with -mcmodel=medium.
      Unfortunately we can't build modules with -mcmodel=medium due to
      the tricks the kernel module loader plays with percpu data:
      # -mcmodel=medium breaks modules because it uses 32bit offsets from
      # the TOC pointer to create pointers where possible. Pointers into the
      # percpu data area are created by this method.
      # The kernel module loader relocates the percpu data section from the
      # original location (starting with 0xd...) to somewhere in the base
      # kernel percpu data space (starting with 0xc...). We need a full
      # 64bit relocation for this to work, hence -mcmodel=large.
      On older kernels we fall back to the two level TOC (-mminimal-toc)
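The range problem described in the Makefile comment can be checked with plain arithmetic. The sketch below is not kernel code: the three addresses are made-up values that only share the 0xd... and 0xc... prefixes mentioned above, and `fits_toc_offset` is a hypothetical helper name.

```python
# Illustrative only: the addresses are invented values with the 0xd... and
# 0xc... prefixes from the comment above, not real kernel addresses.

def fits_toc_offset(toc_pointer: int, target: int) -> bool:
    """Return True if target is reachable with a signed 32-bit offset from the TOC pointer."""
    offset = target - toc_pointer
    return -(1 << 31) <= offset < (1 << 31)

module_toc = 0xD000000000008000        # hypothetical TOC pointer of a module
percpu_original = 0xD000000000010000   # hypothetical percpu section as linked
percpu_relocated = 0xC000000001000000  # hypothetical slot in the base kernel percpu area

print(fits_toc_offset(module_toc, percpu_original))   # nearby target
print(fits_toc_offset(module_toc, percpu_relocated))  # target after relocation
```

The second target lies far outside the signed 32-bit window around the TOC pointer, which is the situation the comment says needs a full 64-bit relocation.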
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  3. 14 Nov, 2012 3 commits
    • powerpc: Add relocation on exception vector handlers · c1fb6816
      Michael Neuling authored
      POWER8/v2.07 allows exceptions to be taken with the MMU still on.
      A new set of exception vectors is added at 0xc000_0000_0000_4xxx.  When the HW
      takes us here, MSR IR/DR will be set already and we no longer need a costly
      RFID to turn the MMU back on again.
      The original 0x0 based exception vectors remain for when the HW can't leave the
      MMU on.  Examples of this are when we can't trust the current MMU mappings,
      like when we are changing from guest to hypervisor (HV 0 -> 1) or when the MMU
      was off already.  In these cases the HW will take us to the original 0x0 based
      exception vectors with the MMU off as before.
      This uses the new macros added previously to implement these new exception
      vectors at 0xc000_0000_0000_4xxx.  We exit these exception vectors using
      mflr/blr (rather than mtspr SRR0/RFID), since we don't need the costly MMU
      switch anymore.
      This moves the __end_interrupts marker down past these new 0x4000 vectors since
      they will need to be copied down to 0x0 when the kernel is not at 0x0.
      Signed-off-by: Matt Evans <matt@ozlabs.org>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Fix CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n build · 11ee7e99
      Anton Blanchard authored
      If we build a kernel with CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n,
      the kernel fails when we run at a non zero offset. It turns out
      we were incorrectly wrapping some of the relocatable kernel code
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Fix OPAL debug entry · ab7f961a
      Benjamin Herrenschmidt authored
      OPAL provides the firmware base/entry in registers at boot time
      for debugging purposes. We had a bug in the code trying to stash
      these into the appropriate kernel globals (a line of code was
      probably dropped by accident back when this was merged)
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  4. 08 Mar, 2012 2 commits
    • powerpc: Rework lazy-interrupt handling · 7230c564
      Benjamin Herrenschmidt authored
      The current implementation of lazy interrupts handling has some
      issues that this tries to address.
      We don't do the various workarounds we need to do when re-enabling
      interrupts in some cases such as when returning from an interrupt
      and thus we may still lose or get delayed decrementer or doorbell interrupts.
      The current scheme also makes it much harder to handle the external
      "edge" interrupts provided by some BookE processors when using the
      EPR facility (External Proxy) and the Freescale Hypervisor.
      Additionally, we tend to keep interrupts hard disabled in a number
      of cases, such as decrementer interrupts, external interrupts, or
      when a masked decrementer interrupt is pending. This is sub-optimal.
      This is an attempt at fixing it all in one go by reworking the way
      we do the lazy interrupt disabling from the ground up.
      The base idea is to replace the "hard_enabled" field with a
      "irq_happened" field in which we store a bit mask of what interrupt
      occurred while soft-disabled.
      When re-enabling, either via arch_local_irq_restore() or when returning
      from an interrupt, we can now decide what to do by testing bits in that field.
      We then implement replaying of the missed interrupts either by
      re-using the existing exception frame (in exception exit case) or via
      the creation of a new one from an assembly trampoline (in the
      arch_local_irq_enable case).
      This removes the need to play with the decrementer to try to create
      fake interrupts, among others.
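The bookkeeping idea behind "irq_happened" can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel implementation: the bit names and values are hypothetical, the `Cpu` class is invented for illustration, and the replay order (lowest bit first) is an arbitrary choice made here.

```python
# Toy model of lazy interrupt masking. Bit names and values are hypothetical.
IRQ_DECREMENTER = 0x1
IRQ_DOORBELL = 0x2
IRQ_EXTERNAL = 0x4

class Cpu:
    def __init__(self):
        self.soft_enabled = True
        self.irq_happened = 0  # bit mask of interrupts seen while soft-disabled
        self.handled = []      # order in which handlers actually ran

    def take_interrupt(self, bit):
        if self.soft_enabled:
            self.handled.append(bit)   # handle immediately
        else:
            self.irq_happened |= bit   # only record that it occurred

    def local_irq_disable(self):
        self.soft_enabled = False

    def local_irq_restore(self):
        self.soft_enabled = True
        # Replay each recorded interrupt once, lowest bit first.
        while self.irq_happened:
            bit = self.irq_happened & -self.irq_happened
            self.irq_happened &= ~bit
            self.handled.append(bit)

cpu = Cpu()
cpu.local_irq_disable()
cpu.take_interrupt(IRQ_DOORBELL)
cpu.take_interrupt(IRQ_DECREMENTER)
cpu.take_interrupt(IRQ_DOORBELL)  # a repeat sets the same bit again
cpu.local_irq_restore()
print(cpu.handled)
```

Because the state is a bit mask rather than a counter, a repeated interrupt of the same kind is replayed only once in this model.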
      In addition, this adds a few refinements:
       - We no longer hard disable decrementer interrupts that occur
      while soft-disabled. We now simply bump the decrementer back to max
      (on BookS) or leave it stopped (on BookE) and continue with hard interrupts
      enabled, which means that we'll potentially get better sample quality from
      performance monitor interrupts.
       - Timer, decrementer and doorbell interrupts now hard-enable
      shortly after removing the source of the interrupt, which means
      they no longer run entirely hard disabled. Again, this will improve
      perf sample quality.
       - On Book3E 64-bit, we now make the performance monitor interrupt
      act as an NMI like Book3S (the necessary C code for that to work
      appears to already be present in the FSL perf code, notably calling
      nmi_enter instead of irq_enter). (This also fixes a bug where BookE
      perfmon interrupts could clobber r14 ... oops)
       - We could make "masked" decrementer interrupts act as NMIs when doing
      timer-based perf sampling to improve the sample quality.
      Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      - Add hard-enable to decrementer, timer and doorbells
      - Fix CR clobber in masked irq handling on BookE
      - Make embedded perf interrupt act as an NMI
      - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
        to retrigger an interrupt without preventing hard-enable
       - Fix or vs. ori bug on Book3E
       - Fix enabling of interrupts for some exceptions on Book3E
       - Fix resend of doorbells on return from interrupt on Book3E
       - Rebased on top of my latest series, which involves some significant
      rework of some aspects of the patch.
       - 32-bit compile fix
       - more compile fixes with various .config combos
       - factor out the asm code to soft-disable interrupts
       - remove the C wrapper around preempt_schedule_irq
       - Fix a bug with hard irq state tracking on native power7
    • powerpc: Remove legacy iSeries bits from assembly files · 4f8cf36f
      Benjamin Herrenschmidt authored
      This removes the various bits of assembly in the kernel entry,
      exception handling and SLB management code that were specific
      to running under the legacy iSeries hypervisor which is no
      longer supported.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  5. 20 Sep, 2011 2 commits
    • powerpc/powernv: Support for OPAL console · daea1175
      Benjamin Herrenschmidt authored
      This adds a udbg and an hvc console backend for supporting a console
      using the OPAL console interfaces.
      On OPAL v1 we have hvc0 mapped to whatever console the system was
      configured for (network or hvsi serial port) via the service processor.
      On OPAL v2 we have hvcN mapped to the Nth console provided by OPAL
      which generally corresponds to:
      	hvc0 : network console (raw protocol)
      	hvc1 : serial port S1 (hvsi)
      	hvc2 : serial port S2 (hvsi)
      Note: At this point, early debug console only works with OPAL v1
      and shouldn't be enabled in a normal kernel.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Add OPAL takeover from PowerVM · 27f44888
      Benjamin Herrenschmidt authored
      On machines supporting the OPAL firmware version 1, the system
      is initially booted under pHyp. We then use a special hypercall
      to verify if OPAL is available and if it is, we then trigger
      a "takeover" which disables pHyp and loads the OPAL runtime
      firmware, giving control to the kernel in hypervisor mode.
      This patch adds the necessary code to detect that the OPAL takeover
      capability is present when running under PowerVM (aka pHyp) and
      perform said takeover to get hypervisor control of the processor.
      To perform the takeover, we must first use RTAS (within Open
      Firmware runtime environment) to start all processors & threads,
      in order to give control to OPAL on all of them. We then call
      the takeover hypercall on everybody, OPAL will re-enter the kernel
      main entry point passing it a flat device-tree.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  6. 19 Sep, 2011 1 commit
  7. 17 Jun, 2011 1 commit
  8. 18 May, 2011 1 commit
    • powerpc: Don't search for paca in freed memory · 768d18ad
      Milton Miller authored
      Starting with 1426d5a3 (powerpc:
      Dynamically allocate pacas) we free the memory for pacas beyond
      cpu_possible, but we failed to update the loop the secondary cpus use
      to find their paca.  If the system has running cpu threads for which
      the kernel did not allocate a paca, they will search the memory that
      was freed.  For instance this could happen when the device tree for
      a kdump kernel was not updated after a cpu hotplug, or the kernel is
      running with more cpus than the kernel was configured for.
      Since c1854e00 (powerpc: Set nr_cpu_ids
      early and use it to free PACAs) we set nr_cpu_ids before telling the
      cpus to advance, so use that to limit the search.
      We can't reference nr_cpu_ids without CONFIG_SMP because it is defined
      as 1 instead of a memory location, but any extra threads should be sent
      to kexec_wait in that case anyways, so make that explicit and remove
      the search loop for UP.
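The bounded search described above might look like the following in outline. This is a simplified model rather than the assembly in the patch: each paca is reduced to just its hardware CPU id, and `find_paca` is a hypothetical name.

```python
# Toy model: each paca entry is represented only by its hardware CPU id.
# Entries at index >= nr_cpu_ids stand for memory that has been freed.

def find_paca(pacas, nr_cpu_ids, hw_cpu_id):
    """Search only the first nr_cpu_ids entries; return the index or None."""
    for index in range(min(nr_cpu_ids, len(pacas))):
        if pacas[index] == hw_cpu_id:
            return index
    return None  # the caller would park this thread instead of continuing

memory = [0, 4, 8, 12]  # the last two entries model stale, freed pacas
nr_cpu_ids = 2

print(find_paca(memory, nr_cpu_ids, 4))  # allocated paca is found
print(find_paca(memory, nr_cpu_ids, 8))  # stale entry is never examined
```

Limiting the loop by `nr_cpu_ids` means a thread whose id only appears in freed memory gets no match, which corresponds to sending it to the wait path mentioned in the message.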
      Note to stable: The fix also requires
      c1854e00 (powerpc: Set
      nr_cpu_ids early and use it to free PACAs) to function.  Also
      9d07bc84 (Properly handshake CPUs going
      out of boot spin loop) affects the second chunk, specifically the branch
      target was 3b before and is 4b after that patch, and there was a blank
      line before the #ifdef CONFIG_SMP that was removed
      Cc: <stable@kernel.org> # .34.x: c1854e00 powerpc: Set nr_cpu_ids early
      Cc: <stable@kernel.org> # .34.x
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  9. 26 Apr, 2011 1 commit
  10. 19 Apr, 2011 4 commits
  11. 31 Mar, 2011 2 commits
  12. 08 Dec, 2010 1 commit
  13. 28 Nov, 2010 1 commit
  14. 24 Oct, 2010 1 commit
  15. 30 Aug, 2010 1 commit
    • powerpc: Don't use kernel stack with translation off · 54a83404
      Michael Neuling authored
      In f761622e we changed early_setup_secondary so it's called using the
      proper kernel stack rather than the emergency one.
      Unfortunately, this stack pointer can't be used when translation is off
      on PHYP as this stack pointer might be outside the RMO.  This results in
      the following on all non zero cpus:
        cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10]
            pc: 000000000001c50c
            lr: 000000000000821c
            sp: c00000001639ff90
           msr: 8000000000001000
           dar: c00000001639ffa0
         dsisr: 42000000
          current = 0xc000000016393540
          paca    = 0xc000000006e00200
            pid   = 0, comm = swapper
      The original patch was only tested on a bare metal system, so it never
      caught this problem.
      This changes __secondary_start so that we calculate the new stack
      pointer but only start using it after we've called early_setup_secondary.
      With this patch, the above problem goes away.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  16. 23 Aug, 2010 1 commit
    • powerpc: Initialise paca->kstack before early_setup_secondary · f761622e
      Matt Evans authored
      As early setup calls down to slb_initialize(), we must have kstack
      initialised before checking "should we add a bolted SLB entry for our kstack?"
      Failing to do so means stack access requires an SLB miss exception to refill
      an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text
      & static data).  It's not always allowable to take such a miss, and
      intermittent crashes will result.
      Primary CPUs don't have this issue; an SLB entry is not bolted for their
      stack anyway (as that lives within SLB(0)).  This patch therefore only
      affects the init of secondaries.
      Signed-off-by: Matt Evans <matt@ozlabs.org>
      Cc: stable <stable@kernel.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  17. 17 May, 2010 1 commit
  18. 08 Mar, 2010 2 commits
  19. 04 Nov, 2009 1 commit
  20. 19 Aug, 2009 3 commits
    • powerpc: Remaining 64-bit Book3E support · 2d27cfd3
      Benjamin Herrenschmidt authored
      This contains all the bits that didn't fit in previous patches :-) This
      includes the actual exception handlers assembly, the changes to the
      kernel entry, other misc bits and wiring it all up in Kconfig.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Use names rather than numbers for SPRGs (v2) · ee43eb78
      Benjamin Herrenschmidt authored
      The kernel uses SPRG registers for various purposes, typically in
      low level assembly code as scratch registers or to hold per-cpu
      global infos such as the PACA or the current thread_info pointer.
      We want to be able to easily shuffle the usage of those registers
      as some implementations have specific constraints related to some
      of them, for example, some have userspace readable aliases, etc.,
      and the current choice isn't always the best.
      This patch should not change any code generation, and replaces the
      usage of SPRN_SPRGn everywhere in the kernel with a named replacement
      and adds documentation next to the definition of the names as to
      what those are used for on each processor family.
      The only parts that still use the original numbers are bits of KVM
      or suspend/resume code that just blindly needs to save/restore all
      the SPRGs.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Rename exception.h to exception-64s.h · 8aa34ab8
      Benjamin Herrenschmidt authored
      The file include/asm/exception.h contains definitions
      that are specific to exception handling on 64-bit server
      type processors.
      This renames the file to exception-64s.h to reflect that
      fact and avoid confusion.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  21. 09 Jun, 2009 2 commits
  22. 11 Mar, 2009 1 commit
    • powerpc/kconfig: Kill PPC_MULTIPLATFORM · 28794d34
      Benjamin Herrenschmidt authored
      CONFIG_PPC_MULTIPLATFORM is a remnant of the pre-powerpc days and isn't
      really meaningful anymore. It was basically equivalent to PPC64 || 6xx.
      This removes it along with the following changes:
       - 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
         on 6xx which is what they want anyway.
       - A new symbol, PPC_BOOK3S, is defined that represents compliance with
         the "Server" variant of the architecture. This is set when either 6xx
         or PPC64 is set and opens the door for future BOOK3E 64-bit.
       - 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
         PPC64 && PPC_BOOK3S
       - A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
         used to control the use of prom_init.c
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  23. 12 Jan, 2009 1 commit
    • powerpc/powermac: Fix occasional SMP boot failure · c478b581
      Benjamin Herrenschmidt authored
      The PowerMac kernel occasionally fails to bring up the secondary CPUs on
      SMP. The trigger factor seems to be fairly random and related to location
      of code and data.
      This appears to be due to the initial loading of the TOC value by the
      secondary processor which now happens before we clear HID4:RM_CI (Real
      Mode Cache Invalidate). This bit should really be cleared before we do
      any load or store other than fetching code.
      This fix works based on the assumption that all SMP 64-bit PowerMacs use
      variants of the 970, which fortunately is true, by explicitly clearing
      that bit, adding an slbia for good measure as RM_CI mode is known to
      create bogus ERAT entries.
      I also removed some spurious debug output that was left enabled by
      mistake while at it.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  24. 30 Oct, 2008 1 commit
    • powerpc/ppc64/kdump: Better flag for running relocatable · 8b8b0cc1
      Milton Miller authored
      The __kdump_flag ABI is overly constraining for future development.
      As of 2.6.27, the kernel entry point has 4 constraints:  Offset 0 is
      the starting point for the master (boot) cpu (entered with r3 pointing
      to the device tree structure), offset 0x60 is code for the slave cpus
      (entered with r3 set to their device tree physical id), offset 0x20 is
      used by the iseries hypervisor, and secondary cpus must be well behaved
      when the first 256 bytes are copied to address 0.
      Placing the __kdump_flag at 0x18 is bad because:
      - It was taking the last 8 bytes before the iseries hypervisor data.
      - It was 8 bytes for a boolean flag
      - It had no way of identifying that the flag was present
      - It does not leave any room for the master to add any additional code
        before branching, which hurts debug.
      - It will be unnecessarily hard for 32 bit code to be common (8 bytes)
      Now that we have eliminated the use of __kdump_flag in favor of
      the standard is_kdump_kernel(), this flag only controls run without
      relocating the kernel to PHYSICAL_START (0), so rename it __run_at_load.
      Move the flag to 0x5c, 1 word before the secondary cpu entry point at
      0x60.  Initialize it with "run0" to say it will run at 0 unless it is
      set to 1.  It only exists if we are relocatable.
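The layout described above can be modelled as a 256-byte header with a 4-byte flag word at 0x5c. This is a sketch only: the helper names are hypothetical, the flag is assumed to be stored big-endian, and nothing here reflects how a loader actually patches the image.

```python
import struct

RUN_AT_LOAD_OFFSET = 0x5C      # flag position given in the commit message
SECONDARY_ENTRY_OFFSET = 0x60  # secondary cpu entry point given in the commit message

def make_header(run_at_load: bool) -> bytes:
    """Build a zeroed 256-byte header with the flag word filled in."""
    header = bytearray(0x100)
    # Assumption: big-endian word; default contents are the ASCII bytes "run0".
    flag = struct.pack(">I", 1) if run_at_load else b"run0"
    header[RUN_AT_LOAD_OFFSET:RUN_AT_LOAD_OFFSET + 4] = flag
    return bytes(header)

def runs_at_load(header: bytes) -> bool:
    """True only if the flag word was changed from its default to 1."""
    (word,) = struct.unpack_from(">I", header, RUN_AT_LOAD_OFFSET)
    return word == 1

print(runs_at_load(make_header(False)))
print(runs_at_load(make_header(True)))
```

Using a recognisable default such as "run0" rather than zero lets a tool distinguish "flag present but not set" from "no flag at this offset", which addresses one of the complaints listed for the old location.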
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  25. 21 Oct, 2008 1 commit
    • powerpc: Support for relocatable kdump kernel · 54622f10
      Mohan Kumar M authored
      This adds relocatable kernel support for kdump. With this one can
      use the same regular kernel to capture the kdump. A signature (0xfeed1234)
      is passed in r6 from panic code to the next kernel through kexec_sequence
      and purgatory code. The signature is used to differentiate between
      kdump kernel and non-kdump kernels.
      The purgatory code compares the signature and sets the __kdump_flag in
      head_64.S.  During the boot up, kernel code checks __kdump_flag and if it
      is set, the kernel will behave as relocatable kdump kernel. This kernel
      will boot at the address where it was loaded by kexec-tools ie. at the
      address reserved through crashkernel boot parameter.
      CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
      kernel as relocatable. So the same kernel can be used as production and
      kdump kernel.
      This patch incorporates the changes suggested by Paul Mackerras to avoid
      GOT use and to avoid two copies of the code.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Mohan Kumar M <mohan@in.ibm.com>
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  26. 15 Sep, 2008 2 commits
    • powerpc: Make the 64-bit kernel as a position-independent executable · 549e8152
      Paul Mackerras authored
      This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
      a position-independent executable (PIE) when it is set.  This involves
      processing the dynamic relocations in the image in the early stages of
      booting, even if the kernel is being run at the address it is linked at,
      since the linker does not necessarily fill in words in the image for
      which there are dynamic relocations.  (In fact the linker does fill in
      such words for 64-bit executables, though not for 32-bit executables,
      so in principle we could avoid calling relocate() entirely when we're
      running a 64-bit kernel at the linked address.)
      The dynamic relocations are processed by a new function relocate(addr),
      where the addr parameter is the virtual address where the image will be
      run.  In fact we call it twice; once before calling prom_init, and again
      when starting the main kernel.  This means that reloc_offset() returns
      0 in prom_init (since it has been relocated to the address it is running
      at), which necessitated a few adjustments.
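Processing dynamic relocations for a PIE image can be sketched as follows. This is a simplified model, not the relocate() routine from the patch: it handles only R_PPC64_RELATIVE entries, assumes big-endian 64-bit words, and takes the relocation table as an already-parsed list of (r_offset, r_info, r_addend) tuples. The function signature and the addresses are assumptions made for illustration.

```python
import struct

R_PPC64_RELATIVE = 22  # ELF relocation type: store (load base + addend)
MASK64 = (1 << 64) - 1

def relocate(image: bytearray, rela_entries, link_addr: int, run_addr: int) -> int:
    """Apply R_PPC64_RELATIVE entries for an image that will run at run_addr.

    r_offset and r_addend are link-time virtual addresses. Returns the number
    of entries applied.
    """
    applied = 0
    for r_offset, r_info, r_addend in rela_entries:
        if (r_info & 0xFFFFFFFF) != R_PPC64_RELATIVE:
            continue  # other relocation types are ignored in this sketch
        position = r_offset - link_addr  # where the word lives in the image
        value = (r_addend - link_addr + run_addr) & MASK64
        struct.pack_into(">Q", image, position, value)
        applied += 1
    return applied

LINK = 0xC000000000000000  # hypothetical link address
RUN = 0xC000000002000000   # hypothetical run address
image = bytearray(32)
entries = [(LINK + 8, R_PPC64_RELATIVE, LINK + 0x100), (LINK + 16, 1, 0)]

print(relocate(image, entries, LINK, RUN))
print(hex(struct.unpack_from(">Q", image, 8)[0]))
```

Since each stored value is computed from the addend rather than from the word's current contents, running the function a second time with a different `run_addr` simply overwrites the earlier result, which is compatible with calling it twice as the message describes.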
      This also changes __va and __pa to use an equivalent definition that is
      simpler.  With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
      constants (for 64-bit) whereas PHYSICAL_START is a variable (and
      KERNELBASE ideally should be too, but isn't yet).
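One simple pair of definitions consistent with the description is shown below. It is an assumption, not a quotation of the macros in the patch: the constant values are typical placeholders, and `va`/`pa` are stand-in names for `__va`/`__pa`.

```python
MASK64 = (1 << 64) - 1
PAGE_OFFSET = 0xC000000000000000  # assumed base of the kernel linear mapping
MEMORY_START = 0x0                # assumed physical address where RAM starts

def va(paddr: int) -> int:
    """Physical to virtual address, using only constants."""
    return (paddr + PAGE_OFFSET - MEMORY_START) & MASK64

def pa(vaddr: int) -> int:
    """Virtual to physical address, the inverse of va()."""
    return (vaddr - PAGE_OFFSET + MEMORY_START) & MASK64

print(hex(va(0x1234000)))
print(hex(pa(va(0x1234000))))
```

In this form neither conversion refers to PHYSICAL_START, so the mapping stays the same regardless of where the kernel image was loaded.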
      With this, relocatable kernels still copy themselves down to physical
      address 0 and run there.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit · e31aa453
      Paul Mackerras authored
      Using LOAD_REG_IMMEDIATE to get the address of kernel symbols
      generates 5 instructions where LOAD_REG_ADDR can do it in one,
      and will generate R_PPC64_ADDR16_* relocations in the output when
      we get to making the kernel as a position-independent executable,
      which we'd rather not have to handle.  This changes various bits
      of assembly code to use LOAD_REG_ADDR when we need to get the
      address of a symbol, or to use suitable position-independent code
      for cases where we can't access the TOC for various reasons, or
      if we're not running at the address we were linked at.
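The five-instruction cost comes from assembling a 64-bit value out of 16-bit immediates. The model below illustrates that idea in Python. The mnemonics in the comments follow the usual PowerPC idiom for building a 64-bit constant and are an assumption about what the macro expands to, not text taken from the patch; `split16` and `load_reg_immediate` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1

def split16(value: int):
    """Return the (highest, higher, high, low) 16-bit pieces of a 64-bit value."""
    return (
        (value >> 48) & 0xFFFF,
        (value >> 32) & 0xFFFF,
        (value >> 16) & 0xFFFF,
        value & 0xFFFF,
    )

def load_reg_immediate(value: int) -> int:
    """Rebuild value the way a five-step immediate load would."""
    highest, higher, high, low = split16(value)
    reg = (highest << 16) & MASK64  # 1: lis    (sign extension omitted; step 3 discards it)
    reg |= higher                   # 2: ori
    reg = (reg << 32) & MASK64      # 3: rldicr (shift the upper half into place)
    reg |= high << 16               # 4: oris
    reg |= low                      # 5: ori
    return reg

address = 0xC000000000012345  # hypothetical symbol address
print(hex(load_reg_immediate(address)))
```

Each 16-bit piece embedded in the instruction stream corresponds to a separate relocation against the symbol, which is why this form is undesirable for addresses in a position-independent image while a single TOC-relative load avoids the problem.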
      It also cleans up a few minor things; there's no reason to save and
      restore SRR0/1 around RTAS calls, __mmu_off can get the return
      address from LR more conveniently than the caller can supply it in
      R4 (and we already assume elsewhere that EA == RA if the MMU is on
      in early boot), and enable_64b_mode was using 5 instructions where
      2 would do.
      Signed-off-by: Paul Mackerras <paulus@samba.org>