1. 16 Jul, 2010 1 commit
  2. 09 Jul, 2010 2 commits
    • ARM: 6212/1: atomic ops: add memory constraints to inline asm · 398aa668
      Will Deacon authored
      Currently, the 32-bit and 64-bit atomic operations on ARM do not
      include memory constraints in the inline assembly blocks. In the
      case of barrier-less operations [for example, atomic_add], this
      means that the compiler may constant fold values which have actually
      been modified by a call to an atomic operation.
      
      This issue can be observed in the atomic64_test routine in
      <kernel root>/lib/atomic64_test.c:
      
      00000000 <test_atomic64>:
         0:	e1a0c00d 	mov	ip, sp
         4:	e92dd830 	push	{r4, r5, fp, ip, lr, pc}
         8:	e24cb004 	sub	fp, ip, #4
         c:	e24dd008 	sub	sp, sp, #8
        10:	e24b3014 	sub	r3, fp, #20
        14:	e30d000d 	movw	r0, #53261	; 0xd00d
        18:	e3011337 	movw	r1, #4919	; 0x1337
        1c:	e34c0001 	movt	r0, #49153	; 0xc001
        20:	e34a1aa3 	movt	r1, #43683	; 0xaaa3
        24:	e16300f8 	strd	r0, [r3, #-8]!
        28:	e30c0afe 	movw	r0, #51966	; 0xcafe
        2c:	e30b1eef 	movw	r1, #48879	; 0xbeef
        30:	e34d0eaf 	movt	r0, #57007	; 0xdeaf
        34:	e34d1ead 	movt	r1, #57005	; 0xdead
        38:	e1b34f9f 	ldrexd	r4, [r3]
        3c:	e1a34f90 	strexd	r4, r0, [r3]
        40:	e3340000 	teq	r4, #0
        44:	1afffffb 	bne	38 <test_atomic64+0x38>
        48:	e59f0004 	ldr	r0, [pc, #4]	; 54 <test_atomic64+0x54>
        4c:	e3a0101e 	mov	r1, #30
        50:	ebfffffe 	bl	0 <__bug>
        54:	00000000 	.word	0x00000000
      
      The atomic64_set (0x38-0x44) writes to the atomic64_t, but the
      compiler doesn't see this, assumes the test condition is always
      false and generates an unconditional branch to __bug. The rest of the
      test is optimised away.
      
      This patch adds suitable memory constraints to the atomic operations on ARM
      to ensure that the compiler is informed of the correct data hazards. We have
      to use the "Qo" constraints to avoid hitting the GCC anomaly described at
      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44492, where the compiler
      makes assumptions about the writeback in the addressing mode used by the
      inline assembly. These constraints forbid the use of auto{inc,dec} addressing
      modes, so it doesn't matter if we don't use the operand exactly once.
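      The shape of the fix can be sketched with atomic_add() (modelled on the ARM
      kernel source of that era, not a verbatim quote). The key point is the "+Qo"
      operand, which tells the compiler that v->counter is read and written through
      memory, so it can no longer constant-fold a stale value across the asm block:

```c
static inline void atomic_add(int i, atomic_t *v)
{
	unsigned long tmp;
	int result;

	__asm__ __volatile__("@ atomic_add\n"
"1:	ldrex	%0, [%3]\n"
"	add	%0, %0, %4\n"
"	strex	%1, %0, [%3]\n"
"	teq	%1, #0\n"
"	bne	1b"
	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)  /* memory operand */
	: "r" (&v->counter), "Ir" (i)
	: "cc");
}
```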
      
      Cc: stable@kernel.org
      Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 6211/1: atomic ops: fix register constraints for atomic64_add_unless · 068de8d1
      Will Deacon authored
      The atomic64_add_unless function compares an atomic variable with
      a given value and, if they are not equal, adds another given value
      to the atomic variable. The function returns zero if the addition
      did not occur and non-zero otherwise.
      
      On ARM, the return value is initialised to 1 in C code. Inline assembly
      code then performs the atomic64_add_unless operation, setting the
      return value to 0 iff the addition does not occur. This means that
      when the addition *does* occur, the value of ret must be preserved
      across the inline assembly and therefore requires a "+r" constraint
      rather than the current one of "=&r".
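      The contract can be modelled in plain C (this is only a model of the
      semantics described above, not the kernel's ldrexd/strexd-based
      implementation):

```c
#include <assert.h>

/* Plain-C model of the atomic64_add_unless() contract: add a to *v unless
 * *v equals u; return nonzero iff the addition occurred. In the real ARM
 * code, ret is set to 1 in C and only cleared inside the inline asm, so it
 * must be a read-write ("+r") operand, not a write-only ("=&r") one. */
int atomic64_add_unless_model(long long *v, long long a, long long u)
{
	int ret = 1;		/* initialised to 1 in C code */
	if (*v == u)
		ret = 0;	/* no addition: report failure */
	else
		*v += a;	/* ret must keep its initial 1 on this path */
	return ret;
}
```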
      
      Thanks to Nicolas Pitre for helping to spot this.
      
      Cc: stable@kernel.org
      Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  3. 01 Jul, 2010 1 commit
    • ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore · 534be1d5
      Will Deacon authored
      Linux expects that if a CPU modifies a memory location, then that
      modification will eventually become visible to other CPUs in the system.
      
      On an ARM11MPCore processor, loads are prioritised over stores so it is
      possible for a store operation to be postponed if a polling loop immediately
      follows it. If the variable being polled indirectly depends on the outstanding
      store [for example, another CPU may be polling the variable that is pending
      modification] then there is the potential for deadlock if interrupts are
      disabled. This deadlock occurs in the KGDB testsuite when executing on an
      SMP ARM11MPCore configuration.
      
      This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
      forcing a flush of the write buffer on SMP systems before the next load
      takes place. If the kernel is not compiled for SMP support, this will expand
      to a barrier() as before.
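      A minimal sketch of the resulting definition (guard macros taken from the
      description above; not a verbatim quote of the patch):

```c
/* On ARMv6, a pending store may sit in the write buffer while another CPU
 * polls the location with interrupts off, so cpu_relax() must drain it;
 * smp_mb() does that on SMP and degrades to barrier() on UP builds. */
#if __LINUX_ARM_ARCH__ == 6
#define cpu_relax()	smp_mb()
#else
#define cpu_relax()	barrier()
#endif
```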
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  4. 13 Jun, 2010 1 commit
  5. 27 May, 2010 1 commit
  6. 20 May, 2010 2 commits
  7. 17 May, 2010 5 commits
  8. 15 May, 2010 4 commits
  9. 14 May, 2010 1 commit
  10. 12 May, 2010 1 commit
  11. 08 May, 2010 1 commit
  12. 04 May, 2010 1 commit
  13. 02 May, 2010 7 commits
  14. 01 May, 2010 1 commit
  15. 29 Apr, 2010 2 commits
  16. 22 Apr, 2010 1 commit
  17. 21 Apr, 2010 1 commit
    • ARM: fix build error in arch/arm/kernel/process.c · 4260415f
      Russell King authored
      /tmp/ccJ3ssZW.s: Assembler messages:
      /tmp/ccJ3ssZW.s:1952: Error: can't resolve `.text' {.text section} - `.LFB1077'
      
      This is caused because:
      
      	.section .data
      	.section .text
      	.section .text
      	.previous
      
      does not return us to the .text section, but the .data section; this
      makes use of .previous dangerous if the ordering of previous sections
      is not known.
      
      Fix up the other users of .previous; .pushsection and .popsection are
      a safer pairing to use than .section and .previous.
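      The difference can be illustrated with a small fragment (GNU as syntax, as
      in the excerpt above; a sketch rather than the actual patched code):

```asm
@ Fragile: ".previous" switches to the section selected before the current
@ one, which in the sequence below is .data, not the .text we came from.
	.section .data
	.section .text
	.section .text
	.previous		@ now emitting into .data!

@ Safer: .pushsection/.popsection keep an explicit section stack, so the
@ pop always restores the section that was in effect before the push.
	.pushsection .text
	@ ... emitted code ...
	.popsection
```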
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  18. 14 Apr, 2010 3 commits
    • ARM: 6051/1: VFP: preserve the HW context when calling signal handlers · 82c6f5a5
      Imre Deak authored
      From: Imre Deak <imre.deak@nokia.com>
      
      Signal handlers can use floating point, so prevent them from
      corrupting the main thread's VFP context. So far there were two
      signal stack frame formats defined, based on the VFP implementation,
      but the user struct used for ptrace covers all possibilities, so use
      it for the signal stack too.
      
      Also introduce a new user struct for the VFP exception registers.
      In this struct too, fields not relevant to the current VFP
      architecture are ignored.
      
      Support to save / restore the exception registers was added by
      Will Deacon.
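      The two user-visible layouts could look roughly like this (modelled on the
      ARM <asm/user.h> of that era; treat the exact field names as assumptions):

```c
/* Ptrace-side VFP register layout, reused for the signal frame. */
struct user_vfp {
	unsigned long long fpregs[32];	/* d0-d31; upper half unused on D16 VFP */
	unsigned long fpscr;		/* status/control register */
};

/* New struct for the exception registers; fields not relevant to the
 * current VFP architecture are ignored. */
struct user_vfp_exc {
	unsigned long fpexc;
	unsigned long fpinst;
	unsigned long fpinst2;
};
```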
      Signed-off-by: default avatarImre Deak <imre.deak@nokia.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      82c6f5a5
    • ARM: 6007/1: fix highmem with VIPT cache and DMA · 7e5a69e8
      Nicolas Pitre authored
      The VIVT cache of a highmem page is always flushed before the page
      is unmapped.  This cache flush is explicit through flush_cache_kmaps()
      in flush_all_zero_pkmaps(), or through __cpuc_flush_dcache_area() in
      kunmap_atomic().  There is also an implicit flush of those highmem pages
      that were part of a process that just terminated, making those pages
      free, as the whole VIVT cache has to be flushed on every task
      switch. Hence unmapped highmem pages need no cache maintenance in
      that case.
      
      However unmapped pages may still be cached with a VIPT cache because the
      cache is tagged with physical addresses.  There is no need for a whole
      cache flush during task switching for that reason, and despite the
      explicit cache flushes in flush_all_zero_pkmaps() and kunmap_atomic(),
      some highmem pages that were mapped in user space end up still cached
      even when they become unmapped.
      
      So, we do have to perform cache maintenance on those unmapped highmem
      pages in the context of DMA when using a VIPT cache.  Unfortunately,
      it is not possible to perform that cache maintenance using physical
      addresses as all the L1 cache maintenance coprocessor functions accept
      virtual addresses only.  Therefore we have no choice but to set up a
      temporary virtual mapping for that purpose.
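      The idea can be sketched as follows (helper names follow the kernel of
      that era and are assumptions here, not a quote of the patch):

```c
/* Sketch: an unmapped highmem page may still hold stale lines in a VIPT
 * cache, and the L1 maintenance ops only take virtual addresses, so create
 * a temporary kernel mapping, flush through it, then tear it down. */
static void dma_flush_highmem_page(struct page *page)
{
	void *vaddr = kmap_atomic(page);	/* temporary virtual mapping */
	__cpuc_flush_dcache_area(vaddr, PAGE_SIZE);
	kunmap_atomic(vaddr);
}
```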
      
      And of course the explicit cache flushing when unmapping a highmem page
      on a system with a VIPT cache now can go, which should increase
      performance.
      
      While at it, because the code in __flush_dcache_page() has to be modified
      anyway, let's also make sure the mapped highmem pages are pinned with
      kmap_high_get() for the duration of the cache maintenance operation.
      Because kunmap() does unmap highmem pages lazily, it was reported by
      Gary King <GKing@nvidia.com> that those pages ended up being unmapped
      during cache maintenance on SMP causing segmentation faults.
      Signed-off-by: Nicolas Pitre <nico@marvell.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  19. 29 Mar, 2010 2 commits
  20. 25 Mar, 2010 2 commits
    • ARM: 5996/1: ARM: Change the mandatory barriers implementation (4/4) · e7c5650f
      Catalin Marinas authored
      The mandatory barriers (mb, rmb, wmb) are used even on uniprocessor
      systems for things like ordering Normal Non-cacheable memory accesses
      with DMA transfer (via Device memory writes). The current implementation
      uses dmb() for mb() and friends but this is not sufficient. The DMB only
      ensures the relative ordering of the observability of accesses by other
      processors or devices acting as masters. In case of DMA transfers
      started by writes to device memory, the relative ordering is not ensured
      because accesses to slave ports of a device are not considered
      observable by the DMB definition.
      
      A DSB is required for the data to reach the main memory (even if mapped
      as Normal Non-cacheable) before the device receives the notification to
      begin the transfer. Furthermore, some L2 cache controllers (like L2x0 or
      PL310) buffer stores to Normal Non-cacheable memory and this would need
      to be drained with the outer_sync() function call.
      
      The patch also allows platforms to define their own mandatory barriers
      implementation by selecting CONFIG_ARCH_HAS_BARRIERS and providing a
      mach/barriers.h file.
      
      Note that the SMP barriers are unchanged (being DMBs as before) since
      they are only guaranteed to work with Normal Cacheable memory.
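      A rough sketch of the resulting mapping, assembled from the description
      above (not a verbatim quote of the patch):

```c
/* Mandatory barriers must push data all the way out: DSB makes the stores
 * reach main memory before the device write that starts the DMA, and
 * outer_sync() drains the write buffer of an L2 controller (L2x0/PL310).
 * The SMP barriers remain DMB-based. */
#define mb()		do { dsb(); outer_sync(); } while (0)
#define rmb()		dsb()
#define wmb()		mb()
#define smp_mb()	dmb()
```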
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 5994/1: ARM: Add outer_cache_fns.sync function pointer (2/4) · 319f551a
      Catalin Marinas authored
      This patch introduces the outer_cache_fns.sync function pointer together
      with the OUTER_CACHE_SYNC config option that can be used to drain the
      write buffer of the outer cache.
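      The hook can be sketched like this (field list abridged; modelled on the
      outer-cache interface of that era, not a verbatim quote):

```c
/* Platforms with an outer cache fill in .sync; outer_sync() then drains
 * the outer write buffer when the hook is provided. */
struct outer_cache_fns {
	void (*inv_range)(unsigned long start, unsigned long end);
	void (*clean_range)(unsigned long start, unsigned long end);
	void (*flush_range)(unsigned long start, unsigned long end);
#ifdef CONFIG_OUTER_CACHE_SYNC
	void (*sync)(void);
#endif
};

static inline void outer_sync(void)
{
	if (outer_cache.sync)
		outer_cache.sync();
}
```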
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>