1. 06 Feb, 2011 2 commits
  2. 11 Jan, 2011 1 commit
  3. 10 Jan, 2011 1 commit
    • Jesse Larrew's avatar
      powerpc/pseries: Fix VPHN build errors on non-SMP systems · 39bf990e
      Jesse Larrew authored
      The header asm/hvcall.h was previously included indirectly via
      smp.h. On non-SMP systems, however, these declarations are excluded
      and the build breaks. This is easily fixed by including asm/hvcall.h
      The VPHN feature is only meaningful on NUMA systems that implement
      the SPLPAR option, so exclude the VPHN code on systems without
      SPLPAR enabled.
      Also, expose unmap_cpu_from_node() on systems with SPLPAR enabled,
      even if CONFIG_HOTPLUG_CPU is disabled.
      Lastly, map_cpu_to_node() is now needed by VPHN to manipulate the
      node masks after boot time, so remove the __cpuinit annotation to
      fix a section mismatch.
      Signed-off-by: default avatarJesse Larrew <jlarrew@linux.vnet.ibm.com>
  4. 08 Dec, 2010 1 commit
  5. 28 Nov, 2010 1 commit
  6. 12 Oct, 2010 1 commit
    • Yinghai Lu's avatar
      memblock, bootmem: Round pfn properly for memory and reserved regions · c7fc2de0
      Yinghai Lu authored
      We need to round memory regions correctly -- specifically, we need to
      round reserved region in the more expansive direction (lower limit
      down, upper limit up) whereas usable memory regions need to be rounded
      in the more restrictive direction (lower limit up, upper limit down).
      This introduces two set of inlines:
      Although they are antisymmetric (and therefore are technically
      duplicates) the use of the different inlines explicitly documents the
      programmer's intention.
      The lack of proper rounding caused a bug on ARM, which was then found
      to also affect other architectures.
      Reported-by: default avatarRussell King <rmk@arm.linux.org.uk>
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CB4CDFD.4020105@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
  7. 03 Aug, 2010 1 commit
  8. 22 Jul, 2010 1 commit
  9. 14 Jul, 2010 1 commit
  10. 08 Jul, 2010 1 commit
    • Anton Blanchard's avatar
      powerpc/numa: Use form 1 affinity to setup node distance · 41eab6f8
      Anton Blanchard authored
      Form 1 affinity allows multiple entries in ibm,associativity-reference-points
      which represent affinity domains in decreasing order of importance. The
      Linux concept of a node is always the first entry, but using the other
      values as an input to node_distance() allows the memory allocator to make
      better decisions on which node to go first when local memory has been
      We keep things simple and create an array indexed by NUMA node, capped at
      4 entries. Each time we lookup an associativity property we initialise
      the array which is overkill, but since we should only hit this path during
      boot it didn't seem worth adding a per node valid bit.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 21 May, 2010 1 commit
  12. 06 May, 2010 1 commit
  13. 28 Apr, 2010 1 commit
    • Anton Blanchard's avatar
      powerpc/numa: Add form 1 NUMA affinity · 4b83c330
      Anton Blanchard authored
      Firmware changed the way it represents memory and cpu affinity on POWER7.
      Unfortunately the old method now caps the topology to work around issues
      with legacy operating systems. For Linux to get the correct topology we
      need to use the new form 1 affinity information.
      We set the form 1 field in the client architecture, and if we see "1" in the
      ibm,associativity-form property firmware supports form 1 affinity and
      we should look at the first field in the ibm,associativity-reference-points
      array. If not we use the second field as we always have.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
  14. 06 Mar, 2010 1 commit
  15. 09 Jun, 2009 1 commit
  16. 22 Feb, 2009 1 commit
    • Nathan Fontenot's avatar
      powerpc/numa: Cleanup hot_add_scn_to_nid · 0f16ef7f
      Nathan Fontenot authored
      This patch reworks the hot_add_scn_to_nid and its supporting functions
      to make them easier to understand.  There are no functional changes in
      this patch and has been tested on machine with memory represented in the
      device tree as memory nodes and in the ibm,dynamic-memory property.
      My previous patch that introduced support for hotplug memory add on
      systems whose memory was represented by the ibm,dynamic-memory property
      of the device tree only left the code more unintelligible.  This
      will hopefully makes things easier to understand.
      Signed-off-by: default avatarNathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
  17. 12 Feb, 2009 1 commit
    • Dave Hansen's avatar
      powerpc/mm: Fix numa reserve bootmem page selection · 06eccea6
      Dave Hansen authored
      Fix the powerpc NUMA reserve bootmem page selection logic.
      commit 8f64e1f2
       (powerpc: Reserve
      in bootmem lmb reserved regions that cross NUMA nodes) changed
      the logic for how the powerpc LMB reserved regions were converted
      to bootmen reserved regions.  As the folowing discussion reports,
      the new logic was not correct.
      mark_reserved_regions_for_nid() goes through each LMB on the
      system that specifies a reserved area.  It searches for
      active regions that intersect with that LMB and are on the
      specified node.  It attempts to bootmem-reserve only the area
      where the active region and the reserved LMB intersect.  We
      can not reserve things on other nodes as they may not have
      bootmem structures allocated, yet.
      We base the size of the bootmem reservation on two possible
      things.  Normally, we just make the reservation start and
      stop exactly at the start and end of the LMB.
      However, the LMB reservations are not aware of NUMA nodes and
      on occasion a single LMB may cross into several adjacent
      active regions.  Those may even be on different NUMA nodes
      and will require separate calls to the bootmem reserve
      functions.  So, the bootmem reservation must be trimmed to
      fit inside the current active region.
      That's all fine and dandy, but we trim the reservation
      in a page-aligned fashion.  That's bad because we start the
      reservation at a non-page-aligned address: physbase.
      The reservation may only span 2 bytes, but that those bytes
      may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
      Take the case where you reserve 0x2 bytes at 0x0fff and
      where the active region ends at 0x1000.  You'll jump into
      that if() statment, but node_ar.end_pfn=0x1 and
      start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
      and then call
        reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
      0x1000 may not be on the same node as 0xfff.  Oops.
      In almost all the vm code, end_<anything> is not inclusive.
      If you have an end_pfn of 0x1234, page 0x1234 is not
      included in the range.  Using PFN_UP instead of the
      (>> >> PAGE_SHIFT) will make this consistent with the other VM
      We also need to do math for the reserved size with physbase
      instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
      *precisely* the end of the node.  However,
      (start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
      of the reserved area.  That is, of course, physbase.
      If we don't use physbase here, the reserve_size can be
      made too large.
      From: Dave Hansen <dave@linux.vnet.ibm.com>
      Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
  18. 10 Feb, 2009 2 commits
  19. 07 Jan, 2009 4 commits
  20. 15 Dec, 2008 1 commit
    • Dave Hansen's avatar
      powerpc: Fix bootmem reservation on uninitialized node · a4c74ddd
      Dave Hansen authored
      careful_allocation() was calling into the bootmem allocator for
      nodes which had not been fully initialized and caused a previous
      bug:  http://patchwork.ozlabs.org/patch/10528/
        So, I merged a
      few broken out loops in do_init_bootmem() to fix it.  That changed
      the code ordering.
      I think this bug is triggered by having reserved areas for a node
      which are spanned by another node's contents.  In the
      mark_reserved_regions_for_nid() code, we attempt to reserve the
      area for a node before we have allocated the NODE_DATA() for that
      nid.  We do this since I reordered that loop.  I suck.
      This is causing crashes at bootup on some systems, as reported
      by Jon Tollefson.
      This may only present on some systems that have 16GB pages
      reserved.  But, it can probably happen on any system that is
      trying to reserve large swaths of memory that happen to span other
      nodes' contents.
      This commit ensures that we do not touch bootmem for any node which
      has not been initialized, and also removes a compile warning about
      an unused variable.
      Signed-off-by: default avatarDave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
  21. 30 Nov, 2008 1 commit
    • Dave Hansen's avatar
      powerpc: Fix boot freeze on machine with empty memory node · 4a618669
      Dave Hansen authored
      I got a bug report about a distro kernel not booting on a particular
      machine.  It would freeze during boot:
      > ...
      > Could not find start_pfn for node 1
      > [boot]0015 Setup Done
      > Built 2 zonelists in Node order, mobility grouping on.  Total pages: 123783
      > Policy zone: DMA
      > Kernel command line:
      > [boot]0020 XICS Init
      > [boot]0021 XICS Done
      > PID hash table entries: 4096 (order: 12, 32768 bytes)
      > clocksource: timebase mult[7d0000] shift[22] registered
      > Console: colour dummy device 80x25
      > console handover: boot [udbg0] -> real [hvc0]
      > Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
      > Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
      > freeing bootmem node 0
      I've reproduced this on  It is caused by commit
       ("powerpc: Reserve in bootmem
      lmb reserved regions that cross NUMA nodes").
      The problem is that Jon took a loop which was (in pseudocode):
      		NODE_DATA(nid) = careful_alloc(nid);
      and broke it up into:
      		NODE_DATA(nid) = careful_alloc(nid);
      The issue comes in when the 'careful_alloc()' is called on a node with
      no memory.  It falls back to using bootmem from a previously-initialized
      node.  But, bootmem has not yet been reserved when Jon's patch is
      applied.  It gives back bogus memory (0xc000000000000000) and pukes
      later in boot.
      The following patch collapses the loop back together.  It also breaks
      the mark_reserved_regions_for_nid() code out into a function and adds
      some comments.  I think a huge part of introducing this bug is because
      for loop was too long and hard to read.
      The actual bug fix here is the:
      +		if (end_pfn <= node->node_start_pfn ||
      +		    start_pfn >= node_end_pfn)
      +			continue;
      Signed-off-by: default avatarDave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
  22. 20 Oct, 2008 2 commits
  23. 09 Oct, 2008 1 commit
    • Jon Tollefson's avatar
      powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes · 8f64e1f2
      Jon Tollefson authored
      If there are multiple reserved memory blocks via lmb_reserve() that are
      contiguous addresses and on different NUMA nodes we are losing track of which
      address ranges to reserve in bootmem on which node.  I discovered this
      when I recently got to try 16GB huge pages on a system with more then 2 nodes.
      When scanning the device tree in early boot we call lmb_reserve() with
      the addresses of the 16G pages that we find so that the memory doesn't
      get used for something else.  For example the addresses for the pages
      could be 4000000000, 4400000000, 4800000000, 4C00000000, etc - 8 pages,
      one on each of eight nodes.  In the lmb after all the pages have been
      reserved it will look something like the following:
          memory.cnt            = 0x2
          memory.size           = 0x3e80000000
          memory.region[0x0].base       = 0x0
                            .size     = 0x1e80000000
          memory.region[0x1].base       = 0x4000000000
                            .size     = 0x2000000000
          reserved.cnt          = 0x5
          reserved.size         = 0x3e80000000
          reserved.region[0x0].base       = 0x0
                            .size     = 0x7b5000
          reserved.region[0x1].base       = 0x2a00000
                            .size     = 0x78c000
          reserved.region[0x2].base       = 0x328c000
                            .size     = 0x43000
          reserved.region[0x3].base       = 0xf4e8000
                            .size     = 0xb18000
          reserved.region[0x4].base       = 0x4000000000
                            .size     = 0x2000000000
      The reserved.region[0x4] contains the 16G pages.  In
      arch/powerpc/mm/num.c: do_init_bootmem() we loop through each of the
      node numbers looking for the reserved regions that belong to the
      particular node.  It is not able to identify region 0x4 as being a part
      of each of the 8 nodes.  It is assuming that a reserved region is only
      on a single node.
      This patch takes out the reserved region loop from inside
      the loop that goes over each node.  It looks up the active region containing
      the start of the reserved region.  If it extends past that active region then
      it adjusts the size and gets the next active region containing it.
      Signed-off-by: default avatarJon Tollefson <kniht@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
  24. 15 Sep, 2008 1 commit
    • Chandru's avatar
      powerpc: Add support for dynamic reconfiguration memory in kexec/kdump kernels · cf00085d
      Chandru authored
      Kdump kernel needs to use only those memory regions that it is allowed
      to use (crashkernel, rtas, tce, etc.).  Each of these regions have
      their own sizes and are currently added under 'linux,usable-memory'
      property under each memory@xxx node of the device tree.
      The ibm,dynamic-memory property of ibm,dynamic-reconfiguration-memory
      node (on POWER6) now stores in it the representation for most of the
      logical memory blocks with the size of each memory block being a
      constant (lmb_size).  If one or more or part of the above mentioned
      regions lie under one of the lmb from ibm,dynamic-memory property,
      there is a need to identify those regions within the given lmb.
      This makes the kernel recognize a new 'linux,drconf-usable-memory'
      property added by kexec-tools.  Each entry in this property is of the
      form of a count followed by that many (base, size) pairs for the above
      mentioned regions.  The number of cells in the count value is given by
      the #size-cells property of the root node.
      Signed-off-by: default avatarChandru Siddalingappa <chandru@in.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
  25. 24 Jul, 2008 1 commit
  26. 03 Jul, 2008 2 commits
  27. 24 Apr, 2008 1 commit
  28. 13 Feb, 2008 1 commit
  29. 07 Feb, 2008 1 commit
    • Bernhard Walle's avatar
      Introduce flags for reserve_bootmem() · 72a7fe39
      Bernhard Walle authored
      This patchset adds a flags variable to reserve_bootmem() and uses the
      BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
      between crashkernel area and already used memory.
      This patch:
      Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
      If that flag is set, the function returns with -EBUSY if the memory already
      has been reserved in the past.  This is to avoid conflicts.
      Because that code runs before SMP initialisation, there's no race condition
      inside reserve_bootmem_core().
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix powerpc build]
      Signed-off-by: default avatarBernhard Walle <bwalle@suse.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  30. 06 Feb, 2008 1 commit
    • Balbir Singh's avatar
      [POWERPC] Fake NUMA emulation for PowerPC · 1daa6d08
      Balbir Singh authored
      Here's a dumb simple implementation of fake NUMA nodes for PowerPC.
      Fake NUMA nodes can be specified using the following command line
      numa=fake=<node range>
      node range is of the format <range1>,<range2>,...<rangeN>
      Each of the rangeX parameters is passed using memparse().  I find the
      patch useful for fake NUMA emulation on my simple PowerPC machine.
      I've tested it on a numa box with the following arguments
      numa=fake=256M,512M mem=512M
      numa=fake=1G mem=768M
      without any numa= argument
      The other side-effect introduced by this patch is that; in the case
      where we don't have NUMA information, we now set a node online after
      adding each LMB.  This node could very well be node 0, but in the case
      that we enable fake NUMA nodes, when we cross node boundaries, we need
      to set the new node online.
      Signed-off-by: default avatarBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
  31. 25 Jan, 2008 1 commit
    • Paul Mackerras's avatar
      Revert "[POWERPC] Fake NUMA emulation for PowerPC" · 55852bed
      Paul Mackerras authored
      This reverts commit 5c3f5892
      basically because it changes behaviour even when no fake NUMA
      information is specified on the kernel command line.
      Firstly, it changes the nid, thus destroying the real NUMA
      information.  Secondly, it also changes behaviour in that if a node
      ends up with no memory in it because of the memory limit, we used to
      set it online and now we don't.
      Also, in the non-NUMA case with no fake NUMA information, we do
      node_set_online once for each LMB now, whereas previously we only did
      it once.  I don't know if that is actually a problem, but it does seem
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
  32. 19 Dec, 2007 1 commit
    • Balbir Singh's avatar
      [POWERPC] Fake NUMA emulation for PowerPC · 5c3f5892
      Balbir Singh authored
      Here's a dumb simple implementation of fake NUMA nodes for PowerPC.
      Fake NUMA nodes can be specified using the following command line option
      numa=fake=<node range>
      node range is of the format <range1>,<range2>,...<rangeN>
      Each of the rangeX parameters is passed using memparse().  I find this
      useful for fake NUMA emulation on my simple PowerPC machine.  I've
      tested it on a non-numa box with the following arguments:
      numa=fake=1500M,2800M mem=3500M
      numa=fake=1G mem=512M
      numa=fake=1G mem=1G
      Signed-off-by: default avatarBalbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
  33. 03 Aug, 2007 1 commit