    Memoryless nodes: Generic management of nodemasks for various purposes · 13808910
    Christoph Lameter authored
    
    
    Why do we need to support memoryless nodes?
    
    KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
    
    > For fujitsu, problem is called "empty" node.
    >
    > When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init)
    > creates nodes, which includes no memory, no cpu.
    >
    > I tried to remove empty-node in past, but that was denied.
    > It was because we can hot-add cpu to the empty node.
    > (node-hotplug triggered by cpu is not implemented now. and it will be ugly.)
    >
    >
    > For HP, (Lee can comment on this later), they have memory-less-node.
    > As far as I hear, HP's machine can have following configuration.
    >
    > (example)
    > Node0: CPU0   memory AAA MB
    > Node1: CPU1   memory AAA MB
    > Node2: CPU2   memory AAA MB
    > Node3: CPU3   memory AAA MB
    > Node4: Memory XXX GB
    >
    > AAA is very small value (below 16MB)  and will be omitted by ia64 bootstrap.
    > After boot, only Node 4 has valid memory (but have no cpu.)
    >
    > Maybe this is memory-interleave by firmware config.
    
    Christoph Lameter <clameter@sgi.com> wrote:
    
    > Future SGI platforms (actually also current one can have but nothing like
    > that is deployed to my knowledge) have nodes with only cpus. Current SGI
    > platforms have nodes with just I/O that we so far cannot manage in the
    > core. So the arch code maps them to the nearest memory node.
    
    Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
    
    > For the HP platforms, we can configure each cell with from 0% to 100%
    > "cell local memory".  When we configure with <100% CLM, the "missing
    > percentages" are interleaved by hardware on a cache-line granularity to
    > improve bandwidth at the expense of latency for numa-challenged
    > applications [and OSes, but not our problem ;-)].  When we boot Linux on
    > such a config, all of the real nodes have no memory--it all resides in a
    > single interleaved pseudo-node.
    >
    > When we boot Linux on a 100% CLM configuration [== NUMA], we still have
    > the interleaved pseudo-node.  It contains a few hundred MB stolen from
    > the real nodes to contain the DMA zone.  [Interleaved memory resides at
    > phys addr 0].  The memoryless-nodes patches, along with the zoneorder
    > patches, support this config as well.
    >
    > Also, when we boot a NUMA config with the "mem=" command line,
    > specifying less memory than actually exists, Linux takes the excluded
    > memory "off the top" rather than distributing it across the nodes.  This
    > can result in memoryless nodes, as well.
    >
    
    This patch:
    
    Preparation for memoryless node patches.
    
    Provide a generic way to keep nodemasks describing various characteristics of
    NUMA nodes.
    
    Remove the node_online_map and the node_possible_map and realize the same
    functionality using two node states: N_POSSIBLE and N_ONLINE.
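
The idea can be sketched as one array of nodemasks indexed by node state. The following is a simplified userspace model, not the code from this patch: the helper names follow the kernel's nodemask helpers, but the single 64-bit mask, the MAX_NUMNODES value and the function bodies are assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumption: at most 64 nodes, so a single 64-bit word holds a mask. */
#define MAX_NUMNODES 64

/* One bit per node; a simplified stand-in for the kernel's nodemask_t. */
typedef struct { unsigned long long bits; } nodemask_t;

/* Each state names one characteristic that a node may have. */
enum node_states {
    N_POSSIBLE,     /* the node could become online at some point */
    N_ONLINE,       /* the node is online */
    NR_NODE_STATES
};

/* One mask per state replaces separate, individually named node maps. */
static nodemask_t node_states[NR_NODE_STATES];

/* Mark a node as having the given state. Caller ensures node < MAX_NUMNODES. */
static void node_set_state(int node, enum node_states state)
{
    node_states[state].bits |= 1ULL << node;
}

/* Remove the given state from a node. */
static void node_clear_state(int node, enum node_states state)
{
    node_states[state].bits &= ~(1ULL << node);
}

/* Test whether a node currently has the given state. */
static bool node_state(int node, enum node_states state)
{
    return (node_states[state].bits >> node) & 1ULL;
}

/* Count the nodes that currently have the given state. */
static int num_node_state(enum node_states state)
{
    int count = 0;

    for (int node = 0; node < MAX_NUMNODES; node++)
        count += node_state(node, state);
    return count;
}
```

With this layout, describing a further node characteristic means adding one enum entry; the accessors stay unchanged.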
    
    [Lee.Schermerhorn@hp.com: Initialize N_*_MEMORY and N_CPU masks for non-NUMA config]
    Signed-off-by: Christoph Lameter <clameter@sgi.com>
    Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Acked-by: Bob Picco <bob.picco@hp.com>
    Cc: Nishanth Aravamudan <nacc@us.ibm.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Mel Gorman <mel@skynet.ie>
    Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Cc: "Serge E. Hallyn" <serge@hallyn.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>