Commit 5ce1a70e authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'akpm' (more incoming from Andrew)

Merge second patch-bomb from Andrew Morton:

 - A little DM fix

 - the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
  ksm: allocate roots when needed
  mm: cleanup "swapcache" in do_swap_page
  mm,ksm: swapoff might need to copy
  mm,ksm: FOLL_MIGRATION do migration_entry_wait
  ksm: shrink 32-bit rmap_item back to 32 bytes
  ksm: treat unstable nid like in stable tree
  ksm: add some comments
  tmpfs: fix mempolicy object leaks
  tmpfs: fix use-after-free of mempolicy object
  mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
  mm: export mmu notifier invalidates
  mm: accelerate mm_populate() treatment of THP pages
  mm: use long type for page counts in mm_populate() and get_user_pages()
  mm: accurately document nr_free_*_pages functions with code comments
  HWPOISON: change order of error_states[]'s elements
  HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
  memcg: stop warning on memcg_propagate_kmem
  net: change type of virtio_chan->p9_max_pages
  vmscan: change type of vm_total_pages to unsigned long
  fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
  ...
parents 9d3cae26 ef53d16c
What: /sys/kernel/mm/ksm
Date: September 2009
KernelVersion: 2.6.32
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Interface for Kernel Samepage Merging (KSM)
What: /sys/kernel/mm/ksm/full_scans
What: /sys/kernel/mm/ksm/pages_shared
What: /sys/kernel/mm/ksm/pages_sharing
What: /sys/kernel/mm/ksm/pages_to_scan
What: /sys/kernel/mm/ksm/pages_unshared
What: /sys/kernel/mm/ksm/pages_volatile
What: /sys/kernel/mm/ksm/run
What: /sys/kernel/mm/ksm/sleep_millisecs
Date: September 2009
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Kernel Samepage Merging daemon sysfs interface
full_scans: how many times all mergeable areas have been
scanned.
pages_shared: how many shared pages are being used.
pages_sharing: how many more sites are sharing them i.e. how
much saved.
pages_to_scan: how many present pages to scan before ksmd goes
to sleep.
pages_unshared: how many pages unique but repeatedly checked
for merging.
pages_volatile: how many pages changing too fast to be placed
in a tree.
run: write 0 to disable ksm, read 0 while ksm is disabled.
write 1 to run ksm, read 1 while ksm is running.
write 2 to disable ksm and unmerge all its pages.
sleep_millisecs: how many milliseconds ksm should sleep between
scans.
See Documentation/vm/ksm.txt for more information.
What: /sys/kernel/mm/ksm/merge_across_nodes
Date: January 2013
KernelVersion: 3.9
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Control merging pages across different NUMA nodes.
When it is set to 0 only pages from the same node are merged,
otherwise pages from all nodes can be merged together (default).
......@@ -1640,6 +1640,42 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
movablemem_map=acpi
[KNL,X86,IA-64,PPC] This parameter is similar to
memmap except it specifies the memory map of
ZONE_MOVABLE.
This option inform the kernel to use Hot Pluggable bit
in flags from SRAT from ACPI BIOS to determine which
memory devices could be hotplugged. The corresponding
memory ranges will be set as ZONE_MOVABLE.
NOTE: Whatever node the kernel resides in will always
be un-hotpluggable.
movablemem_map=nn[KMG]@ss[KMG]
[KNL,X86,IA-64,PPC] This parameter is similar to
memmap except it specifies the memory map of
ZONE_MOVABLE.
If user specifies memory ranges, the info in SRAT will
be ingored. And it works like the following:
- If more ranges are all within one node, then from
lowest ss to the end of the node will be ZONE_MOVABLE.
- If a range is within a node, then from ss to the end
of the node will be ZONE_MOVABLE.
- If a range covers two or more nodes, then from ss to
the end of the 1st node will be ZONE_MOVABLE, and all
the rest nodes will only have ZONE_MOVABLE.
If memmap is specified at the same time, the
movablemem_map will be limited within the memmap
areas. If kernelcore or movablecore is also specified,
movablemem_map will have higher priority to be
satisfied. So the administrator should be careful that
the amount of movablemem_map areas are not too large.
Otherwise kernel won't have enough memory to start.
NOTE: We don't stop users specifying the node the
kernel resides in as hotpluggable so that this
option can be used as a workaround of firmware
bugs.
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
......
......@@ -58,6 +58,21 @@ sleep_millisecs - how many milliseconds ksmd should sleep before next scan
e.g. "echo 20 > /sys/kernel/mm/ksm/sleep_millisecs"
Default: 20 (chosen for demonstration purposes)
merge_across_nodes - specifies if pages from different numa nodes can be merged.
When set to 0, ksm merges only pages which physically
reside in the memory area of same NUMA node. That brings
lower latency to access of shared pages. Systems with more
nodes, at significant NUMA distances, are likely to benefit
from the lower latency of setting 0. Smaller systems, which
need to minimize memory usage, are likely to benefit from
the greater sharing of setting 1 (default). You may wish to
compare how your system performs under each setting, before
deciding on which to use. merge_across_nodes setting can be
changed only when there are no ksm shared pages in system:
set run 2 to unmerge pages first, then to 1 after changing
merge_across_nodes, to remerge according to the new setting.
Default: 1 (merging across nodes as in earlier releases)
run - set 0 to stop ksmd from running but keep merged pages,
set 1 to run ksmd e.g. "echo 1 > /sys/kernel/mm/ksm/run",
set 2 to stop ksmd and unmerge all pages currently merged,
......
......@@ -434,4 +434,7 @@ int __meminit vmemmap_populate(struct page *start_page,
return 0;
}
#endif /* CONFIG_ARM64_64K_PAGES */
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
......@@ -93,7 +93,7 @@ void show_mem(unsigned int filter)
printk(KERN_INFO "%d pages swap cached\n", total_cached);
printk(KERN_INFO "Total of %ld pages in page table cache\n",
quicklist_total_size());
printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
printk(KERN_INFO "%ld free buffer pages\n", nr_free_buffer_pages());
}
......
......@@ -666,7 +666,7 @@ void show_mem(unsigned int filter)
printk(KERN_INFO "%d pages swap cached\n", total_cached);
printk(KERN_INFO "Total of %ld pages in page table cache\n",
quicklist_total_size());
printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
printk(KERN_INFO "%ld free buffer pages\n", nr_free_buffer_pages());
}
/**
......@@ -822,4 +822,8 @@ int __meminit vmemmap_populate(struct page *start_page,
{
return vmemmap_populate_basepages(start_page, size, node);
}
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
#endif
......@@ -688,6 +688,24 @@ int arch_add_memory(int nid, u64 start, u64 size)
return ret;
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
int ret;
zone = page_zone(pfn_to_page(start_pfn));
ret = __remove_pages(zone, start_pfn, nr_pages);
if (ret)
pr_warn("%s: Problem encountered in __remove_pages() as"
" ret=%d\n", __func__, ret);
return ret;
}
#endif
#endif
/*
......
......@@ -297,5 +297,10 @@ int __meminit vmemmap_populate(struct page *start_page,
return 0;
}
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
......@@ -133,6 +133,18 @@ int arch_add_memory(int nid, u64 start, u64 size)
return __add_pages(nid, zone, start_pfn, nr_pages);
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
zone = page_zone(pfn_to_page(start_pfn));
return __remove_pages(zone, start_pfn, nr_pages);
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
/*
......
......@@ -228,4 +228,16 @@ int arch_add_memory(int nid, u64 start, u64 size)
vmem_remove_mapping(start, size);
return rc;
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
/*
* There is no hardware or firmware interface which could trigger a
* hot memory remove on s390. So there is nothing that needs to be
* implemented.
*/
return -EBUSY;
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
......@@ -268,6 +268,10 @@ out:
return ret;
}
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
/*
* Add memory segment to the segment list if it doesn't overlap with
* an already present segment.
......
......@@ -558,4 +558,21 @@ int memory_add_physaddr_to_nid(u64 addr)
EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
#endif
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
int ret;
zone = page_zone(pfn_to_page(start_pfn));
ret = __remove_pages(zone, start_pfn, nr_pages);
if (unlikely(ret))
pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
ret);
return ret;
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
......@@ -57,7 +57,7 @@ void show_mem(unsigned int filter)
printk("Mem-info:\n");
show_free_areas(filter);
printk("Free swap: %6ldkB\n",
nr_swap_pages << (PAGE_SHIFT-10));
get_nr_swap_pages() << (PAGE_SHIFT-10));
printk("%ld pages of RAM\n", totalram_pages);
printk("%ld free pages\n", nr_free_pages());
}
......
......@@ -2235,6 +2235,11 @@ void __meminit vmemmap_populate_print_last(void)
node_start = 0;
}
}
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
static void prot_init_common(unsigned long page_none,
......
......@@ -130,7 +130,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
if (!retval) {
unsigned long addr = MEM_USER_INTRPT;
addr = mmap_region(NULL, addr, INTRPT_SIZE,
MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
VM_READ|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
if (addr > (unsigned long) -PAGE_SIZE)
......
......@@ -935,6 +935,14 @@ int remove_memory(u64 start, u64 size)
{
return -EINVAL;
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
/* TODO */
return -EBUSY;
}
#endif
#endif
struct kmem_cache *pgd_cache;
......
......@@ -61,7 +61,7 @@ void show_mem(unsigned int filter)
global_page_state(NR_PAGETABLE),
global_page_state(NR_BOUNCE),
global_page_state(NR_FILE_PAGES),
nr_swap_pages);
get_nr_swap_pages());
for_each_zone(zone) {
unsigned long flags, order, total = 0, largest_order = -1;
......
......@@ -57,8 +57,8 @@ static inline int numa_cpu_node(int cpu)
#endif
#ifdef CONFIG_NUMA
extern void __cpuinit numa_set_node(int cpu, int node);
extern void __cpuinit numa_clear_node(int cpu);
extern void numa_set_node(int cpu, int node);
extern void numa_clear_node(int cpu);
extern void __init init_cpu_to_node(void);
extern void __cpuinit numa_add_cpu(int cpu);
extern void __cpuinit numa_remove_cpu(int cpu);
......
......@@ -351,6 +351,7 @@ static inline void update_page_count(int level, unsigned long pages) { }
* as a pte too.
*/
extern pte_t *lookup_address(unsigned long address, unsigned int *level);
extern int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase);
extern phys_addr_t slow_virt_to_phys(void *__address);
#endif /* !__ASSEMBLY__ */
......
......@@ -696,6 +696,10 @@ EXPORT_SYMBOL(acpi_map_lsapic);
int acpi_unmap_lsapic(int cpu)
{
#ifdef CONFIG_ACPI_NUMA
set_apicid_to_node(per_cpu(x86_cpu_to_apicid, cpu), NUMA_NO_NODE);
#endif
per_cpu(x86_cpu_to_apicid, cpu) = -1;
set_cpu_present(cpu, false);
num_processors--;
......
......@@ -1056,6 +1056,15 @@ void __init setup_arch(char **cmdline_p)
setup_bios_corruption_check();
#endif
/*
* In the memory hotplug case, the kernel needs info from SRAT to
* determine which memory is hotpluggable before allocating memory
* using memblock.
*/
acpi_boot_table_init();
early_acpi_boot_init();
early_parse_srat();
#ifdef CONFIG_X86_32
printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
(max_pfn_mapped<<PAGE_SHIFT) - 1);
......@@ -1101,10 +1110,6 @@ void __init setup_arch(char **cmdline_p)
/*
* Parse the ACPI tables for possible boot-time SMP configuration.
*/
acpi_boot_table_init();
early_acpi_boot_init();
initmem_init();
memblock_find_dma_reserve();
......
......@@ -862,6 +862,18 @@ int arch_add_memory(int nid, u64 start, u64 size)
return __add_pages(nid, zone, start_pfn, nr_pages);
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
zone = page_zone(pfn_to_page(start_pfn));
return __remove_pages(zone, start_pfn, nr_pages);
}
#endif
#endif
/*
......
......@@ -707,6 +707,343 @@ int arch_add_memory(int nid, u64 start, u64 size)
}
EXPORT_SYMBOL_GPL(arch_add_memory);
#define PAGE_INUSE 0xFD
static void __meminit free_pagetable(struct page *page, int order)
{
struct zone *zone;
bool bootmem = false;
unsigned long magic;
unsigned int nr_pages = 1 << order;
/* bootmem page has reserved flag */
if (PageReserved(page)) {
__ClearPageReserved(page);
bootmem = true;
magic = (unsigned long)page->lru.next;
if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
while (nr_pages--)
put_page_bootmem(page++);
} else
__free_pages_bootmem(page, order);
} else
free_pages((unsigned long)page_address(page), order);
/*
* SECTION_INFO pages and MIX_SECTION_INFO pages
* are all allocated by bootmem.
*/
if (bootmem) {
zone = page_zone(page);
zone_span_writelock(zone);
zone->present_pages += nr_pages;
zone_span_writeunlock(zone);
totalram_pages += nr_pages;
}
}
static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
{
pte_t *pte;
int i;
for (i = 0; i < PTRS_PER_PTE; i++) {
pte = pte_start + i;
if (pte_val(*pte))
return;
}
/* free a pte talbe */
free_pagetable(pmd_page(*pmd), 0);
spin_lock(&init_mm.page_table_lock);
pmd_clear(pmd);
spin_unlock(&init_mm.page_table_lock);
}
static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
{
pmd_t *pmd;
int i;
for (i = 0; i < PTRS_PER_PMD; i++) {
pmd = pmd_start + i;
if (pmd_val(*pmd))
return;
}
/* free a pmd talbe */
free_pagetable(pud_page(*pud), 0);
spin_lock(&init_mm.page_table_lock);
pud_clear(pud);
spin_unlock(&init_mm.page_table_lock);
}
/* Return true if pgd is changed, otherwise return false. */
static bool __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd)
{
pud_t *pud;
int i;
for (i = 0; i < PTRS_PER_PUD; i++) {
pud = pud_start + i;
if (pud_val(*pud))
return false;
}
/* free a pud table */
free_pagetable(pgd_page(*pgd), 0);
spin_lock(&init_mm.page_table_lock);
pgd_clear(pgd);
spin_unlock(&init_mm.page_table_lock);
return true;
}
static void __meminit
remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
bool direct)
{
unsigned long next, pages = 0;
pte_t *pte;
void *page_addr;
phys_addr_t phys_addr;
pte = pte_start + pte_index(addr);
for (; addr < end; addr = next, pte++) {
next = (addr + PAGE_SIZE) & PAGE_MASK;
if (next > end)
next = end;
if (!pte_present(*pte))
continue;
/*
* We mapped [0,1G) memory as identity mapping when
* initializing, in arch/x86/kernel/head_64.S. These
* pagetables cannot be removed.
*/
phys_addr = pte_val(*pte) + (addr & PAGE_MASK);
if (phys_addr < (phys_addr_t)0x40000000)
return;
if (IS_ALIGNED(addr, PAGE_SIZE) &&
IS_ALIGNED(next, PAGE_SIZE)) {
/*
* Do not free direct mapping pages since they were
* freed when offlining, or simplely not in use.
*/
if (!direct)
free_pagetable(pte_page(*pte), 0);
spin_lock(&init_mm.page_table_lock);
pte_clear(&init_mm, addr, pte);
spin_unlock(&init_mm.page_table_lock);
/* For non-direct mapping, pages means nothing. */
pages++;
} else {
/*
* If we are here, we are freeing vmemmap pages since
* direct mapped memory ranges to be freed are aligned.
*
* If we are not removing the whole page, it means
* other page structs in this page are being used and
* we canot remove them. So fill the unused page_structs
* with 0xFD, and remove the page when it is wholly
* filled with 0xFD.
*/
memset((void *)addr, PAGE_INUSE, next - addr);
page_addr = page_address(pte_page(*pte));
if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
free_pagetable(pte_page(*pte), 0);
spin_lock(&init_mm.page_table_lock);
pte_clear(&init_mm, addr, pte);
spin_unlock(&init_mm.page_table_lock);
}
}
}
/* Call free_pte_table() in remove_pmd_table(). */
flush_tlb_all();
if (direct)
update_page_count(PG_LEVEL_4K, -pages);
}
static void __meminit
remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
bool direct)
{
unsigned long next, pages = 0;
pte_t *pte_base;
pmd_t *pmd;
void *page_addr;
pmd = pmd_start + pmd_index(addr);
for (; addr < end; addr = next, pmd++) {