1. 03 Jul, 2013 2 commits
    • HATAYAMA Daisuke's avatar
      vmalloc: introduce remap_vmalloc_range_partial · e69e9d4a
      HATAYAMA Daisuke authored
      We want to allocate ELF note segment buffer on the 2nd kernel in vmalloc
      space and remap it to user-space in order to reduce the risk that memory
      allocation fails on system with huge number of CPUs and so with huge ELF
      note segment that exceeds 11-order block size.
      
      Although there's already remap_vmalloc_range for the purpose of
      remapping vmalloc memory to user-space, we need to specify user-space
      range via vma.
       Mmap on /proc/vmcore needs to remap range across multiple objects, so
      the interface that requires vma to cover full range is problematic.
      
      This patch introduces remap_vmalloc_range_partial that receives user-space
      range as a pair of base address and size and can be used for mmap on
      /proc/vmcore case.
      
      remap_vmalloc_range is rewritten using remap_vmalloc_range_partial.
      
      [akpm@linux-foundation.org: use PAGE_ALIGNED()]
      Signed-off-by: default avatarHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e69e9d4a
    • HATAYAMA Daisuke's avatar
      vmalloc: make find_vm_area check in range · cef2ac3f
      HATAYAMA Daisuke authored
      Currently, __find_vmap_area searches for the kernel VM area starting at
      a given address.  This patch changes this behavior so that it searches
      for the kernel VM area to which the address belongs.  This change is
      needed by remap_vmalloc_range_partial to be introduced in later patch
      that receives any position of kernel VM area as target address.
      
      This patch changes the condition (addr > va->va_start) to the equivalent
      (addr >= va->va_end) by taking advantage of the fact that each kernel VM
      area is non-overlapping.
      Signed-off-by: default avatarHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cef2ac3f
  2. 07 May, 2013 1 commit
  3. 29 Apr, 2013 8 commits
    • Atsushi Kumagai's avatar
      kexec, vmalloc: export additional vmalloc layer information · 13ba3fcb
      Atsushi Kumagai authored
      Now, vmap_area_list is exported as VMCOREINFO for makedumpfile to get
      the start address of vmalloc region (vmalloc_start).  The address which
      contains vmalloc_start value is represented as below:
      
        vmap_area_list.next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start)
      
      However, both OFFSET(vmap_area.va_start) and OFFSET(vmap_area.list)
      aren't exported as VMCOREINFO.
      
      So this patch exports them externally with small cleanup.
      
      [akpm@linux-foundation.org: vmalloc.h should include list.h for list_head]
      Signed-off-by: default avatarAtsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      13ba3fcb
    • Joonsoo Kim's avatar
      mm, vmalloc: remove list management of vmlist after initializing vmalloc · 4341fa45
      Joonsoo Kim authored
      Now, there is no need to maintain vmlist after initializing vmalloc.  So
      remove related code and data structure.
      Signed-off-by: default avatarJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4341fa45
    • Joonsoo Kim's avatar
      mm, vmalloc: export vmap_area_list, instead of vmlist · f1c4069e
      Joonsoo Kim authored
      Although our intention is to unexport internal structure entirely, but
      there is one exception for kexec.  kexec dumps address of vmlist and
      makedumpfile uses this information.
      
      We are about to remove vmlist, then another way to retrieve information
      of vmalloc layer is needed for makedumpfile.  For this purpose, we
      export vmap_area_list, instead of vmlist.
      Signed-off-by: default avatarJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f1c4069e
    • Joonsoo Kim's avatar
      mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo() · d4033afd
      Joonsoo Kim authored
      This patch is a preparatory step for removing vmlist entirely.  For
      above purpose, we change iterating a vmap_list codes to iterating a
      vmap_area_list.  It is somewhat trivial change, but just one thing
      should be noticed.
      
      Using vmap_area_list in vmallocinfo() introduce ordering problem in SMP
      system.  In s_show(), we retrieve some values from vm_struct.
      vm_struct's values is not fully setup when va->vm is assigned.  Full
      setup is notified by removing VM_UNLIST flag without holding a lock.
      When we see that VM_UNLIST is removed, it is not ensured that vm_struct
      has proper values in view of other CPUs.  So we need smp_[rw]mb for
      ensuring that proper values is assigned when we see that VM_UNLIST is
      removed.
      
      Therefore, this patch not only change a iteration list, but also add a
      appropriate smp_[rw]mb to right places.
      Signed-off-by: default avatarJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d4033afd
    • Joonsoo Kim's avatar
      mm, vmalloc: iterate vmap_area_list in get_vmalloc_info() · f98782dd
      Joonsoo Kim authored
      This patch is a preparatory step for removing vmlist entirely.  For
      above purpose, we change iterating a vmap_list codes to iterating a
      vmap_area_list.  It is somewhat trivial change, but just one thing
      should be noticed.
      
      vmlist is lack of information about some areas in vmalloc address space.
      For example, vm_map_ram() allocate area in vmalloc address space, but it
      doesn't make a link with vmlist.  To provide full information about
      vmalloc address space is better idea, so we don't use va->vm and use
      vmap_area directly.  This makes get_vmalloc_info() more precise.
      Signed-off-by: default avatarJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f98782dd
    • Joonsoo Kim's avatar
      mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite() · e81ce85f
      Joonsoo Kim authored
      Now, when we hold a vmap_area_lock, va->vm can't be discarded.  So we can
      safely access to va->vm when iterating a vmap_area_list with holding a
      vmap_area_lock.  With this property, change iterating vmlist codes in
      vread/vwrite() to iterating vmap_area_list.
      
      There is a little difference relate to lock, because vmlist_lock is mutex,
      but, vmap_area_lock is spin_lock.  It may introduce a spinning overhead
      during vread/vwrite() is executing.  But, these are debug-oriented
      functions, so this overhead is not real problem for common case.
      Signed-off-by: default avatarJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e81ce85f
    • Joonsoo Kim's avatar
      mm, vmalloc: protect va->vm by vmap_area_lock · c69480ad
      Joonsoo Kim authored
      Inserting and removing an entry to vmlist is linear time complexity, so
      it is inefficient.  Following patches will try to remove vmlist
      entirely.  This patch is preparing step for it.
      
      For removing vmlist, iterating vmlist codes should be changed to
      iterating a vmap_area_list.  Before implementing that, we should make
      sure that when we iterate a vmap_area_list, accessing to va->vm doesn't
      cause a race condition.  This patch ensure that when iterating a
      vmap_area_list, there is no race condition for accessing to vm_struct.
      Signed-off-by: default avatarJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c69480ad
    • Joonsoo Kim's avatar
      mm, vmalloc: move get_vmalloc_info() to vmalloc.c · db3808c1
      Joonsoo Kim authored
      Now get_vmalloc_info() is in fs/proc/mmu.c.  There is no reason that this
      code must be here and it's implementation needs vmlist_lock and it iterate
      a vmlist which may be internal data structure for vmalloc.
      
      It is preferable that vmlist_lock and vmlist is only used in vmalloc.c
      for maintainability. So move the code to vmalloc.c
      Signed-off-by: default avatarJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      db3808c1
  4. 10 Mar, 2013 1 commit
  5. 23 Feb, 2013 1 commit
  6. 11 Dec, 2012 1 commit
  7. 09 Oct, 2012 2 commits
    • Kees Cook's avatar
      mm: use %pK for /proc/vmallocinfo · 45ec1690
      Kees Cook authored
      In the paranoid case of sysctl kernel.kptr_restrict=2, mask the kernel
      virtual addresses in /proc/vmallocinfo too.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reported-by: default avatarBrad Spengler <spender@grsecurity.net>
      Acked-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      45ec1690
    • Konstantin Khlebnikov's avatar
      mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Konstantin Khlebnikov authored
      A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
      currently it lost original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes reserved_vm counter from mm_struct.  Seems like nobody
      cares about it, it does not exported into userspace directly, it only
      reduces total_vm showed in proc.
      
      Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
      
      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      314e51b9
  8. 31 Jul, 2012 2 commits
  9. 30 Jul, 2012 2 commits
  10. 24 Jul, 2012 1 commit
  11. 29 May, 2012 2 commits
  12. 20 Mar, 2012 1 commit
  13. 12 Jan, 2012 1 commit
  14. 10 Jan, 2012 1 commit
  15. 20 Dec, 2011 1 commit
  16. 09 Dec, 2011 1 commit
  17. 18 Nov, 2011 1 commit
    • Nicolas Pitre's avatar
      mm: add vm_area_add_early() · be9b7335
      Nicolas Pitre authored
      The existing vm_area_register_early() allows for early vmalloc space
      allocation.  However upcoming cleanups in the ARM architecture require
      that some fixed locations in the vmalloc area be reserved also very early.
      
      The name "vm_area_register_early" would have been a good name for the
      reservation part without the allocation.  Since it is already in use with
      different semantics, let's create vm_area_add_early() instead.
      
      Both vm_area_register_early() and vm_area_add_early() can be used together
      meaning that the former is now implemented using the later where it is
      ensured that no conflicting areas are added, but no attempt is made to
      make the allocation scheme in vm_area_register_early() more sophisticated.
      After all, you must know what you're doing when using those functions.
      Signed-off-by: default avatarNicolas Pitre <nicolas.pitre@linaro.org>
      Acked-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: linux-mm@kvack.org
      be9b7335
  18. 16 Nov, 2011 1 commit
  19. 31 Oct, 2011 3 commits
  20. 14 Sep, 2011 1 commit
    • David Vrabel's avatar
      mm: sync vmalloc address space page tables in alloc_vm_area() · 461ae488
      David Vrabel authored
      Xen backend drivers (e.g., blkback and netback) would sometimes fail to
      map grant pages into the vmalloc address space allocated with
      alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen could
      not find the page (in the L2 table) containing the PTEs it needed to
      update.
      
      (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
      
      netback and blkback were making the hypercall from a kernel thread where
      task->active_mm != &init_mm and alloc_vm_area() was only updating the page
      tables for init_mm.  The usual method of deferring the update to the page
      tables of other processes (i.e., after taking a fault) doesn't work as a
      fault cannot occur during the hypercall.
      
      This would work on some systems depending on what else was using vmalloc.
      
      Fix this by reverting ef691947 ("vmalloc: remove vmalloc_sync_all()
      from alloc_vm_area()") and add a comment to explain why it's needed.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Keir Fraser <keir.xen@gmail.com>
      Cc: <stable@kernel.org>		[3.0.x]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      461ae488
  21. 14 Aug, 2011 1 commit
  22. 26 Jul, 2011 1 commit
  23. 20 Jul, 2011 2 commits
  24. 25 May, 2011 2 commits
    • Dave Hansen's avatar
      mm: print vmalloc() state after allocation failures · 22943ab1
      Dave Hansen authored
      I was tracking down a page allocation failure that ended up in vmalloc().
      Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
      of memory, we'll still get a warning with "order:0" in it.  That's not
      very useful.
      
      During recovery, vmalloc() also nicely frees all of the memory that it got
      up to the point of the failure.  That is wonderful, but it also quickly
      hides any issues.  We have a much different sitation if vmalloc()
      repeatedly fails 10GB in to:
      
      	vmalloc(100 * 1<<30);
      
      versus repeatedly failing 4096 bytes in to a:
      
      	vmalloc(8192);
      
      This patch will print out messages that look like this:
      
      [   68.123503] vmalloc: allocation failure, allocated 6680576 of 13426688 bytes
      [   68.124218] bash: page allocation failure: order:0, mode:0xd2
      [   68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e689-dirty #333
      [   68.125579] Call Trace:
      [   68.125853]  [<ffffffff810f6da6>] warn_alloc_failed+0x146/0x170
      [   68.126464]  [<ffffffff8107e05c>] ? printk+0x6c/0x70
      [   68.126791]  [<ffffffff8112b5d4>] ? alloc_pages_current+0x94/0xe0
      [   68.127661]  [<ffffffff8111ed37>] __vmalloc_node_range+0x237/0x290
      ...
      
      The 'order' variable is added for clarity when calling warn_alloc_failed()
      to avoid having an unexplained '0' as an argument.
      
      The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would take
      us over 80 columns for the alloc_pages_node() call.  If we are going to
      add a line, it might as well be one that makes the sucker easier to read.
      
      As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
      solely on an unverified value passed in from userspace.  Granted, it's
      under CAP_SYS_ADMIN, but it still frightens me a bit.
      Signed-off-by: default avatarDave Hansen <dave@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      22943ab1
    • Johannes Weiner's avatar
      mm/vmalloc: remove guard page from between vmap blocks · 248ac0e1
      Johannes Weiner authored
      The vmap allocator is used to, among other things, allocate per-cpu vmap
      blocks, where each vmap block is naturally aligned to its own size.
      Obviously, leaving a guard page after each vmap area forbids packing vmap
      blocks efficiently and can make the kernel run out of possible vmap blocks
      long before overall vmap space is exhausted.
      
      The new interface to map a user-supplied page array into linear vmalloc
      space (vm_map_ram) insists on allocating from a vmap block (instead of
      falling back to a custom area) when the area size is below a certain
      threshold.  With heavy users of this interface (e.g.  XFS) and limited
      vmalloc space on 32-bit, vmap block exhaustion is a real problem.
      
      Remove the guard page from the core vmap allocator.  vmalloc and the old
      vmap interface enforce a guard page on their own at a higher level.
      
      Note that without this patch, we had accidental guard pages after those
      vm_map_ram areas that happened to be at the end of a vmap block, but not
      between every area.  This patch removes this accidental guard page only.
      
      If we want guard pages after every vm_map_ram area, this should be done
      separately.  And just like with vmalloc and the old interface on a
      different level, not in the core allocator.
      
      Mel pointed out: "If necessary, the guard page could be reintroduced as a
      debugging-only option (CONFIG_DEBUG_PAGEALLOC?).  Otherwise it seems
      reasonable."
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Dave Chinner <david@fromorbit.com>
      Acked-by: default avatarMel Gorman <mel@csn.ul.ie>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      248ac0e1