1. 14 Apr, 2015 2 commits
    • mm: change vunmap to tear down huge KVA mappings · b9820d8f
      Toshi Kani authored
      Change vunmap_pmd_range() and vunmap_pud_range() to tear down huge KVA
      mappings when they are set.  pud_clear_huge() and pmd_clear_huge() return
      zero when no operation is performed, i.e.  when no huge page mapping was
      in use.
      
      These changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined
      on the architecture.
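
      A sketch of the reworked PMD-level walker (the PUD level is analogous);
      this follows our reading of the change rather than the verbatim diff:

          static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
          {
                  pmd_t *pmd;
                  unsigned long next;

                  pmd = pmd_offset(pud, addr);
                  do {
                          next = pmd_addr_end(addr, end);
                          /*
                           * Tear down a huge mapping; returns 0 when none was
                           * set, in which case we descend to the PTE level as
                           * before.
                           */
                          if (pmd_clear_huge(pmd))
                                  continue;
                          if (pmd_none_or_clear_bad(pmd))
                                  continue;
                          vunmap_pte_range(pmd, addr, next);
                  } while (pmd++, addr = next, addr != end);
          }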
      
      [akpm@linux-foundation.org: use consistent code layout]
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: change __get_vm_area_node() to use fls_long() · 0f616be1
      Toshi Kani authored
      ioremap() and its related interfaces are used to create I/O mappings to
      memory-mapped I/O devices.  The mapping sizes of the traditional I/O
      devices are relatively small.  Non-volatile memory (NVM), however, spans
      many GB and will soon reach TB.  It is not very efficient to create such
      large I/O mappings with 4KB pages.
      
      This patchset extends the ioremap() interfaces to transparently create I/O
      mappings with huge pages whenever possible.  ioremap() continues to use
      4KB mappings when a huge page does not fit into a requested range.  No
      change is necessary to drivers using ioremap().  A requested physical
      address must be aligned to a huge page size (1GB or 2MB on x86) for a
      huge page mapping to be used, though.  Kernel huge I/O mapping will
      improve performance for NVM and other devices with large memory, and
      reduce the time to create their mappings as well.
      
      On x86, MTRRs can override PAT memory types with 4KB granularity.  When
      using a huge page, MTRRs can override the memory type of the huge page,
      which may lead to a performance penalty.  The processor can also behave
      in an undefined manner if a huge page is mapped to a memory range that
      MTRRs have mapped with multiple different memory types.  Therefore, the
      mapping code falls back to smaller page sizes, down to 4KB, when a
      mapping range is covered by a non-WB type of MTRR.  WB-type MTRRs have no
      effect on the PAT memory types.
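
      The check itself is small; a hypothetical helper shaped like the x86
      patches' logic (huge_mapping_allowed() is our name, not the kernel's;
      mtrr_type_lookup() is the era's x86 interface):

          static bool huge_mapping_allowed(phys_addr_t addr, unsigned long size)
          {
                  /*
                   * 0xFF: no MTRR covers the range; WB MTRRs do not
                   * override the PAT memory type.
                   */
                  u8 mtrr = mtrr_type_lookup(addr, addr + size);

                  return (mtrr == MTRR_TYPE_WRBACK) || (mtrr == 0xFF);
          }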
      
      The patchset introduces HAVE_ARCH_HUGE_VMAP, which indicates that the arch
      supports huge KVA mappings for ioremap().  Users may specify the new
      kernel option "nohugeiomap" to disable the huge I/O mapping capability of
      ioremap() when necessary.
      
      Patches 1-4 change common files to support huge I/O mappings.  There is
      no functional change unless HAVE_ARCH_HUGE_VMAP is defined for the
      system's architecture.
      
      Patches 5-6 implement the HAVE_ARCH_HUGE_VMAP functions on x86 and set
      HAVE_ARCH_HUGE_VMAP there.
      
      This patch (of 6):
      
      __get_vm_area_node() takes an unsigned long size, which is a 64-bit value
      on a 64-bit kernel.  However, fls(size) simply ignores the upper 32 bits.
      Change it to use fls_long(), which handles the full size properly.
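
      A minimal illustration of the truncation (variable names are ours):

          /* fls() takes an int, so a 64-bit size is silently truncated: */
          unsigned long size = 1UL << 36;     /* a 64GB mapping request */

          int bad  = fls(size);       /* argument truncates to 0, fls(0) == 0 */
          int good = fls_long(size);  /* takes unsigned long, returns 37 */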
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 12 Mar, 2015 1 commit
    • kasan, module, vmalloc: rework shadow allocation for modules · a5af5aa8
      Andrey Ryabinin authored
      The current approach to handling shadow memory for modules is broken.
      
      Shadow memory can be freed only after the memory it shadows is no longer
      used.  vfree() called from interrupt context can use the memory it is
      freeing to store a 'struct llist_node' in it:
      
          void vfree(const void *addr)
          {
          ...
              if (unlikely(in_interrupt())) {
                  struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
                  if (llist_add((struct llist_node *)addr, &p->list))
                          schedule_work(&p->wq);
      
      Later this list node is used in free_work(), which actually frees the
      memory.  Currently, module_memfree() called in interrupt context will
      free the shadow before freeing the module's memory, which can provoke a
      kernel crash.
      
      So shadow memory should be freed after the module's memory.  However,
      such a deallocation order could race with kasan_module_alloc() in
      module_alloc().
      
      Free the shadow right before releasing the vm area.  At that point the
      vfree()'d memory is no longer used, yet not available for other
      allocations.  A new VM_KASAN flag is used to indicate that a vm area has
      dynamically allocated shadow memory, so kasan frees the shadow only if it
      was previously allocated.
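
      A sketch of the freeing side, modeled on the kasan_free_shadow() helper
      the patch introduces (simplified):

          void kasan_free_shadow(const struct vm_struct *vm)
          {
                  /* free shadow only if kasan_module_alloc() set it up */
                  if (vm->flags & VM_KASAN)
                          vfree(kasan_mem_to_shadow(vm->addr));
          }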
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 13 Feb, 2015 2 commits
    • mm: vmalloc: pass additional vm_flags to __vmalloc_node_range() · cb9e3c29
      Andrey Ryabinin authored
      To instrument global variables, KASan will shadow the memory backing
      modules.  So on module loading we will need to allocate memory for the
      shadow and map it at the shadow address that corresponds to the address
      allocated by module_alloc().
      
      __vmalloc_node_range() could be used for this purpose, except that it
      puts a guard hole after the allocated area.  A guard hole in shadow
      memory would be a problem, because at some future point we might need
      shadow memory at the address occupied by the guard hole, and would then
      fail to allocate shadow for module_alloc().
      
      Now that we have the VM_NO_GUARD flag disabling the guard page, we need a
      way to pass it into __vmalloc_node_range().  Add a new parameter
      'vm_flags' to the __vmalloc_node_range() function, as sketched below.
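
      The resulting prototype, as we read the patch:

          extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
                                  unsigned long start, unsigned long end,
                                  gfp_t gfp_mask, pgprot_t prot,
                                  unsigned long vm_flags, int node,
                                  const void *caller);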
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmalloc: add flag preventing guard hole allocation · 71394fe5
      Andrey Ryabinin authored
      To instrument global variables, KASan will shadow the memory backing
      modules.  So on module loading we will need to allocate memory for the
      shadow and map it at the shadow address that corresponds to the address
      allocated by module_alloc().
      
      __vmalloc_node_range() could be used for this purpose, except that it
      puts a guard hole after the allocated area.  A guard hole in shadow
      memory would be a problem, because at some future point we might need
      shadow memory at the address occupied by the guard hole, and would then
      fail to allocate shadow for module_alloc().
      
      Add a new vm_struct flag 'VM_NO_GUARD' indicating that the vm area
      doesn't have a guard hole.
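
      The flag's main consumer is size accounting; a simplified sketch of the
      get_vm_area_size() helper:

          static inline size_t get_vm_area_size(const struct vm_struct *area)
          {
                  if (!(area->flags & VM_NO_GUARD))
                          /* return actual size without guard page */
                          return area->size - PAGE_SIZE;
                  return area->size;
          }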
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 13 Dec, 2014 1 commit
  5. 10 Dec, 2014 1 commit
  6. 09 Oct, 2014 1 commit
  7. 06 Aug, 2014 4 commits
    • mm/vmalloc.c: clean up map_vm_area third argument · f6f8ed47
      WANG Chao authored
      Currently map_vm_area() takes (struct page ***pages) as its third
      argument, and after mapping it advances (*pages) to point to (*pages +
      nr_mapped_pages).
      
      This kind of increment is useless to its callers these days.  The callers
      don't care about the increment and actually try to avoid it by passing
      another copy to map_vm_area().
      
      A caller can always guarantee that all the pages can be mapped into the
      vm_area specified in the first argument, and it only cares about whether
      map_vm_area() fails or not.
      
      This patch cleans up the pointer movement in map_vm_area() and updates
      its callers accordingly; the resulting interface is sketched below.
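
      A before/after sketch of the interface, as we read the change:

          /* before: the third argument is advanced past the mapped pages */
          int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages);

          /* after: callers pass the page array directly and check the result */
          int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages);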
      Signed-off-by: WANG Chao <chaowang@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, vmalloc: constify allocation mask · 930f036b
      David Rientjes authored
      tmp_mask in the __vmalloc_area_node() iteration never changes, so it can
      be moved to function scope and marked const.  This causes the movl and
      orl to be done only once per call rather than area->nr_pages times.
      
      nested_gfp can also be marked const; see the sketch below.
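
      A sketch of the shape of the change in __vmalloc_area_node() (alloc_mask
      is our name for the hoisted mask; the NUMA branch is simplified):

          const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
          const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;  /* hoisted out of the loop */

          for (i = 0; i < area->nr_pages; i++) {
                  struct page *page = alloc_page(alloc_mask);  /* NUMA case elided */

                  if (unlikely(!page))
                          goto fail;
                  area->pages[i] = page;
          }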
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmalloc.c: add a schedule point to vmalloc() · 660654f9
      Eric Dumazet authored
      It is not uncommon on busy servers to get stuck for hundreds of ms in
      vmalloc() calls (like file descriptor expansions).
      
      Add a cond_resched() to __vmalloc_area_node() to be gentle to other
      tasks; see the sketch after the note below.
      
      [akpm@linux-foundation.org: only do it for __GFP_WAIT, per David]
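
      A sketch of where the schedule point lands, assuming the __GFP_WAIT gate
      from the note above (error and NUMA paths elided):

          for (i = 0; i < area->nr_pages; i++) {
                  /*
                   * Be gentle to other tasks, but only when the allocation
                   * is allowed to sleep anyway.
                   */
                  if (gfp_mask & __GFP_WAIT)
                          cond_resched();

                  area->pages[i] = alloc_page(gfp_mask | __GFP_NOWARN);
          }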
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmalloc: use rcu list iterator to reduce vmap_area_lock contention · 474750ab
      Joonsoo Kim authored
      Richard Yao reported a month ago that his system had trouble with
      vmap_area_lock contention during performance analysis via /proc/meminfo.
      Andrew asked why his analysis polls /proc/meminfo so heavily, but he
      didn't answer.
      
        https://lkml.org/lkml/2014/4/10/416
      
      Whether or not that is the right usage, there is a solution that reduces
      vmap_area_lock contention with no side effects: just use the rcu list
      iterator in get_vmalloc_info(), as sketched below.
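
      A sketch of the shape of that change in get_vmalloc_info() (the
      largest-chunk bookkeeping is elided; field names follow our reading of
      the function):

          rcu_read_lock();
          list_for_each_entry_rcu(va, &vmap_area_list, list) {
                  /* same accounting as before; only the locking changed */
                  vmi->used += va->va_end - va->va_start;
          }
          rcu_read_unlock();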
      
      rcu can be used in this function because the full RCU protocol is
      already respected by writers, ever since Nick Piggin's commit db64fe02
      ("mm: rewrite vmap layer") back in linux-2.6.28.
      
      Specifically:
         insertions use list_add_rcu(),
         deletions use list_del_rcu() and kfree_rcu().
      
      Note the rb tree is not used from rcu reader (it would not be safe),
      only the vmap_area_list has full RCU protection.
      
      Note that __purge_vmap_area_lazy() already uses this rcu protection.
      
              rcu_read_lock();
              list_for_each_entry_rcu(va, &vmap_area_list, list) {
                      if (va->flags & VM_LAZY_FREE) {
                              if (va->va_start < *start)
                                      *start = va->va_start;
                              if (va->va_end > *end)
                                      *end = va->va_end;
                              nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
                              list_add_tail(&va->purge_list, &valist);
                              va->flags |= VM_LAZY_FREEING;
                              va->flags &= ~VM_LAZY_FREE;
                      }
              }
              rcu_read_unlock();
      
      Peter:
      
      : While rcu list traversal over the vmap_area_list is safe, this may
      : arrive at different results than the spinlocked version. The rcu list
      : traversal version will not be a 'snapshot' of a single, valid instant
      : of the entire vmap_area_list, but rather a potential amalgam of
      : different list states.
      
      Joonsoo:
      
      : Yes, you are right, but I don't think that we should be strict here.
      : Meminfo is already not a 'snapshot' at a specific time.  While we try
      : to get certain stats, the other stats can change.  And although we may
      : arrive at different results than the spinlocked version, the difference
      : would not be large and would not have serious side effects.
      
      [edumazet@google.com: add more commit description]
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-by: Richard Yao <ryao@gentoo.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Zhang Yanfei <zhangyanfei.yes@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 04 Jun, 2014 3 commits
  9. 07 Apr, 2014 2 commits
    • mm/vmalloc.c: enhance vm_map_ram() comment · 36437638
      Gioh Kim authored
      vm_map_ram() has a fragmentation problem: it cannot purge a chunk (i.e.,
      a 4M address space) if there is a pinned object in that address space, so
      it can easily consume all of the VMALLOC address space.
      
      We could fix the fragmentation problem by using vmap() instead of
      vm_map_ram(), but vmap() is known to be slow compared to vm_map_ram().
      Minchan said vm_map_ram() is 5 times faster than vmap() in his tests.  So
      I thought we should fix the fragmentation problem of vm_map_ram(),
      because our proprietary GPU driver has used it heavily.
      
      On second thought, it's not easy, because we would have to reuse freed
      space to solve the problem, and that could cause more IPIs and bitmap
      operations for searching holes.  That would work against the API's goal
      of very fast mapping.  And the fragmentation problem wouldn't even show
      up on 64-bit machines.
      
      Another option is for the user to separate long-lived and short-lived
      objects, using vmap() for the former and vm_map_ram() for the latter.  If
      we inform users about this characteristic of vm_map_ram(), they can
      choose according to the page lifetime; a usage sketch follows below.
      
      Let's add a note for users.
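
      A hedged usage sketch of the distinction (driver-side, with our own
      variable names; signatures as of this kernel era):

          /*
           * Short-lived, frequent mappings: vm_map_ram() is fast, but one
           * pinned page can keep a whole 4M chunk from being purged.
           */
          void *va = vm_map_ram(pages, nr_pages, -1, PAGE_KERNEL);
          /* ... use va ... */
          vm_unmap_ram(va, nr_pages);

          /*
           * Long-lived mappings: vmap() is slower to set up but does not
           * fragment the vm_map_ram chunk space.
           */
          void *addr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
          /* ... use addr ... */
          vunmap(addr);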
      
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: Gioh Kim <gioh.kim@lge.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use macros from compiler.h instead of __attribute__((...)) · 3b32123d
      Gideon Israel Dsouza authored
      To increase compiler portability there is <linux/compiler.h>, which
      provides convenience macros for various gcc constructs, e.g.  __weak for
      __attribute__((weak)).  I've replaced all instances of gcc attributes
      with the right macro in the memory management (/mm) subsystem; see the
      examples below.
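
      A couple of representative substitutions (illustrative sites, not the
      actual /mm hunks):

          /* before */
          static void helper(void) __attribute__((unused));
          struct rec { char tag; int val; } __attribute__((packed));

          /* after, with the <linux/compiler.h> macros */
          static void helper(void) __maybe_unused;
          struct rec { char tag; int val; } __packed;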
      
      [akpm@linux-foundation.org: while-we're-there consistency tweaks]
      Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 27 Jan, 2014 1 commit
    • Revert "mm/vmalloc: interchange the implementation of vmalloc_to_{pfn,page}" · add688fb
      malc authored
      Revert commit ece86e22, which was intended as a small performance
      improvement.
      
      Despite the claim that the patch doesn't introduce any functional
      changes, in fact it does.
      
      The "no page" path behaves differently now.  Originally, vmalloc_to_page()
      might return NULL under some conditions; with the new implementation it
      returns pfn_to_page(0), which is not the same as NULL.
      
      A simple test shows the difference.
      
      test.c
      
      #include <linux/kernel.h>
      #include <linux/module.h>
      #include <linux/vmalloc.h>
      #include <linux/mm.h>
      
      MODULE_LICENSE("GPL");
      
      static int __init myi(void)
      {
      	struct page *p;
      	void *v;
      
      	v = vmalloc(PAGE_SIZE);
      	/* trigger the "no page" path in vmalloc_to_page() */
      	vfree(v);
      
      	p = vmalloc_to_page(v);
      
      	pr_err("expected val = NULL, returned val = %p\n", p);
      
      	return -EBUSY;
      }
      
      static void __exit mye(void)
      {
      }
      
      module_init(myi);
      module_exit(mye);
      
      Before interchange:
      expected val = NULL, returned val =   (null)
      
      After interchange:
      expected val = NULL, returned val = c7ebe000
      Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
      Cc: Jianyu Zhan <nasa4836@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 21 Jan, 2014 1 commit
  12. 12 Nov, 2013 6 commits
  13. 11 Sep, 2013 3 commits
  14. 09 Jul, 2013 9 commits
  15. 03 Jul, 2013 3 commits