  1. 24 Jul, 2012 1 commit
  2. 17 Jul, 2012 2 commits
    • mm: fix lost kswapd wakeup in kswapd_stop() · 1c7e7f6c
      Aaditya Kumar authored
      
      
      Offlining memory may block forever, waiting for kswapd() to wake up
      because kswapd() does not check the event kthread->should_stop before
      sleeping.
      
      The proper pattern, from Documentation/memory-barriers.txt, is:
      
         ---  waker  ---
         event_indicated = 1;
         wake_up_process(event_daemon);
      
         ---  sleeper  ---
         for (;;) {
            set_current_state(TASK_UNINTERRUPTIBLE);
            if (event_indicated)
               break;
            schedule();
         }
      
         set_current_state() may be wrapped by:
            prepare_to_wait();
      
      In the kswapd() case, event_indicated is kthread->should_stop.
      
        === offlining memory (waker) ===
         kswapd_stop()
            kthread_stop()
               kthread->should_stop = 1
               wake_up_process()
               wait_for_completion()
      
        ===  kswapd_try_to_sleep (sleeper) ===
         kswapd_try_to_sleep()
            prepare_to_wait()
                 .
                 .
            schedule()
                 .
                 .
            finish_wait()
      
      The schedule() needs to be protected by a test of kthread->should_stop,
      which is wrapped by kthread_should_stop().
      
      Reproducer:
         Do heavy file I/O in background.
         Do a memory offline/online in a tight loop
      Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
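The lost wakeup described in 1c7e7f6c can be illustrated with a small single-threaded model. This is only a sketch: `sim_task`, `sim_kthread_stop()` and `sim_try_to_sleep()` are hypothetical names, not kernel APIs, and the model assumes the waker has already run by the time the sleeper prepares to wait.

```c
#include <assert.h>
#include <stdbool.h>

enum sim_state { SIM_RUNNING, SIM_UNINTERRUPTIBLE };

/* Hypothetical single-threaded model of one kernel thread. */
struct sim_task {
    enum sim_state state;
    bool should_stop;            /* stands in for kthread->should_stop */
};

/* Waker: publish the event, then wake the task (a no-op if it is running). */
void sim_kthread_stop(struct sim_task *t)
{
    assert(t);
    t->should_stop = true;
    t->state = SIM_RUNNING;
}

/*
 * Sleeper: one sleep attempt made after the waker has already run.
 * Returns true if schedule() would block with no wakeup left to come.
 */
bool sim_try_to_sleep(struct sim_task *t, bool check_should_stop)
{
    assert(t);
    t->state = SIM_UNINTERRUPTIBLE;          /* prepare_to_wait() */
    if (check_should_stop && t->should_stop) {
        t->state = SIM_RUNNING;              /* finish_wait() */
        return false;
    }
    return t->state != SIM_RUNNING;          /* schedule() would block */
}
```

Calling `sim_kthread_stop()` and then `sim_try_to_sleep()` without the check models the hang; with the check, the sleeper notices the flag and never calls schedule().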
    • bootmem: make ___alloc_bootmem_node_nopanic() really nopanic · c8f4a2d0
      Yinghai Lu authored
      In reaction to commit 99ab7b19 ("mm: sparse: fix usemap allocation
      above node descriptor section") Johannes said:
      | while backporting the below patch, I realised that your fix busted
      | f5bf18fa again.  The problem was not a panicking version on
      | allocation failure but when the usemap size was too large such that
      | goal + size > limit triggers the BUG_ON in the bootmem allocator.  So
      | we need a version that passes limit ONLY if the usemap is smaller than
      | the section.
      
      After checking the code, the name of ___alloc_bootmem_node_nopanic()
      does not reflect what it actually does.

      Make bootmem really not panic.

      Hopefully we will kill bootmem sooner.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>    [3.3.x, 3.4.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 11 Jul, 2012 10 commits
    • memblock: free allocated memblock_reserved_regions later · 29f67386
      Yinghai Lu authored
      memblock_free_reserved_regions() calls memblock_free(), but
      memblock_free() may itself double reserved.regions, so we could end up
      freeing the old range used for reserved.regions.
      
      Also tj said there is another bug which could be related to this.
      
      | I don't think we're saving any noticeable
      | amount by doing this "free - give it to page allocator - reserve
      | again" dancing.  We should just allocate regions aligned to page
      | boundaries and free them later when memblock is no longer in use.
      
      In that case, with DEBUG_PAGEALLOC enabled, we get a panic:
      
           memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
        BUG: unable to handle kernel paging request at ffff88102febd948
        IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
        PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        CPU 0
        Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
        RIP: 0010:[<ffffffff836a5774>]  [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
      
      See the discussion at https://lkml.org/lkml/2012/6/13/469

      So try to allocate with PAGE_SIZE alignment and free it later.
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: sparse: fix usemap allocation above node descriptor section · 99ab7b19
      Yinghai Lu authored
      After commit f5bf18fa ("bootmem/sparsemem: remove limit constraint
      in alloc_bootmem_section"), usemap allocations may easily be placed
      outside the optimal section that holds the node descriptor, even if
      there is space available in that section.  This results in unnecessary
      hotplug dependencies that need to have the node unplugged before the
      section holding the usemap.
      
      The reason is that the bootmem allocator doesn't guarantee a linear
      search starting from the passed allocation goal but may start out at a
      much higher address absent an upper limit.
      
      Fix this by trying the allocation with the limit at the section end,
      then retry without if that fails.  This keeps the fix from f5bf18fa
      of not panicking if the allocation does not fit in the section, but
      still makes sure to try to stay within the section at first.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>    [3.3.x, 3.4.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
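The "try with a limit at the section end, then retry without" approach from 99ab7b19 might be sketched as follows. The toy first-fit allocator here is an assumption made for illustration (`toy_arena`, `toy_alloc()` and `toy_alloc_in_section()` are hypothetical names); it is not the bootmem allocator and, unlike bootmem, it always searches linearly from the goal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 64   /* hypothetical arena size, in allocation units */

struct toy_arena {
    bool used[TOY_SLOTS];
};

/*
 * First fit of `size` free slots inside [goal, limit).
 * limit == 0 means "no upper limit". Returns the start slot or -1.
 * Failure is reported to the caller; nothing here panics.
 */
long toy_alloc(struct toy_arena *a, size_t goal, size_t size, size_t limit)
{
    size_t end = (limit == 0 || limit > TOY_SLOTS) ? TOY_SLOTS : limit;

    assert(a);
    if (size == 0 || goal >= end || size > end - goal)
        return -1;
    for (size_t start = goal; start + size <= end; start++) {
        size_t i = 0;

        while (i < size && !a->used[start + i])
            i++;
        if (i == size) {
            for (i = 0; i < size; i++)
                a->used[start + i] = true;
            return (long)start;
        }
    }
    return -1;
}

/* Prefer the section holding the goal; fall back to anywhere above it. */
long toy_alloc_in_section(struct toy_arena *a, size_t goal, size_t size,
                          size_t section_end)
{
    long slot = toy_alloc(a, goal, size, section_end);

    if (slot < 0)
        slot = toy_alloc(a, goal, size, 0);
    return slot;
}
```

A request that fits in the section stays there; one that does not fit falls through to the unlimited retry and only then reports failure.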
    • mm: sparse: fix section usemap placement calculation · 07b4e2bc
      Yinghai Lu authored
      Commit 238305bb ("mm: remove sparsemem allocation details from the
      bootmem allocator") introduced a bug in the allocation goal calculation
      that put section usemaps not in the same section as the node
      descriptors, creating unnecessary hotplug dependencies between them:
      
        node 0 must be removed before remove section 16399
        node 1 must be removed before remove section 16399
        node 2 must be removed before remove section 16399
        node 3 must be removed before remove section 16399
        node 4 must be removed before remove section 16399
        node 5 must be removed before remove section 16399
        node 6 must be removed before remove section 16399
      
      The reason is that it applies PAGE_SECTION_MASK to the physical address
      of the node descriptor when finding a suitable place to put the usemap,
      when this mask is actually intended to be used with PFNs.  Because the
      PFN mask is wider, the target address will point beyond the wanted
      section holding the node descriptor and the node must be offlined before
      the section holding the usemap can go.
      
      Fix this by extending the mask to address width before use.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
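The mask mix-up in 07b4e2bc is easy to see with concrete numbers. The sketch below assumes 4 KiB pages and 128 MiB sections (the constants are assumptions chosen for illustration; real values are per-architecture), and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages, 128 MiB sections. */
#define PAGE_SHIFT         12
#define SECTION_SIZE_BITS  27
#define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION  (UINT64_C(1) << PFN_SECTION_SHIFT)
#define PAGE_SECTION_MASK  (~(PAGES_PER_SECTION - 1))   /* meant for PFNs */

/* Bug: a PFN mask applied directly to a physical address. */
uint64_t section_goal_wrong(uint64_t phys)
{
    return phys & PAGE_SECTION_MASK;
}

/* Fix: extend the mask to address width before using it. */
uint64_t section_goal_fixed(uint64_t phys)
{
    return phys & (PAGE_SECTION_MASK << PAGE_SHIFT);
}
```

For the physical address 0x12345678, the unshifted mask only clears the low 15 bits and yields 0x12340000, while the shifted mask yields the section start 0x10000000.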
    • shmem: cleanup shmem_add_to_page_cache · b065b432
      Hugh Dickins authored
      
      
      shmem_add_to_page_cache() has three callsites, but only one of them wants
      the radix_tree_preload() (an exceptional entry guarantees that the radix
      tree node is present in the other cases), and only that site can achieve
      mem_cgroup_uncharge_cache_page() (PageSwapCache makes it a no-op in the
      other cases).  We did it this way originally to reflect
      add_to_page_cache_locked(); but it's confusing now, so move the radix_tree
      preloading and mem_cgroup uncharging to that one caller.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • shmem: fix negative rss in memcg memory.stat · d1899228
      Hugh Dickins authored
      
      
      When adding the page_private checks before calling shmem_replace_page(), I
      did realize that there is a further race, but thought it too unlikely to
      need a hurried fix.
      
      But independently I've been chasing why a mem cgroup's memory.stat
      sometimes shows negative rss after all tasks have gone: I expected it to
      be a stats gathering bug, but actually it's shmem swapping's fault.
      
      It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
      the page may have been removed from swapcache before getting the lock; or
      it may have been freed and reused and be back in swapcache; and it can
      even be using the same swap location as before (page_private same).
      
      The swapoff case is already secure against this (swap cannot be reused
      until the whole area has been swapped off, and a new swapped on); and
      shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for
      the expected radix_tree entry - but a little too late.
      
      By that time, we might have already decided to shmem_replace_page(): I
      don't know of a problem from that, but I'd feel more at ease not to do so
      spuriously.  And we have already done mem_cgroup_cache_charge(), on
      perhaps the wrong mem cgroup: and this charge is not then undone on the
      error path, because PageSwapCache ends up preventing that.
      
      It's this last case which causes the occasional negative rss in
      memory.stat: the page is charged here as cache, but (sometimes) found to
      be anon when eventually it's uncharged - and in between, it's an
      undeserved charge on the wrong memcg.
      
      Fix this by adding an earlier check on the radix_tree entry: it's
      inelegant to descend the tree twice, but swapping is not the fast path,
      and a better solution would need a pair (try+commit) of memcg calls, and a
      rework of shmem_replace_page() to keep out of the swapcache.
      
      We can use the added shmem_confirm_swap() function to replace the
      find_get_page+page_cache_release we were already doing on the error path.
      And add a comment on that -EEXIST: it seems a peculiar errno to be using,
      but originates from its use in radix_tree_insert().
      
      [It can be surprising to see positive rss left in a memcg's memory.stat
      after all tasks have gone, since it is supposed to count anonymous but not
      shmem.  Aside from sharing anon pages via fork with a task in some other
      memcg, it often happens after swapping: because a swap page can't be freed
      while under writeback, nor while locked.  So it's not an error, and these
      residual pages are easily freed once pressure demands.]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: revert SEEK_DATA and SEEK_HOLE · f21f8062
      Hugh Dickins authored
      Revert 4fb5ef08 ("tmpfs: support SEEK_DATA and SEEK_HOLE").  I believe
      it's correct, and it's been nice to have from rc1 to rc6; but as the
      original commit said:
      
      I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
      would be of any use to them on tmpfs.  This code adds 92 lines and 752
      bytes on x86_64 - is that bloat or worthwhile?
      
      Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs to
      the dumb generic support for v3.5.  We can always reinstate it later if
      useful, and anyone needing it in a hurry can just get it out of git.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Josef Bacik <josef@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Marco Stornelli <marco.stornelli@gmail.com>
      Cc: Jeff liu <jeff.liu@oracle.com>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails · 41b9e2d7
      Wen Congyang authored
      
      
      We should goto error to release the memory resource if
      hotadd_new_pgdat() fails.
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
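The shape of the fix in 41b9e2d7 is the usual goto-based cleanup. A minimal sketch, in which `toy_add_memory()` and the `resources_held` counter are hypothetical stand-ins for the real hot-add path and its resource bookkeeping:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical hot-add path. *resources_held counts memory resources
 * that are currently registered and not yet released.
 */
int toy_add_memory(bool pgdat_alloc_fails, int *resources_held)
{
    int ret = 0;

    assert(resources_held);
    (*resources_held)++;            /* resource registered up front */

    if (pgdat_alloc_fails) {        /* stands in for hotadd_new_pgdat() failing */
        ret = -ENOMEM;
        goto error;                 /* the fix: do not return directly here */
    }
    return ret;

error:
    (*resources_held)--;            /* release what was registered above */
    return ret;
}
```

Returning straight from the failure branch instead of jumping to `error` would leave the counter at one, which models the leaked resource.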
    • mm, thp: abort compaction if migration page cannot be charged to memcg · 4bf2bba3
      David Rientjes authored
      
      
      If page migration cannot charge the temporary page to the memcg,
      migrate_pages() will return -ENOMEM.  This isn't considered in memory
      compaction however, and the loop continues to iterate over all
      pageblocks trying to isolate and migrate pages.  If a small number of
      very large memcgs happen to be oom, however, these attempts will mostly
      be futile, leading to an enormous amount of CPU consumption due to the
      page migration failures.
      
      This patch will short circuit and fail memory compaction if
      migrate_pages() returns -ENOMEM.  COMPACT_PARTIAL is returned in case
      some migrations were successful so that the page allocator will retry.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
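The short circuit described in 4bf2bba3 might look roughly like this in isolation. The loop below is a sketch under assumptions: `toy_compact()` and its result enum are hypothetical, and the per-pageblock migration outcome is passed in as an array instead of being produced by migrate_pages().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_compact_result { TOY_COMPACT_CONTINUE, TOY_COMPACT_PARTIAL };

/*
 * results[i] is what a hypothetical migration step returned for
 * pageblock i: 0 on success, -ENOMEM if the charge failed.
 * *visited reports how many pageblocks were actually scanned.
 */
enum toy_compact_result toy_compact(const int *results, size_t nblocks,
                                    size_t *visited)
{
    assert(results && visited);
    *visited = 0;
    for (size_t i = 0; i < nblocks; i++) {
        (*visited)++;
        if (results[i] == -ENOMEM)
            return TOY_COMPACT_PARTIAL;   /* stop scanning, let caller retry */
    }
    return TOY_COMPACT_CONTINUE;
}
```

With the early return, one failed charge ends the scan instead of letting every remaining pageblock repeat the same futile migration attempt.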
    • memory hotplug: fix invalid memory access caused by stale kswapd pointer · d8adde17
      Jiang Liu authored
      
      
      kswapd_stop() is called to destroy the kswapd work thread when all memory
      of a NUMA node has been offlined.  But kswapd_stop() only terminates the
      work thread without resetting NODE_DATA(nid)->kswapd to NULL.  The stale
      pointer will prevent kswapd_run() from creating a new work thread when
      adding memory to the memory-less NUMA node again.  Eventually the stale
      pointer may cause invalid memory access.
      
      An example stack dump is shown below.  It was reproduced with 2.6.32,
      but the latest kernel has the same issue.
      
        BUG: unable to handle kernel NULL pointer dereference at (null)
        IP: [<ffffffff81051a94>] exit_creds+0x12/0x78
        PGD 0
        Oops: 0000 [#1] SMP
        last sysfs file: /sys/devices/system/memory/memory391/state
        CPU 11
        Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod
        Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285
        RIP: 0010:exit_creds+0x12/0x78
        RSP: 0018:ffff8806044f1d78  EFLAGS: 00010202
        RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502
        RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
        RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10
        R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000
        R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001
        FS:  00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600)
        Stack:
         ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e
         ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1
         0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003
        Call Trace:
          __put_task_struct+0x5d/0x97
          kthread_stop+0x50/0x58
          offline_pages+0x324/0x3da
          memory_block_change_state+0x179/0x1db
          store_mem_state+0x9e/0xbb
          sysfs_write_file+0xd0/0x107
          vfs_write+0xad/0x169
          sys_write+0x45/0x6e
          system_call_fastpath+0x16/0x1b
        Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0
        RIP  exit_creds+0x12/0x78
         RSP <ffff8806044f1d78>
        CR2: 0000000000000000
      
      [akpm@linux-foundation.org: add pglist_data.kswapd locking comments]
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
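The stale-pointer problem fixed by d8adde17 comes down to clearing the handle when the worker is destroyed. A user-space sketch, where `toy_node`, `toy_worker_run()` and `toy_worker_stop()` are hypothetical and a heap allocation stands in for the kswapd thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node; `worker` stands in for NODE_DATA(nid)->kswapd. */
struct toy_node {
    int *worker;
};

/* Returns 1 if a worker was created, 0 if one already exists, -1 on error. */
int toy_worker_run(struct toy_node *n)
{
    assert(n);
    if (n->worker)
        return 0;
    n->worker = malloc(sizeof(*n->worker));
    if (!n->worker)
        return -1;
    *n->worker = 1;
    return 1;
}

/* Destroy the worker and reset the handle so a later run can recreate it. */
void toy_worker_stop(struct toy_node *n)
{
    assert(n);
    if (n->worker) {
        free(n->worker);
        n->worker = NULL;   /* without this, run() sees a stale pointer */
    }
}
```

If the reset were omitted, a second `toy_worker_run()` would return 0 and leave the node with a dangling handle, and a second stop would free it twice.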
    • x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults · 6751ed65
      Tony Luck authored
      In commit dad1743e ("x86/mce: Only restart instruction after machine
      check recovery if it is safe") we fixed mce_notify_process() to force a
      signal to the current process if it was not restartable (RIPV bit not
      set in MCG_STATUS). But doing it here means that the process doesn't
      get told the virtual address of the fault via siginfo_t->si_addr. This
      would prevent application level recovery from the fault.
      
      Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so
      that we will provide the right information with the signal.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: stable@kernel.org    # 3.4+
  4. 06 Jul, 2012 2 commits
    • mm: Hold a file reference in madvise_remove · 9ab4233d
      Andy Lutomirski authored
      Otherwise the code races with munmap (causing a use-after-free
      of the vma) or with close (causing a use-after-free of the struct
      file).
      
      The bug was introduced by commit 90ed52eb ("[PATCH] holepunch: fix
      mmap_sem i_mutex deadlock")
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
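The idea behind 9ab4233d, pinning the file before the lock is dropped, can be modelled with a plain counter. This is a single-threaded sketch with hypothetical names (`toy_file`, `toy_get_file()`, `toy_fput()`, `toy_remove()`); the real code uses atomic reference counts and real locks.

```c
#include <assert.h>
#include <stddef.h>

struct toy_file {
    int refcount;   /* the object may be freed once this reaches 0 */
};

void toy_get_file(struct toy_file *f)
{
    assert(f);
    f->refcount++;
}

/* Returns 1 if this put dropped the last reference. */
int toy_fput(struct toy_file *f)
{
    assert(f && f->refcount > 0);
    return --f->refcount == 0;
}

/* Models another thread closing the file while the lock is dropped. */
void toy_concurrent_close(struct toy_file *f)
{
    toy_fput(f);
}

/*
 * Models the hole-punch path. Returns the reference count seen while
 * the lock is dropped; it must be at least 1 for the file to be usable.
 */
int toy_remove(struct toy_file *f, void (*while_unlocked)(struct toy_file *))
{
    int refs_seen;

    toy_get_file(f);            /* pin the file before dropping the lock */
    /* the lock would be released here */
    if (while_unlocked)
        while_unlocked(f);
    refs_seen = f->refcount;    /* the hole punch would use the file here */
    toy_fput(f);                /* drop our pin */
    /* the lock would be re-taken here */
    return refs_seen;
}
```

Without the initial `toy_get_file()`, the concurrent close would bring the count to zero while the hole punch still needs the file.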
    • mm: cma: don't replace lowmem pages with highmem · 6a6dccba
      Rabin Vincent authored
      
      
      The filesystem layer expects pages in the block device's mapping to not
      be in highmem (the mapping's gfp mask is set in bdget()), but CMA can
      currently replace lowmem pages with highmem pages, leading to crashes in
      filesystem code such as the one below:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000400
        pgd = c0c98000
        [00000400] *pgd=00c91831, *pte=00000000, *ppte=00000000
        Internal error: Oops: 817 [#1] PREEMPT SMP ARM
        CPU: 0    Not tainted  (3.5.0-rc5+ #80)
        PC is at __memzero+0x24/0x80
        ...
        Process fsstress (pid: 323, stack limit = 0xc0cbc2f0)
        Backtrace:
        [<c010e3f0>] (ext4_getblk+0x0/0x180) from [<c010e58c>] (ext4_bread+0x1c/0x98)
        [<c010e570>] (ext4_bread+0x0/0x98) from [<c0117944>] (ext4_mkdir+0x160/0x3bc)
         r4:c15337f0
        [<c01177e4>] (ext4_mkdir+0x0/0x3bc) from [<c00c29e0>] (vfs_mkdir+0x8c/0x98)
        [<c00c2954>] (vfs_mkdir+0x0/0x98) from [<c00c2a60>] (sys_mkdirat+0x74/0xac)
         r6:00000000 r5:c152eb40 r4:000001ff r3:c14b43f0
        [<c00c29ec>] (sys_mkdirat+0x0/0xac) from [<c00c2ab8>] (sys_mkdir+0x20/0x24)
         r6:beccdcf0 r5:00074000 r4:beccdbbc
        [<c00c2a98>] (sys_mkdir+0x0/0x24) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30)
      
      Fix this by replacing only highmem pages with highmem.
      Reported-by: Laura Abbott <lauraa@codeaurora.org>
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
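The rule in 6a6dccba, that only a highmem page may be replaced by a highmem page, amounts to choosing the allocation mask from the old page. A sketch with made-up flag values and a hypothetical `toy_page` type:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the values are illustrative only. */
#define TOY_GFP_USER    0x1u
#define TOY_GFP_HIGHMEM 0x2u

struct toy_page {
    bool highmem;   /* true if the page being migrated lives in highmem */
};

/* Pick the allocation mask for the replacement page. */
unsigned int toy_migration_gfp(const struct toy_page *old_page)
{
    unsigned int gfp = TOY_GFP_USER;

    assert(old_page);
    if (old_page->highmem)
        gfp |= TOY_GFP_HIGHMEM;   /* only highmem may be replaced by highmem */
    return gfp;
}
```

A lowmem page therefore never gets a mask that permits highmem, so a mapping that expects lowmem pages keeps getting them.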
  5. 20 Jun, 2012 7 commits
  6. 15 Jun, 2012 1 commit
    • swap: fix shmem swapping when more than 8 areas · 9b15b817
      Hugh Dickins authored
      
      
      Minchan Kim reports that when a system has many swap areas, and tmpfs
      swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read
      back the page cannot locate it, and the read fails with -ENOMEM.
      
      Whoops.  Yes, I blindly followed read_swap_header()'s pte_to_swp_entry(
      swp_entry_to_pte()) technique for determining maximum usable swap
      offset, without stopping to realize that that actually depends upon the
      pte swap encoding shifting swap offset to the higher bits and truncating
      it there.  Whereas our radix_tree swap encoding leaves offset in the
      lower bits: it's swap "type" (that is, index of swap area) that was
      truncated.
      
      Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the
      broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header().
      
      This does not reduce the usable size of a swap area any further, it
      leaves it as claimed when making the original commit: no change from 3.0
      on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB
      per swapfile on i386 with PAE.  It's not a change I would have risked
      five years ago, but with x86_64 supported for ten years, I believe it's
      appropriate now.
      
      Hmm, and what if some architecture implements its swap pte with offset
      encoded below type? That would equally break the maximum usable swap
      offset check.  Happily, they all follow the same tradition of encoding
      offset above type, but I'll prepare a check on that for next.
      Reported-and-Reviewed-and-Tested-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org [3.1, 3.2, 3.3, 3.4]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
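The truncation of the swap "type" described in 9b15b817 can be reproduced with plain bit arithmetic. The sketch assumes a 64-bit entry, 5 bits of type, and 2 low bits consumed by the radix-tree encoding; these constants and the helper `sim_round_trip_type()` are assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_MAX_SWAPFILES_SHIFT 5   /* assumed: 5 bits of swap "type" */
#define SIM_RADIX_SHIFT         2   /* assumed: low bits used by the radix tree */
#define SIM_RADIX_FLAG          2   /* assumed: marks an exceptional entry */

/* Type stored in the very top bits (before the fix). */
#define SIM_OLD_TYPE_SHIFT (64 - SIM_MAX_SWAPFILES_SHIFT)
/* Type stored lower, leaving room for the radix shift (after the fix). */
#define SIM_NEW_TYPE_SHIFT (64 - (SIM_MAX_SWAPFILES_SHIFT + SIM_RADIX_SHIFT))

/* Encode a swap type, push it through the radix encoding, decode it again. */
unsigned int sim_round_trip_type(unsigned int type, unsigned int type_shift)
{
    uint64_t val, radix, back;

    assert(type < (1u << SIM_MAX_SWAPFILES_SHIFT));
    val = (uint64_t)type << type_shift;                  /* offset is 0 here */
    radix = (val << SIM_RADIX_SHIFT) | SIM_RADIX_FLAG;   /* top bits fall off */
    back = radix >> SIM_RADIX_SHIFT;
    return (unsigned int)(back >> type_shift);
}
```

With the old shift, only 3 of the 5 type bits survive the round trip, so area index 8 comes back as 0; with the reduced shift, all 32 indices survive.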
  7. 13 Jun, 2012 1 commit
    • splice: fix racy pipe->buffers uses · 047fe360
      Eric Dumazet authored
      Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
      by splice_shrink_spd() called from vmsplice_to_pipe().

      Commit 35f3d14d (pipe: add support for shrinking and growing pipes)
      added capability to adjust pipe->buffers.
      
      Problem is some paths don't hold pipe mutex and assume pipe->buffers
      doesn't change for their duration.
      
      Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
      use it in place of pipe->buffers where appropriate.
      
      splice_shrink_spd() loses its struct pipe_inode_info argument.
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Tom Herbert <therbert@google.com>
      Cc: stable <stable@vger.kernel.org> # 2.6.35
      Tested-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
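The approach in 047fe360, recording the buffer count once and using that snapshot afterwards, might be sketched like this. `toy_pipe`, `toy_spd`, `toy_grow_spd()` and `toy_shrink_spd()` are hypothetical simplifications, and the default of 16 buffers is an assumed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_DEF_BUFFERS 16   /* assumed size of the caller's on-stack array */

struct toy_pipe {
    unsigned int buffers;        /* may be resized by another thread */
};

struct toy_spd {
    void **pages;                /* on-stack array or heap allocation */
    unsigned int nr_pages_max;   /* snapshot of pipe->buffers at grow time */
};

/* Returns 0 on success, -1 if the heap allocation failed. */
int toy_grow_spd(const struct toy_pipe *pipe, struct toy_spd *spd)
{
    unsigned int buffers;

    assert(pipe && spd);
    buffers = pipe->buffers;     /* read once and remember the value */
    spd->nr_pages_max = buffers;
    if (buffers <= TOY_DEF_BUFFERS)
        return 0;                /* keep the caller-provided array */
    spd->pages = malloc(buffers * sizeof(*spd->pages));
    return spd->pages ? 0 : -1;
}

/* Returns 1 if a heap array was freed, 0 if there was nothing to free. */
int toy_shrink_spd(struct toy_spd *spd)
{
    assert(spd);
    if (spd->nr_pages_max <= TOY_DEF_BUFFERS)
        return 0;                /* decided from the snapshot, not the pipe */
    free(spd->pages);
    spd->pages = NULL;
    return 1;
}
```

Because the shrink step no longer looks at the pipe, a resize between grow and shrink cannot trick it into freeing the on-stack array.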
  8. 08 Jun, 2012 2 commits
  9. 07 Jun, 2012 1 commit
    • shmem: replace_page must flush_dcache and others · 0142ef6c
      Hugh Dickins authored
      Commit bde05d1c ("shmem: replace page if mapping excludes its zone")
      is not at all likely to break for anyone, but it was an earlier version
      from before review feedback was incorporated.  Fix that up now.
      
      * shmem_replace_page must flush_dcache_page after copy_highpage [akpm]
      * Expand comment on why shmem_unuse_inode needs page_swapcount [akpm]
      * Remove excess of VM_BUG_ONs from shmem_replace_page [wangcong]
      * Check page_private matches swap before calling shmem_replace_page [hughd]
      * shmem_replace_page allow for unexpected race in radix_tree lookup [hughd]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Stephane Marchesin <marcheu@chromium.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Rob Clark <rob.clark@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 04 Jun, 2012 1 commit
    • nommu: fix compilation of nommu.c · ad1ed293
      Greg Ungerer authored
      
      
      Compiling 3.5-rc1 for nommu targets gives:
      
        CC      mm/nommu.o
      mm/nommu.c: In function ‘sys_mmap_pgoff’:
      mm/nommu.c:1489:2: error: ‘ret’ undeclared (first use in this function)
      mm/nommu.c:1489:2: note: each undeclared identifier is reported only once for each function it appears in
      
      It is trivially fixed by replacing 'ret' with the local variable that is
      already defined for the return value 'retval'.
      Signed-off-by: Greg Ungerer <gerg@uclinux.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  11. 03 Jun, 2012 2 commits
  12. 01 Jun, 2012 7 commits
  13. 31 May, 2012 2 commits
  14. 30 May, 2012 1 commit