  1. Sep 22, 2009
    • shmem: initialize struct shmem_sb_info to zero · 425fbf04
      Pekka Enberg authored
      
      Fixes the following kmemcheck false positive (the compiler is using
      a 32-bit mov to load the 16-bit sbinfo->mode in shmem_fill_super):
      
      [    0.337000] Total of 1 processors activated (3088.38 BogoMIPS).
      [    0.352000] CPU0 attaching NULL sched-domain.
      [    0.360000] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (9f8020fc)
      [    0.361000] a44240820000000041f6998100000000000000000000000000000000ff030000
      [    0.368000]  i i i i i i i i i i i i i i i i u u u u i i i i i i i i i i u u
      [    0.375000]                                                          ^
      [    0.376000]
      [    0.377000] Pid: 9, comm: khelper Not tainted (2.6.31-tip #206) P4DC6
      [    0.378000] EIP: 0060:[<810a3a95>] EFLAGS: 00010246 CPU: 0
      [    0.379000] EIP is at shmem_fill_super+0xb5/0x120
      [    0.380000] EAX: 00000000 EBX: 9f845400 ECX: 824042a4 EDX: 8199f641
      [    0.381000] ESI: 9f8020c0 EDI: 9f845400 EBP: 9f81af68 ESP: 81cd6eec
      [    0.382000]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [    0.383000] CR0: 8005003b CR2: 9f806200 CR3: 01ccd000 CR4: 000006d0
      [    0.384000] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      [    0.385000] DR6: ffff4ff0 DR7: 00000400
      [    0.386000]  [<810c25fc>] get_sb_nodev+0x3c/0x80
      [    0.388000]  [<810a3514>] shmem_get_sb+0x14/0x20
      [    0.390000]  [<810c207f>] vfs_kern_mount+0x4f/0x120
      [    0.392000]  [<81b2849e>] init_tmpfs+0x7e/0xb0
      [    0.394000]  [<81b11597>] do_basic_setup+0x17/0x30
      [    0.396000]  [<81b11907>] kernel_init+0x57/0xa0
      [    0.398000]  [<810039b7>] kernel_thread_helper+0x7/0x10
      [    0.400000]  [<ffffffff>] 0xffffffff
      [    0.402000] khelper used greatest stack depth: 2820 bytes left
      [    0.407000] calling  init_mmap_min_addr+0x0/0x10 @ 1
      [    0.408000] initcall init_mmap_min_addr+0x0/0x10 returned 0 after 0 usecs
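      
      The fix, in sketch form (abbreviated from shmem_fill_super(); the
      surrounding error handling is assumed): switch the allocation to a
      zeroing allocator instead of initializing fields one by one.
      
          /* kzalloc() returns zeroed memory, so the compiler's widened
           * 32-bit load of the 16-bit sbinfo->mode field can never read
           * uninitialized bytes for kmemcheck to flag. */
          sbinfo = kzalloc(max((int)sizeof(struct shmem_sb_info),
                      L1_CACHE_BYTES), GFP_KERNEL);
          if (!sbinfo)
              return -ENOMEM;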
      
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Analysed-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: depend on shmem · 3f96b79a
      Hugh Dickins authored
      
      CONFIG_SHMEM off gives you (ramfs masquerading as) tmpfs, even when
      CONFIG_TMPFS is off: that's a little anomalous, and I'd intended to make
      more sense of it by removing CONFIG_TMPFS altogether, always enabling its
      code when CONFIG_SHMEM is on; but so many defconfigs have CONFIG_SHMEM on
      with CONFIG_TMPFS off that we'd better leave that as is.
      
      But there is no point in asking for CONFIG_TMPFS if CONFIG_SHMEM is off:
      make TMPFS depend on SHMEM, which also prevents TMPFS_POSIX_ACL's
      shmem_acl.o from being pointlessly built into the kernel when SHMEM is off.
      
      And a selfish change, to prevent the world from being rebuilt when I
      switch between CONFIG_SHMEM on and off: the only CONFIG_SHMEM reference in
      the header files is mm.h's shmem_lock() - give that a stub in shmem.c
      instead, as sketched below.
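      
      A sketch of that stub, in assumed shape: mm.h declares shmem_lock()
      unconditionally, and the !CONFIG_SHMEM build gets a trivial definition
      in mm/shmem.c.
      
          #ifndef CONFIG_SHMEM
          /* SHM_LOCK accounting is a no-op without real shmem. */
          int shmem_lock(struct file *file, int lock, struct user_struct *user)
          {
              return 0;
          }
          #endif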
      
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: includecheck fix for mm/shmem.c · cff397e6
      Jaswinder Singh Rajput authored
      
      Fix the following 'make includecheck' warning:
      
        mm/shmem.c: linux/vfs.h is included more than once.
      
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add_to_swap_cache() does not return -EEXIST · 2ca4532a
      Daisuke Nishimura authored
      
      After commit 355cfa73 ("mm: modify swap_map and add SWAP_HAS_CACHE flag"),
      only contexts which have set the SWAP_HAS_CACHE flag via swapcache_prepare()
      or get_swap_page() call add_to_swap_cache(), so add_to_swap_cache() no
      longer returns -EEXIST.
      
      Even though it doesn't return -EEXIST, it's conceptually bad behavior to
      call swapcache_prepare() in the -EEXIST case, because that means clearing
      the SWAP_HAS_CACHE flag while the entry is in the swap cache.
      
      This patch removes the now-redundant code and comments from its callers,
      and adds a VM_BUG_ON() and some comments to the error path of
      add_to_swap_cache(), as sketched below.
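      
      A hedged sketch of the invariant being asserted (shown at a call site
      for brevity; the actual VM_BUG_ON() sits in add_to_swap_cache()'s error
      path):
      
          error = add_to_swap_cache(page, entry, gfp_mask);
          /*
           * Every caller already holds SWAP_HAS_CACHE for this entry, so
           * a duplicate swapcache insertion is impossible by design and
           * -EEXIST here would indicate a bug.
           */
          VM_BUG_ON(error == -EEXIST);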
      
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Sep 16, 2009
    • HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Andi Kleen authored
      
      Enable removal of corrupted pages through truncation
      for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs.
      These should cover most server needs.
      
      I chose the set of migration aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
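      
      A sketch of how a file system opts in (the merged hook is named
      error_remove_page; the ext2 aops are shown as an assumed example):
      
          static const struct address_space_operations ext2_aops = {
              /* ... existing operations unchanged ... */
              .error_remove_page  = generic_error_remove_page,
          };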
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
    • HWPOISON: shmem: call set_page_dirty() with locked page · 6746aff7
      Wu Fengguang authored
      
      The dirtying of the page and the set_page_dirty() call can be moved inside
      the page lock.
      
      - In shmem_write_end(), the page is dirtied while the page lock is held,
        but it is marked dirty just after dropping the page lock.
      - In shmem_symlink(), both the dirtying and the marking can be moved inside
        the page lock.
      
      It's valuable for the hwpoison code to know whether a bad page can be
      dropped without losing data.  It mainly judges this by testing the PG_dirty
      bit after taking the page lock.  So it becomes important that the dirtying
      of the page and the marking of dirtiness are both done inside the page
      lock - which is common practice, but sadly not a rule.
      
      The noticeable exceptions are
      - mapped pages
      - pages with buffer_heads
      The above pages could go dirty at any time.  Fortunately hwpoison will
      unmap the page and release its buffer_heads beforehand anyway.
      
      Many other types of pages (e.g. metadata pages) can also be dirtied at
      will by their owners; the hwpoison code cannot do anything meaningful with
      them anyway.  Only the dirtiness of pagecache pages owned by regular files
      is of interest.
      
      v2: AK: Add comment about set_page_dirty rules (suggested by Peter Zijlstra)
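      
      A sketch of the reordering in shmem_write_end(), with surrounding
      context abbreviated:
      
          set_page_dirty(page);       /* now done under the page lock */
          unlock_page(page);
          page_cache_release(page);
          return copied;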
      
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  3. Sep 15, 2009
    • Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev · 2b2af54a
      Kay Sievers authored
      
      Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
      very early at kernel initialization, before any driver-core device
      is registered. Every device with a major/minor will provide a
      device node in devtmpfs.
      
      Devtmpfs can be modified by userspace at any time, and in any way
      needed - just like today's udev-mounted tmpfs.
      Unmodified udev versions will run just fine on top of it, and will
      recognize an already existing kernel-created device node and use it.
      The default node permissions are root:root 0600. Proper permissions
      and user/group ownership, meaningful symlinks, all other policy still
      needs to be applied by userspace.
      
      If a node is created by devtmpfs, devtmpfs will remove the device node
      when the device goes away. If the device node was created by
      userspace, or the devtmpfs-created node was replaced by userspace, it
      will no longer be removed by devtmpfs.
      
      If the kernel is asked to auto-mount it, devtmpfs makes init=/bin/sh work
      without any further userspace support. /dev will be fully populated
      and dynamic, and will always reflect the current device state of the
      kernel. With the commonly used dynamic device numbers, it solves the
      problem where static device nodes may point to the wrong devices.
      
      It is intended to make the initial bootup logic simpler and more robust,
      by decoupling the creation of the initial environment needed to reliably
      run userspace processes from the complex userspace bootstrap logic that
      provides a working /dev.
      
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jan Blunck <jblunck@suse.de>
      Tested-By: Harald Hoyer <harald@redhat.com>
      Tested-By: Scott James Remnant <scott@ubuntu.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  4. May 02, 2009
    • memcg: fix mem_cgroup_shrink_usage() · ae3abae6
      Daisuke Nishimura authored
      
      Current mem_cgroup_shrink_usage() has two problems.
      
      1. It doesn't call mem_cgroup_out_of_memory and doesn't update
         last_oom_jiffies, so pagefault_out_of_memory invokes global OOM.
      
      2. Considering hierarchy, shrinking has to be done from the
         mem_over_limit, not from the memcg which the page would be charged to.
      
      mem_cgroup_try_charge_swapin() does all of these things properly, so we
      use it and call cancel_charge_swapin when it succeeds, as sketched below.
      
      The name of "shrink_usage" is not appropriate for this behavior, so we
      change it too.
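      
      A hedged sketch of the replacement flow, with signatures assumed from
      the memcg API of this era:
      
          struct mem_cgroup *ptr;
      
          /* Charge against the hierarchy that is actually over its limit,
           * invoking reclaim and memcg OOM handling as needed... */
          if (!mem_cgroup_try_charge_swapin(current->mm, page,
                            GFP_KERNEL, &ptr))
              /* ...then drop the charge again: reclaim was the point. */
              mem_cgroup_cancel_charge_swapin(ptr);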
      
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. Apr 13, 2009
    • shmem: respect MAX_LFS_FILESIZE · caefba17
      Hugh Dickins authored
      
      SHMEM_MAX_BYTES was derived from the maximum size of its triple-indirect
      swap vector, forgetting to take the MAX_LFS_FILESIZE limit into account.
      Never mind 256kB pages, even 8kB pages on 32-bit kernels allowed files to
      grow slightly bigger than that supposed maximum.
      
      Fix this by using the min of both (at build time not run time).  And it
      happens that this calculation is good as far as 8MB pages on 32-bit or
      16MB pages on 64-bit: though SHMSWP_MAX_INDEX gets truncated before that,
      it's truncated to such large numbers that we don't need to care.
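      
      A sketch of the build-time clamp, with macro definitions assumed from
      the description above:
      
          #define SHMSWP_MAX_BYTES (SHMSWP_MAX_INDEX << PAGE_CACHE_SHIFT)
          #define SHMEM_MAX_BYTES  min_t(unsigned long long, \
                          SHMSWP_MAX_BYTES, MAX_LFS_FILESIZE)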
      
      [akpm@linux-foundation.org: it needs pagemap.h]
      [akpm@linux-foundation.org: fix sparc64 min() warnings]
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Yuri Tikhonov <yur@emcraft.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • shmem: fix division by zero · 61609d01
      Yuri Tikhonov authored
      
      Fix a division by zero which we have in shmem_truncate_range() and
      shmem_unuse_inode() when using big PAGE_SIZE values (e.g.  256kB on
      ppc44x).
      
      With 256kB PAGE_SIZE, the ENTRIES_PER_PAGEPAGE constant becomes too large
      (0x1.0000.0000) on a 32-bit kernel, so this patch just changes its type
      from 'unsigned long' to 'unsigned long long'.
      
      Hugh: reverted its unsigned long longs in shmem_truncate_range() and
      shmem_getpage(): the pagecache index cannot be more than an unsigned long,
      so the divisions by zero occurred in unreached code.  It's a pity we need
      any ULL arithmetic here, but I found no pretty way to avoid it.
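      
      A sketch of the type change (the widened constant keeps the dependent
      divisors from wrapping to zero on 32-bit):
      
          #define ENTRIES_PER_PAGE (PAGE_CACHE_SIZE/sizeof(unsigned long))
          #define ENTRIES_PER_PAGEPAGE \
              ((unsigned long long)ENTRIES_PER_PAGE*ENTRIES_PER_PAGE)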
      
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. Apr 01, 2009
    • shmem: writepage directly to swap · 9fab5619
      Hugh Dickins authored
      
      Synopsis: if shmem_writepage calls swap_writepage directly, most shmem
      swap loads benefit, and a catastrophic interaction between SLUB and some
      flash storage is avoided.
      
      shmem_writepage() has always been peculiar in making no attempt to write:
      it has just transferred a shmem page from file cache to swap cache, then
      let that page make its way around the LRU again before being written and
      freed.
      
      The idea was that people use tmpfs because they want those pages to stay
      in RAM; so although we give it an overflow to swap, we should resist
      writing too soon, giving those pages a second chance before they can be
      reclaimed.
      
      That was always questionable, and I've toyed with this patch for years;
      but never had a clear justification to depart from the original design.
      
      It became more questionable in 2.6.28, when the split LRU patches classed
      shmem and tmpfs pages as SwapBacked rather than as file_cache: that in
      itself gives them more resistance to reclaim than normal file pages.  I
      prepared this patch for 2.6.29, but the merge window arrived before I'd
      completed gathering statistics to justify sending it in.
      
      Then while comparing SLQB against SLUB, running SLUB on a laptop I'd
      habitually used with SLAB, I found SLUB to run my tmpfs kbuild swapping
      tests five times slower than SLAB or SLQB - other machines slower too, but
      nowhere near so bad.  Simpler "cp -a" swapping tests showed the same.
      
      slub_max_order=0 brings sanity to all, but heavy swapping is too far from
      normal to justify such a tuning.  The crucial factor on that laptop turns
      out to be that I'm using an SD card for swap.  What happens is this:
      
      By default, SLUB uses order-2 pages for shmem_inode_cache (and many other
      fs inodes), so creating tmpfs files under memory pressure brings lumpy
      reclaim into play.  One subpage of the order is chosen from the bottom of
      the LRU as usual, then the other three picked out from their random
      positions on the LRUs.
      
      In a tmpfs load, many of these pages will be ones which already passed
      through shmem_writepage, so already have swap allocated.  And though their
      offsets on swap were probably allocated sequentially, now that the pages
      are picked off at random, their swap offsets are scattered.
      
      But the flash storage on the SD card is very sensitive to having its
      writes merged: once swap is written at scattered offsets, performance
      falls apart.  Rotating disk seeks increase too, but less disastrously.
      
      So: stop giving shmem/tmpfs pages a second pass around the LRU, write them
      out to swap as soon as their swap has been allocated.
      
      It's surely possible to devise an artificial load which runs faster the
      old way, one whose sizing is such that the tmpfs pages on their second
      pass are the ones that are wanted again, and other pages not.
      
      But I've not yet found such a load: on all machines, under the loads I've
      tried, immediate swap_writepage speeds up shmem swapping: especially when
      using the SLUB allocator (and more effectively than slub_max_order=0), but
      also with the others; and it also reduces the variance between runs.  How
      much faster varies widely: a factor of five is rare, 5% is common.
      
      One load which might have suffered: imagine a swapping shmem load in a
      limited mem_cgroup on a machine with plenty of memory.  Before 2.6.29 the
      swapcache was not charged, and such a load would have run quickest with
      the shmem swapcache never written to swap.  But now swapcache is charged,
      so even this load benefits from shmem_writepage directly to swap.
      
      Apologies for the #ifndef CONFIG_SWAP swap_writepage() stub in swap.h:
      it's silly because that will never get called; but refactoring shmem.c
      sensibly according to CONFIG_SWAP will be a separate task.
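      
      A hedged sketch of the core change at the tail of shmem_writepage(),
      with the shmem swap bookkeeping elided:
      
          if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
              /* shmem accounting updates elided */
              /*
               * The page has just moved from file cache to swap cache:
               * write it out immediately, while its swap offset is still
               * close to its neighbours', rather than unlocking it for
               * another pass around the LRU.
               */
              return swap_writepage(page, wbc);
          }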
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. Feb 25, 2009
    • shmem: fix shared anonymous accounting · 0b0a0806
      Hugh Dickins authored
      
      Each time I exit Firefox, /proc/meminfo's Committed_AS goes down almost
      400 kB: OVERCOMMIT_NEVER would be allowing overcommits it should
      prohibit.
      
      Commit fc8744ad "Stop playing silly
      games with the VM_ACCOUNT flag" changed shmem_file_setup() to set the
      shmem file's VM_ACCOUNT flag according to VM_NORESERVE not being set in
      the vma flags; but did so only _after_ the shmem_acct_size(flags, size)
      call which is expected to pre-account a shared anonymous object.
      
      It's all clearer if we switch shmem.c over to use VM_NORESERVE
      throughout in place of !VM_ACCOUNT.
      
      But I very nearly sent in a patch which mistakenly removed the
      accounting from tmpfs files: shmem_get_inode()'s memset was good for not
      setting VM_ACCOUNT, but now it needs to set VM_NORESERVE.
      
      Rather than setting that by default, then perhaps clearing it again in
      shmem_file_setup(), let's pass it as a flag to shmem_get_inode(): that
      allows us to remove the #ifdef CONFIG_SHMEM from shmem_file_setup().
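      
      A hedged sketch of the resulting prototype (exact parameter names
      assumed):
      
          /* 'flags' carries VM_NORESERVE, so the inode's accounting is
           * decided at creation instead of being patched up afterwards. */
          static struct inode *shmem_get_inode(struct super_block *sb,
                      int mode, dev_t dev, unsigned long flags);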
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. Feb 10, 2009
    • integrity: shmem zero fix · ed850a52
      Mimi Zohar authored
      
      Based on comments from Mike Frysinger and Randy Dunlap
      (http://lkml.org/lkml/2009/2/9/262):
      
      - moved ima.h include before CONFIG_SHMEM test to fix compiler error
        on Blackfin:
      mm/shmem.c: In function 'shmem_zero_setup':
      mm/shmem.c:2670: error: implicit declaration of function 'ima_shm_check'
      
      - added 'struct linux_binprm' in ima.h to fix compiler warning on Blackfin:
      In file included from mm/shmem.c:32:
      include/linux/ima.h:25: warning: 'struct linux_binprm' declared inside
      parameter list
      include/linux/ima.h:25: warning: its scope is only this definition or
      declaration, which is probably not what you want
      
      - moved fs.h include within _LINUX_IMA_H definition
      
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: James Morris <jmorris@namei.org>
  9. Feb 05, 2009
    • Integrity: IMA file free imbalance · 1df9f0a7
      Mimi Zohar authored
      
      The number of calls to ima_path_check()/ima_file_free()
      should be balanced.  An extra call to fput() indicates
      the file could have been accessed without first being
      measured.
      
      Although f_count is incremented/decremented in places other
      than fget/fput, like fget_light/fput_light and get_file, the
      current task must already hold a file refcnt.  The call to
      __fput() is delayed until the refcnt becomes 0, resulting
      in ima_file_free() flagging any changes.
      
      - add hook to increment opencount for IPC shared memory (SYSV),
        shmat files, and /dev/zero
      - moved NULL iint test in opencount_get()
      
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
  10. Jan 31, 2009
    • Stop playing silly games with the VM_ACCOUNT flag · fc8744ad
      Linus Torvalds authored
      
      The mmap_region() code would temporarily set the VM_ACCOUNT flag for
      anonymous shared mappings just to inform shmem_zero_setup() that it
      should enable accounting for the resulting shm object.  It would then
      clear the flag after calling ->mmap (for the /dev/zero case) or doing
      shmem_zero_setup() (for the MAP_ANON case).
      
      This not only resulted in vma merge issues, but also made for
      unnecessary confusion.  Use the already-existing VM_NORESERVE flag for
      this instead, and let shmem_{zero|file}_setup() just figure it out from
      that.
      
      This also happens to make it obvious that the new DRI2 GEM layer uses a
      non-reserving backing store for its object allocation - which is quite
      possibly not intentional.  But since I didn't want to change semantics
      in this patch, I left it alone, and just updated the caller to use the
      new flag semantics.
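      
      A hedged sketch of the resulting shmem_zero_setup(), shape assumed:
      accounting now follows from VM_NORESERVE in the vma flags rather than
      from a temporarily-set VM_ACCOUNT.
      
          int shmem_zero_setup(struct vm_area_struct *vma)
          {
              struct file *file;
              loff_t size = vma->vm_end - vma->vm_start;
      
              /* vm_flags, including VM_NORESERVE, flow straight through */
              file = shmem_file_setup("dev/zero", size, vma->vm_flags);
              if (IS_ERR(file))
                  return PTR_ERR(file);
      
              if (vma->vm_file)
                  fput(vma->vm_file);
              vma->vm_file = file;
              vma->vm_ops = &shmem_vm_ops;
              return 0;
          }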
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. Jan 08, 2009
    • memcg: fix shmem's swap accounting · b5a84319
      KAMEZAWA Hiroyuki authored
      
      Now, you can see the following even when swap accounting is enabled.
      
       1. Create Group 01, and 02.
       2. allocate a "file" on tmpfs by a task under 01.
       3. swap out the "file" (by memory pressure)
       4. Read "file" from a task in group 02.
       5. the charge of "file" is moved to group 02.
      
      This is not ideal behavior.  It happens because SwapCache loaded by
      read-ahead is not taken into account.
      
      This is a patch to fix shmem's swapcache behavior.
        - remove mem_cgroup_cache_charge_swapin().
        - Add SwapCache handler routine to mem_cgroup_cache_charge().
          By this, shmem's file cache is charged at add_to_page_cache()
          with GFP_NOWAIT.
        - pass the page of swapcache to shrink_mem_cgroup.
      
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: revert gfp mask fix · 2c26fdd7
      KAMEZAWA Hiroyuki authored
      
      My patch, memcg-fix-gfp_mask-of-callers-of-charge.patch, changed the
      gfp_mask of callers of charge to GFP_HIGHUSER_MOVABLE to show what will
      happen at memory reclaim.
      
      But in recent discussion it was NACKed because it sounds ugly.
      
      This patch reverts it and adds some cleanup to the gfp_mask of callers
      of charge.  There is no behavior change, but it needs review before
      generating hunks deep in the queue.
      
      This patch also adds an explanation of the meaning of the gfp_mask passed
      to the charge functions in memcontrol.h.
      
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: handle swap caches · d13d1443
      KAMEZAWA Hiroyuki authored
      
      SwapCache support for the memory resource controller (memcg)
      
      Before the mem+swap controller, memcg itself should handle SwapCache in a
      proper way.  This is cut out from that work.
      
      In the current memcg, SwapCache is simply leaked and the user can create
      tons of SwapCache.  This is an accounting leak and should be handled.
      
      SwapCache accounting is done as follows:
      
        charge (anon)
      	- charged when it's mapped.
      	  (because of readahead, charge at add_to_swap_cache() is not sane)
        uncharge (anon)
      	- uncharged when it's dropped from swapcache and fully unmapped.
      	  means it's not uncharged at unmap.
      	  Note: delete from swap cache at swap-in is done after rmap information
      	        is established.
        charge (shmem)
      	- charged at swap-in. this prevents charge at add_to_page_cache().
      
        uncharge (shmem)
      	- uncharged when it's dropped from swapcache and not on shmem's
      	  radix-tree.
      
        At migration, the check against the 'old page' is modified to handle shmem.
      
      Compared to the old version discussed (which caused troubles), we have
      the advantages of
        - the PCG_USED bit.
        - simple migration handling.
      
      So the situation is much easier than several months ago, maybe.
      
      [hugh@veritas.com: memcg: handle swap caches build fix]
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix gfp_mask of callers of charge · bced0520
      KAMEZAWA Hiroyuki authored
      
      Fix misuse of GFP_KERNEL.
      
      Now, most callers of the mem_cgroup_charge_xxx functions use GFP_KERNEL.
      
      I think that this is from the fact that page_cgroup *was* dynamically
      allocated.
      
      But now, we allocate all page_cgroup at boot.  And
      mem_cgroup_try_to_free_pages() reclaims memory with GFP_HIGHUSER_MOVABLE +
      the specified GFP_RECLAIM_MASK.
      
        * This is because we just want to reduce memory usage.
          "Where we should reclaim from ?" is not a problem in memcg.
      
      This patch modifies the gfp masks to be GFP_HIGHUSER_MOVABLE where possible.
      
      Note: This patch is not for fixing behavior but for showing sane information
            in source code.
      
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. Oct 30, 2008
    • nfsd: fix vm overcommit crash · 731572d3
      Alan Cox authored
      
      Junjiro R.  Okajima reported a problem where knfsd crashes if you are
      using it to export shmemfs objects and run strict overcommit.  In this
      situation the current->mm based modifier to the overcommit goes through a
      NULL pointer.
      
      We could simply check for NULL and skip the modifier but we've caught
      other real bugs in the past from mm being NULL here - cases where we did
      need a valid mm set up (eg the exec bug about a year ago).
      
      To preserve the checks and get the logic we want, shuffle the checking
      around and add a new helper to the vm_ security wrappers, as sketched below.
      
      Also fix a current->mm reference in nommu that should use the passed mm.
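      
      A hedged sketch of the new helper's !CONFIG_SECURITY variant (assumed
      shape): the mm is passed explicitly, so callers running without
      current->mm no longer dereference a NULL pointer.
      
          static inline int security_vm_enough_memory_mm(struct mm_struct *mm,
                                 long pages)
          {
              return cap_vm_enough_memory(mm, pages);
          }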
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix build]
      Reported-by: Junjiro R. Okajima <hooanon05@yahoo.co.jp>
      Acked-by: James Morris <jmorris@namei.org>
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. Oct 20, 2008
    • SHM_LOCKED pages are unevictable · 89e004ea
      Lee Schermerhorn authored
      
      Shmem segments locked into memory via shmctl(SHM_LOCKED) should not be
      kept on the normal LRU, since scanning them is a waste of time and might
      throw off kswapd's balancing algorithms.  Place them on the unevictable
      LRU list instead.
      
      Use the AS_UNEVICTABLE flag to mark address_space of SHM_LOCKed shared
      memory regions as unevictable.  Then these pages will be culled off the
      normal LRU lists during vmscan.
      
      Add new wrapper function to clear the mapping's unevictable state when/if
      shared memory segment is munlocked.
      
      Add scan_mapping_unevictable_pages() to mm/vmscan.c to scan all pages in
      the shmem segment's mapping [struct address_space] for evictability now
      that they're no longer locked, and move any evictable pages to the
      appropriate zone lru list.
      
      Changes depend on [CONFIG_]UNEVICTABLE_LRU.
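      
      A sketch of the mapping-level toggles (helper names as merged; the
      ipc/shm.c context around them is assumed):
      
          struct address_space *mapping = shp->shm_file->f_mapping;
      
          if (lock) {
              mapping_set_unevictable(mapping);   /* SHM_LOCK   */
          } else {
              mapping_clear_unevictable(mapping); /* SHM_UNLOCK */
              scan_mapping_unevictable_pages(mapping);
          }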
      
      [kosaki.motohiro@jp.fujitsu.com: revert shm change]
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmscan: split LRU lists into anon & file sets · 4f98a2fe
      Rik van Riel authored
      
      Split the LRU lists in two, one set for pages that are backed by real file
      systems ("file") and one for pages that are backed by memory and swap
      ("anon").  The latter includes tmpfs.
      
      The advantage of doing this is that the VM will not have to scan over lots
      of anonymous pages (which we generally do not want to swap out), just to
      find the page cache pages that it should evict.
      
      This patch has the infrastructure and a basic policy to balance how much
      we scan the anon lists and how much we scan the file lists.  The big
      policy changes are in separate patches.
      
      [lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
      [kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
      [kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
      [hugh@veritas.com: memcg swapbacked pages active]
      [hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
      [akpm@linux-foundation.org: fix /proc/vmstat units]
      [nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
      [kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
      [kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • define page_file_cache() function · b2e18538
      Rik van Riel authored
      
      Define page_file_cache() function to answer the question:
      	is page backed by a file?
      
      Originally part of Rik van Riel's split-lru patch.  Extracted to make
      available for other, independent reclaim patches.
      
      Moved inline function to linux/mm_inline.h where it will be needed by
      subsequent "split LRU" and "noreclaim" patches.
      
      Unfortunately this needs to use a page flag, since the PG_swapbacked state
      needs to be preserved all the way to the point where the page is last
      removed from the LRU.  Trying to derive the status from other info in the
      page resulted in wrong VM statistics in earlier split VM patchsets.
      
      The total number of page flags in use on a 32 bit machine after this patch
      is 19.
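      
      A sketch of the helper (as merged it is named page_is_file_cache();
      the body is assumed from the description above):
      
          static inline int page_is_file_cache(struct page *page)
          {
              if (PageSwapBacked(page))
                  return 0;
      
              /* The page is page cache backed by a file. */
              return LRU_FILE;
          }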
      
      [akpm@linux-foundation.org: fix up out-of-order merge fallout]
      [hugh@veritas.com: splitlru: shmem_getpage SetPageSwapBacked sooner]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: MinChan Kim <minchan.kim@gmail.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>