1. 07 Apr, 2014 1 commit
  2. 03 Apr, 2014 2 commits
    • Johannes Weiner's avatar
      mm + fs: prepare for non-page entries in page cache radix trees · 0cd6144a
      Johannes Weiner authored
      
      
      shmem mappings already contain exceptional entries where swap slot
      information is remembered.
      
      To be able to store eviction information for regular page cache, prepare
      every site dealing with the radix trees directly to handle entries other
      than pages.
      
      The common lookup functions will filter out non-page entries and return
      NULL for page cache holes, just as before.  But provide a raw version of
      the API which returns non-page entries as well, and switch shmem over to
      use it.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0cd6144a
    • Johannes Weiner's avatar
      mm: shmem: save one radix tree lookup when truncating swapped pages · 6dbaf22c
      Johannes Weiner authored
      
      
      Page cache radix tree slots are usually stabilized by the page lock, but
      shmem's swap cookies have no such thing.  Because the overall truncation
      loop is lockless, the swap entry is currently confirmed by a tree lookup
      and then deleted by another tree lookup under the same tree lock region.
      
      Use radix_tree_delete_item() instead, which does the verification and
      deletion with only one lookup.  This also allows removing the
      delete-only special case from shmem_radix_tree_replace().
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarMinchan Kim <minchan@kernel.org>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6dbaf22c
  3. 26 Jan, 2014 1 commit
  4. 23 Jan, 2014 1 commit
  5. 02 Dec, 2013 1 commit
    • Eric Paris's avatar
      security: shmem: implement kernel private shmem inodes · c7277090
      Eric Paris authored
      
      
      We have a problem where the big_key key storage implementation uses a
      shmem backed inode to hold the key contents.  Because of this detail of
      implementation LSM checks are being done between processes trying to
      read the keys and the tmpfs backed inode.  The LSM checks are already
      being handled on the key interface level and should not be enforced at
      the inode level (since the inode is an implementation detail, not a
      part of the security model)
      
      This patch implements a new function shmem_kernel_file_setup() which
      returns the equivalent to shmem_file_setup() only the underlying inode
      has S_PRIVATE set.  This means that all LSM checks for the inode in
      question are skipped.  It should only be used for kernel internal
      operations where the inode is not exposed to userspace without proper
      LSM checking.  It is possible that some other users of
      shmem_file_setup() should use the new interface, but this has not been
      explored.
      
      Reproducing this bug is a little bit difficult.  The steps I used on
      Fedora are:
      
       (1) Turn off selinux enforcing:
      
      	setenforce 0
      
       (2) Create a huge key
      
      	k=`dd if=/dev/zero bs=8192 count=1 | keyctl padd big_key test-key @s`
      
       (3) Access the key in another context:
      
      	runcon system_u:system_r:httpd_t:s0-s0:c0.c1023 keyctl print $k >/dev/null
      
       (4) Examine the audit logs:
      
      	ausearch -m AVC -i --subject httpd_t | audit2allow
      
      If the last command's output includes a line that looks like:
      
      	allow httpd_t user_tmpfs_t:file { open read };
      
      There was an inode check between httpd and the tmpfs filesystem.  With
      this patch no such denial will be seen.  (NOTE! you should clear your
      audit log if you have tested for this previously)
      
      (Please return you box to enforcing)
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Hugh Dickins <hughd@google.com>
      cc: linux-mm@kvack.org
      c7277090
  6. 11 Sep, 2013 2 commits
    • Rob Landley's avatar
      initmpfs: make rootfs use tmpfs when CONFIG_TMPFS enabled · 16203a7a
      Rob Landley authored
      
      
      Conditionally call the appropriate fs_init function and fill_super
      functions.  Add a use once guard to shmem_init() to simply succeed on a
      second call.
      
      (Note that IS_ENABLED() is a compile time constant so dead code
      elimination removes unused function calls when CONFIG_TMPFS is disabled.)
      Signed-off-by: default avatarRob Landley <rob@landley.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Stephen Warren <swarren@nvidia.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      16203a7a
    • Jan Kara's avatar
      lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt · 5e4c0d97
      Jan Kara authored
      
      
      With users of radix_tree_preload() run from interrupt (block/blk-ioc.c is
      one such possible user), the following race can happen:
      
      radix_tree_preload()
      ...
      radix_tree_insert()
        radix_tree_node_alloc()
          if (rtp->nr) {
            ret = rtp->nodes[rtp->nr - 1];
      <interrupt>
      ...
      radix_tree_preload()
      ...
      radix_tree_insert()
        radix_tree_node_alloc()
          if (rtp->nr) {
            ret = rtp->nodes[rtp->nr - 1];
      
      And we give out one radix tree node twice.  That clearly results in radix
      tree corruption with different results (usually OOPS) depending on which
      two users of radix tree race.
      
      We fix the problem by making radix_tree_node_alloc() always allocate fresh
      radix tree nodes when in interrupt.  Using preloading when in interrupt
      doesn't make sense since all the allocations have to be atomic anyway and
      we cannot steal nodes from process-context users because some users rely
      on radix_tree_insert() succeeding after radix_tree_preload().
      in_interrupt() check is somewhat ugly but we cannot simply key off passed
      gfp_mask as that is acquired from root_gfp_mask() and thus the same for
      all preload users.
      
      Another part of the fix is to avoid node preallocation in
      radix_tree_preload() when passed gfp_mask doesn't allow waiting.  Again,
      preallocation in such case doesn't make sense and when preallocation would
      happen in interrupt we could possibly leak some allocated nodes.  However,
      some users of radix_tree_preload() require following radix_tree_insert()
      to succeed.  To avoid unexpected effects for these users,
      radix_tree_preload() only warns if passed gfp mask doesn't allow waiting
      and we provide a new function radix_tree_maybe_preload() for those users
      which get different gfp mask from different call sites and which are
      prepared to handle radix_tree_insert() failure.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5e4c0d97
  7. 03 Sep, 2013 1 commit
  8. 24 Aug, 2013 1 commit
  9. 04 Aug, 2013 1 commit
  10. 03 Jul, 2013 1 commit
    • Jie Liu's avatar
      vfs: export lseek_execute() to modules · 46a1c2c7
      Jie Liu authored
      
      
      For those file systems(btrfs/ext4/ocfs2/tmpfs) that support
      SEEK_DATA/SEEK_HOLE functions, we end up handling the similar
      matter in lseek_execute() to update the current file offset
      to the desired offset if it is valid, ceph also does the
      simliar things at ceph_llseek().
      
      To reduce the duplications, this patch make lseek_execute()
      public accessible so that we can call it directly from the
      underlying file systems.
      
      Thanks Dave Chinner for this suggestion.
      
      [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]
      
      v2->v1:
      - Add kernel-doc comments for lseek_execute()
      - Call lseek_execute() in ceph->llseek()
      Signed-off-by: default avatarJie Liu <jeff.liu@oracle.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: Ted Tso <tytso@mit.edu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      46a1c2c7
  11. 29 Jun, 2013 1 commit
  12. 20 Jun, 2013 1 commit
  13. 07 May, 2013 1 commit
  14. 29 Apr, 2013 1 commit
  15. 01 Mar, 2013 1 commit
  16. 26 Feb, 2013 2 commits
  17. 23 Feb, 2013 3 commits
    • Greg Thelen's avatar
      tmpfs: fix mempolicy object leaks · 49cd0a5c
      Greg Thelen authored
      
      
      Fix several mempolicy leaks in the tmpfs mount logic.  These leaks are
      slow - on the order of one object leaked per mount attempt.
      
      Leak 1 (umount doesn't free mpol allocated in mount):
          while true; do
              mount -t tmpfs -o mpol=interleave,size=100M nodev /mnt
              umount /mnt
          done
      
      Leak 2 (errors parsing remount options will leak mpol):
          mount -t tmpfs -o size=100M nodev /mnt
          while true; do
              mount -o remount,mpol=interleave,size=x /mnt 2> /dev/null
          done
          umount /mnt
      
      Leak 3 (multiple mpol per mount leak mpol):
          while true; do
              mount -t tmpfs -o mpol=interleave,mpol=interleave,size=100M nodev /mnt
              umount /mnt
          done
      
      This patch fixes all of the above.  I could have broken the patch into
      three pieces but is seemed easier to review as one.
      
      [akpm@linux-foundation.org: fix handling of mpol_parse_str() errors, per Hugh]
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      49cd0a5c
    • Greg Thelen's avatar
      tmpfs: fix use-after-free of mempolicy object · 5f00110f
      Greg Thelen authored
      
      
      The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
      option is not specified in the remount request.  A new policy can be
      specified if mpol=M is given.
      
      Before this patch remounting an mpol bound tmpfs without specifying
      mpol= mount option in the remount request would set the filesystem's
      mempolicy object to a freed mempolicy object.
      
      To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
          # mkdir /tmp/x
      
          # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x
      
          # grep /tmp/x /proc/mounts
          nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0
      
          # mount -o remount,size=200M nodev /tmp/x
      
          # grep /tmp/x /proc/mounts
          nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
              # note ? garbage in mpol=... output above
      
          # dd if=/dev/zero of=/tmp/x/f count=1
              # panic here
      
      Panic:
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<          (null)>]           (null)
          [...]
          Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
          Call Trace:
            mpol_shared_policy_init+0xa5/0x160
            shmem_get_inode+0x209/0x270
            shmem_mknod+0x3e/0xf0
            shmem_create+0x18/0x20
            vfs_create+0xb5/0x130
            do_last+0x9a1/0xea0
            path_openat+0xb3/0x4d0
            do_filp_open+0x42/0xa0
            do_sys_open+0xfe/0x1e0
            compat_sys_open+0x1b/0x20
            cstar_dispatch+0x7/0x1f
      
      Non-debug kernels will not crash immediately because referencing the
      dangling mpol will not cause a fault.  Instead the filesystem will
      reference a freed mempolicy object, which will cause unpredictable
      behavior.
      
      The problem boils down to a dropped mpol reference below if
      shmem_parse_options() does not allocate a new mpol:
      
          config = *sbinfo
          shmem_parse_options(data, &config, true)
          mpol_put(sbinfo->mpol)
          sbinfo->mpol = config.mpol  /* BUG: saves unreferenced mpol */
      
      This patch avoids the crash by not releasing the mempolicy if
      shmem_parse_options() doesn't create a new mpol.
      
      How far back does this issue go? I see it in both 2.6.36 and 3.3.  I did
      not look back further.
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5f00110f
    • Johannes Weiner's avatar
      mm: shmem: use new radix tree iterator · 860f2759
      Johannes Weiner authored
      In shmem_find_get_pages_and_swap(), use the faster radix tree iterator
      construct from commit 78c1d784
      
       ("radix-tree: introduce bit-optimized
      iterator").
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      860f2759
  18. 22 Feb, 2013 3 commits
  19. 26 Jan, 2013 1 commit
    • Eric W. Biederman's avatar
      userns: Allow the userns root to mount tmpfs. · 2b8576cb
      Eric W. Biederman authored
      
      
      There is no backing store to tmpfs and file creation rules are the
      same as for any other filesystem so it is semantically safe to allow
      unprivileged users to mount it.  ramfs is safe for the same reasons so
      allow either flavor of tmpfs to be mounted by a user namespace root
      user.
      
      The memory control group successfully limits how much memory tmpfs can
      consume on any system that cares about a user namespace root using
      tmpfs to exhaust memory the memory control group can be deployed.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      2b8576cb
  20. 02 Jan, 2013 1 commit
  21. 17 Dec, 2012 1 commit
  22. 12 Dec, 2012 1 commit
    • Hugh Dickins's avatar
      tmpfs: support SEEK_DATA and SEEK_HOLE (reprise) · 220f2ac9
      Hugh Dickins authored
      Revert 3.5's commit f21f8062 ("tmpfs: revert SEEK_DATA and
      SEEK_HOLE") to reinstate 4fb5ef08
      
       ("tmpfs: support SEEK_DATA and
      SEEK_HOLE"), with the intervening additional arg to
      generic_file_llseek_size().
      
      In 3.8, ext4 is expected to join btrfs, ocfs2 and xfs with proper
      SEEK_DATA and SEEK_HOLE support; and a good case has now been made for
      it on tmpfs, so let's join the party.
      
      It's quite easy for tmpfs to scan the radix_tree to support llseek's new
      SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are
      still on my mind (in particular, the !PageUptodate-ness of pages
      fallocated but still unwritten).
      
      [akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jaegeuk Hanse <jaegeuk.hanse@gmail.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Zheng Liu <wenqing.lz@taobao.com>
      Cc: Jeff liu <jeff.liu@oracle.com>
      Cc: Paul Eggert <eggert@cs.ucla.edu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Josef Bacik <josef@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: Marco Stornelli <marco.stornelli@gmail.com>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      220f2ac9
  23. 06 Dec, 2012 1 commit
    • Mel Gorman's avatar
      tmpfs: fix shared mempolicy leak · 18a2f371
      Mel Gorman authored
      This fixes a regression in 3.7-rc, which has since gone into stable.
      
      Commit 00442ad0 ("mempolicy: fix a memory corruption by refcount
      imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
      refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
      on expecting alloc_page_vma() to drop the refcount it had acquired.
      This deserves a rework: but for now fix the leak in shmem_alloc_page().
      
      Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
      the same refcounting there as in shmem_alloc_page(), delete its onstack
      mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
      those were invented to let swapin_readahead() make an unknown number of
      calls to alloc_pages_vma() with one mempolicy; but since 00442ad0
      
      ,
      alloc_pages_vma() has kept refcount in balance, so now no problem.
      Reported-and-tested-by: default avatarTommi Rantala <tt.rantala@gmail.com>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18a2f371
  24. 16 Nov, 2012 2 commits
    • Hugh Dickins's avatar
      tmpfs: change final i_blocks BUG to WARNING · 0f3c42f5
      Hugh Dickins authored
      
      
      Under a particular load on one machine, I have hit shmem_evict_inode()'s
      BUG_ON(inode->i_blocks), enough times to narrow it down to a particular
      race between swapout and eviction.
      
      It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(),
      and the lack of coherent locking between mapping's nrpages and shmem's
      swapped count.  There's a window in shmem_writepage(), between lowering
      nrpages in shmem_delete_from_page_cache() and then raising swapped
      count, when the freed count appears to be +1 when it should be 0, and
      then the asymmetry stops it from being corrected with -1 before hitting
      the BUG.
      
      One answer is coherent locking: using tree_lock throughout, without
      info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on
      used_blocks makes that messier than expected.  Another answer may be a
      further effort to eliminate the weird shmem_recalc_inode() altogether,
      but previous attempts at that failed.
      
      So far undecided, but for now change the BUG_ON to WARN_ON: in usual
      circumstances it remains a useful consistency check.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0f3c42f5
    • Hugh Dickins's avatar
      tmpfs: fix shmem_getpage_gfp() VM_BUG_ON · 215c02bc
      Hugh Dickins authored
      
      
      Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora
      has converted to WARNING) in shmem_getpage_gfp():
      
        WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
        Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
        Call Trace:
          warn_slowpath_common+0x7f/0xc0
          warn_slowpath_null+0x1a/0x20
          shmem_getpage_gfp+0xa5c/0xa70
          shmem_fault+0x4f/0xa0
          __do_fault+0x71/0x5c0
          handle_pte_fault+0x97/0xae0
          handle_mm_fault+0x289/0x350
          __do_page_fault+0x18e/0x530
          do_page_fault+0x2b/0x50
          page_fault+0x28/0x30
          tracesys+0xe1/0xe6
      
      Thanks to Johannes for pointing to truncation: free_swap_and_cache()
      only does a trylock on the page, so the page lock we've held since
      before confirming swap is not enough to protect against truncation.
      
      What cleanup is needed in this case? Just delete_from_swap_cache(),
      which takes care of the memcg uncharge.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      215c02bc
  25. 09 Oct, 2012 2 commits
    • Hugh Dickins's avatar
      tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking · 35c2a7f4
      Hugh Dickins authored
      
      
      Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
      	u64 inum = fid->raw[2];
      which is unhelpfully reported as at the end of shmem_alloc_inode():
      
      BUG: unable to handle kernel paging request at ffff880061cd3000
      IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
      Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Call Trace:
       [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
       [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
       [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
       [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6
      
      Right, tmpfs is being stupid to access fid->raw[2] before validating that
      fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
      fall at the end of a page, and the next page not be present.
      
      But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
      careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
      could oops in the same way: add the missing fh_len checks to those.
      Reported-by: default avatarSasha Levin <levinsasha928@gmail.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      35c2a7f4
    • Konstantin Khlebnikov's avatar
      mm: kill vma flag VM_CAN_NONLINEAR · 0b173bc4
      Konstantin Khlebnikov authored
      
      
      Move actual pte filling for non-linear file mappings into the new special
      vma operation: ->remap_pages().
      
      Filesystems must implement this method to get non-linear mapping support,
      if it uses filemap_fault() then generic_file_remap_pages() can be used.
      
      Now device drivers can implement this method and obtain nonlinear vma support.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0b173bc4
  26. 24 Aug, 2012 1 commit
    • Aristeu Rozanski's avatar
      xattr: extract simple_xattr code from tmpfs · 38f38657
      Aristeu Rozanski authored
      
      
      Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.
      
      $ size vmlinux.o
         text    data     bss     dec     hex filename
      4658782  880729 5195032 10734543         a3cbcf vmlinux.o
      $ size vmlinux.o
         text    data     bss     dec     hex filename
      4658957  880729 5195032 10734718         a3cc7e vmlinux.o
      
      v7:
      - checkpatch warnings fixed
      - Implement the changes requested by Hugh Dickins:
      	- make simple_xattrs_init and simple_xattrs_free inline
      	- get rid of locking and list reinitialization in simple_xattrs_free,
      	  they're not needed
      v6:
      - no changes
      v5:
      - no changes
      v4:
      - move simple_xattrs_free() to fs/xattr.c
      v3:
      - in kmem_xattrs_free(), reinitialize the list
      - use simple_xattr_* prefix
      - introduce simple_xattr_add() to prevent direct list usage
      Original-patch-by: default avatarLi Zefan <lizefan@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarAristeu Rozanski <aris@redhat.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      38f38657
  27. 31 Jul, 2012 1 commit
  28. 14 Jul, 2012 1 commit
  29. 11 Jul, 2012 3 commits
    • Hugh Dickins's avatar
      shmem: cleanup shmem_add_to_page_cache · b065b432
      Hugh Dickins authored
      
      
      shmem_add_to_page_cache() has three callsites, but only one of them wants
      the radix_tree_preload() (an exceptional entry guarantees that the radix
      tree node is present in the other cases), and only that site can achieve
      mem_cgroup_uncharge_cache_page() (PageSwapCache makes it a no-op in the
      other cases).  We did it this way originally to reflect
      add_to_page_cache_locked(); but it's confusing now, so move the radix_tree
      preloading and mem_cgroup uncharging to that one caller.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b065b432
    • Hugh Dickins's avatar
      shmem: fix negative rss in memcg memory.stat · d1899228
      Hugh Dickins authored
      
      
      When adding the page_private checks before calling shmem_replace_page(), I
      did realize that there is a further race, but thought it too unlikely to
      need a hurried fix.
      
      But independently I've been chasing why a mem cgroup's memory.stat
      sometimes shows negative rss after all tasks have gone: I expected it to
      be a stats gathering bug, but actually it's shmem swapping's fault.
      
      It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
      the page may have been removed from swapcache before getting the lock; or
      it may have been freed and reused and be back in swapcache; and it can
      even be using the same swap location as before (page_private same).
      
      The swapoff case is already secure against this (swap cannot be reused
      until the whole area has been swapped off, and a new swapped on); and
      shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for
      the expected radix_tree entry - but a little too late.
      
      By that time, we might have already decided to shmem_replace_page(): I
      don't know of a problem from that, but I'd feel more at ease not to do so
      spuriously.  And we have already done mem_cgroup_cache_charge(), on
      perhaps the wrong mem cgroup: and this charge is not then undone on the
      error path, because PageSwapCache ends up preventing that.
      
      It's this last case which causes the occasional negative rss in
      memory.stat: the page is charged here as cache, but (sometimes) found to
      be anon when eventually it's uncharged - and in between, it's an
      undeserved charge on the wrong memcg.
      
      Fix this by adding an earlier check on the radix_tree entry: it's
      inelegant to descend the tree twice, but swapping is not the fast path,
      and a better solution would need a pair (try+commit) of memcg calls, and a
      rework of shmem_replace_page() to keep out of the swapcache.
      
      We can use the added shmem_confirm_swap() function to replace the
      find_get_page+page_cache_release we were already doing on the error path.
      And add a comment on that -EEXIST: it seems a peculiar errno to be using,
      but originates from its use in radix_tree_insert().
      
      [It can be surprising to see positive rss left in a memcg's memory.stat
      after all tasks have gone, since it is supposed to count anonymous but not
      shmem.  Aside from sharing anon pages via fork with a task in some other
      memcg, it often happens after swapping: because a swap page can't be freed
      while under writeback, nor while locked.  So it's not an error, and these
      residual pages are easily freed once pressure demands.]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d1899228
    • Hugh Dickins's avatar
      tmpfs: revert SEEK_DATA and SEEK_HOLE · f21f8062
      Hugh Dickins authored
      Revert 4fb5ef08
      
       ("tmpfs: support SEEK_DATA and SEEK_HOLE").  I believe
      it's correct, and it's been nice to have from rc1 to rc6; but as the
      original commit said:
      
      I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
      would be of any use to them on tmpfs.  This code adds 92 lines and 752
      bytes on x86_64 - is that bloat or worthwhile?
      
      Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs to
      the dumb generic support for v3.5.  We can always reinstate it later if
      useful, and anyone needing it in a hurry can just get it out of git.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Josef Bacik <josef@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Marco Stornelli <marco.stornelli@gmail.com>
      Cc: Jeff liu <jeff.liu@oracle.com>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f21f8062