1. 30 Mar, 2009 1 commit
  2. 23 Feb, 2009 1 commit
  3. 05 Jan, 2009 1 commit
    • Alexey Dobriyan's avatar
      proc: stop using BKL · b4df2b92
      Alexey Dobriyan authored
      There are four BKL users in proc: de_put(), proc_lookup_de(),
      proc_readdir_de(), proc_root_readdir(),
      1) de_put()
      de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
      needed. BKL doesn't matter to possible refcount leak as well.
      2) proc_lookup_de()
      Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
      potentially blocking, all callers of proc_lookup_de() eventually end up
      from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
      doesn't protect anything.
      3) proc_readdir_de()
      "." and ".." part doesn't need BKL, walking PDE list is under
      proc_subdir_lock, calling filldir callback is potentially blocking
      because it writes to luserspace. All proc_readdir_de() callers
      eventually come from ->readdir hook which is under directory's
      ->i_mutex -- BKL doesn't protect anything.
      4) proc_root_readdir_de()
      proc_root_readdir_de is ->readdir hook, see (3).
      Since readdir hooks doesn't use BKL anymore, switch to
      generic_file_llseek, since it also takes directory's i_mutex.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
  4. 23 Oct, 2008 1 commit
  5. 09 Oct, 2008 1 commit
  6. 26 Jul, 2008 2 commits
    • Al Viro's avatar
      [PATCH] sanitize proc_sysctl · 9043476f
      Al Viro authored
      * keep references to ctl_table_head and ctl_table in /proc/sys inodes
      * grab the former during operations, use the latter for access to
        entry if that succeeds
      * have ->d_compare() check if table should be seen for one who does lookup;
        that allows us to avoid flipping inodes - if we have the same name resolve
        to different things, we'll just keep several dentries and ->d_compare()
        will reject the wrong ones.
      * have ->lookup() and ->readdir() scan the table of our inode first, then
        walk all ctl_table_header and scan ->attached_by for those that are
        attached to our directory.
      * implement ->getattr().
      * get rid of insane amounts of tree-walking
      * get rid of the need to know dentry in ->permission() and of the contortions
        induced by that.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
    • Alexey Dobriyan's avatar
      SL*B: drop kmem cache argument from constructor · 51cc5068
      Alexey Dobriyan authored
      Kmem cache passed to constructor is only needed for constructors that are
      themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
      passed kmem cache in non-trivial way, so pass only pointer to object.
      Non-trivial places are:
      This is flag day, yes.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Acked-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Matt Mackall <mpm@selenic.com>
      [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
      [akpm@linux-foundation.org: fix mm/slab.c]
      [akpm@linux-foundation.org: fix ubifs]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  7. 25 Jul, 2008 2 commits
  8. 24 May, 2008 1 commit
  9. 29 Apr, 2008 1 commit
  10. 08 Feb, 2008 1 commit
  11. 07 Feb, 2008 1 commit
  12. 05 Dec, 2007 1 commit
    • Alexey Dobriyan's avatar
      proc: fix proc_dir_entry refcounting · 5a622f2d
      Alexey Dobriyan authored
      Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
      Switch to usual scheme:
      * PDE is created with refcount 1
      * every de_get does +1
      * every de_put() and remove_proc_entry() do -1
      * once refcount reaches 0, PDE is freed.
      This elegantly fixes at least two following races (both observed) without
      introducing new locks, without abusing old locks, without spreading
      1) PDE leak
      remove_proc_entry			de_put
      -----------------			------
      			[refcnt = 1]
      if (atomic_read(&de->count) == 0)
      					if (atomic_dec_and_test(&de->count))
      						if (de->deleted)
      							/* also not taken! */
      	de->deleted = 1;
      		[refcount=0, deleted=1]
      2) use after free
      remove_proc_entry			de_put
      -----------------			------
      			[refcnt = 1]
      					if (atomic_dec_and_test(&de->count))
      if (atomic_read(&de->count) == 0)
      						/* boom! */
      						if (de->deleted)
      BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
      printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
      Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c086340
      EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
      EIP is at strnlen+0x6/0x18
      EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
      ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
      Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
             c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
             f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
      Call Trace:
       [<c10ac4f0>] vsnprintf+0x2ad/0x49b
       [<c10ac779>] vscnprintf+0x14/0x1f
       [<c1018e6b>] vprintk+0xc5/0x2f9
       [<c10379f1>] handle_fasteoi_irq+0x0/0xab
       [<c1004f44>] do_IRQ+0x9f/0xb7
       [<c117db3b>] preempt_schedule_irq+0x3f/0x5b
       [<c100264e>] need_resched+0x1f/0x21
       [<c10190ba>] printk+0x1b/0x1f
       [<c107c8ad>] de_put+0x3d/0x50
       [<c107c8f8>] proc_delete_inode+0x38/0x41
       [<c107c8c0>] proc_delete_inode+0x0/0x41
       [<c1066298>] generic_delete_inode+0x5e/0xc6
       [<c1065aa9>] iput+0x60/0x62
       [<c1063c8e>] d_kill+0x2d/0x46
       [<c1063fa9>] dput+0xdc/0xe4
       [<c10571a1>] __fput+0xb0/0xcd
       [<c1054e49>] filp_close+0x48/0x4f
       [<c1055ee9>] sys_close+0x67/0xa5
       [<c10026b6>] sysenter_past_esp+0x5f/0x85
      Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
      EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44
      Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
      module is already pinned and remove_proc_entry() can't happen => nobody
      can mark PDE deleted.
      Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
      never get it, it's just for proper /proc/net removal. I double checked
      CLONE_NETNS continues to work.
      Patch survives many hours of modprobe/rmmod/cat loops without new bugs
      which can be attributed to refcounting.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@sw.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  13. 19 Oct, 2007 1 commit
    • Pavel Emelyanov's avatar
      pid namespaces: make proc have multiple superblocks - one for each namespace · 07543f5c
      Pavel Emelyanov authored
      Each pid namespace have to be visible through its own proc mount.  Thus we
      need to have per-namespace proc trees with their own superblocks.
      We cannot easily show different pid namespace via one global proc tree, since
      each pid refers to different tasks in different namespaces.  E.g.  pid 1
      refers to the init task in the initial namespace and to some other task when
      seeing from another namespace.  Moreover - pid, exisintg in one namespace may
      not exist in the other.
      This approach has one move advantage is that the tasks from the init namespace
      can see what tasks live in another namespace by reading entries from another
      proc tree.
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  14. 17 Oct, 2007 2 commits
  15. 11 Sep, 2007 1 commit
    • Alexey Dobriyan's avatar
      Fix select on /proc files without ->poll · dd23aae4
      Alexey Dobriyan authored
      Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
       aka "Fix rmmod/read/write races
      in /proc entries" broke SBCL + SLIME combo.
      The old code in do_select() used DEFAULT_POLLMASK, if couldn't find
      ->poll handler.  The new code makes ->poll always there and returns 0 by
      default, which is not correct.  Return DEFAULT_POLLMASK instead.
      Steps to reproduce:
      	install emacs, SBCL, SLIME
      	M-x slime	in *inferior-lisp* buffer
      	[watch it doing "Connecting to Swank on port X.."]
      Please, apply before 2.6.23.
      P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  16. 28 Jul, 2007 1 commit
  17. 19 Jul, 2007 1 commit
    • Paul Mundt's avatar
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt authored
      Slab destructors were no longer supported after Christoph's
       change. They've been
      BUGs for both slab and slub, and slob never supported them
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: default avatarPaul Mundt <lethal@linux-sh.org>
  18. 16 Jul, 2007 1 commit
    • Alexey Dobriyan's avatar
      Fix rmmod/read/write races in /proc entries · 786d7e16
      Alexey Dobriyan authored
      Fix following races:
      1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
         meanwhile. Or, more generically, system call done on /proc file, method
         supplied by module is called, module dissapeares meanwhile.
         pde = create_proc_entry()
         if (!pde)
      	return -ENOMEM;
         pde->write_proc = ...
         pde = create_proc_entry();
         if (!pde) {
      	return -ENOMEM;
      	/* module unloaded */
      2. bogo-revoke aka proc_kill_inodes()
        remove_proc_entry		vfs_read
        proc_kill_inodes		[check ->f_op validness]
      				[check ->f_op->read validness]
      				[verify_area, security permissions checks]
      	->f_op = NULL;
      				if (file->f_op->read)
      					/* ->f_op dereference, boom */
      NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
      see how this scheme behaves, then extend if needed for directories.
      Directories creators in /proc only set ->owner for them, so proxying for
      directories may be unneeded.
      NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
      ->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
      If your in-tree module uses something else, yell on me. Full audit pending.
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  19. 17 May, 2007 1 commit
    • Christoph Lameter's avatar
      Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Christoph Lameter authored
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  20. 08 May, 2007 2 commits
  21. 07 May, 2007 1 commit
    • Christoph Lameter's avatar
      slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Christoph Lameter authored
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      I think its purpose was to have a callback after an object has been freed
      to verify that the state is the constructor state again?  The callback is
      performed before each freeing of an object.
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code object
      manipulation of the object.
      Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there would be code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLUB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand and there are easier ways to accomplish the
      same effect (i.e.  add debug code before kfree).
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  22. 14 Feb, 2007 1 commit
    • Eric W. Biederman's avatar
      [PATCH] sysctl: reimplement the sysctl proc support · 77b14db5
      Eric W. Biederman authored
      With this change the sysctl inodes can be cached and nothing needs to be done
      when removing a sysctl table.
      For a cost of 2K code we will save about 4K of static tables (when we remove
      de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
      about half that on a 32bit arch.
      The speed feels about the same, even though we can now cache the sysctl
      dentries :(
      We get the core advantage that we don't need to have a 1 to 1 mapping between
      ctl table entries and proc files.  Making it possible to have /proc/sys vary
      depending on the namespace you are in.  The currently merged namespaces don't
      have an issue here but the network namespace under /proc/sys/net needs to have
      different directories depending on which network adapters are visible.  By
      simply being a cache different directories being visible depending on who you
      are is trivial to implement.
      [akpm@osdl.org: fix uninitialised var]
      [akpm@osdl.org: fix ARM build]
      [bunk@stusta.de: make things static]
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  23. 12 Feb, 2007 1 commit
  24. 07 Dec, 2006 2 commits
  25. 15 Jul, 2006 1 commit
  26. 26 Jun, 2006 3 commits
  27. 24 Mar, 2006 2 commits
    • Paul Jackson's avatar
      [PATCH] cpuset memory spread: slab cache format · fffb60f9
      Paul Jackson authored
      Rewrap the overly long source code lines resulting from the previous
      patch's addition of the slab cache flag SLAB_MEM_SPREAD.  This patch
      contains only formatting changes, and no function change.
      Signed-off-by: default avatarPaul Jackson <pj@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Paul Jackson's avatar
      [PATCH] cpuset memory spread: slab cache filesystems · 4b6a9316
      Paul Jackson authored
      Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
      memory spreading.
      If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
      in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
      from such a slab cache, the allocations are spread evenly over all the
      memory nodes (task->mems_allowed) allowed to that task, instead of favoring
      allocation on the node local to the current cpu.
      The following inode and similar caches are marked SLAB_MEM_SPREAD:
          file                               cache
          ====                               =====
          fs/adfs/super.c                    adfs_inode_cache
          fs/affs/super.c                    affs_inode_cache
          fs/befs/linuxvfs.c                 befs_inode_cache
          fs/bfs/inode.c                     bfs_inode_cache
          fs/block_dev.c                     bdev_cache
          fs/cifs/cifsfs.c                   cifs_inode_cache
          fs/coda/inode.c                    coda_inode_cache
          fs/dquot.c                         dquot
          fs/efs/super.c                     efs_inode_cache
          fs/ext2/super.c                    ext2_inode_cache
          fs/ext2/xattr.c (fs/mbcache.c)     ext2_xattr
          fs/ext3/super.c                    ext3_inode_cache
          fs/ext3/xattr.c (fs/mbcache.c)     ext3_xattr
          fs/fat/cache.c                     fat_cache
          fs/fat/inode.c                     fat_inode_cache
          fs/freevxfs/vxfs_super.c           vxfs_inode
          fs/hpfs/super.c                    hpfs_inode_cache
          fs/isofs/inode.c                   isofs_inode_cache
          fs/jffs/inode-v23.c                jffs_fm
          fs/jffs2/super.c                   jffs2_i
          fs/jfs/super.c                     jfs_ip
          fs/minix/inode.c                   minix_inode_cache
          fs/ncpfs/inode.c                   ncp_inode_cache
          fs/nfs/direct.c                    nfs_direct_cache
          fs/nfs/inode.c                     nfs_inode_cache
          fs/ntfs/super.c                    ntfs_big_inode_cache_name
          fs/ntfs/super.c                    ntfs_inode_cache
          fs/ocfs2/dlm/dlmfs.c               dlmfs_inode_cache
          fs/ocfs2/super.c                   ocfs2_inode_cache
          fs/proc/inode.c                    proc_inode_cache
          fs/qnx4/inode.c                    qnx4_inode_cache
          fs/reiserfs/super.c                reiser_inode_cache
          fs/romfs/inode.c                   romfs_inode_cache
          fs/smbfs/inode.c                   smb_inode_cache
          fs/sysv/inode.c                    sysv_inode_cache
          fs/udf/super.c                     udf_inode_cache
          fs/ufs/super.c                     ufs_inode_cache
          net/socket.c                       sock_inode_cache
          net/sunrpc/rpc_pipe.c              rpc_inode_cache
      The choice of which slab caches to so mark was quite simple.  I marked
      those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
      inode_cache, and buffer_head, which were marked in a previous patch.  Even
      though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
      potentially large file system i/o related slab caches as we need for memory
      Given that the rule now becomes "wherever you would have used a
      SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
      the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
      Future file system writers will just copy one of the existing file system
      slab cache setups and tend to get it right without thinking.
      Signed-off-by: default avatarPaul Jackson <pj@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  28. 18 Feb, 2006 1 commit
  29. 08 Jan, 2006 1 commit
  30. 30 Oct, 2005 1 commit
  31. 09 Sep, 2005 1 commit
  32. 16 Apr, 2005 1 commit
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      Let it rip!