1. 17 Jul, 2007 1 commit
    • Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated · 769848c0
      Mel Gorman authored
      
      
      It is often known at allocation time whether a page may be migrated or not.
      This patch adds a flag called __GFP_MOVABLE and a new mask called
      GFP_HIGH_MOVABLE.  Allocations using __GFP_MOVABLE can either be migrated
      using the page migration mechanism or reclaimed by syncing with backing
      storage and discarding.
      
      An API function very similar to alloc_zeroed_user_highpage(), called
      alloc_zeroed_user_highpage_movable(), is added for __GFP_MOVABLE allocations.  The
      flags used by alloc_zeroed_user_highpage() are not changed because it would
      change the semantics of an existing API.  After this patch is applied there
      are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
      be marked deprecated if this patch is merged.
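
      As a rough, illustrative sketch (not code from the patch; the fault-handler
      context and variable names are assumed), a caller that knows its page can
      later be migrated would do something like:

      	/* Anonymous fault path that wants a zeroed, movable highmem page. */
      	struct page *page;

      	page = alloc_zeroed_user_highpage_movable(vma, address);
      	if (!page)
      		return VM_FAULT_OOM;

      	/* Non-user allocations can OR in the flag, or use the new mask. */
      	page = alloc_pages(GFP_HIGHUSER | __GFP_MOVABLE, 0);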
      
      Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
      shmem.c to keep all flag modifications to inode->mapping in the
      shmem_dir_alloc() helper function.  This clean-up suggestion is courtesy of
      Hugh Dickins.
      
      Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
      concept.  Credit to Hugh Dickins for catching issues with the shmem swap vector
      and ramfs allocations.
      
      [akpm@linux-foundation.org: build fix]
      [hugh@veritas.com: __GFP_ZERO cleanup]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 16 Jul, 2007 1 commit
  3. 21 May, 2007 1 commit
  4. 17 May, 2007 2 commits
    • Fix page allocation flags in grow_dev_page() · ea125892
      Christoph Lameter authored
      
      
      grow_dev_page() simply passes GFP_NOFS to find_or_create_page().  This means
      that both the radix tree nodes and the new page itself are allocated with
      GFP_NOFS.
      
      The mapping has a flags field that contains the necessary allocation flags
      for the page cache allocation.  These need to be consulted in order to get
      DMA and HIGHMEM allocations etc. right.  And yes, a blockdev could be
      allowing highmem allocations if it's a ramdisk.
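
      A hedged sketch of the direction of the fix (local names are illustrative,
      not the exact patch): derive the gfp mask from the block device's mapping so
      that DMA/HIGHMEM constraints are honoured, while still masking out
      filesystem recursion:

      	struct address_space *mapping = bdev->bd_inode->i_mapping;
      	gfp_t gfp_mask = mapping_gfp_mask(mapping) & ~__GFP_FS;

      	page = find_or_create_page(mapping, index, gfp_mask);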
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Christoph Lameter authored
      
      
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
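
      For illustration only (the foo names are made up), a typical inode-cache
      constructor before and after this change:

      	/* Before: the work was guarded by a flag that is always set. */
      	static void init_once(void *obj, struct kmem_cache *cachep, unsigned long flags)
      	{
      		struct foo_inode *fi = obj;

      		if (flags & SLAB_CTOR_CONSTRUCTOR)
      			inode_init_once(&fi->vfs_inode);
      	}

      	/* After: the guard is gone; the constructor initializes unconditionally. */
      	static void init_once(void *obj, struct kmem_cache *cachep, unsigned long flags)
      	{
      		struct foo_inode *fi = obj;

      		inode_init_once(&fi->vfs_inode);
      	}
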
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 09 May, 2007 2 commits
    • Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki authored
      
      
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
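
      A hedged sketch of what a CPU-hotplug-aware subsystem's notifier looks like
      with the new events (the handler and helper names are illustrative; for now
      the _FROZEN cases simply fall through to the normal handling):

      	static int my_cpu_callback(struct notifier_block *nb,
      				   unsigned long action, void *hcpu)
      	{
      		switch (action) {
      		case CPU_UP_PREPARE:
      		case CPU_UP_PREPARE_FROZEN:
      			my_subsys_prepare_cpu((unsigned long)hcpu);
      			break;
      		case CPU_DEAD:
      		case CPU_DEAD_FROZEN:
      			my_subsys_cleanup_cpu((unsigned long)hcpu);
      			break;
      		}
      		return NOTIFY_OK;
      	}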
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: convert core functions to zero_user_page · 01f2705d
      Nate Diller authored
      
      
      It's very common for file systems to need to zero part or all of a page;
      the simplest way is just to use kmap_atomic() and memset().  There's
      actually a library function in include/linux/highmem.h that does exactly
      that, but it's confusingly named memclear_highpage_flush(), which is
      descriptive of *how* it does the work rather than what the *purpose* is.
      So this patchset renames the function to zero_user_page(), and calls it
      from the various places that currently open code it.
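
      For reference, a hedged sketch of the open-coded pattern and its replacement
      (offset and len are placeholders):

      	/* Open-coded zeroing of part of a possibly-highmem page ... */
      	void *kaddr = kmap_atomic(page, KM_USER0);
      	memset(kaddr + offset, 0, len);
      	flush_dcache_page(page);
      	kunmap_atomic(kaddr, KM_USER0);

      	/* ... becomes a single, intention-revealing call. */
      	zero_user_page(page, offset, len, KM_USER0);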
      
      This first patch introduces the new function call, and converts all the
      core kernel callsites, both the open-coded ones and the old
      memclear_highpage_flush() ones.  Following this patch is a series of
      conversions for each file system individually, per AKPM, and finally a
      patch deprecating the old call.  The diffstat below shows the entire
      patchset.
      
      [akpm@linux-foundation.org: fix a few things]
      Signed-off-by: Nate Diller <nate.diller@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 08 May, 2007 2 commits
  7. 07 May, 2007 4 commits
    • slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Christoph Lameter authored
      
      
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to provide a callback, invoked before each object is
      freed, to verify that the object is back in its constructor state.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code that
      manipulates the object.
      
      Also, the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there were code in a constructor
      handling SLAB_DEBUG_INITIAL, it would have to be conditional on
      SLAB_DEBUG; otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLAB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand, and there are easier ways to accomplish the
      same effect (i.e. add debug code before kfree).
      
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: optimize kill_bdev() · f9a14399
      Peter Zijlstra authored
      
      
      Remove duplicate work in kill_bdev().
      
      It currently invalidates and then truncates the bdev's mapping.
      invalidate_mapping_pages() will opportunistically remove pages from the
      mapping.  And truncate_inode_pages() will forcefully remove all pages.
      
      The only thing truncate doesn't do is flush the bh lrus.  So do that
      explicitly.  This avoids (very unlikely, but possible) invalid lookup
      results if the same bdev is quickly re-issued.
      
      It also will prevent extreme kernel latencies which are observed when
      blockdevs which have a large amount of pagecache are unmounted, by avoiding
      invalidate_mapping_pages() on that path.  invalidate_mapping_pages() has no
      cond_resched (it can be called under spinlock), whereas truncate_inode_pages()
      has one.
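
      A hedged sketch of the resulting shape of kill_bdev() (the nrpages check is
      the optimisation restored by akpm; details may differ from the final code):

      	static void kill_bdev(struct block_device *bdev)
      	{
      		if (bdev->bd_inode->i_mapping->nrpages == 0)
      			return;
      		invalidate_bh_lrus();
      		truncate_inode_pages(bdev->bd_inode->i_mapping, 0);
      	}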
      
      [akpm@linux-foundation.org: restore nrpages==0 optimisation]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove destroy_dirty_buffers from invalidate_bdev() · f98393a6
      Peter Zijlstra authored
      
      
      Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
      been used in 6 years (so akpm says).
      
      find * -name \*.[ch] | xargs grep -l invalidate_bdev |
      while read file; do
      	quilt add $file;
      	sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
      done
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: buffer don't PageUptodate without page locked · 3d67f2d7
      Nick Piggin authored
      
      
      __block_write_full_page is calling SetPageUptodate without the page locked.
      This is unusual, but not incorrect, as PG_writeback is still set.
      
      However the next patch will require that SetPageUptodate always be called with
      the page locked.  Simply don't bother setting the page uptodate in this case
      (it is unusual that the write path does such a thing anyway).  Instead just
      leave it to the read side to bring the page uptodate when it notices that all
      buffers are uptodate.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 06 Mar, 2007 1 commit
  9. 20 Feb, 2007 2 commits
  10. 12 Feb, 2007 2 commits
  11. 11 Feb, 2007 2 commits
    • [PATCH] buffer: memorder fix · 72ed3d03
      Nick Piggin authored
      
      
      unlock_buffer(), like unlock_page(), must not clear the lock without
      ensuring that the critical section is closed.
      
      Mingming later sent the same patch, saying:
      
        We are running the SDET benchmark and saw a double-free issue for the ext3
        extended attributes block, which complains that the same xattr block is already
        being freed (in ext3_xattr_release_block()).  The problem can also be triggered
        by multiple threads looping untar/rm of a kernel tree.
      
        The race is caused by a missing memory barrier in unlock_buffer() before the
        lock bit is cleared, resulting in a possible concurrent h_refcount update.
        That causes a reference counter leak, which later leads to the double free that
        we have seen.
      
        Inside unlock_buffer(), a memory barrier is placed *after* the lock bit is
        cleared; however, there is no memory barrier *before* the bit is cleared.  On
        some architectures the h_refcount update instruction and the clear-bit
        instruction could be reordered, thus leaving the critical section re-entered.
      
        The race is like this: For example, if the h_refcount is initialized as 1,
      
        cpu 0:                                   cpu1
        --------------------------------------   -----------------------------------
        lock_buffer() /* test_and_set_bit */
        clear_buffer_locked(bh);
                                                lock_buffer() /* test_and_set_bit */
        h_refcount = h_refcount+1; /* = 2*/     h_refcount = h_refcount + 1; /*= 2 */
                                                clear_buffer_locked(bh);
        ....                                    ......
      
        We lost an h_refcount increment here.  We need a memory barrier before the
        buffer head lock bit is cleared to force the order of the two writes.  Please apply.
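
      A hedged sketch of the fix: pair the clear with a barrier on both sides,
      mirroring unlock_page() (barrier helper names as of that era):

      	void unlock_buffer(struct buffer_head *bh)
      	{
      		smp_mb__before_clear_bit();	/* close the critical section */
      		clear_buffer_locked(bh);
      		smp_mb__after_clear_bit();
      		wake_up_bit(&bh->b_state, BH_Lock);
      	}
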
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] remove invalidate_inode_pages() · fc0ecff6
      Andrew Morton authored
      
      
      Convert all calls to invalidate_inode_pages() into open-coded calls to
      invalidate_mapping_pages().
      
      Leave the invalidate_inode_pages() wrapper in place for now, marked as
      deprecated.
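
      The conversion at each call site is mechanical; a hedged example:

      	/* before */
      	invalidate_inode_pages(mapping);

      	/* after: same behaviour, spelled out against the mapping */
      	invalidate_mapping_pages(mapping, 0, -1);
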
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 29 Jan, 2007 1 commit
  13. 26 Jan, 2007 1 commit
    • Resurrect 'try_to_free_buffers()' VM hackery · ecdfc978
      Linus Torvalds authored
      It's not pretty, but it appears that ext3 with data=journal will clean
      pages without ever actually telling the VM that they are clean.  This,
      in turn, will result in the VM (and balance_dirty_pages() in particular)
      never realizing that the pages got cleaned, and waiting forever for an
      event that has already happened.
      
      Technically, this seems to be a problem with ext3 itself, but it used to
      be hidden by 'try_to_free_buffers()' noticing this situation on its own,
      and just working around the filesystem problem.
      
      This commit re-instates that hack, in order to avoid a regression for
      the 2.6.20 release. This fixes bugzilla 7844:
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=7844
      
      
      
      Peter Zijlstra points out that we should probably retain the debugging
      code that this removes from cancel_dirty_page(), and I agree, but for
      the imminent release we might as well just silence the warning too
      (since it's not a new bug: anything that triggers that warning has been
      around forever).
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 11 Jan, 2007 1 commit
  15. 21 Dec, 2006 1 commit
  16. 10 Dec, 2006 3 commits
    • [PATCH] io-accounting: write-cancel accounting · e08748ce
      Andrew Morton authored
      
      
      Account for the number of bytes of writeout which this process caused not to
      happen after all.
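
      A minimal, illustrative sketch of the accounting point (assuming a helper
      along the lines of task_io_account_cancelled_write(); the exact hook sits in
      the page-cancellation path):

      	/* When a dirty page is thrown away instead of written back, credit
      	 * the bytes back against the task's write accounting. */
      	if (TestClearPageDirty(page))
      		task_io_account_cancelled_write(PAGE_CACHE_SIZE);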
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] io-accounting: write accounting · 55e829af
      Andrew Morton authored
      
      
      Accounting writes is fairly simple: whenever a process flips a page from clean
      to dirty, we accuse it of having caused a write to underlying storage of
      PAGE_CACHE_SIZE bytes.
      
      This may overestimate the amount of writing: the page-dirtying may cause only
      one buffer_head's worth of writeout.  Fixing that is possible, but probably a
      bit messy and isn't obviously important.
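
      A hedged sketch of the accounting point: when a page flips from clean to
      dirty in a mapping that accounts dirty pages, charge one page's worth of
      write to the current task (placement is illustrative):

      	if (mapping_cap_account_dirty(mapping))
      		task_io_account_write(PAGE_CACHE_SIZE);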
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] clean up __set_page_dirty_nobuffers() · 8c08540f
      Andrew Morton authored
      
      
      Save a tabstop in __set_page_dirty_nobuffers() and __set_page_dirty_buffers()
      and a few other places.  No functional changes.
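
      The change is purely structural; an illustrative (not verbatim) before and
      after of the pattern, with made-up helper names:

      	/* Before: the interesting work sits one tabstop deep. */
      	if (page_needs_work(page)) {
      		do_first_thing(page);
      		do_second_thing(page);
      	}
      	return ret;

      	/* After: bail out early and keep the common path flat. */
      	if (!page_needs_work(page))
      		return ret;
      	do_first_thing(page);
      	do_second_thing(page);
      	return ret;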
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  17. 07 Dec, 2006 2 commits
  18. 17 Oct, 2006 1 commit
    • [PATCH] Fix IO error reporting on fsync() · 58ff407b
      Jan Kara authored
      
      
      When an IO error happens on a metadata buffer, the buffer is freed from memory
      and when fsync() is later called, filesystems like ext2 fail to report EIO.  We
      solve the problem by introducing a pointer to the associated address space into
      the buffer_head.  When a buffer is removed from a list of metadata buffers
      associated with an address space, the IO error is transferred from the buffer to
      the address space, so that fsync() can later report it.
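
      A hedged sketch of the mechanism (field and flag names follow the description
      above; details may differ from the final patch): when a buffer drops off an
      inode's list of associated metadata buffers, any write IO error is handed to
      the address space so a later fsync() can report it.

      	static void __remove_assoc_queue(struct buffer_head *bh)
      	{
      		list_del_init(&bh->b_assoc_buffers);
      		if (buffer_write_io_error(bh) && bh->b_assoc_map)
      			set_bit(AS_EIO, &bh->b_assoc_map->flags);
      		bh->b_assoc_map = NULL;
      	}
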
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  19. 11 Oct, 2006 2 commits
  20. 09 Oct, 2006 1 commit
    • [PATCH] mm: bug in set_page_dirty_buffers · ebf7a227
      Nick Piggin authored
      
      
      This was triggered by, but is not the fault of, the dirty page accounting
      patches.  Suitable for -stable as well, after it goes upstream.
      
        Unable to handle kernel NULL pointer dereference at virtual address 0000004c
        EIP is at _spin_lock+0x12/0x66
        Call Trace:
         [<401766e7>] __set_page_dirty_buffers+0x15/0xc0
         [<401401e7>] set_page_dirty+0x2c/0x51
         [<40140db2>] set_page_dirty_balance+0xb/0x3b
         [<40145d29>] __do_fault+0x1d8/0x279
         [<40147059>] __handle_mm_fault+0x125/0x951
         [<401133f1>] do_page_fault+0x440/0x59f
         [<4034d0c1>] error_code+0x39/0x40
         [<08048a33>] 0x8048a33
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  21. 30 Sep, 2006 1 commit
    • [PATCH] BLOCK: Move functions out of buffer code [try #6] · cf9a2ae8
      David Howells authored
      
      
      Move some functions out of the buffering code that aren't strictly buffering
      specific.  This is a precursor to being able to disable the block layer.
      
       (*) Moved some stuff out of fs/buffer.c:
      
           (*) The file sync and general sync stuff moved to fs/sync.c.
      
           (*) The superblock sync stuff moved to fs/super.c.
      
           (*) do_invalidatepage() moved to mm/truncate.c.
      
           (*) try_to_release_page() moved to mm/filemap.c.
      
       (*) Moved some related declarations between header files:
      
           (*) declarations for do_invalidatepage() and try_to_release_page() moved
           	 to linux/mm.h.
      
           (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  22. 26 Sep, 2006 1 commit
    • [PATCH] mm: tracking shared dirty pages · d08b3851
      Peter Zijlstra authored
      
      
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows, a VMA is eligible when:
       - it is shared writeable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
      
      Pages from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
      because they don't have a backing store anyway.
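
      A hedged sketch of the eligibility test described above (the helper name is
      illustrative, and the real predicate additionally has to respect the default
      page protection and the mprotect() override discussed below):

      	static int vma_accounts_dirty(struct vm_area_struct *vma)
      	{
      		unsigned long flags = vma->vm_flags;

      		if ((flags & (VM_WRITE|VM_SHARED)) != (VM_WRITE|VM_SHARED))
      			return 0;
      		if (flags & (VM_PFNMAP|VM_INSERTPAGE))
      			return 0;
      		return vma->vm_file && vma->vm_file->f_mapping &&
      		       mapping_cap_account_dirty(vma->vm_file->f_mapping);
      	}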
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
      It can be called on any page, but is currently only implemented for mapped
      pages; if the page is found to be part of a VMA that accounts dirty pages, it
      will also write-protect the PTE.
      
      Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 31 Jul, 2006 1 commit
  24. 30 Jun, 2006 2 commits
  25. 28 Jun, 2006 1 commit
  26. 27 Jun, 2006 1 commit