1. 27 Oct, 2010 11 commits
    • ext4: Check return value of sb_getblk() and friends · 87783690
      Namhyung Kim authored
      
      
      Fail block allocation if sb_getblk() returns NULL. In that case,
      sb_find_get_block() is also likely to fail, so the caller should skip
      calling ext4_forget().
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: use bio layer instead of buffer layer in mpage_da_submit_io · bd2d0210
      Theodore Ts'o authored
      
      
      Call the block I/O layer directly instead of going through the buffer
      layer.  This should give us much better performance and scalability,
      as well as lower CPU utilization when doing buffered writeback.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: move mpage_put_bnr_to_bhs()'s functionality to mpage_da_submit_io() · 1de3e3df
      Theodore Ts'o authored
      
      
      This massively simplifies the ext4_da_writepages() code path by
      completely removing mpage_put_bnr_to_bhs(), which is almost 100 lines of
      code iterating over a set of pages using pagevec_lookup(), and folding
      that functionality into mpage_da_submit_io()'s existing
      pagevec_lookup() loop.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: inline walk_page_buffers() into mpage_da_submit_io · 3ecdb3a1
      Theodore Ts'o authored
      
      
      Expand the call:
      
        if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
                              ext4_bh_delay_or_unwritten))
                goto redirty_page;
      
      into mpage_da_submit_io().
      
      This will allow us to merge in mpage_put_bnr_to_bhs() in the next
      patch.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: inline ext4_writepage() into mpage_da_submit_io() · cb20d518
      Theodore Ts'o authored
      
      
      As a preparatory step to switching to submit_bio(), inline
      ext4_writepage() into mpage_da_submit_io() and then simplify things a
      bit.  This makes it clearer what mpage_da_submit_io() needs to do.
      
      Also, move the ClearPageChecked(page) call into
      __ext4_journalled_writepage(), as a minor bit of cleanup refactoring.
      
      This also allows us to pull i_size_read() and
      ext4_should_journal_data() out of the loop, which should give a very
      minor CPU saving.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: simplify ext4_writepage() · a42afc5f
      Theodore Ts'o authored
      
      
      The actual code in ext4_writepage() is unnecessarily convoluted.
      Simplify it so it is easier to understand, but otherwise logically
      equivalent.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: call mpage_da_submit_io() from mpage_da_map_blocks() · 5a87b7a5
      Theodore Ts'o authored
      
      
      Eventually we need to completely reorganize the ext4 writepage
      callpath, but for now, we simplify things a little by calling
      mpage_da_submit_io() from mpage_da_map_blocks(), since every place
      that calls mpage_da_map_blocks() follows it with a call to
      mpage_da_submit_io().
      
      We're also a wee bit better with respect to error handling, but there
      are still a number of cases where it's not clear what the right thing
      to do is when ext4 functions deep in the writeback code path fail.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: queue conversion after adding to inode's completed IO list · c999af2b
      Eric Sandeen authored
      
      
      By queuing the io end on the unwritten workqueue before adding it
      to our inode's list of completed IOs, I think we run the risk
      of the work getting completed, and the IO freed, before we try
      to add it to the inode's i_completed_io_list.
      
      It should be safe to add it to the inode's list of completed
      IOs, and -then- queue it for completion, I think.
      
      Thanks to Dave Chinner for pointing out the race.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix potential infinite loop in ext4_da_writepages() · 0c9169cc
      Toshiyuki Okajima authored
      
      
      On linux-2.6.36-rc2, if we execute the following script, we can hang
      the system when the /bin/sync command is executed:
      
      ========================================================================
      #!/bin/sh
      
      echo -n "HANG UP TEST: "
      /bin/dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1M 2> /dev/null
      /sbin/mkfs.ext4 -Fq /tmp/img
      /bin/mount -o loop -t ext4 /tmp/img /mnt
      /bin/dd if=/dev/zero of=/mnt/file bs=1 count=1 \
      seek=$((16*1024*1024*1024*1024-4096)) 2> /dev/null
      /bin/sync
      /bin/umount /mnt
      echo "DONE"
      exit 0
      ========================================================================
      
      If we take a kdump when this hang occurs, we can see the following
      backtrace:
      
      ======================================================================
      kthread()
      => bdi_writeback_thread()
         => wb_do_writeback()
            => wb_writeback()
               => writeback_inodes_wb()
                  => writeback_sb_inodes()
                     => writeback_single_inode()
                        => ext4_da_writepages()  ---+ 
                                      ^ infinite    |
                                      |   loop      |
                                      +-------------+
      ======================================================================
      
      The hang happens as follows:
      1) We write the last block of a file whose size is the maximum file
         size the filesystem supports.
      2) The "BH_Delay" flag is set on the buffer_head of that block.
      3) - the "m_lblk" member of struct mpage_da_data is 4294967295 (UINT_MAX)
         - the "m_len" member of struct mpage_da_data is 1
        mpage_put_bnr_to_bhs(), which is called via ext4_da_writepages(),
        cannot clear the "BH_Delay" flag of the buffer_head because m_lblk
        has type ext4_lblk_t, so m_lblk + m_len overflows to 0.
      
        Therefore ext4_da_writepages() loops forever: it cannot write the
        page (which corresponds to the block) because the "BH_Delay" flag
        is never cleared.
      ----------------------------------------------------------------------
      static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd,
      				struct ext4_map_blocks *map)
      {
      ...
      	int blocks = map->m_len;
      ...
      		do {
      			// cur_logical = 4294967295
      			// map->m_lblk = 4294967295
      			// blocks = 1
      			// *** map->m_lblk + blocks == 0 (OVERFLOW!) ***
      			// (cur_logical >= map->m_lblk + blocks) => true
      			if (cur_logical >= map->m_lblk + blocks)
      				break;
      ----------------------------------------------------------------------
      
      NOTE: Mounting with the nodelalloc option avoids this code path,
      and thus avoids this hang.
      Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: don't bump up LONG_MAX nr_to_write by a factor of 8 · b443e733
      Eric Sandeen authored
      
      
      I'm uneasy with lots of stuff going on in ext4_da_writepages(),
      but multiplying nr_to_write by 8 when it is LLONG_MAX overflows it
      to -8, which clearly isn't making anything better, so avoid the
      multiplier in that case.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: stop looping in ext4_num_dirty_pages when max_pages reached · 659c6009
      Eric Sandeen authored
      
      
      Today we simply break out of the inner loop when we have accumulated
      max_pages; the outer while (!done) loop then keeps scanning forward
      and calling pagevec_lookup_tag(), potentially doing a lot of work
      with no net effect.
      
      When we have accumulated max_pages, just clean up and return.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  2. 09 Aug, 2010 4 commits
    • convert ext4 to ->evict_inode() · 0930fcc1
      Al Viro authored
      
      
      pretty much brute-force...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove inode_setattr · 1025774c
      Christoph Hellwig authored
      
      
      Replace inode_setattr with opencoded variants of it in all callers.  This
      moves the remaining call to vmtruncate into the filesystem methods where it
      can be replaced with the proper truncate sequence.
      
      In a few cases it was obvious that we would never end up calling vmtruncate
      so it was left out in the opencoded variant:
      
       spufs: explicitly checks for ATTR_SIZE earlier
       btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
       ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above
      
      In addition, ncpfs called inode_setattr with handcrafted iattrs,
      which allowed the opencoded variant to be trimmed down.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • introduce __block_write_begin · 6e1db88d
      Christoph Hellwig authored
      
      
      Split up the block_write_begin implementation - __block_write_begin is a new
      trivial wrapper for block_prepare_write that always takes an already
      allocated page and can be called either from block_write_begin or from
      filesystem code that already has a page allocated.  Remove the handling
      of already allocated pages from block_write_begin after switching all
      callers that do it to __block_write_begin.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • sort out blockdev_direct_IO variants · eafdc7d1
      Christoph Hellwig authored
      
      
      Move the call to vmtruncate to get rid of excessive blocks to the callers
      in preparation for the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it, as just opencoding the two additional
      parameters is shorter than the name suffix.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 05 Aug, 2010 1 commit
    • ext4: Fix dirtying of journalled buffers in data=journal mode · 56d35a4c
      Jan Kara authored
      
      
      In data=journal mode, we still use block_write_begin() to prepare
      the page for writing. This function can occasionally mark a buffer
      dirty, which violates journalling assumptions - when a buffer is part
      of a transaction, it should not be dirty, and a buffer can already be
      part of a forget list of some transaction when block_write_begin()
      gets called. This violation of journalling assumptions then results
      in "JBD: Spotted dirty metadata buffer..." warnings.
      
      In fact, temporarily dirtying the buffer while the page is still locked
      does not really cause problems for the journalling, because we won't
      write the buffer until the page gets unlocked. So we just have to make
      sure to clear the dirty bits before unlocking the page.
      Signed-off-by: Jan Kara <jack@suse.cz>
  4. 03 Aug, 2010 1 commit
  5. 29 Jul, 2010 1 commit
  6. 27 Jul, 2010 8 commits
  7. 26 Jul, 2010 1 commit
  8. 29 Jun, 2010 1 commit
  9. 14 Jun, 2010 3 commits
  10. 05 Jun, 2010 1 commit
  11. 21 May, 2010 1 commit
  12. 17 May, 2010 2 commits
  13. 16 May, 2010 5 commits
    • ext4: Use bitops to read/modify i_flags in struct ext4_inode_info · 12e9b892
      Dmitry Monakhov authored
      At several places we modify EXT4_I(inode)->i_flags without holding
      i_mutex (ext4_do_update_inode, ...). These modifications are racy and
      we can lose updates to i_flags. So convert handling of i_flags to use
      bitops which are atomic.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=15792
      
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE() · 24676da4
      Theodore Ts'o authored
      
      
      EXT4_ERROR_INODE() tends to provide better error information and in a
      more consistent format.  Some errors were not even identifying the inode
      or directory which was corrupted, which made them not very useful.
      
      Addresses-Google-Bug: #2507977
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks() · 2ed88685
      Theodore Ts'o authored
      
      
      This saves a huge amount of stack space by avoiding allocating
      unnecessary struct buffer_heads on the stack.
      
      In addition, to make the code easier to understand, collapse and
      refactor ext4_get_block(), ext4_get_block_write(), and
      noalloc_get_block_write() into a single function.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks() · e35fd660
      Theodore Ts'o authored
      
      
      Jack up ext4_get_blocks() and add a new function, ext4_map_blocks(),
      which uses a much smaller structure, struct ext4_map_blocks, which is
      20 bytes, as opposed to a struct buffer_head, which is nearly 5 times
      bigger on an x86_64 machine.  Switching callers to ext4_map_blocks()
      saves stack space because it avoids allocating a struct buffer_head
      on the stack.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Use our own write_cache_pages() · 8e48dcfb
      Theodore Ts'o authored
      
      
      Make a copy of write_cache_pages() for the benefit of
      ext4_da_writepages().  This allows us to simplify the code some, and
      will allow us to further customize the code in future patches.
      
      There are some nasty hacks in write_cache_pages(), which Linus has
      (correctly) characterized as vile.  I've just copied it into
      write_cache_pages_da(), without trying to clean those bits up lest I
      break something in ext4's delalloc implementation, which is a bit
      fragile right now.  This will allow Dave Chinner to clean up
      write_cache_pages() in mm/page-writeback.c, without worrying about
      breaking ext4.  Eventually write_cache_pages_da() will go away when I
      rewrite ext4's delayed allocation and create a general
      ext4_writepages() which is used for all of ext4's writeback.  Until
      then, this is the lowest-risk way to clean up the core
      write_cache_pages() function.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Dave Chinner <david@fromorbit.com>