1. 20 Jul, 2011 1 commit
  2. 27 May, 2011 1 commit
    • fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Christoph Hellwig authored
      Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally if it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      changed a long time ago, and many implementations rely on it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
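
The idea in the commit above can be sketched in userspace C. This is only an illustration, not the kernel or XFS code: `demo_inode`, `demo_dirty_inode` and the two `needs_*` fields are hypothetical names, and the numeric flag values are assumed to mirror the kernel's `I_DIRTY_*` bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror the kernel's I_DIRTY_* bits; the values are an assumption. */
#define I_DIRTY_SYNC     (1 << 0)  /* e.g. only a timestamp was updated */
#define I_DIRTY_DATASYNC (1 << 1)  /* a change that fdatasync must persist */

/* Hypothetical per-inode bookkeeping a filesystem might keep. */
struct demo_inode {
    bool needs_fsync_commit;      /* fsync has something to push out */
    bool needs_fdatasync_commit;  /* fdatasync has something to push out */
};

/* Hypothetical ->dirty_inode style hook that receives the dirty flags. */
static void demo_dirty_inode(struct demo_inode *inode, int flags)
{
    assert(inode != NULL);
    /* Any kind of dirtying matters for a full fsync. */
    inode->needs_fsync_commit = true;
    /* Only data-relevant changes force a transaction for fdatasync. */
    if (flags & I_DIRTY_DATASYNC)
        inode->needs_fdatasync_commit = true;
}
```

Calling `demo_dirty_inode(&inode, I_DIRTY_SYNC)` marks the inode for fsync only, while passing `I_DIRTY_SYNC | I_DIRTY_DATASYNC` marks it for both.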
  3. 31 Mar, 2011 1 commit
  4. 24 Mar, 2011 1 commit
  5. 10 Mar, 2011 1 commit
  6. 10 Jan, 2011 1 commit
  7. 27 Oct, 2010 2 commits
  8. 25 Oct, 2010 1 commit
  9. 09 Aug, 2010 4 commits
    • convert ext3 to ->evict_inode() · ac14a95b
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove inode_setattr · 1025774c
      Christoph Hellwig authored
      Replace inode_setattr with opencoded variants of it in all callers.  This
      moves the remaining call to vmtruncate into the filesystem methods where it
      can be replaced with the proper truncate sequence.
      
      In a few cases it was obvious that we would never end up calling vmtruncate
      so it was left out in the opencoded variant:
      
       spufs: explicitly checks for ATTR_SIZE earlier
       btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
       ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above
      
      In addition to that ncpfs called inode_setattr with handcrafted iattrs,
      which allowed trimming down the opencoded variant.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • introduce __block_write_begin · 6e1db88d
      Christoph Hellwig authored
      Split up the block_write_begin implementation - __block_write_begin is a new
      trivial wrapper for block_prepare_write that always takes an already
      allocated page and can be either called from block_write_begin or filesystem
      code that already has a page allocated.  Remove the handling of already
      allocated pages from block_write_begin after switching all callers that
      do it to __block_write_begin.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • sort out blockdev_direct_IO variants · eafdc7d1
      Christoph Hellwig authored
      Move the call to vmtruncate to get rid of excessive blocks to the callers
      in preparation for the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it as just opencoding the two additional
      parameters is shorter than the name suffix.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  10. 05 Aug, 2010 1 commit
    • ext3: Fix dirtying of journalled buffers in data=journal mode · 5f11e6a4
      Jan Kara authored
      In data=journal mode, we still use block_write_begin() to prepare the page for
      writing. This function can occasionally mark a buffer dirty, which violates
      journalling assumptions: when a buffer is part of a transaction, it should not
      be dirty, and a buffer can already be part of a forget list of some transaction
      when block_write_begin() gets called. This violation of journalling assumptions
      then results in "JBD: Spotted dirty metadata buffer..." warnings.
      
      In fact, temporarily dirtying the buffer while the page is still locked does not
      really cause problems for the journalling because we won't write the buffer
      until the page gets unlocked. So we just have to make sure to clear dirty bits
      before unlocking the page.
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
  11. 21 Jul, 2010 2 commits
    • ext3: Avoid filesystem corruption after a crash under heavy delete load · f25f6242
      Jan Kara authored
      It can happen that ext3_free_branches calls ext3_forget() for an indirect block
      in an earlier transaction than the transaction in which we clear the pointer to
      this indirect block. Thus if we crash before the transaction clearing the block
      pointer is committed, we will see an indirect block pointing to already freed
      blocks and complain during orphan list cleanup.
      
      The fix is simple: Make sure ext3_forget() is called in the transaction
      doing block pointer clearing.
      
      This is a backport of an ext4 fix by Amir G. <amir73il@users.sourceforge.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: remove vestiges of nobh support · 4c4d3901
      Christoph Hellwig authored
      The nobh option was only supported for writeback mode, but given that all
      write paths (except mmapped writes) actually create buffer heads, it
      effectively was a no-op already.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  12. 21 May, 2010 1 commit
  13. 29 Mar, 2010 1 commit
    • ext3: fix broken handling of EXT3_STATE_NEW · de329820
      Linus Torvalds authored
      In commit 9df93939 ("ext3: Use bitops to read/modify
      EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to
      use bitops for its state handling.  However, unlike the same ext4
      change, it didn't actually change the name of the field when it changed
      the semantics of it.
      
      As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that
      initialized the field to EXT3_STATE_NEW.  And that does not work
      _at_all_ when we're now working with individually named bits rather than
      values that get masked.  So the code tried to mark the state to be new,
      but in actual fact set the field to EXT3_STATE_JDATA.  Which makes no
      sense at all, and screws up all the code that checks whether the inode
      was newly allocated.
      
      In particular, it made the xattr code unhappy, and caused various random
      behavior, like apparently
      
      	https://bugzilla.redhat.com/show_bug.cgi?id=577911
      
      So fix the initialization, and rename the field to match ext4 so that we
      don't have this happen again.
      
      Cc: James Morris <jmorris@namei.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Daniel J Walsh <dwalsh@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
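
The failure mode described in the commit above, storing a bit number where a bit mask is expected, can be reproduced in a few lines of plain C. This is a sketch, not the ext3 code: the `demo_*` names are hypothetical, and the bit numbers (JDATA = 0, NEW = 1) are assumed so that the result matches the behavior the commit message describes.

```c
#include <assert.h>

/* Bit numbers, not masks. The concrete values are an assumption. */
enum demo_state_bit {
    DEMO_STATE_JDATA = 0,
    DEMO_STATE_NEW   = 1,
};

/* Return 1 if the given bit number is set in the state word, else 0. */
static int demo_test_state(unsigned long state, int bit)
{
    assert(bit >= 0 && bit < (int)(8 * sizeof(state)));
    return (int)((state >> bit) & 1UL);
}

/* Return the state word with the given bit number set. */
static unsigned long demo_set_state(unsigned long state, int bit)
{
    assert(bit >= 0 && bit < (int)(8 * sizeof(state)));
    return state | (1UL << bit);
}

/* Buggy initialization: assigns the bit number as if it were a mask. */
static unsigned long demo_buggy_init(void)
{
    return DEMO_STATE_NEW;  /* stores 1, which is bit 0, i.e. JDATA */
}

/* Fixed initialization: sets the bit that the number names. */
static unsigned long demo_fixed_init(void)
{
    return demo_set_state(0UL, DEMO_STATE_NEW);
}
```

With these assumed values, `demo_buggy_init()` yields a word in which the JDATA bit is set and the NEW bit is clear, while `demo_fixed_init()` yields the reverse.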
  14. 05 Mar, 2010 1 commit
  15. 04 Mar, 2010 7 commits
    • dquot: cleanup dquot initialize routine · 871a2931
      Christoph Hellwig authored
      Get rid of the initialize dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_initialize helper to __dquot_initialize
      and vfs_dq_init to dquot_initialize to have a consistent namespace.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: move dquot initialization responsibility into the filesystem · 907f4554
      Christoph Hellwig authored
      Currently various places in the VFS call vfs_dq_init directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the initialization.  For most metadata operations
      this is a straightforward move into the methods, but for truncate and
      open it's a bit more complicated.
      
      For truncate we currently only call vfs_dq_init for the sys_truncate case
      because open already takes care of it for ftruncate and open(O_TRUNC) - the
      new code causes an additional vfs_dq_init for those which is harmless.
      
      For open the initialization is moved from do_filp_open into the open method,
      which means it happens slightly earlier now, and only for regular files.
      The latter is fine because we don't need to initialize it for operations
      on special files, and we already do it as part of the namespace operations
      for directories.
      
      Add a dquot_file_open helper that filesystems that support generic quotas
      can use to fill in ->open.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup dquot transfer routine · b43fa828
      Christoph Hellwig authored
      Get rid of the transfer dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_transfer helper to __dquot_transfer
      and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
      and make the new dquot_transfer return a normal negative errno value
      which all callers expect.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup space allocation / freeing routines · 5dd4056d
      Christoph Hellwig authored
      Get rid of the alloc_space, free_space, reserve_space, claim_space and
      release_rsv dquot operations - they are always called from the filesystem
      and if a filesystem really needs its own (which none currently does)
      it can just call into its own routine directly.
      
      Move shared logic into the common __dquot_alloc_space,
      dquot_claim_space_nodirty and __dquot_free_space low-level methods,
      and rationalize the wrappers around it to move as much code as possible
      into the common block for CONFIG_QUOTA vs not.  Also rename
      all these helpers to be named dquot_* instead of vfs_dq_*.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: add writepage sanity checks · 49792c80
      Dmitry Monakhov authored
      - There is a theoretical possibility of performing writepage on
         a read-only superblock. Add an explicit check for that case.
      - The page must be locked before writepage.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Truncate allocated blocks if direct IO write fails to update i_size · 7eb4969e
      Jan Kara authored
      We have to truncate blocks allocated to file during direct IO when we
      fail to update i_size properly.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Use bitops to read/modify EXT3_I(inode)->i_state · 9df93939
      Jan Kara authored
      At several places we modify EXT3_I(inode)->i_state without holding i_mutex
      (ext3_release_file, ext3_bmap, ext3_journalled_writepage, ext3_do_update_inode,
      ...). These modifications are racy and we can lose updates to i_state. So
      convert handling of i_state to use bitops which are atomic.
      Signed-off-by: Jan Kara <jack@suse.cz>
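
To illustrate why atomic bit operations avoid the lost updates mentioned above, here is a minimal userspace sketch using C11 `<stdatomic.h>` in place of the kernel's bitops. The `demo_*` helpers are hypothetical stand-ins, not the kernel API. A plain `state |= mask` is a separate load, modify and store, so two CPUs can overwrite each other's bits; an atomic fetch-or performs the whole update as one indivisible step.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically set one bit; concurrent updates to other bits are preserved. */
static void demo_set_bit(atomic_ulong *word, int bit)
{
    assert(bit >= 0 && bit < (int)(8 * sizeof(unsigned long)));
    atomic_fetch_or(word, 1UL << bit);
}

/* Atomically clear one bit. */
static void demo_clear_bit(atomic_ulong *word, int bit)
{
    assert(bit >= 0 && bit < (int)(8 * sizeof(unsigned long)));
    atomic_fetch_and(word, ~(1UL << bit));
}

/* Read one bit from the state word; returns 0 or 1. */
static int demo_test_bit(atomic_ulong *word, int bit)
{
    assert(bit >= 0 && bit < (int)(8 * sizeof(unsigned long)));
    return (int)((atomic_load(word) >> bit) & 1UL);
}
```

Setting bits 0 and 2 and then clearing bit 0 leaves only bit 2 set, regardless of how other threads interleave their own updates to different bits of the same word.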
  16. 23 Dec, 2009 1 commit
  17. 10 Dec, 2009 1 commit
    • ext3: Fix data / filesystem corruption when write fails to copy data · 68eb3db0
      Jan Kara authored
      When ext3_write_begin fails after allocating some blocks or
      generic_perform_write fails to copy data to write, we truncate blocks already
      instantiated beyond i_size. Although these blocks were never inside i_size, we
      have to truncate pagecache of these blocks so that corresponding buffers get
      unmapped. Otherwise subsequent __block_prepare_write (called because we are
      retrying the write) will find the buffers mapped, not call ->get_block, and
      thus the page will be backed by already freed blocks leading to filesystem and
      data corruption.
      Reported-by: James Y Knight <foom@fuhm.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
  18. 04 Dec, 2009 1 commit
  19. 11 Nov, 2009 2 commits
  20. 16 Sep, 2009 3 commits
    • ext3: Add locking to ext3_do_update_inode · 4f003fd3
      Chris Mason authored
      I've been struggling with this off and on while I've been testing the
      data=guarded work.  The symptom is corrupted orphan lists and inodes
      with the wrong i_size stored on disk.  I was convinced the
      data=guarded code was just missing a call to ext3_mark_inode_dirty, but
      tracing showed the i_disksize I was sending to ext3_mark_inode_dirty
      wasn't actually making it to the drive.
      
      ext3_mark_inode_dirty can be called without locks held (atime updates
      and a few others), so the data=guarded code uses locks while updating
      the in-memory inode, and then calls ext3_mark_inode_dirty
      without any locks held.
      
      But, ext3_mark_inode_dirty has no internal locking to make sure that
      only one CPU is updating the buffer head at a time.  Generally this
      works out ok because everyone that changes the inode then calls
      ext3_mark_inode_dirty themselves.  Even though it races, eventually
      someone updates the buffer heads and things move on.
      
      But there is still a risk of the wrong values getting in, and the
      data=guarded code seems to hit the race very often.
      
      Since everyone that changes the inode also logs it, it should be
      possible to fix this with some memory barriers.  I'll leave that as an
      exercise to the reader and lock the buffer head instead.
      
      It is probably a good idea to have a different patch series for lockless
      bit flipping on the ext3 i_state field.  ext3_do_update_inode &= clears
      EXT3_STATE_NEW without any locks held.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks() · 00171d3c
      Jan Kara authored
      During truncate we are sometimes forced to start a new transaction as the
      number of blocks to be journaled is both quite large and hard to predict. So
      far we restarted a transaction while holding truncate_mutex and that violates
      lock ordering because truncate_mutex ranks below transaction start (and it
      can lead to a real deadlock with ext3_get_blocks() allocating new blocks
      from ext3_writepage()).
      
      Luckily, the problem is easy to fix: We just drop the truncate_mutex before
      restarting the transaction and acquire it afterwards. We are safe to do this as
      by the time ext3_truncate() is called, all the page cache for the truncated
      part of the file is dropped and so writepage() cannot come and allocate new
      blocks in the part of the file we are truncating. The remaining writers are
      stopped by us holding i_mutex.
      Signed-off-by: Jan Kara <jack@suse.cz>
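
The ordering rule behind this fix can be modelled with a small single-threaded sketch. It does not use real locks or the JBD API; it only records whether the mutex is held and asserts that a transaction is never started while it is. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state relevant to the lock-ordering rule. */
struct demo_truncate_ctx {
    bool truncate_mutex_held;
    int transactions_started;
};

static void demo_lock_truncate_mutex(struct demo_truncate_ctx *c)
{
    assert(!c->truncate_mutex_held);
    c->truncate_mutex_held = true;
}

static void demo_unlock_truncate_mutex(struct demo_truncate_ctx *c)
{
    assert(c->truncate_mutex_held);
    c->truncate_mutex_held = false;
}

/* Transaction start ranks above truncate_mutex, so the mutex must not be held. */
static void demo_start_transaction(struct demo_truncate_ctx *c)
{
    assert(!c->truncate_mutex_held);
    c->transactions_started++;
}

/* The fixed sequence: drop the mutex, restart, then take the mutex again. */
static void demo_restart_in_truncate(struct demo_truncate_ctx *c)
{
    demo_unlock_truncate_mutex(c);
    demo_start_transaction(c);
    demo_lock_truncate_mutex(c);
}
```

In this model, calling `demo_start_transaction()` directly while the mutex is held trips the assertion, which corresponds to the ordering violation the commit removes.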
    • HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Andi Kleen authored
      Enable removing of corrupted pages through truncation
      for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
      These should cover most server needs.
      
      I chose the set of migration aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  21. 15 Jul, 2009 2 commits
    • ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() · 43237b54
      Jan Kara authored
      Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to
      be a relic from some old days and setting disksize in this function does not
      make much sense. Currently it was set only by ext3_getblk().  Since the
      parameter has some effect only if create == 1, it is easy to check that the
      three callers which end up calling ext3_getblk() with create == 1 (ext3_append,
      ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Fix truncation of symlinks after failed write · 9eaaa2d5
      Jan Kara authored
      The contents of long symlinks are written via standard write methods. So when the
      write fails, we add inode to orphan list. But symlinks don't have .truncate
      method defined so nobody properly removes them from the orphan list (both on
      disk and in memory).
      
      Fix this by calling ext3_truncate() directly instead of calling vmtruncate()
      (which is saner anyway since we don't need anything vmtruncate() does apart
      from calling .truncate in these paths).  We also add inode to orphan list only
      if ext3_can_truncate() is true (currently, it can be false for symlinks when
      there are no blocks allocated) - otherwise orphan list processing will complain
      and ext3_truncate() will not remove inode from on-disk orphan list.
      Signed-off-by: Jan Kara <jack@suse.cz>
  22. 24 Jun, 2009 1 commit
  23. 18 Jun, 2009 2 commits
  24. 11 Jun, 2009 1 commit