1. 03 Apr, 2014 1 commit
  2. 09 Feb, 2014 1 commit
  3. 23 Nov, 2013 1 commit
    • block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet authored
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
  4. 09 Sep, 2013 1 commit
  5. 04 Sep, 2013 2 commits
    • direct-io: Handle O_(D)SYNC AIO · 02afc27f
      Christoph Hellwig authored
      Call generic_write_sync() from the deferred I/O completion handler if
      O_DSYNC is set for a write request.  Also make sure various callers
      don't call generic_write_sync if the direct I/O code returns
      Based on an earlier patch from Jan Kara <jack@suse.cz> with updates from
      Jeff Moyer <jmoyer@redhat.com> and Darrick J. Wong <darrick.wong@oracle.com>.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • direct-io: Implement generic deferred AIO completions · 7b7a8665
      Christoph Hellwig authored
      Add support to the core direct-io code to defer AIO completions to user
      context using a workqueue.  This replaces opencoded and less efficient
      code in XFS and ext4 (we save a memory allocation for each direct IO)
      and will be needed to properly support O_(D)SYNC for AIO.
      The communication between the filesystem and the direct I/O code requires
      a new buffer head flag, which is a bit ugly but not avoidable until the
      direct I/O code stops abusing the buffer_head structure for communicating
      with the filesystems.
      Currently this creates a per-superblock unbound workqueue for these
      completions, which is taken from an earlier patch by Jan Kara.  I'm
      not really convinced about this use and would prefer a "normal" global
      workqueue with a high concurrency limit, but this needs further discussion.
      JK: Fixed ext4 part, dynamic allocation of the workqueue.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 07 May, 2013 1 commit
  7. 29 Apr, 2013 2 commits
  8. 23 Mar, 2013 1 commit
    • block: Convert some code to bio_for_each_segment_all() · cb34e057
      Kent Overstreet authored
      More prep work for immutable bvecs:
      A few places in the code were either open coding or using the wrong
      version - fix.
      After we introduce the bvec iter, it'll no longer be possible to modify
      the biovec through bio_for_each_segment_all() - it doesn't increment a
      pointer to the current bvec, you pass in a struct bio_vec (not a
      pointer) which is updated with what the current biovec would be (taking
      into account bi_bvec_done and bi_size).
      So because of that it's more worthwhile to be consistent about
      bio_for_each_segment()/bio_for_each_segment_all() usage.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Alexander Viro <viro@zeniv.linux.org.uk>
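The by-value iteration described in this commit message can be illustrated with a small userspace sketch. It is not the kernel's bvec iterator: `struct seg`, `struct seg_iter`, and both helpers are hypothetical stand-ins, with `done` and `remaining` playing roughly the roles the message gives to bi_bvec_done and bi_size. The point it shows is that the current segment is computed from the iterator state and returned as a copy, so the underlying array is never modified.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bio_vec: an offset and a length. */
struct seg {
    size_t off;
    size_t len;
};

/* Hypothetical iterator: all mutable state lives here, not in the array. */
struct seg_iter {
    size_t idx;        /* index of the current segment */
    size_t done;       /* bytes already consumed in the current segment */
    size_t remaining;  /* bytes left in the whole request */
};

/* Return the current segment by value; the array itself is only read. */
static struct seg seg_iter_cur(const struct seg *v, const struct seg_iter *it)
{
    struct seg cur;
    size_t avail = v[it->idx].len - it->done;

    cur.off = v[it->idx].off + it->done;
    cur.len = avail < it->remaining ? avail : it->remaining;
    return cur;
}

/* Advance by 'bytes', which must not exceed the current segment's length. */
static void seg_iter_advance(const struct seg *v, struct seg_iter *it,
                             size_t bytes)
{
    it->remaining -= bytes;
    it->done += bytes;
    if (it->done == v[it->idx].len) {
        it->idx++;
        it->done = 0;
    }
}
```

Because `seg_iter_cur()` hands back a copy, writing to the returned struct has no effect on the array, which is why code that needs to modify the vector has to use a different interface.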
  9. 22 Feb, 2013 1 commit
  10. 29 Nov, 2012 1 commit
    • direct-io: don't read inode->i_blkbits multiple times · ab73857e
      Linus Torvalds authored
      Since directio can work on a raw block device, and the block size of the
      device can change under it, we need to do the same thing that
      fs/buffer.c now does: read the block size a single time, using
      Reading it multiple times can get different results, which will then
      confuse the code because it actually encodes the i_blksize in
      relationship to the underlying logical blocksize.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
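The "read it a single time" idea can be sketched in userspace as follows. This is only an illustration under assumptions: `struct fake_inode` and `range_is_block_aligned()` are hypothetical, and a `volatile` field stands in for whatever single-read primitive the kernel code uses. The value is copied into a local once, and every derived quantity is computed from that copy, so a concurrent change cannot make the checks disagree with each other.

```c
#include <stdint.h>

/* Hypothetical stand-in for the inode field that may change underneath us. */
struct fake_inode {
    volatile unsigned blkbits;  /* log2 of the block size */
};

/*
 * Check that both offset and length are multiples of the block size.
 * The shared field is read exactly once; the mask below is derived from
 * the local snapshot, never from a second read of inode->blkbits.
 */
static int range_is_block_aligned(const struct fake_inode *inode,
                                  uint64_t offset, uint64_t len)
{
    unsigned blkbits = inode->blkbits;            /* the single read */
    uint64_t mask = ((uint64_t)1 << blkbits) - 1;

    return ((offset | len) & mask) == 0;
}
```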
  11. 09 Aug, 2012 1 commit
  12. 14 Jul, 2012 1 commit
  13. 31 May, 2012 1 commit
  14. 23 Feb, 2012 1 commit
    • Restore direct_io / truncate locking API · 37fbf4bf
      Anton Altaparmakov authored
      With kernel 3.1, Christoph removed i_alloc_sem and replaced it with
      calls (namely inode_dio_wait() and inode_dio_done()) which are
      EXPORT_SYMBOL_GPL() thus they cannot be used by non-GPL file systems and
      further inode_dio_wait() was pushed from notify_change() into the file
      system ->setattr() method but no non-GPL file system can make this call.
      That means non-GPL file systems cannot exist any more unless they do not
      use any VFS functionality related to reading/writing as far as I can
      tell or at least as long as they want to implement direct i/o.
      Both Linus and Al (and others) have said on LKML that this breakage of
      the VFS API should not have happened and that the change was simply
      missed as it was not documented in the change logs of the patches that
      did those changes.
      This patch changes the two function exports in question to be
      EXPORT_SYMBOL() thus restoring the VFS API as it used to be - accessible
      for all modules.
      Christoph, who introduced the two functions and exported them GPL-only
      is CC-ed on this patch to give him the opportunity to object to the
      symbols being changed in this manner if he did indeed intend them to be
      GPL-only and does not want them to become available to all modules.
      Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
      CC: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 12 Jan, 2012 2 commits
    • dio: optimize cache misses in the submission path · 65dd2aa9
      Andi Kleen authored
      Some investigation of a transaction processing workload showed that a
      major consumer of cycles in __blockdev_direct_IO is the cache miss while
      accessing the block size.  This is because it has to walk the chain from
      block_dev to gendisk to queue.
      The block size is needed early on to check alignment and sizes.  It's only
      done if the check for the inode block size fails.  But the costly block
      device state is unconditionally fetched.
      - Reorganize the code to only fetch block dev state when actually
      Then do a prefetch on the block dev early on in the direct IO path.  This
      is worth it, because there is substantial code run before we actually
      touch the block dev now.
      - I also added some unlikelies to make it clear to the compiler that the
        block device fetch code is not normally executed.
      This gave a small, but measurable improvement on a large database
      benchmark (about 0.3%)
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: using prefetch requires including prefetch.h]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
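The pattern this commit message describes (prefetch early, touch the expensive state only on the rare path, and mark that path as unlikely) might look like the sketch below. It is not the kernel code: `struct fake_bdev` and `check_alignment()` are hypothetical, and the `unlikely()` macro is defined locally on top of the GCC/Clang builtin `__builtin_expect`. `__builtin_prefetch` is likewise a compiler builtin and only a hint.

```c
#include <stddef.h>

/* Local definition for this sketch; the kernel has its own unlikely(). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the costly-to-reach block device state. */
struct fake_bdev {
    unsigned logical_blkbits;  /* log2 of the device's logical block size */
};

/* Returns 0 if 'offset' is acceptably aligned, -1 otherwise. */
static int check_alignment(unsigned long offset, unsigned inode_blkbits,
                           const struct fake_bdev *bdev)
{
    unsigned long mask = (1UL << inode_blkbits) - 1;

    /* Hint: start pulling the cache line in before it is needed. */
    __builtin_prefetch(bdev);

    if (unlikely(offset & mask)) {
        /* Rare path: only now is the device state actually read. */
        if (bdev == NULL)
            return -1;
        mask = (1UL << bdev->logical_blkbits) - 1;
        if (offset & mask)
            return -1;
    }
    return 0;
}
```

In the common case the inode-level check passes and `bdev` is never dereferenced, which is the saving the commit message is after.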
    • fs/direct-io.c: calculate fs_count correctly in get_more_blocks() · ae55e1aa
      Tao Ma authored
      In get_more_blocks(), we use dio_count to calculate fs_count and do some
      tricky things to increase fs_count if dio_count isn't aligned.  But
      actually it still has some corner cases that can't be covered.  See the
      following example:
      	dio_write foo -s 1024 -w 4096
      (direct write 4096 bytes at offset 1024).  The same goes if the offset
      isn't aligned to fs_blocksize.
      In this case, the old calculation counts fs_count to be 1, but actually we
      will write into 2 different blocks (if fs_blocksize=4096).  The old code
      just works, since it will call get_block twice (and may have to allocate
      and create extents twice for filesystems like ext4).  So we'd better call
      get_block just once with the proper fs_count.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
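The arithmetic in the example from this commit message (4096 bytes written at offset 1024 with fs_blocksize=4096) can be checked with a short sketch. Both helpers are hypothetical and work in bytes rather than in the units the kernel function uses: the first counts blocks from the first and last byte touched, the second is a length-only estimate in the spirit of what the message calls the old calculation.

```c
#include <stdint.h>

/*
 * Number of filesystem blocks touched by the byte range
 * [offset, offset + len). blkbits is log2 of the block size; len must be > 0.
 */
static uint64_t fs_blocks_touched(uint64_t offset, uint64_t len,
                                  unsigned blkbits)
{
    uint64_t first = offset >> blkbits;             /* block of the first byte */
    uint64_t last = (offset + len - 1) >> blkbits;  /* block of the last byte */

    return last - first + 1;
}

/* Length-only estimate: rounds the length up but ignores the start offset. */
static uint64_t fs_blocks_by_length(uint64_t len, unsigned blkbits)
{
    uint64_t blksize = (uint64_t)1 << blkbits;

    return (len + blksize - 1) >> blkbits;
}
```

For offset 1024 and length 4096 with 4096-byte blocks, the first byte lies in block 0 and the last byte (5119) in block 1, so the range spans two blocks while the length-only estimate yields one.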
  16. 28 Oct, 2011 7 commits
  17. 26 Jul, 2011 1 commit
  18. 20 Jul, 2011 4 commits
    • fs: move inode_dio_done to the end_io handler · 72c5052d
      Christoph Hellwig authored
      For filesystems that delay their end_io processing we should keep our
      i_dio_count until the processing is done.  Enable this by moving
      the inode_dio_done call to the end_io handler if one exists.  Note that
      the actual move to the workqueue for ext4 and XFS is not done in
      this patch yet, but left to the filesystem maintainers.  At least
      for XFS it's not needed yet either as XFS has an internal equivalent
      to i_dio_count.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: always maintain i_dio_count · df2d6f26
      Christoph Hellwig authored
      Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING.
      This allows these filesystems to also protect truncate against direct I/O requests
      by using common code.  Right now the only non-DIO_LOCKING filesystem that
      appears to do so is XFS, which uses an opencoded variant of the i_dio_count
      Behaviour doesn't change for filesystems never calling inode_dio_wait.
      For ext4 behaviour changes when using the dioread_nonlock option, which
      previously was missing any protection between truncate and direct I/O reads.
      For ocfs2 the handcrafted i_dio_count manipulations are replaced with
      the common code now enabled.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: kill i_alloc_sem · bd5fe6c5
      Christoph Hellwig authored
      i_alloc_sem is a rather special rw_semaphore.  It's the last one that may
      be released by a non-owner, and its write side is always mirrored by
      real exclusion.  Its intended use is to wait for all pending direct I/O
      requests to finish before starting a truncate.
      Replace it with a hand-grown construct:
       - exclusion for truncates is already guaranteed by i_mutex, so it can
         simply fall away
       - the reader side is replaced by an i_dio_count member in struct inode
         that counts the number of pending direct I/O requests.  Truncate can't
         proceed as long as it's non-zero
       - when i_dio_count reaches zero we wake up a pending truncate using
         wake_up_bit on a new bit in i_flags
       - new references to i_dio_count can't appear while we are waiting for
         it to read zero because the direct I/O count always needs i_mutex
         (or an equivalent like XFS's i_iolock) for starting a new operation.
      This scheme is much simpler, and saves the space of a spinlock_t and a
      struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
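The counting scheme in this commit message can be sketched in userspace with C11 atomics. This is an illustration only, not the kernel implementation: `struct dio_counter` and its helpers are hypothetical, and the actual sleeping and wake-up (wake_up_bit in the message) is reduced to a boolean that tells the caller it retired the last pending request.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the i_dio_count member of struct inode. */
struct dio_counter {
    atomic_int pending;  /* number of in-flight direct I/O requests */
};

/* Called when a direct I/O request starts. */
static void dio_begin(struct dio_counter *c)
{
    atomic_fetch_add(&c->pending, 1);
}

/*
 * Called when a direct I/O request finishes. Returns true if this call
 * dropped the count to zero, i.e. a waiting truncate should be woken.
 */
static bool dio_done(struct dio_counter *c)
{
    return atomic_fetch_sub(&c->pending, 1) == 1;
}

/* A truncate may only proceed once no request is pending. */
static bool truncate_may_proceed(struct dio_counter *c)
{
    return atomic_load(&c->pending) == 0;
}
```

The sketch relies on the same precondition the message states: new requests cannot start while a truncate is waiting, because starting one requires a lock the truncate already holds.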
    • fs: simplify handling of zero sized reads in __blockdev_direct_IO · f9b5570d
      Christoph Hellwig authored
      Reject zero sized reads as soon as we know our I/O length, and don't
      bother with locks or allocations that might have to be cleaned up
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  19. 10 Mar, 2011 2 commits
    • block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe authored
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: remove per-queue plugging · 7eaceacc
      Jens Axboe authored
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So let's kill off the old plugging along with aops->sync_page().
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  20. 20 Jan, 2011 1 commit
  21. 19 Jan, 2011 1 commit
  22. 26 Oct, 2010 1 commit
  23. 09 Sep, 2010 1 commit
    • O_DIRECT: fix the splitting up of contiguous I/O · 7a801ac6
      Jeff Moyer authored
      commit c2c6ca41 (direct-io: do not merge logically non-contiguous requests)
      introduced a bug whereby all O_DIRECT I/Os were submitted a page at a time
      to the block layer.  The problem is that the code expected
      dio->block_in_file to correspond to the current page in the dio.  In fact,
      it corresponds to the previous page submitted via submit_page_section.
      This was purely an oversight, as the dio->cur_page_fs_offset field was
      introduced for just this purpose.  This patch simply uses the correct
      variable when calculating whether there is a mismatch between contiguous
      logical blocks and contiguous physical blocks (as described in the
      I also switched the if conditional following this check to an else if, to
      ensure that we never call dio_bio_submit twice for the same dio (in
      theory, this should not happen, anyway).
      I've tested this by running blktrace and verifying that a 64KB I/O was
      submitted as a single I/O.  I also ran the patched kernel through
      xfstests' aio tests using xfs, ext4 (with 1k and 4k block sizes) and btrfs
      and verified that there were no regressions as compared to an unpatched
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Josef Bacik <jbacik@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: <stable@kernel.org>		[2.6.35.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 09 Aug, 2010 1 commit
    • sort out blockdev_direct_IO variants · eafdc7d1
      Christoph Hellwig authored
      Move the call to vmtruncate to get rid of excessive blocks to the callers
      in preparation of the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it as just open-coding the two additional
      parameters is shorter than the name suffix.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  25. 27 Jul, 2010 1 commit
    • direct-io: move aio_complete into ->end_io · 552ef802
      Christoph Hellwig authored
      Filesystems with unwritten extent support must not complete an AIO request
      until the transaction to convert the extent has been committed.  That means
      the aio_complete call needs to be moved into the ->end_io callback so
      that the filesystem can control when to call it exactly.
      This makes a bit of a mess out of dio_complete and makes the ->end_io
      callback prototype even more complicated.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  26. 26 Jul, 2010 1 commit
  27. 27 May, 2010 1 commit
    • fs: introduce new truncate sequence · 7bb46a67
      npiggin@suse.de authored
      Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
      setattr > vmtruncate > truncate, have filesystems call their truncate sequence
      from ->setattr if filesystem specific operations are required. vmtruncate is
      deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
      previously should be used.
      simple_setattr is introduced for simple in-ram filesystems to implement
      the new truncate sequence. Eventually all filesystems should be converted
      to implement a setattr, and the default code in notify_change should go
      simple_setsize is also introduced to perform just the ATTR_SIZE portion
      of simple_setattr (ie. changing i_size and trimming pagecache).
      To implement the new truncate sequence:
      - filesystem specific manipulations (eg freeing blocks) must be done in
        the setattr method rather than ->truncate.
      - vmtruncate can not be used by core code to trim blocks past i_size in
        the event of write failure after allocation, so this must be performed
        in the fs code.
      - convert usage of helpers block_write_begin, nobh_write_begin,
        cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
        variants. These avoid calling vmtruncate to trim blocks (see previous).
      - inode_setattr should not be used. generic_setattr is a new function
        to be used to copy simple attributes into the generic inode.
      - make use of the better opportunity to handle errors with the new sequence.
      Big problem with the previous calling sequence: the filesystem is not called
      until i_size has already changed.  This means it is not allowed to fail the
      call, and also it does not know what the previous i_size was. Also, generic
      code calling vmtruncate to truncate allocated blocks in case of error had
      no good way to return a meaningful error (or, for example, atomically handle
      block deallocation).
      Cc: Christoph Hellwig <hch@lst.de>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>