1. 28 Feb, 2012 1 commit
  2. 08 Feb, 2012 1 commit
    • bio: don't overflow in bio_get_nr_vecs() · 5abebfdd
      Kent Overstreet authored
      There were two places bio_get_nr_vecs() could overflow:
      
      First, it did a left shift to convert from sectors to bytes immediately
      before dividing by PAGE_SIZE.  If PAGE_SIZE ever was less than 512 a great
      many things would break, so dividing by PAGE_SIZE >> 9 is safe and will
      generate smaller code too.
      
      The nastier overflow was in the DIV_ROUND_UP() (that's what the code was
      effectively doing, anyways).  If n + d overflowed, the whole thing would
      return 0 which breaks things rather effectively.
      
      bio_get_nr_vecs() doesn't claim to give an exact value anyways, so the
      DIV_ROUND_UP() is silly; we could do a straight divide except that if a
      device's queue_max_sectors was smaller than a page we'd return 0.  So we
      just add 1; this should always be safe.  Things will break badly if
      bio_get_nr_vecs() returns > BIO_MAX_PAGES (bio_alloc() will suddenly start
      failing), but it's queue_max_segments that must guard against this; if
      queue_max_sectors is what prevents it from happening, things are going to
      explode on architectures with a different PAGE_SIZE.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
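The two overflows described in the commit above can be reproduced in a small userspace sketch. This is not the kernel function: the names `nr_vecs_round_up`, `nr_vecs_plus_one`, and the `SKETCH_PAGE_*` constants are hypothetical, a 4096-byte page is assumed, and the clamp against queue_max_segments is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration only: 4 KiB pages, 512-byte sectors. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_PAGE_SHIFT 12u

/*
 * Round-up style: convert sectors to bytes, then round up to whole pages.
 * Both the left shift and the "+ PAGE_SIZE - 1" can wrap in 32-bit
 * unsigned arithmetic, so a huge limit can come back as 0.
 */
static uint32_t nr_vecs_round_up(uint32_t max_sectors)
{
    return ((max_sectors << 9) + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/*
 * Divide-and-add-one style: divide by sectors-per-page, then add 1 so a
 * limit smaller than one page still yields at least one vector.
 * There is no shift or rounding addition on the sector count to wrap.
 */
static uint32_t nr_vecs_plus_one(uint32_t max_sectors)
{
    assert((SKETCH_PAGE_SIZE >> 9) != 0);
    return max_sectors / (SKETCH_PAGE_SIZE >> 9) + 1;
}
```

With a limit of UINT32_MAX sectors the round-up version wraps to 0, while the divide-and-add-one version stays large; for small limits the two agree to within one vector, which is acceptable because the result is only an estimate.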
  3. 16 Nov, 2011 1 commit
  4. 24 Oct, 2011 1 commit
    • block: Remove the control of complete cpu from bio. · 9562ad9a
      Tao Ma authored
      bio originally had the ability to set the completion CPU, but
      it is broken.
      
      Christoph said that "This code is unused, and from the all the
      discussions lately pretty obviously broken.  The only thing keeping
      it serves is creating more confusion and possibly more bugs."
      
      And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
      with leaving cpu control to the request based drivers, they are the
      only ones that can toggle the setting anyway".
      
      So this patch tries to remove all the code for controlling the
      completion CPU from a bio.
      
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 28 May, 2011 1 commit
  6. 27 May, 2011 1 commit
  7. 31 Mar, 2011 1 commit
  8. 22 Mar, 2011 1 commit
  9. 17 Mar, 2011 1 commit
  10. 08 Mar, 2011 1 commit
  11. 10 Nov, 2010 2 commits
  12. 07 Aug, 2010 1 commit
    • block: unify flags for struct bio and struct request · 7b6d91da
      Christoph Hellwig authored
      Remove the current bio flags and reuse the request flags for the bio, too.
      This makes it easier to trace the type of I/O from the filesystem
      down to the block driver.  There were two flags in the bio that were
      missing in the requests:  BIO_RW_UNPLUG and BIO_RW_AHEAD.  Also I've
      renamed two request flags that had a superfluous RW in them.
      
      Note that the flags are in bio.h despite having the REQ_ name - as
      blkdev.h includes bio.h that is the only way to go for now.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  13. 08 Mar, 2010 1 commit
  14. 02 Mar, 2010 1 commit
  15. 28 Feb, 2010 1 commit
  16. 26 Feb, 2010 1 commit
  17. 05 Feb, 2010 1 commit
  18. 28 Jan, 2010 1 commit
  19. 19 Jan, 2010 1 commit
  20. 04 Dec, 2009 1 commit
  21. 26 Nov, 2009 1 commit
    • block: add helpers to run flush_dcache_page() against a bio and a request's pages · 2d4dc890
      Ilya Loginov authored
      The mtdblock driver doesn't call flush_dcache_page() for the pages in a
      request.  This causes problems on architectures where the icache doesn't
      fill from the dcache, or where the dcache has aliases.  The patch fixes this.
      
      The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
      pointless empty cache-thrashing loops on architectures for which
      flush_dcache_page() is a no-op.  Every architecture was provided with this
      symbol; the new helpers flush pages on architectures where
      ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE equals 1 and do nothing otherwise.
      
      See "fix mtd_blkdevs problem with caches on some architectures" discussion
      on LKML for more information.
      Signed-off-by: Ilya Loginov <isloginov@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Peter Horton <phorton@bitbox.co.uk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
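The compile-time switch described in the commit above can be illustrated with a userspace mock. Everything prefixed `mock_` is hypothetical and stands in for kernel types and functions. Only the ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol name comes from the commit message, and its value of 1 here is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: pretend this architecture needs explicit dcache flushes. */
#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
#endif

/* Mock page that just counts how often it was flushed. */
struct mock_page {
    int flush_count;
};

/* Mock bio: an array of page pointers. */
struct mock_bio {
    struct mock_page **pages;
    size_t nr_pages;
};

/* Stand-in for the architecture's per-page flush. */
static void mock_flush_dcache_page(struct mock_page *page)
{
    page->flush_count++;
}

#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
/* Walk every page of the bio and flush it. */
static void mock_bio_flush_dcache_pages(struct mock_bio *bio)
{
    size_t i;

    for (i = 0; i < bio->nr_pages; i++)
        mock_flush_dcache_page(bio->pages[i]);
}
#else
/* Flushing is a no-op here, so the loop is compiled out entirely. */
static void mock_bio_flush_dcache_pages(struct mock_bio *bio)
{
    (void)bio;
}
#endif
```

Selecting the helper at compile time means architectures where the flush is a no-op pay nothing, not even an empty loop over the pages.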
  22. 02 Nov, 2009 2 commits
  23. 01 Oct, 2009 1 commit
  24. 10 Jul, 2009 1 commit
    • block: fix sg SG_DXFER_TO_FROM_DEV regression · ecb554a8
      FUJITA Tomonori authored
      I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use
      the block layer mapping API (2.6.28).
      
      Douglas Gilbert explained SG_DXFER_TO_FROM_DEV:
      
      http://www.spinics.net/lists/linux-scsi/msg37135.html
      
      =
      The semantics of SG_DXFER_TO_FROM_DEV were:
         - copy user space buffer to kernel (LLD) buffer
         - do SCSI command which is assumed to be of the DATA_IN
           (data from device) variety. This would overwrite
           some or all of the kernel buffer
         - copy kernel (LLD) buffer back to the user space.
      
      The idea was to detect short reads by filling the original
      user space buffer with some marker bytes ("0xec" it would
      seem in this report). The "resid" value is a better way
      of detecting short reads but that was only added this century
      and requires co-operation from the LLD.
      =
      
      This patch changes the block layer mapping API to support these
      semantics. It simply adds another field to struct rq_map_data and
      enables __bio_copy_iov() to copy data from user space even with READ
      requests.
      
      It would be better to add a flags field and kill the null_mapped and new
      from_user fields in struct rq_map_data, but that approach would make it
      difficult to send this patch to the stable trees because the st and osst
      drivers use struct rq_map_data (they were converted to use the block
      layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer
      mapping API.
      
      zhou sf reported this regression and tested this patch:
      
      http://www.spinics.net/lists/linux-scsi/msg37128.html
      http://www.spinics.net/lists/linux-scsi/msg37168.html
      
      Reported-by: zhou sf <sxzzsf@gmail.com>
      Tested-by: zhou sf <sxzzsf@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
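The three-step SG_DXFER_TO_FROM_DEV semantics quoted in the commit above can be simulated in userspace. This is only a sketch: `mock_device_read` and `to_from_dev_transfer` are hypothetical names, the "device" is a memset, and the 0xec marker is taken from the quoted report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MARKER_BYTE 0xec  /* marker value mentioned in the quoted report */

/* Hypothetical DATA_IN command that fills only the first `got` bytes. */
static void mock_device_read(unsigned char *kbuf, size_t got)
{
    memset(kbuf, 0x11, got);
}

/*
 * Simulate the three steps: user -> kernel copy, device read into the
 * kernel buffer, kernel -> user copy.  The caller pre-fills ubuf with
 * MARKER_BYTE.  Returns the length of the buffer once trailing marker
 * bytes are stripped, which is the caller's guess at how much arrived.
 */
static size_t to_from_dev_transfer(unsigned char *ubuf, unsigned char *kbuf,
                                   size_t len, size_t device_bytes)
{
    size_t n = len;

    assert(device_bytes <= len);
    memcpy(kbuf, ubuf, len);              /* step 1: markers reach kbuf */
    mock_device_read(kbuf, device_bytes); /* step 2: partial overwrite  */
    memcpy(ubuf, kbuf, len);              /* step 3: copy back to user  */

    while (n > 0 && ubuf[n - 1] == MARKER_BYTE)
        n--;
    return n;
}
```

If the first copy is skipped for a READ request, the kernel buffer never holds the markers and the tail copied back to user space is whatever the buffer contained before. The guess is also wrong whenever real data happens to end in the marker byte, which is why the quoted text calls "resid" the better way to detect short reads.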
  25. 01 Jul, 2009 1 commit
  26. 16 Jun, 2009 1 commit
  27. 12 Jun, 2009 1 commit
  28. 10 Jun, 2009 1 commit
  29. 09 Jun, 2009 1 commit
    • tracing/events: convert block trace points to TRACE_EVENT() · 55782138
      Li Zefan authored
      TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
      these new capabilities to this tracepoint:
      
        - zero-copy and per-cpu splice() tracing
        - binary tracing without printf overhead
        - structured logging records exposed under /debug/tracing/events
        - trace events embedded in function tracer output and other plugins
        - user-defined, per tracepoint filter expressions
        ...
      
      Cons:
      
        - no dev_t info for the output of plug, unplug_timer and unplug_io events.
          no dev_t info for getrq and sleeprq events if bio == NULL.
          no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
      
          This is mainly because we can't get the device from a request queue.
          But this may change in the future.
      
        - A packet command is converted to a string in TP_assign, not TP_print,
          whereas blktrace does the conversion just before output.
      
          Since pc requests should be rather rare, this is not a big issue.
      
        - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
          has a unique format, which means we have some unused data in a trace entry.
      
          The overhead is minimized by using __dynamic_array() instead of __array().
      
      I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
      
      Run   dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
      1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
      2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
      3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s
      
      So the overhead of tracing is very small, and there is no regression when
      using those trace events vs blktrace.
      
      And the binary output of TRACE_EVENT is much smaller than blktrace:
      
       # ls -l -h
       -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
       -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
       -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
      
      Following are some comparisons between TRACE_EVENT and blktrace:
      
      plug:
        kjournald-480   [000]   303.084981: block_plug: [kjournald]
        kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]
      
      unplug_io:
        kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
        kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1
      
      remap:
        kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
        kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384
      
      bio_backmerge:
        kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
        kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]
      
      getrq:
        kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]
      
        bash-2066  [001]  1072.953770:   8,0    G   N [bash]
        bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]
      
      rq_complete:
        konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
        konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]
      
        ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
        ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]
      
      rq_insert:
        kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]
      
      Changelog from v2 -> v3:
      
      - use the newly introduced __dynamic_array().
      
      Changelog from v1 -> v2:
      
      - use __string() instead of __array() to minimize the memory required
        to store hex dump of rq->cmd().
      
      - support large pc requests.
      
      - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
      
      - some cleanups.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
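The claim that tracing overhead is very small can be checked against the dd timings in the benchmark table of the commit above. The sketch below averages the three runs per column and expresses the slowdown relative to plain dd. The helper names are hypothetical; the numbers are copied from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Elapsed seconds for runs 1..3, copied from the benchmark table. */
static const double dd_plain[]       = { 7.36, 7.43, 7.38 };
static const double dd_blktrace[]    = { 7.50, 7.48, 7.45 };
static const double dd_trace_event[] = { 7.41, 7.43, 7.41 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Slowdown of `traced` relative to `base`, in percent of the base mean. */
static double overhead_percent(const double *base, const double *traced,
                               size_t n)
{
    double b = mean(base, n);

    return (mean(traced, n) - b) / b * 100.0;
}
```

Over these three runs the ioctl blktrace column averages roughly 1.2% slower than plain dd and the TRACE_EVENT column roughly 0.4% slower. With only three samples per column this is an indication, not a measurement of statistical significance.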
  30. 22 May, 2009 2 commits
  31. 19 May, 2009 1 commit
    • bio: always copy back data for copied kernel requests · 4fc981ef
      Tejun Heo authored
      When a read bio_copy_kern() request fails, the content of the bounce
      buffer is not copied back.  However, as request failure doesn't
      necessarily mean complete failure, the buffer state can be useful.
      This behavior is also inconsistent with the user map counterpart, and
      the subtle difference between bounced and unbounced IO causes
      confusion.
      
      This patch makes bio_copy_kern_endio() ignore @err and always copy
      back data on request completion.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  32. 28 Apr, 2009 1 commit
  33. 22 Apr, 2009 2 commits
    • bio: use bio_kmalloc() in copy/map functions · a9e9dc24
      Tejun Heo authored
      Impact: remove possible deadlock condition
      
      There is no reason to use mempool backed allocation for map functions.
      Also, because kern mapping is used inside LLDs (e.g. for EH), using
      mempool backed allocation can lead to deadlock under extreme
      conditions (mempool already consumed by the time a request reached EH
      and requests are blocked on EH).
      
      Switch copy/map functions to bio_kmalloc().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • bio: fix bio_kmalloc() · 451a9ebf
      Tejun Heo authored
      Impact: fix bio_kmalloc() and its destruction path
      
      bio_kmalloc() was broken in two ways.
      
      * bvec_alloc_bs() first allocates bvec using kmalloc() and then
        ignores it and allocates again like non-kmalloc bvecs.
      
      * bio_kmalloc_destructor() didn't check for and free bio integrity
        data.
      
      This patch fixes the above problems.  The kmalloc path is separated out
      from bio_alloc_bioset() and allocates the requested number of bvecs as
      inline bvecs.
      
      * bio_alloc_bioset() no longer takes NULL @bs.  Nothing other than
        bio_kmalloc() used it, and outside users can't know how a bio was
        allocated anyway.
      
      * Define and use BIO_POOL_NONE so that pool index check in
        bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
        there.
      
      * Relocate destructors on top of each allocation function so that it is
        clearer how they're used.
      
      Jens Axboe suggested allocating bvecs inline.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
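The idea of allocating bvecs inline, as described in the commit above, can be sketched in userspace with a C99 flexible array member. This is not the kernel's struct bio: every `sketch_` name is hypothetical, and `calloc()`/`free()` stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio_vec. */
struct sketch_bio_vec {
    void *page;
    unsigned int len;
    unsigned int offset;
};

/* Hypothetical bio whose vectors live in the same allocation. */
struct sketch_bio {
    unsigned int max_vecs;
    struct sketch_bio_vec *io_vec;       /* points at inline_vecs */
    struct sketch_bio_vec inline_vecs[]; /* flexible array member */
};

/* One allocation covers the header plus nr_vecs inline vectors. */
static struct sketch_bio *sketch_bio_kmalloc(unsigned int nr_vecs)
{
    struct sketch_bio *bio;
    size_t size;

    /* Refuse sizes that would overflow size_t. */
    if (nr_vecs > (SIZE_MAX - sizeof(*bio)) / sizeof(struct sketch_bio_vec))
        return NULL;

    size = sizeof(*bio) + (size_t)nr_vecs * sizeof(struct sketch_bio_vec);
    bio = calloc(1, size);
    if (!bio)
        return NULL;

    bio->max_vecs = nr_vecs;
    bio->io_vec = bio->inline_vecs;
    return bio;
}

/* A single free releases the header and the vectors together. */
static void sketch_bio_free(struct sketch_bio *bio)
{
    free(bio);
}
```

Because the vectors share the header's allocation, there is no second allocation to leak or to free through the wrong path, which is the class of problem the first bullet in the commit message describes.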
  34. 15 Apr, 2009 1 commit
  35. 30 Mar, 2009 1 commit
  36. 24 Mar, 2009 1 commit