Skip to content
  • Ryan Ding's avatar
    ocfs2: fix sparse file & data ordering issue in direct io · c15471f7
    Ryan Ding authored
    There are mainly three issues in the direct io code path after commit
    24c40b32
    
     ("ocfs2: implement ocfs2_direct_IO_write"):
    
      * Does not support sparse file.
      * Does not support data ordering.  eg: when write to a file hole, it
        will alloc extent first.  If system crashed before io finished, data
        will corrupt.
      * Potential risk when doing aio+dio.  The -EIOCBQUEUED return value is
        likely to be ignored by ocfs2_direct_IO_write().
    
    To resolve above problems, re-design direct io code with following ideas:
      * Use buffer io to fill in holes.  And this will make better
        performance also.
      * Clear unwritten after direct write finished.  So we can make sure
        meta data changes after data write to disk.  (Unwritten extent is
        invisible to user, from user's view, meta data is not changed when
        allocate an unwritten extent.)
      * Clear ocfs2_direct_IO_write().  Do all ending work in end_io.
    
    This patch has passed fs,dio,ltp-aiodio.part1,ltp-aiodio.part2,ltp-aiodio.part4
    test cases of ltp.
    
    For performance improvement, see following test result:
    ocfs2 cluster size 1MB, ocfs2 volume is mounted on /mnt/.
    The original way:
      + rm /mnt/test.img -f
      + dd if=/dev/zero of=/mnt/test.img bs=4K count=1048576 oflag=direct
      1048576+0 records in
      1048576+0 records out
      4294967296 bytes (4.3 GB) copied, 1707.83 s, 2.5 MB/s
      + rm /mnt/test.img -f
      + dd if=/dev/zero of=/mnt/test.img bs=256K count=16384 oflag=direct
      16384+0 records in
      16384+0 records out
      4294967296 bytes (4.3 GB) copied, 582.705 s, 7.4 MB/s
    
    After this patch:
      + rm /mnt/test.img -f
      + dd if=/dev/zero of=/mnt/test.img bs=4K count=1048576 oflag=direct
      1048576+0 records in
      1048576+0 records out
      4294967296 bytes (4.3 GB) copied, 64.6412 s, 66.4 MB/s
      + rm /mnt/test.img -f
      + dd if=/dev/zero of=/mnt/test.img bs=256K count=16384 oflag=direct
      16384+0 records in
      16384+0 records out
      4294967296 bytes (4.3 GB) copied, 34.7611 s, 124 MB/s
    
    Signed-off-by: default avatarRyan Ding <ryan.ding@oracle.com>
    Reviewed-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
    Cc: Joseph Qi <joseph.qi@huawei.com>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    c15471f7