1. 27 May, 2010 2 commits
    • fs: introduce new truncate sequence · 7bb46a67
      npiggin@suse.de authored
      Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
      setattr > vmtruncate > truncate, have filesystems call their truncate sequence
      from ->setattr if filesystem specific operations are required. vmtruncate is
      deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
      previously should be used.
      
      simple_setattr is introduced for simple in-ram filesystems to implement
      the new truncate sequence. Eventually all filesystems should be converted
      to implement a setattr, and the default code in notify_change should go
      away.
      
      simple_setsize is also introduced to perform just the ATTR_SIZE portion
      of simple_setattr (ie. changing i_size and trimming pagecache).
      
      To implement the new truncate sequence:
      - filesystem specific manipulations (eg freeing blocks) must be done in
        the setattr method rather than ->truncate.
      - vmtruncate can not be used by core code to trim blocks past i_size in
        the event of write failure after allocation, so this must be performed
        in the fs code.
      - convert usage of helpers block_write_begin, nobh_write_begin,
        cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
        variants. These avoid calling vmtruncate to trim blocks (see previous).
      - inode_setattr should not be used. generic_setattr is a new function
        to be used to copy simple attributes into the generic inode.
      - make use of the better opportunity to handle errors with the new sequence.
      
      Big problem with the previous calling sequence: the filesystem is not called
      until i_size has already changed.  This means it is not allowed to fail the
      call, and also it does not know what the previous i_size was. Also, generic
      code calling vmtruncate to truncate allocated blocks in case of error had
      no good way to return a meaningful error (or, for example, atomically handle
      block deallocation).
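The ordering described above can be sketched as a small userspace model. Everything here (struct toy_inode, the toy_* helpers, the 1024-byte block size) is hypothetical and only stands in for the kernel objects; the point it illustrates is that the filesystem gets control while the old i_size is still known and can fail before anything has changed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_BLOCK_SIZE 1024LL

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct toy_inode {
	long long i_size;     /* current file size in bytes */
	long long max_size;   /* largest size this toy filesystem accepts */
	long long blocks;     /* allocated TOY_BLOCK_SIZE blocks */
	int fail_prepare;     /* set to simulate a filesystem-side failure */
};

/* Plays the role of inode_newsize_ok(): validate before anything changes. */
static int toy_newsize_ok(const struct toy_inode *inode, long long newsize)
{
	if (newsize < 0)
		return -EINVAL;
	if (newsize > inode->max_size)
		return -EFBIG;
	return 0;
}

/* Filesystem-specific step that sees the old size and is allowed to fail. */
static int toy_fs_prepare_shrink(const struct toy_inode *inode,
				 long long oldsize, long long newsize)
{
	if (newsize >= oldsize)
		return 0;	/* growing or unchanged: nothing to release */
	return inode->fail_prepare ? -EIO : 0;
}

/* Size change driven from the setattr side, as in the new sequence. */
static int toy_setattr_size(struct toy_inode *inode, long long newsize)
{
	long long oldsize, needed;
	int err;

	assert(inode != NULL);
	err = toy_newsize_ok(inode, newsize);
	if (err)
		return err;

	oldsize = inode->i_size;
	err = toy_fs_prepare_shrink(inode, oldsize, newsize);
	if (err)
		return err;	/* i_size is untouched on failure */

	inode->i_size = newsize;	/* pagecache trimming would follow here */

	/* Release blocks that now lie entirely past the new size. */
	needed = (newsize + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;
	if (needed < inode->blocks)
		inode->blocks = needed;
	return 0;
}
```

In the old setattr > vmtruncate > truncate ordering, the equivalent of toy_fs_prepare_shrink() would only run after i_size had been overwritten, so it could neither see oldsize nor return an error.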
      
      Cc: Christoph Hellwig <hch@lst.de>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • drop unused dentry argument to ->fsync · 7ea80859
      Christoph Hellwig authored
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 16 Apr, 2009 1 commit
  3. 01 Apr, 2009 2 commits
    • mm: page_mkwrite change prototype to match fault · c2ec175c
      Nick Piggin authored
      Change the page_mkwrite prototype to take a struct vm_fault, and return
      VM_FAULT_xxx flags.  There should be no functional change.
      
      This makes it possible to return much more detailed error information to
      the VM (and also can provide more information eg.  virtual_address to the
      driver, which might be important in some special cases).
      
      This is required for a subsequent fix, and it will also make it easier to
      merge page_mkwrite() with fault() in the future.
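A minimal userspace sketch of what the prototype change means for a callback: an old-style handler that takes a page and returns 0 or -errno is wrapped by a new-style handler that takes a fault descriptor and returns fault flags. The toy_* types and the TOY_VM_FAULT_* values are hypothetical stand-ins, not the kernel's definitions, and the errno-to-flag mapping shown is only one plausible choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values only; not taken from the kernel headers. */
enum { TOY_VM_FAULT_OOM = 0x1, TOY_VM_FAULT_SIGBUS = 0x2 };

struct toy_page {
	int dirty;
};

/* Stand-in for struct vm_fault: carries the page plus extra context. */
struct toy_vm_fault {
	struct toy_page *page;
	void *virtual_address;
};

/* Old-style callback: receives the page, returns 0 or a negative errno. */
static int toy_old_page_mkwrite(struct toy_page *page)
{
	if (page == NULL)
		return -EFAULT;
	page->dirty = 1;
	return 0;
}

/* New-style callback: receives the fault descriptor, returns fault flags. */
static int toy_new_page_mkwrite(struct toy_vm_fault *vmf)
{
	int err;

	assert(vmf != NULL);
	err = toy_old_page_mkwrite(vmf->page);
	if (err == -ENOMEM)
		return TOY_VM_FAULT_OOM;
	if (err)
		return TOY_VM_FAULT_SIGBUS;
	return 0;
}
```

Returning flags rather than a bare errno is what lets a handler tell the VM which kind of failure occurred.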
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Cc: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Cleanup after commit 585d3bc0 · 47e4491b
      Al Viro authored
      fsync_bdev() export and a bunch of stubs for !CONFIG_BLOCK case had
      been left behind
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 27 Mar, 2009 1 commit
  5. 09 Jan, 2009 1 commit
    • filesystem freeze: implement generic freeze feature · fcccf502
      Takashi Sato authored
      The ioctls for the generic freeze feature are below.
      o Freeze the filesystem
        int ioctl(int fd, int FIFREEZE, arg)
          fd: The file descriptor of the mountpoint
          FIFREEZE: request code for the freeze
          arg: Ignored
          Return value: 0 if the operation succeeds. Otherwise, -1
      
      o Unfreeze the filesystem
        int ioctl(int fd, int FITHAW, arg)
          fd: The file descriptor of the mountpoint
          FITHAW: request code for unfreeze
          arg: Ignored
          Return value: 0 if the operation succeeds. Otherwise, -1
          Error number: If the filesystem has already been unfrozen,
                        errno is set to EINVAL.
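A hedged usage sketch of the two ioctls from userspace. The helper name set_fs_frozen() is hypothetical; FIFREEZE and FITHAW come from <linux/fs.h>. Freezing normally needs administrative privileges, and freezing the filesystem that holds the running shell can hang the session, so try this only on a scratch mount.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: freeze (freeze != 0) or thaw (freeze == 0) the
 * filesystem mounted at 'mountpoint'.  Returns 0 on success, -1 with
 * errno set on failure, mirroring the ioctl convention described above.
 */
static int set_fs_frozen(const char *mountpoint, int freeze)
{
	int fd, ret, saved_errno;

	assert(mountpoint != NULL);
	fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -1;

	/* The third argument is ignored by both requests. */
	ret = ioctl(fd, freeze ? FIFREEZE : FITHAW, 0);

	saved_errno = errno;	/* close() may overwrite errno */
	close(fd);
	errno = saved_errno;
	return ret;
}
```

A caller would typically pair set_fs_frozen(path, 1) with set_fs_frozen(path, 0) around a snapshot or backup step, and treat EINVAL from the thaw as "was not frozen".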
      
      [akpm@linux-foundation.org: fix CONFIG_BLOCK=n]
      Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
      Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
      Cc: <xfs-masters@oss.sgi.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 29 Dec, 2008 1 commit
    • block: Suppress Buffer I/O errors when SCSI REQ_QUIET flag set · 08bafc03
      Keith Mannthey authored
      Allow the scsi request REQ_QUIET flag to be propagated to the buffer
      file system layer. The basic idea is to pass the flag from the scsi
      request to the bio (block IO) and then to the buffer layer.  The buffer
      layer can then suppress needless printks.
      
      This patch declutters the kernel log by removing the 40-50 (per lun)
      buffer io error messages seen during a boot in my multipath setup. There
      is a good chance any real errors will be missed in the "noise" in the
      logs without this patch.
      
      During boot I see blocks of messages like
      "
      __ratelimit: 211 callbacks suppressed
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242847
      Buffer I/O error on device sdm, logical block 1
      Buffer I/O error on device sdm, logical block 5242878
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242872
      "
      in my logs.
      
      My disk environment is multipath fiber channel using the SCSI_DH_RDAC
      code and multipathd.  This topology includes an "active" and "ghost"
      path for each lun. IOs to the "ghost" path will never complete, and the
      SCSI layer, via the scsi device handler rdac code, quickly returns the IOs
      to these paths and sets the REQ_QUIET scsi flag to suppress the scsi
      layer messages.
      
      I want to extend the QUIET behavior to include the buffer file
      system layer to deal with these errors as well. I have been running this
      patch for a while now on several boxes without issue.  A few runs of
      bonnie++ show no noticeable difference in performance in my setup.
      
      Thanks to John Stultz for the quiet_error finalization.
      Submitted-by: Keith Mannthey <kmannth@us.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  7. 20 Oct, 2008 1 commit
  8. 04 Aug, 2008 1 commit
  9. 28 Jul, 2008 1 commit
    • vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a
      Hisashi Hifumi authored
      When we read some part of a file through pagecache, if there is a
      pagecache of corresponding index but this page is not uptodate, read IO
      is issued and this page will be uptodate.
      
      I think this is good for pagesize == blocksize environment but there is
      room for improvement on pagesize != blocksize environment.  Because in
      this case a page can have multiple buffers and even if a page is not
      uptodate, some buffers can be uptodate.
      
      So I suggest that when all buffers which correspond to a part of a file
      that we want to read are uptodate, use this pagecache and copy data from
      this pagecache to user buffer even if a page is not uptodate.  This can
      reduce read IO and improve system throughput.
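The check proposed above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: struct toy_page, the 4096-byte page size and toy_range_readable() are hypothetical names for illustration and are not the kernel's data structures or the function added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE   4096u
#define TOY_MAX_BUFFERS 8

/* Hypothetical model of a page split into fixed-size buffers (blocks). */
struct toy_page {
	unsigned block_size;	/* bytes per buffer; divides TOY_PAGE_SIZE */
	bool page_uptodate;	/* whole page known to be valid */
	bool buffer_uptodate[TOY_MAX_BUFFERS];
};

/*
 * Return true if bytes [from, from + count) of the page can be copied to
 * the user without issuing read IO: either the whole page is uptodate, or
 * every buffer overlapping the requested range is.
 */
static bool toy_range_readable(const struct toy_page *page,
			       unsigned from, unsigned count)
{
	unsigned first, last, i;

	assert(page != NULL && page->block_size > 0);
	assert(TOY_PAGE_SIZE / page->block_size <= TOY_MAX_BUFFERS);

	if (count == 0 || from >= TOY_PAGE_SIZE ||
	    count > TOY_PAGE_SIZE - from)
		return false;
	if (page->page_uptodate)
		return true;

	first = from / page->block_size;
	last = (from + count - 1) / page->block_size;
	for (i = first; i <= last; i++)
		if (!page->buffer_uptodate[i])
			return false;
	return true;
}
```

With 1024-byte blocks, a 1024-byte read that falls entirely within one uptodate buffer is served from the page even though other buffers of the same page are stale, which is the case the benchmark below exercises.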
      
      I wrote a benchmark program and measured the results with it.
      
      The benchmark does the following:
      
        1: mount and open a test file.
      
        2: create a 512MB file.
      
        3: close a file and umount.
      
        4: mount and again open a test file.
      
        5: pwrite randomly 300000 times on a test file.  offset is aligned
           by IO size(1024bytes).
      
        6: measure time of preading randomly 100000 times on a test file.
      
      The result was:
      
        2.6.26           330 sec
        2.6.26-patched   226 sec
      
      Arch:i386
      Filesystem:ext3
      Blocksize:1024 bytes
      Memory: 1GB
      
      On ext3/4, a file is written through buffer/block.  So random read/write
      mixed workloads or random read after random write workloads are optimized
      with this patch under pagesize != blocksize environment.  This test result
      showed this.
      
      The benchmark program is as follows:
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <time.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mount.h>
      
      #define LEN 1024
      #define LOOP (1024*512) /* 512MB */
      
      int main(void)
      {
      	unsigned long i, offset, filesize;
      	int fd;
      	char buf[LEN];
      	time_t t1, t2;
      
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	memset(buf, 0, LEN);
      	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      	for (i = 0; i < LOOP; i++)
      		write(fd, buf, LEN);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	fd = open("/root/test1/testfile", O_RDWR);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      
      	filesize = LEN * LOOP;
      	for (i = 0; i < 300000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pwrite(fd, buf, LEN, offset);
      	}
      	printf("start test\n");
      	time(&t1);
      	for (i = 0; i < 100000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pread(fd, buf, LEN, offset);
      	}
      	time(&t2);
      	printf("%ld sec\n", (long)(t2-t1));
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      }
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 29 Apr, 2008 1 commit
  11. 13 Feb, 2008 1 commit
  12. 28 Jan, 2008 1 commit
  13. 16 Oct, 2007 4 commits
  14. 19 Jul, 2007 1 commit
    • [FS] Implement block_page_mkwrite. · 54171690
      David Chinner authored
      Many filesystems need a ->page_mkwrite callout to correctly
      set up pages that have been written to by mmap. This is especially
      important when mmap is writing into holes as it allows filesystems
      to correctly account for and allocate space before the mmap
      write is allowed to proceed.
      
      Protection against truncate races is provided by locking the page
      and checking to see whether the page mapping is correct and whether
      it is beyond EOF so we don't end up allowing allocations beyond
      the current EOF or changing EOF as a result of a mmap write.
      
      SGI-PV: 940392
      SGI-Modid: 2.6.x-xfs-melb:linux:29146a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
  15. 07 May, 2007 2 commits
  16. 12 Feb, 2007 2 commits
  17. 17 Oct, 2006 1 commit
    • [PATCH] Fix IO error reporting on fsync() · 58ff407b
      Jan Kara authored
      When an IO error happens on a metadata buffer, the buffer is freed from
      memory, and fsync() is called later, filesystems like ext2 fail to report
      EIO.  We solve the problem by introducing a pointer to the associated
      address space into the buffer_head.  When a buffer is removed from the list
      of metadata buffers associated with an address space, the IO error is
      transferred from the buffer to the address space, so that fsync can later
      report it.
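The mechanism can be illustrated with a small userspace model. The toy_* structures and functions are hypothetical stand-ins for the buffer_head, the address space and fsync(); the sketch only shows the hand-off of the error flag at removal time, and it assumes the error is reported once and then cleared.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address space with a sticky error flag. */
struct toy_mapping {
	bool io_error;
};

/* Hypothetical stand-in for a buffer_head. */
struct toy_buffer {
	bool write_io_error;		/* set when writeback of the buffer failed */
	struct toy_mapping *assoc_map;	/* back-pointer described in the commit */
};

/* Put the buffer on the mapping's list of associated metadata buffers. */
static void toy_associate(struct toy_buffer *bh, struct toy_mapping *map)
{
	assert(bh != NULL && map != NULL);
	bh->assoc_map = map;
}

/*
 * Take the buffer off the list.  Any write error is handed to the mapping
 * first, so it survives the buffer being freed.
 */
static void toy_remove_assoc(struct toy_buffer *bh)
{
	assert(bh != NULL);
	if (bh->assoc_map != NULL && bh->write_io_error)
		bh->assoc_map->io_error = true;
	bh->assoc_map = NULL;
}

/* fsync-like check: report a recorded error once, then clear it. */
static int toy_fsync(struct toy_mapping *map)
{
	assert(map != NULL);
	if (map->io_error) {
		map->io_error = false;
		return -EIO;
	}
	return 0;
}
```

Without the back-pointer, toy_remove_assoc() would have nowhere to record the failure, and the later toy_fsync() would return 0 even though data was lost.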
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  18. 30 Sep, 2006 2 commits
    • [PATCH] BLOCK: Make it possible to disable the block layer [try #6] · 9361401e
      David Howells authored
      Make it possible to disable the block layer.  Not all embedded devices require
      it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
      the block layer to be present.
      
      This patch does the following:
      
       (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
           support.
      
       (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
           an item that uses the block layer.  This includes:
      
           (*) Block I/O tracing.
      
           (*) Disk partition code.
      
           (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
      
           (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
               block layer to do scheduling.  Some drivers that use SCSI facilities -
               such as USB storage - end up disabled indirectly from this.
      
           (*) Various block-based device drivers, such as IDE and the old CDROM
               drivers.
      
           (*) MTD blockdev handling and FTL.
      
           (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
               taking a leaf out of JFFS2's book.
      
       (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
           linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
           however, still used in places, and so is still available.
      
       (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
           parts of linux/fs.h.
      
       (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
      
       (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
      
       (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
           is not enabled.
      
       (*) fs/no-block.c is created to hold out-of-line stubs and things that are
           required when CONFIG_BLOCK is not set:
      
           (*) Default blockdev file operations (to give error ENODEV on opening).
      
       (*) Makes some /proc changes:
      
           (*) /proc/devices does not list any blockdevs.
      
           (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
      
       (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
      
       (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
           given command other than Q_SYNC or if a special device is specified.
      
       (*) In init/do_mounts.c, no reference is made to the blockdev routines if
           CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.
      
       (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
           error ENOSYS by way of cond_syscall if so).
      
       (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
           CONFIG_BLOCK is not set, since they can't then happen.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • [PATCH] BLOCK: Move functions out of buffer code [try #6] · cf9a2ae8
      David Howells authored
      Move some functions out of the buffering code that aren't strictly buffering
      specific.  This is a precursor to being able to disable the block layer.
      
       (*) Moved some stuff out of fs/buffer.c:
      
           (*) The file sync and general sync stuff moved to fs/sync.c.
      
           (*) The superblock sync stuff moved to fs/super.c.
      
           (*) do_invalidatepage() moved to mm/truncate.c.
      
           (*) try_to_release_page() moved to mm/filemap.c.
      
       (*) Moved some related declarations between header files:
      
           (*) declarations for do_invalidatepage() and try_to_release_page() moved
               to linux/mm.h.
      
           (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  19. 27 Jun, 2006 1 commit
  20. 26 Mar, 2006 4 commits
  21. 08 Jan, 2006 1 commit
  22. 30 Oct, 2005 1 commit
    • [PATCH] ext3: Fix unmapped buffers in transaction's lists · aaa4059b
      Jan Kara authored
      Fix the problem (BUG 4964) with unmapped buffers in transaction's
      t_sync_data list.  The problem is we need to call filesystem's own
      invalidatepage() from block_write_full_page().
      
      block_write_full_page() must call filesystem's invalidatepage().  Otherwise
      the following nasty race can happen:
      
         proc 1                                        proc 2
         ------                                        ------
      - write some new data to 'offset'
        => bh gets to the transactions data list
                                                    - starts truncate
                                                      => i_size set to new size
      - mpage_writepages()
        - ext3_ordered_writepage() to 'offset'
          - block_write_full_page()
            - page->index > end_index+1
              - block_invalidatepage()
                - discard_buffer()
                  - clear_buffer_mapped()
      
      - commit triggers and finds unmapped buffer - BOOM!
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 29 Oct, 2005 1 commit
    • [PATCH] mm: split page table lock · 4c21e2f2
      Hugh Dickins authored
      Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
      a many-threaded application which concurrently initializes different parts of
      a large anonymous area.
      
      This patch corrects that, by using a separate spinlock per page table page, to
      guard the page table entries in that page, instead of using the mm's single
      page_table_lock.  (But even then, page_table_lock is still used to guard page
      table allocation, and anon_vma allocation.)
      
      In this implementation, the spinlock is tucked inside the struct page of the
      page table page: with a BUILD_BUG_ON in case it overflows - which it would in
      the case of 32-bit PA-RISC with spinlock debugging enabled.
      
      Splitting the lock is not quite for free: another cacheline access.  Ideally,
      I suppose we would use split ptlock only for multi-threaded processes on
      multi-cpu machines; but deciding that dynamically would have its own costs.
      So for now enable it by config, at some number of cpus - since the Kconfig
      language doesn't support inequalities, let preprocessor compare that with
      NR_CPUS.  But I don't think it's worth being user-configurable: for good
      testing of both split and unsplit configs, split now at 4 cpus, and perhaps
      change that to 8 later.
      
      There is a benefit even for singly threaded processes: kswapd can be attacking
      one part of the mm while another part is busy faulting.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  24. 28 Oct, 2005 1 commit
    • [PATCH] gfp_t: fs/* · 27496a8c
      Al Viro authored
       - ->releasepage() annotated (s/int/gfp_t), instances updated
       - missing gfp_t in fs/* added
       - fixed misannotation from the original sweep caught by bitwise checks:
         XFS used __nocast both for gfp_t and for flags used by XFS allocator.
         The latter left with unsigned int __nocast; we might want to add a
         different type for those but for now let's leave them alone.  That,
         BTW, is a case when __nocast use had been actively confusing - it had
         been used in the same code for two different and similar types, with
         no way to catch misuses.  Switch of gfp_t to bitwise had caught that
         immediately...
      
      One tricky bit is left alone to be dealt with later - mapping->flags is
      a mix of gfp_t and error indications.  Left alone for now.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  25. 08 Oct, 2005 1 commit
  26. 07 Jul, 2005 1 commit
  27. 16 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!