  1. 27 May, 2010 1 commit
  2. 24 Mar, 2010 1 commit
  3. 04 Nov, 2009 1 commit
  4. 24 Sep, 2009 1 commit
  5. 11 May, 2009 1 commit
  6. 04 Apr, 2009 1 commit
    • Make non-compat preadv/pwritev use native register size · 601cc11d
      Linus Torvalds authored
      
      
      Instead of always splitting the file offset into 32-bit 'high' and 'low'
      parts, just split them into the largest natural word-size - which in C
      terms is 'unsigned long'.
      
      This allows 64-bit architectures to avoid the unnecessary 32-bit
      shifting and masking for native format (while the compat interfaces will
      obviously always have to do it).
      
      This also changes the order of 'high' and 'low' to be "low first".  Why?
      Because when we have it like this, the 64-bit system calls now don't use
      the "pos_high" argument at all, and it makes more sense for the native
      system call to simply match the user-mode prototype.
      
      This results in a much more natural calling convention, and allows the
      compiler to generate much more straightforward code.  On x86-64, we now
      generate
      
              testq   %rcx, %rcx      # pos_l
              js      .L122   #,
              movq    %rcx, -48(%rbp) # pos_l, pos
      
      from the C source
      
              loff_t pos = pos_from_hilo(pos_h, pos_l);
      	...
              if (pos < 0)
                      return -EINVAL;
      
      and the 'pos_h' register isn't even touched.  It used to generate code
      like
      
              mov     %r8d, %r8d      # pos_low, pos_low
              salq    $32, %rcx       #, tmp71
              movq    %r8, %rax       # pos_low, pos.386
              orq     %rcx, %rax      # tmp71, pos.386
              js      .L122   #,
              movq    %rax, -48(%rbp) # pos.386, pos
      
      which isn't _that_ horrible, but it does show how the natural word size
      is just a more sensible interface (same arguments will hold in the user
      level glibc wrapper function, of course, so the kernel side is just half
      of the equation!)
      
      Note: in all cases the user code wrapper can again be the same. You can
      just do
      
      	#define HALF_BITS (sizeof(unsigned long)*4)
      	__syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);
      
      or something like that.  That way the user mode wrapper will also be
      nicely passing in a zero (it won't actually have to do the shifts, the
      compiler will understand what is going on) for the last argument.
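
A minimal userspace sketch of the split described above and its inverse. Only `HALF_BITS` comes from the message (written here with `CHAR_BIT` instead of a hard-coded 4); `struct hilo`, `split_offset`, `join_offset` and `check_roundtrip` are hypothetical names used for illustration, and this is neither the kernel's `pos_from_hilo` nor the glibc wrapper.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Half the width of the native word, in bits. */
#define HALF_BITS (sizeof(unsigned long) * CHAR_BIT / 2)

/* Hypothetical container for the two words a wrapper would pass. */
struct hilo {
    unsigned long low;   /* passed first ("low first") */
    unsigned long high;  /* zero whenever unsigned long is 64 bits wide */
};

static struct hilo split_offset(int64_t offset)
{
    uint64_t u = (uint64_t)offset;
    struct hilo r;

    r.low = (unsigned long)u;
    /* Two half-width shifts: a single shift by the full word width
     * would be undefined when unsigned long is 64 bits wide. */
    r.high = (unsigned long)((u >> HALF_BITS) >> HALF_BITS);
    return r;
}

static int64_t join_offset(struct hilo r)
{
    uint64_t u = (uint64_t)r.high;

    u = (u << HALF_BITS) << HALF_BITS;
    return (int64_t)(u | r.low);
}

/* Splitting and rejoining must give back the original offset. */
static void check_roundtrip(int64_t offset)
{
    assert(join_offset(split_offset(offset)) == offset);
}
```

With a 64-bit `unsigned long` the high word always comes out as zero, which is the "nicely passing in a zero" case; with a 32-bit `unsigned long` the two words carry the low and high halves of the offset.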
      
      And that is a good idea, even if nobody will necessarily ever care: if
      we ever do move to a 128-bit lloff_t, this particular system call might
      be left alone.  Of course, that will be the least of our worries if we
      really ever need to care, so this may not be worth really caring about.
      
      [ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]
      Acked-by: Gerd Hoffmann <kraxel@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 02 Apr, 2009 1 commit
    • preadv/pwritev: Add preadv and pwritev system calls. · f3554f4b
      Gerd Hoffmann authored
      This patch adds preadv and pwritev system calls.  These syscalls are a
      pretty straightforward combination of pread and readv (same for write).
      They are quite useful for doing vectored I/O in threaded applications.
      Using lseek+readv instead opens race windows you'll have to plug with
      locking.
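
A hedged usage sketch of the userspace interface, assuming a glibc that provides `preadv()`/`pwritev()` in `<sys/uio.h>`. The helper `read_record` and the self-check `demo_record_roundtrip` are hypothetical names; the point is that the offset travels with the call, so the descriptor's shared file position is never touched.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* glibc: declare preadv/pwritev in strict -std= modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: read a header and a body located at `offset`
 * in one vectored call, without an lseek() beforehand. */
static ssize_t read_record(int fd, off_t offset,
                           void *hdr, size_t hdr_len,
                           void *body, size_t body_len)
{
    struct iovec iov[2] = {
        { .iov_base = hdr,  .iov_len = hdr_len  },
        { .iov_base = body, .iov_len = body_len },
    };

    return preadv(fd, iov, 2, offset);
}

/* Self-check on an anonymous temporary file. */
static void demo_record_roundtrip(void)
{
    FILE *tmp = tmpfile();
    assert(tmp != NULL);
    int fd = fileno(tmp);

    char hdr_out[] = "HDR1";
    char body_out[] = "payload";
    struct iovec out[2] = {
        { .iov_base = hdr_out,  .iov_len = 4 },
        { .iov_base = body_out, .iov_len = 7 },
    };
    ssize_t written = pwritev(fd, out, 2, 100);
    assert(written == 11);

    char hdr_in[4], body_in[7];
    ssize_t got = read_record(fd, 100, hdr_in, sizeof hdr_in,
                              body_in, sizeof body_in);
    assert(got == 11);
    assert(memcmp(hdr_in, "HDR1", 4) == 0);
    assert(memcmp(body_in, "payload", 7) == 0);

    /* Neither call moved the descriptor's file position. */
    off_t where = lseek(fd, 0, SEEK_CUR);
    assert(where == 0);
    fclose(tmp);
}
```

Because no thread ever changes the shared position, several threads can call `read_record` on the same descriptor without the locking that an lseek+readv sequence would need.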
      
      Other systems, for example NetBSD, have such system calls too; see
      http://www.daemon-systems.org/man/preadv.2.html
      
      
      
      The application-visible interface provided by glibc should look like
      this to be compatible with the existing implementations in the *BSD family:
      
        ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
        ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);
      
      This prototype has one problem though: on 32-bit archs the (64-bit)
      offset argument is unaligned, which the syscall ABI of several archs
      doesn't allow.  At least s390 needs a wrapper in glibc to handle this.  As
      we'll need wrappers in glibc anyway, I've decided to push the problem to
      glibc entirely and use a syscall prototype which works without
      arch-specific wrappers inside the kernel: the offset argument is
      explicitly split into two 32-bit values.
      
      The patch contains the actual system call implementation and the wire-up in
      the x86 system call tables.  Other archs follow as separate patches.
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 14 Jan, 2009 5 commits
  9. 05 Jan, 2009 1 commit
    • vfs: lseek(fd, 0, SEEK_CUR) race condition · 5b6f1eb9
      Alain Knaff authored
      
      
      This patch fixes a race condition in lseek. While it is expected that
      unpredictable behaviour may result while repositioning the offset of a
      file descriptor concurrently with reading/writing to the same file
      descriptor, this should not happen when merely *reading* the file
      descriptor's offset.
      
      Unfortunately, the only portable way in Unix to read a file
      descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this
      concurrently with read/write may mess up the position.
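
For reference, a small sketch of the query idiom in question. The helper name `current_offset` and the self-check `demo_current_offset` are hypothetical; the sketch only shows that the call is meant to be a pure read of the position in the single-threaded case, and it does not reproduce the race.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the current file offset of fd, or (off_t)-1 on error.
 * Seeking zero bytes relative to the current position is intended
 * to leave the position unchanged. */
static off_t current_offset(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/* Self-check on an anonymous temporary file. */
static void demo_current_offset(void)
{
    FILE *tmp = tmpfile();
    assert(tmp != NULL);
    int fd = fileno(tmp);

    off_t before = current_offset(fd);
    assert(before == 0);

    ssize_t n = write(fd, "hello", 5);
    assert(n == 5);

    /* Querying twice gives the same answer: the query itself
     * does not move the position. */
    off_t first = current_offset(fd);
    off_t second = current_offset(fd);
    assert(first == 5 && second == 5);
    fclose(tmp);
}
```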
      
      [with fixes from akpm]
      Signed-off-by: Alain Knaff <alain@knaff.lu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  10. 23 Oct, 2008 1 commit
  11. 02 Jul, 2008 1 commit
    • Remove BKL from remote_llseek v2 · 9465efc9
      Andi Kleen authored
      
      
      - Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
      failures in all users)
      - Change all users to either use generic_file_llseek_unlocked directly or
      take the BKL around it. I changed the file systems that don't use the BKL
      for anything (CIFS, GFS) to call it directly. NCPFS, SMBFS and NFS
      take the BKL, but explicitly in their own source now.
      
      I moved them all over in a single patch to avoid unbisectable sections.
      
      Open problem: 32-bit kernels can corrupt fpos because its modification
      is not atomic, but they can do that anyway because there are other paths
      that modify it without the BKL.
      
      Do we need a special lock for the pos/f_version = 0 checks?
      
      Trond says the NFS BKL is likely not needed, but keep it for now
      until his full audit.
      
      v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
          and factor duplicated code (suggested by hch)
      
      Cc: Trond.Myklebust@netapp.com
      Cc: swhiteho@redhat.com
      Cc: sfrench@samba.org
      Cc: vandrove@vc.cvut.cz
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
  12. 22 Apr, 2008 1 commit
  13. 08 Feb, 2008 1 commit
  14. 28 Jan, 2008 1 commit
  15. 24 Jan, 2008 1 commit
  16. 14 Nov, 2007 1 commit
    • mark sys_open/sys_read exports unused · cb51f973
      Arjan van de Ven authored
      sys_open / sys_read were used in the early 1.2 days to load firmware from
      disk inside drivers.  Since 2.0 or so this has been deprecated behavior, but
      several drivers were still using it.  For a few years now we have had a
      request_firmware() API that implements this in a nice, consistent way.
      Only some old ISA sound drivers (pre-ALSA) still straggled along for some
      time; however, with commit c2b1239a the last user is now gone.
      
      This is a good thing, since using sys_open / sys_read etc. for firmware
      ranges from buggy to dangerous; these operations put an fd in the
      process file descriptor table, which can then be tampered with from
      other threads, for example.  For those who don't want the firmware loader,
      filp_open()/vfs_read are the better APIs to use, without this security
      issue.
      
      The patch below marks sys_open and sys_read unused now that they're
      really not used anymore, and for deletion in the 2.6.25 timeframe.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 09 Oct, 2007 1 commit
    • Cleanup macros for distinguishing mandatory locks · a16877ca
      Pavel Emelyanov authored
      
      
      The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
      inode as "mandatory lockable" and there's a macro for this check called
      MANDATORY_LOCK(inode).  However, fs/locks.c and some filesystems still perform
      the explicit i_mode checking.  Besides, Andrew pointed out that this macro is
      itself buggy, as it dereferences the inode arg twice.
      
      Convert this macro into a static inline function and switch its users to it,
      making the code shorter and more readable.
      
      The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
      for superblock is already known to be true.
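
A userspace sketch of the mode test and of why an inline function is safer than a macro here. It works on a bare `mode_t` rather than a kernel inode; `MODE_IS_MANDLOCK_MACRO`, `mode_is_mandlock`, `fetch_mode` and `demo_single_evaluation` are hypothetical names, not the kernel's helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Macro form: the argument is expanded, and so evaluated, twice. */
#define MODE_IS_MANDLOCK_MACRO(m) (((m) & S_ISGID) && !((m) & S_IXGRP))

/* Inline form: setgid bit set and group-execute bit clear,
 * with the argument evaluated exactly once. */
static inline bool mode_is_mandlock(mode_t mode)
{
    return (mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}

/* Counts how often the "inode mode" is fetched. */
static int fetch_count;

static mode_t fetch_mode(void)
{
    fetch_count++;
    return S_ISGID | 0644;
}

static void demo_single_evaluation(void)
{
    fetch_count = 0;
    bool via_macro = MODE_IS_MANDLOCK_MACRO(fetch_mode());
    assert(via_macro && fetch_count == 2);   /* fetched twice */

    fetch_count = 0;
    bool via_inline = mode_is_mandlock(fetch_mode());
    assert(via_inline && fetch_count == 1);  /* fetched once */
}
```

If the argument expression has side effects or can change between the two evaluations, the macro form can give an inconsistent answer; the function form cannot.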
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  18. 10 Jul, 2007 3 commits
  19. 08 May, 2007 2 commits
  20. 12 Feb, 2007 1 commit
    • [PATCH] FS: speed up rw_verify_area() · 163da958
      Eric Dumazet authored
      
      
      oprofile hunting showed a stall in rw_verify_area(), because of triple
      indirection and potential cache misses.
      (file->f_path.dentry->d_inode->i_flock)
      
      By moving initialization of 'struct inode' pointer before the pos/count
      sanity tests, we allow the compiler and processor to perform two loads by
      anticipation, reducing stall, without prefetch() hints.  Even x86 arch has
      enough registers to not use temporary variables and not increase text size.
      
      I validated this patch by running a benchmark and studying both the oprofile
      changes and the absolute performance of the test program.
      
      Results of my epoll_pipe_bench (source available on request) on a Pentium-M
      1.6 GHz machine
      
      Before :
      # ./epoll_pipe_bench -l 30 -t 20
      Avg: 436089 evts/sec read_count=8843037 write_count=8843040 21.218390 samples per call
      (best value out of 10 runs)
      
      After :
      # ./epoll_pipe_bench -l 30 -t 20
      Avg: 470980 evts/sec read_count=9549871 write_count=9549894 21.216694 samples per call
      (best value out of 10 runs)
      
      oprofile CPU_CLK_UNHALTED events gave a reduction from 5.3401 % to 2.5851 %
      for the rw_verify_area() function.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 11 Feb, 2007 1 commit
  22. 13 Dec, 2006 1 commit
  23. 08 Dec, 2006 1 commit
  24. 01 Oct, 2006 4 commits
  25. 10 Jul, 2006 1 commit
  26. 11 Apr, 2006 1 commit
  27. 28 Mar, 2006 1 commit
  28. 25 Mar, 2006 1 commit
  29. 09 Jan, 2006 1 commit
  30. 04 Jan, 2006 1 commit
    • Relax the rw_verify_area() error checking. · e28cc715
      Linus Torvalds authored
      
      
      In particular, allow over-large read- or write-requests to be downgraded
      to a more reasonable range, rather than considering them outright errors.
      
      We want to protect lower layers from (the sadly all too common) overflow
      conditions, but prefer to do so by chopping the requests up, rather than
      just refusing them outright.
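
A minimal sketch of "chop rather than refuse", under stated assumptions: the limit `SKETCH_MAX_RW_COUNT` (INT_MAX rounded down to an assumed 4096-byte page) and the functions `clamp_request` and `chunks_needed` are hypothetical illustrations, not the kernel's rw_verify_area(). An over-large count is reduced to the limit, while a negative position is still an error.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-call limit: INT_MAX rounded down to a 4096-byte page. */
#define SKETCH_MAX_RW_COUNT ((size_t)INT_MAX & ~(size_t)4095)

/* Return the number of bytes the lower layers will be asked to handle,
 * or -EINVAL if the position itself is invalid. */
static int64_t clamp_request(int64_t pos, size_t count)
{
    if (pos < 0)
        return -EINVAL;
    if (count > SKETCH_MAX_RW_COUNT)
        count = SKETCH_MAX_RW_COUNT;  /* downgrade, do not refuse */
    return (int64_t)count;
}

/* A caller sees a short transfer and simply loops; this counts how
 * many clamped requests are needed to cover `total` bytes. */
static size_t chunks_needed(size_t total)
{
    size_t chunks = 0;

    while (total > 0) {
        int64_t n = clamp_request(0, total);
        assert(n > 0 && (size_t)n <= total);
        total -= (size_t)n;
        chunks++;
    }
    return chunks;
}
```

From the caller's point of view this looks like an ordinary short read or write, which correct programs already have to handle.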
      
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>