1. 30 Jul, 2012 3 commits
  2. 31 May, 2012 1 commit
  3. 15 Mar, 2012 1 commit
    • Chris Metcalf's avatar
      [PATCH v3] ipc: provide generic compat versions of IPC syscalls · 48b25c43
      Chris Metcalf authored
      When using the "compat" APIs, architectures will generally want to
      be able to make direct syscalls to msgsnd(), shmctl(), etc., and
      in the kernel we would want them to be handled directly by
      compat_sys_xxx() functions, as is true for other compat syscalls.
      However, for historical reasons, several of the existing compat IPC
      syscalls do not do this.  semctl() expects a pointer to the fourth
      argument, instead of the fourth argument itself.  msgsnd(), msgrcv()
      and shmat() expect arguments in different order.
      This change adds an ARCH_WANT_OLD_COMPAT_IPC config option that can be
      set to preserve this behavior for ports that use it (x86, sparc, powerpc,
      s390, and mips).  No actual semantics are changed for those architectures,
      and there is only a minimal amount of code refactoring in ipc/compat.c.
      Newer architectures like tile (and perhaps future architectures such
      as arm64 and unicore64) should not select this option, and thus can
      avoid having any IPC-specific code at all in their architecture-specific
      compat layer.  In the same vein, if this option is not selected, IPC_64
      mode is assumed, since that's what the <asm-generic> headers expect.
      The workaround code in "tile" for msgsnd() and msgrcv() is removed
      with this change; it also fixes the bug that shmat() and semctl() were
      not being properly handled.
      Reviewed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarChris Metcalf <cmetcalf@tilera.com>
  4. 26 Feb, 2012 1 commit
  5. 20 Feb, 2012 2 commits
    • H. Peter Anvin's avatar
      compat: Add helper functions to read/write struct timeval, timespec · 6684ba20
      H. Peter Anvin authored
      Add helper functions to read and write struct timeval and struct
      timespec from userspace.  We already had helper functions for reading
      and writing struct compat_timespec; add a set of functions to do the
      same with struct timeval, and add a second suite of functions which
      can be sensitive to COMPAT_USE_64BIT_TIME and access either 32- or
      64-bit time structures.
      This also exports these helper functions to modules.
      Rename the existing inlines for converting between struct
      compat_timeval and native struct timespec so we can have a saner
      naming convention for the exported functions.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
    • H. J. Lu's avatar
      compat: Introduce COMPAT_USE_64BIT_TIME · 45e87781
      H. J. Lu authored
      Allow a compatibility ABI to use a 64-bit time_t and 64-bit members in
      struct timeval and struct timespec to avoid the Y2038 problem.
      This will be used for the x32 ABI.
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
  6. 03 Jan, 2012 1 commit
  7. 03 Dec, 2011 1 commit
  8. 31 Oct, 2011 1 commit
    • Christopher Yeoh's avatar
      Cross Memory Attach · fcf63409
      Christopher Yeoh authored
      The basic idea behind cross memory attach is to allow MPI programs doing
      intra-node communication to do a single copy of the message rather than a
      double copy of the message via shared memory.
      The following patch attempts to achieve this by allowing a destination
      process, given an address and size from a source process, to copy memory
      directly from the source process into its own address space via a system
      call.  There is also a symmetrical ability to copy from the current
      process's address space into a destination process's address space.
      - Use of /proc/pid/mem has been considered, but there are issues with
        using it:
        - Does not allow for specifying iovecs for both src and dest, assuming
          preadv or pwritev was implemented either the area read from or
        written to would need to be contiguous.
        - Currently mem_read allows only processes who are currently
        ptrace'ing the target and are still able to ptrace the target to read
        from the target. This check could possibly be moved to the open call,
        but its not clear exactly what race this restriction is stopping
        (reason  appears to have been lost)
        - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
        domain socket is a bit ugly from a userspace point of view,
        especially when you may have hundreds if not (eventually) thousands
        of processes  that all need to do this with each other
        - Doesn't allow for some future use of the interface we would like to
        consider adding in the future (see below)
        - Interestingly reading from /proc/pid/mem currently actually
        involves two copies! (But this could be fixed pretty easily)
      As mentioned previously use of vmsplice instead was considered, but has
      problems.  Since you need the reader and writer working co-operatively if
      the pipe is not drained then you block.  Which requires some wrapping to
      do non blocking on the send side or polling on the receive.  In all to all
      communication it requires ordering otherwise you can deadlock.  And in the
      example of many MPI tasks writing to one MPI task vmsplice serialises the
      There are some cases of MPI collectives where even a single copy interface
      does not get us the performance gain we could.  For example in an
      MPI_Reduce rather than copy the data from the source we would like to
      instead use it directly in a mathops (say the reduce is doing a sum) as
      this would save us doing a copy.  We don't need to keep a copy of the data
      from the source.  I haven't implemented this, but I think this interface
      could in the future do all this through the use of the flags - eg could
      specify the math operation and type and the kernel rather than just
      copying the data would apply the specified operation between the source
      and destination and store it in the destination.
      Although we don't have a "second user" of the interface (though I've had
      some nibbles from people who may be interested in using it for intra
      process messaging which is not MPI).  This interface is something which
      hardware vendors are already doing for their custom drivers to implement
      fast local communication.  And so in addition to this being useful for
      OpenMPI it would mean the driver maintainers don't have to fix things up
      when the mm changes.
      There was some discussion about how much faster a true zero copy would
      go. Here's a link back to the email with some testing I did on that:
      There is a basic man page for the proposed interface here:
      This has been implemented for x86 and powerpc, other architecture should
      mainly (I think) just need to add syscall numbers for the process_vm_readv
      and process_vm_writev. There are 32 bit compatibility versions for
      64-bit kernels.
      For arch maintainers there are some simple tests to be able to quickly
      verify that the syscalls are working correctly here:
      http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgzSigned-off-by: default avatarChris Yeoh <yeohc@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: <linux-man@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  9. 26 Aug, 2011 1 commit
  10. 15 Jul, 2011 1 commit
    • NeilBrown's avatar
      nfsd: Remove deprecated nfsctl system call and related code. · 49b28684
      NeilBrown authored
      As promised in feature-removal-schedule.txt it is time to
      remove the nfsctl system call.
      Userspace has perferred to not use this call throughout 2.6 and it has been
      excluded in the default configuration since 2.6.36 (9 months ago).
      So this patch removes all the code that was being compiled out.
      There are still references to sys_nfsctl in various arch systemcall tables
      and related code.  These should be cleaned out too, probably in the next
      merge window.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
  11. 27 Jun, 2011 1 commit
  12. 24 May, 2011 1 commit
  13. 19 May, 2011 1 commit
  14. 12 May, 2011 1 commit
    • Chris Metcalf's avatar
      compat: fixes to allow working with tile arch · be84cb43
      Chris Metcalf authored
      The existing <asm-generic/unistd.h> mechanism doesn't really provide
      enough to create the 64-bit "compat" ABI properly in a generic way,
      since the compat ABI is a mix of things were you can re-use the 64-bit
      versions of syscalls and things where you need a compat wrapper.
      To provide this in the most direct way possible, I added two new macros
      to go along with the existing __SYSCALL and __SC_3264 macros: __SC_COMP
      and SC_COMP_3264.  These macros take an additional argument, typically a
      "compat_sys_xxx" function, which is passed to __SYSCALL if you define
      __SYSCALL_COMPAT when including the header, resulting in a pointer to
      the compat function being placed in the generated syscall table.
      The change also adds some missing definitions to <linux/compat.h> so that
      it actually has declarations for all the compat syscalls, since the
      "[nr] = ##call" approach requires proper C declarations for all the
      functions included in the syscall table.
      Finally, compat.c defines compat_sys_sigpending() and
      compat_sys_sigprocmask() even if the underlying architecture doesn't
      request it, which tries to pull in undefined compat_old_sigset_t defines.
      We need to guard those compat syscall definitions with appropriate
      __ARCH_WANT_SYS_xxx ifdefs.
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarChris Metcalf <cmetcalf@tilera.com>
  15. 14 Sep, 2010 1 commit
    • H. Peter Anvin's avatar
      compat: Make compat_alloc_user_space() incorporate the access_ok() · c41d68a5
      H. Peter Anvin authored
      compat_alloc_user_space() expects the caller to independently call
      access_ok() to verify the returned area.  A missing call could
      introduce problems on some architectures.
      This patch incorporates the access_ok() check into
      compat_alloc_user_space() and also adds a sanity check on the length.
      The existing compat_alloc_user_space() implementations are renamed
      arch_compat_alloc_user_space() and are used as part of the
      implementation of the new global function.
      This patch assumes NULL will cause __get_user()/__put_user() to either
      fail or access userspace on all architectures.  This should be
      followed by checking the return value of compat_access_user_space()
      for NULL in the callers, at which time the access_ok() in the callers
      can also be removed.
      Reported-by: default avatarBen Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Acked-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: default avatarChris Metcalf <cmetcalf@tilera.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: <stable@kernel.org>
  16. 13 Aug, 2010 1 commit
  17. 27 May, 2010 1 commit
  18. 12 Mar, 2010 1 commit
  19. 08 Nov, 2009 1 commit
  20. 07 Nov, 2009 1 commit
  21. 06 Nov, 2009 1 commit
  22. 30 Apr, 2009 1 commit
  23. 04 Apr, 2009 1 commit
    • Linus Torvalds's avatar
      Make non-compat preadv/pwritev use native register size · 601cc11d
      Linus Torvalds authored
      Instead of always splitting the file offset into 32-bit 'high' and 'low'
      parts, just split them into the largest natural word-size - which in C
      terms is 'unsigned long'.
      This allows 64-bit architectures to avoid the unnecessary 32-bit
      shifting and masking for native format (while the compat interfaces will
      obviously always have to do it).
      This also changes the order of 'high' and 'low' to be "low first".  Why?
      Because when we have it like this, the 64-bit system calls now don't use
      the "pos_high" argument at all, and it makes more sense for the native
      system call to simply match the user-mode prototype.
      This results in a much more natural calling convention, and allows the
      compiler to generate much more straightforward code.  On x86-64, we now
              testq   %rcx, %rcx      # pos_l
              js      .L122   #,
              movq    %rcx, -48(%rbp) # pos_l, pos
      from the C source
              loff_t pos = pos_from_hilo(pos_h, pos_l);
              if (pos < 0)
                      return -EINVAL;
      and the 'pos_h' register isn't even touched.  It used to generate code
              mov     %r8d, %r8d      # pos_low, pos_low
              salq    $32, %rcx       #, tmp71
              movq    %r8, %rax       # pos_low, pos.386
              orq     %rcx, %rax      # tmp71, pos.386
              js      .L122   #,
              movq    %rax, -48(%rbp) # pos.386, pos
      which isn't _that_ horrible, but it does show how the natural word size
      is just a more sensible interface (same arguments will hold in the user
      level glibc wrapper function, of course, so the kernel side is just half
      of the equation!)
      Note: in all cases the user code wrapper can again be the same. You can
      just do
      	#define HALF_BITS (sizeof(unsigned long)*4)
      	__syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);
      or something like that.  That way the user mode wrapper will also be
      nicely passing in a zero (it won't actually have to do the shifts, the
      compiler will understand what is going on) for the last argument.
      And that is a good idea, even if nobody will necessarily ever care: if
      we ever do move to a 128-bit lloff_t, this particular system call might
      be left alone.  Of course, that will be the least of our worries if we
      really ever need to care, so this may not be worth really caring about.
      [ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]
      Acked-by: default avatarGerd Hoffmann <kraxel@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ralf Baechle <ralf@linux-mips.org>>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  24. 02 Apr, 2009 1 commit
    • Gerd Hoffmann's avatar
      preadv/pwritev: Add preadv and pwritev system calls. · f3554f4b
      Gerd Hoffmann authored
      This patch adds preadv and pwritev system calls.  These syscalls are a
      pretty straightforward combination of pread and readv (same for write).
      They are quite useful for doing vectored I/O in threaded applications.
      Using lseek+readv instead opens race windows you'll have to plug with
      Other systems have such system calls too, for example NetBSD, check
      here: http://www.daemon-systems.org/man/preadv.2.html
      The application-visible interface provided by glibc should look like
      this to be compatible to the existing implementations in the *BSD family:
        ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
        ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);
      This prototype has one problem though: On 32bit archs is the (64bit)
      offset argument unaligned, which the syscall ABI of several archs doesn't
      allow to do.  At least s390 needs a wrapper in glibc to handle this.  As
      we'll need a wrappers in glibc anyway I've decided to push problem to
      glibc entriely and use a syscall prototype which works without
      arch-specific wrappers inside the kernel: The offset argument is
      explicitly splitted into two 32bit values.
      The patch sports the actual system call implementation and the windup in
      the x86 system call tables.  Other archs follow as separate patches.
      Signed-off-by: default avatarGerd Hoffmann <kraxel@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  25. 27 Mar, 2009 1 commit
    • Christoph Hellwig's avatar
      generic compat_sys_ustat · 2b1c6bd7
      Christoph Hellwig authored
      Due to a different size of ino_t ustat needs a compat handler, but
      currently only x86 and mips provide one.  Add a generic compat_sys_ustat
      and switch all architectures over to it.  Instead of doing various
      user copy hacks compat_sys_ustat just reimplements sys_ustat as
      it's trivial.  This was suggested by Arnd Bergmann.
      Found by Eric Sandeen when running xfstests/017 on ppc64, which causes
      stack smashing warnings on RHEL/Fedora due to the too large amount of
      data writen by the syscall.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  26. 14 Jan, 2009 1 commit
  27. 30 Nov, 2008 1 commit
  28. 16 Oct, 2008 2 commits
    • Christoph Hellwig's avatar
      compat: generic compat get/settimeofday · b418da16
      Christoph Hellwig authored
      Nothing arch specific in get/settimeofday.  The details of the timeval
      conversion varied a little from arch to arch, but all with the same
      Also add an extern declaration for sys_tz to linux/time.h because externs
      in .c files are fowned upon.  I'll kill the externs in various other files
      in a sparate patch.
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ]
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: default avatarKyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Christoph Hellwig's avatar
      compat: move cp_compat_stat to common code · f7a5000f
      Christoph Hellwig authored
      struct stat / compat_stat is the same on all architectures, so
      cp_compat_stat should be, too.
      Turns out it is, except that various architectures have slightly and some
      high2lowuid/high2lowgid or the direct assignment instead of the
      SET_UID/SET_GID that expands to the correct one anyway.
      This patch replaces the arch-specific cp_compat_stat implementations with
      a common one based on the x86-64 one.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ]
      Acked-by: Kyle McMartin <kyle@mcmartin.ca> [ parisc bits ]
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  29. 01 May, 2008 1 commit
  30. 30 Mar, 2008 1 commit
  31. 06 Feb, 2008 1 commit
  32. 05 Feb, 2008 1 commit
    • Davide Libenzi's avatar
      timerfd: new timerfd API · 4d672e7a
      Davide Libenzi authored
      This is the new timerfd API as it is implemented by the following patch:
      int timerfd_create(int clockid, int flags);
      int timerfd_settime(int ufd, int flags,
      		    const struct itimerspec *utmr,
      		    struct itimerspec *otmr);
      int timerfd_gettime(int ufd, struct itimerspec *otmr);
      The timerfd_create() API creates an un-programmed timerfd fd.  The "clockid"
      parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
      The timerfd_settime() API give new settings by the timerfd fd, by optionally
      retrieving the previous expiration time (in case the "otmr" parameter is not
      The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
      is set in the "flags" parameter.  Otherwise it's a relative time.
      The timerfd_gettime() API returns the next expiration time of the timer, or
      {0, 0} if the timerfd has not been set yet.
      Like the previous timerfd API implementation, read(2) and poll(2) are
      supported (with the same interface).  Here's a simple test program I used to
      exercise the new timerfd APIs:
      [akpm@linux-foundation.org: coding-style cleanups]
      [akpm@linux-foundation.org: fix ia64 build]
      [akpm@linux-foundation.org: fix m68k build]
      [akpm@linux-foundation.org: fix mips build]
      [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
      [heiko.carstens@de.ibm.com: fix s390]
      [akpm@linux-foundation.org: fix powerpc build]
      [akpm@linux-foundation.org: fix sparc64 more]
      Signed-off-by: default avatarDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  33. 30 Jan, 2008 3 commits
  34. 14 May, 2007 1 commit