1. 19 May, 2016 1 commit
  2. 17 Mar, 2016 1 commit
    • John Stultz's avatar
      timer: convert timer_slack_ns from unsigned long to u64 · da8b44d5
      John Stultz authored
      This patchset introduces a /proc/<pid>/timerslack_ns interface which
      would allow controlling processes to be able to set the timerslack value
      on other processes in order to save power by avoiding wakeups (Something
      Android currently does via out-of-tree patches).
      The first patch tries to fix the internal timer_slack_ns usage which was
      defined as a long, which limits the slack range to ~4 seconds on 32bit
      systems.  It converts it to a u64, which provides the same basically
      unlimited slack (500 years) on both 32bit and 64bit machines.
      The second patch introduces the /proc/<pid>/timerslack_ns interface
      which allows the full 64bit slack range for a task to be read or set on
      both 32bit and 64bit machines.
      With these two patches, on a 32bit machine, after setting the slack on
      bash to 10 seconds:
      $ time sleep 1
      real    0m10.747s
      user    0m0.001s
      sys     0m0.005s
      The first patch is a little ugly, since I had to chase the slack delta
      arguments through a number of functions converting them to u64s.  Let me
      know if it makes sense to break that up more or not.
      Other than that things are fairly straightforward.
      This patch (of 2):
      The timer_slack_ns value in the task struct is currently a unsigned
      long.  This means that on 32bit applications, the maximum slack is just
      over 4 seconds.  However, on 64bit machines, its much much larger (~500
      This disparity could make application development a little (as well as
      the default_slack) to a u64.  This means both 32bit and 64bit systems
      have the same effective internal slack range.
      Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
      the interface as a unsigned long, so we preserve that limitation on
      32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
      long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
      actually larger then what can be stored by an unsigned long.
      This patch also modifies hrtimer functions which specified the slack
      delta as a unsigned long.
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  3. 13 Oct, 2012 1 commit
  4. 23 Mar, 2012 1 commit
    • Hans Verkuil's avatar
      poll: add poll_requested_events() and poll_does_not_wait() functions · 626cf236
      Hans Verkuil authored
      In some cases the poll() implementation in a driver has to do different
      things depending on the events the caller wants to poll for.  An example
      is when a driver needs to start a DMA engine if the caller polls for
      POLLIN, but doesn't want to do that if POLLIN is not requested but instead
      only POLLOUT or POLLPRI is requested.  This is something that can happen
      in the video4linux subsystem among others.
      Unfortunately, the current epoll/poll/select implementation doesn't
      provide that information reliably.  The poll_table_struct does have it: it
      has a key field with the event mask.  But once a poll() call matches one
      or more bits of that mask any following poll() calls are passed a NULL
      poll_table pointer.
      Also, the eventpoll implementation always left the key field at ~0 instead
      of using the requested events mask.
      This was changed in eventpoll.c so the key field now contains the actual
      events that should be polled for as set by the caller.
      The solution to the NULL poll_table pointer is to set the qproc field to
      NULL in poll_table once poll() matches the events, not the poll_table
      pointer itself.  That way drivers can obtain the mask through a new
      poll_requested_events inline.
      The poll_table_struct can still be NULL since some kernel code calls it
      internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
      that case poll_requested_events() returns ~0 (i.e.  all events).
      Very rarely drivers might want to know whether poll_wait will actually
      wait.  If another earlier file descriptor in the set already matched the
      events the caller wanted to wait for, then the kernel will return from the
      select() call without waiting.  This might be useful information in order
      to avoid doing expensive work.
      A new helper function poll_does_not_wait() is added that drivers can use
      to detect this situation.  This is now used in sock_poll_wait() in
      include/net/sock.h.  This was the only place in the kernel that needed
      this information.
      Drivers should no longer access any of the poll_table internals, but use
      the poll_requested_events() and poll_does_not_wait() access functions
      instead.  In order to enforce that the poll_table fields are now prepended
      with an underscore and a comment was added warning against using them
      This required a change in unix_dgram_poll() in unix/af_unix.c which used
      the key field to get the requested events.  It's been replaced by a call
      to poll_requested_events().
      For qproc it was especially important to change its name since the
      behavior of that field changes with this patch since this function pointer
      can now be NULL when that wasn't possible in the past.
      Any driver accessing the qproc or key fields directly will now fail to compile.
      Some notes regarding the correctness of this patch: the driver's poll()
      function is called with a 'struct poll_table_struct *wait' argument.  This
      pointer may or may not be NULL, drivers can never rely on it being one or
      the other as that depends on whether or not an earlier file descriptor in
      the select()'s fdset matched the requested events.
      There are only three things a driver can do with the wait argument:
      1) obtain the key field:
      	events = wait ? wait->key : ~0;
         This will still work although it should be replaced with the new
         poll_requested_events() function (which does exactly the same).
         This will now even work better, since wait is no longer set to NULL
      2) use the qproc callback. This could be deadly since qproc can now be
         NULL. Renaming qproc should prevent this from happening. There are no
         kernel drivers that actually access this callback directly, BTW.
      3) test whether wait == NULL to determine whether poll would return without
         waiting. This is no longer sufficient as the correct test is now
         wait == NULL || wait->_qproc == NULL.
         However, the worst that can happen here is a slight performance hit in
         the case where wait != NULL and wait->_qproc == NULL. In that case the
         driver will assume that poll_wait() will actually add the fd to the set
         of waiting file descriptors. Of course, poll_wait() will not do that
         since it tests for wait->_qproc. This will not break anything, though.
         There is only one place in the whole kernel where this happens
         (sock_poll_wait() in include/net/sock.h) and that code will be replaced
         by a call to poll_does_not_wait() in the next patch.
         Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
         actually waiting. The next file descriptor from the set might match the
         event mask and thus any possible waits will never happen.
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Reviewed-by: default avatarJonathan Corbet <corbet@lwn.net>
      Reviewed-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  5. 31 Mar, 2011 1 commit
  6. 10 Dec, 2010 1 commit
  7. 27 Oct, 2010 1 commit
  8. 12 Mar, 2010 1 commit
  9. 04 Oct, 2009 1 commit
  10. 16 Jun, 2009 1 commit
  11. 06 Jan, 2009 1 commit
    • Tejun Heo's avatar
      poll: allow f_op->poll to sleep · 5f820f64
      Tejun Heo authored
      f_op->poll is the only vfs operation which is not allowed to sleep.  It's
      because poll and select implementation used task state to synchronize
      against wake ups, which doesn't have to be the case anymore as wait/wake
      interface can now use custom wake up functions.  The non-sleep restriction
      can be a bit tricky because ->poll is not called from an atomic context
      and the result of accidentally sleeping in ->poll only shows up as
      temporary busy looping when the timing is right or rather wrong.
      This patch converts poll/select to use custom wake up function and use
      separate triggered variable to synchronize against wake up events.  The
      only added overhead is an extra function call during wake up and
      This patch removes the one non-sleep exception from vfs locking rules and
      is beneficial to userland filesystem implementations like FUSE, 9p or
      peculiar fs like spufs as it's very difficult for those to implement
      non-sleeping poll method.
      While at it, make the following cosmetic changes to make poll.h and
      select.c checkpatch friendly.
      * s/type * symbol/type *symbol/		   : three places in poll.h
      * remove blank line before EXPORT_SYMBOL() : two places in select.c
      Oleg: spotted missing barrier in poll_schedule_timeout()
      Davide: spotted missing write barrier in pollwake()
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Brad Boyer <flar@allandria.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  12. 05 Sep, 2008 2 commits
  13. 01 May, 2008 1 commit
  14. 11 Sep, 2007 1 commit
    • Alexey Dobriyan's avatar
      Fix select on /proc files without ->poll · dd23aae4
      Alexey Dobriyan authored
      Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
      786d7e16 aka "Fix rmmod/read/write races
      in /proc entries" broke SBCL + SLIME combo.
      The old code in do_select() used DEFAULT_POLLMASK, if couldn't find
      ->poll handler.  The new code makes ->poll always there and returns 0 by
      default, which is not correct.  Return DEFAULT_POLLMASK instead.
      Steps to reproduce:
      	install emacs, SBCL, SLIME
      	M-x slime	in *inferior-lisp* buffer
      	[watch it doing "Connecting to Swank on port X.."]
      Please, apply before 2.6.23.
      P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  15. 04 Dec, 2006 1 commit
  16. 28 Mar, 2006 1 commit
    • Andi Kleen's avatar
      [PATCH] Optimize select/poll by putting small data sets on the stack · 70674f95
      Andi Kleen authored
      Optimize select and poll by a using stack space for small fd sets
      This brings back an old optimization from Linux 2.0.  Using the stack is
      faster than kmalloc.  On a Intel P4 system it speeds up a select of a
      single pty fd by about 13% (~4000 cycles -> ~3500)
      It also saves memory because a daemon hanging in select or poll will
      usually save one or two less pages.  This can add up - e.g.  if you have 10
      daemons blocking in poll/select you save 40KB of memory.
      I did a patch for this long ago, but it was never applied.  This version is
      a reimplementation of the old patch that tries to be less intrusive.  I
      only did the minimal changes needed for the stack allocation.
      The cut off point before external memory is allocated is currently at
      832bytes.  The system calls always allocate this much memory on the stack.
      These 832 bytes are divided into 256 bytes frontend data (for the select
      bitmaps of the pollfds) and the rest of the space for the wait queues used
      by the low level drivers.  There are some extreme cases where this won't
      work out for select and it falls back to allocating memory too early -
      especially with very sparse large select bitmaps - but the majority of
      processes who only have a small number of file descriptors should be ok.
      [TBD: 832/256 might not be the best split for select or poll]
      I suspect more optimizations might be possible, but they would be more
      complicated.  One way would be to cache the select/poll context over
      multiple system calls because typically the input values should be similar.
       Problem is when to flush the file descriptors out though.
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  17. 18 Jan, 2006 1 commit
    • David Woodhouse's avatar
      [PATCH] Add pselect/ppoll system call implementation · 9f72949f
      David Woodhouse authored
      The following implementation of ppoll() and pselect() system calls
      depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
      These system calls have to change the signal mask during their
      operation, and signal handlers must be invoked using the new, temporary
      signal mask. The old signal mask must be restored either upon successful
      exit from the system call, or upon returning from the invoked signal
      handler if the system call is interrupted. We can't simply restore the
      original signal mask and return to userspace, since the restored signal
      mask may actually block the signal which interrupted the system call.
      The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
      path to trap into do_signal() just as TIF_SIGPENDING does, and by
      causing do_signal() to use the saved signal mask instead of the current
      signal mask when setting up the stack frame for the signal handler -- or
      by causing do_signal() to simply restore the saved signal mask in the
      case where there is no handler to be invoked.
      The first patch implements the sys_pselect() and sys_ppoll() system
      calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
      #ifdef should go away in time when all architectures have implemented
      it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
      kernel (in the -mm tree), and the third patch then removes the
      arch-specific implementations of sys_rt_sigsuspend() and replaces them
      with generic versions using the same trick.
      The fourth and fifth patches, provided by David Howells, implement
      TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
      adds the syscalls to the i386 syscall table.
      This patch:
      Add the pselect() and ppoll() system calls, providing core routines usable by
      the original select() and poll() system calls and also the new calls (with
      their semantics w.r.t timeouts).
      Signed-off-by: default avatarDavid Woodhouse <dwmw2@infradead.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  18. 16 Apr, 2005 1 commit
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      Let it rip!