1. 04 May, 2013 7 commits
  2. 02 May, 2013 1 commit
  3. 01 May, 2013 4 commits
    • Rik van Riel's avatar
      ipc,sem: fine grained locking for semtimedop · 6062a8dc
      Rik van Riel authored
      
      
      Introduce finer grained locking for semtimedop, to handle the common case
      of a program wanting to manipulate one semaphore from an array with
      multiple semaphores.
      
      If the call is a semop manipulating just one semaphore in an array with
      multiple semaphores, only take the lock for that semaphore itself.
      
      If the call needs to manipulate multiple semaphores, or another caller is
      in a transaction that manipulates multiple semaphores, the sem_array lock
      is taken, as well as all the locks for the individual semaphores.
      
      On a 24 CPU system, performance numbers with the semop-multi
      test with N threads and N semaphores, look like this:
      
      	vanilla		Davidlohr's	Davidlohr's +	Davidlohr's +
      threads			patches		rwlock patches	v3 patches
      10	610652		726325		1783589		2142206
      20	341570		365699		1520453		1977878
      30	288102		307037		1498167		2037995
      40	290714		305955		1612665		2256484
      50	288620		312890		1733453		2650292
      60	289987		306043		1649360		2388008
      70	291298		306347		1723167		2717486
      80	290948		305662		1729545		2763582
      90	290996		306680		1736021		2757524
      100	292243		306700		1773700		3059159
      
      [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
      [davidlohr.bueso@hp.com: make refcounter atomic]
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarDavidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Jason Low <jason.low2@hp.com>
      Reviewed-by: default avatarMichel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: default avatarEmmanuel Benisty <benisty.e@gmail.com>
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6062a8dc
    • Rik van Riel's avatar
      ipc,sem: have only one list in struct sem_queue · 9f1bc2c9
      Rik van Riel authored
      
      
      Having only one list in struct sem_queue, and only queueing simple
      semaphore operations on the list for the semaphore involved, allows us to
      introduce finer grained locking for semtimedop.
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarDavidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9f1bc2c9
    • Rik van Riel's avatar
      ipc,sem: open code and rename sem_lock · c460b662
      Rik van Riel authored
      
      
      Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
      later that only locks the sem_array and does nothing else.
      
      Open code the locking from ipc_lock() in sem_obtain_lock() so we can
      introduce finer grained locking for the sem_array in the next patch.
      
      [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarDavidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c460b662
    • Davidlohr Bueso's avatar
      ipc,sem: do not hold ipc lock more than necessary · 16df3674
      Davidlohr Bueso authored
      
      
      Instead of holding the ipc lock for permissions and security checks, among
      others, only acquire it when necessary.
      
      Some numbers....
      
      1) With Rik's semop-multi.c microbenchmark we can see the following
         results:
      
      Baseline (3.9-rc1):
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 151452270, ops/sec 5048409
      
      +  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
      +   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock
      
      With this patchset:
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 273156400, ops/sec 9105213
      
      +  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
      +   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
      
      2) While on an Oracle swingbench DSS (data mining) workload the
         improvements are not as exciting as with Rik's benchmark, we can see
         some positive numbers.  For an 8 socket machine the following are the
         percentages of %sys time incurred in the ipc lock:
      
      Baseline (3.9-rc1):
      100 swingbench users: 8,74%
      400 swingbench users: 21,86%
      800 swingbench users: 84,35%
      
      With this patchset:
      100 swingbench users: 8,11%
      400 swingbench users: 19,93%
      800 swingbench users: 77,69%
      
      [riel@redhat.com: fix two locking bugs]
      [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarDavidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarChegu Vinod <chegu_vinod@hp.com>
      Acked-by: default avatarMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      16df3674
  4. 05 Mar, 2013 1 commit
  5. 03 Mar, 2013 1 commit
  6. 06 Sep, 2012 1 commit
  7. 02 Nov, 2011 3 commits
  8. 25 Jul, 2011 1 commit
  9. 20 Jul, 2011 1 commit
  10. 31 Mar, 2011 1 commit
  11. 23 Mar, 2011 1 commit
  12. 01 Oct, 2010 1 commit
    • Dan Rosenberg's avatar
      sys_semctl: fix kernel stack leakage · 982f7c2b
      Dan Rosenberg authored
      
      
      The semctl syscall has several code paths that lead to the leakage of
      uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
      IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
      version of the semid_ds struct.
      
      The copy_semid_to_user() function declares a semid_ds struct on the stack
      and copies it back to the user without initializing or zeroing the
      "sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
      allowing the leakage of 16 bytes of kernel stack memory.
      
      The code is still reachable on 32-bit systems - when calling semctl()
      newer glibc's automatically OR the IPC command with the IPC_64 flag, but
      invoking the syscall directly allows users to use the older versions of
      the struct.
      Signed-off-by: default avatarDan Rosenberg <dan.j.rosenberg@gmail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      982f7c2b
  13. 20 Jul, 2010 1 commit
  14. 27 May, 2010 4 commits
    • Julia Lawall's avatar
      ipc/sem.c: use ERR_CAST · 4de85cd6
      Julia Lawall authored
      Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes more
      clear what is the purpose of the operation, which otherwise looks like a
      no-op.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/
      
      )
      
      // <smpl>
      @@
      type T;
      T x;
      identifier f;
      @@
      
      T f (...) { <+...
      - ERR_PTR(PTR_ERR(x))
      + x
       ...+> }
      
      @@
      expression x;
      @@
      
      - ERR_PTR(PTR_ERR(x))
      + ERR_CAST(x)
      // </smpl>
      Signed-off-by: default avatarJulia Lawall <julia@diku.dk>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4de85cd6
    • Manfred Spraul's avatar
      ipc/sem.c: update description of the implementation · c5cf6359
      Manfred Spraul authored
      
      
      ipc/sem.c begins with a 15 year old description about bugs in the initial
      implementation in Linux-1.0.  The patch replaces that with a top level
      description of the current code.
      
      A TODO could be derived from this text:
      
      The opengroup man page for semop() does not mandate FIFO.  Thus there is
      no need for a semaphore array list of pending operations.
      
      If
      
      - this list is removed
      - the per-semaphore array spinlock is removed (possible if there is no
        list to protect)
      - sem_otime is moved into the semaphores and calculated on demand during
        semctl()
      
      then the array would be read-mostly - which would significantly improve
      scaling for applications that use semaphore arrays with lots of entries.
      
      The price would be expensive semctl() calls:
      
      	for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock);
      	<do stuff>
      	for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock);
      
      I'm not sure if the complexity is worth the effort, thus here is the
      documentation of the current behavior first.
      Signed-off-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c5cf6359
    • Manfred Spraul's avatar
      ipc/sem.c: move wake_up_process out of the spinlock section · 0a2b9d4c
      Manfred Spraul authored
      
      
      The wake-up part of semtimedop() consists out of two steps:
      
      - the right tasks must be identified.
      - they must be woken up.
      
      Right now, both steps run while the array spinlock is held.  This patch
      reorders the code and moves the actual wake_up_process() behind the point
      where the spinlock is dropped.
      
      The code also moves setting sem->sem_otime to one place: It does not make
      sense to set the last modify time multiple times.
      
      [akpm@linux-foundation.org: repair kerneldoc]
      [akpm@linux-foundation.org: fix uninitialised retval]
      Signed-off-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0a2b9d4c
    • Manfred Spraul's avatar
      ipc/sem.c: optimize update_queue() for bulk wakeup calls · fd5db422
      Manfred Spraul authored
      
      
      The following series of patches tries to fix the spinlock contention
      reported by Chris Mason - his benchmark exposes problems of the current
      code:
      
      - In the worst case, the algorithm used by update_queue() is O(N^2).
        Bulk wake-up calls can enter this worst case.  The patch series fix
        that.
      
        Note that the benchmark app doesn't expose the problem, it just should
        be fixed: Real world apps might do the wake-ups in another order than
        perfect FIFO.
      
      - The part of the code that runs within the semaphore array spinlock is
        significantly larger than necessary.
      
        The patch series fixes that.  This change is responsible for the main
        improvement.
      
      - The cacheline with the spinlock is also used for a variable that is
        read in the hot path (sem_base) and for a variable that is unnecessarily
        written to multiple times (sem_otime).  The last step of the series
        cacheline-aligns the spinlock.
      
      This patch:
      
      The SysV semaphore code allows to perform multiple operations on all
      semaphores in the array as atomic operations.  After a modification,
      update_queue() checks which of the waiting tasks can complete.
      
      The algorithm that is used to identify the tasks is O(N^2) in the worst
      case.  For some cases, it is simple to avoid the O(N^2).
      
      The patch adds a detection logic for some cases, especially for the case
      of an array where all sleeping tasks are single sembuf operations and a
      multi-sembuf operation is used to wake up multiple tasks.
      
      A big database application uses that approach.
      
      The patch fixes wakeup due to semctl(,,SETALL,) - the initial version of
      the patch breaks that.
      
      [akpm@linux-foundation.org: make do_smart_update() static]
      Signed-off-by: default avatarManfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fd5db422
  15. 16 Dec, 2009 9 commits
  16. 15 Apr, 2009 1 commit
  17. 14 Jan, 2009 2 commits