1. 12 May, 2016 1 commit
  2. 22 May, 2014 1 commit
  3. 07 May, 2014 1 commit
  4. 18 Apr, 2014 1 commit
  5. 18 Jul, 2013 1 commit
  6. 07 Feb, 2013 1 commit
  7. 26 Jul, 2012 1 commit
  8. 23 Jan, 2012 1 commit
    • Randy Dunlap's avatar
      kernel-doc: fix kernel-doc warnings in sched · fa757281
      Randy Dunlap authored
      Fix new kernel-doc notation warnings:
      Warning(include/linux/sched.h:2094): No description found for parameter 'p'
      Warning(include/linux/sched.h:2094): Excess function parameter 'tsk' description in 'is_idle_task'
      Warning(kernel/sched/cpupri.c:139): No description found for parameter 'newpri'
      Warning(kernel/sched/cpupri.c:139): Excess function parameter 'pri' description in 'cpupri_set'
      Warning(kernel/sched/cpupri.c:208): Excess function parameter 'bootmem' description in 'cpupri_init'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@xenotime.net>
      Cc:	Ingo Molnar <mingo@elte.hu>
      Cc:	Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  9. 17 Nov, 2011 1 commit
  10. 14 Aug, 2011 3 commits
    • Yong Zhang's avatar
      sched/cpupri: Remove cpupri->pri_active · 5710f15b
      Yong Zhang authored
      Since [sched/cpupri: Remove the vec->lock], member pri_active
      of struct cpupri is not needed any more, just remove it. Also
      clean stuff related to it.
      Signed-off-by: default avatarYong Zhang <yong.zhang0@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110806001004.GA2207@zhySigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
    • Steven Rostedt's avatar
      sched/cpupri: Fix memory barriers for vec updates to always be in order · d473750b
      Steven Rostedt authored
      [ This patch actually compiles. Thanks to Mike Galbraith for pointing
      that out. I compiled and booted this patch with no issues. ]
      Re-examining the cpupri patch, I see there's a possible race because the
      update of the two priorities vec->counts are not protected by a memory
      When a RT runqueue is overloaded and wants to push an RT task to another
      runqueue, it scans the RT priority vectors in a loop from lowest
      priority to highest.
      When we queue or dequeue an RT task that changes a runqueue's highest
      priority task, we update the vectors to show that a runqueue is rated at
      a different priority. To do this, we first set the new priority mask,
      and increment the vec->count, and then set the old priority mask by
      decrementing the vec->count.
      If we are lowering the runqueue's RT priority rating, it will trigger a
      RT pull, and we do not care if we miss pushing to this runqueue or not.
      But if we raise the priority, but the priority is still lower than an RT
      task that is looking to be pushed, we must make sure that this runqueue
      is still seen by the push algorithm (the loop).
      Because the loop reads from lowest to highest, and the new priority is
      set before the old one is cleared, we will either see the new or old
      priority set and the vector will be checked.
      But! Since there's no memory barrier between the updates of the two, the
      old count may be decremented first before the new count is incremented.
      This means the loop may see the old count of zero and skip it, and also
      the new count of zero before it was updated. A possible runqueue that
      the RT task could move to could be missed.
      A conditional memory barrier is placed between the vec->count updates
      and is only called when both updates are done.
      The smp_wmb() has also been changed to smp_mb__before_atomic_inc/dec(),
      as they are not needed by archs that already synchronize
      The smp_rmb() has been moved to be called at every iteration of the loop
      so that the race between seeing the two updates is visible by each
      iteration of the loop, as an arch is free to optimize the reading of
      memory of the counters in the loop.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1312547269.18583.194.camel@gandalf.stny.rr.comSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
    • Steven Rostedt's avatar
      sched/cpupri: Remove the vec->lock · c92211d9
      Steven Rostedt authored
      sched/cpupri: Remove the vec->lock
      The cpupri vec->lock has been showing up as a top contention
      lately. This is because of the RT push/pull logic takes an
      agressive approach for migrating RT tasks. The cpupri logic is
      in place to improve the performance of the push/pull when dealing
      with large number CPU machines.
      The problem though is a vec->lock is required, where a vec is a
      global per RT priority structure. That is, if there are lots of
      RT tasks at the same priority, every time they are added or removed
      from the RT queue, this global vec->lock is taken. Now that more
      kernel threads are becoming RT (RCU boost and threaded interrupts)
      this is becoming much more of an issue.
      There are two variables that are being synced by the vec->lock.
      The cpupri bitmask, and the vec->counter. The cpupri bitmask
      is one bit per priority. If a RT priority vec has a process queued,
      then the vec->count is > 0 and the cpupri bitmask is set for that
      RT priority.
      If the cpupri bitmask gets out of sync with the vec->counter, we could
      end up pushing a low proirity RT task to a high priority queue.
      That RT task that could have run immediately could be queued on a
      run queue with a higher priority task indefinitely.
      The solution is not to use the cpupri bitmask and just look at the
      vec->count directly when doing a pull. The cpupri bitmask is just
      a fast way to scan the RT priorities when a pull is made. Instead
      of using the bitmask, and just examine all RT priorities, and
      look at the vec->counts, we could eliminate the vec->lock. The
      scan of RT tasks is to find a run queue that we can push an RT task
      to, and we do not push to a high priority queue, thus the scan only
      needs to go from 1 to RT task->prio, and not all 100 RT priorities.
      The push algorithm, which does the scan of RT priorities (and
      scan of the bitmask) only happens when we have an overloaded RT run
      queue (more than one RT task queued). The grabbing of the vec->lock
      happens every time any RT task is queued or dequeued on the run
      queue for that priority. The slowing down of the scan by not using
      a bitmask is negligible by the speed up of removing the vec->lock
      contention, and replacing it with an atomic counter and memory barrier.
      To prove this, I wrote a patch that times both the loop and the code
      that grabs the vec->locks. I passed the patches to various people
      (and companies) to test and show the results. I let everyone choose
      their own load to test, giving different loads on the system,
      for various different setups.
      Here's some of the results: (snipping to a few CPUs to not make
      this change log huge, but the results were consistent across
      the entire system).
      System 1 (24 CPUs)
      Before patch:
      CPU:    Name    Count   Max     Min     Average Total
      ----    ----    -----   ---     ---     ------- -----
      cpu 20: loop    3057    1.766   0.061   0.642   1963.170
              vec     6782949 90.469  0.089   0.414   2811760.503
      cpu 21: loop    2617    1.723   0.062   0.641   1679.074
              vec     6782810 90.499  0.089   0.291   1978499.900
      cpu 22: loop    2212    1.863   0.063   0.699   1547.160
              vec     6767244 85.685  0.089   0.435   2949676.898
      cpu 23: loop    2320    2.013   0.062   0.594   1380.265
              vec     6781694 87.923  0.088   0.431   2928538.224
      After patch:
      cpu 20: loop    2078    1.579   0.061   0.533   1108.006
              vec     6164555 5.704   0.060   0.143   885185.809
      cpu 21: loop    2268    1.712   0.065   0.575   1305.248
              vec     6153376 5.558   0.060   0.187   1154960.469
      cpu 22: loop    1542    1.639   0.095   0.533   823.249
              vec     6156510 5.720   0.060   0.190   1172727.232
      cpu 23: loop    1650    1.733   0.068   0.545   900.781
              vec     6170784 5.533   0.060   0.167   1034287.953
      All times are in microseconds. The 'loop' is the amount of time spent
      doing the loop across the priorities (before patch uses bitmask).
      the 'vec' is the amount of time in the code that requires grabbing
      the vec->lock. The second patch just does not have the vec lock, but
      encompasses the same code.
      Amazingly the loop code even went down on average. The vec code went
      from .5 down to .18, that's more than half the time spent!
      Note, more than one test was run, but they all had the same results.
      System 2 (64 CPUs)
      Before patch:
      CPU:    Name    Count   Max     Min     Average Total
      ----    ----    -----   ---     ---     ------- -----
      cpu 60: loop    0       0       0       0       0
              vec     5410840 277.954 0.084   0.782   4232895.727
      cpu 61: loop    0       0       0       0       0
              vec     4915648 188.399 0.084   0.570   2803220.301
      cpu 62: loop    0       0       0       0       0
              vec     5356076 276.417 0.085   0.786   4214544.548
      cpu 63: loop    0       0       0       0       0
              vec     4891837 170.531 0.085   0.799   3910948.833
      After patch:
      cpu 60: loop    0       0       0       0       0
              vec     5365118 5.080   0.021   0.063   340490.267
      cpu 61: loop    0       0       0       0       0
              vec     4898590 1.757   0.019   0.071   347903.615
      cpu 62: loop    0       0       0       0       0
              vec     5737130 3.067   0.021   0.119   687108.734
      cpu 63: loop    0       0       0       0       0
              vec     4903228 1.822   0.021   0.071   348506.477
      The test run during the measurement did not have any (very few,
      from other CPUs) RT tasks pushing. But this shows that it helped
      out tremendously with the contention, as the contention happens
      because the vec->lock is taken only on queuing at an RT priority,
      and different CPUs that queue tasks at the same priority will
      have contention.
      I tested on my own 4 CPU machine with the following results:
      Before patch:
      CPU:    Name    Count   Max     Min     Average Total
      ----    ----    -----   ---     ---     ------- -----
      cpu 0:  loop    2377    1.489   0.158   0.588   1398.395
              vec     4484    770.146 2.301   4.396   19711.755
      cpu 1:  loop    2169    1.962   0.160   0.576   1250.110
              vec     4425    152.769 2.297   4.030   17834.228
      cpu 2:  loop    2324    1.749   0.155   0.559   1299.799
              vec     4368    779.632 2.325   4.665   20379.268
      cpu 3:  loop    2325    1.629   0.157   0.561   1306.113
              vec     4650    408.782 2.394   4.348   20222.577
      After patch:
      CPU:    Name    Count   Max     Min     Average Total
      ----    ----    -----   ---     ---     ------- -----
      cpu 0:  loop    2121    1.616   0.113   0.636   1349.189
              vec     4303    1.151   0.225   0.421   1811.966
      cpu 1:  loop    2130    1.638   0.178   0.644   1372.927
              vec     4627    1.379   0.235   0.428   1983.648
      cpu 2:  loop    2056    1.464   0.165   0.637   1310.141
              vec     4471    1.311   0.217   0.433   1937.927
      cpu 3:  loop    2154    1.481   0.162   0.601   1295.083
              vec     4236    1.253   0.230   0.425   1803.008
      This was running my migrate.c code that can be found at:
      The migrate code does stress the RT tasks a bit. This shows that
      the loop did increase a little after the patch, but not by much.
      The vec code dropped dramatically. From 4.3us down to .42us.
      That's a 10x improvement!
      Tested-by: default avatarMike Galbraith <mgalbraith@suse.de>
      Tested-by: default avatarLuis Claudio R. Gonçalves <lgoncalv@redhat.com>
      Tested-by: Matthew Hank Sabins<msabins@linux.vnet.ibm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: default avatarGregory Haskins <gregory.haskins@gmail.com>
      Acked-by: default avatarHillf Danton <dhillf@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Chris Mason <chris.mason@oracle.com>
      Link: http://lkml.kernel.org/r/1312317372.18583.101.camel@gandalf.stny.rr.comSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
  11. 17 Jul, 2010 1 commit
  12. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      The script does the followings.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
      6. percpu.h was updated not to include slab.h.
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  13. 06 Mar, 2010 1 commit
  14. 04 Feb, 2010 1 commit
  15. 01 Feb, 2010 1 commit
  16. 14 Dec, 2009 1 commit
  17. 02 Aug, 2009 2 commits
    • Steven Rostedt's avatar
      sched: Add new prio to cpupri before removing old prio · c3a2ae3d
      Steven Rostedt authored
      We need to add the new prio to the cpupri accounting before
      removing the old prio. This is because removing the old prio
      first will open a race window where the cpu will be removed
      from pri_active. In this case the cpu will not be visible for
      RT push and pulls. This could cause a RT task to not migrate
      appropriately, and create a very large latency.
      This bug was found with the use of ftrace sched events and
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090729042526.438281019@goodmis.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
    • Gregory Haskins's avatar
      sched: Fix race in cpupri introduced by cpumask_var changes · 07903af1
      Gregory Haskins authored
      Several race conditions in the scheduler have cropped up
      recently, which Steven and I have tracked down using ftrace.
      The most recent one turns out to be a race in how the scheduler
      determines a suitable migration target for RT tasks, introduced
      recently with commit:
          commit 68e74568
          Date:   Tue Nov 25 02:35:13 2008 +1030
              sched: convert struct cpupri_vec cpumask_var_t.
      The original design of cpupri allowed lockless readers to
      quickly determine a best-estimate target.  Races between the
      pri_active bitmap and the vec->mask were handled in the
      original code because we would detect and return "0" when this
      occured.  The design was predicated on the *effective*
      atomicity (*) of caching the result of cpus_and() between the
      cpus_allowed and the vec->mask.
      Commit 68e74568 changed the behavior such that vec->mask is
      accessed multiple times.  This introduces a subtle race, the
      result of which means we can have a result that returns "1",
      but with an empty bitmap.
      *) yes, we know cpus_and() is not a locked operator across the
         entire composite array, but it is implicitly atomic on a
         per-word basis which is all the design required to work.
      Rather than forgoing the lockless design, or reverting to a
      stack-based cpumask_t, we simply check for when the race has
      been encountered and continue processing in the event that the
      race is hit.  This renders the removal race as if the priority
      bit had been atomically cleared as well, and allows the
      algorithm to execute correctly.
      Signed-off-by: default avatarGregory Haskins <ghaskins@novell.com>
      CC: Rusty Russell <rusty@rustcorp.com.au>
      CC: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090730145728.25226.92769.stgit@dev.haskins.net>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  18. 17 Jun, 2009 1 commit
  19. 11 Jun, 2009 1 commit
  20. 09 Jun, 2009 1 commit
  21. 01 Apr, 2009 1 commit
  22. 06 Jan, 2009 1 commit
  23. 24 Nov, 2008 1 commit
  24. 06 Jun, 2008 1 commit