1. 09 Oct, 2012 2 commits
  2. 30 Jul, 2012 1 commit
  3. 05 May, 2012 2 commits
  4. 11 Jan, 2012 1 commit
  5. 04 Dec, 2011 1 commit
  6. 29 Oct, 2011 1 commit
  7. 03 Aug, 2011 1 commit
  8. 26 May, 2011 1 commit
  9. 30 Mar, 2011 1 commit
  10. 29 Mar, 2011 5 commits
  11. 23 Mar, 2011 1 commit
  12. 21 Jan, 2011 2 commits
  13. 18 Oct, 2010 1 commit
    • Peter Zijlstra's avatar
      irq_work: Add generic hardirq context callbacks · e360adbe
      Peter Zijlstra authored
      
      
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system -- like wakeup a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this get to do with a
      callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarKyle McMartin <kyle@mcmartin.ca>
      Acked-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: default avatarHuang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e360adbe
  14. 19 Sep, 2010 1 commit
  15. 27 Jul, 2010 1 commit
  16. 21 Sep, 2009 1 commit
    • Ingo Molnar's avatar
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar authored
      
      
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: default avatarStephane Eranian <eranian@google.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Reviewed-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cdd6c482
  17. 01 Jul, 2009 1 commit
  18. 11 Jun, 2009 1 commit
  19. 20 Oct, 2008 1 commit
    • Matt Helsley's avatar
      container freezer: implement freezer cgroup subsystem · dc52ddc0
      Matt Helsley authored
      
      
      This patch implements a new freezer subsystem in the control groups
      framework.  It provides a way to stop and resume execution of all tasks in
      a cgroup by writing in the cgroup filesystem.
      
      The freezer subsystem in the container filesystem defines a file named
      freezer.state.  Writing "FROZEN" to the state file will freeze all tasks
      in the cgroup.  Subsequently writing "RUNNING" will unfreeze the tasks in
      the cgroup.  Reading will return the current state.
      
      * Examples of usage :
      
         # mkdir /containers/freezer
         # mount -t cgroup -ofreezer freezer  /containers
         # mkdir /containers/0
         # echo $some_pid > /containers/0/tasks
      
      to get status of the freezer subsystem :
      
         # cat /containers/0/freezer.state
         RUNNING
      
      to freeze all tasks in the container :
      
         # echo FROZEN > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         FREEZING
         # cat /containers/0/freezer.state
         FROZEN
      
      to unfreeze all tasks in the container :
      
         # echo RUNNING > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         RUNNING
      
      This is the basic mechanism which should do the right thing for user space
      task in a simple scenario.
      
      It's important to note that freezing can be incomplete.  In that case we
      return EBUSY.  This means that some tasks in the cgroup are busy doing
      something that prevents us from completely freezing the cgroup at this
      time.  After EBUSY, the cgroup will remain partially frozen -- reflected
      by freezer.state reporting "FREEZING" when read.  The state will remain
      "FREEZING" until one of these things happens:
      
      	1) Userspace cancels the freezing operation by writing "RUNNING" to
      		the freezer.state file
      	2) Userspace retries the freezing operation by writing "FROZEN" to
      		the freezer.state file (writing "FREEZING" is not legal
      		and returns EIO)
      	3) The tasks that blocked the cgroup from entering the "FROZEN"
      		state disappear from the cgroup's set of tasks.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: export thaw_process]
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarMatt Helsley <matthltc@us.ibm.com>
      Acked-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Tested-by: default avatarMatt Helsley <matthltc@us.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dc52ddc0
  20. 09 Feb, 2008 1 commit
  21. 08 Feb, 2008 1 commit
    • H. Peter Anvin's avatar
      avoid overflows in kernel/time.c · bdc80787
      H. Peter Anvin authored
      
      
      When the conversion factor between jiffies and milli- or microseconds is
      not a single multiply or divide, as for the case of HZ == 300, we currently
      do a multiply followed by a divide.  The intervening result, however, is
      subject to overflows, especially since the fraction is not simplified (for
      HZ == 300, we multiply by 300 and divide by 1000).
      
      This is exposed to the user when passing a large timeout to poll(), for
      example.
      
      This patch replaces the multiply-divide with a reciprocal multiplication on
      32-bit platforms.  When the input is an unsigned long, there is no portable
      way to do this on 64-bit platforms there is no portable way to do this
      since it requires a 128-bit intermediate result (which gcc does support on
      64-bit platforms but may generate libgcc calls, e.g.  on 64-bit s390), but
      since the output is a 32-bit integer in the cases affected, just simplify
      the multiply-divide (*3/10 instead of *300/1000).
      
      The reciprocal multiply used can have off-by-one errors in the upper half
      of the valid output range.  This could be avoided at the expense of having
      to deal with a potential 65-bit intermediate result.  Since the intent is
      to avoid overflow problems and most of the other time conversions are only
      semiexact, the off-by-one errors were considered an acceptable tradeoff.
      
      At Ralf Baechle's suggestion, this version uses a Perl script to compute
      the necessary constants.  We already have dependencies on Perl for kernel
      compiles.  This does, however, require the Perl module Math::BigInt, which
      is included in the standard Perl distribution starting with version 5.8.0.
      In order to support older versions of Perl, include a table of canned
      constants in the script itself, and structure the script so that
      Math::BigInt isn't required if pulling values from said table.
      
      Running the script requires that the HZ value is available from the
      Makefile.  Thus, this patch also adds the Kconfig variable CONFIG_HZ to the
      architectures which didn't already have it (alpha, cris, frv, h8300, m32r,
      m68k, m68knommu, sparc, v850, and xtensa.) It does *not* touch the sh or
      sh64 architectures, since Paul Mundt has dealt with those separately in the
      sh tree.
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>,
      Cc: Sam Ravnborg <sam@ravnborg.org>,
      Cc: Paul Mundt <lethal@linux-sh.org>,
      Cc: Richard Henderson <rth@twiddle.net>,
      Cc: Michael Starvik <starvik@axis.com>,
      Cc: David Howells <dhowells@redhat.com>,
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>,
      Cc: Hirokazu Takata <takata@linux-m32r.org>,
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>,
      Cc: Roman Zippel <zippel@linux-m68k.org>,
      Cc: William L. Irwin <sparclinux@vger.kernel.org>,
      Cc: Chris Zankel <chris@zankel.net>,
      Cc: H. Peter Anvin <hpa@zytor.com>,
      Cc: Jan Engelhardt <jengelh@computergmbh.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bdc80787
  22. 05 Feb, 2008 1 commit
  23. 03 Feb, 2008 2 commits
  24. 01 Feb, 2008 2 commits
  25. 19 Oct, 2007 1 commit
  26. 17 May, 2007 1 commit
    • Christoph Lameter's avatar
      Slab allocators: define common size limitations · 0aa817f0
      Christoph Lameter authored
      
      
      Currently we have a maze of configuration variables that determine the
      maximum slab size.  Worst of all it seems to vary between SLAB and SLUB.
      
      So define a common maximum size for kmalloc.  For conveniences sake we use
      the maximum size ever supported which is 32 MB.  We limit the maximum size
      to a lower limit if MAX_ORDER does not allow such large allocations.
      
      For many architectures this patch will have the effect of adding large
      kmalloc sizes.  x86_64 adds 5 new kmalloc sizes.  So a small amount of
      memory will be needed for these caches (contemporary SLAB has dynamically
      sizeable node and cpu structure so the waste is less than in the past)
      
      Most architectures will then be able to allocate object with sizes up to
      MAX_ORDER.  We have had repeated breakage (in fact whenever we doubled the
      number of supported processors) on IA64 because one or the other struct
      grew beyond what the slab allocators supported.  This will avoid future
      issues and f.e.  avoid fixes for 2k and 4k cpu support.
      
      CONFIG_LARGE_ALLOCS is no longer necessary so drop it.
      
      It fixes sparc64 with SLAB.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatar"David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0aa817f0
  27. 09 May, 2007 1 commit
  28. 07 May, 2007 1 commit
    • Christoph Lameter's avatar
      SLUB core · 81819f0f
      Christoph Lameter authored
      
      
      This is a new slab allocator which was motivated by the complexity of the
      existing code in mm/slab.c. It attempts to address a variety of concerns
      with the existing implementation.
      
      A. Management of object queues
      
         A particular concern was the complex management of the numerous object
         queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
         each allocating CPU and use objects from a slab directly instead of
         queueing them up.
      
      B. Storage overhead of object queues
      
         SLAB Object queues exist per node, per CPU. The alien cache queue even
         has a queue array that contain a queue for each processor on each
         node. For very large systems the number of queues and the number of
         objects that may be caught in those queues grows exponentially. On our
         systems with 1k nodes / processors we have several gigabytes just tied up
         for storing references to objects for those queues  This does not include
         the objects that could be on those queues. One fears that the whole
         memory of the machine could one day be consumed by those queues.
      
      C. SLAB meta data overhead
      
         SLAB has overhead at the beginning of each slab. This means that data
         cannot be naturally aligned at the beginning of a slab block. SLUB keeps
         all meta data in the corresponding page_struct. Objects can be naturally
         aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
         boundaries and can fit tightly into a 4k page with no bytes left over.
         SLAB cannot do this.
      
      D. SLAB has a complex cache reaper
      
         SLUB does not need a cache reaper for UP systems. On SMP systems
         the per CPU slab may be pushed back into partial list but that
         operation is simple and does not require an iteration over a list
         of objects. SLAB expires per CPU, shared and alien object queues
         during cache reaping which may cause strange hold offs.
      
      E. SLAB has complex NUMA policy layer support
      
         SLUB pushes NUMA policy handling into the page allocator. This means that
         allocation is coarser (SLUB does interleave on a page level) but that
         situation was also present before 2.6.13. SLABs application of
         policies to individual slab objects allocated in SLAB is
         certainly a performance concern due to the frequent references to
         memory policies which may lead a sequence of objects to come from
         one node after another. SLUB will get a slab full of objects
         from one node and then will switch to the next.
      
      F. Reduction of the size of partial slab lists
      
         SLAB has per node partial lists. This means that over time a large
         number of partial slabs may accumulate on those lists. These can
         only be reused if allocator occur on specific nodes. SLUB has a global
         pool of partial slabs and will consume slabs from that pool to
         decrease fragmentation.
      
      G. Tunables
      
         SLAB has sophisticated tuning abilities for each slab cache. One can
         manipulate the queue sizes in detail. However, filling the queues still
         requires the uses of the spin lock to check out slabs. SLUB has a global
         parameter (min_slab_order) for tuning. Increasing the minimum slab
         order can decrease the locking overhead. The bigger the slab order the
         less motions of pages between per CPU and partial lists occur and the
         better SLUB will be scaling.
      
      G. Slab merging
      
         We often have slab caches with similar parameters. SLUB detects those
         on boot up and merges them into the corresponding general caches. This
         leads to more effective memory use. About 50% of all caches can
         be eliminated through slab merging. This will also decrease
         slab fragmentation because partial allocated slabs can be filled
         up again. Slab merging can be switched off by specifying
         slub_nomerge on boot up.
      
         Note that merging can expose heretofore unknown bugs in the kernel
         because corrupted objects may now be placed differently and corrupt
         differing neighboring objects. Enable sanity checks to find those.
      
      H. Diagnostics
      
         The current slab diagnostics are difficult to use and require a
         recompilation of the kernel. SLUB contains debugging code that
         is always available (but is kept out of the hot code paths).
         SLUB diagnostics can be enabled via the "slab_debug" option.
         Parameters can be specified to select a single or a group of
         slab caches for diagnostics. This means that the system is running
         with the usual performance and it is much more likely that
         race conditions can be reproduced.
      
      I. Resiliency
      
         If basic sanity checks are on then SLUB is capable of detecting
         common error conditions and recover as best as possible to allow the
         system to continue.
      
      J. Tracing
      
         Tracing can be enabled via the slab_debug=T,<slabcache> option
         during boot. SLUB will then protocol all actions on that slabcache
         and dump the object contents on free.
      
      K. On demand DMA cache creation.
      
         Generally DMA caches are not needed. If a kmalloc is used with
         __GFP_DMA then just create this single slabcache that is needed.
         For systems that have no ZONE_DMA requirement the support is
         completely eliminated.
      
      L. Performance increase
      
         Some benchmarks have shown speed improvements on kernbench in the
         range of 5-10%. The locking overhead of slub is based on the
         underlying base allocation size. If we can reliably allocate
         larger order pages then it is possible to increase slub
         performance much further. The anti-fragmentation patches may
         enable further performance increases.
      
      Tested on:
      i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator
      
      SLUB Boot options
      
      slub_nomerge		Disable merging of slabs
      slub_min_order=x	Require a minimum order for slab caches. This
      			increases the managed chunk size and therefore
      			reduces meta data and locking overhead.
      slub_min_objects=x	Mininum objects per slab. Default is 8.
      slub_max_order=x	Avoid generating slabs larger than order specified.
      slub_debug		Enable all diagnostics for all caches
      slub_debug=<options>	Enable selective options for all caches
      slub_debug=<o>,<cache>	Enable selective options for a certain set of
      			caches
      
      Available Debug options
      F		Double Free checking, sanity and resiliency
      R		Red zoning
      P		Object / padding poisoning
      U		Track last free / alloc
      T		Trace all allocs / frees (only use for individual slabs).
      
      To use SLUB: Apply this patch and then select SLUB as the default slab
      allocator.
      
      [hugh@veritas.com: fix an oops-causing locking error]
      [akpm@linux-foundation.org: various stupid cleanups and small fixes]
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      81819f0f
  29. 11 Feb, 2007 1 commit
    • Christoph Lameter's avatar
      [PATCH] optional ZONE_DMA: introduce CONFIG_ZONE_DMA · 66701b14
      Christoph Lameter authored
      
      
      This patch simply defines CONFIG_ZONE_DMA for all arches.  We later do special
      things with CONFIG_ZONE_DMA after the VM and an arch are prepared to work
      without ZONE_DMA.
      
      CONFIG_ZONE_DMA can be defined in two ways depending on how an architecture
      handles ISA DMA.
      
      First if CONFIG_GENERIC_ISA_DMA is set by the arch then we know that the arch
      needs ZONE_DMA because ISA DMA devices are supported.  We can catch this in
      mm/Kconfig and do not need to modify arch code.
      
      Second, arches may use ZONE_DMA in an unknown way.  We set CONFIG_ZONE_DMA for
      all arches that do not set CONFIG_GENERIC_ISA_DMA in order to insure backwards
      compatibility.  The arches may later undefine ZONE_DMA if their arch code has
      been verified to not depend on ZONE_DMA.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      66701b14
  30. 08 Dec, 2006 1 commit
    • David Howells's avatar
      [PATCH] LOG2: Implement a general integer log2 facility in the kernel · f0d1b0b3
      David Howells authored
      
      
      This facility provides three entry points:
      
      	ilog2()		Log base 2 of unsigned long
      	ilog2_u32()	Log base 2 of u32
      	ilog2_u64()	Log base 2 of u64
      
      These facilities can either be used inside functions on dynamic data:
      
      	int do_something(long q)
      	{
      		...;
      		y = ilog2(x)
      		...;
      	}
      
      Or can be used to statically initialise global variables with constant values:
      
      	unsigned n = ilog2(27);
      
      When performing static initialisation, the compiler will report "error:
      initializer element is not constant" if asked to take a log of zero or of
      something not reducible to a constant.  They treat negative numbers as
      unsigned.
      
      When not dealing with a constant, they fall back to using fls() which permits
      them to use arch-specific log calculation instructions - such as BSR on
      x86/x86_64 or SCAN on FRV - if available.
      
      [akpm@osdl.org: MMC fix]
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      f0d1b0b3
  31. 03 Oct, 2006 1 commit