1. 09 Aug, 2009 1 commit
    • Frederic Weisbecker's avatar
      perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Frederic Weisbecker authored
      
      
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests ftrace events record sampling. In this case
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint binary record to the
      perfcounter event buffer, as a sample.
      
      Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy from the profiling callback brings
         some costs however if someone wants no such sampling to
         occur, and needs to be fixed in the future. For that we need
         to have an instant access to the perf counter attribute.
         This is a matter of a flag to add in the struct ftrace_event.
      
       - Take care of the events recursivity! Don't ever try to record
         a lock event for example, it seems some locking is used in
         the profiling fast path and lead to a tracing recursivity.
         That will be fixed using raw spinlock or recursivity
         protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      f413cdb8
  2. 07 Aug, 2009 4 commits
    • Phillip Lougher's avatar
      bzip2/lzma/gzip: fix comments describing decompressor API · daeb6b6f
      Phillip Lougher authored
      
      
      Fix and improve comments in decompress/generic.h that describe the
      decompressor API.  Also remove an unused definition, and rename INBUF_LEN
      in lib/decompress_inflate.c to conform to bzip2/lzma naming.
      Signed-off-by: default avatarPhillip Lougher <phillip@lougher.demon.co.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      daeb6b6f
    • KAMEZAWA Hiroyuki's avatar
      mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware · 4bfc4495
      KAMEZAWA Hiroyuki authored
      At first, init_task's mems_allowed is initialized as this.
       init_task->mems_allowed == node_state[N_POSSIBLE]
      
      And cpuset's top_cpuset mask is initialized as this
       top_cpuset->mems_allowed = node_state[N_HIGH_MEMORY]
      
      Before 2.6.29:
      policy's mems_allowed is initialized as this.
      
        1. update tasks->mems_allowed by its cpuset->mems_allowed.
        2. policy->mems_allowed = nodes_and(tasks->mems_allowed, user's mask)
      
      Updating task's mems_allowed in reference to top_cpuset's one.
      cpuset's mems_allowed is aware of N_HIGH_MEMORY, always.
      
      In 2.6.30: After commit 58568d2a
      
      
      ("cpuset,mm: update tasks' mems_allowed in time"), policy's mems_allowed
      is initialized as this.
      
        1. policy->mems_allowd = nodes_and(task->mems_allowed, user's mask)
      
      Here, if task is in top_cpuset, task->mems_allowed is not updated from
      init's one.  Assume user excutes command as #numactrl --interleave=all
      ,....
      
        policy->mems_allowd = nodes_and(N_POSSIBLE, ALL_SET_MASK)
      
      Then, policy's mems_allowd can includes a possible node, which has no pgdat.
      
      MPOL's INTERLEAVE just scans nodemask of task->mems_allowd and access this
      directly.
      
        NODE_DATA(nid)->zonelist even if NODE_DATA(nid)==NULL
      
      Then, what's we need is making policy->mems_allowed be aware of
      N_HIGH_MEMORY.  This patch does that.  But to do so, extra nodemask will
      be on statck.  Because I know cpumask has a new interface of
      CPUMASK_ALLOC(), I added it to node.
      
      This patch stands on old behavior.  But I feel this fix itself is just a
      Band-Aid.  But to do fundametal fix, we have to take care of memory
      hotplug and it takes time.  (task->mems_allowd should be N_HIGH_MEMORY, I
      think.)
      
      mpol_set_nodemask() should be aware of N_HIGH_MEMORY and policy's nodemask
      should be includes only online nodes.
      
      In old behavior, this is guaranteed by frequent reference to cpuset's
      code.  Now, most of them are removed and mempolicy has to check it by
      itself.
      
      To do check, a few nodemask_t will be used for calculating nodemask.  But,
      size of nodemask_t can be big and it's not good to allocate them on stack.
      
      Now, cpumask_t has CPUMASK_ALLOC/FREE an easy code for get scratch area.
      NODEMASK_ALLOC/FREE shoudl be there.
      
      [akpm@linux-foundation.org: cleanups & tweaks]
      Tested-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4bfc4495
    • Christoph Hellwig's avatar
      vfs: add __destroy_inode · 2e00c97e
      Christoph Hellwig authored
      
      
      When we want to tear down an inode that lost the add to the cache race
      in XFS we must not call into ->destroy_inode because that would delete
      the inode that won the race from the inode cache radix tree.
      
      This patch provides the __destroy_inode helper needed to fix this,
      the actual fix will be in th next patch.  As XFS was the only reason
      destroy_inode was exported we shift the export to the new __destroy_inode.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarEric Sandeen <sandeen@sandeen.net>
      2e00c97e
    • Christoph Hellwig's avatar
      vfs: fix inode_init_always calling convention · 54e34621
      Christoph Hellwig authored
      
      
      Currently inode_init_always calls into ->destroy_inode if the additional
      initialization fails.  That's not only counter-intuitive because
      inode_init_always did not allocate the inode structure, but in case of
      XFS it's actively harmful as ->destroy_inode might delete the inode from
      a radix-tree that has never been added.  This in turn might end up
      deleting the inode for the same inum that has been instanciated by
      another process and cause lots of cause subtile problems.
      
      Also in the case of re-initializing a reclaimable inode in XFS it would
      free an inode we still want to keep alive.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarEric Sandeen <sandeen@sandeen.net>
      54e34621
  3. 05 Aug, 2009 2 commits
  4. 04 Aug, 2009 1 commit
  5. 03 Aug, 2009 2 commits
  6. 02 Aug, 2009 1 commit
    • Peter Zijlstra's avatar
      perf_counter: Full task tracing · 9f498cc5
      Peter Zijlstra authored
      
      
      In order to be able to distinguish between no samples due to
      inactivity and no samples due to task ended, Arjan asked for
      PERF_EVENT_EXIT events. This is useful to the boot delay
      instrumentation (bootchart) app.
      
      This patch changes the PERF_EVENT_FORK to be emitted on every
      clone, and adds PERF_EVENT_EXIT to be emitted on task exit,
      after the task's counters have been closed.
      
      This task tracing is controlled through: attr.comm || attr.mmap
      and through the new attr.task field.
      Suggested-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      [ cleaned up perf_counter.h a bit ]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9f498cc5
  7. 01 Aug, 2009 1 commit
  8. 31 Jul, 2009 4 commits
  9. 30 Jul, 2009 2 commits
  10. 29 Jul, 2009 7 commits
  11. 28 Jul, 2009 1 commit
  12. 27 Jul, 2009 1 commit
  13. 24 Jul, 2009 1 commit
  14. 23 Jul, 2009 1 commit
  15. 22 Jul, 2009 3 commits
    • Anton Vorontsov's avatar
      of/mdio: Add support function for Ethernet fixed-link property · 24c30dbb
      Anton Vorontsov authored
      
      
      Fixed-link support is broken for the ucc_eth, gianfar, and fs_enet
      device drivers.  The "OF MDIO rework" patches removed most of the
      support. Instead of re-adding fixed-link stuff to the drivers, this
      patch adds a support function for parsing the fixed-link property
      and obtaining a dummy phy to match.
      
      Note: the dummy phy handling in arch/powerpc is a bit of a hack and
      needs to be reworked.  This function is being added now to solve the
      regression in the Ethernet drivers, but it should be considered a
      temporary measure until the fixed link handling can be reworked.
      Signed-off-by: default avatarAnton Vorontsov <avorontsov@ru.mvista.com>
      Signed-off-by: default avatarGrant Likely <grant.likely@secretlab.ca>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      24c30dbb
    • Peter Zijlstra's avatar
      perf_counter: PERF_SAMPLE_ID and inherited counters · 7f453c24
      Peter Zijlstra authored
      
      
      Anton noted that for inherited counters the counter-id as provided by
      PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
      because each inherited counter gets its own id.
      
      His suggestion was to always return the parent counter id, since that
      is the primary counter id as exposed. However, these inherited
      counters have a unique identifier so that events like
      PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
      counter gets modified, which is important when trying to normalize the
      sample streams.
      
      This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
      which is more useful anyway, since changing periods became a lot more
      common than initially thought -- rendering PERF_EVENT_PERIOD the less
      useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
      value, since it reports the value used to trigger the overflow,
      whereas PERF_EVENT_PERIOD simply reports the requested period changed,
      which might only take effect on the next cycle).
      
      This still leaves us PERF_EVENT_THROTTLE to consider, but since that
      _should_ be a rare occurrence, and linking it to a primary id is the
      most useful bit to diagnose the problem, we introduce a
      PERF_SAMPLE_STREAM_ID, for those few cases where the full
      reconstruction is important.
      
      [Does change the ABI a little, but I see no other way out]
      Suggested-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1248095846.15751.8781.camel@twins>
      7f453c24
    • Peter Zijlstra's avatar
      softirq: introduce tasklet_hrtimer infrastructure · 9ba5f005
      Peter Zijlstra authored
      commit ca109491
      
       (hrtimer: removing all ur callback modes) moved all
      hrtimer callbacks into hard interrupt context when high resolution
      timers are active. That breaks code which relied on the assumption
      that the callback happens in softirq context.
      
      Provide a generic infrastructure which combines tasklets and hrtimers
      together to provide an in-softirq hrtimer experience.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: torvalds@linux-foundation.org
      Cc: kaber@trash.net
      Cc: David Miller <davem@davemloft.net>
      LKML-Reference: <1248265724.27058.1366.camel@twins>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      9ba5f005
  16. 21 Jul, 2009 3 commits
    • Eric Paris's avatar
      inotify: use GFP_NOFS under potential memory pressure · f44aebcc
      Eric Paris authored
      
      
      inotify can have a watchs removed under filesystem reclaim.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.31-rc2 #16
      ---------------------------------
      inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
      khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3
      {IN-RECLAIM_FS-W} state was registered at:
        [<c10536ab>] __lock_acquire+0x2c9/0xac4
        [<c1053f45>] lock_acquire+0x9f/0xc2
        [<c1308872>] __mutex_lock_common+0x2d/0x323
        [<c1308c00>] mutex_lock_nested+0x2e/0x36
        [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2
        [<c108bfb6>] shrink_slab+0xe2/0x13c
        [<c108c3e1>] kswapd+0x3d1/0x55d
        [<c10449b5>] kthread+0x66/0x6b
        [<c1003fdf>] kernel_thread_helper+0x7/0x10
        [<ffffffff>] 0xffffffff
      
      Two things are needed to fix this.  First we need a method to tell
      fsnotify_create_event() to use GFP_NOFS and second we need to stop using
      one global IN_IGNORED event and allocate them one at a time.  This solves
      current issues with multiple IN_IGNORED on a queue having tail drop
      problems and simplifies the allocations since we don't have to worry about
      two tasks opperating on the IGNORED event concurrently.
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      f44aebcc
    • Alan Jenkins's avatar
      rfkill: remove too-strict __must_check · e56f0975
      Alan Jenkins authored
      
      
      Some drivers don't need the return value of rfkill_set_hw_state(),
      so it should not be marked as __must_check.
      Signed-off-by: default avatarAlan Jenkins <alan-jenkins@tuffmail.co.uk>
      Acked-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      e56f0975
    • Thomas Gleixner's avatar
      genirq: Delegate irq affinity setting to the irq thread · 591d2fb0
      Thomas Gleixner authored
      
      
      irq_set_thread_affinity() calls set_cpus_allowed_ptr() which might
      sleep, but irq_set_thread_affinity() is called with desc->lock held
      and can be called from hard interrupt context as well. The code has
      another bug as it does not hold a ref on the task struct as required
      by set_cpus_allowed_ptr().
      
      Just set the IRQTF_AFFINITY bit in action->thread_flags. The next time
      the thread runs it migrates itself. Solves all of the above problems
      nicely.
      
      Add kerneldoc to irq_set_thread_affinity() while at it.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      591d2fb0
  17. 18 Jul, 2009 1 commit
    • Thomas Gleixner's avatar
      sched: fix nr_uninterruptible accounting of frozen tasks really · 6301cb95
      Thomas Gleixner authored
      commit e3c8ca83
      
       (sched: do not count frozen tasks toward load) broke
      the nr_uninterruptible accounting on freeze/thaw. On freeze the task
      is excluded from accounting with a check for (task->flags &
      PF_FROZEN), but that flag is cleared before the task is thawed. So
      while we prevent that the task with state TASK_UNINTERRUPTIBLE
      is accounted to nr_uninterruptible on freeze we decrement
      nr_uninterruptible on thaw.
      
      Use a separate flag which is handled by the freezing task itself. Set
      it before calling the scheduler with TASK_UNINTERRUPTIBLE state and
      clear it after we return from frozen state.
      
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      6301cb95
  18. 17 Jul, 2009 2 commits
  19. 16 Jul, 2009 1 commit
  20. 15 Jul, 2009 1 commit
    • Jan Kara's avatar
      ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() · 43237b54
      Jan Kara authored
      
      
      Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to
      be a relict from some old days and setting disksize in this function does not
      make much sence. Currently it was set only by ext3_getblk().  Since the
      parameter has some effect only if create == 1, it is easy to check that the
      three callers which end up calling ext3_getblk() with create == 1 (ext3_append,
      ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      43237b54