1. 20 May, 2016 2 commits
    • mm, compaction: distinguish between full and partial COMPACT_COMPLETE · c8f7de0b
      Michal Hocko authored
      COMPACT_COMPLETE currently means that the compaction and free scanners
      met.  On its own this is not very useful to somebody who wants to act
      on that feedback: the current caller might just have happened to scan
      a tiny portion of the zone, and that could be the reason no suitable
      pages were compacted.  Make sure we distinguish full and partial zone
      walks.
      
      Consumers should treat COMPACT_PARTIAL_SKIPPED as a potential success
      and be optimistic in retrying.
      
      The existing users of COMPACT_COMPLETE are conservatively changed to
      handle COMPACT_PARTIAL_SKIPPED as well, but some of them should
      probably be reconsidered to defer compaction only for
      COMPACT_COMPLETE under the new semantics.
      
      This patch shouldn't introduce any functional changes.
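      
      A minimal sketch of how a consumer might act on the new state (the
      helper below is hypothetical, not part of this patch; the real enum
      in include/linux/compaction.h has more states than shown):
      
      #include <stdbool.h>
      
      /* Names as in the patch; only the two states discussed are shown. */
      enum compact_result {
              COMPACT_PARTIAL_SKIPPED,  /* only part of the zone was scanned */
              COMPACT_COMPLETE,         /* a full zone walk found nothing */
      };
      
      /* Hypothetical consumer: be optimistic about a partial walk. */
      static bool compaction_worth_retrying(enum compact_result result)
      {
              return result == COMPACT_PARTIAL_SKIPPED;
      }
      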
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, compaction: distinguish COMPACT_DEFERRED from COMPACT_SKIPPED · 1d4746d3
      Michal Hocko authored
      try_to_compact_pages() can currently return COMPACT_SKIPPED even when
      compaction is deferred for some zone, just because the DMA zone is
      skipped in 99% of cases due to watermark checks.  This makes
      COMPACT_DEFERRED basically unusable as a feedback mechanism for the
      page allocator.
      
      Make sure we distinguish those two states properly and switch their
      ordering in the enum.  This means that COMPACT_SKIPPED is returned
      only when all eligible zones are skipped.
      
      As a result, COMPACT_DEFERRED handling for THP in
      __alloc_pages_slowpath() becomes more precise: we bail out rather
      than reclaim.
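      
      A sketch of the ordering idea (values abridged; the real enum in
      include/linux/compaction.h has more states): the per-zone aggregation
      keeps the maximum result, so with COMPACT_SKIPPED ordered below
      COMPACT_DEFERRED a deferred zone is no longer masked by zones skipped
      on watermarks.
      
      enum compact_result {
              COMPACT_SKIPPED,    /* not attempted, e.g. failed watermarks */
              COMPACT_DEFERRED,   /* deferred due to recent failures */
              /* ... remaining states ... */
      };
      
      /* Folding one zone's status into the overall return value. */
      static enum compact_result fold_zone_result(enum compact_result rc,
                                                  enum compact_result status)
      {
              return status > rc ? status : rc;
      }
      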
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 13 May, 2016 1 commit
    • KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger authored
      Some wakeups should not be considered a successful poll.  For example,
      on s390 I/O interrupts are usually floating, which means that _ALL_
      CPUs would be considered runnable - letting all vCPUs poll all the
      time for transaction-like workloads, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups as
      good or bad with regard to polling.
      
      For s390 the implementation will fence off halt polling for anything
      but known good, single-vCPU events.  The s390 implementation for
      floating interrupts does a wakeup for one vCPU, but the interrupt will
      be delivered by whatever CPU checks first for a pending interrupt.  We
      favor the woken-up CPU by marking its poll as a "good" poll.
      This code will also mark several other wakeup reasons, like IPIs or
      expired timers, as "good".  This will of course also mark some events
      as not successful.  Since KVM on z always runs as a 2nd-level
      hypervisor, we prefer not to poll unless we are really sure, though.
      
      This patch successfully limits CPU usage for cases like the uperf
      1-byte transactional ping-pong workload or wakeup-heavy workloads
      like OLTP, while still providing a proper speedup.
      
      This also introduces a new vcpu stat, "halt_poll_no_tuning", that
      counts wakeups considered not good for polling.
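      
      A hedged sketch of the shape this qualification takes (identifiers
      below are illustrative, not necessarily the exact API this commit
      adds; generic code would treat every wakeup as valid unless the
      architecture says otherwise):
      
      #include <stdbool.h>
      
      struct vcpu_stat {
              unsigned long halt_successful_poll;
              unsigned long halt_poll_no_tuning;  /* wakeups not good for polling */
      };
      
      /* Arch hook (illustrative name): s390 would return false for
       * floating interrupts and true for known good single-vCPU events
       * such as IPIs or expired timers. */
      static bool arch_vcpu_valid_wakeup(void)
      {
              return true;  /* default: every wakeup qualifies */
      }
      
      /* A poll hit only tunes the poll window if the wakeup was "good". */
      static void account_poll_success(struct vcpu_stat *stat, bool *grow_poll)
      {
              stat->halt_successful_poll++;
              if (!arch_vcpu_valid_wakeup()) {
                      stat->halt_poll_no_tuning++;
                      *grow_poll = false;  /* bad wakeups must not extend polling */
              }
      }
      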
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 02 May, 2016 1 commit
  4. 11 Apr, 2016 2 commits
  5. 10 Apr, 2016 1 commit
  6. 04 Apr, 2016 1 commit
  7. 01 Apr, 2016 1 commit
  8. 31 Mar, 2016 2 commits
    • rcu: Enforce expedited-GP fairness via funnel wait queue · f6a12f34
      Paul E. McKenney authored
      The current mutex-based funnel-locking approach used by expedited grace
      periods is subject to severe unfairness.  The problem arises when a
      few tasks, making a path from leaves to root, all wake up before other
      tasks do.  A new task can then follow this path all the way to the root,
      which needlessly delays tasks whose grace period is done, but who do
      not happen to acquire the lock quickly enough.
      
      This commit avoids this problem by maintaining per-rcu_node wait queues,
      along with a per-rcu_node counter that tracks the latest grace period
      sought by an earlier task to visit this node.  If that grace period
      would satisfy the current task, instead of proceeding up the tree,
      it waits on the current rcu_node structure using a pair of wait queues
      provided for that purpose.  This decouples awakening of old tasks from
      the arrival of new tasks.
      
      If the wakeups prove to be a bottleneck, additional kthreads can be
      brought to bear for that purpose.
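      
      A hedged sketch of the funnel-wait decision (field names follow the
      description above, not necessarily the exact rcu_node layout):
      
      #include <stdbool.h>
      
      struct rcu_node_sketch {
              unsigned long exp_seq_rq;  /* latest expedited GP sequence
                                            requested at this node */
              /* plus the pair of wait queues described above */
      };
      
      /* Climb toward the root only if no earlier task has already
       * requested a grace period that would satisfy us; otherwise we
       * wait on this node, decoupled from newly arriving tasks. */
      static bool must_climb(struct rcu_node_sketch *rnp, unsigned long s)
      {
              if (rnp->exp_seq_rq >= s)
                      return false;  /* wait here for the pending GP */
              rnp->exp_seq_rq = s;   /* under the node lock in the real code */
              return true;
      }
      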
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  9. 20 Mar, 2016 1 commit
    • ipv6, trace: fix tos reporting on fib6_table_lookup · 69716a2b
      Daniel Borkmann authored
      flowi6_tos of struct flowi6 is unused in IPv6, so dumping tos in that
      tracepoint also gives incorrect information wrt the traffic class.
      
      To fix it, we need to extract the traffic class via
      ip6_tclass(flp->flowlabel).  While for the same test case I got a
      count of 0 non-zero tos values before the change, they now start to
      show up after the change:
      
        # ./perf record -e fib6:fib6_table_lookup -a sleep 10
        # ./perf script | grep -v "tos 0" | wc -l
        60
      
      Since there are no longer any users of flowi6_tos in the kernel tree,
      remove the define to avoid any future confusion on this.
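      
      For reference, a standalone version of the extraction ip6_tclass()
      performs, assuming the standard IPv6 flowinfo layout (4-bit version,
      8-bit traffic class, 20-bit flow label):
      
      #include <stdint.h>
      #include <arpa/inet.h>
      
      /* Traffic class occupies bits 20..27 of the big-endian flowinfo word. */
      static uint8_t tclass_from_flowinfo(uint32_t flowinfo_be)
      {
              return (uint8_t)((ntohl(flowinfo_be) >> 20) & 0xff);
      }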
      
      Fixes: b811580d ("net: IPv6 fib lookup tracepoint")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 17 Mar, 2016 3 commits
    • mm/page_ref: add tracepoint to track down page reference manipulation · 95813b8f
      Joonsoo Kim authored
      CMA allocation should be guaranteed to succeed by definition, but,
      unfortunately, it sometimes fails.  The problem is hard to track down,
      because it is related to page reference manipulation and we don't have
      any facility to analyze it.
      
      This patch adds tracepoints to track down page reference manipulation.
      With them, we can find the exact reason for the failure and fix it.
      The following is an example of tracepoint output.  (Note: this example
      is from a stale version that printed flags as numbers; recent versions
      print them as human-readable strings.)
      
      <...>-9018  [004]    92.678375: page_ref_set:         pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
      <...>-9018  [004]    92.678378: kernel_stack:
       => get_page_from_freelist (ffffffff81176659)
       => __alloc_pages_nodemask (ffffffff81176d22)
       => alloc_pages_vma (ffffffff811bf675)
       => handle_mm_fault (ffffffff8119e693)
       => __do_page_fault (ffffffff810631ea)
       => trace_do_page_fault (ffffffff81063543)
       => do_async_page_fault (ffffffff8105c40a)
       => async_page_fault (ffffffff817581d8)
      [snip]
      <...>-9018  [004]    92.678379: page_ref_mod:         pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
      [snip]
      ...
      ...
      <...>-9131  [001]    93.174468: test_pages_isolated:  start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
      [snip]
      <...>-9018  [004]    93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
       => release_pages (ffffffff8117c9e4)
       => free_pages_and_swap_cache (ffffffff811b0697)
       => tlb_flush_mmu_free (ffffffff81199616)
       => tlb_finish_mmu (ffffffff8119a62c)
       => exit_mmap (ffffffff811a53f7)
       => mmput (ffffffff81073f47)
       => do_exit (ffffffff810794e9)
       => do_group_exit (ffffffff81079def)
       => SyS_exit_group (ffffffff81079e74)
       => entry_SYSCALL_64_fastpath (ffffffff817560b6)
      
      This output shows that the problem comes from the exit path.  In the
      exit path, to improve performance, pages are not freed immediately;
      they are gathered and processed in batches.  During this process
      migration is not possible, so the CMA allocation fails.  This problem
      is hard to find without this page reference tracepoint facility.
      
      Enabling this feature bloats the kernel text by 30 KB in my
      configuration.
      
         text    data     bss     dec     hex filename
      12127327        2243616 1507328 15878271         f2487f vmlinux_disabled
      12157208        2258880 1507328 15923416         f2f8d8 vmlinux_enabled
      
      Note that, due to a header file dependency problem between mm.h and
      tracepoint.h, this feature has to open-code the static key functions
      for tracepoints, as proposed by Steven Rostedt in the following link.
      
      https://lkml.org/lkml/2015/12/9/699
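      
      A hedged sketch of that open-coding (modeled on
      include/linux/page_ref.h of this era; details may differ): the header
      declares the tracepoint symbol and an out-of-line helper directly and
      tests the static key itself, instead of using the trace_*() wrapper
      generated by tracepoint.h.
      
      extern struct tracepoint __tracepoint_page_ref_set;
      extern void __page_ref_set(struct page *page, int v);
      
      #define page_ref_tracepoint_active(t) static_key_false(&(t).key)
      
      static inline void set_page_count(struct page *page, int v)
      {
              atomic_set(&page->_count, v);
              if (page_ref_tracepoint_active(__tracepoint_page_ref_set))
                      __page_ref_set(page, v);
      }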
      
      [arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()]
      [iamjoonsoo.kim@lge.com: fix build failure for xtensa]
      [akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil]
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, tracing: refresh __def_vmaflag_names · bcf66917
      Kirill A. Shutemov authored
      Get the list of VMA flags up to date and sort it to match the VM_*
      definition order.
      
      [vbabka@suse.cz: add a note above vmaflag definitions to update the names when changing]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, compaction: introduce kcompactd · 698b1b30
      Vlastimil Babka authored
      Memory compaction can be currently performed in several contexts:
      
       - kswapd balancing a zone after a high-order allocation failure
       - direct compaction to satisfy a high-order allocation, including THP
         page fault attempts
       - khugepaged trying to collapse a hugepage
       - manually from /proc
      
      The purpose of compaction is two-fold.  The obvious purpose is to
      satisfy a (pending or future) high-order allocation, and is easy to
      evaluate.  The other purpose is to keep overall memory fragmentation
      low and help the anti-fragmentation mechanism.  The success wrt the
      latter purpose is more difficult to evaluate.
      
      The current situation wrt the purposes has a few drawbacks:
      
       - compaction is invoked only when a high-order page or hugepage is not
         available (or manually).  This might be too late for the purposes of
         keeping memory fragmentation low.
       - direct compaction increases latency of allocations.  Again, it would
         be better if compaction was performed asynchronously to keep
         fragmentation low, before the allocation itself comes.
       - (a special case of the previous) the cost of compaction during THP
         page faults can easily offset the benefits of THP.
       - kswapd compaction appears to be complex, fragile and not working in
         some scenarios.  It could also end up compacting for a high-order
         allocation request when it should be reclaiming memory for a later
         order-0 request.
      
      To improve the situation, we should be able to benefit from an
      equivalent of kswapd, but for compaction - i.e. a background thread
      which responds to fragmentation and the need for high-order allocations
      (including hugepages) somewhat proactively.
      
      One possibility is to extend the responsibilities of kswapd, which could
      however complicate its design too much.  It should be better to let
      kswapd handle reclaim, as order-0 allocations are often more critical
      than high-order ones.
      
      Another possibility is to extend khugepaged, but this kthread is a
      single instance and tied to THP configs.
      
      This patch goes with the option of a new set of per-node kthreads called
      kcompactd, and lays the foundations, without introducing any new
      tunables.  The lifecycle mimics kswapd kthreads, including the memory
      hotplug hooks.
      
      For compaction, kcompactd uses the standard compaction_suitable() and
      compact_finished() criteria and the deferred compaction functionality.
      Unlike direct compaction, it uses only sync compaction, as there's no
      allocation latency to minimize.
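      
      The resulting kthread body is roughly the familiar kswapd shape
      (simplified sketch of the loop in mm/compaction.c):
      
      /* Per-node kthread: sleep until someone requests a compaction
       * order for this node, then run sync compaction on it. */
      static int kcompactd(void *p)
      {
              pg_data_t *pgdat = (pg_data_t *)p;
      
              while (!kthread_should_stop()) {
                      wait_event_freezable(pgdat->kcompactd_wait,
                                           kcompactd_work_requested(pgdat));
                      kcompactd_do_work(pgdat);
              }
              return 0;
      }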
      
      This patch doesn't yet add a call to wakeup_kcompactd.  The kswapd
      compact/reclaim loop for high-order pages will be replaced by waking up
      kcompactd in the next patch with the description of what's wrong with
      the old approach.
      
      Waking up of the kcompactd threads is also tied to kswapd activity and
      follows these rules:
       - we don't want to affect any fastpaths, so wake up kcompactd only from
         the slowpath, as it's done for kswapd
       - if kswapd is doing reclaim, it's more important than compaction, so
         don't invoke kcompactd until kswapd goes to sleep
       - the target order used for kswapd is passed to kcompactd
      
      Possible future uses for kcompactd include the ability to wake up
      kcompactd on demand in special situations, such as when hugepages are
      not available (currently not done due to __GFP_NO_KSWAPD) or when a
      fragmentation event (i.e.  __rmqueue_fallback()) occurs.  It's also
      possible to perform periodic compaction with kcompactd.
      
      [arnd@arndb.de: fix build errors with kcompactd]
      [paul.gortmaker@windriver.com: don't use modular references for non modular code]
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 15 Mar, 2016 2 commits
    • mm, tracing: unify mm flags handling in tracepoints and printk · 420adbe9
      Vlastimil Babka authored
      In tracepoints, it's possible to print gfp flags in a human-friendly
      format through the show_gfp_flags() macro, which defines a translation
      array and passes it to __print_flags().  Since the following patch will
      introduce support for gfp flags printing in printk(), it would be nice
      to reuse the array.  This is not straightforward, since __print_flags()
      can't simply reference an array defined in a .c file such as mm/debug.c
      - it has to be a macro to allow the macro magic to communicate the
      format to userspace tools such as trace-cmd.
      
      The solution is to create a macro __def_gfpflag_names which is used both
      in show_gfp_flags(), and to define the gfpflag_names[] array in
      mm/debug.c.
      
      On the other hand, mm/debug.c also defines translation tables for page
      flags and vma flags, and desire was expressed (but not implemented in
      this series) to use these also from tracepoints.  Thus, this patch also
      renames the events/gfpflags.h file to events/mmflags.h and moves the
      table definitions there, using the same macro approach as for gfpflags.
      This allows translating all three kinds of mm-specific flags both in
      tracepoints and printk.
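      
      Abridged, the shared-macro approach looks like this (cf.
      include/trace/events/mmflags.h and mm/debug.c):
      
      /* One definition of the translation pairs... */
      #define __def_gfpflag_names                             \
              {(unsigned long)GFP_KERNEL, "GFP_KERNEL"},      \
              {(unsigned long)GFP_ATOMIC, "GFP_ATOMIC"}       /* ... */
      
      /* ...expanded inside the tracepoint helper, where the macro magic
       * stays visible to userspace tools... */
      #define show_gfp_flags(flags)                                           \
              (flags) ? __print_flags(flags, "|", __def_gfpflag_names) : "none"
      
      /* ...and again as a plain array for the printk side (mm/debug.c): */
      const struct trace_print_flags gfpflag_names[] = {
              __def_gfpflag_names
      };
      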
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Michal Hocko <mhocko@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, tracing: make show_gfp_flags() up to date · 1f7866b4
      Vlastimil Babka authored
      The show_gfp_flags() macro provides human-friendly printing of gfp
      flags in tracepoints.  However, it is somewhat out of date and missing
      several flags.  This patch fills in the missing flags, and
      distinguishes properly between GFP_ATOMIC and __GFP_ATOMIC, which were
      both translated to "GFP_ATOMIC".  More generally, all __GFP_X flags
      which were previously printed as GFP_X are now printed as __GFP_X,
      since omitting the underscores results in output that doesn't actually
      match the source code and can only lead to confusion.  Where both
      variants are defined equal (e.g.  _DMA and _DMA32), the variant
      without underscores is preferred.
      
      Also add a note in gfp.h so hopefully future changes will be synced
      better.
      
      __GFP_MOVABLE is defined twice in include/linux/gfp.h with different
      comments.  Leave just the newer one, which was intended to replace the
      old one.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 14 Mar, 2016 1 commit
    • thermal: trace: migrating thermal traces to use TRACE_DEFINE_ENUM() macros · d0b45880
      Michele Di Giorgio authored
      Userspace tools are not aware of how to convert the enums provided by
      the tracepoints to their corresponding strings.
      
      Adding TRACE_DEFINE_ENUM() macros makes the enums available to
      userspace, letting the tools know what those enum values represent.
      
      In particular, for thermal zone trip types what we obtained before was
      something like:
      
      kworker/1:1-460   [001]   320.372732: thermal_zone_trip:    thermal_zone=soc
      				id=0 trip=1 trip_type=1
      
      Unfortunately, userspace tools do not know how to convert enum values to
      strings and as a consequence they can only forward the enum value to the
      output. By using TRACE_DEFINE_ENUM() macros for thermal traces we get the
      following trace line:
      
      kworker/1:1-460   [001]   320.372732: thermal_zone_trip:    thermal_zone=soc
      				id=0 trip=1 trip_type=PASSIVE
      
      Userspace tools are now able to better understand the meaning of the trip_type
      and provide the user with more readable information.
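      
      The pattern, abridged (cf. include/trace/events/thermal.h):
      TRACE_DEFINE_ENUM() exports each value to userspace, and
      __print_symbolic() renders it in the trace output:
      
      TRACE_DEFINE_ENUM(THERMAL_TRIP_CRITICAL);
      TRACE_DEFINE_ENUM(THERMAL_TRIP_HOT);
      TRACE_DEFINE_ENUM(THERMAL_TRIP_PASSIVE);
      TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);
      
      #define show_tzt_type(type)                                     \
              __print_symbolic(type,                                  \
                               { THERMAL_TRIP_CRITICAL, "CRITICAL" }, \
                               { THERMAL_TRIP_HOT,      "HOT" },      \
                               { THERMAL_TRIP_PASSIVE,  "PASSIVE" },  \
                               { THERMAL_TRIP_ACTIVE,   "ACTIVE" })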
      
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: Eduardo Valentin <edubezval@gmail.com>
      Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Javi Merino <javi.merino@arm.com>
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
  13. 08 Mar, 2016 2 commits
    • tracing, writeback: Replace cgroup path to cgroup ino · a664edb3
      Yang Shi authored
      commit 5634cc2a ("writeback: update writeback
      tracepoints to report cgroup") made writeback tracepoints print out
      the cgroup path when CGROUP_WRITEBACK is enabled, but this may trigger
      the bug below on an -rt kernel, since kernfs_path and kernfs_path_len
      are called by tracepoints and acquire a spinlock, which is sleepable
      on -rt kernels.
      
      BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:930
      in_atomic(): 1, irqs_disabled(): 0, pid: 625, name: kworker/u16:3
      INFO: lockdep is turned off.
      Preemption disabled at:[<ffffffc000374a5c>] wb_writeback+0xec/0x830
      
      CPU: 7 PID: 625 Comm: kworker/u16:3 Not tainted 4.4.1-rt5 #20
      Hardware name: Freescale Layerscape 2085a RDB Board (DT)
      Workqueue: writeback wb_workfn (flush-7:0)
      Call trace:
      [<ffffffc00008d708>] dump_backtrace+0x0/0x200
      [<ffffffc00008d92c>] show_stack+0x24/0x30
      [<ffffffc0007b0f40>] dump_stack+0x88/0xa8
      [<ffffffc000127d74>] ___might_sleep+0x2ec/0x300
      [<ffffffc000d5d550>] rt_spin_lock+0x38/0xb8
      [<ffffffc0003e0548>] kernfs_path_len+0x30/0x90
      [<ffffffc00036b360>] trace_event_raw_event_writeback_work_class+0xe8/0x2e8
      [<ffffffc000374f90>] wb_writeback+0x620/0x830
      [<ffffffc000376224>] wb_workfn+0x61c/0x950
      [<ffffffc000110adc>] process_one_work+0x3ac/0xb30
      [<ffffffc0001112fc>] worker_thread+0x9c/0x7a8
      [<ffffffc00011a9e8>] kthread+0x190/0x1b0
      [<ffffffc000086ca0>] ret_from_fork+0x10/0x30
      
      With unlocked kernfs_* functions, synchronize_sched() would have to be
      called in kernfs_rename, which can be called in the syscall path, and
      that is problematic.  So, print out the cgroup ino instead of the path
      name; it can be converted to a path name by userland.
      
      Without CGROUP_WRITEBACK enabled, it would just print out the root
      dir, but the root dir ino varies between filesystems, so print out
      -1U to indicate an invalid cgroup ino.
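      
      A hedged sketch of the replacement helper (identifiers approximate;
      cf. include/trace/events/writeback.h):
      
      #ifdef CONFIG_CGROUP_WRITEBACK
      /* Report the cgroup's kernfs inode number; no sleeping locks are
       * needed, and userland can map the ino back to a cgroup path. */
      static inline unsigned int __trace_wb_assign_cgroup(struct bdi_writeback *wb)
      {
              return wb->memcg_css->cgroup->kn->ino;
      }
      #else
      static inline unsigned int __trace_wb_assign_cgroup(struct bdi_writeback *wb)
      {
              return -1U;  /* no valid cgroup ino without CGROUP_WRITEBACK */
      }
      #endif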
      
      Link: http://lkml.kernel.org/r/1456996137-8354-1-git-send-email-yang.shi@linaro.org
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Remove duplicate checks for online CPUs · 633f6f58
      Steven Rostedt (Red Hat) authored
      Some trace events have conditions that check if the current CPU is online or
      not before recording the tracepoint. That's because certain trace events are
      in locations that can be called as the CPU is going offline and when RCU no
      longer monitors it (like kfree and friends). The check was added because
      trace events require RCU to be active.
      
      This is a trace event infrastructure issue and not something that individual
      trace events should worry about. The tracepoint.h code now has added a check
      to see if the current CPU is considered online, and it only does the
      tracepoint if it is. There's no more need for individual trace events to
      also include this check. It is now redundant.
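      
      The per-event conditions can go because the core macro now
      short-circuits, along these lines (sketch of the
      include/linux/tracepoint.h check; exact placement may differ):
      
      #define __DO_TRACE(tp, proto, args, cond)                         \
              do {                                                      \
                      if (!(cond))                                      \
                              break;                                    \
                      /* RCU does not watch offline CPUs: bail early */ \
                      if (!cpu_online(raw_smp_processor_id()))          \
                              break;                                    \
                      /* ... rcu_read_lock_sched_notrace(), iterate     \
                       * the registered probes, unlock ... */           \
              } while (0)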
      
      Cc: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  14. 02 Mar, 2016 1 commit
    • nohz: Use enum code for tick stop failure tracing message · e6e6cc22
      Frederic Weisbecker authored
      It makes nohz tracing more lightweight, standard and easier to parse.
      
      Examples:
      
             user_loop-2904  [007] d..1   517.701126: tick_stop: success=1 dependency=NONE
             user_loop-2904  [007] dn.1   518.021181: tick_stop: success=0 dependency=SCHED
          posix_timers-6142  [007] d..1  1739.027400: tick_stop: success=0 dependency=POSIX_TIMER
             user_loop-5463  [007] dN.1  1185.931939: tick_stop: success=0 dependency=PERF_EVENTS
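      
      The dependency column above is rendered with __print_symbolic() over
      the tick dependency masks, roughly as follows (cf.
      include/trace/events/timer.h; names may differ slightly):
      
      #define show_tick_dep_name(dep)                                  \
              __print_symbolic(dep,                                    \
                      { TICK_DEP_MASK_NONE,           "NONE" },        \
                      { TICK_DEP_MASK_POSIX_TIMER,    "POSIX_TIMER" }, \
                      { TICK_DEP_MASK_PERF_EVENTS,    "PERF_EVENTS" }, \
                      { TICK_DEP_MASK_SCHED,          "SCHED" },       \
                      { TICK_DEP_MASK_CLOCK_UNSTABLE, "CLOCK_UNSTABLE" })
      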
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  15. 01 Mar, 2016 1 commit
    • cpu/hotplug: Add tracepoints · 5ba9ac8e
      Thomas Gleixner authored
      We want to trace the hotplug machinery. Add tracepoints to track the
      invocation of callbacks and their result.
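      
      The abridged shape of such an event (cf. include/trace/events/cpuhp.h;
      fields approximate): one event fires on callback entry recording the
      CPU, the target state, and the callback, and a matching cpuhp_exit
      event records the callback's return value.
      
      TRACE_EVENT(cpuhp_enter,
      
              TP_PROTO(unsigned int cpu, int target, int idx,
                       int (*fun)(unsigned int)),
      
              TP_ARGS(cpu, target, idx, fun),
      
              TP_STRUCT__entry(
                      __field(unsigned int,   cpu)
                      __field(int,            target)
                      __field(int,            idx)
                      __field(void *,         fun)
              ),
      
              TP_fast_assign(
                      __entry->cpu    = cpu;
                      __entry->target = target;
                      __entry->idx    = idx;
                      __entry->fun    = fun;
              ),
      
              TP_printk("cpu: %04u target: %3d step: %3d (%pf)",
                        __entry->cpu, __entry->target, __entry->idx,
                        __entry->fun)
      );
      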
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182340.593563875@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  16. 25 Feb, 2016 1 commit
    • ASoC: trace: fix printing jack name · f4833a51
      Arnd Bergmann authored
      After a change to the snd_jack structure, the 'name' member
      is no longer available in all configurations, which results in a
      build failure in the tracing code:
      
      include/trace/events/asoc.h: In function 'trace_event_raw_event_snd_soc_jack_report':
      include/trace/events/asoc.h:240:32: error: 'struct snd_jack' has no member named 'name'
      
      The name field is normally initialized from the card shortname and
      the jack "id" field:
      
              snprintf(jack->name, sizeof(jack->name), "%s %s",
                       card->shortname, jack->id);
      
      This changes the tracing output to just contain the 'id' by itself,
      which slightly changes the output format but avoids the build error
      and is hopefully still enough to see what is going on.
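      
      A hedged sketch of the resulting event fields (simplified; the layout
      here is illustrative, not the full snd_soc_jack_report event):
      
      TP_STRUCT__entry(
              __string(name, jack->jack->id)  /* was: jack->jack->name */
              __field(int, val)
      ),
      
      TP_fast_assign(
              __assign_str(name, jack->jack->id);
              __entry->val = val;
      ),
      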
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: fe0d128c ("ALSA: jack: Allow building the jack layer without input device")
      Signed-off-by: Mark Brown <broonie@kernel.org>
  17. 22 Feb, 2016 2 commits
    • f2fs: trace old block address for CoWed page · 7a9d7548
      Chao Yu authored
      This patch enables tracing the old block address of a CoWed page for
      better debugging.
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support revoking atomic written pages · 28bc106b
      Chao Yu authored
      f2fs supports atomic writes with the following semantics:
      1. open db file
      2. ioctl start atomic write
      3. (write db file) * n
      4. ioctl commit atomic write
      5. close db file
      
      With this flow we can avoid the file becoming corrupted on abnormal
      power cut, because we hold the transaction's data in referenced pages
      linked into the inode's inmem_pages list without setting them dirty,
      so the data won't be persisted unless we commit it in step 4.
      
      But we should still hold the journal db file in memory via volatile
      writes, because our 'atomic write support' semantics are incomplete:
      in step 4 we could fail to submit all the dirty data of the
      transaction, and once partial dirty data has been committed to
      storage, a checkpoint followed by an abnormal power cut will leave
      the db file corrupted forever.
      
      So this patch tries to improve the atomic write flow by adding a
      revoke flow: if an internal error occurs while committing, we get
      another chance to revoke the partially submitted data of the current
      transaction, which makes the commit operation closer to atomic.
      
      If we're not lucky and the revoke operation fails as well, EAGAIN is
      reported to the user, suggesting recovery with the held journal file
      or retrying the current transaction.
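      
      A hedged userspace sketch of steps 1-5 (the ioctl request codes
      follow f2fs's ABI header; error handling is elided):
      
      #include <fcntl.h>
      #include <stddef.h>
      #include <unistd.h>
      #include <sys/ioctl.h>
      
      #define F2FS_IOCTL_MAGIC                0xf5
      #define F2FS_IOC_START_ATOMIC_WRITE     _IO(F2FS_IOCTL_MAGIC, 1)
      #define F2FS_IOC_COMMIT_ATOMIC_WRITE    _IO(F2FS_IOCTL_MAGIC, 2)
      
      int atomic_update(const char *db, const void *buf, size_t len)
      {
              int fd = open(db, O_RDWR);                 /* 1. open db file */
              int ret;
      
              ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE);    /* 2. start atomic write */
              write(fd, buf, len);                       /* 3. (write db file) * n */
              ret = ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE); /* 4. commit; EAGAIN
                                                              * means revoked: retry
                                                              * or recover */
              close(fd);                                 /* 5. close db file */
              return ret;
      }
      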
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  18. 16 Feb, 2016 1 commit
  19. 07 Feb, 2016 1 commit
  20. 04 Feb, 2016 1 commit
  21. 25 Jan, 2016 1 commit
  22. 21 Jan, 2016 1 commit
  23. 15 Jan, 2016 1 commit
  24. 14 Jan, 2016 4 commits
  25. 08 Jan, 2016 1 commit
  26. 18 Dec, 2015 1 commit
  27. 17 Dec, 2015 2 commits
  28. 11 Dec, 2015 1 commit