- 26 Jul, 2016 2 commits
-
-
Ebru Akagunduz authored
Introduce a new sysfs integer knob /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap which makes optimistic check for swapin readahead to increase thp collapse rate. Before getting swapped out pages to memory, checks them and allows up to a certain number. It also prints out using tracepoints amount of unmapped ptes. [vdavydov@parallels.com: fix scan not aborted on SCAN_EXCEED_SWAP_PTE] [sfr@canb.auug.org.au: build fix] Link: http://lkml.kernel.org/r/20160616154503.65806e12@canb.auug.org.auSigned-off-by:
Ebru Akagunduz <ebru.akagunduz@gmail.com> Acked-by:
Rik van Riel <riel@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Brian Foster authored
The per-sb inode writeback list tracks inodes currently under writeback to facilitate efficient sync processing. In particular, it ensures that sync only needs to walk through a list of inodes that were cleaned by the sync. Add a couple tracepoints to help identify when inodes are added/removed to and from the writeback lists. Piggyback off of the writeback lazytime tracepoint template as it already tracks the relevant inode information. Link: http://lkml.kernel.org/r/1466594593-6757-3-git-send-email-bfoster@redhat.comSigned-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Jan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> cc: Josef Bacik <jbacik@fb.com> Cc: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 24 May, 2016 1 commit
-
-
Jan Kiszka authored
Specifically the change from hex to decimal helps correlating events. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 20 May, 2016 2 commits
-
-
Michal Hocko authored
COMPACT_COMPLETE now means that compaction and free scanner met. This is not very useful information if somebody just wants to use this feedback and make any decisions based on that. The current caller might be a poor guy who just happened to scan tiny portion of the zone and that could be the reason no suitable pages were compacted. Make sure we distinguish the full and partial zone walks. Consumers should treat COMPACT_PARTIAL_SKIPPED as a potential success and be optimistic in retrying. The existing users of COMPACT_COMPLETE are conservatively changed to use COMPACT_PARTIAL_SKIPPED as well but some of them should be probably reconsidered and only defer the compaction only for COMPACT_COMPLETE with the new semantic. This patch shouldn't introduce any functional changes. Signed-off-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Acked-by:
Hillf Danton <hillf.zj@alibaba-inc.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
try_to_compact_pages() can currently return COMPACT_SKIPPED even when the compaction is defered for some zone just because zone DMA is skipped in 99% of cases due to watermark checks. This makes COMPACT_DEFERRED basically unusable for the page allocator as a feedback mechanism. Make sure we distinguish those two states properly and switch their ordering in the enum. This would mean that the COMPACT_SKIPPED will be returned only when all eligible zones are skipped. As a result COMPACT_DEFERRED handling for THP in __alloc_pages_slowpath will be more precise and we would bail out rather than reclaim. Signed-off-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Acked-by:
Hillf Danton <hillf.zj@alibaba-inc.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 May, 2016 1 commit
-
-
Jaegeuk Kim authored
This patch substitutes percpu_counter for atomic_counter when counting various types of pages. Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org>
-
- 13 May, 2016 1 commit
-
-
Christian Borntraeger authored
Some wakeups should not be considered a sucessful poll. For example on s390 I/O interrupts are usually floating, which means that _ALL_ CPUs would be considered runnable - letting all vCPUs poll all the time for transactional like workload, even if one vCPU would be enough. This can result in huge CPU usage for large guests. This patch lets architectures provide a way to qualify wakeups if they should be considered a good/bad wakeups in regard to polls. For s390 the implementation will fence of halt polling for anything but known good, single vCPU events. The s390 implementation for floating interrupts does a wakeup for one vCPU, but the interrupt will be delivered by whatever CPU checks first for a pending interrupt. We prefer the woken up CPU by marking the poll of this CPU as "good" poll. This code will also mark several other wakeup reasons like IPI or expired timers as "good". This will of course also mark some events as not sucessful. As KVM on z runs always as a 2nd level hypervisor, we prefer to not poll, unless we are really sure, though. This patch successfully limits the CPU usage for cases like uperf 1byte transactional ping pong workload or wakeup heavy workload like OLTP while still providing a proper speedup. This also introduced a new vcpu stat "halt_poll_no_tuning" that marks wakeups that are considered not good for polling. Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version) Cc: David Matlack <dmatlack@google.com> Cc: Wanpeng Li <kernellwp@gmail.com> [Rename config symbol. - Paolo] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 11 May, 2016 1 commit
-
-
Chao Yu authored
This patch introduces reserve_new_blocks to make preallocation of multi blocks as in batch operation, so it can avoid lots of redundant operation, result in better performance. In virtual machine, with rotational device: time fallocate -l 32G /mnt/f2fs/file Before: real 0m4.584s user 0m0.000s sys 0m4.580s After: real 0m0.292s user 0m0.000s sys 0m0.272s In x86, with SSD: time fallocate -l 500G $MNT/testfile Before : 24.758 s After : 1.604 s Signed-off-by:
Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix bugs and add performance numbers measured in x86.] Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org>
-
- 09 May, 2016 4 commits
-
-
Hannes Reinecke authored
ZAC drives implement a 'ZAC Management Out' command template, which maps onto the ZBC OUT command. Signed-off-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Hannes Reinecke authored
ZAC drives implement a 'ZAC Management In' command template, which maps onto the ZBC IN command. Signed-off-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Hannes Reinecke authored
Some commands like FPDMA RECEIVE or NCQ NON DATA can encapsulate other commands to NCQ transport. So decode the subcmds, too. Signed-off-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Hannes Reinecke authored
Define the NCQ NON DATA command and update libsas to handle it correctly. Signed-off-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- 02 May, 2016 1 commit
-
-
Baolin Wang authored
This patch provides some tracepoints for the lifecycle of a mmc request from starting to completion to help with performance analysis of MMC subsystem. Signed-off-by:
Baolin Wang <baolin.wang@linaro.org> Signed-off-by:
Ulf Hansson <ulf.hansson@linaro.org>
-
- 11 Apr, 2016 2 commits
-
-
Hannes Reinecke authored
Add new trace functions for ZBC_IN and ZBC_OUT. Reviewed-by:
Doug Gilbert <dgilbert@interlog.com> Reviewed-by:
Ewan D. Milne <emilne@redhat.com> Signed-off-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Hannes Reinecke authored
scsi_opcode_name() is displaying the opcode, not the service action. Reviewed-by:
Ewan D. Milne <emilne@redhat.com> Signed-off-by:
Hannes Reinecke <hare@suse.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 10 Apr, 2016 1 commit
-
-
Al Viro authored
... and neither can ever be NULL Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 04 Apr, 2016 1 commit
-
-
Mark Fasheh authored
This patch adds tracepoints to the qgroup code on both the reporting side (insert_dirty_extents) and the accounting side. Taken together it allows us to see what qgroup operations have happened, and what their result was. Signed-off-by:
Mark Fasheh <mfasheh@suse.de> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 01 Apr, 2016 1 commit
-
-
Lucas Stach authored
Page isolation has not failed if the fin pfn extends beyond the end pfn and test_pages_isolated checks this correctly. Fix the tracepoint to report the same result as the actual check function. Signed-off-by:
Lucas Stach <l.stach@pengutronix.de> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <js1304@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 31 Mar, 2016 2 commits
-
-
Paul E. McKenney authored
The current mutex-based funnel-locking approach used by expedited grace periods is subject to severe unfairness. The problem arises when a few tasks, making a path from leaves to root, all wake up before other tasks do. A new task can then follow this path all the way to the root, which needlessly delays tasks whose grace period is done, but who do not happen to acquire the lock quickly enough. This commit avoids this problem by maintaining per-rcu_node wait queues, along with a per-rcu_node counter that tracks the latest grace period sought by an earlier task to visit this node. If that grace period would satisfy the current task, instead of proceeding up the tree, it waits on the current rcu_node structure using a pair of wait queues provided for that purpose. This decouples awakening of old tasks from the arrival of new tasks. If the wakeups prove to be a bottleneck, additional kthreads can be brought to bear for that purpose. Signed-off-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Paul E. McKenney authored
Signed-off-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 20 Mar, 2016 1 commit
-
-
Daniel Borkmann authored
flowi6_tos of struct flowi6 is unused in IPv6, therefore dumping tos on that tracepoint will also give incorrect information wrt traffic class. If we want to fix it, we need to extract it via ip6_tclass(flp->flowlabel). While for the same test case I get a count of 0 non-zero tos values before the change, they now start to show up after the change: # ./perf record -e fib6:fib6_table_lookup -a sleep 10 # ./perf script | grep -v "tos 0" | wc -l 60 Since there's no user in the kernel tree anymore of flowi6_tos, remove the define to avoid any future confusion on this. Fixes: b811580d ("net: IPv6 fib lookup tracepoint") Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 17 Mar, 2016 3 commits
-
-
Joonsoo Kim authored
CMA allocation should be guaranteed to succeed by definition, but, unfortunately, it would be failed sometimes. It is hard to track down the problem, because it is related to page reference manipulation and we don't have any facility to analyze it. This patch adds tracepoints to track down page reference manipulation. With it, we can find exact reason of failure and can fix the problem. Following is an example of tracepoint output. (note: this example is stale version that printing flags as the number. Recent version will print it as human readable string.) <...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1 <...>-9018 [004] 92.678378: kernel_stack: => get_page_from_freelist (ffffffff81176659) => __alloc_pages_nodemask (ffffffff81176d22) => alloc_pages_vma (ffffffff811bf675) => handle_mm_fault (ffffffff8119e693) => __do_page_fault (ffffffff810631ea) => trace_do_page_fault (ffffffff81063543) => do_async_page_fault (ffffffff8105c40a) => async_page_fault (ffffffff817581d8) [snip] <...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1 [snip] ... ... <...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail [snip] <...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1 => release_pages (ffffffff8117c9e4) => free_pages_and_swap_cache (ffffffff811b0697) => tlb_flush_mmu_free (ffffffff81199616) => tlb_finish_mmu (ffffffff8119a62c) => exit_mmap (ffffffff811a53f7) => mmput (ffffffff81073f47) => do_exit (ffffffff810794e9) => do_group_exit (ffffffff81079def) => SyS_exit_group (ffffffff81079e74) => entry_SYSCALL_64_fastpath (ffffffff817560b6) This output shows that problem comes from exit path. In exit path, to improve performance, pages are not freed immediately. They are gathered and processed by batch. During this process, migration cannot be possible and CMA allocation is failed. This problem is hard to find without this page reference tracepoint facility. Enabling this feature bloat kernel text 30 KB in my configuration. text data bss dec hex filename 12127327 2243616 1507328 15878271 f2487f vmlinux_disabled 12157208 2258880 1507328 15923416 f2f8d8 vmlinux_enabled Note that, due to header file dependency problem between mm.h and tracepoint.h, this feature has to open code the static key functions for tracepoints. Proposed by Steven Rostedt in following link. https://lkml.org/lkml/2015/12/9/699 [arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()] [iamjoonsoo.kim@lge.com: fix build failure for xtensa] [akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil] Signed-off-by:
Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by:
Michal Nazarewicz <mina86@mina86.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Acked-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Get list of VMA flags up-to-date and sort it to match VM_* definition order. [vbabka@suse.cz: add a note above vmaflag definitions to update the names when changing] Signed-off-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
Memory compaction can be currently performed in several contexts: - kswapd balancing a zone after a high-order allocation failure - direct compaction to satisfy a high-order allocation, including THP page fault attemps - khugepaged trying to collapse a hugepage - manually from /proc The purpose of compaction is two-fold. The obvious purpose is to satisfy a (pending or future) high-order allocation, and is easy to evaluate. The other purpose is to keep overal memory fragmentation low and help the anti-fragmentation mechanism. The success wrt the latter purpose is more The current situation wrt the purposes has a few drawbacks: - compaction is invoked only when a high-order page or hugepage is not available (or manually). This might be too late for the purposes of keeping memory fragmentation low. - direct compaction increases latency of allocations. Again, it would be better if compaction was performed asynchronously to keep fragmentation low, before the allocation itself comes. - (a special case of the previous) the cost of compaction during THP page faults can easily offset the benefits of THP. - kswapd compaction appears to be complex, fragile and not working in some scenarios. It could also end up compacting for a high-order allocation request when it should be reclaiming memory for a later order-0 request. To improve the situation, we should be able to benefit from an equivalent of kswapd, but for compaction - i.e. a background thread which responds to fragmentation and the need for high-order allocations (including hugepages) somewhat proactively. One possibility is to extend the responsibilities of kswapd, which could however complicate its design too much. It should be better to let kswapd handle reclaim, as order-0 allocations are often more critical than high-order ones. Another possibility is to extend khugepaged, but this kthread is a single instance and tied to THP configs. This patch goes with the option of a new set of per-node kthreads called kcompactd, and lays the foundations, without introducing any new tunables. The lifecycle mimics kswapd kthreads, including the memory hotplug hooks. For compaction, kcompactd uses the standard compaction_suitable() and ompact_finished() criteria and the deferred compaction functionality. Unlike direct compaction, it uses only sync compaction, as there's no allocation latency to minimize. This patch doesn't yet add a call to wakeup_kcompactd. The kswapd compact/reclaim loop for high-order pages will be replaced by waking up kcompactd in the next patch with the description of what's wrong with the old approach. Waking up of the kcompactd threads is also tied to kswapd activity and follows these rules: - we don't want to affect any fastpaths, so wake up kcompactd only from the slowpath, as it's done for kswapd - if kswapd is doing reclaim, it's more important than compaction, so don't invoke kcompactd until kswapd goes to sleep - the target order used for kswapd is passed to kcompactd Future possible future uses for kcompactd include the ability to wake up kcompactd on demand in special situations, such as when hugepages are not available (currently not done due to __GFP_NO_KSWAPD) or when a fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also possible to perform periodic compaction with kcompactd. [arnd@arndb.de: fix build errors with kcompactd] [paul.gortmaker@windriver.com: don't use modular references for non modular code] Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Mar, 2016 2 commits
-
-
Vlastimil Babka authored
In tracepoints, it's possible to print gfp flags in a human-friendly format through a macro show_gfp_flags(), which defines a translation array and passes is to __print_flags(). Since the following patch will introduce support for gfp flags printing in printk(), it would be nice to reuse the array. This is not straightforward, since __print_flags() can't simply reference an array defined in a .c file such as mm/debug.c - it has to be a macro to allow the macro magic to communicate the format to userspace tools such as trace-cmd. The solution is to create a macro __def_gfpflag_names which is used both in show_gfp_flags(), and to define the gfpflag_names[] array in mm/debug.c. On the other hand, mm/debug.c also defines translation tables for page flags and vma flags, and desire was expressed (but not implemented in this series) to use these also from tracepoints. Thus, this patch also renames the events/gfpflags.h file to events/mmflags.h and moves the table definitions there, using the same macro approach as for gfpflags. This allows translating all three kinds of mm-specific flags both in tracepoints and printk. Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Reviewed-by:
Michal Hocko <mhocko@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Vlastimil Babka authored
The show_gfp_flags() macro provides human-friendly printing of gfp flags in tracepoints. However, it is somewhat out of date and missing several flags. This patches fills in the missing flags, and distinguishes properly between GFP_ATOMIC and __GFP_ATOMIC which were both translated to "GFP_ATOMIC". More generally, all __GFP_X flags which were previously printed as GFP_X, are now printed as __GFP_X, since ommiting the underscores results in output that doesn't actually match the source code, and can only lead to confusion. Where both variants are defined equal (e.g. _DMA and _DMA32), the variant without underscores are preferred. Also add a note in gfp.h so hopefully future changes will be synced better. __GFP_MOVABLE is defined twice in include/linux/gfp.h with different comments. Leave just the newer one, which was intended to replace the old one. Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Reviewed-by:
Michal Hocko <mhocko@suse.com> Acked-by:
David Rientjes <rientjes@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 14 Mar, 2016 1 commit
-
-
Michele Di Giorgio authored
Userspace tools are not aware of how to convert the enums provided by the tracepoints to their corresponding strings. Adding TRACE_DEFINE_ENUM() macros allows to make the enums available to userspace to let the tools know what those enum values represent. In particular, for thermal zone trip types what we obtained before was something like: kworker/1:1-460 [001] 320.372732: thermal_zone_trip: thermal_zone=soc id=0 trip=1 trip_type=1 Unfortunately, userspace tools do not know how to convert enum values to strings and as a consequence they can only forward the enum value to the output. By using TRACE_DEFINE_ENUM() macros for thermal traces we get the following trace line: kworker/1:1-460 [001] 320.372732: thermal_zone_trip: thermal_zone=soc id=0 trip=1 trip_type=PASSIVE Userspace tools are now able to better understand the meaning of the trip_type and provide the user with more readable information. CC: Steven Rostedt <rostedt@goodmis.org> CC: Eduardo Valentin <edubezval@gmail.com> Signed-off-by:
Michele Di Giorgio <michele.digiorgio@arm.com> Acked-by:
Steven Rostedt <rostedt@goodmis.org> Acked-by:
Javi Merino <javi.merino@arm.com> Signed-off-by:
Zhang Rui <rui.zhang@intel.com>
-
- 08 Mar, 2016 2 commits
-
-
Yang Shi authored
commit 5634cc2a ("writeback: update writeback tracepoints to report cgroup") made writeback tracepoints print out cgroup path when CGROUP_WRITEBACK is enabled, but it may trigger the below bug on -rt kernel since kernfs_path and kernfs_path_len are called by tracepoints, which acquire spin lock that is sleepable on -rt kernel. BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:930 in_atomic(): 1, irqs_disabled(): 0, pid: 625, name: kworker/u16:3 INFO: lockdep is turned off. Preemption disabled at:[<ffffffc000374a5c>] wb_writeback+0xec/0x830 CPU: 7 PID: 625 Comm: kworker/u16:3 Not tainted 4.4.1-rt5 #20 Hardware name: Freescale Layerscape 2085a RDB Board (DT) Workqueue: writeback wb_workfn (flush-7:0) Call trace: [<ffffffc00008d708>] dump_backtrace+0x0/0x200 [<ffffffc00008d92c>] show_stack+0x24/0x30 [<ffffffc0007b0f40>] dump_stack+0x88/0xa8 [<ffffffc000127d74>] ___might_sleep+0x2ec/0x300 [<ffffffc000d5d550>] rt_spin_lock+0x38/0xb8 [<ffffffc0003e0548>] kernfs_path_len+0x30/0x90 [<ffffffc00036b360>] trace_event_raw_event_writeback_work_class+0xe8/0x2e8 [<ffffffc000374f90>] wb_writeback+0x620/0x830 [<ffffffc000376224>] wb_workfn+0x61c/0x950 [<ffffffc000110adc>] process_one_work+0x3ac/0xb30 [<ffffffc0001112fc>] worker_thread+0x9c/0x7a8 [<ffffffc00011a9e8>] kthread+0x190/0x1b0 [<ffffffc000086ca0>] ret_from_fork+0x10/0x30 With unlocked kernfs_* functions, synchronize_sched() has to be called in kernfs_rename which could be called in syscall path, but it is problematic. So, print out cgroup ino instead of path name, which could be converted to path name by userland. Withouth CGROUP_WRITEBACK enabled, it just prints out root dir. But, root dir ino vary from different filesystems, so printing out -1U to indicate an invalid cgroup ino. Link: http://lkml.kernel.org/r/1456996137-8354-1-git-send-email-yang.shi@linaro.orgAcked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Yang Shi <yang.shi@linaro.org> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt (Red Hat) authored
Some trace events have conditions that check if the current CPU is online or not before recording the tracepoint. That's because certain trace events are in locations that can be called as the CPU is going offline and when RCU no longer monitors it (like kfree and friends). The check was added because trace events require RCU to be active. This is a trace event infrastructure issue and not something that individual trace events should worry about. The tracepoint.h code now has added a check to see if the current CPU is considered online, and it only does the tracepoint if it is. There's no more need for individual trace events to also include this check. It is now redundant. Cc: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
- 02 Mar, 2016 1 commit
-
-
Frederic Weisbecker authored
It makes nohz tracing more lightweight, standard and easier to parse. Examples: user_loop-2904 [007] d..1 517.701126: tick_stop: success=1 dependency=NONE user_loop-2904 [007] dn.1 518.021181: tick_stop: success=0 dependency=SCHED posix_timers-6142 [007] d..1 1739.027400: tick_stop: success=0 dependency=POSIX_TIMER user_loop-5463 [007] dN.1 1185.931939: tick_stop: success=0 dependency=PERF_EVENTS Suggested-by:
Peter Zijlstra <peterz@infradead.org> Reviewed-by:
Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by:
Frederic Weisbecker <fweisbec@gmail.com>
-
- 01 Mar, 2016 1 commit
-
-
Thomas Gleixner authored
We want to trace the hotplug machinery. Add tracepoints to track the invocation of callbacks and their result. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182340.593563875@linutronix.deSigned-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- 25 Feb, 2016 1 commit
-
-
Arnd Bergmann authored
After a change to the snd_jack structure, the 'name' member is no longer available in all configurations, which results in a build failure in the tracing code: include/trace/events/asoc.h: In function 'trace_event_raw_event_snd_soc_jack_report': include/trace/events/asoc.h:240:32: error: 'struct snd_jack' has no member named 'name' The name field is normally initialized from the card shortname and the jack "id" field: snprintf(jack->name, sizeof(jack->name), "%s %s", card->shortname, jack->id); This changes the tracing output to just contain the 'id' by itself, which slightly changes the output format but avoids the link error and is hopefully still enough to see what is going on. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Fixes: fe0d128c ("ALSA: jack: Allow building the jack layer without input device") Signed-off-by:
Mark Brown <broonie@kernel.org>
-
- 22 Feb, 2016 2 commits
-
-
Chao Yu authored
This patch enables to trace old block address of CoWed page for better debugging. f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA Signed-off-by:
Chao Yu <chao2.yu@samsung.com> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org>
-
Chao Yu authored
f2fs support atomic write with following semantics: 1. open db file 2. ioctl start atomic write 3. (write db file) * n 4. ioctl commit atomic write 5. close db file With this flow we can avoid file becoming corrupted when abnormal power cut, because we hold data of transaction in referenced pages linked in inmem_pages list of inode, but without setting them dirty, so these data won't be persisted unless we commit them in step 4. But we should still hold journal db file in memory by using volatile write, because our semantics of 'atomic write support' is incomplete, in step 4, we could fail to submit all dirty data of transaction, once partial dirty data was committed in storage, then after a checkpoint & abnormal power-cut, db file will be corrupted forever. So this patch tries to improve atomic write flow by adding a revoking flow, once inner error occurs in committing, this gives another chance to try to revoke these partial submitted data of current transaction, it makes committing operation more like aotmical one. If we're not lucky, once revoking operation was failed, EAGAIN will be reported to user for suggesting doing the recovery with held journal file, or retrying current transaction again. Signed-off-by:
Chao Yu <chao2.yu@samsung.com> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org>
-
- 16 Feb, 2016 1 commit
-
-
Christian Borntraeger authored
Right now halt_poll_ns can be change during runtime. The grow and shrink factors can only be set during module load. Lets fix several aspects of grow shrink: - make grow/shrink changeable by root - make all variables unsigned int - read the variables once to prevent races Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 07 Feb, 2016 1 commit
-
-
Sowmini Varadhan authored
Add perf event macros for support of tracing and instrumentation of LDC state machine Signed-off-by:
Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 04 Feb, 2016 1 commit
-
-
Shilpasri G Bhat authored
This patch adds the powernv_throttle tracepoint to trace the CPU frequency throttling event, which is used by the powernv-cpufreq driver in POWER8. Signed-off-by:
Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by:
Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 25 Jan, 2016 1 commit
-
-
Gustavo Padovan authored
timeline was wrongly assigned with ->get_driver_name(). Link: http://lkml.kernel.org/r/1453376895-30747-1-git-send-email-gustavo@padovan.orgSigned-off-by:
Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
- 21 Jan, 2016 1 commit
-
-
yalin wang authored
This crash is caused by NULL pointer deference, in page_to_pfn() marco, when page == NULL : Unable to handle kernel NULL pointer dereference at virtual address 00000000 Internal error: Oops: 94000006 [#1] SMP Modules linked in: CPU: 1 PID: 26 Comm: khugepaged Tainted: G W 4.3.0-rc6-next-20151022ajb-00001-g32f3386-dirty #3 PC is at khugepaged+0x378/0x1af8 LR is at khugepaged+0x418/0x1af8 Process khugepaged (pid: 26, stack limit = 0xffffffc079638020) Call trace: khugepaged+0x378/0x1af8 kthread+0xdc/0xf4 ret_from_fork+0xc/0x40 Code: 35001700 f0002c60 aa0703e3 f9009fa0 (f94000e0) ---[ end trace 637503d8e28ae69e ]--- Kernel panic - not syncing: Fatal exception CPU2: stopping CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D W 4.3.0-rc6-next-20151022ajb-00001-g32f3386-dirty #3 Hardware name: linux,dummy-virt (DT) [akpm@linux-foundation.org: fix fat-fingered merge resolution] Signed-off-by:
yalin wang <yalin.wang2010@gmail.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by:
David Rientjes <rientjes@google.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Jan, 2016 1 commit
-
-
Kirill A. Shutemov authored
Prepare khugepaged to see compound pages mapped with pte. For now we won't collapse the pmd table with such pte. khugepaged is subject for future rework wrt new refcounting. Signed-off-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by:
Sasha Levin <sasha.levin@oracle.com> Tested-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by:
Jerome Marchand <jmarchan@redhat.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-