  27 Jul, 2016 (1 commit)
    • stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE · ce4f06dc
      Oleg Nesterov authored
      Suppose that stop_machine(fn) hangs because fn() hangs. In this case an
      NMI hard-lockup can be triggered on another CPU which does nothing
      wrong, and the trace from nmi_panic() won't help to investigate the
      problem.
      
      And this change "fixes" the problem we (seem to) hit in practice.
      
       - stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.
      
       - CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
         lockup and tries to report the problem.
      
       - show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
         which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.
      
       - CPU_1 spends more than 10 seconds trying to flush the log buffer to
         the slow serial console.
      
       - NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().
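
      The fix itself is small: keep petting the NMI watchdog while a CPU
      waits for the next multi_stop_state, so that a hang in fn() on one
      CPU cannot trigger a hard-lockup report on CPUs that are merely
      waiting. A minimal sketch of the idea, assuming a multi_cpu_stop()
      style state loop (not the verbatim patch):

          do {
                  /* Chill out and ensure we re-read multi_stop_state. */
                  cpu_relax();
                  if (msdata->state != curstate) {
                          curstate = msdata->state;
                          /* ... act on the new state ... */
                  }
                  /* Suppress false hard-lockup reports while waiting. */
                  touch_nmi_watchdog();
          } while (curstate != MULTI_STOP_EXIT);
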
      Reported-by: Wang Shu <shuwang@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20160726185736.GB4088@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  26 Jul, 2016 (4 commits)
    • cgroup: remove unnecessary 0 check from css_from_id() · cb773df8
      Johannes Weiner authored
      css_idr allocation starts at 1, so index 0 will never point to an item.
      css_from_id() currently filters that before asking idr_find(), but
      idr_find() would also just return NULL, so this is not needed.
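
      With the check gone, the lookup reduces to a bare idr_find(). A
      sketch of the resulting shape (close to, but not verbatim, the
      patched function):

          struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss)
          {
                  /* css_idr allocation starts at 1; idr_find(.., 0) is NULL. */
                  WARN_ON_ONCE(!rcu_read_lock_held());
                  return idr_find(&ss->css_idr, id);
          }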
      
      Link: http://lkml.kernel.org/r/20160617162427.GC19084@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroup: fix idr leak for the first cgroup root · 1fe4d021
      Johannes Weiner authored
      The valid cgroup hierarchy ID range includes 0, so we can't filter for
      positive numbers when freeing it, or it'll leak the first ID.  No big
      deal, just disruptive when reading the code.
      
      The ID is freed during error handling and when the reference count hits
      zero, so the double-free test is not necessary; remove it.
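
      In other words, the teardown path removes the hierarchy ID
      unconditionally. A sketch, assuming the cgroup_hierarchy_idr naming:

          static void cgroup_exit_root_id(struct cgroup_root *root)
          {
                  lockdep_assert_held(&cgroup_mutex);

                  /*
                   * 0 is a valid hierarchy ID, so don't filter on
                   * hierarchy_id > 0 here; that would leak the first ID.
                   */
                  idr_remove(&cgroup_hierarchy_idr, root->hierarchy_id);
          }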
      
      Link: http://lkml.kernel.org/r/20160617162359.GB19084@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: charge/uncharge kmemcg from generic page allocator paths · 4949148a
      Vladimir Davydov authored
      Currently, to charge a non-slab allocation to kmemcg one has to use
      alloc_kmem_pages helper with __GFP_ACCOUNT flag.  A page allocated with
      this helper should finally be freed using free_kmem_pages, otherwise it
      won't be uncharged.
      
      This API suits its current users fine, but it turns out to be impossible
      to use along with page reference counting, i.e.  when an allocation is
      supposed to be freed with put_page, as it is the case with pipe or unix
      socket buffers.
      
      To overcome this limitation, this patch moves charging/uncharging to
      generic page allocator paths, i.e.  to __alloc_pages_nodemask and
      free_pages_prepare, and zaps alloc/free_kmem_pages helpers.  This way,
      one can use any of the available page allocation functions to get the
      allocated page charged to kmemcg - it's enough to pass __GFP_ACCOUNT,
      just like in the case of kmalloc and friends.  A charged page will be
      automatically uncharged on free.
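
      For instance, a caller that frees with put_page() could then look
      roughly like this (a hedged example, not taken from the patch):

          /* Charge a plain page allocation to kmemcg via __GFP_ACCOUNT. */
          struct page *page = alloc_page(GFP_KERNEL | __GFP_ACCOUNT);

          if (!page)
                  return -ENOMEM;

          /* ... share the page via get_page()/put_page() as usual ... */

          put_page(page);        /* final put frees and auto-uncharges */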
      
      To make it possible, we need to mark pages charged to kmemcg somehow.
      To avoid introducing a new page flag, we make use of page->_mapcount for
      marking such pages.  Since pages charged to kmemcg are not supposed to
      be mapped to userspace, it should work just fine.  There are other
      (ab)users of page->_mapcount - buddy and balloon pages - but we don't
      conflict with them.
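
      The marking itself can be a sentinel value in page->_mapcount,
      alongside the existing buddy and balloon sentinels. A sketch (the
      exact sentinel value here is an assumption):

          /* Assumed sentinel; buddy and balloon pages use other values. */
          #define PAGE_KMEMCG_MAPCOUNT_VALUE (-0x200)

          static inline int PageKmemcg(struct page *page)
          {
                  return atomic_read(&page->_mapcount) ==
                         PAGE_KMEMCG_MAPCOUNT_VALUE;
          }

          static inline void __SetPageKmemcg(struct page *page)
          {
                  atomic_set(&page->_mapcount, PAGE_KMEMCG_MAPCOUNT_VALUE);
          }

          static inline void __ClearPageKmemcg(struct page *page)
          {
                  atomic_set(&page->_mapcount, -1);  /* back to "unmapped" */
          }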
      
      In case kmemcg is compiled out or not used at runtime, this patch
      introduces no overhead to generic page allocator paths.  If kmemcg is
      used, the cost is one extra gfp-flags check on alloc and one extra
      page->_mapcount check on free, which shouldn't hurt performance,
      because the data accessed are hot.
      
      Link: http://lkml.kernel.org/r/a9736d856f895bcb465d9f257b54efe32eda6f99.1464079538.git.vdavydov@virtuozzo.com
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bpf: Add bpf_probe_write_user BPF helper to be called in tracers · 96ae5227
      Sargun Dhillon authored
      This allows user memory to be written to during the course of a kprobe.
      It shouldn't be used to implement any kind of security mechanism
      because of TOCTOU (time-of-check to time-of-use) attacks, but rather
      to debug, divert, and manipulate execution of semi-cooperative
      processes.
      
      Although it uses probe_kernel_write, we limit the address space the
      probe can write into by checking it with access_ok. We do this, as
      opposed to calling copy_to_user directly, in order to avoid sleeping.
      In addition we ensure the thread's current fs / segment is USER_DS
      and the thread is neither exiting nor a kernel thread.
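
      Put together, the guard logic looks roughly like this (a sketch of
      the checks described above; the exact signature is approximate):

          static u64 bpf_probe_write_user(void *unsafe_ptr, const void *src,
                                          u32 size)
          {
                  /* No kernel threads, no exiting tasks, no irq context. */
                  if (unlikely(in_interrupt() ||
                               current->flags & (PF_KTHREAD | PF_EXITING)))
                          return -EPERM;
                  /* The task must currently address user space (USER_DS). */
                  if (unlikely(segment_eq(get_fs(), KERNEL_DS)))
                          return -EPERM;
                  /* Restrict the destination to the user address space. */
                  if (!access_ok(VERIFY_WRITE, unsafe_ptr, size))
                          return -EPERM;

                  /* Non-sleeping write primitive, unlike copy_to_user(). */
                  return probe_kernel_write(unsafe_ptr, src, size);
          }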
      
      Given this feature is meant for experiments, and that it carries a
      risk of crashing the system and running programs, we print a warning,
      along with the pid and process name, when a proglet that attempts to
      use this helper is installed.
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25 Jul, 2016 (1 commit)
    • bpf, events: fix offset in skb copy handler · aa7145c1
      Daniel Borkmann authored
      This patch fixes the __output_custom() routine we currently use with
      bpf_skb_copy(). I missed that when len is larger than the size of the
      current handle, we can issue multiple invocations of copy_func, and
      __output_custom() advances not only the destination but also the
      source buffer by the number of bytes written. For __output_custom()
      this is wrong, since in that case the source buffer points to a
      non-linear object, in our case an skb, which the copy_func helper is
      supposed to walk. Since the source is non-linear, we need to pass the
      offset into the helper, so that copy_func can use it for extracting
      the data from the source object.
      
      Therefore, adjust the callback signatures properly and pass the offset
      into the skb_header_pointer() invoked from the bpf_skb_copy() callback.
      The __DEFINE_OUTPUT_COPY_BODY() macro is adjusted to accommodate two
      things: i) passing in whether we should advance the source buffer or
      not, which is a compile-time constant condition; ii) passing in the
      offset for __output_custom(), which we do with the help of __VA_ARGS__,
      so everything can stay inlined as it currently is. Both changes allow
      for adapting the __output_* fast-path helpers w/o extra overhead.
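
      The adjusted callback shape, roughly (a sketch following the
      description above):

          /* The copy callback now receives an offset into the source. */
          typedef unsigned long (*perf_copy_f)(void *dst, const void *src,
                                               unsigned long off,
                                               unsigned long len);

          static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
                                            unsigned long off,
                                            unsigned long len)
          {
                  /* Walk the (possibly non-linear) skb starting at off. */
                  void *ptr = skb_header_pointer((struct sk_buff *)skb, off,
                                                 len, dst_buff);

                  if (unlikely(!ptr))
                          return len;
                  if (ptr != dst_buff)
                          memcpy(dst_buff, ptr, len);

                  return 0;
          }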
      
      Fixes: 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
      Fixes: 7e3f977e ("perf, events: add non-linear data support for raw records")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22 Jul, 2016 (1 commit)
    • PM / hibernate: Introduce test_resume mode for hibernation · fe12c00d
      Chen Yu authored
      test_resume mode verifies whether the snapshot data written to the
      swap device can be successfully restored to memory. It is useful for
      debugging hibernation, since this mode bypasses not only the
      BIOS/bootloader but also the system re-initialization.
      
      To avoid the risk of breaking the filesystem on persistent storage,
      this patch resumes the image with tasks frozen.
      
      For example:
      echo test_resume > /sys/power/disk
      echo disk > /sys/power/state
      
      [  187.306470] PM: Image saving progress:  70%
      [  187.395298] PM: Image saving progress:  80%
      [  187.476697] PM: Image saving progress:  90%
      [  187.554641] PM: Image saving done.
      [  187.558896] PM: Wrote 594600 kbytes in 0.90 seconds (660.66 MB/s)
      [  187.566000] PM: S|
      [  187.589742] PM: Basic memory bitmaps freed
      [  187.594694] PM: Checking hibernation image
      [  187.599865] PM: Image signature found, resuming
      [  187.605209] PM: Loading hibernation image.
      [  187.665753] PM: Basic memory bitmaps created
      [  187.691397] PM: Using 3 thread(s) for decompression.
      [  187.691397] PM: Loading and decompressing image data (148650 pages)...
      [  187.889719] PM: Image loading progress:   0%
      [  188.100452] PM: Image loading progress:  10%
      [  188.244781] PM: Image loading progress:  20%
      [  189.057305] PM: Image loading done.
      [  189.068793] PM: Image successfully loaded
      Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  21 Jul, 2016 (1 commit)
    • cpufreq: schedutil: map raw required frequency to driver frequency · 5cbea469
      Steve Muckle authored
      The slow-path frequency transition is relatively expensive, as it
      requires waking up a thread to do the work. Should support be added
      for remote CPU cpufreq updates, those would also be expensive, since
      they require an IPI. These activities should be avoided when they are
      not necessary.
      
      To that end, calculate the actual driver-supported frequency required
      by the new utilization value in schedutil by using the recently added
      cpufreq_driver_resolve_freq API. If it is the same as the previously
      requested driver frequency, then there is no need to continue with the
      update, assuming the CPU frequency limits have not changed. This will
      have additional benefits should the semantics of the rate limit be
      changed to apply solely to frequency transitions rather than to
      frequency calculations in schedutil.
      
      The last raw required frequency is cached. This allows the driver
      frequency lookup to be skipped in the event that the new raw required
      frequency matches the last one, assuming a frequency update has not been
      forced due to limits changing (indicated by a next_freq value of
      UINT_MAX, see sugov_should_update_freq).
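
      A sketch of the two shortcuts described above (field names assumed
      from the schedutil governor):

          static unsigned int get_next_freq(struct sugov_policy *sg_policy,
                                            unsigned long util,
                                            unsigned long max)
          {
                  struct cpufreq_policy *policy = sg_policy->policy;
                  unsigned int freq = policy->cpuinfo.max_freq * util / max;

                  /*
                   * If the raw required frequency is unchanged and no update
                   * was forced (next_freq != UINT_MAX), skip the driver
                   * frequency lookup entirely.
                   */
                  if (freq == sg_policy->cached_raw_freq &&
                      sg_policy->next_freq != UINT_MAX)
                          return sg_policy->next_freq;

                  sg_policy->cached_raw_freq = freq;

                  /*
                   * The caller then compares the resolved frequency with the
                   * previously requested one; if equal, no transition (and
                   * no IPI or kthread wakeup) is needed.
                   */
                  return cpufreq_driver_resolve_freq(policy, freq);
          }
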
      Signed-off-by: Steve Muckle <smuckle@linaro.org>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>