1. 29 Oct, 2014 1 commit
  2. 28 Oct, 2014 2 commits
  3. 17 Oct, 2014 1 commit
    • Daniel Borkmann's avatar
      random: add and use memzero_explicit() for clearing data · d4c5efdb
      Daniel Borkmann authored
      zatimend has reported that in his environment (3.16/gcc4.8.3/corei7)
      memset() calls which clear out sensitive data in extract_{buf,entropy,
      entropy_user}() in random driver are being optimized away by gcc.
      
      Add a helper memzero_explicit() (similarly as explicit_bzero() variants)
      that can be used in such cases where a variable with sensitive data is
      being cleared out in the end. Other use cases might also be in crypto
      code. [ I have put this into lib/string.c though, as it's always built-in
      and doesn't need any dependencies then. ]
      
      Fixes kernel bugzilla: 82041
      
      Reported-by: zatimend@hotmail.co.uk
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      d4c5efdb
  4. 14 Oct, 2014 1 commit
  5. 13 Oct, 2014 11 commits
  6. 09 Oct, 2014 3 commits
  7. 03 Oct, 2014 2 commits
  8. 02 Oct, 2014 1 commit
    • Peter Zijlstra's avatar
      locking/lockdep: Revert qrwlock recusive stuff · 8acd91e8
      Peter Zijlstra authored
      Commit f0bab73c ("locking/lockdep: Restrict the use of recursive
      read_lock() with qrwlock") changed lockdep to try and conform to the
      qrwlock semantics which differ from the traditional rwlock semantics.
      
      In particular qrwlock is fair outside of interrupt context, but in
      interrupt context readers will ignore all fairness.
      
      The problem modeling this is that read and write side have different
      lock state (interrupts) semantics but we only have a single
      representation of these. Therefore lockdep will get confused, thinking
      the lock can cause interrupt lock inversions.
      
      So revert it for now; the old rwlock semantics were already imperfectly
      modeled and the qrwlock extra won't fit either.
      
      If we want to properly fix this, I think we need to resurrect the work
      by Gautham did a few years ago that split the read and write state of
      locks:
      
         http://lwn.net/Articles/332801/
      
      FWIW the locking selftest that would've failed (and was reported by
      Borislav earlier) is something like:
      
        RL(X1);	/* IRQ-ON */
        LOCK(A);
        UNLOCK(A);
        RU(X1);
      
        IRQ_ENTER();
        RL(X1);	/* IN-IRQ */
        RU(X1);
        IRQ_EXIT();
      
      At which point it would report that because A is an IRQ-unsafe lock we
      can suffer the following inversion:
      
      	CPU0		CPU1
      
      	lock(A)
      			lock(X1)
      			lock(A)
      	<IRQ>
      	 lock(X1)
      
      And this is 'wrong' because X1 can recurse (assuming the above lock are
      in fact read-lock) but lockdep doesn't know about this.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: ego@linux.vnet.ibm.com
      Cc: bp@alien8.de
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20140930132600.GA7444@worktop.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8acd91e8
  9. 28 Sep, 2014 2 commits
    • Willy Tarreau's avatar
      lzo: check for length overrun in variable length encoding. · 72cf9012
      Willy Tarreau authored
      This fix ensures that we never meet an integer overflow while adding
      255 while parsing a variable length encoding. It works differently from
      commit 206a81c1 ("lzo: properly check for overruns") because instead of
      ensuring that we don't overrun the input, which is tricky to guarantee
      due to many assumptions in the code, it simply checks that the cumulated
      number of 255 read cannot overflow by bounding this number.
      
      The MAX_255_COUNT is the maximum number of times we can add 255 to a base
      count without overflowing an integer. The multiply will overflow when
      multiplying 255 by more than MAXINT/255. The sum will overflow earlier
      depending on the base count. Since the base count is taken from a u8
      and a few bits, it is safe to assume that it will always be lower than
      or equal to 2*255, thus we can always prevent any overflow by accepting
      two less 255 steps.
      
      This patch also reduces the CPU overhead and actually increases performance
      by 1.1% compared to the initial code, while the previous fix costs 3.1%
      (measured on x86_64).
      
      The fix needs to be backported to all currently supported stable kernels.
      Reported-by: default avatarWillem Pinckaers <willem@lekkertech.net>
      Cc: "Don A. Bailey" <donb@securitymouse.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72cf9012
    • Willy Tarreau's avatar
      Revert "lzo: properly check for overruns" · af958a38
      Willy Tarreau authored
      This reverts commit 206a81c1 ("lzo: properly check for overruns").
      
      As analysed by Willem Pinckaers, this fix is still incomplete on
      certain rare corner cases, and it is easier to restart from the
      original code.
      Reported-by: default avatarWillem Pinckaers <willem@lekkertech.net>
      Cc: "Don A. Bailey" <donb@securitymouse.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af958a38
  10. 26 Sep, 2014 2 commits
    • Alexei Starovoitov's avatar
      bpf: mini eBPF library, test stubs and verifier testsuite · 3c731eba
      Alexei Starovoitov authored
      1.
      the library includes a trivial set of BPF syscall wrappers:
      int bpf_create_map(int key_size, int value_size, int max_entries);
      int bpf_update_elem(int fd, void *key, void *value);
      int bpf_lookup_elem(int fd, void *key, void *value);
      int bpf_delete_elem(int fd, void *key);
      int bpf_get_next_key(int fd, void *key, void *next_key);
      int bpf_prog_load(enum bpf_prog_type prog_type,
      		  const struct sock_filter_int *insns, int insn_len,
      		  const char *license);
      bpf_prog_load() stores verifier log into global bpf_log_buf[] array
      
      and BPF_*() macros to build instructions
      
      2.
      test stubs configure eBPF infra with 'unspec' map and program types.
      These are fake types used by user space testsuite only.
      
      3.
      verifier tests valid and invalid programs and expects predefined
      error log messages from kernel.
      40 tests so far.
      
      $ sudo ./test_verifier
       #0 add+sub+mul OK
       #1 unreachable OK
       #2 unreachable2 OK
       #3 out of range jump OK
       #4 out of range jump2 OK
       #5 test1 ld_imm64 OK
       ...
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3c731eba
    • Vladimir Zapolskiy's avatar
      genalloc: fix device node resource counter · 6f3aabd1
      Vladimir Zapolskiy authored
      Decrement the np_pool device_node refcount, which was incremented on
      the preceding of_parse_phandle() call.
      Signed-off-by: default avatarVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6f3aabd1
  11. 24 Sep, 2014 11 commits
    • Tejun Heo's avatar
      percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky · 1cae13e7
      Tejun Heo authored
      Currently, a percpu_ref which is initialized with
      PERPCU_REF_INIT_ATOMIC or switched to atomic mode via
      switch_to_atomic() automatically reverts to percpu mode on the first
      percpu_ref_reinit().  This makes the atomic mode difficult to use for
      cases where a percpu_ref is used as a persistent on/off switch which
      may be cycled multiple times.
      
      This patch makes such atomic state sticky so that it survives through
      kill/reinit cycles.  After this patch, atomic state is cleared only by
      an explicit percpu_ref_switch_to_percpu() call.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      1cae13e7
    • Tejun Heo's avatar
      percpu_ref: add PERCPU_REF_INIT_* flags · 2aad2a86
      Tejun Heo authored
      With the recent addition of percpu_ref_reinit(), percpu_ref now can be
      used as a persistent switch which can be turned on and off repeatedly
      where turning off maps to killing the ref and waiting for it to drain;
      however, there currently isn't a way to initialize a percpu_ref in its
      off (killed and drained) state, which can be inconvenient for certain
      persistent switch use cases.
      
      Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
      selection of operation mode; however, currently a newly initialized
      percpu_ref is always in percpu mode making it impossible to avoid the
      latency overhead of switching to atomic mode.
      
      This patch adds @flags to percpu_ref_init() and implements the
      following flags.
      
      * PERCPU_REF_INIT_ATOMIC	: start ref in atomic mode
      * PERCPU_REF_INIT_DEAD		: start ref killed and drained
      
      These flags should be able to serve the above two use cases.
      
      v2: target_core_tpg.c conversion was missing.  Fixed.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      2aad2a86
    • Tejun Heo's avatar
      percpu_ref: decouple switching to percpu mode and reinit · f47ad457
      Tejun Heo authored
      percpu_ref has treated the dropping of the base reference and
      switching to atomic mode as an integral operation; however, there's
      nothing inherent tying the two together.
      
      The use cases for percpu_ref have been expanding continuously.  While
      the current init/kill/reinit/exit model can cover a lot, the coupling
      of kill/reinit with atomic/percpu mode switching is turning out to be
      too restrictive for use cases where many percpu_refs are created and
      destroyed back-to-back with only some of them reaching extended
      operation.  The coupling also makes implementing always-atomic debug
      mode difficult.
      
      This patch separates out percpu mode switching into
      percpu_ref_switch_to_percpu() and reimplements percpu_ref_reinit() on
      top of it.
      
      * DEAD still requires ATOMIC.  A dead ref can't be switched to percpu
        mode w/o going through reinit.
      
      v2: __percpu_ref_switch_to_percpu() was missing static.  Fixed.
          Reported by Fengguang aka kbuild test robot.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      f47ad457
    • Tejun Heo's avatar
      percpu_ref: decouple switching to atomic mode and killing · 490c79a6
      Tejun Heo authored
      percpu_ref has treated the dropping of the base reference and
      switching to atomic mode as an integral operation; however, there's
      nothing inherent tying the two together.
      
      The use cases for percpu_ref have been expanding continuously.  While
      the current init/kill/reinit/exit model can cover a lot, the coupling
      of kill/reinit with atomic/percpu mode switching is turning out to be
      too restrictive for use cases where many percpu_refs are created and
      destroyed back-to-back with only some of them reaching extended
      operation.  The coupling also makes implementing always-atomic debug
      mode difficult.
      
      This patch separates out atomic mode switching into
      percpu_ref_switch_to_atomic() and reimplements
      percpu_ref_kill_and_confirm() on top of it.
      
      * The handling of __PERCPU_REF_ATOMIC and __PERCPU_REF_DEAD is now
        differentiated.  Among get/put operations, percpu_ref_tryget_live()
        is the only one which cares about DEAD.
      
      * percpu_ref_switch_to_atomic() can be called multiple times on the
        same ref.  This means that multiple @confirm_switch may get queued
        up which we can't do reliably without extra memory area.  This is
        handled by making the later invocation synchronously wait for the
        completion of the previous one.  This isn't particularly desirable
        but such synchronous waits shouldn't happen in most cases.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      490c79a6
    • Tejun Heo's avatar
      percpu_ref: add PCPU_REF_DEAD · 27344a90
      Tejun Heo authored
      percpu_ref will be restructured so that percpu/atomic mode switching
      and reference killing are dedoupled.  In preparation, add
      PCPU_REF_DEAD and PCPU_REF_ATOMIC_DEAD which is OR of ATOMIC and DEAD.
      For now, ATOMIC and DEAD are changed together and all PCPU_REF_ATOMIC
      uses are converted to PCPU_REF_ATOMIC_DEAD without causing any
      behavior changes.
      
      percpu_ref_init() now specifies an explicit alignment when allocating
      the percpu counters so that the pointer has enough unused low bits to
      accomodate the flags.  Note that one flag was fine as min alignment
      for percpu memory is 2 bytes but two flags are already too many for
      the natural alignment of unsigned longs on archs like cris and m68k.
      
      v2: The original patch had BUILD_BUG_ON() which triggers if unsigned
          long's alignment isn't enough to accomodate the flags, which
          triggered on cris and m64k.  percpu_ref_init() updated to specify
          the required alignment explicitly.  Reported by Fengguang.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      27344a90
    • Tejun Heo's avatar
      percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch · 9e804d1f
      Tejun Heo authored
      percpu_ref will be restructured so that percpu/atomic mode switching
      and reference killing are dedoupled.  In preparation, do the following
      renames.
      
      * percpu_ref->confirm_kill	-> percpu_ref->confirm_switch
      * __PERCPU_REF_DEAD		-> __PERCPU_REF_ATOMIC
      * __percpu_ref_alive()		-> __ref_is_percpu()
      
      This patch is pure rename and doesn't introduce any functional
      changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
      9e804d1f
    • Tejun Heo's avatar
      percpu_ref: replace pcpu_ prefix with percpu_ · eecc16ba
      Tejun Heo authored
      percpu_ref uses pcpu_ prefix for internal stuff and percpu_ for
      externally visible ones.  This is the same convention used in the
      percpu allocator implementation.  It works fine there but percpu_ref
      doesn't have too much internal-only stuff and scattered usages of
      pcpu_ prefix are confusing than helpful.
      
      This patch replaces all pcpu_ prefixes with percpu_.  This is pure
      rename and there's no functional change.  Note that PCPU_REF_DEAD is
      renamed to __PERCPU_REF_DEAD to signify that the flag is internal.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
      eecc16ba
    • Tejun Heo's avatar
      percpu_ref: minor code and comment updates · 6251f997
      Tejun Heo authored
      * Some comments became stale.  Updated.
      * percpu_ref_tryget() unnecessarily initializes @ret.  Removed.
      * A blank line removed from percpu_ref_kill_rcu().
      * Explicit function name in a WARN format string replaced with __func__.
      * WARN_ON() in percpu_ref_reinit() converted to WARN_ON_ONCE().
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
      6251f997
    • Tejun Heo's avatar
      percpu_ref: relocate percpu_ref_reinit() · a2237370
      Tejun Heo authored
      percpu_ref is gonna go through restructuring.  Move
      percpu_ref_reinit() after percpu_ref_kill_and_confirm().  This will
      make later changes easier to follow and result in cleaner
      organization.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarKent Overstreet <kmo@daterainc.com>
      a2237370
    • Tejun Heo's avatar
      Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe" · 9eca8046
      Tejun Heo authored
      This reverts commit 0a30288d, which
      was a temporary fix for SCSI blk-mq stall issue.  The following
      patches will fix the issue properly by introducing atomic mode to
      percpu_ref.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      9eca8046
    • Tejun Heo's avatar
      blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe · 0a30288d
      Tejun Heo authored
      blk-mq uses percpu_ref for its usage counter which tracks the number
      of in-flight commands and used to synchronously drain the queue on
      freeze.  percpu_ref shutdown takes measureable wallclock time as it
      involves a sched RCU grace period.  This means that draining a blk-mq
      takes measureable wallclock time.  One would think that this shouldn't
      matter as queue shutdown should be a rare event which takes place
      asynchronously w.r.t. userland.
      
      Unfortunately, SCSI probing involves synchronously setting up and then
      tearing down a lot of request_queues back-to-back for non-existent
      LUNs.  This means that SCSI probing may take more than ten seconds
      when scsi-mq is used.
      
      This will be properly fixed by implementing a mechanism to keep
      q->mq_usage_counter in atomic mode till genhd registration; however,
      that involves rather big updates to percpu_ref which is difficult to
      apply late in the devel cycle (v3.17-rc6 at the moment).  As a
      stop-gap measure till the proper fix can be implemented in the next
      cycle, this patch introduces __percpu_ref_kill_expedited() and makes
      blk_mq_freeze_queue() use it.  This is heavy-handed but should work
      for testing the experimental SCSI blk-mq implementation.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarChristoph Hellwig <hch@infradead.org>
      Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
      Fixes: add703fd ("blk-mq: use percpu_ref for mq usage count")
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Tested-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      0a30288d
  12. 22 Sep, 2014 1 commit
  13. 19 Sep, 2014 2 commits
    • Tejun Heo's avatar
      percpu-refcount: make percpu_ref based on longs instead of ints · e625305b
      Tejun Heo authored
      percpu_ref is currently based on ints and the number of refs it can
      cover is (1 << 31).  This makes it impossible to use a percpu_ref to
      count memory objects or pages on 64bit machines as it may overflow.
      This forces those users to somehow aggregate the references before
      contributing to the percpu_ref which is often cumbersome and sometimes
      challenging to get the same level of performance as using the
      percpu_ref directly.
      
      While using ints for the percpu counters makes them pack tighter on
      64bit machines, the possible gain from using ints instead of longs is
      extremely small compared to the overall gain from per-cpu operation.
      This patch makes percpu_ref based on longs so that it can be used to
      directly count memory objects or pages.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      e625305b
    • Tejun Heo's avatar
      percpu-refcount: improve WARN messages · 4843c332
      Tejun Heo authored
      percpu_ref's WARN messages can be a lot more helpful by indicating
      who's the culprit.  Make them report the release function that the
      offending percpu-refcount is associated with.  This should make it a
      lot easier to track down the reported invalid refcnting operations.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      4843c332