1. 05 Mar, 2012 6 commits
    • Hugh Dickins's avatar
      memcg: fix GPF when cgroup removal races with last exit · 7512102c
      Hugh Dickins authored
      When moving tasks from old memcg (with move_charge_at_immigrate on new
      memcg), followed by removal of old memcg, hit General Protection Fault in
      mem_cgroup_lru_del_list() (called from release_pages called from
      free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
      exit_mmap from mmput from exit_mm from do_exit).
      
      Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
      been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
      still trying to update its stats, and take page off lru before freeing.
      
      A task, or a charge, or a page on lru: each secures a memcg against
      removal.  In this case, the last task has been moved out of the old memcg,
      and it is exiting: anonymous pages are uncharged one by one from the
      memcg, as they are zapped from its pagetables, so the charge gets down to
      0; but the pages themselves are queued in an mmu_gather for freeing.
      
      Most of those pages will be on lru (and force_empty is careful to
      lru_add_drain_all, to add pages from pagevec to lru first), but not
      necessarily all: perhaps some have been isolated for page reclaim, perhaps
      some isolated for other reasons.  So, force_empty may find no task, no
      charge and no page on lru, and let the removal proceed.
      
      There would still be no problem if these pages were immediately freed; but
      typically (and the put_page_testzero protocol demands it) they have to be
      added back to lru before they are found freeable, then removed from lru
      and freed.  We don't see the issue when adding, because the
      mem_cgroup_iter() loops keep their own reference to the memcg being
      scanned; but when it comes to mem_cgroup_lru_del_list().
      
      I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
      PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
      of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
      38c5d72f ("memcg: simplify LRU handling by new rule") mercifully
      removed those convolutions, but left this General Protection Fault.
      
      But it's surprisingly easy to restore the old behaviour: just check
      PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
      to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
      change?  just going back to how it worked before; testing, and an audit of
      uses of pc->mem_cgroup, show no problem.
      
      And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
      sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
      longer has any purpose, and we can safely revert 4e5f01c2 ("memcg:
      clear pc->mem_cgroup if necessary").
      
      Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
      is not strictly necessary: the lru_lock there, with RCU before memcg
      structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
      without that; but it seems cleaner to rely on one dependency less.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7512102c
    • Oleg Nesterov's avatar
      vfork: kill PF_STARTING · 6e27f63e
      Oleg Nesterov authored
      Previously it was (ab)used by utrace.  Then it was wrongly used by the
      scheduler code.
      
      Currently it is not used, kill it before it finds the new erroneous user.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6e27f63e
    • Oleg Nesterov's avatar
      coredump_wait: don't call complete_vfork_done() · 57b59c4a
      Oleg Nesterov authored
      Now that CLONE_VFORK is killable, coredump_wait() no longer needs
      complete_vfork_done().  zap_threads() should find and kill all tasks with
      the same ->mm, this includes our parent if ->vfork_done is set.
      
      mm_release() becomes the only caller, unexport complete_vfork_done().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      57b59c4a
    • Oleg Nesterov's avatar
      vfork: make it killable · d68b46fe
      Oleg Nesterov authored
      Make vfork() killable.
      
      Change do_fork(CLONE_VFORK) to do wait_for_completion_killable().  If it
      fails we do not return to the user-mode and never touch the memory shared
      with our child.
      
      However, in this case we should clear child->vfork_done before return, we
      use task_lock() in do_fork()->wait_for_vfork_done() and
      complete_vfork_done() to serialize with each other.
      
      Note: now that we use task_lock() we don't really need completion, we
      could turn task->vfork_done into "task_struct *wake_up_me" but this needs
      some complications.
      
      NOTE: this and the next patches do not affect in-kernel users of
      CLONE_VFORK, kernel threads run with all signals ignored including
      SIGKILL/SIGSTOP.
      
      However this is obviously the user-visible change.  Not only a fatal
      signal can kill the vforking parent, a sub-thread can do execve or
      exit_group() and kill the thread sleeping in vfork().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d68b46fe
    • Oleg Nesterov's avatar
      vfork: introduce complete_vfork_done() · c415c3b4
      Oleg Nesterov authored
      No functional changes.
      
      Move the clear-and-complete-vfork_done code into the new trivial helper,
      complete_vfork_done().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c415c3b4
    • Matthew Garrett's avatar
      kmsg_dump: don't run on non-error paths by default · c22ab332
      Matthew Garrett authored
      Since commit 04c6862c ("kmsg_dump: add kmsg_dump() calls to the
      reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
      run on normal paths including poweroff and reboot.
      
      This is less than ideal given pstore implementations that can only
      represent single backtraces, since a reboot may overwrite a stored oops
      before it's been picked up by userspace.  In addition, some pstore
      backends may have low performance and provide a significant delay in
      reboot as a result.
      
      This patch adds a printk.always_kmsg_dump kernel parameter (which can also
      be changed from userspace).  Without it, the code will only be run on
      failure paths rather than on normal paths.  The option can be enabled in
      environments where there's a desire to attempt to audit whether or not a
      reboot was cleanly requested or not.
      Signed-off-by: default avatarMatthew Garrett <mjg@redhat.com>
      Acked-by: default avatarSeiji Aguchi <seiji.aguchi@hds.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marco Stornelli <marco.stornelli@gmail.com>
      Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c22ab332
  2. 04 Mar, 2012 2 commits
  3. 02 Mar, 2012 5 commits
  4. 28 Feb, 2012 1 commit
  5. 27 Feb, 2012 1 commit
    • James Bottomley's avatar
      [PARISC] fix compile break caused by iomap: make IOPORT/PCI mapping functions conditional · 97a29d59
      James Bottomley authored
      The problem in
      
      commit fea80311
      Author: Randy Dunlap <rdunlap@xenotime.net>
      Date:   Sun Jul 24 11:39:14 2011 -0700
      
          iomap: make IOPORT/PCI mapping functions conditional
      
      is that if your architecture supplies pci_iomap/pci_iounmap, it expects
      always to supply them.  Adding empty body defitions in the !CONFIG_PCI
      case, which is what this patch does, breaks the parisc compile because
      the functions become doubly defined.  It took us a while to spot this,
      because we don't actually build !CONFIG_PCI very often (only if someone
      is brave enough to test the snake/asp machines).
      
      Since the note in the commit log says this is to fix a
      CONFIG_GENERIC_IOMAP issue (which it does because CONFIG_GENERIC_IOMAP
      supplies pci_iounmap only if CONFIG_PCI is set), there should actually
      have been a condition upon this.  This should make sure no other
      architecture's !CONFIG_PCI compile breaks in the same way as parisc.
      
      The fix had to be updated to take account of the GENERIC_PCI_IOMAP
      separation.
      Reported-by: default avatarRolf Eike Beer <eike@sf-mail.de>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      97a29d59
  6. 26 Feb, 2012 1 commit
  7. 24 Feb, 2012 2 commits
    • Oleg Nesterov's avatar
      epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() · d80e731e
      Oleg Nesterov authored
      This patch is intentionally incomplete to simplify the review.
      It ignores ep_unregister_pollwait() which plays with the same wqh.
      See the next change.
      
      epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
      f_op->poll() needs. In particular it assumes that the wait queue
      can't go away until eventpoll_release(). This is not true in case
      of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
      which is not connected to the file.
      
      This patch adds the special event, POLLFREE, currently only for
      epoll. It expects that init_poll_funcptr()'ed hook should do the
      necessary cleanup. Perhaps it should be defined as EPOLLFREE in
      eventpoll.
      
      __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
      ->signalfd_wqh is not empty, we add the new signalfd_cleanup()
      helper.
      
      ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
      This make this poll entry inconsistent, but we don't care. If you
      share epoll fd which contains our sigfd with another process you
      should blame yourself. signalfd is "really special". I simply do
      not know how we can define the "right" semantics if it used with
      epoll.
      
      The main problem is, epoll calls signalfd_poll() once to establish
      the connection with the wait queue, after that signalfd_poll(NULL)
      returns the different/inconsistent results depending on who does
      EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
      has nothing to do with the file, it works with the current thread.
      
      In short: this patch is the hack which tries to fix the symptoms.
      It also assumes that nobody can take tasklist_lock under epoll
      locks, this seems to be true.
      
      Note:
      
      	- we do not have wake_up_all_poll() but wake_up_poll()
      	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.
      
      	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
      	  we need a couple of simple changes in eventpoll.c to
      	  make sure it can't be "lost".
      Reported-by: default avatarMaxime Bizon <mbizon@freebox.fr>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d80e731e
    • Jozsef Kadlecsik's avatar
      netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2) · 7d367e06
      Jozsef Kadlecsik authored
      Marcell Zambo and Janos Farago noticed and reported that when
      new conntrack entries are added via netlink and the conntrack table
      gets full, soft lockup happens. This is because the nf_conntrack_lock
      is held while nf_conntrack_alloc is called, which is in turn wants
      to lock nf_conntrack_lock while evicting entries from the full table.
      
      The patch fixes the soft lockup with limiting the holding of the
      nf_conntrack_lock to the minimum, where it's absolutely required.
      It required to extend (and thus change) nf_conntrack_hash_insert
      so that it makes sure conntrack and ctnetlink do not add the same entry
      twice to the conntrack table.
      Signed-off-by: default avatarJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      7d367e06
  8. 23 Feb, 2012 2 commits
  9. 22 Feb, 2012 1 commit
    • Peter Zijlstra's avatar
      sched/events: Revert trace_sched_stat_sleeptime() · 8c79a045
      Peter Zijlstra authored
      Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime")
      added a new sched:sched_stat_sleeptime tracepoint.
      
      It's broken: the first sample we get on a task might be bad because
      of a stale sleep_start value that wasn't reset at the last task switch
      because the tracepoint was not active.
      
      It also breaks the existing schedstat samples due to the side
      effects of:
      
      -               se->statistics.sleep_start = 0;
      ...
      -               se->statistics.block_start = 0;
      
      Nor do I see means to fix it without adding overhead to the scheduler
      fast path, which I'm not willing to for the sake of redundant
      instrumentation.
      
      Most importantly, sleep time information can already be constructed
      by tracing context switches and wakeups, and taking the timestamp
      difference between the schedule-out, the wakeup and the schedule-in.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Vagin <avagin@openvz.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8c79a045
  10. 21 Feb, 2012 6 commits
    • Linus Torvalds's avatar
      sys_poll: fix incorrect type for 'timeout' parameter · faf30900
      Linus Torvalds authored
      The 'poll()' system call timeout parameter is supposed to be 'int', not
      'long'.
      
      Now, the reason this matters is that right now 32-bit compat mode is
      broken on at least x86-64, because the 32-bit code just calls
      'sys_poll()' directly on x86-64, and the 32-bit argument will have been
      zero-extended, turning a signed 'int' into a large unsigned 'long'
      value.
      
      We could just introduce a 'compat_sys_poll()' function for this, and
      that may eventually be what we have to do, but since the actual standard
      poll() semantics is *supposed* to be 'int', and since at least on x86-64
      glibc sign-extends the argument before invocing the system call (so
      nobody can actually use a 64-bit timeout value in user space _anyway_,
      even in 64-bit binaries), the simpler solution would seem to be to just
      fix the definition of the system call to match what it should have been
      from the very start.
      
      If it turns out that somebody somehow circumvents the user-level libc
      64-bit sign extension and actually uses a large unsigned 64-bit timeout
      despite that not being how poll() is supposed to work, we will need to
      do the compat_sys_poll() approach.
      Reported-by: default avatarThomas Meyer <thomas@m3y3r.de>
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      faf30900
    • Hitoshi Mitake's avatar
      asm-generic: architecture independent readq/writeq for 32bit environment · 797a796a
      Hitoshi Mitake authored
      This provides unified readq()/writeq() helper functions for 32-bit
      drivers.
      
      For some cases, readq/writeq without atomicity is harmful, and order of
      io access has to be specified explicitly.  So in this patch, new two
      header files which contain non-atomic readq/writeq are added.
      
       - <asm-generic/io-64-nonatomic-lo-hi.h> provides non-atomic readq/
         writeq with the order of lower address -> higher address
      
       - <asm-generic/io-64-nonatomic-hi-lo.h> provides non-atomic readq/
         writeq with reversed order
      
      This allows us to remove some readq()s that were added drivers when the
      default non-atomic ones were removed in commit dbee8a0a ("x86:
      remove 32-bit versions of readq()/writeq()")
      
      The drivers which need readq/writeq but can do with the non-atomic ones
      must add the line:
      
        #include <asm-generic/io-64-nonatomic-lo-hi.h> /* or hi-lo.h */
      
      But this will be nop in 64-bit environments, and no other #ifdefs are
      required.  So I believe that this patch can solve the problem of
       1. driver-specific readq/writeq
       2. atomicity and order of io access
      
      This patch is tested with building allyesconfig and allmodconfig as
      ARCH=x86 and ARCH=i386 on top of tip/master.
      
      Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Ravi Anand <ravi.anand@qlogic.com>
      Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
      Cc: Matthew Garrett <mjg@redhat.com>
      Cc: Jason Uhlenkott <juhlenko@akamai.com>
      Cc: James Bottomley <James.Bottomley@parallels.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarHitoshi Mitake <h.mitake@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      797a796a
    • Greg Rose's avatar
      rtnetlink: Fix problem with buffer allocation · 115c9b81
      Greg Rose authored
      Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
      is a 32 bit value that can be used to indicate to the kernel that
      certain extended ifinfo values are requested by the user application.
      At this time the only mask value defined is RTEXT_FILTER_VF to
      indicate that the user wants the ifinfo dump to send information
      about the VFs belonging to the interface.
      
      This patch fixes a bug in which certain applications do not have
      large enough buffers to accommodate the extra information returned
      by the kernel with large numbers of SR-IOV virtual functions.
      Those applications will not send the new netlink attribute with
      the interface info dump request netlink messages so they will
      not get unexpectedly large request buffers returned by the kernel.
      
      Modifies the rtnl_calcit function to traverse the list of net
      devices and compute the minimum buffer size that can hold the
      info dumps of all matching devices based upon the filter passed
      in via the new netlink attribute filter mask.  If no filter
      mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.
      
      With this change it is possible to add yet to be defined netlink
      attributes to the dump request which should make it fairly extensible
      in the future.
      Signed-off-by: default avatarGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      115c9b81
    • Ming Lei's avatar
      percpu: use raw_local_irq_* in _this_cpu op · e920d597
      Ming Lei authored
      It doesn't make sense to trace irq off or do irq flags
      lock proving inside 'this_cpu' operations, so replace local_irq_*
      with raw_local_irq_* in 'this_cpu' op.
      
      Also the patch fixes onelockdep warning[1] by the replacement, see
      below:
      
      In commit: 933393f5(percpu:
      Remove irqsafe_cpu_xxx variants), local_irq_save/restore(flags) are
      added inside this_cpu_inc operation, so that trace_hardirqs_off_caller
      will be called by trace_hardirqs_on_caller directly because
      __debug_atomic_inc is implemented as this_cpu_inc, which may trigger
      the lockdep warning[1], for example in the below ARM scenary:
      
      	kernel_thread_helper	/*irq disabled*/
      		->trace_hardirqs_on_caller	/*hardirqs_enabled was set*/
      			->trace_hardirqs_off_caller	/*hardirqs_enabled cleared*/
      				__this_cpu_add(redundant_hardirqs_on)
      			->trace_hardirqs_off_caller	/*irq disabled, so call here*/
      
      The 'unannotated irqs-on' warning will be triggered somewhere because
      irq is just enabled after the irq trace in kernel_thread_helper.
      
      [1],
      [    0.162841] ------------[ cut here ]------------
      [    0.167694] WARNING: at kernel/lockdep.c:3493 check_flags+0xc0/0x1d0()
      [    0.174468] Modules linked in:
      [    0.177703] Backtrace:
      [    0.180328] [<c00171f0>] (dump_backtrace+0x0/0x110) from [<c0412320>] (dump_stack+0x18/0x1c)
      [    0.189086]  r6:c051f778 r5:00000da5 r4:00000000 r3:60000093
      [    0.195007] [<c0412308>] (dump_stack+0x0/0x1c) from [<c00410e8>] (warn_slowpath_common+0x54/0x6c)
      [    0.204223] [<c0041094>] (warn_slowpath_common+0x0/0x6c) from [<c0041124>] (warn_slowpath_null+0x24/0x2c)
      [    0.214111]  r8:00000000 r7:00000000 r6:ee069598 r5:60000013 r4:ee082000
      [    0.220825] r3:00000009
      [    0.223693] [<c0041100>] (warn_slowpath_null+0x0/0x2c) from [<c0088f38>] (check_flags+0xc0/0x1d0)
      [    0.232910] [<c0088e78>] (check_flags+0x0/0x1d0) from [<c008d348>] (lock_acquire+0x4c/0x11c)
      [    0.241668] [<c008d2fc>] (lock_acquire+0x0/0x11c) from [<c0415aa4>] (_raw_spin_lock+0x3c/0x74)
      [    0.250610] [<c0415a68>] (_raw_spin_lock+0x0/0x74) from [<c010a844>] (set_task_comm+0x20/0xc0)
      [    0.259521]  r6:ee069588 r5:ee0691c0 r4:ee082000
      [    0.264404] [<c010a824>] (set_task_comm+0x0/0xc0) from [<c0060780>] (kthreadd+0x28/0x108)
      [    0.272857]  r8:00000000 r7:00000013 r6:c0044a08 r5:ee0691c0 r4:ee082000
      [    0.279571] r3:ee083fe0
      [    0.282470] [<c0060758>] (kthreadd+0x0/0x108) from [<c0044a08>] (do_exit+0x0/0x6dc)
      [    0.290405]  r5:c0060758 r4:00000000
      [    0.294189] ---[ end trace 1b75b31a2719ed1c ]---
      [    0.299041] possible reason: unannotated irqs-on.
      [    0.303955] irq event stamp: 5
      [    0.307159] hardirqs last  enabled at (4): [<c001331c>] no_work_pending+0x8/0x2c
      [    0.314880] hardirqs last disabled at (5): [<c0089b08>] trace_hardirqs_on_caller+0x60/0x26c
      [    0.323547] softirqs last  enabled at (0): [<c003f754>] copy_process+0x33c/0xef4
      [    0.331207] softirqs last disabled at (0): [<  (null)>]   (null)
      [    0.337585] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarMing Lei <tom.leiming@gmail.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      e920d597
    • Konstantin Khlebnikov's avatar
      percpu: fix generic definition of __this_cpu_add_and_return() · 7d96b3e5
      Konstantin Khlebnikov authored
      This patch adds missed "__" into function prefix.
      Otherwise on all archectures (except x86) it expands to irq/preemtion-safe
      variant: _this_cpu_generic_add_return(), which do extra irq-save/irq-restore.
      Optimal generic implementation is __this_cpu_generic_add_return().
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      7d96b3e5
    • Joerg Willmann's avatar
      netfilter: ebtables: fix alignment problem in ppc · 88ba136d
      Joerg Willmann authored
      ebt_among extension of ebtables uses __alignof__(_xt_align) while the
      corresponding kernel module uses __alignof__(ebt_replace) to determine
      the alignment in EBT_ALIGN().
      
      These are the results of these values on different platforms:
      
      x86 x86_64 ppc
      __alignof__(_xt_align) 4 8 8
      __alignof__(ebt_replace) 4 8 4
      
      ebtables fails to add rules which use the among extension.
      
      I'm using kernel 2.6.33 and ebtables 2.0.10-4
      
      According to Bart De Schuymer, userspace alignment was changed to
      _xt_align to fix an alignment issue on a userspace32-kernel64 system
      (he thinks it was for an ARM device). So userspace must be right.
      The kernel alignment macro needs to change so it also uses _xt_align
      instead of ebt_replace. The userspace changes date back from
      June 29, 2009.
      Signed-off-by: default avatarJoerg Willmann <joe@clnt.de>
      Signed-off by: Bart De Schuymer <bdschuym@pandora.be>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      88ba136d
  11. 20 Feb, 2012 1 commit
  12. 15 Feb, 2012 7 commits
    • Alexey Dobriyan's avatar
      crypto: sha512 - use standard ror64() · f2ea0f5f
      Alexey Dobriyan authored
      Use standard ror64() instead of hand-written.
      There is no standard ror64, so create it.
      
      The difference is shift value being "unsigned int" instead of uint64_t
      (for which there is no reason). gcc starts to emit native ROR instructions
      which it doesn't do for some reason currently. This should make the code
      faster.
      
      Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      f2ea0f5f
    • Ulisses Furquim's avatar
      Bluetooth: Remove usage of __cancel_delayed_work() · 6de32750
      Ulisses Furquim authored
      __cancel_delayed_work() is being used in some paths where we cannot
      sleep waiting for the delayed work to finish. However, that function
      might return while the timer is running and the work will be queued
      again. Replace the calls with safer cancel_delayed_work() version
      which spins until the timer handler finishes on other CPUs and
      cancels the delayed work.
      Signed-off-by: default avatarUlisses Furquim <ulisses@profusion.mobi>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      6de32750
    • Andre Guedes's avatar
      Bluetooth: Fix potential deadlock · a51cd2be
      Andre Guedes authored
      We don't need to use the _sync variant in hci_conn_hold and
      hci_conn_put to cancel conn->disc_work delayed work. This way
      we avoid potential deadlocks like this one reported by lockdep.
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.2.0+ #1 Not tainted
      -------------------------------------------------------
      kworker/u:1/17 is trying to acquire lock:
       (&hdev->lock){+.+.+.}, at: [<ffffffffa0041155>] hci_conn_timeout+0x62/0x158 [bluetooth]
      
      but task is already holding lock:
       ((&(&conn->disc_work)->work)){+.+...}, at: [<ffffffff81035751>] process_one_work+0x11a/0x2bf
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 ((&(&conn->disc_work)->work)){+.+...}:
             [<ffffffff81057444>] lock_acquire+0x8a/0xa7
             [<ffffffff81034ed1>] wait_on_work+0x3d/0xaa
             [<ffffffff81035b54>] __cancel_work_timer+0xac/0xef
             [<ffffffff81035ba4>] cancel_delayed_work_sync+0xd/0xf
             [<ffffffffa00554b0>] smp_chan_create+0xde/0xe6 [bluetooth]
             [<ffffffffa0056160>] smp_conn_security+0xa3/0x12d [bluetooth]
             [<ffffffffa0053640>] l2cap_connect_cfm+0x237/0x2e8 [bluetooth]
             [<ffffffffa004239c>] hci_proto_connect_cfm+0x2d/0x6f [bluetooth]
             [<ffffffffa0046ea5>] hci_event_packet+0x29d1/0x2d60 [bluetooth]
             [<ffffffffa003dde3>] hci_rx_work+0xd0/0x2e1 [bluetooth]
             [<ffffffff810357af>] process_one_work+0x178/0x2bf
             [<ffffffff81036178>] worker_thread+0xce/0x152
             [<ffffffff81039a03>] kthread+0x95/0x9d
             [<ffffffff812e7754>] kernel_thread_helper+0x4/0x10
      
      -> #1 (slock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+...}:
             [<ffffffff81057444>] lock_acquire+0x8a/0xa7
             [<ffffffff812e553a>] _raw_spin_lock_bh+0x36/0x6a
             [<ffffffff81244d56>] lock_sock_nested+0x24/0x7f
             [<ffffffffa004d96f>] lock_sock+0xb/0xd [bluetooth]
             [<ffffffffa0052906>] l2cap_chan_connect+0xa9/0x26f [bluetooth]
             [<ffffffffa00545f8>] l2cap_sock_connect+0xb3/0xff [bluetooth]
             [<ffffffff81243b48>] sys_connect+0x69/0x8a
             [<ffffffff812e6579>] system_call_fastpath+0x16/0x1b
      
      -> #0 (&hdev->lock){+.+.+.}:
             [<ffffffff81056d06>] __lock_acquire+0xa80/0xd74
             [<ffffffff81057444>] lock_acquire+0x8a/0xa7
             [<ffffffff812e3870>] __mutex_lock_common+0x48/0x38e
             [<ffffffff812e3c75>] mutex_lock_nested+0x2a/0x31
             [<ffffffffa0041155>] hci_conn_timeout+0x62/0x158 [bluetooth]
             [<ffffffff810357af>] process_one_work+0x178/0x2bf
             [<ffffffff81036178>] worker_thread+0xce/0x152
             [<ffffffff81039a03>] kthread+0x95/0x9d
             [<ffffffff812e7754>] kernel_thread_helper+0x4/0x10
      
      other info that might help us debug this:
      
      Chain exists of:
        &hdev->lock --> slock-AF_BLUETOOTH-BTPROTO_L2CAP --> (&(&conn->disc_work)->work)
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock((&(&conn->disc_work)->work));
                                     lock(slock-AF_BLUETOOTH-BTPROTO_L2CAP);
                                     lock((&(&conn->disc_work)->work));
        lock(&hdev->lock);
      
       *** DEADLOCK ***
      
      2 locks held by kworker/u:1/17:
       #0:  (hdev->name){.+.+.+}, at: [<ffffffff81035751>] process_one_work+0x11a/0x2bf
       #1:  ((&(&conn->disc_work)->work)){+.+...}, at: [<ffffffff81035751>] process_one_work+0x11a/0x2bf
      
      stack backtrace:
      Pid: 17, comm: kworker/u:1 Not tainted 3.2.0+ #1
      Call Trace:
       [<ffffffff812e06c6>] print_circular_bug+0x1f8/0x209
       [<ffffffff81056d06>] __lock_acquire+0xa80/0xd74
       [<ffffffff81021ef2>] ? arch_local_irq_restore+0x6/0xd
       [<ffffffff81022bc7>] ? vprintk+0x3f9/0x41e
       [<ffffffff81057444>] lock_acquire+0x8a/0xa7
       [<ffffffffa0041155>] ? hci_conn_timeout+0x62/0x158 [bluetooth]
       [<ffffffff812e3870>] __mutex_lock_common+0x48/0x38e
       [<ffffffffa0041155>] ? hci_conn_timeout+0x62/0x158 [bluetooth]
       [<ffffffff81190fd6>] ? __dynamic_pr_debug+0x6d/0x6f
       [<ffffffffa0041155>] ? hci_conn_timeout+0x62/0x158 [bluetooth]
       [<ffffffff8105320f>] ? trace_hardirqs_off+0xd/0xf
       [<ffffffff812e3c75>] mutex_lock_nested+0x2a/0x31
       [<ffffffffa0041155>] hci_conn_timeout+0x62/0x158 [bluetooth]
       [<ffffffff810357af>] process_one_work+0x178/0x2bf
       [<ffffffff81035751>] ? process_one_work+0x11a/0x2bf
       [<ffffffff81055af3>] ? lock_acquired+0x1d0/0x1df
       [<ffffffffa00410f3>] ? hci_acl_disconn+0x65/0x65 [bluetooth]
       [<ffffffff81036178>] worker_thread+0xce/0x152
       [<ffffffff810407ed>] ? finish_task_switch+0x45/0xc5
       [<ffffffff810360aa>] ? manage_workers.isra.25+0x16a/0x16a
       [<ffffffff81039a03>] kthread+0x95/0x9d
       [<ffffffff812e7754>] kernel_thread_helper+0x4/0x10
       [<ffffffff812e5db4>] ? retint_restore_args+0x13/0x13
       [<ffffffff8103996e>] ? __init_kthread_worker+0x55/0x55
       [<ffffffff812e7750>] ? gs_change+0x13/0x13
      Signed-off-by: default avatarAndre Guedes <andre.guedes@openbossa.org>
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@openbossa.org>
      Reviewed-by: default avatarUlisses Furquim <ulisses@profusion.mobi>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      a51cd2be
    • Octavian Purdila's avatar
      Bluetooth: silence lockdep warning · b5a30dda
      Octavian Purdila authored
      Since bluetooth uses multiple protocols types, to avoid lockdep
      warnings, we need to use different lockdep classes (one for each
      protocol type).
      
      This is already done in bt_sock_create but it misses a couple of cases
      when new connections are created. This patch corrects that to fix the
      following warning:
      
      <4>[ 1864.732366] =======================================================
      <4>[ 1864.733030] [ INFO: possible circular locking dependency detected ]
      <4>[ 1864.733544] 3.0.16-mid3-00007-gc9a0f62 #3
      <4>[ 1864.733883] -------------------------------------------------------
      <4>[ 1864.734408] t.android.btclc/4204 is trying to acquire lock:
      <4>[ 1864.734869]  (rfcomm_mutex){+.+.+.}, at: [<c14970ea>] rfcomm_dlc_close+0x15/0x30
      <4>[ 1864.735541]
      <4>[ 1864.735549] but task is already holding lock:
      <4>[ 1864.736045]  (sk_lock-AF_BLUETOOTH){+.+.+.}, at: [<c1498bf7>] lock_sock+0xa/0xc
      <4>[ 1864.736732]
      <4>[ 1864.736740] which lock already depends on the new lock.
      <4>[ 1864.736750]
      <4>[ 1864.737428]
      <4>[ 1864.737437] the existing dependency chain (in reverse order) is:
      <4>[ 1864.738016]
      <4>[ 1864.738023] -> #1 (sk_lock-AF_BLUETOOTH){+.+.+.}:
      <4>[ 1864.738549]        [<c1062273>] lock_acquire+0x104/0x140
      <4>[ 1864.738977]        [<c13d35c1>] lock_sock_nested+0x58/0x68
      <4>[ 1864.739411]        [<c1493c33>] l2cap_sock_sendmsg+0x3e/0x76
      <4>[ 1864.739858]        [<c13d06c3>] __sock_sendmsg+0x50/0x59
      <4>[ 1864.740279]        [<c13d0ea2>] sock_sendmsg+0x94/0xa8
      <4>[ 1864.740687]        [<c13d0ede>] kernel_sendmsg+0x28/0x37
      <4>[ 1864.741106]        [<c14969ca>] rfcomm_send_frame+0x30/0x38
      <4>[ 1864.741542]        [<c1496a2a>] rfcomm_send_ua+0x58/0x5a
      <4>[ 1864.741959]        [<c1498447>] rfcomm_run+0x441/0xb52
      <4>[ 1864.742365]        [<c104f095>] kthread+0x63/0x68
      <4>[ 1864.742742]        [<c14d5182>] kernel_thread_helper+0x6/0xd
      <4>[ 1864.743187]
      <4>[ 1864.743193] -> #0 (rfcomm_mutex){+.+.+.}:
      <4>[ 1864.743667]        [<c1061ada>] __lock_acquire+0x988/0xc00
      <4>[ 1864.744100]        [<c1062273>] lock_acquire+0x104/0x140
      <4>[ 1864.744519]        [<c14d2c70>] __mutex_lock_common+0x3b/0x33f
      <4>[ 1864.744975]        [<c14d303e>] mutex_lock_nested+0x2d/0x36
      <4>[ 1864.745412]        [<c14970ea>] rfcomm_dlc_close+0x15/0x30
      <4>[ 1864.745842]        [<c14990d9>] __rfcomm_sock_close+0x5f/0x6b
      <4>[ 1864.746288]        [<c1499114>] rfcomm_sock_shutdown+0x2f/0x62
      <4>[ 1864.746737]        [<c13d275d>] sys_socketcall+0x1db/0x422
      <4>[ 1864.747165]        [<c14d42f0>] syscall_call+0x7/0xb
      Signed-off-by: default avatarOctavian Purdila <octavian.purdila@intel.com>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      b5a30dda
    • Vinicius Costa Gomes's avatar
      Bluetooth: Fix using an absolute timeout on hci_conn_put() · 33166063
      Vinicius Costa Gomes authored
      queue_delayed_work() expects a relative time for when that work
      should be scheduled.
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@openbossa.org>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      33166063
    • Andrzej Kaczmarek's avatar
      Bluetooth: l2cap_set_timer needs jiffies as timeout value · 6e1da683
      Andrzej Kaczmarek authored
      After moving L2CAP timers to workqueues l2cap_set_timer expects timeout
      value to be specified in jiffies but constants defined in miliseconds
      are used. This makes timeouts unreliable when CONFIG_HZ is not set to
      1000.
      
      __set_chan_timer macro still uses jiffies as input to avoid multiple
      conversions from/to jiffies for sk_sndtimeo value which is already
      specified in jiffies.
      Signed-off-by: default avatarAndrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
      Ackec-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      6e1da683
    • Johan Hedberg's avatar
      Bluetooth: Remove bogus inline declaration from l2cap_chan_connect · 4aa832c2
      Johan Hedberg authored
      As reported by Dan Carpenter this function causes a Sparse warning and
      shouldn't be declared inline:
      
      include/net/bluetooth/l2cap.h:837:30 error: marked inline, but without a
      definition"
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      Acked-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      4aa832c2
  13. 14 Feb, 2012 2 commits
  14. 13 Feb, 2012 3 commits
    • Jan Kara's avatar
      vfs: Provide function to get superblock and wait for it to thaw · 6b6dc836
      Jan Kara authored
      In quota code we need to find a superblock corresponding to a device and wait
      for superblock to be unfrozen. However this waiting has to happen without
      s_umount semaphore because that is required for superblock to thaw. So provide
      a function in VFS for this to keep dances with s_umount where they belong.
      
      [AV: implementation switched to saner variant]
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6b6dc836
    • Seungwon Jeon's avatar
      mmc: dw_mmc: Fix PIO mode with support of highmem · f9c2a0dc
      Seungwon Jeon authored
      Current PIO mode makes a kernel crash with CONFIG_HIGHMEM.
      Highmem pages have a NULL from sg_virt(sg).
      This patch fixes the following problem.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = c0004000
      [00000000] *pgd=00000000
      Internal error: Oops: 817 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 0    Not tainted  (3.0.15-01423-gdbf465f #589)
      PC is at dw_mci_pull_data32+0x4c/0x9c
      LR is at dw_mci_read_data_pio+0x54/0x1f0
      pc : [<c0358824>]    lr : [<c035988c>]    psr: 20000193
      sp : c0619d48  ip : c0619d70  fp : c0619d6c
      r10: 00000000  r9 : 00000002  r8 : 00001000
      r7 : 00000200  r6 : 00000000  r5 : e1dd3100  r4 : 00000000
      r3 : 65622023  r2 : 0000007f  r1 : eeb96000  r0 : e1dd3100
      Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment
      xkernel
      Control: 10c5387d  Table: 61e2004a  DAC: 00000015
      Process swapper (pid: 0, stack limit = 0xc06182f0)
      Stack: (0xc0619d48 to 0xc061a000)
      9d40:                   e1dd3100 e1a4f000 00000000 e1dd3100 e1a4f000 00000200
      9d60: c0619da4 c0619d70 c035988c c03587e4 c0619d9c e18158f4 e1dd3100 e1dd3100
      9d80: 00000020 00000000 00000000 00000020 c06e8a84 00000000 c0619e04 c0619da8
      9da0: c0359b24 c0359844 e18158f4 e1dd3164 e1dd3168 e1dd3150 3d02fc79 e1dd3154
      9dc0: e1dd3178 00000000 00000020 00000000 e1dd3150 00000000 c10dd7e8 e1a84900
      9de0: c061e7cc 00000000 00000000 0000008d c06e8a84 c061e780 c0619e4c c0619e08
      9e00: c00c4738 c0359a34 3d02fc79 00000000 c0619e4c c05a1698 c05a1670 c05a165c
      9e20: c04de8b0 c061e780 c061e7cc e1a84900 ffffed68 0000008d c0618000 00000000
      9e40: c0619e6c c0619e50 c00c48b4 c00c46c8 c061e780 c00423ac c061e7cc ffffed68
      9e60: c0619e8c c0619e70 c00c7358 c00c487c 0000008d ffffee38 c0618000 ffffed68
      9e80: c0619ea4 c0619e90 c00c4258 c00c72b0 c00423ac ffffee38 c0619ecc c0619ea8
      9ea0: c004241c c00c4234 ffffffff f8810000 0000006d 00000002 00000001 7fffffff
      9ec0: c0619f44 c0619ed0 c0048bc0 c00423c4 220ae7a9 00000000 386f0d30 0005d3a4
      9ee0: c00423ac c10dd0b8 c06f2cd8 c0618000 c0594778 c003a674 7fffffff c0619f44
      9f00: 386f0d30 c0619f18 c00a6f94 c005be3c 80000013 ffffffff 386f0d30 0005d3a4
      9f20: 386f0d30 0005d2d1 c10dd0a8 c10dd0b8 c06f2cd8 c0618000 c0619f74 c0619f48
      9f40: c0345858 c005be00 c00a2440 c0618000 c0618000 c00410d8 c06c1944 c00410fc
      9f60: c0594778 c003a674 c0619f9c c0619f78 c004a7e8 c03457b4 c0618000 c06c18f8
      9f80: 00000000 c0039c70 c06c18d4 c003a674 c0619fb4 c0619fa0 c04ceafc c004a714
      9fa0: c06287b4 c06c18f8 c0619ff4 c0619fb8 c0008b68 c04cea68 c0008578 00000000
      9fc0: 00000000 c003a674 00000000 10c5387d c0628658 c003aa78 c062f1c4 4000406a
      9fe0: 413fc090 00000000 00000000 c0619ff8 40008044 c0008858 00000000 00000000
      Backtrace:
      [<c03587d8>] (dw_mci_pull_data32+0x0/0x9c) from [<c035988c>] (dw_mci_read_data_pio+0x54/0x1f0)
       r6:00000200 r5:e1a4f000 r4:e1dd3100
       [<c0359838>] (dw_mci_read_data_pio+0x0/0x1f0) from [<c0359b24>] (dw_mci_interrupt+0xfc/0x4a4)
      [<c0359a28>] (dw_mci_interrupt+0x0/0x4a4) from [<c00c4738>] (handle_irq_event_percpu+0x7c/0x1b4)
      [<c00c46bc>] (handle_irq_event_percpu+0x0/0x1b4) from [<c00c48b4>] (handle_irq_event+0x44/0x64)
      [<c00c4870>] (handle_irq_event+0x0/0x64) from [<c00c7358>] (handle_fasteoi_irq+0xb4/0x124)
       r7:ffffed68 r6:c061e7cc r5:c00423ac r4:c061e780
       [<c00c72a4>] (handle_fasteoi_irq+0x0/0x124) from [<c00c4258>] (generic_handle_irq+0x30/0x38)
       r7:ffffed68 r6:c0618000 r5:ffffee38 r4:0000008d
       [<c00c4228>] (generic_handle_irq+0x0/0x38) from [<c004241c>] (asm_do_IRQ+0x64/0xe0)
       r5:ffffee38 r4:c00423ac
       [<c00423b8>] (asm_do_IRQ+0x0/0xe0) from [<c0048bc0>] (__irq_svc+0x80/0x14c)
      Exception stack(0xc0619ed0 to 0xc0619f18)
      Signed-off-by: default avatarSeungwon Jeon <tgih.jun@samsung.com>
      Acked-by: default avatarWill Newton <will.newton@imgtec.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      f9c2a0dc
    • Girish K S's avatar
      mmc: core: Fix PowerOff Notify suspend/resume · 3e73c36b
      Girish K S authored
      Modified the mmc_poweroff to resume before sending the poweroff
      notification command. In sleep mode only AWAKE and RESET commands are
      allowed, so before sending the poweroff notification command resume from
      sleep mode and then send the notification command.
      
      PowerOff Notify is tested on a Synopsis Designware Host Controller
      (eMMC 4.5). The suspend to RAM and resume works fine.
      Signed-off-by: default avatarGirish K S <girish.shivananjappa@linaro.org>
      Tested-by: default avatarGirish K S <girish.shivananjappa@linaro.org>
      Reviewed-by: default avatarSaugata Das <saugata.das@linaro.org>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      3e73c36b