1. 28 Mar, 2012 1 commit
  2. 05 Mar, 2012 6 commits
    • Hugh Dickins's avatar
      memcg: fix GPF when cgroup removal races with last exit · 7512102c
      Hugh Dickins authored
      When moving tasks from old memcg (with move_charge_at_immigrate on new
      memcg), followed by removal of old memcg, hit General Protection Fault in
      mem_cgroup_lru_del_list() (called from release_pages called from
      free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
      exit_mmap from mmput from exit_mm from do_exit).
      
      Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
      been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
      still trying to update its stats, and take page off lru before freeing.
      
      A task, or a charge, or a page on lru: each secures a memcg against
      removal.  In this case, the last task has been moved out of the old memcg,
      and it is exiting: anonymous pages are uncharged one by one from the
      memcg, as they are zapped from its pagetables, so the charge gets down to
      0; but the pages themselves are queued in an mmu_gather for freeing.
      
      Most of those pages will be on lru (and force_empty is careful to
      lru_add_drain_all, to add pages from pagevec to lru first), but not
      necessarily all: perhaps some have been isolated for page reclaim, perhaps
      some isolated for other reasons.  So, force_empty may find no task, no
      charge and no page on lru, and let the removal proceed.
      
      There would still be no problem if these pages were immediately freed; but
      typically (and the put_page_testzero protocol demands it) they have to be
      added back to lru before they are found freeable, then removed from lru
      and freed.  We don't see the issue when adding, because the
      mem_cgroup_iter() loops keep their own reference to the memcg being
      scanned; but when it comes to mem_cgroup_lru_del_list().
      
      I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
      PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
      of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
      38c5d72f ("memcg: simplify LRU handling by new rule") mercifully
      removed those convolutions, but left this General Protection Fault.
      
      But it's surprisingly easy to restore the old behaviour: just check
      PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
      to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
      change?  just going back to how it worked before; testing, and an audit of
      uses of pc->mem_cgroup, show no problem.
      
      And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
      sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
      longer has any purpose, and we can safely revert 4e5f01c2 ("memcg:
      clear pc->mem_cgroup if necessary").
      
      Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
      is not strictly necessary: the lru_lock there, with RCU before memcg
      structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
      without that; but it seems cleaner to rely on one dependency less.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7512102c
    • Oleg Nesterov's avatar
      vfork: kill PF_STARTING · 6e27f63e
      Oleg Nesterov authored
      Previously it was (ab)used by utrace.  Then it was wrongly used by the
      scheduler code.
      
      Currently it is not used, kill it before it finds the new erroneous user.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6e27f63e
    • Oleg Nesterov's avatar
      coredump_wait: don't call complete_vfork_done() · 57b59c4a
      Oleg Nesterov authored
      Now that CLONE_VFORK is killable, coredump_wait() no longer needs
      complete_vfork_done().  zap_threads() should find and kill all tasks with
      the same ->mm, this includes our parent if ->vfork_done is set.
      
      mm_release() becomes the only caller, unexport complete_vfork_done().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      57b59c4a
    • Oleg Nesterov's avatar
      vfork: make it killable · d68b46fe
      Oleg Nesterov authored
      Make vfork() killable.
      
      Change do_fork(CLONE_VFORK) to do wait_for_completion_killable().  If it
      fails we do not return to the user-mode and never touch the memory shared
      with our child.
      
      However, in this case we should clear child->vfork_done before return, we
      use task_lock() in do_fork()->wait_for_vfork_done() and
      complete_vfork_done() to serialize with each other.
      
      Note: now that we use task_lock() we don't really need completion, we
      could turn task->vfork_done into "task_struct *wake_up_me" but this needs
      some complications.
      
      NOTE: this and the next patches do not affect in-kernel users of
      CLONE_VFORK, kernel threads run with all signals ignored including
      SIGKILL/SIGSTOP.
      
      However this is obviously the user-visible change.  Not only a fatal
      signal can kill the vforking parent, a sub-thread can do execve or
      exit_group() and kill the thread sleeping in vfork().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d68b46fe
    • Oleg Nesterov's avatar
      vfork: introduce complete_vfork_done() · c415c3b4
      Oleg Nesterov authored
      No functional changes.
      
      Move the clear-and-complete-vfork_done code into the new trivial helper,
      complete_vfork_done().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c415c3b4
    • Matthew Garrett's avatar
      kmsg_dump: don't run on non-error paths by default · c22ab332
      Matthew Garrett authored
      Since commit 04c6862c ("kmsg_dump: add kmsg_dump() calls to the
      reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
      run on normal paths including poweroff and reboot.
      
      This is less than ideal given pstore implementations that can only
      represent single backtraces, since a reboot may overwrite a stored oops
      before it's been picked up by userspace.  In addition, some pstore
      backends may have low performance and provide a significant delay in
      reboot as a result.
      
      This patch adds a printk.always_kmsg_dump kernel parameter (which can also
      be changed from userspace).  Without it, the code will only be run on
      failure paths rather than on normal paths.  The option can be enabled in
      environments where there's a desire to attempt to audit whether or not a
      reboot was cleanly requested or not.
      Signed-off-by: default avatarMatthew Garrett <mjg@redhat.com>
      Acked-by: default avatarSeiji Aguchi <seiji.aguchi@hds.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marco Stornelli <marco.stornelli@gmail.com>
      Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c22ab332
  3. 04 Mar, 2012 2 commits
  4. 02 Mar, 2012 7 commits
  5. 28 Feb, 2012 1 commit
  6. 26 Feb, 2012 1 commit
  7. 24 Feb, 2012 1 commit
    • Oleg Nesterov's avatar
      epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() · d80e731e
      Oleg Nesterov authored
      This patch is intentionally incomplete to simplify the review.
      It ignores ep_unregister_pollwait() which plays with the same wqh.
      See the next change.
      
      epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
      f_op->poll() needs. In particular it assumes that the wait queue
      can't go away until eventpoll_release(). This is not true in case
      of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
      which is not connected to the file.
      
      This patch adds the special event, POLLFREE, currently only for
      epoll. It expects that init_poll_funcptr()'ed hook should do the
      necessary cleanup. Perhaps it should be defined as EPOLLFREE in
      eventpoll.
      
      __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
      ->signalfd_wqh is not empty, we add the new signalfd_cleanup()
      helper.
      
      ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
      This make this poll entry inconsistent, but we don't care. If you
      share epoll fd which contains our sigfd with another process you
      should blame yourself. signalfd is "really special". I simply do
      not know how we can define the "right" semantics if it used with
      epoll.
      
      The main problem is, epoll calls signalfd_poll() once to establish
      the connection with the wait queue, after that signalfd_poll(NULL)
      returns the different/inconsistent results depending on who does
      EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
      has nothing to do with the file, it works with the current thread.
      
      In short: this patch is the hack which tries to fix the symptoms.
      It also assumes that nobody can take tasklist_lock under epoll
      locks, this seems to be true.
      
      Note:
      
      	- we do not have wake_up_all_poll() but wake_up_poll()
      	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.
      
      	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
      	  we need a couple of simple changes in eventpoll.c to
      	  make sure it can't be "lost".
      Reported-by: default avatarMaxime Bizon <mbizon@freebox.fr>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d80e731e
  8. 23 Feb, 2012 2 commits
  9. 21 Feb, 2012 5 commits
    • Linus Torvalds's avatar
      sys_poll: fix incorrect type for 'timeout' parameter · faf30900
      Linus Torvalds authored
      The 'poll()' system call timeout parameter is supposed to be 'int', not
      'long'.
      
      Now, the reason this matters is that right now 32-bit compat mode is
      broken on at least x86-64, because the 32-bit code just calls
      'sys_poll()' directly on x86-64, and the 32-bit argument will have been
      zero-extended, turning a signed 'int' into a large unsigned 'long'
      value.
      
      We could just introduce a 'compat_sys_poll()' function for this, and
      that may eventually be what we have to do, but since the actual standard
      poll() semantics is *supposed* to be 'int', and since at least on x86-64
      glibc sign-extends the argument before invocing the system call (so
      nobody can actually use a 64-bit timeout value in user space _anyway_,
      even in 64-bit binaries), the simpler solution would seem to be to just
      fix the definition of the system call to match what it should have been
      from the very start.
      
      If it turns out that somebody somehow circumvents the user-level libc
      64-bit sign extension and actually uses a large unsigned 64-bit timeout
      despite that not being how poll() is supposed to work, we will need to
      do the compat_sys_poll() approach.
      Reported-by: default avatarThomas Meyer <thomas@m3y3r.de>
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      faf30900
    • Greg Rose's avatar
      rtnetlink: Fix problem with buffer allocation · 115c9b81
      Greg Rose authored
      Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
      is a 32 bit value that can be used to indicate to the kernel that
      certain extended ifinfo values are requested by the user application.
      At this time the only mask value defined is RTEXT_FILTER_VF to
      indicate that the user wants the ifinfo dump to send information
      about the VFs belonging to the interface.
      
      This patch fixes a bug in which certain applications do not have
      large enough buffers to accommodate the extra information returned
      by the kernel with large numbers of SR-IOV virtual functions.
      Those applications will not send the new netlink attribute with
      the interface info dump request netlink messages so they will
      not get unexpectedly large request buffers returned by the kernel.
      
      Modifies the rtnl_calcit function to traverse the list of net
      devices and compute the minimum buffer size that can hold the
      info dumps of all matching devices based upon the filter passed
      in via the new netlink attribute filter mask.  If no filter
      mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.
      
      With this change it is possible to add yet to be defined netlink
      attributes to the dump request which should make it fairly extensible
      in the future.
      Signed-off-by: default avatarGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      115c9b81
    • Ming Lei's avatar
      percpu: use raw_local_irq_* in _this_cpu op · e920d597
      Ming Lei authored
      It doesn't make sense to trace irq off or do irq flags
      lock proving inside 'this_cpu' operations, so replace local_irq_*
      with raw_local_irq_* in 'this_cpu' op.
      
      Also the patch fixes onelockdep warning[1] by the replacement, see
      below:
      
      In commit: 933393f5(percpu:
      Remove irqsafe_cpu_xxx variants), local_irq_save/restore(flags) are
      added inside this_cpu_inc operation, so that trace_hardirqs_off_caller
      will be called by trace_hardirqs_on_caller directly because
      __debug_atomic_inc is implemented as this_cpu_inc, which may trigger
      the lockdep warning[1], for example in the below ARM scenary:
      
      	kernel_thread_helper	/*irq disabled*/
      		->trace_hardirqs_on_caller	/*hardirqs_enabled was set*/
      			->trace_hardirqs_off_caller	/*hardirqs_enabled cleared*/
      				__this_cpu_add(redundant_hardirqs_on)
      			->trace_hardirqs_off_caller	/*irq disabled, so call here*/
      
      The 'unannotated irqs-on' warning will be triggered somewhere because
      irq is just enabled after the irq trace in kernel_thread_helper.
      
      [1],
      [    0.162841] ------------[ cut here ]------------
      [    0.167694] WARNING: at kernel/lockdep.c:3493 check_flags+0xc0/0x1d0()
      [    0.174468] Modules linked in:
      [    0.177703] Backtrace:
      [    0.180328] [<c00171f0>] (dump_backtrace+0x0/0x110) from [<c0412320>] (dump_stack+0x18/0x1c)
      [    0.189086]  r6:c051f778 r5:00000da5 r4:00000000 r3:60000093
      [    0.195007] [<c0412308>] (dump_stack+0x0/0x1c) from [<c00410e8>] (warn_slowpath_common+0x54/0x6c)
      [    0.204223] [<c0041094>] (warn_slowpath_common+0x0/0x6c) from [<c0041124>] (warn_slowpath_null+0x24/0x2c)
      [    0.214111]  r8:00000000 r7:00000000 r6:ee069598 r5:60000013 r4:ee082000
      [    0.220825] r3:00000009
      [    0.223693] [<c0041100>] (warn_slowpath_null+0x0/0x2c) from [<c0088f38>] (check_flags+0xc0/0x1d0)
      [    0.232910] [<c0088e78>] (check_flags+0x0/0x1d0) from [<c008d348>] (lock_acquire+0x4c/0x11c)
      [    0.241668] [<c008d2fc>] (lock_acquire+0x0/0x11c) from [<c0415aa4>] (_raw_spin_lock+0x3c/0x74)
      [    0.250610] [<c0415a68>] (_raw_spin_lock+0x0/0x74) from [<c010a844>] (set_task_comm+0x20/0xc0)
      [    0.259521]  r6:ee069588 r5:ee0691c0 r4:ee082000
      [    0.264404] [<c010a824>] (set_task_comm+0x0/0xc0) from [<c0060780>] (kthreadd+0x28/0x108)
      [    0.272857]  r8:00000000 r7:00000013 r6:c0044a08 r5:ee0691c0 r4:ee082000
      [    0.279571] r3:ee083fe0
      [    0.282470] [<c0060758>] (kthreadd+0x0/0x108) from [<c0044a08>] (do_exit+0x0/0x6dc)
      [    0.290405]  r5:c0060758 r4:00000000
      [    0.294189] ---[ end trace 1b75b31a2719ed1c ]---
      [    0.299041] possible reason: unannotated irqs-on.
      [    0.303955] irq event stamp: 5
      [    0.307159] hardirqs last  enabled at (4): [<c001331c>] no_work_pending+0x8/0x2c
      [    0.314880] hardirqs last disabled at (5): [<c0089b08>] trace_hardirqs_on_caller+0x60/0x26c
      [    0.323547] softirqs last  enabled at (0): [<c003f754>] copy_process+0x33c/0xef4
      [    0.331207] softirqs last disabled at (0): [<  (null)>]   (null)
      [    0.337585] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarMing Lei <tom.leiming@gmail.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      e920d597
    • Konstantin Khlebnikov's avatar
      percpu: fix generic definition of __this_cpu_add_and_return() · 7d96b3e5
      Konstantin Khlebnikov authored
      This patch adds missed "__" into function prefix.
      Otherwise on all archectures (except x86) it expands to irq/preemtion-safe
      variant: _this_cpu_generic_add_return(), which do extra irq-save/irq-restore.
      Optimal generic implementation is __this_cpu_generic_add_return().
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      7d96b3e5
    • Joerg Willmann's avatar
      netfilter: ebtables: fix alignment problem in ppc · 88ba136d
      Joerg Willmann authored
      ebt_among extension of ebtables uses __alignof__(_xt_align) while the
      corresponding kernel module uses __alignof__(ebt_replace) to determine
      the alignment in EBT_ALIGN().
      
      These are the results of these values on different platforms:
      
      x86 x86_64 ppc
      __alignof__(_xt_align) 4 8 8
      __alignof__(ebt_replace) 4 8 4
      
      ebtables fails to add rules which use the among extension.
      
      I'm using kernel 2.6.33 and ebtables 2.0.10-4
      
      According to Bart De Schuymer, userspace alignment was changed to
      _xt_align to fix an alignment issue on a userspace32-kernel64 system
      (he thinks it was for an ARM device). So userspace must be right.
      The kernel alignment macro needs to change so it also uses _xt_align
      instead of ebt_replace. The userspace changes date back from
      June 29, 2009.
      Signed-off-by: default avatarJoerg Willmann <joe@clnt.de>
      Signed-off by: Bart De Schuymer <bdschuym@pandora.be>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      88ba136d
  10. 20 Feb, 2012 1 commit
  11. 15 Feb, 2012 4 commits
    • Alexey Dobriyan's avatar
      crypto: sha512 - use standard ror64() · f2ea0f5f
      Alexey Dobriyan authored
      Use standard ror64() instead of hand-written.
      There is no standard ror64, so create it.
      
      The difference is shift value being "unsigned int" instead of uint64_t
      (for which there is no reason). gcc starts to emit native ROR instructions
      which it doesn't do for some reason currently. This should make the code
      faster.
      
      Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      f2ea0f5f
    • Shawn Guo's avatar
      dt: add empty of_find_compatible_node function · 2261cc62
      Shawn Guo authored
      Add empty of_find_compatible_node function for !CONFIG_OF build.
      Signed-off-by: default avatarShawn Guo <shawn.guo@linaro.org>
      Signed-off-by: default avatarGrant Likely <grant.likely@secretlab.ca>
      2261cc62
    • Tejun Heo's avatar
      block: exit_io_context() should call elevator_exit_icq_fn() · 621032ad
      Tejun Heo authored
      While updating locking, b2efa052 "block, cfq: unlink
      cfq_io_context's immediately" moved elevator_exit_icq_fn() invocation
      from exit_io_context() to the final ioc put.  While this doesn't cause
      catastrophic failure, it effectively removes task exit notification to
      elevator and cause noticeable IO performance degradation with CFQ.
      
      On task exit, CFQ used to immediately expire the slice if it was being
      used by the exiting task as no more IO would be issued by the task;
      however, after b2efa052, the notification is lost and disk could sit
      idle needlessly, leading to noticeable IO performance degradation for
      certain workloads.
      
      This patch renames ioc_exit_icq() to ioc_destroy_icq(), separates
      elevator_exit_icq_fn() invocation into ioc_exit_icq() and invokes it
      from exit_io_context().  ICQ_EXITED flag is added to avoid invoking
      the callback more than once for the same icq.
      
      Walking icq_list from ioc side and invoking elevator callback requires
      reverse double locking.  This may be better implemented using RCU;
      unfortunately, using RCU isn't trivial.  e.g. RCU protection would
      need to cover request_queue and queue_lock switch on cleanup makes
      grabbing queue_lock from RCU unsafe.  Reverse double locking should
      do, at least for now.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-and-bisected-by: default avatarShaohua Li <shli@kernel.org>
      LKML-Reference: <CANejiEVzs=pUhQSTvUppkDcc2TNZyfohBRLygW5zFmXyk5A-xQ@mail.gmail.com>
      Tested-by: default avatarShaohua Li <shaohua.li@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      621032ad
    • Tejun Heo's avatar
      block: replace icq->changed with icq->flags · d705ae6b
      Tejun Heo authored
      icq->changed was used for ICQ_*_CHANGED bits.  Rename it to flags and
      access it under ioc->lock instead of using atomic bitops.
      ioc_get_changed() is added so that the changed part can be fetched and
      cleared as before.
      
      icq->flags will be used to carry other flags.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Tested-by: default avatarShaohua Li <shaohua.li@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d705ae6b
  12. 13 Feb, 2012 5 commits
    • Jan Kara's avatar
      vfs: Provide function to get superblock and wait for it to thaw · 6b6dc836
      Jan Kara authored
      In quota code we need to find a superblock corresponding to a device and wait
      for superblock to be unfrozen. However this waiting has to happen without
      s_umount semaphore because that is required for superblock to thaw. So provide
      a function in VFS for this to keep dances with s_umount where they belong.
      
      [AV: implementation switched to saner variant]
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6b6dc836
    • Seungwon Jeon's avatar
      mmc: dw_mmc: Fix PIO mode with support of highmem · f9c2a0dc
      Seungwon Jeon authored
      Current PIO mode makes a kernel crash with CONFIG_HIGHMEM.
      Highmem pages have a NULL from sg_virt(sg).
      This patch fixes the following problem.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = c0004000
      [00000000] *pgd=00000000
      Internal error: Oops: 817 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 0    Not tainted  (3.0.15-01423-gdbf465f #589)
      PC is at dw_mci_pull_data32+0x4c/0x9c
      LR is at dw_mci_read_data_pio+0x54/0x1f0
      pc : [<c0358824>]    lr : [<c035988c>]    psr: 20000193
      sp : c0619d48  ip : c0619d70  fp : c0619d6c
      r10: 00000000  r9 : 00000002  r8 : 00001000
      r7 : 00000200  r6 : 00000000  r5 : e1dd3100  r4 : 00000000
      r3 : 65622023  r2 : 0000007f  r1 : eeb96000  r0 : e1dd3100
      Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment
      xkernel
      Control: 10c5387d  Table: 61e2004a  DAC: 00000015
      Process swapper (pid: 0, stack limit = 0xc06182f0)
      Stack: (0xc0619d48 to 0xc061a000)
      9d40:                   e1dd3100 e1a4f000 00000000 e1dd3100 e1a4f000 00000200
      9d60: c0619da4 c0619d70 c035988c c03587e4 c0619d9c e18158f4 e1dd3100 e1dd3100
      9d80: 00000020 00000000 00000000 00000020 c06e8a84 00000000 c0619e04 c0619da8
      9da0: c0359b24 c0359844 e18158f4 e1dd3164 e1dd3168 e1dd3150 3d02fc79 e1dd3154
      9dc0: e1dd3178 00000000 00000020 00000000 e1dd3150 00000000 c10dd7e8 e1a84900
      9de0: c061e7cc 00000000 00000000 0000008d c06e8a84 c061e780 c0619e4c c0619e08
      9e00: c00c4738 c0359a34 3d02fc79 00000000 c0619e4c c05a1698 c05a1670 c05a165c
      9e20: c04de8b0 c061e780 c061e7cc e1a84900 ffffed68 0000008d c0618000 00000000
      9e40: c0619e6c c0619e50 c00c48b4 c00c46c8 c061e780 c00423ac c061e7cc ffffed68
      9e60: c0619e8c c0619e70 c00c7358 c00c487c 0000008d ffffee38 c0618000 ffffed68
      9e80: c0619ea4 c0619e90 c00c4258 c00c72b0 c00423ac ffffee38 c0619ecc c0619ea8
      9ea0: c004241c c00c4234 ffffffff f8810000 0000006d 00000002 00000001 7fffffff
      9ec0: c0619f44 c0619ed0 c0048bc0 c00423c4 220ae7a9 00000000 386f0d30 0005d3a4
      9ee0: c00423ac c10dd0b8 c06f2cd8 c0618000 c0594778 c003a674 7fffffff c0619f44
      9f00: 386f0d30 c0619f18 c00a6f94 c005be3c 80000013 ffffffff 386f0d30 0005d3a4
      9f20: 386f0d30 0005d2d1 c10dd0a8 c10dd0b8 c06f2cd8 c0618000 c0619f74 c0619f48
      9f40: c0345858 c005be00 c00a2440 c0618000 c0618000 c00410d8 c06c1944 c00410fc
      9f60: c0594778 c003a674 c0619f9c c0619f78 c004a7e8 c03457b4 c0618000 c06c18f8
      9f80: 00000000 c0039c70 c06c18d4 c003a674 c0619fb4 c0619fa0 c04ceafc c004a714
      9fa0: c06287b4 c06c18f8 c0619ff4 c0619fb8 c0008b68 c04cea68 c0008578 00000000
      9fc0: 00000000 c003a674 00000000 10c5387d c0628658 c003aa78 c062f1c4 4000406a
      9fe0: 413fc090 00000000 00000000 c0619ff8 40008044 c0008858 00000000 00000000
      Backtrace:
      [<c03587d8>] (dw_mci_pull_data32+0x0/0x9c) from [<c035988c>] (dw_mci_read_data_pio+0x54/0x1f0)
       r6:00000200 r5:e1a4f000 r4:e1dd3100
       [<c0359838>] (dw_mci_read_data_pio+0x0/0x1f0) from [<c0359b24>] (dw_mci_interrupt+0xfc/0x4a4)
      [<c0359a28>] (dw_mci_interrupt+0x0/0x4a4) from [<c00c4738>] (handle_irq_event_percpu+0x7c/0x1b4)
      [<c00c46bc>] (handle_irq_event_percpu+0x0/0x1b4) from [<c00c48b4>] (handle_irq_event+0x44/0x64)
      [<c00c4870>] (handle_irq_event+0x0/0x64) from [<c00c7358>] (handle_fasteoi_irq+0xb4/0x124)
       r7:ffffed68 r6:c061e7cc r5:c00423ac r4:c061e780
       [<c00c72a4>] (handle_fasteoi_irq+0x0/0x124) from [<c00c4258>] (generic_handle_irq+0x30/0x38)
       r7:ffffed68 r6:c0618000 r5:ffffee38 r4:0000008d
       [<c00c4228>] (generic_handle_irq+0x0/0x38) from [<c004241c>] (asm_do_IRQ+0x64/0xe0)
       r5:ffffee38 r4:c00423ac
       [<c00423b8>] (asm_do_IRQ+0x0/0xe0) from [<c0048bc0>] (__irq_svc+0x80/0x14c)
      Exception stack(0xc0619ed0 to 0xc0619f18)
      Signed-off-by: default avatarSeungwon Jeon <tgih.jun@samsung.com>
      Acked-by: default avatarWill Newton <will.newton@imgtec.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      f9c2a0dc
    • Girish K S's avatar
      mmc: core: Fix PowerOff Notify suspend/resume · 3e73c36b
      Girish K S authored
      Modified the mmc_poweroff to resume before sending the poweroff
      notification command. In sleep mode only AWAKE and RESET commands are
      allowed, so before sending the poweroff notification command resume from
      sleep mode and then send the notification command.
      
      PowerOff Notify is tested on a Synopsis Designware Host Controller
      (eMMC 4.5). The suspend to RAM and resume works fine.
      Signed-off-by: default avatarGirish K S <girish.shivananjappa@linaro.org>
      Tested-by: default avatarGirish K S <girish.shivananjappa@linaro.org>
      Reviewed-by: default avatarSaugata Das <saugata.das@linaro.org>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      3e73c36b
    • Jaehoon Chung's avatar
      mmc: core: add the capability for broken voltage · 6e8201f5
      Jaehoon Chung authored
      There is an understood mismatch between the voltage the host controller is
      set to and the voltage supplied to the card by a fixed voltage regulator.
      Teaching the driver to accept the mismatch is overly complicated.  Instead
      just accept the regulator's voltage.
      
      This patch adds MMC_CAP2_BROKEN_VOLTAGE.
      
      If the voltage didn't satisfy between min_uV and max_uV, try to change
      the voltage in core.c.  When changing the voltage, maybe use
      regulator_set_voltage().
      
      In regulator_set_voltage(), check the below condition.
      
      	/* sanity check */
      	if (!rdev->desc->ops->set_voltage &&
      	    !rdev->desc->ops->set_voltage_sel) {
      		ret = -EINVAL;
      		goto out;
      	}
      
      If some board should use the fixed-regulator, always return -EINVAL.
      Then, eMMC didn't initialize always.
      
      So if use a fixed-regulator, we need to add the MMC_CAP2_BROKEN_VOLTAGE.
      Signed-off-by: default avatarJaehoon Chung <jh80.chung@samsung.com>
      Signed-off-by: default avatarKyungmin Park <kyungmin.park@samsung.com>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      6e8201f5
    • Sujit Reddy Thumma's avatar
      mmc: core: Ensure clocks are always enabled before host interaction · 2c4967f7
      Sujit Reddy Thumma authored
      Ensure clocks are always enabled before any interaction with the
      host controller driver. This makes sure that there is no race
      between host execution and the core layer turning off clocks
      in different context with clock gating framework.
      Signed-off-by: default avatarSujit Reddy Thumma <sthumma@codeaurora.org>
      Acked-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Acked-by: default avatarPer Forlin <per.forlin@stericsson.com>
      Signed-off-by: default avatarChris Ball <cjb@laptop.org>
      2c4967f7
  13. 10 Feb, 2012 1 commit
    • Sarah Sharp's avatar
      USB: Remove duplicate USB 3.0 hub feature #defines. · d9f5343e
      Sarah Sharp authored
      Somehow we ended up with duplicate hub feature #defines in ch11.h.
      Tatyana Brokhman first created the USB 3.0 hub feature macros in 2.6.38
      with commit 0eadcc09 "usb: USB3.0 ch11
      definitions".  In 2.6.39, I modified a patch from John Youn that added
      similar macros in a different place in the same file, and committed
      dbe79bbe "USB 3.0 Hub Changes".
      
      Some of the #defines used different names for the same values.  Others
      used exactly the same names with the same values, like these gems:
      
       #define USB_PORT_FEAT_BH_PORT_RESET     28
      ...
       #define USB_PORT_FEAT_BH_PORT_RESET            28
      
      According to my very geeky husband (who looked it up in the C99 spec),
      it is allowed to have object-like macros with duplicate names as long as
      the replacement list is exactly the same.  However, he recalled that
      some compilers will give warnings when they find duplicate macros.  It's
      probably best to remove the duplicates in the stable tree, so that the
      code compiles for everyone.
      
      The macros are now fixed to move the feature requests that are specific
      to USB 3.0 hubs into a new section (out of the USB 2.0 hub feature
      section), and use the most common macro name.
      
      This patch should be backported to 2.6.39.
      Signed-off-by: default avatarSarah Sharp <sarah.a.sharp@linux.intel.com>
      Cc: Tatyana Brokhman <tlinder@codeaurora.org>
      Cc: John Youn <johnyoun@synopsys.com>
      Cc: Jamey Sharp <jamey@minilop.net>
      Cc: stable@vger.kernel.org
      d9f5343e
  14. 08 Feb, 2012 3 commits
    • Paolo Bonzini's avatar
      cdrom: move shared static to cdrom_device_info · cdccaa94
      Paolo Bonzini authored
      The keeplocked variable in the cdrom driver is shared across multiple
      drives, but set in per-device ioctls.  Move it to the per-device struct,
      avoiding that the setting on one drive affects the driver's behavior
      when closing another.
      
      [ Impact: limit udev's confusion to one drive when a CD burning program
        unlocks the CD door at the end of burning. ]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      cdccaa94
    • Tejun Heo's avatar
      block: don't call elevator callbacks for plug merges · 07c2bd37
      Tejun Heo authored
      Plug merge calls two elevator callbacks outside queue lock -
      elevator_allow_merge_fn() and elevator_bio_merged_fn().  Although
      attempt_plug_merge() suggests that elevator is guaranteed to be there
      through the existing request on the plug list, nothing prevents plug
      merge from calling into dying or initializing elevator.
      
      For regular merges, bypass ensures elvpriv count to reach zero, which
      in turn prevents merges as all !ELVPRIV requests get REQ_SOFTBARRIER
      from forced back insertion.  Plug merge doesn't check ELVPRIV, and, as
      the requests haven't gone through elevator insertion yet, it doesn't
      have SOFTBARRIER set allowing merges on a bypassed queue.
      
      This, for example, leads to the following crash during elevator
      switch.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
       IP: [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
       PGD 112cbc067 PUD 115d5c067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
       CPU 1
       Modules linked in: deadline_iosched
      
       Pid: 819, comm: dd Not tainted 3.3.0-rc2-work+ #76 Bochs Bochs
       RIP: 0010:[<ffffffff813b34e9>]  [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
       RSP: 0018:ffff8801143a38f8  EFLAGS: 00010297
       RAX: 0000000000000000 RBX: ffff88011817ce28 RCX: ffff880116eb6cc0
       RDX: 0000000000000000 RSI: ffff880118056e20 RDI: ffff8801199512f8
       RBP: ffff8801143a3908 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffff880118195708
       R13: ffff880118052aa0 R14: ffff8801143a3d50 R15: ffff880118195708
       FS:  00007f19f82cb700(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000008 CR3: 0000000112c6a000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process dd (pid: 819, threadinfo ffff8801143a2000, task ffff880116eb6cc0)
       Stack:
        ffff88011817ce28 ffff880118195708 ffff8801143a3928 ffffffff81391bba
        ffff88011817ce28 ffff880118195708 ffff8801143a3948 ffffffff81391bf1
        ffff88011817ce28 0000000000000000 ffff8801143a39a8 ffffffff81398e3e
       Call Trace:
        [<ffffffff81391bba>] elv_rq_merge_ok+0x4a/0x60
        [<ffffffff81391bf1>] elv_try_merge+0x21/0x40
        [<ffffffff81398e3e>] blk_queue_bio+0x8e/0x390
        [<ffffffff81396a5a>] generic_make_request+0xca/0x100
        [<ffffffff81396b04>] submit_bio+0x74/0x100
        [<ffffffff811d45c2>] __blockdev_direct_IO+0x1ce2/0x3450
        [<ffffffff811d0dc7>] blkdev_direct_IO+0x57/0x60
        [<ffffffff811460b5>] generic_file_aio_read+0x6d5/0x760
        [<ffffffff811986b2>] do_sync_read+0xe2/0x120
        [<ffffffff81199345>] vfs_read+0xc5/0x180
        [<ffffffff81199501>] sys_read+0x51/0x90
        [<ffffffff81aeac12>] system_call_fastpath+0x16/0x1b
      
      There are multiple ways to fix this including making plug merge check
      ELVPRIV; however,
      
      * Calling into elevator outside queue lock is confusing and
        error-prone.
      
      * Requests on plug list aren't known to the elevator.  They aren't on
        the elevator yet, so there's no elevator specific state to update.
      
      * Given the nature of plug merges - collecting bio's for the same
        purpose from the same issuer - elevator specific restrictions aren't
        applicable.
      
      So, simply don't call into elevator methods from plug merge by moving
      elv_bio_merged() from bio_attempt_*_merge() to blk_queue_bio(), and
      using blk_try_merge() in attempt_plug_merge().
      
      This is based on Jens' patch to skip elevator_allow_merge_fn() from
      plug merge.
      
      Note that this makes per-cgroup merged stats skip plug merging.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      LKML-Reference: <4F16F3CA.90904@kernel.dk>
      Original-patch-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      07c2bd37
    • Tejun Heo's avatar
      block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions · 050c8ea8
      Tejun Heo authored
      blk_rq_merge_ok() is the elevator-neutral part of merge eligibility
      test.  blk_try_merge() determines merge direction and expects the
      caller to have tested elv_rq_merge_ok() previously.
      
      elv_rq_merge_ok() now wraps blk_rq_merge_ok() and then calls
      elv_iosched_allow_merge().  elv_try_merge() is removed and the two
      callers are updated to call elv_rq_merge_ok() explicitly followed by
      blk_try_merge().  While at it, make rq_merge_ok() functions return
      bool.
      
      This is to prepare for plug merge update and doesn't introduce any
      behavior change.
      
      This is based on Jens' patch to skip elevator_allow_merge_fn() from
      plug merge.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      LKML-Reference: <4F16F3CA.90904@kernel.dk>
      Original-patch-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      050c8ea8