1. 11 Apr, 2014 2 commits
    • David S. Miller's avatar
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller authored
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      676d2369
    • Dave Hansen's avatar
      mm: slab/slub: use page->list consistently instead of page->lru · 34bf6ef9
      Dave Hansen authored
      'struct page' has two list_head fields: 'lru' and 'list'.  Conveniently,
      they are unioned together.  This means that code can use them
      interchangably, which gets horribly confusing like with this nugget from
      slab.c:
      
      >	list_del(&page->lru);
      >	if (page->active == cachep->num)
      >		list_add(&page->list, &n->slabs_full);
      
      This patch makes the slab and slub code use page->lru universally instead
      of mixing ->list and ->lru.
      
      So, the new rule is: page->lru is what the you use if you want to keep
      your page on a list.  Don't like the fact that it's not called ->list?
      Too bad.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarPekka Enberg <penberg@kernel.org>
      34bf6ef9
  2. 10 Apr, 2014 5 commits
  3. 09 Apr, 2014 8 commits
    • Jens Axboe's avatar
      block: fix regression with block enabled tagging · 360f92c2
      Jens Axboe authored
      Martin reported that his test system would not boot with
      current git, it oopsed with this:
      
      BUG: unable to handle kernel paging request at ffff88046c6c9e80
      IP: [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
      PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060
      Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
      Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm
      rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon
      CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246
      Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
      Workqueue: events_unbound async_run_entry_fn
      task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000
      RIP: 0010:[<ffffffff812971e0>]  [<ffffffff812971e0>]
      blk_queue_start_tag+0x90/0x150
      RSP: 0018:ffff880273d03a58  EFLAGS: 00010092
      RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6
      RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d
      RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410
      R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0
      R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e
      FS:  0000000000000000(0000) GS:ffff880277b00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0
      Stack:
       ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000
       ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62
       ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8
      Call Trace:
       [<ffffffff813b9e62>] scsi_request_fn+0xf2/0x510
       [<ffffffff81293167>] __blk_run_queue+0x37/0x50
       [<ffffffff8129ac43>] blk_execute_rq_nowait+0xb3/0x130
       [<ffffffff8129ad24>] blk_execute_rq+0x64/0xf0
       [<ffffffff8108d2b0>] ? bit_waitqueue+0xd0/0xd0
       [<ffffffff813bba35>] scsi_execute+0xe5/0x180
       [<ffffffff813bbe4a>] scsi_execute_req_flags+0x9a/0x110
       [<ffffffffa01b1304>] sd_spinup_disk+0x94/0x460 [sd_mod]
       [<ffffffff81160000>] ? __unmap_hugepage_range+0x200/0x2f0
       [<ffffffffa01b2b9a>] sd_revalidate_disk+0xaa/0x3f0 [sd_mod]
       [<ffffffffa01b2fb8>] sd_probe_async+0xd8/0x200 [sd_mod]
       [<ffffffff8107703f>] async_run_entry_fn+0x3f/0x140
       [<ffffffff8106a1c5>] process_one_work+0x175/0x410
       [<ffffffff8106b373>] worker_thread+0x123/0x400
       [<ffffffff8106b250>] ? manage_workers+0x160/0x160
       [<ffffffff8107104e>] kthread+0xce/0xf0
       [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff815f0bac>] ret_from_fork+0x7c/0xb0
       [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
      Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89
      df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 <48> 89
      58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d
      RIP  [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
       RSP <ffff880273d03a58>
      CR2: ffff88046c6c9e80
      
      Martin bisected and found this to be the problem patch;
      
      	commit 6d113398
      	Author: Jan Kara <jack@suse.cz>
      	Date:   Mon Feb 24 16:39:54 2014 +0100
      
      	    block: Stop abusing rq->csd.list in blk-softirq
      
      and the problem was immediately apparent. The patch states that
      it is safe to reuse queuelist at completion time, since it is
      no longer used. However, that is not true if a device is using
      block enabled tagging. If that is the case, then the queuelist
      is reused to keep track of busy tags. If a device also ended
      up using softirq completions, we'd reuse ->queuelist for the
      IPI handling while block tagging was still using it. Boom.
      
      Fix this by adding a new ipi_list list head, and share the
      memory used with the request hash table. The hash table is
      never used after the request is moved to the dispatch list,
      which happens long before any potential completion of the
      request. Add a new request bit for this, so we don't have
      cases that check rq->hash while it could potentially have
      been reused for the IPI completion.
      Reported-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Tested-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      360f92c2
    • Martin K. Petersen's avatar
      scsi: Make sure cmd_flags are 64-bit · 2bfad21e
      Martin K. Petersen authored
      cmd_flags in struct request is now 64 bits wide but the scsi_execute
      functions truncated arguments passed to int leading to errors. Make sure
      the flags parameters are u64.
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Cc: Jens Axboe <axboe@fb.com>
      CC: Jan Kara <jack@suse.cz>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      2bfad21e
    • Mathieu Desnoyers's avatar
      tracing: Fix anonymous unions in struct ftrace_event_call · abb43f69
      Mathieu Desnoyers authored
      gcc <= 4.5.x has significant limitations with respect to initialization
      of anonymous unions within structures. They need to be surrounded by
      brackets, _and_ they need to be initialized in the same order in which
      they appear in the structure declaration.
      
      Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676
      Link: http://lkml.kernel.org/r/1397077568-3156-1-git-send-email-mathieu.desnoyers@efficios.comSigned-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      abb43f69
    • Behan Webster's avatar
      x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id" · c4586256
      Behan Webster authored
      Similar to the fix in 40413dcb
      
      MODULE_DEVICE_TABLE(x86cpu, ...) expects the struct to be called struct
      x86cpu_device_id, and not struct x86_cpu_id which is what is used in the rest
      of the kernel code.  Although gcc seems to ignore this error, clang fails
      without this define to fix the name.
      
      Code from drivers/thermal/x86_pkg_temp_thermal.c
      static const struct x86_cpu_id __initconst pkg_temp_thermal_ids[] = { ... };
      MODULE_DEVICE_TABLE(x86cpu, pkg_temp_thermal_ids);
      
      Error from clang:
      drivers/thermal/x86_pkg_temp_thermal.c:577:1: error: variable has
            incomplete type 'const struct x86cpu_device_id'
      MODULE_DEVICE_TABLE(x86cpu, pkg_temp_thermal_ids);
      ^
      include/linux/module.h:145:3: note: expanded from macro
            'MODULE_DEVICE_TABLE'
        MODULE_GENERIC_TABLE(type##_device, name)
        ^
      include/linux/module.h:87:32: note: expanded from macro
            'MODULE_GENERIC_TABLE'
      extern const struct gtype##_id __mod_##gtype##_table            \
                                     ^
      <scratch space>:143:1: note: expanded from here
      __mod_x86cpu_device_table
      ^
      drivers/thermal/x86_pkg_temp_thermal.c:577:1: note: forward declaration of
            'struct x86cpu_device_id'
      include/linux/module.h:145:3: note: expanded from macro
            'MODULE_DEVICE_TABLE'
        MODULE_GENERIC_TABLE(type##_device, name)
        ^
      include/linux/module.h:87:21: note: expanded from macro
            'MODULE_GENERIC_TABLE'
      extern const struct gtype##_id __mod_##gtype##_table            \
                          ^
      <scratch space>:141:1: note: expanded from here
      x86cpu_device_id
      ^
      1 error generated.
      Signed-off-by: default avatarBehan Webster <behanw@converseincode.com>
      Signed-off-by: default avatarJan-Simon Möller <dl9pf@gmx.de>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4586256
    • Mark Charlebois's avatar
      LLVMLinux: Add support for clang to compiler.h and new compiler-clang.h · 565cbdc2
      Mark Charlebois authored
      Add a compiler-clang.h file to add specific macros needed for compiling the
      kernel with clang.
      
      Initially the only override required is the macro for silencing the
      compiler for a purposefully uninintialized variable.
      
      Author: Mark Charlebois <charlebm@gmail.com>
      Signed-off-by: default avatarMark Charlebois <charlebm@gmail.com>
      Signed-off-by: default avatarBehan Webster <behanw@converseincode.com>
      565cbdc2
    • Behan Webster's avatar
      LLVMLinux: Remove warning about returning an uninitialized variable · aa93685a
      Behan Webster authored
      Fix uninitialized return code in default case in cmpxchg-local.h
      
      This patch fixes the code to prevent an uninitialized return value that is detected
      when compiling with clang. The bug produces numerous warnings when compiling the
      Linux kernel with clang.
      Signed-off-by: default avatarBehan Webster <behanw@converseincode.com>
      Signed-off-by: default avatarMark Charlebois <charlebm@gmail.com>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      aa93685a
    • Mathieu Desnoyers's avatar
      tracepoint: Fix sparse warnings in tracepoint.c · b725dfea
      Mathieu Desnoyers authored
      Fix the following sparse warnings:
      
        CHECK   kernel/tracepoint.c
      kernel/tracepoint.c:184:18: warning: incorrect type in assignment (different address spaces)
      kernel/tracepoint.c:184:18:    expected struct tracepoint_func *tp_funcs
      kernel/tracepoint.c:184:18:    got struct tracepoint_func [noderef] <asn:4>*funcs
      kernel/tracepoint.c:216:18: warning: incorrect type in assignment (different address spaces)
      kernel/tracepoint.c:216:18:    expected struct tracepoint_func *tp_funcs
      kernel/tracepoint.c:216:18:    got struct tracepoint_func [noderef] <asn:4>*funcs
      kernel/tracepoint.c:392:24: error: return expression in void function
        CC      kernel/tracepoint.o
      kernel/tracepoint.c: In function tracepoint_module_going:
      kernel/tracepoint.c:491:6: warning: symbol 'syscall_regfunc' was not declared. Should it be static?
      kernel/tracepoint.c:508:6: warning: symbol 'syscall_unregfunc' was not declared. Should it be static?
      
      Link: http://lkml.kernel.org/r/1397049883-28692-1-git-send-email-mathieu.desnoyers@efficios.comSigned-off-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      b725dfea
    • Martin K. Petersen's avatar
      block: Fix for_each_bvec() · b7aa84d9
      Martin K. Petersen authored
      Commit 4550dd6c introduced for_each_bvec() which iterates over each
      bvec attached to a bio or bip. However, the macro fails to check bi_size
      before dereferencing which can lead to crashes while counting/mapping
      integrity scatterlist segments.
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Nicholas Bellinger <nab@linux-iscsi.org>
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      b7aa84d9
  4. 08 Apr, 2014 3 commits
  5. 07 Apr, 2014 22 commits
    • Mark Salter's avatar
      mm: create generic early_ioremap() support · 9e5c33d7
      Mark Salter authored
      This patch creates a generic implementation of early_ioremap() support
      based on the existing x86 implementation.  early_ioremp() is useful for
      early boot code which needs to temporarily map I/O or memory regions
      before normal mapping functions such as ioremap() are available.
      
      Some architectures have optional MMU.  In the no-MMU case, the remap
      functions simply return the passed in physical address and the unmap
      functions do nothing.
      Signed-off-by: default avatarMark Salter <msalter@redhat.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9e5c33d7
    • Josh Triplett's avatar
      lglock: map to spinlock when !CONFIG_SMP · 64b47e8f
      Josh Triplett authored
      When the system has only one CPU, lglock is effectively a spinlock; map
      it directly to spinlock to eliminate the indirection and duplicate code.
      
      In addition to removing overhead, this drops 1.6k of code with a
      defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      64b47e8f
    • Christoph Lameter's avatar
      percpu: add preemption checks to __this_cpu ops · 188a8140
      Christoph Lameter authored
      We define a check function in order to avoid trouble with the include
      files.  Then the higher level __this_cpu macros are modified to invoke
      the preemption check.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Tested-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      188a8140
    • Christoph Lameter's avatar
      vmstat: use raw_cpu_ops to avoid false positives on preemption checks · 293b6a4c
      Christoph Lameter authored
      vm counters are allowed to be racy.  Use raw_cpu_ops to avoid the
      local_irq_disable overhead and to avoid preemption checks which will be
      added to the __this_cpu operations.
      
      [akpm@linux-foundation.org: Add comment.  Again.]
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Reported-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      293b6a4c
    • Christoph Lameter's avatar
      mm: use raw_cpu ops for determining current NUMA node · dc322a99
      Christoph Lameter authored
      With the preempt checking logic for __this_cpu_ops we will get false
      positives from locations in the code that use numa_node_id.
      
      Before the __this_cpu ops where introduced there were no checks for
      preemption present either.  smp_raw_processor_id() was used.  See
      
        http://www.spinics.net/lists/linux-numa/msg00641.html
      
      Therefore we need to use raw_cpu_read here to avoid false postives.
      
      Note that this issue has been discussed in prior years.  If the process
      changes nodes after retrieving the current numa node then that is
      acceptable since most uses of numa_node etc are for optimization and not
      for correctness.
      
      There were suggestions to implement a raw_numa_node_id in order to do
      preempt checks for numa_node_id as well.  But I think we better defer
      that to another patch since that would mean investigating how
      numa_node_id() is used throughout the kernel which would increase the
      scope of this patchset significantly.  After all preemption was never
      checked before when numa_node_id() was used.
      
      Some sample traces:
      
      __this_cpu_read operation in preemptible [00000000] code: login/1456
      caller is __this_cpu_preempt_check+0x2b/0x2d
      CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3b-dirty #185
      Call Trace:
        dump_stack+0x4e/0x82
        check_preemption_disabled+0xc5/0xe0
        __this_cpu_preempt_check+0x2b/0x2d
        get_task_policy+0x1d/0x49
        get_vma_policy+0x14/0x76
        alloc_pages_vma+0x35/0xff
        handle_mm_fault+0x290/0x73b
        __do_page_fault+0x3fe/0x44d
        do_page_fault+0x9/0xc
        page_fault+0x22/0x30
        generic_file_aio_read+0x38e/0x624
        do_sync_read+0x54/0x73
        vfs_read+0x9d/0x12a
        SyS_read+0x47/0x7e
        cstar_dispatch+0x7/0x23
      
      caller is __this_cpu_preempt_check+0x2b/0x2d
      CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3b-dirty #185
      Call Trace:
        dump_stack+0x4e/0x82
        check_preemption_disabled+0xc5/0xe0
        __this_cpu_preempt_check+0x2b/0x2d
        alloc_pages_current+0x8f/0xbc
        __page_cache_alloc+0xb/0xd
        __do_page_cache_readahead+0xf4/0x219
        ra_submit+0x1c/0x20
        ondemand_readahead+0x28c/0x2b4
        page_cache_sync_readahead+0x38/0x3a
        generic_file_aio_read+0x261/0x624
        do_sync_read+0x54/0x73
        vfs_read+0x9d/0x12a
        SyS_read+0x47/0x7e
        cstar_dispatch+0x7/0x23
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Alex Shi <alex.shi@intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dc322a99
    • Christoph Lameter's avatar
      percpu: add raw_cpu_ops · b3ca1c10
      Christoph Lameter authored
      The kernel has never been audited to ensure that this_cpu operations are
      consistently used throughout the kernel.  The code generated in many
      places can be improved through the use of this_cpu operations (which
      uses a segment register for relocation of per cpu offsets instead of
      performing address calculations).
      
      The patch set also addresses various consistency issues in general with
      the per cpu macros.
      
      A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only
         because checks are skipped. This is typically shown through a raw_
         prefix. So this patch set changes the places where __this_cpu_ptr()
         is used to raw_cpu_ptr().
      
      B. There has been the long term wish by some that __this_cpu operations
         would check for preemption. However, there are cases where preemption
         checks need to be skipped. This patch set adds raw_cpu operations that
         do not check for preemption and then adds preemption checks to the
         __this_cpu operations.
      
      C. The use of __get_cpu_var is always a reference to a percpu variable
         that can also be handled via a this_cpu operation. This patch set
         replaces all uses of __get_cpu_var with this_cpu operations.
      
      D. We can then use this_cpu RMW operations in various places replacing
         sequences of instructions by a single one.
      
      E. The use of this_cpu operations throughout will allow other arches than
         x86 to implement optimized references and RMV operations to work with
         per cpu local data.
      
      F. The use of this_cpu operations opens up the possibility to
         further optimize code that relies on synchronization through
         per cpu data.
      
      The patch set works in a couple of stages:
      
      I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
          Also converts the existing __this_cpu_xx_# primitive in the x86
          code to raw_cpu_xx_#.
      
      II. Patch 2-4 use the raw_cpu operations in places that would give
           us false positives once they are enabled.
      
      III. Patch 5 adds preemption checks to __this_cpu operations to allow
          checking if preemption is properly disabled when these functions
          are used.
      
      IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var
         with this_cpu_ptr. They do not depend on any changes to the percpu
         code. No preemption tests are skipped if they are applied.
      
      V. Patches 21-46 are conversion patches that use this_cpu operations
         in various kernel subsystems/drivers or arch code.
      
      VI.  Patches 47/48 (not included in this series) remove no longer used
          functions (__this_cpu_ptr and __get_cpu_var).  These should only be
          applied after all the conversion patches have made it and after we
          have done additional passes through the kernel to ensure that none of
          the uses of these functions remain.
      
      This patch (of 46):
      
      The patches following this one will add preemption checks to __this_cpu
      ops so we need to have an alternative way to use this_cpu operations
      without preemption checks.
      
      raw_cpu_ops will be the basis for all other ops since these will be the
      operations that do not implement any checks.
      
      Primitive operations are renamed by this patch from __this_cpu_xxx to
      raw_cpu_xxxx.
      
      Also change the uses of the x86 percpu primitives in preempt.h.
      These depend directly on asm/percpu.h (header #include nesting issue).
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Alex Shi <alex.shi@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bryan Wu <cooloney@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Hedi Berriche <hedi@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wim Van Sebroeck <wim@iguana.be>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b3ca1c10
    • Vladimir Davydov's avatar
      slub: rework sysfs layout for memcg caches · 9a41707b
      Vladimir Davydov authored
      Currently, we try to arrange sysfs entries for memcg caches in the same
      manner as for global caches.  Apart from turning /sys/kernel/slab into a
      mess when there are a lot of kmem-active memcgs created, it actually
      does not work properly - we won't create more than one link to a memcg
      cache in case its parent is merged with another cache.  For instance, if
      A is a root cache merged with another root cache B, we will have the
      following sysfs setup:
      
        X
        A -> X
        B -> X
      
      where X is some unique id (see create_unique_id()).  Now if memcgs M and
      N start to allocate from cache A (or B, which is the same), we will get:
      
        X
        X:M
        X:N
        A -> X
        B -> X
        A:M -> X:M
        A:N -> X:N
      
      Since B is an alias for A, we won't get entries B:M and B:N, which is
      confusing.
      
      It is more logical to have entries for memcg caches under the
      corresponding root cache's sysfs directory.  This would allow us to keep
      sysfs layout clean, and avoid such inconsistencies like one described
      above.
      
      This patch does the trick.  It creates a "cgroup" kset in each root
      cache kobject to keep its children caches there.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a41707b
    • Vladimir Davydov's avatar
      memcg, slab: do not destroy children caches if parent has aliases · b8529907
      Vladimir Davydov authored
      Currently we destroy children caches at the very beginning of
      kmem_cache_destroy().  This is wrong, because the root cache will not
      necessarily be destroyed in the end - if it has aliases (refcount > 0),
      kmem_cache_destroy() will simply decrement its refcount and return.  In
      this case, at best we will get a bunch of warnings in dmesg, like this
      one:
      
        kmem_cache_destroy kmalloc-32:0: Slab cache still has objects
        CPU: 1 PID: 7139 Comm: modprobe Tainted: G    B   W    3.13.0+ #117
        Call Trace:
          dump_stack+0x49/0x5b
          kmem_cache_destroy+0xdf/0xf0
          kmem_cache_destroy_memcg_children+0x97/0xc0
          kmem_cache_destroy+0xf/0xf0
          xfs_mru_cache_uninit+0x21/0x30 [xfs]
          exit_xfs_fs+0x2e/0xc44 [xfs]
          SyS_delete_module+0x198/0x1f0
          system_call_fastpath+0x16/0x1b
      
      At worst - if kmem_cache_destroy() will race with an allocation from a
      memcg cache - the kernel will panic.
      
      This patch fixes this by moving children caches destruction after the
      check if the cache has aliases.  Plus, it forbids destroying a root
      cache if it still has children caches, because each children cache keeps
      a reference to its parent.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b8529907
    • Vladimir Davydov's avatar
      memcg, slab: separate memcg vs root cache creation paths · 794b1248
      Vladimir Davydov authored
      Memcg-awareness turned kmem_cache_create() into a dirty interweaving of
      memcg-only and except-for-memcg calls.  To clean this up, let's move the
      code responsible for memcg cache creation to a separate function.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      794b1248
    • Vladimir Davydov's avatar
      memcg, slab: cleanup memcg cache creation · 5722d094
      Vladimir Davydov authored
      This patch cleans up the memcg cache creation path as follows:
      
      - Move memcg cache name creation to a separate function to be called
        from kmem_cache_create_memcg().  This allows us to get rid of the mutex
        protecting the temporary buffer used for the name formatting, because
        the whole cache creation path is protected by the slab_mutex.
      
      - Get rid of memcg_create_kmem_cache().  This function serves as a proxy
        to kmem_cache_create_memcg().  After separating the cache name creation
        path, it would be reduced to a function call, so let's inline it.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5722d094
    • Uwe Kleine-König's avatar
      Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP · ce816fa8
      Uwe Kleine-König authored
      If the renamed symbol is defined lib/iomap.c implements ioport_map and
      ioport_unmap and currently (nearly) all platforms define the port
      accessor functions outb/inb and friend unconditionally.  So
      HAS_IOPORT_MAP is the better name for this.
      
      Consequently NO_IOPORT is renamed to NO_IOPORT_MAP.
      
      The motivation for this change is to reintroduce a symbol HAS_IOPORT
      that signals if outb/int et al are available.  I will address that at
      least one merge window later though to keep surprises to a minimum and
      catch new introductions of (HAS|NO)_IOPORT.
      
      The changes in this commit were done using:
      
      	$ git grep -l -E '(NO|HAS)_IOPORT' | xargs perl -p -i -e 's/\b((?:CONFIG_)?(?:NO|HAS)_IOPORT)\b/$1_MAP/'
      Signed-off-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ce816fa8
    • Josh Triplett's avatar
      bug: Make BUG() always stop the machine · a4b5d580
      Josh Triplett authored
      When !CONFIG_BUG and !HAVE_ARCH_BUG, define the generic BUG() as an
      infinite loop rather than a no-op.  This avoids undefined behavior if
      execution ever actually reaches BUG(), and avoids warnings about code
      after BUG() (such as on non-void functions calling BUG() and then not
      returning).
      
      bloat-o-meter results:
      
        add/remove: 0/0 grow/shrink: 43/10 up/down: 235/-98 (137)
        function                             old     new   delta
        umount_collect                       119     138     +19
        notify_change                        306     324     +18
        xstate_enable_boot_cpu               252     269     +17
        kunmap                                54      70     +16
        balloon_page_dequeue                 112     126     +14
        mm_take_all_locks                    223     233     +10
        list_lru_walk_node                   143     152      +9
        vma_adjust                          1059    1067      +8
        pcpu_setup_first_chunk              1130    1138      +8
        mm_drop_all_locks                    143     151      +8
        ns_capable                            55      62      +7
        anon_transport_class_unregister        8      15      +7
        srcu_init_notifier_head               35      41      +6
        shrink_dcache_for_umount             174     180      +6
        kunmap_high                           99     105      +6
        end_page_writeback                    43      49      +6
        do_exit                             1339    1345      +6
        __kfifo_dma_out_prepare_r             86      92      +6
        __kfifo_dma_in_prepare_r              90      96      +6
        fixup_user_fault                     120     125      +5
        repair_env_string                     73      77      +4
        read_cache_pages_invalidate_page      56      60      +4
        isolate_lru_pages.isra               142     146      +4
        do_notify_parent_cldstop             255     259      +4
        cpu_init                             370     374      +4
        utimes_common                        270     272      +2
        tasklet_hi_action                     91      93      +2
        tasklet_action                        91      93      +2
        set_pte_vaddr                         46      48      +2
        find_get_pages_tag                   202     204      +2
        early_iounmap                        185     187      +2
        __native_set_fixmap                   36      38      +2
        __get_user_pages                     822     824      +2
        __early_ioremap                      299     301      +2
        yield_task_stop                        1       2      +1
        tick_resume                           37      38      +1
        switched_to_stop                       1       2      +1
        switched_to_idle                       1       2      +1
        prio_changed_stop                      1       2      +1
        prio_changed_idle                      1       2      +1
        pm_qos_power_read                    111     112      +1
        arch_cpu_idle_dead                     1       2      +1
        __insert_vmap_area                   140     141      +1
        sys_renameat                         614     612      -2
        mm_fault_error                       297     295      -2
        SyS_renameat                         614     612      -2
        sys_linkat                           416     413      -3
        SyS_linkat                           416     413      -3
        chmod_common                         129     122      -7
        proc_cap_handler                     240     225     -15
        __schedule                           849     831     -18
        sys_madvise                         1077    1054     -23
        SyS_madvise                         1077    1054     -23
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Reported-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a4b5d580
    • Josh Triplett's avatar
      bug: when !CONFIG_BUG, make WARN call no_printk to check format and args · 4e50ebde
      Josh Triplett authored
      The stub version of WARN for !CONFIG_BUG completely ignored its format
      string and subsequent arguments; make it check them instead, using
      no_printk.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Reported-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4e50ebde
    • Josh Triplett's avatar
    • Josh Triplett's avatar
      bug: when !CONFIG_BUG, simplify WARN_ON_ONCE and family · b607e70e
      Josh Triplett authored
      When !CONFIG_BUG, WARN_ON and family become simple passthroughs of their
      condition argument; however, WARN_ON_ONCE and family still have conditions
      and a boolean to detect one-time invocation, even though the warning
      they'd emit doesn't exist.  Make the existing definitions conditional on
      CONFIG_BUG, and add definitions for !CONFIG_BUG that map to the
      passthrough versions of WARN and WARN_ON.
      
      This saves 4.4k on a minimized configuration (smaller than allnoconfig),
      and 20.6k with defconfig plus CONFIG_BUG=n.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b607e70e
    • Alexandre Bounine's avatar
      rapidio: rework device hierarchy and introduce mport class of devices · 2aaf308b
      Alexandre Bounine authored
      This patch removes an artificial RapidIO bus root device and establishes
      actual device hierarchy by providing reference to real parent devices.
      It also introduces device class for RapidIO controller devices (on-chip
      or an eternal bridge, known as "mport").
      
      Existing implementation was sufficient for SoC-based platforms that have
      a single RapidIO controller.  With introduction of devices using
      multiple RapidIO controllers and PCIe-to-RapidIO bridges the old scheme
      is very limiting or does not work at all.  The implemented changes allow
      to properly reference platform's local RapidIO mport devices and provide
      device details needed for upper layers.
      
      This change to RapidIO device hierarchy does not break any known
      existing kernel or user space interfaces.
      Signed-off-by: default avatarAlexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
      Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
      Cc: Jerry Jacobs <jerry.jacobs@prodrive-technologies.com>
      Cc: Arno Tiemersma <arno.tiemersma@prodrive-technologies.com>
      Cc: Rob Landley <rob@landley.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2aaf308b
    • Stephen Hemminger's avatar
      idr: remove dead code · 90ae3ae5
      Stephen Hemminger authored
      Remove no longer used deprecated code, and make local functions
      static.
      Signed-off-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Acked-by: default avatarJean Delvare <jdelvare@suse.de>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: George Spelvin <linux@horizon.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      90ae3ae5
    • Rashika Kheria's avatar
      include/linux/crash_dump.h: add vmcore_cleanup() prototype · 82e0703b
      Rashika Kheria authored
      Eliminate the following warning in proc/vmcore.c:
      
        fs/proc/vmcore.c:1088:6: warning: no previous prototype for `vmcore_cleanup' [-Wmissing-prototypes]
      
      [akpm@linux-foundation.org: clean up powerpc, remove unneeded EXPORT_SYMBOL]
      Signed-off-by: default avatarRashika Kheria <rashika.kheria@gmail.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      82e0703b
    • Oleg Nesterov's avatar
      wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide EXIT_TRACE from user-space · ad86622b
      Oleg Nesterov authored
      get_task_state() uses the most significant bit to report the state to
      user-space, this means that EXIT_ZOMBIE->EXIT_TRACE->EXIT_DEAD transition
      can be noticed via /proc as Z -> X -> Z change.  Note that this was
      possible even before EXIT_TRACE was introduced.
      
      This is not really bad but imho it make sense to hide EXIT_TRACE from
      user-space completely.  So the patch simply swaps EXIT_ZOMBIE and
      EXIT_DEAD, this way EXIT_TRACE will be seen as EXIT_ZOMBIE by user-space.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Michal Schmidt <mschmidt@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ad86622b
    • Oleg Nesterov's avatar
      wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition · abd50b39
      Oleg Nesterov authored
      wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
      drops tasklist_lock.  If this task is not the natural child and it is
      traced, we change its state back to EXIT_ZOMBIE for ->real_parent.
      
      The last transition is racy, this is even documented in 50b8d257
      "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
      race".  wait_consider_task() tries to detect this transition and clear
      ->notask_error but we can't rely on ptrace_reparented(), debugger can
      exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.
      
      And there is another problem which were missed before: this transition
      can also race with reparent_leader() which doesn't reset >exit_signal if
      EXIT_DEAD, assuming that this task must be reaped by someone else.  So
      the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
      /sbin/init doesn't use __WALL it becomes unreapable.  This was fixed by
      the previous commit, but it was the temporary hack.
      
      1. Add the new exit_state, EXIT_TRACE. It means that the task is the
         traced zombie, debugger is going to detach and notify its natural
         parent.
      
         This new state is actually EXIT_ZOMBIE | EXIT_DEAD. This way we
         can avoid the changes in proc/kgdb code, get_task_state() still
         reports "X (dead)" in this case.
      
         Note: with or without this change userspace can see Z -> X -> Z
         transition. Not really bad, but probably makes sense to fix.
      
      2. Change wait_task_zombie() to use EXIT_TRACE instead of EXIT_DEAD
         if we need to notify the ->real_parent.
      
      3. Revert the previous hack in reparent_leader(), now that EXIT_DEAD
         is always the final state we can safely ignore such a task.
      
      4. Change wait_consider_task() to check EXIT_TRACE separately and kill
         the racy and no longer needed ptrace_reparented() case.
      
         If ptrace == T an EXIT_TRACE thread should be simply ignored, the
         owner of this state is going to ptrace_unlink() this task. We can
         pretend that it was already removed from ->ptraced list.
      
         Otherwise we should skip this thread too but clear ->notask_error,
         we must be the natural parent and debugger is going to untrace and
         notify us. IOW, this doesn't differ from "EXIT_ZOMBIE && p->ptrace"
         even if the task was already untraced.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarJan Kratochvil <jan.kratochvil@redhat.com>
      Reported-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Tested-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      abd50b39
    • Oleg Nesterov's avatar
      exec: kill bprm->tcomm[], simplify the "basename" logic · 23aebe16
      Oleg Nesterov authored
      Starting from commit c4ad8f98 ("execve: use 'struct filename *' for
      executable name passing") bprm->filename can not go away after
      flush_old_exec(), so we do not need to save the binary name in
      bprm->tcomm[] added by 96e02d15 ("exec: fix use-after-free bug in
      setup_new_exec()").
      
      And there was never need for filename_to_taskname-like code, we can
      simply do set_task_comm(kbasename(filename).
      
      This patch has to change set_task_comm() and trace_task_rename() to
      accept "const char *", but I think this change is also good.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      23aebe16
    • Srikar Dronamraju's avatar
      numa: use LAST_CPUPID_SHIFT to calculate LAST_CPUPID_MASK · 834a964a
      Srikar Dronamraju authored
      LAST_CPUPID_MASK is calculated using LAST_CPUPID_WIDTH.  However
      LAST_CPUPID_WIDTH itself can be 0.  (when LAST_CPUPID_NOT_IN_PAGE_FLAGS is
      set).  In such a case LAST_CPUPID_MASK turns out to be 0.
      
      But with recent commit 1ae71d03: (mm: numa: bugfix for
      LAST_CPUPID_NOT_IN_PAGE_FLAGS) if LAST_CPUPID_MASK is 0,
      page_cpupid_xchg_last() and page_cpupid_reset_last() causes
      page->_last_cpupid to be set to 0.
      
      This causes performance regression. Its almost as if numa_balancing is
      off.
      
      Fix LAST_CPUPID_MASK by using LAST_CPUPID_SHIFT instead of
      LAST_CPUPID_WIDTH.
      
      Some performance numbers and perf stats with and without the fix.
      
      (3.14-rc6)
      ----------
      numa01
      
       Performance counter stats for '/usr/bin/time -f %e %S %U %c %w -o start_bench.out -a ./numa01':
      
               12,27,462 cs                                                           [100.00%]
                2,41,957 migrations                                                   [100.00%]
             1,68,01,713 faults                                                       [100.00%]
          7,99,35,29,041 cache-misses
                  98,808 migrate:mm_migrate_pages                                     [100.00%]
      
          1407.690148814 seconds time elapsed
      
      numa02
      
       Performance counter stats for '/usr/bin/time -f %e %S %U %c %w -o start_bench.out -a ./numa02':
      
                  63,065 cs                                                           [100.00%]
                  14,364 migrations                                                   [100.00%]
                2,08,118 faults                                                       [100.00%]
            25,32,59,404 cache-misses
                      12 migrate:mm_migrate_pages                                     [100.00%]
      
            63.840827219 seconds time elapsed
      
      (3.14-rc6 with fix)
      -------------------
      numa01
      
       Performance counter stats for '/usr/bin/time -f %e %S %U %c %w -o start_bench.out -a ./numa01':
      
                9,68,911 cs                                                           [100.00%]
                1,01,414 migrations                                                   [100.00%]
               88,38,697 faults                                                       [100.00%]
          4,42,92,51,042 cache-misses
                4,25,060 migrate:mm_migrate_pages                                     [100.00%]
      
           685.965331189 seconds time elapsed
      
      numa02
      
       Performance counter stats for '/usr/bin/time -f %e %S %U %c %w -o start_bench.out -a ./numa02':
      
                  17,543 cs                                                           [100.00%]
                   2,962 migrations                                                   [100.00%]
                1,17,843 faults                                                       [100.00%]
            11,80,61,644 cache-misses
                  12,358 migrate:mm_migrate_pages                                     [100.00%]
      
            20.380132343 seconds time elapsed
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      834a964a