1. 18 Mar, 2016 1 commit
    • scsi: fc: use get/put_unaligned64 for wwn access · ef3fb242
      Arnd Bergmann authored
      A bug in the gcc-6.0 prerelease version caused at least one
      driver (lpfc) to have excessive stack usage when dealing with
      wwn data, on the ARM architecture.
      
      lpfc_scsi.c: In function 'lpfc_find_next_oas_lun':
      lpfc_scsi.c:117:1: warning: the frame size of 1152 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      I have reported this as a gcc regression in
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232
      
      However, using a better implementation of wwn_to_u64() not only
      helps with the particular gcc problem but also leads to better
      object code for any version or architecture.
      
      The kernel already provides get_unaligned_be64() and
      put_unaligned_be64() helper functions that provide an
      optimized implementation with the desired semantics.
      
      The lpfc_find_next_oas_lun() function in the example that
      grew from 1146 bytes to 5144 bytes when moving from gcc-5.3
      to gcc-6.0 is now 804 bytes, as the optimized
      get_unaligned_be64() load can be done in three instructions.
      The stack usage is now down to 28 bytes from 128 bytes with
      gcc-5.3 before.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Hannes Reinicke <hare@suse.de>
      Reviewed-by: Ewan Milne <emilne@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
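The idea behind the commit above can be sketched in userspace C. This is only an illustration: `load_be64()`, `store_be64()`, `wwn_to_u64_sketch()` and `u64_to_wwn_sketch()` are hypothetical stand-ins for the kernel helpers named in the message, not the kernel's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for get_unaligned_be64(): assemble a big-endian 64-bit value
 * one byte at a time, so the pointer needs no particular alignment. */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Stand-in for put_unaligned_be64(): write the most significant byte first. */
static void store_be64(uint64_t v, uint8_t *p)
{
	for (size_t i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Hypothetical equivalent of wwn_to_u64(): an 8-byte WWN read as one value. */
static uint64_t wwn_to_u64_sketch(const uint8_t *wwn)
{
	assert(wwn != NULL);
	return load_be64(wwn);
}

/* Hypothetical equivalent of u64_to_wwn(). */
static void u64_to_wwn_sketch(uint64_t v, uint8_t *wwn)
{
	assert(wwn != NULL);
	store_be64(v, wwn);
}
```

Compilers commonly turn such a byte loop into a single load plus a byte swap where the target allows it, which matches the commit's point that a dedicated helper yields compact object code.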
  2. 05 Mar, 2016 1 commit
  3. 23 Feb, 2016 5 commits
  4. 21 Feb, 2016 1 commit
  5. 19 Feb, 2016 1 commit
    • net: make netdev_for_each_lower_dev safe for device removal · cfdd28be
      Nikolay Aleksandrov authored
      When I used netdev_for_each_lower_dev in commit bad53162 ("vrf:
      remove slave queue and private slave struct") I thought that it acts
      like netdev_for_each_lower_private and can be used to remove the current
      device from the list while walking, but unfortunately it acts more like
      netdev_for_each_lower_private_rcu and doesn't allow it. The difference
      is where the "iter" points to, right now it points to the current element
      and that makes it impossible to remove it. Change the logic to be
      similar to netdev_for_each_lower_private and make it point to the "next"
      element so we can safely delete the current one. VRF is the only such
      user right now, there's no change for the read-only users.
      
      Here's what can happen now:
      [98423.249858] general protection fault: 0000 [#1] SMP
      [98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss
      oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul
      crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng
      sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw
      gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon
      parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button
      9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg
      virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd
      ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy
      virtio_ring virtio scsi_mod [last unloaded: bridge]
      [98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G           O
      4.5.0-rc2+ #81
      [98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.8.1-20150318_183358- 04/01/2014
      [98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti:
      ffff88003428c000
      [98423.256123] RIP: 0010:[<ffffffff81514f3e>]  [<ffffffff81514f3e>]
      netdev_lower_get_next+0x1e/0x30
      [98423.256534] RSP: 0018:ffff88003428f940  EFLAGS: 00010207
      [98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX:
      0000000000000000
      [98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI:
      ffff880054ff90c0
      [98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09:
      0000000000000000
      [98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12:
      ffff88003428f9e0
      [98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15:
      0000000000000001
      [98423.258055] FS:  00007f3d76881700(0000) GS:ffff88005d000000(0000)
      knlGS:0000000000000000
      [98423.258418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4:
      00000000000406e0
      [98423.258902] Stack:
      [98423.259075]  ffff88003428f960 ffffffffa0442636 0002000100000004
      ffff880054ff9000
      [98423.259647]  ffff88003428f9b0 ffffffff81518205 ffff880054ff9000
      ffff88003428f978
      [98423.260208]  ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0
      ffff880035b35f00
      [98423.260739] Call Trace:
      [98423.260920]  [<ffffffffa0442636>] vrf_dev_uninit+0x76/0xa0 [vrf]
      [98423.261156]  [<ffffffff81518205>]
      rollback_registered_many+0x205/0x390
      [98423.261401]  [<ffffffff815183ec>] unregister_netdevice_many+0x1c/0x70
      [98423.261641]  [<ffffffff8153223c>] rtnl_delete_link+0x3c/0x50
      [98423.271557]  [<ffffffff815335bb>] rtnl_dellink+0xcb/0x1d0
      [98423.271800]  [<ffffffff811cd7da>] ? __inc_zone_state+0x4a/0x90
      [98423.272049]  [<ffffffff815337b4>] rtnetlink_rcv_msg+0x84/0x200
      [98423.272279]  [<ffffffff810cfe7d>] ? trace_hardirqs_on+0xd/0x10
      [98423.272513]  [<ffffffff8153370b>] ? rtnetlink_rcv+0x1b/0x40
      [98423.272755]  [<ffffffff81533730>] ? rtnetlink_rcv+0x40/0x40
      [98423.272983]  [<ffffffff8155d6e7>] netlink_rcv_skb+0x97/0xb0
      [98423.273209]  [<ffffffff8153371a>] rtnetlink_rcv+0x2a/0x40
      [98423.273476]  [<ffffffff8155ce8b>] netlink_unicast+0x11b/0x1a0
      [98423.273710]  [<ffffffff8155d2f1>] netlink_sendmsg+0x3e1/0x610
      [98423.273947]  [<ffffffff814fbc98>] sock_sendmsg+0x38/0x70
      [98423.274175]  [<ffffffff814fc253>] ___sys_sendmsg+0x2e3/0x2f0
      [98423.274416]  [<ffffffff810d841e>] ? do_raw_spin_unlock+0xbe/0x140
      [98423.274658]  [<ffffffff811e1bec>] ? handle_mm_fault+0x26c/0x2210
      [98423.274894]  [<ffffffff811e19cd>] ? handle_mm_fault+0x4d/0x2210
      [98423.275130]  [<ffffffff81269611>] ? __fget_light+0x91/0xb0
      [98423.275365]  [<ffffffff814fcd42>] __sys_sendmsg+0x42/0x80
      [98423.275595]  [<ffffffff814fcd92>] SyS_sendmsg+0x12/0x20
      [98423.275827]  [<ffffffff81611bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66
      90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48
      89 06 <48> 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66
      [98423.279639] RIP  [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30
      [98423.279920]  RSP <ffff88003428f940>
      
      CC: David Ahern <dsa@cumulusnetworks.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Vlad Yasevich <vyasevic@redhat.com>
      Fixes: bad53162 ("vrf: remove slave queue and private slave struct")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
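The iterator change described in the commit above can be illustrated with a plain singly linked list. This is a minimal sketch, not the netdev code: `struct node`, `get_next()`, `build_list()` and `remove_all()` are hypothetical names. The point is that the iterator already refers to the next element when the current one is handed out, so the current one may be freed during the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	int id;
	struct node *next;
};

/* Return the current element and advance *iter to the one after it.
 * Since *iter no longer refers to the returned node, the caller may
 * unlink and free that node without breaking the walk. */
static struct node *get_next(struct node **iter)
{
	struct node *cur = *iter;

	if (cur)
		*iter = cur->next;
	return cur;
}

/* Hypothetical helper: build a list holding ids 0..n-1. */
static struct node *build_list(int n)
{
	struct node *head = NULL;

	for (int i = n - 1; i >= 0; i--) {
		struct node *nd = malloc(sizeof(*nd));

		assert(nd != NULL);
		nd->id = i;
		nd->next = head;
		head = nd;
	}
	return head;
}

/* Remove every node while walking the list; returns how many were freed. */
static int remove_all(struct node **head)
{
	struct node *iter = *head;
	struct node *cur;
	int removed = 0;

	while ((cur = get_next(&iter)) != NULL) {
		free(cur);	/* safe: iter was advanced before cur was returned */
		removed++;
	}
	*head = NULL;
	return removed;
}
```

Had `get_next()` left `*iter` pointing at the returned node, the next call would read `cur->next` from freed memory, which is the kind of fault shown in the trace.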
  6. 18 Feb, 2016 4 commits
    • drm/atomic: Allow for holes in connector state, v2. · 5fff80bb
      Maarten Lankhorst authored
      Because we record connector_mask using 1 << drm_connector_index now
      the connector_mask should stay the same even when other connectors
      are removed. This was not the case with MST, in that case when removing
      a connector all other connectors may change their index.
      
      This is fixed by waiting until the first get_connector_state to allocate
      connector_state, and force reallocation when state is too small.
      
      As a side effect connector arrays no longer have to be preallocated,
      and can be allocated on first use, which means fewer allocations in
      the page-flip-only path.
      
      Changes since v1:
      - Whitespace. (Ville)
      - Call ida_remove when destroying the connector. (Ville)
      - u32 alloc -> int. (Ville)
      
      Fixes: 14de6c44 ("drm/atomic: Remove drm_atomic_connectors_for_crtc.")
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Lyude <cpaul@redhat.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
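The allocate-on-first-use pattern mentioned in the commit above might look roughly like this in plain C. It is a sketch under assumptions: `struct state` and `get_slot()` are hypothetical and only mirror the idea of growing a per-index array when an index beyond its current size is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct state {
	void **slots;	/* NULL until the first lookup */
	int num_slots;	/* number of entries currently allocated */
};

/* Return a pointer to slot idx, allocating or growing the array on demand.
 * New entries are zeroed, so indices that were skipped stay as empty holes.
 * Returns NULL if the allocation fails. */
static void **get_slot(struct state *s, int idx)
{
	assert(s != NULL && idx >= 0);

	if (idx >= s->num_slots) {
		int alloc = idx + 1;
		void **grown = realloc(s->slots, (size_t)alloc * sizeof(*grown));

		if (!grown)
			return NULL;
		memset(grown + s->num_slots, 0,
		       (size_t)(alloc - s->num_slots) * sizeof(*grown));
		s->slots = grown;
		s->num_slots = alloc;
	}
	return &s->slots[idx];
}
```

A state object that never touches a connector never pays for the array, which is the saving the message describes for the page-flip-only path.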
    • Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" · 13d34ac6
      Jeff Layton authored
      This reverts commit c510eff6 ("fsnotify: destroy marks with
      call_srcu instead of dedicated thread").
      
      Eryu reported that he was seeing some OOM kills kick in when running a
      testcase that adds and removes inotify marks on a file in a tight loop.
      
      The above commit changed the code to use call_srcu to clean up the
      marks.  While that does (in principle) work, the srcu callback job is
      limited to cleaning up entries in small batches and only once per jiffy.
      It's easily possible to overwhelm that machinery with too many call_srcu
      callbacks, and Eryu's reproducer did just that.
      
      There's also another potential problem with using call_srcu here.  While
      you can obviously sleep while holding the srcu_read_lock, the callbacks
      run under local_bh_disable, so you can't sleep there.
      
      It's possible when putting the last reference to the fsnotify_mark that
      we'll end up putting a chain of references including the fsnotify_group,
      uid, and associated keys.  While I don't see any obvious ways that that
      could occur, it's probably still best to avoid using call_srcu here
      after all.
      
      This patch reverts the above patch.  A later patch will take a different
      approach to eliminate the dedicated thread here.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Reported-by: Eryu Guan <guaneryu@gmail.com>
      Tested-by: Eryu Guan <guaneryu@gmail.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tcp/dccp: fix another race at listener dismantle · 7716682c
      Eric Dumazet authored
      Ilya reported the following lockdep splat:
      
      kernel: =========================
      kernel: [ BUG: held lock freed! ]
      kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
      kernel: -------------------------
      kernel: swapper/5/0 is freeing memory
      ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
      kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
      [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
      kernel: 4 locks held by swapper/5/0:
      kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
      netif_receive_skb_internal+0x4b/0x1f0
      kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
      ip_local_deliver_finish+0x3f/0x380
      kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
      sk_clone_lock+0x19b/0x440
      kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
      [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
      
      To properly fix this issue, inet_csk_reqsk_queue_add() needs
      to tell its callers whether the child has been queued
      into the accept queue.
      
      We also need to make sure listener is still there before
      calling sk->sk_data_ready(), by holding a reference on it,
      since the reference carried by the child can disappear as
      soon as the child is put on accept queue.
      Reported-by: Ilya Dryomov <idryomov@gmail.com>
      Fixes: ebb516af ("tcp/dccp: fix race at listener dismantle phase")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • route: check and remove route cache when we get route · deed49df
      Xin Long authored
      Since the gc of ipv4 routes was removed, a cached route has no chance
      of being removed, and even after it has timed out it can still be used,
      because no code checks its expiry.
      
      Fix this issue by checking and removing the route cache entry when we get
      a route.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
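The approach in the commit above, checking expiry at lookup time when no garbage collector does it in the background, can be sketched generically. Everything here is hypothetical (`struct cache_entry`, `entry_expired()`, `cache_get()`); it shows the pattern only and says nothing about the actual routing structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cache_entry {
	bool valid;		/* false once the entry has been dropped */
	unsigned long expires;	/* tick at which the entry times out, 0 = never */
	int value;
};

/* Wrap-safe "now is at or past expires" test on a free-running tick counter. */
static bool entry_expired(const struct cache_entry *e, unsigned long now)
{
	return e->expires != 0 && (long)(now - e->expires) >= 0;
}

/* Look the entry up; an expired entry is dropped here instead of being
 * returned, since nothing else will ever clean it up. */
static const struct cache_entry *cache_get(struct cache_entry *e,
					   unsigned long now)
{
	assert(e != NULL);

	if (!e->valid)
		return NULL;
	if (entry_expired(e, now)) {
		e->valid = false;	/* remove the stale entry on lookup */
		return NULL;
	}
	return e;
}
```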
  7. 17 Feb, 2016 3 commits
    • ftrace/module: remove ftrace module notifier · 7dcd182b
      Jessica Yu authored
      Remove the ftrace module notifier in favor of directly calling
      ftrace_module_enable() and ftrace_release_mod() in the module loader.
      Hard-coding the function calls directly in the module loader removes
      dependence on the module notifier call chain and provides better
      visibility and control over what gets called when, which is important
      to kernel utilities such as livepatch.
      
      This fixes a notifier ordering issue in which the ftrace module notifier
      (and hence ftrace_module_enable()) for coming modules was being called
      after klp_module_notify(), which caused livepatch modules to initialize
      incorrectly. This patch removes dependence on the module notifier call
      chain in favor of hard coding the corresponding function calls in the
      module loader. This ensures that ftrace and livepatch code get called in
      the correct order on patch module load and unload.
      
      Fixes: 5156dca3 ("ftrace: Fix the race between ftrace and insmod")
      Signed-off-by: Jessica Yu <jeyu@redhat.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Petr Mladek <pmladek@suse.cz>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page · c8975706
      Kinglong Mee authored
      unreferenced object 0xffffc90000abf000 (size 16900):
        comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff8174d54e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811b9b91>] __vmalloc_node_range+0x231/0x280
          [<ffffffff811b9c2a>] __vmalloc+0x4a/0x50
          [<ffffffffa02c9ec1>] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver]
          [<ffffffffa02c700e>] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver]
          [<ffffffffa0596a6c>] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4]
          [<ffffffffa0596b13>] pnfs_generic_sync+0x13/0x20 [nfsv4]
          [<ffffffffa0585188>] nfs4_file_fsync+0x58/0x150 [nfsv4]
          [<ffffffff81228e5b>] vfs_fsync_range+0x4b/0xb0
          [<ffffffff81228f1d>] do_fsync+0x3d/0x70
          [<ffffffff812291d0>] SyS_fsync+0x10/0x20
          [<ffffffff81757def>] entry_SYSCALL_64_fastpath+0x12/0x76
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      v2, add missing include header
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • net/mlx4_core: Set UAR page size to 4KB regardless of system page size · 85743f1e
      Huy Nguyen authored
      problem description:
      
      The current code sets UAR page size equal to system page size.
      The ConnectX-3 and ConnectX-3 Pro HWs require minimum 128 UAR pages.
      The mlx4 kernel drivers are not loaded if there are fewer than 128 UAR pages.
      
      solution:
      
      Always set the UAR page size to 4KB. This allows more UAR pages if the OS
      has a PAGE_SIZE larger than 4KB. For example, PowerPC kernels use a 64KB
      system page size; with a 4MB uar region, there are 4MB/2/64KB = 32
      uars (half for uar, half for blueflame). This does not meet the minimum
      128 UAR pages requirement. With a 4KB UAR page, there are 4MB/2/4KB = 512
      uars, which meets the minimum requirement.
      
      Note that only the code in mlx4_core that deals with firmware knows that
      the uar page size is 4KB. Code that deals with usr page in cq and qp context
      (mlx4_ib, mlx4_en and part of mlx4_core) still assumes that the uar page
      size equals the system page size.
      
      Note that with this implementation, on a kernel with a 64KB system page
      size, there are 16 uars per system page but only one uar is used. The
      other 15 uars are ignored because of the above assumption.
      
      Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size
      to 4KB and mlx4_core code in virtual OS will obtain the uar page size from
      firmware.
      
      Regarding backward compatibility in SR-IOV, if the hypervisor has this new
      code, the virtual OS must be updated. If the hypervisor has the old code
      and the virtual OS has this new code, the new code will be backward
      compatible with the old code. If the uar size is big enough, this new code
      in the VF continues to work with a 64KB uar page size (on a PowerPC kernel).
      If the uar size does not meet the 128 uars requirement, this new code is
      not loaded in the VF and prints the same error message as the old code in
      the hypervisor.
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
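The arithmetic in the commit message above can be checked with a few lines of C. The helper names (`uar_count()`, `meets_minimum()`, `uars_per_system_page()`) are made up for this sketch; the constants (4MB region, half of it for blueflame, 128 minimum) come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_UAR_PAGES 128u	/* minimum stated in the commit message */

/* Half of the uar region is used for uars, the other half for blueflame. */
static uint32_t uar_count(uint64_t region_bytes, uint64_t uar_page_bytes)
{
	assert(uar_page_bytes != 0);
	return (uint32_t)(region_bytes / 2 / uar_page_bytes);
}

static bool meets_minimum(uint64_t region_bytes, uint64_t uar_page_bytes)
{
	return uar_count(region_bytes, uar_page_bytes) >= MIN_UAR_PAGES;
}

/* How many fixed-size uars fit into one system page. */
static uint32_t uars_per_system_page(uint64_t system_page_bytes,
				     uint64_t uar_page_bytes)
{
	assert(uar_page_bytes != 0);
	return (uint32_t)(system_page_bytes / uar_page_bytes);
}
```

With a 4MB region, a 64KB uar page gives 32 uars and fails the check, while a 4KB uar page gives 512 and passes; a 64KB system page then holds 16 such uars.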
  8. 16 Feb, 2016 1 commit
    • net/mlx5: Use offset based reserved field names in the IFC header file · b4ff3a36
      Matan Barak authored
      mlx5_ifc.h is a header file representing the API and ABI between
      the driver and the firmware and hardware. This file is used by
      both the mlx5_ib and mlx5_core drivers.
      
      Previously, this file used an incrementing counter to name
      reserved fields, for example:
      
      struct mlx5_ifc_odp_per_transport_service_cap_bits {
              u8         send[0x1];
              u8         receive[0x1];
              u8         write[0x1];
              u8         read[0x1];
              u8         reserved_0[0x1];
              u8         srq_receive[0x1];
              u8         reserved_1[0x1a];
      };
      
      If one developer implements feature A through net-next using
      reserved_0, they replace it with featureA and rename reserved_1 to
      reserved_0. In the same kernel cycle, a second developer could implement
      feature B through the rdma tree, using reserved_1 and splitting it into
      featureB and a smaller reserved_1 field. This will cause a conflict
      when the two trees are merged.
      
      The source of this conflict is that the 1st developer changed *all*
      reserved fields.
      
      As Linus suggested, we change the layout of structs to:
      
      struct mlx5_ifc_odp_per_transport_service_cap_bits {
      	u8         send[0x1];
      	u8         receive[0x1];
      	u8         write[0x1];
      	u8         read[0x1];
      	u8         reserved_at_4[0x1];
      	u8         srq_receive[0x1];
      	u8         reserved_at_6[0x1a];
      };
      
      This makes the conflicts much more rare and preserves the locality of
      changes.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
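The naming rule from the commit above can be expressed as a small function: a reserved field is named after the bit offset at which it starts, so its name depends only on the fields before it. This is an illustrative sketch; `struct field` and `field_name()` are hypothetical, and printing the offset in hexadecimal is an assumption (the two offsets in the example, 4 and 6, read the same in decimal).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct field {
	const char *name;	/* NULL marks a reserved field */
	unsigned int bits;	/* width of the field in bits */
};

/* Write the name of field idx into buf: its own name, or
 * "reserved_at_<bit offset>" when the field is reserved. */
static void field_name(const struct field *f, size_t idx,
		       char *buf, size_t len)
{
	unsigned int offset = 0;

	assert(f != NULL && buf != NULL && len > 0);

	/* The offset is the sum of the widths of all preceding fields. */
	for (size_t i = 0; i < idx; i++)
		offset += f[i].bits;

	if (f[idx].name)
		snprintf(buf, len, "%s", f[idx].name);
	else
		snprintf(buf, len, "reserved_at_%x", offset);
}
```

Giving the one-bit reserved field at offset 4 a real name leaves the later `reserved_at_6` untouched, whereas a counter-based scheme would have forced a rename of every following reserved field.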
  9. 15 Feb, 2016 4 commits
    • tracing: Fix freak link error caused by branch tracer · b33c8ff4
      Arnd Bergmann authored
      In my randconfig tests, I came across a bug that involves several
      components:
      
      * gcc-4.9 through at least 5.3
      * CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
      * CONFIG_PROFILE_ALL_BRANCHES overriding every if()
      * The optimized implementation of do_div() that tries to
        replace a library call with a division by multiplication
      * code in drivers/media/dvb-frontends/zl10353.c doing
      
              u32 adc_clock = 450560; /* 45.056 MHz */
              if (state->config.adc_clock)
                      adc_clock = state->config.adc_clock;
              do_div(value, adc_clock);
      
      In this case, gcc fails to determine whether the divisor
      in do_div() is __builtin_constant_p(). In particular, it
      concludes that __builtin_constant_p(adc_clock) is false, while
      __builtin_constant_p(!!adc_clock) is true.
      
      That in turn throws off the logic in do_div(), which also uses
      __builtin_constant_p() to pick the constant-optimized division,
      and the code in ilog2(), which uses __builtin_constant_p() to
      figure out whether it knows the answer at compile time. The result
      is a link error from failing to find multiple symbols that should
      never have been called based on the __builtin_constant_p():
      
      dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
      dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
      ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
      ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!
      
      This patch avoids the problem by changing __trace_if() to check
      whether the condition is known at compile-time to be nonzero, rather
      than checking whether it is actually a constant.
      
      I see this one link error in roughly one out of 1600 randconfig builds
      on ARM, and the patch fixes all known instances.
      
      Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Fixes: ab3c9c68 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
      Cc: stable@vger.kernel.org # v2.6.30+
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
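The kind of check the commit above describes can be sketched with a much simplified macro. This is not the kernel's `__trace_if()`: `TRACED()`, `record_branch()` and `classify()` are hypothetical, and `__builtin_constant_p()` is a GCC/Clang builtin, so the sketch assumes one of those compilers. The macro asks whether the normalised truth value `!!(cond)` is known at compile time and only sends other conditions through the recording path.

```c
#include <assert.h>
#include <stddef.h>

static int recorded;	/* number of conditions that went through the recorder */

/* Stand-in for the branch profiler's bookkeeping. */
static int record_branch(int taken)
{
	recorded++;
	return taken;
}

/* Test !!(cond), not cond itself, for compile-time constness.
 * __builtin_constant_p() does not evaluate its argument, so cond is
 * evaluated exactly once, in whichever arm is selected. */
#define TRACED(cond) \
	(__builtin_constant_p(!!(cond)) ? !!(cond) : record_branch(!!(cond)))

/* A volatile read is never a compile-time constant, so this always records. */
static int classify(volatile int *runtime_value)
{
	assert(runtime_value != NULL);
	return TRACED(*runtime_value);
}
```

A literal such as `TRACED(1)` bypasses the recorder, while `classify()` on a runtime value is recorded and still yields the same truth value.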
    • tracepoints: Do not trace when cpu is offline · f3775549
      Steven Rostedt (Red Hat) authored
      The tracepoint infrastructure uses RCU sched protection to enable and
      disable tracepoints safely. There are some instances where tracepoints are
      used in infrastructure code (like kfree()) that get called after a CPU is
      going offline, and perhaps when it is coming back online but hasn't been
      registered yet.
      
      This can produce the following warning:
      
       [ INFO: suspicious RCU usage. ]
       4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
       -------------------------------
       include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
      
       other info that might help us debug this:
      
       RCU used illegally from offline CPU!  rcu_scheduler_active = 1, debug_locks = 1
       no locks held by swapper/8/0.
      
       stack backtrace:
        CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S              4.4.0-00006-g0fe53e8-dirty #34
        Call Trace:
        [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
        [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
        [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
        [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
        [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
        [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
        [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
        [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
        [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
        [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
        [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
        [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
      
      This warning is not a false positive either. RCU is not protecting code that
      is being executed while the CPU is offline.
      
      Instead of playing "whack-a-mole(TM)" and adding conditional statements to
      the tracepoints we find that are used in this instance, simply add a
      cpu_online() test to the tracepoint code where the tracepoint will be
      ignored if the CPU is offline.
      
      Use of raw_smp_processor_id() is fine, as there should never be a case where
      the tracepoint code goes from running on a CPU that is online and suddenly
      gets migrated to a CPU that is offline.
      
      Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org
      Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
      Fixes: 97e1c18e ("tracing: Kernel Tracepoints")
      Cc: stable@vger.kernel.org # v2.6.28+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • iommu/vt-d: Clear PPR bit to ensure we get more page request interrupts · 46924008
      David Woodhouse authored
      According to the VT-d specification we need to clear the PPR bit in
      the Page Request Status register when handling page requests, or the
      hardware won't generate any more interrupts.
      
      This wasn't actually necessary on SKL/KBL (which may well be the
      subject of a hardware erratum, although it's harmless enough). But
      other implementations do appear to get it right, and we only ever get
      one interrupt unless we clear the PPR bit.
      Reported-by: CQ Tang <cq.tang@intel.com>
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Cc: stable@vger.kernel.org
    • powerpc/mm: Fix Multi hit ERAT cause by recent THP update · c777e2a8
      Aneesh Kumar K.V authored
      With ppc64 we use the deposited pgtable_t to store the hash pte slot
      information. We should not withdraw the deposited pgtable_t without
      marking the pmd none. This ensures that low level hash fault handling
      will skip this huge pte and we will handle it at upper levels.
      
      A recent change to pmd splitting changed the above in order to handle the
      race between pmd split and exit_mmap. The race is explained below.
      
      Consider the following race:
      
      		CPU0				CPU1
      shrink_page_list()
        add_to_swap()
          split_huge_page_to_list()
            __split_huge_pmd_locked()
              pmdp_huge_clear_flush_notify()
      	// pmd_none() == true
      					exit_mmap()
      					  unmap_vmas()
      					    zap_pmd_range()
      					      // no action on pmd since pmd_none() == true
      	pmd_populate()
      
      As a result the THP will not be freed. The leak is detected by check_mm():
      
      	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
      
      The above required us to not mark pmd none during a pmd split.
      
      The fix for ppc is to clear _PAGE_USER from the huge pte, so that the low
      level fault handling code skips this pte. At the higher level we do take
      the ptl lock. That should serialize us against the pmd split. Once the lock
      is acquired we check the pmd again using pmd_same. That should always
      return false for us and hence we should retry the access. We do the
      pmd_same check in all cases after taking ptl with
      THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
      huge_pmd_set_accessed).
      
      Also make sure we wait for irq disable section in other cpus to finish
      before flipping a huge pte entry with a regular pmd entry. Code paths
      like find_linux_pte_or_hugepte depend on irq disable to get
      a stable pte_t pointer. A parallel thp split needs to make sure we
      don't convert a pmd pte to a regular pmd entry without waiting for the
      irq disable section to finish.
      
      Fixes: eef1b3ba ("thp: implement split_huge_pmd()")
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 11 Feb, 2016 3 commits
  11. 10 Feb, 2016 4 commits
  12. 09 Feb, 2016 2 commits
  13. 08 Feb, 2016 2 commits
  14. 07 Feb, 2016 1 commit
    • pty: make sure super_block is still valid in final /dev/tty close · 1f55c718
      Herton R. Krzesinski authored
      Considering current pty code and multiple devpts instances, it's possible
      to umount a devpts file system while a program still has /dev/tty opened
      pointing to a previously closed pty pair in that instance. If all
      ptmx and pts/N files are closed, umount can be done. If the program closes
      /dev/tty after umount is done, devpts_kill_index will now use an invalid
      super_block, which was already destroyed in the umount operation after
      running ->kill_sb. This is another "use after free" type of issue, but now
      related to the allocated super_block instance.
      
      To avoid the problem (warning at ida_remove and potential crashes) for
      this specific case, I added two functions in devpts which grab additional
      references to the super_block, which pty code now uses to make sure
      the super block structure is still valid until pty shutdown is done.
      I also moved the additional inode references to the same functions, which
      also covers the similar case of the inode being freed before the final
      /dev/tty close/shutdown.
      Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
      Cc: stable@vger.kernel.org # 2.6.29+
      Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  15. 06 Feb, 2016 1 commit
  16. 05 Feb, 2016 4 commits
    • radix-tree: fix oops after radix_tree_iter_retry · 73204282
      Konstantin Khlebnikov authored
      The helper radix_tree_iter_retry() resets next_index to the current index.
      In the following radix_tree_next_slot() the current chunk size becomes
      zero.  This isn't checked, and it tries to dereference a null pointer in
      slot.
      
      The tagged iterator is fine because retry happens only at slot 0, where the
      tag bitmask in iter->tags is filled with a single bit.
      
      Fixes: 46437f9a ("radix-tree: fix race in gang lookup")
      Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: Jeremiah Mahler <jmmahler@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      73204282
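The failure mode in the radix-tree commit above (a retry leaves an empty chunk, and the next-slot step then touches a null slot pointer) can be illustrated with a toy chunk iterator. This is a sketch under stated assumptions: `toy_iter`, `toy_iter_retry()` and `toy_next_slot()` are hypothetical names, and the code is not the kernel's radix-tree implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy iterator over one chunk of slots. */
struct toy_iter {
    unsigned long index;      /* index of the current slot */
    unsigned long next_index; /* one past the last index in the chunk */
};

/* Number of slots left in the current chunk, as a signed value. */
static long toy_chunk_size(const struct toy_iter *it)
{
    return (long)(it->next_index - it->index);
}

/* Models a retry: the chunk becomes empty and the caller gets no slot. */
void **toy_iter_retry(struct toy_iter *it)
{
    it->next_index = it->index;
    return NULL;
}

/*
 * Advance to the next slot, or return NULL when the chunk is exhausted
 * or empty. The slot pointer is only used when another slot remains.
 */
void **toy_next_slot(void **slot, struct toy_iter *it)
{
    long remaining = toy_chunk_size(it);

    if (remaining <= 1)
        return NULL; /* also covers the empty chunk left by a retry */

    it->index++;
    return slot + 1;
}
```

The point of the sketch is the explicit check on the remaining count before the slot pointer is used, so a chunk size of zero ends the walk instead of leading to a dereference.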
    • Konstantin Khlebnikov's avatar
      mm: replace vma_lock_anon_vma with anon_vma_lock_read/write · 12352d3c
      Konstantin Khlebnikov authored
      The sequence vma_lock_anon_vma() - vma_unlock_anon_vma() isn't safe if
      anon_vma appeared between lock and unlock.  We have to check anon_vma
      first or call anon_vma_prepare() to be sure that it's there.  There are
      only a few users of these legacy helpers.  Let's get rid of them.
      
      This patch fixes anon_vma lock imbalance in validate_mm().  Write lock
      isn't required here, read lock is enough.
      
      It also reorders expand_downwards/expand_upwards: security_mmap_addr() and
      the wrap-around check don't have to be under the anon_vma lock.
      
      Link: https://lkml.kernel.org/r/CACT4Y+Y908EjM2z=706dv4rV6dWtxTLK9nFg9_7DhRMLppBo2g@mail.gmail.com
      Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      12352d3c
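The lock imbalance described in the anon_vma commit above arises when both the lock and the unlock step re-read a pointer that can change from NULL to non-NULL in between. One common way to avoid that class of bug is to sample the pointer once and release exactly what was taken. The sketch below shows that idea with hypothetical `toy_vma` and `toy_lock` types; it is not the approach taken verbatim by the patch, which removes the legacy helpers in favour of anon_vma_lock_read/write.

```c
#include <assert.h>
#include <stddef.h>

struct toy_lock { int depth; };            /* 0 means unlocked */
struct toy_vma { struct toy_lock *anon; }; /* NULL until one is attached */

static void toy_lock_acquire(struct toy_lock *l) { l->depth++; }
static void toy_lock_release(struct toy_lock *l) { l->depth--; }

/*
 * Runs a critical section during which 'late' gets attached to the vma.
 * Returns the final depth of 'late', which must stay balanced at 0.
 */
int toy_validate(struct toy_vma *vma, struct toy_lock *late)
{
    struct toy_lock *held = vma->anon; /* sample the pointer exactly once */

    if (held)
        toy_lock_acquire(held);

    vma->anon = late; /* models anon_vma appearing inside the section */

    if (held) /* release what was taken, not whatever vma->anon is now */
        toy_lock_release(held);

    return late->depth;
}
```

If the release step tested `vma->anon` instead of `held`, it would unlock `late` without ever having locked it, leaving its depth at -1.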
    • Vlastimil Babka's avatar
      mm, hugetlb: don't require CMA for runtime gigantic pages · 080fe206
      Vlastimil Babka authored
      Commit 944d9fec ("hugetlb: add support for gigantic page allocation
      at runtime") added runtime gigantic page allocation via
      alloc_contig_range(), making this support available only when CONFIG_CMA
      is enabled.  Because it doesn't depend on MIGRATE_CMA pageblocks and the
      associated infrastructure, it is possible with a few simple adjustments to
      require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA.
      
      After this patch, alloc_contig_range() and related functions are
      available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION
      enabled.  Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION.  This allows
      supporting runtime gigantic pages without the CMA-specific checks in
      page allocator fastpaths.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      080fe206
    • Nicholas Bellinger's avatar
      target: Fix remote-port TMR ABORT + se_cmd fabric stop · 0f4a9431
      Nicholas Bellinger authored
      To address the bug where fabric driver level shutdown
      of se_cmd occurs while TMR CMD_T_ABORTED is happening,
      resulting in a -1 ->cmd_kref, this patch adds a
      CMD_T_FABRIC_STOP bit that is used to determine
      when TMR + driver I_T nexus shutdown is happening
      concurrently.
      
      It changes target_sess_cmd_list_set_waiting() to obtain
      se_cmd->cmd_kref + set CMD_T_FABRIC_STOP, and drop local
      reference in target_wait_for_sess_cmds() and invoke extra
      target_put_sess_cmd() during Task Aborted Status (TAS)
      when necessary.
      
      Also, it adds a new target_wait_free_cmd() wrapper around
      transport_wait_for_tasks() for the special case within
      transport_generic_free_cmd() to set CMD_T_FABRIC_STOP,
      and is now aware of CMD_T_ABORTED + CMD_T_TAS status
      bits to know when an extra transport_put_cmd() during
      TAS is required.
      
      Note transport_generic_free_cmd() is expected to block on
      cmd->cmd_wait_comp in order to follow what iscsi-target
      expects during iscsi_conn context se_cmd shutdown.
      
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
      0f4a9431
  17. 04 Feb, 2016 2 commits