1. 30 Sep, 2016 2 commits
    • sctp: remove prsctp_param from sctp_chunk · 0605483f
      Xin Long authored
      Now sctp uses chunk->prsctp_param to save the prsctp param for all the
      prsctp policies, but we don't need to introduce prsctp_param to sctp_chunk.
      We can just use chunk->sinfo.sinfo_timetolive for the RTX and BUF policies,
      and reuse msg->expires_at for the TTL policy, as the prsctp policies and
      the old expires policy are mutually exclusive.
      
      This patch removes prsctp_param from sctp_chunk, and reuses msg's
      expires_at for TTL and chunk's sinfo.sinfo_timetolive for the RTX and BUF
      policies.
      
      Note that sctp can't use chunk's sinfo.sinfo_timetolive for the TTL policy,
      as it needs a u64 variable to save the expires_at time.
      
      This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
      issue.
      
      Fixes: a6c2f792 ("sctp: implement prsctp TTL policy")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0605483f
    • sctp: move sent_count to the memory hole in sctp_chunk · 73dca124
      Xin Long authored
      Running pahole on sctp_chunk now shows 2 memory holes:
         struct sctp_chunk {
      	struct list_head           list;
      	atomic_t                   refcnt;
      	/* XXX 4 bytes hole, try to pack */
      	...
      	long unsigned int          prsctp_param;
      	int                        sent_count;
      	/* XXX 4 bytes hole, try to pack */
      
      This patch moves sent_count up to fill the 1st hole, which also
      eliminates the 2nd one.
      
      It's not just another struct compaction, it also fixes the "netperf-
      Throughput_Mbps -37.2% regression" issue when overloading the CPU.
      
      Fixes: a6c2f792 ("sctp: implement prsctp TTL policy")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73dca124
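The effect of moving a 4-byte member into a hole can be reproduced outside the kernel. The following is a minimal sketch, not the real sctp_chunk: the struct names, the `payload` member and `bytes_saved()` are hypothetical, and the padding described in the comments assumes a typical LP64 target (8-byte pointers and longs). On other ABIs the two layouts may be the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct list_head: two pointers. */
struct list_node {
	struct list_node *next, *prev;
};

/* "Before" layout: a 4-byte member followed by an 8-byte-aligned member
 * leaves padding on typical LP64 targets. */
struct chunk_before {
	struct list_node list;
	int refcnt;            /* padding usually follows on LP64 */
	void *payload;         /* hypothetical pointer-sized member */
	unsigned long param;
	int sent_count;        /* tail padding usually follows on LP64 */
};

/* Same members, with sent_count moved up next to refcnt. */
struct chunk_after {
	struct list_node list;
	int refcnt;
	int sent_count;        /* occupies the gap after refcnt */
	void *payload;
	unsigned long param;
};

/* Bytes saved by the reordering on the current target (may be 0). */
static size_t bytes_saved(void)
{
	assert(sizeof(struct chunk_after) <= sizeof(struct chunk_before));
	return sizeof(struct chunk_before) - sizeof(struct chunk_after);
}
```

Printing `bytes_saved()` or comparing the two `sizeof` values on the target of interest shows whether the reordering matters there; pahole reports the same information for the real structure.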
  2. 21 Sep, 2016 1 commit
    • vti6: fix input path · 63c43787
      Nicolas Dichtel authored
      Since commit 1625f452, vti6 is broken, all input packets are dropped
      (LINUX_MIB_XFRMINNOSTATES is incremented).
      
      XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
      xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in
      xfrm6_rcv_spi().
      
      A new function, xfrm6_rcv_tnl(), is added so that a value can be passed
      to xfrm6_rcv_spi() without touching xfrm6_rcv() (this function is used
      in several handlers).
      
      CC: Alexey Kodanev <alexey.kodanev@oracle.com>
      Fixes: 1625f452 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      63c43787
  3. 17 Sep, 2016 2 commits
  4. 13 Sep, 2016 1 commit
  5. 06 Sep, 2016 1 commit
  6. 04 Sep, 2016 1 commit
  7. 30 Aug, 2016 1 commit
  8. 25 Aug, 2016 2 commits
    • netfilter: nft_meta: improve the validity check of pkttype set expr · 960fa72f
      Liping Zhang authored
      "meta pkttype set" is only supported on prerouting chain with bridge
      family and ingress chain with netdev family.
      
      But the validation check is incomplete, and the user can still add such
      rules to the input chain of the bridge family through a jump, for example:
        # nft add table bridge filter
        # nft add chain bridge filter input {type filter hook input \
          priority 0 \;}
        # nft add chain bridge filter test
        # nft add rule bridge filter test meta pkttype set unicast
        # nft add rule bridge filter input jump test
      
      This patch fixes the problem.
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      960fa72f
    • netfilter: nft_reject: restrict to INPUT/FORWARD/OUTPUT · 89e1f6d2
      Liping Zhang authored
      After I added the nft rule "nft add rule filter prerouting reject
      with tcp reset", a kernel panic happened on my system:
        NULL pointer dereference at ...
        IP: [<ffffffff81b9db2f>] nf_send_reset+0xaf/0x400
        Call Trace:
        [<ffffffff81b9da80>] ? nf_reject_ip_tcphdr_get+0x160/0x160
        [<ffffffffa0928061>] nft_reject_ipv4_eval+0x61/0xb0 [nft_reject_ipv4]
        [<ffffffffa08e836a>] nft_do_chain+0x1fa/0x890 [nf_tables]
        [<ffffffffa08e8170>] ? __nft_trace_packet+0x170/0x170 [nf_tables]
        [<ffffffffa06e0900>] ? nf_ct_invert_tuple+0xb0/0xc0 [nf_conntrack]
        [<ffffffffa07224d4>] ? nf_nat_setup_info+0x5d4/0x650 [nf_nat]
        [...]
      
      In the PREROUTING chain, routing information does not exist yet,
      so we dereference a NULL pointer and the oops happens.
      
      So we restrict reject expression to INPUT, FORWARD and OUTPUT chain.
      This is consistent with iptables REJECT target.
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      89e1f6d2
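A restriction of this kind is commonly expressed as a bitmask of permitted hooks. The sketch below is an illustration under assumptions, not the nf_tables code: the `enum hook` values only mirror the usual order of the netfilter hooks, and `validate_reject_hook()` is a hypothetical helper that checks a single hook number rather than walking jump targets.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins that mirror the order of the netfilter hooks. */
enum hook {
	HOOK_PRE_ROUTING,
	HOOK_LOCAL_IN,
	HOOK_FORWARD,
	HOOK_LOCAL_OUT,
	HOOK_POST_ROUTING,
	HOOK_MAX
};

/* Hooks on which a reject-style expression is allowed. */
#define REJECT_HOOK_MASK ((1u << HOOK_LOCAL_IN) | \
			  (1u << HOOK_FORWARD)  | \
			  (1u << HOOK_LOCAL_OUT))

/* Returns 0 if the hook is permitted, -EOPNOTSUPP otherwise. */
static int validate_reject_hook(enum hook hooknum)
{
	assert(hooknum < HOOK_MAX);
	if (REJECT_HOOK_MASK & (1u << hooknum))
		return 0;
	return -EOPNOTSUPP;
}
```

With this mask, `validate_reject_hook(HOOK_PRE_ROUTING)` fails at rule-insertion time, which is the point of the fix: the rule is refused up front instead of crashing when a packet hits it.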
  9. 19 Aug, 2016 1 commit
    • tcp: fix use after free in tcp_xmit_retransmit_queue() · bb1fceca
      Eric Dumazet authored
      When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
      tail of the write queue using tcp_add_write_queue_tail().
      
      Then it attempts to copy user data into this fresh skb.
      
      If the copy fails, we undo the work and remove the fresh skb.
      
      Unfortunately, this undo does not revert the change made to
      tp->highest_sack, and we can leave a dangling pointer (to a freed skb).
      
      Later, tcp_xmit_retransmit_queue() can dereference this pointer and
      access freed memory. For regular kernels where memory is not unmapped,
      this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
      returning garbage instead of tp->snd_nxt, but with various debug
      features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
      
      This bug was found by Marco Grassi thanks to syzkaller.
      
      Fixes: 6859d494 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
      Reported-by: Marco Grassi <marco.gra@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb1fceca
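The bug class described here, a cached pointer that outlives the element it points to, can be shown with a small queue. This is a simplified sketch and not the TCP write queue: `struct queue`, its `cached` member (playing the role of tp->highest_sack), `queue_add_tail()` and `queue_undo_add()` are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *prev, *next;
};

struct queue {
	struct node *head, *tail;
	struct node *cached;   /* plays the role of tp->highest_sack */
};

/* Append a fresh node; if the cache is empty, point it at the new node. */
static struct node *queue_add_tail(struct queue *q)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->prev = q->tail;
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	if (!q->cached)
		q->cached = n;
	return n;
}

/* Undo queue_add_tail() for the node that is currently at the tail. */
static void queue_undo_add(struct queue *q, struct node *n)
{
	assert(n == q->tail);
	q->tail = n->prev;
	if (q->tail)
		q->tail->next = NULL;
	else
		q->head = NULL;
	/* The step the buggy undo path lacked: drop the cached pointer
	 * before the node is freed, so it cannot dangle. */
	if (q->cached == n)
		q->cached = NULL;
	free(n);
}
```

Without the `q->cached == n` check, a later reader of `cached` would touch freed memory, which may go unnoticed on a regular build and crash only under memory debugging, as the commit message notes.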
  10. 17 Aug, 2016 3 commits
  11. 15 Aug, 2016 1 commit
  12. 13 Aug, 2016 1 commit
  13. 05 Aug, 2016 2 commits
    • rxrpc: Fix races between skb free, ACK generation and replying · 372ee163
      David Howells authored
      Inside the kafs filesystem it is possible to occasionally have a call
      processed and terminated before we've had a chance to check whether we need
      to clean up the rx queue for that call because afs_send_simple_reply() ends
      the call when it is done, but this is done in a workqueue item that might
      happen to run to completion before afs_deliver_to_call() completes.
      
      Further, it is possible for rxrpc_kernel_send_data() to be called to send a
      reply before the last request-phase data skb is released.  The rxrpc skb
      destructor is where the ACK processing is done and the call state is
      advanced upon release of the last skb.  ACK generation is also deferred to
      a work item because it's possible that the skb destructor is not called in
      a context where kernel_sendmsg() can be invoked.
      
      To this end, the following changes are made:
      
       (1) kernel_rxrpc_data_consumed() is added.  This should be called whenever
           an skb is emptied so as to crank the ACK and call states.  This does
           not release the skb, however.  kernel_rxrpc_free_skb() must now be
           called to achieve that.  These together replace
           rxrpc_kernel_data_delivered().
      
       (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().
      
           This makes afs_deliver_to_call() easier to work with, as the skb can simply
           be discarded unconditionally here without trying to work out what the
           return value of the ->deliver() function means.
      
           The ->deliver() functions can, via afs_data_complete(),
           afs_transfer_reply() and afs_extract_data() mark that an skb has been
           consumed (thereby cranking the state) without the need to
           conditionally free the skb to make sure the state is correct on an
           incoming call for when the call processor tries to send the reply.
      
       (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
           has finished with a packet and MSG_PEEK isn't set.
      
       (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().
      
           Because of this, we no longer need to clear the destructor and put the
           call before we free the skb in cases where we don't want the ACK/call
           state to be cranked.
      
       (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
           than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
           the delivery function already), and the caller is now responsible for
           producing an abort if that was the last packet.
      
       (6) There are many bits of unmarshalling code where:
      
       		ret = afs_extract_data(call, skb, last, ...);
      		switch (ret) {
      		case 0:		break;
      		case -EAGAIN:	return 0;
      		default:	return ret;
      		}
      
           is to be found.  As -EAGAIN can now be passed back to the caller, we
           now just return if ret < 0:
      
       		ret = afs_extract_data(call, skb, last, ...);
      		if (ret < 0)
      			return ret;
      
       (7) Checks for trailing data and empty final data packets have been
           consolidated as afs_data_complete().  So:
      
      		if (skb->len > 0)
      			return -EBADMSG;
      		if (!last)
      			return 0;
      
           becomes:
      
      		ret = afs_data_complete(call, skb, last);
      		if (ret < 0)
      			return ret;
      
       (8) afs_transfer_reply() now checks the amount of data it has against the
           amount of data desired and the amount of data in the skb and returns
           an error to induce an abort if we don't get exactly what we want.
      
      Without these changes, the following oops can occasionally be observed,
      particularly if some printks are inserted into the delivery path:
      
      general protection fault: 0000 [#1] SMP
      Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
      CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G            E   4.7.0-fsdevel+ #1303
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Workqueue: kafsd afs_async_workfn [kafs]
      task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
      RIP: 0010:[<ffffffff8108fd3c>]  [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
      RSP: 0018:ffff88040c073bc0  EFLAGS: 00010002
      RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
      RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
      FS:  0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
      Stack:
       0000000000000006 000000000be04930 0000000000000000 ffff880400000000
       ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
       ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
      Call Trace:
       [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
       [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
       [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
       [<ffffffff810915f4>] lock_acquire+0x122/0x1b6
       [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
       [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
       [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
       [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
       [<ffffffff814c928f>] skb_dequeue+0x18/0x61
       [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
       [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
       [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
       [<ffffffff81063a3a>] process_one_work+0x29d/0x57c
       [<ffffffff81064ac2>] worker_thread+0x24a/0x385
       [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
       [<ffffffff810696f5>] kthread+0xf3/0xfb
       [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
       [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      372ee163
    • mac80211: Add ieee80211_hw pointer to get_expected_throughput · 2439ca04
      Maxim Altshul authored
      The variable is added to give the driver easy access to
      its own hw->priv when the op is invoked.
      
      This fixes a crash in wlcore because it was relying on a
      station pointer that wasn't initialized yet. It's the wrong
      way to fix the crash, but it solves the problem for now and
      it does make sense to have the hw pointer here.
      Signed-off-by: Maxim Altshul <maxim.altshul@ti.com>
      [rewrite commit message, fix indentation]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      2439ca04
  14. 02 Aug, 2016 1 commit
  15. 01 Aug, 2016 4 commits
  16. 25 Jul, 2016 6 commits
  17. 22 Jul, 2016 2 commits
  18. 21 Jul, 2016 1 commit
  19. 20 Jul, 2016 1 commit
  20. 19 Jul, 2016 5 commits
    • net/ncsi: Package and channel management · e6f44ed6
      Gavin Shan authored
      This manages NCSI packages and channels:
      
       * The available packages and channels are enumerated the first time
         ncsi_start_dev() is called. The channels' capabilities are probed
         at the same time. The NCSI network topology won't change until the
         NCSI device is destroyed.
       * There is a queue in every NCSI device. Each element in the queue,
         a channel, is waiting for configuration (bringup) or suspending
         (teardown). The channel's state (inactive/active) indicates which
         further action (configuration or suspending) will be applied to the
         channel. Another channel state (invisible) means the requested
         action is being applied.
       * Hardware arbitration is enabled if all available packages
         and channels support it. All available channels try to provide
         service when hardware arbitration is enabled. Otherwise, only one
         channel is selected as the active one at a time.
       * When a channel is in the active state, meaning it's providing service,
         a timer is started to retrieve the channel's link status. If the
         channel's link status fails to be updated within the determined period,
         the channel is reconfigured. This is the error handling
         defined in the NCSI spec.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Acked-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6f44ed6
    • net/ncsi: Resource management · 2d283bdd
      Gavin Shan authored
      NCSI spec (DSP0222) defines several objects: package, channel, mode,
      filter, version and statistics etc. This introduces the data structs
      to represent those objects and implement functions to manage them.
      Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI
      stack.
      
         * The user (e.g. a netdev driver) references an NCSI device through
           "struct ncsi_dev", which is embedded in "struct ncsi_dev_priv".
           The latter is used by the NCSI stack internally.
         * Every NCSI device can have multiple packages simultaneously, up
           to 8 packages. A package is represented by "struct ncsi_package"
           and identified by a 3-bit ID.
         * Every NCSI package can have multiple channels, up to 32. A channel
           is represented by "struct ncsi_channel" and identified by a 5-bit ID.
         * Every NCSI channel has version, statistics, various modes and
           filters. They are represented by "struct ncsi_channel_version",
           "struct ncsi_channel_stats", "struct ncsi_channel_mode" and
           "struct ncsi_channel_filter" respectively.
         * Apart from AEN (Asynchronous Event Notification), the NCSI stack
           works in terms of command and response. This introduces "struct
           ncsi_req" to represent a complete NCSI transaction made of NCSI
           request and response.
      
      link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Acked-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d283bdd
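The ID widths quoted above (3 bits for a package, 5 bits for a channel) add up to one byte, so a combined identifier can be packed and unpacked with shifts and masks. The sketch below assumes the package ID sits in the upper three bits of that byte; the commit message gives only the widths, and all macro and function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PKG_BITS   3u
#define CHAN_BITS  5u
#define PKG_MAX    (1u << PKG_BITS)    /* up to 8 packages per device */
#define CHAN_MAX   (1u << CHAN_BITS)   /* up to 32 channels per package */

/* Pack a package index and a channel index into a single byte. */
static uint8_t make_channel_id(unsigned int pkg, unsigned int chan)
{
	assert(pkg < PKG_MAX && chan < CHAN_MAX);
	return (uint8_t)((pkg << CHAN_BITS) | chan);
}

/* Extract the package index (upper 3 bits in this sketch). */
static unsigned int package_index(uint8_t id)
{
	return id >> CHAN_BITS;
}

/* Extract the channel index (lower 5 bits). */
static unsigned int channel_index(uint8_t id)
{
	return id & (CHAN_MAX - 1u);
}
```

For example, `make_channel_id(5, 3)` yields 0xA3 under this layout, and `package_index()` and `channel_index()` recover 5 and 3 from it.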
    • net: dsa: support switchdev ageing time attr · 34a79f63
      Vivien Didelot authored
      Add a new function for DSA drivers to handle the switchdev
      SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.
      
      The ageing time is passed as milliseconds.
      
      Also, because we can have multiple logical bridges on top of a physical
      switch and the ageing time is switch-wide, call the driver function with
      the fastest ageing time in use on the chip instead of the requested one.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34a79f63
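Picking the "fastest ageing time in use" amounts to taking a minimum over the values already configured, seeded with the newly requested one. The sketch below illustrates that selection only; `fastest_ageing_time()` and its array parameter are hypothetical, and treating 0 as "not configured" is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the smallest ageing time, in milliseconds, among the requested
 * value and the values already in use on the switch. Entries equal to 0
 * are treated as "not configured" and ignored (an assumption here).
 */
static unsigned int fastest_ageing_time(const unsigned int *in_use_ms,
					size_t n,
					unsigned int requested_ms)
{
	unsigned int fastest = requested_ms;
	size_t i;

	assert(n == 0 || in_use_ms != NULL);
	for (i = 0; i < n; i++) {
		if (in_use_ms[i] && in_use_ms[i] < fastest)
			fastest = in_use_ms[i];
	}
	return fastest;
}
```

With bridges at 300000 ms and 10000 ms, a request for 20000 ms would program 10000 ms into the chip, so that no bridge keeps entries longer than it asked for.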
    • net: switchdev: change ageing_time type to clock_t · eabfdda9
      Vivien Didelot authored
      The switchdev value for the SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME
      attribute is a clock_t, so helpers such as clock_t_to_jiffies() are
      required when converting it to milliseconds.
      
      Change ageing_time type from u32 to clock_t to make it explicit.
      
      Fixes: f55ac58a ("switchdev: add bridge ageing_time attribute")
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eabfdda9
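The unit mismatch behind this change is easy to see with plain arithmetic. The sketch below does not use the kernel helpers: it assumes a clock_t tick rate of 100 per second (the `SKETCH_USER_HZ` constant is an assumption of this example), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed clock_t tick rate, in ticks per second. */
#define SKETCH_USER_HZ 100u

/* Convert clock_t-style ticks to milliseconds. */
static uint64_t clock_ticks_to_msecs(uint64_t ticks)
{
	assert(ticks <= UINT64_MAX / 1000u);   /* guard against overflow */
	return ticks * 1000u / SKETCH_USER_HZ;
}

/* Convert milliseconds back to clock_t-style ticks (rounds down). */
static uint64_t msecs_to_clock_ticks(uint64_t msecs)
{
	assert(msecs <= UINT64_MAX / SKETCH_USER_HZ);
	return msecs * SKETCH_USER_HZ / 1000u;
}
```

Under that assumption, an ageing time of 300 seconds is 30000 ticks or 300000 ms; reading the tick value as if it were milliseconds would be off by a factor of ten, which is why making the type explicit helps.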
    • net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags · 359ebda2
      Shmulik Ladkani authored
      This flag indicates whether fragmentation of segments is allowed.
      
      Formerly this policy was hardcoded according to IPSKB_FORWARDED (set by
      either ip_forward or ipmr_forward).
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      359ebda2
  21. 18 Jul, 2016 1 commit