1. 12 May, 2016 2 commits
  2. 27 Apr, 2016 1 commit
  3. 26 Apr, 2016 3 commits
  4. 25 Apr, 2016 17 commits
  5. 24 Apr, 2016 1 commit
    • Eric Dumazet
      tcp-tso: do not split TSO packets at retransmit time · 10d3be56
      Eric Dumazet authored
      The Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
      
      This was fine back in the days when TSO/GSO were emerging, with their
      bugs, but we believe the dark age is over.
      
      Keeping packets big, both in the write queues and during stack
      traversal, has many benefits:
       - Less memory overhead, because write queues hold fewer skbs.
       - Less CPU overhead at ACK processing.
       - Better SACK processing, as many studies have noted how
         awful Linux was at this ;)
       - Less CPU overhead to send the retransmitted packets
         (IP stack traversal, netfilter traversal, drivers...)
       - Better latencies in the presence of losses.
       - Smaller spikes in fq-like packet schedulers, as retransmits
         are not constrained by TCP Small Queues.
      
      1% packet loss is common today, and at 100Gbit speeds this translates
      to ~80,000 losses per second.
      Losses are often correlated, and we see many retransmit events leading
      to trains of 1-MSS packets, at a time when hosts are already under
      stress.
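
      A quick sanity check of the "~80,000 losses per second" figure, as a
      standalone sketch (it assumes ~1500-byte packets, which is not stated
      in the patch):
      ~~~~~
      #include <stdio.h>

      int main(void)
      {
              double link_bps  = 100e9;        /* 100Gbit/s               */
              double pkt_bits  = 1500 * 8.0;   /* assumed ~MTU-sized pkts */
              double loss_rate = 0.01;         /* 1% packet loss          */
              double pps       = link_bps / pkt_bits;

              printf("packets per second: %.0f\n", pps);             /* ~8.3M */
              printf("losses per second : %.0f\n", pps * loss_rate); /* ~83K  */
              return 0;
      }
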
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      10d3be56
  6. 23 Apr, 2016 7 commits
  7. 21 Apr, 2016 9 commits
    • Hannes Frederic Sowa
      geneve: break dependency with netdev drivers · 681e683f
      Hannes Frederic Sowa authored
      Equivalent to "vxlan: break dependency with netdev drivers": don't
      autoload the geneve module when a driver is loaded. Instead, make the
      coupling weaker by using netdevice notifiers as a proxy.
      
      Cc: Jesse Gross <jesse@kernel.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      681e683f
    • Hannes Frederic Sowa
      vxlan: break dependency with netdev drivers · b7aade15
      Hannes Frederic Sowa authored
      Currently all drivers depend on, and autoload, the vxlan module because
      of how vxlan_get_rx_port is linked into them. Remove this dependency:
      
      By using a new event type in the netdevice notifier call chain, we proxy
      the drivers' request to flush and re-setup the vxlan ports not through a
      direct function call but through the already existing netdevice notifier
      call chain.
      
      I added a separate new event type, NETDEV_OFFLOAD_PUSH_VXLAN, to do so.
      We don't need to save those ids, as the event type field is an unsigned
      long, and using specialized event types for this purpose seemed the more
      elegant way. This will also be beneficial if, in the future, we want to
      add offloading knobs for vxlan.
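
      A rough sketch of the mechanism (simplified; the helper names and the
      exact split between driver and vxlan code are approximations, not the
      patch itself):
      ~~~~~
      #include <linux/netdevice.h>
      #include <linux/notifier.h>

      /* Driver side: instead of calling into the vxlan module directly
       * (which creates a symbol dependency and autoloads vxlan), request a
       * port (re)push through the netdevice notifier chain. */
      static inline void request_vxlan_push(struct net_device *dev)
      {
              call_netdevice_notifiers(NETDEV_OFFLOAD_PUSH_VXLAN, dev);
      }

      /* vxlan side: if the module is loaded, its existing netdevice notifier
       * sees the event and re-adds the known UDP ports to that device. */
      static int vxlan_netdev_event(struct notifier_block *nb,
                                    unsigned long event, void *ptr)
      {
              struct net_device *dev = netdev_notifier_info_to_dev(ptr);

              if (event == NETDEV_OFFLOAD_PUSH_VXLAN)
                      vxlan_push_rx_ports(dev); /* assumed helper that walks
                                                 * the vxlan port list       */
              return NOTIFY_DONE;
      }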
      
      Cc: Jesse Gross <jesse@kernel.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7aade15
    • Tariq Toukan
      net/mlx5e: Support RX multi-packet WQE (Striding RQ) · 461017cb
      Tariq Toukan authored
      Introduce the multi-packet WQE (RX Work Queue Element) feature,
      referred to as MPWQE or Striding RQ, in which WQEs are larger and
      each serves multiple packets.
      
      Every WQE consists of many strides of the same size; every received
      packet is aligned to the beginning of a stride and is written to
      consecutive strides within a WQE.
      
      In the regular approach, each WQE is big enough to serve one received
      packet of any size up to the MTU, or 64K when device LRO is enabled,
      which is very wasteful when dealing with small packets or when device
      LRO is enabled.
      
      Thanks to its flexibility, MPWQE allows better memory utilization
      (implying improvements in CPU utilization and packet rate), as packets
      consume strides according to their size, leaving the rest of the WQE
      available for other packets.
      
      MPWQE default configuration:
      	Num of WQEs	= 16
      	Strides Per WQE = 2048
      	Stride Size	= 64 byte
      
      The default WQE memory footprint went from 1024*MTU (~1.5MB) to
      16 * 2048 * 64 = 2MB per ring.
      However, HW LRO can now be supported at no additional cost in memory
      footprint, so we turn it on by default and get even better
      performance.
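
      The sizing math above, as a small standalone program (default values
      only; an illustration, not driver code):
      ~~~~~
      #include <stdio.h>

      int main(void)
      {
              unsigned int num_wqes        = 16;
              unsigned int strides_per_wqe = 2048;
              unsigned int stride_size     = 64;    /* bytes */
              unsigned int mtu             = 1500;

              /* Ring memory footprint: 16 * 2048 * 64 = 2MB. */
              unsigned long ring_bytes = (unsigned long)num_wqes *
                                         strides_per_wqe * stride_size;

              /* Packets consume only as many strides as they need. */
              unsigned int strides_small = (64  + stride_size - 1) / stride_size;
              unsigned int strides_mtu   = (mtu + stride_size - 1) / stride_size;

              printf("ring footprint  : %lu bytes\n", ring_bytes);
              printf("64B packets/ring: %u\n",
                     num_wqes * strides_per_wqe / strides_small);  /* 32768 */
              printf("MTU packets/ring: %u\n",
                     num_wqes * strides_per_wqe / strides_mtu);    /* 1365  */
              return 0;
      }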
      
      Performance was tested on ConnectX4-Lx 50G.
      To isolate the feature under test, the numbers below were measured with
      HW LRO turned off. We verified that performance only improves when
      LRO is turned back on.
      
      * Netperf single TCP stream:
      - BW raised by 10-15% for representative packet sizes:
        default, 64B, 1024B, 1478B, 65536B.
      
      * Netperf multi TCP stream:
      - No degradation, line rate reached.
      
      * Pktgen: packet rate raised by 2-10% for traffic of different message
      sizes: 64B, 128B, 256B, 1024B, and 1500B.
      
      * Pktgen: packet loss in bursts of small messages (64 bytes),
      single stream:
        | num packets | packet loss before | packet loss after |
        |     2K      |        ~1K         |         0         |
        |     8K      |        ~6K         |         0         |
        |     16K     |       ~13K         |         0         |
        |     32K     |       ~28K         |         0         |
        |     64K     |       ~57K         |       ~24K        |
      
      This is expected, as the driver can now receive as many small packets
      (<=64B) as the total number of strides in the ring (default = 2048 * 16),
      vs. 1024 (the default ring size, regardless of packet size) before this
      feature.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      461017cb
    • Tariq Toukan
      net/mlx5: Introduce device queue counters · 237cd218
      Tariq Toukan authored
      A queue counter can collect several statistics for one or more
      hardware queues (QPs, RQs, etc.) that the counter is attached to.
      
      For Ethernet it will provide an "out of buffer" counter, which
      counts all packets dropped due to a lack of software buffers.
      
      Here we add device commands to alloc/query/dealloc queue counters.
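
      A rough usage sketch of the new commands (the function names and
      signatures below are assumptions based on this description, not quoted
      from the patch):
      ~~~~~
      #include <linux/mlx5/driver.h>

      static int q_counter_example(struct mlx5_core_dev *mdev)
      {
              u16 counter_id;
              int err;

              /* Allocate a queue counter; its id is later passed when
               * creating RQs so that their drops are accounted to it. */
              err = mlx5_core_alloc_q_counter(mdev, &counter_id);   /* assumed name */
              if (err)
                      return err;

              /* Query it periodically; for Ethernet the field of interest is
               * "out of buffer": packets dropped for lack of SW buffers.
               * (Query/parse details omitted.) */

              mlx5_core_dealloc_q_counter(mdev, counter_id);        /* assumed name */
              return 0;
      }
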
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Rana Shahout <ranas@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      237cd218
    • Daniel Jurgens
      net/mlx4_core: Avoid repeated calls to pci enable/disable · 4bfd2e6e
      Daniel Jurgens authored
      Maintain the PCI status and provide wrappers for enabling and disabling
      the PCI device. Performing either action more than once, without its
      opposite in between, results in warning logs.
      
      This occurred when EEH hotplugged the device, causing a warning for
      disabling an already disabled device.
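
      A simplified sketch of the idea (the struct and wrapper names are
      illustrative, not the mlx4 code itself): remember the enable state and
      make the wrappers idempotent, so EEH-driven re-entry cannot trip the
      PCI core's warnings.
      ~~~~~
      #include <linux/pci.h>
      #include <linux/mutex.h>

      struct pci_state {
              struct pci_dev *pdev;
              struct mutex    lock;
              bool            enabled;
      };

      static int state_pci_enable(struct pci_state *st)
      {
              int err = 0;

              mutex_lock(&st->lock);
              if (!st->enabled) {                    /* skip repeated enables */
                      err = pci_enable_device(st->pdev);
                      if (!err)
                              st->enabled = true;
              }
              mutex_unlock(&st->lock);
              return err;
      }

      static void state_pci_disable(struct pci_state *st)
      {
              mutex_lock(&st->lock);
              if (st->enabled) {                     /* avoid disabling an      */
                      pci_disable_device(st->pdev);  /* already-disabled device */
                      st->enabled = false;
              }
              mutex_unlock(&st->lock);
      }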
      
      Fixes: 2ba5fbd6 ('net/mlx4_core: Handle AER flow properly')
      Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4bfd2e6e
    • Martin KaFai Lau
      tcp: Merge tx_flags and tskey in tcp_shifted_skb · cfea5a68
      Martin KaFai Lau authored
      After receiving SACKs, tcp_shifted_skb() will collapse skbs if
      possible. tx_flags and tskey also have to be merged.
      
      This patch reuses tcp_skb_collapse_tstamp() to handle them.
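
      A simplified sketch of what the merge has to do when next_skb is folded
      into skb (not the helper's exact body): keep any pending tx timestamp
      request alive on the surviving skb, so the completion is still reported.
      ~~~~~
      #include <linux/skbuff.h>

      static void collapse_tstamp_sketch(struct sk_buff *skb,
                                         const struct sk_buff *next_skb)
      {
              struct skb_shared_info *shinfo = skb_shinfo(skb);
              const struct skb_shared_info *next = skb_shinfo(next_skb);

              if (next->tx_flags & SKBTX_ANY_TSTAMP) {
                      /* merge tx_flags and carry over the tskey that will
                       * later be reported as ee_data on the error queue */
                      shinfo->tx_flags |= next->tx_flags & SKBTX_ANY_TSTAMP;
                      shinfo->tskey = next->tskey;
              }
      }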
      
      BPF Output Before:
      ~~~~~
      <no-output-due-to-missing-tstamp-event>
      
      BPF Output After:
      ~~~~~
      <...>-2024  [007] d.s.    88.644374: : ee_data:14599
      
      Packetdrill Script:
      ~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 1460) = 1460
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 13140) = 13140
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      0.400 close(4) = 0
      0.400 > F. 14601:14601(0) ack 1
      0.500 < F. 1:1(0) ack 14602 win 257
      0.500 > . 14602:14602(0) ack 2
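
      For reference, a standalone decode of the raw numbers in the script's
      setsockopt() call (option 37 is SO_TIMESTAMPING; the byte counts below
      come from the two writes in the script):
      ~~~~~
      #include <stdio.h>
      #include <sys/socket.h>
      #include <linux/net_tstamp.h>

      int main(void)
      {
              unsigned int flags = SOF_TIMESTAMPING_TX_ACK      /* tstamp when ACKed      */
                                 | SOF_TIMESTAMPING_OPT_ID      /* ee_data carries an id  */
                                 | SOF_TIMESTAMPING_OPT_TSONLY; /* no payload on errqueue */

              printf("SO_TIMESTAMPING = %d, flags = %u\n", SO_TIMESTAMPING, flags);
              /* Prints 37 and 2688.  The ee_data:14599 above matches
               * 1460 + 13140 - 1, the last payload byte counted from 0. */
              return 0;
      }
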
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfea5a68
    • Nicolas Dichtel
      a9a08042
    • Nicolas Dichtel
    • Alexander Duyck
      netdev_features: Fold NETIF_F_ALL_TSO into NETIF_F_GSO_SOFTWARE · b1c20f0b
      Alexander Duyck authored
      This patch folds NETIF_F_ALL_TSO into the bitmask for NETIF_F_GSO_SOFTWARE.
      The idea is to avoid duplication of defines since the only difference
      between the two was the GSO_UDP bit.
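
      Illustrative shape of the fold (not the exact netdev_features.h hunk;
      the pre-existing macro contents are assumed):
      ~~~~~
      /* Before, NETIF_F_GSO_SOFTWARE spelled out each TSO bit plus the UDP
       * (UFO) bit; after the fold it reuses the TSO mask directly: */
      #define NETIF_F_GSO_SOFTWARE    (NETIF_F_ALL_TSO | NETIF_F_UFO)
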
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b1c20f0b