1. 25 Apr, 2016 24 commits
  2. 24 Apr, 2016 10 commits
    • hv_netvsc: Fix the list processing for network change event · 15cfd407
      Haiyang Zhang authored
      RNDIS_STATUS_NETWORK_CHANGE event is handled as two "half events" --
      media disconnect & connect. The second half should be added to the list
      head, not to the tail. So all events are processed in normal order.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
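The ordering argument in the hv_netvsc fix can be illustrated with a small queue model. This is only a sketch: the event names and the `process` helper are hypothetical, and a Python `deque` stands in for the driver's event list, assuming the worker always takes the next event from the head.

```python
from collections import deque

def process(events, requeue_at_head):
    """Drain a queue in which NETWORK_CHANGE expands into two half events."""
    queue = deque(events)
    handled = []
    while queue:
        event = queue.popleft()              # the worker takes events from the head
        if event == "NETWORK_CHANGE":
            handled.append("DISCONNECT")     # first half is handled immediately
            if requeue_at_head:
                queue.appendleft("CONNECT")  # second half runs next
            else:
                queue.append("CONNECT")      # second half waits behind later events
        else:
            handled.append(event)
    return handled

events = ["NETWORK_CHANGE", "OTHER_EVENT"]
print(process(events, requeue_at_head=False))
print(process(events, requeue_at_head=True))
```

With the second half queued at the tail, an unrelated event slips in between the disconnect and the connect; queued at the head, the two halves stay adjacent.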
    • ixgbe: make 'action' field in struct ixgbe_fdir_filter a u64 value · 2a9ed5d1
      Sridhar Samudrala authored
      This field is used to record the RX queue index for a redirect action
      passed via ring_cookie field in struct ethtool_rx_flow_spec which is
      a u64 value.
      
      For example, after adding a filter rule to redirect to a VF using ethtool
        # echo 4 > /sys/class/net/p4p1/device/sriov_numvfs
        # ethtool -N p4p1 flow-type ip4 src-ip 192.168.0.1 action 0x100000000
      
      querying for the rule shows the Action as 'Direct to queue 0'
      
        # ethtool -n p4p1
        4 RX rings available
        Total 1 rules
      
        Filter: 2045
       	Rule Type: Raw IPv4
      	Src IP addr: 192.168.0.1 mask: 0.0.0.0
      	Dest IP addr: 0.0.0.0 mask: 255.255.255.255
      	TOS: 0x0 mask: 0xff
      	Protocol: 0 mask: 0xff
      	L4 bytes: 0x0 mask: 0xffffffff
      	VLAN EtherType: 0x0 mask: 0xffff
      	VLAN: 0x0 mask: 0xffff
      	User-defined: 0x0 mask: 0xffffffffffffffff
      	Action: Direct to queue 0
      
      With this fix, ethtool will report the right queue index even for VFs.
      	Action: Direct to queue 4294967296
      
      Here 4294967296 corresponds to 0x100000000.
      'ethtool' still needs to be updated to report the queue index as a hex
      value, so that the output is more user-friendly and matches the 'action'
      value that is passed when adding the rule.
      Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
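The effect described in the ixgbe 'action' commit can be reproduced with plain integer arithmetic. The sketch below makes two assumptions that the commit message does not state: the 32-bit width used for the "narrower field" case is purely illustrative, and the mask constants follow ethtool's ring_cookie convention (queue index in the low 32 bits, VF in the next 8 bits). `truncate` and `decode_ring_cookie` are hypothetical helper names.

```python
# Field layout assumed from ethtool's ring_cookie convention: the low 32 bits
# carry the queue index and the next 8 bits carry the VF.
RING_MASK = 0x00000000FFFFFFFF
RING_VF_MASK = 0x000000FF00000000
RING_VF_SHIFT = 32

def truncate(value, bits):
    """Model storing a value in an unsigned field that is `bits` wide."""
    return value & ((1 << bits) - 1)

def decode_ring_cookie(cookie):
    """Split a 64-bit ring_cookie into (vf, queue)."""
    return (cookie & RING_VF_MASK) >> RING_VF_SHIFT, cookie & RING_MASK

action = 0x100000000
print(truncate(action, 32))        # narrower field: the VF bits are lost
print(truncate(action, 64))        # u64 field keeps the full value
print(hex(4294967296))             # decimal value reported by ethtool, in hex
print(decode_ring_cookie(action))  # (vf field, queue index)
```

Any field narrower than 33 bits drops the VF part of this cookie, which is consistent with the 'Direct to queue 0' output shown before the fix.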
    • ixgbe: fix default mac->ops.setup_link for X550EM · 4695886c
      Emil Tantilov authored
      X550EM_a/x did not have a default value for mac->ops.setup_link which
      was causing link issues for backplane devices.
      
      This patch sets mac->ops.setup_link to ixgbe_setup_mac_link_X540 for
      X550EM_a/x which is also default for X550. This will result in
      mac->ops.setup_link calling the link setup function for the respective
      PHY type in case we do not need a special function to deal with it.
      Reported-by: Ken Cox <jkc@redhat.com>
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: set VLAN spoof checking unconditionally · d3dec7c7
      Emil Tantilov authored
      Previously the PF driver would only set VLAN spoof checking if
      the VF had created VLANs. This was done by setting and checking
      a counter (vlan_count) whenever a VLAN was created by the VF.
      However it is possible for the vlan_count to be !=0 while there are
      no VLANs assigned to the VF due to the count incrementing every
      time a VLAN 0 is added on ifdown/up, which resulted in VLAN spoofing
      always being set for those VFs.
      
      This patch cleans up the logic by unconditionally setting VLAN spoof
      checking based on how the VF is configured (via ip link set ethX vf Y
      spoofchk on/off).
      This change also resolves an issue where the VLAN spoofing can remain
      set even after being disabled by the user due to the driver enabling
      VLAN spoof checking every time a VLAN is added to the VF, but would
      only allow changes in the setting if vlan_count != 0.
      
      Also default_vf_vlan_id and vlans_enabled were removed from the
      vf_data_storage structure since they are not being used in the driver.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: consolidate the configuration of spoof checking · 77f192af
      Emil Tantilov authored
      Consolidate the logic behind configuring spoof checking:
      
      Move the setting of the MAC, VLAN and Ethertype spoof checking into
      ixgbe_ndo_set_vf_spoofchk().
      
      Change ixgbe_set_mac_anti_spoofing() to set MAC spoofing per VF similar
      to the VLAN and Ethertype functions - this allows us to call the helper
      functions in ixgbe_ndo_set_vf_spoofchk() for all spoof check types and
      only disable MAC spoof checking when creating MACVLAN.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • tcp-tso: do not split TSO packets at retransmit time · 10d3be56
      Eric Dumazet authored
      The Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
      
      This was fine back in the days when TSO/GSO were emerging, with their
      bugs, but we believe the dark age is over.
      
      Keeping big packets in write queues, but also in stack traversal
      has a lot of benefits.
       - Less memory overhead, because write queues hold fewer skbs
       - Less cpu overhead at ACK processing.
       - Better SACK processing, as a lot of studies have mentioned how
         awful Linux was at this ;)
       - Less cpu overhead to send the rtx packets
         (IP stack traversal, netfilter traversal, drivers...)
       - Better latencies in presence of losses.
       - Smaller spikes in fq like packet schedulers, as retransmits
         are not constrained by TCP Small Queues.
      
      1% packet loss is common today, and at 100Gbit speeds, this
      translates to ~80,000 losses per second.
      Losses are often correlated, and we see many retransmit events
      leading to 1-MSS train of packets, at the time hosts are already
      under stress.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
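The "~80,000 losses per second" figure in the tcp-tso commit can be checked with back-of-the-envelope arithmetic. The sketch assumes full-size packets of about 1500 bytes, which the commit message does not state; `losses_per_second` is a hypothetical helper.

```python
def losses_per_second(link_bits_per_s, packet_bytes, loss_rate):
    """Estimate lost packets per second on a fully loaded link."""
    packets_per_s = link_bits_per_s / (8 * packet_bytes)
    return packets_per_s * loss_rate

# 100 Gbit/s link, ~1500-byte packets (assumed), 1% loss
rate = losses_per_second(100e9, 1500, 0.01)
print(round(rate))
```

Under these assumptions the link carries roughly 8.3 million packets per second, so a 1% loss rate lands in the low 80,000s, in line with the figure quoted above.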
    • tipc: fix stale links after re-enabling bearer · 8cee83dd
      Parthasarathy Bhuvaragan authored
      Commit 42b18f60 ("tipc: refactor function tipc_link_timeout()"),
      introduced a bug which prevents sending of probe messages during
      link synchronization phase. This leads to hanging links, if the
      bearer is disabled/enabled after links are up.
      
      In this commit, we send the probe messages correctly.
      
      Fixes: 42b18f60 ("tipc: refactor function tipc_link_timeout()")
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-tcstamp_ack-frag-coalesce' · 6a74c196
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      tcp: Handle txstamp_ack when fragmenting/coalescing skbs
      
      This patchset is to handle the txstamp-ack bit when
      fragmenting/coalescing skbs.
      
      The second patch depends on the recently posted series
      for the net branch:
      "tcp: Merge timestamp info when coalescing skbs"
      
      A BPF prog is used to kprobe to sock_queue_err_skb()
      and print out the value of serr->ee.ee_data.  The BPF
      prog (run-able from bcc) is attached here:
      
      BPF prog used for testing:
      ~~~~~
      
      from __future__ import print_function
      from bcc import BPF
      
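      bpf_text = """
      /* Kernel headers for struct pt_regs, struct sock, struct sk_buff
       * and SKB_EXT_ERR() / struct sock_exterr_skb.
       */
      #include <uapi/linux/ptrace.h>
      #include <net/sock.h>
      #include <linux/skbuff.h>
      #include <linux/errqueue.h>

      int trace_err_skb(struct pt_regs *ctx)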
      {
      	struct sk_buff *skb = (struct sk_buff *)ctx->si;
      	struct sock *sk = (struct sock *)ctx->di;
      	struct sock_exterr_skb *serr;
      	u32 ee_data = 0;
      
      	if (!sk || !skb)
      		return 0;
      
      	serr = SKB_EXT_ERR(skb);
      	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
      	bpf_trace_printk("ee_data:%u\\n", ee_data);
      
      	return 0;
      };
      """
      
      b = BPF(text=bpf_text)
      b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
      print("Attached to kprobe")
      b.trace_print()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Merge txstamp_ack in tcp_skb_collapse_tstamp · 2de8023e
      Martin KaFai Lau authored
      When collapsing skbs, txstamp_ack also needs to be merged.
      
      Retrans Collapse Test:
      ~~~~~~
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      0.200 write(4, ..., 11680) = 11680
      
      0.200 > P. 1:731(730) ack 1
      0.200 > P. 731:1461(730) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:13141(4380) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 13141 win 257
      
      BPF Output Before:
      ~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~
      <...>-2027  [007] d.s.    79.765921: : ee_data:1459
      
      Sacks Collapse Test:
      ~~~~~
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 1460) = 1460
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 13140) = 13140
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      BPF Output Before:
      ~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~
      <...>-2049  [007] d.s.    89.185538: : ee_data:14599
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
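The ee_data values reported in the two collapse tests are consistent with a byte-offset key. Assuming the key counts from zero at the start of the stream (nothing has been acknowledged when the option is first enabled in these scripts), the write issued right after the `[2688]` setsockopt reports the offset of its last byte. A sketch of that arithmetic follows; `expected_ee_data` is a hypothetical helper, not a kernel function.

```python
def expected_ee_data(write_sizes, tagged_index):
    """Zero-based stream offset of the last byte of the tagged write."""
    return sum(write_sizes[: tagged_index + 1]) - 1

# Retrans Collapse Test: writes of 730, 730 and 11680 bytes, second one tagged
print(expected_ee_data([730, 730, 11680], 1))
# Sacks Collapse Test: writes of 1460 and 13140 bytes, second one tagged
print(expected_ee_data([1460, 13140], 1))
```

Both results match the "BPF Output After" lines, which suggests the merged skb still carries the timestamp request for the tagged write.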
    • tcp: Carry txstamp_ack in tcp_fragment_tstamp · b51e13fa
      Martin KaFai Lau authored
      When a tcp skb is sliced into two smaller skbs (e.g. in
      tcp_fragment() and tso_fragment()),  it does not carry
      the txstamp_ack bit to the newly created skb if it is needed.
      The end result is a timestamping event (SCM_TSTAMP_ACK) will
      be missing from the sk->sk_error_queue.
      
      This patch carries this bit to the new skb2
      in tcp_fragment_tstamp().
      
      BPF Output Before:
      ~~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~~
      <...>-2050  [000] d.s.   100.928763: : ee_data:14599
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 14600) = 14600
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      
      0.200 > . 1:7301(7300) ack 1
      0.200 > P. 7301:14601(7300) ack 1
      
      0.300 < . 1:1(0) ack 14601 win 257
      
      0.300 close(4) = 0
      0.300 > F. 14601:14601(0) ack 1
      0.400 < F. 1:1(0) ack 16062 win 257
      0.400 > . 14602:14602(0) ack 2
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
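The packetdrill scripts pass raw numbers to setsockopt(), which makes them hard to read. The sketch below decodes them under two assumptions not spelled out in the commit: that option 37 at SOL_SOCKET is SO_TIMESTAMPING on the test machine, and that the bit values match the SOF_TIMESTAMPING_* definitions in linux/net_tstamp.h as of this kernel. `decode` is a hypothetical helper.

```python
# Bit values assumed from the SOF_TIMESTAMPING_* definitions in
# linux/net_tstamp.h; option 37 at SOL_SOCKET is taken to be SO_TIMESTAMPING.
SOF_TIMESTAMPING_FLAGS = {
    1 << 0: "SOF_TIMESTAMPING_TX_HARDWARE",
    1 << 1: "SOF_TIMESTAMPING_TX_SOFTWARE",
    1 << 2: "SOF_TIMESTAMPING_RX_HARDWARE",
    1 << 3: "SOF_TIMESTAMPING_RX_SOFTWARE",
    1 << 4: "SOF_TIMESTAMPING_SOFTWARE",
    1 << 5: "SOF_TIMESTAMPING_SYS_HARDWARE",
    1 << 6: "SOF_TIMESTAMPING_RAW_HARDWARE",
    1 << 7: "SOF_TIMESTAMPING_OPT_ID",
    1 << 8: "SOF_TIMESTAMPING_TX_SCHED",
    1 << 9: "SOF_TIMESTAMPING_TX_ACK",
    1 << 10: "SOF_TIMESTAMPING_OPT_CMSG",
    1 << 11: "SOF_TIMESTAMPING_OPT_TSONLY",
}

def decode(value):
    """Return the names of all flag bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(SOF_TIMESTAMPING_FLAGS.items())
            if value & bit]

print(decode(2688))
print(decode(2176))
```

Read this way, `[2688]` turns on the ACK timestamp request for the next write and `[2176]` turns it off again while keeping the ID and timestamp-only options.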
  3. 23 Apr, 2016 6 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 11afbff8
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree, mostly from Florian Westphal to sort out the lack of sufficient
      validation in x_tables and connlabel preparation patches to add
      nf_tables support. They are:
      
      1) Ensure we don't go over the ruleset blob boundaries in
         mark_source_chains().
      
      2) Validate that target jumps land on an existing xt_entry. This extra
         sanitization comes with a performance penalty when loading the ruleset.
      
      3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables.
      
      4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables.
      
      5) Enforce the minimal possible target size in x_tables.
      
      6) Similar to #3, add xt_compat_check_entry_offsets() for compat code.
      
      7) Check that standard target size is valid.
      
      8) More sanitization to ensure that the target_offset field is correct.
      
      9) Add xt_check_entry_match() to validate that matches are well-formed.
      
      10-12) Three patches to reduce the number of parameters in
          translate_compat_table() for {arp,ip,ip6}tables by using a container
          structure.
      
      13) No need to return value from xt_compat_match_from_user(), so make
          it void.
      
      14) Consolidate translate_table() so it can be used by compat code too.
      
      15) Remove obsolete check for compat code, so we keep consistent with
          what was already removed in the native layout code (back in 2007).
      
      16) Get rid of target jump validation from mark_source_chains(),
          obsoleted by #2.
      
      17) Introduce xt_copy_counters_from_user() to consolidate counter
          copying, and use it from {arp,ip,ip6}tables.
      
      18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump
          functions.
      
      19) Move nf_connlabel_match() to xt_connlabel.
      
      20) Skip event notification if connlabel did not change.
      
      21) Update of nf_connlabels_get() to make the upcoming nft connlabel
          support easier.
      
      23) Remove spinlock to read protocol state field in conntrack.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
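Item 2 in the Netfilter summary above (jumps must land on an existing xt_entry) can be pictured with a toy model of the ruleset blob. This is a simplified sketch, not the x_tables code: entries are reduced to their sizes, the sizes are made-up numbers, and `entry_offsets` and `jump_is_valid` are hypothetical helpers.

```python
def entry_offsets(next_offsets):
    """Start offset of every entry, given each entry's size (next_offset)."""
    offsets, pos = set(), 0
    for size in next_offsets:
        offsets.add(pos)
        pos += size
    return offsets

def jump_is_valid(target, next_offsets):
    """A jump is acceptable only if it lands exactly on an entry start."""
    return target in entry_offsets(next_offsets)

blob = [112, 152, 112]            # hypothetical entry sizes in bytes
print(jump_is_valid(112, blob))   # start of the second entry
print(jump_is_valid(120, blob))   # lands inside the second entry
```

Walking the blob to find the valid entry starts is extra work at load time, which is one way to read the performance penalty mentioned in that item.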
    • Merge branch 'nla_align-more' · 8d9ea160
      David S. Miller authored
      Nicolas Dichtel says:
      
      ====================
      netlink: align attributes when needed (patchset #1)
      
      This is the continuation of the work done to align netlink attributes
      when these attributes contain some 64-bit fields.
      
      David, if the third patch is too big (or maybe the series), I can split it.
      Just tell me what you prefer.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • taskstats: use the libnl API to align nlattr on 64-bit · 80df5542
      Nicolas Dichtel authored
      The goal of this patch is to use the new libnl API to align netlink
      attributes when needed.
      The layout of the netlink message will be a bit different after the patch,
      because the padattr (TASKSTATS_TYPE_STATS) will be inside the nested
      attribute instead of before it.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libnl: add nla_put_u64_64bit() helper · 73520786
      Nicolas Dichtel authored
      With this function, nla_data() is aligned on a 64-bit area.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
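Why a padding attribute is sometimes needed for 64-bit payloads can be shown with offset arithmetic. The sketch assumes a 4-byte attribute header, 4-byte alignment of attributes within the message, and a padding attribute that consists of a header only; `pad_before_u64` is a hypothetical helper that models the idea, not the kernel function.

```python
NLA_HDRLEN = 4  # size of the netlink attribute header, in bytes

def pad_before_u64(offset):
    """Bytes of padding attribute needed so the payload is 8-byte aligned."""
    assert offset % 4 == 0, "netlink attributes are 4-byte aligned"
    # If the header would start on an 8-byte boundary, the payload that
    # follows it would sit at offset + 4, so a 4-byte pad attribute goes first.
    return NLA_HDRLEN if offset % 8 == 0 else 0

for offset in (16, 20):
    pad = pad_before_u64(offset)
    data_at = offset + pad + NLA_HDRLEN
    print(offset, pad, data_at, data_at % 8)
```

In both cases the payload ends up on an 8-byte boundary, which is the property the commit message describes for nla_data().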
    • libnl: nla_put_msecs(): align on a 64-bit area · 2175d87c
      Nicolas Dichtel authored
      nla_data() is now aligned on a 64-bit area.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>