1. 25 Feb, 2015 1 commit
    • Florian Fainelli's avatar
      net: dsa: integrate with SWITCHDEV for HW bridging · b73adef6
      Florian Fainelli authored
      In order to support bridging offloads in DSA switch drivers, select
      NET_SWITCHDEV to get access to the port_stp_update and parent_get_id
      NDOs that we are required to implement.
      
      To facilitate the integratation at the DSA driver level, we implement 3
      types of operations:
      
      - port_join_bridge
      - port_leave_bridge
      - port_stp_update
      
      DSA will resolve which switch ports that are currently bridge port
      members as some Switch hardware/drivers need to know about that to limit
      the register programming to just the relevant registers (especially for
      slow MDIO buses).
      
      We also take care of setting the correct STP state when slave network
      devices are brought up/down while being bridge members.
      
      Finally, when a port is leaving the bridge, we make sure we set in
      BR_STATE_FORWARDING state, otherwise the bridge layer would leave it
      disabled as a result of having left the bridge.
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b73adef6
  2. 24 Feb, 2015 1 commit
    • Mahesh Bandewar's avatar
      bonding: Implement port churn-machine (AD standard 43.4.17). · 14c9551a
      Mahesh Bandewar authored
      The Churn Detection machines detect the situation where a port is operable,
      but the Actor and Partner have not attached the link to an Aggregator and
      brought the link into operation within a bound time period. Under normal
      operation of the LACP, agreement between Actor and Partner should be reached
      very rapidly. Continued failure to reach agreement can be symptomatic of
      device failure.
      
      Actor-churn-detection state-machine
      Reviewed-by: default avatarNikolay Aleksandrov <nikolay@redhat.com>
      
      ===================================
      
      BEGIN=True + PortEnable=False
                 |
                 v
       +------------------------+   ActorPort.Sync=True  +------------------+
       |   ACTOR_CHURN_MONITOR  | ---------------------> |  NO_ACTOR_CHURN  |
       |========================|                        |==================|
       |    ActorChurn=False    |  ActorPort.Sync=False  | ActorChurn=False |
       | ActorChurn.Timer=Start | <--------------------- |                  |
       +------------------------+                        +------------------+
                 |                                                ^
                 |                                                |
        ActorChurn.Timer=Expired                                  |
                 |                                       ActorPort.Sync=True
                 |                                                |
                 |                +-----------------+             |
                 |                |   ACTOR_CHURN   |             |
                 |                |=================|             |
                 +--------------> | ActorChurn=True | ------------+
                                  |                 |
                                  +-----------------+
      
      Similar for the Partner-churn-detection.
      Signed-off-by: default avatarMahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14c9551a
  3. 12 Feb, 2015 1 commit
  4. 11 Feb, 2015 3 commits
    • Tom Herbert's avatar
      vxlan: Use checksum partial with remote checksum offload · 0ace2ca8
      Tom Herbert authored
      Change remote checksum handling to set checksum partial as default
      behavior. Added an iflink parameter to configure not using
      checksum partial (calling csum_partial to update checksum).
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ace2ca8
    • Tom Herbert's avatar
      net: Fix remcsum in GRO path to not change packet · 26c4f7da
      Tom Herbert authored
      Remote checksum offload processing is currently the same for both
      the GRO and non-GRO path. When the remote checksum offload option
      is encountered, the checksum field referred to is modified in
      the packet. So in the GRO case, the packet is modified in the
      GRO path and then the operation is skipped when the packet goes
      through the normal path based on skb->remcsum_offload. There is
      a problem in that the packet may be modified in the GRO path, but
      then forwarded off host still containing the remote checksum option.
      A remote host will again perform RCO but now the checksum verification
      will fail since GRO RCO already modified the checksum.
      
      To fix this, we ensure that GRO restores a packet to it's original
      state before returning. In this model, when GRO processes a remote
      checksum option it still changes the checksum per the algorithm
      but on return from lower layer processing the checksum is restored
      to its original value.
      
      In this patch we add define gro_remcsum structure which is passed
      to skb_gro_remcsum_process to save offset and delta for the checksum
      being changed. After lower layer processing, skb_gro_remcsum_cleanup
      is called to restore the checksum before returning from GRO.
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26c4f7da
    • Paul Moore's avatar
      cipso: don't use IPCB() to locate the CIPSO IP option · 04f81f01
      Paul Moore authored
      Using the IPCB() macro to get the IPv4 options is convenient, but
      unfortunately NetLabel often needs to examine the CIPSO option outside
      of the scope of the IP layer in the stack.  While historically IPCB()
      worked above the IP layer, due to the inclusion of the inet_skb_param
      struct at the head of the {tcp,udp}_skb_cb structs, recent commit
      971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      reordered the tcp_skb_cb struct and invalidated this IPCB() trick.
      
      This patch fixes the problem by creating a new function,
      cipso_v4_optptr(), which locates the CIPSO option inside the IP header
      without calling IPCB().  Unfortunately, this isn't as fast as a simple
      lookup so some additional tweaks were made to limit the use of this
      new function.
      
      Cc: <stable@vger.kernel.org> # 3.18
      Reported-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Tested-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      04f81f01
  5. 09 Feb, 2015 3 commits
  6. 08 Feb, 2015 3 commits
    • Eric Dumazet's avatar
      net: rfs: add hash collision detection · 567e4b79
      Eric Dumazet authored
      Receive Flow Steering is a nice solution but suffers from
      hash collisions when a mix of connected and unconnected traffic
      is received on the host, when flow hash table is populated.
      
      Also, clearing flow in inet_release() makes RFS not very good
      for short lived flows, as many packets can follow close().
      (FIN , ACK packets, ...)
      
      This patch extends the information stored into global hash table
      to not only include cpu number, but upper part of the hash value.
      
      I use a 32bit value, and dynamically split it in two parts.
      
      For host with less than 64 possible cpus, this gives 6 bits for the
      cpu number, and 26 (32-6) bits for the upper part of the hash.
      
      Since hash bucket selection use low order bits of the hash, we have
      a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
      enough.
      
      If the hash found in flow table does not match, we fallback to RPS (if
      it is enabled for the rxqueue).
      
      This means that a packet for an non connected flow can avoid the
      IPI through a unrelated/victim CPU.
      
      This also means we no longer have to clear the table at socket
      close time, and this helps short lived flows performance.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      567e4b79
    • Neal Cardwell's avatar
      tcp: mitigate ACK loops for connections as tcp_request_sock · a9b2c06d
      Neal Cardwell authored
      In the SYN_RECV state, where the TCP connection is represented by
      tcp_request_sock, we now rate-limit SYNACKs in response to a client's
      retransmitted SYNs: we do not send a SYNACK in response to client SYN
      if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms)
      since we last sent a SYNACK in response to a client's retransmitted
      SYN.
      
      This allows the vast majority of legitimate client connections to
      proceed unimpeded, even for the most aggressive platforms, iOS and
      MacOS, which actually retransmit SYNs 1-second intervals for several
      times in a row. They use SYN RTO timeouts following the progression:
      1,1,1,1,1,2,4,8,16,32.
      Reported-by: default avatarAvery Fay <avery@mixpanel.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9b2c06d
    • Neal Cardwell's avatar
      tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks · 032ee423
      Neal Cardwell authored
      Helpers for mitigating ACK loops by rate-limiting dupacks sent in
      response to incoming out-of-window packets.
      
      This patch includes:
      
      - rate-limiting logic
      - sysctl to control how often we allow dupacks to out-of-window packets
      - SNMP counter for cases where we rate-limited our dupack sending
      
      The rate-limiting logic in this patch decides to not send dupacks in
      response to out-of-window segments if (a) they are SYNs or pure ACKs
      and (b) the remote endpoint is sending them faster than the configured
      rate limit.
      
      We rate-limit our responses rather than blocking them entirely or
      resetting the connection, because legitimate connections can rely on
      dupacks in response to some out-of-window segments. For example, zero
      window probes are typically sent with a sequence number that is below
      the current window, and ZWPs thus expect to thus elicit a dupack in
      response.
      
      We allow dupacks in response to TCP segments with data, because these
      may be spurious retransmissions for which the remote endpoint wants to
      receive DSACKs. This is safe because segments with data can't
      realistically be part of ACK loops, which by their nature consist of
      each side sending pure/data-less ACKs to each other.
      
      The dupack interval is controlled by a new sysctl knob,
      tcp_invalid_ratelimit, given in milliseconds, in case an administrator
      needs to dial this upward in the face of a high-rate DoS attack. The
      name and units are chosen to be analogous to the existing analogous
      knob for ICMP, icmp_ratelimit.
      
      The default value for tcp_invalid_ratelimit is 500ms, which allows at
      most one such dupack per 500ms. This is chosen to be 2x faster than
      the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
      2.4). We allow the extra 2x factor because network delay variations
      can cause packets sent at 1 second intervals to be compressed and
      arrive much closer.
      Reported-by: default avatarAvery Fay <avery@mixpanel.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      032ee423
  7. 05 Feb, 2015 3 commits
    • Erik Kline's avatar
      net: ipv6: allow explicitly choosing optimistic addresses · c58da4c6
      Erik Kline authored
      RFC 4429 ("Optimistic DAD") states that optimistic addresses
      should be treated as deprecated addresses.  From section 2.1:
      
         Unless noted otherwise, components of the IPv6 protocol stack
         should treat addresses in the Optimistic state equivalently to
         those in the Deprecated state, indicating that the address is
         available for use but should not be used if another suitable
         address is available.
      
      Optimistic addresses are indeed avoided when other addresses are
      available (i.e. at source address selection time), but they have
      not heretofore been available for things like explicit bind() and
      sendmsg() with struct in6_pktinfo, etc.
      
      This change makes optimistic addresses treated more like
      deprecated addresses than tentative ones.
      Signed-off-by: default avatarErik Kline <ek@google.com>
      Acked-by: default avatarLorenzo Colitti <lorenzo@google.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c58da4c6
    • Eric Dumazet's avatar
      ipv6: fix sparse errors in ip6_make_flowlabel() · 67765146
      Eric Dumazet authored
      include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
      include/net/ipv6.h:713:22:    expected restricted __be32 [usertype] hash
      include/net/ipv6.h:713:22:    got unsigned int
      include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
      include/net/ipv6.h:719:22: warning: invalid assignment: ^=
      include/net/ipv6.h:719:22:    left side has type restricted __be32
      include/net/ipv6.h:719:22:    right side has type unsigned int
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      67765146
    • Eric Dumazet's avatar
      flow_keys: n_proto type should be __be16 · f4575d35
      Eric Dumazet authored
      (struct flow_keys)->n_proto is in network order, use
      proper type for this.
      
      Fixes following sparse errors :
      
      net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types)
      net/core/flow_dissector.c:139:39:    expected unsigned short [unsigned] [usertype] n_proto
      net/core/flow_dissector.c:139:39:    got restricted __be16 [assigned] [usertype] proto
      net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types)
      net/core/flow_dissector.c:237:23:    expected unsigned short [unsigned] [usertype] n_proto
      net/core/flow_dissector.c:237:23:    got restricted __be16 [assigned] [usertype] proto
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: e0f31d84 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f4575d35
  8. 04 Feb, 2015 8 commits
  9. 03 Feb, 2015 3 commits
  10. 02 Feb, 2015 14 commits