1. 21 Sep, 2013 1 commit
  2. 20 Sep, 2013 12 commits
  3. 19 Sep, 2013 1 commit
    • ip: generate unique IP identificator if local fragmentation is allowed · 703133de
      Ansis Atteka authored
      If local fragmentation is allowed, then ip_select_ident() and
      ip_select_ident_more() need to generate unique IDs to ensure
      correct defragmentation on the peer.
      For example, if IPsec (tunnel mode) has to encrypt large skbs
      that have the local_df bit set, then all IP fragments belonging
      to different ESP datagrams would use the same identifier.
      If one of these IP fragments were lost or reordered, the peer
      could stitch together IP fragments that did not belong to the
      same datagram. This would lead to packet loss or data
      corruption.
      Signed-off-by: Ansis Atteka <aatteka@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
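The collision described in this commit can be sketched in a few lines of C. This is only an illustration, not kernel code: `struct frag_key` and `same_reassembly_queue()` are hypothetical names, and the key fields follow the classic IPv4 reassembly rule (source, destination, protocol, identification).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reassembly key: a receiver groups IPv4 fragments by
 * source address, destination address, protocol and identification. */
struct frag_key {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t id;
    uint8_t  protocol;
};

/* Returns true when two fragments would be placed in the same
 * reassembly queue, i.e. treated as parts of one datagram. */
static bool same_reassembly_queue(const struct frag_key *a,
                                  const struct frag_key *b)
{
    return a->saddr == b->saddr &&
           a->daddr == b->daddr &&
           a->id == b->id &&
           a->protocol == b->protocol;
}
```

If two different ESP datagrams (protocol 50) between the same hosts carry the same ID, their fragments compare equal under this key, which is the mix-up the patch avoids by generating unique IDs.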
  4. 13 Sep, 2013 1 commit
  5. 11 Sep, 2013 1 commit
    • ipv6: don't call fib6_run_gc() until routing is ready · 2c861cc6
      Michal Kubeček authored
      When loading the ipv6 module, ndisc_init() is called before
      ip6_route_init(). As the former registers a handler calling
      fib6_run_gc(), this opens a window to run the garbage collector
      before necessary data structures are initialized. If a network
      device is initialized in this window, adding a MAC address to it
      triggers a NETDEV_CHANGEADDR event, leading to a crash in
      Take the event handler registration out of ndisc_init() into a
      separate function ndisc_late_init() and move it after
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 05 Sep, 2013 1 commit
    • vxlan: Notify drivers for listening UDP port changes · 53cf5275
      Joseph Gasparakis authored
      This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and
      Drivers can get notifications through the above functions about changes
      of the UDP listening port of VXLAN. Also, when physical ports come up,
      they can now call vxlan_get_rx_port() to obtain the port number(s)
      of any existing VXLAN interface that was already up before them.
      This information about the listening UDP port would be used for VXLAN
      related offloads.
      A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
      input and his suggestions on this patch set.
      CC: John Fastabend <john.r.fastabend@intel.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 04 Sep, 2013 2 commits
    • net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation · e3f5b170
      Daniel Borkmann authored
      Get rid of MLDV2_MRC and use our new macros for mantissa and
      exponent to calculate the Maximum Response Delay from the Maximum
      Response Code.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
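The mantissa/exponent calculation mentioned in this commit follows RFC 3810, section 5.1.3. The sketch below is a standalone illustration of that rule, not the kernel's macros: the helper names `mrc_exp()`, `mrc_mant()` and `mrc_to_mrd_ms()` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 3810, 5.1.3: when the 16-bit Maximum Response Code (MRC) is
 * >= 32768, it is a floating-point value laid out as
 *   bit 15 = 1, bits 14..12 = exp, bits 11..0 = mant. */
static inline uint32_t mrc_exp(uint16_t mrc)
{
    return (mrc >> 12) & 0x7;
}

static inline uint32_t mrc_mant(uint16_t mrc)
{
    return mrc & 0x0fff;
}

/* Returns the Maximum Response Delay in milliseconds. */
static uint32_t mrc_to_mrd_ms(uint16_t mrc)
{
    if (mrc < 32768)
        return mrc;                       /* linear range: MRD = MRC */
    return (mrc_mant(mrc) | 0x1000u) << (mrc_exp(mrc) + 3);
}
```

With the default Query Response Interval of 10000 the code falls in the linear range, so the delay equals the code; only values of 32768 and above take the exponential path.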
    • net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12. · 89225d1c
      Daniel Borkmann authored
      i) RFC3810, 9.2. Query Interval [QI] says:
         The Query Interval variable denotes the interval between General
         Queries sent by the Querier. Default value: 125 seconds. [...]
      ii) RFC3810, 9.3. Query Response Interval [QRI] says:
        The Maximum Response Delay used to calculate the Maximum Response
        Code inserted into the periodic General Queries. Default value:
        10000 (10 seconds) [...] The number of seconds represented by the
        [Query Response Interval] must be less than the [Query Interval].
      iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:
        The Older Version Querier Present Timeout is the time-out for
        transitioning a host back to MLDv2 Host Compatibility Mode. When an
        MLDv1 query is received, MLDv2 hosts set their Older Version Querier
        Present Timer to [Older Version Querier Present Timeout].
        This value MUST be ([Robustness Variable] times (the [Query Interval]
        in the last Query received)) plus ([Query Response Interval]).
      Hence, by *default*, the timeout results in:
        [RV] = 2, [QI] = 125sec, [QRI] = 10sec
        [OVQPT] = [RV] * [QI] + [QRI] = 260sec
      That said, we currently calculate [OVQPT] (here given as the 'switchback'
      variable) as ...
        switchback = (idev->mc_qrv + 1) * max_delay
      RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
      section "9.14. Configuring timers", it is said:
        This section is meant to provide advice to network administrators on
        how to tune these settings to their network. Ambitious router
        implementations might tune these settings dynamically based upon
        changing characteristics of the network. [...]
      iv) RFC3810, 9.14.2. Query Interval:
        The overall level of periodic MLD traffic is inversely proportional
        to the Query Interval. A longer Query Interval results in a lower
        overall level of MLD traffic. The value of the Query Interval MUST
        be equal to or greater than the Maximum Response Delay used to
        calculate the Maximum Response Code inserted in General Query
      I assume that was why switchback is calculated as is (3 * max_delay), although
      this setting seems to be meant for routers only to configure their [QI]
      interval for non-default intervals. So usage here like this is clearly wrong.
      In conclusion, the current behaviour in IPv6's multicast code does not
      conform to the RFC because the switchback is calculated incorrectly. The
      value is too small, so hosts switch back to MLDv2 far too early, i.e.
      after ~30secs instead of ~260secs by default.
      Hence, introduce necessary helper functions and fix this up properly as it
      should be.
      Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credit to Hannes
      Frederic Sowa, who also had a hand in this. Also thanks to Hangbin Liu,
      who did initial testing.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: David Stevens <dlstevens@us.ibm.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
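The arithmetic in this commit message can be checked with a small C sketch. It is an illustration only: the function names are hypothetical, and the values are kept in milliseconds here, which is an assumption of the example rather than a statement about the kernel's units.

```c
#include <assert.h>

/* RFC 3810 defaults quoted in the commit message. */
#define MLD_DEFAULT_RV      2UL        /* Robustness Variable */
#define MLD_DEFAULT_QI_MS   125000UL   /* Query Interval */
#define MLD_DEFAULT_QRI_MS  10000UL    /* Query Response Interval */

/* RFC 3810, 9.12: [OVQPT] = [RV] * [QI] + [QRI]. */
static unsigned long ovqpt_ms(unsigned long rv, unsigned long qi_ms,
                              unsigned long qri_ms)
{
    return rv * qi_ms + qri_ms;
}

/* The formula the commit message describes as the old behaviour:
 * switchback = (qrv + 1) * max_delay. */
static unsigned long old_switchback_ms(unsigned long qrv,
                                       unsigned long max_delay_ms)
{
    return (qrv + 1) * max_delay_ms;
}
```

With the defaults, `ovqpt_ms()` yields 260000 ms while `old_switchback_ms()` yields 30000 ms, matching the ~260secs versus ~30secs figures given above.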
  8. 03 Sep, 2013 6 commits
  9. 02 Sep, 2013 1 commit
  10. 31 Aug, 2013 6 commits
  11. 30 Aug, 2013 1 commit
    • qdisc: allow setting default queuing discipline · 6da7c8fc
      stephen hemminger authored
      Until now, the pfifo_fast queue discipline has been used by default
      for all devices. But we have better choices now.
      This patch allows setting the default queueing discipline with sysctl.
      This allows easy use of better queueing disciplines on all devices
      without having to use tc qdisc scripts. It is intended to allow
      an easy path for distributions to make fq_codel or sfq the default
      This patch also makes pfifo_fast more of a first class qdisc, since
      it is now possible to manually override the default and explicitly
      use pfifo_fast. The behavior for systems that do not use the sysctl
      is unchanged: they still get pfifo_fast.
      Also removes leftover random # in sysctl net core.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 29 Aug, 2013 2 commits
    • tcp: TSO packets automatic sizing · 95bd09eb
      Eric Dumazet authored
      After hearing many people over the past years complaining about TSO being
      bursty or even buggy, we are proud to present automatic sizing of TSO
      One part of the problem is that tcp_tso_should_defer() uses a heuristic
      relying on upcoming ACKs instead of a timer, but more generally, having
      big TSO packets makes little sense for low rates, as it tends to create
      micro bursts on the network, and general consensus is to reduce the
      buffering amount.
      This patch introduces a per socket sk_pacing_rate, that approximates
      the current sending rate, and allows us to size the TSO packets so
      that we try to send one packet every ms.
      This field could be set by other transports.
      The patch has no impact on high-speed flows, where having large TSO packets
      makes sense to reach line rate.
      For other flows, this helps better packet scheduling and ACK clocking.
      This patch increases performance of TCP flows in lossy environments.
      A new sysctl (tcp_min_tso_segs) is added, to specify the
      minimal size of a TSO packet (default being 2).
      A follow-up patch will provide a new packet scheduler (FQ), using
      sk_pacing_rate as an input to perform optional per flow pacing.
      This explains why we chose to set sk_pacing_rate to twice the current
      rate, allowing 'slow start' ramp up.
      sk_pacing_rate = 2 * cwnd * mss / srtt
      v2: Neal Cardwell reported a suspicious deferral of the last two segments
      on an initial write of 10 MSS, so I had to change tcp_tso_should_defer()
      to take tp->xmit_size_goal_segs into account.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
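The sizing idea in this commit can be sketched as follows. This is not the kernel implementation: the function names, the microsecond unit for srtt, and the `max_segs` cap are assumptions made for the example. Only the formula `2 * cwnd * mss / srtt`, the one-packet-per-millisecond goal, and the minimum of 2 segments come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate pacing rate in bytes per second:
 *   rate = 2 * cwnd * mss / srtt
 * srtt_us is the smoothed RTT in microseconds (unit assumed here). */
static uint64_t pacing_rate_bytes_per_sec(uint32_t cwnd, uint32_t mss,
                                          uint32_t srtt_us)
{
    if (srtt_us == 0)
        return 0;   /* no RTT sample yet; the caller must handle this */
    return 2ULL * cwnd * mss * 1000000ULL / srtt_us;
}

/* Number of MSS-sized segments per TSO packet so that roughly one
 * packet is sent every millisecond, clamped to [min_segs, max_segs].
 * max_segs is a hypothetical stand-in for device/stack limits. */
static uint32_t tso_segs_per_ms(uint64_t rate, uint32_t mss,
                                uint32_t min_segs, uint32_t max_segs)
{
    uint64_t segs;

    if (mss == 0)
        return min_segs;
    segs = rate / 1000 / mss;   /* bytes per ms, then whole segments */
    if (segs < min_segs)
        return min_segs;
    if (segs > max_segs)
        return max_segs;
    return (uint32_t)segs;
}
```

A slow flow (cwnd 10, MSS 1460, srtt 100 ms) ends up at the minimum of 2 segments, while a fast flow saturates the cap, which mirrors the claim that high-speed flows keep their large TSO packets.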
    • net: sctp: reorder sctp_globals to reduce cacheline usage · 76bfd898
      Daniel Borkmann authored
      Reduce cacheline usage from 2 to 1 cacheline for sctp_globals structure. By
      reordering elements, we can close gaps and simply achieve the following:
      Current situation:
        /* size: 80, cachelines: 2, members: 10 */
        /* sum members: 57, holes: 4, sum holes: 16 */
        /* padding: 7 */
        /* last cacheline: 16 bytes */
      After reordering:
        /* size: 64, cachelines: 1, members: 10 */
        /* padding: 7 */
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
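The effect of reordering members to close holes can be reproduced with a toy structure. The two structs below are hypothetical and unrelated to the real sctp_globals layout; they only show how member order changes padding. Exact sizes depend on the ABI, so the byte counts in the comments assume a typical LP64 platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with 8-byte members: on a typical LP64
 * ABI each small member is followed by a hole (40 bytes in total). */
struct gappy {
    uint8_t  flag_a;
    void    *list;
    uint8_t  flag_b;
    uint64_t counter;
    uint16_t port;
};

/* Same members, largest first: the small members share one slot and
 * only tail padding remains (24 bytes on a typical LP64 ABI). */
struct dense {
    void    *list;
    uint64_t counter;
    uint16_t port;
    uint8_t  flag_a;
    uint8_t  flag_b;
};
```

Comparing `sizeof(struct gappy)` with `sizeof(struct dense)` shows the saving without changing any member, which is the same kind of gain the commit reports when going from 80 bytes to 64.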
  13. 28 Aug, 2013 2 commits
  14. 27 Aug, 2013 3 commits