1. 31 Aug, 2013 3 commits
  2. 30 Aug, 2013 1 commit
    • stephen hemminger's avatar
      qdisc: allow setting default queuing discipline · 6da7c8fc
      stephen hemminger authored
      By default, the pfifo_fast queue discipline has been used by default
      for all devices. But we have better choices now.
      
      This patch allow setting the default queueing discipline with sysctl.
      This allows easy use of better queueing disciplines on all devices
      without having to use tc qdisc scripts. It is intended to allow
      an easy path for distributions to make fq_codel or sfq the default
      qdisc.
      
      This patch also makes pfifo_fast more of a first class qdisc, since
      it is now possible to manually override the default and explicitly
      use pfifo_fast. The behavior for systems who do not use the sysctl
      is unchanged, they still get pfifo_fast
      
      Also removes leftover random # in sysctl net core.
      Signed-off-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6da7c8fc
  3. 29 Aug, 2013 2 commits
    • Eric Dumazet's avatar
      tcp: TSO packets automatic sizing · 95bd09eb
      Eric Dumazet authored
      After hearing many people over past years complaining against TSO being
      bursty or even buggy, we are proud to present automatic sizing of TSO
      packets.
      
      One part of the problem is that tcp_tso_should_defer() uses an heuristic
      relying on upcoming ACKS instead of a timer, but more generally, having
      big TSO packets makes little sense for low rates, as it tends to create
      micro bursts on the network, and general consensus is to reduce the
      buffering amount.
      
      This patch introduces a per socket sk_pacing_rate, that approximates
      the current sending rate, and allows us to size the TSO packets so
      that we try to send one packet every ms.
      
      This field could be set by other transports.
      
      Patch has no impact for high speed flows, where having large TSO packets
      makes sense to reach line rate.
      
      For other flows, this helps better packet scheduling and ACK clocking.
      
      This patch increases performance of TCP flows in lossy environments.
      
      A new sysctl (tcp_min_tso_segs) is added, to specify the
      minimal size of a TSO packet (default being 2).
      
      A follow-up patch will provide a new packet scheduler (FQ), using
      sk_pacing_rate as an input to perform optional per flow pacing.
      
      This explains why we chose to set sk_pacing_rate to twice the current
      rate, allowing 'slow start' ramp up.
      
      sk_pacing_rate = 2 * cwnd * mss / srtt
      
      v2: Neal Cardwell reported a suspect deferring of last two segments on
      initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
      into account tp->xmit_size_goal_segs
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95bd09eb
    • Daniel Borkmann's avatar
      net: sctp: reorder sctp_globals to reduce cacheline usage · 76bfd898
      Daniel Borkmann authored
      Reduce cacheline usage from 2 to 1 cacheline for sctp_globals structure. By
      reordering elements, we can close gaps and simply achieve the following:
      
      Current situation:
        /* size: 80, cachelines: 2, members: 10 */
        /* sum members: 57, holes: 4, sum holes: 16 */
        /* padding: 7 */
        /* last cacheline: 16 bytes */
      
      Afterwards:
        /* size: 64, cachelines: 1, members: 10 */
        /* padding: 7 */
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      76bfd898
  4. 27 Aug, 2013 4 commits
  5. 23 Aug, 2013 1 commit
  6. 22 Aug, 2013 1 commit
  7. 20 Aug, 2013 2 commits
  8. 15 Aug, 2013 3 commits
    • Jesper Dangaard Brouer's avatar
      net_sched: restore "linklayer atm" handling · 8a8e3d84
      Jesper Dangaard Brouer authored
      commit 56b765b7 ("htb: improved accuracy at high rates")
      broke the "linklayer atm" handling.
      
       tc class add ... htb rate X ceil Y linklayer atm
      
      The linklayer setting is implemented by modifying the rate table
      which is send to the kernel.  No direct parameter were
      transferred to the kernel indicating the linklayer setting.
      
      The commit 56b765b7 ("htb: improved accuracy at high rates")
      removed the use of the rate table system.
      
      To keep compatible with older iproute2 utils, this patch detects
      the linklayer by parsing the rate table.  It also supports future
      versions of iproute2 to send this linklayer parameter to the
      kernel directly. This is done by using the __reserved field in
      struct tc_ratespec, to convey the choosen linklayer option, but
      only using the lower 4 bits of this field.
      
      Linklayer detection is limited to speeds below 100Mbit/s, because
      at high rates the rtab is gets too inaccurate, so bad that
      several fields contain the same values, this resembling the ATM
      detect.  Fields even start to contain "0" time to send, e.g. at
      1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have
      been more broken than we first realized.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8a8e3d84
    • Nicolas Dichtel's avatar
      ip6tnl: add x-netns support · 0bd87628
      Nicolas Dichtel authored
      This patch allows to switch the netns when packet is encapsulated or
      decapsulated. In other word, the encapsulated packet is received in a netns,
      where the lookup is done to find the tunnel. Once the tunnel is found, the
      packet is decapsulated and injecting into the corresponding interface which
      stands to another netns.
      
      When one of the two netns is removed, the tunnel is destroyed.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0bd87628
    • Nicolas Dichtel's avatar
      ipip: add x-netns support · 6c742e71
      Nicolas Dichtel authored
      This patch allows to switch the netns when packet is encapsulated or
      decapsulated. In other word, the encapsulated packet is received in a netns,
      where the lookup is done to find the tunnel. Once the tunnel is found, the
      packet is decapsulated and injecting into the corresponding interface which
      stands to another netns.
      
      When one of the two netns is removed, the tunnel is destroyed.
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c742e71
  9. 13 Aug, 2013 2 commits
  10. 10 Aug, 2013 3 commits
    • Eric Dumazet's avatar
      net: attempt high order allocations in sock_alloc_send_pskb() · 28d64271
      Eric Dumazet authored
      Adding paged frags skbs to af_unix sockets introduced a performance
      regression on large sends because of additional page allocations, even
      if each skb could carry at least 100% more payload than before.
      
      We can instruct sock_alloc_send_pskb() to attempt high order
      allocations.
      
      Most of the time, it does a single page allocation instead of 8.
      
      I added an additional parameter to sock_alloc_send_pskb() to
      let other users to opt-in for this new feature on followup patches.
      
      Tested:
      
      Before patch :
      
      $ netperf -t STREAM_STREAM
      STREAM STREAM TEST
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       2304  212992  212992    10.00    46861.15
      
      After patch :
      
      $ netperf -t STREAM_STREAM
      STREAM STREAM TEST
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       2304  212992  212992    10.00    57981.11
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28d64271
    • Eric Dumazet's avatar
      af_unix: improve STREAM behavior with fragmented memory · e370a723
      Eric Dumazet authored
      unix_stream_sendmsg() currently uses order-2 allocations,
      and we had numerous reports this can fail.
      
      The __GFP_REPEAT flag present in sock_alloc_send_pskb() is
      not helping.
      
      This patch extends the work done in commit eb6a2481
      ("af_unix: reduce high order page allocations) for
      datagram sockets.
      
      This opens the possibility of zero copy IO (splice() and
      friends)
      
      The trick is to not use skb_pull() anymore in recvmsg() path,
      and instead add a @consumed field in UNIXCB() to track amount
      of already read payload in the skb.
      
      There is a performance regression for large sends
      because of extra page allocations that will be addressed
      in a follow-up patch, allowing sock_alloc_send_pskb()
      to attempt high order page allocations.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e370a723
    • Yuchung Cheng's avatar
      tcp: add server ip to encrypt cookie in fast open · 149479d0
      Yuchung Cheng authored
      Encrypt the cookie with both server and client IPv4 addresses,
      such that multi-homed server will grant different cookies
      based on both the source and destination IPs. No client change
      is needed since cookie is opaque to the client.
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      149479d0
  11. 09 Aug, 2013 5 commits
  12. 07 Aug, 2013 1 commit
  13. 05 Aug, 2013 2 commits
    • fan.du's avatar
      sctp: Pack dst_cookie into 1st cacheline hole for 64bit host · 5a139296
      fan.du authored
      As dst_cookie is used in fast path sctp_transport_dst_check.
      
      Before:
      struct sctp_transport {
      	struct list_head           transports;           /*     0    16 */
      	atomic_t                   refcnt;               /*    16     4 */
      	__u32                      dead:1;               /*    20:31  4 */
      	__u32                      rto_pending:1;        /*    20:30  4 */
      	__u32                      hb_sent:1;            /*    20:29  4 */
      	__u32                      pmtu_pending:1;       /*    20:28  4 */
      
      	/* XXX 28 bits hole, try to pack */
      
      	__u32                      sack_generation;      /*    24     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct flowi               fl;                   /*    32    64 */
      	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
      	union sctp_addr            ipaddr;               /*    96    28 */
      
      After:
      struct sctp_transport {
      	struct list_head           transports;           /*     0    16 */
      	atomic_t                   refcnt;               /*    16     4 */
      	__u32                      dead:1;               /*    20:31  4 */
      	__u32                      rto_pending:1;        /*    20:30  4 */
      	__u32                      hb_sent:1;            /*    20:29  4 */
      	__u32                      pmtu_pending:1;       /*    20:28  4 */
      
      	/* XXX 28 bits hole, try to pack */
      
      	__u32                      sack_generation;      /*    24     4 */
      	u32                        dst_cookie;           /*    28     4 */
      	struct flowi               fl;                   /*    32    64 */
      	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
      	union sctp_addr            ipaddr;               /*    96    28 */
      Signed-off-by: default avatarFan Du <fan.du@windriver.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5a139296
    • Mathias Krause's avatar
      xfrm: constify mark argument of xfrm_find_acq() · e473fcb4
      Mathias Krause authored
      The mark argument is read only, so constify it. Also make dummy_mark in
      af_key const -- only used as dummy argument for this very function.
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      e473fcb4
  14. 04 Aug, 2013 1 commit
  15. 03 Aug, 2013 2 commits
    • Eric Dumazet's avatar
      fib_rules: reorder struct fib_rules fields · fba3679d
      Eric Dumazet authored
      Move refcnt, pref, suppress_ifgroup, suppress_prefixlen out of first
      cache line, as they are not used in fast path.
      
      Make sure ctarget & fr_net are in first cache line.
      
      (Assuming 64 bit arches and 64 bytes cache lines)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fba3679d
    • Stefan Tomanek's avatar
      fib_rules: fix suppressor names and default values · 73f5698e
      Stefan Tomanek authored
      This change brings the suppressor attribute names into line; it also changes
      the data types to provide a more consistent interface.
      
      While -1 indicates that the suppressor is not enabled, values >= 0 for
      suppress_prefixlen or suppress_ifgroup  reject routing decisions violating the
      constraint.
      
      This changes the previously presented behaviour of suppress_prefixlen, where a
      prefix length _less_ than the attribute value was rejected. After this change,
      a prefix length less than *or* equal to the value is considered a violation of
      the rule constraint.
      
      It also changes the default values for default and newly added rules (disabling
      any suppression for those).
      Signed-off-by: default avatarStefan Tomanek <stefan.tomanek@wertarbyte.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      73f5698e
  16. 02 Aug, 2013 3 commits
  17. 01 Aug, 2013 4 commits