1. 04 Apr, 2016 1 commit
    • Eric Dumazet's avatar
      tcp/dccp: do not touch listener sk_refcnt under synflood · 3b24d854
      Eric Dumazet authored
      When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple
      cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes.
      
      By letting listeners use SOCK_RCU_FREE infrastructure,
      we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt
      
      Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
      only listeners are impacted by this change.
      
      Peak performance under SYNFLOOD is increased by ~33% :
      
      On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps
      
      Most consuming functions are now skb_set_owner_w() and sock_wfree()
      contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3b24d854
  2. 11 Feb, 2016 2 commits
    • Craig Gallek's avatar
      inet: refactor inet[6]_lookup functions to take skb · a583636a
      Craig Gallek authored
      This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
      groups.  Doing so with a BPF filter will require access to the
      skb in question.  This change plumbs the skb (and offset to payload
      data) through the call stack to the listening socket lookup
      implementations where it will be used in a following patch.
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a583636a
    • Craig Gallek's avatar
      inet: create IPv6-equivalent inet_hash function · 496611d7
      Craig Gallek authored
      In order to support fast lookups for TCP sockets with SO_REUSEPORT,
      the function that adds sockets to the listening hash set needs
      to be able to check receive address equality.  Since this equality
      check is different for IPv4 and IPv6, we will need two different
      socket hashing functions.
      
      This patch adds inet6_hash identical to the existing inet_hash function
      and updates the appropriate references.  A following patch will
      differentiate the two by passing different comparison functions to
      __inet_hash.
      
      Additionally, in order to use the IPv6 address equality function from
      inet6_hashtables (which is compiled as a built-in object when IPv6 is
      enabled) it also needs to be in a built-in object file as well.  This
      moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this.
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      496611d7
  3. 18 Mar, 2015 1 commit
  4. 05 Nov, 2014 1 commit
  5. 17 Oct, 2014 1 commit
  6. 19 Oct, 2013 1 commit
  7. 08 Oct, 2013 1 commit
    • Eric Dumazet's avatar
      ipv6: make lookups simpler and faster · efe4208f
      Eric Dumazet authored
      TCP listener refactoring, part 4 :
      
      To speed up inet lookups, we moved IPv4 addresses from inet to struct
      sock_common
      
      Now is time to do the same for IPv6, because it permits us to have fast
      lookups for all kind of sockets, including upcoming SYN_RECV.
      
      Getting IPv6 addresses in TCP lookups currently requires two extra cache
      lines, plus a dereference (and memory stall).
      
      inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
      
      This patch is way bigger than its IPv4 counter part, because for IPv4,
      we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
      it's not doable easily.
      
      inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
      inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
      
      And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
      at the same offset.
      
      We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
      macro.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      efe4208f
  8. 21 Sep, 2013 1 commit
  9. 21 Feb, 2013 1 commit
  10. 23 Jan, 2013 1 commit
    • Tom Herbert's avatar
      soreuseport: TCP/IPv6 implementation · 5ba24953
      Tom Herbert authored
      Motivation for soreuseport would be something like a web server
      binding to port 80 running with multiple threads, where each thread
      might have it's own listener socket.  This could be done as an
      alternative to other models: 1) have one listener thread which
      dispatches completed connections to workers. 2) accept on a single
      listener socket from multiple threads.  In case #1 the listener thread
      can easily become the bottleneck with high connection turn-over rate.
      In case #2, the proportion of connections accepted per thread tends
      to be uneven under high connection load (assuming simple event loop:
      while (1) { accept(); process() }, wakeup does not promote fairness
      among the sockets.  We have seen the  disproportion to be as high
      as 3:1 ratio between thread accepting most connections and the one
      accepting the fewest.  With so_reusport the distribution is
      uniform.
      Signed-off-by: default avatarTom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ba24953
  11. 26 Jul, 2012 1 commit
  12. 11 Dec, 2011 1 commit
  13. 08 Dec, 2009 1 commit
  14. 18 Oct, 2009 1 commit
    • Eric Dumazet's avatar
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet authored
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c720c7e8
  15. 03 Jun, 2009 1 commit
  16. 07 Oct, 2008 2 commits
  17. 16 Jun, 2008 2 commits
  18. 03 Feb, 2008 1 commit
    • Arnaldo Carvalho de Melo's avatar
      [SOCK] proto: Add hashinfo member to struct proto · ab1e0a13
      Arnaldo Carvalho de Melo authored
      This way we can remove TCP and DCCP specific versions of
      
      sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
      sk->sk_prot->hash:     inet_hash is directly used, only v6 need
                             a specific version to deal with mapped sockets
      sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly
      
      struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
      that inet_csk_get_port can find the per family routine.
      
      Now only the lookup routines receive as a parameter a struct inet_hashtable.
      
      With this we further reuse code, reducing the difference among INET transport
      protocols.
      
      Eventually work has to be done on UDP and SCTP to make them share this
      infrastructure and get as a bonus inet_diag interfaces so that iproute can be
      used with these protocols.
      
      net-2.6/net/ipv4/inet_hashtables.c:
        struct proto			     |   +8
        struct inet_connection_sock_af_ops |   +8
       2 structs changed
        __inet_hash_nolisten               |  +18
        __inet_hash                        | -210
        inet_put_port                      |   +8
        inet_bind_bucket_create            |   +1
        __inet_hash_connect                |   -8
       5 functions changed, 27 bytes added, 218 bytes removed, diff: -191
      
      net-2.6/net/core/sock.c:
        proto_seq_show                     |   +3
       1 function changed, 3 bytes added, diff: +3
      
      net-2.6/net/ipv4/inet_connection_sock.c:
        inet_csk_get_port                  |  +15
       1 function changed, 15 bytes added, diff: +15
      
      net-2.6/net/ipv4/tcp.c:
        tcp_set_state                      |   -7
       1 function changed, 7 bytes removed, diff: -7
      
      net-2.6/net/ipv4/tcp_ipv4.c:
        tcp_v4_get_port                    |  -31
        tcp_v4_hash                        |  -48
        tcp_v4_destroy_sock                |   -7
        tcp_v4_syn_recv_sock               |   -2
        tcp_unhash                         | -179
       5 functions changed, 267 bytes removed, diff: -267
      
      net-2.6/net/ipv6/inet6_hashtables.c:
        __inet6_hash |   +8
       1 function changed, 8 bytes added, diff: +8
      
      net-2.6/net/ipv4/inet_hashtables.c:
        inet_unhash                        | +190
        inet_hash                          | +242
       2 functions changed, 432 bytes added, diff: +432
      
      vmlinux:
       16 functions changed, 485 bytes added, 492 bytes removed, diff: -7
      
      /home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
        tcp_v6_get_port                    |  -31
        tcp_v6_hash                        |   -7
        tcp_v6_syn_recv_sock               |   -9
       3 functions changed, 47 bytes removed, diff: -47
      
      /home/acme/git/net-2.6/net/dccp/proto.c:
        dccp_destroy_sock                  |   -7
        dccp_unhash                        | -179
        dccp_hash                          |  -49
        dccp_set_state                     |   -7
        dccp_done                          |   +1
       5 functions changed, 1 bytes added, 242 bytes removed, diff: -241
      
      /home/acme/git/net-2.6/net/dccp/ipv4.c:
        dccp_v4_get_port                   |  -31
        dccp_v4_request_recv_sock          |   -2
       2 functions changed, 33 bytes removed, diff: -33
      
      /home/acme/git/net-2.6/net/dccp/ipv6.c:
        dccp_v6_get_port                   |  -31
        dccp_v6_hash                       |   -7
        dccp_v6_request_recv_sock          |   +5
       3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ab1e0a13
  19. 31 Jan, 2008 1 commit
  20. 25 Apr, 2007 1 commit
  21. 02 Dec, 2006 1 commit
  22. 26 Apr, 2006 1 commit
  23. 09 Apr, 2006 1 commit
  24. 03 Jan, 2006 2 commits
  25. 03 Oct, 2005 1 commit
    • Eric Dumazet's avatar
      [INET]: speedup inet (tcp/dccp) lookups · 81c3d547
      Eric Dumazet authored
      Arnaldo and I agreed it could be applied now, because I have other
      pending patches depending on this one (Thank you Arnaldo)
      
      (The other important patch moves skc_refcnt in a separate cache line,
      so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)
      
      1) First some performance data :
      --------------------------------
      
      tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()
      
      The most time critical code is :
      
      sk_for_each(sk, node, &head->chain) {
           if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
               goto hit; /* You sunk my battleship! */
      }
      
      The sk_for_each() does use prefetch() hints but only the begining of
      "struct sock" is prefetched.
      
      As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
      away from the begining of "struct sock", it has to bring into CPU
      cache cold cache line. Each iteration has to use at least 2 cache
      lines.
      
      This can be problematic if some chains are very long.
      
      2) The goal
      -----------
      
      The idea I had is to change things so that INET_MATCH() may return
      FALSE in 99% of cases only using the data already in the CPU cache,
      using one cache line per iteration.
      
      3) Description of the patch
      ---------------------------
      
      Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
      filling a 32 bits hole on 64 bits platform.
      
      struct sock_common {
      	unsigned short		skc_family;
      	volatile unsigned char	skc_state;
      	unsigned char		skc_reuse;
      	int			skc_bound_dev_if;
      	struct hlist_node	skc_node;
      	struct hlist_node	skc_bind_node;
      	atomic_t		skc_refcnt;
      +	unsigned int		skc_hash;
      	struct proto		*skc_prot;
      };
      
      Store in this 32 bits field the full hash, not masked by (ehash_size -
      1) Using this full hash as the first comparison done in INET_MATCH
      permits us immediatly skip the element without touching a second cache
      line in case of a miss.
      
      Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
      sk_hash and tw_hash) already contains the slot number if we mask with
      (ehash_size - 1)
      
      File include/net/inet_hashtables.h
      
      64 bits platforms :
      #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
           (((__sk)->sk_hash == (__hash))
           ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie))   &&  \
           ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
           (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
      
      32bits platforms:
      #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
           (((__sk)->sk_hash == (__hash))                 &&  \
           (inet_sk(__sk)->daddr          == (__saddr))   &&  \
           (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
           (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
      
      
      - Adds a prefetch(head->chain.first) in 
      __inet_lookup_established()/__tcp_v4_check_established() and 
      __inet6_lookup_established()/__tcp_v6_check_established() and 
      __dccp_v4_check_established() to bring into cache the first element of the 
      list, before the {read|write}_lock(&head->lock);
      Signed-off-by: default avatarEric Dumazet <dada1@cosmosbay.com>
      Acked-by: default avatarArnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      81c3d547
  26. 29 Aug, 2005 2 commits