1. 14 Apr, 2016 1 commit
    • soreuseport: fix ordering for mixed v4/v6 sockets · d894ba18
      Craig Gallek authored
      With the SO_REUSEPORT socket option, it is possible to create sockets
      in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address.
      This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on
      the AF_INET6 sockets.
      
      Prior to the commits referenced below, an incoming IPv4 packet would
      always be routed to a socket of type AF_INET when this mixed-mode was used.
      After those changes, the same packet would be routed to the most recently
      bound socket (if this happened to be an AF_INET6 socket, it would
      have an IPv4 mapped IPv6 address).
      
      The change in behavior occurred because the recent SO_REUSEPORT optimizations
      short-circuit the socket scoring logic as soon as they find a match.  They
      did not take into account the scoring logic that favors AF_INET sockets
      over AF_INET6 sockets in the event of a tie.
      
      To fix this problem, this patch changes the insertion order of AF_INET
      and AF_INET6 addresses in the TCP and UDP socket lists when the sockets
      have SO_REUSEPORT set.  AF_INET sockets will be inserted at the head of the
      list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at
      the tail of the list.  This will force AF_INET sockets to always be
      considered first.
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection")
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
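The insertion-order fix described in the soreuseport commit above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `struct model_sock`, `struct model_list`, `model_insert()` and `model_first_match()` are hypothetical names invented for illustration, and the model assumes every socket in the list has SO_REUSEPORT set and would score equally for the incoming IPv4 packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's structures. */
enum model_family { FAM_INET, FAM_INET6 };

struct model_sock {
	enum model_family family;
	int id;
	struct model_sock *next;
};

struct model_list {
	struct model_sock *head;
	struct model_sock *tail;
};

/*
 * Insert a socket the way the commit message describes:
 * AF_INET sockets go to the head, AF_INET6 sockets go to the tail.
 */
static void model_insert(struct model_list *l, struct model_sock *sk)
{
	assert(l != NULL && sk != NULL);
	sk->next = NULL;
	if (l->head == NULL) {
		l->head = l->tail = sk;
	} else if (sk->family == FAM_INET) {
		sk->next = l->head;
		l->head = sk;
	} else {
		l->tail->next = sk;
		l->tail = sk;
	}
}

/*
 * Stand-in for a lookup that stops at the first matching socket.
 * Every socket in this model is assumed to match, so the list head wins.
 */
static struct model_sock *model_first_match(const struct model_list *l)
{
	return l->head;
}
```

With this ordering, a lookup that short-circuits on the first match reaches an AF_INET socket before any AF_INET6 socket, regardless of the order in which the sockets were bound.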
  2. 02 Jun, 2013 1 commit
  3. 19 Aug, 2010 1 commit
  4. 25 Feb, 2010 1 commit
    • rcu: Disable lockdep checking in RCU list-traversal primitives · 3120438a
      Paul E. McKenney authored
      The theory is that use of bare rcu_dereference() is more prone
      to error than use of the RCU list-traversal primitives.
      Therefore, disable lockdep RCU read-side critical-section
      checking in these primitives for the time being.  Once all of
      the rcu_dereference() uses have been dealt with, it may be time
      to re-enable lockdep checking for the RCU list-traversal
      primitives.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-4-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 19 Sep, 2009 1 commit
    • rcu: Fix whitespace inconsistencies · a71fca58
      Paul E. McKenney authored
      Fix a number of whitespace errors in the include/linux/rcu*
      and the kernel/rcu* files.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      LKML-Reference: <20090918172819.GA24405@linux.vnet.ibm.com>
      [ did more checkpatch fixlets ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 16 Nov, 2008 1 commit
    • rcu: Introduce hlist_nulls variant of hlist · bbaffaca
      Eric Dumazet authored
      hlist uses a NULL value to terminate a chain.
      
      The hlist_nulls variant uses the low-order bit set to 1 to signal an
      end-of-list marker.
      
      This allows many different end markers to be stored, so that some RCU lockless
      algorithms (used in the TCP/UDP stack, for example) can save some memory
      barriers in fast paths.
      
      Two new files are added:
      
      include/linux/list_nulls.h
        - mimics the hlist part of include/linux/list.h, adapted to the hlist_nulls variant
      
      include/linux/rculist_nulls.h
        - mimics the hlist part of include/linux/rculist.h, adapted to the hlist_nulls variant
      
         Only four helpers are declared for the moment:
      
           hlist_nulls_del_init_rcu(), hlist_nulls_del_rcu(),
           hlist_nulls_add_head_rcu() and hlist_nulls_for_each_entry_rcu()
      
      The prefetch() calls were removed, since the end of a list is no longer a
      NULL value; prefetching an end marker could trigger useless (and possibly
      dangerous) memory transactions.
      
      Example of use (extracted from __udp4_lib_lookup())
      
              struct sock *sk, *result;
              struct hlist_nulls_node *node;
              unsigned short hnum = ntohs(dport);
              unsigned int hash = udp_hashfn(net, hnum);
              struct udp_hslot *hslot = &udptable->hash[hash];
              int score, badness;
      
              rcu_read_lock();
      begin:
              result = NULL;
              badness = -1;
              sk_nulls_for_each_rcu(sk, node, &hslot->head) {
                      score = compute_score(sk, net, saddr, hnum, sport,
                                            daddr, dport, dif);
                      if (score > badness) {
                              result = sk;
                              badness = score;
                      }
              }
              /*
               * if the nulls value we got at the end of this lookup is
               * not the expected one, we must restart lookup.
               * We probably met an item that was moved to another chain.
               */
              if (get_nulls_value(node) != hash)
                      goto begin;
      
              if (result) {
                      if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
                              result = NULL;
                      else if (unlikely(compute_score(result, net, saddr, hnum, sport,
                                        daddr, dport, dif) < badness)) {
                              sock_put(result);
                              goto begin;
                      }
              }
              rcu_read_unlock();
              return result;
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
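The end-marker idea from the hlist_nulls commit above can be sketched in plain userspace C. This is a simplified model, not the kernel implementation: `struct nnode`, `make_nulls()`, `is_nulls()`, `nulls_value()` and `chain_find()` are hypothetical names, the list is singly linked, and no RCU or memory-ordering concerns are modelled. It assumes node addresses are at least 2-byte aligned, so a real pointer never has its low-order bit set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a nulls-terminated chain. */
struct nnode {
	struct nnode *next;	/* real pointer, or an odd-valued end marker */
	int key;
};

/* Encode an end marker: low-order bit set to 1, value in the upper bits. */
static struct nnode *make_nulls(unsigned long value)
{
	return (struct nnode *)(uintptr_t)((value << 1) | 1UL);
}

/* A "pointer" with the low-order bit set is an end marker, not a node. */
static int is_nulls(const struct nnode *p)
{
	return (int)((uintptr_t)p & 1UL);
}

/* Recover the value stored in an end marker. */
static unsigned long nulls_value(const struct nnode *p)
{
	return (unsigned long)((uintptr_t)p >> 1);
}

/*
 * Walk a chain looking for key. On return, *end_value holds the value of
 * the marker that terminated the walk. A caller that expected a different
 * value (for example the hash of the chain it started on) knows the walk
 * drifted onto another chain and should restart the lookup.
 */
static struct nnode *chain_find(struct nnode *first, int key,
				unsigned long *end_value)
{
	struct nnode *p = first;
	struct nnode *found = NULL;

	while (!is_nulls(p)) {
		if (found == NULL && p->key == key)
			found = p;
		p = p->next;
	}
	*end_value = nulls_value(p);
	return found;
}
```

A caller would compare the returned end value with the hash of the chain it started from and retry on a mismatch, which is the role of the `get_nulls_value(node) != hash` check in the example quoted in the commit message.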