1. 26 Jun, 2013 7 commits
  2. 20 Jun, 2013 9 commits
    • netfilter: xt_socket: add XT_SOCKET_NOWILDCARD flag · 681f130f
      Eric Dumazet authored
      The xt_socket module can be a nice replacement for the conntrack module
      in some cases (SYN filtering, for example).
      
      But it lacks the ability to match the third packet of the TCP
      handshake (the ACK coming from the client).
      
      Add an XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism.
      
      The wildcard is the legacy socket match behavior, which ignores
      LISTEN sockets bound to INADDR_ANY (or the IPv6 equivalent).
      
      iptables -I INPUT -p tcp --syn -j SYN_CHAIN
      iptables -I INPUT -m socket --nowildcard -j ACCEPT
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
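The wildcard rule described in this message can be modeled in a few lines of userspace C. This is only a sketch of the decision, not the kernel's xt_socket code: `struct sketch_sock`, `SKETCH_NOWILDCARD`, `SKETCH_INADDR_ANY`, and `sketch_socket_matches()` are hypothetical names invented for illustration, and the flag value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SKETCH_NOWILDCARD 0x2u
#define SKETCH_INADDR_ANY 0u

struct sketch_sock {
    uint32_t bound_addr; /* local address the socket is bound to */
};

/*
 * Returns true when a socket found for the packet counts as a match.
 * Legacy behavior: a socket bound to the wildcard address is ignored.
 * With the NOWILDCARD flag set, that exclusion is switched off.
 */
static bool sketch_socket_matches(const struct sketch_sock *sk, unsigned flags)
{
    assert(sk);

    bool wildcard = !(flags & SKETCH_NOWILDCARD) &&
                    sk->bound_addr == SKETCH_INADDR_ANY;

    return !wildcard;
}
```

Under this model, a listener bound to the wildcard address only matches when the flag is set, which is what lets the final ACK of the handshake reach the ACCEPT rule.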
    • netfilter: nf_conntrack: avoid large timeout for mid-stream pickup · 6547a221
      Florian Westphal authored
      When loose tracking is enabled (the default), non-SYN packets create
      new conntracks in the established state with the default timeout for
      that state (5 days).  This causes the table to fill up with UNREPLIED
      entries when the 'new ack' packet happens to be the last ACK of a
      previous, already timed-out connection.
      
      Consider:
      
      A 192.168.x.52792 > 10.184.y.80: F, 426:426(0) ack 9237 win 255
      B 10.184.y.80 > 192.168.x.52792: ., ack 427 win 123
      <61 second pause>
      C 10.184.y.80 > 192.168.x.52792: F, 9237:9237(0) ack 427 win 123
      D 192.168.x.52792 > 10.184.y.80: ., ack 9238 win 255
      
      B moves conntrack to CLOSE_WAIT and will kill it after 60 second timeout,
      C is ignored (FIN set), but last packet (D) causes new ct with 5-days timeout.
      
      Use UNACK timeout (5 minutes) instead to get rid of these entries sooner
      when in ESTABLISHED state without having seen traffic in both directions.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
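The timeout choice described in this message can be sketched as a small userspace function. It is a simplified model under stated assumptions, not the conntrack implementation: the names `sketch_established_timeout()` and the `SKETCH_TIMEOUT_*` constants are hypothetical, and the values are the 5 days and 5 minutes quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Timeouts in seconds, as quoted in the commit message (hypothetical names). */
enum {
    SKETCH_TIMEOUT_ESTABLISHED = 5 * 24 * 60 * 60, /* 5 days */
    SKETCH_TIMEOUT_UNACK       = 5 * 60            /* 5 minutes */
};

/*
 * Timeout for an entry in the established state.
 * If traffic has only been seen in one direction (no reply yet),
 * cap the timeout at the shorter UNACK value so that stray
 * mid-stream pickups expire quickly.
 */
static int sketch_established_timeout(bool seen_reply)
{
    int timeout = SKETCH_TIMEOUT_ESTABLISHED;

    assert(SKETCH_TIMEOUT_UNACK < SKETCH_TIMEOUT_ESTABLISHED);

    if (!seen_reply && timeout > SKETCH_TIMEOUT_UNACK)
        timeout = SKETCH_TIMEOUT_UNACK;

    return timeout;
}
```

In the A to D trace above, packet D would create an entry that has seen no reply, so in this model it gets the 5-minute timeout rather than 5 days.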
    • netfilter: check return code from nla_parse_nested · 130ffbc2
      Daniel Borkmann authored
      These are the only calls under net/ that do not check nla_parse_nested()
      for its error code, but simply continue execution. If parsing of netlink
      attributes fails, we should return with an error instead of continuing.
      In nearly all of these calls we have a policy attached, that is being
      type verified during nla_parse_nested(), which we would miss checking
      for otherwise.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
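The pattern this change enforces (stop and return the parser's error instead of using a half-filled attribute table) can be illustrated with a toy userspace parser. Everything here is an assumption made for the example: `sketch_parse_nested()`, `sketch_get_port()`, the attribute layout, and the length-only policy are hypothetical and much simpler than the kernel's netlink code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ATTR_MAX 2

/* Toy attribute header: total length (header + payload) and type. */
struct sketch_hdr {
    uint16_t len;
    uint16_t type;
};

/* Fills tb[] with payload pointers; returns 0 or a negative errno value. */
static int sketch_parse_nested(const uint8_t *tb[], const uint8_t *buf,
                               size_t buflen, const uint16_t policy_len[])
{
    size_t off = 0;

    for (int i = 0; i <= SKETCH_ATTR_MAX; i++)
        tb[i] = NULL;

    while (off < buflen) {
        struct sketch_hdr h;

        if (buflen - off < sizeof(h))
            return -EINVAL;
        memcpy(&h, buf + off, sizeof(h));
        if (h.len < sizeof(h) || h.len > buflen - off)
            return -EINVAL;
        if (h.type <= SKETCH_ATTR_MAX) {
            /* Policy check: payload must have the expected size. */
            if (h.len - sizeof(h) != policy_len[h.type])
                return -EINVAL;
            tb[h.type] = buf + off + sizeof(h);
        }
        off += h.len;
    }
    return 0;
}

/* Caller that checks the return code before touching tb[]. */
static int sketch_get_port(const uint8_t *buf, size_t buflen, uint16_t *port)
{
    static const uint16_t policy_len[SKETCH_ATTR_MAX + 1] = { 0, 2, 4 };
    const uint8_t *tb[SKETCH_ATTR_MAX + 1];
    int err;

    assert(buf && port);

    err = sketch_parse_nested(tb, buf, buflen, policy_len);
    if (err < 0)
        return err; /* the kind of check the commit adds */

    if (!tb[1])
        return -ENOENT;
    memcpy(port, tb[1], sizeof(*port));
    return 0;
}
```

Without the `err < 0` check, a caller would continue with whatever the parser managed to fill in, and a policy violation would go unnoticed.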
    • ndisc: Convert use of typedef ctl_table to struct ctl_table · fedaf4ff
      Joe Perches authored
      This typedef is unnecessary and should just be removed.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Convert use of typedef ctl_table to struct ctl_table · 9e8cda3b
      Joe Perches authored
      This typedef is unnecessary and should just be removed.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frag, remove an empty ifdef. · af92e542
      Rami Rosen authored
      This patch removes an empty ifdef from inet_frag_intern()
      in net/ipv4/inet_fragment.c.
      
      Commit b67bfe0d (hlist: drop the node parameter from iterators)
      removed hlist from net/ipv4/inet_fragment.c, but did not remove the
      enclosing ifdef directive, which is now empty.
      Signed-off-by: Rami Rosen <ramirose@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • htb: refactor struct htb_sched fields for performance · c9364636
      Eric Dumazet authored
      htb_sched structures are big, and a source of false sharing on SMP.
      
      Every time a packet is queued or dequeued, many cache lines must be
      touched because the structures are not laid out properly.
      
      By carefully splitting htb_sched in two parts, and defining sub structures
      to increase data locality, we can improve performance dramatically on
      SMP.
      
      New htb_prio structure can also be used in htb_class to increase data
      locality.
      
      I got a 26% performance increase on a 24-thread machine, with 200
      concurrent netperf instances in TCP_RR mode, using an HTB hierarchy of 4 classes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
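The layout idea in this message (keep the fields touched on every enqueue and dequeue together, and move rarely touched fields onto their own cache line) can be sketched with C11 alignment specifiers. This is an illustration only: `struct sketch_sched`, its field names, and the 64-byte cache line size are assumptions, not the actual htb_sched layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache line size in bytes */

struct sketch_sched {
    /* Hot part: read and written on every enqueue/dequeue. */
    unsigned long now;
    unsigned int  direct_pkts;
    int           row_mask;

    /* Cold part: configuration and statistics, on its own cache line. */
    alignas(SKETCH_CACHE_LINE) int rate2quantum;
    int                defcls;
    unsigned long long overlimits;
};

/* Compile-time check that the cold part really starts a new cache line. */
static_assert(offsetof(struct sketch_sched, rate2quantum) % SKETCH_CACHE_LINE == 0,
              "cold fields must start on a cache line boundary");
```

With this split, a CPU working on the fast path touches one cache line for the hot fields, and updates to the cold fields do not invalidate that line on other CPUs.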
    • tcp: introduce a per-route knob for quick ack · bcefe17c
      Cong Wang authored
      In previous discussions, I tried to find some reasonable heuristics
      for delayed ACK, but this does not seem possible, according to Eric:
      
      	"ACKS might also be delayed because of bidirectional
      	traffic, and is more controlled by the application
      	response time. TCP stack can not easily estimate it."
      
      	"ACK can be incredibly useful to recover from losses in
      	a short time.
      
      	The vast majority of TCP sessions are small lived, and we
      	send one ACK per received segment anyway at beginning or
      	retransmits to let the sender smoothly increase its cwnd,
      	so an auto-tuning facility wont help them that much."
      
      and according to David:
      
      	"ACKs are the only information we have to detect loss.
      
      	And, for the same reasons that TCP VEGAS is fundamentally
      	broken, we cannot measure the pipe or some other
      	receiver-side-visible piece of information to determine
      	when it's "safe" to stretch ACK.
      
      	And even if it's "safe", we should not do it so that losses are
      	accurately detected and we don't spuriously retransmit.
      
      	The only way to know when the bandwidth increases is to
      	"test" it, by sending more and more packets until drops happen.
      	That's why all successful congestion control algorithms must
      	operate on explicitly tested pieces of information.
      
      	Similarly, it's not really possible to universally know if
      	it's safe to stretch ACK or not."
      
      It still makes sense to enable or disable quick ack mode, like
      what TCP_QUICKACK does.
      
      This knob is similar to the TCP_QUICKACK option, but is meant for
      people who can't modify the source code and still want to control
      TCP delayed ACK behavior. As David suggested, this should belong
      to per-path scope, since different paths may want different
      behaviors.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Rick Jones <rick.jones2@hp.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Graf <tgraf@suug.ch>
      CC: David Laight <David.Laight@ACULAB.COM>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
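For comparison, this is roughly what the per-socket alternative looks like when the application source can be changed. The sketch uses the Linux-specific TCP_QUICKACK socket option; the wrapper name `sketch_set_quickack()` is hypothetical, and the per-route knob added by this commit is configured through the routing table instead and is not shown here.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable (1) or disable (0) quick ack mode on a TCP socket.
 * Returns 0 on success, or -1 with errno set, as setsockopt() does.
 * Linux-specific: TCP_QUICKACK is not available on every platform.
 */
static int sketch_set_quickack(int fd, int enable)
{
    assert(fd >= 0);

    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK,
                      &enable, sizeof(enable));
}
```

The socket option is not permanent, so applications that depend on it typically set it again after receiving data. A per-route setting avoids both the source change and that repetition.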
    • Dave Jones's avatar
      2c0740e4
  3. 19 Jun, 2013 24 commits