1. 26 Sep, 2010 1 commit
  2. 04 Jun, 2010 1 commit
  3. 17 May, 2010 2 commits
    • net: Introduce skb_tunnel_rx() helper · d19d56dd
      Eric Dumazet authored
      
      
      skb rxhash should be cleared when a skb is handled by a tunnel before
      being delivered again, so that correct packet steering can take place.
      
      There are other cleanups and accounting that we can factor out into a
      new helper, skb_tunnel_rx().

      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add a noref bit on skb dst · 7fee226a
      Eric Dumazet authored
      
      
      Use low order bit of skb->_skb_dst to tell dst is not refcounted.
      
      Change _skb_dst to _skb_refdst to make sure all uses are caught.
      
      skb_dst() returns the dst, regardless of noref bit set or not, but
      with a lockdep check to make sure a noref dst is not given if current
      user is not rcu protected.
      
      New skb_dst_set_noref() helper to set a non-refcounted dst on a skb
      (with lockdep check).
      
      skb_dst_drop() drops a reference only if skb dst was refcounted.
      
      skb_dst_force() helper is used to force a refcount on dst, when skb
      is queued and no longer RCU protected.
      
      Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
      !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
      sock_queue_rcv_skb(), in __nf_queue().
      
      Use skb_dst_force() in dev_requeue_skb().
      
      Note: dst_use_noref() still dirties dst, we might transform it
      later to do one dirtying per jiffies.

      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
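The low-order-bit trick described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `struct dst_sketch`, `struct skb_sketch` and every `sketch_*` helper are hypothetical stand-ins rather than the kernel's definitions, the refcount is a plain `int` instead of an atomic, and the lockdep/RCU checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; names are illustrative. */
struct dst_sketch {
    int refcnt;              /* plain int here; the kernel uses an atomic */
};

struct skb_sketch {
    uintptr_t refdst;        /* dst pointer, low bit used as "noref" flag */
};

#define SKETCH_DST_NOREF   ((uintptr_t)1)
#define SKETCH_DST_PTRMASK (~SKETCH_DST_NOREF)

/* Return the dst whether or not the noref bit is set. */
static inline struct dst_sketch *sketch_dst(const struct skb_sketch *skb)
{
    return (struct dst_sketch *)(skb->refdst & SKETCH_DST_PTRMASK);
}

static inline bool sketch_dst_is_noref(const struct skb_sketch *skb)
{
    return (skb->refdst & SKETCH_DST_NOREF) && sketch_dst(skb) != NULL;
}

/* Attach a dst without taking a reference; the caller must keep the dst
 * alive by other means (RCU protection in the kernel). */
static inline void sketch_dst_set_noref(struct skb_sketch *skb,
                                        struct dst_sketch *dst)
{
    /* The flag bit is only free if the pointer is at least 2-byte aligned. */
    assert(((uintptr_t)dst & SKETCH_DST_NOREF) == 0);
    skb->refdst = (uintptr_t)dst | SKETCH_DST_NOREF;
}

/* Upgrade a borrowed dst to a counted one, e.g. before queueing the skb. */
static inline void sketch_dst_force(struct skb_sketch *skb)
{
    if (sketch_dst_is_noref(skb)) {
        struct dst_sketch *dst = sketch_dst(skb);

        dst->refcnt++;
        skb->refdst = (uintptr_t)dst;   /* flag cleared: now refcounted */
    }
}

/* Drop a reference only if one was actually taken. */
static inline void sketch_dst_drop(struct skb_sketch *skb)
{
    struct dst_sketch *dst = sketch_dst(skb);

    if (dst && !(skb->refdst & SKETCH_DST_NOREF))
        dst->refcnt--;
    skb->refdst = 0;
}
```

A borrowed dst costs no refcount traffic on the fast path; only the paths that queue the skb pay for the increment through the force helper.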
  4. 13 Apr, 2010 1 commit
    • net: sk_dst_cache RCUification · b6c6712a
      Eric Dumazet authored
      
      
      With the latest CONFIG_PROVE_RCU work, I felt more comfortable making
      this change.
      
      sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock).
      
      This rwlock is readlocked for a very small amount of time, and dst
      entries are already freed after RCU grace period. This calls for RCU
      again :)
      
      This patch converts sk_dst_lock to a spinlock, and uses RCU for readers.
      
      __sk_dst_get() is supposed to be called under rcu_read_lock() or with
      the socket locked by the user, so use the appropriate
      rcu_dereference_check() condition
      (rcu_read_lock_held() || sock_owned_by_user(sk)).
      
      This patch avoids two atomic ops per tx packet on UDP connected sockets,
      for example, and permits sk_dst_lock to be much less dirtied.

      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 23 Dec, 2009 1 commit
    • net: Add rtnetlink init_rcvwnd to set the TCP initial receive window · 31d12926
      laurent chavey authored
      
      
      Add rtnetlink init_rcvwnd to set the TCP initial receive window size
      advertised by passive and active TCP connections.

      The current Linux TCP implementation limits the advertised TCP initial
      receive window to the one prescribed by slow start. For short-lived
      TCP connections used for transaction-type traffic (e.g. HTTP
      requests), bounding the advertised TCP initial receive window
      increases the latency to complete the transaction.

      Setting the initial congestion window is already supported
      using rtnetlink init_cwnd, but the feature is useless without the
      ability to set a larger TCP initial receive window.

      The rtnetlink init_rcvwnd allows increasing the TCP initial receive
      window, allowing TCP connections to advertise a larger TCP receive
      window than the one bounded by slow start.

      Signed-off-by: Laurent Chavey <chavey@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 15 Dec, 2009 1 commit
    • tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. · bb5b7c11
      David S. Miller authored
      It creates a regression, triggering badness for SYN_RECV
      sockets, for example:
      
      [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
      [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
      [19148.023035] REGS: eeecbd30 TRAP: 0700   Not tainted  (2.6.32)
      [19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002442  XER: 00000000
      [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
      
      This is likely caused by the change in the 'estab' parameter
      passed to tcp_parse_options() when invoked by the functions
      in net/ipv4/tcp_minisocks.c
      
      But even if that is fixed, the ->conn_request() changes made in
      this patch series are fundamentally wrong.  They try to use the
      listening socket's 'dst' to probe the route settings.  The
      listening socket doesn't even have a route, and you can't
      get the right route (the child request one) until much later
      after we setup all of the state, and it must be done by hand.
      
      This stuff really isn't ready, so the best thing to do is a
      full revert.  This reverts the following commits:
      
      f55017a9
      022c3f7d
      1aba721e
      cda42ebd
      345cda2f
      dc343475
      05eaade2
      6a2a2d6b
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 05 Nov, 2009 1 commit
  8. 04 Nov, 2009 1 commit
  9. 29 Oct, 2009 1 commit
  10. 20 Oct, 2009 1 commit
  11. 01 Sep, 2009 1 commit
    • netns: embed ip6_dst_ops directly · 86393e52
      Alexey Dobriyan authored
      
      
      struct net::ipv6.ip6_dst_ops is separately dynamically allocated,
      but there is no fundamental reason for it. Embed it directly into
      struct netns_ipv6.
      
      For that:
      * move struct dst_ops into a separate header to fix circular dependencies
      	(I honestly tried not to, but it's pretty much impossible to do any other way)
      * drop the dynamic allocation and allocate it together with the netns
      
      For a change, remove struct dst_ops::dst_net; it's deducible
      by using container_of() given a dst_ops pointer.

      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
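Once the ops structure is embedded rather than pointed to, the owning namespace can be recovered from a dst_ops pointer by pure pointer arithmetic. The following is a minimal userspace sketch of that idea, assuming simplified, hypothetical structures (`net_sketch`, `netns_ipv6_sketch`, `dst_ops_sketch`) and a portable re-definition of `container_of()`; the real kernel structures have many more members.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed versions of the kernel structures. */
struct dst_ops_sketch {
    int family;
};

struct netns_ipv6_sketch {
    int some_sysctl;                     /* placeholder member */
    struct dst_ops_sketch ip6_dst_ops;   /* embedded, not a pointer */
};

struct net_sketch {
    int id;                              /* placeholder member */
    struct netns_ipv6_sketch ipv6;
};

/* Recover the enclosing namespace from a pointer to its embedded ops,
 * which is why a back-pointer such as dst_net becomes unnecessary. */
static inline struct net_sketch *
sketch_dst_ops_to_net(struct dst_ops_sketch *ops)
{
    return container_of(ops, struct net_sketch, ipv6.ip6_dst_ops);
}
```

The conversion is only valid for ops objects that really are embedded in a `struct net_sketch`; handing it any other `dst_ops_sketch` pointer yields a bogus address.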
  12. 03 Jun, 2009 1 commit
  13. 25 Nov, 2008 1 commit
  14. 16 Nov, 2008 1 commit
    • net: make sure struct dst_entry refcount is aligned on 64 bytes · 5635c10d
      Eric Dumazet authored
      As found in the past (commit f1dd9c37, "[NET]: Fix tbench regression
      in 2.6.25-rc1"), it is really important that struct dst_entry refcount
      is aligned on a cache line.
      
      We cannot use __attribute__((aligned)), so manually pad the structure
      for 32 and 64 bit arches.
      
      for 32bit : offsetof(struct dst_entry, __refcnt) is 0x80
      for 64bit : offsetof(struct dst_entry, __refcnt) is 0xc0
      
      As it is not possible to guess at compile time cache line size,
      we use a generic value of 64 bytes, that satisfies many current arches.
      (Using 128 bytes alignment on 64bit arches would waste 64 bytes)
      
      Add a BUILD_BUG_ON to make sure future updates to "struct dst_entry"
      don't break this alignment.
      
      "tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
      (2350 MB/s instead of 2250 MB/s)

      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
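Manual padding combined with a compile-time offset check can be sketched in standard C11 as follows. The layout is hypothetical (`struct dst_entry_sketch` and its member sizes are invented for illustration), and `_Static_assert` stands in for the kernel's BUILD_BUG_ON; only the 64-byte figure comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64   /* generic value used by the commit */

/* Hypothetical layout: fixed-width members keep the offsets identical on
 * 32-bit and 64-bit builds, which keeps the sketch simple. */
struct dst_entry_sketch {
    uint64_t read_mostly[7];         /* 56 bytes of mostly-read fields */
    uint64_t pad_to_align_refcnt[1]; /* manual padding up to 64 bytes */

    /* Frequently written members start on their own 64-byte boundary. */
    int refcnt;
    int use;
    unsigned long lastuse;
};

/* Compile-time guard, playing the role of BUILD_BUG_ON: the build fails
 * if someone adds a member above and forgets to adjust the padding. */
_Static_assert(offsetof(struct dst_entry_sketch, refcnt) %
               SKETCH_CACHE_LINE == 0,
               "refcnt must start on a 64-byte boundary");

/* Small helper so the offset can also be inspected at run time. */
static inline size_t sketch_refcnt_offset(void)
{
    return offsetof(struct dst_entry_sketch, refcnt);
}
```

Note that an aligned offset only places `refcnt` on a cache-line boundary when the structure itself is allocated with at least 64-byte alignment.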
  15. 11 Nov, 2008 1 commit
  16. 28 Oct, 2008 1 commit
  17. 05 Aug, 2008 1 commit
  18. 19 Jul, 2008 1 commit
    • tcp: RTT metrics scaling · c1e20f7c
      Stephen Hemminger authored
      
      
      Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in
      kernel units (jiffies) and this leaks out through the netlink API to
      user space where the units for jiffies are unknown.
      
      This patch changes the kernel to convert to/from milliseconds. This
      changes the ABI, but milliseconds seemed like the most natural unit
      for these parameters.  Values available via syscall in
      /proc/net/rt_cache and netlink will be in milliseconds.

      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
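The unit problem can be illustrated with a pair of conversion helpers. This is a simplified sketch, not the kernel's conversion code: `SKETCH_HZ` is an assumed tick rate (the real value is a build-time kernel setting that user space cannot see, which is exactly why raw jiffies are ambiguous), and the helper names are hypothetical.

```c
/* Assumed tick rate for illustration only. */
#define SKETCH_HZ 250UL

/* Kernel-internal ticks -> milliseconds, as exported to user space.
 * Plain integer arithmetic; very large inputs could overflow. */
static inline unsigned long sketch_jiffies_to_msecs(unsigned long jiffies)
{
    return jiffies * 1000UL / SKETCH_HZ;
}

/* Milliseconds from user space -> kernel-internal ticks, rounding up so
 * that a small non-zero timeout never collapses to zero ticks. */
static inline unsigned long sketch_msecs_to_jiffies(unsigned long msecs)
{
    return (msecs * SKETCH_HZ + 999UL) / 1000UL;
}
```

With the assumed rate of 250 ticks per second, one tick corresponds to 4 ms, so the same stored value would mean something different on a kernel built with another tick rate unless it is converted at the API boundary.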
  19. 27 Mar, 2008 1 commit
  20. 12 Mar, 2008 1 commit
    • [NET]: Fix tbench regression in 2.6.25-rc1 · f1dd9c37
      Zhang Yanmin authored
      Compared with kernel 2.6.24, the tbench result regresses with
      2.6.25-rc1.
      
      1) On 2 quad-core processor stoakley: 4%.
      2) On 4 quad-core processor tigerton: more than 30%.
      
      Bisecting located the patch below.
      
      b4ce9277 is first bad commit
      commit b4ce9277
      Author: Herbert Xu <herbert@gondor.apana.org.au>
      Date:   Tue Nov 13 21:33:32 2007 -0800
      
          [IPV6]: Move nfheader_len into rt6_info
      
          The dst member nfheader_len is only used by IPv6.  It's also currently
          creating a rather ugly alignment hole in struct dst.  Therefore this patch
          moves it from there into struct rt6_info.
      
      The above patch changes the cache line alignment, especially of member
      __refcnt. I tested adding 2 unsigned long padding words before
      lastuse, so that the 3 members, lastuse/__refcnt/__use, are moved to
      the next cache line. The performance is recovered.
      
      I created a patch to rearrange the members in struct dst_entry.
      
      Following suggestions from Eric and Valdis Kletnieks, I refined the
      arrangement.
      
      1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So
         sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I
         tested many patches on my 16-core tigerton by moving tclassid to
         different places. It looks like tclassid could also have an impact
         on performance.  If tclassid is moved before metrics, or not moved
         at all, the performance isn't good. So I move it behind metrics.
      
      2) Add comments before __refcnt.
      
      On 16-core tigerton:
      
      If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18%
      better than the one without the patch;
      
      If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30%
      better than the one without the patch.
      
      With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't
      introduce regression.
      
      Thanks to Eric, Valdis, and David!

      Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
      Acked-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 28 Jan, 2008 9 commits
  22. 10 Nov, 2007 1 commit
  23. 10 Jul, 2007 1 commit
  24. 24 May, 2007 1 commit
    • [XFRM]: Allow packet drops during larval state resolution. · 14e50e57
      David S. Miller authored
      
      
      The current IPSEC rule resolution behavior we have does not work for a
      lot of people, even though technically it's an improvement from the
      -EAGAIN business we had before.
      
      Right now we'll block until the key manager resolves the route.  That
      works for simple cases, but many folks would rather packets get
      silently dropped until the key manager resolves the IPSEC rules.
      
      We can't tell these folks to "set the socket non-blocking" because
      they don't have control over the non-block setting of things like the
      sockets used to resolve DNS deep inside of the resolver libraries in
      libc.
      
      With that in mind I coded up the patch below with some help from
      Herbert Xu which provides packet-drop behavior during larval state
      resolution, controllable via sysctl and off by default.
      
      This lays the framework to either:
      
      1) Make this default at some point or...
      
      2) Move this logic into xfrm{4,6}_policy.c and implement the
         ARP-like resolution queue we've all been dreaming of.
         The idea would be to queue packets to the policy, then
         once the larval state is resolved by the key manager we
         re-resolve the route and push the packets out.  The
         packets would timeout if the rule didn't get resolved
         in a certain amount of time.

      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 11 Feb, 2007 2 commits
  26. 07 Dec, 2006 1 commit
  27. 28 Sep, 2006 1 commit
  28. 22 Sep, 2006 1 commit
  29. 26 Apr, 2006 1 commit
  30. 07 Jan, 2006 1 commit
    • [XFRM]: Netfilter IPsec output hooks · 16a6677f
      Patrick McHardy authored
      
      
      Call netfilter hooks before IPsec transforms. Packets visit the
      FORWARD/LOCAL_OUT and POST_ROUTING hook before the first encapsulation
      and the LOCAL_OUT and POST_ROUTING hook before each following tunnel mode
      transform.
      
      Patch from Herbert Xu <herbert@gondor.apana.org.au>:
      
      Move the loop from dst_output into xfrm4_output/xfrm6_output since they're
      the only ones who need it. xfrm{4,6}_output_one() processes the first SA
      and all subsequent transport mode SAs, and is called in a loop that calls
      the netfilter hooks between each two calls.
      
      In order to avoid the tail call issue, I've added the inline function
      nf_hook which is nf_hook_slow plus the empty list check.

      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
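The "slow function plus empty list check" pattern mentioned at the end of this commit message can be sketched as an inline wrapper around an out-of-line walker. Everything here is hypothetical: the `hook_sketch` list, the verdict convention (1 means continue, 0 means the packet was dropped) and the `sketch_*` names are invented for illustration and do not reproduce the netfilter API.

```c
#include <stddef.h>

/* Hypothetical hook list entry; fn returns 1 to accept, 0 to drop. */
struct hook_sketch {
    int (*fn)(void *pkt);
    struct hook_sketch *next;
};

/* Counts slow-path invocations, only to make the fast path observable. */
static int sketch_slow_calls;

/* Out-of-line slow path: walk every registered hook in order and stop
 * at the first one that drops the packet. */
static int sketch_hook_slow(struct hook_sketch *head, void *pkt)
{
    struct hook_sketch *h;

    sketch_slow_calls++;
    for (h = head; h != NULL; h = h->next)
        if (!h->fn(pkt))
            return 0;
    return 1;
}

/* Inline fast path: when no hooks are registered, skip the function
 * call entirely and let the caller continue. */
static inline int sketch_hook(struct hook_sketch *head, void *pkt)
{
    if (head == NULL)
        return 1;
    return sketch_hook_slow(head, pkt);
}

/* Example hook that drops everything, used to exercise the slow path. */
static int sketch_drop_all(void *pkt)
{
    (void)pkt;
    return 0;
}
```

The caller decides what to do with the verdict, so the wrapper itself never has to invoke a continuation function.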