
  1. 22 Sep, 2009 1 commit
  2. 29 Aug, 2009 1 commit
  3. 30 Jul, 2009 1 commit
    • xfrm: select sane defaults for xfrm[4|6] gc_thresh · a33bc5c1
      Neil Horman authored
      Choose saner defaults for xfrm[4|6] gc_thresh values on init
      
      Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
      (set to 1024).  Given that the ipv4 and ipv6 routing caches are sized
      dynamically at boot time, the static selections can be nonsensical.
      This patch dynamically selects an appropriate gc threshold based on
      the corresponding main routing table size, using the assumption that
      we should in the worst case be able to handle as many connections as
      the routing table can.
      
      For ipv4, the maximum route cache size is 16 * the number of hash
      buckets in the route cache.  Given that xfrm4 starts garbage
      collection at the gc_thresh and prevents new allocations at 2 *
      gc_thresh, we set gc_thresh to half the maximum route cache size.
      
      For ipv6, it's a bit trickier.  There is no maximum route cache size,
      but the ipv6 dst_ops gc_thresh is statically set to 1024.  It seems
      sane to select a similar gc_thresh for the xfrm6 code that is half
      the number of hash buckets in the v6 route cache times 16 (like the v4
      code does).
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
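The sizing rule in this commit message can be checked with a little arithmetic. The following is a minimal sketch in Python, not the kernel code; the function names and the example bucket count are hypothetical, and only the factor of 16 and the halving come from the message.

```python
# Illustrative arithmetic only; these names are hypothetical, not kernel symbols.


def route_cache_max_size(hash_buckets: int) -> int:
    """Maximum route cache size: 16 entries per hash bucket (per the commit message)."""
    return 16 * hash_buckets


def xfrm_gc_thresh(hash_buckets: int) -> int:
    """Half the maximum, so that the hard limit of 2 * gc_thresh (where new
    allocations are refused) coincides with the maximum route cache size."""
    return route_cache_max_size(hash_buckets) // 2


buckets = 4096  # example value, chosen arbitrarily
thresh = xfrm_gc_thresh(buckets)
print(thresh, 2 * thresh == route_cache_max_size(buckets))
```

With this choice, garbage collection starts at half the route cache capacity and allocation is refused exactly at capacity.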
  4. 23 Jun, 2009 1 commit
    • ipv4 routing: Ensure that route cache entries are usable and reclaimable when caching is off · b6280b47
      Neil Horman authored
      When route caching is disabled (rt_caching returns false), we still use route
      cache entries that are created and passed into rt_intern_hash once.  These
      routes need to be made usable for the one call path that holds a reference to
      them, and they need to be reclaimed when they're finished with their use.  To be
      made usable, they need to be associated with a neighbor table entry (which they
      currently are not), otherwise iproute_finish2 just discards the packet, since we
      don't know which L2 peer to send the packet to.  To do this binding, we need to
      follow the path a bit higher up in rt_intern_hash, which calls
      arp_bind_neighbour, but not assign the route entry to the hash table.
      Currently, if caching is off, we simply assign the route to the rp pointer and
      return success.  This patch associates us with a neighbor entry first.
      
      Secondly, we need to make sure that any single use routes like this are known to
      the garbage collector when caching is off.  If caching is off, and we try to
      hash in a route, it will leak when its refcount reaches zero.  To avoid this,
      this patch calls rt_free on the route cache entry passed into rt_intern_hash.
      This places us on the gc list for the route cache garbage collector, so that
      when its refcount reaches zero, it will be reclaimed (Thanks to Alexey for this
      suggestion).
      
      I've tested this on a local system here, and with these patches in place, I'm
      able to maintain routed connectivity to remote systems, even if I set
      /proc/sys/net/ipv4/rt_cache_rebuild_count to -1, which forces rt_caching to
      return false.
      Signed-off-by: Neil Horman <nhorman@redhat.com>
      Reported-by: Jarek Poplawski <jarkao2@gmail.com>
      Reported-by: Maxime Bizon <mbizon@freebox.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
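The two requirements in this message (bind the route to a neighbour, then make it reclaimable) can be modelled with a toy example. This is a rough Python sketch of the described control flow, not the kernel implementation: `Route`, `intern_route`, `put_route`, the `neighbours` dictionary and the `gc_list` are all hypothetical stand-ins for `rt_intern_hash`, `arp_bind_neighbour` and `rt_free`.

```python
class Route:
    def __init__(self, gateway):
        self.gateway = gateway
        self.neighbour = None
        self.refcount = 1       # held by the single lookup that requested it
        self.reclaimed = False


def intern_route(route, caching, cache, neighbours, gc_list):
    """Return (err, rp), loosely mirroring the flow described above."""
    # Bind to an L2 neighbour first, whether or not caching is on
    # (stand-in for arp_bind_neighbour).
    route.neighbour = neighbours[route.gateway]
    if not caching:
        # Do not hash the route, but make it known to the collector
        # (stand-in for rt_free) so it can be reclaimed after its one use.
        gc_list.append(route)
        return 0, route
    cache.append(route)
    return 0, route


def put_route(route, gc_list):
    """Drop one reference; reclaim a single-use route when none remain."""
    route.refcount -= 1
    if route.refcount == 0 and route in gc_list:
        gc_list.remove(route)
        route.reclaimed = True
```

In this model a route created with caching off is usable by its one caller, never enters the cache, and is reclaimed when that caller releases it.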
  5. 20 Jun, 2009 1 commit
    • ipv4: fix NULL pointer + success return in route lookup path · 73e42897
      Neil Horman authored
      Don't drop route if we're not caching

      I recently got a report of an oops on a route lookup.  Maxime was
      testing what would happen if route caching was turned off (doing so by
      making rt_caching always return 0), and found that it triggered an oops.  I
      looked at it and found that the problem stemmed from the fact that the route
      lookup routines were returning success from their lookup paths (which is good),
      but never set the **rp pointer to anything (which is bad).  This happens because
      in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0.
      This almost emulates silent success.  What we should be doing is assigning *rp =
      rt and _not_ dropping the route.  This way, during slow path lookups, when we
      create a new route cache entry, we don't immediately discard it, rather we just
      don't add it into the cache hash table, but we let this one lookup use it for
      the purpose of this route request.  Maxime has tested and reports it prevents
      the oops.  There is still a subsequent routing issue that I'm looking into
      further, but I'm confident that, even if it's related to this same path, this
      patch makes sense to take.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
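The failure mode described here, a lookup that reports success but hands back no route, is easy to reproduce in miniature. The Python sketch below is a simplified model under stated assumptions: the function and class names are hypothetical, the cache is a plain dictionary, and the `(err, rp)` tuple stands in for the return code plus the `*rp` out-parameter.

```python
class Route:
    def __init__(self, dst):
        self.dst = dst
        self.dropped = False


def intern_hash_before(cache, route, caching):
    """Behaviour the message describes as broken."""
    if not caching:
        route.dropped = True    # stand-in for rt_drop
        return 0, None          # success, yet the caller's pointer is never set
    cache[route.dst] = route
    return 0, route


def intern_hash_after(cache, route, caching):
    """Behaviour after the fix: hand the route back without hashing it."""
    if not caching:
        return 0, route         # stand-in for *rp = rt, with no drop
    cache[route.dst] = route
    return 0, route
```

A caller that trusts the zero return code and dereferences the result fails with the first version and works with the second, while the cache stays empty in both.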
  6. 14 Jun, 2009 1 commit
  7. 03 Jun, 2009 2 commits
  8. 20 May, 2009 2 commits
    • net: fix rtable leak in net/ipv4/route.c · 1ddbcb00
      Eric Dumazet authored
      Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
      analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
      Quoted here because it's a perfect one:
      
      begin_of_quotation
       2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
       patch has at least one critical flaw, and another problem.
      
       rt_intern_hash calculates rthi pointer, which is later used for new entry
       insertion. The same loop calculates cand pointer which is used to clean the
       list. If the pointers are the same, rtable leak occurs, as first the cand is
       removed then the new entry is appended to it.
      
       This leak leads to unregister_netdevice problem (usage count > 0).
      
       Another problem of the patch is that it tries to insert the entries in certain
       order, to facilitate counting of entries distinct by all but QoS parameters.
       Unfortunately, referencing an existing rtable entry moves it to list beginning,
       to speed up further lookups, so the carefully built order is destroyed.
      
       For the first problem the simplest patch is to set rthi=0 when rthi==cand, but
       it will also destroy the ordering.
      end_of_quotation
      
      Problematic commit is 1080d709
      (net: implement emergency route cache rebulds when gc_elasticity is exceeded)
      
      Trying to keep dst_entries ordered is too complex and breaks the fact that
      order should depend on the frequency of use for garbage collection.
      
      A possible fix is to make rt_intern_hash() simpler, and only make
      rt_check_expire() a little bit smarter, able to cope with an arbitrary
      entry order. The added loop runs on cache-hot data while the CPU
      is prefetching the next object, so it should go unnoticed.
      Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
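The leak in the quoted analysis comes from inserting a new entry after a node that has just been unlinked. The following Python sketch models that on a toy singly linked chain. It illustrates the quoted analysis and its "simplest patch" (clearing rthi when it equals cand), not the fix this commit actually applies; `Entry`, `build`, `names` and `insert` are hypothetical helpers.

```python
class Entry:
    def __init__(self, name):
        self.name = name
        self.next = None


def build(*labels):
    """Create linked entries and return them as a list (the first is the head)."""
    entries = [Entry(label) for label in labels]
    for a, b in zip(entries, entries[1:]):
        a.next = b
    return entries


def names(head):
    out = []
    while head is not None:
        out.append(head.name)
        head = head.next
    return out


def insert(head, new, cand, rthi, guard):
    """Evict cand, then insert new after rthi (or at the head if rthi is None)."""
    if guard and rthi is cand:
        rthi = None                      # the "simplest patch" from the quote
    if cand is not None:                 # unlink the eviction candidate
        if head is cand:
            head = cand.next
        else:
            prev = head
            while prev.next is not cand:
                prev = prev.next
            prev.next = cand.next
    if rthi is not None:                 # insert after rthi ...
        new.next = rthi.next
        rthi.next = new
    else:                                # ... or at the front of the chain
        new.next = head
        head = new
    return head
```

Without the guard, a new entry inserted after an evicted node is unreachable from the chain head, which is the leak; with the guard it lands at the front, which is why the quote notes that the ordering is lost.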
    • net: fix length computation in rt_check_expire() · cf8da764
      Eric Dumazet authored
      rt_check_expire() computes the average and standard deviation of chain lengths,
      but does not correctly reset length to 0 at the beginning of each chain.
      This probably gives overflows for sum2 (and sum) on loaded machines instead
      of meaningful results.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
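The effect of the missing reset can be seen in a small model. This is a hedged Python sketch, assuming the statistics are accumulated as a running sum and sum of squares of per-chain lengths. The function name and the `reset_per_chain` switch are hypothetical, and the kernel uses integer arithmetic rather than floats.

```python
import math


def chain_length_stats(chains, reset_per_chain=True):
    """Return (average, standard deviation) of chain lengths.

    With reset_per_chain=False the counter carries over between chains,
    which models the bug described above.
    """
    total = 0       # running sum of lengths ("sum")
    total_sq = 0    # running sum of squared lengths ("sum2")
    length = 0
    for chain in chains:
        if reset_per_chain:
            length = 0
        for _ in chain:
            length += 1
        total += length
        total_sq += length * length
    n = len(chains)
    avg = total / n
    variance = max(total_sq / n - avg * avg, 0.0)
    return avg, math.sqrt(variance)


chains = [["a", "b"], ["c", "d"], ["e", "f"]]
print(chain_length_stats(chains))                         # three chains of length 2
print(chain_length_stats(chains, reset_per_chain=False))  # lengths seen as 2, 4, 6
```

Without the reset, each chain appears as long as all chains before it combined, so the sums grow quadratically with the number of chains.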
  9. 27 Apr, 2009 1 commit
    • ipv4: Limit size of route cache hash table · c9503e0f
      Anton Blanchard authored
      Right now we have no upper limit on the size of the route cache hash table.
      On a 128GB POWER6 box it ends up as 32MB:
      
          IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)
      
      It would be nice to cap this for memory consumption reasons, but a massive
      hashtable also causes a significant spike when measuring OS jitter.
      
      With a 32MB hashtable and 4 million entries, rt_worker_func is taking
      5 ms to complete. On another system with more memory it's taking 14 ms.
      Even though rt_worker_func does call cond_resched() to limit its impact,
      in an HPC environment we want to keep all sources of OS jitter to a minimum.
      
      With the patch applied we limit the number of entries to 512k which
      can still be overridden by using the rt_entries boot option:
      
          IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)
      
      With this patch rt_worker_func now takes 0.460 ms on the same system.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
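The two boot messages quoted in this commit are consistent with simple arithmetic. The Python sketch below reproduces them under two assumptions inferred from the numbers rather than stated in the message: 8 bytes per hash bucket (33554432 / 4194304) and a 64 KiB page size (order 9 means 512 pages). The function name is hypothetical.

```python
def hash_table_footprint(entries, bucket_bytes=8, page_bytes=65536):
    """Return (size in bytes, allocation order) for a hash table.

    bucket_bytes and page_bytes are assumptions inferred from the quoted
    boot messages, not values taken from the kernel source.
    """
    size = entries * bucket_bytes
    pages = -(-size // page_bytes)       # ceiling division
    order = (pages - 1).bit_length()     # smallest order with 2**order >= pages
    return size, order


print(hash_table_footprint(4194304))  # uncapped table on the 128GB box
print(hash_table_footprint(524288))   # capped at 512k entries
```

Capping at 512k entries cuts the table from 32MB to 4MB, an eightfold reduction in the memory that rt_worker_func has to walk.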
  10. 25 Feb, 2009 1 commit
  11. 01 Feb, 2009 1 commit
  12. 22 Jan, 2009 1 commit
    • netns: ipmr: enable namespace support in ipv4 multicast routing code · 4feb88e5
      Benjamin Thery authored
      This last patch makes the appropriate changes to use and propagate the
      network namespace where needed in IPv4 multicast routing code.
      
      This consists mainly of replacing all the remaining init_net occurrences
      with current netns pointer retrieved from sockets, net devices or
      mfc_caches depending on the routines' contexts.
      
      Some routines receive a new 'struct net' parameter to propagate the current
      netns:
      * vif_add/vif_delete
      * ipmr_new_tunnel
      * mroute_clean_tables
      * ipmr_cache_find
      * ipmr_cache_report
      * ipmr_cache_unresolved
      * ipmr_mfc_add/ipmr_mfc_delete
      * ipmr_get_route
      * rt_fill_info (in route.c)
      Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 29 Dec, 2008 1 commit
  14. 25 Nov, 2008 1 commit
  15. 11 Nov, 2008 1 commit
  16. 03 Nov, 2008 1 commit
    • net: '&' redux · 6d9f239a
      Alexey Dobriyan authored
      I want to compile out proc_* and sysctl_* handlers totally and
      stub them to NULL depending on config options, however usage of &
      will prevent this, since taking the address of a NULL pointer will break
      compilation.
      
      So, drop & in front of every ->proc_handler and every ->strategy
      handler, it was never needed in fact.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 31 Oct, 2008 1 commit
  18. 28 Oct, 2008 2 commits
  19. 27 Oct, 2008 1 commit
    • net: implement emergency route cache rebulds when gc_elasticity is exceeded · 1080d709
      Neil Horman authored
      This is a patch to provide on-demand route cache rebuilding.  Currently, our
      route cache is rebuilt periodically regardless of need.  This introduces
      unneeded periodic latency.  This patch offers a better approach.  Using code
      provided by Eric Dumazet, we compute the standard deviation of the average hash
      bucket chain length while running rt_check_expire.  Should any given chain
      length grow larger than the average plus 4 standard deviations, we trigger an
      emergency hash table rebuild for that net namespace.  This allows for the common
      case in which chains are well behaved and do not grow unevenly to not incur any
      latency at all, while those systems (which may be being maliciously attacked)
      only rebuild when the attack is detected.  This patch takes 2 other factors into
      account:
      1) chains with multiple entries that differ by attributes that do not affect the
      hash value are only counted once, so as not to unduly bias the system toward
      rebuilding if features like QoS are heavily used
      2) if rebuilding crosses a certain threshold (which is adjustable via the added
      sysctl in this patch), route caching is disabled entirely for that net
      namespace, since constant rebuilding is less efficient than no caching at all
      
      Tested successfully by me.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
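The trigger condition and the two extra factors in this message can be sketched in a few lines. This is an illustrative Python model, not the kernel logic: all function names are hypothetical, entries are represented as `(hash_key, other_attributes)` tuples, floating point replaces the kernel's integer arithmetic, and the `<=` comparison in `caching_enabled` is an assumption about how the rebuild counter is checked against the sysctl limit.

```python
import math


def distinct_chain_length(entries):
    """Count entries that differ only in non-hash attributes (e.g. QoS) once."""
    return len({hash_key for hash_key, _ in entries})


def rebuild_threshold(chain_lengths, sd_multiplier=4):
    """Average chain length plus sd_multiplier standard deviations."""
    n = len(chain_lengths)
    avg = sum(chain_lengths) / n
    variance = sum(l * l for l in chain_lengths) / n - avg * avg
    return avg + sd_multiplier * math.sqrt(max(variance, 0.0))


def needs_emergency_rebuild(chain_lengths):
    """True if any chain has grown beyond the statistical threshold."""
    threshold = rebuild_threshold(chain_lengths)
    return any(length > threshold for length in chain_lengths)


def caching_enabled(rebuilds_so_far, rebuild_limit):
    """Assumed check: caching stays on while rebuilds have not exceeded the limit."""
    return rebuilds_so_far <= rebuild_limit
```

For example, 99 chains of length 2 plus one chain of length 50 triggers a rebuild, while a table whose chains are all 2 or 3 entries long does not. Under the assumed comparison, a limit of -1 turns caching off immediately.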
  20. 16 Oct, 2008 2 commits
  21. 01 Oct, 2008 1 commit
  22. 28 Aug, 2008 1 commit
  23. 27 Aug, 2008 1 commit
  24. 25 Aug, 2008 1 commit
  25. 15 Aug, 2008 1 commit
    • ipv4: Disable route secret interval on zero interval · c6153b5b
      Herbert Xu authored
      Let me first state that disabling the route cache hash rebuild
      should not be done without extensive analysis on the risk profile
      and careful deliberation.
      
      However, there are times when this can be done safely or for
      testing.  For example, when you have mechanisms for ensuring
      that offending parties do not exist in your network.
      
      This patch lets the user disable the rebuild if the interval is
      set to zero.  This also incidentally fixes a divide-by-zero error
      with name-spaces.
      
      In addition, this patch makes the effect of an interval change
      immediate rather than it taking effect at the next rebuild as
      is currently the case.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
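The zero-interval behaviour can be illustrated with a small timer calculation. The Python sketch below rests on an assumption: that the rebuild delay is the interval plus a random jitter taken modulo the interval, which is one way a zero interval could cause a divide-by-zero. The function name and parameters are hypothetical.

```python
def next_rebuild_delay(secret_interval: int, random_value: int):
    """Return the delay until the next hash rebuild, or None if disabled.

    The jitter formula is an assumed model; only the "zero disables the
    rebuild" rule comes from the commit message.
    """
    if secret_interval == 0:
        return None  # rebuild disabled; also avoids random_value % 0
    return secret_interval + random_value % secret_interval


print(next_rebuild_delay(600, 1250))  # 600 plus a jitter of 1250 % 600
print(next_rebuild_delay(0, 1250))    # disabled
```

Checking for zero before computing the jitter handles both points in the message: the rebuild can be switched off, and the modulo never sees a zero divisor.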
  26. 06 Aug, 2008 2 commits
  27. 01 Aug, 2008 1 commit
  28. 31 Jul, 2008 1 commit
    • net/ipv4/route.c: fix build error · 8a9204db
      Ingo Molnar authored
      fix:
      
      net/ipv4/route.c: In function 'ip_static_sysctl_init':
      net/ipv4/route.c:3225: error: 'ipv4_route_path' undeclared (first use in this function)
      net/ipv4/route.c:3225: error: (Each undeclared identifier is reported only once
      net/ipv4/route.c:3225: error: for each function it appears in.)
      net/ipv4/route.c:3225: error: 'ipv4_route_table' undeclared (first use in this function)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 27 Jul, 2008 2 commits
    • missing bits of net-namespace / sysctl · eeb61f71
      Al Viro authored
      Piss-poor sysctl registration API strikes again, film at 11...
      
      What we really need is _pathname_ required to be present in already
      registered table, so that kernel could warn about bad order.  That's the
      next target for sysctl stuff (and generally saner and more explicit
      order of initialization of ipv[46] internals wouldn't hurt either).
      
      For the time being, here are full fixups required by ..._rotable()
      stuff; we make per-net sysctl sets descendants of "ro" one and make sure
      that sufficient skeleton is there before we start registering per-net
      sysctls.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: missing bits of net-namespace / sysctl · 6f9f489a
      Al Viro authored
      Piss-poor sysctl registration API strikes again, film at 11...
      What we really need is _pathname_ required to be present in
      already registered table, so that kernel could warn about bad
      order.  That's the next target for sysctl stuff (and generally
      saner and more explicit order of initialization of ipv[46]
      internals wouldn't hurt either).
      
      For the time being, here are full fixups required by ..._rotable()
      stuff; we make per-net sysctl sets descendants of "ro" one and
      make sure that sufficient skeleton is there before we start registering
      per-net sysctls.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 26 Jul, 2008 1 commit
  31. 16 Jul, 2008 1 commit
  32. 08 Jul, 2008 1 commit
  33. 05 Jul, 2008 2 commits