Skip to content
  • Evgeniy Polyakov's avatar
    inet: Allowing more than 64k connections and heavily optimize bind(0) time. · a9d8f911
    Evgeniy Polyakov authored
    
    
    With simple extension to the binding mechanism, which allows to bind more
    than 64k sockets (or smaller amount, depending on sysctl parameters),
    we have to traverse the whole bind hash table to find out empty bucket.
    And while it is not a problem for example for 32k connections, bind()
    completion time grows exponentially (since after each successful binding
    we have to traverse one bucket more to find empty one) even if we start
    each time from random offset inside the hash table.
    
    So, when hash table is full, and we want to add another socket, we have
    to traverse the whole table no matter what, so effectivelly this will be
    the worst case performance and it will be constant.
    
    Attached picture shows bind() time depending on number of already bound
    sockets.
    
    Green area corresponds to the usual binding to zero port process, which
    turns on kernel port selection as described above. Red area is the bind
    process, when number of reuse-bound sockets is not limited by 64k (or
    sysctl parameters). The same exponential growth (hidden by the green
    area) before number of ports reaches sysctl limit.
    
    At this time bind hash table has exactly one reuse-enbaled socket in a
    bucket, but it is possible that they have different addresses. Actually
    kernel selects the first port to try randomly, so at the beginning bind
    will take roughly constant time, but with time number of port to check
    after random start will increase. And that will have exponential growth,
    but because of above random selection, not every next port selection
    will necessary take longer time than previous. So we have to consider
    the area below in the graph (if you could zoom it, you could find, that
    there are many different times placed there), so area can hide another.
    
    Blue area corresponds to the port selection optimization.
    
    This is rather simple design approach: hashtable now maintains (unprecise
    and racely updated) number of currently bound sockets, and when number
    of such sockets becomes greater than predefined value (I use maximum
    port range defined by sysctls), we stop traversing the whole bind hash
    table and just stop at first matching bucket after random start. Above
    limit roughly corresponds to the case, when bind hash table is full and
    we turned on mechanism of allowing to bind more reuse-enabled sockets,
    so it does not change behaviour of other sockets.
    
    Signed-off-by: default avatarEvgeniy Polyakov <zbr@ioremap.net>
    Tested-by: default avatarDenys Fedoryschenko <denys@visp.net.lb>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    a9d8f911