    [IPV4] tcp/route: Another look at hash table sizes · 18955cfc
    Mike Stroyan authored
    
    
      The tcp_ehash hash table gets too big on systems with really big memory.
    It is worse on systems with pages larger than 4KB.  It wastes memory that
    could be better used.  It also makes the netstat command slow because reading
    /proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table.
    
      The default value should not be larger for larger page sizes.  It seems
    that the effect of page size is an unintended error dating back a long
    time.  I also wonder if the default value really should be a larger
    fraction of memory for systems with more memory.  While systems with
    really big RAM can afford more space for hash tables, it is not clear to
    me that they benefit from increasing the allocation ratio for this table.
    
      The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and
    mm/page_alloc.c:alloc_large_system_hash.
    
    tcp_init calls alloc_large_system_hash passing parameters-
        bucketsize=sizeof(struct tcp_ehash_bucket)
        numentries=thash_entries
        scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT)
        limit=0
    
    On i386, PAGE_SHIFT is 12 for a page size of 4K
    On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K
    
    The num_physpages test above makes the allocation take a larger fraction
    of the total memory on systems with larger memory.  The threshold size
    for an i386 system is 512MB.  For an ia64 system with 16KB pages the
    threshold is 2GB.
    
    For smaller memory systems-
    On i386, scale = (27 - 12) = 15
    On ia64, scale = (27 - 14) = 13
    For larger memory systems-
    On i386, scale = (25 - 12) = 13
    On ia64, scale = (25 - 14) = 11
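
    The scale values and thresholds above can be reproduced with a small
    stand-alone sketch.  The helper names ehash_scale and threshold_mb are
    made up for illustration and are not kernel functions; only the
    expression inside ehash_scale is taken from the tcp_init call quoted
    above.

    ```c
    #include <assert.h>

    /*
     * Hypothetical helper: the scale expression tcp_init passes to
     * alloc_large_system_hash.  A smaller scale means more buckets
     * per byte of memory.
     */
    static int ehash_scale(unsigned long num_physpages, int page_shift)
    {
        assert(page_shift > 0 && page_shift < 25);
        return (num_physpages >= 128UL * 1024) ? (25 - page_shift)
                                               : (27 - page_shift);
    }

    /*
     * Hypothetical helper: memory size in megabytes at which the
     * num_physpages test switches to the larger-memory scale.
     */
    static unsigned long threshold_mb(int page_shift)
    {
        assert(page_shift > 0 && page_shift <= 20);
        return (128UL * 1024) >> (20 - page_shift);
    }
    ```

    With PAGE_SHIFT 12 the threshold works out to 512 (MB) and with
    PAGE_SHIFT 14 to 2048 (MB), matching the figures given earlier.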
    
      For the rest of this discussion, I'll just track the larger memory case.
    
      The default behavior has numentries=thash_entries=0, so the allocated
    size is determined by either scale or by the default limit of 1/16 of
    total memory.
    
    In alloc_large_system_hash-
    |	numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages;
    |	numentries += (1UL << (20 - PAGE_SHIFT)) - 1;
    |	numentries >>= 20 - PAGE_SHIFT;
    |	numentries <<= 20 - PAGE_SHIFT;
    
      At this point, numentries is the number of pages in all of memory,
    rounded up to a megabyte boundary.
    
    |	/* limit to 1 bucket per 2^scale bytes of low memory */
    |	if (scale > PAGE_SHIFT)
    |		numentries >>= (scale - PAGE_SHIFT);
    |	else
    |		numentries <<= (PAGE_SHIFT - scale);
    
    On i386, numentries >>= (13 - 12), so numentries is 1/8192 of
    the number of bytes of total memory.
    On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of
    the number of bytes of total memory.
    
    |        log2qty = long_log2(numentries);
    |
    |        do {
    |                size = bucketsize << log2qty;
    
    bucketsize is 16, so size is 16 times numentries, rounded
    down to a power of two.
    
    On i386, size is 1/512 of bytes of total memory.
    On ia64, size is 1/128 of bytes of total memory.
    
    For smaller systems the results are
    On i386, size is 1/2048 of bytes of total memory.
    On ia64, size is 1/512 of bytes of total memory.
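
    The arithmetic above can be checked outside the kernel with a minimal
    sketch.  It assumes the quoted lines are the only steps that matter: it
    leaves out the limit check and the retry loop of the real function, and
    floor_log2 and ehash_bytes are illustrative names, with floor_log2
    standing in for long_log2.

    ```c
    #include <assert.h>

    /* floor(log2(n)) for n > 0; stand-in for the kernel's long_log2(). */
    static int floor_log2(unsigned long long n)
    {
        int log2 = 0;

        assert(n > 0);
        while (n >>= 1)
            log2++;
        return log2;
    }

    /*
     * Hypothetical helper: bytes requested for the hash table, following
     * the quoted steps and ignoring the limit and allocation retries.
     */
    static unsigned long long ehash_bytes(unsigned long long nr_pages,
                                          int page_shift, int scale,
                                          unsigned long long bucketsize)
    {
        unsigned long long numentries = nr_pages;

        assert(page_shift > 0 && page_shift <= 20);

        /* round the page count up to a megabyte boundary */
        numentries += (1ULL << (20 - page_shift)) - 1;
        numentries >>= 20 - page_shift;
        numentries <<= 20 - page_shift;

        /* one bucket per 2^scale bytes of memory */
        if (scale > page_shift)
            numentries >>= (scale - page_shift);
        else
            numentries <<= (page_shift - scale);

        /* bucket count rounded down to a power of two */
        return bucketsize << floor_log2(numentries);
    }
    ```

    For 4GB of memory and a bucketsize of 16, this gives 8MB with
    PAGE_SHIFT 12 and scale 13 (1/512 of memory) and 32MB with PAGE_SHIFT 14
    and scale 11 (1/128 of memory).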
    
      The large page effect can be removed by just replacing
    the use of PAGE_SHIFT with a constant of 12 in the calls to
    alloc_large_system_hash.  That makes them more like the other uses of
    that function from fs/inode.c and fs/dcache.c.
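
    A rough sketch of the effect of that change, assuming PAGE_SHIFT is
    replaced literally by 12 in the scale expression.  fixed_scale and
    buckets are hypothetical names, and buckets skips the megabyte rounding
    and the power-of-two truncation for brevity.

    ```c
    #include <assert.h>

    /* Hypothetical helper: scale with PAGE_SHIFT replaced by 12. */
    static int fixed_scale(unsigned long num_physpages)
    {
        return (num_physpages >= 128UL * 1024) ? (25 - 12) : (27 - 12);
    }

    /* Hypothetical helper: one bucket per 2^scale bytes of memory. */
    static unsigned long long buckets(unsigned long long nr_pages,
                                      int page_shift, int scale)
    {
        assert(page_shift > 0 && scale > 0);
        if (scale > page_shift)
            return nr_pages >> (scale - page_shift);
        return nr_pages << (page_shift - scale);
    }
    ```

    For 4GB of memory, buckets(1ULL << 20, 12, fixed_scale(1UL << 20)) and
    buckets(1ULL << 18, 14, fixed_scale(1UL << 18)) give the same bucket
    count, so the table size no longer depends on the page size.  The
    num_physpages threshold is still counted in pages, so it continues to
    differ between page sizes.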
    
    Signed-off-by: David S. Miller <davem@davemloft.net>