    net: speedup sk_wake_async() · bcdce719
    Eric Dumazet authored
    An incoming datagram must bring a *lot* of cache lines into the CPU cache,
    in particular (other parts such as hash chains and the IP route cache are omitted):
    
    On 32-bit arches:
    
    offsetof(struct sock, sk_rcvbuf)        = 0x30   (read)
    offsetof(struct sock, sk_lock)          = 0x34   (rw)

    offsetof(struct sock, sk_sleep)         = 0x50   (read)
    offsetof(struct sock, sk_rmem_alloc)    = 0x64   (rw)
    offsetof(struct sock, sk_receive_queue) = 0x74   (rw)

    offsetof(struct sock, sk_forward_alloc) = 0x98   (rw)

    offsetof(struct sock, sk_callback_lock) = 0xcc   (rw)
    offsetof(struct sock, sk_drops)         = 0xd8   (read if we add dropcount support, rw if frame dropped)
    offsetof(struct sock, sk_filter)        = 0xf8   (read)

    offsetof(struct sock, sk_socket)        = 0x138  (read)

    offsetof(struct sock, sk_data_ready)    = 0x15c  (read)
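The grouping of the offsets above is consistent with 64-byte cache lines, although the message does not state the line size. A minimal sketch, assuming that size, of how the offsets map to cache lines; the helper names and the array are made up for illustration and are not kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; the commit message does not state the size. */
#define ASSUMED_CACHE_LINE_SIZE 64u

/* Index of the cache line that contains the given byte offset. */
static size_t cache_line_index(size_t offset, size_t line_size)
{
    assert(line_size > 0);
    return offset / line_size;
}

/* Count how many distinct cache lines an ascending list of offsets touches. */
static size_t count_cache_lines(const size_t *offsets, size_t n, size_t line_size)
{
    size_t count = 0;
    size_t prev = 0;

    for (size_t i = 0; i < n; i++) {
        size_t line = cache_line_index(offsets[i], line_size);

        if (i == 0 || line != prev)
            count++;
        prev = line;
    }
    return count;
}

/* Offsets quoted above for 32-bit arches, in ascending order. */
static const size_t sock_offsets_32bit[] = {
    0x30, 0x34, 0x50, 0x64, 0x74, 0x98, 0xcc, 0xd8, 0xf8, 0x138, 0x15c,
};
```

Under this assumption the eleven fields span six cache lines, and sk_socket (0x138) is the only listed field in its line, which is why skipping that access can save a whole cache line load.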
    
    
    We can avoid referencing sk->sk_socket and socket->fasync_list on sockets
    with no fasync() structures. (The socket->fasync_list pointer is probably already in cache
    because it shares a cache line with socket->wait, i.e. the location pointed to by sk->sk_sleep.)
    
    This avoids one cache line load per incoming packet in the common case (no fasync()).
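One way to get this effect, sketched in user-space C, is to mirror "fasync is in use" into a flag word that the receive path already reads, and to test that flag before touching the cold pointer. All struct, flag and function names below are hypothetical stand-ins, and the message above does not say that the patch is implemented exactly this way:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's structs. */
struct model_socket {
    void *fasync_list;            /* non-NULL when fasync() was requested */
    int wake_calls;               /* counts slow-path invocations */
};

struct model_sock {
    unsigned long flags;          /* hot: already read on the receive path */
    struct model_socket *socket;  /* cold: assumed to sit in another cache line */
};

#define MODEL_FLAG_FASYNC (1UL << 0)  /* hypothetical flag bit */

/* Mirror "this socket has fasync structures" into the hot flags word. */
static void model_set_fasync(struct model_sock *sk, bool on)
{
    assert(sk != NULL);
    if (on)
        sk->flags |= MODEL_FLAG_FASYNC;
    else
        sk->flags &= ~MODEL_FLAG_FASYNC;
}

/* Returns true if the slow path ran, false if it was skipped. */
static bool model_wake_async(struct model_sock *sk)
{
    assert(sk != NULL);
    if (!(sk->flags & MODEL_FLAG_FASYNC))
        return false;             /* common case: sk->socket is never dereferenced */
    sk->socket->wake_calls++;     /* rare case: pay for the cold access */
    return true;
}
```

In this model, a socket that never enabled fasync returns from model_wake_async() after a single test on the flags word, so the cold socket pointer is not loaded at all.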
    
    We can leave sk->sk_socket in a cold location (or even move it to one in a future patch).

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
sock.h 45.4 KB