  • af_unix: fix 'poll for write'/ connected DGRAM sockets · 3c73419c
    Rainer Weikusat authored
    
    
    The unix_dgram_sendmsg routine implements a (somewhat crude)
    form of receiver-imposed flow control by comparing the length of the
    receive queue of the 'peer socket' with the max_ack_backlog value
    stored in the corresponding sock structure, either blocking
    the thread which caused the send-routine to be called or returning
    EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
    sockets. The poll-implementation for these socket types is
    datagram_poll from core/datagram.c. A socket is deemed to be writeable
    by this routine when the memory presently consumed by datagrams
    owned by it is less than the configured socket send buffer size. This
    is always wrong for connected PF_UNIX non-stream sockets when the
    abovementioned receive queue is currently considered to be full.
    'poll' will then return, indicating that the socket is writeable, but
    a subsequent write results in EAGAIN, effectively causing a
    (usual) application to 'poll for writeability by repeated send request
    with O_NONBLOCK set' until it has consumed its time quantum.
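
The mismatch described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct model_sock`, `generic_poll_writable` and `send_would_block` are hypothetical names, the fields merely mirror the quantities named in the text, and the 'greater than' comparison against max_ack_backlog is an assumption about how 'full' is defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few per-socket values the text mentions. */
struct model_sock {
    unsigned int wmem_alloc;      /* memory consumed by datagrams owned by this socket */
    unsigned int sndbuf;          /* configured send buffer size */
    unsigned int recv_queue_len;  /* datagrams waiting in this socket's receive queue */
    unsigned int max_ack_backlog; /* receiver-imposed limit on that queue */
    struct model_sock *peer;      /* connected peer, NULL if unconnected */
};

/* Writeability as the generic poll routine is described: sender-side memory only. */
static bool generic_poll_writable(const struct model_sock *sk)
{
    assert(sk != NULL);
    return sk->wmem_alloc < sk->sndbuf;
}

/* Flow control as the send routine is described: a non-blocking send
 * would get EAGAIN when the peer's receive queue is considered full.
 * Assumption: 'full' means the queue length exceeds max_ack_backlog. */
static bool send_would_block(const struct model_sock *sk)
{
    assert(sk != NULL);
    return sk->peer != NULL &&
           sk->peer->recv_queue_len > sk->peer->max_ack_backlog;
}
```

With a peer whose receive queue is over its limit and a sender that is well below its send buffer size, both predicates are true at once: 'poll' reports the socket writeable while the send would still fail with EAGAIN.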
    
    The change below uses a suitably modified variant of the datagram_poll
    routine for both types of PF_UNIX sockets, which tests if the
    recv-queue of the peer a socket is connected to is presently
    considered to be 'full' as part of the 'is this socket
    writeable'-checking code. The socket being polled is additionally
    put onto the peer_wait wait queue associated with its peer, because the
    unix_dgram_recvmsg routine does a wake up on this queue after a
    datagram was received and the 'other wakeup call' is done implicitly
    as part of skb destruction, meaning, a process blocked in poll
    because of a full peer receive queue could otherwise sleep forever
    if no datagram owned by its socket was already sitting on this queue.
    This change also includes a small (inline) helper routine named
    'unix_recvq_full', which consolidates the actual testing code
    (previously in three different places) into a single location.
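
A minimal user-space sketch of the modified writeability check follows. It is not the patch itself: `struct sketch_sock`, `recvq_full` and `dgram_poll_writable` are hypothetical names standing in for the kernel structures and for 'unix_recvq_full', and the comparison used for 'full' is an assumption. The registration on the peer_wait wait queue is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-socket values used by the check. */
struct sketch_sock {
    unsigned int wmem_alloc;      /* memory consumed by datagrams owned by this socket */
    unsigned int sndbuf;          /* configured send buffer size */
    unsigned int recv_queue_len;  /* datagrams waiting in this socket's receive queue */
    unsigned int max_ack_backlog; /* receiver-imposed limit on that queue */
    struct sketch_sock *peer;     /* connected peer, NULL if unconnected */
};

/* Single place for the 'is the receive queue full' test.
 * Assumption: 'full' means the queue length exceeds max_ack_backlog. */
static inline bool recvq_full(const struct sketch_sock *sk)
{
    assert(sk != NULL);
    return sk->recv_queue_len > sk->max_ack_backlog;
}

/* Writeable only if the sender-side memory test passes AND, for a
 * connected socket, the peer's receive queue is not considered full. */
static bool dgram_poll_writable(const struct sketch_sock *sk)
{
    assert(sk != NULL);
    if (sk->wmem_alloc >= sk->sndbuf)
        return false;
    if (sk->peer != NULL && recvq_full(sk->peer))
        return false;
    return true;
}
```

In this model an unconnected socket is judged by the send buffer test alone, while a connected one stops being reported as writeable as soon as its peer's queue goes over the limit, which matches the condition under which the send routine would block or return EAGAIN.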
    
    Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>