Skip to content
  • Parthasarathy Bhuvaragan's avatar
    tipc: fix a race condition leading to subscriber refcnt bug · 333f7962
    Parthasarathy Bhuvaragan authored
    
    
    Until now, the requests sent to topology server are queued
    to a workqueue by the generic server framework.
    These messages are processed by worker threads and trigger the
    registered callbacks.
    To reduce latency on uniprocessor systems, explicit rescheduling
    is performed using cond_resched() after MAX_RECV_MSG_COUNT(25)
    messages.
    
    This implementation on SMP systems leads to an subscriber refcnt
    error as described below:
    When a worker thread yields by calling cond_resched() in a SMP
    system, a new worker is created on another CPU to process the
    pending workitem. Sometimes the sleeping thread wakes up before
    the new thread finishes execution.
    This breaks the assumption on ordering and being single threaded.
    The fault is more frequent when MAX_RECV_MSG_COUNT is lowered.
    
    If the first thread was processing subscription create and the
    second thread processing close(), the close request will free
    the subscriber and the create request oops as follows:
    
    [31.224137] WARNING: CPU: 2 PID: 266 at include/linux/kref.h:46 tipc_subscrb_rcv_cb+0x317/0x380         [tipc]
    [31.228143] CPU: 2 PID: 266 Comm: kworker/u8:1 Not tainted 4.5.0+ #97
    [31.228377] Workqueue: tipc_rcv tipc_recv_work [tipc]
    [...]
    [31.228377] Call Trace:
    [31.228377]  [<ffffffff812fbb6b>] dump_stack+0x4d/0x72
    [31.228377]  [<ffffffff8105a311>] __warn+0xd1/0xf0
    [31.228377]  [<ffffffff8105a3fd>] warn_slowpath_null+0x1d/0x20
    [31.228377]  [<ffffffffa0098067>] tipc_subscrb_rcv_cb+0x317/0x380 [tipc]
    [31.228377]  [<ffffffffa00a4984>] tipc_receive_from_sock+0xd4/0x130 [tipc]
    [31.228377]  [<ffffffffa00a439b>] tipc_recv_work+0x2b/0x50 [tipc]
    [31.228377]  [<ffffffff81071925>] process_one_work+0x145/0x3d0
    [31.246554] ---[ end trace c3882c9baa05a4fd ]---
    [31.248327] BUG: spinlock bad magic on CPU#2, kworker/u8:1/266
    [31.249119] BUG: unable to handle kernel NULL pointer dereference at 0000000000000428
    [31.249323] IP: [<ffffffff81099d0c>] spin_dump+0x5c/0xe0
    [31.249323] PGD 0
    [31.249323] Oops: 0000 [#1] SMP
    
    In this commit, we
    - rename tipc_conn_shutdown() to tipc_conn_release().
    - move connection release callback execution from tipc_close_conn()
      to a new function tipc_sock_release(), which is executed before
      we free the connection.
    Thus we release the subscriber during connection release procedure
    rather than connection shutdown procedure.
    
    Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
    Acked-by: default avatarYing Xue <ying.xue@windriver.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    333f7962