1. 20 Jul, 2015 2 commits
  2. 14 May, 2015 1 commit
    • Jon Paul Maloy's avatar
      tipc: simplify include dependencies · a6bf70f7
      Jon Paul Maloy authored
      
      
      When we try to add new inline functions in the code, we sometimes
      run into circular include dependencies.
      
      The main problem is that the file core.h, which really should be at
      the root of the dependency chain, instead is a leaf. I.e., core.h
      includes a number of header files that themselves should be allowed
      to include core.h. In reality this is unnecessary, because core.h does
      not need to know the full signature of any of the structs it refers to,
      only their type declaration.
      
      In this commit, we remove all dependencies from core.h towards any
      other tipc header file.
      
      As a consequence of this change, we can now move the function
      tipc_own_addr(net) from addr.c to addr.h, and make it inline.
      
      There are no functional changes in this commit.
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6bf70f7
  3. 29 Mar, 2015 2 commits
    • Ying Xue's avatar
      tipc: involve reference counter for node structure · 8a0f6ebe
      Ying Xue authored
      
      
      TIPC node hash node table is protected with rcu lock on read side.
      tipc_node_find() is used to look for a node object with node address
      through iterating the hash node table. As the entire process of what
      tipc_node_find() traverses the table is guarded with rcu read lock,
      it's safe for us. However, when callers use the node object returned
      by tipc_node_find(), there is no rcu read lock applied. Therefore,
      this is absolutely unsafe for callers of tipc_node_find().
      
      Now we introduce a reference counter for node structure. Before
      tipc_node_find() returns node object to its caller, it first increases
      the reference counter. Accordingly, after its caller used it up,
      it decreases the counter again. This can prevent a node being used by
      one thread from being freed by another thread.
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericson.com>
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8a0f6ebe
    • Ying Xue's avatar
      tipc: fix potential deadlock when all links are reset · b952b2be
      Ying Xue authored
      
      
      [   60.988363] ======================================================
      [   60.988754] [ INFO: possible circular locking dependency detected ]
      [   60.989152] 3.19.0+ #194 Not tainted
      [   60.989377] -------------------------------------------------------
      [   60.989781] swapper/3/0 is trying to acquire lock:
      [   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.990743]
      [   60.990743] but task is already holding lock:
      [   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.991738]
      [   60.991738] which lock already depends on the new lock.
      [   60.991738]
      [   60.992174]
      [   60.992174] the existing dependency chain (in reverse order) is:
      [   60.992174]
      -> #1 (&(&bclink->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
      [   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
      [   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
      [   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
      [   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      -> #0 (&(&n_ptr->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
      [   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      [   60.992174] other info that might help us debug this:
      [   60.992174]
      [   60.992174]  Possible unsafe locking scenario:
      [   60.992174]
      [   60.992174]        CPU0                    CPU1
      [   60.992174]        ----                    ----
      [   60.992174]   lock(&(&bclink->lock)->rlock);
      [   60.992174]                                lock(&(&n_ptr->lock)->rlock);
      [   60.992174]                                lock(&(&bclink->lock)->rlock);
      [   60.992174]   lock(&(&n_ptr->lock)->rlock);
      [   60.992174]
      [   60.992174]  *** DEADLOCK ***
      [   60.992174]
      [   60.992174] 3 locks held by swapper/3/0:
      [   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
      [   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
      [   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]
      
      The correct the sequence of grabbing n_ptr->lock and bclink->lock
      should be that the former is first held and the latter is then taken,
      which exactly happened on CPU1. But especially when the retransmission
      of broadcast link is failed, bclink->lock is first held in
      tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
      called by tipc_link_retransmit() subsequently, which is demonstrated on
      CPU0. As a result, deadlock occurs.
      
      If the order of holding the two locks happening on CPU0 is reversed, the
      deadlock risk will be relieved. Therefore, the node lock taken in
      link_retransmit_failure() originally is moved to tipc_bclink_rcv()
      so that it's obtained before bclink lock. But the precondition of
      the adjustment of node lock is that responding to bclink reset event
      must be moved from tipc_bclink_unlock() to tipc_node_unlock().
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b952b2be
  4. 14 Mar, 2015 2 commits
  5. 09 Feb, 2015 2 commits
  6. 05 Feb, 2015 2 commits
    • Jon Paul Maloy's avatar
      tipc: eliminate race condition at multicast reception · cb1b7280
      Jon Paul Maloy authored
      
      
      In a previous commit in this series we resolved a race problem during
      unicast message reception.
      
      Here, we resolve the same problem at multicast reception. We apply the
      same technique: an input queue serializing the delivery of arriving
      buffers. The main difference is that here we do it in two steps.
      First, the broadcast link feeds arriving buffers into the tail of an
      arrival queue, which head is consumed at the socket level, and where
      destination lookup is performed. Second, if the lookup is successful,
      the resulting buffer clones are fed into a second queue, the input
      queue. This queue is consumed at reception in the socket just like
      in the unicast case. Both queues are protected by the same lock, -the
      one of the input queue.
      Reviewed-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb1b7280
    • Jon Paul Maloy's avatar
      tipc: resolve race problem at unicast message reception · c637c103
      Jon Paul Maloy authored
      
      
      TIPC handles message cardinality and sequencing at the link layer,
      before passing messages upwards to the destination sockets. During the
      upcall from link to socket no locks are held. It is therefore possible,
      and we see it happen occasionally, that messages arriving in different
      threads and delivered in sequence still bypass each other before they
      reach the destination socket. This must not happen, since it violates
      the sequentiality guarantee.
      
      We solve this by adding a new input buffer queue to the link structure.
      Arriving messages are added safely to the tail of that queue by the
      link, while the head of the queue is consumed, also safely, by the
      receiving socket. Sequentiality is secured per socket by only allowing
      buffers to be dequeued inside the socket lock. Since there may be multiple
      simultaneous readers of the queue, we use a 'filter' parameter to reduce
      the risk that they peek the same buffer from the queue, hence also
      reducing the risk of contention on the receiving socket locks.
      
      This solves the sequentiality problem, and seems to cause no measurable
      performance degradation.
      
      A nice side effect of this change is that lock handling in the functions
      tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
      will enable future simplifications of those functions.
      Reviewed-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c637c103
  7. 12 Jan, 2015 1 commit
    • Ying Xue's avatar
      tipc: make tipc node table aware of net namespace · f2f9800d
      Ying Xue authored
      
      
      Global variables associated with node table are below:
      - node table list (node_htable)
      - node hash table list (tipc_node_list)
      - node table lock (node_list_lock)
      - node number counter (tipc_num_nodes)
      - node link number counter (tipc_num_links)
      
      To make node table support namespace, above global variables must be
      moved to tipc_net structure in order to keep secret for different
      namespaces. As a consequence, these variables are allocated and
      initialized when namespace is created, and deallocated when namespace
      is destroyed. After the change, functions associated with these
      variables have to utilize a namespace pointer to access them. So
      adding namespace pointer as a parameter of these functions is the
      major change made in the commit.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Tested-by: default avatarTero Aho <Tero.Aho@coriant.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f2f9800d
  8. 26 Nov, 2014 2 commits
  9. 21 Nov, 2014 1 commit
  10. 21 Oct, 2014 1 commit
    • Ying Xue's avatar
      tipc: fix a potential deadlock · 7b8613e0
      Ying Xue authored
      
      
      Locking dependency detected below possible unsafe locking scenario:
      
                 CPU0                          CPU1
      T0:  tipc_named_rcv()                tipc_rcv()
      T1:  [grab nametble write lock]*     [grab node lock]*
      T2:  tipc_update_nametbl()           tipc_node_link_up()
      T3:  tipc_nodesub_subscribe()        tipc_nametbl_publish()
      T4:  [grab node lock]*               [grab nametble write lock]*
      
      The opposite order of holding nametbl write lock and node lock on
      above two different paths may result in a deadlock. If we move the
      the updating of the name table after link state named out of node
      lock, the reverse order of holding locks will be eliminated, and
      as a result, the deadlock risk.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b8613e0
  11. 07 Oct, 2014 1 commit
    • Jon Maloy's avatar
      tipc: fix bug in multicast congestion handling · 908344cd
      Jon Maloy authored
      One aim of commit 50100a5e
      
       ("tipc:
      use pseudo message to wake up sockets after link congestion") was
      to handle link congestion abatement in a uniform way for both unicast
      and multicast transmit. However, the latter doesn't work correctly,
      and has been broken since the referenced commit was applied.
      
      If a user now sends a burst of multicast messages that is big
      enough to cause broadcast link congestion, it will be put to sleep,
      and not be waked up when the congestion abates as it should be.
      
      This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
      is set correctly, but in the wrong field. Instead of setting it in the
      'action_flags' field of the arrival node struct, it is by mistake set
      in the dummy node struct that is owned by the broadcast link, where it
      will never tested for. Second, we cannot use the same flag for waking
      up unicast and multicast users, since the function tipc_node_unlock()
      needs to pick the wakeup pseudo messages to deliver from different
      queues. It must hence be able to distinguish between the two cases.
      
      This commit solves this problem by adding a new flag
      TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
      The latter is to be called by tipc_node_unlock() when the named flag,
      now set in the correct field, is encountered.
      
      v2: using explicit 'unsigned int' declaration instead of 'uint', as
      per comment from David Miller.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      908344cd
  12. 23 Aug, 2014 2 commits
    • Jon Paul Maloy's avatar
      tipc: use message to abort connections when losing contact to node · 02be61a9
      Jon Paul Maloy authored
      
      
      In the current implementation, each 'struct tipc_node' instance keeps
      a linked list of those ports/sockets that are connected to the node
      represented by that struct. The purpose of this is to let the node
      object know which sockets to alert when it loses contact with its peer
      node, i.e., which sockets need to have their connections aborted.
      
      This entails an unwanted direct reference from the node structure
      back to the port/socket structure, and a need to grab port_lock
      when we have to make an upcall to the port. We want to get rid of
      this unecessary BH entry point into the socket, and also eliminate
      its use of port_lock.
      
      In this commit, we instead let the node struct keep list of "connected
      socket" structs, which each represents a connected socket, but is
      allocated independently by the node at the moment of connection. If
      the node loses contact with its peer node, the list is traversed, and
      a "connection abort" message is created for each entry in the list. The
      message is sent to it respective connected socket using the ordinary
      data path, and the receiving socket aborts its connections upon reception
      of the message.
      
      This enables us to get rid of the direct reference from 'struct node' to
      ´struct port', and another unwanted BH access point to the latter.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      02be61a9
    • Jon Paul Maloy's avatar
      tipc: use pseudo message to wake up sockets after link congestion · 50100a5e
      Jon Paul Maloy authored
      
      
      The current link implementation keeps a linked list of blocked ports/
      sockets that is populated when there is link congestion. The purpose
      of this is to let the link know which users to wake up when the
      congestion abates.
      
      This adds unnecessary complexity to the data structure and the code,
      since it forces us to involve the link each time we want to delete
      a socket. It also forces us to grab the spinlock port_lock within
      the scope of node_lock. We want to get rid of this direct dependence,
      as well as the deadlock hazard resulting from the usage of port_lock.
      
      In this commit, we instead let the link keep list of a "wakeup" pseudo
      messages for use in such situations. Those messages are sent to the
      pending sockets via the ordinary message reception path, and wake up
      the socket's owner when they are received.
      
      This enables us to get rid of the 'waiting_ports' linked lists in struct
      tipc_port that manifest this direct reference. As a consequence, we can
      eliminate another BH entry into the socket, and hence the need to grab
      port_lock. This is a further step in our effort to remove port_lock
      altogether.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      50100a5e
  13. 27 Jun, 2014 1 commit
  14. 14 May, 2014 1 commit
  15. 08 May, 2014 1 commit
  16. 05 May, 2014 5 commits
  17. 26 Apr, 2014 1 commit
  18. 27 Mar, 2014 2 commits
  19. 07 Jan, 2014 1 commit
    • Jon Paul Maloy's avatar
      tipc: remove 'has_redundant_link' flag from STATE link protocol messages · b9d4c339
      Jon Paul Maloy authored
      The flag 'has_redundant_link' is defined only in RESET and ACTIVATE
      protocol messages. Due to an ambiguity in the protocol specification it
      is currently also transferred in STATE messages. Its value is used to
      initialize a link state variable, 'permit_changeover', which is used
      to inhibit futile link failover attempts when it is known that the
      peer node has no working links at the moment, although the local node
      may still think it has one.
      
      The fact that 'has_redundant_link' incorrectly is read from STATE
      messages has the effect that 'permit_changeover' sometimes gets a wrong
      value, and permanently blocks any links from being re-established. Such
      failures can only occur in in dual-link systems, and are extremely rare.
      This bug seems to have always been present in the code.
      
      Furthermore, since commit b4b56102
      
      
      ("tipc: Ensure both nodes recognize loss of contact between them"),
      the 'permit_changeover' field serves no purpose any more. The task of
      enforcing 'lost contact' cycles at both peer endpoints is now taken
      by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN
      in struct tipc_node to abort unnecessary failover attempts.
      
      We therefore remove the 'has_redundant_link' flag from STATE messages,
      as well as the now redundant 'permit_changeover' variable.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: default avatarYing Xue <ying.xue@windriver.com>
      Reviewed-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b9d4c339
  20. 04 Jan, 2014 1 commit
  21. 07 Nov, 2013 1 commit
    • Erik Hugne's avatar
      tipc: message reassembly using fragment chain · 40ba3cdf
      Erik Hugne authored
      
      
      When the first fragment of a long data data message is received on a link, a
      reassembly buffer large enough to hold the data from this and all subsequent
      fragments of the message is allocated. The payload of each new fragment is
      copied into this buffer upon arrival. When the last fragment is received, the
      reassembled message is delivered upwards to the port/socket layer.
      
      Not only is this an inefficient approach, but it may also cause bursts of
      reassembly failures in low memory situations. since we may fail to allocate
      the necessary large buffer in the first place. Furthermore, after 100 subsequent
      such failures the link will be reset, something that in reality aggravates the
      situation.
      
      To remedy this problem, this patch introduces a different approach. Instead of
      allocating a big reassembly buffer, we now append the arriving fragments
      to a reassembly chain on the link, and deliver the whole chain up to the
      socket layer once the last fragment has been received. This is safe because
      the retransmission layer of a TIPC link always delivers packets in strict
      uninterrupted order, to the reassembly layer as to all other upper layers.
      Hence there can never be more than one fragment chain pending reassembly at
      any given time in a link, and we can trust (but still verify) that the
      fragments will be chained up in the correct order.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      40ba3cdf
  22. 22 Nov, 2012 2 commits
  23. 30 Apr, 2012 1 commit
    • Paul Gortmaker's avatar
      tipc: compress out gratuitous extra carriage returns · 617d3c7a
      Paul Gortmaker authored
      
      
      Some of the comment blocks are floating in limbo between two
      functions, or between blocks of code.  Delete the extra line
      feeds between any comment and its associated following block
      of code, to be consistent with the majority of the rest of
      the kernel.  Also delete trailing newlines at EOF and fix
      a couple trivial typos in existing comments.
      
      This is a 100% cosmetic change with no runtime impact.  We get
      rid of over 500 lines of non-code, and being blank line deletes,
      they won't even show up as noise in git blame.
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      617d3c7a
  24. 24 Feb, 2012 2 commits
  25. 06 Feb, 2012 2 commits
    • Allan Stephens's avatar
      tipc: Remove obsolete broadcast tag capability · 1ec2bb08
      Allan Stephens authored
      
      
      Eliminates support for the broadcast tag field, which is no longer
      used by broadcast link NACK messages.
      Signed-off-by: default avatarAllan Stephens <allan.stephens@windriver.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      1ec2bb08
    • Allan Stephens's avatar
      tipc: Major redesign of broadcast link ACK/NACK algorithms · 7a54d4a9
      Allan Stephens authored
      
      
      Completely redesigns broadcast link ACK and NACK mechanisms to prevent
      spurious retransmit requests in dual LAN networks, and to prevent the
      broadcast link from stalling due to the failure of a receiving node to
      acknowledge receiving a broadcast message or request its retransmission.
      
      Note: These changes only impact the timing of when ACK and NACK messages
      are sent, and not the basic broadcast link protocol itself, so inter-
      operability with nodes using the "classic" algorithms is maintained.
      
      The revised algorithms are as follows:
      
      1) An explicit ACK message is still sent after receiving 16 in-sequence
      messages, and implicit ACK information continues to be carried in other
      unicast link message headers (including link state messages).  However,
      the timing of explicit ACKs is now based on the receiving node's absolute
      network address rather than its relative network address to ensure that
      the failure of another node does not delay the ACK beyond its 16 message
      target.
      
      2) A NACK message is now typically sent only when a message gap persists
      for two consecutive incoming link state messages; this ensures that a
      suspected gap is not confirmed until both LANs in a dual LAN network have
      had an opportunity to deliver the message, thereby preventing spurious NACKs.
      A NACK message can also be generated by the arrival of a single link state
      message, if the deferred queue is so big that the current message gap
      cannot be the result of "normal" mis-ordering due to the use of dual LANs
      (or one LAN using a bonded interface). Since link state messages typically
      arrive at different nodes at different times the problem of multiple nodes
      issuing identical NACKs simultaneously is inherently avoided.
      
      3) Nodes continue to "peek" at NACK messages sent by other nodes. If
      another node requests retransmission of a message gap suspected (but not
      yet confirmed) by the peeking node, the peeking node forgets about the
      gap and does not generate a duplicate retransmit request. (If the peeking
      node subsequently fails to receive the lost message, later link state
      messages will cause it to rediscover and confirm the gap and send another
      NACK.)
      
      4) Message gap "equality" is now determined by the start of the gap only.
      This is sufficient to deal with the most common cases of message loss,
      and eliminates the need for complex end of gap computations.
      
      5) A peeking node no longer tries to determine whether it should send a
      complementary NACK, since the most common cases of message loss don't
      require it to be sent. Consequently, the node no longer examines the
      "broadcast tag" field of a NACK message when peeking.
      Signed-off-by: default avatarAllan Stephens <allan.stephens@windriver.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      7a54d4a9