    net: ipv4: Consider failed nexthops in multipath routes · a6db4494
    David Ahern authored
    Multipath route lookups should consider knowledge about next hops and not
    select a hop that is known to be failed.
                         [h2]                   [h3]
                          |                      |
                         3|                     3|
                        [SP1]                  [SP2]--+
                         1  2                   1     2
                         |  |     /-------------+     |
                         |   \   /                    |
                         |     X                      |
                         |    / \                     |
                         |   /   \---------------\    |
                         1  2                     1   2
               [TOR1] 3-----------------3 [TOR2]
                         4                         4
                          \                       /
                            \                    /
                             \                  /
                              -------|   |-----/
                                      1   2
                                     [TOR3]
                                        |
                                      [h1]
    Host h1 has 2 paths to host h3:
        root@h1:~# ip ro ls
        ... dev swp1  proto kernel  scope link  src
                nexthop via  dev swp1 weight 1
                nexthop via  dev swp1 weight 1
    If the link between TOR3 and TOR1 is down and the link between TOR1
    and TOR2 is also down, then TOR1 is effectively cut off from h1. Yet route
    lookups on h1 keep alternating between the 2 nexthops: ping gets one and
    ssh gets the other. Connections that attempt to use the nexthop via TOR1
    fail because that neighbor is not reachable:
        root@h1:~# ip neigh show
        ... dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
        ... dev swp1  FAILED
    The failed path can be avoided by considering known neighbor information
    when selecting next hops. If the neighbor lookup finds no entry, we have no
    knowledge about the nexthop, so give it a shot. If there is an entry,
    then only select the nexthop if its state is sane. This is similar to
    what fib_detect_death does.
    To maintain backward compatibility, use of the neighbor information is
    controlled by a new sysctl, fib_multipath_use_neigh.
    Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
    Reviewed-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: David S. Miller <davem@davemloft.net>