linux/include/net
David Ahern a6db4494d2 net: ipv4: Consider failed nexthops in multipath routes
Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.

Example:

                     [h2]                   [h3]   15.0.0.5
                      |                      |
                     3|                     3|
                    [SP1]                  [SP2]--+
                     1  2                   1     2
                     |  |     /-------------+     |
                     |   \   /                    |
                     |     X                      |
                     |    / \                     |
                     |   /   \---------------\    |
                     1  2                     1   2
         12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
                     4                         4
                      \                       /
                        \                    /
                         \                  /
                          -------|   |-----/
                                 1   2
                                [TOR3]
                                  3|
                                   |
                                  [h1]  12.0.0.1

host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:

    root@h1:~# ip ro ls
    ...
    12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
    15.0.0.0/16
            nexthop via 12.0.0.2  dev swp1 weight 1
            nexthop via 12.0.0.3  dev swp1 weight 1
    ...

If the link between tor3 and tor1 is down and the link between tor1
and tor2 then tor1 is effectively cut-off from h1. Yet the route lookups
in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and
ssh 15.0.0.5 gets the other. Connections that attempt to use the
12.0.0.2 nexthop fail since that neighbor is not reachable:

    root@h1:~# ip neigh show
    ...
    12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
    12.0.0.2 dev swp1  FAILED
    ...

The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.

To maintain backward compatibility use of the neighbor information is
based on a new sysctl, fib_multipath_use_neigh.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11 15:16:13 -04:00
..
9p 9p: switch p9_client_read() to passing struct iov_iter * 2015-04-11 22:28:27 -04:00
bluetooth Bluetooth: Add support for limited privacy mode 2016-03-10 19:51:30 +01:00
caif caif: fix a signedness bug in cfpkt_iterate() 2015-02-20 17:35:14 -05:00
irda irda: Convert function pointer arrays and uses to const 2014-12-10 15:33:16 -05:00
iucv s390/iucv: do not use arrays as argument 2015-09-21 16:03:04 -07:00
netfilter netfilter: nft_masq: support port range 2016-03-02 20:05:27 +01:00
netns net: ipv4: Consider failed nexthops in multipath routes 2016-04-11 15:16:13 -04:00
nfc nfc: netlink: HCI event connectivity implementation 2015-12-29 19:06:20 +01:00
phonet sock: struct proto hash function may error 2016-02-11 03:54:14 -05:00
sctp sctp: use list_* in sctp_list_dequeue 2016-04-05 15:44:08 -04:00
tc_act net/act_skbedit: Utility functions for mark action 2016-03-10 16:24:02 -05:00
6lowpan.h 6lowpan: iphc: add support for stateful compression 2016-02-23 20:29:40 +01:00
act_api.h net_sched: fix a memory leak in tc action 2016-04-06 16:04:29 -04:00
addrconf.h inet: refactor inet[6]_lookup functions to take skb 2016-02-11 03:54:14 -05:00
af_ieee802154.h ieee802154: af_ieee802154: fix typo in comment. 2015-09-17 13:20:05 +02:00
af_rxrpc.h
af_unix.h unix: correctly track in-flight fds in sending process user_struct 2016-02-08 10:30:42 -05:00
af_vsock.h Revert "Merge branch 'vsock-virtio'" 2015-12-08 21:55:49 -05:00
ah.h ipsec: Remove obsolete MAX_AH_AUTH_LEN 2014-09-18 10:54:36 +02:00
arp.h neigh: Factor out ___neigh_lookup_noref 2015-03-04 00:23:23 -05:00
atmclip.h
ax25.h ax25: Stop using sock->sk_protinfo. 2015-06-28 16:55:44 -07:00
ax88796.h
bond_3ad.h bonding: 3ad: apply ad_actor settings changes immediately 2016-02-09 04:45:49 -05:00
bond_alb.h net: Move bonding headers under include/net 2014-11-10 13:27:49 -05:00
bond_options.h bonding: convert num_grat_arp to the new bonding option API 2015-07-27 01:05:24 -07:00
bonding.h bonding: fix bond_get_stats() 2016-03-18 23:14:15 -04:00
busy_poll.h net: un-inline sk_busy_loop() 2015-11-18 16:17:38 -05:00
cfg80211-wext.h
cfg80211.h cfg80211: Add option to specify previous BSSID for Connect command 2016-04-06 13:18:21 +02:00
cfg802154.h nl802154: add support for security layer 2015-09-30 13:16:44 +02:00
checksum.h csum: Update csum_block_add to use rotate instead of byteswap 2016-03-13 15:01:00 -04:00
cipso_ipv4.h cipso: don't use IPCB() to locate the CIPSO IP option 2015-02-11 14:46:37 -05:00
cls_cgroup.h net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct 2015-12-08 22:02:33 -05:00
codel.h net_sched: update hierarchical backlog too 2016-02-29 17:02:33 -05:00
compat.h net: switch importing msghdr from userland to {compat_,}import_iovec() 2015-04-09 00:02:26 -04:00
datalink.h
dcbevent.h
dcbnl.h net/dcb: Add IEEE QCN attribute 2015-03-06 21:50:02 -05:00
devlink.h Introduce devlink infrastructure 2016-03-01 16:07:29 -05:00
dn_dev.h
dn_fib.h
dn_neigh.h netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
dn_nsp.h
dn_route.h
dn.h
dsa.h net: dsa: make the VLAN add function return void 2016-04-08 16:50:41 -04:00
dsfield.h
dst_cache.h net: add dst_cache support 2016-02-16 20:21:48 -05:00
dst_metadata.h ip_tunnel: add support for setting flow label via collect metadata 2016-03-11 15:14:26 -05:00
dst_ops.h ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out 2015-10-08 04:27:02 -07:00
dst.h bpf, dst: add and use dst_tclassid helper 2016-03-18 19:38:46 -04:00
esp.h
ethoc.h net/ethoc: support big-endian register layout 2015-09-23 15:33:15 -07:00
fib_rules.h net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
firewire.h
flow_dissector.h net/flow_dissector: Make dissector_uses_key() and skb_flow_dissector_target() public 2016-03-10 16:24:02 -05:00
flow.h ipv6, trace: fix tos reporting on fib6_table_lookup 2016-03-20 13:44:34 -04:00
flowcache.h
fou.h ip_tunnel: Ops registration for secondary encap (fou, gue) 2014-11-12 15:01:35 -05:00
garp.h
gen_stats.h net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
genetlink.h Revert "genl: Add genlmsg_new_unicast() for unicast message allocation" 2016-02-18 11:42:19 -05:00
geneve.h geneve: Add geneve_get_rx_port support 2015-12-16 10:58:56 -05:00
gre.h gre: Remove support for sharing GRE protocol hook. 2015-08-10 14:03:54 -07:00
gro_cells.h gro_cells: remove spinlock protecting receive queues 2015-08-31 15:17:17 -07:00
gue.h gue: Protocol constants for remote checksum offload 2014-11-05 16:30:03 -05:00
hwbm.h net: add a hardware buffer management helper API 2016-03-14 12:19:46 -04:00
icmp.h
ieee80211_radiotap.h
ieee802154_netdev.h mac802154: constify ieee802154_llsec_ops structure 2016-01-04 20:40:41 +01:00
if_inet6.h ipv6: do retries on stable privacy addresses 2015-03-23 22:12:09 -04:00
ila.h ila: Add generic ILA translation facility 2015-12-15 23:25:20 -05:00
inet6_connection_sock.h ipv6: remove unused in6_addr struct 2016-03-22 15:45:44 -04:00
inet6_hashtables.h tcp/dccp: do not touch listener sk_refcnt under synflood 2016-04-04 22:11:20 -04:00
inet_common.h net: avoid NULL deref in inet_ctl_sock_destroy() 2015-11-02 22:46:09 -05:00
inet_connection_sock.h tcp/dccp: fix another race at listener dismantle 2016-02-18 11:35:51 -05:00
inet_ecn.h ipv6: update skb->csum when CE mark is propagated 2016-01-15 15:07:23 -05:00
inet_frag.h ipv4: namespacify ip fragment max dist sysctl knob 2016-02-16 20:42:54 -05:00
inet_hashtables.h tcp/dccp: do not touch listener sk_refcnt under synflood 2016-04-04 22:11:20 -04:00
inet_sock.h net: Allow accepted sockets to be bound to l3mdev domain 2015-12-18 14:43:38 -05:00
inet_timewait_sock.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-20 06:08:27 -07:00
inetpeer.h inet: tcp: fix inetpeer_set_addr_v4() 2015-12-16 00:14:12 -05:00
ip6_checksum.h ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short 2016-03-13 23:55:13 -04:00
ip6_fib.h ipv6: Check rt->dst.from for the DST_NOCACHE route 2015-11-15 17:12:37 -05:00
ip6_route.h ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() 2016-01-29 20:31:26 -08:00
ip6_tunnel.h net: replace dst_cache ip6_tunnel implementation with the generic one 2016-02-16 20:21:48 -05:00
ip_fib.h route: check and remove route cache when we get route 2016-02-18 11:31:36 -05:00
ip_tunnels.h ip_tunnel: implement __iptunnel_pull_header 2016-04-06 16:50:32 -04:00
ip_vs.h ipvs: drop first packet to redirect conntrack 2016-03-07 11:53:30 +09:00
ip.h ipv4: process socket-level control messages in IPv4 2016-04-04 15:50:30 -04:00
ipcomp.h
ipconfig.h
ipv6.h sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
ipx.h switch ipxrtr_route_packet() from iovec to msghdr 2014-11-24 04:28:49 -05:00
iw_handler.h cfg80211/wext: fix message ordering 2016-01-29 17:13:43 +01:00
kcm.h kcm: mark helper functions inline 2016-03-10 14:42:03 -05:00
l3mdev.h net: l3mdev: address selection should only consider devices in L3 domain 2016-02-26 14:22:26 -05:00
lapb.h
lib80211.h lib80211: remove unused print_ssid() 2014-10-14 02:18:27 +02:00
llc_c_ac.h
llc_c_ev.h
llc_c_st.h llc: Make llc_conn_ev_qfyr_t function pointer arrays const 2014-12-10 15:21:24 -05:00
llc_conn.h net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h llc: Make llc_sap_action_t function pointer arrays const 2014-12-10 15:21:24 -05:00
llc_sap.h
llc.h
lwtunnel.h lwtunnel: autoload of lwt modules 2016-02-21 22:00:28 -05:00
mac80211.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
mac802154.h mac802154: use put and get unaligned functions 2016-03-10 19:51:28 +01:00
mip6.h
mld.h ipv6: mld: answer mldv2 queries with mldv1 reports in mldv1 fallback 2014-09-22 16:23:15 -04:00
mpls_iptunnel.h mpls: multipath route support 2015-10-23 06:26:42 -07:00
mpls.h openvswitch: Add basic MPLS support to kernel 2014-11-05 23:52:33 -08:00
mrp.h
ndisc.h Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" 2015-12-01 15:07:59 -05:00
neighbour.h net: add explicit logging and stat for neighbour table overflow 2015-08-10 13:46:21 -07:00
net_namespace.h netfilter: cttimeout: add netns support 2015-12-14 12:48:58 +01:00
net_ratelimit.h
netevent.h
netlabel.h netlabel: fix the netlbl_catmap_setlong() dummy function 2014-08-07 20:55:21 -04:00
netlink.h netlink: add nla_get for le32 and le64 2015-09-30 13:16:44 +02:00
netprio_cgroup.h net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct 2015-12-08 22:02:33 -05:00
netrom.h
nexthop.h
nl802154.h nl802154: add support for security layer 2015-09-30 13:16:44 +02:00
p8022.h
ping.h net: ping: make ping_v6_sendmsg static 2016-03-23 22:09:58 -04:00
pkt_cls.h net/flower: Fix pointer cast 2016-03-11 12:04:37 -05:00
pkt_sched.h net: sched: consolidate tc_classify{,_compat} 2015-08-27 14:18:48 -07:00
protocol.h udp: Remove udp_offloads 2016-04-07 16:53:30 -04:00
psnap.h
raw.h sock: struct proto hash function may error 2016-02-11 03:54:14 -05:00
rawv6.h
red.h
regulatory.h cfg80211: allow wiphy specific regdomain management 2014-12-17 11:49:55 +01:00
request_sock.h inet: reqsk_alloc() needs to take care of dead listeners 2016-04-04 22:11:19 -04:00
rose.h
route.h net: Checks skb_dst to be NULL in inet_iif 2016-04-07 16:53:14 -04:00
rtnetlink.h netlink: Rightsize IFLA_AF_SPEC size calculation 2015-10-21 19:15:20 -07:00
sch_generic.h net: sched: use pfifo_fast for non real queues 2016-03-03 17:38:46 -05:00
scm.h unix: correctly track in-flight fds in sending process user_struct 2016-02-08 10:30:42 -05:00
secure_seq.h
slhc_vj.h
snmp.h Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-10-15 07:48:18 +02:00
sock_reuseport.h soreuseport: fix NULL ptr dereference SO_REUSEPORT after bind 2016-01-19 14:44:23 -05:00
sock.h net: Fix build failure due to lockdep_sock_is_held(). 2016-04-07 20:40:25 -04:00
Space.h
stp.h
switchdev.h switchdev: Adding MDB entry offload 2016-01-10 16:50:20 -05:00
tcp_states.h inet: add TCP_NEW_SYN_RECV state 2015-03-12 22:58:12 -04:00
tcp.h tcp: increment sk_drops for listeners 2016-04-04 22:11:20 -04:00
timewait_sock.h inet: remove BUG_ON() in twsk_destructor() 2015-07-09 15:12:20 -07:00
transp_v6.h ipv6: process socket-level control messages in IPv6 2016-04-04 15:50:30 -04:00
tso.h net: tso: add support for IPv6 2015-10-26 22:24:22 -07:00
udp_tunnel.h udp: Add socket based GRO and config 2016-04-07 16:53:29 -04:00
udp.h udp: Add GRO functions to UDP socket 2016-04-07 16:53:29 -04:00
udplite.h net: switch memcpy_fromiovec()/memcpy_fromiovecend() users to copy_from_iter() 2015-02-04 01:34:15 -05:00
vsock_addr.h
vxlan.h vxlan: change vxlan to use UDP socket GRO 2016-04-07 16:53:29 -04:00
wext.h
wimax.h net: treewide: Fix typo found in DocBook/networking.xml 2014-09-05 17:35:28 -07:00
x25.h
x25device.h
xfrm.h xfrm: add rcu protection to sk->sk_policy[] 2015-12-11 19:22:06 -05:00