linux/net/ipv4/netfilter
Andy Gospodarek 0eeb075fad net: ipv4 sysctl option to ignore routes when nexthop link is down
This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, will report to userspace that a route is
dead and will no longer resolve to this nexthop when performing a fib
lookup.  This will signal to userspace that the route will not be
selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
if the sysctl is enabled and link is down.  This was done as without it
the netlink listeners would have no idea whether or not a nexthop would
be selected.   The kernel only sets RTNH_F_DEAD internally if the
interface has IFF_UP cleared.

With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
    cache

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down.  Now interface p8p1 is linked-up:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 dev p8p1  src 80.0.0.1
    cache

and the output changes to what one would expect.

If the sysctl is not set, the following output would be expected when
p8p1 is down:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2

Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.

v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set.  Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.

v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.

v4: Drop binary sysctl

v5: Whitespace fixups from Dave

v6: Style changes from Dave and checkpatch suggestions

v7: One more checkpatch fixup

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 02:15:54 -07:00
..
arp_tables.c netfilter: x_tables: remove XT_TABLE_INFO_SZ and a dereference. 2015-06-15 20:19:20 +02:00
arpt_mangle.c netfilter: arpt_mangle: fix return values of checkentry 2011-02-01 16:03:46 +01:00
arptable_filter.c netfilter: Pass nf_hook_state through arpt_do_table(). 2015-04-04 13:26:52 -04:00
ip_tables.c netfilter: x_tables: remove XT_TABLE_INFO_SZ and a dereference. 2015-06-15 20:19:20 +02:00
ipt_ah.c netfilter: xtables: change hotdrop pointer to direct modification 2010-05-11 18:35:27 +02:00
ipt_CLUSTERIP.c netfilter: x_tables: add context to know if extension runs from nft_compat 2015-05-15 20:14:07 +02:00
ipt_ECN.c netfilter: xtables: substitute temporary defines by final name 2010-05-11 18:31:17 +02:00
ipt_MASQUERADE.c netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables 2014-09-09 16:31:29 +02:00
ipt_REJECT.c netfilter: reject: don't send icmp error if csum is invalid 2015-03-03 02:10:35 +01:00
ipt_rpfilter.c net: ipv4 sysctl option to ignore routes when nexthop link is down 2015-06-24 02:15:54 -07:00
ipt_SYNPROXY.c netfilter: synproxy: fix sparse errors 2015-05-17 13:08:29 -04:00
iptable_filter.c netfilter: Pass nf_hook_state through ipt_do_table(). 2015-04-04 12:47:04 -04:00
iptable_mangle.c netfilter: Pass nf_hook_state through ipt_do_table(). 2015-04-04 12:47:04 -04:00
iptable_nat.c netfilter: Pass nf_hook_state through ipt_do_table(). 2015-04-04 12:47:04 -04:00
iptable_raw.c netfilter: Pass nf_hook_state through ipt_do_table(). 2015-04-04 12:47:04 -04:00
iptable_security.c netfilter: Pass nf_hook_state through ipt_do_table(). 2015-04-04 12:47:04 -04:00
Kconfig netfilter: Kconfig: get rid of parens around depends on 2015-06-15 17:26:37 +02:00
Makefile netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module 2014-11-27 13:08:42 +01:00
nf_conntrack_l3proto_ipv4_compat.c netfilter: Remove uses of seq_<foo> return values 2015-03-18 10:51:35 +01:00
nf_conntrack_l3proto_ipv4.c netfilter: Make nf_hookfn use nf_hook_state. 2015-04-04 12:31:38 -04:00
nf_conntrack_proto_icmp.c netfilter: Convert print_tuple functions to return void 2014-11-05 14:10:33 -05:00
nf_defrag_ipv4.c netfilter: Make nf_hookfn use nf_hook_state. 2015-04-04 12:31:38 -04:00
nf_log_arp.c netfilter: Use LOGLEVEL_<FOO> defines 2015-03-25 12:09:39 +01:00
nf_log_ipv4.c netfilter: Use LOGLEVEL_<FOO> defines 2015-03-25 12:09:39 +01:00
nf_nat_h323.c netfilter: nf_nat_h323: fix crash in nf_ct_unlink_expect_report() 2014-02-05 17:46:05 +01:00
nf_nat_l3proto_ipv4.c netfilter: Pass nf_hook_state through nf_nat_ipv4_{in,out,fn,local_fn}(). 2015-04-04 12:45:19 -04:00
nf_nat_masquerade_ipv4.c netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables 2014-09-09 16:31:29 +02:00
nf_nat_pptp.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
nf_nat_proto_gre.c netfilter: use IS_ENABLED() macro 2014-06-30 11:38:03 +02:00
nf_nat_proto_icmp.c netfilter: use IS_ENABLED() macro 2014-06-30 11:38:03 +02:00
nf_nat_snmp_basic.c netfilter: nf_nat_snmp_basic: fix duplicates in if/else branches 2014-02-14 11:37:36 +01:00
nf_reject_ipv4.c netfilter: bridge: add helpers for fetching physin/outdev 2015-04-08 16:49:08 +02:00
nf_tables_arp.c netfilter: Pass nf_hook_state through nft_set_pktinfo*(). 2015-04-04 12:54:27 -04:00
nf_tables_ipv4.c netfilter: Pass nf_hook_state through nft_set_pktinfo*(). 2015-04-04 12:54:27 -04:00
nft_chain_nat_ipv4.c netfilter: Pass nf_hook_state through nft_set_pktinfo*(). 2015-04-04 12:54:27 -04:00
nft_chain_route_ipv4.c netfilter: Pass nf_hook_state through nft_set_pktinfo*(). 2015-04-04 12:54:27 -04:00
nft_masq_ipv4.c netfilter: nf_tables: get rid of NFT_REG_VERDICT usage 2015-04-13 17:17:07 +02:00
nft_redir_ipv4.c netfilter: nf_tables: switch registers to 32 bit addressing 2015-04-13 17:17:29 +02:00
nft_reject_ipv4.c netfilter: nf_tables: get rid of NFT_REG_VERDICT usage 2015-04-13 17:17:07 +02:00