linux/net/ipv6
David Ahern 8f34e53b60 ipv6: Use global sernum for dst validation with nexthop objects
Nik reported a bug with pcpu dst cache when nexthop objects are
used illustrated by the following:
    $ ip netns add foo
    $ ip -netns foo li set lo up
    $ ip -netns foo addr add 2001:db8:11::1/128 dev lo
    $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
    $ ip li add veth1 type veth peer name veth2
    $ ip li set veth1 up
    $ ip addr add 2001:db8:10::1/64 dev veth1
    $ ip li set dev veth2 netns foo
    $ ip -netns foo li set veth2 up
    $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
    $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Create a pcpu entry on cpu 0:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1

    Re-add the route entry:
    $ ip -6 ro del 2001:db8:11::1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Route get on cpu 0 returns the stale pcpu:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
    RTNETLINK answers: Network is unreachable

    While cpu 1 works:
    $ taskset -a -c 1 ip -6 route get 2001:db8:11::1
    2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium

Conversion of FIB entries to work with external nexthop objects
missed an important difference between IPv4 and IPv6 - how dst
entries are invalidated when the FIB changes. IPv4 has a per-network
namespace generation id (rt_genid) that is bumped on changes to the FIB.
Checking if a dst_entry is still valid means comparing rt_genid in the
rtable to the current value of rt_genid for the namespace.

IPv6 also has a per network namespace counter, fib6_sernum, but the
count is saved per fib6_node. With the per-node counter only dst_entries
based on fib entries under the node are invalidated when changes are
made to the routes - limiting the scope of invalidations. IPv6 uses a
reference in the rt6_info, 'from', to track the corresponding fib entry
used to create the dst_entry. When validating a dst_entry, the 'from'
is used to backtrack to the fib6_node and check the sernum of it to the
cookie passed to the dst_check operation.

With the inline format (nexthop definition inline with the fib6_info),
dst_entries cached in the fib6_nh have a 1:1 correlation between fib
entries, nexthop data and dst_entries. With external nexthops, IPv6
looks more like IPv4 which means multiple fib entries across disparate
fib6_nodes can all reference the same fib6_nh. That means validation
of dst_entries based on external nexthops needs to use the IPv4 format
- the per-network namespace counter.

Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
rt6_get_cookie to return sernum if it is set and update dst_check for
IPv6 to look for sernum set and based the check on it if so. Finally,
rt6_get_pcpu_route needs to validate the cached entry before returning
a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
__mkroute_output for IPv4).

This problem only affects routes using the new, external nexthops.

Thanks to the kbuild test robot for catching the IS_ENABLED needed
around rt_genid_ipv6 before I sent this out.

Fixes: 5b98324ebe ("ipv6: Allow routes to use nexthop objects")
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 12:46:30 -07:00
..
ila net: add net available in build_state 2020-03-29 22:30:57 -07:00
netfilter netfilter: Replace zero-length array with flexible-array member 2020-03-15 15:20:16 +01:00
addrconf_core.c net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-04 12:27:13 -08:00
addrconf.c neigh: support smaller retrans_time settting 2020-04-02 17:55:26 -07:00
addrlabel.c netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
af_inet6.c net: ipv6: add rpl sr tunnel 2020-03-29 22:30:57 -07:00
ah6.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
anycast.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
calipso.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
datagram.c net: ipv6: add net argument to ip6_dst_lookup_flow 2019-12-04 12:27:12 -08:00
esp6_offload.c esp6: add gso_segment for esp6 beet mode 2020-03-26 14:51:07 +01:00
esp6.c ESP: Export esp_output_fill_trailer function 2020-02-19 13:52:32 +01:00
exthdrs_core.c ipv6: remove printk 2019-07-27 14:23:48 -07:00
exthdrs_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
exthdrs.c net: ipv6: add support for rpl sr exthdr 2020-03-29 22:30:57 -07:00
fib6_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib6_rules.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fou6.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
icmp.c net: icmp6: do not select saddr from iif when route has prefsrc set 2020-04-07 18:25:10 -07:00
inet6_connection_sock.c net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
inet6_hashtables.c net: annotate accesses to sk->sk_incoming_cpu 2019-10-30 13:24:25 -07:00
ip6_checksum.c net: udp: fix handling of CHECKSUM_COMPLETE packets 2018-10-24 14:18:16 -07:00
ip6_fib.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
ip6_flowlabel.c ipv6: fix static key imbalance in fl_create() 2019-07-11 14:43:25 -07:00
ip6_gre.c net: ip6_gre: Distribute switch variables for initialization 2020-02-20 10:00:19 -08:00
ip6_icmp.c icmp: introduce helper for nat'd source address in network device context 2020-02-13 14:19:00 -08:00
ip6_input.c bpf: Add socket assign support 2020-03-30 13:45:04 -07:00
ip6_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip6_offload.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip6_output.c net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc. 2020-02-24 13:31:42 -08:00
ip6_tunnel.c net: ip6_gre: Distribute switch variables for initialization 2020-02-20 10:00:19 -08:00
ip6_udp_tunnel.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
ip6_vti.c vti6: Fix memory leak of skb if input policy check fails 2020-03-16 11:13:48 +01:00
ip6mr.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
ipcomp6.c xfrm: remove type and offload_type map from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00
ipv6_sockglue.c ipv6: fix restrict IPV6_ADDRFORM operation 2020-04-20 11:06:39 -07:00
Kconfig net: ipv6: add rpl sr tunnel 2020-03-29 22:30:57 -07:00
Makefile net: ipv6: add rpl sr tunnel 2020-03-29 22:30:57 -07:00
mcast_snoop.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 343 2019-06-05 17:37:07 +02:00
mcast.c mld: fix memory leak in mld_del_delrec() 2019-08-28 14:47:35 -07:00
mip6.c xfrm: remove type and offload_type map from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00
ndisc.c neigh: support smaller retrans_time settting 2020-04-02 17:55:26 -07:00
netfilter.c net: ensure correct skb->tstamp in various fragmenters 2019-10-18 10:02:37 -07:00
output_core.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
ping.c ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' 2019-09-12 11:20:33 +01:00
proc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
raw.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
reassembly.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
route.c ipv6: Use global sernum for dst validation with nexthop objects 2020-05-01 12:46:30 -07:00
rpl_iptunnel.c net: ipv6: rpl_iptunnel: remove redundant assignments to variable err 2020-04-02 06:57:34 -07:00
rpl.c ipv6: rpl: fix full address compression 2020-04-18 15:04:27 -07:00
seg6_hmac.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
seg6_iptunnel.c net: add net available in build_state 2020-03-29 22:30:57 -07:00
seg6_local.c net: add net available in build_state 2020-03-29 22:30:57 -07:00
seg6.c ipv6: remove redundant assignment to variable err 2020-04-15 16:20:22 -07:00
sit.c sit: do not confirm neighbor when do pmtu update 2019-12-24 22:28:55 -08:00
syncookies.c mptcp: handle tcp fallback when using syn cookies 2020-01-29 17:45:20 +01:00
sysctl_net_ipv6.c ipv6: Use math to point per net sysctls into the appropriate struct net 2020-03-03 14:50:08 -08:00
tcp_ipv6.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
tcpv6_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tunnel6.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
udp_impl.h udp6: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
udp_offload.c udp: Support UDP fraglist GRO/GSO. 2020-01-27 11:00:21 +01:00
udp.c net: Track socket refcounts in skb_steal_sock() 2020-03-30 13:45:04 -07:00
udplite.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xfrm6_input.c net: use skb_sec_path helper in more places 2018-12-19 11:21:37 -08:00
xfrm6_output.c xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish 2020-04-22 12:32:11 -07:00
xfrm6_policy.c net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
xfrm6_protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xfrm6_state.c xfrm: remove eth_proto value from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00
xfrm6_tunnel.c ipv6: xfrm6_tunnel.c: Use built-in RCU list checking 2020-02-27 10:17:41 +01:00