mirror of
https://github.com/torvalds/linux.git
synced 2024-11-24 21:21:41 +00:00
8f34e53b60
Nik reported a bug with pcpu dst cache when nexthop objects are
used illustrated by the following:
$ ip netns add foo
$ ip -netns foo li set lo up
$ ip -netns foo addr add 2001:db8:11::1/128 dev lo
$ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
$ ip li add veth1 type veth peer name veth2
$ ip li set veth1 up
$ ip addr add 2001:db8:10::1/64 dev veth1
$ ip li set dev veth2 netns foo
$ ip -netns foo li set veth2 up
$ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
$ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
$ ip -6 route add 2001:db8:11::1/128 nhid 100
Create a pcpu entry on cpu 0:
$ taskset -a -c 0 ip -6 route get 2001:db8:11::1
Re-add the route entry:
$ ip -6 ro del 2001:db8:11::1
$ ip -6 route add 2001:db8:11::1/128 nhid 100
Route get on cpu 0 returns the stale pcpu:
$ taskset -a -c 0 ip -6 route get 2001:db8:11::1
RTNETLINK answers: Network is unreachable
While cpu 1 works:
$ taskset -a -c 1 ip -6 route get 2001:db8:11::1
2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium
Conversion of FIB entries to work with external nexthop objects
missed an important difference between IPv4 and IPv6 - how dst
entries are invalidated when the FIB changes. IPv4 has a per-network
namespace generation id (rt_genid) that is bumped on changes to the FIB.
Checking if a dst_entry is still valid means comparing rt_genid in the
rtable to the current value of rt_genid for the namespace.
IPv6 also has a per network namespace counter, fib6_sernum, but the
count is saved per fib6_node. With the per-node counter only dst_entries
based on fib entries under the node are invalidated when changes are
made to the routes - limiting the scope of invalidations. IPv6 uses a
reference in the rt6_info, 'from', to track the corresponding fib entry
used to create the dst_entry. When validating a dst_entry, the 'from'
is used to backtrack to the fib6_node and check the sernum of it to the
cookie passed to the dst_check operation.
With the inline format (nexthop definition inline with the fib6_info),
dst_entries cached in the fib6_nh have a 1:1 correlation between fib
entries, nexthop data and dst_entries. With external nexthops, IPv6
looks more like IPv4 which means multiple fib entries across disparate
fib6_nodes can all reference the same fib6_nh. That means validation
of dst_entries based on external nexthops needs to use the IPv4 format
- the per-network namespace counter.
Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
rt6_get_cookie to return sernum if it is set and update dst_check for
IPv6 to look for sernum set and based the check on it if so. Finally,
rt6_get_pcpu_route needs to validate the cached entry before returning
a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
__mkroute_output for IPv4).
This problem only affects routes using the new, external nexthops.
Thanks to the kbuild test robot for catching the IS_ENABLED needed
around rt_genid_ipv6 before I sent this out.
Fixes:
|
||
---|---|---|
.. | ||
ila | ||
netfilter | ||
addrconf_core.c | ||
addrconf.c | ||
addrlabel.c | ||
af_inet6.c | ||
ah6.c | ||
anycast.c | ||
calipso.c | ||
datagram.c | ||
esp6_offload.c | ||
esp6.c | ||
exthdrs_core.c | ||
exthdrs_offload.c | ||
exthdrs.c | ||
fib6_notifier.c | ||
fib6_rules.c | ||
fou6.c | ||
icmp.c | ||
inet6_connection_sock.c | ||
inet6_hashtables.c | ||
ip6_checksum.c | ||
ip6_fib.c | ||
ip6_flowlabel.c | ||
ip6_gre.c | ||
ip6_icmp.c | ||
ip6_input.c | ||
ip6_offload.c | ||
ip6_offload.h | ||
ip6_output.c | ||
ip6_tunnel.c | ||
ip6_udp_tunnel.c | ||
ip6_vti.c | ||
ip6mr.c | ||
ipcomp6.c | ||
ipv6_sockglue.c | ||
Kconfig | ||
Makefile | ||
mcast_snoop.c | ||
mcast.c | ||
mip6.c | ||
ndisc.c | ||
netfilter.c | ||
output_core.c | ||
ping.c | ||
proc.c | ||
protocol.c | ||
raw.c | ||
reassembly.c | ||
route.c | ||
rpl_iptunnel.c | ||
rpl.c | ||
seg6_hmac.c | ||
seg6_iptunnel.c | ||
seg6_local.c | ||
seg6.c | ||
sit.c | ||
syncookies.c | ||
sysctl_net_ipv6.c | ||
tcp_ipv6.c | ||
tcpv6_offload.c | ||
tunnel6.c | ||
udp_impl.h | ||
udp_offload.c | ||
udp.c | ||
udplite.c | ||
xfrm6_input.c | ||
xfrm6_output.c | ||
xfrm6_policy.c | ||
xfrm6_protocol.c | ||
xfrm6_state.c | ||
xfrm6_tunnel.c |