linux/net/core
Toke Høiland-Jørgensen 782347b6bc xdp: Add proper __rcu annotations to redirect map entries
XDP_REDIRECT works by a three-step process: the bpf_redirect() and
bpf_redirect_map() helpers will lookup the target of the redirect and store
it (along with some other metadata) in a per-CPU struct bpf_redirect_info.
Next, when the program returns the XDP_REDIRECT return code, the driver
will call xdp_do_redirect() which will use the information thus stored to
actually enqueue the frame into a bulk queue structure (that differs
slightly by map type, but shares the same principle). Finally, before
exiting its NAPI poll loop, the driver will call xdp_do_flush(), which will
flush all the different bulk queues, thus completing the redirect.

Pointers to the map entries will be kept around for this whole sequence of
steps, protected by RCU. However, there is no top-level rcu_read_lock() in
the core code; instead drivers add their own rcu_read_lock() around the XDP
portions of the code, but somewhat inconsistently as Martin discovered[0].
However, things still work because everything happens inside a single NAPI
poll sequence, which means it's between a pair of calls to
local_bh_disable()/local_bh_enable(). So Paul suggested[1] that we could
document this intention by using rcu_dereference_check() with
rcu_read_lock_bh_held() as a second parameter, thus allowing sparse and
lockdep to verify that everything is done correctly.

This patch does just that: we add an __rcu annotation to the map entry
pointers and remove the various comments explaining the NAPI poll assurance
strewn through devmap.c in favour of a longer explanation in filter.c. The
goal is to have one coherent documentation of the entire flow, and rely on
the RCU annotations as a "standard" way of communicating the flow in the
map code (which can additionally be understood by sparse and lockdep).

The RCU annotation replacements result in a fairly straight-forward
replacement where READ_ONCE() becomes rcu_dereference_check(), WRITE_ONCE()
becomes rcu_assign_pointer() and xchg() and cmpxchg() gets wrapped in the
proper constructs to cast the pointer back and forth between __rcu and
__kernel address space (for the benefit of sparse). The one complication is
that xskmap has a few constructions where double-pointers are passed back
and forth; these simply all gain __rcu annotations, and only the final
reference/dereference to the inner-most pointer gets changed.

With this, everything can be run through sparse without eliciting
complaints, and lockdep can verify correctness even without the use of
rcu_read_lock() in the drivers. Subsequent patches will clean these up from
the drivers.

[0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/
[1] https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210624160609.292325-6-toke@redhat.com
2021-06-24 19:41:15 +02:00
..
bpf_sk_storage.c bpf: Use struct_size() in kzalloc() 2021-05-13 15:58:00 -07:00
datagram.c udp: fix skb_copy_and_csum_datagram with odd segment sizes 2021-02-04 18:56:56 -08:00
datagram.h
dev_addr_lists.c net: core: Correct function name dev_uc_flush() in the kerneldoc 2021-03-28 17:56:56 -07:00
dev_ioctl.c net: fix dev_ifsioc_locked() race condition 2021-02-11 18:14:19 -08:00
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-05-27 09:55:10 -07:00
devlink.c net: core: devlink: add dropped stats traps field 2021-06-14 13:04:25 -07:00
drop_monitor.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-03-25 15:31:22 -07:00
dst_cache.c
dst.c net, bpf: Fix ip6ip6 crash with collect_md populated skbs 2021-03-10 12:24:18 -08:00
failover.c
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c fib: Return the correct errno code 2021-06-03 15:13:56 -07:00
filter.c xdp: Add proper __rcu annotations to redirect map entries 2021-06-24 19:41:15 +02:00
flow_dissector.c net: flow_dissector: fix RPS on DSA masters 2021-06-14 13:15:22 -07:00
flow_offload.c net: flow_offload: Fix memory leak for indirect flow block 2020-12-09 16:08:33 -08:00
gen_estimator.c net_sched: gen_estimator: support large ewma log 2021-01-15 18:11:06 -08:00
gen_stats.c docs: networking: convert gen_stats.txt to ReST 2020-04-28 14:39:46 -07:00
gro_cells.c gro_cells: reduce number of synchronize_net() calls 2020-11-25 11:28:12 -08:00
hwbm.c
link_watch.c net: Add IF_OPER_TESTING 2020-04-20 12:43:24 -07:00
lwt_bpf.c lwt_bpf: Replace preempt_disable() with migrate_disable() 2020-12-07 11:53:40 -08:00
lwtunnel.c net: ipv6: add rpl sr tunnel 2020-03-29 22:30:57 -07:00
Makefile net: selftest: fix build issue if INET is disabled 2021-04-28 14:06:45 -07:00
neighbour.c neighbour: Remove redundant initialization of 'bucket' 2021-05-10 14:25:13 -07:00
net_namespace.c net: initialize net->net_cookie at netns setup 2021-02-11 14:10:07 -08:00
net-procfs.c net: move the ptype_all and ptype_base declarations to include/linux/netdevice.h 2021-03-22 13:14:45 -07:00
net-sysfs.c net-sysfs: remove possible sleep from an RCU read-side critical section 2021-03-22 13:28:13 -07:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c tcp: add tracepoint for checksum errors 2021-05-14 15:26:03 -07:00
netclassid_cgroup.c net: Remove the err argument from sock_from_file 2020-12-04 22:32:40 +01:00
netevent.c net: core: Correct function name netevent_unregister_notifier() in the kerneldoc 2021-03-28 17:56:56 -07:00
netpoll.c netpoll: don't require irqs disabled in rt kernels 2021-06-01 15:15:11 -07:00
netprio_cgroup.c net: Remove the err argument from sock_from_file 2020-12-04 22:32:40 +01:00
page_pool.c page_pool: Allow drivers to hint on SKB recycling 2021-06-07 14:11:47 -07:00
pktgen.c pktgen: add pktgen_handle_all_threads() for the same code 2021-06-07 13:15:31 -07:00
ptp_classifier.c ptp: Add generic ptp v2 header parsing function 2020-08-19 16:07:49 -07:00
request_sock.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
rtnetlink.c wwan: add interface creation support 2021-06-12 13:16:45 -07:00
scm.c scm: fix a typo in put_cmsg() 2021-04-16 11:41:07 -07:00
secure_seq.c crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h 2020-05-08 15:32:17 +10:00
selftests.c net: add generic selftest support 2021-04-20 16:08:02 -07:00
skbuff.c page_pool: Allow drivers to hint on SKB recycling 2021-06-07 14:11:47 -07:00
skmsg.c skmsg: Remove unused parameters of sk_msg_wait_data() 2021-05-18 16:44:19 +02:00
sock_diag.c bpf, net: Rework cookie generator as per-cpu one 2020-09-30 11:50:35 -07:00
sock_map.c sock_map: Fix a potential use-after-free in sock_map_close() 2021-04-12 17:35:26 +02:00
sock_reuseport.c bpf: Support socket migration by eBPF. 2021-06-15 18:01:06 +02:00
sock.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2021-06-07 13:01:52 -07:00
stream.c
sysctl_net_core.c net: change netdev_unregister_timeout_secs min value to 1 2021-03-25 17:24:06 -07:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: tso: add UDP segmentation support 2020-06-18 20:46:23 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-01-24 20:54:30 +01:00
xdp.c xdp: Extend xdp_redirect_map with broadcast support 2021-05-26 09:46:16 +02:00