linux/net/ipv4
Martin KaFai Lau eb18b49ea7 bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
This patch allows the bpf-tcp-cc to call bpf_setsockopt.  One use
case is to allow a bpf-tcp-cc to switch to another cc during init().
For example, when the tcp flow is not ECN-ready, the bpf_dctcp
can switch to another cc by calling setsockopt(TCP_CONGESTION).
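
For readers less familiar with the option itself, the following is a
minimal userspace sketch of TCP_CONGESTION, not code from this patch.
The helper names switch_cc() and current_cc() are hypothetical; a
bpf-tcp-cc would instead call bpf_setsockopt() on the sk it is given,
with the same level, option name and cc-name string.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to switch the socket's congestion
 * control to `name` (e.g. "reno" or "cubic").
 * Returns 0 on success, -1 with errno set on failure.
 */
static int switch_cc(int fd, const char *name)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name));
}

/* Hypothetical helper: read the name of the congestion control currently
 * attached to the socket into `buf` (at most `len` bytes).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int current_cc(int fd, char *buf, socklen_t len)
{
	if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len) < 0)
		return -1;
	return 0;
}
```

A caller would typically create a TCP socket, call
switch_cc(fd, "reno"), and then confirm the result with current_cc().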

During setsockopt(TCP_CONGESTION), the new tcp-cc's init() will be
called.  This could cause a recursion, but it is stopped by the
current trampoline's logic (the prog->active counter).
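
The recursion guard can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not the kernel's trampoline
code: struct prog_model, prog_enter(), prog_exit() and run_init() are
hypothetical names, and a plain int stands in for the per-CPU
prog->active counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bpf prog with its recursion counter. */
struct prog_model {
	int active;	/* models prog->active */
	int runs;	/* times the prog body actually ran */
	int misses;	/* times a nested entry was refused */
};

/* Entry is allowed only when nobody else is already inside the prog. */
static bool prog_enter(struct prog_model *p)
{
	if (++p->active != 1) {
		p->misses++;
		return false;
	}
	return true;
}

/* Exit always undoes the increment, whether or not the body ran. */
static void prog_exit(struct prog_model *p)
{
	p->active--;
}

/* Models an init() that switches to its own cc, so that the switch
 * tries to invoke the same init() again.
 */
static void run_init(struct prog_model *p)
{
	if (prog_enter(p)) {
		p->runs++;
		run_init(p);	/* nested call is refused by prog_enter() */
	}
	prog_exit(p);
}
```

In this model a single run_init() call executes the body once, records
one refused nested entry, and leaves the counter back at zero.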

While retiring a bpf-tcp-cc (e.g. in tcp_v[46]_destroy_sock()),
the tcp stack calls bpf-tcp-cc's release().  To avoid the retiring
bpf-tcp-cc making further changes to the sk, bpf_setsockopt is not
available to the bpf-tcp-cc's release().  This prevents release() from
making a setsockopt() call that could allocate new resources.

Although the bpf-tcp-cc already has a more powerful way to read tcp_sock
from the PTR_TO_BTF_ID, it is usually expected that bpf_getsockopt and
bpf_setsockopt are available together.  Thus, bpf_getsockopt() is also
added to all tcp_congestion_ops except release().

When the old bpf-tcp-cc is calling setsockopt(TCP_CONGESTION)
to switch to a new cc, the old bpf-tcp-cc will be released by
bpf_struct_ops_put().  Thus, this patch also puts the bpf_struct_ops_map
after an RCU grace period because the trampoline's image cannot be freed
while the old bpf-tcp-cc is still running.

bpf-tcp-cc can only access icsk_ca_priv as SCALAR.  All of the kernel's
tcp-cc implementations also access icsk_ca_priv as SCALAR.  The size
of icsk_ca_priv has already been raised a few times to avoid
extra kmalloc and memory referencing.  The only exception is the
kernel's tcp_cdg.c that stores a kmalloc()-ed pointer in icsk_ca_priv.
To keep the old bpf-tcp-cc from accidentally overwriting tcp_cdg's pointer
value stored in icsk_ca_priv after switching, and to avoid over-complicating
the bpf verifier for this one exception in tcp_cdg, this patch does not
allow switching to tcp_cdg.  If there is a need, bpf_tcp_cdg can be
implemented using bpf_sk_storage as the extended storage.
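
The kind of check this restriction implies can be sketched in userspace
as follows.  This is an illustration only, not the code added by this
patch: check_cc_switch() is a hypothetical name, and -EOPNOTSUPP is
used because the kernel-internal error codes are not available in
userspace headers.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical model: refuse a TCP_CONGESTION switch whose requested
 * name is "cdg".  `optval` need not be NUL-terminated; at most `optlen`
 * bytes of it are examined.
 * Returns 0 if the switch may proceed, -EOPNOTSUPP if it is refused.
 */
static int check_cc_switch(const char *optval, int optlen)
{
	if (optlen >= (int)(sizeof("cdg") - 1) &&
	    !strncmp("cdg", optval, (size_t)optlen))
		return -EOPNOTSUPP;
	return 0;
}
```

With this model, "cdg" is refused whether or not the trailing NUL is
counted in optlen, while names such as "cubic" or "cdgx" pass through to
the normal lookup.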

The bpf_sk_setsockopt proto was added only recently and is used
in bpf-sockopt and bpf-iter-tcp, so impose the tcp_cdg limitation in the
same proto instead of adding a new proto specifically for bpf-tcp-cc.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210824173007.3976921-1-kafai@fb.com
2021-08-25 17:40:35 -07:00
bpfilter net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces" 2020-08-10 12:06:44 -07:00
netfilter netfilter: x_tables: never register tables by default 2021-08-09 10:22:01 +02:00
af_inet.c bpf: Migrate cgroup_bpf to internal cgroup_bpf_attach_type enum 2021-08-23 17:50:24 -07:00
ah4.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
arp.c net: Exempt multicast addresses from five-second neighbor lifetime 2020-11-13 14:24:39 -08:00
bpf_tcp_ca.c bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt 2021-08-25 17:40:35 -07:00
cipso_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
datagram.c
devinet.c net: add extack arg for link ops 2021-08-04 10:01:26 +01:00
esp4_offload.c xfrm: remove description from xfrm_type struct 2021-06-09 09:38:52 +02:00
esp4.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
fib_frontend.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
fib_lookup.h ipv4: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
fib_notifier.c
fib_rules.c fib: use indirect call wrappers in the most common fib_rules_ops 2020-07-28 17:42:31 -07:00
fib_semantics.c net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
fib_trie.c memcg: enable accounting for IP address and routing-related objects 2021-07-20 06:00:38 -07:00
fou.c genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
gre_demux.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
gre_offload.c ip_gre: add csum offload support for gre header 2021-01-29 20:39:14 -08:00
icmp.c net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
igmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-13 06:41:22 -07:00
inet_connection_sock.c tcp: Add stats for socket migration. 2021-06-23 12:56:08 -07:00
inet_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
inet_fragment.c inet: frags: batch fqdir destroy works 2020-12-12 15:08:54 -08:00
inet_hashtables.c tcp: Keep TCP_CLOSE sockets in the reuseport group. 2021-06-15 18:01:05 +02:00
inet_timewait_sock.c net: Use generic ns_common::count 2020-08-19 14:06:36 +02:00
inetpeer.c inetpeer: use div64_ul() and clamp_val() calculate inet_peer_threshold 2021-03-01 13:32:12 -08:00
ip_forward.c
ip_fragment.c
ip_gre.c ip_tunnel: use ndo_siocdevprivate 2021-07-27 20:11:44 +01:00
ip_input.c net: use indirect call helpers for dst_input 2021-02-03 14:51:39 -08:00
ip_options.c net: clean up codestyle for net/ipv4 2020-08-25 06:28:02 -07:00
ip_output.c ipv4: use skb_expand_head in ip_finish_output2 2021-08-03 11:21:39 +01:00
ip_sockglue.c net/ipv4/ipv6: Replace one-element arraya with flexible-array members 2021-08-05 11:46:42 +01:00
ip_tunnel_core.c net: ip_tunnel: clean up endianness conversions 2021-01-08 19:25:35 -08:00
ip_tunnel.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-31 09:14:46 -07:00
ip_vti.c ip_tunnel: use ndo_siocdevprivate 2021-07-27 20:11:44 +01:00
ipcomp.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
ipconfig.c net: ipconfig: Don't override command-line hostnames or domains 2021-06-02 13:27:03 -07:00
ipip.c ip_tunnel: use ndo_siocdevprivate 2021-07-27 20:11:44 +01:00
ipmr_base.c
ipmr.c ipmr: Fix indentation issue 2021-07-07 20:52:25 -07:00
Kconfig net: ipv4: remove duplicate "the the" phrase in Kconfig text 2020-08-18 16:02:16 -07:00
Makefile bpf: Clean up sockmap related Kconfigs 2021-02-26 12:28:03 -08:00
metrics.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
netfilter.c netfilter: Dissect flow after packet mangling 2021-04-18 22:04:16 +02:00
netlink.c
nexthop.c nexthop: Restart nexthop dump based on last dumped nexthop identifier 2021-04-19 15:20:34 -07:00
ping.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
proc.c tcp: Add stats for socket migration. 2021-06-23 12:56:08 -07:00
protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
raw_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
raw.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
route.c net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
syncookies.c selinux/stable-5.11 PR 20201214 2020-12-16 11:01:04 -08:00
sysctl_net_ipv4.c net: Introduce net.ipv4.tcp_migrate_req. 2021-06-15 18:01:05 +02:00
tcp_bbr.c tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets 2021-08-11 15:00:15 -07:00
tcp_bic.c
tcp_bpf.c bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats 2021-07-15 19:54:22 +02:00
tcp_cdg.c
tcp_cong.c net: Only allow init netns to set default tcp cong to a restricted algo 2021-05-04 11:58:28 -07:00
tcp_cubic.c tcp: Rename bictcp function prefix to cubictcp 2021-03-26 20:41:51 -07:00
tcp_dctcp.c
tcp_dctcp.h
tcp_diag.c
tcp_fastopen.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-23 16:13:06 +01:00
tcp_highspeed.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_htcp.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: more accurately check DSACKs to grow RACK reordering window 2021-07-27 20:07:21 +01:00
tcp_ipv4.c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-07-31 11:23:26 -07:00
tcp_lp.c ipv4: tcp_lp.c: Couple of typo fixes 2021-03-28 17:31:13 -07:00
tcp_metrics.c fixes-v5.11 2020-12-14 16:40:27 -08:00
tcp_minisocks.c tcp: Add stats for socket migration. 2021-06-23 12:56:08 -07:00
tcp_nv.c
tcp_offload.c net, gro: Set inner transport header offset in tcp/udp GRO hook 2021-08-02 10:20:56 +01:00
tcp_output.c ipv6: tcp: drop silly ICMPv6 packet too big messages 2021-07-08 12:27:08 -07:00
tcp_rate.c
tcp_recovery.c tcp: more accurately check DSACKs to grow RACK reordering window 2021-07-27 20:07:21 +01:00
tcp_scalable.c net: ipv4: delete repeated words 2020-08-24 17:31:20 -07:00
tcp_timer.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
tcp_ulp.c
tcp_vegas.c tcp: use semicolons rather than commas to separate statements 2020-10-13 17:11:52 -07:00
tcp_vegas.h
tcp_veno.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_westwood.c
tcp_yeah.c tcp_yeah: check struct yeah size at compile time 2021-06-29 11:54:36 -07:00
tcp.c memcg: enable accounting for inet_bin_bucket cache 2021-07-20 06:00:38 -07:00
tunnel4.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
udp_bpf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-23 16:13:06 +01:00
udp_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
udp_impl.h net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
udp_offload.c net, gro: Set inner transport header offset in tcp/udp GRO hook 2021-08-02 10:20:56 +01:00
udp_tunnel_core.c udp_tunnel: reshuffle NETIF_F_RX_UDP_TUNNEL_PORT checks 2021-01-07 12:53:29 -08:00
udp_tunnel_nic.c udp_tunnel: add the ability to share port tables 2020-09-28 12:50:12 -07:00
udp_tunnel_stub.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udp.c bpf: Migrate cgroup_bpf to internal cgroup_bpf_attach_type enum 2021-08-23 17:50:24 -07:00
udplite.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
xfrm4_input.c
xfrm4_output.c
xfrm4_policy.c
xfrm4_protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
xfrm4_state.c
xfrm4_tunnel.c xfrm: remove description from xfrm_type struct 2021-06-09 09:38:52 +02:00