linux/net
Daniel Borkmann 8520e224f5 bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode
Fix cgroup v1 interference when non-root cgroup v2 BPF programs are used.
Back in the days, commit bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
embedded per-socket cgroup information into sock->sk_cgrp_data and in order
to save 8 bytes in struct sock made both mutually exclusive, that is, when
cgroup v1 socket tagging (e.g. net_cls/net_prio) is used, then cgroup v2
falls back to the root cgroup in sock_cgroup_ptr() (&cgrp_dfl_root.cgrp).

The assumption made was "there is no reason to mix the two and this is in line
with how legacy and v2 compatibility is handled" as stated in bd1060a1d6.
However, with Kubernetes more widely supporting cgroups v2 as well nowadays,
this assumption no longer holds, and the possibility of the v1/v2 mixed mode
with the v2 root fallback being hit becomes a real security issue.

Many of the cgroup v2 BPF programs are also used for policy enforcement, just
to pick _one_ example, that is, to programmatically deny socket related system
calls like connect(2) or bind(2). A v2 root fallback would implicitly cause
a policy bypass for the affected Pods.

In production environments, we have recently seen this case due to various
circumstances: i) a different 3rd party agent and/or ii) a container runtime
such as [0] in the user's environment configuring legacy cgroup v1 net_cls
tags, which triggered implicitly mentioned root fallback. Another case is
Kubernetes projects like kind [1] which create Kubernetes nodes in a container
and also add cgroup namespaces to the mix, meaning programs which are attached
to the cgroup v2 root of the cgroup namespace get attached to a non-root
cgroup v2 path from init namespace point of view. And the latter's root is
out of reach for agents on a kind Kubernetes node to configure. Meaning, any
entity on the node setting cgroup v1 net_cls tag will trigger the bypass
despite cgroup v2 BPF programs attached to the namespace root.

Generally, this mutual exclusiveness does not hold anymore in today's user
environments and makes cgroup v2 usage from BPF side fragile and unreliable.
This fix adds proper struct cgroup pointer for the cgroup v2 case to struct
sock_cgroup_data in order to address these issues; this implicitly also fixes
the tradeoffs being made back then with regards to races and refcount leaks
as stated in bd1060a1d6, and removes the fallback, so that cgroup v2 BPF
programs always operate as expected.

  [0] https://github.com/nestybox/sysbox/
  [1] https://kind.sigs.k8s.io/

Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20210913230759.2313-1-daniel@iogearbox.net
2021-09-13 16:35:58 -07:00
..
6lowpan 6lowpan: iphc: Fix an off-by-one check of array index 2021-07-22 16:19:03 +02:00
9p 9p/trans_virtio: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
802 net: 802: remove dead leftover after ipx driver removal 2021-08-13 16:30:35 -07:00
8021q dev_ioctl: split out ndo_eth_ioctl 2021-07-27 20:11:45 +01:00
appletalk net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
atm atm: Use list_for_each_entry() to simplify code in resources.c 2021-06-10 14:08:09 -07:00
ax25 ax25: use skb_expand_head 2021-08-03 11:21:39 +01:00
batman-adv batman-adv: bcast: remove remaining skb-copy calls 2021-08-18 18:39:00 +02:00
bluetooth bluetooth-next pull request for net-next: 2021-08-20 12:16:05 +01:00
bpf bpf: Refactor BPF_PROG_RUN into a function 2021-08-17 00:45:07 +02:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge net: bridge: use mld2r_ngrec instead of icmpv6_dataun 2021-08-31 11:29:23 +01:00
caif net: fix uninit-value in caif_seqpkt_sendmsg 2021-07-15 11:08:33 -07:00
can net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
ceph Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
core bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
dcb net: dcb: Return the correct errno code 2021-06-01 17:01:33 -07:00
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-13 06:41:22 -07:00
decnet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
dns_resolver
dsa net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid 2021-08-25 11:14:34 +01:00
ethernet move netdev_boot_setup into Space.c 2021-08-03 13:05:26 +01:00
ethtool ethtool: extend coalesce setting uAPI with CQE mode 2021-08-24 07:38:29 -07:00
hsr net: hsr: don't check sequence number if tag removal is offloaded 2021-06-16 12:13:01 -07:00
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-13 06:41:22 -07:00
ife
ipv4 fou: remove sparse errors 2021-08-31 12:03:33 +01:00
ipv6 ipv6: seg6: remove duplicated include 2021-08-31 12:41:47 +01:00
iucv net/iucv: Replace deprecated CPU-hotplug functions. 2021-08-09 10:13:32 +01:00
kcm net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
key
l2tp l2tp: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
l3mdev
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
mac80211 mac80211: parse transmit power envelope element 2021-08-26 10:18:56 +02:00
mac802154
mctp mctp: Specify route types, require rtm_type in RTM_*ROUTE messages 2021-08-11 16:01:17 -07:00
mpls mpls: defer ttl decrement in mpls_forward() 2021-07-23 17:17:56 +01:00
mptcp mptcp: make the locking tx schema more readable 2021-08-27 09:45:07 +01:00
ncsi net/ncsi: add dummy response handler for Intel boards 2021-07-08 14:16:39 -07:00
netfilter Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-08-30 16:42:47 -07:00
netlabel net: fix NULL pointer reference in cipso_v4_doi_free 2021-08-30 12:23:18 +01:00
netlink net: netlink: Remove unused function 2021-07-30 18:35:47 +02:00
netrom net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
nfc net: in_irq() cleanup 2021-08-13 14:09:19 -07:00
nsh
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-19 18:09:18 -07:00
packet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
phonet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
psample
qrtr net: qrtr: make checks in qrtr_endpoint_post() stricter 2021-08-30 12:26:57 +01:00
rds net/rds: dma_map_sg is entitled to merge entries 2021-08-18 15:35:50 -07:00
rfkill
rose
rxrpc net: RxRPC: make dependent Kconfig symbols be shown indented 2021-08-18 10:12:11 +01:00
sched net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed 2021-08-30 20:29:03 -07:00
sctp sctp: move the active_key update after sh_keys is added 2021-08-03 11:43:43 +01:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-13 06:41:22 -07:00
strparser net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
sunrpc New features: 2021-08-31 10:57:06 -07:00
switchdev net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge 2021-08-04 12:35:07 +01:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-19 18:09:18 -07:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
unix af_unix: fix potential NULL deref in unix_dgram_connect() 2021-08-31 11:31:52 +01:00
vmw_vsock vsock/virtio: avoid potential deadlock when vsock device remove 2021-08-12 10:57:27 -07:00
wireless cfg80211: use wiphy DFS domain if it is self-managed 2021-08-26 11:04:55 +02:00
x25 net: x25: Use list_for_each_entry() to simplify code in x25_route.c 2021-06-10 14:08:09 -07:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ 2021-08-27 11:16:29 +01:00
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
Kconfig mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
Makefile mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
socket.c Core: 2021-08-31 16:43:06 -07:00
sysctl_net.c