linux/net
Andrey Ignatov 7723628101 bpf: Introduce bpf_skb_ancestor_cgroup_id helper
== Problem description ==

It's useful to be able to identify cgroup associated with skb in TC so
that a policy can be applied to this skb, and existing bpf_skb_cgroup_id
helper can help with this.

Though in real life cgroup hierarchy and hierarchy to apply a policy to
don't map 1:1.

It's often the case that there is a container and corresponding cgroup,
but there are many more sub-cgroups inside container, e.g. because it's
delegated to containerized application to control resources for its
subsystems, or to separate application inside container from infra that
belongs to containerization system (e.g. sshd).

At the same time it may be useful to apply a policy to container as a
whole.

If multiple containers like this are run on a host (what is often the
case) and many of them have sub-cgroups, it may not be possible to apply
per-container policy in TC with existing helpers such as
bpf_skb_under_cgroup or bpf_skb_cgroup_id:

* bpf_skb_cgroup_id will return id of immediate cgroup associated with
  skb, i.e. if it's a sub-cgroup inside container, it can't be used to
  identify container's cgroup;

* bpf_skb_under_cgroup can work only with one cgroup and doesn't scale,
  i.e. if there are N containers on a host and a policy has to be
  applied to M of them (0 <= M <= N), it'd require M calls to
  bpf_skb_under_cgroup, and, if M changes, it'd require to rebuild &
  load new BPF program.

== Solution ==

The patch introduces new helper bpf_skb_ancestor_cgroup_id that can be
used to get id of cgroup v2 that is an ancestor of cgroup associated
with skb at specified level of cgroup hierarchy.

That way admin can place all containers on one level of cgroup hierarchy
(what is a good practice in general and already used in many
configurations) and identify specific cgroup on this level no matter
what sub-cgroup skb is associated with.

E.g. if there is a cgroup hierarchy:
  root/
  root/container1/
  root/container1/app11/
  root/container1/app11/sub-app-a/
  root/container1/app12/
  root/container2/
  root/container2/app21/
  root/container2/app22/
  root/container2/app22/sub-app-b/

, then having skb associated with root/container1/app11/sub-app-a/ it's
possible to get ancestor at level 1, what is container1 and apply policy
for this container, or apply another policy if it's container2.

Policies can be kept e.g. in a hash map where key is a container cgroup
id and value is an action.

Levels where container cgroups are created are usually known in advance
whether cgroup hierarchy inside container may be hard to predict
especially in case when its creation is delegated to containerized
application.

== Implementation details ==

The helper gets ancestor by walking parents up to specified level.

Another option would be to get different kind of "id" from
cgroup->ancestor_ids[level] and use it with idr_find() to get struct
cgroup for ancestor. But that would require radix lookup what doesn't
seem to be better (at least it's not obviously better).

Format of return value of the new helper is same as that of
bpf_skb_cgroup_id.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-13 01:02:39 +02:00
..
6lowpan 6lowpan: iphc: reset mac_header after decompress to fix panic 2018-07-06 12:32:12 +02:00
9p net:mod: remove unneeded variable 'ret' in init_p9 2018-08-08 09:40:44 -07:00
802
8021q net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
appletalk Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
atm net: simplify sock_poll_wait 2018-07-30 09:10:25 -07:00
ax25 ax25: remove blank line at EOF 2018-07-24 14:10:42 -07:00
batman-adv Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux 2018-07-20 21:17:12 -07:00
bluetooth Bluetooth: hidp: buffer overflow in hidp_process_report 2018-08-01 09:12:35 +02:00
bpf bpf/test_run: support cgroup local storage 2018-08-03 00:47:32 +02:00
bpfilter bpfilter: remove trailing newline 2018-07-24 14:10:42 -07:00
bridge net/bridge/br_multicast: remove redundant variable "err" 2018-08-06 10:33:44 -07:00
caif net: simplify sock_poll_wait 2018-07-30 09:10:25 -07:00
can Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
ceph The main piece is a set of libceph changes that revamps how OSD 2018-06-15 07:24:58 +09:00
core bpf: Introduce bpf_skb_ancestor_cgroup_id helper 2018-08-13 01:02:39 +02:00
dcb net: dcb: Add priority-to-DSCP map getters 2018-07-27 13:17:50 -07:00
dccp Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-09 11:52:36 -07:00
decnet decnet: whitespace fixes 2018-07-24 14:10:42 -07:00
dns_resolver net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
dsa Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-09 11:52:36 -07:00
ethernet net: Convert GRO SKB handling to list_head. 2018-06-26 11:33:04 +09:00
hsr
ieee802154 net: ieee802154: 6lowpan: remove redundant pointers 'fq' and 'net' 2018-08-06 11:21:15 +02:00
ife
ipv4 bpf: Enable BPF_PROG_TYPE_SK_REUSEPORT bpf prog in reuseport selection 2018-08-11 01:58:46 +02:00
ipv6 bpf: Enable BPF_PROG_TYPE_SK_REUSEPORT bpf prog in reuseport selection 2018-08-11 01:58:46 +02:00
iucv net:af_iucv: get rid of the unneeded variable 'err' in afiucv_pm_freeze 2018-08-08 09:39:36 -07:00
kcm net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
key Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2018-07-27 09:33:37 -07:00
l2tp Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-05 13:04:31 -07:00
l3mdev
lapb
llc Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-09 11:52:36 -07:00
mac80211 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-24 19:21:58 -07:00
mac802154 net: mac802154: tx: expand tailroom if necessary 2018-08-06 11:21:37 +02:00
mpls mpls: remove trailing whitepace 2018-07-24 14:10:42 -07:00
ncsi net/ncsi: Use netdev_dbg for debug messages 2018-06-20 07:26:58 +09:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2018-08-05 16:25:22 -07:00
netlabel
netlink Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-05 13:04:31 -07:00
netrom Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
nfc net: simplify sock_poll_wait 2018-07-30 09:10:25 -07:00
nsh nsh: set mac len based on inner packet 2018-07-12 16:55:29 -07:00
openvswitch Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
packet Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-09 11:52:36 -07:00
phonet Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
psample
qrtr net: qrtr: Reset the node and port ID of broadcast messages 2018-07-05 20:20:03 +09:00
rds RDS: IB: fix 'passing zero to ERR_PTR()' warning 2018-08-07 13:19:45 -07:00
rfkill
rose Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
rxrpc Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-09 11:52:36 -07:00
sched net: sched: cls_flower: set correct offload data in fl_reoffload 2018-08-07 12:35:17 -07:00
sctp sctp: whitespace fixes 2018-07-24 14:10:42 -07:00
smc Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-09 11:52:36 -07:00
strparser strparser: remove redundant variable 'rd_desc' 2018-08-01 10:00:06 -07:00
sunrpc net: Remove some unneeded semicolon 2018-08-04 13:05:39 -07:00
switchdev
tipc Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-09 11:52:36 -07:00
tls net/tls: Mark the end in scatterlist table 2018-08-05 17:13:58 -07:00
unix af_unix: ensure POLLOUT on remote close() for connected dgram socket 2018-08-03 16:44:19 -07:00
vmw_vsock vsock: split dwork to avoid reinitializations 2018-08-07 12:39:13 -07:00
wimax wimax: remove blank lines at EOF 2018-07-24 14:10:42 -07:00
wireless Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-24 19:21:58 -07:00
x25 x25: remove blank lines at EOF 2018-07-24 14:10:42 -07:00
xdp Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-05 13:04:31 -07:00
xfrm Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
compat.c net: avoid unnecessary sock_flag() check when enable timestamp 2018-08-06 10:42:48 -07:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile bpfilter: check compiler capability in Kconfig 2018-06-28 13:36:39 +09:00
socket.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-08-02 10:55:32 -07:00
sysctl_net.c