linux/tools
Daniel Borkmann f318903c0b bpf: Add netns cookie and enable it for bpf cgroup hooks
In Cilium we're mainly using BPF cgroup hooks today in order to implement
kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*),
ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic
between Cilium managed nodes. While this works in its current shape and avoids
packet-level NAT for inter Cilium managed node traffic, there is one major
limitation we're facing today, that is, lack of netns awareness.

In Kubernetes, the concept of Pods (which hold one or multiple containers)
has been built around network namespaces, so while we can use the global scope
of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing
NodePort ports on loopback addresses), we also have the need to differentiate
between initial network namespaces and non-initial one. For example, ExternalIP
services mandate that non-local service IPs are not to be translated from the
host (initial) network namespace as one example. Right now, we have an ugly
work-around in place where non-local service IPs for ExternalIP services are
not xlated from connect() and friends BPF hooks but instead via less efficient
packet-level NAT on the veth tc ingress hook for Pod traffic.

On top of determining whether we're in initial or non-initial network namespace
we also have a need for a socket-cookie like mechanism for network namespaces
scope. Socket cookies have the nice property that they can be combined as part
of the key structure e.g. for BPF LRU maps without having to worry that the
cookie could be recycled. We are planning to use this for our sessionAffinity
implementation for services. Therefore, add a new bpf_get_netns_cookie() helper
which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would
provide the cookie for the initial network namespace while passing the context
instead of NULL would provide the cookie from the application's network namespace.
We're using a hole, so no size increase; the assignment happens only once.
Therefore this allows for a comparison on initial namespace as well as regular
cookie usage as we have today with socket cookies. We could later on enable
this helper for other program types as well as we would see need.

  (*) Both externalTrafficPolicy={Local|Cluster} types
  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
2020-03-27 19:40:38 -07:00
..
accounting
arch tools headers UAPI: Update tools's copy of kvm.h headers 2020-02-27 09:51:30 -03:00
bootconfig bootconfig: Add append value operator support 2020-02-21 09:59:39 -05:00
bpf bpftool: Add struct_ops support 2020-03-20 15:51:35 +01:00
build bpftool: Only build bpftool-prog-profile if supported by clang 2020-03-13 00:08:33 +01:00
cgroup iocost: Fix iocost_monitor.py due to helper type mismatch 2020-01-17 11:54:35 -08:00
debugging
firewire
firmware
gpio tools: gpio: Correctly add make dependencies for gpio_utils 2019-11-13 13:46:04 +01:00
hv Tools: hv: Reopen the devices if read() or write() returns errors 2020-01-26 22:10:10 -05:00
iio tools: iio: Correctly add make dependency for iio_utils 2019-11-10 17:11:06 +00:00
include bpf: Add netns cookie and enable it for bpf cgroup hooks 2020-03-27 19:40:38 -07:00
io_uring
kvm/kvm_stat tools/kvm_stat: Fix kvm_exit filter name 2020-01-23 09:51:06 +01:00
laptop
leds
lib libbpf: Don't allocate 16M for log buffer by default 2020-03-26 00:13:37 +01:00
memory-model
nfsd
objtool objtool: Fix ARCH=x86_64 build error 2020-01-22 07:54:57 +01:00
pci
pcmcia
perf perf annotate: Fix segfault with source toggle 2020-02-27 11:47:23 -03:00
power More power manadement updates for 5.6-rc1 2020-01-31 14:36:35 -08:00
scripts bpftool: Introduce "prog profile" command 2020-03-10 00:04:07 +01:00
spi
testing bpf: Test_verifier, #70 error message updates for 32-bit right shift 2020-03-25 23:05:54 -07:00
thermal/tmon
time
usb usbip: Fix unsafe unaligned pointer usage 2020-01-09 16:44:26 +01:00
virtio
vm tools/vm/slabinfo: fix sanity checks enabling 2020-01-31 10:30:38 -08:00
wmi
Makefile tools: bootconfig: Add bootconfig command 2020-01-13 13:19:39 -05:00