mainlining shenanigans
Daniel Borkmann 1b66d25361 bpf: Add get{peer, sock}name attach types for sock_addr
As stated in 983695fa67 ("bpf: fix unconnected udp hooks"), the objective
for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be
transparent to applications. In Cilium we make use of these hooks [0] in
order to enable E-W load balancing for existing Kubernetes service types
for all Cilium managed nodes in the cluster. Those backends can be local
or remote. The main advantage of this approach is that it operates as close
as possible to the socket and therefore avoids packet-based NAT, given that
in the connect/sendmsg/recvmsg hooks we only need to xlate sock addresses.

This also allows exposing NodePort services on loopback addresses in the
host namespace, for example. As another advantage, this also efficiently
blocks bind requests for applications in the host namespace for exposed
ports. However, one missing item is that we also need to perform reverse
xlation for inet{,6}_getname() hooks such that we can return the service
IP/port tuple back to the application instead of the remote peer address.

The vast majority of applications do not care about getpeername(), but
on a few occasions we've seen breakage when validating the peer's address,
since it unexpectedly returns the backend tuple instead of the service one.
Therefore, this trivial patch adds a getpeername() as well as a
getsockname() BPF cgroup hook for both IPv4 and IPv6, which allows
customising the returned address and thereby addresses this situation.
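
As a rough illustration of the reverse xlation idea, the following is a
plain userspace C model, not BPF code and not Cilium's implementation. It
assumes that the connect-time translation records the original service
tuple per socket, keyed by some socket identifier, and that the
getpeername() path restores it. All names here (rev_nat_table,
xlate_connect, rev_xlate_peer, the cookie key) are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

#define REV_NAT_MAX 16

/* Hypothetical per-socket record: which service tuple the application
 * originally asked for. Addresses and ports are in network byte order. */
struct rev_nat_entry {
	uint64_t cookie;          /* assumed unique socket identifier */
	struct sockaddr_in svc;   /* original service (frontend) tuple */
	int in_use;
};

struct rev_nat_table {
	struct rev_nat_entry e[REV_NAT_MAX];
};

static int same_tuple(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	return a->sin_addr.s_addr == b->sin_addr.s_addr &&
	       a->sin_port == b->sin_port;
}

/* Forward xlation at connect time: if dst is the service tuple, remember
 * it for this socket and rewrite dst to the backend. Returns 1 if the
 * destination was rewritten, 0 otherwise (dst is then left untouched). */
static int xlate_connect(struct rev_nat_table *t, uint64_t cookie,
			 struct sockaddr_in *dst,
			 const struct sockaddr_in *svc,
			 const struct sockaddr_in *backend)
{
	if (!same_tuple(dst, svc))
		return 0;
	for (size_t i = 0; i < REV_NAT_MAX; i++) {
		if (!t->e[i].in_use) {
			t->e[i].cookie = cookie;
			t->e[i].svc = *svc;
			t->e[i].in_use = 1;
			*dst = *backend;
			return 1;
		}
	}
	return 0; /* table full */
}

/* Reverse xlation at getpeername() time: if this socket was translated,
 * report the service tuple instead of the backend one. Returns 1 if the
 * peer address was rewritten, 0 if it was left as is. */
static int rev_xlate_peer(const struct rev_nat_table *t, uint64_t cookie,
			  struct sockaddr_in *peer)
{
	for (size_t i = 0; i < REV_NAT_MAX; i++) {
		if (t->e[i].in_use && t->e[i].cookie == cookie) {
			*peer = t->e[i].svc;
			return 1;
		}
	}
	return 0;
}
```

With the service from the example below (frontend 1.2.3.4:80, backend
10.0.0.10:80), xlate_connect() would rewrite the destination to the
backend, and a later rev_xlate_peer() for the same socket would hand
1.2.3.4:80 back to the application. Sockets that were never translated
keep their real peer address.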

Simple example:

  # ./cilium/cilium service list
  ID   Frontend     Service Type   Backend
  1    1.2.3.4:80   ClusterIP      1 => 10.0.0.10:80

Before; curl's verbose output example, no getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0)
  > GET / HTTP/1.1
  > Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]

After; with getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0)
  > GET / HTTP/1.1
  > Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]
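
For reference, the address shown in parentheses in the "Connected to"
lines above is the kind of value an application gets back from
getpeername(). The sketch below shows one way a client might format it
for an IPv4 socket; peer_to_string is a hypothetical helper name, and
this is not curl's actual code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Format the peer of a connected IPv4 socket as "a.b.c.d:port".
 * Returns 0 on success, -1 on failure or for non-IPv4 sockets. */
static int peer_to_string(int fd, char *buf, size_t len)
{
	struct sockaddr_in peer;
	socklen_t plen = sizeof(peer);
	char ip[INET_ADDRSTRLEN];

	memset(&peer, 0, sizeof(peer));
	if (getpeername(fd, (struct sockaddr *)&peer, &plen) < 0)
		return -1;
	if (peer.sin_family != AF_INET)
		return -1;
	if (!inet_ntop(AF_INET, &peer.sin_addr, ip, sizeof(ip)))
		return -1;
	snprintf(buf, len, "%s:%u", ip, (unsigned)ntohs(peer.sin_port));
	return 0;
}
```

Without a getpeername() hook, such a helper would report the backend
tuple for a translated connection; with the hook attached and performing
reverse xlation, it would report the service tuple.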

Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed
peer to the context, similar to how inet{,6}_getname() does it, but API-wise
this is suboptimal as it always forces programs to test for ctx->peer,
which can easily be missed, hence the BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME
split. Similarly, the checked return code is on tnum_range(1, 1), but if a
use case comes up in the future, it can easily be changed to return an error
code instead. Helper and ctx member access is the same as with the
connect/sendmsg/etc hooks.

  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
2020-05-19 11:32:04 -07:00
arch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
block bdi: use bdi_dev_name() to get device name 2020-05-09 16:07:39 -06:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto gcc-10: avoid shadowing standard library 'free()' in crypto 2020-05-09 15:58:04 -07:00
Documentation Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
drivers Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
fs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
include bpf: Add get{peer, sock}name attach types for sock_addr 2020-05-19 11:32:04 -07:00
init Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:10:06 -07:00
ipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
kernel bpf: Add get{peer, sock}name attach types for sock_addr 2020-05-19 11:32:04 -07:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
net bpf: Add get{peer, sock}name attach types for sock_addr 2020-05-19 11:32:04 -07:00
samples samples, bpf: Refactor kprobe, tail call kern progs map definition 2020-05-19 17:13:03 +02:00
scripts Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
security bpf, capability: Introduce CAP_BPF 2020-05-15 17:29:41 +02:00
sound sound fixes for 5.7-rc6 2020-05-15 10:06:49 -07:00
tools bpf: Add get{peer, sock}name attach types for sock_addr 2020-05-19 11:32:04 -07:00
usr kbuild: fix comment about missing include guard detection 2020-04-11 12:09:48 +09:00
virt KVM: arm64: Fix 32bit PC wrap-around 2020-05-01 09:51:08 +01:00
.clang-format clang-format: Update with the latest for_each macro list 2020-04-18 13:49:33 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
.mailmap mailmap: Add Sedat Dilek (replacement for expired email address) 2020-04-11 09:28:34 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Hand MIPS over to Thomas 2020-02-24 22:43:18 -08:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
Makefile Linux 5.7-rc5 2020-05-10 15:16:58 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.