linux/net/ipv4
Andrey Ignatov 4fbac77d2d bpf: Hooks for sys_bind
== The problem ==

There is a use-case when all processes inside a cgroup should use one
single IP address on a host that has multiple IP configured.  Those
processes should use the IP for both ingress and egress, for TCP and UDP
traffic. So TCP/UDP servers should be bound to that IP to accept
incoming connections on it, and TCP/UDP clients should make outgoing
connections from that IP. It should not require changing application
code since it's often not possible.

Currently it's solved by intercepting glibc wrappers around syscalls
such as `bind(2)` and `connect(2)`. It's done by a shared library that
is preloaded for every process in a cgroup so that whenever TCP/UDP
server calls `bind(2)`, the library replaces IP in sockaddr before
passing arguments to syscall. When application calls `connect(2)` the
library transparently binds the local end of connection to that IP
(`bind(2)` with `IP_BIND_ADDRESS_NO_PORT` to avoid performance penalty).

Shared library approach is fragile though, e.g.:
* some applications clear env vars (incl. `LD_PRELOAD`);
* `/etc/ld.so.preload` doesn't help since some applications are linked
  with option `-z nodefaultlib`;
* other applications don't use glibc and there is nothing to intercept.

== The solution ==

The patch provides much more reliable in-kernel solution for the 1st
part of the problem: binding TCP/UDP servers on desired IP. It does not
depend on application environment and implementation details (whether
glibc is used or not).

It adds new eBPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` and
attach types `BPF_CGROUP_INET4_BIND` and `BPF_CGROUP_INET6_BIND`
(similar to already existing `BPF_CGROUP_INET_SOCK_CREATE`).

The new program type is intended to be used with sockets (`struct sock`)
in a cgroup and provided by user `struct sockaddr`. Pointers to both of
them are parts of the context passed to programs of newly added types.

The new attach types provides hooks in `bind(2)` system call for both
IPv4 and IPv6 so that one can write a program to override IP addresses
and ports user program tries to bind to and apply such a program for
whole cgroup.

== Implementation notes ==

[1]
Separate attach types for `AF_INET` and `AF_INET6` are added
intentionally to prevent reading/writing to offsets that don't make
sense for corresponding socket family. E.g. if user passes `sockaddr_in`
it doesn't make sense to read from / write to `user_ip6[]` context
fields.

[2]
The write access to `struct bpf_sock_addr_kern` is implemented using
special field as an additional "register".

There are just two registers in `sock_addr_convert_ctx_access`: `src`
with value to write and `dst` with pointer to context that can't be
changed not to break later instructions. But the fields, allowed to
write to, are not available directly and to access them address of
corresponding pointer has to be loaded first. To get additional register
the 1st not used by `src` and `dst` one is taken, its content is saved
to `bpf_sock_addr_kern.tmp_reg`, then the register is used to load
address of pointer field, and finally the register's content is restored
from the temporary field after writing `src` value.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 02:15:18 +02:00
..
netfilter net: Convert ipv4_net_ops 2018-03-08 12:36:45 -05:00
af_inet.c bpf: Hooks for sys_bind 2018-03-31 02:15:18 +02:00
ah4.c net: use -ENOSPC for transient busy indication 2017-11-03 22:11:17 +08:00
arp.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
cipso_ipv4.c tcp/dccp: fix ireq->opt races 2017-10-21 01:33:19 +01:00
datagram.c
devinet.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
esp4_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-23 13:51:56 -05:00
esp4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-17 00:10:42 -05:00
fib_frontend.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
fib_lookup.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_notifier.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_rules.c ipv6: route: dissect flow in input path if fib rules need it 2018-02-28 22:44:44 -05:00
fib_semantics.c net/ipv4: Pass net to fib_multipath_hash instead of fib_info 2018-03-04 13:04:21 -05:00
fib_trie.c net: make kmem caches as __ro_after_init 2018-02-26 15:11:48 -05:00
fou.c net: Convert fou_net_ops 2018-03-05 10:48:28 -05:00
gre_demux.c
gre_offload.c gso: fix payload length when gso_size is zero 2017-10-08 10:12:15 -07:00
icmp.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
igmp.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
inet_connection_sock.c Revert "defer call to mem_cgroup_sk_alloc()" 2018-02-02 19:49:31 -05:00
inet_diag.c sock_diag: request _diag module only when the family or proto has been registered 2018-03-12 11:03:42 -04:00
inet_fragment.c net: Fix hlist corruptions in inet_evict_bucket() 2018-03-07 13:29:28 -05:00
inet_hashtables.c inet: Avoid unitialized variable warning in inet_unhash() 2018-02-01 09:48:42 -05:00
inet_timewait_sock.c net: Convert atomic_t net::count to refcount_t 2018-01-15 14:23:42 -05:00
inetpeer.c net: make kmem caches as __ro_after_init 2018-02-26 15:11:48 -05:00
ip_forward.c net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len 2018-03-04 17:49:17 -05:00
ip_fragment.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
ip_gre.c gre: fix TUNNEL_SEQ bit check on sequence numbering 2018-03-22 14:52:43 -04:00
ip_input.c net: Make ip_ra_chain per struct net 2018-03-22 15:12:56 -04:00
ip_options.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ip_output.c net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len 2018-03-04 17:49:17 -05:00
ip_sockglue.c net: Replace ip_ra_lock with per-net mutex 2018-03-22 15:12:56 -04:00
ip_tunnel_core.c net: store port/representator id in metadata_dst 2017-06-25 11:42:01 -04:00
ip_tunnel.c net: do not create fallback tunnels for non-default namespaces 2018-03-09 11:23:11 -05:00
ip_vti.c net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops 2018-02-27 11:01:37 -05:00
ipcomp.c
ipconfig.c ipconfig: use dev_set_mtu() 2018-01-24 19:13:45 -05:00
ipip.c net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops 2018-02-27 11:01:37 -05:00
ipmr_base.c ipmr, ip6mr: Unite dumproute flows 2018-03-01 13:13:23 -05:00
ipmr.c net: Revert "ipv4: fix a deadlock in ip_ra_control" 2018-03-22 15:12:56 -04:00
Kconfig ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
Makefile ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
netfilter.c netfilter: remove struct nf_afinfo and its helper functions 2018-01-08 18:11:02 +01:00
ping.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
proc.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
protocol.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
raw_diag.c net: ipv6: add second dif to raw socket lookups 2017-08-07 11:39:22 -07:00
raw.c net: Revert "ipv4: fix a deadlock in ip_ra_control" 2018-03-22 15:12:56 -04:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
syncookies.c tcp: Namespace-ify sysctl_tcp_workaround_signed_windows 2017-10-28 19:24:38 +09:00
sysctl_net_ipv4.c udp: Move the udp sysctl to namespace. 2018-03-16 12:03:30 -04:00
tcp_bbr.c net-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit 2018-03-16 15:07:48 -04:00
tcp_bic.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_cdg.c tcp: cdg: make struct tcp_cdg static 2017-10-16 21:24:25 +01:00
tcp_cong.c tcp: Namespace-ify sysctl_tcp_default_congestion_control 2017-11-15 14:09:52 +09:00
tcp_cubic.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_dctcp.c
tcp_diag.c net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store 2017-12-20 14:00:25 -05:00
tcp_fastopen.c tcp: pause Fast Open globally after third consecutive timeout 2017-12-13 15:51:12 -05:00
tcp_highspeed.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_htcp.c tcp: fix cwnd undo in Reno and HTCP congestion controls 2017-08-06 21:25:10 -07:00
tcp_hybla.c
tcp_illinois.c net/tcp/illinois: replace broken algorithm reference link 2018-02-28 12:03:47 -05:00
tcp_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-06 01:20:46 -05:00
tcp_ipv4.c tcp: remove dead code after CHECKSUM_PARTIAL adoption 2018-02-21 14:24:14 -05:00
tcp_lp.c tcp: switch TCP TS option (RFC 7323) to 1ms clock 2017-05-17 16:06:01 -04:00
tcp_metrics.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
tcp_minisocks.c tcp: try to keep packet if SYN_RCV race is lost 2018-02-14 14:21:45 -05:00
tcp_nv.c tcp_nv: fix potential integer overflow in tcpnv_acked 2018-01-31 10:26:30 -05:00
tcp_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
tcp_output.c tcp_bbr: better deal with suboptimal GSO (II) 2018-03-01 21:44:28 -05:00
tcp_rate.c tcp: invalidate rate samples during SACK reneging 2017-12-08 10:07:02 -05:00
tcp_recovery.c tcp: evaluate packet losses upon RTT change 2017-12-08 14:14:11 -05:00
tcp_scalable.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_timer.c tcp: purge write queue upon aborting the connection 2018-03-07 15:01:03 -05:00
tcp_ulp.c net: add a UID to use for ULP socket assignment 2018-02-06 11:39:31 +01:00
tcp_vegas.c tcp: fix under-evaluated ssthresh in TCP Vegas 2017-09-29 06:07:00 +01:00
tcp_vegas.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_veno.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_westwood.c tcp: Revert "tcp: remove CA_ACK_SLOWPATH" 2017-08-30 11:20:08 -07:00
tcp_yeah.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp.c bpf: sockmap redirect ingress support 2018-03-30 00:09:43 +02:00
tunnel4.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
udp_diag.c net: ipv6: add second dif to udp socket lookups 2017-08-07 11:39:22 -07:00
udp_impl.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
udp_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
udp_tunnel.c net: add infrastructure to un-offload UDP tunnel port 2017-07-24 13:52:59 -07:00
udp.c udp: Move the udp sysctl to namespace. 2018-03-16 12:03:30 -04:00
udplite.c net: Convert pernet_subsys, registered from inet_init() 2018-02-13 10:36:08 -05:00
xfrm4_input.c xfrm: Reinject transport-mode packets through tasklet 2017-12-19 08:23:21 +01:00
xfrm4_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm4_mode_transport.c xfrm: Add encapsulation header offsets while SKB is not encrypted 2017-04-14 10:07:39 +02:00
xfrm4_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm4_output.c net: xfrm: use skb_gso_validate_network_len() to check gso sizes 2018-03-04 17:49:17 -05:00
xfrm4_policy.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
xfrm4_protocol.c
xfrm4_state.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfrm4_tunnel.c