linux/net
Eric Dumazet 90c337da15 inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).

As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge to find an available
port.
But kernel does not know yet if this socket is going to be a listener or
be connected.
It has very limited choices (no full knowledge of final 4-tuple for a
connect())

With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.

This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.

The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.

This new feature is available for both IPv4 and IPv6 (Thanks Neal)

Tested:

Wrote a test program and checked its behavior on IPv4 and IPv6.

strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() show that the port is still 0 right after bind()
but properly allocated after connect().

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

IPv6 test :

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.

lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets

lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

Check that a given source port is indeed used by many different
connections :

lpaa23:~# ss -t src :40000 | head -10
State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-06 23:57:12 -07:00
..
6lowpan
9p 9p: patches for 4.1 merge window 2015-04-18 17:45:30 -04:00
802
8021q vlan: Add GRO support for non hardware accelerated vlan 2015-06-01 16:50:52 -07:00
appletalk net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
atm net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
ax25 net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
batman-adv batman-adv: Remove unnecessary ret variable in algo_register 2015-06-03 15:57:25 +02:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2015-05-30 23:26:45 -07:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-01 22:51:30 -07:00
caif Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-01 22:51:30 -07:00
can net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
ceph Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-01 22:51:30 -07:00
core mpls: Add MPLS entropy label in flow_keys 2015-06-04 15:44:31 -07:00
dcb
dccp inet: fix possible panic in reqsk_queue_unlink() 2015-04-24 11:39:15 -04:00
decnet net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
dns_resolver
dsa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-01 22:51:30 -07:00
ethernet net: Add full IPv6 addresses to flow_keys 2015-06-04 15:44:30 -07:00
hsr
ieee802154 nl802154: add support to set cca ed level 2015-05-27 19:29:42 +02:00
ipv4 inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations 2015-06-06 23:57:12 -07:00
ipv6 inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations 2015-06-06 23:57:12 -07:00
ipx net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
irda irda: use msecs_to_jiffies for conversion to jiffies 2015-05-25 17:46:21 -04:00
iucv net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
key net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
l2tp net: Modify sk_alloc to not reference count the netns of kernel sockets. 2015-05-11 10:50:18 -04:00
lapb
llc net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-01 22:51:30 -07:00
mac802154 nl802154: add support to set cca ed level 2015-05-27 19:29:42 +02:00
mpls net: Add priority to packet_offload objects. 2015-06-01 14:56:09 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2015-05-31 00:02:30 -07:00
netlabel netlink: implement nla_put_in_addr and nla_put_in6_addr 2015-03-31 13:58:35 -04:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-05-23 01:22:35 -04:00
netrom net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
nfc net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
openvswitch openvswitch: include datapath actions with sampled-packet upcall to userspace 2015-06-01 15:05:40 -07:00
packet net-packet: fix null pointer exception in rollover mode 2015-05-17 22:41:38 -04:00
phonet net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
rds net/rds Add getsockopt support for SO_RDS_TRANSPORT 2015-05-31 21:47:23 -07:00
rfkill net: rfkill: gpio: make better use of gpiod API 2015-05-29 13:13:45 +02:00
rose net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
rxrpc net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
sched net: Add full IPv6 addresses to flow_keys 2015-06-04 15:44:30 -07:00
sctp ipv6: Add rt6_get_cookie() function 2015-05-25 13:25:34 -04:00
sunrpc svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures 2015-05-04 12:02:40 -04:00
switchdev switchdev: documentation: use switchdev_port_obj_xxx for IPv4 FIB add/modify/delete ops 2015-06-03 23:47:23 -07:00
tipc tipc: unconditionally put sock refcnt when sock timer to be deleted is pending 2015-05-30 18:08:37 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-01 22:51:30 -07:00
vmw_vsock net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
wimax
wireless cfg80211: ignore netif running state when changing iftype 2015-05-29 13:05:40 +02:00
x25 net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-01 22:51:30 -07:00
compat.c net: switch importing msghdr from userland to {compat_,}import_iovec() 2015-04-09 00:02:26 -04:00
Kconfig net: add CONFIG_NET_INGRESS to enable ingress filtering 2015-05-14 01:10:05 -04:00
Makefile
socket.c net: Add a struct net parameter to sock_create_kern 2015-05-11 10:50:17 -04:00
sysctl_net.c