linux/net
Hoang Le f73b12812a tipc: improve throughput between nodes in netns
Currently, TIPC transports intra-node user data messages directly
socket to socket, hence shortcutting all the lower layers of the
communication stack. This gives TIPC very good intra node performance,
both regarding throughput and latency.

We now introduce a similar mechanism for TIPC data traffic across
network namespaces located in the same kernel. On the send path, the
call chain is as always accompanied by the sending node's network name
space pointer. However, once we have reliably established that the
receiving node is represented by a namespace on the same host, we just
replace the namespace pointer with the receiving node/namespace's
ditto, and follow the regular socket receive patch though the receiving
node. This technique gives us a throughput similar to the node internal
throughput, several times larger than if we let the traffic go though
the full network stacks. As a comparison, max throughput for 64k
messages is four times larger than TCP throughput for the same type of
traffic.

To meet any security concerns, the following should be noted.

- All nodes joining a cluster are supposed to have been be certified
and authenticated by mechanisms outside TIPC. This is no different for
nodes/namespaces on the same host; they have to auto discover each
other using the attached interfaces, and establish links which are
supervised via the regular link monitoring mechanism. Hence, a kernel
local node has no other way to join a cluster than any other node, and
have to obey to policies set in the IP or device layers of the stack.

- Only when a sender has established with 100% certainty that the peer
node is located in a kernel local namespace does it choose to let user
data messages, and only those, take the crossover path to the receiving
node/namespace.

- If the receiving node/namespace is removed, its namespace pointer
is invalidated at all peer nodes, and their neighbor link monitoring
will eventually note that this node is gone.

- To ensure the "100% certainty" criteria, and prevent any possible
spoofing, received discovery messages must contain a proof that the
sender knows a common secret. We use the hash mix of the sending
node/namespace for this purpose, since it can be accessed directly by
all other namespaces in the kernel. Upon reception of a discovery
message, the receiver checks this proof against all the local
namespaces'hash_mix:es. If it finds a match, that, along with a
matching node id and cluster id, this is deemed sufficient proof that
the peer node in question is in a local namespace, and a wormhole can
be opened.

- We should also consider that TIPC is intended to be a cluster local
IPC mechanism (just like e.g. UNIX sockets) rather than a network
protocol, and hence we think it can justified to allow it to shortcut the
lower protocol layers.

Regarding traceability, we should notice that since commit 6c9081a391
("tipc: add loopback device tracking") it is possible to follow the node
internal packet flow by just activating tcpdump on the loopback
interface. This will be true even for this mechanism; by activating
tcpdump on the involved nodes' loopback interfaces their inter-name
space messaging can easily be tracked.

v2:
- update 'net' pointer when node left/rejoined
v3:
- grab read/write lock when using node ref obj
v4:
- clone traffics between netns to loopback

Suggested-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29 17:55:38 -07:00
..
6lowpan 6lowpan: no need to check return value of debugfs_create functions 2019-07-06 12:50:01 +02:00
9p 9p pull request for inclusion in 5.4 2019-09-27 15:10:34 -07:00
802 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
8021q Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
appletalk appletalk: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
atm pppoatm: use %*ph to print small buffer 2019-09-05 12:33:28 +02:00
ax25 ax25: enforce CAP_NET_RAW for raw sockets 2019-09-24 16:37:18 +02:00
batman-adv netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
bluetooth Bluetooth: hci_core: fix init for HCI_USER_CHANNEL 2019-10-17 07:10:49 +02:00
bpf bpf: Allow __sk_buff tstamp in BPF_PROG_TEST_RUN 2019-10-15 16:24:26 -07:00
bpfilter Kbuild updates for v5.3 2019-07-12 16:03:16 -07:00
bridge net: ensure correct skb->tstamp in various fragmenters 2019-10-18 10:02:37 -07:00
caif Clean up the net/caif/Kconfig menu 2019-10-02 14:47:51 -07:00
can can: add support of SAE J1939 protocol 2019-09-04 14:22:33 +02:00
ceph libceph: use ceph_kvmalloc() for osdmap arrays 2019-09-16 12:06:25 +02:00
core sock: remove unneeded semicolon 2019-10-28 16:38:08 -07:00
dcb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201 2019-05-30 11:29:52 -07:00
dccp netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
decnet treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 53 2019-05-24 17:36:42 +02:00
dns_resolver Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" 2019-07-10 18:43:43 -07:00
dsa net: dsa: Add support for devlink device parameters 2019-10-28 16:21:02 -07:00
ethernet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
hsr hsr: switch ->dellink() to ->ndo_uninit() 2019-07-11 14:37:45 -07:00
ieee802154 net: ieee802154: have genetlink code to parse the attrs during dumpit 2019-10-06 15:44:47 +02:00
ife net: Fix Kconfig indentation 2019-09-26 08:56:17 +02:00
ipv4 inet: do not call sublist_rcv on empty list 2019-10-29 17:54:29 -07:00
ipv6 inet: do not call sublist_rcv on empty list 2019-10-29 17:54:29 -07:00
iucv net/af_iucv: mark expected switch fall-throughs 2019-07-29 10:26:14 -07:00
kcm kcm: disable preemption in kcm_parse_func_strparser() 2019-09-27 10:27:14 +02:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-07-08 19:48:57 -07:00
l2tp netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
l3mdev ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF 2019-06-23 13:24:17 -07:00
lapb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-17 20:20:36 -07:00
llc net: silence KCSAN warnings around sk_add_backlog() calls 2019-10-09 21:42:59 -07:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-10-20 10:43:00 -07:00
mac802154 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 2019-05-30 11:26:41 -07:00
mpls ipv4: mpls: fix mpls_xmit for iptunnel 2019-08-25 14:34:08 -07:00
ncsi net/ncsi: Disable global multicast filter 2019-09-19 18:04:40 -07:00
netfilter net: Fix various misspellings of "connect" 2019-10-28 13:41:59 -07:00
netlabel netlabel: remove redundant assignment to pointer iter 2019-09-01 11:45:02 -07:00
netlink genetlink: do not parse attributes for families with zero maxattr 2019-10-13 11:20:03 -07:00
netrom netrom: hold sock when setting skb->destructor 2019-07-24 15:49:05 -07:00
nfc net: nfc: have genetlink code to parse the attrs during dumpit 2019-10-06 15:44:47 +02:00
nsh treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-10-20 10:43:00 -07:00
packet netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
phonet treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 336 2019-06-05 17:37:07 +02:00
psample net: sched: take reference to psample group in flow_action infra 2019-09-16 09:18:03 +02:00
qrtr net: qrtr: Stop rx_worker before freeing node 2019-09-21 18:45:46 -07:00
rds net/rds: Remove unnecessary null check 2019-10-17 15:23:03 -04:00
rfkill treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
rose treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-10-20 10:43:00 -07:00
sched net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware 2019-10-25 19:19:51 -07:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-10-20 10:43:00 -07:00
smc net/smc: remove close abort worker 2019-10-22 11:23:44 -07:00
strparser Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-22 08:59:24 -04:00
sunrpc SUNRPC: fix race to sk_err after xs_error_report 2019-10-10 16:14:28 -04:00
switchdev treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tipc tipc: improve throughput between nodes in netns 2019-10-29 17:55:38 -07:00
tls net/tls: store decrypted on a single bit 2019-10-07 09:58:28 -04:00
unix af_unix: __unix_find_socket_byname() cleanup 2019-10-11 20:40:18 -07:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-10-20 10:43:00 -07:00
wimax wimax: no need to check return value of debugfs_create functions 2019-08-10 15:25:47 -07:00
wireless net: Fix various misspellings of "connect" 2019-10-28 13:41:59 -07:00
x25 net: silence KCSAN warnings around sk_add_backlog() calls 2019-10-09 21:42:59 -07:00
xdp xsk: Fix crash in poll when device does not support ndo_xsk_wakeup 2019-10-03 16:34:27 +02:00
xfrm netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
compat.c uio: make import_iovec()/compat_import_iovec() return bytes on success 2019-05-31 15:30:03 -06:00
Kconfig devlink: Add packet trap infrastructure 2019-08-17 12:40:08 -07:00
Makefile
socket.c Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
sysctl_net.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00