linux/net
Eric Dumazet 05255b823a tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
  This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
 the transfert of pages from skbs to one VMA.
  This operation only uses down_read(&current->mm->mmap_sem) after
  holding TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski with great details.

Benefits are :

- Better scalability, in case multiple threads reuse VMAS
   (without mmap()/munmap() calls) since mmap_sem wont be write locked.

- Better error recovery.
   The previous mmap() model had to provide the expected size of the
   mapping. If for some reason one part could not be mapped (partial MSS),
   the whole operation had to be aborted.
   With the tcp_zerocopy_receive struct, kernel can report how
   many bytes were successfuly mapped, and how many bytes should
   be read to skip the problematic sequence.

- No more memory allocation to hold an array of page pointers.
  16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/

- skbs are freed while mmap_sem has been released

Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.

Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:29:55 -04:00
..
6lowpan License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
9p net/9p/client.c: fix potential refcnt problem of trans module 2018-04-05 21:36:23 -07:00
802 treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
8021q vlan: also check phy_driver ts_info for vlan's real device 2018-04-01 20:53:50 -04:00
appletalk net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
atm net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
ax25 net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2018-04-08 17:19:15 -04:00
bpf bpf: making bpf_prog_test run aware of possible data_end ptr change 2018-04-18 23:34:16 +02:00
bridge bridge: use hlist_entry_safe 2018-04-27 13:20:48 -04:00
caif net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN" 2018-04-19 13:37:10 -04:00
can net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ceph The big ticket items are: 2018-04-10 12:25:30 -07:00
core net: Fix coccinelle warning 2018-04-27 13:53:14 -04:00
dcb
dccp dccp: initialize ireq->ir_mark 2018-04-07 22:32:31 -04:00
decnet net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
dns_resolver KEYS: DNS: limit the length of option strings 2018-04-17 15:17:41 -04:00
dsa net: dsa: Allow providing PHY statistics from CPU port 2018-04-27 11:53:03 -04:00
ethernet
hsr net: hsr: Convert timers to use timer_setup() 2017-10-25 13:00:27 +09:00
ieee802154 inet: frags: fix ip6frag_low_thresh boundary 2018-04-04 12:04:59 -04:00
ife net: sched: ife: check on metadata length 2018-04-22 21:12:00 -04:00
ipv4 tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive 2018-04-29 21:29:55 -04:00
ipv6 tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive 2018-04-29 21:29:55 -04:00
iucv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
kcm net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
key net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
l2tp l2tp: consistent reference counting in procfs and debufs 2018-04-27 11:06:35 -04:00
l3mdev
lapb treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts 2017-11-21 16:35:54 -08:00
llc llc: fix NULL pointer deref for SOCK_ZAPPED 2018-04-22 14:56:22 -04:00
mac80211 We have a fair number of patches, but many of them are from the 2018-03-29 16:23:26 -04:00
mac802154 net/mac802154: disambiguate mac80215 vs mac802154 trace events 2018-03-28 22:55:18 +02:00
mpls net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ncsi net/ncsi: Refactor MAC, VLAN filters 2018-04-17 13:50:58 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-24 23:59:11 -04:00
netlabel netlabel: If PF_INET6, check sk_buff ip header version 2018-02-14 14:01:41 -05:00
netlink netlink: fix uninit-value in netlink_sendmsg 2018-04-07 22:32:31 -04:00
netrom net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-02-19 18:46:11 -05:00
nsh openvswitch: enable NSH support 2017-11-08 16:12:33 +09:00
openvswitch ovs: Remove rtnl_lock() from ovs_exit_net() 2018-03-29 13:47:54 -04:00
packet packet: fix bitfield update race 2018-04-24 13:17:08 -04:00
phonet net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
psample MAINTAINERS: Update Yotam's E-mail 2017-11-01 12:19:03 +09:00
qrtr net: qrtr: Expose tunneling endpoint to user space 2018-04-27 15:06:10 -04:00
rds rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qp 2018-04-25 14:34:08 -04:00
rfkill vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
rose net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
rxrpc rxrpc: Fix undefined packet handling 2018-04-04 11:04:08 -04:00
sched net: sched: ife: handle malformed tlv length 2018-04-22 21:12:00 -04:00
sctp sctp: allow unsetting sockopt MAXSEG 2018-04-27 14:35:24 -04:00
smc net/smc: handle sockopt TCP_DEFER_ACCEPT 2018-04-27 14:02:52 -04:00
strparser strparser: Do not call mod_delayed_work with a timeout of LONG_MAX 2018-04-22 21:09:16 -04:00
sunrpc rpc_pipefs: fix double-dput() 2018-04-15 23:49:27 -04:00
switchdev net: bridge: Add/del switchdev object on host join/leave 2017-11-10 13:41:40 +09:00
tipc tipc: introduce ioctl for fetching node identity 2018-04-27 11:05:41 -04:00
tls net/tls: remove redundant second null check on sgout 2018-04-24 16:02:10 -04:00
unix af_unix: remove redundant lockdep class 2018-04-04 11:13:40 -04:00
vmw_vsock VSOCK: make af_vsock.ko removable again 2018-04-17 09:44:30 -04:00
wimax License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-03-31 23:33:04 -04:00
x25 net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
compat.c net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to compat syscalls 2018-04-02 20:15:20 +02:00
Kconfig page_pool: refurbish version of page_pool code 2018-04-17 10:50:29 -04:00
Makefile ipx: move Novell IPX protocol support into staging 2017-11-28 13:55:00 +01:00
socket.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2018-04-05 11:56:35 -07:00
sysctl_net.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00