linux/include
Eric Dumazet 05255b823a tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
  This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
 the transfert of pages from skbs to one VMA.
  This operation only uses down_read(&current->mm->mmap_sem) after
  holding TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski with great details.

Benefits are :

- Better scalability, in case multiple threads reuse VMAS
   (without mmap()/munmap() calls) since mmap_sem wont be write locked.

- Better error recovery.
   The previous mmap() model had to provide the expected size of the
   mapping. If for some reason one part could not be mapped (partial MSS),
   the whole operation had to be aborted.
   With the tcp_zerocopy_receive struct, kernel can report how
   many bytes were successfuly mapped, and how many bytes should
   be read to skip the problematic sequence.

- No more memory allocation to hold an array of page pointers.
  16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/

- skbs are freed while mmap_sem has been released

Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.

Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:29:55 -04:00
..
acpi xen: fixes for 4.17-rc1 2018-04-12 11:04:35 -07:00
asm-generic asm-generic fixes for v4.17-rc1 2018-04-12 09:15:48 -07:00
clocksource ARM: SoC platform updates for 4.17 2018-04-05 21:21:08 -07:00
crypto
drm drm: Fix HDCP downstream dev count read 2018-04-16 12:10:48 -04:00
dt-bindings Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-21 16:32:48 -04:00
keys
kvm
linux net: phy: Fix modular PHYLIB build 2018-04-28 16:48:04 -04:00
math-emu
media media updates for v4.17-rc1 2018-04-10 10:10:30 -07:00
memory
misc
net sctp: remove sctp_transport_pmtu_check 2018-04-27 14:35:23 -04:00
pcmcia
ras
rdma Merge candidates for 4.17 merge window 2018-04-06 17:35:43 -07:00
scsi SCSI for-linus on 20180404 2018-04-05 15:05:53 -07:00
soc ARM: SoC driver updates for 4.17 2018-04-05 21:29:35 -07:00
sound sound updates for 4.17-rc1 2018-04-05 10:42:07 -07:00
target
trace net: introduce a new tracepoint for tcp_rcv_space_adjust 2018-04-23 09:58:18 -04:00
uapi tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive 2018-04-29 21:29:55 -04:00
video
xen xen/sndif: Sync up with the canonical definition in Xen 2018-04-17 08:26:33 -04:00