linux/net
Christoph Paasch 5924f17a8a tcp: Fix divide by zero when pushing during tcp-repair
When in repair-mode and TCP_RECV_QUEUE is set, we end up calling
tcp_push with mss_now being 0. If data is in the send-queue and
tcp_set_skb_tso_segs gets called, we crash because it will divide by
mss_now:

[  347.151939] divide error: 0000 [#1] SMP
[  347.152907] Modules linked in:
[  347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4
[  347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000
[  347.152907] EIP: 0060:[<c1601359>] EFLAGS: 00210246 CPU: 1
[  347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0
[  347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000
[  347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00
[  347.152907]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0
[  347.152907] Stack:
[  347.152907]  c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080
[  347.152907]  f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85
[  347.152907]  c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8
[  347.152907] Call Trace:
[  347.152907]  [<c167f9d9>] ? apic_timer_interrupt+0x2d/0x34
[  347.152907]  [<c16013e6>] tcp_init_tso_segs+0x36/0x50
[  347.152907]  [<c1603b5a>] tcp_write_xmit+0x7a/0xbf0
[  347.152907]  [<c10a0188>] ? up+0x28/0x40
[  347.152907]  [<c10acc85>] ? console_unlock+0x295/0x480
[  347.152907]  [<c10ad24f>] ? vprintk_emit+0x1ef/0x4b0
[  347.152907]  [<c1605716>] __tcp_push_pending_frames+0x36/0xd0
[  347.152907]  [<c15f4860>] tcp_push+0xf0/0x120
[  347.152907]  [<c15f7641>] tcp_sendmsg+0xf1/0xbf0
[  347.152907]  [<c116d920>] ? kmem_cache_free+0xf0/0x120
[  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
[  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
[  347.152907]  [<c114f0f0>] ? do_wp_page+0x3e0/0x850
[  347.152907]  [<c161c36a>] inet_sendmsg+0x4a/0xb0
[  347.152907]  [<c1150269>] ? handle_mm_fault+0x709/0xfb0
[  347.152907]  [<c15a006b>] sock_aio_write+0xbb/0xd0
[  347.152907]  [<c1180b79>] do_sync_write+0x69/0xa0
[  347.152907]  [<c1181023>] vfs_write+0x123/0x160
[  347.152907]  [<c1181d55>] SyS_write+0x55/0xb0
[  347.152907]  [<c167f0d8>] sysenter_do_call+0x12/0x28

This can easily be reproduced with the following packetdrill-script (the
"magic" with netem, sk_pacing and limit_output_bytes is done to prevent
the kernel from pushing all segments, because hitting the limit without
doing this is not so easy with packetdrill):

0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0  bind(3, ..., ...) = 0
+0  listen(3, 1) = 0

+0  < S 0:0(0) win 32792 <mss 1460>
+0  > S. 0:0(0) ack 1 <mss 1460>
+0.1  < . 1:1(0) ack 1 win 65000

+0  accept(3, ..., ...) = 4

// This forces that not all segments of the snd-queue will be pushed
+0 `tc qdisc add dev tun0 root netem delay 10ms`
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2`
+0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0

+0 write(4,...,10000) = 10000
+0 write(4,...,10000) = 10000

// Set tcp-repair stuff, particularly TCP_RECV_QUEUE
+0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0
+0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0

// This now will make the write push the remaining segments
+0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000`

// Now we will crash
+0 write(4,...,1000) = 1000

This happens since ec34232575 (tcp: fix retransmission in repair
mode). Prior to that, the call to tcp_push was prevented by a check for
tp->repair.

The patch fixes it, by adding the new goto-label out_nopush. When exiting
tcp_sendmsg and a push is not required, which is the case for tp->repair,
we go to this label.

When repairing and calling send() with TCP_RECV_QUEUE, the data is
actually put in the receive-queue. So, no push is required because no
data has been added to the send-queue.

Cc: Andrew Vagin <avagin@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Fixes: ec34232575 (tcp: fix retransmission in repair mode)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 18:21:03 -07:00
..
9p A bunch of updates and cleanup within the transport layer, 2014-04-11 14:14:57 -07:00
802
8021q vlan: free percpu stats in device destructor 2014-07-02 17:02:05 -07:00
appletalk net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
ax25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
batman-adv net: add __pskb_copy_fclone and pskb_copy_for_clone 2014-06-11 15:38:02 -07:00
bluetooth Bluetooth: Fix for ACL disconnect when pairing fails 2014-06-20 13:53:54 +02:00
bridge bridge: fix compile error when compiling without IPv6 support 2014-06-12 11:00:24 -07:00
caif net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
can can: add hash based access to single EFF frame filters 2014-05-19 09:38:24 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-06-12 23:06:23 -07:00
core net: fix setting csum_start in skb_segment() 2014-06-25 20:45:54 -07:00
dcb net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-04-24 13:44:54 -04:00
dccp net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
decnet net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
dns_resolver Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-11 16:02:55 -07:00
dsa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-05-24 00:32:30 -04:00
ethernet
hsr hsr: replace del_timer by del_timer_sync 2014-03-27 15:28:06 -04:00
ieee802154 6lowpan_rtnl: fix off by one while fragmentation 2014-06-02 10:39:42 -07:00
ipv4 tcp: Fix divide by zero when pushing during tcp-repair 2014-07-02 18:21:03 -07:00
ipv6 ipv6: Fix MLD Query message check 2014-06-27 00:21:50 -07:00
ipx net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
irda
iucv af_iucv: correct cleanup if listen backlog is full 2014-05-30 17:35:23 -07:00
key af_key: Replace comma with semicolon 2014-05-30 17:48:58 -07:00
l2tp l2tp: call udp{6}_set_csum 2014-06-04 22:46:38 -07:00
lapb
llc
mac80211 mac80211: WEP extra head/tail room in ieee80211_send_auth 2014-06-23 10:45:14 +02:00
mac802154 mac802154: don't deliver packets to devices that are down 2014-06-11 12:10:19 -07:00
mpls gre: Call gso_make_checksum 2014-06-04 22:46:38 -07:00
netfilter netfilter: nf_nat: fix oops on netns removal 2014-06-16 13:58:54 +02:00
netlabel
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-03 23:32:12 -07:00
netrom net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
nfc net: add __pskb_copy_fclone and pskb_copy_for_clone 2014-06-11 15:38:02 -07:00
openvswitch vxlan: Add support for UDP checksums (v4 sending, v6 zero csums) 2014-06-04 22:46:39 -07:00
packet net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-04-24 13:44:54 -04:00
phonet net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-04-24 13:44:54 -04:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-05-22 13:58:36 -04:00
rose net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
rxrpc af_rxrpc: Fix XDR length check in rxrpc key demarshalling. 2014-05-16 15:24:47 -04:00
sched net_sched: drr: warn when qdisc is not work conserving 2014-06-11 15:50:59 -07:00
sctp net: sctp: check proc_dointvec result in proc_sctp_do_auth 2014-06-19 21:30:19 -07:00
sunrpc NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support 2014-06-24 18:46:58 -04:00
tipc net: add __pskb_copy_fclone and pskb_copy_for_clone 2014-06-11 15:38:02 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
vmw_vsock vsock: Make transport the proto owner 2014-05-05 13:13:50 -04:00
wimax
wireless nl80211: move set_qos_map command into split state 2014-06-24 16:13:10 +02:00
x25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-03 23:32:12 -07:00
compat.c
Kconfig Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-04-03 13:05:42 -07:00
Makefile
nonet.c
socket.c net: use SYSCALL_DEFINEx for sys_recv 2014-04-16 15:15:05 -04:00
sysctl_net.c