linux/net/ipv4
Eric Dumazet 449809a66c tcp/dccp: block BH for SYN processing
SYN processing really was meant to be handled from BH.

When I got rid of BH blocking while processing socket backlog
in commit 5413d1babe ("net: do not block BH while processing socket
backlog"), I forgot that a malicious user could transition to TCP_LISTEN
from a state that allowed (SYN) packets to be parked in the socket
backlog while socket is owned by the thread doing the listen() call.

Sure enough syzkaller found this and reported the bug ;)

=================================
[ INFO: inconsistent lock state ]
4.10.0+ #60 Not tainted
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
syz-executor0/5090 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&hashinfo->ehash_locks[i])->rlock){+.?...}, at:
[<ffffffff83a6a370>] spin_lock include/linux/spinlock.h:299 [inline]
 (&(&hashinfo->ehash_locks[i])->rlock){+.?...}, at:
[<ffffffff83a6a370>] inet_ehash_insert+0x240/0xad0
net/ipv4/inet_hashtables.c:407
{IN-SOFTIRQ-W} state was registered at:
  mark_irqflags kernel/locking/lockdep.c:2923 [inline]
  __lock_acquire+0xbcf/0x3270 kernel/locking/lockdep.c:3295
  lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:299 [inline]
  inet_ehash_insert+0x240/0xad0 net/ipv4/inet_hashtables.c:407
  reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:753 [inline]
  inet_csk_reqsk_queue_hash_add+0x1b7/0x2a0 net/ipv4/inet_connection_sock.c:764
  tcp_conn_request+0x25cc/0x3310 net/ipv4/tcp_input.c:6399
  tcp_v4_conn_request+0x157/0x220 net/ipv4/tcp_ipv4.c:1262
  tcp_rcv_state_process+0x802/0x4130 net/ipv4/tcp_input.c:5889
  tcp_v4_do_rcv+0x56b/0x940 net/ipv4/tcp_ipv4.c:1433
  tcp_v4_rcv+0x2e12/0x3210 net/ipv4/tcp_ipv4.c:1711
  ip_local_deliver_finish+0x4ce/0xc40 net/ipv4/ip_input.c:216
  NF_HOOK include/linux/netfilter.h:257 [inline]
  ip_local_deliver+0x1ce/0x710 net/ipv4/ip_input.c:257
  dst_input include/net/dst.h:492 [inline]
  ip_rcv_finish+0xb1d/0x2110 net/ipv4/ip_input.c:396
  NF_HOOK include/linux/netfilter.h:257 [inline]
  ip_rcv+0xd90/0x19c0 net/ipv4/ip_input.c:487
  __netif_receive_skb_core+0x1ad1/0x3400 net/core/dev.c:4179
  __netif_receive_skb+0x2a/0x170 net/core/dev.c:4217
  netif_receive_skb_internal+0x1d6/0x430 net/core/dev.c:4245
  napi_skb_finish net/core/dev.c:4602 [inline]
  napi_gro_receive+0x4e6/0x680 net/core/dev.c:4636
  e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4033 [inline]
  e1000_clean_rx_irq+0x5e0/0x1490
drivers/net/ethernet/intel/e1000/e1000_main.c:4489
  e1000_clean+0xb9a/0x2910 drivers/net/ethernet/intel/e1000/e1000_main.c:3834
  napi_poll net/core/dev.c:5171 [inline]
  net_rx_action+0xe70/0x1900 net/core/dev.c:5236
  __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
  invoke_softirq kernel/softirq.c:364 [inline]
  irq_exit+0x19e/0x1d0 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:658 [inline]
  do_IRQ+0x81/0x1a0 arch/x86/kernel/irq.c:250
  ret_from_intr+0x0/0x20
  native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:53
  arch_safe_halt arch/x86/include/asm/paravirt.h:98 [inline]
  default_idle+0x8f/0x410 arch/x86/kernel/process.c:271
  arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:262
  default_idle_call+0x36/0x60 kernel/sched/idle.c:96
  cpuidle_idle_call kernel/sched/idle.c:154 [inline]
  do_idle+0x348/0x440 kernel/sched/idle.c:243
  cpu_startup_entry+0x18/0x20 kernel/sched/idle.c:345
  start_secondary+0x344/0x440 arch/x86/kernel/smpboot.c:272
  verify_cpu+0x0/0xfc
irq event stamp: 1741
hardirqs last  enabled at (1741): [<ffffffff84d49d77>]
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160
[inline]
hardirqs last  enabled at (1741): [<ffffffff84d49d77>]
_raw_spin_unlock_irqrestore+0xf7/0x1a0 kernel/locking/spinlock.c:191
hardirqs last disabled at (1740): [<ffffffff84d4a732>]
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (1740): [<ffffffff84d4a732>]
_raw_spin_lock_irqsave+0xa2/0x110 kernel/locking/spinlock.c:159
softirqs last  enabled at (1738): [<ffffffff84d4deff>]
__do_softirq+0x7cf/0xb7d kernel/softirq.c:310
softirqs last disabled at (1571): [<ffffffff84d4b92c>]
do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&hashinfo->ehash_locks[i])->rlock);
  <Interrupt>
    lock(&(&hashinfo->ehash_locks[i])->rlock);

 *** DEADLOCK ***

1 lock held by syz-executor0/5090:
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83406b43>] lock_sock
include/net/sock.h:1460 [inline]
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83406b43>]
sock_setsockopt+0x233/0x1e40 net/core/sock.c:683

stack backtrace:
CPU: 1 PID: 5090 Comm: syz-executor0 Not tainted 4.10.0+ #60
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x292/0x398 lib/dump_stack.c:51
 print_usage_bug+0x3ef/0x450 kernel/locking/lockdep.c:2387
 valid_state kernel/locking/lockdep.c:2400 [inline]
 mark_lock_irq kernel/locking/lockdep.c:2602 [inline]
 mark_lock+0xf30/0x1410 kernel/locking/lockdep.c:3065
 mark_irqflags kernel/locking/lockdep.c:2941 [inline]
 __lock_acquire+0x6dc/0x3270 kernel/locking/lockdep.c:3295
 lock_acquire+0x241/0x580 kernel/locking/lockdep.c:3753
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:299 [inline]
 inet_ehash_insert+0x240/0xad0 net/ipv4/inet_hashtables.c:407
 reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:753 [inline]
 inet_csk_reqsk_queue_hash_add+0x1b7/0x2a0 net/ipv4/inet_connection_sock.c:764
 dccp_v6_conn_request+0xada/0x11b0 net/dccp/ipv6.c:380
 dccp_rcv_state_process+0x51e/0x1660 net/dccp/input.c:606
 dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
 sk_backlog_rcv include/net/sock.h:896 [inline]
 __release_sock+0x127/0x3a0 net/core/sock.c:2052
 release_sock+0xa5/0x2b0 net/core/sock.c:2539
 sock_setsockopt+0x60f/0x1e40 net/core/sock.c:1016
 SYSC_setsockopt net/socket.c:1782 [inline]
 SyS_setsockopt+0x2fb/0x3a0 net/socket.c:1765
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458b9
RSP: 002b:00007fe8b26c2b58 EFLAGS: 00000292 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00000000004458b9
RDX: 000000000000001a RSI: 0000000000000001 RDI: 0000000000000006
RBP: 00000000006e2110 R08: 0000000000000010 R09: 0000000000000000
R10: 00000000208c3000 R11: 0000000000000292 R12: 0000000000708000
R13: 0000000020000000 R14: 0000000000001000 R15: 0000000000000000

Fixes: 5413d1babe ("net: do not block BH while processing socket backlog")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-01 15:03:31 -08:00
..
netfilter lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
af_inet.c net: Add a skb_gro_flush_final helper. 2017-02-15 09:39:39 +01:00
ah4.c IPsec: do not ignore crypto err in ah4 input 2017-01-16 12:57:48 +01:00
arp.c NET: Fix /proc/net/arp for AX.25 2017-02-13 22:15:03 -05:00
cipso_ipv4.c netlabel: out of bound access in cipso_v4_validate() 2017-02-04 19:44:22 -05:00
datagram.c
devinet.c net: ipv4: remove fib_lookup.h from devinet.c include list 2017-02-02 23:09:08 -05:00
esp4_offload.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
esp4.c esp: Introduce a helper to setup the trailer 2017-01-17 10:23:08 +01:00
fib_frontend.c net: route: add missing nla_policy entry for RTA_MARK attribute 2017-03-01 10:25:56 -08:00
fib_lookup.h
fib_rules.c switchdev: remove FIB offload infrastructure 2016-09-28 04:48:00 -04:00
fib_semantics.c ipv4: fib: Notify about nexthop status changes 2017-02-08 15:25:18 -05:00
fib_trie.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
fou.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-30 12:42:58 -04:00
gre_demux.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
gre_offload.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
icmp.c net: for rate-limited ICMP replies save one atomic operation 2017-01-09 15:49:12 -05:00
igmp.c igmp, mld: Fix memory leak in igmpv3/mld_del_delrec() 2017-02-09 16:43:45 -05:00
inet_connection_sock.c inet: don't use sk_v6_rcv_saddr directly 2017-01-20 14:35:51 -05:00
inet_diag.c tcp: remove early retransmit 2017-01-13 22:37:16 -05:00
inet_fragment.c net: disable fragment reassembly if high_thresh is zero 2016-06-05 22:56:42 -04:00
inet_hashtables.c inet: kill smallest_size and smallest_port 2017-01-18 13:04:29 -05:00
inet_timewait_sock.c ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob 2016-12-29 11:38:31 -05:00
inetpeer.c
ip_forward.c ipv4: allow local fragmentation in ip_finish_output_gso() 2016-11-03 16:10:26 -04:00
ip_fragment.c net: rename IP_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
ip_gre.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ip_input.c net: VRF: Pass original iif to ip_route_input() 2016-09-16 04:24:07 -04:00
ip_options.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ip_output.c net: rename dst_neigh_output back to neigh_output 2017-02-11 21:25:18 -05:00
ip_sockglue.c ip: fix IP_CHECKSUM handling 2017-02-21 12:23:53 -05:00
ip_tunnel_core.c lwtunnel: remove device arg to lwtunnel_build_state 2017-01-30 15:14:22 -05:00
ip_tunnel.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
ip_vti.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
ipcomp.c
ipconfig.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ipip.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
ipmr.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
Kconfig Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
Makefile esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
netfilter.c netfilter: Update ip_route_me_harder to consider L3 domain 2016-11-24 12:44:36 +01:00
ping.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
proc.c net: add LINUX_MIB_PFMEMALLOCDROP counter 2017-02-02 23:34:19 -05:00
protocol.c
raw_diag.c net: ip, raw_diag -- Use jump for exiting from nested loop 2016-11-03 15:25:26 -04:00
raw.c net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP 2017-02-07 13:07:47 -05:00
route.c ipv4: mask tos for input route 2017-02-26 11:03:38 -05:00
syncookies.c syncookies: use SipHash in place of SHA1 2017-01-09 13:58:57 -05:00
sysctl_net_ipv4.c net: Avoid receiving packets with an l3mdev on unbound UDP sockets 2017-01-30 15:00:58 -05:00
tcp_bbr.c tcp_bbr: add a state transition diagram and accompanying comment 2016-10-29 17:12:43 -04:00
tcp_bic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_cdg.c tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict 2016-09-21 00:22:59 -04:00
tcp_cong.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-11-22 13:27:16 -05:00
tcp_cubic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_dctcp.c Revert "dctcp: update cwnd on congestion event" 2016-12-06 11:34:24 -05:00
tcp_diag.c net: diag: Fix refcnt leak in error path destroying socket 2016-08-23 23:11:36 -07:00
tcp_fastopen.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-28 10:33:06 -05:00
tcp_highspeed.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_htcp.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_hybla.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_illinois.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_input.c tcp/dccp: block BH for SYN processing 2017-03-01 15:03:31 -08:00
tcp_ipv4.c tcp: setup timestamp offset when write_seq already set 2017-02-22 16:35:32 -05:00
tcp_lp.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_metrics.c tcp: replace dst_confirm with sk_dst_confirm 2017-02-07 13:07:46 -05:00
tcp_minisocks.c tcp: account for ts offset only if tsecr not zero 2017-02-22 16:35:58 -05:00
tcp_nv.c tcp: add NV congestion control 2016-06-10 23:07:49 -07:00
tcp_offload.c gso: Support partial splitting at the frag_list pointer 2016-09-19 20:59:34 -04:00
tcp_output.c tcp: accommodate sequence number to a peer's shrunk receive window caused by precision loss in window scaling 2017-02-17 15:30:33 -05:00
tcp_probe.c tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" 2017-02-21 13:26:03 -05:00
tcp_rate.c tcp: export data delivery rate 2016-09-21 00:23:00 -04:00
tcp_recovery.c tcp: enable RACK loss detection to trigger recovery 2017-01-13 22:37:16 -05:00
tcp_scalable.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_timer.c tcp: remove early retransmit 2017-01-13 22:37:16 -05:00
tcp_vegas.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_vegas.h tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_veno.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp_westwood.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_yeah.c tcp: add cwnd_undo functions to various tcp cc algorithms 2016-11-21 13:20:17 -05:00
tcp.c tcp: use page_ref_inc() in tcp_sendmsg() 2017-02-17 15:31:06 -05:00
tunnel4.c tunnels: correct conditional build of MPLS and IPv6 2016-07-11 13:27:06 -07:00
udp_diag.c net: inet: diag: expose the socket mark to privileged processes. 2016-09-08 16:13:09 -07:00
udp_impl.h udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
udp_offload.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
udp_tunnel.c net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-07 16:29:30 -05:00
udplite.c udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
xfrm4_input.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c esp: Add a software GRO codepath 2017-02-15 11:04:11 +01:00
xfrm4_mode_tunnel.c
xfrm4_output.c
xfrm4_policy.c xfrm: policy: make policy backend const 2017-02-09 10:22:19 +01:00
xfrm4_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm4_state.c xfrm: remove unused function 2017-01-10 10:57:12 +01:00
xfrm4_tunnel.c