linux/net/ipv4
Konstantin Khlebnikov b56141ab34 net: frag, fix race conditions in LRU list maintenance
This patch fixes race between inet_frag_lru_move() and inet_frag_lru_add()
which was introduced in commit 3ef0eb0db4
("net: frag, move LRU list maintenance outside of rwlock")

One cpu already added new fragment queue into hash but not into LRU.
Other cpu found it in hash and tries to move it to the end of LRU.
This leads to NULL pointer dereference inside of list_move_tail().

Another possible race condition is between inet_frag_lru_move() and
inet_frag_lru_del(): move can happens after deletion.

This patch initializes LRU list head before adding fragment into hash and
inet_frag_lru_move() doesn't touches it if it's empty.

I saw this kernel oops two times in a couple of days.

[119482.128853] BUG: unable to handle kernel NULL pointer dereference at           (null)
[119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
[119482.140221] Oops: 0000 [#1] SMP
[119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
[119482.152692] CPU 3
[119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
[119482.161478] RIP: 0010:[<ffffffff812ede89>]  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.166004] RSP: 0018:ffff880216d5db58  EFLAGS: 00010207
[119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
[119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
[119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
[119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
[119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
[119482.194140] FS:  00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
[119482.198928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
[119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
[119482.223113] Stack:
[119482.228004]  ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
[119482.233038]  ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
[119482.238083]  00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
[119482.243090] Call Trace:
[119482.248009]  [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
[119482.252921]  [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
[119482.257803]  [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
[119482.262658]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.267527]  [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
[119482.272412]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.277302]  [<ffffffff8155d068>] ip_rcv+0x268/0x320
[119482.282147]  [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
[119482.286998]  [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
[119482.291826]  [<ffffffff8151a650>] process_backlog+0xa0/0x160
[119482.296648]  [<ffffffff81519f29>] net_rx_action+0x139/0x220
[119482.301403]  [<ffffffff81053707>] __do_softirq+0xe7/0x220
[119482.306103]  [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
[119482.310809]  [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
[119482.315515]  [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
[119482.320219]  [<ffffffff8106d870>] kthread+0xc0/0xd0
[119482.324858]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.329460]  [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
[119482.334057]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[119482.343787] RIP  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.348675]  RSP <ffff880216d5db58>
[119482.353493] CR2: 0000000000000000

Oops happened on this path:
ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-06 11:06:51 -04:00
..
netfilter Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
af_inet.c gre: Fix GREv4 TCPv6 segmentation. 2013-05-03 16:08:58 -04:00
ah4.c net: Add skb_unclone() helper function. 2013-02-15 15:10:37 -05:00
arp.c firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection. 2013-03-26 12:32:13 -04:00
cipso_ipv4.c cipso: don't follow a NULL pointer when setsockopt() is called 2012-07-18 09:01:12 -07:00
datagram.c ipv4: Add a socket release callback for datagram sockets 2013-01-21 14:17:05 -05:00
devinet.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-04-22 20:32:51 -04:00
esp4.c esp4: fix error return code in esp_output() 2013-04-15 14:05:34 -04:00
fib_frontend.c net-next: replace obsolete NLMSG_* with type safe nlmsg_* 2013-03-28 14:25:25 -04:00
fib_lookup.h ipv4: Fix nexthop caching wrt. scoping. 2011-03-24 18:06:47 -07:00
fib_rules.c sections: fix section conflicts in net 2012-10-06 03:04:45 +09:00
fib_semantics.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
fib_trie.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
gre.c gre: Fix GREv4 TCPv6 segmentation. 2013-05-03 16:08:58 -04:00
icmp.c net: Add MIB counters for checksum errors 2013-04-29 15:14:03 -04:00
igmp.c net: proc: change proc_net_remove to remove_proc_entry 2013-02-18 14:53:08 -05:00
inet_connection_sock.c tcp: Remove TCPCT 2013-03-17 14:35:13 -04:00
inet_diag.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
inet_fragment.c net: frag, fix race conditions in LRU list maintenance 2013-05-06 11:06:51 -04:00
inet_hashtables.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
inet_lro.c ipv4: replace ip_fast_csum with csum_replace2 2013-03-15 09:12:25 -04:00
inet_timewait_sock.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
inetpeer.c Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2012-10-02 09:54:49 -07:00
ip_forward.c ipv4: introduce rt_uses_gateway 2012-10-08 17:42:36 -04:00
ip_fragment.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-04-22 20:32:51 -04:00
ip_gre.c net: ipv4: typo issue, remove erroneous semicolon 2013-04-25 04:07:45 -04:00
ip_input.c net: Add MIB counters for checksum errors 2013-04-29 15:14:03 -04:00
ip_options.c net/ipv4: Ensure that location of timestamp option is stored 2013-03-12 05:35:39 -04:00
ip_output.c netfilter: use IS_ENABLE to replace if defined in TRACE target 2013-04-02 01:03:45 +02:00
ip_sockglue.c net: prevent setting ttl=0 via IP_TTL 2013-01-08 17:57:10 -08:00
ip_tunnel.c ip_tunnel: Fix off-by-one error in forming dev name. 2013-03-29 15:24:28 -04:00
ip_vti.c Tunneling: use IP Tunnel stats APIs. 2013-03-26 12:27:19 -04:00
ipcomp.c xfrm: allow to avoid copying DSCP during encapsulation 2013-03-06 07:02:45 +01:00
ipconfig.c ipconfig: add informative timeout messages while waiting for carrier 2013-04-02 14:35:33 -04:00
ipip.c IPIP: Use ip-tunneling code. 2013-03-26 12:27:18 -04:00
ipmr.c net-next: replace obsolete NLMSG_* with type safe nlmsg_* 2013-03-28 14:25:25 -04:00
Kconfig Tunneling: use IP Tunnel stats APIs. 2013-03-26 12:27:19 -04:00
Makefile GRE: Refactor GRE tunneling code. 2013-03-26 12:27:18 -04:00
netfilter.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
ping.c net: sock: make sock_tx_timestamp void 2013-04-14 15:41:49 -04:00
proc.c net: Add MIB counters for checksum errors 2013-04-29 15:14:03 -04:00
protocol.c ipv4: Disallow non-namespace aware protocols to register. 2013-02-05 14:42:23 -05:00
raw.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
route.c rtnetlink: Remove passing of attributes into rtnl_doit functions 2013-03-22 10:31:16 -04:00
syncookies.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-04-22 20:32:51 -04:00
sysctl_net_ipv4.c tcp: refactor F-RTO 2013-03-21 11:47:50 -04:00
tcp_bic.c tcp: fix undo after RTO for BIC 2012-01-20 14:17:26 -05:00
tcp_cong.c tcp: remove Appropriate Byte Count support 2013-02-05 14:51:16 -05:00
tcp_cubic.c tcp: fix undo after RTO for CUBIC 2012-01-20 14:17:26 -05:00
tcp_diag.c inet_diag: Rename inet_diag_req into inet_diag_req_v2 2012-01-11 12:56:06 -08:00
tcp_fastopen.c tcp: TCP Fast Open Server - header & support functions 2012-08-31 20:02:18 -04:00
tcp_highspeed.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_htcp.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_hybla.c tcp: bool conversions 2012-05-17 14:59:59 -04:00
tcp_illinois.c net: fix divide by zero in tcp algorithm illinois 2012-11-01 11:55:59 -04:00
tcp_input.c net: Add MIB counters for checksum errors 2013-04-29 15:14:03 -04:00
tcp_ipv4.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
tcp_lp.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp_memcontrol.c net: tcp_memcontrol: minor: remove unused variable 2013-04-14 15:41:49 -04:00
tcp_metrics.c tcp: do not expire TCP fastopen cookies 2013-05-05 16:58:02 -04:00
tcp_minisocks.c tcp: reset timer after any SYNACK retransmit 2013-04-29 15:14:03 -04:00
tcp_output.c net: Add MIB counters for checksum errors 2013-04-29 15:14:03 -04:00
tcp_probe.c net: proc: change proc_net_remove to remove_proc_entry 2013-02-18 14:53:08 -05:00
tcp_scalable.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_timer.c tcp: refactor F-RTO 2013-03-21 11:47:50 -04:00
tcp_vegas.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_vegas.h
tcp_veno.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_westwood.c tcp: refactor F-RTO 2013-03-21 11:47:50 -04:00
tcp_yeah.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp.c tcp: tcp_tso_segment() small optimization 2013-04-13 16:54:14 -04:00
tunnel4.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
udp_diag.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
udp_impl.h ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
udp.c vxlan: Fix TCPv6 segmentation. 2013-05-03 16:08:59 -04:00
udplite.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
xfrm4_input.c net: Add skb_unclone() helper function. 2013-02-15 15:10:37 -05:00
xfrm4_mode_beet.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c xfrm: allow to avoid copying DSCP during encapsulation 2013-03-06 07:02:45 +01:00
xfrm4_output.c xfrm4: Don't call icmp_send on local error 2011-07-01 17:33:19 -07:00
xfrm4_policy.c xfrm: make gc_thresh configurable in all namespaces 2013-02-06 11:36:29 +01:00
xfrm4_state.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
xfrm4_tunnel.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00