linux/include/net
Konstantin Khlebnikov b56141ab34 net: frag, fix race conditions in LRU list maintenance
This patch fixes a race between inet_frag_lru_move() and inet_frag_lru_add()
which was introduced in commit 3ef0eb0db4
("net: frag, move LRU list maintenance outside of rwlock")

One CPU has already added a new fragment queue to the hash but not yet to the LRU list.
Another CPU finds it in the hash and tries to move it to the end of the LRU list.
This leads to a NULL pointer dereference inside list_move_tail().

Another possible race condition is between inet_frag_lru_move() and
inet_frag_lru_del(): the move can happen after the deletion.

This patch initializes the LRU list head before adding the fragment queue to
the hash, and inet_frag_lru_move() no longer touches it if it is empty.
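
The idea can be illustrated with a small userspace sketch. This is not the kernel code: the list helpers, struct frag_queue, and the lru_*() functions below are hypothetical stand-ins, and the locking that the kernel does around the LRU list is left out. The sketch only shows why a self-linked ("empty") list node makes a move that arrives before the add, or after the delete, a harmless no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on the kernel's
 * struct list_head. All names here are illustrative assumptions. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *n) { n->next = n; n->prev = n; }
static int list_is_empty(const struct list_node *n) { return n->next == n; }

/* Unlink n from whatever list it is on (safe on a self-linked node). */
static void list_unlink(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static void list_add_to_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical stand-in for a fragment queue. */
struct frag_queue {
    struct list_node lru;
    int id;
};

/* Step 1 of the fix: make the LRU node self-linked BEFORE the queue
 * becomes visible to other CPUs (i.e. before it goes into the hash). */
static void frag_queue_prepare(struct frag_queue *q, int id)
{
    q->id = id;
    list_init(&q->lru);
}

static void lru_add(struct frag_queue *q, struct list_node *lru_head)
{
    list_add_to_tail(&q->lru, lru_head);
}

/* Step 2 of the fix: do nothing if the queue is not on the LRU list. */
static void lru_move(struct frag_queue *q, struct list_node *lru_head)
{
    if (list_is_empty(&q->lru))
        return;
    list_unlink(&q->lru);
    list_add_to_tail(&q->lru, lru_head);
}

/* Re-initialize after unlinking so a late lru_move() sees an empty node. */
static void lru_del(struct frag_queue *q)
{
    list_unlink(&q->lru);
    list_init(&q->lru);
}

/* Returns 0 if all three scenarios behave as described. */
static int lru_demo(void)
{
    struct list_node lru;
    struct frag_queue a, b;

    list_init(&lru);
    frag_queue_prepare(&a, 1);
    frag_queue_prepare(&b, 2);

    lru_move(&a, &lru);            /* move before add: no-op */
    if (!list_is_empty(&lru))
        return 1;

    lru_add(&a, &lru);
    lru_add(&b, &lru);
    lru_move(&a, &lru);            /* a becomes most recent: order is b, a */
    if (lru.next != &b.lru || lru.prev != &a.lru)
        return 2;

    lru_del(&a);
    lru_move(&a, &lru);            /* move after delete: no-op */
    if (lru.next != &b.lru || lru.prev != &b.lru)
        return 3;

    return 0;
}
```

Calling lru_demo() from a main() exercises the three orderings discussed above. With a zero-filled node instead of a self-linked one, the first lru_move() would follow NULL next/prev pointers, which is the situation the oops below reports.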

I saw this kernel oops two times in a couple of days.

[119482.128853] BUG: unable to handle kernel NULL pointer dereference at           (null)
[119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
[119482.140221] Oops: 0000 [#1] SMP
[119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
[119482.152692] CPU 3
[119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
[119482.161478] RIP: 0010:[<ffffffff812ede89>]  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.166004] RSP: 0018:ffff880216d5db58  EFLAGS: 00010207
[119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
[119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
[119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
[119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
[119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
[119482.194140] FS:  00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
[119482.198928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
[119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
[119482.223113] Stack:
[119482.228004]  ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
[119482.233038]  ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
[119482.238083]  00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
[119482.243090] Call Trace:
[119482.248009]  [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
[119482.252921]  [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
[119482.257803]  [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
[119482.262658]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.267527]  [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
[119482.272412]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.277302]  [<ffffffff8155d068>] ip_rcv+0x268/0x320
[119482.282147]  [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
[119482.286998]  [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
[119482.291826]  [<ffffffff8151a650>] process_backlog+0xa0/0x160
[119482.296648]  [<ffffffff81519f29>] net_rx_action+0x139/0x220
[119482.301403]  [<ffffffff81053707>] __do_softirq+0xe7/0x220
[119482.306103]  [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
[119482.310809]  [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
[119482.315515]  [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
[119482.320219]  [<ffffffff8106d870>] kthread+0xc0/0xd0
[119482.324858]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.329460]  [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
[119482.334057]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[119482.343787] RIP  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.348675]  RSP <ffff880216d5db58>
[119482.353493] CR2: 0000000000000000

The oops happened on this path:
ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-06 11:06:51 -04:00
9p 9p: turn fid->dlist into hlist 2013-02-27 22:51:08 -05:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
caif caif: Remove my bouncing email address. 2013-04-23 13:25:51 -04:00
irda irda: small read past the end of array in debug code 2013-04-19 17:32:31 -04:00
iucv af_iucv: fix recvmsg by replacing skb_pull() function 2013-04-08 17:16:57 -04:00
netfilter netfilter: move skb_gso_segment into nfnetlink_queue module 2013-04-29 20:09:05 +02:00
netns netfilter: nf_log: prepare net namespace support for loggers 2013-04-05 20:12:54 +02:00
nfc NFC: RFKILL support 2013-04-12 16:54:45 +02:00
phonet net: remove my future former mail address 2012-06-17 16:29:38 -07:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-05-01 14:08:52 -07:00
tc_act
act_api.h act_police: move struct tcf_police to act_police.c 2013-02-12 18:59:45 -05:00
addrconf.h ipv6: statically link register_inet6addr_notifier() 2013-04-14 15:24:17 -04:00
af_ieee802154.h
af_rxrpc.h
af_unix.h af_unix: fix a fatal race with bit fields 2013-05-01 15:13:49 -04:00
ah.h
arp.h net: Dont use ifindices in hash fns 2012-08-09 16:18:06 -07:00
atmclip.h atm: clip: Use device neigh support on top of "arp_tbl". 2011-11-30 18:51:03 -05:00
ax25.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ax88796.h
cfg80211-wext.h
cfg80211.h cfg80211: introduce critical protocol indication from user-space 2013-04-22 15:48:00 +02:00
checksum.h net: core: add function for incremental IPv6 pseudo header checksum updates 2012-08-30 03:00:16 +02:00
cipso_ipv4.h cipso: handle CIPSO options correctly when NetLabel is disabled 2012-06-01 14:18:29 -04:00
cls_cgroup.h cls_cgroup: remove task_struct parameter from sock_update_classid() 2013-04-09 13:19:35 -04:00
codel.h codel: refine one condition to avoid a nul rec_inv_sqrt 2012-08-10 16:52:54 -07:00
compat.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
datalink.h
dcbevent.h
dcbnl.h net/dcb: Add an optional max rate attribute 2012-04-05 05:08:04 -04:00
dn_dev.h
dn_fib.h decnet: Parse netlink attributes on our own 2013-03-22 10:31:16 -04:00
dn_neigh.h
dn_nsp.h
dn_route.h decnet: use correct RCU API to deref sk_dst_cache field 2013-01-28 00:15:27 -05:00
dn.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
dsa.h dsa: Include linux/if_ether.h to fix build error 2011-12-01 11:41:06 -05:00
dsfield.h ipv6: Optimize ipv6_change_dsfield(). 2013-01-09 23:59:53 -08:00
dst_ops.h net: Fix warnings in dst_ops.h 2012-07-19 10:43:03 -07:00
dst.h Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug 2013-03-15 09:06:58 -04:00
esp.h
ethoc.h
fib_rules.h ipv4: Elide fib_validate_source() completely when possible. 2012-06-29 01:36:36 -07:00
firewire.h firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection. 2013-03-26 12:32:13 -04:00
flow_keys.h flow_keys: include thoff into flow_keys for later usage 2013-03-20 12:14:36 -04:00
flow.h ipv4: Add FLOWI_FLAG_KNOWN_NH 2012-10-08 17:42:36 -04:00
garp.h
gen_stats.h
genetlink.h genl: Allow concurrent genl callbacks. 2013-04-25 01:43:15 -04:00
gre.h GRE: Refactor GRE tunneling code. 2013-03-26 12:27:18 -04:00
gro_cells.h gro: Fix kcalloc argument order 2013-01-27 22:46:33 -05:00
icmp.h ipv4: fix error handling in icmp_protocol. 2013-02-22 15:10:18 -05:00
ieee80211_radiotap.h mac80211: support (partial) VHT radiotap information 2012-11-27 11:56:18 +01:00
ieee802154_netdev.h ieee802154/nl-mac.c: make some MLME operations optional 2013-04-08 12:00:16 -04:00
ieee802154.h 6LoWPAN: add fragmentation support 2011-11-14 00:19:42 -05:00
if_inet6.h net: ipv6: only invalidate previously tokenized addresses 2013-04-09 13:12:23 -04:00
inet6_connection_sock.h ipv6: Add helper inet6_csk_update_pmtu(). 2012-07-16 03:44:56 -07:00
inet6_hashtables.h ipv6: use a stronger hash for tcp 2013-02-21 18:15:58 -05:00
inet_common.h net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) 2012-07-19 11:02:03 -07:00
inet_connection_sock.h tcp: Tail loss probe (TLP) 2013-03-12 08:30:34 -04:00
inet_ecn.h tunnel: drop packet if ECN present with not-ECT 2012-09-27 18:12:37 -04:00
inet_frag.h net: frag, fix race conditions in LRU list maintenance 2013-05-06 11:06:51 -04:00
inet_hashtables.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
inet_sock.h ipv6: use a stronger hash for tcp 2013-02-21 18:15:58 -05:00
inet_timewait_sock.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
inetpeer.h ipv4: Maintain redirect and PMTU info in struct rtable again. 2012-07-10 22:40:14 -07:00
ip6_checksum.h ipv6: move csum_ipv6_magic() and udp6_csum_init() into static library 2013-01-08 17:56:10 -08:00
ip6_fib.h ipv6: fix race condition regarding dst->expires and dst->from. 2013-02-20 15:11:45 -05:00
ip6_route.h ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers. 2013-01-18 14:41:13 -05:00
ip6_tunnel.h GRE: Refactor GRE tunneling code. 2013-03-26 12:27:18 -04:00
ip_fib.h ipv4: fix definition of FIB_TABLE_HASHSZ 2013-03-13 10:47:09 -04:00
ip_tunnels.h GRE: Refactor GRE tunneling code. 2013-03-26 12:27:18 -04:00
ip_vs.h ipvs: fix sparse warnings for some parameters 2013-04-23 11:43:05 +09:00
ip.h ipv4: Add a socket release callback for datagram sockets 2013-01-21 14:17:05 -05:00
ipcomp.h
ipconfig.h
ipv6.h ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling 2013-03-24 17:16:30 -04:00
ipx.h
iw_handler.h
lapb.h lapb: Neaten debugging 2012-05-17 18:45:20 -04:00
lib80211.h hostap: Don't use create_proc_read_entry() 2013-04-29 15:41:56 -04:00
llc_c_ac.h
llc_c_ev.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h llc: Remove stray reference to sysctl_llc_station_ack_timeout. 2012-09-17 13:13:24 -04:00
mac80211.h mac80211: improve the rate control API 2013-04-22 16:16:41 +02:00
mac802154.h mac802154: add wpan device-class support 2012-06-26 21:06:11 -07:00
mip6.h
mld.h
mrp.h net/802: Implement Multiple Registration Protocol (MRP) 2013-02-10 20:37:22 -05:00
ndisc.h ndisc: Move ndisc_opt_addr_space() to include/net/ndisc.h. 2013-01-21 13:33:14 -05:00
neighbour.h net neighbour, decnet: Ensure to align device private data on preferred alignment. 2013-02-11 00:21:44 -05:00
net_namespace.h netfilter: make /proc/net/netfilter pernet 2013-04-05 19:35:02 +02:00
net_ratelimit.h
netdma.h
netevent.h ipv6 netevent: Remove old_neigh from netevent_redirect. 2013-01-14 15:04:59 -05:00
netlabel.h userns: Convert the audit loginuid to be a kuid 2012-09-17 18:08:54 -07:00
netlink.h netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
netprio_cgroup.h netprio_cgroup: remove task_struct parameter from sock_update_netprio() 2013-04-09 13:19:37 -04:00
netrom.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
nexthop.h
nl802154.h
p8022.h
ping.h
pkt_cls.h pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
pkt_sched.h sch_api: introduce qdisc_watchdog_schedule_ns() 2013-02-12 18:59:45 -05:00
protocol.h net: Remove code duplication between offload structures 2012-11-15 17:39:51 -05:00
psnap.h
raw.h
rawv6.h ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
red.h net_sched: red: Make minor corrections to comments 2012-04-16 23:53:11 -04:00
regulatory.h regulatory: use RCU to protect last_request 2013-01-03 13:01:30 +01:00
request_sock.h net: remove a stale comment for dl_next 2013-04-22 15:55:48 -04:00
rose.h
route.h ipv4: avoid a test in ip_rt_put() 2012-11-03 14:59:04 -04:00
rtnetlink.h rtnetlink: Remove passing of attributes into rtnl_doit functions 2013-03-22 10:31:16 -04:00
sch_generic.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
scm.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-04-22 20:32:51 -04:00
secure_seq.h net: defer net_secret[] initialization 2013-04-29 15:14:02 -04:00
slhc_vj.h
snmp.h net: avoid reloads in SNMP_UPD_PO_STATS 2012-08-06 13:40:47 -07:00
sock.h net: sock: make sock_tx_timestamp void 2013-04-14 15:41:49 -04:00
stp.h
tcp_memcontrol.h cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg 2012-04-10 10:04:07 -07:00
tcp_states.h
tcp.h tcp: GSO should be TSQ friendly 2013-04-12 18:17:06 -04:00
timewait_sock.h [PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary. 2012-06-09 14:56:12 -07:00
transp_v6.h ipv6: rename datagram_send_ctl and datagram_recv_ctl 2013-01-31 13:53:08 -05:00
udp.h net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6 2012-04-28 22:21:51 -04:00
udplite.h net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
wext.h
wimax.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
wpan-phy.h mac802154: monitor device support 2012-05-16 15:17:08 -04:00
x25.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
x25device.h
xfrm.h xfrm: allow to avoid copying DSCP during encapsulation 2013-03-06 07:02:45 +01:00