linux/net/ipv4
Taehee Yoo 5a86d68bcf netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
When network namespace is destroyed, cleanup_net() is called.
cleanup_net() holds pernet_ops_rwsem then calls each ->exit callback.
So that clusterip_tg_destroy() is called by cleanup_net().
And clusterip_tg_destroy() calls unregister_netdevice_notifier().

But both cleanup_net() and clusterip_tg_destroy() hold same
lock(pernet_ops_rwsem). hence deadlock occurrs.

After this patch, only 1 notifier is registered when module is inserted.
And all of configs are added to per-net list.

test commands:
   %ip netns add vm1
   %ip netns exec vm1 bash
   %ip link set lo up
   %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
	-j CLUSTERIP --new --hashmode sourceip \
	--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
   %exit
   %ip netns del vm1

splat looks like:
[  341.809674] ============================================
[  341.809674] WARNING: possible recursive locking detected
[  341.809674] 4.19.0-rc5+ #16 Tainted: G        W
[  341.809674] --------------------------------------------
[  341.809674] kworker/u4:2/87 is trying to acquire lock:
[  341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x8c/0x460
[  341.809674]
[  341.809674] but task is already holding lock:
[  341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] other info that might help us debug this:
[  341.809674]  Possible unsafe locking scenario:
[  341.809674]
[  341.809674]        CPU0
[  341.809674]        ----
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]   lock(pernet_ops_rwsem);
[  341.809674]
[  341.809674]  *** DEADLOCK ***
[  341.809674]
[  341.809674]  May be due to missing lock nesting notation
[  341.809674]
[  341.809674] 3 locks held by kworker/u4:2/87:
[  341.809674]  #0: 00000000d9df6c92 ((wq_completion)"%s""netns"){+.+.}, at: process_one_work+0xafe/0x1de0
[  341.809674]  #1: 00000000c2cbcee2 (net_cleanup_work){+.+.}, at: process_one_work+0xb60/0x1de0
[  341.809674]  #2: 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
[  341.809674]
[  341.809674] stack backtrace:
[  341.809674] CPU: 1 PID: 87 Comm: kworker/u4:2 Tainted: G        W         4.19.0-rc5+ #16
[  341.809674] Workqueue: netns cleanup_net
[  341.809674] Call Trace:
[ ... ]
[  342.070196]  down_write+0x93/0x160
[  342.070196]  ? unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? down_read+0x1e0/0x1e0
[  342.070196]  ? sched_clock_cpu+0x126/0x170
[  342.070196]  ? find_held_lock+0x39/0x1c0
[  342.070196]  unregister_netdevice_notifier+0x8c/0x460
[  342.070196]  ? register_netdevice_notifier+0x790/0x790
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.070196]  ? trace_hardirqs_on+0x93/0x210
[  342.070196]  ? __bpf_trace_preemptirq_template+0x10/0x10
[  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
[  342.123094]  clusterip_tg_destroy+0x3ad/0x650 [ipt_CLUSTERIP]
[  342.123094]  ? clusterip_net_init+0x3d0/0x3d0 [ipt_CLUSTERIP]
[  342.123094]  ? cleanup_match+0x17d/0x200 [ip_tables]
[  342.123094]  ? xt_unregister_table+0x215/0x300 [x_tables]
[  342.123094]  ? kfree+0xe2/0x2a0
[  342.123094]  cleanup_entry+0x1d5/0x2f0 [ip_tables]
[  342.123094]  ? cleanup_match+0x200/0x200 [ip_tables]
[  342.123094]  __ipt_unregister_table+0x9b/0x1a0 [ip_tables]
[  342.123094]  iptable_filter_net_exit+0x43/0x80 [iptable_filter]
[  342.123094]  ops_exit_list.isra.10+0x94/0x140
[  342.123094]  cleanup_net+0x45b/0x900
[ ... ]

Fixes: 202f59afd4 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-18 01:17:59 +01:00
..
bpfilter bpfilter: remove trailing newline 2018-07-24 14:10:42 -07:00
netfilter netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine 2018-12-18 01:17:59 +01:00
af_inet.c inet: minor optimization for backlog setting in listen(2) 2018-11-07 22:31:07 -08:00
ah4.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
arp.c net: Evict neighbor entries on carrier down 2018-10-12 09:47:39 -07:00
cipso_ipv4.c net/ipv4: defensive cipso option parsing 2018-09-17 19:37:46 -07:00
datagram.c ipv4: Allow sending multicast packets on specific i/f using VRF socket 2018-10-02 22:28:17 -07:00
devinet.c net/{ipv4,ipv6}: Do not put target net if input nsid is invalid 2018-10-25 16:21:31 -07:00
esp4_offload.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2018-07-27 09:33:37 -07:00
esp4.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2018-10-01 22:31:17 -07:00
fib_frontend.c net: Don't return invalid table id error when dumping all families 2018-10-24 14:06:25 -07:00
fib_lookup.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_notifier.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_rules.c net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
fib_semantics.c net: Add extack argument to ip_fib_metrics_init 2018-11-06 15:00:45 -08:00
fib_trie.c net/ipv4: Plumb support for filtering route dumps 2018-10-16 00:13:12 -07:00
fou.c fou, fou6: ICMP error handlers for FoU and GUE 2018-11-08 17:13:08 -08:00
gre_demux.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
gre_offload.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
icmp.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
igmp.c ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12 2018-10-29 20:26:06 -07:00
inet_connection_sock.c inet: minor optimization for backlog setting in listen(2) 2018-11-07 22:31:07 -08:00
inet_diag.c sock_diag: request _diag module only when the family or proto has been registered 2018-03-12 11:03:42 -04:00
inet_fragment.c inet: frags: better deal with smp races 2018-11-08 18:40:30 -08:00
inet_hashtables.c net: ensure unbound stream socket to be chosen when not in a VRF 2018-11-07 16:12:38 -08:00
inet_timewait_sock.c soreuseport: initialise timewait reuseport field 2018-04-07 22:32:32 -04:00
inetpeer.c inetpeer: fix uninit-value in inet_getpeer 2018-04-09 10:57:35 -04:00
ip_forward.c net: ipv4: Control SKB reprioritization after forwarding 2018-08-01 09:52:30 -07:00
ip_fragment.c net: drop skb on failure in ip_check_defrag() 2018-11-01 13:55:30 -07:00
ip_gre.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
ip_input.c ip: factor out protocol delivery helper 2018-11-07 16:23:05 -08:00
ip_options.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ip_output.c net: Add and use skb_mark_not_on_list(). 2018-09-10 10:06:54 -07:00
ip_sockglue.c net: bpfilter: fix iptables failure if bpfilter_umh is disabled 2018-11-05 17:12:18 -08:00
ip_tunnel_core.c ipv4/tunnel: use __vlan_hwaccel helpers 2018-11-08 20:45:04 -08:00
ip_tunnel.c ip_tunnel: be careful when accessing the inner header 2018-09-24 12:27:04 -07:00
ip_vti.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
ipcomp.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
ipconfig.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
ipip.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
ipmr_base.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-19 11:03:06 -07:00
ipmr.c net: Don't return invalid table id error when dumping all families 2018-10-24 14:06:25 -07:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile bpf, sockmap: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
metrics.c net: Add extack argument to ip_fib_metrics_init 2018-11-06 15:00:45 -08:00
netfilter.c netfilter: utils: move nf_ip_checksum* from ipv4 to utils 2018-07-16 17:51:48 +02:00
netlink.c ipv4: support sport, dport and ip_proto in RTM_GETROUTE 2018-05-23 15:14:12 -04:00
ping.c ipv4: Allow sending multicast packets on specific i/f using VRF socket 2018-10-02 22:28:17 -07:00
proc.c ip: discard IPv4 datagrams with overlapping segments. 2018-08-05 17:16:46 -07:00
protocol.c fou, fou6: ICMP error handlers for FoU and GUE 2018-11-08 17:13:08 -08:00
raw_diag.c
raw.c net: fix raw socket lookup device bind matching with VRFs 2018-11-07 16:12:39 -08:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-12 21:38:46 -07:00
syncookies.c tcp: provide earliest departure time in skb->tstamp 2018-09-21 19:37:59 -07:00
sysctl_net_ipv4.c net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs 2018-11-07 16:12:38 -08:00
tcp_bbr.c tcp_bbr: update comments to reflect pacing_margin_percent 2018-11-08 20:46:17 -08:00
tcp_bic.c
tcp_bpf.c bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no data 2018-10-30 23:31:22 +01:00
tcp_cdg.c tcp: cdg: use tcp high resolution clock cache 2018-10-15 22:56:42 -07:00
tcp_cong.c tcp: Namespace-ify sysctl_tcp_default_congestion_control 2017-11-15 14:09:52 +09:00
tcp_cubic.c
tcp_dctcp.c tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_dctcp.h tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_diag.c net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store 2017-12-20 14:00:25 -05:00
tcp_fastopen.c tcp: pause Fast Open globally after third consecutive timeout 2017-12-13 15:51:12 -05:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c net/tcp/illinois: replace broken algorithm reference link 2018-02-28 12:03:47 -05:00
tcp_input.c tcp: minor optimization in tcp ack fast path processing 2018-11-11 10:24:18 -08:00
tcp_ipv4.c tcp: tsq: no longer use limit_output_bytes for paced flows 2018-11-11 13:57:03 -08:00
tcp_lp.c
tcp_metrics.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
tcp_minisocks.c tcp: do not restart timewait timer on rst reception 2018-08-31 23:10:35 -07:00
tcp_nv.c tcp_nv: fix potential integer overflow in tcpnv_acked 2018-01-31 10:26:30 -05:00
tcp_offload.c tcp: Don't coalesce decrypted and encrypted SKBs 2018-07-16 00:12:09 -07:00
tcp_output.c tcp: tsq: no longer use limit_output_bytes for paced flows 2018-11-11 13:57:03 -08:00
tcp_rate.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_recovery.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_scalable.c
tcp_timer.c tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh 2018-10-15 22:56:41 -07:00
tcp_ulp.c tcp, ulp: remove socket lock assertion on ULP cleanup 2018-10-16 12:38:41 -07:00
tcp_vegas.c tcp: fix under-evaluated ssthresh in TCP Vegas 2017-09-29 06:07:00 +01:00
tcp_vegas.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
tunnel4.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udp_diag.c net: diag: document swapped src/dst in udp_dump_one. 2018-10-28 19:27:21 -07:00
udp_impl.h net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udp_offload.c udp: implement GRO for plain UDP sockets. 2018-11-07 16:23:04 -08:00
udp_tunnel.c udp: Handle ICMP errors for tunnels with same destination port on both endpoints 2018-11-08 17:13:08 -08:00
udp.c udp: Support for error handlers of tunnels with arbitrary destination port 2018-11-08 17:13:08 -08:00
udplite.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
xfrm4_input.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm4_output.c net: xfrm: use skb_gso_validate_network_len() to check gso sizes 2018-03-04 17:49:17 -05:00
xfrm4_policy.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm4_protocol.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
xfrm4_state.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfrm4_tunnel.c