mirror of
https://github.com/torvalds/linux.git
synced 2024-11-24 13:11:40 +00:00
45caeaa5ac
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:
#8 [] page_fault at ffffffff8163e648
[exception RIP: __tcp_ack_snd_check+74]
.
.
#9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8
Of course it may happen with other NIC drivers as well.
It's found the freed dst_entry here:
224 static bool tcp_in_quickack_mode(struct sock *sk)↩
225 {↩
226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩
227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩
228 ↩
229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
231 }↩
But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.
All the vmcores showed 2 significant clues:
- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.
- All vmcores showed a postitive LockDroppedIcmps value, e.g:
LockDroppedIcmps 267
A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:
do_redirect()->__sk_dst_check()-> dst_release().
Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.
To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.
The dccp/IPv6 code is very similar in this respect, so fixing it there too.
As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().
Fixes:
|
||
---|---|---|
.. | ||
netfilter | ||
af_inet.c | ||
ah4.c | ||
arp.c | ||
cipso_ipv4.c | ||
datagram.c | ||
devinet.c | ||
esp4_offload.c | ||
esp4.c | ||
fib_frontend.c | ||
fib_lookup.h | ||
fib_rules.c | ||
fib_semantics.c | ||
fib_trie.c | ||
fou.c | ||
gre_demux.c | ||
gre_offload.c | ||
icmp.c | ||
igmp.c | ||
inet_connection_sock.c | ||
inet_diag.c | ||
inet_fragment.c | ||
inet_hashtables.c | ||
inet_timewait_sock.c | ||
inetpeer.c | ||
ip_forward.c | ||
ip_fragment.c | ||
ip_gre.c | ||
ip_input.c | ||
ip_options.c | ||
ip_output.c | ||
ip_sockglue.c | ||
ip_tunnel_core.c | ||
ip_tunnel.c | ||
ip_vti.c | ||
ipcomp.c | ||
ipconfig.c | ||
ipip.c | ||
ipmr.c | ||
Kconfig | ||
Makefile | ||
netfilter.c | ||
ping.c | ||
proc.c | ||
protocol.c | ||
raw_diag.c | ||
raw.c | ||
route.c | ||
syncookies.c | ||
sysctl_net_ipv4.c | ||
tcp_bbr.c | ||
tcp_bic.c | ||
tcp_cdg.c | ||
tcp_cong.c | ||
tcp_cubic.c | ||
tcp_dctcp.c | ||
tcp_diag.c | ||
tcp_fastopen.c | ||
tcp_highspeed.c | ||
tcp_htcp.c | ||
tcp_hybla.c | ||
tcp_illinois.c | ||
tcp_input.c | ||
tcp_ipv4.c | ||
tcp_lp.c | ||
tcp_metrics.c | ||
tcp_minisocks.c | ||
tcp_nv.c | ||
tcp_offload.c | ||
tcp_output.c | ||
tcp_probe.c | ||
tcp_rate.c | ||
tcp_recovery.c | ||
tcp_scalable.c | ||
tcp_timer.c | ||
tcp_vegas.c | ||
tcp_vegas.h | ||
tcp_veno.c | ||
tcp_westwood.c | ||
tcp_yeah.c | ||
tcp.c | ||
tunnel4.c | ||
udp_diag.c | ||
udp_impl.h | ||
udp_offload.c | ||
udp_tunnel.c | ||
udp.c | ||
udplite.c | ||
xfrm4_input.c | ||
xfrm4_mode_beet.c | ||
xfrm4_mode_transport.c | ||
xfrm4_mode_tunnel.c | ||
xfrm4_output.c | ||
xfrm4_policy.c | ||
xfrm4_protocol.c | ||
xfrm4_state.c | ||
xfrm4_tunnel.c |