linux/net
Guenter Roeck 5e016cbf6c ipv4: Don't drop redirected route cache entry unless PTMU actually expired
TCP sessions over IPv4 can get stuck if routers between endpoints
do not fragment packets but implement PMTU instead, and we are using
those routers because of an ICMP redirect.

Setup is as follows

       MTU1    MTU2   MTU1
    A--------B------C------D

with MTU1 > MTU2. A and D are endpoints, B and C are routers. B and C
implement PMTU and drop packets larger than MTU2 (for example because
DF is set on all packets). TCP sessions are initiated between A and D.
There is packet loss between A and D, causing frequent TCP
retransmits.

After the number of retransmits on a TCP session reaches tcp_retries1,
tcp calls dst_negative_advice() prior to each retransmit. This results
in route cache entries for the peer to be deleted in
ipv4_negative_advice() if the Path MTU is set.

If the outstanding data on an affected TCP session is larger than
MTU2, packets sent from the endpoints will be dropped by B or C, and
ICMP NEEDFRAG will be returned. A and D receive NEEDFRAG messages and
update PMTU.

Before the next retransmit, tcp will again call dst_negative_advice(),
causing the route cache entry (with correct PMTU) to be deleted. The
retransmitted packet will be larger than MTU2, causing it to be
dropped again.

This sequence repeats until the TCP session aborts or is terminated.

Problem is fixed by removing redirected route cache entries in
ipv4_negative_advice() only if the PMTU is expired.

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 20:55:13 -07:00
..
9p 9p: Change the name of new protocol from 9p2010.L to 9p2000.L 2010-03-13 08:57:29 -06:00
802 sysctl net: Remove unused binary sysctl code 2009-11-12 02:05:06 -08:00
8021q net: Potential null skb->dev dereference 2010-03-18 21:16:45 -07:00
appletalk net: appletalk: use seq_hlist_foo() helpers 2010-02-10 11:12:09 -08:00
atm net: atm: use seq_list_foo() helpers 2010-02-10 12:31:10 -08:00
ax25 net: ax25: use seq_hlist_foo() helpers 2010-02-10 11:12:09 -08:00
bluetooth Bluetooth: Fix kernel crash on L2CAP stress tests 2010-03-21 05:49:36 +01:00
bridge bridge: Make first arg to deliver_clone const. 2010-03-16 14:37:47 -07:00
can can: deny filterlist access on non-CAN interfaces 2010-02-02 07:21:34 -08:00
core net: Potential null skb->dev dereference 2010-03-18 21:16:45 -07:00
dcb const: struct nla_policy 2010-02-18 14:30:18 -08:00
dccp net-2.6 [Bug-Fix][dccp]: fix oops caused after failed initialisation 2010-03-15 16:00:50 -07:00
decnet net: Add checking to rcu_dereference() primitives 2010-02-25 09:41:03 +01:00
dsa netdev: convert pseudo-devices to netdev_tx_t 2009-09-01 01:13:07 -07:00
econet net: use net_eq to compare nets 2009-11-25 15:14:13 -08:00
ethernet llc: use dev_hard_header 2009-12-26 20:38:23 -08:00
ieee802154 net: use net_eq to compare nets 2009-11-25 15:14:13 -08:00
ipv4 ipv4: Don't drop redirected route cache entry unless PTMU actually expired 2010-03-21 20:55:13 -07:00
ipv6 net: ipmr/ip6mr: fix potential out-of-bounds vif_table access 2010-03-19 22:47:22 -07:00
ipx net: ipx: use seq_list_foo() helpers 2010-02-10 12:31:10 -08:00
irda const: struct nla_policy 2010-02-18 14:30:18 -08:00
iucv const: constify remaining dev_pm_ops 2009-12-15 08:53:25 -08:00
key xfrm: SP lookups signature with mark 2010-02-22 16:21:12 -08:00
lapb
llc net: backlog functions rename 2010-03-05 13:34:03 -08:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2010-03-13 14:50:18 -08:00
netfilter netfilter: ctnetlink: fix reliable event delivery if message building fails 2010-03-20 14:29:03 -07:00
netlabel net: remove INIT_RCU_HEAD() usage 2010-02-17 00:03:27 -08:00
netlink netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err() 2010-03-20 14:29:03 -07:00
netrom net: netrom: use seq_hlist_foo() helpers 2010-02-10 11:12:08 -08:00
packet af_packet: move strict addr_len check right before dev_[mc/unicast]_[add/del] 2010-03-03 01:04:38 -08:00
phonet phonet: use for_each_set_bit() 2010-03-15 16:00:47 -07:00
rds net/rds: remove uses of NIPQUAD, use %pI4 2010-02-03 20:16:48 -08:00
rfkill rfkill: Add support for KEY_RFKILL 2010-03-02 14:28:49 -05:00
rose net: rose: use seq_hlist_foo() helpers 2010-02-10 11:12:08 -08:00
rxrpc net: use net_eq to compare nets 2009-11-25 15:14:13 -08:00
sched Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-02-09 11:44:44 -08:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2010-03-13 14:50:18 -08:00
sunrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2010-03-13 14:50:18 -08:00
tipc tipc: fix lockdep warning on address assignment 2010-03-16 14:15:45 -07:00
unix AF_UNIX: update locking comment 2010-02-18 14:12:06 -08:00
wanrouter
wimax const: struct nla_policy 2010-02-18 14:30:18 -08:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-02-25 23:26:21 -08:00
x25 net: backlog functions rename 2010-03-05 13:34:03 -08:00
xfrm ipsec: Fix bogus bundle flowi 2010-03-03 01:04:37 -08:00
compat.c net: use compat helper functions in compat_sys_recvmmsg 2009-12-11 15:07:57 -08:00
Kconfig
Makefile
nonet.c
socket.c fs: no games with DCACHE_UNHASHED 2009-12-17 10:51:40 -05:00
sysctl_net.c net: spread __net_init, __net_exit 2010-01-17 19:16:02 -08:00
TUNABLE