linux/net
Daniel Borkmann 4c47af4d5e net: sctp: rework multihoming retransmission path selection to rfc4960
Problem statement: 1) both paths (primary path1 and alternate
path2) are up after the association has been established i.e.,
HB packets are normally exchanged, 2) path2 gets inactive after
path_max_retrans * max_rto timed out (i.e. path2 is down completely),
3) now, if a transmission times out on the only surviving/active
path1 (any ~1sec network service impact could cause this like
a channel bonding failover), then the retransmitted packets are
sent over the inactive path2; this happens with partial failover
and without it.

Besides not being optimal in the above scenario, a small failure
or timeout in the only existing path has the potential to cause
long delays in the retransmission (depending on RTO_MAX) until
the still active path is reselected. Further, when the T3-timeout
occurs, we have active_patch == retrans_path, and even though the
timeout occurred on the initial transmission of data, not a
retransmit, we end up updating retransmit path.

RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under
6.4.1. "Failover from an Inactive Destination Address" the
following:

  Some of the transport addresses of a multi-homed SCTP endpoint
  may become inactive due to either the occurrence of certain
  error conditions (see Section 8.2) or adjustments from the
  SCTP user.

  When there is outbound data to send and the primary path
  becomes inactive (e.g., due to failures), or where the SCTP
  user explicitly requests to send data to an inactive
  destination transport address, before reporting an error to
  its ULP, the SCTP endpoint should try to send the data to an
  alternate __active__ destination transport address if one
  exists.

  When retransmitting data that timed out, if the endpoint is
  multihomed, it should consider each source-destination address
  pair in its retransmission selection policy. When retransmitting
  timed-out data, the endpoint should attempt to pick the most
  divergent source-destination pair from the original
  source-destination pair to which the packet was transmitted.

  Note: Rules for picking the most divergent source-destination
  pair are an implementation decision and are not specified
  within this document.

So, we should first reconsider to take the current active
retransmission transport if we cannot find an alternative
active one. If all of that fails, we can still round robin
through unkown, partial failover, and inactive ones in the
hope to find something still suitable.

Commit 4141ddc02a ("sctp: retran_path update bug fix") broke
that behaviour by selecting the next inactive transport when
no other active transport was found besides the current assoc's
peer.retran_path. Before commit 4141ddc02a, we would have
traversed through the list until we reach our peer.retran_path
again, and in case that is still in state SCTP_ACTIVE, we would
take it and return. Only if that is not the case either, we
take the next inactive transport.

Besides all that, another issue is that transports in state
SCTP_UNKNOWN could be preferred over transports in state
SCTP_ACTIVE in case a SCTP_ACTIVE transport appears after
SCTP_UNKNOWN in the transport list yielding a weaker transport
state to be used in retransmission.

This patch mostly reverts 4141ddc02a, but also rewrites
this function to introduce more clarity and strictness into
the code. A strict priority of transport states is enforced
in this patch, hence selection is active > unkown > partial
failover > inactive.

Fixes: 4141ddc02a ("sctp: retran_path update bug fix")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vlad Yasevich <yasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-22 00:26:05 -05:00
..
9p 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers 2014-02-10 17:48:54 -08:00
802 neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. 2014-01-16 11:31:58 -08:00
8021q 8021q: Use ether_addr_copy 2014-01-21 18:13:04 -08:00
appletalk net: Fix some fallout from the etner_addr_copy() changes. 2014-01-21 18:57:26 -08:00
atm net: Fix some fallout from the etner_addr_copy() changes. 2014-01-21 18:57:26 -08:00
ax25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
batman-adv batman-adv: fix potential kernel paging error for unicast transmissions 2014-02-17 17:17:02 +01:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2014-02-18 16:29:46 -08:00
bridge bridge: Prevent possible race condition in br_fdb_change_mac_address 2014-02-10 14:34:34 -08:00
caif net: Include appropriate header file in caif/cfsrvl.c 2014-02-09 17:32:49 -08:00
can linux-can-fixes-for-3.14-20140129 2014-01-30 16:48:17 -08:00
ceph libceph: do not dereference a NULL bio pointer 2014-02-07 11:37:07 -08:00
core neigh: fix setting of default gc_* values 2014-02-22 00:08:10 -05:00
dcb dcb: use __dev_get_by_name instead of dev_get_by_name to find interface 2014-01-14 18:50:46 -08:00
dccp dccp: re-enable debug macro 2014-02-16 23:45:00 -05:00
decnet net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c 2014-02-09 17:32:49 -08:00
dns_resolver net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
dsa dsa: Use ether_addr_copy 2014-01-21 18:13:05 -08:00
ethernet net: eth_type_trans() should use skb_header_pointer() 2014-01-16 15:30:31 -08:00
hsr net/hsr: using kfree_rcu() to simplify the code 2013-12-17 16:32:30 -05:00
ieee802154 6lowpan: fix lockdep splats 2014-02-10 17:51:29 -08:00
ipv4 net-tcp: fastopen: fix high order allocations 2014-02-22 00:05:21 -05:00
ipv6 sit: fix panic with route cache in ip tunnels 2014-02-20 13:13:50 -05:00
ipx net: Move prototype declaration to header file include/net/net_namespace.h from net/ipx/af_ipx.c 2014-02-09 17:32:50 -08:00
irda net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
iucv net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
key xfrm: export verify_userspi_info for pkfey and netlink interface 2013-12-16 12:54:02 +01:00
l2tp ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts 2014-01-21 16:59:19 -08:00
lapb net/lapb: re-send packets on timeout 2013-09-23 16:52:45 -04:00
llc llc: remove noisy WARN from llc_mac_hdr_init 2014-01-28 18:01:32 -08:00
mac80211 netdevice: add queue selection fallback handler for ndo_select_queue 2014-02-17 00:36:34 -05:00
mac802154 mac802154: fix following checkpath.pl warning Prefer pr_warn(... to pr_warning(... 2013-12-22 18:53:08 -05:00
mpls ipip: add GSO/TSO support 2013-10-19 19:36:19 -04:00
netfilter netfilter: ctnetlink: force null nat binding on insert 2014-02-18 00:13:51 +01:00
netlabel netlabel: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
netlink net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
netrom net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
openvswitch openvswitch: Suppress error messages on megaflow updates 2014-02-04 22:32:38 -08:00
packet af_packet: remove a stray tab in packet_set_ring() 2014-02-18 18:02:25 -05:00
phonet net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rds net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
rose net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rxrpc RxRPC fixes 2014-01-28 18:04:18 -08:00
sched net: sched: Cleanup PIE comments 2014-02-13 18:29:58 -05:00
sctp net: sctp: rework multihoming retransmission path selection to rfc4960 2014-02-22 00:26:05 -05:00
sunrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-11 12:05:55 -08:00
tipc tipc: make bearer set up in module insertion stage 2014-02-22 00:00:15 -05:00
unix net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
vmw_vsock net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
wimax wimax: remove dead code 2013-11-21 13:09:42 -05:00
wireless cfg80211: send scan results from work queue 2014-02-06 09:55:19 +01:00
x25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
compat.c x86, x32: Correct invalid use of user timespec in the kernel 2014-01-30 18:44:13 -08:00
Kconfig net: netprio: rename config to be more consistent with cgroup configs 2014-01-03 23:41:42 +01:00
Makefile net: move 6lowpan compression code to separate module 2014-01-15 15:36:38 -08:00
nonet.c
socket.c net: handle error more gracefully in socketpair() 2013-12-10 22:24:13 -05:00
sysctl_net.c net: Update the sysctl permissions handler to test effective uid/gid 2013-10-07 15:57:56 -04:00