linux/net/tipc
Tuong Lien c0b14a0854 tipc: fix missing Name entries due to half-failover
TIPC link can temporarily fall into "half-establish" that only one of
the link endpoints is ESTABLISHED and starts to send traffic, PROTOCOL
messages, whereas the other link endpoint is not up (e.g. immediately
when the endpoint receives ACTIVATE_MSG, the network interface goes
down...).

This is a normal situation and will be settled because the link
endpoint will be eventually brought down after the link tolerance time.

However, the situation will become worse when the second link is
established before the first link endpoint goes down,
For example:

   1. Both links <1A-2A>, <1B-2B> down
   2. Link endpoint 2A up, but 1A still down (e.g. due to network
      disturbance, wrong session, etc.)
   3. Link <1B-2B> up
   4. Link endpoint 2A down (e.g. due to link tolerance timeout)
   5. Node B starts failover onto link <1B-2B>

   ==> Node A does never start link failover.

When the "half-failover" situation happens, two consequences have been
observed:

a) Peer link/node gets stuck in FAILINGOVER state;
b) Traffic or user messages that peer node is trying to failover onto
the second link can be partially or completely dropped by this node.

The consequence a) was actually solved by commit c140eb166d ("tipc:
fix failover problem"), but that commit didn't cover the b). It's due
to the fact that the tunnel link endpoint has never been prepared for a
failover, so the 'l->drop_point' (and the other data...) is not set
correctly. When a TUNNEL_MSG from peer node arrives on the link,
depending on the inner message's seqno and the current 'l->drop_point'
value, the message can be dropped (- treated as a duplicate message) or
processed.
At this early stage, the traffic messages from peer are likely to be
NAME_DISTRIBUTORs, this means some name table entries will be missed on
the node forever!

The commit resolves the issue by starting the FAILOVER process on this
node as well. Another benefit from this solution is that we ensure the
link will not be re-established until the failover ends.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 00:59:51 -04:00
..
addr.c tipc: handle collisions of 32-bit node address hash values 2018-03-23 13:12:18 -04:00
addr.h tipc: add 128-bit node identifier 2018-03-23 13:12:18 -04:00
bcast.c tipc: add NULL pointer check 2019-04-04 17:34:11 -07:00
bcast.h tipc: fix a null pointer deref 2019-03-21 09:56:55 -07:00
bearer.c netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
bearer.h tipc: enable tracepoints in tipc 2018-12-19 11:49:24 -08:00
core.c tipc: introduce new capability flag for cluster 2019-03-19 13:56:17 -07:00
core.h tipc: introduce new capability flag for cluster 2019-03-19 13:56:17 -07:00
diag.c tipc: switch to rhashtable iterator 2018-08-29 18:04:54 -07:00
discover.c tipc: fix lockdep warning when reinitilaizing sockets 2018-11-17 22:01:31 -08:00
discover.h tipc: some cleanups in the file discover.c 2018-03-23 13:12:17 -04:00
eth_media.c
group.c netlink: make nla_nest_start() add NLA_F_NESTED flag 2019-04-27 17:03:44 -04:00
group.h tipc: extend sock diag for group communication 2018-06-30 21:05:42 +09:00
ib_media.c
Kconfig tipc: implement socket diagnostics for AF_TIPC 2018-03-22 14:43:35 -04:00
link.c tipc: fix missing Name entries due to half-failover 2019-05-04 00:59:51 -04:00
link.h tipc: fix missing Name entries due to half-failover 2019-05-04 00:59:51 -04:00
Makefile tipc: enable tracepoints in tipc 2018-12-19 11:49:24 -08:00
monitor.c netlink: make nla_nest_start() add NLA_F_NESTED flag 2019-04-27 17:03:44 -04:00
monitor.h
msg.c tipc: buffer overflow handling in listener socket 2018-09-29 11:24:22 -07:00
msg.h tipc: reduce duplicate packets for unicast traffic 2019-04-04 18:29:25 -07:00
name_distr.c tipc: eliminate message disordering during binding table update 2018-10-22 19:29:12 -07:00
name_distr.h tipc: permit overlapping service ranges in name table 2018-03-31 22:19:52 -04:00
name_table.c netlink: make nla_nest_start() add NLA_F_NESTED flag 2019-04-27 17:03:44 -04:00
name_table.h tipc: eliminate message disordering during binding table update 2018-10-22 19:29:12 -07:00
net.c netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
net.h tipc: fix lockdep warning when reinitilaizing sockets 2018-11-17 22:01:31 -08:00
netlink_compat.c genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
netlink.c genetlink: optionally validate strictly/dumps 2019-04-27 17:07:22 -04:00
netlink.h
node.c tipc: fix missing Name entries due to half-failover 2019-05-04 00:59:51 -04:00
node.h tipc: improve TIPC throughput by Gap ACK blocks 2019-04-04 18:29:25 -07:00
socket.c netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
socket.h tipc: add trace_events for tipc socket 2018-12-19 11:49:24 -08:00
subscr.c tipc: fix unbalanced reference counter 2018-04-12 21:46:10 -04:00
subscr.h tipc: replace name table service range array with rb tree 2018-03-31 22:19:52 -04:00
sysctl.c tipc: set sysctl_tipc_rmem and named_timeout right range 2019-04-16 21:32:02 -07:00
topsrv.c tipc: fix cancellation of topology subscriptions 2019-03-21 09:09:04 -07:00
topsrv.h tipc: rename tipc_server to tipc_topsrv 2018-02-16 15:26:34 -05:00
trace.c tipc: remove unneeded semicolon in trace.c 2019-01-17 22:04:43 -08:00
trace.h tipc: add trace_events for tipc bearer 2018-12-19 11:49:25 -08:00
udp_media.c netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
udp_media.h tipc: implement configuration of UDP media MTU 2018-04-20 11:04:05 -04:00