net: dctcp: loosen requirement to assert ECT(0) during 3WHS
One deployment requirement of DCTCP is to be able to run in a DC setting alongside TCP traffic. As Glenn Judd's NSDI'15 paper "Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter" [1] (tba) explains, one way to solve this on the switch side is to split DCTCP and TCP traffic into two queues per switch port based on the DSCP: one queue solely intended for DCTCP traffic and one for non-DCTCP traffic.

For the DCTCP queue, there's the marking threshold K as explained in commit e3118e8359 ("net: tcp: add DCTCP congestion control algorithm") for RED marking ECT(0) packets with CE. For the non-DCTCP queue, there's, for example, a classic tail drop queue. As already explained in e3118e8359, running DCTCP at scale without marking SYN/SYN-ACK packets with ECT(0) has severe consequences: non-ECT(0) packets traversing the RED marking DCTCP queue suffer a severe reduction in connection probability.

This is because the DCTCP queue is dominated by ECT(0) traffic and switches handle non-ECT traffic in the RED marking queue as drops once K is passed, where K is usually a low watermark in order to leave enough tailroom for bursts. Splitting DCTCP traffic among several queues (an ECN and a non-ECN queue) is considered a terrible idea in the network community as it splits single flows across multiple network paths.

Therefore, commit e3118e8359 implements this on Linux as ECT(0) marked traffic, as we argue that marking all packets of a DCTCP flow is the only viable solution and also doesn't contradict the draft.

However, a DCTCP implementation for FreeBSD recently also hit their mainline kernel [2]. In order to let it play well together with Linux' DCTCP, we need to loosen the requirement that ECT(0) has to be asserted during the 3WHS, as FreeBSD does not implement this. This simplifies the ECN test and lets DCTCP work together with FreeBSD.

Joint work with Daniel Borkmann.

[1] https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/judd
[2] 8ad8794452
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
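The switch-side behaviour the message describes can be sketched roughly as below. This is an illustration only, not switch firmware and not part of the patch: `queue_action`, `enum action` and `enum ecn` are hypothetical names, and the model assumes the simplified rule that packets arriving while the queue length exceeds K are CE-marked if they are ECN-capable and dropped otherwise.

```c
/* ECN codepoints carried in the two low bits of the IP DS field (RFC 3168). */
enum ecn { ECN_NOT_ECT = 0, ECN_ECT_1 = 1, ECN_ECT_0 = 2, ECN_CE = 3 };

/* Hypothetical names for what the marking queue does with a packet. */
enum action { ACT_FORWARD, ACT_MARK_CE, ACT_DROP };

/*
 * Simplified model of the DCTCP queue (an assumption, see lead-in):
 * up to threshold k every packet is forwarded unchanged; beyond k,
 * ECN-capable packets are marked CE and non-ECT packets are dropped.
 */
enum action queue_action(unsigned int queue_len, unsigned int k, enum ecn ecn)
{
	if (queue_len <= k)
		return ACT_FORWARD;
	if (ecn == ECN_NOT_ECT)
		return ACT_DROP;
	return ACT_MARK_CE;
}
```

With an assumed K of 20 packets, `queue_action(30, 20, ECN_NOT_ECT)` yields `ACT_DROP` while `queue_action(30, 20, ECN_ECT_0)` yields `ACT_MARK_CE`. In this model a non-ECT SYN is lost in a busy DCTCP queue while an ECT(0) SYN gets through, which is why Linux marks the handshake packets.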
parent 6942241616
commit 843c2fdf7a
@@ -5872,10 +5872,9 @@ static inline void pr_drop_req(struct request_sock *req, __u16 port, int family)
  * TCP ECN negotiation.
  *
  * Exception: tcp_ca wants ECN. This is required for DCTCP
- * congestion control; it requires setting ECT on all packets,
- * including SYN. We inverse the test in this case: If our
- * local socket wants ECN, but peer only set ece/cwr (but not
- * ECT in IP header) its probably a non-DCTCP aware sender.
+ * congestion control: Linux DCTCP asserts ECT on all packets,
+ * including SYN, which is most optimal solution; however,
+ * others, such as FreeBSD do not.
  */
 static void tcp_ecn_create_request(struct request_sock *req,
 				   const struct sk_buff *skb,
@@ -5885,18 +5884,15 @@ static void tcp_ecn_create_request(struct request_sock *req,
 	const struct tcphdr *th = tcp_hdr(skb);
 	const struct net *net = sock_net(listen_sk);
 	bool th_ecn = th->ece && th->cwr;
-	bool ect, need_ecn, ecn_ok;
+	bool ect, ecn_ok;
 
 	if (!th_ecn)
 		return;
 
 	ect = !INET_ECN_is_not_ect(TCP_SKB_CB(skb)->ip_dsfield);
-	need_ecn = tcp_ca_needs_ecn(listen_sk);
 	ecn_ok = net->ipv4.sysctl_tcp_ecn || dst_feature(dst, RTAX_FEATURE_ECN);
 
-	if (!ect && !need_ecn && ecn_ok)
-		inet_rsk(req)->ecn_ok = 1;
-	else if (ect && need_ecn)
+	if ((!ect && ecn_ok) || tcp_ca_needs_ecn(listen_sk))
 		inet_rsk(req)->ecn_ok = 1;
 }
 
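To see which handshakes the change affects, the old and new tests can be lifted out of tcp_ecn_create_request() into two pure functions. This is a sketch for comparison only: `ecn_accept_old` and `ecn_accept_new` are hypothetical names that do not exist in the kernel, and both assume the early `th_ecn` check has already passed (the SYN carried ECE and CWR).

```c
#include <stdbool.h>

/*
 * Inputs mirror the locals in tcp_ecn_create_request():
 *   ect      - the SYN's IP header carried an ECN-capable codepoint
 *   need_ecn - the listener's congestion control requires ECN (e.g. DCTCP)
 *   ecn_ok   - ECN is enabled via sysctl or the route's ECN feature
 * The return value stands in for setting inet_rsk(req)->ecn_ok = 1.
 */

/* Test before this commit: a DCTCP listener insisted on ECT in the SYN. */
bool ecn_accept_old(bool ect, bool need_ecn, bool ecn_ok)
{
	if (!ect && !need_ecn && ecn_ok)
		return true;
	else if (ect && need_ecn)
		return true;
	return false;
}

/* Test after this commit: a DCTCP listener accepts ECN regardless of ECT. */
bool ecn_accept_new(bool ect, bool need_ecn, bool ecn_ok)
{
	return (!ect && ecn_ok) || need_ecn;
}
```

Enumerating the eight input combinations, the two functions differ only for `ect == false` with `need_ecn == true`: a SYN that requests ECN in the TCP header without ECT in the IP header, arriving at a DCTCP listener. Per the commit message, this is what a FreeBSD DCTCP sender produces. The old test refused ECN for it; the new test accepts it.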