linux/net
Peilin Ye 56af5e749f net/sched: act_skbmod: Add SKBMOD_F_ECN option support
Currently, when doing rate limiting using the tc-police(8) action, the
easiest way is to simply drop the packets which exceed or conform the
configured bandwidth limit.  Add a new option to tc-skbmod(8), so that
users may use the ECN [1] extension to explicitly inform the receiver
about the congestion instead of dropping packets "on the floor".

The 2 least significant bits of the Traffic Class field in IPv4 and IPv6
headers are used to represent different ECN states [2]:

	0b00: "Non ECN-Capable Transport", Non-ECT
	0b10: "ECN Capable Transport", ECT(0)
	0b01: "ECN Capable Transport", ECT(1)
	0b11: "Congestion Encountered", CE

As an example:

	$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
		matchall action skbmod ecn

Doing the above marks all ECT(0) and ECT(1) packets as CE.  It does NOT
affect Non-ECT or non-IP packets.  In the tc-police scenario mentioned
above, users may pipe a tc-police action and a tc-skbmod "ecn" action
together to achieve ECN-based rate limiting.

For TCP connections, upon receiving a CE packet, the receiver will respond
with an ECE packet, asking the sender to reduce their congestion window.
However ECN also works with other L4 protocols e.g. DCCP and SCTP [2], and
our implementation does not touch or care about L4 headers.

The updated tc-skbmod SYNOPSIS looks like the following:

	tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod
command.  Trying to use more than one of them at a time is considered
undefined behavior; pipe multiple tc-skbmod commands together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IPv{4,6} packets.

It is also worth mentioning that, in theory, the same effect could be
achieved by piping a "police" action and a "bpf" action using the
bpf_skb_ecn_set_ce() helper, but this requires eBPF programming from the
user, thus impractical.

Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets".

[1] https://datatracker.ietf.org/doc/html/rfc3168
[2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28 13:19:31 +01:00
..
6lowpan
9p 9p/trans_virtio: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
802 net/802/garp: fix memleak in garp_request_join() 2021-07-01 11:21:57 -07:00
8021q dev_ioctl: split out ndo_eth_ioctl 2021-07-27 20:11:45 +01:00
appletalk net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
atm atm: Use list_for_each_entry() to simplify code in resources.c 2021-06-10 14:08:09 -07:00
ax25
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
bluetooth TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-23 16:13:06 +01:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge net: bridge: move bridge ioctls out of .ndo_do_ioctl 2021-07-27 20:11:45 +01:00
caif net: fix uninit-value in caif_seqpkt_sendmsg 2021-07-15 11:08:33 -07:00
can can: j1939: j1939_xtp_rx_dat_one(): use separate pointer for session skb control buffer 2021-07-25 11:36:25 +02:00
ceph Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
core devlink: Remove duplicated registration check 2021-07-28 10:23:45 +01:00
dcb net: dcb: Return the correct errno code 2021-06-01 17:01:33 -07:00
dccp memcg: enable accounting for inet_bin_bucket cache 2021-07-20 06:00:38 -07:00
decnet net: decnet: Fix sleeping inside in af_decnet 2021-07-16 14:06:16 -07:00
dns_resolver
dsa dev_ioctl: split out ndo_eth_ioctl 2021-07-27 20:11:45 +01:00
ethernet Revert "net: dsa: Allow drivers to filter packets they can decode source port from" 2021-07-26 22:35:22 +01:00
ethtool dev_ioctl: pass SIOCDEVPRIVATE data separately 2021-07-27 20:11:44 +01:00
hsr net: hsr: don't check sequence number if tag removal is offloaded 2021-06-16 12:13:01 -07:00
ieee802154 net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
ife
ipv4 ip_tunnel: use ndo_siocdevprivate 2021-07-27 20:11:44 +01:00
ipv6 ip_tunnel: use ndo_siocdevprivate 2021-07-27 20:11:44 +01:00
iucv s390: iucv: Avoid field over-reading memcpy() 2021-07-01 15:54:01 -07:00
kcm net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
key net: Remove unnecessary variables 2021-05-26 07:03:39 +02:00
l2tp l2tp: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
l3mdev
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc llc2: Remove redundant assignment to rc 2021-04-27 14:16:14 -07:00
mac80211 mac80211: Switch to a virtual time-based airtime scheduler 2021-06-23 18:12:00 +02:00
mac802154
mpls mpls: defer ttl decrement in mpls_forward() 2021-07-23 17:17:56 +01:00
mptcp net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
ncsi net/ncsi: add dummy response handler for Intel boards 2021-07-08 14:16:39 -07:00
netfilter net: ipv6: introduce ip6_dst_mtu_maybe_forward 2021-07-21 08:22:02 -07:00
netlabel net: cipso: fix warnings in netlbl_cipsov4_add_std 2021-07-27 20:58:30 +01:00
netlink net: netlink: add the case when nlh is NULL 2021-07-27 11:43:50 +01:00
netrom netrom: Decrease sock refcount when sock timers expire 2021-07-18 09:48:59 -07:00
nfc nfc: constify nfc_digital_ops 2021-07-25 09:21:21 +01:00
nsh
openvswitch openvswitch: fix sparse warning incorrect type 2021-07-27 11:48:43 +01:00
packet Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
phonet phonet: use siocdevprivate 2021-07-27 20:11:43 +01:00
psample
qrtr net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
rds Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
rfkill Another set of updates, all over the map: 2021-04-20 16:44:04 -07:00
rose
rxrpc Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
sched net/sched: act_skbmod: Add SKBMOD_F_ECN option support 2021-07-28 13:19:31 +01:00
sctp sctp: do not update transport pathmtu if SPP_PMTUD_ENABLE is not set 2021-07-21 14:17:58 -07:00
smc net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
strparser net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
sunrpc NFS client updates for Linux 5.14 2021-07-09 09:43:57 -07:00
switchdev net: switchdev: fix FDB entries towards foreign ports not getting propagated to us 2021-07-22 00:45:40 -07:00
tipc tipc: fix an use-after-free issue in tipc_recvmsg 2021-07-25 10:43:30 +01:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-07-15 22:40:10 -07:00
vmw_vsock Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
wireless cfg80211: Support hidden AP discovery over 6GHz band 2021-06-23 13:05:09 +02:00
x25 net: x25: Use list_for_each_entry() to simplify code in x25_route.c 2021-06-10 14:08:09 -07:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
xfrm Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
Kconfig bpf, kconfig: Add consolidated menu entry for bpf with core options 2021-05-11 13:56:16 -07:00
Makefile
socket.c net: bridge: move bridge ioctls out of .ndo_do_ioctl 2021-07-27 20:11:45 +01:00
sysctl_net.c