linux/net
Ido Schimmel 32f2a0afa9 net/sched: flower: Fix chain template offload
When a qdisc is deleted from a net device the stack instructs the
underlying driver to remove its flow offload callback from the
associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack
then continues to replay the removal of the filters in the block for
this driver by iterating over the chains in the block and invoking the
'reoffload' operation of the classifier being used. In turn, the
classifier in its 'reoffload' operation prepares and emits a
'FLOW_CLS_DESTROY' command for each filter.

However, the stack does not do the same for chain templates and the
underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when
a qdisc is deleted. This results in a memory leak [1] which can be
reproduced using [2].

Fix by introducing a 'tmplt_reoffload' operation and have the stack
invoke it with the appropriate arguments as part of the replay.
Implement the operation in the sole classifier that supports chain
templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}'
command based on whether a flow offload callback is being bound to a
filter block or being unbound from one.

As far as I can tell, the issue happens since cited commit which
reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains()
in __tcf_block_put(). The order cannot be reversed as the filter block
is expected to be freed after flushing all the chains.

[1]
unreferenced object 0xffff888107e28800 (size 2048):
  comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
  hex dump (first 32 bytes):
    b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff  ..|......[......
    01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff  ................
  backtrace:
    [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
    [<ffffffff81ab374e>] __kmalloc+0x4e/0x90
    [<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0
    [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
    [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
    [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
    [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
    [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
    [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
    [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
    [<ffffffff83ac6270>] netlink_unicast+0x540/0x820
    [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
    [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
    [<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0
    [<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0
    [<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0
unreferenced object 0xffff88816d2c0400 (size 1024):
  comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
  hex dump (first 32 bytes):
    40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00  @.......W.8.....
    10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff  ..,m......,m....
  backtrace:
    [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
    [<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90
    [<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0
    [<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460
    [<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0
    [<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0
    [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
    [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
    [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
    [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
    [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
    [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
    [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
    [<ffffffff83ac6270>] netlink_unicast+0x540/0x820
    [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
    [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80

[2]
 # tc qdisc add dev swp1 clsact
 # tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32
 # tc qdisc del dev swp1 clsact
 # devlink dev reload pci/0000:06:00.0

Fixes: bbf73830cd ("net: sched: traverse chains in block with tcf_get_next_chain()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-24 01:33:59 +00:00
..
6lowpan
9p net: 9p: avoid freeing uninit memory in p9pdu_vreadf 2023-12-13 05:44:30 +09:00
802 net: fill in MODULE_DESCRIPTION()s under net/802* 2023-10-28 11:29:28 +01:00
8021q vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING 2024-01-19 21:25:06 -08:00
appletalk net: remove SOCK_DEBUG leftovers 2023-12-26 20:31:01 +00:00
atm net: fill in MODULE_DESCRIPTION()s for ATM 2024-01-05 08:04:23 -08:00
ax25 net: implement lockless SO_PRIORITY 2023-10-01 19:09:54 +01:00
batman-adv batman-adv: Switch to linux/array_size.h 2023-11-14 08:16:34 +01:00
bluetooth TTY/Serial changes for 6.8-rc1 2024-01-18 11:37:24 -08:00
bpf bpf: Fix dtor CFI 2023-12-15 16:25:55 -08:00
bridge netfilter: bridge: replace physindev with physinif in nf_bridge_info 2024-01-17 12:02:49 +01:00
caif net: fill in MODULE_DESCRIPTION()s for CAIF 2024-01-05 08:06:35 -08:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-10-12 17:07:34 -07:00
ceph This update includes the following changes: 2023-11-02 16:15:30 -10:00
core net: fix removing a namespace with conflicting altnames 2024-01-22 01:09:30 +00:00
dcb
dccp net: remove SOCK_DEBUG leftovers 2023-12-26 20:31:01 +00:00
devlink devlink: extend multicast filtering by port index 2023-12-19 15:31:40 +01:00
dns_resolver Networking changes for 6.8. 2024-01-11 10:07:29 -08:00
dsa net: dsa: fix netdev_priv() dereference before check on non-DSA netdevice events 2024-01-11 16:33:52 -08:00
ethernet
ethtool ethtool: netlink: Add missing ethnl_ops_begin/complete 2024-01-18 13:21:06 +01:00
handshake Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-10-26 13:46:28 -07:00
hsr net: fill in MODULE_DESCRIPTION()s for HSR 2024-01-11 16:16:08 -08:00
ieee802154 mac802154: Avoid new associations while disassociating 2023-12-15 11:14:57 +01:00
ife net: sched: ife: fix potential use-after-free 2023-12-15 10:50:18 +00:00
ipv4 tcp: Add memory barrier to tcp_push() 2024-01-23 10:22:31 +01:00
ipv6 ipv6: init the accept_queue's spinlocks in inet6_create 2024-01-23 13:44:50 +01:00
iucv iucv: make iucv_bus const 2023-12-29 07:46:38 +00:00
kcm net: kcm: fix direct access to bv_len 2024-01-03 18:37:22 -08:00
key
l2tp ipv6: annotate data-races around np->ucast_oif 2023-12-11 10:59:17 +00:00
l3mdev
lapb
llc llc: Drop support for ETH_P_TR_802_2. 2024-01-19 21:30:09 -08:00
mac80211 wireless fixes for v6.8-rc2 2024-01-23 08:38:13 -08:00
mac802154 mac802154: Avoid new associations while disassociating 2023-12-15 11:14:57 +01:00
mctp mctp: perform route lookups under a RCU read-side lock 2023-10-10 19:43:22 -07:00
mpls
mptcp mptcp: relax check on MPC passive fallback 2024-01-17 10:55:54 +00:00
ncsi net/ncsi: Add NC-SI 1.2 Get MC MAC Address command 2023-11-18 15:00:51 +00:00
netfilter ipvs: avoid stat macros calls from preemptible context 2024-01-17 12:02:51 +01:00
netlabel calipso: fix memory leak in netlbl_calipso_add_pass() 2023-12-07 14:23:12 -05:00
netlink netlink: fix potential sleeping issue in mqueue_flush_file 2024-01-23 11:21:18 +01:00
netrom net: implement lockless SO_PRIORITY 2023-10-01 19:09:54 +01:00
nfc net: fill in MODULE_DESCRIPTION()s for NFC 2024-01-11 16:16:08 -08:00
nsh
openvswitch net/sched: act_ct: Always fill offloading tuple iifidx 2023-11-08 17:47:08 -08:00
packet net: fill in MODULE_DESCRIPTION() for AF_PACKET 2024-01-05 08:06:35 -08:00
phonet
psample genetlink: Use internal flags for multicast groups 2023-12-29 08:43:59 +00:00
qrtr net: qrtr: ns: Return 0 if server port is not present 2024-01-01 18:41:29 +00:00
rds net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv 2024-01-22 11:24:00 +00:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-12-21 22:17:23 +01:00
rose net/rose: fix races in rose_kill_by_device() 2023-12-15 11:59:53 +00:00
rxrpc rxrpc: Fix use of Don't Fragment flag 2024-01-11 16:41:41 -08:00
sched net/sched: flower: Fix chain template offload 2024-01-24 01:33:59 +00:00
sctp sctp: fix busy polling 2024-01-04 10:29:18 +00:00
smc net/smc: fix illegal rmb_desc access in SMC-D connection dump 2024-01-19 12:04:17 +00:00
strparser
sunrpc net: fill in MODULE_DESCRIPTION()s for Sun RPC 2024-01-11 16:16:08 -08:00
switchdev
tipc tipc: Remove some excess struct member documentation 2023-12-22 23:14:43 +00:00
tls net: tls, fix WARNIING in __sk_msg_free 2024-01-14 12:17:14 +00:00
unix for-6.8/io_uring-2024-01-08 2024-01-11 14:19:23 -08:00
vmw_vsock vsock/virtio: use skb_frag_*() helpers 2024-01-03 18:37:16 -08:00
wireless wireless fixes for v6.8-rc2 2024-01-23 08:38:13 -08:00
x25 net: remove SOCK_DEBUG leftovers 2023-12-26 20:31:01 +00:00
xdp netdev 2023-12-18 16:46:08 -08:00
xfrm bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc 2023-12-14 17:12:49 -08:00
compat.c file: stop exposing receive_fd_user() 2023-12-12 14:24:14 +01:00
devres.c
Kconfig bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
Kconfig.debug
Makefile bpfilter: remove bpfilter 2024-01-04 10:23:10 -08:00
socket.c vfs-6.8.iov_iter 2024-01-08 11:43:04 -08:00
sysctl_net.c