linux/net/sched
Vladimir Oltean db46e3a88a net/sched: taprio: avoid disabling offload when it was never enabled
In an incredibly strange API design decision, qdisc->destroy() gets
called even if qdisc->init() never succeeded, not exclusively since
commit 87b60cfacf ("net_sched: fix error recovery at qdisc creation"),
but apparently also earlier (in the case of qdisc_create_dflt()).

The taprio qdisc does not fully acknowledge this when it attempts full
offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
parsed from netlink (in taprio_change(), tail called from taprio_init()).

But in taprio_destroy(), we call taprio_disable_offload(), and this
determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).

But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
(a bitwise check of bit 1 in q->flags), it is invalid to call this macro
on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
an invalid set of flags.

As a result, it is possible to crash the kernel if user space forces an
error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
of taprio_enable_offload(). This is because drivers do not expect the
offload to be disabled when it was never enabled.

The error that we force here is to attach taprio as a non-root qdisc,
but instead as child of an mqprio root qdisc:

$ tc qdisc add dev swp0 root handle 1: \
	mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:1 \
	taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
	sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
Unable to handle kernel paging request at virtual address fffffffffffffff8
[fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Call trace:
 taprio_dump+0x27c/0x310
 vsc9959_port_setup_tc+0x1f4/0x460
 felix_port_setup_tc+0x24/0x3c
 dsa_slave_setup_tc+0x54/0x27c
 taprio_disable_offload.isra.0+0x58/0xe0
 taprio_destroy+0x80/0x104
 qdisc_create+0x240/0x470
 tc_modify_qdisc+0x1fc/0x6b0
 rtnetlink_rcv_msg+0x12c/0x390
 netlink_rcv_skb+0x5c/0x130
 rtnetlink_rcv+0x1c/0x2c

Fix this by keeping track of the operations we made, and undo the
offload only if we actually did it.

I've added "bool offloaded" inside a 4 byte hole between "int clockid"
and "atomic64_t picos_per_byte". Now the first cache line looks like
below:

$ pahole -C taprio_sched net/sched/sch_taprio.o
struct taprio_sched {
        struct Qdisc * *           qdiscs;               /*     0     8 */
        struct Qdisc *             root;                 /*     8     8 */
        u32                        flags;                /*    16     4 */
        enum tk_offsets            tk_offset;            /*    20     4 */
        int                        clockid;              /*    24     4 */
        bool                       offloaded;            /*    28     1 */

        /* XXX 3 bytes hole, try to pack */

        atomic64_t                 picos_per_byte;       /*    32     0 */

        /* XXX 8 bytes hole, try to pack */

        spinlock_t                 current_entry_lock;   /*    40     0 */

        /* XXX 8 bytes hole, try to pack */

        struct sched_entry *       current_entry;        /*    48     8 */
        struct sched_gate_list *   oper_sched;           /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */

Fixes: 9c66d15646 ("taprio: Add support for hardware offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 11:41:14 -07:00
..
act_api.c net/sched: act_api: Notify user space if any actions were flushed before error 2022-06-27 21:51:23 -07:00
act_bpf.c bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress 2022-03-03 14:38:48 +00:00
act_connmark.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_csum.c net/sched: act_api: Add extack to offload_act_setup() callback 2022-04-08 13:45:43 +01:00
act_ct.c net/sched: act_ct: set 'net' pointer when creating new nf_flow_table 2022-07-11 16:25:14 +02:00
act_ctinfo.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_gact.c net/sched: act_gact: Add extack messages for offload failure 2022-04-08 13:45:43 +01:00
act_gate.c net/sched: act_api: Add extack to offload_act_setup() callback 2022-04-08 13:45:43 +01:00
act_ife.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_ipt.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_meta_mark.c
act_meta_skbprio.c
act_meta_skbtcindex.c
act_mirred.c net: rename reference+tracking helpers 2022-06-09 21:52:55 -07:00
act_mpls.c net/sched: act_mpls: Add extack messages for offload failure 2022-04-08 13:45:43 +01:00
act_nat.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_pedit.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-05-19 11:23:59 -07:00
act_police.c net/sched: act_police: allow 'continue' action offload 2022-07-06 12:44:39 +01:00
act_sample.c net/sched: act_api: Add extack to offload_act_setup() callback 2022-04-08 13:45:43 +01:00
act_simple.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_skbedit.c net: sched: support hash selecting tx queue 2022-04-19 12:20:45 +02:00
act_skbmod.c flow_offload: fill flags to action structure 2021-12-19 14:08:47 +00:00
act_tunnel_key.c net/sched: act_tunnel_key: Add extack message for offload failure 2022-04-08 13:45:43 +01:00
act_vlan.c net/sched: act_vlan: Add extack message for offload failure 2022-04-08 13:45:43 +01:00
cls_api.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-07-21 13:03:39 -07:00
cls_basic.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_bpf.c bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress 2022-03-03 14:38:48 +00:00
cls_cgroup.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_flow.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_flower.c net/sched: flower: Add PPPoE filter 2022-07-26 10:20:29 -07:00
cls_fw.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_matchall.c net/sched: matchall: Avoid overwriting error messages 2022-04-08 13:45:43 +01:00
cls_route.c net_sched: cls_route: disallow handle of 0 2022-08-15 11:46:30 +01:00
cls_rsvp6.c
cls_rsvp.c
cls_rsvp.h net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_tcindex.c net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
cls_u32.c net/sched: cls_u32: fix possible leak in u32_init_knode() 2022-04-15 14:26:11 -07:00
em_canid.c net: sched: kerneldoc fixes 2020-07-13 17:20:40 -07:00
em_cmp.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
em_ipset.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_ipt.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_meta.c net_sched: em_meta: add READ_ONCE() in var_sk_bound_if() 2022-05-16 10:31:06 +01:00
em_nbyte.c net: sched: Return the correct errno code 2021-02-06 11:15:28 -08:00
em_text.c
em_u32.c
ematch.c net: sched: Fix spelling mistakes 2021-05-31 22:44:56 -07:00
Kconfig net: sched: incorrect Kconfig dependencies on Netfilter modules 2020-12-09 15:49:29 -08:00
Makefile net/sched: sch_frag: add generic packet fragment support. 2020-11-27 14:36:02 -08:00
sch_api.c net: rename reference+tracking helpers 2022-06-09 21:52:55 -07:00
sch_atm.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_blackhole.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_cake.c Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb" 2022-08-31 20:02:28 -07:00
sch_cbq.c net/sched: sch_cbq: change the type of cbq_set_lss to void 2022-07-27 18:30:18 -07:00
sch_cbs.c net: don't include ethtool.h from netdevice.h 2020-11-23 17:27:04 -08:00
sch_choke.c net: sched: validate stab values 2021-03-10 15:47:52 -08:00
sch_codel.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_drr.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_dsmark.c net/sched: store the last executed chain also for clsact egress 2021-07-29 22:17:37 +01:00
sch_etf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_ets.c net/sched: sch_ets: don't remove idle classes from the round-robin list 2021-12-13 12:30:23 +00:00
sch_fifo.c net_sched: fix NULL deref in fifo_set_limit() 2021-10-01 14:59:10 -07:00
sch_fq_codel.c fq_codel: generalise ce_threshold marking for subset of traffic 2021-10-20 15:24:36 -07:00
sch_fq_pie.c net/sched: fq_pie: prevent dismantle issue 2021-12-09 08:01:00 -08:00
sch_fq.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_frag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-12-31 14:35:40 +00:00
sch_generic.c net/sched: fix netdevice reference leaks in attach_default_qdiscs() 2022-08-30 15:10:08 +02:00
sch_gred.c net: sched: gred: dynamically allocate tc_gred_qopt_offload 2021-10-27 12:06:52 -07:00
sch_hfsc.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_hhf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_htb.c sch_htb: Fail on unsupported parameters when offload is requested 2022-01-25 20:00:02 -08:00
sch_ingress.c
sch_mq.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_mqprio.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_multiq.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_netem.c net/sched: sch_netem: Fix arithmetic in netem_dump() for 32-bit platforms 2022-06-17 20:29:38 -07:00
sch_pie.c net: sched: fix misspellings using misspell-fixer tool 2020-11-10 17:00:28 -08:00
sch_plug.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_prio.c net: sched: Remove Qdisc::running sequence counter 2021-10-18 12:54:41 +01:00
sch_qfq.c sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc 2022-01-04 12:36:51 +00:00
sch_red.c net: sched: validate stab values 2021-03-10 15:47:52 -08:00
sch_sfb.c sch_sfb: Also store skb len before calling child enqueue 2022-09-08 11:12:58 +02:00
sch_sfq.c net/sched: store the last executed chain also for clsact egress 2021-07-29 22:17:37 +01:00
sch_skbprio.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_taprio.c net/sched: taprio: avoid disabling offload when it was never enabled 2022-09-20 11:41:14 -07:00
sch_tbf.c net: sched: tbf: don't call qdisc_put() while holding tree lock 2022-08-30 11:41:24 +02:00
sch_teql.c net: sched: sch_teql: fix null-pointer dereference 2021-04-08 14:14:42 -07:00