Commit Graph

1307084 Commits

Author SHA1 Message Date
Tejun Heo
a4af89cc50 sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
ops.cpu_acquire() is currently called with 0 kf_maks which is interpreted as
SCX_KF_UNLOCKED which allows all unlocked kfuncs, but ops.cpu_acquire() is
called from balance_one() under the rq lock and should only be allowed call
kfuncs that are safe under the rq lock. Update it to use SCX_KF_REST.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Zhao Mengmeng <zhaomzhao@126.com>
Link: http://lkml.kernel.org/r/ZzYvf2L3rlmjuKzh@slm.duckdns.org
Fixes: 245254f708 ("sched_ext: Implement sched_ext_ops.cpu_acquire/release()")
2024-11-14 08:50:58 -10:00
Tejun Heo
a6250aa251 sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx()
sched_ext dispatches tasks from the BPF scheduler from balance_scx() and
thus every pick_task_scx() call must be preceded by balance_scx(). While
this usually holds, due to a bug, there are cases where the fair class's
balance() returns true indicating that it has tasks to run on the CPU and
thus terminating balance() calls but fails to actually find the next task to
run when pick_task() is called. In such cases, pick_task_scx() can be called
without preceding balance_scx().

Detect this condition using SCX_RQ_BAL_PENDING flags. If detected, keep
running the previous task if possible and avoid stalling from entering idle
without balancing.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/Ztj_h5c2LYsdXYbA@slm.duckdns.org
2024-11-09 10:43:55 -10:00
Tejun Heo
a759bf0dfc sched_ext: Update scx_show_state.py to match scx_ops_bypass_depth's new type
0e7ffff1b8 ("scx: Fix raciness in scx_ops_bypass()") converted
scx_ops_bypass_depth from an atomic to an int. Update scx_show_state.py
accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 0e7ffff1b8 ("scx: Fix raciness in scx_ops_bypass()")
2024-11-05 11:45:27 -10:00
Tejun Heo
f7d1b585e1 sched_ext: Add a missing newline at the end of an error message
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-11-05 11:45:24 -10:00
Tejun Heo
c31f2ee5cd sched_ext: Fix enq_last_no_enq_fails selftest
cc9877fb76 ("sched_ext: Improve error reporting during loading") changed
how load failures are reported so that more error context can be
communicated. This breaks the enq_last_no_enq_fails test as attach no longer
fails. The scheduler is guaranteed to be ejected on attach completion with
full error information. Update enq_last_no_enq_fails so that it checks that
the scheduler is ejected using ops.exit().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vishal Chourasia <vishalc@linux.ibm.com>
Link: http://lkml.kernel.org/r/Zxknp7RAVNjmdJSc@linux.ibm.com
Fixes: cc9877fb76 ("sched_ext: Improve error reporting during loading")
2024-10-25 12:20:29 -10:00
Tejun Heo
7724abf0ca sched_ext: Make cast_mask() inline
cast_mask() doesn't do any actual work and is defined in a header file.
Force it to be inline. When it is not inlined and the function is not used,
it can cause verificaiton failures like the following:

  # tools/testing/selftests/sched_ext/runner -t minimal
  ===== START =====
  TEST: minimal
  DESCRIPTION: Verify we can load a fully minimal scheduler
  OUTPUT:
  libbpf: prog 'cast_mask': missing BPF prog type, check ELF section name '.text'
  libbpf: prog 'cast_mask': failed to load: -22
  libbpf: failed to load object 'minimal'
  libbpf: failed to load BPF skeleton 'minimal': -22
  ERR: minimal.c:20
  Failed to open and load skel
  not ok 1 minimal #
  =====  END  =====

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: a748db0c8c ("tools/sched_ext: Receive misc updates from SCX repo")
2024-10-25 12:19:44 -10:00
David Vernet
0e7ffff1b8 scx: Fix raciness in scx_ops_bypass()
scx_ops_bypass() can currently race on the ops enable / disable path as
follows:

1. scx_ops_bypass(true) called on enable path, bypass depth is set to 1
2. An op on the init path exits, which schedules scx_ops_disable_workfn()
3. scx_ops_bypass(false) is called on the disable path, and bypass depth
   is decremented to 0
4. kthread is scheduled to execute scx_ops_disable_workfn()
5. scx_ops_bypass(true) called, bypass depth set to 1
6. scx_ops_bypass() races when iterating over CPUs

While it's not safe to take any blocking locks on the bypass path, it is
safe to take a raw spinlock which cannot be preempted. This patch therefore
updates scx_ops_bypass() to use a raw spinlock to synchronize, and changes
scx_ops_bypass_depth to be a regular int.

Without this change, we observe the following warnings when running the
'exit' sched_ext selftest (sometimes requires a couple of runs):

.[root@virtme-ng sched_ext]# ./runner -t exit
===== START =====
TEST: exit
...
[   14.935078] WARNING: CPU: 2 PID: 360 at kernel/sched/ext.c:4332 scx_ops_bypass+0x1ca/0x280
[   14.935126] Modules linked in:
[   14.935150] CPU: 2 UID: 0 PID: 360 Comm: sched_ext_ops_h Not tainted 6.11.0-virtme #24
[   14.935192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[   14.935242] Sched_ext: exit (enabling+all)
[   14.935244] RIP: 0010:scx_ops_bypass+0x1ca/0x280
[   14.935300] Code: ff ff ff e8 48 96 10 00 fb e9 08 ff ff ff c6 05 7b 34 e8 01 01 90 48 c7 c7 89 86 88 87 e8 be 1d f8 ff 90 0f 0b 90 90 eb 95 90 <0f> 0b 90 41 8b 84 24 24 0a 00 00 eb 97 90 0f 0b 90 41 8b 84 24 24
[   14.935394] RSP: 0018:ffffb706c0957ce0 EFLAGS: 00010002
[   14.935424] RAX: 0000000000000009 RBX: 0000000000000001 RCX: 00000000e3fb8b2a
[   14.935465] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff88a4c080
[   14.935512] RBP: 0000000000009b56 R08: 0000000000000004 R09: 00000003f12e520a
[   14.935555] R10: ffffffff863a9795 R11: 0000000000000000 R12: ffff8fc5fec31300
[   14.935598] R13: ffff8fc5fec31318 R14: 0000000000000286 R15: 0000000000000018
[   14.935642] FS:  0000000000000000(0000) GS:ffff8fc5fe680000(0000) knlGS:0000000000000000
[   14.935684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.935721] CR2: 0000557d92890b88 CR3: 000000002464a000 CR4: 0000000000750ef0
[   14.935765] PKRU: 55555554
[   14.935782] Call Trace:
[   14.935802]  <TASK>
[   14.935823]  ? __warn+0xce/0x220
[   14.935850]  ? scx_ops_bypass+0x1ca/0x280
[   14.935881]  ? report_bug+0xc1/0x160
[   14.935909]  ? handle_bug+0x61/0x90
[   14.935934]  ? exc_invalid_op+0x1a/0x50
[   14.935959]  ? asm_exc_invalid_op+0x1a/0x20
[   14.935984]  ? raw_spin_rq_lock_nested+0x15/0x30
[   14.936019]  ? scx_ops_bypass+0x1ca/0x280
[   14.936046]  ? srso_alias_return_thunk+0x5/0xfbef5
[   14.936081]  ? __pfx_scx_ops_disable_workfn+0x10/0x10
[   14.936111]  scx_ops_disable_workfn+0x146/0xac0
[   14.936142]  ? finish_task_switch+0xa9/0x2c0
[   14.936172]  ? srso_alias_return_thunk+0x5/0xfbef5
[   14.936211]  ? __pfx_scx_ops_disable_workfn+0x10/0x10
[   14.936244]  kthread_worker_fn+0x101/0x2c0
[   14.936268]  ? __pfx_kthread_worker_fn+0x10/0x10
[   14.936299]  kthread+0xec/0x110
[   14.936327]  ? __pfx_kthread+0x10/0x10
[   14.936351]  ret_from_fork+0x37/0x50
[   14.936374]  ? __pfx_kthread+0x10/0x10
[   14.936400]  ret_from_fork_asm+0x1a/0x30
[   14.936427]  </TASK>
[   14.936443] irq event stamp: 21002
[   14.936467] hardirqs last  enabled at (21001): [<ffffffff863aa35f>] resched_cpu+0x9f/0xd0
[   14.936521] hardirqs last disabled at (21002): [<ffffffff863dd0ba>] scx_ops_bypass+0x11a/0x280
[   14.936571] softirqs last  enabled at (20642): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0
[   14.936622] softirqs last disabled at (20637): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0
[   14.936672] ---[ end trace 0000000000000000 ]---
[   14.953282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   14.953352] ------------[ cut here ]------------
[   14.953383] WARNING: CPU: 2 PID: 360 at kernel/sched/ext.c:4335 scx_ops_bypass+0x1d8/0x280
[   14.953428] Modules linked in:
[   14.953453] CPU: 2 UID: 0 PID: 360 Comm: sched_ext_ops_h Tainted: G        W          6.11.0-virtme #24
[   14.953505] Tainted: [W]=WARN
[   14.953527] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[   14.953574] RIP: 0010:scx_ops_bypass+0x1d8/0x280
[   14.953603] Code: c6 05 7b 34 e8 01 01 90 48 c7 c7 89 86 88 87 e8 be 1d f8 ff 90 0f 0b 90 90 eb 95 90 0f 0b 90 41 8b 84 24 24 0a 00 00 eb 97 90 <0f> 0b 90 41 8b 84 24 24 0a 00 00 eb 92 f3 0f 1e fa 49 8d 84 24 f0
[   14.953693] RSP: 0018:ffffb706c0957ce0 EFLAGS: 00010046
[   14.953722] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
[   14.953763] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8fc5fec31318
[   14.953804] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   14.953845] R10: ffffffff863a9795 R11: 0000000000000000 R12: ffff8fc5fec31300
[   14.953888] R13: ffff8fc5fec31318 R14: 0000000000000286 R15: 0000000000000018
[   14.953934] FS:  0000000000000000(0000) GS:ffff8fc5fe680000(0000) knlGS:0000000000000000
[   14.953974] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.954009] CR2: 0000557d92890b88 CR3: 000000002464a000 CR4: 0000000000750ef0
[   14.954052] PKRU: 55555554
[   14.954068] Call Trace:
[   14.954085]  <TASK>
[   14.954102]  ? __warn+0xce/0x220
[   14.954126]  ? scx_ops_bypass+0x1d8/0x280
[   14.954150]  ? report_bug+0xc1/0x160
[   14.954178]  ? handle_bug+0x61/0x90
[   14.954203]  ? exc_invalid_op+0x1a/0x50
[   14.954226]  ? asm_exc_invalid_op+0x1a/0x20
[   14.954250]  ? raw_spin_rq_lock_nested+0x15/0x30
[   14.954285]  ? scx_ops_bypass+0x1d8/0x280
[   14.954311]  ? __mutex_unlock_slowpath+0x3a/0x260
[   14.954343]  scx_ops_disable_workfn+0xa3e/0xac0
[   14.954381]  ? __pfx_scx_ops_disable_workfn+0x10/0x10
[   14.954413]  kthread_worker_fn+0x101/0x2c0
[   14.954442]  ? __pfx_kthread_worker_fn+0x10/0x10
[   14.954479]  kthread+0xec/0x110
[   14.954507]  ? __pfx_kthread+0x10/0x10
[   14.954530]  ret_from_fork+0x37/0x50
[   14.954553]  ? __pfx_kthread+0x10/0x10
[   14.954576]  ret_from_fork_asm+0x1a/0x30
[   14.954603]  </TASK>
[   14.954621] irq event stamp: 21002
[   14.954644] hardirqs last  enabled at (21001): [<ffffffff863aa35f>] resched_cpu+0x9f/0xd0
[   14.954686] hardirqs last disabled at (21002): [<ffffffff863dd0ba>] scx_ops_bypass+0x11a/0x280
[   14.954735] softirqs last  enabled at (20642): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0
[   14.954782] softirqs last disabled at (20637): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0
[   14.954829] ---[ end trace 0000000000000000 ]---
[   15.022283] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   15.092282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   15.149282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
ok 1 exit #
=====  END  =====

And with it, the test passes without issue after 1000s of runs:

.[root@virtme-ng sched_ext]# ./runner -t exit
===== START =====
TEST: exit
DESCRIPTION: Verify we can cleanly exit a scheduler in multiple places
OUTPUT:
[    7.412856] sched_ext: BPF scheduler "exit" enabled
[    7.427924] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.466677] sched_ext: BPF scheduler "exit" enabled
[    7.475923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.512803] sched_ext: BPF scheduler "exit" enabled
[    7.532924] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.586809] sched_ext: BPF scheduler "exit" enabled
[    7.595926] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.661923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.723923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
ok 1 exit #
=====  END  =====

=============================

RESULTS:

PASSED:  1
SKIPPED: 0
FAILED:  0

Fixes: f0e1a0643a ("sched_ext: Implement BPF extensible scheduler class")
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-25 11:10:51 -10:00
David Vernet
895669fd0d scx: Fix exit selftest to use custom DSQ
In commit 63fb3ec805 ("sched_ext: Allow only user DSQs for
scx_bpf_consume(), scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new()"), we
updated the consume path to only accept user DSQs, thus making it invalid
to consume SCX_DSQ_GLOBAL. This selftest was doing that, so let's create a
custom DSQ and use that instead.  The test now passes:

[root@virtme-ng sched_ext]# ./runner -t exit
===== START =====
TEST: exit
DESCRIPTION: Verify we can cleanly exit a scheduler in multiple places
OUTPUT:
[   12.387229] sched_ext: BPF scheduler "exit" enabled
[   12.406064] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   12.453325] sched_ext: BPF scheduler "exit" enabled
[   12.474064] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   12.515241] sched_ext: BPF scheduler "exit" enabled
[   12.532064] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   12.592063] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   12.654063] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   12.715062] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
ok 1 exit #
=====  END  =====

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-25 07:03:36 -10:00
Vishal Chourasia
4f7f417042 sched_ext: Fix function pointer type mismatches in BPF selftests
Fix incompatible function pointer type warnings in sched_ext BPF selftests by
explicitly casting the function pointers when initializing struct_ops.
This addresses multiple -Wincompatible-function-pointer-types warnings from the
clang compiler where function signatures didn't match exactly.

The void * cast ensures the compiler accepts the function pointer
assignment despite minor type differences in the parameters.

Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-24 06:56:17 -10:00
Ihor Solodrai
9b3c11a867 selftests/sched_ext: add order-only dependency of runner.o on BPFOBJ
The runner.o may start building before libbpf headers are installed,
and as a result build fails. This happened a couple of times on
libbpf/ci test jobs:
  * https://github.com/libbpf/ci/actions/runs/11447667257/job/31849533100
  * https://github.com/theihor/libbpf-ci/actions/runs/11445162764/job/31841649552

Headers are installed in a recipe for $(BPFOBJ) target, and adding an
order-only dependency should ensure this doesn't happen.

Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-23 08:56:32 -10:00
David Vernet
60e339be10 sched_ext: Remove unnecessary cpu_relax()
As described in commit b07996c7ab ("sched_ext: Don't hold
scx_tasks_lock for too long"), we're doing a cond_resched() every 32
calls to scx_task_iter_next() to avoid RCU and other stalls. That commit
also added a cpu_relax() to the codepath where we drop and reacquire the
lock, but as Waiman described in [0], cpu_relax() should only be
necessary in busy loops to avoid pounding on a cacheline (or to allow a
hypertwin to more fully utilize a core).

Let's remove the unnecessary cpu_relax().

[0]: https://lore.kernel.org/all/35b3889b-904a-4d26-981f-c8aa1557a7c7@redhat.com/

Cc: Waiman Long <llong@redhat.com>
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-14 13:23:49 -10:00
Tejun Heo
b07996c7ab sched_ext: Don't hold scx_tasks_lock for too long
While enabling and disabling a BPF scheduler, every task is iterated a
couple times by walking scx_tasks. Except for one, all iterations keep
holding scx_tasks_lock. On multi-socket systems under heavy rq lock
contention and high number of threads, this can can lead to RCU and other
stalls.

The following is triggered on a 2 x AMD EPYC 7642 system (192 logical CPUs)
running `stress-ng --workload 150 --workload-threads 10` with >400k idle
threads and RCU stall period reduced to 5s:

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:     91-...!: (10 ticks this GP) idle=0754/1/0x4000000000000000 softirq=18204/18206 fqs=17
  rcu:     186-...!: (17 ticks this GP) idle=ec54/1/0x4000000000000000 softirq=25863/25866 fqs=17
  rcu:     (detected by 80, t=10042 jiffies, g=89305, q=33 ncpus=192)
  Sending NMI from CPU 80 to CPUs 91:
  NMI backtrace for cpu 91
  CPU: 91 UID: 0 PID: 284038 Comm: sched_ext_ops_h Kdump: loaded Not tainted 6.12.0-rc2-work-g6bf5681f7ee2-dirty #471
  Hardware name: Supermicro Super Server/H11DSi, BIOS 2.8 12/14/2023
  Sched_ext: simple (disabling+all)
  RIP: 0010:queued_spin_lock_slowpath+0x17b/0x2f0
  Code: 02 c0 10 03 00 83 79 08 00 75 08 f3 90 83 79 08 00 74 f8 48 8b 11 48 85 d2 74 09 0f 0d 0a eb 0a 31 d2 eb 06 31 d2 eb 02 f3 90 <8b> 07 66 85 c0 75 f7 39 d8 75 0d be 01 00 00 00 89 d8 f0 0f b1 37
  RSP: 0018:ffffc9000fadfcb8 EFLAGS: 00000002
  RAX: 0000000001700001 RBX: 0000000001700000 RCX: ffff88bfcaaf10c0
  RDX: 0000000000000000 RSI: 0000000000000101 RDI: ffff88bfca8f0080
  RBP: 0000000001700000 R08: 0000000000000090 R09: ffffffffffffffff
  R10: ffff88a74761b268 R11: 0000000000000000 R12: ffff88a6b6765460
  R13: ffffc9000fadfd60 R14: ffff88bfca8f0080 R15: ffff88bfcaac0000
  FS:  0000000000000000(0000) GS:ffff88bfcaac0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f5c55f526a0 CR3: 0000000afd474000 CR4: 0000000000350eb0
  Call Trace:
   <NMI>
   </NMI>
   <TASK>
   do_raw_spin_lock+0x9c/0xb0
   task_rq_lock+0x50/0x190
   scx_task_iter_next_locked+0x157/0x170
   scx_ops_disable_workfn+0x2c2/0xbf0
   kthread_worker_fn+0x108/0x2a0
   kthread+0xeb/0x110
   ret_from_fork+0x36/0x40
   ret_from_fork_asm+0x1a/0x30
   </TASK>
  Sending NMI from CPU 80 to CPUs 186:
  NMI backtrace for cpu 186
  CPU: 186 UID: 0 PID: 51248 Comm: fish Kdump: loaded Not tainted 6.12.0-rc2-work-g6bf5681f7ee2-dirty #471

scx_task_iter can safely drop locks while iterating. Make
scx_task_iter_next() drop scx_tasks_lock every 32 iterations to avoid
stalls.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-10-10 11:41:44 -10:00
Tejun Heo
967da57832 sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers
Iterating with scx_task_iter involves scx_tasks_lock and optionally the rq
lock of the task being iterated. Both locks can be released during iteration
and the iteration can be continued after re-grabbing scx_tasks_lock.
Currently, all lock handling is pushed to the caller which is a bit
cumbersome and makes it difficult to add lock-aware behaviors. Make the
scx_task_iter helpers handle scx_tasks_lock.

- scx_task_iter_init/scx_taks_iter_exit() now grabs and releases
  scx_task_lock, respectively. Renamed to
  scx_task_iter_start/scx_task_iter_stop() to more clearly indicate that
  there are non-trivial side-effects.

- Add __ prefix to scx_task_iter_rq_unlock() to indicate that the function
  is internal.

- Add scx_task_iter_unlock/relock(). The former drops both rq lock (if held)
  and scx_tasks_lock and the latter re-locks only scx_tasks_lock.

This doesn't cause behavior changes and will be used to implement stall
avoidance.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-10-10 11:41:44 -10:00
Tejun Heo
aebe7ae4cb sched_ext: bypass mode shouldn't depend on ops.select_cpu()
Bypass mode was depending on ops.select_cpu() which can't be trusted as with
the rest of the BPF scheduler. Always enable and use scx_select_cpu_dfl() in
bypass mode.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-10-10 11:41:44 -10:00
Tejun Heo
cc3e1caca9 sched_ext: Move scx_buildin_idle_enabled check to scx_bpf_select_cpu_dfl()
Move the sanity check from the inner function scx_select_cpu_dfl() to the
exported kfunc scx_bpf_select_cpu_dfl(). This doesn't cause behavior
differences and will allow using scx_select_cpu_dfl() in bypass mode
regardless of scx_builtin_idle_enabled.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-10 11:41:44 -10:00
Tejun Heo
3fdb9ebcec sched_ext: Start schedulers with consistent p->scx.slice values
The disable path caps p->scx.slice to SCX_SLICE_DFL. As the field is already
being ignored at this stage during disable, the only effect this has is that
when the next BPF scheduler is loaded, it won't see unreasonable left-over
slices. Ultimately, this shouldn't matter but it's better to start in a
known state. Drop p->scx.slice capping from the disable path and instead
reset it to SCX_SLICE_DFL in the enable path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-10-10 11:41:44 -10:00
Tejun Heo
54baa7ac0c Revert "sched_ext: Use shorter slice while bypassing"
This reverts commit 6f34d8d382.

Slice length is ignored while bypassing and tasks are switched on every tick
and thus the patch does not make any difference. The perceived difference
was from test noise.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-10-10 11:41:44 -10:00
Honglei Wang
c425180d88 sched_ext: use correct function name in pick_task_scx() warning message
pick_next_task_scx() was turned into pick_task_scx() since
commit 753e2836d1 ("sched_ext: Unify regular and core-sched pick
task paths"). Update the outdated message.

Signed-off-by: Honglei Wang <jameshongleiwang@126.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-10 06:35:28 -10:00
Björn Töpel
7941b83bce selftests: sched_ext: Add sched_ext as proper selftest target
The sched_ext selftests is missing proper cross-compilation support, a
proper target entry, and out-of-tree build support.

When building the kselftest suite, e.g.:

  make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-  \
    TARGETS=sched_ext SKIP_TARGETS="" O=/output/foo \
    -C tools/testing/selftests install

or:

  make ARCH=arm64 LLVM=1 TARGETS=sched_ext SKIP_TARGETS="" \
    O=/output/foo -C tools/testing/selftests install

The expectation is that the sched_ext is included, cross-built, the
correct toolchain is picked up, and placed into /output/foo.

In contrast to the BPF selftests, the sched_ext suite does not use
bpftool at test run-time, so it is sufficient to build bpftool for the
build host only.

Add ARCH, CROSS_COMPILE, OUTPUT, and TARGETS support to the sched_ext
selftest. Also, remove some variables that were unused by the
Makefile.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: David Vernet <void@manifault.com>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-09 06:42:51 -10:00
Devaansh-Kumar
e0ed52154e sched_ext: Documentation: Update instructions for running example schedulers
Since the artifact paths for tools changed, we need to update the documentation to reflect that path.

Signed-off-by: Devaansh-Kumar <devaanshk840@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-08 08:49:18 -10:00
Tejun Heo
9b671793c7 sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
tell whether ops.select_cpu() was called. This is incorrect as
ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
incorrectly skipping direct dispatch for tasks that are bound to a single
CPU.

sched core has been updated to specify ENQUEUE_RQ_SELECTED if
->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
scx_qmap to test it instead of SCX_ENQ_WAKEUP.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
2024-10-07 10:16:18 -10:00
Tejun Heo
f207dc2dcd sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
migration is disabled. sched_ext schedulers may perform operations such as
direct dispatch from ->select_task_rq() path and it is useful for them to
know whether ->select_task_rq() was skipped in the ->enqueue_task() path.

Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
and end up assuming incorrectly that ->select_task_rq() was called for tasks
that are bound to a single CPU or migration disabled.

Make select_task_rq() indicate whether ->select_task_rq() was called by
setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
to ENQUEUE_RQ_SELECTED for ->enqueue_task().

This will be used by sched_ext to fix ->select_task_rq() skip detection.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-10-07 10:16:18 -10:00
Tejun Heo
b62933eee4 sched/core: Make select_task_rq() take the pointer to wake_flags instead of value
This will be used to allow select_task_rq() to indicate whether
->select_task_rq() was called by modifying *wake_flags.

This makes try_to_wake_up() call all functions that take wake_flags with
WF_TTWU set. Previously, only select_task_rq() was. Using the same flags is
more consistent, and, as the flag is only tested by ->select_task_rq()
implementations, it doesn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-10-07 10:16:18 -10:00
Tejun Heo
ec010333ce sched_ext: scx_cgroup_exit() may be called without successful scx_cgroup_init()
568894edbe ("sched_ext: Add scx_cgroup_enabled to gate cgroup operations
and fix scx_tg_online()") assumed that scx_cgroup_exit() is only called
after scx_cgroup_init() finished successfully. This isn't true.
scx_cgroup_exit() can be called without scx_cgroup_init() being called at
all or after scx_cgroup_init() failed in the middle.

As init state is tracked per cgroup, scx_cgroup_exit() can be used safely to
clean up in all cases. Remove the incorrect WARN_ON_ONCE().

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 568894edbe ("sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online()")
2024-10-04 10:12:11 -10:00
Tejun Heo
cc9877fb76 sched_ext: Improve error reporting during loading
When the BPF scheduler fails, ops.exit() allows rich error reporting through
scx_exit_info. Use scx.exit() path consistently for all failures which can
be caused by the BPF scheduler:

- scx_ops_error() is called after ops.init() and ops.cgroup_init() failure
  to record error information.

- ops.init_task() failure now uses scx_ops_error() instead of pr_err().

- The err_disable path updated to automatically trigger scx_ops_error() to
  cover cases that the error message hasn't already been generated and
  always return 0 indicating init success so that the error is reported
  through ops.exit().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
2024-10-04 10:11:43 -10:00
Vishal Chourasia
fcbc423577 sched_ext: Add __weak markers to BPF helper function decalarations
Fix build errors by adding __weak markers to BPF helper function
declarations in header files. This resolves static assertion failures
in scx_qmap.bpf.c and scx_flatcg.bpf.c where functions like
scx_bpf_dispatch_from_dsq_set_slice, scx_bpf_dispatch_from_dsq_set_vtime,
and scx_bpf_task_cgroup were missing the __weak attribute.

[1] https://lore.kernel.org/all/ZvvfUqRNM4-jYQzH@linux.ibm.com

Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-02 06:53:12 -10:00
Zhang Qiao
95b873693a sched_ext: Remove redundant p->nr_cpus_allowed checker
select_rq_task() already checked that 'p->nr_cpus_allowed > 1',
'p->nr_cpus_allowed == 1' checker in scx_select_cpu_dfl() is redundant.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:23:45 -10:00
Tejun Heo
efe231d9de sched_ext: Decouple locks in scx_ops_enable()
The enable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
cpus_read_lock. Currently, the locks are grabbed together which is prone to
locking order problems.

For example, currently, there is a possible deadlock involving
scx_fork_rwsem and cpus_read_lock. cpus_read_lock has to nest inside
scx_fork_rwsem due to locking order existing in other subsystems. However,
there exists a dependency in the other direction during hotplug if hotplug
needs to fork a new task, which happens in some cases. This leads to the
following deadlock:

       scx_ops_enable()                               hotplug

                                          percpu_down_write(&cpu_hotplug_lock)
   percpu_down_write(&scx_fork_rwsem)
   block on cpu_hotplug_lock
                                          kthread_create() waits for kthreadd
					  kthreadd blocks on scx_fork_rwsem

Note that this doesn't trigger lockdep because the hotplug side dependency
bounces through kthreadd.

With the preceding scx_cgroup_enabled change, this can be solved by
decoupling cpus_read_lock, which is needed for static_key manipulations,
from the other two locks.

- Move the first block of static_key manipulations outside of scx_fork_rwsem
  and scx_cgroup_rwsem. This is now safe with the preceding
  scx_cgroup_enabled change.

- Drop scx_cgroup_rwsem and scx_fork_rwsem between the two task iteration
  blocks so that __scx_ops_enabled static_key enabling is outside the two
  rwsems.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
2024-09-27 10:02:40 -10:00
Tejun Heo
160216568c sched_ext: Decouple locks in scx_ops_disable_workfn()
The disable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
cpus_read_lock. Currently, the locks are grabbed together which is prone to
locking order problems. With the preceding scx_cgroup_enabled change, we can
decouple them:

- As cgroup disabling no longer requires modifying a static_key which
  requires cpus_read_lock(), no need to grab cpus_read_lock() before
  grabbing scx_cgroup_rwsem.

- cgroup can now be independently disabled before tasks are moved back to
  the fair class.

Relocate scx_cgroup_exit() invocation before scx_fork_rwsem is grabbed, drop
now unnecessary cpus_read_lock() and move static_key operations out of
scx_fork_rwsem. This decouples all three locks in the disable path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
2024-09-27 10:02:40 -10:00
Tejun Heo
568894edbe sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online()
If the BPF scheduler does not implement ops.cgroup_init(), scx_tg_online()
didn't set SCX_TG_INITED which meant that ops.cgroup_exit(), even if
implemented, won't be called from scx_tg_offline(). This is because
SCX_HAS_OP(cgroupt_init) is used to test both whether SCX cgroup operations
are enabled and ops.cgroup_init() exists.

Fix it by introducing a separate bool scx_cgroup_enabled to gate cgroup
operations and use SCX_HAS_OP(cgroup_init) only to test whether
ops.cgroup_init() exists. Make all cgroup operations consistently use
scx_cgroup_enabled to test whether cgroup operations are enabled.
scx_cgroup_enabled is added instead of using scx_enabled() to ease planned
locking updates.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:02:40 -10:00
Tejun Heo
4269c603cc sched_ext: Enable scx_ops_init_task() separately
scx_ops_init_task() and the follow-up scx_ops_enable_task() in the fork path
were gated by scx_enabled() test and thus __scx_ops_enabled had to be turned
on before the first scx_ops_init_task() loop in scx_ops_enable(). However,
if an external entity causes sched_class switch before the loop is complete,
tasks which are not initialized could be switched to SCX.

The following can be reproduced by running a program which keeps toggling a
process between SCHED_OTHER and SCHED_EXT using sched_setscheduler(2).

  sched_ext: Invalid task state transition 0 -> 3 for fish[1623]
  WARNING: CPU: 1 PID: 1650 at kernel/sched/ext.c:3392 scx_ops_enable_task+0x1a1/0x200
  ...
  Sched_ext: simple (enabling)
  RIP: 0010:scx_ops_enable_task+0x1a1/0x200
  ...
   switching_to_scx+0x13/0xa0
   __sched_setscheduler+0x850/0xa50
   do_sched_setscheduler+0x104/0x1c0
   __x64_sys_sched_setscheduler+0x18/0x30
   do_syscall_64+0x7b/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fix it by gating scx_ops_init_task() separately using
scx_ops_init_task_enabled. __scx_ops_enabled is now set after all tasks are
finished with scx_ops_init_task().

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:02:40 -10:00
Tejun Heo
9753358a6a sched_ext: Fix SCX_TASK_INIT -> SCX_TASK_READY transitions in scx_ops_enable()
scx_ops_enable() has two task iteration loops. The first one calls
scx_ops_init_task() on every task and the latter switches the eligible ones
into SCX. The first loop left the tasks in SCX_TASK_INIT state and then the
second loop switched it into READY before switching the task into SCX.

The distinction between INIT and READY is only meaningful in the fork path
where it's used to tell whether the task finished forking so that we can
tell ops.exit_task() accordingly. Leaving task in INIT state between the two
loops is incosistent with the fork path and incorrect. The following can be
triggered by running a program which keeps toggling a task between
SCHED_OTHER and SCHED_SCX while enabling a task:

  sched_ext: Invalid task state transition 1 -> 3 for fish[1526]
  WARNING: CPU: 2 PID: 1615 at kernel/sched/ext.c:3393 scx_ops_enable_task+0x1a1/0x200
  ...
  Sched_ext: qmap (enabling+all)
  RIP: 0010:scx_ops_enable_task+0x1a1/0x200
  ...
   switching_to_scx+0x13/0xa0
   __sched_setscheduler+0x850/0xa50
   do_sched_setscheduler+0x104/0x1c0
   __x64_sys_sched_setscheduler+0x18/0x30
   do_syscall_64+0x7b/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fix it by transitioning to READY in the first loop right after
scx_ops_init_task() succeeds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
2024-09-27 10:02:40 -10:00
Tejun Heo
8c2090c504 sched_ext: Initialize in bypass mode
scx_ops_enable() used preempt_disable() around the task iteration loop to
switch tasks into SCX to guarantee forward progress of the task which is
running scx_ops_enable(). However, in the gap between setting
__scx_ops_enabled and preeempt_disable(), an external entity can put tasks
including the enabling one into SCX prematurely, which can lead to
malfunctions including stalls.

The bypass mode can wrap the entire enabling operation and guarantee forward
progress no matter what the BPF scheduler does. Use the bypass mode instead
to guarantee forward progress while enabling.

While at it, release and regrab scx_tasks_lock between the two task
iteration locks in scx_ops_enable() for clarity as there is no reason to
keep holding the lock between them.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:02:40 -10:00
Tejun Heo
fc1fcebead sched_ext: Remove SCX_OPS_PREPPING
The distinction between SCX_OPS_PREPPING and SCX_OPS_ENABLING is not used
anywhere and only adds confusion. Drop SCX_OPS_PREPPING.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27 10:02:39 -10:00
Tejun Heo
1bbcfe620e sched_ext: Relocate check_hotplug_seq() call in scx_ops_enable()
check_hotplug_seq() is used to detect CPU hotplug event which occurred while
the BPF scheduler is being loaded so that initialization can be retried if
CPU hotplug events take place before the CPU hotplug callbacks are online.

As such, the best place to call it is in the same cpu_read_lock() section
that enables the CPU hotplug ops. Currently, it is called in the next
cpus_read_lock() block in scx_ops_enable(). The side effect of this
placement is a small window in which hotplug sequence detection can trigger
unnecessarily, which isn't critical.

Move check_hotplug_seq() invocation to the same cpus_read_lock() block as
the hotplug operation enablement to close the window and get the invocation
out of the way for planned locking updates.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
2024-09-27 10:02:39 -10:00
Tejun Heo
6f34d8d382 sched_ext: Use shorter slice while bypassing
While bypassing, tasks are scheduled in FIFO order which favors tasks that
hog CPUs. This can slow down e.g. unloading of the BPF scheduler. While
bypassing, guaranteeing timely forward progress is the main goal. There's no
point in giving long slices. Shorten the time slice used while bypassing
from 20ms to 5ms.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-09-26 12:56:46 -10:00
Tejun Heo
b7b3b2dbae sched_ext: Split the global DSQ per NUMA node
In the bypass mode, the global DSQ is used to schedule all tasks in simple
FIFO order. All tasks are queued into the global DSQ and all CPUs try to
execute tasks from it. This creates a lot of cross-node cacheline accesses
and scheduling across the node boundaries, and can lead to live-lock
conditions where the system takes tens of minutes to disable the BPF
scheduler while executing in the bypass mode.

Split the global DSQ per NUMA node. Each node has its own global DSQ. When a
task is dispatched to SCX_DSQ_GLOBAL, it's put into the global DSQ local to
the task's CPU and all CPUs in a node only consume its node-local global
DSQ.

This resolves a livelock condition which could be reliably triggered on an
2x EPYC 7642 system by running `stress-ng --race-sched 1024` together with
`stress-ng --workload 80 --workload-threads 10` while repeatedly enabling
and disabling a SCX scheduler.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-09-26 12:56:46 -10:00
Tejun Heo
bba26bf356 sched_ext: Relocate find_user_dsq()
To prepare for the addition of find_global_dsq(). No functional changes.

Signed-off-by: tejun heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-09-26 12:56:46 -10:00
Tejun Heo
63fb3ec805 sched_ext: Allow only user DSQs for scx_bpf_consume(), scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new()
SCX_DSQ_GLOBAL is special in that it can't be used as a priority queue and
is consumed implicitly, but all BPF DSQ related kfuncs could be used on it.
SCX_DSQ_GLOBAL will be split per-node for scalability and those operations
won't make sense anymore. Disallow SCX_DSQ_GLOBAL on scx_bpf_consume(),
scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new(). This means that
SCX_DSQ_GLOBAL can only be used as a dispatch target from BPF schedulers.

With scx_flatcg, which was using SCX_DSQ_GLOBAL as the fallback DSQ,
updated, this shouldn't affect any schedulers.

This leaves find_dsq_for_dispatch() the only user of find_non_local_dsq().
Open code and remove find_non_local_dsq().

Signed-off-by: tejun heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-09-26 12:56:46 -10:00
Tejun Heo
c9c809f413 scx_flatcg: Use a user DSQ for fallback instead of SCX_DSQ_GLOBAL
scx_flatcg was using SCX_DSQ_GLOBAL for fallback handling. However, it is
assuming that SCX_DSQ_GLOBAL isn't automatically consumed, which was true a
while ago but is no longer the case. Also, there are further changes planned
for SCX_DSQ_GLOBAL which will disallow explicit consumption from it. Switch
to a user DSQ for fallback.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
2024-09-26 12:56:46 -10:00
Tejun Heo
a748db0c8c tools/sched_ext: Receive misc updates from SCX repo
Receive misc tools/sched_ext updates from https://github.com/sched-ext/scx
to sync userspace bits.

- LSP macros to help language servers.

- bpf_cpumask_weight() declaration and cast_mask() helper.

- Cosmetic updates to scx_flatcg.bpf.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-25 12:22:37 -10:00
Tejun Heo
1e123fd73d sched_ext: Add __COMPAT helpers for features added during v6.12 devel cycle
cgroup support and scx_bpf_dispatch[_vtime]_from_dsq() are newly added since
8bb30798fd ("sched_ext: Fixes incorrect type in bpf_scx_init()") which is
the current earliest commit targeted by BPF schedulers. Add compat helpers
for them and apply them in the example schedulers.

These will be dropped after a few kernel releases. The exact backward
compatibility window hasn't been decided yet.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-25 11:58:44 -10:00
Tejun Heo
42268ad0eb sched_ext: Build fix for !CONFIG_SMP
move_remote_task_to_local_dsq() is only defined on SMP configs but
scx_disaptch_from_dsq() was calling move_remote_task_to_local_dsq() on UP
configs too causing build failures. Add a dummy
move_remote_task_to_local_dsq() which triggers a warning.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 4c30f5ce4f ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409241108.jaocHiDJ-lkp@intel.com/
2024-09-24 11:10:07 -10:00
Linus Torvalds
68e5c7d4ce Kbuild updates for v6.12
- Support cross-compiling linux-headers Debian package and kernel-devel
    RPM package
 
  - Add support for the linux-debug Pacman package
 
  - Improve module rebuilding speed by factoring out the common code to
    scripts/module-common.c
 
  - Separate device tree build rules into scripts/Makefile.dtbs
 
  - Add a new script to generate modules.builtin.ranges, which is useful
    for tracing tools to find symbols in built-in modules
 
  - Refactor Kconfig and misc tools
 
  - Update Kbuild and Kconfig documentation
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmby2+QVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGpQ0QALWMgox3OdceNiBT8QieqRFfwKFv
 5jxtsZt+MbTdWNMEfgc4Cq2i5ZAqpYGZh32RwTiZJogBvYEIoO7M4Md9VwoEe/BC
 q8VZ6FhUy7358IX/FCukfB0dYvkziRalBRDrE4iFmMMdhBvZ9nrvMxllqFCMllLj
 DTrBTTiMus3qiiczr4tb5QwaIR6C+yqiEBF++ftLmWvo9dn8YNNUnI65fGjyQM/w
 0wMPwsB3Y2HdnRpLUS6T18gZbjoXsAk4+WX0TpdBfTs3d7AdbzlSMtc0BslEm6Tb
 JjIK6SbJCM3kNC7O0/gsUenOaSBxSbKjjg33gQxn/eNoi0nRt+qnBMMreYiTd95G
 Hq86QcNfKQtWAagKRTppMkYEDqMU2RKH7BmJOsfQyeG9cGpAAu+0HsQv3f/h5QP1
 MlA8o+NP5oQn6RbrhZz1Pqm24+OMxiXaBhmo8XbZ+MXzi/CBR54Eo4ip/FSHzXII
 EGEAQL7t7YU7xu8qMIE6ZQMH7BJsjJNee0vrNiYZa4xHLYyHi6mJl8K6LlHQ3nEx
 WOsPX9MLITtSJwcvIio/0sEnuR7pjcShGfqhbHO5tiOYznsbcSvu3+18HPGCpFRt
 vYFkNIRc298k7++A+Zp2wwdD2TS+SSilrAImmJXMhf0M+Nyg2vnlfAo8t0QSkFlh
 1g9dJuy+8jYRjHXP
 =g4t/
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Support cross-compiling linux-headers Debian package and kernel-devel
   RPM package

 - Add support for the linux-debug Pacman package

 - Improve module rebuilding speed by factoring out the common code to
   scripts/module-common.c

 - Separate device tree build rules into scripts/Makefile.dtbs

 - Add a new script to generate modules.builtin.ranges, which is useful
   for tracing tools to find symbols in built-in modules

 - Refactor Kconfig and misc tools

 - Update Kbuild and Kconfig documentation

* tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
  kbuild: doc: replace "gcc" in external module description
  kbuild: doc: describe the -C option precisely for external module builds
  kbuild: doc: remove the description about shipped files
  kbuild: doc: drop section numbering, use references in modules.rst
  kbuild: doc: throw out the local table of contents in modules.rst
  kbuild: doc: remove outdated description of the limitation on -I usage
  kbuild: doc: remove description about grepping CONFIG options
  kbuild: doc: update the description about Kbuild/Makefile split
  kbuild: remove unnecessary export of RUST_LIB_SRC
  kbuild: remove append operation on cmd_ld_ko_o
  kconfig: cache expression values
  kconfig: use hash table to reuse expressions
  kconfig: refactor expr_eliminate_dups()
  kconfig: add comments to expression transformations
  kconfig: change some expr_*() functions to bool
  scripts: move hash function from scripts/kconfig/ to scripts/include/
  kallsyms: change overflow variable to bool type
  kallsyms: squash output_address()
  kbuild: add install target for modules.builtin.ranges
  scripts: add verifier script for builtin module range data
  ...
2024-09-24 13:02:06 -07:00
Linus Torvalds
7f8de2bf07 linux-cpupower-6.12-rc1-fixes
This cpupower fixes update consists fix to raw_pylibcpupower.i being
 removed by "make mrproper". "*.i", "*.o" files are generated during
 kernel compile and removed when the repo is cleaned by mrproper.
 
 The file is renamed to use .swg extension instead to avoid the problem.
 The second patch removes references to raw_pylibcpupower.i from .gitignore.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmbxyG0ACgkQCwJExA0N
 Qxxf4RAAt7Ow3RePcze5fBsl2GkIGDw3rWGxmr0Qbh/YesD/o/eOtykqe4RobqqX
 v1VSwbdJEquD4Ll4i3i4np/82zWK9YOAWRcXt9x8HfecfSFhMY0M76+wghhS+zWK
 b0V+y4xiELcmaZGiBaHwfbazHmpwpYHukzZSjK/ll0u1umCIPPVSvVtI7240IQ4b
 MAI3K/wIvHyQgnkVaqdVzqhRxitIwXuy2hw0aNzwJ66FM7EwvuBDazKhPQU355hH
 6eD0GRs7ffcLdG9cLeKd/BewIv7Fc6pkgUZ5JYmmehYGZUPvhvAgoxqwD0FOs99s
 bt5YWrIBx45psR8PJiVtL3ZJPNqNQt9LRsHD0ANKGk+yMZoKoeEUP367hGy48ZPi
 1HdMnn9aMVm/mP7wuL9nre8rV+NFooFC/hkwyBeWIQMSFI9m4KY8ifDc7Dl+r4N2
 5oX+4QKoXoNQEiPN4Ap11dhUqaxFunjcN1zjMCE63cLLzfFBIDRypAnpuypbajca
 DQZz4dcxSoNBxl0ylwiqyYfAfUoPfffT2np+ZawnBGtwGFf8gKnCuEeMjwXJg0Ym
 Cgp4mjp+xyQgwOVS0HSQ3jGrTLfLJeSUxMugc+6Mao6v8U6cMSUxNA8Le6iBHjrA
 YQFXCj1qeoD/M7LhMIseBsACmd4JwVZgoO3V+eS45LICxEgHYPY=
 =cwcN
 -----END PGP SIGNATURE-----

Merge tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux

Pull cpupower updates from Shuah Khan
 "The 'raw_pylibcpupower.i' file was being removed by "make mrproper".

  That was because '*.i', '.s' and '*.o' files are generated during
  kernel compile and removed when the repo is cleaned by mrproper.

  Rename it to use .swg extension instead to avoid the problem.

  A second patch removes references to it from .gitignore"

* tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
  pm: cpupower: Clean up bindings gitignore
  pm: cpupower: rename raw_pylibcpupower.i
2024-09-24 12:57:46 -07:00
Linus Torvalds
cd3d647729 I3C for 6.12
Core:
  - allow adjusting first broadcast address speed
 
 Drivers:
  - cdns: few fixes
  - mipi-i3c-hci: Add AMD SoC I3C controller support and quirks, fix get_i3c_mode
  - svc: adjust rates, fix race condition
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmbx6h4ACgkQY6TcMGxw
 OjL1qw/9HTddkWARFkD7eS5/GuzBdsXlaUcAZZHgbu8AGLVAMJbmB8MPSYpkP1dj
 1MDBXBgUwNjP8/Pbn3il3hvOcRje7SsvtHivYho+OA2mnLBxKIJUBfSPswiW9Y00
 j2FtJCfBFms5+UC4fX2SywSgkEWp5a7RCOW2d69nwYqkoWlpSwIRRoVerFXKyCto
 iCCdos/KyvblD2tMT+fWtSHQRmPds9Fytl9RwHdCKRmlGi9qONuqJK2twlUnE164
 bDk7C5JZuadX4l1pwOINlAOcpljEdANlSPq+5zmvGPqlSMMwFavhLEUPC2gJqOwQ
 bZXcWJs12vn28ZDeob0NDHIOJvLOrMDKWOtCQoxPMDfd00QO6+HI0vG5S7+yh+Q0
 B/UDU+PwR48OncVys4P4kvu82NU9OeBqeg6QK1KLdKKQpw9K5zNca2qTlHX3ZYKd
 fQGu1G2K7FcZFi4WduKzUXbo7dIv69hi1uj4gbXii79LiOaZdtCAnYKn1qX7aeWM
 bwutMqLcug8vxqRxIR+23Ki7QLr55TPzj3YCGQsPPnmKAnkHJeEUiCtPJ5Ymytb8
 GWzW0PVEdsFg1LB9Ei1K41/JICx62h/U0X2OdJnYx4JIij0e8GcoItYVsKsq+mG0
 f7g2kBZDMDcAzaoHl7lgW5gBB8NnFmC0+74pqnjrBiSfzWkNFcM=
 =jSyb
 -----END PGP SIGNATURE-----

Merge tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull i3c updates from Alexandre Belloni:
 "This adds support for the I3C HCI controller of the AMD SoC which as
  expected requires quirks. Also fixes for the other drivers, including
  rate selection fixes for svc.

  Core:
   - allow adjusting first broadcast address speed

  Drivers:
   - cdns: few fixes
   - mipi-i3c-hci: Add AMD SoC I3C controller support and quirks, fix
     get_i3c_mode
   - svc: adjust rates, fix race condition"

* tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: master: svc: Fix use after free vulnerability in svc_i3c_master Driver Due to Race Condition
  i3c: master: cdns: Fix use after free vulnerability in cdns_i3c_master Driver Due to Race Condition
  i3c: master: svc: adjust SDR according to i3c spec
  i3c: master: svc: use slow speed for first broadcast address
  i3c: master: support to adjust first broadcast address speed
  i3c/master: cmd_v1: Fix the rule for getting i3c mode
  i3c: master: cdns: fix module autoloading
  i3c: mipi-i3c-hci: Add a quirk to set Response buffer threshold
  i3c: mipi-i3c-hci: Add a quirk to set timing parameters
  i3c: mipi-i3c-hci: Relocate helper macros to HCI header file
  i3c: mipi-i3c-hci: Add a quirk to set PIO mode
  i3c: mipi-i3c-hci: Read HC_CONTROL_PIO_MODE only after i3c hci v1.1
  i3c: mipi-i3c-hci: Add AMDI5017 ACPI ID to the I3C Support List
2024-09-24 12:53:54 -07:00
Linus Torvalds
ba0c0cb56f remoteproc: k3-m4: use the proper dependencies
The TI_K3_M4_REMOTEPROC Kconfig entry selects OMAP2PLUS_MBOX, but that
driver in turn depends on other things, which the k4-m4 driver didn't.

This causes a Kconfig time warning:

  WARNING: unmet direct dependencies detected for OMAP2PLUS_MBOX
    Depends on [n]: MAILBOX [=y] && (ARCH_OMAP2PLUS || ARCH_K3)
    Selected by [m]:
    - TI_K3_M4_REMOTEPROC [=m] && REMOTEPROC [=y] && (ARCH_K3 || COMPILE_TEST [=y])

because you can't select something that is unavailable.

Make the dependencies for TI_K3_M4_REMOTEPROC match those of the
OMAP2PLUS_MBOX driver that it needs.

Fixes: ebcf9008a8 ("remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem")
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Martyn Welch <martyn.welch@collabora.com>
Cc: Hari Nagalla <hnagalla@ti.com>
Cc: Andrew Davis <afd@ti.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-24 12:48:35 -07:00
Linus Torvalds
9ae2940cbc Input updates for v6.12-rc0
- support for PixArt PS/2 touchpad
 
 - updates to tsc2004/5, usbtouchscreen, and zforce_ts drivers
 
 - support for GPIO-only mode for ADP55888 controller
 
 - support for touch keys in Zinitix driver
 
 - support for querying density of Synaptics sensors
 
 - sysfs interface for Goodex "Berlin" devices to read and write touch IC
   registers
 
 - more quirks to i8042 to handle various Tuxedo laptops
 
 - a number of drivers have been converted to using "guard" notation
   when acquiring various locks, as well as using other cleanup functions
   to simplify releasing of resources (with more drivers to follow)
 
 - evdev will limit amount of data that can be written into an evdev
   instance at a given time to 4096 bytes (170 input events) to avoid
   holding evdev->mutex for too long and starving other users
 
 - Spitz has been converted to use software nodes/properties to describe
   its matrix keypad and GPIO-connected LEDs
 
 - msc5000_ts, msc_touchkey and keypad-nomadik-ske drivers have been
   removed since noone in mainline have been using them
 
 - other assorted cleanups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCZvGnWgAKCRBAj56VGEWX
 nDjGAQCHcRk1icJlyjx3KA85gdILriDB9zMNJnCnbYXrSfGbOQD+LZNjO3po26zB
 wloLEYTKRu3E/oWEI8VcfIN2m89vxw4=
 =bDsb
 -----END PGP SIGNATURE-----

Merge tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

 - support for PixArt PS/2 touchpad

 - updates to tsc2004/5, usbtouchscreen, and zforce_ts drivers

 - support for GPIO-only mode for ADP55888 controller

 - support for touch keys in Zinitix driver

 - support for querying density of Synaptics sensors

 - sysfs interface for Goodex "Berlin" devices to read and write touch
   IC registers

 - more quirks to i8042 to handle various Tuxedo laptops

 - a number of drivers have been converted to using "guard" notation
   when acquiring various locks, as well as using other cleanup
   functions to simplify releasing of resources (with more drivers to
   follow)

 - evdev will limit amount of data that can be written into an evdev
   instance at a given time to 4096 bytes (170 input events) to avoid
   holding evdev->mutex for too long and starving other users

 - Spitz has been converted to use software nodes/properties to describe
   its matrix keypad and GPIO-connected LEDs

 - msc5000_ts, msc_touchkey and keypad-nomadik-ske drivers have been
   removed since noone in mainline have been using them

 - other assorted cleanups and fixes

* tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (98 commits)
  ARM: spitz: fix compile error when matrix keypad driver is enabled
  Input: hynitron_cstxxx - drop explicit initialization of struct i2c_device_id::driver_data to 0
  Input: adp5588-keys - fix check on return code
  Input: Convert comma to semicolon
  Input: i8042 - add TUXEDO Stellaris 15 Slim Gen6 AMD to i8042 quirk table
  Input: i8042 - add another board name for TUXEDO Stellaris Gen5 AMD line
  Input: tegra-kbc - use of_property_read_variable_u32_array() and of_property_present()
  Input: ps2-gpio - use IRQF_NO_AUTOEN flag in request_irq()
  Input: ims-pcu - fix calling interruptible mutex
  Input: zforce_ts - switch to using asynchronous probing
  Input: zforce_ts - remove assert/deassert wrappers
  Input: zforce_ts - do not hardcode interrupt level
  Input: zforce_ts - switch to using devm_regulator_get_enable()
  Input: zforce_ts - stop treating VDD regulator as optional
  Input: zforce_ts - make zforce_idtable constant
  Input: zforce_ts - use dev_err_probe() where appropriate
  Input: zforce_ts - do not ignore errors when acquiring regulator
  Input: zforce_ts - make parsing of contacts less confusing
  Input: zforce_ts - switch to using get_unaligned_le16
  Input: zforce_ts - use guard notation when acquiring mutexes
  ...
2024-09-24 12:42:35 -07:00
Linus Torvalds
6db6a19f1a hwspinlock update for v6.12
This converts the Spreadtrum hardware spinlock DeviceTree binding to
 YAML, to allow validation of related DeviceTree source.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmbyReoVHGFuZGVyc3Nv
 bkBrZXJuZWwub3JnAAoJEAsfOT8Nma3FGJkP/jMfaXkO+QMtt2jJqOOC/ebaviEP
 k8EILUy7EhYnfkJdzdaUKgN6M+xSIClRssmoJq0xUgs7LNq9mqViBw3ewxpP8JQe
 ychmnqogzbRHnKaH1QWxqeE1PfEUCBqTcT7c+3L2QDDnhjIIViObMarbwBJ6sFnf
 Yb9nG0gOG26W81eG32FRQ+Z6+Tnf6vLT9PjgYZ2UbG01onZuAG1KKhcQrckbPt3p
 jpTw/0gfJMIauuFtC5ouhVMeWnA9ksstRogOd2gjjd5jUIdC/7Ff3IkAJ04aqpSy
 t8NphBGOHqageyVczDt4kjmgQti4bL+EemgJ5HTkzQ9PFFJJ8bwueRpodJGW4I7s
 /SAxEfGFSlaB1Ll/dX9drsARir2CRT/nJPWcSKQL7G9/5Cq5ywfQsmN3NPZjlrQH
 cX37leOK1Xpw1TN60kWccZKn2/BIuDY7XLnyVezj/2tdmJhvP7+05G09qZVoHxGQ
 MDmHz0htENW+yGcPfVtGvj92vBsXpVmwUo6VXp7Cu/1U64rzzKa/TAm0QBZhLYTX
 1GaQb5STMvbUx0l8KBAN1taKhfaeEbxF/SMbxKgIjCTpybhCj6q787QvBT3BICh4
 jDRMeCM5PpXFIrFDA+oyMjqx9aXRxgj9O5c3HRvPjxDbMKyKAW2Twf99SZTn6StZ
 2JXQcFR117EFZYt1
 =DyIB
 -----END PGP SIGNATURE-----

Merge tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull hwspinlock update from Bjorn Andersson:
 "This converts the Spreadtrum hardware spinlock DeviceTree binding to
  YAML, to allow validation of related DeviceTree source"

* tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  dt-bindings: hwlock: sprd-hwspinlock: convert to YAML
2024-09-24 12:33:22 -07:00
Linus Torvalds
6e10aa1fee rpmsg updates for v6.12
This performs minor cleanup/refactor of the Qualcomm GLINK code, in
 order to then add trace events related to the messages exchange with the
 remote side, useful for debugging a range of interoperability issues.
 
 The nested structs with flexible array members, in the same driver, is
 then rewritten to avoid the risk of invalid accesses.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmbyRXAVHGFuZGVyc3Nv
 bkBrZXJuZWwub3JnAAoJEAsfOT8Nma3FCmwQAK4rnPNCnJsPoZDQx41JoWwIwRMu
 Xn3CBwAN24ysfSYusqQrtYtru6KIVg3B1F0Qu/bDCmRG9rq0jGgnUDK368OX9Tpm
 P1rZTCqvVe8iZms69uz/aJN+3ZAjUH6kJlrTlCRuSoDFGirohjqIUUYQylGvNFQH
 E27wAVNH7W8ahXOZP94T2w39rIN4+EB5G8BTZmdFXRW4V1DPYoxcoAc9a9AZqsea
 mPkvIpF0aKHMD8NnPjEutMLC51RCTb6IrqJknmd1IhREVai0d4pc3UlwU/NpIMWO
 pDu4EJbYzFSj1wkgPeaaQ05fQ9mCMM9y4fXKJFSKqX50Iq9HCpr7NR2XYiv+mGLJ
 F7tMhmu0sfOGg8t8Zzh92I2WoTiQs0JdikZ+CJo2I5ZkNCPbGUe/DvZfyUE3SEjt
 SRhgBzhyZ+aREzzmbkwfwmEtJLahgJDvC9kj9EUX9wRdE4gG9Uwb2YpUU6pguKTX
 XhKNUey19KVUZjBdcOL4eau7FrZiP7D8cYrwH870t027gdNahPD8DsxC8l52NlNV
 0IST3gIRZneiOEW0OA/5i8O6SHJaJ5A0aOpzm0uMiiXu7cL+3iW9PwtwywsjObGX
 QNjtNtHfxuXFK3P8UlVfyW1MvmF7aA7E4Y3v43yThITUfD+BDC19+4e2jjCbLt40
 vfxUkuJSMmuHWCr2
 =hQET
 -----END PGP SIGNATURE-----

Merge tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull rpmsg updates from Bjorn Andersson:

 - Minor cleanup/refactor to the Qualcomm GLINK code, in order to add
   trace events related to the messages exchange with the remote side,
   useful for debugging a range of interoperability issues

 - Rewrite the nested structs with flexible array members in order to
   avoid the risk of invalid accesses

* tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  rpmsg: glink: Avoid -Wflex-array-member-not-at-end warnings
  rpmsg: glink: Introduce packet tracepoints
  rpmsg: glink: Pass channel to qcom_glink_send_close_ack()
  rpmsg: glink: Tidy up RX advance handling
2024-09-24 12:24:32 -07:00