ops.cpu_acquire() is currently called with 0 kf_maks which is interpreted as
SCX_KF_UNLOCKED which allows all unlocked kfuncs, but ops.cpu_acquire() is
called from balance_one() under the rq lock and should only be allowed call
kfuncs that are safe under the rq lock. Update it to use SCX_KF_REST.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Zhao Mengmeng <zhaomzhao@126.com>
Link: http://lkml.kernel.org/r/ZzYvf2L3rlmjuKzh@slm.duckdns.org
Fixes: 245254f708 ("sched_ext: Implement sched_ext_ops.cpu_acquire/release()")
sched_ext dispatches tasks from the BPF scheduler from balance_scx() and
thus every pick_task_scx() call must be preceded by balance_scx(). While
this usually holds, due to a bug, there are cases where the fair class's
balance() returns true indicating that it has tasks to run on the CPU and
thus terminating balance() calls but fails to actually find the next task to
run when pick_task() is called. In such cases, pick_task_scx() can be called
without preceding balance_scx().
Detect this condition using SCX_RQ_BAL_PENDING flags. If detected, keep
running the previous task if possible and avoid stalling from entering idle
without balancing.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/Ztj_h5c2LYsdXYbA@slm.duckdns.org
0e7ffff1b8 ("scx: Fix raciness in scx_ops_bypass()") converted
scx_ops_bypass_depth from an atomic to an int. Update scx_show_state.py
accordingly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 0e7ffff1b8 ("scx: Fix raciness in scx_ops_bypass()")
cc9877fb76 ("sched_ext: Improve error reporting during loading") changed
how load failures are reported so that more error context can be
communicated. This breaks the enq_last_no_enq_fails test as attach no longer
fails. The scheduler is guaranteed to be ejected on attach completion with
full error information. Update enq_last_no_enq_fails so that it checks that
the scheduler is ejected using ops.exit().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vishal Chourasia <vishalc@linux.ibm.com>
Link: http://lkml.kernel.org/r/Zxknp7RAVNjmdJSc@linux.ibm.com
Fixes: cc9877fb76 ("sched_ext: Improve error reporting during loading")
cast_mask() doesn't do any actual work and is defined in a header file.
Force it to be inline. When it is not inlined and the function is not used,
it can cause verificaiton failures like the following:
# tools/testing/selftests/sched_ext/runner -t minimal
===== START =====
TEST: minimal
DESCRIPTION: Verify we can load a fully minimal scheduler
OUTPUT:
libbpf: prog 'cast_mask': missing BPF prog type, check ELF section name '.text'
libbpf: prog 'cast_mask': failed to load: -22
libbpf: failed to load object 'minimal'
libbpf: failed to load BPF skeleton 'minimal': -22
ERR: minimal.c:20
Failed to open and load skel
not ok 1 minimal #
===== END =====
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: a748db0c8c ("tools/sched_ext: Receive misc updates from SCX repo")
In commit 63fb3ec805 ("sched_ext: Allow only user DSQs for
scx_bpf_consume(), scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new()"), we
updated the consume path to only accept user DSQs, thus making it invalid
to consume SCX_DSQ_GLOBAL. This selftest was doing that, so let's create a
custom DSQ and use that instead. The test now passes:
[root@virtme-ng sched_ext]# ./runner -t exit
===== START =====
TEST: exit
DESCRIPTION: Verify we can cleanly exit a scheduler in multiple places
OUTPUT:
[ 12.387229] sched_ext: BPF scheduler "exit" enabled
[ 12.406064] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[ 12.453325] sched_ext: BPF scheduler "exit" enabled
[ 12.474064] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[ 12.515241] sched_ext: BPF scheduler "exit" enabled
[ 12.532064] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[ 12.592063] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[ 12.654063] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[ 12.715062] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
ok 1 exit #
===== END =====
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fix incompatible function pointer type warnings in sched_ext BPF selftests by
explicitly casting the function pointers when initializing struct_ops.
This addresses multiple -Wincompatible-function-pointer-types warnings from the
clang compiler where function signatures didn't match exactly.
The void * cast ensures the compiler accepts the function pointer
assignment despite minor type differences in the parameters.
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
As described in commit b07996c7ab ("sched_ext: Don't hold
scx_tasks_lock for too long"), we're doing a cond_resched() every 32
calls to scx_task_iter_next() to avoid RCU and other stalls. That commit
also added a cpu_relax() to the codepath where we drop and reacquire the
lock, but as Waiman described in [0], cpu_relax() should only be
necessary in busy loops to avoid pounding on a cacheline (or to allow a
hypertwin to more fully utilize a core).
Let's remove the unnecessary cpu_relax().
[0]: https://lore.kernel.org/all/35b3889b-904a-4d26-981f-c8aa1557a7c7@redhat.com/
Cc: Waiman Long <llong@redhat.com>
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Iterating with scx_task_iter involves scx_tasks_lock and optionally the rq
lock of the task being iterated. Both locks can be released during iteration
and the iteration can be continued after re-grabbing scx_tasks_lock.
Currently, all lock handling is pushed to the caller which is a bit
cumbersome and makes it difficult to add lock-aware behaviors. Make the
scx_task_iter helpers handle scx_tasks_lock.
- scx_task_iter_init/scx_taks_iter_exit() now grabs and releases
scx_task_lock, respectively. Renamed to
scx_task_iter_start/scx_task_iter_stop() to more clearly indicate that
there are non-trivial side-effects.
- Add __ prefix to scx_task_iter_rq_unlock() to indicate that the function
is internal.
- Add scx_task_iter_unlock/relock(). The former drops both rq lock (if held)
and scx_tasks_lock and the latter re-locks only scx_tasks_lock.
This doesn't cause behavior changes and will be used to implement stall
avoidance.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Bypass mode was depending on ops.select_cpu() which can't be trusted as with
the rest of the BPF scheduler. Always enable and use scx_select_cpu_dfl() in
bypass mode.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Move the sanity check from the inner function scx_select_cpu_dfl() to the
exported kfunc scx_bpf_select_cpu_dfl(). This doesn't cause behavior
differences and will allow using scx_select_cpu_dfl() in bypass mode
regardless of scx_builtin_idle_enabled.
Signed-off-by: Tejun Heo <tj@kernel.org>
The disable path caps p->scx.slice to SCX_SLICE_DFL. As the field is already
being ignored at this stage during disable, the only effect this has is that
when the next BPF scheduler is loaded, it won't see unreasonable left-over
slices. Ultimately, this shouldn't matter but it's better to start in a
known state. Drop p->scx.slice capping from the disable path and instead
reset it to SCX_SLICE_DFL in the enable path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
This reverts commit 6f34d8d382.
Slice length is ignored while bypassing and tasks are switched on every tick
and thus the patch does not make any difference. The perceived difference
was from test noise.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
pick_next_task_scx() was turned into pick_task_scx() since
commit 753e2836d1 ("sched_ext: Unify regular and core-sched pick
task paths"). Update the outdated message.
Signed-off-by: Honglei Wang <jameshongleiwang@126.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The sched_ext selftests is missing proper cross-compilation support, a
proper target entry, and out-of-tree build support.
When building the kselftest suite, e.g.:
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- \
TARGETS=sched_ext SKIP_TARGETS="" O=/output/foo \
-C tools/testing/selftests install
or:
make ARCH=arm64 LLVM=1 TARGETS=sched_ext SKIP_TARGETS="" \
O=/output/foo -C tools/testing/selftests install
The expectation is that the sched_ext is included, cross-built, the
correct toolchain is picked up, and placed into /output/foo.
In contrast to the BPF selftests, the sched_ext suite does not use
bpftool at test run-time, so it is sufficient to build bpftool for the
build host only.
Add ARCH, CROSS_COMPILE, OUTPUT, and TARGETS support to the sched_ext
selftest. Also, remove some variables that were unused by the
Makefile.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: David Vernet <void@manifault.com>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Since the artifact paths for tools changed, we need to update the documentation to reflect that path.
Signed-off-by: Devaansh-Kumar <devaanshk840@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
tell whether ops.select_cpu() was called. This is incorrect as
ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
incorrectly skipping direct dispatch for tasks that are bound to a single
CPU.
sched core has been updated to specify ENQUEUE_RQ_SELECTED if
->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
scx_qmap to test it instead of SCX_ENQ_WAKEUP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
migration is disabled. sched_ext schedulers may perform operations such as
direct dispatch from ->select_task_rq() path and it is useful for them to
know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
and end up assuming incorrectly that ->select_task_rq() was called for tasks
that are bound to a single CPU or migration disabled.
Make select_task_rq() indicate whether ->select_task_rq() was called by
setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
to ENQUEUE_RQ_SELECTED for ->enqueue_task().
This will be used by sched_ext to fix ->select_task_rq() skip detection.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
This will be used to allow select_task_rq() to indicate whether
->select_task_rq() was called by modifying *wake_flags.
This makes try_to_wake_up() call all functions that take wake_flags with
WF_TTWU set. Previously, only select_task_rq() was. Using the same flags is
more consistent, and, as the flag is only tested by ->select_task_rq()
implementations, it doesn't cause any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
568894edbe ("sched_ext: Add scx_cgroup_enabled to gate cgroup operations
and fix scx_tg_online()") assumed that scx_cgroup_exit() is only called
after scx_cgroup_init() finished successfully. This isn't true.
scx_cgroup_exit() can be called without scx_cgroup_init() being called at
all or after scx_cgroup_init() failed in the middle.
As init state is tracked per cgroup, scx_cgroup_exit() can be used safely to
clean up in all cases. Remove the incorrect WARN_ON_ONCE().
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 568894edbe ("sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online()")
When the BPF scheduler fails, ops.exit() allows rich error reporting through
scx_exit_info. Use scx.exit() path consistently for all failures which can
be caused by the BPF scheduler:
- scx_ops_error() is called after ops.init() and ops.cgroup_init() failure
to record error information.
- ops.init_task() failure now uses scx_ops_error() instead of pr_err().
- The err_disable path updated to automatically trigger scx_ops_error() to
cover cases that the error message hasn't already been generated and
always return 0 indicating init success so that the error is reported
through ops.exit().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Fix build errors by adding __weak markers to BPF helper function
declarations in header files. This resolves static assertion failures
in scx_qmap.bpf.c and scx_flatcg.bpf.c where functions like
scx_bpf_dispatch_from_dsq_set_slice, scx_bpf_dispatch_from_dsq_set_vtime,
and scx_bpf_task_cgroup were missing the __weak attribute.
[1] https://lore.kernel.org/all/ZvvfUqRNM4-jYQzH@linux.ibm.com
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The enable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
cpus_read_lock. Currently, the locks are grabbed together which is prone to
locking order problems.
For example, currently, there is a possible deadlock involving
scx_fork_rwsem and cpus_read_lock. cpus_read_lock has to nest inside
scx_fork_rwsem due to locking order existing in other subsystems. However,
there exists a dependency in the other direction during hotplug if hotplug
needs to fork a new task, which happens in some cases. This leads to the
following deadlock:
scx_ops_enable() hotplug
percpu_down_write(&cpu_hotplug_lock)
percpu_down_write(&scx_fork_rwsem)
block on cpu_hotplug_lock
kthread_create() waits for kthreadd
kthreadd blocks on scx_fork_rwsem
Note that this doesn't trigger lockdep because the hotplug side dependency
bounces through kthreadd.
With the preceding scx_cgroup_enabled change, this can be solved by
decoupling cpus_read_lock, which is needed for static_key manipulations,
from the other two locks.
- Move the first block of static_key manipulations outside of scx_fork_rwsem
and scx_cgroup_rwsem. This is now safe with the preceding
scx_cgroup_enabled change.
- Drop scx_cgroup_rwsem and scx_fork_rwsem between the two task iteration
blocks so that __scx_ops_enabled static_key enabling is outside the two
rwsems.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
The disable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
cpus_read_lock. Currently, the locks are grabbed together which is prone to
locking order problems. With the preceding scx_cgroup_enabled change, we can
decouple them:
- As cgroup disabling no longer requires modifying a static_key which
requires cpus_read_lock(), no need to grab cpus_read_lock() before
grabbing scx_cgroup_rwsem.
- cgroup can now be independently disabled before tasks are moved back to
the fair class.
Relocate scx_cgroup_exit() invocation before scx_fork_rwsem is grabbed, drop
now unnecessary cpus_read_lock() and move static_key operations out of
scx_fork_rwsem. This decouples all three locks in the disable path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
If the BPF scheduler does not implement ops.cgroup_init(), scx_tg_online()
didn't set SCX_TG_INITED which meant that ops.cgroup_exit(), even if
implemented, won't be called from scx_tg_offline(). This is because
SCX_HAS_OP(cgroupt_init) is used to test both whether SCX cgroup operations
are enabled and ops.cgroup_init() exists.
Fix it by introducing a separate bool scx_cgroup_enabled to gate cgroup
operations and use SCX_HAS_OP(cgroup_init) only to test whether
ops.cgroup_init() exists. Make all cgroup operations consistently use
scx_cgroup_enabled to test whether cgroup operations are enabled.
scx_cgroup_enabled is added instead of using scx_enabled() to ease planned
locking updates.
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_ops_init_task() and the follow-up scx_ops_enable_task() in the fork path
were gated by scx_enabled() test and thus __scx_ops_enabled had to be turned
on before the first scx_ops_init_task() loop in scx_ops_enable(). However,
if an external entity causes sched_class switch before the loop is complete,
tasks which are not initialized could be switched to SCX.
The following can be reproduced by running a program which keeps toggling a
process between SCHED_OTHER and SCHED_EXT using sched_setscheduler(2).
sched_ext: Invalid task state transition 0 -> 3 for fish[1623]
WARNING: CPU: 1 PID: 1650 at kernel/sched/ext.c:3392 scx_ops_enable_task+0x1a1/0x200
...
Sched_ext: simple (enabling)
RIP: 0010:scx_ops_enable_task+0x1a1/0x200
...
switching_to_scx+0x13/0xa0
__sched_setscheduler+0x850/0xa50
do_sched_setscheduler+0x104/0x1c0
__x64_sys_sched_setscheduler+0x18/0x30
do_syscall_64+0x7b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fix it by gating scx_ops_init_task() separately using
scx_ops_init_task_enabled. __scx_ops_enabled is now set after all tasks are
finished with scx_ops_init_task().
Signed-off-by: Tejun Heo <tj@kernel.org>
scx_ops_enable() has two task iteration loops. The first one calls
scx_ops_init_task() on every task and the latter switches the eligible ones
into SCX. The first loop left the tasks in SCX_TASK_INIT state and then the
second loop switched it into READY before switching the task into SCX.
The distinction between INIT and READY is only meaningful in the fork path
where it's used to tell whether the task finished forking so that we can
tell ops.exit_task() accordingly. Leaving task in INIT state between the two
loops is incosistent with the fork path and incorrect. The following can be
triggered by running a program which keeps toggling a task between
SCHED_OTHER and SCHED_SCX while enabling a task:
sched_ext: Invalid task state transition 1 -> 3 for fish[1526]
WARNING: CPU: 2 PID: 1615 at kernel/sched/ext.c:3393 scx_ops_enable_task+0x1a1/0x200
...
Sched_ext: qmap (enabling+all)
RIP: 0010:scx_ops_enable_task+0x1a1/0x200
...
switching_to_scx+0x13/0xa0
__sched_setscheduler+0x850/0xa50
do_sched_setscheduler+0x104/0x1c0
__x64_sys_sched_setscheduler+0x18/0x30
do_syscall_64+0x7b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fix it by transitioning to READY in the first loop right after
scx_ops_init_task() succeeds.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
scx_ops_enable() used preempt_disable() around the task iteration loop to
switch tasks into SCX to guarantee forward progress of the task which is
running scx_ops_enable(). However, in the gap between setting
__scx_ops_enabled and preeempt_disable(), an external entity can put tasks
including the enabling one into SCX prematurely, which can lead to
malfunctions including stalls.
The bypass mode can wrap the entire enabling operation and guarantee forward
progress no matter what the BPF scheduler does. Use the bypass mode instead
to guarantee forward progress while enabling.
While at it, release and regrab scx_tasks_lock between the two task
iteration locks in scx_ops_enable() for clarity as there is no reason to
keep holding the lock between them.
Signed-off-by: Tejun Heo <tj@kernel.org>
The distinction between SCX_OPS_PREPPING and SCX_OPS_ENABLING is not used
anywhere and only adds confusion. Drop SCX_OPS_PREPPING.
Signed-off-by: Tejun Heo <tj@kernel.org>
check_hotplug_seq() is used to detect CPU hotplug event which occurred while
the BPF scheduler is being loaded so that initialization can be retried if
CPU hotplug events take place before the CPU hotplug callbacks are online.
As such, the best place to call it is in the same cpu_read_lock() section
that enables the CPU hotplug ops. Currently, it is called in the next
cpus_read_lock() block in scx_ops_enable(). The side effect of this
placement is a small window in which hotplug sequence detection can trigger
unnecessarily, which isn't critical.
Move check_hotplug_seq() invocation to the same cpus_read_lock() block as
the hotplug operation enablement to close the window and get the invocation
out of the way for planned locking updates.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
While bypassing, tasks are scheduled in FIFO order which favors tasks that
hog CPUs. This can slow down e.g. unloading of the BPF scheduler. While
bypassing, guaranteeing timely forward progress is the main goal. There's no
point in giving long slices. Shorten the time slice used while bypassing
from 20ms to 5ms.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
In the bypass mode, the global DSQ is used to schedule all tasks in simple
FIFO order. All tasks are queued into the global DSQ and all CPUs try to
execute tasks from it. This creates a lot of cross-node cacheline accesses
and scheduling across the node boundaries, and can lead to live-lock
conditions where the system takes tens of minutes to disable the BPF
scheduler while executing in the bypass mode.
Split the global DSQ per NUMA node. Each node has its own global DSQ. When a
task is dispatched to SCX_DSQ_GLOBAL, it's put into the global DSQ local to
the task's CPU and all CPUs in a node only consume its node-local global
DSQ.
This resolves a livelock condition which could be reliably triggered on an
2x EPYC 7642 system by running `stress-ng --race-sched 1024` together with
`stress-ng --workload 80 --workload-threads 10` while repeatedly enabling
and disabling a SCX scheduler.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
To prepare for the addition of find_global_dsq(). No functional changes.
Signed-off-by: tejun heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
SCX_DSQ_GLOBAL is special in that it can't be used as a priority queue and
is consumed implicitly, but all BPF DSQ related kfuncs could be used on it.
SCX_DSQ_GLOBAL will be split per-node for scalability and those operations
won't make sense anymore. Disallow SCX_DSQ_GLOBAL on scx_bpf_consume(),
scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new(). This means that
SCX_DSQ_GLOBAL can only be used as a dispatch target from BPF schedulers.
With scx_flatcg, which was using SCX_DSQ_GLOBAL as the fallback DSQ,
updated, this shouldn't affect any schedulers.
This leaves find_dsq_for_dispatch() the only user of find_non_local_dsq().
Open code and remove find_non_local_dsq().
Signed-off-by: tejun heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
scx_flatcg was using SCX_DSQ_GLOBAL for fallback handling. However, it is
assuming that SCX_DSQ_GLOBAL isn't automatically consumed, which was true a
while ago but is no longer the case. Also, there are further changes planned
for SCX_DSQ_GLOBAL which will disallow explicit consumption from it. Switch
to a user DSQ for fallback.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Receive misc tools/sched_ext updates from https://github.com/sched-ext/scx
to sync userspace bits.
- LSP macros to help language servers.
- bpf_cpumask_weight() declaration and cast_mask() helper.
- Cosmetic updates to scx_flatcg.bpf.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
cgroup support and scx_bpf_dispatch[_vtime]_from_dsq() are newly added since
8bb30798fd ("sched_ext: Fixes incorrect type in bpf_scx_init()") which is
the current earliest commit targeted by BPF schedulers. Add compat helpers
for them and apply them in the example schedulers.
These will be dropped after a few kernel releases. The exact backward
compatibility window hasn't been decided yet.
Signed-off-by: Tejun Heo <tj@kernel.org>
move_remote_task_to_local_dsq() is only defined on SMP configs but
scx_disaptch_from_dsq() was calling move_remote_task_to_local_dsq() on UP
configs too causing build failures. Add a dummy
move_remote_task_to_local_dsq() which triggers a warning.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 4c30f5ce4f ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409241108.jaocHiDJ-lkp@intel.com/
- Support cross-compiling linux-headers Debian package and kernel-devel
RPM package
- Add support for the linux-debug Pacman package
- Improve module rebuilding speed by factoring out the common code to
scripts/module-common.c
- Separate device tree build rules into scripts/Makefile.dtbs
- Add a new script to generate modules.builtin.ranges, which is useful
for tracing tools to find symbols in built-in modules
- Refactor Kconfig and misc tools
- Update Kbuild and Kconfig documentation
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmby2+QVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGpQ0QALWMgox3OdceNiBT8QieqRFfwKFv
5jxtsZt+MbTdWNMEfgc4Cq2i5ZAqpYGZh32RwTiZJogBvYEIoO7M4Md9VwoEe/BC
q8VZ6FhUy7358IX/FCukfB0dYvkziRalBRDrE4iFmMMdhBvZ9nrvMxllqFCMllLj
DTrBTTiMus3qiiczr4tb5QwaIR6C+yqiEBF++ftLmWvo9dn8YNNUnI65fGjyQM/w
0wMPwsB3Y2HdnRpLUS6T18gZbjoXsAk4+WX0TpdBfTs3d7AdbzlSMtc0BslEm6Tb
JjIK6SbJCM3kNC7O0/gsUenOaSBxSbKjjg33gQxn/eNoi0nRt+qnBMMreYiTd95G
Hq86QcNfKQtWAagKRTppMkYEDqMU2RKH7BmJOsfQyeG9cGpAAu+0HsQv3f/h5QP1
MlA8o+NP5oQn6RbrhZz1Pqm24+OMxiXaBhmo8XbZ+MXzi/CBR54Eo4ip/FSHzXII
EGEAQL7t7YU7xu8qMIE6ZQMH7BJsjJNee0vrNiYZa4xHLYyHi6mJl8K6LlHQ3nEx
WOsPX9MLITtSJwcvIio/0sEnuR7pjcShGfqhbHO5tiOYznsbcSvu3+18HPGCpFRt
vYFkNIRc298k7++A+Zp2wwdD2TS+SSilrAImmJXMhf0M+Nyg2vnlfAo8t0QSkFlh
1g9dJuy+8jYRjHXP
=g4t/
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support cross-compiling linux-headers Debian package and kernel-devel
RPM package
- Add support for the linux-debug Pacman package
- Improve module rebuilding speed by factoring out the common code to
scripts/module-common.c
- Separate device tree build rules into scripts/Makefile.dtbs
- Add a new script to generate modules.builtin.ranges, which is useful
for tracing tools to find symbols in built-in modules
- Refactor Kconfig and misc tools
- Update Kbuild and Kconfig documentation
* tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
kbuild: doc: replace "gcc" in external module description
kbuild: doc: describe the -C option precisely for external module builds
kbuild: doc: remove the description about shipped files
kbuild: doc: drop section numbering, use references in modules.rst
kbuild: doc: throw out the local table of contents in modules.rst
kbuild: doc: remove outdated description of the limitation on -I usage
kbuild: doc: remove description about grepping CONFIG options
kbuild: doc: update the description about Kbuild/Makefile split
kbuild: remove unnecessary export of RUST_LIB_SRC
kbuild: remove append operation on cmd_ld_ko_o
kconfig: cache expression values
kconfig: use hash table to reuse expressions
kconfig: refactor expr_eliminate_dups()
kconfig: add comments to expression transformations
kconfig: change some expr_*() functions to bool
scripts: move hash function from scripts/kconfig/ to scripts/include/
kallsyms: change overflow variable to bool type
kallsyms: squash output_address()
kbuild: add install target for modules.builtin.ranges
scripts: add verifier script for builtin module range data
...
This cpupower fixes update consists fix to raw_pylibcpupower.i being
removed by "make mrproper". "*.i", "*.o" files are generated during
kernel compile and removed when the repo is cleaned by mrproper.
The file is renamed to use .swg extension instead to avoid the problem.
The second patch removes references to raw_pylibcpupower.i from .gitignore.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmbxyG0ACgkQCwJExA0N
Qxxf4RAAt7Ow3RePcze5fBsl2GkIGDw3rWGxmr0Qbh/YesD/o/eOtykqe4RobqqX
v1VSwbdJEquD4Ll4i3i4np/82zWK9YOAWRcXt9x8HfecfSFhMY0M76+wghhS+zWK
b0V+y4xiELcmaZGiBaHwfbazHmpwpYHukzZSjK/ll0u1umCIPPVSvVtI7240IQ4b
MAI3K/wIvHyQgnkVaqdVzqhRxitIwXuy2hw0aNzwJ66FM7EwvuBDazKhPQU355hH
6eD0GRs7ffcLdG9cLeKd/BewIv7Fc6pkgUZ5JYmmehYGZUPvhvAgoxqwD0FOs99s
bt5YWrIBx45psR8PJiVtL3ZJPNqNQt9LRsHD0ANKGk+yMZoKoeEUP367hGy48ZPi
1HdMnn9aMVm/mP7wuL9nre8rV+NFooFC/hkwyBeWIQMSFI9m4KY8ifDc7Dl+r4N2
5oX+4QKoXoNQEiPN4Ap11dhUqaxFunjcN1zjMCE63cLLzfFBIDRypAnpuypbajca
DQZz4dcxSoNBxl0ylwiqyYfAfUoPfffT2np+ZawnBGtwGFf8gKnCuEeMjwXJg0Ym
Cgp4mjp+xyQgwOVS0HSQ3jGrTLfLJeSUxMugc+6Mao6v8U6cMSUxNA8Le6iBHjrA
YQFXCj1qeoD/M7LhMIseBsACmd4JwVZgoO3V+eS45LICxEgHYPY=
=cwcN
-----END PGP SIGNATURE-----
Merge tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Pull cpupower updates from Shuah Khan
"The 'raw_pylibcpupower.i' file was being removed by "make mrproper".
That was because '*.i', '.s' and '*.o' files are generated during
kernel compile and removed when the repo is cleaned by mrproper.
Rename it to use .swg extension instead to avoid the problem.
A second patch removes references to it from .gitignore"
* tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
pm: cpupower: Clean up bindings gitignore
pm: cpupower: rename raw_pylibcpupower.i
Core:
- allow adjusting first broadcast address speed
Drivers:
- cdns: few fixes
- mipi-i3c-hci: Add AMD SoC I3C controller support and quirks, fix get_i3c_mode
- svc: adjust rates, fix race condition
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEBqsFVZXh8s/0O5JiY6TcMGxwOjIFAmbx6h4ACgkQY6TcMGxw
OjL1qw/9HTddkWARFkD7eS5/GuzBdsXlaUcAZZHgbu8AGLVAMJbmB8MPSYpkP1dj
1MDBXBgUwNjP8/Pbn3il3hvOcRje7SsvtHivYho+OA2mnLBxKIJUBfSPswiW9Y00
j2FtJCfBFms5+UC4fX2SywSgkEWp5a7RCOW2d69nwYqkoWlpSwIRRoVerFXKyCto
iCCdos/KyvblD2tMT+fWtSHQRmPds9Fytl9RwHdCKRmlGi9qONuqJK2twlUnE164
bDk7C5JZuadX4l1pwOINlAOcpljEdANlSPq+5zmvGPqlSMMwFavhLEUPC2gJqOwQ
bZXcWJs12vn28ZDeob0NDHIOJvLOrMDKWOtCQoxPMDfd00QO6+HI0vG5S7+yh+Q0
B/UDU+PwR48OncVys4P4kvu82NU9OeBqeg6QK1KLdKKQpw9K5zNca2qTlHX3ZYKd
fQGu1G2K7FcZFi4WduKzUXbo7dIv69hi1uj4gbXii79LiOaZdtCAnYKn1qX7aeWM
bwutMqLcug8vxqRxIR+23Ki7QLr55TPzj3YCGQsPPnmKAnkHJeEUiCtPJ5Ymytb8
GWzW0PVEdsFg1LB9Ei1K41/JICx62h/U0X2OdJnYx4JIij0e8GcoItYVsKsq+mG0
f7g2kBZDMDcAzaoHl7lgW5gBB8NnFmC0+74pqnjrBiSfzWkNFcM=
=jSyb
-----END PGP SIGNATURE-----
Merge tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"This adds support for the I3C HCI controller of the AMD SoC which as
expected requires quirks. Also fixes for the other drivers, including
rate selection fixes for svc.
Core:
- allow adjusting first broadcast address speed
Drivers:
- cdns: few fixes
- mipi-i3c-hci: Add AMD SoC I3C controller support and quirks, fix
get_i3c_mode
- svc: adjust rates, fix race condition"
* tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: svc: Fix use after free vulnerability in svc_i3c_master Driver Due to Race Condition
i3c: master: cdns: Fix use after free vulnerability in cdns_i3c_master Driver Due to Race Condition
i3c: master: svc: adjust SDR according to i3c spec
i3c: master: svc: use slow speed for first broadcast address
i3c: master: support to adjust first broadcast address speed
i3c/master: cmd_v1: Fix the rule for getting i3c mode
i3c: master: cdns: fix module autoloading
i3c: mipi-i3c-hci: Add a quirk to set Response buffer threshold
i3c: mipi-i3c-hci: Add a quirk to set timing parameters
i3c: mipi-i3c-hci: Relocate helper macros to HCI header file
i3c: mipi-i3c-hci: Add a quirk to set PIO mode
i3c: mipi-i3c-hci: Read HC_CONTROL_PIO_MODE only after i3c hci v1.1
i3c: mipi-i3c-hci: Add AMDI5017 ACPI ID to the I3C Support List
The TI_K3_M4_REMOTEPROC Kconfig entry selects OMAP2PLUS_MBOX, but that
driver in turn depends on other things, which the k4-m4 driver didn't.
This causes a Kconfig time warning:
WARNING: unmet direct dependencies detected for OMAP2PLUS_MBOX
Depends on [n]: MAILBOX [=y] && (ARCH_OMAP2PLUS || ARCH_K3)
Selected by [m]:
- TI_K3_M4_REMOTEPROC [=m] && REMOTEPROC [=y] && (ARCH_K3 || COMPILE_TEST [=y])
because you can't select something that is unavailable.
Make the dependencies for TI_K3_M4_REMOTEPROC match those of the
OMAP2PLUS_MBOX driver that it needs.
Fixes: ebcf9008a8 ("remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem")
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Martyn Welch <martyn.welch@collabora.com>
Cc: Hari Nagalla <hnagalla@ti.com>
Cc: Andrew Davis <afd@ti.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- support for PixArt PS/2 touchpad
- updates to tsc2004/5, usbtouchscreen, and zforce_ts drivers
- support for GPIO-only mode for ADP55888 controller
- support for touch keys in Zinitix driver
- support for querying density of Synaptics sensors
- sysfs interface for Goodex "Berlin" devices to read and write touch IC
registers
- more quirks to i8042 to handle various Tuxedo laptops
- a number of drivers have been converted to using "guard" notation
when acquiring various locks, as well as using other cleanup functions
to simplify releasing of resources (with more drivers to follow)
- evdev will limit amount of data that can be written into an evdev
instance at a given time to 4096 bytes (170 input events) to avoid
holding evdev->mutex for too long and starving other users
- Spitz has been converted to use software nodes/properties to describe
its matrix keypad and GPIO-connected LEDs
- msc5000_ts, msc_touchkey and keypad-nomadik-ske drivers have been
removed since noone in mainline have been using them
- other assorted cleanups and fixes.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCZvGnWgAKCRBAj56VGEWX
nDjGAQCHcRk1icJlyjx3KA85gdILriDB9zMNJnCnbYXrSfGbOQD+LZNjO3po26zB
wloLEYTKRu3E/oWEI8VcfIN2m89vxw4=
=bDsb
-----END PGP SIGNATURE-----
Merge tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- support for PixArt PS/2 touchpad
- updates to tsc2004/5, usbtouchscreen, and zforce_ts drivers
- support for GPIO-only mode for ADP55888 controller
- support for touch keys in Zinitix driver
- support for querying density of Synaptics sensors
- sysfs interface for Goodex "Berlin" devices to read and write touch
IC registers
- more quirks to i8042 to handle various Tuxedo laptops
- a number of drivers have been converted to using "guard" notation
when acquiring various locks, as well as using other cleanup
functions to simplify releasing of resources (with more drivers to
follow)
- evdev will limit amount of data that can be written into an evdev
instance at a given time to 4096 bytes (170 input events) to avoid
holding evdev->mutex for too long and starving other users
- Spitz has been converted to use software nodes/properties to describe
its matrix keypad and GPIO-connected LEDs
- msc5000_ts, msc_touchkey and keypad-nomadik-ske drivers have been
removed since noone in mainline have been using them
- other assorted cleanups and fixes
* tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (98 commits)
ARM: spitz: fix compile error when matrix keypad driver is enabled
Input: hynitron_cstxxx - drop explicit initialization of struct i2c_device_id::driver_data to 0
Input: adp5588-keys - fix check on return code
Input: Convert comma to semicolon
Input: i8042 - add TUXEDO Stellaris 15 Slim Gen6 AMD to i8042 quirk table
Input: i8042 - add another board name for TUXEDO Stellaris Gen5 AMD line
Input: tegra-kbc - use of_property_read_variable_u32_array() and of_property_present()
Input: ps2-gpio - use IRQF_NO_AUTOEN flag in request_irq()
Input: ims-pcu - fix calling interruptible mutex
Input: zforce_ts - switch to using asynchronous probing
Input: zforce_ts - remove assert/deassert wrappers
Input: zforce_ts - do not hardcode interrupt level
Input: zforce_ts - switch to using devm_regulator_get_enable()
Input: zforce_ts - stop treating VDD regulator as optional
Input: zforce_ts - make zforce_idtable constant
Input: zforce_ts - use dev_err_probe() where appropriate
Input: zforce_ts - do not ignore errors when acquiring regulator
Input: zforce_ts - make parsing of contacts less confusing
Input: zforce_ts - switch to using get_unaligned_le16
Input: zforce_ts - use guard notation when acquiring mutexes
...
This converts the Spreadtrum hardware spinlock DeviceTree binding to
YAML, to allow validation of related DeviceTree source.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmbyReoVHGFuZGVyc3Nv
bkBrZXJuZWwub3JnAAoJEAsfOT8Nma3FGJkP/jMfaXkO+QMtt2jJqOOC/ebaviEP
k8EILUy7EhYnfkJdzdaUKgN6M+xSIClRssmoJq0xUgs7LNq9mqViBw3ewxpP8JQe
ychmnqogzbRHnKaH1QWxqeE1PfEUCBqTcT7c+3L2QDDnhjIIViObMarbwBJ6sFnf
Yb9nG0gOG26W81eG32FRQ+Z6+Tnf6vLT9PjgYZ2UbG01onZuAG1KKhcQrckbPt3p
jpTw/0gfJMIauuFtC5ouhVMeWnA9ksstRogOd2gjjd5jUIdC/7Ff3IkAJ04aqpSy
t8NphBGOHqageyVczDt4kjmgQti4bL+EemgJ5HTkzQ9PFFJJ8bwueRpodJGW4I7s
/SAxEfGFSlaB1Ll/dX9drsARir2CRT/nJPWcSKQL7G9/5Cq5ywfQsmN3NPZjlrQH
cX37leOK1Xpw1TN60kWccZKn2/BIuDY7XLnyVezj/2tdmJhvP7+05G09qZVoHxGQ
MDmHz0htENW+yGcPfVtGvj92vBsXpVmwUo6VXp7Cu/1U64rzzKa/TAm0QBZhLYTX
1GaQb5STMvbUx0l8KBAN1taKhfaeEbxF/SMbxKgIjCTpybhCj6q787QvBT3BICh4
jDRMeCM5PpXFIrFDA+oyMjqx9aXRxgj9O5c3HRvPjxDbMKyKAW2Twf99SZTn6StZ
2JXQcFR117EFZYt1
=DyIB
-----END PGP SIGNATURE-----
Merge tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull hwspinlock update from Bjorn Andersson:
"This converts the Spreadtrum hardware spinlock DeviceTree binding to
YAML, to allow validation of related DeviceTree source"
* tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
dt-bindings: hwlock: sprd-hwspinlock: convert to YAML
This performs minor cleanup/refactor of the Qualcomm GLINK code, in
order to then add trace events related to the messages exchange with the
remote side, useful for debugging a range of interoperability issues.
The nested structs with flexible array members, in the same driver, is
then rewritten to avoid the risk of invalid accesses.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmbyRXAVHGFuZGVyc3Nv
bkBrZXJuZWwub3JnAAoJEAsfOT8Nma3FCmwQAK4rnPNCnJsPoZDQx41JoWwIwRMu
Xn3CBwAN24ysfSYusqQrtYtru6KIVg3B1F0Qu/bDCmRG9rq0jGgnUDK368OX9Tpm
P1rZTCqvVe8iZms69uz/aJN+3ZAjUH6kJlrTlCRuSoDFGirohjqIUUYQylGvNFQH
E27wAVNH7W8ahXOZP94T2w39rIN4+EB5G8BTZmdFXRW4V1DPYoxcoAc9a9AZqsea
mPkvIpF0aKHMD8NnPjEutMLC51RCTb6IrqJknmd1IhREVai0d4pc3UlwU/NpIMWO
pDu4EJbYzFSj1wkgPeaaQ05fQ9mCMM9y4fXKJFSKqX50Iq9HCpr7NR2XYiv+mGLJ
F7tMhmu0sfOGg8t8Zzh92I2WoTiQs0JdikZ+CJo2I5ZkNCPbGUe/DvZfyUE3SEjt
SRhgBzhyZ+aREzzmbkwfwmEtJLahgJDvC9kj9EUX9wRdE4gG9Uwb2YpUU6pguKTX
XhKNUey19KVUZjBdcOL4eau7FrZiP7D8cYrwH870t027gdNahPD8DsxC8l52NlNV
0IST3gIRZneiOEW0OA/5i8O6SHJaJ5A0aOpzm0uMiiXu7cL+3iW9PwtwywsjObGX
QNjtNtHfxuXFK3P8UlVfyW1MvmF7aA7E4Y3v43yThITUfD+BDC19+4e2jjCbLt40
vfxUkuJSMmuHWCr2
=hQET
-----END PGP SIGNATURE-----
Merge tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull rpmsg updates from Bjorn Andersson:
- Minor cleanup/refactor to the Qualcomm GLINK code, in order to add
trace events related to the messages exchange with the remote side,
useful for debugging a range of interoperability issues
- Rewrite the nested structs with flexible array members in order to
avoid the risk of invalid accesses
* tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
rpmsg: glink: Avoid -Wflex-array-member-not-at-end warnings
rpmsg: glink: Introduce packet tracepoints
rpmsg: glink: Pass channel to qcom_glink_send_close_ack()
rpmsg: glink: Tidy up RX advance handling