linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-23 04:31:50 +00:00


Tejun Heo 8639ecebc9 workqueue: Implement non-strict affinity scope for unbound workqueues

An unbound workqueue can be served by multiple worker_pools to improve locality. The segmentation is achieved by grouping CPUs into pods. By default, the cache boundaries according to cpus_share_cache() define how the CPUs are grouped. Let's say a workqueue is allowed to run on all CPUs and the system has two L3 caches. The workqueue would be mapped to two worker_pools, each serving one L3 cache domain.

While this improves locality, the pod boundaries are strict, so it limits the total bandwidth a given issuer can consume. For example, let's say there is a thread pinned to a CPU issuing enough work items to saturate the whole machine. With the machine segmented into two pods, no matter how many work items it issues, it can only use half of the CPUs on the system.

While this limitation has existed for a very long time, it wasn't very pronounced because the affinity grouping used to be always by NUMA nodes. With cache boundaries as the default and support for even finer-grained scopes (smt and cpu), it is now a much more pressing problem.

This patch implements a non-strict affinity scope where the pod boundaries aren't enforced strictly. Going back to the previous example, the workqueue would still be mapped to two worker_pools; however, the affinity enforcement would be soft. The workers in both pools would have their cpus_allowed set to the whole machine, allowing the scheduler to migrate them anywhere on the machine. However, whenever an idle worker is woken up, the workqueue code asks the scheduler to bring the task back within the pod if the worker is outside. That is, work items start executing within their affinity scope but can be migrated outside as the scheduler sees fit. This removes the hard cap on utilization while maintaining the benefits of affinity scopes.

After the earlier ->__pod_cpumask changes, the implementation is pretty simple.
When non-strict, which is the new default:

* pool_allowed_cpus() returns @pool->attrs->cpumask instead of ->__pod_cpumask so that the workers are allowed to run on any CPU that the associated workqueues allow.

* If the idle worker task's ->wake_cpu is outside the pod, kick_pool() sets the field to a CPU within the pod.

This would be the first use of task_struct->wake_cpu outside the scheduler proper, so it isn't clear whether this would be acceptable. However, other methods of migrating tasks are significantly more expensive and are likely prohibitively so if we want to do this on every work item. This needs discussion with scheduler folks.

There is also a race window where setting ->wake_cpu wouldn't be effective because the target task is still on CPU. However, the window is pretty small and, this being a best-effort optimization, it doesn't seem to warrant more complexity at the moment.

While the non-strict cache affinity scopes seem to be the best option, the performance picture interacts with the affinity scope and is a bit complicated to fully discuss in this patch. The behavior is therefore made easily selectable through wqattrs and sysfs, and the next patch will add documentation to discuss performance implications.

v2: pool->attrs->affn_strict is set to true for per-cpu worker_pools.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>		2023-08-07 15:57:25 -10:00
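The strict and non-strict cases can be illustrated with a small userspace C model. It is a sketch, not the code in workqueue.c. Everything in it is assumed for illustration: the `toy_` names, an 8-CPU machine split into two 4-CPU pods, and plain `unsigned long` bitmasks standing in for struct cpumask, the pool's attrs and task_struct->wake_cpu. The model covers the two changes listed above. First, the allowed-CPU mask falls back to the workqueue's whole cpumask when the pool is non-strict. Second, an idle worker found outside its pod is steered back before wakeup. It also reproduces the "half of the CPUs" arithmetic from the pinned-issuer example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed toy topology: pod 0 = CPUs 0-3, pod 1 = CPUs 4-7. */
#define TOY_NR_CPUS   8
#define TOY_POD0_MASK 0x0FUL   /* CPUs 0-3 */
#define TOY_ALL_MASK  0xFFUL   /* the workqueue may run on every CPU */

/* Hypothetical stand-in for a worker_pool's attrs (only the fields
 * the commit message mentions). */
struct toy_pool {
	unsigned long pod_cpumask; /* models attrs->__pod_cpumask */
	unsigned long cpumask;     /* models attrs->cpumask */
	bool affn_strict;          /* models attrs->affn_strict */
};

/* Hypothetical stand-in for the idle worker's task_struct. */
struct toy_task {
	int wake_cpu;              /* models task_struct->wake_cpu */
};

static int toy_popcount(unsigned long mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1) /* clear the lowest set bit */
		n++;
	return n;
}

static bool toy_cpu_in(unsigned long mask, int cpu)
{
	return (mask >> cpu) & 1UL;
}

static int toy_first_cpu(unsigned long mask)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpu_in(mask, cpu))
			return cpu;
	return -1;
}

/* Strict: workers are confined to the pod. Non-strict: workers may run
 * anywhere the workqueue allows (the described pool_allowed_cpus()). */
static unsigned long toy_pool_allowed_cpus(const struct toy_pool *pool)
{
	return pool->affn_strict ? pool->pod_cpumask : pool->cpumask;
}

/* Before waking an idle worker in non-strict mode, point its wake_cpu
 * back into the pod if it is outside. Choosing the lowest CPU of the
 * pod is a simplification; the kernel may pick a different CPU. */
static void toy_kick_pool(const struct toy_pool *pool, struct toy_task *idle)
{
	assert(pool->pod_cpumask != 0);
	assert(idle->wake_cpu >= 0 && idle->wake_cpu < TOY_NR_CPUS);
	if (!pool->affn_strict &&
	    !toy_cpu_in(pool->pod_cpumask, idle->wake_cpu))
		idle->wake_cpu = toy_first_cpu(pool->pod_cpumask);
}

/* Upper bound on the CPUs that work issued into this pool can occupy. */
static int toy_max_usable_cpus(const struct toy_pool *pool)
{
	return toy_popcount(toy_pool_allowed_cpus(pool));
}
```

Take a pool whose pod is CPUs 0-3 on a workqueue allowed everywhere. `toy_max_usable_cpus()` gives 4 of 8 CPUs when the pool is strict, which is the hard cap a pinned issuer hits, and 8 when it is non-strict. Now take an idle worker that the scheduler left on CPU 6. In the non-strict case `toy_kick_pool()` steers it back to CPU 0. In the strict case the model leaves `wake_cpu` alone, because a strict worker would already be confined to the pod.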
bpf	bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()	2023-07-03 18:48:09 +02:00
cgroup	- Yosry Ahmed brought back some cgroup v1 stats in OOM logs.	2023-06-28 10:28:11 -07:00
configs	mm/slab: rename CONFIG_SLAB to CONFIG_SLAB_DEPRECATED	2023-05-26 19:01:47 +02:00
debug	kdb: move kdb_send_sig() declaration to a better header file	2023-07-03 09:27:12 +01:00
dma	dma-mapping fixes for Linux 6.5	2023-07-09 10:24:22 -07:00
entry	ptrace: Provide set/get interface for syscall user dispatch	2023-04-16 14:23:07 +02:00
events	cxl for v6.5	2023-07-01 08:58:41 -07:00
futex	- Prevent the leaking of a debug timer in futex_waitv()	2023-01-01 11:15:05 -08:00
gcov	gcov: add support for checksum field	2022-12-21 14:31:52 -08:00
irq	irqdomain: Use return value of strreplace()	2023-06-30 11:13:44 +02:00
kcsan	kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures	2023-06-09 23:29:50 +10:00
livepatch	livepatch: Make 'klp_stack_entries' static	2023-06-05 13:56:52 +02:00
locking	- Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in	2023-06-28 10:59:38 -07:00
module	module: fix init_module_from_file() error handling	2023-07-04 10:17:11 -07:00
power	- Yosry Ahmed brought back some cgroup v1 stats in OOM logs.	2023-06-28 10:28:11 -07:00
printk	seqlock/latch: Provide raw_read_seqcount_latch_retry()	2023-06-05 21:11:03 +02:00
rcu	Merge branches 'doc.2023.05.10a', 'fixes.2023.05.11a', 'kvfree.2023.05.10a', 'nocb.2023.05.11a', 'rcu-tasks.2023.05.10a', 'torture.2023.05.15a' and 'rcu-urgent.2023.06.06a' into HEAD	2023-06-07 13:44:06 -07:00
sched	cgroup: Changes for v6.5	2023-06-27 16:54:21 -07:00
time	hardening updates for v6.5-rc1	2023-06-27 21:24:18 -07:00
trace	Tracing fixes for 6.5:	2023-07-06 19:07:15 -07:00
.gitignore
acct.c	acct: fix potential integer overflow in encode_comp_t()	2022-11-30 16:13:18 -08:00
async.c
audit_fsnotify.c	audit: fix potential double free on error path from fsnotify_add_inode_mark	2022-08-22 18:50:06 -04:00
audit_tree.c
audit_watch.c	audit_init_parent(): constify path	2022-09-01 17:39:30 -04:00
audit.c	audit: use time_after to compare time	2022-08-29 19:47:03 -04:00
audit.h	audit: avoid missing-prototype warnings	2023-05-17 11:34:55 -04:00
auditfilter.c
auditsc.c	capability: just use a 'u64' instead of a 'u32[2]' array	2023-03-01 10:01:22 -08:00
backtracetest.c
bounds.c	mm: multi-gen LRU: minimal implementation	2022-09-26 19:46:09 -07:00
capability.c	capability: fix kernel-doc warnings in capability.c	2023-05-22 14:30:52 -04:00
cfi.c	cfi: Switch to -fsanitize=kcfi	2022-09-26 10:13:13 -07:00
compat.c	sched_getaffinity: don't assume 'cpumask_size()' is fully initialized	2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c	locking/atomic: treewide: use raw_atomic*_<op>()	2023-06-05 09:57:20 +02:00
cpu_pm.c	cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()	2023-01-13 11:48:15 +01:00
cpu.c	cpu/hotplug: Fix off by one in cpuhp_bringup_mask()	2023-05-23 18:06:40 +02:00
crash_core.c	mm, treewide: redefine MAX_ORDER sanely	2023-04-05 19:42:46 -07:00
crash_dump.c
cred.c	cred: Do not default to init_cred in prepare_kernel_cred()	2022-11-01 10:04:52 -07:00
delayacct.c	delayacct: track delays from IRQ/SOFTIRQ	2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c	fork, vhost: Use CLONE_THREAD to fix freezer/ps regression	2023-06-01 17:15:33 -04:00
extable.c	context_tracking: Take NMI eqs entrypoints over RCU	2022-07-05 13:32:59 -07:00
fail_function.c	kernel/fail_function: fix memory leak with using debugfs_lookup()	2023-02-08 13:36:22 +01:00
fork.c	fork: lock VMAs of the parent process when forking	2023-07-08 14:08:02 -07:00
freezer.c	freezer,sched: Rewrite core freezer logic	2022-09-07 21:53:50 +02:00
gen_kheaders.sh	Revert "kheaders: substituting --sort in archive creation"	2023-05-28 16:20:21 +09:00
groups.c	security: Add LSM hook to setgroups() syscall	2022-07-15 18:21:49 +00:00
hung_task.c	kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static	2023-04-08 13:45:37 -07:00
iomem.c
irq_work.c	trace: Add trace_ipi_send_cpu()	2023-03-24 11:01:29 +01:00
jump_label.c	jump_label: Prevent key->enabled int overflow	2022-12-01 15:53:05 -08:00
kallsyms_internal.h	kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[]	2022-11-12 18:47:36 -08:00
kallsyms_selftest.c	kallsyms: Delete an unused parameter related to {module_}kallsyms_on_each_symbol()	2023-03-19 13:27:19 -07:00
kallsyms_selftest.h	kallsyms: Add self-test facility	2022-11-15 00:42:02 -08:00
kallsyms.c	v6.5-rc1-modules-next	2023-06-28 15:51:08 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c	kcov: add prototypes for helper functions	2023-06-09 17:44:17 -07:00
kexec_core.c	kexec: enable kexec_crash_size to support two crash kernel regions	2023-06-09 17:44:24 -07:00
kexec_elf.c
kexec_file.c	- Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in	2023-06-28 10:59:38 -07:00
kexec_internal.h	panic, kexec: make __crash_kexec() NMI safe	2022-09-11 21:55:06 -07:00
kexec.c	kexec: introduce sysctl parameters kexec_load_limit_*	2023-02-02 22:50:05 -08:00
kheaders.c	kheaders: Use array declaration instead of char	2023-03-24 20:10:59 -07:00
kprobes.c	fprobe: Pass return address to the handlers	2023-06-06 21:39:55 +09:00
ksyms_common.c	kallsyms: make kallsyms_show_value() as generic function	2023-06-08 12:27:20 -07:00
ksysfs.c	kernel/ksysfs.c: use sysfs_emit for sysfs show handlers	2023-03-24 17:09:14 +01:00
kthread.c	- Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in	2023-06-28 10:59:38 -07:00
latencytop.c	latencytop: use the last element of latency_record of system	2022-09-11 21:55:12 -07:00
Makefile	v6.5-rc1-modules-next	2023-06-28 15:51:08 -07:00
module_signature.c
notifier.c	notifiers: add tracepoints to the notifiers infrastructure	2023-04-08 13:45:38 -07:00
nsproxy.c	convert setns(2) to fdget()/fdput()	2023-04-20 22:55:35 -04:00
padata.c	padata: use alignment when calculating the number of worker threads	2023-03-14 17:06:44 +08:00
panic.c	panic: hide unused global functions	2023-06-09 17:44:15 -07:00
params.c	kallsyms: Replace all non-returning strlcpy with strscpy	2023-06-14 12:27:38 -07:00
pid_namespace.c	pid: use struct_size_t() helper	2023-07-01 08:26:23 -07:00
pid_sysctl.h	kernel: pid_namespace: remove unused set_memfd_noexec_scope()	2023-06-19 16:19:28 -07:00
pid.c	pid: use struct_size_t() helper	2023-07-01 08:26:23 -07:00
profile.c	kernel/profile.c: simplify duplicated code in profile_setup()	2022-09-11 21:55:12 -07:00
ptrace.c	ptrace: Provide set/get interface for syscall user dispatch	2023-04-16 14:23:07 +02:00
range.c
reboot.c	kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode	2022-10-04 15:59:36 +02:00
regset.c
relay.c	relayfs: fix out-of-bounds access in relay_file_read	2023-05-02 17:23:27 -07:00
resource_kunit.c
resource.c	dax/kmem: Fix leak of memory-hotplug resources	2023-02-17 14:58:01 -08:00
rseq.c	rseq: Extend struct rseq with per-memory-map concurrency ID	2022-12-27 12:52:12 +01:00
scftorture.c
scs.c	scs: add support for dynamic shadow call stacks	2022-11-09 18:06:35 +00:00
seccomp.c	seccomp: simplify sysctls with register_sysctl_init()	2023-04-13 11:49:20 -07:00
signal.c	v6.5-rc1-sysctl-next	2023-06-28 16:05:21 -07:00
smp.c	trace,smp: Add tracepoints for scheduling remotely called functions	2023-06-16 22:08:09 +02:00
smpboot.c	cpu/hotplug: Remove unused state functions	2023-05-15 13:45:00 +02:00
smpboot.h
softirq.c	Revert "softirq: Let ksoftirqd do its job"	2023-05-09 21:50:27 +02:00
stackleak.c	stackleak: allow to specify arch specific stackleak poison function	2023-04-20 11:36:35 +02:00
stacktrace.c
static_call_inline.c	static_call: Add call depth tracking support	2022-10-17 16:41:16 +02:00
static_call.c
stop_machine.c	Scheduler changes in this cycle were:	2022-05-24 11:11:13 -07:00
sys_ni.c	asm-generic updates for 6.5	2023-07-06 10:06:04 -07:00
sys.c	riscv: Add prctl controls for userspace vector management	2023-06-08 07:16:53 -07:00
sysctl-test.c	kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred}	2022-09-08 16:56:45 -07:00
sysctl.c	v6.5-rc1-sysctl-next	2023-06-28 16:05:21 -07:00
task_work.c	task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run	2022-09-11 21:55:10 -07:00
taskstats.c	genetlink: start to validate reserved header bytes	2022-08-29 12:47:15 +01:00
torture.c	torture: Fix hang during kthread shutdown phase	2023-01-05 12:10:35 -08:00
tracepoint.c	tracepoint: Allow livepatch module add trace event	2023-02-18 14:34:36 -05:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c	sysctl: fix unused proc_cap_handler() function warning	2023-06-29 15:19:43 -07:00
up.c
user_namespace.c	userns: fix a struct's kernel-doc notation	2023-02-02 22:50:04 -08:00
user-return-notifier.c
user.c	kernel/user: Allow user_struct::locked_vm to be usable for iommufd	2022-11-30 20:16:49 -04:00
usermode_driver.c
utsname_sysctl.c	utsname: simplify one-level sysctl registration for uts_kern_table	2023-04-13 11:49:35 -07:00
utsname.c
vhost_task.c	vhost: Fix worker hangs due to missed wake up calls	2023-06-08 15:43:09 -04:00
watch_queue.c	watch_queue: prevent dangling pipe pointer	2023-06-06 10:47:04 +02:00
watchdog_buddy.c	watchdog/hardlockup: move SMP barriers from common code to buddy code	2023-06-19 16:25:28 -07:00
watchdog_perf.c	watchdog/perf: add a weak function for an arch to detect if perf can use NMIs	2023-06-09 17:44:21 -07:00
watchdog.c	watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64	2023-06-19 16:25:29 -07:00
workqueue_internal.h	workqueue: Drop the special locking rule for worker->flags and worker_pool->flags	2023-08-07 15:57:22 -10:00
workqueue.c	workqueue: Implement non-strict affinity scope for unbound workqueues	2023-08-07 15:57:25 -10:00