linux/arch/ia64/kernel
Linus Torvalds 596ff4a09b cpumask: re-introduce constant-sized cpumask optimizations
Commit aa47a7c215 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
in the cpumask operations potentially becoming hugely less efficient,
because suddenly the cpumask was always considered to be variable-sized.

The optimization was then later added back in a limited form by commit
6f9c07be9d ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
FORCE_NR_CPUS option is not useful in a generic kernel and more of a
special case for embedded situations with fixed hardware.

Instead, just re-introduce the optimization, with some changes.

Instead of depending on CPUMASK_OFFSTACK being false, and then always
using the full constant cpumask width, this introduces three different
cpumask "sizes":

 - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.

   This is used for situations where we should use the exact size.

 - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
   fits in a single word and the bitmap operations thus end up able
   to trigger the "small_const_nbits()" optimizations.

   This is used for the operations that have optimized single-word
   cases that get inlined, notably the bit find and scanning functions.

 - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
   is a sufficiently small constant that makes simple "copy" and
   "clear" operations more efficient.

   This is arbitrarily set at four words or less.
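
The selection rules above can be sketched in plain userspace C.  This is
only an illustration, not the kernel's cpumask.h: the helper names
pick_small_bits() and pick_large_bits() are hypothetical, and the
compile-time NR_CPUS value is passed in as an ordinary argument so the
rule can be exercised for several values.

```c
#include <limits.h>

/* Stand-in for the kernel macro of the same name (userspace approximation). */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical helper: the "small" size is the NR_CPUS constant only when
 * the whole mask fits in a single word; otherwise fall back to the
 * run-time nr_cpu_ids value.
 */
static unsigned int pick_small_bits(unsigned int nr_cpus, unsigned int nr_cpu_ids)
{
	return nr_cpus <= BITS_PER_LONG ? nr_cpus : nr_cpu_ids;
}

/*
 * Hypothetical helper: the "large" size is the NR_CPUS constant when the
 * mask is four words or less; otherwise fall back to nr_cpu_ids.
 */
static unsigned int pick_large_bits(unsigned int nr_cpus, unsigned int nr_cpu_ids)
{
	return nr_cpus <= 4 * BITS_PER_LONG ? nr_cpus : nr_cpu_ids;
}
```

The exact size needs no helper in this sketch: it is always nr_cpu_ids.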

As an example of this situation, without this fixed size optimization,
cpumask_clear() will generate code like

        movl    nr_cpu_ids(%rip), %edx
        addq    $63, %rdx
        shrq    $3, %rdx
        andl    $-8, %edx
        callq   memset@PLT

on x86-64, because it would calculate the "exact" number of longwords
that need to be cleared.

In contrast, with this patch, using an NR_CPUS of 64 (which is quite a
reasonable value to use), the above becomes a single

        movq    $0,cpumask

instruction instead, because instead of caring to figure out exactly how
many CPUs the system has, it just knows that the cpumask will be a
single word and can just clear it all.
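
The difference between the two code sequences can be seen in a small
userspace sketch.  The struct mask type and the functions clear_exact()
and clear_one_word() are hypothetical names for illustration, not the
kernel's cpumask_clear(); the point is only that a run-time bit count
forces a length computation, while a known single-word mask is one store.

```c
#include <limits.h>
#include <string.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define MASK_WORDS       4	/* hypothetical storage size for this sketch */

struct mask {
	unsigned long bits[MASK_WORDS];
};

/*
 * Variable-sized clear: the number of words is derived from a run-time
 * bit count, so the compiler has to emit the rounding arithmetic and a
 * memset() call.
 */
static void clear_exact(struct mask *m, unsigned int nbits)
{
	memset(m->bits, 0, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
}

/*
 * Constant-sized clear: when the mask is known at compile time to be a
 * single word, clearing it is one plain store.
 */
static void clear_one_word(struct mask *m)
{
	m->bits[0] = 0;
}
```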

Note that this does end up tightening the rules a bit from the original
version in another way: operations that set bits in the cpumask are now
limited to the actual nr_cpu_ids limit, whereas we used to do the
nr_cpumask_bits thing almost everywhere in the cpumask code.

But if you just clear bits, or scan for bits, we can use the simpler
compile-time constants.
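
A minimal sketch of why that split is safe, assuming a single-word mask:
as long as every setter honors the run-time limit, no bit at or above
nr_cpu_ids is ever set, so a scan over the full constant width finds the
same bits as a scan over the exact width.  setall_exact() and
first_set_const() are hypothetical names, not kernel functions.

```c
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Setting bits: honor the run-time limit, so that no bit at or above
 * nr_ids is ever set in the word.
 */
static unsigned long setall_exact(unsigned int nr_ids)
{
	return nr_ids >= BITS_PER_LONG ? ~0UL : (1UL << nr_ids) - 1;
}

/*
 * Scanning: walk the whole word using a compile-time bound.  Returns
 * BITS_PER_LONG when no bit is set.
 */
static unsigned int first_set_const(unsigned long word)
{
	for (unsigned int i = 0; i < BITS_PER_LONG; i++)
		if (word & (1UL << i))
			return i;
	return BITS_PER_LONG;
}
```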

In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
which were not useful, and which fundamentally have to be limited to
'nr_cpu_ids'.  Better remove them now than have somebody introduce use
of them later.

Of course, on x86-64 with MAXSMP there is no sane small compile-time
constant for the cpumask sizes, and we end up using the actual CPU bits,
and will generate the above kind of horrors regardless.  Please don't
use MAXSMP unless you really expect to have machines with thousands of
cores.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-05 14:30:34 -08:00
..
syscalls ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency 2022-09-11 21:55:07 -07:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
acpi-ext.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
acpi.c cpumask: re-introduce constant-sized cpumask optimizations 2023-03-05 14:30:34 -08:00
asm-offsets.c ia64: do not typedef struct pal_min_state_area_s 2021-02-12 05:11:19 +09:00
audit.c audit: add support for the openat2 syscall 2021-10-01 16:52:48 -04:00
brl_emu.c signal: Remove the task parameter from force_sig_fault 2019-05-29 09:31:43 -05:00
crash_dump.c vmcore: convert copy_oldmem_page() to take an iov_iter 2022-04-29 14:37:59 -07:00
crash.c Kbuild updates for v5.12 2021-02-25 10:17:31 -08:00
cyclone.c remove ioremap_nocache and devm_ioremap_nocache 2020-01-06 09:45:59 +01:00
dma-mapping.c dma-mapping: split <linux/dma-mapping.h> 2020-10-06 07:07:03 +02:00
efi_stub.S mm: update legacy flush_tlb_* to use vma 2021-06-29 10:53:52 -07:00
efi.c efi: Drop minimum EFI version check at boot 2023-02-03 18:01:07 +01:00
elfcore.c elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size} 2023-01-05 15:12:12 +00:00
entry.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
entry.S ia64: syscalls: switch to generic syscalltbl.sh 2021-04-25 05:25:40 +09:00
err_inject.c ia64: fix format strings for err_inject 2021-03-25 09:22:55 -07:00
esi_stub.S treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
esi.c efi/ia64: Use existing helpers to locate ESI table 2020-02-23 21:59:42 +01:00
fsys.S arch/ia64/kernel/fsys.S: fix typos 2021-04-30 11:20:34 -07:00
fsyscall_gtod_data.h Stop ia64 being the last holdout using GENERIC_TIME_VSYSCALL_OLD 2017-11-13 12:15:40 -08:00
ftrace.c ftrace: Cleanup ftrace_dyn_arch_init() 2021-10-08 19:41:39 -04:00
gate-data.S
gate.lds.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
gate.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
head.S ia64: drop unused IA64_FW_EMU ifdef 2021-04-30 11:20:35 -07:00
iosapic.c genirq: Add and use an irq_data_update_affinity helper 2022-07-07 09:38:04 +01:00
irq_ia64.c ia64: Remove perfmon 2020-09-11 09:34:32 -07:00
irq_lsapic.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq.c genirq: Add and use an irq_data_update_affinity helper 2022-07-07 09:38:04 +01:00
irq.h ia64: replace setup_irq() by request_irq() 2020-03-13 15:21:28 -07:00
ivt.S mm: reorder includes after introduction of linux/pgtable.h 2020-06-09 09:39:13 -07:00
kprobes.c ia64: replace comments with C99 initializers 2022-04-28 23:17:25 -07:00
machine_kexec.c ia64: drop marked broken DISCONTIGMEM and VIRTUAL_MEM_MAP 2021-04-30 11:20:35 -07:00
Makefile kbuild: remove --include-dir MAKEFLAG from top Makefile 2023-02-05 18:51:22 +09:00
Makefile.gate ia64: require -Wl,--hash-style=sysv 2019-05-18 11:29:01 +09:00
mca_asm.S mm: reorder includes after introduction of linux/pgtable.h 2020-06-09 09:39:13 -07:00
mca_drv_asm.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mca_drv.c exit: Add and use make_task_dead. 2021-12-13 12:04:45 -06:00
mca_drv.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mca.c ia64: mca: use strscpy() is more robust and safer 2022-10-11 18:51:10 -07:00
minstate.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
module.c ia64: Rename 'ip' to 'addr' in 'struct fdesc' 2022-02-16 23:25:11 +11:00
msi_ia64.c genirq: Add and use an irq_data_update_affinity helper 2022-07-07 09:38:04 +01:00
numa.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
pal.S ia64: trivial spelling fixes 2021-04-30 11:20:35 -07:00
palinfo.c ia64: fix typos in comments 2022-04-28 23:17:25 -07:00
patch.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pci-dma.c ia64 for v5.4 - big change here is removal of support for SGI Altix 2019-09-16 15:32:01 -07:00
perfmon_itanium.h arch: ia64: Remove rest of perfmon support 2021-01-22 12:12:20 +05:30
process.c arch/idle: Change arch_cpu_idle() behavior: always exit with IRQs disabled 2023-01-13 11:48:15 +01:00
ptrace.c ia64: ptrace: user_regset_copyin_ignore() always returns 0 2022-11-15 14:30:40 -08:00
relocate_kernel.S mm: reorder includes after introduction of linux/pgtable.h 2020-06-09 09:39:13 -07:00
sal.c locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header 2020-08-06 16:13:13 +02:00
salinfo.c proc: remove PDE_DATA() completely 2022-01-22 08:33:37 +02:00
setup.c ia64: move from strlcpy with unused retval to strscpy 2022-09-11 21:55:09 -07:00
sigframe.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
signal.c resume_user_mode: Move to resume_user_mode.h 2022-03-10 16:51:50 -06:00
smp.c profile: setup_profiling_timer() is moslty not implemented 2022-07-29 18:12:36 -07:00
smpboot.c ia64: cleanup remove_siblinginfo() 2022-06-03 06:52:58 -07:00
stacktrace.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sys_ia64.c ia64: fix build error due to switch case label appearing next to declaration 2023-01-31 16:44:08 -08:00
time.c sched/cputime: Fix IA64 build error of missing arch_vtime_task_switch() prototype 2023-01-11 10:31:57 +01:00
topology.c drivers/base/node: consolidate node device subsystem initialization in node_dev_init() 2022-03-22 15:57:10 -07:00
traps.c ia64: fix typos in comments 2022-04-28 23:17:25 -07:00
unaligned.c ia64: remove CONFIG_SET_FS support 2022-02-25 09:36:06 +01:00
uncached.c mm: use for_each_online_node and node_online instead of open coding 2022-04-29 14:36:58 -07:00
unwind_decoder.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
unwind_i.h ia64: kernel: unwind_i.h: Replace zero-length array with flexible-array 2020-06-15 23:08:31 -05:00
unwind.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
vmlinux.lds.S objtool/idle: Validate __cpuidle code as noinstr 2023-01-13 11:48:15 +01:00