linux/arch/x86/lib
Mateusz Guzik ca96b162bf x86: bring back rep movsq for user access on CPUs without ERMS
Intel CPUs ship with ERMS for over a decade, but this is not true for
AMD.  In particular one reasonably recent uarch (EPYC 7R13) does not
have it (or at least the bit is inactive when running on the Amazon EC2
cloud -- I found rather conflicting information about AMD CPUs vs the
extension).

Hand-rolled mov loops executing in this case are quite pessimal compared
to rep movsq for bigger sizes.  While the upper limit depends on uarch,
everyone is well south of 1KB AFAICS and sizes bigger than that are
common.

While technically ancient CPUs may be suffering from rep usage, gcc has
been emitting it for years all over kernel code, so I don't think this
is a legitimate concern.

Sample result from read1_processes from will-it-scale (4KB reads/s):

  before:   1507021
  after:    1721828 (+14%)

Note that the cutoff point for rep usage is set to 64 bytes, which is
way too conservative but I'm sticking to what was done in 47ee3f1dd9
("x86: re-introduce support for ERMS copies for user space accesses").
That is to say *some* copies will now go slower, which is fixable but
beyond the scope of this patch.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-30 09:45:12 -07:00
..
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
atomic64_32.c
atomic64_386_32.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
atomic64_cx8_32.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
cache-smp.c smp: Remove smp_call_function() and on_each_cpu() return values 2019-06-23 14:26:26 +02:00
checksum_32.S x86/checksum_32: Remove .fixup usage 2021-12-11 09:09:49 +01:00
clear_page_64.S x86: improve on the non-rep 'clear_user' function 2023-04-18 17:05:28 -07:00
cmdline.c x86/lib: Fix compiler and kernel-doc warnings 2023-01-03 18:46:21 +01:00
cmpxchg8b_emu.S percpu: Wire up cmpxchg128 2023-06-05 09:36:37 +02:00
cmpxchg16b_emu.S percpu: Wire up cmpxchg128 2023-06-05 09:36:37 +02:00
copy_mc_64.S x86/copy_mc_64: Remove .fixup usage 2021-12-11 09:09:46 +01:00
copy_mc.c x86, libnvdimm/test: Remove COPY_MC_TEST 2020-10-26 18:08:35 +01:00
copy_page_64.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
copy_user_64.S x86: bring back rep movsq for user access on CPUs without ERMS 2023-08-30 09:45:12 -07:00
copy_user_uncached_64.S x86: rewrite '__copy_user_nocache' function 2023-04-20 18:53:49 -07:00
cpu.c x86/lib/cpu: Address missing prototypes warning 2019-08-08 08:25:53 +02:00
csum-copy_64.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
csum-partial_64.c x86/csum: Fix clang -Wuninitialized in csum_partial() 2023-05-29 06:52:32 -07:00
csum-wrappers_64.c net: unexport csum_and_copy_{from,to}_user 2022-04-29 14:37:59 -07:00
delay.c x86/delay: Fix the wrong asm constraint in delay_loop() 2022-04-05 21:21:57 +02:00
error-inject.c x86/error_inject: Align function properly 2022-10-17 16:40:59 +02:00
getuser.S x86/lib: Make get/put_user() exception handling a visible symbol 2023-06-02 10:51:46 +02:00
hweight.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
inat.c x86/insn: Add a __ignore_sync_check__ marker 2021-03-15 11:00:57 +01:00
insn-eval.c x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type 2023-01-03 18:46:06 +01:00
insn.c x86/insn: Use get_unaligned() instead of memcpy() 2021-10-06 11:56:37 +02:00
iomap_copy_64.S x86/asm: Fix an assembler warning with current binutils 2023-01-03 17:55:11 +01:00
iomem.c x86: kmsan: handle open-coded assembly in lib/iomem.c 2022-10-03 14:03:24 -07:00
kaslr.c x86/kaslr: Fix build warning in KASLR code in boot stub 2022-04-11 09:41:12 +02:00
Makefile percpu: Wire up cmpxchg128 2023-06-05 09:36:37 +02:00
memcpy_32.c x86/mem: Move memmove to out of line assembler 2022-11-01 15:44:07 -07:00
memcpy_64.S x86: don't use REP_GOOD or ERMS for small memory copies 2023-04-18 17:05:28 -07:00
memmove_32.S x86/mem: Move memmove to out of line assembler 2022-11-01 15:44:07 -07:00
memmove_64.S x86/lib/memmove: Decouple ERMS from FSRM 2023-05-10 14:51:56 +02:00
memset_64.S x86: don't use REP_GOOD or ERMS for small memory clearing 2023-04-18 17:05:28 -07:00
misc.c x86/lib: Include <asm/misc.h> to fix a missing prototypes warning at build time 2023-01-03 11:11:03 +01:00
msr-reg-export.c
msr-reg.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
msr-smp.c x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes 2021-03-22 21:37:03 +01:00
msr.c x86/lib/msr: Clean up kernel-doc notation 2023-06-06 15:19:50 +02:00
pc-conf-reg.c x86: Add support for 0x22/0x23 port I/O configuration space 2021-08-10 23:31:43 +02:00
putuser.S x86/lib: Make get/put_user() exception handling a visible symbol 2023-06-02 10:51:46 +02:00
retpoline.S x86/srso: Explain the untraining sequences a bit more 2023-08-16 21:58:59 +02:00
string_32.c lib/string: Move helper functions out of string.c 2021-09-25 08:20:49 -07:00
strstr_32.c
usercopy_32.c x86/usercopy: Remove .fixup usage 2021-12-11 09:09:50 +01:00
usercopy_64.c x86/usercopy: Include arch_wb_cache_pmem() declaration 2023-05-18 11:56:18 -07:00
usercopy.c x86/uaccess: instrument copy_from_user_nmi() 2022-11-08 15:57:24 -08:00
x86-opcode-map.txt x86/opcode: Add the LKGS instruction to x86-opcode-map 2023-01-12 13:06:36 +01:00