linux/arch/x86
Linus Torvalds 47ee3f1dd9 x86: re-introduce support for ERMS copies for user space accesses
I tried to streamline our user memory copy code fairly aggressively in
commit adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory
copies"), in order to then be able to clean up the code and inline the
modern FSRM case in commit 577e6a7fd5 ("x86: inline the 'rep movs' in
user copies for the FSRM case").

We had reports [1] of that causing regressions earlier with blogbench,
but that turned out to be a horrible benchmark for that case, and not a
sufficient reason for re-instating "rep movsb" on older machines.

However, now Eric Dumazet reported [2] a regression in performance that
seems to be a rather more real benchmark, where due to the removal of
"rep movs" a TCP stream over a 100Gbps network no longer reaches line
speed.

And it turns out that with the simplified calling convention for the
non-FSRM case in commit 427fda2c8a ("x86: improve on the non-rep
'copy_user' function"), re-introducing the ERMS case is actually fairly
simple.

Of course, that "fairly simple" is glossing over several missteps due to
having to fight our assembler alternative code.  This code really wanted
to rewrite a conditional branch to have two different targets, but that
made objtool sufficiently unhappy that this instead just ended up doing
a choice between "jump to the unrolled loop, or use 'rep movsb'
directly".
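
The choice described above can be sketched in user-space C.  This is
only an illustration, not the kernel's code: the kernel patches the
selection in at boot through its assembler alternatives, while this
sketch models it as an ordinary runtime branch.  The names
cpu_has_erms, copy_rep_movsb(), copy_unrolled() and copy_sketch() are
hypothetical, and the sketch ignores the fault handling that real user
copies need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CPU's ERMS feature bit. */
static bool cpu_has_erms = true;

/* Copy with "rep movsb"; plain byte loop when not built for x86-64. */
static void copy_rep_movsb(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/* rdi = destination, rsi = source, rcx = byte count */
	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(len)
			 :
			 : "memory");
#else
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
#endif
}

/* Unrolled loop: eight 8-byte words per iteration, then the tail. */
static void copy_unrolled(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len >= 64) {
		for (int i = 0; i < 8; i++) {
			unsigned long long word;

			memcpy(&word, s + 8 * i, sizeof(word));
			memcpy(d + 8 * i, &word, sizeof(word));
		}
		d += 64;
		s += 64;
		len -= 64;
	}
	while (len--)
		*d++ = *s++;
}

/* Either jump to the unrolled loop, or use "rep movsb" directly. */
static void copy_sketch(void *dst, const void *src, size_t len)
{
	if (cpu_has_erms)
		copy_rep_movsb(dst, src, len);
	else
		copy_unrolled(dst, src, len);
}
```

Both paths must produce the same bytes; only the speed differs, which
is why the selection can hinge on a single feature bit.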

Let's see if somebody finds a case where the kernel memory copies also
care (see commit 68674f94ff: "x86: don't use REP_GOOD or ERMS for
small memory copies").  But Eric does argue that the user copies are
special because networking tries to copy up to 32KB at a time, if
order-3 page allocations are possible.

In-kernel memory copies are typically small, unless they are the special
"copy pages at a time" kind that still use "rep movs".

Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1]
Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2]
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Fixes: adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-26 12:34:20 -07:00
..
boot * Do conditional __tdx_hypercall() 'output' processing via an 2023-04-28 09:36:09 -07:00
coco * Do conditional __tdx_hypercall() 'output' processing via an 2023-04-28 09:36:09 -07:00
configs
crypto modules-6.4-rc1 2023-04-27 16:36:55 -07:00
entry Objtool changes for v6.4: 2023-04-28 14:02:54 -07:00
events perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG 2023-05-08 10:58:27 +02:00
hyperv Objtool changes for v6.4: 2023-04-28 14:02:54 -07:00
ia32
include KVM: VMX: Fix header file dependency of asm/vmx.h 2023-05-19 13:56:25 -04:00
kernel Probes fixes for 6.4-rc1: 2023-05-18 09:04:45 -07:00
kvm KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save 2023-05-21 04:05:51 -04:00
lib x86: re-introduce support for ERMS copies for user space accesses 2023-05-26 12:34:20 -07:00
math-emu
mm x86/mm: Avoid incomplete Global INVLPG flushes 2023-05-17 08:55:02 -07:00
net bpf, x86: Simplify the parsing logic of structure parameters 2023-01-10 15:53:22 -08:00
pci pci-v6.4-changes 2023-04-27 10:45:30 -07:00
platform Objtool changes for v6.4: 2023-04-28 14:02:54 -07:00
power x86/cpu: Mark {hlt,resume}_play_dead() __noreturn 2023-04-14 17:31:27 +02:00
purgatory purgatory: fix disabling debug info 2023-04-08 19:36:53 +09:00
ras
realmode x86/boot: Skip realmode init code when running as Xen PV guest 2022-11-25 12:05:22 +01:00
tools ELF: fix all "Elf" typos 2023-04-08 13:45:37 -07:00
um um: make stub data pages size tweakable 2023-04-20 23:08:43 +02:00
video
virt/vmx/tdx
xen Objtool changes for v6.4: 2023-04-28 14:02:54 -07:00
.gitignore
Kbuild
Kconfig Add support for new Linear Address Masking CPU feature. This is similar 2023-04-28 09:43:49 -07:00
Kconfig.assembler crypto: x86/aria-avx - fix build failure with old binutils 2023-01-20 18:29:31 +08:00
Kconfig.cpu
Kconfig.debug docs: move x86 documentation into Documentation/arch/ 2023-03-30 12:58:51 -06:00
Makefile x86/build: Make 64-bit defconfig the default 2023-02-15 14:20:17 +01:00
Makefile_32.cpu
Makefile.um um: Only disable SSE on clang to work around old GCC bugs 2023-04-04 09:57:05 +02:00