linux/arch/x86/kernel
Borislav Petkov (AMD) 04c3024560 x86/barrier: Do not serialize MSR accesses on AMD
AMD does not have the requirement for a synchronization barrier when
acccessing a certain group of MSRs. Do not incur that unnecessary
penalty there.

There will be a CPUID bit which explicitly states that a MFENCE is not
needed. Once that bit is added to the APM, this will be extended with
it.

While at it, move to processor.h to avoid include hell. Untangling that
file properly is a matter for another day.

Some notes on the performance aspect of why this is relevant, courtesy
of Kishon VijayAbraham <Kishon.VijayAbraham@amd.com>:

On a AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM
shows x2AVIC IPI rate is 3% to 4% lower than AVIC IPI rate. The
ipi-bench is modified so that the IPIs are sent between two vCPUs in the
same CCX. This also requires to pin the vCPU to a physical core to
prevent any latencies. This simulates the use case of pinning vCPUs to
the thread of a single CCX to avoid interrupt IPI latency.

In order to avoid run-to-run variance (for both x2AVIC and AVIC), the
below configurations are done:

  1) Disable Power States in BIOS (to prevent the system from going to
     lower power state)

  2) Run the system at fixed frequency 2500MHz (to prevent the system
     from increasing the frequency when the load is more)

With the above configuration:

*) Performance measured using ipi-bench for AVIC:
  Average Latency:  1124.98ns [Time to send IPI from one vCPU to another vCPU]

  Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from
  				     48 vCPUs simultaneously]

*) Performance measured using ipi-bench for x2AVIC:
  Average Latency:  1172.42ns [Time to send IPI from one vCPU to another vCPU]

  Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from
  				     48 vCPUs simultaneously]

From above, x2AVIC latency is ~4% more than AVIC. However, the expectation is
x2AVIC performance to be better or equivalent to AVIC. Upon analyzing
the perf captures, it is observed significant time is spent in
weak_wrmsr_fence() invoked by x2apic_send_IPI().

With the fix to skip weak_wrmsr_fence()

*) Performance measured using ipi-bench for x2AVIC:
  Average Latency:  1117.44ns [Time to send IPI from one vCPU to another vCPU]

  Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from
  				     48 vCPUs simultaneously]

Comparing the performance of x2AVIC with and without the fix, it can be seen
the performance improves by ~4%.

Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option
with and without weak_wrmsr_fence() on a Zen4 system also showed significant
performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores
CCX or CCD and just picks random vCPU.

  Average throughput (10 iterations) with weak_wrmsr_fence(),
        Cumulative throughput: 4933374 IPI/s

  Average throughput (10 iterations) without weak_wrmsr_fence(),
        Cumulative throughput: 6355156 IPI/s

[1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230622095212.20940-1-bp@alien8.de
2023-11-13 10:09:45 +01:00
..
acpi X86 core code updates: 2023-10-30 17:37:47 -10:00
apic Major microcode loader restructuring, cleanup and improvements by Thomas 2023-11-04 08:46:37 -10:00
cpu x86/barrier: Do not serialize MSR accesses on AMD 2023-11-13 10:09:45 +01:00
fpu - A kernel-doc fix 2023-10-30 12:36:41 -10:00
kprobes X86 core updates: 2023-08-30 10:10:31 -07:00
.gitignore
alternative.c x86/alternatives: Disable KASAN in apply_alternatives() 2023-10-12 20:27:16 +02:00
amd_gart_64.c x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros 2022-12-15 10:37:27 -08:00
amd_nb.c X86 core code updates: 2023-10-30 17:37:47 -10:00
aperture_64.c x86: Fix various duplicate-word comment typos 2022-08-15 19:17:52 +02:00
apm_32.c x86/APM: drop the duplicate APM_MINOR_DEV macro 2023-07-30 14:00:32 +02:00
asm-offsets_32.c
asm-offsets_64.c x86: Fixup asm-offsets duplicate 2022-10-17 16:41:06 +02:00
asm-offsets.c x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL 2023-09-12 16:28:13 -07:00
audit_64.c x86/audit: Fix -Wmissing-variable-declarations warning for ia32_xyz_class 2023-08-30 10:11:16 +02:00
bootflag.c
callthunks.c x86/callthunks: Delete unused "struct thunk_desc" 2023-10-20 12:58:48 +02:00
cet.c x86/ibt: Convert IBT selftest to asm 2023-08-17 17:07:09 +02:00
cfi.c x86: Add support for CONFIG_CFI_CLANG 2022-09-26 10:13:16 -07:00
check.c
cpuid.c x86/cpuid: make cpuid_class a static const structure 2023-08-05 08:31:41 +02:00
crash_core_32.c
crash_core_64.c
crash_dump_32.c
crash_dump_64.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
crash.c ARM: 2023-09-07 13:52:20 -07:00
devicetree.c x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init() 2023-10-02 21:30:09 +02:00
doublefault_32.c x86: Avoid missing-prototype warnings for doublefault code 2023-05-18 11:56:18 -07:00
dumpstack_32.c x86/percpu: Move irq_stack variables next to current_task 2022-10-17 16:41:05 +02:00
dumpstack_64.c x86/percpu: Move irq_stack variables next to current_task 2022-10-17 16:41:05 +02:00
dumpstack.c x86/show_trace_log_lvl: Ensure stack pointer is aligned, again 2023-05-16 06:31:04 -07:00
e820.c x86/setup: Move duplicate boot_cpu_data definition out of the ifdeffery 2023-01-11 12:45:16 +01:00
early_printk.c x86/earlyprintk: Clean up pciserial 2022-08-29 12:19:25 +02:00
early-quirks.c
ebda.c
eisa.c
espfix_64.c x86/espfix: Use get_random_long() rather than archrandom 2022-10-31 20:12:50 +01:00
ftrace_32.S x86/headers: Replace #include <asm/export.h> with #include <linux/export.h> 2023-10-03 10:38:07 +02:00
ftrace_64.S x86/headers: Replace #include <asm/export.h> with #include <linux/export.h> 2023-10-03 10:38:07 +02:00
ftrace.c x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret() 2023-07-10 21:38:13 -04:00
head32.c x86/microcode/32: Move early loading after paging enable 2023-10-18 22:15:01 +02:00
head64.c x86/head/64: Move the __head definition to <asm/init.h> 2023-10-17 14:51:14 +02:00
head_32.S Major microcode loader restructuring, cleanup and improvements by Thomas 2023-11-04 08:46:37 -10:00
head_64.S x86 MM handling code changes for v6.7: 2023-10-30 15:40:57 -10:00
hpet.c x86/msi: Fix compile error caused by CONFIG_GENERIC_MSI_IRQ=y && !CONFIG_X86_LOCAL_APIC 2023-10-12 08:13:27 +02:00
hw_breakpoint.c x86/amd: Cache debug register values in percpu variables 2023-01-31 20:09:26 +01:00
i8237.c
i8253.c
i8259.c x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility 2023-10-27 20:36:49 +02:00
ibt_selftest.S x86/ibt: Convert IBT selftest to asm 2023-08-17 17:07:09 +02:00
idt.c x86/entry: Make IA32 syscalls' availability depend on ia32_enabled() 2023-09-14 13:19:53 +02:00
io_delay.c
ioport.c
irq_32.c x86/percpu: Move irq_stack variables next to current_task 2022-10-17 16:41:05 +02:00
irq_64.c x86/percpu: Move irq_stack variables next to current_task 2022-10-17 16:41:05 +02:00
irq_work.c x86/apic: Wrap IPI calls into helper functions 2023-08-09 12:00:55 -07:00
irq.c x86/apic: Nuke ack_APIC_irq() 2023-08-09 11:58:34 -07:00
irqflags.S x86/headers: Replace #include <asm/export.h> with #include <linux/export.h> 2023-10-03 10:38:07 +02:00
irqinit.c x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL 2023-01-16 17:24:56 +01:00
itmt.c arch/x86: Remove now superfluous sentinel elem from ctl_table arrays 2023-10-10 15:22:02 -07:00
jailhouse.c x86/apic: Remove the pointless APIC version check 2023-08-09 11:58:19 -07:00
jump_label.c jump_label: make initial NOP patching the special case 2022-06-24 09:48:55 +02:00
kdebugfs.c
kexec-bzimage64.c docs: move x86 documentation into Documentation/arch/ 2023-03-30 12:58:51 -06:00
kgdb.c x86/kgdb: Fix a kerneldoc warning when build with W=1 2023-09-24 11:00:13 +02:00
ksysfs.c
kvm.c x86/apic: Use u32 for APIC IDs in global data 2023-10-10 14:38:18 +02:00
kvmclock.c x86/tsc: Provide sched_clock_noinstr() 2023-06-05 21:11:08 +02:00
ldt.c x86: allow get_locked_pte() to fail 2023-06-19 16:19:10 -07:00
machine_kexec_32.c
machine_kexec_64.c x86/kexec: remove unnecessary arch_kexec_kernel_image_load() 2023-04-08 13:45:38 -07:00
Makefile x86/boot/32: Disable stackprotector and tracing for mk_early_pgtbl_32() 2023-10-18 11:11:43 +02:00
mmconf-fam10h_64.c
module.c x86/alternative: Rename apply_ibt_endbr() 2023-07-10 09:52:23 +02:00
mpparse.c x86/apic: Sanitize APIC address setup 2023-08-09 11:58:20 -07:00
msr.c x86/MSR: make msr_class a static const structure 2023-08-05 08:31:42 +02:00
nmi_selftest.c x86/apic: Wrap IPI calls into helper functions 2023-08-09 12:00:55 -07:00
nmi.c Major microcode loader restructuring, cleanup and improvements by Thomas 2023-11-04 08:46:37 -10:00
paravirt-spinlocks.c
paravirt.c x86/xen: move paravirt lazy code 2023-09-19 07:04:49 +02:00
pci-dma.c x86: always initialize xen-swiotlb when xen-pcifront is enabling 2023-07-31 17:54:27 +02:00
pcspeaker.c
perf_regs.c
platform-quirks.c x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled() 2023-05-18 11:56:18 -07:00
pmem.c x86/pmem: Fix platform-device leak in error path 2022-06-20 18:01:16 +02:00
probe_roms.c
process_32.c x86/resctl: fix scheduler confusion with 'current' 2023-03-08 11:48:11 -08:00
process_64.c x86/shstk: Add ARCH_SHSTK_STATUS 2023-08-02 15:01:51 -07:00
process.c x86/shstk: Remove useless clone error handling 2023-09-19 09:18:34 -07:00
process.h
ptrace.c x86: Add PTRACE interface for shadow stack 2023-08-02 15:01:51 -07:00
pvclock.c locking/atomic: treewide: use raw_atomic*_<op>() 2023-06-05 09:57:20 +02:00
quirks.c
reboot_fixups_32.c
reboot.c x86/reboot: Expose VMCS crash hooks if and only if KVM_{INTEL,AMD} is enabled 2023-08-03 15:37:14 -07:00
relocate_kernel_32.S x86/kexec: Disable RET on kexec 2022-07-09 13:12:32 +02:00
relocate_kernel_64.S x86,objtool: Split UNWIND_HINT_EMPTY in two 2023-03-23 23:18:58 +01:00
resource.c x86/PCI: Tidy E820 removal messages 2022-12-10 10:33:11 -06:00
rethook.c
rtc.c x86/rtc: Simplify PNP ids check 2023-01-06 04:22:34 +01:00
setup_percpu.c x86/apic/32: Remove x86_cpu_to_logical_apicid 2023-08-09 11:58:23 -07:00
setup.c TTY/Serial changes for 6.7-rc1 2023-11-03 15:44:25 -10:00
sev_verify_cbit.S
sev-shared.c Take care of a race between when the #VC exception is raised and when 2023-10-19 18:12:08 -07:00
sev.c X86 core code updates: 2023-10-30 17:37:47 -10:00
shstk.c x86/shstk: Add warning for shadow stack double unmap 2023-09-19 09:18:34 -07:00
signal_32.c x86/shstk: Add user control-protection fault handler 2023-08-02 15:01:50 -07:00
signal_64.c x86/shstk: Handle signals for shadow stack 2023-08-02 15:01:50 -07:00
signal.c x86/shstk: Handle signals for shadow stack 2023-08-02 15:01:50 -07:00
smp.c Revert "x86/smp: Put CPUs into INIT on shutdown if possible" 2023-10-15 12:02:02 -07:00
smpboot.c Major microcode loader restructuring, cleanup and improvements by Thomas 2023-11-04 08:46:37 -10:00
stacktrace.c
static_call.c x86/static_call: Fix __static_call_fixup() 2023-08-17 13:24:09 +02:00
step.c
sys_ia32.c
sys_x86_64.c x86/mm: Introduce MAP_ABOVE4G 2023-07-11 14:12:19 -07:00
tboot.c mm: remove rb tree. 2022-09-26 19:46:16 -07:00
time.c
tls.c x86/gsseg: Move load_gs_index() to its own new header file 2023-01-12 13:06:36 +01:00
tls.h
topology.c cpu-hotplug: Provide prototypes for arch CPU registration 2023-10-11 14:27:37 +02:00
trace_clock.c
trace.c
tracepoint.c
traps.c Add x86 shadow stack support 2023-08-31 12:20:12 -07:00
tsc_msr.c
tsc_sync.c x86/tsc: Defer marking TSC unstable to a worker 2023-10-27 20:36:57 +02:00
tsc.c x86/tsc: Extend watchdog check exemption to 4-Sockets platform 2023-07-14 15:17:09 -07:00
umip.c
unwind_frame.c x86: kmsan: don't instrument stack walking functions 2022-10-03 14:03:25 -07:00
unwind_guess.c
unwind_orc.c x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find() 2023-09-21 08:41:23 +02:00
uprobes.c uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix 2022-12-05 11:55:18 +01:00
verify_cpu.S
vm86_32.c
vmlinux.lds.S x86/srso: Disentangle rethunk-dependent options 2023-10-20 12:30:50 +02:00
vsmp_64.c x86/apic: Use u32 for phys_pkg_id() 2023-10-10 14:38:19 +02:00
x86_init.c - Fix a race window where load_unaligned_zeropad() could cause 2023-06-26 16:32:47 -07:00