linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-03 09:31:26 +00:00

Borislav Petkov (AMD) 04c3024560 x86/barrier: Do not serialize MSR accesses on AMD

AMD does not have the requirement for a synchronization barrier when accessing a certain group of MSRs. Do not incur that unnecessary penalty there.

There will be a CPUID bit which explicitly states that a MFENCE is not needed. Once that bit is added to the APM, this will be extended with it.

While at it, move to processor.h to avoid include hell. Untangling that file properly is a matter for another day.

Some notes on the performance aspect of why this is relevant, courtesy of Kishon VijayAbraham <Kishon.VijayAbraham@amd.com>:

On an AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM shows x2AVIC IPI rate is 3% to 4% lower than AVIC IPI rate. The ipi-bench is modified so that the IPIs are sent between two vCPUs in the same CCX. This also requires to pin the vCPU to a physical core to prevent any latencies. This simulates the use case of pinning vCPUs to the thread of a single CCX to avoid interrupt IPI latency.

In order to avoid run-to-run variance (for both x2AVIC and AVIC), the below configurations are done:

  1) Disable Power States in BIOS (to prevent the system from going to lower power state)

  2) Run the system at fixed frequency 2500MHz (to prevent the system from increasing the frequency when the load is more)

With the above configuration:

*) Performance measured using ipi-bench for AVIC:
  Average Latency: 1124.98ns [Time to send IPI from one vCPU to another vCPU]
  Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

*) Performance measured using ipi-bench for x2AVIC:
  Average Latency: 1172.42ns [Time to send IPI from one vCPU to another vCPU]
  Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

From above, x2AVIC latency is ~4% more than AVIC. However, the expectation is x2AVIC performance to be better or equivalent to AVIC. Upon analyzing the perf captures, it is observed significant time is spent in weak_wrmsr_fence() invoked by x2apic_send_IPI().

With the fix to skip weak_wrmsr_fence():

*) Performance measured using ipi-bench for x2AVIC:
  Average Latency: 1117.44ns [Time to send IPI from one vCPU to another vCPU]
  Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

Comparing the performance of x2AVIC with and without the fix, it can be seen the performance improves by ~4%.

Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option with and without weak_wrmsr_fence() on a Zen4 system also showed significant performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores CCX or CCD and just picks random vCPU.

  Average throughput (10 iterations) with weak_wrmsr_fence(),
    Cumulative throughput: 4933374 IPI/s

  Average throughput (10 iterations) without weak_wrmsr_fence(),
    Cumulative throughput: 6355156 IPI/s

[1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230622095212.20940-1-bp@alien8.de

2023-11-13 10:09:45 +01:00
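
The percentages quoted in the commit message can be rechecked from the raw figures it reports. The snippet below is only a minimal arithmetic sketch and is not part of the kernel tree or of ipi-bench: the helper `pct_change` and the constant names are hypothetical, and the numbers are copied verbatim from the message above.

```python
# Recompute the relative differences quoted in the commit message.
# All names here are illustrative; only the numbers come from the message.


def pct_change(new: float, old: float) -> float:
    """Return the change of `new` relative to `old`, in percent."""
    return (new - old) / old * 100.0


# Same-CCX runs with the modified ipi-bench.
# Latency is in nanoseconds, throughput in millions of IPIs per second.
AVIC = {"latency_ns": 1124.98, "throughput_mps": 42.6759}
X2AVIC_FENCE = {"latency_ns": 1172.42, "throughput_mps": 40.9432}
X2AVIC_NO_FENCE = {"latency_ns": 1117.44, "throughput_mps": 42.9608}

# 'mesh-ipi' runs with the unmodified ipi-bench (IPIs per second,
# averaged over 10 iterations).
MESH_WITH_FENCE = 4933374
MESH_WITHOUT_FENCE = 6355156

if __name__ == "__main__":
    rows = [
        ("x2AVIC vs AVIC, latency",
         pct_change(X2AVIC_FENCE["latency_ns"], AVIC["latency_ns"])),
        ("x2AVIC vs AVIC, throughput",
         pct_change(X2AVIC_FENCE["throughput_mps"], AVIC["throughput_mps"])),
        ("x2AVIC fence skipped vs kept, latency",
         pct_change(X2AVIC_NO_FENCE["latency_ns"], X2AVIC_FENCE["latency_ns"])),
        ("x2AVIC fence skipped vs kept, throughput",
         pct_change(X2AVIC_NO_FENCE["throughput_mps"],
                    X2AVIC_FENCE["throughput_mps"])),
        ("mesh-ipi fence skipped vs kept, throughput",
         pct_change(MESH_WITHOUT_FENCE, MESH_WITH_FENCE)),
    ]
    for label, value in rows:
        print(f"{label}: {value:+.2f}%")
```

Run as a script, this prints one signed percentage per comparison. The same-CCX results land in the 4% to 5% range the message describes, while the mesh-ipi throughput gain from skipping weak_wrmsr_fence() is considerably larger, roughly 29%.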
..
acpi	X86 core code updates:	2023-10-30 17:37:47 -10:00
apic	Major microcode loader restructuring, cleanup and improvements by Thomas	2023-11-04 08:46:37 -10:00
cpu	x86/barrier: Do not serialize MSR accesses on AMD	2023-11-13 10:09:45 +01:00
fpu	- A kernel-doc fix	2023-10-30 12:36:41 -10:00
kprobes	X86 core updates:	2023-08-30 10:10:31 -07:00
.gitignore
alternative.c	x86/alternatives: Disable KASAN in apply_alternatives()	2023-10-12 20:27:16 +02:00
amd_gart_64.c	x86/mm: Remove PD_PAGE_MASK and PD_PAGE_SIZE macros	2022-12-15 10:37:27 -08:00
amd_nb.c	X86 core code updates:	2023-10-30 17:37:47 -10:00
aperture_64.c	x86: Fix various duplicate-word comment typos	2022-08-15 19:17:52 +02:00
apm_32.c	x86/APM: drop the duplicate APM_MINOR_DEV macro	2023-07-30 14:00:32 +02:00
asm-offsets_32.c
asm-offsets_64.c	x86: Fixup asm-offsets duplicate	2022-10-17 16:41:06 +02:00
asm-offsets.c	x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL	2023-09-12 16:28:13 -07:00
audit_64.c	x86/audit: Fix -Wmissing-variable-declarations warning for ia32_xyz_class	2023-08-30 10:11:16 +02:00
bootflag.c
callthunks.c	x86/callthunks: Delete unused "struct thunk_desc"	2023-10-20 12:58:48 +02:00
cet.c	x86/ibt: Convert IBT selftest to asm	2023-08-17 17:07:09 +02:00
cfi.c	x86: Add support for CONFIG_CFI_CLANG	2022-09-26 10:13:16 -07:00
check.c
cpuid.c	x86/cpuid: make cpuid_class a static const structure	2023-08-05 08:31:41 +02:00
crash_core_32.c
crash_core_64.c
crash_dump_32.c
crash_dump_64.c	use less confusing names for iov_iter direction initializers	2022-11-25 13:01:55 -05:00
crash.c	ARM:	2023-09-07 13:52:20 -07:00
devicetree.c	x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init()	2023-10-02 21:30:09 +02:00
doublefault_32.c	x86: Avoid missing-prototype warnings for doublefault code	2023-05-18 11:56:18 -07:00
dumpstack_32.c	x86/percpu: Move irq_stack variables next to current_task	2022-10-17 16:41:05 +02:00
dumpstack_64.c	x86/percpu: Move irq_stack variables next to current_task	2022-10-17 16:41:05 +02:00
dumpstack.c	x86/show_trace_log_lvl: Ensure stack pointer is aligned, again	2023-05-16 06:31:04 -07:00
e820.c	x86/setup: Move duplicate boot_cpu_data definition out of the ifdeffery	2023-01-11 12:45:16 +01:00
early_printk.c	x86/earlyprintk: Clean up pciserial	2022-08-29 12:19:25 +02:00
early-quirks.c
ebda.c
eisa.c
espfix_64.c	x86/espfix: Use get_random_long() rather than archrandom	2022-10-31 20:12:50 +01:00
ftrace_32.S	x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>	2023-10-03 10:38:07 +02:00
ftrace_64.S	x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>	2023-10-03 10:38:07 +02:00
ftrace.c	x86/ftrace: Remove unused extern declaration ftrace_regs_caller_ret()	2023-07-10 21:38:13 -04:00
head32.c	x86/microcode/32: Move early loading after paging enable	2023-10-18 22:15:01 +02:00
head64.c	x86/head/64: Move the __head definition to <asm/init.h>	2023-10-17 14:51:14 +02:00
head_32.S	Major microcode loader restructuring, cleanup and improvements by Thomas	2023-11-04 08:46:37 -10:00
head_64.S	x86 MM handling code changes for v6.7:	2023-10-30 15:40:57 -10:00
hpet.c	x86/msi: Fix compile error caused by CONFIG_GENERIC_MSI_IRQ=y && !CONFIG_X86_LOCAL_APIC	2023-10-12 08:13:27 +02:00
hw_breakpoint.c	x86/amd: Cache debug register values in percpu variables	2023-01-31 20:09:26 +01:00
i8237.c
i8253.c
i8259.c	x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility	2023-10-27 20:36:49 +02:00
ibt_selftest.S	x86/ibt: Convert IBT selftest to asm	2023-08-17 17:07:09 +02:00
idt.c	x86/entry: Make IA32 syscalls' availability depend on ia32_enabled()	2023-09-14 13:19:53 +02:00
io_delay.c
ioport.c
irq_32.c	x86/percpu: Move irq_stack variables next to current_task	2022-10-17 16:41:05 +02:00
irq_64.c	x86/percpu: Move irq_stack variables next to current_task	2022-10-17 16:41:05 +02:00
irq_work.c	x86/apic: Wrap IPI calls into helper functions	2023-08-09 12:00:55 -07:00
irq.c	x86/apic: Nuke ack_APIC_irq()	2023-08-09 11:58:34 -07:00
irqflags.S	x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>	2023-10-03 10:38:07 +02:00
irqinit.c	x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL	2023-01-16 17:24:56 +01:00
itmt.c	arch/x86: Remove now superfluous sentinel elem from ctl_table arrays	2023-10-10 15:22:02 -07:00
jailhouse.c	x86/apic: Remove the pointless APIC version check	2023-08-09 11:58:19 -07:00
jump_label.c	jump_label: make initial NOP patching the special case	2022-06-24 09:48:55 +02:00
kdebugfs.c
kexec-bzimage64.c	docs: move x86 documentation into Documentation/arch/	2023-03-30 12:58:51 -06:00
kgdb.c	x86/kgdb: Fix a kerneldoc warning when build with W=1	2023-09-24 11:00:13 +02:00
ksysfs.c
kvm.c	x86/apic: Use u32 for APIC IDs in global data	2023-10-10 14:38:18 +02:00
kvmclock.c	x86/tsc: Provide sched_clock_noinstr()	2023-06-05 21:11:08 +02:00
ldt.c	x86: allow get_locked_pte() to fail	2023-06-19 16:19:10 -07:00
machine_kexec_32.c
machine_kexec_64.c	x86/kexec: remove unnecessary arch_kexec_kernel_image_load()	2023-04-08 13:45:38 -07:00
Makefile	x86/boot/32: Disable stackprotector and tracing for mk_early_pgtbl_32()	2023-10-18 11:11:43 +02:00
mmconf-fam10h_64.c
module.c	x86/alternative: Rename apply_ibt_endbr()	2023-07-10 09:52:23 +02:00
mpparse.c	x86/apic: Sanitize APIC address setup	2023-08-09 11:58:20 -07:00
msr.c	x86/MSR: make msr_class a static const structure	2023-08-05 08:31:42 +02:00
nmi_selftest.c	x86/apic: Wrap IPI calls into helper functions	2023-08-09 12:00:55 -07:00
nmi.c	Major microcode loader restructuring, cleanup and improvements by Thomas	2023-11-04 08:46:37 -10:00
paravirt-spinlocks.c
paravirt.c	x86/xen: move paravirt lazy code	2023-09-19 07:04:49 +02:00
pci-dma.c	x86: always initialize xen-swiotlb when xen-pcifront is enabling	2023-07-31 17:54:27 +02:00
pcspeaker.c
perf_regs.c
platform-quirks.c	x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled()	2023-05-18 11:56:18 -07:00
pmem.c	x86/pmem: Fix platform-device leak in error path	2022-06-20 18:01:16 +02:00
probe_roms.c
process_32.c	x86/resctl: fix scheduler confusion with 'current'	2023-03-08 11:48:11 -08:00
process_64.c	x86/shstk: Add ARCH_SHSTK_STATUS	2023-08-02 15:01:51 -07:00
process.c	x86/shstk: Remove useless clone error handling	2023-09-19 09:18:34 -07:00
process.h
ptrace.c	x86: Add PTRACE interface for shadow stack	2023-08-02 15:01:51 -07:00
pvclock.c	locking/atomic: treewide: use raw_atomic*_<op>()	2023-06-05 09:57:20 +02:00
quirks.c
reboot_fixups_32.c
reboot.c	x86/reboot: Expose VMCS crash hooks if and only if KVM_{INTEL,AMD} is enabled	2023-08-03 15:37:14 -07:00
relocate_kernel_32.S	x86/kexec: Disable RET on kexec	2022-07-09 13:12:32 +02:00
relocate_kernel_64.S	x86,objtool: Split UNWIND_HINT_EMPTY in two	2023-03-23 23:18:58 +01:00
resource.c	x86/PCI: Tidy E820 removal messages	2022-12-10 10:33:11 -06:00
rethook.c
rtc.c	x86/rtc: Simplify PNP ids check	2023-01-06 04:22:34 +01:00
setup_percpu.c	x86/apic/32: Remove x86_cpu_to_logical_apicid	2023-08-09 11:58:23 -07:00
setup.c	TTY/Serial changes for 6.7-rc1	2023-11-03 15:44:25 -10:00
sev_verify_cbit.S
sev-shared.c	Take care of a race between when the #VC exception is raised and when	2023-10-19 18:12:08 -07:00
sev.c	X86 core code updates:	2023-10-30 17:37:47 -10:00
shstk.c	x86/shstk: Add warning for shadow stack double unmap	2023-09-19 09:18:34 -07:00
signal_32.c	x86/shstk: Add user control-protection fault handler	2023-08-02 15:01:50 -07:00
signal_64.c	x86/shstk: Handle signals for shadow stack	2023-08-02 15:01:50 -07:00
signal.c	x86/shstk: Handle signals for shadow stack	2023-08-02 15:01:50 -07:00
smp.c	Revert "x86/smp: Put CPUs into INIT on shutdown if possible"	2023-10-15 12:02:02 -07:00
smpboot.c	Major microcode loader restructuring, cleanup and improvements by Thomas	2023-11-04 08:46:37 -10:00
stacktrace.c
static_call.c	x86/static_call: Fix __static_call_fixup()	2023-08-17 13:24:09 +02:00
step.c
sys_ia32.c
sys_x86_64.c	x86/mm: Introduce MAP_ABOVE4G	2023-07-11 14:12:19 -07:00
tboot.c	mm: remove rb tree.	2022-09-26 19:46:16 -07:00
time.c
tls.c	x86/gsseg: Move load_gs_index() to its own new header file	2023-01-12 13:06:36 +01:00
tls.h
topology.c	cpu-hotplug: Provide prototypes for arch CPU registration	2023-10-11 14:27:37 +02:00
trace_clock.c
trace.c
tracepoint.c
traps.c	Add x86 shadow stack support	2023-08-31 12:20:12 -07:00
tsc_msr.c
tsc_sync.c	x86/tsc: Defer marking TSC unstable to a worker	2023-10-27 20:36:57 +02:00
tsc.c	x86/tsc: Extend watchdog check exemption to 4-Sockets platform	2023-07-14 15:17:09 -07:00
umip.c
unwind_frame.c	x86: kmsan: don't instrument stack walking functions	2022-10-03 14:03:25 -07:00
unwind_guess.c
unwind_orc.c	x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()	2023-09-21 08:41:23 +02:00
uprobes.c	uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix	2022-12-05 11:55:18 +01:00
verify_cpu.S
vm86_32.c
vmlinux.lds.S	x86/srso: Disentangle rethunk-dependent options	2023-10-20 12:30:50 +02:00
vsmp_64.c	x86/apic: Use u32 for phys_pkg_id()	2023-10-10 14:38:19 +02:00
x86_init.c	- Fix a race window where load_unaligned_zeropad() could cause	2023-06-26 16:32:47 -07:00