linux

Author	SHA1	Message	Date
Chang S. Bae	824eea38d2	x86/fsgsbase/64: Convert the ELF core dump code to the new FSGSBASE helpers Replace open-coded rdmsr()'s with their <asm/fsgsbase.h> API counterparts. No change in functionality intended. [ mingo: Wrote new changelog. ] Based-on-code-from: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Markus T Metzger <markus.t.metzger@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/1537312139-5580-5-git-send-email-chang.seok.bae@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-08 10:41:09 +02:00
Chang S. Bae	e696c231be	x86/fsgsbase/64: Make ptrace use the new FS/GS base helpers Use the new FS/GS base helper functions in <asm/fsgsbase.h> in the platform specific ptrace implementation of the following APIs: PTRACE_ARCH_PRCTL, PTRACE_SETREG, PTRACE_GETREG, etc. The fsgsbase code is more abstracted out this way and the FS/GS-update mechanism will be easier to change this way. [ mingo: Wrote new changelog. ] Based-on-code-from: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Markus T Metzger <markus.t.metzger@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1537312139-5580-4-git-send-email-chang.seok.bae@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-08 10:41:08 +02:00
Chang S. Bae	b1378a561f	x86/fsgsbase/64: Introduce FS/GS base helper functions Introduce FS/GS base access functionality via <asm/fsgsbase.h>, not yet used by anything directly. Factor out task_seg_base() from x86/ptrace.c and rename it to x86_fsgsbase_read_task() to make it part of the new helpers. This will allow us to enhance FSGSBASE support and eventually enable the FSBASE/GSBASE instructions. An "inactive" GS base refers to a base saved at kernel entry and being part of an inactive, non-running/stopped user-task. (The typical ptrace model.) Here are the new functions: x86_fsbase_read_task() x86_gsbase_read_task() x86_fsbase_write_task() x86_gsbase_write_task() x86_fsbase_read_cpu() x86_fsbase_write_cpu() x86_gsbase_read_cpu_inactive() x86_gsbase_write_cpu_inactive() As an advantage of the unified namespace we can now see all FS/GSBASE API use in the kernel via the following 'git grep' pattern: $ git grep x86_.*sbase [ mingo: Wrote new changelog. ] Based-on-code-from: Andy Lutomirski <luto@kernel.org> Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Markus T Metzger <markus.t.metzger@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1537312139-5580-3-git-send-email-chang.seok.bae@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-08 10:41:08 +02:00
Andy Lutomirski	07e1d88ada	x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately On 64-bit kernels ptrace can read the FS/GS base using the register access APIs (PTRACE_PEEKUSER, etc.) or PTRACE_ARCH_PRCTL. Make both of these mechanisms return the actual FS/GS base. This will improve debuggability by providing the correct information to ptracer such as GDB. [ chang: Rebased and revised patch description. ] [ mingo: Revised the changelog some more. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Markus T Metzger <markus.t.metzger@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1537312139-5580-2-git-send-email-chang.seok.bae@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-08 10:41:08 +02:00
Ingo Molnar	edfbeecd92	Merge branch 'linus' into x86/asm, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-08 10:40:34 +02:00
Eric Biggers	e0db9c48f1	crypto: x86/aes-ni - fix build error following fpu template removal aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit() to register/unregister the "fpu" template. But these functions don't exist anymore, causing a build error. Remove the calls to them. Fixes: `944585a64f` ("crypto: x86/aes-ni - remove special handling of AES in PCBC mode") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-10-08 13:47:02 +08:00
Borislav Petkov	fa112cf1e8	x86/olpc: Fix build error with CONFIG_MFD_CS5535=m When building a 32-bit config which has the above MFD item as module but OLPC_XO1_PM is enabled =y - which is bool, btw - the kernel fails building with: ld: arch/x86/platform/olpc/olpc-xo1-pm.o: in function `xo1_pm_remove': /home/boris/kernel/linux/arch/x86/platform/olpc/olpc-xo1-pm.c:159: undefined reference to `mfd_cell_disable' ld: arch/x86/platform/olpc/olpc-xo1-pm.o: in function `xo1_pm_probe': /home/boris/kernel/linux/arch/x86/platform/olpc/olpc-xo1-pm.c:133: undefined reference to `mfd_cell_enable' make: *** [Makefile:1030: vmlinux] Error 1 Force MFD_CS5535 to y if OLPC_XO1_PM is enabled. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Lubomir Rintel <lkundrak@v3.sk> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20181005131750.GA5366@zn.tnic	2018-10-06 20:40:43 +02:00
Nadav Amit	5bdcd510c2	x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs As described in: `77b0bf55bc`: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block - which is also a minor cleanup for the jump-label code. As a result the code size is slightly increased, but inlining decisions are better: text data bss dec hex filename 18163528 10226300 2957312 31347140 `1de51c4` ./vmlinux before 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+1128) And functions such as intel_pstate_adjust_policy_max(), kvm_cpu_accept_dm_intr(), kvm_register_readl() are inlined. Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181005202718.229565-4-namit@vmware.com Link: https://lore.kernel.org/lkml/20181003213100.189959-11-namit@vmware.com/T/#u Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-06 15:52:17 +02:00
Nadav Amit	d5a581d84a	x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs As described in: `77b0bf55bc`: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block - which is pretty pointless indirection in the static_cpu_has() case, but is worth it to improve overall inlining quality. The patch slightly increases the kernel size: text data bss dec hex filename 18162879 10226256 2957312 31346447 1de4f0f ./vmlinux before 18163528 10226300 2957312 31347140 `1de51c4` ./vmlinux after (+693) And enables the inlining of function such as free_ldt_pgtables(). Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181005202718.229565-3-namit@vmware.com Link: https://lore.kernel.org/lkml/20181003213100.189959-10-namit@vmware.com/T/#u Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-06 15:52:16 +02:00
Nadav Amit	0474d5d9d2	x86/extable: Macrofy inline assembly code to work around GCC inlining bugs As described in: `77b0bf55bc`: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block - which is also a minor cleanup for the exception table code. Text size goes up a bit: text data bss dec hex filename 18162555 10226288 2957312 31346155 1de4deb ./vmlinux before 18162879 10226256 2957312 31346447 1de4f0f ./vmlinux after (+292) But this allows the inlining of functions such as nested_vmx_exit_reflected(), set_segment_reg(), __copy_xstate_to_user() which is a net benefit. Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181005202718.229565-2-namit@vmware.com Link: https://lore.kernel.org/lkml/20181003213100.189959-9-namit@vmware.com/T/#u Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-06 15:52:15 +02:00
Ingo Molnar	02678a5823	Merge branch 'core/core' into x86/build, to prevent conflicts Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-06 15:51:56 +02:00
Baoquan He	06d4a462e9	x86/KASLR: Update KERNEL_IMAGE_SIZE description Currently CONFIG_RANDOMIZE_BASE=y is set by default, which makes some of the old comments above the KERNEL_IMAGE_SIZE definition out of date. Update them to the current state of affairs. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/20181006084327.27467-2-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-06 14:46:46 +02:00
Lianbo Jiang	992b649a3f	kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled In the kdump kernel, the memory of the first kernel needs to be dumped into the vmcore file. If SME is enabled in the first kernel, the old memory has to be remapped with the memory encryption mask in order to access it properly. Split copy_oldmem_page() functionality to handle encrypted memory properly. [ bp: Heavily massage everything. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: kexec@lists.infradead.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dan.j.williams@intel.com Cc: bhelgaas@google.com Cc: baiyaowei@cmss.chinamobile.com Cc: tiwai@suse.de Cc: brijesh.singh@amd.com Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: jroedel@suse.de Link: https://lkml.kernel.org/r/be7b47f9-6be6-e0d1-2c2a-9125bc74b818@redhat.com	2018-10-06 12:09:26 +02:00
Lianbo Jiang	c3a7a61c19	x86/ioremap: Add an ioremap_encrypted() helper When SME is enabled, the memory is encrypted in the first kernel. In this case, SME also needs to be enabled in the kdump kernel, and we have to remap the old memory with the memory encryption mask. The case of concern here is if SME is active in the first kernel, and it is active too in the kdump kernel. There are four cases to be considered: a. dump vmcore It is encrypted in the first kernel, and needs be read out in the kdump kernel. b. crash notes When dumping vmcore, the people usually need to read useful information from notes, and the notes is also encrypted. c. iommu device table It's encrypted in the first kernel, kdump kernel needs to access its content to analyze and get information it needs. d. mmio of AMD iommu not encrypted in both kernels Add a new bool parameter @encrypted to __ioremap_caller(). If set, memory will be remapped with the SME mask. Add a new function ioremap_encrypted() to explicitly pass in a true value for @encrypted. Use ioremap_encrypted() for the above a, b, c cases. [ bp: cleanup commit message, extern defs in io.h and drop forgotten include. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: kexec@lists.infradead.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dan.j.williams@intel.com Cc: bhelgaas@google.com Cc: baiyaowei@cmss.chinamobile.com Cc: tiwai@suse.de Cc: brijesh.singh@amd.com Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: jroedel@suse.de Link: https://lkml.kernel.org/r/20180927071954.29615-2-lijiang@redhat.com	2018-10-06 11:57:51 +02:00
Greg Kroah-Hartman	31d099085d	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "perf fixes: - fix a CPU#0 hot unplug bug and a PCI enumeration bug in the x86 Intel uncore PMU driver - fix a CPU event enumeration bug in the x86 AMD PMU driver - fix a perf ring-buffer corruption bug when using tracepoints - fix a PMU unregister locking bug" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX perf/ring_buffer: Prevent concurent ring buffer access perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0 perf/core: Fix perf_pmu_unregister() locking	2018-10-05 16:07:13 -07:00
Greg Kroah-Hartman	247373b5dd	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "x86 fixes: Misc fixes: - fix various vDSO bugs: asm constraints and retpolines - add vDSO test units to make sure they never re-appear - fix UV platform TSC initialization bug - fix build warning on Clang" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Fix vDSO syscall fallback asm constraint regression x86/cpu/amd: Remove unnecessary parentheses x86/vdso: Only enable vDSO retpolines when enabled and supported x86/tsc: Fix UV TSC initialization x86/platform/uv: Provide is_early_uv_system() selftests/x86: Add clock_gettime() tests to test_vdso x86/vdso: Fix asm constraints on vDSO syscall fallbacks	2018-10-05 15:40:57 -07:00
Andy Lutomirski	99c19e6a8f	x86/vdso: Rearrange do_hres() to improve code generation vgetcyc() is full of barriers, so fetching values out of the vvar page before vgetcyc() for use after vgetcyc() results in poor code generation. Put vgetcyc() first to avoid this problem. Also, pull the tv_sec division into the loop and put all the ts writes together. The old code wrote ts->tv_sec on each iteration before the syscall fallback check and then added in the offset afterwards, which forced the compiler to pointlessly copy base->sec to ts->tv_sec on each iteration. The new version seems to generate sensible code. Saves several cycles. With this patch applied, the result is faster than before the clock_gettime() rewrite. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/3c05644d010b72216aa286a6d20b5078d5fae5cd.1538762487.git.luto@kernel.org	2018-10-05 21:03:23 +02:00
Greg Kroah-Hartman	08b297bb10	x86 and PPC bugfixes, mostly introduced in 4.19-rc1. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJbtxXyAAoJEL/70l94x66D52kH/jEbBo/jz9Jx2bnbxYkG1YzO cIpIRjbRcOKVFNGxjStlJ0PedQBWAfPQl+SywRfqwiSlOOt/yo0lZ5ZewENR2TxO CLQC/OnV/5SU7BJvbsKgH9tc+Wp9X55wBUEalfcvG/knFlmR+eK/7TwTS+hv/U21 uYKRnGfz5AGfdmB9FyCn0blkPNnFaQ8KB+y+INZTkB+YZzNsybow230FRPs22fjX HGeJ7gngah50M5gxDW+YPPNXFhs36x2hsyQXBN9TPxLPHoxTsRRoeqx2nl/UvA+e LXZWg8/UAzXFO/fKVHkJX4jSnCDr2W7HYGNyLPtXFPWhcOelP1h9uHrfuX+fxA4= =UNUo -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Paolo writes: "KVM changes for 4.19-rc7 x86 and PPC bugfixes, mostly introduced in 4.19-rc1." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: nVMX: fix entry with pending interrupt if APICv is enabled KVM: VMX: hide flexpriority from guest when disabled at the module level KVM: VMX: check for existence of secondary exec controls before accessing KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault KVM: x86: fix L1TF's MMIO GFN calculation tools/kvm_stat: cut down decimal places in update interval dialog KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled KVM: x86: never trap MSR_KERNEL_GS_BASE	2018-10-05 08:29:44 -07:00
Lubomir Rintel	d92116b800	x86/olpc: Indicate that legacy PC XO-1 platform should not register RTC On OLPC XO-1, the RTC is discovered via device tree from the arch initcall. Don't let the PC platform register another one from its device initcall, it's not going to work: sysfs: cannot create duplicate filename '/devices/platform/rtc_cmos' CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #12 Hardware name: OLPC XO/XO, BIOS OLPC Ver 1.00.01 06/11/2014 Call Trace: dump_stack+0x16/0x18 sysfs_warn_dup+0x46/0x58 sysfs_create_dir_ns+0x76/0x9b kobject_add_internal+0xed/0x209 ? __schedule+0x3fa/0x447 kobject_add+0x5b/0x66 device_add+0x298/0x535 ? insert_resource_conflict+0x2a/0x3e platform_device_add+0x14d/0x192 ? io_delay_init+0x19/0x19 platform_device_register+0x1c/0x1f add_rtc_cmos+0x16/0x31 do_one_initcall+0x78/0x14a ? do_early_param+0x75/0x75 kernel_init_freeable+0x152/0x1e0 ? rest_init+0xa2/0xa2 kernel_init+0x8/0xd5 ret_from_fork+0x2e/0x38 kobject_add_internal failed for rtc_cmos with -EEXIST, don't try to register things with the same name in the same directory. platform rtc_cmos: registered platform RTC device (no PNP device found) Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Ingo Molnar <mingo@redhat.com> CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20181004160808.307738-1-lkundrak@v3.sk	2018-10-05 12:29:20 +02:00
Ingo Molnar	bce6824cc8	Merge branch 'x86/core' into x86/build, to avoid conflicts Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-05 11:27:23 +02:00
Andy Lutomirski	bcc4a62a73	x86/vdso: Document vgtod_ts better After reading do_hres() and do_course() and scratching my head a bit, I figured out why the arithmetic is strange. Document it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f66f53d81150bbad47d7b282c9207a71a3ce1c16.1538689401.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-05 10:12:18 +02:00
Andy Lutomirski	89fe0a1f1c	x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks When a vDSO clock function falls back to the syscall, no special barriers or ordering is needed, and the syscall fallbacks don't clobber any memory that is not explicitly listed in the asm constraints. Remove the "memory" clobber. This causes minor changes to the generated code, but otherwise has no obvious performance impact. I think it's nice to have, though, since it may help the optimizer in the future. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3a7438f5fb2422ed881683d2ccffd7f987b2dc44.1538689401.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-05 10:12:18 +02:00
Ard Biesheuvel	944585a64f	crypto: x86/aes-ni - remove special handling of AES in PCBC mode For historical reasons, the AES-NI based implementation of the PCBC chaining mode uses a special FPU chaining mode wrapper template to amortize the FPU start/stop overhead over multiple blocks. When this FPU wrapper was introduced, it supported widely used chaining modes such as XTS and CTR (as well as LRW), but currently, PCBC is the only remaining user. Since there are no known users of pcbc(aes) in the kernel, let's remove this special driver, and rely on the generic pcbc driver to encapsulate the AES-NI core cipher. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-10-05 10:16:56 +08:00
Thomas Gleixner	315f28fa3a	x66/vdso: Add CLOCK_TAI support With the storage array in place it's now trivial to support CLOCK_TAI in the vdso. Extend the base time storage array and add the update code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Matt Rickard <matt@softrans.com.au> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.823878601@linutronix.de	2018-10-04 23:00:27 +02:00
Thomas Gleixner	3e89bf35eb	x86/vdso: Move cycle_last handling into the caller Dereferencing gtod->cycle_last all over the place and foing the cycles < last comparison in the vclock read functions generates horrible code. Doing it at the call site is much better and gains a few cycles both for TSC and pvclock. Caveat: This adds the comparison to the hyperv vclock as well, but I have no way to test that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.741440803@linutronix.de	2018-10-04 23:00:27 +02:00
Thomas Gleixner	4f72adc506	x86/vdso: Simplify the invalid vclock case The code flow for the vclocks is convoluted as it requires the vclocks which can be invalidated separately from the vsyscall_gtod_data sequence to store the fact in a separate variable. That's inefficient. Restructure the code so the vclock readout returns cycles and the conversion to nanoseconds is handled at the call site. If the clock gets invalidated or vclock is already VCLOCK_NONE, return U64_MAX as the cycle value, which is invalid for all clocks and leave the sequence loop immediately in that case by calling the fallback function directly. This allows to remove the gettimeofday fallback as it now uses the clock_gettime() fallback and does the nanoseconds to microseconds conversion in the same way as it does when the vclock is functional. It does not make a difference whether the division by 1000 happens in the kernel fallback or in userspace. Generates way better code and gains a few cycles back. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.657928937@linutronix.de	2018-10-04 23:00:26 +02:00
Thomas Gleixner	f3e8393841	x86/vdso: Replace the clockid switch case Now that the time getter functions use the clockid as index into the storage array for the base time access, the switch case can be replaced. - Check for clockid >= MAX_CLOCKS and for negative clockid (CPU/FD) first and call the fallback function right away. - After establishing that clockid is < MAX_CLOCKS, convert the clockid to a bitmask - Check for the supported high resolution and coarse functions by anding the bitmask of supported clocks and check whether a bit is set. This completely avoids jump tables, reduces the number of conditionals and makes the VDSO extensible for other clock ids. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.574315796@linutronix.de	2018-10-04 23:00:26 +02:00
Thomas Gleixner	6deec5bdef	x86/vdso: Collapse coarse functions do_realtime_coarse() and do_monotonic_coarse() are now the same except for the storage array index. Hand the index in as an argument and collapse the functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.490733779@linutronix.de	2018-10-04 23:00:26 +02:00
Thomas Gleixner	e9a62f76f9	x86/vdso: Collapse high resolution functions do_realtime() and do_monotonic() are now the same except for the storage array index. Hand the index in as an argument and collapse the functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.407955860@linutronix.de	2018-10-04 23:00:25 +02:00
Thomas Gleixner	49116f2081	x86/vdso: Introduce and use vgtod_ts It's desired to support more clocks in the VDSO, e.g. CLOCK_TAI. This results either in indirect calls due to the larger switch case, which then requires retpolines or when the compiler is forced to avoid jump tables it results in even more conditionals. To avoid both variants which are bad for performance the high resolution functions and the coarse grained functions will be collapsed into one for each. That requires to store the clock specific base time in an array. Introcude struct vgtod_ts for storage and convert the data store, the update function and the individual clock functions over to use it. The new storage does not longer use gtod_long_t for seconds depending on 32 or 64 bit compile because this needs to be the full 64bit value even for 32bit when a Y2038 function is added. No point in keeping the distinction alive in the internal representation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.324679401@linutronix.de	2018-10-04 23:00:25 +02:00
Thomas Gleixner	77e9c678c5	x86/vdso: Use unsigned int consistently for vsyscall_gtod_data:: Seq The sequence count in vgtod_data is unsigned int, but the call sites use unsigned long, which is a pointless exercise. Fix the call sites and replace 'unsigned' with unsinged 'int' while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.236250416@linutronix.de	2018-10-04 23:00:25 +02:00
Thomas Gleixner	a51e996d48	x86/vdso: Enforce 64bit clocksource All VDSO clock sources are TSC based and use CLOCKSOURCE_MASK(64). There is no point in masking with all FF. Get rid of it and enforce the mask in the sanity checker. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.151963007@linutronix.de	2018-10-04 23:00:25 +02:00
Thomas Gleixner	2a21ad571b	x86/time: Implement clocksource_arch_init() Runtime validate the VCLOCK_MODE in clocksource::archdata and disable VCLOCK if invalid, which disables the VDSO but keeps the system running. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.069167446@linutronix.de	2018-10-04 23:00:24 +02:00
Paolo Bonzini	7e7126846c	kvm: nVMX: fix entry with pending interrupt if APICv is enabled Commit `b5861e5cf2` introduced a check on the interrupt-window and NMI-window CPU execution controls in order to inject an external interrupt vmexit before the first guest instruction executes. However, when APIC virtualization is enabled the host does not need a vmexit in order to inject an interrupt at the next interrupt window; instead, it just places the interrupt vector in RVI and the processor will inject it as soon as possible. Therefore, on machines with APICv it is not enough to check the CPU execution controls: the same scenario can also happen if RVI>vPPR. Fixes: `b5861e5cf2` Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-04 17:10:40 +02:00
Paolo Bonzini	2cf7ea9f40	KVM: VMX: hide flexpriority from guest when disabled at the module level As of commit `8d860bbeed` ("kvm: vmx: Basic APIC virtualization controls have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0, whereas previously KVM would allow a nested guest to enable VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware. That is, KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't (always) allow setting it when kvm-intel.flexpriority=0, and may even initially allow the control and then clear it when the nested guest writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause functional issues. Hide the control completely when the module parameter is cleared. reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Fixes: `8d860bbeed` ("kvm: vmx: Basic APIC virtualization controls have three settings") Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-04 13:40:44 +02:00
Sean Christopherson	fd6b6d9b82	KVM: VMX: check for existence of secondary exec controls before accessing Return early from vmx_set_virtual_apic_mode() if the processor doesn't support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of which reside in SECONDARY_VM_EXEC_CONTROL. This eliminates warnings due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing on processors without secondary exec controls. Remove the similar check for TPR shadowing as it is incorporated in the flexpriority_enabled check and the APIC-related code in vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE. Reported-by: Gerhard Wiesinger <redhat@wiesinger.com> Fixes: `8d860bbeed` ("kvm: vmx: Basic APIC virtualization controls have three settings") Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-04 13:40:21 +02:00
Nadav Amit	494b5168f2	x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops As described in: `77b0bf55bc`: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block. As a result GCC considers the inline assembly block as a single instruction. (Which it isn't, but that's the best we can get.) In this patch we wrap the paravirt call section tricks in a macro, to hide it from GCC. The effect of the patch is a more aggressive inlining, which also causes a size increase of kernel. text data bss dec hex filename 18147336 10226688 2957312 31331336 1de1408 ./vmlinux before 18162555 10226288 2957312 31346155 1de4deb ./vmlinux after (+14819) The number of static text symbols (non-inlined functions) goes down: Before: 40053 After: 39942 (-111) [ mingo: Rewrote the changelog. ] Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20181003213100.189959-8-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-04 11:25:00 +02:00
Nadav Amit	f81f8ad56f	x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs As described in: `77b0bf55bc`: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block. As a result GCC considers the inline assembly block as a single instruction. (Which it isn't, but that's the best we can get.) This patch increases the kernel size: text data bss dec hex filename 18146889 10225380 2957312 31329581 1de0d2d ./vmlinux before 18147336 10226688 2957312 31331336 1de1408 ./vmlinux after (+1755) But enables more aggressive inlining (and probably better branch decisions). The number of static text symbols in vmlinux is much lower: Before: 40218 After: 40053 (-165) The assembly code gets harder to read due to the extra macro layer. [ mingo: Rewrote the changelog. ] Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181003213100.189959-7-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-04 11:25:00 +02:00
Nadav Amit	77f48ec28e	x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs As described in: `77b0bf55bc`: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block - i.e. to macrify the affected block. As a result GCC considers the inline assembly block as a single instruction. This patch handles the LOCK prefix, allowing more aggresive inlining: text data bss dec hex filename 18140140 10225284 2957312 31322736 1ddf270 ./vmlinux before 18146889 10225380 2957312 31329581 1de0d2d ./vmlinux after (+6845) This is the reduction in non-inlined functions: Before: 40286 After: 40218 (-68) Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181003213100.189959-6-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-04 11:24:59 +02:00
Nadav Amit	9e1725b410	x86/refcount: Work around GCC inlining bug As described in: `77b0bf55bc`: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. The workaround is to set an assembly macro and call it from the inline assembly block. As a result GCC considers the inline assembly block as a single instruction. (Which it isn't, but that's the best we can get.) This patch allows GCC to inline simple functions such as __get_seccomp_filter(). To no-one's surprise the result is that GCC performs more aggressive (read: correct) inlining decisions in these senarios, which reduces the kernel size and presumably also speeds it up: text data bss dec hex filename 18140970 10225412 2957312 31323694 1ddf62e ./vmlinux before 18140140 10225284 2957312 31322736 1ddf270 ./vmlinux after (-958) 16 fewer static text symbols: Before: 40302 After: 40286 (-16) these got inlined instead. Functions such as kref_get(), free_user(), fuse_file_get() now get inlined. Hurray! [ mingo: Rewrote the changelog. ] Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181003213100.189959-5-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-04 11:24:59 +02:00
Nadav Amit	c06c4d8090	x86/objtool: Use asm macros to work around GCC inlining bugs As described in: `77b0bf55bc`: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs") GCC's inlining heuristics are broken with common asm() patterns used in kernel code, resulting in the effective disabling of inlining. In the case of objtool the resulting borkage can be significant, since all the annotations of objtool are discarded during linkage and never inlined, yet GCC bogusly considers most functions affected by objtool annotations as 'too large'. The workaround is to set an assembly macro and call it from the inline assembly block. As a result GCC considers the inline assembly block as a single instruction. (Which it isn't, but that's the best we can get.) This increases the kernel size slightly: text data bss dec hex filename 18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before 18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829) The number of static text symbols (i.e. non-inlined functions) is reduced: Before: 40321 After: 40302 (-19) [ mingo: Rewrote the changelog. ] Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christopher Li <sparse@chrisli.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-sparse@vger.kernel.org Link: http://lkml.kernel.org/r/20181003213100.189959-4-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-04 11:24:58 +02:00
Nadav Amit	77b0bf55bc	kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs Using macros in inline assembly allows us to work around bugs in GCC's inlining decisions. Compile macros.S and use it to assemble all C files. Currently only x86 will use it. Background: The inlining pass of GCC doesn't include an assembler, so it's not aware of basic properties of the generated code, such as its size in bytes, or that there are such things as discontiuous blocks of code and data due to the newfangled linker feature called 'sections' ... Instead GCC uses a lazy and fragile heuristic: it does a linear count of certain syntactic and whitespace elements in inlined assembly block source code, such as a count of new-lines and semicolons (!), as a poor substitute for "code size and complexity". Unsurprisingly this heuristic falls over and breaks its neck whith certain common types of kernel code that use inline assembly, such as the frequent practice of putting useful information into alternative sections. As a result of this fresh, 20+ years old GCC bug, GCC's inlining decisions are effectively disabled for inlined functions that make use of such asm() blocks, because GCC thinks those sections of code are "large" - when in reality they are often result in just a very low number of machine instructions. This absolute lack of inlining provess when GCC comes across such asm() blocks both increases generated kernel code size and causes performance overhead, which is particularly noticeable on paravirt kernels, which make frequent use of these inlining facilities in attempt to stay out of the way when running on baremetal hardware. Instead of fixing the compiler we use a workaround: we set an assembly macro and call it from the inlined assembly block. As a result GCC considers the inline assembly block as a single instruction. (Which it often isn't but I digress.) This uglifies and bloats the source code - for example just the refcount related changes have this impact: Makefile \| 9 +++++++-- arch/x86/Makefile \| 7 +++++++ arch/x86/kernel/macros.S \| 7 +++++++ scripts/Kbuild.include \| 4 +++- scripts/mod/Makefile \| 2 ++ 5 files changed, 26 insertions(+), 3 deletions(-) Yay readability and maintainability, it's not like assembly code is hard to read and maintain ... We also hope that GCC will eventually get fixed, but we are not holding our breath for that. Yet we are optimistic, it might still happen, any decade now. [ mingo: Wrote new changelog describing the background. ] Tested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kbuild@vger.kernel.org Link: http://lkml.kernel.org/r/20181003213100.189959-3-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-04 10:57:09 +02:00
Ingo Molnar	c0554d2d3d	Merge branch 'linus' into x86/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-04 08:23:03 +02:00
Andy Lutomirski	02e425668f	x86/vdso: Fix vDSO syscall fallback asm constraint regression When I added the missing memory outputs, I failed to update the index of the first argument (ebx) on 32-bit builds, which broke the fallbacks. Somehow I must have screwed up my testing or gotten lucky. Add another test to cover gettimeofday() as well. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: `715bd9d12f` ("x86/vdso: Fix asm constraints on vDSO syscall fallbacks") Link: http://lkml.kernel.org/r/21bd45ab04b6d838278fa5bebfa9163eceffa13c.1538608971.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-04 08:17:50 +02:00
Xiaochen Shen	2cc81c6992	x86/intel_rdt: Show missing resctrl mount options In resctrl filesystem, mount options exist to enable L3/L2 CDP and MBA Software Controller features if the platform supports them: mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl But currently only "cdp" option is displayed in /proc/mounts. "cdpl2" and "mba_MBps" options are not shown even when they are active. Before: # mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl # grep resctrl /proc/mounts /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp 0 0 After: # mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl # grep resctrl /proc/mounts /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp,mba_MBps 0 0 Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1536796118-60135-1-git-send-email-fenghua.yu@intel.com	2018-10-03 21:53:49 +02:00
Andy Shevchenko	82159876d3	x86/intel_rdt: Switch to bitmap_zalloc() Switch to bitmap_zalloc() to show clearly what is allocated. Besides that it returns a pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180830115039.63430-1-andriy.shevchenko@linux.intel.com	2018-10-03 21:53:49 +02:00
Reinette Chatre	53ed74af05	x86/intel_rdt: Re-enable pseudo-lock measurements Commit `4a7a54a55e` ("x86/intel_rdt: Disable PMU access") disabled measurements of pseudo-locked regions because of incorrect usage of the performance monitoring hardware. Cache pseudo-locking measurements are now done correctly with the in-kernel perf API and its use can be re-enabled at this time. The adjustment to the in-kernel perf API also separated the L2 and L3 measurements that can be triggered separately from user space. The re-enabling of the measurements is thus not a simple revert of the original disable in order to accommodate the additional parameter possible. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/bfb9fc31692e0c62d9ca39062e55eceb6a0635b5.1537377064.git.reinette.chatre@intel.com	2018-10-03 21:53:35 +02:00
Eric W. Biederman	ae7795bc61	signal: Distinguish between kernel_siginfo and siginfo Linus recently observed that if we did not worry about the padding member in struct siginfo it is only about 48 bytes, and 48 bytes is much nicer than 128 bytes for allocating on the stack and copying around in the kernel. The obvious thing of only adding the padding when userspace is including siginfo.h won't work as there are sigframe definitions in the kernel that embed struct siginfo. So split siginfo in two; kernel_siginfo and siginfo. Keeping the traditional name for the userspace definition. While the version that is used internally to the kernel and ultimately will not be padded to 128 bytes is called kernel_siginfo. The definition of struct kernel_siginfo I have put in include/signal_types.h A set of buildtime checks has been added to verify the two structures have the same field offsets. To make it easy to verify the change kernel_siginfo retains the same size as siginfo. The reduction in size comes in a following change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-10-03 16:47:43 +02:00
Eric W. Biederman	f283801851	signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE Rework the defintion of struct siginfo so that the array padding struct siginfo to SI_MAX_SIZE can be placed in a union along side of the rest of the struct siginfo members. The result is that we no longer need the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-10-03 16:46:43 +02:00
Takuya Yamamoto	b3541fbc3c	x86/mm: Fix typo in comment Signed-off-by: Takuya Yamamoto <tkyymmt01@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180829072730.988-1-tkyymmt01@gmail.com	2018-10-03 16:14:05 +02:00
Zhimin Gu	1fca4ba0b1	x86-32, hibernate: Adjust in_suspend after resumed on 32bit system Update the in_suspend variable to reflect the actual hibernation status. Back-port from 64bit system. Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:34 +02:00
Zhimin Gu	5331d2c7ef	x86-32, hibernate: Set up temporary text mapping for 32bit system Set up the temporary text mapping for the final jump address so that the system could jump to the right address after all the pages have been copied back to their original address - otherwise the final mapping for the jump address is invalid. Analogous changes were made for 64-bit in commit `65c0554b73` (x86/power/64: Fix kernel text mapping corruption during image restoration). Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:34 +02:00
Zhimin Gu	6bae499a0a	x86-32, hibernate: Switch to relocated restore code during resume on 32bit system On 64bit system, code should be executed in a safe page during page restoring, as the page where instruction is running during resume might be scribbled and causes issues. Although on 32 bit, we only suspend resuming by same kernel that did the suspend, we'd like to remove that restriction in the future. Porting corresponding code from 64bit system: Allocate a safe page, and copy the restore code to it, then jump to the safe page to run the code. Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:34 +02:00
Zhimin Gu	32aa276437	x86-32, hibernate: Switch to original page table after resumed After all the pages are restored to previous address, the page table switches back to current swapper_pg_dir. However the swapper_pg_dir currently in used might not be consistent with previous page table, which might cause issue after resume. Fix this issue by switching to original page table after resume, and the address of the original page table is saved in the hibernation image header. Move the manipulation of restore_cr3 into common code blocks. Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:34 +02:00
Zhimin Gu	0b0a6b1f76	x86-32, hibernate: Use the page size macro instead of constant value Convert the hard code into PAGE_SIZE for better scalability. No functional change. Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:34 +02:00
Zhimin Gu	7c0a982750	x86-32, hibernate: Use temp_pgt as the temporary page table This is to reuse the temp_pgt for both 32bit and 64bit system. No intentional behavior change. Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:34 +02:00
Zhimin Gu	72adf47764	x86, hibernate: Rename temp_level4_pgt to temp_pgt As 32bit system is not using 4-level page, rename it to temp_pgt so that it can be reused for both 32bit and 64bit hibernation. No functional change. Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:34 +02:00
Zhimin Gu	445565303d	x86-32, hibernate: Enable CONFIG_ARCH_HIBERNATION_HEADER on 32bit system Enable CONFIG_ARCH_HIBERNATION_HEADER for 32bit system so that 1. arch_hibernation_header_save/restore() are invoked across hibernation on 32bit system. 2. The checksum handling as well as 'magic' number checking for 32bit system are enabled. Controlled by CONFIG_X86_64 in hibernate.c Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:33 +02:00
Zhimin Gu	25862a049e	x86, hibernate: Extract the common code of 64/32 bit system Reduce the hibernation code duplication between x86-32 and x86-64 by extracting the common code into hibernate.c. Currently only pfn_is_nosave() is the activated common function in hibernate.c No functional change. Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:33 +02:00
Zhimin Gu	8e5b2a3c5a	x86-32/asm/power: Create stack frames in hibernate_asm_32.S swsusp_arch_suspend() is callable non-leaf function which doesn't honor CONFIG_FRAME_POINTER, which can result in bad stack traces. Also it's not annotated as ELF callable function which can confuse tooling. Create a stack frame for it when CONFIG_FRAME_POINTER is enabled and give it proper ELF function annotation. Also in this patch introduces the restore_registers() symbol and gives it ELF function annotation, thus to prepare for later register restore. Analogous changes were made for 64bit before in commit `ef0f3ed5a4` (x86/asm/power: Create stack frames in hibernate_asm_64.S) and commit `4ce827b4cc` (x86/power/64: Fix hibernation return address corruption). Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:33 +02:00
Chen Yu	749fa17093	PM / hibernate: Check the success of generating md5 digest before hibernation Currently if get_e820_md5() fails, then it will hibernate nevertheless. Actually the error code should be propagated to upper caller so that the hibernation could be aware of the result and terminates the process if md5 digest fails. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:33 +02:00
Zhimin Gu	cc55f7537d	x86, hibernate: Fix nosave_regions setup for hibernation On 32bit systems, nosave_regions(non RAM areas) located between max_low_pfn and max_pfn are not excluded from hibernation snapshot currently, which may result in a machine check exception when trying to access these unsafe regions during hibernation: [ 612.800453] Disabling lock debugging due to kernel taint [ 612.805786] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: fe00000000801136 [ 612.814344] mce: [Hardware Error]: RIP !INEXACT! 60:<00000000d90be566> {swsusp_save+0x436/0x560} [ 612.823167] mce: [Hardware Error]: TSC 1f5939fe276 ADDR dd000000 MISC 30e0000086 [ 612.830677] mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1529487426 SOCKET 0 APIC 0 microcode 24 [ 612.839581] mce: [Hardware Error]: Run the above through 'mcelog --ascii' [ 612.846394] mce: [Hardware Error]: Machine check: Processor context corrupt [ 612.853380] Kernel panic - not syncing: Fatal machine check [ 612.858978] Kernel Offset: 0x18000000 from 0xc1000000 (relocation range: 0xc0000000-0xf7ffdfff) This is because on 32bit systems, pages above max_low_pfn are regarded as high memeory, and accessing unsafe pages might cause expected MCE. On the problematic 32bit system, there are reserved memory above low memory, which triggered the MCE: e820 memory mapping: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable [ 0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000d160cfff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d160d000-0x00000000d1613fff] ACPI NVS [ 0.000000] BIOS-e820: [mem 0x00000000d1614000-0x00000000d1a44fff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d1a45000-0x00000000d1ecffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000d1ed0000-0x00000000d7eeafff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d7eeb000-0x00000000d7ffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000d8000000-0x00000000d875ffff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d8760000-0x00000000d87fffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000d8800000-0x00000000d8fadfff] usable [ 0.000000] BIOS-e820: [mem 0x00000000d8fae000-0x00000000d8ffffff] ACPI data [ 0.000000] BIOS-e820: [mem 0x00000000d9000000-0x00000000da71bfff] usable [ 0.000000] BIOS-e820: [mem 0x00000000da71c000-0x00000000da7fffff] ACPI NVS [ 0.000000] BIOS-e820: [mem 0x00000000da800000-0x00000000dbb8bfff] usable [ 0.000000] BIOS-e820: [mem 0x00000000dbb8c000-0x00000000dbffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000dd000000-0x00000000df1fffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed03fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000041edfffff] usable Fix this problem by changing pfn limit from max_low_pfn to max_pfn. This fix does not impact 64bit system because on 64bit max_low_pfn is the same as max_pfn. Signed-off-by: Zhimin Gu <kookoo.gu@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-03 11:56:33 +02:00
Nathan Chancellor	88296bd42b	x86/cpu/amd: Remove unnecessary parentheses Clang warns when multiple pairs of parentheses are used for a single conditional statement. arch/x86/kernel/cpu/amd.c:925:14: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((c->x86 == 6)) { ~~~~~~~^~~~ arch/x86/kernel/cpu/amd.c:925:14: note: remove extraneous parentheses around the comparison to silence this warning if ((c->x86 == 6)) { ~ ^ ~ arch/x86/kernel/cpu/amd.c:925:14: note: use '=' to turn this equality comparison into an assignment if ((c->x86 == 6)) { ^~ = 1 warning generated. Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181002224511.14929-1-natechancellor@gmail.com Link: https://github.com/ClangBuiltLinux/linux/issues/187 Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-03 08:27:47 +02:00
Andy Lutomirski	4f16656401	x86/vdso: Only enable vDSO retpolines when enabled and supported When I fixed the vDSO build to use inline retpolines, I messed up the Makefile logic and made it unconditional. It should have depended on CONFIG_RETPOLINE and on the availability of compiler support. This broke the build on some older compilers. Reported-by: nikola.ciprich@linuxbox.cz Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jason.vas.dias@gmail.com Cc: stable@vger.kernel.org Fixes: `2e549b2ee0` ("x86/vdso: Fix vDSO build if a retpoline is emitted") Link: http://lkml.kernel.org/r/08a1f29f2c238dd1f493945e702a521f8a5aa3ae.1538540801.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-03 08:26:14 +02:00
Jon Derrick	4f475e8e0a	x86/PCI: Apply VMD's AERSID fixup generically A root port Device ID changed between simulation and production. Rather than match Device IDs which may not be future-proof if left unmaintained, match all root ports which exist in a VMD domain. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2018-10-02 16:22:11 -05:00
Mike Travis	2647c43c7f	x86/tsc: Fix UV TSC initialization The recent rework of the TSC calibration code introduced a regression on UV systems as it added a call to tsc_early_init() which initializes the TSC ADJUST values before acpi_boot_table_init(). In the case of UV systems, that is a necessary step that calls uv_system_init(). This informs tsc_sanitize_first_cpu() that the kernel runs on a platform with async TSC resets as documented in commit `341102c3ef` ("x86/tsc: Add option that TSC on Socket 0 being non-zero is valid") Fix it by skipping the early tsc initialization on UV systems and let TSC init tests take place later in tsc_init(). Fixes: `cf7a63ef4e` ("x86/tsc: Calibrate tsc only once") Suggested-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Dimitri Sivanich <sivanich@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Link: https://lkml.kernel.org/r/20181002180144.923579706@stormcage.americas.sgi.com	2018-10-02 21:29:16 +02:00
Mike Travis	20a8378aa9	x86/platform/uv: Provide is_early_uv_system() Introduce is_early_uv_system() which uses efi.uv_systab to decide early in the boot process whether the kernel runs on a UV system. This is needed to skip other early setup/init code that might break the UV platform if done too early such as before necessary ACPI tables parsing takes place. Suggested-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Mike Travis <mike.travis@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Dimitri Sivanich <sivanich@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Link: https://lkml.kernel.org/r/20181002180144.801700401@stormcage.americas.sgi.com	2018-10-02 21:29:16 +02:00
Feng Tang	d2266bbfa9	x86/earlyprintk: Add a force option for pciserial device The "pciserial" earlyprintk variant helps much on many modern x86 platforms, but unfortunately there are still some platforms with PCI UART devices which have the wrong PCI class code. In that case, the current class code check does not allow for them to be used for logging. Add a sub-option "force" which overrides the class code check and thus the use of such device can be enforced. [ bp: massage formulations. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Stuart R . Anderson" <stuart.r.anderson@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H Peter Anvin <hpa@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thymo van Beers <thymovanbeers@gmail.com> Cc: alan@linux.intel.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/20181002164921.25833-1-feng.tang@intel.com	2018-10-02 21:02:47 +02:00
Kan Liang	7c5314b88d	perf/x86/intel: Add quirk for Goldmont Plus A ucode patch is needed for Goldmont Plus while counter freezing feature is enabled. Otherwise, there will be some issues, e.g. PMI flood with some events. Add a quirk to check microcode version. If the system starts with the wrong ucode, leave the counter-freezing feature permanently disabled. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/1533712328-2834-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 10:14:33 +02:00
Peter Zijlstra	f2c4db1bd8	x86/cpu: Sanitize FAM6_ATOM naming Going primarily by: https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors with additional information gleaned from other related pages; notably: - Bonnell shrink was called Saltwell - Moorefield is the Merriefield refresh which makes it Airmont The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE for i in `git grep -l FAM6_ATOM` ; do sed -i -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g' \ -e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/' \ -e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g' \ -e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g' \ -e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g' \ -e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g' \ -e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g' \ -e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g' \ -e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g' \ -e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g' \ -e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dave.hansen@linux.intel.com Cc: len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 10:14:32 +02:00
Andi Kleen	af3bdb991a	perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler Implements counter freezing for Arch Perfmon v4 (Skylake and newer). This allows to speed up the PMI handler by avoiding unnecessary MSR writes and make it more accurate. The Arch Perfmon v4 PMI handler is substantially different than the older PMI handler. Differences to the old handler: - It relies on counter freezing, which eliminates several MSR writes from the PMI handler and lowers the overhead significantly. It makes the PMI handler more accurate, as all counters get frozen atomically as soon as any counter overflows. So there is much less counting of the PMI handler itself. With the freezing we don't need to disable or enable counters or PEBS. Only BTS which does not support auto-freezing still needs to be explicitly managed. - The PMU acking is done at the end, not the beginning. This makes it possible to avoid manual enabling/disabling of the PMU, instead we just rely on the freezing/acking. - The APIC is acked before reenabling the PMU, which avoids problems with LBRs occasionally not getting unfreezed on Skylake. - Looping is only needed to workaround a corner case which several PMIs are very close to each other. For common cases, the counters are freezed during PMI handler. It doesn't need to do re-check. This patch: - Adds code to enable v4 counter freezing - Fork <=v3 and >=v4 PMI handlers into separate functions. - Add kernel parameter to disable counter freezing. It took some time to debug counter freezing, so in case there are new problems we added an option to turn it off. Would not expect this to be used until there are new bugs. - Only for big core. The patch for small core will be posted later separately. Performance: When profiling a kernel build on Kabylake with different perf options, measuring the length of all NMI handlers using the nmi handler trace point: V3 is without counter freezing. V4 is with counter freezing. The value is the average cost of the PMI handler. (lower is better) perf options ` V3(ns) V4(ns) delta -c 100000 1088 894 -18% -g -c 100000 1862 1646 -12% --call-graph lbr -c 100000 3649 3367 -8% --c.g. dwarf -c 100000 2248 1982 -12% Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/1533712328-2834-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 10:14:31 +02:00
Kan Liang	ba12d20edc	perf/x86/intel: Factor out common code of PMI handler The Arch Perfmon v4 PMI handler is substantially different than the older PMI handler. Instead of adding more and more ifs cleanly fork the new handler into a new function, with the main common code factored out into a common function. Fix complaint from checkpatch.pl by removing "false" from "static bool warned". No functional change. Based-on-code-from: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/1533712328-2834-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 10:14:30 +02:00
Ingo Molnar	a4c9f26533	Merge branch 'x86/cache' into perf/core, to resolve conflicts Avoid conflict with upcoming perf/core patches, merge in the RDT perf work. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:51:41 +02:00
Ingo Molnar	97e831e130	Merge branch 'perf/urgent' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:50:34 +02:00
Natarajan, Janakarajan	d7cbbe49a9	perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events In Family 17h, some L3 Cache Performance events require the ThreadMask and SliceMask to be set. For other events, these fields do not affect the count either way. Set ThreadMask and SliceMask to 0xFF and 0xF respectively. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H . Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:38:04 +02:00
Kan Liang	9d92cfeaf5	perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX The counters on M3UPI Link 0 and Link 3 don't count properly, and writing 0 to these counters may causes system crash on some machines. The PCI BDF addresses of the M3UPI in the current code are incorrect. The correct addresses should be: D18:F1 0x204D D18:F2 0x204E D18:F5 0x204D Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: `cd34cd97b7` ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1537538826-55489-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:38:02 +02:00
Masayoshi Mizuma	6265adb972	perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0 Physical package id 0 doesn't always exist, we should use boot_cpu_data.phys_proc_id here. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20180910144750.6782-1-msys.mizuma@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:37:58 +02:00
Andy Lutomirski	715bd9d12f	x86/vdso: Fix asm constraints on vDSO syscall fallbacks The syscall fallbacks in the vDSO have incorrect asm constraints. They are not marked as writing to their outputs -- instead, they are marked as clobbering "memory", which is useless. In particular, gcc is smart enough to know that the timespec parameter hasn't escaped, so a memory clobber doesn't clobber it. And passing a pointer as an asm input does not tell gcc that the pointed-to value is changed. Add in the fact that the asm instructions weren't volatile, and gcc was free to omit them entirely unless their sole output (the return value) is used. Which it is (phew!), but that stops happening with some upcoming patches. As a trivial example, the following code: void test_fallback(struct timespec *ts) { vdso_fallback_gettime(CLOCK_MONOTONIC, ts); } compiles to: 00000000000000c0 <test_fallback>: c0: c3 retq To add insult to injury, the RCX and R11 clobbers on 64-bit builds were missing. The "memory" clobber is also unnecessary -- no ordering with respect to other memory operations is needed, but that's going to be fixed in a separate not-for-stable patch. Fixes: `2aae950b21` ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/2c0231690551989d2fafa60ed0e7b5cc8b403908.1538422295.git.luto@kernel.org	2018-10-02 08:28:15 +02:00
Jens Axboe	c0aac682fa	This is the 4.19-rc6 release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAluw4MIACgkQONu9yGCS aT7+8xAAiYnc4khUsxeInm3z44WPfRX1+UF51frTNSY5C8Nn5nvRSnTUNLuKkkrz 8RbwCL6UYyJxF9I/oZdHPsPOD4IxXkQY55tBjz7ZbSBIFEwYM6RJMm8mAGlXY7wq VyWA5MhlpGHM9DjrguB4DMRipnrSc06CVAnC+ZyKLjzblzU1Wdf2dYu+AW9pUVXP j4r74lFED5djPY1xfqfzEwmYRCeEGYGx7zMqT3GrrF5uFPqj1H6O5klEsAhIZvdl IWnJTU2coC8R/Sd17g4lHWPIeQNnMUGIUbu+PhIrZ/lDwFxlocg4BvarPXEdzgYi gdZzKBfovpEsSu5RCQsKWG4IGQxY7I1p70IOP9eqEFHZy77qT1YcHVAWrK1Y/bJd UA08gUOSzRnhKkNR3+PsaMflUOl9WkpyHECZu394cyRGMutSS50aWkavJPJ/o1Qi D/oGqZLLcKFyuNcchG+Met1TzY3LvYEDgSburqwqeUZWtAsGs8kmiiq7qvmXx4zV IcgM8ERqJ8mbfhfsXQU7hwydIrPJ3JdIq19RnM5ajbv2Q4C/qJCyAKkQoacrlKR4 aiow/qvyNrP80rpXfPJB8/8PiWeDtAnnGhM+xySZNlw3t8GR6NYpUkIzf5TdkSb3 C8KuKg6FY9QAS62fv+5KK3LB/wbQanxaPNruQFGe5K1iDQ5Fvzw= =dMl4 -----END PGP SIGNATURE----- Merge tag 'v4.19-rc6' into for-4.20/block Merge -rc6 in, for two reasons: 1) Resolve a trivial conflict in the blk-mq-tag.c documentation 2) A few important regression fixes went into upstream directly, so they aren't in the 4.20 branch. Signed-off-by: Jens Axboe <axboe@kernel.dk> * tag 'v4.19-rc6': (780 commits) Linux 4.19-rc6 MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c cpufreq: qcom-kryo: Fix section annotations perf/core: Add sanity check to deal with pinned event failure xen/blkfront: correct purging of persistent grants Revert "xen/blkfront: When purging persistent grants, keep them in the buffer" selftests/powerpc: Fix Makefiles for headers_install change blk-mq: I/O and timer unplugs are inverted in blktrace dax: Fix deadlock in dax_lock_mapping_entry() x86/boot: Fix kexec booting failure in the SEV bit detection code bcache: add separate workqueue for journal_write to avoid deadlock drm/amd/display: Fix Edid emulation for linux drm/amd/display: Fix Vega10 lightup on S3 resume drm/amdgpu: Fix vce work queue was not cancelled when suspend Revert "drm/panel: Add device_link from panel device to DRM device" xen/blkfront: When purging persistent grants, keep them in the buffer clocksource/drivers/timer-atmel-pit: Properly handle error cases block: fix deadline elevator drain for zoned block devices ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set ... Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-10-01 08:58:57 -06:00
Sean Christopherson	daa07cbc9a	KVM: x86: fix L1TF's MMIO GFN calculation One defense against L1TF in KVM is to always set the upper five bits of the legal physical address in the SPTEs for non-present and reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the MMIO SPTE may overlap with the upper five bits that are being usurped to defend against L1TF. To preserve the GFN, the bits of the GFN that overlap with the repurposed bits are shifted left into the reserved bits, i.e. the GFN in the SPTE will be split into high and low parts. When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO access, get_mmio_spte_gfn() unshifts the affected bits and restores the original GFN for comparison. Unfortunately, get_mmio_spte_gfn() neglects to mask off the reserved bits in the SPTE that were used to store the upper chunk of the GFN. As a result, KVM fails to detect MMIO accesses whose GPA overlaps the repurprosed bits, which in turn causes guest panics and hangs. Fix the bug by generating a mask that covers the lower chunk of the GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The alternative approach would be to explicitly zero the five reserved bits that are used to store the upper chunk of the GFN, but that requires additional run-time computation and makes an already-ugly bit of code even more inscrutable. I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to warn if GENMASK_ULL() generated a nonsensical value, but that seemed silly since that would mean a system that supports VMX has less than 18 bits of physical address space... Reported-by: Sakari Ailus <sakari.ailus@iki.fi> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Cc: Junaid Shahid <junaids@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Reviewed-by: Junaid Shahid <junaids@google.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-01 15:41:00 +02:00
Liran Alon	62cf9bd811	KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS L2 IA32_BNDCFGS should be updated with vmcs12->guest_bndcfgs only when VM_ENTRY_LOAD_BNDCFGS is specified in vmcs12->vm_entry_controls. Otherwise, L2 IA32_BNDCFGS should be set to vmcs01->guest_bndcfgs which is L1 IA32_BNDCFGS. Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-01 15:40:59 +02:00
Liran Alon	503234b3fd	KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly Commit `a87036add0` ("KVM: x86: disable MPX if host did not enable MPX XSAVE features") introduced kvm_mpx_supported() to return true iff MPX is enabled in the host. However, that commit seems to have missed replacing some calls to kvm_x86_ops->mpx_supported() to kvm_mpx_supported(). Complete original commit by replacing remaining calls to kvm_mpx_supported(). Fixes: `a87036add0` ("KVM: x86: disable MPX if host did not enable MPX XSAVE features") Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-01 15:40:57 +02:00
Liran Alon	5f76f6f5ff	KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled Before this commit, KVM exposes MPX VMX controls to L1 guest only based on if KVM and host processor supports MPX virtualization. However, these controls should be exposed to guest only in case guest vCPU supports MPX. Without this change, a L1 guest running with kernel which don't have commit `691bd4340b` ("kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS") asserts in QEMU on the following: qemu-kvm: error: failed to set MSR 0xd90 to 0x0 qemu-kvm: .../qemu-2.10.0/target/i386/kvm.c:1801 kvm_put_msrs: Assertion 'ret == cpu->kvm_msr_buf->nmsrs failed' This is because L1 KVM kvm_init_msr_list() will see that vmx_mpx_supported() (As it only checks MPX VMX controls support) and therefore KVM_GET_MSR_INDEX_LIST IOCTL will include MSR_IA32_BNDCFGS. However, later when L1 will attempt to set this MSR via KVM_SET_MSRS IOCTL, it will fail because !guest_cpuid_has_mpx(vcpu). Therefore, fix the issue by exposing MPX VMX controls to L1 guest only when vCPU supports MPX. Fixes: `36be0b9deb` ("KVM: x86: Add nested virtualization support for MPX") Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-01 15:40:57 +02:00
Masahiro Yamada	ac0d656795	x86/build: Remove unused CONFIG_AS_CRC32 CONFIG_AS_CRC32 is not used anywhere. Its last user was removed by `0cb6c969ed` ("net, lib: kill arch_fast_hash library bits") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/1538389443-28514-1-git-send-email-yamada.masahiro@socionext.com	2018-10-01 14:58:09 +02:00
Uros Bizjak	c808c09b52	x86/asm: Use CC_SET()/CC_OUT() in __cmpxchg_double() Replace open-coded use of the SETcc instruction with CC_SET()/CC_OUT() in __cmpxchg_double(). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/CAFULd4YdvwwhXWHqqPsGk5+TLG71ozgSscTZNsqmrm+Jzg941w@mail.gmail.com	2018-10-01 13:46:32 +02:00
Greg Kroah-Hartman	e75417739b	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Thomas writes: "A single fix for the AMD memory encryption boot code so it does not read random garbage instead of the cached encryption bit when a kexec kernel is allocated above the 32bit address limit." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Fix kexec booting failure in the SEV bit detection code	2018-09-29 14:34:06 -07:00
Reinette Chatre	dd45407c0b	x86/intel_rdt: Use perf infrastructure for measurements The success of a cache pseudo-locked region is measured using performance monitoring events that are programmed directly at the time the user requests a measurement. Modifying the performance event registers directly is not appropriate since it circumvents the in-kernel perf infrastructure that exists to manage these resources and provide resource arbitration to the performance monitoring hardware. The cache pseudo-locking measurements are modified to use the in-kernel perf infrastructure. Performance events are created and validated with the appropriate perf API. The performance counters are still read as directly as possible to avoid the additional cache hits. This is done safely by first ensuring with the perf API that the counters have been programmed correctly and only accessing the counters in an interrupt disabled section where they are not able to be moved. As part of the transition to the in-kernel perf infrastructure the L2 and L3 measurements are split into two separate measurements that can be triggered independently. This separation prevents additional cache misses incurred during the extra testing code used to decide if a L2 and/or L3 measurement should be made. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/fc24e728b446404f42c78573c506e98cd0599873.1537468643.git.reinette.chatre@intel.com	2018-09-28 22:48:27 +02:00
Reinette Chatre	0a701c9dd5	x86/intel_rdt: Create required perf event attributes A perf event has many attributes that are maintained in a separate structure that should be provided when a new perf_event is created. In preparation for the transition to perf_events the required attribute structures are created for all the events that may be used in the measurements. Most attributes for all the events are identical. The actual configuration, what specifies what needs to be measured, is what will be different between the events used. This configuration needs to be done with X86_CONFIG that cannot be used as part of the designated initializers used here, this will be introduced later. Although they do look identical at this time the attribute structures needs to be maintained separately since a perf_event will maintain a pointer to its unique attributes. In support of patch testing the new structs are given the unused attribute until their use in later patches. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/1822f6164e221a497648d108913d056ab675d5d0.1537377064.git.reinette.chatre@intel.com	2018-09-28 22:48:27 +02:00
Reinette Chatre	b5e4274ef7	x86/intel_rdt: Remove local register variables Local register variables were used in an effort to improve the accuracy of the measurement of cache residency of a pseudo-locked region. This was done to ensure that only the cache residency of the memory is measured and not the cache residency of the variables used to perform the measurement. While local register variables do accomplish the goal they do require significant care since different architectures have different registers available. Local register variables also cannot be used with valuable developer tools like KASAN. Significant testing has shown that similar accuracy in measurement results can be obtained by replacing local register variables with regular local variables. Make use of local variables in the critical code but do so with READ_ONCE() to prevent the compiler from merging or refetching reads. Ensure these variables are initialized before the measurement starts, and ensure it is only the local variables that are accessed during the measurement. With the removal of the local register variables and using READ_ONCE() there is no longer a motivation for using a direct wrmsr call (that avoids the additional tracing code that may clobber the local register variables). Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/f430f57347414e0691765d92b144758ab93d8407.1537377064.git.reinette.chatre@intel.com	2018-09-28 22:48:26 +02:00
Reinette Chatre	1182a49529	perf/x86: Add helper to obtain performance counter index perf_event_read_local() is the safest way to obtain measurements associated with performance events. In some cases the overhead introduced by perf_event_read_local() affects the measurements and the use of rdpmcl() is needed. rdpmcl() requires the index of the performance counter used so a helper is introduced to determine the index used by a provided performance event. The index used by a performance event may change when interrupts are enabled. A check is added to ensure that the index is only accessed with interrupts disabled. Even with this check the use of this counter needs to be done with care to ensure it is queried and used within the same disabled interrupts section. This change introduces a new checkpatch warning: CHECK: extern prototypes should be avoided in .h files +extern int x86_perf_rdpmc_index(struct perf_event *event); This warning was discussed and designated as a false positive in http://lkml.kernel.org/r/20180919091759.GZ24124@hirez.programming.kicks-ass.net Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: acme@kernel.org Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: dave.hansen@intel.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/b277ffa78a51254f5414f7b1bc1923826874566e.1537377064.git.reinette.chatre@intel.com	2018-09-28 22:48:26 +02:00
Rob Herring	7de8f4aa2f	x86: DT: use for_each_of_cpu_node iterator Use the for_each_of_cpu_node iterator to iterate over cpu nodes. This has the side effect of defaulting to iterating using "cpu" node names in preference to the deprecated (for FDT) device_type == "cpu". Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rob Herring <robh@kernel.org>	2018-09-28 14:25:58 -05:00
Kees Cook	88fe0b957f	x86/fpu: Remove VLA usage of skcipher In the quest to remove all stack VLA usage from the kernel[1], this replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(), which uses a fixed stack size. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: x86@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-28 12:46:07 +08:00
YueHaibing	5140a6f471	x86/hyperv: Remove unused include Remove including <linux/version.h>. It's not needed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <devel@linuxdriverproject.org> Cc: <kernel-janitors@vger.kernel.org> Link: https://lkml.kernel.org/r/1537690822-97455-1-git-send-email-yuehaibing@huawei.com	2018-09-27 21:21:00 +02:00
Dexuan Cui	2f285f4624	x86/hyperv: Suppress "PCI: Fatal: No config space access function found" A Generation-2 Linux VM on Hyper-V doesn't have the legacy PCI bus, and users always see the scary warning, which is actually harmless. Suppress it. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org> Cc: Olaf Aepfle <olaf@aepfle.de> Cc: Andy Whitcroft <apw@canonical.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Marcelo Cerri <marcelo.cerri@canonical.com> Cc: Josh Poulson <jopoulso@microsoft.com> Link: https://lkml.kernel.org/r/ <KU1P153MB0166D977DC930996C4BF538ABF1D0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM	2018-09-27 21:19:14 +02:00
Peter Zijlstra	7904ba8a66	x86/mm/cpa: Optimize __cpa_flush_range() If we IPI for WBINDV, then we might as well kill the entire TLB too. But if we don't have to invalidate cache, there is no reason not to use a range TLB flush. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.195633798@infradead.org	2018-09-27 20:39:42 +02:00
Peter Zijlstra	47e262ac5b	x86/mm/cpa: Factor common code between cpa_flush_*() The start of cpa_flush_range() and cpa_flush_array() is the same, use a common function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.138859183@infradead.org	2018-09-27 20:39:42 +02:00
Peter Zijlstra	fce2ce9544	x86/mm/cpa: Move CLFLUSH test into cpa_flush_array() Rather than guarding cpa_flush_array() users with a CLFLUSH test, put it inside. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.087848187@infradead.org	2018-09-27 20:39:42 +02:00
Peter Zijlstra	5f464b33b1	x86/mm/cpa: Move CLFLUSH test into cpa_flush_range() Rather than guarding all cpa_flush_range() uses with a CLFLUSH test, put it inside. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.036195503@infradead.org	2018-09-27 20:39:41 +02:00
Peter Zijlstra	a7295fd53c	x86/mm/cpa: Use flush_tlb_kernel_range() Both cpa_flush_range() and cpa_flush_array() have a well specified range, use that to do a range based TLB invalidate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.985193217@infradead.org	2018-09-27 20:39:41 +02:00
Peter Zijlstra	ddd07b7503	x86/mm/cpa: Unconditionally avoid WBINDV when we can CAT has happened, WBINDV is bad (even before CAT blowing away the entire cache on a multi-core platform wasn't nice), try not to use it ever. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.933674526@infradead.org	2018-09-27 20:39:41 +02:00
Peter Zijlstra	c0a759abf5	x86/mm/cpa: Move flush_tlb_all() There is an atom errata, where we do a local TLB invalidate right before we return and then do a global TLB invalidate. Move the global invalidate up a little bit and avoid the local invalidate entirely. This does put the global invalidate under pgd_lock, but that shouldn't matter. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.882287392@infradead.org	2018-09-27 20:39:40 +02:00
Peter Zijlstra	c6185b1f21	x86/mm/cpa: Use flush_tlb_all() Instead of open-coding it.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.831102058@infradead.org	2018-09-27 20:39:40 +02:00
Thomas Gleixner	585948f4f6	x86/mm/cpa: Avoid the 4k pages check completely The extra loop which tries hard to preserve large pages in case of conflicts with static protection regions turns out to be not preserving anything, at least not in the experiments which have been conducted. There might be corner cases in which the code would be able to preserve a large page oaccsionally, but it's really not worth the extra code and the cycles wasted in the common case. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 541 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 514 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 538 2M pages sameprot: 466 2M pages preserved: 47 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.589642503@linutronix.de	2018-09-27 20:39:39 +02:00
Thomas Gleixner	9cc9f17a5a	x86/mm/cpa: Do the range check early To avoid excessive 4k wise checks in the common case do a quick check first whether the requested new page protections conflict with a static protection area in the large page. If there is no conflict then the decision whether to preserve or to split the page can be made immediately. If the requested range covers the full large page, preserve it. Otherwise split it up. No point in doing a slow crawl in 4k steps. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 538 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 560642 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 541 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 514 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.507259989@linutronix.de	2018-09-27 20:39:39 +02:00
Thomas Gleixner	1c4b406ee8	x86/mm/cpa: Optimize same protection check When the existing mapping is correct and the new requested page protections are the same as the existing ones, then further checks can be omitted and the large page can be preserved. The slow path 4k wise check will not come up with a different result. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800709 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 538 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 560642 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.424477581@linutronix.de	2018-09-27 20:39:39 +02:00
Thomas Gleixner	f61c5ba288	x86/mm/cpa: Add sanity check for existing mappings With the range check it is possible to do a quick verification that the current mapping is correct vs. the static protection areas. In case a incorrect mapping is detected a warning is emitted and the large page is split up. If the large page is a 2M page, then the split code is forced to check the static protections for the PTE entries to fix up the incorrectness. For 1G pages this can't be done easily because that would require to either find the offending 2M areas before the split or afterwards. For now just warn about that case and revisit it when reported. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.331408643@linutronix.de	2018-09-27 20:39:39 +02:00
Thomas Gleixner	69c31e69df	x86/mm/cpa: Avoid static protection checks on unmap If the new pgprot has the PRESENT bit cleared, then conflicts vs. RW/NX are completely irrelevant. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800770 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800709 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.245849757@linutronix.de	2018-09-27 20:39:38 +02:00
Thomas Gleixner	5c280cf608	x86/mm/cpa: Add large page preservation statistics The large page preservation mechanism is just magic and provides no information at all. Add optional statistic output in debugfs so the magic can be evaluated. Defaults is off. Output: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800770 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.160867778@linutronix.de	2018-09-27 20:39:38 +02:00
Thomas Gleixner	4046460b86	x86/mm/cpa: Add debug mechanism The whole static protection magic is silently fixing up anything which is handed in. That's just wrong. The offending call sites need to be fixed. Add a debug mechanism which emits a warning if a requested mapping needs to be fixed up. The DETECT debug mechanism is really not meant to be enabled except for developers, so limit the output hard to the protection fixups. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.078998733@linutronix.de	2018-09-27 20:39:38 +02:00
Thomas Gleixner	91ee8f5c1f	x86/mm/cpa: Allow range check for static protections Checking static protections only page by page is slow especially for huge pages. To allow quick checks over a complete range, add the ability to do that. Make the checks inclusive so the ranges can be directly used for debug output later. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.995734490@linutronix.de	2018-09-27 20:39:37 +02:00
Thomas Gleixner	afd7969a99	x86/mm/cpa: Rework static_protections() static_protections() is pretty unreadable. Split it up into separate checks for each protection area. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.913005317@linutronix.de	2018-09-27 20:39:37 +02:00
Thomas Gleixner	8679de0959	x86/mm/cpa: Split, rename and clean up try_preserve_large_page() Avoid the extra variable and gotos by splitting the function into the actual algorithm and a callable function which contains the lock protection. Rename it to should_split_large_page() while at it so the return values make actually sense. Clean up the code flow, comments and general whitespace damage while at it. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.830507216@linutronix.de	2018-09-27 20:39:37 +02:00
Thomas Gleixner	2a25dc7c79	x86/mm/init32: Mark text and rodata RO in one go The sequence of marking text and rodata read-only in 32bit init is: set_ro(text); kernel_set_to_readonly = 1; set_ro(rodata); When kernel_set_to_readonly is 1 it enables the protection mechanism in CPA for the read only regions. With the upcoming checks for existing mappings this consequently triggers the warning about an existing mapping being incorrect vs. static protections because rodata has not been converted yet. There is no technical reason to split the two, so just combine the RO protection to convert text and rodata in one go. Convert the printks to pr_info while at it. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.731701535@linutronix.de	2018-09-27 20:39:37 +02:00
Kairui Song	bdec8d7fa5	x86/boot: Fix kexec booting failure in the SEV bit detection code Commit `1958b5fc40` ("x86/boot: Add early boot support when running with SEV active") can occasionally cause system resets when kexec-ing a second kernel even if SEV is not active. That's because get_sev_encryption_bit() uses 32-bit rIP-relative addressing to read the value of enc_bit - a variable which caches a previously detected encryption bit position - but kexec may allocate the early boot code to a higher location, beyond the 32-bit addressing limit. In this case, garbage will be read and get_sev_encryption_bit() will return the wrong value, leading to accessing memory with the wrong encryption setting. Therefore, remove enc_bit, and thus get rid of the need to do 32-bit rIP-relative addressing in the first place. [ bp: massage commit message heavily. ] Fixes: `1958b5fc40` ("x86/boot: Add early boot support when running with SEV active") Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-kernel@vger.kernel.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: brijesh.singh@amd.com Cc: kexec@lists.infradead.org Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: ghook@redhat.com Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com	2018-09-27 19:35:03 +02:00
Pu Wen	4044240365	x86/xen: Add Hygon Dhyana support to Xen To make Xen work on the Hygon platform, reuse AMD's Xen support code path for Hygon Dhyana CPU. There are six core performance events counters per thread, so there are six MSRs for these counters. Also there are four legacy PMC MSRs, they are aliases of the counters. In this version, use the legacy and safe version of MSR access. Tested successfully with VPMU enabled in Xen on Hygon platform by testing with perf. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: jgross@suse.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/311bf41f08f24550aa6c5da3f1e03a68d3b89dac.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:59 +02:00
Pu Wen	b8f4abb652	x86/kvm: Add Hygon Dhyana support to KVM The Hygon Dhyana CPU has the SVM feature as AMD family 17h does. So enable the KVM infrastructure support to it. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/654dd12876149fba9561698eaf9fc15d030301f8.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:59 +02:00
Pu Wen	ac78bd7235	x86/mce: Add Hygon Dhyana support to the MCA infrastructure The machine check architecture for Hygon Dhyana CPU is similar to the AMD family 17h one. Add vendor checking for Hygon Dhyana to share the code path of AMD family 17h. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: tony.luck@intel.com Cc: thomas.lendacky@amd.com Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/87d8a4f16bdea0bfe0c0cf2e4a8d2c2a99b1055c.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:59 +02:00
Pu Wen	1a576b23d6	x86/bugs: Add Hygon Dhyana to the respective mitigation machinery The Hygon Dhyana CPU has the same speculative execution as AMD family 17h, so share AMD spectre mitigation code with Hygon Dhyana. Also Hygon Dhyana is not affected by meltdown, so add exception for it. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/0861d39c8a103fc0deca15bafbc85d403666d9ef.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:59 +02:00
Pu Wen	da33dfef40	x86/apic: Add Hygon Dhyana support Add Hygon Dhyana support to the APIC subsystem. When running in 32 bit mode, bigsmp should be enabled if there are more than 8 cores online. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/7a557265a8c7c9e842fe60f9d8e064458801aef3.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:58 +02:00
Pu Wen	c6babb5806	x86/pci, x86/amd_nb: Add Hygon Dhyana support to PCI and northbridge Hygon's PCI vendor ID is 0x1d94, and there are PCI devices 0x1450/0x1463/0x1464 for the host bridge on the Hygon Dhyana platform. Add Hygon Dhyana support to the PCI and northbridge subsystems by using the code path of AMD family 17h. [ bp: Massage commit message, sort local vars into reverse xmas tree order and move the amd_northbridges.num check up. ] Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: helgaas@kernel.org Cc: linux-pci@vger.kernel.org Link: https://lkml.kernel.org/r/5f8877bd413f2ea0833378dd5454df0720e1c0df.1537885177.git.puwen@hygon.cn	2018-09-27 18:28:58 +02:00
Pu Wen	b7a5cb4f22	x86/amd_nb: Check vendor in AMD-only functions Exit early in functions which are meant to run on AMD only but which get run on different vendor (VMs, etc). [ bp: rewrite commit message. ] Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: bhelgaas@google.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Cc: helgaas@kernel.org Link: https://lkml.kernel.org/r/487d8078708baedaf63eb00a82251e228b58f1c2.1537885177.git.puwen@hygon.cn	2018-09-27 18:28:58 +02:00
Pu Wen	c3fecca457	x86/alternative: Init ideal_nops for Hygon Dhyana The ideal_nops for Hygon Dhyana CPU should be p6_nops. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/79e76c3173716984fe5fdd4a8e2c798bf4193205.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:58 +02:00
Pu Wen	6d0ef316b9	x86/events: Add Hygon Dhyana support to PMU infrastructure The PMU architecture for the Hygon Dhyana CPU is similar to the AMD Family 17h one. To support it, call amd_pmu_init() to share the AMD PMU initialization flow, and change the PMU name to "HYGON". The Hygon Dhyana CPU supports both legacy and extension PMC MSRs (perf counter registers and event selection registers), so add Hygon Dhyana support in the similar way as AMD does. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/9d93ed54a975f33ef7247e0967960f4ce5d3d990.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:57 +02:00
Pu Wen	0b13bec787	x86/smpboot: Do not use BSP INIT delay and MWAIT to idle on Dhyana The Hygon Dhyana CPU uses no delay in smp_quirk_init_udelay(), and does HLT on idle just like AMD does. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: bp@alien8.de Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/87000fa82e273f5967c908448414228faf61e077.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:57 +02:00
Pu Wen	39dc6f154d	x86/cpu/mtrr: Support TOP_MEM2 and get MTRR number The Hygon Dhyana CPU has a special MSR way to force WB for memory >4GB, and support TOP_MEM2. Therefore, it is necessary to add Hygon Dhyana support in amd_special_default_mtrr(). The number of variable MTRRs for Hygon is 2 as AMD's. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/8246f81648d014601de3812ade40e85d9c50d9b3.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:57 +02:00
Pu Wen	d4f7423efd	x86/cpu: Get cache info and setup cache cpumap for Hygon Dhyana The Hygon Dhyana CPU has a topology extensions bit in CPUID. With this bit, the kernel can get the cache information. So add support in cpuid4_cache_lookup_regs() to get the correct cache size. The Hygon Dhyana CPU also discovers num_cache_leaves via CPUID leaf 0x8000001d, so add support to it in find_num_cache_leaves(). Also add cacheinfo_hygon_init_llc_id() and init_hygon_cacheinfo() functions to initialize Dhyana cache info. Setup cache cpumap in the same way as AMD does. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: bp@alien8.de Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/2a686b2ac0e2f5a1f2f5f101124d9dd44f949731.1537533369.git.puwen@hygon.cn	2018-09-27 18:28:57 +02:00
Borislav Petkov	7eae653c80	Merge branch 'tip-x86-hygon' into tip-x86-cpu ... in order to share one commit with the EDAC tree. Signed-off-by: Borislav Petkov <bp@suse.de>	2018-09-27 18:13:01 +02:00
Ard Biesheuvel	b34006c425	x86/jump_table: Use relative references Similar to the arm64 case, 64-bit x86 can benefit from using relative references rather than absolute ones when emitting struct jump_entry instances. Not only does this reduce the memory footprint of the entries themselves by 33%, it also removes the need for carrying relocation metadata on relocatable builds (i.e., for KASLR) which saves a fair chunk of .init space as well (although the savings are not as dramatic as on arm64) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jessica Yu <jeyu@kernel.org> Link: https://lkml.kernel.org/r/20180919065144.25010-7-ard.biesheuvel@linaro.org	2018-09-27 17:56:48 +02:00
Ard Biesheuvel	9fc0f798ab	x86/jump_label: Switch to jump_entry accessors In preparation of switching x86 to use place-relative references for the code, target and key members of struct jump_entry, replace direct references to the struct members with invocations of the new accessors. This will allow us to make the switch by modifying the accessors only. This incorporates a cleanup of __jump_label_transform() proposed by Peter. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jessica Yu <jeyu@kernel.org> Link: https://lkml.kernel.org/r/20180919065144.25010-6-ard.biesheuvel@linaro.org	2018-09-27 17:56:48 +02:00
Ard Biesheuvel	b40a142b12	x86: Add support for 64-bit place relative relocations Add support for R_X86_64_PC64 relocations, which operate on 64-bit quantities holding a relative symbol reference. Also remove the definition of R_X86_64_NUM: given that it is currently unused, it is unclear what the new value should be. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jessica Yu <jeyu@kernel.org> Link: https://lkml.kernel.org/r/20180919065144.25010-5-ard.biesheuvel@linaro.org	2018-09-27 17:56:47 +02:00
Thomas Gleixner	fa70f0d2ce	EFI updates for v4.20: - Add support for enlisting the help of the EFI firmware to create memory reservations that persist across kexec. - Add page fault handling to the runtime services support code on x86 so we can gracefully recover from buggy EFI firmware. - Fix command line handling on x86 for the boot path that omits the stub's PE/COFF entry point. - Other assorted fixes. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAlurXR8ACgkQwjcgfpV0 +n2CGwf/V4exixXjTDwkqE6gY5bq0Y3AL8tp89wdbJzjgGOIJLKh3CrGr8xEFHrv oYObcvB3SfNEIyGeBjc/8ZMw1P/j98s6ucsMm0u+V52k7xxu/xJoIPw3bX2R8LLc QhedUmKWLFQXxottaqzRFi1m0rP9TlAlc2n2pjIPCywjTPzeT/jBTtnRGRRdpDkN uxwv59eXc6MXuwJGhM9lGIBCu8ra54SiSByJSKoMwNYXQRCLtiBUg5iibWkKigHp 9rQiimQnDOuPiZ6JGFx6pwSu7cqv3d8LYk5EnU3zYfzxAvHRfxuf40joSeZzySby vZ4zRog79DxkSnuvaQ0+phQHiq+yQg== =HZGk -----END PGP SIGNATURE----- Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core Pull EFI updates for v4.20 from Ard Biesheuvel: - Add support for enlisting the help of the EFI firmware to create memory reservations that persist across kexec. - Add page fault handling to the runtime services support code on x86 so we can gracefully recover from buggy EFI firmware. - Fix command line handling on x86 for the boot path that omits the stub's PE/COFF entry point. - Other assorted fixes.	2018-09-27 16:58:49 +02:00
Pu Wen	c9661c1e80	x86/cpu: Create Hygon Dhyana architecture support file Add x86 architecture support for a new processor: Hygon Dhyana Family 18h. Carve out initialization code needed by Dhyana into a separate compilation unit. To identify Hygon Dhyana CPU, add a new vendor type X86_VENDOR_HYGON. Since Dhyana uses AMD functionality to a large degree, select CPU_SUP_AMD which provides that functionality. [ bp: drop explicit license statement as it has an SPDX tag already. ] Signed-off-by: Pu Wen <puwen@hygon.cn> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: thomas.lendacky@amd.com Link: https://lkml.kernel.org/r/1a882065223bacbde5726f3beaa70cebd8dcd814.1537533369.git.puwen@hygon.cn	2018-09-27 16:14:05 +02:00
Qiuxu Zhuo	e5276b1ffa	x86/mce: Add macros for the corrected error count bit field The bit field [52:38] of MCi_STATUS contains the corrected error count. Add {_SHIFT\|_MASK\|*_CEC(c)} macros for it. [ bp: use GENMASK_ULL. ] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20180925000343.GB5998@agluck-desk	2018-09-27 16:08:18 +02:00
Qiuxu Zhuo	93ac57540e	x86/mce: Use BIT_ULL(x) for bit mask definitions Current coding style is to use the BIT_ULL() macro. [ bp: Align the MCG_STATUS defines vertically too. ] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: linux-edac@vger.kernel.org Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20180925000127.GA5998@agluck-desk	2018-09-27 16:06:37 +02:00
Christoph Hellwig	3cfa210bf3	xen: don't include <xen/xen.h> from <asm/io.h> and <asm/dma-mapping.h> Nothing Xen specific in these headers, which get included from a lot of code in the kernel. So prune the includes and move them to the Xen-specific files that actually use them instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-26 08:45:18 -06:00
Christoph Hellwig	c39ae60dfb	block: remove ARCH_BIOVEC_PHYS_MERGEABLE Take the Xen check into the core code instead of delegating it to the architectures. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-26 08:45:11 -06:00
Christoph Hellwig	20e3267601	xen: provide a prototype for xen_biovec_phys_mergeable in xen.h Having multiple externs in arch headers is not a good way to provide a common interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-26 08:45:10 -06:00
Jiri Kosina	bb4b3b7762	x86/speculation: Propagate information about RSB filling mitigation to sysfs If spectrev2 mitigation has been enabled, RSB is filled on context switch in order to protect from various classes of spectrev2 attacks. If this mitigation is enabled, say so in sysfs for spectrev2. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "WoodhouseDavid" <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: "SchauflerCasey" <casey.schaufler@intel.com> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438580.15880@cbobk.fhfr.pm	2018-09-26 14:26:52 +02:00
Jiri Kosina	53c613fe63	x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation STIBP is a feature provided by certain Intel ucodes / CPUs. This feature (once enabled) prevents cross-hyperthread control of decisions made by indirect branch predictors. Enable this feature if - the CPU is vulnerable to spectre v2 - the CPU supports SMT and has SMT siblings online - spectre_v2 mitigation autoselection is enabled (default) After some previous discussion, this leaves STIBP on all the time, as wrmsr on crossing kernel boundary is a no-no. This could perhaps later be a bit more optimized (like disabling it in NOHZ, experiment with disabling it in idle, etc) if needed. Note that the synchronization of the mask manipulation via newly added spec_ctrl_mutex is currently not strictly needed, as the only updater is already being serialized by cpu_add_remove_lock, but let's make this a little bit more future-proof. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "WoodhouseDavid" <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: "SchauflerCasey" <casey.schaufler@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm	2018-09-26 14:26:52 +02:00
Jiri Kosina	dbfe2953f6	x86/speculation: Apply IBPB more strictly to avoid cross-process data leak Currently, IBPB is only issued in cases when switching into a non-dumpable process, the rationale being to protect such 'important and security sensitive' processess (such as GPG) from data leaking into a different userspace process via spectre v2. This is however completely insufficient to provide proper userspace-to-userpace spectrev2 protection, as any process can poison branch buffers before being scheduled out, and the newly scheduled process immediately becomes spectrev2 victim. In order to minimize the performance impact (for usecases that do require spectrev2 protection), issue the barrier only in cases when switching between processess where the victim can't be ptraced by the potential attacker (as in such cases, the attacker doesn't have to bother with branch buffers at all). [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably fine-grained ] Fixes: `18bf3c3ea8` ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch") Originally-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "WoodhouseDavid" <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: "SchauflerCasey" <casey.schaufler@intel.com> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm	2018-09-26 14:26:51 +02:00
Ben Hutchings	9c1442a9d0	x86: boot: Fix EFI stub alignment We currently align the end of the compressed image to a multiple of 16. However, the PE-COFF header included in the EFI stub says that the file alignment is 32 bytes, and when adding an EFI signature to the file it must first be padded to this alignment. sbsigntool commands warn about this: warning: file-aligned section .text extends beyond end of file warning: checksum areas are greater than image size. Invalid section table? Worse, pesign -at least when creating a detached signature- uses the hash of the unpadded file, resulting in an invalid signature if padding is required. Avoid both these problems by increasing alignment to 32 bytes when CONFIG_EFI_STUB is enabled. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2018-09-26 12:15:01 +02:00
Hans de Goede	c33ce98443	efi/x86: Call efi_parse_options() from efi_main() Before this commit we were only calling efi_parse_options() from make_boot_params(), but make_boot_params() only gets called if the kernel gets booted directly as an EFI executable. So when booted through e.g. grub we ended up not parsing the commandline in the boot code. This makes the drivers/firmware/efi/libstub code ignore the "quiet" commandline argument resulting in the following message being printed: "EFI stub: UEFI Secure Boot is enabled." Despite the quiet request. This commits adds an extra call to efi_parse_options() to efi_main() to make sure that the options are always processed. This fixes quiet not working. This also fixes the libstub code ignoring nokaslr and efi=nochunk. Reported-by: Peter Robinson <pbrobinson@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2018-09-26 12:15:00 +02:00
Aaron Ma	b8b39bff3c	efi/x86: earlyprintk - Add 64bit efi fb address support EFI GOP uses 64-bit frame buffer address in some BIOS. Add 64bit address support in efi earlyprintk. Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2018-09-26 12:14:58 +02:00
Sebastian Andrzej Siewior	4eda11175f	efi/x86: drop task_lock() from efi_switch_mm() efi_switch_mm() is a wrapper around switch_mm() which saves current's ->active_mm, sets the requests mm as ->active_mm and invokes switch_mm(). I don't think that task_lock() is required during that procedure. It protects ->mm which isn't changed here. It needs to be mentioned that during the whole procedure (switch to EFI's mm and back) the preemption needs to be disabled. A context switch at this point would reset the cr3 value based on current->mm. Also, this function may not be invoked at the same time on a different CPU because it would overwrite the efi_scratch.prev_mm information. Remove task_lock() and also update the comment to reflect it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2018-09-26 12:14:57 +02:00
Sai Praneeth	3425d934fc	efi/x86: Handle page faults occurring while running EFI runtime services Memory accesses performed by UEFI runtime services should be limited to: - reading/executing from EFI_RUNTIME_SERVICES_CODE memory regions - reading/writing from/to EFI_RUNTIME_SERVICES_DATA memory regions - reading/writing by-ref arguments - reading/writing from/to the stack. Accesses outside these regions may cause the kernel to hang because the memory region requested by the firmware isn't mapped in efi_pgd, which causes a page fault in ring 0 and the kernel fails to handle it, leading to die(). To save kernel from hanging, add an EFI specific page fault handler which recovers from such faults by 1. If the efi runtime service is efi_reset_system(), reboot the machine through BIOS. 2. If the efi runtime service is _not_ efi_reset_system(), then freeze efi_rts_wq and schedule a new process. The EFI page fault handler offers us two advantages: 1. Avoid potential hangs caused by buggy firmware. 2. Shout loud that the firmware is buggy and hence is not a kernel bug. Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Suggested-by: Matt Fleming <matt@codeblueprint.co.uk> Based-on-code-from: Ricardo Neri <ricardo.neri@intel.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [ardb: clarify commit log] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2018-09-26 12:14:55 +02:00
Sohil Mehta	26b86092c4	iommu/vt-d: Relocate struct/function declarations to its header files To reuse the static functions and the struct declarations, move them to corresponding header files and export the needed functions. Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2018-09-25 14:33:43 +02:00
Ingo Molnar	fb437bc8fe	This is the 4.19-rc5 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlunyjMACgkQONu9yGCS aT52HhAA0JU7E88QPZ1gSxc1ifTaIlHXhLQSvQKAXOhIvHDwj4tEKDqPhpCN/dWX /o/xaUf36gU0VzUD/1IyEiMFmJEeFKnfvN5SZYZLk8uSrd4swqaY8mSueZxNEDz4 YNK9ugI/tPztuuz7I6KrO1iVquY1WlnECxc9FH76wvHsit8Sr3PvzhR+CVrOi+8p k3cpWlhHiOzT/3K3Wv2Et+oh+U+myKtQTlJDSe3fMx5chksJpBmsV/IDEtsLNZfz 3v25fHz5a3DOYqKkGJaDrbLyPNC85249B+CiXqbXvfOAHDVkMwYqcxYUG+YZ5cpm U0OShLXm67dz8vT9cxqOSguCliPRlM9W5+EKzmVT7l8+ycds3BuEEHg1xWPrJWgG 7XO10HkhZl+VvnJCj54KaszMUOdpvdEQSUs82gAFxjPbQIx5gosN9O0H+DnirMhS 6VtzS20ZoIzjd4YVkRoLNcobHB4bZVTNXZ1Zi3C/neP9pxUjhOk0y+Vr/crC5Xph 3TykIMgiVa+CdvQ/f4LOSiCgTFhF0tLGtfDQTG7f+9+W5pMc4NKSLi8EOMlJtYEy wsCYZ7/T9ElgrEzFvlxSvDwiPUhcldNao/EGdRYvMxXtgj0Ctw8LhR/2YKkqo6LK oMoKKWkj0o7uKSHKq+dakS0FprKnBnvE2Y+XA4SO/saPGFlDAVc= =OFJh -----END PGP SIGNATURE----- Merge tag 'v4.19-rc5' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-25 11:19:44 +02:00
Christoph Hellwig	6a9f5f240a	block: simplify BIOVEC_PHYS_MERGEABLE Turn the macro into an inline, move it to blk.h and simplify the arch hooks a bit. Also rename the function to biovec_phys_mergeable as there is no need to shout. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-24 12:33:54 -06:00
Paolo Bonzini	4679b61f26	KVM: x86: never trap MSR_KERNEL_GS_BASE KVM has an old optimization whereby accesses to the kernel GS base MSR are trapped when the guest is in 32-bit and not when it is in 64-bit mode. The idea is that swapgs is not available in 32-bit mode, thus the guest has no reason to access the MSR unless in 64-bit mode and 32-bit applications need not pay the price of switching the kernel GS base between the host and the guest values. However, this optimization adds complexity to the code for little benefit (these days most guests are going to be 64-bit anyway) and in fact broke after commit `678e315e78` ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base", 2018-08-06); the guest kernel GS base can be corrupted across SMIs and UEFI Secure Boot is therefore broken (a secure boot Linux guest, for example, fails to reach the login prompt about half the time). This patch just removes the optimization; the kernel GS base MSR is now never trapped by KVM, similarly to the FS and GS base MSRs. Fixes: `678e315e78` Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-24 18:34:13 +02:00
Zhenzhong Duan	0cbb76d628	x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant ..so that they match their asm counterpart. Add the missing ANNOTATE_NOSPEC_ALTERNATIVE in CALL_NOSPEC, while at it. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wang YanQing <udknight@gmail.com> Cc: dhaval.giani@oracle.com Cc: srinivas.eeda@oracle.com Link: http://lkml.kernel.org/r/c3975665-173e-4d70-8dee-06c926ac26ee@default	2018-09-23 15:25:28 +02:00
Greg Kroah-Hartman	18d49ec3c6	xen: fixes for 4.19-rc5 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW6dGhgAKCRCAXGG7T9hj vs1UAPwPSDmelfUus+P5ndRQZdK/iQWuRgQUe9Gd3RUVTfcQ7AEAljcN4/dSj7SB hOgRlCp5WB1s5/vFF7z4jc2wtqvOPAk= =8P9c -----END PGP SIGNATURE----- Merge tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Juergen writes: "xen: Two small fixes for xen drivers." * tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: issue warning message when out of grant maptrack entries xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code	2018-09-23 13:32:19 +02:00
Greg Kroah-Hartman	328c6333ba	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Thomas writes: "A set of fixes for x86: - Resolve the kvmclock regression on AMD systems with memory encryption enabled. The rework of the kvmclock memory allocation during early boot results in encrypted storage, which is not shareable with the hypervisor. Create a new section for this data which is mapped unencrypted and take care that the later allocations for shared kvmclock memory is unencrypted as well. - Fix the build regression in the paravirt code introduced by the recent spectre v2 updates. - Ensure that the initial static page tables cover the fixmap space correctly so early console always works. This worked so far by chance, but recent modifications to the fixmap layout can - depending on kernel configuration - move the relevant entries to a different place which is not covered by the initial static page tables. - Address the regressions and issues which got introduced with the recent extensions to the Intel Recource Director Technology code. - Update maintainer entries to document reality" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Expand static page table for fixmap space MAINTAINERS: Add X86 MM entry x86/intel_rdt: Add Reinette as co-maintainer for RDT MAINTAINERS: Add Borislav to the x86 maintainers x86/paravirt: Fix some warning messages x86/intel_rdt: Fix incorrect loop end condition x86/intel_rdt: Fix exclusive mode handling of MBA resource x86/intel_rdt: Fix incorrect loop end condition x86/intel_rdt: Do not allow pseudo-locking of MBA resource x86/intel_rdt: Fix unchecked MSR access x86/intel_rdt: Fix invalid mode warning when multiple resources are managed x86/intel_rdt: Global closid helper to support future fixes x86/intel_rdt: Fix size reporting of MBA resource x86/intel_rdt: Fix data type in parsing callbacks x86/kvm: Use __bss_decrypted attribute in shared variables x86/mm: Add .bss..decrypted section to hold shared variables	2018-09-23 08:10:12 +02:00
Matthew Whitehead	2893cc8ff8	x86/CPU: Change query logic so CPUID is enabled before testing Presently we check first if CPUID is enabled. If it is not already enabled, then we next call identify_cpu_without_cpuid() and clear X86_FEATURE_CPUID. Unfortunately, identify_cpu_without_cpuid() is the function where CPUID becomes _enabled_ on Cyrix 6x86/6x86L CPUs. Reverse the calling sequence so that CPUID is first enabled, and then check a second time to see if the feature has now been activated. [ bp: Massage commit message and remove trailing whitespace. ] Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@amacapital.net> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180921212041.13096-3-tedheadster@gmail.com	2018-09-22 11:47:39 +02:00
Matthew Whitehead	03b099bdcd	x86/CPU: Use correct macros for Cyrix calls There are comments in processor-cyrix.h advising you to _not_ make calls using the deprecated macros in this style: setCx86_old(CX86_CCR4, getCx86_old(CX86_CCR4) \| 0x80); This is because it expands the macro into a non-functioning calling sequence. The calling order must be: outb(CX86_CCR2, 0x22); inb(0x23); From the comments: * When using the old macros a line like * setCx86(CX86_CCR2, getCx86(CX86_CCR2) \| 0x88); * gets expanded to: * do { * outb((CX86_CCR2), 0x22); * outb((({ * outb((CX86_CCR2), 0x22); * inb(0x23); * }) \| 0x88), 0x23); * } while (0); The new macros fix this problem, so use them instead. Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jia Zhang <qianyue.zj@alibaba-inc.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180921212041.13096-2-tedheadster@gmail.com	2018-09-22 11:46:56 +02:00
Greg Kroah-Hartman	a27fb6d983	This pull request is slightly bigger than usual at this stage, but I swear I would have sent it the same to Linus! The main cause for this is that I was on vacation until two weeks ago and it took a while to sort all the pending patches between 4.19 and 4.20, test them and so on. It's mostly small bugfixes and cleanups, mostly around x86 nested virtualization. One important change, not related to nested virtualization, is that the ability for the guest kernel to trap CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is now masked by default. This is because the feature is detected through an MSR; a very bad idea that Intel seems to like more and more. Some applications choke if the other fields of that MSR are not initialized as on real hardware, hence we have to disable the whole MSR by default, as was the case before Linux 4.12. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJbpPo1AAoJEL/70l94x66DdxgH/is0qe6ZBtzb6Qc0W+8mHHD7 nxIkWAs2V5NsouJ750YwRQ+0Ym407+wlNt30acdBUEoXhrnA5/TvyGq999XvCL96 upWEIxpIgbvTMX/e2nLhe4wQdhsboUK4r0/B9IFgVFYrdCt5uRXjB2G4ewxcqxL/ GxxqrAKhaRsbQG9Xv0Fw5Vohh/Ls6fQDJcyuY1EBnbMpVenq2QDLI6cOAPXncyFb uLN6ov4GNCWIPckwxejri5XhZesUOsafrmn48sApShh4T6TrisrdtSYdzl+DGza+ j5vhIEwdFO5kulZ3viuhqKJOnS2+F6wvfZ75IKT0tEKeU2bi+ifGDyGRefSF6Q0= =YXLw -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Paolo writes: "It's mostly small bugfixes and cleanups, mostly around x86 nested virtualization. One important change, not related to nested virtualization, is that the ability for the guest kernel to trap CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is now masked by default. This is because the feature is detected through an MSR; a very bad idea that Intel seems to like more and more. Some applications choke if the other fields of that MSR are not initialized as on real hardware, hence we have to disable the whole MSR by default, as was the case before Linux 4.12." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits) KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs kvm: selftests: Add platform_info_test KVM: x86: Control guest reads of MSR_PLATFORM_INFO KVM: x86: Turbo bits in MSR_PLATFORM_INFO nVMX x86: Check VPID value on vmentry of L2 guests nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2 KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv KVM: VMX: check nested state and CR4.VMXE against SMM kvm: x86: make kvm_{load\|put}_guest_fpu() static x86/hyper-v: rename ipi_arg_{ex,non_ex} structures KVM: VMX: use preemption timer to force immediate VMExit KVM: VMX: modify preemption timer bit only when arming timer KVM: VMX: immediately mark preemption timer expired only for zero value KVM: SVM: Switch to bitmap_zalloc() KVM/MMU: Fix comment in walk_shadow_page_lockless_end() kvm: selftests: use -pthread instead of -lpthread KVM: x86: don't reset root in kvm_mmu_setup() kvm: mmu: Don't read PDPTEs when paging is not enabled x86/kvm/lapic: always disable MMIO interface in x2APIC mode KVM: s390: Make huge pages unavailable in ucontrol VMs ...	2018-09-21 16:21:42 +02:00
Eric W. Biederman	0a996c1a3f	signal/x86: Use force_sig_fault where appropriate Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 15:30:54 +02:00
Eric W. Biederman	419ceeb128	signal/x86: Pass pkey by value Now that si_code == SEGV_PKUERR is the flag indicating that a pkey is present there is no longer a need to pass a pointer to a local pkey value, instead pkey can be passed more efficiently by value. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 15:29:57 +02:00
Eric W. Biederman	b4fd52f25c	signal/x86: Replace force_sig_info_fault with force_sig_fault Now that the pkey handling has been removed force_sig_info_fault and force_sig_fault perform identical work. Just the type of the address paramter is different. So replace calls to force_sig_info_fault with calls to force_sig_fault, and remove force_sig_info_fault. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 15:26:24 +02:00
Eric W. Biederman	9db812dbb2	signal/x86: Call force_sig_pkuerr from __bad_area_nosemaphore There is only one code path that can generate a pkuerr signal. That code path calls __bad_area_nosemaphore and can be dectected by testing if si_code == SEGV_PKUERR. It can be seen from inspection that all of the other tests in fill_sig_info_pkey are unnecessary. Therefore call force_sig_pkuerr directly from __bad_area_semaphore and remove fill_sig_info_pkey. At the same time move the comment above force_sig_info_pkey into bad_area_access_error, so that the documentation about pkey generation races is not lost. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 15:03:05 +02:00
Eric W. Biederman	aba1ecd32c	signal/x86: Pass pkey not vma into __bad_area There is only one caller of __bad_area that passes in PKUERR and thus will generate a siginfo with si_pkey set. Therefore simplify the logic and hoist reading of vma_pkey up into that caller, and just pass *pkey into __bad_area. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 14:54:17 +02:00
Eric W. Biederman	988bbc7b1a	signal/x86: Don't compute pkey in __do_page_fault There are no more users of the computed pkey value in __do_page_fault so stop computing the value. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 14:52:12 +02:00
Eric W. Biederman	25c102d803	signal/x86: Remove pkey parameter from mm_fault_error After the previous cleanups to do_sigbus and and bad_area_nosemaphore mm_fault_error no now longer uses it's pkey parameter. Therefore remove the unused parameter. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 14:51:42 +02:00
Eric W. Biederman	27274f731c	signal/x86: Remove the pkey parameter from do_sigbus The function do_sigbus never sets si_code to PKUERR so it can never return a pkey to userspace. Therefore remove the unusable pkey parameter from do_sigbus. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 14:48:44 +02:00
Eric W. Biederman	768fd9c69b	signal/x86: Remove pkey parameter from bad_area_nosemaphore The function bad_area_nosemaphore always sets si_code to SEGV_MAPERR and as such can never return a pkey parameter. Therefore remove the unusable pkey parameter from bad_area_nosemaphore. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 14:48:04 +02:00
Eric W. Biederman	164881b614	signal/x86/traps: Simplify trap generation Update the DO_ERROR macro to take si_code and si_addr values for a siginfo, removing the need for the fill_trap_info function. Update do_trap to also take the sicode and si_addr values for a sigininfo and modify the code to call force_sig when a sicode is not passed in and to call force_sig_fault when all of the information is present. Making this a more obvious, simpler and less error prone construction. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 14:47:49 +02:00
Eric W. Biederman	afe8448c0d	signal/x86/traps: Use force_sig instead of open coding it. The function "force_sig(sig, tsk)" is equivalent to " force_sig_info(sig, SEND_SIG_PRIV, tsk)". Using the siginfo variants can be error prone so use the simpler old fashioned force_sig variant, and with luck the force_sig_info variant can go away. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 14:47:01 +02:00
Eric W. Biederman	851ce9e697	signal/x86/traps: Use force_sig_bnderr Instead of generating the siginfo in x86 specific code use the new helper function force_sig_bnderr to separate the concerns of collecting the information and generating a proper siginfo. Making the code easier to understand and maintain. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 14:46:27 +02:00
Eric W. Biederman	79e21d6540	signal/x86/traps: Move more code into do_trap_no_signal so it can be reused The function do_trap_no_signal embodies almost all of the work of the function do_trap. The exceptions are setting of thread.error_code and thread.trap_nr in the case when the signal will be sent, and reporting which signal will be sent with show_signal. Filling in struct siginfo and then calling do_trap is problematic as filling in struct siginfo is an fiddly process that can through inattention has resulted in fields not initialized and the wrong fields being filled in. To avoid this error prone situation I am replacing force_sig_info with a set of functions that take as arguments the information needed to send a specific kind of signal. The function do_trap is called in the context of several different kinds of signals today. Having a solid do_trap_no_signal that can be reused allows call sites that send different kinds of signals to reuse all of the code in do_trap_no_signal. Modify do_trap_no_signal to have a single exit there signals where be sent (aka returning -1) to allow more of the signal sending path to be moved to from do_trap to do_trap_no_signal. Move setting thread.trap_nr and thread.error_code into do_trap_no_signal so the code does not need to be duplicated. Make the type of the string that is passed into do_trap_no_signal to const. The only user of that str is die and it already takes a const string, so this just makes it explicit that the string won't change. All of this prepares the way for using do_trap_no_signal outside of do_trap. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-21 14:45:22 +02:00
Borislav Petkov	7401a633c3	x86/mce-inject: Reset injection struct after injection Clear the MCE struct which is used for collecting the injection details after injection. Also, populate it with more details from the machine. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20180905081954.10391-1-bp@alien8.de	2018-09-21 14:28:37 +02:00
Herbert Xu	910e3ca10b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Merge crypto-2.6 to resolve caam conflict with skcipher conversion.	2018-09-21 13:22:37 +08:00
Feng Tang	05ab1d8a4b	x86/mm: Expand static page table for fixmap space We met a kernel panic when enabling earlycon, which is due to the fixmap address of earlycon is not statically setup. Currently the static fixmap setup in head_64.S only covers 2M virtual address space, while it actually could be in 4M space with different kernel configurations, e.g. when VSYSCALL emulation is disabled. So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build time check to ensure that the fixmap is covered by the initial static page tables. Fixes: `1ad83c858c` ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: kernel test robot <rong.a.chen@intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts) Cc: H Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com	2018-09-20 23:17:22 +02:00
Liran Alon	26b471c7e2	KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs The handlers of IOCTLs in kvm_arch_vcpu_ioctl() are expected to set their return value in "r" local var and break out of switch block when they encounter some error. This is because vcpu_load() is called before the switch block which have a proper cleanup of vcpu_put() afterwards. However, KVM_{GET,SET}_NESTED_STATE IOCTLs handlers just return immediately on error without performing above mentioned cleanup. Thus, change these handlers to behave as expected. Fixes: `8fcc4b5923` ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Patrick Colp <patrick.colp@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 18:54:08 +02:00
Christoph Hellwig	bc3ec75de5	dma-mapping: merge direct and noncoherent ops All the cache maintainance is already stubbed out when not enabled, but merging the two allows us to nicely handle the case where cache maintainance is required for some devices, but not others. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts	2018-09-20 09:01:15 +02:00
Drew Schmitt	6fbbde9a19	KVM: x86: Control guest reads of MSR_PLATFORM_INFO Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access to reads of MSR_PLATFORM_INFO. Disabling access to reads of this MSR gives userspace the control to "expose" this platform-dependent information to guests in a clear way. As it exists today, guests that read this MSR would get unpopulated information if userspace hadn't already set it (and prior to this patch series, only the CPUID faulting information could have been populated). This existing interface could be confusing if guests don't handle the potential for incorrect/incomplete information gracefully (e.g. zero reported for base frequency). Signed-off-by: Drew Schmitt <dasch@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:46 +02:00
Drew Schmitt	d84f1cff90	KVM: x86: Turbo bits in MSR_PLATFORM_INFO Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only the CPUID faulting bit was settable. But now any bit in MSR_PLATFORM_INFO would be settable. This can be used, for example, to convey frequency information about the platform on which the guest is running. Signed-off-by: Drew Schmitt <dasch@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:46 +02:00
Krish Sadhukhan	ba8e23db59	nVMX x86: Check VPID value on vmentry of L2 guests According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check needs to be enforced on vmentry of L2 guests: If the 'enable VPID' VM-execution control is 1, the value of the of the VPID VM-execution control field must not be 0000H. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:45 +02:00
Krish Sadhukhan	6de84e581c	nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2 According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check needs to be enforced on vmentry of L2 guests: - Bits 5:0 of the posted-interrupt descriptor address are all 0. - The posted-interrupt descriptor address does not set any bits beyond the processor's physical-address width. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:44 +02:00
Liran Alon	e6c67d8cf1	KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv In case L1 do not intercept L2 HLT or enter L2 in HLT activity-state, it is possible for a vCPU to be blocked while it is in guest-mode. According to Intel SDM 26.6.5 Interrupt-Window Exiting and Virtual-Interrupt Delivery: "These events wake the logical processor if it just entered the HLT state because of a VM entry". Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU should be waken from the HLT state and injected with the interrupt. In addition, if while the vCPU is blocked (while it is in guest-mode), it receives a nested posted-interrupt, then the vCPU should also be waken and injected with the posted interrupt. To handle these cases, this patch enhances kvm_vcpu_has_events() to also check if there is a pending interrupt in L2 virtual APICv provided by L1. That is, it evaluates if there is a pending virtual interrupt for L2 by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1 Evaluation of Pending Interrupts. Note that this also handles the case of nested posted-interrupt by the fact RVI is updated in vmx_complete_nested_posted_interrupt() which is called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() -> kvm_vcpu_running() -> vmx_check_nested_events() -> vmx_complete_nested_posted_interrupt(). Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:44 +02:00
Paolo Bonzini	5bea5123cb	KVM: VMX: check nested state and CR4.VMXE against SMM VMX cannot be enabled under SMM, check it when CR4 is set and when nested virtualization state is restored. This should fix some WARNs reported by syzkaller, mostly around alloc_shadow_vmcs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:43 +02:00
Sebastian Andrzej Siewior	822f312d47	kvm: x86: make kvm_{load\|put}_guest_fpu() static The functions kvm_load_guest_fpu() kvm_put_guest_fpu() are only used locally, make them static. This requires also that both functions are moved because they are used before their implementation. Those functions were exported (via EXPORT_SYMBOL) before commit `e5bb40251a` ("KVM: Drop kvm_{load,put}_guest_fpu() exports"). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:43 +02:00
Vitaly Kuznetsov	a1efa9b700	x86/hyper-v: rename ipi_arg_{ex,non_ex} structures These structures are going to be used from KVM code so let's make their names reflect their Hyper-V origin. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:42 +02:00
Sean Christopherson	d264ee0c2e	KVM: VMX: use preemption timer to force immediate VMExit A VMX preemption timer value of '0' is guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. Use the preemption timer (if it's supported) to trigger immediate VMExit in place of the current method of sending a self-IPI. This ensures that pending VMExit injection to L1 occurs prior to executing any instructions in the guest (regardless of nesting level). When deferring VMExit injection, KVM generates an immediate VMExit from the (possibly nested) guest by sending itself an IPI. Because hardware interrupts are blocked prior to VMEnter and are unblocked (in hardware) after VMEnter, this results in taking a VMExit(INTR) before any guest instruction is executed. But, as this approach relies on the IPI being received before VMEnter executes, it only works as intended when KVM is running as L0. Because there are no architectural guarantees regarding when IPIs are delivered, when running nested the INTR may "arrive" long after L2 is running e.g. L0 KVM doesn't force an immediate switch to L1 to deliver an INTR. For the most part, this unintended delay is not an issue since the events being injected to L1 also do not have architectural guarantees regarding their timing. The notable exception is the VMX preemption timer[1], which is architecturally guaranteed to cause a VMExit prior to executing any instructions in the guest if the timer value is '0' at VMEnter. Specifically, the delay in injecting the VMExit causes the preemption timer KVM unit test to fail when run in a nested guest. Note: this approach is viable even on CPUs with a broken preemption timer, as broken in this context only means the timer counts at the wrong rate. There are no known errata affecting timer value of '0'. [1] I/O SMIs also have guarantees on when they arrive, but I have no idea if/how those are emulated in KVM. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Use a hook for SVM instead of leaving the default in x86.c - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:42 +02:00
Sean Christopherson	f459a707ed	KVM: VMX: modify preemption timer bit only when arming timer Provide a singular location where the VMX preemption timer bit is set/cleared so that future usages of the preemption timer can ensure the VMCS bit is up-to-date without having to modify unrelated code paths. For example, the preemption timer can be used to force an immediate VMExit. Cache the status of the timer to avoid redundant VMREAD and VMWRITE, e.g. if the timer stays armed across multiple VMEnters/VMExits. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:51:41 +02:00
Sean Christopherson	4c008127e4	KVM: VMX: immediately mark preemption timer expired only for zero value A VMX preemption timer value of '0' at the time of VMEnter is architecturally guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. This architectural definition is in place to ensure that a previously expired timer is correctly recognized by the CPU as it is possible for the timer to reach zero and not trigger a VMexit due to a higher priority VMExit being signalled instead, e.g. a pending #DB that morphs into a VMExit. Whether by design or coincidence, commit `f4124500c2` ("KVM: nVMX: Fully emulate preemption timer") special cased timer values of '0' and '1' to ensure prompt delivery of the VMExit. Unlike '0', a timer value of '1' has no has no architectural guarantees regarding when it is delivered. Modify the timer emulation to trigger immediate VMExit if and only if the timer value is '0', and document precisely why '0' is special. Do this even if calibration of the virtual TSC failed, i.e. VMExit will occur immediately regardless of the frequency of the timer. Making only '0' a special case gives KVM leeway to be more aggressive in ensuring the VMExit is injected prior to executing instructions in the nested guest, and also eliminates any ambiguity as to why '1' is a special case, e.g. why wasn't the threshold for a "short timeout" set to 10, 100, 1000, etc... Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:26:46 +02:00
Andy Shevchenko	a101c9d63e	KVM: SVM: Switch to bitmap_zalloc() Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:26:45 +02:00
Tianyu Lan	9a9845867c	KVM/MMU: Fix comment in walk_shadow_page_lockless_end() kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page() This patch is to fix the commit. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:26:45 +02:00
Wei Yang	83b20b28c6	KVM: x86: don't reset root in kvm_mmu_setup() Here is the code path which shows kvm_mmu_setup() is invoked after kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path, this means the root_hpa and prev_roots are guaranteed to be invalid. And it is not necessary to reset it again. kvm_vm_ioctl_create_vcpu() kvm_arch_vcpu_create() vmx_create_vcpu() kvm_vcpu_init() kvm_arch_vcpu_init() kvm_mmu_create() kvm_arch_vcpu_setup() kvm_mmu_setup() kvm_init_mmu() This patch set reset_roots to false in kmv_mmu_setup(). Fixes: `50c28f21d0` Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:26:44 +02:00
Junaid Shahid	d35b34a9a7	kvm: mmu: Don't read PDPTEs when paging is not enabled kvm should not attempt to read guest PDPTEs when CR0.PG = 0 and CR4.PAE = 1. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:26:43 +02:00
Vitaly Kuznetsov	d176620277	x86/kvm/lapic: always disable MMIO interface in x2APIC mode When VMX is used with flexpriority disabled (because of no support or if disabled with module parameter) MMIO interface to lAPIC is still available in x2APIC mode while it shouldn't be (kvm-unit-tests): PASS: apic_disable: Local apic enabled in x2APIC mode PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set FAIL: apic_disable: *0xfee00030: 50014 The issue appears because we basically do nothing while switching to x2APIC mode when APIC access page is not used. apic_mmio_{read,write} only check if lAPIC is disabled before proceeding to actual write. When APIC access is virtualized we correctly manipulate with VMX controls in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes in x2APIC mode so there's no issue. Disabling MMIO interface seems to be easy. The question is: what do we do with these reads and writes? If we add apic_x2apic_mode() check to apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will go to userspace. When lAPIC is in kernel, Qemu uses this interface to inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This somehow works with disabled lAPIC but when we're in xAPIC mode we will get a real injected MSI from every write to lAPIC. Not good. The simplest solution seems to be to just ignore writes to the region and return ~0 for all reads when we're in x2APIC mode. This is what this patch does. However, this approach is inconsistent with what currently happens when flexpriority is enabled: we allocate APIC access page and create KVM memory region so in x2APIC modes all reads and writes go to this pre-allocated page which is, btw, the same for all vCPUs. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-09-20 00:26:43 +02:00
Boris Ostrovsky	70513d5875	xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code Otherwise we may leak kernel stack for events that sample user registers. Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org	2018-09-19 11:26:53 -04:00
Eric W. Biederman	6ace1098a6	signal/x86/traps: Factor out show_signal The code for conditionally printing unhanded signals is duplicated twice in arch/x86/kernel/traps.c. Factor it out into it's own subroutine called show_signal to make the code clearer and easier to maintain. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-19 15:54:44 +02:00
Eric W. Biederman	8d68fa0e08	signal/x86: Move mpx siginfo generation into do_bounds This separates the logic of generating the signal from the logic of gathering the information about the bounds violation. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-19 15:53:11 +02:00
Eric W. Biederman	8a35eb22c0	signal/x86: In trace_mpx_bounds_register_exception add __user annotations The value passed in to addr_referenced is of type void __user *, so update the addr_referenced parameter in trace_mpx_bounds_register_exception to match. Also update the addr_referenced paramater in TP_STRUCT__entry as it again holdes the same value. I don't know why this was missed earlier but sparse was complaining when testing test branch so fix this now. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-19 15:52:21 +02:00
Eric W. Biederman	585a8b9b48	signal/x86: Use send_sig_mceerr as apropriate This simplifies the code making it clearer what is going on. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-19 15:51:33 +02:00
Eric W. Biederman	40e5539463	signal/x86: Move MCE error reporting out of force_sig_info_fault Only the call from do_sigbus will send SIGBUS due to a memory machine check error. Consolidate all of the machine check signal generation code in do_sigbus and remove the now unnecessary fault parameter from force_sig_info_fault. Explicitly use the now constant si_code BUS_ADRERR in the call to force_sig_info_fault from do_sigbus. This makes the code in arch/x86/mm/fault.c easier to follower and simpler to maintain. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-19 15:50:08 +02:00
Eric W. Biederman	73f297aa07	signal/x86: Inline fill_sigtrap_info in it's only caller send_sigtrap The function fill_sigtrap_info now only has one caller so remove it and put it's contents in it's caller. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-19 15:45:57 +02:00
Eric W. Biederman	efc463adbc	signal: Simplify tracehook_report_syscall_exit Replace user_single_step_siginfo with user_single_step_report that allocates siginfo structure on the stack and sends it. This allows tracehook_report_syscall_exit to become a simple if statement that calls user_single_step_report or ptrace_report_syscall depending on the value of step. Update the default helper function now called user_single_step_report to explicitly set si_code to SI_USER and to set si_uid and si_pid to 0. The default helper has always been doing this (using memset) but it was far from obvious. The powerpc helper can now just call force_sig_fault. The x86 helper can now just call send_sigtrap. Unfortunately the default implementation of user_single_step_report can not use force_sig_fault as it does not use a SIGTRAP si_code. So it has to carefully setup the siginfo and use use force_sig_info. The net result is code that is easier to understand and simpler to maintain. Ref: `85ec7fd9f8` ("ptrace: introduce user_single_step_siginfo() helper") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-19 15:45:42 +02:00
Dan Carpenter	571d0563c8	x86/paravirt: Fix some warning messages The first argument to WARN_ONCE() is a condition. Fixes: `5800dc5c19` ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alok Kataria <akataria@vmware.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: virtualization@lists.linux-foundation.org Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20180919103553.GD9238@mwanda	2018-09-19 13:22:04 +02:00
Greg Kroah-Hartman	4ca719a338	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Crypto stuff from Herbert: "This push fixes a potential boot hang in ccp and an incorrect CPU capability check in aegis/morus on x86." * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2 crypto: ccp - add timeout support in the SEV command	2018-09-19 08:29:42 +02:00
Reinette Chatre	ffb2315fd2	x86/intel_rdt: Fix incorrect loop end condition In order to determine a sane default cache allocation for a new CAT/CDP resource group, all resource groups are checked to determine which cache portions are available to share. At this time all possible CLOSIDs that can be supported by the resource is checked. This is problematic if the resource supports more CLOSIDs than another CAT/CDP resource. In this case, the number of CLOSIDs that could be allocated are fewer than the number of CLOSIDs that can be supported by the resource. Limit the check of closids to that what is supported by the system based on the minimum across all resources. Fixes: `95f0b77ef` ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-10-git-send-email-fenghua.yu@intel.com	2018-09-18 23:38:07 +02:00
Reinette Chatre	939b90b20b	x86/intel_rdt: Fix exclusive mode handling of MBA resource It is possible for a resource group to consist out of MBA as well as CAT/CDP resources. The "exclusive" resource mode only applies to the CAT/CDP resources since MBA allocations cannot be specified to overlap or not. When a user requests a resource group to become "exclusive" then it can only be successful if there are CAT/CDP resources in the group and none of their CBMs associated with the group's CLOSID overlaps with any other resource group. Fix the "exclusive" mode setting by failing if there isn't any CAT/CDP resource in the group and ensuring that the CBM checking is only done on CAT/CDP resources. Fixes: `49f7b4efa` ("x86/intel_rdt: Enable setting of exclusive mode") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-9-git-send-email-fenghua.yu@intel.com	2018-09-18 23:38:07 +02:00
Reinette Chatre	f0df4e1acf	x86/intel_rdt: Fix incorrect loop end condition A loop is used to check if a CAT resource's CBM of one CLOSID overlaps with the CBM of another CLOSID of the same resource. The loop is run over all CLOSIDs supported by the resource. The problem with running the loop over all CLOSIDs supported by the resource is that its number of supported CLOSIDs may be more than the number of supported CLOSIDs on the system, which is the minimum number of CLOSIDs supported across all resources. Fix the loop to only consider the number of system supported CLOSIDs, not all that are supported by the resource. Fixes: `49f7b4efa` ("x86/intel_rdt: Enable setting of exclusive mode") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-8-git-send-email-fenghua.yu@intel.com	2018-09-18 23:38:06 +02:00
Reinette Chatre	32d736abed	x86/intel_rdt: Do not allow pseudo-locking of MBA resource A system supporting pseudo-locking may have MBA as well as CAT resources of which only the CAT resources could support cache pseudo-locking. When the schemata to be pseudo-locked is provided it should be checked that that schemata does not attempt to pseudo-lock a MBA resource. Fixes: `e0bdfe8e3` ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-7-git-send-email-fenghua.yu@intel.com	2018-09-18 23:38:06 +02:00
Reinette Chatre	70479c012b	x86/intel_rdt: Fix unchecked MSR access When a new resource group is created, it is initialized with sane defaults that currently assume the resource being initialized is a CAT resource. This code path is also followed by a MBA resource that is not allocated the same as a CAT resource and as a result we encounter the following unchecked MSR access error: unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000 000000000064) at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20) Call Trace: mba_wrmsr+0x41/0x80 update_domains+0x125/0x130 rdtgroup_mkdir+0x270/0x500 Fix the above by ensuring the initial allocation is only attempted on a CAT resource. Fixes: `95f0b77ef` ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-6-git-send-email-fenghua.yu@intel.com	2018-09-18 23:38:06 +02:00
Reinette Chatre	47d53b184a	x86/intel_rdt: Fix invalid mode warning when multiple resources are managed When multiple resources are managed by RDT, the number of CLOSIDs used is the minimum of the CLOSIDs supported by each resource. In the function rdt_bit_usage_show(), the annotated bitmask is created to depict how the CAT supporting caches are being used. During this annotated bitmask creation, each resource group is queried for its mode that is used as a label in the annotated bitmask. The maximum number of resource groups is currently assumed to be the number of CLOSIDs supported by the resource for which the information is being displayed. This is incorrect since the number of active CLOSIDs is the minimum across all resources. If information for a cache instance with more CLOSIDs than another is being generated we thus encounter a warning like: invalid mode for closid 8 WARNING: CPU: 88 PID: 1791 at [SNIP]/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c :827 rdt_bit_usage_show+0x221/0x2b0 Fix this by ensuring that only the number of supported CLOSIDs are considered. Fixes: `e651901187` ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-5-git-send-email-fenghua.yu@intel.com	2018-09-18 23:38:05 +02:00
Reinette Chatre	c793da8e4c	x86/intel_rdt: Global closid helper to support future fixes The number of CLOSIDs supported by a system is the minimum number of CLOSIDs supported by any of its resources. Care should be taken when iterating over the CLOSIDs of a resource since it may be that the number of CLOSIDs supported on the system is less than the number of CLOSIDs supported by the resource. Introduce a helper function that can be used to query the number of CLOSIDs that is supported by all resources, irrespective of how many CLOSIDs are supported by a particular resource. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-4-git-send-email-fenghua.yu@intel.com	2018-09-18 23:38:05 +02:00
Reinette Chatre	f968dc119a	x86/intel_rdt: Fix size reporting of MBA resource Chen Yu reported a divide-by-zero error when accessing the 'size' resctrl file when a MBA resource is enabled. divide error: 0000 [#1] SMP PTI CPU: 93 PID: 1929 Comm: cat Not tainted 4.19.0-rc2-debug-rdt+ #25 RIP: 0010:rdtgroup_cbm_to_size+0x7e/0xa0 Call Trace: rdtgroup_size_show+0x11a/0x1d0 seq_read+0xd8/0x3b0 Quoting Chen Yu's report: This is because for MB resource, the r->cache.cbm_len is zero, thus calculating size in rdtgroup_cbm_to_size() will trigger the exception. Fix this issue in the 'size' file by getting correct memory bandwidth value which is in MBps when MBA software controller is enabled or in percentage when MBA software controller is disabled. Fixes: `d9b48c86eb` ("x86/intel_rdt: Display resource groups' allocations in bytes") Reported-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Chen Yu <yu.c.chen@intel.com> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Xiaochen Shen" <xiaochen.shen@intel.com> Link: https://lkml.kernel.org/r/20180904174614.26682-1-yu.c.chen@intel.com Link: https://lkml.kernel.org/r/1537048707-76280-3-git-send-email-fenghua.yu@intel.com	2018-09-18 23:38:05 +02:00
Xiaochen Shen	753694a8df	x86/intel_rdt: Fix data type in parsing callbacks Each resource is associated with a parsing callback to parse the data provided from user space when writing schemata file. The 'data' parameter in the callbacks is defined as a void pointer which is error prone due to lack of type check. parse_bw() processes the 'data' parameter as a string while its caller actually passes the parameter as a pointer to struct rdt_cbm_parse_data. Thus, parse_bw() takes wrong data and causes failure of parsing MBA throttle value. To fix the issue, the 'data' parameter in all parsing callbacks is defined and handled as a pointer to struct rdt_parse_data (renamed from struct rdt_cbm_parse_data). Fixes: `7604df6e16` ("x86/intel_rdt: Support flexible data to parsing callbacks") Fixes: `9ab9aa15c3` ("x86/intel_rdt: Ensure requested schemata respects mode") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Chen Yu" <yu.c.chen@intel.com> Link: https://lkml.kernel.org/r/1537048707-76280-2-git-send-email-fenghua.yu@intel.com	2018-09-18 23:38:05 +02:00
Dou Liyang	76f99ae5b5	irq/matrix: Spread managed interrupts on allocation Linux spreads out the non managed interrupt across the possible target CPUs to avoid vector space exhaustion. Managed interrupts are treated differently, as for them the vectors are reserved (with guarantee) when the interrupt descriptors are initialized. When the interrupt is requested a real vector is assigned. The assignment logic uses the first CPU in the affinity mask for assignment. If the interrupt has more than one CPU in the affinity mask, which happens when a multi queue device has less queues than CPUs, then doing the same search as for non managed interrupts makes sense as it puts the interrupt on the least interrupt plagued CPU. For single CPU affine vectors that's obviously a NOOP. Restructre the matrix allocation code so it does the 'best CPU' search, add the sanity check for an empty affinity mask and adapt the call site in the x86 vector management code. [ tglx: Added the empty mask check to the core and improved change log ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180908175838.14450-2-dou_liyang@163.com	2018-09-18 18:27:24 +02:00
Punit Agrawal	d193631bfb	x86/PCI: Remove node-local allocation when initialising host controller Memory for host controller data structures is allocated local to the node to which the controller is associated with. This has been the behaviour since `965cd0e4a5` ("x86, PCI, ACPI: Use kmalloc_node() to optimize for performance") where the node local allocation was added without additional context. Drop the node local allocation as there is no benefit from doing so - the usage of these structures is independent from where the controller is located. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2018-09-17 16:33:24 -05:00
Linus Walleij	efdfeb079c	regulator: fixed: Convert to use GPIO descriptor only As we augmented the regulator core to accept a GPIO descriptor instead of a GPIO number, we can augment the fixed GPIO regulator to look up and pass that descriptor directly from device tree or board GPIO descriptor look up tables. Some boards just auto-enumerate their fixed regulator platform devices and I have assumed they get names like "fixed-regulator.0" but it's pretty hard to guess this. I need some testing from board maintainers to be sure. Other boards are straight forward, using just plain "fixed-regulator" (ID -1) or "fixed-regulator.1" hammering down the device ID. It seems the da9055 and da9211 has never got around to actually passing any enable gpio into its platform data (not the in-tree code anyway) so we can just decide to simply pass a descriptor instead. The fixed GPIO-controlled regulator in mach-pxa/ezx.c was confusingly named "_dummy_supply_device" while it is a very real device backed by a GPIO line. There is nothing dummy about it at all, so I renamed it with the infix _regulator_* as part of this patch set. Intel MID portions tested by Andy. Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> # Check the x86 BCM stuff Acked-by: Tony Lindgren <tony@atomide.com> # OMAP1,2,3 maintainer Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Mark Brown <broonie@kernel.org>	2018-09-17 14:32:22 -07:00
Brijesh Singh	6a1cac56f4	x86/kvm: Use __bss_decrypted attribute in shared variables The recent removal of the memblock dependency from kvmclock caused a SEV guest regression because the wall_clock and hv_clock_boot variables are no longer mapped decrypted when SEV is active. Use the __bss_decrypted attribute to put the static wall_clock and hv_clock_boot in the .bss..decrypted section so that they are mapped decrypted during boot. In the preparatory stage of CPU hotplug, the per-cpu pvclock data pointer assigns either an element of the static array or dynamically allocated memory for the pvclock data pointer. The static array are now mapped decrypted but the dynamically allocated memory is not mapped decrypted. However, when SEV is active this memory range must be mapped decrypted. Add a function which is called after the page allocator is up, and allocate memory for the pvclock data pointers for the all possible cpus. Map this memory range as decrypted when SEV is active. Fixes: `368a540e02` ("x86/kvmclock: Remove memblock dependency") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/1536932759-12905-3-git-send-email-brijesh.singh@amd.com	2018-09-15 20:48:46 +02:00
Brijesh Singh	b3f0907c71	x86/mm: Add .bss..decrypted section to hold shared variables kvmclock defines few static variables which are shared with the hypervisor during the kvmclock initialization. When SEV is active, memory is encrypted with a guest-specific key, and if the guest OS wants to share the memory region with the hypervisor then it must clear the C-bit before sharing it. Currently, we use kernel_physical_mapping_init() to split large pages before clearing the C-bit on shared pages. But it fails when called from the kvmclock initialization (mainly because the memblock allocator is not ready that early during boot). Add a __bss_decrypted section attribute which can be used when defining such shared variable. The so-defined variables will be placed in the .bss..decrypted section. This section will be mapped with C=0 early during boot. The .bss..decrypted section has a big chunk of memory that may be unused when memory encryption is not active, free it when memory encryption is not active. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Radim Krčmář<rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/1536932759-12905-2-git-send-email-brijesh.singh@amd.com	2018-09-15 20:48:45 +02:00
Linus Torvalds	27c5a778df	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingol Molnar: "Misc fixes: - EFI crash fix - Xen PV fixes - do not allow PTI on 2-level 32-bit kernels for now - documentation fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/APM: Fix build warning when PROC_FS is not enabled Revert "x86/mm/legacy: Populate the user page-table with user pgd's" x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3 x86/xen: Disable CPU0 hotplug for Xen PV x86/EISA: Don't probe EISA bus for Xen PV guests x86/doc: Fix Documentation/x86/earlyprintk.txt	2018-09-15 08:02:46 -10:00
zhong jiang	8e6b65a1b6	x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION Get rid of local @cpu variable which is unused in the !CONFIG_IA32_EMULATION case. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1536806985-24197-1-git-send-email-zhongjiang@huawei.com [ Clean up commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2018-09-15 14:57:05 +02:00
Randy Dunlap	002b87d2aa	x86/APM: Fix build warning when PROC_FS is not enabled Fix build warning in apm_32.c when CONFIG_PROC_FS is not enabled: ../arch/x86/kernel/apm_32.c:1643:12: warning: 'proc_apm_show' defined but not used [-Wunused-function] static int proc_apm_show(struct seq_file m, void v) Fixes: `3f3942aca6` ("proc: introduce proc_create_single{,_data}") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jiri Kosina <jikos@kernel.org> Link: https://lkml.kernel.org/r/be39ac12-44c2-4715-247f-4dcc3c525b8b@infradead.org	2018-09-15 10:16:25 +02:00
Joerg Roedel	61a6bd83ab	Revert "x86/mm/legacy: Populate the user page-table with user pgd's" This reverts commit `1f40a46cf4`. It turned out that this patch is not sufficient to enable PTI on 32 bit systems with legacy 2-level page-tables. In this paging mode the huge-page PTEs are in the top-level page-table directory, where also the mirroring to the user-space page-table happens. So every huge PTE exits twice, in the kernel and in the user page-table. That means that accessed/dirty bits need to be fetched from two PTEs in this mode to be safe, but this is not trivial to implement because it needs changes to generic code just for the sake of enabling PTI with 32-bit legacy paging. As all systems that need PTI should support PAE anyway, remove support for PTI when 32-bit legacy paging is used. Fixes: `7757d607c6` ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32') Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org	2018-09-14 17:08:45 +02:00
Mikulas Patocka	a788848116	crypto: aesni - don't use GFP_ATOMIC allocation if the request doesn't cross a page in gcm This patch fixes gcmaes_crypt_by_sg so that it won't use memory allocation if the data doesn't cross a page boundary. Authenticated encryption may be used by dm-crypt. If the encryption or decryption fails, it would result in I/O error and filesystem corruption. The function gcmaes_crypt_by_sg is using GFP_ATOMIC allocation that can fail anytime. This patch fixes the logic so that it won't attempt the failing allocation if the data doesn't cross a page boundary. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-14 14:08:53 +08:00
Ondrej Mosnacek	24568b47d4	crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2 It turns out OSXSAVE needs to be checked only for AVX, not for SSE. Without this patch the affected modules refuse to load on CPUs with SSE2 but without AVX support. Fixes: `877ccce7cb` ("crypto: x86/aegis,morus - Fix and simplify CPUID checks") Cc: <stable@vger.kernel.org> # 4.18 Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-14 14:08:27 +08:00
Guenter Roeck	cf40361ede	x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3 Commit `eeb89e2bb1` ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()") moved loading the fixmap in efi_call_phys_epilog() after load_cr3() since it was assumed to be more logical. Turns out this is incorrect: In efi_call_phys_prolog(), the gdt with its physical address is loaded first, and when the %cr3 is reloaded in _epilog from initial_page_table to swapper_pg_dir again the gdt is no longer mapped. This results in a triple fault if an interrupt occurs after load_cr3() and before load_fixmap_gdt(0). Calling load_fixmap_gdt(0) first restores the execution order prior to commit `eeb89e2bb1` and fixes the problem. Fixes: `eeb89e2bb1` ("x86/efi: Load fixmap GDT in efi_call_phys_epilog()") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: linux-efi@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/1536689892-21538-1-git-send-email-linux@roeck-us.net	2018-09-12 21:53:34 +02:00
Andy Lutomirski	bf904d2762	x86/pti/64: Remove the SYSCALL64 entry trampoline The SYSCALL64 trampoline has a couple of nice properties: - The usual sequence of SWAPGS followed by two GS-relative accesses to set up RSP is somewhat slow because the GS-relative accesses need to wait for SWAPGS to finish. The trampoline approach allows RIP-relative accesses to set up RSP, which avoids the stall. - The trampoline avoids any percpu access before CR3 is set up, which means that no percpu memory needs to be mapped in the user page tables. This prevents using Meltdown to read any percpu memory outside the cpu_entry_area and prevents using timing leaks to directly locate the percpu areas. The downsides of using a trampoline may outweigh the upsides, however. It adds an extra non-contiguous I$ cache line to system calls, and it forces an indirect jump to transfer control back to the normal kernel text after CR3 is set up. The latter is because x86 lacks a 64-bit direct jump instruction that could jump from the trampoline to the entry text. With retpolines enabled, the indirect jump is extremely slow. Change the code to map the percpu TSS into the user page tables to allow the non-trampoline SYSCALL64 path to work under PTI. This does not add a new direct information leak, since the TSS is readable by Meltdown from the cpu_entry_area alias regardless. It does allow a timing attack to locate the percpu area, but KASLR is more or less a lost cause against local attack on CPUs vulnerable to Meltdown regardless. As far as I'm concerned, on current hardware, KASLR is only useful to mitigate remote attacks that try to attack the kernel without first gaining RCE against a vulnerable user process. On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall overhead from ~237ns to ~228ns. There is a possible alternative approach: Move the trampoline within 2G of the entry text and make a separate copy for each CPU. This would allow a direct jump to rejoin the normal entry path. There are pro's and con's for this approach: + It avoids a pipeline stall - It executes from an extra page and read from another extra page during the syscall. The latter is because it needs to use a relative addressing mode to find sp1 -- it's the same cacheline, but accessed using an alias, so it's an extra TLB entry. - Slightly more memory. This would be one page per CPU for a simple implementation and 64-ish bytes per CPU or one page per node for a more complex implementation. - More code complexity. The current approach is chosen for simplicity and because the alternative does not provide a significant benefit, which makes it worth. [ tglx: Added the alternative discussion to the changelog ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org	2018-09-12 21:33:53 +02:00
Zubin Mithra	49e73246cb	perf/x86/intel/pt: Annotate 'pt_cap_group' with __ro_after_init 'pt_cap_group' is written to in pt_pmu_hw_init() and not modified after. This makes it a suitable candidate for annotating as __ro_after_init. Signed-off-by: Zubin Mithra <zsm@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: keescook@chromium.org Link: http://lkml.kernel.org/r/20180912164510.23444-1-zsm@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-12 21:16:16 +02:00
Juergen Gross	999696752d	x86/xen: Disable CPU0 hotplug for Xen PV Xen PV guests don't allow CPU0 hotplug, so disable it. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20180912174122.24282-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-12 21:15:02 +02:00
Boris Ostrovsky	6a92b11169	x86/EISA: Don't probe EISA bus for Xen PV guests For unprivileged Xen PV guests this is normal memory and ioremap will not be able to properly map it. While at it, since ioremap may return NULL, add a test for pointer's validity. Reported-by: Andy Smith <andy@strugglers.net> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: xen-devel@lists.xenproject.org Cc: jgross@suse.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180911195538.23289-1-boris.ostrovsky@oracle.com	2018-09-11 23:36:50 +02:00
Eric W. Biederman	4a63c1ffd3	signal: Properly deliver SIGSEGV from x86 uprobes For userspace to tell the difference between an random signal and an exception, the exception must include siginfo information. Using SEND_SIG_FORCED for SIGSEGV is thus wrong, and it will result in userspace seeing si_code == SI_USER (like a random signal) instead of si_code == SI_KERNEL or a more specific si_code as all exceptions deliver. Therefore replace force_sig_info(SIGSEGV, SEND_SIG_FORCE, current) with force_sig(SIG_SEGV, current) which gets this right and is shorter and easier to type. Fixes: `791eca1010` ("uretprobes/x86: Hijack return address") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-09-11 21:18:53 +02:00
Borislav Petkov	3637897b6c	x86/paravirt: Clean up native_patch() When CONFIG_PARAVIRT_SPINLOCKS=n, it generates a warning: arch/x86/kernel/paravirt_patch_64.c: In function ‘native_patch’: arch/x86/kernel/paravirt_patch_64.c:89:1: warning: label ‘patch_site’ defined but not used [-Wunused-label] patch_site: ... but those labels can simply be removed by directly calling the respective functions there. Get rid of local variables too, while at it. Also, simplify function flow for better readability. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180911091510.GA12094@zn.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-11 12:45:14 +02:00
Mikulas Patocka	02101c45ec	x86/asm: Optimize memcpy_flushcache() I use memcpy_flushcache() in my persistent memory driver for metadata updates, there are many 8-byte and 16-byte updates and it turns out that the overhead of memcpy_flushcache causes 2% performance degradation compared to "movnti" instruction explicitly coded using inline assembler. The tests were done on a Skylake processor with persistent memory emulated using the "memmap" kernel parameter. dd was used to copy data to the dm-writecache target. This patch recognizes memcpy_flushcache calls with constant short length and turns them into inline assembler - so that I don't have to use inline assembler in the driver. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: device-mapper development <dm-devel@redhat.com> Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1808081720460.24747@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 15:17:12 +02:00
Chao Fan	44060e8a51	x86/boot/KASLR: Remove return value from handle_mem_options() It's not used by its sole user, so remove this unused functionality. Also remove a stray unused variable that GCC didn't warn about for some reason. Suggested-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Cc: Baoquan He <bhe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kirill.shutemov@linux.intel.com Link: http://lkml.kernel.org/r/20180807015705.21697-1-fanc.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 15:07:11 +02:00
Zubin Mithra	2766d2ee96	perf/x86: Add __ro_after_init annotations x86_pmu_{format,events,attr,caps}_group is written to in init_hw_perf_events and not modified after. This makes them suitable candidates for annotating as __ro_after_init. Signed-off-by: Zubin Mithra <zsm@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: groeck@chromium.org Link: http://lkml.kernel.org/r/20180810154314.96710-1-zsm@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 14:55:36 +02:00
He Zhe	b1e3a25f58	x86/corruption-check: Use pr_() instead of printk() pr_() is the preferred style. Signed-off-by: He Zhe <zhe.he@windriver.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: gregkh@linuxfoundation.org Cc: kstewart@linuxfoundation.org Cc: pombredanne@nexb.com Link: http://lkml.kernel.org/r/1534260823-87917-2-git-send-email-zhe.he@windriver.com [ Moved all console output into a single line. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 14:47:33 +02:00
He Zhe	ccde460b9a	x86/corruption-check: Fix panic in memory_corruption_check() when boot option without value is provided memory_corruption_check[{_period\|_size}]()'s handlers do not check input argument before passing it to kstrtoul() or simple_strtoull(). The argument would be a NULL pointer if each of the kernel parameters, without its value, is set in command line and thus cause the following panic. PANIC: early exception 0xe3 IP 10:ffffffff73587c22 error 0 cr2 0x0 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18-rc8+ #2 [ 0.000000] RIP: 0010:kstrtoull+0x2/0x10 ... [ 0.000000] Call Trace [ 0.000000] ? set_corruption_check+0x21/0x49 [ 0.000000] ? do_early_param+0x4d/0x82 [ 0.000000] ? parse_args+0x212/0x330 [ 0.000000] ? rdinit_setup+0x26/0x26 [ 0.000000] ? parse_early_options+0x20/0x23 [ 0.000000] ? rdinit_setup+0x26/0x26 [ 0.000000] ? parse_early_param+0x2d/0x39 [ 0.000000] ? setup_arch+0x2f7/0xbf4 [ 0.000000] ? start_kernel+0x5e/0x4c2 [ 0.000000] ? load_ucode_bsp+0x113/0x12f [ 0.000000] ? secondary_startup_64+0xa5/0xb0 This patch adds checks to prevent the panic. Signed-off-by: He Zhe <zhe.he@windriver.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: gregkh@linuxfoundation.org Cc: kstewart@linuxfoundation.org Cc: pombredanne@nexb.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1534260823-87917-1-git-send-email-zhe.he@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 14:47:32 +02:00
Jacek Tomaka	16160c1946	perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs Problem: perf did not show branch predicted/mispredicted bit in brstack. Output of perf -F brstack for profile collected Before: 0x4fdbcd/0x4fdc03/-/-/-/0 0x45f4c1/0x4fdba0/-/-/-/0 0x45f544/0x45f4bb/-/-/-/0 0x45f555/0x45f53c/-/-/-/0 0x7f66901cc24b/0x45f555/-/-/-/0 0x7f66901cc22e/0x7f66901cc23d/-/-/-/0 0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0 0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0 After: 0x4fdbcd/0x4fdc03/P/-/-/0 0x45f4c1/0x4fdba0/P/-/-/0 0x45f544/0x45f4bb/P/-/-/0 0x45f555/0x45f53c/P/-/-/0 0x7f66901cc24b/0x45f555/P/-/-/0 0x7f66901cc22e/0x7f66901cc23d/P/-/-/0 0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0 0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0 Cause: As mentioned in Software Development Manual vol 3, 17.4.8.1, IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as its format. Despite that, registers containing FROM address of the branch, do have MISPREDICT bit but because of the format indicated in IA32_PERF_CAPABILITIES[5:0], LBR did not read MISPREDICT bit. Solution: Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit. Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-09-10 10:03:01 +02:00
Linus Torvalds	9a5682765a	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Prevent multiplication result truncation on 32bit. Introduced with the early timestamp reworrk. - Ensure microcode revision storage to be consistent under all circumstances - Prevent write tearing of PTEs - Prevent confusion of user and kernel reegisters when dumping fatal signals verbosely - Make an error return value in a failure path of the vector allocation negative. Returning EINVAL might the caller assume success and causes further wreckage. - A trivial kernel doc warning fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Use WRITE_ONCE() when setting PTEs x86/apic/vector: Make error return value negative x86/process: Don't mix user/kernel regs in 64bit __show_regs() x86/tsc: Prevent result truncation on 32bit x86: Fix kernel-doc atomic.h warnings x86/microcode: Update the new microcode revision unconditionally x86/microcode: Make sure boot_cpu_data.microcode is up-to-date	2018-09-09 07:05:15 -07:00
Nadav Amit	9bc4f28af7	x86/mm: Use WRITE_ONCE() when setting PTEs When page-table entries are set, the compiler might optimize their assignment by using multiple instructions to set the PTE. This might turn into a security hazard if the user somehow manages to use the interim PTE. L1TF does not make our lives easier, making even an interim non-present PTE a security hazard. Using WRITE_ONCE() to set PTEs and friends should prevent this potential security hazard. I skimmed the differences in the binary with and without this patch. The differences are (obviously) greater when CONFIG_PARAVIRT=n as more code optimizations are possible. For better and worse, the impact on the binary with this patch is pretty small. Skimming the code did not cause anything to jump out as a security hazard, but it seems that at least move_soft_dirty_pte() caused set_pte_at() to use multiple writes. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com	2018-09-08 12:30:36 +02:00
Thomas Gleixner	47b7360ce5	x86/apic/vector: Make error return value negative activate_managed() returns EINVAL instead of -EINVAL in case of error. While this is unlikely to happen, the positive return value would cause further malfunction at the call site. Fixes: `2db1f959d9` ("x86/vector: Handle managed interrupts proper") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org	2018-09-08 12:12:40 +02:00
Andy Lutomirski	98f05b5138	x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space In the non-trampoline SYSCALL64 path, a percpu variable is used to temporarily store the user RSP value. Instead of a separate variable, use the otherwise unused sp2 slot in the TSS. This will improve cache locality, as the sp1 slot is already used in the same code to find the kernel stack. It will also simplify a future change to make the non-trampoline path work in PTI mode. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/08e769a0023dbad4bac6f34f3631dbaf8ad59f4f.1536015544.git.luto@kernel.org	2018-09-08 11:20:11 +02:00
Andy Lutomirski	bd7b1f7cbf	x86/entry/64: Document idtentry The idtentry macro is complicated and magical. Document what it does to help future readers and to allow future patches to adjust the code and docs at the same time. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/6e56c3ad94879e41afe345750bc28ccc0e820ea8.1536015544.git.luto@kernel.org	2018-09-08 11:20:11 +02:00
Wanpeng Li	bdf7ffc899	KVM: LAPIC: Fix pv ipis out-of-bounds access Dan Carpenter reported that the untrusted data returns from kvm_register_read() results in the following static checker warning: arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi() error: buffer underflow 'map->phys_map' 's32min-s32max' KVM guest can easily trigger this by executing the following assembly sequence in Ring0: mov $10, %rax mov $0xFFFFFFFF, %rbx mov $0xFFFFFFFF, %rdx mov $0, %rsi vmcall As this will cause KVM to execute the following code-path: vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi() which will reach out-of-bounds access. This patch fixes it by adding a check to kvm_pv_send_ipi() against map->max_apic_id, ignoring destinations that are not present and delivering the rest. We also check whether or not map->phys_map[min + i] is NULL since the max_apic_id is set to the max apic id, some phys_map maybe NULL when apic id is sparse, especially kvm unconditionally set max_apic_id to 255 to reserve enough space for any xAPIC ID. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> [Add second "if (min > map->max_apic_id)" to complete the fix. -Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-09-07 18:38:43 +02:00
Liran Alon	b5861e5cf2	KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2 Consider the case L1 had a IRQ/NMI event until it executed VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME, L0 needs to evaluate if this pending event should cause an exit from L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept EXTERNAL_INTERRUPT). Usually this would be handled by L0 requesting a IRQ/NMI window by setting VMCS accordingly. However, this setting was done on VMCS01 and now VMCS02 is active instead. Thus, when L1 executes VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by requesting a KVM_REQ_EVENT. Note that above scenario exists when L1 KVM is about to enter L2 but requests an "immediate-exit". As in this case, L1 will disable-interrupts and then send a self-IPI before entering L2. Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-09-07 18:38:42 +02:00
Radim Krčmář	564ad0aa85	Fixes for KVM/ARM for Linux v4.19 v2: - Fix a VFP corruption in 32-bit guest - Add missing cache invalidation for CoW pages - Two small cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJbkngmAAoJEEtpOizt6ddyeaoH/15bbGHlwWf23tGjSoDzhyD4 zAXfy+SJdm4cR8K7jEkVrNffkEMAby7Zl28hTHKB9jsY1K8DD+EuCE3Nd4kkVAsc iHJwV4aiHil/zC5SyE0MqMzELeS8UhsxESYebG6yNF0ElQDQ0SG+QAFr47/OBN9S u4I7x0rhyJP6Kg8z9U4KtEX0hM6C7VVunGWu44/xZSAecTaMuJnItCIM4UMdEkSs xpAoI59lwM6BWrXLvEunekAkxEXoR7AVpQER2PDINoLK2I0i0oavhPim9Xdt2ZXs rqQqfmwmPOVvYbexDp97JtfWo3/psGLqvgoK1tq9bzF3u6Y3ylnUK5IspyVYwuQ= =TK8A -----END PGP SIGNATURE----- Merge tag 'kvm-arm-fixes-for-v4.19-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm Fixes for KVM/ARM for Linux v4.19 v2: - Fix a VFP corruption in 32-bit guest - Add missing cache invalidation for CoW pages - Two small cleanups	2018-09-07 18:38:25 +02:00
Radim Krčmář	ed2ef29100	KVM: s390: Fixes for 4.19 - Fallout from the hugetlbfs support: pfmf interpretion and locking - VSIE: fix keywrapping for nested guests -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJbj40sAAoJEBF7vIC1phx8MIYQAK6TtogzCUok4nvRJZGl34Ac HvJP2OTSNcJO8MA/DkmXk6LNVgrjgLqc4Y0MCMqaz9EzM1FVM0A5cQ4Tiiwk6dlG 395Q5SbkrmVIpmxG7dSQbrj3HlMTUCz7jtAUrDS57zaWYdKhqX+AUuW45u+TPfAo DL00wS+WJxiTWB06cr0gHpHcXyctn5hK0cYUZQokMn2a1pAjLrS4TEpvoGOcu2d6 lULY6uYWCwCnma8eieiC8ssLzB8opDPedLrewBnaZFziEZZrPybYvT8uMffNfygA tj7og1/+iqnUmyAG20Fb8oM0MMcjRWhLGHVFpv1W1ph7624oDUb3Tzd7rV8bzTMC NoqHeIv+oQyhRJCsuPTe2jUcpKc/eJzA8o3ZUdu3LeDBXxNzNOIh08iRHvyFC9iM 91/YkyYcDW2cukxqYjIwPf+y/dVHRqNAmcs9+hvu8AiNeUJPGUYsmlTBABEg0V9H gubV7m/Gl5Yx95UyrlQ4UkuvkOzmtwFYsnFKE0KnqT99bbFFf2na3CZyYBJFBVOj knSl3lS9W5LLrZ3s2VaJ/4/bPc4oGjW1ADEamQCYa4K3XQoMrnqGdL0VVuALJ2dZ RVIz2DP+P6HBCoRWD0cOA0Q+MvP5hl6TrGDdpCbza3ASSF1f/eSASvHs4P4JQPqY dWQ3uIByc3wDXuErkcT5 =kgjR -----END PGP SIGNATURE----- Merge tag 'kvm-s390-master-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: Fixes for 4.19 - Fallout from the hugetlbfs support: pfmf interpretion and locking - VSIE: fix keywrapping for nested guests	2018-09-07 18:30:47 +02:00
Marc Zyngier	a35381e10d	KVM: Remove obsolete kvm_unmap_hva notifier backend kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to deal with. Drop the now obsolete code. Fixes: `fb1522e099` ("KVM: update to new mmu_notifier semantic v2") Cc: James Hogan <jhogan@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>	2018-09-07 15:06:02 +02:00
Juergen Gross	b7a5eb6aaf	x86/paravirt: Prevent redefinition of SAVE_FLAGS macro The PARAVIRT_XXL changes introduced a redefinition of SAVE_FLAGS under certain configurations. Cure it Fixes: `6da63eb241` ("x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella"). Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180905053720.13710-1-jgross@suse.com	2018-09-06 14:37:37 +02:00
Juergen Gross	4f2d7af702	x86/xen: Make xen_reservation_lock static No users outside of that file. Fixes: `f030aade91` ("x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180905053634.13648-1-jgross@suse.com	2018-09-06 14:37:37 +02:00
Jann Horn	9fe6299dde	x86/process: Don't mix user/kernel regs in 64bit __show_regs() When the kernel.print-fatal-signals sysctl has been enabled, a simple userspace crash will cause the kernel to write a crash dump that contains, among other things, the kernel gsbase into dmesg. As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE in this case. This also moves the bitness-specific logic from show_regs() into process_{32,64}.c. Fixes: `45807a1df9` ("vdso: print fatal signals") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180831194151.123586-1-jannh@google.com	2018-09-06 14:33:12 +02:00
Chuanhua Lei	17f6bac224	x86/tsc: Prevent result truncation on 32bit Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then dividing it by HZ. Both tsc_khz and the temporary variable holding the multiplication result are of type unsigned long, so on 32bit the result is truncated to the lower 32bit. Use u64 as type for the temporary variable and cast tsc_khz to it before multiplying. [ tglx: Massaged changelog and removed pointless braces ] Fixes: `cf7a63ef4e` ("x86/tsc: Calibrate tsc only once") Signed-off-by: Chuanhua Lei <chuanhua.lei@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: yixin.zhu@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Tatashin <pasha.tatashin@microsoft.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/1536228203-18701-1-git-send-email-chuanhua.lei@linux.intel.com	2018-09-06 14:22:01 +02:00
Alexander Popov	afaef01c00	x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>	2018-09-04 10:35:47 -07:00
Linus Walleij	97feacc05d	gpio: ts5500: Delete platform data handling The TS5500 GPIO driver apparently supports platform data without making any use of it whatsoever. Delete this code, last chance to speak up if you think it is needed. Cc: kernel@savoirfairelinux.com Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Cc: Jerome Oufella <jerome.oufella@savoirfairelinux.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>	2018-09-04 08:22:47 +02:00
Ard Biesheuvel	ab8085c130	crypto: x86 - remove SHA multibuffer routines and mcryptd As it turns out, the AVX2 multibuffer SHA routines are currently broken [0], in a way that would have likely been noticed if this code were in wide use. Since the code is too complicated to be maintained by anyone except the original authors, and since the performance benefits for real-world use cases are debatable to begin with, it is better to drop it entirely for the moment. [0] https://marc.info/?l=linux-crypto-vger&m=153476243825350&w=2 Suggested-by: Eric Biggers <ebiggers@google.com> Cc: Megha Dey <megha.dey@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-04 11:37:04 +08:00
Juergen Gross	495310e4f2	x86/paravirt: Remove unneeded mmu related paravirt ops bits There is no need to have 32-bit code for CONFIG_PGTABLE_LEVELS >= 4. Remove it. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-16-jgross@suse.com	2018-09-03 16:50:37 +02:00
Juergen Gross	fdc0269e89	x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella Most of the paravirt ops defined in pv_mmu_ops are for Xen PV guests only. Define them only if CONFIG_PARAVIRT_XXL is set. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-15-jgross@suse.com	2018-09-03 16:50:37 +02:00
Juergen Gross	6da63eb241	x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella All of the paravirt ops defined in pv_irq_ops are for Xen PV guests or VSMP only. Define them only if CONFIG_PARAVIRT_XXL is set. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-14-jgross@suse.com	2018-09-03 16:50:36 +02:00
Juergen Gross	9bad5658ea	x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella Most of the paravirt ops defined in pv_cpu_ops are for Xen PV guests only. Define them only if CONFIG_PARAVIRT_XXL is set. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-13-jgross@suse.com	2018-09-03 16:50:36 +02:00
Juergen Gross	40181646db	x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella All items but name in pv_info are needed by Xen PV only. Define them with CONFIG_PARAVIRT_XXL set only. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-12-jgross@suse.com	2018-09-03 16:50:36 +02:00
Juergen Gross	c00a280a8e	x86/paravirt: Introduce new config option PARAVIRT_XXL A large amount of paravirt ops is used by Xen PV guests only. Add a new config option PARAVIRT_XXL which is selected by XEN_PV. Later we can put the Xen PV only paravirt ops under the PARAVIRT_XXL umbrella. Since irq related paravirt ops are used only by VSMP and Xen PV, let VSMP select PARAVIRT_XXL, too, in order to enable moving the irq ops under PARAVIRT_XXL. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-11-jgross@suse.com	2018-09-03 16:50:35 +02:00
Juergen Gross	5def7a4cd5	x86/paravirt: Remove unused paravirt bits The macros ENABLE_INTERRUPTS_SYSEXIT, GET_CR0_INTO_EAX and PARAVIRT_ADJUST_EXCEPTION_FRAME are used nowhere. Remove their definitions. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-10-jgross@suse.com	2018-09-03 16:50:35 +02:00
Juergen Gross	5c83511bdb	x86/paravirt: Use a single ops structure Instead of using six globally visible paravirt ops structures combine them in a single structure, keeping the original structures as sub-structures. This avoids the need to assemble struct paravirt_patch_template at runtime on the stack each time apply_paravirt() is being called (i.e. when loading a module). [ tglx: Made the struct and the initializer tabular for readability sake ] Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-9-jgross@suse.com	2018-09-03 16:50:35 +02:00
Juergen Gross	27876f3882	x86/paravirt: Remove clobbers from struct paravirt_patch_site There is no need any longer to store the clobbers in struct paravirt_patch_site. Remove clobbers from the struct and from the related macros. While at it fix some lines longer than 80 characters. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-8-jgross@suse.com	2018-09-03 16:50:34 +02:00
Juergen Gross	abc745f85c	x86/paravirt: Remove clobbers parameter from paravirt patch functions The clobbers parameter from paravirt_patch_default() et al isn't used any longer. Remove it. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-7-jgross@suse.com	2018-09-03 16:50:34 +02:00
Juergen Gross	7e43720289	x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static paravirt_patch_call() and paravirt_patch_jmp() are used in paravirt.c only. Convert them to static. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-6-jgross@suse.com	2018-09-03 16:50:34 +02:00
Juergen Gross	901d209a8b	x86/xen: Add SPDX identifier in arch/x86/xen files Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-5-jgross@suse.com	2018-09-03 16:50:33 +02:00
Juergen Gross	3013c2be60	x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM Instead of using one large #ifdef CONFIG_XEN_PVHVM in arch/x86/xen/platform-pci-unplug.c add the object file depending on CONFIG_XEN_PVHVM being set. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-4-jgross@suse.com	2018-09-03 16:50:33 +02:00
Juergen Gross	f030aade91	x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c There are some PV specific functions in arch/x86/xen/mmu.c which can be moved to mmu_pv.c. This in turn enables to build multicalls.c dependent on CONFIG_XEN_PV. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-3-jgross@suse.com	2018-09-03 16:50:33 +02:00
Juergen Gross	28c11b0f79	x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella All functions in arch/x86/xen/irq.c and arch/x86/xen/xen-asm.S are specific to PV guests. Include them in the kernel with CONFIG_XEN_PV only. Make the PV specific code in arch/x86/entry/entry_.S dependent on CONFIG_XEN_PV instead of CONFIG_XEN. The HVM specific code should depend on CONFIG_XEN_PVHVM. While at it reformat the Makefile to make it more readable. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-2-jgross@suse.com	2018-09-03 16:50:32 +02:00
Jann Horn	9da3f2b740	x86/fault: BUG() when uaccess helpers fault on kernel addresses There have been multiple kernel vulnerabilities that permitted userspace to pass completely unchecked pointers through to userspace accessors: - the waitid() bug - commit `96ca579a1e` ("waitid(): Add missing access_ok() checks") - the sg/bsg read/write APIs - the infiniband read/write APIs These don't happen all that often, but when they do happen, it is hard to test for them properly; and it is probably also hard to discover them with fuzzing. Even when an unmapped kernel address is supplied to such buggy code, it just returns -EFAULT instead of doing a proper BUG() or at least WARN(). Try to make such misbehaving code a bit more visible by refusing to do a fixup in the pagefault handler code when a userspace accessor causes a #PF on a kernel address and the current context isn't whitelisted. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: dvyukov@google.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180828201421.157735-7-jannh@google.com	2018-09-03 15:12:09 +02:00
Jann Horn	81fd9c1844	x86/fault: Plumb error code and fault address through to fault handlers This is preparation for looking at trap number and fault address in the handlers for uaccess errors. No functional change. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: linux-kernel@vger.kernel.org Cc: dvyukov@google.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180828201421.157735-6-jannh@google.com	2018-09-03 15:12:09 +02:00
Jann Horn	75045f77f7	x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups Currently, most fixups for attempting to access userspace memory are handled using _ASM_EXTABLE, which is also used for various other types of fixups (e.g. safe MSR access, IRET failures, and a bunch of other things). In order to make it possible to add special safety checks to uaccess fixups (in particular, checking whether the fault address is actually in userspace), introduce a new exception table handler ex_handler_uaccess() and wire it up to all the user access fixups (excluding ones that already use _ASM_EXTABLE_EX). Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: dvyukov@google.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180828201421.157735-5-jannh@google.com	2018-09-03 15:12:09 +02:00
Jann Horn	e3e4d5019c	x86/kprobes: Stop calling fixup_exception() from kprobe_fault_handler() This removes the call into exception fixup that was added in commit `c28f896634` ("[PATCH] kprobes: fix broken fault handling for x86_64"). On X86, kprobe_fault_handler() is called from two places: do_general_protection() (for #GP) and kprobes_fault() (for #PF). In both paths, the fixup_exception() call in the kprobe fault handler is redundant. In case of #GP, fixup_exception() is called immediately before kprobe_fault_handler() is invoked, so no need to try that again. This assumes that the kprobe's fault handler isn't going to do something crazy like changing RIP so that it suddenly points to an instruction that does userspace access. For #PF on a kernel address from kernel space, after the kprobe fault handler has run, no_context() is invoked, which calls fixup_exception(). Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kees Cook <keescook@chromium.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: linux-kernel@vger.kernel.org Cc: dvyukov@google.com Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180828201421.157735-4-jannh@google.com	2018-09-03 15:12:08 +02:00
Jann Horn	76dee4a728	x86/kprobes: Inline kprobe_exceptions_notify() into do_general_protection() The opaque plumbing of #GP from do_general_protection() through notify_die() into kprobe_exceptions_notify() makes it hard to understand what's going on. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kees Cook <keescook@chromium.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: dvyukov@google.com Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180828201421.157735-3-jannh@google.com	2018-09-03 15:12:08 +02:00
Jann Horn	a980c0ef9f	x86/kprobes: Refactor kprobes_fault() like kprobe_exceptions_notify() This is an extension of commit `b506a9d08b` ("x86: code clarification patch to Kprobes arch code"). As that commit explains, even though kprobe_running() can't be called with preemption enabled, preemption does not need to be disabled. If preemption is enabled, then this can't be originate from a kprobe. Also, use X86_TRAP_PF instead of 14. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kees Cook <keescook@chromium.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: dvyukov@google.com Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180828201421.157735-2-jannh@google.com	2018-09-03 15:12:08 +02:00
Randy Dunlap	4331f4d5ad	x86: Fix kernel-doc atomic.h warnings Fix kernel-doc warnings in arch/x86/include/asm/atomic.h that are caused by having a #define macro between the kernel-doc notation and the function name. Fixed by moving the #define macro to after the function implementation. Make the same change for atomic64_{32,64}.h for consistency even though there were no kernel-doc warnings found in these header files, but there would be if they were used in generation of documentation. Fixes these kernel-doc warnings: ../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'i' description in 'arch_atomic_sub_and_test' ../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'v' description in 'arch_atomic_sub_and_test' ../arch/x86/include/asm/atomic.h:96: warning: Excess function parameter 'v' description in 'arch_atomic_inc' ../arch/x86/include/asm/atomic.h:109: warning: Excess function parameter 'v' description in 'arch_atomic_dec' ../arch/x86/include/asm/atomic.h:124: warning: Excess function parameter 'v' description in 'arch_atomic_dec_and_test' ../arch/x86/include/asm/atomic.h:138: warning: Excess function parameter 'v' description in 'arch_atomic_inc_and_test' ../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'i' description in 'arch_atomic_add_negative' ../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'v' description in 'arch_atomic_add_negative' Fixes: `18cc1814d4` ("atomics/treewide: Make test ops optional") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/0a1e678d-c8c5-b32c-2640-ed4e94d399d2@infradead.org	2018-09-03 12:41:41 +02:00
Linus Torvalds	899ba79553	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Speculation: - Make the microcode check more robust - Make the L1TF memory limit depend on the internal cache physical address space and not on the CPUID advertised physical address space, which might be significantly smaller. This avoids disabling L1TF on machines which utilize the full physical address space. - Fix the GDT mapping for EFI calls on 32bit PTI - Fix the MCE nospec implementation to prevent #GP Fixes and robustness: - Use the proper operand order for LSL in the VDSO - Prevent NMI uaccess race against CR3 switching - Add a lockdep check to verify that text_mutex is held in text_poke() functions - Repair the fallout of giving native_restore_fl() a prototype - Prevent kernel memory dumps based on usermode RIP - Wipe KASAN shadow stack before rewinding the stack to prevent false positives - Move the AMS GOTO enforcement to the actual build stage to allow user API header extraction without a compiler - Fix a section mismatch introduced by the on demand VDSO mapping change Miscellaneous: - Trivial typo, GCC quirk removal and CC_SET/OUT() cleanups" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pti: Fix section mismatch warning/error x86/vdso: Fix lsl operand order x86/mce: Fix set_mce_nospec() to avoid #GP fault x86/efi: Load fixmap GDT in efi_call_phys_epilog() x86/nmi: Fix NMI uaccess race against CR3 switching x86: Allow generating user-space headers without a compiler x86/dumpstack: Don't dump kernel memory based on usermode RIP x86/asm: Use CC_SET()/CC_OUT() in __gen_sigismember() x86/alternatives: Lockdep-enforce text_mutex in text_poke*() x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit() x86/irqflags: Mark native_restore_fl extern inline x86/build: Remove jump label quirk for GCC older than 4.5.2 x86/Kconfig: Fix trivial typo x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+ x86/spectre: Add missing family 6 check to microcode check	2018-09-02 10:11:30 -07:00
Filippo Sironi	8da38ebaad	x86/microcode: Update the new microcode revision unconditionally Handle the case where microcode gets loaded on the BSP's hyperthread sibling first and the boot_cpu_data's microcode revision doesn't get updated because of early exit due to the siblings sharing a microcode engine. For that, simply write the updated revision on all CPUs unconditionally. Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: prarit@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1533050970-14385-1-git-send-email-sironi@amazon.de	2018-09-02 14:10:54 +02:00
Prarit Bhargava	370a132bb2	x86/microcode: Make sure boot_cpu_data.microcode is up-to-date When preparing an MCE record for logging, boot_cpu_data.microcode is used to read out the microcode revision on the box. However, on systems where late microcode update has happened, the microcode revision output in a MCE log record is wrong because boot_cpu_data.microcode is not updated when the microcode gets updated. But, the microcode revision saved in boot_cpu_data's microcode member should be kept up-to-date, regardless, for consistency. Make it so. Fixes: `fa94d0c6e0` ("x86/MCE: Save microcode revision in machine check records") Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: sironi@amazon.de Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180731112739.32338-1-prarit@redhat.com	2018-09-02 14:10:54 +02:00
Jacek Tomaka	f4661d293e	x86/microcode: Make revision and processor flags world-readable The microcode revision is already readable for non-root users via /proc/cpuinfo. Thus, there's no reason to keep the same information readable by root only in /sys/devices/system/cpu/cpuX/microcode/. Make .../processor_flags world-readable too, while at it. Reported-by: Tim Burgess <timb@dug.com> Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180825035039.14409-1-jacekt@dugeo.com	2018-09-02 14:09:13 +02:00
Randy Dunlap	ff924c5a1e	x86/pti: Fix section mismatch warning/error Fix the section mismatch warning in arch/x86/mm/pti.c: WARNING: vmlinux.o(.text+0x6972a): Section mismatch in reference from the function pti_clone_pgtable() to the function .init.text:pti_user_pagetable_walk_pte() The function pti_clone_pgtable() references the function __init pti_user_pagetable_walk_pte(). This is often because pti_clone_pgtable lacks a __init annotation or the annotation of pti_user_pagetable_walk_pte is wrong. FATAL: modpost: Section mismatches detected. Fixes: `85900ea515` ("x86/pti: Map the vsyscall page if needed") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/43a6d6a3-d69d-5eda-da09-0b1c88215a2a@infradead.org	2018-09-02 11:24:41 +02:00
Samuel Neves	e78e5a9145	x86/vdso: Fix lsl operand order In the __getcpu function, lsl is using the wrong target and destination registers. Luckily, the compiler tends to choose %eax for both variables, so it has been working so far. Fixes: `a582c540ac` ("x86/vdso: Use RDPID in preference to LSL when available") Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt	2018-09-01 23:01:56 +02:00
LuckTony	c7486104a5	x86/mce: Fix set_mce_nospec() to avoid #GP fault The trick with flipping bit 63 to avoid loading the address of the 1:1 mapping of the poisoned page while the 1:1 map is updated used to work when unmapping the page. But it falls down horribly when attempting to directly set the page as uncacheable. The problem is that when the cache mode is changed to uncachable, the pages needs to be flushed from the cache first. But the decoy address is non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a #GP fault. Add code to change_page_attr_set_clr() to fix the address before calling flush. Fixes: `284ce4011b` ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk	2018-09-01 14:59:19 +02:00
Joerg Roedel	eeb89e2bb1	x86/efi: Load fixmap GDT in efi_call_phys_epilog() When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap for the simple reason that this address is also mapped for user-space. The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT to call EFI runtime services and switch back to the kernel GDT when they return. But the switch-back uses the writable GDT, not the fixmap GDT. When that happened and and the CPU returns to user-space it switches to the user %cr3 and tries to restore user segment registers. This fails because the writable GDT is not mapped in the user page-table, and without a GDT the fault handlers also can't be launched. The result is a triple fault and reboot of the machine. Fix that by restoring the GDT back to the fixmap GDT which is also mapped in the user page-table. Fixes: `7757d607c6` x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32') Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: hpa@zytor.com Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org	2018-08-31 17:45:54 +02:00
Linus Torvalds	4290d5b9ca	xen: fixes for 4.19-rc2 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW4lM6AAKCRCAXGG7T9hj vs8AAQDysFccg97UdopW3B7yklIaRqkfEIAsxe65f191MXsH2AEAp5SKxZqRPqBP a9WHDj8ShB3BhZ/IxpdO9Y59U3Jo4wA= =Gt4c -----END PGP SIGNATURE----- Merge tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - minor cleanup avoiding a warning when building with new gcc - a patch to add a new sysfs node for Xen frontend/backend drivers to make it easier to obtain the state of a pv device - two fixes for 32-bit pv-guests to avoid intermediate L1TF vulnerable PTEs * tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: remove redundant variable save_pud xen: export device state to sysfs x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear x86/xen: don't write ptes directly in 32-bit PV guests	2018-08-31 08:45:16 -07:00
Andy Lutomirski	4012e77a90	x86/nmi: Fix NMI uaccess race against CR3 switching A NMI can hit in the middle of context switching or in the middle of switch_mm_irqs_off(). In either case, CR3 might not match current->mm, which could cause copy_from_user_nmi() and friends to read the wrong memory. Fix it by adding a new nmi_uaccess_okay() helper and checking it in copy_from_user_nmi() and in __copy_from_user_nmi()'s callers. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rik van Riel <riel@surriel.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org	2018-08-31 17:08:22 +02:00
Ben Hutchings	829fe4aa9a	x86: Allow generating user-space headers without a compiler When bootstrapping an architecture, it's usual to generate the kernel's user-space headers (make headers_install) before building a compiler. Move the compiler check (for asm goto support) to the archprepare target so that it is only done when building code for the target. Fixes: `e501ce957a` ("x86: Force asm-goto") Reported-by: Helmut Grohne <helmutg@debian.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk	2018-08-31 17:08:22 +02:00
Jann Horn	342db04ae7	x86/dumpstack: Don't dump kernel memory based on usermode RIP show_opcodes() is used both for dumping kernel instructions and for dumping user instructions. If userspace causes #PF by jumping to a kernel address, show_opcodes() can be reached with regs->ip controlled by the user, pointing to kernel code. Make sure that userspace can't trick us into dumping kernel memory into dmesg. Fixes: `7cccf0725c` ("x86/dumpstack: Add a show_ip() function") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: security@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com	2018-08-31 17:08:22 +02:00
Sean Christopherson	c60658d1d9	KVM: x86: Unexport x86_emulate_instruction() Allowing x86_emulate_instruction() to be called directly has led to subtle bugs being introduced, e.g. not setting EMULTYPE_NO_REEXECUTE in the emulation type. While most of the blame lies on re-execute being opt-out, exporting x86_emulate_instruction() also exposes its cr2 parameter, which may have contributed to commit `d391f12070` ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested") using x86_emulate_instruction() instead of emulate_instruction() because "hey, I have a cr2!", which in turn introduced its EMULTYPE_NO_REEXECUTE bug. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:44 +02:00
Sean Christopherson	0ce97a2b62	KVM: x86: Rename emulate_instruction() to kvm_emulate_instruction() Lack of the kvm_ prefix gives the impression that it's a VMX or SVM specific function, and there's no conflict that prevents adding the kvm_ prefix. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:44 +02:00
Sean Christopherson	6c3dfeb6a4	KVM: x86: Do not re-{try,execute} after failed emulation in L2 Commit `a6f177efaa` ("KVM: Reenter guest after emulation failure if due to access to non-mmio address") added reexecute_instruction() to handle the scenario where two (or more) vCPUS race to write a shadowed page, i.e. reexecute_instruction() is intended to return true if and only if the instruction being emulated was accessing a shadowed page. As L0 is only explicitly shadowing L1 tables, an emulation failure of a nested VM instruction cannot be due to a race to write a shadowed page and so should never be re-executed. This fixes an issue where an "MMIO" emulation failure[1] in L2 is all but guaranteed to result in an infinite loop when TDP is enabled. Because "cr2" is actually an L2 GPA when TDP is enabled, calling kvm_mmu_gva_to_gpa_write() to translate cr2 in the non-direct mapped case (L2 is never direct mapped) will almost always yield UNMAPPED_GVA and cause reexecute_instruction() to immediately return true. The !mmio_info_in_cache() check in kvm_mmu_page_fault() doesn't catch this case because mmio_info_in_cache() returns false for a nested MMU (the MMIO caching currently handles L1 only, e.g. to cache nested guests' GPAs we'd have to manually flush the cache when switching between VMs and when L1 updated its page tables controlling the nested guest). Way back when, commit `68be080345` ("KVM: x86: never re-execute instruction with enabled tdp") changed reexecute_instruction() to always return false when using TDP under the assumption that KVM would only get into the emulator for MMIO. Commit `95b3cf69bd` ("KVM: x86: let reexecute_instruction work for tdp") effectively reverted that behavior in order to handle the scenario where emulation failed due to an access from L1 to the shadow page tables for L2, but it didn't account for the case where emulation failed in L2 with TDP enabled. All of the above logic also applies to retry_instruction(), added by commit `1cb3f3ae5a` ("KVM: x86: retry non-page-table writing instructions"). An indefinite loop in retry_instruction() should be impossible as it protects against retrying the same instruction over and over, but it's still correct to not retry an L2 instruction in the first place. Fix the immediate issue by adding a check for a nested guest when determining whether or not to allow retry in kvm_mmu_page_fault(). In addition to fixing the immediate bug, add WARN_ON_ONCE in the retry functions since they are not designed to handle nested cases, i.e. they need to be modified even if there is some scenario in the future where we want to allow retrying a nested guest. [1] This issue was encountered after commit `3a2936dedd` ("kvm: mmu: Don't expose private memslots to L2") changed the page fault path to return KVM_PFN_NOSLOT when translating an L2 access to a prive memslot. Returning KVM_PFN_NOSLOT is semantically correct when we want to hide a memslot from L2, i.e. there effectively is no defined memory region for L2, but it has the unfortunate side effect of making KVM think the GFN is a MMIO page, thus triggering emulation. The failure occurred with in-development code that deliberately exposed a private memslot to L2, which L2 accessed with an instruction that is not emulated by KVM. Fixes: `95b3cf69bd` ("KVM: x86: let reexecute_instruction work for tdp") Fixes: `1cb3f3ae5a` ("KVM: x86: retry non-page-table writing instructions") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: Xiao Guangrong <xiaoguangrong@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:44 +02:00
Sean Christopherson	472faffacd	KVM: x86: Default to not allowing emulation retry in kvm_mmu_page_fault Effectively force kvm_mmu_page_fault() to opt-in to allowing retry to make it more obvious when and why it allows emulation to be retried. Previously this approach was less convenient due to retry and re-execute behavior being controlled by separate flags that were also inverted in their implementations (opt-in versus opt-out). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:43 +02:00
Sean Christopherson	384bf2218e	KVM: x86: Merge EMULTYPE_RETRY and EMULTYPE_ALLOW_REEXECUTE retry_instruction() and reexecute_instruction() are a package deal, i.e. there is no scenario where one is allowed and the other is not. Merge their controlling emulation type flags to enforce this in code. Name the combined flag EMULTYPE_ALLOW_RETRY to make it abundantly clear that we are allowing re{try,execute} to occur, as opposed to explicitly requesting retry of a previously failed instruction. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:43 +02:00
Sean Christopherson	8065dbd1ee	KVM: x86: Invert emulation re-execute behavior to make it opt-in Re-execution of an instruction after emulation decode failure is intended to be used only when emulating shadow page accesses. Invert the flag to make allowing re-execution opt-in since that behavior is by far in the minority. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:43 +02:00
Sean Christopherson	35be0aded7	KVM: x86: SVM: Set EMULTYPE_NO_REEXECUTE for RSM emulation Re-execution after an emulation decode failure is only intended to handle a case where two or vCPUs race to write a shadowed page, i.e. we should never re-execute an instruction as part of RSM emulation. Add a new helper, kvm_emulate_instruction_from_buffer(), to support emulating from a pre-defined buffer. This eliminates the last direct call to x86_emulate_instruction() outside of kvm_mmu_page_fault(), which means x86_emulate_instruction() can be unexported in a future patch. Fixes: `7607b71744` ("KVM: SVM: install RSM intercept") Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:43 +02:00
Sean Christopherson	c4409905cd	KVM: VMX: Do not allow reexecute_instruction() when skipping MMIO instr Re-execution after an emulation decode failure is only intended to handle a case where two or vCPUs race to write a shadowed page, i.e. we should never re-execute an instruction as part of MMIO emulation. As handle_ept_misconfig() is only used for MMIO emulation, it should pass EMULTYPE_NO_REEXECUTE when using the emulator to skip an instr in the fast-MMIO case where VM_EXIT_INSTRUCTION_LEN is invalid. And because the cr2 value passed to x86_emulate_instruction() is only destined for use when retrying or reexecuting, we can simply call emulate_instruction(). Fixes: `d391f12070` ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested") Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:42 +02:00
Colin Ian King	0186ec8232	KVM: SVM: remove unused variable dst_vaddr_end Variable dst_vaddr_end is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable 'dst_vaddr_end' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:42 +02:00
Vitaly Kuznetsov	b871da4a77	KVM: nVMX: avoid redundant double assignment of nested_run_pending nested_run_pending is set 20 lines above and check_vmentry_prereqs()/ check_vmentry_postreqs() don't seem to be resetting it (the later, however, checks it). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Eduardo Valentin <eduval@amazon.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-08-30 16:20:03 +02:00
Uros Bizjak	26e609eccd	x86/asm: Use CC_SET()/CC_OUT() in __gen_sigismember() Replace open-coded set instructions with CC_SET()/CC_OUT(). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180814165951.13538-1-ubizjak@gmail.com	2018-08-30 13:02:31 +02:00
Jiri Kosina	9222f60650	x86/alternatives: Lockdep-enforce text_mutex in text_poke*() text_poke() and text_poke_bp() must be called with text_mutex held. Put proper lockdep anotation in place instead of just mentioning the requirement in a comment. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1808280853520.25787@cbobk.fhfr.pm	2018-08-30 13:02:30 +02:00
Jann Horn	f12d11c5c1	x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit() Reset the KASAN shadow state of the task stack before rewinding RSP. Without this, a kernel oops will leave parts of the stack poisoned, and code running under do_exit() can trip over such poisoned regions and cause nonsensical false-positive KASAN reports about stack-out-of-bounds bugs. This does not wipe the exception stacks; if an oops happens on an exception stack, it might result in random KASAN false-positives from other tasks afterwards. This is probably relatively uninteresting, since if the kernel oopses on an exception stack, there are most likely bigger things to worry about. It'd be more interesting if vmapped stacks and KASAN were compatible, since then handle_stack_overflow() would oops from exception stack context. Fixes: `2deb4be280` ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: kasan-dev@googlegroups.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com	2018-08-30 11:37:09 +02:00
Nick Desaulniers	1f59a4581b	x86/irqflags: Mark native_restore_fl extern inline This should have been marked extern inline in order to pick up the out of line definition in arch/x86/kernel/irqflags.S. Fixes: `208cbb3255` ("x86/irqflags: Provide a declaration for native_save_fl") Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180827214011.55428-1-ndesaulniers@google.com	2018-08-30 11:37:09 +02:00
Masahiro Yamada	36bf9da291	x86/build: Remove jump label quirk for GCC older than 4.5.2 Commit `cafa0010cd` ("Raise the minimum required gcc version to 4.6") bumped the minimum GCC version to 4.6 for all architectures. Remove the workaround code. It was the only user of cc-if-fullversion. Remove the macro as well. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/1535348714-25457-1-git-send-email-yamada.masahiro@socionext.com	2018-08-30 11:37:08 +02:00
Linus Torvalds	b4df50de6a	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - Check for the right CPU feature bit in sm4-ce on arm64. - Fix scatterwalk WARN_ON in aes-gcm-ce on arm64. - Fix unaligned fault in aesni on x86. - Fix potential NULL pointer dereference on exit in chtls. - Fix DMA mapping direction for RSA in caam. - Fix error path return value for xts setkey in caam. - Fix address endianness when DMA unmapping in caam. - Fix sleep-in-atomic in vmx. - Fix command corruption when queue is full in cavium/nitrox. * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: cavium/nitrox - fix for command corruption in queue full case with backlog submissions. crypto: vmx - Fix sleep-in-atomic bugs crypto: arm64/aes-gcm-ce - fix scatterwalk API violation crypto: aesni - Use unaligned loads from gcm_context_data crypto: chtls - fix null dereference chtls_free_uld() crypto: arm64/sm4-ce - check for the right CPU feature bit crypto: caam - fix DMA mapping direction for RSA forms 2 & 3 crypto: caam/qi - fix error path in xts setkey crypto: caam/jr - fix descriptor DMA unmapping	2018-08-29 13:38:39 -07:00
Arnd Bergmann	4faea239e5	y2038: utimes: Rework #ifdef guards for compat syscalls After changing over to 64-bit time_t syscalls, many architectures will want compat_sys_utimensat() but not respective handlers for utime(), utimes() and futimesat(). This adds a new __ARCH_WANT_SYS_UTIME32 to complement __ARCH_WANT_SYS_UTIME. For now, all 64-bit architectures that support CONFIG_COMPAT set it, but future 64-bit architectures will not (tile would not have needed it either, but got removed). As older 32-bit architectures get converted to using CONFIG_64BIT_TIME, they will have to use __ARCH_WANT_SYS_UTIME32 instead of __ARCH_WANT_SYS_UTIME. Architectures using the generic syscall ABI don't need either of them as they never had a utime syscall. Since the compat_utimbuf structure is now required outside of CONFIG_COMPAT, I'm moving it into compat_time.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- changed from last version: - renamed __ARCH_WANT_COMPAT_SYS_UTIME to __ARCH_WANT_SYS_UTIME32	2018-08-29 15:42:23 +02:00

... 4 5 6 7 8 ...

31043 Commits