linux

Author	SHA1	Message	Date
Hayato Ohhashi	898ec52d2b	x86/xen/time: Set the X86_FEATURE_TSC_KNOWN_FREQ flag in xen_tsc_khz() If the TSC frequency is known from the pvclock page, the TSC frequency does not need to be recalibrated. We can avoid recalibration by setting X86_FEATURE_TSC_KNOWN_FREQ. Signed-off-by: Hayato Ohhashi <o8@vmm.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20200721161231.6019-1-o8@vmm.dev	2020-07-25 15:49:41 +02:00
Fenghua Yu	3aae57f0c3	x86/split_lock: Enable the split lock feature on Sapphire Rapids and Alder Lake CPUs Add Sapphire Rapids and Alder Lake processors to CPU list to enumerate and enable the split lock feature. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/1595634320-79689-1-git-send-email-fenghua.yu@intel.com	2020-07-25 12:17:00 +02:00
Tony Luck	e00b62f0b0	x86/cpu: Add Lakefield, Alder Lake and Rocket Lake models to the to Intel CPU family Add three new Intel CPU models. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200721043749.31567-1-tony.luck@intel.com	2020-07-25 12:16:59 +02:00
Ingo Molnar	fb4405ae6e	Linux 5.8-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl8UzA4eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGQ7cH/3v+Gv+SmHJCvaT2 CSu0+7okVnYbY3UTb3hykk7/aOqb6284KjxR03r0CWFzsEsZVhC5pvvruASSiMQg Pi04sLqv6CsGLHd1n+pl4AUYEaxq6k4KS3uU3HHSWxrahDDApQoRUx2F8lpOxyj8 RiwnoO60IMPA7IFJqzcZuFqsgdxqiiYvnzT461KX8Mrw6fyMXeR2KAj2NwMX8dZN At21Sf8+LSoh6q2HnugfiUd/jR10XbfxIIx2lXgIinb15GXgWydEQVrDJ7cUV7ix Jd0S+dtOtp+lWtFHDoyjjqqsMV7+G8i/rFNZoxSkyZqsUTaKzaR6JD3moSyoYZgG 0+eXO4A= =9EpR -----END PGP SIGNATURE----- Merge tag 'v5.8-rc6' into x86/cpu, to refresh the branch before adding new commits Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-07-25 12:16:16 +02:00
Ingo Molnar	1d0e12fd3a	x86/defconfigs: Refresh defconfig files Perform a 'make savedefconfig' pass over our main defconfig files, which keeps the defconfig result the same, but compresses the file where defaults were changed, options removed or reordered. Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200724130638.645844-2-mingo@kernel.org	2020-07-25 12:02:14 +02:00
Ingo Molnar	4b8e0328e5	x86/mm: Remove the unused mk_kernel_pgd() #define AFAICS the last uses of directly 'making' kernel PGDs was removed 7 years ago: `8b78c21d72`: ("x86, 64bit, mm: hibernate use generic mapping_init") Where the explicit PGD walking loop was replaced with kernel_ident_mapping_init() calls. This was then (unnecessarily) carried over through the 5-level paging conversion. Also clean up the 'level' comments a bit, to convey the original, meanwhile somewhat bit-rotten notion, that these are empty comment blocks with no methods to handle any of the levels except the PTE level. Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200724114418.629021-4-mingo@kernel.org	2020-07-25 12:00:57 +02:00
Ingo Molnar	161449bad5	x86/tsc: Remove unused "US_SCALE" and "NS_SCALE" leftover macros Last use of them was removed 13 years ago, when the code was converted to use CYC2NS_SCALE_FACTOR: `53d517cdba`: ("x86: scale cyc_2_nsec according to CPU frequency") The current TSC code uses the 'struct cyc2ns_data' scaling abstraction, the old fixed scaling approach is long gone. This cleanup also removes the 'arbitralrily' typo from the comment, so win-win. ;-) Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200724114418.629021-3-mingo@kernel.org	2020-07-25 12:00:57 +02:00
Ingo Molnar	8cd591aeb1	x86/ioapic: Remove unused "IOAPIC_AUTO" define Last use was removed more than 5 years ago, in: `5ad274d41c`: ("x86/irq: Remove unused old IOAPIC irqdomain interfaces") Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200724114418.629021-2-mingo@kernel.org	2020-07-25 12:00:56 +02:00
Arvind Sankar	587af649bc	x86/build: Move max-page-size option to LDFLAGS_vmlinux This option is only required for vmlinux on 64-bit, to enforce 2MiB alignment, so set it in LDFLAGS_vmlinux instead of KBUILD_LDFLAGS. Also drop the ld-option check: this option was added in binutils-2.18 and all the other places that use it already don't have the check. This reduces the size of the intermediate ELF files arch/x86/boot/setup.elf and arch/x86/realmode/rm/realmode.elf by about 2MiB each. The binary versions are unchanged. Move the LDFLAGS settings to all be together and just after CFLAGS settings are done. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Link: https://lore.kernel.org/r/20200722184334.3785418-1-nivedita@alum.mit.edu	2020-07-24 16:39:27 +02:00
Thomas Gleixner	72c3c0fe54	x86/kvm: Use generic xfer to guest work function Use the generic infrastructure to check for and handle pending work before transitioning into guest mode. This now handles TIF_NOTIFY_RESUME as well which was ignored so far. Handling it is important as this covers task work and task work will be used to offload the heavy lifting of POSIX CPU timers to thread context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200722220520.979724969@linutronix.de	2020-07-24 15:05:01 +02:00
Thomas Gleixner	a27a0a5549	x86/entry: Cleanup idtentry_enter/exit Remove the temporary defines and fixup all references. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.855839271@linutronix.de	2020-07-24 15:05:01 +02:00
Thomas Gleixner	bdcd178ada	x86/entry: Use generic interrupt entry/exit code Replace the x86 code with the generic variant. Use temporary defines for idtentry_* which will be cleaned up in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200722220520.711492752@linutronix.de	2020-07-24 15:05:00 +02:00
Thomas Gleixner	517e499227	x86/entry: Cleanup idtentry_entry/exit_user Cleanup the temporary defines and use irqentry_ instead of idtentry_. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.602603691@linutronix.de	2020-07-24 15:05:00 +02:00
Thomas Gleixner	167fd210ec	x86/entry: Use generic syscall exit functionality Replace the x86 variant with the generic version. Provide the relevant architecture specific helper functions and defines. Use a temporary define for idtentry_exit_user which will be cleaned up seperately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.494648601@linutronix.de	2020-07-24 15:04:59 +02:00
Thomas Gleixner	27d6b4d14f	x86/entry: Use generic syscall entry function Replace the syscall entry work handling with the generic version. Provide the necessary helper inlines to handle the real architecture specific parts, e.g. ptrace. Use a temporary define for idtentry_enter_user which will be cleaned up seperately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.376213694@linutronix.de	2020-07-24 15:04:59 +02:00
Thomas Gleixner	0bf019ea59	x86/ptrace: Provide pt_regs helper for entry/exit As a preparatory step for moving the syscall and interrupt entry/exit handling into generic code, provide a pt_regs helper which retrieves the interrupt state from pt_regs. This is required to check whether interrupts are reenabled by return from interrupt/exception. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.258511584@linutronix.de	2020-07-24 15:04:58 +02:00
Thomas Gleixner	a377ac1cd9	x86/entry: Move user return notifier out of loop Guests and user space share certain MSRs. KVM sets these MSRs to guest values once and does not set them back to user space values on every VM exit to spare the costly MSR operations. User return notifiers ensure that these MSRs are set back to the correct values before returning to user space in exit_to_usermode_loop(). There is no reason to evaluate the TIF flag indicating that user return notifiers need to be invoked in the loop. The important point is that they are invoked before returning to user space. Move the invocation out of the loop into the section which does the last preperatory steps before returning to user space. That section is not preemptible and runs with interrupts disabled until the actual return. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200722220520.159112003@linutronix.de	2020-07-24 15:04:58 +02:00
Thomas Gleixner	0b085e68f4	x86/entry: Consolidate 32/64 bit syscall entry 64bit and 32bit entry code have the same open coded syscall entry handling after the bitwidth specific bits. Move it to a helper function and share the code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200722220520.051234096@linutronix.de	2020-07-24 15:04:58 +02:00
Thomas Gleixner	8d5ea35c5e	x86/entry: Consolidate check_user_regs() The user register sanity check is sprinkled all over the place. Move it into enter_from_user_mode(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220519.943016204@linutronix.de	2020-07-24 15:04:57 +02:00
Thomas Gleixner	b35ad8405d	Merge branch 'core/entry' into x86/entry Pick up generic entry code to migrate x86 over.	2020-07-24 15:03:59 +02:00
Sedat Dilek	6526b12de0	x86/defconfigs: Remove CONFIG_CRYPTO_AES_586 from i386_defconfig Initially CONFIG_CRYPTO_AES_586=y was added to the i386_defconfig file in: `c1b362e3b4`: ("x86: update defconfigs") The code and Kconfig for CONFIG_CRYPTO_AES_586 was removed in: `1d2c327931`: ("crypto: x86/aes - drop scalar assembler implementations") Remove the leftover from the i386_defconfig file as well. Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20200723171119.9881-1-sedat.dilek@gmail.com	2020-07-24 14:55:55 +02:00
Ingo Molnar	d19e789f06	compiler.h: Move instrumentation_begin()/end() to new <linux/instrumentation.h> header Linus pointed out that compiler.h - which is a key header that gets included in every single one of the 28,000+ kernel files during a kernel build - was bloated in: `6553896666`: ("vmlinux.lds.h: Create section for protection against instrumentation") Linus noted: > I have pulled this, but do we really want to add this to a header file > that is _so_ core that it gets included for basically every single > file built? > > I don't even see those instrumentation_begin/end() things used > anywhere right now. > > It seems excessive. That 53 lines is maybe not a lot, but it pushed > that header file to over 12kB, and while it's mostly comments, it's > extra IO and parsing basically for _every_ single file compiled in the > kernel. > > For what appears to be absolutely zero upside right now, and I really > don't see why this should be in such a core header file! Move these primitives into a new header: <linux/instrumentation.h>, and include that header in the headers that make use of it. Unfortunately one of these headers is asm-generic/bug.h, which does get included in a lot of places, similarly to compiler.h. So the de-bloating effect isn't as good as we'd like it to be - but at least the interfaces are defined separately. No change to functionality intended. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200604071921.GA1361070@gmail.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org>	2020-07-24 13:56:23 +02:00
Ira Weiny	7f6fa101df	x86: Correct noinstr qualifiers The noinstr qualifier is to be specified before the return type in the same way inline is used. These 2 cases were missed by previous patches. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200723161405.852613-1-ira.weiny@intel.com	2020-07-24 09:54:15 +02:00
Arvind Sankar	0a787b28b7	x86/mm: Drop unused MAX_PHYSADDR_BITS The macro is not used anywhere, and has an incorrect value (going by the comment) on x86_64 since commit `c898faf91b` ("x86: 46 bit physical address support on 64 bits") To avoid confusion, just remove the definition. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200723231544.17274-2-nivedita@alum.mit.edu	2020-07-24 09:53:06 +02:00
Dave Airlie	41206a073c	Linux 5.8-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl8UzA4eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGQ7cH/3v+Gv+SmHJCvaT2 CSu0+7okVnYbY3UTb3hykk7/aOqb6284KjxR03r0CWFzsEsZVhC5pvvruASSiMQg Pi04sLqv6CsGLHd1n+pl4AUYEaxq6k4KS3uU3HHSWxrahDDApQoRUx2F8lpOxyj8 RiwnoO60IMPA7IFJqzcZuFqsgdxqiiYvnzT461KX8Mrw6fyMXeR2KAj2NwMX8dZN At21Sf8+LSoh6q2HnugfiUd/jR10XbfxIIx2lXgIinb15GXgWydEQVrDJ7cUV7ix Jd0S+dtOtp+lWtFHDoyjjqqsMV7+G8i/rFNZoxSkyZqsUTaKzaR6JD3moSyoYZgG 0+eXO4A= =9EpR -----END PGP SIGNATURE----- Merge v5.8-rc6 into drm-next I've got a silent conflict + two trees based on fixes to merge. Fixes a silent merge with amdgpu Signed-off-by: Dave Airlie <airlied@redhat.com>	2020-07-24 08:48:05 +10:00
Nick Desaulniers	158807de58	x86/uaccess: Make __get_user_size() Clang compliant on 32-bit Clang fails to compile __get_user_size() on 32-bit for the following code: long long val; __get_user(val, usrptr); with: error: invalid output size for constraint '=q' GCC compiles the same code without complaints. The reason is that GCC and Clang are architecturally different, which leads to subtle issues for code that's invalid but clearly dead, i.e. with code that emulates polymorphism with the preprocessor and sizeof. GCC will perform semantic analysis after early inlining and dead code elimination, so it will not warn on invalid code that's dead. Clang strictly performs optimizations after semantic analysis, so it will warn for dead code. Neither Clang nor GCC like this very much with -m32: long long ret; asm ("movb $5, %0" : "=q" (ret)); However, GCC can tolerate this variant: long long ret; switch (sizeof(ret)) { case 1: asm ("movb $5, %0" : "=q" (ret)); break; case 8:; } Clang, on the other hand, won't accept that because it validates the inline asm for the '1' case before the optimisation phase where it realises that it wouldn't have to emit it anyway. If LLVM (Clang's "back end") fails such as during instruction selection or register allocation, it cannot provide accurate diagnostics (warnings / errors) that contain line information, as the AST has been discarded from memory at that point. While there have been early discussions about having C/C++ specific language optimizations in Clang via the use of MLIR, which would enable such earlier optimizations, such work is not scoped and likely a multi-year endeavor. It was discussed to change the asm output constraint for the one byte case from "=q" to "=r". While it works for 64-bit, it fails on 32-bit. With '=r' the compiler could fail to chose a register accessible as high/low which is required for the byte operation. If that happens the assembly will fail. Use a local temporary variable of type 'unsigned char' as output for the byte copy inline asm and then assign it to the real output variable. This prevents Clang from failing the semantic analysis in the above case. The resulting code for the actual one byte copy is not affected as the temporary variable is optimized out. [ tglx: Amended changelog ] Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: David Woodhouse <dwmw2@infradead.org> Reported-by: Dmitry Golovin <dima@golovin.in> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://bugs.llvm.org/show_bug.cgi?id=33587 Link: https://github.com/ClangBuiltLinux/linux/issues/3 Link: https://github.com/ClangBuiltLinux/linux/issues/194 Link: https://github.com/ClangBuiltLinux/linux/issues/781 Link: https://lore.kernel.org/lkml/20180209161833.4605-1-dwmw2@infradead.org/ Link: https://lore.kernel.org/lkml/CAK8P3a1EBaWdbAEzirFDSgHVJMtWjuNt2HGG8z+vpXeNHwETFQ@mail.gmail.com/ Link: https://lkml.kernel.org/r/20200720204925.3654302-12-ndesaulniers@google.com	2020-07-23 12:38:31 +02:00
Brian Gerst	4719ffecbb	x86/percpu: Remove unused PER_CPU() macro Also remove now unused __percpu_mov_op. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-11-ndesaulniers@google.com	2020-07-23 11:46:43 +02:00
Brian Gerst	c94055fe93	x86/percpu: Clean up percpu_stable_op() Use __pcpu_size_call_return() to simplify this_cpu_read_stable(). Also remove __bad_percpu_size() which is now unused. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-10-ndesaulniers@google.com	2020-07-23 11:46:42 +02:00
Brian Gerst	ebcd580bed	x86/percpu: Clean up percpu_cmpxchg_op() The core percpu macros already have a switch on the data size, so the switch in the x86 code is redundant and produces more dead code. Also use appropriate types for the width of the instructions. This avoids errors when compiling with Clang. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-9-ndesaulniers@google.com	2020-07-23 11:46:42 +02:00
Brian Gerst	73ca542fba	x86/percpu: Clean up percpu_xchg_op() The core percpu macros already have a switch on the data size, so the switch in the x86 code is redundant and produces more dead code. Also use appropriate types for the width of the instructions. This avoids errors when compiling with Clang. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-8-ndesaulniers@google.com	2020-07-23 11:46:41 +02:00
Brian Gerst	bbff583b84	x86/percpu: Clean up percpu_add_return_op() The core percpu macros already have a switch on the data size, so the switch in the x86 code is redundant and produces more dead code. Also use appropriate types for the width of the instructions. This avoids errors when compiling with Clang. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-7-ndesaulniers@google.com	2020-07-23 11:46:41 +02:00
Brian Gerst	e4d16defbb	x86/percpu: Remove "e" constraint from XADD The "e" constraint represents a constant, but the XADD instruction doesn't accept immediate operands. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-6-ndesaulniers@google.com	2020-07-23 11:46:40 +02:00
Brian Gerst	33e5614a43	x86/percpu: Clean up percpu_add_op() The core percpu macros already have a switch on the data size, so the switch in the x86 code is redundant and produces more dead code. Also use appropriate types for the width of the instructions. This avoids errors when compiling with Clang. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-5-ndesaulniers@google.com	2020-07-23 11:46:40 +02:00
Brian Gerst	bb631e3002	x86/percpu: Clean up percpu_from_op() The core percpu macros already have a switch on the data size, so the switch in the x86 code is redundant and produces more dead code. Also use appropriate types for the width of the instructions. This avoids errors when compiling with Clang. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-4-ndesaulniers@google.com	2020-07-23 11:46:39 +02:00
Brian Gerst	c175acc147	x86/percpu: Clean up percpu_to_op() The core percpu macros already have a switch on the data size, so the switch in the x86 code is redundant and produces more dead code. Also use appropriate types for the width of the instructions. This avoids errors when compiling with Clang. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-3-ndesaulniers@google.com	2020-07-23 11:46:39 +02:00
Brian Gerst	6865dc3ae9	x86/percpu: Introduce size abstraction macros In preparation for cleaning up the percpu operations, define macros for abstraction based on the width of the operation. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dennis Zhou <dennis@kernel.org> Link: https://lkml.kernel.org/r/20200720204925.3654302-2-ndesaulniers@google.com	2020-07-23 11:46:39 +02:00
Uros Bizjak	ef19f826ec	crypto: x86 - Put back integer parts of include/asm/inst.h Resolves conflict with the tip tree. Fixes: `d7866e503b` ("crypto: x86 - Remove include/asm/inst.h") CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: Borislav Petkov <bp@alien8.de> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Stephen Rothwell <sfr@canb.auug.org.au>, CC: "Chang S. Bae" <chang.seok.bae@intel.com>, CC: Peter Zijlstra <peterz@infradead.org>, CC: Sasha Levin <sashal@kernel.org> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-07-23 17:34:20 +10:00
Arnd Bergmann	44623b2818	crypto: x86/crc32c - fix building with clang ias The clang integrated assembler complains about movzxw: arch/x86/crypto/crc32c-pcl-intel-asm_64.S:173:2: error: invalid instruction mnemonic 'movzxw' It seems that movzwq is the mnemonic that it expects instead, and this is what objdump prints when disassembling the file. Fixes: `6a8ce1ef39` ("crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-07-23 17:34:16 +10:00
Jon Derrick	ec0160891e	irqdomain/treewide: Free firmware node after domain removal Commit `711419e504` ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") unintentionally caused a dangling pointer page fault issue on firmware nodes that were freed after IRQ domain allocation. Commit `e3beca48a4` fixed that dangling pointer issue by only freeing the firmware node after an IRQ domain allocation failure. That fix no longer frees the firmware node immediately, but leaves the firmware node allocated after the domain is removed. The firmware node must be kept around through irq_domain_remove, but should be freed it afterwards. Add the missing free operations after domain removal where where appropriate. Fixes: `e3beca48a4` ("irqdomain/treewide: Keep firmware node unconditionally allocated") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1595363169-7157-1-git-send-email-jonathan.derrick@intel.com	2020-07-23 00:08:52 +02:00
Dmitry Safonov	ef2ff0f5d6	x86/dumpstack: Show registers dump with trace's log level show_trace_log_lvl() provides x86 platform-specific way to unwind backtrace with a given log level. Unfortunately, registers dump(s) are not printed with the same log level - instead, KERN_DEFAULT is always used. Arista's switches uses quite common setup with rsyslog, where only urgent messages goes to console (console_log_level=KERN_ERR), everything else goes into /var/log/ as the console baud-rate often is indecently slow (9600 bps). Backtrace dumps without registers printed have proven to be as useful as morning standups. Furthermore, in order to introduce KERN_UNSUPPRESSED (which I believe is still the most elegant way to fix raciness of sysrq[1]) the log level should be passed down the stack to register dumping functions. Besides, there is a potential use-case for printing traces with KERN_DEBUG level [2] (where registers dump shouldn't appear with higher log level). After all preparations are done, provide log_lvl parameter for show_regs_if_on_stack() and wire up to actual log level used as an argument for show_trace_log_lvl(). [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/ [2]: https://lore.kernel.org/linux-doc/20190724170249.9644-1-dima@arista.com/ Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Petr Mladek <pmladek@suse.com> Link: https://lkml.kernel.org/r/20200629144847.492794-4-dima@arista.com	2020-07-22 23:56:54 +02:00
Dmitry Safonov	44e215352c	x86/dumpstack: Add log_lvl to __show_regs() show_trace_log_lvl() provides x86 platform-specific way to unwind backtrace with a given log level. Unfortunately, registers dump(s) are not printed with the same log level - instead, KERN_DEFAULT is always used. Arista's switches uses quite common setup with rsyslog, where only urgent messages goes to console (console_log_level=KERN_ERR), everything else goes into /var/log/ as the console baud-rate often is indecently slow (9600 bps). Backtrace dumps without registers printed have proven to be as useful as morning standups. Furthermore, in order to introduce KERN_UNSUPPRESSED (which I believe is still the most elegant way to fix raciness of sysrq[1]) the log level should be passed down the stack to register dumping functions. Besides, there is a potential use-case for printing traces with KERN_DEBUG level [2] (where registers dump shouldn't appear with higher log level). Add log_lvl parameter to __show_regs(). Keep the used log level intact to separate visible change. [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/ [2]: https://lore.kernel.org/linux-doc/20190724170249.9644-1-dima@arista.com/ Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Petr Mladek <pmladek@suse.com> Link: https://lkml.kernel.org/r/20200629144847.492794-3-dima@arista.com	2020-07-22 23:56:53 +02:00
Dmitry Safonov	fd07f802a7	x86/dumpstack: Add log_lvl to show_iret_regs() show_trace_log_lvl() provides x86 platform-specific way to unwind backtrace with a given log level. Unfortunately, registers dump(s) are not printed with the same log level - instead, KERN_DEFAULT is always used. Arista's switches uses quite common setup with rsyslog, where only urgent messages goes to console (console_log_level=KERN_ERR), everything else goes into /var/log/ as the console baud-rate often is indecently slow (9600 bps). Backtrace dumps without registers printed have proven to be as useful as morning standups. Furthermore, in order to introduce KERN_UNSUPPRESSED (which I believe is still the most elegant way to fix raciness of sysrq[1]) the log level should be passed down the stack to register dumping functions. Besides, there is a potential use-case for printing traces with KERN_DEBUG level [2] (where registers dump shouldn't appear with higher log level). Add log_lvl parameter to show_iret_regs() as a preparation to add it to __show_regs() and show_regs_if_on_stack(). [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/ [2]: https://lore.kernel.org/linux-doc/20190724170249.9644-1-dima@arista.com/ Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Petr Mladek <pmladek@suse.com> Link: https://lkml.kernel.org/r/20200629144847.492794-2-dima@arista.com	2020-07-22 23:56:53 +02:00
Thomas Gleixner	d181d2da01	x86/dumpstack: Dump user space code correctly again H.J. reported that post 5.7 a segfault of a user space task does not longer dump the Code bytes when /proc/sys/debug/exception-trace is enabled. It prints 'Code: Bad RIP value.' instead. This was broken by a recent change which made probe_kernel_read() reject non-kernel addresses. Update show_opcodes() so it retrieves user space opcodes via copy_from_user_nmi(). Fixes: `98a23609b1` ("maccess: always use strict semantics for probe_kernel_read") Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87h7tz306w.fsf@nanos.tec.linutronix.de	2020-07-22 23:47:48 +02:00
Josh Poimboeuf	039a7a30ec	x86/stacktrace: Fix reliable check for empty user task stacks If a user task's stack is empty, or if it only has user regs, ORC reports it as a reliable empty stack. But arch_stack_walk_reliable() incorrectly treats it as unreliable. That happens because the only success path for user tasks is inside the loop, which only iterates on non-empty stacks. Generally, a user task must end in a user regs frame, but an empty stack is an exception to that rule. Thanks to commit `71c9582528` ("x86/unwind/orc: Fix error handling in __unwind_start()"), unwind_start() now sets state->error appropriately. So now for both ORC and FP unwinders, unwind_done() and !unwind_error() always means the end of the stack was successfully reached. So the success path for kthreads is no longer needed -- it can also be used for empty user tasks. Reported-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lkml.kernel.org/r/f136a4e5f019219cbc4f4da33b30c2f44fa65b84.1594994374.git.jpoimboe@redhat.com	2020-07-22 23:47:47 +02:00
Josh Poimboeuf	372a8eaa05	x86/unwind/orc: Fix ORC for newly forked tasks The ORC unwinder fails to unwind newly forked tasks which haven't yet run on the CPU. It correctly reads the 'ret_from_fork' instruction pointer from the stack, but it incorrectly interprets that value as a call stack address rather than a "signal" one, so the address gets incorrectly decremented in the call to orc_find(), resulting in bad ORC data. Fix it by forcing 'ret_from_fork' frames to be signal frames. Reported-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lkml.kernel.org/r/f91a8778dde8aae7f71884b5df2b16d552040441.1594994374.git.jpoimboe@redhat.com	2020-07-22 23:47:47 +02:00
Linus Torvalds	d15be54603	media fixes for v5.8-rc7 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl8VlckACgkQCF8+vY7k 4RWg9RAAhmGk/mYYAIAy6o2G8CPgcNnLrUuhXD+SyYrM25QlkinMjHXVhWiPo0F8 DKq21vXg0qlOe7kwn0vSEQ7tClAuJhK/ERnLnDFWe36lO6QF5ydllzEg9XS+xVq2 4Ur+jdkllaS8j7sjQrEdQF17HW52Sr0WB8KegoeUXfeKcGaIBmB+kfNstK2fWMa+ yuVITK/YNpr53zwwsHlMvmIndui0DJHKGGWuTknZsoQfWJtO62YW50FmkizjqXIt +4DCRiq9Juc/MPYotP2pArwBFiqSl/uP0MeWqn4YzPXvhhhoGPTAhskBWDmZD8jT XI9Q9Uj5x5kCtTLx4CI3mG2X2XrZ+GqFKZhVD0WKnZN2skdHRn/3qn6tHm7O/Rtr s27MVKae5zSQs1XrCXwlEz03T5wZ1y1FCX3dlXGkQqbOQLQGbZTsE8wybPElzkLR 18e8xTA1Ss7tUYf7qcrEmMk242byXP8F43cStaWCwsbJgddBpV2jFFfxan3/j6q9 Kl6Y/ssRn6OrIy1UNpdCClRweLi0U53sFQqM/ZmyUbBXMbT/y+rL6h8Wel8EvFlR PzisYBI/G0AWHahbdHhS9mzQFAZb9etlAboMvI31bo4Bjpbv9UVkX5dMLxabcbEr B70EZ7nigg5EMqeBkemz0kD4W68qvQRKHK0d+TYcYSnq0VMmErI= =/5rQ -----END PGP SIGNATURE----- Merge tag 'media/v5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media into master Pull media fixes from Mauro Carvalho Chehab: "A series of fixes for the upcoming atomisp driver. They solve issues when probing atomisp on devices with multiple cameras and get rid of warnings when built with W=1. The diffstat is a bit long, as this driver has several abstractions. The patches that solved the issues with W=1 had to get rid of some duplicated code (there used to have 2 versions of the same code, one for ISP2401 and another one for ISP2400). As this driver is not in 5.7, such changes won't cause regressions" * tag 'media/v5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (38 commits) Revert "media: atomisp: keep the ISP powered on when setting it" media: atomisp: fix mask and shift operation on ISPSSPM0 media: atomisp: move system_local consts into a C file media: atomisp: get rid of version-specific system_local.h media: atomisp: move global stuff into a common header media: atomisp: remove non-used 32-bits consts at system_local media: atomisp: get rid of some unused static vars media: atomisp: Fix error code in ov5693_probe() media: atomisp: Replace trace_printk by pr_info media: atomisp: Fix __func__ style warnings media: atomisp: fix help message for ISP2401 selection media: atomisp: i2c: atomisp-ov2680.c: fixed a brace coding style issue. media: atomisp: make const arrays static, makes object smaller media: atomisp: Clean up non-existing folders from Makefile media: atomisp: Get rid of ACPI specifics in gmin_subdev_add() media: atomisp: Provide Gmin subdev as parameter to gmin_subdev_add() media: atomisp: Use temporary variable for device in gmin_subdev_add() media: atomisp: Refactor PMIC detection to a separate function media: atomisp: Deduplicate return ret in gmin_i2c_write() media: atomisp: Make pointer to PMIC client global ...	2020-07-22 11:56:00 -07:00
Hu Haowen	2ac5413e5e	x86/perf: Fix a typo The word "Zhoaxin" is incorrect and the right one is "Zhaoxin". Signed-off-by: Hu Haowen <xianfengting221@163.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200719105007.57649-1-xianfengting221@163.com	2020-07-22 10:22:08 +02:00
Peter Zijlstra	015dc08918	Merge branch 'sched/urgent'	2020-07-22 10:22:02 +02:00
Joerg Roedel	de2b41be8f	x86, vmlinux.lds: Page-align end of ..page_aligned sections On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is page-aligned, but the end of the .bss..page_aligned section is not guaranteed to be page-aligned. As a result, objects from other .bss sections may end up on the same 4k page as the idt_table, and will accidentially get mapped read-only during boot, causing unexpected page-faults when the kernel writes to them. This could be worked around by making the objects in the page aligned sections page sized, but that's wrong. Explicit sections which store only page aligned objects have an implicit guarantee that the object is alone in the page in which it is placed. That works for all objects except the last one. That's inconsistent. Enforcing page sized objects for these sections would wreckage memory sanitizers, because the object becomes artificially larger than it should be and out of bound access becomes legit. Align the end of the .bss..page_aligned and .data..page_aligned section on page-size so all objects places in these sections are guaranteed to have their own page. [ tglx: Amended changelog ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200721093448.10417-1-joro@8bytes.org	2020-07-22 09:38:37 +02:00
Eric W. Biederman	be619f7f06	exec: Implement kernel_execve To allow the kernel not to play games with set_fs to call exec implement kernel_execve. The function kernel_execve takes pointers into kernel memory and copies the values pointed to onto the new userspace stack. The calls with arguments from kernel space of do_execve are replaced with calls to kernel_execve. The calls do_execve and do_execveat are made static as there are now no callers outside of exec. The comments that mention do_execve are updated to refer to kernel_execve or execve depending on the circumstances. In addition to correcting the comments, this makes it easy to grep for do_execve and verify it is not used. Inspired-by: https://lkml.kernel.org/r/20200627072704.2447163-1-hch@lst.de Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87wo365ikj.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2020-07-21 08:24:52 -05:00
Christoph Hellwig	55db9c0e85	net: remove compat_sys_{get,set}sockopt Now that the ->compat_{get,set}sockopt proto_ops methods are gone there is no good reason left to keep the compat syscalls separate. This fixes the odd use of unsigned int for the compat_setsockopt optlen and the missing sock_use_custom_sol_socket. It would also easily allow running the eBPF hooks for the compat syscalls, but such a large change in behavior does not belong into a consolidation patch like this one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-19 18:16:40 -07:00
Kevin Buettner	5714ee50bb	copy_xstate_to_kernel: Fix typo which caused GDB regression This fixes a regression encountered while running the gdb.base/corefile.exp test in GDB's test suite. In my testing, the typo prevented the sw_reserved field of struct fxregs_state from being output to the kernel XSAVES area. Thus the correct mask corresponding to XCR0 was not present in the core file for GDB to interrogate, resulting in the following behavior: [kev@f32-1 gdb]$ ./gdb -q testsuite/outputs/gdb.base/corefile/corefile testsuite/outputs/gdb.base/corefile/corefile.core Reading symbols from testsuite/outputs/gdb.base/corefile/corefile... [New LWP 232880] warning: Unexpected size of section `.reg-xstate/232880' in core file. With the typo fixed, the test works again as expected. Signed-off-by: Kevin Buettner <kevinb@redhat.com> Fixes: `9e46365459` ("copy_xstate_to_kernel(): don't leave parts of destination uninitialized") Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Airlie <airlied@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-07-19 17:09:10 -07:00
Linus Torvalds	efb9666e90	A pile of fixes for x86: - Fix the I/O bitmap invalidation on XEN PV, which was overlooked in the recent ioperm/iopl rework. This caused the TSS and XEN's I/O bitmap to get out of sync. - Use the proper vectors for HYPERV. - Make disabling of stack protector for the entry code work with GCC builds which enable stack protector by default. Removing the option is not sufficient, it needs an explicit -fno-stack-protector to shut it off. - Mark check_user_regs() noinstr as it is called from noinstr code. The missing annotation causes it to be placed in the text section which makes it instrumentable. - Add the missing interrupt disable in exc_alignment_check() - Fixup a XEN_PV build dependency in the 32bit entry code - A few fixes to make the Clang integrated assembler happy - Move EFI stub build to the right place for out of tree builds - Make prepare_exit_to_usermode() static. It's not longer called from ASM code. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8UR+MTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoQCUD/4/9W5FFvdZvQPwmXsHPaVnW9hUsXxG 0tjc34xqDcgEl1U3khu+6jj+oHx+JM+4wGP/V49Wqx6xkrJ33/a8uYErAgI7+Pmp s3T2gXMWkgJtYFlDQdAMbeuuM2cOFZJw4BxxvTMth5EixQvk1EkX6QyBjLaSGo8y 78sWtZ6Oh5Ql9ua/9TOilewLsCsQSFIFn0o/hawwwPUMrwGvD29scha0XHom+AO7 uwejfU8klq2HJJaLaaiUaiNBkFz0TNGJtY+3mQpw8BPjCuuBQhYygrS0X4uQzo01 4XJzhDnOVbAYWqi0/T+mAEmuJ9NBZJwYiYrwBYCkZgELwJKLzhzO2GOgP9xEsFY4 VUNgqHFhKrQp10k2k4L/A5tmr+0GntiCQhdZi+/gty6oO/t3ni57pRcAhA9qBNOb 8ZqumBwgaaAIqcmdtoyXAIveWOHnzwKEg6wmIGFbyCwHjeLJKJG7KhpXIpEuX+j2 DC7EfYvRB+jllAk1CBypBvzD0DHfMZ0myPxCcZiW2wHTVAlkpY7hiIyPHqocjE9L OjOQ7FS6E2/p24lYVcLUFWcESxGFvQjjxwXk7htjpGUIZsQOhz/LOW+CIPCsfbqE HoEsHmNltksYYV9FDfevXRp5sbxpx3wQSLOgqNqiOpy4cTCG8boalUqHQ0OsN8Oa EgU067yF77ymRg== =QAeH -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master Pull x86 fixes from Thomas Gleixner: "A pile of fixes for x86: - Fix the I/O bitmap invalidation on XEN PV, which was overlooked in the recent ioperm/iopl rework. This caused the TSS and XEN's I/O bitmap to get out of sync. - Use the proper vectors for HYPERV. - Make disabling of stack protector for the entry code work with GCC builds which enable stack protector by default. Removing the option is not sufficient, it needs an explicit -fno-stack-protector to shut it off. - Mark check_user_regs() noinstr as it is called from noinstr code. The missing annotation causes it to be placed in the text section which makes it instrumentable. - Add the missing interrupt disable in exc_alignment_check() - Fixup a XEN_PV build dependency in the 32bit entry code - A few fixes to make the Clang integrated assembler happy - Move EFI stub build to the right place for out of tree builds - Make prepare_exit_to_usermode() static. It's not longer called from ASM code" * tag 'x86-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Don't add the EFI stub to targets x86/entry: Actually disable stack protector x86/ioperm: Fix io bitmap invalidation on Xen PV x86: math-emu: Fix up 'cmp' insn for clang ias x86/entry: Fix vectors to IDTENTRY_SYSVEC for CONFIG_HYPERV x86/entry: Add compatibility with IAS x86/entry/common: Make prepare_exit_to_usermode() static x86/entry: Mark check_user_regs() noinstr x86/traps: Disable interrupts in exc_aligment_check() x86/entry/32: Fix XEN_PV build dependency	2020-07-19 12:16:09 -07:00
Linus Torvalds	9413cd7792	Two fixes for the interrupt subsystem: - Make the handling of the firmware node consistent and do not free the node after the domain has been created successfully. The core code stores a pointer to it which can lead to a use after free or double free. This used to "work" because the pointer was not stored when the initial code was written, but at some point later it was required to store it. Of course nobody noticed that the existing users break that way. - Handle affinity setting on inactive interrupts correctly when hierarchical irq domains are enabled. When interrupts are inactive with the modern hierarchical irqdomain design, the interrupt chips are not necessarily in a state where affinity changes can be handled. The legacy irq chip design allowed this because interrupts are immediately fully initialized at allocation time. X86 has a hacky workaround for this, but other implementations do not. This cased malfunction on GIC-V3. Instead of playing whack a mole to find all affected drivers, change the core code to store the requested affinity setting and then establish it when the interrupt is allocated, which makes the X86 hack go away. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8UP+4THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoSuZD/9tNPR4fIDt4mC9ciSvwSqGTV+q1y1D zhXSDro4cJNjzy/9D475IJqOlvchaF9Nfun55b60Q6vnA4VN8G+kABEaG8uwr8mV ijTB4f0qKfW/9kUDTJRScq3nNmC3miqg8ZFgFEn6Ecxj3NHmwidATIi5sF6f/XVG DdhL0Jys7ycNeGBf7yIKbT5/NOULMHYy9rK1NDAeBo9u3klvmrwrHgdNsiDDhEaU KlHtwuQLCdjFY3Lf67YpSah+Hx/gXPI1VHUxDDFRoFmC4RlB0VjyXGydjsisOrSQ Cl2gnkQ6VOlLaLbN38nmia9nyb6npzE5iK1h9EDcaRhBACG9O23Bdo+YZYxl6BOP mXuyIVKJYczJEp7j1fGHW/aNCoEqC8dGVyN7toxMVfGZmF12JzMSt4SYItPeSjFC bPNPRCscpiMOQdgwgO0woK1764V46g1BlmxXtJRdWB4iwWgXcryaz65xzSfNeZF4 0+TvdYs2FYjxwwIyWj8xJ3Npe1lKhH+06DA6gziwJt1u4it8rl82UcqMFyf/ty1w o5LHyMBWYm7SJXSeaZZj+nv7moJKJnmRYKnpry21cUzsK/vQEPX0vqhwh4dSFN3O BaBocDsOk+9wkmUwi6haP+6+vpadAFQrsqVhURtwc6OVSWn2/vsf2ZH5P36xwFWD tlFanb8hX9y2NQ== =elM3 -----END PGP SIGNATURE----- Merge tag 'irq-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master Pull irq fixes from Thomas Gleixner: "Two fixes for the interrupt subsystem: - Make the handling of the firmware node consistent and do not free the node after the domain has been created successfully. The core code stores a pointer to it which can lead to a use after free or double free. This used to "work" because the pointer was not stored when the initial code was written, but at some point later it was required to store it. Of course nobody noticed that the existing users break that way. - Handle affinity setting on inactive interrupts correctly when hierarchical irq domains are enabled. When interrupts are inactive with the modern hierarchical irqdomain design, the interrupt chips are not necessarily in a state where affinity changes can be handled. The legacy irq chip design allowed this because interrupts are immediately fully initialized at allocation time. X86 has a hacky workaround for this, but other implementations do not. This cased malfunction on GIC-V3. Instead of playing whack a mole to find all affected drivers, change the core code to store the requested affinity setting and then establish it when the interrupt is allocated, which makes the X86 hack go away" * tag 'irq-urgent-2020-07-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/affinity: Handle affinity setting on inactive interrupts correctly irqdomain/treewide: Keep firmware node unconditionally allocated	2020-07-19 11:53:08 -07:00
Arvind Sankar	da05b143a3	x86/boot: Don't add the EFI stub to targets vmlinux-objs-y is added to targets, which currently means that the EFI stub gets added to the targets as well. It shouldn't be added since it is built elsewhere. This confuses Makefile.build which interprets the EFI stub as a target $(obj)/$(objtree)/drivers/firmware/efi/libstub/lib.a and will create drivers/firmware/efi/libstub/ underneath arch/x86/boot/compressed, to hold this supposed target, if building out-of-tree. [0] Fix this by pulling the stub out of vmlinux-objs-y into efi-obj-y. [0] See scripts/Makefile.build near the end: # Create directories for object files if they do not exist Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200715032631.1562882-1-nivedita@alum.mit.edu	2020-07-19 13:07:11 +02:00
Kees Cook	58ac3154b8	x86/entry: Actually disable stack protector Some builds of GCC enable stack protector by default. Simply removing the arguments is not sufficient to disable stack protector, as the stack protector for those GCC builds must be explicitly disabled. Remove the argument removals and add -fno-stack-protector. Additionally include missed x32 argument updates, and adjust whitespace for readability. Fixes: `20355e5f73` ("x86/entry: Exclude low level entry code from sanitizing") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/202006261333.585319CA6B@keescook	2020-07-19 13:07:10 +02:00
Christoph Hellwig	2f9237d4f6	dma-mapping: make support for dma ops optional Avoid the overhead of the dma ops support for tiny builds that only use the direct mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2020-07-19 09:29:23 +02:00
Andy Lutomirski	cadfad8701	x86/ioperm: Fix io bitmap invalidation on Xen PV tss_invalidate_io_bitmap() wasn't wired up properly through the pvop machinery, so the TSS and Xen's io bitmap would get out of sync whenever disabling a valid io bitmap. Add a new pvop for tss_invalidate_io_bitmap() to fix it. This is XSA-329. Fixes: `22fe5b0439` ("x86/ioperm: Move TSS bitmap update to exit to user work") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/d53075590e1f91c19f8af705059d3ff99424c020.1595030016.git.luto@kernel.org	2020-07-18 12:31:49 +02:00
Andy Shevchenko	5f55dd5499	media: atomisp: move CCK endpoint address to generic header IOSF MBI header contains a lot of definitions, such as end point addresses of IPs. Move CCK address from AtomISP driver to generic header. While here, drop unused one. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	2020-07-18 07:17:16 +02:00
Thomas Gleixner	baedb87d1b	genirq/affinity: Handle affinity setting on inactive interrupts correctly Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests. X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS. Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then: - Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on. Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations. For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design. Remove the X86 workaround as it is not longer required. Fixes: `02edee152d` ("x86/apic/vector: Ignore set_affinity call for inactive interrupts") Reported-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ali Saidi <alisaidi@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de	2020-07-17 23:30:43 +02:00
steve.wahl@hpe.com	3bcf25a40b	x86/efi: Remove unused EFI_UV1_MEMMAP code With UV1 support removed, EFI_UV1_MEMMAP is no longer used. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200713212956.019149227@hpe.com	2020-07-17 16:47:48 +02:00
steve.wahl@hpe.com	6aa3baabe1	x86/platform/uv: Remove uv bios and efi code related to EFI_UV1_MEMMAP With UV1 removed, EFI_UV1_MEMMAP is not longer used. Remove the code used by it and the related code in EFI. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200713212955.902592618@hpe.com	2020-07-17 16:47:48 +02:00
steve.wahl@hpe.com	66d67fecd8	x86/efi: Remove references to no-longer-used efi_have_uv1_memmap() In removing UV1 support, efi_have_uv1_memmap is no longer used. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200713212955.786177105@hpe.com	2020-07-17 16:47:47 +02:00
steve.wahl@hpe.com	cadde2379f	x86/efi: Delete SGI UV1 detection. As a part of UV1 platform removal, don't try to recognize the platform through DMI to set the EFI_UV1_MEMMAP bit. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200713212955.667726896@hpe.com	2020-07-17 16:47:47 +02:00
steve.wahl@hpe.com	3dad716240	x86/platform/uv: Remove efi=old_map command line option As a part of UV1 platform removal, delete the efi=old_map option, which is not longer needed. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200713212955.552098718@hpe.com	2020-07-17 16:47:46 +02:00
steve.wahl@hpe.com	5d66253751	x86/platform/uv: Remove vestigial mention of UV1 platform from bios header Remove UV1 reference as UV1 is not longer supported by HPE. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200713212955.435951508@hpe.com	2020-07-17 16:47:46 +02:00
steve.wahl@hpe.com	f584c75307	x86/platform/uv: Remove support for UV1 platform from uv UV1 is not longer supported by HPE Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200713212955.320087418@hpe.com	2020-07-17 16:47:46 +02:00
steve.wahl@hpe.com	9b9ee17241	x86/platform/uv: Remove support for uv1 platform from uv_hub UV1 is not longer supported by HPE. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200713212955.203480177@hpe.com	2020-07-17 16:47:45 +02:00
steve.wahl@hpe.com	711621a098	x86/platform/uv: Remove support for UV1 platform from uv_bau UV1 is not longer supported. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200713212955.083309377@hpe.com	2020-07-17 16:47:45 +02:00
steve.wahl@hpe.com	3736e82d3a	x86/platform/uv: Remove support for UV1 platform from uv_mmrs UV1 is not longer supported. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200713212954.964332370@hpe.com	2020-07-17 16:47:44 +02:00
steve.wahl@hpe.com	a6b740f173	x86/platform/uv: Remove support for UV1 platform from x2apic_uv_x UV1 is not longer supported. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200713212954.846026992@hpe.com	2020-07-17 16:47:44 +02:00
steve.wahl@hpe.com	95328de5fc	x86/platform/uv: Remove support for UV1 platform from uv_tlb UV1 is not longer supported. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200713212954.728022415@hpe.com	2020-07-17 16:47:44 +02:00
steve.wahl@hpe.com	8b3c9b1606	x86/platform/uv: Remove support for UV1 platform from uv_time UV1 is not longer supported Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200713212954.610885520@hpe.com	2020-07-17 16:47:43 +02:00
Kees Cook	3f649ab728	treewide: Remove uninitialized_var() usage Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' \| cut -d: -f1 \| sort -u \| \ xargs perl -pi -e \ 's/\buninitialized_var$([^$]+)\)/\1/g; s:\s/\ (GCC be quiet\|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>	2020-07-16 12:35:15 -07:00
Kees Cook	aecfd220b2	x86/mm/numa: Remove uninitialized_var() usage Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. As a precursor to removing[2] this[3] macro[4], refactor code to avoid its need. The original reason for its use here was to work around the #ifdef being the only place the variable was used. This is better expressed using IS_ENABLED() and a new code block where the variable can be used unconditionally. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Fixes: `1e01979c8f` ("x86, numa: Implement pfn -> nid mapping granularity check") Signed-off-by: Kees Cook <keescook@chromium.org>	2020-07-16 12:32:25 -07:00
Arnd Bergmann	81e96851ea	x86: math-emu: Fix up 'cmp' insn for clang ias The clang integrated assembler requires the 'cmp' instruction to have a length prefix here: arch/x86/math-emu/wm_sqrt.S:212:2: error: ambiguous instructions require an explicit suffix (could be 'cmpb', 'cmpw', or 'cmpl') cmp $0xffffffff,-24(%ebp) ^ Make this a 32-bit comparison, which it was clearly meant to be. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lkml.kernel.org/r/20200527135352.1198078-1-arnd@arndb.de	2020-07-16 17:26:42 +02:00
Sedat Dilek	5769fe26f3	x86/entry: Fix vectors to IDTENTRY_SYSVEC for CONFIG_HYPERV When assembling with Clang via `make LLVM_IAS=1` and CONFIG_HYPERV enabled, we observe the following error: <instantiation>:9:6: error: expected absolute expression .if HYPERVISOR_REENLIGHTENMENT_VECTOR == 3 ^ <instantiation>:1:1: note: while in macro instantiation idtentry HYPERVISOR_REENLIGHTENMENT_VECTOR asm_sysvec_hyperv_reenlightenment sysvec_hyperv_reenlightenment has_error_code=0 ^ ./arch/x86/include/asm/idtentry.h:627:1: note: while in macro instantiation idtentry_sysvec HYPERVISOR_REENLIGHTENMENT_VECTOR sysvec_hyperv_reenlightenment; ^ <instantiation>:9:6: error: expected absolute expression .if HYPERVISOR_STIMER0_VECTOR == 3 ^ <instantiation>:1:1: note: while in macro instantiation idtentry HYPERVISOR_STIMER0_VECTOR asm_sysvec_hyperv_stimer0 sysvec_hyperv_stimer0 has_error_code=0 ^ ./arch/x86/include/asm/idtentry.h:628:1: note: while in macro instantiation idtentry_sysvec HYPERVISOR_STIMER0_VECTOR sysvec_hyperv_stimer0; This is caused by typos in arch/x86/include/asm/idtentry.h: HYPERVISOR_REENLIGHTENMENT_VECTOR -> HYPERV_REENLIGHTENMENT_VECTOR HYPERVISOR_STIMER0_VECTOR -> HYPERV_STIMER0_VECTOR For more details see ClangBuiltLinux issue #1088. Fixes: `a16be368dd` ("x86/entry: Convert various hypervisor vectors to IDTENTRY_SYSVEC") Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1088 Link: https://github.com/ClangBuiltLinux/linux/issues/1043 Link: https://lore.kernel.org/patchwork/patch/1272115/ Link: https://lkml.kernel.org/r/20200714194740.4548-1-sedat.dilek@gmail.com	2020-07-16 17:25:10 +02:00
Jian Cai	6ee93f8df0	x86/entry: Add compatibility with IAS Clang's integrated assembler does not allow symbols with non-absolute values to be reassigned. Modify the interrupt entry loop macro to be compatible with IAS by using a label and an offset. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Brian Gerst <brgerst@gmail.com> Suggested-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Jian Cai <caij2003@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Link: https://github.com/ClangBuiltLinux/linux/issues/1043 Link: https://lkml.kernel.org/r/20200714233024.1789985-1-caij2003@gmail.com	2020-07-16 17:25:09 +02:00
Uros Bizjak	d7866e503b	crypto: x86 - Remove include/asm/inst.h Current minimum required version of binutils is 2.23, which supports PSHUFB, PCLMULQDQ, PEXTRD, AESKEYGENASSIST, AESIMC, AESENC, AESENCLAST, AESDEC, AESDECLAST and MOVQ instruction mnemonics. Substitute macros from include/asm/inst.h with a proper instruction mnemonics in various assmbly files from x86/crypto directory, and remove now unneeded file. The patch was tested by calculating and comparing sha256sum hashes of stripped object files before and after the patch, to be sure that executable code didn't change. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: "David S. Miller" <davem@davemloft.net> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: Borislav Petkov <bp@alien8.de> CC: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-07-16 21:49:07 +10:00
Ard Biesheuvel	e79a317151	crypto: x86/chacha-sse3 - use unaligned loads for state array Due to the fact that the x86 port does not support allocating objects on the stack with an alignment that exceeds 8 bytes, we have a rather ugly hack in the x86 code for ChaCha to ensure that the state array is aligned to 16 bytes, allowing the SSE3 implementation of the algorithm to use aligned loads. Given that the performance benefit of using of aligned loads appears to be limited (~0.25% for 1k blocks using tcrypt on a Corei7-8650U), and the fact that this hack has leaked into generic ChaCha code, let's just remove it. Cc: Martin Willi <martin@strongswan.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Martin Willi <martin@strongswan.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-07-16 21:49:04 +10:00
Thomas Gleixner	790ce3b400	x86/idtentry: Remove stale comment Stack switching for interrupt handlers happens in C now for both 64 and 32bit. Remove the stale comment which claims the contrary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2020-07-16 12:44:26 +02:00
Thomas Gleixner	e3beca48a4	irqdomain/treewide: Keep firmware node unconditionally allocated Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after creating the irqdomain. The only purpose of these FW nodes is to convey name information. When this was introduced the core code did not store the pointer to the node in the irqdomain. A recent change stored the firmware node pointer in irqdomain for other reasons and missed to notice that the usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence are broken by this. Storing a dangling pointer is dangerous itself, but in case that the domain is destroyed later on this leads to a double free. Remove the freeing of the firmware node after creating the irqdomain from all affected call sites to cure this. Fixes: `711419e504` ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de	2020-07-14 17:44:42 +02:00
Paolo Bonzini	e8af9e9f45	KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF The "if" that drops the present bit from the page structure fauls makes no sense. It was added by yours truly in order to be bug-compatible with pre-existing code and in order to make the tests pass; however, the tests are wrong. The behavior after this patch matches bare metal. Reported-by: Nadav Amit <namit@vmware.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 17:01:54 -04:00
Mohammed Gamal	3edd68399d	KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support This patch adds a new capability KVM_CAP_SMALLER_MAXPHYADDR which allows userspace to query if the underlying architecture would support GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly (e.g. qemu can decide if it should warn for -cpu ..,phys-bits=X) The complications in this patch are due to unexpected (but documented) behaviour we see with NPF vmexit handling in AMD processor. If SVM is modified to add guest physical address checks in the NPF and guest #PF paths, we see the followning error multiple times in the 'access' test in kvm-unit-tests: test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001 Dump mapping: address: 0x123400000000 ------L4: 24c3027 ------L3: 24c4027 ------L2: 24c5021 ------L1: 1002000021 This is because the PTE's accessed bit is set by the CPU hardware before the NPF vmexit. This is handled completely by hardware and cannot be fixed in software. Therefore, availability of the new capability depends on a boolean variable allow_smaller_maxphyaddr which is set individually by VMX and SVM init routines. On VMX it's always set to true, on SVM it's only set to true when NPT is not enabled. CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Babu Moger <babu.moger@amd.com> Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Message-Id: <20200710154811.418214-10-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 17:01:53 -04:00
Paolo Bonzini	8c4182bd27	KVM: VMX: optimize #PF injection when MAXPHYADDR does not match Ignore non-present page faults, since those cannot have reserved bits set. When running access.flat with "-cpu Haswell,phys-bits=36", the number of trapped page faults goes down from `8872644` to 3978948. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-9-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 17:01:52 -04:00
Mohammed Gamal	1dbf5d68af	KVM: VMX: Add guest physical address check in EPT violation and misconfig Check guest physical address against its maximum, which depends on the guest MAXPHYADDR. If the guest's physical address exceeds the maximum (i.e. has reserved bits set), inject a guest page fault with PFERR_RSVD_MASK set. This has to be done both in the EPT violation and page fault paths, as there are complications in both cases with respect to the computation of the correct error code. For EPT violations, unfortunately the only possibility is to emulate, because the access type in the exit qualification might refer to an access to a paging structure, rather than to the access performed by the program. Trapping page faults instead is needed in order to correct the error code, but the access type can be obtained from the original error code and passed to gva_to_gpa. The corrections required in the error code are subtle. For example, imagine that a PTE for a supervisor page has a reserved bit set. On a supervisor-mode access, the EPT violation path would trigger. However, on a user-mode access, the processor will not notice the reserved bit and not include PFERR_RSVD_MASK in the error code. Co-developed-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-8-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 17:01:52 -04:00
Paolo Bonzini	a0c134347b	KVM: VMX: introduce vmx_need_pf_intercept Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-7-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 17:01:52 -04:00
Paolo Bonzini	32de2b5ee3	KVM: x86: update exception bitmap on CPUID changes Allow vendor code to observe changes to MAXPHYADDR and start/stop intercepting page faults. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 17:01:29 -04:00
Paolo Bonzini	6986982fef	KVM: x86: rename update_bp_intercept to update_exception_bitmap We would like to introduce a callback to update the #PF intercept when CPUID changes. Just reuse update_bp_intercept since VMX is already using update_exception_bitmap instead of a bespoke function. While at it, remove an unnecessary assignment in the SVM version, which is already done in the caller (kvm_arch_vcpu_ioctl_set_guest_debug) and has nothing to do with the exception bitmap. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 13:20:18 -04:00
Mohammed Gamal	ec7771ab47	KVM: x86: mmu: Add guest physical address check in translate_gpa() Intel processors of various generations have supported 36, 39, 46 or 52 bits for physical addresses. Until IceLake introduced MAXPHYADDR==52, running on a machine with higher MAXPHYADDR than the guest more or less worked, because software that relied on reserved address bits (like KVM) generally used bit 51 as a marker and therefore the page faults where generated anyway. Unfortunately this is not true anymore if the host MAXPHYADDR is 52, and this can cause problems when migrating from a MAXPHYADDR<52 machine to one with MAXPHYADDR==52. Typically, the latter are machines that support 5-level page tables, so they can be identified easily from the LA57 CPUID bit. When that happens, the guest might have a physical address with reserved bits set, but the host won't see that and trap it. Hence, we need to check page faults' physical addresses against the guest's maximum physical memory and if it's exceeded, we need to add the PFERR_RSVD_MASK bits to the page fault error code. This patch does this for the MMU's page walks. The next patches will ensure that the correct exception and error code is produced whenever no host-reserved bits are set in page table entries. Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-4-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 13:09:59 -04:00
Mohammed Gamal	cd313569f5	KVM: x86: mmu: Move translate_gpa() to mmu.c Also no point of it being inline since it's always called through function pointers. So remove that. Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-3-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 13:09:40 -04:00
Mohammed Gamal	897861479c	KVM: x86: Add helper functions for illegal GPA checking and page fault injection This patch adds two helper functions that will be used to support virtualizing MAXPHYADDR in both kvm-intel.ko and kvm.ko. kvm_fixup_and_inject_pf_error() injects a page fault for a user-specified GVA, while kvm_mmu_is_illegal_gpa() checks whether a GPA exceeds vCPU address limits. Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-2-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 13:07:28 -04:00
Vitaly Kuznetsov	fe9304d318	KVM: x86: drop superfluous mmu_check_root() from fast_pgd_switch() The mmu_check_root() check in fast_pgd_switch() seems to be superfluous: when GPA is outside of the visible range cached_root_available() will fail for non-direct roots (as we can't have a matching one on the list) and we don't seem to care for direct ones. Also, raising #TF immediately when a non-existent GFN is written to CR3 doesn't seem to mach architectural behavior. Drop the check. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-10-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 13:03:12 -04:00
Vitaly Kuznetsov	d82aaef9c8	KVM: nSVM: use nested_svm_load_cr3() on guest->host switch Make nSVM code resemble nVMX where nested_vmx_load_cr3() is used on both guest->host and host->guest transitions. Also, we can now eliminate unconditional kvm_mmu_reset_context() and speed things up. Note, nVMX has two different paths: load_vmcs12_host_state() and nested_vmx_restore_host_state() and the later is used to restore from 'partial' switch to L2, it always uses kvm_mmu_reset_context(). nSVM doesn't have this yet. Also, nested_svm_vmexit()'s return value is almost always ignored nowadays. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 12:59:39 -04:00
Vitaly Kuznetsov	a506fdd223	KVM: nSVM: implement nested_svm_load_cr3() and use it for host->guest switch Undesired triple fault gets injected to L1 guest on SVM when L2 is launched with certain CR3 values. #TF is raised by mmu_check_root() check in fast_pgd_switch() and the root cause is that when kvm_set_cr3() is called from nested_prepare_vmcb_save() with NPT enabled CR3 points to a nGPA so we can't check it with kvm_is_visible_gfn(). Using generic kvm_set_cr3() when switching to nested guest is not a great idea as we'll have to distinguish between 'real' CR3s and 'nested' CR3s to e.g. not call kvm_mmu_new_pgd() with nGPA. Following nVMX implement nested-specific nested_svm_load_cr3() doing the job. To support the change, nested_svm_load_cr3() needs to be re-ordered with nested_svm_init_mmu_context(). Note: the current implementation is sub-optimal as we always do TLB flush/MMU sync but this is still an improvement as we at least stop doing kvm_mmu_reset_context(). Fixes: `7c390d350f` ("kvm: x86: Add fast CR3 switch code path") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 12:57:37 -04:00
Vitaly Kuznetsov	bf7dea4253	KVM: nSVM: move kvm_set_cr3() after nested_svm_uninit_mmu_context() kvm_mmu_new_pgd() refers to arch.mmu and at this point it still references arch.guest_mmu while arch.root_mmu is expected. Note, the change is effectively a nop: when !npt_enabled, nested_svm_uninit_mmu_context() does nothing (as we don't do nested_svm_init_mmu_context()) and with npt_enabled we don't do kvm_set_cr3(). However, it will matter when we move the call to kvm_mmu_new_pgd into nested_svm_load_cr3(). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 12:56:30 -04:00
Vitaly Kuznetsov	62156f6cd1	KVM: nSVM: introduce nested_svm_load_cr3()/nested_npt_enabled() As a preparatory change for implementing nSVM-specific PGD switch (following nVMX' nested_vmx_load_cr3()), introduce nested_svm_load_cr3() instead of relying on kvm_set_cr3(). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 12:55:36 -04:00
Vitaly Kuznetsov	59cd9bc5b0	KVM: nSVM: prepare to handle errors from enter_svm_guest_mode() Some operations in enter_svm_guest_mode() may fail, e.g. currently we suppress kvm_set_cr3() return value. Prepare the code to proparate errors. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 12:55:13 -04:00
Vitaly Kuznetsov	ebdb3dba7b	KVM: nSVM: reset nested_run_pending upon nested_svm_vmrun_msrpm() failure WARN_ON_ONCE(svm->nested.nested_run_pending) in nested_svm_vmexit() will fire if nested_run_pending remains '1' but it doesn't really need to, we are already failing and not going to run nested guest. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 12:54:25 -04:00
Paolo Bonzini	8c008659aa	KVM: MMU: stop dereferencing vcpu->arch.mmu to get the context for MMU init kvm_init_shadow_mmu() was actually the only function that could be called with different vcpu->arch.mmu values. Now that kvm_init_shadow_npt_mmu() is separated from kvm_init_shadow_mmu(), we always know the MMU context we need to use and there is no need to dereference vcpu->arch.mmu pointer. Based on a patch by Vitaly Kuznetsov <vkuznets@redhat.com>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 12:53:58 -04:00
Vitaly Kuznetsov	0f04a2ac4f	KVM: nSVM: split kvm_init_shadow_npt_mmu() from kvm_init_shadow_mmu() As a preparatory change for moving kvm_mmu_new_pgd() from nested_prepare_vmcb_save() to nested_svm_init_mmu_context() split kvm_init_shadow_npt_mmu() from kvm_init_shadow_mmu(). This also makes the code look more like nVMX (kvm_init_shadow_ept_mmu()). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 12:53:27 -04:00
Vitaly Kuznetsov	d574c539c3	KVM: x86: move MSR_IA32_PERF_CAPABILITIES emulation to common x86 code state_test/smm_test selftests are failing on AMD with: "Unexpected result from KVM_GET_MSRS, r: 51 (failed MSR was 0x345)" MSR_IA32_PERF_CAPABILITIES is an emulated MSR on Intel but it is not known to AMD code, we can move the emulation to common x86 code. For AMD, we basically just allow the host to read and write zero to the MSR. Fixes: `27461da310` ("KVM: x86/pmu: Support full width counting") Suggested-by: Jim Mattson <jmattson@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710152559.1645827-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 12:52:56 -04:00
Linus Torvalds	cb24c61b53	Two simple but important bugfixes. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8IQ20UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNzxQf+KziWiVgLYnRmGVJ4xztRGv8Cjt+o g1YRyJgpST4UEdHyO+T1JvO8HVzTxtDwZlRNHB9UtIaAsAuybSdpUaHeK9lvcZvi vd59ItmyHKFOtItfG6Qpj6MJKN9tbN1y2F9Vc+TXNLL9BLHwPTbvF3l5ffdtBJ6F zurQBec7hoCarXZJS2GfzBQ+16WxZmm7RLDpYtEqAayp+UHNw+Z1eMrMV6TwdxAA 3LgDn3l+A+BNuIIUKFF9Y5Ef3T4zBWGbdsV7FR9mH1fq2DiT1Vz4IT674L+6rEom /KvyyjVPIfQF33sMZmVpLCpoe2IEmOtc7cu3zffqNL6gNw3YHZwMK6cgsg== =abJa -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull vkm fixes from Paolo Bonzini: "Two simple but important bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: MIPS: Fix build errors for 32bit kernel KVM: nVMX: fixes for preemption timer migration	2020-07-10 08:34:12 -07:00
Paolo Bonzini	83d31e5271	KVM: nVMX: fixes for preemption timer migration Commit `850448f35a` ("KVM: nVMX: Fix VMX preemption timer migration", 2020-06-01) accidentally broke nVMX live migration from older version by changing the userspace ABI. Restore it and, while at it, ensure that vmx->nested.has_preemption_timer_deadline is always initialized according to the KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE flag. Cc: Makarand Sonare <makarandsonare@google.com> Fixes: `850448f35a` ("KVM: nVMX: Fix VMX preemption timer migration") Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 06:15:36 -04:00
Peter Zijlstra	f9ad4a5f3f	lockdep: Remove lockdep_hardirq{s_enabled,_context}() argument Now that the macros use per-cpu data, we no longer need the argument. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200623083721.571835311@infradead.org	2020-07-10 12:00:02 +02:00
Peter Zijlstra	ba1f2b2eaa	x86/entry: Fix NMI vs IRQ state tracking While the nmi_enter() users did trace_hardirqs_{off_prepare,on_finish}() there was no matching lockdep_hardirqs_*() calls to complete the picture. Introduce idtentry_{enter,exit}_nmi() to enable proper IRQ state tracking across the NMIs. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200623083721.216740948@infradead.org	2020-07-10 12:00:01 +02:00
Thomas Gleixner	bc916e67c0	Merge branch 'x86/urgent' into x86/entry to pick up upstream fixes.	2020-07-10 11:45:22 +02:00
Sean Christopherson	6926f95acc	KVM: Move x86's MMU memory cache helpers to common KVM code Move x86's memory cache helpers to common KVM code so that they can be reused by arm64 and MIPS in future patches. Suggested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-16-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:42 -04:00
Sean Christopherson	2aa9c199cf	KVM: Move x86's version of struct kvm_mmu_memory_cache to common code Move x86's 'struct kvm_mmu_memory_cache' to common code in anticipation of moving the entire x86 implementation code to common KVM and reusing it for arm64 and MIPS. Add a new architecture specific asm/kvm_types.h to control the existence and parameters of the struct. The new header is needed to avoid a chicken-and-egg problem with asm/kvm_host.h as all architectures define instances of the struct in their vCPU structs. Add an asm-generic version of kvm_types.h to avoid having empty files on PPC and s390 in the long term, and for arm64 and mips in the short term. Suggested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-15-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:42 -04:00
Sean Christopherson	94ce87ef81	KVM: x86/mmu: Prepend "kvm_" to memory cache helpers that will be global Rename the memory helpers that will soon be moved to common code and be made globaly available via linux/kvm_host.h. "mmu" alone is not a sufficient namespace for globally available KVM symbols. Opportunistically add "nr_" in mmu_memory_cache_free_objects() to make it clear the function returns the number of free objects, as opposed to freeing existing objects. Suggested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-14-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:41 -04:00
Sean Christopherson	378f5cd64a	KVM: x86/mmu: Skip filling the gfn cache for guaranteed direct MMU topups Don't bother filling the gfn array cache when the caller is a fully direct MMU, i.e. won't need a gfn array for shadow pages. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-13-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:41 -04:00
Sean Christopherson	9688088378	KVM: x86/mmu: Zero allocate shadow pages (outside of mmu_lock) Set __GFP_ZERO for the shadow page memory cache and drop the explicit clear_page() from kvm_mmu_get_page(). This moves the cost of zeroing a page to the allocation time of the physical page, i.e. when topping up the memory caches, and thus avoids having to zero out an entire page while holding mmu_lock. Cc: Peter Feiner <pfeiner@google.com> Cc: Peter Shier <pshier@google.com> Cc: Junaid Shahid <junaids@google.com> Cc: Jim Mattson <jmattson@google.com> Suggested-by: Ben Gardon <bgardon@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-12-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:40 -04:00
Sean Christopherson	5f6078f9f1	KVM: x86/mmu: Make __GFP_ZERO a property of the memory cache Add a gfp_zero flag to 'struct kvm_mmu_memory_cache' and use it to control __GFP_ZERO instead of hardcoding a call to kmem_cache_zalloc(). A future patch needs such a flag for the __get_free_page() path, as gfn arrays do not need/want the allocator to zero the memory. Convert the kmem_cache paths to __GFP_ZERO now so as to avoid a weird and inconsistent API in the future. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-11-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:40 -04:00
Sean Christopherson	171a90d70f	KVM: x86/mmu: Separate the memory caches for shadow pages and gfn arrays Use separate caches for allocating shadow pages versus gfn arrays. This sets the stage for specifying __GFP_ZERO when allocating shadow pages without incurring extra cost for gfn arrays. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-10-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:40 -04:00
Sean Christopherson	531281ad98	KVM: x86/mmu: Clean up the gorilla math in mmu_topup_memory_caches() Clean up the minimums in mmu_topup_memory_caches() to document the driving mechanisms behind the minimums. Now that encountering an empty cache is unlikely to trigger BUG_ON(), it is less dangerous to be more precise when defining the minimums. For rmaps, the logic is 1 parent PTE per level, plus a single rmap, and prefetched rmaps. The extra objects in the current '8 + PREFETCH' minimum came about due to an abundance of paranoia in commit `c41ef344de` ("KVM: MMU: increase per-vcpu rmap cache alloc size"), i.e. it could have increased the minimum to 2 rmaps. Furthermore, the unexpected extra rmap case was killed off entirely by commits `f759e2b4c7` ("KVM: MMU: avoid pte_list_desc running out in kvm_mmu_pte_write") and `f5a1e9f895` ("KVM: MMU: remove call to kvm_mmu_pte_write from walk_addr"). For the so called page cache, replace '8' with 2*PT64_ROOT_MAX_LEVEL. The 2x multiplier is needed because the cache is used for both shadow pages and gfn arrays for indirect MMUs. And finally, for page headers, replace '4' with PT64_ROOT_MAX_LEVEL. Note, KVM now supports 5-level paging, i.e. the old minimums that used a baseline derived from 4-level paging were technically wrong. But, KVM always allocates roots in a separate flow, e.g. it's impossible in the current implementation to actually need 5 new shadow pages in a single flow. Use PT64_ROOT_MAX_LEVEL unmodified instead of subtracting 1, as the direct usage is likely more intuitive to uninformed readers, and the inflated minimum is unlikely to affect functionality in practice. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-9-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:39 -04:00
Sean Christopherson	f3747a5a9e	KVM: x86/mmu: Topup memory caches after walking GVA->GPA Topup memory caches after walking the GVA->GPA translation during a shadow page fault, there is no need to ensure the caches are full when walking the GVA. As of commit `f5a1e9f895` ("KVM: MMU: remove call to kvm_mmu_pte_write from walk_addr"), the FNAME(walk_addr) flow no longer add rmaps via kvm_mmu_pte_write(). This avoids allocating memory in the case that the GVA is unmapped in the guest, and also provides a paper trail of why/when the memory caches need to be filled. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-8-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:39 -04:00
Sean Christopherson	832914452a	KVM: x86/mmu: Move fast_page_fault() call above mmu_topup_memory_caches() Avoid refilling the memory caches and potentially slow reclaim/swap when handling a fast page fault, which does not need to allocate any new objects. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:39 -04:00
Sean Christopherson	53a3f48771	KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty Attempt to allocate a new object instead of crashing KVM (and likely the kernel) if a memory cache is unexpectedly empty. Use GFP_ATOMIC for the allocation as the caches are used while holding mmu_lock. The immediate BUG_ON() makes the code unnecessarily explosive and led to confusing minimums being used in the past, e.g. allocating 4 objects where 1 would suffice. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:38 -04:00
Sean Christopherson	284aa86868	KVM: x86/mmu: Remove superfluous gotos from mmu_topup_memory_caches() Return errors directly from mmu_topup_memory_caches() instead of branching to a label that does the same. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:38 -04:00
Sean Christopherson	356ec69adf	KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals Use "mc" for local variables to shorten line lengths and provide consistent names, which will be especially helpful when some of the helpers are moved to common KVM code in future patches. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:37 -04:00
Sean Christopherson	45177cccd9	KVM: x86/mmu: Consolidate "page" variant of memory cache helpers Drop the "page" variants of the topup/free memory cache helpers, using the existence of an associated kmem_cache to select the correct alloc or free routine. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:37 -04:00
Sean Christopherson	5962bfb748	KVM: x86/mmu: Track the associated kmem_cache in the MMU caches Track the kmem_cache used for non-page KVM MMU memory caches instead of passing in the associated kmem_cache when filling the cache. This will allow consolidating code and other cleanups. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 13:29:37 -04:00
Thomas Gleixner	2245d39886	x86/kvm/vmx: Use native read/write_cr2() read/write_cr2() go throuh the paravirt XXL indirection, but nested VMX in a XEN_PV guest is not supported. Use the native variants. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Message-Id: <20200708195322.344731916@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:42 -04:00
Thomas Gleixner	c3f08ed150	x86/kvm/svm: Use uninstrumented wrmsrl() to restore GS On guest exit MSR_GS_BASE contains whatever the guest wrote to it and the first action after returning from the ASM code is to set it to the host kernel value. This uses wrmsrl() which is interesting at least. wrmsrl() is either using native_write_msr() or the paravirt variant. The XEN_PV code is uninteresting as nested SVM in a XEN_PV guest does not work. But native_write_msr() can be placed out of line by the compiler especially when paravirtualization is enabled in the kernel configuration. The function is marked notrace, but still can be probed if CONFIG_KPROBE_EVENTS_ON_NOTRACE is enabled. That would be a fatal problem as kprobe events use per-CPU variables which are GS based and would be accessed with the guest GS. Depending on the GS value this would either explode in colorful ways or lead to completely undebugable data corruption. Aside of that native_write_msr() contains a tracepoint which objtool complains about as it is invoked from the noinstr section. As this cannot run inside a XEN_PV guest there is no point in using wrmsrl(). Use native_wrmsrl() instead which is just a plain native WRMSR without tracing or anything else attached. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Juergen Gross <jgross@suse.com> Message-Id: <20200708195322.244847377@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:41 -04:00
Thomas Gleixner	135961e0a7	x86/kvm/svm: Move guest enter/exit into .noinstr.text Move the functions which are inside the RCU off region into the non-instrumentable text section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200708195322.144607767@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:41 -04:00
Thomas Gleixner	3ebccdf373	x86/kvm/vmx: Move guest enter/exit into .noinstr.text Move the functions which are inside the RCU off region into the non-instrumentable text section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200708195322.037311579@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:40 -04:00
Thomas Gleixner	9fc975e9ef	x86/kvm/svm: Add hardirq tracing on guest enter/exit Entering guest mode is more or less the same as returning to user space. From an instrumentation point of view both leave kernel mode and the transition to guest or user mode reenables interrupts on the host. In user mode an interrupt is served directly and in guest mode it causes a VM exit which then handles or reinjects the interrupt. The transition from guest mode or user mode to kernel mode disables interrupts, which needs to be recorded in instrumentation to set the correct state again. This is important for e.g. latency analysis because otherwise the execution time in guest or user mode would be wrongly accounted as interrupt disabled and could trigger false positives. Add hardirq tracing to guest enter/exit functions in the same way as it is done in the user mode enter/exit code, respecting the RCU requirements. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200708195321.934715094@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:39 -04:00
Thomas Gleixner	0642391e21	x86/kvm/vmx: Add hardirq tracing to guest enter/exit Entering guest mode is more or less the same as returning to user space. From an instrumentation point of view both leave kernel mode and the transition to guest or user mode reenables interrupts on the host. In user mode an interrupt is served directly and in guest mode it causes a VM exit which then handles or reinjects the interrupt. The transition from guest mode or user mode to kernel mode disables interrupts, which needs to be recorded in instrumentation to set the correct state again. This is important for e.g. latency analysis because otherwise the execution time in guest or user mode would be wrongly accounted as interrupt disabled and could trigger false positives. Add hardirq tracing to guest enter/exit functions in the same way as it is done in the user mode enter/exit code, respecting the RCU requirements. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200708195321.822002354@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:39 -04:00
Thomas Gleixner	87fa7f3e98	x86/kvm: Move context tracking where it belongs Context tracking for KVM happens way too early in the vcpu_run() code. Anything after guest_enter_irqoff() and before guest_exit_irqoff() cannot use RCU and should also be not instrumented. The current way of doing this covers way too much code. Move it closer to the actual vmenter/exit code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200708195321.724574345@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:38 -04:00
Maxim Levitsky	841c2be09f	kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host To avoid complex and in some cases incorrect logic in kvm_spec_ctrl_test_value, just try the guest's given value on the host processor instead, and if it doesn't #GP, allow the guest to set it. One such case is when host CPU supports STIBP mitigation but doesn't support IBRS (as is the case with some Zen2 AMD cpus), and in this case we were giving guest #GP when it tried to use STIBP The reason why can can do the host test is that IA32_SPEC_CTRL msr is passed to the guest, after the guest sets it to a non zero value for the first time (due to performance reasons), and as as result of this, it is pointless to emulate #GP condition on this first access, in a different way than what the host CPU does. This is based on a patch from Sean Christopherson, who suggested this idea. Fixes: `6441fa6178` ("KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200708115731.180097-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:38 -04:00
Like Xu	632a4cf57f	KVM/x86: pmu: Fix #GP condition check for RDPMC emulation In guest protected mode, if the current privilege level is not 0 and the PCE flag in the CR4 register is cleared, we will inject a #GP for RDPMC usage. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20200708074409.39028-1-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:37 -04:00
Vitaly Kuznetsov	995decb6c4	KVM: x86: take as_id into account when checking PGD OVMF booted guest running on shadow pages crashes on TRIPLE FAULT after enabling paging from SMM. The crash is triggered from mmu_check_root() and is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0 while vCPU may be in a different context (address space). Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200708140023.1476020-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:37 -04:00
Xiaoyao Li	5668821aef	KVM: x86: Move kvm_x86_ops.vcpu_after_set_cpuid() into kvm_vcpu_after_set_cpuid() kvm_x86_ops.vcpu_after_set_cpuid() is used to update vmx/svm specific vcpu settings based on updated CPUID settings. So it's supposed to be called after CPUIDs are updated, i.e., kvm_update_cpuid_runtime(). Currently, kvm_update_cpuid_runtime() only updates CPUID bits of OSXSAVE, APIC, OSPKE, MWAIT, KVM_FEATURE_PV_UNHALT and CPUID(0xD,0).ebx and CPUID(0xD, 1).ebx. None of them is consumed by vmx/svm's update_vcpu_after_set_cpuid(). So there is no dependency between them. Move kvm_x86_ops.vcpu_after_set_cpuid() into kvm_vcpu_after_set_cpuid() is obviously more reasonable. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200709043426.92712-6-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:25 -04:00
Xiaoyao Li	7c1b761be0	KVM: x86: Rename cpuid_update() callback to vcpu_after_set_cpuid() The name of callback cpuid_update() is misleading that it's not about updating CPUID settings of vcpu but updating the configurations of vcpu based on the CPUIDs. So rename it to vcpu_after_set_cpuid(). Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200709043426.92712-5-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:18 -04:00
Xiaoyao Li	346ce3591d	KVM: x86: Rename kvm_update_cpuid() to kvm_vcpu_after_set_cpuid() Now there is no updating CPUID bits behavior in kvm_update_cpuid(), rename it to kvm_vcpu_after_set_cpuid(). Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200709043426.92712-4-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 07:08:08 -04:00
Xiaoyao Li	aedbaf4f6a	KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid() Beside called in kvm_vcpu_ioctl_set_cpuid*(), kvm_update_cpuid() is also called 5 places else in x86.c and 1 place else in lapic.c. All those 6 places only need the part of updating guest CPUIDs (OSXSAVE, OSPKE, APIC, KVM_FEATURE_PV_UNHALT, ...) based on the runtime vcpu state, so extract them as a separate kvm_update_cpuid_runtime(). Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200709043426.92712-3-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 06:53:49 -04:00
Xiaoyao Li	a76733a987	KVM: x86: Introduce kvm_check_cpuid() Use kvm_check_cpuid() to validate if userspace provides legal cpuid settings and call it before KVM take any action to update CPUID or update vcpu states based on given CPUID settings. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200709043426.92712-2-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 06:49:14 -04:00
Xiaoyao Li	36f37648ca	KVM: X86: Move kvm_apic_set_version() to kvm_update_cpuid() There is no dependencies between kvm_apic_set_version() and kvm_update_cpuid() because kvm_apic_set_version() queries X2APIC CPUID bit, which is not touched/changed by kvm_update_cpuid(). Obviously, kvm_apic_set_version() belongs to the category of updating vcpu model. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200708065054.19713-9-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-09 06:48:09 -04:00
Thomas Gleixner	bd87e6f661	x86/entry/common: Make prepare_exit_to_usermode() static No users outside this file anymore. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200708192934.301116609@linutronix.de	2020-07-09 11:18:30 +02:00
Thomas Gleixner	006e1ced51	x86/entry: Mark check_user_regs() noinstr It's called from the non-instrumentable section. Fixes: `c9c26150e6` ("x86/entry: Assert that syscalls are on the right stack") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200708192934.191497962@linutronix.de	2020-07-09 11:18:29 +02:00
Thomas Gleixner	bce9b042ec	x86/traps: Disable interrupts in exc_aligment_check() exc_alignment_check() fails to disable interrupts before returning to the entry code. Fixes: `ca4c6a9858` ("x86/traps: Make interrupt enable/disable symmetric in C code") Reported-by: syzbot+0889df9502bc0f112b31@syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200708192934.076519438@linutronix.de	2020-07-09 11:18:29 +02:00
Sedat Dilek	3347c8a079	crypto: aesni - Fix build with LLVM_IAS=1 When building with LLVM_IAS=1 means using Clang's Integrated Assembly (IAS) from LLVM/Clang >= v10.0.1-rc1+ instead of GNU/as from GNU/binutils I see the following breakage in Debian/testing AMD64: <instantiation>:15:74: error: too many positional arguments PRECOMPUTE 83+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, ^ arch/x86/crypto/aesni-intel_asm.S:1598:2: note: while in macro instantiation GCM_INIT %r9, 83 +8(%rsp), 83 +16(%rsp), 83 +24(%rsp) ^ <instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix GHASH_4_ENCRYPT_4_PARALLEL_dec %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc ^ arch/x86/crypto/aesni-intel_asm.S:1599:2: note: while in macro instantiation GCM_ENC_DEC dec ^ <instantiation>:15:74: error: too many positional arguments PRECOMPUTE 83+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, ^ arch/x86/crypto/aesni-intel_asm.S:1686:2: note: while in macro instantiation GCM_INIT %r9, 83 +8(%rsp), 83 +16(%rsp), 83 +24(%rsp) ^ <instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix GHASH_4_ENCRYPT_4_PARALLEL_enc %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc ^ arch/x86/crypto/aesni-intel_asm.S:1687:2: note: while in macro instantiation GCM_ENC_DEC enc Craig Topper suggested me in ClangBuiltLinux issue #1050: > I think the "too many positional arguments" is because the parser isn't able > to handle the trailing commas. > > The "unknown use of instruction mnemonic" is because the macro was named > GHASH_4_ENCRYPT_4_PARALLEL_DEC but its being instantiated with > GHASH_4_ENCRYPT_4_PARALLEL_dec I guess gas ignores case on the > macro instantiation, but llvm doesn't. First, I removed the trailing comma in the PRECOMPUTE line. Second, I substituted: 1. GHASH_4_ENCRYPT_4_PARALLEL_DEC -> GHASH_4_ENCRYPT_4_PARALLEL_dec 2. GHASH_4_ENCRYPT_4_PARALLEL_ENC -> GHASH_4_ENCRYPT_4_PARALLEL_enc With these changes I was able to build with LLVM_IAS=1 and boot on bare metal. I confirmed that this works with Linux-kernel v5.7.5 final. NOTE: This patch is on top of Linux v5.7 final. Thanks to Craig and especially Nick for double-checking and his comments. Suggested-by: Craig Topper <craig.topper@intel.com> Suggested-by: Craig Topper <craig.topper@gmail.com> Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: "ClangBuiltLinux" <clang-built-linux@googlegroups.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1050 Link: https://bugs.llvm.org/show_bug.cgi?id=24494 Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-07-09 18:25:22 +10:00
Xiaoyao Li	565b782073	KVM: lapic: Use guest_cpuid_has() in kvm_apic_set_version() Only code cleanup and no functional change. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200708065054.19713-8-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-08 16:22:01 -04:00
Xiaoyao Li	0d3b2ba16b	KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent As handling of bits out of leaf 1 added over time, kvm_update_cpuid() should not return directly if leaf 1 is absent, but should go on updateing other CPUID leaves. Keep the update of apic->lapic_timer.timer_mode_mask in a separate wrapper, to minimize churn for code since it will be moved out of this function in a future patch. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200708065054.19713-3-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-08 16:22:00 -04:00
Xiaoyao Li	1896409282	KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID* fails Current implementation keeps userspace input of CPUID configuration and cpuid->nent even if kvm_update_cpuid() fails. Reset vcpu->arch.cpuid_nent to 0 for the case of failure as a simple fix. Besides, update the doc to explicitly state that if IOCTL SET_CPUID* fail KVM gives no gurantee that previous valid CPUID configuration is kept. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200708065054.19713-2-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-08 16:22:00 -04:00
Like Xu	2e8cd7a3b8	kvm: x86: limit the maximum number of vPMU fixed counters to 3 Some new Intel platforms (such as TGL) already have the fourth fixed counter TOPDOWN.SLOTS, but it has not been fully enabled on KVM and the host. Therefore, we limit edx.split.num_counters_fixed to 3, so that it does not break the kvm-unit-tests PMU test case and bad-handled userspace. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20200624015928.118614-1-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-08 16:21:59 -04:00
Krish Sadhukhan	761e416934	KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests According to section "Canonicalization and Consistency Checks" in APM vol. 2 the following guest state is illegal: "Any MBZ bit of CR3 is set." "Any MBZ bit of CR4 is set." Suggeted-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <1594168797-29444-3-git-send-email-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-08 16:21:59 -04:00
Paolo Bonzini	53efe527ca	KVM: x86: Make CR4.VMXE reserved for the guest CR4.VMXE is reserved unless the VMX CPUID bit is set. On Intel, it is also tested by vmx_set_cr4, but AMD relies on kvm_valid_cr4, so fix it. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-08 16:21:58 -04:00
Krish Sadhukhan	b899c13277	KVM: x86: Create mask for guest CR4 reserved bits in kvm_update_cpuid() Instead of creating the mask for guest CR4 reserved bits in kvm_valid_cr4(), do it in kvm_update_cpuid() so that it can be reused instead of creating it each time kvm_valid_cr4() is called. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <1594168797-29444-2-git-send-email-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-08 16:21:58 -04:00
Jim Mattson	d42e3fae6f	kvm: x86: Read PDPTEs on CR0.CD and CR0.NW changes According to the SDM, when PAE paging would be in use following a MOV-to-CR0 that modifies any of CR0.CD, CR0.NW, or CR0.PG, then the PDPTEs are loaded from the address in CR3. Previously, kvm only loaded the PDPTEs when PAE paging would be in use following a MOV-to-CR0 that modified CR0.PG. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Message-Id: <20200707223630.336700-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-08 16:21:58 -04:00

1 2 3 4 5 ...

36145 Commits