Merge tag 'for-linus-4.21-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"Xen features and fixes:
- a series to enable KVM guests to be booted by qemu via the Xen PVH
boot entry for speeding up KVM guest tests
- a series for a common driver to be used by Xen PV frontends (right
now drm and sound)
- two other fixes in Xen related code"
* tag 'for-linus-4.21-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
ALSA: xen-front: Use Xen common shared buffer implementation
drm/xen-front: Use Xen common shared buffer implementation
xen: Introduce shared buffer helpers for page directory...
xen/pciback: Check dev_data before using it
kprobes/x86/xen: blacklist non-attachable xen interrupt functions
KVM: x86: Allow Qemu/KVM to use PVH entry point
xen/pvh: Add memory map pointer to hvm_start_info struct
xen/pvh: Move Xen code for getting mem map via hcall out of common file
xen/pvh: Move Xen specific PVH VM initialization out of common file
xen/pvh: Create a new file for Xen specific PVH code
xen/pvh: Move PVH entry code out of Xen specific tree
xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 festive updates from Will Deacon:
"In the end, we ended up with quite a lot more than I expected:
- Support for ARMv8.3 Pointer Authentication in userspace (CRIU and
kernel-side support to come later)
- Support for per-thread stack canaries, pending an update to GCC
that is currently undergoing review
- Support for kexec_file_load(), which permits secure boot of a kexec
payload but also happens to improve the performance of kexec
dramatically because we can avoid the sucky purgatory code from
userspace. Kdump will come later (requires updates to libfdt).
- Optimisation of our dynamic CPU feature framework, so that all
detected features are enabled via a single stop_machine()
invocation
- KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that
they can benefit from global TLB entries when KASLR is not in use
- 52-bit virtual addressing for userspace (kernel remains 48-bit)
- Patch in LSE atomics for per-cpu atomic operations
- Custom preempt.h implementation to avoid unconditional calls to
preempt_schedule() from preempt_enable()
- Support for the new 'SB' Speculation Barrier instruction
- Vectorised implementation of XOR checksumming and CRC32
optimisations
- Workaround for Cortex-A76 erratum #1165522
- Improved compatibility with Clang/LLD
- Support for TX2 system PMUs for profiling the L3 cache and DMC
- Reflect read-only permissions in the linear map by default
- Ensure MMIO reads are ordered with subsequent calls to Xdelay()
- Initial support for memory hotplug
- Tweak the threshold when we invalidate the TLB by-ASID, so that
mremap() performance is improved for ranges spanning multiple PMDs.
- Minor refactoring and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (125 commits)
arm64: kaslr: print PHYS_OFFSET in dump_kernel_offset()
arm64: sysreg: Use _BITUL() when defining register bits
arm64: cpufeature: Rework ptr auth hwcaps using multi_entry_cap_matches
arm64: cpufeature: Reduce number of pointer auth CPU caps from 6 to 4
arm64: docs: document pointer authentication
arm64: ptr auth: Move per-thread keys from thread_info to thread_struct
arm64: enable pointer authentication
arm64: add prctl control for resetting ptrauth keys
arm64: perf: strip PAC when unwinding userspace
arm64: expose user PAC bit positions via ptrace
arm64: add basic pointer authentication support
arm64/cpufeature: detect pointer authentication
arm64: Don't trap host pointer auth use to EL2
arm64/kvm: hide ptrauth from guests
arm64/kvm: consistently handle host HCR_EL2 flags
arm64: add pointer authentication register bits
arm64: add comments about EC exception levels
arm64: perf: Treat EXCLUDE_EL* bit definitions as unsigned
arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field
arm64: enable per-task stack canaries
...
Pull x86 pti updates from Thomas Gleixner:
"No point in speculating what's in this parcel:
- Drop the swap storage limit when L1TF is disabled so the full space
is available
- Add support for the new AMD STIBP always on mitigation mode
- Fix a bunch of STIPB typos"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Add support for STIBP always-on preferred mode
x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off
x86/speculation: Change misspelled STIPB to STIBP
Merge tag 'acpi-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to the 20181213 upstream
revision, make it possible to build the ACPI subsystem without PCI
support, and a new OEM _OSI string, add a new device support to the
ACPI driver for AMD SoCs and fix PM handling in the ACPI driver for
Intel SoCs, fix the SPCR table handling and do some assorted fixes and
cleanups.
Specifics:
- Update the ACPICA code in the kernel to the 20181213 upstream
revision including:
* New Windows _OSI strings (Bob Moore, Jung-uk Kim).
* Buffers-to-string conversions update (Bob Moore).
* Removal of support for expressions in package elements (Bob
Moore).
* New option to display method/object evaluation in debug output
(Bob Moore).
* Compiler improvements (Bob Moore, Erik Schmauss).
* Minor debugger fix (Erik Schmauss).
* Disassembler improvement (Erik Schmauss).
* Assorted cleanups (Bob Moore, Colin Ian King, Erik Schmauss).
- Add support for a new OEM _OSI string to indicate special handling
of secondary graphics adapters on some systems (Alex Hung).
- Make it possible to build the ACPI subsystem without PCI support
(Sinan Kaya).
- Make the SPCR table handling regard baud rate 0 in accordance with
its specification, and make the DSDT override code support DSDT code
names generated by recent ACPICA (Andy Shevchenko, Wang Dongsheng,
Nathan Chancellor).
- Add clock frequency for Hisilicon Hip08 SPI controller to the ACPI
driver for AMD SoCs (APD) (Jay Fang).
- Fix the PM handling during device init in the ACPI driver for Intel
SoCs (LPSS) (Hans de Goede).
- Avoid double panic()s by clearing the APEI GHES block_status before
panic() (Lenny Szubowicz).
- Clean up a function invocation in the ACPI core and get rid of some
code duplication by using the DEFINE_SHOW_ATTRIBUTE macro in the
APEI support code (Alexey Dobriyan, Yangtao Li)"
* tag 'acpi-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (31 commits)
ACPI / tables: Add an ifdef around amlcode and dsdt_amlcode
ACPI/APEI: Clear GHES block_status before panic()
ACPI: Make PCI slot detection driver depend on PCI
ACPI/IORT: Stub out ACS functions when CONFIG_PCI is not set
arm64: select ACPI PCI code only when both features are enabled
PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set
ACPICA: Remove PCI bits from ACPICA when CONFIG_PCI is unset
ACPI: Allow CONFIG_PCI to be unset for reboot
ACPI: Move PCI reset to a separate function
ACPI / OSI: Add OEM _OSI string to enable dGPU direct output
ACPI / tables: add DSDT AmlCode new declaration name support
ACPICA: Update version to 20181213
ACPICA: change coding style to match ACPICA, no functional change
ACPICA: Debug output: Add option to display method/object evaluation
ACPICA: disassembler: disassemble OEMx tables as AML
ACPICA: Add "Windows 2018.2" string in the _OSI support
ACPICA: Expressions in package elements are not supported
ACPICA: Update buffer-to-string conversions
ACPICA: add comments, no functional change
ACPICA: Remove defines that use deprecated flag
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fix from Paolo Bonzini:
"A simple patch for a pretty bad bug: Unbreak AMD nested
virtualization."
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: nSVM: fix switch to guest mmu
Pull x86 fixes from Ingo Molnar:
"The biggest part is a series of reverts for the macro based GCC
inlining workarounds. It caused regressions in distro build and other
kernel tooling environments, and the GCC project was very receptive to
fixing the underlying inliner weaknesses - so as time ran out we
decided to do a reasonably straightforward revert of the patches. The
plan is to rely on the 'asm inline' GCC 9 feature, which might be
backported to GCC 8 and could thus become reasonably widely available
on modern distros.
Other than those reverts, there's misc fixes from all around the
place.
I wish our final x86 pull request for v4.20 was smaller..."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs"
Revert "x86/objtool: Use asm macros to work around GCC inlining bugs"
Revert "x86/refcount: Work around GCC inlining bug"
Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs"
Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs"
Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops"
Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs"
Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs"
Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"
x86/mtrr: Don't copy uninitialized gentry fields back to userspace
x86/fsgsbase/64: Fix the base write helper functions
x86/mm/cpa: Fix cpa_flush_array() TLB invalidation
x86/vdso: Pass --eh-frame-hdr to the linker
x86/mm: Fix decoy address handling vs 32-bit builds
x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence
x86/dump_pagetables: Fix LDT remap address marker
x86/mm: Fix guard hole handling
We are compiling PCI code today for systems with ACPI and no PCI
device present. Remove the useless code and reduce the tight
dependency.
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI parts
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Recent optimizations in MMU code broke nested SVM with NPT in L1
completely: when we do nested_svm_{,un}init_mmu_context() we want
to switch from the TDP MMU to the shadow MMU, but both init_kvm_tdp_mmu()
and kvm_init_shadow_mmu() check whether re-configuration is needed by
looking at cached source data. That data, however, doesn't change - it's
only the type of the MMU which changes. We end up not re-initializing the
guest MMU as shadow and everything goes off the rails.
The issue could have been fixed by putting the MMU type into the extended
MMU role, but this is not really needed. We can just split the root and
guest MMUs the exact same way we did for nVMX; their types never change
during the lifetime of a vCPU.
There is still room for improvement: currently we reset all MMU roots
when switching from L1 to L2 and back, which is not needed.
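A minimal sketch of the split (a sketch only; function and field names
follow the nVMX precedent, the real patch differs in detail):

  /* Keep two MMU contexts per vCPU and switch a pointer, instead of
   * re-configuring a single context in place. */
  static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
  {
          vcpu->arch.mmu = &vcpu->arch.guest_mmu; /* shadow MMU for nested NPT */
          kvm_init_shadow_mmu(vcpu);
  }

  static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
  {
          vcpu->arch.mmu = &vcpu->arch.root_mmu;  /* back to the TDP MMU */
  }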
Fixes: 7dcd575520 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit 5bdcd510c2.
The macro-based workarounds for GCC's inlining bugs caused regressions:
distcc and other distro build setups broke, and the fixes are neither easy
nor will they solve regressions on already existing installations.
So we are reverting this patch and the 8 follow-up patches.
What makes this revert easier is that GCC 9 will likely include the new
'asm inline' syntax that makes inlining of assembly blocks a lot more
robust.
This is a superior method to any macro-based hackery - and might even be
backported to GCC 8, which would make all modern distros get the inlining
fixes as well.
Many thanks to Masahiro Yamada and others for helping sort out these problems.
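For reference, a sketch of the 'asm inline' syntax mentioned above; it
makes GCC cost the asm statement as minimal for inlining purposes, no
matter how large the expanded assembly text is:

  static inline void refcount_inc_sketch(int *p)
  {
          /* 'asm inline' (GCC 9+): don't let the asm size inhibit inlining */
          asm inline ("lock incl %0" : "+m" (*p));
  }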
Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Biener <rguenther@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently the copy_to_user() of data in the gentry struct copies
uninitialized data in field _pad from the stack to userspace.
Fix this by explicitly memset()ing gentry to zero; this will also zero any
compiler-added padding fields that may be in the struct (currently there
are none).
Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable")
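A minimal sketch of the fix (the surrounding ioctl code is elided):

  struct mtrr_gentry gentry;

  memset(&gentry, 0, sizeof(gentry));     /* zeroes _pad and any padding */
  /* ... fill in gentry.regnum, gentry.base, gentry.size, gentry.type ... */
  if (copy_to_user(arg, &gentry, sizeof(gentry)))
          return -EFAULT;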
Fixes: b263b31e8a ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: security@kernel.org
Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com
Some guest OSes (including Windows 10) write to MSR 0xc001102c
in some cases (possibly while trying to apply a CPU erratum workaround).
Make KVM ignore reads and writes to that MSR, so the guest won't
crash.
The MSR is documented as "Execution Unit Configuration (EX_CFG)"
in AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family
15h Models 00h-0Fh Processors".
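Roughly, the shape of such a change in the SVM MSR handlers (a sketch;
MSR_F15H_EX_CFG standing in for 0xc001102c, exact context per the real
patch):

  /* svm_get_msr(): report EX_CFG as zero instead of injecting #GP */
  case MSR_F15H_EX_CFG:
          msr_info->data = 0;
          break;

  /* svm_set_msr(): silently accept guest writes */
  case MSR_F15H_EX_CFG:
          break;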
Cc: stable@vger.kernel.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported by syzkaller:
CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline]
RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline]
RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline]
RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline]
RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074
Call Trace:
kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696
ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT14 msr
and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
However, the irqchip is not initialized by this simple testcase, so the
ioapic/apic objects should not be accessed.
This patch fixes it by also checking whether the apic is present.
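A sketch of the added guard on the scan-ioapic path (the helpers shown do
exist, but the exact check is per the real patch):

  /* bail out early when no in-kernel apic object exists at all */
  if (!lapic_in_kernel(vcpu) || !kvm_apic_hw_enabled(vcpu->arch.apic))
          return;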
Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It
caches the kmap()ed page object and pointer, however, it doesn't handle
errors correctly: it's possible to cache a valid pointer, then release
the page and later dereference the dangling pointer.
I was able to reproduce with the following steps:
1. Call vmlaunch with valid posted_intr_desc_addr but an invalid
MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed
pi_desc_page and pi_desc. Later the invalid EFER value fails
check_vmentry_postreqs() which fails the first vmlaunch.
2. Call vmlaunch with a valid EFER but an invalid posted_intr_desc_addr
(I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages
pi_desc_page is unmapped and released and pi_desc_page is set to NULL
(the "shouldn't happen" clause). Due to the invalid
posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and
nested_get_vmcs12_pages() returns. It doesn't return an error value so
vmlaunch proceeds. Note that at this time we have a dangling pointer in
vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs.
3. Issue an IPI in L2 guest code. This triggers a call to
vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which
dereferences the dangling pointer.
Vulnerable code requires nested and enable_apicv variables to be set to
true. The host CPU must also support posted interrupts.
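A sketch of the defensive direction of the fix: when the new address
cannot be translated, the stale mapping is dropped instead of being left
cached (simplified error path):

  page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->posted_intr_desc_addr);
  if (is_error_page(page)) {
          /* invalidate the cache; never leave a dangling pointer behind */
          vmx->nested.pi_desc_page = NULL;
          vmx->nested.pi_desc = NULL;
          vmcs_write64(POSTED_INTR_DESC_ADDR, -1ull);
          return;
  }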
Fixes: 5e2f30b756 ("KVM: nVMX: get rid of nested_get_page()")
Cc: stable@vger.kernel.org
Reviewed-by: Andy Honig <ahonig@google.com>
Signed-off-by: Cfir Cohen <cfir@google.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andy spotted a regression in the fs/gs base helpers after the patch series
was committed. The helper functions which write the fs/gs base are not
just writing the base, they are also changing the index. That's wrong and
needs to be separated, because writing the base must not modify the index.
While the regression is not causing any harm right now because the only
caller depends on that behaviour, it's a guarantee for subtle breakage
down the road.
Make the caller change the index explicitly instead of hiding that code in
the helpers.
As a consequence, the task write helpers no longer handle the current
task. The range check for the base value is also factored out to minimize
code redundancy in the callers.
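Illustrating the separation (hypothetical helper name; the point is that
the base write no longer touches the index):

  /* after the fix: the helper writes only the base ... */
  static void x86_fsbase_write_cpu(unsigned long fsbase)
  {
          wrmsrl(MSR_FS_BASE, fsbase);
  }

  /* ... and a caller that also needs a new index changes it explicitly */
  loadsegment(fs, 0);
  x86_fsbase_write_cpu(fsbase);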
Fixes: b1378a561f ("x86/fsgsbase/64: Introduce FS/GS base helper functions")
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20181126195524.32179-1-chang.seok.bae@intel.com
Different AMD processors may have different implementations of STIBP.
When STIBP is conditionally enabled, some implementations would benefit
from having STIBP always on instead of toggling the STIBP bit through MSR
writes. This preference is advertised through a CPUID feature bit.
When conditional STIBP support is requested at boot and the CPU advertises
STIBP always-on mode as preferred, switch to STIBP "on" support. To show
that this transition has occurred, create a new spectre_v2_user_mitigation
value and a new spectre_v2_user_strings message. The new mitigation value
is used in spectre_v2_user_select_mitigation() to print the new mitigation
message as well as to return a new string from stibp_state().
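A sketch of the mode upgrade described above (names follow the commit
text; the surrounding selection logic is elided):

  /* conditional STIBP requested, but the CPU prefers always-on mode */
  if ((mode == SPECTRE_V2_USER_PRCTL || mode == SPECTRE_V2_USER_SECCOMP) &&
      boot_cpu_has(X86_FEATURE_AMD_STIBP_ALWAYS_ON))
          mode = SPECTRE_V2_USER_STRICT_PREFERRED;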
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20181213230352.6937.74943.stgit@tlendack-t1.amdoffice.net
In commit:
a7295fd53c ("x86/mm/cpa: Use flush_tlb_kernel_range()")
I misread the CPA array code and incorrectly used
flush_tlb_kernel_range(), resulting in missing TLB flushes and
consequent failures.
Instead do a full invalidate in this case -- for now.
Reported-by: StDenis, Tom <Tom.StDenis@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Fixes: a7295fd53c ("x86/mm/cpa: Use flush_tlb_kernel_range()")
Link: http://lkml.kernel.org/r/20181203171043.089868285@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Blacklist symbols in Xen probe-prohibited areas, so that users can see
these prohibited symbols in debugfs.
See also: a50480cb6d.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Commit
379d98ddf4 ("x86: vdso: Use $LD instead of $CC to link")
accidentally broke unwinding from userspace, because ld would strip the
.eh_frame sections when linking.
Originally, the compiler would implicitly add --eh-frame-hdr when
invoking the linker, but when this Makefile was converted from invoking
ld via the compiler to invoking it directly (like vmlinux does),
the flag was missed. (The EH_FRAME section is important for the VDSO
shared libraries, but not for vmlinux.)
Fix the problem by explicitly specifying --eh-frame-hdr, which restores
parity with the old method.
See relevant bug reports for additional info:
https://bugzilla.kernel.org/show_bug.cgi?id=201741
https://bugzilla.redhat.com/show_bug.cgi?id=1659295
Fixes: 379d98ddf4 ("x86: vdso: Use $LD instead of $CC to link")
Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: Carlos O'Donell <carlos@redhat.com>
Reported-by: "H. J. Lu" <hjl.tools@gmail.com>
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: kernel-team@android.com
Cc: Laura Abbott <labbott@redhat.com>
Cc: stable <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrachan@google.com
For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, Qemu should be able to boot directly into the
uncompressed Linux kernel binary without the need to run firmware.
There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:
https://xenbits.xen.org/docs/unstable/misc/pvh.html
This patch enables Qemu to use that same entry point for booting KVM
guests.
Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
We need to refactor PVH entry code so that support for other hypervisors
like Qemu/KVM can be added more easily.
The original design for PVH entry in Xen guests relies on being able to
obtain the memory map from the hypervisor using a hypercall. When we
extend the PVH entry ABI to support other hypervisors like Qemu/KVM,
a new mechanism will be added that allows the guest to get the memory
map without needing to use hypercalls.
For Xen guests, the hypercall approach will still be supported. In
preparation for adding support for other hypervisors, we can move the
code that uses hypercalls into the Xen specific file. This will allow us
to compile kernels in the future without CONFIG_XEN that are still capable
of being booted as a Qemu/KVM guest via the PVH entry point.
Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
We need to refactor PVH entry code so that support for other hypervisors
like Qemu/KVM can be added more easily.
This patch moves the small block of code used for initializing Xen PVH
virtual machines into the Xen specific file. This initialization is not
going to be needed for Qemu/KVM guests. Moving it out of the common file
is going to allow us to compile kernels in the future without CONFIG_XEN
that are still capable of being booted as a Qemu/KVM guest via the PVH
entry point.
Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
We need to refactor PVH entry code so that support for other hypervisors
like Qemu/KVM can be added more easily.
The first step in that direction is to create a new file that will
eventually hold the Xen specific routines.
Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Once hypervisors other than Xen start using the PVH entry point for
starting VMs, we would like the option of being able to compile PVH entry
capable kernels without enabling CONFIG_XEN and all the code that comes
along with that. To allow that, we are moving the PVH code out of Xen and
into files sitting at a higher level in the tree.
This patch is not introducing any code or functional changes, just moving
files from one location to another.
Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
In order to pave the way for hypervisors other than Xen to use the PVH
entry point for VMs, we need to factor the PVH entry code into Xen specific
and hypervisor agnostic components. The first step in doing that, is to
create a new config option for PVH entry that can be enabled
independently from CONFIG_XEN.
Signed-off-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
A decoy address is used by set_mce_nospec() to update the cache attributes
for a page that may contain poison (multi-bit ECC error) while attempting
to minimize the possibility of triggering a speculative access to that
page.
When reserve_memtype() is handling a decoy address it needs to convert it
to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK,
is broken for a 32-bit physical mask and reserve_memtype() is passed the
last physical page. Gert reports triggering the:
BUG_ON(start >= end);
...assertion when running a 32-bit non-PAE build on a platform that has
a driver resource at the top of physical memory:
BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
Given that the decoy address scheme is only targeted at 64-bit builds and
assumes that the top of physical address space is free for use as a decoy
address range, simply bypass address sanitization in the 32-bit case.
Lastly, there was no need to crash the system when this failure occurred,
and no need to crash future systems if the assumptions of decoy addresses
are ever violated. Change the BUG_ON() to a WARN() with an error return.
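A sketch of that direction (sanitize_phys() as an illustrative helper
name):

  static u64 sanitize_phys(u64 address)
  {
          /* decoy addresses only exist on 64-bit; pass 32-bit through */
          if (IS_ENABLED(CONFIG_X86_64))
                  return address & __PHYSICAL_MASK;
          return address;
  }

  /* and the hard assertion becomes a recoverable warning */
  if (WARN(start >= end, "invalid memtype range"))
          return -EINVAL;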
Fixes: 510ee090ab ("x86/mm/pat: Prepare {reserve, free}_memtype() for...")
Reported-by: Gert Robben <t2@gert.gr>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Gert Robben <t2@gert.gr>
Cc: stable@vger.kernel.org
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: platform-driver-x86@vger.kernel.org
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/154454337985.789277.12133288391664677775.stgit@dwillia2-desk3.amr.corp.intel.com
The user triggers the creation of a pseudo-locked region when writing
the requested schemata to the schemata resctrl file. The pseudo-locking
of a region is required to be done on a CPU that is associated with the
cache on which the pseudo-locked region will reside. In order to run the
locking code on a specific CPU, the needed CPU has to be selected and
ensured to remain online during the entire locking sequence.
At this time, the cpu_hotplug_lock is not taken during the pseudo-lock
region creation and it is thus possible for a CPU to be selected to run
the pseudo-locking code and then that CPU to go offline before the
thread is able to run on it.
Fix this by taking the cpu_hotplug_lock for the entire time that the CPU
chosen to run the pseudo-locking code needs to stay online. Since the
cpu_hotplug_lock is always taken before rdtgroup_mutex, the lock order is
maintained.
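The resulting locking pattern, sketched (CPU selection and the locking
thread itself are elided):

  cpus_read_lock();               /* keep the chosen CPU online */
  mutex_lock(&rdtgroup_mutex);    /* lock order: hotplug lock first */
  /* ... pick a CPU associated with the cache and run the
   * pseudo-locking thread on it ... */
  mutex_unlock(&rdtgroup_mutex);
  cpus_read_unlock();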
Fixes: e0bdfe8e36 ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: stable <stable@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com
Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever
the system is deemed affected by L1TF vulnerability. Even though the limit
is quite high for most deployments it seems to be too restrictive for
deployments which are willing to live with the mitigation disabled.
We have a customer deploying 8x 6.4TB PCIe/NVMe SSD swap devices, which
is clearly over the limit.
Drop the swap restriction when l1tf=off is specified. It also doesn't make
much sense to warn about too much memory for the l1tf mitigation when it is
forcefully disabled by the administrator.
[ tglx: Folded the documentation delta change ]
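A sketch of the resulting check in max_swapfile_size() (simplified):

  unsigned long max_swapfile_size(void)
  {
          unsigned long pages = generic_max_swapfile_size();

          /* only clamp when the L1TF mitigation is not explicitly off */
          if (boot_cpu_has_bug(X86_BUG_L1TF) &&
              l1tf_mitigation != L1TF_MITIGATION_OFF)
                  pages = min_t(unsigned long, l1tf_pfn_limit(), pages);
          return pages;
  }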
Fixes: 377eeaa8e1 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: <linux-mm@kvack.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.org
There is a guard hole at the beginning of the kernel address space, also
used by hypervisors. It occupies 16 PGD entries.
This reserved range is not defined explicitly; it is calculated relative
to other entities: the direct mapping and user space ranges.
The calculation got broken by recent changes of the kernel memory layout:
LDT remap range is now mapped before direct mapping and makes the
calculation invalid.
The breakage leads to crash on Xen dom0 boot[1].
Define the reserved range explicitly. It's part of the kernel ABI
(hypervisors expect it to be stable) and must not depend on changes in the
rest of the kernel memory layout.
[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html
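A sketch of making the range explicit (16 PGD entries at the start of the
kernel address space; the exact values are illustrative):

  #define GUARD_HOLE_PGD_ENTRY    -256UL
  #define GUARD_HOLE_SIZE         (16UL << PGDIR_SHIFT)
  #define GUARD_HOLE_BASE_ADDR    (GUARD_HOLE_PGD_ENTRY << PGDIR_SHIFT)
  #define GUARD_HOLE_END_ADDR     (GUARD_HOLE_BASE_ADDR + GUARD_HOLE_SIZE)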
Fixes: d52888aa27 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: bhe@redhat.com
Cc: linux-mm@kvack.org
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.com
Pull x86 fixes from Ingo Molnar:
"Three fixes: a boot parameter re-(re-)fix, a retpoline build artifact
fix and an LLVM workaround"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Drop implicit common-page-size linker flag
x86/build: Fix compiler support check for CONFIG_RETPOLINE
x86/boot: Clear RSDP address in boot_params for broken loaders
Pull kprobes fixes from Ingo Molnar:
"Two kprobes fixes: a blacklist fix and an instruction patching related
corruption fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Blacklist non-attachable interrupt functions
kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction
Pull EFI fixes from Ingo Molnar:
"Two fixes: a large-system fix and an earlyprintk fix with certain
resolutions"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/earlyprintk/efi: Fix infinite loop on some screen widths
x86/efi: Allocate e820 buffer before calling efi_exit_boot_service
GNU linker's -z common-page-size's default value is based on the target
architecture. arch/x86/entry/vdso/Makefile sets it to the architecture
default, which is implicit and redundant. Drop it.
Fixes: 2aae950b21 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Reported-by: Dmitry Golovin <dima@golovin.in>
Reported-by: Bill Wendling <morbo@google.com>
Suggested-by: Dmitry Golovin <dima@golovin.in>
Suggested-by: Rui Ueyama <ruiu@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Fangrui Song <maskray@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com
Link: https://bugs.llvm.org/show_bug.cgi?id=38774
Link: https://github.com/ClangBuiltLinux/linux/issues/31
PREEMPT_NEED_RESCHED is never used directly, so move it into the arch
code where it can potentially be implemented using either a different
bit in the preempt count or as an entirely separate entity.
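Conceptually, the generic code only consumes a handful of helpers, so an
arch can redefine them in its own <asm/preempt.h>; a hedged sketch of one
such override:

  /* the need-resched state may live in a spare preempt-count bit or in
   * an entirely separate per-thread entity - the arch decides */
  static __always_inline bool should_resched(int preempt_offset)
  {
          return unlikely(preempt_count() == preempt_offset &&
                          test_thread_flag(TIF_NEED_RESCHED));
  }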
Cc: Robert Love <rml@tech9.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
These interrupt functions are already non-attachable by kprobes.
Blacklist them explicitly so that they can show up in
/sys/kernel/debug/kprobes/blacklist and tools like BCC can use this
additional information.
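The mechanism, sketched (NOKPROBE_SYMBOL() records a symbol in the kprobe
blacklist, which debugfs then exports; the symbol is illustrative):

  /* mark an interrupt entry function as non-probeable so it shows up
   * in /sys/kernel/debug/kprobes/blacklist */
  NOKPROBE_SYMBOL(smp_apic_timer_interrupt);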
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/20181206095648.GA8249@Dell
Signed-off-by: Ingo Molnar <mingo@kernel.org>
STIBP stands for Single Thread Indirect Branch Predictors. The acronym,
however, can be easily mis-spelled as STIPB. It is perhaps due to the
presence of another related term - IBPB (Indirect Branch Predictor
Barrier).
Fix the mis-spelling in the code.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1544039368-9009-1-git-send-email-longman@redhat.com
It is troublesome to add a diagnostic like this to the Makefile
parse stage because the top-level Makefile could be parsed with
a stale include/config/auto.conf.
Once you are hit by the error about a non-retpoline compiler, the
compilation still breaks even after disabling CONFIG_RETPOLINE.
The easiest fix is to move this check to "archprepare", like this commit
did:
829fe4aa9a ("x86: Allow generating user-space headers without a compiler")
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Fixes: 4cd24de3a0 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Link: http://lkml.kernel.org/r/1543991239-18476-1-git-send-email-yamada.masahiro@socionext.com
Link: https://lkml.org/lkml/2018/12/4/206
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After copy_optimized_instructions() copies several instructions
to the working buffer it tries to fix up the real RIP address, but it
adjusts the RIP-relative instruction with an incorrect RIP address
for the 2nd and subsequent instructions due to a bug in the logic.
This will break the kernel pretty badly (with likely outcomes such as
a kernel freeze, a crash, or worse) because probed instructions can refer
to the wrong data.
For example putting kprobes on cpumask_next() typically hits this bug.
cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y
(in this case nr_cpumask_bits is an alias of nr_cpu_ids):
<cpumask_next>:
48 89 f0 mov %rsi,%rax
8b 35 7b fb e2 00 mov 0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids>
55 push %rbp
...
If we put a kprobe on it and it gets jump-optimized, it gets
patched by the kprobes code like this:
<cpumask_next>:
e9 95 7d 07 1e jmpq 0xffffffffa000207a
7b fb jnp 0xffffffff81f8a2e2 <cpumask_next+2>
e2 00 loop 0xffffffff81f8a2e9 <cpumask_next+9>
55 push %rbp
This shows that the first two MOV instructions were copied to a
trampoline buffer at 0xffffffffa000207a.
Here is the disassembled result of the trampoline, skipping
the optprobe template instructions:
# Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea:
54 push %rsp
...
48 83 c4 08 add $0x8,%rsp
9d popfq
48 89 f0 mov %rsi,%rax
8b 35 82 7d db e2 mov -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3>
This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of
the original *nr_cpu_ids. This leads to a kernel freeze because
cpumask_next() always returns 0 and for_each_cpu() never ends.
Fix this by adding 'len' correctly to the real RIP address while
copying.
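The essence of the fix, sketched against the copy loop described above
(variable names per the changelog; context simplified):

  while (len < RELATIVEJUMP_SIZE) {
          /* the RIP base of each copied instruction must advance by
           * 'len' as well -- previously 'real' was passed unadjusted */
          ret = __copy_instruction(dest + len, src + len, real + len, &insn);
          if (!ret || !can_boost(&insn, src + len))
                  return -EINVAL;
          len += ret;
  }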
[ mingo: Improved the changelog. ]
Reported-by: Michael Rodin <michael@rodin.online>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # v4.15+
Fixes: 63fef14fc9 ("kprobes/x86: Make insn buffer always ROX and use text_poke()")
Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Gunnar Krueger reported a systemd-boot failure and bisected it down to:
e6e094e053 ("x86/acpi, x86/boot: Take RSDP address from boot params if available")
In case a broken boot loader doesn't clear its 'struct boot_params', clear
rsdp_addr in sanitize_boot_params().
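A sketch of the change in sanitize_boot_params() (simplified; the scrub
only runs when the sentinel indicates a non-conforming loader):

  static void sanitize_boot_params(struct boot_params *boot_params)
  {
          if (boot_params->sentinel) {
                  /* loader didn't clear the struct: scrub what we rely on */
                  boot_params->acpi_rsdp_addr = 0;
                  /* ... existing memset()s of other fields ... */
          }
  }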
Reported-by: Gunnar Krueger <taijian@posteo.de>
Tested-by: Gunnar Krueger <taijian@posteo.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: sstabellini@kernel.org
Fixes: e6e094e053 ("x86/acpi, x86/boot: Take RSDP address from boot params if available")
Link: http://lkml.kernel.org/r/20181203103811.17056-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A revert of a previous commit as it is no longer necessary and has
shown to cause problems in some memory hotplug cases.
- Some small fixes and a minor cleanup.
- A patch for adding better diagnostic data in a very rare failure
case.
* tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
pvcalls-front: fixes incorrect error handling
Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"
xen: xlate_mmu: add missing header to fix 'W=1' warning
xen/x86: add diagnostic printout to xen_mc_flush() in case of error
x86/xen: cleanup includes in arch/x86/xen/spinlock.c
Pull STIBP fallout fixes from Thomas Gleixner:
"The performance destruction department finally got it's act together
and came up with a cure for the STIPB regression:
- Provide a command line option to control the spectre v2 user space
mitigations. Default is either seccomp or prctl (if seccomp is
disabled in Kconfig). prctl allows mitigation opt-in, seccomp
enables the migitation for sandboxed processes.
- Rework the code to handle the conditional STIBP/IBPB control and
remove the now unused ptrace_may_access_sched() optimization
attempt
- Disable STIBP automatically when SMT is disabled
- Optimize the switch_to() logic to avoid MSR writes and invocations
of __switch_to_xtra().
- Make the asynchronous speculation TIF updates synchronous to
prevent stale mitigation state.
As a general cleanup this also makes retpoline directly depend on
compiler support and removes the 'minimal retpoline' option which just
pretended to provide some form of security while providing none"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/speculation: Provide IBPB always command line options
x86/speculation: Add seccomp Spectre v2 user space protection mode
x86/speculation: Enable prctl mode for spectre_v2_user
x86/speculation: Add prctl() control for indirect branch speculation
x86/speculation: Prepare arch_smt_update() for PRCTL mode
x86/speculation: Prevent stale SPEC_CTRL msr content
x86/speculation: Split out TIF update
ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
x86/speculation: Prepare for conditional IBPB in switch_mm()
x86/speculation: Avoid __switch_to_xtra() calls
x86/process: Consolidate and simplify switch_to_xtra() code
x86/speculation: Prepare for per task indirect branch speculation control
x86/speculation: Add command line control for indirect branch speculation
x86/speculation: Unify conditional spectre v2 print functions
x86/speculataion: Mark command line parser data __initdata
x86/speculation: Mark string arrays const correctly
x86/speculation: Reorder the spec_v2 code
x86/l1tf: Show actual SMT state
x86/speculation: Rework SMT state change
sched/smt: Expose sched_smt_present static key
...
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- MCE related boot crash fix on certain AMD systems
- FPU exception handling fix
- FPU handling race fix
- revert+rewrite of the RSDP boot protocol extension, use boot_params
instead
- documentation fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE/AMD: Fix the thresholding machinery initialization order
x86/fpu: Use the correct exception table macro in the XSTATE_OP wrapper
x86/fpu: Disable bottom halves while loading FPU registers
x86/acpi, x86/boot: Take RSDP address from boot params if available
x86/boot: Mostly revert commit ae7e1238e6 ("Add ACPI RSDP address to setup_header")
x86/ptrace: Fix documentation for tracehook_report_syscall_entry()