Pull timer updates from Thomas Gleixner:
"The timer and timekeeping departement delivers:
Core:
- The consolidation of the VDSO code into a generic library including
the conversion of x86 and ARM64. Conversion of ARM and MIPS are en
route through the relevant maintainer trees and should end up in
5.4.
This gets rid of the unnecessary different copies of the same code
and brings all architectures on the same level of VDSO
functionality.
- Make the NTP user space interface more robust by restricting the
TAI offset to prevent undefined behaviour. Includes a selftest.
- Validate user input in the compat settimeofday() syscall to catch
invalid values which would be turned into valid values by a
multiplication overflow
- Consolidate the time accessors
- Small fixes, improvements and cleanups all over the place
Drivers:
- Support for the NXP system counter, TI davinci timer
- Move the Microsoft HyperV clocksource/events code into the
drivers/clocksource directory so it can be shared between x86 and
ARM64.
- Overhaul of the Tegra driver
- Delay timer support for IXP4xx
- Small fixes, improvements and cleanups as usual"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
time: Validate user input in compat_settimeofday()
timer: Document TIMER_PINNED
clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic
clocksource/drivers: Make Hyper-V clocksource ISA agnostic
MAINTAINERS: Fix Andy's surname and the directory entries of VDSO
hrtimer: Use a bullet for the returns bullet list
arm64: vdso: Fix compilation with clang older than 8
arm64: compat: Fix __arch_get_hw_counter() implementation
arm64: Fix __arch_get_hw_counter() implementation
lib/vdso: Make delta calculation work correctly
MAINTAINERS: Add entry for the generic VDSO library
arm64: compat: No need for pre-ARMv7 barriers on an ARMv8 system
arm64: vdso: Remove unnecessary asm-offsets.c definitions
vdso: Remove superfluous #ifdef __KERNEL__ in vdso/datapage.h
clocksource/drivers/davinci: Add support for clocksource
clocksource/drivers/davinci: Add support for clockevents
clocksource/drivers/tegra: Set up maximum-ticks limit properly
clocksource/drivers/tegra: Cycles can't be 0
clocksource/drivers/tegra: Restore base address before cleanup
clocksource/drivers/tegra: Add verbose definition for 1MHz constant
...
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG
and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags'
introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl0eHqcACgkQa9axLQDI
XvFyNA/+L+bnkz8m3ncydlqqfXomQn4eJJVQ8Uksb0knJz+1+3CUxxbO4ry4jXZN
fMkbggYrDPRKpDbsUl0lsRipj7jW9bqan+N37c3SWqCkgb6HqDaHViwxdx6Ec/Uk
gHudozDSPh/8c7hxGcSyt/CFyuW6b+8eYIQU5rtIgz8aVY2BypBvS/7YtYCbIkx0
w4CFleRTK1zXD5mJQhrc6jyDx659sVkrAvdhf6YIymOY8nBTv40vwdNo3beJMYp8
Po/+0Ixu+VkHUNtmYYZQgP/AGH96xiTcRnUqd172JdtRPpCLqnLqwFokXeVIlUKT
KZFMDPzK+756Ayn4z4huEePPAOGlHbJje8JVNnFyreKhVVcCotW7YPY/oJR10bnc
eo7yD+DxABTn+93G2yP436bNVa8qO1UqjOBfInWBtnNFJfANIkZweij/MQ6MjaTA
o7KtviHnZFClefMPoiI7HDzwL8XSmsBDbeQ04s2Wxku1Y2xUHLx4iLmadwLQ1ZPb
lZMTZP3N/T1554MoURVA1afCjAwiqU3bt1xDUGjbBVjLfSPBAn/25IacsG9Li9AF
7Rp1M9VhrfLftjFFkB2HwpbhRASOxaOSx+EI3kzEfCtM2O9I1WHgP3rvCdc3l0HU
tbK0/IggQicNgz7GSZ8xDlWPwwSadXYGLys+xlMZEYd3pDIOiFc=
=0TDT
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new
XAFLAG and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the
'arm_boot_flags' introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
perf: arm_spe: Enable ACPI/Platform automatic module loading
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
x86/entry: Simplify _TIF_SYSCALL_EMU handling
arm64: rename dump_instr as dump_kernel_instr
arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
arm64: Implement panic_smp_self_stop()
arm64: Improve parking of stopped CPUs
arm64: Expose FRINT capabilities to userspace
arm64: Expose ARMv8.5 CondM capability to userspace
arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
arm64: ARM64_MODULES_PLTS must depend on MODULES
arm64: bpf: do not allocate executable memory
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
arm64: module: create module allocations without exec permissions
arm64: Allow user selection of ARM64_MODULE_PLTS
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
arm64: Allow selecting Pseudo-NMI again
...
All fpu__xstate_clear_all_cpu_caps() does is to invoke one simple
function since commit
73e3a7d2a7 ("x86/fpu: Remove the explicit clearing of XSAVE dependent features")
so invoke that function directly and remove the wrapper.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190704060743.rvew4yrjd6n33uzx@linutronix.de
The command line option `no387' is designed to disable the FPU
entirely. This only 'works' with CONFIG_MATH_EMULATION enabled.
But on 64bit this cannot work because user space expects SSE to work which
required basic FPU support. MATH_EMULATION does not help because SSE is not
emulated.
The command line option `nofxsr' should also be limited to 32bit because
FXSR is part of the required flags on 64bit so turning it off is not
possible.
Clearing X86_FEATURE_FPU without emulation enabled will not work anyway and
hang in fpu__init_system_early_generic() before the console is enabled.
Setting additioal dependencies, ensures that the CPU still boots on a
modern CPU. Otherwise, dropping FPU will leave FXSR enabled causing the
kernel to crash early in fpu__init_system_mxcsr().
With XSAVE support it will crash in fpu__init_cpu_xstate(). The problem is
that xsetbv() with XMM set and SSE cleared is not allowed. That means
XSAVE has to be disabled. The XSAVE support is disabled in
fpu__init_system_xstate_size_legacy() but it is too late. It can be
removed, it has been added in commit
1f999ab5a1 ("x86, xsave: Disable xsave in i387 emulation mode")
to use `no387' on a CPU with XSAVE support.
All this happens before console output.
After hat, the next possible crash is in RAID6 detect code because MMX
remained enabled. With a 3DNOW enabled config it will explode in memcpy()
for instance due to kernel_fpu_begin() but this is unconditionally enabled.
This is enough to boot a Debian Wheezy on a 32bit qemu "host" CPU which
supports everything up to XSAVES, AVX2 without 3DNOW. Later, Debian
increased the minimum requirements to i686 which means it does not boot
userland atleast due to CMOV.
After masking the additional features it still keeps SSE4A and 3DNOW*
enabled (if present on the host) but those are unused in the kernel.
Restrict `no387' and `nofxsr' otions to 32bit only. Add dependencies for
FPU, FXSR to additionaly mask CMOV, MMX, XSAVE if FXSR or FPU is cleared.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190703083247.57kjrmlxkai3vpw3@linutronix.de
- Fixes a deadlock from a previous fix to keep module loading
and function tracing text modifications from stepping on each other.
(this has a few patches to help document the issue in comments)
- Fix a crash when the snapshot buffer gets out of sync with the
main ring buffer.
- Fix a memory leak when reading the memory logs
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXRzBCBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnDaAP9qTFBOFtgIGCT5wVP8xjQeESxh1b8R
tbaT7/U2oPpeiwEAvp1mYo5UYcc8KauBqVaLSLJVN4pv07xiZF5Qgh9C1QE=
=m2IT
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This includes three fixes:
- Fix a deadlock from a previous fix to keep module loading and
function tracing text modifications from stepping on each other
(this has a few patches to help document the issue in comments)
- Fix a crash when the snapshot buffer gets out of sync with the main
ring buffer
- Fix a memory leak when reading the memory logs"
* tag 'trace-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare()
tracing/snapshot: Resize spare buffer if size changed
tracing: Fix memory leak in tracing_err_log_open()
ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()
ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()
The FSGSBASE series turned out to have serious bugs and there is still an
open issue which is not fully understood yet.
The confidence in those changes has become close to zero especially as the
test cases which have been shipped with that series were obviously never
run before sending the final series out to LKML.
./fsgsbase_64 >/dev/null
Segmentation fault
As the merge window is close, the only sane decision is to revert FSGSBASE
support. The revert is necessary as this branch has been merged into
perf/core already and rebasing all of that a few days before the merge
window is not the most brilliant idea.
I could definitely slap myself for not noticing the test case fail when
merging that series, but TBH my expectations weren't that low back
then. Won't happen again.
Revert the following commits:
539bca535d ("x86/entry/64: Fix and clean up paranoid_exit")
2c7b5ac5d5 ("Documentation/x86/64: Add documentation for GS/FS addressing mode")
f987c955c7 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2")
2032f1f96e ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
5bf0cab60e ("x86/entry/64: Document GSBASE handling in the paranoid path")
708078f657 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
79e1932fa3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
1d07316b13 ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
f60a83df45 ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace")
1ab5f3f7fe ("x86/process/64: Use FSBSBASE in switch_to() if available")
a86b462513 ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")
8b71340d70 ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions")
b64ed19b93 ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Hyper-V clock/timer code and data structures are currently mixed
in with other code in the ISA independent drivers/hv directory as
well as the ISA dependent Hyper-V code under arch/x86.
Consolidate this code and data structures into a Hyper-V clocksource driver
to better follow the Linux model. In doing so, separate out the ISA
dependent portions so the new clocksource driver works for x86 and for the
in-process Hyper-V on ARM64 code.
To start, move the existing clockevents code to create the new clocksource
driver. Update the VMbus driver to call initialization and cleanup routines
since the Hyper-V synthetic timers are not independently enumerated in
ACPI.
No behavior is changed and no new functionality is added.
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "bp@alien8.de" <bp@alien8.de>
Cc: "will.deacon@arm.com" <will.deacon@arm.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>
Cc: "mark.rutland@arm.com" <mark.rutland@arm.com>
Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
Cc: "olaf@aepfle.de" <olaf@aepfle.de>
Cc: "apw@canonical.com" <apw@canonical.com>
Cc: "jasowang@redhat.com" <jasowang@redhat.com>
Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
Cc: Sunil Muthuswamy <sunilmut@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: "sashal@kernel.org" <sashal@kernel.org>
Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>
Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org>
Cc: "arnd@arndb.de" <arnd@arndb.de>
Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>
Cc: "ralf@linux-mips.org" <ralf@linux-mips.org>
Cc: "paul.burton@mips.com" <paul.burton@mips.com>
Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
Cc: "salyzyn@android.com" <salyzyn@android.com>
Cc: "pcc@google.com" <pcc@google.com>
Cc: "shuah@kernel.org" <shuah@kernel.org>
Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com>
Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>
Cc: "huw@codeweavers.com" <huw@codeweavers.com>
Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>
Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
Link: https://lkml.kernel.org/r/1561955054-1838-2-git-send-email-mikelley@microsoft.com
Quite some time ago the interrupt entry stubs for unused vectors in the
system vector range got removed and directly mapped to the spurious
interrupt vector entry point.
Sounds reasonable, but it's subtly broken. The spurious interrupt vector
entry point pushes vector number 0xFF on the stack which makes the whole
logic in __smp_spurious_interrupt() pointless.
As a consequence any spurious interrupt which comes from a vector != 0xFF
is treated as a real spurious interrupt (vector 0xFF) and not
acknowledged. That subsequently stalls all interrupt vectors of equal and
lower priority, which brings the system to a grinding halt.
This can happen because even on 64-bit the system vector space is not
guaranteed to be fully populated. A full compile time handling of the
unused vectors is not possible because quite some of them are conditonally
populated at runtime.
Bring the entry stubs back, which wastes 160 bytes if all stubs are unused,
but gains the proper handling back. There is no point to selectively spare
some of the stubs which are known at compile time as the required code in
the IDT management would be way larger and convoluted.
Do not route the spurious entries through common_interrupt and do_IRQ() as
the original code did. Route it to smp_spurious_interrupt() which evaluates
the vector number and acts accordingly now that the real vector numbers are
handed in.
Fixup the pr_warn so the actual spurious vector (0xff) is clearly
distiguished from the other vectors and also note for the vectored case
whether it was pending in the ISR or not.
"Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen."
"Spurious interrupt vector 0xed on CPU#1. Acked."
"Spurious interrupt vector 0xee on CPU#1. Not pending!."
Fixes: 2414e021ac ("x86: Avoid building unused IRQ entry stubs")
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jan Beulich <jbeulich@suse.com>
Link: https://lkml.kernel.org/r/20190628111440.550568228@linutronix.de
Since the rework of the vector management, warnings about spurious
interrupts have been reported. Robert provided some more information and
did an initial analysis. The following situation leads to these warnings:
CPU 0 CPU 1 IO_APIC
interrupt is raised
sent to CPU1
Unable to handle
immediately
(interrupts off,
deep idle delay)
mask()
...
free()
shutdown()
synchronize_irq()
clear_vector()
do_IRQ()
-> vector is clear
Before the rework the vector entries of legacy interrupts were statically
assigned and occupied precious vector space while most of them were
unused. Due to that the above situation was handled silently because the
vector was handled and the core handler of the assigned interrupt
descriptor noticed that it is shut down and returned.
While this has been usually observed with legacy interrupts, this situation
is not limited to them. Any other interrupt source, e.g. MSI, can cause the
same issue.
After adding proper synchronization for level triggered interrupts, this
can only happen for edge triggered interrupts where the IO-APIC obviously
cannot provide information about interrupts in flight.
While the spurious warning is actually harmless in this case it worries
users and driver developers.
Handle it gracefully by marking the vector entry as VECTOR_SHUTDOWN instead
of VECTOR_UNUSED when the vector is freed up.
If that above late handling happens the spurious detector will not complain
and switch the entry to VECTOR_UNUSED. Any subsequent spurious interrupt on
that line will trigger the spurious warning as before.
Fixes: 464d12309e ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>-
Tested-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.459647741@linutronix.de
When an interrupt is shut down in free_irq() there might be an inflight
interrupt pending in the IO-APIC remote IRR which is not yet serviced. That
means the interrupt has been sent to the target CPUs local APIC, but the
target CPU is in a state which delays the servicing.
So free_irq() would proceed to free resources and to clear the vector
because synchronize_hardirq() does not see an interrupt handler in
progress.
That can trigger a spurious interrupt warning, which is harmless and just
confuses users, but it also can leave the remote IRR in a stale state
because once the handler is invoked the interrupt resources might be freed
already and therefore acknowledgement is not possible anymore.
Implement the irq_get_irqchip_state() callback for the IO-APIC irq chip. The
callback is invoked from free_irq() via __synchronize_hardirq(). Check the
remote IRR bit of the interrupt and return 'in flight' if it is set and the
interrupt is configured in level mode. For edge mode the remote IRR has no
meaning.
As this is only meaningful for level triggered interrupts this won't cure
the potential spurious interrupt warning for edge triggered interrupts, but
the edge trigger case does not result in stale hardware state. This has to
be addressed at the vector/interrupt entry level seperately.
Fixes: 464d12309e ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.370295517@linutronix.de
ftrace_arch_code_modify_prepare() is acquiring text_mutex, while the
corresponding release is happening in ftrace_arch_code_modify_post_process().
This has already been documented in the code, but let's also make the fact
that this is intentional clear to the semantic analysis tools such as sparse.
Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1906292321170.27227@cbobk.fhfr.pm
Fixes: 39611265ed ("ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()")
Fixes: d5b844a2cf ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When sending a call-function IPI-many to vCPUs, yield if any of
the IPI target vCPUs was preempted, we just select the first
preempted target vCPU which we found since the state of target
vCPUs can change underneath and to avoid race conditions.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull x86 fixes from Ingo Molnar:
"Misc fixes all over the place:
- might_sleep() atomicity fix in the microcode loader
- resctrl boundary condition fix
- APIC arithmethics bug fix for frequencies >= 4.2 GHz
- three 5-level paging crash fixes
- two speculation fixes
- a perf/stacktrace fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Fall back to using frame pointers for generated code
perf/x86: Always store regs->ip in perf_callchain_kernel()
x86/speculation: Allow guests to use SSBD even if host does not
x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
x86/boot/64: Add missing fixup_pointer() for next_early_pgt access
x86/boot/64: Fix crash if kernel image crosses page table boundary
x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz
x86/resctrl: Prevent possible overrun during bitmap operations
x86/microcode: Fix the microcode load on CPU hotplug for real
Pull perf fixes from Ingo Molnar:
"Various fixes, most of them related to bugs perf fuzzing found in the
x86 code"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/regs: Use PERF_REG_EXTENDED_MASK
perf/x86: Remove pmu->pebs_no_xmm_regs
perf/x86: Clean up PEBS_XMM_REGS
perf/x86/regs: Check reserved bits
perf/x86: Disable extended registers for non-supported PMUs
perf/ioctl: Add check for the sample_period value
perf/core: Fix perf_sample_regs_user() mm check
Recent Intel chipsets including Skylake and ApolloLake have a special
ITSSPRC register which allows the 8254 PIT to be gated. When gated, the
8254 registers can still be programmed as normal, but there are no IRQ0
timer interrupts.
Some products such as the Connex L1430 and exone go Rugged E11 use this
register to ship with the PIT gated by default. This causes Linux to fail
to boot:
Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
apic=debug and send a report.
The panic happens before the framebuffer is initialized, so to the user, it
appears as an early boot hang on a black screen.
Affected products typically have a BIOS option that can be used to enable
the 8254 and make Linux work (Chipset -> South Cluster Configuration ->
Miscellaneous Configuration -> 8254 Clock Gating), however it would be best
to make Linux support the no-8254 case.
Modern sytems allow to discover the TSC and local APIC timer frequencies,
so the calibration against the PIT is not required. These systems have
always running timers and the local APIC timer works also in deep power
states.
So the setup of the PIT including the IO-APIC timer interrupt delivery
checks are a pointless exercise.
Skip the PIT setup and the IO-APIC timer interrupt checks on these systems,
which avoids the panic caused by non ticking PITs and also speeds up the
boot process.
Thanks to Daniel for providing the changelog, initial analysis of the
problem and testing against a variety of machines.
Reported-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: linux@endlessm.com
Cc: rafael.j.wysocki@intel.com
Cc: hdegoede@redhat.com
Link: https://lkml.kernel.org/r/20190628072307.24678-1-drake@endlessm.com
Taking the text_mutex in ftrace_arch_code_modify_prepare() is to fix a
race against module loading and live kernel patching that might try to
change the text permissions while ftrace has it as read/write. This
really needs to be documented in the code. Add a comment that does such.
Link: http://lkml.kernel.org/r/20190627211819.5a591f52@gandalf.local.home
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The commit 9f255b632b ("module: Fix livepatch/ftrace module text
permissions race") causes a possible deadlock between register_kprobe()
and ftrace_run_update_code() when ftrace is using stop_machine().
The existing dependency chain (in reverse order) is:
-> #1 (text_mutex){+.+.}:
validate_chain.isra.21+0xb32/0xd70
__lock_acquire+0x4b8/0x928
lock_acquire+0x102/0x230
__mutex_lock+0x88/0x908
mutex_lock_nested+0x32/0x40
register_kprobe+0x254/0x658
init_kprobes+0x11a/0x168
do_one_initcall+0x70/0x318
kernel_init_freeable+0x456/0x508
kernel_init+0x22/0x150
ret_from_fork+0x30/0x34
kernel_thread_starter+0x0/0xc
-> #0 (cpu_hotplug_lock.rw_sem){++++}:
check_prev_add+0x90c/0xde0
validate_chain.isra.21+0xb32/0xd70
__lock_acquire+0x4b8/0x928
lock_acquire+0x102/0x230
cpus_read_lock+0x62/0xd0
stop_machine+0x2e/0x60
arch_ftrace_update_code+0x2e/0x40
ftrace_run_update_code+0x40/0xa0
ftrace_startup+0xb2/0x168
register_ftrace_function+0x64/0x88
klp_patch_object+0x1a2/0x290
klp_enable_patch+0x554/0x980
do_one_initcall+0x70/0x318
do_init_module+0x6e/0x250
load_module+0x1782/0x1990
__s390x_sys_finit_module+0xaa/0xf0
system_call+0xd8/0x2d0
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(text_mutex);
lock(cpu_hotplug_lock.rw_sem);
lock(text_mutex);
lock(cpu_hotplug_lock.rw_sem);
It is similar problem that has been solved by the commit 2d1e38f566
("kprobes: Cure hotplug lock ordering issues"). Many locks are involved.
To be on the safe side, text_mutex must become a low level lock taken
after cpu_hotplug_lock.rw_sem.
This can't be achieved easily with the current ftrace design.
For example, arm calls set_all_modules_text_rw() already in
ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c.
This functions is called:
+ outside stop_machine() from ftrace_run_update_code()
+ without stop_machine() from ftrace_module_enable()
Fortunately, the problematic fix is needed only on x86_64. It is
the only architecture that calls set_all_modules_text_rw()
in ftrace path and supports livepatching at the same time.
Therefore it is enough to move text_mutex handling from the generic
kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c:
ftrace_arch_code_modify_prepare()
ftrace_arch_code_modify_post_process()
This patch basically reverts the ftrace part of the problematic
commit 9f255b632b ("module: Fix livepatch/ftrace module
text permissions race"). And provides x86_64 specific-fix.
Some refactoring of the ftrace code will be needed when livepatching
is implemented for arm or nds32. These architectures call
set_all_modules_text_rw() and use stop_machine() at the same time.
Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com
Fixes: 9f255b632b ("module: Fix livepatch/ftrace module text permissions race")
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
[
As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of
ftrace_run_update_code() as it is a void function.
]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Programming MTRR registers in multi-processor systems is a rather lengthy
process. Furthermore, all processors must program these registers in lock
step and with interrupts disabled; the process also involves flushing
caches and TLBs twice. As a result, the process may take a considerable
amount of time.
On some platforms, this can lead to a large skew of the refined-jiffies
clock source. Early when booting, if no other clock is available (e.g.,
booting with hpet=disabled), the refined-jiffies clock source is used to
monitor the TSC clock source. If the skew of refined-jiffies is too large,
Linux wrongly assumes that the TSC is unstable:
clocksource: timekeeping watchdog on CPU1: Marking clocksource
'tsc-early' as unstable because the skew is too large:
clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
fffedb90 mask: ffffffff
clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
mask: ffffffffffffffff
tsc: Marking TSC unstable due to clocksource watchdog
As per measurements, around 98% of the time needed by the procedure to
program MTRRs in multi-processor systems is spent flushing caches with
wbinvd(). As per the Section 11.11.8 of the Intel 64 and IA 32
Architectures Software Developer's Manual, it is not necessary to flush
caches if the CPU supports cache self-snooping. Thus, skipping the cache
flushes can reduce by several tens of milliseconds the time needed to
complete the programming of the MTRR registers:
Platform Before After
104-core (208 Threads) Skylake 1437ms 28ms
2-core ( 4 Threads) Haswell 114ms 2ms
Reported-by: Mohammad Etemadi <mohammad.etemadi@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alan Cox <alan.cox@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/1561689337-19390-3-git-send-email-ricardo.neri-calderon@linux.intel.com
Processors which have self-snooping capability can handle conflicting
memory type across CPUs by snooping its own cache. However, there exists
CPU models in which having conflicting memory types still leads to
unpredictable behavior, machine check errors, or hangs.
Clear this feature on affected CPUs to prevent its use.
Suggested-by: Alan Cox <alan.cox@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jordan Borgner <mail@jordan-borgner.de>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Mohammad Etemadi <mohammad.etemadi@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/1561689337-19390-2-git-send-email-ricardo.neri-calderon@linux.intel.com
Restrict kdump to only reserve crashkernel below 64TB.
The reaons is that the kdump may jump from a 5-level paging mode to a
4-level paging mode kernel. If a 4-level paging mode kdump kernel is put
above 64TB, then the kdump kernel cannot start.
The 1st kernel reserves the kdump kernel region during bootup. At that
point it is not known whether the kdump kernel has 5-level or 4-level
paging support.
To support both restrict the kdump kernel reservation to the lower 64TB
address space to ensure that a 4-level paging mode kdump kernel can be
loaded and successfully started.
[ tglx: Massaged changelog ]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20190524073810.24298-4-bhe@redhat.com
If the running kernel has 5-level paging activated, the 5-level paging mode
is preserved across kexec. If the kexec'ed kernel does not contain support
for handling active 5-level paging mode in the decompressor, the
decompressor will crash with #GP.
Prevent this situation at load time. If 5-level paging is active, check the
xloadflags whether the kexec kernel can handle 5-level paging at least in
the decompressor. If not, reject the load attempt and print out an error
message.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dyoung@redhat.com
Link: https://lkml.kernel.org/r/20190524073810.24298-3-bhe@redhat.com
All preparations are done. Use the channel storage for the legacy
clockevent and remove the static variable.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132436.737689919@linutronix.de
Replace the static initialization of the legacy clockevent with runtime
initialization utilizing the common init function as the last preparatory
step to switch the legacy clockevent over to the channel 0 storage in
hpet_base.
This comes with a twist. The static clockevent initializer has selected
support for periodic and oneshot mode unconditionally whether the HPET
config advertised periodic mode or not. Even the pre clockevents code did
this. But....
Using the conditional in hpet_init_clockevent() makes at least Qemu and one
hardware machine fail to boot. There are two issues which cause the boot
failure:
#1 After the timer delivery test in IOAPIC and the IOAPIC setup the next
interrupt is not delivered despite the HPET channel being programmed
correctly. Reprogramming the HPET after switching to IOAPIC makes it
work again. After fixing this, the next issue surfaces:
#2 Due to the unconditional periodic mode 'availability' the Local APIC
timer calibration can hijack the global clockevents event handler
without causing damage. Using oneshot at this stage makes if hang
because the HPET does not get reprogrammed due to the handler
hijacking. Duh, stupid me!
Both issues require major surgery and especially the kick HPET again after
enabling IOAPIC results in really nasty hackery. This 'assume periodic
works' magic has survived since HPET support got added, so it's
questionable whether this should be fixed. Both Qemu and the failing
hardware machine support periodic mode despite the fact that both don't
advertise it in the configuration register and both need that extra kick
after switching to IOAPIC. Seems to be a feature...
Keep the 'assume periodic works' magic around and add a big fat comment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132436.646565913@linutronix.de
To finally remove the static channel0/clockevent storage and to utilize the
channel 0 storage in hpet_base, it's required to run time initialize the
clockevent. The MSI clockevents already have a run time init function.
Carve out the parts which can be shared between the legacy and the MSI
implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132436.552451082@linutronix.de
Now that the legacy clockevent is wrapped in a hpet_channel struct most
clockevent functions can be shared between the legacy and the MSI based
clockevents.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132436.461437795@linutronix.de
For HPET channel 0 there exist two clockevent structures right now:
- the static hpet_clockevent
- the clockevent in channel 0 storage
The goal is to use the clockevent in the channel storage, remove the static
variable and share code with the MSI implementation.
As a first step wrap the legacy clockevent into a hpet_channel struct and
convert the users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132436.368141247@linutronix.de
Now that HPET clockevent support is integrated into the channel data, reuse
the cached boot configuration instead of copying the same information into
a flags field.
This also allows to consolidate the reservation code into one place, which
can now solely depend on the mode information.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132436.277510163@linutronix.de
Instead of allocating yet another data structure, move the clock event data
into the channel structure. This allows further consolidation of the
reservation code and the reuse of the cached boot config to replace the
extra flags in the clockevent data.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132436.185851116@linutronix.de
struct hpet_dev is gone with the next change as the clockevent storage
moves into struct hpet_channel. So the variable name hdev will not make
sense anymore. Ditto for timer vs. channel and similar details.
Doing the rename in the change makes the patch harder to review. Doing it
afterward is problematic vs. tracking down issues. Doing it upfront is the
easiest solution as it does not change functionality.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132436.093113681@linutronix.de
If CONFIG_HPET=y is enabled the x86 specific HPET code should reserve at
least one channel for the /dev/hpet character device, so that not all
channels are absorbed for per CPU clockevent devices.
Create a function to assign HPET_MODE_DEVICE so the rework of the
clockevents allocation code can utilize the mode information instead of
reducing the number of evaluated channels by #ifdef hackery.
The function is not yet used, but provided as a separate patch for ease of
review. It will be used when the rework of the clockevent selection takes
place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132436.002758910@linutronix.de
The usage of the individual HPET channels is not tracked in a central
place. The information is scattered in different data structures. Also the
HPET reservation in the HPET character device is split out into several
places which makes the code hard to follow.
Assigning a mode to the channel allows to consolidate the reservation code
and paves the way for further simplifications.
As a first step set the mode of the legacy channels when the HPET is in
legacy mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132435.911652981@linutronix.de
Instead of rereading the HPET registers over and over use the information
which was cached in hpet_enable().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132435.821728550@linutronix.de
Introduce new data structures to replace the ad hoc collection of separate
variables and pointers.
Replace the boot configuration store and restore as a first step.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132435.728456320@linutronix.de
It's a function not a macro and the upcoming changes use channel for the
individual hpet timer units to allow a step by step refactoring approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132435.241032433@linutronix.de
There is no point to loop for 200k TSC cycles to check afterwards whether
the HPET counter is working. Read the counter inside of the loop and break
out when the counter value changed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132435.149535103@linutronix.de
The init code checks whether the HPET counter works late in the init
function when the clocksource is registered. That should happen right with
the other sanity checks.
Split it into a separate validation function and move it to the other
sanity checks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132435.058540608@linutronix.de
It doesn't make sense to have init functions in the middle of other
code. Aside of that, further changes in that area create horrible diffs if
the code stays where it is.
No functional change
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132434.951733064@linutronix.de
Having static and global variables sprinkled all over the code is just
annoying to read. Move them all to the top of the file.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132434.860549134@linutronix.de
The clockevent device pointer is not used in this function.
While at it, rename the misnamed 'timer' parameter to 'channel', which makes it
clear what this parameter means.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132434.447880978@linutronix.de
As a preparatory change for further consolidation, restructure the HPET
init code so it becomes more readable. Fix up misleading and stale comments
and rename variables so they actually make sense.
No intended functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132434.247842972@linutronix.de
The indirection via work scheduled on the upcoming CPU was necessary with the
old hotplug code because the online callback was invoked on the control CPU
not on the upcoming CPU. The rework of the CPU hotplug core guarantees that
the online callbacks are invoked on the upcoming CPU.
Remove the now pointless work redirection.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/20190623132434.047254075@linutronix.de
The ORC unwinder can't unwind through BPF JIT generated code because
there are no ORC entries associated with the code.
If an ORC entry isn't available, try to fall back to frame pointers. If
BPF and other generated code always do frame pointer setup (even with
CONFIG_FRAME_POINTERS=n) then this will allow ORC to unwind through most
generated code despite there being no corresponding ORC entries.
Fixes: d15d356887 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kairui Song <kasong@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/b6f69208ddff4343d56b7bfac1fc7cfcd62689e8.1561595111.git.jpoimboe@redhat.com
The index to access the threads tls array is controlled by userspace
via syscall: sys_ptrace(), hence leading to a potential exploitation
of the Spectre variant 1 vulnerability.
The index can be controlled from:
ptrace -> arch_ptrace -> do_get_thread_area.
Fix this by sanitizing the user supplied index before using it to access
the p->thread.tls_array.
Signed-off-by: Dianzhang Chen <dianzhangchen0@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1561524630-3642-1-git-send-email-dianzhangchen0@gmail.com
The index to access the threads ptrace_bps is controlled by userspace via
syscall: sys_ptrace(), hence leading to a potential exploitation of the
Spectre variant 1 vulnerability.
The index can be controlled from:
ptrace -> arch_ptrace -> ptrace_get_debugreg.
Fix this by sanitizing the user supplied index before using it access
thread->ptrace_bps.
Signed-off-by: Dianzhang Chen <dianzhangchen0@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1561476617-3759-1-git-send-email-dianzhangchen0@gmail.com
The bits set in x86_spec_ctrl_mask are used to calculate the guest's value
of SPEC_CTRL that is written to the MSR before VMENTRY, and control which
mitigations the guest can enable. In the case of SSBD, unless the host has
enabled SSBD always on mode (by passing "spec_store_bypass_disable=on" in
the kernel parameters), the SSBD bit is not set in the mask and the guest
can not properly enable the SSBD always on mitigation mode.
This has been confirmed by running the SSBD PoC on a guest using the SSBD
always on mitigation mode (booted with kernel parameter
"spec_store_bypass_disable=on"), and verifying that the guest is vulnerable
unless the host is also using SSBD always on mode. In addition, the guest
OS incorrectly reports the SSB vulnerability as mitigated.
Always set the SSBD bit in x86_spec_ctrl_mask when the host CPU supports
it, allowing the guest to use SSBD whether or not the host has chosen to
enable the mitigation in any of its modes.
Fixes: be6fcb5478 ("x86/bugs: Rework spec_ctrl base and mask logic")
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: bp@alien8.de
Cc: rkrcmar@redhat.com
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1560187210-11054-1-git-send-email-alejandro.j.jimenez@oracle.com
The following sparse warning is emitted:
arch/x86/kernel/crash.c:59:15:
warning: symbol 'crash_zero_bytes' was not declared. Should it be static?
The variable is only used in this compilation unit, but it is also only
used when CONFIG_KEXEC_FILE is enabled. Just making it static would result
in a 'defined but not used' warning for CONFIG_KEXEC_FILE=n.
Make it static and move it into the existing CONFIG_KEXEC_FILE section.
[ tglx: Massaged changelog and moved it into the existing ifdef ]
Fixes: dd5f726076 ("kexec: support for kexec on panic using new system call")
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: kexec@lists.infradead.org
Cc: vgoyal@redhat.com
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: https://lkml.kernel.org/r/117ef0c6.3d30.16b87c9cfbf.Coremail.kernelpatch@126.com
__startup_64() uses fixup_pointer() to access global variables in a
position-independent fashion. Access to next_early_pgt was wrapped into the
helper, but one instance in the 5-level paging branch was missed.
GCC generates a R_X86_64_PC32 PC-relative relocation for the access which
doesn't trigger the issue, but Clang emmits a R_X86_64_32S which leads to
an invalid memory access and system reboot.
Fixes: 187e91fe5e ("x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Link: https://lkml.kernel.org/r/20190620112422.29264-1-kirill.shutemov@linux.intel.com
A kernel which boots in 5-level paging mode crashes in a small percentage
of cases if KASLR is enabled.
This issue was tracked down to the case when the kernel image unpacks in a
way that it crosses an 1G boundary. The crash is caused by an overrun of
the PMD page table in __startup_64() and corruption of P4D page table
allocated next to it. This particular issue is not visible with 4-level
paging as P4D page tables are not used.
But the P4D and the PUD calculation have similar problems.
The PMD index calculation is wrong due to operator precedence, which fails
to confine the PMDs in the PMD array on wrap around.
The P4D calculation for 5-level paging and the PUD calculation calculate
the first index correctly, but then blindly increment it which causes the
same issue when a kernel image is located across a 512G and for 5-level
paging across a 46T boundary.
This wrap around mishandling was introduced when these parts moved from
assembly to C.
Restore it to the correct behaviour.
Fixes: c88d71508e ("x86/boot/64: Rewrite startup_64() in C")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190620112345.28833-1-kirill.shutemov@linux.intel.com
Given that the entry_*.S changes for this functionality are somewhat
tricky, make sure the paths are tested every boot, instead of on the
rare occasion when we trip an INT3 while rewriting text.
Requested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that x86_32 has an unconditional gap on the kernel stack frame,
the int3_emulate_push() thing will work without further changes.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently pt_regs on x86_32 has an oddity in that kernel regs
(!user_mode(regs)) are short two entries (esp/ss). This means that any
code trying to use them (typically: regs->sp) needs to jump through
some unfortunate hoops.
Change the entry code to fix this up and create a full pt_regs frame.
This then simplifies various trampolines in ftrace and kprobes, the
stack unwinder, ptrace, kdump and kgdb.
Much thanks to Josh for help with the cleanups!
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When CONFIG_FRAME_POINTER, we should mark pt_regs frames.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The kprobe trampolines have a FRAME_POINTER annotation that makes no
sense. It marks the frame in the middle of pt_regs, at the place of
saving BP.
Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER
from the respective entry_*.S.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All the files added to 'targets' are cleaned. Adding the same file to both
'targets' and 'clean-files' is redundant.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190625073311.18303-1-yamada.masahiro@socionext.com
Without 'set -e', shell scripts continue running even after any
error occurs. The missed 'set -e' is a typical bug in shell scripting.
For example, when a disk space shortage occurs while this script is
running, it actually ends up with generating a truncated capflags.c.
Yet, mkcapflags.sh continues running and exits with 0. So, the build
system assumes it has succeeded.
It will not be re-generated in the next invocation of Make since its
timestamp is newer than that of any of the source files.
Add 'set -e' so that any error in this script is caught and propagated
to the build system.
Since 9c2af1c737 ("kbuild: add .DELETE_ON_ERROR special target"),
make automatically deletes the target on any failure. So, the broken
capflags.c will be deleted automatically.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20190625072622.17679-1-yamada.masahiro@socionext.com
Fix sparse warning:
arch/x86/kernel/jump_label.c:106:5: warning:
symbol 'tp_vec_nr' was not declared. Should it be static?
It's only used in jump_label.c, so make it static.
Fixes: ba54f0c3f7 ("x86/jump_label: Batch jump label updates")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <bp@alien8.de>
Cc: <hpa@zytor.com>
Cc: <peterz@infradead.org>
Cc: <bristot@redhat.com>
Cc: <namit@vmware.com>
Link: https://lkml.kernel.org/r/20190625034548.26392-1-yuehaibing@huawei.com
The perf fuzzer triggers a warning which map to:
if (WARN_ON_ONCE(idx >= ARRAY_SIZE(pt_regs_offset)))
return 0;
The bits between XMM registers and generic registers are reserved.
But perf_reg_validate() doesn't check these bits.
Add PERF_REG_X86_RESERVED for reserved bits on X86.
Check the reserved bits in perf_reg_validate().
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 878068ea27 ("perf/x86: Support outputting XMM registers")
Link: https://lkml.kernel.org/r/1559081314-9714-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta
that processor can stay in C0.1 or C0.2. A zero value means no maximum
time.
Each instruction sets its own deadline in the instruction's implicit
input EDX:EAX value. The instruction wakes up if the time-stamp counter
reaches or exceeds the specified deadline, or the umwait maximum time
expires, or a store happens in the monitored address range in umwait.
The administrator can write an unsigned 32-bit number to
/sys/devices/system/cpu/umwait_control/max_time to change the default
value. Note that a value of zero means there is no limit. The lower two
bits of the value must be zero.
[ tglx: Simplify the write function. Massage changelog ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Andy Lutomirski" <luto@kernel.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1560994438-235698-5-git-send-email-fenghua.yu@intel.com
C0.2 state in umwait and tpause instructions can be enabled or disabled
on a processor through IA32_UMWAIT_CONTROL MSR register.
By default, C0.2 is enabled and the user wait instructions results in
lower power consumption with slower wakeup time.
But in real time systems which require faster wakeup time although power
savings could be smaller, the administrator needs to disable C0.2 and all
umwait invocations from user applications use C0.1.
Create a sysfs interface which allows the administrator to control C0.2
state during run time.
Andy Lutomirski suggested to turn off local irqs before writing the MSR to
ensure the cached control value is not changed by a concurrent sysfs write
from a different CPU via IPI.
[ tglx: Simplified the update logic in the write function and got rid of
all the convoluted type casts. Added a shared update function and
made the namespace consistent. Moved the sysfs create invocation.
Massaged changelog ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Andy Lutomirski" <luto@kernel.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1560994438-235698-4-git-send-email-fenghua.yu@intel.com
umwait or tpause allows the processor to enter a light-weight
power/performance optimized state (C0.1 state) or an improved
power/performance optimized state (C0.2 state) for a period specified by
the instruction or until the system time limit or until a store to the
monitored address range in umwait.
IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on
the processor and to set the maximum time the processor can reside in C0.1
or C0.2.
By default C0.2 is enabled so the user wait instructions can enter the
C0.2 state to save more power with slower wakeup time.
Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
default. A quote from Andy:
"What I want to avoid is the case where it works dramatically differently
on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
behave a bit differently if the max timeout is hit, and I'd like that
path to get exercised widely by making it happen even on default
configs."
A sysfs interface to adjust the time and the C0.2 enablement is provided in
a follow up change.
[ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
mask throughout the code.
Massaged comments and changelog ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
Using __clear_bit() and __cpumask_clear_cpu() is more efficient than using
their atomic counterparts.
Use them when atomicity is not needed, such as when manipulating bitmasks
that are on the stack.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190613064813.8102-10-namit@vmware.com
The x86 vDSO library requires some adaptations to take advantage of the
newly introduced generic vDSO library.
Introduce the following changes:
- Modification of vdso.c to be compliant with the common vdso datapage
- Use of lib/vdso for gettimeofday
[ tglx: Massaged changelog and cleaned up the function signature formatting ]
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Cc: Shijith Thotton <sthotton@marvell.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-23-vincenzo.frascino@arm.com
Since commit 7d5905dc14 ("x86 / CPU: Always show current CPU frequency
in /proc/cpuinfo") open and read of /proc/cpuinfo sends IPI to all CPUs.
Many applications read /proc/cpuinfo at the start for trivial reasons like
counting cores or detecting cpu features. While sensitive workloads like
DPDK network polling don't like any interrupts.
Integrates this feature with cpu isolation and do not send IPIs to CPUs
without housekeeping flag HK_FLAG_MISC (set by nohz_full).
Code that requests cpu frequency like show_cpuinfo() falls back to the last
frequency set by the cpufreq driver if this method returns 0.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/155790354043.1104.15333317408370209.stgit@buzz
The left shift of unsigned int cpu_khz will overflow for large values of
cpu_khz, so cast it to a long long before shifting it to avoid overvlow.
For example, this can happen when cpu_khz is 4194305, i.e. ~4.2 GHz.
Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 8c3ba8d049 ("x86, apic: ack all pending irqs when crashed/on kexec")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20190619181446.13635-1-colin.king@canonical.com
Several recent exploits have used direct calls to the native_write_cr4()
function to disable SMEP and SMAP before then continuing their exploits
using userspace memory access.
Direct calls of this form can be mitigate by pinning bits of CR4 so that
they cannot be changed through a common function. This is not intended to
be a general ROP protection (which would require CFI to defend against
properly), but rather a way to avoid trivial direct function calling (or
CFI bypasses via a matching function prototype) as seen in:
https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
(https://github.com/xairy/kernel-exploits/tree/master/CVE-2017-7308)
The goals of this change:
- Pin specific bits (SMEP, SMAP, and UMIP) when writing CR4.
- Avoid setting the bits too early (they must become pinned only after
CPU feature detection and selection has finished).
- Pinning mask needs to be read-only during normal runtime.
- Pinning needs to be checked after write to validate the cr4 state
Using __ro_after_init on the mask is done so it can't be first disabled
with a malicious write.
Since these bits are global state (once established by the boot CPU and
kernel boot parameters), they are safe to write to secondary CPUs before
those CPUs have finished feature detection. As such, the bits are set at
the first cr4 write, so that cr4 write bugs can be detected (instead of
silently papered over). This uses a few bytes less storage of a location we
don't have: read-only per-CPU data.
A check is performed after the register write because an attack could just
skip directly to the register write. Such a direct jump is possible because
of how this function may be built by the compiler (especially due to the
removal of frame pointers) where it doesn't add a stack frame (function
exit may only be a retq without pops) which is sufficient for trivial
exploitation like in the timer overwrites mentioned above).
The asm argument constraints gain the "+" modifier to convince the compiler
that it shouldn't make ordering assumptions about the arguments or memory,
and treat them as changed.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: https://lkml.kernel.org/r/20190618045503.39105-3-keescook@chromium.org
Same as Intel, Zhaoxin MP CPUs support C3 share cache and on all
recent Zhaoxin platforms ARB_DISABLE is a nop. So set related
flags correctly in the same way as Intel does.
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "hpa@zytor.com" <hpa@zytor.com>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net>
Cc: "lenb@kernel.org" <lenb@kernel.org>
Cc: David Wang <DavidWang@zhaoxin.com>
Cc: "Cooper Yan(BJ-RD)" <CooperYan@zhaoxin.com>
Cc: "Qiyuan Wang(BJ-RD)" <QiyuanWang@zhaoxin.com>
Cc: "Herry Yang(BJ-RD)" <HerryYang@zhaoxin.com>
Link: https://lkml.kernel.org/r/a370503660994669991a7f7cda7c5e98@zhaoxin.com
Add x86 architecture support for new Zhaoxin processors.
Carve out initialization code needed by Zhaoxin processors into
a separate compilation unit.
To identify Zhaoxin CPU, add a new vendor type X86_VENDOR_ZHAOXIN
for system recognition.
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "hpa@zytor.com" <hpa@zytor.com>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net>
Cc: "lenb@kernel.org" <lenb@kernel.org>
Cc: David Wang <DavidWang@zhaoxin.com>
Cc: "Cooper Yan(BJ-RD)" <CooperYan@zhaoxin.com>
Cc: "Qiyuan Wang(BJ-RD)" <QiyuanWang@zhaoxin.com>
Cc: "Herry Yang(BJ-RD)" <HerryYang@zhaoxin.com>
Link: https://lkml.kernel.org/r/01042674b2f741b2aed1f797359bdffb@zhaoxin.com
The kernel needs to explicitly enable FSGSBASE. So, the application needs
to know if it can safely use these instructions. Just looking at the CPUID
bit is not enough because it may be running in a kernel that does not
enable the instructions.
One way for the application would be to just try and catch the SIGILL.
But that is difficult to do in libraries which may not want to overwrite
the signal handlers of the main application.
Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF
aux vector. AT_HWCAP2 is already used by PPC for similar purposes.
The application can access it open coded or by using the getauxval()
function in newer versions of glibc.
[ tglx: Massaged changelog ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-18-git-send-email-chang.seok.bae@intel.com
Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
FSGSBASE by default, and add nofsgsbase to disable it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com
When FSGSBASE is enabled, copying threads and reading fsbase and gsbase
using ptrace must read the actual values.
When copying a thread, use save_fsgs() and copy the saved values. For
ptrace, the bases must be read from memory regardless of the selector if
FSGSBASE is enabled.
[ tglx: Invoke __rdgsbase_inactive() with interrupts disabled ]
[ luto: Massage changelog ]
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-9-git-send-email-chang.seok.bae@intel.com
With the new FSGSBASE instructions, FS and GSABSE can be efficiently read
and writen in __switch_to(). Use that capability to preserve the full
state.
This will enable user code to do whatever it wants with the new
instructions without any kernel-induced gotchas. (There can still be
architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE if
WRGSBASE was used, but users are expected to read the CPU manual before
doing things like that.)
This is a considerable speedup. It seems to save about 100 cycles
per context switch compared to the baseline 4.6-rc1 behavior on a
Skylake laptop.
[ chang: 5~10% performance improvements were seen with a context switch
benchmark that ran threads with different FS/GSBASE values (to the
baseline 4.16). Minor edit on the changelog. ]
[ tglx: Masaage changelog ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-8-git-send-email-chang.seok.bae@intel.com
Add cpu feature conditional FSGSBASE access to the relevant helper
functions. That allows to accelerate certain FS/GS base operations in
subsequent changes.
Note, that while possible, the user space entry/exit GSBASE operations are
not going to use the new FSGSBASE instructions. The reason is that it would
require additional storage for the user space value which adds more
complexity to the low level code and experiments have shown marginal
benefit. This may be revisited later but for now the SWAPGS based handling
in the entry code is preserved except for the paranoid entry/exit code.
To preserve the SWAPGS entry mechanism introduce __[rd|wr]gsbase_inactive()
helpers. Note, for Xen PV, paravirt hooks can be added later as they might
allow a very efficient but different implementation.
[ tglx: Massaged changelog ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-7-git-send-email-chang.seok.bae@intel.com
This is temporary. It will allow the next few patches to be tested
incrementally.
Setting unsafe_fsgsbase is a root hole. Don't do it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/1557309753-24073-4-git-send-email-chang.seok.bae@intel.com
When a ptracer writes a ptracee's FS/GSBASE with a different value, the
selector is also cleared. This behavior is not correct as the selector
should be preserved.
Update only the base value and leave the selector intact. To simplify the
code further remove the conditional checking for the same value as this
code is not performance critical.
The only recognizable downside of this change is when the selector is
already nonzero on write. The base will be reloaded according to the
selector. But the case is highly unexpected in real usages.
[ tglx: Massage changelog ]
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lkml.kernel.org/r/9040CFCD-74BD-4C17-9A01-B9B713CF6B10@intel.com
While the DOC at the beginning of lib/bitmap.c explicitly states that
"The number of valid bits in a given bitmap does _not_ need to be an
exact multiple of BITS_PER_LONG.", some of the bitmap operations do
indeed access BITS_PER_LONG portions of the provided bitmap no matter
the size of the provided bitmap.
For example, if find_first_bit() is provided with an 8 bit bitmap the
operation will access BITS_PER_LONG bits from the provided bitmap. While
the operation ensures that these extra bits do not affect the result,
the memory is still accessed.
The capacity bitmasks (CBMs) are typically stored in u32 since they
can never exceed 32 bits. A few instances exist where a bitmap_*
operation is performed on a CBM by simply pointing the bitmap operation
to the stored u32 value.
The consequence of this pattern is that some bitmap_* operations will
access out-of-bounds memory when interacting with the provided CBM.
This same issue has previously been addressed with commit 49e00eee00
("x86/intel_rdt: Fix out-of-bounds memory access in CBM tests")
but at that time not all instances of the issue were fixed.
Fix this by using an unsigned long to store the capacity bitmask data
that is passed to bitmap functions.
Fixes: e651901187 ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details")
Fixes: f4e80d67a5 ("x86/intel_rdt: Resctrl files reflect pseudo-locked information")
Fixes: 95f0b77efa ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/58c9b6081fd9bf599af0dfc01a6fdd335768efef.1560975645.git.reinette.chatre@intel.com
AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
format (BF16) for deep learning optimization.
BF16 is a short version of 32-bit single-precision floating-point
format (FP32) and has several advantages over 16-bit half-precision
floating-point format (FP16). BF16 keeps FP32 accumulation after
multiplication without loss of precision, offers more than enough
range for deep learning training tasks, and doesn't need to handle
hardware exception.
AVX512 BFLOAT16 instructions are enumerated in CPUID.7.1:EAX[bit 5]
AVX512_BF16.
CPUID.7.1:EAX contains only feature bits. Reuse the currently empty
word 12 as a pure features word to hold the feature bits including
AVX512_BF16.
Detailed information of the CPUID bit and AVX512 BFLOAT16 instructions
can be found in the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference.
[ bp: Check CPUID(7) subleaf validity before accessing subleaf 1. ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: Robert Hoo <robert.hu@linux.intel.com>
Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: x86 <x86@kernel.org>
Link: https://lkml.kernel.org/r/1560794416-217638-3-git-send-email-fenghua.yu@intel.com
It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
whole feature bits words. To better utilize feature words, re-define
word 11 to host scattered features and move the four X86_FEATURE_CQM_*
features into Linux defined word 11. More scattered features can be
added in word 11 in the future.
Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
Linux-defined leaf.
Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
name in the next patch when CPUID.7.1:EAX occupies world 12.
Maximum number of RMID and cache occupancy scale are retrieved from
CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
code into a separate function.
KVM doesn't support resctrl now. So it's safe to move the
X86_FEATURE_CQM_* features to scattered features word 11 for KVM.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Babu Moger <babu.moger@amd.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: x86 <x86@kernel.org>
Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.com
... into a separate function for better readability. Split out from a
patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
sole code movement separate for easy review.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
When SEV is active, the second kernel image is loaded into encrypted
memory. For that, make sure that when kexec builds the identity mapping
page table, the memory is encrypted (i.e., _PAGE_ENC is set).
[ bp: Sort local args and OR in _PAGE_ENC for more clarity. ]
Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: bhe@redhat.com
Cc: dyoung@redhat.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kexec@lists.infradead.org
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190430074421.7852-3-lijiang@redhat.com
When a virtual machine panics, its memory needs to be dumped for
analysis. With memory encryption in the picture, special care must be
taken when loading a kexec/kdump kernel in a SEV guest.
A SEV guest starts and runs fully encrypted. In order to load a kexec
kernel and initrd, arch_kexec_post_{alloc,free}_pages() need to not map
areas as decrypted unconditionally but differentiate whether the kernel
is running as a SEV guest and if so, leave kexec area encrypted.
[ bp: Reduce commit message to the relevant information pertaining to
this commit only. ]
Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: bhe@redhat.com
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: dyoung@redhat.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kexec@lists.infradead.org
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190430074421.7852-2-lijiang@redhat.com
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs, for example:
kexec -s -p xxx
the kernel does not pass the e820 reserved ranges to the second kernel,
which might cause two problems:
1. MMCONFIG: A device in PCI segment 1 cannot be discovered by the
kernel PCI probing without all the e820 I/O reservations being present
in the e820 table. Which is the case currently, because the kdump kernel
does not have those reservations because the kexec command does not pass
the I/O reservation via the "memmap=xxx" command line option.
Further details courtesy of Bjorn Helgaas¹: I think you should regard
correct MCFG/ECAM usage in the kdump kernel as a requirement. MMCONFIG
(aka ECAM) space is described in the ACPI MCFG table. If you don't have
ECAM:
(a) PCI devices won't work at all on non-x86 systems that use only
ECAM for config access,
(b) you won't be able to access devices on non-0 segments (granted,
there aren't very many of these yet, but there will be more in the
future), and
(c) you won't be able to access extended config space (addresses
0x100-0xfff), which means none of the Extended Capabilities will be
available (AER, ACS, ATS, etc).
2. The second issue is that the SME kdump kernel doesn't work without
the e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are still decrypted, but because those reserved ranges
are not present at all in kdump kernel's e820 table, they are accessed
as encrypted. Which is obviously wrong.
[1]: https://lkml.kernel.org/r/CABhMZUUscS3jUZUSM5Y6EYJK6weo7Mjj5-EAKGvbw0qEe%2B38zw@mail.gmail.com
[ bp: Heavily massage commit message. ]
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Bjorn Helgaas <bjorn.helgaas@gmail.com>
Cc: dave.hansen@linux.intel.com
Cc: Dave Young <dyoung@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kexec@lists.infradead.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Link: https://lkml.kernel.org/r/20190423013007.17838-4-lijiang@redhat.com
When executing the kexec_file_load() syscall, the first kernel needs to
pass the e820 reserved ranges to the second kernel because some devices
(PCI, for example) need them present in the kdump kernel for proper
initialization.
But the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources using the default IORES_DESC_NONE
descriptor, because there are several types of e820 ranges which are
marked IORES_DESC_NONE, see e820_type_to_iores_desc().
Therefore, add a new I/O resource descriptor called IORES_DESC_RESERVED
to mark exactly those ranges. It will be used to match the reserved
resource ranges when walking through iomem resources.
[ bp: Massage commit message. ]
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: bhe@redhat.com
Cc: dave.hansen@linux.intel.com
Cc: dyoung@redhat.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huang Zijiang <huang.zijiang@zte.com.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kexec@lists.infradead.org
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190423013007.17838-2-lijiang@redhat.com
In order for the kernel to be encrypted "in place" during boot, a workarea
outside of the kernel must be used. This SME workarea used during early
encryption of the kernel is situated on a 2MB boundary after the end of
the kernel text, data, etc. sections (_end).
This works well during initial boot of a compressed kernel because of
the relocation used for decompression of the kernel. But when performing
a kexec boot, there's a chance that the SME workarea may not be mapped
by the kexec pagetables or that some of the other data used by kexec
could exist in this range.
Create a section for SME in vmlinux.lds.S. Position it after "_end", which
is after "__end_of_kernel_reserve", so that the memory will be reclaimed
during boot and since this area is all zeroes, it compresses well. This
new section will be part of the kernel image, so kexec will account for it
in pagetable mappings and placement of data after the kernel.
Here's an example of a kernel size without and with the SME section:
without:
vmlinux: 36,501,616
bzImage: 6,497,344
100000000-47f37ffff : System RAM
1e4000000-1e47677d4 : Kernel code (0x7677d4)
1e47677d5-1e4e2e0bf : Kernel data (0x6c68ea)
1e5074000-1e5372fff : Kernel bss (0x2fefff)
with:
vmlinux: 44,419,408
bzImage: 6,503,136
880000000-c7ff7ffff : System RAM
8cf000000-8cf7677d4 : Kernel code (0x7677d4)
8cf7677d5-8cfe2e0bf : Kernel data (0x6c68ea)
8d0074000-8d0372fff : Kernel bss (0x2fefff)
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Lianbo Jiang <lijiang@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael Ávila de Espíndola" <rafael@espindo.la>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/3c483262eb4077b1654b2052bd14a8d011bffde3.1560969363.git.thomas.lendacky@amd.com
The memory occupied by the kernel is reserved using memblock_reserve()
in setup_arch(). Currently, the area is from symbols _text to __bss_stop.
Everything after __bss_stop must be specifically reserved otherwise it
is discarded. This is not clearly documented.
Add a new symbol, __end_of_kernel_reserve, that more readily identifies
what is reserved, along with comments that indicate what is reserved,
what is discarded and what needs to be done to prevent a section from
being discarded.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Lianbo Jiang <lijiang@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/7db7da45b435f8477f25e66f292631ff766a844c.1560969363.git.thomas.lendacky@amd.com
cpuinfo_x86.x86_model is an unsigned type, so comparing against zero
will generate a compilation warning:
arch/x86/kernel/cpu/cacheinfo.c: In function 'cacheinfo_amd_init_llc_id':
arch/x86/kernel/cpu/cacheinfo.c:662:19: warning: comparison is always true \
due to limited range of data type [-Wtype-limits]
Remove the unnecessary lower bound check.
[ bp: Massage. ]
Fixes: 68091ee7ac ("x86/CPU/AMD: Calculate last level cache ID from number of sharing threads")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1560954773-11967-1-git-send-email-cai@lca.pw
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
subject to gplv2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 1 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081204.018005938@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this file is licensed under the gpl v2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 3 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204654.634736654@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 2 normalized pattern(s):
this source code is licensed under the gnu general public license
version 2 see the file copying for more details
this source code is licensed under general public license version 2
see
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 52 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.449021192@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent change moved the microcode loader hotplug callback into the early
startup phase which is running with interrupts disabled. It missed that
the callbacks invoke sysfs functions which might sleep causing nice 'might
sleep' splats with proper debugging enabled.
Split the callbacks and only load the microcode in the early startup phase
and move the sysfs handling back into the later threaded and preemptible
bringup phase where it was before.
Fixes: 78f4e932f7 ("x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906182228350.1766@nanos.tec.linutronix.de
This function is only use by the core FPU code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190604071524.12835-4-hch@lst.de
Merge two helpers into the main function, remove a pointless local
variable and flatten a conditional.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190604071524.12835-3-hch@lst.de
Currently, the jump label of a static key is transformed via the arch
specific function:
void arch_jump_label_transform(struct jump_entry *entry,
enum jump_label_type type)
The new approach (batch mode) uses two arch functions, the first has the
same arguments of the arch_jump_label_transform(), and is the function:
bool arch_jump_label_transform_queue(struct jump_entry *entry,
enum jump_label_type type)
Rather than transforming the code, it adds the jump_entry in a queue of
entries to be updated. This functions returns true in the case of a
successful enqueue of an entry. If it returns false, the caller must to
apply the queue and then try to queue again, for instance, because the
queue is full.
This function expects the caller to sort the entries by the address before
enqueueuing then. This is already done by the arch independent code, though.
After queuing all jump_entries, the function:
void arch_jump_label_transform_apply(void)
Applies the changes in the queue.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Wood <swood@redhat.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/57b4caa654bad7e3b066301c9a9ae233dea065b5.1560325897.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, the patch of an address is done in three steps:
-- Pseudo-code #1 - Current implementation ---
1) add an int3 trap to the address that will be patched
sync cores (send IPI to all other CPUs)
2) update all but the first byte of the patched range
sync cores (send IPI to all other CPUs)
3) replace the first byte (int3) by the first byte of replacing opcode
sync cores (send IPI to all other CPUs)
-- Pseudo-code #1 ---
When a static key has more than one entry, these steps are called once for
each entry. The number of IPIs then is linear with regard to the number 'n' of
entries of a key: O(n*3), which is O(n).
This algorithm works fine for the update of a single key. But we think
it is possible to optimize the case in which a static key has more than
one entry. For instance, the sched_schedstats jump label has 56 entries
in my (updated) fedora kernel, resulting in 168 IPIs for each CPU in
which the thread that is enabling the key is _not_ running.
With this patch, rather than receiving a single patch to be processed, a vector
of patches is passed, enabling the rewrite of the pseudo-code #1 in this
way:
-- Pseudo-code #2 - This patch ---
1) for each patch in the vector:
add an int3 trap to the address that will be patched
sync cores (send IPI to all other CPUs)
2) for each patch in the vector:
update all but the first byte of the patched range
sync cores (send IPI to all other CPUs)
3) for each patch in the vector:
replace the first byte (int3) by the first byte of replacing opcode
sync cores (send IPI to all other CPUs)
-- Pseudo-code #2 - This patch ---
Doing the update in this way, the number of IPI becomes O(3) with regard
to the number of keys, which is O(1).
The batch mode is done with the function text_poke_bp_batch(), that receives
two arguments: a vector of "struct text_to_poke", and the number of entries
in the vector.
The vector must be sorted by the addr field of the text_to_poke structure,
enabling the binary search of a handler in the poke_int3_handler function
(a fast path).
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Wood <swood@redhat.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/ca506ed52584c80f64de23f6f55ca288e5d079de.1560325897.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the definition of the code to be written from
__jump_label_transform() to a specialized function. No functional
change.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Wood <swood@redhat.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/d2f52a0010ecd399cf9b02a65bcf5836571b9e52.1560325897.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove two little helpers and merge them into kernel_fpu_end() to
streamline the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190604071524.12835-2-hch@lst.de
Pull x86 fixes from Thomas Gleixner:
"The accumulated fixes from this and last week:
- Fix vmalloc TLB flush and map range calculations which lead to
stale TLBs, spurious faults and other hard to diagnose issues.
- Use fault_in_pages_writable() for prefaulting the user stack in the
FPU code as it's less fragile than the current solution
- Use the PF_KTHREAD flag when checking for a kernel thread instead
of current->mm as the latter can give the wrong answer due to
use_mm()
- Compute the vmemmap size correctly for KASLR and 5-Level paging.
Otherwise this can end up with a way too small vmemmap area.
- Make KASAN and 5-level paging work again by making sure that all
invalid bits are masked out when computing the P4D offset. This
worked before but got broken recently when the LDT remap area was
moved.
- Prevent a NULL pointer dereference in the resource control code
which can be triggered with certain mount options when the
requested resource is not available.
- Enforce ordering of microcode loading vs. perf initialization on
secondary CPUs. Otherwise perf tries to access a non-existing MSR
as the boot CPU marked it as available.
- Don't stop the resource control group walk early otherwise the
control bitmaps are not updated correctly and become inconsistent.
- Unbreak kgdb by returning 0 on success from
kgdb_arch_set_breakpoint() instead of an error code.
- Add more Icelake CPU model defines so depending changes can be
queued in other trees"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback
x86/kasan: Fix boot with 5-level paging and KASAN
x86/fpu: Don't use current->mm to check for a kthread
x86/kgdb: Return 0 from kgdb_arch_set_breakpoint()
x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled
x86/resctrl: Don't stop walking closids when a locksetup group is found
x86/fpu: Update kernel's FPU state before using for the fsave header
x86/mm/KASLR: Compute the size of the vmemmap section properly
x86/fpu: Use fault_in_pages_writeable() for pre-faulting
x86/CPU: Add more Icelake model numbers
mm/vmalloc: Avoid rare case of flushing TLB with weird arguments
mm/vmalloc: Fix calculation of direct map addr range
Adric Blake reported the following warning during suspend-resume:
Enabling non-boot CPUs ...
x86: Booting SMP configuration:
smpboot: Booting Node 0 Processor 1 APIC 0x2
unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
Call Trace:
intel_set_tfa
intel_pmu_cpu_starting
? x86_pmu_dead_cpu
x86_pmu_starting_cpu
cpuhp_invoke_callback
? _raw_spin_lock_irqsave
notify_cpu_starting
start_secondary
secondary_startup_64
microcode: sig=0x806ea, pf=0x80, revision=0x96
microcode: updated to revision 0xb4, date = 2019-04-01
CPU1 is up
The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
by microcode. The log above shows that the microcode loader callback
happens after the PMU restoration, leading to the conjecture that
because the microcode hasn't been updated yet, that MSR is not present
yet, leading to the #GP.
Add a microcode loader-specific hotplug vector which comes before
the PERF vectors and thus executes earlier and makes sure the MSR is
present.
Fixes: 400816f60c ("perf/x86/intel: Implement support for TSX Force Abort")
Reported-by: Adric Blake <promarbler14@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: x86@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlz8fAYeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1asH/3ySguxqtqL1MCBa
4/SZ37PHeWKMerfX6ZyJdgEqK3B+PWlmuLiOMNK5h2bPLzeQQQAmHU/mfKmpXqgB
dHwUbG9yNnyUtTfsfRqAnCA6vpuw9Yb1oIzTCVQrgJLSWD0j7scBBvmzYqguOkto
ThwigLUq3AILr8EfR4rh+GM+5Dn9OTEFAxwil9fPHQo7QoczwZxpURhScT6Co9TB
DqLA3fvXbBvLs/CZy/S5vKM9hKzC+p39ApFTURvFPrelUVnythAM0dPDJg3pIn5u
g+/+gDxDFa+7ANxvxO2ng1sJPDqJMeY/xmjJYlYyLpA33B7zLNk2vDHhAP06VTtr
XCMhQ9s=
=cb80
-----END PGP SIGNATURE-----
Merge tag 'v5.2-rc4' into mauro
We need to pick up post-rc1 changes to various document files so they don't
get lost in Mauro's massive RST conversion push.
Fix the following sparse warning:
arch/x86/kernel/amd_nb.c:74:28: warning:
symbol 'hygon_nb_misc_ids' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Brian Woods <Brian.Woods@amd.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190614155441.22076-1-yuehaibing@huawei.com
The inline keyword was not at the beginning of the function declarations.
Fix the following warnings triggered when using W=1:
arch/x86/kernel/tsc.c:62:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
arch/x86/kernel/tsc.c:79:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Cc: kernel-janitors@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190524103252.28575-1-malat@debian.org
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
The only way this can fail is if:
* debugfs superblock can not be pinned - something really went wrong with the
vfs layer.
* file is created with same name - the caller's fault.
* new_inode() fails - happens if memory is exhausted.
so failing to clean up debugfs properly is the least of the system's
sproblems in uch a situation.
[ bp: Extend commit message, remove unused err var in inject_init(). ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190612151531.GA16278@kroah.com
current->mm can be non-NULL if a kthread calls use_mm(). Check for
PF_KTHREAD instead to decide when to store user mode FP state.
Fixes: 2722146eb7 ("x86/fpu: Remove fpu->initialized")
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190604175411.GA27477@lst.de
err must be nonzero in order to reach text_poke(), which caused kgdb to
fail to set breakpoints:
(gdb) break __x64_sys_sync
Breakpoint 1 at 0xffffffff81288910: file ../fs/sync.c, line 124.
(gdb) c
Continuing.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0xffffffff81288910
Command aborted.
Fixes: 86a2205712 ("x86/kgdb: Avoid redundant comparison of patched code")
Signed-off-by: Matt Mullins <mmullins@fb.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nadav Amit <namit@vmware.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190531194755.6320-1-mmullins@fb.com
AVX-512 components usage can result in turbo frequency drop. So it's useful
to expose AVX-512 usage elapsed time as a heuristic hint for user space job
schedulers to cluster the AVX-512 using tasks together.
Examples:
$ while [ 1 ]; do cat /proc/tid/arch_status | grep AVX512; sleep 1; done
AVX512_elapsed_ms: 4
AVX512_elapsed_ms: 8
AVX512_elapsed_ms: 4
This means that 4 milliseconds have elapsed since the tsks AVX512 usage was
detected when the task was scheduled out.
$ cat /proc/tid/arch_status | grep AVX512
AVX512_elapsed_ms: -1
'-1' indicates that no AVX512 usage was recorded for this task.
The time exposed is not necessarily accurate when the arch_status file is
read as the AVX512 usage is only evaluated when a task is scheduled
out. Accurate usage information can be obtained with performance counters.
[ tglx: Massaged changelog ]
Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: peterz@infradead.org
Cc: hpa@zytor.com
Cc: ak@linux.intel.com
Cc: tim.c.chen@linux.intel.com
Cc: dave.hansen@intel.com
Cc: arjan@linux.intel.com
Cc: adobriyan@gmail.com
Cc: aubrey.li@intel.com
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux API <linux-api@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190606012236.9391-2-aubrey.li@linux.intel.com
Booting with kernel parameter "rdt=cmt,mbmtotal,memlocal,l3cat,mba" and
executing "mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl" results in
a NULL pointer dereference on systems which do not have local MBM support
enabled..
BUG: kernel NULL pointer dereference, address: 0000000000000020
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 722 Comm: kworker/0:3 Not tainted 5.2.0-0.rc3.git0.1.el7_UNSUPPORTED.x86_64 #2
Workqueue: events mbm_handle_overflow
RIP: 0010:mbm_handle_overflow+0x150/0x2b0
Only enter the bandwith update loop if the system has local MBM enabled.
Fixes: de73f38f76 ("x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190610171544.13474-1-prarit@redhat.com
When a new control group is created __init_one_rdt_domain() walks all
the other closids to calculate the sets of used and unused bits.
If it discovers a pseudo_locksetup group, it breaks out of the loop. This
means any later closid doesn't get its used bits added to used_b. These
bits will then get set in unused_b, and added to the new control group's
configuration, even if they were marked as exclusive for a later closid.
When encountering a pseudo_locksetup group, we should continue. This is
because "a resource group enters 'pseudo-locked' mode after the schemata is
written while the resource group is in 'pseudo-locksetup' mode." When we
find a pseudo_locksetup group, its configuration is expected to be
overwritten, we can skip it.
Fixes: dfe9674b04 ("x86/intel_rdt: Enable entering of pseudo-locksetup mode")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H Peter Avin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190603172531.178830-1-james.morse@arm.com
Use the HYPERVISOR_CALLBACK_VECTOR to notify an ACRN guest.
Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1559108037-18813-4-git-send-email-yakui.zhao@intel.com
ACRN is an open-source hypervisor maintained by The Linux Foundation. It
is built for embedded IOT with small footprint and real-time features.
Add ACRN guest support so that it allows Linux to be booted under the
ACRN hypervisor. This adds only the barebones implementation.
[ bp: Massage commit message and help text. ]
Co-developed-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Jason Chen CJ <jason.cj.chen@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1559108037-18813-3-git-send-email-yakui.zhao@intel.com
Add a special Kconfig symbol X86_HV_CALLBACK_VECTOR so that the guests
using the hypervisor interrupt callback counter can select and thus
enable that counter. Select it when xen or hyperv support is enabled. No
functional changes.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: linux-hyperv@vger.kernel.org
Cc: Nicolai Stange <nstange@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/1559108037-18813-2-git-send-email-yakui.zhao@intel.com
The OS is expected to write all bits to MCA_CTL for each bank,
thus enabling error reporting in all banks. However, some banks
may be unused in which case the registers for such banks are
Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
bits because of quirks, etc.
A bank can be considered uninitialized if the MCA_CTL register returns
zero. This is because either the OS did not write anything or because
the hardware is enforcing RAZ/WI for the bank.
Set a bank's init value based on if the control bits are set or not in
hardware. Return an error code in the sysfs interface for uninitialized
banks.
Do a final bank init check in a separate function which is not part of
any user-controlled code flows. This is so a user may enable/disable a
bank during runtime without having to restart their system.
[ bp: Massage a bit. Discover bank init state at boot. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190607201752.221446-6-Yazen.Ghannam@amd.com
The number of MCA banks is provided per logical CPU. Historically, this
number has been the same across all CPUs, but this is not an
architectural guarantee. Future AMD systems may have MCA bank counts
that vary between logical CPUs in a system.
This issue was partially addressed in
006c077041 ("x86/mce: Handle varying MCA bank counts")
by allocating structures using the maximum number of MCA banks and by
saving the maximum MCA bank count in a system as the global count. This
means that some extra structures are allocated. Also, this means that
CPUs will spend more time in the #MC and other handlers checking extra
MCA banks.
Thus, define the number of MCA banks as a per-CPU variable.
[ bp: Make mce_num_banks an unsigned int. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190607201752.221446-5-Yazen.Ghannam@amd.com
On legacy systems, the addresses of the MCA_MISC* registers need to be
recursively discovered based on a Block Pointer field in the registers.
On Scalable MCA systems, the register space is fixed, and particular
addresses can be derived by regular offsets for bank and register type.
This fixed address space includes the MCA_MISC* registers.
MCA_MISC0 is always available for each MCA bank. MCA_MISC1 through
MCA_MISC4 are considered available if MCA_MISC0[BlkPtr]=1.
Cache the value of MCA_MISC0[BlkPtr] for each bank and per CPU. This
needs to be done only during init. The values should be saved per CPU
to accommodate heterogeneous SMCA systems.
Redo smca_get_block_address() to directly return the block addresses.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190607201752.221446-4-Yazen.Ghannam@amd.com
Current AMD systems have unique MCA banks per logical CPU even though
the type of the banks may all align to the same bank number. Each CPU
will have control of a set of MCA banks in the hardware and these are
not shared with other CPUs.
For example, bank 0 may be the Load-Store Unit on every logical CPU, but
each bank 0 is a unique structure in the hardware. In other words, there
isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
share.
This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
may see a Unified Memory Controller at bank 15, but each CPU is actually
seeing a unique hardware structure that is not shared with other CPUs.
Because the MCA banks are all unique hardware structures, it would be
good to control them in a more granular way. For example, if there is a
known issue with the Floating Point Unit on CPU5 and a user wishes to
disable an error type on the Floating Point Unit, then it would be good
to do this only for CPU5 rather than all CPUs.
Also, future AMD systems may have heterogeneous MCA banks. Meaning
the bank numbers may not necessarily represent the same types between
CPUs. For example, bank 20 visible to CPU0 may be a Unified Memory
Controller and bank 20 visible to CPU4 may be a Coherent Slave. So
granular control will be even more necessary should the user wish to
control specific MCA banks.
Split the device attributes from struct mce_bank leaving only the MCA
bank control fields.
Make struct mce_banks[] per_cpu in order to have more granular control
over individual MCA banks in the hardware.
Allocate the device attributes statically based on the maximum number of
MCA banks supported. The sysfs interface will use as many as needed per
CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
per_cpu bank count in a future patch.
Allocate the MCA control bits statically. This is in order to avoid
locking warnings when memory is allocated during secondary CPUs' init
sequences.
Also, remove the now unnecessary return values from
__mcheck_cpu_mce_banks_init() and __mcheck_cpu_cap_init().
Redo the sysfs store/show functions to handle the per_cpu mce_banks[].
[ bp: s/mce_banks_percpu/mce_banks_array/g ]
[ Locking issue reported by ]
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190607201752.221446-3-Yazen.Ghannam@amd.com
The struct mce_banks[] array is only used in mce/core.c so move its
definition there and make it static. Also, change the "init" field to
bool type.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190607201752.221446-2-Yazen.Ghannam@amd.com
Use the _ASM_BX macro which expands to either %rbx or %ebx, depending on
the 32-bit or 64-bit config selected.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190606200044.5730-1-ubizjak@gmail.com
With the recent addition of RSDP parsing in the decompression stage,
a kexec-ed kernel now needs ACPI tables to be covered by the identity
mapping. And in commit
6bbeb276b7 ("x86/kexec: Add the EFI system tables and ACPI tables to the ident map")
the ACPI tables memory region was added to the ident map.
But some machines have only an ACPI NVS memory region and the ACPI
tables are located in that region. In such case, the kexec-ed kernel
will still fail when trying to access ACPI tables if they're not mapped.
So add the NVS memory region to the ident map as well.
[ bp: Massage. ]
Fixes: 6bbeb276b7 ("x86/kexec: Add the EFI system tables and ACPI tables to the ident map")
Suggested-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: kexec@lists.infradead.org
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190610073617.19767-1-kasong@redhat.com
Another round of SPDX header file fixes for 5.2-rc4
These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
added, based on the text in the files. We are slowly chipping away at
the 700+ different ways people tried to write the license text. All of
these were reviewed on the spdx mailing list by a number of different
people.
We now have over 60% of the kernel files covered with SPDX tags:
$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
Files checked: 64533
Files with SPDX: 40392
Files with errors: 0
I think the majority of the "easy" fixups are now done, it's now the
start of the longer-tail of crazy variants to wade through.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPuGTg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykBvQCg2SG+HmDH+tlwKLT/q7jZcLMPQigAoMpt9Uuy
sxVEiFZo8ZU9v1IoRb1I
=qU++
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull yet more SPDX updates from Greg KH:
"Another round of SPDX header file fixes for 5.2-rc4
These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
added, based on the text in the files. We are slowly chipping away at
the 700+ different ways people tried to write the license text. All of
these were reviewed on the spdx mailing list by a number of different
people.
We now have over 60% of the kernel files covered with SPDX tags:
$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
Files checked: 64533
Files with SPDX: 40392
Files with errors: 0
I think the majority of the "easy" fixups are now done, it's now the
start of the longer-tail of crazy variants to wade through"
* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
...
Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
In commit
39388e80f9 ("x86/fpu: Don't save fxregs for ia32 frames in copy_fpstate_to_sigframe()")
I removed the statement
| if (ia32_fxstate)
| copy_fxregs_to_kernel(fpu);
and argued that it was wrongly merged because the content was already
saved in kernel's state.
This was wrong: It is required to write it back because it is only
saved on the user-stack and save_fsave_header() reads it from task's
FPU-state. I missed that part…
Save x87 FPU state unless thread's FPU registers are already up to date.
Fixes: 39388e80f9 ("x86/fpu: Don't save fxregs for ia32 frames in copy_fpstate_to_sigframe()")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190607142915.y52mfmgk5lvhll7n@linutronix.de
Fix a couple of s/poped/popped/ typos.
Signed-off-by: George G. Davis <george_davis@mentor.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Currently, only the whole physical memory is identity-mapped for the
kexec kernel and the regions reserved by firmware are ignored.
However, the recent addition of RSDP parsing in the decompression stage
and especially:
33f0df8d84 ("x86/boot: Search for RSDP in the EFI tables")
which tries to access EFI system tables and to dig out the RDSP address
from there, becomes a problem because in certain configurations, they
might not be mapped in the kexec'ed kernel's address space.
What is more, this problem doesn't appear on all systems because the
kexec kernel uses gigabyte pages to build the identity mapping. And
the EFI system tables and ACPI tables can, depending on the system
configuration, end up being mapped as part of all physical memory, if
they share the same 1 GB area with the physical memory.
Therefore, make sure they're always mapped.
[ bp: productize half-baked patch:
- rewrite commit message.
- correct the map_acpi_tables() function name in the !ACPI case. ]
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Cc: dyoung@redhat.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: j-nomura@ce.jp.nec.com
Cc: kexec@lists.infradead.org
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190429002318.GA25400@MiWiFi-R3L-srv
Since commit
d9c9ce34ed ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
get_user_pages_unlocked() pre-faults user's memory if a write generates
a page fault while the handler is disabled.
This works in general and uncovered a bug as reported by Mike
Rapoport¹. It has been pointed out that this function may be fragile
and a simple pre-fault as in fault_in_pages_writeable() would be a
better solution. Better as in taste and simplicity: that write (as
performed by the alternative function) performs exactly the same
faulting of memory as before. This was suggested by Hugh Dickins and
Andrew Morton.
Use fault_in_pages_writeable() for pre-faulting user's stack.
[ bigeasy: Write commit message. ]
[ bp: Massage some. ]
¹ https://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com
Fixes: d9c9ce34ed ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190529072540.g46j4kfeae37a3iu@linutronix.de
Link: https://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com
While the TIF_SYSCALL_EMU is set in ptrace_resume independent of any
architecture, currently only powerpc and x86 unset the TIF_SYSCALL_EMU
flag in ptrace_disable which gets called from ptrace_detach.
Let's move the clearing of TIF_SYSCALL_EMU flag to __ptrace_unlink
which gets executed from ptrace_detach and also keep it along with
or close to clearing of TIF_SYSCALL_TRACE.
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation version 2 of the license
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 315 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this file is licensed under gplv2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 22 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190115.129548190@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
distribute under gplv2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 8 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190114.475576622@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this file is released under the gplv2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 68 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
licensed under the terms of the gnu general public license version 2
see file copying for details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 1 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081035.403801661@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
your use of this code is subject to the terms and conditions of the
gnu general public license version 2 see copying or http www gnu org
licenses gpl html
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 3 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000437.701946635@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
use of this code is subject to the terms and conditions of the gnu
general public license version 2 see copying or http www gnu org
licenses gpl html
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 1 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000437.611918838@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin st fifth floor boston ma 02110
1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 111 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.567572064@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 59 temple place suite 330 boston ma 02111
1307 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 136 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.384967451@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 263 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit:
4b53a3412d ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper")
the tsk_nr_cpus_allowed() wrapper was removed. There was not
much difference in !RT but in RT we used this to implement
migrate_disable(). Within a migrate_disable() section the CPU mask is
restricted to single CPU while the "normal" CPU mask remains untouched.
As an alternative implementation Ingo suggested to use:
struct task_struct {
const cpumask_t *cpus_ptr;
cpumask_t cpus_mask;
};
with
t->cpus_ptr = &t->cpus_mask;
In -RT we then can switch the cpus_ptr to:
t->cpus_ptr = &cpumask_of(task_cpu(p));
in a migration disabled region. The rules are simple:
- Code that 'uses' ->cpus_allowed would use the pointer.
- Code that 'modifies' ->cpus_allowed would use the direct mask.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190423142636.14347-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Two fixes: a quirk for KVM guests running on certain AMD CPUs, and a
KASAN related build fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
x86/boot: Provide KASAN compatible aliases for string routines
Pull integrity subsystem fixes from Mimi Zohar:
"Four bug fixes, none 5.2-specific, all marked for stable.
The first two are related to the architecture specific IMA policy
support. The other two patches, one is related to EVM signatures,
based on additional hash algorithms, and the other is related to
displaying the IMA policy"
* 'next-fixes-for-5.2-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: show rules with IMA_INMASK correctly
evm: check hash algorithm passed to init_desc()
ima: fix wrong signed policy requirement when not appraising
x86/ima: Check EFI_RUNTIME_SERVICES before using
Based on 1 normalized pattern(s):
subject to the gnu public license v 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 9 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528171440.130801526@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
subject to the gnu general public license v2 only
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 1 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528171439.275006521@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license v 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 45 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this file may be distributed under the terms of the gnu general
public license version 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 9 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070034.395589349@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 3 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [kishon] [vijay] [abraham]
[i] [kishon]@[ti] [com] this program is distributed in the hope that
it will be useful but without any warranty without even the implied
warranty of merchantability or fitness for a particular purpose see
the gnu general public license for more details
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [graeme] [gregory]
[gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
[kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
[hk] [hemahk]@[ti] [com] this program is distributed in the hope
that it will be useful but without any warranty without even the
implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1105 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 02111 1307 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1334 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation inc 675 mass ave cambridge ma 02139 usa
either version 2 of the license or at your option any later version
incorporated herein by reference
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 4 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190524100844.465381181@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 9ed0985332 ("x86: intel_epb: Take CONFIG_PM into account")
prevented the majority of the Performance and Energy Bias Hint (EPB)
handling code from being built when CONFIG_PM is unset to fix a
regression introduced by commit b9c273babc ("PM / arch: x86:
MSR_IA32_ENERGY_PERF_BIAS sysfs interface").
In hindsight, however, it would be better to skip all of the EPB
handling code for CONFIG_PM unset as there really is no reason for
it to be there in that case. Namely, if the EPB is not touched
by the kernel at all with CONFIG_PM unset, there is no need to
worry about modifying the EPB inadvertently on CPU online and since
the system will not suspend or hibernate then, there is no need to
worry about possible modifications of the EPB by the platform
firmware during system-wide PM transitions.
For this reason, revert the changes made by commit 9ed0985332
and only allow intel_epb.o to be built when CONFIG_PM is set.
Note that this changes the behavior of the kernels built with
CONFIG_PM unset as they will not modify the EPB on boot if it is
zero initially any more, so it is not a fix strictly speaking, but
users building their kernels with CONFIG_PM unset really should not
expect them to take energy efficiency into account. Moreover, if
CONFIG_PM is unset for performance reasons, leaving EPB as set
initially by the platform firmware will actually be consistent
with the user's expectations.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from from force_sig_fault to make it explicit that is what is going
on.
The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.
The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:
force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Update the calls of force_sig_fault that pass in a variable that is
set to current earlier to explicitly use current.
This is to make the next change that removes the task parameter
from force_sig_fault easier to verify.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The send_sigtrap function is always called with task == current. Make
that explicit by removing the task parameter.
This also makes it clear that the x86 send_sigtrap passes current
into force_sig_fault.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
All of the remaining callers pass current into force_sig so
remove the task parameter to make this obvious and to make
misuse more difficult in the future.
This also makes it clear force_sig passes current into force_sig_info.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 or at your option any
later version this program is distributed in the hope that it will
be useful but without any warranty without even the implied warranty
of merchantability or fitness for a particular purpose see the gnu
general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 44 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190523091651.032047323@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
51 franklin st fifth floor boston ma 02110 1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 50 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190523091649.499889647@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this code is released under the gnu general public license version 2
or later
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520075211.232210963@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For F17h AMD CPUs, the CPB capability ('Core Performance Boost') is forcibly set,
because some versions of that chip incorrectly report that they do not have it.
However, a hypervisor may filter out the CPB capability, for good
reasons. For example, KVM currently does not emulate setting the CPB
bit in MSR_K7_HWCR, and unchecked MSR access errors will be thrown
when trying to set it as a guest:
unchecked MSR access error: WRMSR to 0xc0010015 (tried to write 0x0000000001000011) at rIP: 0xffffffff890638f4 (native_write_msr+0x4/0x20)
Call Trace:
boost_set_msr+0x50/0x80 [acpi_cpufreq]
cpuhp_invoke_callback+0x86/0x560
sort_range+0x20/0x20
cpuhp_thread_fun+0xb0/0x110
smpboot_thread_fn+0xef/0x160
kthread+0x113/0x130
kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x35/0x40
To avoid this issue, don't forcibly set the CPB capability for a CPU
when running under a hypervisor.
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: jiaxun.yang@flygoat.com
Fixes: 0237199186 ("x86/CPU/AMD: Set the CPB bit unconditionally on F17h")
Link: http://lkml.kernel.org/r/20190522221745.GA15789@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com
[ Minor edits to the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since commit:
21d375b6b3 ("x86/entry/64: Remove the SYSCALL64 fast path")
there is no user of TASK_TI_flags in assembly. There's no need to
keep it around in asm-offsets.c
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190523102325.22eacdf7@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
CONFIG_IO_DELAY_TYPE_* are not kernel configuration at all. They just
define constant values, 0, 1, 2, and 3. Define them by #define in C.
CONFIG_DEFAULT_IO_DELAY_TYPE can also be defined in C by using #ifdef
and #define directives.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190521072211.21014-2-yamada.masahiro@socionext.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current code is fine since 'case CONFIG_IO_DELAY_TYPE_NONE'
does nothing, but scripts/checkpatch.pl complains about this:
warning: Possible switch case/default not preceded by break or fallthrough comment
I like break statement better than a fallthrough comment here.
It avoids the warning and clarify the code.
No behavior change is intended.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20190521072211.21014-1-yamada.masahiro@socionext.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Create CPU topology sysfs attributes: "core_cpus" and "core_cpus_list"
These attributes represent all of the logical CPUs that share the
same core.
These attriutes is synonymous with the existing "thread_siblings" and
"thread_siblings_list" attribute, which will be deprecated.
Create CPU topology sysfs attributes: "die_cpus" and "die_cpus_list".
These attributes represent all of the logical CPUs that share the
same die.
Suggested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/071c23a298cd27ede6ed0b6460cae190d193364f.1557769318.git.len.brown@intel.com
topology_max_packages() is available to size resources to cover all
packages in the system.
But now multi-die/package systems are coming up, and some resources are
per-die.
Create topology_max_die_per_package(), for detecting multi-die/package
systems, and sizing any per-die resources.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/e6eaf384571ae52ac7d0ca41510b7fb7d2fda0e4.1557769318.git.len.brown@intel.com
Some new systems have multiple software-visible die within each package.
Update Linux parsing of the Intel CPUID "Extended Topology Leaf" to handle
either CPUID.B, or the new CPUID.1F.
Add cpuinfo_x86.die_id and cpuinfo_x86.max_dies to store the result.
die_id will be non-zero only for multi-die/package systems.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Link: https://lkml.kernel.org/r/7b23d2d26d717b8e14ba137c94b70943f1ae4b5c.1557769318.git.len.brown@intel.com
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation 51
franklin street fifth floor boston ma 02110 1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 2 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154042.432790911@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not see http www gnu org licenses
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details [based]
[from] [clk] [highbank] [c] you should have received a copy of the
gnu general public license along with this program if not see http
www gnu org licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 355 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.837383322@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Checking efi_enabled(EFI_BOOT) is not sufficient to ensure that
EFI runtime services are available, e.g. if efi=noruntime is used.
Without this, I get an oops on a PREEMPT_RT kernel where efi=noruntime is
the default.
Fixes: 399574c64e ("x86/ima: retry detecting secure boot mode")
Cc: stable@vger.kernel.org (linux-5.0)
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Currently, the Kbuild core manipulates header search paths in a crazy
way [1].
To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
the search paths in the srctree. Some Makefiles are already written in
that way, but not all. The goal of this work is to make the notation
consistent, and finally get rid of the gross hacks.
Having whitespaces after -I does not matter since commit 48f6e3cf5b
("kbuild: do not drop -I without parameter").
[1]: https://patchwork.kernel.org/patch/9632347/
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
* POWER: support for direct access to the POWER9 XIVE interrupt controller,
memory and performance optimizations.
* x86: support for accessing memory not backed by struct page, fixes and refactoring
* Generic: dirty page tracking improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJc3qV/AAoJEL/70l94x66Dn3QH/jX1Bn0P/RZAIt4w0SySklSg
PqxUKDyBQqB9vN9Qeb9jWXAKPH2CtM3+up/rz7oRnBWp7qA6vXcC/R/QJYAvzdXE
nklsR/oYCsflR1KdlVYuDvvPCPP2fLBU5zfN83OsaBQ8fNRkm3gN+N5XQ2SbXbLy
Mo9tybS4otY201UAC96e8N0ipwwyCRpDneQpLcl+F5nH3RBt63cVbs04O+70MXn7
eT4I+8K3+Go7LATzT8hglD21D/7uvE31qQb6yr5L33IfhU4GB51RZzBXTNaAdY8n
hT1rMrRkAMAFWYZPQDfoMadjWU3i5DIfstKjDxOr9oTfuOEp5Z+GvJwvVnUDg1I=
=D0+p
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- support for SVE and Pointer Authentication in guests
- PMU improvements
POWER:
- support for direct access to the POWER9 XIVE interrupt controller
- memory and performance optimizations
x86:
- support for accessing memory not backed by struct page
- fixes and refactoring
Generic:
- dirty page tracking improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits)
kvm: fix compilation on aarch64
Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU"
kvm: x86: Fix L1TF mitigation for shadow MMU
KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible
KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device
KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing"
KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs
kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete
tests: kvm: Add tests for KVM_SET_NESTED_STATE
KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state
tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID
tests: kvm: Add tests to .gitignore
KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one
KVM: Fix the bitmap range to copy during clear dirty
KVM: arm64: Fix ptrauth ID register masking logic
KVM: x86: use direct accessors for RIP and RSP
KVM: VMX: Use accessors for GPRs outside of dedicated caching logic
KVM: x86: Omit caching logic for always-available GPRs
kvm, x86: Properly check whether a pfn is an MMIO or not
...
Pull x86 fixes from Ingo Molnar:
"Misc fixes and updates:
- a handful of MDS documentation/comment updates
- a cleanup related to hweight interfaces
- a SEV guest fix for large pages
- a kprobes LTO fix
- and a final cleanup commit for vDSO HPET support removal"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/mds: Improve CPU buffer clear documentation
x86/speculation/mds: Revert CPU buffer clear on double fault exit
x86/kconfig: Disable CONFIG_GENERIC_HWEIGHT and remove __HAVE_ARCH_SW_HWEIGHT
x86/mm: Do not use set_{pud, pmd}_safe() when splitting a large page
x86/kprobes: Make trampoline_handler() global and visible
x86/vdso: Remove hpet_page from vDSO
The double fault ESPFIX path doesn't return to user mode at all --
it returns back to the kernel by simulating a #GP fault.
prepare_exit_to_usermode() will run on the way out of
general_protection before running user code.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jon Masters <jcm@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 04dcbdb805 ("x86/speculation/mds: Clear CPU buffers on exit to user")
Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Removing of non-DYNAMIC_FTRACE from 32bit x86
- Removing of mcount support from x86
- Emulating a call from int3 on x86_64, fixes live kernel patching
- Consolidated Tracing Error logs file
Minor updates:
- Removal of klp_check_compiler_support()
- kdb ftrace dumping output changes
- Accessing and creating ftrace instances from inside the kernel
- Clean up of #define if macro
- Introduction of TRACE_EVENT_NOP() to disable trace events based on config
options
And other minor fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXNxMZxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qq4PAP44kP6VbwL8CHyI2A3xuJ6Hwxd+2Z2r
ip66RtzyJ+2iCgEA2QCuWUlEt2bLpF9a8IQ4N9tWenSeW2i7gunPb+tioQw=
=RVQo
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The major changes in this tracing update includes:
- Removal of non-DYNAMIC_FTRACE from 32bit x86
- Removal of mcount support from x86
- Emulating a call from int3 on x86_64, fixes live kernel patching
- Consolidated Tracing Error logs file
Minor updates:
- Removal of klp_check_compiler_support()
- kdb ftrace dumping output changes
- Accessing and creating ftrace instances from inside the kernel
- Clean up of #define if macro
- Introduction of TRACE_EVENT_NOP() to disable trace events based on
config options
And other minor fixes and clean ups"
* tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
x86: Hide the int3_emulate_call/jmp functions from UML
livepatch: Remove klp_check_compiler_support()
ftrace/x86: Remove mcount support
ftrace/x86_32: Remove support for non DYNAMIC_FTRACE
tracing: Simplify "if" macro code
tracing: Fix documentation about disabling options using trace_options
tracing: Replace kzalloc with kcalloc
tracing: Fix partial reading of trace event's id file
tracing: Allow RCU to run between postponed startup tests
tracing: Fix white space issues in parse_pred() function
tracing: Eliminate const char[] auto variables
ring-buffer: Fix mispelling of Calculate
tracing: probeevent: Fix to make the type of $comm string
tracing: probeevent: Do not accumulate on ret variable
tracing: uprobes: Re-enable $comm support for uprobe events
ftrace/x86_64: Emulate call function while updating in breakpoint handler
x86_64: Allow breakpoints to emulate call instructions
x86_64: Add gap to int3 to allow for call emulation
tracing: kdb: Allow ftdump to skip all but the last few entries
tracing: Add trace_total_entries() / trace_total_entries_cpu()
...
- Fix recent regression causing kernels built with CONFIG_PM
unset to crash on systems that support the Performance and
Energy Bias Hint (EPB) by avoiding to compile the EPB-related
code depending on CONFIG_PM when it is unset (Rafael Wysocki).
- Clean up the transition notifier invocation code in the cpufreq
core and change some users of cpufreq transition notifiers
accordingly (Viresh Kumar).
- Change MAINTAINERS to cover the schedutil governor as part of
cpufreq (Viresh Kumar).
- Simplify cpufreq_init_policy() to avoid redundant computations
(Yue Hu).
- Add explanatory comment to the cpufreq core (Rafael Wysocki).
- Introduce a new flag, GENPD_FLAG_RPM_ALWAYS_ON, to the generic
power domains (genpd) framework along with the first user of it
(Leonard Crestez).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAlzb4TASHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxiEAP/37uQOx+I8J3IU7HQcPIkdI1hgksLEzo
g2eoREekjszIjFK9xa70X3V/QnGK4YSPQ/cHCjgXfVhwkO5TJzte5T5M2z9gUCDT
7OMYWCI6hP6Mo5UWlP4dQ9Cqce4SB3TdibadevxcVOhFAW/xz42y5Gr6s4WkexJf
Swb2uoLS4gGANyhUhx6XEZ5NpWZkWcK2ygZ8VJZETnoIwxMSUW7FTJkF+4s2tXLZ
GH+F5jWAbwPlg6g2c54lPL1HtiAvK+/018aF8CZMqUBec94RHDFybVOlb5sacfQW
+Y0W/mc/6SMqT3OUcQ0H3Z/qkgwR8mL01hH6gCP1jA5OBljmTjzk0Bbc4c3n9BEN
aRy4M8Qc/GXzEBPO3Z9AlYik6ALH9iUgL2hewGZAFN8kn9ZGPAqYsctdCVkfKL1u
4Esz5+wOsyYmBx910PozL+p2jbTH0x89sSo1qXUQr2JEiNm2iL4I4+ndqhuiq4LO
sQPHCpe4HhYWzIQzJLDurv6hAxxU5PUsGg8XDEGlsyowIPDoIkMgC93RRLGZ/taY
Ivc2FSlwLTSkzBHwVfckakXPvfyFdw8DFL2n66dQbXS9FFNshOF/TFx40iV42i5H
wusyIZIT1y1H74De0EVntUho3xBo3nrrsu1o2NaXsTBoEsYwJiCji4yOZlI1Zh+m
A9coiXKm4hY5
=LqTN
-----END PGP SIGNATURE-----
Merge tag 'pm-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These fix a recent regression causing kernels built with CONFIG_PM
unset to crash on systems that support the Performance and Energy Bias
Hint (EPB), clean up the cpufreq core and some users of transition
notifiers and introduce a new power domain flag into the generic power
domains framework (genpd).
Specifics:
- Fix recent regression causing kernels built with CONFIG_PM unset to
crash on systems that support the Performance and Energy Bias Hint
(EPB) by avoiding to compile the EPB-related code depending on
CONFIG_PM when it is unset (Rafael Wysocki).
- Clean up the transition notifier invocation code in the cpufreq
core and change some users of cpufreq transition notifiers
accordingly (Viresh Kumar).
- Change MAINTAINERS to cover the schedutil governor as part of
cpufreq (Viresh Kumar).
- Simplify cpufreq_init_policy() to avoid redundant computations (Yue
Hu).
- Add explanatory comment to the cpufreq core (Rafael Wysocki).
- Introduce a new flag, GENPD_FLAG_RPM_ALWAYS_ON, to the generic
power domains (genpd) framework along with the first user of it
(Leonard Crestez)"
* tag 'pm-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
soc: imx: gpc: Use GENPD_FLAG_RPM_ALWAYS_ON for ERR009619
PM / Domains: Add GENPD_FLAG_RPM_ALWAYS_ON flag
cpufreq: Update MAINTAINERS to include schedutil governor
cpufreq: Don't find governor for setpolicy drivers in cpufreq_init_policy()
cpufreq: Explain the kobject_put() in cpufreq_policy_alloc()
cpufreq: Call transition notifier only once for each policy
x86: intel_epb: Take CONFIG_PM into account
* pm-cpufreq:
cpufreq: Update MAINTAINERS to include schedutil governor
cpufreq: Don't find governor for setpolicy drivers in cpufreq_init_policy()
cpufreq: Explain the kobject_put() in cpufreq_policy_alloc()
cpufreq: Call transition notifier only once for each policy
* pm-domains:
soc: imx: gpc: Use GENPD_FLAG_RPM_ALWAYS_ON for ERR009619
PM / Domains: Add GENPD_FLAG_RPM_ALWAYS_ON flag
Pull x86 MDS mitigations from Thomas Gleixner:
"Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data which is
available in various CPU internal buffers. This new set of misfeatures
has the following CVEs assigned:
CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory
MDS attacks target microarchitectural buffers which speculatively
forward data under certain conditions. Disclosure gadgets can expose
this data via cache side channels.
Contrary to other speculation based vulnerabilities the MDS
vulnerability does not allow the attacker to control the memory target
address. As a consequence the attacks are purely sampling based, but
as demonstrated with the TLBleed attack samples can be postprocessed
successfully.
The mitigation is to flush the microarchitectural buffers on return to
user space and before entering a VM. It's bolted on the VERW
instruction and requires a microcode update. As some of the attacks
exploit data structures shared between hyperthreads, full protection
requires to disable hyperthreading. The kernel does not do that by
default to avoid breaking unattended updates.
The mitigation set comes with documentation for administrators and a
deeper technical view"
* 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/speculation/mds: Fix documentation typo
Documentation: Correct the possible MDS sysfs values
x86/mds: Add MDSUM variant to the MDS documentation
x86/speculation/mds: Add 'mitigations=' support for MDS
x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
x86/speculation/mds: Fix comment
x86/speculation/mds: Add SMT warning message
x86/speculation: Move arch_smt_update() call to after mitigation decisions
x86/speculation/mds: Add mds=full,nosmt cmdline option
Documentation: Add MDS vulnerability documentation
Documentation: Move L1TF to separate directory
x86/speculation/mds: Add mitigation mode VMWERV
x86/speculation/mds: Add sysfs reporting for MDS
x86/speculation/mds: Add mitigation control for MDS
x86/speculation/mds: Conditionally clear CPU buffers on idle entry
x86/kvm/vmx: Add MDS protection when L1D Flush is not active
x86/speculation/mds: Clear CPU buffers on exit to user
x86/speculation/mds: Add mds_clear_cpu_buffers()
x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
x86/speculation/mds: Add BUG_MSBDS_ONLY
...
There's two methods of enabling function tracing in Linux on x86. One is
with just "gcc -pg" and the other is "gcc -pg -mfentry". The former will use
calls to a special function "mcount" after the frame is set up in all C
functions. The latter will add calls to a special function called "fentry"
as the very first instruction of all C functions.
At compile time, there is a check to see if gcc supports, -mfentry, and if
it does, it will use that, because it is more versatile and less error prone
for function tracing.
Starting with v4.19, the minimum gcc supported to build the Linux kernel,
was raised to version 4.6. That also happens to be the first gcc version to
support -mfentry. Since on x86, using gcc versions from 4.6 and beyond will
unconditionally enable the -mfentry, it will no longer use mcount as the
method for inserting calls into the C functions of the kernel. This means
that there is no point in continuing to maintain mcount in x86.
Remove support for using mcount. This makes the code less complex, and will
also allow it to be simplified in the future.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When DYNAMIC_FTRACE is enabled in the kernel, all the functions that can be
traced by the function tracer have a "nop" placeholder at the start of the
function. When function tracing is enabled, the nop is converted into a call
to the tracing infrastructure where the functions get traced. This also
allows for specifying specific functions to trace, and a lot of
infrastructure is built on top of this.
When DYNAMIC_FTRACE is not enabled, all the functions have a call to the
ftrace trampoline. A check is made to see if a function pointer is the
ftrace_stub or not, and if it is not, it calls the function pointer to trace
the code. This adds over 10% overhead to the kernel even when tracing is
disabled.
When an architecture supports DYNAMIC_FTRACE there really is no reason to
use the static tracing. I have kept non DYNAMIC_FTRACE available in x86 so
that the generic code for non DYNAMIC_FTRACE can be tested. There is no
reason to support non DYNAMIC_FTRACE for both x86_64 and x86_32. As the non
DYNAMIC_FTRACE for x86_32 does not even support fentry, and we want to
remove mcount completely, there's no reason to keep non DYNAMIC_FTRACE
around for x86_32.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, the notifiers are called once for each CPU of the policy->cpus
cpumask. It would be more optimal if the notifier can be called only
once and all the relevant information be provided to it. Out of the 23
drivers that register for the transition notifiers today, only 4 of them
do per-cpu updates and the callback for the rest can be called only once
for the policy without any impact.
This would also avoid multiple function calls to the notifier callbacks
and reduce multiple iterations of notifier core's code (which does
locking as well).
This patch adds pointer to the cpufreq policy to the struct
cpufreq_freqs, so the notifier callback has all the information
available to it with a single call. The five drivers which perform
per-cpu updates are updated to use the cpufreq policy. The freqs->cpu
field is redundant now and is removed.
Acked-by: David S. Miller <davem@davemloft.net> (sparc)
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit b9c273babc ("PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs
interface") caused kernels built with CONFIG_PM unset to crash on
systems supporting the Performance and Energy Bias Hint (EPB),
because it attempts to add files to sysfs directories that don't
exist on those systems.
Prevent that from happening by taking CONFIG_PM into account so
that the code depending on it is not compiled at all when it is
not set.
Fixes: b9c273babc ("PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Pull intgrity updates from James Morris:
"This contains just three patches, the remainder were either included
in other pull requests (eg. audit, lockdown) or will be upstreamed via
other subsystems (eg. kselftests, Power).
Included here is one bug fix, one documentation update, and extending
the x86 IMA arch policy rules to coordinate the different kernel
module signature verification methods"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
doc/kernel-parameters.txt: Deprecate ima_appraise_tcb
x86/ima: add missing include
x86/ima: require signed kernel modules
- remove the already broken support for NULL dev arguments to the
DMA API calls
- Kconfig tidyups
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlzT00YLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYO66hAAx2kCUIh+K2gFB5uxHqZiG62UmRjkPzolxcR5/Jx9
4Rz6NRAE+rp8v2fbBr2bveDx7cF5bm1L+pRyRsFMfwkm3a8dCHQ51ldIm5VFoI3e
NiX6Zoxk02BCXP/Qk//aHeNW9dBmuiemiXzdPEhOvWvVzqTO5JZrECQpkHEkG+8A
R/IWU15sr5xzw9Td/HVN9CRJri/qiTAuB9nSoP6BGjZeHkQjREJKNMGKDTvSzH4L
tlyD1G7yEymQvLBqGGO64ztuav00l8sqjI3tn1mmwpw4VTajabeRHPnWh+7g9Od+
sH1pRvIOTvEMc456fizufYIOedB5Ze344kgfrxhngRbBVXmMfShr8ZLzdIUGhGjY
1cdGqIUOEKywiDf13KrHVkNU+lJtvjMCMxvV93mAYRLOIQg0Jf4T2kklgKyEhqrG
rqFdbbtSBzmLjPyqc1FS0heDWmA+yJsKAumGcH4blJXCpsD1rHWGe0AJ34x+OHPT
tw5l+P4zAH1eO1qHCtmxN9s0lXZv1VLcFkOrJH91LPvAhZsUCrdqDjyJpTUYaIao
yzkiLbDwFO7SVoqzaVNlVZIJ/9LX0qfAnl2Atty+sAQomrQMoviNSzGbLSLQqhHN
FbTIEBMxrxS49+3lfzHOS/lYPpJp6B31yotNM+6YpXmbRQZN5gjGNYBqhKD+7Rgn
L0Y=
=IdsP
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.2' of git://git.infradead.org/users/hch/dma-mapping
Pull DMA mapping updates from Christoph Hellwig:
- remove the already broken support for NULL dev arguments to the DMA
API calls
- Kconfig tidyups
* tag 'dma-mapping-5.2' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: add a Kconfig symbol to indicate arch_dma_prep_coherent presence
dma-mapping: remove an unnecessary NULL check
x86/dma: Remove the x86_dma_fallback_dev hack
dma-mapping: remove leftover NULL device support
arm: use a dummy struct device for ISA DMA use of the DMA API
pxa3xx-gcu: pass struct device to dma_mmap_coherent
gbefb: switch to managed version of the DMA allocator
da8xx-fb: pass struct device to DMA API functions
parport_ip32: pass struct device to DMA API functions
dma: select GENERIC_ALLOCATOR for DMA_REMAP
The APIC timer calibration (calibrate_APIC_timer()) can be skipped
in cases where we know the APIC timer frequency. On Intel SoCs,
we believe that the APIC is fed by the crystal clock; this would make
sense, and the crystal clock frequency has been verified against the
APIC timer calibration result on ApolloLake, GeminiLake, Kabylake,
CoffeeLake, WhiskeyLake and AmberLake.
Set lapic_timer_period based on the crystal clock frequency
accordingly.
APIC timer calibration would normally be skipped on modern CPUs
by nature of the TSC deadline timer being used instead,
however this change is still potentially useful, e.g. if the
TSC deadline timer has been disabled with a kernel parameter.
calibrate_APIC_timer() uses the legacy timer, but we are seeing
new platforms that omit such legacy functionality, so avoiding
such codepaths is becoming more important.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Drake <drake@endlessm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: len.brown@intel.com
Cc: linux@endlessm.com
Cc: rafael.j.wysocki@intel.com
Link: http://lkml.kernel.org/r/20190509055417.13152-3-drake@endlessm.com
Link: https://lkml.kernel.org/r/20190419083533.32388-1-drake@endlessm.com
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1904031206440.1967@nanos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This variable is a period unit (number of clock cycles per jiffy),
not a frequency (which is number of cycles per second).
Give it a more appropriate name.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Daniel Drake <drake@endlessm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: len.brown@intel.com
Cc: linux@endlessm.com
Cc: rafael.j.wysocki@intel.com
Link: http://lkml.kernel.org/r/20190509055417.13152-2-drake@endlessm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
native_calibrate_tsc() had a data mapping Intel CPU families
and crystal clock speed, but hardcoded tables are not ideal, and this
approach was already problematic at least in the Skylake X case, as
seen in commit:
b511203093 ("x86/tsc: Fix erroneous TSC rate on Skylake Xeon")
By examining CPUID data from http://instlatx64.atw.hu/ and units
in the lab, we have found that 3 different scenarios need to be dealt
with, and we can eliminate most of the hardcoded data using an approach a
little more advanced than before:
1. ApolloLake, GeminiLake, CannonLake (and presumably all new chipsets
from this point) report the crystal frequency directly via CPUID.0x15.
That's definitive data that we can rely upon.
2. Skylake, Kabylake and all variants of those two chipsets report a
crystal frequency of zero, however we can calculate the crystal clock
speed by condidering data from CPUID.0x16.
This method correctly distinguishes between the two crystal clock
frequencies present on different Skylake X variants that caused
headaches before.
As the calculations do not quite match the previously-hardcoded values
in some cases (e.g. 23913043Hz instead of 24MHz), TSC refinement is
enabled on all platforms where we had to calculate the crystal
frequency in this way.
3. Denverton (GOLDMONT_X) reports a crystal frequency of zero and does
not support CPUID.0x16, so we leave this entry hardcoded.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Drake <drake@endlessm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: len.brown@intel.com
Cc: linux@endlessm.com
Cc: rafael.j.wysocki@intel.com
Link: http://lkml.kernel.org/r/20190509055417.13152-1-drake@endlessm.com
Link: https://lkml.kernel.org/r/20190419083533.32388-1-drake@endlessm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJc04M6AAoJEAx081l5xIa+SJgP/0uIgIOM53vPpydgmr+2IEHF
jbDqrd+mipgNriRVHjDsWdUHCUNtyhB7YEBCMrj3mY0rRFI7FlQQf4lOwYGoHiKP
4JZg4kwC37997lFXl1uabGj3DmJLtxKL2/D15zCH/uLe+2EDzWznP6NVdFT3WK0P
YKZQCWT19PWSsLoBRPutWxkmop4AYvkqE0a6vXUlJlFYZK3Bbytx6/179uWKfiX5
ZkKEEtx1XiDAvcp5gBb6PISurycrBY0e/bkPBnK3ES5vawMbTU5IrmWOrQ4D8yOd
z9qOVZawZ6+b2XBDgBWjQ9bM7I5R7Il1q/LglYEaFI9+wHUnlUdDSm6ft5/5BiCZ
fqgkh5Bj2iEsajbSsacoljMOpxpYPqj63mqc+7fAGXF34V+B+9U1bpt8kCbMKowf
7Abb7IuiCR6vLDapjP6VqTMvdQ4O466OEAN83ULGFTdmMqYYH4AxaIwc+xcAk/aP
RNq7/RHhh4FRynRAj9fCkGlF3ArnM88gLINwWuEQq4SClWGcvdw7eaHpwWo77c4g
iccCnTLqSIg5pDVu07AQzzBlW6KulWxh5o72x+Xx+EXWdYUDHQ1SlNs11bSNUBV1
5MkrzY2GuD+NFEjsXJEDIPOr40mQOyJCXnxq8nXPsz/hD9kHeJPvWn3J3eVKyb5B
Z6/knNqM0BDn3SaYR/rD
=YFiQ
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This has two exciting community drivers for ARM Mali accelerators.
Since ARM has never been open source friendly on the GPU side of the
house, the community has had to create open source drivers for the
Mali GPUs. Lima covers the older t4xx and panfrost the newer 6xx/7xx
series. Well done to all involved and hopefully this will help ARM
head in the right direction.
There is also now the ability if you don't have any of the legacy
drivers enabled (pre-KMS) to remove all the pre-KMS support code from
the core drm, this saves 10% or so in codesize on my machine.
i915 also enable Icelake/Elkhart Lake Gen11 GPUs by default, vboxvideo
moves out of staging.
There are also some rcar-du patches which crossover with media tree
but all should be acked by Mauro.
Summary:
uapi changes:
- Colorspace connector property
- fourcc - new YUV formts
- timeline sync objects initially merged
- expose FB_DAMAGE_CLIPS to atomic userspace
new drivers:
- vboxvideo: moved out of staging
- aspeed: ASPEED SoC BMC chip display support
- lima: ARM Mali4xx GPU acceleration driver support
- panfrost: ARM Mali6xx/7xx Midgard/Bitfrost acceleration driver support
core:
- component helper docs
- unplugging fixes
- devm device init
- MIPI/DSI rate control
- shmem backed gem objects
- connector, display_info, edid_quirks cleanups
- dma_buf fence chain support
- 64-bit dma-fence seqno comparison fixes
- move initial fb config code to core
- gem fence array helpers for Lima
- ability to remove legacy support code if no drivers requires it (removes 10% of drm.ko size)
- lease fixes
ttm:
- unified DRM_FILE_PAGE_OFFSET handling
- Account for kernel allocations in kernel zone only
panel:
- OSD070T1718-19TS panel support
- panel-tpo-td028ttec1 backlight support
- Ronbo RB070D30 MIPI/DSI
- Feiyang FY07024DI26A30-D MIPI-DSI panel
- Rocktech jh057n00900 MIPI-DSI panel
i915:
- Comet Lake (Gen9) PCI IDs
- Updated Icelake PCI IDs
- Elkhartlake (Gen11) support
- DP MST property addtions
- plane and watermark fixes
- Icelake port sync and VEBOX disable fixes
- struct_mutex usage reduction
- Icelake gamma fix
- GuC reset fixes
- make mmap more asynchronous
- sound display power well race fixes
- DDI/MIPI-DSI clocks for Icelake
- Icelake RPS frequency changing support
- Icelake workarounds
amdgpu:
- Use HMM for userptr
- vega20 experimental smu11 support
- RAS support for vega20
- BACO support for vega12 + fixes for vega20
- reworked IH interrupt handling
- amdkfd RAS support
- Freesync improvements
- initial timeline sync object support
- DC Z ordering fixes
- NV12 planes support
- colorspace properties for planes=
- eDP opts if eDP already initialized
nouveau:
- misc fixes
etnaviv:
- misc fixes
msm:
- GPU zap shader support expansion
- robustness ABI addition
exynos:
- Logging cleanups
tegra:
- Shared reset fix
- CPU cache maintenance fix
cirrus:
- driver rewritten using simple helpers
meson:
- G12A support
vmwgfx:
- Resource dirtying management improvements
- Userspace logging improvements
virtio:
- PRIME fixes
rockchip:
- rk3066 hdmi support
sun4i:
- DSI burst mode support
vc4:
- load tracker to detect underflow
v3d:
- v3d v4.2 support
malidp:
- initial Mali D71 support in komeda driver
tfp410:
- omap related improvement
omapdrm:
- drm bridge/panel support
- drop some omap specific panels
rcar-du:
- Display writeback support"
* tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm: (1507 commits)
drm/msm/a6xx: No zap shader is not an error
drm/cma-helper: Fix drm_gem_cma_free_object()
drm: Fix timestamp docs for variable refresh properties.
drm/komeda: Mark the local functions as static
drm/komeda: Fixed warning: Function parameter or member not described
drm/komeda: Expose bus_width to Komeda-CORE
drm/komeda: Add sysfs attribute: core_id and config_id
drm: add non-desktop quirk for Valve HMDs
drm/panfrost: Show stored feature registers
drm/panfrost: Don't scream about deferred probe
drm/panfrost: Disable PM on probe failure
drm/panfrost: Set DMA masks earlier
drm/panfrost: Add sanity checks to submit IOCTL
drm/etnaviv: initialize idle mask before querying the HW db
drm: introduce a capability flag for syncobj timeline support
drm: report consistent errors when checking syncobj capibility
drm/nouveau/nouveau: forward error generated while resuming objects tree
drm/nouveau/fb/ramgk104: fix spelling mistake "sucessfully" -> "successfully"
drm/nouveau/i2c: Disable i2c bus access after ->fini()
drm/nouveau: Remove duplicate ACPI_VIDEO_NOTIFY_PROBE definition
...
Nicolai Stange discovered[1] that if live kernel patching is enabled, and the
function tracer started tracing the same function that was patched, the
conversion of the fentry call site during the translation of going from
calling the live kernel patch trampoline to the iterator trampoline, would
have as slight window where it didn't call anything.
As live kernel patching depends on ftrace to always call its code (to
prevent the function being traced from being called, as it will redirect
it). This small window would allow the old buggy function to be called, and
this can cause undesirable results.
Nicolai submitted new patches[2] but these were controversial. As this is
similar to the static call emulation issues that came up a while ago[3].
But after some debate[4][5] adding a gap in the stack when entering the
breakpoint handler allows for pushing the return address onto the stack to
easily emulate a call.
[1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de
[2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de
[3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.1541711457.git.jpoimboe@redhat.com
[4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=A@mail.gmail.com
[5] http://lkml.kernel.org/r/CAHk-=wjvQxY4DvPrJ6haPgAa6b906h=MwZXO6G8OtiTGe=N7_w@mail.gmail.com
[
Live kernel patching is not implemented on x86_32, thus the emulate
calls are only for x86_64.
]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: the arch/x86 maintainers <x86@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: b700e7f03d ("livepatch: kernel: add support for live patching")
Tested-by: Nicolai Stange <nstange@suse.de>
Reviewed-by: Nicolai Stange <nstange@suse.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Changed to only implement emulated calls for x86_64 ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This function is referenced from assembler, so in LTO
it needs to be global and visible to not be optimized away.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20190330004743.29541-7-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Here is the "big" set of driver core patches for 5.2-rc1
There are a number of ACPI patches in here as well, as Rafael said they
should go through this tree due to the driver core changes they
required. They have all been acked by the ACPI developers.
There are also a number of small subsystem-specific changes in here, due
to some changes to the kobject core code. Those too have all been acked
by the various subsystem maintainers.
As for content, it's pretty boring outside of the ACPI changes:
- spdx cleanups
- kobject documentation updates
- default attribute groups for kobjects
- other minor kobject/driver core fixes
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXNHDbw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynDAgCfbb4LBR6I50wFXb8JM/R6cAS7qrsAn1unshKV
8XCYcif2RxjtdJWXbjdm
=/rLh
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core/kobject updates from Greg KH:
"Here is the "big" set of driver core patches for 5.2-rc1
There are a number of ACPI patches in here as well, as Rafael said
they should go through this tree due to the driver core changes they
required. They have all been acked by the ACPI developers.
There are also a number of small subsystem-specific changes in here,
due to some changes to the kobject core code. Those too have all been
acked by the various subsystem maintainers.
As for content, it's pretty boring outside of the ACPI changes:
- spdx cleanups
- kobject documentation updates
- default attribute groups for kobjects
- other minor kobject/driver core fixes
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
kobject: clean up the kobject add documentation a bit more
kobject: Fix kernel-doc comment first line
kobject: Remove docstring reference to kset
firmware_loader: Fix a typo ("syfs" -> "sysfs")
kobject: fix dereference before null check on kobj
Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
init/config: Do not select BUILD_BIN2C for IKCONFIG
Provide in-kernel headers to make extending kernel easier
kobject: Improve doc clarity kobject_init_and_add()
kobject: Improve docs for kobject_add/del
driver core: platform: Fix the usage of platform device name(pdev->name)
livepatch: Replace klp_ktype_patch's default_attrs with groups
cpufreq: schedutil: Replace default_attrs field with groups
padata: Replace padata_attr_type default_attrs field with groups
irqdesc: Replace irq_kobj_type's default_attrs field with groups
net-sysfs: Replace ktype default_attrs field with groups
block: Replace all ktype default_attrs with groups
samples/kobject: Replace foo_ktype's default_attrs field with groups
kobject: Add support for default attribute groups to kobj_type
driver core: Postpone DMA tear-down until after devres release for probe failure
...
https://lore.kernel.org/linux-fsdevel/CAHk-=wg1tFzcaX2v9Z91vPJiBR486ddW5MtgDL02-fOen2F0Aw@mail.gmail.com/T/#m5b2d9ad3aeacea4bd6aa1964468ac074bf3aa5bf
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEECVWwJCUO7/z+QjZbZsp4hBP2dUkFAlzR1UgQHGtpcnJAbmV4
ZWRpLmNvbQAKCRBmyniEE/Z1SZBiEACGw1LzUmjV9eBYFjqaUkgX/Zfcu42D4Ek2
8MuWnNdRabtpGQq0LccYlfoL3yH5xECp14IkCgJvkjqoZ3CcqWcv6uDxf0WtnUqZ
wPx1RYZykb4RZj2A6/ndhInReP4AlXICyTVulKb+BquVkemMvmXX8k+bkr/msKfT
9jdKWFIn+ANNABt3y2D7ywZvs9mkxIx+Fti+tVV4BFBeGfUuj4ArZBOHnngRnIk/
XYlQ7FVzENSPSB+3GvL34jTGEzo8suPHKhHQlIhtcd5hwzVRZKE2sdVXsCc6/WbY
YnT32gmT1/+cUuDl1mZSiQY5R4Xkb07k6/jNrdmjQpwmWbZu90cuRhb+JBXwnmjZ
2Wgy3sfwYISDxtePukg1iYePlHlVlGTYqMo3AQrTBs/gEwCKWrsKQb98mRxlf1YK
e2mdtmq6upYoorLFQesfRgrCg4GTBiPkrR3amXsFgJ2O5fhV6R98ZdGSv4kip19f
ZNoc/t1EtKGwyAJwjINduv36E3RSHODWwSPtSnmSS1ieCGToY1SI3bVUkFM4C0tO
5GMdSugHgXRGGVbTd/VftndJm6Wtj8b1j8c/1Vh04Q8qbKKJDRTDzAbK1v8oLaDh
UXAKMIc8uY4caZy3/bTAB2Ou9dibrSi8Oc+LwZqJlwIcbkwn/IGNvmwtWv4ehorE
N7EhCFZsFQ==
=Mavg
-----END PGP SIGNATURE-----
Merge tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux
Pull stream_open conversion from Kirill Smelkov:
- remove unnecessary double nonseekable_open from drivers/char/dtlk.c
as noticed by Pavel Machek while reviewing nonseekable_open ->
stream_open mass conversion.
- the mass conversion patch promised in commit 10dce8af34 ("fs:
stream_open - opener for stream-like files so that read and write can
run simultaneously without deadlock") and is automatically generated
by running
$ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci
I've verified each generated change manually - that it is correct to
convert - and each other nonseekable_open instance left - that it is
either not correct to convert there, or that it is not converted due
to current stream_open.cocci limitations. More details on this in the
patch.
- finally, change VFS to pass ppos=NULL into .read/.write for files
that declare themselves streams. It was suggested by Rasmus Villemoes
and makes sure that if ppos starts to be erroneously used in a stream
file, such bug won't go unnoticed and will produce an oops instead of
creating illusion of position change being taken into account.
Note: this patch does not conflict with "fuse: Add FOPEN_STREAM to
use stream_open()" that will be hopefully coming via FUSE tree,
because fs/fuse/ uses new-style .read_iter/.write_iter, and for these
accessors position is still passed as non-pointer kiocb.ki_pos .
* tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux:
vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files
*: convert stream-like files from nonseekable_open -> stream_open
dtlk: remove double call to nonseekable_open
Pull x86 FPU state handling updates from Borislav Petkov:
"This contains work started by Rik van Riel and brought to fruition by
Sebastian Andrzej Siewior with the main goal to optimize when to load
FPU registers: only when returning to userspace and not on every
context switch (while the task remains in the kernel).
In addition, this optimization makes kernel_fpu_begin() cheaper by
requiring registers saving only on the first invocation and skipping
that in following ones.
What is more, this series cleans up and streamlines many aspects of
the already complex FPU code, hopefully making it more palatable for
future improvements and simplifications.
Finally, there's a __user annotations fix from Jann Horn"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails
x86/pkeys: Add PKRU value to init_fpstate
x86/fpu: Restore regs in copy_fpstate_to_sigframe() in order to use the fastpath
x86/fpu: Add a fastpath to copy_fpstate_to_sigframe()
x86/fpu: Add a fastpath to __fpu__restore_sig()
x86/fpu: Defer FPU state load until return to userspace
x86/fpu: Merge the two code paths in __fpu__restore_sig()
x86/fpu: Restore from kernel memory on the 64-bit path too
x86/fpu: Inline copy_user_to_fpregs_zeroing()
x86/fpu: Update xstate's PKRU value on write_pkru()
x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
x86/entry: Add TIF_NEED_FPU_LOAD
x86/fpu: Eager switch PKRU state
x86/pkeys: Don't check if PKRU is zero before writing it
x86/fpu: Only write PKRU if it is different from current
x86/pkeys: Provide *pkru() helpers
x86/fpu: Use a feature number instead of mask in two more helpers
x86/fpu: Make __raw_xsave_addr() use a feature number instead of mask
x86/fpu: Add an __fpregs_load_activate() internal helper
...
Pull RAS updates from Borislav Petkov:
- Support for varying MCA bank numbers per CPU: this is in preparation
for future CPU enablement (Yazen Ghannam)
- MCA banks read race fix (Tony Luck)
- Facility to filter MCEs which should not be logged (Yazen Ghannam)
- The usual round of cleanups and fixes
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models
x86/MCE: Add an MCE-record filtering function
RAS/CEC: Increment cec_entered under the mutex lock
x86/mce: Fix debugfs_simple_attr.cocci warnings
x86/mce: Remove mce_report_event()
x86/mce: Handle varying MCA bank counts
x86/mce: Fix machine_check_poll() tests for error types
MAINTAINERS: Fix file pattern for X86 MCE INFRASTRUCTURE
x86/MCE: Group AMD function prototypes in <asm/mce.h>
- Fix the handling of Performance and Energy Bias Hint (EPB) on
Intel processors and expose it to user space via sysfs to avoid
having to access it through the generic MSR I/F (Rafael Wysocki).
- Improve the handling of global turbo changes made by the platform
firmware in the intel_pstate driver (Rafael Wysocki).
- Convert some slow-path static_cpu_has() callers to boot_cpu_has()
in cpufreq (Borislav Petkov).
- Fix the frequency calculation loop in the armada-37xx cpufreq
driver (Gregory CLEMENT).
- Fix possible object reference leaks in multuple cpufreq drivers
(Wen Yang).
- Fix kerneldoc comment in the centrino cpufreq driver (dongjian).
- Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
Kumar).
- Add support for lx2160a and ls1028a to the qoriq cpufreq driver
(Vabhav Sharma, Yuantian Tang).
- Fix kobject memory leak in the cpufreq core (Viresh Kumar).
- Simplify the IOwait boosting in the schedutil cpufreq governor
and rework the TSC cpufreq notifier on x86 (Rafael Wysocki).
- Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).
- Improve the cpufreq documentation, add SPDX license tags to
some PM documentation files and unify copyright notices in
them (Rafael Wysocki).
- Add support for "CPU" domains to the generic power domains (genpd)
framework and provide low-level PSCI firmware support for that
feature (Ulf Hansson).
- Rearrange the PSCI firmware support code and add support for
SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).
- Improve genpd support for devices in multiple power domains (Ulf
Hansson).
- Unify target residency for the AFTR and coupled AFTR states in the
exynos cpuidle driver (Marek Szyprowski).
- Introduce new helper routine in the operating performance points
(OPP) framework (Andrew-sh.Cheng).
- Add support for passing on-die termination (ODT) and auto power
down parameters from the kernel to Trusted Firmware-A (TF-A) to
the rk3399_dmc devfreq driver (Enric Balletbo i Serra).
- Add tracing to devfreq (Lukasz Luba).
- Make the exynos-bus devfreq driver suspend all devices on system
shutdown (Marek Szyprowski).
- Fix a few minor issues in the devfreq subsystem and clean it up
somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
Saravana Kannan, Yangtao Li).
- Improve system wakeup diagnostics (Stephen Boyd).
- Rework filesystem sync messages emitted during system suspend and
hibernation (Harry Pan).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAlzQEwUSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxxXwP/jrxikIXdCOV3CJVioV0NetyebwlOqYp
UsIA7lQBfZ/DY6dHw/oKuAT9LP01vcFg6XGe83Alkta9qczR5KZ/MYHFNSZXjXjL
kEvIMBCS/oykaBuW+Xn9am8Ke3Yq/rBSTKWVom3vzSQY0qvZ9GBwPDrzw+k63Zhz
P3afB4ThyY0e9ftgw4HvSSNm13Kn0ItUIQOdaLatXMMcPqP5aAdnUma5Ibinbtpp
rpTHuHKYx7MSjaCg6wl3kKTJeWbQP4wYO2ISZqH9zEwQgdvSHeFAvfPKTegUkmw9
uUsQnPD1JvdglOKovr2muehD1Ur+zsjKDf2OKERkWsWXHPyWzA/AqaVv1mkkU++b
KaWaJ9pE86kGlJ3EXwRbGfV0dM5rrl+dUUQW6nPI1XJnIOFlK61RzwAbqI26F0Mz
AlKxY4jyPLcM3SpQz9iILqyzHQqB67rm29XvId/9scoGGgoqEI4S+v6LYZqI3Vx6
aeSRu+Yof7p5w4Kg5fODX+HzrtMnMrPmLUTXhbExfsYZMi7hXURcN6s+tMpH0ckM
4yiIpnNGCKUSV4vxHBm8XJdAuUnR4Vcz++yFslszgDVVvw5tkvF7SYeHZ6HqcQVm
af9HdWzx3qajs/oyBwdRBedZYDnP1joC5donBI2ofLeF33NA7TEiPX8Zebw8XLkv
fNikssA7PGdv
=nY9p
-----END PGP SIGNATURE-----
Merge tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These fix the (Intel-specific) Performance and Energy Bias Hint (EPB)
handling and expose it to user space via sysfs, fix and clean up
several cpufreq drivers, add support for two new chips to the qoriq
cpufreq driver, fix, simplify and clean up the cpufreq core and the
schedutil governor, add support for "CPU" domains to the generic power
domains (genpd) framework and provide low-level PSCI firmware support
for that feature, fix the exynos cpuidle driver and fix a couple of
issues in the devfreq subsystem and clean it up.
Specifics:
- Fix the handling of Performance and Energy Bias Hint (EPB) on Intel
processors and expose it to user space via sysfs to avoid having to
access it through the generic MSR I/F (Rafael Wysocki).
- Improve the handling of global turbo changes made by the platform
firmware in the intel_pstate driver (Rafael Wysocki).
- Convert some slow-path static_cpu_has() callers to boot_cpu_has()
in cpufreq (Borislav Petkov).
- Fix the frequency calculation loop in the armada-37xx cpufreq
driver (Gregory CLEMENT).
- Fix possible object reference leaks in multuple cpufreq drivers
(Wen Yang).
- Fix kerneldoc comment in the centrino cpufreq driver (dongjian).
- Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
Kumar).
- Add support for lx2160a and ls1028a to the qoriq cpufreq driver
(Vabhav Sharma, Yuantian Tang).
- Fix kobject memory leak in the cpufreq core (Viresh Kumar).
- Simplify the IOwait boosting in the schedutil cpufreq governor and
rework the TSC cpufreq notifier on x86 (Rafael Wysocki).
- Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).
- Improve the cpufreq documentation, add SPDX license tags to some PM
documentation files and unify copyright notices in them (Rafael
Wysocki).
- Add support for "CPU" domains to the generic power domains (genpd)
framework and provide low-level PSCI firmware support for that
feature (Ulf Hansson).
- Rearrange the PSCI firmware support code and add support for
SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).
- Improve genpd support for devices in multiple power domains (Ulf
Hansson).
- Unify target residency for the AFTR and coupled AFTR states in the
exynos cpuidle driver (Marek Szyprowski).
- Introduce new helper routine in the operating performance points
(OPP) framework (Andrew-sh.Cheng).
- Add support for passing on-die termination (ODT) and auto power
down parameters from the kernel to Trusted Firmware-A (TF-A) to the
rk3399_dmc devfreq driver (Enric Balletbo i Serra).
- Add tracing to devfreq (Lukasz Luba).
- Make the exynos-bus devfreq driver suspend all devices on system
shutdown (Marek Szyprowski).
- Fix a few minor issues in the devfreq subsystem and clean it up
somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
Saravana Kannan, Yangtao Li).
- Improve system wakeup diagnostics (Stephen Boyd).
- Rework filesystem sync messages emitted during system suspend and
hibernation (Harry Pan)"
* tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
cpufreq: Fix kobject memleak
cpufreq: armada-37xx: fix frequency calculation for opp
cpufreq: centrino: Fix centrino_setpolicy() kerneldoc comment
cpufreq: qoriq: add support for lx2160a
x86: tsc: Rework time_cpufreq_notifier()
PM / Domains: Allow to attach a CPU via genpd_dev_pm_attach_by_id|name()
PM / Domains: Search for the CPU device outside the genpd lock
PM / Domains: Drop unused in-parameter to some genpd functions
PM / Domains: Use the base device for driver_deferred_probe_check_state()
cpufreq: qoriq: Add ls1028a chip support
PM / Domains: Enable genpd_dev_pm_attach_by_id|name() for single PM domain
PM / Domains: Allow OF lookup for multi PM domain case from ->attach_dev()
PM / Domains: Don't kfree() the virtual device in the error path
cpufreq: Move ->get callback check outside of __cpufreq_get()
PM / Domains: remove unnecessary unlikely()
cpufreq: Remove needless bios_limit check in show_bios_limit()
drivers/cpufreq/acpi-cpufreq.c: This fixes the following checkpatch warning
firmware/psci: add support for SYSTEM_RESET2
PM / devfreq: add tracing for scheduling work
trace: events: add devfreq trace event file
...
Pull x86 microcode loading update from Borislav Petkov:
"A nice Intel microcode blob loading cleanup which gets rid of the ugly
memcpy wrappers and switches the driver to use the iov_iter API. By
Jann Horn.
In addition, the /dev/cpu/microcode interface is finally deprecated as
it is inadequate for the same reasons the late microcode loading is"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Deprecate MICROCODE_OLD_INTERFACE
x86/microcode: Fix the ancient deprecated microcode loading method
x86/microcode/intel: Refactor Intel microcode blob loading
Pull x86 topology updates from Ingo Molnar:
"Two main changes: preparatory changes for Intel multi-die topology
support, plus a syslog message tweak"
* 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Make DEBUG_HOTPLUG_CPU0 pr_info() more descriptive
x86/smpboot: Rename match_die() to match_pkg()
topology: Simplify cputopology.txt formatting and wording
x86/topology: Fix documentation typo
Pull x86 timer updates from Ingo Molnar:
"Two changes: an LTO improvement, plus the new 'nowatchdog' boot option
to disable the clocksource watchdog"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/timer: Don't inline __const_udelay()
x86/tsc: Add option to disable tsc clocksource watchdog
Pull x86 platform updates from Ingo Molnar:
"Smaller update for Hyper-V to support EOI assist, plus LTO fixes"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kvm: Make steal_time visible
x86/hyperv: Make hv_vcpu_is_preempted() visible
x86/hyper-v: Implement EOI assist
Pull x86 mm updates from Ingo Molnar:
"The changes in here are:
- text_poke() fixes and an extensive set of executability lockdowns,
to (hopefully) eliminate the last residual circumstances under
which we are using W|X mappings even temporarily on x86 kernels.
This required a broad range of surgery in text patching facilities,
module loading, trampoline handling and other bits.
- tweak page fault messages to be more informative and more
structured.
- remove DISCONTIGMEM support on x86-32 and make SPARSEMEM the
default.
- reduce KASLR granularity on 5-level paging kernels from 512 GB to
1 GB.
- misc other changes and updates"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/mm: Initialize PGD cache during mm initialization
x86/alternatives: Add comment about module removal races
x86/kprobes: Use vmalloc special flag
x86/ftrace: Use vmalloc special flag
bpf: Use vmalloc special flag
modules: Use vmalloc special flag
mm/vmalloc: Add flag for freeing of special permsissions
mm/hibernation: Make hibernation handle unmapped pages
x86/mm/cpa: Add set_direct_map_*() functions
x86/alternatives: Remove the return value of text_poke_*()
x86/jump-label: Remove support for custom text poker
x86/modules: Avoid breaking W^X while loading modules
x86/kprobes: Set instruction page as executable
x86/ftrace: Set trampoline pages as executable
x86/kgdb: Avoid redundant comparison of patched code
x86/alternatives: Use temporary mm for text poking
x86/alternatives: Initialize temporary mm for patching
fork: Provide a function for copying init_mm
uprobes: Initialize uprobes earlier
x86/mm: Save debug registers when loading a temporary mm
...
Pull x86 kdump update from Ingo Molnar:
"This includes two changes:
- Raise the crash kernel reservation limit from from ~896MB to ~4GB.
Only very old (and already known-broken) kexec-tools is supposed to
be affected by this negatively.
- Allow higher than 4GB crash kernel allocations when low allocations
fail"
* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kdump: Fall back to reserve high crashkernel memory
x86/kdump: Have crashkernel=X reserve under 4G by default
Pull x86 irq updates from Ingo Molnar:
"Here are the main changes in this tree:
- Introduce x86-64 IRQ/exception/debug stack guard pages to detect
stack overflows immediately and deterministically.
- Clean up over a decade worth of cruft accumulated.
The outcome of this should be more clear-cut faults/crashes when any
of the low level x86 CPU stacks overflow, instead of silent memory
corruption and sporadic failures much later on"
* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
x86/irq: Fix outdated comments
x86/irq/64: Remove stack overflow debug code
x86/irq/64: Remap the IRQ stack with guard pages
x86/irq/64: Split the IRQ stack into its own pages
x86/irq/64: Init hardirq_stack_ptr during CPU hotplug
x86/irq/32: Handle irq stack allocation failure proper
x86/irq/32: Invoke irq_ctx_init() from init_IRQ()
x86/irq/64: Rename irq_stack_ptr to hardirq_stack_ptr
x86/irq/32: Rename hard/softirq_stack to hard/softirq_stack_ptr
x86/irq/32: Make irq stack a character array
x86/irq/32: Define IRQ_STACK_SIZE
x86/dumpstack/64: Speedup in_exception_stack()
x86/exceptions: Split debug IST stack
x86/exceptions: Enable IST guard pages
x86/exceptions: Disconnect IST index and stack order
x86/cpu: Remove orig_ist array
x86/cpu: Prepare TSS.IST setup for guard pages
x86/dumpstack/64: Use cpu_entry_area instead of orig_ist
x86/irq/64: Use cpu entry area instead of orig_ist
x86/traps: Use cpu_entry_area instead of orig_ist
...
Pull x86 cpu updates from Ingo Molnar:
"Two changes: a Hygon CPU fix, and an optimization Centaur CPUs"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/power: Optimize C3 entry on Centaur CPUs
x86/CPU/hygon: Fix phys_proc_id calculation logic for multi-die processors
Pull x86 cleanups from Ingo Molnar:
"A handful of cleanups: dma-ops cleanups, missing boot time kcalloc()
check, a Sparse fix and use struct_size() to simplify a vzalloc()
call"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pci: Clean up usage of X86_DEV_DMA_OPS
x86/Kconfig: Remove the unused X86_DMA_REMAP KConfig symbol
x86/kexec/crash: Use struct_size() in vzalloc()
x86/mm/tlb: Define LOADED_MM_SWITCHING with pointer-sized number
x86/platform/uv: Fix missing checks of kcalloc() return values
Pull x86 cache QoS updates from Ingo Molnar:
"An RDT cleanup and a fix for RDT initialization of new resource
groups"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Initialize a new resource group with default MBA values
x86/resctrl: Move per RDT domain initialization to a separate function
Pull x86 asm updates from Ingo Molnar:
"This includes the following changes:
- cpu_has() cleanups
- sync_bitops.h modernization to the rmwcc.h facility, similarly to
bitops.h
- continued LTO annotations/fixes
- misc cleanups and smaller cleanups"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/um/vdso: Drop unnecessary cc-ldoption
x86/vdso: Rename variable to fix -Wshadow warning
x86/cpu/amd: Exclude 32bit only assembler from 64bit build
x86/asm: Mark all top level asm statements as .text
x86/build/vdso: Add FORCE to the build rule of %.so
x86/asm: Modernize sync_bitops.h
x86/mm: Convert some slow-path static_cpu_has() callers to boot_cpu_has()
x86: Convert some slow-path static_cpu_has() callers to boot_cpu_has()
x86/asm: Clarify static_cpu_has()'s intended use
x86/uaccess: Fix implicit cast of __user pointer
x86/cpufeature: Remove __pure attribute to _static_cpu_has()
Pull x86 apic update from Ingo Molnar:
"A single commit which unifies the unnecessarily diverged
implementations of APIC timer initialization. As a result the
max_delta parameter is now consistently taken into account"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Unify duplicated local apic timer clockevent initialization
Pull perf updates from Ingo Molnar:
"The main kernel changes were:
- add support for Intel's "adaptive PEBS v4" - which embedds LBS data
in PEBS records and can thus batch up and reduce the IRQ (NMI) rate
significantly - reducing overhead and making call-graph profiling
less intrusive.
- add Intel CPU core and uncore support updates for Tremont, Icelake,
- extend the x86 PMU constraints scheduler with 'constraint ranges'
to better support Icelake hw constraints,
- make x86 call-chain support work better with CONFIG_FRAME_POINTER=y
- misc other changes
Tooling changes:
- updates to the main tools: 'perf record', 'perf trace', 'perf
stat'
- updated Intel and S/390 vendor events
- libtraceevent updates
- misc other updates and fixes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER
watchdog: Fix typo in comment
perf/x86/intel: Add Tremont core PMU support
perf/x86/intel/uncore: Add Intel Icelake uncore support
perf/x86/msr: Add Icelake support
perf/x86/intel/rapl: Add Icelake support
perf/x86/intel/cstate: Add Icelake support
perf/x86/intel: Add Icelake support
perf/x86: Support constraint ranges
perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them
perf/x86/intel: Support adaptive PEBS v4
perf/x86/intel/ds: Extract code of event update in short period
perf/x86/intel: Extract memory code PEBS parser for reuse
perf/x86: Support outputting XMM registers
perf/x86/intel: Force resched when TFA sysctl is modified
perf/core: Add perf_pmu_resched() as global function
perf/headers: Fix stale comment for struct perf_addr_filter
perf/core: Make perf_swevent_init_cpu() static
perf/x86: Add sanity checks to x86_schedule_events()
perf/x86: Optimize x86_schedule_events()
...
Pull EFI updates from Ingo Molnar:
"The changes in this cycle were:
- Squash a spurious warning when using the EFI framebuffer on a
non-EFI boot
- Use DMI data to annotate RAS memory errors on ARM just like we do
on Intel
- Followup cleanups for DMI
- libstub Makefile cleanups"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub/arm: Omit unneeded stripping of ksymtab/kcrctab sections
efi: Unify DMI setup code over the arm/arm64, ia64 and x86 architectures
efi/arm: Show SMBIOS bank/device location in CPER and GHES error logs
efifb: Omit memory map check on legacy boot
efi/libstub: Refactor the cmd_stubcopy Makefile command
Pull stack trace updates from Ingo Molnar:
"So Thomas looked at the stacktrace code recently and noticed a few
weirdnesses, and we all know how such stories of crummy kernel code
meeting German engineering perfection end: a 45-patch series to clean
it all up! :-)
Here's the changes in Thomas's words:
'Struct stack_trace is a sinkhole for input and output parameters
which is largely pointless for most usage sites. In fact if embedded
into other data structures it creates indirections and extra storage
overhead for no benefit.
Looking at all usage sites makes it clear that they just require an
interface which is based on a storage array. That array is either on
stack, global or embedded into some other data structure.
Some of the stack depot usage sites are outright wrong, but
fortunately the wrongness just causes more stack being used for
nothing and does not have functional impact.
Another oddity is the inconsistent termination of the stack trace
with ULONG_MAX. It's pointless as the number of entries is what
determines the length of the stored trace. In fact quite some call
sites remove the ULONG_MAX marker afterwards with or without nasty
comments about it. Not all architectures do that and those which do,
do it inconsistenly either conditional on nr_entries == 0 or
unconditionally.
The following series cleans that up by:
1) Removing the ULONG_MAX termination in the architecture code
2) Removing the ULONG_MAX fixups at the call sites
3) Providing plain storage array based interfaces for stacktrace
and stackdepot.
4) Cleaning up the mess at the callsites including some related
cleanups.
5) Removing the struct stack_trace based interfaces
This is not changing the struct stack_trace interfaces at the
architecture level, but it removes the exposure to the generic
code'"
* 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
x86/stacktrace: Use common infrastructure
stacktrace: Provide common infrastructure
lib/stackdepot: Remove obsolete functions
stacktrace: Remove obsolete functions
livepatch: Simplify stack trace retrieval
tracing: Remove the last struct stack_trace usage
tracing: Simplify stack trace retrieval
tracing: Make ftrace_trace_userstack() static and conditional
tracing: Use percpu stack trace buffer more intelligently
tracing: Simplify stacktrace retrieval in histograms
lockdep: Simplify stack trace handling
lockdep: Remove save argument from check_prev_add()
lockdep: Remove unused trace argument from print_circular_bug()
drm: Simplify stacktrace handling
dm persistent data: Simplify stack trace handling
dm bufio: Simplify stack trace retrieval
btrfs: ref-verify: Simplify stack trace retrieval
dma/debug: Simplify stracktrace retrieval
fault-inject: Simplify stacktrace retrieval
mm/page_owner: Simplify stack trace handling
...
Pull speculation mitigation update from Ingo Molnar:
"This adds the "mitigations=" bootline option, which offers a
cross-arch set of options that will work on x86, PowerPC and s390 that
will map to the arch specific option internally"
* 'core-speculation-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
s390/speculation: Support 'mitigations=' cmdline option
powerpc/speculation: Support 'mitigations=' cmdline option
x86/speculation: Support 'mitigations=' cmdline option
cpu/speculation: Add 'mitigations=' cmdline option
Pull rseq updates from Ingo Molnar:
"A cleanup and a fix to comments"
* 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Remove superfluous rseq_len from task_struct
rseq: Clean up comments by reflecting removal of event counter
Pull objtool updates from Ingo Molnar:
"This is a series from Peter Zijlstra that adds x86 build-time uaccess
validation of SMAP to objtool, which will detect and warn about the
following uaccess API usage bugs and weirdnesses:
- call to %s() with UACCESS enabled
- return with UACCESS enabled
- return with UACCESS disabled from a UACCESS-safe function
- recursive UACCESS enable
- redundant UACCESS disable
- UACCESS-safe disables UACCESS
As it turns out not leaking uaccess permissions outside the intended
uaccess functionality is hard when the interfaces are complex and when
such bugs are mostly dormant.
As a bonus we now also check the DF flag. We had at least one
high-profile bug in that area in the early days of Linux, and the
checking is fairly simple. The checks performed and warnings emitted
are:
- call to %s() with DF set
- return with DF set
- return with modified stack frame
- recursive STD
- redundant CLD
It's all x86-only for now, but later on this can also be used for PAN
on ARM and objtool is fairly cross-platform in principle.
While all warnings emitted by this new checking facility that got
reported to us were fixed, there might be GCC version dependent
warnings that were not reported yet - which we'll address, should they
trigger.
The warnings are non-fatal build warnings"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation
sched/x86_64: Don't save flags on context switch
objtool: Add Direction Flag validation
objtool: Add UACCESS validation
objtool: Fix sibling call detection
objtool: Rewrite alt->skip_orig
objtool: Add --backtrace support
objtool: Rewrite add_ignores()
objtool: Handle function aliases
objtool: Set insn->func for alternatives
x86/uaccess, kcov: Disable stack protector
x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP
x86/uaccess, ubsan: Fix UBSAN vs. SMAP
x86/uaccess, kasan: Fix KASAN vs SMAP
x86/smap: Ditch __stringify()
x86/uaccess: Introduce user_access_{save,restore}()
x86/uaccess, signal: Fix AC=1 bloat
x86/uaccess: Always inline user_access_begin()
x86/uaccess, xen: Suppress SMAP warnings
...
Using scripts/coccinelle/api/stream_open.cocci added in 10dce8af34
("fs: stream_open - opener for stream-like files so that read and write
can run simultaneously without deadlock"), search and convert to
stream_open all in-kernel nonseekable_open users for which read and
write actually do not depend on ppos and where there is no other methods
in file_operations which assume @offset access.
I've verified each generated change manually - that it is correct to convert -
and each other nonseekable_open instance left - that it is either not correct
to convert there, or that it is not converted due to current stream_open.cocci
limitations. The script also does not convert files that should be valid to
convert, but that currently have .llseek = noop_llseek or generic_file_llseek
for unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)
Among cases converted 14 were potentially vulnerable to read vs write deadlock
(see details in 10dce8af34):
drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/infiniband/core/user_mad.c:988:1-17: ERROR: umad_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/input/misc/uinput.c:401:1-17: ERROR: uinput_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
and the rest were just safe to convert to stream_open because their read and
write do not use ppos at all and corresponding file_operations do not
have methods that assume @offset file access(*):
arch/powerpc/platforms/52xx/mpc52xx_gpt.c:631:8-24: WARNING: mpc52xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/um/drivers/harddog_kern.c:88:8-24: WARNING: harddog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
arch/x86/kernel/cpu/microcode/core.c:430:33-49: WARNING: microcode_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/ds1620.c:215:8-24: WARNING: ds1620_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/dtlk.c:301:1-17: WARNING: dtlk_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/ipmi/ipmi_watchdog.c:840:9-25: WARNING: ipmi_wdog_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/pcmcia/scr24x_cs.c:95:8-24: WARNING: scr24x_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/char/tb0219.c:246:9-25: WARNING: tb0219_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/firewire/nosy.c:306:8-24: WARNING: nosy_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/hwmon/fschmd.c:840:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/hwmon/w83793.c:1344:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/infiniband/core/ucma.c:1747:8-24: WARNING: ucma_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/infiniband/core/ucm.c:1178:8-24: WARNING: ucm_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/infiniband/core/uverbs_main.c:1086:8-24: WARNING: uverbs_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/input/joydev.c:282:1-17: WARNING: joydev_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/pci/switch/switchtec.c:393:1-17: WARNING: switchtec_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/platform/chrome/cros_ec_debugfs.c:135:8-24: WARNING: cros_ec_console_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/rtc/rtc-ds1374.c:470:9-25: WARNING: ds1374_wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/rtc/rtc-m41t80.c:805:9-25: WARNING: wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/s390/char/tape_char.c:293:2-18: WARNING: tape_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/s390/char/zcore.c:194:8-24: WARNING: zcore_reipl_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/s390/crypto/zcrypt_api.c:528:8-24: WARNING: zcrypt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/spi/spidev.c:594:1-17: WARNING: spidev_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/staging/pi433/pi433_if.c:974:1-17: WARNING: pi433_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/acquirewdt.c:203:8-24: WARNING: acq_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/advantechwdt.c:202:8-24: WARNING: advwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/alim1535_wdt.c:252:8-24: WARNING: ali_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/alim7101_wdt.c:217:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ar7_wdt.c:166:8-24: WARNING: ar7_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/at91rm9200_wdt.c:113:8-24: WARNING: at91wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ath79_wdt.c:135:8-24: WARNING: ath79_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/bcm63xx_wdt.c:119:8-24: WARNING: bcm63xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/cpu5wdt.c:143:8-24: WARNING: cpu5wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/cpwd.c:397:8-24: WARNING: cpwd_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/eurotechwdt.c:319:8-24: WARNING: eurwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/f71808e_wdt.c:528:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/gef_wdt.c:232:8-24: WARNING: gef_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/geodewdt.c:95:8-24: WARNING: geodewdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ib700wdt.c:241:8-24: WARNING: ibwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ibmasr.c:326:8-24: WARNING: asr_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/indydog.c:80:8-24: WARNING: indydog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/intel_scu_watchdog.c:307:8-24: WARNING: intel_scu_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/iop_wdt.c:104:8-24: WARNING: iop_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/it8712f_wdt.c:330:8-24: WARNING: it8712f_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ixp4xx_wdt.c:68:8-24: WARNING: ixp4xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/ks8695_wdt.c:145:8-24: WARNING: ks8695wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/m54xx_wdt.c:88:8-24: WARNING: m54xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/machzwd.c:336:8-24: WARNING: zf_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/mixcomwd.c:153:8-24: WARNING: mixcomwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/mtx-1_wdt.c:121:8-24: WARNING: mtx1_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/mv64x60_wdt.c:136:8-24: WARNING: mv64x60_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/nuc900_wdt.c:134:8-24: WARNING: nuc900wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/nv_tco.c:164:8-24: WARNING: nv_tco_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pc87413_wdt.c:289:8-24: WARNING: pc87413_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd.c:698:8-24: WARNING: pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd.c:737:8-24: WARNING: pcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd_pci.c:581:8-24: WARNING: pcipcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd_pci.c:623:8-24: WARNING: pcipcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd_usb.c:488:8-24: WARNING: usb_pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pcwd_usb.c:527:8-24: WARNING: usb_pcwd_temperature_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pika_wdt.c:121:8-24: WARNING: pikawdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/pnx833x_wdt.c:119:8-24: WARNING: pnx833x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/rc32434_wdt.c:153:8-24: WARNING: rc32434_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/rdc321x_wdt.c:145:8-24: WARNING: rdc321x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/riowd.c:79:1-17: WARNING: riowd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sa1100_wdt.c:62:8-24: WARNING: sa1100dog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc60xxwdt.c:211:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc7240_wdt.c:139:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc8360.c:274:8-24: WARNING: sbc8360_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc_epx_c3.c:81:8-24: WARNING: epx_c3_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sbc_fitpc2_wdt.c:78:8-24: WARNING: fitpc2_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sb_wdog.c:108:1-17: WARNING: sbwdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sc1200wdt.c:181:8-24: WARNING: sc1200wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sc520_wdt.c:261:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/sch311x_wdt.c:319:8-24: WARNING: sch311x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/scx200_wdt.c:105:8-24: WARNING: scx200_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/smsc37b787_wdt.c:369:8-24: WARNING: wb_smsc_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/w83877f_wdt.c:227:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/w83977f_wdt.c:301:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wafer5823wdt.c:200:8-24: WARNING: wafwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/watchdog_dev.c:828:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdrtas.c:379:8-24: WARNING: wdrtas_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdrtas.c:445:8-24: WARNING: wdrtas_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt285.c:104:1-17: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt977.c:276:8-24: WARNING: wdt977_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt.c:424:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt.c:484:8-24: WARNING: wdt_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt_pci.c:464:8-24: WARNING: wdtpci_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
drivers/watchdog/wdt_pci.c:527:8-24: WARNING: wdtpci_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
net/batman-adv/log.c:105:1-17: WARNING: batadv_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
sound/core/control.c:57:7-23: WARNING: snd_ctl_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
sound/core/rawmidi.c:385:7-23: WARNING: snd_rawmidi_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
sound/core/seq/seq_clientmgr.c:310:7-23: WARNING: snd_seq_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
sound/core/timer.c:1428:7-23: WARNING: snd_timer_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
One can also recheck/review the patch via generating it with explanation comments included via
$ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci SPFLAGS="-D explain"
(*) This second group also contains cases with read/write deadlocks that
stream_open.cocci don't yet detect, but which are still valid to convert to
stream_open since ppos is not used. For example drivers/pci/switch/switchtec.c
calls wait_for_completion_interruptible() in its .read, but stream_open.cocci
currently detects only "wait_event*" as blocking.
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James R. Van Zandt" <jrv@vanzandt.mv.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Harald Welte <laforge@gnumonks.org>
Acked-by: Lubomir Rintel <lkundrak@v3.sk> [scr24x_cs]
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: David Herrmann <dh.herrmann@googlemail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Jean Delvare <jdelvare@suse.com>
Acked-by: Guenter Roeck <linux@roeck-us.net> [watchdog/* hwmon/*]
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Logan Gunthorpe <logang@deltatee.com> [drivers/pci/switch/switchtec]
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [drivers/pci/switch/switchtec]
Cc: Benson Leung <bleung@chromium.org>
Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> [platform/chrome]
Cc: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> [rtc/*]
Cc: Mark Brown <broonie@kernel.org>
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: Wan ZongShun <mcuos.com@gmail.com>
Cc: Zwane Mwaikambo <zwanem@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
In the compacted form, XSAVES may save only the XMM+SSE state but skip
FP (x87 state).
This is denoted by header->xfeatures = 6. The fastpath
(copy_fpregs_to_sigframe()) does that but _also_ initialises the FP
state (cwd to 0x37f, mxcsr as we do, remaining fields to 0).
The slowpath (copy_xstate_to_user()) leaves most of the FP
state untouched. Only mxcsr and mxcsr_flags are set due to
xfeatures_mxcsr_quirk(). Now that XFEATURE_MASK_FP is set
unconditionally, see
04944b793e ("x86: xsave: set FP, SSE bits in the xsave header in the user sigcontext"),
on return from the signal, random garbage is loaded as the FP state.
Instead of utilizing copy_xstate_to_user(), fault-in the user memory
and retry the fast path. Ideally, the fast path succeeds on the second
attempt but may be retried again if the memory is swapped out due
to memory pressure. If the user memory can not be faulted-in then
get_user_pages() returns an error so we don't loop forever.
Fault in memory via get_user_pages_unlocked() so
copy_fpregs_to_sigframe() succeeds without a fault.
Fixes: 69277c98f5 ("x86/fpu: Always store the registers in copy_fpstate_to_sigframe()")
Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190502171139.mqtegctsg35cir2e@linutronix.de
pfn_valid check is not sufficient because it only checks if a page has a struct
page or not, if "mem=" was passed to the kernel some valid pages won't have a
struct page. This means that if guests were assigned valid memory that lies
after the mem= boundary it will be passed uncached to the guest no matter what
the guest caching attributes are for this memory.
Introduce a new function e820__mapped_raw_any which is equivalent to
e820__mapped_any but uses the original e820 unmodified and use it to
identify real *RAM*.
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a comment to clarify that users of text_poke() must ensure that
no races with module removal take place.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-22-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
permissioned memory in vmalloc and remove places where memory was set NX
and RW before freeing which is no longer needed.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-21-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
permissioned memory in vmalloc and remove places where memory was set NX
and RW before freeing which is no longer needed.
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-20-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are only two types of text poking: early and breakpoint based. The use
of a function pointer to perform text poking complicates the code and is
probably inefficient due to the use of indirect branches.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-13-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When modules and BPF filters are loaded, there is a time window in
which some memory is both writable and executable. An attacker that has
already found another vulnerability (e.g., a dangling pointer) might be
able to exploit this behavior to overwrite kernel code. Prevent having
writable executable PTEs in this stage.
In addition, avoiding having W+X mappings can also slightly simplify the
patching of modules code on initialization (e.g., by alternatives and
static-key), as would be done in the next patch. This was actually the
main motivation for this patch.
To avoid having W+X mappings, set them initially as RW (NX) and after
they are set as RO set them as X as well. Setting them as executable is
done as a separate step to avoid one core in which the old PTE is cached
(hence writable), and another which sees the updated PTE (executable),
which would break the W^X protection.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Set the page as executable after allocation. This patch is a
preparatory patch for a following patch that makes module allocated
pages non-executable.
While at it, do some small cleanup of what appears to be unnecessary
masking.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-11-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since alloc_module() will not set the pages as executable soon, set
ftrace trampoline pages as executable after they are allocated.
For the time being, do not change ftrace to use the text_poke()
interface. As a result, ftrace still breaks W^X.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-10-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
text_poke() already ensures that the written value is the correct one
and fails if that is not the case. There is no need for an additional
comparison. Remove it.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-9-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
text_poke() can potentially compromise security as it sets temporary
PTEs in the fixmap. These PTEs might be used to rewrite the kernel code
from other cores accidentally or maliciously, if an attacker gains the
ability to write onto kernel memory.
Moreover, since remote TLBs are not flushed after the temporary PTEs are
removed, the time-window in which the code is writable is not limited if
the fixmap PTEs - maliciously or accidentally - are cached in the TLB.
To address these potential security hazards, use a temporary mm for
patching the code.
Finally, text_poke() is also not conservative enough when mapping pages,
as it always tries to map 2 pages, even when a single one is sufficient.
So try to be more conservative, and do not map more than needed.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-8-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To prevent improper use of the PTEs that are used for text patching, the
next patches will use a temporary mm struct. Initailize it by copying
the init mm.
The address that will be used for patching is taken from the lower area
that is usually used for the task memory. Doing so prevents the need to
frequently synchronize the temporary-mm (e.g., when BPF programs are
installed), since different PGDs are used for the task memory.
Finally, randomize the address of the PTEs to harden against exploits
that use these PTEs.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: ard.biesheuvel@linaro.org
Cc: deneen.t.dock@intel.com
Cc: kernel-hardening@lists.openwall.com
Cc: kristen@linux.intel.com
Cc: linux_dti@icloud.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190426232303.28381-8-nadav.amit@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no apparent reason not to use text_poke_early() during
early-init, since no patching of code that might be on the stack is done
and only a single core is running.
This is required for the next patches that would set a temporary mm for
text poking, and this mm is only initialized after some static-keys are
enabled/disabled.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-3-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
text_mutex is currently expected to be held before text_poke() is
called, but kgdb does not take the mutex, and instead *supposedly*
ensures the lock is not taken and will not be acquired by any other core
while text_poke() is running.
The reason for the "supposedly" comment is that it is not entirely clear
that this would be the case if gdb_do_roundup is zero.
Create two wrapper functions, text_poke() and text_poke_kgdb(), which do
or do not run the lockdep assertion respectively.
While we are at it, change the return code of text_poke() to something
meaningful. One day, callers might actually respect it and the existing
BUG_ON() when patching fails could be removed. For kgdb, the return
value can actually be used.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9222f60650 ("x86/alternatives: Lockdep-enforce text_mutex in text_poke*()")
Link: https://lkml.kernel.org/r/20190426001143.4983-2-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It's used as 'type' in almost every paravirt patching function, so standardize
the field name from the somewhat weird 'instrtype' name to 'type'.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>