The bad_mode() handler is called if we encounter an uunknown exception,
with the expectation that the subsequent call to panic() will halt the
system. Unfortunately, if the exception calling bad_mode() is taken from
EL0, then the call to die() can end up killing the current user task and
calling schedule() instead of falling through to panic().
Remove the die() call altogether, since we really want to bring down the
machine in this "impossible" case.
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
force_signal_inject() is designed to send a fatal signal to userspace,
so WARN if the current pt_regs indicates a kernel context. This can
currently happen for the undefined instruction trap, so patch that up so
we always BUG() if we didn't have a handler.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The cpu errata and feature enable callbacks are only called via their
respective arm64_cpu_capabilities structure and therefore shouldn't
exist in the global namespace.
Move the PAN, RAS and cache maintenance emulation enable callbacks into
the same files as their corresponding arm64_cpu_capabilities structures,
making them static in the process.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On CPUs with support for PSTATE.SSBS, the kernel can toggle the SSBD
state without needing to call into firmware.
This patch hooks into the existing SSBD infrastructure so that SSBS is
used on CPUs that support it, but it's all made horribly complicated by
the very real possibility of big/little systems that don't uniformly
provide the new capability.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Rather than panic() when taking an undefined instruction exception from
EL1, allow a hook to be registered in case we want to emulate the
instruction, like we will for the SSBS PSTATE manipulation instructions.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that we're all merged nicely into mainline, there's no need to check
to see if PR_SPEC_STORE_BYPASS is defined.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Armv8.5 introduces a new PSTATE bit known as Speculative Store Bypass
Safe (SSBS) which can be used as a mitigation against Spectre variant 4.
Additionally, a CPU may provide instructions to manipulate PSTATE.SSBS
directly, so that userspace can toggle the SSBS control without trapping
to the kernel.
This patch probes for the existence of SSBS and advertise the new instructions
to userspace if they exist.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since commit 23c85094fe ("proc/kcore: add vmcoreinfo note to /proc/kcore")
the kernel has exported the vmcoreinfo PT_NOTE on /proc/kcore as well
as /proc/vmcore.
arm64 only exposes it's additional arch information via
arch_crash_save_vmcoreinfo() if built with CONFIG_KEXEC, as kdump was
previously the only user of vmcoreinfo.
Move this weak function to a separate file that is built at the same
time as its caller in kernel/crash_core.c. This ensures values like
'kimage_voffset' are always present in the vmcoreinfo PT_NOTE.
CC: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add a CRC32 feature bit and wire it up to the CPU id register so we
will be able to use alternatives patching for CRC32 operations.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Drop stackleak_check_alloca() for arm64 since the STACKLEAK gcc plugin now
doesn't track stack depth overflow caused by alloca().
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrival and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltxmb4VHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpD7E0P/0qn1IMtskaC7EglFCm72+NXe1CW
ZAtxTHzetjf7977dA3bVsg4gEKvVx5b3YuRT76u4hBoSa0rFJ8Q9iSC8wL4u9Idf
JUQjwVIUxMeGW5fR0VFDkd9SkDYtNGdjQcVl2I8UpV+lnLC/2Vfr4xR5qBad2pAQ
zjthdpQMjZWClyhPkOv6WjVsW0lNw0xDkZWgCViBY+TdT7Gmw/q8hmvj9TEwbMGT
7tmQl9MupQ2bLY8WuTiGA6eNiEZld9esJGthI43xGQDJl4Y3FeciIZWcBru20+wu
GnC3QS3FlmYlp2WuWcKU9lEGXhmoX/7/1WVhZkoMsIvi05c2JCxSxstK7QNfUaAH
8q2/Wc0fYIGm2owH+b1Mpn0w37GZtgl7Bxxzakg7B7Ko0q/EnO7z6XVup1/abKRU
NtUKlWIL7NDiHjHO6j0hBb3rGi7B3wo86P7GTPJb12Dg9EBF5DVhekXeGI/ChzE9
WIV1PxR0seSapzlJ92HHmWLAtcRLtXXesqcctmN4d2URBtsx9DEwo0Upiz//reYE
TBncQbtniVt2xXEl7sqNEYei75IxC3Dg1AgDL/zVQDl8PW0UvKo8Qb0cW7EnF9Vg
AcjD6R72dAgbqUMYOP0nriKxzXwa0Jls9aF3zBgcikKMGeyD6Z/Exlq4LexhSeuw
cWKsrQUYcLGKZPRN
=b6+A
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 4.19
- Support for Group0 interrupts in guests
- Cache management optimizations for ARMv8.4 systems
- Userspace interface for RAS, allowing error retrival and injection
- Fault path optimization
- Emulated physical timer fixes
- Random cleanups
- Fix boot on Hikey-960 by avoiding an IPI with interrupts disabled
- Fix address truncation in pfn_valid() implementation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJbdp+EAAoJELescNyEwWM0Ld8H/iJqjvPwNLRC0KGL/rCQJH70
D80qlNBnwlrs2eUJTeNeRVZC+t2l9vJIoT17W938WkjxV+DSGDsfFDy3/BQ7VTji
7e33mwFBNoH+feAfMYmzht3sRlvyZ0oqXSIq/GrdZ8a4Gg/6iNVz7K1kpboBVFXp
LFnFIN4I7mNwdl1nAyNmnU081MMWfyvgRB82Xd9eS00KCAm3ueHfkwBNcwkfulDg
RT2ZXPzwd3Yxsdy3Z+r1vyXMHAw2GjcYpL5pjvHf34zMdvqkk03sMsx2yReuSR1U
M6MpNCdZfWHgMlFWbsEoEOd0g0CF5s6TQK3hBqoUEE3AUVNrQ8ixZMip326axoQ=
=C2YW
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"A couple of arm64 fixes
- Fix boot on Hikey-960 by avoiding an IPI with interrupts disabled
- Fix address truncation in pfn_valid() implementation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid()
arm64: Avoid calling stop_machine() when patching jump labels
Patching a jump label involves patching a single instruction at a time,
swizzling between a branch and a NOP. The architecture treats these
instructions specially, so a concurrently executing CPU is guaranteed to
see either the NOP or the branch, rather than an amalgamation of the two
instruction encodings.
However, in order to guarantee that the new instruction is visible, it
is necessary to send an IPI to the concurrently executing CPU so that it
discards any previously fetched instructions from its pipeline. This
operation therefore cannot be completed from a context with IRQs
disabled, but this is exactly what happens on the jump label path where
the hotplug lock is held and irqs are subsequently disabled by
stop_machine_cpuslocked(). This results in a deadlock during boot on
Hikey-960.
Due to the architectural guarantees around patching NOPs and branches,
we don't actually need to stop_machine() at all on the jump label path,
so we can avoid the deadlock by using the "nosync" variant of our
instruction patching routine.
Fixes: 693350a799 ("arm64: insn: Don't fallback on nosync path for general insn patching")
Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Tuomas Tynkkynen <tuomas@tuxera.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
- verify depmod is installed before modules_install
- support build salt in case build ids must be unique between builds
- allow users to specify additional host compiler flags via HOST*FLAGS,
and rename internal variables to KBUILD_HOST*FLAGS
- update buildtar script to drop vax support, add arm64 support
- update builddeb script for better debarch support
- document the pit-fall of if_changed usage
- fix parallel build of UML with O= option
- make 'samples' target depend on headers_install to fix build errors
- remove deprecated host-progs variable
- add a new coccinelle script for refcount_t vs atomic_t check
- improve double-test coccinelle script
- misc cleanups and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbdFZ0AAoJED2LAQed4NsGcHYP/23txxk3GRP7O4UkfPw9Rtky
MHiXTgcoy2vbG+l12BgzWX+qFii8XTUe3dQtK4HnGQFUIBtEBV/hpZPJtxfgGSev
Zou5cv1kr5rNzTkCn//TG3O6/WIkTBCe2hahDCtmGDI3kd/cPK4dHbU/q6KpaqIJ
qzZYBXIvCeu2GM8idQoCRrwdMpgu1pBz1gz2sDje1yHH2toI7T6cXHRLQDgx+HPq
LIP7W9GUsoDdXjecvPD51LiW89E6BUxETBh5Ft9r9uzwB5ylQQMcw6Qyu2DiYDUX
PPsHCMiolYV+Ttcy+vj/67KOvKmEaFotssck+RD/xDCF17zKhRkup+YM8kPLHTVZ
TcAUZadbnT6U/s2W6GFwvVbN/P7cc3aif+aNCC/Pl23yagp3pydlSCocYxQgiVR7
/rx48haYDEgu/MJ1X0dOpSO0ErY7zu2OoAlNerW+D9QizwbP+WtZO/CJH8SxQRuN
dQ1xmyNrie+ODgi9tbc4eBrsb+1rioX927TP5MbJcfXt5CTsxDmIqop5XwyYIoQN
ZWWlzC8Ii3P2trAVpBgM2IEbngSxwr6T9Wbf1ScJnPKr/o1rq+pBk49cYstTz3kQ
OwJ8gPwUrkW4R+hlD7L6mL/WcrKzZBQS0Ij1QW2kVSEhRrsKo99psE1/rGehnHu9
KGB0LYYCqGSOHR4zOjg0
=VjfG
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- verify depmod is installed before modules_install
- support build salt in case build ids must be unique between builds
- allow users to specify additional host compiler flags via HOST*FLAGS,
and rename internal variables to KBUILD_HOST*FLAGS
- update buildtar script to drop vax support, add arm64 support
- update builddeb script for better debarch support
- document the pit-fall of if_changed usage
- fix parallel build of UML with O= option
- make 'samples' target depend on headers_install to fix build errors
- remove deprecated host-progs variable
- add a new coccinelle script for refcount_t vs atomic_t check
- improve double-test coccinelle script
- misc cleanups and fixes
* tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
coccicheck: return proper error code on fail
Coccinelle: doubletest: reduce side effect false positives
kbuild: remove deprecated host-progs variable
kbuild: make samples really depend on headers_install
um: clean up archheaders recipe
kbuild: add %asm-generic to no-dot-config-targets
um: fix parallel building with O= option
scripts: Add Python 3 support to tracing/draw_functrace.py
builddeb: Add automatic support for sh{3,4}{,eb} architectures
builddeb: Add automatic support for riscv* architectures
builddeb: Add automatic support for m68k architecture
builddeb: Add automatic support for or1k architecture
builddeb: Add automatic support for sparc64 architecture
builddeb: Add automatic support for mips{,64}r6{,el} architectures
builddeb: Add automatic support for mips64el architecture
builddeb: Add automatic support for ppc64 and powerpcspe architectures
builddeb: Introduce functions to simplify kconfig tests in set_debarch
builddeb: Drop check for 32-bit s390
builddeb: Change architecture detection fallback to use dpkg-architecture
builddeb: Skip architecture detection when KBUILD_DEBARCH is set
...
A bunch of good stuff in here:
- Wire up support for qspinlock, replacing our trusty ticket lock code
- Add an IPI to flush_icache_range() to ensure that stale instructions
fetched into the pipeline are discarded along with the I-cache lines
- Support for the GCC "stackleak" plugin
- Support for restartable sequences, plus an arm64 port for the selftest
- Kexec/kdump support on systems booting with ACPI
- Rewrite of our syscall entry code in C, which allows us to zero the
GPRs on entry from userspace
- Support for chained PMU counters, allowing 64-bit event counters to be
constructed on current CPUs
- Ensure scheduler topology information is kept up-to-date with CPU
hotplug events
- Re-enable support for huge vmalloc/IO mappings now that the core code
has the correct hooks to use break-before-make sequences
- Miscellaneous, non-critical fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJbbV41AAoJELescNyEwWM0WoEIALhrKtsIn6vqFlSs/w6aDuJL
cMWmFxjTaKLmIq2+cJIdFLOJ3CH80Pu9gB+nEv/k+cZdCTfUVKfRf28HTpmYWsht
bb4AhdHMC7yFW752BHk+mzJspeC8h/2Rm8wMuNVplZ3MkPrwo3vsiuJTofLhVL/y
BihlU3+5sfBvCYIsWnuEZIev+/I/s/qm1ASiqIcKSrFRZP6VTt5f9TC75vFI8seW
7yc3odKb0CArexB8yBjiPNziehctQF42doxQyL45hezLfWw4qdgHOSiwyiOMxEz9
Fwwpp8Tx33SKLNJgqoqYznGW9PhYJ7n2Kslv19uchJrEV+mds82vdDNaWRULld4=
=kQn6
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"A bunch of good stuff in here. Worth noting is that we've pulled in
the x86/mm branch from -tip so that we can make use of the core
ioremap changes which allow us to put down huge mappings in the
vmalloc area without screwing up the TLB. Much of the positive
diffstat is because of the rseq selftest for arm64.
Summary:
- Wire up support for qspinlock, replacing our trusty ticket lock
code
- Add an IPI to flush_icache_range() to ensure that stale
instructions fetched into the pipeline are discarded along with the
I-cache lines
- Support for the GCC "stackleak" plugin
- Support for restartable sequences, plus an arm64 port for the
selftest
- Kexec/kdump support on systems booting with ACPI
- Rewrite of our syscall entry code in C, which allows us to zero the
GPRs on entry from userspace
- Support for chained PMU counters, allowing 64-bit event counters to
be constructed on current CPUs
- Ensure scheduler topology information is kept up-to-date with CPU
hotplug events
- Re-enable support for huge vmalloc/IO mappings now that the core
code has the correct hooks to use break-before-make sequences
- Miscellaneous, non-critical fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
arm64: alternative: Use true and false for boolean values
arm64: kexec: Add comment to explain use of __flush_icache_range()
arm64: sdei: Mark sdei stack helper functions as static
arm64, kaslr: export offset in VMCOREINFO ELF notes
arm64: perf: Add cap_user_time aarch64
efi/libstub: Only disable stackleak plugin for arm64
arm64: drop unused kernel_neon_begin_partial() macro
arm64: kexec: machine_kexec should call __flush_icache_range
arm64: svc: Ensure hardirq tracing is updated before return
arm64: mm: Export __sync_icache_dcache() for xen-privcmd
drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory
arm64: Add support for STACKLEAK gcc plugin
arm64: Add stack information to on_accessible_stack
drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported
arm64: fix ACPI dependencies
rseq/selftests: Add support for arm64
arm64: acpi: fix alignment fault in accessing ACPI
efi/arm: map UEFI memory map even w/o runtime services enabled
efi/arm: preserve early mapping of UEFI memory map longer for BGRT
drivers: acpi: add dependency of EFI for arm64
...
Pull perf update from Thomas Gleixner:
"The perf crowd presents:
Kernel updates:
- Removal of jprobes
- Cleanup and consolidatation the handling of kprobes
- Cleanup and consolidation of hardware breakpoints
- The usual pile of fixes and updates to PMUs and event descriptors
Tooling updates:
- Updates and improvements all over the place. Nothing outstanding,
just the (good) boring incremental grump work"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
perf trace: Do not require --no-syscalls to suppress strace like output
perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h
perf tools: Allow overriding MAX_NR_CPUS at compile time
perf bpf: Show better message when failing to load an object
perf list: Unify metric group description format with PMU event description
perf vendor events arm64: Update ThunderX2 implementation defined pmu core events
perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet
perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet
perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet
perf cs-etm: Fix start tracing packet handling
perf build: Fix installation directory for eBPF
perf c2c report: Fix crash for empty browser
perf tests: Fix indexing when invoking subtests
perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args
perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg
perf trace beauty: Do not print NULL strarray entries
perf beauty: Add a generator for IPPROTO_ socket's protocol constants
tools include uapi: Grab a copy of linux/in.h
perf tests: Fix complex event name parsing
perf evlist: Fix error out while applying initial delay and LBR
...
Pull genirq updates from Thomas Gleixner:
"The irq departement provides:
- A synchronization fix for free_irq() to synchronize just the
removed interrupt thread on shared interrupt lines.
- Consolidate the multi low level interrupt entry handling and mvoe
it to the generic code instead of adding yet another copy for
RISC-V
- Refactoring of the ARM LPI allocator and LPI exposure to the
hypervisor
- Yet another interrupt chip driver for the JZ4725B SoC
- Speed up for /proc/interrupts as people seem to love reading this
file with high frequency
- Miscellaneous fixes and updates"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
irqchip/gic-v3-its: Make its_lock a raw_spin_lock_t
genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete
openrisc: Use the new GENERIC_IRQ_MULTI_HANDLER
arm64: Use the new GENERIC_IRQ_MULTI_HANDLER
ARM: Convert to GENERIC_IRQ_MULTI_HANDLER
irqchip: Port the ARM IRQ drivers to GENERIC_IRQ_MULTI_HANDLER
irqchip/gic-v3-its: Reduce minimum LPI allocation to 1 for PCI devices
dt-bindings: irqchip: renesas-irqc: Document r8a77980 support
dt-bindings: irqchip: renesas-irqc: Document r8a77470 support
irqchip/ingenic: Add support for the JZ4725B SoC
irqchip/stm32: Add exti0 translation for stm32mp1
genirq: Remove redundant NULL pointer check in __free_irq()
irqchip/gic-v3-its: Honor hypervisor enforced LPI range
irqchip/gic-v3: Expose GICD_TYPER in the rdist structure
irqchip/gic-v3-its: Drop chunk allocation compatibility
irqchip/gic-v3-its: Move minimum LPI requirements to individual busses
irqchip/gic-v3-its: Use full range of LPIs
irqchip/gic-v3-its: Refactor LPI allocator
genirq: Synchronize only with single thread on free_irq()
genirq: Update code comments wrt recycled thread_mask
...
Return statements in functions returning bool should use true or false
instead of an integer value. This code was detected with the help of
Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltoBXMVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDyUYP/1feAq3F7ZmhCIZka4c6y/m4EBpq
BjWEEgOAGMEyyB4s98flsRtZcEUxxp6CqEXo2FgCsd1Nj+og7oA7vwOlqy3aGzsi
9f/Z5Wi6SlG06lH5tmYNkyVbGk2tE3s2FzkH5Rg8qZGk+X3OCOdNs/+G20pYAkSp
ESePWSapbQUJSExJ1MqzfdHFidtVA1V+ev8BKdIp2ykl1NRae8LJeKHIbqac49Ym
JclfCLFpQM1M1ElB9j0E8hAvZhz10oOz7TtBR737O/1QEifVyFqGBckPzldvwIJM
zZ+nR+Yzj1ruD109xwaF1iKy9AinZWhiqrtN7UXJ3jwHtNih+sy0R6FQ38GMNoOC
0K02n/qStR5xglGr4BmAcWlOuFtBYWfz6HpSVMqaTWWmOxHEiqS6pXtEA+dV/YyI
wHLbo0YzpWTQm6t1+b/PoByAJ0/hOcD1nOD57b+NGjX7tZV0sGjpGsecvFhTSywh
BN3COBi9k/FOBrOTGDX1qUAI+mEf76vc2BAC+BkkoiiMg3WlY0E9qfQJguUxHdrb
0LS3lDZoHCNoz8RZLrUyenTT0NYGcjPGUTinMDJWG79VGXOWFexTDdCuX0kF90CK
1Zie3O6lrTYolmaiyLUxwukKp1SVUyoA5IpKVwfDJQYUhEfk27yvlzg2MBMcHDRA
uy3QSkmjx9vw/sAu
=gKw8
-----END PGP SIGNATURE-----
Merge tag 'irqchip-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
Now that we understand the deadlock arising from flush_icache_range()
on the kexec crash kernel path, add a comment to justify the use of
__flush_icache_range() here.
Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SDEI stack helper functions are only used by _on_sdei_stack() and
refer to symbols (e.g. sdei_stack_normal_ptr) that are only defined if
CONFIG_VMAP_STACK=y.
Mark these functions as static, so we don't run into errors at link-time
due to references to undefined symbols. Stick all the parameters onto
the same line whilst we're passing through.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Include KASLR offset in arm64 VMCOREINFO ELF notes to assist in
debugging. vmcore parsing in user-space already expects this value in
the notes and we are providing it for portability of those existing
tools with x86.
Ideally we would like core code to do this (so that way this
information won't be missed when an architecture adds KASLR support),
but mips has CONFIG_RANDOMIZE_BASE, and doesn't provide kaslr_offset(),
so I am not sure if this is needed for mips (and other such similar arch
cases in future). So, lets keep this architecture specific for now.
As an example of a user-space use-case, consider the
makedumpfile user-space utility which will need fixup to use this
KASLR offset to work with cases where we need to find a way to
translate symbol address from vmlinux to kernel run time address
in case of KASLR boot on arm64.
I have already submitted the makedumpfile user-space patch upstream
and the maintainer has suggested to wait for the kernel changes to be
included (see [0]).
I tested this on my qualcomm amberwing board both for KASLR and
non-KASLR boot cases:
Without this patch:
# cat > scrub.conf << EOF
[vmlinux]
erase jiffies
erase init_task.utime
for tsk in init_task.tasks.next within task_struct:tasks
erase tsk.utime
endfor
EOF
# makedumpfile --split -d 31 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}
readpage_elf: Attempt to read non-existent page at 0xffffa8a5bf180000.
readmem: type_addr: 1, addr:ffffa8a5bf180000, size:8
vaddr_to_paddr_arm64: Can't read pgd
readmem: Can't convert a virtual address(ffff0000092a542c) to physical
address.
readmem: type_addr: 0, addr:ffff0000092a542c, size:390
check_release: Can't get the address of system_utsname
After this patch check_release() is ok, and also we are able to erase
symbol from vmcore (I checked this with kernel 4.18.0-rc4+):
# makedumpfile --split -d 31 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}
The kernel version is not supported.
The makedumpfile operation may be incomplete.
Checking for memory holes : [100.0 %] \
Checking for memory holes : [100.0 %] |
Checking foExcluding unnecessary pages : [100.0 %]
\
Excluding unnecessary pages : [100.0 %] \
The dumpfiles are saved to dumpfile_1, dumpfile_2, and dumpfile_3.
makedumpfile Completed.
[0] https://www.spinics.net/lists/kexec/msg21195.html
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It is useful to get the running time of a thread. Doing so in an
efficient manner can be important for performance of user applications.
Avoiding system calls in `clock_gettime` when handling
CLOCK_THREAD_CPUTIME_ID is important. Other clocks are handled in the
VDSO, but CLOCK_THREAD_CPUTIME_ID falls back on the system call.
CLOCK_THREAD_CPUTIME_ID is not handled in the VDSO since it would have
costs associated with maintaining updated user space accessible time
offsets. These offsets have to be updated everytime the a thread is
scheduled/descheduled. However, for programs regularly checking the
running time of a thread, this is a performance improvement.
This patch takes a middle ground, and adds support for cap_user_time an
optional feature of the perf_event API. This way costs are only
incurred when the perf_event api is enabled. This is done the same way
as it is in x86.
Ultimately this allows calculating the thread running time in userspace
on aarch64 as follows (adapted from perf_event_open manpage):
u32 seq, time_mult, time_shift;
u64 running, count, time_offset, quot, rem, delta;
struct perf_event_mmap_page *pc;
pc = buf; // buf is the perf event mmaped page as documented in the API.
if (pc->cap_usr_time) {
do {
seq = pc->lock;
barrier();
running = pc->time_running;
count = readCNTVCT_EL0(); // Read ARM hardware clock.
time_offset = pc->time_offset;
time_mult = pc->time_mult;
time_shift = pc->time_shift;
barrier();
} while (pc->lock != seq);
quot = (count >> time_shift);
rem = count & (((u64)1 << time_shift) - 1);
delta = time_offset + quot * time_mult +
((rem * time_mult) >> time_shift);
running += delta;
// running now has the current nanosecond level thread time.
}
Summary of changes in the patch:
For aarch64 systems, make arch_perf_update_userpage update the timing
information stored in the perf_event page. Requiring the following
calculations:
- Calculate the appropriate time_mult, and time_shift factors to convert
ticks to nano seconds for the current clock frequency.
- Adjust the mult and shift factors to avoid shift factors of 32 bits.
(possibly unnecessary)
- The time_offset userspace should apply when doing calculations:
negative the current sched time (now), because time_running and
time_enabled fields of the perf_event page have just been updated.
Toggle bits to appropriate values:
- Enable cap_user_time
Signed-off-by: Michael O'Farrell <micpof@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
machine_kexec flushes the reboot_code_buffer from the icache
after stopping the other cpus.
Commit 3b8c9f1cdf ("arm64: IPI each CPU after invalidating the I-cache
for kernel mappings") added an IPI call to flush_icache_range, which
causes a hang here, so replace the call with __flush_icache_range
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We always run userspace with interrupts enabled, but with the recent
conversion of the syscall entry/exit code to C, we don't inform the
hardirq tracing code that interrupts are about to become enabled by
virtue of restoring the EL0 SPSR.
This patch ensures that trace_hardirqs_on() is called on the syscall
return path when we return to the assembly code with interrupts still
disabled.
Fixes: f37099b699 ("arm64: convert syscall trace logic to C")
Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull in arm perf updates, including support for 64-bit (chained) event
counters and some non-critical fixes for some of the system PMU drivers.
Signed-off-by: Will Deacon <will.deacon@arm.com>
This adds support for the STACKLEAK gcc plugin to arm64 by implementing
stackleak_check_alloca(), based heavily on the x86 version, and adding the
two helpers used by the stackleak common code: current_top_of_stack() and
on_thread_stack(). The stack erasure calls are made at syscall returns.
Additionally, this disables the plugin in hypervisor and EFI stub code,
which are out of scope for the protection.
Acked-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for enabling the stackleak plugin on arm64,
we need a way to get the bounds of the current stack. Extend
on_accessible_stack to get this information.
Acked-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
[will: folded in fix for allmodconfig build breakage w/ sdei]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit d3aec8a28b ("arm64: capabilities: Restrict KPTI
detection to boot-time CPUs") we rely on errata flags being already
populated during feature enumeration. The order of errata and
features was flipped as part of commit ed478b3f9e ("arm64:
capabilities: Group handling of features and errata workarounds").
Return to the orginal order of errata and feature evaluation to
ensure errata flags are present during feature evaluation.
Fixes: ed478b3f9e ("arm64: capabilities: Group handling of
features and errata workarounds")
CC: Suzuki K Poulose <suzuki.poulose@arm.com>
CC: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This is a fix against the issue that crash dump kernel may hang up
during booting, which can happen on any ACPI-based system with "ACPI
Reclaim Memory."
(kernel messages after panic kicked off kdump)
(snip...)
Bye!
(snip...)
ACPI: Core revision 20170728
pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
Internal error: Oops: 96000021 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
task: ffff000008d05180 task.stack: ffff000008cc0000
PC is at acpi_ns_lookup+0x25c/0x3c0
LR is at acpi_ds_load1_begin_op+0xa4/0x294
(snip...)
Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
Call trace:
(snip...)
[<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
[<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
[<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
[<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
[<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
[<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
[<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
[<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
[<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
[<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
[<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
[<ffff000008badc20>] acpi_early_init+0x9c/0xd0
[<ffff000008b70d50>] start_kernel+0x3b4/0x43c
Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
---[ end trace c46ed37f9651c58e ]---
Kernel panic - not syncing: Fatal exception
Rebooting in 10 seconds..
(diagnosis)
* This fault is a data abort, alignment fault (ESR=0x96000021)
during reading out ACPI table.
* Initial ACPI tables are normally stored in system ram and marked as
"ACPI Reclaim memory" by the firmware.
* After the commit f56ab9a5b7 ("efi/arm: Don't mark ACPI reclaim
memory as MEMBLOCK_NOMAP"), those regions are differently handled
as they are "memblock-reserved", without NOMAP bit.
* So they are now excluded from device tree's "usable-memory-range"
which kexec-tools determines based on a current view of /proc/iomem.
* When crash dump kernel boots up, it tries to accesses ACPI tables by
mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
since they are no longer part of mapped system ram.
* Given that ACPI accessor/helper functions are compiled in without
unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
any unaligned access to ACPI tables can cause a fatal panic.
With this patch, acpi_os_ioremap() always honors memory attribute
information provided by the firmware (EFI) and retaining cacheability
allows the kernel safe access to ACPI tables.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by and Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There has been some confusion around what is necessary to prevent kexec
overwriting important memory regions. memblock: reserve, or nomap?
Only memblock nomap regions are reported via /proc/iomem, kexec's
user-space doesn't know about memblock_reserve()d regions.
Until commit f56ab9a5b7 ("efi/arm: Don't mark ACPI reclaim memory
as MEMBLOCK_NOMAP") the ACPI tables were nomap, now they are reserved
and thus possible for kexec to overwrite with the new kernel or initrd.
But this was always broken, as the UEFI memory map is also reserved
and not marked as nomap.
Exporting both nomap and reserved memblock types is a nuisance as
they live in different memblock structures which we can't walk at
the same time.
Take a second walk over memblock.reserved and add new 'reserved'
subnodes for the memblock_reserved() regions that aren't already
described by the existing code. (e.g. Kernel Code)
We use reserve_region_with_split() to find the gaps in existing named
regions. This handles the gap between 'kernel code' and 'kernel data'
which is memblock_reserve()d, but already partially described by
request_standard_resources(). e.g.:
| 80000000-dfffffff : System RAM
| 80080000-80ffffff : Kernel code
| 81000000-8158ffff : reserved
| 81590000-8237efff : Kernel data
| a0000000-dfffffff : Crash kernel
| e00f0000-f949ffff : System RAM
reserve_region_with_split needs kzalloc() which isn't available when
request_standard_resources() is called, use an initcall.
Reported-by: Bhupesh Sharma <bhsharma@redhat.com>
Reported-by: Tyler Baicar <tbaicar@codeaurora.org>
Suggested-by: Akashi Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: James Morse <james.morse@arm.com>
Fixes: d28f6df130 ("arm64/kexec: Add core kexec support")
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
CC: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's possible for userspace to control idx. Sanitize idx when using it
as an array index, to inhibit the potential spectre-v1 write gadget.
Found by smatch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The vDSO needs to have a unique build id in a similar manner
to the kernel and modules. Use the build salt macro.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
syscall_trace_{enter,exit} are only called from C code, so drop the
asmlinkage qualifier from their definitions.
Signed-off-by: Will Deacon <will.deacon@arm.com>
To minimize the risk of userspace-controlled values being used under
speculation, this patch adds pt_regs based syscall wrappers for arm64,
which pass the minimum set of required userspace values to syscall
implementations. For each syscall, a wrapper which takes a pt_regs
argument is automatically generated, and this extracts the arguments
before calling the "real" syscall implementation.
Each syscall has three functions generated:
* __do_<compat_>sys_<name> is the "real" syscall implementation, with
the expected prototype.
* __se_<compat_>sys_<name> is the sign-extension/narrowing wrapper,
inherited from common code. This takes a series of long parameters,
casting each to the requisite types required by the "real" syscall
implementation in __do_<compat_>sys_<name>.
This wrapper *may* not be necessary on arm64 given the AAPCS rules on
unused register bits, but it seemed safer to keep the wrapper for now.
* __arm64_<compat_>_sys_<name> takes a struct pt_regs pointer, and
extracts *only* the relevant register values, passing these on to the
__se_<compat_>sys_<name> wrapper.
The syscall invocation code is updated to handle the calling convention
required by __arm64_<compat_>_sys_<name>, and passes a single struct
pt_regs pointer.
The compiler can fold the syscall implementation and its wrappers, such
that the overhead of this approach is minimized.
Note that we play games with sys_ni_syscall(). It can't be defined with
SYSCALL_DEFINE0() because we must avoid the possibility of error
injection. Additionally, there are a couple of locations where we need
to call it from C code, and we don't (currently) have a
ksys_ni_syscall(). While it has no wrapper, passing in a redundant
pt_regs pointer is benign per the AAPCS.
When ARCH_HAS_SYSCALL_WRAPPER is selected, no prototype is defines for
sys_ni_syscall(). Since we need to treat it differently for in-kernel
calls and the syscall tables, the prototype is defined as-required.
The wrappers are largely the same as their x86 counterparts, but
simplified as we don't have a variety of compat calling conventions that
require separate stubs. Unlike x86, we have some zero-argument compat
syscalls, and must define COMPAT_SYSCALL_DEFINE0() to ensure that these
are also given an __arm64_compat_sys_ prefix.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for converting to pt_regs syscall wrappers, convert our
existing compat wrappers to C. This will allow the pt_regs wrappers to
be automatically generated, and will allow for the compat register
manipulation to be folded in with the pt_regs accesses.
To avoid confusion with the upcoming pt_regs wrappers and existing
compat wrappers provided by core code, the C wrappers are renamed to
compat_sys_aarch32_<syscall>.
With the assembly wrappers gone, we can get rid of entry32.S and the
associated boilerplate.
Note that these must call the ksys_* syscall entry points, as the usual
sys_* entry points will be modified to take a single pt_regs pointer
argument.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We don't currently annotate our mmap implementation as a syscall, as we
need to do to use pt_regs syscall wrappers.
Let's mark it as a real syscall.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We don't currently annotate our various sigreturn functions as syscalls,
as we need to do to use pt_regs syscall wrappers.
Let's mark them as real syscalls.
For compat_sys_sigreturn and compat_sys_rt_sigreturn, this changes the
return type from int to long, matching the prototypes in sys32.c.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With pt_regs syscall wrappers, the calling convention for
sys_personality() will change. Use ksys_personality(), which is
functionally equivalent.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Our syscall tables are aligned to 4096 bytes, which allowed their
addresses to be generated with a single adrp in entry.S. This has the
unfortunate property of wasting space in .rodata for the necessary
padding.
Now that the address is generated by C code, we can rely on the compiler
to do the right thing, and drop the alignemnt.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We can zero GPRs x0 - x29 upon entry from EL0 to make it harder for
userspace to control values consumed by speculative gadgets.
We don't blat x30, since this is stashed much later, and we'll blat it
before invoking C code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that all of the syscall logic works on the saved pt_regs, apply_ssbd
can safely corrupt x0-x3 in the entry paths, and we no longer need to
restore them. So let's remove the logic doing so.
With that logic gone, we can fold the branch target into the macro, so
that callers need not deal with this. GAS provides \@, which provides a
unique value per macro invocation, which we can use to create a unique
label.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that syscalls are invoked with pt_regs, we no longer need to ensure
that the argument regsiters are live in the entry assembly, and it's
fine to not restore them after context_tracking_user_exit() has
corrupted them.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the syscall invocation logic is in C, we can migrate the rest
of the syscall entry logic over, so that the entry assembly needn't look
at the register values at all.
The SVE reset across syscall logic now unconditionally clears TIF_SVE,
but sve_user_disable() will only write back to CPACR_EL1 when SVE is
actually enabled.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently syscall tracing is a tricky assembly state machine, which can
be rather difficult to follow, and even harder to modify. Before we
start fiddling with it for pt_regs syscalls, let's convert it to C.
This is not intended to have any functional change.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As a first step towards invoking syscalls with a pt_regs argument,
convert the raw syscall invocation logic to C. We end up with a bit more
register shuffling, but the unified invocation logic means we can unify
the tracing paths, too.
Previously, assembly had to open-code calls to ni_sys() when the system
call number was out-of-bounds for the relevant syscall table. This case
is now handled by invoke_syscall(), and the assembly no longer need to
handle this case explicitly. This allows the tracing paths to be
simplified and unified, as we no longer need the __ni_sys_trace path and
the __sys_trace_return label.
This only converts the invocation of the syscall. The rest of the
syscall triage and tracing is left in assembly for now, and will be
converted in subsequent patches.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for invoking arbitrary syscalls from C code, let's define
a type for an arbitrary syscall, matching the parameter passing rules of
the AAPCS.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arm64 sigreturn* syscall handlers are non-standard. Rather than
taking a number of user parameters in registers as per the AAPCS,
they expect the pt_regs as their sole argument.
To make this work, we override the syscall definitions to invoke
wrappers written in assembly, which mov the SP into x0, and branch to
their respective C functions.
On other architectures (such as x86), the sigreturn* functions take no
argument and instead use current_pt_regs() to acquire the user
registers. This requires less boilerplate code, and allows for other
features such as interposing C code in this path.
This patch takes the same approach for arm64.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tentatively-reviewed-by: Dave Martin <dave.martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In subsequent patches, we'll want to make use of sve_user_enable() and
sve_user_disable() outside of kernel/fpsimd.c. Let's move these to
<asm/fpsimd.h> where we can make use of them.
To avoid ifdeffery in sequences like:
if (system_supports_sve() && some_condition)
sve_user_disable();
... empty stubs are provided when support for SVE is not enabled. Note
that system_supports_sve() contains as IS_ENABLED(CONFIG_ARM64_SVE), so
the sve_user_disable() call should be optimized away entirely when
CONFIG_ARM64_SVE is not selected.
To ensure that this is the case, the stub definitions contain a
BUILD_BUG(), as we do for other stubs for which calls should always be
optimized away when the relevant config option is not selected.
At the same time, the include list of <asm/fpsimd.h> is sorted while
adding <asm/sysreg.h>.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we have sysreg_clear_set(), we can use this instead of
change_cpacr().
Note that the order of the set and clear arguments differs between
change_cpacr() and sysreg_clear_set(), so these are flipped as part of
the conversion. Also, sve_user_enable() redundantly clears
CPACR_EL1_ZEN_EL0EN before setting it; this is removed for clarity.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we have sysreg_clear_set(), we can consistently use this
instead of config_sctlr_el1().
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In do_notify_resume, we manipulate thread_flags as a 32-bit unsigned
int, whereas thread_info::flags is a 64-bit unsigned long, and elsewhere
(e.g. in the entry assembly) we manipulate the flags as a 64-bit
quantity.
For consistency, and to avoid problems if we end up with more than 32
flags, let's make do_notify_resume take the flags as a 64-bit unsigned
long.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This reverts commit 7e7df71fd5.
When unwinding out of the IRQ stack and onto the interrupted EL1 stack,
we cannot rely on the frame pointer being strictly increasing, as this
could terminate the backtrace early depending on how the stacks have
been allocated.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Implement calls to rseq_signal_deliver, rseq_handle_notify_resume
and rseq_syscall so that we can select HAVE_RSEQ on arm64.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add support for 64bit event by using chained event counters
and 64bit cycle counters.
PMUv3 allows chaining a pair of adjacent 32-bit counters, effectively
forming a 64-bit counter. The low/even counter is programmed to count
the event of interest, and the high/odd counter is programmed to count
the CHAIN event, taken when the low/even counter overflows.
For CPU cycles, when 64bit mode is requested, the cycle counter
is used in 64bit mode. If the cycle counter is not available,
falls back to chaining.
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arm64 PMU updates the event counters and reprograms the
counters in the overflow IRQ handler without disabling the
PMU. This could potentially cause skews in for group counters,
where the overflowed counters may potentially loose some event
counts, while they are reprogrammed. To prevent this, disable
the PMU while we process the counter overflows and enable it
right back when we are done.
This patch also moves the PMU stop/start routines to avoid a
forward declaration.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
armv8pmu_select_counter always returns the passed idx. So
let us make that void and get rid of the pointless checks.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The armpmu uses get_event_idx callback to allocate an event
counter for a given event, which marks the selected counter
as "used". Now, when we delete the counter, the arm_pmu goes
ahead and clears the "used" bit and then invokes the "clear_event_idx"
call back, which kind of splits the job between the core code
and the backend. To keep things tidy, mandate the implementation
of clear_event_idx() and add it for exisiting backends.
This will be useful for adding the chained event support, where
we leave the event idx maintenance to the backend.
Also, when an event is removed from the PMU, reset the hw.idx
to indicate that a counter is not allocated for this event,
to help the backends do better checks. This will be also used
for the chain counter support.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Convert the {read/write}_counter APIs to handle 64bit values
to enable supporting chained event counters. The backends still
use 32bit values and we pass them 32bit values only. So in effect
there are no functional changes.
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Each PMU defines their max_period of the counter as the maximum
value that can be counted. Since all the PMU backends support
32bit counters by default, let us remove the redundant field.
No functional changes.
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Current ACPI ARM64 NUMA initialization code in
acpi_numa_gicc_affinity_init()
carries out NUMA nodes creation and cpu<->node mappings at the same time
in the arch backend so that a single SRAT walk is needed to parse both
pieces of information. This implies that the cpu<->node mappings must
be stashed in an array (sized NR_CPUS) so that SMP code can later use
the stashed values to avoid another SRAT table walk to set-up the early
cpu<->node mappings.
If the kernel is configured with a NR_CPUS value less than the actual
processor entries in the SRAT (and MADT), the logic in
acpi_numa_gicc_affinity_init() is broken in that the cpu<->node mapping
is only carried out (and stashed for future use) only for a number of
SRAT entries up to NR_CPUS, which do not necessarily correspond to the
possible cpus detected at SMP initialization in
acpi_map_gic_cpu_interface() (ie MADT and SRAT processor entries order
is not enforced), which leaves the kernel with broken cpu<->node
mappings.
Furthermore, given the current ACPI NUMA code parsing logic in
acpi_numa_gicc_affinity_init(), PXM domains for CPUs that are not parsed
because they exceed NR_CPUS entries are not mapped to NUMA nodes (ie the
PXM corresponding node is not created in the kernel) leaving the system
with a broken NUMA topology.
Rework the ACPI ARM64 NUMA initialization process so that the NUMA
nodes creation and cpu<->node mappings are decoupled. cpu<->node
mappings are moved to SMP initialization code (where they are needed),
at the cost of an extra SRAT walk so that ACPI NUMA mappings can be
batched before being applied, fixing current parsing pitfalls.
Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: John Garry <john.garry@huawei.com>
Fixes: d8b47fca8c ("arm64, ACPI, NUMA: NUMA support based on SRAT and
SLIT")
Link: http://lkml.kernel.org/r/1527768879-88161-2-git-send-email-xiexiuqi@huawei.com
Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Up to ARMv8.3, the combinaison of Stage-1 and Stage-2 attributes
results in the strongest attribute of the two stages. This means
that the hypervisor has to perform quite a lot of cache maintenance
just in case the guest has some non-cacheable mappings around.
ARMv8.4 solves this problem by offering a different mode (FWB) where
Stage-2 has total control over the memory attribute (this is limited
to systems where both I/O and instruction fetches are coherent with
the dcache). This is achieved by having a different set of memory
attributes in the page tables, and a new bit set in HCR_EL2.
On such a system, we can then safely sidestep any form of dcache
management.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Commit 37c3ec2d81 ("arm64: topology: divorce MC scheduling domain from
core_siblings") selected the smallest of LLC, socket siblings, and NUMA
node siblings to ensure that the sched domain we build for the MC layer
isn't larger than the DIE above it or it's shrunk to the socket or NUMA
node if LLC exist acrosis NUMA node/chiplets.
Commit acd32e52e4e0 ("arm64: topology: Avoid checking numa mask for
scheduler MC selection") reverted the NUMA siblings checks since the
CPU topology masks weren't updated on hotplug at that time.
This patch re-introduces numa mask check as the CPU and NUMA topology
is now updated in hotplug paths. Effectively, this patch does the
partial revert of commit acd32e52e4e0.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Similar to core_sibling and thread_sibling, it's better to align and
rename llc_siblings to llc_sibling.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We already repopulate the information on CPU hotplug-in, so we can safely
remove the CPU topology and NUMA cpumap information during CPU hotplug
out operation. This will help to provide the correct cpumask for
scheduler domains.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's incorrect to iterate over all the possible CPUs to update the
sibling masks when any CPU is hotplugged in. In case the topology
siblings masks of the CPU is removed when is it hotplugged out, we
end up updating those masks when one of it's sibling is powered up
again. This will provide inconsistent view.
Further, since the CPU calling update_sibling_masks is yet to be set
online, there's no need to compare itself with each online CPU when
updating the siblings masks.
This patch restricts updation of sibling masks only for CPUs that are
already online. It also the drops the unnecessary cpuid check.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds support to remove all the CPU topology information using
clear_cpu_topology and also resetting the sibling information on other
sibling CPUs. This will be used in cpu_disable so that all the topology
sibling information is removed on CPU hotplug out.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently numa_clear_node removes both cpu information from the NUMA
node cpumap as well as the NUMA node id from the cpu. Similarly
numa_store_cpu_info updates both percpu nodeid and NUMA cpumap.
However we need to retain the numa node id for the cpu and only remove
the cpu information from the numa node cpumap during CPU hotplug out.
The same can be extended for hotplugging in the CPU.
This patch separates out numa_{add,remove}_cpu from numa_clear_node and
numa_store_cpu_info.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently reset_cpu_topology clears all the CPU topology information
and resets to default values. However we may need to just clear the
information when we hotplug out the CPU. In preparation to add the
support the same, let's refactor reset_cpu_topology to just reset
the information and move clearing out the topology information to
clear_cpu_topology.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ERRATA_MIDR_REV_RANGE macro assigns ARM64_CPUCAP_LOCAL_CPU_ERRATUM
to the '.type' field of the 'struct arm64_cpu_capabilities', so there's
no need to assign it explicitly as well.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Patching kernel instructions at runtime requires other CPUs to undergo
a context synchronisation event via an explicit ISB or an IPI in order
to ensure that the new instructions are visible. This is required even
for "hotpatch" instructions such as NOP and BL, so avoid optimising in
this case and always go via stop_machine() when performing general
patching.
ftrace isn't quite as strict, so it can continue to call the nosync
code directly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
When invalidating the instruction cache for a kernel mapping via
flush_icache_range(), it is also necessary to flush the pipeline for
other CPUs so that instructions fetched into the pipeline before the
I-cache invalidation are discarded. For example, if module 'foo' is
unloaded and then module 'bar' is loaded into the same area of memory,
a CPU could end up executing instructions from 'foo' when branching into
'bar' if these instructions were fetched into the pipeline before 'foo'
was unloaded.
Whilst this is highly unlikely to occur in practice, particularly as
any exception acts as a context-synchronizing operation, following the
letter of the architecture requires us to execute an ISB on each CPU
in order for the new instruction stream to be visible.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some code cares about the SPSR_ELx format for exceptions taken from
AArch32 to inspect or manipulate the SPSR_ELx value, which is already in
the SPSR_ELx format, and not in the AArch32 PSR format.
To separate these from cases where we care about the AArch32 PSR format,
migrate these cases to use the PSR_AA32_* definitions rather than
COMPAT_PSR_*.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SPSR_ELx format for exceptions taken from AArch32 is slightly
different to the AArch32 PSR format.
Map between the two in the compat ptrace code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a5 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SPSR_ELx format for exceptions taken from AArch32 differs from the
AArch32 PSR format. Thus, we must translate between the two when setting
up a compat sigframe, or restoring context from a compat sigframe.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a5 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently valid_user_regs() treats SPSR_ELx.DIT as a RES0 bit, causing
it to be zeroed upon exception return, rather than preserved. Thus, code
relying on DIT will not function as expected, and may expose an
unexpected timing sidechannel.
Let's remove DIT from the set of RES0 bits, such that it is preserved.
At the same time, the related comment is updated to better describe the
situation, and to take into account the most recent documentation of
SPSR_ELx, in ARM DDI 0487C.a.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 7206dc93a5 ("arm64: Expose Arm v8.4 features")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Track mismatches in the cache type register (CTR_EL0), other
than the D/I min line sizes and trap user accesses if there are any.
Fixes: be68a8aaf9 ("arm64: cpufeature: Fix CTR_EL0 field definitions")
Cc: <stable@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If there is a mismatch in the I/D min line size, we must
always use the system wide safe value both in applications
and in the kernel, while performing cache operations. However,
we have been checking more bits than just the min line sizes,
which triggers false negatives. We may need to trap the user
accesses in such cases, but not necessarily patch the kernel.
This patch fixes the check to do the right thing as advertised.
A new capability will be added to check mismatches in other
fields and ensure we trap the CTR accesses.
Fixes: be68a8aaf9 ("arm64: cpufeature: Fix CTR_EL0 field definitions")
Cc: <stable@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently machine_kexec() doesn't reset to EL2 in the case of a
crashdump kernel. This leaves potentially dodgy state active at EL2, and
means that if the crashdump kernel attempts to online secondary CPUs,
these will be booted as mismatched ELs.
Let's reset to EL2, as we do in all other cases, and simplify things. If
EL2 state is corrupt, things are already sufficiently bad that kdump is
unlikely to work, and it's best-effort regardless.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The implementation of flush_icache_range() includes instruction sequences
which are themselves patched at runtime, so it is not safe to call from
the patching framework.
This patch reworks the alternatives cache-flushing code so that it rolls
its own internal D-cache maintenance using DC CIVAC before invalidating
the entire I-cache after all alternatives have been applied at boot.
Modules don't cause any issues, since flush_icache_range() is safe to
call by the time they are loaded.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Rohit Khanna <rokhanna@nvidia.com>
Cc: Alexander Van Brunt <avanbrunt@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-7-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We can't pass the breakpoint directly on arch_check_bp_in_kernelspace()
anymore because its architecture internal datas (struct arch_hw_breakpoint)
are not yet filled by the time we call the function, and most
implementation need this backend to be up to date. So arrange the
function to take the probing struct instead.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-3-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We inspect __kpti_forced early on as part of the cpufeature enable
callback which remaps the swapper page table using non-global entries.
Ensure that __kpti_forced has been updated to reflect the kpti=
command-line option before we start using it.
Fixes: ea1e3de85e ("arm64: entry: Add fake CPU feature for unmapping the kernel at EL0")
Cc: <stable@vger.kernel.org> # 4.16.x-
Reported-by: Wei Xu <xuwei5@hisilicon.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Clear current_kprobe and enable preemption in kprobe
even if pre_handler returns !0.
This simplifies function override using kprobes.
Jprobe used to require to keep the preemption disabled and
keep current_kprobe until it returned to original function
entry. For this reason kprobe_int3_handler() and similar
arch dependent kprobe handers checks pre_handler result
and exit without enabling preemption if the result is !0.
After removing the jprobe, Kprobes does not need to
keep preempt disabled even if user handler returns !0
anymore.
But since the function override handler in error-inject
and bpf is also returns !0 if it overrides a function,
to balancing the preempt count, it enables preemption
and reset current kprobe by itself.
That is a bad design that is very buggy. This fixes
such unbalanced preempt-count and current_kprobes setting
in kprobes, bpf and error-inject.
Note: for powerpc and x86, this removes all preempt_disable
from kprobe_ftrace_handler because ftrace callbacks are
called under preempt disabled.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: https://lore.kernel.org/lkml/152942494574.15209.12323837825873032258.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Don't call the ->break_handler() from the arm64 kprobes code,
because it was only used by jprobes which got removed.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/lkml/152942474231.15209.17684808374429473004.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We can't call function trace hook before setup percpu offset.
When entering secondary_start_kernel(), percpu offset has not
been initialized. So this lead hotplug malfunction.
Here is the flow to reproduce this bug:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo function > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on
echo 1 > /sys/devices/system/cpu/cpu1/online
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The changes to automatically test for working stack protector compiler
support in the Kconfig files removed the special STACKPROTECTOR_AUTO
option that picked the strongest stack protector that the compiler
supported.
That was all a nice cleanup - it makes no sense to have the AUTO case
now that the Kconfig phase can just determine the compiler support
directly.
HOWEVER.
It also meant that doing "make oldconfig" would now _disable_ the strong
stackprotector if you had AUTO enabled, because in a legacy config file,
the sane stack protector configuration would look like
CONFIG_HAVE_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_NONE is not set
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_STACKPROTECTOR_AUTO=y
and when you ran this through "make oldconfig" with the Kbuild changes,
it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
used to be disabled (because it was really enabled by AUTO), and would
disable it in the new config, resulting in:
CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
That's dangerously subtle - people could suddenly find themselves with
the weaker stack protector setup without even realizing.
The solution here is to just rename not just the old RECULAR stack
protector option, but also the strong one. This does that by just
removing the CC_ prefix entirely for the user choices, because it really
is not about the compiler support (the compiler support now instead
automatially impacts _visibility_ of the options to users).
This results in "make oldconfig" actually asking the user for their
choice, so that we don't have any silent subtle security model changes.
The end result would generally look like this:
CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
where the "CC_" versions really are about internal compiler
infrastructure, not the user selections.
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed (Kees)
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b
oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF
bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7
oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI
EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P
fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2
zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb
ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8
tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3
BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3
LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh
AtfvEeaPHaOyD8/h2Q==
=zUUp
-----END PGP SIGNATURE-----
Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull more overflow updates from Kees Cook:
"The rest of the overflow changes for v4.18-rc1.
This includes the explicit overflow fixes from Silvio, further
struct_size() conversions from Matthew, and a bug fix from Dan.
But the bulk of it is the treewide conversions to use either the
2-factor argument allocators (e.g. kmalloc(a * b, ...) into
kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a *
b) into vmalloc(array_size(a, b)).
Coccinelle was fighting me on several fronts, so I've done a bunch of
manual whitespace updates in the patches as well.
Summary:
- Error path bug fix for overflow tests (Dan)
- Additional struct_size() conversions (Matthew, Kees)
- Explicitly reported overflow fixes (Silvio, Kees)
- Add missing kvcalloc() function (Kees)
- Treewide conversions of allocators to use either 2-factor argument
variant when available, or array_size() and array3_size() as needed
(Kees)"
* tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
treewide: Use array_size in f2fs_kvzalloc()
treewide: Use array_size() in f2fs_kzalloc()
treewide: Use array_size() in f2fs_kmalloc()
treewide: Use array_size() in sock_kmalloc()
treewide: Use array_size() in kvzalloc_node()
treewide: Use array_size() in vzalloc_node()
treewide: Use array_size() in vzalloc()
treewide: Use array_size() in vmalloc()
treewide: devm_kzalloc() -> devm_kcalloc()
treewide: devm_kmalloc() -> devm_kmalloc_array()
treewide: kvzalloc() -> kvcalloc()
treewide: kvmalloc() -> kvmalloc_array()
treewide: kzalloc_node() -> kcalloc_node()
treewide: kzalloc() -> kcalloc()
treewide: kmalloc() -> kmalloc_array()
mm: Introduce kvcalloc()
video: uvesafb: Fix integer overflow in allocation
UBIFS: Fix potential integer overflow in allocation
leds: Use struct_size() in allocation
Convert intel uncore to struct_size
...
* ARM: lazy context-switching of FPSIMD registers on arm64, "split"
regions for vGIC redistributor
* s390: cleanups for nested, clock handling, crypto, storage keys and
control register bits
* x86: many bugfixes, implement more Hyper-V super powers,
implement lapic_timer_advance_ns even when the LAPIC timer
is emulated using the processor's VMX preemption timer. Two
security-related bugfixes at the top of the branch.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJbH8Z/AAoJEL/70l94x66DF+UIAJeOuTp6LGasT/9uAb2OovaN
+5kGmOPGFwkTcmg8BQHI2fXT4vhxMXWPFcQnyig9eXJVxhuwluXDOH4P9IMay0yw
VDCBsWRdMvZDQad2hn6Z5zR4Jx01XrSaG/KqvXbbDKDCy96mWG7SYAY2m3ZwmeQi
3Pa3O3BTijr7hBYnMhdXGkSn4ZyU8uPaAgIJ8795YKeOJ2JmioGYk6fj6y2WCxA3
ztJymBjTmIoZ/F8bjuVouIyP64xH4q9roAyw4rpu7vnbWGqx1fjPYJoB8yddluWF
JqCPsPzhKDO7mjZJy+lfaxIlzz2BN7tKBNCm88s5GefGXgZwk3ByAq/0GQ2M3rk=
=H5zI
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small update for KVM:
ARM:
- lazy context-switching of FPSIMD registers on arm64
- "split" regions for vGIC redistributor
s390:
- cleanups for nested
- clock handling
- crypto
- storage keys
- control register bits
x86:
- many bugfixes
- implement more Hyper-V super powers
- implement lapic_timer_advance_ns even when the LAPIC timer is
emulated using the processor's VMX preemption timer.
- two security-related bugfixes at the top of the branch"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (79 commits)
kvm: fix typo in flag name
kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access
KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system
KVM: x86: introduce linear_{read,write}_system
kvm: nVMX: Enforce cpl=0 for VMX instructions
kvm: nVMX: Add support for "VMWRITE to any supported field"
kvm: nVMX: Restrict VMX capability MSR changes
KVM: VMX: Optimize tscdeadline timer latency
KVM: docs: nVMX: Remove known limitations as they do not exist now
KVM: docs: mmu: KVM support exposing SLAT to guests
kvm: no need to check return value of debugfs_create functions
kvm: Make VM ioctl do valloc for some archs
kvm: Change return type to vm_fault_t
KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008
kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation
KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation
KVM: introduce kvm_make_vcpus_request_mask() API
KVM: x86: hyperv: do rep check for each hypercall separately
...
- Spectre v4 mitigation (Speculative Store Bypass Disable) support for
arm64 using SMC firmware call to set a hardware chicken bit
- ACPI PPTT (Processor Properties Topology Table) parsing support and
enable the feature for arm64
- Report signal frame size to user via auxv (AT_MINSIGSTKSZ). The
primary motivation is Scalable Vector Extensions which requires more
space on the signal frame than the currently defined MINSIGSTKSZ
- ARM perf patches: allow building arm-cci as module, demote dev_warn()
to dev_dbg() in arm-ccn event_init(), miscellaneous cleanups
- cmpwait() WFE optimisation to avoid some spurious wakeups
- L1_CACHE_BYTES reverted back to 64 (for performance reasons that have
to do with some network allocations) while keeping ARCH_DMA_MINALIGN
to 128. cache_line_size() returns the actual hardware Cache Writeback
Granule
- Turn LSE atomics on by default in Kconfig
- Kernel fault reporting tidying
- Some #include and miscellaneous cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlsaoqsACgkQa9axLQDI
XvH+8RAAqRCrEtkNPS7zxHyMK/D2cxSy9EVtlJ1sxhmsONEe5t5MDTWX9byobQ5A
PAKMSQBQgUvecqHLOtD7SJWef1il30zgWmc/yPcgNv3OsA1Au7j2g3ht/Drw+N5I
Vy0aOUEtw+Jzs7y/CJyl6lufSkkOzszOujt2Nybiz6omztOrwkW9isKnURzQBNj5
gquZI35h604YJ9F0TqS6ZqU7tNcuB9q02FxvVBpLmb83jP4jSEjYACUJwVVxvEAB
UXjdD4N130rRXDS5OMRWo5+4SAj+kPYhdVYEvaDx7xTOIRHhXK05GlJbsUAc5E6l
xy810fH5Dm0diYpVvYWTA5J+BU1jNOvCys5zKWl7gs2P8YB59PdqY4M2YBPNGb5H
PaVgq73TZAsww6ZInbZlK+wZOIxZZIOf//Z+QKn6EPtu3RmzIFWwyttTj01w1E3i
LhjcUoGnvxJFcMoCr59ihDwfP9nkCVrNc4REOGaWDk6L/t/bOfaZfDz+OCGbwQdL
akCFKZI6q5O/no+YfhtdtNFpCQb/Bo1J88KuotICRXq8z4vO41zIG53bi97W8QeG
rCBiX0NxUxYJ3ybus7kZHTmMGieMyEHP28n12QffwvJj4vJBsUXQBrV8hclx0djZ
HMt7iPi/0BW6nVV7ngIgN3cdCpaDCEGRsfO4Ch0rFZrC9UbYQnE=
=uums
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Apart from the core arm64 and perf changes, the Spectre v4 mitigation
touches the arm KVM code and the ACPI PPTT support touches drivers/
(acpi and cacheinfo). I should have the maintainers' acks in place.
Summary:
- Spectre v4 mitigation (Speculative Store Bypass Disable) support
for arm64 using SMC firmware call to set a hardware chicken bit
- ACPI PPTT (Processor Properties Topology Table) parsing support and
enable the feature for arm64
- Report signal frame size to user via auxv (AT_MINSIGSTKSZ). The
primary motivation is Scalable Vector Extensions which requires
more space on the signal frame than the currently defined
MINSIGSTKSZ
- ARM perf patches: allow building arm-cci as module, demote
dev_warn() to dev_dbg() in arm-ccn event_init(), miscellaneous
cleanups
- cmpwait() WFE optimisation to avoid some spurious wakeups
- L1_CACHE_BYTES reverted back to 64 (for performance reasons that
have to do with some network allocations) while keeping
ARCH_DMA_MINALIGN to 128. cache_line_size() returns the actual
hardware Cache Writeback Granule
- Turn LSE atomics on by default in Kconfig
- Kernel fault reporting tidying
- Some #include and miscellaneous cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (53 commits)
arm64: Fix syscall restarting around signal suppressed by tracer
arm64: topology: Avoid checking numa mask for scheduler MC selection
ACPI / PPTT: fix build when CONFIG_ACPI_PPTT is not enabled
arm64: cpu_errata: include required headers
arm64: KVM: Move VCPU_WORKAROUND_2_FLAG macros to the top of the file
arm64: signal: Report signal frame size to userspace via auxv
arm64/sve: Thin out initialisation sanity-checks for sve_max_vl
arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
arm64: KVM: Add HYP per-cpu accessors
arm64: ssbd: Add prctl interface for per-thread mitigation
arm64: ssbd: Introduce thread flag to control userspace mitigation
arm64: ssbd: Restore mitigation status on CPU resume
arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
arm64: ssbd: Add global mitigation state accessor
arm64: Add 'ssbd' command-line option
arm64: Add ARCH_WORKAROUND_2 probing
arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
...
Commit 17c2895 ("arm64: Abstract syscallno manipulation") abstracts
out the pt_regs.syscallno value for a syscall cancelled by a tracer
as NO_SYSCALL, and provides helpers to set and check for this
condition. However, the way this was implemented has the
unintended side-effect of disabling part of the syscall restart
logic.
This comes about because the second in_syscall() check in
do_signal() re-evaluates the "in a syscall" condition based on the
updated pt_regs instead of the original pt_regs. forget_syscall()
is explicitly called prior to the second check in order to prevent
restart logic in the ret_to_user path being spuriously triggered,
which means that the second in_syscall() check always yields false.
This triggers a failure in
tools/testing/selftests/seccomp/seccomp_bpf.c, when using ptrace to
suppress a signal that interrups a nanosleep() syscall.
Misbehaviour of this type is only expected in the case where a
tracer suppresses a signal and the target process is either being
single-stepped or the interrupted syscall attempts to restart via
-ERESTARTBLOCK.
This patch restores the old behaviour by performing the
in_syscall() check only once at the start of the function.
Fixes: 17c2895860 ("arm64: Abstract syscallno manipulation")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reported-by: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.14.x-
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The numa mask subset check can often lead to system hang or crash during
CPU hotplug and system suspend operation if NUMA is disabled. This is
mostly observed on HMP systems where the CPU compute capacities are
different and ends up in different scheduler domains. Since
cpumask_of_node is returned instead core_sibling, the scheduler is
confused with incorrect cpumasks(e.g. one CPU in two different sched
domains at the same time) on CPU hotplug.
Lets disable the NUMA siblings checks for the time being, as NUMA in
socket machines have LLC's that will assure that the scheduler topology
isn't "borken".
The NUMA check exists to assure that if a LLC within a socket crosses
NUMA nodes/chiplets the scheduler domains remain consistent. This code will
likely have to be re-enabled in the near future once the NUMA mask story
is sorted. At the moment its not necessary because the NUMA in socket
machines LLC's are contained within the NUMA domains.
Further, as a defensive mechanism during hot-plug, lets assure that the
LLC siblings are also masked.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Without including psci.h and arm-smccc.h, we now get a build failure in
some configurations:
arch/arm64/kernel/cpu_errata.c: In function 'arm64_update_smccc_conduit':
arch/arm64/kernel/cpu_errata.c:278:10: error: 'psci_ops' undeclared (first use in this function); did you mean 'sysfs_ops'?
arch/arm64/kernel/cpu_errata.c: In function 'arm64_set_ssbd_mitigation':
arch/arm64/kernel/cpu_errata.c:311:3: error: implicit declaration of function 'arm_smccc_1_1_hvc' [-Werror=implicit-function-declaration]
arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL);
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull timers and timekeeping updates from Thomas Gleixner:
- Core infrastucture work for Y2038 to address the COMPAT interfaces:
+ Add a new Y2038 safe __kernel_timespec and use it in the core
code
+ Introduce config switches which allow to control the various
compat mechanisms
+ Use the new config switch in the posix timer code to control the
32bit compat syscall implementation.
- Prevent bogus selection of CPU local clocksources which causes an
endless reselection loop
- Remove the extra kthread in the clocksource code which has no value
and just adds another level of indirection
- The usual bunch of trivial updates, cleanups and fixlets all over the
place
- More SPDX conversions
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
clocksource/drivers/mxs_timer: Switch to SPDX identifier
clocksource/drivers/timer-imx-tpm: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Remove outdated file path
clocksource/drivers/arc_timer: Add comments about locking while read GFRC
clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages
clocksource/drivers/sprd: Fix Kconfig dependency
clocksource: Move inline keyword to the beginning of function declarations
timer_list: Remove unused function pointer typedef
timers: Adjust a kernel-doc comment
tick: Prefer a lower rating device only if it's CPU local device
clocksource: Remove kthread
time: Change nanosleep to safe __kernel_* types
time: Change types to new y2038 safe __kernel_* types
time: Fix get_timespec64() for y2038 safe compat interfaces
time: Add new y2038 safe __kernel_timespec
posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_64BIT_TIME in architectures
compat: Enable compat_get/put_timespec64 always
...
Pull siginfo updates from Eric Biederman:
"This set of changes close the known issues with setting si_code to an
invalid value, and with not fully initializing struct siginfo. There
remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64
and x86 to get the code that generates siginfo into a simpler and more
maintainable state. Most of that work involves refactoring the signal
handling code and thus careful code review.
Also not included is the work to shrink the in kernel version of
struct siginfo. That depends on getting the number of places that
directly manipulate struct siginfo under control, as it requires the
introduction of struct kernel_siginfo for the in kernel things.
Overall this set of changes looks like it is making good progress, and
with a little luck I will be wrapping up the siginfo work next
development cycle"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
signal/sh: Stop gcc warning about an impossible case in do_divide_error
signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions
signal/um: More carefully relay signals in relay_signal.
signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR}
signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code
signal/signalfd: Add support for SIGSYS
signal/signalfd: Remove __put_user from signalfd_copyinfo
signal/xtensa: Use force_sig_fault where appropriate
signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
signal/um: Use force_sig_fault where appropriate
signal/sparc: Use force_sig_fault where appropriate
signal/sparc: Use send_sig_fault where appropriate
signal/sh: Use force_sig_fault where appropriate
signal/s390: Use force_sig_fault where appropriate
signal/riscv: Replace do_trap_siginfo with force_sig_fault
signal/riscv: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_mceerr where appropriate
signal/openrisc: Use force_sig_fault where appropriate
signal/nios2: Use force_sig_fault where appropriate
...
- Lazy context-switching of FPSIMD registers on arm64
- Allow virtual redistributors to be part of two or more MMIO ranges
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlsRY14VHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDMGwP/A3FDrzGSjgC65m037/dsQj/Eniv
NkpueEVO3Z8UN44j0TNdeUzj6vQD376GVDwnW3mFlQ416A4ZwwHkk8cQhbpP2UvQ
EqKKUgujvLueZeuAwYG/DtrR9VZ6fh7QLD7Mv8DW/0AaNdBN2LyHEkW0qx7cSXqu
PijTsImj9B8TSykYc0SlJz7Q7Y5QUOYbWrJqqa1cskOdmpN2ATInnA2haXeO7j8v
lkb+WZ9R6xiJSzMCeLEzFV6tUvTiaSw5lVL64jpJhbkBNWPIVAza0erm9TSlQaTw
d3uJlAy0W9UkXSSqvbmtXvBFqCyEOzZ0hwi2MF6RoVuFt1yXwLgHGps6OUkho4Kq
pXWImaRHwxyQGrOY0qm0cxr+6TjYnjn8rIOzmzBOrKKq+aCIQ+Sl+CtNYzczQYeE
rOFBQFsMlzSRJWyabUjhBGFNfDmZZaVFKnUekEqXXETtLxzLZtx+W9i4tzoA1stv
y0+4yAjEyOQoRsAAE3GmzpDsu7Eu2sae6+lTo7DX1y+A7Wi94HKmy47sVjrS+evV
2SLyVZ4mhwMzaQ7ngrjHLD1GXDlBxxk2X+NSmBVe5z4AsuWeoqy81f0rgjyCQNxo
swEqs0k7mMDo8GQNjawwzhdDuHYm4gTX5iGs/Nxx6K4OoJ0bgv83yb/goArp+LEU
/QWT4T37A/pEEECe
=DUmC
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM updates for 4.18
- Lazy context-switching of FPSIMD registers on arm64
- Allow virtual redistributors to be part of two or more MMIO ranges
Stateful CPU architecture extensions may require the signal frame
to grow to a size that exceeds the arch's MINSIGSTKSZ #define.
However, changing this #define is an ABI break.
To allow userspace the option of determining the signal frame size
in a more forwards-compatible way, this patch adds a new auxv entry
tagged with AT_MINSIGSTKSZ, which provides the maximum signal frame
size that the process can observe during its lifetime.
If AT_MINSIGSTKSZ is absent from the aux vector, the caller can
assume that the MINSIGSTKSZ #define is sufficient. This allows for
a consistent interface with older kernels that do not provide
AT_MINSIGSTKSZ.
The idea is that libc could expose this via sysconf() or some
similar mechanism.
There is deliberately no AT_SIGSTKSZ. The kernel knows nothing
about userspace's own stack overheads and should not pretend to
know.
For arm64:
The primary motivation for this interface is the Scalable Vector
Extension, which can require at least 4KB or so of extra space
in the signal frame for the largest hardware implementations.
To determine the correct value, a "Christmas tree" mode (via the
add_all argument) is added to setup_sigframe_layout(), to simulate
addition of all possible records to the signal frame at maximum
possible size.
If this procedure goes wrong somehow, resulting in a stupidly large
frame layout and hence failure of sigframe_alloc() to allocate a
record to the frame, then this is indicative of a kernel bug. In
this case, we WARN() and no attempt is made to populate
AT_MINSIGSTKSZ for userspace.
For arm64 SVE:
The SVE context block in the signal frame needs to be considered
too when computing the maximum possible signal frame size.
Because the size of this block depends on the vector length, this
patch computes the size based not on the thread's current vector
length but instead on the maximum possible vector length: this
determines the maximum size of SVE context block that can be
observed in any signal frame for the lifetime of the process.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the kernel SVE support is reasonably mature, it is
excessive to default sve_max_vl to the invalid value -1 and then
sprinkle WARN_ON()s around the place to make sure it has been
initialised before use. The cpufeatures code already runs pretty
early, and will ensure sve_max_vl gets initialised.
This patch initialises sve_max_vl to something sane that will be
supported by every SVE implementation, and removes most of the
sanity checks.
The checks in find_supported_vector_length() are retained for now.
If anything goes horribly wrong, we are likely to trip a check here
sooner or later.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3,
add a small(-ish) sequence to handle it at EL2. Special care must
be taken to track the state of the guest itself by updating the
workaround flags. We also rely on patching to enable calls into
the firmware.
Note that since we need to execute branches, this always executes
after the Spectre-v2 mitigation has been applied.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If running on a system that performs dynamic SSBD mitigation, allow
userspace to request the mitigation for itself. This is implemented
as a prctl call, allowing the mitigation to be enabled or disabled at
will for this particular thread.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to allow userspace to be mitigated on demand, let's
introduce a new thread flag that prevents the mitigation from
being turned off when exiting to userspace, and doesn't turn
it on on entry into the kernel (with the assumption that the
mitigation is always enabled in the kernel itself).
This will be used by a prctl interface introduced in a later
patch.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On a system where firmware can dynamically change the state of the
mitigation, the CPU will always come up with the mitigation enabled,
including when coming back from suspend.
If the user has requested "no mitigation" via a command line option,
let's enforce it by calling into the firmware again to disable it.
Similarily, for a resume from hibernate, the mitigation could have
been disabled by the boot kernel. Let's ensure that it is set
back on in that case.
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to avoid checking arm64_ssbd_callback_required on each
kernel entry/exit even if no mitigation is required, let's
add yet another alternative that by default jumps over the mitigation,
and that gets nop'ed out if we're doing dynamic mitigation.
Think of it as a poor man's static key...
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On a system where the firmware implements ARCH_WORKAROUND_2,
it may be useful to either permanently enable or disable the
workaround for cases where the user decides that they'd rather
not get a trap overhead, and keep the mitigation permanently
on or off instead of switching it on exception entry/exit.
In any case, default to the mitigation being enabled.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As for Spectre variant-2, we rely on SMCCC 1.1 to provide the
discovery mechanism for detecting the SSBD mitigation.
A new capability is also allocated for that purpose, and a
config option.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In a heterogeneous system, we can end up with both affected and
unaffected CPUs. Let's check their status before calling into the
firmware.
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order for the kernel to protect itself, let's call the SSBD mitigation
implemented by the higher exception level (either hypervisor or firmware)
on each transition between userspace and kernel.
We must take the PSCI conduit into account in order to target the
right exception level, hence the introduction of a runtime patching
callback.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the host SVE context can be saved on demand from Hyp,
there is no longer any need to save this state in advance before
entering the guest.
This patch removes the relevant call to
kvm_fpsimd_flush_cpu_state().
Since the problem that function was intended to solve now no longer
exists, the function and its dependencies are also deleted.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to make sve_save_state()/sve_load_state() more easily
reusable and to get rid of a potential branch on context switch
critical paths, this patch makes sve_pffr() inline and moves it to
fpsimd.h.
<asm/processor.h> must be included in fpsimd.h in order to make
this work, and this creates an #include cycle that is tricky to
avoid without modifying core code, due to the way the PR_SVE_*()
prctl helpers are included in the core prctl implementation.
Instead of breaking the cycle, this patch defers inclusion of
<asm/fpsimd.h> in <asm/processor.h> until the point where it is
actually needed: i.e., immediately before the prctl definitions.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
sve_pffr(), which is used to derive the base address used for
low-level SVE save/restore routines, currently takes the relevant
task_struct as an argument.
The only accessed fields are actually part of thread_struct, so
this patch changes the argument type accordingly. This is done in
preparation for moving this function to a header, where we do not
want to have to include <linux/sched.h> due to the consequent
circular #include problems.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Having read_zcr_features() inline in cpufeature.h results in that
header requiring #includes which make it hard to include
<asm/fpsimd.h> elsewhere without triggering header inclusion
cycles.
This is not a hot-path function and arguably should not be in
cpufeature.h in the first place, so this patch moves it to
fpsimd.c, compiled conditionally if CONFIG_ARM64_SVE=y.
This allows some SVE-related #includes to be dropped from
cpufeature.h, which will ease future maintenance.
A couple of missing #includes of <asm/fpsimd.h> are exposed by this
change under arch/arm64/. This patch adds the missing #includes as
necessary.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch refactors KVM to align the host and guest FPSIMD
save/restore logic with each other for arm64. This reduces the
number of redundant save/restore operations that must occur, and
reduces the common-case IRQ blackout time during guest exit storms
by saving the host state lazily and optimising away the need to
restore the host state before returning to the run loop.
Four hooks are defined in order to enable this:
* kvm_arch_vcpu_run_map_fp():
Called on PID change to map necessary bits of current to Hyp.
* kvm_arch_vcpu_load_fp():
Set up FP/SIMD for entering the KVM run loop (parse as
"vcpu_load fp").
* kvm_arch_vcpu_ctxsync_fp():
Get FP/SIMD into a safe state for re-enabling interrupts after a
guest exit back to the run loop.
For arm64 specifically, this involves updating the host kernel's
FPSIMD context tracking metadata so that kernel-mode NEON use
will cause the vcpu's FPSIMD state to be saved back correctly
into the vcpu struct. This must be done before re-enabling
interrupts because kernel-mode NEON may be used by softirqs.
* kvm_arch_vcpu_put_fp():
Save guest FP/SIMD state back to memory and dissociate from the
CPU ("vcpu_put fp").
Also, the arm64 FPSIMD context switch code is updated to enable it
to save back FPSIMD state for a vcpu, not just current. A few
helpers drive this:
* fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
mark this CPU as having context fp (which may belong to a vcpu)
currently loaded in its registers. This is the non-task
equivalent of the static function fpsimd_bind_to_cpu() in
fpsimd.c.
* task_fpsimd_save():
exported to allow KVM to save the guest's FPSIMD state back to
memory on exit from the run loop.
* fpsimd_flush_state():
invalidate any context's FPSIMD state that is currently loaded.
Used to disassociate the vcpu from the CPU regs on run loop exit.
These changes allow the run loop to enable interrupts (and thus
softirqs that may use kernel-mode NEON) without having to save the
guest's FPSIMD state eagerly.
Some new vcpu_arch fields are added to make all this work. Because
host FPSIMD state can now be saved back directly into current's
thread_struct as appropriate, host_cpu_context is no longer used
for preserving the FPSIMD state. However, it is still needed for
preserving other things such as the host's system registers. To
avoid ABI churn, the redundant storage space in host_cpu_context is
not removed for now.
arch/arm is not addressed by this patch and continues to use its
current save/restore logic. It could provide implementations of
the helpers later if desired.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In preparation for optimising the way KVM manages switching the
guest and host FPSIMD state, it is necessary to provide a means for
code outside arch/arm64/kernel/fpsimd.c to restore the user trap
configuration for SVE correctly for the current task.
Rather than requiring external code to duplicate the maintenance
explicitly, this patch moves the trap maintenenace to
fpsimd_bind_to_cpu(), since it is logically part of the work of
associating the current task with the cpu.
Because fpsimd_bind_to_cpu() is rather a cryptic name to publish
alongside fpsimd_bind_state_to_cpu(), the former function is
renamed to fpsimd_bind_task_to_cpu() to make its purpose more
explicit.
This patch makes appropriate changes to ensure that
fpsimd_bind_task_to_cpu() is always called alongside
task_fpsimd_load(), so that the trap maintenance continues to be
done in every situation where it was done prior to this patch.
As a side-effect, the metadata updates done by
fpsimd_bind_task_to_cpu() now change from conditional to
unconditional in the "already bound" case of sigreturn. This is
harmless, and a couple of extra stores on this slow path will not
impact performance. I consider this a reasonable price to pay for
a slightly cleaner interface.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently the FPSIMD handling code uses the condition task->mm ==
NULL as a hint that task has no FPSIMD register context.
The ->mm check is only there to filter out tasks that cannot
possibly have FPSIMD context loaded, for optimisation purposes.
Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
saving FPSIMD context back to memory. For these reasons, the ->mm
checks are not useful, providing that TIF_FOREIGN_FPSTATE is
maintained in a consistent way for all threads.
The context switch logic is already deliberately optimised to defer
reloads of the regs until ret_to_user (or sigreturn as a special
case), and save them only if they have been previously loaded.
These paths are the only places where the wrong_task and wrong_cpu
conditions can be made false, by calling fpsimd_bind_task_to_cpu().
Kernel threads by definition never reach these paths. As a result,
the wrong_task and wrong_cpu tests in fpsimd_thread_switch() will
always yield true for kernel threads.
This patch removes the redundant checks and special-case code,
ensuring that TIF_FOREIGN_FPSTATE is set whenever a kernel thread
is scheduled in, and ensures that this flag is set for the init
task. The fpsimd_flush_task_state() call already present in
copy_thread() ensures the same for any new task.
With TIF_FOREIGN_FPSTATE always set for kernel threads, this patch
ensures that no extra context save work is added for kernel
threads, and eliminates the redundant context saving that may
currently occur for kernel threads that have acquired an mm via
use_mm().
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In preparation for allowing non-task (i.e., KVM vcpu) FPSIMD
contexts to be handled by the fpsimd common code, this patch adapts
task_fpsimd_save() to save back the currently loaded context,
removing the explicit dependency on current.
The relevant storage to write back to in memory is now found by
examining the fpsimd_last_state percpu struct.
fpsimd_save() does nothing unless TIF_FOREIGN_FPSTATE is clear, and
fpsimd_last_state is updated under local_bh_disable() or
local_irq_disable() everywhere that TIF_FOREIGN_FPSTATE is cleared:
thus, fpsimd_save() will write back to the correct storage for the
loaded context.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch uses the new update_thread_flag() helpers to simplify a
couple of if () set; else clear; constructs.
No functional change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
fpsimd_last_state.st is set to NULL as a way of indicating that
current's FPSIMD registers are no longer loaded in the cpu. In
particular, this is done when the kernel temporarily uses or
clobbers the FPSIMD registers for its own purposes, as in CPU PM or
kernel-mode NEON, resulting in them being populated with garbage
data not belonging to a task.
Commit 17eed27b02 ("arm64/sve: KVM: Prevent guests from using
SVE") factors this operation out as a new helper
fpsimd_flush_cpu_state() to make it clearer what is being done
here, and on SVE systems this helper is now used, via
kvm_fpsimd_flush_cpu_state(), to invalidate the registers after KVM
has run a vcpu. The reason for this is that KVM does not yet
understand how to restore the full host SVE registers itself after
loading the guest FPSIMD context into them.
This exposes a particular problem: if fpsimd_last_state.st is set
to NULL without also setting TIF_FOREIGN_FPSTATE, the kernel may
continue to think that current's FPSIMD registers are live even
though they have actually been clobbered.
Prior to the aforementioned commit, the only path where
fpsimd_last_state.st is set to NULL without setting
TIF_FOREIGN_FPSTATE is when kernel_neon_begin() is called by a
kernel thread (where current->mm can be NULL). This does not
matter, because the only harm is that at context-switch time
fpsimd_thread_switch() may unnecessarily save the FPSIMD registers
back to current's thread_struct (even though kernel threads are not
considered to have any FPSIMD context of their own and the
registers will never be reloaded).
Note that although CPU_PM_ENTER lacks the TIF_FOREIGN_FPSTATE
setting, every CPU passing through that path must subsequently pass
through CPU_PM_EXIT before it can re-enter the kernel proper.
CPU_PM_EXIT sets the flag.
The sve_flush_cpu_state() function added by commit 17eed27b02
also lacks the proper maintenance of TIF_FOREIGN_FPSTATE. This may
cause the bits of a host task's SVE registers that do not alias the
FPSIMD register file to spontaneously appear zeroed if a KVM vcpu
runs in the same task in the meantime. Although this effect is
hidden by the fact that the non-FPSIMD bits of the SVE registers
are zeroed by a syscall anyway, it is doubtless a bad idea to rely
on these different code paths interacting correctly under future
maintenance.
This patch makes TIF_FOREIGN_FPSTATE an unconditional side-effect
of fpsimd_flush_cpu_state(), and removes the set_thread_flag()
calls that become redundant as a result. This ensures that
TIF_FOREIGN_FPSTATE cannot remain clear if the FPSIMD state in the
FPSIMD registers is invalid.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Otherwise modules that use these arithmetic operations will fail to
link. We accomplish this with the usual EXPORT_SYMBOL, which on most
architectures goes in the .S file but the ARM64 maintainers prefer that
insead it goes into arm64ksyms.
While we're at it, we also fix this up to use SPDX, and I personally
choose to relicense this as GPL2||BSD so that these symbols don't need
to be export_symbol_gpl, so all modules can use the routines, since
these are important general purpose compiler-generated function calls.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: PaX Team <pageexec@freemail.hu>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arm_pmu::handle_irq() callback has the same prototype as a generic
IRQ handler, taking the IRQ number and a void pointer argument which it
must convert to an arm_pmu pointer.
This means that all arm_pmu::handle_irq() take an IRQ number they never
use, and all must explicitly cast the void pointer to an arm_pmu
pointer.
Instead, let's change arm_pmu::handle_irq to take an arm_pmu pointer,
allowing these casts to be removed. The redundant IRQ number parameter
is also removed.
Suggested-by: Hoeun Ryu <hoeun.ryu@lge.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Writes to ZCR_EL1 are self-synchronising, and so may be expensive
in typical implementations.
This patch adopts the approach used for costly system register
writes elsewhere in the kernel: the system register write is
suppressed if it would not change the stored value.
Since the common case will be that of switching between tasks that
use the same vector length as one another, prediction hit rates on
the conditional branch should be reasonably good, with lower
expected amortised cost than the unconditional execution of a
heavyweight self-synchronising instruction.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that we have an accurate view of the physical topology
we need to represent it correctly to the scheduler. Generally MC
should equal the LLC in the system, but there are a number of
special cases that need to be dealt with.
In the case of NUMA in socket, we need to assure that the sched
domain we build for the MC layer isn't larger than the DIE above it.
Similarly for LLC's that might exist in cross socket interconnect or
directory hardware we need to assure that MC is shrunk to the socket
or NUMA node.
This patch builds a sibling mask for the LLC, and then picks the
smallest of LLC, socket siblings, or NUMA node siblings, which
gives us the behavior described above. This is ever so slightly
different than the similar alternative where we look for a cache
layer less than or equal to the socket/NUMA siblings.
The logic to pick the MC layer affects all arm64 machines, but
only changes the behavior for DT/MPIDR systems if the NUMA domain
is smaller than the core siblings (generally set to the cluster).
Potentially this fixes a possible bug in DT systems, but really
it only affects ACPI systems where the core siblings is correctly
set to the socket siblings. Thus all currently available ACPI
systems should have MC equal to LLC, including the NUMA in socket
machines where the LLC is partitioned between the NUMA nodes.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vijaya Kumar K <vkilari@codeaurora.org>
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Propagate the topology information from the PPTT tree to the
cpu_topology array. We can get the thread id and core_id by assuming
certain levels of the PPTT tree correspond to those concepts.
The package_id is flagged in the tree and can be found by calling
find_acpi_cpu_topology_package() which terminates
its search when it finds an ACPI node flagged as the physical
package. If the tree doesn't contain enough levels to represent
all of the requested levels then the root node will be returned
for all subsequent levels.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vijaya Kumar K <vkilari@codeaurora.org>
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The cluster concept isn't architecturally defined for arm64.
Lets match the name of the arm64 topology field to the kernel macro
that uses it.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vijaya Kumar K <vkilari@codeaurora.org>
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The /sys cache entries should support ACPI/PPTT generated cache
topology information. For arm64, if ACPI is enabled, determine
the max number of cache levels and populate them using the PPTT
table if one is available.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Vijaya Kumar K <vkilari@codeaurora.org>
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
"make includecheck" detected few duplicated includes in arch/arm64.
This patch removes the double inclusions.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
VMLINUX_SYMBOL() is no-op unless CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX
is defined. It has ever been selected only by BLACKFIN and METAG.
VMLINUX_SYMBOL() is unneeded for ARM64-specific code.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch increases the ARCH_DMA_MINALIGN to 128 so that it covers the
currently known Cache Writeback Granule (CTR_EL0.CWG) on arm64 and moves
the fallback in cache_line_size() from L1_CACHE_BYTES to this constant.
In addition, it warns (and taints) if the CWG is larger than
ARCH_DMA_MINALIGN as this is not safe with non-coherent DMA.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The NVIDIA Denver CPU also needs a PSCI call to harden the branch
predictor.
Signed-off-by: David Gilhooley <dgilhooley@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's possible for userspace to control idx. Sanitize idx when using it
as an array index.
Found by smatch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.
Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.
The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.
In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo siginfo is not used on other paths in the function
in which it is declared.
Instances of using memset to initialize siginfo have been replaced
with calls clear_siginfo for clarity.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Our arm64_skip_faulting_instruction() helper advances the userspace
singlestep state machine, but this is also called by the kernel BRK
handler, as used for WARN*().
Thus, if we happen to hit a WARN*() while the user singlestep state
machine is in the active-no-pending state, we'll advance to the
active-pending state without having executed a user instruction, and
will take a step exception earlier than expected when we return to
userspace.
Let's fix this by only advancing the state machine when skipping a user
instruction.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit a257e02579 ("arm64/kernel: don't ban ADRP to work around
Cortex-A53 erratum #843419") introduced a function whose name ends with
"_veneer".
This clashes with commit bd8b22d288 ("Kbuild: kallsyms: ignore veneers
emitted by the ARM linker"), which removes symbols ending in "_veneer"
from kallsyms.
The problem was manifested as 'perf test -vvvvv vmlinux' failed,
correctly claiming the symbol 'module_emit_adrp_veneer' was present in
vmlinux, but not in kallsyms.
...
ERR : 0xffff00000809aa58: module_emit_adrp_veneer not on kallsyms
...
test child finished with -1
---- end ----
vmlinux symtab matches kallsyms: FAILED!
Fix the problem by renaming module_emit_adrp_veneer to
module_emit_veneer_for_adrp. Now the test passes.
Fixes: a257e02579 ("arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We transiently switch to KERNEL_DS in compat_ptrace_gethbpregs() and
compat_ptrace_sethbpregs(), but in either case this is pointless as we
don't perform any uaccess during this window.
let's rip out the redundant addr_limit manipulation.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We're missing a sentinel entry in kpti_safe_list. Thus is_midr_in_range_list()
can walk past the end of kpti_safe_list. Depending on the contents of memory,
this could erroneously match a CPU's MIDR, cause a data abort, or other bad
outcomes.
Add the sentinel entry to avoid this.
Fixes: be5b299830 ("arm64: capabilities: Add support for checks based on a list of MIDRs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit:
a7e6f1ca90 ("arm64: signal: Force SIGKILL for unknown signals in force_signal_inject")
... any signal which is not SIGKILL will be upgraded to a SIGKILL be
force_signal_inject(). This includes signals we do expect, such as
SIGILL triggered by do_undefinstr().
Fix the check to use a logical AND rather than a logical OR, permitting
signals whose layout is SIL_FAULT.
Fixes: a7e6f1ca90 ("arm64: signal: Force SIGKILL for unknown signals in force_signal_inject")
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add support macros to conditionally yield the NEON (and thus the CPU)
that may be called from the assembler code.
In some cases, yielding the NEON involves saving and restoring a non
trivial amount of context (especially in the CRC folding algorithms),
and so the macro is split into three, and the code in between is only
executed when the yield path is taken, allowing the context to be preserved.
The third macro takes an optional label argument that marks the resume
path after a yield has been performed.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
bpi.S was introduced as we were starting to build the Spectre v2
mitigation framework, and it was rather unclear that it would
become strictly KVM specific.
Now that the picture is a lot clearer, let's move the content
of that file to hyp-entry.S, where it actually belong.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The very existence of __smccc_workaround_1_hvc_* is a thinko, as
KVM will never use a HVC call to perform the branch prediction
invalidation. Even as a nested hypervisor, it would use an SMC
instruction.
Let's get rid of it.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since 5e7951ce19 ("arm64: capabilities: Clean up midr range helpers"),
capabilities must be represented with a single entry. If multiple
CPU types can use the same capability, then they need to be enumerated
in a list.
The EL2 hardening stuff (which affects both A57 and A72) managed to
escape the conversion in the above patch thanks to the 4.17 merge
window. Let's fix it now.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC
V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses
the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
of Silicon provider service ID 0xC2001700.
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
[maz: reworked errata framework integration]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past invalid
privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as of now)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
=bPlD
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past
invalid privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as
of now)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
kvm: x86: fix a prototype warning
kvm: selftests: add sync_regs_test
kvm: selftests: add API testing infrastructure
kvm: x86: fix a compile warning
KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
KVM: X86: Introduce handle_ud()
KVM: vmx: unify adjacent #ifdefs
x86: kvm: hide the unused 'cpu' variable
KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
kvm: Add emulation for movups/movupd
KVM: VMX: raise internal error for exception during invalid protected mode state
KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
KVM: x86: Fix misleading comments on handling pending exceptions
KVM: x86: Rename interrupt.pending to interrupt.injected
KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
x86/kvm: use Enlightened VMCS when running on Hyper-V
x86/hyper-v: detect nested features
x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
...
- Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a bunch
more warnings (hidden behind W=1).
- Build dtc lexer and parser files instead of using shipped versions.
- Rework overlay apply API to take an FDT as input and apply overlays in
a single step.
- Add a phandle lookup cache. This improves boot time by hundreds of
msec on systems with large DT.
- Add trivial mcp4017/18/19 potentiometers bindings.
- Remove VLA stack usage in DT code.
-----BEGIN PGP SIGNATURE-----
iQItBAABCAAXBQJaxiUdEBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcM0+w/+
L7nkug1Hz2476eRrsn5bm6oOO0vCrhQcDTJ/AlvU1YO8XBVgGEetLDs8drmvD0/O
FQDcpumX6G0eFoHTnTNWD7keM+0nY5jZBIAqKQNa9a0HKkjYc4HO5Ot9E02XG8W8
759vvCcGeJpysoCls9u8OplzqiDyNVQJd1a0fLivtafdKypuE/Ywh15wrzckPO+F
bxqWQd+uwm98ZVz8/o3vfYtAOJmA06A+hsyVLXYu7iKQcXYVxi+ZNbRV44MQ50NI
1w5m8GgtWe4A2lpXjmeXk1VmLPO3eEgQKnBoH7gcJmCHaVg/SVfMgBscuGSQZRQa
rQvaYRUNGJ0Mtji8EZpZb5Vip4ZCDtZCQBB3snN24CvGXI6WuIIg/8ncXt0AfLqn
pxFmC32ZcwvJR2NCpPVfTgILm6foT9IzJWKl6SQLVtqqVp9nPFua7T3l8AQak7FB
2MMaaqh7L0l0za0ZgArZZo/IWUHRb0MwZdXAkqBZlQ6f3IBqGQeKCnkclAeH8qYr
OorCOmC2OlKXLPHoz8XHeBzPRdnv1dQ//gEkKXBJ2igLU03hRWv9dxnGju/45sun
Ifo79uBAUc9s3F4Kjd/zs2iLztuPrYCSICHtJh9LPeOxoV1ZUNt+6Cm23yQ014Uo
/GsFW+lzh7c9wB1eETjPHd1WuYXiSrmE4zvbdykyLCk=
=ZWpa
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
- Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a
bunch more warnings (hidden behind W=1).
- Build dtc lexer and parser files instead of using shipped versions.
- Rework overlay apply API to take an FDT as input and apply overlays
in a single step.
- Add a phandle lookup cache. This improves boot time by hundreds of
msec on systems with large DT.
- Add trivial mcp4017/18/19 potentiometers bindings.
- Remove VLA stack usage in DT code.
* tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (26 commits)
of: unittest: fix an error code in of_unittest_apply_overlay()
of: unittest: move misplaced function declaration
of: unittest: Remove VLA stack usage
of: overlay: Fix forgotten reference to of_overlay_apply()
of: Documentation: Fix forgotten reference to of_overlay_apply()
of: unittest: local return value variable related cleanups
of: unittest: remove unneeded local return value variables
dt-bindings: trivial: add various mcp4017/18/19 potentiometers
of: unittest: fix an error test in of_unittest_overlay_8()
of: cache phandle nodes to reduce cost of of_find_node_by_phandle()
dt-bindings: rockchip-dw-mshc: use consistent clock names
MAINTAINERS: Add linux/of_*.h headers to appropriate subsystems
scripts: turn off some new dtc warnings by default
scripts/dtc: Update to upstream version v1.4.6-9-gaadd0b65c987
scripts/dtc: generate lexer and parser during build instead of shipping
powerpc: boot: add strrchr function
of: overlay: do not include path in full_name of added nodes
of: unittest: clean up changeset test
arm64/efi: Make strrchr() available to the EFI namespace
ARM: boot: add strrchr function
...
Nothing particularly stands out here, probably because people were tied
up with spectre/meltdown stuff last time around. Still, the main pieces
are:
- Rework of our CPU features framework so that we can whitelist CPUs that
don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to elide
instruction and data cache maintenance when writing out instructions
- Removal of the large memory model which resulted in suboptimal codegen
by the compiler and increased the use of literal pools, which could
potentially be used as ROP gadgets since they are mapped as executable
- Rework of forced signal delivery so that the siginfo_t is well-formed
and handling of show_unhandled_signals is consolidated and made
consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric Biederman
- Workaround for Cortex-A55 erratum #1024718
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJaw1TCAAoJELescNyEwWM0gyQIAJVMK4QveBW+LwF96NYdZo16
p90Aa+nqKelh/s93govQArDMv1gxyuXdFlQZVOGPQHfqpz6RhJWmBA2tFsUbQrUc
OBcioPrRihqTmKBe+1r1XORwZxkVX6GGmCn0LYpPR7I3TjxXZpvxqaxGxiUvHkci
yVxWlDTyN/7eL3akhCpCDagN3Fxwk3QnJLqE3fxOFMlY7NvQcmUxcITiUl/s469q
xK6SWH9SRH1JK8jTHPitwUBiU//3FfCqSI9HLEdDIDoTuPcVM8UetWvi4QzrzJL1
UYg8lmU0CXNmflDzZJDaMf+qFApOrGxR0YVPpBzlQvxe0JIY69g48f+JzDPz8nc=
=+gNa
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Nothing particularly stands out here, probably because people were
tied up with spectre/meltdown stuff last time around. Still, the main
pieces are:
- Rework of our CPU features framework so that we can whitelist CPUs
that don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to
elide instruction and data cache maintenance when writing out
instructions
- Removal of the large memory model which resulted in suboptimal
codegen by the compiler and increased the use of literal pools,
which could potentially be used as ROP gadgets since they are
mapped as executable
- Rework of forced signal delivery so that the siginfo_t is
well-formed and handling of show_unhandled_signals is consolidated
and made consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric
Biederman
- Workaround for Cortex-A55 erratum #1024718
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
arm64: uaccess: Fix omissions from usercopy whitelist
arm64: fpsimd: Split cpu field out from struct fpsimd_state
arm64: tlbflush: avoid writing RES0 bits
arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h
arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h
arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG
arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC
arm64: fpsimd: include <linux/init.h> in fpsimd.h
drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor
perf: arm_spe: include linux/vmalloc.h for vmap()
Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)"
arm64: cpufeature: Avoid warnings due to unused symbols
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
arm64: Delay enabling hardware DBM feature
arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
arm64: capabilities: Handle shared entries
arm64: capabilities: Add support for checks based on a list of MIDRs
arm64: Add helpers for checking CPU MIDR against a range
arm64: capabilities: Clean up midr range helpers
arm64: capabilities: Change scope of VHE to Boot CPU feature
...
Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
"System calls are interaction points between userspace and the kernel.
Therefore, system call functions such as sys_xyzzy() or
compat_sys_xyzzy() should only be called from userspace via the
syscall table, but not from elsewhere in the kernel.
At least on 64-bit x86, it will likely be a hard requirement from
v4.17 onwards to not call system call functions in the kernel: It is
better to use use a different calling convention for system calls
there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
which then hands processing over to the actual syscall function. This
means that only those parameters which are actually needed for a
specific syscall are passed on during syscall entry, instead of
filling in six CPU registers with random user space content all the
time (which may cause serious trouble down the call chain). Those
x86-specific patches will be pushed through the x86 tree in the near
future.
Moreover, rules on how data may be accessed may differ between kernel
data and user data. This is another reason why calling sys_xyzzy() is
generally a bad idea, and -- at most -- acceptable in arch-specific
code.
This patchset removes all in-kernel calls to syscall functions in the
kernel with the exception of arch/. On top of this, it cleans up the
three places where many syscalls are referenced or prototyped, namely
kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"
* 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
bpf: whitelist all syscalls for error injection
kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
kernel/sys_ni: sort cond_syscall() entries
syscalls/x86: auto-create compat_sys_*() prototypes
syscalls: sort syscall prototypes in include/linux/compat.h
net: remove compat_sys_*() prototypes from net/compat.h
syscalls: sort syscall prototypes in include/linux/syscalls.h
kexec: move sys_kexec_load() prototype to syscalls.h
x86/sigreturn: use SYSCALL_DEFINE0
x86: fix sys_sigreturn() return type to be long, not unsigned long
x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
...
Pull EFI updates from Ingo Molnar:
"The main EFI changes in this cycle were:
- Fix the apple-properties code (Andy Shevchenko)
- Add WARN() on arm64 if UEFI Runtime Services corrupt the reserved
x18 register (Ard Biesheuvel)
- Use efi_switch_mm() on x86 instead of manipulating %cr3 directly
(Sai Praneeth)
- Fix early memremap leak in ESRT code (Ard Biesheuvel)
- Switch to L"xxx" notation for wide string literals (Ard Biesheuvel)
- ... plus misc other cleanups and bugfixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
x86/efi: Replace efi_pgd with efi_mm.pgd
efi: Use string literals for efi_char16_t variable initializers
efi/esrt: Fix handling of early ESRT table mapping
efi: Use efi_mm in x86 as well as ARM
efi: Make const array 'apple' static
efi/apple-properties: Use memremap() instead of ioremap()
efi: Reorder pr_notice() with add_device_randomness() call
x86/efi: Replace GFP_ATOMIC with GFP_KERNEL in efi_query_variable_store()
efi/arm64: Check whether x18 is preserved by runtime services calls
efi/arm*: Stop printing addresses of virtual mappings
efi/apple-properties: Remove redundant attribute initialization from unmarshal_key_value_pairs()
efi/arm*: Only register page tables when they exist
Using this helper allows us to avoid the in-kernel calls to the
sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_mmap_pgoff().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
When the hardend usercopy support was added for arm64, it was
concluded that all cases of usercopy into and out of thread_struct
were statically sized and so didn't require explicit whitelisting
of the appropriate fields in thread_struct.
Testing with usercopy hardening enabled has revealed that this is
not the case for certain ptrace regset manipulation calls on arm64.
This occurs because the sizes of usercopies associated with the
regset API are dynamic by construction, and because arm64 does not
always stage such copies via the stack: indeed the regset API is
designed to avoid the need for that by adding some bounds checking.
This is currently believed to affect only the fpsimd and TLS
registers.
Because the whitelisted fields in thread_struct must be contiguous,
this patch groups them together in a nested struct. It is also
necessary to be able to determine the location and size of that
struct, so rather than making the struct anonymous (which would
save on edits elsewhere) or adding an anonymous union containing
named and unnamed instances of the same struct (gross), this patch
gives the struct a name and makes the necessary edits to code that
references it (noisy but simple).
Care is needed to ensure that the new struct does not contain
padding (which the usercopy hardening would fail to protect).
For this reason, the presence of tp2_value is made unconditional,
since a padding field would be needed there in any case. This pads
up to the 16-byte alignment required by struct user_fpsimd_state.
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 9e8084d3f7 ("arm64: Implement thread_struct whitelist for hardened usercopy")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for using a common representation of the FPSIMD
state for tasks and KVM vcpus, this patch separates out the "cpu"
field that is used to track the cpu on which the state was most
recently loaded.
This will allow common code to operate on task and vcpu contexts
without requiring the cpu field to be stored at the same offset
from the FPSIMD register data in both cases. This should avoid the
need for messing with the definition of those parts of struct
vcpu_arch that are exposed in the KVM user ABI.
The resulting change is also convenient for grouping and defining
the set of thread_struct fields that are supposed to be accessible
to copy_{to,from}_user(), which includes user_fpsimd_state but
should exclude the cpu field. This patch does not amend the
usercopy whitelist to match: that will be addressed in a subsequent
patch.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[will: inline fpsimd_flush_state for now]
Signed-off-by: Will Deacon <will.deacon@arm.com>
MIDR_ALL_VERSIONS is changing, and won't have the same meaning
in 4.17, and the right thing to use will be ERRATA_MIDR_ALL_VERSIONS.
In order to cope with the merge window, let's add a compatibility
macro that will allow a relatively smooth transition, and that
can be removed post 4.17-rc1.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Creates far too many conflicts with arm64/for-next/core, to be
resent post -rc1.
This reverts commit f9f5dc1950.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This reverts commit 1f85b42a69.
The internal dma-direct.h API has changed in -next, which collides with
us trying to use it to manage non-coherent DMA devices on systems with
unreasonably large cache writeback granules.
This isn't at all trivial to resolve, so revert our changes for now and
we can revisit this after the merge window. Effectively, this just
restores our behaviour back to that of 4.16.
Signed-off-by: Will Deacon <will.deacon@arm.com>
An allnoconfig build complains about unused symbols due to functions
that are called via conditional cpufeature and cpu_errata table entries.
Annotate these as __maybe_unused if they are likely to be generic, or
predicate their compilation on the same option as the table entry if
they are specific to a given alternative.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some variants of the Arm Cortex-55 cores (r0p0, r0p1, r1p0) suffer
from an erratum 1024718, which causes incorrect updates when DBM/AP
bits in a page table entry is modified without a break-before-make
sequence. The work around is to skip enabling the hardware DBM feature
on the affected cores. The hardware Access Flag management features
is not affected. There are some other cores suffering from this
errata, which could be added to the midr_list to trigger the work
around.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: ckadabi@codeaurora.org
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We enable hardware DBM bit in a capable CPU, very early in the
boot via __cpu_setup. This doesn't give us a flexibility of
optionally disable the feature, as the clearing the bit
is a bit costly as the TLB can cache the settings. Instead,
we delay enabling the feature until the CPU is brought up
into the kernel. We use the feature capability mechanism
to handle it.
The hardware DBM is a non-conflicting feature. i.e, the kernel
can safely run with a mix of CPUs with some using the feature
and the others don't. So, it is safe for a late CPU to have
this capability and enable it, even if the active CPUs don't.
To get this handled properly by the infrastructure, we
unconditionally set the capability and only enable it
on CPUs which really have the feature. Also, we print the
feature detection from the "matches" call back to make sure
we don't mislead the user when none of the CPUs could use the
feature.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some capabilities have different criteria for detection and associated
actions based on the matching criteria, even though they all share the
same capability bit. So far we have used multiple entries with the same
capability bit to handle this. This is prone to errors, as the
cpu_enable is invoked for each entry, irrespective of whether the
detection rule applies to the CPU or not. And also this complicates
other helpers, e.g, __this_cpu_has_cap.
This patch adds a wrapper entry to cover all the possible variations
of a capability by maintaining list of matches + cpu_enable callbacks.
To avoid complicating the prototypes for the "matches()", we use
arm64_cpu_capabilities maintain the list and we ignore all the other
fields except the matches & cpu_enable.
This ensures :
1) The capabilitiy is set when at least one of the entry detects
2) Action is only taken for the entries that "matches".
This avoids explicit checks in the cpu_enable() take some action.
The only constraint here is that, all the entries should have the
same "type" (i.e, scope and conflict rules).
If a cpu_enable() method is associated with multiple matches for a
single capability, care should be taken that either the match criteria
are mutually exclusive, or that the method is robust against being
called multiple times.
This also reverts the changes introduced by commit 67948af41f
("arm64: capabilities: Handle duplicate entries for a capability").
Cc: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add helpers for detecting an errata on list of midr ranges
of affected CPUs, with the same work around.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add helpers for checking if the given CPU midr falls in a range
of variants/revisions for a given model.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We are about to introduce generic MIDR range helpers. Clean
up the existing helpers in erratum handling, preparing them
to use generic version.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We expect all CPUs to be running at the same EL inside the kernel
with or without VHE enabled and we have strict checks to ensure
that any mismatch triggers a kernel panic. If VHE is enabled,
we use the feature based on the boot CPU and all other CPUs
should follow. This makes it a perfect candidate for a capability
based on the boot CPU, which should be matched by all the CPUs
(both when is ON and OFF). This saves us some not-so-pretty
hooks and special code, just for verifying the conflict.
The patch also makes the VHE capability entry depend on
CONFIG_ARM64_VHE.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The kernel detects and uses some of the features based on the boot
CPU and expects that all the following CPUs conform to it. e.g,
with VHE and the boot CPU running at EL2, the kernel decides to
keep the kernel running at EL2. If another CPU is brought up without
this capability, we use custom hooks (via check_early_cpu_features())
to handle it. To handle such capabilities add support for detecting
and enabling capabilities based on the boot CPU.
A bit is added to indicate if the capability should be detected
early on the boot CPU. The infrastructure then ensures that such
capabilities are probed and "enabled" early on in the boot CPU
and, enabled on the subsequent CPUs.
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
KPTI is treated as a system wide feature and is only detected if all
the CPUs in the sysetm needs the defense, unless it is forced via kernel
command line. This leaves a system with a mix of CPUs with and without
the defense vulnerable. Also, if a late CPU needs KPTI but KPTI was not
activated at boot time, the CPU is currently allowed to boot, which is a
potential security vulnerability.
This patch ensures that the KPTI is turned on if at least one CPU detects
the capability (i.e, change scope to SCOPE_LOCAL_CPU). Also rejetcs a late
CPU, if it requires the defense, when the system hasn't enabled it,
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we have the flexibility of defining system features based
on individual CPUs, introduce CPU feature type that can be detected
on a local SCOPE and ignores the conflict on late CPUs. This is
applicable for ARM64_HAS_NO_HW_PREFETCH, where it is fine for
the system to have CPUs without hardware prefetch turning up
later. We only suffer a performance penalty, nothing fatal.
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the features and errata workarounds have the same
rules and flow, group the handling of the tables.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
So far we have treated the feature capabilities as system wide
and this wouldn't help with features that could be detected locally
on one or more CPUs (e.g, KPTI, Software prefetch). This patch
splits the feature detection to two phases :
1) Local CPU features are checked on all boot time active CPUs.
2) System wide features are checked only once after all CPUs are
active.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Right now we run through the errata workarounds check on all boot
active CPUs, with SCOPE_ALL. This wouldn't help for detecting erratum
workarounds with a SYSTEM_SCOPE. There are none yet, but we plan to
introduce some: let us clean this up so that such workarounds can be
detected and enabled correctly.
So, we run the checks with SCOPE_LOCAL_CPU on all CPUs and SCOPE_SYSTEM
checks are run only once after all the boot time CPUs are active.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We are about to group the handling of all capabilities (features
and errata workarounds). This patch open codes the wrapper routines
to make it easier to merge the handling.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
While processing the list of capabilities, it is useful to
filter out some of the entries based on the given mask for the
scope of the capabilities to allow better control. This can be
used later for handling LOCAL vs SYSTEM wide capabilities and more.
All capabilities should have their scope set to either LOCAL_CPU or
SYSTEM. No functional/flow change.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that each capability describes how to treat the conflicts
of CPU cap state vs System wide cap state, we can unify the
verification logic to a single place.
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When a CPU is brought up, it is checked against the caps that are
known to be enabled on the system (via verify_local_cpu_capabilities()).
Based on the state of the capability on the CPU vs. that of System we
could have the following combinations of conflict.
x-----------------------------x
| Type | System | Late CPU |
|-----------------------------|
| a | y | n |
|-----------------------------|
| b | n | y |
x-----------------------------x
Case (a) is not permitted for caps which are system features, which the
system expects all the CPUs to have (e.g VHE). While (a) is ignored for
all errata work arounds. However, there could be exceptions to the plain
filtering approach. e.g, KPTI is an optional feature for a late CPU as
long as the system already enables it.
Case (b) is not permitted for errata work arounds that cannot be activated
after the kernel has finished booting.And we ignore (b) for features. Here,
yet again, KPTI is an exception, where if a late CPU needs KPTI we are too
late to enable it (because we change the allocation of ASIDs etc).
Add two different flags to indicate how the conflict should be handled.
ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU - CPUs may have the capability
ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU - CPUs may not have the cappability.
Now that we have the flags to describe the behavior of the errata and
the features, as we treat them, define types for ERRATUM and FEATURE.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We use arm64_cpu_capabilities to represent CPU ELF HWCAPs exposed
to the userspace and the CPU hwcaps used by the kernel, which
include cpu features and CPU errata work arounds. Capabilities
have some properties that decide how they should be treated :
1) Detection, i.e scope : A cap could be "detected" either :
- if it is present on at least one CPU (SCOPE_LOCAL_CPU)
Or
- if it is present on all the CPUs (SCOPE_SYSTEM)
2) When is it enabled ? - A cap is treated as "enabled" when the
system takes some action based on whether the capability is detected or
not. e.g, setting some control register, patching the kernel code.
Right now, we treat all caps are enabled at boot-time, after all
the CPUs are brought up by the kernel. But there are certain caps,
which are enabled early during the boot (e.g, VHE, GIC_CPUIF for NMI)
and kernel starts using them, even before the secondary CPUs are brought
up. We would need a way to describe this for each capability.
3) Conflict on a late CPU - When a CPU is brought up, it is checked
against the caps that are known to be enabled on the system (via
verify_local_cpu_capabilities()). Based on the state of the capability
on the CPU vs. that of System we could have the following combinations
of conflict.
x-----------------------------x
| Type | System | Late CPU |
------------------------------|
| a | y | n |
------------------------------|
| b | n | y |
x-----------------------------x
Case (a) is not permitted for caps which are system features, which the
system expects all the CPUs to have (e.g VHE). While (a) is ignored for
all errata work arounds. However, there could be exceptions to the plain
filtering approach. e.g, KPTI is an optional feature for a late CPU as
long as the system already enables it.
Case (b) is not permitted for errata work arounds which requires some
work around, which cannot be delayed. And we ignore (b) for features.
Here, yet again, KPTI is an exception, where if a late CPU needs KPTI we
are too late to enable it (because we change the allocation of ASIDs
etc).
So this calls for a lot more fine grained behavior for each capability.
And if we define all the attributes to control their behavior properly,
we may be able to use a single table for the CPU hwcaps (which cover
errata and features, not the ELF HWCAPs). This is a prepartory step
to get there. More bits would be added for the properties listed above.
We are going to use a bit-mask to encode all the properties of a
capabilities. This patch encodes the "SCOPE" of the capability.
As such there is no change in how the capabilities are treated.
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We have errata work around processing code in cpu_errata.c,
which calls back into helpers defined in cpufeature.c. Now
that we are going to make the handling of capabilities
generic, by adding the information to each capability,
move the errata work around specific processing code.
No functional changes.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We trigger CPU errata work around check on the boot CPU from
smp_prepare_boot_cpu() to make sure that we run the checks only
after the CPU feature infrastructure is initialised. While this
is correct, we can also do this from init_cpu_features() which
initilises the infrastructure, and is called only on the
Boot CPU. This helps to consolidate the CPU capability handling
to cpufeature.c. No functional changes.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We issue the enable() call back for all CPU hwcaps capabilities
available on the system, on all the CPUs. So far we have ignored
the argument passed to the call back, which had a prototype to
accept a "void *" for use with on_each_cpu() and later with
stop_machine(). However, with commit 0a0d111d40
("arm64: cpufeature: Pass capability structure to ->enable callback"),
there are some users of the argument who wants the matching capability
struct pointer where there are multiple matching criteria for a single
capability. Clean up the declaration of the call back to make it clear.
1) Renamed to cpu_enable(), to imply taking necessary actions on the
called CPU for the entry.
2) Pass const pointer to the capability, to allow the call back to
check the entry. (e.,g to check if any action is needed on the CPU)
3) We don't care about the result of the call back, turning this to
a void.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Dave Martin <dave.martin@arm.com>
[suzuki: convert more users, rename call back and drop results]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently a SIGFPE delivered in response to a floating-point
exception trap may have si_code set to 0 on arm64. As reported by
Eric, this is a bad idea since this is the value of SI_USER -- yet
this signal is definitely not the result of kill(2), tgkill(2) etc.
and si_uid and si_pid make limited sense whereas we do want to
yield a value for si_addr (which doesn't exist for SI_USER).
It's not entirely clear whether the architecure permits a
"spurious" fp exception trap where none of the exception flag bits
in ESR_ELx is set. (IMHO the architectural intent is to forbid
this.) However, it does permit those bits to contain garbage if
the TFV bit in ESR_ELx is 0. That case isn't currently handled at
all and may result in si_code == 0 or si_code containing a FPE_FLT*
constant corresponding to an exception that did not in fact happen.
There is nothing sensible we can return for si_code in such cases,
but SI_USER is certainly not appropriate and will lead to violation
of legitimate userspace assumptions.
This patch allocates a new si_code value FPE_UNKNOWN that at least
does not conflict with any existing SI_* or FPE_* code, and yields
this in si_code for undiagnosable cases. This is probably the best
simplicity/incorrectness tradeoff achieveable without relying on
implementation-dependent features or adding a lot of code. In any
case, there appears to be no perfect solution possible that would
justify a lot of effort here.
Yielding FPE_UNKNOWN when some well-defined fp exception caused the
trap is a violation of POSIX, but this is forced by the
architecture. We have no realistic prospect of yielding the
correct code in such cases. At present I am not aware of any ARMv8
implementation that supports trapped floating-point exceptions in
any case.
The new code may be applicable to other architectures for similar
reasons.
No attempt is made to provide ESR_ELx to userspace in the signal
frame, since architectural limitations mean that it is unlikely to
provide much diagnostic value, doesn't benefit existing software
and would create ABI with no proven purpose. The existing
mechanism for passing it also has problems of its own which may
result in the wrong value being passed to userspace due to
interaction with mm faults. The implied rework does not appear
justified.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC
V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses
the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
of Silicon provider service ID 0xC2001700.
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Expose the new features introduced by Arm v8.4 extensions to
Arm v8-A profile.
These include :
1) Data indpendent timing of instructions. (DIT, exposed as HWCAP_DIT)
2) Unaligned atomic instructions and Single-copy atomicity of loads
and stores. (AT, expose as HWCAP_USCAT)
3) LDAPR and STLR instructions with immediate offsets (extension to
LRCPC, exposed as HWCAP_ILRCPC)
4) Flag manipulation instructions (TS, exposed as HWCAP_FLAGM).
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The printk symbol was intended as a generic address that is always
exported, however that turned out to be false with CONFIG_PRINTK=n:
ERROR: "printk" [arch/arm64/kernel/arm64-reloc-test.ko] undefined!
This changes the references to memstart_addr, which should be there
regardless of configuration.
Fixes: a257e02579 ("arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cortex-A57 and A72 are vulnerable to the so-called "variant 3a" of
Meltdown, where an attacker can speculatively obtain the value
of a privileged system register.
By enabling ARM64_HARDEN_EL2_VECTORS on these CPUs, obtaining
VBAR_EL2 is not disclosing the hypervisor mappings anymore.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We're about to need to allocate hardening slots from other parts
of the kernel (in order to support ARM64_HARDEN_EL2_VECTORS).
Turn the counter into an atomic_t and make it available to the
rest of the kernel. Also add BP_HARDEN_EL2_SLOTS as the number of
slots instead of the hardcoded 4...
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
So far, the branch from the vector slots to the main vectors can at
most be 4GB from the main vectors (the reach of ADRP), and this
distance is known at compile time. If we were to remap the slots
to an unrelated VA, things would break badly.
A way to achieve VA independence would be to load the absolute
address of the vectors (__kvm_hyp_vector), either using a constant
pool or a series of movs, followed by an indirect branch.
This patches implements the latter solution, using another instance
of a patching callback. Note that since we have to save a register
pair on the stack, we branch to the *second* instruction in the
vectors in order to compensate for it. This also results in having
to adjust this balance in the invalid vector entry point.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
So far, we only reserve a single instruction in the BPI template in
order to branch to the vectors. As we're going to stuff a few more
instructions there, let's reserve a total of 5 instructions, which
we're going to patch later on as required.
We also introduce a small refactor of the vectors themselves, so that
we stop carrying the target branch around.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
There is no reason why the BP hardening vectors shouldn't be part
of the HYP text at compile time, rather than being mapped at runtime.
Also introduce a new config symbol that controls the compilation
of bpi.S.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The encoder for ADD/SUB (immediate) can only cope with 12bit
immediates, while there is an encoding for a 12bit immediate shifted
by 12 bits to the left.
Let's fix this small oversight by allowing the LSL_12 bit to be set.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add an encoder for the EXTR instruction, which also implements the ROR
variant (where Rn == Rm).
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that we can dynamically compute the kernek/hyp VA mask, there
is no need for a feature flag to trigger the alternative patching.
Let's drop the flag and everything that depends on it.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We lack a way to encode operations such as AND, ORR, EOR that take
an immediate value. Doing so is quite involved, and is all about
reverse engineering the decoding algorithm described in the
pseudocode function DecodeBitMasks().
This has been tested by feeding it all the possible literal values
and comparing the output with that of GAS.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We're missing the a way to generate the encoding of the N immediate,
which is only a single bit used in a number of instruction that take
an immediate.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We've so far relied on a patching infrastructure that only gave us
a single alternative, without any way to provide a range of potential
replacement instructions. For a single feature, this is an all or
nothing thing.
It would be interesting to have a more flexible grained way of patching
the kernel though, where we could dynamically tune the code that gets
injected.
In order to achive this, let's introduce a new form of dynamic patching,
assiciating a callback to a patching site. This callback gets source and
target locations of the patching request, as well as the number of
instructions to be patched.
Dynamic patching is declared with the new ALTERNATIVE_CB and alternative_cb
directives:
asm volatile(ALTERNATIVE_CB("mov %0, #0\n", callback)
: "r" (v));
or
alternative_cb callback
mov x0, #0
alternative_cb_end
where callback is the C function computing the alternative.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We already have the percpu area for the host cpu state, which points to
the VCPU, so there's no need to store the VCPU pointer on the stack on
every context switch. We can be a little more clever and just use
tpidr_el2 for the percpu offset and load the VCPU pointer from the host
context.
This has the benefit of being able to retrieve the host context even
when our stack is corrupted, and it has a potential performance benefit
because we trade a store plus a load for an mrs and a load on a round
trip to the guest.
This does require us to calculate the percpu offset without including
the offset from the kernel mapping of the percpu array to the linear
mapping of the array (which is what we store in tpidr_el1), because a
PC-relative generated address in EL2 is already giving us the hyp alias
of the linear mapping of a kernel address. We do this in
__cpu_init_hyp_mode() by using kvm_ksym_ref().
The code that accesses ESR_EL2 was previously using an alternative to
use the _EL1 accessor on VHE systems, but this was actually unnecessary
as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2
accessor does the same thing on both systems.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
A recent update to the ARM SMCCC ARCH_WORKAROUND_1 specification
allows firmware to return a non zero, positive value to describe
that although the mitigation is implemented at the higher exception
level, the CPU on which the call is made is not affected.
Let's relax the check on the return value from ARCH_WORKAROUND_1
so that we only error out if the returned value is negative.
Fixes: b092201e00 ("arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, as reported by Eric, an invalid si_code value 0 is
passed in many signals delivered to userspace in response to faults
and other kernel errors. Typically 0 is passed when the fault is
insufficiently diagnosable or when there does not appear to be any
sensible alternative value to choose.
This appears to violate POSIX, and is intuitively wrong for at
least two reasons arising from the fact that 0 == SI_USER:
1) si_code is a union selector, and SI_USER (and si_code <= 0 in
general) implies the existence of a different set of fields
(siginfo._kill) from that which exists for a fault signal
(siginfo._sigfault). However, the code raising the signal
typically writes only the _sigfault fields, and the _kill
fields make no sense in this case.
Thus when userspace sees si_code == 0 (SI_USER) it may
legitimately inspect fields in the inactive union member _kill
and obtain garbage as a result.
There appears to be software in the wild relying on this,
albeit generally only for printing diagnostic messages.
2) Software that wants to be robust against spurious signals may
discard signals where si_code == SI_USER (or <= 0), or may
filter such signals based on the si_uid and si_pid fields of
siginfo._sigkill. In the case of fault signals, this means
that important (and usually fatal) error conditions may be
silently ignored.
In practice, many of the faults for which arm64 passes si_code == 0
are undiagnosable conditions such as exceptions with syndrome
values in ESR_ELx to which the architecture does not yet assign any
meaning, or conditions indicative of a bug or error in the kernel
or system and thus that are unrecoverable and should never occur in
normal operation.
The approach taken in this patch is to translate all such
undiagnosable or "impossible" synchronous fault conditions to
SIGKILL, since these are at least probably localisable to a single
process. Some of these conditions should really result in a kernel
panic, but due to the lack of diagnostic information it is
difficult to be certain: this patch does not add any calls to
panic(), but this could change later if justified.
Although si_code will not reach userspace in the case of SIGKILL,
it is still desirable to pass a nonzero value so that the common
siginfo handling code can detect incorrect use of si_code == 0
without false positives. In this case the si_code dependent
siginfo fields will not be correctly initialised, but since they
are not passed to userspace I deem this not to matter.
A few faults can reasonably occur in realistic userspace scenarios,
and _should_ raise a regular, handleable (but perhaps not
ignorable/blockable) signal: for these, this patch attempts to
choose a suitable standard si_code value for the raised signal in
each case instead of 0.
arm64 was the only arch to define a BUS_FIXME code, so after this
patch nobody defines it. This patch therefore also removes the
relevant code from siginfo_layout().
Cc: James Morse <james.morse@arm.com>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The DCache clean & ICache invalidation requirements for instructions
to be data coherence are discoverable through new fields in CTR_EL0.
The following two control bits DIC and IDC were defined for this
purpose. No need to perform point of unification cache maintenance
operations from software on systems where CPU caches are transparent.
This patch optimize the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() if the hardware
reports CTR_EL0.IDC and/or CTR_EL0.IDC. Basically it skips the two
instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic
in order to avoid the unnecessary overhead.
CTR_EL0.DIC: Instruction cache invalidation requirements for
instruction to data coherence. The meaning of this bit[29].
0: Instruction cache invalidation to the point of unification
is required for instruction to data coherence.
1: Instruction cache cleaning to the point of unification is
not required for instruction to data coherence.
CTR_EL0.IDC: Data cache clean requirements for instruction to data
coherence. The meaning of this bit[28].
0: Data cache clean to the point of unification is required for
instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
1: Data cache clean to the point of unification is not required
for instruction to data coherence.
Co-authored-by: Philip Elcan <pelcan@codeaurora.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Omit patching of ADRP instruction at module load time if the current
CPUs are not susceptible to the erratum.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: Drop duplicate initialisation of .def_scope field]
Signed-off-by: Will Deacon <will.deacon@arm.com>
In some cases, core variants that are affected by a certain erratum
also exist in versions that have the erratum fixed, and this fact is
recorded in a dedicated bit in system register REVIDR_EL1.
Since the architecture does not require that a certain bit retains
its meaning across different variants of the same model, each such
REVIDR bit is tightly coupled to a certain revision/variant value,
and so we need a list of revidr_mask/midr pairs to carry this
information.
So add the struct member and the associated macros and handling to
allow REVIDR fixes to be taken into account.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Whether or not we will ever decide to start using x18 as a platform
register in Linux is uncertain, but by that time, we will need to
ensure that UEFI runtime services calls don't corrupt it.
So let's start issuing warnings now for this, and increase the
likelihood that these firmware images have all been replaced by that time.
This has been fixed on the EDK2 side in commit:
6d73863b5464 ("BaseTools/tools_def AARCH64: mark register x18 as reserved")
dated July 13, 2017.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180308080020.22828-6-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We currently have to rely on the GCC large code model for KASLR for
two distinct but related reasons:
- if we enable full randomization, modules will be loaded very far away
from the core kernel, where they are out of range for ADRP instructions,
- even without full randomization, the fact that the 128 MB module region
is now no longer fully reserved for kernel modules means that there is
a very low likelihood that the normal bottom-up allocation of other
vmalloc regions may collide, and use up the range for other things.
Large model code is suboptimal, given that each symbol reference involves
a literal load that goes through the D-cache, reducing cache utilization.
But more importantly, literals are not instructions but part of .text
nonetheless, and hence mapped with executable permissions.
So let's get rid of our dependency on the large model for KASLR, by:
- reducing the full randomization range to 4 GB, thereby ensuring that
ADRP references between modules and the kernel are always in range,
- reduce the spillover range to 4 GB as well, so that we fallback to a
region that is still guaranteed to be in range
- move the randomization window of the core kernel to the middle of the
VMALLOC space
Note that KASAN always uses the module region outside of the vmalloc space,
so keep the kernel close to that if KASAN is enabled.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When PLTs are emitted at relocation time, we really should not exceed
the number that we counted when parsing the relocation tables, and so
currently, we BUG() on this condition. However, even though this is a
clear bug in this particular piece of code, we can easily recover by
failing to load the module.
So instead, return 0 from module_emit_plt_entry() if this condition
occurs, which is not a valid kernel address, and can hence serve as
a flag value that makes the relocation routine bail out.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This is the equivalent of commit 001bf455d2 ("ARM: 8428/1: kgdb: Fix
registers on sleeping tasks") but for arm64. Nuff said.
...well, perhaps I could also add that task_pt_regs are userspace
registers and that's not what kgdb is supposed to be reporting. We're
supposed to be reporting kernel registers.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 9730348075 ("arm64: Increase the max granular size") increased
the cache line size to 128 to match Cavium ThunderX, apparently for some
performance benefit which could not be confirmed. This change, however,
has an impact on the network packets allocation in certain
circumstances, requiring slightly over a 4K page with a significant
performance degradation.
This patch reverts L1_CACHE_SHIFT back to 6 (64-byte cache line) while
keeping ARCH_DMA_MINALIGN at 128. The cache_line_size() function was
changed to default to ARCH_DMA_MINALIGN in the absence of a meaningful
CTR_EL0.CWG bit field.
In addition, if a system with ARCH_DMA_MINALIGN < CTR_EL0.CWG is
detected, the kernel will force swiotlb bounce buffering for all
non-coherent devices since DMA cache maintenance on sub-CWG ranges is
not safe, leading to data corruption.
Cc: Tirumalesh Chalamarla <tchalamarla@cavium.com>
Cc: Timur Tabi <timur@codeaurora.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Using arm64_force_sig_info means that printing messages about unhandled
signals is dealt with for us, so use that in preference to force_sig_info
and remove any homebrew printing code.
Signed-off-by: Will Deacon <will.deacon@arm.com>
show_unhandled_signals_ratelimited is only called in traps.c, so move it
out of its macro in the dreaded system_misc.h and into a static function
in traps.c
Signed-off-by: Will Deacon <will.deacon@arm.com>
If we fail to deliver a signal due to taking an unhandled fault on the
stackframe, we can call arm64_notify_segfault to deliver a SEGV can deal
with printing any unhandled signal messages for us, rather than roll our
own printing code.
A side-effect of this change is that we now deliver the frame address
in si_addr along with an si_code of SEGV_{ACC,MAP}ERR, rather than an
si_addr of 0 and an si_code of SI_KERNEL as before.
Signed-off-by: Will Deacon <will.deacon@arm.com>
arm64_notify_die deals with printing out information regarding unhandled
signals, so there's no need to roll our own code here.
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for consolidating our handling of printing unhandled
signals, introduce a wrapper around force_sig_info which can act as
the canonical place for dealing with show_unhandled_signals.
Initially, we just hook this up to arm64_notify_die.
Signed-off-by: Will Deacon <will.deacon@arm.com>
For signals other than SIGKILL or those with siginfo_layout(signal, code)
== SIL_FAULT then force_signal_inject does not initialise the siginfo_t
properly. Since the signal number is determined solely by the caller,
simply WARN on unknown signals and force to SIGKILL.
Reported-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
force_signal_inject is a little flakey:
* It only knows about SIGILL and SIGSEGV, so can potentially deliver
other signals based on a partially initialised siginfo_t
* It sets si_addr to point at the PC for SIGSEGV
* It always operates on current, so doesn't need the regs argument
This patch fixes these issues by always assigning the si_addr field to
the address parameter of the function and updates the callers (including
those that indirectly call via arm64_notify_segfault) accordingly.
Signed-off-by: Will Deacon <will.deacon@arm.com>
libfdt gained a new dependency on strrchr, so make it available to the
EFI namespace before we update libfdt.
Thanks to Ard for providing this fix.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
The word "feature" is repeated in the CPU features reporting. This drops it
for improved readability.
Before (redundant "feature" word):
SMP: Total of 4 processors activated.
CPU features: detected feature: 32-bit EL0 Support
CPU features: detected feature: Kernel page table isolation (KPTI)
CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
CPU: All CPU(s) started at EL2
After:
SMP: Total of 4 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: Kernel page table isolation (KPTI)
CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
CPU: All CPU(s) started at EL2
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The PAN emulation notification was only happening for non-boot CPUs
if CPU capabilities had already been configured. This seems to be the
wrong place, as it's system-wide and isn't attached to capabilities,
so its reporting didn't normally happen. Instead, report it once from
the boot CPU.
Before (missing PAN emulation report):
SMP: Total of 4 processors activated.
CPU features: detected feature: 32-bit EL0 Support
CPU features: detected feature: Kernel page table isolation (KPTI)
CPU: All CPU(s) started at EL2
After:
SMP: Total of 4 processors activated.
CPU features: detected feature: 32-bit EL0 Support
CPU features: detected feature: Kernel page table isolation (KPTI)
CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
CPU: All CPU(s) started at EL2
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the early kernel mapping logic can tolerate placements of
Image that cross swapper table boundaries, we can remove the logic
that adjusts the offset if the dice roll produced an offset that
puts the kernel right on top of one.
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mirror arm behaviour for unimplemented syscalls: Below 2048 return
-ENOSYS, above 2048 raise SIGILL.
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
[will: Tweak die string to identify as compat syscall]
Signed-off-by: Will Deacon <will.deacon@arm.com>
We don't currently limit guest accesses to the LOR registers, which we
neither virtualize nor context-switch. As such, guests are provided with
unusable information/controls, and are not isolated from each other (or
the host).
To prevent these issues, we can trap register accesses and present the
illusion LORegions are unssupported by the CPU. To do this, we mask
ID_AA64MMFR1.LO, and set HCR_EL2.TLOR to trap accesses to the following
registers:
* LORC_EL1
* LOREA_EL1
* LORID_EL1
* LORN_EL1
* LORSA_EL1
... when trapped, we inject an UNDEFINED exception to EL1, simulating
their non-existence.
As noted in D7.2.67, when no LORegions are implemented, LoadLOAcquire
and StoreLORelease must behave as LoadAcquire and StoreRelease
respectively. We can ensure this by clearing LORC_EL1.EN when a CPU's
EL2 is first initialized, as the host kernel will not modify this.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Pull cleanup patchlet from Thomas Gleixner:
"A single commit removing a bunch of bogus double semicolons all over
the tree"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide/trivial: Remove ';;$' typo noise
do_task_stat() calls get_wchan(), which further does unwind_frame().
unwind_frame() restores frame->pc to original value in case function
graph tracer has modified a return address (LR) in a stack frame to hook
a function return. However, if function graph tracer has hit a filtered
function, then we can't unwind it as ftrace_push_return_trace() has
biased the index(frame->graph) with a 'huge negative'
offset(-FTRACE_NOTRACE_DEPTH).
Moreover, arm64 stack walker defines index(frame->graph) as unsigned
int, which can not compare a -ve number.
Similar problem we can have with calling of walk_stackframe() from
save_stack_trace_tsk() or dump_backtrace().
This patch fixes unwind_frame() to test the index for -ve value and
restore index accordingly before we can restore frame->pc.
Reproducer:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent
Above commands result in:
Unable to handle kernel paging request at virtual address ffff801bd3d1e000
pgd = ffff8003cbe97c00
[ffff801bd3d1e000] *pgd=0000000000000000, *pud=0000000000000000
Internal error: Oops: 96000006 [#1] SMP
[...]
CPU: 5 PID: 11696 Comm: ps Not tainted 4.11.0+ #33
[...]
task: ffff8003c21ba000 task.stack: ffff8003cc6c0000
PC is at unwind_frame+0x12c/0x180
LR is at get_wchan+0xd4/0x134
pc : [<ffff00000808892c>] lr : [<ffff0000080860b8>] pstate: 60000145
sp : ffff8003cc6c3ab0
x29: ffff8003cc6c3ab0 x28: 0000000000000001
x27: 0000000000000026 x26: 0000000000000026
x25: 00000000000012d8 x24: 0000000000000000
x23: ffff8003c1c04000 x22: ffff000008c83000
x21: ffff8003c1c00000 x20: 000000000000000f
x19: ffff8003c1bc0000 x18: 0000fffffc593690
x17: 0000000000000000 x16: 0000000000000001
x15: 0000b855670e2b60 x14: 0003e97f22cf1d0f
x13: 0000000000000001 x12: 0000000000000000
x11: 00000000e8f4883e x10: 0000000154f47ec8
x9 : 0000000070f367c0 x8 : 0000000000000000
x7 : 00008003f7290000 x6 : 0000000000000018
x5 : 0000000000000000 x4 : ffff8003c1c03cb0
x3 : ffff8003c1c03ca0 x2 : 00000017ffe80000
x1 : ffff8003cc6c3af8 x0 : ffff8003d3e9e000
Process ps (pid: 11696, stack limit = 0xffff8003cc6c0000)
Stack: (0xffff8003cc6c3ab0 to 0xffff8003cc6c4000)
[...]
[<ffff00000808892c>] unwind_frame+0x12c/0x180
[<ffff000008305008>] do_task_stat+0x864/0x870
[<ffff000008305c44>] proc_tgid_stat+0x3c/0x48
[<ffff0000082fde0c>] proc_single_show+0x5c/0xb8
[<ffff0000082b27e0>] seq_read+0x160/0x414
[<ffff000008289e6c>] __vfs_read+0x58/0x164
[<ffff00000828b164>] vfs_read+0x88/0x144
[<ffff00000828c2e8>] SyS_read+0x60/0xc0
[<ffff0000080834a0>] __sys_trace_return+0x0/0x4
Fixes: 20380bb390 (arm64: ftrace: fix a stack tracer's output under function graph tracer)
Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
[catalin.marinas@arm.com: replace WARN_ON with WARN_ON_ONCE]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
On lkml suggestions were made to split up such trivial typo fixes into per subsystem
patches:
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
struct efi_uga_draw_protocol *uga = NULL, *first_uga;
efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
unsigned long nr_ugas;
- u32 *handles = (u32 *)uga_handle;;
+ u32 *handles = (u32 *)uga_handle;
efi_status_t status = EFI_INVALID_PARAMETER;
int i;
This patch is the result of the following script:
$ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
... followed by manual review to make sure it's all good.
Splitting this up is just crazy talk, let's get over with this and just do it.
Reported-by: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ID_AA64DFR0_EL1.PMUVer field doesn't follow the usual ID registers
scheme. While value 0xf indicates a non-architected PMU is implemented,
values 0x1 to 0xe indicate an increasingly featureful architected PMU,
as if the field were unsigned.
For more details, see ARM DDI 0487C.a, D10.1.4, "Alternative ID scheme
used for the Performance Monitors Extension version".
Currently, we treat the field as signed, and erroneously bail out for
values 0x8 to 0xe. Let's correct that.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
__show_regs pretty prints PC and LR by attempting to map them to kernel
function names to improve the utility of crash reports. Unfortunately,
this mapping is applied even when the pt_regs corresponds to user mode,
resulting in a KASLR oracle.
Avoid this issue by only looking up the function symbols when the register
state indicates that we're actually running at EL1.
Cc: <stable@vger.kernel.org>
Reported-by: NCSC Security <security@ncsc.gov.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Stop printing a (ratelimited) kernel message for each instance of an
unimplemented syscall being called. Userland making an unimplemented
syscall is not necessarily misbehaviour and to be expected with a
current userland running on an older kernel. Also, the current message
looks scary to users but does not actually indicate a real problem nor
help them narrow down the cause. Just rely on sys_ni_syscall() to return
-ENOSYS.
Cc: <stable@vger.kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Our field definitions for CTR_EL0 suffer from a number of problems:
- The IDC and DIC fields are missing, which causes us to enable CTR
trapping on CPUs with either of these returning non-zero values.
- The ERG is FTR_LOWER_SAFE, whereas it should be treated like CWG as
FTR_HIGHER_SAFE so that applications can use it to avoid false sharing.
- [nit] A RES1 field is described as "RAO"
This patch updates the CTR_EL0 field definitions to fix these issues.
Cc: <stable@vger.kernel.org>
Cc: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In converting __range_ok() into a static inline, I inadvertently made
it more type-safe, but without considering the ordering of the relevant
conversions. This leads to quite a lot of Sparse noise about the fact
that we use __chk_user_ptr() after addr has already been converted from
a user pointer to an unsigned long.
Rather than just adding another cast for the sake of shutting Sparse up,
it seems reasonable to rework the types to make logical sense (although
the resulting codegen for __range_ok() remains identical). The only
callers this affects directly are our compat traps where the inferred
"user-pointer-ness" of a register value now warrants explicit casting.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In many cases, page tables can be accessed concurrently by either another
CPU (due to things like fast gup) or by the hardware page table walker
itself, which may set access/dirty bits. In such cases, it is important
to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
entries cannot be torn, merged or subject to apparent loss of coherence
due to compiler transformations.
Whilst there are some scenarios where this cannot happen (e.g. pinned
kernel mappings for the linear region), the overhead of using READ_ONCE
/WRITE_ONCE everywhere is minimal and makes the code an awful lot easier
to reason about. This patch consistently uses these macros in the arch
code, as well as explicitly namespacing pointers to page table entries
from the entries themselves by using adopting a 'p' suffix for the former
(as is sometimes used elsewhere in the kernel source).
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
References to CPU part number MIDR_QCOM_FALKOR were dropped from the
mailing list patch due to mainline/arm64 branch dependency. So this
patch adds the missing part number.
Fixes: ec82b567a7 ("arm64: Implement branch predictor hardening for Falkor")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Update the ACPICA kernel code to upstream revision 20180105 including:
* Assorted fixes (Jung-uk Kim).
* Support for X32 ABI compilation (Anuj Mittal).
* Update of ACPICA copyrights to 2018 (Bob Moore).
- Prepare for future modifications to avoid executing the _STA control
method too early (Hans de Goede).
- Make the processor performance control library code ignore _PPC
notifications if they cannot be handled and fix up the C1 idle
state definition when it is used as a fallback state (Chen Yu,
Yazen Ghannam).
- Make it possible to use the SPCR table on x86 and to replace the
original IORT table with a new one from initrd (Prarit Bhargava,
Shunyong Yang).
- Add battery-related quirks for Asus UX360UA and UX410UAK and add
quirks for table parsing on Dell XPS 9570 and Precision M5530
(Kai Heng Feng).
- Address static checker warnings in the CPPC code (Gustavo Silva).
- Avoid printing a raw pointer to the kernel log in the smart
battery driver (Greg Kroah-Hartman).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJafGvJAAoJEILEb/54YlRxiusQAKUa+OM/oxTJkOEfGGRM8NlS
Hq/PaL/TnAj3nCoZN9fM38mI4gkxqu3eVMv6kfiqRe8VYmUX9r9tRbQ9kxvEYa7n
s6Dl+wdC9UND20QJkYVzPlaXbPuZyLFHt4Fkb1hp+HAGgNNYqc4e0lJvI82F2pdo
im1UFI84jg9UQV4WpUJL6ny2c/RMNtpUV5fOKFD8lkvBvVe7mtZTZ+1nZDeqXGkV
jzdrVTHLUEDhjS1o0TBmEsJGNeGOqnK/f+m8Rq4397guPAQQq18MYNC68SzhuGjP
iqhvIvI9sF197i66l/qgsubBifOV4At8Wb0LA5cU8CQLLpEW8GDktz/kucVHyzJ4
cVKuPXptBwwtPbNFHWO8reTUFMAnP7IpjtC31ntr6xWRQCiXv0/i2hRRN54g9T7e
FAOBmmys5DKFOq50OB5WdD3/Qz5OUuVgdbrSxNFARIZpQFtUn7Np2/nmNpPgrrcl
77hO8dpeXUTVvM4HpRQN1+r0KOTLfTAvWV7LYLAjCF9ivc0Vop/tYZQ2VEMSUEFD
SGKC30mGC4pphAjxcSYV282JR7Jx7arQ71ZA5uYTRRuxnEQd/2MC71fNjrFmCgUW
1Pumw0Pw6eZRjj1FZ/pj0X5lm7AlZj0dVzsJFgNb0FcJW0nOhN3czQrA4igoSVng
B2sRv9U8YDnDtzHyTPrY
=rVdp
-----END PGP SIGNATURE-----
Merge tag 'acpi-part2-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These are mostly fixes and cleanups, a few new quirks, a couple of
updates related to the handling of ACPI tables and ACPICA copyrights
refreshment.
Specifics:
- Update the ACPICA kernel code to upstream revision 20180105
including:
* Assorted fixes (Jung-uk Kim)
* Support for X32 ABI compilation (Anuj Mittal)
* Update of ACPICA copyrights to 2018 (Bob Moore)
- Prepare for future modifications to avoid executing the _STA
control method too early (Hans de Goede)
- Make the processor performance control library code ignore _PPC
notifications if they cannot be handled and fix up the C1 idle
state definition when it is used as a fallback state (Chen Yu,
Yazen Ghannam)
- Make it possible to use the SPCR table on x86 and to replace the
original IORT table with a new one from initrd (Prarit Bhargava,
Shunyong Yang)
- Add battery-related quirks for Asus UX360UA and UX410UAK and add
quirks for table parsing on Dell XPS 9570 and Precision M5530 (Kai
Heng Feng)
- Address static checker warnings in the CPPC code (Gustavo Silva)
- Avoid printing a raw pointer to the kernel log in the smart battery
driver (Greg Kroah-Hartman)"
* tag 'acpi-part2-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: sbshc: remove raw pointer from printk() message
ACPI: SPCR: Make SPCR available to x86
ACPI / CPPC: Use 64-bit arithmetic instead of 32-bit
ACPI / tables: Add IORT to injectable table list
ACPI / bus: Parse tables as term_list for Dell XPS 9570 and Precision M5530
ACPICA: Update version to 20180105
ACPICA: All acpica: Update copyrights to 2018
ACPI / processor: Set default C1 idle state description
ACPI / battery: Add quirk for Asus UX360UA and UX410UAK
ACPI: processor_perflib: Do not send _PPC change notification if not ready
ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs
ACPI / bus: Do not call _STA on battery devices with unmet dependencies
PCI: acpiphp_ibm: prepare for acpi_get_object_info() no longer returning status
ACPI: export acpi_bus_get_status_handle()
ACPICA: Add a missing pair of parentheses
ACPICA: Prefer ACPI_TO_POINTER() over ACPI_ADD_PTR()
ACPICA: Avoid NULL pointer arithmetic
ACPICA: Linux: add support for X32 ABI compilation
ACPI / video: Use true for boolean value
Spectre v1 mitigation:
- back-end version of array_index_mask_nospec()
- masking of the syscall number to restrict speculation through the
syscall table
- masking of __user pointers prior to deference in uaccess routines
Spectre v2 mitigation update:
- using the new firmware SMC calling convention specification update
- removing the current PSCI GET_VERSION firmware call mitigation as
vendors are deploying new SMCCC-capable firmware
- additional branch predictor hardening for synchronous exceptions and
interrupts while in user mode
Meltdown v3 mitigation update for Cavium Thunder X: unaffected but
hardware erratum gets in the way. The kernel now starts with the page
tables mapped as global and switches to non-global if kpti needs to be
enabled.
Other:
- Theoretical trylock bug fixed
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlp8lqcACgkQa9axLQDI
XvH2lxAAnsYqthpGQ11MtDJB+/UiBAFkg9QWPDkwrBDvNhgpll+J0VQuCN1QJ2GX
qQ8rkv8uV+y4Fqr8hORGJy5At+0aI63ZCJ72RGkZTzJAtbFbFGIDHP7RhAEIGJBS
Lk9kDZ7k39wLEx30UXIFYTTVzyHar397TdI7vkTcngiTzZ8MdFATfN/hiKO906q3
14pYnU9Um4aHUdcJ+FocL3dxvdgniuuMBWoNiYXyOCZXjmbQOnDNU2UrICroV8lS
mB+IHNEhX1Gl35QzNBtC0ET+aySfHBMJmM5oln+uVUljIGx6En1WLj6mrHYcx8U2
rIBm5qO/X/4iuzYPGkxwQtpjq3wPYxsSUnMdKJrsUZqAfy2QeIhFx6XUtJsZPB2J
/lgls5xSXMOS7oiOQtmVjcDLBURDmYXGwljXR4n4jLm4CT1V9qSLcKHu1gdFU9Mq
VuMUdPOnQub1vqKndi154IoYDTo21jAib2ktbcxpJfSJnDYoit4Gtnv7eWY+M3Pd
Toaxi8htM2HSRwbvslHYGW8ZcVpI79Jit+ti7CsFg7m9Lvgs0zxcnNui4uPYDymT
jh2JYxuirIJbX9aGGhnmkNhq9REaeZJg9LA2JM8S77FCHN3bnlSdaG6wy899J6EI
lK4anCuPQKKKhUia/dc1MeKwrmmC18EfPyGUkOzywg/jGwGCmZM=
=Y0TT
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"As I mentioned in the last pull request, there's a second batch of
security updates for arm64 with mitigations for Spectre/v1 and an
improved one for Spectre/v2 (via a newly defined firmware interface
API).
Spectre v1 mitigation:
- back-end version of array_index_mask_nospec()
- masking of the syscall number to restrict speculation through the
syscall table
- masking of __user pointers prior to deference in uaccess routines
Spectre v2 mitigation update:
- using the new firmware SMC calling convention specification update
- removing the current PSCI GET_VERSION firmware call mitigation as
vendors are deploying new SMCCC-capable firmware
- additional branch predictor hardening for synchronous exceptions
and interrupts while in user mode
Meltdown v3 mitigation update:
- Cavium Thunder X is unaffected but a hardware erratum gets in the
way. The kernel now starts with the page tables mapped as global
and switches to non-global if kpti needs to be enabled.
Other:
- Theoretical trylock bug fixed"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits)
arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
arm/arm64: smccc: Make function identifiers an unsigned quantity
firmware/psci: Expose SMCCC version through psci_ops
firmware/psci: Expose PSCI conduit
arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling
arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
arm/arm64: KVM: Turn kvm_psci_version into a static inline
arm/arm64: KVM: Advertise SMCCC v1.1
arm/arm64: KVM: Implement PSCI 1.0 support
arm/arm64: KVM: Add smccc accessors to PSCI code
arm/arm64: KVM: Add PSCI_VERSION helper
arm/arm64: KVM: Consolidate the PSCI include files
arm64: KVM: Increment PC after handling an SMC trap
arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
arm64: entry: Apply BP hardening for suspicious interrupts from EL0
arm64: entry: Apply BP hardening for high-priority synchronous exceptions
arm64: futex: Mask __user pointers prior to dereference
...
SPCR is currently only enabled or ARM64 and x86 can use SPCR to setup
an early console.
General fixes include updating Documentation & Kconfig (for x86),
updating comments, and changing parse_spcr() to acpi_parse_spcr(),
and earlycon_init_is_deferred to earlycon_acpi_spcr_enable to be
more descriptive.
On x86, many systems have a valid SPCR table but the table version is
not 2 so the table version check must be a warning.
On ARM64 when the kernel parameter earlycon is used both the early console
and console are enabled. On x86, only the earlycon should be enabled by
by default. Modify acpi_parse_spcr() to allow options for initializing
the early console and console separately.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
with bitmap_{from,to}_arr32 over the kernel. Additionally to it:
* __check_eq_bitmap() now takes single nbits argument.
* __check_eq_u32_array is not used in new test but may be used in
future. So I don't remove it here, but annotate as __used.
Tested on arm64 and 32-bit BE mips.
[arnd@arndb.de: perf: arm_dsu_pmu: convert to bitmap_from_arr32]
Link: http://lkml.kernel.org/r/20180201172508.5739-2-ynorov@caviumnetworks.com
[ynorov@caviumnetworks.com: fix net/core/ethtool.c]
Link: http://lkml.kernel.org/r/20180205071747.4ekxtsbgxkj5b2fz@yury-thinkpad
Link: http://lkml.kernel.org/r/20171228150019.27953-2-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: David Decotigny <decot@googlers.com>,
Cc: David S. Miller <davem@davemloft.net>,
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we've standardised on SMCCC v1.1 to perform the branch
prediction invalidation, let's drop the previous band-aid.
If vendors haven't updated their firmware to do SMCCC 1.1, they
haven't updated PSCI either, so we don't loose anything.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add the detection and runtime code for ARM_SMCCC_ARCH_WORKAROUND_1.
It is lovely. Really.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It is possible to take an IRQ from EL0 following a branch to a kernel
address in such a way that the IRQ is prioritised over the instruction
abort. Whilst an attacker would need to get the stars to align here,
it might be sufficient with enough calibration so perform BP hardening
in the rare case that we see a kernel address in the ELR when handling
an IRQ from EL0.
Reported-by: Dan Hettena <dhettena@nvidia.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Software-step and PC alignment fault exceptions have higher priority than
instruction abort exceptions, so apply the BP hardening hooks there too
if the user PC appears to reside in kernel space.
Reported-by: Dan Hettena <dhettena@nvidia.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Like we've done for get_user and put_user, ensure that user pointers
are masked before invoking the underlying __arch_{clear,copy_*}_user
operations.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In a similar manner to array_index_mask_nospec, this patch introduces an
assembly macro (mask_nospec64) which can be used to bound a value under
speculation. This macro is then used to ensure that the indirect branch
through the syscall table is bounded under speculation, with out-of-range
addresses speculating as calls to sys_io_setup (0).
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, USER_DS represents an exclusive limit while KERNEL_DS is
inclusive. In order to do some clever trickery for speculation-safe
masking, we need them both to behave equivalently - there aren't enough
bits to make KERNEL_DS exclusive, so we have precisely one option. This
also happens to correct a longstanding false negative for a range
ending on the very top byte of kernel memory.
Mark Rutland points out that we've actually got the semantics of
addresses vs. segments muddled up in most of the places we need to
amend, so shuffle the {USER,KERNEL}_DS definitions around such that we
can correct those properly instead of just pasting "-1"s everywhere.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The identity map is mapped as both writeable and executable by the
SWAPPER_MM_MMUFLAGS and this is relied upon by the kpti code to manage
a synchronisation flag. Update the .pushsection flags to reflect the
actual mapping attributes.
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
pte_to_phys lives in assembler.h and takes its destination register as
the first argument. Move phys_to_pte out of head.S to sit with its
counterpart and rejig it to follow the same calling convention.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We don't fully understand the Cavium ThunderX erratum, but it appears
that mapping the kernel as nG can lead to horrible consequences such as
attempting to execute userspace from kernel context. Since kpti isn't
enabled for these CPUs anyway, simplify the comment justifying the lack
of post_ttbr_update_workaround in the exception trampoline.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since AArch64 assembly instructions take the destination register as
their first operand, do the same thing for the phys_to_ttbr macro.
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cavium ThunderX's erratum 27456 results in a corruption of icache
entries that are loaded from memory that is mapped as non-global
(i.e. ASID-tagged).
As KPTI is based on memory being mapped non-global, let's prevent
it from kicking in if this erratum is detected.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[will: Update comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Defaulting to global mappings for kernel space is generally good for
performance and appears to be necessary for Cavium ThunderX. If we
subsequently decide that we need to enable kpti, then we need to rewrite
our existing page table entries to be non-global. This is fiddly, and
made worse by the possible use of contiguous mappings, which require
a strict break-before-make sequence.
Since the enable callback runs on each online CPU from stop_machine
context, we can have all CPUs enter the idmap, where secondaries can
wait for the primary CPU to rewrite swapper with its MMU off. It's all
fairly horrible, but at least it only runs once.
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter 4K and next 4K.
When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.
1) A System Error Interrupt (SEI) being raised by the Falkor core due
to the errant memory access attempting to access a region of memory
that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
memory. This behavior may only occur if the instruction cache is
disabled prior to or coincident with translation being changed from
enabled to disabled.
The conditions leading to this erratum will not occur when either of the
following occur:
1) A higher exception level disables translation of a lower exception level
(e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).
To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-11-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull printk updates from Petr Mladek:
- Add a console_msg_format command line option:
The value "default" keeps the old "[time stamp] text\n" format. The
value "syslog" allows to see the syslog-like "<log
level>[timestamp] text" format.
This feature was requested by people doing regression tests, for
example, 0day robot. They want to have both filtered and full logs
at hands.
- Reduce the risk of softlockup:
Pass the console owner in a busy loop.
This is a new approach to the old problem. It was first proposed by
Steven Rostedt on Kernel Summit 2017. It marks a context in which
the console_lock owner calls console drivers and could not sleep.
On the other side, printk() callers could detect this state and use
a busy wait instead of a simple console_trylock(). Finally, the
console_lock owner checks if there is a busy waiter at the end of
the special context and eventually passes the console_lock to the
waiter.
The hand-off works surprisingly well and helps in many situations.
Well, there is still a possibility of the softlockup, for example,
when the flood of messages stops and the last owner still has too
much to flush.
There is increasing number of people having problems with
printk-related softlockups. We might eventually need to get better
solution. Anyway, this looks like a good start and promising
direction.
- Do not allow to schedule in console_unlock() called from printk():
This reverts an older controversial commit. The reschedule helped
to avoid softlockups. But it also slowed down the console output.
This patch is obsoleted by the new console waiter logic described
above. In fact, the reschedule made the hand-off less effective.
- Deprecate "%pf" and "%pF" format specifier:
It was needed on ia64, ppc64 and parisc64 to dereference function
descriptors and show the real function address. It is done
transparently by "%ps" and "pS" format specifier now.
Sergey Senozhatsky found that all the function descriptors were in
a special elf section and could be easily detected.
- Remove printk_symbol() API:
It has been obsoleted by "%pS" format specifier, and this change
helped to remove few continuous lines and a less intuitive old API.
- Remove redundant memsets:
Sergey removed unnecessary memset when processing printk.devkmsg
command line option.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
printk: drop redundant devkmsg_log_str memsets
printk: Never set console_may_schedule in console_trylock()
printk: Hide console waiter logic into helpers
printk: Add console owner and waiter logic to load balance console writes
kallsyms: remove print_symbol() function
checkpatch: add pF/pf deprecation warning
symbol lookup: introduce dereference_symbol_descriptor()
parisc64: Add .opd based function descriptor dereference
powerpc64: Add .opd based function descriptor dereference
ia64: Add .opd based function descriptor dereference
sections: split dereference_function_descriptor()
openrisc: Fix conflicting types for _exext and _stext
lib: do not use print_symbol()
irq debug: do not use print_symbol()
sysfs: do not use print_symbol()
drivers: do not use print_symbol()
x86: do not use print_symbol()
unicore32: do not use print_symbol()
sh: do not use print_symbol()
mn10300: do not use print_symbol()
...
Pull siginfo cleanups from Eric Biederman:
"Long ago when 2.4 was just a testing release copy_siginfo_to_user was
made to copy individual fields to userspace, possibly for efficiency
and to ensure initialized values were not copied to userspace.
Unfortunately the design was complex, it's assumptions unstated, and
humans are fallible and so while it worked much of the time that
design failed to ensure unitialized memory is not copied to userspace.
This set of changes is part of a new design to clean up siginfo and
simplify things, and hopefully make the siginfo handling robust enough
that a simple inspection of the code can be made to ensure we don't
copy any unitializied fields to userspace.
The design is to unify struct siginfo and struct compat_siginfo into a
single definition that is shared between all architectures so that
anyone adding to the set of information shared with struct siginfo can
see the whole picture. Hopefully ensuring all future si_code
assignments are arch independent.
The design is to unify copy_siginfo_to_user32 and
copy_siginfo_from_user32 so that those function are complete and cope
with all of the different cases documented in signinfo_layout. I don't
think there was a single implementation of either of those functions
that was complete and correct before my changes unified them.
The design is to introduce a series of helpers including
force_siginfo_fault that take the values that are needed in struct
siginfo and build the siginfo structure for their callers. Ensuring
struct siginfo is built correctly.
The remaining work for 4.17 (unless someone thinks it is post -rc1
material) is to push usage of those helpers down into the
architectures so that architecture specific code will not need to deal
with the fiddly work of intializing struct siginfo, and then when
struct siginfo is guaranteed to be fully initialized change copy
siginfo_to_user into a simple wrapper around copy_to_user.
Further there is work in progress on the issues that have been
documented requires arch specific knowledge to sort out.
The changes below fix or at least document all of the issues that have
been found with siginfo generation. Then proceed to unify struct
siginfo the 32 bit helpers that copy siginfo to and from userspace,
and generally clean up anything that is not arch specific with regards
to siginfo generation.
It is a lot but with the unification you can of siginfo you can
already see the code reduction in the kernel"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits)
signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr
mm/memory_failure: Remove unused trapno from memory_failure
signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed
signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap
signal: Helpers for faults with specialized siginfo layouts
signal: Add send_sig_fault and force_sig_fault
signal: Replace memset(info,...) with clear_siginfo for clarity
signal: Don't use structure initializers for struct siginfo
signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered
ptrace: Use copy_siginfo in setsiginfo and getsiginfo
signal: Unify and correct copy_siginfo_to_user32
signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32
signal: Unify and correct copy_siginfo_from_user32
signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED
signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h
signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h
signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h
signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
signal/powerpc: Remove redefinition of NSIGTRAP on powerpc
signal: Move addr_lsb into the _sigfault union for clarity
...
- Security mitigations:
- variant 2: invalidating the branch predictor with a call to secure firmware
- variant 3: implementing KPTI for arm64
- 52-bit physical address support for arm64 (ARMv8.2)
- arm64 support for RAS (firmware first only) and SDEI (software
delegated exception interface; allows firmware to inject a RAS error
into the OS)
- Perf support for the ARM DynamIQ Shared Unit PMU
- CPUID and HWCAP bits updated for new floating point multiplication
instructions in ARMv8.4
- Removing some virtual memory layout printks during boot
- Fix initial page table creation to cope with larger than 32M kernel
images when 16K pages are enabled
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlpwxDMACgkQa9axLQDI
XvF55BAAniMpxPXnYNfv6l7/4O8eKo1lJIaG1wbej4JRZ/rT3K4Z3OBXW1dKHO8d
/PTbVmZ90IqIGROkoDrE+6xyjjn9yK3uuW4ytN2zQkBa8VFaHAnHlX+zKQcuwy9f
yxwiHk+C7vK5JR7mpXTazjRknsUv1MPtlTt7DQrSdq0KRDJVDNFC+grmbew2rz0X
cjQDqZqgzuFyrKxdiQVjDmc3zH9NsNBhDo0hlGHf2jK6bGJsAPtI8M2JcLrK8ITG
Ye/dD7BJp1mWD8ff0BPaMxu24qfAMNLH8f2dpTa986/H78irVz7i/t5HG0/1+5Jh
EE4OFRTKZ59Qgyo1zWcaJvdp8YjiaX/L4PWJg8CxM5OhP9dIac9ydcFQfWzpKpUs
xyZfmK6XliGFReAkVOOf5tEqFUDhMtsqhzPYmbmU1lp61wmSYIZ8CTenpWWCJSRO
NOGyG1X2uFBvP69+iPNlfTGz1r7tg1URY5iO8fUEIhY8LrgyORkiqw4OvPEgnMXP
Ngy+dXhyvnps2AAWbSX0O4puRlTgEYLT5KaMLzH/+gWsXATT0rzUCD/aOwUQq/Y7
SWXZHkb3jpmOZZnzZsLL2MNzEIPCFBwSUE9fSv4dA9d/N6tUmlmZALJjHkfzCDpj
+mPsSmAMTj72kUYzm0b5GCtOu/iQ2kDWOZjOM1m4+v/B+f7JoEE=
=iEjP
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"The main theme of this pull request is security covering variants 2
and 3 for arm64. I expect to send additional patches next week
covering an improved firmware interface (requires firmware changes)
for variant 2 and way for KPTI to be disabled on unaffected CPUs
(Cavium's ThunderX doesn't work properly with KPTI enabled because of
a hardware erratum).
Summary:
- Security mitigations:
- variant 2: invalidate the branch predictor with a call to
secure firmware
- variant 3: implement KPTI for arm64
- 52-bit physical address support for arm64 (ARMv8.2)
- arm64 support for RAS (firmware first only) and SDEI (software
delegated exception interface; allows firmware to inject a RAS
error into the OS)
- perf support for the ARM DynamIQ Shared Unit PMU
- CPUID and HWCAP bits updated for new floating point multiplication
instructions in ARMv8.4
- remove some virtual memory layout printks during boot
- fix initial page table creation to cope with larger than 32M kernel
images when 16K pages are enabled"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (104 commits)
arm64: Fix TTBR + PAN + 52-bit PA logic in cpu_do_switch_mm
arm64: Turn on KPTI only on CPUs that need it
arm64: Branch predictor hardening for Cavium ThunderX2
arm64: Run enable method for errata work arounds on late CPUs
arm64: Move BP hardening to check_and_switch_context
arm64: mm: ignore memory above supported physical address size
arm64: kpti: Fix the interaction between ASID switching and software PAN
KVM: arm64: Emulate RAS error registers and set HCR_EL2's TERR & TEA
KVM: arm64: Handle RAS SErrors from EL2 on guest exit
KVM: arm64: Handle RAS SErrors from EL1 on guest exit
KVM: arm64: Save ESR_EL2 on guest SError
KVM: arm64: Save/Restore guest DISR_EL1
KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.
KVM: arm/arm64: mask/unmask daif around VHE guests
arm64: kernel: Prepare for a DISR user
arm64: Unconditionally enable IESB on exception entry/return for firmware-first
arm64: kernel: Survive corrected RAS errors notified by SError
arm64: cpufeature: Detect CPU RAS Extentions
arm64: sysreg: Move to use definitions for all the SCTLR bits
arm64: cpufeature: __this_cpu_has_cap() shouldn't stop early
...
Whitelist Broadcom Vulcan/Cavium ThunderX2 processors in
unmap_kernel_at_el0(). These CPUs are not vulnerable to
CVE-2017-5754 and do not need KPTI when KASLR is off.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Use PSCI based mitigation for speculative execution attacks targeting
the branch predictor. We use the same mechanism as the one used for
Cortex-A CPUs, we expect the PSCI version call to have a side effect
of clearing the BTBs.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When a CPU is brought up after we have finalised the system
wide capabilities (i.e, features and errata), we make sure the
new CPU doesn't need a new errata work around which has not been
detected already. However we don't run enable() method on the new
CPU for the errata work arounds already detected. This could
cause the new CPU running without potential work arounds.
It is upto the "enable()" method to decide if this CPU should
do something about the errata.
Fixes: commit 6a6efbb45b ("arm64: Verify CPU errata work arounds on hotplugged CPU")
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There are so many places that build struct siginfo by hand that at
least one of them is bound to get it wrong. A handful of cases in the
kernel arguably did just that when using the errno field of siginfo to
pass no errno values to userspace. The usage is limited to a single
si_code so at least does not mess up anything else.
Encapsulate this questionable pattern in a helper function so
that the userspace ABI is preserved.
Update all of the places that use this pattern to use the new helper
function.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The siginfo structure has all manners of holes with the result that a
structure initializer is not guaranteed to initialize all of the bits.
As we have to copy the structure to userspace don't even try to use
a structure initializer. Instead use clear_siginfo followed by initializing
selected fields. This gives a guarantee that uninitialized kernel memory
is not copied to userspace.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Instead of jumpping while !is_compat_task placee all of the code
inside of an if (is_compat_task) block. This allows the int i
variable to be properly limited to the compat block no matter how the
rest of ptrace_hbptriggered changes.
In a following change a non-variable declaration will preceed
was made independent to ensure the code is easy to review.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
With ARM64_SW_TTBR0_PAN enabled, the exception entry code checks the
active ASID to decide whether user access was enabled (non-zero ASID)
when the exception was taken. On return from exception, if user access
was previously disabled, it re-instates TTBR0_EL1 from the per-thread
saved value (updated in switch_mm() or efi_set_pgd()).
Commit 7655abb953 ("arm64: mm: Move ASID from TTBR0 to TTBR1") makes a
TTBR0_EL1 + ASID switching non-atomic. Subsequently, commit 27a921e757
("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN") changes the
__uaccess_ttbr0_disable() function and asm macro to first write the
reserved TTBR0_EL1 followed by the ASID=0 update in TTBR1_EL1. If an
exception occurs between these two, the exception return code will
re-instate a valid TTBR0_EL1. Similar scenario can happen in
cpu_switch_mm() between setting the reserved TTBR0_EL1 and the ASID
update in cpu_do_switch_mm().
This patch reverts the entry.S check for ASID == 0 to TTBR0_EL1 and
disables the interrupts around the TTBR0_EL1 and ASID switching code in
__uaccess_ttbr0_disable(). It also ensures that, when returning from the
EFI runtime services, efi_set_pgd() doesn't leave a non-zero ASID in
TTBR1_EL1 by using uaccess_ttbr0_{enable,disable}.
The accesses to current_thread_info()->ttbr0 are updated to use
READ_ONCE/WRITE_ONCE.
As a safety measure, __uaccess_ttbr0_enable() always masks out any
existing non-zero ASID TTBR1_EL1 before writing in the new ASID.
Fixes: 27a921e757 ("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN")
Acked-by: Will Deacon <will.deacon@arm.com>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.
There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
The current SError from EL2 code unmasks SError and tries to fence any
pending SError into a single instruction window. It then leaves SError
unmasked.
With the v8.2 RAS Extensions we may take an SError for a 'corrected'
error, but KVM is only able to handle SError from EL2 if they occur
during this single instruction window...
The RAS Extensions give us a new instruction to synchronise and
consume SErrors. The RAS Extensions document (ARM DDI0587),
'2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
SError interrupts generated by 'instructions, translation table walks,
hardware updates to the translation tables, and instruction fetches on
the same PE'. This makes ESB equivalent to KVMs existing
'dsb, mrs-daifclr, isb' sequence.
Use the alternatives to synchronise and consume any SError using ESB
instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
in the exit_code so that we can restart the vcpu if it turns out this
SError has no impact on the vcpu.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
KVM would like to consume any pending SError (or RAS error) after guest
exit. Today it has to unmask SError and use dsb+isb to synchronise the
CPU. With the RAS extensions we can use ESB to synchronise any pending
SError.
Add the necessary macros to allow DISR to be read and converted to an
ESR.
We clear the DISR register when we enable the RAS cpufeature, and the
kernel has not executed any ESB instructions. Any value we find in DISR
must have belonged to firmware. Executing an ESB instruction is the
only way to update DISR, so we can expect firmware to have handled
any deferred SError. By the same logic we clear DISR in the idle path.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Prior to v8.2, SError is an uncontainable fatal exception. The v8.2 RAS
extensions use SError to notify software about RAS errors, these can be
contained by the Error Syncronization Barrier.
An ACPI system with firmware-first may use SError as its 'SEI'
notification. Future patches may add code to 'claim' this SError as a
notification.
Other systems can distinguish these RAS errors from the SError ESR and
use the AET bits and additional data from RAS-Error registers to handle
the error. Future patches may add this kernel-first handling.
Without support for either of these we will panic(), even if we received
a corrected error. Add code to decode the severity of RAS errors. We can
safely ignore contained errors where the CPU can continue to make
progress. For all other errors we continue to panic().
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARM's v8.2 Extentions add support for Reliability, Availability and
Serviceability (RAS). On CPUs with these extensions system software
can use additional barriers to isolate errors and determine if faults
are pending. Add cpufeature detection.
Platform level RAS support may require additional firmware support.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[Rebased added config option, reworded commit message]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
__cpu_setup() configures SCTLR_EL1 using some hard coded hex masks,
and el2_setup() duplicates some this when setting RES1 bits.
Lets make this the same as KVM's hyp_init, which uses named bits.
First, we add definitions for all the SCTLR_EL{1,2} bits, the RES{1,0}
bits, and those we want to set or clear.
Add a build_bug checks to ensures all bits are either set or clear.
This means we don't need to preserve endian-ness configuration
generated elsewhere.
Finally, move the head.S and proc.S users of these hard-coded masks
over to the macro versions.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
this_cpu_has_cap() tests caps->desc not caps->matches, so it stops
walking the list when it finds a 'silent' feature, instead of
walking to the end of the list.
Prior to v4.6's 644c2ae198 ("arm64: cpufeature: Test 'matches' pointer
to find the end of the list") we always tested desc to find the end of
a capability list. This was changed for dubious things like PAN_NOT_UAO.
v4.7's e3661b128e ("arm64: Allow a capability to be checked on
single CPU") added this_cpu_has_cap() using the old desc style test.
CC: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When refactoring the sigreturn code to handle SVE, I changed the
sigreturn implementation to store the new FPSIMD state from the
user sigframe into task_struct before reloading the state into the
CPU regs. This makes it easier to convert the data for SVE when
needed.
However, it turns out that the fpsimd_state structure passed into
fpsimd_update_current_state is not fully initialised, so assigning
the structure as a whole corrupts current->thread.fpsimd_state.cpu
with uninitialised data.
This means that if the garbage data written to .cpu happens to be a
valid cpu number, and the task is subsequently migrated to the cpu
identified by the that number, and then tries to enter userspace,
the CPU FPSIMD regs will be assumed to be correct for the task and
not reloaded as they should be. This can result in returning to
userspace with the FPSIMD registers containing data that is stale or
that belongs to another task or to the kernel.
Knowingly handing around a kernel structure that is incompletely
initialised with user data is a potential source of mistakes,
especially across source file boundaries. To help avoid a repeat
of this issue, this patch adapts the relevant internal API to hand
around the user-accessible subset only: struct user_fpsimd_state.
To avoid future surprises, this patch also converts all uses of
struct fpsimd_state that really only access the user subset, to use
struct user_fpsimd_state. A few missing consts are added to
function prototypes for good measure.
Thanks to Will for spotting the cause of the bug here.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It isn't entirely obvious if we're using software PAN because we
don't say anything about it in the boot log. But if we're using
hardware PAN we'll print a nice CPU feature message indicating
it. Add a print for software PAN too so we know if it's being
used or not.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Among the existing architecture specific versions of
copy_siginfo_to_user32 there are several different implementation
problems. Some architectures fail to handle all of the cases in in
the siginfo union. Some architectures perform a blind copy of the
siginfo union when the si_code is negative. A blind copy suggests the
data is expected to be in 32bit siginfo format, which means that
receiving such a signal via signalfd won't work, or that the data is
in 64bit siginfo and the code is copying nonsense to userspace.
Create a single instance of copy_siginfo_to_user32 that all of the
architectures can share, and teach it to handle all of the cases in
the siginfo union correctly, with the assumption that siginfo is
stored internally to the kernel is 64bit siginfo format.
A special case is made for x86 x32 format. This is needed as presence
of both x32 and ia32 on x86_64 results in two different 32bit signal
formats. By allowing this small special case there winds up being
exactly one code base that needs to be maintained between all of the
architectures. Vastly increasing the testing base and the chances of
finding bugs.
As the x86 copy of copy_siginfo_to_user32 the call of the x86
signal_compat_build_tests were moved into sigaction_compat_abi, so
that they will keep running.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.
Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.
In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.
When copying the embedded sigval union copy the si_int member. That
ensures the 32bit values passes through the kernel unchanged.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Sometimes a single capability could be listed multiple times with
differing matches(), e.g, CPU errata for different MIDR versions.
This breaks verify_local_cpu_feature() and this_cpu_has_cap() as
we stop checking for a capability on a CPU with the first
entry in the given table, which is not sufficient. Make sure we
run the checks for all entries of the same capability. We do
this by fixing __this_cpu_has_cap() to run through all the
entries in the given table for a match and reuse it for
verify_local_cpu_feature().
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The Kryo CPUs are also affected by the Falkor 1003 errata, so
we need to do the same workaround on Kryo CPUs. The MIDR is
slightly more complicated here, where the PART number is not
always the same when looking at all the bits from 15 to 4. Drop
the lower 8 bits and just look at the top 4 to see if it's '2'
and then consider those as Kryo CPUs. This covers all the
combinations without having to list them all out.
Fixes: 38fd94b027 ("arm64: Work around Falkor erratum 1003")
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently the early assembler page table code assumes that precisely
1xpgd, 1xpud, 1xpmd are sufficient to represent the early kernel text
mappings.
Unfortunately this is rarely the case when running with a 16KB granule,
and we also run into limits with 4KB granule when building much larger
kernels.
This patch re-writes the early page table logic to compute indices of
mappings for each level of page table, and if multiple indices are
required, the next-level page table is scaled up accordingly.
Also the required size of the swapper_pg_dir is computed at link time
to cover the mapping [KIMAGE_ADDR + VOFFSET, _end]. When KASLR is
enabled, an extra page is set aside for each level that may require extra
entries at runtime.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The trampoline page tables are positioned after the early page tables in
the kernel linker script.
As we are about to change the early page table logic to resolve the
swapper size at link time as opposed to compile time, the
SWAPPER_DIR_SIZE variable (currently used to locate the trampline)
will be rendered unsuitable for low level assembler.
This patch solves this issue by moving the trampoline before the PAN
page tables. The offset to the trampoline from ttbr1 can then be
expressed by: PAGE_SIZE + RESERVED_TTBR0_SIZE, which is available to the
entry assembler.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently one resolves the location of the reserved_ttbr0 for PAN by
taking a positive offset from swapper_pg_dir. In a future patch we wish
to extend the swapper s.t. its size is determined at link time rather
than comile time, rendering SWAPPER_DIR_SIZE unsuitable for such a low
level calculation.
In this patch we re-arrange the order of the linker script s.t. instead
one computes reserved_ttbr0 by subtracting RESERVED_TTBR0_SIZE from
swapper_pg_dir.
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When CONFIG_UNMAP_KERNEL_AT_EL0 is set the SDEI entry point and the rest
of the kernel may be unmapped when we take an event. If this may be the
case, use an entry trampoline that can switch to the kernel page tables.
We can't use the provided PSTATE to determine whether to switch page
tables as we may have interrupted the kernel's entry trampoline, (or a
normal-priority event that interrupted the kernel's entry trampoline).
Instead test for a user ASID in ttbr1_el1.
Save a value in regs->addr_limit to indicate whether we need to restore
the original ASID when returning from this event. This value is only used
by do_page_fault(), which we don't call with the SDEI regs.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
SDEI inherits the 'use hvc' bit that is also used by PSCI. PSCI does all
its initialisation early, SDEI does its late.
Remove the __init annotation from acpi_psci_use_hvc().
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications.
Such notifications enter the kernel at the registered entry-point
with the register values of the interrupted CPU context. Because this
is not a CPU exception, it cannot reuse the existing entry code.
(crucially we don't implicitly know which exception level we interrupted),
Add the entry point to entry.S to set us up for calling into C code. If
the event interrupted code that had interrupts masked, we always return
to that location. Otherwise we pretend this was an IRQ, and use SDEI's
complete_and_resume call to return to vbar_el1 + offset.
This allows the kernel to deliver signals to user space processes. For
KVM this triggers the world switch, a quick spin round vcpu_run, then
back into the guest, unless there are pending signals.
Add sdei_mask_local_cpu() calls to the smp_send_stop() code, this covers
the panic() code-path, which doesn't invoke cpuhotplug notifiers.
Because we can interrupt entry-from/exit-to another EL, we can't trust the
value in sp_el0 or x29, even if we interrupted the kernel, in this case
the code in entry.S will save/restore sp_el0 and use the value in
__entry_task.
When we have VMAP stacks we can interrupt the stack-overflow test, which
stirs x0 into sp, meaning we have to have our own VMAP stacks. For now
these are allocated when we probe the interface. Future patches will add
refcounting hooks to allow the arch code to allocate them lazily.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Today the arm64 arch code allocates an extra IRQ stack per-cpu. If we
also have SDEI and VMAP stacks we need two extra per-cpu VMAP stacks.
Move the VMAP stack allocation out to a helper in a new header file.
This avoids missing THREADINFO_GFP, or getting the all-important alignment
wrong.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in
tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1
on VHE hosts, and allows future code to blindly access per-cpu variables
without triggering world-switch.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Setting si_code to 0 results in a userspace seeing an si_code of 0.
This is the same si_code as SI_USER. Posix and common sense requires
that SI_USER not be a signal specific si_code. As such this use of 0
for the si_code is a pretty horribly broken ABI.
Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
value of __SI_KILL and now sees a value of SIL_KILL with the result
that uid and pid fields are copied and which might copying the si_addr
field by accident but certainly not by design. Making this a very
flakey implementation.
Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return
SIL_FAULT and the appropriate fields will be reliably copied.
But folks this is a new and unique kind of bad. This is massively
untested code bad. This is inventing new and unique was to get
siginfo wrong bad. This is don't even think about Posix or what
siginfo means bad. This is lots of eyeballs all missing the fact
that the code does the wrong thing bad. This is getting stuck
and keep making the same mistake bad.
I really hope we can find a non userspace breaking fix for this on a
port as new as arm64.
Possible ABI fixes include:
- Send the signal without siginfo
- Don't generate a signal
- Possibly assign and use an appropriate si_code
- Don't handle cases which can't happen
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arm-kernel@lists.infradead.org
Ref: 53631b54c8 ("arm64: Floating point and SIMD")
Ref: 32015c2356 ("arm64: exception: handle Synchronous External Abort")
Ref: 1d18c47c73 ("arm64: MMU fault handling and page table management")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
* 'for-next/perf' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
perf: ARM DynamIQ Shared Unit PMU support
dt-bindings: Document devicetree binding for ARM DSU PMU
arm_pmu: Use of_cpu_node_to_id helper
arm64: Use of_cpu_node_to_id helper for CPU topology parsing
irqchip: gic-v3: Use of_cpu_node_to_id helper
coresight: of: Use of_cpu_node_to_id helper
of: Add helper for mapping device node to logical CPU number
perf: Export perf_event_update_userpage
Falkor is susceptible to branch predictor aliasing and can
theoretically be attacked by malicious code. This patch
implements a mitigation for these attacks, preventing any
malicious entries from affecting other victim contexts.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
[will: fix label name when !CONFIG_KVM and remove references to MIDR_FALKOR]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cortex-A57, A72, A73 and A75 are susceptible to branch predictor aliasing
and can theoretically be attacked by malicious code.
This patch implements a PSCI-based mitigation for these CPUs when available.
The call into firmware will invalidate the branch predictor state, preventing
any malicious entries from affecting other victim contexts.
Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Aliasing attacks against CPU branch predictors can allow an attacker to
redirect speculative control flow on some CPUs and potentially divulge
information from one context to another.
This patch adds initial skeleton code behind a new Kconfig option to
enable implementation-specific mitigations against these attacks for
CPUs that are affected.
Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We will soon need to invoke a CPU-specific function pointer after changing
page tables, so move post_ttbr_update_workaround out into C code to make
this possible.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to invoke the CPU capability ->matches callback from the ->enable
callback for applying local-CPU workarounds, we need a handle on the
capability structure.
This patch passes a pointer to the capability structure to the ->enable
callback.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For non-KASLR kernels where the KPTI behaviour has not been overridden
on the command line we can use ID_AA64PFR0_EL1.CSV3 to determine whether
or not we should unmap the kernel whilst running at EL0.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Speculation attacks against the entry trampoline can potentially resteer
the speculative instruction stream through the indirect branch and into
arbitrary gadgets within the kernel.
This patch defends against these attacks by forcing a misprediction
through the return stack: a dummy BL instruction loads an entry into
the stack, so that the predicted program flow of the subsequent RET
instruction is to a branch-to-self instruction which is finally resolved
as a branch to the kernel vectors with speculation suppressed.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
print_symbol() is a very old API that has been obsoleted by %pS format
specifier in a normal printk() call.
Replace print_symbol() with a direct printk("%pS") call.
Link: http://lkml.kernel.org/r/20171211125025.2270-3-sergey.senozhatsky@gmail.com
To: Andrew Morton <akpm@linux-foundation.org>
To: Russell King <linux@armlinux.org.uk>
To: Catalin Marinas <catalin.marinas@arm.com>
To: Mark Salter <msalter@redhat.com>
To: Tony Luck <tony.luck@intel.com>
To: David Howells <dhowells@redhat.com>
To: Yoshinori Sato <ysato@users.sourceforge.jp>
To: Guan Xuetao <gxt@mprc.pku.edu.cn>
To: Borislav Petkov <bp@alien8.de>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Thomas Gleixner <tglx@linutronix.de>
To: Peter Zijlstra <peterz@infradead.org>
To: Vineet Gupta <vgupta@synopsys.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-sh@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
[pmladek@suse.com: updated commit message]
Signed-off-by: Petr Mladek <pmladek@suse.com>
ARM v8.4 extensions add new neon instructions for performing a
multiplication of each FP16 element of one vector with the corresponding
FP16 element of a second vector, and to add or subtract this without an
intermediate rounding to the corresponding FP32 element in a third vector.
This patch detects this feature and let the userspace know about it via a
HWCAP bit and MRS emulation.
Cc: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The UEFI memory map is a bit vague about how to interpret the
EFI_MEMORY_XP attribute when it is combined with EFI_MEMORY_RP and/or
EFI_MEMORY_WP, which have retroactively been redefined as cacheability
attributes rather than permission attributes.
So let's ignore EFI_MEMORY_XP if _RP and/or _WP are also set. In this
case, it is likely that they are being used to describe the capability
of the region (i.e., whether it has the controls to reconfigure it as
non-executable) rather than the nature of the contents of the region
(i.e., whether it contains data that we will never attempt to execute)
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: Vasyl Gomonovych <gomonovych@gmail.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180102181042.19074-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make use of the new generic helper to convert an of_node of a CPU
to the logical CPU id in parsing the topology.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
CPU_PM_CPU_IDLE_ENTER_RETENTION skips calling cpu_pm_enter() and
cpu_pm_exit(). By not calling cpu_pm functions in idle entry/exit
paths we can reduce the latency involved in entering and exiting
the low power idle state.
On ARM64 based Qualcomm server platform we measured below overhead
for calling cpu_pm_enter and cpu_pm_exit for retention states.
workload: stress --hdd #CPUs --hdd-bytes 32M -t 30
Average overhead of cpu_pm_enter - 1.2us
Average overhead of cpu_pm_exit - 3.1us
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* for-next/52-bit-pa:
arm64: enable 52-bit physical address support
arm64: allow ID map to be extended to 52 bits
arm64: handle 52-bit physical addresses in page table entries
arm64: don't open code page table entry creation
arm64: head.S: handle 52-bit PAs in PTEs in early page table setup
arm64: handle 52-bit addresses in TTBR
arm64: limit PA size to supported range
arm64: add kconfig symbol to configure physical address size
Currently, when using VA_BITS < 48, if the ID map text happens to be
placed in physical memory above VA_BITS, we increase the VA size (up to
48) and create a new table level, in order to map in the ID map text.
This is okay because the system always supports 48 bits of VA.
This patch extends the code such that if the system supports 52 bits of
VA, and the ID map text is placed that high up, then we increase the VA
size accordingly, up to 52.
One difference from the current implementation is that so far the
condition of VA_BITS < 48 has meant that the top level table is always
"full", with the maximum number of entries, and an extra table level is
always needed. Now, when VA_BITS = 48 (and using 64k pages), the top
level table is not full, and we simply need to increase the number of
entries in it, instead of creating a new table level.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: reduce arguments to __create_hyp_mappings()]
[catalin.marinas@arm.com: reworked/renamed __cpu_uses_extended_idmap_level()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The top 4 bits of a 52-bit physical address are positioned at bits
12..15 of a page table entry. Introduce macros to convert between a
physical address and its placement in a table entry, and change all
macros/functions that access PTEs to use them.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: some long lines wrapped]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Instead of open coding the generation of page table entries, use the
macros/functions that exist for this - pfn_p*d and p*d_populate. Most
code in the kernel already uses these macros, this patch tries to fix
up the few places that don't. This is useful for the next patch in this
series, which needs to change the page table entry logic, and it's
better to have that logic in one place.
The KVM extended ID map is special, since we're creating a level above
CONFIG_PGTABLE_LEVELS and the required function isn't available. Leave
it as is and add a comment to explain it. (The normal kernel ID map code
doesn't need this change because its page tables are created in assembly
(__create_page_tables)).
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The top 4 bits of a 52-bit physical address are positioned at bits
12..15 in page table entries. Introduce a macro to move the bits there,
and change the early ID map and swapper table setup code to use it.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: additional comments for clarification]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The top 4 bits of a 52-bit physical address are positioned at bits 2..5
in the TTBR registers. Introduce a couple of macros to move the bits
there, and change all TTBR writers to use them.
Leave TTBR0 PAN code unchanged, to avoid complicating it. A system with
52-bit PA will have PAN anyway (because it's ARMv8.1 or later), and a
system without 52-bit PA can only use up to 48-bit PAs. A later patch in
this series will add a kconfig dependency to ensure PAN is configured.
In addition, when using 52-bit PA there is a special alignment
requirement on the top-level table. We don't currently have any VA_BITS
configuration that would violate the requirement, but one could be added
in the future, so add a compile-time BUG_ON to check for it.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: added TTBR_BADD_MASK_52 comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 9de52a755c ("arm64: fpsimd: Fix failure to restore FPSIMD
state after signals") fixed an issue reported in our FPSIMD signal
restore code but inadvertently introduced another issue which tends to
manifest as random SEGVs in userspace.
The problem is that when we copy the struct fpsimd_state from the kernel
stack (populated from the signal frame) into the struct held in the
current thread_struct, we blindly copy uninitialised stack into the
"cpu" field, which means that context-switching of the FP registers is
no longer reliable.
This patch fixes the problem by copying only the user_fpsimd member of
struct fpsimd_state. We should really rework the function prototypes
to take struct user_fpsimd_state * instead, but let's just get this
fixed for now.
Cc: Dave Martin <Dave.Martin@arm.com>
Fixes: 9de52a755c ("arm64: fpsimd: Fix failure to restore FPSIMD state after signals")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, the SVE field in ID_AA64PFR0_EL1 is visible
unconditionally to userspace via the CPU ID register emulation,
irrespective of the kernel config. This means that if a kernel
configured with CONFIG_ARM64_SVE=n is run on SVE-capable hardware,
userspace will see SVE reported as present in the ID regs even
though the kernel forbids execution of SVE instructions.
This patch makes the exposure of the SVE field in ID_AA64PFR0_EL1
conditional on CONFIG_ARM64_SVE=y.
Since future architecture features are likely to encounter a
similar requirement, this patch adds a suitable helper macros for
use when declaring config-conditional ID register fields.
Fixes: 43994d824e ("arm64/sve: Detect SVE and activate runtime support")
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The only inclusion of asm/uaccess.h should be by linux/uaccess.h. All
other headers should use the latter.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter 4K and next 4K.
When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.
1) A System Error Interrupt (SEI) being raised by the Falkor core due
to the errant memory access attempting to access a region of memory
that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
memory. This behavior may only occur if the instruction cache is
disabled prior to or coincident with translation being changed from
enabled to disabled.
The conditions leading to this erratum will not occur when either of the
following occur:
1) A higher exception level disables translation of a lower exception level
(e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).
To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The literal pool entry for identifying the vectors base is the only piece
of information in the trampoline page that identifies the true location
of the kernel.
This patch moves it into a page-aligned region of the .rodata section
and maps this adjacent to the trampoline text via an additional fixmap
entry, which protects against any accidental leakage of the trampoline
contents.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There are now a handful of open-coded masks to extract the ASID from a
TTBR value, so introduce a TTBR_ASID_MASK and use that instead.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Allow explicit disabling of the entry trampoline on the kernel command
line (kpti=off) by adding a fake CPU feature (ARM64_UNMAP_KERNEL_AT_EL0)
that can be used to toggle the alternative sequences in our entry code and
avoid use of the trampoline altogether if desired. This also allows us to
make use of a static key in arm64_kernel_unmapped_at_el0().
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When unmapping the kernel at EL0, we use tpidrro_el0 as a scratch register
during exception entry from native tasks and subsequently zero it in
the kernel_ventry macro. We can therefore avoid zeroing tpidrro_el0
in the context-switch path for native tasks using the entry trampoline.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We rely on an atomic swizzling of TTBR1 when transitioning from the entry
trampoline to the kernel proper on an exception. We can't rely on this
atomicity in the face of Falkor erratum #E1003, so on affected cores we
can issue a TLB invalidation to invalidate the walk cache prior to
jumping into the kernel. There is still the possibility of a TLB conflict
here due to conflicting walk cache entries prior to the invalidation, but
this doesn't appear to be the case on these CPUs in practice.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Hook up the entry trampoline to our exception vectors so that all
exceptions from and returns to EL0 go via the trampoline, which swizzles
the vector base register accordingly. Transitioning to and from the
kernel clobbers x30, so we use tpidrro_el0 and far_el1 as scratch
registers for native tasks.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We will need to treat exceptions from EL0 differently in kernel_ventry,
so rework the macro to take the exception level as an argument and
construct the branch target using that.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The exception entry trampoline needs to be mapped at the same virtual
address in both the trampoline page table (which maps nothing else)
and also the kernel page table, so that we can swizzle TTBR1_EL1 on
exceptions from and return to EL0.
This patch maps the trampoline at a fixed virtual address in the fixmap
area of the kernel virtual address space, which allows the kernel proper
to be randomized with respect to the trampoline when KASLR is enabled.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
To allow unmapping of the kernel whilst running at EL0, we need to
point the exception vectors at an entry trampoline that can map/unmap
the kernel on entry/exit respectively.
This patch adds the trampoline page, although it is not yet plugged
into the vector table and is therefore unused.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With the ASID now installed in TTBR1, we can re-enable ARM64_SW_TTBR0_PAN
by ensuring that we switch to a reserved ASID of zero when disabling
user access and restore the active user ASID on the uaccess enable path.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The post_ttbr0_update_workaround hook applies to any change to TTBRx_EL1.
Since we're using TTBR1 for the ASID, rename the hook to make it clearer
as to what it's doing.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When deciding whether to invalidate FPSIMD state cached in the cpu,
the backend function sve_flush_cpu_state() attempts to dereference
__this_cpu_read(fpsimd_last_state). However, this is not safe:
there is no guarantee that this task_struct pointer is still valid,
because the task could have exited in the meantime.
This means that we need another means to get the appropriate value
of TIF_SVE for the associated task.
This patch solves this issue by adding a cached copy of the TIF_SVE
flag in fpsimd_last_state, which we can check without dereferencing
the task pointer.
In particular, although this patch is not a KVM fix per se, this
means that this check is now done safely in the KVM world switch
path (which is currently the only user of this code).
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There is currently some duplicate logic to associate current's
FPSIMD context with the cpu when loading FPSIMD state into the cpu
regs.
Subsequent patches will update that logic, so in order to ensure it
only needs to be done in one place, this patch factors the relevant
code out into a new function fpsimd_bind_to_cpu().
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, loading of a task's fpsimd state into the CPU registers
is skipped if that task's state is already present in the registers
of that CPU.
However, the code relies on the struct fpsimd_state * (and by
extension struct task_struct *) to unambiguously identify a task.
There is a particular case in which this doesn't work reliably:
when a task exits, its task_struct may be recycled to describe a
new task.
Consider the following scenario:
1) Task P loads its fpsimd state onto cpu C.
per_cpu(fpsimd_last_state, C) := P;
P->thread.fpsimd_state.cpu := C;
2) Task X is scheduled onto C and loads its fpsimd state on C.
per_cpu(fpsimd_last_state, C) := X;
X->thread.fpsimd_state.cpu := C;
3) X exits, causing X's task_struct to be freed.
4) P forks a new child T, which obtains X's recycled task_struct.
T == X.
T->thread.fpsimd_state.cpu == C (inherited from P).
5) T is scheduled on C.
T's fpsimd state is not loaded, because
per_cpu(fpsimd_last_state, C) == T (== X) &&
T->thread.fpsimd_state.cpu == C.
(This is the check performed by fpsimd_thread_switch().)
So, T gets X's registers because the last registers loaded onto C
were those of X, in (2).
This patch fixes the problem by ensuring that the sched-in check
fails in (5): fpsimd_flush_task_state(T) is called when T is
forked, so that T->thread.fpsimd_state.cpu == C cannot be true.
This relies on the fact that T is not schedulable until after
copy_thread() completes.
Once T's fpsimd state has been loaded on some CPU C there may still
be other cpus D for which per_cpu(fpsimd_last_state, D) ==
&X->thread.fpsimd_state. But D is necessarily != C in this case,
and the check in (5) must fail.
An alternative fix would be to do refcounting on task_struct. This
would result in each CPU holding a reference to the last task whose
fpsimd state was loaded there. It's not clear whether this is
preferable, and it involves higher overhead than the fix proposed
in this patch. It would also move all the task_struct freeing
work into the context switch critical section, or otherwise some
deferred cleanup mechanism would need to be introduced, neither of
which seems obviously justified.
Cc: <stable@vger.kernel.org>
Fixes: 005f78cd88 ("arm64: defer reloading a task's FPSIMD state to userland resume")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: word-smithed the comment so it makes more sense]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Building the kernel with an LTO-enabled GCC spits out the following "const"
warning for the cpu_ops code:
mm/percpu.c:2168:20: error: pcpu_fc_names causes a section type conflict
with dt_supported_cpu_ops
const char * const pcpu_fc_names[PCPU_FC_NR] __initconst = {
^
arch/arm64/kernel/cpu_ops.c:34:37: note: ‘dt_supported_cpu_ops’ was declared here
static const struct cpu_operations *dt_supported_cpu_ops[] __initconst = {
Fix it by adding missed const qualifiers.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
bus access read/write events are not supported in A73, based on the
Cortex-A73 TRM r0p2, section 11.9 Events (pages 11-457 to 11-460).
Fixes: 5561b6c5e9 "arm64: perf: add support for Cortex-A73"
Acked-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Xu YiPing <xuyiping@hisilicon.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The fpsimd_update_current_state() function is responsible for
loading the FPSIMD state from the user signal frame into the
current task during sigreturn. When implementing support for SVE,
conditional code was added to this function in order to handle the
case where SVE state need to be loaded for the task and merged with
the FPSIMD data from the signal frame; however, the FPSIMD-only
case was unintentionally dropped.
As a result of this, sigreturn does not currently restore the
FPSIMD state of the task, except in the case where the system
supports SVE and the signal frame contains SVE state in addition to
FPSIMD state.
This patch fixes this bug by making the copy-in of the FPSIMD data
from the signal frame to thread_struct unconditional.
This remains a performance regression from v4.14, since the FPSIMD
state is now copied into thread_struct and then loaded back,
instead of _only_ being loaded into the CPU FPSIMD registers.
However, it is essential to call task_fpsimd_load() here anyway in
order to ensure that the SVE enable bit in CPACR_EL1 is set
correctly before returning to userspace. This could use some
refactoring, but since sigreturn is not a fast path I have kept
this patch as a pure fix and left the refactoring for later.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: 8cd969d28f ("arm64/sve: Signal handling support")
Reported-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When building the arm64 kernel with both CONFIG_ARM64_MODULE_PLTS and
CONFIG_DYNAMIC_FTRACE enabled, the ftrace-mod.o object file is built
with the kernel and contains a trampoline that is linked into each
module, so that modules can be loaded far away from the kernel and
still reach the ftrace entry point in the core kernel with an ordinary
relative branch, as is emitted by the compiler instrumentation code
dynamic ftrace relies on.
In order to be able to build out of tree modules, this object file
needs to be included into the linux-headers or linux-devel packages,
which is undesirable, as it makes arm64 a special case (although a
precedent does exist for 32-bit PPC).
Given that the trampoline essentially consists of a PLT entry, let's
not bother with a source or object file for it, and simply patch it
in whenever the trampoline is being populated, using the existing
PLT support routines.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
To allow the ftrace trampoline code to reuse the PLT entry routines,
factor it out and move it into asm/module.h.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Plenty of acronym soup here:
- Initial support for the Scalable Vector Extension (SVE)
- Improved handling for SError interrupts (required to handle RAS events)
- Enable GCC support for 128-bit integer types
- Remove kernel text addresses from backtraces and register dumps
- Use of WFE to implement long delay()s
- ACPI IORT updates from Lorenzo Pieralisi
- Perf PMU driver for the Statistical Profiling Extension (SPE)
- Perf PMU driver for Hisilicon's system PMUs
- Misc cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJaCcLqAAoJELescNyEwWM0JREH/2FbmD/khGzEtP8LW+o9D8iV
TBM02uWQxS1bbO1pV2vb+512YQO+iWfeQwJH9Jv2FZcrMvFv7uGRnYgAnJuXNGrl
W+LL6OhN22A24LSawC437RU3Xe7GqrtONIY/yLeJBPablfcDGzPK1eHRA0pUzcyX
VlyDruSHWX44VGBPV6JRd3x0vxpV8syeKOjbRvopRfn3Nwkbd76V3YSfEgwoTG5W
ET1sOnXLmHHdeifn/l1Am5FX1FYstpcd7usUTJ4Oto8y7e09tw3bGJCD0aMJ3vow
v1pCUWohEw7fHqoPc9rTrc1QEnkdML4vjJvMPUzwyTfPrN+7uEuMIEeJierW+qE=
=0qrg
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The big highlight is support for the Scalable Vector Extension (SVE)
which required extensive ABI work to ensure we don't break existing
applications by blowing away their signal stack with the rather large
new vector context (<= 2 kbit per vector register). There's further
work to be done optimising things like exception return, but the ABI
is solid now.
Much of the line count comes from some new PMU drivers we have, but
they're pretty self-contained and I suspect we'll have more of them in
future.
Plenty of acronym soup here:
- initial support for the Scalable Vector Extension (SVE)
- improved handling for SError interrupts (required to handle RAS
events)
- enable GCC support for 128-bit integer types
- remove kernel text addresses from backtraces and register dumps
- use of WFE to implement long delay()s
- ACPI IORT updates from Lorenzo Pieralisi
- perf PMU driver for the Statistical Profiling Extension (SPE)
- perf PMU driver for Hisilicon's system PMUs
- misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
arm64: Make ARMV8_DEPRECATED depend on SYSCTL
arm64: Implement __lshrti3 library function
arm64: support __int128 on gcc 5+
arm64/sve: Add documentation
arm64/sve: Detect SVE and activate runtime support
arm64/sve: KVM: Hide SVE from CPU features exposed to guests
arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
arm64/sve: KVM: Prevent guests from using SVE
arm64/sve: Add sysctl to set the default vector length for new processes
arm64/sve: Add prctl controls for userspace vector length management
arm64/sve: ptrace and ELF coredump support
arm64/sve: Preserve SVE registers around EFI runtime service calls
arm64/sve: Preserve SVE registers around kernel-mode NEON use
arm64/sve: Probe SVE capabilities and usable vector lengths
arm64: cpufeature: Move sys_caps_initialised declarations
arm64/sve: Backend logic for setting the vector length
arm64/sve: Signal handling support
arm64/sve: Support vector length resetting for new processes
arm64/sve: Core task context handling
arm64/sve: Low-level CPU setup
...
This patch enables detection of hardware SVE support via the
cpufeatures framework, and reports its presence to the kernel and
userspace via the new ARM64_SVE cpucap and HWCAP_SVE hwcap
respectively.
Userspace can also detect SVE using ID_AA64PFR0_EL1, using the
cpufeatures MRS emulation.
When running on hardware that supports SVE, this enables runtime
kernel support for SVE, and allows user tasks to execute SVE
instructions and make of the of the SVE-specific user/kernel
interface extensions implemented by this series.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Until KVM has full SVE support, guests must not be allowed to
execute SVE instructions.
This patch enables the necessary traps, and also ensures that the
traps are disabled again on exit from the guest so that the host
can still use SVE if it wants to.
On guest exit, high bits of the SVE Zn registers may have been
clobbered as a side-effect the execution of FPSIMD instructions in
the guest. The existing KVM host FPSIMD restore code is not
sufficient to restore these bits, so this patch explicitly marks
the CPU as not containing cached vector state for any task, thus
forcing a reload on the next return to userspace. This is an
interim measure, in advance of adding full SVE awareness to KVM.
This marking of cached vector state in the CPU as invalid is done
using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c. Due
to the repeated use of this rather obscure operation, it makes
sense to factor it out as a separate helper with a clearer name.
This patch factors it out as fpsimd_flush_cpu_state(), and ports
all callers to use it.
As a side effect of this refactoring, a this_cpu_write() in
fpsimd_cpu_pm_notifier() is changed to __this_cpu_write(). This
should be fine, since cpu_pm_enter() is supposed to be called only
with interrupts disabled.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Because of the effect of SVE on the size of the signal frame, the
default vector length used for new processes involves a tradeoff
between performance of SVE-enabled software on the one hand, and
reliability of non-SVE-aware software on the other hand.
For this reason, the best choice depends on the repertoire of
userspace software in use and is thus best left up to distro
maintainers, sysadmins and developers.
If CONFIG_SYSCTL and CONFIG_PROC_SYSCTL are enabled, this patch
exposes the default vector length in
/proc/sys/abi/sve_default_vector_length, where boot scripts or the
adventurous can poke it.
In common with other arm64 ABI sysctls, this control is currently
global: setting it requires CAP_SYS_ADMIN in the root user
namespace, but the value set is effective for subsequent execs in
all namespaces. The control only affects _new_ processes, however:
changing it does not affect the vector length of any existing
process.
The intended usage model is that if userspace is known to be fully
SVE-tolerant (or a developer is curious to find out) then this
parameter can be cranked up during system startup.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds two arm64-specific prctls, to permit userspace to
control its vector length:
* PR_SVE_SET_VL: set the thread's SVE vector length and vector
length inheritance mode.
* PR_SVE_GET_VL: get the same information.
Although these prctls resemble instruction set features in the SVE
architecture, they provide additional control: the vector length
inheritance mode is Linux-specific and nothing to do with the
architecture, and the architecture does not permit EL0 to set its
own vector length directly. Both can be used in portable tools
without requiring the use of SVE instructions.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: Fixed up prctl constants to avoid clash with PDEATHSIG]
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch defines and implements a new regset NT_ARM_SVE, which
describes a thread's SVE register state. This allows a debugger to
manipulate the SVE state, as well as being included in ELF
coredumps for post-mortem debugging.
Because the regset size and layout are dependent on the thread's
current vector length, it is not possible to define a C struct to
describe the regset contents as is done for existing regsets.
Instead, and for the same reasons, NT_ARM_SVE is based on the
freeform variable-layout approach used for the SVE signal frame.
Additionally, to reduce debug overhead when debugging threads that
might or might not have live SVE register state, NT_ARM_SVE may be
presented in one of two different formats: the old struct
user_fpsimd_state format is embedded for describing the state of a
thread with no live SVE state, whereas a new variable-layout
structure is embedded for describing live SVE state. This avoids a
debugger needing to poll NT_PRFPREG in addition to NT_ARM_SVE, and
allows existing userspace code to handle the non-SVE case without
too much modification.
For this to work, NT_ARM_SVE is defined with a fixed-format header
of type struct user_sve_header, which the recipient can use to
figure out the content, size and layout of the reset of the regset.
Accessor macros are defined to allow the vector-length-dependent
parts of the regset to be manipulated.
Signed-off-by: Alan Hayward <alan.hayward@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Okamoto Takayuki <tokamoto@jp.fujitsu.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The EFI runtime services ABI allows EFI to make free use of the
FPSIMD registers during EFI runtime service calls, subject to the
callee-save requirements of the AArch64 procedure call standard.
However, the SVE architecture allows upper bits of the SVE vector
registers to be zeroed as a side-effect of FPSIMD V-register
writes. This means that the SVE vector registers must be saved in
their entirety in order to avoid data loss: non-SVE-aware EFI
implementations cannot restore them correctly.
The non-IRQ case is already handled gracefully by
kernel_neon_begin(). For the IRQ case, this patch allocates a
suitable per-CPU stash buffer for the full SVE register state and
uses it to preserve the affected registers around EFI calls. It is
currently unclear how the EFI runtime services ABI will be
clarified with respect to SVE, so it safest to assume that the
predicate registers and FFR must be saved and restored too.
No attempt is made to restore the restore the vector length after
a call, for now. It is deemed rather insane for EFI to change it,
and contemporary EFI implementations certainly won't.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Kernel-mode NEON will corrupt the SVE vector registers, due to the
way they alias the FPSIMD vector registers in the hardware.
This patch ensures that any live SVE register content for the task
is saved by kernel_neon_begin(). The data will be restored in the
usual way on return to userspace.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch uses the cpufeatures framework to determine common SVE
capabilities and vector lengths, and configures the runtime SVE
support code appropriately.
ZCR_ELx is not really a feature register, but it is convenient to
use it as a template for recording the maximum vector length
supported by a CPU, using the LEN field. This field is similar to
a feature field in that it is a contiguous bitfield for which we
want to determine the minimum system-wide value. This patch adds
ZCR as a pseudo-register in cpuinfo/cpufeatures, with appropriate
custom code to populate it. Finding the minimum supported value of
the LEN field is left to the cpufeatures framework in the usual
way.
The meaning of ID_AA64ZFR0_EL1 is not architecturally defined yet,
so for now we just require it to be zero.
Note that much of this code is dormant and SVE still won't be used
yet, since system_supports_sve() remains hardwired to false.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
update_cpu_features() currently cannot tell whether it is being
called during early or late secondary boot. This doesn't
desperately matter for anything it currently does.
However, SVE will need to know here whether the set of available
vector lengths is known or still to be determined when booting a
CPU, so that it can be updated appropriately.
This patch simply moves the sys_caps_initialised stuff to the top
of the file so that it can be used more widely. There doesn't seem
to be a more obvious place to put it.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch implements the core logic for changing a task's vector
length on request from userspace. This will be used by the ptrace
and prctl frontends that are implemented in later patches.
The SVE architecture permits, but does not require, implementations
to support vector lengths that are not a power of two. To handle
this, logic is added to check a requested vector length against a
possibly sparse bitmap of available vector lengths at runtime, so
that the best supported value can be chosen.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch implements support for saving and restoring the SVE
registers around signals.
A fixed-size header struct sve_context is always included in the
signal frame encoding the thread's vector length at the time of
signal delivery, optionally followed by a variable-layout structure
encoding the SVE registers.
Because of the need to preserve backwards compatibility, the FPSIMD
view of the SVE registers is always dumped as a struct
fpsimd_context in the usual way, in addition to any sve_context.
The SVE vector registers are dumped in full, including bits 127:0
of each register which alias the corresponding FPSIMD vector
registers in the hardware. To avoid any ambiguity about which
alias to restore during sigreturn, the kernel always restores bits
127:0 of each SVE vector register from the fpsimd_context in the
signal frame (which must be present): userspace needs to take this
into account if it wants to modify the SVE vector register contents
on return from a signal.
FPSR and FPCR, which are used by both FPSIMD and SVE, are not
included in sve_context because they are always present in
fpsimd_context anyway.
For signal delivery, a new helper
fpsimd_signal_preserve_current_state() is added to update _both_
the FPSIMD and SVE views in the task struct, to make it easier to
populate this information into the signal frame. Because of the
redundancy between the two views of the state, only one is updated
otherwise.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's desirable to be able to reset the vector length to some sane
default for new processes, since the new binary and its libraries
may or may not be SVE-aware.
This patch tracks the desired post-exec vector length (if any) in a
new thread member sve_vl_onexec, and adds a new thread flag
TIF_SVE_VL_INHERIT to control whether to inherit or reset the
vector length. Currently these are inactive. Subsequent patches
will provide the capability to configure them.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds the core support for switching and managing the SVE
architectural state of user tasks.
Calls to the existing FPSIMD low-level save/restore functions are
factored out as new functions task_fpsimd_{save,load}(), since SVE
now dynamically may or may not need to be handled at these points
depending on the kernel configuration, hardware features discovered
at boot, and the runtime state of the task. To make these
decisions as fast as possible, const cpucaps are used where
feasible, via the system_supports_sve() helper.
The SVE registers are only tracked for threads that have explicitly
used SVE, indicated by the new thread flag TIF_SVE. Otherwise, the
FPSIMD view of the architectural state is stored in
thread.fpsimd_state as usual.
When in use, the SVE registers are not stored directly in
thread_struct due to their potentially large and variable size.
Because the task_struct slab allocator must be configured very
early during kernel boot, it is also tricky to configure it
correctly to match the maximum vector length provided by the
hardware, since this depends on examining secondary CPUs as well as
the primary. Instead, a pointer sve_state in thread_struct points
to a dynamically allocated buffer containing the SVE register data,
and code is added to allocate and free this buffer at appropriate
times.
TIF_SVE is set when taking an SVE access trap from userspace, if
suitable hardware support has been detected. This enables SVE for
the thread: a subsequent return to userspace will disable the trap
accordingly. If such a trap is taken without sufficient system-
wide hardware support, SIGILL is sent to the thread instead as if
an undefined instruction had been executed: this may happen if
userspace tries to use SVE in a system where not all CPUs support
it for example.
The kernel will clear TIF_SVE and disable SVE for the thread
whenever an explicit syscall is made by userspace. For backwards
compatibility reasons and conformance with the spirit of the base
AArch64 procedure call standard, the subset of the SVE register
state that aliases the FPSIMD registers is still preserved across a
syscall even if this happens. The remainder of the SVE register
state logically becomes zero at syscall entry, though the actual
zeroing work is currently deferred until the thread next tries to
use SVE, causing another trap to the kernel. This implementation
is suboptimal: in the future, the fastpath case may be optimised
to zero the registers in-place and leave SVE enabled for the task,
where beneficial.
TIF_SVE is also cleared in the following slowpath cases, which are
taken as reasonable hints that the task may no longer use SVE:
* exec
* fork and clone
Code is added to sync data between thread.fpsimd_state and
thread.sve_state whenever enabling/disabling SVE, in a manner
consistent with the SVE architectural programmer's model.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: added #include to fix allnoconfig build]
[will: use enable_daif in do_sve_acc]
Signed-off-by: Will Deacon <will.deacon@arm.com>
To enable the kernel to use SVE, SVE traps from EL1 to EL2 must be
disabled. To take maximum advantage of the hardware, the full
available vector length also needs to be enabled for EL1 by
programming ZCR_EL2.LEN. (The kernel will program ZCR_EL1.LEN as
required, but this cannot override the limit set by ZCR_EL2.)
This patch makes the appropriate changes to the EL2 early setup
code.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Manipulating the SVE architectural state, including the vector and
predicate registers, first-fault register and the vector length,
requires the use of dedicated instructions added by SVE.
This patch adds suitable assembly functions for saving and
restoring the SVE registers and querying the vector length.
Setting of the vector length is done as part of register restore.
Since people building kernels may not all get an SVE-enabled
toolchain for a while, this patch uses macros that generate
explicit opcodes in place of assembler mnemonics.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The SVE architecture adds some system registers, ID register fields
and a dedicated ESR exception class.
This patch adds the appropriate definitions that will be needed by
the kernel.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The existing FPSIMD context switch code contains a couple of
instances of {set,clear}_ti_thread(task_thread_info(task)). Since
there are thread flag manipulators that operate directly on
task_struct, this verbosity isn't strictly needed.
For consistency, this patch simplifies the affected calls. This
should have no impact on behaviour.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, armv8_deprected.c takes charge of the "abi" sysctl
directory, which makes life difficult for other code that wants to
register sysctls in the same directory.
There is a "new" [1] sysctl registration interface that removes the
need to define ctl_tables for parent directories explicitly, which
is ideal here.
This patch ports register_insn_emulation_sysctl() over to the
register_sysctl() interface and removes the redundant ctl_table for
"abi".
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[1] fea478d410 (sysctl: Add register_sysctl for normal sysctl
users)
The commit message notes an intent to port users of the
pre-existing interfaces over to register_sysctl(), though the
number of users of the new interface currently appears negligible.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently sys_rt_sigreturn() verifies that the base sigframe is
readable, but no similar check is performed on the extra data to
which an extra_context record points.
This matters because the extra data will be read with the
unprotected user accessors. However, this is not a problem at
present because the extra data base address is required to be
exactly at the end of the base sigframe. So, there would need to
be a non-user-readable kernel address within about 59K
(SIGFRAME_MAXSZ - sizeof(struct rt_sigframe)) of some address for
which access_ok(VERIFY_READ) returns true, in order for sigreturn
to be able to read kernel memory that should be inaccessible to the
user task. This is currently impossible due to the untranslatable
address hole between the TTBR0 and TTBR1 address ranges.
Disappearance of the hole between the TTBR0 and TTBR1 mapping
ranges would require the VA size for TTBR0 and TTBR1 to grow to at
least 55 bits, and either the disabling of tagged pointers for
userspace or enabling of tagged pointers for kernel space; none of
which is currently envisaged.
Even so, it is wrong to use the unprotected user accessors without
an accompanying access_ok() check.
To avoid the potential for future surprises, this patch does an
explicit access_ok() check on the extra data space when parsing an
extra_context record.
Fixes: 33f082614c ("arm64: signal: Allow expansion of the signal frame")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
A couple of FPSIMD exception handling functions that are called
from entry.S are currently not annotated as such.
This is not a big deal since asmlinkage does nothing on arm/arm64,
but fixing the annotations is more consistent and may help avoid
future surprises.
This patch adds appropriate asmlinkage annotations for
do_fpsimd_acc() and do_fpsimd_exc().
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Function graph does not work currently when CONFIG_DYNAMIC_TRACE is not
set. This is because ftrace_function_trace is not always set to ftrace_stub
when function_graph is in use.
Do not skip checking of graph tracer functions when ftrace_function_trace
is set.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
It's possible for a user to deliberately trigger __dump_instr with a
chosen kernel address.
Let's avoid problems resulting from this by using get_user() rather than
__get_user(), ensuring that we don't erroneously access kernel memory.
Where we use __dump_instr() on kernel text, we already switch to
KERNEL_DS, so this shouldn't adversely affect those cases.
Fixes: 60ffc30d56 ("arm64: Exception handling")
Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Today SError is taken using the inv_entry macro that ends up in
bad_mode.
SError can be used by the RAS Extensions to notify either the OS or
firmware of CPU problems, some of which may have been corrected.
To allow this handling to be added, add a do_serror() C function
that just panic()s. Add the entry.S boiler plate to save/restore the
CPU registers and unmask debug exceptions. Future patches may change
do_serror() to return if the SError Interrupt was notification of a
corrected error.
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Wang Xiongfeng <wangxiongfengi2@huawei.com>
[Split out of a bigger patch, added compat path, renamed, enabled debug
exceptions]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Following our 'dai' order, irqs should be processed with debug and
serror exceptions unmasked.
Add a helper to unmask these two, (and fiq for good measure).
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
el0_sync also unmasks exceptions on a case-by-case basis, debug exceptions
are enabled, unless this was a debug exception. Irqs are unmasked for
some exception types but not for others.
el0_dbg should run with everything masked to prevent us taking a debug
exception from do_debug_exception. For the other cases we can unmask
everything. This changes the behaviour of fpsimd_{acc,exc} and el0_inv
which previously ran with irqs masked.
This patch removed the last user of enable_dbg_and_irq, remove it.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
el1_sync unmasks exceptions on a case-by-case basis, debug exceptions
are unmasked, unless this was a debug exception. IRQs are unmasked
for instruction and data aborts only if the interupted context had
irqs unmasked.
Following our 'dai' order, el1_dbg should run with everything masked.
For the other cases we can inherit whatever we interrupted.
Add a macro inherit_daif to set daif based on the interrupted pstate.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
To take RAS Exceptions as quickly as possible we need to keep SError
unmasked as much as possible. We need to mask it during kernel_exit
as taking an error from this code will overwrite the exception-registers.
Adding a naked 'disable_daif' to kernel_exit causes a performance problem
for micro-benchmarks that do no real work, (e.g. calling getpid() in a
loop). This is because the ret_to_user loop has already masked IRQs so
that the TIF_WORK_MASK thread flags can't change underneath it, adding
disable_daif is an additional self-synchronising operation.
In the future, the RAS APEI code may need to modify the TIF_WORK_MASK
flags from an SError, in which case the ret_to_user loop must mask SError
while it examines the flags.
Disable all exceptions for return to EL1. For return to EL0 get the
ret_to_user loop to leave all exceptions masked once it has done its
work, this avoids an extra pstate-write.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Remove the local_{async,fiq}_{en,dis}able macros as they don't respect
our newly defined order and are only used to set the flags for process
context when we bring CPUs online.
Add a helper to do this. The IRQ flag varies as we want it masked on
the boot CPU until we are ready to handle interrupts.
The boot CPU unmasks SError during early boot once it can print an error
message. If we can print an error message about SError, we can do the
same for FIQ. Debug exceptions are already enabled by __cpu_setup(),
which has also configured MDSCR_EL1 to disable MDE and KDE.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently SError is always masked in the kernel. To support RAS exceptions
using SError on hardware with the v8.2 RAS Extensions we need to unmask
SError as much as possible.
Let's define an order for masking and unmasking exceptions. 'dai' is
memorable and effectively what we have today.
Disabling debug exceptions should cause all other exceptions to be masked.
Masking SError should mask irq, but not disable debug exceptions.
Masking irqs has no side effects for other flags. Keeping to this order
makes it easier for entry.S to know which exceptions should be unmasked.
FIQ is never expected, but we mask it when we mask debug exceptions, and
unmask it at all other times.
Given masking debug exceptions masks everything, we don't need macros
to save/restore that bit independently. Remove them and switch the last
caller over to use the daif calls.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There are a few places where we want to mask all exceptions. Today we
do this in a piecemeal fashion, typically we expect the caller to
have masked irqs and the arch code masks debug exceptions, ignoring
serror which is probably masked.
Make it clear that 'mask all exceptions' is the intention by adding
helpers to do exactly that.
This will let us unmask SError without having to add 'oh and SError'
to these paths.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
After commit 9e8e865bbe ("arm64: unify idmap removal"), we no need to
flush tlb in suspend.c, so the included file tlbflush.h can be removed.
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The vdso tries to check for a NULL res pointer in __kernel_clock_getres,
but only checks the lower 32 bits as is uses CBZ on the W register the
res pointer is held in.
Thus, if the res pointer happened to be aligned to a 4GiB boundary, we'd
spuriously skip storing the timespec to it, while returning a zero error code
to the caller.
Prevent this by checking the whole pointer, using CBZ on the X register
the res pointer is held in.
Fixes: 9031fefde6 ("arm64: VDSO support")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Andrew Pinski <apinski@cavium.com>
Reported-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We can decode the PSTATE easily enough, so pretty-print it in register
dumps.
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Printing raw pointer values in backtraces has potential security
implications and are of questionable value anyway.
This patch follows x86's lead and removes the "Exception stack:" dump
from kernel backtraces, as well as converting PC/LR values to symbols
such as "sysrq_handle_crash+0x20/0x30".
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Software Step exception is missing after stepping a trapped instruction.
Ensure SPSR.SS gets set to 0 after emulating/skipping a trapped instruction
before doing ERET.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[will: replaced AARCH32_INSN_SIZE with 4]
Signed-off-by: Will Deacon <will.deacon@arm.com>
__memcpy_{to,from}io fall back to byte-at-a-time copying if both the
source and destination pointers are not 8-byte aligned. Since one of the
pointers always points at normal memory, this is unnecessary and
detrimental to performance, so only do byte copying until we hit an 8-byte
boundary for the device pointer.
This change was motivated by performance issues in the pstore driver.
On a test platform, measuring probe time for pstore, console buffer
size of 1/4MB and pmsg of 1/2MB, was in the 90-107ms region. Change
managed to reduce it to 10-25ms, an improvement in boot time.
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Merge in ARM PMU and perf updates for 4.15:
- Support for the Statistical Profiling Extension
- Support for Hisilicon's SoC PMU
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the ARM ARM clearly specifies the rules for inferring
the values of the ID register fields, fix the types of the
feature bits we have in the kernel.
As per ARM ARM DDI0487B.b, section D10.1.4 "Principles of the
ID scheme for fields in ID registers" lists the registers to
which the scheme applies along with the exceptions.
This patch changes the relevant feature bits from FTR_EXACT
to FTR_LOWER_SAFE to select the safer value. This will enable
an older kernel running on a new CPU detect the safer option
rather than completely disabling the feature.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When booting at EL2, ensure that we permit the EL1 host to sample
physical addresses and physical counter values using SPE.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
ARMv8-A adds a few optional features for ARMv8.2 and ARMv8.3.
Expose them to the userspace via HWCAPs and mrs emulation.
SHA2-512 - Instruction support for SHA512 Hash algorithm (e.g SHA512H,
SHA512H2, SHA512U0, SHA512SU1)
SHA3 - SHA3 crypto instructions (EOR3, RAX1, XAR, BCAX).
SM3 - Instruction support for Chinese cryptography algorithm SM3
SM4 - Instruction support for Chinese cryptography algorithm SM4
DP - Dot Product instructions (UDOT, SDOT).
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We register the pm/hotplug callbacks for FPSIMD as late_initcall,
which happens after the userspace is active (from initramfs via
populate_rootfs, a rootfs_initcall). Make sure we are ready even
before the userspace could potentially use it, by promoting to
a core_initcall.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We trap and emulate some instructions (e.g, mrs, deprecated instructions)
for the userspace. However the handlers for these are registered as
late_initcalls and the userspace could be up and running from the initramfs
by that time (with populate_rootfs, which is a rootfs_initcall()). This
could cause problems for the early applications ending up in failure
like :
[ 11.152061] modprobe[93]: undefined instruction: pc=0000ffff8ca48ff4
This patch promotes the specific calls to core_initcalls, which are
guaranteed to be completed before we hit userspace.
Cc: stable@vger.kernel.org
Cc: Dave Martin <dave.martin@arm.com>
Cc: Matthias Brugger <mbrugger@suse.com>
Cc: James Morse <james.morse@arm.com>
Reported-by: Matwey V. Kornilov <matwey.kornilov@gmail.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently we inconsistently log identifying information for the boot CPU
and secondary CPUs. For the boot CPU, we log the MIDR and MPIDR across
separate messages, whereas for the secondary CPUs we only log the MIDR.
In some cases, it would be useful to know the MPIDR of secondary CPUs,
and it would be nice for these messages to be consistent.
This patch ensures that in the primary and secondary boot paths, we log
both the MPIDR and MIDR in a single message, with a consistent format.
the MPIDR is consistently padded to 10 hex characters to cover Aff3 in
bits 39:32, so that IDs can be compared easily.
The newly redundant message in setup_arch() is removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Al Stone <ahs3@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[will: added '0x' prefixes consistently]
Signed-off-by: Will Deacon <will.deacon@arm.com>
As you see in init/version.c, init_uts_ns.name.machine is initially
set to UTS_MACHINE. There is no point to copy the same string.
I dug the git history to figure out why this line is here. My best
guess is like this:
- This line has been around here since the initial support of arm64
by commit 9703d9d7f7 ("arm64: Kernel booting and initialisation").
If ARCH (=arm64) and UTS_MACHINE (=aarch64) do not match,
arch/$(ARCH)/Makefile is supposed to override UTS_MACHINE, but the
initial version of arch/arm64/Makefile missed to do that. Instead,
the boot code copied "aarch64" to init_utsname()->machine.
- Commit 94ed1f2cb5 ("arm64: setup: report ELF_PLATFORM as the
machine for utsname") replaced "aarch64" with ELF_PLATFORM to
make "uname" to reflect the endianness.
- ELF_PLATFORM does not help to provide the UTS machine name to rpm
target, so commit cfa88c7946 ("arm64: Set UTS_MACHINE in the
Makefile") fixed it. The commit simply replaced ELF_PLATFORM with
UTS_MACHINE, but missed the fact the string copy itself is no longer
needed.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
ILP32 series [1] introduces the dependency on <asm/is_compat.h> for
TASK_SIZE macro. Which in turn requires <asm/thread_info.h>, and
<asm/thread_info.h> include <asm/memory.h>, giving a circular dependency,
because TASK_SIZE is currently located in <asm/memory.h>.
In other architectures, TASK_SIZE is defined in <asm/processor.h>, and
moving TASK_SIZE there fixes the problem.
Discussion: https://patchwork.kernel.org/patch/9929107/
[1] https://github.com/norov/linux/tree/ilp32-next
CC: Will Deacon <will.deacon@arm.com>
CC: Laura Abbott <labbott@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When the kernel is entered at EL2 on an ARMv8.0 system, we construct
the EL1 pstate and make sure this uses the the EL1 stack pointer
(we perform an exception return to EL1h).
But if the kernel is either entered at EL1 or stays at EL2 (because
we're on a VHE-capable system), we fail to set SPsel, and use whatever
stack selection the higher exception level has choosen for us.
Let's not take any chance, and make sure that SPsel is set to one
before we decide the mode we're going to run in.
Cc: <stable@vger.kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull address-limit checking fixes from Ingo Molnar:
"This fixes a number of bugs in the address-limit (USER_DS) checks that
got introduced in the merge window, (mostly) affecting the ARM and
ARM64 platforms"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64/syscalls: Move address limit check in loop
arm/syscalls: Optimize address limit check
Revert "arm/syscalls: Check address limit on user-mode return"
syscalls: Use CHECK_DATA_CORRUPTION for addr_limit_user_check
__efi_fpsimd_begin()/__efi_fpsimd_end() are for use when making EFI
calls only, so using them in non-EFI kernels is not allowed.
This patch compiles them out if CONFIG_EFI is not set.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
A bug was reported on ARM where set_fs might be called after it was
checked on the work pending function. ARM64 is not affected by this bug
but has a similar construct. In order to avoid any similar problems in
the future, the addr_limit_user_check function is moved at the beginning
of the loop.
Fixes: cf7de27ab3 ("arm64/syscalls: Check address limit on user-mode return")
Reported-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Will Drewry <wad@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Yonghong Song <yhs@fb.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1504798247-48833-5-git-send-email-keescook@chromium.org
The stacktraces always begin as follows:
[<c00117b4>] save_stack_trace_tsk+0x0/0x98
[<c0011870>] save_stack_trace+0x24/0x28
...
This is because the stack trace code includes the stack frames for
itself. This is incorrect behaviour, and also leads to "skip" doing the
wrong thing (which is the number of stack frames to avoid recording.)
Perversely, it does the right thing when passed a non-current thread.
Fix this by ensuring that we have a known constant number of frames
above the main stack trace function, and always skip these.
This was fixed for arch arm by commit 3683f44c42 ("ARM: stacktrace:
avoid listing stacktrace functions in stacktrace")
Link: http://lkml.kernel.org/r/1504078343-28754-1-git-send-email-guptap@codeaurora.org
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull namespace updates from Eric Biederman:
"Life has been busy and I have not gotten half as much done this round
as I would have liked. I delayed it so that a minor conflict
resolution with the mips tree could spend a little time in linux-next
before I sent this pull request.
This includes two long delayed user namespace changes from Kirill
Tkhai. It also includes a very useful change from Serge Hallyn that
allows the security capability attribute to be used inside of user
namespaces. The practical effect of this is people can now untar
tarballs and install rpms in user namespaces. It had been suggested to
generalize this and encode some of the namespace information
information in the xattr name. Upon close inspection that makes the
things that should be hard easy and the things that should be easy
more expensive.
Then there is my bugfix/cleanup for signal injection that removes the
magic encoding of the siginfo union member from the kernel internal
si_code. The mips folks reported the case where I had used FPE_FIXME
me is impossible so I have remove FPE_FIXME from mips, while at the
same time including a return statement in that case to keep gcc from
complaining about unitialized variables.
I almost finished the work to get make copy_siginfo_to_user a trivial
copy to user. The code is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3
But I did not have time/energy to get the code posted and reviewed
before the merge window opened.
I was able to see that the security excuse for just copying fields
that we know are initialized doesn't work in practice there are buggy
initializations that don't initialize the proper fields in siginfo. So
we still sometimes copy unitialized data to userspace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
Introduce v3 namespaced file capabilities
mips/signal: In force_fcr31_sig return in the impossible case
signal: Remove kernel interal si_code magic
fcntl: Don't use ambiguous SIG_POLL si_codes
prctl: Allow local CAP_SYS_ADMIN changing exe_file
security: Use user_namespace::level to avoid redundant iterations in cap_capable()
userns,pidns: Verify the userns for new pid namespaces
signal/testing: Don't look for __SI_FAULT in userspace
signal/mips: Document a conflict with SI_USER with SIGFPE
signal/sparc: Document a conflict with SI_USER with SIGFPE
signal/ia64: Document a conflict with SI_USER with SIGFPE
signal/alpha: Document a conflict with SI_USER for SIGTRAP
Merge more updates from Andrew Morton:
- most of the rest of MM
- a small number of misc things
- lib/ updates
- checkpatch
- autofs updates
- ipc/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits)
ipc: optimize semget/shmget/msgget for lots of keys
ipc/sem: play nicer with large nsops allocations
ipc/sem: drop sem_checkid helper
ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
ipc: convert ipc_namespace.count from atomic_t to refcount_t
kcov: support compat processes
sh: defconfig: cleanup from old Kconfig options
mn10300: defconfig: cleanup from old Kconfig options
m32r: defconfig: cleanup from old Kconfig options
drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
drivers/pps: aesthetic tweaks to PPS-related content
cpumask: make cpumask_next() out-of-line
kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
kmod: split off umh headers into its own file
MAINTAINERS: clarify kmod is just a kernel module loader
kmod: split out umh code into its own file
test_kmod: flip INT checks to be consistent
test_kmod: remove paranoid UINT_MAX check on uint range processing
vfat: deduplicate hex2bin()
...
First, number of CPUs can't be negative number.
Second, different signnnedness leads to suboptimal code in the following
cases:
1)
kmalloc(nr_cpu_ids * sizeof(X));
"int" has to be sign extended to size_t.
2)
while (loff_t *pos < nr_cpu_ids)
MOVSXD is 1 byte longed than the same MOV.
Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".
Code savings on allyesconfig kernel: -3KB
add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
function old new delta
coretemp_cpu_online 450 512 +62
rcu_init_one 1234 1272 +38
pci_device_probe 374 399 +25
...
pgdat_reclaimable_pages 628 556 -72
select_fallback_rq 446 369 -77
task_numa_find_cpu 1923 1807 -116
Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
EG76ridTolUFVV+wzJD9
=9cLP
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add enhanced Downstream Port Containment support, which prints more
details about Root Port Programmed I/O errors (Dongdong Liu)
- add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)
- add MediaTek MT2712 and MT7622 support (Ryder Lee)
- add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)
- add Qualcom IPQ8074 support (Varadarajan Narayanan)
- add R-Car r8a7743/5 device tree support (Biju Das)
- add Rockchip per-lane PHY support for better power management (Shawn
Lin)
- fix IRQ mapping for hot-added devices by replacing the
pci_fixup_irqs() boot-time design with a host bridge hook called at
probe-time (Lorenzo Pieralisi, Matthew Minter)
- fix race when enabling two devices that results in upstream bridge
not being enabled correctly (Srinath Mannam)
- fix pciehp power fault infinite loop (Keith Busch)
- fix SHPC bridge MSI hotplug events by enabling bus mastering
(Aleksandr Bezzubikov)
- fix a VFIO issue by correcting PCIe capability sizes (Alex
Williamson)
- fix an INTD issue on Xilinx and possibly other drivers by unifying
INTx IRQ domain support (Paul Burton)
- avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
Roedel)
- allow APM X-Gene device assignment to guests by adding an ACS quirk
(Feng Kan)
- fix driver crashes by disabling Extended Tags on Broadcom HT2100
(Extended Tags support is required for PCIe Receivers but not
Requesters, and we now enable them by default when Requesters support
them) (Sinan Kaya)
- fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
use the real Requester ID (not a phantom RID) (Robin Murphy)
- prevent assignment of Intel VMD children to guests (which may be
supported eventually, but isn't yet) by not associating an IOMMU with
them (Jon Derrick)
- fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
Bauer)
- fix a Function-Level Reset issue with Intel 750 NVMe by waiting
longer (up to 60sec instead of 1sec) for device to become ready
(Sinan Kaya)
- fix a Function-Level Reset issue on iProc Stingray by working around
hardware defects in the CRS implementation (Oza Pawandeep)
- fix an issue with Intel NVMe P3700 after an iProc reset by adding a
delay during shutdown (Oza Pawandeep)
- fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
in compose_msi_msg() (Stephen Hemminger)
- fix a wireless LAN driver timeout by clearing DesignWare MSI
interrupt status after it is handled, not before (Faiz Abbas)
- fix DesignWare ATU enable checking (Jisheng Zhang)
- reduce Layerscape dependencies on the bootloader by doing more
initialization in the driver (Hou Zhiqiang)
- improve Intel VMD performance allowing allocation of more IRQ vectors
than present CPUs (Keith Busch)
- improve endpoint framework support for initial DMA mask, different
BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
Vijay Abraham I, Stan Drozd)
- rework CRS support to add periodic messages while we poll during
enumeration and after Function-Level Reset and prepare for possible
other uses of CRS (Sinan Kaya)
- clean up Root Port AER handling by removing unnecessary code and
moving error handler methods to struct pcie_port_service_driver
(Christoph Hellwig)
- clean up error handling paths in various drivers (Bjorn Andersson,
Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
Lorenzo Pieralisi, Sergei Shtylyov)
- clean up SR-IOV resource handling by disabling VF decoding before
updating the corresponding resource structs (Gavin Shan)
- clean up DesignWare-based drivers by unifying quirks to update Class
Code and Interrupt Pin and related handling of write-protected
registers (Hou Zhiqiang)
- clean up by adding empty generic pcibios_align_resource() and
pcibios_fixup_bus() and removing empty arch-specific implementations
(Palmer Dabbelt)
- request exclusive reset control for several drivers to allow cleanup
elsewhere (Philipp Zabel)
- constify various structures (Arvind Yadav, Bhumika Goyal)
- convert from full_name() to %pOF (Rob Herring)
- remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
Lin)
* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
PCI: xgene: Clean up whitespace
PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
PCI: xgene: Fix platform_get_irq() error handling
PCI: xilinx-nwl: Fix platform_get_irq() error handling
PCI: rockchip: Fix platform_get_irq() error handling
PCI: altera: Fix platform_get_irq() error handling
PCI: spear13xx: Fix platform_get_irq() error handling
PCI: artpec6: Fix platform_get_irq() error handling
PCI: armada8k: Fix platform_get_irq() error handling
PCI: dra7xx: Fix platform_get_irq() error handling
PCI: exynos: Fix platform_get_irq() error handling
PCI: iproc: Clean up whitespace
PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
PCI: iproc: Add 500ms delay during device shutdown
PCI: Fix typos and whitespace errors
PCI: Remove unused "res" variable from pci_resource_io()
PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
PCI/AER: Reformat AER register definitions
iommu/vt-d: Prevent VMD child devices from being remapping targets
x86/PCI: Use is_vmd() rather than relying on the domain number
...
- Update the ACPICA code in the kernel to upstream revision 20170728
including:
* Alias operator handling update (Bob Moore).
* Deferred resolution of reference package elements (Bob Moore).
* Support for the _DMA method in walk resources (Bob Moore).
* Tables handling update and support for deferred table
verification (Lv Zheng).
* Update of SMMU models for IORT (Robin Murphy).
* Compiler and disassembler updates (Alex James, Erik Schmauss,
Ganapatrao Kulkarni, James Morse).
* Tools updates (Erik Schmauss, Lv Zheng).
* Assorted minor fixes and cleanups (Bob Moore, Kees Cook,
Lv Zheng, Shao Ming).
- Rework the initialization of non-wakeup GPEs with method handlers
in order to address a boot crash on some systems with Thunderbolt
devices connected at boot time where we miss an early hotplug
event due to a delay in GPE enabling (Rafael Wysocki).
- Rework the handling of PCI bridges when setting up ACPI-based
device wakeup in order to avoid disabling wakeup for bridges
prematurely (Rafael Wysocki).
- Consolidate Apple DMI checks throughout the tree, add support for
Apple device properties to the device properties framework and
use these properties for the handling of I2C and SPI devices on
Apple systems (Lukas Wunner).
- Add support for _DMA to the ACPI-based device properties lookup
code and make it possible to use the information from there to
configure DMA regions on ARM64 systems (Lorenzo Pieralisi).
- Fix several issues in the APEI code, add support for exporting
the BERT error region over sysfs and update APEI MAINTAINERS
entry with reviewers information (Borislav Petkov, Dongjiu Geng,
Loc Ho, Punit Agrawal, Tony Luck, Yazen Ghannam).
- Fix a potential initialization ordering issue in the ACPI EC
driver and clean it up somewhat (Lv Zheng).
- Update the ACPI SPCR driver to extend the existing XGENE 8250
workaround in it to a new platform (m400) and to work around
an Xgene UART clock issue (Graeme Gregory).
- Add a new utility function to the ACPI core to support using
ACPI OEM ID / OEM Table ID / Revision for system identification
in blacklisting or similar and switch over the existing code
already using this information to this new interface (Toshi Kani).
- Fix an xpower PMIC issue related to GPADC reads that always return
0 without extra pin manipulations (Hans de Goede).
- Add statements to print debug messages in a couple of places in
the ACPI core for easier diagnostics (Rafael Wysocki).
- Clean up the ACPI processor driver slightly (Colin Ian King,
Hanjun Guo).
- Clean up the ACPI x86 boot code somewhat (Andy Shevchenko).
- Add a quirk for Dell OptiPlex 9020M to the ACPI backlight
driver (Alex Hung).
- Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur
Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal,
Ronald Tschalär, Sumeet Pawnikar).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZrcE+AAoJEILEb/54YlRxVGAP/RKzkJlYlOIXtMjf4XWg5ZfJ
RKZA68E9DW179KoBoTCVPD6/eD5UoEJ7fsWXFU2Hgp2xL3N1mZMAJHgAE4GoAwCx
uImoYvQgdPna7DawzRIFkvkfceYxNyh+KaV9s7xne4hAwsB7JzP9yf5Ywll53+oF
Le27/r6lDOaWhG7uYcxSabnQsWZQkBF5mj2GPzEpKDIHcLA1Vii0URzm7mAHdZsz
vGjYhxrshKYEVdkLSRn536m1rEfp2fqsRJ5wqNAazZJr6Cs1WIfNVuv/RfduRJpG
/zHIRAmgKV+3jp39cBpjdnexLczb1rGiCV1yZOvwCNM7jy4evL8vbL7VgcUCopaj
fHbF34chNG/hKJd3Zn3RRCTNzCs6bv+txslOMARxji5eyr2Q4KuVnvg5LM4hxOUP
23FvcYkBYWu4QCNLOTnC7y2OqK6WzOvDpfi7hf13Z42iNzeAUbwt1sVF0/OCwL51
Og6blSy2x8FidKp8oaBBboBzHEiKWnXBj/Hw8KEHVcsqZv1ZC6igNRAL3tjxamU8
98/Z2NSZHYPrrrn13tT9ywISYXReXzUF85787+0ofugvDe8/QyBH6UhzzZc/xKVA
t329JEjEFZZSLgxMIIa9bXoQANxkeZEGsxN6FfwvQhyIVdagLF3UvCjZl/q2NScC
9n++s32qfUBRHetGODWc
=6Ke9
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These include a usual ACPICA code update (this time to upstream
revision 20170728), a fix for a boot crash on some systems with
Thunderbolt devices connected at boot time, a rework of the handling
of PCI bridges when setting up device wakeup, new support for Apple
device properties, support for DMA configurations reported via ACPI on
ARM64, APEI-related updates, ACPI EC driver updates and assorted minor
modifications in several places.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20170728
including:
* Alias operator handling update (Bob Moore).
* Deferred resolution of reference package elements (Bob Moore).
* Support for the _DMA method in walk resources (Bob Moore).
* Tables handling update and support for deferred table
verification (Lv Zheng).
* Update of SMMU models for IORT (Robin Murphy).
* Compiler and disassembler updates (Alex James, Erik Schmauss,
Ganapatrao Kulkarni, James Morse).
* Tools updates (Erik Schmauss, Lv Zheng).
* Assorted minor fixes and cleanups (Bob Moore, Kees Cook, Lv
Zheng, Shao Ming).
- Rework the initialization of non-wakeup GPEs with method handlers
in order to address a boot crash on some systems with Thunderbolt
devices connected at boot time where we miss an early hotplug event
due to a delay in GPE enabling (Rafael Wysocki).
- Rework the handling of PCI bridges when setting up ACPI-based
device wakeup in order to avoid disabling wakeup for bridges
prematurely (Rafael Wysocki).
- Consolidate Apple DMI checks throughout the tree, add support for
Apple device properties to the device properties framework and use
these properties for the handling of I2C and SPI devices on Apple
systems (Lukas Wunner).
- Add support for _DMA to the ACPI-based device properties lookup
code and make it possible to use the information from there to
configure DMA regions on ARM64 systems (Lorenzo Pieralisi).
- Fix several issues in the APEI code, add support for exporting the
BERT error region over sysfs and update APEI MAINTAINERS entry with
reviewers information (Borislav Petkov, Dongjiu Geng, Loc Ho, Punit
Agrawal, Tony Luck, Yazen Ghannam).
- Fix a potential initialization ordering issue in the ACPI EC driver
and clean it up somewhat (Lv Zheng).
- Update the ACPI SPCR driver to extend the existing XGENE 8250
workaround in it to a new platform (m400) and to work around an
Xgene UART clock issue (Graeme Gregory).
- Add a new utility function to the ACPI core to support using ACPI
OEM ID / OEM Table ID / Revision for system identification in
blacklisting or similar and switch over the existing code already
using this information to this new interface (Toshi Kani).
- Fix an xpower PMIC issue related to GPADC reads that always return
0 without extra pin manipulations (Hans de Goede).
- Add statements to print debug messages in a couple of places in the
ACPI core for easier diagnostics (Rafael Wysocki).
- Clean up the ACPI processor driver slightly (Colin Ian King, Hanjun
Guo).
- Clean up the ACPI x86 boot code somewhat (Andy Shevchenko).
- Add a quirk for Dell OptiPlex 9020M to the ACPI backlight driver
(Alex Hung).
- Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur
Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal,
Ronald Tschalär, Sumeet Pawnikar)"
* tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
ACPI / APEI: Suppress message if HEST not present
intel_pstate: convert to use acpi_match_platform_list()
ACPI / blacklist: add acpi_match_platform_list()
ACPI, APEI, EINJ: Subtract any matching Register Region from Trigger resources
ACPI: make device_attribute const
ACPI / sysfs: Extend ACPI sysfs to provide access to boot error region
ACPI: APEI: fix the wrong iteration of generic error status block
ACPI / processor: make function acpi_processor_check_duplicates() static
ACPI / EC: Clean up EC GPE mask flag
ACPI: EC: Fix possible issues related to EC initialization order
ACPI / PM: Add debug statements to acpi_pm_notify_handler()
ACPI: Add debug statements to acpi_global_event_handler()
ACPI / scan: Enable GPEs before scanning the namespace
ACPICA: Make it possible to enable runtime GPEs earlier
ACPICA: Dispatch active GPEs at init time
ACPI: SPCR: work around clock issue on xgene UART
ACPI: SPCR: extend XGENE 8250 workaround to m400
ACPI / LPSS: Don't abort ACPI scan on missing mem resource
mailbox: pcc: Drop uninformative output during boot
ACPI/IORT: Add IORT named component memory address limits
...
- VMAP_STACK support, allowing the kernel stacks to be allocated in
the vmalloc space with a guard page for trapping stack overflows. One
of the patches introduces THREAD_ALIGN and changes the generic
alloc_thread_stack_node() to use this instead of THREAD_SIZE (no
functional change for other architectures)
- Contiguous PTE hugetlb support re-enabled (after being reverted a
couple of times). We now have the semantics agreed in the generic mm
layer together with API improvements so that the architecture code can
detect between contiguous and non-contiguous huge PTEs
- Initial support for persistent memory on ARM: DC CVAP instruction
exposed to user space (HWCAP) and the in-kernel pmem API implemented
- raid6 improvements for arm64: faster algorithm for the delta syndrome
and implementation of the recovery routines using Neon
- FP/SIMD refactoring and removal of support for Neon in interrupt
context. This is in preparation for full SVE support
- PTE accessors converted from inline asm to cmpxchg so that we can
use LSE atomics if available (ARMv8.1)
- Perf support for Cortex-A35 and A73
- Non-urgent fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlmuunYACgkQa9axLQDI
XvEH9BAAo8V94GOMkX6HkT+2hjkl7DQ9krjumzmfzLV5AdgHMMzBNozmWKOCzgh0
yaxRcTUju3EyNeKhADr7yLiKDH8fnRPmYEJiVrwfgo7MaPApaCorr7LLIXfPGuxe
DTBHw+oxRMjlmaHeATX4PBWfQxAx+vjjhHqv3Qpmvdm4nYqR+0hZomH2BNsu64fk
AkSeUCxfCEyzSFIKuQM04M4zhSSZHz1tDxWI0b0RcK73qqEOuYZNkn6qxSKP5J4X
b2Y2U8nmxJ5C2fXpDYZaK9shiJ4Vu7X3Ocf/M7hsJzGY5z4dhnmUmxpHROaNiSvo
hCx7POYKyAPovps7zMSqcdsujkqOIQO8RHp4zGXx/pIr1RumjIiCY+RGpUYGibvU
N4Px5hZNneuHaPZZ+sWjOOdNB28xyzeUp2UK9Bb6uHB+/3xssMAD8Fd/b2ZLnS6a
YW3wrZmqA+ckfETsSRibabTs/ayqYHs2SDVwnlDJGtn+4Pw8oQpwGrwokxLQuuw3
uF2sNEPhJz+dcy21q3udYAQE1qOJBlLqTptgP96CHoVqh8X6nYSi5obT7y30ln3n
dhpZGOdi6R8YOouxgXS3Wg07pxn444L/VzDw5ku/5DkdryPOZCSRbk/2t8If6oDM
2VD6PCbTx3hsGc7SZ7FdSwIysD2j446u40OMGdH2iLB5jWBwyOM=
=vd0/
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- VMAP_STACK support, allowing the kernel stacks to be allocated in the
vmalloc space with a guard page for trapping stack overflows. One of
the patches introduces THREAD_ALIGN and changes the generic
alloc_thread_stack_node() to use this instead of THREAD_SIZE (no
functional change for other architectures)
- Contiguous PTE hugetlb support re-enabled (after being reverted a
couple of times). We now have the semantics agreed in the generic mm
layer together with API improvements so that the architecture code
can detect between contiguous and non-contiguous huge PTEs
- Initial support for persistent memory on ARM: DC CVAP instruction
exposed to user space (HWCAP) and the in-kernel pmem API implemented
- raid6 improvements for arm64: faster algorithm for the delta syndrome
and implementation of the recovery routines using Neon
- FP/SIMD refactoring and removal of support for Neon in interrupt
context. This is in preparation for full SVE support
- PTE accessors converted from inline asm to cmpxchg so that we can use
LSE atomics if available (ARMv8.1)
- Perf support for Cortex-A35 and A73
- Non-urgent fixes and cleanups
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits)
arm64: cleanup {COMPAT_,}SET_PERSONALITY() macro
arm64: introduce separated bits for mm_context_t flags
arm64: hugetlb: Cleanup setup_hugepagesz
arm64: Re-enable support for contiguous hugepages
arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous hugepages
arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepages
arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages
arm64: hugetlb: Add break-before-make logic for contiguous entries
arm64: hugetlb: Spring clean huge pte accessors
arm64: hugetlb: Introduce pte_pgprot helper
arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present
arm64: kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
arm64: dma-mapping: Mark atomic_pool as __ro_after_init
arm64: dma-mapping: Do not pass data to gen_pool_set_algo()
arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code paths
arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()
arm64: Move PTE_RDONLY bit handling out of set_pte_at()
kvm: arm64: Convert kvm_set_s2pte_readonly() from inline asm to cmpxchg()
arm64: Convert pte handling from inline asm to using (cmp)xchg
arm64: neon/efi: Make EFI fpsimd save/restore variables static
...
Pull syscall updates from Ingo Molnar:
"Improve the security of set_fs(): we now check the address limit on a
number of key platforms (x86, arm, arm64) before returning to
user-space - without adding overhead to the typical system call fast
path"
* 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64/syscalls: Check address limit on user-mode return
arm/syscalls: Check address limit on user-mode return
x86/syscalls: Check address limit on user-mode return
Pull RCU updates from Ingo Molnad:
"The main RCU related changes in this cycle were:
- Removal of spin_unlock_wait()
- SRCU updates
- RCU torture-test updates
- RCU Documentation updates
- Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant
- Miscellaneous RCU fixes
- CPU-hotplug fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
arch: Remove spin_unlock_wait() arch-specific definitions
locking: Remove spin_unlock_wait() generic definitions
drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
ipc: Replace spin_unlock_wait() with lock/unlock pair
exit: Replace spin_unlock_wait() with lock/unlock pair
completion: Replace spin_unlock_wait() with lock/unlock pair
doc: Set down RCU's scheduling-clock-interrupt needs
doc: No longer allowed to use rcu_dereference on non-pointers
doc: Add RCU files to docbook-generation files
doc: Update memory-barriers.txt for read-to-write dependencies
doc: Update RCU documentation
membarrier: Provide expedited private command
rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
rcu: Add warning to rcu_idle_enter() for irqs enabled
rcu: Make rcu_idle_enter() rely on callers disabling irqs
rcu: Add assertions verifying blocked-tasks list
rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
rcu: Add TPS() protection for _rcu_barrier_trace strings
rcu: Use idle versions of swait to make idle-hack clear
swait: Add idle variants which don't contribute to load average
...
There is some work that should be done after setting the personality.
Currently it's done in the macro, which is not the best idea.
In this patch new arch_setup_new_exec() routine is introduced, and all
setup code is moved there, as suggested by Catalin:
https://lkml.org/lkml/2017/8/4/494
Cc: Pratyush Anand <panand@redhat.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
[catalin.marinas@arm.com: comments changed or removed]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
With 16KB pages and a kernel Image larger than 16MB, the current
kaslr_early_init() logic for avoiding mappings across swapper table
boundaries fails since increasing the offset by kimg_sz just moves the
problem to the next boundary.
This patch rounds the offset down to (1 << SWAPPER_TABLE_SHIFT) if the
Image crosses a PMD_SIZE boundary.
Fixes: afd0e5a876 ("arm64: kaslr: Fix up the kernel image alignment")
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In the KASLR setup routine, we ensure that the early virtual mapping
of the kernel image does not cover more than a single table entry at
the level above the swapper block level, so that the assembler routines
involved in setting up this mapping can remain simple.
In this calculation we add the proposed KASLR offset to the values of
the _text and _end markers, and reject it if they would end up falling
in different swapper table sized windows.
However, when taking the addresses of _text and _end, the modulo offset
(the physical displacement modulo 2 MB) is already accounted for, and
so adding it again results in incorrect results. So disregard the modulo
offset from the calculation.
Fixes: 08cdac619c ("arm64: relocatable: deal with physically misaligned ...")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There are some tricky dependencies between the different stages of
flushing the FPSIMD register state during exec, and these can race
with context switch in ways that can cause the old task's regs to
leak across. In particular, a context switch during the memset() can
cause some of the task's old FPSIMD registers to reappear.
Disabling preemption for this small window would be no big deal for
performance: preemption is already disabled for similar scenarios
like updating the FPSIMD registers in sigreturn.
So, instead of rearranging things in ways that might swap existing
subtle bugs for new ones, this patch just disables preemption
around the FPSIMD state flushing so that races of this type can't
occur here. This brings fpsimd_flush_thread() into line with other
code paths.
Cc: stable@vger.kernel.org
Fixes: 674c242c93 ("arm64: flush FP/SIMD state correctly after execve()")
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently mm->context.flags field uses thread_info flags which is not
the best idea for many reasons. For example, mm_context_t doesn't need
most of thread_info flags. And it would be difficult to add new mm-related
flag if needed because it may easily interfere with TIF ones.
To deal with it, the new MMCF_AARCH32 flag is introduced for
mm_context_t->flags, where MMCF prefix stands for mm_context_t flags.
Also, mm_context_t flag doesn't require atomicity and ordering of the
access, so using set/clear_bit() is replaced with simple masks.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop() which is a weak
function and can be overridden by architecture codes to fix the side effect
caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
notifiers" option).
ARM64 architecture uses the weak version function and the problem is that
the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
option is enabled.
Calling smp_send_crash_stop() in machine_crash_shutdown() is useless
because all nonpanic CPUs are already offline by smp_send_stop() in this
case and smp_send_crash_stop() only works against online CPUs.
The result is that secondary CPUs registers are not saved by
crash_save_cpu() and the vmcore file misreports these CPUs as being
offline.
crash_smp_send_stop() is implemented to fix this problem by replacing the
existing smp_send_crash_stop() and adding a check for multiple calling to
the function. The function (strong symbol version) saves crash information
for nonpanic CPUs and machine_crash_shutdown() tries to save crash
information for nonpanic CPUs only when crash_kexec_post_notifiers kernel
option is disabled.
* crash_kexec_post_notifiers : false
panic()
__crash_kexec()
machine_crash_shutdown()
crash_smp_send_stop() <= save crash dump for nonpanic cores
* crash_kexec_post_notifiers : true
panic()
crash_smp_send_stop() <= save crash dump for nonpanic cores
__crash_kexec()
machine_crash_shutdown()
crash_smp_send_stop() <= just return.
Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently PTE_RDONLY is treated as a hardware only bit and not handled
by the pte_mkwrite(), pte_wrprotect() or the user PAGE_* definitions.
The set_pte_at() function is responsible for setting this bit based on
the write permission or dirty state. This patch moves the PTE_RDONLY
handling out of set_pte_at into the pte_mkwrite()/pte_wrprotect()
functions. The PAGE_* definitions to need to be updated to explicitly
include PTE_RDONLY when !PTE_WRITE.
The patch also removes the redundant PAGE_COPY(_EXEC) definitions as
they are identical to the corresponding PAGE_READONLY(_EXEC).
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>