Commit Graph

1916 Commits

Author SHA1 Message Date
Suzuki K Poulose
fd9d63da17 arm64: capabilities: Add support for features enabled early
The kernel detects and uses some of the features based on the boot
CPU and expects that all the following CPUs conform to it. e.g,
with VHE and the boot CPU running at EL2, the kernel decides to
keep the kernel running at EL2. If another CPU is brought up without
this capability, we use custom hooks (via check_early_cpu_features())
to handle it. To handle such capabilities add support for detecting
and enabling capabilities based on the boot CPU.

A bit is added to indicate if the capability should be detected
early on the boot CPU. The infrastructure then ensures that such
capabilities are probed and "enabled" early on in the boot CPU
and, enabled on the subsequent CPUs.

Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-26 18:01:41 +01:00
Suzuki K Poulose
d3aec8a28b arm64: capabilities: Restrict KPTI detection to boot-time CPUs
KPTI is treated as a system wide feature and is only detected if all
the CPUs in the sysetm needs the defense, unless it is forced via kernel
command line. This leaves a system with a mix of CPUs with and without
the defense vulnerable. Also, if a late CPU needs KPTI but KPTI was not
activated at boot time, the CPU is currently allowed to boot, which is a
potential security vulnerability.
This patch ensures that the KPTI is turned on if at least one CPU detects
the capability (i.e, change scope to SCOPE_LOCAL_CPU). Also rejetcs a late
CPU, if it requires the defense, when the system hasn't enabled it,

Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-26 18:01:40 +01:00
Suzuki K Poulose
5c137714dd arm64: capabilities: Introduce weak features based on local CPU
Now that we have the flexibility of defining system features based
on individual CPUs, introduce CPU feature type that can be detected
on a local SCOPE and ignores the conflict on late CPUs. This is
applicable for ARM64_HAS_NO_HW_PREFETCH, where it is fine for
the system to have CPUs without hardware prefetch turning up
later. We only suffer a performance penalty, nothing fatal.

Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-26 18:01:40 +01:00
Suzuki K Poulose
cce360b54c arm64: capabilities: Filter the entries based on a given mask
While processing the list of capabilities, it is useful to
filter out some of the entries based on the given mask for the
scope of the capabilities to allow better control. This can be
used later for handling LOCAL vs SYSTEM wide capabilities and more.
All capabilities should have their scope set to either LOCAL_CPU or
SYSTEM. No functional/flow change.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-26 18:01:38 +01:00
Suzuki K Poulose
5b4747c5dc arm64: capabilities: Add flags to handle the conflicts on late CPU
When a CPU is brought up, it is checked against the caps that are
known to be enabled on the system (via verify_local_cpu_capabilities()).
Based on the state of the capability on the CPU vs. that of System we
could have the following combinations of conflict.

	x-----------------------------x
	| Type  | System   | Late CPU |
	|-----------------------------|
	|  a    |   y      |    n     |
	|-----------------------------|
	|  b    |   n      |    y     |
	x-----------------------------x

Case (a) is not permitted for caps which are system features, which the
system expects all the CPUs to have (e.g VHE). While (a) is ignored for
all errata work arounds. However, there could be exceptions to the plain
filtering approach. e.g, KPTI is an optional feature for a late CPU as
long as the system already enables it.

Case (b) is not permitted for errata work arounds that cannot be activated
after the kernel has finished booting.And we ignore (b) for features. Here,
yet again, KPTI is an exception, where if a late CPU needs KPTI we are too
late to enable it (because we change the allocation of ASIDs etc).

Add two different flags to indicate how the conflict should be handled.

 ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU - CPUs may have the capability
 ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU - CPUs may not have the cappability.

Now that we have the flags to describe the behavior of the errata and
the features, as we treat them, define types for ERRATUM and FEATURE.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-26 18:01:37 +01:00
Suzuki K Poulose
143ba05d86 arm64: capabilities: Prepare for fine grained capabilities
We use arm64_cpu_capabilities to represent CPU ELF HWCAPs exposed
to the userspace and the CPU hwcaps used by the kernel, which
include cpu features and CPU errata work arounds. Capabilities
have some properties that decide how they should be treated :

 1) Detection, i.e scope : A cap could be "detected" either :
    - if it is present on at least one CPU (SCOPE_LOCAL_CPU)
	Or
    - if it is present on all the CPUs (SCOPE_SYSTEM)

 2) When is it enabled ? - A cap is treated as "enabled" when the
  system takes some action based on whether the capability is detected or
  not. e.g, setting some control register, patching the kernel code.
  Right now, we treat all caps are enabled at boot-time, after all
  the CPUs are brought up by the kernel. But there are certain caps,
  which are enabled early during the boot (e.g, VHE, GIC_CPUIF for NMI)
  and kernel starts using them, even before the secondary CPUs are brought
  up. We would need a way to describe this for each capability.

 3) Conflict on a late CPU - When a CPU is brought up, it is checked
  against the caps that are known to be enabled on the system (via
  verify_local_cpu_capabilities()). Based on the state of the capability
  on the CPU vs. that of System we could have the following combinations
  of conflict.

	x-----------------------------x
	| Type	| System   | Late CPU |
	------------------------------|
	|  a    |   y      |    n     |
	------------------------------|
	|  b    |   n      |    y     |
	x-----------------------------x

  Case (a) is not permitted for caps which are system features, which the
  system expects all the CPUs to have (e.g VHE). While (a) is ignored for
  all errata work arounds. However, there could be exceptions to the plain
  filtering approach. e.g, KPTI is an optional feature for a late CPU as
  long as the system already enables it.

  Case (b) is not permitted for errata work arounds which requires some
  work around, which cannot be delayed. And we ignore (b) for features.
  Here, yet again, KPTI is an exception, where if a late CPU needs KPTI we
  are too late to enable it (because we change the allocation of ASIDs
  etc).

So this calls for a lot more fine grained behavior for each capability.
And if we define all the attributes to control their behavior properly,
we may be able to use a single table for the CPU hwcaps (which cover
errata and features, not the ELF HWCAPs). This is a prepartory step
to get there. More bits would be added for the properties listed above.

We are going to use a bit-mask to encode all the properties of a
capabilities. This patch encodes the "SCOPE" of the capability.

As such there is no change in how the capabilities are treated.

Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-26 18:01:37 +01:00
Suzuki K Poulose
1e89baed5d arm64: capabilities: Move errata processing code
We have errata work around processing code in cpu_errata.c,
which calls back into helpers defined in cpufeature.c. Now
that we are going to make the handling of capabilities
generic, by adding the information to each capability,
move the errata work around specific processing code.
No functional changes.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-26 18:01:36 +01:00
Dave Martin
c0cda3b8ee arm64: capabilities: Update prototype for enable call back
We issue the enable() call back for all CPU hwcaps capabilities
available on the system, on all the CPUs. So far we have ignored
the argument passed to the call back, which had a prototype to
accept a "void *" for use with on_each_cpu() and later with
stop_machine(). However, with commit 0a0d111d40
("arm64: cpufeature: Pass capability structure to ->enable callback"),
there are some users of the argument who wants the matching capability
struct pointer where there are multiple matching criteria for a single
capability. Clean up the declaration of the call back to make it clear.

 1) Renamed to cpu_enable(), to imply taking necessary actions on the
    called CPU for the entry.
 2) Pass const pointer to the capability, to allow the call back to
    check the entry. (e.,g to check if any action is needed on the CPU)
 3) We don't care about the result of the call back, turning this to
    a void.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Dave Martin <dave.martin@arm.com>
[suzuki: convert more users, rename call back and drop results]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-26 18:00:37 +01:00
Dave Martin
af4a81b9cd arm64: fpsimd: Fix bad si_code for undiagnosed SIGFPE
Currently a SIGFPE delivered in response to a floating-point
exception trap may have si_code set to 0 on arm64.  As reported by
Eric, this is a bad idea since this is the value of SI_USER -- yet
this signal is definitely not the result of kill(2), tgkill(2) etc.
and si_uid and si_pid make limited sense whereas we do want to
yield a value for si_addr (which doesn't exist for SI_USER).

It's not entirely clear whether the architecure permits a
"spurious" fp exception trap where none of the exception flag bits
in ESR_ELx is set.  (IMHO the architectural intent is to forbid
this.)  However, it does permit those bits to contain garbage if
the TFV bit in ESR_ELx is 0.  That case isn't currently handled at
all and may result in si_code == 0 or si_code containing a FPE_FLT*
constant corresponding to an exception that did not in fact happen.

There is nothing sensible we can return for si_code in such cases,
but SI_USER is certainly not appropriate and will lead to violation
of legitimate userspace assumptions.

This patch allocates a new si_code value FPE_UNKNOWN that at least
does not conflict with any existing SI_* or FPE_* code, and yields
this in si_code for undiagnosable cases.  This is probably the best
simplicity/incorrectness tradeoff achieveable without relying on
implementation-dependent features or adding a lot of code.  In any
case, there appears to be no perfect solution possible that would
justify a lot of effort here.

Yielding FPE_UNKNOWN when some well-defined fp exception caused the
trap is a violation of POSIX, but this is forced by the
architecture.  We have no realistic prospect of yielding the
correct code in such cases.  At present I am not aware of any ARMv8
implementation that supports trapped floating-point exceptions in
any case.

The new code may be applicable to other architectures for similar
reasons.

No attempt is made to provide ESR_ELx to userspace in the signal
frame, since architectural limitations mean that it is unlikely to
provide much diagnostic value, doesn't benefit existing software
and would create ABI with no proven purpose.  The existing
mechanism for passing it also has problems of its own which may
result in the wrong value being passed to userspace due to
interaction with mm faults.  The implied rework does not appear
justified.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-20 10:03:11 +00:00
Suzuki K Poulose
7206dc93a5 arm64: Expose Arm v8.4 features
Expose the new features introduced by Arm v8.4 extensions to
Arm v8-A profile.

These include :

 1) Data indpendent timing of instructions. (DIT, exposed as HWCAP_DIT)
 2) Unaligned atomic instructions and Single-copy atomicity of loads
    and stores. (AT, expose as HWCAP_USCAT)
 3) LDAPR and STLR instructions with immediate offsets (extension to
    LRCPC, exposed as HWCAP_ILRCPC)
 4) Flag manipulation instructions (TS, exposed as HWCAP_FLAGM).

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-19 18:14:27 +00:00
Ard Biesheuvel
350e1dad0d arm64: asm: drop special versions of adr_l/ldr_l/str_l for modules
Now that we started keeping modules within 4 GB of the core kernel
in all cases, we no longer need to special case the adr_l/ldr_l/str_l
macros for modules to deal with them being loaded farther away.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-19 18:14:26 +00:00
Dave Martin
af40ff687b arm64: signal: Ensure si_code is valid for all fault signals
Currently, as reported by Eric, an invalid si_code value 0 is
passed in many signals delivered to userspace in response to faults
and other kernel errors.  Typically 0 is passed when the fault is
insufficiently diagnosable or when there does not appear to be any
sensible alternative value to choose.

This appears to violate POSIX, and is intuitively wrong for at
least two reasons arising from the fact that 0 == SI_USER:

 1) si_code is a union selector, and SI_USER (and si_code <= 0 in
    general) implies the existence of a different set of fields
    (siginfo._kill) from that which exists for a fault signal
    (siginfo._sigfault).  However, the code raising the signal
    typically writes only the _sigfault fields, and the _kill
    fields make no sense in this case.

    Thus when userspace sees si_code == 0 (SI_USER) it may
    legitimately inspect fields in the inactive union member _kill
    and obtain garbage as a result.

    There appears to be software in the wild relying on this,
    albeit generally only for printing diagnostic messages.

 2) Software that wants to be robust against spurious signals may
    discard signals where si_code == SI_USER (or <= 0), or may
    filter such signals based on the si_uid and si_pid fields of
    siginfo._sigkill.  In the case of fault signals, this means
    that important (and usually fatal) error conditions may be
    silently ignored.

In practice, many of the faults for which arm64 passes si_code == 0
are undiagnosable conditions such as exceptions with syndrome
values in ESR_ELx to which the architecture does not yet assign any
meaning, or conditions indicative of a bug or error in the kernel
or system and thus that are unrecoverable and should never occur in
normal operation.

The approach taken in this patch is to translate all such
undiagnosable or "impossible" synchronous fault conditions to
SIGKILL, since these are at least probably localisable to a single
process.  Some of these conditions should really result in a kernel
panic, but due to the lack of diagnostic information it is
difficult to be certain: this patch does not add any calls to
panic(), but this could change later if justified.

Although si_code will not reach userspace in the case of SIGKILL,
it is still desirable to pass a nonzero value so that the common
siginfo handling code can detect incorrect use of si_code == 0
without false positives.  In this case the si_code dependent
siginfo fields will not be correctly initialised, but since they
are not passed to userspace I deem this not to matter.

A few faults can reasonably occur in realistic userspace scenarios,
and _should_ raise a regular, handleable (but perhaps not
ignorable/blockable) signal: for these, this patch attempts to
choose a suitable standard si_code value for the raised signal in
each case instead of 0.

arm64 was the only arch to define a BUS_FIXME code, so after this
patch nobody defines it.  This patch therefore also removes the
relevant code from siginfo_layout().

Cc: James Morse <james.morse@arm.com>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-09 13:58:36 +00:00
Shanker Donthineni
6ae4b6e057 arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC
The DCache clean & ICache invalidation requirements for instructions
to be data coherence are discoverable through new fields in CTR_EL0.
The following two control bits DIC and IDC were defined for this
purpose. No need to perform point of unification cache maintenance
operations from software on systems where CPU caches are transparent.

This patch optimize the three functions __flush_cache_user_range(),
clean_dcache_area_pou() and invalidate_icache_range() if the hardware
reports CTR_EL0.IDC and/or CTR_EL0.IDC. Basically it skips the two
instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic
in order to avoid the unnecessary overhead.

CTR_EL0.DIC: Instruction cache invalidation requirements for
 instruction to data coherence. The meaning of this bit[29].
  0: Instruction cache invalidation to the point of unification
     is required for instruction to data coherence.
  1: Instruction cache cleaning to the point of unification is
      not required for instruction to data coherence.

CTR_EL0.IDC: Data cache clean requirements for instruction to data
 coherence. The meaning of this bit[28].
  0: Data cache clean to the point of unification is required for
     instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
     or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
     for instruction to data coherence.

Co-authored-by: Philip Elcan <pelcan@codeaurora.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-09 13:57:57 +00:00
Ard Biesheuvel
ca79acca27 arm64/kernel: enable A53 erratum #8434319 handling at runtime
Omit patching of ADRP instruction at module load time if the current
CPUs are not susceptible to the erratum.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: Drop duplicate initialisation of .def_scope field]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-09 13:23:09 +00:00
Ard Biesheuvel
e8002e02ab arm64/errata: add REVIDR handling to framework
In some cases, core variants that are affected by a certain erratum
also exist in versions that have the erratum fixed, and this fact is
recorded in a dedicated bit in system register REVIDR_EL1.

Since the architecture does not require that a certain bit retains
its meaning across different variants of the same model, each such
REVIDR bit is tightly coupled to a certain revision/variant value,
and so we need a list of revidr_mask/midr pairs to carry this
information.

So add the struct member and the associated macros and handling to
allow REVIDR fixes to be taken into account.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-09 13:23:08 +00:00
Ard Biesheuvel
a257e02579 arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)

Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.

For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.

So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
  0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
  converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
  the load using an unaffected movn/movk sequence.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-09 13:21:53 +00:00
Will Deacon
e03e61c317 arm64: kaslr: Set TCR_EL1.NFD1 when CONFIG_RANDOMIZE_BASE=y
TCR_EL1.NFD1 was allocated by SVE and ensures that fault-surpressing SVE
memory accesses (e.g. speculative accesses from a first-fault gather load)
which translate via TTBR1_EL1 result in a translation fault if they
miss in the TLB when executed from EL0. This mitigates some timing attacks
against KASLR, where the kernel address space could otherwise be probed
efficiently using the FFR in conjunction with suppressed faults on SVE
loads.

Cc: Dave Martin <Dave.Martin@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 18:52:34 +00:00
Catalin Marinas
1f85b42a69 arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)
Commit 9730348075 ("arm64: Increase the max granular size") increased
the cache line size to 128 to match Cavium ThunderX, apparently for some
performance benefit which could not be confirmed. This change, however,
has an impact on the network packets allocation in certain
circumstances, requiring slightly over a 4K page with a significant
performance degradation.

This patch reverts L1_CACHE_SHIFT back to 6 (64-byte cache line) while
keeping ARCH_DMA_MINALIGN at 128. The cache_line_size() function was
changed to default to ARCH_DMA_MINALIGN in the absence of a meaningful
CTR_EL0.CWG bit field.

In addition, if a system with ARCH_DMA_MINALIGN < CTR_EL0.CWG is
detected, the kernel will force swiotlb bounce buffering for all
non-coherent devices since DMA cache maintenance on sub-CWG ranges is
not safe, leading to data corruption.

Cc: Tirumalesh Chalamarla <tchalamarla@cavium.com>
Cc: Timur Tabi <timur@codeaurora.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 18:52:32 +00:00
Will Deacon
a26731d9d1 arm64: Move show_unhandled_signals_ratelimited into traps.c
show_unhandled_signals_ratelimited is only called in traps.c, so move it
out of its macro in the dreaded system_misc.h and into a static function
in traps.c

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 18:52:31 +00:00
Will Deacon
a1ece8216c arm64: Introduce arm64_force_sig_info and hook up in arm64_notify_die
In preparation for consolidating our handling of printing unhandled
signals, introduce a wrapper around force_sig_info which can act as
the canonical place for dealing with show_unhandled_signals.

Initially, we just hook this up to arm64_notify_die.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 18:52:23 +00:00
Will Deacon
2c9120f3a8 arm64: signal: Make force_signal_inject more robust
force_signal_inject is a little flakey:

  * It only knows about SIGILL and SIGSEGV, so can potentially deliver
    other signals based on a partially initialised siginfo_t

  * It sets si_addr to point at the PC for SIGSEGV

  * It always operates on current, so doesn't need the regs argument

This patch fixes these issues by always assigning the si_addr field to
the address parameter of the function and updates the callers (including
those that indirectly call via arm64_notify_segfault) accordingly.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 18:52:22 +00:00
Pratyush Anand
9f416319f4 arm64: fix unwind_frame() for filtered out fn for function graph tracing
do_task_stat() calls get_wchan(), which further does unwind_frame().
unwind_frame() restores frame->pc to original value in case function
graph tracer has modified a return address (LR) in a stack frame to hook
a function return. However, if function graph tracer has hit a filtered
function, then we can't unwind it as ftrace_push_return_trace() has
biased the index(frame->graph) with a 'huge negative'
offset(-FTRACE_NOTRACE_DEPTH).

Moreover, arm64 stack walker defines index(frame->graph) as unsigned
int, which can not compare a -ve number.

Similar problem we can have with calling of walk_stackframe() from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() to test the index for -ve value and
restore index accordingly before we can restore frame->pc.

Reproducer:

cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in:
Unable to handle kernel paging request at virtual address ffff801bd3d1e000
pgd = ffff8003cbe97c00
[ffff801bd3d1e000] *pgd=0000000000000000, *pud=0000000000000000
Internal error: Oops: 96000006 [#1] SMP
[...]
CPU: 5 PID: 11696 Comm: ps Not tainted 4.11.0+ #33
[...]
task: ffff8003c21ba000 task.stack: ffff8003cc6c0000
PC is at unwind_frame+0x12c/0x180
LR is at get_wchan+0xd4/0x134
pc : [<ffff00000808892c>] lr : [<ffff0000080860b8>] pstate: 60000145
sp : ffff8003cc6c3ab0
x29: ffff8003cc6c3ab0 x28: 0000000000000001
x27: 0000000000000026 x26: 0000000000000026
x25: 00000000000012d8 x24: 0000000000000000
x23: ffff8003c1c04000 x22: ffff000008c83000
x21: ffff8003c1c00000 x20: 000000000000000f
x19: ffff8003c1bc0000 x18: 0000fffffc593690
x17: 0000000000000000 x16: 0000000000000001
x15: 0000b855670e2b60 x14: 0003e97f22cf1d0f
x13: 0000000000000001 x12: 0000000000000000
x11: 00000000e8f4883e x10: 0000000154f47ec8
x9 : 0000000070f367c0 x8 : 0000000000000000
x7 : 00008003f7290000 x6 : 0000000000000018
x5 : 0000000000000000 x4 : ffff8003c1c03cb0
x3 : ffff8003c1c03ca0 x2 : 00000017ffe80000
x1 : ffff8003cc6c3af8 x0 : ffff8003d3e9e000

Process ps (pid: 11696, stack limit = 0xffff8003cc6c0000)
Stack: (0xffff8003cc6c3ab0 to 0xffff8003cc6c4000)
[...]
[<ffff00000808892c>] unwind_frame+0x12c/0x180
[<ffff000008305008>] do_task_stat+0x864/0x870
[<ffff000008305c44>] proc_tgid_stat+0x3c/0x48
[<ffff0000082fde0c>] proc_single_show+0x5c/0xb8
[<ffff0000082b27e0>] seq_read+0x160/0x414
[<ffff000008289e6c>] __vfs_read+0x58/0x164
[<ffff00000828b164>] vfs_read+0x88/0x144
[<ffff00000828c2e8>] SyS_read+0x60/0xc0
[<ffff0000080834a0>] __sys_trace_return+0x0/0x4

Fixes: 20380bb390 (arm64: ftrace: fix a stack tracer's output under function graph tracer)
Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
[catalin.marinas@arm.com: replace WARN_ON with WARN_ON_ONCE]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-23 13:46:38 +00:00
Robin Murphy
9085b34d0e arm64: uaccess: Formalise types for access_ok()
In converting __range_ok() into a static inline, I inadvertently made
it more type-safe, but without considering the ordering of the relevant
conversions. This leads to quite a lot of Sparse noise about the fact
that we use __chk_user_ptr() after addr has already been converted from
a user pointer to an unsigned long.

Rather than just adding another cast for the sake of shutting Sparse up,
it seems reasonable to rework the types to make logical sense (although
the resulting codegen for __range_ok() remains identical). The only
callers this affects directly are our compat traps where the inferred
"user-pointer-ness" of a register value now warrants explicit casting.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-19 13:59:58 +00:00
Bhupesh Sharma
04c4927359 arm64: Fix compilation error while accessing MPIDR_HWID_BITMASK from .S files
Since commit e1a50de378 (arm64: cputype: Silence Sparse warnings),
compilation of arm64 architecture is broken with the following error
messages:

  AR      arch/arm64/kernel/built-in.o
  arch/arm64/kernel/head.S: Assembler messages:
  arch/arm64/kernel/head.S:677: Error: found 'L', expected: ')'
  arch/arm64/kernel/head.S:677: Error: found 'L', expected: ')'
  arch/arm64/kernel/head.S:677: Error: found 'L', expected: ')'
  arch/arm64/kernel/head.S:677: Error: junk at end of line, first
  unrecognized character is `L'
  arch/arm64/kernel/head.S:677: Error: unexpected characters following
  instruction at operand 2 -- `movz x1,:abs_g1_s:0xff00ffffffUL'
  arch/arm64/kernel/head.S:677: Error: unexpected characters following
  instruction at operand 2 -- `movk x1,:abs_g0_nc:0xff00ffffffUL'

This patch fixes the same by using the UL() macro correctly for
assigning the MPIDR_HWID_BITMASK macro value.

Fixes: e1a50de378 ("arm64: cputype: Silence Sparse warnings")
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-19 12:13:29 +00:00
Robin Murphy
e1a50de378 arm64: cputype: Silence Sparse warnings
Sparse makes a fair bit of noise about our MPIDR mask being implicitly
long - let's explicitly describe it as such rather than just relying on
the value forcing automatic promotion.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-17 08:37:05 +00:00
Will Deacon
20a004e7b0 arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
In many cases, page tables can be accessed concurrently by either another
CPU (due to things like fast gup) or by the hardware page table walker
itself, which may set access/dirty bits. In such cases, it is important
to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
entries cannot be torn, merged or subject to apparent loss of coherence
due to compiler transformations.

Whilst there are some scenarios where this cannot happen (e.g. pinned
kernel mappings for the linear region), the overhead of using READ_ONCE
/WRITE_ONCE everywhere is minimal and makes the code an awful lot easier
to reason about. This patch consistently uses these macros in the arch
code, as well as explicitly namespacing pointers to page table entries
from the entries themselves by using adopting a 'p' suffix for the former
(as is sometimes used elsewhere in the kernel source).

Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-16 18:13:57 +00:00
Linus Torvalds
15303ba5d1 KVM changes for 4.16
ARM:
 - Include icache invalidation optimizations, improving VM startup time
 
 - Support for forwarded level-triggered interrupts, improving
   performance for timers and passthrough platform devices
 
 - A small fix for power-management notifiers, and some cosmetic changes
 
 PPC:
 - Add MMIO emulation for vector loads and stores
 
 - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
   requiring the complex thread synchronization of older CPU versions
 
 - Improve the handling of escalation interrupts with the XIVE interrupt
   controller
 
 - Support decrement register migration
 
 - Various cleanups and bugfixes.
 
 s390:
 - Cornelia Huck passed maintainership to Janosch Frank
 
 - Exitless interrupts for emulated devices
 
 - Cleanup of cpuflag handling
 
 - kvm_stat counter improvements
 
 - VSIE improvements
 
 - mm cleanup
 
 x86:
 - Hypervisor part of SEV
 
 - UMIP, RDPID, and MSR_SMI_COUNT emulation
 
 - Paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
 
 - Allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512
   features
 
 - Show vcpu id in its anonymous inode name
 
 - Many fixes and cleanups
 
 - Per-VCPU MSR bitmaps (already merged through x86/pti branch)
 
 - Stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJafvMtAAoJEED/6hsPKofo6YcH/Rzf2RmshrWaC3q82yfIV0Qz
 Z8N8yJHSaSdc3Jo6cmiVj0zelwAxdQcyjwlT7vxt5SL2yML+/Q0st9Hc3EgGGXPm
 Il99eJEl+2MYpZgYZqV8ff3mHS5s5Jms+7BITAeh6Rgt+DyNbykEAvzt+MCHK9cP
 xtsIZQlvRF7HIrpOlaRzOPp3sK2/MDZJ1RBE7wYItK3CUAmsHim/LVYKzZkRTij3
 /9b4LP1yMMbziG+Yxt1o682EwJB5YIat6fmDG9uFeEVI5rWWN7WFubqs8gCjYy/p
 FX+BjpOdgTRnX+1m9GIj0Jlc/HKMXryDfSZS07Zy4FbGEwSiI5SfKECub4mDhuE=
 =C/uD
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "ARM:

   - icache invalidation optimizations, improving VM startup time

   - support for forwarded level-triggered interrupts, improving
     performance for timers and passthrough platform devices

   - a small fix for power-management notifiers, and some cosmetic
     changes

  PPC:

   - add MMIO emulation for vector loads and stores

   - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
     requiring the complex thread synchronization of older CPU versions

   - improve the handling of escalation interrupts with the XIVE
     interrupt controller

   - support decrement register migration

   - various cleanups and bugfixes.

  s390:

   - Cornelia Huck passed maintainership to Janosch Frank

   - exitless interrupts for emulated devices

   - cleanup of cpuflag handling

   - kvm_stat counter improvements

   - VSIE improvements

   - mm cleanup

  x86:

   - hypervisor part of SEV

   - UMIP, RDPID, and MSR_SMI_COUNT emulation

   - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit

   - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
     AVX512 features

   - show vcpu id in its anonymous inode name

   - many fixes and cleanups

   - per-VCPU MSR bitmaps (already merged through x86/pti branch)

   - stable KVM clock when nesting on Hyper-V (merged through
     x86/hyperv)"

* tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
  KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
  KVM: PPC: Book3S HV: Branch inside feature section
  KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
  KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
  KVM: PPC: Book3S PR: Fix broken select due to misspelling
  KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
  KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
  KVM: PPC: Book3S HV: Drop locks before reading guest memory
  kvm: x86: remove efer_reload entry in kvm_vcpu_stat
  KVM: x86: AMD Processor Topology Information
  x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
  kvm: embed vcpu id to dentry of vcpu anon inode
  kvm: Map PFN-type memory regions as writable (if possible)
  x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
  KVM: arm/arm64: Fixup userspace irqchip static key optimization
  KVM: arm/arm64: Fix userspace_irqchip_in_use counting
  KVM: arm/arm64: Fix incorrect timer_is_pending logic
  MAINTAINERS: update KVM/s390 maintainers
  MAINTAINERS: add Halil as additional vfio-ccw maintainer
  MAINTAINERS: add David as a reviewer for KVM/s390
  ...
2018-02-10 13:16:35 -08:00
Linus Torvalds
c013632192 2nd set of arm64 updates for 4.16:
Spectre v1 mitigation:
 - back-end version of array_index_mask_nospec()
 - masking of the syscall number to restrict speculation through the
   syscall table
 - masking of __user pointers prior to deference in uaccess routines
 
 Spectre v2 mitigation update:
 - using the new firmware SMC calling convention specification update
 - removing the current PSCI GET_VERSION firmware call mitigation as
   vendors are deploying new SMCCC-capable firmware
 - additional branch predictor hardening for synchronous exceptions and
   interrupts while in user mode
 
 Meltdown v3 mitigation update for Cavium Thunder X: unaffected but
 hardware erratum gets in the way. The kernel now starts with the page
 tables mapped as global and switches to non-global if kpti needs to be
 enabled.
 
 Other:
 - Theoretical trylock bug fixed
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlp8lqcACgkQa9axLQDI
 XvH2lxAAnsYqthpGQ11MtDJB+/UiBAFkg9QWPDkwrBDvNhgpll+J0VQuCN1QJ2GX
 qQ8rkv8uV+y4Fqr8hORGJy5At+0aI63ZCJ72RGkZTzJAtbFbFGIDHP7RhAEIGJBS
 Lk9kDZ7k39wLEx30UXIFYTTVzyHar397TdI7vkTcngiTzZ8MdFATfN/hiKO906q3
 14pYnU9Um4aHUdcJ+FocL3dxvdgniuuMBWoNiYXyOCZXjmbQOnDNU2UrICroV8lS
 mB+IHNEhX1Gl35QzNBtC0ET+aySfHBMJmM5oln+uVUljIGx6En1WLj6mrHYcx8U2
 rIBm5qO/X/4iuzYPGkxwQtpjq3wPYxsSUnMdKJrsUZqAfy2QeIhFx6XUtJsZPB2J
 /lgls5xSXMOS7oiOQtmVjcDLBURDmYXGwljXR4n4jLm4CT1V9qSLcKHu1gdFU9Mq
 VuMUdPOnQub1vqKndi154IoYDTo21jAib2ktbcxpJfSJnDYoit4Gtnv7eWY+M3Pd
 Toaxi8htM2HSRwbvslHYGW8ZcVpI79Jit+ti7CsFg7m9Lvgs0zxcnNui4uPYDymT
 jh2JYxuirIJbX9aGGhnmkNhq9REaeZJg9LA2JM8S77FCHN3bnlSdaG6wy899J6EI
 lK4anCuPQKKKhUia/dc1MeKwrmmC18EfPyGUkOzywg/jGwGCmZM=
 =Y0TT
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull more arm64 updates from Catalin Marinas:
 "As I mentioned in the last pull request, there's a second batch of
  security updates for arm64 with mitigations for Spectre/v1 and an
  improved one for Spectre/v2 (via a newly defined firmware interface
  API).

  Spectre v1 mitigation:

   - back-end version of array_index_mask_nospec()

   - masking of the syscall number to restrict speculation through the
     syscall table

   - masking of __user pointers prior to deference in uaccess routines

  Spectre v2 mitigation update:

   - using the new firmware SMC calling convention specification update

   - removing the current PSCI GET_VERSION firmware call mitigation as
     vendors are deploying new SMCCC-capable firmware

   - additional branch predictor hardening for synchronous exceptions
     and interrupts while in user mode

  Meltdown v3 mitigation update:

    - Cavium Thunder X is unaffected but a hardware erratum gets in the
      way. The kernel now starts with the page tables mapped as global
      and switches to non-global if kpti needs to be enabled.

  Other:

   - Theoretical trylock bug fixed"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits)
  arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
  arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
  arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
  arm/arm64: smccc: Make function identifiers an unsigned quantity
  firmware/psci: Expose SMCCC version through psci_ops
  firmware/psci: Expose PSCI conduit
  arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling
  arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
  arm/arm64: KVM: Turn kvm_psci_version into a static inline
  arm/arm64: KVM: Advertise SMCCC v1.1
  arm/arm64: KVM: Implement PSCI 1.0 support
  arm/arm64: KVM: Add smccc accessors to PSCI code
  arm/arm64: KVM: Add PSCI_VERSION helper
  arm/arm64: KVM: Consolidate the PSCI include files
  arm64: KVM: Increment PC after handling an SMC trap
  arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
  arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
  arm64: entry: Apply BP hardening for suspicious interrupts from EL0
  arm64: entry: Apply BP hardening for high-priority synchronous exceptions
  arm64: futex: Mask __user pointers prior to dereference
  ...
2018-02-08 10:44:25 -08:00
Andrey Konovalov
917538e212 kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage
Right now the fact that KASAN uses a single shadow byte for 8 bytes of
memory is scattered all over the code.

This change defines KASAN_SHADOW_SCALE_SHIFT early in asm include files
and makes use of this constant where necessary.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/34937ca3b90736eaad91b568edf5684091f662e3.1515775666.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:43 -08:00
Marc Zyngier
6167ec5c91 arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
A new feature of SMCCC 1.1 is that it offers firmware-based CPU
workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides
BP hardening for CVE-2017-5715.

If the host has some mitigation for this issue, report that
we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the
host workaround on every guest exit.

Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:54:05 +00:00
Marc Zyngier
1a2fb94e6a arm/arm64: KVM: Consolidate the PSCI include files
As we're about to update the PSCI support, and because I'm lazy,
let's move the PSCI include file to include/kvm so that both
ARM architectures can find it.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:54 +00:00
Will Deacon
91b2d3442f arm64: futex: Mask __user pointers prior to dereference
The arm64 futex code has some explicit dereferencing of user pointers
where performing atomic operations in response to a futex command. This
patch uses masking to limit any speculative futex operations to within
the user address space.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:42 +00:00
Will Deacon
f71c2ffcb2 arm64: uaccess: Mask __user pointers for __arch_{clear, copy_*}_user
Like we've done for get_user and put_user, ensure that user pointers
are masked before invoking the underlying __arch_{clear,copy_*}_user
operations.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:40 +00:00
Will Deacon
84624087dd arm64: uaccess: Don't bother eliding access_ok checks in __{get, put}_user
access_ok isn't an expensive operation once the addr_limit for the current
thread has been loaded into the cache. Given that the initial access_ok
check preceding a sequence of __{get,put}_user operations will take
the brunt of the miss, we can make the __* variants identical to the
full-fat versions, which brings with it the benefits of address masking.

The likely cost in these sequences will be from toggling PAN/UAO, which
we can address later by implementing the *_unsafe versions.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:39 +00:00
Will Deacon
c2f0ad4fc0 arm64: uaccess: Prevent speculative use of the current addr_limit
A mispredicted conditional call to set_fs could result in the wrong
addr_limit being forwarded under speculation to a subsequent access_ok
check, potentially forming part of a spectre-v1 attack using uaccess
routines.

This patch prevents this forwarding from taking place, but putting heavy
barriers in set_fs after writing the addr_limit.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:37 +00:00
Will Deacon
6314d90e64 arm64: entry: Ensure branch through syscall table is bounded under speculation
In a similar manner to array_index_mask_nospec, this patch introduces an
assembly macro (mask_nospec64) which can be used to bound a value under
speculation. This macro is then used to ensure that the indirect branch
through the syscall table is bounded under speculation, with out-of-range
addresses speculating as calls to sys_io_setup (0).

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:35 +00:00
Robin Murphy
4d8efc2d5e arm64: Use pointer masking to limit uaccess speculation
Similarly to x86, mitigate speculation past an access_ok() check by
masking the pointer against the address limit before use.

Even if we don't expect speculative writes per se, it is plausible that
a CPU may still speculate at least as far as fetching a cache line for
writing, hence we also harden put_user() and clear_user() for peace of
mind.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:34 +00:00
Robin Murphy
51369e398d arm64: Make USER_DS an inclusive limit
Currently, USER_DS represents an exclusive limit while KERNEL_DS is
inclusive. In order to do some clever trickery for speculation-safe
masking, we need them both to behave equivalently - there aren't enough
bits to make KERNEL_DS exclusive, so we have precisely one option. This
also happens to correct a longstanding false negative for a range
ending on the very top byte of kernel memory.

Mark Rutland points out that we've actually got the semantics of
addresses vs. segments muddled up in most of the places we need to
amend, so shuffle the {USER,KERNEL}_DS definitions around such that we
can correct those properly instead of just pasting "-1"s everywhere.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:32 +00:00
Robin Murphy
022620eed3 arm64: Implement array_index_mask_nospec()
Provide an optimised, assembly implementation of array_index_mask_nospec()
for arm64 so that the compiler is not in a position to transform the code
in ways which affect its ability to inhibit speculation (e.g. by introducing
conditional branches).

This is similar to the sequence used by x86, modulo architectural differences
in the carry/borrow flags.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:30 +00:00
Will Deacon
669474e772 arm64: barrier: Add CSDB macros to control data-value prediction
For CPUs capable of data value prediction, CSDB waits for any outstanding
predictions to architecturally resolve before allowing speculative execution
to continue. Provide macros to expose it to the arch code.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:28 +00:00
Will Deacon
79ddab3b05 arm64: assembler: Align phys_to_pte with pte_to_phys
pte_to_phys lives in assembler.h and takes its destination register as
the first argument. Move phys_to_pte out of head.S to sit with its
counterpart and rejig it to follow the same calling convention.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:25 +00:00
Will Deacon
fa0465fc07 arm64: assembler: Change order of macro arguments in phys_to_ttbr
Since AArch64 assembly instructions take the destination register as
their first operand, do the same thing for the phys_to_ttbr macro.

Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:21 +00:00
Will Deacon
f992b4dfd5 arm64: kpti: Add ->enable callback to remap swapper using nG mappings
Defaulting to global mappings for kernel space is generally good for
performance and appears to be necessary for Cavium ThunderX. If we
subsequently decide that we need to enable kpti, then we need to rewrite
our existing page table entries to be non-global. This is fiddly, and
made worse by the possible use of contiguous mappings, which require
a strict break-before-make sequence.

Since the enable callback runs on each online CPU from stop_machine
context, we can have all CPUs enter the idmap, where secondaries can
wait for the primary CPU to rewrite swapper with its MMU off. It's all
fairly horrible, but at least it only runs once.

Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:18 +00:00
Will Deacon
41acec6240 arm64: kpti: Make use of nG dependent on arm64_kernel_unmapped_at_el0()
To allow systems which do not require kpti to continue running with
global kernel mappings (which appears to be a requirement for Cavium
ThunderX due to a CPU erratum), make the use of nG in the kernel page
tables dependent on arm64_kernel_unmapped_at_el0(), which is resolved
at runtime.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:14 +00:00
Shanker Donthineni
3060e9f0d1 arm64: Add software workaround for Falkor erratum 1041
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter 4K and next 4K.

When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.

1) A System Error Interrupt (SEI) being raised by the Falkor core due
   to the errant memory access attempting to access a region of memory
   that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
   memory. This behavior may only occur if the instruction cache is
   disabled prior to or coincident with translation being changed from
   enabled to disabled.

The conditions leading to this erratum will not occur when either of the
following occur:
 1) A higher exception level disables translation of a lower exception level
   (e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
 2) An exception level disabling its stage-1 translation if its stage-2
    translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
    to 0 when HCR_EL2[VM] has a value of 1).

To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:13 +00:00
Will Deacon
202fb4ef81 arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
If the spinlock "next" ticket wraps around between the initial LDR
and the cmpxchg in the LSE version of spin_trylock, then we can erroneously
think that we have successfuly acquired the lock because we only check
whether the next ticket return by the cmpxchg is equal to the owner ticket
in our updated lock word.

This patch fixes the issue by performing a full 32-bit check of the lock
word when trying to determine whether or not the CASA instruction updated
memory.

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-02-06 22:53:11 +00:00
Linus Torvalds
617aebe6a9 Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
 available to be copied to/from userspace in the face of bugs. To further
 restrict what memory is available for copying, this creates a way to
 whitelist specific areas of a given slab cache object for copying to/from
 userspace, allowing much finer granularity of access control. Slab caches
 that are never exposed to userspace can declare no whitelist for their
 objects, thereby keeping them unavailable to userspace via dynamic copy
 operations. (Note, an implicit form of whitelisting is the use of constant
 sizes in usercopy operations and get_user()/put_user(); these bypass all
 hardened usercopy checks since these sizes cannot change at runtime.)
 
 This new check is WARN-by-default, so any mistakes can be found over the
 next several releases without breaking anyone's system.
 
 The series has roughly the following sections:
 - remove %p and improve reporting with offset
 - prepare infrastructure and whitelist kmalloc
 - update VFS subsystem with whitelists
 - update SCSI subsystem with whitelists
 - update network subsystem with whitelists
 - update process memory with whitelists
 - update per-architecture thread_struct with whitelists
 - update KVM with whitelists and fix ioctl bug
 - mark all other allocations as not whitelisted
 - update lkdtm for more sensible test overage
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJabvleAAoJEIly9N/cbcAmO1kQAJnjVPutnLSbnUteZxtsv7W4
 43Cggvokfxr6l08Yh3hUowNxZVKjhF9uwMVgRRg9Nl5WdYCN+vCQbHz+ZdzGJXKq
 cGqdKWgexMKX+aBdNDrK7BphUeD46sH7JWR+a/lDV/BgPxBCm9i5ZZCgXbPP89AZ
 NpLBji7gz49wMsnm/x135xtNlZ3dG0oKETzi7MiR+NtKtUGvoIszSKy5JdPZ4m8q
 9fnXmHqmwM6uQFuzDJPt1o+D1fusTuYnjI7EgyrJRRhQ+BB3qEFZApXnKNDRS9Dm
 uB7jtcwefJCjlZVCf2+PWTOEifH2WFZXLPFlC8f44jK6iRW2Nc+wVRisJ3vSNBG1
 gaRUe/FSge68eyfQj5OFiwM/2099MNkKdZ0fSOjEBeubQpiFChjgWgcOXa5Bhlrr
 C4CIhFV2qg/tOuHDAF+Q5S96oZkaTy5qcEEwhBSW15ySDUaRWFSrtboNt6ZVOhug
 d8JJvDCQWoNu1IQozcbv6xW/Rk7miy8c0INZ4q33YUvIZpH862+vgDWfTJ73Zy9H
 jR/8eG6t3kFHKS1vWdKZzOX1bEcnd02CGElFnFYUEewKoV7ZeeLsYX7zodyUAKyi
 Yp5CImsDbWWTsptBg6h9nt2TseXTxYCt2bbmpJcqzsqSCUwOQNQ4/YpuzLeG0ihc
 JgOmUnQNJWCTwUUw5AS1
 =tzmJ
 -----END PGP SIGNATURE-----

Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardened usercopy whitelisting from Kees Cook:
 "Currently, hardened usercopy performs dynamic bounds checking on slab
  cache objects. This is good, but still leaves a lot of kernel memory
  available to be copied to/from userspace in the face of bugs.

  To further restrict what memory is available for copying, this creates
  a way to whitelist specific areas of a given slab cache object for
  copying to/from userspace, allowing much finer granularity of access
  control.

  Slab caches that are never exposed to userspace can declare no
  whitelist for their objects, thereby keeping them unavailable to
  userspace via dynamic copy operations. (Note, an implicit form of
  whitelisting is the use of constant sizes in usercopy operations and
  get_user()/put_user(); these bypass all hardened usercopy checks since
  these sizes cannot change at runtime.)

  This new check is WARN-by-default, so any mistakes can be found over
  the next several releases without breaking anyone's system.

  The series has roughly the following sections:
   - remove %p and improve reporting with offset
   - prepare infrastructure and whitelist kmalloc
   - update VFS subsystem with whitelists
   - update SCSI subsystem with whitelists
   - update network subsystem with whitelists
   - update process memory with whitelists
   - update per-architecture thread_struct with whitelists
   - update KVM with whitelists and fix ioctl bug
   - mark all other allocations as not whitelisted
   - update lkdtm for more sensible test overage"

* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
  lkdtm: Update usercopy tests for whitelisting
  usercopy: Restrict non-usercopy caches to size 0
  kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
  kvm: whitelist struct kvm_vcpu_arch
  arm: Implement thread_struct whitelist for hardened usercopy
  arm64: Implement thread_struct whitelist for hardened usercopy
  x86: Implement thread_struct whitelist for hardened usercopy
  fork: Provide usercopy whitelisting for task_struct
  fork: Define usercopy region in thread_stack slab caches
  fork: Define usercopy region in mm_struct slab caches
  net: Restrict unwhitelisted proto caches to size 0
  sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
  sctp: Define usercopy region in SCTP proto slab cache
  caif: Define usercopy region in caif proto slab cache
  ip: Define usercopy region in IP proto slab cache
  net: Define usercopy region in struct proto slab cache
  scsi: Define usercopy region in scsi_sense_cache slab cache
  cifs: Define usercopy region in cifs_request slab cache
  vxfs: Define usercopy region in vxfs_inode slab cache
  ufs: Define usercopy region in ufs_inode_cache slab cache
  ...
2018-02-03 16:25:42 -08:00
Linus Torvalds
3879ae653a The core framework has a handful of patches this time around, mostly due
to the clk rate protection support added by Jerome Brunet. This feature
 will allow consumers to lock in a certain rate on the output of a clk so
 that things like audio playback don't hear pops when the clk frequency
 changes due to shared parent clks changing rates. Currently the clk
 API doesn't guarantee the rate of a clk stays at the rate you request
 after clk_set_rate() is called, so this new API will allow drivers
 to express that requirement. Beyond this, the core got some debugfs
 pretty printing patches and a couple minor non-critical fixes.
 
 Looking outside of the core framework diff we have some new driver
 additions and the removal of a legacy TI clk driver. Both of these hit
 high in the dirstat. Also, the removal of the asm-generic/clkdev.h file
 causes small one-liners in all the architecture Kbuild files. Overall, the
 driver diff seems to be the normal stuff that comes all the time to
 fix little problems here and there and to support new hardware.
 
 Core:
  - Clk rate protection
  - Symbolic clk flags in debugfs output
  - Clk registration enabled clks while doing bookkeeping updates
 
 New Drivers:
  - Spreadtrum SC9860
  - HiSilicon hi3660 stub
  - Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
  - Amlogic Meson-AXG
  - ASPEED BMC
 
 Removed Drivers:
  - TI OMAP 3xxx legacy clk (non-DT) support
  - asm*/clkdev.h got removed (not really a driver)
 
 Updates:
  - Renesas FDP1-0 module clock on R-Car M3-W
  - Renesas LVDS module clock on R-Car V3M
  - Misc fixes to pr_err() prints
  - Qualcomm MSM8916 audio fixes
  - Qualcomm IPQ8074 rounded out support for more peripherals
  - Qualcomm Alpha PLL variants
  - Divider code was using container_of() on bad pointers
  - Allwinner DE2 clks on H3
  - Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
  - Mediatek clk driver compile test support
  - AT91 PMC clk suspend/resume restoration support
  - PLL issues fixed on si5351
  - Broadcom IProc PLL calculation updates
  - DVFS support for Armada mvebu CPU clks
  - Allwinner fixed post-divider support
  - TI clkctrl fixes and support for newer SoCs
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJac5vRAAoJEK0CiJfG5JUlUaIP/Riq0tbApfc4k4GMvSvaieR/
 AwZFIMCxOxO+KGdUsBWj7UUoDfBYmxyknHZkVUA/m+Lm7cRH/YHHMghEceZLaBYW
 zPQmDfkTl/QkwysXZMCw9vg4vO0tt5gWbHljQnvVhxVVTCkIRpaE8Vkktj1RZzpY
 WU/TkvPbVGY3SNm504TRXKWC9KpMTEXVvzqlg6zLDJ/jE7PGzBKtewqMoLDCBH2L
 q6b50BSXDo2Hep0vm6e5xneXKjLNR4kgN4PkbM4Yoi4iWLLbgAu79NfyOvvr/imS
 HxOHRms9tejtyaiR6bQSF0pbLOERZ3QSbMFEbxdxnCTuPEfy3Nw/2W7mNJlhJa8g
 EGLMnLL4WdloL4Z83dAcMrj9OmxYf7Yobf5dMidLrQT5EYuafdj0ParbI8TQpWSB
 eTqaffSUGPE/7xuKouYBcbvocpXXWCcokrP/mEn3OEHXkIeeut1Jd3RmEvsi3gtJ
 pNraJTIpvt4c05rj6yLUOhWfyqlA+fH3p4Fx3rrH1tmKEiG+lrhKoxF26uALZe0V
 OvarhG+LPIE10pCIYlQjZjQVnYLGCxsGAIoK1uz7VYvFPh2T0cxQlzzeqFgrlTyN
 32hMj3LhkQw82FG9xZqjTX1935R35mySRlx63x7HStI1YFief2X9+RHjJR/lofG0
 nC0JWTp5sC/pKf54QBXj
 =bGPp
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
 "The core framework has a handful of patches this time around, mostly
  due to the clk rate protection support added by Jerome Brunet.

  This feature will allow consumers to lock in a certain rate on the
  output of a clk so that things like audio playback don't hear pops
  when the clk frequency changes due to shared parent clks changing
  rates. Currently the clk API doesn't guarantee the rate of a clk stays
  at the rate you request after clk_set_rate() is called, so this new
  API will allow drivers to express that requirement.

  Beyond this, the core got some debugfs pretty printing patches and a
  couple minor non-critical fixes.

  Looking outside of the core framework diff we have some new driver
  additions and the removal of a legacy TI clk driver. Both of these hit
  high in the dirstat. Also, the removal of the asm-generic/clkdev.h
  file causes small one-liners in all the architecture Kbuild files.

  Overall, the driver diff seems to be the normal stuff that comes all
  the time to fix little problems here and there and to support new
  hardware.

  Summary:

  Core:
   - Clk rate protection
   - Symbolic clk flags in debugfs output
   - Clk registration enabled clks while doing bookkeeping updates

  New Drivers:
   - Spreadtrum SC9860
   - HiSilicon hi3660 stub
   - Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
   - Amlogic Meson-AXG
   - ASPEED BMC

  Removed Drivers:
   - TI OMAP 3xxx legacy clk (non-DT) support
   - asm*/clkdev.h got removed (not really a driver)

  Updates:
   - Renesas FDP1-0 module clock on R-Car M3-W
   - Renesas LVDS module clock on R-Car V3M
   - Misc fixes to pr_err() prints
   - Qualcomm MSM8916 audio fixes
   - Qualcomm IPQ8074 rounded out support for more peripherals
   - Qualcomm Alpha PLL variants
   - Divider code was using container_of() on bad pointers
   - Allwinner DE2 clks on H3
   - Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
   - Mediatek clk driver compile test support
   - AT91 PMC clk suspend/resume restoration support
   - PLL issues fixed on si5351
   - Broadcom IProc PLL calculation updates
   - DVFS support for Armada mvebu CPU clks
   - Allwinner fixed post-divider support
   - TI clkctrl fixes and support for newer SoCs"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
  clk: aspeed: Handle inverse polarity of USB port 1 clock gate
  clk: aspeed: Fix return value check in aspeed_cc_init()
  clk: aspeed: Add reset controller
  clk: aspeed: Register gated clocks
  clk: aspeed: Add platform driver and register PLLs
  clk: aspeed: Register core clocks
  clk: Add clock driver for ASPEED BMC SoCs
  clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built
  clk: fix reentrancy of clk_enable() on UP systems
  clk: meson-axg: fix potential NULL dereference in axg_clkc_probe()
  clk: Simplify debugfs registration
  clk: Fix debugfs_create_*() usage
  clk: Show symbolic clock flags in debugfs
  clk: renesas: r8a7796: Add FDP clock
  clk: Move __clk_{get,put}() into private clk.h API
  clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks
  clk: Improve flags doc for of_clk_detect_critical()
  arch: Remove clkdev.h asm-generic from Kbuild
  clk: sunxi-ng: a83t: Add M divider to TCON1 clock
  clk: Prepare to remove asm-generic/clkdev.h
  ...
2018-02-01 16:56:07 -08:00
Radim Krčmář
7bf14c28ee Merge branch 'x86/hyperv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Topic branch for stable KVM clockource under Hyper-V.

Thanks to Christoffer Dall for resolving the ARM conflict.
2018-02-01 15:04:17 +01:00
Catalin Marinas
1d78a62cb3 arm64: provide pmdp_establish() helper
We need an atomic way to setup pmd page table entry, avoiding races with
CPU setting dirty/accessed bits.  This is required to implement
pmdp_invalidate() that doesn't lose these bits.

Link: http://lkml.kernel.org/r/20171213105756.69879-5-kirill.shutemov@linux.intel.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:37 -08:00