* for-next/kvm-build-fix:
: Fix KVM build issues with 64K pages
KVM: arm64: Fix build error in user_mem_abort()
* for-next/va-refactor:
: VA layout changes
arm64: mm: don't assume struct page is always 64 bytes
Documentation/arm64: fix RST layout of memory.rst
arm64: mm: tidy up top of kernel VA space
arm64: mm: make vmemmap region a projection of the linear region
arm64: mm: extend linear region for 52-bit VA configurations
* for-next/lto:
: Upgrade READ_ONCE() to RCpc acquire on arm64 with LTO
arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y
arm64: alternatives: Remove READ_ONCE() usage during patch operation
arm64: cpufeatures: Add capability for LDAPR instruction
arm64: alternatives: Split up alternative.h
arm64: uaccess: move uao_* alternatives to asm-uaccess.h
* for-next/mem-hotplug:
: Memory hotplug improvements
arm64/mm/hotplug: Ensure early memory sections are all online
arm64/mm/hotplug: Enable MEM_OFFLINE event handling
arm64/mm/hotplug: Register boot memory hot remove notifier earlier
arm64: mm: account for hotplug memory when randomizing the linear region
* for-next/cppc-ffh:
: Add CPPC FFH support using arm64 AMU counters
arm64: abort counter_read_on_cpu() when irqs_disabled()
arm64: implement CPPC FFH support using AMUs
arm64: split counter validation function
arm64: wrap and generalise counter read functions
* for-next/pad-image-header:
: Pad Image header to 64KB and unmap it
arm64: head: tidy up the Image header definition
arm64/head: avoid symbol names pointing into first 64 KB of kernel image
arm64: omit [_text, _stext) from permanent kernel mapping
* for-next/zone-dma-default-32-bit:
: Default to 32-bit wide ZONE_DMA (previously reduced to 1GB for RPi4)
of: unittest: Fix build on architectures without CONFIG_OF_ADDRESS
mm: Remove examples from enum zone_type comment
arm64: mm: Set ZONE_DMA size based on early IORT scan
arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
of: unittest: Add test for of_dma_get_max_cpu_address()
of/address: Introduce of_dma_get_max_cpu_address()
arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
arm64: mm: Move reserve_crashkernel() into mem_init()
arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required
arm64: Ignore any DMA offsets in the max_zone_phys() calculation
* for-next/signal-tag-bits:
: Expose the FAR_EL1 tag bits in siginfo
arm64: expose FAR_EL1 tag bits in siginfo
signal: define the SA_EXPOSE_TAGBITS bit in sa_flags
signal: define the SA_UNSUPPORTED bit in sa_flags
arch: provide better documentation for the arch-specific SA_* flags
signal: clear non-uapi flag bits when passing/returning sa_flags
arch: move SA_* definitions to generic headers
parisc: start using signal-defs.h
parisc: Drop parisc special case for __sighandler_t
* for-next/cmdline-extended:
: Add support for CONFIG_CMDLINE_EXTENDED
arm64: Extend the kernel command line from the bootloader
arm64: kaslr: Refactor early init command line parsing
Now that the PAN toggling has been removed, the only user of
__system_matches_cap() is has_generic_auth(), which is only built when
CONFIG_ARM64_PTR_AUTH is selected, and Qian reports that this results in
a build-time warning when CONFIG_ARM64_PTR_AUTH is not selected:
| arch/arm64/kernel/cpufeature.c:2649:13: warning: '__system_matches_cap' defined but not used [-Wunused-function]
| static bool __system_matches_cap(unsigned int n)
| ^~~~~~~~~~~~~~~~~~~~
It's tricky to restructure things to prevent this, so let's mark
__system_matches_cap() as __maybe_unused, as we used to do for the other
user of __system_matches_cap() which we just removed.
Reported-by: Qian Cai <qcai@redhat.com>
Suggested-by: Qian Cai <qcai@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20201203152403.26100-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that arm64 no longer uses UAO, remove the vestigal feature detection
code and Kconfig text.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-13-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Some code (e.g. futex) needs to make privileged accesses to userspace
memory, and uses uaccess_{enable,disable}_privileged() in order to
permit this. All other uaccess primitives use LDTR/STTR, and never need
to toggle PAN.
Remove the redundant PAN toggling.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-12-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that set_fs() is gone, addr_limit_user_check() is redundant. Remove
the checks and associated thread flag.
To ensure that _TIF_WORK_MASK can be used as an immediate value in an
AND instruction (as it is in `ret_to_user`), TIF_MTE_ASYNC_FAULT is
renumbered to keep the constituent bits of _TIF_WORK_MASK contiguous.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-11-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the uaccess primitives dont take addr_limit into account, we
have no need to manipulate this via set_fs() and get_fs(). Remove
support for these, along with some infrastructure this renders
redundant.
We no longer need to flip UAO to access kernel memory under KERNEL_DS,
and head.S unconditionally clears UAO for all kernel configurations via
an ERET in init_kernel_el. Thus, we don't need to dynamically flip UAO,
nor do we need to context-switch it. However, we still need to adjust
PAN during SDEI entry.
Masking of __user pointers no longer needs to use the dynamic value of
addr_limit, and can use a constant derived from the maximum possible
userspace task size. A new TASK_SIZE_MAX constant is introduced for
this, which is also used by core code. In configurations supporting
52-bit VAs, this may include a region of unusable VA space above a
48-bit TTBR0 limit, but never includes any portion of TTBR1.
Note that TASK_SIZE_MAX is an exclusive limit, while USER_DS and
KERNEL_DS were inclusive limits, and is converted to a mask by
subtracting one.
As the SDEI entry code repurposes the otherwise unnecessary
pt_regs::orig_addr_limit field to store the TTBR1 of the interrupted
context, for now we rename that to pt_regs::sdei_ttbr1. In future we can
consider factoring that out.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now the uaccess primitives use LDTR/STTR unconditionally, the
uao_{ldp,stp,user_alternative} asm macros are misnamed, and have a
redundant argument. Let's remove the redundant argument and rename these
to user_{ldp,stp,ldst} respectively to clean this up.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Robin Murohy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch separates arm64's user and kernel memory access primitives
into distinct routines, adding new __{get,put}_kernel_nofault() helpers
to access kernel memory, upon which core code builds larger copy
routines.
The kernel access routines (using LDR/STR) are not affected by PAN (when
legitimately accessing kernel memory), nor are they affected by UAO.
Switching to KERNEL_DS may set UAO, but this does not adversely affect
the kernel access routines.
The user access routines (using LDTR/STTR) are not affected by PAN (when
legitimately accessing user memory), but are affected by UAO. As these
are only legitimate to use under USER_DS with UAO clear, this should not
be problematic.
Routines performing atomics to user memory (futex and deprecated
instruction emulation) still need to transiently clear PAN, and these
are left as-is. These are never used on kernel memory.
Subsequent patches will refactor the uaccess helpers to remove redundant
code, and will also remove the redundant PAN/UAO manipulation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As a step towards implementing __{get,put}_kernel_nofault(), this patch
splits most user-memory specific logic out of __{get,put}_user(), with
the memory access and fault handling in new __{raw_get,put}_mem()
helpers.
For now the LDR/LDTR patching is left within the *get_mem() helpers, and
will be removed in a subsequent patch.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently __copy_user_flushcache() open-codes raw_copy_from_user(), and
doesn't use uaccess_mask_ptr() on the user address. Let's have it call
raw_copy_from_user(), which is both a simplification and ensures that
user pointers are masked under speculation.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We currently have many uaccess_*{enable,disable}*() variants, which
subsequent patches will cut down as part of removing set_fs() and
friends. Once this simplification is made, most uaccess routines will
only need to ensure that the user page tables are mapped in TTBR0, as is
currently dealt with by uaccess_ttbr0_{enable,disable}().
The existing uaccess_{enable,disable}() routines ensure that user page
tables are mapped in TTBR0, and also disable PAN protections, which is
necessary to be able to use atomics on user memory, but also permit
unrelated privileged accesses to access user memory.
As preparatory step, let's rename uaccess_{enable,disable}() to
uaccess_{enable,disable}_privileged(), highlighting this caveat and
discouraging wider misuse. Subsequent patches can reuse the
uaccess_{enable,disable}() naming for the common case of ensuring the
user page tables are mapped in TTBR0.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for removing addr_limit and set_fs() we must decouple the
SDEI PAN/UAO manipulation from the uaccess code, and explicitly
reinitialize these as required.
SDEI enters the kernel with a non-architectural exception, and prior to
the most recent revision of the specification (ARM DEN 0054B), PSTATE
bits (e.g. PAN, UAO) are not manipulated in the same way as for
architectural exceptions. Notably, older versions of the spec can be
read ambiguously as to whether PSTATE bits are inherited unchanged from
the interrupted context or whether they are generated from scratch, with
TF-A doing the latter.
We have three cases to consider:
1) The existing TF-A implementation of SDEI will clear PAN and clear UAO
(along with other bits in PSTATE) when delivering an SDEI exception.
2) In theory, implementations of SDEI prior to revision B could inherit
PAN and UAO (along with other bits in PSTATE) unchanged from the
interrupted context. However, in practice such implementations do not
exist.
3) Going forward, new implementations of SDEI must clear UAO, and
depending on SCTLR_ELx.SPAN must either inherit or set PAN.
As we can ignore (2) we can assume that upon SDEI entry, UAO is always
clear, though PAN may be clear, inherited, or set per SCTLR_ELx.SPAN.
Therefore, we must explicitly initialize PAN, but do not need to do
anything for UAO.
Considering what we need to do:
* When set_fs() is removed, force_uaccess_begin() will have no HW
side-effects. As this only clears UAO, which we can assume has already
been cleared upon entry, this is not a problem. We do not need to add
code to manipulate UAO explicitly.
* PAN may be cleared upon entry (in case 1 above), so where a kernel is
built to use PAN and this is supported by all CPUs, the kernel must
set PAN upon entry to ensure expected behaviour.
* PAN may be inherited from the interrupted context (in case 3 above),
and so where a kernel is not built to use PAN or where PAN support is
not uniform across CPUs, the kernel must clear PAN to ensure expected
behaviour.
This patch reworks the SDEI code accordingly, explicitly setting PAN to
the expected state in all cases. To cater for the cases where the kernel
does not use PAN or this is not uniformly supported by hardware we add a
new cpu_has_pan() helper which can be used regardless of whether the
kernel is built to use PAN.
The existing system_uses_ttbr0_pan() is redefined in terms of
system_uses_hw_pan() both for clarity and as a minor optimization when
HW PAN is not selected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The SDEI support code is split across arch/arm64/ and drivers/firmware/,
largley this is split so that the arch-specific portions are under
arch/arm64, and the management logic is under drivers/firmware/.
However, exception entry fixups are currently under drivers/firmware.
Let's move the exception entry fixups under arch/arm64/. This
de-clutters the management logic, and puts all the arch-specific
portions in one place. Doing this also allows the fixups to be applied
earlier, so things like PAN and UAO will be in a known good state before
we run other logic. This will also make subsequent refactoring easier.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As with SCTLR_ELx and other control registers, some PSTATE bits are
UNKNOWN out-of-reset, and we may not be able to rely on hardware or
firmware to initialize them to our liking prior to entry to the kernel,
e.g. in the primary/secondary boot paths and return from idle/suspend.
It would be more robust (and easier to reason about) if we consistently
initialized PSTATE to a default value, as we do with control registers.
This will ensure that the kernel is not adversely affected by bits it is
not aware of, e.g. when support for a feature such as PAN/UAO is
disabled.
This patch ensures that PSTATE is consistently initialized at boot time
via an ERET. This is not intended to relax the existing requirements
(e.g. DAIF bits must still be set prior to entering the kernel). For
features detected dynamically (which may require system-wide support),
it is still necessary to subsequently modify PSTATE.
As ERET is not always a Context Synchronization Event, an ISB is placed
before each exception return to ensure updates to control registers have
taken effect. This handles the kernel being entered with SCTLR_ELx.EOS
clear (or any future control bits being in an UNKNOWN state).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Let's make SCTLR_ELx initialization a bit clearer by using meaningful
names for the initialization values, following the same scheme for
SCTLR_EL1 and SCTLR_EL2.
These definitions will be used more widely in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For a while now el2_setup has performed some basic initialization of EL1
even when the kernel is booted at EL1, so the name is a little
misleading. Further, some comments are stale as with VHE it doesn't drop
the CPU to EL1.
To clarify things, rename el2_setup to init_kernel_el, and update
comments to be clearer as to the function's purpose.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To make callsites easier to read, add trivial C wrappers for the
SET_PSTATE_*() helpers, and convert trivial uses over to these. The new
wrappers will be used further in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For consistency, all tasks have a pt_regs reserved at the highest
portion of their task stack. Among other things, this ensures that a
task's SP is always pointing within its stack rather than pointing
immediately past the end.
While it is never legitimate to ERET from a kthread, we take pains to
initialize pt_regs for kthreads as if this were legitimate. As this is
never legitimate, the effects of an erroneous return are rarely tested.
Let's simplify things by initializing a kthread's pt_regs such that an
ERET is caught as an illegal exception return, and removing the explicit
initialization of other exception context. Note that as
spectre_v4_enable_task_mitigation() only manipulates the PSTATE within
the unused regs this is safe to remove.
As user tasks will have their exception context initialized via
start_thread() or start_compat_thread(), this should only impact cases
where something has gone very wrong and we'd like that to be clearly
indicated.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
of_dma_get_max_cpu_address() is not defined if !CONFIG_OF_ADDRESS, so
return early in of_unittest_dma_get_max_cpu_address().
Fixes: 07d13a1d61 ("of: unittest: Add test for of_dma_get_max_cpu_address()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Provide support for additional kernel command line parameters to be
concatenated onto the end of the command line provided by the
bootloader. Additional parameters are specified in the CONFIG_CMDLINE
option when CONFIG_CMDLINE_EXTEND is selected, matching other
architectures and leveraging existing support in the FDT and EFI stub
code.
Special care must be taken for the arch-specific nokaslr parsing. Search
the bootargs FDT property and the CONFIG_CMDLINE when
CONFIG_CMDLINE_EXTEND is in use.
There are a couple of known use cases for this feature:
1) Switching between stable and development kernel versions, where one
of the versions benefits from additional command line parameters,
such as debugging options.
2) Specifying additional command line parameters, for additional tuning
or debugging, when the bootloader does not offer an interactive mode.
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Link: https://lore.kernel.org/r/20200921191557.350256-3-tyhicks@linux.microsoft.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Don't ask for *the* command line string to search for "nokaslr" in
kaslr_early_init(). Instead, tell a helper function to search all the
appropriate command line strings for "nokaslr" and return the result.
This paves the way for searching multiple command line strings without
having to concatenate the strings in early init.
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Link: https://lore.kernel.org/r/20200921191557.350256-2-tyhicks@linux.microsoft.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Architectures that support address tagging, such as arm64, may want to
expose fault address tag bits to the signal handler to help diagnose
memory errors. However, these bits have not been previously set,
and their presence may confuse unaware user applications. Therefore,
introduce a SA_EXPOSE_TAGBITS flag bit in sa_flags that a signal
handler may use to explicitly request that the bits are set.
The generic signal handler APIs expect to receive tagged addresses.
Architectures may specify how to untag addresses in the case where
SA_EXPOSE_TAGBITS is clear by defining the arch_untagged_si_addr
function.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://linux-review.googlesource.com/id/I16dd0ed2081f091fce97be0190cb8caa874c26cb
Link: https://lkml.kernel.org/r/13cf24d00ebdd8e1f55caf1821c7c29d54100191.1605904350.git.pcc@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Define a sa_flags bit, SA_UNSUPPORTED, which will never be supported
in the uapi. The purpose of this flag bit is to allow userspace to
distinguish an old kernel that does not clear unknown sa_flags bits
from a kernel that supports every flag bit.
In other words, if userspace does something like:
act.sa_flags |= SA_UNSUPPORTED;
sigaction(SIGSEGV, &act, 0);
sigaction(SIGSEGV, 0, &oldact);
and finds that SA_UNSUPPORTED remains set in oldact.sa_flags, it means
that the kernel cannot be trusted to have cleared unknown flag bits
from sa_flags, so no assumptions about flag bit support can be made.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://linux-review.googlesource.com/id/Ic2501ad150a3a79c1cf27fb8c99be342e9dffbcb
Link: https://lkml.kernel.org/r/bda7ddff8895a9bc4ffc5f3cf3d4d37a32118077.1605582887.git.pcc@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Previously we were not clearing non-uapi flag bits in
sigaction.sa_flags when storing the userspace-provided sa_flags or
when returning them via oldact. Start doing so.
This allows userspace to detect missing support for flag bits and
allows the kernel to use non-uapi bits internally, as we are already
doing in arch/x86 for two flag bits. Now that this change is in
place, we no longer need the code in arch/x86 that was hiding these
bits from userspace, so remove it.
This is technically a userspace-visible behavior change for sigaction, as
the unknown bits returned via oldact.sa_flags are no longer set. However,
we are free to define the behavior for unknown bits exactly because
their behavior is currently undefined, so for now we can define the
meaning of each of them to be "clear the bit in oldact.sa_flags unless
the bit becomes known in the future". Furthermore, this behavior is
consistent with OpenBSD [1], illumos [2] and XNU [3] (FreeBSD [4] and
NetBSD [5] fail the syscall if unknown bits are set). So there is some
precedent for this behavior in other kernels, and in particular in XNU,
which is probably the most popular kernel among those that I looked at,
which means that this change is less likely to be a compatibility issue.
Link: [1] f634a6a4b5/sys/kern/kern_sig.c (L278)
Link: [2] 76f19f5fdc/usr/src/uts/common/syscall/sigaction.c (L86)
Link: [3] a449c6a3b8/bsd/kern/kern_sig.c (L480)
Link: [4] eded70c370/sys/kern/kern_sig.c (L699)
Link: [5] 3365779bec/sys/kern/sys_sig.c (L473)
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://linux-review.googlesource.com/id/I35aab6f5be932505d90f3b3450c083b4db1eca86
Link: https://lkml.kernel.org/r/878dbcb5f47bc9b11881c81f745c0bef5c23f97f.1605235762.git.pcc@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Most architectures with the exception of alpha, mips, parisc and
sparc use the same values for these flags. Move their definitions into
asm-generic/signal-defs.h and allow the architectures with non-standard
values to override them. Also, document the non-standard flag values
in order to make it easier to add new generic flags in the future.
A consequence of this change is that on powerpc and x86, the constants'
values aside from SA_RESETHAND change signedness from unsigned
to signed. This is not expected to impact realistic use of these
constants. In particular the typical use of the constants where they
are or'ed together and assigned to sa_flags (or another int variable)
would not be affected.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://linux-review.googlesource.com/id/Ia3849f18b8009bf41faca374e701cdca36974528
Link: https://lkml.kernel.org/r/b6d0d1ec34f9ee93e1105f14f288fba5f89d1f24.1605235762.git.pcc@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
I believe we can and *should* drop this parisc-specific typedef for
__sighandler_t when compiling a 64-bit kernel. The reasons:
1. We don't have a 64-bit userspace yet, so nothing (on userspace side)
can break.
2. Inside the Linux kernel, this is only used in kernel/signal.c, in
function kernel_sigaction() where the signal handler is compared against
SIG_IGN. SIG_IGN is defined as (__sighandler_t)1), so only the pointers
are compared.
3. Even when a 64-bit userspace gets added at some point, I think
__sighandler_t should be defined what it is: a function pointer struct.
I compiled kernel/signal.c with and without the patch, and the produced code
is identical in both cases.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I21c43f21b264f339e3aa395626af838646f62d97
Link: https://lkml.kernel.org/r/a75b8eb7bb9eac1cf73fb119eb53e5892d6e9656.1605235762.git.pcc@google.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
We can't really list every setup in common code. On top of that they are
unlikely to stay true for long as things change in the arch trees
independently of this comment.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20201119175400.9995-8-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)
Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.
This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.
So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.
[nsaenz: unified implementation with DT's counterpart]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20201119175400.9995-7-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)
The DMA layer also needs to be able to allocate memory that is
guaranteed to meet those DMA constraints, for bounce buffering as well
as allocating the backing for consistent mappings. This is why the 1 GB
ZONE_DMA was introduced recently. Unfortunately, it turns out the having
a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and
potentially in other places where allocations cannot cross zone
boundaries. Therefore, we should avoid having two separate DMA zones
when possible.
So, with the help of of_dma_get_max_cpu_address() get the topmost
physical address accessible to all DMA masters in system and use that
information to fine-tune ZONE_DMA's size. In the absence of addressing
limited masters ZONE_DMA will span the whole 32-bit address space,
otherwise, in the case of the Raspberry Pi 4 it'll only span the 30-bit
address space, and have ZONE_DMA32 cover the rest of the 32-bit address
space.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Link: https://lore.kernel.org/r/20201119175400.9995-6-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Introduce a test for of_dma_get_max_cup_address(), it uses the same DT
data as the rest of dma-ranges unit tests.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201119175400.9995-5-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
physical address addressable by all DMA masters in the system. It's
specially useful for setting memory zones sizes at early boot time.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201119175400.9995-4-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
zone_dma_bits's initialization happens earlier that it's actually
needed, in arm64_memblock_init(). So move it into the more suitable
zone_sizes_init().
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Link: https://lore.kernel.org/r/20201119175400.9995-3-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
crashkernel might reserve memory located in ZONE_DMA. We plan to delay
ZONE_DMA's initialization after unflattening the devicetree and ACPI's
boot table initialization, so move it later in the boot process.
Specifically into bootmem_init() since request_standard_resources()
depends on it.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Link: https://lore.kernel.org/r/20201119175400.9995-2-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
mem_init() currently relies on knowing the boundaries of the crashkernel
reservation to map such region with page granularity for later
unmapping via set_memory_valid(..., 0). If the crashkernel reservation
is deferred, such boundaries are not known when the linear mapping is
created. Simply parse the command line for "crashkernel" and, if found,
create the linear map with NO_BLOCK_MAPPINGS.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: James Morse <james.morse@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Link: https://lore.kernel.org/r/20201119175556.18681-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, the kernel assumes that if RAM starts above 32-bit (or
zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
such constrained devices have a hardwired DMA offset. In practice, we
haven't noticed any such hardware so let's assume that we can expand
ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
zone_bits.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20201118185809.1078362-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Even though support for EFI boot remains entirely optional for arm64,
it is unlikely that we will ever be able to repurpose the image header
fields that the EFI loader relies on, i.e., the magic NOP at offset
0x0 and the PE header address at offset 0x3c.
So let's factor out the differences into a 'efi_signature_nop' macro and
a local symbol representing the PE header address, and move the
conditional definitions into efi-header.S, taking into account whether
CONFIG_EFI is enabled or not. While at it, switch to a signature NOP
that behaves more like a NOP, i.e., one that only clobbers the
flags.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201117124729.12642-4-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We no longer map the first 64 KB of the kernel image, as there is nothing
there that we ever need to refer back to once the kernel has booted. Even
though facilities like kallsyms are very careful to only refer to the
region that starts at _stext when mapping virtual addresses to symbol
names, let's avoid any confusion by switching to local .L prefixed symbol
names for the EFI header, as none of them have any significance to the
rest of the kernel.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201117124729.12642-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In a previous patch, we increased the size of the EFI PE/COFF header
to 64 KB, which resulted in the _stext symbol to appear at a fixed
offset of 64 KB into the image.
Since 64 KB is also the largest page size we support, this completely
removes the need to map the first 64 KB of the kernel image, given that
it only contains the arm64 Image header and the EFI header, neither of
which we ever access again after booting the kernel. More importantly,
we should avoid an executable mapping of non-executable and not entirely
predictable data, to deal with the unlikely event that we inadvertently
emitted something that looks like an opcode that could be used as a
gadget for speculative execution.
So let's limit the kernel mapping of .text to the [_stext, _etext)
region, which matches the view of generic code (such as kallsyms) when
it reasons about the boundaries of the kernel's .text section.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201117124729.12642-2-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Given that smp_call_function_single() can deadlock when interrupts are
disabled, abort the SMP call if irqs_disabled(). This scenario is
currently not possible given the function's uses, but safeguard this for
potential future uses.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20201113155328.4194-1-ionela.voinescu@arm.com
[catalin.marinas@arm.com: modified following Mark's comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If Activity Monitors (AMUs) are present, two of the counters can be used
to implement support for CPPC's (Collaborative Processor Performance
Control) delivered and reference performance monitoring functionality
using FFH (Functional Fixed Hardware).
Given that counters for a certain CPU can only be read from that CPU,
while FFH operations can be called from any CPU for any of the CPUs, use
smp_call_function_single() to provide the requested values.
Therefore, depending on the register addresses, the following values
are returned:
- 0x0 (DeliveredPerformanceCounterRegister): AMU core counter
- 0x1 (ReferencePerformanceCounterRegister): AMU constant counter
The use of Activity Monitors is hidden behind the generic
cpu_read_{corecnt,constcnt}() functions.
Read functionality for these two registers represents the only current
FFH support for CPPC. Read operations for other register values or write
operation for all registers are unsupported. Therefore, keep CPPC's FFH
unsupported if no CPUs have valid AMU frequency counters. For this
purpose, the get_cpu_with_amu_feat() is introduced.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201106125334.21570-4-ionela.voinescu@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order for the counter validation function to be reused, split
validate_cpu_freq_invariance_counters() into:
- freq_counters_valid(cpu) - check cpu for valid cycle counters
- freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate) -
generic function that sets the normalization ratio used by
topology_scale_freq_tick()
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201106125334.21570-3-ionela.voinescu@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for other uses of Activity Monitors (AMU) cycle counters,
place counter read functionality in generic functions that can reused:
read_corecnt() and read_constcnt().
As a result, implement update_freq_counters_refs() to replace
init_cpu_freq_invariance_counters() and both initialise and update
the per-cpu reference variables.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201106125334.21570-2-ionela.voinescu@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 8c96400d6a simplified the page-to-virt and virt-to-page
conversions, based on the assumption that struct page is always 64
bytes in size, in which case we can use a single signed shift to
perform the conversion (provided that the vmemmap array is placed
appropriately in the kernel VA space)
Unfortunately, this assumption turns out not to hold, and so we need
to revert part of this commit, and go back to an affine transformation.
Given that all the quantities involved are compile time constants,
this should not make any practical difference.
Fixes: 8c96400d6a ("arm64: mm: make vmemmap region a projection of the linear region")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20201110180511.29083-1-ardb@kernel.org
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This adds a validation function that scans the entire boot memory and makes
sure that all early memory sections are online. This check is essential for
the memory notifier to work properly, as it cannot prevent any boot memory
from offlining, if all sections are not online to begin with. Although the
boot section scanning is selectively enabled with DEBUG_VM.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1604896137-16644-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This enables MEM_OFFLINE memory event handling. It will help intercept any
possible error condition such as if boot memory some how still got offlined
even after an explicit notifier failure, potentially by a future change in
generic hot plug framework. This would help detect such scenarios and help
debug further. While here, also call out the first section being attempted
for offline or got offlined.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1604896137-16644-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This moves memory notifier registration earlier in the boot process from
device_initcall() to early_initcall() which will help in guarding against
potential early boot memory offline requests. Even though there should not
be any actual offlinig requests till memory block devices are initialized
with memory_dev_init() but then generic init sequence might just change in
future. Hence an early registration for the memory event notifier would be
helpful. While here, just skip the registration if CONFIG_MEMORY_HOTREMOVE
is not enabled and also call out when memory notifier registration fails.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1604896137-16644-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>