From e3e85271330b18f487ab3032ea9ca0601efeafaf Mon Sep 17 00:00:00 2001 From: Joey Gouly Date: Tue, 1 Oct 2024 14:36:17 +0100 Subject: [PATCH 01/13] arm64: set POR_EL0 for kernel threads Restrict kernel threads to only have RWX overlays for pkey 0. This matches what arch/x86 does, by defaulting to a restrictive PKRU. Signed-off-by: Joey Gouly Cc: Will Deacon Cc: Catalin Marinas Reviewed-by: Kevin Brodsky Link: https://lore.kernel.org/r/20241001133618.1547996-2-joey.gouly@arm.com Signed-off-by: Will Deacon --- arch/arm64/kernel/process.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 0540653fbf38..3e7c8c8195c3 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -412,6 +412,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) p->thread.cpu_context.x19 = (unsigned long)args->fn; p->thread.cpu_context.x20 = (unsigned long)args->fn_arg; + + if (system_supports_poe()) + p->thread.por_el0 = POR_EL0_INIT; } p->thread.cpu_context.pc = (unsigned long)ret_from_fork; p->thread.cpu_context.sp = (unsigned long)childregs; From f56d8d2389ba2a0cab0512637bd264611eab1b9a Mon Sep 17 00:00:00 2001 From: Joey Gouly Date: Tue, 1 Oct 2024 14:36:18 +0100 Subject: [PATCH 02/13] Documentation/protection-keys: add AArch64 to documentation As POE support was recently added, update the documentation. Also note that kernel threads have a default protection key register value. Signed-off-by: Joey Gouly Cc: Will Deacon Cc: Catalin Marinas Cc: Jonathan Corbet Link: https://lore.kernel.org/r/20241001133618.1547996-3-joey.gouly@arm.com [will: Adjusted wording based on feedback from Kevin] Signed-off-by: Will Deacon --- Documentation/core-api/protection-keys.rst | 38 +++++++++++++++++----- 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index bf28ac0401f3..7eb7c6023e09 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -12,7 +12,10 @@ Pkeys Userspace (PKU) is a feature which can be found on: * Intel server CPUs, Skylake and later * Intel client CPUs, Tiger Lake (11th Gen Core) and later * Future AMD CPUs + * arm64 CPUs implementing the Permission Overlay Extension (FEAT_S1POE) +x86_64 +====== Pkeys work by dedicating 4 previously Reserved bits in each page table entry to a "protection key", giving 16 possible keys. @@ -28,6 +31,22 @@ register. The feature is only available in 64-bit mode, even though there is theoretically space in the PAE PTEs. These permissions are enforced on data access only and have no effect on instruction fetches. +arm64 +===== + +Pkeys use 3 bits in each page table entry, to encode a "protection key index", +giving 8 possible keys. + +Protections for each key are defined with a per-CPU user-writable system +register (POR_EL0). This is a 64-bit register encoding read, write and execute +overlay permissions for each protection key index. + +Being a CPU register, POR_EL0 is inherently thread-local, potentially giving +each thread a different set of protections from every other thread. + +Unlike x86_64, the protection key permissions also apply to instruction +fetches. + Syscalls ======== @@ -38,11 +57,10 @@ There are 3 system calls which directly interact with pkeys:: int pkey_mprotect(unsigned long start, size_t len, unsigned long prot, int pkey); -Before a pkey can be used, it must first be allocated with -pkey_alloc(). An application calls the WRPKRU instruction -directly in order to change access permissions to memory covered -with a key. In this example WRPKRU is wrapped by a C function -called pkey_set(). +Before a pkey can be used, it must first be allocated with pkey_alloc(). An +application writes to the architecture specific CPU register directly in order +to change access permissions to memory covered with a key. In this example +this is wrapped by a C function called pkey_set(). :: int real_prot = PROT_READ|PROT_WRITE; @@ -64,9 +82,9 @@ is no longer in use:: munmap(ptr, PAGE_SIZE); pkey_free(pkey); -.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions. - An example implementation can be found in - tools/testing/selftests/x86/protection_keys.c. +.. note:: pkey_set() is a wrapper around writing to the CPU register. + Example implementations can be found in + tools/testing/selftests/mm/pkey-{arm64,powerpc,x86}.h Behavior ======== @@ -96,3 +114,7 @@ with a read():: The kernel will send a SIGSEGV in both cases, but si_code will be set to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when the plain mprotect() permissions are violated. + +Note that kernel accesses from a kthread (such as io_uring) will use a default +value for the protection key register and so will not be consistent with +userspace's value of the register or mprotect(). From 7aed6a2c51ffc97a126e0ea0c270fab7af97ae18 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 14 Oct 2024 17:11:00 +0100 Subject: [PATCH 03/13] kasan: Disable Software Tag-Based KASAN with GCC Syzbot reports a KASAN failure early during boot on arm64 when building with GCC 12.2.0 and using the Software Tag-Based KASAN mode: | BUG: KASAN: invalid-access in smp_build_mpidr_hash arch/arm64/kernel/setup.c:133 [inline] | BUG: KASAN: invalid-access in setup_arch+0x984/0xd60 arch/arm64/kernel/setup.c:356 | Write of size 4 at addr 03ff800086867e00 by task swapper/0 | Pointer tag: [03], memory tag: [fe] Initial triage indicates that the report is a false positive and a thorough investigation of the crash by Mark Rutland revealed the root cause to be a bug in GCC: > When GCC is passed `-fsanitize=hwaddress` or > `-fsanitize=kernel-hwaddress` it ignores > `__attribute__((no_sanitize_address))`, and instruments functions > we require are not instrumented. > > [...] > > All versions [of GCC] I tried were broken, from 11.3.0 to 14.2.0 > inclusive. > > I think we have to disable KASAN_SW_TAGS with GCC until this is > fixed Disable Software Tag-Based KASAN when building with GCC by making CC_HAS_KASAN_SW_TAGS depend on !CC_IS_GCC. Cc: Andrey Konovalov Suggested-by: Mark Rutland Reported-by: syzbot+908886656a02769af987@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000f362e80620e27859@google.com Link: https://lore.kernel.org/r/ZvFGwKfoC4yVjN_X@J2N7QTR9R3 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218854 Reviewed-by: Andrey Konovalov Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20241014161100.18034-1-will@kernel.org Signed-off-by: Will Deacon --- lib/Kconfig.kasan | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan index 98016e137b7f..233ab2096924 100644 --- a/lib/Kconfig.kasan +++ b/lib/Kconfig.kasan @@ -22,8 +22,11 @@ config ARCH_DISABLE_KASAN_INLINE config CC_HAS_KASAN_GENERIC def_bool $(cc-option, -fsanitize=kernel-address) +# GCC appears to ignore no_sanitize_address when -fsanitize=kernel-hwaddress +# is passed. See https://bugzilla.kernel.org/show_bug.cgi?id=218854 (and +# the linked LKML thread) for more details. config CC_HAS_KASAN_SW_TAGS - def_bool $(cc-option, -fsanitize=kernel-hwaddress) + def_bool !CC_IS_GCC && $(cc-option, -fsanitize=kernel-hwaddress) # This option is only required for software KASAN modes. # Old GCC versions do not have proper support for no_sanitize_address. @@ -98,7 +101,7 @@ config KASAN_SW_TAGS help Enables Software Tag-Based KASAN. - Requires GCC 11+ or Clang. + Requires Clang. Supported only on arm64 CPUs and relies on Top Byte Ignore. From 894b00a3350c560990638bdf89bdf1f3d5491950 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 21 Oct 2024 14:00:10 +0200 Subject: [PATCH 04/13] kasan: Fix Software Tag-Based KASAN with GCC Per [1], -fsanitize=kernel-hwaddress with GCC currently does not disable instrumentation in functions with __attribute__((no_sanitize_address)). However, __attribute__((no_sanitize("hwaddress"))) does correctly disable instrumentation. Use it instead. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117196 [1] Link: https://lore.kernel.org/r/000000000000f362e80620e27859@google.com Link: https://lore.kernel.org/r/ZvFGwKfoC4yVjN_X@J2N7QTR9R3 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218854 Reported-by: syzbot+908886656a02769af987@syzkaller.appspotmail.com Tested-by: Andrey Konovalov Cc: Andrew Pinski Cc: Mark Rutland Cc: Will Deacon Signed-off-by: Marco Elver Reviewed-by: Andrey Konovalov Fixes: 7b861a53e46b ("kasan: Bump required compiler version") Link: https://lore.kernel.org/r/20241021120013.3209481-1-elver@google.com Signed-off-by: Will Deacon --- include/linux/compiler-gcc.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index f805adaa316e..cd6f9aae311f 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -80,7 +80,11 @@ #define __noscs __attribute__((__no_sanitize__("shadow-call-stack"))) #endif +#ifdef __SANITIZE_HWADDRESS__ +#define __no_sanitize_address __attribute__((__no_sanitize__("hwaddress"))) +#else #define __no_sanitize_address __attribute__((__no_sanitize_address__)) +#endif #if defined(__SANITIZE_THREAD__) #define __no_sanitize_thread __attribute__((__no_sanitize_thread__)) From 237ab03e301d21cc8fed631a8cdb5076c92ac263 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 21 Oct 2024 14:00:11 +0200 Subject: [PATCH 05/13] Revert "kasan: Disable Software Tag-Based KASAN with GCC" This reverts commit 7aed6a2c51ffc97a126e0ea0c270fab7af97ae18. Now that __no_sanitize_address attribute is fixed for KASAN_SW_TAGS with GCC, allow re-enabling KASAN_SW_TAGS with GCC. Cc: Andrey Konovalov Cc: Andrew Pinski Cc: Mark Rutland Cc: Will Deacon Signed-off-by: Marco Elver Reviewed-by: Andrey Konovalov Link: https://lore.kernel.org/r/20241021120013.3209481-2-elver@google.com Signed-off-by: Will Deacon --- lib/Kconfig.kasan | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan index 233ab2096924..98016e137b7f 100644 --- a/lib/Kconfig.kasan +++ b/lib/Kconfig.kasan @@ -22,11 +22,8 @@ config ARCH_DISABLE_KASAN_INLINE config CC_HAS_KASAN_GENERIC def_bool $(cc-option, -fsanitize=kernel-address) -# GCC appears to ignore no_sanitize_address when -fsanitize=kernel-hwaddress -# is passed. See https://bugzilla.kernel.org/show_bug.cgi?id=218854 (and -# the linked LKML thread) for more details. config CC_HAS_KASAN_SW_TAGS - def_bool !CC_IS_GCC && $(cc-option, -fsanitize=kernel-hwaddress) + def_bool $(cc-option, -fsanitize=kernel-hwaddress) # This option is only required for software KASAN modes. # Old GCC versions do not have proper support for no_sanitize_address. @@ -101,7 +98,7 @@ config KASAN_SW_TAGS help Enables Software Tag-Based KASAN. - Requires Clang. + Requires GCC 11+ or Clang. Supported only on arm64 CPUs and relies on Top Byte Ignore. From c83212d79be2c9886d3e6039759ecd388fd5fed1 Mon Sep 17 00:00:00 2001 From: Xiongfeng Wang Date: Wed, 16 Oct 2024 16:47:40 +0800 Subject: [PATCH 06/13] firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state() In sdei_device_freeze(), the input parameter of cpuhp_remove_state() is passed as 'sdei_entry_point' by mistake. Change it to 'sdei_hp_state'. Fixes: d2c48b2387eb ("firmware: arm_sdei: Fix sleep from invalid context BUG") Signed-off-by: Xiongfeng Wang Reviewed-by: James Morse Link: https://lore.kernel.org/r/20241016084740.183353-1-wangxiongfeng2@huawei.com Signed-off-by: Will Deacon --- drivers/firmware/arm_sdei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c index 285fe7ad490d..3e8051fe8296 100644 --- a/drivers/firmware/arm_sdei.c +++ b/drivers/firmware/arm_sdei.c @@ -763,7 +763,7 @@ static int sdei_device_freeze(struct device *dev) int err; /* unregister private events */ - cpuhp_remove_state(sdei_entry_point); + cpuhp_remove_state(sdei_hp_state); err = sdei_unregister_shared(); if (err) From 2e8a1acea8597ff42189ea94f0a63fa58640223d Mon Sep 17 00:00:00 2001 From: Kevin Brodsky Date: Tue, 29 Oct 2024 14:45:35 +0000 Subject: [PATCH 07/13] arm64: signal: Improve POR_EL0 handling to avoid uaccess failures Reset POR_EL0 to "allow all" before writing the signal frame, preventing spurious uaccess failures. When POE is supported, the POR_EL0 register constrains memory accesses based on the target page's POIndex (pkey). This raises the question: what constraints should apply to a signal handler? The current answer is that POR_EL0 is reset to POR_EL0_INIT when invoking the handler, giving it full access to POIndex 0. This is in line with x86's MPK support and remains unchanged. This is only part of the story, though. POR_EL0 constrains all unprivileged memory accesses, meaning that uaccess routines such as put_user() are also impacted. As a result POR_EL0 may prevent the signal frame from being written to the signal stack (ultimately causing a SIGSEGV). This is especially concerning when an alternate signal stack is used, because userspace may want to prevent access to it outside of signal handlers. There is currently no provision for that: POR_EL0 is reset after writing to the stack, and POR_EL0_INIT only enables access to POIndex 0. This patch ensures that POR_EL0 is reset to its most permissive state before the signal stack is accessed. Once the signal frame has been fully written, POR_EL0 is still set to POR_EL0_INIT - it is up to the signal handler to enable access to additional pkeys if needed. As to sigreturn(), it expects having access to the stack like any other syscall; we only need to ensure that POR_EL0 is restored from the signal frame after all uaccess calls. This approach is in line with the recent x86/pkeys series [1]. Resetting POR_EL0 early introduces some complications, in that we can no longer read the register directly in preserve_poe_context(). This is addressed by introducing a struct (user_access_state) and helpers to manage any such register impacting user accesses (uaccess and accesses in userspace). Things look like this on signal delivery: 1. Save original POR_EL0 into struct [save_reset_user_access_state()] 2. Set POR_EL0 to "allow all" [save_reset_user_access_state()] 3. Create signal frame 4. Write saved POR_EL0 value to the signal frame [preserve_poe_context()] 5. Finalise signal frame 6. If all operations succeeded: a. Set POR_EL0 to POR_EL0_INIT [set_handler_user_access_state()] b. Else reset POR_EL0 to its original value [restore_user_access_state()] If any step fails when setting up the signal frame, the process will be sent a SIGSEGV, which it may be able to handle. Step 6.b ensures that the original POR_EL0 is saved in the signal frame when delivering that SIGSEGV (so that the original value is restored by sigreturn). The return path (sys_rt_sigreturn) doesn't strictly require any change since restore_poe_context() is already called last. However, to avoid uaccess calls being accidentally added after that point, we use the same approach as in the delivery path, i.e. separating uaccess from writing to the register: 1. Read saved POR_EL0 value from the signal frame [restore_poe_context()] 2. Set POR_EL0 to the saved value [restore_user_access_state()] [1] https://lore.kernel.org/lkml/20240802061318.2140081-1-aruna.ramakrishna@oracle.com/ Fixes: 9160f7e909e1 ("arm64: add POE signal support") Reviewed-by: Catalin Marinas Signed-off-by: Kevin Brodsky Link: https://lore.kernel.org/r/20241029144539.111155-2-kevin.brodsky@arm.com Signed-off-by: Will Deacon --- arch/arm64/kernel/signal.c | 92 ++++++++++++++++++++++++++++++++------ 1 file changed, 78 insertions(+), 14 deletions(-) diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 561986947530..c7d311d8b92a 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include @@ -66,10 +67,63 @@ struct rt_sigframe_user_layout { unsigned long end_offset; }; +/* + * Holds any EL0-controlled state that influences unprivileged memory accesses. + * This includes both accesses done in userspace and uaccess done in the kernel. + * + * This state needs to be carefully managed to ensure that it doesn't cause + * uaccess to fail when setting up the signal frame, and the signal handler + * itself also expects a well-defined state when entered. + */ +struct user_access_state { + u64 por_el0; +}; + #define BASE_SIGFRAME_SIZE round_up(sizeof(struct rt_sigframe), 16) #define TERMINATOR_SIZE round_up(sizeof(struct _aarch64_ctx), 16) #define EXTRA_CONTEXT_SIZE round_up(sizeof(struct extra_context), 16) +/* + * Save the user access state into ua_state and reset it to disable any + * restrictions. + */ +static void save_reset_user_access_state(struct user_access_state *ua_state) +{ + if (system_supports_poe()) { + u64 por_enable_all = 0; + + for (int pkey = 0; pkey < arch_max_pkey(); pkey++) + por_enable_all |= POE_RXW << (pkey * POR_BITS_PER_PKEY); + + ua_state->por_el0 = read_sysreg_s(SYS_POR_EL0); + write_sysreg_s(por_enable_all, SYS_POR_EL0); + /* Ensure that any subsequent uaccess observes the updated value */ + isb(); + } +} + +/* + * Set the user access state for invoking the signal handler. + * + * No uaccess should be done after that function is called. + */ +static void set_handler_user_access_state(void) +{ + if (system_supports_poe()) + write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); +} + +/* + * Restore the user access state to the values saved in ua_state. + * + * No uaccess should be done after that function is called. + */ +static void restore_user_access_state(const struct user_access_state *ua_state) +{ + if (system_supports_poe()) + write_sysreg_s(ua_state->por_el0, SYS_POR_EL0); +} + static void init_user_layout(struct rt_sigframe_user_layout *user) { const size_t reserved_size = @@ -261,18 +315,20 @@ static int restore_fpmr_context(struct user_ctxs *user) return err; } -static int preserve_poe_context(struct poe_context __user *ctx) +static int preserve_poe_context(struct poe_context __user *ctx, + const struct user_access_state *ua_state) { int err = 0; __put_user_error(POE_MAGIC, &ctx->head.magic, err); __put_user_error(sizeof(*ctx), &ctx->head.size, err); - __put_user_error(read_sysreg_s(SYS_POR_EL0), &ctx->por_el0, err); + __put_user_error(ua_state->por_el0, &ctx->por_el0, err); return err; } -static int restore_poe_context(struct user_ctxs *user) +static int restore_poe_context(struct user_ctxs *user, + struct user_access_state *ua_state) { u64 por_el0; int err = 0; @@ -282,7 +338,7 @@ static int restore_poe_context(struct user_ctxs *user) __get_user_error(por_el0, &(user->poe->por_el0), err); if (!err) - write_sysreg_s(por_el0, SYS_POR_EL0); + ua_state->por_el0 = por_el0; return err; } @@ -850,7 +906,8 @@ invalid: } static int restore_sigframe(struct pt_regs *regs, - struct rt_sigframe __user *sf) + struct rt_sigframe __user *sf, + struct user_access_state *ua_state) { sigset_t set; int i, err; @@ -899,7 +956,7 @@ static int restore_sigframe(struct pt_regs *regs, err = restore_zt_context(&user); if (err == 0 && system_supports_poe() && user.poe) - err = restore_poe_context(&user); + err = restore_poe_context(&user, ua_state); return err; } @@ -908,6 +965,7 @@ SYSCALL_DEFINE0(rt_sigreturn) { struct pt_regs *regs = current_pt_regs(); struct rt_sigframe __user *frame; + struct user_access_state ua_state; /* Always make any pending restarted system calls return -EINTR */ current->restart_block.fn = do_no_restart_syscall; @@ -924,12 +982,14 @@ SYSCALL_DEFINE0(rt_sigreturn) if (!access_ok(frame, sizeof (*frame))) goto badframe; - if (restore_sigframe(regs, frame)) + if (restore_sigframe(regs, frame, &ua_state)) goto badframe; if (restore_altstack(&frame->uc.uc_stack)) goto badframe; + restore_user_access_state(&ua_state); + return regs->regs[0]; badframe: @@ -1035,7 +1095,8 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user, } static int setup_sigframe(struct rt_sigframe_user_layout *user, - struct pt_regs *regs, sigset_t *set) + struct pt_regs *regs, sigset_t *set, + const struct user_access_state *ua_state) { int i, err = 0; struct rt_sigframe __user *sf = user->sigframe; @@ -1097,10 +1158,9 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user, struct poe_context __user *poe_ctx = apply_user_offset(user, user->poe_offset); - err |= preserve_poe_context(poe_ctx); + err |= preserve_poe_context(poe_ctx, ua_state); } - /* ZA state if present */ if (system_supports_sme() && err == 0 && user->za_offset) { struct za_context __user *za_ctx = @@ -1237,9 +1297,6 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka, sme_smstop(); } - if (system_supports_poe()) - write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); - if (ka->sa.sa_flags & SA_RESTORER) sigtramp = ka->sa.sa_restorer; else @@ -1253,6 +1310,7 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, { struct rt_sigframe_user_layout user; struct rt_sigframe __user *frame; + struct user_access_state ua_state; int err = 0; fpsimd_signal_preserve_current_state(); @@ -1260,13 +1318,14 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, if (get_sigframe(&user, ksig, regs)) return 1; + save_reset_user_access_state(&ua_state); frame = user.sigframe; __put_user_error(0, &frame->uc.uc_flags, err); __put_user_error(NULL, &frame->uc.uc_link, err); err |= __save_altstack(&frame->uc.uc_stack, regs->sp); - err |= setup_sigframe(&user, regs, set); + err |= setup_sigframe(&user, regs, set, &ua_state); if (err == 0) { setup_return(regs, &ksig->ka, &user, usig); if (ksig->ka.sa.sa_flags & SA_SIGINFO) { @@ -1276,6 +1335,11 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, } } + if (err == 0) + set_handler_user_access_state(); + else + restore_user_access_state(&ua_state); + return err; } From 466ece4c6e195263ccab2d402ee06b2f8d639a0e Mon Sep 17 00:00:00 2001 From: Kevin Brodsky Date: Tue, 29 Oct 2024 14:45:36 +0000 Subject: [PATCH 08/13] arm64: signal: Remove unnecessary check when saving POE state The POE frame record is allocated unconditionally if POE is supported. If the allocation fails, a SIGSEGV is delivered before setup_sigframe() can be reached. As a result there is no need to consider poe_offset before saving POR_EL0; just remove that check. This is in line with other frame records (FPMR, TPIDR2). Reviewed-by: Mark Brown Reviewed-by: Dave Martin Acked-by: Catalin Marinas Signed-off-by: Kevin Brodsky Link: https://lore.kernel.org/r/20241029144539.111155-3-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas --- arch/arm64/kernel/signal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index c7d311d8b92a..d5eb517cc4df 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -1154,7 +1154,7 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user, err |= preserve_fpmr_context(fpmr_ctx); } - if (system_supports_poe() && err == 0 && user->poe_offset) { + if (system_supports_poe() && err == 0) { struct poe_context __user *poe_ctx = apply_user_offset(user, user->poe_offset); From 8edbbfcc1ed3ca2170a2c5e888a76ba3652e62c9 Mon Sep 17 00:00:00 2001 From: Kevin Brodsky Date: Tue, 29 Oct 2024 14:45:37 +0000 Subject: [PATCH 09/13] arm64: signal: Remove unused macro Commit 33f082614c34 ("arm64: signal: Allow expansion of the signal frame") introduced the BASE_SIGFRAME_SIZE macro but it has apparently never been used; just remove it. Reviewed-by: Dave Martin Acked-by: Catalin Marinas Signed-off-by: Kevin Brodsky Link: https://lore.kernel.org/r/20241029144539.111155-4-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas --- arch/arm64/kernel/signal.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index d5eb517cc4df..b077181a66c0 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -79,7 +79,6 @@ struct user_access_state { u64 por_el0; }; -#define BASE_SIGFRAME_SIZE round_up(sizeof(struct rt_sigframe), 16) #define TERMINATOR_SIZE round_up(sizeof(struct _aarch64_ctx), 16) #define EXTRA_CONTEXT_SIZE round_up(sizeof(struct extra_context), 16) From 6e182dc9f2680681ffb0b6d9757927f1bd321b38 Mon Sep 17 00:00:00 2001 From: Kevin Brodsky Date: Tue, 29 Oct 2024 14:45:38 +0000 Subject: [PATCH 10/13] selftests/mm: Use generic pkey register manipulation pkey_sighandler_tests.c currently hardcodes x86 PKRU encodings. The first step towards running those tests on arm64 is to abstract away the pkey register values. Since those tests want to deny access to all keys except a few, we have each arch define PKEY_REG_ALLOW_NONE, the pkey register value denying access to all keys. We then use the existing set_pkey_bits() helper to grant access to specific keys. Because pkeys may also remove the execute permission on arm64, we need to be a little careful: all code is mapped with pkey 0, and we need it to remain executable. pkey_reg_restrictive_default() is introduced for that purpose: the value it returns prevents RW access to all pkeys, but retains X permission for pkey 0. test_pkru_preserved_after_sigusr1() only checks that the pkey register value remains unchanged after a signal is delivered, so the particular value is irrelevant. We enable pkey 0 and a few more arbitrary keys in the smallest range available on all architectures (8 keys on arm64). Signed-off-by: Kevin Brodsky Acked-by: Dave Hansen Link: https://lore.kernel.org/r/20241029144539.111155-5-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas --- tools/testing/selftests/mm/pkey-arm64.h | 1 + tools/testing/selftests/mm/pkey-x86.h | 2 + .../selftests/mm/pkey_sighandler_tests.c | 53 +++++++++++++++---- 3 files changed, 47 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/mm/pkey-arm64.h b/tools/testing/selftests/mm/pkey-arm64.h index 580e1b0bb38e..d57fbeace38f 100644 --- a/tools/testing/selftests/mm/pkey-arm64.h +++ b/tools/testing/selftests/mm/pkey-arm64.h @@ -31,6 +31,7 @@ #define NR_RESERVED_PKEYS 1 /* pkey-0 */ #define PKEY_ALLOW_ALL 0x77777777 +#define PKEY_REG_ALLOW_NONE 0x0 #define PKEY_BITS_PER_PKEY 4 #define PAGE_SIZE sysconf(_SC_PAGESIZE) diff --git a/tools/testing/selftests/mm/pkey-x86.h b/tools/testing/selftests/mm/pkey-x86.h index 5f28e26a2511..ac91777c8917 100644 --- a/tools/testing/selftests/mm/pkey-x86.h +++ b/tools/testing/selftests/mm/pkey-x86.h @@ -34,6 +34,8 @@ #define PAGE_SIZE 4096 #define MB (1<<20) +#define PKEY_REG_ALLOW_NONE 0x55555555 + static inline void __page_o_noops(void) { /* 8-bytes of instruction * 512 bytes = 1 page */ diff --git a/tools/testing/selftests/mm/pkey_sighandler_tests.c b/tools/testing/selftests/mm/pkey_sighandler_tests.c index a8088b645ad6..501880dbdc37 100644 --- a/tools/testing/selftests/mm/pkey_sighandler_tests.c +++ b/tools/testing/selftests/mm/pkey_sighandler_tests.c @@ -11,6 +11,7 @@ */ #define _GNU_SOURCE #define __SANE_USERSPACE_TYPES__ +#include #include #include #include @@ -65,6 +66,20 @@ long syscall_raw(long n, long a1, long a2, long a3, long a4, long a5, long a6) return ret; } +/* + * Returns the most restrictive pkey register value that can be used by the + * tests. + */ +static inline u64 pkey_reg_restrictive_default(void) +{ + /* + * Disallow everything except execution on pkey 0, so that each caller + * doesn't need to enable it explicitly (the selftest code runs with + * its code mapped with pkey 0). + */ + return set_pkey_bits(PKEY_REG_ALLOW_NONE, 0, PKEY_DISABLE_ACCESS); +} + static void sigsegv_handler(int signo, siginfo_t *info, void *ucontext) { pthread_mutex_lock(&mutex); @@ -113,7 +128,7 @@ static void raise_sigusr2(void) static void *thread_segv_with_pkey0_disabled(void *ptr) { /* Disable MPK 0 (and all others too) */ - __write_pkey_reg(0x55555555); + __write_pkey_reg(pkey_reg_restrictive_default()); /* Segfault (with SEGV_MAPERR) */ *(int *) (0x1) = 1; @@ -123,7 +138,7 @@ static void *thread_segv_with_pkey0_disabled(void *ptr) static void *thread_segv_pkuerr_stack(void *ptr) { /* Disable MPK 0 (and all others too) */ - __write_pkey_reg(0x55555555); + __write_pkey_reg(pkey_reg_restrictive_default()); /* After we disable MPK 0, we can't access the stack to return */ return NULL; @@ -133,6 +148,7 @@ static void *thread_segv_maperr_ptr(void *ptr) { stack_t *stack = ptr; int *bad = (int *)1; + u64 pkey_reg; /* * Setup alternate signal stack, which should be pkey_mprotect()ed by @@ -142,7 +158,9 @@ static void *thread_segv_maperr_ptr(void *ptr) syscall_raw(SYS_sigaltstack, (long)stack, 0, 0, 0, 0, 0); /* Disable MPK 0. Only MPK 1 is enabled. */ - __write_pkey_reg(0x55555551); + pkey_reg = pkey_reg_restrictive_default(); + pkey_reg = set_pkey_bits(pkey_reg, 1, PKEY_UNRESTRICTED); + __write_pkey_reg(pkey_reg); /* Segfault */ *bad = 1; @@ -240,6 +258,7 @@ static void test_sigsegv_handler_with_different_pkey_for_stack(void) int pkey; int parent_pid = 0; int child_pid = 0; + u64 pkey_reg; sa.sa_flags = SA_SIGINFO | SA_ONSTACK; @@ -257,7 +276,10 @@ static void test_sigsegv_handler_with_different_pkey_for_stack(void) assert(stack != MAP_FAILED); /* Allow access to MPK 0 and MPK 1 */ - __write_pkey_reg(0x55555550); + pkey_reg = pkey_reg_restrictive_default(); + pkey_reg = set_pkey_bits(pkey_reg, 0, PKEY_UNRESTRICTED); + pkey_reg = set_pkey_bits(pkey_reg, 1, PKEY_UNRESTRICTED); + __write_pkey_reg(pkey_reg); /* Protect the new stack with MPK 1 */ pkey = pkey_alloc(0, 0); @@ -307,7 +329,13 @@ static void test_sigsegv_handler_with_different_pkey_for_stack(void) static void test_pkru_preserved_after_sigusr1(void) { struct sigaction sa; - unsigned long pkru = 0x45454544; + u64 pkey_reg; + + /* Allow access to MPK 0 and an arbitrary set of keys */ + pkey_reg = pkey_reg_restrictive_default(); + pkey_reg = set_pkey_bits(pkey_reg, 0, PKEY_UNRESTRICTED); + pkey_reg = set_pkey_bits(pkey_reg, 3, PKEY_UNRESTRICTED); + pkey_reg = set_pkey_bits(pkey_reg, 7, PKEY_UNRESTRICTED); sa.sa_flags = SA_SIGINFO; @@ -320,7 +348,7 @@ static void test_pkru_preserved_after_sigusr1(void) memset(&siginfo, 0, sizeof(siginfo)); - __write_pkey_reg(pkru); + __write_pkey_reg(pkey_reg); raise(SIGUSR1); @@ -330,7 +358,7 @@ static void test_pkru_preserved_after_sigusr1(void) pthread_mutex_unlock(&mutex); /* Ensure the pkru value is the same after returning from signal. */ - ksft_test_result(pkru == __read_pkey_reg() && + ksft_test_result(pkey_reg == __read_pkey_reg() && siginfo.si_signo == SIGUSR1, "%s\n", __func__); } @@ -347,6 +375,7 @@ static noinline void *thread_sigusr2_self(void *ptr) 'S', 'I', 'G', 'U', 'S', 'R', '2', '.', '.', '.', '\n', '\0'}; stack_t *stack = ptr; + u64 pkey_reg; /* * Setup alternate signal stack, which should be pkey_mprotect()ed by @@ -356,7 +385,9 @@ static noinline void *thread_sigusr2_self(void *ptr) syscall(SYS_sigaltstack, (long)stack, 0, 0, 0, 0, 0); /* Disable MPK 0. Only MPK 2 is enabled. */ - __write_pkey_reg(0x55555545); + pkey_reg = pkey_reg_restrictive_default(); + pkey_reg = set_pkey_bits(pkey_reg, 2, PKEY_UNRESTRICTED); + __write_pkey_reg(pkey_reg); raise_sigusr2(); @@ -384,6 +415,7 @@ static void test_pkru_sigreturn(void) int pkey; int parent_pid = 0; int child_pid = 0; + u64 pkey_reg; sa.sa_handler = SIG_DFL; sa.sa_flags = 0; @@ -418,7 +450,10 @@ static void test_pkru_sigreturn(void) * the current thread's stack is protected by the default MPK 0. Hence * both need to be enabled. */ - __write_pkey_reg(0x55555544); + pkey_reg = pkey_reg_restrictive_default(); + pkey_reg = set_pkey_bits(pkey_reg, 0, PKEY_UNRESTRICTED); + pkey_reg = set_pkey_bits(pkey_reg, 2, PKEY_UNRESTRICTED); + __write_pkey_reg(pkey_reg); /* Protect the stack with MPK 2 */ pkey = pkey_alloc(0, 0); From 49f59573e9e06093ba23caf4ea1641b16e7e497e Mon Sep 17 00:00:00 2001 From: Kevin Brodsky Date: Tue, 29 Oct 2024 14:45:39 +0000 Subject: [PATCH 11/13] selftests/mm: Enable pkey_sighandler_tests on arm64 pkey_sighandler_tests.c makes raw syscalls using its own helper, syscall_raw(). One of those syscalls is clone, which is problematic as every architecture has a different opinion on the order of its arguments. To complete arm64 support, we therefore add an appropriate implementation in syscall_raw(), and introduce a clone_raw() helper that shuffles arguments as needed for each arch. Having done this, we enable building pkey_sighandler_tests for arm64 in the Makefile. Signed-off-by: Kevin Brodsky Link: https://lore.kernel.org/r/20241029144539.111155-6-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas --- tools/testing/selftests/mm/Makefile | 8 +-- .../selftests/mm/pkey_sighandler_tests.c | 62 ++++++++++++++----- 2 files changed, 50 insertions(+), 20 deletions(-) diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index 02e1204971b0..0f8c110e0805 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -105,12 +105,12 @@ endif ifeq ($(CAN_BUILD_X86_64),1) TEST_GEN_FILES += $(BINARIES_64) endif -else -ifneq (,$(filter $(ARCH),arm64 powerpc)) +else ifeq ($(ARCH),arm64) +TEST_GEN_FILES += protection_keys +TEST_GEN_FILES += pkey_sighandler_tests +else ifeq ($(ARCH),powerpc) TEST_GEN_FILES += protection_keys -endif - endif ifneq (,$(filter $(ARCH),arm64 mips64 parisc64 powerpc riscv64 s390x sparc64 x86_64 s390)) diff --git a/tools/testing/selftests/mm/pkey_sighandler_tests.c b/tools/testing/selftests/mm/pkey_sighandler_tests.c index 501880dbdc37..c593a426341c 100644 --- a/tools/testing/selftests/mm/pkey_sighandler_tests.c +++ b/tools/testing/selftests/mm/pkey_sighandler_tests.c @@ -60,12 +60,44 @@ long syscall_raw(long n, long a1, long a2, long a3, long a4, long a5, long a6) : "=a"(ret) : "a"(n), "b"(a1), "c"(a2), "d"(a3), "S"(a4), "D"(a5) : "memory"); +#elif defined __aarch64__ + register long x0 asm("x0") = a1; + register long x1 asm("x1") = a2; + register long x2 asm("x2") = a3; + register long x3 asm("x3") = a4; + register long x4 asm("x4") = a5; + register long x5 asm("x5") = a6; + register long x8 asm("x8") = n; + asm volatile ("svc #0" + : "=r"(x0) + : "r"(x0), "r"(x1), "r"(x2), "r"(x3), "r"(x4), "r"(x5), "r"(x8) + : "memory"); + ret = x0; #else # error syscall_raw() not implemented #endif return ret; } +static inline long clone_raw(unsigned long flags, void *stack, + int *parent_tid, int *child_tid) +{ + long a1 = flags; + long a2 = (long)stack; + long a3 = (long)parent_tid; +#if defined(__x86_64__) || defined(__i386) + long a4 = (long)child_tid; + long a5 = 0; +#elif defined(__aarch64__) + long a4 = 0; + long a5 = (long)child_tid; +#else +# error clone_raw() not implemented +#endif + + return syscall_raw(SYS_clone, a1, a2, a3, a4, a5, 0); +} + /* * Returns the most restrictive pkey register value that can be used by the * tests. @@ -294,14 +326,13 @@ static void test_sigsegv_handler_with_different_pkey_for_stack(void) memset(&siginfo, 0, sizeof(siginfo)); /* Use clone to avoid newer glibcs using rseq on new threads */ - long ret = syscall_raw(SYS_clone, - CLONE_VM | CLONE_FS | CLONE_FILES | - CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM | - CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | - CLONE_DETACHED, - (long) ((char *)(stack) + STACK_SIZE), - (long) &parent_pid, - (long) &child_pid, 0, 0); + long ret = clone_raw(CLONE_VM | CLONE_FS | CLONE_FILES | + CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM | + CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | + CLONE_DETACHED, + stack + STACK_SIZE, + &parent_pid, + &child_pid); if (ret < 0) { errno = -ret; @@ -466,14 +497,13 @@ static void test_pkru_sigreturn(void) sigstack.ss_size = STACK_SIZE; /* Use clone to avoid newer glibcs using rseq on new threads */ - long ret = syscall_raw(SYS_clone, - CLONE_VM | CLONE_FS | CLONE_FILES | - CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM | - CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | - CLONE_DETACHED, - (long) ((char *)(stack) + STACK_SIZE), - (long) &parent_pid, - (long) &child_pid, 0, 0); + long ret = clone_raw(CLONE_VM | CLONE_FS | CLONE_FILES | + CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM | + CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | + CLONE_DETACHED, + stack + STACK_SIZE, + &parent_pid, + &child_pid); if (ret < 0) { errno = -ret; From db64dfffcad2992d6bfc680822bdf715335c43f1 Mon Sep 17 00:00:00 2001 From: Kevin Brodsky Date: Thu, 7 Nov 2024 13:16:40 +0000 Subject: [PATCH 12/13] selftests/mm: Define PKEY_UNRESTRICTED for pkey_sighandler_tests Commit 6e182dc9f268 ("selftests/mm: Use generic pkey register manipulation") makes use of PKEY_UNRESTRICTED in pkey_sighandler_tests. The macro has been proposed for addition to uapi headers [1], but the patch hasn't landed yet. Define PKEY_UNRESTRICTED in pkey-helpers.h for the time being to fix the build. [1] https://lore.kernel.org/all/20241028090715.509527-2-yury.khrustalev@arm.com/ Fixes: 6e182dc9f268 ("selftests/mm: Use generic pkey register manipulation") Reported-by: Aishwarya TCV Signed-off-by: Kevin Brodsky Link: https://lore.kernel.org/r/20241107131640.650703-1-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas --- tools/testing/selftests/mm/pkey-helpers.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/tools/testing/selftests/mm/pkey-helpers.h b/tools/testing/selftests/mm/pkey-helpers.h index 9ab6a3ee153b..f7cfe163b0ff 100644 --- a/tools/testing/selftests/mm/pkey-helpers.h +++ b/tools/testing/selftests/mm/pkey-helpers.h @@ -112,6 +112,13 @@ void record_pkey_malloc(void *ptr, long size, int prot); #define PKEY_MASK (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE) #endif +/* + * FIXME: Remove once the generic PKEY_UNRESTRICTED definition is merged. + */ +#ifndef PKEY_UNRESTRICTED +#define PKEY_UNRESTRICTED 0x0 +#endif + #ifndef set_pkey_bits static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags) { From 929bbc16abfb0144db7ac619c77f60b188e555ab Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Fri, 8 Nov 2024 11:05:49 +0000 Subject: [PATCH 13/13] selftests/mm: Fix unused function warning for aarch64_write_signal_pkey() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since commit 49f59573e9e0 ("selftests/mm: Enable pkey_sighandler_tests on arm64"), pkey_sighandler_tests.c (which includes pkey-arm64.h via pkey-helpers.h) ends up compiled for arm64. Since it doesn't use aarch64_write_signal_pkey(), the compiler warns: In file included from pkey-helpers.h:106, from pkey_sighandler_tests.c:31: pkey-arm64.h:130:13: warning: ‘aarch64_write_signal_pkey’ defined but not used [-Wunused-function] 130 | static void aarch64_write_signal_pkey(ucontext_t *uctxt, u64 pkey) | ^~~~~~~~~~~~~~~~~~~~~~~~~ Make the aarch64_write_signal_pkey() a 'static inline void' function to avoid the compiler warning. Fixes: f5b5ea51f78f ("selftests: mm: make protection_keys test work on arm64") Cc: Shuah Khan Cc: Joey Gouly Cc: Kevin Brodsky Reviewed-by: Kevin Brodsky Link: https://lore.kernel.org/r/20241108110549.1185923-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas --- tools/testing/selftests/mm/pkey-arm64.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/pkey-arm64.h b/tools/testing/selftests/mm/pkey-arm64.h index d57fbeace38f..d9d2100eafc0 100644 --- a/tools/testing/selftests/mm/pkey-arm64.h +++ b/tools/testing/selftests/mm/pkey-arm64.h @@ -127,7 +127,7 @@ static inline u64 get_pkey_bits(u64 reg, int pkey) return 0; } -static void aarch64_write_signal_pkey(ucontext_t *uctxt, u64 pkey) +static inline void aarch64_write_signal_pkey(ucontext_t *uctxt, u64 pkey) { struct _aarch64_ctx *ctx = GET_UC_RESV_HEAD(uctxt); struct poe_context *poe_ctx =