When an interrupt is taken, the SRR registers are set to return to where
it left off. Unless they are modified in the meantime, or the return
address or MSR are modified, there is no need to reload these registers
when returning from interrupt.
Introduce per-CPU flags that track the validity of SRR and HSRR
registers. These are cleared when returning from interrupt, when
using the registers for something else (e.g., OPAL calls), when
adjusting the return address or MSR of a context, and when context
switching (which changes the return address and MSR).
This improves the performance of interrupt returns.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in fixup patch from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-5-npiggin@gmail.com
Pass the value of linux_banner to firmware via option vector 7.
Option vector 7 is described in "LoPAR" Linux on Power Architecture
Reference v2.9, in table B.7 on page 824:
An ASCII character formatted null terminated string that describes
the client operating system. The string shall be human readable and
may be displayed on the console.
The string can be up to 256 bytes total, including the nul terminator.
linux_banner contains lots of information, and should make it possible
to identify the exact kernel version that is running:
const char linux_banner[] =
"Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
For example:
Linux version 4.15.0-144-generic (buildd@bos02-ppc64el-018) (gcc
version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #148-Ubuntu SMP Sat May 8
02:32:13 UTC 2021 (Ubuntu 4.15.0-144.148-generic 4.15.18)
It's also printed at boot to the console/dmesg, which should make it
possible to correlate what firmware receives with the console/dmesg on
the machine.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621064938.2021419-2-mpe@ellerman.id.au
POWER9 and POWER10 asynchronous machine checks due to stores have their
cause reported in SRR1 but SRR1[42] is set, which in other cases
indicates DSISR cause.
Check for these cases and clear SRR1[42], so the cause matching uses
the i-side (SRR1) table.
Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler")
Fixes: 201220bb0e ("powerpc/powernv: Machine check handler for POWER10")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210517140355.2325406-1-npiggin@gmail.com
Pull powerpc fixes from Michael Ellerman:
"Fix initrd corruption caused by our recent change to use relative jump
labels.
Fix a crash using perf record on systems without a hardware PMU
backend.
Rework our 64-bit signal handling slighty to make it more closely
match the old behaviour, after the recent change to use unsafe user
accessors.
Thanks to Anastasia Kovaleva, Athira Rajeev, Christophe Leroy, Daniel
Axtens, Greg Kurz, and Roman Bolshakov"
* tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not set
powerpc: Fix initrd corruption with relative jump labels
powerpc/signal64: Copy siginfo before changing regs->nip
powerpc/mem: Add back missing header to fix 'no previous prototype' error
This commit in sched/urgent moved the cfs_rq_is_decayed() function:
a7b359fc6a: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
and this fresh commit in sched/core modified it in the old location:
9e077b52d8: ("sched/pelt: Check that *_avg are null when *_sum are")
Merge the two variants.
Conflicts:
kernel/sched/fair.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge some powerpc KVM patches from our topic branch.
In particular this brings in Nick's big series rewriting parts of the
guest entry/exit path in C.
Conflicts:
arch/powerpc/kernel/security.c
arch/powerpc/kvm/book3s_hv_rmhandlers.S
When delivering a signal to a sigaction style handler (SA_SIGINFO), we
pass pointers to the siginfo and ucontext via r4 and r5.
Currently we populate the values in those registers by reading the
pointers out of the sigframe in user memory, even though the values in
user memory were written by the kernel just prior:
unsafe_put_user(&frame->info, &frame->pinfo, badframe_block);
unsafe_put_user(&frame->uc, &frame->puc, badframe_block);
...
if (ksig->ka.sa.sa_flags & SA_SIGINFO) {
err |= get_user(regs->gpr[4], (unsigned long __user *)&frame->pinfo);
err |= get_user(regs->gpr[5], (unsigned long __user *)&frame->puc);
ie. we write &frame->info into frame->pinfo, and then read frame->pinfo
back into r4, and similarly for &frame->uc.
The code has always been like this, since linux-fullhistory commit
d4f2d95eca2c ("Forward port of 2.4 ppc64 signal changes.").
There's no reason for us to read the values back from user memory,
rather than just setting the value in the gpr[4/5] directly. In fact
reading the value back from user memory opens up the possibility of
another user thread changing the values before we read them back.
Although any process doing that would be racing against the kernel
delivering the signal, and would risk corrupting the stack, so that
would be a userspace bug.
Note that this is 64-bit only code, so there's no subtlety with the size
of pointers differing between kernel and user. Also the frame variable
is not modified to point elsewhere during the function.
In the past reading the values back from user memory was not costly, but
now that we have KUAP on some CPUs it is, so we'd rather avoid it for
that reason too.
So change the code to just set the values directly, using the same
values we have written to the sigframe previously in the function.
Note also that this matches what our 32-bit signal code does.
Using a version of will-it-scale's signal1_threads that sets SA_SIGINFO,
this results in a ~4% increase in signals per second on a Power9, from
229,777 to 239,766.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210610072949.3198522-1-mpe@ellerman.id.au
Make our stack-walking code KASAN-safe by using __no_sanitize_address.
Generic code, arm64, s390 and x86 all make accesses unchecked for similar
sorts of reasons: when unwinding a stack, we might touch memory that KASAN
has marked as being out-of-bounds. In ppc64 KASAN development, I hit this
sometimes when checking for an exception frame - because we're checking
an arbitrary offset into the stack frame.
See commit 2095574632 ("s390/kasan: avoid false positives during stack
unwind"), commit bcaf669b4b ("arm64: disable kasan when accessing
frame->fp in unwind_frame"), commit 91e08ab0c8 ("x86/dumpstack:
Prevent KASAN false positive warnings") and commit 6e22c83664
("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer").
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210614120907.1952321-1-dja@axtens.net
nip is already an unsigned long, no cast needed.
op_callback_addr and emulate_step_addr are kprobe_opcode_t *.
There value is obtained with ppc_kallsyms_lookup_name() which
returns 'unsigned long', and there values are used create_branch()
which expects 'unsigned long'. So change them to 'unsigned long'
to avoid casting them back and forth.
can_optimize() used p->addr several times as 'unsigned long'.
Use a local 'unsigned long' variable and avoid casting multiple times.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e03192a6d4123242a275e71ce2ba0bb4d90700c1.1621516826.git.christophe.leroy@csgroup.eu
'struct ppc_inst' is an internal representation of an instruction, but
in-memory instructions are and will remain a table of 'u32' forever.
Replace all 'struct ppc_inst *' used for locating an instruction in
memory by 'u32 *'. This removes a lot of undue casts to 'struct
ppc_inst *'.
It also helps locating ab-use of 'struct ppc_inst' dereference.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix ppc_inst_next(), use u32 instead of unsigned int]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu
'struct ppc_inst' is an internal structure to represent an instruction,
it is not directly the representation of that instruction in text code.
It is not meant to map and dereference code.
Dereferencing code directly through 'struct ppc_inst' has two main issues:
- On powerpc, structs are expected to be 8 bytes aligned while code is
spread every 4 byte.
- Should a non prefixed instruction lie at the end of the page and the
following page not be mapped, it would generate a page fault.
In-memory code must be accessed with ppc_inst_read().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c9a1201dd0a66b4a0f91f0fb46d9385cbf030feb.1621516826.git.christophe.leroy@csgroup.eu
In commit 96d7a4e06f ("powerpc/signal64: Rewrite handle_rt_signal64()
to minimise uaccess switches") the 64-bit signal code was rearranged to
use user_write_access_begin/end().
As part of that change the call to copy_siginfo_to_user() was moved
later in the function, so that it could be done after the
user_write_access_end().
In particular it was moved after we modify regs->nip to point to the
signal trampoline. That means if copy_siginfo_to_user() fails we exit
handle_rt_signal64() with an error but with regs->nip modified, whereas
previously we would not modify regs->nip until the copy succeeded.
Returning an error from signal delivery but with regs->nip updated
leaves the process in a sort of half-delivered state. We do immediately
force a SEGV in signal_setup_done(), called from do_signal(), so the
process should never run in the half-delivered state.
However that SEGV is not delivered until we've gone around to
do_notify_resume() again, so it's possible some tracing could observe
the half-delivered state.
There are other cases where we fail signal delivery with regs partly
updated, eg. the write to newsp and SA_SIGINFO, but the latter at least
is very unlikely to fail as it reads back from the frame we just wrote
to.
Looking at other arches they seem to be more careful about leaving regs
unchanged until the copy operations have succeeded, and in general that
seems like good hygenie.
So although the current behaviour is not cleary buggy, it's also not
clearly correct. So move the call to copy_siginfo_to_user() up prior to
the modification of regs->nip, which is closer to the old behaviour, and
easier to reason about.
Fixes: 96d7a4e06f ("powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210608134605.2783677-1-mpe@ellerman.id.au