Make the following functions generic to all platforms.
- bad_kuap_fault()
- kuap_assert_locked()
- kuap_save_and_lock() (PPC32 only)
- kuap_kernel_restore()
- kuap_get_and_assert_locked()
And for all platforms except book3s/64
- allow_user_access()
- prevent_user_access()
- prevent_user_access_return()
- restore_user_access()
Prepend __ in front of the name of platform specific ones.
For now the generic just calls the platform specific, but
next patch will move redundant parts of specific functions
into the generic one.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eaef143a8dae7288cd34565ffa7b49c16aee1ec3.1634627931.git.christophe.leroy@csgroup.eu
Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
flexibility to GCC.
For that add an entry to the exception table so that
program_check_exception() knowns where to resume execution
after a WARNING.
Here are two exemples. The first one is done on PPC32 (which
benefits from the previous patch), the second is on PPC64.
unsigned long test(struct pt_regs *regs)
{
int ret;
WARN_ON(regs->msr & MSR_PR);
return regs->gpr[3];
}
unsigned long test9w(unsigned long a, unsigned long b)
{
if (WARN_ON(!b))
return 0;
return a / b;
}
Before the patch:
000003a8 <test>:
3a8: 81 23 00 84 lwz r9,132(r3)
3ac: 71 29 40 00 andi. r9,r9,16384
3b0: 40 82 00 0c bne 3bc <test+0x14>
3b4: 80 63 00 0c lwz r3,12(r3)
3b8: 4e 80 00 20 blr
3bc: 0f e0 00 00 twui r0,0
3c0: 80 63 00 0c lwz r3,12(r3)
3c4: 4e 80 00 20 blr
0000000000000bf0 <.test9w>:
bf0: 7c 89 00 74 cntlzd r9,r4
bf4: 79 29 d1 82 rldicl r9,r9,58,6
bf8: 0b 09 00 00 tdnei r9,0
bfc: 2c 24 00 00 cmpdi r4,0
c00: 41 82 00 0c beq c0c <.test9w+0x1c>
c04: 7c 63 23 92 divdu r3,r3,r4
c08: 4e 80 00 20 blr
c0c: 38 60 00 00 li r3,0
c10: 4e 80 00 20 blr
After the patch:
000003a8 <test>:
3a8: 81 23 00 84 lwz r9,132(r3)
3ac: 71 29 40 00 andi. r9,r9,16384
3b0: 40 82 00 0c bne 3bc <test+0x14>
3b4: 80 63 00 0c lwz r3,12(r3)
3b8: 4e 80 00 20 blr
3bc: 0f e0 00 00 twui r0,0
0000000000000c50 <.test9w>:
c50: 7c 89 00 74 cntlzd r9,r4
c54: 79 29 d1 82 rldicl r9,r9,58,6
c58: 0b 09 00 00 tdnei r9,0
c5c: 7c 63 23 92 divdu r3,r3,r4
c60: 4e 80 00 20 blr
c70: 38 60 00 00 li r3,0
c74: 4e 80 00 20 blr
In the first exemple, we see GCC doesn't need to duplicate what
happens after the trap.
In the second exemple, we see that GCC doesn't need to emit a test
and a branch in the likely path in addition to the trap.
We've got some WARN_ON() in .softirqentry.text section so it needs
to be added in the OTHER_TEXT_SECTIONS in modpost.c
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/389962b1b702e3c78d169e59bcfac56282889173.1618331882.git.christophe.leroy@csgroup.eu
Pull powerpc updates from Michael Ellerman:
- A large series adding wrappers for our interrupt handlers, so that
irq/nmi/user tracking can be isolated in the wrappers rather than
spread in each handler.
- Conversion of the 32-bit syscall handling into C.
- A series from Nick to streamline our TLB flushing when using the
Radix MMU.
- Switch to using queued spinlocks by default for 64-bit server CPUs.
- A rework of our PCI probing so that it happens later in boot, when
more generic infrastructure is available.
- Two small fixes to allow 32-bit little-endian processes to run on
64-bit kernels.
- Other smaller features, fixes & cleanups.
Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh
Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang
Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian
Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong,
Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan
Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu,
Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen
Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun.
* tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits)
powerpc/perf: Adds support for programming of Thresholding in P10
powerpc/pci: Remove unimplemented prototypes
powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user()
powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size()
powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user()
powerpc/64: Fix stack trace not displaying final frame
powerpc/time: Remove get_tbl()
powerpc/time: Avoid using get_tbl()
spi: mpc52xx: Avoid using get_tbl()
powerpc/syscall: Avoid storing 'current' in another pointer
powerpc/32: Handle bookE debugging in C in syscall entry/exit
powerpc/syscall: Do not check unsupported scv vector on PPC32
powerpc/32: Remove the counter in global_dbcr0
powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
powerpc/syscall: implement system call entry/exit logic in C for PPC32
powerpc/32: Always save non volatile GPRs at syscall entry
powerpc/syscall: Change condition to check MSR_RI
powerpc/syscall: Save r3 in regs->orig_r3
powerpc/syscall: Use is_compat_task()
powerpc/syscall: Make interrupt.c buildable on PPC32
...
This fix the bad fault reported by KUAP when io_wqe_worker access userspace.
Bug: Read fault blocked by KUAP!
WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
..........
Call Trace:
[c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
[c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
[c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
--- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
..........
NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
interrupt: 300
[c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
[c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
[c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
[c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
[c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
[c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
[c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
[c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
[c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
[c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
[c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
The kernel consider thread AMR value for kernel thread to be
AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
of course not correct and we should allow userspace access after
kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
AMR value of the operating address space. But, the AMR value is
thread-specific and we inherit the address space and not thread
access restrictions. Because of this ignore AMR value when accessing
userspace via kernel thread.
current_thread_amr/iamr() are updated, because we use them in the
below stack.
....
[ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3
....
NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90
LR [c0000000004b9278] gup_pte_range+0x188/0x420
--- interrupt: 700
[c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable)
[c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20
[c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410
[c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0
[c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700
[c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0
[c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740
[c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0
[c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80
[c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs]
[c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs]
[c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0
[c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50
[c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0
[c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0
[c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0
[c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0
Fixes: 48a8ab4eeb ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.")
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210206025634.521979-1-aneesh.kumar@linux.ibm.com
In commit 8150a153c0 ("powerpc/64s: Use early_mmu_has_feature() in
set_kuap()") we switched the KUAP code to use early_mmu_has_feature(),
to avoid a bug where we called set_kuap() before feature patching had
been done, leading to recursion and crashes.
That path, which called probe_kernel_read() from printk(), has since
been removed, see commit 2ac5a3bf70 ("vsprintf: Do not break early
boot with probing addresses").
Additionally probe_kernel_read() no longer invokes any KUAP routines,
since commit fe557319aa ("maccess: rename probe_kernel_{read,write}
to copy_{from,to}_kernel_nofault") and c331652534 ("powerpc: use
non-set_fs based maccess routines").
So it should now be safe to use mmu_has_feature() in the KUAP
routines, because we shouldn't invoke them prior to feature patching.
This is essentially a revert of commit 8150a153c0 ("powerpc/64s: Use
early_mmu_has_feature() in set_kuap()"), but we've since added a
second usage of early_mmu_has_feature() in get_kuap(), so we convert
that to use mmu_has_feature() as well.
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Depends-on: c331652534 ("powerpc: use non-set_fs based maccess routines").
Link: https://lore.kernel.org/r/20201217005306.895685-1-mpe@ellerman.id.au
This partially reverts commit eb232b1624 ("powerpc/book3s64/kuap: Improve
error reporting with KUAP") and update the fault handler to print
[ 55.022514] Kernel attempted to access user page (7e6725b70000) - exploit attempt? (uid: 0)
[ 55.022528] BUG: Unable to handle kernel data access on read at 0x7e6725b70000
[ 55.022533] Faulting instruction address: 0xc000000000e8b9bc
[ 55.022540] Oops: Kernel access of bad area, sig: 11 [#1]
....
when the kernel access userspace address without unlocking AMR.
bad_kuap_fault() is added as part of commit 5e5be3aed2 ("powerpc/mm: Detect
bad KUAP faults") to catch userspace access incorrectly blocked by AMR. Hence
retain the full stack dump there even with hash translation. Also, add a comment
explaining the difference between hash and radix.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201208031539.84878-1-aneesh.kumar@linux.ibm.com
If FTR_BOOK3S_KUAP is disabled, kernel will continue to run with the same AMR
value with which it was entered. Hence there is a high chance that
we can return without restoring the AMR value. This also helps the case
when applications are not using the pkey feature. In this case, different
applications will have the same AMR values and hence we can avoid restoring
AMR in this case too.
Also avoid isync() if not really needed.
Do the same for IAMR.
null-syscall benchmark results:
With smap/smep disabled:
Without patch:
957.95 ns 2778.17 cycles
With patch:
858.38 ns 2489.30 cycles
With smap/smep enabled:
Without patch:
1017.26 ns 2950.36 cycles
With patch:
1021.51 ns 2962.44 cycles
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-23-aneesh.kumar@linux.ibm.com
This prepare kernel to operate with a different value than userspace AMR/IAMR.
For this, AMR/IAMR need to be saved and restored on entry and return from the
kernel.
With KUAP we modify kernel AMR when accessing user address from the kernel
via copy_to/from_user interfaces. We don't need to modify IAMR value in
similar fashion.
If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering
kernel from userspace. If not we can assume that AMR/IAMR is not modified
from userspace.
We need to save AMR if we have MMU_FTR_BOOK3S_KUAP feature enabled and we are
interrupted within kernel. This is required so that if we get interrupted
within copy_to/from_user we continue with the right AMR value.
If we hae MMU_FTR_BOOK3S_KUEP enabled we need to restore IAMR on
return to userspace beause kernel will be running with a different
IAMR value.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-11-aneesh.kumar@linux.ibm.com
This is in preparation to adding support for kuap with hash translation.
In preparation for that rename/move kuap related functions to
non radix names. Also move the feature bit closer to MMU_FTR_KUEP.
MMU_FTR_KUEP is renamed to MMU_FTR_BOOK3S_KUEP to indicate the feature
is only relevant to BOOK3S_64
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-8-aneesh.kumar@linux.ibm.com