A BPF program may consist of 1m instructions, which means JIT
instruction-address mapping can be as large as 4m. s390 has
FORCE_MAX_ZONEORDER=9 (for memory hotplug reasons), which means maximum
kmalloc size is 1m. This makes it impossible to JIT programs with more
than 256k instructions.
Fix by using kvcalloc, which falls back to vmalloc for larger
allocations. An alternative would be to use a radix tree, but that is
not supported by bpf_prog_fill_jited_linfo.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191107141838.92202-1-iii@linux.ibm.com
Currently bitops-instrumented.h assumes that the architecture provides
atomic, non-atomic and locking bitops (e.g. both set_bit and __set_bit).
This is true on x86 and s390, but is not always true: there is a
generic bitops/non-atomic.h header that provides generic non-atomic
operations, and also a generic bitops/lock.h for locking operations.
powerpc uses the generic non-atomic version, so it does not have it's
own e.g. __set_bit that could be renamed arch___set_bit.
Split up bitops-instrumented.h to mirror the atomic/non-atomic/lock
split. This allows arches to only include the headers where they
have arch-specific versions to rename. Update x86 and s390.
(The generic operations are automatically instrumented because they're
written in C, not asm.)
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190820024941.12640-1-dja@axtens.net
The current code around calling ftrace_graph_ret_addr() is ifdeffed and
also tests if ftrace redirection is present on stack.
ftrace_graph_ret_addr() however performs the test internally and there
is a version for !CONFIG_FUNCTION_GRAPH_TRACER as well. The unnecessary
code can thus be dropped.
Link: http://lkml.kernel.org/r/20191029143904.24051-2-mbenes@suse.cz
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Some architectures, notably ARM, are interested in tweaking this
depending on their runtime DMA addressing limitations.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The idle time reported in /proc/stat sometimes incorrectly contains
huge values on s390. This is caused by a bug in arch_cpu_idle_time().
The kernel tries to figure out when a different cpu entered idle by
accessing its per-cpu data structure. There is an ordering problem: if
the remote cpu has an idle_enter value which is not zero, and an
idle_exit value which is zero, it is assumed it is idle since
"now". The "now" timestamp however is taken before the idle_enter
value is read.
Which in turn means that "now" can be smaller than idle_enter of the
remote cpu. Unconditionally subtracting idle_enter from "now" can thus
lead to a negative value (aka large unsigned value).
Fix this by moving the get_tod_clock() invocation out of the
loop. While at it also make the code a bit more readable.
A similar bug also exists for show_idle_time(). Fix this is as well.
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
unwind_for_each_frame stops after the first frame if regs->gprs[15] <=
sp.
The reason is that in case regs are specified, the first frame should be
regs->psw.addr and the second frame should be sp->gprs[8]. However,
currently the second frame is regs->gprs[15], which confuses
outside_of_stack().
Fix by introducing a flag to distinguish this special case from
unwinding the interrupt handler, for which the current behavior is
appropriate.
Fixes: 78c98f9074 ("s390/unwind: introduce stack unwind API")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.2+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The problem is that we were putting the NUL terminator too far:
buf[sizeof(buf) - 1] = '\0';
If the user input isn't NUL terminated and they haven't initialized the
whole buffer then it leads to an info leak. The NUL terminator should
be:
buf[len - 1] = '\0';
Signed-off-by: Yihui Zeng <yzeng56@asu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[heiko.carstens@de.ibm.com: keep semantics of how *lenp and *ppos are handled]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
perf_callchain_kernel stops neither when it encounters a garbage
address, nor when it runs out of space. Fix both issues using x86
version as an inspiration.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This function must be inlined since any caller expects the current
stack pointer; which wouldn't be true if the function isn't inlined.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Unlike pxd_free_tlb(), the pxd_free() functions do not check for folded
page tables. This is not an issue so far, as those functions will actually
never be called, since no code will reach them when page tables are folded.
In order to avoid future issues, and to make the s390 code more similar to
other architectures, add mm_pxd_folded() checks, similar to how it is done
in pxd_free_tlb().
This was found by testing a patch from from Anshuman Khandual, which is
currently discussed on LKML ("mm/debug: Add tests validating architecture
page table helpers").
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
On older HW or under a hypervisor, w/o the instruction-execution-
protection (IEP) facility, and also w/o EDAT-1, a translation-specification
exception may be recognized when bit 55 of a pte is one (_PAGE_NOEXEC).
The current code tries to prevent setting _PAGE_NOEXEC in such cases,
by removing it within set_pte_at(). However, ptep_set_access_flags()
will modify a pte directly, w/o using set_pte_at(). There is at least
one scenario where this can result in an active pte with _PAGE_NOEXEC
set, which would then lead to a panic due to a translation-specification
exception (write to swapped out page):
do_swap_page
pte = mk_pte (with _PAGE_NOEXEC bit)
set_pte_at (will remove _PAGE_NOEXEC bit in page table, but keep it
in local variable pte)
vmf->orig_pte = pte (pte still contains _PAGE_NOEXEC bit)
do_wp_page
wp_page_reuse
entry = vmf->orig_pte (still with _PAGE_NOEXEC bit)
ptep_set_access_flags (writes entry with _PAGE_NOEXEC bit)
Fix this by clearing _PAGE_NOEXEC already in mk_pte_phys(), where the
pgprot value is applied, so that no pte with _PAGE_NOEXEC will ever be
visible, if it is not supported. The check in set_pte_at() can then also
be removed.
Cc: <stable@vger.kernel.org> # 4.11+
Fixes: 57d7f939e7 ("s390: add no-execute support")
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
For pmds and puds, there are a couple of page table helper functions that
only make sense for large entries, like pxd_(mk)dirty/young/write etc.
We currently explicitly check if the entries are large, but in practice
those functions must never be used for normal entries, which point to lower
level page tables, so the code can be simplified.
This also fixes a theoretical bug, where common code could use one of the
functions before actually marking a pmd large, like this:
pmd = pmd_mkhuge(pmd_mkdirty(pmd))
With the current implementation, the resulting large pmd would not be dirty
as requested. This could in theory result in the loss of dirty information,
e.g. after collapsing into a transparent hugepage. Common code currently
always marks an entry large before using one of the functions, but there is
no hard requirement for this. The only requirement would be that it never
uses the functions for normal entries pointing to lower level page tables,
but they might be called before marking an entry large during its creation.
In order to avoid issues with future common code, and to simplify the page
table helpers, remove the checks for large entries and rely on common code
never using them for normal entries.
This was found by testing a patch from from Anshuman Khandual, which is
currently discussed on LKML ("mm/debug: Add tests validating architecture
page table helpers").
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The semantics of pmd/pud_bad() expect that large entries are reported as
bad, but we also check large entries for sanity.
There is currently no issue with this wrong behaviour, but let's conform
to the semantics by reporting large pmd/pud entries as bad, in order to
prevent future issues.
This was found by testing a patch from from Anshuman Khandual, which is
currently discussed on LKML ("mm/debug: Add tests validating architecture
page table helpers").
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The current implementation of get_clock_monotonic() leaves it up to
the caller to call the function with preemption disabled. The only
core kernel caller (sched_clock) however does not disable preemption.
In order to make sure that all callers of this function see monotonic
values handle disabling preemption within the function itself.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Currently get_wchan uses custom stack unwinding implementation which
relies on back_chain presence. Replace it with more abstract stack
unwinding api usage.
Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
unwind_for_each_frame(NULL, NULL, 0) does not return any valid frames.
The reason is that get_stack_pointer, unlike get_stack_info and
show_stack, does not handle NULL argument.
Fix by making get_stack_pointer treat NULL as current, like
get_stack_info and show_stack do.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Tested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
"noexec" option is already parsed during startup and its value is
exposed via noexec_disabled variable. Simply reuse that value during
machine facilities detection.
Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This is the s390 version of commit 40576e5e63 ("x86: alternative.h:
use asm_inline for all alternative variants").
See commit eb11186930 ("compiler-types.h: add asm_inline
definition") for more details.
With this change the compiler will not generate many out-of-line
versions for the three instruction sized arch_spin_unlock() function
anymore. Due to this gcc seems to change a lot of other inline
decisions which results in a net 6k text size growth according to
bloat-o-meter (gcc 9.2 with defconfig).
But that's still better than having many out-of-line versions of
arch_spin_unlock().
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This is the s390 version of commit 32ee8230b2 ("x86: bug.h: use
asm_inline in _BUG_FLAGS definitions").
See commit eb11186930 ("compiler-types.h: add asm_inline
definition") for more details.
Just like on x86 the .text section size decreases a bit while the
.data section size increases about the same amount (gcc 9.2 with
defconfig).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Convert the glue code for the S390 CPACF implementations of DES-ECB,
DES-CBC, DES-CTR, 3DES-ECB, 3DES-CBC, and 3DES-CTR from the deprecated
"blkcipher" API to the "skcipher" API. This is needed in order for the
blkcipher API to be removed.
Note: I made CTR use the same function for encryption and decryption,
since CTR encryption and decryption are identical.
Signed-off-by: Eric Biggers <ebiggers@google.com>
reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the glue code for the S390 CPACF protected key implementations
of AES-ECB, AES-CBC, AES-XTS, and AES-CTR from the deprecated
"blkcipher" API to the "skcipher" API. This is needed in order for the
blkcipher API to be removed.
Note: I made CTR use the same function for encryption and decryption,
since CTR encryption and decryption are identical.
Signed-off-by: Eric Biggers <ebiggers@google.com>
reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Convert the glue code for the S390 CPACF implementations of AES-ECB,
AES-CBC, AES-XTS, and AES-CTR from the deprecated "blkcipher" API to the
"skcipher" API. This is needed in order for the blkcipher API to be
removed.
Note: I made CTR use the same function for encryption and decryption,
since CTR encryption and decryption are identical.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit "bpf: Process in-kernel BTF" in linux-next introduced an undefined
__weak symbol, which results in an R_390_GLOB_DAT relocation type. That
is not yet handled by the KASLR relocation code, and the kernel stops with
the message "Unknown relocation type".
Add code to detect and handle R_390_GLOB_DAT relocation types and undefined
symbols.
Fixes: 805bc0bc23 ("s390/kernel: build a relocatable kernel")
Cc: <stable@vger.kernel.org> # v5.2+
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Depending on inlining decisions by the compiler, __get/put_user_fn
might become out of line. Then the compiler is no longer able to tell
that size can only be 1,2,4 or 8 due to the check in __get/put_user
resulting in false positives like
./arch/s390/include/asm/uaccess.h: In function ‘__put_user_fn’:
./arch/s390/include/asm/uaccess.h:113:9: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
113 | return rc;
| ^~
./arch/s390/include/asm/uaccess.h: In function ‘__get_user_fn’:
./arch/s390/include/asm/uaccess.h:143:9: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
143 | return rc;
| ^~
These functions are supposed to be always inlined. Mark it as such.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
To analyze some performance issues with lock contention and scheduling
it is nice to know when diag9c did not result in any action or when
no action was tried.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The names for the z13s and z14 ZR1 machines are missing for the
TUNE_Z13 and TUNE_Z14 descriptions. Just add them.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Make use of 'depends on cc-option' to only display those Kconfig
options for which compiler support is available.
Add this for the MARCH and TUNE options which are the only options
which may result in compile errors if the selected architecture is not
supported by the compiler.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
__insn32_query() will not compile if the compiler decides to not
inline it, since it contains an inline assembly with an "i" constraint
with variable contents.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The inline assembly constraints of __insn32_query() tell the compiler
that only the first byte of "query" is being written to. Intended was
probably that 32 bytes are written to.
Fix and simplify the code and just use a "memory" clobber.
Fixes: d668139718 ("KVM: s390: provide query function for instructions returning 32 byte")
Cc: stable@vger.kernel.org # v5.2+
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Always inline asm inlines with variable operands for "i" constraints,
since they won't compile if the compiler would decide to not inline
them.
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Convert two functions to static inline to get ride of W=1 GCC warnings
like,
mm/gup.c: In function 'gup_pte_range':
mm/gup.c:1816:16: warning: variable 'ptem' set but not used
[-Wunused-but-set-variable]
pte_t *ptep, *ptem;
^~~~
mm/mmap.c: In function 'acct_stack_growth':
mm/mmap.c:2322:16: warning: variable 'new_start' set but not used
[-Wunused-but-set-variable]
unsigned long new_start;
^~~~~~~~~
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/lkml/1570138596-11913-1-git-send-email-cai@lca.pw/
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>