If unwinder is looking at pt_regs which is not on stack then something
went wrong and an error has to be reported rather than successful
unwinding termination.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
CALL_ON_STACK is intended to be used for temporary stack switching with
potential return to the caller.
When CALL_ON_STACK is misused to switch from nodat stack to task stack
back_chain information would later lead stack unwinder from task stack into
(per cpu) nodat stack which is reused for other purposes. This would
yield confusing unwinding result or errors.
To avoid that introduce CALL_ON_STACK_NORETURN to be used instead. It
makes sure that back_chain is zeroed and unwinder finishes gracefully
ending up at task pt_regs.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Avoid mixture of task == NULL and task == current meaning the same
thing and simply always initialize task with current in unwind_start.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Make sure preemption is disabled when temporary switching to nodat
stack with CALL_ON_STACK helper, because nodat stack is per cpu.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
getcpu reads the required values for cpu and node with two
instructions. This might lead to an inconsistent result if user space
gets preempted and migrated to a different CPU between the two
instructions.
Fix this by using just a single instruction to read both values at
once.
This is currently rather a theoretical bug, since there is no real
NUMA support available (except for NUMA emulation).
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
When a secondary CPU is brought up it must initialize its control
registers. CPU A which triggers that a secondary CPU B is brought up
stores its control register contents into the lowcore of new CPU B,
which then loads these values on startup.
This is problematic in various ways: the control register which
contains the home space ASCE will correctly contain the kernel ASCE;
however control registers for primary and secondary ASCEs are
initialized with whatever values were present in CPU A.
Typically:
- the primary ASCE will contain the user process ASCE of the process
that triggered onlining of CPU B.
- the secondary ASCE will contain the percpu VDSO ASCE of CPU A.
Due to lazy ASCE handling we may also end up with other combinations.
When then CPU B switches to a different process (!= idle) it will
fixup the primary ASCE. However the problem is that the (wrong) ASCE
from CPU A was loaded into control register 1: as soon as an ASCE is
attached (aka loaded) a CPU is free to generate TLB entries using that
address space.
Even though it is very unlikey that CPU B will actually generate such
entries, this could result in TLB entries of the address space of the
process that ran on CPU A. These entries shouldn't exist at all and
could cause problems later on.
Furthermore the secondary ASCE of CPU B will not be updated correctly.
This means that processes may see wrong results or even crash if they
access VDSO data on CPU B. The correct VDSO ASCE will eventually be
loaded on return to user space as soon as the kernel executed a call
to strnlen_user or an atomic futex operation on CPU B.
Fix both issues by intializing the to be loaded control register
contents with the correct ASCEs and also enforce (re-)loading of the
ASCEs upon first context switch and return to user space.
Fixes: 0aaba41b58 ("s390: remove all code using the access register mode")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Pull scheduler updates from Ingo Molnar:
"The biggest changes in this cycle were:
- Make kcpustat vtime aware (Frederic Weisbecker)
- Rework the CFS load_balance() logic (Vincent Guittot)
- Misc cleanups, smaller enhancements, fixes.
The load-balancing rework is the most intrusive change: it replaces
the old heuristics that have become less meaningful after the
introduction of the PELT metrics, with a grounds-up load-balancing
algorithm.
As such it's not really an iterative series, but replaces the old
load-balancing logic with the new one. We hope there are no
performance regressions left - but statistically it's highly probable
that there *is* going to be some workload that is hurting from these
chnages. If so then we'd prefer to have a look at that workload and
fix its scheduling, instead of reverting the changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
rackmeter: Use vtime aware kcpustat accessor
leds: Use all-in-one vtime aware kcpustat accessor
cpufreq: Use vtime aware kcpustat accessors for user time
procfs: Use all-in-one vtime aware kcpustat accessor
sched/vtime: Bring up complete kcpustat accessor
sched/cputime: Support other fields on kcpustat_field()
sched/cpufreq: Move the cfs_rq_util_change() call to cpufreq_update_util()
sched/fair: Add comments for group_type and balancing at SD_NUMA level
sched/fair: Fix rework of find_idlest_group()
sched/uclamp: Fix overzealous type replacement
sched/Kconfig: Fix spelling mistake in user-visible help text
sched/core: Further clarify sched_class::set_next_task()
sched/fair: Use mul_u32_u32()
sched/core: Simplify sched_class::pick_next_task()
sched/core: Optimize pick_next_task()
sched/core: Make pick_next_task_idle() more consistent
sched/fair: Better document newidle_balance()
leds: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
cpufreq: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
procfs: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
...
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- Cross-arch changes to move the linker sections for NOTES and
EXCEPTION_TABLE into the RO_DATA area, where they belong on most
architectures. (Kees Cook)
- Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
trap jumps into the middle of those padding areas instead of
sliding execution. (Kees Cook)
- A thorough cleanup of symbol definitions within x86 assembler code.
The rather randomly named macros got streamlined around a
(hopefully) straightforward naming scheme:
SYM_START(name, linkage, align...)
SYM_END(name, sym_type)
SYM_FUNC_START(name)
SYM_FUNC_END(name)
SYM_CODE_START(name)
SYM_CODE_END(name)
SYM_DATA_START(name)
SYM_DATA_END(name)
etc - with about three times of these basic primitives with some
label, local symbol or attribute variant, expressed via postfixes.
No change in functionality intended. (Jiri Slaby)
- Misc other changes, cleanups and smaller fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
x86/entry/64: Remove pointless jump in paranoid_exit
x86/entry/32: Remove unused resume_userspace label
x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
m68k: Convert missed RODATA to RO_DATA
x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
x86/mm: Report actual image regions in /proc/iomem
x86/mm: Report which part of kernel image is freed
x86/mm: Remove redundant address-of operators on addresses
xtensa: Move EXCEPTION_TABLE to RO_DATA segment
powerpc: Move EXCEPTION_TABLE to RO_DATA segment
parisc: Move EXCEPTION_TABLE to RO_DATA segment
microblaze: Move EXCEPTION_TABLE to RO_DATA segment
ia64: Move EXCEPTION_TABLE to RO_DATA segment
h8300: Move EXCEPTION_TABLE to RO_DATA segment
c6x: Move EXCEPTION_TABLE to RO_DATA segment
arm64: Move EXCEPTION_TABLE to RO_DATA segment
alpha: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
x86/vmlinux: Actually use _etext for the end of the text segment
vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
...
- Adjust PMU device drivers registration to avoid WARN_ON and few other
perf improvements.
- Enhance tracing in vfio-ccw.
- Few stack unwinder fixes and improvements, convert get_wchan custom
stack unwinding to generic api usage.
- Fixes for mm helpers issues uncovered with tests validating architecture
page table helpers.
- Fix noexec bit handling when hardware doesn't support it.
- Fix memleak and unsigned value compared with zero bugs in crypto
code. Minor code simplification.
- Fix crash during kdump with kasan enabled kernel.
- Switch bug and alternatives from asm to asm_inline to improve inlining
decisions.
- Use 'depends on cc-option' for MARCH and TUNE options in Kconfig,
add z13s and z14 ZR1 to TUNE descriptions.
- Minor head64.S simplification.
- Fix physical to logical CPU map for SMT.
- Several cleanups in qdio code.
- Other minor cleanups and fixes all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl3ahGYACgkQjYWKoQLX
FBguwAgAig+FNos8zkd7Sr2wg4DPL2IlYERVP40fOLXfGuVUOnMLg8OTO6yDWDpH
5+cKAQS1wWgyvlfjWRUJ6anXLBAsgKRD1nyFIZTpn/wArGk/duCbnl/VFriDgrST
8KTQDJpZ9w9nXtQ7lA2QWaw5U2WG8I2T2JuQJCdLXze7RXi0bDVe8e6131NMaJ42
LLxqOqm8d8XDnd8oDVP04LT5IfhuI2cILoGBP/GyI2fqQk9Ems6M2gxuISq1COmy
WORDLfwWyCLeF7gWKKjxf8Vo1HYcyoFvdXnxWiHb0TDZesQZJr/LLELTP03fbCW9
U4jbXncnnPA7kT4tlC95jT5M69yK5w==
=+FxG
-----END PGP SIGNATURE-----
Merge tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Adjust PMU device drivers registration to avoid WARN_ON and few other
perf improvements.
- Enhance tracing in vfio-ccw.
- Few stack unwinder fixes and improvements, convert get_wchan custom
stack unwinding to generic api usage.
- Fixes for mm helpers issues uncovered with tests validating
architecture page table helpers.
- Fix noexec bit handling when hardware doesn't support it.
- Fix memleak and unsigned value compared with zero bugs in crypto
code. Minor code simplification.
- Fix crash during kdump with kasan enabled kernel.
- Switch bug and alternatives from asm to asm_inline to improve
inlining decisions.
- Use 'depends on cc-option' for MARCH and TUNE options in Kconfig, add
z13s and z14 ZR1 to TUNE descriptions.
- Minor head64.S simplification.
- Fix physical to logical CPU map for SMT.
- Several cleanups in qdio code.
- Other minor cleanups and fixes all over the code.
* tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
s390/cpumf: Adjust registration of s390 PMU device drivers
s390/smp: fix physical to logical CPU map for SMT
s390/early: move access registers setup in C code
s390/head64: remove unnecessary vdso_per_cpu_data setup
s390/early: move control registers setup in C code
s390/kasan: support memcpy_real with TRACE_IRQFLAGS
s390/crypto: Fix unsigned variable compared with zero
s390/pkey: use memdup_user() to simplify code
s390/pkey: fix memory leak within _copy_apqns_from_user()
s390/disassembler: don't hide instruction addresses
s390/cpum_sf: Assign error value to err variable
s390/cpum_sf: Replace function name in debug statements
s390/cpum_sf: Use consistant debug print format for sampling
s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr()
s390: add error handling to perf_callchain_kernel
s390: always inline current_stack_pointer()
s390/mm: add mm_pxd_folded() checks to pxd_free()
s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
s390/mm: simplify page table helpers for large entries
s390/mm: make pmd/pud_bad() report large entries as bad
...
Linux-next commit titled "perf/core: Optimize perf_init_event()"
changed the semantics of PMU device driver registration.
It was done to speed up the lookup/handling of PMU device driver
specific events. It also enforces that only one PMU device
driver will be registered of type PERF_EVENT_RAW.
This change added these line in function perf_pmu_register():
...
+ ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL);
+ if (ret < 0)
goto free_pdc;
+
+ WARN_ON(type >= 0 && ret != type);
The warn_on generates a message. We have 3 PMU device drivers,
each registered as type PERF_TYPE_RAW.
The cf_diag device driver (arch/s390/kernel/perf_cpumf_cf_diag.c)
always hits the WARN_ON because it is the second PMU device driver
(after sampling device driver arch/s390/kernel/perf_cpumf_sf.c)
which is registered as type 4 (PERF_TYPE_RAW).
So when the sampling device driver is registered, ret has value 4.
When cf_diag device driver is registered with type 4,
ret has value of 5 and WARN_ON fires.
Adjust the PMU device drivers for s390 to support the new
semantics required by perf_pmu_register().
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
If an SMT capable system is not IPL'ed from the first CPU the setup of
the physical to logical CPU mapping is broken: the IPL core gets CPU
number 0, but then the next core gets CPU number 1. Correct would be
that all SMT threads of CPU 0 get the subsequent logical CPU numbers.
This is important since a lot of code (like e.g. the CPU topology
code) assumes that CPU maps are setup like this. If the mapping is
broken the system will not IPL due to broken topology masks:
[ 1.716341] BUG: arch topology broken
[ 1.716342] the SMT domain not a subset of the MC domain
[ 1.716343] BUG: arch topology broken
[ 1.716344] the MC domain not a subset of the BOOK domain
This scenario can usually not happen since LPARs are always IPL'ed
from CPU 0 and also re-IPL is intiated from CPU 0. However older
kernels did initiate re-IPL on an arbitrary CPU. If therefore a re-IPL
from an old kernel into a new kernel is initiated this may lead to
crash.
Fix this by setting up the physical to logical CPU mapping correctly.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
vdso_per_cpu_data lowcore value is only needed for fully functional
exception handlers, which are activated in setup_lowcore_dat_off. The
same function does init vdso_per_cpu_data via vdso_alloc_boot_cpu.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Due to kptr_restrict, JITted BPF code is now displayed like this:
000000000b6ed1b2: ebdff0800024 stmg %r13,%r15,128(%r15)
000000004cde2ba0: 41d0f040 la %r13,64(%r15)
00000000fbad41b0: a7fbffa0 aghi %r15,-96
Leaking kernel addresses to dmesg is not a concern in this case, because
this happens only when JIT debugging is explicitly activated, which only
root can do.
Use %px in this particular instance, and also to print an instruction
address in show_code and PCREL (e.g. brasl) arguments in print_insn.
While at present functionally equivalent to %016lx, %px is recommended
by Documentation/core-api/printk-formats.rst for such cases.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
When starting the CPU Measurement sampling facility using
qsi() function, this function may return an error value.
This error value is referenced in the else part of the
if statement to dump its value in a debug statement.
Right now this value is always zero because it has not been
assigned a value.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Replace hard coded function names in debug statements
by the "%s ...", __func__ construct suggested by checkpatch.pl
script.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Use consistant debug print format of the form variable
blank value. Also add leading 0x for all hex values.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The current code around calling ftrace_graph_ret_addr() is ifdeffed and
also tests if ftrace redirection is present on stack.
ftrace_graph_ret_addr() however performs the test internally and there
is a version for !CONFIG_FUNCTION_GRAPH_TRACER as well. The unnecessary
code can thus be dropped.
Link: http://lkml.kernel.org/r/20191029143904.24051-2-mbenes@suse.cz
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The idle time reported in /proc/stat sometimes incorrectly contains
huge values on s390. This is caused by a bug in arch_cpu_idle_time().
The kernel tries to figure out when a different cpu entered idle by
accessing its per-cpu data structure. There is an ordering problem: if
the remote cpu has an idle_enter value which is not zero, and an
idle_exit value which is zero, it is assumed it is idle since
"now". The "now" timestamp however is taken before the idle_enter
value is read.
Which in turn means that "now" can be smaller than idle_enter of the
remote cpu. Unconditionally subtracting idle_enter from "now" can thus
lead to a negative value (aka large unsigned value).
Fix this by moving the get_tod_clock() invocation out of the
loop. While at it also make the code a bit more readable.
A similar bug also exists for show_idle_time(). Fix this is as well.
Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
unwind_for_each_frame stops after the first frame if regs->gprs[15] <=
sp.
The reason is that in case regs are specified, the first frame should be
regs->psw.addr and the second frame should be sp->gprs[8]. However,
currently the second frame is regs->gprs[15], which confuses
outside_of_stack().
Fix by introducing a flag to distinguish this special case from
unwinding the interrupt handler, for which the current behavior is
appropriate.
Fixes: 78c98f9074 ("s390/unwind: introduce stack unwind API")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.2+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
perf_callchain_kernel stops neither when it encounters a garbage
address, nor when it runs out of space. Fix both issues using x86
version as an inspiration.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Currently get_wchan uses custom stack unwinding implementation which
relies on back_chain presence. Replace it with more abstract stack
unwinding api usage.
Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
"noexec" option is already parsed during startup and its value is
exposed via noexec_disabled variable. Simply reuse that value during
machine facilities detection.
Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Commit "bpf: Process in-kernel BTF" in linux-next introduced an undefined
__weak symbol, which results in an R_390_GLOB_DAT relocation type. That
is not yet handled by the KASLR relocation code, and the kernel stops with
the message "Unknown relocation type".
Add code to detect and handle R_390_GLOB_DAT relocation types and undefined
symbols.
Fixes: 805bc0bc23 ("s390/kernel: build a relocatable kernel")
Cc: <stable@vger.kernel.org> # v5.2+
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
vtime_account_system() decides if we need to account the time to the
system (__vtime_account_system()) or to the guest (vtime_account_guest()).
So this function is a misnomer as we are on a higher level than
"system". All we know when we call that function is that we are
accounting kernel cputime. Whether it belongs to guest or system time
is a lower level detail.
Rename this function to vtime_account_kernel(). This will clarify things
and avoid too many underscored vtime_account_system() versions.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Link: https://lkml.kernel.org/r/20191003161745.28464-2-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix indentation in the s390 CPU Measuement Facility
sampling device dirver.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
s390 IBM z15 introduces a check if the CPU Mesurement Facility
sampling is temporarily unavailable. If this is the case return -EBUSY
and abort the setup of CPU Measuement facility sampling.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Use consistant debug print format of the form variable
blank value.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.
From the original description:
This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.
The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.
There are two major changes since this was last proposed for mainline:
- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/
- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.
The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.
The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:
lockdown={integrity|confidentiality}
Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.
This may also be controlled via /sys/kernel/security/lockdown and
overriden by kernel configuration.
New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.
The lockdown feature has had signficant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.
Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"
* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...
Pull integrity updates from Mimi Zohar:
"The major feature in this time is IMA support for measuring and
appraising appended file signatures. In addition are a couple of bug
fixes and code cleanup to use struct_size().
In addition to the PE/COFF and IMA xattr signatures, the kexec kernel
image may be signed with an appended signature, using the same
scripts/sign-file tool that is used to sign kernel modules.
Similarly, the initramfs may contain an appended signature.
This contained a lot of refactoring of the existing appended signature
verification code, so that IMA could retain the existing framework of
calculating the file hash once, storing it in the IMA measurement list
and extending the TPM, verifying the file's integrity based on a file
hash or signature (eg. xattrs), and adding an audit record containing
the file hash, all based on policy. (The IMA support for appended
signatures patch set was posted and reviewed 11 times.)
The support for appended signature paves the way for adding other
signature verification methods, such as fs-verity, based on a single
system-wide policy. The file hash used for verifying the signature and
the signature, itself, can be included in the IMA measurement list"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: ima_api: Use struct_size() in kzalloc()
ima: use struct_size() in kzalloc()
sefltest/ima: support appended signatures (modsig)
ima: Fix use after free in ima_read_modsig()
MODSIGN: make new include file self contained
ima: fix freeing ongoing ahash_request
ima: always return negative code for error
ima: Store the measurement again when appraising a modsig
ima: Define ima-modsig template
ima: Collect modsig
ima: Implement support for module-style appended signatures
ima: Factor xattr_verify() out of ima_appraise_measurement()
ima: Add modsig appraise_type option for module-style appended signatures
integrity: Select CONFIG_KEYS instead of depending on it
PKCS#7: Introduce pkcs7_get_digest()
PKCS#7: Refactor verify_pkcs7_signature()
MODSIGN: Export module signature definitions
ima: initialize the "template" field with the default template
arch_update_cpu_topology is first called from:
kernel_init_freeable->sched_init_smp->sched_init_domains
even before cpus has been registered in:
kernel_init_freeable->do_one_initcall->s390_smp_init
Do not trigger kobject_uevent change events until cpu devices are
actually created. Fixes the following kasan findings:
BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb40/0xee0
Read of size 8 at addr 0000000000000020 by task swapper/0/1
BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb36/0xee0
Read of size 8 at addr 0000000000000018 by task swapper/0/1
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
([<0000000143c6db7e>] show_stack+0x14e/0x1a8)
[<0000000145956498>] dump_stack+0x1d0/0x218
[<000000014429fb4c>] print_address_description+0x64/0x380
[<000000014429f630>] __kasan_report+0x138/0x168
[<0000000145960b96>] kobject_uevent_env+0xb36/0xee0
[<0000000143c7c47c>] arch_update_cpu_topology+0x104/0x108
[<0000000143df9e22>] sched_init_domains+0x62/0xe8
[<000000014644c94a>] sched_init_smp+0x3a/0xc0
[<0000000146433a20>] kernel_init_freeable+0x558/0x958
[<000000014599002a>] kernel_init+0x22/0x160
[<00000001459a71d4>] ret_from_fork+0x28/0x30
[<00000001459a71dc>] kernel_thread_starter+0x0/0x10
Cc: stable@vger.kernel.org
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
A perf_event can be set up to deliver overflow notifications
via SIGIO signal. The setup of the event is:
1. create event with perf_event_open()
2. assign it a signal for I/O notification with fcntl()
3. Install signal handler and consume samples
The initial setup of perf_event_open() determines the
period/frequency time span needed to elapse before each signal
is delivered to the user process.
While the event is active, system call
ioctl(.., PERF_EVENT_IOC_PERIOD, value) can be used the change
the frequency/period time span of the active event.
The remaining signal handler invocations honour the new value.
This does not work on s390. In fact the time span does not change
regardless of ioctl's third argument 'value'. The call succeeds
but the time span does not change.
Support this behavior and make it common with other platforms.
This is achieved by changing the interval value of the sampling
control block accordingly and feed this new value every time
the event is enabled using pmu_event_enable().
Before this change the interval value was set only once at
pmu_event_add() and never changed.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
- Add support for IBM z15 machines.
- Add SHA3 and CCA AES cipher key support in zcrypt and pkey refactoring.
- Move to arch_stack_walk infrastructure for the stack unwinder.
- Various kasan fixes and improvements.
- Various command line parsing fixes.
- Improve decompressor phase debuggability.
- Lift no bss usage restriction for the early code.
- Use refcount_t for reference counters for couple of places in
mm code.
- Logging improvements and return code fix in vfio-ccw code.
- Couple of zpci fixes and minor refactoring.
- Remove some outdated documentation.
- Fix secure boot detection.
- Other various minor code clean ups.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl1/pRoACgkQjYWKoQLX
FBjxLQf/Y1nlmoc8URLqaqfNTczIvUzdfXuahI7L75RoIIiqHtcHBrVwauSr7Lma
XVRzK/+6q0UPISrOIZEEtQKsMMM7rGuUv/+XTyrOB/Tsc31kN2EIRXltfXI/lkb8
BZdgch4Xs2rOD7y6TvqpYJsXYXsnLMWwCk8V+48V/pok4sEgMDgh0bTQRHPHYmZ6
1cv8ZQ0AeuVxC6ChM30LhajGRPkYd8RQ82K7fU7jxT0Tjzu66SyrW3pTwA5empBD
RI2yBZJ8EXwJyTCpvN8NKiBgihDs9oUZl61Dyq3j64Mb1OuNUhxXA/8jmtnGn0ok
O9vtImCWzExhjSMkvotuhHEC05nEEQ==
=LCgE
-----END PGP SIGNATURE-----
Merge tag 's390-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for IBM z15 machines.
- Add SHA3 and CCA AES cipher key support in zcrypt and pkey
refactoring.
- Move to arch_stack_walk infrastructure for the stack unwinder.
- Various kasan fixes and improvements.
- Various command line parsing fixes.
- Improve decompressor phase debuggability.
- Lift no bss usage restriction for the early code.
- Use refcount_t for reference counters for couple of places in mm
code.
- Logging improvements and return code fix in vfio-ccw code.
- Couple of zpci fixes and minor refactoring.
- Remove some outdated documentation.
- Fix secure boot detection.
- Other various minor code clean ups.
* tag 's390-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
s390: remove pointless drivers-y in drivers/s390/Makefile
s390/cpum_sf: Fix line length and format string
s390/pci: fix MSI message data
s390: add support for IBM z15 machines
s390/crypto: Support for SHA3 via CPACF (MSA6)
s390/startup: add pgm check info printing
s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
vfio-ccw: fix error return code in vfio_ccw_sch_init()
s390: vfio-ap: fix warning reset not completed
s390/base: remove unused s390_base_mcck_handler
s390/sclp: Fix bit checked for has_sipl
s390/zcrypt: fix wrong handling of cca cipher keygenflags
s390/kasan: add kdump support
s390/setup: avoid using strncmp with hardcoded length
s390/sclp: avoid using strncmp with hardcoded length
s390/module: avoid using strncmp with hardcoded length
s390/pci: avoid using strncmp with hardcoded length
s390/kaslr: reserve memory for kasan usage
s390/mem_detect: provide single get_mem_detect_end
s390/cmma: reuse kstrtobool for option value parsing
...
Rewrite some lines to match line length and replace
format string 0x%x to %#x. Add and remove blank line.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add detection for machine types 0x8562 and 8x8561 and set the ELF platform
name to z15. Add the miscellaneous-instruction-extension 3 facility to
the list of facilities for z15.
And allow to generate code that only runs on a z15 machine.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
I accidentally typoed this #ifdef, so verification would always be
disabled.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reported-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
s390_base_mcck_handler was used during system reset if diag308 set was
not available. But after commit d485235b00 ("s390: assume diag308 set
always works") is a dead code and could be removed.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Replace strncmp usage in console mode setup code with simple strcmp.
Replace strncmp which is used for prefix comparison with str_has_prefix.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
"earlyprintk" option documentation does not clearly state which
platform supports which additional values (e.g. ",keep"). Preserve old
option behaviour and reuse str_has_prefix instead of strncmp for prefix
testing.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reuse str_has_prefix instead of strncmp with hardcoded length to
make the intent of a comparison more obvious.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
"vdso" option setup already recognises integer and textual values. Yet
kstrtobool is a more common way to parse boolean values, reuse it to
unify option value parsing behavior and simplify code a bit.
While at it, __setup value parsing callbacks are expected to return
1 when an option is recognized, and returning any other value won't
trigger any error message currently, so simply return 1.
Also don't change default vdso_enabled value of 1 when "vdso" option
value is invalid.
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Use common arch_stack_walk infrastructure to avoid duplicated code and
avoid taking care of the stack storage and filtering.
Common code also uses try_get_task_stack/put_task_stack when needed which
have been missing in our code, which also solves potential problem for us.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reading other running task's stack can be a dangerous endeavor. Kasan
stack memory access instrumentation includes special prologue and epilogue
to mark/remove red zones in shadow memory between stack variables. For
that reason there is always a race between a task reading value in other
task's stack and that other task returning from a function and entering
another one generating different red zones pattern.
To avoid kasan reports simply perform uninstrumented memory reads.
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
With THREAD_INFO_IN_TASK (which is selected on s390) task's stack usage
is refcounted and should always be protected by get/put when touching
other task's stack to avoid race conditions with task's destruction code.
Fixes: d5c352cdd0 ("s390: move thread_info into task_struct")
Cc: stable@vger.kernel.org # v4.10+
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Clean uncompressed kernel .bss section in the startup code before
the uncompressed kernel is executed. At this point of time initrd and
certificates have been already rescued. Uncompressed kernel .bss size
is known from vmlinux_info. It is also taken into consideration during
uncompressed kernel positioning by kaslr (so it is safe to clean it).
With that uncompressed kernel is starting with .bss section zeroed and
no .bss section usage restrictions apply. Which makes chkbss checks for
uncompressed kernel objects obsolete and they can be removed.
early_nobss.c is also not needed anymore. Parts of it which are still
relevant are moved to early.c. Kasan initialization code is now called
directly from head64 (early.c is instrumented and should not be
executed before kasan shadow memory is set up).
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Few other crucial memory setup options are already handled in
the startup code. Those values are needed by kaslr and kasan
implementations. "vmalloc" is the last piece required for future
improvements such as early decision on kernel page levels depth required
for actual memory setup, as well as vmalloc memory area access monitoring
in kasan.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This is a preparatory patch for kexec_file_load() lockdown. A locked down
kernel needs to prevent unsigned kernel images from being loaded with
kexec_file_load(). Currently, the only way to force the signature
verification is compiling with KEXEC_VERIFY_SIG. This prevents loading
usigned images even when the kernel is not locked down at runtime.
This patch splits KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE.
Analogous to the MODULE_SIG and MODULE_SIG_FORCE for modules, KEXEC_SIG
turns on the signature verification but allows unsigned images to be
loaded. KEXEC_SIG_FORCE disallows images without a valid signature.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
cc: kexec@lists.infradead.org
Signed-off-by: James Morris <jmorris@namei.org>
s390 does not map the vdso for statically linked binaries, assuming
that this doesn't make sense. See commit fc5243d98a ("[S390]
arch_setup_additional_pages arguments").
However with glibc commit d665367f596d ("linux: Enable vDSO for static
linking as default (BZ#19767)") and commit 5e855c895401 ("s390: Enable
VDSO for static linking") the vdso is also used for statically linked
binaries - if the kernel would make it available.
Therefore map the vdso always, just like all other architectures.
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Perf relies on _etext and _stext symbols being one of 't', 'T', 'v' or
'V'. Put them into .text section to guarantee that.
Also moves padding to page boundary inside .text which has an effect that
.text section is now padded with nops rather than 0's, which apparently
has been the initial intention for specifying 0x0700 fill expression.
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Andreas Krebbel <krebbel@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Cleanup labels in head64 some of which are not being used since git
recorded history.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Remove pointless stack recursion on stack type ... warning, which
only confuses people. There is no way to make backchain unwinder 100%
reliable. When a task is interrupted in-between stack frame allocation
and backchain write instructions new stack frame backchain pointer is
left uninitialized (there are also sometimes additional instruction
in-between stack frame allocation and backchain write instructions due
to gcc shrink-wrapping). In attempt to unwind such stack the unwinder
would still try to use that invalid backchain value and perform all kind
of sanity checks on it to make sure we are not pointed out of stack. In
some cases that invalid backchain value would be 0 and we would falsely
treat next stackframe as pt_regs and again gprs[15] in those pt_regs
might happen to point at some address within the task's stack.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
After some investigation it doesn't look like init_mm fields
start_code/end_code are used anywhere besides potentially in dump_mm for
debugging purposes. Originally the value of 0 for start_code reflected
the presence of lowcore and early boot code. But with kaslr in place
start_code/end_code range should not span over unoccupied by the code
segment memory. So, adjust init_mm start_code to point at the beginning
of the code segment like other architectures do it.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This reverts commit db9492cef4 ("s390/protvirt: add memory sharing for
diag 308 set/store") which due to ultravisor implementation change is
not needed after all.
Fixes: db9492cef4 ("s390/protvirt: add memory sharing for diag 308 set/store")
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
IMA will use the module_signature format for append signatures, so export
the relevant definitions and factor out the code which verifies that the
appended signature trailer is valid.
Also, create a CONFIG_MODULE_SIG_FORMAT option so that IMA can select it
and be able to use mod_check_sig() without having to depend on either
CONFIG_MODULE_SIG or CONFIG_MODULES.
s390 duplicated the definition of struct module_signature so now they can
use the new <linux/module_signature.h> header instead.
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Acked-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Include <asm/kexec.h> into machine_kexec_reloc.c to expose
arch_kexec_do_relocs declaration and avoid the following sparse warnings:
arch/s390/kernel/machine_kexec_reloc.c:4:5: warning: symbol 'arch_kexec_do_relocs' was not declared. Should it be static?
arch/s390/boot/../kernel/machine_kexec_reloc.c:4:5: warning: symbol 'arch_kexec_do_relocs' was not declared. Should it be static?
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Since there is really no reason for cf_diag_csd per cpu variable to be
globally visible make it static to avoid the following sparse warning:
arch/s390/kernel/perf_cpum_cf_diag.c:37:1: warning: symbol 'cf_diag_csd' was not declared. Should it be static?
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Tested (64-bit and compat mode) using program from
http://lkml.kernel.org/r/20190604212930.jaaztvkent32b7d3@brauner.io
with the following:
return syscall(__NR_clone, flags, 0, pidfd, 0, 0);
changed to:
return syscall(__NR_clone, 0, flags, pidfd, 0, 0);
due to CLONE_BACKWARDS2.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range. This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.
On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.
The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:
$ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
248
Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.
This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:
# scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
Data old new delta
sysctl_vals - 12 +12
__kstrtab_sysctl_vals - 12 +12
max 14 10 -4
int_max 16 - -16
one 68 - -68
zero 128 28 -100
Total: Before=20583249, After=20583085, chg -0.00%
[mcroce@redhat.com: tipc: remove two unused variables]
Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
[akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
[arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
[akpm@linux-foundation.org: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix integer overflow during stack frame unwind with invalid backchain.
- Cleanup unused symbol export in zcrypt code.
- Fix MIO addressing control activation in PCI code and expose its
usage via sysfs.
- Fix kernel image signature verification report presence detection.
- Fix irq registration in vfio-ap code.
- Add CPU measurement counters for newer machines.
- Add base DASD thin provisioning support and code cleanups.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl0oirgACgkQjYWKoQLX
FBiL7gf+MOToP48a3h+lLcIrbH48B2+OR3W+kIID5qR0GtDoPU2gd2HtSwrn9frs
jgh5ZLwGgrnSU/MqFXpCwDfD7x0mSWL/HunlSck1zf6h22LmuYjcjntWzHTS7csv
gTFKNSQX4AdgZAdTEeqC+Axem2ygtpELhe35NPT8HBQD52twe1XkniDl8gu7/zj2
WdPAGsm30wlodIUFt+di4IQAxSQGkDl9nKN+IIREPjdq/kFKP3pFKzXf5k0FduVk
v+VOb+KENzeJoqPqRD/GS+vdVm52gqRbKrJVm+p5NlltGDPoa0m3u4Cjy2ZVA2/t
+BKv4aGHseb6x89XTPIaf1dTLDNf6w==
=GxOr
-----END PGP SIGNATURE-----
Merge tag 's390-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Fix integer overflow during stack frame unwind with invalid
backchain.
- Cleanup unused symbol export in zcrypt code.
- Fix MIO addressing control activation in PCI code and expose its
usage via sysfs.
- Fix kernel image signature verification report presence detection.
- Fix irq registration in vfio-ap code.
- Add CPU measurement counters for newer machines.
- Add base DASD thin provisioning support and code cleanups.
* tag 's390-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
s390/unwind: avoid int overflow in outside_of_stack
s390/zcrypt: remove the exporting of ap_query_configuration
s390/pci: add mio_enabled attribute
s390: fix setting of mio addressing control
s390/ipl: Fix detection of has_secure attribute
s390: vfio-ap: fix irq registration
s390/cpumf: Add extended counter set definitions for model 8561 and 8562
s390/dasd: Handle out-of-space constraint
s390/dasd: Add discard support for ESE volumes
s390/dasd: Use ALIGN_DOWN macro
s390/dasd: Make dasd_setup_queue() a discipline function
s390/dasd: Add new ioctl to release space
s390/dasd: Add dasd_sleep_on_queue_interruptible()
s390/dasd: Add missing intensity definition
s390/dasd: Fix whitespace
s390/dasd: Add dynamic formatting support for ESE volumes
s390/dasd: Recognise data for ESE volumes
s390/dasd: Put sub-order definitions in a separate section
s390/dasd: Make layout analysis ESE compatible
s390/dasd: Remove old defines and function
...
When current task is interrupted in-between stack frame allocation
and backchain write instructions new stack frame backchain pointer
is left uninitialized. That invalid backchain value is passed into
outside_of_stack for sanity check. Make sure int overflow does not happen
by subtracting stack_frame size from the stack "end" rather than adding
it to "random" backchain value.
Fixes: 41b0474c1b1c ("s390/unwind: introduce stack unwind API")
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Move enablement of mio addressing control from detect_machine_facilities
to pci_base_init. detect_machine_facilities runs so early that the
static branches have not been toggled yet, thus mio addressing control
was always off. In pci_base_init we have to use the SMP aware
ctl_set_bit though.
Fixes: 833b441ec0 ("s390: enable processes for mio instructions")
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Use the correct bit for detection of the machine capability associated
with the has_secure attribute. It is expected that the underlying
platform (including hypervisors) unsets the bit when they don't provide
secure ipl for their guests.
Fixes: c9896acc78 ("s390/ipl: Provide has_secure sysfs attribute")
Cc: stable@vger.kernel.org # 5.2
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add the extended counter set definitions for s390 machine types
8561 and 8262. They are identical with machine types 3906 and
3907.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXSMhUgAKCRCRxhvAZXjc
okkiAQC3Hlg/O2JoIb4PqgEvBkpHSdVxyuWagn0ksjACW9ANKQEAl5OadMhvOq16
UHGhKlpE/M8HflknIffoEGlIAWHrdwU=
=7kP5
-----END PGP SIGNATURE-----
Merge tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd updates from Christian Brauner:
"This adds two main features.
- First, it adds polling support for pidfds. This allows process
managers to know when a (non-parent) process dies in a race-free
way.
The notification mechanism used follows the same logic that is
currently used when the parent of a task is notified of a child's
death. With this patchset it is possible to put pidfds in an
{e}poll loop and get reliable notifications for process (i.e.
thread-group) exit.
- The second feature compliments the first one by making it possible
to retrieve pollable pidfds for processes that were not created
using CLONE_PIDFD.
A lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these
processes a caller can currently not create a pollable pidfd. This
is a problem for Android's low memory killer (LMK) and service
managers such as systemd.
Both patchsets are accompanied by selftests.
It's perhaps worth noting that the work done so far and the work done
in this branch for pidfd_open() and polling support do already see
some adoption:
- Android is in the process of backporting this work to all their LTS
kernels [1]
- Service managers make use of pidfd_send_signal but will need to
wait until we enable waiting on pidfds for full adoption.
- And projects I maintain make use of both pidfd_send_signal and
CLONE_PIDFD [2] and will use polling support and pidfd_open() too"
[1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22
[2] aab6e3eb73/src/lxc/start.c (L1753)
* tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: add pidfd_open() tests
arch: wire-up pidfd_open()
pid: add pidfd_open()
pidfd: add polling selftests
pidfd: add polling support
- Improve the handling of shared ACPI power resources in the PCI
bus type layer (Mika Westerberg).
- Make the PCI layer take link delays required by the PCIe spec
into account as appropriate and avoid polling devices in D3cold
for PME (Mika Westerberg).
- Fix some corner case issues in ACPI device power management and
in the PCI bus type layer, optimiza and clean up the handling of
runtime-suspended PCI devices during system-wide transitions to
sleep states (Rafael Wysocki).
- Rework hibernation handling in the ACPI core and the PCI bus type
to resume runtime-suspended devices before hibernation (which
allows some functional problems to be avoided) and fix some ACPI
power management issues related to hiberation (Rafael Wysocki).
- Extend the operating performance points (OPP) framework to support
a wider range of devices (Rajendra Nayak, Stehpen Boyd).
- Fix issues related to genpd_virt_devs and issues with platforms
using the set_opp() callback in the OPP framework (Viresh Kumar,
Dmitry Osipenko).
- Add new cpufreq driver for Raspberry Pi (Nicolas Saenz Julienne).
- Add new cpufreq driver for imx8m and imx7d chips (Leonard Crestez).
- Fix and clean up the pcc-cpufreq, brcmstb-avs-cpufreq, s5pv210,
and armada-37xx cpufreq drivers (David Arcari, Florian Fainelli,
Paweł Chmiel, YueHaibing).
- Clean up and fix the cpufreq core (Viresh Kumar, Daniel Lezcano).
- Fix minor issue in the ACPI system sleep support code and export
one function from it (Lenny Szubowicz, Dexuan Cui).
- Clean up assorted pieces of PM code and documentation (Kefeng Wang,
Andy Shevchenko, Bart Van Assche, Greg Kroah-Hartman, Fuqian Huang,
Geert Uytterhoeven, Mathieu Malaterre, Rafael Wysocki).
- Update the pm-graph utility to v5.4 (Todd Brandt).
- Fix and clean up the cpupower utility (Abhishek Goel, Nick Black).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl0jK18SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxgEAP/RbPe71Y9ufKu64L3EtgV6mS9iuLEhux
/Ad9laLNeM1b0oceT3QxGk7xQCacnZcBlcaqXVWI4NRsn4RBZp1cYZngpgJ9DP6E
ONr8hzyzDOMVReba3XJIF8H+WoTKjywMYtFutjdx6dRe2ZJLutqnuZ0JbH1YSSK7
IxOt0mJVALf2M4Zz7F17d+n3yGE/4xAPBVbj/rBRcTEsGYlR/Hoxs7iF6EBau7fy
R5drUH6XSrWk8adc+z7l3BTGqMMYj9deRSfAWB3wpM4YK7Fv7msX/amBoGINkdn6
xP/ZcrHvhKKzE89MS8OUGP4rGVwq+7tu6mktnYL/tpKgutJqqx5LVvrLsGDSWr+W
/aJExN8Eb4Jh98C6vog3XUJoqBxkVGbU8qoCBU3jlFsaznFEWjW9IKhBHs5CIaqz
MXte6AsJ8lvFzxILjvx0m2206wNpRJRXYLX3a/BSBxa4OgOESjIpBTmbPfOwbxwj
8z9hIVlDTTDtnF6BEyDQr1fjPi3Mxl7ibGnoqRrJm36VKBy9VZNNwG/0Y2oSvm6k
Es8CiTWA3A/46dCZxGr18/9Vbfxn1Yvg9QZ1lCE5Fqij+0F2yRApbHZgUro+1rji
6J8OWC5r5JccdKuGHh4RH/asMFhD0cAR/gUsRzS4dz/wz2jVIN1FstdD1aKN5GBy
d0lchx/AKR5H
=aBN3
-----END PGP SIGNATURE-----
Merge tag 'pm-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These update PCI and ACPI power management (improved handling of ACPI
power resources and PCIe link delays, fixes related to corner cases,
hibernation handling rework), fix and extend the operating performance
points (OPP) framework, add new cpufreq drivers for Raspberry Pi and
imx8m chips, update some other cpufreq drivers, clean up assorted
pieces of PM code and documentation and update tools.
Specifics:
- Improve the handling of shared ACPI power resources in the PCI bus
type layer (Mika Westerberg).
- Make the PCI layer take link delays required by the PCIe spec into
account as appropriate and avoid polling devices in D3cold for PME
(Mika Westerberg).
- Fix some corner case issues in ACPI device power management and in
the PCI bus type layer, optimiza and clean up the handling of
runtime-suspended PCI devices during system-wide transitions to
sleep states (Rafael Wysocki).
- Rework hibernation handling in the ACPI core and the PCI bus type
to resume runtime-suspended devices before hibernation (which
allows some functional problems to be avoided) and fix some ACPI
power management issues related to hiberation (Rafael Wysocki).
- Extend the operating performance points (OPP) framework to support
a wider range of devices (Rajendra Nayak, Stehpen Boyd).
- Fix issues related to genpd_virt_devs and issues with platforms
using the set_opp() callback in the OPP framework (Viresh Kumar,
Dmitry Osipenko).
- Add new cpufreq driver for Raspberry Pi (Nicolas Saenz Julienne).
- Add new cpufreq driver for imx8m and imx7d chips (Leonard Crestez).
- Fix and clean up the pcc-cpufreq, brcmstb-avs-cpufreq, s5pv210, and
armada-37xx cpufreq drivers (David Arcari, Florian Fainelli, Paweł
Chmiel, YueHaibing).
- Clean up and fix the cpufreq core (Viresh Kumar, Daniel Lezcano).
- Fix minor issue in the ACPI system sleep support code and export
one function from it (Lenny Szubowicz, Dexuan Cui).
- Clean up assorted pieces of PM code and documentation (Kefeng Wang,
Andy Shevchenko, Bart Van Assche, Greg Kroah-Hartman, Fuqian Huang,
Geert Uytterhoeven, Mathieu Malaterre, Rafael Wysocki).
- Update the pm-graph utility to v5.4 (Todd Brandt).
- Fix and clean up the cpupower utility (Abhishek Goel, Nick Black)"
* tag 'pm-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (57 commits)
ACPI: PM: Make acpi_sleep_state_supported() non-static
PM: sleep: Drop dev_pm_skip_next_resume_phases()
ACPI: PM: Unexport acpi_device_get_power()
Documentation: ABI: power: Add missing newline at end of file
ACPI: PM: Drop unused function and function header
ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS
ACPI: PM: Simplify and fix PM domain hibernation callbacks
PCI: PM: Simplify bus-level hibernation callbacks
PM: ACPI/PCI: Resume all devices during hibernation
cpufreq: Avoid calling cpufreq_verify_current_freq() from handle_update()
cpufreq: Consolidate cpufreq_update_current_freq() and __cpufreq_get()
kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset()
cpufreq: Don't skip frequency validation for has_target() drivers
PCI: PM/ACPI: Refresh all stale power state data in pci_pm_complete()
PCI / ACPI: Add _PR0 dependent devices
ACPI / PM: Introduce concept of a _PR0 dependent device
PCI / ACPI: Use cached ACPI device state to get PCI device power state
ACPI: PM: Allow transitions to D0 to occur in special cases
ACPI: PM: Avoid evaluating _PS3 on transitions from D3hot to D3cold
cpufreq: Use has_target() instead of !setpolicy
...
Pull force_sig() argument change from Eric Biederman:
"A source of error over the years has been that force_sig has taken a
task parameter when it is only safe to use force_sig with the current
task.
The force_sig function is built for delivering synchronous signals
such as SIGSEGV where the userspace application caused a synchronous
fault (such as a page fault) and the kernel responded with a signal.
Because the name force_sig does not make this clear, and because the
force_sig takes a task parameter the function force_sig has been
abused for sending other kinds of signals over the years. Slowly those
have been fixed when the oopses have been tracked down.
This set of changes fixes the remaining abusers of force_sig and
carefully rips out the task parameter from force_sig and friends
making this kind of error almost impossible in the future"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
signal: Remove the signal number and task parameters from force_sig_info
signal: Factor force_sig_info_to_task out of force_sig_info
signal: Generate the siginfo in force_sig
signal: Move the computation of force into send_signal and correct it.
signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
signal: Remove the task parameter from force_sig_fault
signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
signal: Explicitly call force_sig_fault on current
signal/unicore32: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from ptrace_break
signal/nds32: Remove tsk parameter from send_sigtrap
signal/riscv: Remove tsk parameter from do_trap
signal/sh: Remove tsk parameter from force_sig_info_fault
signal/um: Remove task parameter from send_sigtrap
signal/x86: Remove task parameter from send_sigtrap
signal: Remove task parameter from force_sig_mceerr
signal: Remove task parameter from force_sig
signal: Remove task parameter from force_sigsegv
...
For non-static-inlines, debug.c already had non-compliant function
header docs. So move the pure prototype kdocs of
("s390: include/asm/debug.h add kerneldoc markups")
from debug.h to debug.c and merge them with the old function docs.
Also, I had the impression that kdoc typically is at the implementation
in the compile unit rather than at the prototype in the header file.
While at it, update the short kdoc description to distinguish the
different functions. And a few more consistency cleanups.
Added a new kdoc for debug_set_critical() since debug.h comments it
as part of the API.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <1562149189-1417-3-git-send-email-maier@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Avoid kasan false positive when current task is interrupted in-between
stack frame allocation and backchain write instructions leaving new stack
frame backchain invalid. In particular if backchain is 0 the unwinder
tries to read pt_regs from the stack and might hit kasan poisoned bytes,
leading to kasan "stack-out-of-bounds" report.
Disable kasan instrumentation of unwinder stack reads, since this
limitation couldn't be handled otherwise with current backchain unwinder
implementation.
Fixes: 78c98f9074 ("s390/unwind: introduce stack unwind API")
Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
Tested-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
stop_machine is the only user left of cpu_relax_yield. Given that it
now has special semantics which are tied to stop_machine introduce a
weak stop_machine_yield function which architectures can override, and
get rid of the generic cpu_relax_yield implementation.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The stop_machine loop to advance the state machine and to wait for all
affected CPUs to check-in calls cpu_relax_yield in a tight loop until
the last missing CPUs acknowledged the state transition.
On a virtual system where not all logical CPUs are backed by real CPUs
all the time it can take a while for all CPUs to check-in. With the
current definition of cpu_relax_yield a diagnose 0x44 is done which
tells the hypervisor to schedule *some* other CPU. That can be any
CPU and not necessarily one of the CPUs that need to run in order to
advance the state machine. This can lead to a pretty bad diagnose 0x44
storm until the last missing CPU finally checked-in.
Replace the undirected cpu_relax_yield based on diagnose 0x44 with a
directed yield. Each CPU in the wait loop will pick up the next CPU
in the cpumask of stop_machine. The diagnose 0x9c is used to tell the
hypervisor to run this next CPU instead of the current one. If there
is only a limited number of real CPUs backing the virtual CPUs we
end up with the real CPUs passed around in a round-robin fashion.
[heiko.carstens@de.ibm.com]:
Use cpumask_next_wrap as suggested by Peter Zijlstra.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The declaration for pfn_is_nosave is only available in
kernel/power/power.h. Since this function can be override in arch,
expose it globally. Having a prototype will make sure to avoid warning
(sometime treated as error with W=1) such as:
arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]
This moves the declaration into a globally visible header file and add
missing include to avoid a warning on powerpc.
Also remove the duplicated prototypes since not required anymore.
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the CONFIG_CRASH_DUMP ifdef to get rid of this:
arch/s390/kernel/machine_kexec.c:146:22: warning: 'do_start_kdump' defined but not used [-Wunused-function]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
x86 and powerpc (partially) enforce already CONFIG_HOTPLUG_CPU. On
s390 it is enabled on all distributions by default since ages.
The only exception is our zfcpdump kernel.
However to simplify testing, enforce HOTPLUG_CPU. This was suggested
by Paul McKenney, since his rcutorture test environments for CONFIG_SMP=y
only support HOTPLUG_CPU=y.
Suggested-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
There never have been distributions that shiped with CONFIG_SMP=n for
s390. In addition the kernel currently doesn't even compile with
CONFIG_SMP=n for s390. Most likely it wouldn't even work, even if we
fix the compile error, since nobody tests it, since there is no use
case that I can think of.
Therefore simply enforce CONFIG_SMP and get rid of some more or
less unused code.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Sync with binutils and add a couple of missing instructions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The use of stop_machine to replace the mask bits of the jump label branch
is a very heavy-weight operation. This is in fact not necessary, the
mask of the branch can simply be updated, followed by a signal processor
to all the other CPUs to force them to pick up the modified instruction.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[heiko.carstens@de.ibm.com]: Change jump_label_make_nop() so we get
brcl 0,offset instead of brcl 0,0. This
makes sure that only the mask part of the
instruction gets changed when updated.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from from force_sig_fault to make it explicit that is what is going
on.
The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.
The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:
force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
All of the remaining callers pass current into force_sig so
remove the task parameter to make this obvious and to make
misuse more difficult in the future.
This also makes it clear force_sig passes current into force_sig_info.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull more vfs mount updates from Al Viro:
"Propagation of new syscalls to other architectures + cosmetic change
from Christian (fscontext didn't follow the convention for anon inode
names)"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]
uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]
uapi, fsopen: use square brackets around "fscontext" [ver #2]
Christoph Hellwig writes:
This is a series doing two somewhat interwinded things. It improves
the asm-generic nommu uaccess helper to optionally be entirely generic
and not require any arch helpers for the actual uaccess. For the
generic uaccess.h to actually be generically useful I also had to kill
off the mess we made of <asm/segment.h>, which really shouldn't exist
on most architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcv22JAAoJEGCrR//JCVInH3YP/i7hVIEo4azoRB3/PwFTPel8
Buq9BUrGy1kptLWyi1yuOZf874gF6351hkVUo4EYq/uZ3x41ciojxFgbsIriAU0p
2xzJAWY0YaVuWM+PIVj6KjyDA0N7/U5PcAG+03pl1Lhf0GHldmMbQIWt8D8HXbB8
gyaOeBGM9BneKd8Xu0COUaId9/3GXUwsy2zYc77+PxaHHYJzGDPB5lqNdU0gbB54
P9uXI2mpoAepFMsFgP6/FBvT/jCiMifRIdTXPD94NtjfG+Q4lo+LBQ6bpcLfw4ZD
VNh0982Cyl5n7FNetlTK4CPBn0RZsmBRriMotYfXeghFg0mmNTLwNEMg1u6RQ+uq
VYoBrVGhnx4SFB8xdkqO4md6UwprR2SERIIKwuCTbhwSgs+NkB7t4ftOwTzyQ1/6
7WQjclAIxQK9J7uwAeRGCvyrNJplqSfOA/hRjuq/Z0BCE/+m26Gxfv4aDztU5wFO
FWj2uFGTMuufp+DKoh5Vj5aJiFwfqK5/w1VYWSQdaoiWsHlmmu5IkTrrZyz+S3Tj
cifO9Ghe75Pt+rDllc8yqzKYXa5mL89sWyKiy+2ItOvGVh5EqXBbPXtCrFFIHRFL
WojVPu0yO+OoP0sEdamT/4FxbWO9VrV5YXdaRg/GjVA9ARlKNpFrZbuHtvEitwpi
7AbhxMZwBSSb9R3cz67J
=X2CH
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-nommu' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull nommu generic uaccess updates from Arnd Bergmann:
"asm-generic: kill <asm/segment.h> and improve nommu generic uaccess helpers
Christoph Hellwig writes:
This is a series doing two somewhat interwinded things. It improves
the asm-generic nommu uaccess helper to optionally be entirely
generic and not require any arch helpers for the actual uaccess.
For the generic uaccess.h to actually be generically useful I also
had to kill off the mess we made of <asm/segment.h>, which really
shouldn't exist on most architectures"
* tag 'asm-generic-nommu' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: optimize generic uaccess for 8-byte loads and stores
asm-generic: provide entirely generic nommu uaccess
arch: mostly remove <asm/segment.h>
asm-generic: don't include <asm/segment.h> from <asm/uaccess.h>
Wire up the mount API syscalls on non-x86 arches.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this
will allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJc0CCEAAoJEDjwexyKj9rgjmkH/A3e2drvuP/hSF3xfCKTQFdx
/PoLHQVCqENB3HU3FA/ljoXuG6jMgwj61looqlxBNumXFpIfTg0E1JC5S4wRGJ+K
cOVhIKV53gcuZkRcCJQp0WMnGzpk1Daf7iYXYmAl+7e+mREUPxOuJ0Ei6vXvRGZS
8cQrUCGrtPgkAeLlndypHI2M2TDDGJIMczOGbOZau8+8Lo7Wq9zt5y0h/v0ew37g
ogA0eGh6koU1435dt2pclZRiZ1XOcar3Uin9ioT+RnSgJ4pr1Pza/F6IGO0RdQa+
rva990lqGFp5r9lE4rMCwK9LWb/rfHdVPd35t9XPwphnQ/ORoWUwLk3uc5XOHow=
=dbuy
-----END PGP SIGNATURE-----
Merge tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this will
allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
* tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (98 commits)
s390/vdso: drop unnecessary cc-ldoption
s390: fix clang -Wpointer-sign warnigns in boot code
s390: drop CONFIG_VIRT_TO_BUS
s390: boot, purgatory: pass $(CLANG_FLAGS) where needed
s390: only build for new CPUs with clang
s390: simplify disabled_wait
s390/ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
s390/unwind: introduce stack unwind API
s390/opcodes: add missing instructions to the disassembler
s390/bug: add entry size to the __bug_table section
s390: use proper expoline sections for .dma code
s390/nospec: rename assembler generated expoline thunks
s390: add missing ENDPROC statements to assembler functions
locking/lockdep: check for freed initmem in static_obj()
s390/kernel: add support for kernel address space layout randomization (KASLR)
s390/kernel: introduce .dma sections
s390/sclp: do not use static sccbs
s390/kprobes: use static buffer for insn_page
s390/kernel: convert SYSCALL and PGM_CHECK handlers to .quad
s390/kernel: build a relocatable kernel
...
Pull stack trace updates from Ingo Molnar:
"So Thomas looked at the stacktrace code recently and noticed a few
weirdnesses, and we all know how such stories of crummy kernel code
meeting German engineering perfection end: a 45-patch series to clean
it all up! :-)
Here's the changes in Thomas's words:
'Struct stack_trace is a sinkhole for input and output parameters
which is largely pointless for most usage sites. In fact if embedded
into other data structures it creates indirections and extra storage
overhead for no benefit.
Looking at all usage sites makes it clear that they just require an
interface which is based on a storage array. That array is either on
stack, global or embedded into some other data structure.
Some of the stack depot usage sites are outright wrong, but
fortunately the wrongness just causes more stack being used for
nothing and does not have functional impact.
Another oddity is the inconsistent termination of the stack trace
with ULONG_MAX. It's pointless as the number of entries is what
determines the length of the stored trace. In fact quite some call
sites remove the ULONG_MAX marker afterwards with or without nasty
comments about it. Not all architectures do that and those which do,
do it inconsistenly either conditional on nr_entries == 0 or
unconditionally.
The following series cleans that up by:
1) Removing the ULONG_MAX termination in the architecture code
2) Removing the ULONG_MAX fixups at the call sites
3) Providing plain storage array based interfaces for stacktrace
and stackdepot.
4) Cleaning up the mess at the callsites including some related
cleanups.
5) Removing the struct stack_trace based interfaces
This is not changing the struct stack_trace interfaces at the
architecture level, but it removes the exposure to the generic
code'"
* 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
x86/stacktrace: Use common infrastructure
stacktrace: Provide common infrastructure
lib/stackdepot: Remove obsolete functions
stacktrace: Remove obsolete functions
livepatch: Simplify stack trace retrieval
tracing: Remove the last struct stack_trace usage
tracing: Simplify stack trace retrieval
tracing: Make ftrace_trace_userstack() static and conditional
tracing: Use percpu stack trace buffer more intelligently
tracing: Simplify stacktrace retrieval in histograms
lockdep: Simplify stack trace handling
lockdep: Remove save argument from check_prev_add()
lockdep: Remove unused trace argument from print_circular_bug()
drm: Simplify stacktrace handling
dm persistent data: Simplify stack trace handling
dm bufio: Simplify stack trace retrieval
btrfs: ref-verify: Simplify stack trace retrieval
dma/debug: Simplify stracktrace retrieval
fault-inject: Simplify stacktrace retrieval
mm/page_owner: Simplify stack trace handling
...
Pull speculation mitigation update from Ingo Molnar:
"This adds the "mitigations=" bootline option, which offers a
cross-arch set of options that will work on x86, PowerPC and s390 that
will map to the arch specific option internally"
* 'core-speculation-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
s390/speculation: Support 'mitigations=' cmdline option
powerpc/speculation: Support 'mitigations=' cmdline option
x86/speculation: Support 'mitigations=' cmdline option
cpu/speculation: Add 'mitigations=' cmdline option
Towards the goal of removing cc-ldoption, it seems that --hash-style=
was added to binutils 2.17.50.0.2 in 2006. The minimal required version
of binutils for the kernel according to
Documentation/process/changes.rst is 2.20.
Link: https://gcc.gnu.org/ml/gcc/2007-01/msg01141.html
Cc: clang-built-linux@googlegroups.com
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The disabled_wait() function uses its argument as the PSW address when
it stops the CPU with a wait PSW that is disabled for interrupts.
The different callers sometimes use a specific number like 0xdeadbeef
to indicate a specific failure, the early boot code uses 0 and some
other calls sites use __builtin_return_address(0).
At the time a dump is created the current PSW and the registers of a
CPU are written to lowcore to make them avaiable to the dump analysis
tool. For a CPU stopped with disabled_wait the PSW and the registers
do not really make sense together, the PSW address does not point to
the function the registers belong to.
Simplify disabled_wait() by using _THIS_IP_ for the PSW address and
drop the argument to the function.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make the call chain more reliable by tagging the ftrace stack entries
with the stack pointer that is associated with the return address.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Rework the dump_trace() stack unwinder interface to support different
unwinding algorithms. The new interface looks like this:
struct unwind_state state;
unwind_for_each_frame(&state, task, regs, start_stack)
do_something(state.sp, state.ip, state.reliable);
The unwind_bc.c file contains the implementation for the classic
back-chain unwinder.
One positive side effect of the new code is it now handles ftraced
functions gracefully. It prints the real name of the return function
instead of 'return_to_handler'.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The assembler code in arch/s390 misses proper ENDPROC statements
to properly end functions in a few places. Add them.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds support for relocating the kernel to a random address.
The random kernel offset is obtained from cpacf, using either TRNG, PRNO,
or KMC_PRNG, depending on supported MSA level.
KERNELOFFSET is added to vmcoreinfo, for crash --kaslr support.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With a relocatable kernel that could reside at any place in memory, code
and data that has to stay below 2 GB needs special handling.
This patch introduces .dma sections for such text, data and ex_table.
The sections will be part of the decompressor kernel, so they will not
be relocated and stay below 2 GB. Their location is passed over to the
decompressed / relocated kernel via the .boot.preserved.data section.
The duald and aste for control register setup also need to stay below
2 GB, so move the setup code from arch/s390/kernel/head64.S to
arch/s390/boot/head.S. The duct and linkage_stack could reside above
2 GB, but their content has to be preserved for the decompresed kernel,
so they are also moved into the .dma section.
The start and end address of the .dma sections is added to vmcoreinfo,
for crash support, to help debugging in case the kernel crashed there.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The sccbs for init/read/sdias/early have to be located below 2 GB, and
they are currently defined as a static buffer.
With a relocatable kernel that could reside at any place in memory, this
will no longer guarantee the location below 2 GB, so use a dynamic
GFP_DMA allocation instead.
The sclp_early_sccb buffer needs special handling, as it can be used
very early, and by both the decompressor and also the decompressed
kernel. Therefore, a fixed 4 KB buffer is introduced at 0x11000, the
former PARMAREA_END. The new PARMAREA_END is now 0x12000, and it is
renamed to HEAD_END, as it is rather the end of head.S and not the end
of the parmarea.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With a relocatable kernel that could reside at any place in memory, the
current logic for allocating a kprobes insn_page does not work. The
GFP_DMA allocated buffer might be more than 2 GB away from the kernel.
Use a static buffer for the insn_page instead.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With a relocatable kernel that could reside at any place in memory, the
storage size for the SYSCALL and PGM_CHECK handlers needs to be
increased from .long to .quad.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds support for building a relocatable kernel with -fPIE.
The kernel will be relocated to 0 early in the boot process.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Allow for userspace to use PCI MIO instructions.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Gather statistics to distinguish floating and directed interrupts.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Improve /proc/interrupts on s390 to show statistics for individual
MSI interrupts.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide an interface for userspace so it can find out if a machine is
capeable of doing secure boot. The interface is, for example, needed for
zipl so it can find out which file format it can/should write to disk.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A kernel loaded via kexec_load cannot be verified. Thus disable kexec_load
systemcall in kernels which where IPLed securely. Use the IMA mechanism to
do so.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add kernel signature verification to kexec_file. The verification is based
on module signature verification and works with kernel images signed via
scripts/sign-file.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The leading 64 kB of a kernel image doesn't contain any data needed to boot
the new kernel when it was loaded via kexec_file. Thus kexec_file currently
strips them off before loading the image. Keep the leading 64 kB in order
to be able to pass a ipl_report to the next kernel.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
s390_image_load and s390_elf_load have the same code to load the different
components. Combine this functionality in one shared function.
While at it move kexec_file_update_kernel into the new function as well.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Access the parmarea in head.S via a struct instead of individual offsets.
While at it make the fields in the parmarea .quads.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To register data for the next kernel (command line, oldmem_base, etc.) the
current kernel needs to find the ELF segment that contains head.S. This is
currently done by checking ifor 'phdr->p_paddr == 0'. This works fine for
the current kernel build but in theory the first few pages could be
skipped. Make the detection more robust by checking if the entry point lies
within the segment.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When loading an ELF image via kexec_file the segment alignment is ignored
in the calculation for the load address of the next segment. When there are
multiple segments this can lead to segment overlap and thus load failure.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Fixes: 8be0188271 ("s390/kexec_file: Add ELF loader")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
PR: Adjusted to the use in kexec_file later.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Read the IPL Report block provided by secure-boot, add the entries
of the certificate list to the system key ring and print the list
of components.
PR: Adjust to Vasilys bootdata_preserved patch set. Preserve ipl_cert_list
for later use in kexec_file.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The IPL parameter block is used as an interface between Linux and
the machine to query and change the boot device and boot options.
To be able to create IPL parameter block in user space and pass it
as segment to kexec provide an uapi header with proper structure
definitions for the block.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The ipl_info union in struct ipl_parameter_block has the same name as
the struct ipl_info. This does not help while reading the code and the
union in struct ipl_parameter_block does not need to be named. Drop
the name from the union.
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This comes a bit late, but should be in 5.1 anyway: we want the newly
added system calls to be synchronized across all architectures in
the release.
I hope that in the future, any newly added system calls can be added
to all architectures at the same time, and tested there while they
are in linux-next, avoiding dependencies between the architecture
maintainer trees and the tree that contains the new system call.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcv2aZAAoJEGCrR//JCVIncu4QALpTBqbjSu9u1/nXRGMLWo9J
uToSBohDvsKW7wMkHcr1dU75ERIX9gqIY5pJWDrwzBdGDt02/oiy6WofXZDv4WkR
Sp4YncdTeZENi0nNN+mrGDzNrcvBJd0FRc1MSLgPzfKXgf8P1oRzEsOaJVlGY5hS
A8rNNUYE37m6rhTS59tNxzGvQcq3J7Q9ZRc0xjbSqIFngYVfQQiVbQCqd8RI6s9W
+Hek+e5VF5HQnzhmTT9MQM4TsxMRMNfzrYpjhhayuLJ3CHROJPX7x9pZEGdyusQS
5rDZxKes9SKTFS9QqycSyJkoP0awxrVrjqD1zFkWOJht0c3UCQAmw6GD7rlJkGPB
vofuzmPzMq5XaZ8vpTucWNL+0ymzRXhhQ6esV39vRwxztRc4/DCy5MHDnrPK5yXb
olPbltMAlHMaY5KePI/3jwpkcmzZjz9SNOKQ9/9tFlB5+RVF2qQdUgRMPE+XYa4H
pRrZChrEAf6ZjINGeLlIVtpTlBFPl1LRF7UkOy7TYBvtRqukduXYpOFPb1XspQUl
flIdBLOY3iF33o0eQnz10BEMxlblFhTj0SQrt0684kili7TjsWDaT+hPZSd72hhi
Wey9l39kaexV2Sh7XZ6oUe205ay3R8sTn0Ic2+CnZaboeOuYlLYc8/w2HkTeTYmu
9f3HAlX4Qu6RuX8bxLO0
=Y7Kd
-----END PGP SIGNATURE-----
Merge tag 'syscalls-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull syscall numbering updates from Arnd Bergmann:
"arch: add pidfd and io_uring syscalls everywhere
This comes a bit late, but should be in 5.1 anyway: we want the newly
added system calls to be synchronized across all architectures in the
release.
I hope that in the future, any newly added system calls can be added
to all architectures at the same time, and tested there while they are
in linux-next, avoiding dependencies between the architecture
maintainer trees and the tree that contains the new system call"
* tag 'syscalls-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: add pidfd and io_uring syscalls everywhere
A few architectures use <asm/segment.h> internally, but nothing in
common code does. Remove all the empty or almost empty versions of it,
including the asm-generic one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add support for the CPU-Measurement Facility counter
second version number 6. This number is used to detect some
more counters in the crypto counter set and the extended
counter set.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Inline assembly code changed in this patch should really use "Q"
constraint "Memory reference without index register and with short
displacement". The kernel build with kasan instrumentation enabled
might occasionally break otherwise (due to stack instrumentation).
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the io_uring and pidfd_send_signal system calls to all architectures.
These system calls are designed to handle both native and compat tasks,
so all entries are the same across architectures, only arm-compat and
the generic tale still use an old format.
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> (s390)
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Terminating the last trace entry with ULONG_MAX is a completely pointless
exercise and none of the consumers can rely on it because it's
inconsistently implemented across architectures. In fact quite some of the
callers remove the entry and adjust stack_trace.nr_entries afterwards.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: https://lkml.kernel.org/r/20190410103644.396788431@linutronix.de
IS_ENABLED should be reserved for CONFIG_<FOO> uses so convert
the uses of IS_ENABLED with a #define to __is_defined.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
clang does not support 31 bit object files on s390, so skip
the 32-bit vdso here, and only build it when using gcc to compile
the kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The Ultravisor Call Facility (stfle bit 158) defines an API to the
Ultravisor (UV calls), a mini hypervisor located at machine
level. With help of the Ultravisor, KVM will be able to run
"protected" VMs, special VMs whose memory and management data are
unavailable to KVM.
The protected VMs can also request services from the Ultravisor.
The guest api consists of UV calls to share and unshare memory with the
kvm hypervisor.
To enable this feature support PROTECTED_VIRTUALIZATION_GUEST kconfig
option has been introduced.
Co-developed-by: Janosch Frank <frankja@de.ibm.com>
Signed-off-by: Janosch Frank <frankja@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
.boot.preserved.data is a better fit for ipl block than .boot.data
which is discarded after init. Reusing .boot.preserved.data allows to
simplify code a little bit and avoid copying data from .boot.data to
persistent variables.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce .boot.preserve.data section which is similar to .boot.data and
"shared" between the decompressor code and the decompressed kernel. The
decompressor will store values in it, and copy over to the decompressed
image before starting it. This method allows to avoid using pre-defined
addresses and other hacks to pass values between those boot phases.
Unlike .boot.data section .boot.preserved.data is NOT a part of init data,
and hence will be preserved for the kernel life time.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for the CPU-Measurement Facility counter
second version number 6. This number is used to detect some
more counters in the crypto counter set and the extended
counter set.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- Fix early free of the channel program in vfio
- On AP device removal make sure that all messages are flushed
with the driver still attached that queued the message
- Limit brk randomization to 32MB to reduce the chance that the
heap of ld.so is placed after the main stack
- Add a rolling average for the steal time of a CPU, this will be
needed for KVM to decide when to do busy waiting
- Fix a warning in the CPU-MF code
- Add a notification handler for AP configuration change to react
faster to new AP devices
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcnIq7AAoJEDjwexyKj9rgddUH/3VQP6BMvq2fwAsLqx8JeYgT
082xzP2nHli3tO6m8fFHmtqrSg5KTEDfuQVafqp92LeEMKUNWQI6kRu7rXeAVBct
M6hx21mqkm9VNjAlAjSq8IAUXP2K6/K0BMD5mYInYYYVRvJm3on4sHnkEj0kvXbm
OGxwnNBd9UnH5g6ti2vW4cyDvs0aqj1eDbSudy5KedumQz5J2XdFPn4f4Ej6p2+t
nuvlZFDnZ2Z4rliE3RFCuKExZR+YFZgS1urm6pcklncfvbJRsqFJ+nvhurskDUI3
4gOp1Yv1tvGNv/cNVEtnz8g/Kg8/sI7evjQBtxhtEsV/W0sbZPnjCt+28Cf1DN4=
=4nL7
-----END PGP SIGNATURE-----
Merge tag 's390-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Improvements and bug fixes for 5.1-rc2:
- Fix early free of the channel program in vfio
- On AP device removal make sure that all messages are flushed with
the driver still attached that queued the message
- Limit brk randomization to 32MB to reduce the chance that the heap
of ld.so is placed after the main stack
- Add a rolling average for the steal time of a CPU, this will be
needed for KVM to decide when to do busy waiting
- Fix a warning in the CPU-MF code
- Add a notification handler for AP configuration change to react
faster to new AP devices"
* tag 's390-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpumf: Fix warning from check_processor_id
zcrypt: handle AP Info notification from CHSC SEI command
vfio: ccw: only free cp on final interrupt
s390/vtime: steal time exponential moving average
s390/zcrypt: revisit ap device remove procedure
s390: limit brk randomization to 32MB
Function __hw_perf_event_init() used a CPU variable without
ensuring CPU preemption has been disabled. This caused the
following warning in the kernel log:
[ 7.277085] BUG: using smp_processor_id() in preemptible
[00000000] code: cf-csdiag/1892
[ 7.277111] caller is cf_diag_event_init+0x13a/0x338
[ 7.277122] CPU: 10 PID: 1892 Comm: cf-csdiag Not tainted
5.0.0-20190318.rc0.git0.9e1a11e0f602.300.fc29.s390x+debug #1
[ 7.277131] Hardware name: IBM 2964 NC9 712 (LPAR)
[ 7.277139] Call Trace:
[ 7.277150] ([<000000000011385a>] show_stack+0x82/0xd0)
[ 7.277161] [<0000000000b7a71a>] dump_stack+0x92/0xd0
[ 7.277174] [<00000000007b7e9c>] check_preemption_disabled+0xe4/0x100
[ 7.277183] [<00000000001228aa>] cf_diag_event_init+0x13a/0x338
[ 7.277195] [<00000000002cf3aa>] perf_try_init_event+0x72/0xf0
[ 7.277204] [<00000000002d0bba>] perf_event_alloc+0x6fa/0xce0
[ 7.277214] [<00000000002dc4a8>] __s390x_sys_perf_event_open+0x398/0xd50
[ 7.277224] [<0000000000b9e8f0>] system_call+0xdc/0x2d8
[ 7.277233] 2 locks held by cf-csdiag/1892:
[ 7.277241] #0: 00000000976f5510 (&sig->cred_guard_mutex){+.+.},
at: __s390x_sys_perf_event_open+0xd2e/0xd50
[ 7.277257] #1: 00000000363b11bd (&pmus_srcu){....},
at: perf_event_alloc+0x52e/0xce0
The variable is now accessed in proper context. Use
get_cpu_var()/put_cpu_var() pair to disable
preemption during access.
As the hardware authorization settings apply to all CPUs, it
does not matter which CPU is used to check the authorization setting.
Remove the event->count assignment. It is not needed as function
perf_event_alloc() allocates memory for the event with kzalloc() and
thus count is already set to zero.
Fixes: fe5908bccc ("s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
for 32-bit guests
s390: interrupt cleanup, introduction of the Guest Information Block,
preparation for processor subfunctions in cpu models
PPC: bug fixes and improvements, especially related to machine checks
and protection keys
x86: many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations; plus AVIC fixes.
Generic: memcg accounting
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJci+7XAAoJEL/70l94x66DUMkIAKvEefhceySHYiTpfefjLjIC
16RewgHa+9CO4Oo5iXiWd90fKxtXLXmxDQOS4VGzN0rxvLGRw/fyXIxL1MDOkaAO
l8SLSNuewY4XBUgISL3PMz123r18DAGOuy9mEcYU/IMesYD2F+wy5lJ17HIGq6X2
RpoF1p3qO1jfkPTKOob6Ixd4H5beJNPKpdth7LY3PJaVhDxgouj32fxnLnATVSnN
gENQ10fnt8BCjshRYW6Z2/9bF15JCkUFR1xdBW2/xh1oj+kvPqqqk2bEN1eVQzUy
2hT/XkwtpthqjSbX8NNavWRSFnOnbMLTRKQyIXmFVsM5VoSrwtiGsCFzBgcT++I=
=XIzU
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- some cleanups
- direct physical timer assignment
- cache sanitization for 32-bit guests
s390:
- interrupt cleanup
- introduction of the Guest Information Block
- preparation for processor subfunctions in cpu models
PPC:
- bug fixes and improvements, especially related to machine checks
and protection keys
x86:
- many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations
- AVIC fixes
Generic:
- memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
kvm: vmx: fix formatting of a comment
KVM: doc: Document the life cycle of a VM and its resources
MAINTAINERS: Add KVM selftests to existing KVM entry
Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
KVM: PPC: Fix compilation when KVM is not enabled
KVM: Minor cleanups for kvm_main.c
KVM: s390: add debug logging for cpu model subfunctions
KVM: s390: implement subfunction processor calls
arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
KVM: arm/arm64: Remove unused timer variable
KVM: PPC: Book3S: Improve KVM reference counting
KVM: PPC: Book3S HV: Fix build failure without IOMMU support
Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
x86: kvmguest: use TSC clocksource if invariant TSC is exposed
KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
...
Add check for the return value of memblock_alloc*() functions and call
panic() in case of error. The panic message repeats the one used by
panicing memblock allocators with adjustment of parameters to include
only relevant ones.
The replacement was mostly automated with semantic patches like the one
below with manual massaging of format strings.
@@
expression ptr, size, align;
@@
ptr = memblock_alloc(size, align);
+ if (!ptr)
+ panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);
[anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
[rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
[rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
[akpm@linux-foundation.org: fix xtensa printk warning]
Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Guo Ren <ren_guo@c-sky.com> [c-sky]
Acked-by: Paul Burton <paul.burton@mips.com> [MIPS]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Reviewed-by: Juergen Gross <jgross@suse.com> [Xen]
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memblock_alloc_base() function tries to allocate a memory up to the
limit specified by its max_addr parameter and panics if the allocation
fails. Replace its usage with memblock_phys_alloc_range() and make the
callers check the return value and panic in case of error.
Link: http://lkml.kernel.org/r/1548057848-15136-10-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com> [c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com> [Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the memblock_phys_alloc() function an inline wrapper for
memblock_phys_alloc_range() and update the memblock_phys_alloc() callers
to check the returned value and panic in case of error.
Link: http://lkml.kernel.org/r/1548057848-15136-8-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com> [c-sky]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Juergen Gross <jgross@suse.com> [Xen]
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (159 commits)
tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
proc: more robust bulk read test
proc: test /proc/*/maps, smaps, smaps_rollup, statm
proc: use seq_puts() everywhere
proc: read kernel cpu stat pointer once
proc: remove unused argument in proc_pid_lookup()
fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
fs/proc/self.c: code cleanup for proc_setup_self()
proc: return exit code 4 for skipped tests
mm,mremap: bail out earlier in mremap_to under map pressure
mm/sparse: fix a bad comparison
mm/memory.c: do_fault: avoid usage of stale vm_area_struct
writeback: fix inode cgroup switching comment
mm/huge_memory.c: fix "orig_pud" set but not used
mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
mm/memcontrol.c: fix bad line in comment
mm/cma.c: cma_declare_contiguous: correct err handling
mm/page_ext.c: fix an imbalance with kmemleak
mm/compaction: pass pgdat to too_many_isolated() instead of zone
mm: remove zone_lru_lock() function, access ->lru_lock directly
...
Pull perf updates from Ingo Molnar:
"Lots of tooling updates - too many to list, here's a few highlights:
- Various subcommand updates to 'perf trace', 'perf report', 'perf
record', 'perf annotate', 'perf script', 'perf test', etc.
- CPU and NUMA topology and affinity handling improvements,
- HW tracing and HW support updates:
- Intel PT updates
- ARM CoreSight updates
- vendor HW event updates
- BPF updates
- Tons of infrastructure updates, both on the build system and the
library support side
- Documentation updates.
- ... and lots of other changes, see the changelog for details.
Kernel side updates:
- Tighten up kprobes blacklist handling, reduce the number of places
where developers can install a kprobe and hang/crash the system.
- Fix/enhance vma address filter handling.
- Various PMU driver updates, small fixes and additions.
- refcount_t conversions
- BPF updates
- error code propagation enhancements
- misc other changes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
perf script python: Add Python3 support to syscall-counts-by-pid.py
perf script python: Add Python3 support to syscall-counts.py
perf script python: Add Python3 support to stat-cpi.py
perf script python: Add Python3 support to stackcollapse.py
perf script python: Add Python3 support to sctop.py
perf script python: Add Python3 support to powerpc-hcalls.py
perf script python: Add Python3 support to net_dropmonitor.py
perf script python: Add Python3 support to mem-phys-addr.py
perf script python: Add Python3 support to failed-syscalls-by-pid.py
perf script python: Add Python3 support to netdev-times.py
perf tools: Add perf_exe() helper to find perf binary
perf script: Handle missing fields with -F +..
perf data: Add perf_data__open_dir_data function
perf data: Add perf_data__(create_dir|close_dir) functions
perf data: Fail check_backup in case of error
perf data: Make check_backup work over directories
perf tools: Add rm_rf_perf_data function
perf tools: Add pattern name checking to rm_rf
perf tools: Add depth checking to rm_rf
perf data: Add global path holder
...
To be able to judge the current overcommitment ratio for a CPU add
a lowcore field with the exponential moving average of the steal time.
The average is updated every tick.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The VDSO is part of the kernel image and therefore the struct pages are
marked as reserved during boot.
As we install a special mapping, the actual struct pages will never be
exposed to MM via the page tables. We can therefore leave the pages
marked as reserved.
Link: http://lkml.kernel.org/r/20190114125903.24845-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>