Commit Graph

5445 Commits

Author SHA1 Message Date
Vasily Gorbik
686140a1a9 s390: introduce CPU alternatives
Implement CPU alternatives, which allows to optionally patch newer
instructions at runtime, based on CPU facilities availability.

A new kernel boot parameter "noaltinstr" disables patching.

Current implementation is derived from x86 alternatives. Although
ideal instructions padding (when altinstr is longer then oldinstr)
is added at compile time, and no oldinstr nops optimization has to be
done at runtime. Also couple of compile time sanity checks are done:
1. oldinstr and altinstr must be <= 254 bytes long,
2. oldinstr and altinstr must not have an odd length.

alternative(oldinstr, altinstr, facility);
alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);

Both compile time and runtime padding consists of either 6/4/2 bytes nop
or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.

.altinstructions and .altinstr_replacement sections are part of
__init_begin : __init_end region and are freed after initialization.

Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-18 14:11:29 +02:00
Sebastian Ott
0dcd91a9e6 s390/debug: only write data once
debug_event_common memsets the active debug entry with zeros to
prevent stale data leakage. This is overwritten with the actual
debug data in the next step. Only write zeros to that part of the
debug entry that's not used by new debug data.

Micro benchmarks show a 2-10% reduction of cpu cycles with this
approach.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-18 14:11:23 +02:00
Sebastian Ott
94158e544f s390/debug: improve debug_event
debug_event currently truncates the data if used with a size larger than
the buf_size of the debug feature. For lots of callers of this function,
wrappers have been implemented that loop until all data is handled.

Move that functionality into debug_event_common and get rid of the wrappers.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-18 14:11:19 +02:00
Philipp Rudo
7c3eaaa391 s390/kexec: Fix checksum validation return code for kdump
Before kexec boots to a crash kernel it checks whether the image in memory
changed after load. This is done by the function kdump_csum_valid, which
returns true, i.e. an int != 0, on success and 0 otherwise. In other words
when kdump_csum_valid returns an error code it means that the validation
succeeded. This is not only counterintuitive but also produces the wrong
result if the kernel was build without CONFIG_CRASH_DUMP. Fix this by
making kdump_csum_valid return a bool.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-18 14:11:16 +02:00
Martin Schwidefsky
fe3af62553 s390: update defconfig
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-16 08:19:28 +02:00
Heiko Carstens
496da0d706 s390/debug: adjust coding style
The debug feature code hasn't been touched in ages and the code also
looks like this. Therefore clean up the code so it looks a bit more
like current coding style.

There is no functional change - actually I made also sure that the
generated code with performance_defconfig is identical.
A diff of old vs new with "objdump -d" is empty.

The code is still not checkpatch clean, but that was not the goal.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-16 08:19:26 +02:00
Heiko Carstens
df8bbd0c98 s390/kprobes: remove KPROBE_SWAP_INST state
For an unknown reason the s390 kprobes instruction replacement
function modifies the kprobe_status of the current CPU to
KPROBE_SWAP_INST. This was supposed to catch traps that happened
during instruction patching. Such a fault is not supposed to happen,
and silently discarding such a fault is certainly also not what we
want. In fact s390 is the only architecture which has this odd piece
of code.

Just remove this and behave like all other architectures. This was
pointed out by Jens Remus.

Reported-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-12 07:16:50 +02:00
Heiko Carstens
49913f1fd0 s390: cleanup string ops prototypes
Just some trivial changes like removing the extern keyword from the
header file, renaming arguments to match the man pages, and whitespace
removal.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-09 11:18:08 +02:00
Heiko Carstens
993fef95b9 s390: optimize memset implementation
Like for the memset16/32/64 variants avoid that subsequent mvc
instructions depend on each other since that might have negative
performance impacts.

This patch is currently hardly relevant since at least gcc 7.1
generates only inline memset code and not a single memset call.
However there is no reason to not provide an optimized version
just in case gcc generates memset calls again, like it did in
the past.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-09 11:18:07 +02:00
Heiko Carstens
41879ff65d s390/mm: use memset64 instead of clear_table
Use memset64 instead of the (now) open-coded variant clear_table.
Performance wise there is no difference.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-09 11:18:06 +02:00
Heiko Carstens
0b77d6701c s390: implement memset16, memset32 & memset64
Provide fast versions of the new memset variants. E.g. the generic
memset64 is ten times slower than the optimized version if used on a
whole page.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-09 11:18:04 +02:00
Martin Schwidefsky
3bdf5679c9 Merge branch 'sthyi' into features
Add the store-hypervisor-information code into features
using a tip branch for parallel merging into the KVM tree.
2017-10-09 11:16:49 +02:00
QingFeng Hao
3d8757b87d s390/sthyi: add s390_sthyi system call
Add a syscall of s390_sthyi to implement STHYI instruction in LPAR
which reuses the implementation for KVM by Janosch Frank -
commit 95ca2cb579 ("KVM: s390: Add sthyi emulation").

STHYI(Store Hypervisor Information) is an emulated z/VM instruction that
provides a guest with basic information about the layers it is running
on. This includes information about the cpu configuration of both the
machine and the lpar, as well as their names, machine model and
machine type. This information enables an application to determine the
maximum capacity of CPs and IFLs available to software.

For the arguments of s390_sthyi, code shall be 0 and flags is reserved for
future use, info is the output argument to store the required hypervisor
info.

Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-09 11:15:36 +02:00
QingFeng Hao
9fb6c9b3fe s390/sthyi: add cache to store hypervisor info
STHYI requires extensive locking in the higher hypervisors and is
very computational/memory expensive. Therefore we cache the retrieved
hypervisor info whose valid period is 1s with mutex to allow concurrent
access. rw semaphore can't benefit here due to cache line bounce.

Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-09 11:15:35 +02:00
QingFeng Hao
b7c92f1a4e s390/sthyi: reorganize sthyi implementation
As we need to support sthyi instruction on LPAR too, move the common code
to kernel part and kvm related code to intercept.c for better reuse.

Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-09 11:15:33 +02:00
Heiko Carstens
91a1fad759 s390: use generic rwsem implementation
We never optimized our rwsem inline assemblies to make use of the new
atomic instructions. The generic rwsem implementation implicitly makes
use of the new instructions, since it implements the required rwsem
primitives with atomic operations, which we did optimize.

However even when compiling for old architectures the generic variant
still generates better code. So it's time to simply remove our old
code and switch to the generic implementation.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-04 10:30:31 +02:00
Heiko Carstens
e0d281d067 s390/disassembler: add new z14 instructions
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:52:24 +02:00
Heiko Carstens
ea7c360b10 s390/disassembler: add missing z13 instructions
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:52:18 +02:00
Heiko Carstens
630f789e80 s390/disassembler: add sthyi instruction
This instruction came with a z/VM extension and not with a specific
machine generation.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:52:11 +02:00
Heiko Carstens
7e1263b720 s390/disassembler: remove double instructions
Remove a couple of instructions that are listed twice.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:52:06 +02:00
Heiko Carstens
caefea1d3a s390/disassembler: fix LRDFU format
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:52:03 +02:00
Heiko Carstens
5c50538752 s390/disassembler: add missing end marker for e7 table
The e7 opcode table does not have an end marker. Hence when trying to
find an unknown e7 instruction the code will access memory behind the
table until it finds something that matches the opcode, or the kernel
crashes, whatever comes first.

This affects not only the in-kernel disassembler but also uprobes and
kprobes which refuse to set a probe on unknown instructions, and
therefore search the opcode tables to figure out if instructions are
known or not.

Cc: <stable@vger.kernel.org> # v3.18+
Fixes: 3585cb0280 ("s390/disassembler: add vector instructions")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:51:59 +02:00
Thomas Huth
7fb2b2d512 s390/virtio: remove the old KVM virtio transport
There is no recent user space application available anymore which still
supports this old virtio transport. Additionally, commit 3b2fbb3f06
("virtio/s390: deprecate old transport") introduced a deprecation message
in the driver, and apparently nobody complained so far that it is still
required. So let's simply remove it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:51:35 +02:00
Julian Wiedmann
f9a5d70cfa s390/ccwgroup: tie a ccwgroup driver to its ccw driver
When grouping devices, the ccwgroup core only checks whether all of the
devices are bound to the same ccw_driver. It has no means of checking
if the requesting ccwgroup driver actually supports this device type.
qeth implements its own device matching in qeth_core_probe_device(),
while ctcm and lcs currently have no sanity-checking at all.

Enable ccwgroup drivers to optionally defer the device type checking to
the ccwgroup core, by specifying their supported ccw_driver.
This allows us drop the device type matching from qeth, and improves
the robustness of ctcm and lcs.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:51:30 +02:00
Harald Freudenberger
bf7fa03870 s390/crypto: add s390 platform specific aes gcm support.
This patch introduces gcm(aes) support into the aes_s390 kernel module.

Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:51:25 +02:00
Patrick Steuer
eecd49c462 s390/crypto: add inline assembly for KMA instruction to cpacf.h
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-29 15:51:19 +02:00
Martin Schwidefsky
eb3b7b848f s390/rwlock: introduce rwlock wait queueing
Like the common queued rwlock code the s390 implementation uses the
queued spinlock code on a spinlock_t embedded in the rwlock_t to achieve
the queueing. The encoding of the rwlock_t differs though, the counter
field in the rwlock_t is split into two parts. The upper two bytes hold
the write bit and the write wait counter, the lower two bytes hold the
read counter.

The arch_read_lock operation works exactly like the common qrwlock but
the enqueue operation for a writer follows a diffent logic. After the
failed inline try to get the rwlock in write, the writer first increases
the write wait counter, acquires the wait spin_lock for the queueing,
and then loops until there are no readers and the write bit is zero.
Without the write wait counter a CPU that just released the rwlock
could immediately reacquire the lock in the inline code, bypassing all
outstanding read and write waiters. For s390 this would cause massive
imbalances in favour of writers in case of a contended rwlock.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:44 +02:00
Martin Schwidefsky
b96f7d881a s390/spinlock: introduce spinlock wait queueing
The queued spinlock code for s390 follows the principles of the common
code qspinlock implementation but with a few notable differences.

The format of the spinlock_t locking word differs, s390 needs to store
the logical CPU number of the lock holder in the spinlock_t to be able
to use the diagnose 9c directed yield hypervisor call.

The inline code sequences for spin_lock and spin_unlock are nice and
short. The inline portion of a spin_lock now typically looks like this:

	lhi	%r0,0			# 0 indicates an empty lock
	l	%r1,0x3a0		# CPU number + 1 from lowcore
	cs	%r0,%r1,<some_lock>	# lock operation
	jnz	call_wait		# on failure call wait function
locked:
	...
call_wait:
	la	%r2,<some_lock>
	brasl	%r14,arch_spin_lock_wait
	j	locked

A spin_unlock is as simple as before:

	lhi	%r0,0
	sth	%r0,2(%r2)		# unlock operation

After a CPU has queued itself it may not enable interrupts again for the
arch_spin_lock_flags() variant. The arch_spin_lock_wait_flags wait function
is removed.

To improve performance the code implements opportunistic lock stealing.
If the wait function finds a spinlock_t that indicates that the lock is
free but there are queued waiters, the CPU may steal the lock up to three
times without queueing itself. The lock stealing update the steal counter
in the lock word to prevent more than 3 steals. The counter is reset at
the time the CPU next in the queue successfully takes the lock.

While the queued spinlocks improve performance in a system with dedicated
CPUs, in a virtualized environment with continuously overcommitted CPUs
the queued spinlocks can have a negative effect on performance. This
is due to the fact that a queued CPU that is preempted by the hypervisor
will block the queue at some point even without holding the lock. With
the classic spinlock it does not matter if a CPU is preempted that waits
for the lock. Therefore use the queued spinlock code only if the system
runs with dedicated CPUs and fall back to classic spinlocks when running
with shared CPUs.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:44 +02:00
Martin Schwidefsky
8153380379 s390/spinlock: use the cpu number +1 as spinlock value
The queued spinlock code will come out simpler if the encoding of
the CPU that holds the spinlock is (cpu+1) instead of (~cpu).

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:44 +02:00
Martin Schwidefsky
1887aa07b6 s390/topology: add detection of dedicated vs shared CPUs
The topology information returned by STSI 15.x.x contains a flag
if the CPUs of a topology-list are dedicated or shared. Make this
information available if the machine provides topology information.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:43 +02:00
Heiko Carstens
1922099979 s390/cpumf: remove superfluous nr_cpumask_bits check
Paul Burton reported that the nr_cpumask_bits check
within cpumsf_pmu_event_init() is not necessary.

Actually there is already a prior check within
perf_event_alloc(). Therefore remove the check.

Reported-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:43 +02:00
Alice Frosi
262832bc5a s390/ptrace: add runtime instrumention register get/set
Add runtime instrumention register get and set which allows to read
and modify the runtime instrumention control block.

Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:41 +02:00
Alice Frosi
bb59c2da3f s390/runtime_instrumentation: clean up struct runtime_instr_cb
Update runtime_instr_cb structure to be consistent with the runtime
instrumentation documentation.

Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:40 +02:00
Heiko Carstens
79962038df s390: add support for FORTIFY_SOURCE
This is the quite trivial backend for s390 which is required to enable
FORTIFY_SOURCE support.

See commit 6974f0c455 ("include/linux/string.h: add the option of
fortified string.h functions") for more details.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:40 +02:00
Heiko Carstens
59a19ea9a0 s390: get rid of exit_thread()
exit_thread() is empty now. Therefore remove it and get rid of a
pointless branch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:40 +02:00
Heiko Carstens
7b83c6297d s390/guarded storage: simplify task exit handling
Free data structures required for guarded storage from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.

In addition this allows to get rid of exit_thread() in a later patch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:40 +02:00
Heiko Carstens
5ef2d5231d s390/ptrace: fix guarded storage regset handling
If the guarded storage regset for current is supposed to be changed,
the regset from user space is copied directly into the guarded storage
control block.

If then the process gets scheduled away while the control block is
being copied and before the new control block has been loaded, the
result is random: the process can be scheduled away due to a page
fault or preemption. If that happens the already copied parts will be
overwritten by save_gs_cb(), called from switch_to().

Avoid this by copying the data to a temporary buffer on the stack and
do the actual update with preemption disabled.

Fixes: f5bbd72198 ("s390/ptrace: guarded storage regset for the current task")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:40 +02:00
Heiko Carstens
fa1edf3f63 s390/guarded storage: fix possible memory corruption
For PREEMPT enabled kernels the guarded storage (GS) code contains a
possible use-after-free bug. If a task that makes use of GS exits, it
will execute do_exit() while still enabled for preemption.

That function will call exit_thread_runtime_instr() via exit_thread().
If exit_thread_gs() gets preempted after the GS control block of the
task has been freed but before the pointer to it is set to NULL, then
save_gs_cb(), called from switch_to(), will write to already freed
memory.

Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.

Fixes: 916cda1aa1 ("s390: add a system call for guarded storage")
Cc: <stable@vger.kernel.org> # v4.12+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:39 +02:00
Heiko Carstens
8d9047f8b9 s390/runtime instrumentation: simplify task exit handling
Free data structures required for runtime instrumentation from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.

In addition this allows to get rid of exit_thread() in a later patch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:39 +02:00
Heiko Carstens
d6e646ad7c s390/runtime instrumention: fix possible memory corruption
For PREEMPT enabled kernels the runtime instrumentation (RI) code
contains a possible use-after-free bug. If a task that makes use of RI
exits, it will execute do_exit() while still enabled for preemption.

That function will call exit_thread_runtime_instr() via
exit_thread(). If exit_thread_runtime_instr() gets preempted after the
RI control block of the task has been freed but before the pointer to
it is set to NULL, then save_ri_cb(), called from switch_to(), will
write to already freed memory.

Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.

Fixes: e4b8b3f33f ("s390: add support for runtime instrumentation")
Cc: <stable@vger.kernel.org> # v3.7+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:39 +02:00
Heiko Carstens
8076428f0c s390: convert release_thread() into a static inline function
release_thread() is an empty function that gets called on every task
exit. Move the function to a header file and force inlining of it, so
that the compiler can optimize it away instead of generating a
pointless function call.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28 07:29:39 +02:00
Heiko Carstens
51dce3867c s390/topology: enable / disable topology dynamically
Add a new sysctl file /proc/sys/s390/topology which displays if
topology is on (1) or off (0) as specified by the "topology=" kernel
parameter.

This allows to change topology information during runtime and
configuring it via /etc/sysctl.conf instead of using the kernel line
parameter.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-20 13:47:55 +02:00
Heiko Carstens
1b25fda053 s390/topology: alternative topology for topology-less machines
If running on machines that do not provide topology information we
currently generate a "fake" topology which defines the maximum
distance between each cpu: each cpu will be put into an own drawer.

Historically this used to be the best option for (virtual) machines in
overcommited hypervisors.

For some workloads however it is better to generate a different
topology where all cpus are siblings within a package (all cpus are
core siblings). This shows performance improvements of up to 10%,
depending on the workload.

In order to keep the current behaviour, but also allow to switch to
the different core sibling topology use the existing "topology="
kernel parameter:

Specifying "topology=on" on machines without topology information will
generate the core siblings (fake) topology information, instead of the
default topology information where all cpus have the maximum distance.

On machines which provide topology information specifying
"topology=on" does not have any effect.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-20 13:47:54 +02:00
Gerald Schaefer
ba385c0594 s390/mm: fix write access check in gup_huge_pmd()
The check for the _SEGMENT_ENTRY_PROTECT bit in gup_huge_pmd() is the
wrong way around. It must not be set for write==1, and not be checked for
write==0. Fix this similar to how it was fixed for ptes long time ago in
commit 25591b0703 ("[S390] fix get_user_pages_fast").

One impact of this bug would be unnecessarily using the gup slow path for
write==0 on r/w mappings. A potentially more severe impact would be that
gup_huge_pmd() will succeed for write==1 on r/o mappings.

Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-19 08:36:20 +02:00
Gerald Schaefer
91c575b335 s390/mm: make pmdp_invalidate() do invalidation only
Commit 227be799c3 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
inadvertently changed the behavior of pmdp_invalidate(), so that it now
clears the pmd instead of just marking it as invalid. Fix this by restoring
the original behavior.

A possible impact of the misbehaving pmdp_invalidate() would be the
MADV_DONTNEED races (see commits ced10803 and 58ceeb6b), although we
should not have any negative impact on the related dirty/young flags,
since those flags are not set by the hardware on s390.

Fixes: 227be799c3 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-19 08:36:19 +02:00
Pu Hou
fc3100d64f s390/perf: fix bug when creating per-thread event
A per-thread event could not be created correctly like below:

    perf record --per-thread -e rB0000 -- sleep 1
    Error:
    The sys_perf_event_open() syscall returned with 19 (No such device) for event (rB0000).
    /bin/dmesg may provide additional information.
    No CONFIG_PERF_EVENTS=y kernel support configured?

This bug was introduced by:

    commit c311c79799
    Author: Alexey Dobriyan <adobriyan@gmail.com>
    Date:   Mon May 8 15:56:15 2017 -0700

    cpumask: make "nr_cpumask_bits" unsigned

If a per-thread event is not attached to any CPU, the cpu field
in struct perf_event is -1. The above commit converts the CPU number
to unsigned int, which result in an illegal CPU number.

Fixes: c311c79799 ("cpumask: make "nr_cpumask_bits" unsigned")
Cc: <stable@vger.kernel.org> # v4.12+
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-13 16:34:23 +02:00
Linus Torvalds
260d16580d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Martin Schwidefsky:
 "The second patch set for the 4.14 merge window:

   - Convert the dasd device driver to the blk-mq interface.

   - Provide three zcrypt interfaces for vfio_ap. These will be required
     for KVM guest access to the crypto cards attached via the AP bus.

   - A couple of memory management bug fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/dasd: blk-mq conversion
  s390/mm: use a single lock for the fields in mm_context_t
  s390/mm: fix race on mm->context.flush_mm
  s390/mm: fix local TLB flushing vs. detach of an mm address space
  s390/zcrypt: externalize AP queue interrupt control
  s390/zcrypt: externalize AP config info query
  s390/zcrypt: externalize test AP queue
  s390/mm: use VM_BUG_ON in crst_table_[upgrade|downgrade]
2017-09-12 06:01:59 -07:00
Linus Torvalds
dd198ce714 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "Life has been busy and I have not gotten half as much done this round
  as I would have liked. I delayed it so that a minor conflict
  resolution with the mips tree could spend a little time in linux-next
  before I sent this pull request.

  This includes two long delayed user namespace changes from Kirill
  Tkhai. It also includes a very useful change from Serge Hallyn that
  allows the security capability attribute to be used inside of user
  namespaces. The practical effect of this is people can now untar
  tarballs and install rpms in user namespaces. It had been suggested to
  generalize this and encode some of the namespace information
  information in the xattr name. Upon close inspection that makes the
  things that should be hard easy and the things that should be easy
  more expensive.

  Then there is my bugfix/cleanup for signal injection that removes the
  magic encoding of the siginfo union member from the kernel internal
  si_code. The mips folks reported the case where I had used FPE_FIXME
  me is impossible so I have remove FPE_FIXME from mips, while at the
  same time including a return statement in that case to keep gcc from
  complaining about unitialized variables.

  I almost finished the work to get make copy_siginfo_to_user a trivial
  copy to user. The code is available at:

     git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

  But I did not have time/energy to get the code posted and reviewed
  before the merge window opened.

  I was able to see that the security excuse for just copying fields
  that we know are initialized doesn't work in practice there are buggy
  initializations that don't initialize the proper fields in siginfo. So
  we still sometimes copy unitialized data to userspace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  Introduce v3 namespaced file capabilities
  mips/signal: In force_fcr31_sig return in the impossible case
  signal: Remove kernel interal si_code magic
  fcntl: Don't use ambiguous SIG_POLL si_codes
  prctl: Allow local CAP_SYS_ADMIN changing exe_file
  security: Use user_namespace::level to avoid redundant iterations in cap_capable()
  userns,pidns: Verify the userns for new pid namespaces
  signal/testing: Don't look for __SI_FAULT in userspace
  signal/mips: Document a conflict with SI_USER with SIGFPE
  signal/sparc: Document a conflict with SI_USER with SIGFPE
  signal/ia64: Document a conflict with SI_USER with SIGFPE
  signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-11 18:34:47 -07:00
Linus Torvalds
4dfc278803 IOMMU Updates for Linux v4.14
Slightly more changes than usual this time:
 
 	- KDump Kernel IOMMU take-over code for AMD IOMMU. The code now
 	  tries to preserve the mappings of the kernel so that master
 	  aborts for devices are avoided. Master aborts cause some
 	  devices to fail in the kdump kernel, so this code makes the
 	  dump more likely to succeed when AMD IOMMU is enabled.
 
 	- Common flush queue implementation for IOVA code users. The
 	  code is still optional, but AMD and Intel IOMMU drivers had
 	  their own implementation which is now unified.
 
 	- Finish support for iommu-groups. All drivers implement this
 	  feature now so that IOMMU core code can rely on it.
 
 	- Finish support for 'struct iommu_device' in iommu drivers. All
 	  drivers now use the interface.
 
 	- New functions in the IOMMU-API for explicit IO/TLB flushing.
 	  This will help to reduce the number of IO/TLB flushes when
 	  IOMMU drivers support this interface.
 
 	- Support for mt2712 in the Mediatek IOMMU driver
 
 	- New IOMMU driver for QCOM hardware
 
 	- System PM support for ARM-SMMU
 
 	- Shutdown method for ARM-SMMU-v3
 
 	- Some constification patches
 
 	- Various other small improvements and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZtCFNAAoJECvwRC2XARrjZnQP/AxC/ezQpq82HbegF4sM/cVE
 Ep7TeTqodEl75FS/6txe2wU0pwodqk/LB9ajfQZUbE1w8vKsNEqi5qf4ZYHGoxYI
 5bWyjJBzKIlwENH5lsBpQNt6XLevrYmRsFy7F0tRYy+qPQq8k+js2i7/XkCL3q7L
 3xklF847RRoITaTOhhaROx1pF23dSMEsS2XGuWHcZfjORtep4wcFKzd/2SvlCWCo
 P2bRU7jBzfJuuGSA80gaiUbDmrULTUfYuZNp7njASzCgsDmagERtvDEpdoXPNNSp
 u6s4LjU1Dp3fgr6g6cFRO7B6JUbWd619nwo9so/c/wZN54yEngBF9EyeeF3mv2O5
 ZbM2mOW3RlZcjxFT/AC8G4cZwwP6MpCEQOdqknoqc6ZQwcDqwN0o9I4+po0wsiAU
 89ijZZe9Mx0p9lNpihaBEB1erAUWPo5Obh62zo80W3h6x9WzkGQWM+PyFK2DYoaC
 8biEZzcc21sLEHvXQkcEGJSKrihHr9sluOqvxmCw5QAkYIFAeZRoeH7JtZWjVCnr
 T3XvaG1G1Aw6tS7ErxufdKawREAGki0Rm9i1baiH9sqNj5rllM01Y+PgU6E21Nbg
 iZp9gJLjfwM4vhYLlovvQK5PRoOBsCkyCpEI4GJqjLeam5p/WN06CFFf0ifQofYr
 qDPCVDkWHWV8nugFFKE7
 =EVh9
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "Slightly more changes than usual this time:

   - KDump Kernel IOMMU take-over code for AMD IOMMU. The code now tries
     to preserve the mappings of the kernel so that master aborts for
     devices are avoided. Master aborts cause some devices to fail in
     the kdump kernel, so this code makes the dump more likely to
     succeed when AMD IOMMU is enabled.

   - common flush queue implementation for IOVA code users. The code is
     still optional, but AMD and Intel IOMMU drivers had their own
     implementation which is now unified.

   - finish support for iommu-groups. All drivers implement this feature
     now so that IOMMU core code can rely on it.

   - finish support for 'struct iommu_device' in iommu drivers. All
     drivers now use the interface.

   - new functions in the IOMMU-API for explicit IO/TLB flushing. This
     will help to reduce the number of IO/TLB flushes when IOMMU drivers
     support this interface.

   - support for mt2712 in the Mediatek IOMMU driver

   - new IOMMU driver for QCOM hardware

   - system PM support for ARM-SMMU

   - shutdown method for ARM-SMMU-v3

   - some constification patches

   - various other small improvements and fixes"

* tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits)
  iommu/vt-d: Don't be too aggressive when clearing one context entry
  iommu: Introduce Interface for IOMMU TLB Flushing
  iommu/s390: Constify iommu_ops
  iommu/vt-d: Avoid calling virt_to_phys() on null pointer
  iommu/vt-d: IOMMU Page Request needs to check if address is canonical.
  arm/tegra: Call bus_set_iommu() after iommu_device_register()
  iommu/exynos: Constify iommu_ops
  iommu/ipmmu-vmsa: Make ipmmu_gather_ops const
  iommu/ipmmu-vmsa: Rereserving a free context before setting up a pagetable
  iommu/amd: Rename a few flush functions
  iommu/amd: Check if domain is NULL in get_domain() and return -EBUSY
  iommu/mediatek: Fix a build warning of BIT(32) in ARM
  iommu/mediatek: Fix a build fail of m4u_type
  iommu: qcom: annotate PM functions as __maybe_unused
  iommu/pamu: Fix PAMU boot crash
  memory: mtk-smi: Degrade SMI init to module_init
  iommu/mediatek: Enlarge the validate PA range for 4GB mode
  iommu/mediatek: Disable iommu clock when system suspend
  iommu/mediatek: Move pgtable allocation into domain_alloc
  iommu/mediatek: Merge 2 M4U HWs into one iommu domain
  ...
2017-09-09 15:03:24 -07:00
Linus Torvalds
0d519f2d1e pci-v4.14-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
 vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
 64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
 XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
 0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
 B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
 nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
 c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
 4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
 h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
 GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
 EG76ridTolUFVV+wzJD9
 =9cLP
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - add enhanced Downstream Port Containment support, which prints more
   details about Root Port Programmed I/O errors (Dongdong Liu)

 - add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)

 - add MediaTek MT2712 and MT7622 support (Ryder Lee)

 - add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)

 - add Qualcom IPQ8074 support (Varadarajan Narayanan)

 - add R-Car r8a7743/5 device tree support (Biju Das)

 - add Rockchip per-lane PHY support for better power management (Shawn
   Lin)

 - fix IRQ mapping for hot-added devices by replacing the
   pci_fixup_irqs() boot-time design with a host bridge hook called at
   probe-time (Lorenzo Pieralisi, Matthew Minter)

 - fix race when enabling two devices that results in upstream bridge
   not being enabled correctly (Srinath Mannam)

 - fix pciehp power fault infinite loop (Keith Busch)

 - fix SHPC bridge MSI hotplug events by enabling bus mastering
   (Aleksandr Bezzubikov)

 - fix a VFIO issue by correcting PCIe capability sizes (Alex
   Williamson)

 - fix an INTD issue on Xilinx and possibly other drivers by unifying
   INTx IRQ domain support (Paul Burton)

 - avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
   Roedel)

 - allow APM X-Gene device assignment to guests by adding an ACS quirk
   (Feng Kan)

 - fix driver crashes by disabling Extended Tags on Broadcom HT2100
   (Extended Tags support is required for PCIe Receivers but not
   Requesters, and we now enable them by default when Requesters support
   them) (Sinan Kaya)

 - fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
   use the real Requester ID (not a phantom RID) (Robin Murphy)

 - prevent assignment of Intel VMD children to guests (which may be
   supported eventually, but isn't yet) by not associating an IOMMU with
   them (Jon Derrick)

 - fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
   Bauer)

 - fix a Function-Level Reset issue with Intel 750 NVMe by waiting
   longer (up to 60sec instead of 1sec) for device to become ready
   (Sinan Kaya)

 - fix a Function-Level Reset issue on iProc Stingray by working around
   hardware defects in the CRS implementation (Oza Pawandeep)

 - fix an issue with Intel NVMe P3700 after an iProc reset by adding a
   delay during shutdown (Oza Pawandeep)

 - fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
   in compose_msi_msg() (Stephen Hemminger)

 - fix a wireless LAN driver timeout by clearing DesignWare MSI
   interrupt status after it is handled, not before (Faiz Abbas)

 - fix DesignWare ATU enable checking (Jisheng Zhang)

 - reduce Layerscape dependencies on the bootloader by doing more
   initialization in the driver (Hou Zhiqiang)

 - improve Intel VMD performance allowing allocation of more IRQ vectors
   than present CPUs (Keith Busch)

 - improve endpoint framework support for initial DMA mask, different
   BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
   Vijay Abraham I, Stan Drozd)

 - rework CRS support to add periodic messages while we poll during
   enumeration and after Function-Level Reset and prepare for possible
   other uses of CRS (Sinan Kaya)

 - clean up Root Port AER handling by removing unnecessary code and
   moving error handler methods to struct pcie_port_service_driver
   (Christoph Hellwig)

 - clean up error handling paths in various drivers (Bjorn Andersson,
   Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
   Lorenzo Pieralisi, Sergei Shtylyov)

 - clean up SR-IOV resource handling by disabling VF decoding before
   updating the corresponding resource structs (Gavin Shan)

 - clean up DesignWare-based drivers by unifying quirks to update Class
   Code and Interrupt Pin and related handling of write-protected
   registers (Hou Zhiqiang)

 - clean up by adding empty generic pcibios_align_resource() and
   pcibios_fixup_bus() and removing empty arch-specific implementations
   (Palmer Dabbelt)

 - request exclusive reset control for several drivers to allow cleanup
   elsewhere (Philipp Zabel)

 - constify various structures (Arvind Yadav, Bhumika Goyal)

 - convert from full_name() to %pOF (Rob Herring)

 - remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
   Lin)

* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
  PCI: xgene: Clean up whitespace
  PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
  PCI: xgene: Fix platform_get_irq() error handling
  PCI: xilinx-nwl: Fix platform_get_irq() error handling
  PCI: rockchip: Fix platform_get_irq() error handling
  PCI: altera: Fix platform_get_irq() error handling
  PCI: spear13xx: Fix platform_get_irq() error handling
  PCI: artpec6: Fix platform_get_irq() error handling
  PCI: armada8k: Fix platform_get_irq() error handling
  PCI: dra7xx: Fix platform_get_irq() error handling
  PCI: exynos: Fix platform_get_irq() error handling
  PCI: iproc: Clean up whitespace
  PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
  PCI: iproc: Add 500ms delay during device shutdown
  PCI: Fix typos and whitespace errors
  PCI: Remove unused "res" variable from pci_resource_io()
  PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
  PCI/AER: Reformat AER register definitions
  iommu/vt-d: Prevent VMD child devices from being remapping targets
  x86/PCI: Use is_vmd() rather than relying on the domain number
  ...
2017-09-08 15:47:43 -07:00