Implement CPU alternatives, which allows to optionally patch newer
instructions at runtime, based on CPU facilities availability.
A new kernel boot parameter "noaltinstr" disables patching.
Current implementation is derived from x86 alternatives. Although
ideal instructions padding (when altinstr is longer then oldinstr)
is added at compile time, and no oldinstr nops optimization has to be
done at runtime. Also couple of compile time sanity checks are done:
1. oldinstr and altinstr must be <= 254 bytes long,
2. oldinstr and altinstr must not have an odd length.
alternative(oldinstr, altinstr, facility);
alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
Both compile time and runtime padding consists of either 6/4/2 bytes nop
or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
.altinstructions and .altinstr_replacement sections are part of
__init_begin : __init_end region and are freed after initialization.
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
debug_event_common memsets the active debug entry with zeros to
prevent stale data leakage. This is overwritten with the actual
debug data in the next step. Only write zeros to that part of the
debug entry that's not used by new debug data.
Micro benchmarks show a 2-10% reduction of cpu cycles with this
approach.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
debug_event currently truncates the data if used with a size larger than
the buf_size of the debug feature. For lots of callers of this function,
wrappers have been implemented that loop until all data is handled.
Move that functionality into debug_event_common and get rid of the wrappers.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Before kexec boots to a crash kernel it checks whether the image in memory
changed after load. This is done by the function kdump_csum_valid, which
returns true, i.e. an int != 0, on success and 0 otherwise. In other words
when kdump_csum_valid returns an error code it means that the validation
succeeded. This is not only counterintuitive but also produces the wrong
result if the kernel was build without CONFIG_CRASH_DUMP. Fix this by
making kdump_csum_valid return a bool.
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The debug feature code hasn't been touched in ages and the code also
looks like this. Therefore clean up the code so it looks a bit more
like current coding style.
There is no functional change - actually I made also sure that the
generated code with performance_defconfig is identical.
A diff of old vs new with "objdump -d" is empty.
The code is still not checkpatch clean, but that was not the goal.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For an unknown reason the s390 kprobes instruction replacement
function modifies the kprobe_status of the current CPU to
KPROBE_SWAP_INST. This was supposed to catch traps that happened
during instruction patching. Such a fault is not supposed to happen,
and silently discarding such a fault is certainly also not what we
want. In fact s390 is the only architecture which has this odd piece
of code.
Just remove this and behave like all other architectures. This was
pointed out by Jens Remus.
Reported-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Just some trivial changes like removing the extern keyword from the
header file, renaming arguments to match the man pages, and whitespace
removal.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Like for the memset16/32/64 variants avoid that subsequent mvc
instructions depend on each other since that might have negative
performance impacts.
This patch is currently hardly relevant since at least gcc 7.1
generates only inline memset code and not a single memset call.
However there is no reason to not provide an optimized version
just in case gcc generates memset calls again, like it did in
the past.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use memset64 instead of the (now) open-coded variant clear_table.
Performance wise there is no difference.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide fast versions of the new memset variants. E.g. the generic
memset64 is ten times slower than the optimized version if used on a
whole page.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a syscall of s390_sthyi to implement STHYI instruction in LPAR
which reuses the implementation for KVM by Janosch Frank -
commit 95ca2cb579 ("KVM: s390: Add sthyi emulation").
STHYI(Store Hypervisor Information) is an emulated z/VM instruction that
provides a guest with basic information about the layers it is running
on. This includes information about the cpu configuration of both the
machine and the lpar, as well as their names, machine model and
machine type. This information enables an application to determine the
maximum capacity of CPs and IFLs available to software.
For the arguments of s390_sthyi, code shall be 0 and flags is reserved for
future use, info is the output argument to store the required hypervisor
info.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
STHYI requires extensive locking in the higher hypervisors and is
very computational/memory expensive. Therefore we cache the retrieved
hypervisor info whose valid period is 1s with mutex to allow concurrent
access. rw semaphore can't benefit here due to cache line bounce.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As we need to support sthyi instruction on LPAR too, move the common code
to kernel part and kvm related code to intercept.c for better reuse.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We never optimized our rwsem inline assemblies to make use of the new
atomic instructions. The generic rwsem implementation implicitly makes
use of the new instructions, since it implements the required rwsem
primitives with atomic operations, which we did optimize.
However even when compiling for old architectures the generic variant
still generates better code. So it's time to simply remove our old
code and switch to the generic implementation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This instruction came with a z/VM extension and not with a specific
machine generation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove a couple of instructions that are listed twice.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The e7 opcode table does not have an end marker. Hence when trying to
find an unknown e7 instruction the code will access memory behind the
table until it finds something that matches the opcode, or the kernel
crashes, whatever comes first.
This affects not only the in-kernel disassembler but also uprobes and
kprobes which refuse to set a probe on unknown instructions, and
therefore search the opcode tables to figure out if instructions are
known or not.
Cc: <stable@vger.kernel.org> # v3.18+
Fixes: 3585cb0280 ("s390/disassembler: add vector instructions")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is no recent user space application available anymore which still
supports this old virtio transport. Additionally, commit 3b2fbb3f06
("virtio/s390: deprecate old transport") introduced a deprecation message
in the driver, and apparently nobody complained so far that it is still
required. So let's simply remove it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When grouping devices, the ccwgroup core only checks whether all of the
devices are bound to the same ccw_driver. It has no means of checking
if the requesting ccwgroup driver actually supports this device type.
qeth implements its own device matching in qeth_core_probe_device(),
while ctcm and lcs currently have no sanity-checking at all.
Enable ccwgroup drivers to optionally defer the device type checking to
the ccwgroup core, by specifying their supported ccw_driver.
This allows us drop the device type matching from qeth, and improves
the robustness of ctcm and lcs.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch introduces gcm(aes) support into the aes_s390 kernel module.
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Like the common queued rwlock code the s390 implementation uses the
queued spinlock code on a spinlock_t embedded in the rwlock_t to achieve
the queueing. The encoding of the rwlock_t differs though, the counter
field in the rwlock_t is split into two parts. The upper two bytes hold
the write bit and the write wait counter, the lower two bytes hold the
read counter.
The arch_read_lock operation works exactly like the common qrwlock but
the enqueue operation for a writer follows a diffent logic. After the
failed inline try to get the rwlock in write, the writer first increases
the write wait counter, acquires the wait spin_lock for the queueing,
and then loops until there are no readers and the write bit is zero.
Without the write wait counter a CPU that just released the rwlock
could immediately reacquire the lock in the inline code, bypassing all
outstanding read and write waiters. For s390 this would cause massive
imbalances in favour of writers in case of a contended rwlock.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The queued spinlock code for s390 follows the principles of the common
code qspinlock implementation but with a few notable differences.
The format of the spinlock_t locking word differs, s390 needs to store
the logical CPU number of the lock holder in the spinlock_t to be able
to use the diagnose 9c directed yield hypervisor call.
The inline code sequences for spin_lock and spin_unlock are nice and
short. The inline portion of a spin_lock now typically looks like this:
lhi %r0,0 # 0 indicates an empty lock
l %r1,0x3a0 # CPU number + 1 from lowcore
cs %r0,%r1,<some_lock> # lock operation
jnz call_wait # on failure call wait function
locked:
...
call_wait:
la %r2,<some_lock>
brasl %r14,arch_spin_lock_wait
j locked
A spin_unlock is as simple as before:
lhi %r0,0
sth %r0,2(%r2) # unlock operation
After a CPU has queued itself it may not enable interrupts again for the
arch_spin_lock_flags() variant. The arch_spin_lock_wait_flags wait function
is removed.
To improve performance the code implements opportunistic lock stealing.
If the wait function finds a spinlock_t that indicates that the lock is
free but there are queued waiters, the CPU may steal the lock up to three
times without queueing itself. The lock stealing update the steal counter
in the lock word to prevent more than 3 steals. The counter is reset at
the time the CPU next in the queue successfully takes the lock.
While the queued spinlocks improve performance in a system with dedicated
CPUs, in a virtualized environment with continuously overcommitted CPUs
the queued spinlocks can have a negative effect on performance. This
is due to the fact that a queued CPU that is preempted by the hypervisor
will block the queue at some point even without holding the lock. With
the classic spinlock it does not matter if a CPU is preempted that waits
for the lock. Therefore use the queued spinlock code only if the system
runs with dedicated CPUs and fall back to classic spinlocks when running
with shared CPUs.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The queued spinlock code will come out simpler if the encoding of
the CPU that holds the spinlock is (cpu+1) instead of (~cpu).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The topology information returned by STSI 15.x.x contains a flag
if the CPUs of a topology-list are dedicated or shared. Make this
information available if the machine provides topology information.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Paul Burton reported that the nr_cpumask_bits check
within cpumsf_pmu_event_init() is not necessary.
Actually there is already a prior check within
perf_event_alloc(). Therefore remove the check.
Reported-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add runtime instrumention register get and set which allows to read
and modify the runtime instrumention control block.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Update runtime_instr_cb structure to be consistent with the runtime
instrumentation documentation.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is the quite trivial backend for s390 which is required to enable
FORTIFY_SOURCE support.
See commit 6974f0c455 ("include/linux/string.h: add the option of
fortified string.h functions") for more details.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
exit_thread() is empty now. Therefore remove it and get rid of a
pointless branch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Free data structures required for guarded storage from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.
In addition this allows to get rid of exit_thread() in a later patch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the guarded storage regset for current is supposed to be changed,
the regset from user space is copied directly into the guarded storage
control block.
If then the process gets scheduled away while the control block is
being copied and before the new control block has been loaded, the
result is random: the process can be scheduled away due to a page
fault or preemption. If that happens the already copied parts will be
overwritten by save_gs_cb(), called from switch_to().
Avoid this by copying the data to a temporary buffer on the stack and
do the actual update with preemption disabled.
Fixes: f5bbd72198 ("s390/ptrace: guarded storage regset for the current task")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For PREEMPT enabled kernels the guarded storage (GS) code contains a
possible use-after-free bug. If a task that makes use of GS exits, it
will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via exit_thread().
If exit_thread_gs() gets preempted after the GS control block of the
task has been freed but before the pointer to it is set to NULL, then
save_gs_cb(), called from switch_to(), will write to already freed
memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes: 916cda1aa1 ("s390: add a system call for guarded storage")
Cc: <stable@vger.kernel.org> # v4.12+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Free data structures required for runtime instrumentation from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.
In addition this allows to get rid of exit_thread() in a later patch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For PREEMPT enabled kernels the runtime instrumentation (RI) code
contains a possible use-after-free bug. If a task that makes use of RI
exits, it will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via
exit_thread(). If exit_thread_runtime_instr() gets preempted after the
RI control block of the task has been freed but before the pointer to
it is set to NULL, then save_ri_cb(), called from switch_to(), will
write to already freed memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes: e4b8b3f33f ("s390: add support for runtime instrumentation")
Cc: <stable@vger.kernel.org> # v3.7+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
release_thread() is an empty function that gets called on every task
exit. Move the function to a header file and force inlining of it, so
that the compiler can optimize it away instead of generating a
pointless function call.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a new sysctl file /proc/sys/s390/topology which displays if
topology is on (1) or off (0) as specified by the "topology=" kernel
parameter.
This allows to change topology information during runtime and
configuring it via /etc/sysctl.conf instead of using the kernel line
parameter.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If running on machines that do not provide topology information we
currently generate a "fake" topology which defines the maximum
distance between each cpu: each cpu will be put into an own drawer.
Historically this used to be the best option for (virtual) machines in
overcommited hypervisors.
For some workloads however it is better to generate a different
topology where all cpus are siblings within a package (all cpus are
core siblings). This shows performance improvements of up to 10%,
depending on the workload.
In order to keep the current behaviour, but also allow to switch to
the different core sibling topology use the existing "topology="
kernel parameter:
Specifying "topology=on" on machines without topology information will
generate the core siblings (fake) topology information, instead of the
default topology information where all cpus have the maximum distance.
On machines which provide topology information specifying
"topology=on" does not have any effect.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The check for the _SEGMENT_ENTRY_PROTECT bit in gup_huge_pmd() is the
wrong way around. It must not be set for write==1, and not be checked for
write==0. Fix this similar to how it was fixed for ptes long time ago in
commit 25591b0703 ("[S390] fix get_user_pages_fast").
One impact of this bug would be unnecessarily using the gup slow path for
write==0 on r/w mappings. A potentially more severe impact would be that
gup_huge_pmd() will succeed for write==1 on r/o mappings.
Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 227be799c3 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
inadvertently changed the behavior of pmdp_invalidate(), so that it now
clears the pmd instead of just marking it as invalid. Fix this by restoring
the original behavior.
A possible impact of the misbehaving pmdp_invalidate() would be the
MADV_DONTNEED races (see commits ced10803 and 58ceeb6b), although we
should not have any negative impact on the related dirty/young flags,
since those flags are not set by the hardware on s390.
Fixes: 227be799c3 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A per-thread event could not be created correctly like below:
perf record --per-thread -e rB0000 -- sleep 1
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (rB0000).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
This bug was introduced by:
commit c311c79799
Author: Alexey Dobriyan <adobriyan@gmail.com>
Date: Mon May 8 15:56:15 2017 -0700
cpumask: make "nr_cpumask_bits" unsigned
If a per-thread event is not attached to any CPU, the cpu field
in struct perf_event is -1. The above commit converts the CPU number
to unsigned int, which result in an illegal CPU number.
Fixes: c311c79799 ("cpumask: make "nr_cpumask_bits" unsigned")
Cc: <stable@vger.kernel.org> # v4.12+
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull more s390 updates from Martin Schwidefsky:
"The second patch set for the 4.14 merge window:
- Convert the dasd device driver to the blk-mq interface.
- Provide three zcrypt interfaces for vfio_ap. These will be required
for KVM guest access to the crypto cards attached via the AP bus.
- A couple of memory management bug fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: blk-mq conversion
s390/mm: use a single lock for the fields in mm_context_t
s390/mm: fix race on mm->context.flush_mm
s390/mm: fix local TLB flushing vs. detach of an mm address space
s390/zcrypt: externalize AP queue interrupt control
s390/zcrypt: externalize AP config info query
s390/zcrypt: externalize test AP queue
s390/mm: use VM_BUG_ON in crst_table_[upgrade|downgrade]
Pull namespace updates from Eric Biederman:
"Life has been busy and I have not gotten half as much done this round
as I would have liked. I delayed it so that a minor conflict
resolution with the mips tree could spend a little time in linux-next
before I sent this pull request.
This includes two long delayed user namespace changes from Kirill
Tkhai. It also includes a very useful change from Serge Hallyn that
allows the security capability attribute to be used inside of user
namespaces. The practical effect of this is people can now untar
tarballs and install rpms in user namespaces. It had been suggested to
generalize this and encode some of the namespace information
information in the xattr name. Upon close inspection that makes the
things that should be hard easy and the things that should be easy
more expensive.
Then there is my bugfix/cleanup for signal injection that removes the
magic encoding of the siginfo union member from the kernel internal
si_code. The mips folks reported the case where I had used FPE_FIXME
me is impossible so I have remove FPE_FIXME from mips, while at the
same time including a return statement in that case to keep gcc from
complaining about unitialized variables.
I almost finished the work to get make copy_siginfo_to_user a trivial
copy to user. The code is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3
But I did not have time/energy to get the code posted and reviewed
before the merge window opened.
I was able to see that the security excuse for just copying fields
that we know are initialized doesn't work in practice there are buggy
initializations that don't initialize the proper fields in siginfo. So
we still sometimes copy unitialized data to userspace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
Introduce v3 namespaced file capabilities
mips/signal: In force_fcr31_sig return in the impossible case
signal: Remove kernel interal si_code magic
fcntl: Don't use ambiguous SIG_POLL si_codes
prctl: Allow local CAP_SYS_ADMIN changing exe_file
security: Use user_namespace::level to avoid redundant iterations in cap_capable()
userns,pidns: Verify the userns for new pid namespaces
signal/testing: Don't look for __SI_FAULT in userspace
signal/mips: Document a conflict with SI_USER with SIGFPE
signal/sparc: Document a conflict with SI_USER with SIGFPE
signal/ia64: Document a conflict with SI_USER with SIGFPE
signal/alpha: Document a conflict with SI_USER for SIGTRAP
Slightly more changes than usual this time:
- KDump Kernel IOMMU take-over code for AMD IOMMU. The code now
tries to preserve the mappings of the kernel so that master
aborts for devices are avoided. Master aborts cause some
devices to fail in the kdump kernel, so this code makes the
dump more likely to succeed when AMD IOMMU is enabled.
- Common flush queue implementation for IOVA code users. The
code is still optional, but AMD and Intel IOMMU drivers had
their own implementation which is now unified.
- Finish support for iommu-groups. All drivers implement this
feature now so that IOMMU core code can rely on it.
- Finish support for 'struct iommu_device' in iommu drivers. All
drivers now use the interface.
- New functions in the IOMMU-API for explicit IO/TLB flushing.
This will help to reduce the number of IO/TLB flushes when
IOMMU drivers support this interface.
- Support for mt2712 in the Mediatek IOMMU driver
- New IOMMU driver for QCOM hardware
- System PM support for ARM-SMMU
- Shutdown method for ARM-SMMU-v3
- Some constification patches
- Various other small improvements and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZtCFNAAoJECvwRC2XARrjZnQP/AxC/ezQpq82HbegF4sM/cVE
Ep7TeTqodEl75FS/6txe2wU0pwodqk/LB9ajfQZUbE1w8vKsNEqi5qf4ZYHGoxYI
5bWyjJBzKIlwENH5lsBpQNt6XLevrYmRsFy7F0tRYy+qPQq8k+js2i7/XkCL3q7L
3xklF847RRoITaTOhhaROx1pF23dSMEsS2XGuWHcZfjORtep4wcFKzd/2SvlCWCo
P2bRU7jBzfJuuGSA80gaiUbDmrULTUfYuZNp7njASzCgsDmagERtvDEpdoXPNNSp
u6s4LjU1Dp3fgr6g6cFRO7B6JUbWd619nwo9so/c/wZN54yEngBF9EyeeF3mv2O5
ZbM2mOW3RlZcjxFT/AC8G4cZwwP6MpCEQOdqknoqc6ZQwcDqwN0o9I4+po0wsiAU
89ijZZe9Mx0p9lNpihaBEB1erAUWPo5Obh62zo80W3h6x9WzkGQWM+PyFK2DYoaC
8biEZzcc21sLEHvXQkcEGJSKrihHr9sluOqvxmCw5QAkYIFAeZRoeH7JtZWjVCnr
T3XvaG1G1Aw6tS7ErxufdKawREAGki0Rm9i1baiH9sqNj5rllM01Y+PgU6E21Nbg
iZp9gJLjfwM4vhYLlovvQK5PRoOBsCkyCpEI4GJqjLeam5p/WN06CFFf0ifQofYr
qDPCVDkWHWV8nugFFKE7
=EVh9
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"Slightly more changes than usual this time:
- KDump Kernel IOMMU take-over code for AMD IOMMU. The code now tries
to preserve the mappings of the kernel so that master aborts for
devices are avoided. Master aborts cause some devices to fail in
the kdump kernel, so this code makes the dump more likely to
succeed when AMD IOMMU is enabled.
- common flush queue implementation for IOVA code users. The code is
still optional, but AMD and Intel IOMMU drivers had their own
implementation which is now unified.
- finish support for iommu-groups. All drivers implement this feature
now so that IOMMU core code can rely on it.
- finish support for 'struct iommu_device' in iommu drivers. All
drivers now use the interface.
- new functions in the IOMMU-API for explicit IO/TLB flushing. This
will help to reduce the number of IO/TLB flushes when IOMMU drivers
support this interface.
- support for mt2712 in the Mediatek IOMMU driver
- new IOMMU driver for QCOM hardware
- system PM support for ARM-SMMU
- shutdown method for ARM-SMMU-v3
- some constification patches
- various other small improvements and fixes"
* tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits)
iommu/vt-d: Don't be too aggressive when clearing one context entry
iommu: Introduce Interface for IOMMU TLB Flushing
iommu/s390: Constify iommu_ops
iommu/vt-d: Avoid calling virt_to_phys() on null pointer
iommu/vt-d: IOMMU Page Request needs to check if address is canonical.
arm/tegra: Call bus_set_iommu() after iommu_device_register()
iommu/exynos: Constify iommu_ops
iommu/ipmmu-vmsa: Make ipmmu_gather_ops const
iommu/ipmmu-vmsa: Rereserving a free context before setting up a pagetable
iommu/amd: Rename a few flush functions
iommu/amd: Check if domain is NULL in get_domain() and return -EBUSY
iommu/mediatek: Fix a build warning of BIT(32) in ARM
iommu/mediatek: Fix a build fail of m4u_type
iommu: qcom: annotate PM functions as __maybe_unused
iommu/pamu: Fix PAMU boot crash
memory: mtk-smi: Degrade SMI init to module_init
iommu/mediatek: Enlarge the validate PA range for 4GB mode
iommu/mediatek: Disable iommu clock when system suspend
iommu/mediatek: Move pgtable allocation into domain_alloc
iommu/mediatek: Merge 2 M4U HWs into one iommu domain
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
EG76ridTolUFVV+wzJD9
=9cLP
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add enhanced Downstream Port Containment support, which prints more
details about Root Port Programmed I/O errors (Dongdong Liu)
- add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)
- add MediaTek MT2712 and MT7622 support (Ryder Lee)
- add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)
- add Qualcom IPQ8074 support (Varadarajan Narayanan)
- add R-Car r8a7743/5 device tree support (Biju Das)
- add Rockchip per-lane PHY support for better power management (Shawn
Lin)
- fix IRQ mapping for hot-added devices by replacing the
pci_fixup_irqs() boot-time design with a host bridge hook called at
probe-time (Lorenzo Pieralisi, Matthew Minter)
- fix race when enabling two devices that results in upstream bridge
not being enabled correctly (Srinath Mannam)
- fix pciehp power fault infinite loop (Keith Busch)
- fix SHPC bridge MSI hotplug events by enabling bus mastering
(Aleksandr Bezzubikov)
- fix a VFIO issue by correcting PCIe capability sizes (Alex
Williamson)
- fix an INTD issue on Xilinx and possibly other drivers by unifying
INTx IRQ domain support (Paul Burton)
- avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
Roedel)
- allow APM X-Gene device assignment to guests by adding an ACS quirk
(Feng Kan)
- fix driver crashes by disabling Extended Tags on Broadcom HT2100
(Extended Tags support is required for PCIe Receivers but not
Requesters, and we now enable them by default when Requesters support
them) (Sinan Kaya)
- fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
use the real Requester ID (not a phantom RID) (Robin Murphy)
- prevent assignment of Intel VMD children to guests (which may be
supported eventually, but isn't yet) by not associating an IOMMU with
them (Jon Derrick)
- fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
Bauer)
- fix a Function-Level Reset issue with Intel 750 NVMe by waiting
longer (up to 60sec instead of 1sec) for device to become ready
(Sinan Kaya)
- fix a Function-Level Reset issue on iProc Stingray by working around
hardware defects in the CRS implementation (Oza Pawandeep)
- fix an issue with Intel NVMe P3700 after an iProc reset by adding a
delay during shutdown (Oza Pawandeep)
- fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
in compose_msi_msg() (Stephen Hemminger)
- fix a wireless LAN driver timeout by clearing DesignWare MSI
interrupt status after it is handled, not before (Faiz Abbas)
- fix DesignWare ATU enable checking (Jisheng Zhang)
- reduce Layerscape dependencies on the bootloader by doing more
initialization in the driver (Hou Zhiqiang)
- improve Intel VMD performance allowing allocation of more IRQ vectors
than present CPUs (Keith Busch)
- improve endpoint framework support for initial DMA mask, different
BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
Vijay Abraham I, Stan Drozd)
- rework CRS support to add periodic messages while we poll during
enumeration and after Function-Level Reset and prepare for possible
other uses of CRS (Sinan Kaya)
- clean up Root Port AER handling by removing unnecessary code and
moving error handler methods to struct pcie_port_service_driver
(Christoph Hellwig)
- clean up error handling paths in various drivers (Bjorn Andersson,
Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
Lorenzo Pieralisi, Sergei Shtylyov)
- clean up SR-IOV resource handling by disabling VF decoding before
updating the corresponding resource structs (Gavin Shan)
- clean up DesignWare-based drivers by unifying quirks to update Class
Code and Interrupt Pin and related handling of write-protected
registers (Hou Zhiqiang)
- clean up by adding empty generic pcibios_align_resource() and
pcibios_fixup_bus() and removing empty arch-specific implementations
(Palmer Dabbelt)
- request exclusive reset control for several drivers to allow cleanup
elsewhere (Philipp Zabel)
- constify various structures (Arvind Yadav, Bhumika Goyal)
- convert from full_name() to %pOF (Rob Herring)
- remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
Lin)
* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
PCI: xgene: Clean up whitespace
PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
PCI: xgene: Fix platform_get_irq() error handling
PCI: xilinx-nwl: Fix platform_get_irq() error handling
PCI: rockchip: Fix platform_get_irq() error handling
PCI: altera: Fix platform_get_irq() error handling
PCI: spear13xx: Fix platform_get_irq() error handling
PCI: artpec6: Fix platform_get_irq() error handling
PCI: armada8k: Fix platform_get_irq() error handling
PCI: dra7xx: Fix platform_get_irq() error handling
PCI: exynos: Fix platform_get_irq() error handling
PCI: iproc: Clean up whitespace
PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
PCI: iproc: Add 500ms delay during device shutdown
PCI: Fix typos and whitespace errors
PCI: Remove unused "res" variable from pci_resource_io()
PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
PCI/AER: Reformat AER register definitions
iommu/vt-d: Prevent VMD child devices from being remapping targets
x86/PCI: Use is_vmd() rather than relying on the domain number
...