The early_param() is only called during kernel initialization, So Linux
marks the functions of it with __init macro to save memory.
But it forgot to mark the parse_no_kvmapf/stealacc/kvmclock_vsyscall,
So, Make them __init as well.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: rkrcmar@redhat.com
Cc: kvm@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Guests on new hypersiors might set KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
bit when enabling async_PF, but this bit is reserved on old hypervisors,
which results in a failure upon migration.
To avoid breaking different cases, we are checking for CPUID feature bit
before enabling the feature and nothing else.
Fixes: 52a5c155cf ("KVM: async_pf: Let guest support delivery of async_pf from guest mode")
Cc: <stable@vger.kernel.org>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix the following sparse warning by moving the prototype
of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h .
CHECK arch/s390/kvm/../../../virt/kvm/kvm_main.c
arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static?
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the kvm_arch_irq_routing_update() prototype outside of
ifdef CONFIG_HAVE_KVM_EVENTFD guards to fix the following sparse warning:
arch/s390/kvm/../../../virt/kvm/irqchip.c:171:28: warning: symbol 'kvm_arch_irq_routing_update' was not declared. Should it be static?
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 'Total' line looks a bit weird when we have a single event only. This
can happen e.g. due to filters. Therefore suppress when there's only a
single event in the output.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We keep the current logic that sorts all events (parent and child), but
re-shuffle the events afterwards, grouping the children after the
respective parent. Note that the percentage column for child events
gives the percentage of the parent's total.
Since we rework the logic anyway, we modify the total average
calculation to use the raw numbers instead of the (rounded) averages.
Note that this can result in differing numbers (between total average
and the sum of the individual averages) due to rounding errors.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drilldown (i.e. toggle display of child trace events) was implemented by
overriding the fields filter. This resulted in inconsistencies: E.g. when
drilldown was not active, adding a filter that also matches child trace
events would not only filter fields according to the filter, but also add
in the child trace events matching the filter. E.g. on x86, setting
'kvm_userspace_exit' as the fields filter after startup would result in
display of kvm_userspace_exit(DCR), although that wasn't previously
present - not exactly what one would expect from a filter.
This patch addresses the issue by keeping drilldown and fields filter
separate. While at it, we also fix a PEP8 issue by adding a blank line
at one place (since we're in the area...).
We implement this by adding a framework that also allows to define a
taxonomy among the debugfs events to identify child trace events. I.e.
drilldown using 'x' can now also work with debugfs. A respective parent-
child relationship is only known for S390 at the moment, but could be
added adjusting other platforms' ARCH.dbg_is_child() methods
accordingly.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can do with a single dialog that takes both, pids and guest names.
Note that we keep both interactive commands, 'p' and 'g' for now, to
avoid confusion among users used to a specific key.
While at it, we improve on some minor glitches regarding curses usage,
e.g. cursor still visible when not supposed to be.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Helps quite a bit reading the code when it's obvious when a method is
intended for internal use only.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Te checks for debugfs assumed that debugfs is always mounted at
/sys/kernel/debug - which is likely, but not guaranteed. This is addressed
by checking /proc/mounts for the actual location.
Furthermore, when debugfs was mounted, but the kvm module not loaded, a
misleading error pointing towards debugfs not present was given.
To reproduce,
(a) run kvm_stat with debugfs mounted at a place different from
/sys/kernel/debug
(b) run kvm_stat with debugfs mounted but kvm module not loaded
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Entering an invalid regular expression did not produce any indication of an
error so far.
To reproduce, press 'f' and enter 'foo(' (with an unescaped bracket).
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When we apply a filter that will only leave child trace events, we
receive a ZeroDivisionError when calculating the percentages.
In that case, provide percentages based on child events only.
To reproduce, run 'kvm_stat -f .*[\(].*'.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use '==' for equality checks and 'is' when comparing identities.
An example where '==' and 'is' behave differently:
>>> a = 4242
>>> a == 4242
True
>>> a is 4242
False
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If it's clear that the values of a dictionary will be used then use
the '.items()' method.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Tested-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
[Include fix for logging mode by Stefan Raspl]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a namedtuple for storing the values as it allows to access the
fields of a tuple via names. This makes the overall code much easier
to read and to understand. Access by index is still possible as
before.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Tested-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 'sortkey' function references a value in its enclosing
scope (closure). This is not common practice for a sort key function
so let's replace it. Additionally, the function 'sorted' has already a
parameter for reversing the result therefore the inversion of the
values is unneeded. The check for stats[x][1] is also superfluous as
it's ensured that this value is initialized with 0.
Signed-off-by: Marc Hartmayer <mhartmay@linux.vnet.ibm.com>
Tested-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported by syzkaller:
WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
Call Trace:
vmx_handle_exit+0xbd/0xe20 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
do_vfs_ioctl+0xa4/0x6a0
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x25/0x9c
The testcase creates a first thread to issue KVM_SMI ioctl, and then creates
a second thread to mmap and operate on the same vCPU. This triggers a race
condition when running the testcase with multiple threads. Sometimes one thread
exits with a triple fault while another thread mmaps and operates on the same
vCPU. Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler
results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE
in kvm_handle_bad_page(), which will go on to cause an emulation failure and an
exit with KVM_EXIT_INTERNAL_ERROR.
Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Although L2 is in halt state, it will be in the active state after
VM entry if the VM entry is vectoring according to SDM 26.6.2 Activity
State. Halting the vcpu here means the event won't be injected to L2
and this decision isn't reported to L1. Thus L0 drops an event that
should be injected to L2.
Cc: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On x86, special KVM memslots such as the TSS region have anonymous
memory mappings created on behalf of userspace, and these mappings are
removed when the VM is destroyed.
It is however possible for removing these mappings via vm_munmap() to
fail. This can most easily happen if the thread receives SIGKILL while
it's waiting to acquire ->mmap_sem. This triggers the 'WARN_ON(r < 0)'
in __x86_set_memory_region(). syzkaller was able to hit this, using
'exit()' to send the SIGKILL. Note that while the vm_munmap() failure
results in the mapping not being removed immediately, it is not leaked
forever but rather will be freed when the process exits.
It's not really possible to handle this failure properly, so almost
every other caller of vm_munmap() doesn't check the return value. It's
a limitation of having the kernel manage these mappings rather than
userspace.
So just remove the WARN_ON() so that users can't spam the kernel log
with this warning.
Fixes: f0d648bdf0 ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
L1 might want to use SECONDARY_EXEC_DESC, so we must not clear the VMCS
bit if UMIP is not being emulated.
We must still set the bit when emulating UMIP as the feature can be
passed to L2 where L0 will do the emulation and because L2 can change
CR4 without a VM exit, we should clear the bit if UMIP is disabled.
Fixes: 0367f205a3 ("KVM: vmx: add support for emulating UMIP")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The initial reset of the local APIC is performed before the VMCS has been
created, but it tries to do a vmwrite:
vmwrite error: reg 810 value 4a00 (err 18944)
CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G W I 4.16.0-0.rc2.git0.1.fc28.x86_64 #1
Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014
Call Trace:
vmx_set_rvi [kvm_intel]
vmx_hwapic_irr_update [kvm_intel]
kvm_lapic_reset [kvm]
kvm_create_lapic [kvm]
kvm_arch_vcpu_init [kvm]
kvm_vcpu_init [kvm]
vmx_create_vcpu [kvm_intel]
kvm_vm_ioctl [kvm]
Move it later, after the VMCS has been created.
Fixes: 4191db26b7 ("KVM: x86: Update APICv on APIC reset")
Cc: stable@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have certain cases where the multiple epoch facility is broken:
- timer wakeup during epoch change
- cpu hotplug
- SCK instruction
- stp sync checks
Fix those.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJajqO9AAoJEBF7vIC1phx8fk4QAIACbrjlCpdqQ/3s8JC6SfNb
2tnyt4gQLw0ztb11kPGkjYXAkG9SA7v4Y3J3oXDtFH8BP/xhf6CO3jVmWCXUNv7E
wfk04Dh0xJwnwBsHYuERFlngB2BTODLoHV/w00fd/ja1c8T5yGULzADie6dJjNNT
B/q/eCIpzHQYZLZnBW7+YO05ciMwssi2luq46uijY/MZkfCYIvO8pf4MNcuLPvWq
CepxzCyXbdy2xw0fWu7lrYk/0VU08eYchGbqjsDbpuz3CdbKJhVLwZhSGx89WebX
/+s2IKXQZEtxKcBWHOZS2k98mB8LNMLumnaoeEJDjDt3T+lu3B/ujGfPURipcvGQ
0ch4iM5Fmhyx3IxYk4lEgrdoRpjHdjnBs1ONyNGIx35NJrfWjAsRRHw6ov6qQ0rH
rcDmBC8bBZmZYTxBXD+R5rTn+noJp2OkNt4Wc5X7SnKj3DIbfR3FKgT3z+mtJyIX
l8+qnaQpj/Pchuko4j7gh0/uzHVt3WtG3HtLqQnqHJTZM+b9nkIeDfUbdp3cLycD
W2wfs9LO2tXXcX1A05KFPPSjNDUypz1ToAfyt6JgPXjE7ZfHkpLJTQPrN+BoJZCk
3P//LQ85yJaDNcEJtH9S7nGjhSTdW1MqeO61mhlkag4A5Qe2Mquqd18H1ngac7aq
0Xna6qvJBvdvPwEmI04H
=umvB
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: fixes for multiple epoch facility
We have certain cases where the multiple epoch facility is broken:
- timer wakeup during epoch change
- cpu hotplug
- SCK instruction
- stp sync checks
Fix those.
Fix the interaction of userspace irqchip VMs with in-kernl irqchip VMs
and make sure we can build 32-bit KVM/ARM with gcc-8.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJakFXlAAoJEEtpOizt6ddyg0kH/1Aor4Gkra8AFVaNP8r3xZxN
ZZfxiKDBuca32uEKtkk0DUKaPIFRJS60K4SvKJTPbOV5gj43vq29SNrpBiBHNdj+
5hcZRjH66p5HzBmZZk/hispijpQUW2cXDnf2tsupknwENqWHhgf440t3ruuDLdLU
npg+I/0588uPWaljj2YfAn1DQOFLx/gpjuNY8n+dHYm6PcTtVp+tTYEYu2P5znFa
WOn6Zw0XLVdC4e3sFOdajlGk5TZ8+ILEWVJkXKg5bd0vV67WB3mCvsz6byr9c5CU
vdjRgz2/iSGldbo+e4LuqnxSUjaVF53T2BSfKLl1YO0J2vIP1hkZl7XR0tRJfww=
=SF7J
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-fixes-for-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM Fixes for v4.16, Round 1
Fix the interaction of userspace irqchip VMs with in-kernl irqchip VMs
and make sure we can build 32-bit KVM/ARM with gcc-8.
Right now, SET CLOCK called in the guest does not properly take care of
the epoch index, as the call goes via the old kvm_s390_set_tod_clock()
interface. So the epoch index is neither reset to 0, if required, nor
properly set to e.g. 0xff on negative values.
Fix this by providing a single kvm_s390_set_tod_clock() function. Move
Multiple-epoch facility handling into it.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180207114647.6220-3-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 8fa1696ea7 ("KVM: s390: Multiple Epoch Facility support")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
For now, we don't take care of over/underflows. Especially underflows
are critical:
Assume the epoch is currently 0 and we get a sync request for delta=1,
meaning the TOD is moved forward by 1 and we have to fix it up by
subtracting 1 from the epoch. Right now, this will leave the epoch
index untouched, resulting in epoch=-1, epoch_idx=0, which is wrong.
We have to take care of over and underflows, also for the VSIE case. So
let's factor out calculation into a separate function.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180207114647.6220-5-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 8fa1696ea7 ("KVM: s390: Multiple Epoch Facility support")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[use u8 for idx]
Missed when enabling the Multiple-epoch facility. If the facility is
installed and the control is set, a sign based comaprison has to be
performed.
Right now we would inject wrong interrupts and ignore interrupt
conditions. Also the sleep time is calculated in a wrong way.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180207114647.6220-2-david@redhat.com>
Fixes: 8fa1696ea7 ("KVM: s390: Multiple Epoch Facility support")
Cc: stable@vger.kernel.org
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
In banked-sr.c, we use a top-level '__asm__(".arch_extension virt")'
statement to allow compilation of a multi-CPU kernel for ARMv6
and older ARMv7-A that don't normally support access to the banked
registers.
This is considered to be a programming error by the gcc developers
and will no longer work in gcc-8, where we now get a build error:
/tmp/cc4Qy7GR.s:34: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_usr'
/tmp/cc4Qy7GR.s:41: Error: Banked registers are not available with this architecture. -- `mrs r3,ELR_hyp'
/tmp/cc4Qy7GR.s:55: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_svc'
/tmp/cc4Qy7GR.s:62: Error: Banked registers are not available with this architecture. -- `mrs r3,LR_svc'
/tmp/cc4Qy7GR.s:69: Error: Banked registers are not available with this architecture. -- `mrs r3,SPSR_svc'
/tmp/cc4Qy7GR.s:76: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_abt'
Passign the '-march-armv7ve' flag to gcc works, and is ok here, because
we know the functions won't ever be called on pre-ARMv7VE machines.
Unfortunately, older compiler versions (4.8 and earlier) do not understand
that flag, so we still need to keep the asm around.
Backporting to stable kernels (4.6+) is needed to allow those to be built
with future compilers as well.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84129
Fixes: 33280b4cd1 ("ARM: KVM: Add banked registers save/restore")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When introducing support for irqchip in userspace we needed a way to
mask the timer signal to prevent the guest continuously exiting due to a
screaming timer.
We did this by disabling the corresponding percpu interrupt on the
host interrupt controller, because we cannot rely on the host system
having a GIC, and therefore cannot make any assumptions about having an
active state to hide the timer signal.
Unfortunately, when introducing this feature, it became entirely
possible that a VCPU which belongs to a VM that has a userspace irqchip
can disable the vtimer irq on the host on some physical CPU, and then go
away without ever enabling the vtimer irq on that physical CPU again.
This means that using irqchips in userspace on a system that also
supports running VMs with an in-kernel GIC can prevent forward progress
from in-kernel GIC VMs.
Later on, when we started taking virtual timer interrupts in the arch
timer code, we would also leave this timer state active for userspace
irqchip VMs, because we leave it up to a VGIC-enabled guest to
deactivate the hardware IRQ using the HW bit in the LR.
Both issues are solved by only using the enable/disable trick on systems
that do not have a host GIC which supports the active state, because all
VMs on such systems must use irqchips in userspace. Systems that have a
working GIC with support for an active state use the active state to
mask the timer signal for both userspace and in-kernel irqchips.
Cc: Alexander Graf <agraf@suse.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: <stable@vger.kernel.org> # v4.12+
Fixes: d9e1397783 ("KVM: arm/arm64: Support arch timers with a userspace gic")
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
- optimization for the exitless interrupt support that was merged
in 4.16-rc1
- improve the branch prediction blocking for nested KVM
- replace some jump tables with switch statements to improve
expoline performance
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJahEB/AAoJEBF7vIC1phx8Hv0QAIDxeDQ/F6CeMkJj+8MtC5pW
MQ5YLhGKoQsVfX3Eb2jWyczBYDQUPVvoNpZH7I31cVtT6Gmtvc46SvcwvNgRwsOn
TL+jYLIFxSpHU/gvS7STsybc3RMJhLAytBy5YI3uPhuW4a7/ZMZ2p+XIhoq1J3uX
n0m41wsdb0YW9xi7DRoSw40znL3rgQ29rsaQtq61asnaJbakOLdAHqoJr3vhJvmq
kDQFV8OZDY8buya+M+goPlqFiOIW/86tkxmFDDEP8KaGcQUjl++Q7v7RBfNsUDaB
8Sc2eP4ui5KV5t7W+NUmzJ09RyDIlG92jXOV09optRjScyYag4sDpFicvU7KOGql
676Uq75lhlXb89JRw7X9AYToMQG+vdRfxcEWh3XGGDxZ/Ow739Gw6Iz+6ICfhBMY
Oh15AiCHsvAq3QHER61eb1lL+1yooqTgpXr42ak02DJNjxo1h/JLigmOzATjWYcN
nAu7np9RcYZgypMW8BuS80CmF+YApHWqoEZsUPJRND3u2/poygNLmR68SkKm2rYo
4QXn95PQ58kI10MhXYap41Vp+RWaBjIMR07HV/5ciJZEK468NRvEoJRWfbmH/MzH
gKgcDZfoQPPfZyGx2iIsAldqwK3H7Muaa/3DNDsiy9grmc94b3oyN1GHQkRbCuRQ
ELF1hyQsPMfaRi74Gwol
=DM2J
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes and improvements for 4.16
- optimization for the exitless interrupt support that was merged
in 4.16-rc1
- improve the branch prediction blocking for nested KVM
- replace some jump tables with switch statements to improve
expoline performance
Just like for the interception handlers, let's also use a switch-case
in our interrupt delivery code.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180206141743.24497-1-david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Instead of having huge jump tables for function selection,
let's use normal switch/case statements for the instruction
handlers in intercept.c We can now also get rid of
intercept_handler_t.
This allows the compiler to make the right decision depending
on the situation (e.g. avoid jump-tables for thunks).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Instead of having huge jump tables for function selection,
let's use normal switch/case statements for the instruction
handlers in priv.c
This allows the compiler to make the right decision depending
on the situation (e.g. avoid jump-tables for thunks).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If the guest runs with bp isolation when doing a SIE instruction,
we must also run the nested guest with bp isolation when emulating
that SIE instruction.
This is done by activating BPBC in the lpar, which acts as an override
for lower level guests.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If GISA is available, we do not have to kick CPUs out of SIE to deliver
interrupts. The hardware can deliver such interrupts while running.
Cc: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
For interrupt injection of floating interrupts we queue the interrupt
either in the GISA or in the floating interrupt list. The first CPU
that looks at these data structures - either in KVM code or hardware
will then deliver that interrupt. To minimize latency we also:
-a: choose a VCPU to deliver that interrupt. We prefer idle CPUs
-b: we wake up the host thread that runs the VCPU
-c: set an I/O intervention bit for that CPU so that it exits guest
context as soon as the PSW I/O mask is enabled
This will make sure that this CPU will execute the interrupt delivery
code of KVM very soon.
We can now optimize the injection case if we have exitless interrupts.
The wakeup is still necessary in case the target CPU sleeps. We can
avoid the I/O intervention request bit though. Whenever this
intervention request would be handled, the hardware could also directly
inject the interrupt on that CPU, no need to go through the interrupt
injection loop of KVM.
Cc: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
except, again, POLLFREE and POLL_BUSY_LOOP.
With this, we finally get to the promised end result:
- POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any
stray instances of ->poll() still using those will be caught by
sparse.
- eventpoll.c and select.c warning-free wrt __poll_t
- no more kernel-side definitions of POLL... - userland ones are
visible through the entire kernel (and used pretty much only for
mangle/demangle)
- same behavior as after the first series (i.e. sparc et.al. epoll(2)
working correctly).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more poll annotation updates from Al Viro:
"This is preparation to solving the problems you've mentioned in the
original poll series.
After this series, the kernel is ready for running
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
as a for bulk search-and-replace.
After that, the kernel is ready to apply the patch to unify
{de,}mangle_poll(), and then get rid of kernel-side POLL... uses
entirely, and we should be all done with that stuff.
Basically, that's what you suggested wrt KPOLL..., except that we can
use EPOLL... instead - they already are arch-independent (and equal to
what is currently kernel-side POLL...).
After the preparations (in this series) switch to returning EPOLL...
from ->poll() instances is completely mechanical and kernel-side
POLL... can go away. The last step (killing kernel-side POLL... and
unifying {de,}mangle_poll() has to be done after the
search-and-replace job, since we need userland-side POLL... for
unified {de,}mangle_poll(), thus the cherry-pick at the last step.
After that we will have:
- POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of
->poll() still using those will be caught by sparse.
- eventpoll.c and select.c warning-free wrt __poll_t
- no more kernel-side definitions of POLL... - userland ones are
visible through the entire kernel (and used pretty much only for
mangle/demangle)
- same behavior as after the first series (i.e. sparc et.al. epoll(2)
working correctly)"
* 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
annotate ep_scan_ready_list()
ep_send_events_proc(): return result via esed->res
preparation to switching ->poll() to returning EPOLL...
add EPOLLNVAL, annotate EPOLL... and event_poll->event
use linux/poll.h instead of asm/poll.h
xen: fix poll misannotation
smc: missing poll annotations
nios2: defconfig: Cleanup from old Kconfig options
nios2: dts: Remove leading 0x and 0s from bindings notation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJagGD7AAoJEFWoEK+e3syCyIgQALRklRmZIDK+ucnnwGhDZIWQ
Nip13tRFaFDmdwYymvZfv22b9dCAMINcoNY7yj6qCV2r1KBktQI/EmCL1QIj4PqS
uILFiQwGSj/YFapY5I32vDoHPKE5xP1UohJDsJh1mZbKWqzN5ydWh7sfOLscsWmT
4ybAMIKLt6xketsLeC/ygznjXeGUeUFKtQzwp8bcrCNsZkf9/BqwtnKLlrH2hurf
AdARsGEiH3aD/Tz2vTDvdSyMuqGORRGgDZVB+pevmdkUg5pyuAP+sfm/DB6lP8gB
beLsuikcDccdqnsMg57WrIAxcblc+S2fUfWzwNQUB9GyELO49vFvuVSD5Vp1xMtq
DWWiE4jhnCgpY13uKxRW01Ddo0u78PvdbojrxZ4iyBWAvlyhdaPXwXP0TwmhVl4A
tnIpRntPeP/0X5Htprd3SnXJFE5qRifbIDXTYbPG2QOFVIysvWjQuhN2rqi0PVJ5
duNL6uJ+Dt+NNwVamryyYuUsyrqU8CZtjLgI3P3m7xW9rscWgPZM4brJzYetq5mn
JwO2HwQBbNIcUSfCrFIaq1LEYKUOPY+glDXodxD519R0IWmJHBDTmbch+P92Xc5G
A2UINPNDj89qxcTEWJ+1qzheT3oRQAH1UaN9cIfwa1hPmjTPlhK2ltnOQ5atQJhU
fbf5dUlW16qqIf+DxK2K
=gek7
-----END PGP SIGNATURE-----
Merge tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2
Pull nios2 update from Ley Foon Tan:
- clean up old Kconfig options from defconfig
- remove leading 0x and 0s from bindings notation in dts files
* tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: defconfig: Cleanup from old Kconfig options
nios2: dts: Remove leading 0x and 0s from bindings notation
The commit 917538e212 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT
usage") removed KASAN_SHADOW_SCALE_SHIFT definition from
include/linux/kasan.h and added it to architecture-specific headers,
except for xtensa. This broke the xtensa build with KASAN enabled.
Define KASAN_SHADOW_SCALE_SHIFT in arch/xtensa/include/asm/kasan.h
Reported by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 917538e212 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage")
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Remove old, dead Kconfig option INET_LRO. It is gone since
commit 7bbf3cae65 ("ipv4: Remove inet_lro library").
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
Improve the DTS files by removing all the leading "0x" and zeros to fix the
following dtc warnings:
Warning (unit_address_format): Node /XXX unit name should not have leading "0x"
and
Warning (unit_address_format): Node /XXX unit name should not have leading 0s
Converted using the following command:
find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} +
For simplicity, two sed expressions were used to solve each warnings separately.
To make the regex expression more robust a few other issues were resolved,
namely setting unit-address to lower case, and adding a whitespace before the
the opening curly brace:
https://elinux.org/Device_Tree_Linux#Linux_conventions
This is a follow up to commit 4c9847b737 ("dt-bindings: Remove leading 0x from bindings notation")
Reported-by: David Daney <ddaney@caviumnetworks.com>
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJafyuhAAoJEPfTWPspceCm2zcP/3m0sVBYtKlXRsyB3fiRtYtV
NUcDiXWSNW9jluVRImjNonTGF3mnR7TBRyUStnjANafFtKOx06QNikXTTxAjjZRb
4HSDoFJ8LVW1GPDgi/o/c7dn32ypNd7eZsklqn1yNFbYfayLCPHODu5LZTnq3sRO
K23zgIlbn3G65PtXkrheNImOeU3XnyVF/p0NejF2/9klOEs9Zj2XaG67Opyg+iFd
3jlAOuvL/Wsuj3SKyrrtFB85okPsqV9j+omMJDB2W5uWAjdKYJ76zpxUU/FmAT4m
6CBWuMEGWAhYtsocvPU/tvBa++dX4eR4w+e6A44SqWgt/+e5aC7uhBVqxYlWUxIi
ThcAv6oH2yhrzr1tOIta82l5bAwHKqY2NV1P4/XUaYEY65lVrYowK5yzRoOUgLJR
REr68eLbvrP+IYYn7AOEmrTAwp4oPWKGjSJLEJw3ESacHzjdI+WwdMfaVjcog/tF
SA2w968ZqvOvhL8ZZuQfblMuuMxy2XyAclWswueyHdxJePgZSupj2wcIFo7fJiAR
/iPpUpFvAJccRUmlrnHd4C5HRdJ1gUKdCWOkGJxsktm4DKnJ/8m49rYApT3mdK5/
/s0yzziukCpjbOFCjuhjmLicTRlL/XHDRFyySVdcIQcCMUtlUNbKU3P5vBXRxzAu
ZmRXXNgiRaHJ5O38TKWn
=TGix
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180210' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes to round off the merge window on the block side:
- a set of bcache fixes by way of Michael Lyle, from the usual bcache
suspects.
- add a simple-to-hook-into function for bpf EIO error injection.
- fix blk-wbt that mischarectized flushes as reads. Improve the logic
so that flushes and writes are accounted as writes, and only reads
as reads. From me.
- fix requeue crash in BFQ, from Paolo"
* tag 'for-linus-20180210' of git://git.kernel.dk/linux-block:
block, bfq: add requeue-request hook
bcache: fix for data collapse after re-attaching an attached device
bcache: return attach error when no cache set exist
bcache: set writeback_rate_update_seconds in range [1, 60] seconds
bcache: fix for allocator and register thread race
bcache: set error_limit correctly
bcache: properly set task state in bch_writeback_thread()
bcache: fix high CPU occupancy during journal
bcache: add journal statistic
block: Add should_fail_bio() for bpf error injection
blk-wbt: account flush requests correctly
Mellanox fixes and new system type support.
The following is an automated git shortlog grouped by driver:
mlx-platform:
- Add support for new 200G IB and Ethernet systems
- Add support for new msn201x system type
- Add support for new msn274x system type
- Fix power cable setting for msn21xx family
- Add define for the negative bus
- Use defines for bus assignment
platform/mellanox:
- mlxreg-hotplug: Fix uninitialized variable
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJafyo5AAoJEKbMaAwKp364sxgIAIFi/sRACXrJsowOX7Okg4ID
Rx4BBpynqbOXwxGfHj2Hs5Y1Y9CG6/kCQM3MJfxJmi+Mb8IseucJGVswFmsCfq/U
IWbj7YwSfpeUOJMWGfpdUaX9RDNa8bTYT2gOx/scZXalsjLTWV1AmlZLOLwlEnK8
s26zL7GHiFjBY7BJ0GX/atbBOrAFsjuM2QodmwMEuf8lZJSDOxwg0WSqAZHKsPM3
/2WzeR/2jEX+XlSnwKUN62XsKRF79eZDbBxt+PF/FekAppHnkgLbBfHw7MvhZqg1
bg2W16nqOnNBJNRHdz6gKM8TeWyr537D/uQmHafc2oyYmhUYbGLPr65q5fqZjoE=
=VbPT
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86
Pull x86 platform driver updates from Darren Hart:
"Mellanox fixes and new system type support.
Mostly data for new system types with a correction and an
uninitialized variable fix"
[ Pulling from github because git.infradead.org currently seems to be
down for some reason, but Darren had a backup location - Linus ]
* tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86:
platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systems
platform/x86: mlx-platform: Add support for new msn201x system type
platform/x86: mlx-platform: Add support for new msn274x system type
platform/x86: mlx-platform: Fix power cable setting for msn21xx family
platform/x86: mlx-platform: Add define for the negative bus
platform/x86: mlx-platform: Use defines for bus assignment
platform/mellanox: mlxreg-hotplug: Fix uninitialized variable
Moving cros_ec_dev to drivers/mfd.
Other small maintenance fixes.
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEE6gYDF28Li+nEiKLaHwn1ewov5lgFAlp+l8YUHGJsZXVuZ0Bj
aHJvbWl1bS5vcmcACgkQHwn1ewov5ljXzw/+MAcTYwIBdUyIZXBhA/FMYvAyouyN
0xzi/J7raKGhewG0Art28t+KT5j4T7Rz8Pd6zas6k4lGVfvWN5YBhz0ggMyCn3Lc
6t5aDzp4wG5OSl30Y1uA5x2mZ4uXDSskcgSCng0BooJptY1jxGwmt/97m/hxzDdH
ChiaRab+rhMeV2K2haONHWXGPi472fqPsXOb1uyIoX2quRdt1XbWrCGGmR81Tp/e
uQZC5DYOti1I3EYz+jOecbOkr42YIJyZxGTK6Mtk2NgUAsFNxhpg7vF6LGV39ynr
t4BAZk0qrIY/kP0xHl13+laW3w6eh/Rnrw/mSMaeo1CZ8K0X9NDRKb7hoSPcEkVe
Jzz6h+Gs4K2cds/VgOjjF8l/OzoZ5sIWaBoJf4tZKuEZ5JSIPeXH72k6ufg9/d72
0ArHQunxdAuZYrq1pwizduW/h5nG0nlHI9tGx1dQ9lS5TFl82EWkj6cE5gr8kFfo
FmUOyt0AM4ZYGqF1qgwEDXdSpbKrBA/CKB/K2xkcEl426DZlbO08TZ9bpsmX6pqb
d9ZbN9OvZxpDmb2XIXBPWCs6z/oWEyDTa011IJPkbq7SZ2Gckrt6x5FkgIBQFR7e
whEa9S0EMApEkCWVEF83Hpw/HL9UtLduRAMoxgjd81hJpQkBZ/jrwpU4aZOo3kAn
20AdipHKII3+6V8=
=YEO6
-----END PGP SIGNATURE-----
Merge tag 'chrome-platform-for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform
Pull chrome platform updates from Benson Leung:
- move cros_ec_dev to drivers/mfd
- other small maintenance fixes
[ The cros_ec_dev movement came in earlier through the MFD tree - Linus ]
* tag 'chrome-platform-for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform:
platform/chrome: Use proper protocol transfer function
platform/chrome: cros_ec_lpc: Add support for Google Glimmer
platform/chrome: cros_ec_lpc: Register the driver if ACPI entry is missing.
platform/chrome: cros_ec_lpc: remove redundant pointer request
cros_ec: fix nul-termination for firmware build info
platform/chrome: chromeos_laptop: make chromeos_laptop const