For a while now, we are issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.
This patch moves then to the beginning of the list, and skip testing them.
Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Push TF and RF injection and filtering on guest single-stepping into the
vender get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.
Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Much of so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, for only online cpus.
Backend was not allocating enough structure for all possible CPUs, so
new CPUs coming online could not be hardware enabled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.
Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery. Not only is it
always right, it is also perfectly accurate, as no error prone measurement
is required.
On such machines, when a new CPU online is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch replaces them with native_read_tsc() which can
also be used in expressions and saves a variable on the
stack in this case.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The exit_int_info field is only written by the hardware and
never read. So it does not need to be copied on a vmrun
emulation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch reorganizes the logic in svm_interrupt_allowed to
make it better to read. This is important because the logic
is a lot more complicated with Nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).
Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.
To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.
So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
- Change returned handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate instructions that have already failed
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This adds pusha and popa instructions (opcodes 0x60-0x61), this enables booting
MINIX with invalid guest state emulation on.
[marcelo: remove unused variable]
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add missing decoder flags for or instructions (0xc-0xd).
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now with kvm->irq_lock in place access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.
[avi: build fix on non-x86/ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.
[avi: no PIC on ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduces a new decode option "No64", which is used for instructions that are
invalid in long mode.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We should now use dev_set_drvdata to set the driver driver_data field.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Patchwork: http://patchwork.linux-mips.org/patch/747/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
MIPS: Loongson: Switch from flatmem to sparsemem
MIPS: Loongson: Disallow 4kB pages
MIPS: Add missing definition for MADV_HWPOISON.
MIPS: Fix build error if __xchg() is not getting inlined.
MIPS: IP22/IP28 Disable early printk to fix boot problems on some systems.
Currently, with PAGE_SIZE_4KB, the kernel for loongson will hang on:
Kernel panic - not syncing: Attempted to kill init!
The possible reason is the cache aliases problem:
Loongson 2F has 64kb, 4 way L1 Cache, the way size is 16kb, which is bigger
then 4kb. so, If using 4kb page size, there is cache aliases problem.
To avoid this kind of problem, extra cache flushing. The 2nd possible
solution is 16kb page size which avoids cache aliases without the need for
extra cache flushes. So we disable 4kB pages until the aliasing issue is
solved.
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
Patchwork: http://patchwork.linux-mips.org/patch/736/
Cc: linux-mips@linux-mips.org
Cc: zhangfx@lemote.com
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Thanks to Joseph S. Myers for reporting this.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: "Joseph S. Myers" <joseph@codesourcery.com>
Patchwork: http://patchwork.linux-mips.org/patch/723/
If __xchg() is not getting inlined the outline version of the function
will have a reference to __xchg_called_with_bad_pointer() which does not
exist remaining. Fixed by using BUILD_BUG_ON() to check for allowable
operand sizes.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Patchwork: http://patchwork.linux-mips.org/patch/705/
Some Debian users have reported that the kernel hangs early during boot on
some IP22 systems. Thomas Bogendoerfer found that this is due to a "bad
interaction between CONFIG_EARLY_PRINTK and overwritten prom memory during
early boot". Since there's no fix yet, disable CONFIG_EARLY_PRINTK for now.
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Cc: linux-mips@linux-mips.org
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Patchwork: http://patchwork.linux-mips.org/patch/702/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6:
alpha: Fixup last users of irq_chip->typename
Alpha: Rearrange thread info flags fixing two regressions
arch/alpha/kernel: Add kmalloc NULL tests
arch/alpha/kernel/sys_ruffian.c: Use DIV_ROUND_CLOSEST
The typename member of struct irq_chip was kept for migration purposes
and is obsolete since more than 2 years. Fix up the leftovers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: linux-alpha@vger.kernel.org
Signed-off-by: Matt Turner <mattst88@gmail.com>
The removal of the TIF_NOTIFY_RESUME flag, commit a583f1b542
"remove unused TIF_NOTIFY_RESUME flag," resulted in incorrect
setting of the unaligned access control flags by the prctl syscall.
The re-addition of the TIF_NOTIFY_RESUME flag, commit d0420c83f3
"KEYS: Extend TIF_NOTIFY_RESUME to (almost) all architectures [try #6]"
further caused problems, namely incorrect operands to assembler code
as evidenced by:
AS arch/alpha/kernel/entry.o
arch/alpha/kernel/entry.S: Assembler messages:
arch/alpha/kernel/entry.S:326: Warning: operand out of range
(0x0000000000000406 is not between 0x0000000000000000 and
0x00000000000000ff)
Both regressions fixed by (1) rearranging TIF_NOTIFY_RESUME flag to be
in lower 8 bits of the thread info flags, and (2) making sure that
ALPHA_UAC_SHIFT matches the rearrangement of the thread info flags.
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: David Howells <dhowells@redhat.com>,
Signed-off-by: Matt Turner <mattst88@gmail.com>
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] Update mach-types
ARM: 5793/1: ARM: Check put_user fail in do_signal when enable OABI_COMPAT
MAINTAINERS: add maintainer information for AMBA primecell drivers
[ARM] pxa/spitz: fix compile regression on spitz
ARM: PNX4008: i2c-pnx: use the same dev_id for request_irq and free_irq
[ARM] pxa/cpufreq: fix index assignments for end marker
ARM: PNX4008: fix watchdog device driver name
[ARM] kmap: fix build errors with DEBUG_HIGHMEM enabled
Code was added to mm/higmem.c that depends on several
kmap types that powerpc does not support. We add dummy
invalid definitions for KM_NMI, KM_NM_PTE, and KM_IRQ_PTE.
According to list discussion, this fix should not be needed
anymore starting with 2.6.33. The code is commented to this
effect so hopefully we will remember to remove this.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sunsu: Use sunserial_console_termios() in sunsu_console_setup().
sunsu: Pass true 'ignore_line' to console match when RSC or LOM console.
serial: suncore: Fix RSC/LOM handling in sunserial_console_termios().
serial: suncore: Add 'ignore_line' argument to sunserial_console_match().
sunsu: Fix detection of SU ports which are RSC console or control.
sunsab: Do not set sunsab_reg.cons right before registering minors.
sparc64: Fix definition of VMEMMAP_SIZE.
Check that the result of kmalloc is not NULL before passing it to other
functions.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression *x;
identifier f;
constant char *C;
@@
x = \(kmalloc\|kcalloc\|kzalloc\)(...);
... when != x == NULL
when != x != NULL
when != (x || ...)
(
kfree(x)
f(...,C,...,x,...)
|
*f(...,x,...)
|
*x->f
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
The kernel.h macro DIV_ROUND_CLOSEST performs the computation (x + d/2)/d
but is perhaps more readable.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@haskernel@
@@
@depends on haskernel@
expression x,__divisor;
@@
- (((x) + ((__divisor) / 2)) / (__divisor))
+ DIV_ROUND_CLOSEST(x,__divisor)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
kernel unwinding is broken with gcc >= 4.x. Part of the problem is that
binutils seems very sensitive to where the unwind information is stored.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>