linux

Author	SHA1	Message	Date
Izik Eidus	80b14b5b32	KVM: Unmap kernel-allocated memory on slot destruction kvm_vm_ioctl_set_memory_region() is able to remove memory in addition to adding it. Therefore when using kernel swapping support for old userspaces, we need to munmap the memory if the user requests to remove it. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:55 +02:00
Avi Kivity	60395224d9	KVM: Add a might_sleep() annotation to gfn_to_page() This will help trap accesses to guest memory in atomic context. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:55 +02:00
Avi Kivity	e00c8cf29b	KVM: Move vmx_vcpu_reset() out of vmx_vcpu_setup() Split guest reset code out of vmx_vcpu_setup(). Besides being cleaner, this moves the realmode tss setup (which can sleep) outside vmx_vcpu_setup() (which is executed with preemption enabled). [izik: remove unused variable] Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:55 +02:00
Zhang Xiantao	34c16eecf7	KVM: Portability: Split kvm_vcpu into arch dependent and independent parts (part 1) First step to split kvm_vcpu. Currently, we just use a macro to define the common fields in kvm_vcpu for all archs, and each arch needs to define its own kvm_vcpu struct. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:54 +02:00
Anthony Liguori	8d4e1288eb	KVM: Allocate userspace memory for older userspace Allocate a userspace buffer for older userspaces. Also eliminate phys_mem buffer. The memset() in kvmctl really kills initial memory usage but swapping works even with old userspaces. A side effect is that maximum guest size is reduced for older userspace on i386. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:54 +02:00
Izik Eidus	8a7ae055f3	KVM: MMU: Partial swapping of guest memory This allows guest memory to be swapped. Pages which are currently mapped via shadow page tables are pinned into memory, but all other pages can be freely swapped. The patch makes gfn_to_page() elevate the page's reference count, and introduces kvm_release_page() that pairs with it. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:54 +02:00
Izik Eidus	cea7bb2128	KVM: MMU: Make gfn_to_page() always safe In case the page is not present in the guest memory map, return a dummy page the guest can scribble on. This simplifies error checking in its users. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:54 +02:00
Avi Kivity	3176bc3e59	KVM: Rename KVM_TLB_FLUSH to KVM_REQ_TLB_FLUSH We now have a new namespace, KVM_REQ_*, for bits in vcpu->requests. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:53 +02:00
Avi Kivity	ab6ef34b90	KVM: Move apic timer interrupt backlog processing to common code Beside the obvious goodness of making code more common, this prevents a livelock with the next patch which moves interrupt injection out of the critical section. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:53 +02:00
Carsten Otte	313a3dc75d	KVM: Portability: split kvm_vcpu_ioctl This patch splits kvm_vcpu_ioctl into architecture independent parts, and x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c. Common ioctls for all architectures are: KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT, KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU Note that some PPC chips don't have an FPU, so we might need an #ifdef around KVM_GET/SET_FPU one day. x86 specific ioctls are: KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS An interesting aspect is vcpu_load/vcpu_put. We now have a common vcpu_load/put which does the preemption stuff, and an architecture specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the vmx/svm function defined in kvm_x86_ops. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:52 +02:00
Carsten Otte	043405e100	KVM: Move x86 msr handling to new files x86.[ch] Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:51 +02:00
Izik Eidus	6fc138d227	KVM: Support assigning userspace memory to the guest Instead of having the kernel allocate memory to the guest, let userspace allocate it and pass the address to the kernel. This is required for s390 support, but also enables features like memory sharing and using hugetlbfs backed memory. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:51 +02:00
Mike Day	d77c26fce9	KVM: CodingStyle cleanup Signed-off-by: Mike D. Day <ncmike@ncultra.org> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:50 +02:00
Rusty Russell	76fafa5e22	KVM: Hoist kvm_create_lapic() into kvm_vcpu_init() Move kvm_create_lapic() into kvm_vcpu_init(), rather than having svm and vmx do it. And make it return the error rather than a fairly random -ENOMEM. This also solves the problem that neither svm.c nor vmx.c actually handles the error path properly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:50 +02:00
Rusty Russell	d589444e92	KVM: Add kvm_free_lapic() to pair with kvm_create_lapic() Instead of the asymmetry of kvm_free_apic, implement kvm_free_lapic(). And guess what? I found a minor bug: we don't need to hrtimer_cancel() from kvm_main.c, because we do that in kvm_free_apic(). Also: 1) kvm_vcpu_uninit should be the reverse order from kvm_vcpu_init. 2) Don't set apic->regs_page to zero before freeing apic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:50 +02:00
Izik Eidus	82ce2c9683	KVM: Allow dynamic allocation of the mmu shadow cache size The user is now able to set how many mmu pages will be allocated to the guest. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:50 +02:00
Izik Eidus	195aefde9c	KVM: Add general accessors to read and write guest memory Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:50 +02:00
Izik Eidus	290fc38da8	KVM: Remove the usage of page->private field by rmap When kvm uses user-allocated pages in the future for the guest, we won't be able to use page->private for rmap, since page->rmap is reserved for the filesystem. So we move the rmap base pointers to the memory slot. A side effect of this is that we need to store the gfn of each gpte in the shadow pages, since the memory slot is addressed by gfn, instead of hfn like struct page. Signed-off-by: Izik Eidus <izik@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:50 +02:00
Laurent Vivier	a22436b7b8	KVM: Purify x86_decode_insn() error case management The only valid case is on protected page access, other cases are errors. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:49 +02:00
Ryan Harper	217648638c	KVM: MMU: Ignore reserved bits in cr3 in non-pae mode This patch removes the fault injected when the guest attempts to set reserved bits in cr3. X86 hardware doesn't generate a fault when setting reserved bits. The result of this patch is that vmware-server, running within a kvm guest, boots and runs memtest from an iso. Signed-off-by: Ryan Harper <ryanh@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:48 +02:00
Avi Kivity	c7addb9020	KVM: Allow not-present guest page faults to bypass kvm There are two classes of page faults trapped by kvm: - host page faults, where the fault is needed to allow kvm to install the shadow pte or update the guest accessed and dirty bits - guest page faults, where the guest has faulted and kvm simply injects the fault back into the guest to handle The second class, guest page faults, is pure overhead. We can eliminate some of it on vmx using the following evil trick: - when we set up a shadow page table entry, if the corresponding guest pte is not present, set up the shadow pte as not present - if the guest pte _is_ present, mark the shadow pte as present but also set one of the reserved bits in the shadow pte - tell the vmx hardware not to trap faults which have the present bit clear With this, normal page-not-present faults go directly to the guest, bypassing kvm entirely. Unfortunately, this trick only works on Intel hardware, as AMD lacks a way to discriminate among page faults based on error code. It is also a little risky since it uses reserved bits which might become unreserved in the future, so a module parameter is provided to disable it. Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:48 +02:00
Laurent Vivier	3427318fd2	KVM: Call x86_decode_insn() only when needed Move emulate_ctxt to kvm_vcpu to keep emulate context when we exit from kvm module. Call x86_decode_insn() only when needed. Modify x86_emulate_insn() to not modify the context if it must be re-entered. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:47 +02:00
Laurent Vivier	1be3aa4718	KVM: emulate_instruction() calls now x86_decode_insn() and x86_emulate_insn() emulate_instruction() calls now x86_decode_insn() and x86_emulate_insn(). x86_emulate_insn() is x86_emulate_memop() without the decoding part. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:47 +02:00
Anthony Liguori	7aa81cc047	KVM: Refactor hypercall infrastructure (v3) This patch refactors the current hypercall infrastructure to better support live migration and SMP. It eliminates the hypercall page by trapping the UD exception that would occur if you used the wrong hypercall instruction for the underlying architecture and replacing it with the right one lazily. A fall-out of this patch is that the unhandled hypercalls no longer trap to userspace. There is very little reason though to use a hypercall to communicate with userspace as PIO or MMIO can be used. There is no code in tree that uses userspace hypercalls. [avi: fix #ud injection on vmx] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-01-30 17:52:46 +02:00
Kay Sievers	af5ca3f4ec	Driver core: change sysdev classes to use dynamic kobject names All kobjects require a dynamically allocated name now. We no longer need to keep track if the name is statically assigned, we can just unconditionally free() all kobject names on cleanup. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:40 -08:00
Amit Shah	404fb881b8	KVM: SVM: Fix FPU leak while emulating clts The clts code didn't use set_cr0 properly, so our lazy FPU processing wasn't being done by the clts instruction at all. (this isn't called on Intel as the hardware does the decode for us) Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-11-27 15:38:18 +02:00
Laurent Vivier	49d3bd7e2b	KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs() In kvm_flush_remote_tlbs(), replace a loop using smp_call_function_single() by a single call to smp_call_function_mask() (which is new for x86_64). Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-22 17:21:54 +02:00
Laurent Vivier	0552f73b9a	KVM: Move kvm_guest_exit() after local_irq_enable() We need to make sure that the timer interrupt happens before we clear PF_VCPU, so the accounting code actually sees guest mode. http://lkml.org/lkml/2007/10/15/114 Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-22 12:03:29 +02:00
Laurent Vivier	d172fcd3ae	sched: guest CPU accounting: maintain guest state in KVM Modify KVM to update guest time accounting. [ mingo@elte.hu: ported to 2.6.24 KVM. ] Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Acked-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-15 17:00:19 +02:00
Avi Kivity	0967b7bf1c	KVM: Skip pio instruction when it is emulated, not executed If we defer updating rip until pio instructions are executed, we have a problem with reset: a pio reset updates rip, and when the instruction completes we skip the emulated instruction, pointing rip somewhere completely unrelated. Fix by updating rip when we decode the instruction, not after emulation. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:29 +02:00
Avi Kivity	054b136967	KVM: Improve emulation failure reporting Report failed opcodes from all locations. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:28 +02:00
Avi Kivity	04d2cc7780	KVM: Move main vcpu loop into subarch independent code This simplifies adding new code as well as reducing overall code size. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:28 +02:00
Christian Ehrhardt	cbdd1bea2a	KVM: Rename kvm_arch_ops to kvm_x86_ops This patch just renames the current (misnamed) _arch namings to _x86 to ensure better readability when a real arch layer takes place. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:27 +02:00
Laurent Vivier	0d8d2bd4f2	KVM: Simplify memory allocation The mutex->spinlock conversion allows us to make some code simplifications. As we can keep the lock longer, we don't have to release it and then have to check if the environment has not been modified before re-taking it. We can remove kvm->busy and kvm->memory_config_version. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:27 +02:00
Rusty Russell	1747fb71fd	KVM: Hoist SVM's get_cs_db_l_bits into core code. SVM gets the DB and L bits for the cs by decoding the segment. This is in fact the completely generic code, so hoist it for kvm-lite to use. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:27 +02:00
Rusty Russell	81f50e3bfd	KVM: Keep control regs in sync We don't update the vcpu control registers in various places. We should do so. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:27 +02:00
Amit Shah	380102c8e4	KVM: Set the ET flag in CR0 after initializing FX This was missed when moving stuff around in fbc4f2e Fixes Solaris guests and bug #1773613 Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:27 +02:00
He, Qing	c5ec153402	KVM: enable in-kernel APIC INIT/SIPI handling This patch enables INIT/SIPI handling using in-kernel APIC by introducing a ->mp_state field to emulate the SMP state transition. [avi: remove smp_processor_id() warning] Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Xin Li <xin.b.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:26 +02:00
He, Qing	5cd4f6fd85	KVM: disable tpr/cr8 sync when in-kernel APIC is used Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:26 +02:00
Eddie Dong	1b9778dae7	KVM: Keep track of missed timer irq injections APIC timer IRQ is set every time a certain period expires at host time, but the guest may be descheduled at that time and thus the irq may be overwritten by a later firing. This patch keeps track of the number of fired irqs and decreases it only when the IRQ is injected to the guest or buffered in the APIC. Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:26 +02:00
Eddie Dong	2a8067f17b	KVM: pending irq save/restore Add in kernel irqchip save/restore support for pending vectors. [avi: fix compile warning on i386] [avi: remove printk] Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:26 +02:00
Eddie Dong	96ad2cc613	KVM: in-kernel LAPIC save and restore support This patch adds a new vcpu-based IOCTL to save and restore the local apic registers for a single vcpu. The kernel only copies the apic page as a whole, extraction of registers is left to userspace side. On restore, the APIC timer is restarted from the initial count, this introduces a little delay, but works fine. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
He, Qing	6bf9e962d1	KVM: in-kernel IOAPIC save and restore support This patch adds support for in-kernel ioapic save and restore (to and from userspace). It uses the same get/set_irqchip ioctl as in-kernel PIC. Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
He, Qing	c52fb35a8b	KVM: Bypass irq_pending get/set when using in kernel irqchip vcpu->irq_pending is saved in get/set_sreg IOCTL, but when in-kernel local APIC is used, doing this may occasionally overwrite vcpu->apic to an invalid value, as in the vm restore path. Signed-off-by: Qing He <qing.he@intel.com>	2007-10-13 10:18:25 +02:00
He, Qing	6ceb9d791e	KVM: Add get/set irqchip ioctls for in-kernel PIC live migration support This patch adds two new ioctls to dump and write kernel irqchips for save/restore and live migration. PIC s/r and l/m is implemented in this patch. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
Eddie Dong	9cf98828d1	KVM: Protect in-kernel pio using kvm->lock pio operation and IRQ_LINE kvm_vm_ioctl is not kvm->lock protected. Add lock to same with IOAPIC MMIO operations. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
Eddie Dong	b6958ce44a	KVM: Emulate hlt in the kernel By sleeping in the kernel when hlt is executed, we simplify the in-kernel guest interrupt path considerably. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
Eddie Dong	1fd4f2a5ed	KVM: In-kernel I/O APIC model This allows in-kernel host-side device drivers to raise guest interrupts without going to userspace. [avi: fix level-triggered interrupt redelivery on eoi] [avi: add missing #include] [avi: avoid redelivery of edge-triggered interrupt] [avi: implement polarity] [avi: don't deliver edge-triggered interrupts when unmasking] [avi: fix host oops on invalid guest access] Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
Eddie Dong	97222cc831	KVM: Emulate local APIC in kernel Because lightweight exits (exits which don't involve userspace) are many times faster than heavyweight exits, it makes sense to emulate high usage devices in the kernel. The local APIC is one such device, especially for Windows and for SMP, so we add an APIC model to kvm. It also allows in-kernel host-side drivers to inject interrupts without going through userspace. [compile fix on i386 from Jindrich Makovicka] Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00
Eddie Dong	7017fc3d1a	KVM: Define and use cr8 access functions This patch is to wrap APIC base register and CR8 operation which can provide a unique API for user level irqchip and kernel irqchip. This is a preparation of merging lapic/ioapic patch. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:25 +02:00


197 Commits