linux

Author	SHA1	Message	Date
Scott Wood	f1e89028f0	kvm/ppc/booke: Hold srcu lock when calling gfn functions KVM core expects arch code to acquire the srcu lock when calling gfn_to_memslot and similar functions. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-11 11:10:59 +03:00
Scott Wood	2b6398fcf2	kvm/ppc/booke64: Disable e6500 support The previous patch made 64-bit booke KVM build again, but Altivec support is still not complete, and we can't prevent the guest from turning on Altivec (which can corrupt host state until state save/restore is implemented). Disable e6500 on KVM until this is fixed. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-06-11 11:10:56 +03:00
Lai Jiangshan	505d6421bd	powerpc,kvm: fix imbalance srcu_read_[un]lock() At the point of up_out label in kvmppc_hv_setup_htab_rma(), srcu read lock is still held. We have to release it before return. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alexander Graf <agraf@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: kvm@vger.kernel.org Cc: kvm-ppc@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2013-06-10 13:45:26 -07:00
Paul Mackerras	8e44ddc3f3	powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation This adds the remaining two hypercalls defined by PAPR for manipulating the XICS interrupt controller, H_IPOLL and H_XIRR_X. H_IPOLL returns information about the priority and pending interrupts for a virtual cpu, without changing any state. H_XIRR_X is like H_XIRR in that it reads and acknowledges the highest-priority pending interrupt, but it also returns the timestamp (timebase register value) from when the interrupt was first received by the hypervisor. Currently we just return the current time, since we don't do any software queueing of virtual interrupts inside the XICS emulation code. These hcalls are not currently used by Linux guests, but may be in future. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-01 08:29:27 +10:00
Marc Zyngier	535cf7b3b1	KVM: get rid of $(addprefix ../../../virt/kvm/, ...) in Makefiles As requested by the KVM maintainers, remove the addprefix used to refer to the main KVM code from the arch code, and replace it with a KVM variable that does the same thing. Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Christoffer Dall <cdall@cs.columbia.edu> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Alexander Graf <agraf@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2013-05-19 15:14:00 +03:00
Linus Torvalds	01227a889e	Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...	2013-05-05 14:47:31 -07:00
Linus Torvalds	5a148af669	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc update from Benjamin Herrenschmidt: "The main highlights this time around are: - A pile of addition POWER8 bits and nits, such as updated performance counter support (Michael Ellerman), new branch history buffer support (Anshuman Khandual), base support for the new PCI host bridge when not using the hypervisor (Gavin Shan) and other random related bits and fixes from various contributors. - Some rework of our page table format by Aneesh Kumar which fixes a thing or two and paves the way for THP support. THP itself will not make it this time around however. - More Freescale updates, including Altivec support on the new e6500 cores, new PCI controller support, and a pile of new boards support and updates. - The usual batch of trivial cleanups & fixes" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) powerpc: Fix build error for book3e powerpc: Context switch the new EBB SPRs powerpc: Turn on the EBB H/FSCR bits powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S powerpc: Setup BHRB instructions facility in HFSCR for POWER8 powerpc: Fix interrupt range check on debug exception powerpc: Update tlbie/tlbiel as per ISA doc powerpc: Print page size info during boot powerpc: print both base and actual page size on hash failure powerpc: Fix hpte_decode to use the correct decoding for page sizes powerpc: Decode the pte-lp-encoding bits correctly. powerpc: Use encode avpn where we need only avpn values powerpc: Reduce PTE table memory wastage powerpc: Move the pte free routines from common header powerpc: Reduce the PTE_INDEX_SIZE powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format powerpc: New hugepage directory format powerpc: Don't truncate pgd_index wrongly powerpc: Don't hard code the size of pte page powerpc: Save DAR and DSISR in pt_regs on MCE ...	2013-05-02 10:16:16 -07:00
Paul Mackerras	5975a2e095	KVM: PPC: Book3S: Add API for in-kernel XICS emulation This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-05-02 15:28:36 +02:00
Wei Yongjun	d133b40f2c	kvm/ppc/mpic: fix missing unlock in set_base_addr() Add the missing unlock before return from function set_base_addr() when disables the mapping. Introduced by commit `5df554ad5b` (kvm/ppc/mpic: in-kernel MPIC emulation) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-05-02 15:28:35 +02:00
Scott Wood	ed840ee9c8	kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write These functions do an srcu_dereference without acquiring the srcu lock themselves. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-05-02 15:28:35 +02:00
Scott Wood	1d6f6b7339	kvm/ppc/mpic: remove users This is an unused (no pun intended) leftover from when this code did reference counting. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-05-02 15:28:34 +02:00
Scott Wood	398d87836e	kvm/ppc/mpic: fix mmio region lists when multiple guests used Keeping a linked list of statically defined objects doesn't work very well when we have multiple guests. :-P Switch to an array of constant objects. This fixes a hang when multiple guests are used. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: remove struct list_head from mem_reg] Signed-off-by: Alexander Graf <agraf@suse.de>	2013-05-02 15:28:33 +02:00
Linus Torvalds	20b4fb4852	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...	2013-05-01 17:51:54 -07:00
Linus Torvalds	5d434fcb25	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small code cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits) mm: Convert print_symbol to %pSR gfs2: Convert print_symbol to %pSR m32r: Convert print_symbol to %pSR iostats.txt: add easy-to-find description for field 6 x86 cmpxchg.h: fix wrong comment treewide: Fix typo in printk and comments doc: devicetree: Fix various typos docbook: fix 8250 naming in device-drivers pata_pdc2027x: Fix compiler warning treewide: Fix typo in printks mei: Fix comments in drivers/misc/mei treewide: Fix typos in kernel messages pm44xx: Fix comment for "CONFIG_CPU_IDLE" doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP" mmzone: correct "pags" to "pages" in comment. kernel-parameters: remove outdated 'noresidual' parameter Remove spurious _H suffixes from ifdef comments sound: Remove stray pluses from Kconfig file radio-shark: Fix printk "CONFIG_LED_CLASS" doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE ...	2013-04-30 09:36:50 -07:00
Aneesh Kumar K.V	b1022fbd29	powerpc: Decode the pte-lp-encoding bits correctly. We look at both the segment base page size and actual page size and store the pte-lp-encodings in an array per base page size. We also update all relevant functions to take actual page size argument so that we can use the correct PTE LP encoding in HPTE. This should also get the basic Multiple Page Size per Segment (MPSS) support. This is needed to enable THP on ppc64. [Fixed PR KVM build --BenH] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-04-30 16:00:14 +10:00
Paul Mackerras	8b78645c93	KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state This adds the ability for userspace to save and restore the state of the XICS interrupt presentation controllers (ICPs) via the KVM_GET/SET_ONE_REG interface. Since there is one ICP per vcpu, we simply define a new 64-bit register in the ONE_REG space for the ICP state. The state includes the CPU priority setting, the pending IPI priority, and the priority and source number of any pending external interrupt. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:34 +02:00
Paul Mackerras	d19bd86204	KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls This adds support for the ibm,int-on and ibm,int-off RTAS calls to the in-kernel XICS emulation and corrects the handling of the saved priority by the ibm,set-xive RTAS call. With this, ibm,int-off sets the specified interrupt's priority in its saved_priority field and sets the priority to 0xff (the least favoured value). ibm,int-on restores the saved_priority to the priority field, and ibm,set-xive sets both the priority and the saved_priority to the specified priority value. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:33 +02:00
Paul Mackerras	4619ac88b7	KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts This streamlines our handling of external interrupts that come in while we're in the guest. First, when waking up a hardware thread that was napping, we split off the "napping due to H_CEDE" case earlier, and use the code that handles an external interrupt (0x500) in the guest to handle that too. Secondly, the code that handles those external interrupts now checks if any other thread is exiting to the host before bouncing an external interrupt to the guest, and also checks that there is actually an external interrupt pending for the guest before setting the LPCR MER bit (mediated external request). This also makes sure that we clear the "ceded" flag when we handle a wakeup from cede in real mode, and fixes a potential infinite loop in kvmppc_run_vcpu() which can occur if we ever end up with the ceded flag set but MSR[EE] off. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:32 +02:00
Benjamin Herrenschmidt	e7d26f285b	KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation This adds an implementation of the XICS hypercalls in real mode for HV KVM, which allows us to avoid exiting the guest MMU context on all threads for a variety of operations such as fetching a pending interrupt, EOI of messages, IPIs, etc. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:32 +02:00
Benjamin Herrenschmidt	54695c3088	KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM Currently, we wake up a CPU by sending a host IPI with smp_send_reschedule() to thread 0 of that core, which will take all threads out of the guest, and cause them to re-evaluate their interrupt status on the way back in. This adds a mechanism to differentiate real host IPIs from IPIs sent by KVM for guest threads to poke each other, in order to target the guest threads precisely when possible and avoid that global switch of the core to host state. We then use this new facility in the in-kernel XICS code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:31 +02:00
Benjamin Herrenschmidt	bc5ad3f370	KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller This adds in-kernel emulation of the XICS (eXternal Interrupt Controller Specification) interrupt controller specified by PAPR, for both HV and PR KVM guests. The XICS emulation supports up to 1048560 interrupt sources. Interrupt source numbers below 16 are reserved; 0 is used to mean no interrupt and 2 is used for IPIs. Internally these are represented in blocks of 1024, called ICS (interrupt controller source) entities, but that is not visible to userspace. Each vcpu gets one ICP (interrupt controller presentation) entity, used to store the per-vcpu state such as vcpu priority, pending interrupt state, IPI request, etc. This does not include any API or any way to connect vcpus to their ICP state; that will be added in later patches. This is based on an initial implementation by Michael Ellerman <michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and Paul Mackerras. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix typo, add dependency on !KVM_MPIC] Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:30 +02:00
Michael Ellerman	8e591cb720	KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls For pseries machine emulation, in order to move the interrupt controller code to the kernel, we need to intercept some RTAS calls in the kernel itself. This adds an infrastructure to allow in-kernel handlers to be registered for RTAS services by name. A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to associate token values with those service names. Then, when the guest requests an RTAS service with one of those token values, it will be handled by the relevant in-kernel handler rather than being passed up to userspace as at present. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix warning] Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:29 +02:00
Scott Wood	91194919a6	kvm/ppc/mpic: Eliminate mmio_mapped We no longer need to keep track of this now that MPIC destruction always happens either during VM destruction (after MMIO has been destroyed) or during a failed creation (before the fd has been exposed to userspace, and thus before the MMIO region could have been registered). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:28 +02:00
Scott Wood	07f0a7bdec	kvm: destroy emulated devices on VM exit The hassle of getting refcounting right was greater than the hassle of keeping a list of devices to destroy on VM exit. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:28 +02:00
Alexander Graf	447a03c02a	KVM: PPC: MPIC: Restrict to e500 platforms The code as is doesn't make any sense on non-e500 platforms. Restrict it there, so that people don't get wrong ideas on what would actually work. This patch should get reverted as soon as it's possible to either run e500 guests on non-e500 hosts or the MPIC emulation gains support for non-e500 modes. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:26 +02:00
Alexander Graf	5efdb4be59	KVM: PPC: MPIC: Add support for KVM_IRQ_LINE Now that all pieces are in place for reusing generic irq infrastructure, we can copy x86's implementation of KVM_IRQ_LINE irq injection and simply reuse it for PPC, as it will work there just as well. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:25 +02:00
Alexander Graf	de9ba2f363	KVM: PPC: Support irq routing and irqfd for in-kernel MPIC Now that all the irq routing and irqfd pieces are generic, we can expose real irqchip support to all of KVM's internal helpers. This allows us to use irqfd with the in-kernel MPIC. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:25 +02:00
Scott Wood	eb1e4f43e0	kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC Enabling this capability connects the vcpu to the designated in-kernel MPIC. Using explicit connections between vcpus and irqchips allows for flexibility, but the main benefit at the moment is that it simplifies the code -- KVM doesn't need vm-global state to remember which MPIC object is associated with this vm, and it doesn't need to care about ordering between irqchip creation and vcpu creation. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu] Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:24 +02:00
Scott Wood	5df554ad5b	kvm/ppc/mpic: in-kernel MPIC emulation Hook the MPIC code up to the KVM interfaces, add locking, etc. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit] Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:23 +02:00
Scott Wood	f0f5c481a9	kvm/ppc/mpic: adapt to kernel style and environment Remove braces that Linux style doesn't permit, remove space after '*' that Lindent added, keep error/debug strings contiguous, etc. Substitute type names, debug prints, etc. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:22 +02:00
Scott Wood	6dd830a09a	kvm/ppc/mpic: remove some obviously unneeded code Remove some parts of the code that are obviously QEMU or Raven specific before fixing style issues, to reduce the style issues that need to be fixed. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:22 +02:00
Scott Wood	b823f98f89	kvm/ppc/mpic: import hw/openpic.c from QEMU This is QEMU's hw/openpic.c from commit abd8d4a4d6dfea7ddea72f095f993e1de941614e ("Update version for 1.4.0-rc0"), run through Lindent with no other changes to ease merging future changes between Linux and QEMU. Remaining style issues (including those introduced by Lindent) will be fixed in a later patch. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:20 +02:00
Paul Mackerras	c35635efdc	KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty map At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications done by the host to the virtual processor areas (VPAs) and dispatch trace logs (DTLs) registered by the guest. This is because those modifications are done either in real mode or in the host kernel context, and in neither case does the access go through the guest's HPT, and thus no change (C) bit gets set in the guest's HPT. However, the changes done by the host do need to be tracked so that the modified pages get transferred when doing live migration. In order to track these modifications, this adds a dirty flag to the struct representing the VPA/DTL areas, and arranges to set the flag when the VPA/DTL gets modified by the host. Then, when we are collecting the dirty log, we also check the dirty flags for the VPA and DTL for each vcpu and set the relevant bit in the dirty log if necessary. Doing this also means we now need to keep track of the guest physical address of the VPA/DTL areas. So as not to lose track of modifications to a VPA/DTL area when it gets unregistered, or when a new area gets registered in its place, we need to transfer the dirty state to the rmap chain. This adds code to kvmppc_unpin_guest_page() to do that if the area was dirty. To simplify that code, we now require that all VPA, DTL and SLB shadow buffer areas fit within a single host page. Guests already comply with this requirement because pHyp requires that these areas not cross a 4k boundary. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:13 +02:00
Paul Mackerras	a1b4a0f606	KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes At present, the code that determines whether a HPT entry has changed, and thus needs to be sent to userspace when it is copying the HPT, doesn't consider a hardware update to the reference and change bits (R and C) in the HPT entries to constitute a change that needs to be sent to userspace. This adds code to check for changes in R and C when we are scanning the HPT to find changed entries, and adds code to set the changed flag for the HPTE when we update the R and C bits in the guest view of the HPTE. Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification() function into kvm_book3s_64.h. Current Linux guest kernels don't use the hardware updates of R and C in the HPT, so this change won't affect them. Linux (or other) kernels might in future want to use the R and C bits and have them correctly transferred across when a guest is migrated, so it is better to correct this deficiency. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:12 +02:00
Mihai Caraman	d9ce6041b3	KVM: PPC: e500: Add e6500 core to Kconfig description Add e6500 core to Kconfig description. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:11 +02:00
Mihai Caraman	ea17a971c2	KVM: PPC: e500mc: Enable e6500 cores Extend processor compatibility names to e6500 cores. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:10 +02:00
Mihai Caraman	5b21501045	KVM: PPC: e500: Remove E.PT and E.HV.LRAT categories from VCPUs Embedded.Page Table (E.PT) category is not supported yet in e6500 kernel. Configure TLBnCFG to remove E.PT and E.HV.LRAT categories from VCPUs. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:09 +02:00
Mihai Caraman	9a6061d7fd	KVM: PPC: e500: Add support for EPTCFG register EPTCFG register defined by E.PT is accessed unconditionally by Linux guests in the presence of MAV 2.0. Emulate it now. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:08 +02:00
Mihai Caraman	307d9008ed	KVM: PPC: e500: Add support for TLBnPS registers Add support for TLBnPS registers available in MMU Architecture Version (MAV) 2.0. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:07 +02:00
Mihai Caraman	8893a188b1	KVM: PPC: e500: Move vcpu's MMU configuration to dedicated functions Vcpu's MMU default configuration and geometry update logic was buried in a chunk of code. Move them to dedicated functions to add more clarity. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:07 +02:00
Mihai Caraman	a85d2aa23e	KVM: PPC: e500: Expose MMU registers via ONE_REG MMU registers were exposed to user-space using sregs interface. Add them to ONE_REG interface using kvmppc_get_one_reg/kvmppc_set_one_reg delegation mechanism. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:06 +02:00
Mihai Caraman	35b299e279	KVM: PPC: Book3E: Refactor ONE_REG ioctl implementation Refactor Book3E ONE_REG ioctl implementation to use kvmppc_get_one_reg/ kvmppc_set_one_reg delegation interface introduced by Book3S. This is necessary for MMU SPRs which are platform specifics. Get rid of useless case braces in the process. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:05 +02:00
Bharat Bhushan	9b4f530807	booke: exit to user space if emulator request This allows the exit to user space if emulator request by returning EMULATE_EXIT_USER. This will be used in subsequent patches in list Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:04 +02:00
Bharat Bhushan	0f47f9b517	KVM: extend EMULATE_EXIT_USER to support different exit reasons Currently the instruction emulator code returns EMULATE_EXIT_USER and common code initializes the "run->exit_reason = .." and "vcpu->arch.hcall_needed = .." with one fixed reason. But there can be different reasons when emulator need to exit to user space. To support that the "run->exit_reason = .." and "vcpu->arch.hcall_needed = .." initialization is moved a level up to emulator. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:03 +02:00
Bharat Bhushan	c402a3f457	Rename EMULATE_DO_PAPR to EMULATE_EXIT_USER Instruction emulation return EMULATE_DO_PAPR when it requires exit to userspace on book3s. Similar return is required for booke. EMULATE_DO_PAPR reads out to be confusing so it is renamed to EMULATE_EXIT_USER. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:03 +02:00
Bharat Bhushan	092d62ee93	KVM: PPC: debug stub interface parameter defined This patch defines the interface parameter for KVM_SET_GUEST_DEBUG ioctl support. Follow up patches will use this for setting up hardware breakpoints, watchpoints and software breakpoints. Also kvm_arch_vcpu_ioctl_set_guest_debug() is brought one level below. This is because I am not sure what is required for book3s. So this ioctl behaviour will not change for book3s. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-26 20:27:02 +02:00
Benjamin Herrenschmidt	234d15def9	Merge remote-tracking branch 'origin/master' into next Merge upstream to get the audit fixes	2013-04-24 14:43:36 +10:00
Paul Mackerras	3cc33d50f5	powerpc: Fix build errors with UP configs in HV-style KVM This fixes these errors when building UP with CONFIG_KVM_BOOK3S_64_HV=y: arch/powerpc/kvm/book3s_hv.c:1855:2: error: implicit declaration of function 'inhibit_secondary_onlining' [-Werror=implicit-function-declaration] arch/powerpc/kvm/book3s_hv.c:1862:2: error: implicit declaration of function 'uninhibit_secondary_onlining' [-Werror=implicit-function-declaration] cc1: all warnings being treated as errors and this error (with CONFIG_KVM_BOOK3S_64=m, or a vmlinux link error with CONFIG_KVM_BOOK3S_64=y): ERROR: "smp_send_reschedule" [arch/powerpc/kvm/kvm.ko] undefined! make[2]: *** [__modpost] Error 1 The fix for the link error is suboptimal; ideally we want a self_ipi() function from irq.c, connected at least to the MPIC code, to initiate an IPI to this cpu. The fix here at least lets the code build, and it will work, just with interrupts being delayed sometimes. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>	2013-04-18 13:03:57 +10:00
Zhang Yanfei	c843be8a54	powerpc: remove cast for kmalloc/kzalloc return value remove cast for kmalloc/kzalloc return value. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <michael@ellerman.id.au>	2013-04-18 13:03:56 +10:00
Scott Wood	be28a27c99	kvm/ppc: don't call complete_mmio_load when it's a store complete_mmio_load writes back the mmio result into the destination register. Doing this on a store results in register corruption. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-17 15:21:16 +02:00
Stuart Yoder	c32498ee64	KVM: PPC: emulate dcbst Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-17 15:21:15 +02:00
Bharat Bhushan	8c32a2ea65	Added ONE_REG interface for debug instruction This patch adds the one_reg interface to get the special instruction to be used for setting software breakpoint from userspace. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-17 15:21:14 +02:00
Scott Wood	4d2be6f7c7	kvm/ppc/e500: eliminate tlb_refs Commit `523f0e5421` ("KVM: PPC: E500: Explicitly mark shadow maps invalid") began using E500_TLB_VALID for guest TLB1 entries, and skipping invalidations if it's not set. However, when E500_TLB_VALID was set for such entries, it was on a fake local ref, and so the invalidations never happen. gtlb_privs is documented as being only for guest TLB0, though we already violate that with E500_TLB_BITMAP. Now that we have MMU notifiers, and thus don't need to actually retain a reference to the mapped pages, get rid of tlb_refs, and use gtlb_privs for E500_TLB_VALID in TLB1. Since we can have more than one host TLB entry for a given tlbe_ref, be careful not to clear existing flags that are relevant to other host TLB entries when preparing a new host TLB entry. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-11 15:53:43 +02:00
Scott Wood	66a5fecdcc	kvm/ppc/e500: g2h_tlb1_map: clear old bit before setting new bit It's possible that we're using the same host TLB1 slot to map (a presumably different portion of) the same guest TLB1 entry. Clear the bit in the map before setting it, so that if the esels are the same the bit will remain set. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-11 15:53:38 +02:00
Scott Wood	6b2ba1a912	kvm/ppc/e500: h2g_tlb1_rmap: esel 0 is valid Add one to esel values in h2g_tlb1_rmap, so that "no mapping" can be distinguished from "esel 0". Note that we're not saved by the fact that host esel 0 is reserved for non-KVM use, because KVM host esel numbering is not the raw host numbering (see to_htlb1_esel). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-11 15:53:34 +02:00
Scott Wood	c5e6cb051c	kvm/powerpc/e500mc: fix tlb invalidation on cpu migration The existing check handles the case where we've migrated to a different core than we last ran on, but it doesn't handle the case where we're still on the same cpu we last ran on, but some other vcpu has run on this cpu in the meantime. Without this, guest segfaults (and other misbehavior) have been seen in smp guests. Cc: stable@vger.kernel.org # 3.8.x Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-04-11 00:06:39 +02:00
Al Viro	75ef9de126	constify a bunch of struct file_operations instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-09 14:16:20 -04:00
Paul Mackerras	4fe27d2add	KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external Currently kvmppc_core_dequeue_external() takes a struct kvm_interrupt * argument and does nothing with it, in any of its implementations. This removes it in order to make things easier for forthcoming in-kernel interrupt controller emulation code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-03-22 01:21:17 +01:00
Scott Wood	47bf379742	kvm/ppc/e500: eliminate tlb_refs Commit `523f0e5421` ("KVM: PPC: E500: Explicitly mark shadow maps invalid") began using E500_TLB_VALID for guest TLB1 entries, and skipping invalidations if it's not set. However, when E500_TLB_VALID was set for such entries, it was on a fake local ref, and so the invalidations never happen. gtlb_privs is documented as being only for guest TLB0, though we already violate that with E500_TLB_BITMAP. Now that we have MMU notifiers, and thus don't need to actually retain a reference to the mapped pages, get rid of tlb_refs, and use gtlb_privs for E500_TLB_VALID in TLB1. Since we can have more than one host TLB entry for a given tlbe_ref, be careful not to clear existing flags that are relevant to other host TLB entries when preparing a new host TLB entry. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-03-22 01:21:15 +01:00
Scott Wood	36ada4f431	kvm/ppc/e500: g2h_tlb1_map: clear old bit before setting new bit It's possible that we're using the same host TLB1 slot to map (a presumably different portion of) the same guest TLB1 entry. Clear the bit in the map before setting it, so that if the esels are the same the bit will remain set. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-03-22 01:21:13 +01:00
Scott Wood	d6940b6416	kvm/ppc/e500: h2g_tlb1_rmap: esel 0 is valid Add one to esel values in h2g_tlb1_rmap, so that "no mapping" can be distinguished from "esel 0". Note that we're not saved by the fact that host esel 0 is reserved for non-KVM use, because KVM host esel numbering is not the raw host numbering (see to_htlb1_esel). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-03-22 01:21:11 +01:00
Bharat Bhushan	15b708beee	KVM: PPC: booke: Added debug handler Installed debug handler will be used for guest debug support and debug facility emulation features (patches for these features will follow this patch). Signed-off-by: Liu Yu <yu.liu@freescale.com> [bharat.bhushan@freescale.com: Substantial changes] Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-03-22 01:21:09 +01:00
Bharat Bhushan	78accda4f8	KVM: PPC: Added one_reg interface for timer registers If userspace wants to change some specific bits of TSR (timer status register) then it uses GET/SET_SREGS ioctl interface. So the steps will be: i) user-space will make get ioctl, ii) change TSR in userspace iii) then make set ioctl. It can happen that TSR gets changed by kernel after step i) and before step iii). To avoid this we have added below one_reg ioctls for oring and clearing specific bits in TSR. This patch adds one registerface for: 1) setting specific bit in TSR (timer status register) 2) clearing specific bit in TSR (timer status register) 3) setting/getting the TCR register. There are cases where we want to only change TCR and not TSR. Although we can uses SREGS without KVM_SREGS_E_UPDATE_TSR flag but I think one reg is better. I am open if someone feels we should use SREGS only here. 4) getting/setting TSR register Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-03-22 01:21:06 +01:00
Bharat Bhushan	d26f22c9cd	KVM: PPC: move tsr update in a separate function This is done so that same function can be called from SREGS and ONE_REG interface (follow up patch). Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-03-22 01:21:05 +01:00
Marcelo Tosatti	2ae33b3896	Merge remote-tracking branch 'upstream/master' into queue Merge reason: From: Alexander Graf <agraf@suse.de> "Just recently this really important patch got pulled into Linus' tree for 3.9: commit `1674400aae` Author: Anton Blanchard <anton <at> samba.org> Date: Tue Mar 12 01:51:51 2013 +0000 Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue. Could you please merge kvm/next against linus/master, so that I can base my trees against that?" * upstream/master: (653 commits) PCI: Use ROM images from firmware only if no other ROM source available sparc: remove unused "config BITS" sparc: delete "if !ULTRA_HAS_POPULATION_COUNT" KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798) KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED inet: limit length of fragment queue hash table bucket lists qeth: Fix scatter-gather regression qeth: Fix invalid router settings handling qeth: delay feature trace sgy-cts1000: Remove __dev* attributes KVM: x86: fix deadlock in clock-in-progress request handling KVM: allow host header to be included even for !CONFIG_KVM hwmon: (lm75) Fix tcn75 prefix hwmon: (lm75.h) Update header inclusion MAINTAINERS: Remove Mark M. Hoffman xfs: ensure we capture IO errors correctly xfs: fix xfs_iomap_eof_prealloc_initial_size type ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-03-21 11:11:52 -03:00
Zhang Yanfei	6e51c9ff6a	powerpc: remove cast for kmalloc/kzalloc return value remove cast for kmalloc/kzalloc return value. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-03-18 14:16:00 +01:00
Aneesh Kumar K.V	af81d7878c	powerpc: Rename USER_ESID_BITS* to ESID_BITS* Now we use ESID_BITS of kernel address to build proto vsid. So rename USER_ESIT_BITS to ESID_BITS Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.8]	2013-03-17 12:45:44 +11:00
Takuya Yoshikawa	8482644aea	KVM: set_memory_region: Refactor commit_memory_region() This patch makes the parameter old a const pointer to the old memory slot and adds a new parameter named change to know the change being requested: the former is for removing extra copying and the latter is for cleaning up the code. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-03-04 20:21:08 -03:00
Takuya Yoshikawa	7b6195a91d	KVM: set_memory_region: Refactor prepare_memory_region() This patch drops the parameter old, a copy of the old memory slot, and adds a new parameter named change to know the change being requested. This not only cleans up the code but also removes extra copying of the memory slot structure. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-03-04 20:21:08 -03:00
Takuya Yoshikawa	462fce4606	KVM: set_memory_region: Drop user_alloc from prepare/commit_memory_region() X86 does not use this any more. The remaining user, s390's !user_alloc check, can be simply removed since KVM_SET_MEMORY_REGION ioctl is no longer supported. Note: fixed powerpc's indentations with spaces to suppress checkpatch errors. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2013-03-04 20:21:08 -03:00
Sasha Levin	b67bfe0d42	hlist: drop the node parameter from iterators I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S \| hlist_for_each_entry_continue(a, - b, c) S \| hlist_for_each_entry_from(a, - b, c) S \| hlist_for_each_entry_rcu(a, - b, c, d) S \| hlist_for_each_entry_rcu_bh(a, - b, c, d) S \| hlist_for_each_entry_continue_rcu_bh(a, - b, c) S \| for_each_busy_worker(a, c, - b, d) S \| ax25_uid_for_each(a, - b, c) S \| ax25_for_each(a, - b, c) S \| inet_bind_bucket_for_each(a, - b, c) S \| sctp_for_each_hentry(a, - b, c) S \| sk_for_each(a, - b, c) S \| sk_for_each_rcu(a, - b, c) S \| sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S \| sk_for_each_safe(a, - b, c, d) S \| sk_for_each_bound(a, - b, c) S \| hlist_for_each_entry_safe(a, - b, c, d, e) S \| hlist_for_each_entry_continue_rcu(a, - b, c) S \| nr_neigh_for_each(a, - b, c) S \| nr_neigh_for_each_safe(a, - b, c, d) S \| nr_node_for_each(a, - b, c) S \| nr_node_for_each_safe(a, - b, c, d) S \| - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S \| - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S \| for_each_host(a, - b, c) S \| for_each_host_safe(a, - b, c, d) S \| for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:24 -08:00
Linus Torvalds	89f883372f	Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Marcelo Tosatti: "KVM updates for the 3.9 merge window, including x86 real mode emulation fixes, stronger memory slot interface restrictions, mmu_lock spinlock hold time reduction, improved handling of large page faults on shadow, initial APICv HW acceleration support, s390 channel IO based virtio, amongst others" * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) Revert "KVM: MMU: lazily drop large spte" x86: pvclock kvm: align allocation size to page size KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr x86 emulator: fix parity calculation for AAD instruction KVM: PPC: BookE: Handle alignment interrupts booke: Added DBCR4 SPR number KVM: PPC: booke: Allow multiple exception types KVM: PPC: booke: use vcpu reference from thread_struct KVM: Remove user_alloc from struct kvm_memory_slot KVM: VMX: disable apicv by default KVM: s390: Fix handling of iscs. KVM: MMU: cleanup __direct_map KVM: MMU: remove pt_access in mmu_set_spte KVM: MMU: cleanup mapping-level KVM: MMU: lazily drop large spte KVM: VMX: cleanup vmx_set_cr0(). KVM: VMX: add missing exit names to VMX_EXIT_REASONS array KVM: VMX: disable SMEP feature when guest is in non-paging mode KVM: Remove duplicate text in api.txt Revert "KVM: MMU: split kvm_mmu_free_page" ...	2013-02-24 13:07:18 -08:00
Linus Torvalds	9d3cae26ac	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "So from the depth of frozen Minnesota, here's the powerpc pull request for 3.9. It has a few interesting highlights, in addition to the usual bunch of bug fixes, minor updates, embedded device tree updates and new boards: - Hand tuned asm implementation of SHA1 (by Paulus & Michael Ellerman) - Support for Doorbell interrupts on Power8 (kind of fast thread-thread IPIs) by Ian Munsie - Long overdue cleanup of the way we handle relocation of our open firmware trampoline (prom_init.c) on 64-bit by Anton Blanchard - Support for saving/restoring & context switching the PPR (Processor Priority Register) on server processors that support it. This allows the kernel to preserve thread priorities established by userspace. By Haren Myneni. - DAWR (new watchpoint facility) support on Power8 by Michael Neuling - Ability to change the DSCR (Data Stream Control Register) which controls cache prefetching on a running process via ptrace by Alexey Kardashevskiy - Support for context switching the TAR register on Power8 (new branch target register meant to be used by some new specific userspace perf event interrupt facility which is yet to be enabled) by Ian Munsie. - Improve preservation of the CFAR register (which captures the origin of a branch) on various exception conditions by Paulus. - Move the Bestcomm DMA driver from arch powerpc to drivers/dma where it belongs by Philippe De Muyter - Support for Transactional Memory on Power8 by Michael Neuling (based on original work by Matt Evans). For those curious about the feature, the patch contains a pretty good description." (See commit `db8ff90702`: "powerpc: Documentation for transactional memory on powerpc" for the mentioned description added to the file Documentation/powerpc/transactional_memory.txt) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (140 commits) powerpc/kexec: Disable hard IRQ before kexec powerpc/85xx: l2sram - Add compatible string for BSC9131 platform powerpc/85xx: bsc9131 - Correct typo in SDHC device node powerpc/e500/qemu-e500: enable coreint powerpc/mpic: allow coreint to be determined by MPIC version powerpc/fsl_pci: Store the pci ctlr device ptr in the pci ctlr struct powerpc/85xx: Board support for ppa8548 powerpc/fsl: remove extraneous DIU platform functions arch/powerpc/platforms/85xx/p1022_ds.c: adjust duplicate test powerpc: Documentation for transactional memory on powerpc powerpc: Add transactional memory to pseries and ppc64 defconfigs powerpc: Add config option for transactional memory powerpc: Add transactional memory to POWER8 cpu features powerpc: Add new transactional memory state to the signal context powerpc: Hook in new transactional memory code powerpc: Routines for FP/VSX/VMX unavailable during a transaction powerpc: Add transactional memory unavaliable execption handler powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes powerpc: Add FP/VSX and VMX register load functions for transactional memory powerpc: Add helper functions for transactional memory context switching ...	2013-02-23 17:09:55 -08:00
Paul Mackerras	deb26c274d	powerpc/kvm/book3s_pr: Fix compilation on 32-bit machines Commit `a413f474a0` ("powerpc: Disable relocation on exceptions whenever PR KVM is active") added calls to pSeries_disable_reloc_on_exc() and pSeries_enable_reloc_on_exc() to book3s_pr.c, and added declarations of those functions to <asm/hvcall.h>, but didn't add an include of <asm/hvcall.h> to book3s_pr.c. 64-bit kernels seem to get hvcall.h included via some other path, but 32-bit kernels fail to compile with: arch/powerpc/kvm/book3s_pr.c: In function ‘kvmppc_core_init_vm’: arch/powerpc/kvm/book3s_pr.c:1300:4: error: implicit declaration of function ‘pSeries_disable_reloc_on_exc’ [-Werror=implicit-function-declaration] arch/powerpc/kvm/book3s_pr.c: In function ‘kvmppc_core_destroy_vm’: arch/powerpc/kvm/book3s_pr.c:1316:4: error: implicit declaration of function ‘pSeries_enable_reloc_on_exc’ [-Werror=implicit-function-declaration] cc1: all warnings being treated as errors make[2]: * [arch/powerpc/kvm/book3s_pr.o] Error 1 make[1]: * [arch/powerpc/kvm] Error 2 make: *** [sub-make] Error 2 This fixes it by adding an include of hvcall.h. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-02-15 16:54:36 +11:00
Paul Mackerras	0acb91112a	powerpc/kvm/book3s_hv: Preserve guest CFAR register value The CFAR (Come-From Address Register) is a useful debugging aid that exists on POWER7 processors. Currently HV KVM doesn't save or restore the CFAR register for guest vcpus, making the CFAR of limited use in guests. This adds the necessary code to capture the CFAR value saved in the early exception entry code (it has to be saved before any branch is executed), save it in the vcpu.arch struct, and restore it on entry to the guest. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-02-15 16:54:33 +11:00
Alexander Graf	011da89962	KVM: PPC: BookE: Handle alignment interrupts When the guest triggers an alignment interrupt, we don't handle it properly today and instead BUG_ON(). This really shouldn't happen. Instead, we should just pass the interrupt back into the guest so it can deal with it. Reported-by: Gao Guanhua-B22826 <B22826@freescale.com> Tested-by: Gao Guanhua-B22826 <B22826@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-02-13 12:56:45 +01:00
Bharat Bhushan	1d542d9c2b	KVM: PPC: booke: Allow multiple exception types Current kvmppc_booke_handlers uses the same macro (KVM_HANDLER) and all handlers are considered to be the same size. This will not be the case if we want to use different macros for different handlers. This patch improves the kvmppc_booke_handler so that it can support different macros for different handlers. Signed-off-by: Liu Yu <yu.liu@freescale.com> [bharat.bhushan@freescale.com: Substantial changes] Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-02-13 12:56:40 +01:00
Bharat Bhushan	ffe129ecd7	KVM: PPC: booke: use vcpu reference from thread_struct Like other places, use thread_struct to get vcpu reference. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-02-13 12:56:39 +01:00
Benjamin Herrenschmidt	dfd0436ad0	Merge branch 'merge' into next Merge "merge" branch to bring in various bug fixes that are going into 3.8	2013-01-29 11:33:37 +11:00
Greg Kroah-Hartman	422d26b6ec	Merge 3.8-rc5 into driver-core-next This resolves a gpio driver merge issue pointed out in linux-next. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-25 21:06:30 -08:00
Alexander Graf	b9e3e20893	KVM: PPC: E500: Remove kvmppc_e500_tlbil_all usage from guest TLB code The guest TLB handling code should not have any insight into how the host TLB shadow code works. kvmppc_e500_tlbil_all() is a function that is used for distinction between e500v2 and e500mc (E.HV) on how to flush shadow entries. This function really is private between the e500.c/e500mc.c file and e500_mmu_host.c. Instead of this one, use the public kvmppc_core_flush_tlb() function to flush all shadow TLB entries. As a nice side effect, with this we also end up flushing TLB1 entries which we forgot to do before. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-24 19:23:34 +01:00
Alexander Graf	483ba97c0f	KVM: PPC: E500: Make clear_tlb_refs and clear_tlb1_bitmap static Host shadow TLB flushing is logic that the guest TLB code should have no insight about. Declare the internal clear_tlb_refs and clear_tlb1_bitmap functions static to the host TLB handling file. Instead of these, we can use the already exported kvmppc_core_flush_tlb(). This gives us a common API across the board to say "please flush any pending host shadow translation". Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-24 19:23:33 +01:00
Alexander Graf	c015c62b13	KVM: PPC: e500: Implement TLB1-in-TLB0 mapping When a host mapping fault happens in a guest TLB1 entry today, we map the translated guest entry into the host's TLB1. This isn't particularly clever when the guest is mapped by normal 4k pages, since these would be a lot better to put into TLB0 instead. This patch adds the required logic to map 4k TLB1 shadow maps into the host's TLB0. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-24 19:23:32 +01:00
Alexander Graf	b71c9e2fb7	KVM: PPC: E500: Split host and guest MMU parts This patch splits the file e500_tlb.c into e500_mmu.c (guest TLB handling) and e500_mmu_host.c (host TLB handling). The main benefit of this split is readability and maintainability. It's just a lot harder to write dirty code :). Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-24 19:23:31 +01:00
Alexander Graf	9d98b3ff94	KVM: PPC: e500: Call kvmppc_mmu_map for initial mapping When emulating tlbwe, we want to automatically map the entry that just got written in our shadow TLB map, because chances are quite high that it's going to be used very soon. Today this happens explicitly, duplicating all the logic that is in kvmppc_mmu_map() already. Just call that one instead. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-24 19:23:31 +01:00
Alexander Graf	2c378fd779	KVM: PPC: E500: Propagate errors when shadow mapping When shadow mapping a page, mapping this page can fail. In that case we don't have a shadow map. Take this case into account, otherwise we might end up writing bogus TLB entries into the host TLB. While at it, also move the write_stlbe() calls into the respective TLBn handlers. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-24 19:23:30 +01:00
Alexander Graf	523f0e5421	KVM: PPC: E500: Explicitly mark shadow maps invalid When we invalidate shadow TLB maps on the host, we don't mark them as not valid. But we should. Fix this by removing the E500_TLB_VALID from their flags when invalidating. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-24 19:23:30 +01:00
Alexander Graf	9445ef0181	KVM: PPC: E500: Move write_stlbe higher Later patches want to call the function and it doesn't have dependencies on anything below write_host_tlbe. Move it higher up in the file. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-24 19:23:29 +01:00
Kees Cook	07ff8b5358	arch/powerpc/kvm: remove depends on CONFIG_EXPERIMENTAL The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: Alexander Graf <agraf@suse.de> CC: Avi Kivity <avi@redhat.com> CC: Marcelo Tosatti <mtosatti@redhat.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-21 14:43:12 -08:00
Alexander Graf	d3286144c9	KVM: PPC: Emulate dcbf Guests can trigger MMIO exits using dcbf. Since we don't emulate cache incoherent MMIO, just do nothing and move on. Reported-by: Ben Collins <ben.c@servergy.com> Signed-off-by: Alexander Graf <agraf@suse.de> Tested-by: Ben Collins <ben.c@servergy.com> CC: stable@vger.kernel.org	2013-01-18 00:40:49 +01:00
Alexander Graf	324b3e6316	KVM: PPC: BookE: Add EPR ONE_REG sync We need to be able to read and write the contents of the EPR register from user space. This patch implements that logic through the ONE_REG API and declares its (never implemented) SREGS counterpart as deprecated. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-10 13:42:33 +01:00
Alexander Graf	1c81063655	KVM: PPC: BookE: Implement EPR exit The External Proxy Facility in FSL BookE chips allows the interrupt controller to automatically acknowledge an interrupt as soon as a core gets its pending external interrupt delivered. Today, user space implements the interrupt controller, so we need to check on it during such a cycle. This patch implements logic for user space to enable EPR exiting, disable EPR exiting and EPR exiting itself, so that user space can acknowledge an interrupt when an external interrupt has successfully been delivered into the guest vcpu. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-10 13:42:31 +01:00
Alexander Graf	37ecb257f6	KVM: PPC: BookE: Emulate mfspr on EPR The EPR register is potentially valid for PR KVM as well, so we need to emulate accesses to it. It's only defined for reading, so only handle the mfspr case. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-10 13:42:30 +01:00
Alexander Graf	b8c649a99d	KVM: PPC: BookE: Allow irq deliveries to inject requests When injecting an interrupt into guest context, we usually don't need to check for requests anymore. At least not until today. With the introduction of EPR, we will have to create a request when the guest has successfully accepted an external interrupt though. So we need to prepare the interrupt delivery to abort guest entry gracefully. Otherwise we'd delay the EPR request. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-10 13:42:21 +01:00
Mihai Caraman	f2be655004	KVM: PPC: Fix mfspr/mtspr MMUCFG emulation On mfspr/mtspr emulation path Book3E's MMUCFG SPR with value 1015 clashes with G4's MSSSR0 SPR. Move MSSSR0 emulation from generic part to Books3S. MSSSR0 also clashes with Book3S's DABRX SPR. DABRX was not explicitly handled so Book3S execution flow will behave as before. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-10 13:30:11 +01:00
Alexander Graf	50c7bb80b5	KVM: PPC: Book3S: PR: Enable alternative instruction for SC 1 When running on top of pHyp, the hypercall instruction "sc 1" goes straight into pHyp without trapping in supervisor mode. So if we want to support PAPR guest in this configuration we need to add a second way of accessing PAPR hypercalls, preferably with the exact same semantics except for the instruction. So let's overlay an officially reserved instruction and emulate PAPR hypercalls whenever we hit that one. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-10 13:15:08 +01:00
Alexander Graf	5a33169ed2	KVM: PPC: Only WARN on invalid emulation When we hit an emulation result that we didn't expect, that is an error, but it's nothing that warrants a BUG(), because it can be guest triggered. So instead, let's only WARN() the user that this happened. Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-10 13:15:08 +01:00
Ian Munsie	a413f474a0	powerpc: Disable relocation on exceptions whenever PR KVM is active For PR KVM we allow userspace to map 0xc000000000000000. Because transitioning from userspace to the guest kernel may use the relocated exception vectors we have to disable relocation on exceptions whenever PR KVM is active as we cannot trust that address. This issue does not apply to HV KVM, since changing from a guest to the hypervisor will never use the relocated exception vectors. Currently the hypervisor interface only allows us to toggle relocation on exceptions on a partition wide scope, so we need to globally disable relocation on exceptions when the first PR KVM instance is started and only re-enable them when all PR KVM instances have been destroyed. It's a bit heavy handed, but until the hypervisor gives us a lightweight way to toggle relocation on exceptions on a single thread it's only real option. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-01-10 17:00:42 +11:00
Andreas Schwab	d591390da9	KVM: PPC: Book3S HV: Fix compilation without CONFIG_PPC_POWERNV Fixes this build breakage: arch/powerpc/kvm/book3s_hv_ras.c: In function ‘kvmppc_realmode_mc_power7’: arch/powerpc/kvm/book3s_hv_ras.c:126:23: error: ‘struct paca_struct’ has no member named ‘opal_mc_evt’ Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-01-06 14:02:00 +01:00
Alex Williamson	f82a8cfe93	KVM: struct kvm_memory_slot.user_alloc -> bool There's no need for this to be an int, it holds a boolean. Move to the end of the struct for alignment. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 23:24:38 -02:00
Alex Williamson	bbacc0c111	KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is the user accessible slots and the other is user + private. Make this more obvious. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-13 23:21:57 -02:00
Mihai Caraman	352df1deb2	KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface Implement ONE_REG interface for EPCR register adding KVM_REG_PPC_EPCR to the list of ONE_REG PPC supported registers. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: remove HV dependency, use get/put_user] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:20 +01:00
Mihai Caraman	38f988240c	KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation Add EPCR support in booke mtspr/mfspr emulation. EPCR register is defined only for 64-bit and HV categories, we will expose it at this point only to 64-bit virtual processors running on 64-bit HV hosts. Define a reusable setter function for vcpu's EPCR. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: move HV dependency in the code] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:19 +01:00
Mihai Caraman	95e90b43c9	KVM: PPC: bookehv: Add guest computation mode for irq delivery When delivering guest IRQs, update MSR computation mode according to guest interrupt computation mode found in EPCR. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: remove HV dependency in the code] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:18 +01:00
Mihai Caraman	e9666ea1b3	KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit Extend MAS2 EPN mask to retain most significant bits on 64-bit hosts. Use this mask in tlb effective address accessor. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:15 +01:00
Mihai Caraman	9e2fa64693	KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation Mask high 32 bits of MAS2's effective page number in tlbwe emulation for guests running in 32-bit mode. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:14 +01:00
Mihai Caraman	7cdd7a95c6	KVM: PPC: e500: Add emulation helper for getting instruction ea Add emulation helper for getting instruction ea and refactor tlb instruction emulation to use it. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: keep rt variable around] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:12 +01:00
Mihai Caraman	e51f8f32d6	KVM: PPC: bookehv64: Add support for interrupt handling Add interrupt handling support for 64-bit bookehv hosts. Unify 32 and 64 bit implementations using a common stack layout and a common execution flow starting from kvm_handler_common macro. Update documentation for 64-bit input register values. This patch only address the bolted TLB miss exception handlers version. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:11 +01:00
Mihai Caraman	ff59474684	KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler GET_VCPU define will not be implemented for 64-bit for performance reasons so get rid of it also on 32-bit. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:10 +01:00
Mihai Caraman	b50df19ccc	KVM: PPC: booke: Fix get_tb() compile error on 64-bit Include header file for get_tb() declaration. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:09 +01:00
Mihai Caraman	910040b82d	KVM: PPC: e500: Silence bogus GCC warning in tlb code 64-bit GCC 4.5.1 warns about an uninitialized variable which was guarded by a flag. Initialize the variable to make it happy. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: reword comment] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:08 +01:00
Paul Mackerras	b4072df407	KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking Currently, if a machine check interrupt happens while we are in the guest, we exit the guest and call the host's machine check handler, which tends to cause the host to panic. Some machine checks can be triggered by the guest; for example, if the guest creates two entries in the SLB that map the same effective address, and then accesses that effective address, the CPU will take a machine check interrupt. To handle this better, when a machine check happens inside the guest, we call a new function, kvmppc_realmode_machine_check(), while still in real mode before exiting the guest. On POWER7, it handles the cases that the guest can trigger, either by flushing and reloading the SLB, or by flushing the TLB, and then it delivers the machine check interrupt directly to the guest without going back to the host. On POWER7, the OPAL firmware patches the machine check interrupt vector so that it gets control first, and it leaves behind its analysis of the situation in a structure pointed to by the opal_mc_evt field of the paca. The kvmppc_realmode_machine_check() function looks at this, and if OPAL reports that there was no error, or that it has handled the error, we also go straight back to the guest with a machine check. We have to deliver a machine check to the guest since the machine check interrupt might have trashed valid values in SRR0/1. If the machine check is one we can't handle in real mode, and one that OPAL hasn't already handled, or on PPC970, we exit the guest and call the host's machine check handler. We do this by jumping to the machine_check_fwnmi label, rather than absolute address 0x200, because we don't want to re-execute OPAL's handler on POWER7. On PPC970, the two are equivalent because address 0x200 just contains a branch. Then, if the host machine check handler decides that the system can continue executing, kvmppc_handle_exit() delivers a machine check interrupt to the guest -- once again to let the guest know that SRR0/1 have been modified. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix checkpatch warnings] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:07 +01:00
Paul Mackerras	1b400ba0cd	KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations When we change or remove a HPT (hashed page table) entry, we can do either a global TLB invalidation (tlbie) that works across the whole machine, or a local invalidation (tlbiel) that only affects this core. Currently we do local invalidations if the VM has only one vcpu or if the guest requests it with the H_LOCAL flag, though the guest Linux kernel currently doesn't ever use H_LOCAL. Then, to cope with the possibility that vcpus moving around to different physical cores might expose stale TLB entries, there is some code in kvmppc_hv_entry to flush the whole TLB of entries for this VM if either this vcpu is now running on a different physical core from where it last ran, or if this physical core last ran a different vcpu. There are a number of problems on POWER7 with this as it stands: - The TLB invalidation is done per thread, whereas it only needs to be done per core, since the TLB is shared between the threads. - With the possibility of the host paging out guest pages, the use of H_LOCAL by an SMP guest is dangerous since the guest could possibly retain and use a stale TLB entry pointing to a page that had been removed from the guest. - The TLB invalidations that we do when a vcpu moves from one physical core to another are unnecessary in the case of an SMP guest that isn't using H_LOCAL. - The optimization of using local invalidations rather than global should apply to guests with one virtual core, not just one vcpu. (None of this applies on PPC970, since there we always have to invalidate the whole TLB when entering and leaving the guest, and we can't support paging out guest memory.) To fix these problems and simplify the code, we now maintain a simple cpumask of which cpus need to flush the TLB on entry to the guest. (This is indexed by cpu, though we only ever use the bits for thread 0 of each core.) Whenever we do a local TLB invalidation, we set the bits for every cpu except the bit for thread 0 of the core that we're currently running on. Whenever we enter a guest, we test and clear the bit for our core, and flush the TLB if it was set. On initial startup of the VM, and when resetting the HPT, we set all the bits in the need_tlb_flush cpumask, since any core could potentially have stale TLB entries from the previous VM to use the same LPID, or the previous contents of the HPT. Then, we maintain a count of the number of online virtual cores, and use that when deciding whether to use a local invalidation rather than the number of online vcpus. The code to make that decision is extracted out into a new function, global_invalidates(). For multi-core guests on POWER7 (i.e. when we are using mmu notifiers), we now never do local invalidations regardless of the H_LOCAL flag. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:05 +01:00
Paul Mackerras	3a2e7b0d76	KVM: PPC: Book3S PR: MSR_DE doesn't exist on Book 3S The mask of MSR bits that get transferred from the guest MSR to the shadow MSR included MSR_DE. In fact that bit only exists on Book 3E processors, and it is assigned the same bit used for MSR_BE on Book 3S processors. Since we already had MSR_BE in the mask, this just removes MSR_DE. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:03 +01:00
Paul Mackerras	28c483b62f	KVM: PPC: Book3S PR: Fix VSX handling This fixes various issues in how we were handling the VSX registers that exist on POWER7 machines. First, we were running off the end of the current->thread.fpr[] array. Ultimately this was because the vcpu->arch.vsr[] array is sized to be able to store both the FP registers and the extra VSX registers (i.e. 64 entries), but PR KVM only uses it for the extra VSX registers (i.e. 32 entries). Secondly, calling load_up_vsx() from C code is a really bad idea, because it jumps to fast_exception_return at the end, rather than returning with a blr instruction. This was causing it to jump off to a random location with random register contents, since it was using the largely uninitialized stack frame created by kvmppc_load_up_vsx. In fact, it isn't necessary to call either __giveup_vsx or load_up_vsx, since giveup_fpu and load_up_fpu handle the extra VSX registers as well as the standard FP registers on machines with VSX. Also, since VSX instructions can access the VMX registers and the FP registers as well as the extra VSX registers, we have to load up the FP and VMX registers before we can turn on the MSR_VSX bit for the guest. Conversely, if we save away any of the VSX or FP registers, we have to turn off MSR_VSX for the guest. To handle all this, it is more convenient for a single call to kvmppc_giveup_ext() to handle all the state saving that needs to be done, so we make it take a set of MSR bits rather than just one, and the switch statement becomes a series of if statements. Similarly kvmppc_handle_ext needs to be able to load up more than one set of registers. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:02 +01:00
Paul Mackerras	b0a94d4e23	KVM: PPC: Book3S PR: Emulate PURR, SPURR and DSCR registers This adds basic emulation of the PURR and SPURR registers. We assume we are emulating a single-threaded core, so these advance at the same rate as the timebase. A Linux kernel running on a POWER7 expects to be able to access these registers and is not prepared to handle a program interrupt on accessing them. This also adds a very minimal emulation of the DSCR (data stream control register). Writes are ignored and reads return zero. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:01 +01:00
Paul Mackerras	1cc8ed0b13	KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages Currently, if the guest does an H_PROTECT hcall requesting that the permissions on a HPT entry be changed to allow writing, we make the requested change even if the page is marked read-only in the host Linux page tables. This is a problem since it would for instance allow a guest to modify a page that KSM has decided can be shared between multiple guests. To fix this, if the new permissions for the page allow writing, we need to look up the memslot for the page, work out the host virtual address, and look up the Linux page tables to get the PTE for the page. If that PTE is read-only, we reduce the HPTE permissions to read-only. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:34:00 +01:00
Paul Mackerras	05dd85f793	KVM: PPC: Book3S HV: Report correct HPT entry index when reading HPT This fixes a bug in the code which allows userspace to read out the contents of the guest's hashed page table (HPT). On the second and subsequent passes through the HPT, when we are reporting only those entries that have changed, we were incorrectly initializing the index field of the header with the index of the first entry we skipped rather than the first changed entry. This fixes it. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:33:59 +01:00
Paul Mackerras	a64fd70748	KVM: PPC: Book3S HV: Reset reverse-map chains when resetting the HPT With HV-style KVM, we maintain reverse-mapping lists that enable us to find all the HPT (hashed page table) entries that reference each guest physical page, with the heads of the lists in the memslot->arch.rmap arrays. When we reset the HPT (i.e. when we reboot the VM), we clear out all the HPT entries but we were not clearing out the reverse mapping lists. The result is that as we create new HPT entries, the lists get corrupted, which can easily lead to loops, resulting in the host kernel hanging when it tries to traverse those lists. This fixes the problem by zeroing out all the reverse mapping lists when we zero out the HPT. This incidentally means that we are also zeroing our record of the referenced and changed bits (not the bits in the Linux PTEs, used by the Linux MM subsystem, but the bits used by the KVM_GET_DIRTY_LOG ioctl, and those used by kvm_age_hva() and kvm_test_age_hva()). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:33:58 +01:00
Paul Mackerras	a2932923cc	KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on this fd return the contents of the HPT (hashed page table), writes create and/or remove entries in the HPT. There is a new capability, KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl takes an argument structure with the index of the first HPT entry to read out and a set of flags. The flags indicate whether the user is intending to read or write the HPT, and whether to return all entries or only the "bolted" entries (those with the bolted bit, 0x10, set in the first doubleword). This is intended for use in implementing qemu's savevm/loadvm and for live migration. Therefore, on reads, the first pass returns information about all HPTEs (or all bolted HPTEs). When the first pass reaches the end of the HPT, it returns from the read. Subsequent reads only return information about HPTEs that have changed since they were last read. A read that finds no changed HPTEs in the HPT following where the last read finished will return 0 bytes. The format of the data provides a simple run-length compression of the invalid entries. Each block of data starts with a header that indicates the index (position in the HPT, which is just an array), the number of valid entries starting at that index (may be zero), and the number of invalid entries following those valid entries. The valid entries, 16 bytes each, follow the header. The invalid entries are not explicitly represented. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix documentation] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:33:57 +01:00
Paul Mackerras	6b445ad4f8	KVM: PPC: Book3S HV: Make a HPTE removal function available This makes a HPTE removal function, kvmppc_do_h_remove(), available outside book3s_hv_rm_mmu.c. This will be used by the HPT writing code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:33:55 +01:00
Paul Mackerras	44e5f6be62	KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs This uses a bit in our record of the guest view of the HPTE to record when the HPTE gets modified. We use a reserved bit for this, and ensure that this bit is always cleared in HPTE values returned to the guest. The recording of modified HPTEs is only done if other code indicates its interest by setting kvm->arch.hpte_mod_interest to a non-zero value. The reason for this is that when later commits add facilities for userspace to read the HPT, the first pass of reading the HPT will be quicker if there are no (or very few) HPTEs marked as modified, rather than having most HPTEs marked as modified. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:33:54 +01:00
Paul Mackerras	4879f24172	KVM: PPC: Book3S HV: Fix bug causing loss of page dirty state This fixes a bug where adding a new guest HPT entry via the H_ENTER hcall would lose the "changed" bit in the reverse map information for the guest physical page being mapped. The result was that the KVM_GET_DIRTY_LOG could return a zero bit for the page even though the page had been modified by the guest. This fixes it by only modifying the index and present bits in the reverse map entry, thus preserving the reference and change bits. We were also unnecessarily setting the reference bit, and this fixes that too. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:33:53 +01:00
Paul Mackerras	7ed661bf85	KVM: PPC: Book3S HV: Restructure HPT entry creation code This restructures the code that creates HPT (hashed page table) entries so that it can be called in situations where we don't have a struct vcpu pointer, only a struct kvm pointer. It also fixes a bug where kvmppc_map_vrma() would corrupt the guest R4 value. Most of the work of kvmppc_virtmode_h_enter is now done by a new function, kvmppc_virtmode_do_h_enter, which itself calls another new function, kvmppc_do_h_enter, which contains most of the old kvmppc_h_enter. The new kvmppc_do_h_enter takes explicit arguments for the place to return the HPTE index, the Linux page tables to use, and whether it is being called in real mode, thus removing the need for it to have the vcpu as an argument. Currently kvmppc_map_vrma creates the VRMA (virtual real mode area) HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily to handle H_ENTER hcalls from the guest that need to pin a page of memory. Since H_ENTER returns the index of the created HPTE in R4, kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4 in the case when it gets called from kvmppc_map_vrma on the first VCPU_RUN ioctl. With this, kvmppc_map_vrma instead calls kvmppc_virtmode_do_h_enter with the address of a dummy word as the place to store the HPTE index, thus avoiding corrupting the guest R4. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:33:52 +01:00
Alexander Graf	0e673fb679	KVM: PPC: Support eventfd In order to support the generic eventfd infrastructure on PPC, we need to call into the generic KVM in-kernel device mmio code. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-12-06 01:33:50 +01:00
Marcelo Tosatti	42897d866b	KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization TSC initialization will soon make use of online_vcpus. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:14 -02:00
Alexander Graf	0588000eac	Merge commit 'origin/queue' into for-queue Conflicts: arch/powerpc/include/asm/Kbuild arch/powerpc/include/uapi/asm/Kbuild	2012-10-31 13:36:18 +01:00
Paul Mackerras	9f8c8c7812	KVM: PPC: Book3S HV: Allow DTL to be set to address 0, length 0 Commit `55b665b026` ("KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas") includes a check on the length of the dispatch trace log (DTL) to make sure the buffer is at least one entry long. This is appropriate when registering a buffer, but the interface also allows for any existing buffer to be unregistered by specifying a zero address. In this case the length check is not appropriate. This makes the check conditional on the address being non-zero. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-30 10:54:58 +01:00
Paul Mackerras	c7b676709c	KVM: PPC: Book3S HV: Fix accounting of stolen time Currently the code that accounts stolen time tends to overestimate the stolen time, and will sometimes report more stolen time in a DTL (dispatch trace log) entry than has elapsed since the last DTL entry. This can cause guests to underflow the user or system time measured for some tasks, leading to ridiculous CPU percentages and total runtimes being reported by top and other utilities. In addition, the current code was designed for the previous policy where a vcore would only run when all the vcpus in it were runnable, and so only counted stolen time on a per-vcore basis. Now that a vcore can run while some of the vcpus in it are doing other things in the kernel (e.g. handling a page fault), we need to count the time when a vcpu task is preempted while it is not running as part of a vcore as stolen also. To do this, we bring back the BUSY_IN_HOST vcpu state and extend the vcpu_load/put functions to count preemption time while the vcpu is in that state. Handling the transitions between the RUNNING and BUSY_IN_HOST states requires checking and updating two variables (accumulated time stolen and time last preempted), so we add a new spinlock, vcpu->arch.tbacct_lock. This protects both the per-vcpu stolen/preempt-time variables, and the per-vcore variables while this vcpu is running the vcore. Finally, we now don't count time spent in userspace as stolen time. The task could be executing in userspace on behalf of the vcpu, or it could be preempted, or the vcpu could be genuinely stopped. Since we have no way of dividing up the time between these cases, we don't count any of it as stolen. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-30 10:54:57 +01:00
Paul Mackerras	8455d79e21	KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run Currently the Book3S HV code implements a policy on multi-threaded processors (i.e. POWER7) that requires all of the active vcpus in a virtual core to be ready to run before we run the virtual core. However, that causes problems on reset, because reset stops all vcpus except vcpu 0, and can also reduce throughput since all four threads in a virtual core have to wait whenever any one of them hits a hypervisor page fault. This relaxes the policy, allowing the virtual core to run as soon as any vcpu in it is runnable. With this, the KVMPPC_VCPU_STOPPED state and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish between them. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-30 10:54:56 +01:00
Paul Mackerras	2f12f03436	KVM: PPC: Book3S HV: Fixes for late-joining threads If a thread in a virtual core becomes runnable while other threads in the same virtual core are already running in the guest, it is possible for the latecomer to join the others on the core without first pulling them all out of the guest. Currently this only happens rarely, when a vcpu is first started. This fixes some bugs and omissions in the code in this case. First, we need to check for VPA updates for the latecomer and make a DTL entry for it. Secondly, if it comes along while the master vcpu is doing a VPA update, we don't need to do anything since the master will pick it up in kvmppc_run_core. To handle this correctly we introduce a new vcore state, VCORE_STARTING. Thirdly, there is a race because we currently clear the hardware thread's hwthread_req before waiting to see it get to nap. A latecomer thread could have its hwthread_req cleared before it gets to test it, and therefore never increment the nap_count, leading to messages about wait_for_nap timeouts. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-30 10:54:55 +01:00
Paul Mackerras	913d3ff9a3	KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock There were a few places where we were traversing the list of runnable threads in a virtual core, i.e. vc->runnable_threads, without holding the vcore spinlock. This extends the places where we hold the vcore spinlock to cover everywhere that we traverse that list. Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault, this moves the call of it from kvmppc_handle_exit out to kvmppc_vcpu_run, where we don't hold the vcore lock. In kvmppc_vcore_blocked, we don't actually need to check whether all vcpus are ceded and don't have any pending exceptions, since the caller has already done that. The caller (kvmppc_run_vcpu) wasn't actually checking for pending exceptions, so we add that. The change of if to while in kvmppc_run_vcpu is to make sure that we never call kvmppc_remove_runnable() when the vcore state is RUNNING or EXITING. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-30 10:54:55 +01:00
Paul Mackerras	7b444c6710	KVM: PPC: Book3S HV: Fix some races in starting secondary threads Subsequent patches implementing in-kernel XICS emulation will make it possible for IPIs to arrive at secondary threads at arbitrary times. This fixes some races in how we start the secondary threads, which if not fixed could lead to occasional crashes of the host kernel. This makes sure that (a) we have grabbed all the secondary threads, and verified that they are no longer in the kernel, before we start any thread, (b) that the secondary thread loads its vcpu pointer after clearing the IPI that woke it up (so we don't miss a wakeup), and (c) that the secondary thread clears its vcpu pointer before incrementing the nap count. It also removes unnecessary setting of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-30 10:54:54 +01:00
Paul Mackerras	512691d490	KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online When a Book3S HV KVM guest is running, we need the host to be in single-thread mode, that is, all of the cores (or at least all of the cores where the KVM guest could run) to be running only one active hardware thread. This is because of the hardware restriction in POWER processors that all of the hardware threads in the core must be in the same logical partition. Complying with this restriction is much easier if, from the host kernel's point of view, only one hardware thread is active. This adds two hooks in the SMP hotplug code to allow the KVM code to make sure that secondary threads (i.e. hardware threads other than thread 0) cannot come online while any KVM guest exists. The KVM code still has to check that any core where it runs a guest has the secondary threads offline, but having done that check it can now be sure that they will not come online while the guest is running. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-30 10:54:53 +01:00
Alexander Graf	388cf9ee3c	KVM: PPC: Move mtspr/mfspr emulation into own functions The mtspr/mfspr emulation code became quite big over time. Move it into its own function so things stay more readable. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-30 10:54:51 +01:00
Alexander Graf	e43a028752	KVM: PPC: 44x: fix DCR read/write When remembering the direction of a DCR transaction, we should write to the same variable that we interpret on later when doing vcpu_run again. Signed-off-by: Alexander Graf <agraf@suse.de> Cc: stable@vger.kernel.org	2012-10-30 10:54:50 +01:00
Xiao Guangrong	81c52c56e2	KVM: do not treat noslot pfn as a error pfn This patch filters noslot pfn out from error pfns based on Marcelo comment: noslot pfn is not a error pfn After this patch, - is_noslot_pfn indicates that the gfn is not in slot - is_error_pfn indicates that the gfn is in slot but the error is occurred when translate the gfn to pfn - is_error_noslot_pfn indicates that the pfn either it is error pfns or it is noslot pfn And is_invalid_pfn can be removed, it makes the code more clean Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-10-29 20:31:04 -02:00
Marcelo Tosatti	19bf7f8ac3	Merge remote-tracking branch 'master' into queue Merge reason: development work has dependency on kvm patches merged upstream. Conflicts: arch/powerpc/include/asm/Kbuild arch/powerpc/include/asm/kvm_para.h Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-10-29 19:15:32 -02:00
Christoffer Dall	8ca40a70a7	KVM: Take kvm instead of vcpu to mmu_notifier_retry The mmu_notifier_retry is not specific to any vcpu (and never will be) so only take struct kvm as a parameter. The motivation is the ARM mmu code that needs to call this from somewhere where we long let go of the vcpu pointer. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-10-23 13:35:43 +02:00
Aneesh Kumar K.V	ce236ab576	powerpc: Build fix for powerpc KVM Fix build failure for powerpc KVM by adding missing VPN_SHIFT definition and the ';' arch/powerpc/kvm/book3s_32_mmu_host.c: In function 'kvmppc_mmu_map_page': arch/powerpc/kvm/book3s_32_mmu_host.c:176: error: 'VPN_SHIFT' undeclared (first use in this function) arch/powerpc/kvm/book3s_32_mmu_host.c:176: error: (Each undeclared identifier is reported only once arch/powerpc/kvm/book3s_32_mmu_host.c:176: error: for each function it appears in.) arch/powerpc/kvm/book3s_32_mmu_host.c:178: error: expected ';' before 'next_pteg' arch/powerpc/kvm/book3s_32_mmu_host.c:190: error: label 'next_pteg' used but not defined make[1]: *** [arch/powerpc/kvm/book3s_32_mmu_host.o] Error 1 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-10-18 10:37:52 +11:00
Konstantin Khlebnikov	314e51b985	mm: kill vma flag VM_RESERVED and mm->reserved_vm counter A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: \| effect \| alternative flags -+------------------------+--------------------------------------------- 1\| account as reserved_vm \| VM_IO 2\| skip in core dump \| VM_IO, VM_DONTDUMP 3\| do not merge or expand \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4\| do not mlock \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND \| VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO\|VM_DONTEXPAND\|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND \| VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:19 +09:00
Julia Lawall	12ecd9570d	arch/powerpc/kvm/e500_tlb.c: fix error return code Convert a 0 error return code to a negative one, as returned elsewhere in the function. A new label is also added to avoid freeing things that are known to not yet be allocated. A simplified version of the semantic match that finds the first problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e,e1,e2,e3,e4,x; @@ ( if ($ret != 0\\|ret < 0$ \|\| ...) { ... return ...; } \| ret = 0 ) ... when != ret = e1 x = $kmalloc\\|kzalloc\\|kcalloc\\|devm_kzalloc\\|ioremap\\|ioremap_nocache\\|devm_ioremap\\|devm_ioremap_nocache$(...); ... when != x = e2 when != ret = e3 if (x == NULL \|\| ...) { ... when != ret = e4 * return ret; } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:55 +02:00
Paul Mackerras	55b665b026	KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas The PAPR paravirtualization interface lets guests register three different types of per-vCPU buffer areas in its memory for communication with the hypervisor. These are called virtual processor areas (VPAs). Currently the hypercalls to register and unregister VPAs are handled by KVM in the kernel, and userspace has no way to know about or save and restore these registrations across a migration. This adds "register" codes for these three areas that userspace can use with the KVM_GET/SET_ONE_REG ioctls to see what addresses have been registered, and to register or unregister them. This will be needed for guest hibernation and migration, and is also needed so that userspace can unregister them on reset (otherwise we corrupt guest memory after reboot by writing to the VPAs registered by the previous kernel). The "register" for the VPA is a 64-bit value containing the address, since the length of the VPA is fixed. The "registers" for the SLB shadow buffer and dispatch trace log (DTL) are 128 bits long, consisting of the guest physical address in the high (first) 64 bits and the length in the low 64 bits. This also fixes a bug where we were calling init_vpa unconditionally, leading to an oops when unregistering the VPA. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:55 +02:00
Paul Mackerras	a8bd19ef4d	KVM: PPC: Book3S: Get/set guest FP regs using the GET/SET_ONE_REG interface This enables userspace to get and set all the guest floating-point state using the KVM_[GS]ET_ONE_REG ioctls. The floating-point state includes all of the traditional floating-point registers and the FPSCR (floating point status/control register), all the VMX/Altivec vector registers and the VSCR (vector status/control register), and on POWER7, the vector-scalar registers (note that each FP register is the high-order half of the corresponding VSR). Most of these are implemented in common Book 3S code, except for VSX on POWER7. Because HV and PR differ in how they store the FP and VSX registers on POWER7, the code for these cases is not common. On POWER7, the FP registers are the upper halves of the VSX registers vsr0 - vsr31. PR KVM stores vsr0 - vsr31 in two halves, with the upper halves in the arch.fpr[] array and the lower halves in the arch.vsr[] array, whereas HV KVM on POWER7 stores the whole VSX register in arch.vsr[]. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix whitespace, vsx compilation] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:54 +02:00
Paul Mackerras	a136a8bdc0	KVM: PPC: Book3S: Get/set guest SPRs using the GET/SET_ONE_REG interface This enables userspace to get and set various SPRs (special-purpose registers) using the KVM_[GS]ET_ONE_REG ioctls. With this, userspace can get and set all the SPRs that are part of the guest state, either through the KVM_[GS]ET_REGS ioctls, the KVM_[GS]ET_SREGS ioctls, or the KVM_[GS]ET_ONE_REG ioctls. The SPRs that are added here are: - DABR: Data address breakpoint register - DSCR: Data stream control register - PURR: Processor utilization of resources register - SPURR: Scaled PURR - DAR: Data address register - DSISR: Data storage interrupt status register - AMR: Authority mask register - UAMOR: User authority mask override register - MMCR0, MMCR1, MMCRA: Performance monitor unit control registers - PMC1..PMC8: Performance monitor unit counter registers In order to reduce code duplication between PR and HV KVM code, this moves the kvm_vcpu_ioctl_[gs]et_one_reg functions into book3s.c and centralizes the copying between user and kernel space there. The registers that are handled differently between PR and HV, and those that exist only in one flavor, are handled in kvmppc_[gs]et_one_reg() functions that are specific to each flavor. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: minimal style fixes] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:54 +02:00
Scott Wood	5bd1cf1185	KVM: PPC: set IN_GUEST_MODE before checking requests Avoid a race as described in the code comment. Also remove a related smp_wmb() from booke's kvmppc_prepare_to_enter(). I can't see any reason for it, and the book3s_pr version doesn't have it. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:54 +02:00
Scott Wood	adbb48a854	KVM: PPC: e500: MMU API: fix leak of shared_tlb_pages This was found by kmemleak. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:53 +02:00
Scott Wood	e400e72f25	KVM: PPC: e500: fix allocation size error on g2h_tlb1_map We were only allocating half the bytes we need, which was made more obvious by a recent fix to the memset in clear_tlb1_bitmap(). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Cc: stable@vger.kernel.org	2012-10-05 23:38:53 +02:00
Paul Mackerras	70bddfefbd	KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation In the case where the host kernel is using a 64kB base page size and the guest uses a 4k HPTE (hashed page table entry) to map an emulated MMIO device, we were calculating the guest physical address wrongly. We were calculating a gfn as the guest physical address shifted right 16 bits (PAGE_SHIFT) but then only adding back in 12 bits from the effective address, since the HPTE had a 4k page size. Thus the gpa reported to userspace was missing 4 bits. Instead, we now compute the guest physical address from the HPTE without reference to the host page size, and then compute the gfn by shifting the gpa right PAGE_SHIFT bits. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:53 +02:00
Paul Mackerras	964ee98ccd	KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs When making a vcpu non-runnable we incorrectly changed the thread IDs of all other threads on the core, just remove that code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:52 +02:00
Paul Mackerras	a47d72f361	KVM: PPC: Book3S HV: Fix updates of vcpu->cpu This removes the powerpc "generic" updates of vcpu->cpu in load and put, and moves them to the various backends. The reason is that "HV" KVM does its own sauce with that field and the generic updates might corrupt it. The field contains the CPU# of the -first- HW CPU of the core always for all the VCPU threads of a core (the one that's online from a host Linux perspective). However, the preempt notifiers are going to be called on the threads VCPUs when they are running (due to them sleeping on our private waitqueue) causing unload to be called, potentially clobbering the value. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:52 +02:00
Paul Mackerras	dfe49dbd1f	KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly This adds an implementation of kvm_arch_flush_shadow_memslot for Book3S HV, and arranges for kvmppc_core_commit_memory_region to flush the dirty log when modifying an existing slot. With this, we can handle deletion and modification of memory slots. kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which on Book3S HV now traverses the reverse map chains to remove any HPT (hashed page table) entries referring to pages in the memslot. This gets called by generic code whenever deleting a memslot or changing the guest physical address for a memslot. We flush the dirty log in kvmppc_core_commit_memory_region for consistency with what x86 does. We only need to flush when an existing memslot is being modified, because for a new memslot the rmap array (which stores the dirty bits) is all zero, meaning that every page is considered clean already, and when deleting a memslot we obviously don't care about the dirty bits any more. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:51 +02:00
Paul Mackerras	a66b48c3a3	KVM: PPC: Move kvm->arch.slot_phys into memslot.arch Now that we have an architecture-specific field in the kvm_memory_slot structure, we can use it to store the array of page physical addresses that we need for Book3S HV KVM on PPC970 processors. This reduces the size of struct kvm_arch for Book3S HV, and also reduces the size of struct kvm_arch_memory_slot for other PPC KVM variants since the fields in it are now only compiled in for Book3S HV. This necessitates making the kvm_arch_create_memslot and kvm_arch_free_memslot operations specific to each PPC KVM variant. That in turn means that we now don't allocate the rmap arrays on Book3S PR and Book E. Since we now unpin pages and free the slot_phys array in kvmppc_core_free_memslot, we no longer need to do it in kvmppc_core_destroy_vm, since the generic code takes care to free all the memslots when destroying a VM. We now need the new memslot to be passed in to kvmppc_core_prepare_memory_region, since we need to initialize its arch.slot_phys member on Book3S HV. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:51 +02:00
Paul Mackerras	2c9097e4c1	KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots The generic KVM code uses SRCU (sleeping RCU) to protect accesses to the memslots data structures against updates due to userspace adding, modifying or removing memory slots. We need to do that too, both to avoid accessing stale copies of the memslots and to avoid lockdep warnings. This therefore adds srcu_read_lock/unlock pairs around code that accesses and uses memslots. Since the real-mode handlers for H_ENTER, H_REMOVE and H_BULK_REMOVE need to access the memslots, and we don't want to call the SRCU code in real mode (since we have no assurance that it would only access the linear mapping), we hold the SRCU read lock for the VM while in the guest. This does mean that adding or removing memory slots while some vcpus are executing in the guest will block for up to two jiffies. This tradeoff is acceptable since adding/removing memory slots only happens rarely, while H_ENTER/H_REMOVE/H_BULK_REMOVE are performance-critical hot paths. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:51 +02:00
Alexander Graf	7a08c2740f	KVM: PPC: BookE: Support FPU on non-hv systems When running on HV aware hosts, we can not trap when the guest sets the FP bit, so we just let it do so when it wants to, because it has full access to MSR. For non-HV aware hosts with an FPU (like 440), we need to also adjust the shadow MSR though. Otherwise the guest gets an FP unavailable trap even when it really enabled the FP bit in MSR. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:50 +02:00
Alexander Graf	ceb985f9d1	KVM: PPC: 440: Implement mfdcrx We need mfdcrx to execute properly on 460 cores. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:49 +02:00
Alexander Graf	e4dcfe88fb	KVM: PPC: 440: Implement mtdcrx We need mtdcrx to execute properly on 460 cores. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:49 +02:00
Alexander Graf	430c7ff52f	KVM: PPC: E500: Remove E500_TLB_DIRTY flag Since we always mark pages as dirty immediately when mapping them read/write now, there's no need for the dirty flag in our cache. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:48 +02:00
Alexander Graf	166a2b7000	KVM: PPC: Use symbols for exit trace Exit traces are a lot easier to read when you don't have to remember cryptic numbers for guest exit reasons. Symbolify them in our trace output. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:48 +02:00
Alexander Graf	50c871edf5	KVM: PPC: BookE: Add MCSR SPR support Add support for the MCSR SPR. This only implements the SPR storage bits, not actual machine checks. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:48 +02:00
Alexander Graf	491dd5b8a4	KVM: PPC: 44x: Initialize PVR We need to make sure that vcpu->arch.pvr is initialized to a sane value, so let's just take the host PVR. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:47 +02:00
Bharat Bhushan	6df8d3fc58	booke: Added ONE_REG interface for IAC/DAC debug registers IAC/DAC are defined as 32 bit while they are 64 bit wide. So ONE_REG interface is added to set/get them. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:47 +02:00
Bharat Bhushan	f61c94bb99	KVM: PPC: booke: Add watchdog emulation This patch adds the watchdog emulation in KVM. The watchdog emulation is enabled by KVM_ENABLE_CAP(KVM_CAP_PPC_BOOKE_WATCHDOG) ioctl. The kernel timer are used for watchdog emulation and emulates h/w watchdog state machine. On watchdog timer expiry, it exit to QEMU if TCR.WRC is non ZERO. QEMU can reset/shutdown etc depending upon how it is configured. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com> [bharat.bhushan@freescale.com: reworked patch] Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> [agraf: adjust to new request framework] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:47 +02:00
Alexander Graf	7c973a2ebb	KVM: PPC: Add return value to core_check_requests Requests may want to tell us that we need to go back into host state, so add a return value for the checks. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:46 +02:00
Alexander Graf	7ee788556b	KVM: PPC: Add return value in prepare_to_enter Our prepare_to_enter helper wants to be able to return in more circumstances to the host than only when an interrupt is pending. Broaden the interface a bit and move even more generic code to the generic helper. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:46 +02:00
Alexander Graf	206c2ed7f1	KVM: PPC: Ignore EXITING_GUEST_MODE mode We don't need to do anything when mode is EXITING_GUEST_MODE, because we essentially are outside of guest mode and did everything it asked us to do by the time we check it. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:46 +02:00
Alexander Graf	3766a4c693	KVM: PPC: Move kvm_guest_enter call into generic code We need to call kvm_guest_enter in booke and book3s, so move its call to generic code. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:45 +02:00
Alexander Graf	bd2be6836e	KVM: PPC: Book3S: PR: Rework irq disabling Today, we disable preemption while inside guest context, because we need to expose to the world that we are not in a preemptible context. However, during that time we already have interrupts disabled, which would indicate that we are in a non-preemptible context. The reason the checks for irqs_disabled() fail for us though is that we manually control hard IRQs and ignore all the lazy EE framework. Let's stop doing that. Instead, let's always use lazy EE to indicate when we want to disable IRQs, but do a special final switch that gets us into EE disabled, but soft enabled state. That way when we get back out of guest state, we are immediately ready to process interrupts. This simplifies the code drastically and reduces the time that we appear as preempt disabled. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:45 +02:00
Alexander Graf	24afa37b9c	KVM: PPC: Consistentify vcpu exit path When getting out of __vcpu_run, let's be consistent about the state we return in. We want to always * have IRQs enabled * have called kvm_guest_exit before Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:45 +02:00
Alexander Graf	0652eaaebe	KVM: PPC: Book3S: PR: Indicate we're out of guest mode When going out of guest mode, indicate that we are in vcpu->mode. That way requests from other CPUs don't needlessly need to kick us to process them, because it'll just happen next time we enter the guest. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:44 +02:00
Alexander Graf	706fb730cb	KVM: PPC: Exit guest context while handling exit The x86 implementation of KVM accounts for host time while processing guest exits. Do the same for us. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:43 +02:00
Alexander Graf	c63ddcb454	KVM: PPC: Book3S: PR: Only do resched check once per exit Now that we use our generic exit helper, we can safely drop our previous kvm_resched that we used to trigger at the beginning of the exit handler function. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:43 +02:00
Alexander Graf	e85ad380c6	KVM: PPC: BookE: Drop redundant vcpu->mode set We only need to set vcpu->mode to outside once. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:43 +02:00
Alexander Graf	9b0cb3c808	KVM: PPC: Book3s: PR: Add (dumb) MMU Notifier support Now that we have very simple MMU Notifier support for e500 in place, also add the same simple support to book3s. It gets us one step closer to actual fast support. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:43 +02:00
Alexander Graf	03d25c5bd5	KVM: PPC: Use same kvmppc_prepare_to_enter code for booke and book3s_pr We need to do the same things when preparing to enter a guest for booke and book3s_pr cores. Fold the generic code into a generic function that both call. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:42 +02:00
Alexander Graf	2d8185d4ee	KVM: PPC: BookE: No duplicate request != 0 check We only call kvmppc_check_requests() when vcpu->requests != 0, so drop the redundant check in the function itself Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:42 +02:00
Alexander Graf	6346046c3a	KVM: PPC: BookE: Add some more trace points Without trace points, debugging what exactly is going on inside guest code can be very tricky. Add a few more trace points at places that hopefully tell us more when things go wrong. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:42 +02:00
Alexander Graf	862d31f788	KVM: PPC: E500: Implement MMU notifiers The e500 target has lived without mmu notifiers ever since it got introduced, but fails for the user space check on them with hugetlbfs. So in order to get that one working, implement mmu notifiers in a reasonably dumb fashion and be happy. On embedded hardware, we almost never end up with mmu notifier calls, since most people don't overcommit. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:41 +02:00
Alexander Graf	d69c643644	KVM: PPC: BookE: Add support for vcpu->mode Generic KVM code might want to know whether we are inside guest context or outside. It also wants to be able to push us out of guest context. Add support to the BookE code for the generic vcpu->mode field that describes the above states. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:41 +02:00
Alexander Graf	4ffc6356ec	KVM: PPC: BookE: Add check_requests helper function We need a central place to check for pending requests in. Add one that only does the timer check we already do in a different place. Later, this central function can be extended by more checks. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:41 +02:00
Paul Mackerras	1340f3e887	KVM: PPC: Quieten message about allocating linear regions This is printed once for every RMA or HPT region that get preallocated. If one preallocates hundreds of such regions (in order to run hundreds of KVM guests), that gets rather painful, so make it a bit quieter. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:40 +02:00
Alexander Graf	2bb890f5ee	KVM: PPC: E500: Fix clear_tlb_refs Our mapping code assumes that TLB0 entries are always mapped. However, after calling clear_tlb_refs() this is no longer the case. Map them dynamically if we find an entry unmapped in TLB0. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:40 +02:00
Alexander Graf	cf1c5ca473	KVM: PPC: BookE: Expose remote TLB flushes in debugfs We're already counting remote TLB flushes in a variable, but don't export it to user space yet. Do so, so we know what's going on. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:39 +02:00
Alexander Graf	f4800b1f4d	KVM: PPC: Expose SYNC cap based on mmu notifiers Semantically, the "SYNC" cap means that we have mmu notifiers available. Express this in our #ifdef'ery around the feature, so that we can be sure we don't miss out on ppc targets when they get their implementation. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:39 +02:00
Alexander Graf	97c9505984	KVM: PPC: PR: Use generic tracepoint for guest exit We want to have tracing information on guest exits for booke as well as book3s. Since most information is identical, use a common trace point. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:39 +02:00
Liu Yu-B13201	9202e07636	KVM: PPC: Add support for ePAPR idle hcall in host kernel And add a new flag definition in kvm_ppc_pvinfo to indicate whether the host supports the EV_IDLE hcall. Signed-off-by: Liu Yu <yu.liu@freescale.com> [stuart.yoder@freescale.com: cleanup,fixes for conditions allowing idle] Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> [agraf: fix typo] Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:37 +02:00
Stuart Yoder	784bafac79	KVM: PPC: add pvinfo for hcall opcodes on e500mc/e5500 Signed-off-by: Liu Yu <yu.liu@freescale.com> [stuart: factored this out from idle hcall support in host patch] Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:37 +02:00
Stuart Yoder	fdcf8bd7e7	KVM: PPC: use definitions in epapr header for hcalls Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-10-05 23:38:36 +02:00
Linus Torvalds	5f3d2f2e1a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "Some highlights in addition to the usual batch of fixes: - 64TB address space support for 64-bit processes by Aneesh Kumar - Gavin Shan did a major cleanup & re-organization of our EEH support code (IBM fancy PCI error handling & recovery infrastructure) which paves the way for supporting different platform backends, along with some rework of the PCIe code for the PowerNV platform in order to remove home made resource allocations and instead use the generic code (which is possible after some small improvements to it done by Gavin). - Uprobes support by Ananth N Mavinakayanahalli - A pile of embedded updates from Freescale folks, including new SoC and board supports, more KVM stuff including preparing for 64-bit BookE KVM support, ePAPR 1.1 updates, etc..." Fixup trivial conflicts in drivers/scsi/ipr.c * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits) powerpc/iommu: Fix multiple issues with IOMMU pools code powerpc: Fix VMX fix for memcpy case driver/mtd:IFC NAND:Initialise internal SRAM before any write powerpc/fsl-pci: use 'Header Type' to identify PCIE mode powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get powerpc: Remove tlb batching hack for nighthawk powerpc: Set paca->data_offset = 0 for boot cpu powerpc/perf: Sample only if SIAR-Valid bit is set in P7+ powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled powerpc/mpc85xx: Update interrupt handling for IFC controller powerpc/85xx: Enable USB support in p1023rds_defconfig powerpc/smp: Do not disable IPI interrupts during suspend powerpc/eeh: Fix crash on converting OF node to edev powerpc/eeh: Lock module while handling EEH event powerpc/kprobe: Don't emulate store when kprobe stwu r1 powerpc/kprobe: Complete kprobe and migrate exception frame powerpc/kprobe: Introduce a new thread flag powerpc: Remove unused __get_user64() and __put_user64() powerpc/eeh: Global mutex to protect PE tree powerpc/eeh: Remove EEH PE for normal PCI hotplug ...	2012-10-06 03:16:12 +09:00
Aneesh Kumar K.V	5524a27d39	powerpc/mm: Convert virtual address to vpn This patch convert different functions to take virtual page number instead of virtual address. Virtual page number is virtual address shifted right by VPN_SHIFT (12) bits. This enable us to have an address range of upto 76 bits. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:31:49 +10:00
Benjamin Herrenschmidt	fff34b3412	Merge branch 'merge' into next Brings in various bug fixes from 3.6-rcX	2012-09-07 09:48:59 +10:00
Mihai Caraman	0127262c01	powerpc: Restore VDSO information on critical exception om BookE Critical exception on 64-bit booke uses user-visible SPRG3 as scratch. Restore VDSO information in SPRG3 on exception prolog. Use a common sprg3 field in PACA for all powerpc64 architectures. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-07 09:48:49 +10:00
Marcelo Tosatti	2df72e9bc4	KVM: split kvm_arch_flush_shadow Introducing kvm_arch_flush_shadow_memslot, to invalidate the translations of a single memory slot. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-09-06 16:37:25 +03:00
Gavin Shan	66a03505a7	KVM: PPC: book3s: fix build error caused by gfn_to_hva_memslot() The build error was caused by that builtin functions are calling the functions implemented in modules. This error was introduced by commit `4d8b81abc4` ("KVM: introduce readonly memslot"). The patch fixes the build error by moving function __gfn_to_hva_memslot() from kvm_main.c to kvm_host.h and making that "inline" so that the builtin function (kvmppc_h_enter) can use that. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-08-27 16:44:20 -03:00
Marcelo Tosatti	c78aa4c4b9	Merge remote-tracking branch 'upstream/master' into queue Merging critical fixes from upstream required for development. * upstream/master: (809 commits) libata: Add a space to " 2GB ATA Flash Disk" DMA blacklist entry Revert "powerpc: Update g5_defconfig" powerpc/perf: Use pmc_overflow() to detect rolled back events powerpc: Fix VMX in interrupt check in POWER7 copy loops powerpc: POWER7 copy_to_user/copy_from_user patch applied twice powerpc: Fix personality handling in ppc64_personality() powerpc/dma-iommu: Fix IOMMU window check powerpc: Remove unnecessary ifdefs powerpc/kgdb: Restore current_thread_info properly powerpc/kgdb: Bail out of KGDB when we've been triggered powerpc/kgdb: Do not set kgdb_single_step on ppc powerpc/mpic_msgr: Add missing includes powerpc: Fix null pointer deref in perf hardware breakpoints powerpc: Fixup whitespace in xmon powerpc: Fix xmon dl command for new printk implementation xfs: check for possible overflow in xfs_ioc_trim xfs: unlock the AGI buffer when looping in xfs_dialloc xfs: fix uninitialised variable in xfs_rtbuf_get() powerpc/fsl: fix "Failed to mount /dev: No such device" errors powerpc/fsl: update defconfigs ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-08-26 13:58:41 -03:00
Alan Cox	e8143ccb6b	ppc: e500_tlb memset clears nothing Put the parameters the right way around Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44031 Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-08-16 14:14:53 +02:00
Alexander Graf	249ba1ee0f	KVM: PPC: Add cache flush on page map When we map a page that wasn't icache cleared before, do so when first mapping it in KVM using the same information bits as the Linux mapping logic. That way we are 100% sure that any page we map does not have stale entries in the icache. Signed-off-by: Alexander Graf <agraf@suse.de>	2012-08-16 14:14:53 +02:00
Paul Mackerras	04f995a544	KVM: PPC: Book3S HV: Fix incorrect branch in H_CEDE code In handling the H_CEDE hypercall, if this vcpu has already been prodded (with the H_PROD hypercall, which Linux guests don't in fact use), we branch to a numeric label '1f'. Unfortunately there is another '1:' label before the one that we want to jump to. This fixes the problem by using a textual label, 'kvm_cede_prodded'. It also changes the label for another longish branch from '2:' to 'kvm_cede_exit' to avoid a possible future problem if code modifications add another numeric '2:' label in between. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2012-08-16 14:14:52 +02:00
Xiao Guangrong	32cad84f44	KVM: do not release the error page After commit `a2766325cf`, the error page is replaced by the error code, it need not be released anymore [ The patch has been compiling tested for powerpc ] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-08-06 16:04:58 +03:00
Xiao Guangrong	cb9aaa30b1	KVM: do not release the error pfn After commit `a2766325cf`, the error pfn is replaced by the error code, it need not be released anymore [ The patch has been compiling tested for powerpc ] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2012-08-06 16:04:57 +03:00

... 2 3 4 5 6 ...

882 Commits