linux/arch/powerpc/sysdev
Paul Mackerras 371fefd6f2 KVM: PPC: Allow book3s_hv guests to use SMT processor modes
This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7.  The host still has to run single-threaded.

This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability.  The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.
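
As an illustration of the query described above, here is a minimal
userspace sketch.  It assumes the capability is checked with the generic
KVM_CHECK_EXTENSION ioctl on the KVM device node; the helper name
query_vcpus_per_vcore() and the path argument are hypothetical and are not
part of this patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/kvm.h>   /* KVM_CHECK_EXTENSION, KVM_CAP_PPC_SMT */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the KVM device node at kvm_path (normally
 * "/dev/kvm") and ask how many vcpus make up one virtual core.
 * Returns the ioctl result (0 means the capability is not supported),
 * or -1 if the device node cannot be opened.
 */
int query_vcpus_per_vcore(const char *kvm_path)
{
    assert(kvm_path != NULL);

    int fd = open(kvm_path, O_RDWR);
    if (fd < 0)
        return -1;

    int n = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SMT);
    close(fd);
    return n;
}
```

On a host without this support the call would be expected to return 0,
in which case userspace should treat each vcpu as its own core.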

To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode.  KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline).  To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c.  In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it.  Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.

When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host.  The threads-per-core value is
exported to userspace via the KVM_CAP_PPC_SMT capability.  If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.
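
The numbering rule above can be sketched as follows.  The function names
are hypothetical and chosen only for illustration; the arithmetic is just
the integer division described in the text.

```c
#include <assert.h>

/* vcore index for a vcpu: vcpu number divided by threads per core. */
int vcore_of(int vcpu_id, int threads_per_core)
{
    assert(vcpu_id >= 0 && threads_per_core > 0);
    return vcpu_id / threads_per_core;
}

/*
 * vcpu number to give the n-th guest CPU when the guest should run
 * single-threaded: multiples of threads_per_core, so that each vcpu
 * lands in a vcore of its own.
 */
int single_threaded_vcpu_id(int n, int threads_per_core)
{
    assert(n >= 0 && threads_per_core > 0);
    return n * threads_per_core;
}
```

With 4 threads per core, vcpus 0..3 share vcore 0 and vcpus 4..7 share
vcore 1, whereas vcpu numbers 0, 4, 8 each get a separate vcore.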

We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host.  We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked.  This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.
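
A minimal sketch of this policy, assuming a simple per-vcpu state array;
the enum and function names are hypothetical and do not reflect the data
structures actually used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three vcpu states distinguished above. */
enum vcpu_state { VCPU_RUNNABLE, VCPU_BLOCKED, VCPU_BUSY_IN_HOST };

/*
 * The vcore may run only if no vcpu in it is busy in the host, i.e.
 * every vcpu is either runnable or blocked.
 */
bool vcore_can_run(const enum vcpu_state *states, size_t n)
{
    assert(states != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (states[i] == VCPU_BUSY_IN_HOST)
            return false;
    }
    return true;
}
```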

When a vcore starts to run, it executes in the context of one of the
vcpu threads.  The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).

It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running.  In that case it can start to run immediately as long as
none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest.  It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.
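
The entry/exit handshake can be illustrated with a single atomic counter.
The sketch below assumes one possible encoding (low byte counts threads
that have entered, next byte counts threads that have started to exit);
that layout, the struct, and the function names are assumptions made for
illustration and are not taken from the patch, which does this in the
low-level book3s code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vcore's entry_exit_count field. */
struct vcore_sketch {
    atomic_uint entry_exit_count;   /* bits 0-7: entered, bits 8-15: exiting */
};

/*
 * Try to join a running vcore.  Fails if any thread has already
 * started to exit the guest; otherwise bumps the entry count.
 */
bool try_enter_guest(struct vcore_sketch *vc)
{
    assert(vc != NULL);

    unsigned int old = atomic_load(&vc->entry_exit_count);
    do {
        if (old >> 8)               /* someone is exiting: stay out */
            return false;
    } while (!atomic_compare_exchange_weak(&vc->entry_exit_count,
                                           &old, old + 1));
    return true;
}

/* Record that a thread has started to exit the guest. */
void begin_exit_guest(struct vcore_sketch *vc)
{
    assert(vc != NULL);
    atomic_fetch_add(&vc->entry_exit_count, 0x100);
}
```

Because the check and the increment happen in one compare-and-exchange,
a late-arriving thread cannot slip into the guest after another thread
has begun to exit.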

Note that there is no fixed relationship between the hardware thread
number and the vcpu number.  Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.
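
The thread assignment described above amounts to picking the
lowest-numbered free hardware thread.  A minimal sketch, assuming a
bitmask of busy threads and 4 threads per core; the mask representation
and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define THREADS_PER_CORE 4

/*
 * Bit i of *busy_mask is set when hardware thread i already runs a vcpu.
 * Claims and returns the lowest-numbered free thread, or -1 if all
 * threads in the core are in use.
 */
int claim_next_free_thread(unsigned int *busy_mask)
{
    assert(busy_mask != NULL);

    for (int t = 0; t < THREADS_PER_CORE; t++) {
        if (!(*busy_mask & (1u << t))) {
            *busy_mask |= 1u << t;
            return t;
        }
    }
    return -1;
}
```

If only two vcpus of a vcore are runnable, they end up on hardware
threads 0 and 1 whatever their vcpu numbers are.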

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:57 +03:00
bestcomm Fix common misspellings 2011-03-31 11:26:23 -03:00
qe_lib powerpc/irq: Stop exporting irq_map 2011-05-04 15:02:15 +10:00
xics KVM: PPC: Allow book3s_hv guests to use SMT processor modes 2011-07-12 13:16:57 +03:00
6xx-suspend.S [POWERPC] Add 6xx-style HID0_SLEEP support. 2008-05-16 23:22:28 +10:00
axonram.c powerpc: Remove ioremap_flags 2011-05-19 14:30:43 +10:00
cpm1.c powerpc/irq: Stop exporting irq_map 2011-05-04 15:02:15 +10:00
cpm2_pic.c powerpc/irq: Stop exporting irq_map 2011-05-04 15:02:15 +10:00
cpm2_pic.h powerpc/cpm2: Checkpatch cleanup 2010-03-04 10:43:58 -06:00
cpm2.c powerpc/fsl-cpm: Configure clock correctly for SCC 2010-04-19 23:13:03 -05:00
cpm_common.c of/gpio: add default of_xlate function if device has a node pointer 2010-07-05 16:14:30 -06:00
dart_iommu.c powerpc/dart: iommu table cleanup 2010-11-29 15:48:20 +11:00
dart.h
dcr-low.S powerpc/4xx: Extended DCR support v2 2008-12-21 14:21:15 +11:00
dcr.c powerpc: Const-qualify Device Node Argument to DCR Resource Extent API 2008-12-21 14:21:16 +11:00
fsl_85xx_cache_ctlr.h powerpc/85xx: add cache-sram support 2010-10-14 00:54:38 -05:00
fsl_85xx_cache_sram.c powerpc: Remove ioremap_flags 2011-05-19 14:30:43 +10:00
fsl_85xx_l2ctlr.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2011-03-18 06:31:43 -07:00
fsl_gtm.c of/address: Clean up function declarations 2010-08-01 01:42:42 -06:00
fsl_lbc.c powerpc/85xx: fix race bug of calling request_irq after enable elbc interrupts 2011-06-03 00:09:09 -05:00
fsl_msi.c Merge branch 'merge' into next 2011-05-19 17:00:06 +10:00
fsl_msi.h powerpc/fsl_msi: add removal path and probe failing path 2010-05-24 21:26:35 -05:00
fsl_pci.c powerpc/85xx: Don't add disabled PCIe devices 2011-04-12 06:29:21 -05:00
fsl_pci.h powerpc/fsl_pci: Add support for FSL PCIe controllers v2.x 2011-03-15 09:29:56 -05:00
fsl_pmc.c dt/powerpc: Eliminate users of of_platform_{,un}register_driver 2011-02-28 01:36:39 -07:00
fsl_rio.c powerpc/e500: fix breakage with fsl_rio_mcheck_exception 2011-06-22 06:15:16 -05:00
fsl_soc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 2010-10-22 20:30:48 -07:00
fsl_soc.h powerpc/5121: shared DIU framebuffer support 2010-08-01 17:06:44 -06:00
grackle.c of: add 'of_' prefix to machine_is_compatible() 2010-02-09 08:33:00 -07:00
i8259.c powerpc: Remove i8259 irq_host_ops->unmap 2011-05-19 15:31:41 +10:00
indirect_pci.c Fix common misspellings 2011-03-31 11:26:23 -03:00
ipic.c Merge remote branch 'origin/master' into merge 2011-05-20 15:36:52 +10:00
ipic.h [POWERPC] ipic: ack only for edge interrupts 2007-12-12 01:53:07 -06:00
Kconfig powerpc/4xx: Adding PCIe MSI support 2011-05-26 15:00:37 +10:00
Makefile powerpc/4xx: Adding PCIe MSI support 2011-05-26 15:00:37 +10:00
micropatch.c powerpc/cpm1: Mark micropatch code/data static and __init 2010-07-11 11:04:06 -05:00
mmio_nvram.c powerpc/nvram: Search for nvram using compatible 2011-04-20 17:01:20 +10:00
mpc5xxx_clocks.c powerpc/5xxx: Add common mpc5xxx_get_bus_frequency() function 2009-06-17 00:30:22 -06:00
mpc8xx_pic.c powerpc/irq: Stop exporting irq_map 2011-05-04 15:02:15 +10:00
mpc8xx_pic.h
mpc8xxx_gpio.c powerpc/irq: Stop exporting irq_map 2011-05-04 15:02:15 +10:00
mpic_msi.c powerpc: Fix MSI support on U4 bridge PCIe slot 2009-12-18 14:55:43 +11:00
mpic_pasemi_msi.c powerpc: Convert to new irq_* function names 2011-03-29 14:48:12 +02:00
mpic_u3msi.c powerpc: Convert to new irq_* function names 2011-03-29 14:48:12 +02:00
mpic.c arch/powerpc: use printk_ratelimited instead of printk_ratelimit 2011-06-29 15:31:01 +10:00
mpic.h powerpc: mpic irq_data conversion. 2011-03-10 11:03:56 +11:00
msi_bitmap.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
mv64x60_dev.c powerpc/mv64x60: Suspected typo in assignment 2011-03-02 16:50:05 +11:00
mv64x60_pci.c powerpc/pci: Remove owner field from attribute initialization in PCI bridge init 2010-08-05 13:53:35 -07:00
mv64x60_pic.c powerpc/irq: Stop exporting irq_map 2011-05-04 15:02:15 +10:00
mv64x60_udbg.c [POWERPC] Fix mv64x60 early console code to use cell-index property 2008-04-24 20:57:34 +10:00
mv64x60.h
of_rtc.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
pmi.c dt/powerpc: Eliminate users of of_platform_{,un}register_driver 2011-02-28 01:36:39 -07:00
ppc4xx_cpm.c powerpc/4xx: Add suspend and idle support 2010-11-29 10:05:06 -05:00
ppc4xx_gpio.c of/gpio: add default of_xlate function if device has a node pointer 2010-07-05 16:14:30 -06:00
ppc4xx_msi.c powerpc/4xx: Adding PCIe MSI support 2011-05-26 15:00:37 +10:00
ppc4xx_pci.c powerpc/44x: Adding PCI-E support for PowerPC 460SX based SOC. 2010-05-07 15:07:19 -04:00
ppc4xx_pci.h Fix common misspellings 2011-03-31 11:26:23 -03:00
ppc4xx_soc.c powerpc/4xx: Add optional "reset_type" property to control reboot via dts 2010-05-05 12:51:54 -04:00
rtc_cmos_setup.c powerpc: rtc_cmos_setup: assign interrupts only if there is i8259 PIC 2008-07-28 08:47:38 -05:00
scom.c powerpc: Add SCOM infrastructure 2011-04-20 17:01:19 +10:00
simple_gpio.c of/gpio: add default of_xlate function if device has a node pointer 2010-07-05 16:14:30 -06:00
simple_gpio.h powerpc: Implement GPIO driver for simple memory-mapped banks 2008-12-30 11:13:45 -06:00
tsi108_dev.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2011-01-11 16:31:41 -08:00
tsi108_pci.c powerpc: Convert to new irq_* function names 2011-03-29 14:48:12 +02:00
uic.c powerpc/irq: Stop exporting irq_map 2011-05-04 15:02:15 +10:00
xilinx_intc.c powerpc/irq: Stop exporting irq_map 2011-05-04 15:02:15 +10:00
xilinx_pci.c powerpc/virtex: Add support for Xilinx PCI host bridge 2009-06-06 10:14:22 -06:00