linux/Documentation/virtual/kvm/api.txt

The Definitive KVM (Kernel-based Virtual Machine) API Documentation
===================================================================

1. General description
----------------------

The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine.  The ioctls belong to three classes:

 - System ioctls: These query and set global attributes which affect the
   whole kvm subsystem.  In addition a system ioctl is used to create
   virtual machines.

 - VM ioctls: These query and set attributes that affect an entire virtual
   machine, for example memory layout.  In addition a VM ioctl is used to
   create virtual cpus (vcpus).

   Only run VM ioctls from the same process (address space) that was used
   to create the VM.

 - vcpu ioctls: These query and set attributes that control the operation
   of a single virtual cpu.

   Only run vcpu ioctls from the same thread that was used to create the
   vcpu.


2. File descriptors
-------------------

The kvm API is centered around file descriptors.  An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls.  A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls.  A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
and return a file descriptor pointing to it.  Finally, ioctls on a vcpu
fd can be used to control the vcpu, including the important task of
actually running guest code.

In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain sockets.  These
kinds of tricks are explicitly not supported by kvm.  While they will
not cause harm to the host, their actual behavior is not guaranteed by
the API.  The only supported use is one virtual machine per process,
and one vcpu per thread.
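
The descriptor hierarchy described above can be sketched as follows.  This
is a minimal illustration rather than a reference implementation: the
struct kvm_fds container and the kvm_fds_open() helper are hypothetical
names introduced only for this example, and error handling is reduced to
closing whatever was opened.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical container for the three kinds of descriptor. */
struct kvm_fds {
	int kvm;   /* from open("/dev/kvm"): accepts system ioctls */
	int vm;    /* from KVM_CREATE_VM: accepts VM ioctls */
	int vcpu;  /* from KVM_CREATE_VCPU: accepts vcpu ioctls */
};

/* Returns 0 on success, -1 on failure (with every descriptor closed). */
int kvm_fds_open(struct kvm_fds *fds)
{
	fds->kvm = fds->vm = fds->vcpu = -1;

	fds->kvm = open("/dev/kvm", O_RDWR);
	if (fds->kvm < 0)
		return -1;

	/* Machine type 0; see KVM_CREATE_VM. */
	fds->vm = ioctl(fds->kvm, KVM_CREATE_VM, 0UL);
	if (fds->vm < 0)
		goto fail;

	/* vcpu id 0; issue this from the thread that will run the vcpu. */
	fds->vcpu = ioctl(fds->vm, KVM_CREATE_VCPU, 0UL);
	if (fds->vcpu < 0)
		goto fail;
	return 0;

fail:
	if (fds->vm >= 0)
		close(fds->vm);
	close(fds->kvm);
	fds->kvm = fds->vm = fds->vcpu = -1;
	return -1;
}
```

Keeping all three descriptors in the creating process, and the vcpu fd in
the creating thread, matches the only usage the API supports.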


3. Extensions
-------------

As of Linux 2.6.22, the KVM ABI has been stabilized: no
backward-incompatible changes are allowed.  However, there is an extension
facility that allows backward-compatible extensions to the API to be
queried and used.

The extension mechanism is not based on the Linux version number.
Instead, kvm defines extension identifiers and a facility to query
whether a particular extension identifier is available.  If it is, a
set of ioctls is available for application use.


4. API description
------------------

This section describes ioctls that can be used to control kvm guests.
For each ioctl, the following information is provided along with a
description:

  Capability: which KVM extension provides this ioctl.  Can be 'basic',
      which means that it will be provided by any kernel that supports
      API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which
      means availability needs to be checked with KVM_CHECK_EXTENSION
      (see section 4.4).

  Architectures: which instruction set architectures provide this ioctl.
      x86 includes both i386 and x86_64.

  Type: system, vm, or vcpu.

  Parameters: what parameters are accepted by the ioctl.

  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.


4.1 KVM_GET_API_VERSION

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: the constant KVM_API_VERSION (=12)

This identifies the API version as the stable kvm API. It is not
expected that this number will change.  However, Linux 2.6.20 and
2.6.21 report earlier versions; these are not documented and not
supported.  Applications should refuse to run if KVM_GET_API_VERSION
returns a value other than 12.  If this check passes, all ioctls
described as 'basic' will be available.
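
The version check might look like the sketch below.  The helper name
kvm_open_checked() is an assumption made for this example; only the
KVM_GET_API_VERSION ioctl and the KVM_API_VERSION constant come from the
API itself.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns an open /dev/kvm descriptor, or -1 if the
 * device cannot be opened or does not report the stable API version.
 */
int kvm_open_checked(void)
{
	int fd = open("/dev/kvm", O_RDWR);

	if (fd < 0)
		return -1;

	/* Refuse to continue on anything other than version 12. */
	if (ioctl(fd, KVM_GET_API_VERSION, 0UL) != KVM_API_VERSION) {
		close(fd);
		return -1;
	}
	return fd;
}
```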


4.2 KVM_CREATE_VM

Capability: basic
Architectures: all
Type: system ioctl
Parameters: machine type identifier (KVM_VM_*)
Returns: a VM fd that can be used to control the new virtual machine.

The new VM has no virtual cpus and no memory.  An mmap() of a VM fd
will access the virtual machine's physical address space; offset zero
corresponds to guest physical address zero.  Use of mmap() on a VM fd
is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
available.
You most certainly want to use 0 as machine type.

In order to create user controlled virtual machines on S390, check
KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
privileged user (CAP_SYS_ADMIN).


4.3 KVM_GET_MSR_INDEX_LIST

Capability: basic
Architectures: x86
Type: system
Parameters: struct kvm_msr_list (in/out)
Returns: 0 on success; -1 on error
Errors:
  E2BIG:     the msr index list is too big to fit in the array specified by
             the user.

struct kvm_msr_list {
	__u32 nmsrs; /* number of msrs in entries */
	__u32 indices[0];
};

This ioctl returns the guest msrs that are supported.  The list varies
by kvm version and host processor, but does not change otherwise.  The
user fills in the size of the indices array in nmsrs, and in return
kvm adjusts nmsrs to reflect the actual number of msrs and fills in
the indices array with their numbers.

Note: if kvm indicates that it supports MCE (KVM_CAP_MCE), then the MCE bank
MSRs are not returned in the MSR list, as different vcpus can have a different
number of banks, as set via the KVM_X86_SETUP_MCE ioctl.


4.4 KVM_CHECK_EXTENSION

Capability: basic
Architectures: all
Type: system ioctl
Parameters: extension identifier (KVM_CAP_*)
Returns: 0 if unsupported; 1 (or some other positive integer) if supported

The API allows the application to query about extensions to the core
kvm API.  Userspace passes an extension identifier (an integer) and
receives an integer that describes the extension availability.
Generally 0 means no and 1 means yes, but some extensions may report
additional information in the integer return value.
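
A thin wrapper around this ioctl could look like the following sketch.
The name kvm_check_cap() is hypothetical, and folding errors into
"unsupported" is a design choice of this example, not something the API
mandates.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: returns 0 if the extension is unsupported (or the
 * query itself failed), otherwise the positive value reported by kvm.
 */
int kvm_check_cap(int kvm_fd, unsigned long cap)
{
	int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);

	return ret > 0 ? ret : 0;
}
```

A caller would typically write something like
kvm_check_cap(kvm_fd, KVM_CAP_USER_MEMORY) before relying on the ioctls
that the extension provides.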


4.5 KVM_GET_VCPU_MMAP_SIZE

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: size of vcpu mmap area, in bytes

The KVM_RUN ioctl (cf.) communicates with userspace via a shared
memory region.  This ioctl returns the size of that region.  See the
KVM_RUN documentation for details.


4.6 KVM_SET_MEMORY_REGION

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: struct kvm_memory_region (in)
Returns: 0 on success, -1 on error

This ioctl is obsolete and has been removed.


4.7 KVM_CREATE_VCPU

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: vcpu id (apic id on x86)
Returns: vcpu fd on success, -1 on error

This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
in the range [0, max_vcpus).

The recommended max_vcpus value can be retrieved at run-time by passing
KVM_CAP_NR_VCPUS to the KVM_CHECK_EXTENSION ioctl.  The maximum possible
value for max_vcpus can be retrieved the same way using KVM_CAP_MAX_VCPUS.

If KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is 4.
If KVM_CAP_MAX_VCPUS does not exist, you should assume that max_vcpus is
the same as the value returned for KVM_CAP_NR_VCPUS.
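
The fallback rules above can be written down as a small pure function.
This is only a sketch: struct vcpu_limits and vcpu_limits_from_caps() are
names invented for the example, and the two arguments are assumed to be
the raw KVM_CHECK_EXTENSION results (0 meaning the capability does not
exist).

```c
/* Hypothetical result type for this example. */
struct vcpu_limits {
	int recommended;  /* derived from KVM_CAP_NR_VCPUS */
	int maximum;      /* derived from KVM_CAP_MAX_VCPUS */
};

struct vcpu_limits vcpu_limits_from_caps(int nr_vcpus_cap, int max_vcpus_cap)
{
	struct vcpu_limits l;

	/* Without KVM_CAP_NR_VCPUS, assume 4. */
	l.recommended = nr_vcpus_cap > 0 ? nr_vcpus_cap : 4;

	/* Without KVM_CAP_MAX_VCPUS, fall back to the recommended value. */
	l.maximum = max_vcpus_cap > 0 ? max_vcpus_cap : l.recommended;
	return l;
}
```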

On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
threads in one or more virtual CPU cores.  (This is because the
hardware requires all the hardware threads in a CPU core to be in the
same partition.)  The KVM_CAP_PPC_SMT capability indicates the number
of vcpus per virtual core (vcore).  The vcore id is obtained by
dividing the vcpu id by the number of vcpus per vcore.  The vcpus in a
given vcore will always be in the same physical core as each other
(though that might be a different physical core from time to time).
Userspace can control the threading (SMT) mode of the guest by its
allocation of vcpu ids.  For example, if userspace wants
single-threaded guest vcpus, it should make all vcpu ids be a multiple
of the number of vcpus per vcore.
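
As a worked illustration of the vcpu id arithmetic, the two helpers below
compute the vcore for a given vcpu id and the id to use for the n-th vcpu
of a single-threaded guest.  Both function names are hypothetical;
threads_per_vcore stands for the value reported by KVM_CAP_PPC_SMT and is
assumed to be non-zero.

```c
/* vcore id = vcpu id divided by the number of vcpus per vcore. */
unsigned int vcore_id(unsigned int vcpu_id, unsigned int threads_per_vcore)
{
	return vcpu_id / threads_per_vcore;
}

/*
 * For single-threaded guest vcpus, every vcpu id is a multiple of the
 * number of vcpus per vcore, so the n-th vcpu gets id n * threads.
 */
unsigned int single_threaded_vcpu_id(unsigned int n,
				     unsigned int threads_per_vcore)
{
	return n * threads_per_vcore;
}
```

With 4 vcpus per vcore, for instance, vcpu ids 0, 4, 8 and 12 each land in
a different vcore, while ids 4 to 7 would share vcore 1.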

For virtual cpus that have been created with S390 user controlled virtual
machines, the resulting vcpu fd can be memory mapped at page offset
KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
cpu's hardware control block.


4.8 KVM_GET_DIRTY_LOG

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_dirty_log (in/out)
Returns: 0 on success, -1 on error

/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
	__u32 slot;
	__u32 padding1;
	union {
		void __user *dirty_bitmap; /* one bit per page */
		__u64 padding2;
	};
};

Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl.  Bit 0 is the first page in the
memory slot.  Ensure the entire structure is cleared to avoid padding
issues.
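
The helpers below sketch how userspace might size and read the bitmap.
They rest on two assumptions that are not stated above: that the bitmap
length is rounded up to a multiple of 64 bits, and that the host is
little-endian (as x86 is), so that bit n lives in byte n / 8.  The
function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes to allocate for a slot: one bit per page, rounded up to 64 bits. */
size_t dirty_bitmap_bytes(uint64_t slot_bytes, uint64_t page_size)
{
	uint64_t pages = slot_bytes / page_size;

	return (size_t)(((pages + 63) / 64) * 8);
}

/* Non-zero if the given page (0 = first page in the slot) was dirtied. */
int page_is_dirty(const uint8_t *bitmap, uint64_t page)
{
	return (bitmap[page / 8] >> (page % 8)) & 1;
}
```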


4.9 KVM_SET_MEMORY_ALIAS

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_memory_alias (in)
Returns: 0 (success), -1 (error)

This ioctl is obsolete and has been removed.


4.10 KVM_RUN

Capability: basic
Architectures: all
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error
Errors:
  EINTR:     an unmasked signal is pending

This ioctl is used to run a guest virtual cpu.  While there are no
explicit parameters, there is an implicit parameter block that can be
obtained by mmap()ing the vcpu fd at offset 0, with the size given by
KVM_GET_VCPU_MMAP_SIZE.  The parameter block is formatted as a 'struct
kvm_run' (see below).
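
Obtaining the implicit parameter block could be sketched as below.  The
helper name map_kvm_run() is an assumption of this example; the mapping
is made shared and read/write because both kvm and userspace update the
block.

```c
#include <linux/kvm.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: maps the vcpu's parameter block at offset 0.
 * Returns NULL if the size query or the mmap() fails.
 */
struct kvm_run *map_kvm_run(int kvm_fd, int vcpu_fd)
{
	void *p;
	int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0UL);

	/* Also rejects the -1 error return. */
	if (size < (int)sizeof(struct kvm_run))
		return NULL;

	p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED,
		 vcpu_fd, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

Note that the size query is a system ioctl (issued on the /dev/kvm fd),
while the mmap() itself is performed on the vcpu fd.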


4.11 KVM_GET_REGS

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (out)
Returns: 0 on success, -1 on error

Reads the general purpose registers from the vcpu.

/* x86 */
struct kvm_regs {
	/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
	__u64 rax, rbx, rcx, rdx;
	__u64 rsi, rdi, rsp, rbp;
	__u64 r8,  r9,  r10, r11;
	__u64 r12, r13, r14, r15;
	__u64 rip, rflags;
};


4.12 KVM_SET_REGS

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (in)
Returns: 0 on success, -1 on error

Writes the general purpose registers into the vcpu.

See KVM_GET_REGS for the data structure.


4.13 KVM_GET_SREGS

Capability: basic
Architectures: x86, ppc
Type: vcpu ioctl
Parameters: struct kvm_sregs (out)
Returns: 0 on success, -1 on error

Reads special registers from the vcpu.

/* x86 */
struct kvm_sregs {
	struct kvm_segment cs, ds, es, fs, gs, ss;
	struct kvm_segment tr, ldt;
	struct kvm_dtable gdt, idt;
	__u64 cr0, cr2, cr3, cr4, cr8;
	__u64 efer;
	__u64 apic_base;
	__u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
};

/* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */

interrupt_bitmap is a bitmap of pending external interrupts.  At most
one bit may be set.  This interrupt has been acknowledged by the APIC
but not yet injected into the cpu core.


4.14 KVM_SET_SREGS

Capability: basic
Architectures: x86, ppc
Type: vcpu ioctl
Parameters: struct kvm_sregs (in)
Returns: 0 on success, -1 on error

Writes special registers into the vcpu.  See KVM_GET_SREGS for the
data structures.


4.15 KVM_TRANSLATE

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_translation (in/out)
Returns: 0 on success, -1 on error

Translates a virtual address according to the vcpu's current address
translation mode.

struct kvm_translation {
	/* in */
	__u64 linear_address;

	/* out */
	__u64 physical_address;
	__u8  valid;
	__u8  writeable;
	__u8  usermode;
	__u8  pad[5];
};


4.16 KVM_INTERRUPT

Capability: basic
Architectures: x86, ppc
Type: vcpu ioctl
Parameters: struct kvm_interrupt (in)
Returns: 0 on success, -1 on error

Queues a hardware interrupt vector to be injected.  This is only
useful if in-kernel local APIC or equivalent is not used.

/* for KVM_INTERRUPT */
struct kvm_interrupt {
	/* in */
	__u32 irq;
};

X86:

Note 'irq' is an interrupt vector, not an interrupt pin or line.

PPC:

Queues an external interrupt to be injected. This ioctl is overloaded
with 3 different irq values:

a) KVM_INTERRUPT_SET

  This injects an edge type external interrupt into the guest once it's ready
  to receive interrupts. When injected, the interrupt is done.

b) KVM_INTERRUPT_UNSET

  This unsets any pending interrupt.

  Only available with KVM_CAP_PPC_UNSET_IRQ.

c) KVM_INTERRUPT_SET_LEVEL

  This injects a level type external interrupt into the guest context. The
  interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
  is triggered.

  Only available with KVM_CAP_PPC_IRQ_LEVEL.

Note that any value for 'irq' other than the ones stated above is invalid
and incurs unexpected behavior.


4.17 KVM_DEBUG_GUEST

Capability: basic
Architectures: none
Type: vcpu ioctl
Parameters: none
Returns: -1 on error

Support for this has been removed.  Use KVM_SET_GUEST_DEBUG instead.


4.18 KVM_GET_MSRS

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_msrs (in/out)
Returns: 0 on success, -1 on error

Reads model-specific registers from the vcpu.  Supported msr indices can
be obtained using KVM_GET_MSR_INDEX_LIST.

struct kvm_msrs {
	__u32 nmsrs; /* number of msrs in entries */
	__u32 pad;

	struct kvm_msr_entry entries[0];
};

struct kvm_msr_entry {
	__u32 index;
	__u32 reserved;
	__u64 data;
};

Application code should set the 'nmsrs' member (which indicates the
size of the entries array) and the 'index' member of each array entry.
kvm will fill in the 'data' member.
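
Because 'entries' is a variable-size trailing array, the caller has to
allocate the header and the entries in one block.  The sketch below shows
one way to do that.  To stay compilable on any architecture it uses local
mirror structures (struct demo_msrs and struct demo_msr_entry, both
hypothetical names) with the same layout as the structures above; real
code would use struct kvm_msrs from the kernel headers instead.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of kvm_msr_entry / kvm_msrs, for illustration only. */
struct demo_msr_entry {
	uint32_t index;
	uint32_t reserved;
	uint64_t data;
};

struct demo_msrs {
	uint32_t nmsrs;  /* number of msrs in entries */
	uint32_t pad;
	struct demo_msr_entry entries[];
};

/*
 * Allocates a zeroed request for n MSRs and fills in 'nmsrs' and each
 * 'index'; 'data' is left for kvm to fill in.  Caller frees the result.
 */
struct demo_msrs *msrs_alloc(const uint32_t *indices, uint32_t n)
{
	struct demo_msrs *m;
	uint32_t i;

	m = calloc(1, sizeof(*m) + (size_t)n * sizeof(m->entries[0]));
	if (!m)
		return NULL;

	m->nmsrs = n;
	for (i = 0; i < n; i++)
		m->entries[i].index = indices[i];
	return m;
}
```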


4.19 KVM_SET_MSRS

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_msrs (in)
Returns: 0 on success, -1 on error

Writes model-specific registers to the vcpu.  See KVM_GET_MSRS for the
data structures.

Application code should set the 'nmsrs' member (which indicates the
size of the entries array), and the 'index' and 'data' members of each
array entry.


4.20 KVM_SET_CPUID

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_cpuid (in)
Returns: 0 on success, -1 on error

Defines the vcpu responses to the cpuid instruction.  Applications
should use the KVM_SET_CPUID2 ioctl if available.


struct kvm_cpuid_entry {
	__u32 function;
	__u32 eax;
	__u32 ebx;
	__u32 ecx;
	__u32 edx;
	__u32 padding;
};

/* for KVM_SET_CPUID */
struct kvm_cpuid {
	__u32 nent;
	__u32 padding;
	struct kvm_cpuid_entry entries[0];
};


4.21 KVM_SET_SIGNAL_MASK

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_signal_mask (in)
Returns: 0 on success, -1 on error

Defines which signals are blocked during execution of KVM_RUN.  This
signal mask temporarily overrides the thread's signal mask.  Any
unblocked signal received (except SIGKILL and SIGSTOP, which retain
their traditional behaviour) will cause KVM_RUN to return with -EINTR.

Note the signal will only be delivered if not blocked by the original
signal mask.

/* for KVM_SET_SIGNAL_MASK */
struct kvm_signal_mask {
	__u32 len;
	__u8  sigset[0];
};


4.22 KVM_GET_FPU

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_fpu (out)
Returns: 0 on success, -1 on error

Reads the floating point state from the vcpu.

/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
	__u8  fpr[8][16];
	__u16 fcw;
	__u16 fsw;
	__u8  ftwx;  /* in fxsave format */
	__u8  pad1;
	__u16 last_opcode;
	__u64 last_ip;
	__u64 last_dp;
	__u8  xmm[16][16];
	__u32 mxcsr;
	__u32 pad2;
};


4.23 KVM_SET_FPU

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_fpu (in)
Returns: 0 on success, -1 on error

Writes the floating point state to the vcpu.

/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
	__u8  fpr[8][16];
	__u16 fcw;
	__u16 fsw;
	__u8  ftwx;  /* in fxsave format */
	__u8  pad1;
	__u16 last_opcode;
	__u64 last_ip;
	__u64 last_dp;
	__u8  xmm[16][16];
	__u32 mxcsr;
	__u32 pad2;
};


4.24 KVM_CREATE_IRQCHIP

Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
Architectures: x86, ia64, ARM, arm64, s390
Type: vm ioctl
Parameters: none
Returns: 0 on success, -1 on error

Creates an interrupt controller model in the kernel.  On x86, creates a virtual
ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
only go to the IOAPIC.  On ia64, an IOSAPIC is created. On ARM/arm64, a GIC is
created. On s390, a dummy irq routing table is created.

Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.


4.25 KVM_IRQ_LINE

Capability: KVM_CAP_IRQCHIP
Architectures: x86, ia64, ARM, arm64
Type: vm ioctl
Parameters: struct kvm_irq_level
Returns: 0 on success, -1 on error

Sets the level of a GSI input to the interrupt controller model in the kernel.
On some architectures it is required that an interrupt controller model has
been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
interrupts require the level to be set to 1 and then back to 0.

On real hardware, interrupt pins can be active-low or active-high.  This
does not matter for the level field of struct kvm_irq_level: 1 always
means active (asserted), 0 means inactive (deasserted).

x86 allows the operating system to program the interrupt polarity
(active-low/active-high) for level-triggered interrupts, and KVM used
to consider the polarity.  However, due to bitrot in the handling of
active-low interrupts, the above convention is now valid on x86 too.
This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED.  Userspace
should not present interrupts to the guest as active-low unless this
capability is present (or unless it is not using the in-kernel irqchip,
of course).


ARM/arm64 can signal an interrupt either at the CPU level, or at the
in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
use PPIs designated for specific cpus.  The irq field is interpreted
like this:

  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |     irq_id     |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
               (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)

(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)

In both cases, level is used to assert/deassert the line.

struct kvm_irq_level {
	union {
		__u32 irq;     /* GSI */
		__s32 status;  /* not used for KVM_IRQ_LINE */
	};
	__u32 level;           /* 0 or 1 */
};
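
For ARM/arm64, packing the 'irq' field according to the bit layout shown
above can be sketched as follows.  The enum constants and the
arm_irq_encode() name are hypothetical and chosen for this example; they
simply mirror the three irq_type values listed above.

```c
#include <stdint.h>

/* Hypothetical names for the irq_type values 0, 1 and 2. */
enum {
	DEMO_ARM_IRQ_TYPE_CPU = 0,  /* out-of-kernel GIC: 0 = IRQ, 1 = FIQ */
	DEMO_ARM_IRQ_TYPE_SPI = 1,  /* in-kernel GIC: SPI, irq_id 32..1019 */
	DEMO_ARM_IRQ_TYPE_PPI = 2   /* in-kernel GIC: PPI, irq_id 16..31 */
};

/* bits 31..24: irq_type, bits 23..16: vcpu_index, bits 15..0: irq_id */
uint32_t arm_irq_encode(uint32_t irq_type, uint32_t vcpu_index,
			uint32_t irq_id)
{
	return ((irq_type & 0xff) << 24) |
	       ((vcpu_index & 0xff) << 16) |
	       (irq_id & 0xffff);
}
```

The result is what would be stored in the 'irq' member of struct
kvm_irq_level, with 'level' set to 1 to assert the line and 0 to deassert
it.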


4.26 KVM_GET_IRQCHIP

Capability: KVM_CAP_IRQCHIP
Architectures: x86, ia64
Type: vm ioctl
Parameters: struct kvm_irqchip (in/out)
Returns: 0 on success, -1 on error

Reads the state of a kernel interrupt controller created with
KVM_CREATE_IRQCHIP into a buffer provided by the caller.

struct kvm_irqchip {
	__u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
	__u32 pad;
	union {
		char dummy[512];  /* reserving space */
		struct kvm_pic_state pic;
		struct kvm_ioapic_state ioapic;
	} chip;
};


4.27 KVM_SET_IRQCHIP

Capability: KVM_CAP_IRQCHIP
Architectures: x86, ia64
Type: vm ioctl
Parameters: struct kvm_irqchip (in)
Returns: 0 on success, -1 on error

Sets the state of a kernel interrupt controller created with
KVM_CREATE_IRQCHIP from a buffer provided by the caller.

struct kvm_irqchip {
	__u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
	__u32 pad;
	union {
		char dummy[512];  /* reserving space */
		struct kvm_pic_state pic;
		struct kvm_ioapic_state ioapic;
	} chip;
};


4.28 KVM_XEN_HVM_CONFIG

Capability: KVM_CAP_XEN_HVM
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_xen_hvm_config (in)
Returns: 0 on success, -1 on error

Sets the MSR that the Xen HVM guest uses to initialize its hypercall
page, and provides the starting address and size of the hypercall
blobs in userspace.  When the guest writes the MSR, kvm copies one
page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
memory.

struct kvm_xen_hvm_config {
	__u32 flags;
	__u32 msr;
	__u64 blob_addr_32;
	__u64 blob_addr_64;
	__u8 blob_size_32;
	__u8 blob_size_64;
	__u8 pad2[30];
};


4.29 KVM_GET_CLOCK

Capability: KVM_CAP_ADJUST_CLOCK
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_clock_data (out)
Returns: 0 on success, -1 on error

Gets the current timestamp of kvmclock as seen by the current guest. In
conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.

struct kvm_clock_data {
	__u64 clock;  /* kvmclock current value */
	__u32 flags;
	__u32 pad[9];
};


4.30 KVM_SET_CLOCK

Capability: KVM_CAP_ADJUST_CLOCK
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_clock_data (in)
Returns: 0 on success, -1 on error

Sets the current timestamp of kvmclock to the value specified in its parameter.
In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in
scenarios such as migration.

struct kvm_clock_data {
	__u64 clock;  /* kvmclock current value */
	__u32 flags;
	__u32 pad[9];
};


4.31 KVM_GET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (out)
Returns: 0 on success, -1 on error

Gets currently pending exceptions, interrupts, and NMIs as well as related
states of the vcpu.

struct kvm_vcpu_events {
	struct {
		__u8 injected;
		__u8 nr;
		__u8 has_error_code;
		__u8 pad;
		__u32 error_code;
	} exception;
	struct {
		__u8 injected;
		__u8 nr;
		__u8 soft;
		__u8 shadow;
	} interrupt;
	struct {
		__u8 injected;
		__u8 pending;
		__u8 masked;
		__u8 pad;
	} nmi;
	__u32 sipi_vector;
	__u32 flags;
};

KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
interrupt.shadow contains a valid state. Otherwise, this field is undefined.


4.32 KVM_SET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (in)
Returns: 0 on success, -1 on error

Set pending exceptions, interrupts, and NMIs as well as related states of the
vcpu.

See KVM_GET_VCPU_EVENTS for the data structure.

Fields that may be modified asynchronously by running VCPUs can be excluded
from the update. These fields are nmi.pending and sipi_vector. Keep the
corresponding bits in the flags field cleared to suppress overwriting the
current in-kernel state. The bits are:

KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector

If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
the flags field to signal that interrupt.shadow contains a valid state and
shall be written into the VCPU.


4.33 KVM_GET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (out)
Returns: 0 on success, -1 on error

Reads debug registers from the vcpu.

struct kvm_debugregs {
	__u64 db[4];
	__u64 dr6;
	__u64 dr7;
	__u64 flags;
	__u64 reserved[9];
};


4.34 KVM_SET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (in)
Returns: 0 on success, -1 on error

Writes debug registers into the vcpu.

See KVM_GET_DEBUGREGS for the data structure. The flags field is not yet
used and must be cleared on entry.


4.35 KVM_SET_USER_MEMORY_REGION

Capability: KVM_CAP_USER_MEMORY
Architectures: all
Type: vm ioctl
Parameters: struct kvm_userspace_memory_region (in)
Returns: 0 on success, -1 on error

struct kvm_userspace_memory_region {
	__u32 slot;
	__u32 flags;
	__u64 guest_phys_addr;
	__u64 memory_size; /* bytes */
	__u64 userspace_addr; /* start of the userspace allocated memory */
};

/* for kvm_memory_region::flags */
#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
#define KVM_MEM_READONLY	(1UL << 1)

This ioctl allows the user to create or modify a guest physical memory
slot.  When changing an existing slot, it may be moved in the guest
physical memory space, or its flags may be modified.  It may not be
resized.  Slots may not overlap in guest physical address space.

Memory for the region is taken starting at the address denoted by the
field userspace_addr, which must point at user addressable memory for
the entire memory slot size.  Any object may back this memory, including
anonymous memory, ordinary files, and hugetlbfs.

It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
be identical.  This allows large pages in the guest to be backed by large
pages in the host.
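
The alignment recommendation can be checked, and satisfied, with a little
address arithmetic.  The sketch below is illustrative only: the macro and
both function names are hypothetical, and align_to_guest() assumes the
caller has over-allocated the backing memory by at least 2MB so that the
adjusted address still lies inside the allocation.

```c
#include <stdint.h>

/* Mask covering the lower 21 bits (a 2MB large page). */
#define DEMO_LARGE_PAGE_MASK ((UINT64_C(1) << 21) - 1)

/* Non-zero if both addresses share the same lower 21 bits. */
int large_page_friendly(uint64_t guest_phys_addr, uint64_t userspace_addr)
{
	return (guest_phys_addr & DEMO_LARGE_PAGE_MASK) ==
	       (userspace_addr & DEMO_LARGE_PAGE_MASK);
}

/*
 * Returns the lowest address >= base whose lower 21 bits match those of
 * guest_phys_addr; suitable as userspace_addr for the slot.
 */
uint64_t align_to_guest(uint64_t base, uint64_t guest_phys_addr)
{
	uint64_t want = guest_phys_addr & DEMO_LARGE_PAGE_MASK;
	uint64_t addr = (base & ~DEMO_LARGE_PAGE_MASK) | want;

	if (addr < base)
		addr += DEMO_LARGE_PAGE_MASK + 1;
	return addr;
}
```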

The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
to make a new slot read-only.  In this case, writes to this memory will be
posted to userspace as KVM_EXIT_MMIO exits.

When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
the memory region are automatically reflected into the guest.  For example, an
mmap() that affects the region will be made visible immediately.  Another
example is madvise(MADV_DROP).

It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.


4.36 KVM_SET_TSS_ADDR

Capability: KVM_CAP_SET_TSS_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long tss_address (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a three-page region in the guest
physical address space.  The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address.  The guest may malfunction if it accesses this memory
region.

This ioctl is required on Intel-based hosts.  This is needed on Intel hardware
because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).


4.37 KVM_ENABLE_CAP

Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
Architectures: ppc, s390
Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error

Not all extensions are enabled by default. Using this ioctl the application
can enable an extension, making it available to the guest.

On systems that do not support this ioctl, it always fails. On systems that
do support it, it only works for extensions that are supported for enablement.

To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
be used.

struct kvm_enable_cap {
       /* in */
       __u32 cap;

The capability that is supposed to get enabled.

       __u32 flags;

A bitfield indicating future enhancements. Has to be 0 for now.

       __u64 args[4];

Arguments for enabling a feature. If a feature needs initial values to
function properly, this is the place to put them.

       __u8  pad[64];
};

The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.


4.38 KVM_GET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, ia64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (out)
Returns: 0 on success; -1 on error

struct kvm_mp_state {
	__u32 mp_state;
};

Returns the vcpu's current "multiprocessing state" (though also valid on
uniprocessor guests).

Possible values are:

 - KVM_MP_STATE_RUNNABLE:        the vcpu is currently running
 - KVM_MP_STATE_UNINITIALIZED:   the vcpu is an application processor (AP)
                                 which has not yet received an INIT signal
 - KVM_MP_STATE_INIT_RECEIVED:   the vcpu has received an INIT signal, and is
                                 now ready for a SIPI
 - KVM_MP_STATE_HALTED:          the vcpu has executed a HLT instruction and
                                 is waiting for an interrupt
 - KVM_MP_STATE_SIPI_RECEIVED:   the vcpu has just received a SIPI (vector
                                 accessible via KVM_GET_VCPU_EVENTS)

This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
irqchip, the multiprocessing state must be maintained by userspace.


4.39 KVM_SET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, ia64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (in)
Returns: 0 on success; -1 on error

Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
arguments.

This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
irqchip, the multiprocessing state must be maintained by userspace.


4.40 KVM_SET_IDENTITY_MAP_ADDR

Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long identity (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a one-page region in the guest
physical address space.  The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address.  The guest may malfunction if it accesses this memory
region.

This ioctl is required on Intel-based hosts.  This is needed on Intel hardware
because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).


4.41 KVM_SET_BOOT_CPU_ID

Capability: KVM_CAP_SET_BOOT_CPU_ID
Architectures: x86, ia64
Type: vm ioctl
Parameters: unsigned long vcpu_id
Returns: 0 on success, -1 on error

Define which vcpu is the Bootstrap Processor (BSP).  Values are the same
as the vcpu id in KVM_CREATE_VCPU.  If this ioctl is not called, the default
is vcpu 0.


4.42 KVM_GET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (out)
Returns: 0 on success, -1 on error

struct kvm_xsave {
	__u32 region[1024];
};

This ioctl copies the current vcpu's xsave struct to userspace.


4.43 KVM_SET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (in)
Returns: 0 on success, -1 on error

struct kvm_xsave {
	__u32 region[1024];
};

This ioctl copies userspace's xsave struct to the kernel.


4.44 KVM_GET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (out)
Returns: 0 on success, -1 on error

struct kvm_xcr {
	__u32 xcr;
	__u32 reserved;
	__u64 value;
};

struct kvm_xcrs {
	__u32 nr_xcrs;
	__u32 flags;
	struct kvm_xcr xcrs[KVM_MAX_XCRS];
	__u64 padding[16];
};

This ioctl copies the current vcpu's xcrs to userspace.


4.45 KVM_SET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (in)
Returns: 0 on success, -1 on error

struct kvm_xcr {
	__u32 xcr;
	__u32 reserved;
	__u64 value;
};

struct kvm_xcrs {
	__u32 nr_xcrs;
	__u32 flags;
	struct kvm_xcr xcrs[KVM_MAX_XCRS];
	__u64 padding[16];
};

This ioctl sets the vcpu's xcrs to the values specified by userspace.


4.46 KVM_GET_SUPPORTED_CPUID

Capability: KVM_CAP_EXT_CPUID
Architectures: x86
Type: system ioctl
Parameters: struct kvm_cpuid2 (in/out)
Returns: 0 on success, -1 on error

struct kvm_cpuid2 {
	__u32 nent;
	__u32 padding;
	struct kvm_cpuid_entry2 entries[0];
};

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)

struct kvm_cpuid_entry2 {
	__u32 function;
	__u32 index;
	__u32 flags;
	__u32 eax;
	__u32 ebx;
	__u32 ecx;
	__u32 edx;
	__u32 padding[3];
};

This ioctl returns x86 cpuid features which are supported by both the hardware
and kvm.  Userspace can use the information returned by this ioctl to
construct cpuid information (for KVM_SET_CPUID2) that is consistent with
hardware, kernel, and userspace capabilities, and with user requirements (for
example, the user may wish to constrain cpuid to emulate older hardware,
or for feature consistency across a cluster).

Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'.  If the number of entries is too low to describe the cpu
capabilities, an error (E2BIG) is returned.  If the number is too high,
the 'nent' field is adjusted and an error (ENOMEM) is returned.  If the
number is just right, the 'nent' field is adjusted to the number of valid
entries in the 'entries' array, which is then filled.
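
Because 'entries' is a variable-size array, the caller has to size the
buffer itself.  The following is a minimal sketch of that allocation, not
code taken from any particular userspace; the structures are local mirrors
of the definitions above (real code would include <linux/kvm.h>), and the
helper names cpuid2_bytes() and cpuid2_alloc() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the structures above; real code would use <linux/kvm.h>. */
struct kvm_cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

struct kvm_cpuid2 {
	uint32_t nent;
	uint32_t padding;
	struct kvm_cpuid_entry2 entries[];
};

/* Bytes needed for a kvm_cpuid2 header followed by 'nent' entries. */
static size_t cpuid2_bytes(uint32_t nent)
{
	return sizeof(struct kvm_cpuid2) +
	       (size_t)nent * sizeof(struct kvm_cpuid_entry2);
}

/* Hypothetical helper: zeroed buffer with 'nent' set, or NULL on failure. */
static struct kvm_cpuid2 *cpuid2_alloc(uint32_t nent)
{
	struct kvm_cpuid2 *c;

	assert(nent > 0);
	c = calloc(1, cpuid2_bytes(nent));
	if (c)
		c->nent = nent;
	return c;
}
```

A caller would typically pass such a buffer to the ioctl and, if E2BIG is
returned, free it and retry with a larger 'nent'.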

The entries returned are the host cpuid as returned by the cpuid instruction,
with unknown or unsupported features masked out.  Some features (for example,
x2apic), may not be present in the host cpu, but are exposed by kvm if it can
emulate them efficiently. The fields in each entry are defined as follows:

  function: the eax value used to obtain the entry
  index: the ecx value used to obtain the entry (for entries that are
         affected by ecx)
  flags: an OR of zero or more of the following:
        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
        KVM_CPUID_FLAG_STATEFUL_FUNC:
           if cpuid for this function returns different values for successive
           invocations; there will be several entries with the same function,
           all with this flag set
        KVM_CPUID_FLAG_STATE_READ_NEXT:
           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
           the first entry to be read by a cpu
   eax, ebx, ecx, edx: the values returned by the cpuid instruction for
         this function/index combination
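
As an illustration of how the 'function', 'index' and 'flags' fields work
together, the sketch below looks up one entry in a returned array.  It is
an assumption-laden example: the structure is a local mirror of the one
above, cpuid_find() is a hypothetical helper, and stateful entries
(KVM_CPUID_FLAG_STATEFUL_FUNC) are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX (1u << 0)

/* Local mirror of the structure above; real code would use <linux/kvm.h>. */
struct kvm_cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/*
 * Return the entry for (function, index), or NULL if there is none.
 * 'index' is only compared when the entry marks it as significant.
 */
static const struct kvm_cpuid_entry2 *
cpuid_find(const struct kvm_cpuid_entry2 *e, uint32_t nent,
	   uint32_t function, uint32_t index)
{
	assert(e != NULL || nent == 0);
	for (uint32_t i = 0; i < nent; i++) {
		if (e[i].function != function)
			continue;
		if ((e[i].flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) &&
		    e[i].index != index)
			continue;
		return &e[i];
	}
	return NULL;
}
```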

The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
support.  Instead it is reported via

  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
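
A rough sketch of the last step, setting the bit before the table is handed
to KVM_SET_CPUID2, could look as follows.  The structure is a local mirror,
cpuid_set_tsc_deadline() is a hypothetical helper, and the caller is assumed
to have already checked KVM_CAP_TSC_DEADLINE_TIMER as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct kvm_cpuid_entry2; real code would use <linux/kvm.h>. */
struct kvm_cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)	/* leaf 1, ecx[24] */

/* Returns 1 if leaf 1 was found and updated, 0 otherwise. */
static int cpuid_set_tsc_deadline(struct kvm_cpuid_entry2 *e, uint32_t nent)
{
	assert(e != NULL || nent == 0);
	for (uint32_t i = 0; i < nent; i++) {
		if (e[i].function == 1) {
			e[i].ecx |= CPUID_1_ECX_TSC_DEADLINE;
			return 1;
		}
	}
	return 0;
}
```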


4.47 KVM_PPC_GET_PVINFO

Capability: KVM_CAP_PPC_GET_PVINFO
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_pvinfo (out)
Returns: 0 on success, !0 on error

struct kvm_ppc_pvinfo {
	__u32 flags;
	__u32 hcall[4];
	__u8  pad[108];
};

This ioctl fetches PV specific information that needs to be passed to the guest
using the device tree or other means from vm context.

The hcall array defines 4 instructions that make up a hypercall.

If any additional field gets added to this structure later on, a bit for that
additional piece of information will be set in the flags bitmap.

The flags bitmap is defined as:

   /* the host supports the ePAPR idle hcall */
   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)

4.48 KVM_ASSIGN_PCI_DEVICE

Capability: KVM_CAP_DEVICE_ASSIGNMENT
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_pci_dev (in)
Returns: 0 on success, -1 on error

Assigns a host PCI device to the VM.

struct kvm_assigned_pci_dev {
	__u32 assigned_dev_id;
	__u32 busnr;
	__u32 devfn;
	__u32 flags;
	__u32 segnr;
	union {
		__u32 reserved[11];
	};
};

The PCI device is specified by the triple segnr, busnr, and devfn.
Identification in succeeding service requests is done via assigned_dev_id. The
following flags are specified:

/* Depends on KVM_CAP_IOMMU */
#define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
/* The following two depend on KVM_CAP_PCI_2_3 */
#define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)

If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx interrupts
via the PCI-2.3-compliant device-level mask, thus enabling IRQ sharing with
other assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifies the
guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for details.

The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure
isolation of the device.  Usages not specifying this flag are deprecated.

Only PCI header type 0 devices with PCI BAR resources are supported by
device assignment.  The user requesting this ioctl must have read/write
access to the PCI sysfs resource files associated with the device.


4.49 KVM_DEASSIGN_PCI_DEVICE

Capability: KVM_CAP_DEVICE_DEASSIGNMENT
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_pci_dev (in)
Returns: 0 on success, -1 on error

Ends PCI device assignment, releasing all associated resources.

See KVM_ASSIGN_PCI_DEVICE for the data structure. Only assigned_dev_id is
used in kvm_assigned_pci_dev to identify the device.


4.50 KVM_ASSIGN_DEV_IRQ

Capability: KVM_CAP_ASSIGN_DEV_IRQ
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_irq (in)
Returns: 0 on success, -1 on error

Assigns an IRQ to a passed-through device.

struct kvm_assigned_irq {
	__u32 assigned_dev_id;
	__u32 host_irq; /* ignored (legacy field) */
	__u32 guest_irq;
	__u32 flags;
	union {
		__u32 reserved[12];
	};
};

The following flags are defined:

#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)

#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)

It is not valid to specify multiple types per host or guest IRQ. However, the
IRQ type of host and guest can differ or can even be null.
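
The rule above (at most one host type and at most one guest type, either of
which may be absent) can be checked before issuing the ioctl.  The sketch
below is illustrative only: the structure is a local mirror with the
reserved union flattened, and HOST_IRQ_TYPES, GUEST_IRQ_TYPES and the helper
functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)

#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)

/* Hypothetical local groupings of the flags above. */
#define HOST_IRQ_TYPES  (KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_HOST_MSI | \
			 KVM_DEV_IRQ_HOST_MSIX)
#define GUEST_IRQ_TYPES (KVM_DEV_IRQ_GUEST_INTX | KVM_DEV_IRQ_GUEST_MSI | \
			 KVM_DEV_IRQ_GUEST_MSIX)

/* Local mirror; real code would use <linux/kvm.h>. */
struct kvm_assigned_irq {
	uint32_t assigned_dev_id;
	uint32_t host_irq;	/* ignored (legacy field) */
	uint32_t guest_irq;
	uint32_t flags;
	uint32_t reserved[12];
};

static int at_most_one_bit(uint32_t v)
{
	return (v & (v - 1)) == 0;
}

/* 1 if no more than one host type and one guest type are requested. */
static int assigned_irq_flags_valid(uint32_t flags)
{
	return at_most_one_bit(flags & HOST_IRQ_TYPES) &&
	       at_most_one_bit(flags & GUEST_IRQ_TYPES);
}

static void assigned_irq_init(struct kvm_assigned_irq *irq, uint32_t dev_id,
			      uint32_t guest_irq, uint32_t flags)
{
	assert(assigned_irq_flags_valid(flags));
	memset(irq, 0, sizeof(*irq));
	irq->assigned_dev_id = dev_id;
	irq->guest_irq = guest_irq;
	irq->flags = flags;
}
```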


4.51 KVM_DEASSIGN_DEV_IRQ

Capability: KVM_CAP_ASSIGN_DEV_IRQ
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_irq (in)
Returns: 0 on success, -1 on error

Ends an IRQ assignment to a passed-through device.

See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
by assigned_dev_id, flags must correspond to the IRQ type specified on
KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.


4.52 KVM_SET_GSI_ROUTING

Capability: KVM_CAP_IRQ_ROUTING
Architectures: x86 ia64 s390
Type: vm ioctl
Parameters: struct kvm_irq_routing (in)
Returns: 0 on success, -1 on error

Sets the GSI routing table entries, overwriting any previously set entries.

struct kvm_irq_routing {
	__u32 nr;
	__u32 flags;
	struct kvm_irq_routing_entry entries[0];
};

No flags are specified so far; the corresponding field must be set to zero.

struct kvm_irq_routing_entry {
	__u32 gsi;
	__u32 type;
	__u32 flags;
	__u32 pad;
	union {
		struct kvm_irq_routing_irqchip irqchip;
		struct kvm_irq_routing_msi msi;
		struct kvm_irq_routing_s390_adapter adapter;
		__u32 pad[8];
	} u;
};

/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3

No flags are specified so far; the corresponding field must be set to zero.

struct kvm_irq_routing_irqchip {
	__u32 irqchip;
	__u32 pin;
};

struct kvm_irq_routing_msi {
	__u32 address_lo;
	__u32 address_hi;
	__u32 data;
	__u32 pad;
};

struct kvm_irq_routing_s390_adapter {
	__u64 ind_addr;
	__u64 summary_addr;
	__u64 ind_offset;
	__u32 summary_offset;
	__u32 adapter_id;
};
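
Since the ioctl replaces the whole table, userspace usually builds the
complete set of entries in one buffer.  The following sketch shows one way
to do that with the structures above, declared locally here instead of
including <linux/kvm.h>.  The helpers routing_alloc(), routing_set_irqchip()
and routing_set_msi() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2

/* Local mirrors of the structures above. */
struct kvm_irq_routing_irqchip { uint32_t irqchip, pin; };
struct kvm_irq_routing_msi { uint32_t address_lo, address_hi, data, pad; };
struct kvm_irq_routing_s390_adapter {
	uint64_t ind_addr, summary_addr, ind_offset;
	uint32_t summary_offset, adapter_id;
};

struct kvm_irq_routing_entry {
	uint32_t gsi, type, flags, pad;
	union {
		struct kvm_irq_routing_irqchip irqchip;
		struct kvm_irq_routing_msi msi;
		struct kvm_irq_routing_s390_adapter adapter;
		uint32_t pad[8];
	} u;
};

struct kvm_irq_routing {
	uint32_t nr, flags;
	struct kvm_irq_routing_entry entries[];
};

/* Zeroed table with room for 'nr' entries (all flags fields stay zero). */
static struct kvm_irq_routing *routing_alloc(uint32_t nr)
{
	struct kvm_irq_routing *t;

	t = calloc(1, sizeof(*t) + (size_t)nr * sizeof(t->entries[0]));
	if (t)
		t->nr = nr;
	return t;
}

static void routing_set_irqchip(struct kvm_irq_routing *t, uint32_t slot,
				uint32_t gsi, uint32_t irqchip, uint32_t pin)
{
	assert(slot < t->nr);
	t->entries[slot].gsi = gsi;
	t->entries[slot].type = KVM_IRQ_ROUTING_IRQCHIP;
	t->entries[slot].u.irqchip.irqchip = irqchip;
	t->entries[slot].u.irqchip.pin = pin;
}

static void routing_set_msi(struct kvm_irq_routing *t, uint32_t slot,
			    uint32_t gsi, uint32_t addr_lo, uint32_t addr_hi,
			    uint32_t data)
{
	assert(slot < t->nr);
	t->entries[slot].gsi = gsi;
	t->entries[slot].type = KVM_IRQ_ROUTING_MSI;
	t->entries[slot].u.msi.address_lo = addr_lo;
	t->entries[slot].u.msi.address_hi = addr_hi;
	t->entries[slot].u.msi.data = data;
}
```

The filled table would then be passed as the argument of
KVM_SET_GSI_ROUTING on the VM file descriptor.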


4.53 KVM_ASSIGN_SET_MSIX_NR

Capability: KVM_CAP_DEVICE_MSIX
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_msix_nr (in)
Returns: 0 on success, -1 on error

Set the number of MSI-X interrupts for an assigned device. The number is
reset again by terminating the MSI-X assignment of the device via
KVM_DEASSIGN_DEV_IRQ. Calling this service more than once at any earlier
point will fail.

struct kvm_assigned_msix_nr {
	__u32 assigned_dev_id;
	__u16 entry_nr;
	__u16 padding;
};

#define KVM_MAX_MSIX_PER_DEV		256


4.54 KVM_ASSIGN_SET_MSIX_ENTRY

Capability: KVM_CAP_DEVICE_MSIX
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_msix_entry (in)
Returns: 0 on success, -1 on error

Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting
the GSI vector to zero means disabling the interrupt.

struct kvm_assigned_msix_entry {
	__u32 assigned_dev_id;
	__u32 gsi;
	__u16 entry; /* The index of entry in the MSI-X table */
	__u16 padding[3];
};


4.55 KVM_SET_TSC_KHZ

Capability: KVM_CAP_TSC_CONTROL
Architectures: x86
Type: vcpu ioctl
Parameters: virtual tsc_khz
Returns: 0 on success, -1 on error

Specifies the tsc frequency for the virtual machine. The unit of the
frequency is KHz.


4.56 KVM_GET_TSC_KHZ

Capability: KVM_CAP_GET_TSC_KHZ
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: virtual tsc-khz on success, negative value on error

Returns the tsc frequency of the guest. The unit of the return value is
KHz. If the host has unstable tsc this ioctl returns -EIO instead as an
error.
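
Because the value is in KHz, it equals the number of TSC ticks per
millisecond.  As a small worked example, and assuming a hypothetical helper
name, a tick count can be converted to nanoseconds like this:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert guest TSC ticks to nanoseconds given the frequency in KHz.
 * Whole milliseconds and the remainder are scaled separately so that the
 * remainder term cannot overflow 64 bits.
 */
static uint64_t tsc_ticks_to_ns(uint64_t ticks, uint32_t tsc_khz)
{
	uint64_t ms, rem;

	assert(tsc_khz != 0);
	ms = ticks / tsc_khz;
	rem = ticks % tsc_khz;
	return ms * 1000000u + rem * 1000000u / tsc_khz;
}
```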


4.57 KVM_GET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (out)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

Reads the Local APIC registers and copies them into the input argument.  The
data format and layout are the same as documented in the architecture manual.


4.58 KVM_SET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (in)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

Copies the input argument into the Local APIC registers.  The data format
and layout are the same as documented in the architecture manual.


4.59 KVM_IOEVENTFD

Capability: KVM_CAP_IOEVENTFD
Architectures: all
Type: vm ioctl
Parameters: struct kvm_ioeventfd (in)
Returns: 0 on success, !0 on error

This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
within the guest.  A guest write in the registered address will signal the
provided event instead of triggering an exit.

struct kvm_ioeventfd {
	__u64 datamatch;
	__u64 addr;        /* legal pio/mmio address */
	__u32 len;         /* 1, 2, 4, or 8 bytes    */
	__s32 fd;
	__u32 flags;
	__u8  pad[36];
};

For the special case of virtio-ccw devices on s390, the ioevent is matched
to a subchannel/virtqueue tuple instead.

The following flags are defined:

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)

If the datamatch flag is set, the event will be signaled only if the value
written to the registered address is equal to datamatch in struct
kvm_ioeventfd.

For virtio-ccw devices, addr contains the subchannel id and datamatch the
virtqueue index.
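
The sketch below fills in the argument for a pio or mmio registration.  It
is an example under stated assumptions: the structure is a local mirror, the
bit numbers in the enum are assumed from the kernel's uapi header (they are
not listed in this document), and ioeventfd_init() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit numbers; real code would take them from <linux/kvm.h>. */
enum {
	kvm_ioeventfd_flag_nr_datamatch,
	kvm_ioeventfd_flag_nr_pio,
	kvm_ioeventfd_flag_nr_deassign,
};

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)

/* Local mirror of the structure above. */
struct kvm_ioeventfd {
	uint64_t datamatch;
	uint64_t addr;
	uint32_t len;
	int32_t  fd;
	uint32_t flags;
	uint8_t  pad[36];
};

/*
 * Prepare an assignment request.  'datamatch' may be NULL to signal the
 * eventfd on every write.  Returns -1 if 'len' is not 1, 2, 4 or 8.
 */
static int ioeventfd_init(struct kvm_ioeventfd *io, int32_t fd, uint64_t addr,
			  uint32_t len, int is_pio, const uint64_t *datamatch)
{
	assert(io != NULL);
	if (len != 1 && len != 2 && len != 4 && len != 8)
		return -1;

	memset(io, 0, sizeof(*io));
	io->addr = addr;
	io->len = len;
	io->fd = fd;
	if (is_pio)
		io->flags |= KVM_IOEVENTFD_FLAG_PIO;
	if (datamatch) {
		io->flags |= KVM_IOEVENTFD_FLAG_DATAMATCH;
		io->datamatch = *datamatch;
	}
	return 0;
}
```

To detach the same eventfd later, the same fields would be passed again
with KVM_IOEVENTFD_FLAG_DEASSIGN additionally set in flags.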


4.60 KVM_DIRTY_TLB

Capability: KVM_CAP_SW_TLB
Architectures: ppc
Type: vcpu ioctl
Parameters: struct kvm_dirty_tlb (in)
Returns: 0 on success, -1 on error

struct kvm_dirty_tlb {
	__u64 bitmap;
	__u32 num_dirty;
};

This must be called whenever userspace has changed an entry in the shared
TLB, prior to calling KVM_RUN on the associated vcpu.

The "bitmap" field is the userspace address of an array.  This array
consists of a number of bits, equal to the total number of TLB entries as
determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
nearest multiple of 64.

Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
array.

The array is little-endian: bit 0 is the least significant bit of the
first byte, bit 8 is the least significant bit of the second byte, etc.
This avoids any complications with differing word sizes.

The "num_dirty" field is a performance hint for KVM to determine whether it
should skip processing the bitmap and just invalidate everything.  It must
be set to the number of set bits in the bitmap.
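
The bitmap rules above (size rounded up to a multiple of 64 bits,
byte-wise little-endian bit order, num_dirty equal to the number of set
bits) can be sketched as follows.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the bitmap for 'num_entries' TLB entries. */
static size_t dirty_tlb_bitmap_bytes(uint32_t num_entries)
{
	return (((size_t)num_entries + 63) / 64) * 8;
}

/* Mark TLB entry 'entry' dirty: bit (entry % 8) of byte (entry / 8). */
static void dirty_tlb_mark(uint8_t *bitmap, size_t bytes, uint32_t entry)
{
	assert(entry / 8 < bytes);
	bitmap[entry / 8] |= (uint8_t)(1u << (entry % 8));
}

/* Number of set bits, i.e. the value to store in num_dirty. */
static uint32_t dirty_tlb_count(const uint8_t *bitmap, size_t bytes)
{
	uint32_t n = 0;

	for (size_t i = 0; i < bytes; i++)
		for (uint8_t b = bitmap[i]; b; b &= (uint8_t)(b - 1))
			n++;
	return n;
}
```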


4.61 KVM_ASSIGN_SET_INTX_MASK

Capability: KVM_CAP_PCI_2_3
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_assigned_pci_dev (in)
Returns: 0 on success, -1 on error

Allows userspace to mask PCI INTx interrupts from the assigned device.  The
kernel will not deliver INTx interrupts to the guest between setting and
clearing of KVM_ASSIGN_SET_INTX_MASK via this interface.  This enables use of
and emulation of PCI 2.3 INTx disable command register behavior.

This may be used for both PCI 2.3 devices supporting INTx disable natively and
older devices lacking this support. Userspace is responsible for emulating the
read value of the INTx disable bit in the guest visible PCI command register.
When modifying the INTx disable state, userspace should precede updating the
physical device command register by calling this ioctl to inform the kernel of
the new intended INTx mask state.

Note that the kernel uses the device INTx disable bit to internally manage the
device interrupt state for PCI 2.3 devices.  Reads of this register may
therefore not match the expected value.  Writes should always use the guest
intended INTx disable value rather than attempting to read-copy-update the
current physical device state.  Races between user and kernel updates to the
INTx disable bit are handled lazily in the kernel.  It's possible the device
may generate unintended interrupts, but they will not be injected into the
guest.

See KVM_ASSIGN_PCI_DEVICE for the data structure.  The target device is specified
by assigned_dev_id.  In the flags field, only KVM_DEV_ASSIGN_MASK_INTX is
evaluated.


4.62 KVM_CREATE_SPAPR_TCE

Capability: KVM_CAP_SPAPR_TCE
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_create_spapr_tce (in)
Returns: file descriptor for manipulating the created TCE table

This creates a virtual TCE (translation control entry) table, which
is an IOMMU for PAPR-style virtual I/O.  It is used to translate
logical addresses used in virtual I/O into guest physical addresses,
and provides a scatter/gather capability for PAPR virtual I/O.

/* for KVM_CAP_SPAPR_TCE */
struct kvm_create_spapr_tce {
	__u64 liobn;
	__u32 window_size;
};

The liobn field gives the logical IO bus number for which to create a
TCE table.  The window_size field specifies the size of the DMA window
which this TCE table will translate - the table will contain one 64-bit
TCE entry for every 4 KiB of the DMA window.

When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
table has been created using this ioctl(), the kernel will handle it
in real mode, updating the TCE table.  H_PUT_TCE calls for other
liobns will cause a vm exit and must be handled by userspace.

The return value is a file descriptor which can be passed to mmap(2)
to map the created TCE table into userspace.  This lets userspace read
the entries written by kernel-handled H_PUT_TCE calls, and also lets
userspace update the TCE table directly which is useful in some
circumstances.
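
As a worked example of the sizing rule (one 64-bit entry per 4 KiB of DMA
window), the sketch below computes the table size and the entry index for
an I/O address.  Both helper names are hypothetical, and the index
calculation assumes the DMA window starts at I/O address 0, which this
document does not state.

```c
#include <assert.h>
#include <stdint.h>

#define TCE_PAGE_SHIFT 12	/* one entry per 4 KiB of DMA window */

/* Size in bytes of the TCE table backing a window of 'window_size' bytes. */
static uint64_t spapr_tce_table_bytes(uint32_t window_size)
{
	return ((uint64_t)window_size >> TCE_PAGE_SHIFT) * sizeof(uint64_t);
}

/* Index of the entry translating 'ioba' (assumes the window starts at 0). */
static uint32_t spapr_tce_index(uint32_t window_size, uint32_t ioba)
{
	assert(ioba < window_size);
	return ioba >> TCE_PAGE_SHIFT;
}
```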


4.63 KVM_ALLOCATE_RMA

Capability: KVM_CAP_PPC_RMA
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_allocate_rma (out)
Returns: file descriptor for mapping the allocated RMA

This allocates a Real Mode Area (RMA) from the pool allocated at boot
time by the kernel.  An RMA is a physically-contiguous, aligned region
of memory used on older POWER processors to provide the memory which
will be accessed by real-mode (MMU off) accesses in a KVM guest.
POWER processors support a set of sizes for the RMA that usually
includes 64MB, 128MB, 256MB and some larger powers of two.

/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
	__u64 rma_size;
};

The return value is a file descriptor which can be passed to mmap(2)
to map the allocated RMA into userspace.  The mapped area can then be
passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
RMA for a virtual machine.  The size of the RMA in bytes (which is
fixed at host kernel boot time) is returned in the rma_size field of
the argument structure.

The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
is supported; 2 if the processor requires all virtual machines to have
an RMA, or 1 if the processor can use an RMA but doesn't require it,
because it supports the Virtual RMA (VRMA) facility.


4.64 KVM_NMI

Capability: KVM_CAP_USER_NMI
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error

Queues an NMI on the thread's vcpu.  Note this is well defined only
when KVM_CREATE_IRQCHIP has not been called, since this is an interface
between the virtual cpu core and virtual local APIC.  After KVM_CREATE_IRQCHIP
has been called, this interface is completely emulated within the kernel.

To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
following algorithm:

  - pause the vcpu
  - read the local APIC's state (KVM_GET_LAPIC)
  - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
  - if so, issue KVM_NMI
  - resume the vcpu
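
The third step of the algorithm can be sketched as below, operating on the
register image returned by KVM_GET_LAPIC.  The register offset and bit
positions are taken from the x86 architecture manual rather than from this
document, the structure is a local mirror, and the helper names are
hypothetical.  The sketch only inspects the LVT entry; it does not check
whether the local APIC is software-disabled.

```c
#include <assert.h>
#include <stdint.h>

#define KVM_APIC_REG_SIZE 0x400

/* Local mirror; real code would use <linux/kvm.h>. */
struct kvm_lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

/* Values from the architecture manual (assumptions, not defined here). */
#define APIC_LVT_LINT1		0x360
#define APIC_LVT_MASKED		(1u << 16)
#define APIC_LVT_DELIVERY(v)	(((v) >> 8) & 0x7)
#define APIC_DELIVERY_NMI	0x4

/* Read a 32-bit little-endian register at byte offset 'off'. */
static uint32_t lapic_reg32(const struct kvm_lapic_state *s, unsigned int off)
{
	const unsigned char *p;

	assert(off + 4 <= KVM_APIC_REG_SIZE);
	p = (const unsigned char *)s->regs + off;
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* 1 if LINT1 is unmasked and configured for NMI delivery. */
static int lint1_would_queue_nmi(const struct kvm_lapic_state *s)
{
	uint32_t lvt = lapic_reg32(s, APIC_LVT_LINT1);

	return !(lvt & APIC_LVT_MASKED) &&
	       APIC_LVT_DELIVERY(lvt) == APIC_DELIVERY_NMI;
}
```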

Some guests configure the LINT1 NMI input to cause a panic, aiding in
debugging.


4.65 KVM_S390_UCAS_MAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
	struct kvm_s390_ucas_mapping {
		__u64 user_addr;
		__u64 vcpu_addr;
		__u64 length;
	};

This ioctl maps the memory at "user_addr" with the length "length" to
the vcpu's address space starting at "vcpu_addr". All parameters need to
be aligned by 1 megabyte.


4.66 KVM_S390_UCAS_UNMAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
	struct kvm_s390_ucas_mapping {
		__u64 user_addr;
		__u64 vcpu_addr;
		__u64 length;
	};

This ioctl unmaps the memory in the vcpu's address space starting at
"vcpu_addr" with the length "length". The field "user_addr" is ignored.
All parameters need to be aligned by 1 megabyte.


4.67 KVM_S390_VCPU_FAULT

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: vcpu absolute address (in)
Returns: 0 in case of success

This call creates a page table entry on the virtual cpu's address space
(for user controlled virtual machines) or the virtual machine's address
space (for regular virtual machines). This only works for minor faults,
thus it's recommended to access subject memory page via the user page
table upfront. This is useful to handle validity intercepts for user
controlled virtual machines to fault in the virtual cpu's lowcore pages
prior to calling the KVM_RUN ioctl.


4.68 KVM_SET_ONE_REG

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in)
Returns: 0 on success, negative value on failure

struct kvm_one_reg {
       __u64 id;
       __u64 addr;
};

Using this ioctl, a single vcpu register can be set to a specific value
defined by user space with the passed in struct kvm_one_reg, where id
refers to the register identifier as described below and addr is a pointer
to a variable with the respective size. There can be architecture agnostic
and architecture specific registers. Each has its own range of operation
and its own constants and width. To keep track of the implemented
registers, find a list below:

  Arch  |       Register        | Width (bits)
        |                       |
  PPC   | KVM_REG_PPC_HIOR      | 64
  PPC   | KVM_REG_PPC_IAC1      | 64
  PPC   | KVM_REG_PPC_IAC2      | 64
  PPC   | KVM_REG_PPC_IAC3      | 64
  PPC   | KVM_REG_PPC_IAC4      | 64
  PPC   | KVM_REG_PPC_DAC1      | 64
  PPC   | KVM_REG_PPC_DAC2      | 64
  PPC   | KVM_REG_PPC_DABR      | 64
  PPC   | KVM_REG_PPC_DSCR      | 64
  PPC   | KVM_REG_PPC_PURR      | 64
  PPC   | KVM_REG_PPC_SPURR     | 64
  PPC   | KVM_REG_PPC_DAR       | 64
  PPC   | KVM_REG_PPC_DSISR     | 32
  PPC   | KVM_REG_PPC_AMR       | 64
  PPC   | KVM_REG_PPC_UAMOR     | 64
  PPC   | KVM_REG_PPC_MMCR0     | 64
  PPC   | KVM_REG_PPC_MMCR1     | 64
  PPC   | KVM_REG_PPC_MMCRA     | 64
  PPC   | KVM_REG_PPC_PMC1      | 32
  PPC   | KVM_REG_PPC_PMC2      | 32
  PPC   | KVM_REG_PPC_PMC3      | 32
  PPC   | KVM_REG_PPC_PMC4      | 32
  PPC   | KVM_REG_PPC_PMC5      | 32
  PPC   | KVM_REG_PPC_PMC6      | 32
  PPC   | KVM_REG_PPC_PMC7      | 32
  PPC   | KVM_REG_PPC_PMC8      | 32
  PPC   | KVM_REG_PPC_FPR0      | 64
          ...
  PPC   | KVM_REG_PPC_FPR31     | 64
  PPC   | KVM_REG_PPC_VR0       | 128
          ...
  PPC   | KVM_REG_PPC_VR31      | 128
  PPC   | KVM_REG_PPC_VSR0      | 128
          ...
  PPC   | KVM_REG_PPC_VSR31     | 128
  PPC   | KVM_REG_PPC_FPSCR     | 64
  PPC   | KVM_REG_PPC_VSCR      | 32
  PPC   | KVM_REG_PPC_VPA_ADDR  | 64
  PPC   | KVM_REG_PPC_VPA_SLB   | 128
  PPC   | KVM_REG_PPC_VPA_DTL   | 128
  PPC   | KVM_REG_PPC_EPCR	| 32
  PPC   | KVM_REG_PPC_EPR	| 32
  PPC   | KVM_REG_PPC_TCR	| 32
  PPC   | KVM_REG_PPC_TSR	| 32
  PPC   | KVM_REG_PPC_OR_TSR	| 32
  PPC   | KVM_REG_PPC_CLEAR_TSR	| 32
  PPC   | KVM_REG_PPC_MAS0	| 32
  PPC   | KVM_REG_PPC_MAS1	| 32
  PPC   | KVM_REG_PPC_MAS2	| 64
  PPC   | KVM_REG_PPC_MAS7_3	| 64
  PPC   | KVM_REG_PPC_MAS4	| 32
  PPC   | KVM_REG_PPC_MAS6	| 32
  PPC   | KVM_REG_PPC_MMUCFG	| 32
  PPC   | KVM_REG_PPC_TLB0CFG	| 32
  PPC   | KVM_REG_PPC_TLB1CFG	| 32
  PPC   | KVM_REG_PPC_TLB2CFG	| 32
  PPC   | KVM_REG_PPC_TLB3CFG	| 32
  PPC   | KVM_REG_PPC_TLB0PS	| 32
  PPC   | KVM_REG_PPC_TLB1PS	| 32
  PPC   | KVM_REG_PPC_TLB2PS	| 32
  PPC   | KVM_REG_PPC_TLB3PS	| 32
  PPC   | KVM_REG_PPC_EPTCFG	| 32
  PPC   | KVM_REG_PPC_ICP_STATE | 64
  PPC   | KVM_REG_PPC_TB_OFFSET	| 64
  PPC   | KVM_REG_PPC_SPMC1	| 32
  PPC   | KVM_REG_PPC_SPMC2	| 32
  PPC   | KVM_REG_PPC_IAMR	| 64
  PPC   | KVM_REG_PPC_TFHAR	| 64
  PPC   | KVM_REG_PPC_TFIAR	| 64
  PPC   | KVM_REG_PPC_TEXASR	| 64
  PPC   | KVM_REG_PPC_FSCR	| 64
  PPC   | KVM_REG_PPC_PSPB	| 32
  PPC   | KVM_REG_PPC_EBBHR	| 64
  PPC   | KVM_REG_PPC_EBBRR	| 64
  PPC   | KVM_REG_PPC_BESCR	| 64
  PPC   | KVM_REG_PPC_TAR	| 64
  PPC   | KVM_REG_PPC_DPDES	| 64
  PPC   | KVM_REG_PPC_DAWR	| 64
  PPC   | KVM_REG_PPC_DAWRX	| 64
  PPC   | KVM_REG_PPC_CIABR	| 64
  PPC   | KVM_REG_PPC_IC	| 64
  PPC   | KVM_REG_PPC_VTB	| 64
  PPC   | KVM_REG_PPC_CSIGR	| 64
  PPC   | KVM_REG_PPC_TACR	| 64
  PPC   | KVM_REG_PPC_TCSCR	| 64
  PPC   | KVM_REG_PPC_PID	| 64
  PPC   | KVM_REG_PPC_ACOP	| 64
  PPC   | KVM_REG_PPC_VRSAVE	| 32
  PPC   | KVM_REG_PPC_LPCR	| 64
  PPC   | KVM_REG_PPC_PPR	| 64
  PPC   | KVM_REG_PPC_ARCH_COMPAT | 32
  PPC   | KVM_REG_PPC_DABRX     | 32
  PPC   | KVM_REG_PPC_TM_GPR0	| 64
          ...
  PPC   | KVM_REG_PPC_TM_GPR31	| 64
  PPC   | KVM_REG_PPC_TM_VSR0	| 128
          ...
  PPC   | KVM_REG_PPC_TM_VSR63	| 128
  PPC   | KVM_REG_PPC_TM_CR	| 64
  PPC   | KVM_REG_PPC_TM_LR	| 64
  PPC   | KVM_REG_PPC_TM_CTR	| 64
  PPC   | KVM_REG_PPC_TM_FPSCR	| 64
  PPC   | KVM_REG_PPC_TM_AMR	| 64
  PPC   | KVM_REG_PPC_TM_PPR	| 64
  PPC   | KVM_REG_PPC_TM_VRSAVE	| 64
  PPC   | KVM_REG_PPC_TM_VSCR	| 32
  PPC   | KVM_REG_PPC_TM_DSCR	| 64
  PPC   | KVM_REG_PPC_TM_TAR	| 64

ARM registers are mapped using the lower 32 bits.  The upper 16 of that
is the register group type, or coprocessor number:

ARM core registers have the following id bit patterns:
  0x4020 0000 0010 <index into the kvm_regs struct:16>

ARM 32-bit CP15 registers have the following id bit patterns:
  0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>

ARM 64-bit CP15 registers have the following id bit patterns:
  0x4030 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>

ARM CCSIDR registers are demultiplexed by CSSELR value:
  0x4020 0000 0011 00 <csselr:8>

ARM 32-bit VFP control registers have the following id bit patterns:
  0x4020 0000 0012 1 <regno:12>

ARM 64-bit FP registers have the following id bit patterns:
  0x4030 0000 0012 0 <regno:12>


arm64 registers are mapped using the lower 32 bits. The upper 16 of
that is the register group type, or coprocessor number:

arm64 core/FP-SIMD registers have the following id bit patterns. Note
that the size of the access is variable, as the kvm_regs structure
contains elements ranging from 32 to 128 bits. The index is a 32bit
value in the kvm_regs structure seen as a 32bit array.
  0x60x0 0000 0010 <index into the kvm_regs struct:16>

arm64 CCSIDR registers are demultiplexed by CSSELR value:
  0x6020 0000 0011 00 <csselr:8>

arm64 system registers have the following id bit patterns:
  0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>
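
As an illustration of the last bit pattern, the sketch below composes the
id of an arm64 system register from its encoding fields.  The constant and
function names are hypothetical; the field positions follow directly from
the pattern above (op2 in bits 0-2, crm in bits 3-6, crn in bits 7-10, op1
in bits 11-13, op0 in bits 14-15).

```c
#include <assert.h>
#include <stdint.h>

/* 0x6030 0000 0013 followed by 16 bits of register encoding. */
#define ARM64_SYSREG_ID_BASE 0x6030000000130000ULL

static uint64_t arm64_sysreg_id(unsigned int op0, unsigned int op1,
				unsigned int crn, unsigned int crm,
				unsigned int op2)
{
	assert(op0 < 4 && op1 < 8 && crn < 16 && crm < 16 && op2 < 8);
	return ARM64_SYSREG_ID_BASE |
	       (uint64_t)op0 << 14 | (uint64_t)op1 << 11 |
	       (uint64_t)crn << 7 | (uint64_t)crm << 3 | (uint64_t)op2;
}
```

The resulting value would be placed in the "id" field of struct
kvm_one_reg, with "addr" pointing to a 64-bit variable.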

4.69 KVM_GET_ONE_REG

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in and out)
Returns: 0 on success, negative value on failure

This ioctl allows userspace to read the value of a single register
implemented in a vcpu. The register to read is indicated by the "id" field
of the kvm_one_reg struct passed in. On success, the register value can be
found at the memory location pointed to by "addr".

The list of registers accessible using this interface is identical to the
list in 4.68.


4.70 KVM_KVMCLOCK_CTRL

Capability: KVM_CAP_KVMCLOCK_CTRL
Architectures: Any that implement pvclocks (currently x86 only)
Type: vcpu ioctl
Parameters: None
Returns: 0 on success, -1 on error

This signals to the host kernel that the specified guest is being paused by
userspace.  The host will set a flag in the pvclock structure that is checked
from the soft lockup watchdog.  The flag is part of the pvclock structure that
is shared between guest and host, specifically the second bit of the flags
field of the pvclock_vcpu_time_info structure.  It will be set exclusively by
the host and read/cleared exclusively by the guest.  The guest operation of
checking and clearing the flag must be an atomic operation, so
load-link/store-conditional or equivalent must be used.  There are two cases
where the guest will clear the flag: when the soft lockup watchdog timer resets
itself or when a soft lockup is detected.  This ioctl can be called any time
after pausing the vcpu, but before it is resumed.


4.71 KVM_SIGNAL_MSI

Capability: KVM_CAP_SIGNAL_MSI
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_msi (in)
Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error

Directly inject an MSI message. Only valid with in-kernel irqchip that handles
MSI messages.

struct kvm_msi {
	__u32 address_lo;
	__u32 address_hi;
	__u32 data;
	__u32 flags;
	__u8  pad[16];
};

No flags are defined so far. The corresponding field must be 0.


4.71 KVM_CREATE_PIT2

Capability: KVM_CAP_PIT2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_config (in)
Returns: 0 on success, -1 on error

Creates an in-kernel device model for the i8254 PIT. This call is only valid
after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following
parameters have to be passed:

struct kvm_pit_config {
	__u32 flags;
	__u32 pad[15];
};

Valid flags are:

#define KVM_PIT_SPEAKER_DUMMY     1 /* emulate speaker port stub */

PIT timer interrupts may use a per-VM kernel thread for injection. If it
exists, this thread will have a name of the following pattern:

kvm-pit/<owner-process-pid>

When running a guest with elevated priorities, the scheduling parameters of
this thread may have to be adjusted accordingly.

This IOCTL replaces the obsolete KVM_CREATE_PIT.


4.72 KVM_GET_PIT2

Capability: KVM_CAP_PIT_STATE2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_state2 (out)
Returns: 0 on success, -1 on error

Retrieves the state of the in-kernel PIT model. Only valid after
KVM_CREATE_PIT2. The state is returned in the following structure:

struct kvm_pit_state2 {
	struct kvm_pit_channel_state channels[3];
	__u32 flags;
	__u32 reserved[9];
};

Valid flags are:

/* disable PIT in HPET legacy mode */
#define KVM_PIT_FLAGS_HPET_LEGACY  0x00000001

This IOCTL replaces the obsolete KVM_GET_PIT.


4.73 KVM_SET_PIT2

Capability: KVM_CAP_PIT_STATE2
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_pit_state2 (in)
Returns: 0 on success, -1 on error

Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
See KVM_GET_PIT2 for details on struct kvm_pit_state2.

This IOCTL replaces the obsolete KVM_SET_PIT.


4.74 KVM_PPC_GET_SMMU_INFO

Capability: KVM_CAP_PPC_GET_SMMU_INFO
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_smmu_info (out)
Returns: 0 on success, -1 on error

This populates and returns a structure describing the features of
the "Server" class MMU emulation supported by KVM.
This can in turn be used by userspace to generate the appropriate
device-tree properties for the guest operating system.

The structure contains some global information, followed by an
array of supported segment page sizes:

      struct kvm_ppc_smmu_info {
	     __u64 flags;
	     __u32 slb_size;
	     __u32 pad;
	     struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
      };

The supported flags are:

    - KVM_PPC_PAGE_SIZES_REAL:
        When that flag is set, guest page sizes must "fit" the backing
        store page sizes. When not set, any page size in the list can
        be used regardless of how they are backed by userspace.

    - KVM_PPC_1T_SEGMENTS
        The emulated MMU supports 1T segments in addition to the
        standard 256M ones.

The "slb_size" field indicates how many SLB entries are supported.

The "sps" array contains 8 entries indicating the supported base
page sizes for a segment in increasing order. Each entry is defined
as follows:

   struct kvm_ppc_one_seg_page_size {
	__u32 page_shift;	/* Base page shift of segment (or 0) */
	__u32 slb_enc;		/* SLB encoding for BookS */
	struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
   };

An entry with a "page_shift" of 0 is unused. Because the array is
organized in increasing order, a lookup can stop when encountering
such an entry.

The "slb_enc" field provides the encoding to use in the SLB for the
page size. The bits are in positions such that the value can directly
be OR'ed into the "vsid" argument of the slbmte instruction.

The "enc" array is a list which for each of those segment base page
sizes provides the list of supported actual page sizes (which can be
only larger or equal to the base page size), along with the
corresponding encoding in the hash PTE. Similarly, the array is
8 entries sorted by increasing sizes and an entry with a "0" shift
is an empty entry and a terminator:

   struct kvm_ppc_one_page_size {
	__u32 page_shift;	/* Page shift (or 0) */
	__u32 pte_enc;		/* Encoding in the HPTE (>>12) */
   };

The "pte_enc" field provides a value that can be OR'ed into the hash
PTE's RPN field (ie, it needs to be shifted left by 12 to OR it
into the hash PTE second double word).
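
The two-level lookup described above (segment base page size first, then
actual page size, each list terminated by a zero shift) might be sketched as
follows.  The structures are local mirrors of the ones above,
KVM_PPC_PAGE_SIZES_MAX_SZ is taken to be 8 as the text states, and
smmu_find_pte_enc() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_PPC_PAGE_SIZES_MAX_SZ 8

/* Local mirrors; real code would use <linux/kvm.h>. */
struct kvm_ppc_one_page_size {
	uint32_t page_shift;	/* Page shift (or 0) */
	uint32_t pte_enc;	/* Encoding in the HPTE (>>12) */
};

struct kvm_ppc_one_seg_page_size {
	uint32_t page_shift;	/* Base page shift of segment (or 0) */
	uint32_t slb_enc;	/* SLB encoding for BookS */
	struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
};

struct kvm_ppc_smmu_info {
	uint64_t flags;
	uint32_t slb_size;
	uint32_t pad;
	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
};

/*
 * Look up the HPTE encoding for an actual page size within a segment of
 * the given base page size.  Returns 0 and stores the encoding on
 * success, -1 if the combination is not listed.
 */
static int smmu_find_pte_enc(const struct kvm_ppc_smmu_info *info,
			     uint32_t base_shift, uint32_t actual_shift,
			     uint32_t *pte_enc)
{
	assert(info != NULL && pte_enc != NULL);
	for (int i = 0; i < KVM_PPC_PAGE_SIZES_MAX_SZ; i++) {
		const struct kvm_ppc_one_seg_page_size *sps = &info->sps[i];

		if (sps->page_shift == 0)
			break;			/* unused entry: end of list */
		if (sps->page_shift != base_shift)
			continue;
		for (int j = 0; j < KVM_PPC_PAGE_SIZES_MAX_SZ; j++) {
			if (sps->enc[j].page_shift == 0)
				break;		/* terminator */
			if (sps->enc[j].page_shift == actual_shift) {
				*pte_enc = sps->enc[j].pte_enc;
				return 0;
			}
		}
		return -1;
	}
	return -1;
}
```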

4.75 KVM_IRQFD

Capability: KVM_CAP_IRQFD
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_irqfd (in)
Returns: 0 on success, -1 on error

Allows setting an eventfd to directly trigger a guest interrupt.
kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
kvm_irqfd.gsi specifies the irqchip pin toggled by this event.  When
an event is triggered on the eventfd, an interrupt is injected into
the guest using the specified gsi pin.  The irqfd is removed using
the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
and kvm_irqfd.gsi.

With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify
mechanism allowing emulation of level-triggered, irqfd-based
interrupts.  When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an
additional eventfd in the kvm_irqfd.resamplefd field.  When operating
in resample mode, posting of an interrupt through kvm_irqfd.fd asserts
the specified gsi in the irqchip.  When the irqchip is resampled, such
as from an EOI, the gsi is de-asserted and the user is notified via
kvm_irqfd.resamplefd.  It is the user's responsibility to re-queue
the interrupt if the device making use of it still requires service.
Note that closing the resamplefd is not sufficient to disable the
irqfd.  The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.

4.76 KVM_PPC_ALLOCATE_HTAB

Capability: KVM_CAP_PPC_ALLOC_HTAB
Architectures: powerpc
Type: vm ioctl
Parameters: Pointer to u32 containing hash table order (in/out)
Returns: 0 on success, -1 on error

This requests the host kernel to allocate an MMU hash table for a
guest using the PAPR paravirtualization interface.  This only does
anything if the kernel is configured to use the Book 3S HV style of
virtualization.  Otherwise the capability doesn't exist and the ioctl
returns an ENOTTY error.  The rest of this description assumes Book 3S
HV.

There must be no vcpus running when this ioctl is called; if there
are, it will do nothing and return an EBUSY error.

The parameter is a pointer to a 32-bit unsigned integer variable
containing the order (log base 2) of the desired size of the hash
table, which must be between 18 and 46.  On successful return from the
ioctl, it will have been updated with the order of the hash table that
was allocated.

If no hash table has been allocated when any vcpu is asked to run
(with the KVM_RUN ioctl), the host kernel will allocate a
default-sized hash table (16 MB).

If this ioctl is called when a hash table has already been allocated,
the kernel will clear out the existing hash table (zero all HPTEs) and
return the hash table order in the parameter.  (If the guest is using
the virtualized real-mode area (VRMA) facility, the kernel will
re-create the VRMA HPTEs on the next KVM_RUN of any vcpu.)
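
The order arithmetic described above can be sketched as follows.  This is
only an illustration: the helper and macro names are hypothetical and are
not part of the KVM API.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the KVM API. */

/* Documented bounds for the order passed to KVM_PPC_ALLOCATE_HTAB. */
#define HTAB_ORDER_MIN 18u
#define HTAB_ORDER_MAX 46u

/* Returns 1 if the order lies in the documented range, 0 otherwise. */
static int htab_order_valid(uint32_t order)
{
	return order >= HTAB_ORDER_MIN && order <= HTAB_ORDER_MAX;
}

/* Size in bytes of a hash table of the given order (2^order). */
static uint64_t htab_size_bytes(uint32_t order)
{
	return (uint64_t)1 << order;
}
```

With these helpers, order 24 corresponds to the 16 MB default size
mentioned above.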

4.77 KVM_S390_INTERRUPT

Capability: basic
Architectures: s390
Type: vm ioctl, vcpu ioctl
Parameters: struct kvm_s390_interrupt (in)
Returns: 0 on success, -1 on error

Allows injecting an interrupt into the guest. Interrupts can be floating
(vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type.

Interrupt parameters are passed via kvm_s390_interrupt:

struct kvm_s390_interrupt {
	__u32 type;
	__u32 parm;
	__u64 parm64;
};

type can be one of the following:

KVM_S390_SIGP_STOP (vcpu) - sigp stop
KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm
KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm
KVM_S390_RESTART (vcpu) - restart
KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt
			   parameters in parm and parm64
KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm
KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm
KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an
    I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
    I/O interruption parameters in parm (subchannel) and parm64 (intparm,
    interruption subclass)
KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm,
                           machine check interrupt code in parm64 (note that
                           machine checks needing further payload are not
                           supported by this ioctl)

Note that the vcpu ioctl is asynchronous to vcpu execution.

4.78 KVM_PPC_GET_HTAB_FD

Capability: KVM_CAP_PPC_HTAB_FD
Architectures: powerpc
Type: vm ioctl
Parameters: Pointer to struct kvm_get_htab_fd (in)
Returns: file descriptor number (>= 0) on success, -1 on error

This returns a file descriptor that can be used either to read out the
entries in the guest's hashed page table (HPT), or to write entries to
initialize the HPT.  The returned fd can only be written to if the
KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
can only be read if that bit is clear.  The argument struct looks like
this:

/* For KVM_PPC_GET_HTAB_FD */
struct kvm_get_htab_fd {
	__u64	flags;
	__u64	start_index;
	__u64	reserved[2];
};

/* Values for kvm_get_htab_fd.flags */
#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
#define KVM_GET_HTAB_WRITE		((__u64)0x2)

The `start_index' field gives the index in the HPT of the entry at
which to start reading.  It is ignored when writing.

Reads on the fd will initially supply information about all
"interesting" HPT entries.  Interesting entries are those with the
bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
all entries.  When the end of the HPT is reached, the read() will
return.  If read() is called again on the fd, it will start again from
the beginning of the HPT, but will only return HPT entries that have
changed since they were last read.

Data read or written is structured as a header (8 bytes) followed by a
series of valid HPT entries (16 bytes each).  The header indicates how
many valid HPT entries there are and how many invalid entries follow
the valid entries.  The invalid entries are not represented explicitly
in the stream.  The header format is:

struct kvm_get_htab_header {
	__u32	index;
	__u16	n_valid;
	__u16	n_invalid;
};

Writes to the fd create HPT entries starting at the index given in the
header; first `n_valid' valid entries with contents from the data
written, then `n_invalid' invalid entries, invalidating any previously
valid entries found.
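
As a rough sketch of the stream format described above, the following
walks a buffer filled by read() on the HTAB fd and counts the valid
entries in it.  The struct is a local mirror of kvm_get_htab_header, and
htab_count_valid and HPTE_SIZE are hypothetical names used only for
illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the header layout documented above. */
struct htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

#define HPTE_SIZE 16	/* bytes per valid HPT entry in the stream */

/*
 * Walks a buffer obtained from read() on the HTAB fd and counts the
 * valid entries it carries.  Returns -1 if the buffer ends in the
 * middle of a header or of a run of valid entries.
 */
static long htab_count_valid(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	long total = 0;

	while (off < len) {
		struct htab_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		/* Only valid entries occupy space in the stream. */
		if (len - off < (size_t)hdr.n_valid * HPTE_SIZE)
			return -1;
		off += (size_t)hdr.n_valid * HPTE_SIZE;
		total += hdr.n_valid;
	}
	return total;
}
```

The n_invalid count is skipped here because invalid entries are not
present in the stream; a consumer that reconstructs the HPT would use
index, n_valid and n_invalid together to find where each run belongs.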

4.79 KVM_CREATE_DEVICE

Capability: KVM_CAP_DEVICE_CTRL
Type: vm ioctl
Parameters: struct kvm_create_device (in/out)
Returns: 0 on success, -1 on error
Errors:
  ENODEV: The device type is unknown or unsupported
  EEXIST: Device already created, and this type of device may not
          be instantiated multiple times

  Other error conditions may be defined by individual device types or
  have their standard meanings.

Creates an emulated device in the kernel.  The file descriptor returned
in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.

If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the
device type is supported (not necessarily whether it can be created
in the current vm).

Individual devices should not define flags.  Attributes should be used
for specifying any behavior that is not implied by the device type
number.

struct kvm_create_device {
	__u32	type;	/* in: KVM_DEV_TYPE_xxx */
	__u32	fd;	/* out: device handle */
	__u32	flags;	/* in: KVM_CREATE_DEVICE_xxx */
};

4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR

Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
Type: device ioctl, vm ioctl
Parameters: struct kvm_device_attr
Returns: 0 on success, -1 on error
Errors:
  ENXIO:  The group or attribute is unknown/unsupported for this device
  EPERM:  The attribute cannot (currently) be accessed this way
          (e.g. read-only attribute, or attribute that only makes
          sense when the device is in a different state)

  Other error conditions may be defined by individual device types.

Gets/sets a specified piece of device configuration and/or state.  The
semantics are device-specific.  See individual device documentation in
the "devices" directory.  As with ONE_REG, the size of the data
transferred is defined by the particular attribute.

struct kvm_device_attr {
	__u32	flags;		/* no flags currently defined */
	__u32	group;		/* device-defined */
	__u64	attr;		/* group-defined */
	__u64	addr;		/* userspace address of attr data */
};

4.81 KVM_HAS_DEVICE_ATTR

Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
Type: device ioctl, vm ioctl
Parameters: struct kvm_device_attr
Returns: 0 on success, -1 on error
Errors:
  ENXIO:  The group or attribute is unknown/unsupported for this device

Tests whether a device supports a particular attribute.  A successful
return indicates the attribute is implemented.  It does not necessarily
indicate that the attribute can be read or written in the device's
current state.  "addr" is ignored.

4.82 KVM_ARM_VCPU_INIT

Capability: basic
Architectures: arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_vcpu_init (in)
Returns: 0 on success; -1 on error
Errors:
  EINVAL:    the target is unknown, or the combination of features is invalid.
  ENOENT:    a features bit specified is unknown.

This tells KVM what type of CPU to present to the guest, and what
optional features it should have.  This will cause a reset of the cpu
registers to their initial values.  If this is not called, KVM_RUN will
return ENOEXEC for that vcpu.

Note that because some registers reflect machine topology, all vcpus
should be created before this ioctl is invoked.

Possible features:
	- KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
	  Depends on KVM_CAP_ARM_PSCI.
	- KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
	  Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
	- KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU.
	  Depends on KVM_CAP_ARM_PSCI_0_2.


4.83 KVM_ARM_PREFERRED_TARGET

Capability: basic
Architectures: arm, arm64
Type: vm ioctl
Parameters: struct kvm_vcpu_init (out)
Returns: 0 on success; -1 on error
Errors:
  ENODEV:    no preferred target available for the host

This queries KVM for the preferred CPU target type which can be emulated
by KVM on the underlying host.

The ioctl returns a struct kvm_vcpu_init instance containing information
about the preferred CPU target type and recommended features for it.  The
kvm_vcpu_init->features bitmap returned will have feature bits set if
the preferred target recommends setting these features, but this is
not mandatory.

The information returned by this ioctl can be used to prepare an instance
of struct kvm_vcpu_init for the KVM_ARM_VCPU_INIT ioctl, which will result
in a VCPU matching the underlying host.


4.84 KVM_GET_REG_LIST

Capability: basic
Architectures: arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_reg_list (in/out)
Returns: 0 on success; -1 on error
Errors:
  E2BIG:     the reg index list is too big to fit in the array specified by
             the user (the number required will be written into n).

struct kvm_reg_list {
	__u64 n; /* number of registers in reg[] */
	__u64 reg[0];
};

This ioctl returns the guest registers that are supported for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.


4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)

Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
Architectures: arm, arm64
Type: vm ioctl
Parameters: struct kvm_arm_device_addr (in)
Returns: 0 on success, -1 on error
Errors:
  ENODEV: The device id is unknown
  ENXIO:  Device not supported on current system
  EEXIST: Address already set
  E2BIG:  Address outside guest physical address space
  EBUSY:  Address overlaps with other device range

struct kvm_arm_device_addr {
	__u64 id;
	__u64 addr;
};

Specify a device address in the guest's physical address space where guests
can access emulated or directly exposed devices, which the host kernel needs
to know about. The id field is an architecture specific identifier for a
specific device.

ARM/arm64 divides the id field into two parts, a device id and an
address type id specific to the individual device.

  bits:  | 63        ...       32 | 31    ...    16 | 15    ...    0 |
  field: |        0x00000000      |     device id   |  addr type id  |
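
The bit layout shown above could be packed with a helper along these
lines.  The macro and function names are hypothetical; only the bit
positions are taken from the diagram.

```c
#include <stdint.h>

/* Bit positions taken from the layout shown above. */
#define DEV_ID_SHIFT	16
#define DEV_ID_MASK	0xffffu
#define ADDR_TYPE_MASK	0xffffu

/* Hypothetical helper: pack device id and address type id into the id field. */
static uint64_t arm_device_addr_id(uint32_t device_id, uint32_t addr_type)
{
	return ((uint64_t)(device_id & DEV_ID_MASK) << DEV_ID_SHIFT) |
	       (addr_type & ADDR_TYPE_MASK);
}
```

The upper 32 bits of the result are always zero, matching the diagram.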

ARM/arm64 currently only require this when using the in-kernel GIC
support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2
as the device id.  When setting the base address for the guest's
mapping of the VGIC virtual CPU and distributor interface, the ioctl
must be called after calling KVM_CREATE_IRQCHIP, but before calling
KVM_RUN on any of the VCPUs.  Calling this ioctl twice for any of the
base addresses will return -EEXIST.

Note, this IOCTL is deprecated and the more flexible SET/GET_DEVICE_ATTR API
should be used instead.


4.86 KVM_PPC_RTAS_DEFINE_TOKEN

Capability: KVM_CAP_PPC_RTAS
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_rtas_token_args
Returns: 0 on success, -1 on error

Defines a token value for a RTAS (Run Time Abstraction Services)
service in order to allow it to be handled in the kernel.  The
argument struct gives the name of the service, which must be the name
of a service that has a kernel-side implementation.  If the token
value is non-zero, it will be associated with that service, and
subsequent RTAS calls by the guest specifying that token will be
handled by the kernel.  If the token value is 0, then any token
associated with the service will be forgotten, and subsequent RTAS
calls by the guest for that service will be passed to userspace to be
handled.


5. The kvm_run structure
------------------------

Application code obtains a pointer to the kvm_run structure by
mmap()ing a vcpu fd.  From that point, application code can control
execution by changing fields in kvm_run prior to calling the KVM_RUN
ioctl, and obtain information about the reason KVM_RUN returned by
looking up structure members.

struct kvm_run {
	/* in */
	__u8 request_interrupt_window;

Request that KVM_RUN return when it becomes possible to inject external
interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.

	__u8 padding1[7];

	/* out */
	__u32 exit_reason;

When KVM_RUN has returned successfully (return value 0), this informs
application code why KVM_RUN has returned.  Allowable values for this
field are detailed below.

	__u8 ready_for_interrupt_injection;

If request_interrupt_window has been specified, this field indicates
an interrupt can be injected now with KVM_INTERRUPT.

	__u8 if_flag;

The value of the current interrupt flag.  Only valid if in-kernel
local APIC is not used.

	__u8 padding2[2];

	/* in (pre_kvm_run), out (post_kvm_run) */
	__u64 cr8;

The value of the cr8 register.  Only valid if in-kernel local APIC is
not used.  Both input and output.

	__u64 apic_base;

The value of the APIC BASE msr.  Only valid if in-kernel local
APIC is not used.  Both input and output.

	union {
		/* KVM_EXIT_UNKNOWN */
		struct {
			__u64 hardware_exit_reason;
		} hw;

If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
reasons.  Further architecture-specific information is available in
hardware_exit_reason.

		/* KVM_EXIT_FAIL_ENTRY */
		struct {
			__u64 hardware_entry_failure_reason;
		} fail_entry;

If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
to unknown reasons.  Further architecture-specific information is
available in hardware_entry_failure_reason.

		/* KVM_EXIT_EXCEPTION */
		struct {
			__u32 exception;
			__u32 error_code;
		} ex;

Unused.

		/* KVM_EXIT_IO */
		struct {
#define KVM_EXIT_IO_IN  0
#define KVM_EXIT_IO_OUT 1
			__u8 direction;
			__u8 size; /* bytes */
			__u16 port;
			__u32 count;
			__u64 data_offset; /* relative to kvm_run start */
		} io;

If exit_reason is KVM_EXIT_IO, then the vcpu has
executed a port I/O instruction which could not be satisfied by kvm.
data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
where kvm expects application code to place the data for the next
KVM_RUN invocation (KVM_EXIT_IO_IN).  Data format is a packed array.
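
A minimal sketch of locating the data for a KVM_EXIT_IO exit, assuming
run_base is the address returned by mmap() on the vcpu fd.  The struct is
a local mirror of the io member above, and io_exit_element is a
hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the io member documented above. */
struct io_exit {
	uint8_t  direction;
	uint8_t  size;		/* bytes */
	uint16_t port;
	uint32_t count;
	uint64_t data_offset;	/* relative to kvm_run start */
};

/*
 * Returns a pointer to element i of the packed data array.  run_base is
 * the address at which the kvm_run structure was mmap()ed.
 */
static void *io_exit_element(void *run_base, const struct io_exit *io,
			     uint32_t i)
{
	return (unsigned char *)run_base + io->data_offset +
	       (size_t)i * io->size;
}
```

For KVM_EXIT_IO_OUT the elements hold the data written by the guest; for
KVM_EXIT_IO_IN userspace stores its data there before the next KVM_RUN.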

		struct {
			struct kvm_debug_exit_arch arch;
		} debug;

Unused.

		/* KVM_EXIT_MMIO */
		struct {
			__u64 phys_addr;
			__u8  data[8];
			__u32 len;
			__u8  is_write;
		} mmio;

If exit_reason is KVM_EXIT_MMIO, then the vcpu has
executed a memory-mapped I/O instruction which could not be satisfied
by kvm.  The 'data' member contains the written data if 'is_write' is
true, and should be filled by application code otherwise.

The 'data' member contains, in its first 'len' bytes, the value as it would
appear if the VCPU performed a load or store of the appropriate width directly
to the byte array.
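
The handling of the 'data' member might look like the following sketch.
It assumes a hypothetical device model consisting of a single 8-byte
register; the struct is a local mirror of the mmio member above and
mmio_handle is not part of the KVM API.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the mmio member documented above. */
struct mmio_exit {
	uint64_t phys_addr;
	uint8_t  data[8];
	uint32_t len;
	uint8_t  is_write;
};

/*
 * Hypothetical device model: a single 8-byte register backed by reg[].
 * Writes copy the first len bytes of data into the register; reads fill
 * data so that KVM can complete the guest load on the next KVM_RUN.
 */
static void mmio_handle(struct mmio_exit *m, uint8_t reg[8])
{
	uint32_t len = m->len > 8 ? 8 : m->len;

	if (m->is_write)
		memcpy(reg, m->data, len);
	else
		memcpy(m->data, reg, len);
}
```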

NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
      KVM_EXIT_PAPR_HCALL and KVM_EXIT_EPR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN.  The kernel side will first finish
incomplete operations and then check for pending signals.  Userspace
can re-enter the guest with an unmasked signal pending to complete
pending operations.

		/* KVM_EXIT_HYPERCALL */
		struct {
			__u64 nr;
			__u64 args[6];
			__u64 ret;
			__u32 longmode;
			__u32 pad;
		} hypercall;

Unused.  This was once used for 'hypercall to userspace'.  To implement
such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.

		/* KVM_EXIT_TPR_ACCESS */
		struct {
			__u64 rip;
			__u32 is_write;
			__u32 pad;
		} tpr_access;

To be documented (KVM_TPR_ACCESS_REPORTING).

		/* KVM_EXIT_S390_SIEIC */
		struct {
			__u8 icptcode;
			__u64 mask; /* psw upper half */
			__u64 addr; /* psw lower half */
			__u16 ipa;
			__u32 ipb;
		} s390_sieic;

s390 specific.

		/* KVM_EXIT_S390_RESET */
#define KVM_S390_RESET_POR       1
#define KVM_S390_RESET_CLEAR     2
#define KVM_S390_RESET_SUBSYSTEM 4
#define KVM_S390_RESET_CPU_INIT  8
#define KVM_S390_RESET_IPL       16
		__u64 s390_reset_flags;

s390 specific.

		/* KVM_EXIT_S390_UCONTROL */
		struct {
			__u64 trans_exc_code;
			__u32 pgm_code;
		} s390_ucontrol;

s390 specific. A page fault has occurred for a user controlled virtual
machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
resolved by the kernel.
The program code and the translation exception code that were placed
in the cpu's lowcore are presented here as defined by the z Architecture
Principles of Operation Book in the Chapter for Dynamic Address Translation
(DAT).

		/* KVM_EXIT_DCR */
		struct {
			__u32 dcrn;
			__u32 data;
			__u8  is_write;
		} dcr;

powerpc specific.

		/* KVM_EXIT_OSI */
		struct {
			__u64 gprs[32];
		} osi;

MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
hypercalls and exit with this exit struct that contains all the guest gprs.

If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
Userspace can now handle the hypercall and when it's done modify the gprs as
necessary. Upon guest entry all guest GPRs will then be replaced by the values
in this struct.

		/* KVM_EXIT_PAPR_HCALL */
		struct {
			__u64 nr;
			__u64 ret;
			__u64 args[9];
		} papr_hcall;

This is used on 64-bit PowerPC when emulating a pSeries partition,
e.g. with the 'pseries' machine type in qemu.  It occurs when the
guest does a hypercall using the 'sc 1' instruction.  The 'nr' field
contains the hypercall number (from the guest R3), and 'args' contains
the arguments (from the guest R4 - R12).  Userspace should put the
return code in 'ret' and any extra returned values in args[].
The possible hypercalls are defined in the Power Architecture Platform
Requirements (PAPR) document available from www.power.org (free
developer registration required to access it).

		/* KVM_EXIT_S390_TSCH */
		struct {
			__u16 subchannel_id;
			__u16 subchannel_nr;
			__u32 io_int_parm;
			__u32 io_int_word;
			__u32 ipb;
			__u8 dequeued;
		} s390_tsch;

s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
interrupt for the target subchannel has been dequeued and subchannel_id,
subchannel_nr, io_int_parm and io_int_word contain the parameters for that
interrupt. ipb is needed for instruction parameter decoding.

		/* KVM_EXIT_EPR */
		struct {
			__u32 epr;
		} epr;

On FSL BookE PowerPC chips, the interrupt controller has a fast
interrupt acknowledge path to the core. When the core successfully
delivers an interrupt, it automatically populates the EPR register with
the interrupt vector number and acknowledges the interrupt inside
the interrupt controller.

In case the interrupt controller lives in user space, we need to do
the interrupt acknowledge cycle through it to fetch the next to be
delivered interrupt vector using this exit.

It gets triggered whenever KVM_CAP_PPC_EPR is enabled and an
external interrupt has just been delivered into the guest. User space
should put the acknowledged interrupt vector into the 'epr' field.

		/* Fix the size of the union. */
		char padding[256];
	};

	/*
	 * shared registers between kvm and userspace.
	 * kvm_valid_regs specifies the register classes set by the host
	 * kvm_dirty_regs specifies the register classes dirtied by userspace
	 * struct kvm_sync_regs is architecture specific, as well as the
	 * bits for kvm_valid_regs and kvm_dirty_regs
	 */
	__u64 kvm_valid_regs;
	__u64 kvm_dirty_regs;
	union {
		struct kvm_sync_regs regs;
		char padding[1024];
	} s;

If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
certain guest registers without having to call SET/GET_*REGS. Thus we can
avoid some system call overhead if userspace has to handle the exit.
Userspace can query the validity of the structure by checking
kvm_valid_regs for specific bits. These bits are architecture specific
and usually define the validity of a group of registers (e.g. one bit
for general purpose registers).

};


4.87 KVM_GET_EMULATED_CPUID

Capability: KVM_CAP_EXT_EMUL_CPUID
Architectures: x86
Type: system ioctl
Parameters: struct kvm_cpuid2 (in/out)
Returns: 0 on success, -1 on error

struct kvm_cpuid2 {
	__u32 nent;
	__u32 flags;
	struct kvm_cpuid_entry2 entries[0];
};

The member 'flags' is used for passing flags from userspace.

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)

struct kvm_cpuid_entry2 {
	__u32 function;
	__u32 index;
	__u32 flags;
	__u32 eax;
	__u32 ebx;
	__u32 ecx;
	__u32 edx;
	__u32 padding[3];
};

This ioctl returns x86 cpuid features which are emulated by
kvm.  Userspace can use the information returned by this ioctl to query
which features are emulated by kvm instead of being present natively.

Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
structure with the 'nent' field indicating the number of entries in
the variable-size array 'entries'. If the number of entries is too low
to describe the cpu capabilities, an error (E2BIG) is returned. If the
number is too high, the 'nent' field is adjusted and an error (ENOMEM)
is returned. If the number is just right, the 'nent' field is adjusted
to the number of valid entries in the 'entries' array, which is then
filled.
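
Sizing the variable-length argument described above might be sketched as
follows.  The structs are local mirrors of the layouts in this section
(the real definitions come from the kernel headers), and cpuid2_alloc is
a hypothetical helper.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the structures documented above. */
struct cpuid_entry2 {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
	uint32_t eax;
	uint32_t ebx;
	uint32_t ecx;
	uint32_t edx;
	uint32_t padding[3];
};

struct cpuid2 {
	uint32_t nent;
	uint32_t flags;
	struct cpuid_entry2 entries[];
};

/*
 * Allocates a zeroed argument buffer with room for nent entries and
 * records nent in it.  Returns NULL if the allocation fails; the caller
 * releases the buffer with free().
 */
static struct cpuid2 *cpuid2_alloc(uint32_t nent)
{
	struct cpuid2 *c;

	c = calloc(1, sizeof(*c) + (size_t)nent * sizeof(c->entries[0]));
	if (c)
		c->nent = nent;
	return c;
}
```

A caller would pass the returned buffer to the ioctl and, if E2BIG is
reported, retry with a larger nent.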

The entries returned are the set CPUID bits of the respective features
which kvm emulates, as returned by the CPUID instruction, with unknown
or unsupported feature bits cleared.

Features like x2apic, for example, may not be present in the host cpu
but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
emulated efficiently and are thus not included here.

The fields in each entry are defined as follows:

  function: the eax value used to obtain the entry
  index: the ecx value used to obtain the entry (for entries that are
         affected by ecx)
  flags: an OR of zero or more of the following:
        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
        KVM_CPUID_FLAG_STATEFUL_FUNC:
           if cpuid for this function returns different values for successive
           invocations; there will be several entries with the same function,
           all with this flag set
        KVM_CPUID_FLAG_STATE_READ_NEXT:
           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
           the first entry to be read by a cpu
  eax, ebx, ecx, edx: the values returned by the cpuid instruction for
         this function/index combination


6. Capabilities that can be enabled
-----------------------------------

There are certain capabilities that change the behavior of the virtual CPU when
enabled. To enable them, please see section 4.37. Below you can find a list of
capabilities and what their effect on the vCPU is when enabling them.

The following information is provided along with the description:

  Architectures: which instruction set architectures support this capability.
      x86 includes both i386 and x86_64.

  Parameters: what parameters are accepted by the capability.

  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.


6.1 KVM_CAP_PPC_OSI

Architectures: ppc
Parameters: none
Returns: 0 on success; -1 on error

This capability enables interception of OSI hypercalls that otherwise would
be treated as normal system calls to be injected into the guest. OSI hypercalls
were invented by Mac-on-Linux to have a standardized communication mechanism
between the guest and the host.

When this capability is enabled, KVM_EXIT_OSI can occur.


6.2 KVM_CAP_PPC_PAPR

Architectures: ppc
Parameters: none
Returns: 0 on success; -1 on error

This capability enables interception of PAPR hypercalls. PAPR hypercalls are
done using the hypercall instruction "sc 1".

It also sets the guest privilege level to "supervisor" mode. Usually the guest
runs in "hypervisor" privilege mode with a few missing features.

In addition to the above, it changes the semantics of SDR1. In this mode, the
HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
HTAB invisible to the guest.

When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.


6.3 KVM_CAP_SW_TLB

Architectures: ppc
Parameters: args[0] is the address of a struct kvm_config_tlb
Returns: 0 on success; -1 on error

struct kvm_config_tlb {
	__u64 params;
	__u64 array;
	__u32 mmu_type;
	__u32 array_len;
};

Configures the virtual CPU's TLB array, establishing a shared memory area
between userspace and KVM.  The "params" and "array" fields are userspace
addresses of mmu-type-specific data structures.  The "array_len" field is a
safety mechanism, and should be set to the size in bytes of the memory that
userspace has reserved for the array.  It must be at least the size dictated
by "mmu_type" and "params".

While KVM_RUN is active, the shared region is under control of KVM.  Its
contents are undefined, and any modification by userspace results in
boundedly undefined behavior.

On return from KVM_RUN, the shared region will reflect the current state of
the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
to tell KVM which entries have been changed, prior to calling KVM_RUN again
on this vcpu.

For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
 - The "params" field is of type "struct kvm_book3e_206_tlb_params".
 - The "array" field points to an array of type "struct
   kvm_book3e_206_tlb_entry".
 - The array consists of all entries in the first TLB, followed by all
   entries in the second TLB.
 - Within a TLB, entries are ordered first by increasing set number.  Within a
   set, entries are ordered by way (increasing ESEL).
 - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
 - The tsize field of mas1 shall be set to 4K on TLB0, even though the
   hardware ignores this value for TLB0.
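
The set hash and entry ordering listed above can be sketched as follows.
The function names are hypothetical; tlb_size and tlb_ways stand for the
tlb_sizes[] and tlb_ways[] values of TLB0, and the mask assumes that
their quotient is a power of two.

```c
#include <stdint.h>

/*
 * Set index for TLB0 as given by the formula above.  tlb_size and
 * tlb_ways stand for the tlb_sizes[] and tlb_ways[] values of TLB0; the
 * mask assumes that their quotient is a power of two.
 */
static uint32_t tlb0_set_index(uint64_t mas2, uint32_t tlb_size,
			       uint32_t tlb_ways)
{
	uint32_t num_sets = tlb_size / tlb_ways;

	return (uint32_t)((mas2 >> 12) & (num_sets - 1));
}

/* Position in the shared array of the entry at (set, way) within TLB0. */
static uint32_t tlb0_array_index(uint32_t set, uint32_t way,
				 uint32_t tlb_ways)
{
	return set * tlb_ways + way;
}
```

Because TLB0 comes first in the shared array, the second function gives
the array position directly; entries of the second TLB would be offset
by the total number of TLB0 entries.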

6.4 KVM_CAP_S390_CSS_SUPPORT

Architectures: s390
Parameters: none
Returns: 0 on success; -1 on error

This capability enables support for handling of channel I/O instructions.

TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
handled in-kernel, while the other I/O instructions are passed to userspace.

When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
SUBCHANNEL intercepts.

6.5 KVM_CAP_PPC_EPR

Architectures: ppc
Parameters: args[0] defines whether the proxy facility is active
Returns: 0 on success; -1 on error

This capability enables or disables the delivery of interrupts through the
external proxy facility.

When enabled (args[0] != 0), every time the guest gets an external interrupt
delivered, it automatically exits into user space with a KVM_EXIT_EPR exit
to receive the topmost interrupt vector.

When disabled (args[0] == 0), behavior is as if this facility is unsupported.

When this capability is enabled, KVM_EXIT_EPR can occur.

6.6 KVM_CAP_IRQ_MPIC

Architectures: ppc
Parameters: args[0] is the MPIC device fd
            args[1] is the MPIC CPU number for this vcpu

This capability connects the vcpu to an in-kernel MPIC device.

6.7 KVM_CAP_IRQ_XICS

Architectures: ppc
Parameters: args[0] is the XICS device fd
            args[1] is the XICS CPU number (server ID) for this vcpu

This capability connects the vcpu to an in-kernel XICS device.
-												KVM: Document basic API

Document the basic API corresponding to the 2.6.22 release.

Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2009-06-09 09:37:58 +00:00
+								The Definitive KVM (Kernel-based Virtual Machine) API Documentation
 								===================================================================
 . General description
-												KVM: Improve readability of KVM API doc

This helps to identify sections and it also fixes the numbering from
4.54 to 4.61.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-04-24 14:40:15 +00:00
+								----------------------
-												KVM: Document basic API

Document the basic API corresponding to the 2.6.22 release.

Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2009-06-09 09:37:58 +00:00
 								The kvm API is a set of ioctls that are issued to control various aspects
 								of a virtual machine.  The ioctls belong to three classes
 								 - System ioctls: These query and set global attributes which affect the
 								   whole kvm subsystem.  In addition a system ioctl is used to create
 								   virtual machines
 								 - VM ioctls: These query and set attributes that affect an entire virtual
 								   machine, for example memory layout.  In addition a VM ioctl is used to
 								   create virtual cpus (vcpus).
 								   Only run VM ioctls from the same process (address space) that was used
 								   to create the VM.
 								 - vcpu ioctls: These query and set attributes that control the operation
 								   of a single virtual cpu.
 								   Only run vcpu ioctls from the same thread that was used to create the
 								   vcpu.
-												KVM: Improve readability of KVM API doc

This helps to identify sections and it also fixes the numbering from
4.54 to 4.61.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-04-24 14:40:15 +00:00
-												KVM: trivial document fixes

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2009-12-24 01:04:16 +00:00
+. File descriptors
-												KVM: Improve readability of KVM API doc

This helps to identify sections and it also fixes the numbering from
4.54 to 4.61.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-04-24 14:40:15 +00:00
+								-------------------
-												KVM: Document basic API

Document the basic API corresponding to the 2.6.22 release.

Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2009-06-09 09:37:58 +00:00
The kvm API is centered around file descriptors.  An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls.  A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls.  A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
and return a file descriptor pointing to it.  Finally, ioctls on a vcpu
fd can be used to control the vcpu, including the important task of
actually running guest code.

In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain sockets.  These
kinds of tricks are explicitly not supported by kvm.  While they will
not cause harm to the host, their actual behavior is not guaranteed by
the API.  The only supported use is one virtual machine per process,
and one vcpu per thread.

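As a rough illustration of the descriptor hierarchy described above, the
following sketch opens /dev/kvm, creates a VM and then a single vcpu with
id 0.  The helper name kvm_open_hierarchy() is hypothetical, and error
handling is reduced to returning -1.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: walk the fd hierarchy system -> VM -> vcpu.
 * Returns 0 if all three descriptors could be obtained, -1 otherwise.
 */
static int kvm_open_hierarchy(void)
{
	int kvm_fd, vm_fd, vcpu_fd;
	int ret = -1;

	kvm_fd = open("/dev/kvm", O_RDWR);             /* for system ioctls */
	if (kvm_fd < 0)
		return -1;

	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0UL);     /* for VM ioctls */
	if (vm_fd < 0)
		goto out_kvm;

	vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0UL);  /* for vcpu ioctls */
	if (vcpu_fd < 0)
		goto out_vm;

	ret = 0;
	close(vcpu_fd);
out_vm:
	close(vm_fd);
out_kvm:
	close(kvm_fd);
	return ret;
}
```

A real application would keep the three descriptors open and issue all
further ioctls from the process and thread that created them.
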
3. Extensions
-------------

As of Linux 2.6.22, the KVM ABI has been stabilized: no
backward-incompatible changes are allowed.  However, there is an extension
facility that allows backward-compatible extensions to the API to be
queried and used.

The extension mechanism is not based on the Linux version number.
Instead, kvm defines extension identifiers and a facility to query
whether a particular extension identifier is available.  If it is, a
set of ioctls is available for application use.

4. API description
------------------

This section describes ioctls that can be used to control kvm guests.
For each ioctl, the following information is provided along with a
description:

  Capability: which KVM extension provides this ioctl.  Can be 'basic',
      which means that it will be provided by any kernel that supports
      API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which
      means availability needs to be checked with KVM_CHECK_EXTENSION
      (see section 4.4).

  Architectures: which instruction set architectures provide this ioctl.
      x86 includes both i386 and x86_64.

  Type: system, vm, or vcpu.

  Parameters: what parameters are accepted by the ioctl.

  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.

4.1 KVM_GET_API_VERSION

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: the constant KVM_API_VERSION (=12)

This identifies the API version as the stable kvm API. It is not
expected that this number will change.  However, Linux 2.6.20 and
2.6.21 report earlier versions; these are not documented and not
supported.  Applications should refuse to run if KVM_GET_API_VERSION
returns a value other than 12.  If this check passes, all ioctls
described as 'basic' will be available.

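A minimal sketch of the version check recommended above.  The helper name
kvm_open_checked() is hypothetical and not part of the API; it hands back
an open /dev/kvm descriptor only if the reported version is 12.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: open /dev/kvm and refuse to continue unless the
 * stable API version (12) is reported.  Returns the fd, or -1 on failure.
 */
static int kvm_open_checked(void)
{
	int fd, version;

	fd = open("/dev/kvm", O_RDWR);
	if (fd < 0)
		return -1;

	version = ioctl(fd, KVM_GET_API_VERSION, 0UL);
	if (version != 12) {
		/* Error, or an undocumented pre-2.6.22 API: do not use it. */
		close(fd);
		return -1;
	}
	return fd;
}
```
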
4.2 KVM_CREATE_VM

Capability: basic
Architectures: all
Type: system ioctl
Parameters: machine type identifier (KVM_VM_*)
Returns: a VM fd that can be used to control the new virtual machine.

The new VM has no virtual cpus and no memory.  An mmap() of a VM fd
will access the virtual machine's physical address space; offset zero
corresponds to guest physical address zero.  Use of mmap() on a VM fd
is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
available.

You almost certainly want to use 0 as the machine type.

In order to create user controlled virtual machines on S390, check
KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as a
privileged user (CAP_SYS_ADMIN).

4.3 KVM_GET_MSR_INDEX_LIST

Capability: basic
Architectures: x86
Type: system ioctl
Parameters: struct kvm_msr_list (in/out)
Returns: 0 on success; -1 on error
Errors:
  E2BIG:     the msr index list is too big to fit in the array specified by
             the user.

struct kvm_msr_list {
	__u32 nmsrs; /* number of msrs in entries */
	__u32 indices[0];
};

This ioctl returns the guest msrs that are supported.  The list varies
by kvm version and host processor, but does not change otherwise.  The
user fills in the size of the indices array in nmsrs, and in return
kvm adjusts nmsrs to reflect the actual number of msrs and fills in
the indices array with their numbers.

Note: if kvm indicates support for MCE (KVM_CAP_MCE), then the MCE bank
MSRs are not returned in the MSR list, as different vcpus can have a
different number of banks, as set via the KVM_X86_SETUP_MCE ioctl.

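One way to size the indices array is to issue the ioctl twice.  The sketch
below assumes that the kernel stores the required count in nmsrs even when
the first call fails with E2BIG; this matches the behavior of x86 kernels
but is not spelled out in the description above.  The helper name
kvm_get_msr_list() is hypothetical, and the code only builds on x86.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

#if defined(__x86_64__) || defined(__i386__)
/*
 * Hypothetical helper: return a malloc'ed MSR index list for the given
 * /dev/kvm fd, or NULL on error.  The caller frees the result.
 */
static struct kvm_msr_list *kvm_get_msr_list(int kvm_fd)
{
	struct kvm_msr_list probe = { .nmsrs = 0 };
	struct kvm_msr_list *list;

	/* Probe with an empty array; E2BIG is the expected outcome. */
	if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, &probe) < 0 && errno != E2BIG)
		return NULL;

	/* Allocate the header plus probe.nmsrs index slots. */
	list = calloc(1, sizeof(*list) +
			 (size_t)probe.nmsrs * sizeof(list->indices[0]));
	if (!list)
		return NULL;

	list->nmsrs = probe.nmsrs;
	if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
}
#endif
```
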
4.4 KVM_CHECK_EXTENSION

Capability: basic
Architectures: all
Type: system ioctl
Parameters: extension identifier (KVM_CAP_*)
Returns: 0 if unsupported; 1 (or some other positive integer) if supported

The API allows the application to query about extensions to the core
kvm API.  Userspace passes an extension identifier (an integer) and
receives an integer that describes the extension availability.
Generally 0 means no and 1 means yes, but some extensions may report
additional information in the integer return value.

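A small wrapper can make the return convention easier to use.  In this
sketch the helper name kvm_check_cap() is hypothetical, and treating an
ioctl error as "unsupported" is a design choice of the example rather than
something the API requires.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: query one extension on a /dev/kvm fd.
 * Returns 0 if the extension is unsupported (or the query failed), and
 * the positive value reported by kvm otherwise.
 */
static int kvm_check_cap(int kvm_fd, unsigned long cap)
{
	int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);

	return ret > 0 ? ret : 0;
}
```

For example, kvm_check_cap(kvm_fd, KVM_CAP_USER_MEMORY) is nonzero when
userspace memory allocation is available.
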
4.5 KVM_GET_VCPU_MMAP_SIZE

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: size of vcpu mmap area, in bytes

The KVM_RUN ioctl (see section 4.10) communicates with userspace via a
shared memory region.  This ioctl returns the size of that region.  See
the KVM_RUN documentation for details.

4.6 KVM_SET_MEMORY_REGION

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: struct kvm_memory_region (in)
Returns: 0 on success, -1 on error

This ioctl is obsolete and has been removed.

4.7 KVM_CREATE_VCPU

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: vcpu id (apic id on x86)
Returns: vcpu fd on success, -1 on error

This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
in the range [0, max_vcpus).

The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS
capability of the KVM_CHECK_EXTENSION ioctl() at run-time.
The maximum possible value for max_vcpus can be retrieved using the
KVM_CAP_MAX_VCPUS capability of the KVM_CHECK_EXTENSION ioctl() at run-time.

If KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is 4.
If KVM_CAP_MAX_VCPUS does not exist, you should assume that max_vcpus is
the same as the value returned from KVM_CAP_NR_VCPUS.

On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
threads in one or more virtual CPU cores.  (This is because the
hardware requires all the hardware threads in a CPU core to be in the
same partition.)  The KVM_CAP_PPC_SMT capability indicates the number
of vcpus per virtual core (vcore).  The vcore id is obtained by
dividing the vcpu id by the number of vcpus per vcore.  The vcpus in a
given vcore will always be in the same physical core as each other
(though that might be a different physical core from time to time).
Userspace can control the threading (SMT) mode of the guest by its
allocation of vcpu ids.  For example, if userspace wants
single-threaded guest vcpus, it should make all vcpu ids be a multiple
of the number of vcpus per vcore.

For virtual cpus that have been created with S390 user controlled virtual
machines, the resulting vcpu fd can be memory mapped at page offset
KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
cpu's hardware control block.

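The rules for choosing vcpu ids in this section can be sketched as follows.
The struct vcpu_limits type and all three helper names are hypothetical.
kvm_vcpu_limits() applies the documented fallbacks (4 when KVM_CAP_NR_VCPUS
is missing, the recommended value when KVM_CAP_MAX_VCPUS is missing); the
other two helpers only restate the powerpc vcore arithmetic.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

struct vcpu_limits {
	int recommended;   /* from KVM_CAP_NR_VCPUS, or 4 if absent */
	int maximum;       /* from KVM_CAP_MAX_VCPUS, or recommended if absent */
};

/* Query both limits on a /dev/kvm fd, applying the documented fallbacks. */
static struct vcpu_limits kvm_vcpu_limits(int kvm_fd)
{
	struct vcpu_limits lim;
	int ret;

	ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, (unsigned long)KVM_CAP_NR_VCPUS);
	lim.recommended = ret > 0 ? ret : 4;

	ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, (unsigned long)KVM_CAP_MAX_VCPUS);
	lim.maximum = ret > 0 ? ret : lim.recommended;

	return lim;
}

/* powerpc book3s_hv: vcore id = vcpu id / vcpus per vcore. */
static unsigned int vcore_id(unsigned int vcpu_id, unsigned int per_vcore)
{
	return vcpu_id / per_vcore;
}

/* vcpu id for the n-th single-threaded guest vcpu: a multiple of per_vcore. */
static unsigned int single_threaded_vcpu_id(unsigned int n,
					    unsigned int per_vcore)
{
	return n * per_vcore;
}
```

With 4 vcpus per vcore (the value of KVM_CAP_PPC_SMT on the host), vcpu id
5 lands in vcore 1, and single-threaded guests would use ids 0, 4, 8, 12
and so on.
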
4.8 KVM_GET_DIRTY_LOG (vm ioctl)

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_dirty_log (in/out)
Returns: 0 on success, -1 on error

/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
	__u32 slot;
	__u32 padding1;
	union {
		void __user *dirty_bitmap; /* one bit per page */
		__u64 padding2;
	};
};

Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl.  Bit 0 is the first page in the
memory slot.  Ensure the entire structure is cleared to avoid padding
issues.

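Once the bitmap has been filled in, it can be walked with plain bit
arithmetic.  The sketch below does not issue the ioctl itself; it only
shows how a returned bitmap might be interpreted, assuming the
little-endian bit order (page n is bit n % 8 of byte n / 8).  Both helper
names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the given page of the slot is marked dirty, 0 otherwise. */
static int page_is_dirty(const uint8_t *bitmap, size_t page)
{
	return (bitmap[page / 8] >> (page % 8)) & 1;
}

/* Count the dirty pages among the first npages pages of the slot. */
static size_t count_dirty_pages(const uint8_t *bitmap, size_t npages)
{
	size_t i, n = 0;

	for (i = 0; i < npages; i++)
		n += (size_t)page_is_dirty(bitmap, i);
	return n;
}
```
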
4.9 KVM_SET_MEMORY_ALIAS

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_memory_alias (in)
Returns: 0 on success, -1 on error

This ioctl is obsolete and has been removed.

4.10 KVM_RUN

Capability: basic
Architectures: all
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error
Errors:
  EINTR:     an unmasked signal is pending

This ioctl is used to run a guest virtual cpu.  While there are no
explicit parameters, there is an implicit parameter block that can be
obtained by mmap()ing the vcpu fd at offset 0, with the size given by
KVM_GET_VCPU_MMAP_SIZE.  The parameter block is formatted as a 'struct
kvm_run' (see below).

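Obtaining the implicit parameter block might look like the sketch below.
The helper name kvm_map_run() is hypothetical.  Note that the size is
queried on the /dev/kvm fd (a system ioctl), while the mapping is made on
the vcpu fd.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: map the kvm_run block of a vcpu.
 * Returns the mapping and stores its size in *size_out, or returns NULL
 * on error (leaving *size_out untouched).
 */
static struct kvm_run *kvm_map_run(int kvm_fd, int vcpu_fd, size_t *size_out)
{
	int size;
	void *p;

	size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0UL);
	if (size < 0)
		return NULL;

	p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED,
		 vcpu_fd, 0);
	if (p == MAP_FAILED)
		return NULL;

	*size_out = (size_t)size;
	return p;
}
```
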
4.11 KVM_GET_REGS

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (out)
Returns: 0 on success, -1 on error

Reads the general purpose registers from the vcpu.

/* x86 */
struct kvm_regs {
	/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
	__u64 rax, rbx, rcx, rdx;
	__u64 rsi, rdi, rsp, rbp;
	__u64 r8,  r9,  r10, r11;
	__u64 r12, r13, r14, r15;
	__u64 rip, rflags;
};

4.12 KVM_SET_REGS

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (in)
Returns: 0 on success, -1 on error

Writes the general purpose registers into the vcpu.

See KVM_GET_REGS for the data structure.

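Because KVM_SET_REGS writes the whole structure, changing a single
register is usually done as a read-modify-write.  The sketch below shows
this for the x86 layout only; the helper name vcpu_set_rip() is
hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

#if defined(__x86_64__) || defined(__i386__)
/*
 * Hypothetical helper: set the guest instruction pointer while leaving
 * all other general purpose registers unchanged.
 * Returns 0 on success, -1 on error.
 */
static int vcpu_set_rip(int vcpu_fd, __u64 rip)
{
	struct kvm_regs regs;

	if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0)
		return -1;

	regs.rip = rip;

	return ioctl(vcpu_fd, KVM_SET_REGS, &regs) < 0 ? -1 : 0;
}
#endif
```
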
+.13 KVM_GET_SREGS
-												KVM: Document basic API

Document the basic API corresponding to the 2.6.22 release.

Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2009-06-09 09:37:58 +00:00
 								Capability: basic
 								Architectures: x86, ppc
 								Type: vcpu ioctl
 								Parameters: struct kvm_sregs (out)
 								Returns: 0 on success, -1 on error
 								Reads special registers from the vcpu.
 								/* x86 */
 								struct kvm_sregs {
 									struct kvm_segment cs, ds, es, fs, gs, ss;
 									struct kvm_segment tr, ldt;
 									struct kvm_dtable gdt, idt;
 									__u64 cr0, cr2, cr3, cr4, cr8;
 									__u64 efer;
 									__u64 apic_base;
 									__u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
 								};
 								/* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */
 								interrupt_bitmap is a bitmap of pending external interrupts.  At most
 								one bit may be set.  This interrupt has been acknowledged by the APIC
 								but not yet injected into the cpu core.
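
As a rough illustration of the bitmap layout described above, the following
sketch scans an interrupt_bitmap array for the single pending vector.  The
helper name, the EXAMPLE_ prefix and the word count of 4 (which assumes
KVM_NR_INTERRUPTS is 256) are assumptions made for this example and are not
part of the kvm API.  It also assumes vector v is stored as bit (v % 64) of
word (v / 64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: KVM_NR_INTERRUPTS == 256, i.e. 4 words. */
#define EXAMPLE_BITMAP_WORDS 4

/*
 * Hypothetical helper: return the pending external interrupt vector
 * recorded in an interrupt_bitmap, or -1 if no bit is set.
 */
static int example_pending_vector(const uint64_t *bitmap, size_t nwords)
{
	size_t word;
	int bit;

	assert(bitmap != NULL);
	for (word = 0; word < nwords; word++) {
		if (bitmap[word] == 0)
			continue;
		/* At most one bit may be set, so the first hit is the answer. */
		for (bit = 0; bit < 64; bit++) {
			if (bitmap[word] & ((uint64_t)1 << bit))
				return (int)(word * 64 + bit);
		}
	}
	return -1;
}
```

A caller would typically pass the interrupt_bitmap member of a struct
kvm_sregs filled in by KVM_GET_SREGS.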

4.14 KVM_SET_SREGS

 								Capability: basic
 								Architectures: x86, ppc
 								Type: vcpu ioctl
 								Parameters: struct kvm_sregs (in)
 								Returns: 0 on success, -1 on error
 								Writes special registers into the vcpu.  See KVM_GET_SREGS for the
 								data structures.

4.15 KVM_TRANSLATE

 								Capability: basic
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_translation (in/out)
 								Returns: 0 on success, -1 on error
 								Translates a virtual address according to the vcpu's current address
 								translation mode.
 								struct kvm_translation {
 									/* in */
 									__u64 linear_address;
 									/* out */
 									__u64 physical_address;
 									__u8  valid;
 									__u8  writeable;
 									__u8  usermode;
 									__u8  pad[5];
 								};

4.16 KVM_INTERRUPT

 								Capability: basic
 								Architectures: x86, ppc
 								Type: vcpu ioctl
 								Parameters: struct kvm_interrupt (in)
 								Returns: 0 on success, -1 on error
 								Queues a hardware interrupt vector to be injected.  This is only
 								useful if in-kernel local APIC or equivalent is not used.
 								/* for KVM_INTERRUPT */
 								struct kvm_interrupt {
 									/* in */
 									__u32 irq;
 								};
 								X86:
 								Note 'irq' is an interrupt vector, not an interrupt pin or line.
 								PPC:
 								Queues an external interrupt to be injected. This ioctl is overloaded
 								with 3 different irq values:
 								a) KVM_INTERRUPT_SET
 								  This injects an edge type external interrupt into the guest once it's ready
 								  to receive interrupts. Once injected, the interrupt is no longer pending.
 								b) KVM_INTERRUPT_UNSET
 								  This unsets any pending interrupt.
 								  Only available with KVM_CAP_PPC_UNSET_IRQ.
 								c) KVM_INTERRUPT_SET_LEVEL
 								  This injects a level type external interrupt into the guest context. The
 								  interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
 								  is triggered.
 								  Only available with KVM_CAP_PPC_IRQ_LEVEL.
 								Note that any value for 'irq' other than the ones stated above is invalid
 								and results in unexpected behavior.

4.17 KVM_DEBUG_GUEST

 								Capability: basic
 								Architectures: none
 								Type: vcpu ioctl
 								Parameters: none
 								Returns: -1 on error
 								Support for this has been removed.  Use KVM_SET_GUEST_DEBUG instead.

4.18 KVM_GET_MSRS

 								Capability: basic
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_msrs (in/out)
 								Returns: 0 on success, -1 on error
 								Reads model-specific registers from the vcpu.  Supported msr indices can
 								be obtained using KVM_GET_MSR_INDEX_LIST.
 								struct kvm_msrs {
 									__u32 nmsrs; /* number of msrs in entries */
 									__u32 pad;
 									struct kvm_msr_entry entries[0];
 								};
 								struct kvm_msr_entry {
 									__u32 index;
 									__u32 reserved;
 									__u64 data;
 								};
 								Application code should set the 'nmsrs' member (which indicates the
 								size of the entries array) and the 'index' member of each array entry.
 								kvm will fill in the 'data' member.
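
Because 'entries' is a variable-length trailing array, the caller has to size
the buffer by hand.  The sketch below shows one way to do that.  It uses local
mirror structures (the example_ names are invented here) so that it builds
without <linux/kvm.h>; real code would use struct kvm_msrs and
struct kvm_msr_entry directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Local mirrors of the layouts shown above, used only so this sketch is
 * self-contained.  Real code should use the kernel's definitions.
 */
struct example_msr_entry {
	uint32_t index;
	uint32_t reserved;
	uint64_t data;
};

struct example_msrs {
	uint32_t nmsrs;		/* number of msrs in entries */
	uint32_t pad;
	struct example_msr_entry entries[];
};

/* Bytes needed for a header followed by 'n' entries. */
static size_t example_msrs_size(uint32_t n)
{
	return sizeof(struct example_msrs) +
	       (size_t)n * sizeof(struct example_msr_entry);
}

/*
 * Hypothetical helper: allocate a zeroed request and fill in 'nmsrs'
 * and each 'index'.  The caller frees the result with free().
 */
static struct example_msrs *example_msrs_alloc(const uint32_t *indices,
					       uint32_t n)
{
	struct example_msrs *msrs;
	uint32_t i;

	assert(indices != NULL || n == 0);
	msrs = calloc(1, example_msrs_size(n));
	if (!msrs)
		return NULL;
	msrs->nmsrs = n;
	for (i = 0; i < n; i++)
		msrs->entries[i].index = indices[i];
	return msrs;
}
```

With the real structures, the buffer built this way is what would be handed to
the KVM_GET_MSRS ioctl on a vcpu fd; kvm then fills in each 'data' member.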

4.19 KVM_SET_MSRS

 								Capability: basic
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_msrs (in)
 								Returns: 0 on success, -1 on error
 								Writes model-specific registers to the vcpu.  See KVM_GET_MSRS for the
 								data structures.
 								Application code should set the 'nmsrs' member (which indicates the
 								size of the entries array), and the 'index' and 'data' members of each
 								array entry.

4.20 KVM_SET_CPUID

 								Capability: basic
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_cpuid (in)
 								Returns: 0 on success, -1 on error
 								Defines the vcpu responses to the cpuid instruction.  Applications
 								should use the KVM_SET_CPUID2 ioctl if available.
 								struct kvm_cpuid_entry {
 									__u32 function;
 									__u32 eax;
 									__u32 ebx;
 									__u32 ecx;
 									__u32 edx;
 									__u32 padding;
 								};
 								/* for KVM_SET_CPUID */
 								struct kvm_cpuid {
 									__u32 nent;
 									__u32 padding;
 									struct kvm_cpuid_entry entries[0];
 								};

4.21 KVM_SET_SIGNAL_MASK

 								Capability: basic
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_signal_mask (in)
 								Returns: 0 on success, -1 on error
 								Defines which signals are blocked during execution of KVM_RUN.  This
 								signal mask temporarily overrides the thread's signal mask.  Any
 								unblocked signal received (except SIGKILL and SIGSTOP, which retain
 								their traditional behaviour) will cause KVM_RUN to return with -EINTR.
 								Note the signal will only be delivered if not blocked by the original
 								signal mask.
 								/* for KVM_SET_SIGNAL_MASK */
 								struct kvm_signal_mask {
 									__u32 len;
 									__u8  sigset[0];
 								};
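
A common use of this ioctl is to keep one "kick" signal unblocked during
KVM_RUN while blocking everything else.  The sketch below builds such a mask.
It rests on two assumptions that should be checked against the target kernel:
that 'len' must be 8 bytes (a 64-signal set) and that signal N is stored as
bit N-1 in little-endian byte order.  The example_ names are invented for this
sketch, and the structure is a local mirror of struct kvm_signal_mask.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the kernel expects an 8-byte (64-signal) set on x86. */
#define EXAMPLE_SIGSET_BYTES 8

/* Local mirror of struct kvm_signal_mask, for a self-contained sketch. */
struct example_signal_mask {
	uint32_t len;
	uint8_t  sigset[];
};

/*
 * Hypothetical helper: block every signal except 'unblocked_signo'.
 * The caller frees the result with free().
 */
static struct example_signal_mask *example_mask_all_but(int unblocked_signo)
{
	struct example_signal_mask *mask;
	unsigned int bit;

	assert(unblocked_signo >= 1 &&
	       unblocked_signo <= EXAMPLE_SIGSET_BYTES * 8);
	mask = malloc(sizeof(*mask) + EXAMPLE_SIGSET_BYTES);
	if (!mask)
		return NULL;
	mask->len = EXAMPLE_SIGSET_BYTES;
	memset(mask->sigset, 0xff, EXAMPLE_SIGSET_BYTES);

	/* Assumed layout: signal N occupies bit N-1 of the set. */
	bit = (unsigned int)unblocked_signo - 1;
	mask->sigset[bit / 8] &= (uint8_t)~(1u << (bit % 8));
	return mask;
}
```

As noted above, SIGKILL and SIGSTOP keep their traditional behaviour
regardless of the bits set here.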

4.22 KVM_GET_FPU

 								Capability: basic
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_fpu (out)
 								Returns: 0 on success, -1 on error
 								Reads the floating point state from the vcpu.
 								/* for KVM_GET_FPU and KVM_SET_FPU */
 								struct kvm_fpu {
 									__u8  fpr[8][16];
 									__u16 fcw;
 									__u16 fsw;
 									__u8  ftwx;  /* in fxsave format */
 									__u8  pad1;
 									__u16 last_opcode;
 									__u64 last_ip;
 									__u64 last_dp;
 									__u8  xmm[16][16];
 									__u32 mxcsr;
 									__u32 pad2;
 								};

4.23 KVM_SET_FPU

 								Capability: basic
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_fpu (in)
 								Returns: 0 on success, -1 on error
 								Writes the floating point state to the vcpu.
 								/* for KVM_GET_FPU and KVM_SET_FPU */
 								struct kvm_fpu {
 									__u8  fpr[8][16];
 									__u16 fcw;
 									__u16 fsw;
 									__u8  ftwx;  /* in fxsave format */
 									__u8  pad1;
 									__u16 last_opcode;
 									__u64 last_ip;
 									__u64 last_dp;
 									__u8  xmm[16][16];
 									__u32 mxcsr;
 									__u32 pad2;
 								};

4.24 KVM_CREATE_IRQCHIP

 								Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
 								Architectures: x86, ia64, ARM, arm64, s390
 								Type: vm ioctl
 								Parameters: none
 								Returns: 0 on success, -1 on error
 								Creates an interrupt controller model in the kernel.  On x86, creates a virtual
 								ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
 								local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
 								only go to the IOAPIC.  On ia64, an IOSAPIC is created. On ARM/arm64, a GIC is
 								created. On s390, a dummy irq routing table is created.
 								Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
 								before KVM_CREATE_IRQCHIP can be used.

4.25 KVM_IRQ_LINE

 								Capability: KVM_CAP_IRQCHIP
 								Architectures: x86, ia64, arm, arm64
 								Type: vm ioctl
 								Parameters: struct kvm_irq_level
 								Returns: 0 on success, -1 on error
 								Sets the level of a GSI input to the interrupt controller model in the kernel.
 								On some architectures it is required that an interrupt controller model has
 								been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
 								interrupts require the level to be set to 1 and then back to 0.
 								On real hardware, interrupt pins can be active-low or active-high.  This
 								does not matter for the level field of struct kvm_irq_level: 1 always
 								means active (asserted), 0 means inactive (deasserted).
 								x86 allows the operating system to program the interrupt polarity
 								(active-low/active-high) for level-triggered interrupts, and KVM used
 								to consider the polarity.  However, due to bitrot in the handling of
 								active-low interrupts, the above convention is now valid on x86 too.
 								This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED.  Userspace
 								should not present interrupts to the guest as active-low unless this
 								capability is present (or unless it is not using the in-kernel irqchip,
 								of course).
 								ARM/arm64 can signal an interrupt either at the CPU level, or at the
 								in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
 								use PPIs designated for specific cpus.  The irq field is interpreted
 								like this:
 								  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
 								  field: | irq_type  | vcpu_index |     irq_id     |
 								The irq_type field has the following values:
 								- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
 								- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
 								               (the vcpu_index field is ignored)
 								- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
 								(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
 								In both cases, level is used to assert/deassert the line.
 								struct kvm_irq_level {
 									union {
 										__u32 irq;     /* GSI */
 										__s32 status;  /* not used for KVM_IRQ_LEVEL */
 									};
 									__u32 level;           /* 0 or 1 */
 								};
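
The ARM/arm64 packing of the 'irq' field described above can be sketched as a
small helper.  The enumerator and function names are invented for this example
(they are not taken from the kernel headers); only the bit positions and the
irq_id ranges come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* irq_type values as listed above; the enumerator names are made up. */
enum example_arm_irq_type {
	EXAMPLE_IRQ_TYPE_CPU = 0,	/* out-of-kernel GIC: 0 = IRQ, 1 = FIQ */
	EXAMPLE_IRQ_TYPE_SPI = 1,	/* in-kernel GIC: SPI, irq_id 32..1019 */
	EXAMPLE_IRQ_TYPE_PPI = 2,	/* in-kernel GIC: PPI, irq_id 16..31 */
};

/*
 * Hypothetical helper: pack irq_type (bits 31..24), vcpu_index
 * (bits 23..16) and irq_id (bits 15..0) into the 'irq' field.
 */
static uint32_t example_arm_irq(enum example_arm_irq_type type,
				uint8_t vcpu_index, uint16_t irq_id)
{
	if (type == EXAMPLE_IRQ_TYPE_SPI)
		assert(irq_id >= 32 && irq_id <= 1019);
	else if (type == EXAMPLE_IRQ_TYPE_PPI)
		assert(irq_id >= 16 && irq_id <= 31);
	else
		assert(irq_id <= 1);

	return ((uint32_t)type << 24) | ((uint32_t)vcpu_index << 16) | irq_id;
}
```

The returned value would be stored in the 'irq' member of struct kvm_irq_level,
with 'level' set to 1 to assert the line and 0 to deassert it.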

4.26 KVM_GET_IRQCHIP

 								Capability: KVM_CAP_IRQCHIP
 								Architectures: x86, ia64
 								Type: vm ioctl
 								Parameters: struct kvm_irqchip (in/out)
 								Returns: 0 on success, -1 on error
 								Reads the state of a kernel interrupt controller created with
 								KVM_CREATE_IRQCHIP into a buffer provided by the caller.
 								struct kvm_irqchip {
 									__u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
 									__u32 pad;
 									union {
 										char dummy[512];  /* reserving space */
 										struct kvm_pic_state pic;
 										struct kvm_ioapic_state ioapic;
 									} chip;
 								};
+4.27 KVM_SET_IRQCHIP
 								Capability: KVM_CAP_IRQCHIP
 								Architectures: x86, ia64
 								Type: vm ioctl
 								Parameters: struct kvm_irqchip (in)
 								Returns: 0 on success, -1 on error
 								Sets the state of a kernel interrupt controller created with
 								KVM_CREATE_IRQCHIP from a buffer provided by the caller.
 								struct kvm_irqchip {
 									__u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
 									__u32 pad;
 									union {
 										char dummy[512];  /* reserving space */
 										struct kvm_pic_state pic;
 										struct kvm_ioapic_state ioapic;
 									} chip;
 								};
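The two irqchip ioctls are typically used as a pair, for example around
save/restore.  The following is a sketch under the assumption that
<linux/kvm.h> is available on an x86 build host; save_irqchips() and
restore_irqchips() are hypothetical helper names.  It shows that chip_id is
the input half of the in/out parameter of KVM_GET_IRQCHIP.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

#define NR_DOC_IRQCHIPS 3  /* chip_id 0 = PIC1, 1 = PIC2, 2 = IOAPIC */

/* Read the state of every in-kernel interrupt controller into saved[]. */
int save_irqchips(int vm_fd, struct kvm_irqchip saved[NR_DOC_IRQCHIPS])
{
	for (unsigned int id = 0; id < NR_DOC_IRQCHIPS; id++) {
		memset(&saved[id], 0, sizeof(saved[id]));
		saved[id].chip_id = id;  /* tells the kernel which chip to read */
		if (ioctl(vm_fd, KVM_GET_IRQCHIP, &saved[id]) < 0)
			return -1;
	}
	return 0;
}

/* Write previously saved controller state back into the VM. */
int restore_irqchips(int vm_fd, struct kvm_irqchip saved[NR_DOC_IRQCHIPS])
{
	for (unsigned int id = 0; id < NR_DOC_IRQCHIPS; id++) {
		if (ioctl(vm_fd, KVM_SET_IRQCHIP, &saved[id]) < 0)
			return -1;
	}
	return 0;
}
```

Both helpers stop at the first failing chip and leave errno as set by ioctl().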
+4.28 KVM_XEN_HVM_CONFIG
-												KVM: Xen PV-on-HVM guest support

Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files.  When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]

Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2009-10-15 22:21:43 +00:00
 								Capability: KVM_CAP_XEN_HVM
 								Architectures: x86
 								Type: vm ioctl
 								Parameters: struct kvm_xen_hvm_config (in)
 								Returns: 0 on success, -1 on error
 								Sets the MSR that the Xen HVM guest uses to initialize its hypercall
 								page, and provides the starting address and size of the hypercall
 								blobs in userspace.  When the guest writes the MSR, kvm copies one
 								page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
 								memory.
 								struct kvm_xen_hvm_config {
 									__u32 flags;
 									__u32 msr;
 									__u64 blob_addr_32;
 									__u64 blob_addr_64;
 									__u8 blob_size_32;
 									__u8 blob_size_64;
 									__u8 pad2[30];
 								};
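A hedged sketch of filling in this structure, assuming <linux/kvm.h> on an
x86 build host.  The helper name set_xen_hvm_config() is hypothetical, and
the sketch assumes that the two size fields count pages, which their __u8
width suggests but the text above does not state.  The MSR number is chosen
by userspace and passed in unchanged.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Tell KVM which MSR the guest writes to set up its hypercall page and
 * where the 32-bit and 64-bit blobs live in this process.
 */
int set_xen_hvm_config(int vm_fd, uint32_t msr,
		       const void *blob32, uint8_t pages32,
		       const void *blob64, uint8_t pages64)
{
	struct kvm_xen_hvm_config cfg;

	memset(&cfg, 0, sizeof(cfg));  /* flags and pad2 stay zero */
	cfg.msr = msr;
	cfg.blob_addr_32 = (uintptr_t)blob32;
	cfg.blob_addr_64 = (uintptr_t)blob64;
	cfg.blob_size_32 = pages32;    /* assumed unit: pages */
	cfg.blob_size_64 = pages64;

	return ioctl(vm_fd, KVM_XEN_HVM_CONFIG, &cfg);
}
```

The blobs must stay mapped for the lifetime of the VM, since KVM copies from
them each time the guest writes the MSR.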
+4.29 KVM_GET_CLOCK
-												KVM: allow userspace to adjust kvmclock offset

When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2009-10-16 19:28:36 +00:00
 								Capability: KVM_CAP_ADJUST_CLOCK
 								Architectures: x86
 								Type: vm ioctl
 								Parameters: struct kvm_clock_data (out)
 								Returns: 0 on success, -1 on error
 								Gets the current timestamp of kvmclock as seen by the current guest. In
 								conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios
 								such as migration.
 								struct kvm_clock_data {
 									__u64 clock;  /* kvmclock current value */
 									__u32 flags;
 									__u32 pad[9];
 								};
+4.30 KVM_SET_CLOCK
 								Capability: KVM_CAP_ADJUST_CLOCK
 								Architectures: x86
 								Type: vm ioctl
 								Parameters: struct kvm_clock_data (in)
 								Returns: 0 on success, -1 on error
-												KVM: trivial document fixes

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2009-12-24 01:04:16 +00:00
+								Sets the current timestamp of kvmclock to the value specified in its parameter.
+								In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios
 								such as migration.
 								struct kvm_clock_data {
 									__u64 clock;  /* kvmclock current value */
 									__u32 flags;
 									__u32 pad[9];
 								};
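The get/set pair can be sketched as follows for a migration: the source reads
the clock, the value travels with the rest of the VM state, and the
destination writes it before the guest resumes.  This assumes <linux/kvm.h>;
read_kvmclock() and write_kvmclock() are illustrative names only.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Source side: fetch the current kvmclock value of the VM. */
int read_kvmclock(int vm_fd, uint64_t *clock_out)
{
	struct kvm_clock_data data;

	memset(&data, 0, sizeof(data));
	if (ioctl(vm_fd, KVM_GET_CLOCK, &data) < 0)
		return -1;

	*clock_out = data.clock;
	return 0;
}

/* Destination side: continue the guest clock from the transferred value. */
int write_kvmclock(int vm_fd, uint64_t clock)
{
	struct kvm_clock_data data;

	memset(&data, 0, sizeof(data));  /* keep flags and padding zero */
	data.clock = clock;
	return ioctl(vm_fd, KVM_SET_CLOCK, &data);
}
```

Zeroing the whole structure before each call keeps the flags field and the
padding free of stack garbage.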
+4.31 KVM_GET_VCPU_EVENTS
-												KVM: x86: Add KVM_GET/SET_VCPU_EVENTS

This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.

[avi: future-proof abi by adding a flags field]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2009-11-12 00:04:25 +00:00
 								Capability: KVM_CAP_VCPU_EVENTS
-												KVM: x86: Save&restore interrupt shadow mask

The interrupt shadow created by STI or MOV-SS-like operations is part of
the VCPU state and must be preserved across migration. Transfer it in
the spare padding field of kvm_vcpu_events.interrupt.

As a side effect we now have to make vmx_set_interrupt_shadow robust
against both shadow types being set. Give MOV SS a higher priority and
skip STI in that case to avoid that VMX throws a fault on next entry.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2010-02-19 18:38:07 +00:00
+								Extended by: KVM_CAP_INTR_SHADOW
+								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_vcpu_events (out)
 								Returns: 0 on success, -1 on error
 								Gets currently pending exceptions, interrupts, and NMIs as well as related
 								states of the vcpu.
 								struct kvm_vcpu_events {
 									struct {
 										__u8 injected;
 										__u8 nr;
 										__u8 has_error_code;
 										__u8 pad;
 										__u32 error_code;
 									} exception;
 									struct {
 										__u8 injected;
 										__u8 nr;
 										__u8 soft;
+										__u8 shadow;
+									} interrupt;
 									struct {
 										__u8 injected;
 										__u8 pending;
 										__u8 masked;
 										__u8 pad;
 									} nmi;
 									__u32 sipi_vector;
-												KVM: x86: Extend KVM_SET_VCPU_EVENTS with selective updates

User space may not want to overwrite asynchronously changing VCPU event
states on write-back. So allow to skip nmi.pending and sipi_vector by
setting corresponding bits in the flags field of kvm_vcpu_events.

[avi: advertise the bits in KVM_GET_VCPU_EVENTS]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2009-12-06 17:24:15 +00:00
+									__u32 flags;
+								};
+								KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
 								interrupt.shadow contains a valid state. Otherwise, this field is undefined.
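A short sketch of reading the events and honouring the validity flag,
assuming <linux/kvm.h> on an x86 build host.  The ioctl operates on a single
vcpu, so the hypothetical helper get_interrupt_shadow() takes a vcpu file
descriptor.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Returns 1 and stores the shadow mask if the kernel marked it valid,
 * 0 if interrupt.shadow is undefined, and -1 if the ioctl failed.
 */
int get_interrupt_shadow(int vcpu_fd, uint8_t *shadow)
{
	struct kvm_vcpu_events events;

	memset(&events, 0, sizeof(events));
	if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &events) < 0)
		return -1;

	if (!(events.flags & KVM_VCPUEVENT_VALID_SHADOW))
		return 0;  /* field contents are undefined, do not use them */

	*shadow = events.interrupt.shadow;
	return 1;
}
```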
+4.32 KVM_SET_VCPU_EVENTS
 								Capability: KVM_CAP_VCPU_EVENTS
+								Extended by: KVM_CAP_INTR_SHADOW
+								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_vcpu_events (in)
 								Returns: 0 on success, -1 on error
 								Sets pending exceptions, interrupts, and NMIs as well as related states of the
 								vcpu.
 								See KVM_GET_VCPU_EVENTS for the data structure.
+								Fields that may be modified asynchronously by running VCPUs can be excluded
 								from the update. These fields are nmi.pending and sipi_vector. Keep the
 								corresponding bits in the flags field cleared to suppress overwriting the
 								current in-kernel state. The bits are:
 								KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
 								KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
+								If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
 								the flags field to signal that interrupt.shadow contains a valid state and
 								shall be written into the VCPU.
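The selective update can be sketched like this, assuming <linux/kvm.h> on an
x86 build host and an events structure obtained earlier from
KVM_GET_VCPU_EVENTS.  write_back_vcpu_events() is a hypothetical helper; it
only toggles the two bits named above and leaves every other flag as the
caller set it.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Write events back to a vcpu.  With overwrite_async == 0 the fields that a
 * running vcpu may change on its own (nmi.pending, sipi_vector) are skipped.
 */
int write_back_vcpu_events(int vcpu_fd, struct kvm_vcpu_events *events,
			   int overwrite_async)
{
	const __u32 async_bits = KVM_VCPUEVENT_VALID_NMI_PENDING |
				 KVM_VCPUEVENT_VALID_SIPI_VECTOR;

	if (overwrite_async)
		events->flags |= async_bits;   /* transfer both fields */
	else
		events->flags &= ~async_bits;  /* keep in-kernel state */

	return ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, events);
}
```

A reset path would normally pass overwrite_async = 1, while a write-back of
state read from a live vcpu would pass 0.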
+4.33 KVM_GET_DEBUGREGS
-												KVM: x86: Add support for saving&restoring debug registers

So far user space was not able to save and restore debug registers for
migration or after reset. Plug this hole.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2010-02-15 09:45:43 +00:00
 								Capability: KVM_CAP_DEBUGREGS
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_debugregs (out)
 								Returns: 0 on success, -1 on error
 								Reads debug registers from the vcpu.
 								struct kvm_debugregs {
 									__u64 db[4];
 									__u64 dr6;
 									__u64 dr7;
 									__u64 flags;
 									__u64 reserved[9];
 								};
+4.34 KVM_SET_DEBUGREGS
 								Capability: KVM_CAP_DEBUGREGS
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: struct kvm_debugregs (in)
 								Returns: 0 on success, -1 on error
 								Writes debug registers into the vcpu.
 								See KVM_GET_DEBUGREGS for the data structure. The flags field is not yet
 								used and must be cleared on entry.
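A sketch of a save/restore pair for the debug registers, assuming
<linux/kvm.h> on an x86 build host.  The names save_debugregs() and
restore_debugregs() are illustrative.  The restore helper works on a copy so
that it can clear the flags field without modifying the caller's saved state.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Read DR0-DR3, DR6 and DR7 of one vcpu. */
int save_debugregs(int vcpu_fd, struct kvm_debugregs *regs)
{
	memset(regs, 0, sizeof(*regs));
	return ioctl(vcpu_fd, KVM_GET_DEBUGREGS, regs);
}

/* Write saved debug registers back, with flags cleared as required. */
int restore_debugregs(int vcpu_fd, const struct kvm_debugregs *saved)
{
	struct kvm_debugregs regs = *saved;

	regs.flags = 0;  /* no flags are defined yet */
	return ioctl(vcpu_fd, KVM_SET_DEBUGREGS, &regs);
}
```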
+4.35 KVM_SET_USER_MEMORY_REGION
-												KVM: Document KVM_SET_USER_MEMORY_REGION

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2010-03-25 10:16:48 +00:00
 								Capability: KVM_CAP_USER_MEM
 								Architectures: all
 								Type: vm ioctl
 								Parameters: struct kvm_userspace_memory_region (in)
 								Returns: 0 on success, -1 on error
 								struct kvm_userspace_memory_region {
 									__u32 slot;
 									__u32 flags;
 									__u64 guest_phys_addr;
 									__u64 memory_size; /* bytes */
 									__u64 userspace_addr; /* start of the userspace allocated memory */
 								};
 								/* for kvm_userspace_memory_region::flags */
-												KVM: introduce readonly memslot

In current code, if we map a readonly memory space from host to guest
and the page is not currently mapped in the host, we will get a fault
pfn and async is not allowed, then the vm will crash

We introduce readonly memory region to map ROM/ROMD to the guest, read access
is happy for readonly memslot, write access on readonly memslot will cause
KVM_EXIT_MMIO exit

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2012-08-21 03:02:51 +00:00
+								#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 								#define KVM_MEM_READONLY	(1UL << 1)
 								This ioctl allows the user to create or modify a guest physical memory
 								slot.  When changing an existing slot, it may be moved in the guest
 								physical memory space, or its flags may be modified.  It may not be
 								resized.  Slots may not overlap in guest physical address space.
 								Memory for the region is taken starting at the address denoted by the
 								field userspace_addr, which must point at user addressable memory for
 								the entire memory slot size.  Any object may back this memory, including
 								anonymous memory, ordinary files, and hugetlbfs.
 								It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 								be identical.  This allows large pages in the guest to be backed by large
 								pages in the host.
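The slot setup and the alignment recommendation can be sketched as follows,
assuming <linux/kvm.h>.  large_page_friendly() and set_memory_slot() are
hypothetical helpers; the first one merely checks that the low 21 bits
(2 MiB) of the two addresses agree.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#define LOW_21_BITS ((UINT64_C(1) << 21) - 1)

/* Non-zero when guest_phys_addr and userspace_addr share their low 21 bits. */
int large_page_friendly(uint64_t guest_phys_addr, uint64_t userspace_addr)
{
	return ((guest_phys_addr ^ userspace_addr) & LOW_21_BITS) == 0;
}

/* Create or modify one guest physical memory slot. */
int set_memory_slot(int vm_fd, uint32_t slot, uint64_t guest_phys_addr,
		    void *host_mem, uint64_t size, uint32_t flags)
{
	struct kvm_userspace_memory_region region;

	memset(&region, 0, sizeof(region));
	region.slot = slot;
	region.flags = flags;
	region.guest_phys_addr = guest_phys_addr;
	region.memory_size = size;  /* bytes */
	region.userspace_addr = (uintptr_t)host_mem;

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}
```

A caller that wants large pages could over-allocate with mmap() and round the
host address up until large_page_friendly() holds for the chosen guest
address.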
-												KVM: set_memory_region: Disallow changing read-only attribute later

As Xiao pointed out, there are a few problems with it:
 - kvm_arch_commit_memory_region() write protects the memory slot only
   for GET_DIRTY_LOG when modifying the flags.
 - FNAME(sync_page) uses the old spte value to set a new one without
   checking KVM_MEM_READONLY flag.

Since we flush all shadow pages when creating a new slot, the simplest
fix is to disallow such problematic flag changes: this is safe because
no one is doing such things.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2013-01-30 10:40:41 +00:00
+								The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
 								KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
 								writes to memory within the slot.  See the KVM_GET_DIRTY_LOG ioctl for how to
 								use it.  The latter can be set, if the KVM_CAP_READONLY_MEM capability allows
 								it, to make a new slot read-only.  In this case, writes to this memory will be
 								posted to userspace as KVM_EXIT_MMIO exits.
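One way to combine the two flags, sketched under the assumption that
<linux/kvm.h> is available.  readonly_mem_supported() and choose_slot_flags()
are hypothetical names.  The capability is queried with KVM_CHECK_EXTENSION,
and a read-only request is rejected rather than silently downgraded when the
capability is missing.

```c
#include <errno.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* 1 if KVM reports KVM_CAP_READONLY_MEM, 0 if not or if the query fails. */
int readonly_mem_supported(int kvm_fd)
{
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_READONLY_MEM) > 0;
}

/*
 * Compute the flags value for a new slot.  Returns 0 and stores the flags,
 * or -1 if a read-only slot was requested without capability support.
 */
int choose_slot_flags(int log_dirty, int readonly, int readonly_cap,
		      uint32_t *flags)
{
	uint32_t f = 0;

	if (log_dirty)
		f |= KVM_MEM_LOG_DIRTY_PAGES;
	if (readonly) {
		if (!readonly_cap) {
			errno = ENOTSUP;
			return -1;
		}
		f |= KVM_MEM_READONLY;
	}

	*flags = f;
	return 0;
}
```

The resulting value goes into the flags field of
struct kvm_userspace_memory_region when the slot is created.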
-												KVM: Improve wording of KVM_SET_USER_MEMORY_REGION documentation

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2012-09-07 11:17:47 +00:00
 								When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 								the memory region are automatically reflected into the guest.  For example, an
 								mmap() that affects the region will be made visible immediately.  Another
 								example is madvise(MADV_DONTNEED).
 								It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
 								KVM_SET_MEMORY_REGION does not allow fine-grained control over memory
 								allocation and is deprecated.
+4.36 KVM_SET_TSS_ADDR
-												KVM: Document KVM_SET_TSS_ADDR

Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2010-03-25 10:27:30 +00:00
 								Capability: KVM_CAP_SET_TSS_ADDR
 								Architectures: x86
 								Type: vm ioctl
 								Parameters: unsigned long tss_address (in)
 								Returns: 0 on success, -1 on error
 								This ioctl defines the physical address of a three-page region in the guest
 								physical address space.  The region must be within the first 4GB of the
 								guest physical address space and must not conflict with any memory slot
 								or any mmio address.  The guest may malfunction if it accesses this memory
 								region.
 								This ioctl is required on Intel-based hosts because of a quirk in the
 								hardware's virtualization implementation (see the internals documentation
 								when it pops into existence).
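The placement rule can be sketched as a small check before the call, assuming
<linux/kvm.h> on an x86 build host and 4 KiB pages.  tss_region_fits() and
set_tss_addr() are hypothetical helpers.  The check covers only the 4GB
limit; avoiding memory slots and mmio ranges is left to the caller.

```c
#include <errno.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define TSS_REGION_BYTES (UINT64_C(3) * 4096)  /* three pages, 4 KiB assumed */
#define FOUR_GB          (UINT64_C(1) << 32)

/* Non-zero when the three-page region ends at or below the 4GB boundary. */
int tss_region_fits(uint64_t addr)
{
	return addr <= FOUR_GB - TSS_REGION_BYTES;
}

/* Validate the address, then hand it to KVM. */
int set_tss_addr(int vm_fd, unsigned long addr)
{
	if (!tss_region_fits(addr)) {
		errno = EINVAL;
		return -1;
	}
	/* The address is passed by value, not through a pointer. */
	return ioctl(vm_fd, KVM_SET_TSS_ADDR, addr);
}
```

For example, set_tss_addr(vm_fd, 0xfffbd000) places the region just below
4GB; that address is only an illustration and must not overlap anything the
guest uses.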
+4.37 KVM_ENABLE_CAP
-												KVM: Add support for enabling capabilities per-vcpu

Some times we don't want all capabilities to be available to all
our vcpus. One example for that is the OSI interface, implemented
in the next patch.

In order to have a generic mechanism in how to enable capabilities
individually, this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2010-03-24 20:48:29 +00:00
-												KVM: Add per-vm capability enablement.

Allow KVM_ENABLE_CAP to act on a vm as well as on a vcpu. This makes more
sense when the caller wants to enable a vm-related capability.

s390 will be the first user; wire it up.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

											
										
										
											2013-10-23 16:26:34 +00:00
+								Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
-												KVM: s390: Base infrastructure for enabling capabilities.

Make s390 support KVM_ENABLE_CAP.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-12-20 14:32:11 +00:00
+								Architectures: ppc, s390
-												KVM: Add per-vm capability enablement.

Allow KVM_ENABLE_CAP to act on a vm as well as on a vcpu. This makes more
sense when the caller wants to enable a vm-related capability.

s390 will be the first user; wire it up.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

											
										
										
											2013-10-23 16:26:34 +00:00
+								Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
-												KVM: Add support for enabling capabilities per-vcpu

Some times we don't want all capabilities to be available to all
our vcpus. One example for that is the OSI interface, implemented
in the next patch.

In order to have a generic mechanism in how to enable capabilities
individually, this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2010-03-24 20:48:29 +00:00
+								Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error

Not all extensions are enabled by default.  Using this ioctl the application
can enable an extension, making it available to the guest.

On systems that do not support this ioctl, it always fails.  On systems that
do support it, it only works for extensions that are supported for enablement.

To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
be used.

struct kvm_enable_cap {
       /* in */
       __u32 cap;

The capability that is supposed to get enabled.

       __u32 flags;

A bitfield indicating future enhancements.  Has to be 0 for now.

       __u64 args[4];

Arguments for enabling a feature.  If a feature needs initial values to
function properly, this is the place to put them.

       __u8  pad[64];
};

The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
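
The following sketch shows one way to fill struct kvm_enable_cap and issue
the ioctl.  The helpers make_enable_cap() and enable_cap() are hypothetical
names, and the capability number and arguments are whatever the caller needs;
nothing here is specific to a particular capability.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Build a request for capability `cap` with up to four arguments.
 * flags, unused args and pad are zeroed, since flags has to be 0 for now.
 */
struct kvm_enable_cap make_enable_cap(__u32 cap, const __u64 *args,
				      unsigned int nargs)
{
	struct kvm_enable_cap req;
	unsigned int i;

	assert(nargs <= 4);
	memset(&req, 0, sizeof(req));
	req.cap = cap;
	for (i = 0; i < nargs; i++)
		req.args[i] = args[i];
	return req;
}

/*
 * Enable `cap` on fd: a vcpu fd for vcpu-specific capabilities, or a vm fd
 * for vm-wide ones (the latter requires KVM_CAP_ENABLE_CAP_VM).
 * kvm_fd is the /dev/kvm descriptor used for KVM_CHECK_EXTENSION.
 * Returns 0 on success, -1 on error.
 */
int enable_cap(int kvm_fd, int fd, __u32 cap, const __u64 *args,
	       unsigned int nargs)
{
	struct kvm_enable_cap req = make_enable_cap(cap, args, nargs);

	/* KVM_CHECK_EXTENSION returns 0 when the capability is unsupported. */
	if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, (unsigned long)cap) <= 0)
		return -1;
	return ioctl(fd, KVM_ENABLE_CAP, &req);
}
```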


4.38 KVM_GET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, ia64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (out)
Returns: 0 on success; -1 on error

struct kvm_mp_state {
	__u32 mp_state;
};

Returns the vcpu's current "multiprocessing state" (though also valid on
uniprocessor guests).

Possible values are:

 - KVM_MP_STATE_RUNNABLE:        the vcpu is currently running
 - KVM_MP_STATE_UNINITIALIZED:   the vcpu is an application processor (AP)
                                 which has not yet received an INIT signal
 - KVM_MP_STATE_INIT_RECEIVED:   the vcpu has received an INIT signal, and is
                                 now ready for a SIPI
 - KVM_MP_STATE_HALTED:          the vcpu has executed a HLT instruction and
                                 is waiting for an interrupt
 - KVM_MP_STATE_SIPI_RECEIVED:   the vcpu has just received a SIPI (vector
                                 accessible via KVM_GET_VCPU_EVENTS)

This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
irqchip, the multiprocessing state must be maintained by userspace.
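
As an illustration, the sketch below maps the values listed above to short
names and copies the state from one vcpu to another, as a save/restore path
might do.  mp_state_name() and copy_mp_state() are hypothetical helpers, and
both descriptors are assumed to be vcpu fds of VMs with an in-kernel irqchip.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Short name for an mp_state value; "UNKNOWN" for anything else. */
const char *mp_state_name(__u32 state)
{
	switch (state) {
	case KVM_MP_STATE_RUNNABLE:      return "RUNNABLE";
	case KVM_MP_STATE_UNINITIALIZED: return "UNINITIALIZED";
	case KVM_MP_STATE_INIT_RECEIVED: return "INIT_RECEIVED";
	case KVM_MP_STATE_HALTED:        return "HALTED";
	case KVM_MP_STATE_SIPI_RECEIVED: return "SIPI_RECEIVED";
	default:                         return "UNKNOWN";
	}
}

/* Read the state of one vcpu.  Returns 0 on success, -1 on error. */
int get_mp_state(int vcpu_fd, __u32 *state)
{
	struct kvm_mp_state mp;

	assert(state != NULL);
	if (ioctl(vcpu_fd, KVM_GET_MP_STATE, &mp) < 0)
		return -1;
	*state = mp.mp_state;
	return 0;
}

/* Transfer the state from src to dst.  Returns 0 on success, -1 on error. */
int copy_mp_state(int src_vcpu_fd, int dst_vcpu_fd)
{
	struct kvm_mp_state mp;

	if (ioctl(src_vcpu_fd, KVM_GET_MP_STATE, &mp) < 0)
		return -1;
	return ioctl(dst_vcpu_fd, KVM_SET_MP_STATE, &mp);
}
```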


4.39 KVM_SET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, ia64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (in)
Returns: 0 on success; -1 on error

Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
arguments.

This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
irqchip, the multiprocessing state must be maintained by userspace.


4.40 KVM_SET_IDENTITY_MAP_ADDR

Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long identity (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a one-page region in the guest
physical address space.  The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address.  The guest may malfunction if it accesses this memory
region.

This ioctl is required on Intel-based hosts because of a quirk in the
virtualization implementation (see the internals documentation when it pops
into existence).
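
The "must not conflict with any memory slot" rule can be checked in userspace
before the call.  The sketch below assumes the caller keeps its own table of
guest-physical ranges; struct gpa_range, ranges_overlap() and
set_identity_map_addr() are hypothetical names.  It also assumes the usual
uapi declaration of this ioctl, _IOW(KVMIO, 0x48, __u64), and therefore
passes a pointer to a 64-bit value rather than the address itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* A guest-physical range [start, start + len), e.g. one memory slot. */
struct gpa_range {
	uint64_t start;
	uint64_t len;
};

/* Returns 1 if the two ranges share at least one byte, else 0. */
int ranges_overlap(struct gpa_range a, struct gpa_range b)
{
	if (a.len == 0 || b.len == 0)
		return 0;
	if (a.start <= b.start)
		return b.start - a.start < a.len;
	return a.start - b.start < b.len;
}

/*
 * Place the one-page identity map at `base` unless it leaves the first
 * 4 GiB or collides with one of the caller's ranges.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_identity_map_addr(int vm_fd, uint64_t base,
			  const struct gpa_range *slots, size_t nslots)
{
	const struct gpa_range page = { base, 4096 };
	__u64 arg = base;
	size_t i;

	assert(slots != NULL || nslots == 0);
	if (base > (1ULL << 32) - 4096) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < nslots; i++) {
		if (ranges_overlap(page, slots[i])) {
			errno = EINVAL;
			return -1;
		}
	}
	return ioctl(vm_fd, KVM_SET_IDENTITY_MAP_ADDR, &arg);
}
```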


4.41 KVM_SET_BOOT_CPU_ID

Capability: KVM_CAP_SET_BOOT_CPU_ID
Architectures: x86, ia64
Type: vm ioctl
Parameters: unsigned long vcpu_id (in)
Returns: 0 on success, -1 on error

Defines which vcpu is the Bootstrap Processor (BSP).  Values are the same
as the vcpu id in KVM_CREATE_VCPU.  If this ioctl is not called, the default
is vcpu 0.


4.42 KVM_GET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (out)
Returns: 0 on success, -1 on error

struct kvm_xsave {
	__u32 region[1024];
};

This ioctl copies the current vcpu's xsave struct to userspace.


4.43 KVM_SET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (in)
Returns: 0 on success, -1 on error

struct kvm_xsave {
	__u32 region[1024];
};

This ioctl copies userspace's xsave struct to the kernel.
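
KVM_GET_XSAVE and KVM_SET_XSAVE are typically used as a pair, for example
when moving vcpu state between two VMs.  A minimal sketch, assuming an x86
host whose headers provide struct kvm_xsave; copy_xsave() is a hypothetical
helper and both arguments are vcpu fds.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Read the xsave area of one vcpu and load it into another.
 * Returns 0 on success, -1 on error.
 */
int copy_xsave(int src_vcpu_fd, int dst_vcpu_fd)
{
	struct kvm_xsave xsave;

	/* The documented layout is 1024 32-bit words, i.e. 4096 bytes. */
	assert(sizeof(xsave) == 4096);

	if (ioctl(src_vcpu_fd, KVM_GET_XSAVE, &xsave) < 0)
		return -1;
	return ioctl(dst_vcpu_fd, KVM_SET_XSAVE, &xsave);
}
```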


4.44 KVM_GET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (out)
Returns: 0 on success, -1 on error

struct kvm_xcr {
	__u32 xcr;
	__u32 reserved;
	__u64 value;
};

struct kvm_xcrs {
	__u32 nr_xcrs;
	__u32 flags;
	struct kvm_xcr xcrs[KVM_MAX_XCRS];
	__u64 padding[16];
};

This ioctl copies the current vcpu's xcrs to userspace.


4.45 KVM_SET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (in)
Returns: 0 on success, -1 on error

struct kvm_xcr {
	__u32 xcr;
	__u32 reserved;
	__u64 value;
};

struct kvm_xcrs {
	__u32 nr_xcrs;
	__u32 flags;
	struct kvm_xcr xcrs[KVM_MAX_XCRS];
	__u64 padding[16];
};

This ioctl sets the vcpu's xcrs to the values specified by userspace.
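
To show how the nr_xcrs and xcrs[] fields fit together, the sketch below
searches a filled struct kvm_xcrs for one register.  It assumes an x86 host;
xcrs_lookup() and get_xcr0() are hypothetical helpers, and index 0 is used
because XCR0 is the register that holds the enabled xsave feature mask.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Find register `index` among the nr_xcrs valid entries.
 * Returns 0 and stores the value on success, -1 if it is not present.
 */
int xcrs_lookup(const struct kvm_xcrs *xcrs, __u32 index, __u64 *value)
{
	__u32 i;

	assert(xcrs != NULL && value != NULL);
	if (xcrs->nr_xcrs > KVM_MAX_XCRS)
		return -1;	/* malformed count, do not read past the array */
	for (i = 0; i < xcrs->nr_xcrs; i++) {
		if (xcrs->xcrs[i].xcr == index) {
			*value = xcrs->xcrs[i].value;
			return 0;
		}
	}
	return -1;
}

/* Fetch XCR0 of a vcpu.  Returns 0 on success, -1 on error. */
int get_xcr0(int vcpu_fd, __u64 *value)
{
	struct kvm_xcrs xcrs;

	if (ioctl(vcpu_fd, KVM_GET_XCRS, &xcrs) < 0)
		return -1;
	return xcrs_lookup(&xcrs, 0, value);
}
```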


4.46 KVM_GET_SUPPORTED_CPUID

Capability: KVM_CAP_EXT_CPUID
Architectures: x86
Type: system ioctl
Parameters: struct kvm_cpuid2 (in/out)
Returns: 0 on success, -1 on error

struct kvm_cpuid2 {
	__u32 nent;
	__u32 padding;
	struct kvm_cpuid_entry2 entries[0];
};

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)

struct kvm_cpuid_entry2 {
	__u32 function;
	__u32 index;
	__u32 flags;
	__u32 eax;
	__u32 ebx;
	__u32 ecx;
	__u32 edx;
	__u32 padding[3];
};

This ioctl returns x86 cpuid features which are supported by both the hardware
and kvm.  Userspace can use the information returned by this ioctl to
construct cpuid information (for KVM_SET_CPUID2) that is consistent with
hardware, kernel, and userspace capabilities, and with user requirements (for
example, the user may wish to constrain cpuid to emulate older hardware,
or for feature consistency across a cluster).

Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'.  If the number of entries is too low to describe the cpu
capabilities, an error (E2BIG) is returned.  If the number is too high,
the 'nent' field is adjusted and an error (ENOMEM) is returned.  If the
number is just right, the 'nent' field is adjusted to the number of valid
entries in the 'entries' array, which is then filled.

The entries returned are the host cpuid as returned by the cpuid instruction,
with unknown or unsupported features masked out.  Some features (for example,
x2apic) may not be present in the host cpu, but are exposed by kvm if it can
emulate them efficiently.  The fields in each entry are defined as follows:

  function: the eax value used to obtain the entry
  index: the ecx value used to obtain the entry (for entries that are
         affected by ecx)
  flags: an OR of zero or more of the following:
        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
        KVM_CPUID_FLAG_STATEFUL_FUNC:
           if cpuid for this function returns different values for successive
           invocations; there will be several entries with the same function,
           all with this flag set
        KVM_CPUID_FLAG_STATE_READ_NEXT:
           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
           the first entry to be read by a cpu
  eax, ebx, ecx, edx: the values returned by the cpuid instruction for
         this function/index combination
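
The variable-size 'entries' array means userspace has to guess 'nent' and
retry.  The sketch below, for an x86 host, grows the array while the ioctl
fails with E2BIG and treats every other error (including ENOMEM) as fatal.
The helper names, the starting size of 16 and the upper bound of 1024 are
arbitrary choices made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Allocate a zeroed kvm_cpuid2 with room for nent entries; NULL on failure. */
struct kvm_cpuid2 *cpuid2_alloc(__u32 nent)
{
	struct kvm_cpuid2 *c;

	c = calloc(1, sizeof(*c) + (size_t)nent * sizeof(c->entries[0]));
	if (c)
		c->nent = nent;
	return c;
}

/* Find the entry for function/index, honouring the significant-index flag. */
const struct kvm_cpuid_entry2 *cpuid2_find(const struct kvm_cpuid2 *c,
					    __u32 function, __u32 index)
{
	__u32 i;

	assert(c != NULL);
	for (i = 0; i < c->nent; i++) {
		const struct kvm_cpuid_entry2 *e = &c->entries[i];

		if (e->function != function)
			continue;
		if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) &&
		    e->index != index)
			continue;
		return e;
	}
	return NULL;
}

/*
 * Query the supported cpuid table on kvm_fd (the /dev/kvm descriptor).
 * Returns a heap object the caller must free(), or NULL on error.
 */
struct kvm_cpuid2 *get_supported_cpuid(int kvm_fd)
{
	__u32 nent;

	for (nent = 16; nent <= 1024; nent *= 2) {
		struct kvm_cpuid2 *c = cpuid2_alloc(nent);
		int saved_errno;

		if (!c)
			return NULL;
		if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, c) == 0)
			return c;	/* c->nent now counts the valid entries */
		saved_errno = errno;
		free(c);
		errno = saved_errno;
		if (saved_errno != E2BIG)
			return NULL;
	}
	errno = E2BIG;
	return NULL;
}
```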

The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
support.  Instead it is reported via

  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

If that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
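
A sketch of the TSC deadline timer handling described above, for an x86
host.  tsc_deadline_available() and set_tsc_deadline() are hypothetical
helpers; the entries array is assumed to be the one userspace is about to
pass to KVM_SET_CPUID2, and the decision whether an irqchip or a userspace
emulation is in place remains with the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* CPUID leaf 1, ecx bit 24: TSC deadline timer. */
#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)

/* Returns 1 if kvm reports KVM_CAP_TSC_DEADLINE_TIMER, else 0. */
int tsc_deadline_available(int kvm_fd)
{
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION,
		     KVM_CAP_TSC_DEADLINE_TIMER) > 0;
}

/*
 * Set or clear the feature bit in the leaf-1 entry of a cpuid table.
 * Returns 0 on success, -1 if the table has no entry for leaf 1.
 */
int set_tsc_deadline(struct kvm_cpuid_entry2 *entries, __u32 nent, int enable)
{
	__u32 i;

	assert(entries != NULL || nent == 0);
	for (i = 0; i < nent; i++) {
		if (entries[i].function != 1)
			continue;
		if (enable)
			entries[i].ecx |= CPUID_1_ECX_TSC_DEADLINE;
		else
			entries[i].ecx &= ~CPUID_1_ECX_TSC_DEADLINE;
		return 0;
	}
	return -1;
}
```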


4.47 KVM_PPC_GET_PVINFO

Capability: KVM_CAP_PPC_GET_PVINFO
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_pvinfo (out)
Returns: 0 on success, !0 on error

struct kvm_ppc_pvinfo {
	__u32 flags;
	__u32 hcall[4];
	__u8  pad[108];
};

This ioctl fetches PV specific information that needs to be passed to the
guest using the device tree or other means from vm context.

The hcall array defines 4 instructions that make up a hypercall.

If any additional field gets added to this structure later on, a bit for that
additional piece of information will be set in the flags bitmap.

The flags bitmap is defined as:

   /* the host supports the ePAPR idle hcall */
   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
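
The sketch below illustrates how userspace might consume the result: test
the flags bitmap and serialize the four hcall instructions as big-endian
32-bit cells, the cell format used by device trees.  To stay buildable on
non-ppc hosts it uses a local mirror of the documented layout
(struct ppc_pvinfo_sketch) instead of the kernel header; that struct, the
PVINFO_FLAGS_EV_IDLE name and both helpers are assumptions made for this
example, as is the choice to emit the instructions as cells.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors KVM_PPC_PVINFO_FLAGS_EV_IDLE from the text above. */
#define PVINFO_FLAGS_EV_IDLE (1u << 0)

/* Local mirror of struct kvm_ppc_pvinfo as documented (128 bytes). */
struct ppc_pvinfo_sketch {
	uint32_t flags;
	uint32_t hcall[4];
	uint8_t  pad[108];
};

/* Returns 1 if the host advertises the ePAPR idle hcall, else 0. */
int pvinfo_has_ev_idle(const struct ppc_pvinfo_sketch *info)
{
	assert(info != NULL);
	return (info->flags & PVINFO_FLAGS_EV_IDLE) != 0;
}

/* Write the four hypercall instructions as big-endian 32-bit cells. */
void pvinfo_hcall_to_cells(const struct ppc_pvinfo_sketch *info,
			   uint8_t out[16])
{
	size_t i;

	assert(info != NULL && out != NULL);
	for (i = 0; i < 4; i++) {
		uint32_t w = info->hcall[i];

		out[4 * i + 0] = (uint8_t)(w >> 24);
		out[4 * i + 1] = (uint8_t)(w >> 16);
		out[4 * i + 2] = (uint8_t)(w >> 8);
		out[4 * i + 3] = (uint8_t)w;
	}
}
```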


4.48 KVM_ASSIGN_PCI_DEVICE

 								Capability: KVM_CAP_DEVICE_ASSIGNMENT
 								Architectures: x86 ia64
 								Type: vm ioctl
 								Parameters: struct kvm_assigned_pci_dev (in)
 								Returns: 0 on success, -1 on error
 								Assigns a host PCI device to the VM.
 								struct kvm_assigned_pci_dev {
 									__u32 assigned_dev_id;
 									__u32 busnr;
 									__u32 devfn;
 									__u32 flags;
 									__u32 segnr;
 									union {
 										__u32 reserved[11];
 									};
 								};
 								The PCI device is specified by the triple segnr, busnr, and devfn.
 								Identification in succeeding service requests is done via assigned_dev_id. The
 								following flags are specified:
 								/* Depends on KVM_CAP_IOMMU */
 								#define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
-												KVM: Allow host IRQ sharing for assigned PCI 2.3 devices

PCI 2.3 allows to generically disable IRQ sources at device level. This
enables us to share legacy IRQs of such devices with other host devices
when passing them to a guest.

The new IRQ sharing feature introduced here is optional, user space has
to request it explicitly. Moreover, user space can inform us about its
view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the
interrupt and signaling it if the guest masked it via the virtualized
PCI config space.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2012-02-28 13:19:54 +00:00
+								/* The following two depend on KVM_CAP_PCI_2_3 */
 								#define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 								#define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
 								If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx interrupts
 								via the PCI-2.3-compliant device-level mask, thus enable IRQ sharing with other
 								assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifies the
 								guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for details.
-												KVM: Document device assigment API

Adds API documentation for KVM_[DE]ASSIGN_PCI_DEVICE,
KVM_[DE]ASSIGN_DEV_IRQ, KVM_SET_GSI_ROUTING, KVM_ASSIGN_SET_MSIX_NR, and
KVM_ASSIGN_SET_MSIX_ENTRY.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2010-11-16 21:30:07 +00:00
-												KVM: Remove ability to assign a device without iommu support

This option has no users and it exposes a security hole that we
can allow devices to be assigned without iommu protection.  Make
KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2011-12-21 04:59:03 +00:00
+								The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure
 								isolation of the device.  Usages not specifying this flag are deprecated.

Only PCI header type 0 devices with PCI BAR resources are supported by
device assignment.  The user requesting this ioctl must have read/write
access to the PCI sysfs resource files associated with the device.
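
The access requirement above can be probed from userspace before attempting
an assignment.  The following is a minimal sketch, assuming the conventional
sysfs layout /sys/bus/pci/devices/<segment:bus:slot.func>/resourceN; the
helper names are hypothetical and are not part of the KVM API.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: format the sysfs path of BAR resource file `bar`
 * for the PCI device at seg:bus:slot.func.  The path layout is an
 * assumption about the host's sysfs.  Returns snprintf()'s result.
 */
static int pci_resource_path(char *buf, size_t len, unsigned seg,
			     unsigned bus, unsigned slot, unsigned func,
			     unsigned bar)
{
	return snprintf(buf, len,
			"/sys/bus/pci/devices/%04x:%02x:%02x.%x/resource%u",
			seg, bus, slot, func, bar);
}

/* Returns 1 if the calling user may both read and write `path`, else 0. */
static int pci_resource_accessible(const char *path)
{
	return access(path, R_OK | W_OK) == 0;
}
```

A caller would typically run the check for every BAR the device exposes and
refuse to proceed if any of them is not accessible.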


4.49 KVM_DEASSIGN_PCI_DEVICE

Capability: KVM_CAP_DEVICE_DEASSIGNMENT
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_pci_dev (in)
Returns: 0 on success, -1 on error

Ends PCI device assignment, releasing all associated resources.

See KVM_ASSIGN_PCI_DEVICE for the data structure. Only assigned_dev_id is
used in kvm_assigned_pci_dev to identify the device.


4.50 KVM_ASSIGN_DEV_IRQ

Capability: KVM_CAP_ASSIGN_DEV_IRQ
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_irq (in)
Returns: 0 on success, -1 on error

Assigns an IRQ to a passed-through device.

struct kvm_assigned_irq {
	__u32 assigned_dev_id;
	__u32 host_irq; /* ignored (legacy field) */
	__u32 guest_irq;
	__u32 flags;
	union {
		__u32 reserved[12];
	};
};

The following flags are defined:

#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)

#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)

It is not valid to specify multiple types per host or guest IRQ. However, the
IRQ type of host and guest can differ or can even be null.
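
The rule above (at most one host type and at most one guest type, either of
which may be absent) can be checked before issuing the ioctl.  This is only
a sketch: the two mask names and the helper functions are hypothetical, and
the flag values are copied from the list above.

```c
#include <stdint.h>

#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)
#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)

/* Hypothetical convenience masks grouping the host and guest types. */
#define HOST_TYPE_MASK  (KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_HOST_MSI | \
			 KVM_DEV_IRQ_HOST_MSIX)
#define GUEST_TYPE_MASK (KVM_DEV_IRQ_GUEST_INTX | KVM_DEV_IRQ_GUEST_MSI | \
			 KVM_DEV_IRQ_GUEST_MSIX)

/* Returns 1 if zero or one bit is set in x. */
static int at_most_one_bit(uint32_t x)
{
	return (x & (x - 1)) == 0;
}

/* Returns 1 if flags name at most one host and at most one guest type. */
static int kvm_dev_irq_flags_valid(uint32_t flags)
{
	return at_most_one_bit(flags & HOST_TYPE_MASK) &&
	       at_most_one_bit(flags & GUEST_TYPE_MASK);
}
```

For instance, KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX passes the
check, while KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_HOST_MSI does not.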


4.51 KVM_DEASSIGN_DEV_IRQ

Capability: KVM_CAP_ASSIGN_DEV_IRQ
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_irq (in)
Returns: 0 on success, -1 on error

Ends an IRQ assignment to a passed-through device.

See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
by assigned_dev_id, and flags must correspond to the IRQ type specified on
KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.


4.52 KVM_SET_GSI_ROUTING

Capability: KVM_CAP_IRQ_ROUTING
Architectures: x86 ia64 s390
Type: vm ioctl
Parameters: struct kvm_irq_routing (in)
Returns: 0 on success, -1 on error

Sets the GSI routing table entries, overwriting any previously set entries.

struct kvm_irq_routing {
	__u32 nr;
	__u32 flags;
	struct kvm_irq_routing_entry entries[0];
};

No flags are specified so far; the corresponding field must be set to zero.

struct kvm_irq_routing_entry {
	__u32 gsi;
	__u32 type;
	__u32 flags;
	__u32 pad;
	union {
		struct kvm_irq_routing_irqchip irqchip;
		struct kvm_irq_routing_msi msi;
		struct kvm_irq_routing_s390_adapter adapter;
		__u32 pad[8];
	} u;
};

/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3

No flags are specified so far; the corresponding field must be set to zero.

struct kvm_irq_routing_irqchip {
	__u32 irqchip;
	__u32 pin;
};

struct kvm_irq_routing_msi {
	__u32 address_lo;
	__u32 address_hi;
	__u32 data;
	__u32 pad;
};

struct kvm_irq_routing_s390_adapter {
	__u64 ind_addr;
	__u64 summary_addr;
	__u64 ind_offset;
	__u32 summary_offset;
	__u32 adapter_id;
};
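
Because struct kvm_irq_routing ends in a variable-length array, userspace
has to size the buffer for the number of entries it intends to set.  The
sketch below builds such a table using local mirrors of the layouts above
(the s390 adapter member is left out for brevity); all type and function
names in it are hypothetical.  On a real system the resulting buffer would
be passed to the KVM_SET_GSI_ROUTING ioctl on the VM fd.

```c
#include <stdint.h>
#include <stdlib.h>

#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI     2

/* Local mirrors of the structures documented above. */
struct irq_routing_irqchip { uint32_t irqchip, pin; };
struct irq_routing_msi { uint32_t address_lo, address_hi, data, pad; };

struct irq_routing_entry {
	uint32_t gsi, type, flags, pad;
	union {
		struct irq_routing_irqchip irqchip;
		struct irq_routing_msi msi;
		uint32_t pad[8];
	} u;
};

struct irq_routing {
	uint32_t nr, flags;
	struct irq_routing_entry entries[];	/* nr entries follow */
};

/* Allocate a zeroed table with room for nr entries; caller frees it. */
static struct irq_routing *irq_routing_alloc(uint32_t nr)
{
	struct irq_routing *t;

	t = calloc(1, sizeof(*t) + nr * sizeof(t->entries[0]));
	if (t)
		t->nr = nr;	/* flags stays zero, as required */
	return t;
}

/* Route `gsi` to input `pin` of interrupt controller `chip`. */
static void set_irqchip_route(struct irq_routing_entry *e, uint32_t gsi,
			      uint32_t chip, uint32_t pin)
{
	e->gsi = gsi;
	e->type = KVM_IRQ_ROUTING_IRQCHIP;
	e->u.irqchip.irqchip = chip;
	e->u.irqchip.pin = pin;
}

/* Route `gsi` to an MSI message with the given address and data. */
static void set_msi_route(struct irq_routing_entry *e, uint32_t gsi,
			  uint64_t addr, uint32_t data)
{
	e->gsi = gsi;
	e->type = KVM_IRQ_ROUTING_MSI;
	e->u.msi.address_lo = (uint32_t)addr;
	e->u.msi.address_hi = (uint32_t)(addr >> 32);
	e->u.msi.data = data;
}
```

Since each call overwrites all previously set entries, the table handed to
the kernel needs to contain every route that should remain active, not only
the ones being changed.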


4.53 KVM_ASSIGN_SET_MSIX_NR

Capability: KVM_CAP_DEVICE_MSIX
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_msix_nr (in)
Returns: 0 on success, -1 on error

Sets the number of MSI-X interrupts for an assigned device. The number is
reset again by terminating the MSI-X assignment of the device via
KVM_DEASSIGN_DEV_IRQ. Calling this service more than once at any earlier
point will fail.

struct kvm_assigned_msix_nr {
	__u32 assigned_dev_id;
	__u16 entry_nr;
	__u16 padding;
};

#define KVM_MAX_MSIX_PER_DEV		256


4.54 KVM_ASSIGN_SET_MSIX_ENTRY

Capability: KVM_CAP_DEVICE_MSIX
Architectures: x86 ia64
Type: vm ioctl
Parameters: struct kvm_assigned_msix_entry (in)
Returns: 0 on success, -1 on error

Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting
the GSI vector to zero disables the interrupt.

struct kvm_assigned_msix_entry {
	__u32 assigned_dev_id;
	__u32 gsi;
	__u16 entry; /* The index of entry in the MSI-X table */
	__u16 padding[3];
};
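
A sketch of how userspace might prepare the arguments for
KVM_ASSIGN_SET_MSIX_NR and KVM_ASSIGN_SET_MSIX_ENTRY follows.  The
structures are local mirrors of the ones documented above, the helper names
are hypothetical, and treating KVM_MAX_MSIX_PER_DEV as the upper bound for
entry_nr (and zero as invalid) is an assumption made for the example.

```c
#include <stdint.h>
#include <string.h>

#define KVM_MAX_MSIX_PER_DEV 256

struct assigned_msix_nr {
	uint32_t assigned_dev_id;
	uint16_t entry_nr;
	uint16_t padding;
};

struct assigned_msix_entry {
	uint32_t assigned_dev_id;
	uint32_t gsi;
	uint16_t entry;		/* index into the MSI-X table */
	uint16_t padding[3];
};

/*
 * Fill the KVM_ASSIGN_SET_MSIX_NR argument.  Returns -1 without touching
 * *a if nr is 0 or above KVM_MAX_MSIX_PER_DEV (assumed limits).
 */
static int msix_nr_init(struct assigned_msix_nr *a, uint32_t dev_id,
			unsigned nr)
{
	if (nr == 0 || nr > KVM_MAX_MSIX_PER_DEV)
		return -1;
	memset(a, 0, sizeof(*a));
	a->assigned_dev_id = dev_id;
	a->entry_nr = (uint16_t)nr;
	return 0;
}

/* Fill the KVM_ASSIGN_SET_MSIX_ENTRY argument; gsi 0 disables the entry. */
static void msix_entry_init(struct assigned_msix_entry *e, uint32_t dev_id,
			    uint16_t entry, uint32_t gsi)
{
	memset(e, 0, sizeof(*e));
	e->assigned_dev_id = dev_id;
	e->entry = entry;
	e->gsi = gsi;
}
```

The number of entries is set once, and the individual entries are then
routed one by one.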


4.55 KVM_SET_TSC_KHZ

Capability: KVM_CAP_TSC_CONTROL
Architectures: x86
Type: vcpu ioctl
Parameters: virtual tsc_khz
Returns: 0 on success, -1 on error

Specifies the tsc frequency for the virtual machine. The unit of the
frequency is kHz.


4.56 KVM_GET_TSC_KHZ

Capability: KVM_CAP_GET_TSC_KHZ
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: virtual tsc-khz on success, negative value on error

Returns the tsc frequency of the guest. The unit of the return value is
kHz. If the host has an unstable tsc, this ioctl fails with -EIO instead.
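
Because the frequency is reported in kHz, a tick count converts to
nanoseconds as ticks * 1000000 / tsc_khz.  The hypothetical helper below
sketches that arithmetic; it splits the division so the intermediate
product stays within 64 bits for large tick counts.

```c
#include <stdint.h>

/*
 * Convert a guest TSC tick delta to nanoseconds, given the frequency in
 * kHz as returned by KVM_GET_TSC_KHZ.  Returns 0 if tsc_khz is 0.
 */
static uint64_t tsc_ticks_to_ns(uint64_t ticks, uint32_t tsc_khz)
{
	if (tsc_khz == 0)
		return 0;
	/* whole milliseconds first, then the sub-millisecond remainder */
	return (ticks / tsc_khz) * 1000000ULL +
	       (ticks % tsc_khz) * 1000000ULL / tsc_khz;
}
```

With a 2.5 GHz guest TSC (tsc_khz = 2500000), 2500 ticks correspond to one
microsecond.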


4.57 KVM_GET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (out)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

Reads the Local APIC registers and copies them into the structure supplied
as the argument.  The data format and layout are the same as documented in
the architecture manual.
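
Since the buffer follows the architectural register layout, individual
registers can be extracted by offset.  The sketch below assumes the xAPIC
layout, in which each register is 32 bits wide and sits on a 16-byte
boundary (the APIC ID register at offset 0x20, with the ID in bits 31:24).
The structure is a local mirror, and the macro and helper names are
hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define KVM_APIC_REG_SIZE 0x400

struct lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

#define APIC_REG_ID      0x20	/* offsets per the architecture manual */
#define APIC_REG_VERSION 0x30

/*
 * Copy the 32-bit register at `offset` into *val.  Returns -1 if the
 * offset is not 16-byte aligned or lies outside the register page.
 */
static int lapic_read_reg(const struct lapic_state *s, uint32_t offset,
			  uint32_t *val)
{
	if (offset % 16 != 0 || offset > KVM_APIC_REG_SIZE - sizeof(*val))
		return -1;
	memcpy(val, s->regs + offset, sizeof(*val));
	return 0;
}
```

memcpy() is used instead of a pointer cast so that the access does not
depend on the alignment of the char array.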


4.58 KVM_SET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (in)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

Copies the input argument into the Local APIC registers.  The data format
and layout are the same as documented in the architecture manual.


4.59 KVM_IOEVENTFD

Capability: KVM_CAP_IOEVENTFD
Architectures: all
Type: vm ioctl
Parameters: struct kvm_ioeventfd (in)
Returns: 0 on success, !0 on error

This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
within the guest.  A guest write to the registered address will signal the
provided event instead of triggering an exit.

struct kvm_ioeventfd {
	__u64 datamatch;
	__u64 addr;        /* legal pio/mmio address */
	__u32 len;         /* 1, 2, 4, or 8 bytes    */
	__s32 fd;
	__u32 flags;
	__u8  pad[36];
};

For the special case of virtio-ccw devices on s390, the ioevent is matched
to a subchannel/virtqueue tuple instead.

The following flags are defined:

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)

If the datamatch flag is set, the event will be signaled only if the value
written to the registered address is equal to datamatch in struct kvm_ioeventfd.

For virtio-ccw devices, addr contains the subchannel id and datamatch the
virtqueue index.
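
The following sketch fills a request for a PIO or MMIO address and models
the matching rule described above.  The structure is a local mirror of
struct kvm_ioeventfd; the flag bit positions are assumed to follow the
order in which the flags are listed above, and the two helper functions are
hypothetical (the second one only models the documented behavior, it is not
how the kernel implements it).

```c
#include <stdint.h>
#include <string.h>

/* Assumed bit numbering: the order in which the flags are listed. */
enum {
	kvm_ioeventfd_flag_nr_datamatch,
	kvm_ioeventfd_flag_nr_pio,
	kvm_ioeventfd_flag_nr_deassign,
	kvm_ioeventfd_flag_nr_virtio_ccw_notify,
};

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)

struct ioeventfd {
	uint64_t datamatch;
	uint64_t addr;
	uint32_t len;
	int32_t  fd;
	uint32_t flags;
	uint8_t  pad[36];
};

/* Fill a request; returns -1 if len is not 1, 2, 4 or 8 bytes. */
static int ioeventfd_init(struct ioeventfd *io, uint64_t addr, uint32_t len,
			  int32_t fd, uint32_t flags, uint64_t datamatch)
{
	if (len != 1 && len != 2 && len != 4 && len != 8)
		return -1;
	memset(io, 0, sizeof(*io));
	io->addr = addr;
	io->len = len;
	io->fd = fd;
	io->flags = flags;
	io->datamatch = datamatch;
	return 0;
}

/* Model: would a guest write of `value` to addr/len signal the eventfd? */
static int ioeventfd_would_signal(const struct ioeventfd *io, uint64_t addr,
				  uint32_t len, uint64_t value)
{
	if (addr != io->addr || len != io->len)
		return 0;
	if (io->flags & KVM_IOEVENTFD_FLAG_DATAMATCH)
		return value == io->datamatch;
	return 1;
}
```

To detach the eventfd again, the same request would be issued with
KVM_IOEVENTFD_FLAG_DEASSIGN added to flags.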


4.60 KVM_DIRTY_TLB

Capability: KVM_CAP_SW_TLB
Architectures: ppc
Type: vcpu ioctl
Parameters: struct kvm_dirty_tlb (in)
Returns: 0 on success, -1 on error

struct kvm_dirty_tlb {
	__u64 bitmap;
	__u32 num_dirty;
};

This must be called whenever userspace has changed an entry in the shared
TLB, prior to calling KVM_RUN on the associated vcpu.

The "bitmap" field is the userspace address of an array.  This array
consists of a number of bits, equal to the total number of TLB entries as
determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
nearest multiple of 64.

Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
array.

The array is little-endian: bit 0 is the least significant bit of the
first byte, bit 8 is the least significant bit of the second byte, etc.
This avoids any complications with differing word sizes.

The "num_dirty" field is a performance hint for KVM to determine whether it
should skip processing the bitmap and just invalidate everything.  It must
be set to the number of set bits in the bitmap.
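
The bitmap rules above translate into byte-wise operations that are
independent of the host word size.  The helpers below are a sketch with
hypothetical names: one computes the buffer size for a given number of TLB
entries, one marks an entry dirty, and one counts the set bits to obtain
the value for num_dirty.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for num_entries bits, rounded up to a multiple of 64 bits. */
static size_t tlb_bitmap_bytes(uint32_t num_entries)
{
	return ((size_t)num_entries + 63) / 64 * 8;
}

/* Bit N lives in byte N / 8, at bit position N % 8 (LSB is position 0). */
static void tlb_mark_dirty(uint8_t *bitmap, uint32_t entry)
{
	bitmap[entry / 8] |= (uint8_t)(1u << (entry % 8));
}

/* Number of set bits in the bitmap, i.e. the value for num_dirty. */
static uint32_t tlb_count_dirty(const uint8_t *bitmap, size_t bytes)
{
	uint32_t n = 0;

	for (size_t i = 0; i < bytes; i++)
		for (uint8_t b = bitmap[i]; b; b &= (uint8_t)(b - 1))
			n++;	/* each iteration clears the lowest set bit */
	return n;
}
```

The "bitmap" field of struct kvm_dirty_tlb would then hold the address of
the byte array, and "num_dirty" the result of the count.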


4.61 KVM_ASSIGN_SET_INTX_MASK

Capability: KVM_CAP_PCI_2_3
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_assigned_pci_dev (in)
Returns: 0 on success, -1 on error

Allows userspace to mask PCI INTx interrupts from the assigned device.  The
kernel will not deliver INTx interrupts to the guest between setting and
clearing of KVM_DEV_ASSIGN_MASK_INTX via this interface.  This enables use of
and emulation of PCI 2.3 INTx disable command register behavior.

This may be used for both PCI 2.3 devices supporting INTx disable natively and
older devices lacking this support. Userspace is responsible for emulating the
read value of the INTx disable bit in the guest visible PCI command register.
When modifying the INTx disable state, userspace should precede updating the
physical device command register by calling this ioctl to inform the kernel of
the new intended INTx mask state.

Note that the kernel uses the device INTx disable bit to internally manage the
device interrupt state for PCI 2.3 devices.  Reads of this register may
therefore not match the expected value.  Writes should always use the guest
intended INTx disable value rather than attempting to read-copy-update the
current physical device state.  Races between user and kernel updates to the
INTx disable bit are handled lazily in the kernel.  It's possible the device
may generate unintended interrupts, but they will not be injected into the
guest.

See KVM_ASSIGN_PCI_DEVICE for the data structure.  The target device is
specified by assigned_dev_id.  In the flags field, only
KVM_DEV_ASSIGN_MASK_INTX is evaluated.
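
As a sketch of the sequence described above, userspace can derive the
request from the value the guest writes to its virtual PCI command
register.  The INTx disable bit is bit 10 of that register.  The structure
layout and the value of KVM_DEV_ASSIGN_MASK_INTX are assumed here to match
the KVM_ASSIGN_PCI_DEVICE section; the local type and the helper name are
hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PCI_COMMAND_INTX_DISABLE 0x400		/* command register bit 10 */
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)	/* assumed flag value */

/* Local mirror of struct kvm_assigned_pci_dev (layout is an assumption). */
struct assigned_pci_dev {
	uint32_t assigned_dev_id;
	uint32_t busnr;
	uint32_t devfn;
	uint32_t flags;
	uint32_t segnr;
	uint32_t reserved[11];
};

/*
 * Build the KVM_ASSIGN_SET_INTX_MASK argument from the command register
 * value the guest intends to set.  Only assigned_dev_id and the mask flag
 * are filled in; everything else is zeroed.
 */
static void intx_mask_request(struct assigned_pci_dev *dev, uint32_t dev_id,
			      uint16_t guest_cmd)
{
	memset(dev, 0, sizeof(*dev));
	dev->assigned_dev_id = dev_id;
	if (guest_cmd & PCI_COMMAND_INTX_DISABLE)
		dev->flags |= KVM_DEV_ASSIGN_MASK_INTX;
}
```

The ioctl would be issued with this structure first, and only afterwards
would the physical command register be updated with the guest's value.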
4.62 KVM_CREATE_SPAPR_TCE
 								Capability: KVM_CAP_SPAPR_TCE
 								Architectures: powerpc
 								Type: vm ioctl
 								Parameters: struct kvm_create_spapr_tce (in)
 								Returns: file descriptor for manipulating the created TCE table
 								This creates a virtual TCE (translation control entry) table, which
 								is an IOMMU for PAPR-style virtual I/O.  It is used to translate
 								logical addresses used in virtual I/O into guest physical addresses,
 								and provides a scatter/gather capability for PAPR virtual I/O.
 								/* for KVM_CAP_SPAPR_TCE */
 								struct kvm_create_spapr_tce {
 									__u64 liobn;
 									__u32 window_size;
 								};
 								The liobn field gives the logical IO bus number for which to create a
 								TCE table.  The window_size field specifies the size of the DMA window
 								which this TCE table will translate; the table will contain one
 								64-bit TCE for every 4 KiB of the DMA window.
 								When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
 								table has been created using this ioctl(), the kernel will handle it
 								in real mode, updating the TCE table.  H_PUT_TCE calls for other
 								liobns will cause a vm exit and must be handled by userspace.
 								The return value is a file descriptor which can be passed to mmap(2)
 								to map the created TCE table into userspace.  This lets userspace read
 								the entries written by kernel-handled H_PUT_TCE calls, and also lets
 								userspace update the TCE table directly which is useful in some
 								circumstances.
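
The sizing rule above (one 64-bit TCE per 4 KiB of DMA window) can be
sketched as a small helper.  This is an illustration only: the helper
names are hypothetical, and rounding a partial page up to a whole entry
is an assumption, as is using the result as the mmap(2) length (the
kernel may round the mapping up to whole pages).

```c
#include <stdint.h>

/* Sizes taken from the text: one 64-bit TCE per 4 KiB of DMA window. */
#define TCE_PAGE_SIZE  4096u
#define TCE_ENTRY_SIZE 8u

/*
 * Number of TCEs needed to cover window_size bytes.
 * Assumption: a partial trailing page still needs one entry.
 */
static uint64_t tce_entries(uint32_t window_size)
{
	return ((uint64_t)window_size + TCE_PAGE_SIZE - 1) / TCE_PAGE_SIZE;
}

/* Size in bytes of the table that describes the whole window. */
static uint64_t tce_table_bytes(uint32_t window_size)
{
	return tce_entries(window_size) * TCE_ENTRY_SIZE;
}
```

For a 256 MiB window this gives 65536 entries, i.e. a 512 KiB table.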
4.63 KVM_ALLOCATE_RMA
 								Capability: KVM_CAP_PPC_RMA
 								Architectures: powerpc
 								Type: vm ioctl
 								Parameters: struct kvm_allocate_rma (out)
 								Returns: file descriptor for mapping the allocated RMA
 								This allocates a Real Mode Area (RMA) from the pool allocated at boot
 								time by the kernel.  An RMA is a physically-contiguous, aligned region
 								of memory used on older POWER processors to provide the memory which
 								will be accessed by real-mode (MMU off) accesses in a KVM guest.
 								POWER processors support a set of sizes for the RMA that usually
 								includes 64MB, 128MB, 256MB and some larger powers of two.
 								/* for KVM_ALLOCATE_RMA */
 								struct kvm_allocate_rma {
 									__u64 rma_size;
 								};
 								The return value is a file descriptor which can be passed to mmap(2)
 								to map the allocated RMA into userspace.  The mapped area can then be
 								passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
 								RMA for a virtual machine.  The size of the RMA in bytes (which is
 								fixed at host kernel boot time) is returned in the rma_size field of
 								the argument structure.
 								The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
 								is supported; 2 if the processor requires all virtual machines to have
 								an RMA, or 1 if the processor can use an RMA but doesn't require it,
 								because it supports the Virtual RMA (VRMA) facility.
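
A minimal sketch of how userspace might interpret the capability value
described above, for example the result of KVM_CHECK_EXTENSION for
KVM_CAP_PPC_RMA.  The enum and function names are hypothetical, and
treating any value other than 1 or 2 as "unsupported" is an assumption.

```c
#include <stdbool.h>

/* Possible meanings of the KVM_CAP_PPC_RMA capability value. */
enum rma_support {
	RMA_UNSUPPORTED = 0,	/* KVM_ALLOCATE_RMA is not available */
	RMA_OPTIONAL    = 1,	/* RMA usable, but VRMA is supported too */
	RMA_REQUIRED    = 2,	/* every virtual machine must have an RMA */
};

/* Map the raw capability value onto the three documented cases. */
static enum rma_support classify_rma_cap(int cap_value)
{
	if (cap_value == 2)
		return RMA_REQUIRED;
	if (cap_value == 1)
		return RMA_OPTIONAL;
	return RMA_UNSUPPORTED;	/* assumption: anything else means "no" */
}

/* True if userspace has to call KVM_ALLOCATE_RMA before running a guest. */
static bool must_allocate_rma(int cap_value)
{
	return classify_rma_cap(cap_value) == RMA_REQUIRED;
}
```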
4.64 KVM_NMI
 								Capability: KVM_CAP_USER_NMI
 								Architectures: x86
 								Type: vcpu ioctl
 								Parameters: none
 								Returns: 0 on success, -1 on error
 								Queues an NMI on the thread's vcpu.  Note this is well defined only
 								when KVM_CREATE_IRQCHIP has not been called, since this is an interface
 								between the virtual cpu core and virtual local APIC.  After KVM_CREATE_IRQCHIP
 								has been called, this interface is completely emulated within the kernel.
 								To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
 								following algorithm:
 								  - pause the vcpu
 								  - read the local APIC's state (KVM_GET_LAPIC)
 								  - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
 								  - if so, issue KVM_NMI
 								  - resume the vcpu
 								Some guests configure the LINT1 NMI input to cause a panic, aiding in
 								debugging.
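
The "check whether changing LINT1 will queue an NMI" step of the
algorithm above can be sketched as follows.  The macro and function names
are hypothetical.  The register layout is the architectural local APIC
one (LVT LINT1 at offset 0x360, delivery mode in bits 10:8, mask in
bit 16); the sketch assumes the buffer is the register image returned by
KVM_GET_LAPIC and that the host is little-endian, as x86 is.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LAPIC_LVT_LINT1_OFFSET	0x360		/* LVT LINT1 register */
#define LAPIC_LVT_MASKED	(1u << 16)	/* interrupt is masked */
#define LAPIC_DM_MASK		(7u << 8)	/* delivery mode field */
#define LAPIC_DM_NMI		(4u << 8)	/* delivery mode 100b: NMI */

/* Extract the LVT LINT1 entry from a saved local APIC register image. */
static uint32_t read_lvt_lint1(const char *regs)
{
	uint32_t lvt;

	memcpy(&lvt, regs + LAPIC_LVT_LINT1_OFFSET, sizeof(lvt));
	return lvt;
}

/* True if asserting LINT1 would deliver an NMI: unmasked and NMI mode. */
static bool lint1_queues_nmi(uint32_t lvt)
{
	return !(lvt & LAPIC_LVT_MASKED) &&
	       (lvt & LAPIC_DM_MASK) == LAPIC_DM_NMI;
}
```

If lint1_queues_nmi() returns true for the paused vcpu, userspace would
issue KVM_NMI on that vcpu's file descriptor before resuming it.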
4.65 KVM_S390_UCAS_MAP
 								Capability: KVM_CAP_S390_UCONTROL
 								Architectures: s390
 								Type: vcpu ioctl
 								Parameters: struct kvm_s390_ucas_mapping (in)
 								Returns: 0 in case of success
 								The parameter is defined like this:
 									struct kvm_s390_ucas_mapping {
 										__u64 user_addr;
 										__u64 vcpu_addr;
 										__u64 length;
 									};
 								This ioctl maps the memory at "user_addr" with the length "length" to
 								the vcpu's address space starting at "vcpu_addr". All parameters need to
 								be aligned by 1 megabyte.
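
Since every field has to be 1 megabyte aligned, userspace may want to
validate a mapping before issuing the ioctl.  The sketch below is only an
illustration: struct ucas_mapping is a local stand-in with the same layout
as struct kvm_s390_ucas_mapping so that it builds without the s390
headers, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define UCAS_ALIGN	(1ULL << 20)	/* 1 megabyte, per the text */

/* Local stand-in mirroring struct kvm_s390_ucas_mapping. */
struct ucas_mapping {
	uint64_t user_addr;
	uint64_t vcpu_addr;
	uint64_t length;
};

/* True if v is a multiple of 1 megabyte. */
static bool ucas_aligned(uint64_t v)
{
	return (v & (UCAS_ALIGN - 1)) == 0;
}

/* True if all three fields satisfy the alignment requirement. */
static bool ucas_mapping_valid(const struct ucas_mapping *m)
{
	return ucas_aligned(m->user_addr) &&
	       ucas_aligned(m->vcpu_addr) &&
	       ucas_aligned(m->length);
}
```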
4.66 KVM_S390_UCAS_UNMAP
 								Capability: KVM_CAP_S390_UCONTROL
 								Architectures: s390
 								Type: vcpu ioctl
 								Parameters: struct kvm_s390_ucas_mapping (in)
 								Returns: 0 in case of success
 								The parameter is defined like this:
 									struct kvm_s390_ucas_mapping {
 										__u64 user_addr;
 										__u64 vcpu_addr;
 										__u64 length;
 									};
 								This ioctl unmaps the memory in the vcpu's address space starting at
 								"vcpu_addr" with the length "length". The field "user_addr" is ignored.
 								All parameters need to be aligned by 1 megabyte.
4.67 KVM_S390_VCPU_FAULT
 								Capability: KVM_CAP_S390_UCONTROL
 								Architectures: s390
 								Type: vcpu ioctl
 								Parameters: vcpu absolute address (in)
 								Returns: 0 in case of success
 								This call creates a page table entry in the virtual cpu's address space
 								(for user controlled virtual machines) or the virtual machine's address
 								space (for regular virtual machines). This only works for minor faults,
 								so it is recommended to access the memory page in question via the user
 								page table beforehand. This is useful for handling validity intercepts in
 								user controlled virtual machines: it faults in the virtual cpu's lowcore
 								pages prior to calling the KVM_RUN ioctl.
4.68 KVM_SET_ONE_REG
 								Capability: KVM_CAP_ONE_REG
 								Architectures: all
 								Type: vcpu ioctl
 								Parameters: struct kvm_one_reg (in)
 								Returns: 0 on success, negative value on failure
 								struct kvm_one_reg {
 								       __u64 id;
 								       __u64 addr;
 								};
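
A minimal sketch of a userspace wrapper around this ioctl, assuming the
kernel headers (linux/kvm.h) are installed.  The function name
set_one_reg is hypothetical; reg_id stands for one of the register
identifiers documented for this ioctl, whose numeric values come from the
architecture's kvm.h, and val must point to a buffer of that register's
width.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Set a single vcpu register.
 * Returns 0 on success; on failure ioctl() returns -1 and sets errno.
 */
static int set_one_reg(int vcpu_fd, uint64_t reg_id, const void *val)
{
	struct kvm_one_reg reg = {
		.id   = reg_id,
		/* addr carries the user pointer as a 64-bit integer */
		.addr = (uint64_t)(uintptr_t)val,
	};

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}
```

Passing the value by pointer rather than inline is what lets one struct
layout serve registers of different widths.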
 								Using this ioctl, a single vcpu register can be set to a specific value
 								defined by user space with the passed in struct kvm_one_reg, where id
 								refers to the register identifier as described below and addr is a pointer
 								to a variable with the respective size. There can be architecture agnostic
 								and architecture specific registers. Each has its own range of operation
 								and its own constants and width. To keep track of the implemented
 								registers, find a list below:
 								  Arch  |       Register        | Width (bits)
 								        |                       |
 								  PPC   | KVM_REG_PPC_HIOR      | 64
 								  PPC   | KVM_REG_PPC_IAC1      | 64
 								  PPC   | KVM_REG_PPC_IAC2      | 64
 								  PPC   | KVM_REG_PPC_IAC3      | 64
 								  PPC   | KVM_REG_PPC_IAC4      | 64
 								  PPC   | KVM_REG_PPC_DAC1      | 64
 								  PPC   | KVM_REG_PPC_DAC2      | 64
 								  PPC   | KVM_REG_PPC_DABR      | 64
 								  PPC   | KVM_REG_PPC_DSCR      | 64
 								  PPC   | KVM_REG_PPC_PURR      | 64
 								  PPC   | KVM_REG_PPC_SPURR     | 64
 								  PPC   | KVM_REG_PPC_DAR       | 64
 								  PPC   | KVM_REG_PPC_DSISR     | 32
 								  PPC   | KVM_REG_PPC_AMR       | 64
 								  PPC   | KVM_REG_PPC_UAMOR     | 64
 								  PPC   | KVM_REG_PPC_MMCR0     | 64
 								  PPC   | KVM_REG_PPC_MMCR1     | 64
 								  PPC   | KVM_REG_PPC_MMCRA     | 64
 								  PPC   | KVM_REG_PPC_PMC1      | 32
 								  PPC   | KVM_REG_PPC_PMC2      | 32
 								  PPC   | KVM_REG_PPC_PMC3      | 32
 								  PPC   | KVM_REG_PPC_PMC4      | 32
 								  PPC   | KVM_REG_PPC_PMC5      | 32
 								  PPC   | KVM_REG_PPC_PMC6      | 32
 								  PPC   | KVM_REG_PPC_PMC7      | 32
 								  PPC   | KVM_REG_PPC_PMC8      | 32
 								  PPC   | KVM_REG_PPC_FPR0      | 64
 								          ...
 								  PPC   | KVM_REG_PPC_FPR31     | 64
 								  PPC   | KVM_REG_PPC_VR0       | 128
 								          ...
 								  PPC   | KVM_REG_PPC_VR31      | 128
 								  PPC   | KVM_REG_PPC_VSR0      | 128
 								          ...
 								  PPC   | KVM_REG_PPC_VSR31     | 128
 								  PPC   | KVM_REG_PPC_FPSCR     | 64
 								  PPC   | KVM_REG_PPC_VSCR      | 32
 								  PPC   | KVM_REG_PPC_VPA_ADDR  | 64
 								  PPC   | KVM_REG_PPC_VPA_SLB   | 128
 								  PPC   | KVM_REG_PPC_VPA_DTL   | 128
 								  PPC   | KVM_REG_PPC_EPCR      | 32
 								  PPC   | KVM_REG_PPC_EPR       | 32
 								  PPC   | KVM_REG_PPC_TCR       | 32
 								  PPC   | KVM_REG_PPC_TSR       | 32
 								  PPC   | KVM_REG_PPC_OR_TSR    | 32
 								  PPC   | KVM_REG_PPC_CLEAR_TSR | 32
 								  PPC   | KVM_REG_PPC_MAS0      | 32
 								  PPC   | KVM_REG_PPC_MAS1      | 32
 								  PPC   | KVM_REG_PPC_MAS2      | 64
 								  PPC   | KVM_REG_PPC_MAS7_3    | 64
 								  PPC   | KVM_REG_PPC_MAS4      | 32
 								  PPC   | KVM_REG_PPC_MAS6      | 32
 								  PPC   | KVM_REG_PPC_MMUCFG    | 32
 								  PPC   | KVM_REG_PPC_TLB0CFG   | 32
 								  PPC   | KVM_REG_PPC_TLB1CFG   | 32
 								  PPC   | KVM_REG_PPC_TLB2CFG   | 32
 								  PPC   | KVM_REG_PPC_TLB3CFG   | 32
 								  PPC   | KVM_REG_PPC_TLB0PS    | 32
 								  PPC   | KVM_REG_PPC_TLB1PS    | 32
 								  PPC   | KVM_REG_PPC_TLB2PS    | 32
 								  PPC   | KVM_REG_PPC_TLB3PS    | 32
 								  PPC   | KVM_REG_PPC_EPTCFG    | 32
+								  PPC   | KVM_REG_PPC_ICP_STATE | 64
-												KVM: PPC: Book3S HV: Implement timebase offset for guests

This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg
interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
defaults to 0 and is not modified by KVM.  On entering the guest, this
value is added onto the timebase, and on exiting the guest, it is
subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register.  Writing to the TBU40 register only
alters the upper 40 bits of the timebase, leaving the lower 24 bits
unchanged.  This provides a way to modify the timebase for guest
migration without disturbing the synchronization of the timebase
registers across CPU cores.  The kernel rounds up the value given
to a multiple of 2^24.
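
As a rough illustration of the rounding described above, the following
sketch rounds a requested offset up to the next multiple of 2^24.  The
helper and macro names are hypothetical and are not taken from the kernel
source; only the rounding rule itself comes from the text.

```c
#include <stdint.h>

/* A TBU40 write leaves the low 24 bits of the timebase unchanged. */
#define TB_OFFSET_ALIGN (UINT64_C(1) << 24)

/*
 * Hypothetical helper: round a requested timebase offset up to the
 * next multiple of 2^24.
 */
static uint64_t tb_offset_round_up(uint64_t offset)
{
	return (offset + TB_OFFSET_ALIGN - 1) & ~(TB_OFFSET_ALIGN - 1);
}
```

Userspace that wants a predictable result may prefer to pass an
already-aligned value, so that the rounding is a no-op.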

Timebase values stored in KVM structures (struct kvm_vcpu, struct
kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
values in the dispatch trace log need to be guest timebase values,
however, since that is read directly by the guest.  This moves the
setting of vcpu->arch.dec_expires on guest exit to a point after we
have restored the host timebase so that vcpu->arch.dec_expires is a
host timebase value.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2013-09-06 03:17:46 +00:00
+								  PPC   | KVM_REG_PPC_TB_OFFSET	| 64
-												KVM: PPC: Book3S HV: Reserve POWER8 space in get/set_one_reg

This reserves space in get/set_one_reg ioctl for the extra guest state
needed for POWER8.  It doesn't implement these at all, it just reserves
them so that the ABI is defined now.

A few things to note here:

- This adds *a lot* of state for transactional memory.  With TM suspend
  mode this is unavoidable: you can't simply roll back all transactions and
  store only the checkpointed state.  I've added this all to
  get/set_one_reg (including GPRs) rather than creating a new ioctl
  which returns a struct kvm_regs like KVM_GET_REGS does.  This means that
  if we need to extract the TM state, we are going to need a bucket load
  of IOCTLs.  Hopefully most of the time this will not be needed as we
  can look at the MSR to see if TM is active and only grab them when
  needed.  If this becomes a bottleneck in future we can add another
  ioctl to grab all this state in one go.

- The TM state is offset by 0x80000000.

- For TM, I've done away with VMX and FP and created a single 64x128 bit
  VSX register space.

- I've left a space of 1 (at 0x9c) since Paulus needs to add a value
  which applies to POWER7 as well.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2013-09-03 01:13:12 +00:00
+								  PPC   | KVM_REG_PPC_SPMC1	| 32
 								  PPC   | KVM_REG_PPC_SPMC2	| 32
 								  PPC   | KVM_REG_PPC_IAMR	| 64
 								  PPC   | KVM_REG_PPC_TFHAR	| 64
 								  PPC   | KVM_REG_PPC_TFIAR	| 64
 								  PPC   | KVM_REG_PPC_TEXASR	| 64
 								  PPC   | KVM_REG_PPC_FSCR	| 64
 								  PPC   | KVM_REG_PPC_PSPB	| 32
 								  PPC   | KVM_REG_PPC_EBBHR	| 64
 								  PPC   | KVM_REG_PPC_EBBRR	| 64
 								  PPC   | KVM_REG_PPC_BESCR	| 64
 								  PPC   | KVM_REG_PPC_TAR	| 64
 								  PPC   | KVM_REG_PPC_DPDES	| 64
 								  PPC   | KVM_REG_PPC_DAWR	| 64
 								  PPC   | KVM_REG_PPC_DAWRX	| 64
 								  PPC   | KVM_REG_PPC_CIABR	| 64
 								  PPC   | KVM_REG_PPC_IC	| 64
 								  PPC   | KVM_REG_PPC_VTB	| 64
 								  PPC   | KVM_REG_PPC_CSIGR	| 64
 								  PPC   | KVM_REG_PPC_TACR	| 64
 								  PPC   | KVM_REG_PPC_TCSCR	| 64
 								  PPC   | KVM_REG_PPC_PID	| 64
 								  PPC   | KVM_REG_PPC_ACOP	| 64
-												KVM: PPC: Book3S: Add GET/SET_ONE_REG interface for VRSAVE

The VRSAVE register value for a vcpu is accessible through the
GET/SET_SREGS interface for Book E processors, but not for Book 3S
processors.  In order to make this accessible for Book 3S processors,
this adds a new register identifier for GET/SET_ONE_REG, and adds
the code to implement it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2013-09-06 03:18:32 +00:00
+								  PPC   | KVM_REG_PPC_VRSAVE	| 32
-												KVM: PPC: Book3S HV: Store LPCR value for each virtual core

This adds the ability to have a separate LPCR (Logical Partitioning
Control Register) value relating to a guest for each virtual core,
rather than only having a single value for the whole VM.  This
corresponds to what real POWER hardware does, where there is a LPCR
per CPU thread but most of the fields are required to have the same
value on all active threads in a core.

The per-virtual-core LPCR can be read and written using the
GET/SET_ONE_REG interface.  Userspace can only modify the
following fields of the LPCR value:

DPFD	Default prefetch depth
ILE	Interrupt little-endian
TC	Translation control (secondary HPT hash group search disable)

We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
contains bits relating to memory management, i.e. the Virtualized
Partition Memory (VPM) bits and the bits relating to guest real mode.
When this default value is updated, the update needs to be propagated
to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
that.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2013-09-20 04:52:38 +00:00
+								  PPC   | KVM_REG_PPC_LPCR	| 64
-												KVM: PPC: Book3S HV: Add support for guest Program Priority Register

POWER7 and later IBM server processors have a register called the
Program Priority Register (PPR), which controls the priority of
each hardware CPU SMT thread, and affects how fast it runs compared
to other SMT threads.  This priority can be controlled by writing to
the PPR or by use of a set of instructions of the form or rN,rN,rN
which are otherwise no-ops but have been defined to set the priority
to particular levels.

This adds code to context switch the PPR when entering and exiting
guests and to make the PPR value accessible through the SET/GET_ONE_REG
interface.  When entering the guest, we set the PPR as late as
possible, because if we are setting a low thread priority it will
make the code run slowly from that point on.  Similarly, the
first-level interrupt handlers save the PPR value in the PACA very
early on, and set the thread priority to the medium level, so that
the interrupt handling code runs at a reasonable speed.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2013-09-20 04:52:39 +00:00
+								  PPC   | KVM_REG_PPC_PPR	| 64
-												KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7

This enables us to use the Processor Compatibility Register (PCR) on
POWER7 to put the processor into architecture 2.05 compatibility mode
when running a guest.  In this mode the new instructions and registers
that were introduced on POWER7 are disabled in user mode.  This
includes all the VSX facilities plus several other instructions such
as ldbrx, stdbrx, popcntw, popcntd, etc.

To select this mode, we have a new register accessible through the
set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
this to zero gives the full set of capabilities of the processor.
Setting it to one of the "logical" PVR values defined in PAPR puts
the vcpu into the compatibility mode for the corresponding
architecture level.  The supported values are:

0x0f000002	Architecture 2.05 (POWER6)
0x0f000003	Architecture 2.06 (POWER7)
0x0f100003	Architecture 2.06+ (POWER7+)
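
A minimal sketch of how userspace might validate a value before writing
it to KVM_REG_PPC_ARCH_COMPAT.  The macro and function names are made up
for illustration; only the numeric values come from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values accepted by KVM_REG_PPC_ARCH_COMPAT (names are hypothetical). */
#define ARCH_COMPAT_NONE     UINT32_C(0x00000000) /* full capabilities */
#define ARCH_COMPAT_2_05     UINT32_C(0x0f000002) /* POWER6 */
#define ARCH_COMPAT_2_06     UINT32_C(0x0f000003) /* POWER7 */
#define ARCH_COMPAT_2_06PLUS UINT32_C(0x0f100003) /* POWER7+ */

/* Return true if val is zero or one of the listed logical PVR values. */
static bool arch_compat_is_supported(uint32_t val)
{
	switch (val) {
	case ARCH_COMPAT_NONE:
	case ARCH_COMPAT_2_05:
	case ARCH_COMPAT_2_06:
	case ARCH_COMPAT_2_06PLUS:
		return true;
	default:
		return false;
	}
}
```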

Since the PCR is per-core, the architecture compatibility level and
the corresponding PCR value are stored in the struct kvmppc_vcore, and
are therefore shared between all vcpus in a virtual core.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: squash in fix to add missing break statements and documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2013-09-21 04:35:02 +00:00
+								  PPC   | KVM_REG_PPC_ARCH_COMPAT | 32
-												KVM: PPC: Book3S HV: Add support for DABRX register on POWER7

The DABRX (DABR extension) register on POWER7 processors provides finer
control over which accesses cause a data breakpoint interrupt.  It
contains 3 bits which indicate whether to enable accesses in user,
kernel and hypervisor modes respectively to cause data breakpoint
interrupts, plus one bit that enables both real mode and virtual mode
accesses to cause interrupts.  Currently, KVM sets DABRX to allow
both kernel and user accesses to cause interrupts while in the guest.

This adds support for the guest to specify other values for DABRX.
PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
and DABRX with one call.  This adds a real-mode implementation of
H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
implementation.  To support this, we add a per-vcpu field to store the
DABRX value plus code to get and set it via the ONE_REG interface.

For Linux guests to use this new hcall, userspace needs to add
"hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
property in the device tree.  If userspace does this and then migrates
the guest to a host where the kernel doesn't include this patch, then
userspace will need to implement H_SET_XDABR by writing the specified
DABR value to the DABR using the ONE_REG interface.  In that case, the
old kernel will set DABRX to DABRX_USER | DABRX_KERNEL.  That should
still work correctly, at least for Linux guests, since Linux guests
cope with getting data breakpoint interrupts in modes that weren't
requested by just ignoring the interrupt, and Linux guests never set
DABRX_BTI.

The other thing this does is to make H_SET_DABR and H_SET_XDABR work
on POWER8, which has the DAWR and DAWRX instead of DABR/X.  Guests that
know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
For them, this adds the logic to convert DABR/X values into DAWR/X values
on POWER8.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2014-01-08 10:25:29 +00:00
+								  PPC   | KVM_REG_PPC_DABRX     | 32
+								  PPC   | KVM_REG_PPC_TM_GPR0	| 64
 								          ...
 								  PPC   | KVM_REG_PPC_TM_GPR31	| 64
 								  PPC   | KVM_REG_PPC_TM_VSR0	| 128
 								          ...
 								  PPC   | KVM_REG_PPC_TM_VSR63	| 128
 								  PPC   | KVM_REG_PPC_TM_CR	| 64
 								  PPC   | KVM_REG_PPC_TM_LR	| 64
 								  PPC   | KVM_REG_PPC_TM_CTR	| 64
 								  PPC   | KVM_REG_PPC_TM_FPSCR	| 64
 								  PPC   | KVM_REG_PPC_TM_AMR	| 64
 								  PPC   | KVM_REG_PPC_TM_PPR	| 64
 								  PPC   | KVM_REG_PPC_TM_VRSAVE	| 64
 								  PPC   | KVM_REG_PPC_TM_VSCR	| 32
 								  PPC   | KVM_REG_PPC_TM_DSCR	| 64
 								  PPC   | KVM_REG_PPC_TM_TAR	| 64
-												KVM: Improve readability of KVM API doc

This helps to identify sections and it also fixes the numbering from
4.54 to 4.61.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-04-24 14:40:15 +00:00
-												KVM: ARM: Initial skeleton to compile KVM support

Targets KVM support for Cortex A-15 processors.

Contains all the framework components, make files, header files, some
tracing functionality, and basic user space API.

Only supported core is Cortex-A15 for now.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

											
										
										
											2013-01-20 23:28:06 +00:00
+								ARM registers are mapped using the lower 32 bits.  The upper 16 of that
 								is the register group type, or coprocessor number:
 								ARM core registers have the following id bit patterns:
-												KVM: ARM: Fix API documentation for ONE_REG encoding

Unless I'm mistaken, the size field was encoded 4 bits off and a wrong
value was used for 64-bit FP registers.

Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>

											
										
										
											2013-04-23 01:57:46 +00:00
+0x4020 0000 0010 <index into the kvm_regs struct:16>
-												KVM: ARM: User space API for getting/setting co-proc registers

The following three ioctls are implemented:
 -  KVM_GET_REG_LIST
 -  KVM_GET_ONE_REG
 -  KVM_SET_ONE_REG

Now we have a table for all the cp15 registers, we can drive a generic
API.

The register IDs carry the following encoding:

ARM registers are mapped using the lower 32 bits.  The upper 16 of that
is the register group type, or coprocessor number:

ARM 32-bit CP15 registers have the following id bit patterns:
  0x4002 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>

ARM 64-bit CP15 registers have the following id bit patterns:
  0x4003 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>

For futureproofing, we need to tell QEMU about the CP15 registers the
host lets the guest access.

It will need this information to restore a current guest on a future
CPU or perhaps a future KVM which allows some of these to be changed.

We use a separate table for these, as they're only for the userspace API.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

											
										
										
											2013-01-20 23:28:10 +00:00
+								ARM 32-bit CP15 registers have the following id bit patterns:
+0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
 								ARM 64-bit CP15 registers have the following id bit patterns:
+0x4030 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
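
The two CP15 patterns can be composed as in the sketch below, which
assumes the prefixes 0x4020 0000 000F (32-bit) and 0x4030 0000 000F
(64-bit) and packs the fields right to left in the order shown.  The
function and macro names are hypothetical, not the kernel's own.

```c
#include <stdint.h>

#define ARM_CP15_32_BASE UINT64_C(0x40200000000F0000)
#define ARM_CP15_64_BASE UINT64_C(0x40300000000F0000)

/* 32-bit CP15: <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3> */
static uint64_t arm_cp15_reg32_id(unsigned crn, unsigned crm,
				  unsigned opc1, unsigned opc2)
{
	return ARM_CP15_32_BASE |
	       ((uint64_t)(crn & 0xf) << 11) |
	       ((uint64_t)(crm & 0xf) << 7) |
	       ((uint64_t)(opc1 & 0xf) << 3) |
	       (uint64_t)(opc2 & 0x7);
}

/* 64-bit CP15: <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3> */
static uint64_t arm_cp15_reg64_id(unsigned crm, unsigned opc1)
{
	return ARM_CP15_64_BASE |
	       ((uint64_t)(crm & 0xf) << 7) |
	       ((uint64_t)(opc1 & 0xf) << 3);
}
```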
-												KVM: ARM: Demux CCSIDR in the userspace API

The Cache Size Selection Register (CSSELR) selects the current Cache
Size ID Register (CCSIDR).  You write which cache you are interested
in to CSSELR, and read the information out of CCSIDR.

Which cache numbers are valid is known by reading the Cache Level ID
Register (CLIDR).

To export this state to userspace, we add a KVM_REG_ARM_DEMUX
numberspace (17), which uses 8 bits to represent which register is
being demultiplexed (0 for CCSIDR), and the lower 8 bits to represent
this demultiplexing (in our case, the CSSELR value, which is 4 bits).

Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

											
										
										
											2013-01-20 23:28:10 +00:00
+								ARM CCSIDR registers are demultiplexed by CSSELR value:
+0x4020 0000 0011 00 <csselr:8>
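
A small sketch of the CCSIDR id layout, assuming the prefix
0x4020 0000 0011 followed by a zero byte and the 8-bit CSSELR value.
The function name is hypothetical.

```c
#include <stdint.h>

/* Demux space (0x0011), register 0x00 (CCSIDR), low byte = CSSELR. */
static uint64_t arm_ccsidr_reg_id(uint8_t csselr)
{
	return UINT64_C(0x4020000000110000) | csselr;
}
```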
-												KVM: ARM: VFP userspace interface

We use space #18 for floating point regs.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

											
										
										
											2013-01-20 23:28:11 +00:00
+								ARM 32-bit VFP control registers have the following id bit patterns:
+0x4020 0000 0012 1 <regno:12>
 								ARM 64-bit FP registers have the following id bit patterns:
+0x4030 0000 0012 0 <regno:12>
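
The VFP patterns differ only in the size field and in the nibble that
precedes the 12-bit register number.  The sketch below assumes the
prefixes 0x4020 0000 0012 1 (32-bit control registers) and
0x4030 0000 0012 0 (64-bit FP registers); the function names are
hypothetical.

```c
#include <stdint.h>

/* 32-bit VFP control register: 0x4020 0000 0012 1 <regno:12> */
static uint64_t arm_vfp_ctrl_reg_id(unsigned regno)
{
	return UINT64_C(0x4020000000121000) | (regno & 0xfff);
}

/* 64-bit FP register: 0x4030 0000 0012 0 <regno:12> */
static uint64_t arm_vfp_fp_reg_id(unsigned regno)
{
	return UINT64_C(0x4030000000120000) | (regno & 0xfff);
}
```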
-												arm64: KVM: userspace API documentation

Unsurprisingly, the arm64 userspace API is extremely similar to
the 32bit one, the only significant difference being the ONE_REG
register mapping.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

											
										
										
											2013-04-02 16:46:31 +00:00
 								arm64 registers are mapped using the lower 32 bits. The upper 16 of
 								that is the register group type, or coprocessor number:
 								arm64 core/FP-SIMD registers have the following id bit patterns. Note
 								that the size of the access is variable, as the kvm_regs structure
 								contains elements ranging from 32 to 128 bits. The index is a 32bit
 								value in the kvm_regs structure seen as a 32bit array.
 0x60x0 0000 0010 <index into the kvm_regs struct:16>
 								arm64 CCSIDR registers are demultiplexed by CSSELR value:
 0x6020 0000 0011 00 <csselr:8>
 								arm64 system registers have the following id bit patterns:
 0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>
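
As an illustration of the arm64 system register pattern, the sketch
below packs the five fields into the low 16 bits under the prefix
0x6030 0000 0013.  The function name is hypothetical; the field widths
and order follow the pattern above.

```c
#include <stdint.h>

/* arm64 system register: <op0:2> <op1:3> <crn:4> <crm:4> <op2:3> */
static uint64_t arm64_sys_reg_id(unsigned op0, unsigned op1,
				 unsigned crn, unsigned crm, unsigned op2)
{
	return UINT64_C(0x6030000000130000) |
	       ((uint64_t)(op0 & 0x3) << 14) |
	       ((uint64_t)(op1 & 0x7) << 11) |
	       ((uint64_t)(crn & 0xf) << 7) |
	       ((uint64_t)(crm & 0xf) << 3) |
	       (uint64_t)(op2 & 0x7);
}
```

For example, a register encoded as op0=3, op1=0, crn=0, crm=0, op2=5
maps to the id 0x603000000013c005.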
-												KVM: PPC: Add generic single register ioctls

Right now we transfer a static struct every time we want to get or set
registers. Unfortunately, over time we realize that there are more of
these than we thought of before and the extensibility and flexibility of
transferring a full struct every time is limited.

So this is a new approach to the problem. With these new ioctls, we can
get and set a single register that is identified by an ID. This allows for
very precise and limited transmittal of data. When we later realize that
it's a better idea to shove over multiple registers at once, we can reuse
most of the infrastructure and simply implement a GET_MANY_REGS / SET_MANY_REGS
interface.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>


											
										
										
											2011-09-14 08:02:41 +00:00
+4.69 KVM_GET_ONE_REG
 								Capability: KVM_CAP_ONE_REG
 								Architectures: all
 								Type: vcpu ioctl
 								Parameters: struct kvm_one_reg (in and out)
 								Returns: 0 on success, negative value on failure
 								This ioctl allows userspace to read the value of a single register implemented
 								in a vcpu. The register to read is indicated by the "id" field of the
 								kvm_one_reg struct passed in. On success, the register value can be found
 								at the memory location pointed to by "addr".
 								The list of registers accessible using this interface is identical to the
-												Document IACx/DACx registers access using ONE_REG API

Patch to access the debug registers (IACx/DACx) using ONE_REG api
was sent earlier. But that missed the respective documentation.

Also corrected the index number referencing in section 4.69

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2012-08-15 17:37:13 +00:00
+								list in 4.68.
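
A minimal sketch of a KVM_GET_ONE_REG call from userspace.  It assumes
vcpu_fd was obtained from KVM_CREATE_VCPU and that buf is large enough
for the register being read; the wrapper name is hypothetical.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Read one register from a vcpu into buf.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int vcpu_get_one_reg(int vcpu_fd, uint64_t reg_id, void *buf)
{
	struct kvm_one_reg reg = {
		.id   = reg_id,                  /* which register to read */
		.addr = (uint64_t)(uintptr_t)buf /* where the value is stored */
	};

	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}
```

The size of the buffer is implied by the size field encoded in the
register id, so a 64-bit register needs a buffer of at least 8 bytes.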
-												KVM: x86: Add ioctl for KVM_KVMCLOCK_CTRL

Now that we have a flag that will tell the guest it was suspended, create an
interface for that communication using a KVM ioctl.

Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>


											
										
										
											2012-03-10 19:37:27 +00:00
+4.70 KVM_KVMCLOCK_CTRL
 								Capability: KVM_CAP_KVMCLOCK_CTRL
 								Architectures: Any that implement pvclocks (currently x86 only)
 								Type: vcpu ioctl
 								Parameters: None
 								Returns: 0 on success, -1 on error
 								This signals to the host kernel that the specified guest is being paused by
 								userspace.  The host will set a flag in the pvclock structure that is checked
 								from the soft lockup watchdog.  The flag is part of the pvclock structure that
 								is shared between guest and host, specifically the second bit of the flags
 								field of the pvclock_vcpu_time_info structure.  It will be set exclusively by
 								the host and read/cleared exclusively by the guest.  The guest operation of
 								checking and clearing the flag must be an atomic operation, so
 								load-link/store-conditional or an equivalent must be used.  There are two cases
 								where the guest will clear the flag: when the soft lockup watchdog timer resets
 								itself or when a soft lockup is detected.  This ioctl can be called any time
 								after pausing the vcpu, but before it is resumed.
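
The guest-side check-and-clear can be sketched with C11 atomics as
follows.  This is an illustration only: it assumes the flags field is a
single byte and that "second bit" means bit 1, and the macro and
function names are made up rather than taken from the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the flags field (hypothetical name). */
#define GUEST_PAUSED_FLAG ((uint8_t)(1u << 1))

/*
 * Atomically clear the "paused" flag and report whether it was set.
 * A single fetch-and keeps the test and the clear in one operation,
 * so a concurrent set by the host cannot be lost between them.
 */
static bool guest_check_and_clear_paused(_Atomic uint8_t *flags)
{
	uint8_t old = atomic_fetch_and(flags, (uint8_t)~GUEST_PAUSED_FLAG);

	return (old & GUEST_PAUSED_FLAG) != 0;
}
```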
-												KVM: Introduce direct MSI message injection for in-kernel irqchips

Currently, MSI messages can only be injected to in-kernel irqchips by
defining a corresponding IRQ route for each message. This is not only
unhandy if the MSI messages are generated "on the fly" by user space,
IRQ routes are a limited resource that user space has to manage
carefully.

By providing a direct injection path, we can both avoid using up limited
resources and simplify the necessary steps for user land.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2012-03-29 19:14:12 +00:00
+4.71 KVM_SIGNAL_MSI
 								Capability: KVM_CAP_SIGNAL_MSI
 								Architectures: x86
 								Type: vm ioctl
 								Parameters: struct kvm_msi (in)
 								Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
 								Directly inject an MSI message. Only valid with an in-kernel irqchip that handles
 								MSI messages.
 								struct kvm_msi {
 									__u32 address_lo;
 									__u32 address_hi;
 									__u32 data;
 									__u32 flags;
 									__u8  pad[16];
 								};
 								No flags are defined so far. The corresponding field must be 0.
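
A sketch of preparing a struct kvm_msi for KVM_SIGNAL_MSI.  The helper
name is hypothetical; zeroing the whole structure first keeps the flags
and padding at 0 as required above.

```c
#include <stdint.h>
#include <string.h>
#include <linux/kvm.h>

/* Build an MSI message from a 64-bit address and a 32-bit data word. */
static struct kvm_msi make_msi(uint64_t address, uint32_t data)
{
	struct kvm_msi msi;

	memset(&msi, 0, sizeof(msi));	/* flags and pad must be 0 */
	msi.address_lo = (uint32_t)address;
	msi.address_hi = (uint32_t)(address >> 32);
	msi.data = data;
	return msi;
}
```

The result would then be passed to the VM file descriptor, for example
ioctl(vm_fd, KVM_SIGNAL_MSI, &msi), where vm_fd is assumed to come from
KVM_CREATE_VM.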
-												KVM: x86: Document in-kernel PIT API

Add descriptions for KVM_CREATE_PIT2 and KVM_GET/SET_PIT2.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-04-24 14:40:16 +00:00
+4.71 KVM_CREATE_PIT2
 								Capability: KVM_CAP_PIT2
 								Architectures: x86
 								Type: vm ioctl
 								Parameters: struct kvm_pit_config (in)
 								Returns: 0 on success, -1 on error
 								Creates an in-kernel device model for the i8254 PIT. This call is only valid
 								after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following
 								parameters have to be passed:
 								struct kvm_pit_config {
 									__u32 flags;
 									__u32 pad[15];
 								};
 								Valid flags are:
 								#define KVM_PIT_SPEAKER_DUMMY     1 /* emulate speaker port stub */
-												KVM: x86: Run PIT work in own kthread

We can't run PIT IRQ injection work in the interrupt context of the host
timer. This would allow the user to influence the handler complexity by
asking for a broadcast to a large number of VCPUs. Therefore, this work
was pushed into workqueue context in 9d244caf2e. However, this prevents
prioritizing the PIT injection over other task as workqueues share
kernel threads.

This replaces the workqueue with a kthread worker and gives that thread
a name in the format "kvm-pit/<owner-process-pid>". That allows to
identify and adjust the kthread priority according to the VM process
parameters.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-04-24 14:40:17 +00:00
+								PIT timer interrupts may use a per-VM kernel thread for injection. If it
 								exists, this thread will have a name of the following pattern:
 								kvm-pit/<owner-process-pid>
 								When running a guest with elevated priorities, the scheduling parameters of
 								this thread may have to be adjusted accordingly.
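A tool that adjusts scheduling parameters first has to recognise the injection thread. The sketch below only checks a thread name against the pattern given above; example_is_pit_thread is a hypothetical helper, and how the thread name is obtained (for instance from procfs) is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if comm matches "kvm-pit/<owner_pid>",
 * 0 otherwise.
 */
int example_is_pit_thread(const char *comm, long owner_pid)
{
	char expected[32];
	int n = snprintf(expected, sizeof(expected), "kvm-pit/%ld", owner_pid);

	/* Treat a formatting error or truncation as "no match". */
	if (n < 0 || (size_t)n >= sizeof(expected))
		return 0;
	return strcmp(comm, expected) == 0;
}
```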
+								This IOCTL replaces the obsolete KVM_CREATE_PIT.
 4.72 KVM_GET_PIT2
 								Capability: KVM_CAP_PIT_STATE2
 								Architectures: x86
 								Type: vm ioctl
 								Parameters: struct kvm_pit_state2 (out)
 								Returns: 0 on success, -1 on error
 								Retrieves the state of the in-kernel PIT model. Only valid after
 								KVM_CREATE_PIT2. The state is returned in the following structure:
 								struct kvm_pit_state2 {
 									struct kvm_pit_channel_state channels[3];
 									__u32 flags;
 									__u32 reserved[9];
 								};
 								Valid flags are:
 								/* disable PIT in HPET legacy mode */
 								#define KVM_PIT_FLAGS_HPET_LEGACY  0x00000001
 								This IOCTL replaces the obsolete KVM_GET_PIT.
 4.73 KVM_SET_PIT2
 								Capability: KVM_CAP_PIT_STATE2
 								Architectures: x86
 								Type: vm ioctl
 								Parameters: struct kvm_pit_state2 (in)
 								Returns: 0 on success, -1 on error
 								Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
 								See KVM_GET_PIT2 for details on struct kvm_pit_state2.
 								This IOCTL replaces the obsolete KVM_SET_PIT.
-												kvm/powerpc: Add new ioctl to retreive server MMU infos

This is necessary for qemu to be able to pass the right information
to the guest, such as the supported page sizes and corresponding
encodings in the SLB and hash table, which can vary depending
on the processor type, the type of KVM used (PR vs HV) and the
version of KVM

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix compilation on hv, adjust for newer ioctl numbers]
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2012-04-26 19:43:42 +00:00
+4.74 KVM_PPC_GET_SMMU_INFO
 								Capability: KVM_CAP_PPC_GET_SMMU_INFO
 								Architectures: powerpc
 								Type: vm ioctl
 								Parameters: None
 								Returns: 0 on success, -1 on error
 								This populates and returns a structure describing the features of
 								the "Server" class MMU emulation supported by KVM.
-												Documentation/virtual/kvm/api.txt fix a typo

Corrected the word appropariate to appropriate.

Signed-off-by: Stefan Huber <steffhip@googlemail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

											
										
										
											2013-06-05 10:24:37 +00:00
+								This can in turn be used by userspace to generate the appropriate
+								device-tree properties for the guest operating system.
 								The structure contains some global information, followed by an
 								array of supported segment page sizes:
 								      struct kvm_ppc_smmu_info {
 									     __u64 flags;
 									     __u32 slb_size;
 									     __u32 pad;
 									     struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
 								      };
 								The supported flags are:
 								    - KVM_PPC_PAGE_SIZES_REAL:
 								        When that flag is set, guest page sizes must "fit" the backing
 								        store page sizes. When not set, any page size in the list can
 								        be used regardless of how they are backed by userspace.
 								    - KVM_PPC_1T_SEGMENTS
 								        The emulated MMU supports 1T segments in addition to the
 								        standard 256M ones.
 								The "slb_size" field indicates how many SLB entries are supported.
 								The "sps" array contains 8 entries indicating the supported base
 								page sizes for a segment in increasing order. Each entry is defined
 								as follows:
 								   struct kvm_ppc_one_seg_page_size {
 									__u32 page_shift;	/* Base page shift of segment (or 0) */
 									__u32 slb_enc;		/* SLB encoding for BookS */
 									struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
 								   };
 								An entry with a "page_shift" of 0 is unused. Because the array is
 								organized in increasing order, a lookup can stop when encountering
 								such an entry.
 								The "slb_enc" field provides the encoding to use in the SLB for the
 								page size. The bits are in positions such that the value can directly
 								be OR'ed into the "vsid" argument of the slbmte instruction.
 								The "enc" array is a list which, for each of those segment base page
 								sizes, provides the list of supported actual page sizes (which can
 								only be larger than or equal to the base page size), along with the
-												doc: fix misspellings with 'codespell' tool

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

											
										
										
											2013-05-08 23:56:16 +00:00
+								corresponding encoding in the hash PTE. Similarly, the array is
+8 entries sorted by increasing sizes and an entry with a "0" shift
 								is an empty entry and a terminator:
 								   struct kvm_ppc_one_page_size {
 									__u32 page_shift;	/* Page shift (or 0) */
 									__u32 pte_enc;		/* Encoding in the HPTE (>>12) */
 								   };
 								The "pte_enc" field provides a value that can be OR'ed into the hash
 								PTE's RPN field (i.e., it needs to be shifted left by 12 to OR it
 								into the hash PTE second double word).
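The lookup rules above (stop at a "page_shift" of 0, shift "pte_enc" left by 12) can be sketched as follows. All example_* names are hypothetical local mirrors of the structures listed above, and EXAMPLE_PAGE_SIZES_MAX_SZ is assumed to be 8, as the text states for the "sps" array; real code would use the definitions from <linux/kvm.h>.

```c
#include <stdint.h>

#define EXAMPLE_PAGE_SIZES_MAX_SZ 8	/* assumption: 8 entries, as stated above */

struct example_one_page_size {
	uint32_t page_shift;	/* Page shift (or 0) */
	uint32_t pte_enc;	/* Encoding in the HPTE (>>12) */
};

struct example_one_seg_page_size {
	uint32_t page_shift;	/* Base page shift of segment (or 0) */
	uint32_t slb_enc;	/* SLB encoding */
	struct example_one_page_size enc[EXAMPLE_PAGE_SIZES_MAX_SZ];
};

struct example_smmu_info {
	uint64_t flags;
	uint32_t slb_size;
	uint32_t pad;
	struct example_one_seg_page_size sps[EXAMPLE_PAGE_SIZES_MAX_SZ];
};

/* Count the used segment base page sizes; a page_shift of 0 terminates. */
int example_count_seg_sizes(const struct example_smmu_info *info)
{
	int n = 0;

	while (n < EXAMPLE_PAGE_SIZES_MAX_SZ && info->sps[n].page_shift != 0)
		n++;
	return n;
}

/*
 * Find the HPTE encoding for an actual page size within a segment base
 * page size. On success, stores pte_enc shifted left by 12 (ready to be
 * OR'ed into the second doubleword) and returns 0; returns -1 otherwise.
 */
int example_hpte_enc(const struct example_smmu_info *info,
		     uint32_t base_shift, uint32_t actual_shift,
		     uint64_t *hpte_bits)
{
	for (int i = 0; i < EXAMPLE_PAGE_SIZES_MAX_SZ; i++) {
		const struct example_one_seg_page_size *seg = &info->sps[i];

		if (seg->page_shift == 0)
			break;		/* terminator: no more segment sizes */
		if (seg->page_shift != base_shift)
			continue;
		for (int j = 0; j < EXAMPLE_PAGE_SIZES_MAX_SZ; j++) {
			if (seg->enc[j].page_shift == 0)
				break;	/* terminator: no more page sizes */
			if (seg->enc[j].page_shift == actual_shift) {
				*hpte_bits = (uint64_t)seg->enc[j].pte_enc << 12;
				return 0;
			}
		}
	}
	return -1;
}
```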
-												KVM: Add missing KVM_IRQFD API documentation

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-06-29 15:56:16 +00:00
+4.75 KVM_IRQFD
 								Capability: KVM_CAP_IRQFD
 								Architectures: x86
 								Type: vm ioctl
 								Parameters: struct kvm_irqfd (in)
 								Returns: 0 on success, -1 on error
 								Allows setting an eventfd to directly trigger a guest interrupt.
 								kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
 								kvm_irqfd.gsi specifies the irqchip pin toggled by this event.  When
-												KVM: doc: Fix typo in doc/virtual/kvm

Correct spelling typo in Documentations/virtual/kvm

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2013-12-21 16:21:23 +00:00
+								an event is triggered on the eventfd, an interrupt is injected into
+								the guest using the specified gsi pin.  The irqfd is removed using
 								the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
 								and kvm_irqfd.gsi.
-												KVM: Add resampling irqfds for level triggered interrupts

To emulate level triggered interrupts, add a resample option to
KVM_IRQFD.  When specified, a new resamplefd is provided that notifies
the user when the irqchip has been resampled by the VM.  This may, for
instance, indicate an EOI.  Also in this mode, posting of an interrupt
through an irqfd only asserts the interrupt.  On resampling, the
interrupt is automatically de-asserted prior to user notification.
This enables level triggered interrupts to be posted and re-enabled
from vfio with no userspace intervention.

All resampling irqfds can make use of a single irq source ID, so we
reserve a new one for this interface.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2012-09-21 17:58:03 +00:00
+								With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify
 								mechanism allowing emulation of level-triggered, irqfd-based
 								interrupts.  When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an
 								additional eventfd in the kvm_irqfd.resamplefd field.  When operating
 								in resample mode, posting of an interrupt through kvm_irqfd.fd asserts
 								the specified gsi in the irqchip.  When the irqchip is resampled, such
+								as from an EOI, the gsi is de-asserted and the user is notified via
+								kvm_irqfd.resamplefd.  It is the user's responsibility to re-queue
 								the interrupt if the device making use of it still requires service.
 								Note that closing the resamplefd is not sufficient to disable the
 								irqfd.  The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
 								and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
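The assign and deassign rules above can be sketched as two request builders. This is an illustration under assumptions: the text only names the fd, gsi and resamplefd fields, so the layout of example_kvm_irqfd and the two flag values are assumed to mirror <linux/kvm.h> and should be checked against the headers; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed flag values (mirroring <linux/kvm.h>; verify before relying on them). */
#define EXAMPLE_IRQFD_FLAG_DEASSIGN (1u << 0)
#define EXAMPLE_IRQFD_FLAG_RESAMPLE (1u << 1)

/* Assumed layout of struct kvm_irqfd (illustration only). */
struct example_kvm_irqfd {
	uint32_t fd;
	uint32_t gsi;
	uint32_t flags;
	uint32_t resamplefd;
	uint8_t  pad[16];
};

/* Build an assign request; pass resamplefd < 0 for a plain irqfd. */
void example_irqfd_assign(struct example_kvm_irqfd *req,
			  uint32_t fd, uint32_t gsi, int resamplefd)
{
	memset(req, 0, sizeof(*req));
	req->fd = fd;
	req->gsi = gsi;
	if (resamplefd >= 0) {
		/* Resample mode needs the additional eventfd. */
		req->flags |= EXAMPLE_IRQFD_FLAG_RESAMPLE;
		req->resamplefd = (uint32_t)resamplefd;
	}
}

/* Build a deassign request; the RESAMPLE flag is not needed here. */
void example_irqfd_deassign(struct example_kvm_irqfd *req,
			    uint32_t fd, uint32_t gsi)
{
	memset(req, 0, sizeof(*req));
	req->fd = fd;
	req->gsi = gsi;
	req->flags = EXAMPLE_IRQFD_FLAG_DEASSIGN;
}
```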
-												 KVM updates for the 3.6 merge window

Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Avi Kivity:
 "Highlights include
  - full big real mode emulation on pre-Westmere Intel hosts (can be
    disabled with emulate_invalid_guest_state=0)
  - relatively small ppc and s390 updates
  - PCID/INVPCID support in guests
  - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on
    interrupt intensive workloads)
  - Lockless write faults during live migration
  - EPT accessed/dirty bits support for new Intel processors"

Fix up conflicts in:
 - Documentation/virtual/kvm/api.txt:

   Stupid subchapter numbering, added next to each other.

 - arch/powerpc/kvm/booke_interrupts.S:

   PPC asm changes clashing with the KVM fixes

 - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:

   Duplicated commits through the kvm tree and the s390 tree, with
   subsequent edits in the KVM tree.

* tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
  KVM: fix race with level interrupts
  x86, hyper: fix build with !CONFIG_KVM_GUEST
  Revert "apic: fix kvm build on UP without IOAPIC"
  KVM guest: switch to apic_set_eoi_write, apic_write
  apic: add apic_set_eoi_write for PV use
  KVM: VMX: Implement PCID/INVPCID for guests with EPT
  KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
  KVM: PPC: Critical interrupt emulation support
  KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
  KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
  KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
  KVM: PPC: bookehv64: Add support for std/ld emulation.
  booke: Added crit/mc exception handler for e500v2
  booke/bookehv: Add host crit-watchdog exception support
  KVM: MMU: document mmu-lock and fast page fault
  KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
  KVM: MMU: trace fast page fault
  KVM: MMU: fast path of handling guest page fault
  KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
  KVM: MMU: fold tlb flush judgement into mmu_spte_update
  ...

											
										
										
											2012-07-24 19:01:20 +00:00
+4.76 KVM_PPC_ALLOCATE_HTAB
-												KVM: PPC: Book3S HV: Make the guest hash table size configurable

This adds a new ioctl to enable userspace to control the size of the guest
hashed page table (HPT) and to clear it out when resetting the guest.
The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
a pointer to a u32 containing the desired order of the HPT (log base 2
of the size in bytes), which is updated on successful return to the
actual order of the HPT which was allocated.

There must be no vcpus running at the time of this ioctl.  To enforce
this, we now keep a count of the number of vcpus running in
kvm->arch.vcpus_running.

If the ioctl is called when a HPT has already been allocated, we don't
reallocate the HPT but just clear it out.  We first clear the
kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
the kvm->lock mutex, it will prevent any vcpus from starting to run until
we're done, and (b) it means that the first vcpu to run after we're done
will re-establish the VRMA if necessary.

If userspace doesn't call this ioctl before running the first vcpu, the
kernel will allocate a default-sized HPT at that point.  We do it then
rather than when creating the VM, as the code did previously, so that
userspace has a chance to do the ioctl if it wants.

When allocating the HPT, we can allocate either from the kernel page
allocator, or from the preallocated pool.  If userspace is asking for
a different size from the preallocated HPTs, we first try to allocate
using the kernel page allocator.  Then we try to allocate from the
preallocated pool, and then if that fails, we try allocating decreasing
sizes from the kernel page allocator, down to the minimum size allowed
(256kB).  Note that the kernel page allocator limits allocations to
1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
16MB (on 64-bit powerpc, at least).

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix module compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2012-05-04 02:32:53 +00:00
 								Capability: KVM_CAP_PPC_ALLOC_HTAB
 								Architectures: powerpc
 								Type: vm ioctl
 								Parameters: Pointer to u32 containing hash table order (in/out)
 								Returns: 0 on success, -1 on error
 								This requests the host kernel to allocate an MMU hash table for a
 								guest using the PAPR paravirtualization interface.  This only does
 								anything if the kernel is configured to use the Book 3S HV style of
 								virtualization.  Otherwise the capability doesn't exist and the ioctl
 								returns an ENOTTY error.  The rest of this description assumes Book 3S
 								HV.
 								There must be no vcpus running when this ioctl is called; if there
 								are, it will do nothing and return an EBUSY error.
 								The parameter is a pointer to a 32-bit unsigned integer variable
 								containing the order (log base 2) of the desired size of the hash
 								table, which must be between 18 and 46.  On successful return from the
 								ioctl, it will have been updated with the order of the hash table that
 								was allocated.
 								If no hash table has been allocated when any vcpu is asked to run
 								(with the KVM_RUN ioctl), the host kernel will allocate a
 								default-sized hash table (16 MB).
 								If this ioctl is called when a hash table has already been allocated,
 								the kernel will clear out the existing hash table (zero all HPTEs) and
 								return the hash table order in the parameter.  (If the guest is using
 								the virtualized real-mode area (VRMA) facility, the kernel will
 								re-create the VRMA HPTEs on the next KVM_RUN of any vcpu.)
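Since the parameter is an order (log base 2 of the size in bytes), a small sketch of the arithmetic may help. The helper names are hypothetical; the limits 18 and 46 are taken from the text above.

```c
#include <stdint.h>

#define EXAMPLE_HTAB_MIN_ORDER 18	/* from the text above */
#define EXAMPLE_HTAB_MAX_ORDER 46	/* from the text above */

/* Return 1 if the order is within the documented range, 0 otherwise. */
int example_htab_order_valid(uint32_t order)
{
	return order >= EXAMPLE_HTAB_MIN_ORDER &&
	       order <= EXAMPLE_HTAB_MAX_ORDER;
}

/* Size in bytes of a hash table of the given order, or 0 if out of range. */
uint64_t example_htab_bytes(uint32_t order)
{
	if (!example_htab_order_valid(order))
		return 0;
	return (uint64_t)1 << order;
}
```

Under this arithmetic the default-sized table of 16 MB corresponds to an order of 24, and the minimum order of 18 to 256 kB.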
-												s390/kvm: Add documentation for KVM_S390_INTERRUPT

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-10-02 14:25:37 +00:00
+4.77 KVM_S390_INTERRUPT
 								Capability: basic
 								Architectures: s390
 								Type: vm ioctl, vcpu ioctl
 								Parameters: struct kvm_s390_interrupt (in)
 								Returns: 0 on success, -1 on error
 								Allows injecting an interrupt into the guest. Interrupts can be floating
 								(vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type.
 								Interrupt parameters are passed via kvm_s390_interrupt:
 								struct kvm_s390_interrupt {
 									__u32 type;
 									__u32 parm;
 									__u64 parm64;
 								};
 								type can be one of the following:
 								KVM_S390_SIGP_STOP (vcpu) - sigp stop
 								KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm
 								KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm
 								KVM_S390_RESTART (vcpu) - restart
 								KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt
 											   parameters in parm and parm64
 								KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm
 								KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm
 								KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
-												KVM: s390: Support for I/O interrupts.

Add support for handling I/O interrupts (standard, subchannel-related
ones and rudimentary adapter interrupts).

The subchannel-identifying parameters are encoded into the interrupt
type.

I/O interrupts are floating, so they can't be injected on a specific
vcpu.

Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-12-20 14:32:08 +00:00
+								KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an
 								    I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
 								    I/O interruption parameters in parm (subchannel) and parm64 (intparm,
 								    interruption subclass)
-												KVM: s390: Add support for machine checks.

Add support for injecting machine checks (only repressible
conditions for now).

This is a bit more involved than I/O interrupts, for these reasons:

- Machine checks come in both floating and cpu varieties.
- We don't have a bit for machine checks enabling, but have to use
  a roundabout approach with trapping PSW changing instructions and
  watching for opened machine checks.

Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-12-20 14:32:09 +00:00
+								KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm,
 								                           machine check interrupt code in parm64 (note that
 								                           machine checks needing further payload are not
 								                           supported by this ioctl)
 								Note that the vcpu ioctl is asynchronous to vcpu execution.
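The (vm) and (vcpu) annotations in the list above determine which file descriptor an interrupt type may be injected on. The sketch below encodes that table; the enum and its values are hypothetical stand-ins for illustration and are not the KVM_S390_* constants from <linux/kvm.h>.

```c
/* Hypothetical stand-ins for the interrupt types listed above. */
enum example_s390_irq {
	EXAMPLE_SIGP_STOP,
	EXAMPLE_PROGRAM_INT,
	EXAMPLE_SIGP_SET_PREFIX,
	EXAMPLE_RESTART,
	EXAMPLE_INT_VIRTIO,
	EXAMPLE_INT_SERVICE,
	EXAMPLE_INT_EMERGENCY,
	EXAMPLE_INT_EXTERNAL_CALL,
	EXAMPLE_INT_IO,
	EXAMPLE_MCHK
};

#define EXAMPLE_ON_VM   1u	/* floating: inject with the vm ioctl */
#define EXAMPLE_ON_VCPU 2u	/* per cpu: inject with the vcpu ioctl */

/* Return the set of file descriptor kinds a type may be injected on. */
unsigned int example_s390_irq_targets(enum example_s390_irq kind)
{
	switch (kind) {
	case EXAMPLE_INT_VIRTIO:
	case EXAMPLE_INT_SERVICE:
	case EXAMPLE_INT_IO:
		return EXAMPLE_ON_VM;
	case EXAMPLE_MCHK:
		return EXAMPLE_ON_VM | EXAMPLE_ON_VCPU;
	case EXAMPLE_SIGP_STOP:
	case EXAMPLE_PROGRAM_INT:
	case EXAMPLE_SIGP_SET_PREFIX:
	case EXAMPLE_RESTART:
	case EXAMPLE_INT_EMERGENCY:
	case EXAMPLE_INT_EXTERNAL_CALL:
		return EXAMPLE_ON_VCPU;
	}
	return 0;
}
```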
-												KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
this fd return the contents of the HPT (hashed page table), writes
create and/or remove entries in the HPT.  There is a new capability,
KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
takes an argument structure with the index of the first HPT entry to
read out and a set of flags.  The flags indicate whether the user is
intending to read or write the HPT, and whether to return all entries
or only the "bolted" entries (those with the bolted bit, 0x10, set in
the first doubleword).

This is intended for use in implementing qemu's savevm/loadvm and for
live migration.  Therefore, on reads, the first pass returns information
about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
end of the HPT, it returns from the read.  Subsequent reads only return
information about HPTEs that have changed since they were last read.
A read that finds no changed HPTEs in the HPT following where the last
read finished will return 0 bytes.

The format of the data provides a simple run-length compression of the
invalid entries.  Each block of data starts with a header that indicates
the index (position in the HPT, which is just an array), the number of
valid entries starting at that index (may be zero), and the number of
invalid entries following those valid entries.  The valid entries, 16
bytes each, follow the header.  The invalid entries are not explicitly
represented.

Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2012-11-19 22:57:20 +00:00
+4.78 KVM_PPC_GET_HTAB_FD
 								Capability: KVM_CAP_PPC_HTAB_FD
 								Architectures: powerpc
 								Type: vm ioctl
 								Parameters: Pointer to struct kvm_get_htab_fd (in)
 								Returns: file descriptor number (>= 0) on success, -1 on error
 								This returns a file descriptor that can be used either to read out the
 								entries in the guest's hashed page table (HPT), or to write entries to
 								initialize the HPT.  The returned fd can only be written to if the
 								KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
 								can only be read if that bit is clear.  The argument struct looks like
 								this:
 								/* For KVM_PPC_GET_HTAB_FD */
 								struct kvm_get_htab_fd {
 									__u64	flags;
 									__u64	start_index;
 									__u64	reserved[2];
 								};
 								/* Values for kvm_get_htab_fd.flags */
 								#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
 								#define KVM_GET_HTAB_WRITE		((__u64)0x2)
 								The `start_index' field gives the index in the HPT of the entry at
 								which to start reading.  It is ignored when writing.
 								Reads on the fd will initially supply information about all
 								"interesting" HPT entries.  Interesting entries are those with the
 								bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
 								all entries.  When the end of the HPT is reached, the read() will
 								return.  If read() is called again on the fd, it will start again from
 								the beginning of the HPT, but will only return HPT entries that have
 								changed since they were last read.
 								Data read or written is structured as a header (8 bytes) followed by a
 								series of valid HPT entries (16 bytes each).  The header indicates how
 								many valid HPT entries there are and how many invalid entries follow
 								the valid entries.  The invalid entries are not represented explicitly
 								in the stream.  The header format is:
 								struct kvm_get_htab_header {
 									__u32	index;
 									__u16	n_valid;
 									__u16	n_invalid;
 								};
 								Writes to the fd create HPT entries starting at the index given in the
 								header; first `n_valid' valid entries with contents from the data
 								written, then `n_invalid' invalid entries, invalidating any previously
 								valid entries found.
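The stream format described above (an 8-byte header, then n_valid entries of 16 bytes, with invalid entries implied) can be walked as in the following sketch. The names are hypothetical, the header struct is a local mirror of the listing above, and the buffer is assumed to hold whole blocks in host byte order, as returned by read() on the fd.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_HPTE_BYTES 16	/* size of one valid HPT entry, per the text */

/* Local mirror of struct kvm_get_htab_header (illustration only). */
struct example_htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

/*
 * Walk a buffer of HPT stream data, summing the valid and invalid entry
 * counts. Returns the number of blocks parsed, or -1 if the buffer ends
 * in the middle of a header or of the valid entries.
 */
int example_scan_htab_stream(const unsigned char *buf, size_t len,
			     uint64_t *valid, uint64_t *invalid)
{
	size_t off = 0;
	int blocks = 0;

	*valid = 0;
	*invalid = 0;
	while (off < len) {
		struct example_htab_header hdr;
		size_t payload;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		/* Only the valid entries occupy space in the stream. */
		payload = (size_t)hdr.n_valid * EXAMPLE_HPTE_BYTES;
		if (len - off < payload)
			return -1;
		off += payload;

		*valid += hdr.n_valid;
		*invalid += hdr.n_invalid;
		blocks++;
	}
	return blocks;
}
```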
-												kvm: add device control API

Currently, devices that are emulated inside KVM are configured in a
hardcoded manner based on an assumption that any given architecture
only has one way to do it.  If there's any need to access device state,
it is done through inflexible one-purpose-only IOCTLs (e.g.
KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
cumbersome and depletes a limited numberspace.

This API provides a mechanism to instantiate a device of a certain
type, returning an ID that can be used to set/get attributes of the
device.  Attributes may include configuration parameters (e.g.
register base address), device state, operational commands, etc.  It
is similar to the ONE_REG API, except that it acts on devices rather
than vcpus.

Both device types and individual attributes can be tested without having
to create the device or get/set the attribute, without the need for
separately managing enumerated capabilities.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2013-04-12 14:08:42 +00:00
+4.79 KVM_CREATE_DEVICE
 								Capability: KVM_CAP_DEVICE_CTRL
 								Type: vm ioctl
 								Parameters: struct kvm_create_device (in/out)
 								Returns: 0 on success, -1 on error
 								Errors:
 								  ENODEV: The device type is unknown or unsupported
 								  EEXIST: Device already created, and this type of device may not
 								          be instantiated multiple times
 								  Other error conditions may be defined by individual device types or
 								  have their standard meanings.
 								Creates an emulated device in the kernel.  The file descriptor returned
 								in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.
 								If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the
 								device type is supported (not necessarily whether it can be created
 								in the current vm).
 								Individual devices should not define flags.  Attributes should be used
 								for specifying any behavior that is not implied by the device type
 								number.
 								struct kvm_create_device {
 									__u32	type;	/* in: KVM_DEV_TYPE_xxx */
 									__u32	fd;	/* out: device handle */
 									__u32	flags;	/* in: KVM_CREATE_DEVICE_xxx */
 								};
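A minimal sketch of preparing the request and interpreting the documented errors follows. The example_* names are hypothetical, the struct is a local mirror of the listing above, and the value 1 for the test flag is an assumption to be checked against KVM_CREATE_DEVICE_TEST in <linux/kvm.h>.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to match KVM_CREATE_DEVICE_TEST; verify against the headers. */
#define EXAMPLE_CREATE_DEVICE_TEST 1u

/* Local mirror of struct kvm_create_device (illustration only). */
struct example_kvm_create_device {
	uint32_t type;	/* in: KVM_DEV_TYPE_xxx */
	uint32_t fd;	/* out: device handle */
	uint32_t flags;	/* in: KVM_CREATE_DEVICE_xxx */
};

/* Prepare a request; test_only asks only whether the type is supported. */
void example_prepare_create(struct example_kvm_create_device *cd,
			    uint32_t type, int test_only)
{
	cd->type = type;
	cd->fd = 0;
	cd->flags = test_only ? EXAMPLE_CREATE_DEVICE_TEST : 0;
}

/* Map the errno values documented above to a short description. */
const char *example_create_device_error(int err)
{
	switch (err) {
	case ENODEV:
		return "device type unknown or unsupported";
	case EEXIST:
		return "device already created";
	default:
		return "other error (device-specific or standard meaning)";
	}
}
```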
 4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
-												KVM: s390: Per-vm kvm device controls

We sometimes need to get/set attributes specific to a virtual machine
and so need something else than ONE_REG.

Let's copy the KVM_DEVICE approach, and define the respective ioctls
for the vm file descriptor.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

											
										
										
											2014-04-09 11:13:00 +00:00
+								Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
 								Type: device ioctl, vm ioctl
+								Parameters: struct kvm_device_attr
 								Returns: 0 on success, -1 on error
 								Errors:
 								  ENXIO:  The group or attribute is unknown/unsupported for this device
 								  EPERM:  The attribute cannot (currently) be accessed this way
 								          (e.g. read-only attribute, or attribute that only makes
 								          sense when the device is in a different state)
 								  Other error conditions may be defined by individual device types.
 								Gets/sets a specified piece of device configuration and/or state.  The
 								semantics are device-specific.  See individual device documentation in
 								the "devices" directory.  As with ONE_REG, the size of the data
 								transferred is defined by the particular attribute.
 								struct kvm_device_attr {
 									__u32	flags;		/* no flags currently defined */
 									__u32	group;		/* device-defined */
 									__u64	attr;		/* group-defined */
 									__u64	addr;		/* userspace address of attr data */
 								};
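As a rough illustration of the attribute interface described above, the following sketch wraps KVM_SET_DEVICE_ATTR and KVM_GET_DEVICE_ATTR for an attribute whose data is a single 64-bit value.  The helper names and the 64-bit width are assumptions made for this example only; the real data size and the meaning of group/attr are defined by the individual device.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: write one 64-bit attribute value through a device
 * (or vm) file descriptor.  Returns 0 on success, -1 with errno set on error
 * (e.g. ENXIO for an unknown group/attribute, EPERM if not writable now).
 */
static int set_attr_u64(int fd, uint32_t group, uint64_t attr, uint64_t value)
{
	struct kvm_device_attr da = {
		.flags = 0,                            /* no flags currently defined */
		.group = group,                        /* device-defined */
		.attr  = attr,                         /* group-defined */
		.addr  = (uint64_t)(uintptr_t)&value,  /* userspace address of data */
	};

	return ioctl(fd, KVM_SET_DEVICE_ATTR, &da);
}

/* Hypothetical helper: read one 64-bit attribute value into *value. */
static int get_attr_u64(int fd, uint32_t group, uint64_t attr, uint64_t *value)
{
	struct kvm_device_attr da = {
		.flags = 0,
		.group = group,
		.attr  = attr,
		.addr  = (uint64_t)(uintptr_t)value,
	};

	return ioctl(fd, KVM_GET_DEVICE_ATTR, &da);
}
```

A caller would pass the fd of the device or VM together with the group and attribute numbers taken from the device's own documentation.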
 4.81 KVM_HAS_DEVICE_ATTR
+								Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
 								Type: device ioctl, vm ioctl
+								Parameters: struct kvm_device_attr
 								Returns: 0 on success, -1 on error
 								Errors:
 								  ENXIO:  The group or attribute is unknown/unsupported for this device
 								Tests whether a device supports a particular attribute.  A successful
 								return indicates the attribute is implemented.  It does not necessarily
 								indicate that the attribute can be read or written in the device's
 								current state.  "addr" is ignored.
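The probing step described above could be sketched as follows.  The helper name is hypothetical; it only reports whether the attribute is implemented, which, as noted, does not guarantee that it can be read or written in the device's current state.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns 1 if the device behind fd implements the
 * given group/attribute pair, 0 otherwise (including on any ioctl error
 * such as ENXIO).  The "addr" field is left at zero because it is ignored.
 */
static int has_attr(int fd, uint32_t group, uint64_t attr)
{
	struct kvm_device_attr da = {
		.flags = 0,
		.group = group,
		.attr  = attr,
		.addr  = 0,
	};

	return ioctl(fd, KVM_HAS_DEVICE_ATTR, &da) == 0;
}
```

Userspace can call such a helper once per optional attribute at start-up instead of tracking a separate capability for each one.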
-												KVM: Add missing KVM_IRQFD API documentation

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-06-29 15:56:16 +00:00
-												kvm api doc: fix section numbers

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

											
										
										
											2013-06-19 01:42:07 +00:00
+4.82 KVM_ARM_VCPU_INIT
-												KVM: ARM: Initial skeleton to compile KVM support

Targets KVM support for Cortex A-15 processors.

Contains all the framework components, make files, header files, some
tracing functionality, and basic user space API.

Only supported core is Cortex-A15 for now.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

											
										
										
											2013-01-20 23:28:06 +00:00
 								Capability: basic
-												arm64: KVM: userspace API documentation

Unsurprisingly, the arm64 userspace API is extremely similar to
the 32bit one, the only significant difference being the ONE_REG
register mapping.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

											
										
										
											2013-04-02 16:46:31 +00:00
+								Architectures: arm, arm64
+								Type: vcpu ioctl
-												KVM: Documentation: Fix typo for KVM_ARM_VCPU_INIT ioctl

Fix minor typo in "Parameters:" of KVM_ARM_VCPU_INIT documentation.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

											
										
										
											2013-12-12 16:12:24 +00:00
+								Parameters: struct kvm_vcpu_init (in)
+								Returns: 0 on success; -1 on error
 								Errors:
 								  EINVAL:    the target is unknown, or the combination of features is invalid.
 								  ENOENT:    a specified feature bit is unknown.
 								This tells KVM what type of CPU to present to the guest, and what
 								optional features it should have.  This will cause a reset of the cpu
 								registers to their initial values.  If this is not called, KVM_RUN will
 								return ENOEXEC for that vcpu.
 								Note that because some registers reflect machine topology, all vcpus
 								should be created before this ioctl is invoked.
-												KVM: ARM: Power State Coordination Interface implementation

Implement the PSCI specification (ARM DEN 0022A) to control
virtual CPUs being "powered" on or off.

PSCI/KVM is detected using the KVM_CAP_ARM_PSCI capability.

A virtual CPU can now be initialized in a "powered off" state,
using the KVM_ARM_VCPU_POWER_OFF feature flag.

The guest can use either SMC or HVC to execute a PSCI function.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>

											
										
										
											2013-01-20 23:28:13 +00:00
+								Possible features:
 									- KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
 									  Depends on KVM_CAP_ARM_PSCI.
+									- KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
 									  Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
-												KVM: Documentation: Add info regarding KVM_ARM_VCPU_PSCI_0_2 feature

We have in-kernel emulation of PSCI v0.2 in KVM ARM/ARM64. To provide
PSCI v0.2 interface to VCPUs, we have to enable KVM_ARM_VCPU_PSCI_0_2
feature when doing KVM_ARM_VCPU_INIT ioctl.

The patch updates documentation of KVM_ARM_VCPU_INIT ioctl to provide
info regarding KVM_ARM_VCPU_PSCI_0_2 feature.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

											
										
										
											2014-04-29 05:54:17 +00:00
+									- KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU.
 									  Depends on KVM_CAP_ARM_PSCI_0_2.
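The features listed above are requested by setting bits in the features bitmap of struct kvm_vcpu_init.  The sketch below shows the bit manipulation only.  It uses a local stand-in struct so that it builds on any host; the layout (a 32-bit target followed by seven 32-bit feature words) and the helper names are assumptions for illustration, and real code should use the definitions from the ARM uapi headers.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for struct kvm_vcpu_init; the layout is an assumption. */
struct vcpu_init_sketch {
	uint32_t target;
	uint32_t features[7];
};

#define SKETCH_FEATURE_BITS (7 * 32)

/* Set feature bit 'bit' in the bitmap.  Returns 0, or -1 if out of range. */
static int vcpu_init_set_feature(struct vcpu_init_sketch *init, unsigned int bit)
{
	if (bit >= SKETCH_FEATURE_BITS)
		return -1;
	init->features[bit / 32] |= UINT32_C(1) << (bit % 32);
	return 0;
}

/* Returns 1 if feature bit 'bit' is set, 0 if clear or out of range. */
static int vcpu_init_has_feature(const struct vcpu_init_sketch *init,
				 unsigned int bit)
{
	if (bit >= SKETCH_FEATURE_BITS)
		return 0;
	return (init->features[bit / 32] >> (bit % 32)) & 1;
}
```

Before setting a bit, userspace is expected to check the capability the feature depends on, for example KVM_CAP_ARM_PSCI for KVM_ARM_VCPU_POWER_OFF.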
-												KVM: Add documentation for KVM_ARM_PREFERRED_TARGET ioctl

To implement CPU=Host we have added KVM_ARM_PREFERRED_TARGET
vm ioctl which provides information to user space required for
creating VCPU matching underlying Host.

This patch adds info related to this new KVM_ARM_PREFERRED_TARGET
vm ioctl in the KVM API documentation.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

											
										
										
											2013-09-30 08:50:08 +00:00
+4.83 KVM_ARM_PREFERRED_TARGET
 								Capability: basic
 								Architectures: arm, arm64
 								Type: vm ioctl
 								Parameters: struct kvm_vcpu_init (out)
 								Returns: 0 on success; -1 on error
 								Errors:
-												KVM: ARM: Remove non-ASCII space characters

Some strange character leaped into the documentation, which makes
git-send-email behave quite strangely.  Get rid of this before it bites
anyone else.

Cc: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

											
										
										
											2013-10-16 00:43:00 +00:00
+								  ENODEV:    no preferred target available for the host
 								This queries KVM for the preferred CPU target type that can be emulated
 								by KVM on the underlying host.
 								The ioctl returns a struct kvm_vcpu_init instance containing information
 								about the preferred CPU target type and the recommended features for it.
 								Feature bits are set in the returned kvm_vcpu_init->features bitmap if
 								the preferred target recommends them, but setting them is not
 								mandatory.
 								The information returned by this ioctl can be used to prepare an instance
 								of struct kvm_vcpu_init for the KVM_ARM_VCPU_INIT ioctl, which will result
 								in a VCPU matching the underlying host.
 4.84 KVM_GET_REG_LIST
 								Capability: basic
+								Architectures: arm, arm64
+								Type: vcpu ioctl
 								Parameters: struct kvm_reg_list (in/out)
 								Returns: 0 on success; -1 on error
 								Errors:
 								  E2BIG:     the reg index list is too big to fit in the array specified by
 								             the user (the number required will be written into n).
 								struct kvm_reg_list {
 									__u64 n; /* number of registers in reg[] */
 									__u64 reg[0];
 								};
 								This ioctl returns the guest registers that are supported for the
 								KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
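Because the required count is written back into n when the call fails with E2BIG, a common pattern is to call the ioctl twice: once with n = 0 to learn the size, then again with a buffer of that size.  The following is a minimal sketch of that pattern; the helper name is hypothetical and error handling is kept deliberately simple.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: returns a malloc()ed register list for the vcpu,
 * or NULL on error.  The caller frees the result with free().
 */
static struct kvm_reg_list *get_reg_list(int vcpu_fd)
{
	struct kvm_reg_list probe = { .n = 0 };
	struct kvm_reg_list *list;

	/* Size query: with n = 0 this is expected to fail with E2BIG
	 * and to store the required number of entries in probe.n. */
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe) < 0 && errno != E2BIG)
		return NULL;

	list = malloc(sizeof(*list) + probe.n * sizeof(list->reg[0]));
	if (!list)
		return NULL;

	/* Second call with room for probe.n register indices. */
	list->n = probe.n;
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
}
```

Each entry of the returned reg[] array can then be passed as the id for KVM_GET_ONE_REG or KVM_SET_ONE_REG.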
-												KVM: arm-vgic: Set base addr through device API

Support setting the distributor and cpu interface base addresses in the
VM physical address space through the KVM_{SET,GET}_DEVICE_ATTR API
in addition to the ARM specific API.

This has the added benefit of being able to share more code in user
space and do things in a uniform manner.

Also deprecate the older API at the same time, but backwards
compatibility will be maintained.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

											
										
										
											2013-09-23 21:55:56 +00:00
 4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
-												KVM: ARM: Introduce KVM_ARM_SET_DEVICE_ADDR ioctl

On ARM some bits are specific to the model being emulated for the guest and
user space needs a way to tell the kernel about those bits.  An example is mmio
device base addresses, where KVM must know the base address for a given device
to properly emulate mmio accesses within a certain address range or directly
map a device with virtualization extensions into the guest address space.

We make this API ARM-specific as we haven't yet reached a consensus for a
generic API for all KVM architectures that will allow us to do something like
this.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

											
										
										
											2013-01-23 18:18:04 +00:00
 								Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
+								Architectures: arm, arm64
+								Type: vm ioctl
 								Parameters: struct kvm_arm_device_addr (in)
 								Returns: 0 on success, -1 on error
 								Errors:
 								  ENODEV: The device id is unknown
 								  ENXIO:  Device not supported on current system
 								  EEXIST: Address already set
 								  E2BIG:  Address outside guest physical address space
-												ARM: KVM: VGIC accept vcpu and dist base addresses from user space

User space defines the model to emulate to a guest and should therefore
decide which addresses are used for both the virtual CPU interface
directly mapped in the guest physical address space and for the emulated
distributor interface, which is mapped in software by the in-kernel VGIC
support.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

											
										
										
											2013-01-22 00:36:13 +00:00
+								  EBUSY:  Address overlaps with other device range
 								struct kvm_arm_device_addr {
 									__u64 id;
 									__u64 addr;
 								};
 								Specify a device address in the guest's physical address space where guests
 								can access emulated or directly exposed devices, which the host kernel needs
 								to know about. The id field is an architecture specific identifier for a
 								specific device.
+								ARM/arm64 divides the id field into two parts, a device id and an
 								address type id specific to the individual device.
 								  bits:  | 63        ...       32 | 31    ...    16 | 15    ...    0 |
 								  field: |        0x00000000      |     device id   |  addr type id  |
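The bit layout above can be expressed as a pair of shifts and masks.  This is a small sketch with hypothetical helper and macro names; it encodes only what the table states (device id in bits 31..16, address type id in bits 15..0, upper 32 bits zero).

```c
#include <stdint.h>

#define SKETCH_DEVICE_ID_SHIFT 16       /* device id occupies bits 31..16 */
#define SKETCH_FIELD_MASK      0xffffu  /* each field is 16 bits wide */

/* Compose the 64-bit id field from a device id and an address type id. */
static uint64_t make_device_addr_id(uint16_t device_id, uint16_t addr_type)
{
	return ((uint64_t)device_id << SKETCH_DEVICE_ID_SHIFT) | addr_type;
}

/* Extract the device id (bits 31..16). */
static uint16_t device_addr_id_device(uint64_t id)
{
	return (uint16_t)((id >> SKETCH_DEVICE_ID_SHIFT) & SKETCH_FIELD_MASK);
}

/* Extract the address type id (bits 15..0). */
static uint16_t device_addr_id_type(uint64_t id)
{
	return (uint16_t)(id & SKETCH_FIELD_MASK);
}
```

The composed value would be stored in the id member of struct kvm_arm_device_addr, with the guest physical address in addr.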
+								ARM/arm64 currently only require this when using the in-kernel GIC
 								support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2
 								as the device id.  When setting the base addresses for the guest's
 								mapping of the VGIC virtual CPU and distributor interfaces, the ioctl
 								must be called after calling KVM_CREATE_IRQCHIP, but before calling
 								KVM_RUN on any of the VCPUs.  Calling this ioctl twice for any of the
 								base addresses will return -EEXIST.
+								Note: this ioctl is deprecated; the more flexible KVM_SET/GET_DEVICE_ATTR
 								API should be used instead.
+4.86 KVM_PPC_RTAS_DEFINE_TOKEN
-												KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls

For pseries machine emulation, in order to move the interrupt
controller code to the kernel, we need to intercept some RTAS
calls in the kernel itself.  This adds an infrastructure to allow
in-kernel handlers to be registered for RTAS services by name.
A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to
associate token values with those service names.  Then, when the
guest requests an RTAS service with one of those token values, it
will be handled by the relevant in-kernel handler rather than being
passed up to userspace as at present.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix warning]
Signed-off-by: Alexander Graf <agraf@suse.de>

											
										
										
											2013-04-17 20:30:00 +00:00
 								Capability: KVM_CAP_PPC_RTAS
 								Architectures: ppc
 								Type: vm ioctl
 								Parameters: struct kvm_rtas_token_args
 								Returns: 0 on success, -1 on error
 								Defines a token value for an RTAS (Run Time Abstraction Services)
 								service in order to allow it to be handled in the kernel.  The
 								argument struct gives the name of the service, which must be the name
 								of a service that has a kernel-side implementation.  If the token
 								value is non-zero, it will be associated with that service, and
 								subsequent RTAS calls by the guest specifying that token will be
 								handled by the kernel.  If the token value is 0, then any token
 								associated with the service will be forgotten, and subsequent RTAS
 								calls by the guest for that service will be passed to userspace to be
 								handled.
-												KVM: Document basic API

Document the basic API corresponding to the 2.6.22 release.

Signed-off-by: Avi Kivity <avi@redhat.com>

											
										
										
											2009-06-09 09:37:58 +00:00
+5. The kvm_run structure
-												KVM: Improve readability of KVM API doc

This helps to identify sections and it also fixes the numbering from
4.54 to 4.61.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2012-04-24 14:40:15 +00:00
+								------------------------
 								Application code obtains a pointer to the kvm_run structure by
 								mmap()ing a vcpu fd.  From that point, application code can control
 								execution by changing fields in kvm_run prior to calling the KVM_RUN
 								ioctl, and obtain information about the reason KVM_RUN returned by
 								looking up structure members.
 								struct kvm_run {
 									/* in */
 									__u8 request_interrupt_window;
 								Request that KVM_RUN return when it becomes possible to inject external
 								interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
 									__u8 padding1[7];
 									/* out */
 									__u32 exit_reason;
 								When KVM_RUN has returned successfully (return value 0), this informs
 								application code why KVM_RUN has returned.  Allowable values for this
 								field are detailed below.
 									__u8 ready_for_interrupt_injection;
 								If request_interrupt_window has been specified, this field indicates
 								an interrupt can be injected now with KVM_INTERRUPT.
 									__u8 if_flag;
 								The value of the current interrupt flag.  Only valid if in-kernel
 								local APIC is not used.
 									__u8 padding2[2];
 									/* in (pre_kvm_run), out (post_kvm_run) */
 									__u64 cr8;
 								The value of the cr8 register.  Only valid if in-kernel local APIC is
 								not used.  Both input and output.
 									__u64 apic_base;
 								The value of the APIC BASE msr.  Only valid if in-kernel local
 								APIC is not used.  Both input and output.
 									union {
 										/* KVM_EXIT_UNKNOWN */
 										struct {
 											__u64 hardware_exit_reason;
 										} hw;
 								If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
 								reasons.  Further architecture-specific information is available in
 								hardware_exit_reason.
 										/* KVM_EXIT_FAIL_ENTRY */
 										struct {
 											__u64 hardware_entry_failure_reason;
 										} fail_entry;
 								If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
 								to unknown reasons.  Further architecture-specific information is
 								available in hardware_entry_failure_reason.
 										/* KVM_EXIT_EXCEPTION */
 										struct {
 											__u32 exception;
 											__u32 error_code;
 										} ex;
 								Unused.
 										/* KVM_EXIT_IO */
 										struct {
 								#define KVM_EXIT_IO_IN  0
 								#define KVM_EXIT_IO_OUT 1
 											__u8 direction;
 											__u8 size; /* bytes */
 											__u16 port;
 											__u32 count;
 											__u64 data_offset; /* relative to kvm_run start */
 										} io;
-												KVM: trivial document fixes

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

											
										
										
											2009-12-24 01:04:16 +00:00
+								If exit_reason is KVM_EXIT_IO, then the vcpu has
+								executed a port I/O instruction which could not be satisfied by kvm.
 								data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
 								where kvm expects application code to place the data for the next
+								KVM_RUN invocation (KVM_EXIT_IO_IN).  Data format is a packed array.
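The data_offset handling described above can be sketched as follows. This is a minimal illustration, not the kernel's or any VMM's implementation: struct example_io, the EXAMPLE_* constants and the toy device behavior are hypothetical stand-ins that only mirror the documented layout, and run_base is assumed to point at the start of the mmap()ed kvm_run area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_EXIT_IO_IN  0	/* mirrors KVM_EXIT_IO_IN */
#define EXAMPLE_EXIT_IO_OUT 1	/* mirrors KVM_EXIT_IO_OUT */

/* Hypothetical stand-in for the 'io' member of the exit union. */
struct example_io {
	uint8_t  direction;
	uint8_t  size;		/* bytes per element */
	uint16_t port;
	uint32_t count;		/* number of elements */
	uint64_t data_offset;	/* relative to the start of the run area */
};

/*
 * Hypothetical device model: OUT adds every written byte to *sum,
 * IN answers 0xAB for every byte requested by the guest.
 */
static void example_handle_io(void *run_base, const struct example_io *io,
			      uint32_t *sum)
{
	uint8_t *data = (uint8_t *)run_base + io->data_offset;
	size_t total = (size_t)io->size * io->count;	/* packed array */
	size_t i;

	assert(io->direction == EXAMPLE_EXIT_IO_IN ||
	       io->direction == EXAMPLE_EXIT_IO_OUT);

	if (io->direction == EXAMPLE_EXIT_IO_OUT) {
		for (i = 0; i < total; i++)
			*sum += data[i];
	} else {
		/* The data must be in place before the next KVM_RUN. */
		memset(data, 0xAB, total);
	}
}
```

Because the data is a packed array, the total transfer length is simply size * count bytes starting at data_offset.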
 										/* KVM_EXIT_DEBUG */
 										struct {
 											struct kvm_debug_exit_arch arch;
 										} debug;
 								Unused.
 										/* KVM_EXIT_MMIO */
 										struct {
 											__u64 phys_addr;
 											__u8  data[8];
 											__u32 len;
 											__u8  is_write;
 										} mmio;
+								If exit_reason is KVM_EXIT_MMIO, then the vcpu has
+								executed a memory-mapped I/O instruction which could not be satisfied
 								by kvm.  The 'data' member contains the written data if 'is_write' is
 								true, and should be filled by application code otherwise.
+								The 'data' member contains, in its first 'len' bytes, the value as it would
 								appear if the VCPU performed a load or store of the appropriate width directly
 								to the byte array.
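A minimal sketch of how application code might service such an exit, assuming a hypothetical device whose registers live in a plain byte array. struct example_mmio only mirrors the documented 'mmio' layout; the function name, the base address and the register file are illustrative assumptions. Copying byte by byte keeps the documented byte order without any endianness conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 'mmio' member of the exit union. */
struct example_mmio {
	uint64_t phys_addr;
	uint8_t  data[8];
	uint32_t len;
	uint8_t  is_write;
};

/*
 * Service one MMIO exit against a register file of regs_len bytes that is
 * assumed to be mapped at guest physical address 'base'.
 * Returns 0 on success, -1 if the access falls outside the device.
 */
static int example_handle_mmio(struct example_mmio *m, uint64_t base,
			       uint8_t *regs, size_t regs_len)
{
	uint64_t off;

	if (m->len == 0 || m->len > sizeof(m->data) || m->phys_addr < base)
		return -1;
	off = m->phys_addr - base;
	if (off > regs_len || m->len > regs_len - off)
		return -1;

	if (m->is_write)
		memcpy(regs + off, m->data, m->len);	/* guest store */
	else
		memcpy(m->data, regs + off, m->len);	/* guest load */
	return 0;
}
```

Only the first 'len' bytes of 'data' are touched; the remaining bytes are left as they were.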
+								NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
 								      KVM_EXIT_PAPR_HCALL and KVM_EXIT_EPR the corresponding
+								operations are complete (and guest state is consistent) only after userspace
 								has re-entered the kernel with KVM_RUN.  The kernel side will first finish
+								incomplete operations and then check for pending signals.  Userspace
 								can re-enter the guest with an unmasked signal pending to complete
 								pending operations.
+										/* KVM_EXIT_HYPERCALL */
 										struct {
 											__u64 nr;
 											__u64 args[6];
 											__u64 ret;
 											__u32 longmode;
 											__u32 pad;
 										} hypercall;
+								Unused.  This was once used for 'hypercall to userspace'.  To implement
 								such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
 								Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
 										/* KVM_EXIT_TPR_ACCESS */
 										struct {
 											__u64 rip;
 											__u32 is_write;
 											__u32 pad;
 										} tpr_access;
 								To be documented (KVM_TPR_ACCESS_REPORTING).
 										/* KVM_EXIT_S390_SIEIC */
 										struct {
 											__u8 icptcode;
 											__u64 mask; /* psw upper half */
 											__u64 addr; /* psw lower half */
 											__u16 ipa;
 											__u32 ipb;
 										} s390_sieic;
 								s390 specific.
 										/* KVM_EXIT_S390_RESET */
 								#define KVM_S390_RESET_POR       1
 								#define KVM_S390_RESET_CLEAR     2
 								#define KVM_S390_RESET_SUBSYSTEM 4
 								#define KVM_S390_RESET_CPU_INIT  8
 								#define KVM_S390_RESET_IPL       16
 										__u64 s390_reset_flags;
 								s390 specific.
+										/* KVM_EXIT_S390_UCONTROL */
 										struct {
 											__u64 trans_exc_code;
 											__u32 pgm_code;
 										} s390_ucontrol;
 								s390 specific. A page fault has occurred for a user controlled virtual
 								machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
 								resolved by the kernel.
 								The program code and the translation exception code that were placed
 								in the cpu's lowcore are presented here as defined by the z/Architecture
 								Principles of Operation, in the chapter on Dynamic Address Translation
 								(DAT).
+										/* KVM_EXIT_DCR */
 										struct {
 											__u32 dcrn;
 											__u32 data;
 											__u8  is_write;
 										} dcr;
 								powerpc specific.
+										/* KVM_EXIT_OSI */
 										struct {
 											__u64 gprs[32];
 										} osi;
 								MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
 								hypercalls and exit with this exit struct that contains all the guest gprs.
 								If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
 								Userspace can now handle the hypercall and when it's done modify the gprs as
 								necessary. Upon guest entry all guest GPRs will then be replaced by the values
 								in this struct.
+										/* KVM_EXIT_PAPR_HCALL */
 										struct {
 											__u64 nr;
 											__u64 ret;
 											__u64 args[9];
 										} papr_hcall;
 								This is used on 64-bit PowerPC when emulating a pSeries partition,
 								e.g. with the 'pseries' machine type in qemu.  It occurs when the
 								guest does a hypercall using the 'sc 1' instruction.  The 'nr' field
 								contains the hypercall number (from the guest R3), and 'args' contains
 								the arguments (from the guest R4 - R12).  Userspace should put the
 								return code in 'ret' and any extra returned values in args[].
 								The possible hypercalls are defined in the Power Architecture Platform
 								Requirements (PAPR) document available from www.power.org (free
 								developer registration required to access it).
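The calling convention above can be sketched with a small dispatcher. Everything named example_* or EXAMPLE_* is hypothetical: the struct only mirrors the documented 'papr_hcall' layout, the hypercall number and the status codes are made up for illustration, and real numbers and return codes have to be taken from the PAPR document.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 'papr_hcall' member of the exit union. */
struct example_papr_hcall {
	uint64_t nr;		/* hypercall number, from guest R3 */
	uint64_t ret;		/* return code, filled in by userspace */
	uint64_t args[9];	/* arguments, from guest R4 - R12 */
};

/* Made-up values for illustration only; not taken from PAPR. */
#define EXAMPLE_HCALL_ADD	0xf000
#define EXAMPLE_STATUS_OK	0
#define EXAMPLE_STATUS_UNKNOWN	(-2)

static void example_handle_papr_hcall(struct example_papr_hcall *h)
{
	switch (h->nr) {
	case EXAMPLE_HCALL_ADD:
		/* Extra return values travel back through args[]. */
		h->args[0] = h->args[0] + h->args[1];
		h->ret = EXAMPLE_STATUS_OK;
		break;
	default:
		h->ret = (uint64_t)EXAMPLE_STATUS_UNKNOWN;
		break;
	}
}
```

As with the other exits listed in the NOTE above, the hypercall only completes once userspace re-enters the kernel with KVM_RUN.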
+										/* KVM_EXIT_S390_TSCH */
 										struct {
 											__u16 subchannel_id;
 											__u16 subchannel_nr;
 											__u32 io_int_parm;
 											__u32 io_int_word;
 											__u32 ipb;
 											__u8 dequeued;
 										} s390_tsch;
 								s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
 								and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
 								interrupt for the target subchannel has been dequeued and subchannel_id,
 								subchannel_nr, io_int_parm and io_int_word contain the parameters for that
 								interrupt. ipb is needed for instruction parameter decoding.
+										/* KVM_EXIT_EPR */
 										struct {
 											__u32 epr;
 										} epr;
 								On FSL BookE PowerPC chips, the interrupt controller has a fast
 								interrupt acknowledge path to the core. When the core successfully
 								delivers an interrupt, it automatically populates the EPR register with
 								the interrupt vector number and acknowledges the interrupt inside
 								the interrupt controller.
 								If the interrupt controller lives in user space, the interrupt
 								acknowledge cycle has to go through user space as well; this exit is
 								used to fetch the next interrupt vector to be delivered.
 								It is triggered whenever KVM_CAP_PPC_EPR is enabled and an
 								external interrupt has just been delivered into the guest. User space
 								should put the acknowledged interrupt vector into the 'epr' field.
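A minimal sketch of the acknowledge step, assuming a hypothetical userspace interrupt controller that tracks pending vectors in a 64-bit bitmap and treats the lowest-numbered pending vector as the one to deliver. Neither the bitmap nor that priority rule comes from this document; struct example_epr only mirrors the documented 'epr' member.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 'epr' member of the exit union. */
struct example_epr {
	uint32_t epr;
};

/*
 * Acknowledge the next pending vector: clear it in the (hypothetical)
 * controller state and report it through the 'epr' field.
 * Returns 0 on success, -1 if nothing is pending.
 */
static int example_ack_interrupt(uint64_t *pending, struct example_epr *exit)
{
	uint32_t v;

	for (v = 0; v < 64; v++) {
		if (*pending & (1ULL << v)) {
			*pending &= ~(1ULL << v);
			exit->epr = v;
			return 0;
		}
	}
	return -1;
}
```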
+										/* Fix the size of the union. */
 										char padding[256];
 									};
 									/*
 									 * shared registers between kvm and userspace.
 									 * kvm_valid_regs specifies the register classes set by the host
 									 * kvm_dirty_regs specifies the register classes dirtied by userspace
 									 * struct kvm_sync_regs is architecture specific, as well as the
 									 * bits for kvm_valid_regs and kvm_dirty_regs
 									 */
 									__u64 kvm_valid_regs;
 									__u64 kvm_dirty_regs;
 									union {
 										struct kvm_sync_regs regs;
 										char padding[1024];
 									} s;
 								If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
 								certain guest registers without having to call SET/GET_*REGS. Thus we can
 								avoid some system call overhead if userspace has to handle the exit.
 								Userspace can query the validity of the structure by checking
 								kvm_valid_regs for specific bits. These bits are architecture specific
 								and usually define the validity of a group of registers (e.g. one bit
 								for general purpose registers).
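The valid/dirty protocol can be sketched as below. The bit EXAMPLE_SYNC_GPRS and the layout of struct example_sync_area are hypothetical; the real bits and the contents of struct kvm_sync_regs are architecture specific and are not defined in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit standing for "general purpose registers". */
#define EXAMPLE_SYNC_GPRS (1ULL << 0)

/* Hypothetical stand-in for the shared register area in kvm_run. */
struct example_sync_area {
	uint64_t kvm_valid_regs;	/* classes set by the host */
	uint64_t kvm_dirty_regs;	/* classes dirtied by userspace */
	uint64_t gprs[16];
};

/* Read a register only if kvm marked its class as valid. */
static int example_read_gpr(const struct example_sync_area *s,
			    unsigned int idx, uint64_t *out)
{
	if (idx >= 16 || !(s->kvm_valid_regs & EXAMPLE_SYNC_GPRS))
		return -1;
	*out = s->gprs[idx];
	return 0;
}

/* Write a register and flag its class as dirtied by userspace. */
static int example_write_gpr(struct example_sync_area *s,
			     unsigned int idx, uint64_t val)
{
	if (idx >= 16)
		return -1;
	s->gprs[idx] = val;
	s->kvm_dirty_regs |= EXAMPLE_SYNC_GPRS;
	return 0;
}
```

The point of the design is that both helpers work on shared memory only, so no GET/SET register ioctl is needed while handling the exit.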
+								};
+.81 KVM_GET_EMULATED_CPUID
 								Capability: KVM_CAP_EXT_EMUL_CPUID
 								Architectures: x86
 								Type: system ioctl
 								Parameters: struct kvm_cpuid2 (in/out)
 								Returns: 0 on success, -1 on error
 								struct kvm_cpuid2 {
 									__u32 nent;
 									__u32 flags;
 									struct kvm_cpuid_entry2 entries[0];
 								};
 								The member 'flags' is used for passing flags from userspace.
 								#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
 								#define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
 								#define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)
 								struct kvm_cpuid_entry2 {
 									__u32 function;
 									__u32 index;
 									__u32 flags;
 									__u32 eax;
 									__u32 ebx;
 									__u32 ecx;
 									__u32 edx;
 									__u32 padding[3];
 								};
 								This ioctl returns x86 cpuid features which are emulated by
 								kvm. Userspace can use the information returned by this ioctl to query
 								which features are emulated by kvm instead of being present natively.
 								Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
 								structure with the 'nent' field indicating the number of entries in
 								the variable-size array 'entries'. If the number of entries is too low
 								to describe the cpu capabilities, an error (E2BIG) is returned. If the
 								number is too high, the 'nent' field is adjusted and an error (ENOMEM)
 								is returned. If the number is just right, the 'nent' field is adjusted
 								to the number of valid entries in the 'entries' array, which is then
 								filled.
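One way to deal with the variable-size array is to allocate the header plus 'nent' entries in one block and grow it while E2BIG is reported. The sketch below assumes this retry strategy; it is not prescribed by the API. The example_* structs only mirror the documented layouts, and example_fake_query is a made-up stand-in for the real ioctl call so that the logic can be followed without /dev/kvm.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirrors of struct kvm_cpuid_entry2 and struct kvm_cpuid2. */
struct example_cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

struct example_cpuid2 {
	uint32_t nent;
	uint32_t flags;
	struct example_cpuid_entry2 entries[];
};

/* Stand-in for the ioctl: returns 0, or -1 with errno set. */
typedef int (*example_query_fn)(struct example_cpuid2 *);

/* Fake query that pretends kvm needs 20 entries. */
static int example_fake_query(struct example_cpuid2 *c)
{
	if (c->nent < 20) {
		errno = E2BIG;
		return -1;
	}
	c->nent = 20;
	return 0;
}

/* Allocate zeroed storage and double it while the query reports E2BIG. */
static struct example_cpuid2 *example_get_cpuid(example_query_fn query)
{
	uint32_t nent = 8;

	for (;;) {
		struct example_cpuid2 *c;
		int err;

		c = calloc(1, sizeof(*c) + nent * sizeof(c->entries[0]));
		if (!c)
			return NULL;
		c->nent = nent;
		if (query(c) == 0)
			return c;	/* nent now holds the valid count */
		err = errno;
		free(c);
		if (err != E2BIG || nent > 4096)
			return NULL;
		nent *= 2;
	}
}
```

Using calloc() keeps the padding fields zeroed, and the caller owns the returned block and releases it with free().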
 								The entries returned are the set CPUID bits of the respective features
 								which kvm emulates, as returned by the CPUID instruction, with unknown
 								or unsupported feature bits cleared.
 								Features like x2apic, for example, may not be present in the host cpu
 								but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
 								emulated efficiently and are thus not included here.
 								The fields in each entry are defined as follows:
 								  function: the eax value used to obtain the entry
 								  index: the ecx value used to obtain the entry (for entries that are
 								         affected by ecx)
 								  flags: an OR of zero or more of the following:
 								        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
 								           if the index field is valid
 								        KVM_CPUID_FLAG_STATEFUL_FUNC:
 								           if cpuid for this function returns different values for successive
 								           invocations; there will be several entries with the same function,
 								           all with this flag set
 								        KVM_CPUID_FLAG_STATE_READ_NEXT:
 								           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
 								           the first entry to be read by a cpu
 								   eax, ebx, ecx, edx: the values returned by the cpuid instruction for
 								         this function/index combination
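Given the field definitions above, a lookup over the returned entries could look like the following sketch. The struct and the EXAMPLE_ flag are hypothetical mirrors of kvm_cpuid_entry2 and KVM_CPUID_FLAG_SIGNIFCANT_INDEX; stateful entries (KVM_CPUID_FLAG_STATEFUL_FUNC) are deliberately not handled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors KVM_CPUID_FLAG_SIGNIFCANT_INDEX (bit 0). */
#define EXAMPLE_FLAG_SIGNIFCANT_INDEX (1u << 0)

/* Hypothetical mirror of struct kvm_cpuid_entry2. */
struct example_cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/*
 * Return the entry for (function, index), or NULL if there is none.
 * 'index' is only compared when the entry says its index field is valid.
 */
static const struct example_cpuid_entry2 *
example_find_entry(const struct example_cpuid_entry2 *e, uint32_t nent,
		   uint32_t function, uint32_t index)
{
	uint32_t i;

	for (i = 0; i < nent; i++) {
		if (e[i].function != function)
			continue;
		if ((e[i].flags & EXAMPLE_FLAG_SIGNIFCANT_INDEX) &&
		    e[i].index != index)
			continue;
		return &e[i];
	}
	return NULL;
}
```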
+. Capabilities that can be enabled
+								-----------------------------------
 								There are certain capabilities that change the behavior of the virtual CPU when
 								enabled. To enable them, please see section 4.37. Below you can find a list of
 								capabilities and what their effect on the vCPU is when enabling them.
 								The following information is provided along with the description:
 								  Architectures: which instruction set architectures provide this ioctl.
 								      x86 includes both i386 and x86_64.
 								  Parameters: what parameters are accepted by the capability.
 								  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
 								      are not detailed, but errors with specific meanings are.
+.1 KVM_CAP_PPC_OSI
 								Architectures: ppc
 								Parameters: none
 								Returns: 0 on success; -1 on error
 								This capability enables interception of OSI hypercalls that otherwise would
 								be treated as normal system calls to be injected into the guest. OSI hypercalls
 								were invented by Mac-on-Linux to have a standardized communication mechanism
 								between the guest and the host.
 								When this capability is enabled, KVM_EXIT_OSI can occur.
+.2 KVM_CAP_PPC_PAPR
 								Architectures: ppc
 								Parameters: none
 								Returns: 0 on success; -1 on error
 								This capability enables interception of PAPR hypercalls. PAPR hypercalls are
 								done using the hypercall instruction "sc 1".
 								It also sets the guest privilege level to "supervisor" mode. Usually the guest
 								runs in "hypervisor" privilege mode with a few missing features.
 								In addition to the above, it changes the semantics of SDR1. In this mode, the
 								HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
 								HTAB invisible to the guest.
 								When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.
+.3 KVM_CAP_SW_TLB
 								Architectures: ppc
 								Parameters: args[0] is the address of a struct kvm_config_tlb
 								Returns: 0 on success; -1 on error
 								struct kvm_config_tlb {
 									__u64 params;
 									__u64 array;
 									__u32 mmu_type;
 									__u32 array_len;
 								};
 								Configures the virtual CPU's TLB array, establishing a shared memory area
 								between userspace and KVM.  The "params" and "array" fields are userspace
 								addresses of mmu-type-specific data structures.  The "array_len" field is a
 								safety mechanism, and should be set to the size in bytes of the memory that
 								userspace has reserved for the array.  It must be at least the size dictated
 								by "mmu_type" and "params".
 								While KVM_RUN is active, the shared region is under control of KVM.  Its
 								contents are undefined, and any modification by userspace results in
 								boundedly undefined behavior.
 								On return from KVM_RUN, the shared region will reflect the current state of
 								the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
 								to tell KVM which entries have been changed, prior to calling KVM_RUN again
 								on this vcpu.
 								For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
 								 - The "params" field is of type "struct kvm_book3e_206_tlb_params".
 								 - The "array" field points to an array of type "struct
 								   kvm_book3e_206_tlb_entry".
 								 - The array consists of all entries in the first TLB, followed by all
 								   entries in the second TLB.
 								 - Within a TLB, entries are ordered first by increasing set number.  Within a
 								   set, entries are ordered by way (increasing ESEL).
 								 - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
 								   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
 								 - The tsize field of mas1 shall be set to 4K on TLB0, even though the
 								   hardware ignores this value for TLB0.
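The ordering rules above translate into a simple index computation for the shared array. The sketch below assumes that num_sets is a power of two (which the mask in the hash requires) and uses a hypothetical struct example_tlb_params that carries only the tlb_sizes[] and tlb_ways[] values mentioned above; it is not the real parameter structure.

```c
#include <stdint.h>

/* Hypothetical subset of the TLB parameters referenced in the text. */
struct example_tlb_params {
	uint32_t tlb_sizes[4];	/* number of entries per TLB */
	uint32_t tlb_ways[4];	/* associativity per TLB */
};

/*
 * Compute the array index of a TLB0 entry from its MAS2 value and way.
 * Entries are ordered by set first, then by way within the set.
 * Returns 0 on success, -1 on inconsistent parameters.
 */
static int example_tlb0_index(const struct example_tlb_params *p,
			      uint64_t mas2, uint32_t way, uint32_t *out)
{
	uint32_t ways = p->tlb_ways[0];
	uint32_t num_sets, set;

	if (ways == 0 || way >= ways)
		return -1;
	num_sets = p->tlb_sizes[0] / ways;
	if (num_sets == 0 || (num_sets & (num_sets - 1)))
		return -1;	/* the mask below needs a power of two */

	set = (uint32_t)(mas2 >> 12) & (num_sets - 1);
	*out = set * ways + way;
	return 0;
}

/* All TLB0 entries come first, so the second TLB starts right after them. */
static uint32_t example_tlb1_base(const struct example_tlb_params *p)
{
	return p->tlb_sizes[0];
}
```

For example, with 512 entries and 4 ways in TLB0 there are 128 sets, so a MAS2 value of 0x12345000 selects set 0x45 (69), and way 2 of that set sits at index 69 * 4 + 2 = 278.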
 .4 KVM_CAP_S390_CSS_SUPPORT
 								Architectures: s390
 								Parameters: none
 								Returns: 0 on success; -1 on error
 								This capability enables support for handling of channel I/O instructions.
 								TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
 								handled in-kernel, while the other I/O instructions are passed to userspace.
 								When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
 								SUBCHANNEL intercepts.
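As an illustration, user space might enable this capability as sketched below. This assumes the capability is enabled with the KVM_ENABLE_CAP ioctl on a vcpu fd, like the other capabilities in this section; the helper names css_cap_init() and enable_css_support() are hypothetical and not part of the KVM API.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: the capability takes no parameters, so only the
 * cap field is set and everything else stays zero. */
static void css_cap_init(struct kvm_enable_cap *cap)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_S390_CSS_SUPPORT;
}

/* Hypothetical helper: returns 0 on success, -1 on error (errno is set). */
static int enable_css_support(int vcpu_fd)
{
	struct kvm_enable_cap cap;

	css_cap_init(&cap);
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}
```

After a successful call, the run loop should be prepared to see KVM_EXIT_S390_TSCH exits.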
 .5 KVM_CAP_PPC_EPR
 								Architectures: ppc
 								Parameters: args[0] defines whether the proxy facility is active
 								Returns: 0 on success; -1 on error
 								This capability enables or disables the delivery of interrupts through the
 								external proxy facility.
 								When enabled (args[0] != 0), each time an external interrupt is delivered to
 								the guest, the vcpu exits to user space with a KVM_EXIT_EPR exit so that user
 								space can supply the topmost interrupt vector.
 								When disabled (args[0] == 0), behavior is as if this facility is unsupported.
 								When this capability is enabled, KVM_EXIT_EPR can occur.
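A minimal sketch of how user space might toggle the facility and answer the resulting exit follows. The helper names are hypothetical. It assumes the capability is enabled with KVM_ENABLE_CAP on the vcpu fd, and that the vector is returned to KVM through the epr.epr field of struct kvm_run before the next KVM_RUN; how the vector is obtained from the user space interrupt controller is outside the scope of the sketch.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: args[0] != 0 turns EPR exiting on, 0 turns it off. */
static void epr_cap_init(struct kvm_enable_cap *cap, int active)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_PPC_EPR;
	cap->args[0] = active ? 1 : 0;
}

/* Hypothetical helper: returns 0 on success, -1 on error (errno is set). */
static int set_epr_exiting(int vcpu_fd, int active)
{
	struct kvm_enable_cap cap;

	epr_cap_init(&cap, active);
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}

/* Hypothetical helper: if the last exit was KVM_EXIT_EPR, store the
 * acknowledged vector for KVM and return 1; otherwise return 0. */
static int handle_epr_exit(struct kvm_run *run, uint32_t vector)
{
	if (run->exit_reason != KVM_EXIT_EPR)
		return 0;
	run->epr.epr = vector;
	return 1;
}
```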
 .6 KVM_CAP_IRQ_MPIC
 								Architectures: ppc
 								Parameters: args[0] is the MPIC device fd
 								            args[1] is the MPIC CPU number for this vcpu
 								This capability connects the vcpu to an in-kernel MPIC device.
 .7 KVM_CAP_IRQ_XICS
 								Architectures: ppc
 								Parameters: args[0] is the XICS device fd
 								            args[1] is the XICS CPU number (server ID) for this vcpu
 								This capability connects the vcpu to an in-kernel XICS device.
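KVM_CAP_IRQ_MPIC and KVM_CAP_IRQ_XICS take the same argument layout (device fd in args[0], CPU number in args[1]), so one helper can serve both. The sketch below is illustrative only: the helper names are hypothetical, the device fd is assumed to have been obtained beforehand, and the capability is assumed to be enabled with KVM_ENABLE_CAP on the vcpu fd.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: which_cap is KVM_CAP_IRQ_MPIC or KVM_CAP_IRQ_XICS. */
static void irqchip_cap_init(struct kvm_enable_cap *cap, uint32_t which_cap,
			     int dev_fd, uint32_t cpu)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = which_cap;
	cap->args[0] = (uint64_t)dev_fd;	/* in-kernel irqchip device fd */
	cap->args[1] = cpu;			/* MPIC CPU number or XICS server ID */
}

/* Hypothetical helper: returns 0 on success, -1 on error (errno is set). */
static int connect_vcpu_to_irqchip(int vcpu_fd, uint32_t which_cap,
				   int dev_fd, uint32_t cpu)
{
	struct kvm_enable_cap cap;

	irqchip_cap_init(&cap, which_cap, dev_fd, cpu);
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}
```

For example, connect_vcpu_to_irqchip(vcpu_fd, KVM_CAP_IRQ_XICS, xics_fd, 3) would connect the vcpu as XICS server 3, where vcpu_fd and xics_fd are placeholders for descriptors the caller already holds.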