diff --git a/Documentation/powerpc/elfnote.rst b/Documentation/powerpc/elfnote.rst
new file mode 100644
index 000000000000..2a5c4beeb809
--- /dev/null
+++ b/Documentation/powerpc/elfnote.rst
@@ -0,0 +1,42 @@
+==========================
+ELF Note PowerPC Namespace
+==========================
+
+The PowerPC namespace in an ELF Note of the kernel binary is used to store
+capabilities and information which can be used by a bootloader or userland.
+
+Types and Descriptors
+---------------------
+
+The types to be used with the "PowerPC" namespace are defined in
+include/uapi/asm/elfnote.h.
+
+	1) PPC_ELFNOTE_CAPABILITIES
+
+Defines the capabilities supported/required by the kernel. This type uses a
+bitmap as its "descriptor" field. Each bit is described below:
+
+- Ultravisor-capable bit (PowerNV only).
+
+	#define PPCCAP_ULTRAVISOR_BIT (1 << 0)
+
+	Indicates that the powerpc kernel binary knows how to run in an
+	ultravisor-enabled system.
+
+	In an ultravisor-enabled system, some machine resources are now controlled
+	by the ultravisor. If the kernel is not ultravisor-capable, but it ends up
+	being run on a machine with ultravisor, the kernel will probably crash
+	trying to access ultravisor resources. For instance, it may crash in early
+	boot trying to set the partition table entry 0.
+
+	In an ultravisor-enabled system, a bootloader could warn the user or prevent
+	the kernel from being run if the PowerPC ultravisor capability doesn't exist
+	or the Ultravisor-capable bit is not set.
+
+References
+----------
+
+arch/powerpc/include/asm/elfnote.h
+arch/powerpc/kernel/note.S
diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
new file mode 100644
index 000000000000..94a149f34ec3
--- /dev/null
+++ b/Documentation/powerpc/ultravisor.rst
@@ -0,0 +1,1057 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. _ultravisor:
+
+============================
+Protected Execution Facility
+============================
+
+.. contents::
+    :depth: 3
+
+.. sectnum::
+    :depth: 3
+
+Protected Execution Facility
+############################
+
+    Protected Execution Facility (PEF) is an architectural change for
+    POWER 9 that enables Secure Virtual Machines (SVMs). DD2.3 chips
+    (PVR=0x004e1203) or greater will be PEF-capable. A new ISA release
+    will include the PEF RFC02487 changes.
+
+    When enabled, PEF adds a new higher privileged mode, called Ultravisor
+    mode, to POWER architecture. Along with the new mode there is new
+    firmware called the Protected Execution Ultravisor (or Ultravisor
+    for short). Ultravisor mode is the highest privileged mode in POWER
+    architecture.
+
+    +------------------+
+    | Privilege States |
+    +==================+
+    | Problem          |
+    +------------------+
+    | Supervisor       |
+    +------------------+
+    | Hypervisor       |
+    +------------------+
+    | Ultravisor       |
+    +------------------+
+
+    PEF protects SVMs from the hypervisor, privileged users, and other
+    VMs in the system. SVMs are protected while at rest and can only be
+    executed by an authorized machine. All virtual machines utilize
+    hypervisor services. The Ultravisor filters calls between the SVMs
+    and the hypervisor to assure that information does not accidentally
+    leak. All hypercalls except H_RANDOM are reflected to the hypervisor.
+    H_RANDOM is not reflected to prevent the hypervisor from influencing
+    random values in the SVM.
+
+    To support this, there is a refactoring of the ownership of resources
+    in the CPU. Some of the resources which were previously hypervisor
+    privileged are now ultravisor privileged.
+
+Hardware
+========
+
+    The hardware changes include the following:
+
+    * There is a new bit in the MSR that determines whether the current
+      process is running in secure mode, MSR(S) bit 41 (see the sketch at
+      the end of this section). MSR(S)=1 means the process is in secure
+      mode; MSR(S)=0 means the process is in normal mode.
+
+    * The MSR(S) bit can only be set by the Ultravisor.
+
+    * HRFID cannot be used to set the MSR(S) bit. If the hypervisor needs
+      to return to an SVM it must use an ultracall. It can determine if
+      the VM it is returning to is secure.
+
+    * There is a new Ultravisor privileged register, SMFCTRL, which has an
+      enable/disable bit SMFCTRL(E).
+
+    * The privilege of a process is now determined by three MSR bits,
+      MSR(S, HV, PR). In each of the tables below, the modes are listed
+      from least privileged to most privileged. The higher privilege
+      modes can access all the resources of the lower privilege modes.
+
+    **Secure Mode MSR Settings**
+
+    +---+---+---+---------------+
+    | S | HV| PR|Privilege      |
+    +===+===+===+===============+
+    | 1 | 0 | 1 | Problem       |
+    +---+---+---+---------------+
+    | 1 | 0 | 0 | Privileged(OS)|
+    +---+---+---+---------------+
+    | 1 | 1 | 0 | Ultravisor    |
+    +---+---+---+---------------+
+    | 1 | 1 | 1 | Reserved      |
+    +---+---+---+---------------+
+
+    **Normal Mode MSR Settings**
+
+    +---+---+---+---------------+
+    | S | HV| PR|Privilege      |
+    +===+===+===+===============+
+    | 0 | 0 | 1 | Problem       |
+    +---+---+---+---------------+
+    | 0 | 0 | 0 | Privileged(OS)|
+    +---+---+---+---------------+
+    | 0 | 1 | 0 | Hypervisor    |
+    +---+---+---+---------------+
+    | 0 | 1 | 1 | Problem (Host)|
+    +---+---+---+---------------+
+
+    * Memory is partitioned into secure and normal memory. Only processes
+      that are running in secure mode can access secure memory.
+
+    * The hardware does not allow anything that is not running secure to
+      access secure memory. This means that the Hypervisor cannot access
+      the memory of the SVM without using an ultracall (asking the
+      Ultravisor). The Ultravisor will only allow the hypervisor to see
+      the SVM memory encrypted.
+
+    * I/O systems are not allowed to directly address secure memory. This
+      limits the SVMs to virtual I/O only.
+
+    * The architecture allows the SVM to share pages of memory with the
+      hypervisor that are not protected with encryption. However, this
+      sharing must be initiated by the SVM.
+
+    * When a process is running in secure mode, all hypercalls
+      (syscall lev=1) go to the Ultravisor.
+
+    * When a process is in secure mode, all interrupts go to the
+      Ultravisor.
+
+    * The following resources have become Ultravisor privileged and
+      require an Ultravisor interface to manipulate:
+
+      * Processor configuration registers (SCOMs).
+
+      * Stop state information.
+
+      * The debug registers CIABR, DAWR, and DAWRX when SMFCTRL(D) is set.
+        If SMFCTRL(D) is not set they do not work in secure mode. When set,
+        reading and writing requires an Ultravisor call, otherwise that
+        will cause a Hypervisor Emulation Assistance interrupt.
+
+      * PTCR and partition table entries (partition table is in secure
+        memory). An attempt to write to PTCR will cause a Hypervisor
+        Emulation Assistance interrupt.
+
+      * LDBAR (LD Base Address Register) and IMC (In-Memory Collection)
+        non-architected registers. An attempt to write to them will cause a
+        Hypervisor Emulation Assistance interrupt.
+
+      * Paging for an SVM, sharing of memory with Hypervisor for an SVM.
+        (Including Virtual Processor Area (VPA) and virtual I/O).
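+    As a quick illustration of the MSR(S) encoding described in the list
+    above, the following sketch (not part of this patch) shows how code
+    could test the bit; IBM (MSB-0) bit 41 of the 64-bit MSR corresponds
+    to bit 63 - 41 = 22 in LSB-0 numbering:
+
+.. code-block:: c
+
+	/* Illustrative only; the mask is derived from "MSR(S) bit 41". */
+	#define MSR_S_LG	22
+	#define MSR_S		(1UL << MSR_S_LG)
+
+	static inline int msr_is_secure(unsigned long msr)
+	{
+		return (msr & MSR_S) != 0;
+	}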
+Software/Microcode
+==================
+
+    The software changes include:
+
+    * SVMs are created from a normal VM using (open source) tooling
+      supplied by IBM.
+
+    * All SVMs start as normal VMs and utilize an ultracall, UV_ESM
+      (Enter Secure Mode), to make the transition.
+
+    * When the UV_ESM ultracall is made, the Ultravisor copies the VM into
+      secure memory, decrypts the verification information, and checks the
+      integrity of the SVM. If the integrity check passes, the Ultravisor
+      passes control to the VM in secure mode.
+
+    * The verification information includes the pass phrase for the
+      encrypted disk associated with the SVM. This pass phrase is given
+      to the SVM when requested.
+
+    * The Ultravisor is not involved in protecting the encrypted disk of
+      the SVM while at rest.
+
+    * For external interrupts, the Ultravisor saves the state of the SVM
+      and reflects the interrupt to the hypervisor for processing.
+      For hypercalls, the Ultravisor inserts neutral state into all
+      registers not needed for the hypercall, then reflects the call to
+      the hypervisor for processing. The H_RANDOM hypercall is performed
+      by the Ultravisor and not reflected.
+
+    * For virtual I/O to work, bounce buffering must be done.
+
+    * The Ultravisor uses AES (IAPM) for protection of SVM memory. IAPM
+      is a mode of AES that provides integrity and secrecy concurrently.
+
+    * The movement of data between normal and secure pages is coordinated
+      with the Ultravisor by a new HMM plug-in in the Hypervisor.
+
+    The Ultravisor offers new services to the hypervisor and SVMs. These
+    are accessed through ultracalls.
+
+Terminology
+===========
+
+    * Hypercalls: special system calls used to request services from the
+      Hypervisor.
+
+    * Normal memory: Memory that is accessible to the Hypervisor.
+
+    * Normal page: Page backed by normal memory and available to the
+      Hypervisor.
+
+    * Shared page: A page backed by normal memory and available to both
+      the Hypervisor/QEMU and the SVM (i.e., the page has mappings in the
+      SVM and Hypervisor/QEMU).
+
+    * Secure memory: Memory that is accessible only to the Ultravisor and
+      SVMs.
+
+    * Secure page: Page backed by secure memory and only available to the
+      Ultravisor and SVM.
+
+    * SVM: Secure Virtual Machine.
+
+    * Ultracalls: special system calls used to request services from the
+      Ultravisor.
+
+
+Ultravisor calls API
+####################
+
+    This section describes Ultravisor calls (ultracalls) needed to
+    support Secure Virtual Machines (SVMs) and Paravirtualized KVM. The
+    ultracalls allow the SVMs and Hypervisor to request services from the
+    Ultravisor such as accessing a register or memory region that can only
+    be accessed when running in Ultravisor-privileged mode.
+
+    The specific service needed from an ultracall is specified in register
+    R3 (the first parameter to the ultracall). Other parameters to the
+    ultracall, if any, are specified in registers R4 through R12.
+
+    The return value of all ultracalls is in register R3. Other output
+    values from the ultracall, if any, are returned in registers R4
+    through R12. The only exception to this register usage is the
+    ``UV_RETURN`` ultracall described below.
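+    As a concrete example of this convention, the kernel side of this
+    patch adds a ``ucall_norets()`` helper (arch/powerpc/kernel/ucall.S)
+    that executes ``sc 2``: the opcode goes in R3, the remaining C
+    arguments land in R4 and up, and the Ultravisor's R3 is returned.
+    A minimal sketch using the ``UV_WRITE_PATE`` opcode defined by this
+    patch:
+
+.. code-block:: c
+
+	/* Sketch only: R3 = UV_WRITE_PATE, R4 = lpid, R5 = dw0, R6 = dw1 */
+	static int example_write_pate(u64 lpid, u64 dw0, u64 dw1)
+	{
+		return ucall_norets(UV_WRITE_PATE, lpid, dw0, dw1);
+	}
+
+    This is essentially what the ``uv_register_pate()`` wrapper added by
+    this patch in arch/powerpc/include/asm/ultravisor.h does.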
+    Each ultracall returns specific error codes, applicable in the context
+    of the ultracall. However, as with the Power Architecture Platform
+    Reference (PAPR), if no specific error code is defined for a particular
+    situation, the ultracall falls back to a parameter-position based error
+    code, i.e., U_PARAMETER, U_P2, U_P3, etc., depending on which ultracall
+    parameter may have caused the error.
+
+    Some ultracalls involve transferring a page of data between the
+    Ultravisor and the Hypervisor. Secure pages that are transferred from
+    secure memory to normal memory may be encrypted using dynamically
+    generated keys. When the secure pages are transferred back to secure
+    memory, they may be decrypted using the same dynamically generated
+    keys. Generation and management of these keys will be covered in a
+    separate document.
+
+    For now, this section only covers the ultracalls currently implemented
+    and used by the Hypervisor and SVMs, but others can be added here when
+    it makes sense.
+
+    The full specification for all hypercalls/ultracalls will eventually
+    be made available in the public/OpenPower version of the PAPR
+    specification.
+
+    **Note**
+
+    If PEF is not enabled, the ultracalls will be redirected to the
+    Hypervisor, which must handle/fail the calls.
+
+Ultracalls used by Hypervisor
+=============================
+
+    This section describes the virtual memory management ultracalls used
+    by the Hypervisor to manage SVMs.
+
+UV_PAGE_OUT
+-----------
+
+    Encrypt and move the contents of a page from secure memory to normal
+    memory.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_PAGE_OUT,
+		uint16_t lpid,		/* LPAR ID */
+		uint64_t dest_ra,	/* real address of destination page */
+		uint64_t src_gpa,	/* source guest-physical-address */
+		uint8_t  flags,		/* flags */
+		uint64_t order)		/* page size order */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_PARAMETER if ``lpid`` is invalid.
+    * U_P2 if ``dest_ra`` is invalid.
+    * U_P3 if the ``src_gpa`` address is invalid.
+    * U_P4 if any bit in the ``flags`` is unrecognized.
+    * U_P5 if the ``order`` parameter is unsupported.
+    * U_FUNCTION if functionality is not supported.
+    * U_BUSY if the page cannot currently be paged out.
+
+Description
+~~~~~~~~~~~
+
+    Encrypt the contents of a secure page and make it available to the
+    Hypervisor in a normal page.
+
+    By default, the source page is unmapped from the SVM's partition-
+    scoped page table. But the Hypervisor can provide a hint to the
+    Ultravisor to retain the page mapping by setting the ``UV_SNAPSHOT``
+    flag in the ``flags`` parameter.
+
+    If the source page is already a shared page, the call returns
+    U_SUCCESS, without doing anything.
+
+Use cases
+~~~~~~~~~
+
+    #. QEMU attempts to access an address belonging to the SVM but the
+       page frame for that address is not mapped into QEMU's address
+       space. In this case, the Hypervisor will allocate a page frame,
+       map it into QEMU's address space and issue the ``UV_PAGE_OUT``
+       call to retrieve the encrypted contents of the page.
+
+    #. When the Ultravisor runs low on secure memory and needs to page
+       out an LRU page, the Ultravisor will issue the ``H_SVM_PAGE_OUT``
+       hypercall to the Hypervisor. The Hypervisor will then allocate a
+       normal page and issue the ``UV_PAGE_OUT`` ultracall, and the
+       Ultravisor will encrypt and move the contents of the secure page
+       into the normal page.
+
+    #. When the Hypervisor accesses SVM data, the Hypervisor requests the
+       Ultravisor to transfer the corresponding page into an insecure
+       page, which the Hypervisor can access. The data in the normal page
+       will, however, be encrypted.
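+    A hedged sketch of a Hypervisor-side wrapper for this ultracall is
+    shown below. The ``UV_PAGE_OUT`` opcode value is not defined by this
+    patch (only ``UV_WRITE_PATE`` and ``UV_RETURN`` are), so assume it is
+    provided by asm/ultravisor-api.h:
+
+.. code-block:: c
+
+	/* Sketch only: page a secure page out into a normal page. */
+	static int uv_page_out(u64 lpid, u64 dest_ra, u64 src_gpa, u64 flags,
+			       u64 order)
+	{
+		return ucall_norets(UV_PAGE_OUT, lpid, dest_ra, src_gpa,
+				    flags, order);
+	}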
+UV_PAGE_IN
+----------
+
+    Move the contents of a page from normal memory to secure memory.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_PAGE_IN,
+		uint16_t lpid,		/* the LPAR ID */
+		uint64_t src_ra,	/* source real address of page */
+		uint64_t dest_gpa,	/* destination guest physical address */
+		uint64_t flags,		/* flags */
+		uint64_t order)		/* page size order */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_BUSY if the page cannot currently be paged in.
+    * U_FUNCTION if functionality is not supported.
+    * U_PARAMETER if ``lpid`` is invalid.
+    * U_P2 if ``src_ra`` is invalid.
+    * U_P3 if the ``dest_gpa`` address is invalid.
+    * U_P4 if any bit in the ``flags`` is unrecognized.
+    * U_P5 if the ``order`` parameter is unsupported.
+
+Description
+~~~~~~~~~~~
+
+    Move the contents of the page identified by ``src_ra`` from normal
+    memory to secure memory and map it to the guest physical address
+    ``dest_gpa``.
+
+    If ``dest_gpa`` refers to a shared address, map the page into the
+    partition-scoped page-table of the SVM. If ``dest_gpa`` is not shared,
+    copy the contents of the page into the corresponding secure page.
+    Depending on the context, the page is decrypted before being copied.
+
+    The caller provides the attributes of the page through the ``flags``
+    parameter. Valid values for ``flags`` are:
+
+    * CACHE_INHIBITED
+    * CACHE_ENABLED
+    * WRITE_PROTECTION
+
+    The Hypervisor must pin the page in memory before making the
+    ``UV_PAGE_IN`` ultracall.
+
+Use cases
+~~~~~~~~~
+
+    #. When a normal VM switches to secure mode, all its pages residing
+       in normal memory are moved into secure memory.
+
+    #. When an SVM requests to share a page with the Hypervisor, the
+       Hypervisor allocates a page and informs the Ultravisor.
+
+    #. When an SVM accesses a secure page that has been paged out, the
+       Ultravisor invokes the Hypervisor to locate the page. After
+       locating the page, the Hypervisor uses UV_PAGE_IN to make the
+       page available to the Ultravisor.
+
+UV_PAGE_INVAL
+-------------
+
+    Invalidate the Ultravisor mapping of a page.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_PAGE_INVAL,
+		uint16_t lpid,		/* the LPAR ID */
+		uint64_t guest_pa,	/* destination guest-physical-address */
+		uint64_t order)		/* page size order */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_PARAMETER if ``lpid`` is invalid.
+    * U_P2 if ``guest_pa`` is invalid (or corresponds to a secure
+      page mapping).
+    * U_P3 if the ``order`` is invalid.
+    * U_FUNCTION if functionality is not supported.
+    * U_BUSY if the page cannot currently be invalidated.
+
+Description
+~~~~~~~~~~~
+
+    This ultracall informs the Ultravisor that the page mapping in the
+    Hypervisor corresponding to the given guest physical address has been
+    invalidated and that the Ultravisor should not access the page. If the
+    specified ``guest_pa`` corresponds to a secure page, the Ultravisor
+    will ignore the attempt to invalidate the page and return U_P2.
+
+Use cases
+~~~~~~~~~
+
+    #. When a shared page is unmapped from QEMU's page table, possibly
+       because it is paged out to disk, the Ultravisor needs to know that
+       the page should not be accessed from its side either.
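+    A sketch of how the Hypervisor might report such an invalidation,
+    assuming a ``UV_PAGE_INVAL`` opcode in asm/ultravisor-api.h (not
+    defined by this patch) and a 64K page size (order 16):
+
+.. code-block:: c
+
+	/* Sketch only: tell the Ultravisor a shared-page mapping is gone. */
+	static void uv_invalidate_shared_page(u64 lpid, u64 guest_pa)
+	{
+		long ret = ucall_norets(UV_PAGE_INVAL, lpid, guest_pa, 16);
+
+		/* U_P2 means the address maps a secure page; see above. */
+		if (ret != U_SUCCESS && ret != U_P2)
+			pr_warn("UV_PAGE_INVAL failed: %ld\n", ret);
+	}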
+UV_WRITE_PATE
+-------------
+
+    Validate and write the partition table entry (PATE) for a given
+    partition.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_WRITE_PATE,
+		uint32_t lpid,		/* the LPAR ID */
+		uint64_t dw0,		/* the first double word to write */
+		uint64_t dw1)		/* the second double word to write */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_BUSY if the PATE cannot currently be written to.
+    * U_FUNCTION if functionality is not supported.
+    * U_PARAMETER if ``lpid`` is invalid.
+    * U_P2 if ``dw0`` is invalid.
+    * U_P3 if the ``dw1`` address is invalid.
+    * U_PERMISSION if the Hypervisor is attempting to change the PATE
+      of a secure virtual machine, or if called from a
+      context other than the Hypervisor.
+
+Description
+~~~~~~~~~~~
+
+    Validate and write the partition-table entry for the given LPID. If
+    the LPID is already allocated and initialized, this call results in
+    changing the partition table entry.
+
+Use cases
+~~~~~~~~~
+
+    #. The partition table resides in secure memory and its entries,
+       called PATEs (Partition Table Entries), point to the partition-
+       scoped page tables for the Hypervisor as well as each of the
+       virtual machines (both secure and normal). The Hypervisor
+       operates in partition 0 and its partition-scoped page tables
+       reside in normal memory.
+
+    #. This ultracall allows the Hypervisor to register the partition-
+       scoped and process-scoped page table entries for the Hypervisor
+       and other partitions (virtual machines) with the Ultravisor.
+
+    #. If the value of the PATE for an existing partition (VM) changes,
+       the TLB cache for the partition is flushed.
+
+    #. The Hypervisor is responsible for allocating LPIDs. The LPID and
+       its PATE entry are registered together. The Hypervisor manages
+       the PATE entries for a normal VM and can change the PATE entry
+       anytime. The Ultravisor manages the PATE entries for an SVM and
+       the Hypervisor is not allowed to modify them.
+
+UV_RETURN
+---------
+
+    Return control from the Hypervisor back to the Ultravisor after
+    processing a hypercall or interrupt that was forwarded (a.k.a.
+    *reflected*) to the Hypervisor.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_RETURN)
+
+Return values
+~~~~~~~~~~~~~
+
+    This call never returns to the Hypervisor on success. It returns
+    U_INVALID if the ultracall is not made from a Hypervisor context.
+
+Description
+~~~~~~~~~~~
+
+    When an SVM makes a hypercall or incurs some other exception, the
+    Ultravisor usually forwards (a.k.a. *reflects*) the exception to the
+    Hypervisor. After processing the exception, the Hypervisor uses the
+    ``UV_RETURN`` ultracall to return control back to the SVM.
+
+    The expected register state on entry to this ultracall is:
+
+    * Non-volatile registers are restored to their original values.
+    * If returning from a hypercall, register R0 contains the return
+      value (**unlike other ultracalls**) and registers R4 through R12
+      contain any output values of the hypercall.
+    * R3 contains the ultracall number, i.e., UV_RETURN.
+    * If returning with a synthesized interrupt, R2 contains the
+      synthesized interrupt number.
+
+Use cases
+~~~~~~~~~
+
+    #. The Ultravisor relies on the Hypervisor to provide several services
+       to the SVM, such as processing hypercalls and other exceptions.
+       After processing the exception, the Hypervisor uses UV_RETURN to
+       return control back to the Ultravisor.
+
+    #. The Hypervisor has to use this ultracall to return control to the
+       SVM.
+UV_REGISTER_MEM_SLOT
+--------------------
+
+    Register an SVM address-range with specified properties.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_REGISTER_MEM_SLOT,
+		uint64_t lpid,		/* LPAR ID of the SVM */
+		uint64_t start_gpa,	/* start guest physical address */
+		uint64_t size,		/* size of address range in bytes */
+		uint64_t flags,		/* reserved for future expansion */
+		uint16_t slotid)	/* slot identifier */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_PARAMETER if ``lpid`` is invalid.
+    * U_P2 if ``start_gpa`` is invalid.
+    * U_P3 if ``size`` is invalid.
+    * U_P4 if any bit in the ``flags`` is unrecognized.
+    * U_P5 if the ``slotid`` parameter is unsupported.
+    * U_PERMISSION if called from a context other than the Hypervisor.
+    * U_FUNCTION if functionality is not supported.
+
+Description
+~~~~~~~~~~~
+
+    Register a memory range for an SVM. The memory range starts at the
+    guest physical address ``start_gpa`` and is ``size`` bytes long.
+
+Use cases
+~~~~~~~~~
+
+    #. When a virtual machine goes secure, all the memory slots managed by
+       the Hypervisor move into secure memory. The Hypervisor iterates
+       through each of the memory slots and registers them with the
+       Ultravisor. The Hypervisor may discard some slots such as those
+       used for firmware (SLOF).
+
+    #. When new memory is hot-plugged, a new memory slot gets registered.
+
+
+UV_UNREGISTER_MEM_SLOT
+----------------------
+
+    Unregister an SVM address-range that was previously registered using
+    UV_REGISTER_MEM_SLOT.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_UNREGISTER_MEM_SLOT,
+		uint64_t lpid,		/* LPAR ID of the SVM */
+		uint64_t slotid)	/* reservation slotid */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_FUNCTION if functionality is not supported.
+    * U_PARAMETER if ``lpid`` is invalid.
+    * U_P2 if ``slotid`` is invalid.
+    * U_PERMISSION if called from a context other than the Hypervisor.
+
+Description
+~~~~~~~~~~~
+
+    Release the memory slot identified by ``slotid`` and free any
+    resources allocated towards the reservation.
+
+Use cases
+~~~~~~~~~
+
+    #. Memory hot-remove.
+
+
+UV_SVM_TERMINATE
+----------------
+
+    Terminate an SVM and release its resources.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_SVM_TERMINATE,
+		uint64_t lpid)		/* LPAR ID of the SVM */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_FUNCTION if functionality is not supported.
+    * U_PARAMETER if ``lpid`` is invalid.
+    * U_INVALID if VM is not secure.
+    * U_PERMISSION if not called from a Hypervisor context.
+
+Description
+~~~~~~~~~~~
+
+    Terminate an SVM and release all its resources.
+
+Use cases
+~~~~~~~~~
+
+    #. Called by the Hypervisor when terminating an SVM.
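+    A sketch of the memslot registration described above, reusing KVM's
+    existing memslot iterator. The ``UV_REGISTER_MEM_SLOT`` opcode is not
+    defined by this patch, so this is an assumption about how the
+    Hypervisor side could look:
+
+.. code-block:: c
+
+	/* Sketch only: register every memslot of a guest going secure. */
+	static int uv_register_all_memslots(struct kvm *kvm)
+	{
+		struct kvm_memslots *slots = kvm_memslots(kvm);
+		struct kvm_memory_slot *memslot;
+		long ret;
+
+		kvm_for_each_memslot(memslot, slots) {
+			ret = ucall_norets(UV_REGISTER_MEM_SLOT,
+					   kvm->arch.lpid,
+					   memslot->base_gfn << PAGE_SHIFT,
+					   memslot->npages * PAGE_SIZE,
+					   0 /* flags */, memslot->id);
+			if (ret != U_SUCCESS)
+				return -EINVAL;
+		}
+		return 0;
+	}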
+Ultracalls used by SVM
+======================
+
+UV_SHARE_PAGE
+-------------
+
+    Share a set of guest physical pages with the Hypervisor.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_SHARE_PAGE,
+		uint64_t gfn,	/* guest page frame number */
+		uint64_t num)	/* number of pages of size PAGE_SIZE */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_FUNCTION if functionality is not supported.
+    * U_INVALID if the VM is not secure.
+    * U_PARAMETER if ``gfn`` is invalid.
+    * U_P2 if ``num`` is invalid.
+
+Description
+~~~~~~~~~~~
+
+    Share the ``num`` pages starting at guest physical frame number ``gfn``
+    with the Hypervisor. Assume the page size is PAGE_SIZE bytes. Zero the
+    pages before returning.
+
+    If the address is already backed by a secure page, unmap the page and
+    back it with an insecure page, with the help of the Hypervisor. If it
+    is not backed by any page yet, mark the PTE as insecure and back it
+    with an insecure page when the address is accessed. If it is already
+    backed by an insecure page, zero the page and return.
+
+Use cases
+~~~~~~~~~
+
+    #. The Hypervisor cannot access the SVM pages since they are backed by
+       secure pages. Hence an SVM must explicitly request the Ultravisor
+       for pages it can share with the Hypervisor.
+
+    #. Shared pages are needed to support virtio and the Virtual Processor
+       Area (VPA) in SVMs.
+
+
+UV_UNSHARE_PAGE
+---------------
+
+    Restore a shared SVM page to its initial state.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_UNSHARE_PAGE,
+		uint64_t gfn,	/* guest page frame number */
+		uint64_t num)	/* number of pages of size PAGE_SIZE */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_FUNCTION if functionality is not supported.
+    * U_INVALID if VM is not secure.
+    * U_PARAMETER if ``gfn`` is invalid.
+    * U_P2 if ``num`` is invalid.
+
+Description
+~~~~~~~~~~~
+
+    Stop sharing ``num`` pages starting at ``gfn`` with the Hypervisor.
+    Assume that the page size is PAGE_SIZE. Zero the pages before
+    returning.
+
+    If the address is already backed by an insecure page, unmap the page
+    and back it with a secure page. Inform the Hypervisor to release its
+    reference to the shared page. If the address is not backed by a page
+    yet, mark the PTE as secure and back it with a secure page when that
+    address is accessed. If it is already backed by a secure page, zero
+    the page and return.
+
+Use cases
+~~~~~~~~~
+
+    #. The SVM may decide to unshare a page from the Hypervisor.
+
+
+UV_UNSHARE_ALL_PAGES
+--------------------
+
+    Unshare all pages the SVM has shared with the Hypervisor.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_UNSHARE_ALL_PAGES)
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success.
+    * U_FUNCTION if functionality is not supported.
+    * U_INVALID if VM is not secure.
+
+Description
+~~~~~~~~~~~
+
+    Unshare all shared pages from the Hypervisor. All unshared pages are
+    zeroed on return. Only pages explicitly shared by the SVM with the
+    Hypervisor (using the UV_SHARE_PAGE ultracall) are unshared. The
+    Ultravisor may internally share some pages with the Hypervisor without
+    explicit request from the SVM. These pages will not be unshared by
+    this ultracall.
+
+Use cases
+~~~~~~~~~
+
+    #. This call is needed when ``kexec`` is used to boot a different
+       kernel. It may also be needed during SVM reset.
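+    A sketch of the ``kexec`` use case, assuming the
+    ``UV_UNSHARE_ALL_PAGES`` opcode is available to the SVM kernel (it is
+    not defined by this patch):
+
+.. code-block:: c
+
+	/* Sketch only: drop all shared pages before booting a new kernel. */
+	static void svm_unshare_all_pages(void)
+	{
+		if (ucall_norets(UV_UNSHARE_ALL_PAGES) != U_SUCCESS)
+			pr_warn("UV_UNSHARE_ALL_PAGES failed\n");
+	}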
+UV_ESM
+------
+
+    Secure the virtual machine (*enter secure mode*).
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t ultracall(const uint64_t UV_ESM,
+		uint64_t esm_blob_addr,	/* location of the ESM blob */
+		uint64_t fdt)		/* Flattened device tree */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * U_SUCCESS on success (including if VM is already secure).
+    * U_FUNCTION if functionality is not supported.
+    * U_INVALID if VM is not secure.
+    * U_PARAMETER if ``esm_blob_addr`` is invalid.
+    * U_P2 if ``fdt`` is invalid.
+    * U_PERMISSION if any integrity checks fail.
+    * U_RETRY if there is insufficient memory to create the SVM.
+    * U_NO_KEY if the symmetric key is unavailable.
+
+Description
+~~~~~~~~~~~
+
+    Secure the virtual machine. On successful completion, return
+    control to the virtual machine at the address specified in the
+    ESM blob.
+
+Use cases
+~~~~~~~~~
+
+    #. A normal virtual machine can choose to switch to secure mode.
+
+Hypervisor Calls API
+####################
+
+    This section describes the Hypervisor calls (hypercalls) that are
+    needed to support the Ultravisor. Hypercalls are services provided by
+    the Hypervisor to virtual machines and the Ultravisor.
+
+    Register usage for these hypercalls is identical to that of the other
+    hypercalls defined in the Power Architecture Platform Reference (PAPR)
+    document, i.e., on input, register R3 identifies the specific service
+    that is being requested and registers R4 through R11 contain
+    additional parameters to the hypercall, if any. On output, register
+    R3 contains the return value and registers R4 through R9 contain any
+    other output values from the hypercall.
+
+    This section only covers hypercalls currently implemented/planned
+    for Ultravisor usage, but others can be added here when it makes
+    sense.
+
+    The full specification for all hypercalls/ultracalls will eventually
+    be made available in the public/OpenPower version of the PAPR
+    specification.
+
+Hypervisor calls to support Ultravisor
+======================================
+
+    The following hypercalls are needed to support the Ultravisor.
+
+H_SVM_INIT_START
+----------------
+
+    Begin the process of converting a normal virtual machine into an SVM.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t hypercall(const uint64_t H_SVM_INIT_START)
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * H_SUCCESS on success.
+
+Description
+~~~~~~~~~~~
+
+    Initiate the process of securing a virtual machine. This involves
+    coordinating with the Ultravisor, using ultracalls, to allocate
+    resources in the Ultravisor for the new SVM, transferring the VM's
+    pages from normal to secure memory, etc. When the process is
+    completed, the Ultravisor issues the H_SVM_INIT_DONE hypercall.
+
+Use cases
+~~~~~~~~~
+
+    #. The Ultravisor uses this hypercall to inform the Hypervisor that a
+       VM has initiated the process of switching to secure mode.
+
+
+H_SVM_INIT_DONE
+---------------
+
+    Complete the process of securing an SVM.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t hypercall(const uint64_t H_SVM_INIT_DONE)
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * H_SUCCESS on success.
+    * H_UNSUPPORTED if called from the wrong context (e.g.
+      from an SVM or before an H_SVM_INIT_START
+      hypercall).
+
+Description
+~~~~~~~~~~~
+
+    Complete the process of securing a virtual machine. This call must
+    be made after a prior ``H_SVM_INIT_START`` hypercall.
+
+Use cases
+~~~~~~~~~
+
+    On successfully securing a virtual machine, the Ultravisor informs
+    the Hypervisor about it. The Hypervisor can use this call to finish
+    setting up its internal state for this virtual machine.
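+    A sketch of a KVM-side handler for this pair of hypercalls. Only the
+    ``kvm->arch.secure_guest`` field is added by this patch; the handler
+    below, and its use of the ``uv_register_all_memslots()`` sketch from
+    earlier, are assumptions:
+
+.. code-block:: c
+
+	/* Sketch only: H_SVM_INIT_START handler in the Hypervisor. */
+	static unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
+	{
+		if (uv_register_all_memslots(kvm))
+			return H_PARAMETER;
+
+		kvm->arch.secure_guest = 1;	/* transition has begun */
+		return H_SUCCESS;
+	}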
+H_SVM_PAGE_IN
+-------------
+
+    Move the contents of a page from normal memory to secure memory.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t hypercall(const uint64_t H_SVM_PAGE_IN,
+		uint64_t guest_pa,	/* guest-physical-address */
+		uint64_t flags,		/* flags */
+		uint64_t order)		/* page size order */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * H_SUCCESS on success.
+    * H_PARAMETER if ``guest_pa`` is invalid.
+    * H_P2 if ``flags`` is invalid.
+    * H_P3 if ``order`` of page is invalid.
+
+Description
+~~~~~~~~~~~
+
+    Retrieve the content of the page belonging to the VM at the specified
+    guest physical address.
+
+    The only valid values in ``flags`` are:
+
+    * H_PAGE_IN_SHARED, which indicates that the page is to be shared
+      with the Ultravisor.
+
+    * H_PAGE_IN_NONSHARED, which indicates that the UV is no longer
+      interested in the page. Applicable if the page is a shared page.
+
+    The ``order`` parameter must correspond to the configured page size.
+
+Use cases
+~~~~~~~~~
+
+    #. When a normal VM becomes a secure VM (using the UV_ESM ultracall),
+       the Ultravisor uses this hypercall to move contents of each page of
+       the VM from normal memory to secure memory.
+
+    #. The Ultravisor uses this hypercall to ask the Hypervisor to provide
+       a page in normal memory that can be shared between the SVM and the
+       Hypervisor.
+
+    #. The Ultravisor uses this hypercall to page in a paged-out page.
+       This can happen when the SVM touches a paged-out page.
+
+    #. If the SVM wants to disable sharing of pages with the Hypervisor,
+       it can inform the Ultravisor to do so. The Ultravisor will then use
+       this hypercall and inform the Hypervisor that it has released
+       access to the normal page.
+
+H_SVM_PAGE_OUT
+---------------
+
+    Move the contents of the page to normal memory.
+
+Syntax
+~~~~~~
+
+.. code-block:: c
+
+	uint64_t hypercall(const uint64_t H_SVM_PAGE_OUT,
+		uint64_t guest_pa,	/* guest-physical-address */
+		uint64_t flags,		/* flags (currently none) */
+		uint64_t order)		/* page size order */
+
+Return values
+~~~~~~~~~~~~~
+
+    One of the following values:
+
+    * H_SUCCESS on success.
+    * H_PARAMETER if ``guest_pa`` is invalid.
+    * H_P2 if ``flags`` is invalid.
+    * H_P3 if ``order`` is invalid.
+
+Description
+~~~~~~~~~~~
+
+    Move the contents of the page identified by ``guest_pa`` to normal
+    memory.
+
+    Currently ``flags`` is unused and must be set to 0. The ``order``
+    parameter must correspond to the configured page size.
+
+Use cases
+~~~~~~~~~
+
+    #. If the Ultravisor is running low on secure pages, it can move the
+       contents of some secure pages into normal pages using this
+       hypercall. The content will be encrypted.
+
+References
+##########
+
+.. [1] `Supporting Protected Computing on IBM Power Architecture `_
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index ec1c97a8e8cb..e698f48cbc6d 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include <asm/ultravisor-api.h>
 
 #include
 
@@ -34,6 +35,16 @@ extern struct static_key hcall_tracepoint_key;
 void __trace_hcall_entry(unsigned long opcode, unsigned long *args);
 void __trace_hcall_exit(long opcode, long retval, unsigned long *retbuf);
 
+/* Ultravisor */
+#ifdef CONFIG_PPC_POWERNV
+long ucall_norets(unsigned long opcode, ...);
+#else
+static inline long ucall_norets(unsigned long opcode, ...)
+{
+	return U_NOT_AVAILABLE;
+}
+#endif
+
 /* OPAL */
 int64_t __opal_call(int64_t a0, int64_t a1, int64_t a2, int64_t a3,
 		    int64_t a4, int64_t a5, int64_t a6, int64_t a7,
diff --git a/arch/powerpc/include/asm/elfnote.h b/arch/powerpc/include/asm/elfnote.h
new file mode 100644
index 000000000000..a201b6e9ae44
--- /dev/null
+++ b/arch/powerpc/include/asm/elfnote.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PowerPC ELF notes.
+ *
+ * Copyright 2019, IBM Corporation
+ */
+
+#ifndef __ASM_POWERPC_ELFNOTE_H__
+#define __ASM_POWERPC_ELFNOTE_H__
+
+/*
+ * These note types should live in a SHT_NOTE segment and have
+ */ + +/* + * The capabilities supported/required by this kernel (bitmap). + * + * This type uses a bitmap as "desc" field. Each bit is described + * in arch/powerpc/kernel/note.S + */ +#define PPC_ELFNOTE_CAPABILITIES 1 + +#endif /* __ASM_POWERPC_ELFNOTE_H__ */ diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h index faeca8b76c8c..b3e214a97f3a 100644 --- a/arch/powerpc/include/asm/firmware.h +++ b/arch/powerpc/include/asm/firmware.h @@ -50,6 +50,7 @@ #define FW_FEATURE_DRC_INFO ASM_CONST(0x0000000800000000) #define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000) #define FW_FEATURE_PAPR_SCM ASM_CONST(0x0000002000000000) +#define FW_FEATURE_ULTRAVISOR ASM_CONST(0x0000004000000000) #ifndef __ASSEMBLY__ @@ -68,9 +69,9 @@ enum { FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN | FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 | FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE | - FW_FEATURE_PAPR_SCM, + FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR, FW_FEATURE_PSERIES_ALWAYS = 0, - FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL, + FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR, FW_FEATURE_POWERNV_ALWAYS = 0, FW_FEATURE_PS3_POSSIBLE = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1, FW_FEATURE_PS3_ALWAYS = FW_FEATURE_LPAR | FW_FEATURE_PS3_LV1, diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index d7bf1f104c15..350101e11ddb 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -48,15 +48,16 @@ struct iommu_table_ops { * returns old TCE and DMA direction mask. * @tce is a physical address. */ - int (*exchange)(struct iommu_table *tbl, + int (*xchg_no_kill)(struct iommu_table *tbl, long index, unsigned long *hpa, - enum dma_data_direction *direction); - /* Real mode */ - int (*exchange_rm)(struct iommu_table *tbl, - long index, - unsigned long *hpa, - enum dma_data_direction *direction); + enum dma_data_direction *direction, + bool realmode); + + void (*tce_kill)(struct iommu_table *tbl, + unsigned long index, + unsigned long pages, + bool realmode); __be64 *(*useraddrptr)(struct iommu_table *tbl, long index, bool alloc); #endif @@ -209,6 +210,12 @@ extern void iommu_del_device(struct device *dev); extern long iommu_tce_xchg(struct mm_struct *mm, struct iommu_table *tbl, unsigned long entry, unsigned long *hpa, enum dma_data_direction *direction); +extern long iommu_tce_xchg_no_kill(struct mm_struct *mm, + struct iommu_table *tbl, + unsigned long entry, unsigned long *hpa, + enum dma_data_direction *direction); +extern void iommu_tce_kill(struct iommu_table *tbl, + unsigned long entry, unsigned long pages); #else static inline void iommu_register_group(struct iommu_table_group *table_group, int pci_domain_number, diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index e6e5f59aaa97..4bb552d639b8 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -283,6 +283,7 @@ struct kvm_arch { cpumask_t cpu_in_guest; u8 radix; u8 fwnmi_enabled; + u8 secure_guest; bool threads_indep; bool nested_enable; pgd_t *pgtable; diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h new file mode 100644 index 000000000000..6a0f9c74f959 --- /dev/null +++ b/arch/powerpc/include/asm/ultravisor-api.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Ultravisor API. + * + * Copyright 2019, IBM Corporation. 
+ *
+ */
+#ifndef _ASM_POWERPC_ULTRAVISOR_API_H
+#define _ASM_POWERPC_ULTRAVISOR_API_H
+
+#include <asm/hvcall.h>
+
+/* Return codes */
+#define U_BUSY			H_BUSY
+#define U_FUNCTION		H_FUNCTION
+#define U_NOT_AVAILABLE		H_NOT_AVAILABLE
+#define U_P2			H_P2
+#define U_P3			H_P3
+#define U_P4			H_P4
+#define U_P5			H_P5
+#define U_PARAMETER		H_PARAMETER
+#define U_PERMISSION		H_PERMISSION
+#define U_SUCCESS		H_SUCCESS
+
+/* opcodes */
+#define UV_WRITE_PATE		0xF104
+#define UV_RETURN		0xF11C
+
+#endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
new file mode 100644
index 000000000000..d7aa97aa7834
--- /dev/null
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Ultravisor definitions
+ *
+ * Copyright 2019, IBM Corporation.
+ *
+ */
+#ifndef _ASM_POWERPC_ULTRAVISOR_H
+#define _ASM_POWERPC_ULTRAVISOR_H
+
+#include <asm/asm-prototypes.h>
+#include <asm/ultravisor-api.h>
+#include <asm/firmware.h>
+
+int early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
+				  int depth, void *data);
+
+/*
+ * In ultravisor enabled systems, PTCR becomes ultravisor privileged only for
+ * writing and an attempt to write to it will cause a Hypervisor Emulation
+ * Assistance interrupt.
+ */
+static inline void set_ptcr_when_no_uv(u64 val)
+{
+	if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+		mtspr(SPRN_PTCR, val);
+}
+
+static inline int uv_register_pate(u64 lpid, u64 dw0, u64 dw1)
+{
+	return ucall_norets(UV_WRITE_PATE, lpid, dw0, dw1);
+}
+
+#endif /* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index ea0c69236789..934e64b28894 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -53,7 +53,7 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
 				   dma-common.o
 obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
 				   signal_64.o ptrace32.o \
-				   paca.o nvram_64.o firmware.o
+				   paca.o nvram_64.o firmware.o note.o
 obj-$(CONFIG_VDSO32)		+= vdso32/
 obj-$(CONFIG_PPC_WATCHDOG)	+= watchdog.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
@@ -156,6 +156,7 @@ endif
 
 obj-$(CONFIG_EPAPR_PARAVIRT)	+= epapr_paravirt.o epapr_hcalls.o
 obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvm_emul.o
+obj-$(CONFIG_PPC_POWERNV)	+= ucall.o
 
 # Disable GCOV, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4ccb6b3a7fbd..484f54dab247 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,7 @@ int main(void)
 	OFFSET(KVM_VRMA_SLB_V, kvm, arch.vrma_slb_v);
 	OFFSET(KVM_RADIX, kvm, arch.radix);
 	OFFSET(KVM_FWNMI, kvm, arch.fwnmi_enabled);
+	OFFSET(KVM_SECURE_GUEST, kvm, arch.secure_guest);
 	OFFSET(VCPU_DSISR, kvm_vcpu, arch.shregs.dsisr);
 	OFFSET(VCPU_DAR, kvm_vcpu, arch.shregs.dar);
 	OFFSET(VCPU_VPA, kvm_vcpu, arch.vpa.pinned_addr);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index e7a2b160d4c6..9704f3f76e63 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1013,29 +1013,32 @@ int iommu_tce_check_gpa(unsigned long page_shift, unsigned long gpa)
 }
 EXPORT_SYMBOL_GPL(iommu_tce_check_gpa);
 
-long iommu_tce_xchg(struct mm_struct *mm, struct iommu_table *tbl,
+extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
+		struct iommu_table *tbl,
 		unsigned long entry, unsigned long *hpa,
 		enum dma_data_direction *direction)
 {
 	long ret;
 	unsigned long size = 0;
 
-	ret = tbl->it_ops->exchange(tbl, entry, hpa, direction);
-
+	ret = tbl->it_ops->xchg_no_kill(tbl, entry, hpa, direction, false);
 	if (!ret && ((*direction == DMA_FROM_DEVICE) ||
 	    (*direction == DMA_BIDIRECTIONAL)) &&
 	    !mm_iommu_is_devmem(mm, *hpa, tbl->it_page_shift, &size))
 		SetPageDirty(pfn_to_page(*hpa >> PAGE_SHIFT));
 
-	/* if (unlikely(ret))
-		pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n",
-			__func__, hwaddr, entry << tbl->it_page_shift,
-			hwaddr, ret); */
-
 	return ret;
 }
-EXPORT_SYMBOL_GPL(iommu_tce_xchg);
+EXPORT_SYMBOL_GPL(iommu_tce_xchg_no_kill);
+
+void iommu_tce_kill(struct iommu_table *tbl,
+		unsigned long entry, unsigned long pages)
+{
+	if (tbl->it_ops->tce_kill)
+		tbl->it_ops->tce_kill(tbl, entry, pages, false);
+}
+EXPORT_SYMBOL_GPL(iommu_tce_kill);
 
 int iommu_take_ownership(struct iommu_table *tbl)
 {
@@ -1049,7 +1052,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
 	 * requires exchange() callback defined so if it is not
 	 * implemented, we disallow taking ownership over the table.
 	 */
-	if (!tbl->it_ops->exchange)
+	if (!tbl->it_ops->xchg_no_kill)
 		return -EINVAL;
 
 	spin_lock_irqsave(&tbl->large_pool.lock, flags);
diff --git a/arch/powerpc/kernel/note.S b/arch/powerpc/kernel/note.S
new file mode 100644
index 000000000000..bcdad15395dd
--- /dev/null
+++ b/arch/powerpc/kernel/note.S
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * PowerPC ELF notes.
+ *
+ * Copyright 2019, IBM Corporation
+ */
+
+#include <linux/elfnote.h>
+#include <asm/elfnote.h>
+
+/*
+ * Ultravisor-capable bit (PowerNV only).
+ *
+ * Bit 0 indicates that the powerpc kernel binary knows how to run in an
+ * ultravisor-enabled system.
+ *
+ * In an ultravisor-enabled system, some machine resources are now controlled
+ * by the ultravisor. If the kernel is not ultravisor-capable, but it ends up
+ * being run on a machine with ultravisor, the kernel will probably crash
+ * trying to access ultravisor resources. For instance, it may crash in early
+ * boot trying to set the partition table entry 0.
+ *
+ * In an ultravisor-enabled system, a bootloader could warn the user or prevent
+ * the kernel from being run if the PowerPC ultravisor capability doesn't exist
+ * or the Ultravisor-capable bit is not set.
+ */
+#ifdef CONFIG_PPC_POWERNV
+#define PPCCAP_ULTRAVISOR_BIT		(1 << 0)
+#else
+#define PPCCAP_ULTRAVISOR_BIT		0
+#endif
+
+/*
+ * Add the PowerPC Capabilities in the binary ELF note. It is a bitmap that
+ * can be used to advertise kernel capabilities to userland.
+ */
+#define PPC_CAPABILITIES_BITMAP		(PPCCAP_ULTRAVISOR_BIT)
+
+ELFNOTE(PowerPC, PPC_ELFNOTE_CAPABILITIES,
+	.long PPC_CAPABILITIES_BITMAP)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 7159e791a70d..5828f1c81dc9 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -55,6 +55,7 @@
 #include
 #include
 #include
+#include <asm/ultravisor.h>
 
 #include
 
@@ -702,6 +703,9 @@ void __init early_init_devtree(void *params)
 #ifdef CONFIG_PPC_POWERNV
 	/* Some machines might need OPAL info for debugging, grab it now. */
 	of_scan_flat_dt(early_init_dt_scan_opal, NULL);
+
+	/* Scan tree for ultravisor feature */
+	of_scan_flat_dt(early_init_dt_scan_ultravisor, NULL);
 #endif
 
 #ifdef CONFIG_FA_DUMP
diff --git a/arch/powerpc/kernel/ucall.S b/arch/powerpc/kernel/ucall.S
new file mode 100644
index 000000000000..07296bc39166
--- /dev/null
+++ b/arch/powerpc/kernel/ucall.S
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Generic code to perform an ultravisor call.
+ *
+ * Copyright 2019, IBM Corporation.
+ * + */ +#include +#include + +_GLOBAL(ucall_norets) +EXPORT_SYMBOL_GPL(ucall_norets) + sc 2 /* Invoke the ultravisor */ + blr /* Return r3 = status */ diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index e99a14798ab0..5834db0a54c6 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -416,7 +416,7 @@ static void kvmppc_clear_tce(struct mm_struct *mm, struct iommu_table *tbl, unsigned long hpa = 0; enum dma_data_direction dir = DMA_NONE; - iommu_tce_xchg(mm, tbl, entry, &hpa, &dir); + iommu_tce_xchg_no_kill(mm, tbl, entry, &hpa, &dir); } static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm, @@ -447,7 +447,8 @@ static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm, unsigned long hpa = 0; long ret; - if (WARN_ON_ONCE(iommu_tce_xchg(kvm->mm, tbl, entry, &hpa, &dir))) + if (WARN_ON_ONCE(iommu_tce_xchg_no_kill(kvm->mm, tbl, entry, &hpa, + &dir))) return H_TOO_HARD; if (dir == DMA_NONE) @@ -455,7 +456,7 @@ static long kvmppc_tce_iommu_do_unmap(struct kvm *kvm, ret = kvmppc_tce_iommu_mapped_dec(kvm, tbl, entry); if (ret != H_SUCCESS) - iommu_tce_xchg(kvm->mm, tbl, entry, &hpa, &dir); + iommu_tce_xchg_no_kill(kvm->mm, tbl, entry, &hpa, &dir); return ret; } @@ -501,7 +502,7 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl, if (mm_iommu_mapped_inc(mem)) return H_TOO_HARD; - ret = iommu_tce_xchg(kvm->mm, tbl, entry, &hpa, &dir); + ret = iommu_tce_xchg_no_kill(kvm->mm, tbl, entry, &hpa, &dir); if (WARN_ON_ONCE(ret)) { mm_iommu_mapped_dec(mem); return H_TOO_HARD; @@ -579,6 +580,8 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, stit->tbl, entry, ua, dir); + iommu_tce_kill(stit->tbl, entry, 1); + if (ret != H_SUCCESS) { kvmppc_clear_tce(vcpu->kvm->mm, stit->tbl, entry); goto unlock_exit; @@ -656,12 +659,14 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu, */ if (get_user(tce, tces + i)) { ret = H_TOO_HARD; - goto unlock_exit; + goto invalidate_exit; } tce = be64_to_cpu(tce); - if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua)) - return H_PARAMETER; + if (kvmppc_tce_to_ua(vcpu->kvm, tce, &ua)) { + ret = H_PARAMETER; + goto invalidate_exit; + } list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { ret = kvmppc_tce_iommu_map(vcpu->kvm, stt, @@ -671,13 +676,17 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu, if (ret != H_SUCCESS) { kvmppc_clear_tce(vcpu->kvm->mm, stit->tbl, entry); - goto unlock_exit; + goto invalidate_exit; } } kvmppc_tce_put(stt, entry + i, tce); } +invalidate_exit: + list_for_each_entry_lockless(stit, &stt->iommu_tables, next) + iommu_tce_kill(stit->tbl, entry, npages); + unlock_exit: srcu_read_unlock(&vcpu->kvm->srcu, idx); @@ -716,7 +725,7 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu, continue; if (ret == H_TOO_HARD) - return ret; + goto invalidate_exit; WARN_ON_ONCE(1); kvmppc_clear_tce(vcpu->kvm->mm, stit->tbl, entry); @@ -726,6 +735,10 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu, for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift)) kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value); - return H_SUCCESS; +invalidate_exit: + list_for_each_entry_lockless(stit, &stt->iommu_tables, next) + iommu_tce_kill(stit->tbl, ioba >> stt->page_shift, npages); + + return ret; } EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce); diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c index f50bbeedfc66..ab6eeb8e753e 100644 --- a/arch/powerpc/kvm/book3s_64_vio_hv.c +++ 
b/arch/powerpc/kvm/book3s_64_vio_hv.c @@ -218,13 +218,14 @@ static long kvmppc_rm_ioba_validate(struct kvmppc_spapr_tce_table *stt, return H_SUCCESS; } -static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl, +static long iommu_tce_xchg_no_kill_rm(struct mm_struct *mm, + struct iommu_table *tbl, unsigned long entry, unsigned long *hpa, enum dma_data_direction *direction) { long ret; - ret = tbl->it_ops->exchange_rm(tbl, entry, hpa, direction); + ret = tbl->it_ops->xchg_no_kill(tbl, entry, hpa, direction, true); if (!ret && ((*direction == DMA_FROM_DEVICE) || (*direction == DMA_BIDIRECTIONAL))) { @@ -240,13 +241,20 @@ static long iommu_tce_xchg_rm(struct mm_struct *mm, struct iommu_table *tbl, return ret; } +extern void iommu_tce_kill_rm(struct iommu_table *tbl, + unsigned long entry, unsigned long pages) +{ + if (tbl->it_ops->tce_kill) + tbl->it_ops->tce_kill(tbl, entry, pages, true); +} + static void kvmppc_rm_clear_tce(struct kvm *kvm, struct iommu_table *tbl, unsigned long entry) { unsigned long hpa = 0; enum dma_data_direction dir = DMA_NONE; - iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir); + iommu_tce_xchg_no_kill_rm(kvm->mm, tbl, entry, &hpa, &dir); } static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm, @@ -278,7 +286,7 @@ static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm, unsigned long hpa = 0; long ret; - if (iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir)) + if (iommu_tce_xchg_no_kill_rm(kvm->mm, tbl, entry, &hpa, &dir)) /* * real mode xchg can fail if struct page crosses * a page boundary @@ -290,7 +298,7 @@ static long kvmppc_rm_tce_iommu_do_unmap(struct kvm *kvm, ret = kvmppc_rm_tce_iommu_mapped_dec(kvm, tbl, entry); if (ret) - iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir); + iommu_tce_xchg_no_kill_rm(kvm->mm, tbl, entry, &hpa, &dir); return ret; } @@ -336,7 +344,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl, if (WARN_ON_ONCE_RM(mm_iommu_mapped_inc(mem))) return H_TOO_HARD; - ret = iommu_tce_xchg_rm(kvm->mm, tbl, entry, &hpa, &dir); + ret = iommu_tce_xchg_no_kill_rm(kvm->mm, tbl, entry, &hpa, &dir); if (ret) { mm_iommu_mapped_dec(mem); /* @@ -417,6 +425,8 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt, stit->tbl, entry, ua, dir); + iommu_tce_kill_rm(stit->tbl, entry, 1); + if (ret != H_SUCCESS) { kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry); return ret; @@ -556,8 +566,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu, unsigned long tce = be64_to_cpu(((u64 *)tces)[i]); ua = 0; - if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) - return H_PARAMETER; + if (kvmppc_rm_tce_to_ua(vcpu->kvm, tce, &ua, NULL)) { + ret = H_PARAMETER; + goto invalidate_exit; + } list_for_each_entry_lockless(stit, &stt->iommu_tables, next) { ret = kvmppc_rm_tce_iommu_map(vcpu->kvm, stt, @@ -567,13 +579,17 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu, if (ret != H_SUCCESS) { kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry); - goto unlock_exit; + goto invalidate_exit; } } kvmppc_rm_tce_put(stt, entry + i, tce); } +invalidate_exit: + list_for_each_entry_lockless(stit, &stt->iommu_tables, next) + iommu_tce_kill_rm(stit->tbl, entry, npages); + unlock_exit: if (rmap) unlock_rmap(rmap); @@ -616,7 +632,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu, continue; if (ret == H_TOO_HARD) - return ret; + goto invalidate_exit; WARN_ON_ONCE_RM(1); kvmppc_rm_clear_tce(vcpu->kvm, stit->tbl, entry); @@ -626,7 +642,11 @@ long 
kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
 
 	for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift))
 		kvmppc_rm_tce_put(stt, ioba >> stt->page_shift, tce_value);
 
-	return H_SUCCESS;
+invalidate_exit:
+	list_for_each_entry_lockless(stit, &stt->iommu_tables, next)
+		iommu_tce_kill_rm(stit->tbl, ioba >> stt->page_shift, npages);
+
+	return ret;
 }
 
 /* This can be called in either virtual mode or real mode */
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 07181d0dfcb7..9a05b0d932ef 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include <asm/ultravisor-api.h>
 
 /* Sign-extend HDEC if not on POWER9 */
 #define EXTEND_HDEC(reg)			\
@@ -1085,16 +1086,10 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 	ld	r5, VCPU_LR(r4)
-	ld	r6, VCPU_CR(r4)
 	mtlr	r5
-	mtcr	r6
 
 	ld	r1, VCPU_GPR(R1)(r4)
-	ld	r2, VCPU_GPR(R2)(r4)
-	ld	r3, VCPU_GPR(R3)(r4)
 	ld	r5, VCPU_GPR(R5)(r4)
-	ld	r6, VCPU_GPR(R6)(r4)
-	ld	r7, VCPU_GPR(R7)(r4)
 	ld	r8, VCPU_GPR(R8)(r4)
 	ld	r9, VCPU_GPR(R9)(r4)
 	ld	r10, VCPU_GPR(R10)(r4)
@@ -1112,10 +1107,42 @@ BEGIN_FTR_SECTION
 	mtspr	SPRN_HDSISR, r0
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 
+	ld	r6, VCPU_KVM(r4)
+	lbz	r7, KVM_SECURE_GUEST(r6)
+	cmpdi	r7, 0
+	ld	r6, VCPU_GPR(R6)(r4)
+	ld	r7, VCPU_GPR(R7)(r4)
+	bne	ret_to_ultra
+
+	lwz	r0, VCPU_CR(r4)
+	mtcr	r0
+	ld	r0, VCPU_GPR(R0)(r4)
+	ld	r2, VCPU_GPR(R2)(r4)
+	ld	r3, VCPU_GPR(R3)(r4)
 	ld	r4, VCPU_GPR(R4)(r4)
 	HRFI_TO_GUEST
 	b	.
 
+/*
+ * Use UV_RETURN ultracall to return control back to the Ultravisor after
+ * processing a hypercall or interrupt that was forwarded (a.k.a. reflected)
+ * to the Hypervisor.
+ *
+ * All registers have already been loaded, except:
+ *   R0 = hcall result
+ *   R2 = SRR1, so UV can detect a synthesized interrupt (if any)
+ *   R3 = UV_RETURN
+ */
+ret_to_ultra:
+	lwz	r0, VCPU_CR(r4)
+	mtcr	r0
+
+	ld	r0, VCPU_GPR(R3)(r4)
+	mfspr	r2, SPRN_SRR1
+	li	r3, 0
+	ori	r3, r3, UV_RETURN
+	ld	r4, VCPU_GPR(R4)(r4)
+	sc	2
 
 /*
  * Enter the guest on a P9 or later system where we have exactly
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index c3bfef08dcf8..4d0bb194ed70 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -62,6 +62,7 @@
 #include
 #include
 #include
+#include <asm/ultravisor.h>
 
 #include
 
@@ -1076,8 +1077,8 @@ void hash__early_init_mmu_secondary(void)
 		if (!cpu_has_feature(CPU_FTR_ARCH_300))
 			mtspr(SPRN_SDR1, _SDR1);
 		else
-			mtspr(SPRN_PTCR,
-			      __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
+			set_ptcr_when_no_uv(__pa(partition_tb) |
+					    (PATB_SIZE_SHIFT - 12));
 	}
 
 	/* Initialize SLB */
 	slb_initialize();
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 4c8bed856533..206b43ae4000 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -12,6 +12,8 @@
 #include
 #include
 #include
+#include <asm/firmware.h>
+#include <asm/ultravisor.h>
 
 #include
 #include
 
@@ -205,25 +207,14 @@ void __init mmu_partition_table_init(void)
 	 * 64 K size.
 	 */
 	ptcr = __pa(partition_tb) | (PATB_SIZE_SHIFT - 12);
-	mtspr(SPRN_PTCR, ptcr);
+	set_ptcr_when_no_uv(ptcr);
 	powernv_set_nmmu_ptcr(ptcr);
 }
 
-void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
-				   unsigned long dw1)
+static void flush_partition(unsigned int lpid, bool radix)
 {
-	unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
-
-	partition_tb[lpid].patb0 = cpu_to_be64(dw0);
-	partition_tb[lpid].patb1 = cpu_to_be64(dw1);
-
-	/*
-	 * Global flush of TLBs and partition table caches for this lpid.
-	 * The type of flush (hash or radix) depends on what the previous
-	 * use of this partition ID was, not the new use.
-	 */
 	asm volatile("ptesync" : : : "memory");
-	if (old & PATB_HR) {
+	if (radix) {
 		asm volatile(PPC_TLBIE_5(%0,%1,2,0,1) : :
 			     "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
 		asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
@@ -237,6 +228,39 @@ void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
 	/* do we need fixup here ?*/
 	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 }
+
+void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
+				   unsigned long dw1)
+{
+	unsigned long old = be64_to_cpu(partition_tb[lpid].patb0);
+
+	/*
+	 * When ultravisor is enabled, the partition table is stored in secure
+	 * memory and can only be accessed by doing an ultravisor call.
+	 * However, we maintain a copy of the partition table in normal memory
+	 * to allow Nest MMU translations to occur (for normal VMs).
+	 *
+	 * Therefore, here we always update partition_tb, regardless of whether
+	 * we are running under an ultravisor or not.
+	 */
+	partition_tb[lpid].patb0 = cpu_to_be64(dw0);
+	partition_tb[lpid].patb1 = cpu_to_be64(dw1);
+
+	/*
+	 * If ultravisor is enabled, we do an ultravisor call to register the
+	 * partition table entry (PATE), which also does a global flush of
+	 * TLBs and partition table caches for the lpid. Otherwise, just do
+	 * the flush. The type of flush (hash or radix) depends on what the
+	 * previous use of the partition ID was, not the new use.
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 5bab67fe96e0..71b649473045 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -650,8 +651,9 @@ void radix__early_init_mmu_secondary(void)
 		lpcr = mfspr(SPRN_LPCR);
 		mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
 
-		mtspr(SPRN_PTCR,
-		      __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
+		set_ptcr_when_no_uv(__pa(partition_tb) |
+				    (PATB_SIZE_SHIFT - 12));
+
 		radix_init_amor();
 	}
 
@@ -667,7 +669,7 @@ void radix__mmu_cleanup_all(void)
 	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
 		lpcr = mfspr(SPRN_LPCR);
 		mtspr(SPRN_LPCR, lpcr & ~LPCR_UPRT);
-		mtspr(SPRN_PTCR, 0);
+		set_ptcr_when_no_uv(0);
 		powernv_set_nmmu_ptcr(0);
 		radix__flush_tlb_all();
 	}
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 69a3aefa905b..49186f0985a4 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -4,6 +4,7 @@ obj-y			+= idle.o opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
 obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
+obj-y			+= ultravisor.o
 
 obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 09f49eed7fb8..78599bca66c2 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -675,7 +675,8 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on)
 		sprs.ptcr	= mfspr(SPRN_PTCR);
 		sprs.rpr	= mfspr(SPRN_RPR);
 		sprs.tscr	= mfspr(SPRN_TSCR);
-		sprs.ldbar	= mfspr(SPRN_LDBAR);
+		if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+			sprs.ldbar = mfspr(SPRN_LDBAR);
 
 		sprs_saved = true;
 
@@ -789,7 +790,8 @@ core_woken:
 	mtspr(SPRN_MMCR0, sprs.mmcr0);
 	mtspr(SPRN_MMCR1, sprs.mmcr1);
 	mtspr(SPRN_MMCR2, sprs.mmcr2);
-	mtspr(SPRN_LDBAR, sprs.ldbar);
+	if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+		mtspr(SPRN_LDBAR, sprs.ldbar);
 
 	mtspr(SPRN_SPRG3, local_paca->sprg_vdso);
diff --git a/arch/powerpc/platforms/powernv/opal-msglog.c b/arch/powerpc/platforms/powernv/opal-msglog.c
index dc51d03c6370..d26da19a611f 100644
--- a/arch/powerpc/platforms/powernv/opal-msglog.c
+++ b/arch/powerpc/platforms/powernv/opal-msglog.c
@@ -29,23 +29,23 @@ struct memcons {
 
 static struct memcons *opal_memcons = NULL;
 
-ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count)
+ssize_t memcons_copy(struct memcons *mc, char *to, loff_t pos, size_t count)
 {
 	const char *conbuf;
 	ssize_t ret;
 	size_t first_read = 0;
 	uint32_t out_pos, avail;
 
-	if (!opal_memcons)
+	if (!mc)
 		return -ENODEV;
 
-	out_pos = be32_to_cpu(READ_ONCE(opal_memcons->out_pos));
+	out_pos = be32_to_cpu(READ_ONCE(mc->out_pos));
 
 	/* Now we've read out_pos, put a barrier in before reading the new
 	 * data it points to in conbuf. */
 	smp_rmb();
 
-	conbuf = phys_to_virt(be64_to_cpu(opal_memcons->obuf_phys));
+	conbuf = phys_to_virt(be64_to_cpu(mc->obuf_phys));
 
 	/* When the buffer has wrapped, read from the out_pos marker to the end
 	 * of the buffer, and then read the remaining data as in the un-wrapped
@@ -53,7 +53,7 @@ ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count)
 	if (out_pos & MEMCONS_OUT_POS_WRAP) {
 
 		out_pos &= MEMCONS_OUT_POS_MASK;
-		avail = be32_to_cpu(opal_memcons->obuf_size) - out_pos;
+		avail = be32_to_cpu(mc->obuf_size) - out_pos;
 
 		ret = memory_read_from_buffer(to, count, &pos,
 				conbuf + out_pos, avail);
@@ -71,7 +71,7 @@ ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count)
 	}
 
 	/* Sanity check. The firmware should not do this to us. */
-	if (out_pos > be32_to_cpu(opal_memcons->obuf_size)) {
+	if (out_pos > be32_to_cpu(mc->obuf_size)) {
 		pr_err("OPAL: memory console corruption. Aborting read.\n");
 		return -EINVAL;
 	}
@@ -86,6 +86,11 @@ out:
 	return ret;
 }
 
+ssize_t opal_msglog_copy(char *to, loff_t pos, size_t count)
+{
+	return memcons_copy(opal_memcons, to, pos, count);
+}
+
 static ssize_t opal_msglog_read(struct file *file, struct kobject *kobj,
 				struct bin_attribute *bin_attr, char *to,
 				loff_t pos, size_t count)
@@ -98,32 +103,48 @@ static struct bin_attribute opal_msglog_attr = {
 	.read = opal_msglog_read
 };
 
-void __init opal_msglog_init(void)
+struct memcons *memcons_init(struct device_node *node, const char *mc_prop_name)
 {
 	u64 mcaddr;
 	struct memcons *mc;
 
-	if (of_property_read_u64(opal_node, "ibm,opal-memcons", &mcaddr)) {
-		pr_warn("OPAL: Property ibm,opal-memcons not found, no message log\n");
-		return;
+	if (of_property_read_u64(node, mc_prop_name, &mcaddr)) {
+		pr_warn("%s property not found, no message log\n",
+			mc_prop_name);
+		goto out_err;
 	}
 
 	mc = phys_to_virt(mcaddr);
 	if (!mc) {
-		pr_warn("OPAL: memory console address is invalid\n");
-		return;
+		pr_warn("memory console address is invalid\n");
+		goto out_err;
 	}
 
 	if (be64_to_cpu(mc->magic) != MEMCONS_MAGIC) {
-		pr_warn("OPAL: memory console version is invalid\n");
+		pr_warn("memory console version is invalid\n");
+		goto out_err;
+	}
+
+	return mc;
+
+out_err:
+	return NULL;
+}
+
+u32 memcons_get_size(struct memcons *mc)
+{
+	return be32_to_cpu(mc->ibuf_size) + be32_to_cpu(mc->obuf_size);
+}
+
+void __init opal_msglog_init(void)
+{
+	opal_memcons = memcons_init(opal_node, "ibm,opal-memcons");
+	if (!opal_memcons) {
+		pr_warn("OPAL: memcons failed to load from ibm,opal-memcons\n");
 		return;
 	}
 
-	/* Report maximum size */
-	opal_msglog_attr.size = be32_to_cpu(mc->ibuf_size) +
-		be32_to_cpu(mc->obuf_size);
-
-	opal_memcons = mc;
+	opal_msglog_attr.size = memcons_get_size(opal_memcons);
 }
 
 void __init opal_msglog_sysfs_init(void)
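The wrap handling in memcons_copy() is easiest to see outside the kernel. The following stand-alone sketch (hypothetical names and simplified masks, and no seek offset handling) reads a ring buffer the same way: when the wrap flag is set, first copy from the write position to the end of the buffer, then fall through to the unwrapped case:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define OUT_POS_WRAP 0x80000000u
    #define OUT_POS_MASK 0x00ffffffu

    /* Copy up to 'count' bytes of console history into 'to'. */
    static size_t ring_copy(const char *buf, uint32_t buf_size,
                            uint32_t out_pos, char *to, size_t count)
    {
            size_t copied = 0;

            if (out_pos & OUT_POS_WRAP) {
                    /* Oldest data sits between out_pos and the end. */
                    uint32_t start = out_pos & OUT_POS_MASK;
                    uint32_t avail = buf_size - start;
                    size_t n = avail < count ? avail : count;

                    memcpy(to, buf + start, n);
                    copied += n;
                    out_pos = start; /* fall through to the linear part */
            }

            /* Newest data sits between the buffer start and out_pos. */
            uint32_t head = out_pos & OUT_POS_MASK;
            if (head > count - copied)
                    head = count - copied;
            memcpy(to + copied, buf, head);
            return copied + head;
    }

    int main(void)
    {
            char buf[8] = { 'E', 'F', 'G', 'A', 'B', 'C', 'D', '\0' };
            char out[16] = { 0 };
            /* Producer has wrapped; next write goes to offset 3. */
            size_t n = ring_copy(buf, 7, OUT_POS_WRAP | 3, out, sizeof(out));

            printf("%.*s\n", (int)n, out); /* prints ABCDEFG */
            return 0;
    }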
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index ec48ea25a674..c28d0d9b7ee0 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1939,26 +1939,12 @@ static int pnv_ioda1_tce_build(struct iommu_table *tbl, long index,
 }
 
 #ifdef CONFIG_IOMMU_API
-static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, long index,
-		unsigned long *hpa, enum dma_data_direction *direction)
+/* Common for IODA1 and IODA2 */
+static int pnv_ioda_tce_xchg_no_kill(struct iommu_table *tbl, long index,
+		unsigned long *hpa, enum dma_data_direction *direction,
+		bool realmode)
 {
-	long ret = pnv_tce_xchg(tbl, index, hpa, direction, true);
-
-	if (!ret)
-		pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, false);
-
-	return ret;
-}
-
-static int pnv_ioda1_tce_xchg_rm(struct iommu_table *tbl, long index,
-		unsigned long *hpa, enum dma_data_direction *direction)
-{
-	long ret = pnv_tce_xchg(tbl, index, hpa, direction, false);
-
-	if (!ret)
-		pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, true);
-
-	return ret;
+	return pnv_tce_xchg(tbl, index, hpa, direction, !realmode);
 }
 #endif
 
@@ -1973,8 +1959,8 @@ static void pnv_ioda1_tce_free(struct iommu_table *tbl, long index,
 
 static struct iommu_table_ops pnv_ioda1_iommu_ops = {
 	.set = pnv_ioda1_tce_build,
 #ifdef CONFIG_IOMMU_API
-	.exchange = pnv_ioda1_tce_xchg,
-	.exchange_rm = pnv_ioda1_tce_xchg_rm,
+	.xchg_no_kill = pnv_ioda_tce_xchg_no_kill,
+	.tce_kill = pnv_pci_p7ioc_tce_invalidate,
 	.useraddrptr = pnv_tce_useraddrptr,
 #endif
 	.clear = pnv_ioda1_tce_free,
@@ -2103,30 +2089,6 @@ static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
 	return ret;
 }
 
-#ifdef CONFIG_IOMMU_API
-static int pnv_ioda2_tce_xchg(struct iommu_table *tbl, long index,
-		unsigned long *hpa, enum dma_data_direction *direction)
-{
-	long ret = pnv_tce_xchg(tbl, index, hpa, direction, true);
-
-	if (!ret)
-		pnv_pci_ioda2_tce_invalidate(tbl, index, 1, false);
-
-	return ret;
-}
-
-static int pnv_ioda2_tce_xchg_rm(struct iommu_table *tbl, long index,
-		unsigned long *hpa, enum dma_data_direction *direction)
-{
-	long ret = pnv_tce_xchg(tbl, index, hpa, direction, false);
-
-	if (!ret)
-		pnv_pci_ioda2_tce_invalidate(tbl, index, 1, true);
-
-	return ret;
-}
-#endif
-
 static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
 		long npages)
 {
@@ -2138,8 +2100,8 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
 
 static struct iommu_table_ops pnv_ioda2_iommu_ops = {
 	.set = pnv_ioda2_tce_build,
 #ifdef CONFIG_IOMMU_API
-	.exchange = pnv_ioda2_tce_xchg,
-	.exchange_rm = pnv_ioda2_tce_xchg_rm,
+	.xchg_no_kill = pnv_ioda_tce_xchg_no_kill,
+	.tce_kill = pnv_pci_ioda2_tce_invalidate,
 	.useraddrptr = pnv_tce_useraddrptr,
 #endif
 	.clear = pnv_ioda2_tce_free,
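The .exchange/.exchange_rm pair is folded into one callback plus a separate invalidation hook: exchange no longer flushes, and callers batch a single tce_kill over the whole range they touched. A compilable sketch of the two new iommu_table_ops members (names from this patch; surrounding types simplified, so this is illustrative rather than the full definition in arch/powerpc/include/asm/iommu.h):

    #include <stdbool.h>

    struct iommu_table; /* opaque here */
    enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE,
                              DMA_FROM_DEVICE, DMA_NONE };

    struct iommu_table_ops_sketch {
            /* Exchange a TCE without flushing the IOMMU cache. */
            int (*xchg_no_kill)(struct iommu_table *tbl, long index,
                                unsigned long *hpa,
                                enum dma_data_direction *direction,
                                bool realmode);
            /* Invalidate a range of cached TCEs in one go. */
            void (*tce_kill)(struct iommu_table *tbl, unsigned long index,
                             unsigned long pages, bool realmode);
    };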
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index fd4a1c5a6369..1aa51c4fa904 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -30,4 +30,9 @@ extern void opal_event_shutdown(void);
 
 bool cpu_core_split_required(void);
 
+struct memcons;
+ssize_t memcons_copy(struct memcons *mc, char *to, loff_t pos, size_t count);
+u32 memcons_get_size(struct memcons *mc);
+struct memcons *memcons_init(struct device_node *node, const char *mc_prop_name);
+
 #endif /* _POWERNV_H */
diff --git a/arch/powerpc/platforms/powernv/ultravisor.c b/arch/powerpc/platforms/powernv/ultravisor.c
new file mode 100644
index 000000000000..e4a00ad06f9d
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/ultravisor.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Ultravisor high level interfaces
+ *
+ * Copyright 2019, IBM Corporation.
+ *
+ */
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include "powernv.h"
+
+static struct kobject *ultravisor_kobj;
+
+int __init early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
+					 int depth, void *data)
+{
+	if (!of_flat_dt_is_compatible(node, "ibm,ultravisor"))
+		return 0;
+
+	powerpc_firmware_features |= FW_FEATURE_ULTRAVISOR;
+	pr_debug("Ultravisor detected!\n");
+	return 1;
+}
+
+static struct memcons *uv_memcons;
+
+static ssize_t uv_msglog_read(struct file *file, struct kobject *kobj,
+			      struct bin_attribute *bin_attr, char *to,
+			      loff_t pos, size_t count)
+{
+	return memcons_copy(uv_memcons, to, pos, count);
+}
+
+static struct bin_attribute uv_msglog_attr = {
+	.attr = {.name = "msglog", .mode = 0400},
+	.read = uv_msglog_read
+};
+
+static int __init uv_init(void)
+{
+	struct device_node *node;
+
+	if (!firmware_has_feature(FW_FEATURE_ULTRAVISOR))
+		return 0;
+
+	node = of_find_compatible_node(NULL, NULL, "ibm,uv-firmware");
+	if (!node)
+		return -ENODEV;
+
+	uv_memcons = memcons_init(node, "memcons");
+	if (!uv_memcons)
+		return -ENOENT;
+
+	uv_msglog_attr.size = memcons_get_size(uv_memcons);
+
+	ultravisor_kobj = kobject_create_and_add("ultravisor", firmware_kobj);
+	if (!ultravisor_kobj)
+		return -ENOMEM;
+
+	return sysfs_create_bin_file(ultravisor_kobj, &uv_msglog_attr);
+}
+machine_subsys_initcall(powernv, uv_init);
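With uv_init() in place, an ultravisor system exposes the firmware console at /sys/firmware/ultravisor/msglog (mode 0400, so root only). A trivial user-space reader, assuming only the sysfs path created above:

    #include <stdio.h>

    int main(void)
    {
            /* Path created by uv_init() above; requires root. */
            FILE *f = fopen("/sys/firmware/ultravisor/msglog", "r");
            char buf[4096];
            size_t n;

            if (!f) {
                    perror("msglog");
                    return 1;
            }
            while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                    fwrite(buf, 1, n, stdout);
            fclose(f);
            return 0;
    }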
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 42fb03253334..df7db33ca93b 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -621,7 +621,8 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 
 #ifdef CONFIG_IOMMU_API
 static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
-				long *tce, enum dma_data_direction *direction)
+				long *tce, enum dma_data_direction *direction,
+				bool realmode)
 {
 	long rc;
 	unsigned long ioba = (unsigned long) index << tbl->it_page_shift;
@@ -649,7 +650,7 @@ static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
 struct iommu_table_ops iommu_table_lpar_multi_ops = {
 	.set = tce_buildmulti_pSeriesLP,
 #ifdef CONFIG_IOMMU_API
-	.exchange = tce_exchange_pseries,
+	.xchg_no_kill = tce_exchange_pseries,
 #endif
 	.clear = tce_freemulti_pSeriesLP,
 	.get = tce_get_pSeriesLP
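Note that iommu_table_lpar_multi_ops gains .xchg_no_kill but no .tce_kill. That is only safe if the generic helper treats .tce_kill as optional, presumably because on pseries the TCE hcalls themselves take care of invalidation. A sketch of the helper this patch relies on (added to arch/powerpc/kernel/iommu.c elsewhere in the series):

    /* Sketch: flush a range of TCEs only if the backend needs it. */
    void iommu_tce_kill(struct iommu_table *tbl,
                        unsigned long entry, unsigned long pages)
    {
            if (tbl->it_ops->tce_kill)
                    tbl->it_ops->tce_kill(tbl, entry, pages, false);
    }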
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 8ce9ad21129f..9809369e0ed3 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -435,7 +435,7 @@ static int tce_iommu_clear(struct tce_container *container,
 	unsigned long oldhpa;
 	long ret;
 	enum dma_data_direction direction;
-	unsigned long lastentry = entry + pages;
+	unsigned long lastentry = entry + pages, firstentry = entry;
 
 	for ( ; entry < lastentry; ++entry) {
 		if (tbl->it_indirect_levels && tbl->it_userspace) {
@@ -460,7 +460,7 @@ static int tce_iommu_clear(struct tce_container *container,
 		direction = DMA_NONE;
 		oldhpa = 0;
-		ret = iommu_tce_xchg(container->mm, tbl, entry, &oldhpa,
+		ret = iommu_tce_xchg_no_kill(container->mm, tbl, entry, &oldhpa,
 				&direction);
 		if (ret)
 			continue;
@@ -476,6 +476,8 @@ static int tce_iommu_clear(struct tce_container *container,
 		tce_iommu_unuse_page(container, oldhpa);
 	}
 
+	iommu_tce_kill(tbl, firstentry, pages);
+
 	return 0;
 }
 
@@ -518,8 +520,8 @@ static long tce_iommu_build(struct tce_container *container,
 		hpa |= offset;
 		dirtmp = direction;
 
-		ret = iommu_tce_xchg(container->mm, tbl, entry + i, &hpa,
-				&dirtmp);
+		ret = iommu_tce_xchg_no_kill(container->mm, tbl, entry + i,
+				&hpa, &dirtmp);
 		if (ret) {
 			tce_iommu_unuse_page(container, hpa);
 			pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
@@ -536,6 +538,8 @@ static long tce_iommu_build(struct tce_container *container,
 
 	if (ret)
 		tce_iommu_clear(container, tbl, entry, i);
+	else
+		iommu_tce_kill(tbl, entry, pages);
 
 	return ret;
 }
@@ -572,8 +576,8 @@ static long tce_iommu_build_v2(struct tce_container *container,
 		if (mm_iommu_mapped_inc(mem))
 			break;
 
-		ret = iommu_tce_xchg(container->mm, tbl, entry + i, &hpa,
-				&dirtmp);
+		ret = iommu_tce_xchg_no_kill(container->mm, tbl, entry + i,
+				&hpa, &dirtmp);
 		if (ret) {
 			/* dirtmp cannot be DMA_NONE here */
 			tce_iommu_unuse_page_v2(container, tbl, entry + i);
@@ -593,6 +597,8 @@ static long tce_iommu_build_v2(struct tce_container *container,
 
 	if (ret)
 		tce_iommu_clear(container, tbl, entry, i);
+	else
+		iommu_tce_kill(tbl, entry, pages);
 
 	return ret;
 }
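Taken together, the VFIO changes turn a map or unmap of N pages from N exchange-plus-invalidate pairs into N exchanges followed by a single range invalidation. An outline of the resulting pattern (simplified from tce_iommu_build() above; page pinning and error unwinding omitted):

    /* Sketch only: kernel types and helpers as used in this patch. */
    static long build_outline(struct tce_container *container,
                              struct iommu_table *tbl, unsigned long entry,
                              unsigned long pages,
                              enum dma_data_direction dir)
    {
            long i, ret = 0;

            for (i = 0; i < pages; ++i) {
                    unsigned long hpa = 0; /* would be the pinned page */
                    enum dma_data_direction dirtmp = dir;

                    /* Update the TCE but leave the IOMMU cache alone. */
                    ret = iommu_tce_xchg_no_kill(container->mm, tbl,
                                                 entry + i, &hpa, &dirtmp);
                    if (ret)
                            break;
            }

            if (ret)
                    tce_iommu_clear(container, tbl, entry, i); /* kills inside */
            else
                    iommu_tce_kill(tbl, entry, pages); /* one flush */

            return ret;
    }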