.. SPDX-License-Identifier: GPL-2.0

==============
Nitro Enclaves
==============

Overview
========

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM
can be separated from other applications running in the same VM. This
application then runs in a VM separate from the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup suits the needs of
low-latency applications. The resources that are allocated for the enclave, such
as memory and CPUs, are carved out of the primary VM. Each enclave is mapped to
a process running in the primary VM that communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
   VM guest that uses the provided ioctl interface of the NE driver to spawn an
   enclave VM (component 2 below).

An NE emulated PCI device is exposed to the primary VM. The driver for this
new PCI device is included in the NE driver.

The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
ioctl maps to an enclave start PCI command. The PCI device commands are then
translated into actions taken on the hypervisor side, that is, by the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
   spawned it. Memory and CPUs are carved out of the primary VM and are
   dedicated to the enclave VM. An enclave does not have persistent storage
   attached.

The memory regions carved out of the primary VM and given to an enclave need to
be physically contiguous regions of 2 MiB or 1 GiB, aligned to that size (or a
multiple of that size, e.g. 8 MiB). The memory can be allocated e.g. by using
hugetlbfs from user space [2][3]. The memory size for an enclave needs to be at
least 64 MiB. The enclave memory and CPUs need to be from the same NUMA node.

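The size and alignment rules above can be sketched as a small user space check.
This is only an illustration: the function names are hypothetical, the regions
are assumed to be given as (physical address, size) pairs, and neither physical
contiguity nor NUMA placement is verified here.

```python
MIB = 1024 * 1024
HUGE_PAGE_2M = 2 * MIB            # 1 GiB is itself a multiple of 2 MiB
MIN_ENCLAVE_MEMORY = 64 * MIB     # minimum total enclave memory from the text


def region_is_valid(phys_addr, size):
    """Return True if one region is 2 MiB aligned and a multiple of 2 MiB.

    A region that satisfies the 1 GiB rule also satisfies this check, so
    only the smaller granule is tested. Contiguity is not checked.
    """
    return size > 0 and size % HUGE_PAGE_2M == 0 and phys_addr % HUGE_PAGE_2M == 0


def enclave_memory_is_valid(regions):
    """Check every (phys_addr, size) pair and the 64 MiB minimum total."""
    regions = list(regions)
    if not all(region_is_valid(addr, size) for addr, size in regions):
        return False
    return sum(size for _addr, size in regions) >= MIN_ENCLAVE_MEMORY
```

For instance, two aligned 32 MiB regions together meet the minimum, while a
single 32 MiB region does not.
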
An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by a
user with admin capability. See the cpu list section from the kernel
documentation [4] for the CPU pool format.

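As a rough illustration of the CPU pool constraint, the sketch below parses the
basic forms of a cpu list (single CPUs and ``first-last`` ranges separated by
commas) and rejects a pool that contains CPU 0 or one of its siblings. The
function names are hypothetical, the grouped range syntax described in [4] is
not handled, and the sibling set is assumed to be supplied by the caller (on
Linux it could be read from
``/sys/devices/system/cpu/cpu0/topology/thread_siblings_list``).

```python
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a simple cpu list such as "1,3-5" into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError("invalid CPU range: " + part)
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def pool_is_valid(pool: Set[int], cpu0_siblings: Set[int]) -> bool:
    """CPU 0 and its siblings must stay with the primary VM."""
    reserved = {0} | set(cpu0_siblings)
    return bool(pool) and pool.isdisjoint(reserved)
```

With CPU 2 as the sibling of CPU 0, a pool of ``1,3`` passes this check, while
``2,3`` does not.
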
An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
virtio-mmio device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS (e.g. kernel, ramdisk, init) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol [6].

The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
Enclave Image Format (EIF), together with an EIF header that includes metadata
such as a magic number, EIF version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). These are used, for example, to check that the enclave image that
is loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of a service that NE is integrated with and that
checks the attestation document.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted. The CID of the
primary VM is 3.

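The primary VM side of this heartbeat check might look like the sketch below.
It is an illustration under assumptions rather than the actual tooling: the
function names and the timeout are hypothetical, the heartbeat is assumed to
be a single byte, and only the port (9000) and value (0xb7) come from the text
above. ``socket.AF_VSOCK`` requires Linux and Python 3.7 or later.

```python
import socket

HEARTBEAT_PORT = 9000     # predefined port from the text
HEARTBEAT_VALUE = 0xB7    # heartbeat value from the text


def heartbeat_received(conn: socket.socket) -> bool:
    """Read one byte from a connected socket and compare it to 0xb7."""
    data = conn.recv(1)
    return data == bytes([HEARTBEAT_VALUE])


def wait_for_enclave_boot(timeout: float = 30.0) -> bool:
    """Listen on the vsock port in the primary VM for the enclave heartbeat.

    Raises socket.timeout if nothing connects or sends within `timeout`.
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as listener:
        listener.settimeout(timeout)
        # Accept a connection from any CID on the heartbeat port.
        listener.bind((socket.VMADDR_CID_ANY, HEARTBEAT_PORT))
        listener.listen(1)
        conn, _peer = listener.accept()
        with conn:
            conn.settimeout(timeout)
            return heartbeat_received(conn)
```

Keeping the byte comparison in its own function allows it to be exercised with
any connected stream socket, without a vsock device.
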
If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

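A user space enclave process could wait for this notification roughly as
follows. This is a sketch under assumptions: the text above only states that a
poll mechanism is used, so reporting the exit as a hang-up event (``POLLHUP``)
on a file descriptor that represents the enclave is assumed here, and the
function name is hypothetical.

```python
import select
from typing import Optional


def wait_for_enclave_exit(enclave_fd: int, timeout_ms: Optional[int] = None) -> bool:
    """Poll a descriptor and report whether a hang-up event was seen.

    Returns False if the timeout expires first; a timeout of None blocks
    until an event arrives.
    """
    poller = select.poll()
    poller.register(enclave_fd, select.POLLHUP)
    for fd, events in poller.poll(timeout_ms):
        if fd == enclave_fd and events & select.POLLHUP:
            return True
    return False
```

Once the function returns True, the process can release its resources and
exit, as described above.
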
[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
[3] https://lwn.net/Articles/807108/
[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[5] https://man7.org/linux/man-pages/man7/vsock.7.html
[6] https://www.kernel.org/doc/html/latest/x86/boot.html