251a7b3edc
x86_64 vmalloc() mappings are no longer "synchronized" among page tables via
faulting since commit 6eb82f9940 ("x86/mm: Pre-allocate P4D/PUD pages for
vmalloc area"), since the corresponding P4D or PUD pages are now preallocated
at boot, by preallocate_vmalloc_pages(). Drop the "lazily synchronized"
description for less confusion.

While this file is x86_64-specific, it is worth noting that things are
different for x86_32, where vmalloc()-related changes to `init_mm.pgd` are
synchronized to all page tables in the system during runtime, via
arch_sync_kernel_mappings(). Unfortunately, this synchronization is subject to
race condition, which is further handled via faulting, see vmalloc_fault().
See commit 4819e15f74 ("x86/mm/32: Bring back vmalloc faulting on x86_32") for
more details.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210818220123.2623-1-yepeilin.cs@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

158 lines
11 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0

=================
Memory Management
=================

Complete virtual memory map with 4-level page tables
====================================================

.. note::

 - Negative addresses such as "-23 TB" are absolute addresses in bytes, counted down
   from the top of the 64-bit address space. It's easier to understand the layout
   when seen both in absolute addresses and in distance-from-top notation.

   For example, 0xffffe90000000000 == -23 TB: it is 23 TB lower than the top of the
   64-bit address space (ffffffffffffffff).

   Note that as we get closer to the top of the address space, the notation changes
   from TB to GB and then MB/KB.

 - "16M TB" might look weird at first sight, but it's an easier way to visualize size
   notation than "16 EB", which few will recognize at first sight as 16 exabytes.
   It also shows nicely how incredibly large the 64-bit address space is.

::

  ========================================================================================================================
      Start addr    |   Offset   |     End addr     |  Size   | VM area description
  ========================================================================================================================
                    |            |                  |         |
   0000000000000000 |          0 | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
  __________________|____________|__________________|_________|___________________________________________________________
                    |            |                  |         |
   0000800000000000 |    +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
                    |            |                  |         |     virtual memory addresses up to the -128 TB
                    |            |                  |         |     starting offset of kernel mappings.
  __________________|____________|__________________|_________|___________________________________________________________
                                                              |
                                                              | Kernel-space virtual memory, shared between all processes:
  ____________________________________________________________|___________________________________________________________
                    |            |                  |         |
   ffff800000000000 |    -128 TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
   ffff880000000000 |    -120 TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
   ffff888000000000 |  -119.5 TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
   ffffc88000000000 |   -55.5 TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
   ffffc90000000000 |     -55 TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
   ffffe90000000000 |     -23 TB | ffffe9ffffffffff |    1 TB | ... unused hole
   ffffea0000000000 |     -22 TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
   ffffeb0000000000 |     -21 TB | ffffebffffffffff |    1 TB | ... unused hole
   ffffec0000000000 |     -20 TB | fffffbffffffffff |   16 TB | KASAN shadow memory
  __________________|____________|__________________|_________|____________________________________________________________
                                                              |
                                                              | Identical layout to the 56-bit one from here on:
  ____________________________________________________________|____________________________________________________________
                    |            |                  |         |
   fffffc0000000000 |      -4 TB | fffffdffffffffff |    2 TB | ... unused hole
                    |            |                  |         | vaddr_end for KASLR
   fffffe0000000000 |      -2 TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
   fffffe8000000000 |    -1.5 TB | fffffeffffffffff |  0.5 TB | ... unused hole
   ffffff0000000000 |      -1 TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
   ffffff8000000000 |    -512 GB | ffffffeeffffffff |  444 GB | ... unused hole
   ffffffef00000000 |     -68 GB | fffffffeffffffff |   64 GB | EFI region mapping space
   ffffffff00000000 |      -4 GB | ffffffff7fffffff |    2 GB | ... unused hole
   ffffffff80000000 |      -2 GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
   ffffffff80000000 |   -2048 MB |                  |         |
   ffffffffa0000000 |   -1536 MB | fffffffffeffffff | 1520 MB | module mapping space
   ffffffffff000000 |     -16 MB |                  |         |
      FIXADDR_START |    ~-11 MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
   ffffffffff600000 |     -10 MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
   ffffffffffe00000 |      -2 MB | ffffffffffffffff |    2 MB | ... unused hole
  __________________|____________|__________________|_________|___________________________________________________________


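The "Offset" and "Size" columns of the 4-level table can be re-derived from the
start and end addresses with plain integer arithmetic. The following is only a
minimal sketch for checking rows by hand, not kernel code; the helper names
``offset_from_top`` and ``region_size`` are illustrative assumptions.

```python
# Illustrative helpers; these names are not kernel identifiers.
TOP = 1 << 64  # one past the highest 64-bit address (ffffffffffffffff + 1)

UNITS = {"kB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40, "PB": 1 << 50}


def offset_from_top(addr: int, unit: str) -> float:
    """Return addr as a (negative) distance from the top of the address space."""
    return (addr - TOP) / UNITS[unit]


def region_size(start: int, end: int, unit: str) -> float:
    """Return the size of the inclusive range [start, end] in the given unit."""
    return (end - start + 1) / UNITS[unit]


# The example from the note: 0xffffe90000000000 == -23 TB.
print(offset_from_top(0xffffe90000000000, "TB"))                  # -23.0
# The vmalloc/ioremap row: starts at -55 TB and spans 32 TB.
print(offset_from_top(0xffffc90000000000, "TB"))                  # -55.0
print(region_size(0xffffc90000000000, 0xffffe8ffffffffff, "TB"))  # 32.0
```

The same helpers work for the GB and MB rows near the top of the address space
by passing a different unit.
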
Complete virtual memory map with 5-level page tables
====================================================

.. note::

 - With 56-bit addresses, user-space memory gets expanded by a factor of 512x,
   from 0.125 PB to 64 PB. All kernel mappings shift down to the -64 PB starting
   offset and many of the regions expand to support the much larger physical
   memory supported.

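The 512x figure follows directly from the widths of the user-space halves:
47 bits with 4-level paging and 56 bits with 5-level paging. A short sketch of
that arithmetic (the variable names are illustrative, not kernel symbols):

```python
PB = 1 << 50

user_4level = 1 << 47  # user-space size with 4-level page tables (128 TB)
user_5level = 1 << 56  # user-space size with 5-level page tables

print(user_4level / PB)            # 0.125
print(user_5level / PB)            # 64.0
print(user_5level // user_4level)  # 512

# Kernel mappings start 64 PB below the top of the 64-bit address space.
kernel_start_5level = (1 << 64) - 64 * PB
print(hex(kernel_start_5level))    # 0xff00000000000000
```
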
::

  ========================================================================================================================
      Start addr    |   Offset   |     End addr     |  Size   | VM area description
  ========================================================================================================================
                    |            |                  |         |
   0000000000000000 |          0 | 00ffffffffffffff |   64 PB | user-space virtual memory, different per mm
  __________________|____________|__________________|_________|___________________________________________________________
                    |            |                  |         |
   0100000000000000 |     +64 PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
                    |            |                  |         |     virtual memory addresses up to the -64 PB
                    |            |                  |         |     starting offset of kernel mappings.
  __________________|____________|__________________|_________|___________________________________________________________
                                                              |
                                                              | Kernel-space virtual memory, shared between all processes:
  ____________________________________________________________|___________________________________________________________
                    |            |                  |         |
   ff00000000000000 |     -64 PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
   ff10000000000000 |     -60 PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
   ff11000000000000 |  -59.75 PB | ff90ffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
   ff91000000000000 |  -27.75 PB | ff9fffffffffffff | 3.75 PB | ... unused hole
   ffa0000000000000 |     -24 PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
   ffd2000000000000 |   -11.5 PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
   ffd4000000000000 |     -11 PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
   ffd6000000000000 |   -10.5 PB | ffdeffffffffffff | 2.25 PB | ... unused hole
   ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
  __________________|____________|__________________|_________|____________________________________________________________
                                                              |
                                                              | Identical layout to the 47-bit one from here on:
  ____________________________________________________________|____________________________________________________________
                    |            |                  |         |
   fffffc0000000000 |      -4 TB | fffffdffffffffff |    2 TB | ... unused hole
                    |            |                  |         | vaddr_end for KASLR
   fffffe0000000000 |      -2 TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
   fffffe8000000000 |    -1.5 TB | fffffeffffffffff |  0.5 TB | ... unused hole
   ffffff0000000000 |      -1 TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
   ffffff8000000000 |    -512 GB | ffffffeeffffffff |  444 GB | ... unused hole
   ffffffef00000000 |     -68 GB | fffffffeffffffff |   64 GB | EFI region mapping space
   ffffffff00000000 |      -4 GB | ffffffff7fffffff |    2 GB | ... unused hole
   ffffffff80000000 |      -2 GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
   ffffffff80000000 |   -2048 MB |                  |         |
   ffffffffa0000000 |   -1536 MB | fffffffffeffffff | 1520 MB | module mapping space
   ffffffffff000000 |     -16 MB |                  |         |
      FIXADDR_START |    ~-11 MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
   ffffffffff600000 |     -10 MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
   ffffffffffe00000 |      -2 MB | ffffffffffffffff |    2 MB | ... unused hole
  __________________|____________|__________________|_________|___________________________________________________________

The architecture defines a 64-bit virtual address. Implementations can support
less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
through to the most-significant implemented bit are sign extended.
This causes a hole between user space and kernel addresses if you interpret them
as unsigned.

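The sign-extension rule can be expressed as a small predicate: an address is
canonical when bits 63 down to the most-significant implemented bit are all
equal. The sketch below is an illustration only; ``is_canonical`` is a
hypothetical helper name, not a kernel function.

```python
def is_canonical(addr: int, va_bits: int) -> bool:
    """True if bits 63 .. (va_bits - 1) of addr are all zero or all one."""
    upper = addr >> (va_bits - 1)             # the bits that must agree
    all_ones = (1 << (64 - va_bits + 1)) - 1  # same field with every bit set
    return upper == 0 or upper == all_ones


# 48-bit virtual addresses (4-level page tables)
print(is_canonical(0x00007fffffffffff, 48))  # True:  last user-space address
print(is_canonical(0x0000800000000000, 48))  # False: first address of the hole
print(is_canonical(0xffff800000000000, 48))  # True:  first kernel address

# 57-bit virtual addresses (5-level page tables)
print(is_canonical(0xff00000000000000, 57))  # True
print(is_canonical(0xff00000000000000, 48))  # False: inside the 4-level hole
```

Read as unsigned numbers, the non-canonical addresses form the single large hole
shown in both tables.
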
The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).

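Because the direct mapping is linear, translating between a physical address and
its direct-map virtual address is a single addition or subtraction. The sketch
below assumes the non-randomized 4-level base and 64 TB window listed in the
table; with CONFIG_RANDOMIZE_MEMORY the real base is chosen at boot, so the
constants and helper names here are illustrative assumptions only.

```python
# Assumed values taken from the 4-level table (no CONFIG_RANDOMIZE_MEMORY).
PAGE_OFFSET_BASE_L4 = 0xffff888000000000
DIRECT_MAP_SIZE_L4 = 64 << 40  # 64 TB window


def phys_to_virt(phys: int) -> int:
    """Map a physical address to its address in the direct mapping."""
    if not 0 <= phys < DIRECT_MAP_SIZE_L4:
        raise ValueError("physical address outside the direct mapping window")
    return PAGE_OFFSET_BASE_L4 + phys


def virt_to_phys(virt: int) -> int:
    """Inverse of phys_to_virt() for addresses inside the direct mapping."""
    phys = virt - PAGE_OFFSET_BASE_L4
    if not 0 <= phys < DIRECT_MAP_SIZE_L4:
        raise ValueError("virtual address outside the direct mapping window")
    return phys


print(hex(phys_to_virt(0x100000)))                # 0xffff888000100000
print(hex(phys_to_virt(DIRECT_MAP_SIZE_L4 - 1)))  # 0xffffc87fffffffff
print(hex(virt_to_phys(0xffff888000100000)))      # 0x100000
```
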
We map EFI runtime services in the 'efi_pgd' PGD in a 64 GB large virtual
memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.

Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
physical memory, vmalloc/ioremap space and virtual memory map are randomized.
Their order is preserved but their base will be offset early at boot time.

Be very careful vs. KASLR when changing anything here. The KASLR address
range must not overlap with anything except the KASAN shadow area, which is
correct as KASAN disables KASLR.

For both 4- and 5-level layouts, the STACKLEAK_POISON value is in the last 2 MB
hole: ffffffffffff4111

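That the poison value falls inside the final unused hole can be checked against
the last row of either table. A minimal sketch, with constant names chosen for
illustration rather than taken from kernel headers:

```python
HOLE_START = 0xffffffffffe00000  # start of the last 2 MB unused hole
HOLE_END = 0xffffffffffffffff    # top of the 64-bit address space
POISON = 0xffffffffffff4111      # STACKLEAK_POISON value quoted above

print(HOLE_START <= POISON <= HOLE_END)  # True

# Distance of the poison value from the top of the address space, in bytes.
print(hex((1 << 64) - POISON))           # 0xbeef
```

Since the hole is never mapped, the poison value cannot be mistaken for a valid
kernel pointer.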