6657fca06e
All pieces of the puzzle are in place and we can now allow to boot with
CONFIG_X86_5LEVEL=y on a machine without LA57 support. Kernel will detect
that LA57 is missing and fold p4d at runtime. Update the documentation and
the Kconfig option description to reflect the change.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-10-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
62 lines
2.4 KiB
Plaintext
== Overview ==

Original x86-64 was limited by 4-level paging to 256 TiB of virtual address
space and 64 TiB of physical address space. We are already bumping into
this limit: some vendors offer servers with 64 TiB of memory today.

To overcome the limitation, upcoming hardware will introduce support for
5-level paging. It is a straightforward extension of the current page
table structure, adding one more layer of translation.

It bumps the limits to 128 PiB of virtual address space and 4 PiB of
physical address space. This "ought to be enough for anybody" ©.

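The figures above follow from the address widths. The sketch below
reproduces the arithmetic. It assumes 4 KiB pages with 9 index bits per
page table level, and the physical widths of 46 and 52 bits are assumptions
chosen to match the 64 TiB and 4 PiB figures; none of these widths are
stated in this document. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Size in bytes of an address space that is `bits` wide (bits < 64). */
uint64_t space_bytes(unsigned bits)
{
	return (uint64_t)1 << bits;
}

/*
 * Virtual address bits translated by `levels` page table levels:
 * 9 index bits per level plus a 12-bit offset into a 4 KiB page.
 * (Assumed layout, see the note above.)
 */
unsigned va_bits(unsigned levels)
{
	return 9 * levels + 12;
}
```

With these helpers, space_bytes(va_bits(4)) >> 40 is 256 (TiB) and
space_bytes(va_bits(5)) >> 50 is 128 (PiB). Likewise space_bytes(46) >> 40
is 64 (TiB) and space_bytes(52) >> 50 is 4 (PiB).
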
QEMU 2.9 and later support 5-level paging.

Virtual memory layout for 5-level paging is described in
Documentation/x86/x86_64/mm.txt

== Enabling 5-level paging ==

CONFIG_X86_5LEVEL=y enables the feature.

A kernel built with CONFIG_X86_5LEVEL=y is still able to boot on 4-level
hardware. In this case the additional page table level -- p4d -- will be
folded at runtime.

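Since the same kernel image may end up running with either 4 or 5 levels,
userspace may want to check which case it is in. One possible way, sketched
below, is to look for the la57 CPU flag in /proc/cpuinfo. This assumes the
running kernel exposes the flag under that name and only shows what the
kernel reports; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace-separated word. */
bool has_flag(const char *line, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = line;

	if (n == 0)
		return false;
	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

		if (starts && ends)
			return true;
		p += 1;		/* keep scanning after a partial match */
	}
	return false;
}

/*
 * Scan /proc/cpuinfo for a "flags" line that lists la57.
 * Returns false if the file cannot be opened.
 */
bool cpu_reports_la57(void)
{
	char buf[8192];
	bool found = false;
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (!f)
		return false;
	while (!found && fgets(buf, sizeof(buf), f)) {
		if (strncmp(buf, "flags", 5) == 0)
			found = has_flag(buf, "la57");
	}
	fclose(f);
	return found;
}
```

A false result does not prevent a CONFIG_X86_5LEVEL=y kernel from booting;
it only means the p4d level is folded as described above.
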
== User-space and large virtual address space ==

On x86, 5-level paging enables 56-bit userspace virtual address space.
Not all user space is ready to handle wide addresses. It's known that
at least some JIT compilers use higher bits in pointers to encode their
information. This collides with valid pointers under 5-level paging and
leads to crashes.

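To illustrate the collision, here is a minimal sketch of a pointer tagging
scheme of the kind described above. The layout (a 16-bit tag in bits 48..63)
and the tag value are hypothetical and not taken from any particular JIT.
An address that fits in the 47-bit window survives tagging, while an address
that uses bits 48 and up is silently corrupted.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAG_SHIFT 48	/* hypothetical: tag stored in bits 48..63 */
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

/* Pack a 16-bit tag into the upper bits of an address. */
uint64_t tag_pointer(uint64_t addr, uint16_t tag)
{
	return (addr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Recover the address by masking off the tag bits. */
uint64_t untag_pointer(uint64_t tagged)
{
	return tagged & ADDR_MASK;
}

/* True if the address survives a tag/untag round trip unchanged. */
bool survives_tagging(uint64_t addr)
{
	return untag_pointer(tag_pointer(addr, 0x7ff8)) == addr;
}
```

Here survives_tagging(0x00007fffdeadb000) is true, but
survives_tagging(0x00ff000000001000) is false: the upper address bits are
overwritten by the tag and lost on the way back.
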
To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.

But userspace can ask for allocation from the full address space by
specifying a hint address (with or without MAP_FIXED) above 47-bit.

If the hint address is set above 47-bit, but MAP_FIXED is not specified,
we try to look for an unmapped area at the specified address. If it's
already occupied, we look for an unmapped area in the *full* address
space, rather than in the 47-bit window.

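The policy above can be summarized as a choice of search ceiling. The
following is a toy model only, not the kernel's implementation: the
constants and the function name are made up for illustration, and it
ignores details such as the exact position of the top of user space.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOW_WINDOW_END	(UINT64_C(1) << 47)	/* default 47-bit window */
#define FULL_SPACE_END	(UINT64_C(1) << 56)	/* 56-bit user space */

/*
 * Upper bound of the range searched for a free area.
 * Only a hint above the 47-bit window, on a system that actually runs
 * with 5-level paging, widens the search to the full address space.
 */
uint64_t search_ceiling(uint64_t hint, bool five_level)
{
	if (five_level && hint >= LOW_WINDOW_END)
		return FULL_SPACE_END;
	return LOW_WINDOW_END;
}
```

A hint of 0, or any hint inside the 47-bit window, keeps the search within
the window, so existing applications see no change in behaviour.
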
A high hint address only affects the allocation in question, not any
future mmap()s.

Specifying a high hint address on an older kernel or on a machine without
5-level paging support is safe. The hint will be ignored and the kernel
will fall back to allocation from the 47-bit address space.

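From the application side, opting in could look like the sketch below. It
assumes a 64-bit Linux target. The hint value and the helper names are
hypothetical; any hint above the 47-bit window should have the same effect.
MAP_FIXED is deliberately not used, so the kernel is free to pick another
address.

```c
#define _GNU_SOURCE	/* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical hint: an address above the 47-bit window. */
#define HIGH_HINT ((void *)(uintptr_t)(UINT64_C(1) << 48))

/*
 * Map `len` bytes of anonymous memory, asking for the full address space.
 * Returns MAP_FAILED on error, like mmap().
 */
void *map_anywhere(size_t len)
{
	return mmap(HIGH_HINT, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* True if the address lies above the 47-bit window. */
int is_high(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(UINT64_C(1) << 47);
}
```

The caller must not assume that is_high() holds for the returned pointer:
on an older kernel or on 4-level hardware the mapping still succeeds but
lands inside the 47-bit window.
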
This approach helps an application's memory allocator become aware of the
large address space without manually tracking allocated virtual address
space.

One important case we need to handle here is interaction with MPX.
MPX (without the MAWA extension) cannot handle addresses above 47-bit, so
we need to make sure that MPX cannot be enabled if we already have a VMA
above the boundary, and forbid creating such VMAs once MPX is enabled.