linux/Documentation/x86/x86_64/mm.txt


<previous description obsolete, deleted>

Virtual memory map with 4 level page tables:

0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffec0000000000 - fffffc0000000000 (=44 bits) kasan shadow memory (16TB)
... unused hole ...
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffef00000000 - ffffffff00000000 (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole

The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).

vmalloc space is lazily synchronized into the different PML4 pages of
the processes using the page fault handler, with init_level4_pgt as
reference.

Current X86-64 implementations only support 40 bits of address space,
but we support up to 46 bits. This expands into MBZ space in the page tables.

We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual
memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.

-Andi Kleen, Jul 2004
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-16 22:20:36 +00:00
			`<previous description obsolete, deleted>`

			`Virtual memory map with 4 level page tables:`

[PATCH] x86-64: cleanup Doc/x86_64/ files Fix typos. Lots of whitespace changes for readability and consistency. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andi Kleen <ak@suse.de> 2007-02-13 12:26:23 +00:00			`0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm`
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-16 22:20:36 +00:00			`hole caused by [48:63] sign extension`
x86/mm: Update memory map description to list hypervisor-reserved area Peter Anvin says: > 0xffff880000000000 is the lowest usable address because we have > agreed to leave 0xffff800000000000-0xffff880000000000 for the > hypervisor or other non-OS uses. Let's call this out in the documentation. This came up during the kernel address sanitizer discussions where it was proposed to use this area for other kernel things. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20140918195606.841389D2@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> 2014-09-18 19:56:06 +00:00			`ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor`
x86: fix typo in address space documentation Fix a trivial typo in Documentation/x86/x86_64/mm.txt. [ Impact: documentation only ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Rik van Riel <riel@redhat.com> 2009-05-06 02:07:07 +00:00			`ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory`
x86: 46 bit physical address support on 64 bits Extend the maximum addressable memory on x86-64 from 2^44 to 2^46 bytes. This requires some shuffling around of the vmalloc and virtual memmap memory areas, to keep them away from the direct mapping of up to 64TB of physical memory. This patch also introduces a guard hole between the vmalloc area and the virtual memory map space. There's really no good reason why we wouldn't have a guard hole there. [ Impact: future hardware enablement ] Signed-off-by: Rik van Riel <riel@redhat.com> LKML-Reference: <20090505172856.6820db22@cuia.bos.redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> 2009-05-05 21:28:56 +00:00			`ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole`
			`ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space`
			`ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole`
			`ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)`
x86_64: add KASan support This patch adds arch specific code for kernel address sanitizer. 16TB of virtual addressed used for shadow memory. It's located in range [ffffec0000000000 - fffffc0000000000] between vmemmap and %esp fixup stacks. At early stage we map whole shadow region with zero page. Latter, after pages mapped to direct mapping address range we unmap zero pages from corresponding shadow (see kasan_map_shadow()) and allocate and map a real shadow memory reusing vmemmap_populate() function. Also replace __pa with __pa_nodebug before shadow initialized. __pa with CONFIG_DEBUG_VIRTUAL=y make external function call (__phys_addr) __phys_addr is instrumented, so __asan_load could be called before shadow area initialized. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Jim Davis <jim.epost@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2015-02-13 22:39:25 +00:00			`... unused hole ...`
			`ffffec0000000000 - fffffc0000000000 (=44 bits) kasan shadow memory (16TB)`
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-16 22:20:36 +00:00			`... unused hole ...`
x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # consider after upstream merge 2014-04-29 23:46:09 +00:00			`ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks`
			`... unused hole ...`
Documentation/x86: Update EFI memory region description Make it clear that the EFI page tables are only available during EFI runtime calls since that subject has come up a fair numbers of times in the past. Additionally, add the EFI region start and end addresses to the table so that it's possible to see at a glance where they fall in relation to other regions. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1448658575-17029-7-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> 2015-11-27 21:09:35 +00:00			`ffffffef00000000 - ffffffff00000000 (=64 GB) EFI region mapping space`
			`... unused hole ...`
x86_64: fix mm.txt documentation Commit 85eb69a16aab5a394ce043c2131319eae35e6493 introduced 512 MiB sized kernel and 1.5 GiB sized module space but omitted to change documentation properly. Fix that. [Wasn't the hole intentional protection hole?] Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: hpa@zytor.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 2008-05-12 13:43:37 +00:00			`ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0`
x86-64, docs, mm: Add vsyscall range to virtual address space layout Add the end of the virtual address space to its layout documentation. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1362428180-8865-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> 2013-03-04 20:16:18 +00:00			`ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space`
			`ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls`
			`ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole`
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-16 22:20:36 +00:00
[PATCH] x86-64: cleanup Doc/x86_64/ files Fix typos. Lots of whitespace changes for readability and consistency. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andi Kleen <ak@suse.de> 2007-02-13 12:26:23 +00:00			`The direct mapping covers all memory in the system up to the highest`
[PATCH] x86_64: Some clarifications for Documention/x86_64/mm.txt I got some questions on this, so just fix up the documentation. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-11-05 16:25:54 +00:00			`memory address (this means in some cases it can also include PCI memory`
[PATCH] x86-64: cleanup Doc/x86_64/ files Fix typos. Lots of whitespace changes for readability and consistency. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andi Kleen <ak@suse.de> 2007-02-13 12:26:23 +00:00			`holes).`
[PATCH] x86_64: Some clarifications for Documention/x86_64/mm.txt I got some questions on this, so just fix up the documentation. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org> 2005-11-05 16:25:54 +00:00
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-16 22:20:36 +00:00			`vmalloc space is lazily synchronized into the different PML4 pages of`
			`the processes using the page fault handler, with init_level4_pgt as`
			`reference.`

[PATCH] x86-64: cleanup Doc/x86_64/ files Fix typos. Lots of whitespace changes for readability and consistency. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andi Kleen <ak@suse.de> 2007-02-13 12:26:23 +00:00			`Current X86-64 implementations only support 40 bits of address space,`
			`but we support up to 46 bits. This expands into MBZ space in the page tables.`
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-16 22:20:36 +00:00
Documentation/x86: Update EFI memory region description Make it clear that the EFI page tables are only available during EFI runtime calls since that subject has come up a fair numbers of times in the past. Additionally, add the EFI region start and end addresses to the table so that it's possible to see at a glance where they fall in relation to other regions. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1448658575-17029-7-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> 2015-11-27 21:09:35 +00:00			`We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual`
			`memory window (this size is arbitrary, it can be raised later if needed).`
			`The mappings are not part of any other kernel PGD and are only available`
			`during EFI runtime calls.`
x86/efi: Runtime services virtual mapping We map the EFI regions needed for runtime services non-contiguously, with preserved alignment on virtual addresses starting from -4G down for a total max space of 64G. This way, we provide for stable runtime services addresses across kernels so that a kexec'd kernel can still use them. Thus, they're mapped in a separate pagetable so that we don't pollute the kernel namespace. Add an efi= kernel command line parameter for passing miscellaneous options and chicken bits from the command line. While at it, add a chicken bit called "efi=old_map" which can be used as a fallback to the old runtime services mapping method in case there's some b0rkage with a particular EFI implementation (haha, it is hard to hold up the sarcasm here...). Also, add the UEFI RT VA space to Documentation/x86/x86_64/mm.txt. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com> 2013-10-31 16:25:08 +00:00
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip! 2005-04-16 22:20:36 +00:00			`-Andi Kleen, Jul 2004`