974f221c84
This change makes later calculations about where the kernel is located easier to reason about. To better understand this change, we must first clarify what 'VO' and 'ZO' are. These values were introduced in commits by hpa:77d1a49995
("x86, boot: make symbols from the main vmlinux available")37ba7ab5e3
("x86, boot: make kernel_alignment adjustable; new bzImage fields") Specifically: All names prefixed with 'VO_': - relate to the uncompressed kernel image - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define) All names prefixed with 'ZO_': - relate to the bootable compressed kernel image (boot/compressed/vmlinux), which is composed of the following memory areas: - head text - compressed kernel (VO image and relocs table) - decompressor code - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below) The 'INIT_SIZE' value is used to find the larger of the two image sizes: #define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset) #define VO_INIT_SIZE (VO__end - VO__text) #if ZO_INIT_SIZE > VO_INIT_SIZE # define INIT_SIZE ZO_INIT_SIZE #else # define INIT_SIZE VO_INIT_SIZE #endif The current code uses extract_offset to decide where to position the copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE currently includes the extract_offset.) Why does z_extract_offset exist? It's needed because we are trying to minimize the amount of RAM used for the whole act of creating an uncompressed, executable, properly relocation-linked kernel image in system memory. We do this so that kernels can be booted on even very small systems. To achieve the goal of minimal memory consumption we have implemented an in-place decompression strategy: instead of cleanly separating the VO and ZO images and also allocating some memory for the decompression code's runtime needs, we instead create this elaborate layout of memory buffers where the output (decompressed) stream, as it progresses, overlaps with and destroys the input (compressed) stream. This can only be done safely if the ZO image is placed to the end of the VO range, plus a certain amount of safety distance to make sure that when the last bytes of the VO range are decompressed, the compressed stream pointer is safely beyond the end of the VO range. z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during the build process, at a point when we know the exact compressed and uncompressed size of the kernel images and can calculate this safe minimum offset value. (Note that the mkpiggy.c calculation is not perfect, because we don't know the decompressor used at that stage, so the z_extract_offset calculation is necessarily imprecise and is mostly based on gzip internals - we'll improve that in the next patch.) When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible), the copied ZO occupies the memory from extract_offset to the end of decompression buffer. It overlaps with the soon-to-be-uncompressed kernel like this: |-----compressed kernel image------| V V 0 extract_offset +INIT_SIZE |-----------|---------------|-------------------------|--------| | | | | VO__text startup_32 of ZO VO__end ZO__end ^ ^ |-------uncompressed kernel image---------| When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space left from end of ZO to the end of decompressing buffer, like below. |-compressed kernel image-| V V 0 extract_offset +INIT_SIZE |-----------|---------------|-------------------------|--------| | | | | VO__text startup_32 of ZO ZO__end VO__end ^ ^ |------------uncompressed kernel image-------------| To simplify calculations and avoid special cases, it is cleaner to always place the compressed kernel image in memory so that ZO__end is at the end of the decompression buffer, instead of placing t at the start of extract_offset as is currently done. This patch adds BP_init_size (which is the INIT_SIZE as passed in from the boot_params) into asm-offsets.c to make it visible to the assembly code. Then when moving the ZO, it calculates the starting position of the copied ZO (via BP_init_size and the ZO run size) so that the VO__end will be at the end of the decompression buffer. To make the position calculation safe, the end of ZO is page aligned (and a comment is added to the existing VO alignment for good measure). Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Rewrote changelog and comments. ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: lasse.collin@tukaani.org Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org [ Rewrote the changelog some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
493 lines
10 KiB
ArmAsm
493 lines
10 KiB
ArmAsm
/*
|
|
* linux/boot/head.S
|
|
*
|
|
* Copyright (C) 1991, 1992, 1993 Linus Torvalds
|
|
*/
|
|
|
|
/*
|
|
* head.S contains the 32-bit startup code.
|
|
*
|
|
* NOTE!!! Startup happens at absolute address 0x00001000, which is also where
|
|
* the page directory will exist. The startup code will be overwritten by
|
|
* the page directory. [According to comments etc elsewhere on a compressed
|
|
* kernel it will end up at 0x1000 + 1Mb I hope so as I assume this. - AC]
|
|
*
|
|
* Page 0 is deliberately kept safe, since System Management Mode code in
|
|
* laptops may need to access the BIOS data stored there. This is also
|
|
* useful for future device drivers that either access the BIOS via VM86
|
|
* mode.
|
|
*/
|
|
|
|
/*
|
|
* High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
|
|
*/
|
|
.code32
|
|
.text
|
|
|
|
#include <linux/init.h>
|
|
#include <linux/linkage.h>
|
|
#include <asm/segment.h>
|
|
#include <asm/boot.h>
|
|
#include <asm/msr.h>
|
|
#include <asm/processor-flags.h>
|
|
#include <asm/asm-offsets.h>
|
|
#include <asm/bootparam.h>
|
|
|
|
/*
|
|
* Locally defined symbols should be marked hidden:
|
|
*/
|
|
.hidden _bss
|
|
.hidden _ebss
|
|
.hidden _got
|
|
.hidden _egot
|
|
|
|
__HEAD
|
|
.code32
|
|
ENTRY(startup_32)
|
|
/*
|
|
* 32bit entry is 0 and it is ABI so immutable!
|
|
* If we come here directly from a bootloader,
|
|
* kernel(text+data+bss+brk) ramdisk, zero_page, command line
|
|
* all need to be under the 4G limit.
|
|
*/
|
|
cld
|
|
/*
|
|
* Test KEEP_SEGMENTS flag to see if the bootloader is asking
|
|
* us to not reload segments
|
|
*/
|
|
testb $KEEP_SEGMENTS, BP_loadflags(%esi)
|
|
jnz 1f
|
|
|
|
cli
|
|
movl $(__BOOT_DS), %eax
|
|
movl %eax, %ds
|
|
movl %eax, %es
|
|
movl %eax, %ss
|
|
1:
|
|
|
|
/*
|
|
* Calculate the delta between where we were compiled to run
|
|
* at and where we were actually loaded at. This can only be done
|
|
* with a short local call on x86. Nothing else will tell us what
|
|
* address we are running at. The reserved chunk of the real-mode
|
|
* data at 0x1e4 (defined as a scratch field) are used as the stack
|
|
* for this calculation. Only 4 bytes are needed.
|
|
*/
|
|
leal (BP_scratch+4)(%esi), %esp
|
|
call 1f
|
|
1: popl %ebp
|
|
subl $1b, %ebp
|
|
|
|
/* setup a stack and make sure cpu supports long mode. */
|
|
movl $boot_stack_end, %eax
|
|
addl %ebp, %eax
|
|
movl %eax, %esp
|
|
|
|
call verify_cpu
|
|
testl %eax, %eax
|
|
jnz no_longmode
|
|
|
|
/*
|
|
* Compute the delta between where we were compiled to run at
|
|
* and where the code will actually run at.
|
|
*
|
|
* %ebp contains the address we are loaded at by the boot loader and %ebx
|
|
* contains the address where we should move the kernel image temporarily
|
|
* for safe in-place decompression.
|
|
*/
|
|
|
|
#ifdef CONFIG_RELOCATABLE
|
|
movl %ebp, %ebx
|
|
movl BP_kernel_alignment(%esi), %eax
|
|
decl %eax
|
|
addl %eax, %ebx
|
|
notl %eax
|
|
andl %eax, %ebx
|
|
cmpl $LOAD_PHYSICAL_ADDR, %ebx
|
|
jge 1f
|
|
#endif
|
|
movl $LOAD_PHYSICAL_ADDR, %ebx
|
|
1:
|
|
|
|
/* Target address to relocate to for decompression */
|
|
movl BP_init_size(%esi), %eax
|
|
subl $_end, %eax
|
|
addl %eax, %ebx
|
|
|
|
/*
|
|
* Prepare for entering 64 bit mode
|
|
*/
|
|
|
|
/* Load new GDT with the 64bit segments using 32bit descriptor */
|
|
leal gdt(%ebp), %eax
|
|
movl %eax, gdt+2(%ebp)
|
|
lgdt gdt(%ebp)
|
|
|
|
/* Enable PAE mode */
|
|
movl %cr4, %eax
|
|
orl $X86_CR4_PAE, %eax
|
|
movl %eax, %cr4
|
|
|
|
/*
|
|
* Build early 4G boot pagetable
|
|
*/
|
|
/* Initialize Page tables to 0 */
|
|
leal pgtable(%ebx), %edi
|
|
xorl %eax, %eax
|
|
movl $((4096*6)/4), %ecx
|
|
rep stosl
|
|
|
|
/* Build Level 4 */
|
|
leal pgtable + 0(%ebx), %edi
|
|
leal 0x1007 (%edi), %eax
|
|
movl %eax, 0(%edi)
|
|
|
|
/* Build Level 3 */
|
|
leal pgtable + 0x1000(%ebx), %edi
|
|
leal 0x1007(%edi), %eax
|
|
movl $4, %ecx
|
|
1: movl %eax, 0x00(%edi)
|
|
addl $0x00001000, %eax
|
|
addl $8, %edi
|
|
decl %ecx
|
|
jnz 1b
|
|
|
|
/* Build Level 2 */
|
|
leal pgtable + 0x2000(%ebx), %edi
|
|
movl $0x00000183, %eax
|
|
movl $2048, %ecx
|
|
1: movl %eax, 0(%edi)
|
|
addl $0x00200000, %eax
|
|
addl $8, %edi
|
|
decl %ecx
|
|
jnz 1b
|
|
|
|
/* Enable the boot page tables */
|
|
leal pgtable(%ebx), %eax
|
|
movl %eax, %cr3
|
|
|
|
/* Enable Long mode in EFER (Extended Feature Enable Register) */
|
|
movl $MSR_EFER, %ecx
|
|
rdmsr
|
|
btsl $_EFER_LME, %eax
|
|
wrmsr
|
|
|
|
/* After gdt is loaded */
|
|
xorl %eax, %eax
|
|
lldt %ax
|
|
movl $__BOOT_TSS, %eax
|
|
ltr %ax
|
|
|
|
/*
|
|
* Setup for the jump to 64bit mode
|
|
*
|
|
* When the jump is performend we will be in long mode but
|
|
* in 32bit compatibility mode with EFER.LME = 1, CS.L = 0, CS.D = 1
|
|
* (and in turn EFER.LMA = 1). To jump into 64bit mode we use
|
|
* the new gdt/idt that has __KERNEL_CS with CS.L = 1.
|
|
* We place all of the values on our mini stack so lret can
|
|
* used to perform that far jump.
|
|
*/
|
|
pushl $__KERNEL_CS
|
|
leal startup_64(%ebp), %eax
|
|
#ifdef CONFIG_EFI_MIXED
|
|
movl efi32_config(%ebp), %ebx
|
|
cmp $0, %ebx
|
|
jz 1f
|
|
leal handover_entry(%ebp), %eax
|
|
1:
|
|
#endif
|
|
pushl %eax
|
|
|
|
/* Enter paged protected Mode, activating Long Mode */
|
|
movl $(X86_CR0_PG | X86_CR0_PE), %eax /* Enable Paging and Protected mode */
|
|
movl %eax, %cr0
|
|
|
|
/* Jump from 32bit compatibility mode into 64bit mode. */
|
|
lret
|
|
ENDPROC(startup_32)
|
|
|
|
#ifdef CONFIG_EFI_MIXED
|
|
.org 0x190
|
|
ENTRY(efi32_stub_entry)
|
|
add $0x4, %esp /* Discard return address */
|
|
popl %ecx
|
|
popl %edx
|
|
popl %esi
|
|
|
|
leal (BP_scratch+4)(%esi), %esp
|
|
call 1f
|
|
1: pop %ebp
|
|
subl $1b, %ebp
|
|
|
|
movl %ecx, efi32_config(%ebp)
|
|
movl %edx, efi32_config+8(%ebp)
|
|
sgdtl efi32_boot_gdt(%ebp)
|
|
|
|
leal efi32_config(%ebp), %eax
|
|
movl %eax, efi_config(%ebp)
|
|
|
|
jmp startup_32
|
|
ENDPROC(efi32_stub_entry)
|
|
#endif
|
|
|
|
.code64
|
|
.org 0x200
|
|
ENTRY(startup_64)
|
|
/*
|
|
* 64bit entry is 0x200 and it is ABI so immutable!
|
|
* We come here either from startup_32 or directly from a
|
|
* 64bit bootloader.
|
|
* If we come here from a bootloader, kernel(text+data+bss+brk),
|
|
* ramdisk, zero_page, command line could be above 4G.
|
|
* We depend on an identity mapped page table being provided
|
|
* that maps our entire kernel(text+data+bss+brk), zero page
|
|
* and command line.
|
|
*/
|
|
#ifdef CONFIG_EFI_STUB
|
|
/*
|
|
* The entry point for the PE/COFF executable is efi_pe_entry, so
|
|
* only legacy boot loaders will execute this jmp.
|
|
*/
|
|
jmp preferred_addr
|
|
|
|
ENTRY(efi_pe_entry)
|
|
movq %rcx, efi64_config(%rip) /* Handle */
|
|
movq %rdx, efi64_config+8(%rip) /* EFI System table pointer */
|
|
|
|
leaq efi64_config(%rip), %rax
|
|
movq %rax, efi_config(%rip)
|
|
|
|
call 1f
|
|
1: popq %rbp
|
|
subq $1b, %rbp
|
|
|
|
/*
|
|
* Relocate efi_config->call().
|
|
*/
|
|
addq %rbp, efi64_config+88(%rip)
|
|
|
|
movq %rax, %rdi
|
|
call make_boot_params
|
|
cmpq $0,%rax
|
|
je fail
|
|
mov %rax, %rsi
|
|
leaq startup_32(%rip), %rax
|
|
movl %eax, BP_code32_start(%rsi)
|
|
jmp 2f /* Skip the relocation */
|
|
|
|
handover_entry:
|
|
call 1f
|
|
1: popq %rbp
|
|
subq $1b, %rbp
|
|
|
|
/*
|
|
* Relocate efi_config->call().
|
|
*/
|
|
movq efi_config(%rip), %rax
|
|
addq %rbp, 88(%rax)
|
|
2:
|
|
movq efi_config(%rip), %rdi
|
|
call efi_main
|
|
movq %rax,%rsi
|
|
cmpq $0,%rax
|
|
jne 2f
|
|
fail:
|
|
/* EFI init failed, so hang. */
|
|
hlt
|
|
jmp fail
|
|
2:
|
|
movl BP_code32_start(%esi), %eax
|
|
leaq preferred_addr(%rax), %rax
|
|
jmp *%rax
|
|
|
|
preferred_addr:
|
|
#endif
|
|
|
|
/* Setup data segments. */
|
|
xorl %eax, %eax
|
|
movl %eax, %ds
|
|
movl %eax, %es
|
|
movl %eax, %ss
|
|
movl %eax, %fs
|
|
movl %eax, %gs
|
|
|
|
/*
|
|
* Compute the decompressed kernel start address. It is where
|
|
* we were loaded at aligned to a 2M boundary. %rbp contains the
|
|
* decompressed kernel start address.
|
|
*
|
|
* If it is a relocatable kernel then decompress and run the kernel
|
|
* from load address aligned to 2MB addr, otherwise decompress and
|
|
* run the kernel from LOAD_PHYSICAL_ADDR
|
|
*
|
|
* We cannot rely on the calculation done in 32-bit mode, since we
|
|
* may have been invoked via the 64-bit entry point.
|
|
*/
|
|
|
|
/* Start with the delta to where the kernel will run at. */
|
|
#ifdef CONFIG_RELOCATABLE
|
|
leaq startup_32(%rip) /* - $startup_32 */, %rbp
|
|
movl BP_kernel_alignment(%rsi), %eax
|
|
decl %eax
|
|
addq %rax, %rbp
|
|
notq %rax
|
|
andq %rax, %rbp
|
|
cmpq $LOAD_PHYSICAL_ADDR, %rbp
|
|
jge 1f
|
|
#endif
|
|
movq $LOAD_PHYSICAL_ADDR, %rbp
|
|
1:
|
|
|
|
/* Target address to relocate to for decompression */
|
|
movl BP_init_size(%rsi), %ebx
|
|
subl $_end, %ebx
|
|
addq %rbp, %rbx
|
|
|
|
/* Set up the stack */
|
|
leaq boot_stack_end(%rbx), %rsp
|
|
|
|
/* Zero EFLAGS */
|
|
pushq $0
|
|
popfq
|
|
|
|
/*
|
|
* Copy the compressed kernel to the end of our buffer
|
|
* where decompression in place becomes safe.
|
|
*/
|
|
pushq %rsi
|
|
leaq (_bss-8)(%rip), %rsi
|
|
leaq (_bss-8)(%rbx), %rdi
|
|
movq $_bss /* - $startup_32 */, %rcx
|
|
shrq $3, %rcx
|
|
std
|
|
rep movsq
|
|
cld
|
|
popq %rsi
|
|
|
|
/*
|
|
* Jump to the relocated address.
|
|
*/
|
|
leaq relocated(%rbx), %rax
|
|
jmp *%rax
|
|
|
|
#ifdef CONFIG_EFI_STUB
|
|
.org 0x390
|
|
ENTRY(efi64_stub_entry)
|
|
movq %rdi, efi64_config(%rip) /* Handle */
|
|
movq %rsi, efi64_config+8(%rip) /* EFI System table pointer */
|
|
|
|
leaq efi64_config(%rip), %rax
|
|
movq %rax, efi_config(%rip)
|
|
|
|
movq %rdx, %rsi
|
|
jmp handover_entry
|
|
ENDPROC(efi64_stub_entry)
|
|
#endif
|
|
|
|
.text
|
|
relocated:
|
|
|
|
/*
|
|
* Clear BSS (stack is currently empty)
|
|
*/
|
|
xorl %eax, %eax
|
|
leaq _bss(%rip), %rdi
|
|
leaq _ebss(%rip), %rcx
|
|
subq %rdi, %rcx
|
|
shrq $3, %rcx
|
|
rep stosq
|
|
|
|
/*
|
|
* Adjust our own GOT
|
|
*/
|
|
leaq _got(%rip), %rdx
|
|
leaq _egot(%rip), %rcx
|
|
1:
|
|
cmpq %rcx, %rdx
|
|
jae 2f
|
|
addq %rbx, (%rdx)
|
|
addq $8, %rdx
|
|
jmp 1b
|
|
2:
|
|
|
|
/*
|
|
* Do the extraction, and jump to the new kernel..
|
|
*/
|
|
pushq %rsi /* Save the real mode argument */
|
|
movq $z_run_size, %r9 /* size of kernel with .bss and .brk */
|
|
pushq %r9
|
|
movq %rsi, %rdi /* real mode address */
|
|
leaq boot_heap(%rip), %rsi /* malloc area for uncompression */
|
|
leaq input_data(%rip), %rdx /* input_data */
|
|
movl $z_input_len, %ecx /* input_len */
|
|
movq %rbp, %r8 /* output target address */
|
|
movq $z_output_len, %r9 /* decompressed length, end of relocs */
|
|
call extract_kernel /* returns kernel location in %rax */
|
|
popq %r9
|
|
popq %rsi
|
|
|
|
/*
|
|
* Jump to the decompressed kernel.
|
|
*/
|
|
jmp *%rax
|
|
|
|
.code32
|
|
no_longmode:
|
|
/* This isn't an x86-64 CPU so hang */
|
|
1:
|
|
hlt
|
|
jmp 1b
|
|
|
|
#include "../../kernel/verify_cpu.S"
|
|
|
|
.data
|
|
gdt:
|
|
.word gdt_end - gdt
|
|
.long gdt
|
|
.word 0
|
|
.quad 0x0000000000000000 /* NULL descriptor */
|
|
.quad 0x00af9a000000ffff /* __KERNEL_CS */
|
|
.quad 0x00cf92000000ffff /* __KERNEL_DS */
|
|
.quad 0x0080890000000000 /* TS descriptor */
|
|
.quad 0x0000000000000000 /* TS continued */
|
|
gdt_end:
|
|
|
|
#ifdef CONFIG_EFI_STUB
|
|
efi_config:
|
|
.quad 0
|
|
|
|
#ifdef CONFIG_EFI_MIXED
|
|
.global efi32_config
|
|
efi32_config:
|
|
.fill 11,8,0
|
|
.quad efi64_thunk
|
|
.byte 0
|
|
#endif
|
|
|
|
.global efi64_config
|
|
efi64_config:
|
|
.fill 11,8,0
|
|
.quad efi_call
|
|
.byte 1
|
|
#endif /* CONFIG_EFI_STUB */
|
|
|
|
/*
|
|
* Stack and heap for uncompression
|
|
*/
|
|
.bss
|
|
.balign 4
|
|
boot_heap:
|
|
.fill BOOT_HEAP_SIZE, 1, 0
|
|
boot_stack:
|
|
.fill BOOT_STACK_SIZE, 1, 0
|
|
boot_stack_end:
|
|
|
|
/*
|
|
* Space for page tables (not in .bss so not zeroed)
|
|
*/
|
|
.section ".pgtable","a",@nobits
|
|
.balign 4096
|
|
pgtable:
|
|
.fill 6*4096, 1, 0
|