linux/arch/x86/kernel
Yinghai Lu 974f221c84 x86/boot: Move compressed kernel to the end of the decompression buffer
This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:

  77d1a49995 ("x86, boot: make symbols from the main vmlinux available")
  37ba7ab5e3 ("x86, boot: make kernel_alignment adjustable; new bzImage fields")

Specifically:

All names prefixed with 'VO_':

 - relate to the uncompressed kernel image

 - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)

All names prefixed with 'ZO_':

 - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
   which is composed of the following memory areas:
     - head text
     - compressed kernel (VO image and relocs table)
     - decompressor code

 - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)

The 'INIT_SIZE' value is used to find the larger of the two image sizes:

 #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
 #define VO_INIT_SIZE    (VO__end - VO__text)

 #if ZO_INIT_SIZE > VO_INIT_SIZE
 # define INIT_SIZE ZO_INIT_SIZE
 #else
 # define INIT_SIZE VO_INIT_SIZE
 #endif

The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)

Why does z_extract_offset exist? It's needed because we are trying to minimize
the amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.

To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed to the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range.

z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
the build process, at a point when we know the exact compressed and
uncompressed size of the kernel images and can calculate this safe minimum
offset value. (Note that the mkpiggy.c calculation is not perfect, because
we don't know the decompressor used at that stage, so the z_extract_offset
calculation is necessarily imprecise and is mostly based on gzip internals -
we'll improve that in the next patch.)

When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
like this:

                            |-----compressed kernel image------|
                            V                                  V
0                       extract_offset                      +INIT_SIZE
|-----------|---------------|-------------------------|--------|
            |               |                         |        |
          VO__text      startup_32 of ZO          VO__end    ZO__end
            ^                                         ^
            |-------uncompressed kernel image---------|

When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space
left from end of ZO to the end of decompressing buffer, like below.

                            |-compressed kernel image-|
                            V                         V
0                       extract_offset                      +INIT_SIZE
|-----------|---------------|-------------------------|--------|
            |               |                         |        |
          VO__text      startup_32 of ZO          ZO__end    VO__end
            ^                                                  ^
            |------------uncompressed kernel image-------------|

To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing t at
the start of extract_offset as is currently done.

This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code.

Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that the VO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
[ Rewrote the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 11:03:29 +02:00
..
acpi x86/ACPI: Parse ACPI_FADT_LEGACY_DEVICES 2016-04-22 10:29:06 +02:00
apic Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-24 09:47:32 -07:00
cpu x86/cpu/intel: Remove not needed paravirt_enabled() use for F00F work around 2016-04-22 10:29:05 +02:00
fpu Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-24 09:47:32 -07:00
kprobes x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard 2016-02-29 08:35:12 +01:00
.gitignore
alternative.c x86/alternatives: Make optimize_nops() interrupt safe and synced 2015-09-03 21:27:47 +02:00
amd_gart_64.c
amd_nb.c x86/cpu: Get rid of compute_unit_id 2016-03-29 10:45:04 +02:00
apb_timer.c x86/apb/timer: Use proper mask to modify hotplug action 2016-03-19 13:40:08 +01:00
aperture_64.c param: convert some "on"/"off" users to strtobool 2016-03-17 15:09:34 -07:00
apm_32.c x86/apm32: Remove paravirt_enabled() use 2016-04-22 10:29:03 +02:00
asm-offsets_32.c x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup 2016-03-10 09:48:14 +01:00
asm-offsets_64.c x86/syscalls: Add syscall entry qualifiers 2016-01-29 09:46:38 +01:00
asm-offsets.c x86/boot: Move compressed kernel to the end of the decompression buffer 2016-04-29 11:03:29 +02:00
audit_64.c
bootflag.c x86: don't use module_init for non-modular core bootflag code 2015-06-16 14:12:34 -04:00
check.c Linux 4.2-rc8 2015-08-25 09:59:19 +02:00
cpuid.c new helpers: no_seek_end_llseek{,_size}() 2015-12-23 10:41:31 -05:00
crash_dump_32.c
crash_dump_64.c
crash.c x86/kexec: Remove walk_iomem_res() call with GART type 2016-01-30 09:49:59 +01:00
devicetree.c Replace module_init with equivalent device_initcall in non modules. 2015-07-02 10:30:48 -07:00
doublefault.c
dumpstack_32.c
dumpstack_64.c
dumpstack.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
e820.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 09:32:27 -07:00
early_printk.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
early-quirks.c Linux 4.4-rc2 2015-11-23 09:04:05 +01:00
ebda.c x86/init: Rename EBDA code file 2016-04-22 10:29:07 +02:00
espfix_64.c Merge branch 'x86/urgent' into x86/asm, before applying dependent patches 2015-07-31 10:23:35 +02:00
ftrace.c Nothing major this round. Mostly small clean ups and fixes. 2016-03-24 10:52:25 -07:00
head32.c x86/rtc: Replace paravirt rtc check with platform legacy quirk 2016-04-22 10:29:01 +02:00
head64.c x86/rtc: Replace paravirt rtc check with platform legacy quirk 2016-04-22 10:29:01 +02:00
head_32.S Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 10:45:39 -07:00
head_64.S Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 10:02:25 -07:00
hpet.c x86/hpet: Use proper mask to modify hotplug action 2016-03-19 13:40:08 +01:00
hw_breakpoint.c x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros 2015-12-19 11:49:55 +01:00
i386_ksyms_32.c preempt: Use preempt_schedule_context() as the official tracing preemption point 2015-06-07 15:57:42 +02:00
i8237.c
i8253.c clockevents/drivers/i8253: Migrate to new 'set-state' interface 2015-08-10 11:40:30 +02:00
i8259.c x86/irq: Probe for PIC presence before allocating descs for legacy IRQs 2015-11-07 10:37:37 +01:00
io_delay.c
ioport.c x86/iopl: Fix iopl capability check on Xen PV 2016-03-17 09:49:27 +01:00
irq_32.c genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
irq_64.c x86/irq: Drop unlikely before IS_ERR_OR_NULL 2015-10-01 11:08:56 +02:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
irq.c x86/irq: Call irq_force_move_complete with irq descriptor 2016-01-15 13:44:01 +01:00
irqinit.c x86/irq: Store irq descriptor in vector array 2015-08-06 00:14:59 +02:00
jump_label.c jump_label: Rename JUMP_LABEL_{EN,DIS}ABLE to JUMP_LABEL_{JMP,NOP} 2015-08-03 11:34:12 +02:00
kdebugfs.c
kexec-bzimage64.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
kgdb.c Merge branch 'x86/cleanups' into x86/urgent 2016-03-17 09:44:57 +01:00
ksysfs.c
kvm.c x86/paravirt: Remove paravirt_enabled() 2016-04-22 10:29:07 +02:00
kvmclock.c x86: Fix misspellings in comments 2016-02-24 08:44:58 +01:00
ldt.c x86/mm: Factor out LDT init from context init 2016-02-18 19:46:31 +01:00
livepatch.c livepatch: Cleanup module page permission changes 2015-12-04 22:51:07 +01:00
machine_kexec_32.c
machine_kexec_64.c kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
Makefile x86/init: Rename EBDA code file 2016-04-22 10:29:07 +02:00
mcount_64.S x86/ftrace, x86/asm: Kill ftrace_caller_end label 2016-02-17 08:47:22 +01:00
mmconf-fam10h_64.c
module.c
mpparse.c x86/cpufeature: Use enum cpuid_leafs instead of magic numbers 2016-02-01 10:46:48 +01:00
msr.c x86/cpufeature: Carve out X86_FEATURE_* 2016-01-30 11:22:17 +01:00
nmi_selftest.c
nmi.c x86/nmi: Mark 'ignore_nmis' as __read_mostly 2016-03-08 12:48:19 +01:00
paravirt_patch_32.c x86/paravirt: Remove the unused irq_enable_sysexit pv op 2015-11-23 10:48:16 +01:00
paravirt_patch_64.c x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op 2015-11-23 10:48:16 +01:00
paravirt-spinlocks.c locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS 2015-05-11 09:52:09 +02:00
paravirt.c x86/paravirt: Remove paravirt_enabled() 2016-04-22 10:29:07 +02:00
pci-calgary_64.c x86/platform/calgary: Constify cal_chipset_ops structures 2015-11-29 08:50:58 +01:00
pci-dma.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
pci-iommu_table.c
pci-nommu.c
pci-swiotlb.c x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN 2015-12-06 12:46:31 +01:00
pcspeaker.c
perf_regs.c
platform-quirks.c x86/init: Disable pnpbios and rtc for X86_SUBARCH_CE4100 2016-04-22 10:29:09 +02:00
pmem.c x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search 2016-01-30 09:49:59 +01:00
probe_roms.c
process_32.c sched/core, sched/x86: Kill thread_info::saved_preempt_count 2015-10-06 17:08:18 +02:00
process_64.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-24 09:47:32 -07:00
process.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 09:32:27 -07:00
ptrace.c arch/x86/kernel/ptrace.c: Remove unused arg_offs_table 2015-12-29 11:35:34 +01:00
pvclock.c x86/vdso: Remove pvclock fixmap machinery 2015-12-11 08:56:03 +01:00
quirks.c timers/x86/hpet: Type adjustments 2015-10-21 11:17:32 +02:00
reboot_fixups_32.c
reboot.c x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[] 2016-01-12 12:27:36 +01:00
relocate_kernel_32.S
relocate_kernel_64.S
resource.c
rtc.c x86/init: Disable pnpbios and rtc for X86_SUBARCH_CE4100 2016-04-22 10:29:09 +02:00
setup_percpu.c
setup.c Revert "x86: remove the kernel code/data/bss resources from /proc/iomem" 2016-04-14 12:55:32 -07:00
signal_compat.c x86/compat: Move copy_siginfo_*_user32() to signal_compat.c 2015-07-06 15:28:55 +02:00
signal.c x86/signal/64: Re-add support for SS in the 64-bit signal context 2016-02-17 08:32:11 +01:00
smp.c x86/smp: Remove single IPI wrapper 2015-11-05 13:07:54 +01:00
smpboot.c x86/cpu: Get rid of compute_unit_id 2016-03-29 10:45:04 +02:00
stacktrace.c perf: generalize perf_callchain 2016-02-20 00:21:44 -05:00
step.c Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes 2015-08-18 09:39:47 +02:00
sys_x86_64.c
sysfb_efi.c
sysfb_simplefb.c
sysfb.c
tboot.c x86/tboot: Remove paravirt_enabled() use 2016-04-22 10:29:04 +02:00
tce_64.c
test_nx.c x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option 2016-02-22 08:51:38 +01:00
test_rodata.c x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option 2016-02-22 08:51:38 +01:00
time.c
tls.c
tls.h
topology.c x86: Drop bogus __ref / __refdata annotations 2015-07-20 18:57:20 +02:00
trace_clock.c x86/asm/tsc: Add rdtsc_ordered() and use it in trivial call sites 2015-07-06 15:23:29 +02:00
tracepoint.c
traps.c Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 10:23:56 -07:00
tsc_msr.c
tsc_sync.c x86/asm/tsc/sync: Use rdtsc_ordered() in check_tsc_warp() and drop extra barriers 2015-07-06 15:23:29 +02:00
tsc.c x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known() 2016-03-18 14:51:06 +01:00
uprobes.c uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever 2015-07-31 10:38:06 +02:00
verify_cpu.S x86/cpufeature: Carve out X86_FEATURE_* 2016-01-30 11:22:17 +01:00
vm86_32.c x86/cpufeature: Replace the old static_cpu_has() with safe variant 2016-01-30 11:22:18 +01:00
vmlinux.lds.S x86/boot: Move compressed kernel to the end of the decompression buffer 2016-04-29 11:03:29 +02:00
vsmp_64.c x86: replace __init_or_module with __init in non-modular vsmp_64.c 2015-06-16 14:12:41 -04:00
x86_init.c x86/tsc: Remove unused tsc_pre_init() hook 2015-11-19 11:03:13 +01:00
x8664_ksyms_64.c x86/mm, x86/mce: Add memcpy_mcsafe() 2016-03-08 17:54:38 +01:00