fd9dde6abc
Upon upgrading to binutils 2.27, we found that our lz4 and gzip compressed kernel images were significantly larger, resulting is 10ms boot time regressions. As noted by Rahul: "aarch64 binaries uses RELA relocations, where each relocation entry includes an addend value. This is similar to x86_64. On x86_64, the addend values are also stored at the relocation offset for relative relocations. This is an optimization: in the case where code does not need to be relocated, the loader can simply skip processing relative relocations. In binutils-2.25, both bfd and gold linkers did this for x86_64, but only the gold linker did this for aarch64. The kernel build here is using the bfd linker, which stored zeroes at the relocation offsets for relative relocations. Since a set of zeroes compresses better than a set of non-zero addend values, this behavior was resulting in much better lz4 compression. The bfd linker in binutils-2.27 is now storing the actual addend values at the relocation offsets. The behavior is now consistent with what it does for x86_64 and what gold linker does for both architectures. The change happened in this upstream commit: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613 Since a bunch of zeroes got replaced by non-zero addend values, we see the side effect of lz4 compressed image being a bit bigger. To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs" flag can be used: $ LDFLAGS="--no-apply-dynamic-relocs" make With this flag, the compressed image size is back to what it was with binutils-2.25. If the kernel is using ASLR, there aren't additional runtime costs to --no-apply-dynamic-relocs, as the relocations will need to be applied again anyway after the kernel is relocated to a random address. If the kernel is not using ASLR, then presumably the current default behavior of the linker is better. Since the static linker performed the dynamic relocs, and the kernel is not moved to a different address at load time, it can skip applying the relocations all over again." Some measurements: $ ld -v GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117 ^ $ ls -l vmlinux -rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux $ ls -l Image.lz4-dtb -rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb $ ld -v GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315 ^ pre patch: $ ls -l vmlinux -rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux $ ls -l Image.lz4-dtb -rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb post patch: $ ls -l vmlinux -rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux $ ls -l Image.lz4-dtb -rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb By Siqi's measurement w/ gzip: binutils 2.27 with this patch (with --no-apply-dynamic-relocs): Image 41535488 Image.gz 13404067 binutils 2.27 without this patch (without --no-apply-dynamic-relocs): Image 41535488 Image.gz 14125516 Any compression scheme should be able to get better results from the longer runs of zeros, not just GZIP and LZ4. 10ms boot time savings isn't anything to get excited about, but users of arm64+compression+bfd-2.27 should not have to pay a penalty for no runtime improvement. Reported-by: Gopinath Elanchezhian <gelanchezhian@google.com> Reported-by: Sindhuri Pentyala <spentyala@google.com> Reported-by: Wei Wang <wvw@google.com> Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Suggested-by: Rahul Chaudhry <rahulchaudhry@google.com> Suggested-by: Siqi Lin <siqilin@google.com> Suggested-by: Stephen Hines <srhines@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [will: added comment to Makefile] Signed-off-by: Will Deacon <will.deacon@arm.com>
176 lines
5.3 KiB
Makefile
176 lines
5.3 KiB
Makefile
#
|
|
# arch/arm64/Makefile
|
|
#
|
|
# This file is included by the global makefile so that you can add your own
|
|
# architecture-specific flags and dependencies.
|
|
#
|
|
# This file is subject to the terms and conditions of the GNU General Public
|
|
# License. See the file "COPYING" in the main directory of this archive
|
|
# for more details.
|
|
#
|
|
# Copyright (C) 1995-2001 by Russell King
|
|
|
|
LDFLAGS_vmlinux :=-p --no-undefined -X
|
|
CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)
|
|
GZFLAGS :=-9
|
|
|
|
ifeq ($(CONFIG_RELOCATABLE), y)
|
|
# Pass --no-apply-dynamic-relocs to restore pre-binutils-2.27 behaviour
|
|
# for relative relocs, since this leads to better Image compression
|
|
# with the relocation offsets always being zero.
|
|
LDFLAGS_vmlinux += -pie -shared -Bsymbolic \
|
|
$(call ld-option, --no-apply-dynamic-relocs)
|
|
endif
|
|
|
|
ifeq ($(CONFIG_ARM64_ERRATUM_843419),y)
|
|
ifeq ($(call ld-option, --fix-cortex-a53-843419),)
|
|
$(warning ld does not support --fix-cortex-a53-843419; kernel may be susceptible to erratum)
|
|
else
|
|
LDFLAGS_vmlinux += --fix-cortex-a53-843419
|
|
endif
|
|
endif
|
|
|
|
KBUILD_DEFCONFIG := defconfig
|
|
|
|
# Check for binutils support for specific extensions
|
|
lseinstr := $(call as-instr,.arch_extension lse,-DCONFIG_AS_LSE=1)
|
|
|
|
ifeq ($(CONFIG_ARM64_LSE_ATOMICS), y)
|
|
ifeq ($(lseinstr),)
|
|
$(warning LSE atomics not supported by binutils)
|
|
endif
|
|
endif
|
|
|
|
ifeq ($(CONFIG_ARM64), y)
|
|
brokengasinst := $(call as-instr,1:\n.inst 0\n.rept . - 1b\n\nnop\n.endr\n,,-DCONFIG_BROKEN_GAS_INST=1)
|
|
|
|
ifneq ($(brokengasinst),)
|
|
$(warning Detected assembler with broken .inst; disassembly will be unreliable)
|
|
endif
|
|
endif
|
|
|
|
KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr) $(brokengasinst)
|
|
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
|
|
KBUILD_CFLAGS += $(call cc-option, -mpc-relative-literal-loads)
|
|
KBUILD_AFLAGS += $(lseinstr) $(brokengasinst)
|
|
|
|
KBUILD_CFLAGS += $(call cc-option,-mabi=lp64)
|
|
KBUILD_AFLAGS += $(call cc-option,-mabi=lp64)
|
|
|
|
ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
|
|
KBUILD_CPPFLAGS += -mbig-endian
|
|
CHECKFLAGS += -D__AARCH64EB__
|
|
AS += -EB
|
|
LD += -EB
|
|
LDFLAGS += -maarch64linuxb
|
|
UTS_MACHINE := aarch64_be
|
|
else
|
|
KBUILD_CPPFLAGS += -mlittle-endian
|
|
CHECKFLAGS += -D__AARCH64EL__
|
|
AS += -EL
|
|
LD += -EL
|
|
LDFLAGS += -maarch64linux
|
|
UTS_MACHINE := aarch64
|
|
endif
|
|
|
|
CHECKFLAGS += -D__aarch64__ -m64
|
|
|
|
ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y)
|
|
KBUILD_CFLAGS_MODULE += -mcmodel=large
|
|
endif
|
|
|
|
ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
|
|
KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/arm64/kernel/module.lds
|
|
ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
|
|
KBUILD_LDFLAGS_MODULE += $(objtree)/arch/arm64/kernel/ftrace-mod.o
|
|
endif
|
|
endif
|
|
|
|
# Default value
|
|
head-y := arch/arm64/kernel/head.o
|
|
|
|
# The byte offset of the kernel image in RAM from the start of RAM.
|
|
ifeq ($(CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET), y)
|
|
TEXT_OFFSET := $(shell awk "BEGIN {srand(); printf \"0x%06x\n\", \
|
|
int(2 * 1024 * 1024 / (2 ^ $(CONFIG_ARM64_PAGE_SHIFT)) * \
|
|
rand()) * (2 ^ $(CONFIG_ARM64_PAGE_SHIFT))}")
|
|
else
|
|
TEXT_OFFSET := 0x00080000
|
|
endif
|
|
|
|
# KASAN_SHADOW_OFFSET = VA_START + (1 << (VA_BITS - 3)) - (1 << 61)
|
|
# in 32-bit arithmetic
|
|
KASAN_SHADOW_OFFSET := $(shell printf "0x%08x00000000\n" $$(( \
|
|
(0xffffffff & (-1 << ($(CONFIG_ARM64_VA_BITS) - 32))) \
|
|
+ (1 << ($(CONFIG_ARM64_VA_BITS) - 32 - 3)) \
|
|
- (1 << (64 - 32 - 3)) )) )
|
|
|
|
export TEXT_OFFSET GZFLAGS
|
|
|
|
core-y += arch/arm64/kernel/ arch/arm64/mm/
|
|
core-$(CONFIG_NET) += arch/arm64/net/
|
|
core-$(CONFIG_KVM) += arch/arm64/kvm/
|
|
core-$(CONFIG_XEN) += arch/arm64/xen/
|
|
core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
|
|
libs-y := arch/arm64/lib/ $(libs-y)
|
|
core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
|
|
|
|
# Default target when executing plain make
|
|
boot := arch/arm64/boot
|
|
KBUILD_IMAGE := $(boot)/Image.gz
|
|
KBUILD_DTBS := dtbs
|
|
|
|
all: Image.gz $(KBUILD_DTBS)
|
|
|
|
|
|
Image: vmlinux
|
|
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
|
|
|
|
Image.%: Image
|
|
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
|
|
|
|
zinstall install:
|
|
$(Q)$(MAKE) $(build)=$(boot) $@
|
|
|
|
%.dtb: scripts
|
|
$(Q)$(MAKE) $(build)=$(boot)/dts $(boot)/dts/$@
|
|
|
|
PHONY += dtbs dtbs_install
|
|
|
|
dtbs: prepare scripts
|
|
$(Q)$(MAKE) $(build)=$(boot)/dts
|
|
|
|
dtbs_install:
|
|
$(Q)$(MAKE) $(dtbinst)=$(boot)/dts
|
|
|
|
PHONY += vdso_install
|
|
vdso_install:
|
|
$(Q)$(MAKE) $(build)=arch/arm64/kernel/vdso $@
|
|
|
|
# We use MRPROPER_FILES and CLEAN_FILES now
|
|
archclean:
|
|
$(Q)$(MAKE) $(clean)=$(boot)
|
|
$(Q)$(MAKE) $(clean)=$(boot)/dts
|
|
|
|
# We need to generate vdso-offsets.h before compiling certain files in kernel/.
|
|
# In order to do that, we should use the archprepare target, but we can't since
|
|
# asm-offsets.h is included in some files used to generate vdso-offsets.h, and
|
|
# asm-offsets.h is built in prepare0, for which archprepare is a dependency.
|
|
# Therefore we need to generate the header after prepare0 has been made, hence
|
|
# this hack.
|
|
prepare: vdso_prepare
|
|
vdso_prepare: prepare0
|
|
$(Q)$(MAKE) $(build)=arch/arm64/kernel/vdso include/generated/vdso-offsets.h
|
|
|
|
define archhelp
|
|
echo '* Image.gz - Compressed kernel image (arch/$(ARCH)/boot/Image.gz)'
|
|
echo ' Image - Uncompressed kernel image (arch/$(ARCH)/boot/Image)'
|
|
echo '* dtbs - Build device tree blobs for enabled boards'
|
|
echo ' dtbs_install - Install dtbs to $(INSTALL_DTBS_PATH)'
|
|
echo ' install - Install uncompressed kernel'
|
|
echo ' zinstall - Install compressed kernel'
|
|
echo ' Install using (your) ~/bin/installkernel or'
|
|
echo ' (distribution) /sbin/installkernel or'
|
|
echo ' install to $$(INSTALL_PATH) and run lilo'
|
|
endef
|