linux

mirror of https://github.com/torvalds/linux.git synced 2025-01-01 07:42:07 +00:00

Author	SHA1	Message	Date
Linus Torvalds	94a855111e	- Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOZp5EACgkQEsHwGGHe VUrZFxAAvi/+8L0IYSK4mKJvixGbTFjxN/Swo2JVOfs34LqGUT6JaBc+VUMwZxdb VMTFIZ3ttkKEodjhxGI7oGev6V8UfhI37SmO2lYKXpQVjXXnMlv/M+Vw3teE38CN gopi+xtGnT1IeWQ3tc/Tv18pleJ0mh5HKWiW+9KoqgXj0wgF9x4eRYDz1TDCDA/A iaBzs56j8m/FSykZHnrWZ/MvjKNPdGlfJASUCPeTM2dcrXQGJ93+X2hJctzDte0y Nuiw6Y0htfFBE7xoJn+sqm5Okr+McoUM18/CCprbgSKYk18iMYm3ZtAi6FUQZS1A ua4wQCf49loGp15PO61AS5d3OBf5D3q/WihQRbCaJvTVgPp9sWYnWwtcVUuhMllh ZQtBU9REcVJ/22bH09Q9CjBW0VpKpXHveqQdqRDViLJ6v/iI6EFGmD24SW/VxyRd 73k9MBGrL/dOf1SbEzdsnvcSB3LGzp0Om8o/KzJWOomrVKjBCJy16bwTEsCZEJmP i406m92GPXeaN1GhTko7vmF0GnkEdJs1GVCZPluCAxxbhHukyxHnrjlQjI4vC80n Ylc0B3Kvitw7LGJsPqu+/jfNHADC/zhx1qz/30wb5cFmFbN1aRdp3pm8JYUkn+l/ zri2Y6+O89gvE/9/xUhMohzHsWUO7xITiBavewKeTP9GSWybWUs= =cRy1 -----END PGP SIGNATURE----- Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ...	2022-12-14 15:03:00 -08:00
Eric Biggers	2d203c46a0	crypto: x86/sm4 - fix crash with CFI enabled sm4_aesni_avx_ctr_enc_blk8(), sm4_aesni_avx_cbc_dec_blk8(), sm4_aesni_avx_cfb_dec_blk8(), sm4_aesni_avx2_ctr_enc_blk16(), sm4_aesni_avx2_cbc_dec_blk16(), and sm4_aesni_avx2_cfb_dec_blk16() are called via indirect function calls. Therefore they need to use SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause their type hashes to be emitted when the kernel is built with CONFIG_CFI_CLANG=y. Otherwise, the code crashes with a CFI failure. (Or at least that should be the case. For some reason the CFI checks in sm4_avx_cbc_decrypt(), sm4_avx_cfb_decrypt(), and sm4_avx_ctr_crypt() are not always being generated, using current tip-of-tree clang. Anyway, this patch is a good idea anyway.) Fixes: `ccace936ee` ("x86: Add types to indirectly called assembly functions") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:19 +08:00
Thomas Gleixner	2f93238b87	crypto: x86/sm[34]: Remove redundant alignments SYM_FUNC_START*() and friends already imply alignment, remove custom alignment hacks to make code consistent. This prepares for future function call ABI changes. Also, with having pushed the function alignment to 16 bytes, this custom alignment is completely superfluous. ( this code couldn't seem to make up it's mind about what alignment it actually wanted, randomly mixing 8 and 16 bytes ) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111144.868540856@infradead.org	2022-10-17 16:41:02 +02:00
Peter Zijlstra	f94909ceb1	x86: Prepare asm files for straight-line-speculation Replace all ret/retq instructions with RET in preparation of making RET a macro. Since AS is case insensitive it's a big no-op without RET defined. find arch/x86/ -name \.S \| while read file do sed -i 's/\<ret[q]\>/RET/' $file done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org	2021-12-08 12:25:37 +01:00
Tianjia Zhang	f8690a4b5a	crypto: x86/sm4 - Fix invalid section entry size This fixes the following warning: vmlinux.o: warning: objtool: elf_update: invalid section entry size The size of the rodata section is 164 bytes, directly using the entry_size of 164 bytes will cause errors in some versions of the gcc compiler, while using 16 bytes directly will cause errors in the clang compiler. This patch correct it by filling the size of rodata to a 16-byte boundary. Fixes: `a7ee22ee14` ("crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation") Fixes: `5b2efa2bb8` ("crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation") Reported-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Tested-by: Heyuan Shi <heyuan@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2021-10-22 20:23:01 +08:00
Tianjia Zhang	5b2efa2bb8	crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation Like the implementation of AESNI/AVX, this patch adds an accelerated implementation of AESNI/AVX2. In terms of code implementation, by reusing AESNI/AVX mode-related codes, the amount of code is greatly reduced. From the benchmark data, it can be seen that when the block size is 1024, compared to AVX acceleration, the performance achieved by AVX2 has increased by about 70%, it is also 7.7 times of the pure software implementation of sm4-generic. The main algorithm implementation comes from SM4 AES-NI work by libgcrypt and Markku-Juhani O. Saarinen at: https://github.com/mjosaarinen/sm4ni This optimization supports the four modes of SM4, ECB, CBC, CFB, and CTR. Since CBC and CFB do not support multiple block parallel encryption, the optimization effect is not obvious. Benchmark on Intel i5-6200U 2.30GHz, performance data of three implementation methods, pure software sm4-generic, aesni/avx acceleration, and aesni/avx2 acceleration, the data comes from the 218 mode and 518 mode of tcrypt. The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: block-size \| 16 64 128 256 1024 1420 4096 sm4-generic ECB enc \| 60.94 70.41 72.27 73.02 73.87 73.58 73.59 ECB dec \| 61.87 70.53 72.15 73.09 73.89 73.92 73.86 CBC enc \| 56.71 66.31 68.05 69.84 70.02 70.12 70.24 CBC dec \| 54.54 65.91 68.22 69.51 70.63 70.79 70.82 CFB enc \| 57.21 67.24 69.10 70.25 70.73 70.52 71.42 CFB dec \| 57.22 64.74 66.31 67.24 67.40 67.64 67.58 CTR enc \| 59.47 68.64 69.91 71.02 71.86 71.61 71.95 CTR dec \| 59.94 68.77 69.95 71.00 71.84 71.55 71.95 sm4-aesni-avx ECB enc \| 44.95 177.35 292.06 316.98 339.48 322.27 330.59 ECB dec \| 45.28 178.66 292.31 317.52 339.59 322.52 331.16 CBC enc \| 57.75 67.68 69.72 70.60 71.48 71.63 71.74 CBC dec \| 44.32 176.83 284.32 307.24 328.61 312.61 325.82 CFB enc \| 57.81 67.64 69.63 70.55 71.40 71.35 71.70 CFB dec \| 43.14 167.78 282.03 307.20 328.35 318.24 325.95 CTR enc \| 42.35 163.32 279.11 302.93 320.86 310.56 317.93 CTR dec \| 42.39 162.81 278.49 302.37 321.11 310.33 318.37 sm4-aesni-avx2 ECB enc \| 45.19 177.41 292.42 316.12 339.90 322.53 330.54 ECB dec \| 44.83 178.90 291.45 317.31 339.85 322.55 331.07 CBC enc \| 57.66 67.62 69.73 70.55 71.58 71.66 71.77 CBC dec \| 44.34 176.86 286.10 501.68 559.58 483.87 527.46 CFB enc \| 57.43 67.60 69.61 70.52 71.43 71.28 71.65 CFB dec \| 43.12 167.75 268.09 499.33 558.35 490.36 524.73 CTR enc \| 42.42 163.39 256.17 493.95 552.45 481.58 517.19 CTR dec \| 42.49 163.11 256.36 493.34 552.62 481.49 516.83 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2021-08-27 16:30:18 +08:00

6 Commits