linux/include
Josh Poimboeuf 3193c0836f bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
elimination" optimization results in ___bpf_prog_run()'s jumptable code
changing from this:

	select_insn:
		jmp *jumptable(, %rax, 8)
		...
	ALU64_ADD_X:
		...
		jmp *jumptable(, %rax, 8)
	ALU_ADD_X:
		...
		jmp *jumptable(, %rax, 8)

to this:

	select_insn:
		mov jumptable, %r12
		jmp *(%r12, %rax, 8)
		...
	ALU64_ADD_X:
		...
		jmp *(%r12, %rax, 8)
	ALU_ADD_X:
		...
		jmp *(%r12, %rax, 8)

The jumptable address is placed in a register once, at the beginning of
the function.  The function execution can then go through multiple
indirect jumps which rely on that same register value.  This has a few
issues:

1) Objtool isn't smart enough to be able to track such a register value
   across multiple recursive indirect jumps through the jump table.

2) With CONFIG_RETPOLINE enabled, this optimization actually results in
   a small slowdown.  I measured a ~4.7% slowdown in the test_bpf
   "tcpdump port 22" selftest.

   This slowdown is actually predicted by the GCC manual:

     Note: When compiling a program using computed gotos, a GCC
     extension, you may get better run-time performance if you
     disable the global common subexpression elimination pass by
     adding -fno-gcse to the command line.

So just disable the optimization for this function.

Fixes: e55a73251d ("bpf: Fix ORC unwinding in non-JIT BPF code")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/30c3ca29ba037afcbd860a8672eef0021addf9fe.1563413318.git.jpoimboe@redhat.com
2019-07-18 21:01:06 +02:00
..
acpi It's been a relatively busy cycle for docs: 2019-07-09 12:34:26 -07:00
asm-generic include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures 2019-07-16 19:23:24 -07:00
clocksource
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2019-07-08 20:57:08 -07:00
drm drm main pull request for v5.3-rc1 (sans mm changes) 2019-07-15 19:04:27 -07:00
dt-bindings This round of clk driver and framework updates is heavy on the driver update 2019-07-17 10:07:48 -07:00
keys request_key improvements 2019-07-08 19:19:37 -07:00
kvm KVM: arm/arm64: Support chained PMU counters 2019-07-05 13:56:22 +01:00
linux bpf: Disable GCC -fgcse optimization for ___bpf_prog_run() 2019-07-18 21:01:06 +02:00
math-emu
media media updates for v5.3-rc1 2019-07-09 09:47:22 -07:00
memory
misc powerpc updates for 5.3 2019-07-13 16:08:36 -07:00
net net: sched: Fix NULL-pointer dereference in tc_indr_block_ing_cmd() 2019-07-12 15:21:53 -07:00
pcmcia It's been a relatively busy cycle for docs: 2019-07-09 12:34:26 -07:00
ras
rdma RDMA/core: Make rdma_counter.h compile stand alone 2019-07-09 09:44:47 -03:00
scsi SCSI misc on 20190709 2019-07-11 15:14:01 -07:00
soc
sound ASoC: Updates for v5.3 2019-07-08 14:45:34 +02:00
target
trace for-5.3-tag 2019-07-16 15:12:56 -07:00
uapi virtio, vhost: fixes, features, performance 2019-07-17 11:26:09 -07:00
vdso
video drm main pull request for v5.3-rc1 (sans mm changes) 2019-07-15 19:04:27 -07:00
xen
Kbuild kbuild: compile-test kernel headers to ensure they are self-contained 2019-07-09 21:44:37 +09:00