Eduard Zingerman 2793a8b015 bpf: exact states comparison for iterator convergence checks
Convergence for open-coded iterators is computed in is_state_visited()
by examining states with branches count > 1 and using states_equal().
states_equal() computes the sub-state relation using read and precision
marks. Read and precision marks are propagated from child states and
thus are not guaranteed to be complete inside a loop when branches
count > 1. This can be demonstrated with the following unsafe program:

     1. r7 = -16
     2. r6 = bpf_get_prandom_u32()
     3. while (bpf_iter_num_next(&fp[-8])) {
     4.   if (r6 != 42) {
     5.     r7 = -32
     6.     r6 = bpf_get_prandom_u32()
     7.     continue
     8.   }
     9.   r0 = r10
    10.   r0 += r7
    11.   r8 = *(u64 *)(r0 + 0)
    12.   r6 = bpf_get_prandom_u32()
    13. }

Here the verifier would first visit path 1-3, create a checkpoint at 3
with r7=-16, then continue along 4-7,3 and reach 3 again with r7=-32.

Because the instructions at 9-12 have not been visited yet, the
existing checkpoint at 3 does not have read or precision marks for r7.
Thus states_equal() would return true, the verifier would discard the
current state, and the unsafe memory access at 11 would not be caught.

This commit closes this loophole by introducing exact state comparisons
for the iterator convergence logic:
- registers are compared using regs_exact() regardless of read or
  precision marks;
- stack slots must have identical types.
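
A minimal sketch of what "exact" means here, using simplified stand-in
types and helper names (reg_sketch, slot_sketch, regs_exact_sketch()
etc. are illustrative only, not the verifier's actual data structures
or code):

    #include <stdbool.h>

    struct reg_sketch {
            int type;              /* e.g. SCALAR_VALUE, PTR_TO_STACK */
            long long min, max;    /* tracked value range */
            bool precise;          /* precision mark, ignored here */
    };

    struct slot_sketch {
            int slot_type;         /* e.g. STACK_SPILL, STACK_MISC */
    };

    /* registers must match exactly, regardless of read/precision marks */
    static bool regs_exact_sketch(const struct reg_sketch *old,
                                  const struct reg_sketch *cur)
    {
            return old->type == cur->type &&
                   old->min == cur->min &&
                   old->max == cur->max;
    }

    /* stack slots must have identical types */
    static bool slot_exact_sketch(const struct slot_sketch *old,
                                  const struct slot_sketch *cur)
    {
            return old->slot_type == cur->slot_type;
    }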

Unfortunately, this is too strict even for simple programs like the
one below:

    i = 0;
    while(iter_next(&it))
      i++;

At each iteration, i++ would produce a new distinct state, and
eventually the instruction processing limit would be reached.

To avoid such behavior, speculatively forget (widen) the range of
imprecise scalar registers if those registers were not precise at the
end of the previous iteration and do not match exactly.
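
The widening step can be sketched as follows (again with simplified
stand-in types; widen_imprecise_scalar_sketch() is illustrative only
and not the verifier's actual implementation):

    #include <limits.h>
    #include <stdbool.h>

    struct scalar_sketch {
            bool precise;          /* precision mark in the old checkpoint */
            long long min, max;    /* tracked value range */
    };

    /* If the old scalar was not precise and the values do not match
     * exactly, forget the range in the current state so that the next
     * states_equal() check has a chance to converge.
     */
    static void widen_imprecise_scalar_sketch(const struct scalar_sketch *old,
                                              struct scalar_sketch *cur)
    {
            if (old->precise)
                    return;        /* precise values must be preserved */
            if (old->min == cur->min && old->max == cur->max)
                    return;        /* exact match, nothing to widen */
            cur->min = LLONG_MIN;  /* widen: the value becomes an */
            cur->max = LLONG_MAX;  /* unknown scalar              */
    }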

This is a conservative heuristic that allows verification of a wide
range of programs; however, it precludes verification of programs that
conjure an imprecise value on the first loop iteration and use it as a
precise one on the second.

Test case iter_task_vma_for_each() presents one such case:

        unsigned int seen = 0;
        ...
        bpf_for_each(task_vma, vma, task, 0) {
                if (seen >= 1000)
                        break;
                ...
                seen++;
        }

Here clang generates the following code:

<LBB0_4>:
      24:       r8 = r6                          ; stash current value of
                ... body ...                       'seen'
      29:       r1 = r10
      30:       r1 += -0x8
      31:       call bpf_iter_task_vma_next
      32:       r6 += 0x1                        ; seen++;
      33:       if r0 == 0x0 goto +0x2 <LBB0_6>  ; exit on next() == NULL
      34:       r7 += 0x10
      35:       if r8 < 0x3e7 goto -0xc <LBB0_4> ; loop on seen < 1000

<LBB0_6>:
      ... exit ...

Note that the counter in r6 is copied to r8 at the top of the loop and
r6 is then incremented, while the conditional jump is done on r8.
Because of this, the precision mark for r6 lags one state behind the
precision mark on r8 and the widening logic kicks in.

Adding barrier_var(seen) after the conditional is sufficient to force
clang to use the same register for both counting and the conditional
jump.
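
With that workaround the loop looks roughly as follows (barrier_var()
is the compiler barrier macro from bpf_helpers.h; the loop body is
elided as above):

        unsigned int seen = 0;
        ...
        bpf_for_each(task_vma, vma, task, 0) {
                if (seen >= 1000)
                        break;
                barrier_var(seen); /* keep 'seen' in a single register */
                ...
                seen++;
        }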

This issue was discussed in the thread [1], which was started by
Andrew Werner <awerner32@gmail.com> demonstrating a similar bug in
callback function handling. The callbacks will be addressed in a
follow-up patch.

[1] https://lore.kernel.org/bpf/97a90da09404c65c8e810cf83c94ac703705dc0e.camel@gmail.com/

Co-developed-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Co-developed-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231024000917.12153-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>