Eduard Zingerman 2793a8b015 bpf: exact states comparison for iterator convergence checks
Convergence for open-coded iterators is computed in is_state_visited()
by examining states with branches count > 1 and using states_equal().
states_equal() computes the sub-state relation using read and precision
marks. Read and precision marks are propagated from child states and
thus are not guaranteed to be complete inside a loop when branches
count > 1. This can be demonstrated with the following unsafe program:

     1. r7 = -16
     2. r6 = bpf_get_prandom_u32()
     3. while (bpf_iter_num_next(&fp[-8])) {
     4.   if (r6 != 42) {
     5.     r7 = -32
     6.     r6 = bpf_get_prandom_u32()
     7.     continue
     8.   }
     9.   r0 = r10
    10.   r0 += r7
    11.   r8 = *(u64 *)(r0 + 0)
    12.   r6 = bpf_get_prandom_u32()
    13. }

Here the verifier would first visit path 1-3, create a checkpoint at 3
with r7=-16, then continue along 4-7,3 and reach 3 again with r7=-32.

Because the instructions at 9-12 have not been visited yet, the
existing checkpoint at 3 does not have read or precision marks for r7.
Thus states_equal() would return true, the verifier would discard the
current state, and the unsafe memory access at 11 would not be caught.

This commit closes this loophole by introducing exact state comparisons
for the iterator convergence logic:
- registers are compared using regs_exact() regardless of read or
  precision marks;
- stack slots must have identical types.
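
A minimal sketch of what "exact" means here, using simplified stand-in
types and helper names (reg_sketch, slot_sketch, regs_exact_sketch()
etc. are illustrative only, not the verifier's actual data structures
or code):

    #include <stdbool.h>

    struct reg_sketch {
            int type;              /* e.g. SCALAR_VALUE, PTR_TO_STACK */
            long long min, max;    /* tracked value range */
            bool precise;          /* precision mark, ignored here */
    };

    struct slot_sketch {
            int slot_type;         /* e.g. STACK_SPILL, STACK_MISC */
    };

    /* registers must match exactly, regardless of read/precision marks */
    static bool regs_exact_sketch(const struct reg_sketch *old,
                                  const struct reg_sketch *cur)
    {
            return old->type == cur->type &&
                   old->min == cur->min &&
                   old->max == cur->max;
    }

    /* stack slots must have identical types */
    static bool slot_exact_sketch(const struct slot_sketch *old,
                                  const struct slot_sketch *cur)
    {
            return old->slot_type == cur->slot_type;
    }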

Unfortunately, this is too strict even for simple programs like the
one below:

    i = 0;
    while(iter_next(&it))
      i++;

At each iteration, i++ would produce a new distinct state, and
eventually the instruction processing limit would be reached.

To avoid such behavior, speculatively forget (widen) the range of
imprecise scalar registers if those registers were not precise at the
end of the previous iteration and do not match exactly.
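
The widening step can be sketched as follows (again with simplified
stand-in types; widen_imprecise_scalar_sketch() is illustrative only
and not the verifier's actual implementation):

    #include <limits.h>
    #include <stdbool.h>

    struct scalar_sketch {
            bool precise;          /* precision mark in the old checkpoint */
            long long min, max;    /* tracked value range */
    };

    /* If the old scalar was not precise and the values do not match
     * exactly, forget the range in the current state so that the next
     * states_equal() check has a chance to converge.
     */
    static void widen_imprecise_scalar_sketch(const struct scalar_sketch *old,
                                              struct scalar_sketch *cur)
    {
            if (old->precise)
                    return;        /* precise values must be preserved */
            if (old->min == cur->min && old->max == cur->max)
                    return;        /* exact match, nothing to widen */
            cur->min = LLONG_MIN;  /* widen: the value becomes an */
            cur->max = LLONG_MAX;  /* unknown scalar              */
    }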

This is a conservative heuristic that allows verification of a wide
range of programs; however, it precludes verification of programs that
conjure an imprecise value on the first loop iteration and use it as a
precise one on the second.

Test case iter_task_vma_for_each() presents one such case:

        unsigned int seen = 0;
        ...
        bpf_for_each(task_vma, vma, task, 0) {
                if (seen >= 1000)
                        break;
                ...
                seen++;
        }

Here clang generates the following code:

<LBB0_4>:
      24:       r8 = r6                          ; stash current value of
                ... body ...                       'seen'
      29:       r1 = r10
      30:       r1 += -0x8
      31:       call bpf_iter_task_vma_next
      32:       r6 += 0x1                        ; seen++;
      33:       if r0 == 0x0 goto +0x2 <LBB0_6>  ; exit on next() == NULL
      34:       r7 += 0x10
      35:       if r8 < 0x3e7 goto -0xc <LBB0_4> ; loop on seen < 1000

<LBB0_6>:
      ... exit ...

Note that the counter in r6 is copied to r8 at the top of the loop and
r6 is then incremented, while the conditional jump is done on r8.
Because of this, the precision mark for r6 lags one state behind the
precision mark on r8 and the widening logic kicks in.

Adding barrier_var(seen) after the conditional is sufficient to force
clang to use the same register for both counting and the conditional
jump.
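
With that workaround the loop looks roughly as follows (barrier_var()
is the compiler barrier macro from bpf_helpers.h; the loop body is
elided as above):

        unsigned int seen = 0;
        ...
        bpf_for_each(task_vma, vma, task, 0) {
                if (seen >= 1000)
                        break;
                barrier_var(seen); /* keep 'seen' in a single register */
                ...
                seen++;
        }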

This issue was discussed in the thread [1], which was started by
Andrew Werner <awerner32@gmail.com> demonstrating a similar bug in
callback function handling. The callbacks will be addressed in a
follow-up patch.

[1] https://lore.kernel.org/bpf/97a90da09404c65c8e810cf83c94ac703705dc0e.camel@gmail.com/

Co-developed-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Co-developed-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231024000917.12153-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>