mirror of
https://github.com/torvalds/linux.git
synced 2024-12-25 12:21:37 +00:00
695ba2651a
After doing map_perf_test with a much bigger BPF_F_NO_COMMON_LRU map, the perf report shows a lot of time spent in rotating the inactive list (i.e. __bpf_lru_list_rotate_inactive): > map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}' 19644783 (19M/s) > map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}' 6283930 (6.28M/s) By inactive, it usually means the element is not in cache. Hence, there is a need to tune the PERCPU_NR_SCANS value. This patch finds a better number of elements to scan during each list rotation. The PERCPU_NR_SCANS (which is defined the same as PERCPU_FREE_TARGET) decreases from 16 elements to 4 elements. This change only affects the BPF_F_NO_COMMON_LRU map. The test_lru_dist does not show meaningful difference between 16 and 4. Our production L4 load balancer which uses the LRU map for conntrack-ing also shows little change in cache hit rate. Since both benchmark and production data show no cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4. We can consider making it configurable if we find a usecase later that shows another value works better and/or use a different rotation strategy. After this change: > map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}' 9240324 (9.2M/s) i.e. 6.28M/s -> 9.2M/s The test_lru_dist has not shown meaningful difference: > test_lru_dist zipf.100k.a1_01.out 4000 1: nr_misses: 31575 (Before) vs 31566 (After) > test_lru_dist zipf.100k.a0_01.out 40000 1 nr_misses: 67036 (Before) vs 67031 (After) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
---|---|---|
.. | ||
accounting | ||
arch | ||
build | ||
cgroup | ||
firewire | ||
gpio | ||
hv | ||
iio | ||
include | ||
kvm/kvm_stat | ||
laptop | ||
leds | ||
lguest | ||
lib | ||
net | ||
nfsd | ||
objtool | ||
pcmcia | ||
perf | ||
power | ||
scripts | ||
spi | ||
testing | ||
thermal/tmon | ||
time | ||
usb | ||
virtio | ||
vm | ||
Makefile |