linux/drivers
Shawn Bohrer fa349e396e veth: Fix race with AF_XDP exposing old or uninitialized descriptors
When AF_XDP is used on a veth interface the RX ring is updated in two
steps.  veth_xdp_rcv() removes packet descriptors from the FILL ring,
fills them, and places them in the RX ring, updating the cached_prod
pointer.  Later xdp_do_flush() syncs the RX ring prod pointer with the
cached_prod pointer allowing user-space to see the recently filled in
descriptors.  The rings are intended to be SPSC; however, the existing
order in veth_poll() allows xdp_do_flush() to run concurrently with
another CPU, creating a race condition that allows user-space to see old
or uninitialized descriptors in the RX ring.  This bug has been observed
in production systems.

To summarize, we are expecting this ordering:

CPU 0 __xsk_rcv_zc()
CPU 0 __xsk_map_flush()
CPU 2 __xsk_rcv_zc()
CPU 2 __xsk_map_flush()

But we are seeing this order:

CPU 0 __xsk_rcv_zc()
CPU 2 __xsk_rcv_zc()
CPU 0 __xsk_map_flush()
CPU 2 __xsk_map_flush()

This occurs because we rely on NAPI to ensure that only one napi_poll
handler is running at a time for the given veth receive queue.
napi_schedule_prep() will prevent multiple instances from getting
scheduled.  However, calling napi_complete_done() signals that this
napi_poll is complete and allows subsequent calls to
napi_schedule_prep() and __napi_schedule() to succeed in scheduling a
concurrent napi_poll before the xdp_do_flush() has been called.  For the
veth driver, a concurrent call to napi_schedule_prep() and
__napi_schedule() can occur on a different CPU because the veth xmit
path can also schedule a napi_poll, creating the race.

The fix, as suggested by Magnus Karlsson, is to simply move the
xdp_do_flush() call before napi_complete_done().  This syncs the
producer ring pointers before another instance of napi_poll can be
scheduled on another CPU.  It will also slightly improve performance by
moving the flush closer to when the descriptors were placed in the
RX ring.

Fixes: d1396004dd ("veth: Add XDP TX and REDIRECT")
Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com>
Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-22 15:06:10 +01:00
..
accel Fix mismerge due to devnode now taking a 'const *' device 2022-12-16 13:04:15 -06:00
accessibility
acpi More ACPI updates for 6.2-rc1 2022-12-15 10:21:10 -08:00
amba ARM updates for 6.2 2022-12-13 15:22:14 -08:00
android
ata ata changes for 6.2 2022-12-13 10:54:19 -08:00
atm
auxdisplay
base Kbuild updates for v6.2 2022-12-19 12:33:32 -06:00
bcma
block Including fixes from bpf, netfilter and can. 2022-12-21 08:41:32 -08:00
bluetooth Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
bus Char/Misc driver changes for 6.2-rc1 2022-12-16 03:49:24 -08:00
cdrom
char random: do not include <asm/archrandom.h> from random.h 2022-12-20 03:13:45 +01:00
clk A pile of clk driver updates with a small tracepoint patch to the clk core this 2022-12-13 13:46:07 -08:00
clocksource Updates for timers, timekeeping and drivers: 2022-12-12 12:52:02 -08:00
comedi
connector
counter
cpufreq linux-kselftest-next-6.2-rc1 2022-12-12 16:39:38 -08:00
cpuidle powerpc updates for 6.2 2022-12-19 07:13:33 -06:00
crypto powerpc updates for 6.2 2022-12-19 07:13:33 -06:00
cxl cxl/region: Fix memdev reuse check 2022-12-08 13:03:47 -08:00
dax
dca
devfreq
dio
dma dmaengine updates for v6.2 2022-12-19 08:54:17 -06:00
dma-buf Driver Core changes for 6.2-rc1 2022-12-16 03:54:54 -08:00
edac Merge branches 'edac-ghes' and 'edac-misc' into edac-updates-for-v6.2 2022-12-12 15:40:03 +01:00
eisa
extcon Char/Misc driver changes for 6.2-rc1 2022-12-16 03:49:24 -08:00
firewire
firmware efi: random: fix NULL-deref when refreshing seed 2022-12-20 03:13:45 +01:00
fpga Char/Misc driver changes for 6.2-rc1 2022-12-16 03:49:24 -08:00
fsi
gnss
gpio gpio: updates for v6.2 2022-12-15 09:45:51 -08:00
gpu Driver Core changes for 6.2-rc1 2022-12-16 03:54:54 -08:00
greybus
hid Driver Core changes for 6.2-rc1 2022-12-16 03:54:54 -08:00
hsi
hte
hv Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
hwmon hwmon updates for v6.2 merge window 2022-12-13 13:09:38 -08:00
hwspinlock
hwtracing
i2c Core got a new helper 'i2c_client_get_device_id', designware got some 2022-12-15 14:47:10 -08:00
i3c i3c: export SETDASA method 2022-12-11 21:25:58 +01:00
idle
iio Char/Misc driver changes for 6.2-rc1 2022-12-16 03:49:24 -08:00
infiniband v6.2 merge window 2nd pull request 2022-12-17 08:23:42 -06:00
input Driver Core changes for 6.2-rc1 2022-12-16 03:54:54 -08:00
interconnect
iommu IOMMU Updates for Linux v6.2 2022-12-19 08:34:39 -06:00
ipack
irqchip RISC-V Patches for the 6.2 Merge Window, Part 1 2022-12-14 15:23:49 -08:00
isdn Including fixes from bpf, netfilter and can. 2022-12-21 08:41:32 -08:00
leds Lee Jones offered his help with maintaining LEDs, thanks a 2022-12-17 08:28:25 -06:00
macintosh
mailbox ACPI updates for 6.2-rc1 2022-12-12 13:38:17 -08:00
mcb
md - Fix use-after-free races due to missing resource cleanup during DM 2022-12-13 10:58:09 -08:00
media Driver Core changes for 6.2-rc1 2022-12-16 03:54:54 -08:00
memory ARM updates for 6.2 2022-12-13 15:22:14 -08:00
memstick memstick/mspro_block: Convert to use sysfs_emit()/sysfs_emit_at() APIs 2022-12-09 10:29:58 +01:00
message
mfd
misc powerpc updates for 6.2 2022-12-19 07:13:33 -06:00
mmc MMC core: 2022-12-13 13:41:26 -08:00
most
mtd MTD core changes: 2022-12-13 12:32:07 -08:00
mux
net veth: Fix race with AF_XDP exposing old or uninitialized descriptors 2022-12-22 15:06:10 +01:00
nfc nfc: pn533: Clear nfc_target before being used 2022-12-14 20:51:29 -08:00
ntb
nubus
nvdimm
nvme Including fixes from bpf, netfilter and can. 2022-12-21 08:41:32 -08:00
nvmem Char/Misc driver changes for 6.2-rc1 2022-12-16 03:49:24 -08:00
of Devicetree updates for v6.2, part 2: 2022-12-20 08:48:24 -06:00
opp
parisc parisc: led: Fix potential null-ptr-deref in start_task() 2022-12-17 23:19:38 +01:00
parport
pci phy-for-6.2 2022-12-19 08:40:58 -06:00
pcmcia
peci
perf RISC-V Patches for the 6.2 Merge Window, Part 1 2022-12-14 15:23:49 -08:00
phy phy-for-6.2 2022-12-19 08:40:58 -06:00
pinctrl Pin control changes for the v6.2 kernel cycle: 2022-12-13 13:03:06 -08:00
platform USB/Thunderbolt driver changes for 6.2-rc1 2022-12-16 03:22:53 -08:00
pnp
power power supply and reset changes for the v6.2 series 2022-12-17 08:39:31 -06:00
powercap
pps
ps3
ptp Networking changes for 6.2. 2022-12-13 15:47:48 -08:00
pwm
rapidio rapidio: devices: fix missing put_device in mport_cdev_open 2022-12-11 19:30:20 -08:00
ras
regulator regulator: Updates for v6.2 2022-12-13 12:49:59 -08:00
remoteproc
reset
rpmsg
rtc rtc: ds1742: use devm_platform_get_and_ioremap_resource() 2022-12-15 23:34:31 +01:00
s390 Driver Core changes for 6.2-rc1 2022-12-16 03:54:54 -08:00
sbus
scsi Including fixes from bpf, netfilter and can. 2022-12-21 08:41:32 -08:00
sh
siox
slimbus
soc ARM: SoC fixes for 6.2 2022-12-19 16:07:59 -06:00
soundwire soundwire updates for 6.2 2022-12-19 08:47:33 -06:00
spi spi: Updates for v6.2 2022-12-13 12:54:31 -08:00
spmi
ssb
staging Char/Misc driver changes for 6.2-rc1 2022-12-16 03:49:24 -08:00
target SCSI misc on 20221213 2022-12-14 08:58:51 -08:00
tc
tee SoC driver updates for 6.2 2022-12-12 10:17:08 -08:00
thermal More thermal control updates for 6.2-rc1 2022-12-15 10:16:04 -08:00
thunderbolt
tty Driver Core changes for 6.2-rc1 2022-12-16 03:54:54 -08:00
ufs SCSI misc on 20221213 2022-12-14 08:58:51 -08:00
uio
usb Including fixes from bpf, netfilter and can. 2022-12-21 08:41:32 -08:00
vdpa
vfio Driver Core changes for 6.2-rc1 2022-12-16 03:54:54 -08:00
vhost
video fbdev: fbcon: release buffer when fbcon_do_set_font() failed 2022-12-14 20:01:51 +01:00
virt Char/Misc driver changes for 6.2-rc1 2022-12-16 03:49:24 -08:00
virtio
vlynq
w1
watchdog linux-watchdog 6.2-rc1 tag 2022-12-17 08:34:01 -06:00
xen drm for 6.2: 2022-12-13 11:59:58 -08:00
zorro
Kconfig
Makefile