linux/drivers/net/ethernet/mellanox/mlx4
Brenden Blanco 326fe02d1e net/mlx4_en: protect ring->xdp_prog with rcu_read_lock
Depending on the preempt mode, the bpf_prog stored in xdp_prog may be
freed while the rx path is still using it, despite the use of call_rcu
inside bpf_prog_put. This can happen with PREEMPT_RCU=y, for instance,
since the rcu callback that destroys the bpf prog can run even during
the bh handling in the mlx4 rx path.

Several options were considered before this patch was settled on:

Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all
of the rings are updated with the new program.
This approach has the disadvantage that as the number of rings
increases, the speed of update will slow down significantly due to
napi_synchronize's msleep(1).

Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh.
The action of the bpf_prog_put_bh would be to then call bpf_prog_put
later. Those drivers that consume a bpf prog in a bh context (like mlx4)
would then use the bpf_prog_put_bh instead when the ring is up. This has
the problem of complexity, in maintaining proper refcnts and rcu lists,
and would likely be harder to review. In addition, this approach to
freeing must be exclusive with other frees of the bpf prog, for instance
a _bh prog must not be referenced from a prog array that is consumed by
a non-_bh prog.

The placement of rcu_read_lock in this patch is functionally the same as
putting an rcu_read_lock in napi_poll. Actually doing so could be a
controversial change, but it would bring the implementation in line with
sk_busy_loop (though of course the nature of those two paths is
substantially different), and would also avoid copy/paste problems in
drivers that add XDP support later. Still, this patch does not take
that opinionated option.

Testing was done with kernels in either PREEMPT_RCU=y or
CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting
any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did
not show up in the perf report whatsoever, and with PREEMPT_RCU=y the
overhead of rcu_read_lock (according to perf) was the same before/after.
In the rx path, rcu_read_lock is eventually called for every packet
from netif_receive_skb_internal, so the napi poll call's rcu_read_lock
is easily amortized.

v2:
Remove extra rcu_read_lock in mlx4_en_process_rx_cq body.
Annotate xdp_prog with __rcu, and convert all usages to
rcu_assign_pointer or rcu_dereference[_protected] as appropriate.
Add explicit mutex lock around rcu_assign_pointer instead of xchg loop.
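
As a rough illustration of the pattern the v2 notes describe (not the
patch itself), here is a userspace sketch. Everything in it is an
assumption made for illustration: struct prog, struct rx_ring,
process_rx, xdp_set and state_lock are hypothetical names, and the
rcu_* helpers are stand-ins built on C11 atomics and a thread-local
depth counter, not the kernel implementations.

```c
/* Userspace sketch only. The rcu_* helpers below are simplified stand-ins
 * (C11 atomics plus a nesting counter), NOT the kernel primitives. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_prog: nonzero verdict means "pass". */
struct prog {
	int verdict;
};

/* Hypothetical ring; in the kernel the field would carry the __rcu annotation. */
struct rx_ring {
	_Atomic(struct prog *) xdp_prog;
};

/* Hypothetical writer-side lock serializing program updates. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread read-side nesting depth, used only to catch misuse here. */
static _Thread_local int rcu_depth;

static void rcu_read_lock(void)
{
	rcu_depth++;
}

static void rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* Publish a pointer with release ordering. */
#define rcu_assign_pointer(p, v) \
	atomic_store_explicit(&(p), (v), memory_order_release)

/* Read a pointer; only valid inside a read-side critical section. */
#define rcu_dereference(p) \
	(assert(rcu_depth > 0), \
	 atomic_load_explicit(&(p), memory_order_acquire))

/* Read a pointer on the update side, where 'cond' says the lock is held. */
#define rcu_dereference_protected(p, cond) \
	(assert(cond), atomic_load_explicit(&(p), memory_order_relaxed))

/* Reader: one lock/unlock pair covers the whole poll budget, so the
 * program cannot be freed while packets are being processed. */
static int process_rx(struct rx_ring *ring, int budget)
{
	int passed = 0;

	rcu_read_lock();
	struct prog *prog = rcu_dereference(ring->xdp_prog);
	for (int i = 0; i < budget; i++) {
		if (!prog || prog->verdict) /* no program attached means pass */
			passed++;
	}
	rcu_read_unlock();
	return passed;
}

/* Writer: swap the program under the mutex and hand back the old one.
 * The caller must not free it until readers are done; in the kernel,
 * bpf_prog_put defers the free through call_rcu. */
static struct prog *xdp_set(struct rx_ring *ring, struct prog *new_prog)
{
	pthread_mutex_lock(&state_lock);
	struct prog *old = rcu_dereference_protected(ring->xdp_prog, 1 /* lock held */);
	rcu_assign_pointer(ring->xdp_prog, new_prog);
	pthread_mutex_unlock(&state_lock);
	return old;
}
```

The single read-side section around the loop mirrors the point made
above that the napi poll call's rcu_read_lock is amortized over the
packets it handles.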

Fixes: d576acf0a2 ("net/mlx4_en: add page recycle to prepare rx ring for tx support")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06 13:39:33 -07:00
alloc.c net/mlx4: Avoid wrong virtual mappings 2016-05-05 23:23:05 -04:00
catas.c net/mlx4_core: Do not BUG_ON during reset when PCI is offline 2016-02-17 10:29:26 -05:00
cmd.c net/mlx4_en: initialize cmd.context_lock spinlock earlier 2016-06-15 12:16:30 -07:00
cq.c net/mlx4_core: Set UAR page size to 4KB regardless of system page size 2016-02-17 10:29:27 -05:00
en_clock.c net/mlx4_en: Choose time-stamping shift value according to HW frequency 2016-02-17 10:29:25 -05:00
en_cq.c net/mlx4: Avoid wrong virtual mappings 2016-05-05 23:23:05 -04:00
en_dcb_nl.c net/mlx4_en: Add DCB PFC support through CEE netlink commands 2016-06-23 15:18:50 -04:00
en_ethtool.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-07-24 00:53:32 -04:00
en_main.c net: mlx4: use new ETHTOOL_G/SSETTINGS API 2016-02-25 22:06:47 -05:00
en_netdev.c net/mlx4_en: protect ring->xdp_prog with rcu_read_lock 2016-09-06 13:39:33 -07:00
en_port.c net/mlx4_en: get rid of private net_device_stats 2016-05-25 22:15:50 -07:00
en_port.h net/mlx4_en: Use PTYS register to query ethtool settings 2014-10-28 17:18:00 -04:00
en_resources.c net/mlx4: Avoid wrong virtual mappings 2016-05-05 23:23:05 -04:00
en_rx.c net/mlx4_en: protect ring->xdp_prog with rcu_read_lock 2016-09-06 13:39:33 -07:00
en_selftest.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-03 21:16:48 -05:00
en_tx.c net/mlx4_en: add xdp forwarding and data write support 2016-07-19 21:46:33 -07:00
eq.c net/mlx4_core: Set UAR page size to 4KB regardless of system page size 2016-02-17 10:29:27 -05:00
fw_qos.c net/mlx4: Add mlx4_SET_VPORT_QOS implementation 2015-04-02 16:25:02 -04:00
fw_qos.h net/mlx4: Added qos_vport QP configuration in VST mode 2015-04-02 16:25:03 -04:00
fw.c Round one of 4.8 code 2016-08-04 20:10:31 -04:00
fw.h net/mlx4_en: Add DCB PFC support through CEE netlink commands 2016-06-23 15:18:50 -04:00
icm.c net/mlx4_core: Maintain a persistent memory for mlx4 device 2015-01-25 14:43:13 -08:00
icm.h
intf.c net/mlx4_core: Check device state before unregistering it 2016-07-25 18:00:25 -07:00
Kconfig mlx4_en: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port 2016-06-17 20:23:31 -07:00
main.c net/mlx4: Fix some indent inconsistancy 2016-07-04 15:22:33 -07:00
Makefile net/mlx4: New file for QoS related firmware commands 2015-04-02 16:25:02 -04:00
mcg.c net/mlx4: Fix some indent inconsistancy 2016-07-04 15:22:33 -07:00
mlx4_en.h net/mlx4_en: protect ring->xdp_prog with rcu_read_lock 2016-09-06 13:39:33 -07:00
mlx4_stats.h net/mlx4_en: Fix off-by-four in ethtool 2015-06-24 00:42:32 -07:00
mlx4.h net/mlx4_core: Don't allow to VF change global pause settings 2016-04-21 15:02:40 -04:00
mr.c net/mlx4: Fix some indent inconsistancy 2016-07-04 15:22:33 -07:00
pd.c io-mapping: Specify mapping size for io_mapping_map_wc() 2016-04-28 12:17:32 +01:00
port.c net/mlx4_en: Add DCB PFC support through CEE netlink commands 2016-06-23 15:18:50 -04:00
profile.c net/mlx4_core: use swap() in mlx4_make_profile() 2015-06-11 15:19:41 -07:00
qp.c net/mlx4_core: Add support for RoCE v2 entropy 2016-01-19 15:35:00 -05:00
reset.c net/mlx4_core: Maintain a persistent memory for mlx4 device 2015-01-25 14:43:13 -08:00
resource_tracker.c net/mlx4: Fix some indent inconsistancy 2016-07-04 15:22:33 -07:00
sense.c
srq.c