linux/net/core
Jesper Dangaard Brouer 44768decb7 page_pool: handle page recycle for NUMA_NO_NODE condition
The check in pool_page_reusable (page_to_nid(page) == pool->p.nid) is
not valid if page_pool was configured with pool->p.nid = NUMA_NO_NODE.

The goal of the NUMA changes in commit d5394610b1 ("page_pool: Don't
recycle non-reusable pages"), were to have RX-pages that belongs to the
same NUMA node as the CPU processing RX-packet during softirq/NAPI. As
illustrated by the performance measurements.

This patch moves the NAPI checks out of fast-path, and at the same time
solves the NUMA_NO_NODE issue.

First realize that alloc_pages_node() with pool->p.nid = NUMA_NO_NODE
will lookup current CPU nid (Numa ID) via numa_mem_id(), which is used
as the the preferred nid.  It is only in rare situations, where
e.g. NUMA zone runs dry, that page gets doesn't get allocated from
preferred nid.  The page_pool API allows drivers to control the nid
themselves via controlling pool->p.nid.

This patch moves the NAPI check to when alloc cache is refilled, via
dequeuing/consuming pages from the ptr_ring. Thus, we can allow placing
pages from remote NUMA into the ptr_ring, as the dequeue/consume step
will check the NUMA node. All current drivers using page_pool will
alloc/refill RX-ring from same CPU running softirq/NAPI process.

Drivers that control the nid explicitly, also use page_pool_update_nid
when changing nid runtime.  To speed up transision to new nid the alloc
cache is now flushed on nid changes.  This force pages to come from
ptr_ring, which does the appropate nid check.

For the NUMA_NO_NODE case, when a NIC IRQ is moved to another NUMA
node, we accept that transitioning the alloc cache doesn't happen
immediately. The preferred nid change runtime via consulting
numa_mem_id() based on the CPU processing RX-packets.

Notice, to avoid stressing the page buddy allocator and avoid doing too
much work under softirq with preempt disabled, the NUMA check at
ptr_ring dequeue will break the refill cycle, when detecting a NUMA
mismatch. This will cause a slower transition, but its done on purpose.

Fixes: d5394610b1 ("page_pool: Don't recycle non-reusable pages")
Reported-by: Li RongQing <lirongqing@baidu.com>
Reported-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-02 15:37:52 -08:00
..
bpf_sk_storage.c bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails 2019-11-18 11:41:59 +01:00
datagram.c net: add READ_ONCE() annotation in __skb_wait_for_more_packets() 2019-10-28 13:33:41 -07:00
datagram.h net/core: Allow the compiler to verify declaration and definition consistency 2019-03-27 13:49:44 -07:00
dev_addr_lists.c net: remove unnecessary variables and callback 2019-10-24 14:53:49 -07:00
dev_ioctl.c net: Introduce peer to peer one step PTP time stamping. 2019-12-25 19:51:34 -08:00
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2019-12-30 14:22:11 -08:00
devlink.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-16 21:51:42 -08:00
drop_monitor.c drop_monitor: Better sanitize notified packets 2019-09-16 21:39:27 +02:00
dst_cache.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
dst.c net: print proper warning on dst underflow 2019-09-26 09:05:56 +02:00
failover.c failover: allow name change on IFF_UP slave interfaces 2019-04-10 22:12:26 -07:00
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
filter.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2019-12-27 14:20:10 -08:00
flow_dissector.c treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
flow_offload.c net: core: rename indirect block ingress cb function 2019-12-06 20:45:09 -08:00
gen_estimator.c net_sched: gen_estimator: extend packet counter to 64bit 2019-11-06 21:51:36 -08:00
gen_stats.c net_sched: add TCA_STATS_PKT64 attribute 2019-11-05 18:20:55 -08:00
gro_cells.c gro_cells: make sure device is up in gro_cells_receive() 2019-03-10 11:07:14 -07:00
hwbm.c net: hwbm: Make the hwbm_pool lock a mutex 2019-06-09 19:40:10 -07:00
link_watch.c net: link_watch: prevent starvation when processing linkwatch wq 2019-07-01 19:02:47 -07:00
lwt_bpf.c net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup 2019-12-04 12:27:13 -08:00
lwtunnel.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
Makefile ethtool: move to its own directory 2019-12-12 17:07:05 -08:00
neighbour.c neighbour: remove neigh_cleanup() method 2019-12-09 09:48:47 -08:00
net_namespace.c netns: fix GFP flags in rtnl_net_notifyid() 2019-10-25 20:14:42 -07:00
net-procfs.c net: procfs: use index hashlist instead of name hashlist 2019-10-01 14:47:19 -07:00
net-sysfs.c net-sysfs: Call dev_hold always in rx_queue_add_kobject 2019-12-17 22:57:11 -08:00
net-sysfs.h
net-traces.c page_pool: add tracepoints for page_pool with details need by XDP 2019-06-19 11:23:13 -04:00
netclassid_cgroup.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
netevent.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
netpoll.c net: fix skb use after free in netpoll 2019-08-27 20:52:02 -07:00
netprio_cgroup.c netprio: use css ID instead of cgroup ID 2019-11-12 08:18:03 -08:00
page_pool.c page_pool: handle page recycle for NUMA_NO_NODE condition 2020-01-02 15:37:52 -08:00
pktgen.c pktgen: remove unnecessary assignment in pktgen_xmit() 2019-10-17 14:25:13 -04:00
ptp_classifier.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
request_sock.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
rtnetlink.c rtnetlink: provide permanent hardware address in RTM_NEWLINK 2019-12-12 17:07:05 -08:00
scm.c y2038: socket: remove timespec reference in timestamping 2019-11-15 14:38:29 +01:00
secure_seq.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
skbuff.c net: Rephrased comments section of skb_mpls_pop() 2019-12-24 22:24:45 -08:00
skmsg.c net: skmsg: fix TLS 1.3 crash with full sk_msg 2019-11-28 22:40:29 -08:00
sock_diag.c sock: make cookie generation global instead of per netns 2019-08-09 13:14:46 -07:00
sock_map.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-09-15 14:17:27 +02:00
sock_reuseport.c net/core: Replace rcu_swap_protected() with rcu_replace_pointer() 2019-10-30 08:45:26 -07:00
sock.c net: annotate lockless accesses to sk->sk_pacing_shift 2019-12-17 22:09:52 -08:00
stream.c tcp: make sure EPOLLOUT wont be missed 2019-08-19 13:07:43 -07:00
sysctl_net_core.c net, sysctl: Fix compiler warning when only cBPF is present 2019-12-19 17:17:51 +01:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: Use skb accessors in network core 2019-07-22 20:47:56 -07:00
utils.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xdp.c treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00