linux/net
John Fastabend eb82a99447 net: sched, fix OOO packets with pfifo_fast
After the qdisc lock was dropped in pfifo_fast we allow multiple
enqueue threads and dequeue threads to run in parallel. On the
enqueue side the skb bit ooo_okay is used to ensure all related
skbs are enqueued in-order. On the dequeue side though there is
no similar logic. What we observe is with fewer queues than CPUs
it is possible to re-order packets when two instances of
__qdisc_run() are running in parallel. Each thread will dequeue
a skb and then whichever thread calls the ndo op first will
be sent on the wire. This doesn't typically happen because
qdisc_run() is usually triggered by the same core that did the
enqueue. However, drivers will trigger __netif_schedule()
when queues are transitioning from stopped to awake using the
netif_tx_wake_* APIs. When this happens netif_schedule() calls
qdisc_run() on the same CPU that did the netif_tx_wake_* which
is usually done in the interrupt completion context. This CPU
is selected with the irq affinity which is unrelated to the
enqueue operations.

To resolve this we add a RUNNING bit to the qdisc to ensure
only a single dequeue per qdisc is running. Enqueue and dequeue
operations can still run in parallel and also on multi queue
NICs we can still have a dequeue in-flight per qdisc, which
is typically per CPU.

Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Reported-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 12:36:23 -04:00
..
6lowpan
9p virtio: bugfixes 2018-02-15 14:29:27 -08:00
802
8021q vlan: Fix out of order vlan headers with reorder header off 2018-03-16 10:03:47 -04:00
appletalk net: delete /proc THIS_MODULE references 2018-01-16 15:01:33 -05:00
atm vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
ax25 net: delete /proc THIS_MODULE references 2018-01-16 15:01:33 -05:00
batman-adv batman-adv: Fix skbuff rcsum on packet reroute 2018-03-18 13:20:32 +01:00
bluetooth Bluetooth: Fix missing encryption refresh on Security Request 2018-03-01 19:55:56 +01:00
bpf bpf: fix null pointer deref in bpf_prog_test_run_xdp 2018-02-01 07:43:56 -08:00
bridge netfilter: bridge: ebt_among: add more missing match size checks 2018-03-11 21:24:49 +01:00
caif vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
can can: migrate documentation to restructured text 2018-01-26 10:46:44 +01:00
ceph libceph, ceph: avoid memory leak when specifying same option several times 2018-02-26 16:19:30 +01:00
core devlink: Remove redundant free on error path 2018-03-20 10:59:29 -04:00
dcb
dccp dccp: check sk for closed state in dccp_sendmsg() 2018-03-07 13:38:56 -05:00
decnet dn_getsockoptdecnet: move nf_{get/set}sockopt outside sock lock 2018-02-16 15:46:15 -05:00
dns_resolver afs: Support the AFS dynamic root 2018-02-06 14:43:37 +00:00
dsa net: dsa: Fix dsa_is_user_port() test inversion 2018-03-12 21:04:55 -04:00
ethernet
hsr
ieee802154 ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event() 2018-03-09 11:19:26 -05:00
ife
ipv4 net/ipv4: disable SMC TCP option with SYN Cookies 2018-03-25 20:53:54 -04:00
ipv6 ipv6: the entire IPv6 header chain must fit the first fragment 2018-03-25 21:17:20 -04:00
iucv net/iucv: Free memory obtained by kzalloc 2018-03-16 11:42:12 -04:00
kcm kcm: lock lower socket in kcm_attach 2018-03-16 11:12:16 -04:00
key
l2tp l2tp: fix races with ipv4-mapped ipv6 addresses 2018-03-12 15:11:09 -04:00
l3mdev
lapb
llc net: delete /proc THIS_MODULE references 2018-01-16 15:01:33 -05:00
mac80211 mac80211: add ieee80211_hw flag for QoS NDP support 2018-03-21 10:56:18 +01:00
mac802154
mpls net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len 2018-03-04 17:49:17 -05:00
ncsi
netfilter netfilter: nf_tables: do not hold reference on netdevice from preparation phase 2018-03-22 13:17:52 +01:00
netlabel
netlink netlink: make sure nladdr has correct size in netlink_connect() 2018-03-25 21:14:51 -04:00
netrom net: delete /proc THIS_MODULE references 2018-01-16 15:01:33 -05:00
nfc NFC: llcp: Limit size of SDP URI 2018-02-16 15:16:05 -05:00
nsh
openvswitch openvswitch: meter: fix the incorrect calculation of max delta_t 2018-03-11 22:48:59 -04:00
packet vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
phonet vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
psample
qrtr qrtr: add MODULE_ALIAS macro to smd 2018-02-26 15:07:04 -05:00
rds rds: Incorrect reference counting in TCP socket creation 2018-03-02 09:40:27 -05:00
rfkill vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
rose net: delete /proc THIS_MODULE references 2018-01-16 15:01:33 -05:00
rxrpc rxrpc: Fix send in rxrpc_send_data_packet() 2018-02-22 15:37:47 -05:00
sched net: sched, fix OOO packets with pfifo_fast 2018-03-26 12:36:23 -04:00
sctp net: use skb_is_gso_sctp() instead of open-coding 2018-03-09 11:41:47 -05:00
smc net/smc: simplify wait when closing listen socket 2018-03-15 09:49:13 -04:00
strparser
sunrpc vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
switchdev
tipc tipc: correct initial value for group congestion flag 2018-02-27 11:46:03 -05:00
tls tls: Use correct sk->sk_prot for IPV6 2018-02-27 14:41:48 -05:00
unix net: af_unix: fix typo in UNIX_SKB_FRAGS_SZ comment 2018-02-13 12:21:45 -05:00
vmw_vsock vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
wimax
wireless cfg80211: add missing dependency to CFG80211 suboptions 2018-02-27 10:54:12 +01:00
x25
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2018-03-13 10:38:07 -04:00
compat.c
Kconfig Staging/IIO patches for 4.16-rc1 2018-02-01 09:51:57 -08:00
Makefile
socket.c sock_diag: request _diag module only when the family or proto has been registered 2018-03-12 11:03:42 -04:00
sysctl_net.c