linux/net
Eric Dumazet 46b1c18f9d net: sched: put back q.qlen into a single location
In the series fc8b81a598 ("Merge branch 'lockless-qdisc-series'")
John made the assumption that the data path had no need to read
the qdisc qlen (number of packets in the qdisc).

It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
children.

But pfifo_fast can be used as leaf in class full qdiscs, and existing
logic needs to access the child qlen in an efficient way.

HTB breaks badly, since it uses cl->leaf.q->q.qlen in :
  htb_activate() -> WARN_ON()
  htb_dequeue_tree() to decide if a class can be htb_deactivated
  when it has no more packets.

HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
qdisc_tree_reduce_backlog() also read q.qlen directly.

Using qdisc_qlen_sum() (which iterates over all possible cpus)
in the data path is a non starter.

It seems we have to put back qlen in a central location,
at least for stable kernels.

For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
so the existing q.qlen{++|--} are correct.

For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
because the spinlock might be not held (for example from
pfifo_fast_enqueue() and pfifo_fast_dequeue())

This patch adds atomic_qlen (in the same location than qlen)
and renames the following helpers, since we want to express
they can be used without qdisc lock, and that qlen is no longer percpu.

- qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
- qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()

Later (net-next) we might revert this patch by tracking all these
qlen uses and replace them by a more efficient method (not having
to access a precise qlen, but an empty/non_empty status that might
be less expensive to maintain/track).

Another possibility is to have a legacy pfifo_fast version that would
be used when used a a child qdisc, since the parent qdisc needs
a spinlock anyway. But then, future lockless qdiscs would also
have the same problem.

Fixes: 7e66016f2c ("net: sched: helpers to sum qlen and qlen for per cpu logic")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02 14:10:18 -08:00
..
6lowpan 6lowpan: convert to DEFINE_SHOW_ATTRIBUTE 2018-12-19 00:28:05 +01:00
9p 9p/net: put a lower bound on msize 2018-12-25 17:07:49 +09:00
802
8021q net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
appletalk
atm Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
ax25 ax25: fix possible use-after-free 2019-01-23 11:18:00 -08:00
batman-adv batman-adv: fix uninit-value in batadv_interface_tx() 2019-02-12 13:30:43 -05:00
bluetooth Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-12-27 13:53:32 -08:00
bpf bpf/test_run: fix unkillable BPF_PROG_TEST_RUN 2019-02-19 00:17:03 +01:00
bpfilter net: bpfilter: change section name of bpfilter UMH blob. 2019-01-16 15:46:46 -08:00
bridge Revert "bridge: do not add port to router list when receives query with source 0.0.0.0" 2019-02-23 18:36:06 -08:00
caif Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
can can: bcm: check timer values before ktime conversion 2019-01-22 11:33:46 +01:00
ceph libceph: handle an empty authorize reply 2019-02-18 18:05:33 +01:00
core net: sched: put back q.qlen into a single location 2019-03-02 14:10:18 -08:00
dcb
dccp dccp: fool proof ccid_hc_[rt]x_parse_options() 2019-02-01 14:49:10 -08:00
decnet decnet: fix DN_IFREQ_SIZE 2019-01-27 23:11:55 -08:00
dns_resolver
dsa net: dsa: fix a leaked reference by adding missing of_node_put 2019-02-25 09:34:52 -08:00
ethernet net: ethernet: provide nvmem_get_mac_address() 2018-12-03 15:40:30 -08:00
hsr
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-24 16:19:56 -08:00
ife
ipv4 ipv4: Add ICMPv6 support when parse route ipproto 2019-03-01 16:41:27 -08:00
ipv6 net: sit: fix memory leak in sit_init_net() 2019-03-02 00:53:23 -08:00
iucv iucv: Remove SKB list assumptions. 2018-11-10 16:55:11 -08:00
kcm
key af_key: unconditionally clone on broadcast 2019-02-12 10:36:42 +01:00
l2tp l2tp: copy 4 more bytes to linear part if necessary 2019-01-31 08:58:46 -08:00
l3mdev l3mdev: add function to retreive upper master 2018-12-03 14:15:26 -08:00
lapb
llc llc: do not use sk_eat_skb() 2018-10-22 19:59:20 -07:00
mac80211 mac80211: allocate tailroom for forwarded mesh packets 2019-02-22 14:00:40 +01:00
mac802154
mpls mpls: Return error for RTA_GATEWAY attribute 2019-02-26 13:23:17 -08:00
ncsi net/ncsi: Add NCSI Mellanox OEM command 2018-11-27 16:37:20 -08:00
netfilter ipvs: fix warning on unused variable 2019-02-16 10:41:42 +01:00
netlabel netlabel: fix out-of-bounds memory accesses 2019-02-27 21:45:24 -08:00
netlink net: netlink: rename NETLINK_DUMP_STRICT_CHK -> NETLINK_GET_STRICT_CHK 2018-12-14 11:44:31 -08:00
netrom netrom: switch to sock timer API 2019-01-27 10:38:04 -08:00
nfc net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails 2019-02-27 12:47:08 -08:00
nsh
openvswitch openvswitch: Avoid OOB read when parsing flow nlattrs 2019-01-16 13:35:21 -08:00
packet net/packet: fix 4gb buffer limit due to overflow check 2019-02-12 13:37:23 -05:00
phonet phonet: fix building with clang 2019-02-21 16:23:56 -08:00
psample
qrtr
rds rds: fix refcount bug in rds_sock_addref 2019-01-31 09:43:27 -08:00
rfkill rfkill: gpio: Remove unused include 2018-12-18 13:13:56 +01:00
rose net/rose: fix NULL ax25_cb kernel panic 2019-01-27 10:40:01 -08:00
rxrpc rxrpc: bad unlock balance in rxrpc_recvmsg 2019-02-06 10:54:07 -08:00
sched net: sched: put back q.qlen into a single location 2019-03-02 14:10:18 -08:00
sctp sctp: chunk.c: correct format string for size_t in printk 2019-02-28 10:33:40 -08:00
smc net/smc: fix smc_poll in SMC_INIT state 2019-02-21 10:19:20 -08:00
strparser
sunrpc Two small fixes, one for crashes using nfs/krb5 with older enctypes, one 2019-02-16 17:38:01 -08:00
switchdev net: switchdev: Add extack to switchdev_handle_port_obj_add() callback 2018-12-12 16:34:22 -08:00
tipc tipc: fix race condition causing hung sendto 2019-02-26 14:50:50 -08:00
tls net: tls: Fix deadlock in free_resources tx 2019-01-28 23:07:08 -08:00
unix missing barriers in some of unix_sock ->addr and ->path accesses 2019-02-20 20:06:28 -08:00
vmw_vsock vsock: cope with memory allocation failure at socket creation time 2019-02-08 22:32:05 -08:00
wimax
wireless cfg80211: prevent speculation on cfg80211_classify8021d() return 2019-02-11 15:50:56 +01:00
x25 net/x25: fix a race in x25_bind() 2019-02-23 18:41:06 -08:00
xdp Revert "xsk: simplify AF_XDP socket teardown" 2019-02-21 16:32:25 +01:00
xfrm xfrm: Fix inbound traffic via XFRM interfaces across network namespaces 2019-02-18 10:58:54 +01:00
compat.c net: socket: add check for negative optlen in compat setsockopt 2019-02-22 11:49:28 -08:00
Kconfig net: convert bridge_nf to use skb extension infrastructure 2018-12-19 11:21:37 -08:00
Makefile
socket.c net: socket: set sock->sk to NULL after calling proto_ops::release() 2019-02-25 10:40:57 -08:00
sysctl_net.c