linux/net/sched
Sheng Lan 5845f70638 net: netem: fix skb length BUG_ON in __skb_to_sgvec
It can be reproduced by following steps:
1. virtio_net NIC is configured with gso/tso on
2. configure nginx as http server with an index file bigger than 1M bytes
3. use tc netem to produce duplicate packets and delay:
   tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
4. continually curl the nginx http server to get index file on client
5. BUG_ON is seen quickly

[10258690.371129] kernel BUG at net/core/skbuff.c:4028!
[10258690.371748] invalid opcode: 0000 [#1] SMP PTI
[10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
[10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
[10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
[10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
[10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
[10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
[10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
[10258690.372094] Call Trace:
[10258690.372094]  <IRQ>
[10258690.372094]  skb_to_sgvec+0x11/0x40
[10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
[10258690.372094]  dev_hard_start_xmit+0x9b/0x200
[10258690.372094]  sch_direct_xmit+0xff/0x260
[10258690.372094]  __qdisc_run+0x15e/0x4e0
[10258690.372094]  net_tx_action+0x137/0x210
[10258690.372094]  __do_softirq+0xd6/0x2a9
[10258690.372094]  irq_exit+0xde/0xf0
[10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
[10258690.372094]  apic_timer_interrupt+0xf/0x20
[10258690.372094]  </IRQ>

In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's
linear data size and nonlinear data size, thus BUG_ON triggered.
Because the skb is cloned and a part of nonlinear data is split off.

Duplicate packet is cloned in netem_enqueue() and may be delayed
some time in qdisc. When qdisc len reached the limit and returns
NET_XMIT_DROP, the skb will be retransmit later in write queue.
the skb will be fragmented by tso_fragment(), the limit size
that depends on cwnd and mss decrease, the skb's nonlinear
data will be split off. The length of the skb cloned by netem
will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(),
the BUG_ON trigger.

To fix it, netem returns NET_XMIT_SUCCESS to upper stack
when it clones a duplicate packet.

Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()")
Signed-off-by: Sheng Lan <lansheng@huawei.com>
Reported-by: Qin Ji <jiqin.ji@huawei.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28 10:31:31 -08:00
..
act_api.c net/sched: Remove egdev mechanism 2018-12-10 15:54:34 -08:00
act_bpf.c
act_connmark.c net_sched: add missing tcf_lock for act_connmark 2018-08-31 22:57:43 -07:00
act_csum.c
act_gact.c net/sched: act_gact: disallow 'goto chain' on fallback control action 2018-10-22 19:40:55 -07:00
act_ife.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-04 21:33:03 -07:00
act_ipt.c net/sched: act_ipt: fix refcount leak when replace fails 2019-02-24 17:28:55 -08:00
act_meta_mark.c
act_meta_skbprio.c
act_meta_skbtcindex.c
act_mirred.c act_mirred: clear skb->tstamp on redirect 2018-11-11 10:21:31 -08:00
act_nat.c net: sched: act_nat: remove dependency on rtnl lock 2018-09-08 10:18:25 -07:00
act_pedit.c net/sched: act_pedit: fix memory leak when IDR allocation fails 2018-11-16 19:53:45 -08:00
act_police.c net/sched: act_police: fix memory leak in case of invalid control action 2018-11-30 17:14:06 -08:00
act_sample.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-09-18 09:33:27 -07:00
act_simple.c
act_skbedit.c net/sched: act_skbedit: fix refcount leak when replace fails 2019-02-24 17:31:43 -08:00
act_skbmod.c
act_tunnel_key.c net: sched: act_tunnel_key: fix NULL pointer dereference during init 2019-02-25 10:13:38 -08:00
act_vlan.c net/core: use __vlan_hwaccel helpers 2018-11-08 20:45:04 -08:00
cls_api.c net_sched: refetch skb protocol for each filter 2019-01-16 13:25:11 -08:00
cls_basic.c
cls_bpf.c net_sched: fold tcf_block_cb_call() into tc_setup_cb_call() 2018-12-14 15:32:19 -08:00
cls_cgroup.c
cls_flow.c
cls_flower.c net: cls_flower: Remove filter from mask before freeing it 2019-02-04 09:19:14 -08:00
cls_fw.c
cls_matchall.c net_sched: fold tcf_block_cb_call() into tc_setup_cb_call() 2018-12-14 15:32:19 -08:00
cls_route.c
cls_rsvp6.c
cls_rsvp.c
cls_rsvp.h
cls_tcindex.c net_sched: fix two more memory leaks in cls_tcindex 2019-02-12 14:10:56 -05:00
cls_u32.c net_sched: fold tcf_block_cb_call() into tc_setup_cb_call() 2018-12-14 15:32:19 -08:00
em_canid.c
em_cmp.c
em_ipset.c
em_ipt.c
em_meta.c
em_nbyte.c
em_text.c
em_u32.c
ematch.c
Kconfig tc: Add support for configuring the taprio scheduler 2018-10-04 13:52:23 -07:00
Makefile tc: Add support for configuring the taprio scheduler 2018-10-04 13:52:23 -07:00
sch_api.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-12-27 13:04:52 -08:00
sch_atm.c net: sched: rename qdisc_destroy() to qdisc_put() 2018-09-25 20:17:35 -07:00
sch_blackhole.c
sch_cake.c sch_cake: Correctly update parent qlen when splitting GSO packets 2019-01-15 20:12:01 -08:00
sch_cbq.c net: sched: rename qdisc_destroy() to qdisc_put() 2018-09-25 20:17:35 -07:00
sch_cbs.c sched: Avoid dereferencing skb pointer after child enqueue 2019-01-15 20:12:00 -08:00
sch_choke.c
sch_codel.c
sch_drr.c sched: Fix detection of empty queues in child qdiscs 2019-01-15 20:12:00 -08:00
sch_dsmark.c sched: Avoid dereferencing skb pointer after child enqueue 2019-01-15 20:12:00 -08:00
sch_etf.c etf: Drop all expired packets 2018-11-16 20:39:34 -08:00
sch_fifo.c net: sched: rename qdisc_destroy() to qdisc_put() 2018-09-25 20:17:35 -07:00
sch_fq_codel.c net: Add and use skb_mark_not_on_list(). 2018-09-10 10:06:54 -07:00
sch_fq.c net_sched: sch_fq: avoid calling ktime_get_ns() if not needed 2018-11-20 09:51:32 -08:00
sch_generic.c Documentation: bring operstate documentation up-to-date 2019-02-11 12:38:51 -08:00
sch_gred.c net: sched: gred: support reporting stats from offloads 2018-11-19 18:53:46 -08:00
sch_hfsc.c sched: Fix detection of empty queues in child qdiscs 2019-01-15 20:12:00 -08:00
sch_hhf.c net: Add and use skb_mark_not_on_list(). 2018-09-10 10:06:54 -07:00
sch_htb.c sched: Avoid dereferencing skb pointer after child enqueue 2019-01-15 20:12:00 -08:00
sch_ingress.c
sch_mq.c net: sched: mq: offload a graft notification 2018-11-14 08:51:28 -08:00
sch_mqprio.c net: sched: rename qdisc_destroy() to qdisc_put() 2018-09-25 20:17:35 -07:00
sch_multiq.c net: sched: rename qdisc_destroy() to qdisc_put() 2018-09-25 20:17:35 -07:00
sch_netem.c net: netem: fix skb length BUG_ON in __skb_to_sgvec 2019-02-28 10:31:31 -08:00
sch_pie.c net: sched: pie: fix coding style issues 2018-10-07 20:39:01 -07:00
sch_plug.c
sch_prio.c sched: Avoid dereferencing skb pointer after child enqueue 2019-01-15 20:12:00 -08:00
sch_qfq.c sched: Fix detection of empty queues in child qdiscs 2019-01-15 20:12:00 -08:00
sch_red.c net: sched: red: notify drivers about RED's limit parameter 2018-11-14 08:51:28 -08:00
sch_sfb.c net: sched: rename qdisc_destroy() to qdisc_put() 2018-09-25 20:17:35 -07:00
sch_sfq.c
sch_skbprio.c
sch_taprio.c tc: Add support for configuring the taprio scheduler 2018-10-04 13:52:23 -07:00
sch_tbf.c sched: Avoid dereferencing skb pointer after child enqueue 2019-01-15 20:12:00 -08:00
sch_teql.c