linux/net/core
Nikolay Aleksandrov 4b9b1cdf83 net: fix wrong mac_len calculation for vlans
After 1e785f48d2 ("net: Start with correct mac_len in
skb_network_protocol") skb->mac_len is used as the starting offset of the
calculation in skb_network_protocol(), but that is not always correct. If
skb->protocol == 8021Q/AD, the vlan header is usually already inserted
in the skb (i.e. vlan reorder hdr == 0). Usually when the packet enters
dev_hard_xmit it has mac_len == 0, so we take 2 bytes from the
destination mac address (skb->data + VLAN_HLEN) as the type in
skb_network_protocol() and return vlan_depth == 4. When TSO is
off, mac_len is set, but it is == 18 (ETH_HLEN + VLAN_HLEN), so
skb_network_protocol() returns a type from inside the packet and
offset == 22. Also make vlan_depth unsigned as suggested before.
As suggested by Eric Dumazet, move the while() loop into the if() so we
can avoid additional testing in the fast path.
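
The offset handling described above can be sketched in userspace C. This is
only an illustration, not the kernel code: network_protocol(), read_be16(),
is_vlan(), sample_frame and self_check() are hypothetical names, a plain byte
buffer stands in for the skb, and the constants are redefined locally. It
assumes the vlan header is already in the frame, so the walk starts at
mac_len - VLAN_HLEN when mac_len is set and at ETH_HLEN otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN      14u      /* dst mac + src mac + ethertype */
#define VLAN_HLEN     4u       /* TCI + encapsulated ethertype */
#define ETH_P_IP      0x0800u
#define ETH_P_8021Q   0x8100u
#define ETH_P_8021AD  0x88A8u

static int is_vlan(uint16_t type)
{
	return type == ETH_P_8021Q || type == ETH_P_8021AD;
}

/* Read a big-endian 16-bit value from the wire. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical stand-in for the lookup: returns the network protocol
 * and stores the offset of the network header in *depth.
 * Returns 0 if the frame is too short or mac_len is inconsistent.
 */
static uint16_t network_protocol(const uint8_t *data, size_t len,
				 uint16_t protocol, unsigned int mac_len,
				 unsigned int *depth)
{
	unsigned int vlan_depth = mac_len;
	uint16_t type = protocol;

	if (is_vlan(type)) {
		if (vlan_depth) {
			/* mac_len already accounts for one vlan header */
			if (vlan_depth < VLAN_HLEN)
				return 0;
			vlan_depth -= VLAN_HLEN;
		} else {
			/* mac_len not set: the first tag follows the mac addresses */
			vlan_depth = ETH_HLEN;
		}
		do {
			if (len < vlan_depth + VLAN_HLEN)
				return 0;
			/* the encapsulated ethertype sits 2 bytes into the vlan header */
			type = read_be16(data + vlan_depth + 2);
			vlan_depth += VLAN_HLEN;
		} while (is_vlan(type));
	}

	*depth = vlan_depth;
	return type;
}

/* Made-up frame: one 802.1Q tag carrying IPv4. */
static const uint8_t sample_frame[] = {
	0x02, 0x00, 0x00, 0x00, 0x00, 0x01,	/* destination mac */
	0x02, 0x00, 0x00, 0x00, 0x00, 0x02,	/* source mac */
	0x81, 0x00,				/* outer ethertype: 802.1Q */
	0x00, 0x03,				/* TCI */
	0x08, 0x00,				/* encapsulated ethertype: IPv4 */
	0x45, 0x00,				/* start of the IPv4 header */
};

/* Both starting points must agree on the same type and offset. */
static void self_check(void)
{
	unsigned int depth = 0;

	assert(network_protocol(sample_frame, sizeof(sample_frame),
				ETH_P_8021Q, 0, &depth) == ETH_P_IP);
	assert(depth == ETH_HLEN + VLAN_HLEN);
	assert(network_protocol(sample_frame, sizeof(sample_frame),
				ETH_P_8021Q, 18, &depth) == ETH_P_IP);
	assert(depth == ETH_HLEN + VLAN_HLEN);
}
```

Calling self_check() from a main() exercises both the mac_len == 0 and the
mac_len == 18 case on the sample frame; in both the walk ends at offset 18
with type 0x800.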

Here are a few netperf tests and debug printk's to illustrate:
- Vlan -> device (reorder on, default, this case is okay)
cat netperf.tso-on.reorder-on.bugged
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    7111.54
[   81.605435] skb->len 65226 skb->gso_size 1448 skb->proto 0x800
skb->mac_len 0 vlan_depth 0 type 0x800

- Vlan -> device (reorder off, bad)
cat netperf.tso-on.reorder-off.bugged
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00     241.35
[  204.578332] skb->len 1518 skb->gso_size 0 skb->proto 0x8100
skb->mac_len 0 vlan_depth 4 type 0x5301
0x5301 is the last two bytes of the destination mac.

And if we turn TSO off, we may even get the following:
[   83.343156] skb->len 2966 skb->gso_size 1448 skb->proto 0x8100
skb->mac_len 18 vlan_depth 22 type 0xb84
This happens because mac_len already accounts for VLAN_HLEN.

After the fix:
cat netperf.tso-on.reorder-off.fixed
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.01    5001.46
[   81.888489] skb->len 65230 skb->gso_size 1448 skb->proto 0x8100
skb->mac_len 0 vlan_depth 18 type 0x800

CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Borkman <dborkman@redhat.com>
CC: David S. Miller <davem@davemloft.net>

Fixes: 1e785f48d29a ("net: Start with correct mac_len in skb_network_protocol")
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 19:39:13 -07:00
..
datagram.c net, datagram: fix the incorrect comment in zerocopy_sg_from_iovec() 2013-10-29 00:19:04 -04:00
dev_addr_lists.c net: Correctly sync addresses from multiple sources to single device 2014-01-23 13:06:34 -08:00
dev_ioctl.c net_tstamp: Add SIOCGHWTSTAMP ioctl to match SIOCSHWTSTAMP 2013-11-19 19:07:21 +00:00
dev.c net: fix wrong mac_len calculation for vlans 2014-06-01 19:39:13 -07:00
drop_monitor.c net: drop_monitor: fix the value of maxattr 2013-12-09 21:10:38 -05:00
dst.c ipv4: add a sock pointer to dst->output() path. 2014-04-15 13:47:15 -04:00
ethtool.c net: add busy_poll device feature 2014-04-03 14:31:34 -04:00
fib_rules.c net: fix 'ip rule' iif/oif device rename 2014-02-09 19:02:52 -08:00
filter.c net: filter: initialize A and X registers 2014-04-23 15:34:41 -04:00
flow_dissector.c net: Rename skb->rxhash to skb->hash 2014-03-26 15:58:20 -04:00
flow.c CPU hotplug notifiers registration fixes for 3.15-rc1 2014-04-07 14:55:46 -07:00
gen_estimator.c net_sched: add 64bit rate estimators 2013-06-11 02:51:03 -07:00
gen_stats.c net_sched: add 64bit rate estimators 2013-06-11 02:51:03 -07:00
iovec.c net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
link_watch.c net: make all team port device link events urgent 2013-06-13 02:31:41 -07:00
Makefile net: ptp: move PTP classifier in its own file 2014-04-01 16:43:18 -04:00
neighbour.c neigh: set nud_state to NUD_INCOMPLETE when probing router reachability 2014-05-13 12:43:05 -04:00
net_namespace.c rtnetlink: wait for unregistering devices in rtnl_link_unregister() 2014-05-15 15:30:33 -04:00
net-procfs.c rps: selective flow shedding during softnet overflow 2013-05-20 13:48:04 -07:00
net-sysfs.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-04-02 20:53:45 -07:00
net-sysfs.h net: netdev_kobject_init: annotate with __init 2014-01-05 20:27:54 -05:00
net-traces.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
netclassid_cgroup.c cgroup: drop @skip_css from cgroup_taskset_for_each() 2014-02-13 06:58:41 -05:00
netevent.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
netpoll.c netpoll: Use skb_irq_freeable to make zap_completion_queue safe. 2014-04-01 17:53:36 -04:00
netprio_cgroup.c cgroup: drop const from @buffer of cftype->write_string() 2014-03-19 10:23:54 -04:00
pktgen.c pktgen: be friendly to LLTX devices 2014-04-12 01:59:38 -04:00
ptp_classifier.c net: ptp: move PTP classifier in its own file 2014-04-01 16:43:18 -04:00
request_sock.c net: remove unnecessary return's 2014-02-13 18:33:38 -05:00
rtnetlink.c rtnetlink: wait for unregistering devices in rtnl_link_unregister() 2014-05-15 15:30:33 -04:00
scm.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-09-07 14:35:32 -07:00
secure_seq.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-10-23 16:49:34 -04:00
skbuff.c net: gro: make sure skb->cb[] initial content has not to be zero 2014-05-16 17:24:54 -04:00
sock_diag.c net: Move the permission check in sock_diag_put_filterinfo to packet_diag_dump 2014-04-24 13:44:53 -04:00
sock.c net: Add variants of capable for use on on sockets 2014-04-24 13:44:53 -04:00
stream.c net: replace macros net_random and net_srandom with direct calls to prandom 2014-01-14 15:15:25 -08:00
sysctl_net_core.c rps: NUMA flow limit allocations 2013-12-19 19:00:07 -05:00
timestamping.c net: ptp: move PTP classifier in its own file 2014-04-01 16:43:18 -04:00
user_dma.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
utils.c net: avoid dependency of net_get_random_once on nop patching 2014-05-14 00:37:34 -04:00