This patch converts tun/tap to a multiqueue devices and expose the multiqueue
queues as multiple file descriptors to userspace. Internally, each tun_file were
abstracted as a queue, and an array of pointers to tun_file structurs were
stored in tun_structure device, so multiple tun_files were allowed to be
attached to the device as multiple queues.
When choosing txq, we first try to identify a flow through its rxhash, if it
does not have such one, we could try recorded rxq and then use them to choose
the transmit queue. This policy may be changed in the future.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add flags to be used by creating multiqueue tuntap device.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU were introduced in this patch to synchronize the dereferences between
tun_struct and tun_file. All tun_{get|put} were replaced with RCU, the
dereference from one to other must be done under rtnl lock or rcu read critical
region.
This is needed for the following patches since the one of the goal of multiqueue
tuntap is to allow adding or removing queues during workload. Without RCU,
control path would hold tx locks when adding or removing queues (which may cause
sme delay) and it's hard to change the number of queues without stopping the net
device. With the help of rcu, there's also no need for tun_file hold an refcnt
to tun_struct.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current tuntap makes use of the socket receive queue as its tx queue. To
implement multiple tx queues for tuntap and enable the ability of adding and
removing queues during workload, the first step is to move the socket related
structures to tun_file. Then we could let multiple fds/sockets to be attached to
the tuntap.
This patch removes tun_sock and moves socket related structures from tun_sock or
tun_struct to tun_file. Two exceptions are tap_filter and sock_fprog, they are
still kept in tun_structure since they are used to filter packets for the net
device instead of per transmit queue (at least I see no requirements for
them). After those changes, socket were created and destroyed during file open
and close (instead of device creation and destroy), the socket structures could
be dereferenced from tun_file instead of the file of tun_struct structure
itself.
For persisent device, since we purge during datching and wouldn't queue any
packets when no interface were attached, there's no behaviod changes before and
after this patch, so the changes were transparent to the userspace. To keep the
attributes such as sndbuf, socket filter and vnet header, those would be
re-initialize after a new interface were attached to an persist device.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
This series contains updates to ixgbe, ixgbevf, igbvf, igb and
networking core (bridge). Most notably is the addition of support
for local link multicast addresses in SR-IOV mode to the networking
core.
Also note, the ixgbe patch "ixgbe: Add support for pipeline reset" and
"ixgbe: Fix return value from macvlan filter function" is revised based
on community feedback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_<level> calls take less code than dev_printk(KERN_<LEVEL>
and reducing object size is good.
Coalesce formats for easier grep.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a follow-up for patch "net: filter: add vlan tag access"
to support the new VLAN_TAG/VLAN_TAG_PRESENT accessors in BPF JIT.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ani Sinha <ani@aristanetworks.com>
Cc: Daniel Borkmann <danborkmann@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF filters lack ability to access skb->vlan_tci
This patch adds two new ancillary accessors :
SKF_AD_VLAN_TAG (44) mapped to vlan_tx_tag_get(skb)
SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb)
This allows libpcap/tcpdump to use a kernel filter instead of
having to fallback to accept all packets, then filter them in
user space.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Ani Sinha <ani@aristanetworks.com>
Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following build failure on S390:
In file included from drivers/net/ethernet/cadence/at91_ether.c:35:0:
drivers/net/ethernet/cadence/macb.h: In function 'macb_is_gem':
drivers/net/ethernet/cadence/macb.h:563:2: error: implicit declaration of function '__raw_readl' [-Werror=implicit-function-declaration]
drivers/net/ethernet/cadence/at91_ether.c: In function 'update_mac_address':
drivers/net/ethernet/cadence/at91_ether.c:119:2: error: implicit declaration of function '__raw_writel' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware doesn't support controlling pause frames autoneg, so
report that back correctly to userspace.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Network device drivers can communicate a Toeplitz hash in skb->rxhash,
but devices differ in their hashing capabilities. All compute a 5-tuple
hash for TCP over IPv4, but for other connection-oriented protocols,
they may compute only a 3-tuple. This breaks RPS load balancing, e.g.,
for TCP over IPv6 flows. Additionally, for GRE and other tunnels,
the kernel computes a 5-tuple hash over the inner packet if possible,
but devices do not.
This patch recomputes the rxhash in software in all cases where it
cannot be certain that a 5-tuple was computed. Device drivers can avoid
recomputation by setting the skb->l4_rxhash flag.
Recomputing adds cycles to each packet when RPS is enabled or the
packet arrives over a tunnel. A comparison of 200x TCP_STREAM between
two servers running unmodified netnext with rxhash computation
in hardware vs software (using ethtool -K eth0 rxhash [on|off]) shows
how much time is spent in __skb_get_rxhash in this worst case:
0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash
0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash
0.05% swapper [kernel.kallsyms] [k] __skb_get_rxhash
With 200x TCP_RR it increases to
0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash
0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash
0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash
I considered having the patch explicitly skips recomputation when it knows
that it will not improve the hash (TCP over IPv4), but that conditional
complicates code without saving many cycles in practice, because it has
to take place after flow dissector.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use the module_pci_driver macro to make the code simpler
by eliminating module_init and module_exit calls.
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use the module_pci_driver macro to make the code simpler
by eliminating the module_init and module_exit calls
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- some code cleanups and minor fixes (3 of them were reported by Coverity)
- 'struct hard_iface' re-shaping to improve multi-protocol support
- ECTP packets silent drop
- transfer the WIFI flag on clients in case of roaming
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlCOQzIACgkQpGgxIkP9cwdjVgCgmTSMNhAOvIWG/8dV6iiAvDeP
bwIAnjZb5QeF/d4L+lRuqw5hMVTEQnJo
=SAxj
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
included changes:
- some code cleanups and minor fixes (3 of them were reported by Coverity)
- 'struct hard_iface' re-shaping to improve multi-protocol support
- ECTP packets silent drop
- transfer the WIFI flag on clients in case of roaming
The variable retval is initialized but never used
otherwise, so remove the unused variable.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for wol wakeup on unicast, broadcast,
multicast and arp frames.
The wakeup filter code isn't pretty, but it works.
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
adds a "hwaddr" to the "IP-Config: Complete" KERN_INFO message
with the dev_addr of the device selected for auto configuration.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for the net device ops to manage the embedded
hardware bridge on ixgbe devices. With this patch the bridge
mode can be toggled between VEB and VEPA to support stacking
macvlan devices or using the embedded switch without any SW
component in 802.1Qbg/br environments.
Additionally, this adds source address pruning to the ixgbevf
driver to prune any frames sent back from a reflective relay on
the switch. This is required because the existing hardware does
not support this. Without it frames get pushed into the stack
with its own src mac which is invalid per 802.1Qbg VEPA
definition.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hardware switches may support enabling and disabling the
loopback switch which puts the device in a VEPA mode defined
in the IEEE 802.1Qbg specification. In this mode frames are
not switched in the hardware but sent directly to the switch.
SR-IOV capable NICs will likely support this mode I am
aware of at least two such devices. Also I am told (but don't
have any of this hardware available) that there are devices
that only support VEPA modes. In these cases it is important
at a minimum to be able to query these attributes.
This patch adds an additional IFLA_BRIDGE_MODE attribute that can be
set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also
anticipating bridge attributes that may be common for both embedded
bridges and software bridges this adds a flags attribute
IFLA_BRIDGE_FLAGS currently used to determine if the command or event
is being generated to/from an embedded bridge or software bridge.
Finally, the event generation is pulled out of the bridge module and
into rtnetlink proper.
For example using the macvlan driver in VEPA mode on top of
an embedded switch requires putting the embedded switch into
a VEPA mode to get the expected results.
-------- --------
| VEPA | | VEPA | <-- macvlan vepa edge relays
-------- --------
| |
| |
------------------
| VEPA | <-- embedded switch in NIC
------------------
|
|
-------------------
| external switch | <-- shiny new physical
------------------- switch with VEPA support
A packet sent from the macvlan VEPA at the top could be
loopbacked on the embedded switch and never seen by the
external switch. So in order for this to work the embedded
switch needs to be set in the VEPA state via the above
described commands.
By making these attributes nested in IFLA_AF_SPEC we allow
future extensions to be made as needed.
CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
currently embedded in the ./net/bridge module. This prohibits
them from being used by other bridging devices. One example
of this being hardware that has embedded bridging components.
In order to use these nlmsg types more generically this patch
adds two net_device_ops hooks. One to set link bridge attributes
and another to dump the current bride attributes.
ndo_bridge_setlink()
ndo_bridge_getlink()
CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change fixes a sparse warning triggered by us casting the timestamp in
the packet as a u64 instead of as a __le64. This change corrects that in
order to resolve the sparse warning.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There are multiple places in our device nvm where the version is stored.
This update fixes some output errors with some types of images and
refactors the way the version data is gathered and stored for ethtool output.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Newer devices supported by igb can support auto-crossover detection in
forced operation modes. Enable this in the driver, rather than clobbering
this functionality in forced operation.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ignoring the return value from a call to the kernel dma_map API functions
can cause data corruption and system instability. Check the return value
and take appropriate action.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver should not forward LLDP type frames. Inspect the ether type and
do not send if it is an LLDP ethertype frame.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Hw timestamping code caused performance regression in ixgbe driver when the
timestamping is not enabled. The culprit is IXGBE_READ_REG call in the Rx
path which is executed for every received skb. This call is not needed when
the timestamping is disabled or for non-ptp packets.
netperf results:
The ixgbe side of the connection was acting as a server, the netperf command
line on the other side was:
netperf -H 192.168.1.23 -T0,0 -t UDP_STREAM -l 20
The values below mean throughput as reported by netperf (local/remote), for
3 runs, with timestamping not enabled.
3.7.0-rc1+ with CONFIG_IXGBE_PTP off:
5373.83 / 3329.32
5721.88 / 3033.89
5653.42 / 3112.38
3.7.0-rc1+ with CONFIG_IXGBE_PTP on:
5233.64 / 1226.85
5448.67 / 1039.32
5421.36 / 1095.66
Patched 3.7.0-rc1+ with CONFIG_IXGBE_PTP on:
5594.72 / 2942.53
5428.95 / 3110.16
5343.56 / 3200.48
Reported-by: Jesper Brouer <jbrouer@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Andy Gospodarek <gospo@redhat.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adds/updates ASCII descriptor maps for 82598 and 82599 Tx/Rx descriptors.
Current descriptor maps were out of date for 82598 and incorrect for
82599.
Signed-off-by: Josh Hay <joshua.a.hay@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that compare the total_rx_packets cleaned to budget
instead of decrementing budget. The advantage to this approach is that budget
can now be const and we only end up modifying total_rx_packets instead of
modifying both it and budget.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When setting a MAC filter for the VF the function should return a success
or failure code, not the index of the new filter. It causes spurious NACK
returns to the VF driver.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch simplifies the check for calling en/disable_tx_laser() function
pointer. The pointer is only set on parts that can use it.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In SR-IOV mode the PF driver acts as the uplink port and is
used to send control packets e.g. lldpad, stp, etc.
eth0.1 eth0.2 eth0
VF VF PF
| | | <-- stand-in for uplink
| | |
--------------------------
| Embedded Switch |
--------------------------
|
MAC <-- uplink
But the embedded switch is setup to forward multicast addresses
to all interfaces both VFs and PF and onto the physical link.
This results in reserved MAC addresses used by control protocols
to be forwarded over the switch onto the VF.
In the LLDP case the PF sends an LLDPDU and it is currently
being forwarded to all the VFs who then see the PF as a peer.
This is incorrect.
This patch adds the multicast addresses to the RAR table in the
hardware to prevent this behavior.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The function to set the macvlan filter should return success or failure
instead of the index of the filter. The message processing function was
misinterpreting the index as a non-zero return code indicating failure and
NACKing MAC filter set messages that actually succeeded.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Calling the ixgbe_reset_pipeline_82599 function will ensure a full pipeline
reset on all 82599 devices. This is necessary to avoid possible link issues.
Since this patch accomplishes this by modifying AUTOC.LMS we need to wrap
all AUTOC writes when LESM is enabled.
v2- fix LMS behaviour based on feedback by Martin Josefsson
CC: Martin Josefsson <gandalf@mjufs.se>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
in case of client roaming a new global entry is added while a corresponding
local one is still present. In this case the node can safely pass the WIFI flag
from the local to the global entry.
This change is required to let the AP-isolation correctly working in case of
roaming: if a generic WIFI client C roams from node A to B, A adds a global
entry for C without adding any WIFI flag. The latter will be set only later,
once A has received C's advertisement from B. In this time period the
AP-Isolation (if enabled) would not correctly work since C is not marked as
WIFI, so allowing it to communicate with other WIFI clients.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
In order to properly convert a bitwise AND to a boolean value, the whole
expression must be prepended by "!!".
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
In batadv_check_unicast_ttvn() the code accesses both the unicast header and the
Ethernet header in the payload. For this reason pskb_may_pull() must be invoked
to check for the required space.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
To simplify TranslationTable debugging it is better to print the packet
rerouting message on the DBG_TT log level. In this way a developer interested in
packets rerouting doesn't need to filter it out of the whole ROUTES log.
Moreover, since this message will appear for each rerouted message, it is now
"ratelimited".
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
in case of a new global entry added because of roaming, the roam_at field must
be properly initiated with the current time. This value will be later use to
purge this entry out on time out (if nobody claims it). Instead roam_at field
is now set to zero in this situation leading to an immediate purging of the
related entry.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
We have seen this to break networks when used with bridge loop
avoidance. As we can't see any benefit from sending these ancient frames
via our mesh, we just drop them.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
The test whether we can use a router for alternating bonding should only be
done once because it is already known that it is still usable and will not be
deleted from the list soon.
This patch addresses Coverity #712285: Unchecked return value
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
New operations should not be started when they need an increased module
reference counter and try_module_get failed.
This patch addresses Coverity #712284: Unchecked return value
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
batadv_bit_get_packet checks for all common situations before it decides that
the new received packet indicates that the host was restarted. This extra
condition check at the end of the function is not necessary because this
condition is always true.
This patch addresses Coverity #712296: Logically dead code
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Transmissions over batman-adv devices always start another nested transmission
over devices attached to the batman-adv interface. These devices usually use
the ethernet lockdep class for the tx_queue lock which is also set by default
for all batman-adv devices. Lockdep will detect a nested locking attempt of two
locks with the same class and warn about a possible deadlock.
This is the default and expected behavior and should not alarm the locking
correctness prove mechanism. Therefore, the locks for all netdevice specific tx
queues get a special batman-adv lock class to avoid a false positive for each
transmission.
Reported-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>