Currently the truncated bit is set only when the mirrored packet
is larger than mtu. For certain cases, the packet might already
been truncated before sending to the erspan tunnel. In this case,
the patch detect whether the IP header's total length is larger
than the actual skb->len. If true, this indicated that the
mirrored packet is truncated and set the erspan truncate bit.
I tested the patch using bpf_skb_change_tail helper function to
shrink the packet size and send to erspan tunnel.
Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
r8169: further improvements w/o functional change
This series aims at further improving and simplifying the code w/o
any intended functional changes.
Series was tested on: RTL8169sb, RTL8168d, RTL8168e-vl
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip-specific init code includes quite some calls which are
identical for all chips. So move these calls to tp->hw_start().
In addition move rtl_set_rx_max_size() a little to make sure it's
defined before it's used. Unfortunately the diff generated by git
is a little bit hard to read.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__dev_open() calls the ndo_set_rx_mode callback anyway, so we don't
have to do it here too.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently done:
- if mac_version in (01, 02, 03, 04)
RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
- if mac_version in (01, 02, 03, 04)
rtl_set_rx_tx_config_registers(tp);
- if mac_version not in (01, 02, 03, 04)
RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
rtl_set_rx_tx_config_registers(tp);
So we do exactly the same independent of chip version and can simplify
the code.
In addition remove the call to rtl_init_rxcfg(), it's called in
rtl_init_one() already and the set bits are never touched later.
rtl_init_8168/8101 don't include this call either.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both quirk masks are the same, so we can merge them. The quirk mask
includes most bits so it's actually easier to define a mask with
the bits to keep.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tp->cp_cmd is supposed to reflect the current value of the CplusCmd
register. Several (quite old) changes however directly change this
register w/o updating tp->cp_cmd. Also we have places in the code
reading this register where we could use the cached value.
In addition:
- Properly initialize tp->cmd with the register value.
- In rtl_hw_start_8169 remove one setting of PCIMulRW because it's
set unconditionally anyway a few lines later.
- In rtl_hw_start_8168 properly mask out the INTT bits before
setting INTT_1. So far we rely on both bits being zero.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__rtl8169_set_features is used in rtl8169_set_features only, so we
can inline it. In addition:
- Remove check (features ^ dev->features), __netdev_update_features
check's already that requested features differ from current ones.
- Don't mask out unsupported flags, there's no benefit in it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RxChkSum and RxVlan aren't touched outside __rtl8169_set_features
(except in probe), so they are always in sync with dev->features.
And the RxConfig flags are set in rtl_set_rx_mode() which is
called via dev_set_rx_mode() from __dev_open().
Therefore we can safely remove this call.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake in oct_stats_strings text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Intiyaz Basha says:
====================
liquidio: enhanced ethtool --set-channels feature
For the ethtool --set-channels feature, the liquidio driver currently
accepts max combined value as the queue count configured during driver
load time, where max combined count is the total count of input and output
queues. This limitation is applicable only when SR-IOV is enabled, that
is, when VFs are created for PF. If SR-IOV is not enabled, the driver can
configure max supported (64) queues.
This series of patches are for enhancing driver to accept
max supported queues for ethtool --set-channels.
Changes in V2:
Only patch #6 was changed to fix these Sparse warnings reported by kbuild
test robot:
lio_ethtool.c:848:5: warning: symbol 'lio_23xx_reconfigure_queue_count'
was not declared. Should it be static?
lio_ethtool.c:877:22: warning: incorrect type in assignment (different
base types)
lio_ethtool.c:878:22: warning: incorrect type in assignment (different
base types)
lio_ethtool.c:879:22: warning: incorrect type in assignment (different
base types)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhancing driver to accept max supported queues for ethtool --set-channels
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved common function setup_glists to lio_core.c
and reamed it to lio_setup_glists
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moving common definition octnic_gather to octeon_network.h
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved common function delete_glists to lio_core.c
and renamed it to lio_delete_glists
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved common function list_delete_head to octeon_network.h
and renamed it to lio_list_delete_head
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved common function if_cfg_callback to lio_core.c
and renamed it to lio_if_cfg_callback.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have about 53 netdev_features_t bits defined and counting, add a
build time check to catch when an u64 type will not be enough and we
will have to convert that to a bitmap. This is done in
register_netdevice() for convenience.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck says:
====================
Clean up users of skb_tx_hash and __skb_tx_hash
I am in the process of doing some work to try and enable macvlan Tx queue
selection without using ndo_select_queue. As a part of that I will likely
need to make changes to skb_tx_hash. As such this is a clean up or refactor
of the two spots where he function has been used. In both cases it didn't
really seem like the function was being used correctly so I have updated
both code paths to not make use of the function.
My current development environment doesn't have an mlx4 or OPA vnic
available so the changes to those have been build tested only.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
I am dropping the export of __skb_tx_hash as after my patches nobody is
using it outside of the net/core/dev.c file. In addition I am renaming and
repurposing it to just be a static declaration of skb_tx_hash since that
was the only user for it at this point. By doing this the compiler can
inline it into __netdev_pick_tx as that will improve performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code in the fallback path has supported XDP in conjunction with the Tx
traffic classification for TCs for over a year now. So instead of just
calling skb_tx_hash for every packet we are better off using the fallback
since that will record the Tx queue to the socket and then that can be used
instead of having to recompute the hash every time.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is meant to clean up how the opa_vnic is obtaining entropy from
Tx packets.
The code as it was written was claiming to get 16 bits of hash, but from
what I can tell it was only ever actually getting 14 bits as it was limited
to 0 - (2^15 - 1). It then was folding the result to get a 8 bit value for
entropy.
Instead of throwing away all that input I am cutting out the middle man and
instead having the code call skb_get_hash directly and then folding the 32
bit value into a 8 bit value using a pair of shifts and XOR operations.
Execution wise this new approach should provide more entropy and be faster
since we are bypassing the reciprocal multiplication to reduce the 32b
value to 16b and instead just using a shift/XOR combination.
In addition we can drop the unneeded adapter value from the call to get the
entropy since the netdev itself isn't even needed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raghuram Chary J says:
====================
lan78xx updates along with Fixed phy Support
These series of patches handle few modifications in driver
and adds support for fixed phy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify the error messages when phy registration fails.
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove driver version info from the lan78xx driver.
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding Fixed PHY support to the lan78xx driver.
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
tcp: mmap: rework zerocopy receive
syzbot reported a lockdep issue caused by tcp mmap() support.
I implemented Andy Lutomirski nice suggestions to resolve the
issue and increase scalability as well.
First patch is adding a new getsockopt() operation and changes mmap()
behavior.
Second patch changes tcp_mmap reference program.
v4: tcp mmap() support depends on CONFIG_MMU, as kbuild bot told us.
v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option
instead of setsockopt(), feedback from Ka-Cheon Poon
v2: Added a missing page align of zc->length in tcp_zerocopy_receive()
Properly clear zc->recv_skip_hint in case user request was completed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After prior kernel change, mmap() on TCP socket only reserves VMA.
We have to use getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...)
to perform the transfert of pages from skbs in TCP receive queue into such VMA.
struct tcp_zerocopy_receive {
__u64 address; /* in: address of mapping */
__u32 length; /* in/out: number of bytes to map/mapped */
__u32 recv_skip_hint; /* out: amount of bytes to skip */
};
After a successful getsockopt(...TCP_ZEROCOPY_RECEIVE...), @length contains
number of bytes that were mapped, and @recv_skip_hint contains number of bytes
that should be read using conventional read()/recv()/recvmsg() system calls,
to skip a sequence of bytes that can not be mapped, because not properly page
aligned.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.
Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.
1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
This operation does not involve any TCP locking.
2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
the transfert of pages from skbs to one VMA.
This operation only uses down_read(¤t->mm->mmap_sem) after
holding TCP lock, thus solving the lockdep issue.
This new implementation was suggested by Andy Lutomirski with great details.
Benefits are :
- Better scalability, in case multiple threads reuse VMAS
(without mmap()/munmap() calls) since mmap_sem wont be write locked.
- Better error recovery.
The previous mmap() model had to provide the expected size of the
mapping. If for some reason one part could not be mapped (partial MSS),
the whole operation had to be aborted.
With the tcp_zerocopy_receive struct, kernel can report how
many bytes were successfuly mapped, and how many bytes should
be read to skip the problematic sequence.
- No more memory allocation to hold an array of page pointers.
16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
- skbs are freed while mmap_sem has been released
Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)
Note that memcg might require additional changes.
Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: remove Global 2 setup
Parts of the mv88e6xxx driver still write arbitrary registers of
different banks at setup time, which is misleading especially when
supporting multiple device models.
This patchset moves two features setup into the top lovel
mv88e6xxx_setup function and kills the old Global 2 register bank setup
function. It brings no functional changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The remaining values written to the Switch Management Register in the
mv88e6xxx_g2_setup function are specific to 88E6352 and older, and are
the default values anyway.
Thus remove completely this function. The mv88e6xxx driver no more
contains setup code to access arbitrary Global 2 registers.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the Device Mapping setup out of the specific Global 2 code,
into the top level device setup function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the trunking setup out of Global 2 specific setup into the top
level mv88e6xxx_setup function.
Note that the 88E6390 family calls this LAG instead of Trunk and
supports 32 possible ID routing vectors, with LAG ID bit 4 being placed
in Global 2 register 0x1D...
We don't need Trunk (or LAG) IDs for the moment, thus keep it simple.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit c59530d0d5 ("net: Move PHY statistics code into PHY
library helpers") we made net/core/ethtool.c reference symbols which are
part of the library which can be modular. David introduced a temporary
fix with 1ecd6e8ad9 ("phy: Temporary build fix after phylib changes.")
which would prevent such modularity.
This is not desireable of course, so instead, just inline the functions
into include/linux/phy.h to keep both options available.
Fixes: c59530d0d5 ("net: Move PHY statistics code into PHY library helpers")
Fixes: 1ecd6e8ad9 ("phy: Temporary build fix after phylib changes.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP GSO needs to export __udp_gso_segment to call it from ipv6.
I accidentally exported static ipv4 function __udp4_gso_segment.
Remove that EXPORT_SYMBOL_GPL.
Fixes: ee80d1ebe5 ("udp: add udp gso")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a documentation for seg_flowlabel sysctl into
Documentation/networking/ip-sysctl.txt
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
U64_MAX is well defined now while the UINT64_MAX is not, so we fall
back to drivers' own definition as below:
#ifndef UINT64_MAX
#define UINT64_MAX (u64)(~((u64)0))
#endif
I believe this is in one phy driver then copied and pasted to other phy
drivers.
Replace the UINT64_MAX with U64_MAX to clean up the source code.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use ns_to_timespec64() and timespec64_to_ns() instead of open coding
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements a misc character device named "qrtr-tun" for the purpose
of allowing user space applications to implement endpoints in the qrtr
network.
This allows more advanced (and dynamic) testing of the qrtr code as well
as opens up the ability of tunneling qrtr over a network or USB link.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata says:
====================
selftests: Add tests for mirroring to gretap
This suite tests GRE-encapsulated mirroring. The general topology that
most of the tests use is as follows, but each test defines details of
the topology based on its needs, and some tests actually use a somewhat
different topology.
+---------------------+ +---------------------+
| H1 | | H2 |
| + $h1 | | $h2 + |
+-----|---------------+ +---------------|-----+
| |
+-----|------------------------------------------------------|-----+
| SW o---> mirror | |
| +---|------------------------------------------------------|---+ |
| | + $swp1 BR $swp2 + | |
| +--------------------------------------------------------------+ |
| |
| + $swp3 + gt6 (ip6gretap) + gt4 (gretap) |
+-----|----------------:--------------------:----------------------+
| : :
+-----|----------------:--------------------:----------------------+
| + $h3 + h3-gt6(ip6gretap) + h3-gt4 (gretap) |
| H3 |
+------------------------------------------------------------------+
The following axes of configuration space are tested:
- ingress and egress mirroring
- mirroring triggered by matchall and flower
- mirroring to ipgretap and ip6gretap
- remote tunnel reachable directly or through a next-hop route
- skip_sw as well as skip_hw configurations
Apart from basic tests with the above mentioned features, the following
tests are included:
- handling of changes to neighbors pertinent to routing decisions in
mirrored underlay
- handling of configuration changes at the mirrored-to tunnel (endpoint
addresses, upness)
A suite of mlxsw-specific tests will be part of a separate submission
through linux-mlxsw patch queue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These tests set up mirroring in a situation that the configuration is
incorrect, i.e. mirrored packets, if any, are not supposed to reach
destination tunnel device. Then the configuration is rectified and
mirroring is checked to have started working.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that when a mirror to gretap or ip6gretap netdevice is configured,
changes to neighbors are reflected.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test for mirroring to a gretap and an ip6gretap netdevices such
that the mirroring action is triggered by a flower match.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test mirroring to a gretap and an ip6gretap netdevice with a bound
device, where the tunnel device and the bound device are in different
VRFs (an overlay / underlay configuration).
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test mirror to a gretap and an ip6gretap netdevice such that the remote
address of the tunnel is reachable through a next-hop route.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test for basic mirroring to gretap and ip6gretap netdevices.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To simplify implementation of mirror-to-gretap tests, extend lib.sh with
several new functions that might potentially be useful more
broadly (although right now the mirroring tests will be the only
client).
Also add mirror_lib.sh with code useful for mirroring tests,
mirror_gre_lib.sh with code specifically useful for mirror-to-gretap
tests, and mirror_gre_topo.sh that primes a given test with a good
baseline topology that the test can then tweak to its liking.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Net-next updates.
This series has 3 main features. The first is to add mqprio TC to
hardware queue mapping to avoid reprogramming hardware CoS queue
watermarks during run-time. The second is DIM improvements from
Andy Gospo. The third is some improvements to VF resource allocations
when supporting large numbers of VFs with more limited resources.
There are some additional minor improvements and a new function level
discard counter.
v2: Fixed EEPROM typo noted by Andrew Lunn.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add logic to reserve default rings at driver open time if none was
reserved during probe time. This will happen when the PF driver did
not provision minimum rings to the VF, due to more limited resources.
Driver open will only succeed if some minimum rings can be reserved.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>