The member data in struct hclge_desc is type of __le32, it needs endian
conversion before using it, and some functions of debugfs didn't do that,
so this patch fixes it.
Fixes: c0ebebb9cc ("net: hns3: Add "dcb register" status information query function")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the workqueue of hclge/hclgevf is executed on
the CPU that initiates scheduling requests by default. In
stress scenarios, the CPU may be busy and workqueue scheduling
is completed after a long period of time. To avoid this
situation and implement proper scheduling, use the WQ_UNBOUND
mode instead. In this way, the workqueue can be performed on
a relatively idle CPU.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a TP port is configured by follow steps:
1.ethtool -s ethx autoneg off speed 100 duplex full
2.ethtool -A ethx rx on tx on
3.ethtool -s ethx autoneg on(rx&tx negotiated pause results are off)
4.ethtool -s ethx autoneg off speed 100 duplex full
In step 3, driver will set rx&tx pause parameters of hardware to off as
pause parameters negotiated with link partner are off.
After step 4, the "ethtool -a ethx" command shows both rx and tx pause
parameters are on. However, pause parameters of hardware are still off
and port has no flow control function actually.
To fix this problem, if autoneg is disabled, driver uses its saved
parameters to restore pause of hardware. If the speed is not changed in
this case, there is no link state changed for phy, it will cause the pause
parameter is not taken effect, so we need to force phy to go down and up.
Fixes: aacbe27e82 ("net: hns3: modify how pause options is displayed")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware link offload monitoring can be made to work in 3/4 cases by
switching on firmware feature bit WLANACTIVE_OFFLOAD
- Secure power-save on
- Secure power-save off
- Open power-save on
However, with an open AP if we switch off power-saving - thus never
entering Beacon Mode Power Save - BMPS, firmware never forwards loss
of beacon upwards.
We had hoped that WLANACTIVE_OFFLOAD and some fixes for sequence numbers
would unblock this but, it hasn't and further investigation is required.
Its possible to have a complete set of Secure power-save on/off and Open
power-save on/off provided we use Linux' link monitoring mechanism.
While we debug the Open AP failure we need to fix upstream.
This reverts commit c973fdad79f6eaf247d48b5fc77733e989eb01e1.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211025093037.3966022-2-bryan.odonoghue@linaro.org
If the system is resumed because of an incoming packet, the wcn36xx RX
interrupts is fired before actual resuming of the wireless/mac80211
stack, causing any received packets to be simply dropped. E.g. a ping
request causes a system resume, but is dropped and so never forwarded
to the IP stack.
This change fixes that, disabling DMA interrupts on suspend to no pass
packets until mac80211 is resumed and ready to handle them.
Note that it's not incompatible with RX irq wake.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1635150496-19290-1-git-send-email-loic.poulain@linaro.org
The firmware is offering features such as ARP offload, for which
firmware crafts its own (QoS)packets without waking up the host.
Point is that the sequence numbers generated by the firmware are
not in sync with the host mac80211 layer and can cause packets
such as firmware ARP reponses to be dropped by the AP (too old SN).
To fix this we need to let the firmware manages the sequence
numbers by its own (except for QoS null frames). There is a SN
counter for each QoS queue and one global/baseline counter for
Non-QoS.
Fixes: 84aff52e4f ("wcn36xx: Use sequence number allocated by mac80211")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1635150336-18736-1-git-send-email-loic.poulain@linaro.org
This is essentially exactly following the dma_wmb()/dma_rmb() usage
instructions in Documentation/memory-barriers.txt.
The theoretical races here are:
1. DXE (the DMA Transfer Engine in the Wi-Fi subsystem) seeing the
dxe->ctrl & WCN36xx_DXE_CTRL_VLD write before the dxe->dst_addr_l
write, thus performing DMA into the wrong address.
2. CPU reading dxe->dst_addr_l before DXE unsets dxe->ctrl &
WCN36xx_DXE_CTRL_VLD. This should generally be harmless since DXE
doesn't write dxe->dst_addr_l (no risk of freeing the wrong skb).
Fixes: 8e84c25821 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware")
Signed-off-by: Benjamin Li <benl@squareup.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211023001528.3077822-1-benl@squareup.com
On an open AP when you pull the plug on the AP, if we are not already in
BMPS mode then the firmware will not generate a disconnection event.
Instead we need to monitor for failure to enter BMPS and treat a string of
failures as connection loss.
Secure AP connections don't appear to demonstrate this behavior so the
work-around is limited to open APs only.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211022140447.2846248-2-bryan.odonoghue@linaro.org
WCNSS RX DMA transfer support is limited to 3872 bytes, which is
enough for simple MPDUs (single MSDU), but not enough for cases
with A-MSDU (depending on max AMSDU size or max MPDU size).
In that case the MPDU is spread over multiple transfers, with the
first transfer containing the MPDU header and (at least) the first
A-MSDU subframe and additional transfer(s) containing the following
A-MSDUs. This can be handled with a series of flags to tagging the
first and last A-MSDU transfers.
In that case we have to bufferize and re-linearize the A-MSDU buffers
into a proper MPDU skb before forwarding to mac80211 (in the same way
as it is done in ath10k).
This change also includes sanity check of the buffer descriptor to
prevent skb overflow.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1634557705-11120-1-git-send-email-loic.poulain@linaro.org
Until now, offload scanning for 5Ghz channels was considered broken.
However it was mostly a driver issue, caused by bad reporting of the
beacons/probe-resp bands and frequencies, which has been fixed.
We can now allow offload scan for 5GHz band, this reduces the scanning
time comparing to software driven scanning.
Note that offloaded scan is limited to 48 channels, check for this.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1634554678-7993-2-git-send-email-loic.poulain@linaro.org
For packets originating from hardware scan, the channel and band is
included in the buffer descriptor (bd->rf_band & bd->rx_ch).
For 2Ghz band the channel value is directly reported in the 4-bit
rx_ch field. For 5Ghz band, the rx_ch field contains a mapping
index (given the 4-bit limitation).
The reserved0 value field is also used to extend 4-bit mapping to
5-bit mapping to support more than 16 5Ghz channels.
This change adds correct reporting of the frequency/band, that is
used in scan mechanism. And is required for 5Ghz hardware scan
support.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1634554678-7993-1-git-send-email-loic.poulain@linaro.org
iwlwifi patches for v5.16
* Support for 160MHz in ranging measurements;
* Some fixes in HE capabilities;
* Fixes in vendor specific capabilities;
* Add the PC of both processors in error dumps;
* Small fix in TDLS;
* Code to sanitize firmware dumps;
* Updates for new FW rate and flags format;
* Continue implementation of new rate and flags format in the FW APIs;
* Some fixes for BZ family initialization;
* Fix session protection in some scenarios;
* Some debugging improvements;
* Fix BT-coex priority;
* Improve PS-poll timeout detection;
* Some other small fixes, clean-ups and improvements.
# gpg: Signature made Fri 22 Oct 2021 11:28:43 AM EEST
# gpg: using RSA key 1772CD7E06F604F5A6EBCB26A1479CA21A3CC5FA
# gpg: Good signature from "Luciano Roth Coelho (Luca) <luca@coelho.fi>" [full]
# gpg: aka "Luciano Roth Coelho (Intel) <luciano.coelho@intel.com>" [full]
This commit introduces HW-GRO offload by using the SHAMPO feature
- Add set feature handler for HW-GRO.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch adds HW_GRO counters to RX packets statistics:
- gro_match_packets: counter of received packets with set match flag.
- gro_packets: counter of received packets over the HW_GRO feature,
this counter is increased by one for every received
HW_GRO cqe.
- gro_bytes: counter of received bytes over the HW_GRO feature,
this counter is increased by the received bytes for every
received HW_GRO cqe.
- gro_skbs: counter of built HW_GRO skbs,
increased by one when we flush HW_GRO skb
(when we call a napi_gro_receive with hw_gro skb).
- gro_large_hds: counter of received packets with large headers size,
in case the packet needs new SKB, the driver will allocate
new one and will not use the headers entry to build it.
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
this patch updates the SHAMPO CQE handler to support HW_GRO,
changes in the SHAMPO CQE handler:
- CQE match and flush fields are used to determine if to build new skb
using the new received packet,
or to add the received packet data to the existing RQ.hw_gro_skb,
also this fields are used to determine when to flush the skb.
- in the end of the function mlx5e_poll_rx_cq the RQ.hw_gro_skb is flushed.
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The header buffer is used to store the headers of the rx packets.
The header buffer size deduced from WorkQueue size + restriction
of max packets per WorkQueueElement.
This commit adds the functionality for posting/updating memory for
the header buffer during the posting/updating of WQEs.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This patch adds the new CQE SHAMPO fields:
- flush: indicates that we must close the current session and pass the SKB
to the network stack.
- match: indicates that the current packet matches the oppened session,
the packet will be merge into the current SKB.
- header_size: the size of the packet headers that written into the headers
buffer.
- header_entry_index: the entry index in the headers buffer.
- data_offset: packets data offset in the WQE.
Also new cqe handler is added to handle SHAMPO packets:
- The new handler uses CQE SHAMPO fields to build the SKB.
CQE's Flush and match fields are not used in this patch, packets are not
merged in this patch.
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit introduces the control path infrastructure for SHAMPO feature.
SHAMPO feature enables packet stitching by splitting packets to
header and payload, the header is placed on a dedicated buffer
and the payload on the RX ring, this allows stitching the data part
of a flow together continuously in the receive buffer.
SHAMPO feature is implemented as linked list striding RQ feature.
To support packets splitting and payload stitching:
- Enlarge the ICOSQ and the correspond CQ to support the header buffer
memory regions.
- Add support to create linked list striding RQ with SHAMPO feature set
in the open_rq function.
- Add deallocation function and corresponded calls for SHAMPO header
buffer.
- Add mlx5e_create_umr_klm_mkey to support KLM mkey for the header
buffer.
- Rename mlx5e_create_umr_mkey to mlx5e_create_umr_mtt_mkey.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit adds the needed definitions for using the klm_umr_wqe.
UMR stands for user-mode memory registration, is a mechanism to alter
address translation properties of MKEY by posting WorkQueueElement
aka WQE on send queue.
MKEY stands for memory key, MKEY are used to describe a region in memory that
can be later used by HW.
KLM stands for {Key, Length, MemVa}, KLM_MKEY is indirect MKEY that enables
to map multiple memory spaces with different sizes in unified MKEY.
klm_umr_wqe is a UMR that use to update a KLM_MKEY.
SHAMPO feature uses KLM_MKEY for memory registration of his header buffer.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This series introduces new packet merge type, therefore rename lro
functions to packet merge to support the new merge type:
- Generalize + rename mlx5e_build_tir_ctx_lro to
mlx5e_build_tir_ctx_packet_merge.
- Rename mlx5e_modify_tirs_lro to mlx5e_modify_tirs_packet_merge.
- Rename lro bit in mlx5_ifc_modify_tir_bitmask_bits to packet_merge.
- Rename lro_en in mlx5e_params to packet_merge_type type and combine
packet_merge params into one struct mlx5e_packet_merge_param.
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit adds SHAMPO bit to hca_cap and SHAMPO capabilities structure,
SHAMPO related HW spec hardware fields and enumerations.
SHAMPO stands for: split headers and merge payload offload.
SHAMPO new fields:
WQ:
- headers_mkey: mkey that represents the headers buffer, where the packets
headers will be written by the HW.
- shampo_enable: flag to verify if the WQ supports SHAMPO feature.
- log_reservation_size: the log of the reservation size where the data of
the packet will be written by the HW.
- log_max_num_of_packets_per_reservation: log of the maximum number of
packets that can be written to the same reservation.
- log_headers_entry_size: log of the header entry size of the headers buffer.
- log_headers_buffer_entry_num: log of the entries number of the headers buffer.
RQ:
- shampo_no_match_alignment_granularity: the HW alignment granularity
in case the received packet doesn't match the current session.
- shampo_match_criteria_type: the type of match criteria.
- reservation_timeout: the maximum time that the HW will hold the
reservation.
mlx5_ifc_shampo_cap_bits, the capabilities of the SHAMPO feature:
- shampo_log_max_reservation_size: the maximum allowed value of the field
WQ.log_reservation_size.
- log_reservation_size: the minimum allowed value of the field
WQ.log_reservation_size.
- shampo_min_mss_size: the minimum payload size of packet that can open
a new session or be merged to a session.
- shampo_max_log_headers_entry_size: the maximum allowed value of the field
WQ.log_headers_entry_size
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
TIR stands for transport interface receive, the TIR object is
responsible for performing all transport related operations on
the receive side like packet processing, demultiplexing the packets
to different RQ's, etc.
lro_timeout is a field in the TIR that is used to set the timeout for lro
session, this series introduces new packet merge type, therefore rename
lro_timeout to packet_merge_timeout for all packet merge types.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add the missing endpoint max-packet sanity check to probe() to avoid
division by zero in lan78xx_tx_bh() in case a malicious device has
broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4f ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Cc: stable@vger.kernel.org # 4.3
Cc: Woojung.Huh@microchip.com <Woojung.Huh@microchip.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the network device supplies a supported interface bitmap, we can use
that during phylink's validation to simplify MAC drivers in two ways by
using the supported_interfaces bitmap to:
1. reject unsupported interfaces before calling into the MAC driver.
2. generate the set of all supported link modes across all supported
interfaces (used mainly for SFP, but also some 10G PHYs.)
Suggested-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFB originally depended on NET_CLS_ACT for traffic redirection.
But since v4.5, that may be achieved with NFT_FWD_NETDEV as well.
Fixes: 39e6dea28a ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: <stable@vger.kernel.org> # v4.5+: bcfabee1af: netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang warns:
drivers/net/ethernet/asix/ax88796c_main.c:851:24: error: address of
array 'ax_local->phydev->advertising' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
if (ax_local->phydev->advertising &&
~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ ~~
advertising cannot be NULL here if ax_local is not NULL, which cannot
happen due to the check in ax88796c_probe(). Remove the check.
Link: https://github.com/ClangBuiltLinux/linux/issues/1492
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang warns:
drivers/net/ethernet/asix/ax88796c_main.c:696:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
case SPEED_10:
^
drivers/net/ethernet/asix/ax88796c_main.c:696:2: note: insert 'break;' to avoid fall-through
case SPEED_10:
^
break;
drivers/net/ethernet/asix/ax88796c_main.c:706:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
case DUPLEX_HALF:
^
drivers/net/ethernet/asix/ax88796c_main.c:706:2: note: insert 'break;' to avoid fall-through
case DUPLEX_HALF:
^
break;
Clang is a little more pedantic than GCC, which permits implicit
fallthroughs to cases that contain just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing breaks to fix
the warning.
Link: https://github.com/ClangBuiltLinux/linux/issues/1491
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing code doesn't allow setting the number of queues while the
NIC is down.
Update the ethtool handler functions to support setting the number of
queues while the NIC is at down state.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose via devlink-resource the maximum number of RIF MAC profiles and
their current occupancy, so it can be used for debug and writing generic
tests, like in the next patch.
Example for Spectrum-2 output:
$ devlink resource show pci/0000:06:00.0
...
name rif_mac_profiles size 4 occ 0 unit entry dpipe_tables none
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mlxsw enforces that all the router interfaces (RIFs) have the
same MAC prefix.
Relax this limitation by using RIF MAC profiles. Each profile is
associated with a particular MAC prefix and multiple RIFs can use the
same profile. Therefore, the number of possible MAC prefixes is no
longer one, but the number of profiles supported by the device.
Store the profiles in an IDR and reference count them according to the
number of RIFs using them.
Associate a RIF with a profile when the RIF is created and remove the
association when the RIF is deleted.
Change the association following 'NETDEV_CHANGEADDR' events, except when
only one RIF is using the profile. In which case, change the MAC prefix
of the profile itself instead of associating the RIF with a new profile.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The next patch will set the MAC profile of a router interface (RIF) as
part of its configure() callback. The operation can fail in case the
maximum number of profiles was exceeded.
Add extack to mlxsw_sp_rif_ops::configure() in order to communicate such
failures to user space.
In addition, the MAC profile of a RIF can change following a
'NETDEV_CHANGEADDR' notification. Propagate extack to
mlxsw_sp_router_port_change_event() so that failures could be
communicated in this path as well.
No functional changes intended.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a resource identifier for maximum RIF MAC profiles so that it could
be later used to query the information from firmware.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MAC profile ID field to RITR register so that it could be used for
associating a RIF with a MAC profile ID by a later patch.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-10-25
This series contains updates to ice driver only.
Dave adds event handler for LAG NETDEV_UNREGISTER to unlink device from
link aggregate.
Yongxin Liu adds a check for PTP support during release which would
cause a call trace on non-PTP supported devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The VRF driver invokes netfilter for output+postrouting hooks so that users
can create rules that check for 'oif $vrf' rather than lower device name.
This is a problem when NAT rules are configured.
To avoid any conntrack involvement in round 1, tag skbs as 'untracked'
to prevent conntrack from picking them up.
This gets cleared before the packet gets handed to the ip stack so
conntrack will be active on the second iteration.
One remaining issue is that a rule like
output ... oif $vrfname notrack
won't propagate to the second round because we can't tell
'notrack set via ruleset' and 'notrack set by vrf driver' apart.
However, this isn't a regression: the 'notrack' removal happens
instead of unconditional nf_reset_ct().
I'd also like to avoid leaking more vrf specific conditionals into the
netfilter infra.
For ingress, conntrack has already been done before the packet makes it
to the vrf driver, with this patch egress does connection tracking with
lower/physical device as well.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>