Use correct return value on action creation: ACT_P_CREATED.
The use of incorrect return value could result in a situation where the
system thought a ctinfo module was listening but actually wasn't
instantiated correctly leading to an OOPS in tcf_generic_walker().
Confession time: Until very recently, development of this module has
been done on 'net-next' tree to 'clean compile' level with run-time
testing on backports to 4.14 & 4.19 kernels under openwrt. During the
back & forward porting during development & testing, the critical
ACT_P_CREATED return code got missed despite being in the 4.14 & 4.19
backports. I have now gone through the init functions, using act_csum
as reference with a fine toothed comb. Bonus, no more OOPSes. I
managed to also miss this issue till now due to the new strict
nla_parse_nested function failing validation before action creation.
As an inexperienced developer I've learned that
copy/pasting/backporting/forward porting code correctly is hard. If I
ever get to a developer conference I shall don the cone of shame.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.
It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.
[1] https://patchwork.kernel.org/patch/3787671/
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a bare block cipher in non-crypto code is almost always a bad idea,
not only for security reasons (and we've seen some examples of this in
the kernel in the past), but also for performance reasons.
In the TCP fastopen case, we call into the bare AES block cipher one or
two times (depending on whether the connection is IPv4 or IPv6). On most
systems, this results in a call chain such as
crypto_cipher_encrypt_one(ctx, dst, src)
crypto_cipher_crt(tfm)->cit_encrypt_one(crypto_cipher_tfm(tfm), ...);
aesni_encrypt
kernel_fpu_begin();
aesni_enc(ctx, dst, src); // asm routine
kernel_fpu_end();
It is highly unlikely that the use of special AES instructions has a
benefit in this case, especially since we are doing the above twice
for IPv6 connections, instead of using a transform which can process
the entire input in one go.
We could switch to the cbcmac(aes) shash, which would at least get
rid of the duplicated overhead in *some* cases (i.e., today, only
arm64 has an accelerated implementation of cbcmac(aes), while x86 will
end up using the generic cbcmac template wrapping the AES-NI cipher,
which basically ends up doing exactly the above). However, in the given
context, it makes more sense to use a light-weight MAC algorithm that
is more suitable for the purpose at hand, such as SipHash.
Since the output size of SipHash already matches our chosen value for
TCP_FASTOPEN_COOKIE_SIZE, and given that it accepts arbitrary input
sizes, this greatly simplifies the code as well.
NOTE: Server farms backing a single server IP for load balancing purposes
and sharing a single fastopen key will be adversely affected by
this change unless all systems in the pool receive their kernel
upgrades at the same time.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In patch series, commit 9195948fbf ("tipc: improve TIPC throughput by
Gap ACK blocks"), as for simplicity, the repeated retransmit failures'
detection in the function - "tipc_link_retrans()" was kept there for
broadcast retransmissions only.
This commit now reapplies this feature for link unicast retransmissions
that has been done via the function - "tipc_link_advance_transmq()".
Also, the "tipc_link_retrans()" is renamed to "tipc_link_bc_retrans()"
as it is used only for broadcast.
Acked-by: Jon Maloy <jon.maloy@ericsson.se>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like bond, add ethtool get_link_ksettings to show the total speed.
v2: no update, just repost.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One warning each on signedness, unused variable and return type.
Fixes: 10fbcdd12a ("selftests/net: add TFO key rotation selftest")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replaced incorrect hard-coded function-name in error message with
__func__.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EXPORT_SYMBOL for lapb_register was next to a different function.
Moved it to the right place.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Added index upper bound test case
- Added mark upper bound test case
- Re-worded descriptions to few cases for clarity
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This macro $IP will be used in upcoming tc tests, which require
to create interfaces etc.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While data pass suspend, reuse of rx descriptors can be disabled using
channel state & lock from cpdma layer. For this, submit to a channel
has to be disabled using state != "not active" under lock, what is done
with this patch. The same submit is used to fill rx channel while
ndo_open, when channel is idled, so add idled submit routine that
allows to prepare descs for the channel. All this simplifies code and
helps to avoid dormant mode usage and send packets only to active
channels, avoiding potential race in later on changes. Also add missed
sync barrier analogically like in other places after stopping tx
queues.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl says:
====================
stmmac: cleanups for stmmac_mdio_reset
This is a successor to my previous series "stmmac: honor the GPIO flags
for the PHY reset GPIO" from [0]. It contains only the "cleanup"
patches from that series plus some additional cleanups on top.
I broke out the actual GPIO flag handling into a separate patch which
is already part of net-next: "net: stmmac: use GPIO descriptors in
stmmac_mdio_reset" from [1]
I have build and runtime tested this on my ARM Meson8b Odroid-C1.
[0] https://patchwork.kernel.org/cover/10983801/
[1] https://patchwork.ozlabs.org/patch/1114798/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The phy_reset hook is not set anywhere. Drop it to make
stmmac_mdio_reset() smaller.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only OF platforms use the reset delays and these delays are only read in
stmmac_mdio_reset(). Move them from struct stmmac_mdio_bus_data to a
stack variable inside stmmac_mdio_reset() because that's the only usage
of these delays.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No platform uses the "reset_gpio" field from stmmac_mdio_bus_data
anymore. Drop it so we don't get any new consumers either.
Plain GPIO numbers are being deprecated in favor of GPIO descriptors. If
needed any new non-OF platform can add a GPIO descriptor lookup table.
devm_gpiod_get_optional() will find the GPIO in that case.
Suggested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change stmmac_mdio_reset() to use device_property_read_u32_array()
instead of of_property_read_u32_array().
This is meant as a cleanup because we can drop the struct device_node
variable. Also it will make it easier to get rid of struct
stmmac_mdio_bus_data (or at least make it private) in the future because
non-OF platforms can now pass the reset delays as device properties.
No functional changes (neither for OF platforms nor for ones that are
not using OF, because the modified code is still contained in an "if
(priv->device->of_node)").
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A simplified version of the existing code looks like this:
if (priv->device->of_node) {
struct device_node *np = priv->device->of_node;
if (!np)
return 0;
The second "if" never evaluates to true because the first "if" checks
for exactly the opposite.
Drop the redundant check and early return to make the code easier to
understand.
No functional changes intended.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This config option makes only couple of lines optional.
Two small helpers and an int in couple of cls structs.
Remove the config option and always compile this in.
This saves the user from unexpected surprises when he adds
a filter with ingress device match which is silently ignored
in case the config option is not set.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Abit Fatal1ty F-190HD has a PCI ID quirk and the entry marks this
board as not GBit-capable, what is wrong. According to [0] the board
has a RTL8111B that is GBit-capable, therefore remove the
RTL_CFG_NO_GBIT flag.
[0] https://www.centos.org/forums/viewtopic.php?t=23390
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of PHYLINK conversion we stopped parsing the phy-handle property
from DT. Unfortunatelly, some wrapper drivers still rely on this phy
node to configure the PHY.
Let's restore the parsing of PHY handle while these wrapper drivers are
not fully converted to PHYLINK.
Fixes: 74371272f9 ("net: stmmac: Convert to phylink and remove phylib logic")
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu says:
====================
Reuse ptp_qoriq driver for dpaa2-ptp
Although dpaa2-ptp.c driver is a fsl_mc_driver which
is using MC APIs for register accessing, it's same IP
block with eTSEC/DPAA/ENETC 1588 timer.
This patch-set is to convert to reuse ptp_qoriq driver by
using register ioremap and dropping related MC APIs.
However the interrupts could only be handled by MC which
fires MSIs to ARM cores. So the interrupt enabling and
handling still rely on MC APIs. MC APIs for interrupt
and PPS event support are also added by this patch-set.
---
Changes for v2:
- Allowed to compile with COMPILE_TEST.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add interrupt support for dpaa2 ptp clock,
including MC APIs and PPS interrupt support. Other events
haven't been supported in MC by now.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add ptp timer device tree node for dpaa2
platforms(ls1088a/ls208xa/lx2160a).
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although dpaa2-ptp.c driver is a fsl_mc_driver which
is using MC APIs for register accessing, it's same IP
block with eTSEC/DPAA/ENETC 1588 timer.
This patch is to convert to reuse ptp_qoriq driver by
using register ioremap and dropping related MC APIs.
However the interrupts could only be handled by MC which
fires MSIs to ARM cores. So the interrupt enabling and
handling still rely on MC APIs.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add QorIQ PTP support for DPAA2.
Although dpaa2-ptp.c driver is a fsl_mc_driver which
is using MC APIs for register accessing, it's same
IP block with eTSEC/DPAA/ENETC 1588 timer. We will
convert to reuse ptp_qoriq driver by using register
ioremap and dropping related MC APIs.
Also allow to compile ptp_qoriq with COMPILE_TEST.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'devm_kasprintf' is less verbose than:
snprintf(NULL, 0, ...);
devm_kzalloc(...);
sprintf
so use it instead.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DSA ports must flood unknown unicast and multicast, but the switch
must not flood the CPU ports with unknown multicast, as this results
in a lot of undesirable traffic that the network stack needs to filter
in software.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot says:
====================
net: dsa: use switchdev attr and obj handlers
This series reduces boilerplate in the handling of switchdev attribute and
object operations by using the switchdev_handle_* helpers, which check the
targeted devices and recurse into their lower devices.
This also brings back the ability to inspect operations targeting the bridge
device itself (where .orig_dev and .dev were originally the bridge device),
even though that is of no use yet and skipped by this series.
Changes in v2: Only VLAN and (non-host) MDB objects not directly targeting
the slave device are unsupported at the moment, so only skip these cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the dsa_slave_switchdev_port_{attr_set,obj}_event functions
in favor of the switchdev_handle_port_{attr_set,obj_add,obj_del}
helpers which recurse into the lower devices of the target interface.
This has the benefit of being aware of the operations made on the
bridge device itself, where orig_dev is the bridge, and dev is the
slave. This can be used later to configure the hardware switches.
Only VLAN and (port) MDB objects not directly targeting the slave
device are unsupported at the moment, so skip this case in their
respective case statements.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switchdev handle helpers make use of a device checking helper
requiring a const net_device. Make dsa_slave_dev_check compliant
to this.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A port may trigger operations on its dedicated CPU port, so using
cpu_dp as const will raise warnings. Make cpu_dp non const.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current DSA code handling switchdev objects does not recurse into
the lower devices thus is never called with an orig_dev member being
a bridge device, hence remove this useless check.
At the same time, remove the comments about the callers, which is
unlikely to be updated if the code changes and thus will be confusing.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2019-06-14
This series contains updates to i40e only.
Aleksandr adds stub functions for Energy Efficient Ethernet (EEE) to
currently report that it is not supported in i40e. Fixed up the Link
Layer Detection Protocol (LLDP) code to ensure we do not set the LLDP
flag too early before we ensure that we have a successful start. This
also will prevent needles restarting of the device if LLDP did not
change its state with an unsuccessful start.
Piotr bumps up the amount of VLANs that an untrusted VF can implement,
from 8 VLANs to 16. Adds checks to the Virtual Embedded Bridge (VEB)
and channel arrays so access does not exceed the boundary and ensure the
index is below the maximum. Fixed an issue in the driver where we were
not checking the response from the LLDP flag and were returned success
no matter what the value of the response was.
Mitch fixes a variable counter, which can be negative in value so make
it an integer instead of an unsigned-integer.
Doug improves the admin queue log granularity by making it possible to
log only the admin queue descriptors without the entire admin queue
message buffers.
Sergey fixes up the virtchnl code by removing duplicate checks, ensure
the variable type is correct when comparing integers, enhance error and
warning messages to include useful information.
Adam fixes a potential kernel panic when the i40e driver was being bound
to a non-i40e port by adding a check on the BAR size to ensure it is
large enough by reading the highest register.
Jake fixes a statistics error in the "transmit errors" stat, which was
being calculated twice.
Gustavo A. R. Silva adds a fall-through code comment to help with
compiler checks.
v2: Fixed the return values wrapped in parenthesis in patch 8 and
cleaned up the commit message in patch 12 so the Gustavo does
not repeat himself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This was originally passed through to the VRF logic in compute_score().
But that logic has now been replaced by udp_sk_bound_dev_eq() and so
this code is no longer used or needed.
Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally this was used by the VRF logic in compute_score(), but that
was later replaced by udp_sk_bound_dev_eq() and the parameter became
unused.
Note this change adds an 'unused variable' compiler warning that will be
removed in the next patch (I've split the removal in two to make review
slightly easier).
Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we want to set a EDT time for the skb we want to send
via ip_send_unicast_reply(), we have to pass a new parameter
and initialize ipc.sockc.transmit_time with it.
This fixes the EDT time for ACK/RST packets sent on behalf of
a TIME_WAIT socket.
Fixes: a842fe1425 ("tcp: add optional per socket transmit delay")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pointer members of an object with static storage duration, if not
explicitly initialized, will be initialized to a NULL pointer. The
net namespace API checks if this pointer is not NULL before using it,
it are safe to remove the function.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: flower: loosen L4 checks and add extack to flower offload
Pieter says:
This set allows the offload of filters that make use of an unknown
ip protocol, given that layer 4 is being wildcarded. The set then
aims to make use of extack messaging for flower offloads. It adds
about 70 extack messages to the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use extack messages in flower offload when compiling match and actions
messages that will configure hardware.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use extack messages in flower offload, specifically focusing on
the extack use in add offload, remove offload and get stats paths.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matching on fields with a protocol that is unknown to hardware
is not strictly unsupported. Determine if hardware can offload
a filter with an unknown protocol by checking if any L4 fields
are being matched as well.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mlx5 devlink health fw reporters and sw reset support
This series provides mlx5 firmware reset support and firmware devlink health
reporters.
1) Add initial mlx5 kernel documentation and include devlink health reporters
2) Add CR-Space access and FW Crdump snapshot support via devlink region_snapshot
3) Issue software reset upon FW asserts
4) Add fw and fw_fatal devlink heath reporters to follow fw errors indication by
dump and recover procedures and enable trigger these functionality by user.
4.1) fw reporter:
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of fw error such as fw syndrome by triggering
fw core dump and storing it and any other fw trace into the dump buffer.
The fw reporter diagnose command can be triggered any time by the user to check
current fw status.
4.2) fw_fatal repoter:
The fw_fatal reporter implements dump and recover callbacks.
It follows fatal errors indications by CR-space dump and recover flow.
The CR-space dump uses vsc interface which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors. The
CR-space dump is stored as a memory region snapshot to ease read by address.
The recover function runs recover flow which reloads the driver and triggers fw
reset if needed.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0CsLgACgkQSD+KveBX
+j7mFwf+MYvIbUO4mXyoZIezci1UCzt1vNAkUYPceE94O9fK68ItrwtwrstgIqqS
58Tgx//MXxPpe9k9NIWjeS3i8sjcb8fDoqkjOCj7KAchv0IhSUvYFRpBrUK+yTOW
NIIXZzuCgIoR9a/hVlT/lhG+dm4MX2L5dWFtORLxMoO+ff3yiy4nNf9+Zdt0H7LT
YCELWnKeIQCvdzJAxX7OyTh3eOfc/h7o1nOsU4VugBHxKxx4T+9A26d+cZeZH5Ox
3ikTCc01ivVHqcLydAy96HQu0MENSNYNpmyDxWum3oJGFFu6hBQTM2ueRmVWZfwH
DRu+hhxONZROxxtpmP/ULmwYcLnBHg==
=VhXt
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2019-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-06-13
Mlx5 devlink health fw reporters and sw reset support
This series provides mlx5 firmware reset support and firmware devlink health
reporters.
1) Add initial mlx5 kernel documentation and include devlink health reporters
2) Add CR-Space access and FW Crdump snapshot support via devlink region_snapshot
3) Issue software reset upon FW asserts
4) Add fw and fw_fatal devlink heath reporters to follow fw errors indication by
dump and recover procedures and enable trigger these functionality by user.
4.1) fw reporter:
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of fw error such as fw syndrome by triggering
fw core dump and storing it and any other fw trace into the dump buffer.
The fw reporter diagnose command can be triggered any time by the user to check
current fw status.
4.2) fw_fatal repoter:
The fw_fatal reporter implements dump and recover callbacks.
It follows fatal errors indications by CR-space dump and recover flow.
The CR-space dump uses vsc interface which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors. The
CR-space dump is stored as a memory region snapshot to ease read by address.
The recover function runs recover flow which reloads the driver and triggers fw
reset if needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Multipath hash policy value of 0 isn't distributing since the outer IP
dest and src aren't varied eventhough the inner ones are. Since the flow
is on the inner ones in the case of tunneled traffic, hashing on them is
desired.
This is done mainly for IP over GRE, hence only tested for that. But
anything else supported by flow dissection should work.
v2: Use skb_flow_dissect_flow_keys() directly so that other tunneling
can be supported through flow dissection (per Nikolay Aleksandrov).
v3: Remove accidental inclusion of ports in the hash keys and clarify
the documentation (Nikolay Alexandrov).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NAPI tx mode improves TCP behavior by enabling TCP small queues (TSQ).
TSQ reduces queuing ("bufferbloat") and burstiness.
Previous measurements have shown significant improvement for
TCP_STREAM style workloads. Such as those in commit 86a5df1495
("Merge branch 'virtio-net-tx-napi'").
There has been uncertainty about smaller possible regressions in
latency due to increased reliance on tx interrupts.
The above results did not show that, nor did I observe this when
rerunning TCP_RR on Linux 5.1 this week on a pair of guests in the
same rack. This may be subject to other settings, notably interrupt
coalescing.
In the unlikely case of regression, we have landed a credible runtime
solution. Ethtool can configure it with -C tx-frames [0|1] as of
commit 0c465be183 ("virtio_net: ethtool tx napi configuration").
NAPI tx mode has been the default in Google Container-Optimized OS
(COS) for over half a year, as of release M70 in October 2018,
without any negative reports.
Link: https://marc.info/?l=linux-netdev&m=149305618416472
Link: https://lwn.net/Articles/507065/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To remove rtnl lock dependency in tc filter update API when using clsact
Qdisc, set QDISC_CLASS_OPS_DOIT_UNLOCKED flag in clsact Qdisc_class_ops.
Clsact Qdisc ops don't require any modifications to be used without rtnl
lock on tc filter update path. Implementation never changes its q->block
and only releases it when Qdisc is being destroyed. This means it is enough
for RTM_{NEWTFILTER|DELTFILTER|GETTFILTER} message handlers to hold clsact
Qdisc reference while using it without relying on rtnl lock protection.
Unlocked Qdisc ops support is already implemented in filter update path by
unlocked cls API patch set.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn says:
====================
enable and use static_branch_deferred_inc
1. make static_branch_deferred_inc available if !CONFIG_JUMP_LABEL
2. convert the existing STATIC_KEY_DEFERRED_FALSE user to this api
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Deferred static key clean_acked_data_enabled uses the deferred
variants of dec and flush. Do the same for inc.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>