John reports:
Recent refactoring of fl_change aims to use the classifier spinlock to
avoid the need for rtnl lock. In doing so, the fl_hw_replace_filer()
function was moved to before the lock is taken. This can create problems
for drivers if duplicate filters are created (commmon in ovs tc offload
due to filters being triggered by user-space matches).
Drivers registered for such filters will now receive multiple copies of
the same rule, each with a different cookie value. This means that the
drivers would need to do a full match field lookup to determine
duplicates, repeating work that will happen in flower __fl_lookup().
Currently, drivers do not expect to receive duplicate filters.
To fix this, verify that filter with same key is not present in flower
classifier hash table and insert the new filter to the flower hash table
before offloading it to hardware. Implement helper function
fl_ht_insert_unique() to atomically verify/insert a filter.
This change makes filter visible to fast path at the beginning of
fl_change() function, which means it can no longer be freed directly in
case of error. Refactor fl_change() error handling code to deallocate the
filter with rcu timeout.
Fixes: 620da48608 ("net: sched: flower: refactor fl_change")
Reported-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NeilBrown says:
====================
Convert rhashtable to use bitlocks
This series converts rhashtable to use a per-bucket bitlock
rather than a separate array of spinlocks.
This:
reduces memory usage
results in slightly fewer memory accesses
slightly improves parallelism
makes a configuration option unnecessary
The main change from previous version is to use a distinct type for
the pointer in the bucket which has a bit-lock in it. This
helped find two places where rht_ptr() was missed, one
in rhashtable_free_and_destroy() in print_ht in the test code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Native bit_spin_locks are not tracked by lockdep.
The bit_spin_locks used for rhashtable buckets are local
to the rhashtable implementation, so there is little opportunity
for the sort of misuse that lockdep might detect.
However locks are held while a hash function or compare
function is called, and if one of these took a lock,
a misbehaviour is possible.
As it is quite easy to add lockdep support this unlikely
possibility seems to be enough justification.
So create a lockdep class for bucket bit_spin_lock and attach
through a lockdep_map in each bucket_table.
Without the 'nested' annotation in rhashtable_rehash_one(), lockdep
correctly reports a possible problem as this lock is taken
while another bucket lock (in another table) is held. This
confirms that the added support works.
With the correct nested annotation in place, lockdep reports
no problems.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
bucket pointer to lock the hash chain for that bucket.
The benefits of a bit spin_lock are:
- no need to allocate a separate array of locks.
- no need to have a configuration option to guide the
choice of the size of this array
- locking cost is often a single test-and-set in a cache line
that will have to be loaded anyway. When inserting at, or removing
from, the head of the chain, the unlock is free - writing the new
address in the bucket head implicitly clears the lock bit.
For __rhashtable_insert_fast() we ensure this always happens
when adding a new key.
- even when lockings costs 2 updates (lock and unlock), they are
in a cacheline that needs to be read anyway.
The cost of using a bit spin_lock is a little bit of code complexity,
which I think is quite manageable.
Bit spin_locks are sometimes inappropriate because they are not fair -
if multiple CPUs repeatedly contend of the same lock, one CPU can
easily be starved. This is not a credible situation with rhashtable.
Multiple CPUs may want to repeatedly add or remove objects, but they
will typically do so at different buckets, so they will attempt to
acquire different locks.
As we have more bit-locks than we previously had spinlocks (by at
least a factor of two) we can expect slightly less contention to
go with the slightly better cache behavior and reduced memory
consumption.
To enhance type checking, a new struct is introduced to represent the
pointer plus lock-bit
that is stored in the bucket-table. This is "struct rhash_lock_head"
and is empty. A pointer to this needs to be cast to either an
unsigned lock, or a "struct rhash_head *" to be useful.
Variables of this type are most often called "bkt".
Previously "pprev" would sometimes point to a bucket, and sometimes a
->next pointer in an rhash_head. As these are now different types,
pprev is NULL when it would have pointed to the bucket. In that case,
'blk' is used, together with correct locking protocol.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than returning a pointer to a static nulls, rht_bucket_var()
now returns NULL if the bucket doesn't exist.
This will make the next patch, which stores a bitlock in the
bucket pointer, somewhat cleaner.
This change involves introducing __rht_bucket_nested() which is
like rht_bucket_nested(), but doesn't provide the static nulls,
and changing rht_bucket_nested() to call this and possible
provide a static nulls - as is still needed for the non-var case.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nested_table_alloc() relies on the fact that there is
at most one spinlock allocated for every slot in the top
level nested table, so it is not possible for two threads
to try to allocate the same table at the same time.
This assumption is a little fragile (it is not explicit) and is
unnecessary as cmpxchg() can be used instead.
A future patch will replace the spinlocks by per-bucket bitlocks,
and then we won't be able to protect the slot pointer with a spinlock.
So replace rcu_assign_pointer() with cmpxchg() - which has equivalent
barrier properties.
If it the cmp fails, free the table that was just allocated.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Murali Karicheri says:
====================
net: hsr: improvements and bug fixes
This series has some coding style fixes and other bug fixes.
Patch 12/14, I have also done SPDX conversion. Not sure if
that is the only thing needed and is correct. So please pay
close attention to this patch before merge as I would like to
avoid any issue related to licensing applicable for this code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
HSR should forget nodes after configured node forget time expiry based
on HSR_NODE_FORGET_TIME. As part of hsr_prune_nodes(), code checks to
see if entries are to be flushed out if not heard for longer than forget
time. But currently hsr_prune_nodes() is called only once during device
creation. Restart the timer at the end of hsr_prune_nodes() so that
hsr_prune_nodes() gets called periodically and forgotten entries are
removed from node table.
Signed-off-by: Aaron Kramer <a-kramer@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a debugfs interface to allow display the nodes learned
by the hsr master.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use SPDX-License-Identifier instead of a verbose license text.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a blank line after function declaration as suggested by
checkpatch.pl -f
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current driver code uses camel case in many places. This is
seen when ran checkpatch.pl -f on files under net/hsr. This
patch fixes the code to remove camel case usage.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add missing space around operator in code. This is
seen when ran checkpatch.pl -f on files under net/hsr.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a multi-line statement exceeding 80 characters, logical operator
should be at the end of a line instead of being at the start. This
is seen when ran checkpatch.pl -f on files under net/hsr. The change
is per suggestion from checkpatch.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes unnecessary space after a cast. This is seen
when ran checkpatch.pl -f on files under net/hsr.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces all instance of NULL checks such as
if (foo == NULL) with if (!foo)
Also
if (foo != NULL) with if (foo)
This is seen when ran checkpatch.pl -f on files under net/hsr
and suggestion is to replace as above.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes function calls that ends with '(' in a line.
This is seen when ran checkpatch.pl -f option on files under
net/hsr.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes alignment issues in code for functions. This is
seen when ran checkpatch.pl -f option on files under net/hsr.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes unnecessary paranthesis from the code. This is
seen when ran checkpatch.pl -f option on files under net/hsr.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes multiple blank lines in the code. This is seen
when ran checkpatch.pl -f option for files under net/hsr
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes lines exceeding 80 characters. This is seen
when ran checkpatch.pl with -f option for files under
net/hsr.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test is split in two, the first part checks if a report creates a
corresponding mdb entry and if traffic is properly forwarded to it, and
the second part checks if the mdb entry is deleted after a leave and
if traffic is *not* forwarded to it. Since the mcast querier is enabled
we should see standard mcast snooping bridge behaviour.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mskid is unsigned and hence comparisons with negative
error return values are always false. Fix this by making mskid an
int.
Fixes: f058e46855 ("net: hns: fix ICMP6 neighbor solicitation messages discard problem")
Addresses-Coverity: ("Operands don't affect result")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mario Limonciello says:
====================
r8152: Support runtime changes of vendor mac passthu policy
On some platforms ACPI method `\\_SB.AMAC` is dynamic and changes to it can
influence changing the behavior of MAC pass through and what MAC address is used.
When running USB reset, re-read the MAC address to use to support tools that
change the policy.
This is quite similar to using `SIOCSIFHWADDR` except that the actual MAC to use
comes from ASL rather than from userspace.
Changes from v1:
* Remove an extra unneeded `ether_addr_copy` call
* Use `dev_set_mac_address` to ensure all notifiers are called
* Shuffle functions to allow code re-use.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
On some platforms it is possible to dynamically change the policy
of what MAC address is selected from the ASL at runtime.
These tools will reset the USB device and expect the change to be
made immediately.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This already happens later on in `rtl8152_set_mac_address`
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The non-null check on tskb is always false because it is in an else
path of a check on tskb and hence tskb is null in this code block.
This is check is therefore redundant and can be removed as well
as the label coalesc.
if (tsbk) {
...
} else {
...
if (unlikely(!skb)) {
if (tskb) /* can never be true, redundant code */
goto coalesc;
return;
}
}
Addresses-Coverity: ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jerome Brunet says:
====================
net: phy: add Amlogic g12a support
This patchset adds the necessary bits to support network on the Amlogic
g12a SoC family.
Only the internal PHY and related MDIO mux needed to be addressed.
The GMAC remains compatible with axg SoC family
This series has been tested on the u200 (S905D2) with both the internal
and external (Realtek) PHYs.
Change since v2 [1]:
* Change 'clk part' Reviewed-by as suggested
* Remove default callback from phy drivers
* Use exact match PHY macros
* Default MDIO g12a as module if ARCH_MESON is enabled
* Don't print error on probe defer in the g12a mdio mux
Change since v1 [0]:
* drop '_' from function name unrelated to locking
* fix peripheral clock disable on error
* fix variable declaration reverse Xmas trees
* fix Kconfig dependency on CCF
(Actually needed for 'struct clk_hw', Thx Andrew !)
* Minor fix in the DT exemple as reported by Rob
[0] https://lkml.kernel.org/r/20190314140135.19184-1-jbrunet@baylibre.com
[1] https://lkml.kernel.org/r/20190329141512.29867-1-jbrunet@baylibre.com
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The purpose of this change is to align the gxl and g12a driver
declaration.
Like on the g12a variant, remove genphy_aneg_done() from the driver
declaration as the net phy framework will default to it anyway.
Also, the gxl phy id should be an exact match as well, so let's change
this and use the macro provided.
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The g12a SoC family uses the type of internal PHY that was used on the
gxl family. The quirks of gxl family, like the LPA register corruption,
appear to have been resolved on this new SoC generation.
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the mdio mux and internal phy glue of the g12a SoC family
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> # clk parts
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add documentation for the device tree bindings of the MDIO mux of Amlogic
g12a SoC family
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In contrast to switching rx irq coalescing off what fixed an issue,
switching tx irq coalescing off is merely a latency optimization,
therefore net-next. As part of this change:
- Remove INTT_0 .. INTT_3 constants, they aren't used.
- Remove comment in rtl_hw_start_8169(), we now have a detailed
description by the code in rtl_set_coalesce().
- Due to switching irq coalescing off per default we don't need the
initialization in rtl_hw_start_8168(). If ethtool is used to switch
on coalescing then rtl_set_coalesce() will configure this register.
For the sake of completeness: This patch just changes the default.
Users still have the option to configure irq coalescing via ethtool.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor comment merge conflict in mlx5.
Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
The merge window for 5.1 introduced a number of compaction-related patches.
with intermittent reports of corruption and functional issues. The bugs
are due to sloopy checking of zone boundaries and a corner case where
invalid indexes are used to access the free lists.
Reports are not common but at least two users and 0-day have tripped
over them. There is a chance that one of the syzbot reports are related
but it has not been confirmed properly.
The normal submission path is with Andrew but there have been some delays
and I consider them urgent enough that they should be picked up before
RC4 to avoid duplicate reports.
All of these have been successfully tested on older RC windows. This
will make this branch look like a rebase but in fact, they've simply
been lifted again from Andrew's tree and placed on a fresh branch. I've
no reason to believe that this has invalidated the testing given the
lack of change in compaction and the nature of the fixes.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
-----BEGIN PGP SIGNATURE-----
iQJQBAABCAA6FiEECg+MacO/Fijve9AMfMb8M0SyR+IFAlynB+gcHG1nb3JtYW5A
dGVjaHNpbmd1bGFyaXR5Lm5ldAAKCRB8xvwzRLJH4iW4EACw2vxxFTBUr1R3Yr8a
YsRLT9nJvpebX98hyLG3Mhcv2c2haW+LC/H4T2UP1vW6zNH7Co738QcraH1xVPfr
J3xJWSihXSAwLPHJQpJWd3pdWmmboIVUnYR/Cn/TiE2vukyFS61aRo/FEctJx2Hm
cdkRh/j4HPtO7U4g0Lg6t1GYa40wy+1ahP+PTHpdIjadxnu+AGKrXgi+oysJJ29F
QBBH2/5Lv8feaOyBRlWycBYHq7to/EsN+Xr0shsMT+AVcOvPZ2hi9GCdTRiDvHiJ
c0cXftyxm8BYqKcPDeLOdQGtnDkDRCFrZI7hzHEvbUKdodVD5BZIet8HeblnUlMt
TtgWenwo02MY3t9zSDSxBtzj8EimNQFIGJz7kZseS56leskvhXUXuDyUhT3O5I8R
DBSX8DRJMhri2yGImC57RRtLGO/zbzHMrpB8ogpsqlo4UbMwNNOuL4yLAJ9XCcmV
jjuakK08dr6v02NeOc/B2z5mkm4gQe0jvJyxMaWJHWORJEwrRu9ie8Af3yKBlP3Y
dpX0R3XLtlOJdsFksDGm+24ftogN3l+0r/qVi7E8uwCvUeS9nvwY9ffNFiLHwMAg
BewwduX2MWT3hAVzNprFmUTMaKGUjdpgwdxUt68qcmWEzlF2KcHbTpRv4+lxzx/A
msO3x+upkskoFOBCRdvxq+AaKg==
=WJCV
-----END PGP SIGNATURE-----
Merge tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux
Pull mm/compaction fixes from Mel Gorman:
"The merge window for 5.1 introduced a number of compaction-related
patches. with intermittent reports of corruption and functional
issues. The bugs are due to sloopy checking of zone boundaries and a
corner case where invalid indexes are used to access the free lists.
Reports are not common but at least two users and 0-day have tripped
over them. There is a chance that one of the syzbot reports are
related but it has not been confirmed properly.
The normal submission path is with Andrew but there have been some
delays and I consider them urgent enough that they should be picked up
before RC4 to avoid duplicate reports.
All of these have been successfully tested on older RC windows. This
will make this branch look like a rebase but in fact, they've simply
been lifted again from Andrew's tree and placed on a fresh branch.
I've no reason to believe that this has invalidated the testing given
the lack of change in compaction and the nature of the fixes"
* tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux:
mm/compaction.c: abort search if isolation fails
mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints
The n_r3964 line discipline driver was written in a different time, when
SMP machines were rare, and users were trusted to do the right thing.
Since then, the world has moved on but not this code, it has stayed
rooted in the past with its lovely hand-crafted list structures and
loads of "interesting" race conditions all over the place.
After attempting to clean up most of the issues, I just gave up and am
now marking the driver as BROKEN so that hopefully someone who has this
hardware will show up out of the woodwork (I know you are out there!)
and will help with debugging a raft of changes that I had laying around
for the code, but was too afraid to commit as odds are they would break
things.
Many thanks to Jann and Linus for pointing out the initial problems in
this codebase, as well as many reviews of my attempts to fix the issues.
It was a case of whack-a-mole, and as you can see, the mole won.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcprrGAAoJEAx081l5xIa+WdUP/2zoJO9vyB3NxsJ35vH4HQjV
uOb/V2GNmUKSuVnGNp1Z/dDJdXm9zjJF78a/UzMWlXs+lJzq+7zUZF6hj9scy/5Q
29t0Ks3fzpa5q0Lz2eOMvsDQDDcRZjtCD8xU73qZRdxnT0Ov1IayT15W+Gzf7HmJ
WX4kK31uPKaJvmOt8RmxQkhvVJSD9lfEzTgijx/joQlPlCZ7ndml2d3dqlLmUk1X
ylv4W6NBo24ErtNQPhFAZ/AC6sgweAKD+b560H1X6Blen0TJ0Xrr1afsHUCBh/nj
G1uDjyp3252evuJgZEuUQb/GjhspJ5FWdG2Imn06+QbqD/N3CynZT2vTChFZ/6Hs
C0ZMHoAc6Kx/vx/Rd4oO6VbgsqMRqkSRXshgsUb4QWwZ96pozZrFvoMKWJrJprjH
LAfEqX5Cnl7IG2a9xLTuzNY6qhCv/85YNPnJj6f22HitS+dhWdtvV7nTq7r01lds
7DsI5xwfnCEY4b4xnccWj4zLjvH19DK+y9xtgDnWGg5zVa4MQ6GopyTPvurH8yHR
AyYhoMvTSpfjGcIuj1/Vj4GNT+l2A0AKHRLa8/BICd1QatFomngVvvN2AnDUnm1I
kkBnVVS/o4FKoTWMXX7vajjOKP9hot1unyQN2xSLABPffqfBzoLu4riO2bwPa1lv
jfCKGlRXyJEtaNL1ibz2
=tSY4
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2019-04-05' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Pretty quiet week, just some amdgpu and i915 fixes.
i915:
- deadlock fix
- gvt fixes
amdgpu:
- PCIE dpm feature fix
- Powerplay fixes"
* tag 'drm-fixes-2019-04-05' of git://anongit.freedesktop.org/drm/drm:
drm/i915/gvt: Fix kerneldoc typo for intel_vgpu_emulate_hotplug
drm/i915/gvt: Correct the calculation of plane size
drm/amdgpu: remove unnecessary rlc reset function on gfx9
drm/i915: Always backoff after a drm_modeset_lock() deadlock
drm/i915/gvt: do not let pin count of shadow mm go negative
drm/i915/gvt: do not deliver a workload if its creation fails
drm/amd/display: VBIOS can't be light up HDMI when restart system
drm/amd/powerplay: fix possible hang with 3+ 4K monitors
drm/amd/powerplay: correct data type to avoid overflow
drm/amd/powerplay: add ECC feature bit
drm/amd/amdgpu: fix PCIe dpm feature issue (v3)
Pull networking fixes from David Miller:
1) Several hash table refcount fixes in batman-adv, from Sven
Eckelmann.
2) Use after free in bpf_evict_inode(), from Daniel Borkmann.
3) Fix mdio bus registration in ixgbe, from Ivan Vecera.
4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.
5) ila rhashtable corruption fix from Herbert Xu.
6) Don't allow upper-devices to be added to vrf devices, from Sabrina
Dubroca.
7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.
8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
from Alexander Lobakin.
9) Missing IDR checks in mlx5 driver, from Aditya Pakki.
10) Fix false connection termination in ktls, from Jakub Kicinski.
11) Work around some ASPM issues with r8169 by disabling rx interrupt
coalescing on certain chips. From Heiner Kallweit.
12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
Abeni.
13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.
14) Various BPF flow dissector fixes from Stanislav Fomichev.
15) Divide by zero in act_sample, from Davide Caratti.
16) Fix bridging multicast regression introduced by rhashtable
conversion, from Nikolay Aleksandrov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
ibmvnic: Fix completion structure initialization
ipv6: sit: reset ip header pointer in ipip6_rcv
net: bridge: always clear mcast matching struct on reports and leaves
libcxgb: fix incorrect ppmax calculation
vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
sch_cake: Make sure we can write the IP header before changing DSCP bits
sch_cake: Use tc_skb_protocol() helper for getting packet protocol
tcp: Ensure DCTCP reacts to losses
net/sched: act_sample: fix divide by zero in the traffic path
net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
net: hns: Fix sparse: some warnings in HNS drivers
net: hns: Fix WARNING when remove HNS driver with SMMU enabled
net: hns: fix ICMP6 neighbor solicitation messages discard problem
net: hns: Fix probabilistic memory overwrite when HNS driver initialized
net: hns: Use NAPI_POLL_WEIGHT for hns driver
net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
flow_dissector: rst'ify documentation
ipv6: Fix dangling pointer when ipv6 fragment
net-gro: Fix GRO flush when receiving a GSO packet.
flow_dissector: document BPF flow dissector environment
...
Tuong Lien says:
====================
tipc: improve TIPC unicast link throughput
The series introduces an algorithm to improve TIPC throughput especially
in terms of packet loss, also tries to reduce packet duplication due to
overactive NACK sending mechanism.
The link failover situation is also covered by the patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 0ae955e2656d ("tipc: improve TIPC throughput by Gap ACK
blocks"), we enhance the link transmq by releasing as many packets as
possible with the multi-ACKs from peer node. This also means the queue
is now non-linear and the peer link deferdq becomes vital.
Whereas, in the case of link failover, all messages in the link transmq
need to be transmitted as tunnel messages in such a way that message
sequentiality and cardinality per sender is preserved. This requires us
to maintain the link deferdq somehow, so that when the tunnel messages
arrive, the inner user messages along with the ones in the deferdq will
be delivered to upper layer correctly.
The commit accomplishes this by defining a new queue in the TIPC link
structure to hold the old link deferdq when link failover happens and
process it upon receipt of tunnel messages.
Also, in the case of link syncing, the link deferdq will not be purged
to avoid unnecessary retransmissions that in the worst case will fail
because the packets might have been freed on the sending side.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
For unicast transmission, the current NACK sending althorithm is over-
active that forces the sending side to retransmit a packet that is not
really lost but just arrived at the receiving side with some delay, or
even retransmit same packets that have already been retransmitted
before. As a result, many duplicates are observed also under normal
condition, ie. without packet loss.
One example case is: node1 transmits 1 2 3 4 10 5 6 7 8 9, when node2
receives packet #10, it puts into the deferdq. When the packet #5 comes
it sends NACK with gap [6 - 9]. However, shortly after that, when
packet #6 arrives, it pulls out packet #10 from the deferfq, but it is
still out of order, so it makes another NACK with gap [7 - 9] and so on
... Finally, node1 has to retransmit the packets 5 6 7 8 9 a number of
times, but in fact all the packets are not lost at all, so duplicates!
This commit reduces duplicates by changing the condition to send NACK,
also restricting the retransmissions on individual packets via a timer
of about 1ms. However, it also needs to say that too tricky condition
for NACKs or too long timeout value for retransmissions will result in
performance reducing! The criterias in this commit are found to be
effective for both the requirements to reduce duplicates but not affect
performance.
The tipc_link_rcv() is also improved to only dequeue skb from the link
deferdq if it is expected (ie. its seqno <= rcv_nxt).
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
During unicast link transmission, it's observed very often that because
of one or a few lost/dis-ordered packets, the sending side will fastly
reach the send window limit and must wait for the packets to be arrived
at the receiving side or in the worst case, a retransmission must be
done first. The sending side cannot release a lot of subsequent packets
in its transmq even though all of them might have already been received
by the receiving side.
That is, one or two packets dis-ordered/lost and dozens of packets have
to wait, this obviously reduces the overall throughput!
This commit introduces an algorithm to overcome this by using "Gap ACK
blocks". Basically, a Gap ACK block will consist of <ack, gap> numbers
that describes the link deferdq where packets have been got by the
receiving side but with gaps, for example:
link deferdq: [1 2 3 4 10 11 13 14 15 20]
--> Gap ACK blocks: <4, 5>, <11, 1>, <15, 4>, <20, 0>
The Gap ACK blocks will be sent to the sending side along with the
traditional ACK or NACK message. Immediately when receiving the message
the sending side will now not only release from its transmq the packets
ack-ed by the ACK but also by the Gap ACK blocks! So, more packets can
be enqueued and transmitted.
In addition, the sending side can now do "multi-retransmissions"
according to the Gaps reported in the Gap ACK blocks.
The new algorithm as verified helps greatly improve the TIPC throughput
especially under packet loss condition.
So far, a maximum of 32 blocks is quite enough without any "Too few Gap
ACK blocks" reports with a 5.0% packet loss rate, however this number
can be increased in the furture if needed.
Also, the patch is backward compatible.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
I dropped the ball a bit here: these patches should all probably have
been part of rc2, but I wanted to get around to properly testing them in
the various configurations (qemu32, qeum64, unleashed) first.
Unfortunately I've been traveling and didn't have time to actually do
that, but since these fix concrete bugs and pass my old set of tests I
don't want to delay the fixes any longer.
There are four independent fixes here:
* A fix for the rv32 port that corrects the 64-bit user accesor's fixup
label address.
* A fix for a regression introduced during the merge window that broke
medlow configurations at run time. This patch also includes a fix
that disables ftrace for the same set of functions, which was found by
inspection at the same time.
* A modification of the memory map to avoid overlapping the FIXMAP and
VMALLOC regions on systems with small memory maps.
* A fix to the module handling code to use the correct syntax for
probing Kconfig entries.
These have passed my standard test flow, but I didn't have time to
expand that like I said I would.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlymeX0THHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQXKFD/9rncH6fbNjb7izocCsDXNaDHWrixOX
/qYaDTheLe45txZJytZwWcgpEHlk/W+SM6mbdk6oJ10bG4a/FVv7PltBtoJvkRxt
5/iX0xwE73Rug5Z2FkqzGV6qNK8xD+2CfMgeYQYCOmaOgJuCjzj0x4Buh3EBHUY+
nxueWg7/ITzWCGSzH0bYw0S9cDFz8qUhmHiHtDz6y98JPoibnz6TQ/Ie4idxfSs8
nvV5cN1t8N8UpNAtVyVvTuWzf4E+LT02LIbD59ShOcGJ9I0hzPmNzGvVFVL6sxRs
XA8lzwlXYSbGRFtcoqEjKws9nG1Jbx9TucdrQvSqNlZ+invjpY5KiJyucF9ODV+G
LQosPqa4vw+hwSD958y1v2APNCLcvVx42YSHgPNabQ33/1gQiISveoQBqLUFk19s
dsaNCrEv37EeSzNXjM3ZCjLA+udRRB4GU4lBEQSDsnDp6472cCgjpEO8mhKY7Qbq
jIAHoVg1oXXrqqDXJoq0juiYoMMthRLtSTm+daryh5NGB8qoU3XuHc6MspJI+Shy
tpjglSr8+xyyviDY8KnjHPmeTdUQKPhUf6/0fwMM3gX8AYdLRhXUiH/Cd2LehqFQ
3LqvKBhfKS5eqBNQv7Dy+MlsiifVY3gDf0lsBC6jifxF/CmmRaWCauHgNUD9AHeH
igftN7X07gwpXQ==
=JIdq
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V fixes from Palmer Dabbelt:
"I dropped the ball a bit here: these patches should all probably have
been part of rc2, but I wanted to get around to properly testing them
in the various configurations (qemu32, qeum64, unleashed) first.
Unfortunately I've been traveling and didn't have time to actually do
that, but since these fix concrete bugs and pass my old set of tests I
don't want to delay the fixes any longer.
There are four independent fixes here:
- A fix for the rv32 port that corrects the 64-bit user accesor's
fixup label address.
- A fix for a regression introduced during the merge window that
broke medlow configurations at run time. This patch also includes a
fix that disables ftrace for the same set of functions, which was
found by inspection at the same time.
- A modification of the memory map to avoid overlapping the FIXMAP
and VMALLOC regions on systems with small memory maps.
- A fix to the module handling code to use the correct syntax for
probing Kconfig entries.
These have passed my standard test flow, but I didn't have time to
expand that testing like I said I would"
* tag 'riscv-for-linus-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: Use IS_ENABLED(CONFIG_CMODEL_MEDLOW)
RISC-V: Fix FIXMAP_TOP to avoid overlap with VMALLOC area
RISC-V: Always compile mm/init.c with cmodel=medany and notrace
riscv: fix accessing 8-byte variable from RV32
Heiner Kallweit says:
====================
net: phy: use generic PHY ability readers if callback get_features isn't set
Meanwhile we have generic functions for reading the abilities of
Clause 22 / 45 PHY's. This allows to use them as fallback in case
callback get_features isn't set. Benefit is the reduction of
boilerplate code in PHY drivers.
v2:
- adjust a comment in patch 1 to match the code
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that phylib uses genphy_read_abilities() as fallback, we don't have
to set callback get_features any longer.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Meanwhile we have generic functions for reading the abilities of
Clause 22 / 45 PHY's. This allows to use them as fallback in case
callback get_features isn't set. Benefit is the reduction of
boilerplate code in PHY drivers.
v2:
- adjust the comment in phy_driver_register to match the code
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>