This is an optimization specifically for TO_INCLUDE which sends queries
for the older entries and thus lowers the S,G timers to LMQT. If we have
the following situation for a group in either include or exclude mode:
- host A was interested in srcs X and Y, but is timing out
- host B sends TO_INCLUDE src Z, the bridge lowers X and Y's timeouts
to LMQT
- host B sends BLOCK src Z after LMQT time has passed
=> since host B is the last host we can delete the group, but if we
still have host A's EHT entries for X and Y (i.e. if they weren't
lowered to LMQT previously) then we'll have to wait another LMQT
time before deleting the group, with this optimization we can
directly remove it regardless of the group mode as there are no more
interested hosts
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for IGMPv3/MLDv2 include and exclude EHT handling. Similar to
how the reports are processed we have 2 cases when the group is in include
or exclude mode, these are processed as follows:
- group include
- is_include: create missing entries
- to_include: flush existing entries and create a new set from the
report, obviously if the src set is empty then we delete the group
- group exclude
- is_exclude: create missing entries
- to_exclude: flush existing entries and create a new set from the
report, any empty source set entries are removed
If the group is in a different mode then we just flush all entries reported
by the host and we create a new set with the new mode entries created from
the report. If the report is include type, the source list is empty and
the group has empty sources' set then we remove it. Any source set entries
which are empty are removed as well. If the group is in exclude mode it
can exist without any S,G entries (allowing for all traffic to pass).
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for IGMPv3/MLDv2 allow/block EHT handling. Similar to how
the reports are processed we have 2 cases when the group is in include
or exclude mode, these are processed as follows:
- group include
- allow: create missing entries
- block: remove existing matching entries and remove the corresponding
S,G entries if there are no more set host entries, then possibly
delete the whole group if there are no more S,G entries
- group exclude
- allow
- host include: create missing entries
- host exclude: remove existing matching entries and remove the
corresponding S,G entries if there are no more set host entries
- block
- host include: remove existing matching entries and remove the
corresponding S,G entries if there are no more set host entries,
then possibly delete the whole group if there are no more S,G entries
- host exclude: create missing entries
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we can delete set entries, we can use that to remove EHT hosts.
Since the group's host set entries exist only when there are related
source set entries we just have to flush all source set entries
joined by the host set entry and it will be automatically removed.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add EHT source set and set-entry create, delete and lookup functions.
These allow to manipulate source sets which contain their own host sets
with entries which joined that S,G. We're limiting the maximum number of
tracked S,G entries per host to PG_SRC_ENT_LIMIT (currently 32) which is
the current maximum of S,G entries for a group. There's a per-set timer
which will be used to destroy the whole set later.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add functions to create, destroy and lookup an EHT host. These are
per-host entries contained in the eht_host_tree in net_bridge_port_group
which are used to store a list of all sources (S,G) entries joined for that
group by each host, the host's current filter mode and total number of
joined entries.
No functional changes yet, these would be used in later patches.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add EHT structures for tracking hosts and sources per group. We keep one
set for each host which has all of the host's S,G entries, and one set for
each multicast source which has all hosts that have joined that S,G. For
each host, source entry we record the filter_mode and we keep an expiry
timer. There is also one global expiry timer per source set, it is
updated with each set entry update, it will be later used to lower the
set's timer instead of lowering each entry's timer separately.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to preserve the srcs pointer since we'll be passing it for EHT
handling later.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prepare __grp_src_block_incl() for being able to cause a notification
due to changes. Currently it cannot happen, but EHT would change that
since we'll be deleting sources immediately. Make sure that if the pg is
deleted we don't return true as that would cause the caller to access
freed pg. This patch shouldn't cause any functional change.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to pass the host address so later it can be used for explicit
host tracking. No functional change.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rename src_size argument to addr_size in preparation for passing host
address as an argument to IGMPv3/MLDv2 functions.
No functional change.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni says:
====================
mptcp: re-enable sndbuf autotune
The sendbuffer autotuning was unintentionally disabled as a
side effect of the recent workqueue removal refactor. These
patches re-enable id, with some extra care: with autotuning
enable/large send buffer we need a more accurate packet
scheduler to be able to use efficiently the available
subflow bandwidth, especially when the subflows have
different capacities.
The first patch cleans-up subflow socket handling, making
the actual re-enable (patch 2) simpler.
Patches 3 and 4 improve the packet scheduler, to better cope
with non trivial scenarios and large send buffer.
Finally patch 5 adds and uses some infrastructure to avoid
the workqueue usage for the packet scheduler operations introduced
by the previous patches.
====================
Link: https://lore.kernel.org/r/cover.1611153172.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On MPTCP-level ack reception, the packet scheduler
may select a subflow other then the current one.
Prior to this commit we rely on the workqueue to trigger
action on such subflow.
This changeset introduces an infrastructure that allows
any MPTCP subflow to schedule actions (MPTCP xmit) on
others subflows without resorting to (multiple) process
reschedule.
A dummy NAPI instance is used instead. When MPTCP needs to
trigger action an a different subflow, it enqueues the target
subflow on the NAPI backlog and schedule such instance as needed.
The dummy NAPI poll method walks the sockets backlog and tries
to acquire the (BH) socket lock on each of them. If the socket
is owned by the user space, the action will be completed by
the sock release cb, otherwise push is started.
This change leverages the delegated action infrastructure
to avoid invoking the MPTCP worker to spool the pending data,
when the packet scheduler picks a subflow other then the one
currently processing the incoming MPTCP-level ack.
Additionally we further refine the subflow selection
invoking the packet scheduler for each chunk of data
even inside __mptcp_subflow_push_pending().
v1 -> v2:
- fix possible UaF at shutdown time, resetting sock ops
after removing the ulp context
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Otherwise the packet scheduler policy will not be
enforced when pushing pending data at MPTCP-level
ack reception time.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current packet scheduler can enqueue up to sndbuf
data on each subflow. If the send buffer is large and
the subflows are not symmetric, this could lead to
suboptimal aggregate bandwidth utilization.
Limit the amount of queued data to the maximum send
window.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 6e628cd3a8 ("mptcp: use mptcp release_cb for
delayed tasks"), MPTCP never sets the flag bit SOCK_NOSPACE
on its subflow. As a side effect, autotune never takes place,
as it happens inside tcp_new_space(), which in turn is called
only when the mentioned bit is set.
Let's sendmsg() set the subflows NOSPACE bit when looking for
more memory and use the subflow write_space callback to propagate
the snd buf update and wake-up the user-space.
Additionally, this allows dropping a bunch of duplicate code and
makes the SNDBUF_LIMITED chrono relevant again for MPTCP subflows.
Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, incoming subflows link to the parent socket,
while outgoing ones link to a per subflow socket. The latter
is not really needed, except at the initial connect() time and
for the first subflow.
Always graft the outgoing subflow to the parent socket and
free the unneeded ones early.
This allows some code cleanup, reduces the amount of memory
used and will simplify the next patch
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Without this change the driver tries to allocate too many queues,
breaching the number of available msi-x interrupts on machines
with many logical cpus and default adapter settings:
Insufficient resources for 12 XDP event queues (24 other channels, max 32)
Which in turn triggers EINVAL on XDP processing:
sfc 0000:86:00.0 ext0: XDP TX failed (-22)
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20210120212759.81548-1-ivan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds TCP_NLA_TTL to SCM_TIMESTAMPING_OPT_STATS that exports
the time-to-live or hop limit of the latest incoming packet with
SCM_TSTAMP_ACK. The value exported may not be from the packet that acks
the sequence when incoming packets are aggregated. Exporting the
time-to-live or hop limit value of incoming packets helps to estimate
the hop count of the path of the flow that may change over time.
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20210120204155.552275-1-ysseung@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the early retransmit has been removed by
commit bec41a11dd ("tcp: remove early retransmit"),
we also remove the unused ICSK_TIME_EARLY_RETRANS macro.
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1611239473-27304-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Document the compatible value for the RAVB block in the Renesas R-Car
V3U (R8A779A0) SoC. This variant has no stream buffer, so we only need
to add the new compatible and add it to the TX delay block.
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20210121100619.5653-2-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder says:
====================
net: ipa: remove a build dependency
Unlike the original (temporary) IPA notification mechanism, the
generic remoteproc SSR notification code does not require the IPA
driver to maintain a pointer to the modem subsystem remoteproc
structure.
The IPA driver was converted to use the newer SSR notifiers, but the
specification and use of a phandle for the modem subsystem was never
removed.
This series removes the lookup of the remoteproc pointer, and that
removes the need for the modem DT property. It also removes the
reference to the "modem-remoteproc" property from the DT binding,
and from the DT files that specified them.
====================
Link: https://lore.kernel.org/r/20210120212606.12556-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "modem-remoteproc" property is no longer required for the IPA
driver, so get rid of it.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "modem-remoteproc" property is no longer required for the IPA
driver, so get rid of it.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The IPA driver uses the remoteproc SSR notifier now, rather than the
temporary IPA notification system used initially. As a result it no
longer needs a property identifying the modem subsystem DT node.
Use GIC_SPI rather than 0 in the example interrupt definition.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The IPA driver currently requires a DT property to be defined whose
value is the phandle for the modem subsystem. This was needed to
look up a remoteproc structure pointer used when registering for
notifications in the original IPA notification mechanism.
Remoteproc provides a more generic SSR notifier system, and the IPA
driver switched over to it last summer, but this remoteproc phandle
dependency was not removed at that time.
Get rid of the IPA remoteproc pointer and stop requiring the phandle
be specified.
This avoids a link error (rproc_put() not defined) for certain
configurations.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the MII interface is used, the PHY is the clock master, thus don't
set the clock rate. On Zynq-7000, this will prevent the following
warning:
macb e000b000.ethernet eth0: unable to generate target frequency: 25000000 Hz
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210120194303.28268-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tango4 platform is getting removed, and this driver has no
other known users, so it can be removed.
Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mans Rullgard <mans@mansr.com>
Link: https://lore.kernel.org/r/20210120150703.1629527-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The comment is quite weird, there is no such thing as a vendor-specific
VPD id. 0x82 is the value of PCI_VPD_LRDT_ID_STRING. So what we are
doing here is simply checking whether the byte at VPD address VPD_BASE
is a valid string LRDT, same as what is done a few lines later in
the code.
LRDT = Large Resource Data Tag, see PCI 2.2 spec, VPD chapter
v2:
- don't set VPD_BASE / VPD_BASE_OLD separately
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/644ef22f-e86a-5cc1-0f27-f873ab165696@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add myself as maintainer of the Arrow SpeedChips XRS7000 series Ethernet
switch driver.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Link: https://lore.kernel.org/r/20210120135323.73856-1-george.mccollister@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes says:
====================
ucc_geth improvements
This is a resend of some improvements to the ucc_geth driver that was
previously sent together with bug fixes, which have by now been
applied.
v2: rebase to net/master; address minor style issues; don't introduce
a use-after-free in patch "don't statically allocate eight
ucc_geth_info".
====================
Link: https://lore.kernel.org/r/20210119150802.19997-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since kmalloc() is nowadays [1] guaranteed to return naturally
aligned (i.e., aligned to the size itself) memory for power-of-2
sizes, we don't need to over-allocate the align amount, compute an
aligned address within the allocation, and (for later freeing) also
storing the original pointer [2].
Instead, just round up the length we want to allocate to the alignment
requirements, then round that up to the next power of 2. In theory,
this could allocate up to about twice as much memory as we needed. In
practice, (a) kmalloc() would in most cases anyway return a
power-of-2-sized allocation and (b) with the default values of the
bdRingLen[RT]x fields, the length is already itself a power of 2
greater than the alignment.
So we actually end up saving memory compared to the current
situtation (e.g. for tx, we currently allocate 128+32 bytes, which
kmalloc() likely rounds up to 192 or 256; with this patch, we just
allocate 128 bytes.) Also struct ucc_geth_private becomes a little
smaller.
[1] 59bb47985c ("mm, sl[aou]b: guarantee natural alignment for
kmalloc(power-of-two)")
[2] That storing was anyway done in a u32, which works on 32 bit
machines, but is not very elegant and certainly makes a reader of the
code pause for a while.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The numQueuesTx and numQueuesRx members of struct ucc_geth_info are
never set to anything but 1, and never have been. It's unclear how
well the code supporting multiple queues would work. Until somebody
wants to play with enabling that, help the compiler eliminate a lot of
dead code and loops that are not really loops by creating static
inline helpers. If and when the numQueuesTx/numQueuesRx fields are
re-introduced, it suffices to update those helper to return the
appropriate field.
This cuts the .text segment of ucc_geth.o by 8%.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The translation from the ucc_geth_num_of_threads enum value to the
actual count can be written somewhat more compactly with a small
lookup table, allowing us to replace the four switch statements.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bd_mem_part member of ucc_geth_info always has the value
MEM_PART_SYSTEM, and AFAICT, there has never been any code setting it
to any other value. Moreover, muram is a somewhat precious resource,
so there's no point using that when normal memory serves just as well.
Apart from removing a lot of dead code, this is also motivated by
wanting to clean up the "store result from kmalloc() in a u32" mess.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These macros both have the value 32, there's no point first
initializing align to a lower value.
If anything, one could throw in a
BUILD_BUG_ON(UCC_GETH_TX_BD_RING_ALIGNMENT < 4), but it's not worth it
- lots of code depends on named constants having sensible values.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
struct ucc_geth_info is somewhat large, and on systems with only one
or two UCC instances, that just wastes a few KB of memory. So
allocate and populate a chunk of memory at probe time instead of
initializing them all during driver init.
Note that the existing "ug_info == NULL" check was dead code, as the
address of some static array element can obviously never be NULL.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reduce the code duplication a bit by moving the parsing of
rx-clock-name and the fallback handling to a helper function.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These fields are only used within ucc_geth_startup(), so they might as
well be local variables in that function rather than being stashed in
struct ucc_geth_private.
Aside from making that struct a tiny bit smaller, it also shortens
some lines (getting rid of pointless casts while here), and fixes the
problems with using IS_ERR_VALUE() on a u32 as explained in commit
800cd6fb76 ("soc: fsl: qe: change return type of cpm_muram_alloc()
to s32").
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These buffers have all just been handed out from qe_muram_alloc(), aka
cpm_muram_alloc(), and the helper cpm_muram_alloc_common() already
does
memset_io(cpm_muram_addr(start), 0, size);
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This removes the explicit NULL checks, and allows us to stop storing
at least some of the _offset values separately.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a helper that takes a virtual address rather than the muram
offset. This will be used in a couple of places to avoid having to
store both the offset and the virtual address, as well as removing
NULL checks from the callers.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two functions cpm_muram_offset() and cpm_muram_dma() both need a
cast currently, one casts muram_vbase to do the pointer arithmetic on
void pointers, the other casts the passed-in address u8*.
It's simpler and more consistent to just always use void* and drop all
the casting.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow passing const-qualified pointers without requiring a cast in the
caller.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>