Publish fib_nlmsg_size() to allow it to be used later on from
fib_alias_hw_flags_set().
Remove the inline keyword since it shouldn't be used inside C files.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
fib_dump_info() does not change 'fri', so pass it as 'const'.
It will later allow us to invoke fib_dump_info() from
fib_alias_hw_flags_set().
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, netdevsim implements dummy FIB offload and marks notified
routes with RTM_F_TRAP flag. netdevsim does not defer route notifications
to a work queue because it does not need to program any hardware.
Given that netdevsim's purpose is to both give an example implementation
and allow developers to test their code, align netdevsim to a "real"
hardware device driver like mlxsw and have it also perform the route
"programming" in a non-atomic context.
It will be used to test route flags notifications which will be added in
the next patches.
The following changes are needed when route handling is performed in WQ:
- Handle the accounting in the main context, to be able to return an
error for adding route when all the routes are used.
For FIB_EVENT_ENTRY_REPLACE increase the counter before scheduling
the delayed work, and in case that this event replaces an existing route,
decrease the counter as part of the delayed work.
- For IPv6, cannot use fen6_info->rt->fib6_siblings list because it
might be changed during handling the delayed work.
Save an array with the nexthops as part of fib6_event struct, and take
a reference for each nexthop to prevent them from being freed while
event is queued.
- Change GFP_ATOMIC allocations to GFP_KERNEL.
- Use single work item that is handling a list of ordered routes.
Handling routes must be processed in the order they were submitted to
avoid logical errors that could lead to unexpected failures.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When route is added/deleted, the appropriate counter is increased/decreased
to maintain number of routes.
User can limit the number of routes and then according to the appropriate
counter, adding more routes than the limitation is forbidden.
Currently, there is one lock which protects hashtable, list and accounting.
Handling the counters will be performed from both atomic context and
non-atomic context, while the hashtable and the list will be used only from
non-atomic context and therefore will be protected by a separate lock.
Protect accounting by using an atomic variable, so lock is not needed.
v2:
* Use atomic64_sub() in nsim_nexthop_account()'s error path
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder says:
====================
net: ipa: don't disable NAPI in suspend
This is version 2 of a series that reworks the order in which things
happen during channel stop and suspend (and start and resume), in
order to address a hang that has been observed during suspend.
The introductory message on the first version of the series gave
some history which is omitted here.
The end result of this series is that we only enable NAPI and the
I/O completion interrupt on a channel when we start the channel for
the first time. And we only disable them when stopping the channel
"for good." In other words, NAPI and the completion interrupt
remain enabled while a channel is stopped for suspend.
One comment on version 1 of the series suggested *not* returning
early on success in a function, instead having both success and
error paths return from the same point at the end of the function
block. This has been addressed in this version.
In addition, this version consolidates things a little bit, but the
net result of the series is exactly the same as version 1 (with the
exception of the return fix mentioned above).
First, patch 6 in the first version was a small step to make patch 7
easier to understand. The two have been combined now.
Second, previous version moved (and for suspend/resume, eliminated)
I/O completion interrupt and NAPI disable/enable control in separate
steps (patches). Now both are moved around together in patch 5 and
6, which eliminates the need for the final (NAPI-only) patch.
I won't repeat the patch summaries provided in v1:
https://lore.kernel.org/netdev/20210129202019.2099259-1-elder@linaro.org/
Many thanks to Willem de Bruijn for his thoughtful input.
====================
Link: https://lore.kernel.org/r/20210201172850.2221624-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Transactions to send data for a network device can be allocated at
any time up until the point the TX queue is stopped. It is possible
for ipa_start_xmit() to be called in one context just before a
the transmit queue is stopped in another.
Update gsi_channel_trans_last() so that for TX channels the
allocated and pending transaction lists are checked--in addition
to the completed and polled lists--to determine the "last"
transaction. This means any transaction that has been allocated
before the TX queue is stopped will be allowed to complete before
we conclude the channel is quiesced.
Rework the function a bit to use a list pointer and gotos.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
No completion interrupts will occur while an endpoint is suspended,
nor when a channel has been stopped for suspend. So there's no need
to disable the interrupt during suspend and re-enable it when
resuming. Without any interrupts occurring, there is no need to
disable/re-enable NAPI for channel suspend/resume either.
We'll only enable NAPI and the interrupt when we first start the
channel, and disable it again only when it's "really" stopped.
To accomplish this, move the enable/disable calls out of
__gsi_channel_start() and __gsi_channel_stop(), and into
gsi_channel_start() and gsi_channel_stop() instead.
Add a call to napi_synchronize() to gsi_channel_suspend(), to ensure
NAPI polling is done before moving on.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Disable both the I/O completion interrupt and NAPI polling on a
channel *after* we successfully stop it rather than before. This
ensures a completion occurring just before the channel is stopped
gets processed.
Enable NAPI polling and the interrupt *before* starting a channel
rather than after, to be symmetric. A stopped channel won't
generate any completion interrupts anyway.
Enable NAPI before the interrupt and disable it afterward.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Open-code gsi_channel_freeze() and gsi_channel_thaw() in all callers
and get rid of these two functions. This is part of reworking the
sequence of things done during channel suspend/resume and start/stop.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Create a new function that does most of the work of starting a
channel. What's different is that it takes a flag indicating
whether the channel should really be started or not. Create
another new function __gsi_channel_stop() that behaves similarly.
IPA v3.5.1 implements suspend using a special SUSPEND endpoint
setting. If the endpoint is suspended when an I/O completes on the
underlying GSI channel, a SUSPEND interrupt is generated.
Newer versions of IPA do not implement the SUSPEND endpoint mode.
Instead, endpoint suspend is implemented by simply stopping the
underlying GSI channel. In this case, a completing I/O on a
*stopped* channel causes the SUSPEND interrupt condition.
These new functions put all activity related to starting or stopping
a channel (including "thawing/freezing" the channel) in one place,
whether or not the channel is actually started or stopped.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Create a new helper function that encapsulates issuing a set of
channel stop commands, retrying if appropriate, with a short delay
between attempts.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If an error occurs starting a channel, don't "thaw" it.
We should assume the channel remains in a non-started state.
Update the comment in gsi_channel_stop(); calls to this function
are no longer retried.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Avoid the assumption that ksize(kmalloc(S)) == ksize(kmalloc(S)): when
cloning an skb, save and restore truesize after pskb_expand_head(). This
can occur if the allocator decides to service an allocation of the same
size differently (e.g. use a different size class, or pass the
allocation on to KFENCE).
Because truesize is used for bookkeeping (such as sk_wmem_queued), a
modified truesize of a cloned skb may result in corrupt bookkeeping and
relevant warnings (such as in sk_stream_kill_queues()).
Link: https://lkml.kernel.org/r/X9JR/J6dMMOy1obu@elver.google.com
Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210201160420.2826895-1-elver@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With version 0 of the protocol it was legal to encode the 'Subflow Id' in
the MP_PRIO suboption, to specify which subflow would change its 'Backup'
flag. This has been removed from v1 specification: thus, according to RFC
8684 §3.3.8, the resulting 'Length' for MP_PRIO changed from 4 to 3 byte.
Current Linux generates / parses MP_PRIO according to the old spec, using
'Length' equal to 4, and hardcoding 1 as 'Subflow Id'; RFC compliance can
improve if we change 'Length' in other to become 3, leaving a 'Nop' after
the MP_PRIO suboption. In this way the kernel will emit and accept *only*
MP_PRIO suboptions that are compliant to version 1 of the MPTCP protocol.
unpatched 5.11-rc kernel:
[root@bottarga ~]# tcpdump -tnnr unpatched.pcap | grep prio
reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1)
dropped privs to tcpdump
IP 10.0.3.2.48433 > 10.0.1.1.10006: Flags [.], ack 1, win 502, options [nop,nop,TS val 4032325513 ecr 1876514270,mptcp prio non-backup id 1,mptcp dss ack 14084896651682217737], length 0
patched 5.11-rc kernel:
[root@bottarga ~]# tcpdump -tnnr patched.pcap | grep prio
reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1)
dropped privs to tcpdump
IP 10.0.3.2.49735 > 10.0.1.1.10006: Flags [.], ack 1, win 502, options [nop,nop,TS val 1276737699 ecr 2686399734,mptcp prio non-backup,nop,mptcp dss ack 18433038869082491686], length 0
Changes since v2:
- when accounting for option space, don't increment 'TCPOLEN_MPTCP_PRIO'
and use 'TCPOLEN_MPTCP_PRIO_ALIGN' instead, thanks to Matthieu Baerts.
Changes since v1:
- refactor patch to avoid using 'TCPOLEN_MPTCP_PRIO' with its old value,
thanks to Geliang Tang.
Fixes: 067065422f ("mptcp: add the outgoing MP_PRIO support")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Matteo Croce <mcroce@linux.microsoft.com>
Link: https://lore.kernel.org/r/846cdd41e6ad6ec88ef23fee1552ab39c2f5a3d1.1612184361.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Emil Renner Berthing says:
====================
drivers: net: update tasklet_init callers
This updates the remaining callers of tasklet_init() in drivers/net
to the new API introduced in
commit 12cc923f1c ("tasklet: Introduce new initialization API")
All changes are done by coccinelle using the following semantic patch.
Coccinelle needs a little help parsing drivers/net/arcnet/arcnet.c
@ match @
type T;
T *container;
identifier tasklet;
identifier callback;
@@
tasklet_init(&container->tasklet, callback, (unsigned long)container);
@ patch1 depends on match @
type match.T;
identifier match.tasklet;
identifier match.callback;
identifier data;
identifier container;
@@
-void callback(unsigned long data)
+void callback(struct tasklet_struct *t)
{
...
- T *container = (T *)data;
+ T *container = from_tasklet(container, t, tasklet);
...
}
@ patch2 depends on match @
type match.T;
identifier match.tasklet;
identifier match.callback;
identifier data;
identifier container;
@@
-void callback(unsigned long data)
+void callback(struct tasklet_struct *t)
{
...
- T *container;
+ T *container = from_tasklet(container, t, tasklet);
...
- container = (T *)data;
...
}
@ depends on (patch1 || patch2) @
match.T *container;
identifier match.tasklet;
identifier match.callback;
@@
- tasklet_init(&container->tasklet, callback, (unsigned long)container);
+ tasklet_setup(&container->tasklet, callback);
====================
Link: https://lore.kernel.org/r/20210130234730.26565-1-kernel@esmil.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This converts the driver to use the new tasklet API introduced in
commit 12cc923f1c ("tasklet: Introduce new initialization API")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This converts the driver to use the new tasklet API introduced in
commit 12cc923f1c ("tasklet: Introduce new initialization API")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This converts the driver to use the new tasklet API introduced in
commit 12cc923f1c ("tasklet: Introduce new initialization API")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This converts the driver to use the new tasklet API introduced in
commit 12cc923f1c ("tasklet: Introduce new initialization API")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This converts the driver to use the new tasklet API introduced in
commit 12cc923f1c ("tasklet: Introduce new initialization API")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This converts the async and synctty drivers to use the new tasklet API n
commit 12cc923f1c ("tasklet: Introduce new initialization API")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This converts the driver to use the new tasklet API introduced in
commit 12cc923f1c ("tasklet: Introduce new initialization API")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This converts the driver to use the new tasklet API introduced in
commit 12cc923f1c ("tasklet: Introduce new initialization API")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This converts the driver to use the new tasklet API introduced in
commit 12cc923f1c ("tasklet: Introduce new initialization API")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmAZZ3gACgkQGXyLc2ht
IW1+SQ//bVsbnFa+EH0b9uHNHr03GTgxqg34CghumaCFcmRY3Pq3cFbeJUlntgh6
n6z6+p/apbqpZn6j05R930y0kONcVLbDQ6mykgY37U5neUGa3VcctJKuIVdsonmc
6Zwsd0ZxzZoBB+UQmKKkWRW2f8mdfuQFGWmF9kkNyPxuxVHP1y7Bti4/L16o7Vd0
rYWH0SiiFnjI1P+ll4gdLWZ9jl7qV36HkBc1pZpT+GOAZLMJCv8ITur/K9H5VLbJ
RBcBli5fEH/tgUgO33oCeqNL/M6XuQZJ+rC04hXfA7Vf0p/s67D9tBqxkwNtvgAj
QKFN77sETELToiaPLo0Y36AycYI3dgAZRPPxIVggYN6DCv6lbFsiIhcKxYwGTJk3
/9p3PKFRUMF7P0QuGmf0J+CHs/ARpU7LSO6o1csXq+jzBQUJo1vLte99nYHWKPtf
WhJ36V9Gng2D9hraxjynzoLfkCMlDrH6IYKJ5qBfYrATUMpRaTFx13QtSlItgJT3
I6xMSh1D9vIMVHo53US3j9KRSR4s+SSNg1zubt4JPRvQNuOZEMt05hucU5BAYqek
Csn5Usn+4B74ct+/sYLL6z8yDb/0ecRz/R5qsBwM1gTA7+V2Ye+boNSKCFzVo/ha
A5h9wPMvWtmW0kcsJa7+4P0HfwXp7EVAYRsP6zfUIggxbH7UkgI=
=77ru
-----END PGP SIGNATURE-----
Merge tag 'clang-format-for-linux-v5.11-rc7' of git://github.com/ojeda/linux
Pull clang-format update from Miguel Ojeda:
"Update with the latest for_each macro list"
* tag 'clang-format-for-linux-v5.11-rc7' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
- fix a kernel crash in the new dma-mapping benchmark test (Barry Song)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmAZGkkLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYNF0A//WjFHsPCMbbhlOlCMcRD/I1+JLQYchjlWp48wAzxq
K3fmGnLlsvd3lCyUQzLLcUMSs+NsaTlzNNtH+MSNfGvX3x/8Mz9F57AgJ7C2xlaD
XXbob5SKAqls6UFw6sBhlNbUe/l3Tup7LgqyCqQGRfpftycO7Vk70oHfFir0xqeB
IX2s2s9UK+iePRtqfhyOylipIPXG3A3TnDJ+T3x5wsw6m4ejr8TVUVNsA1Xg0mha
xsMVELyPwp3pEYp0+LNZsvtRC6uv7MeNf11mqmtk9VOZtrnS1VsD+ZPeNXzoCP3n
5iipcFGeiz8ac2bdldIjwYa4wyIBZR1kOOQ+1+gq42/Hs/Eu9N3aQydhgrzw8ZnY
MxUHbHazhpB2qt2lxyMjODVRKFdEt2FrkP7a7d3guw2Il4o2f14g9MZS72c5H5jk
Vg90VSW8UQr51MKamZkgLeIgFaqVTOkgXO5h7e+dfD/XMgm4Vt3qsjcBSQfTcCrj
4efmJwUvyv8x13HXoSvistKXG6kTH+rBtbxD6RvzE2C9dasb5L3mYIj+loQJqU7+
u/+X9ZUVHTvXlH7j7Fc9vBgCUf0VlZJ8N3LXl5zN7zRP40vcvudlhyNj1PWhiUlu
G5RBhuyTm3jd+NqCYE7XCqf+dvA7BGgkW2LZM/qAhlZgmJaWLwlJwAe0XA0KR9uR
U5w=
=l+IX
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.11-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
"Fix a kernel crash in the new dma-mapping benchmark test (Barry Song)"
* tag 'dma-mapping-5.11-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: benchmark: fix kernel crash when dma_map_single fails
A single mlx bugfix.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmAQJNMPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpDDAH/1S+KGTZ8tgCJmzFm/JQxomGr8qQuXw2QG+J
l6yuXKDe2lss1yTC7hcozfv1n1JnnASwPHclPRFEOCXCO0f/CGvhh8U6Nqkr/5sR
4Y/VA6Ex4vJUJxQA2VAa2QPFnDsIjNigYxeehTu4SHpAMF3h92r5M9pE6VjofJbJ
+7eknl2uCcscIdM6gQClskhqb22iEn10gmHSYLwQguoClFSOtr2iNWEkILFLhZa/
Qc5yYiNqR1ZPhLUQeMCXX5CDY5FoVnkavMlf+gI5JuvawG2RZqiLEmU9M/VS9+zk
FHLWHefuweHBir5076um6LOqAGRE+wf4FigdGikwBefsBM/8l8c=
=+kN5
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vdpa fix from Michael Tsirkin:
"A single mlx bugfix"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Fix memory key MTT population
trees.
Current release - regressions:
- ip_tunnel: fix mtu calculation
- mlx5: fix function calculation for page trees
Previous releases - regressions:
- vsock: fix the race conditions in multi-transport support
- neighbour: prevent a dead entry from updating gc_list
- dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Previous releases - always broken:
- bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF
cgroup getsockopt infra when user space is trying
to race against optlen, from Loris Reiff.
- bpf: add missing fput() in BPF inode storage map update helper
- udp: ipv4: manipulate network header of NATed UDP GRO fraglist
- mac80211: fix station rate table updates on assoc
- r8169: work around RTL8125 UDP HW bug
- igc: report speed and duplex as unknown when device is runtime
suspended
- rxrpc: fix deadlock around release of dst cached on udp tunnel
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmAZjwQACgkQMUZtbf5S
IruLbQ//Yg9+xEnqhDuOJZtYHB0rsJjLlKmtvgOsBr8BaTcUEPoPoqUPm+EMvCHb
o1fFa1qIrbS5luVEofu9hNX7DGXwvgawaMW2TympJhqLZQqjazCMB/st99LphhJw
RvaZI8aDOikosT4c+I0vm83jDQETonrjziIcPfHHPjn/Q+amGRRRXiTSQnRF/MlU
oARCG+U3kHsHBDUPNSCtSjKXshoZPjFb/pD7fQAlzzm7CssvbPhNWbducueyP2Fb
XW4RwJu9QBBH2JS6uZJ1Y6LVoRzusmE9dUam3KhkiL/CHs72lWPsc+Rn5gbBPvc5
Y4T4h61Xti1O4ULKdqhGceror6XY+4Qb1VlHWWztOhIo00wIAv3IHbTup/4o0HBr
j84MtcyOl/qxSFXjunPJkbWJngXikrkIMS0Bl6ZcPAejYM9wN6vCgbvFCHbEg1Rx
cWFnYyS9FCLduaxHSizv050tWhknOdX+zHK3fOtlW0yWnreJAB8Hoc21Zm7YKvg0
GxxcGK6AhqJ6s2ixVDv7MyJrltJ/hOJQb+T3HgHFuY2BYUs8F2r/HoHU/u4uCl76
RdBzbC/sLnBpMHf6r1rHTnGPsapoJOOYWnej71l425vX1qr5xnmxVNNB6HReObNv
+/jPoRYa5BVsVt2LmDcuH1O32pXJPWKVBR7Yfa6Bn2yzhcbECTc=
=ZByM
-----END PGP SIGNATURE-----
Merge tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.11-rc7, including fixes from bpf and mac80211
trees.
Current release - regressions:
- ip_tunnel: fix mtu calculation
- mlx5: fix function calculation for page trees
Previous releases - regressions:
- vsock: fix the race conditions in multi-transport support
- neighbour: prevent a dead entry from updating gc_list
- dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Previous releases - always broken:
- bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF
cgroup getsockopt infra when user space is trying to race against
optlen, from Loris Reiff.
- bpf: add missing fput() in BPF inode storage map update helper
- udp: ipv4: manipulate network header of NATed UDP GRO fraglist
- mac80211: fix station rate table updates on assoc
- r8169: work around RTL8125 UDP HW bug
- igc: report speed and duplex as unknown when device is runtime
suspended
- rxrpc: fix deadlock around release of dst cached on udp tunnel"
* tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary
net: ipa: fix two format specifier errors
net: ipa: use the right accessor in ipa_endpoint_status_skip()
net: ipa: be explicit about endianness
net: ipa: add a missing __iomem attribute
net: ipa: pass correct dma_handle to dma_free_coherent()
r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set
net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
net: mvpp2: TCAM entry enable should be written after SRAM data
net: lapb: Copy the skb before sending a packet
net/mlx5e: Release skb in case of failure in tc update skb
net/mlx5e: Update max_opened_tc also when channels are closed
net/mlx5: Fix leak upon failure of rule creation
net/mlx5: Fix function calculation for page trees
docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc
udp: ipv4: manipulate network header of NATed UDP GRO fraglist
net: ip_tunnel: fix mtu calculation
vsock: fix the race conditions in multi-transport support
net: sched: replaced invalid qdisc tree flush helper in qdisc_replace
ibmvnic: device remove has higher precedence over reset
...
sup_multicast_addr is passed to ether_addr_equal for address comparison
which casts the address inputs to u16 leading to an unaligned access.
Aligning the sup_multicast_addr to u16 boundary fixes the issue.
Signed-off-by: Andreas Oetken <andreas.oetken@siemens.com>
Link: https://lore.kernel.org/r/20210202090304.2740471-1-ennoerlangen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmAY+OoACgkQSD+KveBX
+j6hPAf9G09tOYaZfFkbFmfhe5TnjwPCf/2CdErj+dEPb9GI6rYhVoSYeNqUZSt0
UYQeutWVSaIohCA79D8l1PFHlOV1CXYO35BGi7Ga6glODzXEDEt0u/kqxAH7eNuF
VsnxQ2kfzDwN2M8N2xuIzoG66I+IAFa9hukzxLcIPwYjbGDL46p54wNZVSCh/scu
7CJbpTXTBh9yLRWTfhVouTWMHu3DaC7rMnaB+Yq+lRRzqnQ6Xitdty6uvFSKMv1Y
CMVuVKH3dafCFhJf/xF7vwdlGo4QcVx2FbZwo0I6tjGEEzYwfCWAaqY+3MB0VRc5
k7tjJeT6T+MW29knSh71LmTrnm/Abg==
=FWcp
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-02-01
Please note the first patch in this series
("Fix function calculation for page trees") is fixing a regression
due to previous fix in net which you didn't include in your previous
rc pr. So I hope this series will make it into your next rc pr,
so mlx5 won't be broken in the next rc.
* tag 'mlx5-fixes-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Release skb in case of failure in tc update skb
net/mlx5e: Update max_opened_tc also when channels are closed
net/mlx5: Fix leak upon failure of rule creation
net/mlx5: Fix function calculation for page trees
====================
Link: https://lore.kernel.org/r/20210202070703.617251-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder says:
====================
net: ipa: a few bug fixes
This series fixes four minor bugs. The first two are things that
sparse points out. All four are very simple and each patch should
explain itself pretty well.
====================
Link: https://lore.kernel.org/r/20210201232609.3524451-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix two format specifiers that used %lu for a size_t in "ipa_mem.c".
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When extracting the destination endpoint ID from the status in
ipa_endpoint_status_skip(), u32_get_bits() is used. This happens to
work, but it's wrong: the structure field is only 8 bits wide
instead of 32.
Fix this by using u8_get_bits() to get the destination endpoint ID.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sparse warns that the assignment of the metadata mask for a QMAP
endpoint in ipa_endpoint_init_hdr_metadata_mask() is a bad
assignment. We know we want the mask value to be big endian, even
though the value we write is in host byte order. Use a __force
tag to indicate we really mean it.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The virt local variable in gsi_channel_state() does not have an
__iomem attribute but should. Fix this.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Amy Parker <enbyamy@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "ring->addr = addr;" assignment is done a few lines later so we
can't use "ring->addr" yet. The correct dma_handle is "addr".
Fixes: 650d160382 ("soc: qcom: ipa: the generic software interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/YBjpTU2oejkNIULT@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
So far phy_disconnect() is called before free_irq(). If CONFIG_DEBUG_SHIRQ
is set and interrupt is shared, then free_irq() creates an "artificial"
interrupt by calling the interrupt handler. The "link change" flag is set
in the interrupt status register, causing phylib to eventually call
phy_suspend(). Because the net_device is detached from the PHY already,
the PHY driver can't recognize that WoL is configured and powers down the
PHY.
Fixes: f1e911d5d0 ("r8169: add basic phylib support")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/fe732c2c-a473-9088-3974-df83cfbd6efd@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot found WARNING in rds_rdma_extra_size [1] when RDS_CMSG_RDMA_ARGS
control message is passed with user-controlled
0x40001 bytes of args->nr_local, causing order >= MAX_ORDER condition.
The exact value 0x40001 can be checked with UIO_MAXIOV which is 0x400.
So for kcalloc() 0x400 iovecs with sizeof(struct rds_iovec) = 0x10
is the closest limit, with 0x10 leftover.
Same condition is currently done in rds_cmsg_rdma_args().
[1] WARNING: mm/page_alloc.c:5011
[..]
Call Trace:
alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267
alloc_pages include/linux/gfp.h:547 [inline]
kmalloc_order+0x2e/0xb0 mm/slab_common.c:837
kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853
kmalloc_array include/linux/slab.h:592 [inline]
kcalloc include/linux/slab.h:621 [inline]
rds_rdma_extra_size+0xb2/0x3b0 net/rds/rdma.c:568
rds_rm_size net/rds/send.c:928 [inline]
Reported-by: syzbot+1bd2b07f93745fa38425@syzkaller.appspotmail.com
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Link: https://lore.kernel.org/r/20210201203233.1324704-1-snovitoll@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Last TCAM data contains TCAM enable bit.
It should be written after SRAM data before entry enabled.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Link: https://lore.kernel.org/r/1612172139-28343-1-git-send-email-stefanc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When sending a packet, we will prepend it with an LAPB header.
This modifies the shared parts of a cloned skb, so we should copy the
skb rather than just clone it, before we prepend the header.
In "Documentation/networking/driver.rst" (the 2nd point), it states
that drivers shouldn't modify the shared parts of a cloned skb when
transmitting.
The "dev_queue_xmit_nit" function in "net/core/dev.c", which is called
when an skb is being sent, clones the skb and sents the clone to
AF_PACKET sockets. Because the LAPB drivers first remove a 1-byte
pseudo-header before handing over the skb to us, if we don't copy the
skb before prepending the LAPB header, the first byte of the packets
received on AF_PACKET sockets can be corrupted.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20210201055706.415842-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- station rate tables were not updated correctly
after association, leading to bad configuration
- rtl8723bs (staging) was initializing data incorrectly
after the previous fix and needed to move the init
later
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmAZYesACgkQB8qZga/f
l8QOpA//V6cBc+ie4AoWKLU/isc33IiZG15qbwvXrHVouLaMgSg4SKQ3cQMwRKd7
Y6cDJynj4EkIo7RZlRMIo5Dsefm2sv6L41tDhfVwR8Z7+wOmGKSXsmYABpjWttaZ
4ABEqU2ZUPI3cm5iYF7qKKSNbwSqHeCWjlWnB3TFB8NfzTC+x7uSTQ/8C8/GZ7aF
tkELHgvdig9+FbdKOs52lmIneTYLEDqxMuv+65XtE9flaYvgWiXCj5ilVVDo8Tjd
vdCST8ux/9YIEcrhlM+SUM1OFO6AIqZ3EX5S2ZzJdc37PMDDy+nNr95cFrlN4EQ8
y9avIS0Z+mvw/R7KSDc7XKInpvleC7bzR9DZVQsF8hdV9iB0cmVKyPmASfGpft69
Ndv2+h2vmWvSHmJDpiroSvTY9WT+AgWCihOU/tj0PrKs+XNLUFfrO08BxFGnaRK/
+MXzXY7ZmgfU9BFgmlAS2ejRbqfb3V6F5qa2Obj+3gq/SbM9W4Jl8RHiiox7szse
GdLrT/LjvVEFC/cEMqDzvnGpnVosNkNtJRFMAaGyKs1g/uljl9A51HRZ8HdLrgv9
bVsMripcQX2JMMxqBwbyfdzPBE0MX8ExkMhyuFbdUyWGEWJqsz4+irr25Bhcyoge
RaRI6/xPM7DOkB9CDdbvJItBJ9GHYz6gvf+ZiIdu+ClpQ+b3k3s=
=mEgj
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-net-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Two fixes:
- station rate tables were not updated correctly
after association, leading to bad configuration
- rtl8723bs (staging) was initializing data incorrectly
after the previous fix and needed to move the init
later
* tag 'mac80211-for-net-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
staging: rtl8723bs: Move wiphy setup to after reading the regulatory settings from the chip
mac80211: fix station rate table updates on assoc
====================
Link: https://lore.kernel.org/r/20210202143505.37610-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of failure in tc update skb the packet is dropped
without freeing the skb.
Fixed by freeing the skb in case failure in tc update skb.
Fixes: d6d2778286 ("net/mlx5: E-Switch, Restore chain id on miss")
Fixes: c756909722 ("net/mlx5e: Add tc chains offload support for nic flows")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
max_opened_tc is used for stats, so that potentially non-zero stats
won't disappear when num_tc decreases. However, mlx5e_setup_tc_mqprio
fails to update it in the flow where channels are closed.
This commit fixes it. The new value of priv->channels.params.num_tc is
always checked on exit. In case of errors it will just be the old value,
and in case of success it will be the updated value.
Fixes: 05909babce ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When creation of a new rule that requires allocation of an FTE fails,
need to call to tree_put_node on the FTE in order to release its'
resource.
Fixes: cefc23554f ("net/mlx5: Fix FTE cleanup")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The function calculation always results in a value of 0. This works
generally, but when the release all pages feature is enabled it will
result in crashes.
Fixes: 0aa128475d ("net/mlx5: Maintain separate page trees for ECPF and PF functions")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-02-01
This series contains updates to igc and i40e drivers.
Kai-Heng Feng fixes igc to report unknown speed and duplex during suspend
as an attempted read will cause errors.
Kevin Lo sets the default value to -IGC_ERR_NVM instead of success for
writing shadow RAM as this could miss a timeout. Also propagates the return
value for Flow Control configuration to properly pass on errors for igc.
Aleksandr reverts commit 2ad1274fa3 ("i40e: don't report link up for a VF
who hasn't enabled queues") as this can cause link flapping.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
igc: check return value of ret_val in igc_config_fc_after_link_up
igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr
igc: Report speed and duplex as unknown when device is runtime suspended
====================
Link: https://lore.kernel.org/r/20210201214618.852831-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lijun Pan says:
====================
rework the memory barrier for SCRQ entry
This series rework the memory barrier for SCRQ (Sub-Command-Response
Queue) entry.
====================
Link: https://lore.kernel.org/r/20210130011905.1485-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rmb() can be removed since:
1. pending_scrq() has dma_rmb() at the function end;
2. dma_rmb(), though weaker, is enough here.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Acked-by: Dwip Banerjee <dnbanerg@us.ibm.com>
Acked-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the dma_rmb() between pending_scrq() and ibmvnic_next_scrq()
into the end of pending_scrq() to save the duplicated code since
this dma_rmb will be used 3 times.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Acked-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>