According to Guillaume Nault RT_TOS should never be used for IPv6.
Quote:
RT_TOS() is an old macro used to interprete IPv4 TOS as described in
the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
code, although, given the current state of the code, most of the
existing calls have no consequence.
But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
field to be interpreted the RFC 1349 way. There's no historical
compatibility to worry about.
Fixes: 571912c69f ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Matthias May <matthias.may@westermo.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Harden set element field checks to avoid out-of-bound memory access,
this patch also fixes the type of issue described in 7e6bc1f6ca
("netfilter: nf_tables: stricter validation of element data") in a
broader way.
2) Patches to restrict the chain, set, and rule id lookup in the
transaction to the corresponding top-level table, patches from
Thadeu Lima de Souza Cascardo.
3) Fix incorrect comment in ip6t_LOG.h
4) nft_data_init() performs upfront validation of the expected data.
struct nft_data_desc is used to describe the expected data to be
received from userspace. The .size field represents the maximum size
that can be stored, for bound checks. Then, .len is an input/output field
which stores the expected length as input (this is optional, to restrict
the checks), as output it stores the real length received from userspace
(if it was not specified as input). This patch comes in response to
7e6bc1f6ca ("netfilter: nf_tables: stricter validation of element data")
to address this type of issue in a more generic way by avoid opencoded
data validation. Next patch requires this as a dependency.
5) Disallow jump to implicit chain from set element, this configuration
is invalid. Only allow jump to chain via immediate expression is
supported at this stage.
6) Fix possible null-pointer derefence in the error path of table updates,
if memory allocation of the transaction fails. From Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: fix null deref due to zeroed list head
netfilter: nf_tables: disallow jump to implicit chain from set element
netfilter: nf_tables: upfront validation of data via nft_data_init()
netfilter: ip6t_LOG: Fix a typo in a comment
netfilter: nf_tables: do not allow RULE_ID to refer to another chain
netfilter: nf_tables: do not allow CHAIN_ID to refer to another table
netfilter: nf_tables: do not allow SET_ID to refer to another table
netfilter: nf_tables: validate variable length element extension
====================
Link: https://lore.kernel.org/r/20220809220532.130240-1-pablo@netfilter.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In addition to TC hook, enable these in tracing programs so that they
can be used in selftests.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220809213033.24147-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Work on "courteous server", which was introduced in 5.19, continues
apace. This release introduces a more flexible limit on the number
of NFSv4 clients that NFSD allows, now that NFSv4 clients can remain
in courtesy state long after the lease expiration timeout. The
client limit is adjusted based on the physical memory size of the
server.
The NFSD filecache is a cache of files held open by NFSv4 clients or
recently touched by NFSv2 or NFSv3 clients. This cache had some
significant scalability constraints that have been relieved in this
release. Thanks to all who contributed to this work.
A data corruption bug found during the most recent NFS bake-a-thon
that involves NFSv3 and NFSv4 clients writing the same file has been
addressed in this release.
This release includes several improvements in CPU scalability for
NFSv4 operations. In addition, Neil Brown provided patches that
simplify locking during file lookup, creation, rename, and removal
that enables subsequent work on making these operations more
scalable. We expect to see that work materialize in the next
release.
There are also numerous single-patch fixes, clean-ups, and the
usual improvements in observability.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmLujF0ACgkQM2qzM29m
f5c14w/+Lsoryo5vdTXMAZBXBNvVdXQmHqLIEotJEVA3sECr+Kad2bF8rFaCWVzS
Gf9KhTetmcDO9O73I5I/UtJ2qFT9B4I6baSGpOzIkjsM/aKeEEQpbpdzPYhKrCEv
bQu3P54js7snH4YV8s0I39nBFOWdnahYXaw7peqE/2GOHxaR2mz88bkrQ+OybCxz
KETqUxA6bKzkOT61S0nHcnQKd8HQzhocMDtrxtANHGsMM167ngI1dw4tUQAtfAUI
s9R+GS6qwiKgwGz1oqhTR6LA/h4DROxPnc7AieuD9FvuAnR3kXw61bGMN5Biwv2T
JZUTBbQvWhNasSV+7qOY9nBu+sHVC6Q7OZ5C9F/KjMyqCioDX0DnbxX9uKP20CDd
EAAMS8n4Tdgd4KRBWdkLXPzizWYAjZQmFIJtcZne1JzGZ4IWRnikgM5qD6n1VviZ
kcPRm5EN3DRHA+Hte4jG0EHIrE/7g5gnf+zr9dWl3uNhZtfTmumCfU16YYmKG8pP
QN4kXBR2w7dAvp8nRaOsY6bBFLDAk/jHbpY8Q4xoUO4tsojfWayCTGVFOrecOjxv
uSn0LhiidC5pLlkcPgwemhysVywDzr+gGXBRJXeUOHfdd05Q2gbFK8OpqDSvJ3dZ
aC/RxFvHc8jaktUcuIjkE6Rsz6AVaAH3EZj84oMZ4hZhyGbEreg=
=PEJ3
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Work on 'courteous server', which was introduced in 5.19, continues
apace. This release introduces a more flexible limit on the number of
NFSv4 clients that NFSD allows, now that NFSv4 clients can remain in
courtesy state long after the lease expiration timeout. The client
limit is adjusted based on the physical memory size of the server.
The NFSD filecache is a cache of files held open by NFSv4 clients or
recently touched by NFSv2 or NFSv3 clients. This cache had some
significant scalability constraints that have been relieved in this
release. Thanks to all who contributed to this work.
A data corruption bug found during the most recent NFS bake-a-thon
that involves NFSv3 and NFSv4 clients writing the same file has been
addressed in this release.
This release includes several improvements in CPU scalability for
NFSv4 operations. In addition, Neil Brown provided patches that
simplify locking during file lookup, creation, rename, and removal
that enables subsequent work on making these operations more scalable.
We expect to see that work materialize in the next release.
There are also numerous single-patch fixes, clean-ups, and the usual
improvements in observability"
* tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (78 commits)
lockd: detect and reject lock arguments that overflow
NFSD: discard fh_locked flag and fh_lock/fh_unlock
NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
NFSD: use explicit lock/unlock for directory ops
NFSD: reduce locking in nfsd_lookup()
NFSD: only call fh_unlock() once in nfsd_link()
NFSD: always drop directory lock in nfsd_unlink()
NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
NFSD: add posix ACLs to struct nfsd_attrs
NFSD: add security label to struct nfsd_attrs
NFSD: set attributes when creating symlinks
NFSD: introduce struct nfsd_attrs
NFSD: verify the opened dentry after setting a delegation
NFSD: drop fh argument from alloc_init_deleg
NFSD: Move copy offload callback arguments into a separate structure
NFSD: Add nfsd4_send_cb_offload()
NFSD: Remove kmalloc from nfsd4_do_async_copy()
NFSD: Refactor nfsd4_do_copy()
NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
...
First set of fixes for v6.0. Small one this time, fix a cfg80211
warning seen with brcmfmac and remove an unncessary inline keyword
from wilc1000.
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmLyj2ERHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZtxaggAptzh9NVi2qCWpCdwIjp+d6CusPoEA4NN
eI7PSLecWPA5MVCR5YXSOboVDEtV/wGDOk/N1fKpKVXW02+7nvuLohx5tOclFpms
CZtS2thpyEvUW6Zu+bE1Opwyx1v4e3nyznrNXMHW8tcnaVI3BNwYpdp7LRCylv07
JQPNKZvxR5fs8NuIhf0O1TSjPaUSvRrMWfRn3ZioHWVa7+j8qMfnxWk+o6n38zP5
fqbYlhLEBS3Nu9jp3e26KRMRrkAs/OTb/oRc/bPbU68V0VFPquP97Fz0vOobyjzO
+B5+qAcaNpP6lSlAmrVyPxFEO1Y0utXblXblrWQsAqox7rt/PXQecg==
=2QwM
-----END PGP SIGNATURE-----
Merge tag 'wireless-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.0
First set of fixes for v6.0. Small one this time, fix a cfg80211
warning seen with brcmfmac and remove an unncessary inline keyword
from wilc1000.
* tag 'wireless-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: wilc1000: fix spurious inline in wilc_handle_disconnect()
wifi: cfg80211: Fix validating BSS pointers in __cfg80211_connect_result
====================
Link: https://lore.kernel.org/r/20220809164756.B1DAEC433D6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In nf_tables_updtable, if nf_tables_table_enable returns an error,
nft_trans_destroy is called to free the transaction object.
nft_trans_destroy() calls list_del(), but the transaction was never
placed on a list -- the list head is all zeroes, this results in
a null dereference:
BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59
Call Trace:
nft_trans_destroy+0x26/0x59
nf_tables_newtable+0x4bc/0x9bc
[..]
Its sane to assume that nft_trans_destroy() can be called
on the transaction object returned by nft_trans_alloc(), so
make sure the list head is initialised.
Fixes: 55dd6f9307 ("netfilter: nf_tables: use new transaction infrastructure to handle table")
Reported-by: mingi cho <mgcho.minic@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Extend struct nft_data_desc to add a flag field that specifies
nft_data_init() is being called for set element data.
Use it to disallow jump to implicit chain from set element, only jump
to chain via immediate expression is allowed.
Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of parsing the data and then validate that type and length are
correct, pass a description of the expected data so it can be validated
upfront before parsing it to bail out earlier.
This patch adds a new .size field to specify the maximum size of the
data area. The .len field is optional and it is used as an input/output
field, it provides the specific length of the expected data in the input
path. If then .len field is not specified, then obtained length from the
netlink attribute is stored. This is required by cmp, bitwise, range and
immediate, which provide no netlink attribute that describes the data
length. The immediate expression uses the destination register type to
infer the expected data type.
Relying on opencoded validation of the expected data might lead to
subtle bugs as described in 7e6bc1f6ca ("netfilter: nf_tables:
stricter validation of element data").
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When doing lookups for rules on the same batch by using its ID, a rule from
a different chain can be used. If a rule is added to a chain but tries to
be positioned next to a rule from a different chain, it will be linked to
chain2, but the use counter on chain1 would be the one to be incremented.
When looking for rules by ID, use the chain that was used for the lookup by
name. The chain used in the context copied to the transaction needs to
match that same chain. That way, struct nft_rule does not need to get
enlarged with another member.
Fixes: 1a94e38d25 ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
Fixes: 75dd48e2e4 ("netfilter: nf_tables: Support RULE_ID reference in new rule")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When doing lookups for chains on the same batch by using its ID, a chain
from a different table can be used. If a rule is added to a table but
refers to a chain in a different table, it will be linked to the chain in
table2, but would have expressions referring to objects in table1.
Then, when table1 is removed, the rule will not be removed as its linked to
a chain in table2. When expressions in the rule are processed or removed,
that will lead to a use-after-free.
When looking for chains by ID, use the table that was used for the lookup
by name, and only return chains belonging to that same table.
Fixes: 837830a4b4 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When doing lookups for sets on the same batch by using its ID, a set from a
different table can be used.
Then, when the table is removed, a reference to the set may be kept after
the set is freed, leading to a potential use-after-free.
When looking for sets by ID, use the table that was used for the lookup by
name, and only return sets belonging to that same table.
This fixes CVE-2022-2586, also reported as ZDI-CAN-17470.
Reported-by: Team Orca of Sea Security (@seasecresponse)
Fixes: 958bee14d0 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Update template to validate variable length extensions. This patch adds
a new .ext_len[id] field to the template to store the expected extension
length. This is used to sanity check the initialization of the variable
length extension.
Use PTR_ERR() in nft_set_elem_init() to report errors since, after this
update, there are two reason why this might fail, either because of
ENOMEM or insufficient room in the extension field (EINVAL).
Kernels up until 7e6bc1f6ca ("netfilter: nf_tables: stricter
validation of element data") allowed to copy more data to the extension
than was allocated. This ext_len field allows to validate if the
destination has the correct size as additional check.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We need to drop skb references taken in j1939_session_skb_queue() when
destroying a session in j1939_session_destroy(). Otherwise those skbs
would be lost.
Link to Syzkaller info and repro: https://forge.ispras.ru/issues/11743.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
V1: https://lore.kernel.org/all/20220708175949.539064-1-pchelkin@ispras.ru
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Suggested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20220805150216.66313-1-pchelkin@ispras.ru
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
We should warn user-space that it is doing something wrong when trying
to activate sessions with identical parameters but WARN_ON_ONCE macro
can not be used here as it serves a different purpose.
So it would be good to replace it with netdev_warn_once() message.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20220729143655.1108297-1-pchelkin@ispras.ru
[mkl: fix indention]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
- Fixes various issues related to ISO channel/socket support
- Fixes issues when building with C=1
- Fix cancel uninitilized work which blocks syzbot to run
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmLxpdoZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKbjqEACZiIUKIACTYWa8Os0fTuzu
LM/h4aOnh3L+W2KyA9Kh4Hmm7Caf9JtUZrTIMhMigGiTTN91eLPoScu6ATm7q0vY
7JgfRKbsLCjhUV8uQfypDBM0uQq7exbEiwd1KTHo8XfOgiheZL6ergN4r2g+V/gt
up0a58j4ukc6PhWAxujc3UzvMj2c1Sb5jY6TIuyiQM7RONtWLH9VDzc0InRNGqpa
eEpPDqCuXsgDTKDAvcJoWARwnj6nsODN3QaSWVlwgN1JgE0/OjXI9hoUNQ83ueCH
pl6qigJIuCnGq4ZDbdDE+QcK5I2ouoGoJ9rQMLuUFdupmaBtTEdMK7pw7opzYt3c
HqW/TvIR8t2LG0oFmrvFSKH+OMHkIH7D7zaCHGYx5T7B778x5fnUK4OfhvnJ4NPu
HkKYD5BJv92X7cHacgclJwQdwwbParrr7wPbqGiSRgiw2ec2puC1VQSYj/+4nwV5
De3AJ2OORv+2kcIw+zi3T0wGzddQF07gXXpz7ckOnFQ1A5jiYX5yGrfGlJezvblX
LnXikwvPkkl640ZRrSZvGQBPNySKv8N2yuE/FtbkKfNjoumAkC67PA+4NYOLc9g9
gkgPnR6y4Cm+h3yLILV3njcYbif+5Ue0KHx8L3rr523bZ9C7vUKdcE1i+Tkr9B0y
I6rMyxtfkUFwVRRFKwvguw==
=nFPi
-----END PGP SIGNATURE-----
Merge tag 'for-net-2022-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fixes various issues related to ISO channel/socket support
- Fixes issues when building with C=1
- Fix cancel uninitilized work which blocks syzbot to run
* tag 'for-net-2022-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: ISO: Fix not using the correct QoS
Bluetooth: don't try to cancel uninitialized works at mgmt_index_removed()
Bluetooth: ISO: Fix iso_sock_getsockopt for BT_DEFER_SETUP
Bluetooth: MGMT: Fixes build warnings with C=1
Bluetooth: hci_event: Fix build warning with C=1
Bluetooth: ISO: Fix memory corruption
Bluetooth: Fix null pointer deref on unexpected status event
Bluetooth: ISO: Fix info leak in iso_sock_getsockopt()
Bluetooth: hci_conn: Fix updating ISO QoS PHY
Bluetooth: ISO: unlock on error path in iso_sock_setsockopt()
Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression
====================
Link: https://lore.kernel.org/r/20220809001224.412807-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a userspace application starts a blocking connect(), a CALL REQUEST
is sent, the t21 timer is started and the connect is waiting in
x25_wait_for_connection_establishment(). If then for some reason the t21
timer expires before any reaction on the assigned logical channel (e.g.
CALL ACCEPT, CLEAR REQUEST), there is sent a CLEAR REQUEST and timer
t23 is started waiting for a CLEAR confirmation. If we now receive a
CLEAR CONFIRMATION from the peer, x25_disconnect() is called in
x25_state2_machine() with reason "0", which means "normal" call
clearing. This is ok, but the parameter "reason" is used as sk->sk_err
in x25_disconnect() and sock_error(sk) is evaluated in
x25_wait_for_connection_establishment() to check if the call is still
pending. As "0" is not rated as an error, the connect will stuck here
forever.
To fix this situation, also check if the sk->sk_state changed form
TCP_SYN_SENT to TCP_CLOSE in the meantime, which is also done by
x25_disconnect().
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20220805061810.10824-1-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* more new_sync_{read,write}() speedups - ITER_UBUF introduction
* ITER_PIPE cleanups
* unification of iov_iter_get_pages/iov_iter_get_pages_alloc and
switching them to advancing semantics
* making ITER_PIPE take high-order pages without splitting them
* handling copy_page_from_iter() for high-order pages properly
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYvHI8QAKCRBZ7Krx/gZQ
62CQAPsGlbebqBeAT2pMulaGDxfLAsgz5Yf4BEaMLhPtRqFOQgD+KrZQId7Sd8O0
3IWucpTb2c4jvLlXhGMS+XWnusQH+AQ=
=pBux
-----END PGP SIGNATURE-----
Merge tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more iov_iter updates from Al Viro:
- more new_sync_{read,write}() speedups - ITER_UBUF introduction
- ITER_PIPE cleanups
- unification of iov_iter_get_pages/iov_iter_get_pages_alloc and
switching them to advancing semantics
- making ITER_PIPE take high-order pages without splitting them
- handling copy_page_from_iter() for high-order pages properly
* tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits)
fix copy_page_from_iter() for compound destinations
hugetlbfs: copy_page_to_iter() can deal with compound pages
copy_page_to_iter(): don't split high-order page in case of ITER_PIPE
expand those iov_iter_advance()...
pipe_get_pages(): switch to append_pipe()
get rid of non-advancing variants
ceph: switch the last caller of iov_iter_get_pages_alloc()
9p: convert to advancing variant of iov_iter_get_pages_alloc()
af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages()
iter_to_pipe(): switch to advancing variant of iov_iter_get_pages()
block: convert to advancing variants of iov_iter_get_pages{,_alloc}()
iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
iov_iter: saner helper for page array allocation
fold __pipe_get_pages() into pipe_get_pages()
ITER_XARRAY: don't open-code DIV_ROUND_UP()
unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts
unify xarray_get_pages() and xarray_get_pages_alloc()
unify pipe_get_pages() and pipe_get_pages_alloc()
iov_iter_get_pages(): sanity-check arguments
iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper
...
Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.
Provide inline wrappers doing that, convert trivial open-coded
uses of those.
BTW, iov_iter_get_pages() never returns more than it had been asked
to; such checks in cifs ought to be removed someday...
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This fixes using wrong QoS settings when attempting to send frames while
acting as peripheral since the QoS settings in use are stored in
hconn->iso_qos not in sk->qos, this is actually properly handled on
getsockopt(BT_ISO_QOS) but not on iso_send_frame.
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
syzbot is reporting attempt to cancel uninitialized work at
mgmt_index_removed() [1], for calling cancel_delayed_work_sync() without
INIT_DELAYED_WORK() is not permitted.
INIT_DELAYED_WORK() is called from mgmt_init_hdev() via chan->hdev_init()
from hci_mgmt_cmd(), but cancel_delayed_work_sync() is unconditionally
called from mgmt_index_removed().
Call cancel_delayed_work_sync() only if HCI_MGMT flag was set, for
mgmt_init_hdev() sets HCI_MGMT flag when calling INIT_DELAYED_WORK().
Link: https://syzkaller.appspot.com/bug?extid=b8ddd338a8838e581b1c [1]
Reported-by: syzbot <syzbot+b8ddd338a8838e581b1c@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 0ef08313ce ("Bluetooth: Convert delayed discov_off to hci_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
BT_DEFER_SETUP shall be considered valid for all states except for
BT_CONNECTED as it is also used when initiated a connection rather then
only for BT_BOUND and BT_LISTEN.
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This fixes the following warning when building with make C=1:
net/bluetooth/mgmt.c:3821:29: warning: restricted __le16 degrades to integer
net/bluetooth/mgmt.c:4625:9: warning: cast to restricted __le32
Fixes: 600a87490f ("Bluetooth: Implementation of MGMT_OP_SET_BLOCKED_KEYS.")
Fixes: 4c54bf2b09 ("Bluetooth: Add get/set device flags mgmt op")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This fixes the following warning when build with make C=1:
net/bluetooth/hci_event.c:337:15: warning: restricted __le16 degrades to integer
Fixes: a936612036 ("Bluetooth: Process result of HCI Delete Stored Link Key command")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The following memory corruption can happen since iso_pinfo.base size
did not account for its headers (4 bytes):
net/bluetooth/eir.c
76 memcpy(&eir[eir_len], data, data_len);
^^^^^^^ ^^^^^^^^
77 eir_len += data_len;
78
79 return eir_len;
80 }
The "eir" buffer has 252 bytes and data_len is 252 but we do a memcpy()
to &eir[4] so this can corrupt 4 bytes beyond the end of the buffer.
Fixes: f764a6c2c1 ("Bluetooth: ISO: Add broadcast support")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
__hci_cmd_sync returns NULL if the controller responds with a status
event. This is unexpected for the commands sent here, but on
occurrence leads to null pointer dereferences and thus must be
handled.
Signed-off-by: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The C standard rules for when struct holes are zeroed out are slightly
weird. The existing assignments might initialize everything, but GCC
is allowed to (and does sometimes) leave the struct holes uninitialized,
so instead of using yet another variable and copy the QoS settings just
use a pointer to the stored QoS settings.
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
BT_ISO_QOS has different semantics when it comes to QoS PHY as it uses
0x00 to disable a direction but that value is invalid over HCI and
sockets using DEFER_SETUP to connect may attempt to use hci_bind_cis
multiple times in order to detect if the parameters have changed, so to
fix the code will now just mirror the PHY for the parameters of
HCI_OP_LE_SET_CIG_PARAMS and will not update the PHY of the socket
leaving it disabled.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Call release_sock(sk); before returning on this error path.
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The patch d0be8347c6: "Bluetooth: L2CAP: Fix use-after-free caused
by l2cap_chan_put" from Jul 21, 2022, leads to the following Smatch
static checker warning:
net/bluetooth/l2cap_core.c:1977 l2cap_global_chan_by_psm()
error: we previously assumed 'c' could be null (see line 1996)
Fixes: d0be8347c6 ("Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The commit d1361840f8 ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning")
add one new (struct proto_ops)->set_rcvlowat method so that a protocol
can override the default setsockopt(SO_RCVLOWAT) behavior.
The prior bpf codes don't check and invoke the protos's set_rcvlowat,
now correct it.
Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver's SME is allowed to fill either BSSID or BSS pointers in struct
cfg80211_connect_resp_params when indicating connect response but a
check in __cfg80211_connect_result() is giving unnecessary warning when
driver's SME fills only BSSID pointer and not BSS pointer in struct
cfg80211_connect_resp_params.
In case of mac80211 with auth/assoc path, it is always expected to fill
BSS pointers in struct cfg80211_connect_resp_params when calling
__cfg80211_connect_result() since cfg80211 must have hold BSS pointers
in cfg80211_mlme_assoc().
So, skip the check for the drivers which support cfg80211 connect
callback, for example with brcmfmac is one such driver which had the
warning:
WARNING: CPU: 5 PID: 514 at net/wireless/sme.c:786 __cfg80211_connect_result+0x2fc/0x5c0 [cfg80211]
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: efbabc1165 ("cfg80211: Indicate MLO connection info in connect and roam callbacks")
Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
[kvalo@kernel.org: add more info to the commit log]
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220805135259.4126630-1-quic_vjakkam@quicinc.com
- a couple of fixes
- add a tracepoint for fid refcounting
- some cleanup/followup on fid lookup
- some cleanup around req refcounting
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmLuKqkACgkQq06b7GqY
5nA+RBAAvuA6AjKQSvsNxHsqSFMahwoE3cCPlF/QlmAnnZa1DzmRI3kKTAuuDKkE
Q4c7DjxzDJCWRTlkpNUCGZycHqpu1QQbfoTo43sIqob7W8xlajeemc5Fxqtw5sPM
m+0SzN7vvNJpCr+6pxMXwwGHSZmZOvpFjwj7cUjhpF/V1WO8bNxfCGzQdF0hX1Vn
2HoBbFUOmacL9Z/pF3O/ZG9LwCFuRQsH0EmbFBlJy1WdDtlVHTXlzDaGQ7EGaY4D
17UR6iITsj9ozacLhvk094PLIc3/RHDGLrm3C4Ka3zmUI7BsiYWPDmai3Pu/DNqn
JJ5sZkdrVowxyBbGxw8GpZ4YDJtGsU5XglFPdkw+ZazxhZLNEIstPXg0HTZybrOe
GE+WskWB0qS+RpX0tnYEcX6qOHWm3/63Yq5NG6A9tLQUSFku02jS/bCQSLBrmGWW
Js24IWvFSTvl6XytHeldYhJP618pNUxXRSqgYv96vT/LI3mUrIMN+IVBNPujO6p2
jIYXNoaqLoY3efXKW/WQmp7C/52ZP4ly4fOiz7qtHTQsCcIk8Xo6zwHtm/FkNEqc
sMZdqgLrxKPNBAlT8iEtt//wU2fB7mFt988p0pc+5lAK5t0h67KZJV6vDwTAMObX
wV6Ht+QhOHtJwO779fk8FhZPNRPYZYMutIyFNlRx+4gCpBuqJPc=
=941k
-----END PGP SIGNATURE-----
Merge tag '9p-for-5.20' of https://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
- a couple of fixes
- add a tracepoint for fid refcounting
- some cleanup/followup on fid lookup
- some cleanup around req refcounting
* tag '9p-for-5.20' of https://github.com/martinetd/linux:
net/9p: Initialize the iounit field during fid creation
net: 9p: fix refcount leak in p9_read_work() error handling
9p: roll p9_tag_remove into p9_req_put
9p: Add client parameter to p9_req_put()
9p: Drop kref usage
9p: Fix some kernel-doc comments
9p fid refcount: cleanup p9_fid_put calls
9p fid refcount: add a 9p_fid_ref tracepoint
9p fid refcount: add p9_fid_get/put wrappers
9p: Fix minor typo in code comment
9p: Remove unnecessary variable for old fids while walking from d_parent
9p: Make the path walk logic more clear about when cloning is required
9p: Track the root fid with its own variable during lookups
Fixes the following warnings observed when building
CONFIG_IPV6_SEG6_LWTUNNEL=y with clang:
net/ipv6/seg6_local.o: warning: objtool: seg6_local_fill_encap() falls
through to next function seg6_local_get_encap_size()
net/ipv6/seg6_local.o: warning: objtool: seg6_local_cmp_encap() falls
through to next function input_action_end()
LLVM can fully unroll loops in seg6_local_get_encap_size() and
seg6_local_cmp_encap(). One issue in those loops is that the induction
variable is initialized to 0. The loop iterates over members of
seg6_action_params, a global array of struct seg6_action_param calling
their put() function pointer members. seg6_action_param uses an array
initializer to initialize SEG6_LOCAL_SRH and later elements, which is
the third enumeration of an anonymous union.
The guard `if (attrs & SEG6_F_ATTR(i))` may prevent this from being
called at runtime, but it would still be UB for
`seg6_action_params[0]->put` to be called; the unrolled loop will make
the initial iterations unreachable, which LLVM will later rotate to
fallthrough to the next function.
Make this more obvious that this cannot happen to the compiler by
initializing the loop induction variable to the minimum valid index that
seg6_action_params is initialized to.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220802161203.622293-1-ndesaulniers@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
rose /proc code does not serialize timer accesses.
Initial report by Bernard F6BVP Pidoux exhibits overflow amounting
to 116 ticks on its HZ=250 system.
Full timer access serialization would imho be overkill as rose /proc
does not enforce consistency between displayed ROSE_STATE_XYZ and
timer values during changes of state.
The patch may also fix similar behavior in ax25 /proc, ax25 ioctl
and netrom /proc as they all exhibit the same timer serialization
policy. This point has not been reported though.
The sole remaining use of ax25_display_timer - ax25 rtt valuation -
may also perform marginally better but I have not analyzed it too
deeply.
Cc: Thomas DL9SAU Osterried <thomas@osterried.de>
Link: https://lore.kernel.org/all/d5e93cc7-a91f-13d3-49a1-b50c11f0f811@free.fr/
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Link: https://lore.kernel.org/r/Yuk9vq7t7VhmnOXu@electric-eye.fr.zoreil.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove default to 'y', this infrastructure is not fundamental for the
flowtable operational.
Add a missing dependency on CONFIG_NF_FLOW_TABLE.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: b038177636 ("netfilter: nf_flow_table: count pending offload workqueue tasks")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
do not access info->pkt when info->trace is not 1.
nft_traceinfo is not initialized, except when tracing is enabled.
The 'nft_trace_enabled' static key cannot be used for this, we must
always check info->trace first.
Pass nft_pktinfo directly to avoid this.
Fixes: e34b9ed96c ("netfilter: nf_tables: avoid skb access on nf_stolen")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve latency
and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
=w/UH
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
- Runtime verification infrastructure
This is the biggest change for this pull request. It introduces the
runtime verification that is necessary for running Linux on safety
critical systems. It allows for deterministic automata models to be
inserted into the kernel that will attach to tracepoints, where the
information on these tracepoints will move the model from state to state.
If a state is encountered that does not belong to the model, it will then
activate a given reactor, that could just inform the user or even panic
the kernel (for which safety critical systems will detect and can recover
from).
- Two monitor models are also added: Wakeup In Preemptive (WIP - not to be
confused with "work in progress"), and Wakeup While Not Running (WWNR).
- Added __vstring() helper to the TRACE_EVENT() macro to replace several
vsnprintf() usages that were all doing it wrong.
- eprobes now can have their event autogenerated when the event name is left
off.
- The rest is various cleanups and fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYu0yzRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qj4HAP4tQtV55rjj4DQ5XIXmtI3/64PmyRSJ
+y4DEXi1UvEUCQD/QAuQfWoT/7gh35ltkfeS4t3ockzy14rrkP5drZigiQA=
=kEtM
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- Runtime verification infrastructure
This is the biggest change here. It introduces the runtime
verification that is necessary for running Linux on safety critical
systems.
It allows for deterministic automata models to be inserted into the
kernel that will attach to tracepoints, where the information on
these tracepoints will move the model from state to state.
If a state is encountered that does not belong to the model, it will
then activate a given reactor, that could just inform the user or
even panic the kernel (for which safety critical systems will detect
and can recover from).
- Two monitor models are also added: Wakeup In Preemptive (WIP - not to
be confused with "work in progress"), and Wakeup While Not Running
(WWNR).
- Added __vstring() helper to the TRACE_EVENT() macro to replace
several vsnprintf() usages that were all doing it wrong.
- eprobes now can have their event autogenerated when the event name is
left off.
- The rest is various cleanups and fixes.
* tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits)
rv: Unlock on error path in rv_unregister_reactor()
tracing: Use alignof__(struct {type b;}) instead of offsetof()
tracing/eprobe: Show syntax error logs in error_log file
scripts/tracing: Fix typo 'the the' in comment
tracepoints: It is CONFIG_TRACEPOINTS not CONFIG_TRACEPOINT
tracing: Use free_trace_buffer() in allocate_trace_buffers()
tracing: Use a struct alignof to determine trace event field alignment
rv/reactor: Add the panic reactor
rv/reactor: Add the printk reactor
rv/monitor: Add the wwnr monitor
rv/monitor: Add the wip monitor
rv/monitor: Add the wip monitor skeleton created by dot2k
Documentation/rv: Add deterministic automata instrumentation documentation
Documentation/rv: Add deterministic automata monitor synthesis documentation
tools/rv: Add dot2k
Documentation/rv: Add deterministic automaton documentation
tools/rv: Add dot2c
Documentation/rv: Add a basic documentation
rv/include: Add instrumentation helper functions
rv/include: Add deterministic automata monitor definition via C macros
...
When namespace support was added to xfrm/afkey, it caused the
previously single-threaded call to xfrm_probe_algs to become
multi-threaded. This is buggy and needs to be fixed with a mutex.
Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
Fixes: 283bc9f35b ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE
eBPF program, the MPTCP protocol ends-up leaking all the subflows:
the related cleanup happens in __mptcp_destroy_sock() that is not
invoked in such code path.
Address the issue moving the subflow sockets cleanup in the
mptcp_destroy_common() helper, which is invoked in every msk cleanup
path.
Additionally get rid of the intermediate list_splice_init step, which
is an unneeded relic from the past.
The issue is present since before the reported root cause commit, but
any attempt to backport the fix before that hash will require a complete
rewrite.
Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Reported-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Co-developed-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the set of SPDX comment updates for 6.0-rc1.
Nothing huge here, just a number of updated SPDX license tags and
cleanups based on the review of a number of common patterns in GPLv2
boilerplate text. Also included in here are a few other minor updates,
2 USB files, and one Documentation file update to get the SPDX lines
correct.
All of these have been in the linux-next tree for a very long time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYupz3g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynPUgCgslaf2ssCgW5IeuXbhla+ZBRAzisAnjVgOvLN
4AKdqbiBNlFbCroQwmeQ
=v1sg
-----END PGP SIGNATURE-----
Merge tag 'spdx-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here is the set of SPDX comment updates for 6.0-rc1.
Nothing huge here, just a number of updated SPDX license tags and
cleanups based on the review of a number of common patterns in GPLv2
boilerplate text.
Also included in here are a few other minor updates, two USB files,
and one Documentation file update to get the SPDX lines correct.
All of these have been in the linux-next tree for a very long time"
* tag 'spdx-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (28 commits)
Documentation: samsung-s3c24xx: Add blank line after SPDX directive
x86/crypto: Remove stray comment terminator
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_406.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_398.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_391.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_390.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_385.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_320.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_319.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_318.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_298.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_292.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_179.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 2)
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 1)
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_160.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_152.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_149.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_147.RULE
treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_133.RULE
...
Here is the set of driver core and kernfs changes for 6.0-rc1.
"biggest" thing in here is some scalability improvements for kernfs for
large systems. Other than that, included in here are:
- arch topology and cache info changes that have been reviewed
and discussed a lot.
- potential error path cleanup fixes
- deferred driver probe cleanups
- firmware loader cleanups and tweaks
- documentation updates
- other small things
All of these have been in the linux-next tree for a while with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYuqCnw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ym/JgCcCnaycJY00ZPRQm3LQCyzfJ0HgqoAn2qxGV+K
NKycLeXZSnuvIA87dycE
=/4Jk
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core / kernfs updates from Greg KH:
"Here is the set of driver core and kernfs changes for 6.0-rc1.
The "biggest" thing in here is some scalability improvements for
kernfs for large systems. Other than that, included in here are:
- arch topology and cache info changes that have been reviewed and
discussed a lot.
- potential error path cleanup fixes
- deferred driver probe cleanups
- firmware loader cleanups and tweaks
- documentation updates
- other small things
All of these have been in the linux-next tree for a while with no
reported problems"
* tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (63 commits)
docs: embargoed-hardware-issues: fix invalid AMD contact email
firmware_loader: Replace kmap() with kmap_local_page()
sysfs docs: ABI: Fix typo in comment
kobject: fix Kconfig.debug "its" grammar
kernfs: Fix typo 'the the' in comment
docs: driver-api: firmware: add driver firmware guidelines. (v3)
arch_topology: Fix cache attributes detection in the CPU hotplug path
ACPI: PPTT: Leave the table mapped for the runtime usage
cacheinfo: Use atomic allocation for percpu cache attributes
drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist
MAINTAINERS: Change mentions of mpm to olivia
docs: ABI: sysfs-devices-soc: Update Lee Jones' email address
docs: ABI: sysfs-class-pwm: Update Lee Jones' email address
Documentation/process: Add embargoed HW contact for LLVM
Revert "kernfs: Change kernfs_notify_list to llist."
ACPI: Remove the unused find_acpi_cpu_cache_topology()
arch_topology: Warn that topology for nested clusters is not supported
arch_topology: Add support for parsing sockets in /cpu-map
arch_topology: Set cluster identifier in each core/thread from /cpu-map
arch_topology: Limit span of cpu_clustergroup_mask()
...
Now that the bonding driver keeps track of the last TX time of ARP and
NS probes, we effectively revert the following commits:
32d3e51a82 ("net_sched: use macvlan real dev trans_start in dev_trans_start()")
07ce76aa9b ("net_sched: make dev_trans_start return vlan's real dev trans_start")
Note that the approach of continuing to hack at this function would not
get us very far, hence the desire to take a different approach. DSA is
also a virtual device that uses NETIF_F_LLTX, but there, many uppers
share the same lower (DSA master, i.e. the physical host port of a
switch). By making dev_trans_start() on a DSA interface return the
dev_trans_start() of the master, we effectively assume that all other
DSA interfaces are silent, otherwise this corrupts the validity of the
probe timestamp data from the bonding driver's perspective.
Furthermore, the hacks didn't take into consideration the fact that the
lower interface of @dev may not have been physical either. For example,
VLAN over VLAN, or DSA with 2 masters in a LAG.
And even furthermore, there are NETIF_F_LLTX devices which are not
stacked, like veth. The hack here would not work with those, because it
would not have to provide the bonding driver something to chew at all.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core
----
- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one
- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.
- Network-side support for IO uring zero-copy send.
- A few skb drop reason improvements, including codegen the source file
with string mapping instead of using macro magic.
- Rename reference tracking helpers to a more consistent
netdev_* schema.
- Adapt u64_stats_t type to address load/store tearing issues.
- Refine debug helper usage to reduce the log noise caused by bots.
BPF
---
- Improve socket map performance, avoiding skb cloning on read
operation.
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when
possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the
eBPF used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.
Protocols
---------
- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.
- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup
status
- Introduce virtually contingus buffers for sockets over RDMA,
to cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed
features.
Driver API
----------
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers
----------------------
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers
-------
- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default
- Microsoft vNIC driver (mana)
- add support for XDP redirect
- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability
(parts 1-6)
- support for PTP in Spectrum-2 asics
- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload
- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share
the probe code (parts 1, 2), add support for phylink
mac configuration
- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than
10 years.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmLqN+oSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkB9kQAI9VqW0c3SfiTJnkVBEIovZ6Tnh5stD2
UYFkh1BdchLsYxi7W4XMpVPSzRztiTP87mIx5c/KvIzj+QNeWL1XWRJSPdI9HhTD
pTAA/tM2OG7bqrbyQiKDNfpQdNl7+kk1RwnYd+f9RFl1QVuIJaYhmjVwrsN5xF/+
jUsotpROarM2dGFWiFwJbKhP2zMDT+6qEEahM8pEPggKhv8wRLYjany2cZVEe4e0
WGUpbINAS8gEKm0Ob922WaDfDrcK/N1Z0jNz/kMaENkK18Vvc7F6bCO0DzAawKX9
QZMMwm6mHp3EThflJAMAzCGIYiIcwLhykgdyj8rrjPhFrWbMD2Sdsbo21HOXU/8j
u4aAhVl+d+h7emmbgBoJ8sycVJ7BQlXz7lX20sTgADv9xI4/dPhQ17CMRuwX6fXX
JSrn6P6e1LTV5CEg6vrlSPnKPY6uhFn/cPw47FxCjRwJ9phVnp+8uZWQmf9Pz3yf
Ok/tcj+juFbsmuOshHy2cbRkuNZNS0oRWlSTBo5795ZwOLSakMonR3L+ev2aOvzz
DVrFp2Y/iIVwMSFdCbouYdYnhArPRhOAtCmZc2afY8aBN7aaMgrdTy3+mzUoHy3I
FG3K+VuKpfi0vY4zn6ZoLZDIpyXIoJJ93RcSGltD32t3Dp1RaQMVEI4s45k05PVm
1nYpXKHA8qML
=hxEG
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Paolo Abeni:
"Core:
- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one
- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.
- Network-side support for IO uring zero-copy send.
- A few skb drop reason improvements, including codegen the source
file with string mapping instead of using macro magic.
- Rename reference tracking helpers to a more consistent netdev_*
schema.
- Adapt u64_stats_t type to address load/store tearing issues.
- Refine debug helper usage to reduce the log noise caused by bots.
BPF:
- Improve socket map performance, avoiding skb cloning on read
operation.
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the eBPF
used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.
Protocols:
- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.
- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup
status
- Introduce virtually contingus buffers for sockets over RDMA, to
cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed
features.
Driver API:
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers:
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers:
- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default
- Microsoft vNIC driver (mana)
- add support for XDP redirect
- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability (parts 1-6)
- support for PTP in Spectrum-2 asics
- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload
- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share the
probe code (parts 1, 2), add support for phylink mac
configuration
- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than 10 years"
* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
doc: sfp-phylink: Fix a broken reference
wireguard: selftests: support UML
wireguard: allowedips: don't corrupt stack when detecting overflow
wireguard: selftests: update config fragments
wireguard: ratelimiter: use hrtimer in selftest
net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
net: usb: ax88179_178a: Bind only to vendor-specific interface
selftests: net: fix IOAM test skip return code
net: usb: make USB_RTL8153_ECM non user configurable
net: marvell: prestera: remove reduntant code
octeontx2-pf: Reduce minimum mtu size to 60
net: devlink: Fix missing mutex_unlock() call
net/tls: Remove redundant workqueue flush before destroy
net: txgbe: Fix an error handling path in txgbe_probe()
net: dsa: Fix spelling mistakes and cleanup code
Documentation: devlink: add add devlink-selftests to the table of contents
dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
net: ionic: fix error check for vlan flags in ionic_set_nic_features()
net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
nfp: flower: add support for tunnel offload without key ID
...
a destination-only iov_iter when it handles Rerror arriving in reply to
zero-copy request. Not hard to fix, fortunately; it's a prereq for the
iov_iter_get_pages() work in the second part of iov_iter series,
ended up in a separate branch.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYurQLQAKCRBZ7Krx/gZQ
65AiAP9Mmpu3yMWmfMEnTEjBv4iSuG37JdgHE/IE/P6q99opfQEAxThED/nJVuaG
YZuNUx60OT9Au1hSdfl7EjAN4dg/Kw8=
=tL2V
-----END PGP SIGNATURE-----
Merge tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 9p iov_iter fix from Al Viro:
"net/9p abuses iov_iter primitives - it attempts to copy _from_ a
destination-only iov_iter when it handles Rerror arriving in reply to
zero-copy request. Not hard to fix, fortunately.
This is a prereq for the iov_iter_get_pages() work in the second part
of iov_iter series, ended up in a separate branch"
* tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: handling Rerror without copy_from_iter_full()
This function always returns 0, and ignores the nofail boolean. Drop the
nofail argument, make the function void return and fix up the callers.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
x->lastused was not cloned in xfrm_do_migrate. Add it to clone during
migrate.
Fixes: 80c9abaabf ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This reverts commit af734a26a1.
The abvoce commit is a regression according RFC 2367. A better fix would be
use x->lastused. Which will be propsed later.
according to RFC 2367 use_time == sadb_lifetime_usetime.
"sadb_lifetime_usetime
For CURRENT, the time, in seconds, when association
was first used. For HARD and SOFT, the number of
seconds after the first use of the association until
it expires."
Fixes: af734a26a1 ("xfrm: update SA curlft.use_time")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
API:
- Make proc files report fips module name and version.
Algorithms:
- Move generic SHA1 code into lib/crypto.
- Implement Chinese Remainder Theorem for RSA.
- Remove blake2s.
- Add XCTR with x86/arm64 acceleration.
- Add POLYVAL with x86/arm64 acceleration.
- Add HCTR2.
- Add ARIA.
Drivers:
- Add support for new CCP/PSP device ID in ccp.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmLosAAACgkQxycdCkmx
i6dvgxAAzcw0cKMuq3dbQamzeVu1bDW8rPb7yHnpXal3ao5ewa15+hFjsKhdh/s3
cjM5Lu7Qx4lnqtsh2JVSU5o2SgEpptxXNfxAngcn46ld5EgV/G4DYNKuXsatMZ2A
erCzXqG9dDxJmREat+5XgVfD1RFVsglmEA/Nv4Rvn+9O4O6PfwRa8GyUzeKC+byG
qs/1JyiPqpyApgzCvlQFAdTF4PM7ruDtg3mnMy2EKAzqj4JUseXRi1i81vLVlfBL
T40WESG/CnOwIF5MROhziAtkJMS4Y4v2VQ2++1p0gwG6pDCnq4w7u9cKPXYfNgZK
fMVCxrNlxIH3W99VfVXbXwqDSN6qEZtQvhnliwj9aEbEltIoH+B02wNfS/BDsTec
im+5NCnNQ6olMPyL0yHrMKisKd+DwTrEfYT5H2kFhcdcYZncQ9C6el57kimnJRzp
4ymPRudCKm/8weWGTtmjFMi+PFP4LgvCoR+VMUd+gVe91F9ZMAO0K7b5z5FVDyDf
wmsNBvsEnTdm/r7YceVzGwdKQaP9sE5wq8iD/yySD1PjlmzZos1CtCrqAIT/v2RK
pQdZCIkT8qCB+Jm03eEd4pwjEDnbZdQmpKt4cTy0HWIeLJVG1sXPNpgwPCaBEV4U
g0nctILtypChlSDmuGhTCyuElfMg6CXt4cgSZJTBikT+QcyWOm4=
=rfWK
-----END PGP SIGNATURE-----
Merge tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Make proc files report fips module name and version
Algorithms:
- Move generic SHA1 code into lib/crypto
- Implement Chinese Remainder Theorem for RSA
- Remove blake2s
- Add XCTR with x86/arm64 acceleration
- Add POLYVAL with x86/arm64 acceleration
- Add HCTR2
- Add ARIA
Drivers:
- Add support for new CCP/PSP device ID in ccp"
* tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (89 commits)
crypto: tcrypt - Remove the static variable initialisations to NULL
crypto: arm64/poly1305 - fix a read out-of-bound
crypto: hisilicon/zip - Use the bitmap API to allocate bitmaps
crypto: hisilicon/sec - fix auth key size error
crypto: ccree - Remove a useless dma_supported() call
crypto: ccp - Add support for new CCP/PSP device ID
crypto: inside-secure - Add missing MODULE_DEVICE_TABLE for of
crypto: hisilicon/hpre - don't use GFP_KERNEL to alloc mem during softirq
crypto: testmgr - some more fixes to RSA test vectors
cyrpto: powerpc/aes - delete the rebundant word "block" in comments
hwrng: via - Fix comment typo
crypto: twofish - Fix comment typo
crypto: rmd160 - fix Kconfig "its" grammar
crypto: keembay-ocs-ecc - Drop if with an always false condition
Documentation: qat: rewrite description
Documentation: qat: Use code block for qat sysfs example
crypto: lib - add module license to libsha1
crypto: lib - make the sha1 library optional
crypto: lib - move lib/sha1.c into lib/crypto/
crypto: fips - make proc files report fips module name and version
...
The double `without' is duplicated in the comment, remove one.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
It should be better to check pointer firstly, then assign it
to c->rules[]. Refine code a little bit.
Signed-off-by: Li Qiong <liqiong@nfschina.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm/MQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoaXD/9Nevo4KQmlG83ZcZfu2d51VlGtt6/Dl7LL
pr07RfnRFJcjeCPCwXCXmu6rrlY+inpfEWv9iCR/ImoeESOJCzm0dN/nlffO/zT1
E0h5AlEoDv2bYrCnVkbfvxL722TZqGeLiDE4YY1jVbuUfs3TDmLQzfGbORK+Zw4y
wPEMDZP1yWHoyeHUGWFasu6dpWiAwsZ4sTX0J631YwIBDNWKZqtienIiY15rK4dz
GioBea6voe8Fos0VEhCBOKXMmV9mG4yVOPeaDbTWTRfuzGNF8b7t2vg7mz+PrbBY
M8h1oEt+/+FnsCIZqfaEUzqHX6quv46OVtq/F5L3yNz/5QEsnqfv08ZFwD3sXdgZ
/RFxXamfcn/LoxzZ9eLu3MeyzpXp6frxBcgTNGc3q2TlIwXr1WsIx2N4PxZh00GM
ssW/ulaOZvZmOmDlbdeSC7sp3R1JmHO4qVlHowr58ce8pkishNTwlZZGr0sHyeNq
/Wkd9NQEQEFD6AIzZ/Mz9CsmzHeHYpy6GhicFrcLuU4YF/fnQ6T4hTjlIlucGv/S
IeqoAHrurCB0/p1ml6VfJ58xUWXNCCCkKC5+xu8Vm6/RgMlIw5KkzvVEBfflnomB
wVJLYsLw41gnlqqpwISR39I7cDV+s6xC5P8YAA/NLz692HDIUrRX14dlbZuXIgbc
ROeHB2N5+g==
=vSwm
-----END PGP SIGNATURE-----
Merge tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block
Pull io_uring zerocopy support from Jens Axboe:
"This adds support for efficient support for zerocopy sends through
io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and
UDP.
The core network changes to support this is in a stable branch from
Jakub that both io_uring and net-next has pulled in, and the io_uring
changes are layered on top of that.
All of the work has been done by Pavel"
* tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits)
io_uring: notification completion optimisation
io_uring: export req alloc from core
io_uring/net: use unsigned for flags
io_uring/net: make page accounting more consistent
io_uring/net: checks errors of zc mem accounting
io_uring/net: improve io_get_notif_slot types
selftests/io_uring: test zerocopy send
io_uring: enable managed frags with register buffers
io_uring: add zc notification flush requests
io_uring: rename IORING_OP_FILES_UPDATE
io_uring: flush notifiers after sendzc
io_uring: sendzc with fixed buffers
io_uring: allow to pass addr into sendzc
io_uring: account locked pages for non-fixed zc
io_uring: wire send zc request type
io_uring: add notification slot registration
io_uring: add rsrc referencing for notifiers
io_uring: complete notifiers in tw
io_uring: cache struct io_notif
io_uring: add zc notification infrastructure
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm5gQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpmKMD/4l3QIrLbjYIxlfrzQcHbmYuUkbQtj3SbZg
6ejbnGVhCs1P9DdXH8MgE2BxgpiXQE0CqOK7vbSoo5ep2n2UTLI2DIxAl74SMIo7
0wmJXtUJySuViKr3NYVHqlN180MkQYddBz0nGElhkQBPBCMhW8CrtPCeURr/YyHp
2RxSYBXiUx2gRyig+klnp6oPEqelcBZJUyNHdA9yVrgl/RhB/t2rKj7D++8ukQM3
Zuyh8WIkTeTfUz9hdGG7fuCEdZN4DlO2CCEc7uy0cKi6VRCKH4hYUCqClJ+/cfd2
43dUI2O7B6D1t/ObFh8AGIDXBDqVA6ePQohQU6gooRkfQiBPKkc9d0ts4yIhRqca
AjkzNM+0Eve3A01loJ8J84w8oZnvNpYEv5n8/sZVLWcyU3UIs0I88nC2OBiFtoRq
d77CtFLwOTo+r3STtAhnZOqez90rhS6BqKtqlUP346PCuFItl6/MbGtwdTbLYEFj
CVNIb2pERWSr2NxGv4lFyXaX/cRwruxojWH7yc3rRYjr4Ykevd1pe/fMGNiMAnKw
5em/3QU3qq0ZVcXLMihksKeHHFIQwGDRMuyuv/fktV10+yYXQ0t16WzkJT3aR8Xo
cqs0r8+6Jnj3uYcOMzj/FoLcpEPr21hnwAtzLto1mG1Wh4JRn/D7Nx5zqxPLxcW+
NiU6VihPOw==
=gxeV
-----END PGP SIGNATURE-----
Merge tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
- As per (valid) complaint in the last merge window, fs/io_uring.c has
grown quite large these days. io_uring isn't really tied to fs
either, as it supports a wide variety of functionality outside of
that.
Move the code to io_uring/ and split it into files that either
implement a specific request type, and split some code into helpers
as well. The code is organized a lot better like this, and io_uring.c
is now < 4K LOC (me).
- Deprecate the epoll_ctl opcode. It'll still work, just trigger a
warning once if used. If we don't get any complaints on this, and I
don't expect any, then we can fully remove it in a future release
(me).
- Improve the cancel hash locking (Hao)
- kbuf cleanups (Hao)
- Efficiency improvements to the task_work handling (Dylan, Pavel)
- Provided buffer improvements (Dylan)
- Add support for recv/recvmsg multishot support. This is similar to
the accept (or poll) support for have for multishot, where a single
SQE can trigger everytime data is received. For applications that
expect to do more than a few receives on an instantiated socket, this
greatly improves efficiency (Dylan).
- Efficiency improvements for poll handling (Pavel)
- Poll cancelation improvements (Pavel)
- Allow specifiying a range for direct descriptor allocations (Pavel)
- Cleanup the cqe32 handling (Pavel)
- Move io_uring types to greatly cleanup the tracing (Pavel)
- Tons of great code cleanups and improvements (Pavel)
- Add a way to do sync cancelations rather than through the sqe -> cqe
interface, as that's a lot easier to use for some use cases (me).
- Add support to IORING_OP_MSG_RING for sending direct descriptors to a
different ring. This avoids the usually problematic SCM case, as we
disallow those. (me)
- Make the per-command alloc cache we use for apoll generic, place
limits on it, and use it for netmsg as well (me).
- Various cleanups (me, Michal, Gustavo, Uros)
* tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block: (172 commits)
io_uring: ensure REQ_F_ISREG is set async offload
net: fix compat pointer in get_compat_msghdr()
io_uring: Don't require reinitable percpu_ref
io_uring: fix types in io_recvmsg_multishot_overflow
io_uring: Use atomic_long_try_cmpxchg in __io_account_mem
io_uring: support multishot in recvmsg
net: copy from user before calling __get_compat_msghdr
net: copy from user before calling __copy_msghdr
io_uring: support 0 length iov in buffer select in compat
io_uring: fix multishot ending when not polled
io_uring: add netmsg cache
io_uring: impose max limit on apoll cache
io_uring: add abstraction around apoll cache
io_uring: move apoll cache to poll.c
io_uring: consolidate hash_locked io-wq handling
io_uring: clear REQ_F_HASH_LOCKED on hash removal
io_uring: don't race double poll setting REQ_F_ASYNC_DATA
io_uring: don't miss setting REQ_F_DOUBLE_POLL
io_uring: disable multishot recvmsg
io_uring: only trace one of complete or overflow
...
Commit 2dec18ad82 forgets to call mutex_unlock() before the function
returns in the error path:
New smatch warnings:
net/core/devlink.c:6392 devlink_nl_cmd_region_new() warn: inconsistent \
returns '®ion->snapshot_lock'.
Make sure we call mutex_unlock() in this error path.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 2dec18ad82 ("net: devlink: remove region snapshots list dependency on devlink->lock")
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220801115742.1309329-1-ammar.faizi@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
destroy_workqueue() safely destroys the workqueue after draining it.
No need for the explicit call to flush_workqueue(). Remove it.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220801112444.26175-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the case of sk->dccps_qpolicy == DCCPQ_POLICY_PRIO, dccp_qpolicy_full
will drop a skb when qpolicy is full. And the lock in dccp_sendmsg is
released before sock_alloc_send_skb and then relocked after
sock_alloc_send_skb. The following conditions may lead dccp_qpolicy_push
to add skb to an already full sk_write_queue:
thread1--->lock
thread1--->dccp_qpolicy_full: queue is full. drop a skb
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_full: queue is not full. no need to drop.
thread2--->unlock
thread1--->lock
thread1--->dccp_qpolicy_push: add a skb. queue is full.
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_push: add a skb!
thread2--->unlock
Fix this by moving dccp_qpolicy_full.
Fixes: b1308dc015 ("[DCCP]: Set TX Queue Length Bounds via Sysctl")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20220729110027.40569-1-hbh25y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This will help debugging netdevice refcount problems with
CONFIG_NET_DEV_REFCNT_TRACKER=y
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bernard reported that trying to unload rose module would lead
to infamous messages:
unregistered_netdevice: waiting for rose0 to become free. Usage count = xx
This patch solves the issue, by making sure each socket referring to
a netdevice holds a reference count on it, and properly releases it
in rose_release().
rose_dev_first() is also fixed to take a device reference
before leaving the rcu_read_locked section.
Following patch will add ref_tracker annotations to ease
future bug hunting.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As the devlink_mutex was removed and all devlink instances are protected
individually by devlink->lock mutex, allow the netlink ops to run
in parallel and therefore allow user to execute commands on multiple
devlink instances simultaneously.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All accesses to devlink structure from userspace and drivers are locked
with devlink->lock instance mutex. Also, devlinks xa_array iteration is
taken care of by iteration helpers taking devlink reference.
Therefore, remove devlink_mutex as it is no longer needed.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert reload command to behave the same way as the rest of the
commands and let if be called with devlink->lock held. Remove the
temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK
flag is no longer used, remove it alongside.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new mark called "unregistering" to be set at the beginning of
devlink_unregister() function. Check this mark during devlinks
iteration in order to prevent getting a reference of devlink which is
being currently unregistered.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__udp_sysctl_init() is called for init_net via udp_sysctl_ops.
While at it, we can rename __udp_sysctl_init() to udp_sysctl_init().
Fixes: 1e80295158 ("udp: Move the udp sysctl to namespace.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If 'local_odp_mr->r_trans_private' is a error code,
it is better to print the error code than to print
the value of IS_ERR().
Signed-off-by: Li Qiong <liqiong@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220724191650.236b1355@rorschach.local.home
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Cc: netdev@vger.kernel.org
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Use "jiffies != now" to replace "jiffies - now > 0" to make
code more readable. We want to put a limit on how long the
loop can run for before rescheduling.
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Link: https://lore.kernel.org/r/20220729061712.22666-1-yuzhe@nfschina.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrii Nakryiko says:
====================
bpf-next 2022-07-29
We've added 22 non-merge commits during the last 4 day(s) which contain
a total of 27 files changed, 763 insertions(+), 120 deletions(-).
The main changes are:
1) Fixes to allow setting any source IP with bpf_skb_set_tunnel_key() helper,
from Paul Chaignon.
2) Fix for bpf_xdp_pointer() helper when doing sanity checking, from Joanne Koong.
3) Fix for XDP frame length calculation, from Lorenzo Bianconi.
4) Libbpf BPF_KSYSCALL docs improvements and fixes to selftests to accommodate
s390x quirks with socketcall(), from Ilya Leoshkevich.
5) Allow/denylist and CI configs additions to selftests/bpf to improve BPF CI,
from Daniel Müller.
6) BPF trampoline + ftrace follow up fixes, from Song Liu and Xu Kuohai.
7) Fix allocation warnings in netdevsim, from Jakub Kicinski.
8) bpf_obj_get_opts() libbpf API allowing to provide file flags, from Joe Burton.
9) vsnprintf usage fix in bpf_snprintf_btf(), from Fedor Tokarev.
10) Various small fixes and clean ups, from Daniel Müller, Rongguang Wei,
Jörn-Thorben Hinz, Yang Li.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (22 commits)
bpf: Remove unneeded semicolon
libbpf: Add bpf_obj_get_opts()
netdevsim: Avoid allocation warnings triggered from user space
bpf: Fix NULL pointer dereference when registering bpf trampoline
bpf: Fix test_progs -j error with fentry/fexit tests
selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout
bpftool: Don't try to return value from void function in skeleton
bpftool: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE macro
bpf: btf: Fix vsnprintf return value check
libbpf: Support PPC in arch_specific_syscall_pfx
selftests/bpf: Adjust vmtest.sh to use local kernel configuration
selftests/bpf: Copy over libbpf configs
selftests/bpf: Sort configuration
selftests/bpf: Attach to socketcall() in test_probe_user
libbpf: Extend BPF_KSYSCALL documentation
bpf, devmap: Compute proper xdp_frame len redirecting frames
bpf: Fix bpf_xdp_pointer return pointer
selftests/bpf: Don't assign outer source IP to host
bpf: Set flow flag to allow any source IP in bpf_tunnel_key
geneve: Use ip_tunnel_key flow flags in route lookups
...
====================
Link: https://lore.kernel.org/r/20220729230948.1313527-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Record not only the number of pages requested, but the number of
pages that were actually allocated, to get a measure of progress
(or lack thereof).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The SRv6 H.L2Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.L2Encaps behavior [2].
H.L2Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.
[1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.4
[2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.3
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SRv6 H.Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.Encaps behavior [2].
H.Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.
[1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.2
[2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.1
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
User can use AF_PACKET socket to send packets with the length of 0.
When min_header_len equals to 0, packet_snd will call __dev_queue_xmit
to send packets, and sock->type can be any type.
Reported-by: syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com
Fixes: fd18942244 ("bpf: Don't redirect packets with invalid pkt_len")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let the core take the devlink instance lock around health callbacks and
remove the now redundant locking in the drivers.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After mlx4 driver is converted to do locked reload,
devlink_region_snapshot_create() may be called from both locked and
unlocked context.
Note that in mlx4 region snapshots could be created on any command
failure. That can happen in any flow that involves commands to FW,
which means most of the driver flows.
So resolve this by removing dependency on devlink->lock for region
snapshots list consistency and introduce new mutex to ensure it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After mlx4 driver is converted to do locked reload, functions to get/put
regions snapshot ID may be called from both locked and unlocked context.
So resolve this by removing dependency on devlink->lock for region
snapshot ID tracking by using internal xa_lock() to maintain
shapshot_ids xa_array consistency.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a framework for running selftests.
Framework exposes devlink commands and test suite(s) to the user
to execute and query the supported tests by the driver.
Below are new entries in devlink_nl_ops
devlink_nl_cmd_selftests_show_doit/dumpit: To query the supported
selftests by the drivers.
devlink_nl_cmd_selftests_run: To execute selftests. Users can
provide a test mask for executing group tests or standalone tests.
Documentation/networking/devlink/ path is already part of MAINTAINERS &
the new files come under this path. Hence no update needed to the
MAINTAINERS
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Multiple TLS device-offloaded contexts can be added in parallel via
concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
sequential in tls_device_gc_task.
This is not a sustainable behavior. This creates a rate gap between add
and del operations (addition rate outperforms the deletion rate). When
running for enough time, the TLS device resources could get exhausted,
failing to offload new connections.
Replace the single-threaded garbage collector work with a per-context
alternative, so they can be handled on several cores in parallel. Use
a new dedicated destruct workqueue for this.
Tested with mlx5 device:
Before: 22141 add/sec, 103 del/sec
After: 11684 add/sec, 11684 del/sec
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TLS context destructor can be run in atomic context. Cleanup operations
for device-offloaded contexts could require access and interaction with
the device callbacks, which might sleep. Hence, the cleanup of such
contexts must be deferred and completed inside an async work.
For all others, this is not necessary, as cleanup is atomic. Invoke
cleanup immediately for them, avoiding queueing redundant gc work.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The return from the call to tls_rx_msg_size() is int, it can be
a negative error code, however this is being assigned to an
unsigned long variable 'sz', so making 'sz' an int.
Eliminate the following coccicheck warning:
./net/tls/tls_strp.c:211:6-8: WARNING: Unsigned expression compared with zero: sz < 0
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220728031019.32838-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I went too far in the accessor conversion, we can't use tls_strp_msg()
after decryption because the message may not be ready. What we care
about on this path is that the output skb is detached, i.e. we didn't
somehow just turn around and used the input skb with its TCP data
still attached. So look at the anchor directly.
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo points out that there seems to be no strong reason strparser
users a single threaded workqueue. Perhaps there were some performance
or pinning considerations? Since we don't know (and it's the slow path)
let's default to the most natural, multi-threaded choice.
Also rename the workqueue to "tls-".
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric indicates that restarting rcvtimeo on every wait may be fine.
I thought that we should consider it cumulative, and made
tls_rx_reader_lock() return the remaining timeo after acquiring
the reader lock.
tls_rx_rec_wait() gets its timeout passed in by value so it
does not keep track of time previously spent.
Make the lock waiting consistent with tls_rx_rec_wait() - don't
keep track of time spent.
Read the timeo fresh in tls_rx_rec_wait().
It's unclear to me why callers are supposed to cache the value.
Link: https://lore.kernel.org/all/CANn89iKcmSfWgvZjzNGbsrndmCch2HC_EPZ7qmGboDNaWoviNQ@mail.gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy(). As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
struct ipv6_sr_hdr *hdr;
char data[24] = {0};
int fd;
hdr = (struct ipv6_sr_hdr *)data;
hdr->hdrlen = 2;
hdr->type = IPV6_SRCRT_TYPE_4;
fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
close(fd);
To fix memory leaks, let's add a destroy function.
Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range. The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0). Thus, the local DoS does
not succeed until we change the default value. However, at least
Ubuntu/Fedora/RHEL loosen it.
$ cat /usr/lib/sysctl.d/50-default.conf
...
-net.ipv4.ping_group_range = 0 2147483647
Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.
setsockopt
IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
IPV6_HOPOPTS (inet6_sk(sk)->opt)
IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
IPV6_RTHDR (inet6_sk(sk)->opt)
IPV6_DSTOPTS (inet6_sk(sk)->opt)
IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)
getsockopt
IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)
For the record, I left a different splat with syzbot's one.
unreferenced object 0xffff888006270c60 (size 96):
comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
hex dump (first 32 bytes):
01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
[<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
[<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
[<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
[<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
[<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176
Fixes: 6d0bfe2261 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
ice: PPPoE offload support
Marcin Szycik says:
Add support for dissecting PPPoE and PPP-specific fields in flow dissector:
PPPoE session id and PPP protocol type. Add support for those fields in
tc-flower and support offloading PPPoE. Finally, add support for hardware
offload of PPPoE packets in switchdev mode in ice driver.
Example filter:
tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \
1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR
Changes in iproute2 are required to use the new fields (will be submitted
soon).
ICE COMMS DDP package is required to create a filter in ice.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: Add support for PPPoE hardware offload
flow_offload: Introduce flow_match_pppoe
net/sched: flower: Add PPPoE filter
flow_dissector: Add PPPoE dissectors
====================
Link: https://lore.kernel.org/r/20220726203133.2171332-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The net_eq() check is already performed inside
devlinks_xa_for_each_registered_get() helper, so remove the redundant
appearance.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220727055912.568391-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A NULL pointer dereference was reported by Wei Chen:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:__list_del_entry_valid+0x26/0x80
Call Trace:
<TASK>
sctp_sched_dequeue_common+0x1c/0x90
sctp_sched_prio_dequeue+0x67/0x80
__sctp_outq_teardown+0x299/0x380
sctp_outq_free+0x15/0x20
sctp_association_free+0xc3/0x440
sctp_do_sm+0x1ca7/0x2210
sctp_assoc_bh_rcv+0x1f6/0x340
This happens when calling sctp_sendmsg without connecting to server first.
In this case, a data chunk already queues up in send queue of client side
when processing the INIT_ACK from server in sctp_process_init() where it
calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in
all stream_out will be freed in sctp_stream_init's err path. Then in the
asoc freeing it will crash when dequeuing this data chunk as stream_out
is missing.
As we can't free stream out before dequeuing all data from send queue, and
this patch is to fix it by moving the err path stream_out/in freeing in
sctp_stream_init() to sctp_stream_free() which is eventually called when
freeing the asoc in sctp_association_free(). This fix also makes the code
in sctp_process_init() more clear.
Note that in sctp_association_init() when it fails in sctp_stream_init(),
sctp_association_free() will not be called, and in that case it should
go to 'stream_free' err path to free stream instead of 'fail_init'.
Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a request is re-encoded and then retransmitted, we need to make sure
that we also re-encode the bvec, in case the page lists have changed.
Fixes: ff053dbbaf ("SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
After the blamed commit, IPv4 SYN packets handled
by a dual stack IPv6 socket are dropped, even if
perfectly valid.
$ nstat | grep MD5
TcpExtTCPMD5Failure 5 0.0
For a dual stack listener, an incoming IPv4 SYN packet
would call tcp_inbound_md5_hash() with @family == AF_INET,
while tp->af_specific is pointing to tcp_sock_ipv6_specific.
Only later when an IPv4-mapped child is created, tp->af_specific
is changed to tcp_sock_ipv6_mapped_specific.
Fixes: 7bbb765b73 ("net/tcp: Merge TCP-MD5 inbound callbacks")
Reported-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Tested-by: Leonard Crestez <cdleonard@gmail.com>
Link: https://lore.kernel.org/r/20220726115743.2759832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When we're reusing the backchannel requests instead of freeing them,
then we should reinitialise any values of the send/receive xdr_bufs so
that they reflect the available space.
Fixes: 0d2a970d0a ("SUNRPC: Fix a backchannel race")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Previously, the smc and smc_diag modules were automatically loaded as
dependencies of the ism module whenever an ISM device was present.
With the pending rework of the ISM API, the smc module will no longer
automatically be loaded in presence of an ISM device. Usage of an AF_SMC
socket will still trigger loading of the smc modules, but usage of a
netlink socket will not.
This is addressed by setting the correct module aliases.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the DMBE bits, which are passed on individually in ism_move() as
parameter idx, available to the receiver.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reworked signature of the function to retrieve the system EID: No plausible
reason to use a double pointer. And neither to pass in the device as an
argument, as this identifier is by definition per system, not per device.
Plus some minor consistency edits.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This struct is used in a single place only, and its usage generates
inefficient code. Time to clean up!
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-and-tested-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One rcu_read_unlock() should have been removed in blamed commit.
Fixes: 9b1c21d898 ("ip6mr: do not acquire mrt_lock while calling ip6_mr_forward()")
Reported-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220725200554.2563581-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
New subflows are created within the kernel using O_NONBLOCK, so
EINPROGRESS is the expected return value from kernel_connect().
__mptcp_subflow_connect() has the correct logic to consider EINPROGRESS
to be a successful case, but it has also used that error code as its
return value.
Before v5.19 this was benign: all the callers ignored the return
value. Starting in v5.19 there is a MPTCP_PM_CMD_SUBFLOW_CREATE generic
netlink command that does use the return value, so the EINPROGRESS gets
propagated to userspace.
Make __mptcp_subflow_connect() always return 0 on success instead.
Fixes: ec3edaa7ca ("mptcp: Add handling of outgoing MP_JOIN requests")
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20220725205231.87529-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal says:
====================
netfilter updates for net
Three late fixes for netfilter:
1) If nf_queue user requests packet truncation below size of l3 header,
we corrupt the skb, then crash. Reject such requests.
2) add cond_resched() calls when doing cycle detection in the
nf_tables graph. This avoids softlockup warning with certain
rulesets.
3) Reject rulesets that use nftables 'queue' expression in family/chain
combinations other than those that are supported. Currently the ruleset
will load, but when userspace attempts to reinject you get WARN splat +
packet drops.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_queue: only allow supported familes and hooks
netfilter: nf_tables: add rescheduling points during loop detection walks
netfilter: nf_queue: do not allow packet truncation below transport header offset
====================
Link: https://lore.kernel.org/r/20220726192056.13497-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Fix early wakeup after suspend
- Fix double free on error
- Fix use-after-free on l2cap_chan_put
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmLgXYkZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKd4AD/9RgobBI3tUfpKe0BXhju2p
pjx/oL1SSqfF5kOvHNn/37dZ6tXQlqTl4yM0pD36Gos+UX1n1zyaBxm8teEhHx7h
/yLiWW8cGmae1+OwrfL63uqjRnwJUNjp4EPHZNWczsCrSIr+RqYkWomyFzcXWcsi
bjh2USvURKQZqeSDLMvEWI1H3Xjcu9q3atE5sXvJpUmthqraI5ewBYTLUMmvIM3J
sm/wcA7cIl1HEOE21h6L+irMN33wtfjKADoYYV6BlWdAzj6xMmr/yxCbBS3jNNTW
XaZLwJTp11YxeDu+ZSvEEIe5Q05DDY4qBMxWZxIFaBhxNWO+sDexHGjYonGvlL6M
OjuLMdyqfDOCFuWYeuBAcQA9TfA/ibA7WC9OeEf7aZONI3ZPeX+71VXlAlz9Z8ly
/DfV8keY7FIhoKrxJRgJa94hS/o4/zrIeP3Gylw4itbooLX0bKbrg9YBvgmldMT6
054vdN8bXdfBi4ooWkzzCyZ86I/PRS3mtenJj+fK9iYaKZGRB4t3bOYo/7Gabb62
VqrivooaVSL9A3bBRB3WO+5igf2o5gz03nAQjm8G1X/2gBtXz8wU3nZOrAsGa2W8
3+dmx4BJ8EgUE4IDmbZIvFZwJaT7njjHmnDYpTlbl17Evquqjv8NooF5x0wTaebV
cGhaI9RTPdLVtbDR0jXj4Q==
=4Cuo
-----END PGP SIGNATURE-----
Merge tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix early wakeup after suspend
- Fix double free on error
- Fix use-after-free on l2cap_chan_put
* tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Bluetooth: Always set event mask on suspend
Bluetooth: mgmt: Fix double free on error path
====================
Link: https://lore.kernel.org/r/20220726221328.423714-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TLS is a relatively poor fit for strparser. We pause the input
every time a message is received, wait for a read which will
decrypt the message, start the parser, repeat. strparser is
built to delineate the messages, wrap them in individual skbs
and let them float off into the stack or a different socket.
TLS wants the data pages and nothing else. There's no need
for TLS to keep cloning (and occasionally skb_unclone()'ing)
the TCP rx queue.
This patch uses a pre-allocated skb and attaches the skbs
from the TCP rx queue to it as frags. TLS is careful never
to modify the input skb without CoW'ing / detaching it first.
Since we call TCP rx queue cleanup directly we also get back
the benefit of skb deferred free.
Overall this results in a 6% gain in my benchmarks.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wrap the remaining skb_cow_data() into a helper, so it's easier
to replace down the lane. The new version will change the skb
so make sure relevant pointers get reloaded after the call.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expose TCP rx queue accessor and cleanup, so that TLS can
decrypt directly from the TCP queue. The expectation
is that the caller can access the skb returned from
tcp_recv_skb() and up to inq bytes worth of data (some
of which may be in ->next skbs) and then call
tcp_read_done() when data has been consumed.
The socket lock must be held continuously across
those two operations.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The non-zero-copy path assumes a full skb with decrypted contents.
This means the device offload would have to CoW the data. Try
to keep the zero-copy status instead, copy the data to user space.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the future we'll want to reuse the input skb in case of
zero-copy so we shouldn't always free darg.skb. Move the
freeing of darg.skb into the non-zc cases. All cases will
now free ctx->recv_pkt (inside let tls_rx_rec_done()).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After recent changes the SW side of tls_rx_one_record() can
be nicely encapsulated in its own function. Move the pad handling
as well. This will be useful for ->zc handling in tls_decrypt_device().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To allow for the logic to change later wrap accesses
which interrogate the input skb in helper functions.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The issue happens on an error path in __xfrm_policy_check(). When the
fetching process of the object `pols[1]` fails, the function simply
returns 0, forgetting to decrement the reference count of `pols[0]`,
which is incremented earlier by either xfrm_sk_policy_lookup() or
xfrm_policy_lookup(). This may result in memory leaks.
Fix it by decreasing the reference count of `pols[0]` in that path.
Fixes: 134b0fc544 ("IPsec: propagate security module errors up from flow_cache_lookup")
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
For the purpose of exposing device info and allow flash update which is
going to be implemented in follow-up patches, introduce a possibility
for a line card to expose relation to nested devlink entity. The nested
devlink entity represents the line card.
Example:
$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
supported_types:
16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Benefit from having devlinks iterator helper
devlinks_xa_for_each_registered_get() and move the net pointer
check inside.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove dependency on devlink_mutex during devlinks xarray iteration.
The reason is that devlink_register/unregister() functions taking
devlink_mutex would deadlock during devlink reload operation of devlink
instance which registers/unregisters nested devlink instances.
The devlinks xarray consistency is ensured internally by xarray.
There is a reference taken when working with devlink using
devlink_try_get(). But there is no guarantee that devlink pointer
picked during xarray iteration is not freed before devlink_try_get()
is called.
Make sure that devlink_try_get() works with valid pointer.
Achieve it by:
1) Splitting devlink_put() so the completion is sent only
after grace period. Completion unblocks the devlink_unregister()
routine, which is followed-up by devlink_free()
2) During devlinks xa_array iteration, get devlink pointer from xa_array
holding RCU read lock and taking reference using devlink_try_get()
before unlock.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This fixes the following trace which is caused by hci_rx_work starting up
*after* the final channel reference has been put() during sock_close() but
*before* the references to the channel have been destroyed, so instead
the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.
refcount_t: increment on 0; use-after-free.
BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W
4.14.234-00003-g1fb6d0bd49a4-dirty #28
Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
Google Inc. MSM sm8150 Flame DVT (DT)
Workqueue: hci0 hci_rx_work
Call trace:
dump_backtrace+0x0/0x378
show_stack+0x20/0x2c
dump_stack+0x124/0x148
print_address_description+0x80/0x2e8
__kasan_report+0x168/0x188
kasan_report+0x10/0x18
__asan_load4+0x84/0x8c
refcount_dec_and_test+0x20/0xd0
l2cap_chan_put+0x48/0x12c
l2cap_recv_frame+0x4770/0x6550
l2cap_recv_acldata+0x44c/0x7a4
hci_acldata_packet+0x100/0x188
hci_rx_work+0x178/0x23c
process_one_work+0x35c/0x95c
worker_thread+0x4cc/0x960
kthread+0x1a8/0x1c4
ret_from_fork+0x10/0x18
Cc: stable@kernel.org
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When suspending, always set the event mask once disconnects are
successful. Otherwise, if wakeup is disallowed, the event mask is not
set before suspend continues and can result in an early wakeup.
Fixes: 182ee45da0 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Cc: stable@vger.kernel.org
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
for commit f856373e2f ("wifi: mac80211: do not wake queues on a vif
that is being stopped") guards clear_bit() using fq.lock even before
fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.
According to discussion [2], Toke was not happy with expanding usage of
fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().
Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: f856373e2f ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp
[ pick up commit 3598cb6e18 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next]
Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Trying to use 'queue' statement in ingress (for example)
triggers a splat on reinject:
WARNING: CPU: 3 PID: 1345 at net/netfilter/nf_queue.c:291
... because nf_reinject cannot find the ruleset head.
The netdev family doesn't support async resume at the moment anyway,
so disallow loading such rulesets with a more appropriate
error message.
v2: add 'validate' callback and also check hook points, v1 did
allow ingress use in 'table inet', but that doesn't work either. (Pablo)
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add explicit rescheduling points during ruleset walk.
Switching to a faster algorithm is possible but this is a much
smaller change, suitable for nf tree.
Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Domingo Dirutigliano and Nicola Guerrera report kernel panic when
sending nf_queue verdict with 1-byte nfta_payload attribute.
The IP/IPv6 stack pulls the IP(v6) header from the packet after the
input hook.
If user truncates the packet below the header size, this skb_pull() will
result in a malformed skb (skb->len < 0).
Fixes: 7af4cc3fa1 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink")
Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allow to offload PPPoE filters by adding flow_rule_match_pppoe.
Drivers can extract PPPoE specific fields from now on.
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support for PPPoE specific fields for tc-flower.
Those fields can be provided only when protocol was set
to ETH_P_PPP_SES. Defines, dump, load and set are being done here.
Overwrite basic.n_proto only in case of PPP_IP and PPP_IPV6,
otherwise leave it as ETH_P_PPP_SES.
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Allow to dissect PPPoE specific fields which are:
- session ID (16 bits)
- ppp protocol (16 bits)
- type (16 bits) - this is PPPoE ethertype, for now only
ETH_P_PPP_SES is supported, possible ETH_P_PPP_DISC
in the future
The goal is to make the following TC command possible:
# tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \
flower \
pppoe_sid 12 \
ppp_proto ip \
action drop
Note that only PPPoE Session is supported.
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After commit b6c02ef549 ("bridge: Netlink interface fix."),
br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a
bridge vlan dump is requested but an interface does not have any vlans
configured.
iproute2 ignores such an empty attribute since commit b262a9becbcb
("bridge: Fix output with empty vlan lists") but older iproute2 versions as
well as other utilities have their output changed by the cited kernel
commit, resulting in failed test cases. Regardless, emitting an empty
attribute is pointless and inefficient.
Avoid this change by canceling the attribute if no AF_SPEC data was added.
Fixes: b6c02ef549 ("bridge: Netlink interface fix.")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20220725001236.95062-1-bpoirier@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For the case where offset + len == size, bpf_xdp_pointer should return a
valid pointer to the addr because that access is permitted. We should
only return NULL in the case where offset + len exceeds size.
Fixes: 3f364222d0 ("net: xdp: introduce bpf_xdp_pointer utility routine")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/bpf/20220722220105.2065466-1-joannelkoong@gmail.com
Commit 26101f5ab6 ("bpf: Add source ip in "struct bpf_tunnel_key"")
added support for getting and setting the outer source IP of encapsulated
packets via the bpf_skb_{get,set}_tunnel_key BPF helper. This change
allows BPF programs to set any IP address as the source, including for
example the IP address of a container running on the same host.
In that last case, however, the encapsulated packets are dropped when
looking up the route because the source IP address isn't assigned to any
interface on the host. To avoid this, we need to set the
FLOWI_FLAG_ANYSRC flag.
Fixes: 26101f5ab6 ("bpf: Add source ip in "struct bpf_tunnel_key"")
Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/76873d384e21288abe5767551a0799ac93ec07fb.1658759380.git.paul@isovalent.com
There are sleep in atomic context bugs in timer handlers of sctp
such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(),
sctp_generate_t1_init_event(), sctp_generate_timeout_event(),
sctp_generate_t3_rtx_event() and so on.
The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter
that may sleep could be called by different timer handlers which is in
interrupt context.
One of the call paths that could trigger bug is shown below:
(interrupt context)
sctp_generate_probe_event
sctp_do_sm
sctp_side_effects
sctp_cmd_interpreter
sctp_outq_teardown
sctp_outq_init
sctp_sched_set_sched
n->init_sid(..,GFP_KERNEL)
sctp_sched_prio_init_sid //may sleep
This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched()
from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic
context bugs.
Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
delete extra space and tab in blank line, there is no functional change.
Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@gmail.com>
Link: https://lore.kernel.org/r/20220723073222.2961602-1-williamsukatube@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Due to an invalid conflict resolution on my side while working on 2
different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add()
does not store the database associated with a dsa_mac_addr structure.
So after adding an FDB entry associated with a LAG, dsa_mac_addr_find()
fails to find it while deleting it, because &a->db is zeroized memory
for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del()
returns -ENOENT rather than deleting the entry.
Fixes: c26933639b ("net: dsa: request drivers to perform FDB isolation")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220723012411.1125066-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Third set of patches for v5.20. MLO work continues and we have a lot
of stack changes due to that, including driver API changes. Not much
driver patches except on mt76.
Major changes:
cfg80211/mac80211
* more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support,
works with one link now
* align with IEEE Draft P802.11be_D2.0
* hardware timestamps for receive and transmit
mt76
* preparation for new chipset support
* ACPI SAR support
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmLe1k4RHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZvxlQf8DrZIllhF0q/7Wry3JuG0gbNA+V2eI/lc
OYrephsDBm/dvvyjcFWcdUzxoNk0k1+aOrx/09JijHFgCGKVwuK1+hxYVfjW2q43
9mHxJBo4NcMk1RDDM3paVuZ8QMHuYugbv2mQOZeAEq2XloAaqEM7wVE+bb4Mgtgx
VAKS5du2igrSt83wl8BRMFb9MPAM1EQ3Cw7Ro5T4y+1Qm/hrBm6qWizSpqh9CXYx
pDLR3pvQxiD4Axa0Uq3rUbyF4hLwciqSFOJvr2sI3q7b9YElA7wIi6NQzMkYJH6Z
7HW5K6UIQbblAaQkv2BLqpU1N6puTHUOAf5Md31vOAaOcGbSI5hbUA==
=Cnxg
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v5.20
Third set of patches for v5.20. MLO work continues and we have a lot
of stack changes due to that, including driver API changes. Not much
driver patches except on mt76.
Major changes:
cfg80211/mac80211
- more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support,
works with one link now
- align with IEEE Draft P802.11be_D2.0
- hardware timestamps for receive and transmit
mt76
- preparation for new chipset support
- ACPI SAR support
* tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (254 commits)
wifi: mac80211: fix link data leak
wifi: mac80211: mlme: fix disassoc with MLO
wifi: mac80211: add macros to loop over active links
wifi: mac80211: remove erroneous sband/link validation
wifi: mac80211: mlme: transmit assoc frame with address translation
wifi: mac80211: verify link addresses are different
wifi: mac80211: rx: track link in RX data
wifi: mac80211: optionally implement MLO multicast TX
wifi: mac80211: expand ieee80211_mgmt_tx() for MLO
wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API
wifi: mac80211: report link ID to cfg80211 on mgmt RX
wifi: cfg80211: report link ID in NL80211_CMD_FRAME
wifi: mac80211: add hardware timestamps for RX and TX
wifi: cfg80211: add hardware timestamps to frame RX info
wifi: cfg80211/nl80211: move rx management data into a struct
wifi: cfg80211: add a function for reporting TX status with hardware timestamps
wifi: nl80211: add RX and TX timestamp attributes
wifi: ieee80211: add helper functions for detecting TM/FTM frames
wifi: mac80211_hwsim: handle links for wmediumd/virtio
wifi: mac80211: sta_info: fix link_sta insertion
...
====================
Link: https://lore.kernel.org/r/20220725174547.EA465C341C6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For only offline transports, attempt to check connectivity via
a NULL call and, if that succeeds, call a provided session trunking
detection function.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Make xprt_iter_rewind callable outside of xprtmultipath.c
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In preparation for code re-use, pull out the part of the
rpc_clnt_setup_test_and_add_xprt() portion that sends a NULL rpc
and then calls a session trunking function into a helper function.
Re-organize the end of the function for code re-use.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Expose a function that allows a removal of xprt from the rpc_clnt.
When called from NFS that's running a trunked transport then don't
decrement the active transport counter.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When we are adding a transport to a xprt_switch that's already on
the list but has been marked OFFLINE, then make the state ONLINE
since it's been tested now.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Create a new iterator helper that will go thru the all the transports
in the switch and return transports that are marked OFFLINE.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Iterate thru available transports in the xprt_switch for all
trunkable transports offline and possibly remote them as well.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Re-arrange the code that make offline transport and delete transport
callable functions.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2022-07-20
1) Don't set DST_NOPOLICY in IPv4, a recent patch made this
superfluous. From Eyal Birger.
2) Convert alg_key to flexible array member to avoid an iproute2
compile warning when built with gcc-12.
From Stephen Hemminger.
3) xfrm_register_km and xfrm_unregister_km do always return 0
so change the type to void. From Zhengchao Shao.
4) Fix spelling mistake in esp6.c
From Zhang Jiaming.
5) Improve the wording of comment above XFRM_OFFLOAD flags.
From Petr Vaněk.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_fib_notify_on_flag_change, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 680aea08e7 ("net: ipv4: Emit notification when fib hardware flags are changed")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_tcp_reflect_tos, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: ac8f1710c1 ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 9c21d2fc41 ("tcp: add tcp_comp_sack_nr sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_tcp_comp_sack_slack_ns, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: a70437cc09 ("tcp: add hrtimer slack to sack compression")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_tcp_comp_sack_delay_ns, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 6d82aa2420 ("tcp: add tcp_comp_sack_delay_ns sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading these sysctl variables, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.
- .sysctl_rmem
- .sysctl_rwmem
- .sysctl_rmem_offset
- .sysctl_wmem_offset
- sysctl_tcp_rmem[1, 2]
- sysctl_tcp_wmem[1, 2]
- sysctl_decnet_rmem[1]
- sysctl_decnet_wmem[1]
- sysctl_tipc_rmem[1]
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reading sysctl_tcp_pacing_(ss|ca)_ratio, they can be changed
concurrently. Thus, we need to add READ_ONCE() to their readers.
Fixes: 43e122b014 ("tcp: refine pacing rate determination")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mld_{query | report}_work() processes queued events.
If there are too many events in the queue, it re-queue a work.
And then, it returns without in6_dev_put().
But if queuing is failed, it should call in6_dev_put(), but it doesn't.
So, a reference count leak would occur.
THREAD0 THREAD1
mld_report_work()
spin_lock_bh()
if (!mod_delayed_work())
in6_dev_hold();
spin_unlock_bh()
spin_lock_bh()
schedule_delayed_work()
spin_unlock_bh()
Script to reproduce(by Hangbin Liu):
ip netns add ns1
ip netns add ns2
ip netns exec ns1 sysctl -w net.ipv6.conf.all.force_mld_version=1
ip netns exec ns2 sysctl -w net.ipv6.conf.all.force_mld_version=1
ip -n ns1 link add veth0 type veth peer name veth0 netns ns2
ip -n ns1 link set veth0 up
ip -n ns2 link set veth0 up
for i in `seq 50`; do
for j in `seq 100`; do
ip -n ns1 addr add 2021:${i}::${j}/64 dev veth0
ip -n ns2 addr add 2022:${i}::${j}/64 dev veth0
done
done
modprobe -r veth
ip -a netns del
splat looks like:
unregister_netdevice: waiting for veth0 to become free. Usage count = 2
leaked reference.
ipv6_add_dev+0x324/0xec0
addrconf_notify+0x481/0xd10
raw_notifier_call_chain+0xe3/0x120
call_netdevice_notifiers+0x106/0x160
register_netdevice+0x114c/0x16b0
veth_newlink+0x48b/0xa50 [veth]
rtnl_newlink+0x11a2/0x1a40
rtnetlink_rcv_msg+0x63f/0xc00
netlink_rcv_skb+0x1df/0x3e0
netlink_unicast+0x5de/0x850
netlink_sendmsg+0x6c9/0xa90
____sys_sendmsg+0x76a/0x780
__sys_sendmsg+0x27c/0x340
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: f185de28d9 ("mld: add new workqueues for process mld events")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* for-5.20/io_uring: (716 commits)
io_uring: ensure REQ_F_ISREG is set async offload
net: fix compat pointer in get_compat_msghdr()
io_uring: Don't require reinitable percpu_ref
io_uring: fix types in io_recvmsg_multishot_overflow
io_uring: Use atomic_long_try_cmpxchg in __io_account_mem
io_uring: support multishot in recvmsg
net: copy from user before calling __get_compat_msghdr
net: copy from user before calling __copy_msghdr
io_uring: support 0 length iov in buffer select in compat
io_uring: fix multishot ending when not polled
io_uring: add netmsg cache
io_uring: impose max limit on apoll cache
io_uring: add abstraction around apoll cache
io_uring: move apoll cache to poll.c
io_uring: consolidate hash_locked io-wq handling
io_uring: clear REQ_F_HASH_LOCKED on hash removal
io_uring: don't race double poll setting REQ_F_ASYNC_DATA
io_uring: don't miss setting REQ_F_DOUBLE_POLL
io_uring: disable multishot recvmsg
io_uring: only trace one of complete or overflow
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A previous change enabled external users to copy the data before
calling __get_compat_msghdr(), but didn't modify get_compat_msghdr() or
__io_compat_recvmsg_copy_hdr() to take that into account. They are both
stil passing in the __user pointer rather than the copied version.
Ensure we pass in the kernel struct, not the pointer to the user data.
Link: https://lore.kernel.org/all/46439555-644d-08a1-7d66-16f8f9a320f0@samsung.com/
Fixes: 1a3e4e94a1b9 ("net: copy from user before calling __get_compat_msghdr")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.
functionally this should be a no-op.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.
functionally this should be a no-op.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224751.271015450@goodmis.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.
Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.
Fixes: 3740651bf7 ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These functions are no longer needed now that the NFS client places data
and hole segments directly.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This will be used during READ_PLUS decoding for handling HOLE segments.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We need to do this step during READ_PLUS decoding so that we know pages
are the right length and any extra data has been preserved in the tail.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
I do this by creating an xdr subsegment for the range we will be
operating over. This lets me shift data to the correct place without
potentially overwriting anything already there.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
A bad verifier is not a garbage argument, it's an authentication
failure. Retrying it doesn't make the problem go away, and delays
upper layer recovery steps.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
- Add support for IM Networks PID 0x3568
- Add support for BCM4349B1
- Add support for CYW55572
- Add support for MT7922 VID/PID 0489/e0e2
- Add support for Realtek RTL8852C
- Initial support for Isochronous Channels/ISO sockets
- Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmLbPmsZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKSo7EACc8njgHO2pN8ncWvgu/gH8
0v1lRBoi+Tyzk5gZtdM0rIbE3t7tFqml3Kr0WsCkzV6CGgnqCw5i/MKZXCV8G4tG
0ZsY8y6NMiCFR6wQq3rdNS8NqPmlCHWm6yY2EISEM6qbtF8HwxQvXdkzznZPHgVG
DR7i5fAVuGA6rIL+9NSG/TxHjJvq6Bmu3v9Uu/V062I7NrBMw9Jr0Ic1EaUqgYck
QL8+653ZZMxYPxt978UekbQIYEp3YwZ5MTACtX86j2s5tlZKuivKTIZch1vSaOi1
1zC6up208+p2/4+Yq7FJ2kA6d2be3FD26oT1xymRhqiMakCRrHfdmTFpC7/J4ZgX
/4mcIREkUoO2duim+91Hgt1Ww1vaD4joPwXD6AILbK1bdp0pi0gw47bQF8XO8uIh
yPQqnoGWSJGD1VknPh5x7lGcAYQ3bgSg0L3TlQ4gN9Qc/emuC6UOQ9QPvwmyOilG
ZrDn2p1Rpsoj8vVRTv6+CgKqLokXNUTPixCAaS4AIygRzhIzwReYYNMYUZgYMrTk
Qf6bKqczEut1vq8NZiN3TTxLLOWHB5+3cdl/QRv4ZeoNXMXPmbtJ0y9Vud66GhZu
vS6F+BNQapIkEbM6xrC/OepCgzrVoVn2BQuDO6SQA3pg4JxFeAl+V434Va3tqVeK
9h6GHrl8KePjh528NXbH3Q==
=Ne87
-----END PGP SIGNATURE-----
Merge tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- Add support for IM Networks PID 0x3568
- Add support for BCM4349B1
- Add support for CYW55572
- Add support for MT7922 VID/PID 0489/e0e2
- Add support for Realtek RTL8852C
- Initial support for Isochronous Channels/ISO sockets
- Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk
* tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (58 commits)
Bluetooth: btusb: Detect if an ACL packet is in fact an ISO packet
Bluetooth: btusb: Add support for ISO packets
Bluetooth: ISO: Add broadcast support
Bluetooth: Add initial implementation of BIS connections
Bluetooth: Add BTPROTO_ISO socket type
Bluetooth: Add initial implementation of CIS connections
Bluetooth: hci_core: Introduce hci_recv_event_data
Bluetooth: Convert delayed discov_off to hci_sync
Bluetooth: Remove update_scan hci_request dependancy
Bluetooth: Remove dead code from hci_request.c
Bluetooth: btrtl: Fix typo in comment
Bluetooth: MGMT: Fix holding hci_conn reference while command is queued
Bluetooth: mgmt: Fix using hci_conn_abort
Bluetooth: Use bt_status to convert from errno
Bluetooth: Add bt_status
Bluetooth: hci_sync: Split hci_dev_open_sync
Bluetooth: hci_sync: Refactor remove Adv Monitor
Bluetooth: hci_sync: Refactor add Adv Monitor
Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING
Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR
...
====================
Link: https://lore.kernel.org/r/20220723002232.964796-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds broadcast support for BTPROTO_ISO by extending the
sockaddr_iso with a new struct sockaddr_iso_bc where the socket user
can set the broadcast address when receiving, the SID and the BIS
indexes it wants to synchronize.
When using BTPROTO_ISO for broadcast the roles are:
Broadcaster -> uses connect with address set to BDADDR_ANY:
> tools/isotest -s 00:00:00:00:00:00
Broadcast Receiver -> uses listen with address set to broadcaster:
> tools/isotest -d 00:AA:01:00:00:00
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This introduces a new socket type BTPROTO_ISO which can be enabled with
use of ISO Socket experiemental UUID, it can used to initiate/accept
connections and transfer packets between userspace and kernel similarly
to how BTPROTO_SCO works:
Central -> uses connect with address set to destination bdaddr:
> tools/isotest -s 00:AA:01:00:00:00
Peripheral -> uses listen:
> tools/isotest -d
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Daniel Borkmann says:
====================
bpf-next 2022-07-22
We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).
The main changes are:
1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.
2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
syscalls through kprobe mechanism, from Andrii Nakryiko.
3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
function, from Song Liu & Jiri Olsa.
4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.
5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
with kernel symbols, from Alan Maguire.
6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.
7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.
8) libbpf support for writing custom perf event readers, from Jon Doron.
9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.
10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.
11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.
12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.
13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.
14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.
15) Make non-prealloced BPF map allocations low priority to play better with
memcg limits, from Yafang Shao.
16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.
17) Various smaller cleanups and improvements all over the place.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
bpf: Simplify bpf_prog_pack_[size|mask]
bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
bpf, x64: Allow to use caller address from stack
ftrace: Allow IPMODIFY and DIRECT ops on the same function
ftrace: Add modify_ftrace_direct_multi_nolock
bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
selftests/bpf: Fix test_verifier failed test in unprivileged mode
selftests/bpf: Add negative tests for new nf_conntrack kfuncs
selftests/bpf: Add tests for new nf_conntrack kfuncs
selftests/bpf: Add verifier tests for trusted kfunc args
net: netfilter: Add kfuncs to set and change CT status
net: netfilter: Add kfuncs to set and change CT timeout
net: netfilter: Add kfuncs to allocate and insert CT
net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
bpf: Add documentation for kfuncs
bpf: Add support for forcing kfunc args to be trusted
bpf: Switch to new kfunc flags infrastructure
tools/resolve_btfids: Add support for 8-byte BTF sets
bpf: Introduce 8-byte BTF set
...
====================
Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 4a41f453be.
This to-be-reverted commit was meant to apply a stricter rule for the
stack to enter pingpong mode. However, the condition used to check for
interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
jiffy based and might be too coarse, which delays the stack entering
pingpong mode.
We revert this patch so that we no longer use the above condition to
determine interactive session, and also reduce pingpong threshold to 1.
Fixes: 4a41f453be ("tcp: change pingpong threshold to 3")
Reported-by: LemmyHuang <hlm3280@163.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Because of how struct msghdr is usually initialised some fields and
sg_from_iter in particular might be left out not initialised, so we
can't safely use it in __zerocopy_sg_from_iter().
For now use the callback only when there is ->msg_ubuf set relying on
the fact that they're used together and we properly zero ->msg_ubuf.
Fixes: ebe73a284f ("net: Allow custom iter handler in msghdr")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Message-Id: <ce8b68b41351488f79fd998b032b3c56e9b1cc6c.1658401817.git.asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This introduces hci_recv_event_data to make it simpler to access the
contents of last received event rather than having to pass its contents
to the likes of *_ind/*_cfm callbacks.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The timed ending of Discoverability was handled in hci_requst.c, with
calls using the deprecated hci_req_add() mechanism. Converted to live
inside mgmt.c using the same delayed work queue, but with hci_sync
version of hci_update_discoverable().
Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This removes the remaining calls to HCI_OP_WRITE_SCAN_ENABLE from
hci_request call chains, and converts them to hci_sync calls.
Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The discov_update work queue is no longer used as a result
of the hci_sync rework.
The __hci_req_hci_power_on() function is no longer referenced in the
code as a result of the hci_sync rework.
Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This removes the use of hci_conn_hold from Get Conn Info and Get Clock
Info since the callback can just do a lookup by address using the cmd
data and only then set cmd->user_data to pass to the complete callback.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
During the code reshuffling, I accidentally set this to
NULL before using it, fix that to fix the link data leak.
Fixes: d3e2439b0f ("wifi: mac80211: fix link manipulation")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In MLO we shouldn't call ieee80211_bss_info_change_notify(),
call that only (for backward compatibility) without MLO, and
otherwise ieee80211_vif_cfg_change_notify().
Similarly, ieee80211_reset_erp_info() only applies to the
current link, and in MLO we assume the driver doesn't really
need that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In sta_apply_parameters(), we really no longer need to
check that the link or sband exists, in fact, that's
harmful if link 0 doesn't exist, since then this will
fail.
Just remove this check, it was added for validation of
the sband where used, but it's not used here, it's now
only used in sta_link_apply_parameters() which has an
own lookup and check.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To transmit the association frame to the right station and
with address translation, use the correct addresses there
and set up the AP address in the configuration earlier so
it's applied during the transmit of auth/assoc frames.
Fixes: 81151ce462 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We'll need the link e.g. for decrypt, and shouldn't be
looking it up all the time later, so track it in the RX
data.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For drivers using software encryption for multicast TX, such
as mac80211_hwsim, mac80211 needs to duplicate the multicast
frames on each link, if MLO is enabled. Do this, but don't
just make it dependent on the key but provide a separate flag
for drivers to opt out of this.
This is not very efficient, I expect that drivers will do it
in firmware/hardware or at least with DMA engine assistence,
so this is mostly for hwsim.
To make this work, also implement the SNS11 sequence number
space that an AP MLD shall have, and modify the API to the
__ieee80211_subif_start_xmit() function to always require the
link ID bits to be set.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are a couple of new things that should be possible
with MLO:
* selecting the link to transmit to a station by link ID,
which a previous patch added to the nl80211 API
* selecting the link by frequency, similarly
* allowing transmittion to an MLD without specifying any
channel or link ID, with MLD addresses
Enable these use cases. Also fix the address comparison
in client mode to use the AP (MLD) address.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow optionally specifying the link ID to transmit on,
which can be done instead of the link frequency, on an
MLD addressed frame. Both can also be omitted in which
case the frame must be MLD addressed and link selection
(and address translation) will be done on lower layers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the low level driver reports hardware timestamps for frame
TX status or frame RX, pass the timestamps to cfg80211.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add hardware timestamps to management frame RX info.
This shall be used by drivers that support hardware timestamping for
Timing measurement and Fine timing measurement action frames RX.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The functions for reporting rx management take many arguments.
Collect all the arguments into a struct, which also make it easier
to add more arguments if needed.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a function for reporting TX status with hardware timestamps. This
function shall be used for reporting the TX status of Timing
measurement and Fine timing measurement action frames by devices that
support reporting hardware timestamps.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When inserting a link STA, make sure it doesn't exist first
and add lockdep assertions that we cannot modify the hash
table without holding the sta_mtx, so this check is really
correct.
Also return without hashing if the driver failed, and warn
if the hashing fails, which shouldn't happen due to the
check described above.
Fixes: cb71f1d136 ("wifi: mac80211: add sta link addition/removal")
Fixes: ba6ddab94f ("wifi: mac80211: maintain link-sta hash table")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We need to copy the address to both the private and public
portion of the link_sta (the private one is needed for the
hash table). Fix this.
Fixes: bbe90107e1 ("wifi: mac80211: mlme: refactor link station setup")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Clarify the link ID behaviour in control port TX, we need it
to select the link to transmit on for both MLD and non-MLD
receivers, but select the link address as the SA only if the
receiver is not an MLD.
Fixes: 67207bab93 ("wifi: cfg80211/mac80211: Support control port TX from specific link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the frame is going to be dropped anyway because
ieee80211_lookup_ra_sta() returned an error (and
even though it's a bit racy, it will likely continue
to do so), return the error out instead of just
silently dropping the frame.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We always need the MLD address and link ID to add or
modify the link STA, so require it in the API.
Fixes: 577e5b8c39 ("wifi: cfg80211: add API to add/modify/remove a link station")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add more sanity checks to the API handling, we shouldn't
be able to create a station without links, nor should we
be able to add a link to a station that wasn't created as
an MLD with links in the first place.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are two issues here: we unhash the link stations only
directly before freeing the station they belong to, and we
also don't unhash all the links correctly in all cases. Fix
these issues.
Fixes: ba6ddab94f ("wifi: mac80211: maintain link-sta hash table")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When modifying a link station, validate that the link address
doesn't change, except the first time the link is created.
Fixes: b95eb7f0ee ("wifi: cfg80211/mac80211: separate link params from station params")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
At this point, we've already changed link_id to be zero for
a non-MLO connection, so use the 'mlo' variable rather than
link ID to determine the MLO status of the station.
Fixes: bd363ee533 ("wifi: mac80211: mlme: set sta.mlo correctly")
Fixes: 81151ce462 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If there's a non-MLO client, the A2 must be set to the BSSID
of the link since no translation will happen in lower layers
and it's needed that way for encryption.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>