Don't use mnl attr helpers, we're trying to remove the libmnl
dependency. Create both signed and unsigned helpers, libmnl
had unsigned helpers, so code generator no longer needs
the mnl_type() hack.
The new helpers are written from first principles, but are
hopefully not too buggy.
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The temporary auto-int helpers are not really correct.
We can't treat signed and unsigned ints the same when
determining whether we need full 8B. I realized this
before sending the patch to add support in libmnl.
Unfortunately, that patch has not been merged,
so time to fix our local helpers. Use the mnl* name
for now, subsequent patches will address that.
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240227223032.1835527-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We never increment the group number iterator, so all groups
get recorded into index 0 of the mcast_groups[] array.
As a result YNL can only handle using the last group.
For example using the "netdev" sample on kernel with
page pool commands results in:
$ ./samples/netdev
YNL: Multicast group 'mgmt' not found
Most families have only one multicast group, so this hasn't
been noticed. Plus perhaps developers usually test the last
group which would have worked.
Fixes: 86878f14d7 ("tools: ynl: user space helpers")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 7c59c9c8f2 ("tools: ynl: generate code for ovs families")
we need relatively recent OvS headers to get YNL to compile.
Add the direct include workaround to fix compilation on less
up-to-date OSes like CentOS 9.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240226225806.1301152-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add test case for multicast packet confirm race.
Without preceding patch, this should result in:
WARNING: CPU: 0 PID: 38 at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm+0x3ed/0x5f0
Workqueue: events_unbound macvlan_process_broadcast
RIP: 0010:__nf_conntrack_confirm+0x3ed/0x5f0
? __nf_conntrack_confirm+0x3ed/0x5f0
nf_confirm+0x2ad/0x2d0
nf_hook_slow+0x36/0xd0
ip_local_deliver+0xce/0x110
__netif_receive_skb_one_core+0x4f/0x70
process_backlog+0x8c/0x130
[..]
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
conntrack nf_confirm logic cannot handle cloned skbs referencing
the same nf_conn entry, which will happen for multicast (broadcast)
frames on bridges.
Example:
macvlan0
|
br0
/ \
ethX ethY
ethX (or Y) receives a L2 multicast or broadcast packet containing
an IP packet, flow is not yet in conntrack table.
1. skb passes through bridge and fake-ip (br_netfilter)Prerouting.
-> skb->_nfct now references a unconfirmed entry
2. skb is broad/mcast packet. bridge now passes clones out on each bridge
interface.
3. skb gets passed up the stack.
4. In macvlan case, macvlan driver retains clone(s) of the mcast skb
and schedules a work queue to send them out on the lower devices.
The clone skb->_nfct is not a copy, it is the same entry as the
original skb. The macvlan rx handler then returns RX_HANDLER_PASS.
5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
The Macvlan broadcast worker and normal confirm path will race.
This race will not happen if step 2 already confirmed a clone. In that
case later steps perform skb_clone() with skb->_nfct already confirmed (in
hash table). This works fine.
But such confirmation won't happen when eb/ip/nftables rules dropped the
packets before they reached the nf_confirm step in postrouting.
Pablo points out that nf_conntrack_bridge doesn't allow use of stateful
nat, so we can safely discard the nf_conn entry and let inet call
conntrack again.
This doesn't work for bridge netfilter: skb could have a nat
transformation. Also bridge nf prevents re-invocation of inet prerouting
via 'sabotage_in' hook.
Work around this problem by explicit confirmation of the entry at LOCAL_IN
time, before upper layer has a chance to clone the unconfirmed entry.
The downside is that this disables NAT and conntrack helpers.
Alternative fix would be to add locking to all code parts that deal with
unconfirmed packets, but even if that could be done in a sane way this
opens up other problems, for example:
-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4
-m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5
For multicast case, only one of such conflicting mappings will be
created, conntrack only handles 1:1 NAT mappings.
Users should set create a setup that explicitly marks such traffic
NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass
them, ruleset might have accept rules for untracked traffic already,
so user-visible behaviour would change.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit d0009effa8 ("netfilter: nf_tables: validate NFPROTO_* family") added
some validation of NFPROTO_* families in the nft_compat module, but it broke
the ability to use legacy iptables modules in dual-stack nftables.
While with legacy iptables one had to independently manage IPv4 and IPv6
tables, with nftables it is possible to have dual-stack tables sharing the
rules. Moreover, it was possible to use rules based on legacy iptables
match/target modules in dual-stack nftables.
As an example, the program from [2] creates an INET dual-stack family table
using an xt_bpf based rule, which looks like the following (the actual output
was generated with a patched nft tool as the current nft tool does not parse
dual stack tables with legacy match rules, so consider it for illustrative
purposes only):
table inet testfw {
chain input {
type filter hook prerouting priority filter; policy accept;
bytecode counter packets 0 bytes 0 accept
}
}
After d0009effa8 ("netfilter: nf_tables: validate NFPROTO_* family") we get
EOPNOTSUPP for the above program.
Fix this by allowing NFPROTO_INET for nft_(match/target)_validate(), but also
restrict the functions to classic iptables hooks.
Changes in v3:
* clarify that upstream nft will not display such configuration properly and
that the output was generated with a patched nft tool
* remove example program from commit description and link to it instead
* no code changes otherwise
Changes in v2:
* restrict nft_(match/target)_validate() to classic iptables hooks
* rewrite example program to use unmodified libnftnl
Fixes: d0009effa8 ("netfilter: nf_tables: validate NFPROTO_* family")
Link: https://lore.kernel.org/all/Zc1PfoWN38UuFJRI@calendula/T/#mc947262582c90fec044c7a3398cc92fac7afea72 [1]
Link: https://lore.kernel.org/all/20240220145509.53357-1-ignat@cloudflare.com/ [2]
Reported-by: Jordan Griege <jgriege@cloudflare.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Puranjay Mohan says:
====================
bpf, arm64: use BPF prog pack allocator in BPF JIT
Changes in V8 => V9:
V8: https://lore.kernel.org/bpf/20240221145106.105995-1-puranjay12@gmail.com/
1. Rebased on bpf-next/master
2. Added Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Changes in V7 => V8:
V7: https://lore.kernel.org/bpf/20240125133159.85086-1-puranjay12@gmail.com/
1. Rebase on bpf-next/master
2. Fix __text_poke() by removing usage of 'ret' that was never set.
Changes in V6 => V7:
V6: https://lore.kernel.org/all/20240124164917.119997-1-puranjay12@gmail.com/
1. Rebase on bpf-next/master.
Changes in V5 => V6:
V5: https://lore.kernel.org/all/20230908144320.2474-1-puranjay12@gmail.com/
1. Implement a text poke api to reduce code repeatition.
2. Use flush_icache_range() in place of caches_clean_inval_pou() in the
functions that modify code.
3. Optimize the bpf_jit_free() by not copying the all instructions on
the rw image to the ro_image
Changes in V4 => v5:
1. Remove the patch for making prog pack allocator portable as it will come
through the RISCV tree[1].
2. Add a new function aarch64_insn_set() to be used in
bpf_arch_text_invalidate() for putting illegal instructions after a
program is removed. The earlier implementation of bpf_arch_text_invalidate()
was calling aarch64_insn_patch_text_nosync() in a loop and making it slow
because each call invalidated the cache.
Here is test_tag now:
[root@ip-172-31-6-176 bpf]# time ./test_tag
test_tag: OK (40945 tests)
real 0m19.695s
user 0m1.514s
sys 0m17.841s
test_tag without these patches:
[root@ip-172-31-6-176 bpf]# time ./test_tag
test_tag: OK (40945 tests)
real 0m21.487s
user 0m1.647s
sys 0m19.106s
test_tag in the previous version was really slow > 2 minutes. see [2]
3. Add cache invalidation in aarch64_insn_copy() so other users can call the
function without worrying about the cache. Currently only bpf_arch_text_copy()
is using it, but there might be more users in the future.
Chanes in V3 => V4: Changes only in 3rd patch
1. Fix the I-cache maintenance: Clean the data cache and invalidate the i-Cache
only *after* the instructions have been copied to the ROX region.
Chanes in V2 => V3: Changes only in 3rd patch
1. Set prog = orig_prog; in the failure path of bpf_jit_binary_pack_finalize()
call.
2. Add comments explaining the usage of the offsets in the exception table.
Changes in v1 => v2:
1. Make the naming consistent in the 3rd patch:
ro_image and image
ro_header and header
ro_image_ptr and image_ptr
2. Use names dst/src in place of addr/opcode in second patch.
3. Add Acked-by: Song Liu <song@kernel.org> in 1st and 2nd patch.
BPF programs currently consume a page each on ARM64. For systems with many BPF
programs, this adds significant pressure to instruction TLB. High iTLB pressure
usually causes slow down for the whole system.
Song Liu introduced the BPF prog pack allocator[3] to mitigate the above issue.
It packs multiple BPF programs into a single huge page. It is currently only
enabled for the x86_64 BPF JIT.
This patch series enables the BPF prog pack allocator for the ARM64 BPF JIT.
====================================================
Performance Analysis of prog pack allocator on ARM64
====================================================
To test the performance of the BPF prog pack allocator on ARM64, a stresser
tool[4] was built. This tool loads 8 BPF programs on the system and triggers
5 of them in an infinite loop by doing system calls.
The runner script starts 20 instances of the above which loads 8*20=160 BPF
programs on the system, 5*20=100 of which are being constantly triggered.
In the above environment we try to build Python-3.8.4 and try to find different
iTLB metrics for the compilation done by gcc-12.2.0.
The source code[5] is configured with the following command:
./configure --enable-optimizations --with-ensurepip=install
Then the runner script is executed with the following command:
./run.sh "perf stat -e ITLB_WALK,L1I_TLB,INST_RETIRED,iTLB-load-misses -a make -j32"
This builds Python while 160 BPF programs are loaded and 100 are being constantly
triggered and measures iTLB related metrics.
The output of the above command is discussed below before and after enabling the
BPF prog pack allocator.
The tests were run on qemu-system-aarch64 with 32 cpus, 4G memory, -machine virt,
-cpu host, and -enable-kvm.
Results
-------
Before enabling prog pack allocator:
------------------------------------
Performance counter stats for 'system wide':
333278635 ITLB_WALK
6762692976558 L1I_TLB
25359571423901 INST_RETIRED
15824054789 iTLB-load-misses
189.029769053 seconds time elapsed
After enabling prog pack allocator:
-----------------------------------
Performance counter stats for 'system wide':
190333544 ITLB_WALK
6712712386528 L1I_TLB
25278233304411 INST_RETIRED
5716757866 iTLB-load-misses
185.392650561 seconds time elapsed
Improvements in metrics
-----------------------
Compilation time ---> 1.92% faster
iTLB-load-misses/Sec (Less is better) ---> 63.16% decrease
ITLB_WALK/1000 INST_RETIRED (Less is better) ---> 42.71% decrease
ITLB_Walk/L1I_TLB (Less is better) ---> 42.47% decrease
[1] https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/commit/?h=for-next&id=20e490adea279d49d57b800475938f5b67926d98
[2] https://lore.kernel.org/all/CANk7y0gcP3dF2mESLp5JN1+9iDfgtiWRFGqLkCgZD6wby1kQOw@mail.gmail.com/
[3] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/
[4] https://github.com/puranjaymohan/BPF-Allocator-Bench
[5] https://www.python.org/ftp/python/3.8.4/Python-3.8.4.tgz
====================
Link: https://lore.kernel.org/r/20240228141824.119877-1-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use bpf_jit_binary_pack_alloc for memory management of JIT binaries in
ARM64 BPF JIT. The bpf_jit_binary_pack_alloc creates a pair of RW and RX
buffers. The JIT writes the program into the RW buffer. When the JIT is
done, the program is copied to the final RX buffer
with bpf_jit_binary_pack_finalize.
Implement bpf_arch_text_copy() and bpf_arch_text_invalidate() for ARM64
JIT as these functions are required by bpf_jit_binary_pack allocator.
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240228141824.119877-3-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The text_poke API is used to implement functions like memcpy() and
memset() for instruction memory (RO+X). The implementation is similar to
the x86 version.
This will be used by the BPF JIT to write and modify BPF programs. There
could be more users of this in the future.
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240228141824.119877-2-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Revert a recent EC driver change that introduced an unexpected and
undesirable user-visible difference in behavior (Rafael J. Wysocki).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmXfh9MSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxE1sP/RuDelNCmPOVplzhGohKcm0pGhOrp0KU
HrRNxt8ZYn0WeBx8YvURKobTEU2/n3GdpTC/8XJV5y5Q9OCWXqAwpmLCB8qcC5bj
JJmAw2gpXupaa2GSuDj9KA8HkjwXmLz67lXQqeb8em34CQewF+ZFwaVriY9idfpF
cP9on5TL+vU5ZHbAelknAdP/Wmw2hQgeHV5NPsqwMqV03I7oZQaxq5G4q3AOzFa3
eYK7ZIYQkhRRI+b5MxpNQDLHBP1dbr1xE0korh2O4tynav4oXYhNADtMEMZpHdA4
NNT3ffQ2fimjtDm3gIWIVKgv+2beN+rB7lpfiQKADxgN6KCkIz9iwPM5R2636x/d
gr5kWxQjUl8eJn2zOQ83Pc79htqGqIaZRzTIWacB3jhdKW4ZoeVZlyKRvWayMzXG
YICpoTMrdbQmFOWXS8hRx8MZ7P/axyoqazMMEV26OQauJD0AHuM+u9uj5zDR7lKQ
hR33as6/HwfP3CuxwrtGZSkqi56AdtVwFBEXGyO612qEsAgSjF1J2mGlqumlViQ1
8q+yonmVRy7c8aYsITXg9x6KrliBPnMqGb1SohcHtGC1NlvtHypywBC2gFrVbOIf
QuctvhEyLVDHdCY7N800nT0l/kKqorODUc8UCNXKlbIbaQ5kr9QEZz8MNQefR0iU
RpvH8bqK1weS
=vyUq
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Revert a recent EC driver change that introduced an unexpected and
undesirable user-visible difference in behavior (Rafael Wysocki)"
* tag 'acpi-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: EC: Use a spin lock without disabing interrupts"
Fix a latent bug in the intel-pstate cpufreq driver that has been
exposed by the recent schedutil governor changes (Doug Smythies).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmXfiG4SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx+MAP/jPyZb4g9fIrkZ6++7+PDfB4F/dwM8+a
yNTAkaIJ4H5bMenTkgTvDKETgogFo+tRraPiwk0a0oRZ6rF4HhNHMUOuXkynh9q8
rDxsIq8DtUURrtd8VAEtJ+lof3v9DO8Xpl6tvCksrJWJQXEldiOx1JNcavUnDb3H
tf0MNps9gZMFqSLNFlID4vFsaOYIq5PGpw73A91MBKTuVNB9VD4/camKIJPHr5WL
rZk3fzO2MqaI1WxPD1RL87r8Bj+a/1c5PGMJMCwJkeBqhAw+sfuP9GHoA21P2O+k
9N3gz1dqCWGs/hVp4vldnp3X+B7HcBUTOUoSctr0pxEaqTXqlTZ4IzprDEnzqU64
vR5OjKOGTkUhXvcQYEBNIgr9OsHr6mdjXmBZhHbnC7cSb4Mvn+xur6vV/qSXQtS+
J+xazNcMAoCyomChrWKljzRVkLBJ7bYSPDpGyiFPztdEtSUEhtopW9wgwUZ6/Ou4
JlgVZ6/hbfFTgH1BAqIwttWEoXZmNoTx8r416P1vr71YgZGwYDh+lh4mXUkGiwxb
cpHR3lEIMNfD9lOkhi+KcfDhyfi8fiok1pE6EvbdS6fKhyMbrkPYvONMuLfniMlX
4acobj7Xg5M3TQgvDo4aojEBIsWNOMuaLzth/tja6p//34ll07rqXcwTKD6KVXZ8
rSKyupjUWLuZ
=UaYX
-----END PGP SIGNATURE-----
Merge tag 'pm-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a latent bug in the intel-pstate cpufreq driver that has been
exposed by the recent schedutil governor changes (Doug Smythies)"
* tag 'pm-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
There's two things here - the big one is a batch of fixes for the power
management in the Cadence QuadSPI driver which had some serious issues
with runtime PM and there's also a revert of one of the last batch of
fixes for ppc4xx which has a dependency on -next but was in between two
mainline fixes so the -next dependency got missed.
The ppc4xx driver is not currently included in any defconfig and has
dependencies that exclude it from allmodconfigs so none of the CI
systems catch issues with it, hence the need for the earlier fixes
series. There's some updates to the PowerPC configs to address this.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmXfcaAACgkQJNaLcl1U
h9DOagf7BA7H/KvTOLuyd6uU90TwExZslA3/KiHhxn9ePwPHdc9njVMHBMVWl7T2
K4jMDVz9Y7SN41npLG/B48n1dD1jMyWsX2GxkUHaFGCMyK4Go1ODJHmK8ixWSRfy
hIFacWgjd6qEJej9SRoKFXmhi9u/UdNJIcUU8xXy3WK3IkrugGySmPUtVXDby+3A
6u2aR588xDEYrySZtxm8ZDFpl2FKU912OswtPPwCNTmSsMkn25/ePqhdo9C2H/VK
GA8290rVJFpGSsJPdUVLYAyYIbaar7bBtWV4IfLRBbIN9sFNRKvRbxJpwe6A93dQ
BOCjj+G/UjKPFYARhuDReF1Hd5+/Bg==
=YOog
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's two things here - the big one is a batch of fixes for the
power management in the Cadence QuadSPI driver which had some serious
issues with runtime PM and there's also a revert of one of the last
batch of fixes for ppc4xx which has a dependency on -next but was in
between two mainline fixes so the -next dependency got missed.
The ppc4xx driver is not currently included in any defconfig and has
dependencies that exclude it from allmodconfigs so none of the CI
systems catch issues with it, hence the need for the earlier fixes
series. There's some updates to the PowerPC configs to address this"
* tag 'spi-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Drop mismerged fix
spi: cadence-qspi: add system-wide suspend and resume callbacks
spi: cadence-qspi: put runtime in runtime PM hooks names
spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
spi: cadence-qspi: fix pointer reference in runtime PM hooks
Two small fixes, one small update for the max5970 driver bringing the
driver and DT binding documentation into sync plus a missed update to
the patterns in MAINTAINERS after a DT binding YAML conversion.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmXfcg4ACgkQJNaLcl1U
h9DY6gf/Y3pZz03HuUXf5RVphi9q6cyTb9XlqQbXPWb2tVN/3pVxjCy14aW2IJEN
bo7fnQyEnbP3pYpDpWDHI0hX4KxV83MoCArpsw7yOTnXrIP+PZBAEbcsWuAeKDqu
WPKVu1bO/vCpMSaQEzYavtmcyLwIihs21W8FqcQ0bhrYVO4jWefSVGHh+VcKFKSv
0h2OeAbE9OE58arAZb+rFAeDwKzbo6ldf14/Uj+a+wKENxZ9U5nZevsMf7xoJyjO
XYJpYMbMj/1LJ9V+WL4zOaqULIlZyRjq5SUMeC5egX4wYX3LMMvCdw6f0i0Rkaye
bmld819z9ITcLwlvWry5ljDba26YUQ==
=sGIp
-----END PGP SIGNATURE-----
Merge tag 'regulator-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Two small fixes, one small update for the max5970 driver bringing the
driver and DT binding documentation into sync plus a missed update to
the patterns in MAINTAINERS after a DT binding YAML conversion"
* tag 'regulator-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max5970: Fix regulator child node name
MAINTAINERS: repair entry for MICROCHIP MCP16502 PMIC DRIVER
access in arm64/neonbs.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmXZOpsACgkQxycdCkmx
i6evnRAAzwTYyTOZFil02WUUqrmjIoiA3qMK2FiXr1wE8vTtlCvfEODGjS8T58JX
nl5PU6D3/H4OJu7Z0zzjdPUbj2EQ7JSw7T+N54p8Zr4Y/Sy/XeFPWwE5XbfGuPcc
7TS3UbIpX9Gi08M5R316C4an2/8YYRsYPZUyEkm0SPvh9egDF08o62UwxJsvj7aH
zKpjMjSSuxJWtzf2H9HDBlPh46u/ZFaV7kNJmrvl9Meb0XovmSCSkDWmHyOOE6TQ
TNdKyWuueW/QGtuIwFYEKhzyptfpqFN7ZkX8F45b4Mx8uHzcwWTezHe6jr/pFupR
APcYZfnZWanX5fbYkUmjjTkfiNO/ez9CkMRrhbexUWxafpeojlTAWtPRN3HdT8AX
/UFVCgYbDjdR8CebONidyvxQn6rYahseQ7epDPDiT/EEjAxCaU+WzOCCg6pb6TpX
KI3KBNq/cgvDkH3ywcTvoyS/XVRuiN8DqE+/zzLvU3po1EmcC1ZBwyKJva2/STMU
J0d+g4Xxu5tY8XmO19+5ZGbY6FerbTdchTgwwNamGtnCL3B2JWDC1kUTnA/aKiFW
2umTzxKSEzQFqgAnyXGOnx584QDAOOjm3CUQvTbLH+K2E4ip5O32U2SlOmYSep3O
2EMQR1jHonEf2rUYADACk0ES4SCMftmIZV4Y4oHRoVFM+oEcyI0=
=IOp0
-----END PGP SIGNATURE-----
Merge tag 'v6.8-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a regression in lskcipher and an out-of-bound access
in arm64/neonbs"
* tag 'v6.8-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/neonbs - fix out-of-bounds access on short input
crypto: lskcipher - Copy IV in lskcipher glue code always
hci_coredump_qca() uses __hci_cmd_sync() to send a vendor-specific command
to trigger firmware coredump, but the command does not have any event as
its sync response, so it is not suitable to use __hci_cmd_sync(), fixed by
using __hci_cmd_send().
Fixes: 06d3fdfcdf ("Bluetooth: hci_qca: Add qcom devcoredump support")
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
BT adapter going into UNCONFIGURED state during BT turn ON when
devicetree has no local-bd-address node.
Bluetooth will not work out of the box on such devices, to avoid this
problem, added check to set HCI_QUIRK_USE_BDADDR_PROPERTY based on
local-bd-address node entry.
When this quirk is not set, the public Bluetooth address read by host
from controller though HCI Read BD Address command is
considered as valid.
Fixes: e668eb1e15 ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts")
Signed-off-by: Janaki Ramaiah Thota <quic_janathot@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Right now Linux BT stack cannot pass test case "GAP/CONN/CPUP/BV-05-C
'Connection Parameter Update Procedure Invalid Parameters Central
Responder'" in Bluetooth Test Suite revision GAP.TS.p44. [0]
That was revoled by commit c49a8682fc ("Bluetooth: validate BLE
connection interval updates"), but later got reverted due to devices
like keyboards and mice may require low connection interval.
So only validate the max value connection interval to pass the Test
Suite, and let devices to request low connection interval if needed.
[0] https://www.bluetooth.org/docman/handlers/DownloadDoc.ashx?doc_id=229869
Fixes: 68d19d7d99 ("Revert "Bluetooth: validate BLE connection interval updates"")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
LIMITED_DISCOVERABLE flag is not reset from Class of Device and
advertisement on limited discoverable timeout. This prevents to pass PTS
test GAP/DISC/LIMM/BV-02-C
Calling set_discoverable_sync as when the limited discovery is set
correctly update the Class of Device and advertisement.
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
hci_store_wake_reason() wrongly parses event HCI_Connection_Request
as HCI_Connection_Complete and HCI_Connection_Complete as
HCI_Connection_Request, so causes recording wakeup BD_ADDR error and
potential stability issue, fix it by using the correct field.
Fixes: 2f20216c1d ("Bluetooth: Emit controller suspend and resume events")
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
During our fuzz testing of the connection and disconnection process at the
RFCOMM layer, we discovered this bug. By comparing the packets from a
normal connection and disconnection process with the testcase that
triggered a KASAN report. We analyzed the cause of this bug as follows:
1. In the packets captured during a normal connection, the host sends a
`Read Encryption Key Size` type of `HCI_CMD` packet
(Command Opcode: 0x1408) to the controller to inquire the length of
encryption key.After receiving this packet, the controller immediately
replies with a Command Completepacket (Event Code: 0x0e) to return the
Encryption Key Size.
2. In our fuzz test case, the timing of the controller's response to this
packet was delayed to an unexpected point: after the RFCOMM and L2CAP
layers had disconnected but before the HCI layer had disconnected.
3. After receiving the Encryption Key Size Response at the time described
in point 2, the host still called the rfcomm_check_security function.
However, by this time `struct l2cap_conn *conn = l2cap_pi(sk)->chan->conn;`
had already been released, and when the function executed
`return hci_conn_security(conn->hcon, d->sec_level, auth_type, d->out);`,
specifically when accessing `conn->hcon`, a null-ptr-deref error occurred.
To fix this bug, check if `sk->sk_state` is BT_CLOSED before calling
rfcomm_recv_frame in rfcomm_process_rx.
Signed-off-by: Yuxuan Hu <20373622@buaa.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
During suspend, only wakeable devices can be in acceptlist, so if the
device was previously added it needs to be removed otherwise the device
can end up waking up the system prematurely.
Fixes: 3b42055388 ("Bluetooth: hci_sync: Fix attempting to suspend with unfiltered passive scan")
Signed-off-by: Clancy Shang <clancy.shang@quectel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
While handling the HCI_EV_HARDWARE_ERROR event, if the underlying
BT controller is not responding, the GPIO reset mechanism would
free the hci_dev and lead to a use-after-free in hci_error_reset.
Here's the call trace observed on a ChromeOS device with Intel AX201:
queue_work_on+0x3e/0x6c
__hci_cmd_sync_sk+0x2ee/0x4c0 [bluetooth <HASH:3b4a6>]
? init_wait_entry+0x31/0x31
__hci_cmd_sync+0x16/0x20 [bluetooth <HASH:3b4a 6>]
hci_error_reset+0x4f/0xa4 [bluetooth <HASH:3b4a 6>]
process_one_work+0x1d8/0x33f
worker_thread+0x21b/0x373
kthread+0x13a/0x152
? pr_cont_work+0x54/0x54
? kthread_blkcg+0x31/0x31
ret_from_fork+0x1f/0x30
This patch holds the reference count on the hci_dev while processing
a HCI_EV_HARDWARE_ERROR event to avoid potential crash.
Fixes: c7741d16a5 ("Bluetooth: Perform a power cycle when receiving hardware error event")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
There's a very confusing mistake in the code starting a HCI inquiry: We're
calling hci_dev_test_flag() to test for HCI_INQUIRY, but hci_dev_test_flag()
checks hdev->dev_flags instead of hdev->flags. HCI_INQUIRY is a bit that's
set on hdev->flags, not on hdev->dev_flags though.
HCI_INQUIRY equals the integer 7, and in hdev->dev_flags, 7 means
HCI_BONDABLE, so we were actually checking for HCI_BONDABLE here.
The mistake is only present in the synchronous code for starting an inquiry,
not in the async one. Also devices are typically bondable while doing an
inquiry, so that might be the reason why nobody noticed it so far.
Fixes: abfeea476c ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY")
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
A recent commit restored the original (and still documented) semantics
for the HCI_QUIRK_USE_BDADDR_PROPERTY quirk so that the device address
is considered invalid unless an address is provided by firmware.
This specifically means that this flag must only be set for devices with
invalid addresses, but the Broadcom BCM4377 driver has so far been
setting this flag unconditionally.
Fortunately the driver already checks for invalid addresses during setup
and sets the HCI_QUIRK_INVALID_BDADDR flag, which can simply be replaced
with HCI_QUIRK_USE_BDADDR_PROPERTY to indicate that the default address
is invalid but can be overridden by firmware (long term, this should
probably just always be allowed).
Fixes: 6945795bc8 ("Bluetooth: fix use-bdaddr-property quirk")
Cc: stable@vger.kernel.org # 6.5
Reported-by: Felix Zhang <mrman@mrman314.tech>
Link: https://lore.kernel.org/r/77419ffacc5b4875e920e038332575a2a5bff29f.camel@mrman314.tech/
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reported-by: Felix Zhang <mrman@mrman314.tech>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Andrew Lunn says:
====================
drivers: net: Convert EEE handling to use linkmode bitmaps
EEE has until recently been limited to lower speeds due to the use of
the legacy u32 for link speeds. This restriction has been lifted, with
the use of linkmode bitmaps, added in the following patches:
1f069de636 ethtool: add linkmode bitmap support to struct ethtool_keee
1d756ff13d ethtool: add suffix _u32 to legacy bitmap members of struct ethtool_keee
285cc15cc5 ethtool: adjust struct ethtool_keee to kernel needs
0b3100bc8f ethtool: switch back from ethtool_keee to ethtool_eee for ioctl
d80a523353 ethtool: replace struct ethtool_eee with a new struct ethtool_keee on kernel side
This patchset converts the remaining MAC drivers still using the old
_u32 to link modes.
A couple of Intel drivers do odd things with EEE, setting the autoneg
bit. It is unclear why, no other driver does, ethtool does not display
it, and EEE is always negotiated. One patch in this series deletes
this code.
With all users of the legacy _u32 changed to link modes, the _u32
values are removed from keee, and support for them in the ethtool core
is removed.
---
Changes in v5:
- Restore zeroing eee_data.advertised in ax8817_178a
- Fix lp_advertised -> supported in ixgdb
- Link to v4: https://lore.kernel.org/r/20240218-keee-u32-cleanup-v4-0-71f13b7c3e60@lunn.ch
Changes in v4:
- Add missing conversion in igb
- Add missing conversion in r8152
- Add patch to remove now unused _u32 members
- Link to v3: https://lore.kernel.org/r/20240217-keee-u32-cleanup-v3-0-fcf6b62a0c7f@lunn.ch
Changes in v3:
- Add list of commits adding linkmodes to EEE to cover letter
- Fix grammar error in cover letter.
- Add Reviewed-by from Jacob Keller
- Link to v2: https://lore.kernel.org/r/20240214-keee-u32-cleanup-v2-0-4ac534b83d66@lunn.ch
Changes in v2:
- igb: Fix type 100BaseT to 1000BaseT.
- Link to v1: https://lore.kernel.org/r/20240204-keee-u32-cleanup-v1-0-fb6e08329d9a@lunn.ch
====================
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
All MAC drivers have been converted to use the link mode members of
keee. So remove the _u32 values, and the code in the ethtool core to
convert the legacy _u32 values to link modes.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Energy Efficient Ethernet should always be negotiated with the link
peer. Don't include SUPPORTED_Autoneg in the results of get_eee() for
supported, advertised or lp_advertised, since it is
assumed. Additionally, ethtool(1) ignores the set bit, and no other
driver sets this.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the tables to make use of ETHTOOL link mode bits, rather than
the old u32 SUPPORTED speeds. Make use of the linkmode helps to set
bits and compare linkmodes. As a result, the _u32 members of keee are
no longer used, a step towards removing them.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the existing linkmode helpers for bit manipulation of EEE
advertise, support and link partner support. The aim is to drop the
restricted _u32 variants in the near future.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the existing linkmode helpers for converting PHY EEE
register values into links modes, now that ethtool_keee uses link
modes, rather than u32 values.
Rework determining if EEE is active to make is similar as to how
phylib decides, and make use of a phylib helper to validate if EEE is
valid in for the current link mode. This then requires that PHYLIB is
selected.
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input parameter 'level' in rawv6_get/seticmpfilter is not used.
Therefore, remove it.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fast path usage breakdown describes the detail for 'inet_sock', fix
the markup title.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when suspending driver and stopping workqueue it is checked whether
workqueue is not NULL and if so, it is destroyed.
Function destroy_workqueue() does drain queue and does clear variable, but
it does not set workqueue variable to NULL. This can cause kernel/module
panic if code attempts to clear workqueue that was not initialized.
This scenario is possible when resuming suspended driver in stmmac_resume(),
because there is no handling for failed stmmac_hw_setup(),
which can fail and return if DMA engine has failed to initialize,
and workqueue is initialized after DMA engine.
Should DMA engine fail to initialize, resume will proceed normally,
but interface won't work and TX queue will eventually timeout,
causing 'Reset adapter' error.
This then does destroy workqueue during reset process.
And since workqueue is initialized after DMA engine and can be skipped,
it will cause kernel/module panic.
To secure against this possible crash, set workqueue variable to NULL when
destroying workqueue.
Log/backtrace from crash goes as follows:
[88.031977]------------[ cut here ]------------
[88.031985]NETDEV WATCHDOG: eth0 (sxgmac): transmit queue 1 timed out
[88.032017]WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x390/0x398
<Skipping backtrace for watchdog timeout>
[88.032251]---[ end trace e70de432e4d5c2c0 ]---
[88.032282]sxgmac 16d88000.ethernet eth0: Reset adapter.
[88.036359]------------[ cut here ]------------
[88.036519]Call trace:
[88.036523] flush_workqueue+0x3e4/0x430
[88.036528] drain_workqueue+0xc4/0x160
[88.036533] destroy_workqueue+0x40/0x270
[88.036537] stmmac_fpe_stop_wq+0x4c/0x70
[88.036541] stmmac_release+0x278/0x280
[88.036546] __dev_close_many+0xcc/0x158
[88.036551] dev_close_many+0xbc/0x190
[88.036555] dev_close.part.0+0x70/0xc0
[88.036560] dev_close+0x24/0x30
[88.036564] stmmac_service_task+0x110/0x140
[88.036569] process_one_work+0x1d8/0x4a0
[88.036573] worker_thread+0x54/0x408
[88.036578] kthread+0x164/0x170
[88.036583] ret_from_fork+0x10/0x20
[88.036588]---[ end trace e70de432e4d5c2c1 ]---
[88.036597]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
Fixes: 5a5586112b ("net: stmmac: support FPE link partner hand-shaking procedure")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct type in the hsr_forward_do() comment.
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updating link clock rate for different speeds is only needed when
using RGMII, as that mode requires changing clock speed when the link
speed changes. Let's restrict updating the link clock speed in
ethqos_update_link_clk() to just RGMII. Other modes such as SGMII
only need to enable the link clock (which is already done in probe).
Signed-off-by: Sarosh Hasan <quic_sarohasa@quicinc.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8775p-ride
Reviewed-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Iurman says:
====================
ioam6: netlink multicast event
v5:
- remove the "must be the destination" check before sending an ioam6
event
v4:
- rebase on top of net merge
v3:
- patchset was mistakenly superseded due to same cover title used for
iproute2-next equivalent patch -> resend (renamed)
v2:
- fix warnings
Add generic netlink multicast event support to ioam6 as another solution
to share IOAM data with user space. The other one being via IPv6 raw
sockets combined with ancillary data (or packet socket, if the listener
does not need the processing of the IOAM Option-Type, since the hook is
before in that case). This patchset focuses on the IOAM Pre-allocated
Trace (the only Option-Type currently supported in the kernel), and so
on IOAM "trace" events. See an example of a consumer here [1].
[1] https://github.com/Advanced-Observability/ioam-agent-python/blob/netlink_event/ioam-agent.py
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If we're processing an IOAM Pre-allocated Trace Option-Type (the only
one supported currently), then send the trace as an ioam6 event to the
netlink multicast group. This way, user space apps will be able to
collect IOAM data.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a multicast group to the ioam6 generic netlink family and provide
ioam6_event() to send an ioam6 event to the multicast group.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new api to support ioam6 events for generic netlink multicast. A
first "trace" event is added to the list of ioam6 events, which will
represent an IOAM Pre-allocated Trace Option-Type. It provides another
solution to share IOAM data with user space.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
dtschema defines label as string, so $ref in other bindings is
redundant.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a previous patch I added "select PHYLIB" at the wrong place for the
ADIN1110 driver symbol, so move it to its correct place under the
ADIN1110 kconfig symbol.
Fixes: a9f80df4f5 ("net: ethernet: adi: requires PHYLIB support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Closes: https://lore.kernel.org/lkml/77012b38-4b49-47f4-9a88-d773d52909ad@infradead.org/T/#m8ba397484738711edc0ad607b2c63ca02244e3c3
Cc: Lennart Franzen <lennart@lfdomain.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Cc: Nuno Sa <nuno.sa@analog.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>