mirror of
https://github.com/torvalds/linux.git
synced 2024-12-12 06:02:38 +00:00
6d1970e1ef
1746 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Kui-Feng Lee
|
aef56f2e91 |
bpf: Update the struct_ops of a bpf_link.
By improving the BPF_LINK_UPDATE command of bpf(), it should allow you to conveniently switch between different struct_ops on a single bpf_link. This would enable smoother transitions from one struct_ops to another. The struct_ops maps passing along with BPF_LINK_UPDATE should have the BPF_F_LINK flag. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230323032405.3735486-6-kuifeng@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Kui-Feng Lee
|
68b04864ca |
bpf: Create links for BPF struct_ops maps.
Make bpf_link support struct_ops. Previously, struct_ops were always used alone without any associated links. Upon updating its value, a struct_ops would be activated automatically. Yet other BPF program types required to make a bpf_link with their instances before they could become active. Now, however, you can create an inactive struct_ops, and create a link to activate it later. With bpf_links, struct_ops has a behavior similar to other BPF program types. You can pin/unpin them from their links and the struct_ops will be deactivated when its link is removed while previously need someone to delete the value for it to be deactivated. bpf_links are responsible for registering their associated struct_ops. You can only use a struct_ops that has the BPF_F_LINK flag set to create a bpf_link, while a structs without this flag behaves in the same manner as before and is registered upon updating its value. The BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it used to craft the links for BPF struct_ops programs, but also to create links for BPF struct_ops them-self. Since the links of BPF struct_ops programs are only used to create trampolines internally, they are never seen in other contexts. Thus, they can be reused for struct_ops themself. To maintain a reference to the map supporting this link, we add bpf_struct_ops_link as an additional type. The pointer of the map is RCU and won't be necessary until later in the patchset. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Link: https://lore.kernel.org/r/20230323032405.3735486-4-kuifeng@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Christian Brauner
|
43b4506326
|
open: return EINVAL for O_DIRECTORY | O_CREAT
After a couple of years and multiple LTS releases we received a report that the behavior of O_DIRECTORY | O_CREAT changed starting with v5.7. On kernels prior to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL had the following semantics: (1) open("/tmp/d", O_DIRECTORY | O_CREAT) * d doesn't exist: create regular file * d exists and is a regular file: ENOTDIR * d exists and is a directory: EISDIR (2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL) * d doesn't exist: create regular file * d exists and is a regular file: EEXIST * d exists and is a directory: EEXIST (3) open("/tmp/d", O_DIRECTORY | O_EXCL) * d doesn't exist: ENOENT * d exists and is a regular file: ENOTDIR * d exists and is a directory: open directory On kernels since to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL have the following semantics: (1) open("/tmp/d", O_DIRECTORY | O_CREAT) * d doesn't exist: ENOTDIR (create regular file) * d exists and is a regular file: ENOTDIR * d exists and is a directory: EISDIR (2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL) * d doesn't exist: ENOTDIR (create regular file) * d exists and is a regular file: EEXIST * d exists and is a directory: EEXIST (3) open("/tmp/d", O_DIRECTORY | O_EXCL) * d doesn't exist: ENOENT * d exists and is a regular file: ENOTDIR * d exists and is a directory: open directory This is a fairly substantial semantic change that userspace didn't notice until Pedro took the time to deliberately figure out corner cases. Since no one noticed this breakage we can somewhat safely assume that O_DIRECTORY | O_CREAT combinations are likely unused. The v5.7 breakage is especially weird because while ENOTDIR is returned indicating failure a regular file is actually created. This doesn't make a lot of sense. Time was spent finding potential users of this combination. Searching on codesearch.debian.net showed that codebases often express semantical expectations about O_DIRECTORY | O_CREAT which are completely contrary to what our code has done and currently does. The expectation often is that this particular combination would create and open a directory. This suggests users who tried to use that combination would stumble upon the counterintuitive behavior no matter if pre-v5.7 or post v5.7 and quickly realize neither semantics give them what they want. For some examples see the code examples in [1] to [3] and the discussion in [4]. There are various ways to address this issue. The lazy/simple option would be to restore the pre-v5.7 behavior and to just live with that bug forever. But since there's a real chance that the O_DIRECTORY | O_CREAT quirk isn't relied upon we should try to get away with murder(ing bad semantics) first. If we need to Frankenstein pre-v5.7 behavior later so be it. So let's simply return EINVAL categorically for O_DIRECTORY | O_CREAT combinations. In addition to cleaning up the old bug this also opens up the possiblity to make that flag combination do something more intuitive in the future. Starting with this commit the following semantics apply: (1) open("/tmp/d", O_DIRECTORY | O_CREAT) * d doesn't exist: EINVAL * d exists and is a regular file: EINVAL * d exists and is a directory: EINVAL (2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL) * d doesn't exist: EINVAL * d exists and is a regular file: EINVAL * d exists and is a directory: EINVAL (3) open("/tmp/d", O_DIRECTORY | O_EXCL) * d doesn't exist: ENOENT * d exists and is a regular file: ENOTDIR * d exists and is a directory: open directory One additional note, O_TMPFILE is implemented as: #define __O_TMPFILE 020000000 #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY) #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT) For older kernels it was important to return an explicit error when O_TMPFILE wasn't supported. So O_TMPFILE requires that O_DIRECTORY is raised alongside __O_TMPFILE. It also enforced that O_CREAT wasn't specified. Since O_DIRECTORY | O_CREAT could be used to create a regular allowing that combination together with __O_TMPFILE would've meant that false positives were possible, i.e., that a regular file was created instead of a O_TMPFILE. This could've been used to trick userspace into thinking it operated on a O_TMPFILE when it wasn't. Now that we block O_DIRECTORY | O_CREAT completely the check for O_CREAT in the __O_TMPFILE branch via if ((flags & O_TMPFILE_MASK) != O_TMPFILE) can be dropped. Instead we can simply check verify that O_DIRECTORY is raised via if (!(flags & O_DIRECTORY)) and explain this in two comments. As Aleksa pointed out O_PATH is unaffected by this change since it always returned EINVAL if O_CREAT was specified - with or without O_DIRECTORY. Link: https://lore.kernel.org/lkml/20230320071442.172228-1-pedro.falcato@gmail.com Link: https://sources.debian.org/src/flatpak/1.14.4-1/subprojects/libglnx/glnx-dirfd.c/?hl=324#L324 [1] Link: https://sources.debian.org/src/flatpak-builder/1.2.3-1/subprojects/libglnx/glnx-shutil.c/?hl=251#L251 [2] Link: https://sources.debian.org/src/ostree/2022.7-2/libglnx/glnx-dirfd.c/?hl=324#L324 [3] Link: https://www.openwall.com/lists/oss-security/2014/11/26/14 [4] Reported-by: Pedro Falcato <pedro.falcato@gmail.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
Feiyang Chen
|
73f12c6da7 |
tools/nolibc: Add support for LoongArch
Add support for LoongArch (32 and 64 bit) to nolibc. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Feiyang Chen
|
b551cb7dc3 |
tools/nolibc: Add statx() and make stat() rely on statx() if necessary
LoongArch and RISC-V 32-bit only have statx(). ARC, Hexagon, Nios2 and OpenRISC have statx() and stat64() but not stat() or newstat(). Add statx() and make stat() rely on statx() if necessary to make them happy. We may just use statx() for all architectures in the future. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Feiyang Chen
|
a438e528b6 |
tools/nolibc: Include linux/fcntl.h and remove duplicate code
Include linux/fcntl.h for O_* and AT_*. asm/fcntl.h is included by linux/fcntl.h, so it can be safely removed. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
1c3a4c10cc |
tools/nolibc: check for S_I* macros before defining them
Defining S_I* flags in types.h can cause some build failures if linux/stat.h is included prior to it. But if not defined, some toolchains that include some glibc parts will in turn fail because linux/stat.h already takes care of avoiding these definitions when glibc is present. Let's preserve the macros here but first include linux/stat.h and check for their definition before doing so. We also define the previously missing permission macros so that we don't get a different behavior depending on the first include found. Cc: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
919d0532d4 |
tools/nolibc: add getuid() and geteuid()
This can be useful to avoid attempting some privileged operations, starting from the nolibc-test tool that gets two failures when not privileged. We call getuid32() and geteuid32() when they are defined, and fall back to getuid() and geteuid() otherwise. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Vincent Dagonneau
|
3e9fd4e9a1 |
tools/nolibc: add integer types and integer limit macros
This commit adds some of the missing integer types to stdint.h and adds limit macros (e.g. INTN_{MIN,MAX}). The reference used for adding these types is https://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdint.h.html. We rely on the compiler-defined __LONG_MAX__ to get the right limits for size_t, intptr_t, uintptr_t and ptrdiff_t. This compiler constant seem to have been defined at least since GCC 4.1.2 and clang 3.0.0 on x86_64. It is also defined on ARM (32&64), mips and RISC-V. Note that the maximum size of size_t is implementation-defined (>65535), in this case I chose to go with unsigned long on all platforms since unsigned long == unsigned int on all the platforms we care about. Note that the kernel uses either unsigned int or unsigned long in linux/include/uapi/asm-generic/posix_types.h. These should be equivalent for the plaforms we are targeting. Also note that the 'fast*' flavor of the types have been chosen to be always 1 byte for '*fast8*' and always long (a.k.a. intptr_t/uintptr_t) for the other variants. I have never seen the 'fast*' types in use in the wild but that seems to be what glibc does. Signed-off-by: Vincent Dagonneau <v@vda.io> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Vincent Dagonneau
|
c34da317e0 |
tools/nolibc: add stdint.h
Nolibc works fine for small and limited program however most program expect integer types to be defined in stdint.h rather than std.h. This is a quick fix that moves the existing integer definitions in std.h to stdint.h. Signed-off-by: Vincent Dagonneau <v@vda.io> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Paul E. McKenney
|
d548e9ae07 |
tools/nolibc: Add gitignore to avoid git complaints about sysroot
Testing of nolibc can produce a tools/include/nolibc/sysroot file, which is not known to git. Because it is automatically generated, there is no reason for it to be known to git. Therefore, add a .gitignore to remove it from git's field of view. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org> Cc: Sven Schnelle <svens@linux.ibm.com> |
||
Jakub Kicinski
|
1118aa4c70 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/wireless/nl80211.c |
||
Linus Torvalds
|
478a351ce0 |
Including fixes from netfilter, wifi and ipsec.
Current release - regressions: - phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol() - virtio: vsock: don't use skbuff state to account credit - virtio: vsock: don't drop skbuff on copy failure - virtio_net: fix page_to_skb() miscalculating the memory size Current release - new code bugs: - eth: correct xdp_features after device reconfig - wifi: nl80211: fix the puncturing bitmap policy - net/mlx5e: flower: - fix raw counter initialization - fix missing error code - fix cloned flow attribute - ipa: - fix some register validity checks - fix a surprising number of bad offsets - kill FILT_ROUT_CACHE_CFG IPA register Previous releases - regressions: - tcp: fix bind() conflict check for dual-stack wildcard address - veth: fix use after free in XDP_REDIRECT when skb headroom is small - ipv4: fix incorrect table ID in IOCTL path - ipvlan: make skb->skb_iif track skb->dev for l3s mode - mptcp: - fix possible deadlock in subflow_error_report - fix UaFs when destroying unaccepted and listening sockets - dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290 Previous releases - always broken: - tcp: tcp_make_synack() can be called from process context, don't assume preemption is disabled when updating stats - netfilter: correct length for loading protocol registers - virtio_net: add checking sq is full inside xdp xmit - bonding: restore IFF_MASTER/SLAVE flags on bond enslave Ethertype change - phy: nxp-c45-tja11xx: fix MII_BASIC_CONFIG_REV bit number - eth: i40e: fix crash during reboot when adapter is in recovery mode - eth: ice: avoid deadlock on rtnl lock when auxiliary device plug/unplug meets bonding - dsa: mt7530: - remove now incorrect comment regarding port 5 - set PLL frequency and trgmii only when trgmii is used - eth: mtk_eth_soc: reset PCS state when changing interface types Misc: - ynl: another license adjustment - move the TCA_EXT_WARN_MSG attribute for tc action Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmQUzTgACgkQMUZtbf5S IrvulQ/9GA5GaT52r5T9HaV5slygkHw9ValpfJAddI0MbBjeYfDhkSoTUujIr92W VMj+VpRcqS67pqzD2Z77s2EwB445NCOralB9ji8623tkCDevZU3gUKmjtiO5G7fP 4iAUbibfXjQiDKIeCdcVZ+SXYYdBSQDfFvQskU6/nzKuqjEbhC+GbiMWz7rt2SKe q9gHFSK1du2SGa6fIfJTEosa+MX4UTwAhLOReS5vSFhrlOsUCeMGTCzBfDuacQqn Iq1MJqW2yLceUar164xkYAAwRdL/ZLVkWaMza7KjM8Qi04MiopuFB2+moFDowrM9 D9lX6HMX9NUrHTFGjyZVk845PFxPW+Rnhu1/OKINdugOmcCHApYrtkxB6/Z+piS5 sW3kfkTPsQydA6Dx/RINJE39z6EYabwIQCc68D1HlPuTpOjYWTQdn0CvwxCmOFCr saTkd1wOeiwy8BheBSeX1QCkx4MwO6Dg+ObX/eKsYXGGWPMZcbMdbmmvFu7dZHhO cH4AGypRMrDa2IoYGqIs5sgkjxAMZZSkeQ1E+EpPw3n4us/QjQYrey5uto8uvErm zz7hI1qAwM8dooxsKdPyaARzM//Bq/gmYbqD0Ahts2t6BMX6eX2weneuQ4VJEf94 8RTtIu9BbBH0ysgBkgqMwCeM4YVtG+/e7p390z4tqPrwOi7bZ5A= =5/YI -----END PGP SIGNATURE----- Merge tag 'net-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wifi and ipsec. A little more changes than usual, but it's pretty normal for us that the rc3/rc4 PRs are oversized as people start testing in earnest. Possibly an extra boost from people deploying the 6.1 LTS but that's more of an unscientific hunch. Current release - regressions: - phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol() - virtio: vsock: don't use skbuff state to account credit - virtio: vsock: don't drop skbuff on copy failure - virtio_net: fix page_to_skb() miscalculating the memory size Current release - new code bugs: - eth: correct xdp_features after device reconfig - wifi: nl80211: fix the puncturing bitmap policy - net/mlx5e: flower: - fix raw counter initialization - fix missing error code - fix cloned flow attribute - ipa: - fix some register validity checks - fix a surprising number of bad offsets - kill FILT_ROUT_CACHE_CFG IPA register Previous releases - regressions: - tcp: fix bind() conflict check for dual-stack wildcard address - veth: fix use after free in XDP_REDIRECT when skb headroom is small - ipv4: fix incorrect table ID in IOCTL path - ipvlan: make skb->skb_iif track skb->dev for l3s mode - mptcp: - fix possible deadlock in subflow_error_report - fix UaFs when destroying unaccepted and listening sockets - dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290 Previous releases - always broken: - tcp: tcp_make_synack() can be called from process context, don't assume preemption is disabled when updating stats - netfilter: correct length for loading protocol registers - virtio_net: add checking sq is full inside xdp xmit - bonding: restore IFF_MASTER/SLAVE flags on bond enslave Ethertype change - phy: nxp-c45-tja11xx: fix MII_BASIC_CONFIG_REV bit number - eth: i40e: fix crash during reboot when adapter is in recovery mode - eth: ice: avoid deadlock on rtnl lock when auxiliary device plug/unplug meets bonding - dsa: mt7530: - remove now incorrect comment regarding port 5 - set PLL frequency and trgmii only when trgmii is used - eth: mtk_eth_soc: reset PCS state when changing interface types Misc: - ynl: another license adjustment - move the TCA_EXT_WARN_MSG attribute for tc action" * tag 'net-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits) selftests: bonding: add tests for ether type changes bonding: restore bond's IFF_SLAVE flag if a non-eth dev enslave fails bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change net: renesas: rswitch: Fix GWTSDIE register handling net: renesas: rswitch: Fix the output value of quote from rswitch_rx() ethernet: sun: add check for the mdesc_grab() net: ipa: fix some register validity checks net: ipa: kill FILT_ROUT_CACHE_CFG IPA register net: ipa: add two missing declarations net: ipa: reg: include <linux/bug.h> net: xdp: don't call notifiers during driver init net/sched: act_api: add specific EXT_WARN_MSG for tc action Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy" net: dsa: microchip: fix RGMII delay configuration on KSZ8765/KSZ8794/KSZ8795 ynl: make the tooling check the license ynl: broaden the license even more tools: ynl: make definitions optional again hsr: ratelimit only when errors are printed qed/qed_mng_tlv: correctly zero out ->min instead of ->hour selftests: net: devlink_port_split.py: skip test if no suitable device available ... |
||
Jakub Kicinski
|
4e16b6a748 |
ynl: broaden the license even more
I relicensed Netlink spec code to GPL-2.0 OR BSD-3-Clause but
we still put a slightly different license on the uAPI header
than the rest of the code. Use the Linux-syscall-note on all
the specs and all generated code. It's moot for kernel code,
but should not hurt. This way the licenses match everywhere.
Cc: Chuck Lever <chuck.lever@oracle.com>
Fixes:
|
||
Thomas Huth
|
c5edd753a0 |
KVM: x86: Remove the KVM_GET_NR_MMU_PAGES ioctl
The KVM_GET_NR_MMU_PAGES ioctl is quite questionable on 64-bit hosts since it fails to return the full 64 bits of the value that can be set with the corresponding KVM_SET_NR_MMU_PAGES call. Its "long" return value is truncated into an "int" in the kvm_arch_vm_ioctl() function. Since this ioctl also never has been used by userspace applications (QEMU, Google's internal VMM, kvmtool and CrosVM have been checked), it's likely the best if we remove this badly designed ioctl before anybody really tries to use it. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230208140105.655814-4-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Ross Zwisler
|
27d7fdf06f |
bpf: use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing Many comments and samples in the bpf code still refer to this older debugfs path, so let's update them to avoid confusion. There are a few spots where the bpf code explicitly checks both tracefs and debugfs (tools/bpf/bpftool/tracelog.c and tools/lib/api/fs/fs.c) and I've left those alone so that the tools can continue to work with both paths. Signed-off-by: Ross Zwisler <zwisler@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230313205628.1058720-2-zwisler@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Lorenzo Bianconi
|
f85949f982 |
xdp: add xdp_set_features_flag utility routine
Introduce xdp_set_features_flag utility routine in order to update dynamically xdp_features according to the dynamic hw configuration via ethtool (e.g. changing number of hw rx/tx queues). Add xdp_clear_features_flag() in order to clear all xdp_feature flag. Reviewed-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Palmer Dabbelt
|
d3c7ec7588 |
Move bp_type_idx to include/linux/hw_breakpoint.h
This has a "#ifdef CONFIG_*" that used to be exposed to userspace. The names in here are so generic that I don't think it's a good idea to expose them to userspace (or even the rest of the kernel). There are multiple in-kernel users, so it's been moved to a kernel header file. Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Reviewed-by: Andrew Waterman <waterman@eecs.berkeley.edu> Reviewed-by: Albert Ou <aou@eecs.berkeley.edu> Message-Id: <1447119071-19392-10-git-send-email-palmer@dabbelt.com> [thuth: Remove it also from tools/include/uapi/linux/hw_breakpoint.h] Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> |
||
Linus Torvalds
|
49be4fb281 |
perf tools fixes for v6.3:
- Add Adrian Hunter to MAINTAINERS as a perf tools reviewer. - Sync various tools/ copies of kernel headers with the kernel sources, this time trying to avoid first merging with upstream to then update but instead copy from upstream so that a merge is avoided and the end result after merging this pull request is the one expected, tools/perf/check-headers.sh (mostly) happy, less warnings while building tools/perf/. - Fix counting when initial delay configured by setting perf_attr.enable_on_exec when starting workloads from the perf command line. - Don't avoid emitting a PERF_RECORD_MMAP2 in 'perf inject --buildid-all' when that record comes with a build-id, otherwise we end up not being able to resolve symbols. - Don't use comma as the CSV output separator the "stat+csv_output" test, as comma can appear on some tests as a modifier for an event, use @ instead, ditto for the JSON linter test. - The offcpu test was looking for some bits being set on task_struct->prev_state without masking other bits not important for this specific 'perf test', fix it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZApKjQAKCRCyPKLppCJ+ JzdfAQDRnwDCxhb4cvx7lVR32L1XMIFW6qLWRBJWoxC2SJi6lgD/SoQgKswkxrJv XnBP7jEaIsh3M3ak82MxLKbjSAEvnwk= =jup7 -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.3-1-2023-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Add Adrian Hunter to MAINTAINERS as a perf tools reviewer - Sync various tools/ copies of kernel headers with the kernel sources, this time trying to avoid first merging with upstream to then update but instead copy from upstream so that a merge is avoided and the end result after merging this pull request is the one expected, tools/perf/check-headers.sh (mostly) happy, less warnings while building tools/perf/ - Fix counting when initial delay configured by setting perf_attr.enable_on_exec when starting workloads from the perf command line - Don't avoid emitting a PERF_RECORD_MMAP2 in 'perf inject --buildid-all' when that record comes with a build-id, otherwise we end up not being able to resolve symbols - Don't use comma as the CSV output separator the "stat+csv_output" test, as comma can appear on some tests as a modifier for an event, use @ instead, ditto for the JSON linter test - The offcpu test was looking for some bits being set on task_struct->prev_state without masking other bits not important for this specific 'perf test', fix it * tag 'perf-tools-fixes-for-v6.3-1-2023-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf tools: Add Adrian Hunter to MAINTAINERS as a reviewer tools headers UAPI: Sync linux/perf_event.h with the kernel sources tools headers x86 cpufeatures: Sync with the kernel sources tools include UAPI: Sync linux/vhost.h with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources tools include UAPI: Synchronize linux/fcntl.h with the kernel sources tools headers: Synchronize {linux,vdso}/bits.h with the kernel sources tools headers UAPI: Sync linux/prctl.h with the kernel sources tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench' perf stat: Fix counting when initial delay configured tools headers svm: Sync svm headers with the kernel sources perf test: Avoid counting commas in json linter perf tests stat+csv_output: Switch CSV separator to @ perf inject: Fix --buildid-all not to eat up MMAP2 tools arch x86: Sync the msr-index.h copy with the kernel sources perf test: Fix offcpu test prev_state check |
||
Jakub Kicinski
|
d0ddf5065f |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Documentation/bpf/bpf_devel_QA.rst |
||
Michael Weiß
|
5a70f4a630 |
bpf: Fix a typo for BPF_F_ANY_ALIGNMENT in bpf.h
Fix s/BPF_PROF_LOAD/BPF_PROG_LOAD/ typo in the documentation comment for BPF_F_ANY_ALIGNMENT in bpf.h. Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230309133823.944097-1-michael.weiss@aisec.fraunhofer.de |
||
Linus Torvalds
|
44889ba56c |
Networking fixes for 6.3-rc2, including fixes from netfilter, bpf
Current release - regressions: - core: avoid skb end_offset change in __skb_unclone_keeptruesize() - sched: - act_connmark: handle errno on tcf_idr_check_alloc - flower: fix fl_change() error recovery path - ieee802154: prevent user from crashing the host Current release - new code bugs: - eth: bnxt_en: fix the double free during device removal - tools: ynl: - fix enum-as-flags in the generic CLI - fully inherit attrs in subsets - re-license uniformly under GPL-2.0 or BSD-3-clause Previous releases - regressions: - core: use indirect calls helpers for sk_exit_memory_pressure() - tls: - fix return value for async crypto - avoid hanging tasks on the tx_lock - eth: ice: copy last block omitted in ice_get_module_eeprom() Previous releases - always broken: - core: avoid double iput when sock_alloc_file fails - af_unix: fix struct pid leaks in OOB support - tls: - fix possible race condition - fix device-offloaded sendpage straddling records - bpf: - sockmap: fix an infinite loop error - test_run: fix &xdp_frame misplacement for LIVE_FRAMES - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR - netfilter: tproxy: fix deadlock due to missing BH disable - phylib: get rid of unnecessary locking - eth: bgmac: fix *initial* chip reset to support BCM5358 - eth: nfp: fix csum for ipsec offload - eth: mtk_eth_soc: fix RX data corruption issue Misc: - usb: qmi_wwan: add telit 0x1080 composition Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmQJzQISHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOky5YP/04Dbsbeqpk0Q94axmjoaS0J/4rW49js RaA7v8ci7sL1omW8k5tILPXniAouN4YHNOCW1KbLBMR5O7lyn9qM1RteHOIpOmte TLzAw+6Wl7CyGiiqirv2GU96Wd/jZoZpPXFZz/gXP59GnkChSHzQcpexmz0nrmxI eCRSs+qm+re3wmDKTYm5C+g+420PNXu9JItPnTNf+nTkTBxpmOEMyry03I0taXKS wceQHB2q5E0sSWXDfkxG/pmUuYTj3AdRSQ+vo+FLevSs/LWeThs2I6pT5sn8XS76 1S8Lh6FytfBhyalFmRtrpqIJYyGae5MwEXQ29ddfmF4OFFLedx3IH0+JFQxTE9So i4gaXmM5SUI7c5vhib097xUISoLxKqqXQVQQSQ1MPZRfXtVubbA2gCv+vh6fXVoj zQYatZOLM7KT9q4Pw8A+9bJPof/FV+ObC67pbGQbJJgBoy+oOixDuP+x5DYT384L /5XS+23OZiFe7bvQoE/0SQMeRk3lF2XkS5l9gSbdSnGQPiaOqKhDgkoCmdkn1jvg qtkBS6+tRRoOBNsjC4r4eFXBVOQ1+myyjZetBnEOaSp22FaTJFQh9qX3AMFIHbUy m0jDi9OJZSWHICd6KNWPm3JK43cMjiyZbGftYqOHhuY5HN30vQN6sl7DXIJ0rIcE myHMfizwqmGT =hSXM -----END PGP SIGNATURE----- Merge tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - core: avoid skb end_offset change in __skb_unclone_keeptruesize() - sched: - act_connmark: handle errno on tcf_idr_check_alloc - flower: fix fl_change() error recovery path - ieee802154: prevent user from crashing the host Current release - new code bugs: - eth: bnxt_en: fix the double free during device removal - tools: ynl: - fix enum-as-flags in the generic CLI - fully inherit attrs in subsets - re-license uniformly under GPL-2.0 or BSD-3-clause Previous releases - regressions: - core: use indirect calls helpers for sk_exit_memory_pressure() - tls: - fix return value for async crypto - avoid hanging tasks on the tx_lock - eth: ice: copy last block omitted in ice_get_module_eeprom() Previous releases - always broken: - core: avoid double iput when sock_alloc_file fails - af_unix: fix struct pid leaks in OOB support - tls: - fix possible race condition - fix device-offloaded sendpage straddling records - bpf: - sockmap: fix an infinite loop error - test_run: fix &xdp_frame misplacement for LIVE_FRAMES - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR - netfilter: tproxy: fix deadlock due to missing BH disable - phylib: get rid of unnecessary locking - eth: bgmac: fix *initial* chip reset to support BCM5358 - eth: nfp: fix csum for ipsec offload - eth: mtk_eth_soc: fix RX data corruption issue Misc: - usb: qmi_wwan: add telit 0x1080 composition" * tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) tools: ynl: fix enum-as-flags in the generic CLI tools: ynl: move the enum classes to shared code net: avoid double iput when sock_alloc_file fails af_unix: fix struct pid leaks in OOB support eth: fealnx: bring back this old driver net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC net: microchip: sparx5: fix deletion of existing DSCP mappings octeontx2-af: Unlock contexts in the queue context cache in case of fault detection net/smc: fix fallback failed while sendmsg with fastopen ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause mailmap: update entries for Stephen Hemminger mailmap: add entry for Maxim Mikityanskiy nfc: change order inside nfc_se_io error path ethernet: ice: avoid gcc-9 integer overflow warning ice: don't ignore return codes in VSI related code ice: Fix DSCP PFC TLV creation net: usb: qmi_wwan: add Telit 0x1080 composition net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990 netfilter: conntrack: adopt safer max chain length net: tls: fix device-offloaded sendpage straddling records ... |
||
Andrii Nakryiko
|
6018e1f407 |
bpf: implement numbers iterator
Implement the first open-coded iterator type over a range of integers. It's public API consists of: - bpf_iter_num_new() constructor, which accepts [start, end) range (that is, start is inclusive, end is exclusive). - bpf_iter_num_next() which will keep returning read-only pointer to int until the range is exhausted, at which point NULL will be returned. If bpf_iter_num_next() is kept calling after this, NULL will be persistently returned. - bpf_iter_num_destroy() destructor, which needs to be called at some point to clean up iterator state. BPF verifier enforces that iterator destructor is called at some point before BPF program exits. Note that `start = end = X` is a valid combination to setup an empty iterator. bpf_iter_num_new() will return 0 (success) for any such combination. If bpf_iter_num_new() detects invalid combination of input arguments, it returns error, resets iterator state to, effectively, empty iterator, so any subsequent call to bpf_iter_num_next() will keep returning NULL. BPF verifier has no knowledge that returned integers are in the [start, end) value range, as both `start` and `end` are not statically known and enforced: they are runtime values. While the implementation is pretty trivial, some care needs to be taken to avoid overflows and underflows. Subsequent selftests will validate correctness of [start, end) semantics, especially around extremes (INT_MIN and INT_MAX). Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS can be specified. bpf_iter_num_{new,next,destroy}() is a logical evolution from bounded BPF loops and bpf_loop() helper and is the basis for implementing ergonomic BPF loops with no statically known or verified bounds. Subsequent patches implement bpf_for() macro, demonstrating how this can be wrapped into something that works and feels like a normal for() loop in C language. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Jakub Kicinski
|
37d9df224d |
ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
I was intending to make all the Netlink Spec code BSD-3-Clause to ease the adoption but it appears that: - I fumbled the uAPI and used "GPL WITH uAPI note" there - it gives people pause as they expect GPL in the kernel As suggested by Chuck re-license under dual. This gives us benefit of full BSD freedom while fulfilling the broad "kernel is under GPL" expectations. Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/ Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
36e5e391a2 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZsBwAKCRDbK58LschI g3W1AQCQnO6pqqX5Q2aYDAZPlZRtV2TRLjuqrQE0dHW/XLAbBgD/bgsAmiKhPSCG 2mTt6izpTQVlZB0e8KcDIvbYd9CE3Qc= =EjJQ -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-03-06 We've added 85 non-merge commits during the last 13 day(s) which contain a total of 131 files changed, 7102 insertions(+), 1792 deletions(-). The main changes are: 1) Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses, from Joanne Koong. 2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities, from Andrii Nakryiko. 3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc, from Alexei Starovoitov. 4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps, from Kumar Kartikeya Dwivedi. 5) Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them, from Eduard Zingerman. 6) Make uprobe attachment Android APK aware by supporting attachment to functions inside ELF objects contained in APKs via function names, from Daniel Müller. 7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper to start the timer with absolute expiration value instead of relative one, from Tero Kristo. 8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id, from Tejun Heo. 9) Extend libbpf to support users manually attaching kprobes/uprobes in the legacy/perf/link mode, from Menglong Dong. 10) Implement workarounds in the mips BPF JIT for DADDI/R4000, from Jiaxun Yang. 11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT, from Hengqi Chen. 12) Extend BPF instruction set doc with describing the encoding of BPF instructions in terms of how bytes are stored under big/little endian, from Jose E. Marchesi. 13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui. 14) Fix bpf_xdp_query() backwards compatibility on old kernels, from Yonghong Song. 15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS, from Florent Revest. 16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache, from Hou Tao. 17) Fix BPF verifier's check_subprogs to not unnecessarily mark a subprogram with has_tail_call, from Ilya Leoshkevich. 18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode selftests/bpf: Split test_attach_probe into multi subtests libbpf: Add support to set kprobe/uprobe attach mode tools/resolve_btfids: Add /libsubcmd to .gitignore bpf: add support for fixed-size memory pointer returns for kfuncs bpf: generalize dynptr_get_spi to be usable for iters bpf: mark PTR_TO_MEM as non-null register type bpf: move kfunc_call_arg_meta higher in the file bpf: ensure that r0 is marked scratched after any function call bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper bpf: clean up visit_insn()'s instruction processing selftests/bpf: adjust log_fixup's buffer size for proper truncation bpf: honor env->test_state_freq flag in is_state_visited() selftests/bpf: enhance align selftest's expected log matching bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} bpf: improve stack slot state printing selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access() selftests/bpf: test if pointer type is tracked for BPF_ST_MEM bpf: allow ctx writes using BPF_ST_MEM instruction bpf: Use separate RCU callbacks for freeing selem ... ==================== Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Arnaldo Carvalho de Melo
|
06a1574b94 |
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
To pick up the changes in:
|
||
Arnaldo Carvalho de Melo
|
14e998ed42 |
tools include UAPI: Sync linux/vhost.h with the kernel sources
To get the changes in:
|
||
Arnaldo Carvalho de Melo
|
33c53f9b5a |
tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources
To pick up the changes in: |
||
Arnaldo Carvalho de Melo
|
5f800380af |
tools include UAPI: Synchronize linux/fcntl.h with the kernel sources
To pick up the changes in:
|
||
Arnaldo Carvalho de Melo
|
811f35ff59 |
tools headers: Synchronize {linux,vdso}/bits.h with the kernel sources
To pick up the changes in this cset:
|
||
Arnaldo Carvalho de Melo
|
df4b933e0e |
tools headers UAPI: Sync linux/prctl.h with the kernel sources
To pick new prctl options introduced in:
|
||
Tero Kristo
|
f71f853049 |
bpf: Add support for absolute value BPF timers
Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start() to start an absolute value timer instead of the default relative value. This makes the timer expire at an exact point in time, instead of a time with latencies induced by both the BPF and timer subsystems. Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Link: https://lore.kernel.org/r/20230302114614.2985072-2-tero.kristo@linux.intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Linus Torvalds
|
857f1268a5 |
Changes in this cycle were:
- Shrink 'struct instruction', to improve objtool performance & memory footprint. - Other maximum memory usage reductions - this makes the build both faster, and fixes kernel build OOM failures on allyesconfig and similar configs when they try to build the final (large) vmlinux.o. - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying single-byte instruction (PUSH/POP or LEAVE). This requires the extension of the ORC metadata structure with a 'signal' field. - Misc fixes & cleanups. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmQAVp8RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gV6A//YbWb4nNxYbRFBd1O3FnFfy4efrDQ4btI hwkL6f7jka9RnIpIEatJvaLdNvyN5tuPCC/+B5eVnvFdd1JcBUmj5D+zYFt6H6qt BG4M6TNHFkP1kOJVfFGn8UPRfoMz2oMiEqilpsc1Yuf7b3ldMJtGUoHaeZC9pyqe RUisKNw4WHZp2G/gTBUWxW17xpWY3Awgch/w4HCu8wMnR+uEC44i0UCBfnAadl36 ar66PfhMJcQIv0XkK9wu43g7+HFnjpxHOx35JW3lRot0xRnwl/JcsmaX5iPkh0gt HV8eLH80J0homeMZDY7vWIKJxGeLkIdfjO5gxwTdnFc9rQw3GwHp1B7WTS6J3Vwe gM00kyaGly3CvkKMiz5QQBfViWCjE25nYS8X0i9Oz6Gk58IkRPGByaDTKRjNrDJB BwH9DE9xb3dPVZRv/PejkTdggQWo+FDTrL8ulHIjUFK11M7VubwkskecNHkfpAOE TRy5iLjMocF8u7hdyec6Mma2K6qEndC2Rw9ZMPQ7TeieMsBcl63cSRgSJLFfdRhr /5c6Hr2SNQKU8xu+3j49GyBwFvp4CwCa+GPs9/o+l0uCvuKNIn9B788cm4TjxLJ9 C3PRzE6B/CaLhYvlC5k5cNM+I4YpoMU/mvSvY6HcC0Duj2nSAWS2VV60MVMDpqVX 8nK4xnla2tM= =bpPY -----END PGP SIGNATURE----- Merge tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Shrink 'struct instruction', to improve objtool performance & memory footprint - Other maximum memory usage reductions - this makes the build both faster, and fixes kernel build OOM failures on allyesconfig and similar configs when they try to build the final (large) vmlinux.o - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying single-byte instruction (PUSH/POP or LEAVE). This requires the extension of the ORC metadata structure with a 'signal' field - Misc fixes & cleanups * tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) objtool: Fix ORC 'signal' propagation objtool: Remove instruction::list x86: Fix FILL_RETURN_BUFFER objtool: Fix overlapping alternatives objtool: Union instruction::{call_dest,jump_table} objtool: Remove instruction::reloc objtool: Shrink instruction::{type,visited} objtool: Make instruction::alts a single-linked list objtool: Make instruction::stack_ops a single-linked list objtool: Change arch_decode_instruction() signature x86/entry: Fix unwinding from kprobe on PUSH/POP instruction x86/unwind/orc: Add 'signal' field to ORC metadata objtool: Optimize layout of struct special_alt objtool: Optimize layout of struct symbol objtool: Allocate multiple structures with calloc() objtool: Make struct check_options static objtool: Make struct entries[] static and const objtool: Fix HOSTCC flag usage objtool: Properly support make V=1 objtool: Install libsubcmd in build ... |
||
Joanne Koong
|
66e3a13e7c |
bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr. The user must pass in a buffer to store the contents of the data slice if a direct pointer to the data cannot be obtained. For skb and xdp type dynptrs, these two APIs are the only way to obtain a data slice. However, for other types of dynptrs, there is no difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data. For skb type dynptrs, the data is copied into the user provided buffer if any of the data is not in the linear portion of the skb. For xdp type dynptrs, the data is copied into the user provided buffer if the data is between xdp frags. If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then the skb will be uncloned (see bpf_unclone_prologue()). Please note that any bpf_dynptr_write() automatically invalidates any prior data slices of the skb dynptr. This is because the skb may be cloned or may need to pull its paged buffer into the head. As such, any bpf_dynptr_write() will automatically have its prior data slices invalidated, even if the write is to data in the skb head of an uncloned skb. Please note as well that any other helper calls that change the underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data slices of the skb dynptr as well, for the same reasons. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Joanne Koong
|
05421aecd4 |
bpf: Add xdp dynptrs
Add xdp dynptrs, which are dynptrs whose underlying pointer points to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of xdp->data and xdp->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For reads and writes on the dynptr, this includes reading/writing from/to and across fragments. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() should be used. For examples of how xdp dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Joanne Koong
|
b5964b968a |
bpf: Add skb dynptrs
Add skb dynptrs, which are dynptrs whose underlying pointer points to a skb. The dynptr acts on skb data. skb dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of skb->data and skb->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For bpf prog types that don't support writes on skb data, the dynptr is read-only (bpf_dynptr_write() will return an error) For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write() interfaces, reading and writing from/to data in the head as well as from/to non-linear paged buffers is supported. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used. For examples of how skb dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Linus Torvalds
|
5ca26d6039 |
Including fixes from wireless and netfilter.
Current release - regressions: - phy: multiple fixes for EEE rework - wifi: wext: warn about usage only once - wifi: ath11k: allow system suspend to survive ath11k Current release - new code bugs: - mlx5: Fix memory leak in IPsec RoCE creation - ibmvnic: assign XPS map to correct queue index Previous releases - regressions: - netfilter: ip6t_rpfilter: Fix regression with VRF interfaces - netfilter: ctnetlink: make event listener tracking global - nf_tables: allow to fetch set elements when table has an owner - mlx5: - fix skb leak while fifo resync and push - fix possible ptp queue fifo use-after-free Previous releases - always broken: - sched: fix action bind logic - ptp: vclock: use mutex to fix "sleep on atomic" bug if driver also uses a mutex - netfilter: conntrack: fix rmmod double-free race - netfilter: xt_length: use skb len to match in length_mt6, avoid issues with BIG TCP Misc: - ice: remove unnecessary CONFIG_ICE_GNSS - mlx5e: remove hairpin write debugfs files - sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP9JgYACgkQMUZtbf5S IrsIRRAApy4Hjb8z5z3k4HOM2lA3b/3OWD301I5YtoU3FC4L938yETAPFYUGbWrX rKN4YOTNh2Fvkgbni7vz9hbC84C6i86Q9+u7dT1U+kCk3kbyQPFZlEDj5fY0I8zK 1xweCRrC1CcG74S2M5UO3UnWz1ypWQpTnHfWZqq0Duh1j9Xc+MHjHC2IKrGnzM6U 1/ODk9FrtsWC+KGJlWwiV+yJMYUA4nCKIS/NrmdRlBa7eoP0oC1xkA8g0kz3/P3S O+xMyhExcZbMYY5VMkiGBZ5l8Ve3t6lHcMXq7jWlSCOeXd4Ut6zzojHlGZjzlCy9 RQQJzva2wlltqB9rECUQixpZbVS6ubf5++zvACOKONhSIEdpWjZW9K/qsV8igbfM Xx0hsG1jCBt/xssRw2UBsq73vjNf1AkdksvqJgcggAvBJU8cV3MxRRB4/9lyPdmB NNFqehwCeE3aU0FSBKoxZVYpfg+8J/XhwKT63Cc2d1ENetsWk/LxvkYm24aokpW+ nn+jUH9AYk3rFlBVQG1xsCwU4VlGk/yZgRwRMYFBqPkAGcXLZOnqdoSviBPN3yN0 Habs1hxToMt3QBgLJcMVn8CYdWCJgnZpxs8Mfo+PGoWKHzQ9kXBdyYyIZm1GyesD BN/2QN38yMGXRALd2NXS2Va4ygX7KptB7+HsitdkzKCqcp1Ao+I= =Ko4p -----END PGP SIGNATURE----- Merge tag 'net-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and netfilter. The notable fixes here are the EEE fix which restores boot for many embedded platforms (real and QEMU); WiFi warning suppression and the ICE Kconfig cleanup. Current release - regressions: - phy: multiple fixes for EEE rework - wifi: wext: warn about usage only once - wifi: ath11k: allow system suspend to survive ath11k Current release - new code bugs: - mlx5: Fix memory leak in IPsec RoCE creation - ibmvnic: assign XPS map to correct queue index Previous releases - regressions: - netfilter: ip6t_rpfilter: Fix regression with VRF interfaces - netfilter: ctnetlink: make event listener tracking global - nf_tables: allow to fetch set elements when table has an owner - mlx5: - fix skb leak while fifo resync and push - fix possible ptp queue fifo use-after-free Previous releases - always broken: - sched: fix action bind logic - ptp: vclock: use mutex to fix "sleep on atomic" bug if driver also uses a mutex - netfilter: conntrack: fix rmmod double-free race - netfilter: xt_length: use skb len to match in length_mt6, avoid issues with BIG TCP Misc: - ice: remove unnecessary CONFIG_ICE_GNSS - mlx5e: remove hairpin write debugfs files - sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy" * tag 'net-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) tcp: tcp_check_req() can be called from process context net: phy: c45: fix network interface initialization failures on xtensa, arm:cubieboard xen-netback: remove unused variables pending_idx and index net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy net: dsa: ocelot_ext: remove unnecessary phylink.h include net: mscc: ocelot: fix duplicate driver name error net: dsa: felix: fix internal MDIO controller resource length net: dsa: seville: ignore mscc-miim read errors from Lynx PCS net/sched: act_sample: fix action bind logic net/sched: act_mpls: fix action bind logic net/sched: act_pedit: fix action bind logic wifi: wext: warn about usage only once wifi: mt76: usb: fix use-after-free in mt76u_free_rx_queue qede: avoid uninitialized entries in coal_entry array nfc: fix memory leak of se_io context in nfc_genl_se_io ice: remove unnecessary CONFIG_ICE_GNSS net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy() ibmvnic: Assign XPS map to correct queue index docs: net: fix inaccuracies in msg_zerocopy.rst tools: net: add __pycache__ to gitignore ... |
||
Tariq Toukan
|
1862de92c8 |
netdev-genl: fix repeated typo oflloading -> offloading
Fix a repeated copy/paste typo.
Fixes:
|
||
Linus Torvalds
|
f01d4c8a22 |
nolibc updates for v6.3
o Add s390 support. o Add support for the ARM Thumb1 instruction set. o Fix O_* flags definitions for open() and fcntl(). o Make errno a weak symbol instead of a static variable. o Export environ as a weak symbol. o Export _auxv as a weak symbol for auxilliary vector retrieval. o Implement getauxval() and getpagesize(). o Further improve self tests, including permitting userland testing of the nolibc library. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmPh1DITHHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jJTqD/9FPv58m1ZJWP8j8EMF9p6Pd2GuYJ/F t0tSf8Qmv0tTLqtPzZtu5E5b5bTvsgxQkQJUGLtUBf5l0AsyQt5ve5EUlzGgBHAP 8opwLEzCPUMhjq6ZsHJrmLIPwrH1reVYiAV2uIdBxLHLjGF8QLdYgqIGtguRBIHT o9HS9RAyPxvMmV8OZqhp+NLjcEzKGloUBdcnDLURQ8Wy12vSQnALl9w1OKiN40rz dlmXcysn8TboRWZS/DJqr/Xsg5W8ZMIfxrlopgR+FwrqutwH2ZDKgnc5ixm9YxFF CJCM2QZO8d8UtAxllJRH3NApTCHJh6c257w4awEU97hgkHfhw0tHgRs6sOz6ho0g O5OeOTAv0NkNNt5jGHXI4s0iQwVU/Ek6m3N8RC2GGzuMXGDcKvbFzGB4T8m8AhYL MnyaQvuq8SWhE84c+gQgxagZ5cdm8r2hDgnSrlI7P19W5SCsQq7MNSo1WyHQ7uss sMyxomvCC3y4pMgHcJHWwxtjR/BKjN1wtgCHCvTFcE8k98ti/ycKS6X/zQbGie/1 j20AgP0Cli2MVq+vocInvn0Gf4Ce0xxu5kB0NM8RMX+uiYNB0cJR4lIyWxt0680U M2Ya6AnfO8Sn57BptTp+QaqZidx9IJJzrAY4QBsdzXIsyJ2kKTK8BVNIaWMQ96nB twKV/fU0HWWcJQ== =S+cL -----END PGP SIGNATURE----- Merge tag 'nolibc.2023.02.06a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc updates from Paul McKenney: - Add s390 support - Add support for the ARM Thumb1 instruction set - Fix O_* flags definitions for open() and fcntl() - Make errno a weak symbol instead of a static variable - Export environ as a weak symbol - Export _auxv as a weak symbol for auxilliary vector retrieval - Implement getauxval() and getpagesize() - Further improve self tests, including permitting userland testing of the nolibc library * tag 'nolibc.2023.02.06a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (28 commits) selftests/nolibc: Add a "run-user" target to test the program in user land selftests/nolibc: Support "x86_64" for arch name selftests/nolibc: Add `getpagesize(2)` selftest nolibc/sys: Implement `getpagesize(2)` function nolibc/stdlib: Implement `getauxval(3)` function tools/nolibc: add auxiliary vector retrieval for s390 tools/nolibc: add auxiliary vector retrieval for mips tools/nolibc: add auxiliary vector retrieval for riscv tools/nolibc: add auxiliary vector retrieval for arm tools/nolibc: add auxiliary vector retrieval for arm64 tools/nolibc: add auxiliary vector retrieval for x86_64 tools/nolibc: add auxiliary vector retrieval for i386 tools/nolibc: export environ as a weak symbol on s390 tools/nolibc: export environ as a weak symbol on riscv tools/nolibc: export environ as a weak symbol on mips tools/nolibc: export environ as a weak symbol on arm tools/nolibc: export environ as a weak symbol on arm64 tools/nolibc: export environ as a weak symbol on i386 tools/nolibc: export environ as a weak symbol on x86_64 tools/nolibc: make errno a weak symbol instead of a static one ... |
||
Ingo Molnar
|
585a78c1f7 |
Merge branch 'linus' into objtool/core, to pick up Xen dependencies
Pick up dependencies - freshly merged upstream via xen-next - before applying dependent objtool changes. Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Martin KaFai Lau
|
31de4105f0 |
bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
The bpf_fib_lookup() also looks up the neigh table. This was done before bpf_redirect_neigh() was added. In the use case that does not manage the neigh table and requires bpf_fib_lookup() to lookup a fib to decide if it needs to redirect or not, the bpf prog can depend only on using bpf_redirect_neigh() to lookup the neigh. It also keeps the neigh entries fresh and connected. This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid the double neigh lookup when the bpf prog always call bpf_redirect_neigh() to do the neigh lookup. The params->smac output is skipped together when SKIP_NEIGH is set because bpf_redirect_neigh() will figure out the smac also. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev |
||
Tiezhu Yang
|
524581d121 |
selftests/bpf: Fix build error for LoongArch
There exists build error when make -C tools/testing/selftests/bpf/ on LoongArch: BINARY test_verifier In file included from test_verifier.c:27: tools/include/uapi/linux/bpf_perf_event.h:14:28: error: field 'regs' has incomplete type 14 | bpf_user_pt_regs_t regs; | ^~~~ make: *** [Makefile:577: tools/testing/selftests/bpf/test_verifier] Error 1 make: Leaving directory 'tools/testing/selftests/bpf' Add missing uapi header for LoongArch to use the following definition: typedef struct user_pt_regs bpf_user_pt_regs_t; Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/1676458867-22052-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Dave Marchevsky
|
9c395c1b99 |
bpf: Add basic bpf_rb_{root,node} support
This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new types, and adds bpf_rb_root_free function for freeing bpf_rb_root in map_values. structs bpf_rb_root and bpf_rb_node are opaque types meant to obscure structs rb_root_cached rb_node, respectively. btf_struct_access will prevent BPF programs from touching these special fields automatically now that they're recognized. btf_check_and_fixup_fields now groups list_head and rb_root together as "graph root" fields and {list,rb}_node as "graph node", and does same ownership cycle checking as before. Note that this function does _not_ prevent ownership type mixups (e.g. rb_root owning list_node) - that's handled by btf_parse_graph_root. After this patch, a bpf program can have a struct bpf_rb_root in a map_value, but not add anything to nor do anything useful with it. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Josh Poimboeuf
|
ffb1b4a410 |
x86/unwind/orc: Add 'signal' field to ORC metadata
Add a 'signal' field which allows unwind hints to specify whether the instruction pointer should be taken literally (like for most interrupts and exceptions) rather than decremented (like for call stack return addresses) when used to find the next ORC entry. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org |
||
Florian Lehner
|
17c9b4e1a7 |
bpf: fix typo in header for bpf_perf_prog_read_value
Fix a simple typo in the documentation for bpf_perf_prog_read_value. Signed-off-by: Florian Lehner <dev@der-flo.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230203121439.25884-1-dev@der-flo.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Jakub Kicinski
|
d3d854fd6a |
netdev-genl: create a simple family for netdev stuff
Add a Netlink spec-compatible family for netdevs. This is a very simple implementation without much thought going into it. It allows us to reap all the benefits of Netlink specs, one can use the generic client to issue the commands: $ ./cli.py --spec netdev.yaml --dump dev_get [{'ifindex': 1, 'xdp-features': set()}, {'ifindex': 2, 'xdp-features': {'basic', 'ndo-xmit', 'redirect'}}, {'ifindex': 3, 'xdp-features': {'rx-sg'}}] the generic python library does not have flags-by-name support, yet, but we also don't have to carry strings in the messages, as user space can get the names from the spec. Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Co-developed-by: Marek Majtyka <alardam@gmail.com> Signed-off-by: Marek Majtyka <alardam@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/327ad9c9868becbe1e601b580c962549c8cd81f2.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Tiezhu Yang
|
e2bd974298 |
tools/bpf: Use tab instead of white spaces to sync bpf.h
Just silence the following build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h' Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/1675319486-27744-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Jakub Kicinski
|
2d104c390f |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY9RqJgAKCRDbK58LschI gw2IAP9G5uhFO5abBzYLupp6SY3T5j97MUvPwLfFqUEt7EXmuwEA2lCUEWeW0KtR QX+QmzCa6iHxrW7WzP4DUYLue//FJQY= =yYqA -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-28 We've added 124 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 6386 insertions(+), 1827 deletions(-). The main changes are: 1) Implement XDP hints via kfuncs with initial support for RX hash and timestamp metadata kfuncs, from Stanislav Fomichev and Toke Høiland-Jørgensen. Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk 2) Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case, from Andrii Nakryiko. 3) Significantly reduce the search time for module symbols by livepatch and BPF, from Jiri Olsa and Zhen Lei. 4) Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals, from David Vernet. 5) Fix several issues in the dynptr processing such as stack slot liveness propagation, missing checks for PTR_TO_STACK variable offset, etc, from Kumar Kartikeya Dwivedi. 6) Various performance improvements, fixes, and introduction of more than just one XDP program to XSK selftests, from Magnus Karlsson. 7) Big batch to BPF samples to reduce deprecated functionality, from Daniel T. Lee. 8) Enable struct_ops programs to be sleepable in verifier, from David Vernet. 9) Reduce pr_warn() noise on BTF mismatches when they are expected under the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien. 10) Describe modulo and division by zero behavior of the BPF runtime in BPF's instruction specification document, from Dave Thaler. 11) Several improvements to libbpf API documentation in libbpf.h, from Grant Seltzer. 12) Improve resolve_btfids header dependencies related to subcmd and add proper support for HOSTCC, from Ian Rogers. 13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room() helper along with BPF selftests, from Ziyang Xuan. 14) Simplify the parsing logic of structure parameters for BPF trampoline in the x86-64 JIT compiler, from Pu Lehui. 15) Get BTF working for kernels with CONFIG_RUST enabled by excluding Rust compilation units with pahole, from Martin Rodriguez Reboredo. 16) Get bpf_setsockopt() working for kTLS on top of TCP sockets, from Kui-Feng Lee. 17) Disable stack protection for BPF objects in bpftool given BPF backends don't support it, from Holger Hoffstätte. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits) selftest/bpf: Make crashes more debuggable in test_progs libbpf: Add documentation to map pinning API functions libbpf: Fix malformed documentation formatting selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket. bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt(). bpf/selftests: Verify struct_ops prog sleepable behavior bpf: Pass const struct bpf_prog * to .check_member libbpf: Support sleepable struct_ops.s section bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable selftests/bpf: Fix vmtest static compilation error tools/resolve_btfids: Alter how HOSTCC is forced tools/resolve_btfids: Install subcmd headers bpf/docs: Document the nocast aliasing behavior of ___init bpf/docs: Document how nested trusted fields may be defined bpf/docs: Document cpumask kfuncs in a new file selftests/bpf: Add selftest suite for cpumask kfuncs selftests/bpf: Add nested trust selftests suite bpf: Enable cpumasks to be queried and used as kptrs bpf: Disallow NULLable pointers for trusted kfuncs ... ==================== Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Stanislav Fomichev
|
2b3486bc2d |
bpf: Introduce device-bound XDP programs
New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way to associate a netdev with a BPF program at load time. netdevsim checks are dropped in favor of generic check in dev_xdp_attach. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-6-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Jakub Kicinski
|
b3c588cd55 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ipa/ipa_interrupt.c drivers/net/ipa/ipa_interrupt.h |
||
Arnaldo Carvalho de Melo
|
c905ecfbb8 |
tools headers: Syncronize linux/build_bug.h with the kernel sources
To pick up the changes in:
|
||
Arnaldo Carvalho de Melo
|
8026a31df6 |
tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:
|
||
Ziyang Xuan
|
d219df60a7 |
bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()
Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). Main use case is for using cls_bpf on ingress hook to decapsulate IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the new IP header version after decapsulating the outer IP header. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Jakub Kicinski
|
a99da46ac0 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/usb/r8152.c |
||
Ammar Faizi
|
7efd762e97 |
nolibc/sys: Implement getpagesize(2) function
This function returns the page size used by the running kernel. The page size value is taken from the auxiliary vector at 'AT_PAGESZ' key. 'getpagesize(2)' is assumed as a syscall becuase the manpage placement of this function is in entry 2 ('man 2 getpagesize') despite there is no real 'getpagesize(2)' syscall in the Linux syscall table. Define this function in 'sys.h'. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Ammar Faizi
|
c61a078015 |
nolibc/stdlib: Implement getauxval(3) function
Previous commits save the address of the auxiliary vector into a global variable @_auxv. This commit creates a new function 'getauxval()' as a helper function to get the auxv value based on the given key. The behavior of this function is identic with the function documented in 'man 3 getauxval'. This function is also needed to implement 'getpagesize()' function that we will wire up in the next patches. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Sven Schnelle
|
241c4b4e02 |
tools/nolibc: add auxiliary vector retrieval for s390
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
d01869cf1e |
tools/nolibc: add auxiliary vector retrieval for mips
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
041fa97cb3 |
tools/nolibc: add auxiliary vector retrieval for riscv
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. It was tested on riscv64 only. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
59ea187624 |
tools/nolibc: add auxiliary vector retrieval for arm
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> It was tested in arm, thumb1 and thumb2 modes. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
2a39a53245 |
tools/nolibc: add auxiliary vector retrieval for arm64
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
1cce162ab4 |
tools/nolibc: add auxiliary vector retrieval for x86_64
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
2ab4aa487b |
tools/nolibc: add auxiliary vector retrieval for i386
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Sven Schnelle
|
9e5bdc613d |
tools/nolibc: export environ as a weak symbol on s390
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested on s390 both with environ inherited from _start and extracted from envp. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
758f333795 |
tools/nolibc: export environ as a weak symbol on riscv
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested on riscv64 both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
8f7fafebd1 |
tools/nolibc: export environ as a weak symbol on mips
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested with mips24kc (BE) both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
a6f29a2c41 |
tools/nolibc: export environ as a weak symbol on arm
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested in arm and thumb1 and thumb2 modes, and for each mode, both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
9b8688c6ea |
tools/nolibc: export environ as a weak symbol on arm64
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
52e423f5b9 |
tools/nolibc: export environ as a weak symbol on i386
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
89dc50921c |
tools/nolibc: export environ as a weak symbol on x86_64
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
1caa1154c3 |
tools/nolibc: make errno a weak symbol instead of a static one
Till now errno was declared static so that it could be eliminated if unused. While the goal is commendable for tiny executables as it allows to eliminate any data and bss segments when not used, this comes with some limitations, one of which being that the errno symbol seen in different units are not the same. Even though this has never been a real issue given the nature of the programs involved till now, it happens that referencing the same symbol from multiple units can also be achieved using weak symbols, with a difference being that only one of them will be used for all of them. Compared to weak symbols, static basically have no benefit for regular programs since there are always at least a few variables in most of these, so the bss segment cannot be eliminated. E.g: $ size nolibc-test-static-errno text data bss dec hex filename 11531 0 48 11579 2d3b nolibc-test-static-errno Furthermore, the weak symbol doesn't use bss storage at all, resulting in a slightly section: $ size nolibc-test-weak-errno text data bss dec hex filename 11531 0 40 11571 2d33 nolibc-test-weak-errno This patch thus converts errno from static to weak. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
d5b48f958b |
tools/nolibc: remove local definitions of O_* flags for open/fcntl
The historic nolibc code did not include asm/fcntl.h and had to define the various O_RDWR etc macros in each arch-specific file (since such values differ between certain archs). This was found at least once to induce bugs due to wrong definitions. Let's get rid of all of them and include asm/nolibc.h from sys.h instead. This was verified to work properly on all supported architectures. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
5a51b6de59 |
tools/nolibc: support thumb mode with frame pointers on ARM
In Thumb mode, register r7 is normally used to store the frame pointer. By default when optimizing at -Os there's no frame pointer so this works fine. But if no optimization is set, then build errors occur, indicating that r7 cannot not be used. It's difficult to cheat because it's the compiler that is complaining, not the assembler, so it's not even possible to report that the register was clobbered. The solution consists in saving and restoring r7 around the syscall, but this slightly inflates the code. The syscall number is passed via r6 which is never used by syscalls. The current patch adds a few macroes which do that only in Thumb mode, and which continue to directly assign the syscall number to register r7 in ARM mode. Now this always builds and works for all modes (tested on Arm, Thumbv1, Thumbv2 modes, at -Os, -O0, -O0 -fomit-frame-pointer). The code is very slightly inflated in thumb-mode without frame-pointers compared to previously (e.g. 7928 vs 7864 bytes for nolibc-test) but at least it's always operational. And it's possible to disable this mechanism by setting NOLIBC_OMIT_FRAME_POINTER. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
20470dfd65 |
tools/nolibc: enable support for thumb1 mode for ARM
Passing -mthumb to the kernel.org arm toolchain failed to build because it defaults to armv5 hence thumb1, which has a fairly limited instruction set compared to thumb2 enabled with armv7 that is much more complete. It's not very difficult to adjust the instructions to also build on thumb1, it only adds a total of 3 instructions, so it's worth doing it at least to ease use by casual testers. It was verified that the adjusted code now builds and works fine for armv5, thumb1, armv7 and thumb2, as long as frame pointers are not used. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
7f85485896 |
tools/nolibc: make compiler and assembler agree on the section around _start
The out-of-block asm() statement carrying _start does not allow the compiler to know what section the assembly code is being emitted to, and there's no easy way to push/pop the current section and restore it. It sometimes causes issues depending on the include files ordering and compiler optimizations. For example if a variable is declared immediately before the asm() block and another one after, the compiler assumes that the current section is still .bss and doesn't re-emit it, making the second variable appear inside the .text section instead. Forcing .bss at the end of the _start block doesn't work either because at certain optimizations the compiler may reorder blocks and will make some real code appear just after this block. A significant number of solutions were attempted, but many of them were still sensitive to section reordering. In the end, the best way to make sure the compiler and assembler agree on the current section is to place this code inside a function. Here the function is directly called _start and configured not to emit a frame-pointer, hence to have no prologue. If some future architectures would still emit some prologue, another working approach consists in naming the function differently and placing the _start label inside the asm statement. But the current solution is simpler. It was tested with nolibc-test at -O,-O0,-O2,-O3,-Os for arm,arm64,i386, mips,riscv,s390 and x86_64. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
00b18da408 |
tools/nolibc: fix the O_* fcntl/open macro definitions for riscv
When RISCV port was imported in 5.2, the O_* macros were taken with
their octal value and written as-is in hex, resulting in the getdents64()
to fail in nolibc-test.
Fixes:
|
||
Sven Schnelle
|
18a5a09d90 |
nolibc: add support for s390
Use arch-x86_64 as a template. Not really different, but we have our own mmap syscall which takes a structure instead of discrete arguments. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
1bfbe1f3e9 |
tools/nolibc: prevent gcc from making memset() loop over itself
When building on ARM in thumb mode with gcc-11.3 at -O2 or -O3, nolibc-test segfaults during the select() tests. It turns out that at this level, gcc recognizes an opportunity for using memset() to zero the fd_set, but it miscompiles it because it also recognizes a memset pattern as well, and decides to call memset() from the memset() code: 000122bc <memset>: 122bc: b510 push {r4, lr} 122be: 0004 movs r4, r0 122c0: 2a00 cmp r2, #0 122c2: d003 beq.n 122cc <memset+0x10> 122c4: 23ff movs r3, #255 ; 0xff 122c6: 4019 ands r1, r3 122c8: f7ff fff8 bl 122bc <memset> 122cc: 0020 movs r0, r4 122ce: bd10 pop {r4, pc} Simply placing an empty asm() statement inside the loop suffices to avoid this. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
55abdd1f5e |
tools/nolibc: fix missing includes causing build issues at -O0
After the nolibc includes were split to facilitate portability from standard libcs, programs that include only what they need may miss some symbols which are needed by libgcc. This is the case for raise() which is needed by the divide by zero code in some architectures for example. Regardless, being able to include only the apparently needed files is convenient. Instead of trying to move all exported definitions to a single file, since this can change over time, this patch takes another approach consisting in including the nolibc header at the end of all standard include files. This way their types and functions are already known at the moment of inclusion, and including any single one of them is sufficient to bring all the required ones. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Willy Tarreau
|
184177c3d6 |
tools/nolibc: restore mips branch ordering in the _start block
Depending on the compiler used and the optimization options, the sbrk()
test was crashing, both on real hardware (mips-24kc) and in qemu. One
such example is kernel.org toolchain in version 11.3 optimizing at -Os.
Inspecting the sys_brk() call shows the following code:
0040047c <sys_brk>:
40047c: 24020fcd li v0,4045
400480: 27bdffe0 addiu sp,sp,-32
400484: 0000000c syscall
400488: 27bd0020 addiu sp,sp,32
40048c: 10e00001 beqz a3,400494 <sys_brk+0x18>
400490: 00021023 negu v0,v0
400494: 03e00008 jr ra
It is obviously wrong, the "negu" instruction is placed in beqz's
delayed slot, and worse, there's no nop nor instruction after the
return, so the next function's first instruction (addiu sip,sip,-32)
will also be executed as part of the delayed slot that follows the
return.
This is caused by the ".set noreorder" directive in the _start block,
that applies to the whole program. The compiler emits code without the
delayed slots and relies on the compiler to swap instructions when this
option is not set. Removing the option would require to change the
startup code in a way that wouldn't make it look like the resulting
code, which would not be easy to debug. Instead let's just save the
default ordering before changing it, and restore it at the end of the
_start block. Now the code is correct:
0040047c <sys_brk>:
40047c: 24020fcd li v0,4045
400480: 27bdffe0 addiu sp,sp,-32
400484: 0000000c syscall
400488: 10e00002 beqz a3,400494 <sys_brk+0x18>
40048c: 27bd0020 addiu sp,sp,32
400490: 00021023 negu v0,v0
400494: 03e00008 jr ra
400498: 00000000 nop
Fixes:
|
||
Warner Losh
|
16f5cea741 |
tools/nolibc: Fix S_ISxxx macros
The mode field has the type encoded as an value in a field, not as a bit mask. Mask the mode with S_IFMT instead of each type to test. Otherwise, false positives are possible: eg S_ISDIR will return true for block devices because S_IFDIR = 0040000 and S_IFBLK = 0060000 since mode is masked with S_IFDIR instead of S_IFMT. These macros now match the similar definitions in tools/include/uapi/linux/stat.h. Signed-off-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Sven Schnelle
|
feaf756587 |
nolibc: fix fd_set type
The kernel uses unsigned long for the fd_set bitmap, but nolibc use u32. This works fine on little endian machines, but fails on big endian. Convert to unsigned long to fix this. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
Jakub Kicinski
|
4aea86b403 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
d75858ef10 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY7X/4wAKCRDbK58LschI g7gzAQCjKsLtAWg1OplW+B7pvEPwkQ8g3O1+PYWlToCUACTlzQD+PEMrqGnxB573 oQAk6I2yOTwLgvlHkrm+TIdKSouI4gs= =2hUY -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-04 We've added 45 non-merge commits during the last 21 day(s) which contain a total of 50 files changed, 1454 insertions(+), 375 deletions(-). The main changes are: 1) Fixes, improvements and refactoring of parts of BPF verifier's state equivalence checks, from Andrii Nakryiko. 2) Fix a few corner cases in libbpf's BTF-to-C converter in particular around padding handling and enums, also from Andrii Nakryiko. 3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata, from Christian Ehrig. 4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks, from Dave Marchevsky. 5) Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers, from Jiri Olsa. 6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps, from Maryam Tahhan. 7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du. 8) Bigger batch of improvements to BPF tracing code samples, from Daniel T. Lee. 9) Add LoongArch support to libbpf's bpf_tracing helper header, from Hengqi Chen. 10) Fix a libbpf compiler warning in perf_event_open_probe on arm32, from Khem Raj. 11) Optimize bpf_local_storage_elem by removing 56 bytes of padding, from Martin KaFai Lau. 12) Use pkg-config to locate libelf for resolve_btfids build, from Shen Jiamin. 13) Various libbpf improvements around API documentation and errno handling, from Xin Liu. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) libbpf: Return -ENODATA for missing btf section libbpf: Add LoongArch support to bpf_tracing.h libbpf: Restore errno after pr_warn. libbpf: Added the description of some API functions libbpf: Fix invalid return address register in s390 samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro samples/bpf: Change _kern suffix to .bpf with syscall tracing program samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program samples/bpf: Use kyscall instead of kprobe in syscall tracing program bpf: rename list_head -> graph_root in field info types libbpf: fix errno is overwritten after being closed. bpf: fix regs_exact() logic in regsafe() to remap IDs correctly bpf: perform byte-by-byte comparison only when necessary in regsafe() bpf: reject non-exact register type matches in regsafe() bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule bpf: reorganize struct bpf_reg_state fields bpf: teach refsafe() to take into account ID remapping bpf: Remove unused field initialization in bpf's ctl_table selftests/bpf: Add jit probe_mem corner case tests to s390x denylist ... ==================== Link: https://lore.kernel.org/r/20230105000926.31350-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Arnaldo Carvalho de Melo
|
b235e5b51f |
tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:
|
||
Arnaldo Carvalho de Melo
|
eeac18e2bf |
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
To pick up the changes in: |
||
Christian Ehrig
|
e26aa600ba |
bpf: Add flag BPF_F_NO_TUNNEL_KEY to bpf_skb_set_tunnel_key()
This patch allows to remove TUNNEL_KEY from the tunnel flags bitmap when using bpf_skb_set_tunnel_key by providing a BPF_F_NO_TUNNEL_KEY flag. On egress, the resulting tunnel header will not contain a tunnel key if the protocol and implementation supports it. At the moment bpf_tunnel_key wants a user to specify a numeric tunnel key. This will wrap the inner packet into a tunnel header with the key bit and value set accordingly. This is problematic when using a tunnel protocol that supports optional tunnel keys and a receiving tunnel device that is not expecting packets with the key bit set. The receiver won't decapsulate and drop the packet. RFC 2890 and RFC 2784 GRE tunnels are examples where this flag is useful. It allows for generating packets, that can be decapsulated by a GRE tunnel device not operating in collect metadata mode or not expecting the key bit set. Signed-off-by: Christian Ehrig <cehrig@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221218051734.31411-1-cehrig@cloudflare.com |
||
Arnaldo Carvalho de Melo
|
43a3ce77ae |
tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
To pick the changes from: |
||
Linus Torvalds
|
8fa590bf34 |
ARM64:
* Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. * Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit |
||
Linus Torvalds
|
94a855111e |
- Add the call depth tracking mitigation for Retbleed which has
been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOZp5EACgkQEsHwGGHe VUrZFxAAvi/+8L0IYSK4mKJvixGbTFjxN/Swo2JVOfs34LqGUT6JaBc+VUMwZxdb VMTFIZ3ttkKEodjhxGI7oGev6V8UfhI37SmO2lYKXpQVjXXnMlv/M+Vw3teE38CN gopi+xtGnT1IeWQ3tc/Tv18pleJ0mh5HKWiW+9KoqgXj0wgF9x4eRYDz1TDCDA/A iaBzs56j8m/FSykZHnrWZ/MvjKNPdGlfJASUCPeTM2dcrXQGJ93+X2hJctzDte0y Nuiw6Y0htfFBE7xoJn+sqm5Okr+McoUM18/CCprbgSKYk18iMYm3ZtAi6FUQZS1A ua4wQCf49loGp15PO61AS5d3OBf5D3q/WihQRbCaJvTVgPp9sWYnWwtcVUuhMllh ZQtBU9REcVJ/22bH09Q9CjBW0VpKpXHveqQdqRDViLJ6v/iI6EFGmD24SW/VxyRd 73k9MBGrL/dOf1SbEzdsnvcSB3LGzp0Om8o/KzJWOomrVKjBCJy16bwTEsCZEJmP i406m92GPXeaN1GhTko7vmF0GnkEdJs1GVCZPluCAxxbhHukyxHnrjlQjI4vC80n Ylc0B3Kvitw7LGJsPqu+/jfNHADC/zhx1qz/30wb5cFmFbN1aRdp3pm8JYUkn+l/ zri2Y6+O89gvE/9/xUhMohzHsWUO7xITiBavewKeTP9GSWybWUs= =cRy1 -----END PGP SIGNATURE----- Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ... |
||
Paolo Bonzini
|
9352e7470a |
Merge remote-tracking branch 'kvm/queue' into HEAD
x86 Xen-for-KVM: * Allow the Xen runstate information to cross a page boundary * Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured * add support for 32-bit guests in SCHEDOP_poll x86 fixes: * One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). * Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. * Clean up the MSR filter docs. * Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. * Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. * Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. * Advertise (on AMD) that the SMM_CTL MSR is not supported * Remove unnecessary exports Selftests: * Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. * Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). * Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. * Introduce actual atomics for clear/set_bit() in selftests Documentation: * Remove deleted ioctls from documentation * Various fixes |
||
Paolo Bonzini
|
eb5618911a |
KVM/arm64 updates for 6.2
- Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on. - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Add/Enable/Fix a bunch of selftests covering memslots, breakpoints, stage-2 faults and access tracking. You name it, we got it, we probably broke it. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. As a side effect, this tag also drags: - The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring series - A shared branch with the arm64 tree that repaints all the system registers to match the ARM ARM's naming, and resulting in interesting conflicts -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOODb0PHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDztsQAInRnsgLl57/SpqhZzExNCllN6AT/bdeB3uz rnw3ScJOV174uNKp8lnPWoTvu2YUGiVtBp6tFHhDI8le7zHX438ZT8KE5mcs8p5i KfFKnb8SHV2DDpqkcy24c0Xl/6vsg1qkKrdfJb49yl5ZakRITDpynW/7tn6dXsxX wASeGFdCYeW4g2xMQzsCbtx6LgeQ8uomBmzRfPrOtZHYYxAn6+4Mj4595EC1sWxM AQnbp8tW3Vw46saEZAQvUEOGOW9q0Nls7G21YqQ52IA+ZVDK1LmAF2b1XY3edjkk pX8EsXOURfqdasBxfSfF3SgnUazoz9GHpSzp1cTVTktrPp40rrT7Ldtml0ktq69d 1malPj47KVMDsIq0kNJGnMxciXFgAHw+VaCQX+k4zhIatNwviMbSop2fEoxj22jc 4YGgGOxaGrnvmAJhreCIbr4CkZk5CJ8Zvmtfg+QM6npIp8BY8896nvORx/d4i6tT H4caadd8AAR56ANUyd3+KqF3x0WrkaU0PLHJLy1tKwOXJUUTjcpvIfahBAAeUlSR qEFrtb+EEMPgAwLfNOICcNkPZR/yyuYvM+FiUQNVy5cNiwFkpztpIctfOFaHySGF K07O2/a1F6xKL0OKRUg7hGKknF9ecmux4vHhiUMuIk9VOgNTWobHozBDorLKXMzC aWa6oGVC =iIPT -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.2 - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on. - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Add/Enable/Fix a bunch of selftests covering memslots, breakpoints, stage-2 faults and access tracking. You name it, we got it, we probably broke it. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. As a side effect, this tag also drags: - The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring series - A shared branch with the arm64 tree that repaints all the system registers to match the ARM ARM's naming, and resulting in interesting conflicts |
||
Kumar Kartikeya Dwivedi
|
2706053173 |
bpf: Rework process_dynptr_func
Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type
for use in callback state, because in case of user ringbuf helpers,
there is no dynptr on the stack that is passed into the callback. To
reflect such a state, a special register type was created.
However, some checks have been bypassed incorrectly during the addition
of this feature. First, for arg_type with MEM_UNINIT flag which
initialize a dynptr, they must be rejected for such register type.
Secondly, in the future, there are plans to add dynptr helpers that
operate on the dynptr itself and may change its offset and other
properties.
In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed
to such helpers, however the current code simply returns 0.
The rejection for helpers that release the dynptr is already handled.
For fixing this, we take a step back and rework existing code in a way
that will allow fitting in all classes of helpers and have a coherent
model for dealing with the variety of use cases in which dynptr is used.
First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together
with a DYNPTR_TYPE_* constant that denotes the only type it accepts.
Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this
fact. To make the distinction clear, use MEM_RDONLY flag to indicate
that the helper only operates on the memory pointed to by the dynptr,
not the dynptr itself. In C parlance, it would be equivalent to taking
the dynptr as a point to const argument.
When either of these flags are not present, the helper is allowed to
mutate both the dynptr itself and also the memory it points to.
Currently, the read only status of the memory is not tracked in the
dynptr, but it would be trivial to add this support inside dynptr state
of the register.
With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to
better reflect its usage, it can no longer be passed to helpers that
initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.
A note to reviewers is that in code that does mark_stack_slots_dynptr,
and unmark_stack_slots_dynptr, we implicitly rely on the fact that
PTR_TO_STACK reg is the only case that can reach that code path, as one
cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In
both cases such helpers won't be setting that flag.
The next patch will add a couple of selftest cases to make sure this
doesn't break.
Fixes:
|
||
Eyal Birger
|
4f4ac4d910 |
tools: add IFLA_XFRM_COLLECT_METADATA to uapi/linux/if_link.h
Needed for XFRM metadata tests. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-4-eyal.birger@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Marc Zyngier
|
adde0476af |
Merge branch kvm-arm64/selftest/s2-faults into kvmarm-master/next
* kvm-arm64/selftest/s2-faults: : . : New KVM/arm64 selftests exercising various sorts of S2 faults, courtesy : of Ricardo Koller. From the cover letter: : : "This series adds a new aarch64 selftest for testing stage 2 fault handling : for various combinations of guest accesses (e.g., write, S1PTW), backing : sources (e.g., anon), and types of faults (e.g., read on hugetlbfs with a : hole, write on a readonly memslot). Each test tries a different combination : and then checks that the access results in the right behavior (e.g., uffd : faults with the right address and write/read flag). [...]" : . KVM: selftests: aarch64: Add mix of tests into page_fault_test KVM: selftests: aarch64: Add readonly memslot tests into page_fault_test KVM: selftests: aarch64: Add dirty logging tests into page_fault_test KVM: selftests: aarch64: Add userfaultfd tests into page_fault_test KVM: selftests: aarch64: Add aarch64/page_fault_test KVM: selftests: Use the right memslot for code, page-tables, and data allocations KVM: selftests: Fix alignment in virt_arch_pgd_alloc() and vm_vaddr_alloc() KVM: selftests: Add vm->memslots[] and enum kvm_mem_region_type KVM: selftests: Stash backing_src_type in struct userspace_mem_region tools: Copy bitfield.h from the kernel sources KVM: selftests: aarch64: Construct DEFAULT_MAIR_EL1 using sysreg.h macros KVM: selftests: Add missing close and munmap in __vm_mem_region_delete() KVM: selftests: aarch64: Add virt_get_pte_hva() library function KVM: selftests: Add a userfaultfd library Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
Sean Christopherson
|
bb056c0f08 |
tools: KVM: selftests: Convert clear/set_bit() to actual atomics
Convert {clear,set}_bit() to atomics as KVM's ucall implementation relies on clear_bit() being atomic, they are defined in atomic.h, and the same helpers in the kernel proper are atomic. KVM's ucall infrastructure is the only user of clear_bit() in tools/, and there are no true set_bit() users. tools/testing/nvdimm/ does make heavy use of set_bit(), but that code builds into a kernel module of sorts, i.e. pulls in all of the kernel's header and so is already getting the kernel's atomic set_bit(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221119013450.2643007-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Sean Christopherson
|
36293352ff |
tools: Drop "atomic_" prefix from atomic test_and_set_bit()
Drop the "atomic_" prefix from tools' atomic_test_and_set_bit() to match the kernel nomenclature where test_and_set_bit() is atomic, and __test_and_set_bit() provides the non-atomic variant. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221119013450.2643007-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Sean Christopherson
|
7f32a6cf8b |
tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
Drop tools' non-atomic test_and_set_bit() and test_and_clear_bit() helpers now that all users are gone. The names will be claimed in the future for atomic versions. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221119013450.2643007-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Sean Christopherson
|
7f2b47f22b |
tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
Take @bit as an unsigned long instead of a signed int in clear_bit() and set_bit() so that they match the double-underscore versions, __clear_bit() and __set_bit(). This will allow converting users that really don't want atomic operations to the double-underscores without introducing a functional change, which will in turn allow making {clear,set}_bit() atomic (as advertised). Practically speaking, this _should_ have no functional impact. KVM's selftests usage is either hardcoded (Hyper-V tests) or is artificially limited (arch_timer test and dirty_log test). In KVM, dirty_log test is the only mildly interesting case as it's use indirectly restricted to unsigned 32-bit values, but in theory it could generate a negative value when cast to a signed int. But in that case, taking an "unsigned long" is actually a bug fix. Perf's usage is more difficult to audit, but any code that is affected by the switch is likely already broken. perf_header__{set,clear}_feat() and perf_file_header__read() effectively use only hardcoded enums with small, positive values, atom_new() passes an unsigned long, but its value is capped at 128 via NR_ATOM_PER_PAGE, etc... The only real potential for breakage is in the perf flows that take a "cpu", but it's unlikely perf is subtly relying on a negative index into bitmaps, e.g. "cpu" can be "-1", but only as "not valid" placeholder. Note, tools/testing/nvdimm/ makes heavy use of set_bit(), but that code builds into a kernel module of sorts, i.e. pulls in all of the kernel's header and so is getting the kernel's atomic set_bit(). The NVDIMM test usage of atomics is likely unnecessary, e.g. ndtest_dimm_register() sets bits in a local variable, but that's neither here nor there as far as this change is concerned. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221119013450.2643007-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Javier Martinez Canillas
|
30ee198ce4 |
KVM: Reference to kvm_userspace_memory_region in doc and comments
There are still references to the removed kvm_memory_region data structure but the doc and comments should mention struct kvm_userspace_memory_region instead, since that is what's used by the ioctl that replaced the old one and this data structure support the same set of flags. Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Message-Id: <20221202105011.185147-4-javierm@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Javier Martinez Canillas
|
66a9221d73 |
KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
The documentation says that the ioctl has been deprecated, but it has been actually removed and the remaining references are just left overs. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Message-Id: <20221202105011.185147-3-javierm@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Javier Martinez Canillas
|
61e15f8712 |
KVM: Delete all references to removed KVM_SET_MEMORY_REGION ioctl
The documentation says that the ioctl has been deprecated, but it has been actually removed and the remaining references are just left overs. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Message-Id: <20221202105011.185147-2-javierm@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Ji Rongfeng
|
72b43bde38 |
bpf: Update bpf_{g,s}etsockopt() documentation
* append missing optnames to the end * simplify bpf_getsockopt()'s doc Signed-off-by: Ji Rongfeng <SikoJobs@outlook.com> Link: https://lore.kernel.org/r/DU0P192MB15479B86200B1216EC90E162D6099@DU0P192MB1547.EURP192.PROD.OUTLOOK.COM Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Ingo Molnar
|
0ce096db71 |
Linux 6.1-rc6
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmN6wAgeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0EYH/3/RO90NbrFItraN Lzr+d3VdbGjTu8xd1M+PRTmwh3zxLpB+Jwqr0T0A2gzL9B/D+AUPUJdrCVbv9DqS FLJAVqoeV20dNBAHSffOOLPsgCZ+Eu+LzlNN7Iqde0e8cyZICFMNktitui84Xm/i 1NgFVgz9OZ6+aieYvUj3FrFq0p8GTIaC/oybDZrxYKcO8ZzKVMJ11swRw10wwq0g qOOECvV3w7wlQ8upQZkzFxItKFc7EexZI6R4elXeGSJJ9Hlc092dv/zsKB9dwV+k WcwkJrZRoezYXzgGBFxUcQtzi+ethjrPjuJuM1rYLUSIcfIW/0lkaSLgRoBu8D+I 1GfXkXs= =gt6P -----END PGP SIGNATURE----- Merge tag 'v6.1-rc6' into x86/core, to resolve conflicts Resolve conflicts between these commits in arch/x86/kernel/asm-offsets.c: # upstream: |
||
Peter Gonda
|
cf4694be2b |
tools: Add atomic_test_and_set_bit()
Add x86 and generic implementations of atomic_test_and_set_bit() to allow
KVM selftests to atomically manage bitmaps.
Note, the generic version is taken from arch_test_and_set_bit() as of
commit
|
||
Kumar Kartikeya Dwivedi
|
f0c5941ff5 |
bpf: Support bpf_list_head in map values
Add the support on the map side to parse, recognize, verify, and build metadata table for a new special field of the type struct bpf_list_head. To parameterize the bpf_list_head for a certain value type and the list_node member it will accept in that value type, we use BTF declaration tags. The definition of bpf_list_head in a map value will be done as follows: struct foo { struct bpf_list_node node; int data; }; struct map_value { struct bpf_list_head head __contains(foo, node); }; Then, the bpf_list_head only allows adding to the list 'head' using the bpf_list_node 'node' for the type struct foo. The 'contains' annotation is a BTF declaration tag composed of four parts, "contains:name:node" where the name is then used to look up the type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during the lookup. The node defines name of the member in this type that has the type struct bpf_list_node, which is actually used for linking into the linked list. For now, 'kind' part is hardcoded as struct. This allows building intrusive linked lists in BPF, using container_of to obtain pointer to entry, while being completely type safe from the perspective of the verifier. The verifier knows exactly the type of the nodes, and knows that list helpers return that type at some fixed offset where the bpf_list_node member used for this list exists. The verifier also uses this information to disallow adding types that are not accepted by a certain list. For now, no elements can be added to such lists. Support for that is coming in future patches, hence draining and freeing items is done with a TODO that will be resolved in a future patch. Note that the bpf_list_head_free function moves the list out to a local variable under the lock and releases it, doing the actual draining of the list items outside the lock. While this helps with not holding the lock for too long pessimizing other concurrent list operations, it is also necessary for deadlock prevention: unless every function called in the critical section would be notrace, a fentry/fexit program could attach and call bpf_map_update_elem again on the map, leading to the same lock being acquired if the key matches and lead to a deadlock. While this requires some special effort on part of the BPF programmer to trigger and is highly unlikely to occur in practice, it is always better if we can avoid such a condition. While notrace would prevent this, doing the draining outside the lock has advantages of its own, hence it is used to also fix the deadlock related problem. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Jakub Kicinski
|
f4c4ca70de |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEET63h6RnJhTJHuKTjXOwUVIRcSScFAmNu2EkACgkQXOwUVIRc SSebKhAA0ffmp5jJgEJpQYNABGLYIJcwKkBrGClDbMJLtwCjevGZJajT9fpbCLb1 eK6EIhdfR0NTO+0KtUVkZ8WMa81OmLEJYdTNtJfNE23ENMpssiAWhlhDF8AoXeKv Bo3j719gn3Cw9PWXQoircH3wpj+5RMDnjxy4iYlA5yNrvzC7XVmssMF+WALvQnuK CGrfR57hxdgmphmasRqeCzEoriwihwPsG3k6eQN8rf7ZytLhs90tMVgT9L3Cd2u9 DafA0Xl8mZdz2mHhThcJhQVq4MUymZj44ufuHDiOs1j6nhUlWToyQuvegPOqxKti uLGtZul0ls+3UP0Lbrv1oEGU/MWMxyDz4IBc0EVs0k3ItQbmSKs6r9WuPFGd96Sb GHk68qFVySeLGN0LfKe3rCHJ9ZoIOPYJg9qT8Rd5bOhetgGwSsxZTxUI39BxkFup CEqwIDnts1TMU37GDjj+vssKW91k4jEzMZVtRfsL3J36aJs28k/Ez4AqLXg6WU6u ADqFaejVPcXbN9rX90onIYxxiL28gZSeT+i8qOPELZtqTQmNWz+tC/ySVuWnD8Mn Nbs7PZ1IWiNZpsKS8pZnpd6j4mlBeJnwXkPKiFy+xHGuwRSRdYl6G9e5CtlRely/ rwQ8DtaOpRYMrGhnmBEdAOCa9t/iqzrzHzjoigjJ7iAST4ToJ5s= =Y+/e -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Andrii Nakryiko says: ==================== bpf-next 2022-11-11 We've added 49 non-merge commits during the last 9 day(s) which contain a total of 68 files changed, 3592 insertions(+), 1371 deletions(-). The main changes are: 1) Veristat tool improvements to support custom filtering, sorting, and replay of results, from Andrii Nakryiko. 2) BPF verifier precision tracking fixes and improvements, from Andrii Nakryiko. 3) Lots of new BPF documentation for various BPF maps, from Dave Tucker, Donald Hunter, Maryam Tahhan, Bagas Sanjaya. 4) BTF dedup improvements and libbpf's hashmap interface clean ups, from Eduard Zingerman. 5) Fix veth driver panic if XDP program is attached before veth_open, from John Fastabend. 6) BPF verifier clean ups and fixes in preparation for follow up features, from Kumar Kartikeya Dwivedi. 7) Add access to hwtstamp field from BPF sockops programs, from Martin KaFai Lau. 8) Various fixes for BPF selftests and samples, from Artem Savkov, Domenico Cerasuolo, Kang Minchul, Rong Tao, Yang Jihong. 9) Fix redirection to tunneling device logic, preventing skb->len == 0, from Stanislav Fomichev. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits) selftests/bpf: fix veristat's singular file-or-prog filter selftests/bpf: Test skops->skb_hwtstamp selftests/bpf: Fix incorrect ASSERT in the tcp_hdr_options test bpf: Add hwtstamp field for the sockops prog selftests/bpf: Fix xdp_synproxy compilation failure in 32-bit arch bpf, docs: Document BPF_MAP_TYPE_ARRAY docs/bpf: Document BPF map types QUEUE and STACK docs/bpf: Document BPF ARRAY_OF_MAPS and HASH_OF_MAPS docs/bpf: Document BPF_MAP_TYPE_CPUMAP map docs/bpf: Document BPF_MAP_TYPE_LPM_TRIE map libbpf: Hashmap.h update to fix build issues using LLVM14 bpf: veth driver panics when xdp prog attached before veth_open selftests: Fix test group SKIPPED result selftests/bpf: Tests for btf_dedup_resolve_fwds libbpf: Resolve unambigous forward declarations libbpf: Hashmap interface update to allow both long and void* keys/values samples/bpf: Fix sockex3 error: Missing BPF prog type selftests/bpf: Fix u32 variable compared with less than zero Documentation: bpf: Escape underscore in BPF type name prefix selftests/bpf: Use consistent build-id type for liburandom_read.so ... ==================== Link: https://lore.kernel.org/r/20221111233733.1088228-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Martin KaFai Lau
|
9bb053490f |
bpf: Add hwtstamp field for the sockops prog
The bpf-tc prog has already been able to access the skb_hwtstamps(skb)->hwtstamp. This patch extends the same hwtstamp access to the sockops prog. In sockops, the skb is also available to the bpf prog during the BPF_SOCK_OPS_PARSE_HDR_OPT_CB event. There is a use case that the hwtstamp will be useful to the sockops prog to better measure the one-way-delay when the sender has put the tx timestamp in the tcp header option. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221107230420.4192307-2-martin.lau@linux.dev |
||
Jakub Kicinski
|
966a9b4903 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/pch_can.c |
||
Ricardo Koller
|
590b949597 |
tools: Copy bitfield.h from the kernel sources
Copy bitfield.h from include/linux/bitfield.h. A subsequent change will make use of some FIELD_{GET,PREP} macros defined in this header. The header was copied as-is, no changes needed. Cc: Jakub Kicinski <kuba@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-6-ricarkol@google.com |
||
Jakub Kicinski
|
f2c24be55b |
bpf-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY2RS7QAKCRDbK58LschI g6RVAQC1FdSXMrhn369NGCG1Vox1QYn2/5P32LSIV1BKqiQsywEAsxgYNrdCPTua ie91Q5IJGT9pFl1UR50UrgL11DI5BgI= =sdhO -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2022-11-04 We've added 8 non-merge commits during the last 3 day(s) which contain a total of 10 files changed, 113 insertions(+), 16 deletions(-). The main changes are: 1) Fix memory leak upon allocation failure in BPF verifier's stack state tracking, from Kees Cook. 2) Fix address leakage when BPF progs release reference to an object, from Youlin Li. 3) Fix BPF CI breakage from buggy in.h uapi header dependency, from Andrii Nakryiko. 4) Fix bpftool pin sub-command's argument parsing, from Pu Lehui. 5) Fix BPF sockmap lockdep warning by cancelling psock work outside of socket lock, from Cong Wang. 6) Follow-up for BPF sockmap to fix sk_forward_alloc accounting, from Wang Yufen. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add verifier test for release_reference() bpf: Fix wrong reg type conversion in release_reference() bpf, sock_map: Move cancel_work_sync() out of sock lock tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI net/ipv4: Fix linux/in.h header dependencies bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues bpf, verifier: Fix memory leak in array reallocation for stack state ==================== Link: https://lore.kernel.org/r/20221104000445.30761-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
fbeb229a66 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Andrii Nakryiko
|
a778f5d46b |
tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI
With recent sync of linux/in.h tools/include headers are now relying on
__DECLARE_FLEX_ARRAY macro, which isn't itself defined inside
tools/include headers anywhere and is instead assumed to be present in
system-wide UAPI header. This breaks isolated environments that don't
have kernel UAPI headers installed system-wide, like BPF CI ([0]).
To fix this, bring in include/uapi/linux/stddef.h into tools/include.
We can't just copy/paste it, though, it has to be processed with
scripts/headers_install.sh, which has a dependency on scripts/unifdef.
So the full command to (re-)generate stddef.h for inclusion into
tools/include directory is:
$ make scripts_unifdef && \
cp $KBUILD_OUTPUT/scripts/unifdef scripts/ && \
scripts/headers_install.sh include/uapi/linux/stddef.h tools/include/uapi/linux/stddef.h
This assumes KBUILD_OUTPUT envvar is set and used for out-of-tree builds.
[0] https://github.com/kernel-patches/bpf/actions/runs/3379432493/jobs/5610982609
Fixes:
|
||
Jakub Kicinski
|
b54a0d4094 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY2GuKgAKCRDbK58LschI gy32AP9PI0e/bUGDExKJ8g97PeeEtnpj4TTI6g+XKILtYnyXlgD/Rk4j2D/f3IBF Ha9TmqYvAUim+U/g50vUrNuoNLNJ5w8= =OKC1 -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2022-11-02 We've added 70 non-merge commits during the last 14 day(s) which contain a total of 96 files changed, 3203 insertions(+), 640 deletions(-). The main changes are: 1) Make cgroup local storage available to non-cgroup attached BPF programs such as tc BPF ones, from Yonghong Song. 2) Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers, from Martin KaFai Lau. 3) Add LLVM disassembler as default library for dumping JITed code in bpftool, from Quentin Monnet. 4) Various kprobe_multi_link fixes related to kernel modules, from Jiri Olsa. 5) Optimize x86-64 JIT with emitting BMI2-based shift instructions, from Jie Meng. 6) Improve BPF verifier's memory type compatibility for map key/value arguments, from Dave Marchevsky. 7) Only create mmap-able data section maps in libbpf when data is exposed via skeletons, from Andrii Nakryiko. 8) Add an autoattach option for bpftool to load all object assets, from Wang Yufen. 9) Various memory handling fixes for libbpf and BPF selftests, from Xu Kuohai. 10) Initial support for BPF selftest's vmtest.sh on arm64, from Manu Bretelle. 11) Improve libbpf's BTF handling to dedup identical structs, from Alan Maguire. 12) Add BPF CI and denylist documentation for BPF selftests, from Daniel Müller. 13) Check BPF cpumap max_entries before doing allocation work, from Florian Lehner. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (70 commits) samples/bpf: Fix typo in README bpf: Remove the obsolte u64_stats_fetch_*_irq() users. bpf: check max_entries before allocating memory bpf: Fix a typo in comment for DFS algorithm bpftool: Fix spelling mistake "disasembler" -> "disassembler" selftests/bpf: Fix bpftool synctypes checking failure selftests/bpf: Panic on hard/soft lockup docs/bpf: Add documentation for new cgroup local storage selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x selftests/bpf: Add selftests for new cgroup local storage selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str bpftool: Support new cgroup local storage libbpf: Support new cgroup local storage bpf: Implement cgroup storage available to non-cgroup-attached bpf progs bpf: Refactor some inode/task/sk storage functions for reuse bpf: Make struct cgroup btf id global selftests/bpf: Tracing prog can still do lookup under busy lock selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection bpf: Add new bpf_task_storage_delete proto with no deadlock detection bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check ... ==================== Link: https://lore.kernel.org/r/20221102062120.5724-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
54917c90c2 |
Urgent nolibc pull request for v6.1
This pull request contains a couple of commits that fix string-function bugs introduced by: |
||
Rasmus Villemoes
|
b3f4f51ea6 |
tools/nolibc/string: Fix memcmp() implementation
The C standard says that memcmp() must treat the buffers as consisting
of "unsigned chars". If char happens to be unsigned, the casts are ok,
but then obviously the c1 variable can never contain a negative
value. And when char is signed, the casts are wrong, and there's still
a problem with using an 8-bit quantity to hold the difference, because
that can range from -255 to +255.
For example, assuming char is signed, comparing two 1-byte buffers,
one containing 0x00 and another 0x80, the current implementation would
return -128 for both memcmp(a, b, 1) and memcmp(b, a, 1), whereas one
of those should of course return something positive.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Fixes:
|
||
Willy Tarreau
|
bfc3b0f056 |
tools/nolibc: Fix missing strlen() definition and infinite loop with gcc-12
When built at -Os, gcc-12 recognizes an strlen() pattern in nolibc_strlen() and replaces it with a jump to strlen(), which is not defined as a symbol and breaks compilation. Worse, when the function is called strlen(), the function is simply replaced with a jump to itself, hence becomes an infinite loop. One way to avoid this is to always set -ffreestanding, but the calling code doesn't know this and there's no way (either via attributes or pragmas) to globally enable it from include files, effectively leaving a painful situation for the caller. Alexey suggested to place an empty asm() statement inside the loop to stop gcc from recognizing a well-known pattern, which happens to work pretty fine. At least it allows us to make sure our local definition is not replaced with a self jump. The function only needs to be renamed back to strlen() so that the symbol exists, which implies that nolibc_strlen() which is used on variable strings has to be declared as a macro that points back to it before the strlen() macro is redifined. It was verified to produce valid code with gcc 3.4 to 12.1 at different optimization levels, and both with constant and variable strings. In case this problem surfaces again in the future, an alternate approach consisting in adding an optimize("no-tree-loop-distribute-patterns") function attribute for gcc>=12 worked as well but is less pretty. Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202210081618.754a77db-yujie.liu@intel.com Fixes: |
||
Jakub Kicinski
|
31f1aa4f74 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c |
||
Arnaldo Carvalho de Melo
|
831c05a762 |
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
To pick the changes in: |
||
Yonghong Song
|
c4bcfb38a9 |
bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Similar to sk/inode/task storage, implement similar cgroup local storage. There already exists a local storage implementation for cgroup-attached bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper bpf_get_local_storage(). But there are use cases such that non-cgroup attached bpf progs wants to access cgroup local storage data. For example, tc egress prog has access to sk and cgroup. It is possible to use sk local storage to emulate cgroup local storage by storing data in socket. But this is a waste as it could be lots of sockets belonging to a particular cgroup. Alternatively, a separate map can be created with cgroup id as the key. But this will introduce additional overhead to manipulate the new map. A cgroup local storage, similar to existing sk/inode/task storage, should help for this use case. The life-cycle of storage is managed with the life-cycle of the cgroup struct. i.e. the storage is destroyed along with the owning cgroup with a call to bpf_cgrp_storage_free() when cgroup itself is deleted. The userspace map operations can be done by using a cgroup fd as a key passed to the lookup, update and delete operations. Typically, the following code is used to get the current cgroup: struct task_struct *task = bpf_get_current_task_btf(); ... task->cgroups->dfl_cgrp ... and in structure task_struct definition: struct task_struct { .... struct css_set __rcu *cgroups; .... } With sleepable program, accessing task->cgroups is not protected by rcu_read_lock. So the current implementation only supports non-sleepable program and supporting sleepable program will be the next step together with adding rcu_read_lock protection for rcu tagged structures. Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used for cgroup storage available to non-cgroup-attached bpf programs. The old cgroup storage supports bpf_get_local_storage() helper to get the cgroup data. The new cgroup storage helper bpf_cgrp_storage_get() can provide similar functionality. While old cgroup storage pre-allocates storage memory, the new mechanism can also pre-allocate with a user space bpf_map_update_elem() call to avoid potential run-time memory allocation failure. Therefore, the new cgroup storage can provide all functionality w.r.t. the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can be deprecated since the new one can provide the same functionality. Acked-by: David Vernet <void@manifault.com> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Arnaldo Carvalho de Melo
|
49c75d30b0 |
tools headers uapi: Sync linux/stat.h with the kernel sources
To pick the changes from:
|
||
Arnaldo Carvalho de Melo
|
82c50d8937 |
tools include UAPI: Sync sound/asound.h copy with the kernel sources
Picking the changes from:
|
||
Arnaldo Carvalho de Melo
|
036b8f5b89 |
tools headers uapi: Update linux/in.h copy
To get the changes in: |
||
Jakub Kicinski
|
96917bb3a3 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/linux/net.h |
||
Paolo Bonzini
|
9aec606c16 |
tools: include: sync include/api/linux/kvm.h
Provide a definition of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
Fixes:
|
||
Jakub Kicinski
|
3566a79c9e |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY08JQQAKCRDbK58LschI g0M0AQCWGrJcnQFut1qwR9efZUadwxtKGAgpaA/8Smd8+v7c8AD/SeHQuGfkFiD6 rx18hv1mTfG0HuPnFQy6YZQ98vmznwE= =DaeS -----END PGP SIGNATURE----- Merge tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-18 We've added 33 non-merge commits during the last 14 day(s) which contain a total of 31 files changed, 874 insertions(+), 538 deletions(-). The main changes are: 1) Add RCU grace period chaining to BPF to wait for the completion of access from both sleepable and non-sleepable BPF programs, from Hou Tao & Paul E. McKenney. 2) Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer values. In the wild we have seen OS vendors doing buggy backports where helper call numbers mismatched. This is an attempt to make backports more foolproof, from Andrii Nakryiko. 3) Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions, from Roberto Sassu. 4) Fix libbpf's BTF dumper for structs with padding-only fields, from Eduard Zingerman. 5) Fix various libbpf bugs which have been found from fuzzing with malformed BPF object files, from Shung-Hsi Yu. 6) Clean up an unneeded check on existence of SSE2 in BPF x86-64 JIT, from Jie Meng. 7) Fix various ASAN bugs in both libbpf and selftests when running the BPF selftest suite on arm64, from Xu Kuohai. 8) Fix missing bpf_iter_vma_offset__destroy() call in BPF iter selftest and use in-skeleton link pointer to remove an explicit bpf_link__destroy(), from Jiri Olsa. 9) Fix BPF CI breakage by pointing to iptables-legacy instead of relying on symlinked iptables which got upgraded to iptables-nft, from Martin KaFai Lau. 10) Minor BPF selftest improvements all over the place, from various others. * tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (33 commits) bpf/docs: Update README for most recent vmtest.sh bpf: Use rcu_trace_implies_rcu_gp() for program array freeing bpf: Use rcu_trace_implies_rcu_gp() in local storage map bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator rcu-tasks: Provide rcu_trace_implies_rcu_gp() selftests/bpf: Use sys_pidfd_open() helper when possible libbpf: Fix null-pointer dereference in find_prog_by_sec_insn() libbpf: Deal with section with no data gracefully libbpf: Use elf_getshdrnum() instead of e_shnum selftest/bpf: Fix error usage of ASSERT_OK in xdp_adjust_tail.c selftests/bpf: Fix error failure of case test_xdp_adjust_tail_grow selftest/bpf: Fix memory leak in kprobe_multi_test selftests/bpf: Fix memory leak caused by not destroying skeleton libbpf: Fix memory leak in parse_usdt_arg() libbpf: Fix use-after-free in btf_dump_name_dups selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test selftests/bpf: Alphabetize DENYLISTs selftests/bpf: Add tests for _opts variants of bpf_*_get_fd_by_id() libbpf: Introduce bpf_link_get_fd_by_id_opts() libbpf: Introduce bpf_btf_get_fd_by_id_opts() ... ==================== Link: https://lore.kernel.org/r/20221018210631.11211-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Peter Zijlstra
|
5da6aea375 |
objtool: Fix find_{symbol,func}_containing()
The current find_{symbol,func}_containing() functions are broken in the face of overlapping symbols, exactly the case that is needed for a new ibt/endbr supression. Import interval_tree_generic.h into the tools tree and convert the symbol tree to an interval tree to support proper range stabs. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111146.330203761@infradead.org |
||
Linus Torvalds
|
d465bff130 |
perf tools changes for v6.1: 1st batch
- Add support for AMD on 'perf mem' and 'perf c2c', the kernel enablement patches went via tip. Example: $ sudo perf mem record -- -c 10000 ^C[ perf record: Woken up 227 times to write data ] [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ] $ sudo perf mem report -F mem,sample,snoop Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762 Memory access Samples Snoop N/A 700620 N/A L1 hit 126675 N/A L2 hit 424 N/A L3 hit 664 HitM L3 hit 10 N/A Local RAM hit 2 N/A Remote RAM (1 hop) hit 8558 N/A Remote Cache (1 hop) hit 3 N/A Remote Cache (1 hop) hit 2 HitM Remote Cache (2 hops) hit 10 HitM Remote Cache (2 hops) hit 6 N/A Uncached hit 4 N/A $ - "perf lock" improvements: - Add -E/--entries option to limit the number of entries to display, say to ask for just the top 5 contended locks. - Add -q/--quiet option to suppress header and debug messages. - Add a 'perf test' kernel lock contention entry to test 'perf lock'. - "perf lock contention" improvements: - Ask BPF's bpf_get_stackid() to skip some callchain entries. The ones closer to the tooling are bpf related and not that interesting, the ones calling the locking function are the ones we're interested in, example of a full, unskipped callstack: - Allow changing the callstack depth and number of entries to skip. 1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffbb8b8e75 bpf_trace_run2+0x35 0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb 0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5 0xffffffffbc1c26ff _raw_spin_lock+0x1f 0xffffffffbb841015 tick_do_update_jiffies64+0x25 0xffffffffbb8409ee tick_irq_enter+0x9e - Show full callstack in verbose mode (-v option), sometimes this is desirable instead of showing just one callstack entry. - Allow multiple time ranges in 'perf record --delay' to help in reducing the amount of data collected from hardware tracing (Intel PT, etc) when there is a rough idea of periods of time where events of interest take time. - Add Intel PT to record only decoder debug messages when error happens. - Improve layout of Intel PT man page. - Add new branch types: alignment, data and inst faults and arch specific ones, such as fiq, debug_halt, debug_exit, debug_inst and debug_data on arm64. Kernel enablement went thru the tip tree. - Fix 'perf probe' error log check in 'perf test' when no debuginfo is available. - Fix 'perf stat' aggregation mode logic, it should be looking at the CPU not at the core number. - Fix flags parsing in 'perf trace' filters. - Introduce compact encoding of CPU range encoding on perf.data, to avoid having a bitmap with all the CPUs. - Improvements to the 'perf stat' metrics, including adding "core_wide", and computing "smt" from the CPU topology. - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format, that allows tooling to ask for the precise number of lost samples for a given event. - Add 'addr' sort key to see just the address of sampled instructions: $ perf record -o- true | perf report -i- -s addr [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] # Samples: 12 of event 'cycles:u' # Event count (approx.): 252512 # # Overhead Address # ........ .................. 42.96% 0x7f96f08443d7 29.55% 0x7f96f0859b50 14.76% 0x7f96f0852e02 8.30% 0x7f96f0855028 4.43% 0xffffffff8de01087 perf annotate: Toggle full address <-> offset display - Add 'f' hotkey to the 'perf annotate' TUI interface when in 'disassembler output' mode ('o' hotkey) to toggle showing full virtual address or just the offset. - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for pre-existing threads, at the start of a 'perf record' session, speeding up that record startup phase. - Add a command line option to specify build ids in 'perf inject'. - Update JSON event files for the Intel alderlake, broadwell, broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids, skylake, skylakex, and tigerlake processors. - Update vendor JSON event files for the ARM Neoverse V1 and E1 platforms. - Add a 'perf test' entry for 'perf mem' where a struct has false sharing and this gets detected in the 'perf mem' output, tested with Intel, AMD and ARM64 systems. - Add a 'perf test' entry to test the resolution of java symbols, where an output like this is expected: 8.18% jshell jitted-50116-29.so [.] Interpreter 0.75% Thread-1 jitted-83602-1670.so [.] jdk.internal.jimage.BasicImageReader.getString(int) - Add tests for the ARM64 CoreSight hardware tracing feature, with specially crafted pureloop, memcpy, thread loop and unroll tread that then gets traced and the output compared with expected output. Documentation explaining it is also included. - Add per thread Intel PT 'perf test' entry to check that PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a mixture of per thread and per CPU events and mmaps, verify that this gets all recorded correctly. - Introduce pthread mutex wrappers to allow for building with clang's -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by" "lockable", "exclusive_lock_function", "exclusive_trylock_function", "exclusive_locks_required", and "no_thread_safety_analysis" compiler function attributes. - Fix empty version number when building outside of a git repo. - Improve feature detection display when multiple versions of a feature are present, such as for binutils libbfd, that has a mix of possible ways to detect according to the Linux distribution. Previously in some cases we had: Auto-detecting system features <SNIP> ... libbfd: [ on ] ... libbfd-liberty: [ on ] ... libbfd-liberty-z: [ on ] <SNIP> Now for this case we show just the main feature: Auto-detecting system features <SNIP> ... libbfd: [ on ] <SNIP> - Remove some unused structs, variables, macros, function prototypes and includes from various places. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY0CKuAAKCRCyPKLppCJ+ JywwAQDWLForEnEZNk92Fd3y342Lh9W/8z1V51dKK7XdY1cV6AD/Rn5L57v7k/yG mG5w2Fd1J/xBjlsL/BvNlimUD2tbkQA= =XPMg -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: - Add support for AMD on 'perf mem' and 'perf c2c', the kernel enablement patches went via tip. Example: $ sudo perf mem record -- -c 10000 ^C[ perf record: Woken up 227 times to write data ] [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ] $ sudo perf mem report -F mem,sample,snoop Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762 Memory access Samples Snoop N/A 700620 N/A L1 hit 126675 N/A L2 hit 424 N/A L3 hit 664 HitM L3 hit 10 N/A Local RAM hit 2 N/A Remote RAM (1 hop) hit 8558 N/A Remote Cache (1 hop) hit 3 N/A Remote Cache (1 hop) hit 2 HitM Remote Cache (2 hops) hit 10 HitM Remote Cache (2 hops) hit 6 N/A Uncached hit 4 N/A $ - "perf lock" improvements: - Add -E/--entries option to limit the number of entries to display, say to ask for just the top 5 contended locks. - Add -q/--quiet option to suppress header and debug messages. - Add a 'perf test' kernel lock contention entry to test 'perf lock'. - "perf lock contention" improvements: - Ask BPF's bpf_get_stackid() to skip some callchain entries. The ones closer to the tooling are bpf related and not that interesting, the ones calling the locking function are the ones we're interested in, example of a full, unskipped callstack: - Allow changing the callstack depth and number of entries to skip. 1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffbb8b8e75 bpf_trace_run2+0x35 0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb 0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5 0xffffffffbc1c26ff _raw_spin_lock+0x1f 0xffffffffbb841015 tick_do_update_jiffies64+0x25 0xffffffffbb8409ee tick_irq_enter+0x9e - Show full callstack in verbose mode (-v option), sometimes this is desirable instead of showing just one callstack entry. - Allow multiple time ranges in 'perf record --delay' to help in reducing the amount of data collected from hardware tracing (Intel PT, etc) when there is a rough idea of periods of time where events of interest take time. - Add Intel PT to record only decoder debug messages when error happens. - Improve layout of Intel PT man page. - Add new branch types: alignment, data and inst faults and arch specific ones, such as fiq, debug_halt, debug_exit, debug_inst and debug_data on arm64. Kernel enablement went thru the tip tree. - Fix 'perf probe' error log check in 'perf test' when no debuginfo is available. - Fix 'perf stat' aggregation mode logic, it should be looking at the CPU not at the core number. - Fix flags parsing in 'perf trace' filters. - Introduce compact encoding of CPU range encoding on perf.data, to avoid having a bitmap with all the CPUs. - Improvements to the 'perf stat' metrics, including adding "core_wide", and computing "smt" from the CPU topology. - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format, that allows tooling to ask for the precise number of lost samples for a given event. - Add 'addr' sort key to see just the address of sampled instructions: $ perf record -o- true | perf report -i- -s addr [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] # Samples: 12 of event 'cycles:u' # Event count (approx.): 252512 # # Overhead Address # ........ .................. 42.96% 0x7f96f08443d7 29.55% 0x7f96f0859b50 14.76% 0x7f96f0852e02 8.30% 0x7f96f0855028 4.43% 0xffffffff8de01087 perf annotate: Toggle full address <-> offset display - Add 'f' hotkey to the 'perf annotate' TUI interface when in 'disassembler output' mode ('o' hotkey) to toggle showing full virtual address or just the offset. - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for pre-existing threads, at the start of a 'perf record' session, speeding up that record startup phase. - Add a command line option to specify build ids in 'perf inject'. - Update JSON event files for the Intel alderlake, broadwell, broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids, skylake, skylakex, and tigerlake processors. - Update vendor JSON event files for the ARM Neoverse V1 and E1 platforms. - Add a 'perf test' entry for 'perf mem' where a struct has false sharing and this gets detected in the 'perf mem' output, tested with Intel, AMD and ARM64 systems. - Add a 'perf test' entry to test the resolution of java symbols, where an output like this is expected: 8.18% jshell jitted-50116-29.so [.] Interpreter 0.75% Thread-1 jitted-83602-1670.so [.] jdk.internal.jimage.BasicImageReader.getString(int) - Add tests for the ARM64 CoreSight hardware tracing feature, with specially crafted pureloop, memcpy, thread loop and unroll tread that then gets traced and the output compared with expected output. Documentation explaining it is also included. - Add per thread Intel PT 'perf test' entry to check that PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a mixture of per thread and per CPU events and mmaps, verify that this gets all recorded correctly. - Introduce pthread mutex wrappers to allow for building with clang's -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by" "lockable", "exclusive_lock_function", "exclusive_trylock_function", "exclusive_locks_required", and "no_thread_safety_analysis" compiler function attributes. - Fix empty version number when building outside of a git repo. - Improve feature detection display when multiple versions of a feature are present, such as for binutils libbfd, that has a mix of possible ways to detect according to the Linux distribution. Previously in some cases we had: Auto-detecting system features <SNIP> ... libbfd: [ on ] ... libbfd-liberty: [ on ] ... libbfd-liberty-z: [ on ] <SNIP> Now for this case we show just the main feature: Auto-detecting system features <SNIP> ... libbfd: [ on ] <SNIP> - Remove some unused structs, variables, macros, function prototypes and includes from various places. * tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits) perf script: Add missing fields in usage hint perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB perf mem/c2c: Avoid printing empty lines for unsupported events perf mem/c2c: Add load store event mappings for AMD perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO} perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout() perf test coresight: Add relevant documentation about ARM64 CoreSight testing perf test: Add git ignore for tmp and output files of ARM CoreSight tests perf test coresight: Add unroll thread test shell script perf test coresight: Add unroll thread test tool perf test coresight: Add thread loop test shell scripts perf test coresight: Add thread loop test tool perf test coresight: Add memcpy thread test shell script perf test coresight: Add memcpy thread test tool perf test: Add git ignore for perf data generated by the ARM CoreSight tests perf test: Add arm64 asm pureloop test shell script perf test: Add asm pureloop test tool ... |
||
Linus Torvalds
|
27bc50fc90 |
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam R. Howlett. An overlapping range-based tree for vmas. It it apparently slight more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com). This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU= =xfWx -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ... |
||
Linus Torvalds
|
d4013bc4d4 |
bitmap patches for v6.1-rc1
From Phil Auld: drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES From me: cpumask: cleanup nr_cpu_ids vs nr_cpumask_bits mess This series cleans that mess and adds new config FORCE_NR_CPUS that allows to optimize cpumask subsystem if the number of CPUs is known at compile-time. From me: lib: optimize find_bit() functions Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros. From me: lib/find: add find_nth_bit() Adds find_nth_bit(), which is ~70 times faster than bitcounting with for_each() loop: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; Also adds bitmap_weight_and() to let people replace this pattern: tmp = bitmap_alloc(nbits); bitmap_and(tmp, map1, map2, nbits); weight = bitmap_weight(tmp, nbits); bitmap_free(tmp); with a single bitmap_weight_and() call. From me: cpumask: repair cpumask_check() After switching cpumask to use nr_cpu_ids, cpumask_check() started generating many false-positive warnings. This series fixes it. From Valentin Schneider: bitmap,cpumask: Add for_each_cpu_andnot() and for_each_cpu_andnot() Extends the API with one more function and applies it in sched/core. -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmNBwmUACgkQsUSA/Tof vshPRwv+KlqnZlKtuSPgbo/Kgswworpi/7TqfnN9GWlb8AJ2uhjBKI3GFwv4TDow 7KV6wdKdXYLr4pktcIhWy3qLrT+bDDExfarHRo3QI1A1W42EJ+ZiUaGnQGcnVMzD 5q/K1YMJYq0oaesHEw5PVUh8mm6h9qRD8VbX1u+riW/VCWBj3bho9Dp4mffQ48Q6 hVy/SnMGgClQwNYp+sxkqYx38xUqUGYoU5MzeziUmoS6pZQh+4lF33MULnI3EKmc /ehXilPPtOV/Tm0RovDWFfm3rjNapV9FXHu8Ob2z/c+1A29EgXnE3pwrBDkAx001 TQrL9qbCANRDGPLzWQHw0dwFIaXvTdrSttCsfYYfU5hI4JbnJEe0Pqkaaohy7jqm r0dW/TlyOG5T+k8Kwdx9w9A+jKs8TbKKZ8HOaN8BpkXswVnpbzpQbj3TITZI4aeV 6YR4URBQ5UkrVLEXFXbrOzwjL2zqDdyNoBdTJmGLJ+5b/n0HHzmyMVkegNIwLLM3 GR7sMQae =Q/+F -----END PGP SIGNATURE----- Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux Pull bitmap updates from Yury Norov: - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld) - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me) This series cleans that mess and adds new config FORCE_NR_CPUS that allows to optimize cpumask subsystem if the number of CPUs is known at compile-time. - optimize find_bit() functions (me) Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros. - add find_nth_bit() (me) Adds find_nth_bit(), which is ~70 times faster than bitcounting with for_each() loop: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; Also adds bitmap_weight_and() to let people replace this pattern: tmp = bitmap_alloc(nbits); bitmap_and(tmp, map1, map2, nbits); weight = bitmap_weight(tmp, nbits); bitmap_free(tmp); with a single bitmap_weight_and() call. - repair cpumask_check() (me) After switching cpumask to use nr_cpu_ids, cpumask_check() started generating many false-positive warnings. This series fixes it. - Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin Schneider) Extends the API with one more function and applies it in sched/core. * tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits) sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot() lib/test_cpumask: Add for_each_cpu_and(not) tests cpumask: Introduce for_each_cpu_andnot() lib/find_bit: Introduce find_next_andnot_bit() cpumask: fix checking valid cpu range lib/bitmap: add tests for for_each() loops lib/find: optimize for_each() macros lib/bitmap: introduce for_each_set_bit_wrap() macro lib/find_bit: add find_next{,_and}_bit_wrap cpumask: switch for_each_cpu{,_not} to use for_each_bit() net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and} cpumask: add cpumask_nth_{,and,andnot} lib/bitmap: remove bitmap_ord_to_pos lib/bitmap: add tests for find_nth_bit() lib: add find_nth{,_and,_andnot}_bit() lib/bitmap: add bitmap_weight_and() lib/bitmap: don't call __bitmap_weight() in kernel code tools: sync find_bit() implementation lib/find_bit: optimize find_next_bit() functions lib/find_bit: create find_first_zero_bit_le() ... |
||
Ravi Bangoria
|
b7ddd38ccc |
tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
Two new fields for mem_lvl_num has been introduced: PERF_MEM_LVLNUM_IO and PERF_MEM_LVLNUM_CXL which are required to support perf mem/c2c on AMD platform. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20221006153946.7816-2-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Andrii Nakryiko
|
8a76145a2e |
bpf: explicitly define BPF_FUNC_xxx integer values
Historically enum bpf_func_id's BPF_FUNC_xxx enumerators relied on implicit sequential values being assigned by compiler. This is convenient, as new BPF helpers are always added at the very end, but it also has its downsides, some of them being: - with over 200 helpers now it's very hard to know what's each helper's ID, which is often important to know when working with BPF assembly (e.g., by dumping raw bpf assembly instructions with llvm-objdump -d command). it's possible to work around this by looking into vmlinux.h, dumping /sys/btf/kernel/vmlinux, looking at libbpf-provided bpf_helper_defs.h, etc. But it always feels like an unnecessary step and one should be able to quickly figure this out from UAPI header. - when backporting and cherry-picking only some BPF helpers onto older kernels it's important to be able to skip some enum values for helpers that weren't backported, but preserve absolute integer IDs to keep BPF helper IDs stable so that BPF programs stay portable across upstream and backported kernels. While neither problem is insurmountable, they come up frequently enough and are annoying enough to warrant improving the situation. And for the backporting the problem can easily go unnoticed for a while, especially if backport is done with people not very familiar with BPF subsystem overall. Anyways, it's easy to fix this by making sure that __BPF_FUNC_MAPPER macro provides explicit helper IDs. Unfortunately that would potentially break existing users that use UAPI-exposed __BPF_FUNC_MAPPER and are expected to pass macro that accepts only symbolic helper identifier (e.g., map_lookup_elem for bpf_map_lookup_elem() helper). As such, we need to introduce a new macro (___BPF_FUNC_MAPPER) which would specify both identifier and integer ID, but in such a way as to allow existing __BPF_FUNC_MAPPER be expressed in terms of new ___BPF_FUNC_MAPPER macro. And that's what this patch is doing. To avoid duplication and allow __BPF_FUNC_MAPPER stay *exactly* the same, ___BPF_FUNC_MAPPER accepts arbitrary "context" arguments, which can be used to pass any extra macros, arguments, and whatnot. In our case we use this to pass original user-provided macro that expects single argument and __BPF_FUNC_MAPPER is using it's own three-argument __BPF_FUNC_MAPPER_APPLY intermediate macro to impedance-match new and old "callback" macros. Once we resolve this, we use new ___BPF_FUNC_MAPPER to define enum bpf_func_id with explicit values. The other users of __BPF_FUNC_MAPPER in kernel (namely in kernel/bpf/disasm.c) are kept exactly the same both as demonstration that backwards compat works, but also to avoid unnecessary code churn. Note that new ___BPF_FUNC_MAPPER() doesn't forcefully insert comma between values, as that might not be appropriate in all possible cases where ___BPF_FUNC_MAPPER might be used by users. This doesn't reduce usability, as it's trivial to insert that comma inside "callback" macro. To validate all the manually specified IDs are exactly right, we used BTF to compare before and after values: $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > after.txt $ git stash # stach UAPI changes $ make -j90 ... re-building kernel without UAPI changes ... $ bpftool btf dump file ~/linux-build/default/vmlinux | rg bpf_func_id -A 211 > before.txt $ diff -u before.txt after.txt --- before.txt 2022-10-05 10:48:18.119195916 -0700 +++ after.txt 2022-10-05 10:46:49.446615025 -0700 @@ -1,4 +1,4 @@ -[14576] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211 +[9560] ENUM 'bpf_func_id' encoding=UNSIGNED size=4 vlen=211 'BPF_FUNC_unspec' val=0 'BPF_FUNC_map_lookup_elem' val=1 'BPF_FUNC_map_update_elem' val=2 As can be seen from diff above, the only thing that changed was resulting BTF type ID of ENUM bpf_func_id, not any of the enumerators, their names or integer values. The only other place that needed fixing was scripts/bpf_doc.py used to generate man pages and bpf_helper_defs.h header for libbpf and selftests. That script is tightly-coupled to exact shape of ___BPF_FUNC_MAPPER macro definition, so had to be trivially adapted. Cc: Quentin Monnet <quentin@isovalent.com> Reported-by: Andrea Terzolo <andrea.terzolo@polito.it> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221006042452.2089843-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Linus Torvalds
|
0326074ff4 |
Networking changes for 6.1.
Core ---- - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF --- - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols --------- - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API ---------- - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers ---------------------- - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers ------- - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmM7vtkACgkQMUZtbf5S Irvotg//dmh53rC+UMKO3OgOqPlSMnaqzbUdDEfN6mj4Mpox7Csb8zERVURHhBHY fvlXWsDgxmvgTebI5fvNC5+f1iW5xcqgJV2TWnNmDOKWwvQwb6qQfgixVmunvkpe IIukMXYt0dAf9bXeeEfbNXcCb85cPwB76stX0tMV6BX7osp3T0TL1fvFk0NJkL0j TeydLad/yAQtPb4TbeWYjNDoxPVDf0cVpUrevLGmWE88UMYmgTqPze+h1W5Wri52 bzjdLklY/4cgcIZClHQ6F9CeRWqEBxvujA5Hj/cwOcn/ptVVJWUGi7sQo3sYkoSs HFu+F8XsTec14kGNC0Ab40eVdqs5l/w8+E+4jvgXeKGOtVns8DwoiUIzqXpyty89 Ib04mffrwWNjFtHvo/kIsNwP05X2PGE9HUHfwsTUfisl/ASvMmQp7D7vUoqQC/4B AMVzT5qpjkmfBHYQQGuw8FxJhMeAOjC6aAo6censhXJyiUhIfleQsN0syHdaNb8q 9RZlhAgQoVb6ZgvBV8r8unQh/WtNZ3AopwifwVJld2unsE/UNfQy2KyqOWBES/zf LP9sfuX0JnmHn8s1BQEUMPU1jF9ZVZCft7nufJDL6JhlAL+bwZeEN4yCiAHOPZqE ymSLHI9s8yWZoNpuMWKrI9kFexVnQFKmA3+quAJUcYHNMSsLkL8= =Gsio -----END PGP SIGNATURE----- Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF: - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols: - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API: - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers: - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers: - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support" * tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits) eth: pse: add missing static inlines once: rename _SLOW to _SLEEPABLE net: pse-pd: add regulator based PSE driver dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller ethtool: add interface to interact with Ethernet Power Equipment net: mdiobus: search for PSE nodes by parsing PHY nodes. net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: add framework to support Ethernet PSE and PDs devices dt-bindings: net: phy: add PoDL PSE property net: marvell: prestera: Propagate nh state from hw to kernel net: marvell: prestera: Add neighbour cache accounting net: marvell: prestera: add stub handler neighbour events net: marvell: prestera: Add heplers to interact with fib_notifier_info net: marvell: prestera: Add length macros for prestera_ip_addr net: marvell: prestera: add delayed wq and flush wq on deinit net: marvell: prestera: Add strict cleanup of fib arbiter net: marvell: prestera: Add cleanup of allocated fib_nodes net: marvell: prestera: Add router nexthops ABI eth: octeon: fix build after netif_napi_add() changes net/mlx5: E-Switch, Return EBUSY if can't get mode lock ... |
||
Anshuman Khandual
|
fb42f8b729 |
perf branch: Add PERF_BR_NEW_ARCH_[N] map for BRBE on arm64 platform
This updates the perf tool with arch specific branch type classification used for BRBE on arm64 platform as added in the kernel earlier. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220824044822.70230-9-anshuman.khandual@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Anshuman Khandual
|
bcb96ce6d2 |
perf branch: Add branch privilege information request flag
This updates the perf tools with branch privilege information request flag i.e PERF_SAMPLE_BRANCH_PRIV_SAVE that has been added earlier in the kernel. This also updates 'perf record' documentation, branch_modes[], and generic branch privilege level enumeration as added earlier in the kernel. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220824044822.70230-8-anshuman.khandual@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Anshuman Khandual
|
0ddea8e2a0 |
perf branch: Extend branch type classification
This updates the perf tool with generic branch type classification with new ABI extender place holder i.e PERF_BR_EXTEND_ABI, the new 4 bit branch type field i.e perf_branch_entry.new_type, new generic page fault related branch types and some arch specific branch types as added earlier in the kernel. Committer note: Add an extra entry to the branch_type_name array to cope with PERF_BR_EXTEND_ABI, to address build warnings on some compiler/systems, like: 75 8.89 ubuntu:20.04-x-powerpc64el : FAIL gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1~20.04) inlined from 'branch_type_stat_display' at util/branch.c:152:4: /usr/powerpc64le-linux-gnu/include/bits/stdio2.h💯10: error: '%8s' directive argument is null [-Werror=format-overflow=] 100 | return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 101 | __va_arg_pack ()); | ~~~~~~~~~~~~~~~~~ Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220824044822.70230-7-anshuman.khandual@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Anshuman Khandual
|
1c96b6e45f |
perf branch: Add system error and not in transaction branch types
This updates the perf tool with generic branch type classification with two new branch types i.e system error (PERF_BR_SERROR) and not in transaction (PERF_BR_NO_TX) which got updated earlier in the kernel. This also updates corresponding branch type strings in branch_type_name(). Committer notes: At perf tools merge time this is only on PeterZ's tree, at: git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core So for testing one has to build a kernel with that branch, then test the tooling side from: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220824044822.70230-6-anshuman.khandual@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jakub Kicinski
|
e52f7c1ddf |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in the left-over fixes before the net-next pull-request. Conflicts: drivers/net/ethernet/mediatek/mtk_ppe.c |
||
Linus Torvalds
|
8aebac8293 |
Rust introduction for v6.1-rc1
The initial support of Rust-for-Linux comes in roughly 4 areas: - Kernel internals (kallsyms expansion for Rust symbols, %pA format) - Kbuild infrastructure (Rust build rules and support scripts) - Rust crates and bindings for initial minimum viable build - Rust kernel documentation and samples Rust support has been in linux-next for a year and a half now, and the short log doesn't do justice to the number of people who have contributed both to the Linux kernel side but also to the upstream Rust side to support the kernel's needs. Thanks to these 173 people, and many more, who have been involved in all kinds of ways: Miguel Ojeda, Wedson Almeida Filho, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Andreas Hindborg, Adam Bratschi-Kaye, Benno Lossin, Maciej Falkowski, Finn Behrens, Sven Van Asbroeck, Asahi Lina, FUJITA Tomonori, John Baublitz, Wei Liu, Geoffrey Thomas, Philip Herron, Arthur Cohen, David Faust, Antoni Boucher, Philip Li, Yujie Liu, Jonathan Corbet, Greg Kroah-Hartman, Paul E. McKenney, Josh Triplett, Kent Overstreet, David Gow, Alice Ryhl, Robin Randhawa, Kees Cook, Nick Desaulniers, Matthew Wilcox, Linus Walleij, Joe Perches, Michael Ellerman, Petr Mladek, Masahiro Yamada, Arnaldo Carvalho de Melo, Andrii Nakryiko, Konstantin Shelekhin, Rasmus Villemoes, Konstantin Ryabitsev, Stephen Rothwell, Andy Shevchenko, Sergey Senozhatsky, John Paul Adrian Glaubitz, David Laight, Nathan Chancellor, Jonathan Cameron, Daniel Latypov, Shuah Khan, Brendan Higgins, Julia Lawall, Laurent Pinchart, Geert Uytterhoeven, Akira Yokosawa, Pavel Machek, David S. Miller, John Hawley, James Bottomley, Arnd Bergmann, Christian Brauner, Dan Robertson, Nicholas Piggin, Zhouyi Zhou, Elena Zannoni, Jose E. Marchesi, Leon Romanovsky, Will Deacon, Richard Weinberger, Randy Dunlap, Paolo Bonzini, Roland Dreier, Mark Brown, Sasha Levin, Ted Ts'o, Steven Rostedt, Jarkko Sakkinen, Michal Kubecek, Marco Elver, Al Viro, Keith Busch, Johannes Berg, Jan Kara, David Sterba, Connor Kuehl, Andy Lutomirski, Andrew Lunn, Alexandre Belloni, Peter Zijlstra, Russell King, Eric W. Biederman, Willy Tarreau, Christoph Hellwig, Emilio Cobos Álvarez, Christian Poveda, Mark Rousskov, John Ericson, TennyZhuang, Xuanwo, Daniel Paoliello, Manish Goregaokar, comex, Josh Stone, Stephan Sokolow, Philipp Krones, Guillaume Gomez, Joshua Nelson, Mats Larsen, Marc Poulhiès, Samantha Miller, Esteban Blanc, Martin Schmidt, Martin Rodriguez Reboredo, Daniel Xu, Viresh Kumar, Bartosz Golaszewski, Vegard Nossum, Milan Landaverde, Dariusz Sosnowski, Yuki Okushi, Matthew Bakhtiari, Wu XiangCheng, Tiago Lam, Boris-Chengbiao Zhou, Sumera Priyadarsini, Viktor Garske, Niklas Mohrin, Nándor István Krácser, Morgan Bartlett, Miguel Cano, Léo Lanteri Thauvin, Julian Merkle, Andreas Reindl, Jiapeng Chong, Fox Chen, Douglas Su, Antonio Terceiro, SeongJae Park, Sergio González Collado, Ngo Iok Ui (Wu Yu Wei), Joshua Abraham, Milan, Daniel Kolsoi, ahomescu, Manas, Luis Gerhorst, Li Hongyu, Philipp Gesang, Russell Currey, Jalil David Salamé Messina, Jon Olson, Raghvender, Angelos, Kaviraj Kanagaraj, Paul Römer, Sladyn Nunes, Mauro Baladés, Hsiang-Cheng Yang, Abhik Jain, Hongyu Li, Sean Nash, Yuheng Su, Peng Hao, Anhad Singh, Roel Kluin, Sara Saa, Geert Stappers, Garrett LeSage, IFo Hancroft, and Linus Torvalds. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmM4WcIWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlGrD/93HbmxjNi/hwdWF5UdWV1/W0kJ bSTh9JsNtN9atQGEUwxePBjrtxHE75lxSL0RJ+sWvaJ7vR3iv2qys+cEgU0ePrgX INZ3bvHAGgvPG1b0R6VxmakksHq1BdCDbCT3Ft5lSNxB0uQBi95KgjtR0lCH/NUl eoZnGJ0ZbKs5KpbzFqOjM2gmJ51geZppnfNFmbKOb3lSUpPQqhZLPDCzweE57GNo e2vcMoY4daVaSUxmo01TSEphrM5IjDxp5rs09+aeovfmpbeoiz33siyGiAxyM7CI +Ybxl+bBnyqXLadjbs9VvvtYzASFZgmrQdwIQbY8j/sqsw34jmZarOwa5iUVmo+Q 2w1CDDNLMG3XpI/PdnUklFRIJg1uYCM+OXgZY2MFFqzbjoik/zFv2qFWTp1F5+XV DdLxoN9quBPDSVDFQjAZPsyCD/pSRfiJYh9s7BdlhUPL6rk9uLIgZyZuPqy3kWXn 2Z02lWJpiHUtTaICdUDyNPFzTggDHEfY2DvmuedXpsyhlMkCdtFS5zoo/evl8pb6 xUV7qdfpjyLyTLmLWjYEVRO6DJJuFQWMK5Qpqn6O0y3wch3XV+At5QDk2TE2WMvB cYwd9nCqcMs7J0HrdoDmtLwew1jrLd1xefqDgD0zd6B/+Dk9W4gFD69Stmtarg7d KGRvH0wnL0keMxy31w== =zz09 -----END PGP SIGNATURE----- Merge tag 'rust-v6.1-rc1' of https://github.com/Rust-for-Linux/linux Pull Rust introductory support from Kees Cook: "The tree has a recent base, but has fundamentally been in linux-next for a year and a half[1]. It's been updated based on feedback from the Kernel Maintainer's Summit, and to gain recent Reviewed-by: tags. Miguel is the primary maintainer, with me helping where needed/wanted. Our plan is for the tree to switch to the standard non-rebasing practice once this initial infrastructure series lands. The contents are the absolute minimum to get Rust code building in the kernel, with many more interfaces[2] (and drivers - NVMe[3], 9p[4], M1 GPU[5]) on the way. The initial support of Rust-for-Linux comes in roughly 4 areas: - Kernel internals (kallsyms expansion for Rust symbols, %pA format) - Kbuild infrastructure (Rust build rules and support scripts) - Rust crates and bindings for initial minimum viable build - Rust kernel documentation and samples Rust support has been in linux-next for a year and a half now, and the short log doesn't do justice to the number of people who have contributed both to the Linux kernel side but also to the upstream Rust side to support the kernel's needs. Thanks to these 173 people, and many more, who have been involved in all kinds of ways: Miguel Ojeda, Wedson Almeida Filho, Alex Gaynor, Boqun Feng, Gary Guo, Björn Roy Baron, Andreas Hindborg, Adam Bratschi-Kaye, Benno Lossin, Maciej Falkowski, Finn Behrens, Sven Van Asbroeck, Asahi Lina, FUJITA Tomonori, John Baublitz, Wei Liu, Geoffrey Thomas, Philip Herron, Arthur Cohen, David Faust, Antoni Boucher, Philip Li, Yujie Liu, Jonathan Corbet, Greg Kroah-Hartman, Paul E. McKenney, Josh Triplett, Kent Overstreet, David Gow, Alice Ryhl, Robin Randhawa, Kees Cook, Nick Desaulniers, Matthew Wilcox, Linus Walleij, Joe Perches, Michael Ellerman, Petr Mladek, Masahiro Yamada, Arnaldo Carvalho de Melo, Andrii Nakryiko, Konstantin Shelekhin, Rasmus Villemoes, Konstantin Ryabitsev, Stephen Rothwell, Andy Shevchenko, Sergey Senozhatsky, John Paul Adrian Glaubitz, David Laight, Nathan Chancellor, Jonathan Cameron, Daniel Latypov, Shuah Khan, Brendan Higgins, Julia Lawall, Laurent Pinchart, Geert Uytterhoeven, Akira Yokosawa, Pavel Machek, David S. Miller, John Hawley, James Bottomley, Arnd Bergmann, Christian Brauner, Dan Robertson, Nicholas Piggin, Zhouyi Zhou, Elena Zannoni, Jose E. Marchesi, Leon Romanovsky, Will Deacon, Richard Weinberger, Randy Dunlap, Paolo Bonzini, Roland Dreier, Mark Brown, Sasha Levin, Ted Ts'o, Steven Rostedt, Jarkko Sakkinen, Michal Kubecek, Marco Elver, Al Viro, Keith Busch, Johannes Berg, Jan Kara, David Sterba, Connor Kuehl, Andy Lutomirski, Andrew Lunn, Alexandre Belloni, Peter Zijlstra, Russell King, Eric W. Biederman, Willy Tarreau, Christoph Hellwig, Emilio Cobos Álvarez, Christian Poveda, Mark Rousskov, John Ericson, TennyZhuang, Xuanwo, Daniel Paoliello, Manish Goregaokar, comex, Josh Stone, Stephan Sokolow, Philipp Krones, Guillaume Gomez, Joshua Nelson, Mats Larsen, Marc Poulhiès, Samantha Miller, Esteban Blanc, Martin Schmidt, Martin Rodriguez Reboredo, Daniel Xu, Viresh Kumar, Bartosz Golaszewski, Vegard Nossum, Milan Landaverde, Dariusz Sosnowski, Yuki Okushi, Matthew Bakhtiari, Wu XiangCheng, Tiago Lam, Boris-Chengbiao Zhou, Sumera Priyadarsini, Viktor Garske, Niklas Mohrin, Nándor István Krácser, Morgan Bartlett, Miguel Cano, Léo Lanteri Thauvin, Julian Merkle, Andreas Reindl, Jiapeng Chong, Fox Chen, Douglas Su, Antonio Terceiro, SeongJae Park, Sergio González Collado, Ngo Iok Ui (Wu Yu Wei), Joshua Abraham, Milan, Daniel Kolsoi, ahomescu, Manas, Luis Gerhorst, Li Hongyu, Philipp Gesang, Russell Currey, Jalil David Salamé Messina, Jon Olson, Raghvender, Angelos, Kaviraj Kanagaraj, Paul Römer, Sladyn Nunes, Mauro Baladés, Hsiang-Cheng Yang, Abhik Jain, Hongyu Li, Sean Nash, Yuheng Su, Peng Hao, Anhad Singh, Roel Kluin, Sara Saa, Geert Stappers, Garrett LeSage, IFo Hancroft, and Linus Torvalds" Link: https://lwn.net/Articles/849849/ [1] Link: https://github.com/Rust-for-Linux/linux/commits/rust [2] Link: |
||
Jakub Kicinski
|
ad061cf422 |
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says: ==================== pull-request: bpf 2022-10-03 We've added 10 non-merge commits during the last 23 day(s) which contain a total of 14 files changed, 130 insertions(+), 69 deletions(-). The main changes are: 1) Fix dynptr helper API to gate behind CAP_BPF given it was not intended for unprivileged BPF programs, from Kumar Kartikeya Dwivedi. 2) Fix need_wakeup flag inheritance from umem buffer pool for shared xsk sockets, from Jalal Mostafa. 3) Fix truncated last_member_type_id in btf_struct_resolve() which had a wrong storage type, from Lorenz Bauer. 4) Fix xsk back-pressure mechanism on tx when amount of produced descriptors to CQ is lower than what was grabbed from xsk tx ring, from Maciej Fijalkowski. 5) Fix wrong cgroup attach flags being displayed to effective progs, from Pu Lehui. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: xsk: Inherit need_wakeup flag for shared sockets bpf: Gate dynptr API behind CAP_BPF selftests/bpf: Adapt cgroup effective query uapi change bpftool: Fix wrong cgroup attach flags being assigned to effective progs bpf, cgroup: Reject prog_attach_flags array when effective query bpf: Ensure correct locking around vulnerable function find_vpid() bpf: btf: fix truncated last_member_type_id in btf_struct_resolve selftests/xsk: Add missing close() on netns fd xsk: Fix backpressure mechanism on Tx MAINTAINERS: Add include/linux/tnum.h to BPF CORE ==================== Link: https://lore.kernel.org/r/20221003201957.13149-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Matthias Goergens
|
710bb68c2e |
hugetlb_encode.h: fix undefined behaviour (34 << 26)
Left-shifting past the size of your datatype is undefined behaviour in C. The literal 34 gets the type `int`, and that one is not big enough to be left shifted by 26 bits. An `unsigned` is long enough (on any machine that has at least 32 bits for their ints.) For uniformity, we mark all the literals as unsigned. But it's only really needed for HUGETLB_FLAG_ENCODE_16GB. Thanks to Randy Dunlap for an initial review and suggestion. Link: https://lkml.kernel.org/r/20220905031904.150925-1-matthias.goergens@gmail.com Signed-off-by: Matthias Goergens <matthias.goergens@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Jakub Kicinski
|
a08d97a193 |
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-03 We've added 143 non-merge commits during the last 27 day(s) which contain a total of 151 files changed, 8321 insertions(+), 1402 deletions(-). The main changes are: 1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu. 2) Add support for struct-based arguments for trampoline based BPF programs, from Yonghong Song. 3) Fix entry IP for kprobe-multi and trampoline probes under IBT enabled, from Jiri Olsa. 4) Batch of improvements to veristat selftest tool in particular to add CSV output, a comparison mode for CSV outputs and filtering, from Andrii Nakryiko. 5) Add preparatory changes needed for the BPF core for upcoming BPF HID support, from Benjamin Tissoires. 6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program types, from Daniel Xu. 7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler. 8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer / single-kernel-consumer semantics for BPF ring buffer, from David Vernet. 9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF hashtab's bucket lock, from Hou Tao. 10) Allow creating an iterator that loops through only the resources of one task/thread instead of all, from Kui-Feng Lee. 11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi. 12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT entry which is not yet inserted, from Lorenzo Bianconi. 13) Remove invalid recursion check for struct_ops for TCP congestion control BPF programs, from Martin KaFai Lau. 14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu. 15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa. 16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen. 17) Add invocation for cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu. 18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta. 19) Various libbpf fixes and cleanups including a libbpf NULL pointer deref, from Xin Liu. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits) net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c Documentation: bpf: Add implementation notes documentations to table of contents bpf, docs: Delete misformatted table. selftests/xsk: Fix double free bpftool: Fix error message of strerror libbpf: Fix overrun in netlink attribute iteration selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged" samples/bpf: Fix typo in xdp_router_ipv4 sample bpftool: Remove unused struct event_ring_info bpftool: Remove unused struct btf_attach_point bpf, docs: Add TOC and fix formatting. bpf, docs: Add Clang note about BPF_ALU bpf, docs: Move Clang notes to a separate file bpf, docs: Linux byteswap note bpf, docs: Move legacy packet instructions to a separate file selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline ... ==================== Link: https://lore.kernel.org/r/20221003194915.11847-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
dda0ba40da |
nolibc pull request for v6.1
This pull request provides nolibc updates, most notably greatly improved testing. These tests are located in tools/testing/selftests/nolibc. The output of "make help" is as follows: Supported targets under selftests/nolibc: all call the "run" target below help this help sysroot create the nolibc sysroot here (uses $ARCH) nolibc-test build the executable (uses $CC and $CROSS_COMPILE) initramfs prepare the initramfs with nolibc-test defconfig create a fresh new default config (uses $ARCH) kernel (re)build the kernel with the initramfs (uses $ARCH) run runs the kernel in QEMU after building it (uses $ARCH, $TEST) rerun runs a previously prebuilt kernel in QEMU (uses $ARCH, $TEST) clean clean the sysroot, initramfs, build and output files The output file is "run.out". Test ranges may be passed using $TEST. Currently using the following variables: ARCH = x86 CROSS_COMPILE = CC = gcc OUTPUT = /home/git/linux-rcu/tools/testing/selftests/nolibc/ TEST = QEMU_ARCH = x86_64 [determined from $ARCH] IMAGE_NAME = bzImage [determined from $ARCH] The output of a successful x86 "make run" is currently as follows, with kernel build output omitted: $ make run 71 test(s) passed. $ -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmM5AeATHHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jJCLD/wIps8OlIvNMlvT0Bwmi1gG6kTdJrXb Cj495z9F1P8c1x2nZBVuwPM05KM4SUvMLBzkND8le/AqrDP2Vc9WurEjnx6aYbjw aNrWeeVOYF2ykROpEyolF92bV0tu2aZEYmZ12Rxb3ybKYW8wf5t2M6JbU9VvLATB TNPov6cBUjJLqF5AVf4pJ3FNjGJH392NyTzGuKFYN4kK8XGPSxKFfpb8jJX/Jnr/ OQPZZurHclgIJzWYwjWpWBY5NBEMSUI8K2KqQIFzeVdNGpWmAyAyciCDG3zD+Rp1 GtcHrPEgvBQnPd5o89eZonMPHkIC5kOVU3Ebwzzx30GWQO8cseCZCb9foJCu918q wbJYPvopDw1Kd7NtltTzWj/xWkINgBcqdki5bEtcOW8i71RX8tgzUVx/tz771Vok /j21se9Svwi8ZnAY+dTxMNdy0Jd7eI0dJO6fMzagrQamDKZzNaO50WjiCW53/Eln JSttwSneWJ190PJ5ty6lNQ6OLGuILSXWhbYzRplgHiEcSwjtpxccgMRkiNQlQiow hXRNKQU/A7q3b5HPpf/kdQFGEk/aqFNDTpTuz1JCsCY2f9WQ6qz5stQuUKfFxyVh YD82JBI1DHtucs5dD/4ZbparHrJGG/NsHoe61wZYpXVrcJWfB0bMiie3zQ1cGoWN t6lw39lNotQbsw== =wqBE -----END PGP SIGNATURE----- Merge tag 'nolibc.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc updates from Paul McKenney: "Most notably greatly improved testing. These tests are located in tools/testing/selftests/nolibc. The output of "make help" is as follows: Supported targets under selftests/nolibc: all call the "run" target below help this help sysroot create the nolibc sysroot here (uses $ARCH) nolibc-test build the executable (uses $CC and $CROSS_COMPILE) initramfs prepare the initramfs with nolibc-test defconfig create a fresh new default config (uses $ARCH) kernel (re)build the kernel with the initramfs (uses $ARCH) run runs the kernel in QEMU after building it (uses $ARCH, $TEST) rerun runs a previously prebuilt kernel in QEMU (uses $ARCH, $TEST) clean clean the sysroot, initramfs, build and output files The output file is "run.out". Test ranges may be passed using $TEST. Currently using the following variables: ARCH = x86 CROSS_COMPILE = CC = gcc OUTPUT = /home/git/linux-rcu/tools/testing/selftests/nolibc/ TEST = QEMU_ARCH = x86_64 [determined from $ARCH] IMAGE_NAME = bzImage [determined from $ARCH] The output of a successful x86 "make run" is currently as follows, with kernel build output omitted: $ make run 71 test(s) passed." * tag 'nolibc.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: selftests/nolibc: Avoid generated files being committed selftests/nolibc: add a "help" target selftests/nolibc: "sysroot" target installs a local copy of the sysroot selftests/nolibc: add a "run" target to start the kernel in QEMU selftests/nolibc: add a "defconfig" target selftests/nolibc: add a "kernel" target to build the kernel with the initramfs selftests/nolibc: support glibc as well selftests/nolibc: condition some tests on /proc existence selftests/nolibc: recreate and populate /dev and /proc if missing selftests/nolibc: on x86, support exiting with isa-debug-exit selftests/nolibc: exit with poweroff on success when getpid() == 1 selftests/nolibc: add a few tests for some libc functions selftests/nolibc: implement a few tests for various syscalls selftests/nolibc: support a test definition format selftests/nolibc: add basic infrastructure to ease creation of nolibc tests tools/nolibc: make sys_mmap() automatically use the right __NR_mmap definition tools/nolibc: fix build warning in sys_mmap() when my_syscall6 is not defined tools/nolibc: make argc 32-bit in riscv startup code |
||
Jakub Kicinski
|
accc3b4a57 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Kui-Feng Lee
|
21fb6f2aa3 |
bpf: Handle bpf_link_info for the parameterized task BPF iterators.
Add new fields to bpf_link_info that users can query it through bpf_obj_get_info_by_fd(). Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-3-kuifeng@fb.com |
||
Kui-Feng Lee
|
f0d74c4da1 |
bpf: Parameterize task iterators.
Allow creating an iterator that loops through resources of one thread/process. People could only create iterators to loop through all resources of files, vma, and tasks in the system, even though they were interested in only the resources of a specific task or process. Passing the additional parameters, people can now create an iterator to go through all resources or only the resources of a task. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-2-kuifeng@fb.com |
||
Miguel Ojeda
|
b8a94bfb33 |
kallsyms: increase maximum kernel symbol length to 512
Rust symbols can become quite long due to namespacing introduced by modules, types, traits, generics, etc. For instance, the following code: pub mod my_module { pub struct MyType; pub struct MyGenericType<T>(T); pub trait MyTrait { fn my_method() -> u32; } impl MyTrait for MyGenericType<MyType> { fn my_method() -> u32 { 42 } } } generates a symbol of length 96 when using the upcoming v0 mangling scheme: _RNvXNtCshGpAVYOtgW1_7example9my_moduleINtB2_13MyGenericTypeNtB2_6MyTypeENtB2_7MyTrait9my_method At the moment, Rust symbols may reach up to 300 in length. Setting 512 as the maximum seems like a reasonable choice to keep some headroom. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com> Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com> Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Co-developed-by: Gary Guo <gary@garyguo.net> Signed-off-by: Gary Guo <gary@garyguo.net> Co-developed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org> |
||
Christophe JAILLET
|
73dfe93ea1 |
headers: Remove some left-over license text
Remove some left-over from commit
|
||
Jiri Olsa
|
0e253f7e55 |
bpf: Return value in kprobe get_func_ip only for entry address
Changing return value of kprobe's version of bpf_get_func_ip to return zero if the attach address is not on the function's entry point. For kprobes attached in the middle of the function we can't easily get to the function address especially now with the CONFIG_X86_KERNEL_IBT support. If user cares about current IP for kprobes attached within the function body, they can get it with PT_REGS_IP(ctx). Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Liam R. Howlett
|
cc86e0c2f3 |
radix tree test suite: add support for slab bulk APIs
Add support for kmem_cache_free_bulk() and kmem_cache_alloc_bulk() to the radix tree test suite. Link: https://lkml.kernel.org/r/20220906194824.2110408-6-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |