Patch series "Speed up mremap on large regions", v4.
mremap time can be optimized by moving entries at the PMD/PUD level if the
source and destination addresses are PMD/PUD-aligned and PMD/PUD-sized.
Enable moving at the PMD and PUD levels on arm64 and x86. Other
architectures where this type of move is supported and known to be safe
can also opt-in to these optimizations by enabling HAVE_MOVE_PMD and
HAVE_MOVE_PUD.
Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
region on x86 and arm64:
- HAVE_MOVE_PMD is already enabled on x86 : N/A
- Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
- Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
- Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
give a total of ~150x speed up on arm64.
This patch (of 4):
Test mremap on regions of various sizes and alignments and validate data
after remapping. Also provide total time for remapping the region which
is useful for performance comparison of the mremap optimizations that move
pages at the PMD/PUD levels if HAVE_MOVE_PMD and/or HAVE_MOVE_PUD are
enabled.
Link: https://lkml.kernel.org/r/20201014005320.2233162-1-kaleshsingh@google.com
Link: https://lkml.kernel.org/r/20201014005320.2233162-2-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Hassan Naveed <hnaveed@wavecomp.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Jia He <justin.he@arm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Each invocation of userfaultfd for "anon" and "shmem" was taking about
6.5 sec to run, contributing to an overall run time of about 22 sec for
run_vmtests.sh.
Reduce the size and bounce input values to the userfaultfd invocation
within run_vmtests.sh, enough to get each invocation down to about 1.0
sec. This should still provide a reasonable smoke test, while staying
within a nominal time budget of around 1 second or so per test. And this
brings the overall running time of run_vmtests.sh down to 11 second.
Link: https://lkml.kernel.org/r/20201026064021.3545418-10-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HMM selftests are incredibly useful, but they are only effective if people
actually build and run them. All the other tests in selftests/vm can be
built with very standard, always-available libraries: libpthread, librt.
The hmm-tests.c program, on the other hand, requires something that is
(much) less readily available: libhugetlbfs. And so the build will
typically fail for many developers.
A simple attempt to install libhugetlbfs will also run into complications
on some common distros these days: Fedora and Arch Linux (yes, Arch AUR
has it, but that's fragile, as always with AUR). The library is not
maintained actively enough at the moment, for distros to deal with it. I
had to build it from source, for Fedora, and that didn't go too smoothly
either.
It turns out that, out of 21 tests in hmm-tests.c, only 2 actually require
functionality from libhugetlbfs. Therefore, if libhugetlbfs is missing,
simply ifdef those two tests out and allow the developer to at least have
the other 19 tests, if they don't want to pause to work through the above
issues. Also issue a warning, so that it's clear that there is an
imperfection in the build.
In order to do that, a tiny shell script (check_config.sh) runs a quick
compile (not link, that's too prone to false failures with library paths),
and basically, if the compiler doesn't find hugetlbfs.h in its standard
locations, then the script concludes that libhugetlbfs is not available.
The output is in two files, one for inclusion in hmm-test.c
(local_config.h), and one for inclusion in the Makefile (local_config.mk).
Link: https://lkml.kernel.org/r/20201026064021.3545418-9-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Run benchmarks on the _fast variants of gup and pup, as originally
intended.
Run the new gup_test sub-test: dump pages. In addition to exercising the
dump_page() call, it also demonstrates the various options you can use to
specify which pages to dump, and how.
Link: https://lkml.kernel.org/r/20201026064021.3545418-8-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For quite a while, I was doing a quick hack to gup_test.c (previously,
gup_benchmark.c) whenever I wanted to try out my changes to dump_page().
This makes that hack unnecessary, and instead allows anyone to easily get
the same coverage from a user space program. That saves a lot of time
because you don't have to change the kernel, in order to test different
pages and options.
The new sub-test takes advantage of the existing gup_test infrastructure,
which already provides a simple user space program, some allocated user
space pages, an ioctl call, pinning of those pages (via either
get_user_pages or pin_user_pages) and a corresponding kernel-side test
invocation. There's not much more required, mainly just a couple of
inputs from the user.
In fact, the new test re-uses the existing command line options in order
to get various helpful combinations (THP or normal, _fast or slow gup, gup
vs. pup, and more).
New command line options are: which pages to dump, and what type of
"get/pin" to use.
In order to figure out which pages to dump, the logic is:
* If the user doesn't specify anything, the page 0 (the first page in
the address range that the program sets up for testing) is dumped.
* Or, the user can type up to 8 page indices anywhere on the command
line. If you type more than 8, then it uses the first 8 and ignores the
remaining items.
For example:
./gup_test -ct -F 1 0 19 0x1000
Meaning:
-c: dump pages sub-test
-t: use THP pages
-F 1: use pin_user_pages() instead of get_user_pages()
0 19 0x1000: dump pages 0, 19, and 4096
Link: https://lkml.kernel.org/r/20201026064021.3545418-7-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Therefore, some minor cleanup and improvements are in order:
1. Rename the other items appropriately.
2. Stop reporting timing information on the non-benchmark items. It's
still being recorded and is available, but there's no point in
cluttering up the report with data that no one reasonably needs to
check.
3. Don't do iterations, for non-benchmark items.
4. Print out a shorter, more appropriate report for the non-benchmark
tests.
5. Add the command that was run, to the report. This really helps, as
there are quite a lot of options now.
6. Use a larger integer type for cmd, now that it's being compared
Otherwise it doesn't work, because in this case cmd is about 3 billion,
which is the perfect size for problems with signed vs unsigned int.
Link: https://lkml.kernel.org/r/20201026064021.3545418-6-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few cleanups that don't deserve separate patches, but that also should
not clutter up other functional changes:
1. Remove an unnecessary #include <prctl.h>
2. Restore the sorted order of TEST_GEN_FILES.
3. Add -lpthread to the common LDLIBS, as it is harmless and several
tests use it. This gets rid of one special rule already.
Link: https://lkml.kernel.org/r/20201026064021.3545418-5-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename to *.sh, in order to match the conventions of all of the other
items in selftest/vm.
The only reason not to use a .sh suffix a shell script like this, might be
to make it look more like a normal program, but that's not an issue here.
Link: https://lkml.kernel.org/r/20201026064021.3545418-4-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid the need to copy-paste the gup_test ioctl commands and the struct
gup_test definition, between the kernel and the user space application, by
providing a new header file for these. This allows easier and safer
adding of new ioctl calls, as well as reducing the overall line count.
Details: The header file has to be able to compile independently, because
of the arguably unfortunate way that the Makefile is written: the Makefile
tries to build all of its prerequisites, when really it should be only
building the .c files, and leaving the other prerequisites (LOCAL_HDRS) as
pure dependencies.
That Makefile limitation is probably not worth fixing, but it explains why
one of the includes had to be moved into the new header file.
Also: simplify the ioctl struct (struct gup_test), by deleting the unused
__expansion[10] field. This sort of thing is what you might see in a
stable ABI, but this low-level, kernel-developer-oriented selftests/vm
system is very much not subject to ABI stability. So "expansion" and
"reserved" fields are unnecessary here.
Link: https://lkml.kernel.org/r/20201026064021.3545418-3-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v3.
Summary: This series provides two main things, and a number of smaller
supporting goodies. The two main points are:
1) Add a new sub-test to gup_test, which in turn is a renamed version
of gup_benchmark. This sub-test allows nicer testing of dump_pages(),
at least on user-space pages.
For quite a while, I was doing a quick hack to gup_test.c whenever I
wanted to try out changes to dump_page(). Then Matthew Wilcox asked me
what I meant when I said "I used my dump_page() unit test", and I
realized that it might be nice to check in a polished up version of
that.
Details about how it works and how to use it are in the commit
description for patch #6 ("selftests/vm: gup_test: introduce the
dump_pages() sub-test").
2) Fixes a limitation of hmm-tests: these tests are incredibly useful,
but only if people actually build and run them. And it turns out that
libhugetlbfs is a little too effective at throwing a wrench in the
works, there. So I've added a little configuration check that removes
just two of the 21 hmm-tests, if libhugetlbfs is not available.
Further details in the commit description of patch #8
("selftests/vm: hmm-tests: remove the libhugetlbfs dependency").
Other smaller things that this series does:
a) Remove code duplication by creating gup_test.h.
b) Clear up the sub-test organization, and their invocation within
run_vmtests.sh.
c) Other minor assorted improvements.
[1] v2 is here:
https://lore.kernel.org/linux-doc/20200929212747.251804-1-jhubbard@nvidia.com/
[2] https://lore.kernel.org/r/CAHk-=wgh-TMPHLY3jueHX7Y2fWh3D+nMBqVS__AZm6-oorquWA@mail.gmail.com
This patch (of 9):
Rename nearly every "gup_benchmark" reference and file name to "gup_test".
The one exception is for the actual gup benchmark test itself.
The current code already does a *little* bit more than benchmarking, and
definitely covers more than get_user_pages_fast(). More importantly,
however, subsequent patches are about to add some functionality that is
non-benchmark related.
Closely related changes:
* Kconfig: in addition to renaming the options from GUP_BENCHMARK to
GUP_TEST, update the help text to reflect that it's no longer a
benchmark-only test.
Link: https://lkml.kernel.org/r/20201026064021.3545418-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20201026064021.3545418-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/UDHQUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMGeQf9EtGft5U5EihqAbNr2O61Bh4ptCIT
+qNWWfuGQkKLsP6PCHMUJnNI3WJy2/Gb5+nUHjFXSEZBP2l3KGRuDniAdm4+DyEi
2khVmJiXYn2q2yfodmpHA/dqav3OHSrsq2IfH+J+WAFlIHnjkdz3Wk1zNFk7Y/xv
PVv2czvXhsnrvHvNp5e1+YsVGkMZc9fwXLRbac7ptmaKUKCBAgpZO8Gkc2GGgOdE
zUDp3qA8/7Ys+vzzYfPrRMUhev9dgE4x2TBmtOuzqOcfj2FOKRbKbwjur37fJ61j
Px4F2ZI0GEL0RrHvZK1vZ5KO41BcD+gQPumKAg1Lgz312loKj85RG8nBEQ==
=BJ9g
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes for ARM, x86 and tools"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
tools/kvm_stat: Exempt time-based counters
KVM: mmu: Fix SPTE encoding of MMIO generation upper half
kvm: x86/mmu: Use cpuid to determine max gfn
kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit()
selftests: kvm/set_memory_region_test: Fix race in move region test
KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
KVM: arm64: Fix handling of merging tables into a block entry
KVM: arm64: Fix memory leak on stage2 update of a valid PTE
The new counters halt_poll_success_ns and halt_poll_fail_ns do not count
events. Instead they provide a time, and mess up our statistics. Therefore,
we should exclude them.
Removal is currently implemented with an exempt list. If more counters like
these appear, we can think about a more general rule like excluding all
fields name "*_ns", in case that's a standing convention.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Tested-and-reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20201208210829.101324-1-raspl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove bpf_ prefix, which causes these helpers to be reported in verifier
dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets
fix it as long as it is still possible before UAPI freezes on these helpers.
Fixes: eaa6bcb71e ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ktest.pl does not know about grub2bls that was introduced in Fedora 30,
and now it does.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX9LBmRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qkM4AQC47T6BMbpUixDs6mS2CKpnJMC0vkQY
xcWKxbd8EcpI8gEAzTarP4HSlWu/YBcLinf+GP5qGiQLFuJ5rMibXXfQNQQ=
=FAGh
-----END PGP SIGNATURE-----
Merge tag 'ktest-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest fix from Steven Rostedt:
"Fix issues with grub2bls in ktest.pl
ktest.pl did not know about grub2bls that was introduced in Fedora 30,
and now it does"
* tag 'ktest-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest.pl: Fix incorrect reboot for grub2bls
Pull networking fixes from David Miller:
1) IPsec compat fixes, from Dmitry Safonov.
2) Fix memory leak in xfrm_user_policy(). Fix from Yu Kuai.
3) Fix polling in xsk sockets by using sk_poll_wait() instead of
datagram_poll() which keys off of sk_wmem_alloc and such which xsk
sockets do not update. From Xuan Zhuo.
4) Missing init of rekey_data in cfgh80211, from Sara Sharon.
5) Fix destroy of timer before init, from Davide Caratti.
6) Missing CRYPTO_CRC32 selects in ethernet driver Kconfigs, from Arnd
Bergmann.
7) Missing error return in rtm_to_fib_config() switch case, from Zhang
Changzhong.
8) Fix some src/dest address handling in vrf and add a testcase. From
Stephen Suryaputra.
9) Fix multicast handling in Seville switches driven by mscc-ocelot
driver. From Vladimir Oltean.
10) Fix proto value passed to skb delivery demux in udp, from Xin Long.
11) HW pkt counters not reported correctly in enetc driver, from Claudiu
Manoil.
12) Fix deadlock in bridge, from Joseph Huang.
13) Missing of_node_pur() in dpaa2 driver, fromn Christophe JAILLET.
14) Fix pid fetching in bpftool when there are a lot of results, from
Andrii Nakryiko.
15) Fix long timeouts in nft_dynset, from Pablo Neira Ayuso.
16) Various stymmac fixes, from Fugang Duan.
17) Fix null deref in tipc, from Cengiz Can.
18) When mss is biog, coose more resonable rcvq_space in tcp, fromn Eric
Dumazet.
19) Revert a geneve change that likely isnt necessary, from Jakub
Kicinski.
20) Avoid premature rx buffer reuse in various Intel driversm from Björn
Töpel.
21) retain EcT bits during TIS reflection in tcp, from Wei Wang.
22) Fix Tso deferral wrt. cwnd limiting in tcp, from Neal Cardwell.
23) MPLS_OPT_LSE_LABEL attribute is 342 ot 8 bits, from Guillaume Nault
24) Fix propagation of 32-bit signed bounds in bpf verifier and add test
cases, from Alexei Starovoitov.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
selftests: fix poll error in udpgro.sh
selftests/bpf: Fix "dubious pointer arithmetic" test
selftests/bpf: Fix array access with signed variable test
selftests/bpf: Add test for signed 32-bit bound check bug
bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds.
MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver
net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower
net/mlx4_en: Handle TX error CQE
net/mlx4_en: Avoid scheduling restart task if it is already running
tcp: fix cwnd-limited bug for TSO deferral where we send nothing
net: flow_offload: Fix memory leak for indirect flow block
tcp: Retain ECT bits for tos reflection
ethtool: fix stack overflow in ethnl_parse_bitset()
e1000e: fix S0ix flow to allow S0i3.2 subset entry
ice: avoid premature Rx buffer reuse
ixgbe: avoid premature Rx buffer reuse
i40e: avoid premature Rx buffer reuse
igb: avoid transmit queue timeout in xdp path
igb: use xdp_do_flush
igb: skb add metasize for xdp
...
Alexei Starovoitov says:
====================
pull-request: bpf 2020-12-10
The following pull-request contains BPF updates for your *net* tree.
We've added 21 non-merge commits during the last 12 day(s) which contain
a total of 21 files changed, 163 insertions(+), 88 deletions(-).
The main changes are:
1) Fix propagation of 32-bit signed bounds from 64-bit bounds, from Alexei.
2) Fix ring_buffer__poll() return value, from Andrii.
3) Fix race in lwt_bpf, from Cong.
4) Fix test_offload, from Toke.
5) Various xsk fixes.
Please consider pulling these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Thanks a lot!
Also thanks to reporters, reviewers and testers of commits in this pull-request:
Cong Wang, Hulk Robot, Jakub Kicinski, Jean-Philippe Brucker, John
Fastabend, Magnus Karlsson, Maxim Mikityanskiy, Yonghong Song
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The test program udpgso_bench_rx always invokes the poll()
syscall with a timeout of 10ms. If a larger timeout is specified
via the command line, udpgso_bench_rx is supposed to do multiple
poll() calls till the timeout is expired or an event is received.
Currently the poll() loop errors out after the first invocation with
no events, and may causes self-tests failure alike:
failed
GRO with custom segment size ./udpgso_bench_rx: poll: 0x0 expected 0x1
This change addresses the issue allowing the poll() loop to consume
all the configured timeout.
Fixes: ada641ff6e ("selftests: fixes for UDP GRO")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The verifier trace changed following a bugfix. After checking the 64-bit
sign, only the upper bit mask is known, not bit 31. Update the test
accordingly.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test fails because of a recent fix to the verifier, even though this
program is valid. In details what happens is:
7: (61) r1 = *(u32 *)(r0 +0)
Load a 32-bit value, with signed bounds [S32_MIN, S32_MAX]. The bounds
of the 64-bit value are [0, U32_MAX]...
8: (65) if r1 s> 0xffffffff goto pc+1
... therefore this is always true (the operand is sign-extended).
10: (b4) w2 = 11
11: (6d) if r2 s> r1 goto pc+1
When true, the 64-bit bounds become [0, 10]. The 32-bit bounds are still
[S32_MIN, 10].
13: (64) w1 <<= 2
Because this is a 32-bit operation, the verifier propagates the new
32-bit bounds to the 64-bit ones, and the knowledge gained from insn 11
is lost.
14: (0f) r0 += r1
15: (7a) *(u64 *)(r0 +0) = 4
Then the verifier considers r0 unbounded here, rejecting the test. To
make the test work, change insn 8 to check the sign of the 32-bit value.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
After a 32-bit load followed by a branch, the verifier would reduce the
maximum bound of the register to 0x7fffffff, allowing a user to bypass
bound checks. Ensure such a program is rejected.
In the second test, the 64-bit compare should not sufficient to
determine whether the signed 32-bit lower bound is 0, so the verifier
should reject the second branch.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A few of the tests in test_offload.py expects to see a certain number of
maps created, and checks this by counting the number of maps returned by
bpftool. There is already a filter that will remove any maps already there
at the beginning of the test, but bpftool now creates a map for the PID
iterator rodata on each invocation, which makes the map count wrong. Fix
this by also filtering the pid_iter.rodata map by name when counting.
Fixes: d53dee3fe0 ("tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752226387.110217.9887866138149423444.stgit@toke.dk
When setting the ethtool feature flag fails (as expected for the test), the
kernel now tracks that the feature was requested to be 'off' and refuses to
subsequently disable it again. So reset it back to 'on' so a subsequent
disable (that's not supposed to fail) can succeed.
Fixes: 417ec26477 ("selftests/bpf: add offload test based on netdevsim")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752226280.110217.10696241563705667871.stgit@toke.dk
Commit 7f0a838254 ("bpf, xdp: Maintain info on attached XDP BPF programs
in net_device") changed the case of some of the extack messages being
returned when attaching of XDP programs failed. This broke test_offload.py,
so let's fix the test to reflect this.
Fixes: 7f0a838254 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752226175.110217.11214100824416344952.stgit@toke.dk
Since commit 6f8a57ccf8 ("bpf: Make verifier log more relevant by
default"), the verifier discards log messages for successfully-verified
programs. This broke test_offload.py which is looking for a verification
message from the driver callback. Change test_offload.py to use the toggle
in netdevsim to make the verification fail before looking for the
verification message.
Fixes: 6f8a57ccf8 ("bpf: Make verifier log more relevant by default")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752226069.110217.12370824996153348073.stgit@toke.dk
Since we just removed the xdp_attachment_flags_ok() callback, also remove
the check for it in test_offload.py, and replace it with a test for the new
ambiguity-avoid check when multiple programs are loaded.
Fixes: 7f0a838254 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752225858.110217.13036901876869496246.stgit@toke.dk
In case of having so many PID results that they don't fit into a singe page
(4096) bytes, bpftool will erroneously conclude that it got corrupted data due
to 4096 not being a multiple of struct pid_iter_entry, so the last entry will
be partially truncated. Fix this by sizing the buffer to fit exactly N entries
with no truncation in the middle of record.
Fixes: d53dee3fe0 ("tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201204232002.3589803-1-andrii@kernel.org
- Make the AMD L3 QoS code and data priorization enable/disable mechanism
work correctly. The control bit was only set/cleared on one of the CPUs
in a L3 domain, but it has to be modified on all CPUs in the domain. The
initial documentation was not clear about this, but the updated one from
Oct 2020 spells it out.
- Fix an off by one in the UV platform detection code which causes the UV
hubs to be identified wrongly. The chip revisions start at 1 not at 0.
- Fix a long standing bug in the evaluation of prefixes in the uprobes
code which fails to handle repeated prefixes properly. The aggregate
size of the prefixes can be larger than the bytes array but the code
blindly iterated over the aggregate size beyond the array boundary.
Add a macro to handle this case properly and use it at the affected
places.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/M2GoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUGeD/0TxQyVIZTJjY/ExDYvQZeSWgvFVon9
A/QcwkgtkWzYKI3YThgxBtSNKhHPP8eQ9QK8ErcaQKEHU5TdXVEOwHgTm1A6OGat
0+M9/EWEe7tTu+cLpu7esQ+1VbSvEcFXZbljbhCKrShlBlsIjFG4eCPAprSw9yjI
RSgfXKZu4NmHVS6nfJTwmIIaTeLQ6U3b7b1D5s/66slBFScqnLbRNhABVbHbos4F
pl/lxDCFOddy2YbEojHjjGqMA7oxPav7c0nYFOM/zG+wAqfEjbqOxReT31bGQPi2
XT9K4JEqDqILo0KnhV4GsYoWAhes3BtmsJ9IoZ7IijsMriYl80mD9URAORidJ4PX
28Ckk9V/DlE8uDrAnBDcWDSoKlg78mhVV7V9L6v43teg/gJfSZNROtNDBmqRmwG4
Op2NJfzJITtaxVQuSZRkSs8rzGv+QUfaM1sBUQ+Oz4KYeIjjA7G2MAOECrzIAWKB
GWc5toYRVS6oGT+RbZhSxZYoh8ASoGJ2MrL8K4OV4RqEqHHcXcih0WmmljtsDIFI
td4FHHH6fghIb9S6iYKiApd6k2qKa33mwJwa/xZOoIrv0w5xT0WDJnxT60gu/Mec
YDkqhmA009CNSD2G4oNRNF5MH7gp34UII+25jOGatbVh+5DDPYs+5Jnh/DR7jssR
PryAG9ER7UUb6w==
=rNBq
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Make the AMD L3 QoS code and data priorization enable/disable
mechanism work correctly.
The control bit was only set/cleared on one of the CPUs in a L3
domain, but it has to be modified on all CPUs in the domain. The
initial documentation was not clear about this, but the updated one
from Oct 2020 spells it out.
- Fix an off by one in the UV platform detection code which causes
the UV hubs to be identified wrongly.
The chip revisions start at 1 not at 0.
- Fix a long standing bug in the evaluation of prefixes in the
uprobes code which fails to handle repeated prefixes properly.
The aggregate size of the prefixes can be larger than the bytes
array but the code blindly iterated over the aggregate size beyond
the array boundary. Add a macro to handle this case properly and
use it at the affected places"
* tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
x86/platform/uv: Fix UV4 hub revision adjustment
x86/resctrl: Fix AMD L3 QOS CDP enable/disable
The error handling in hugetlb_allocate_area() was incorrect for the
hugetlb_shared test case.
Previously the behavior was:
- mmap a hugetlb area
- If this fails, set the pointer to NULL, and carry on
- mmap an alias of the same hugetlb fd
- If this fails, munmap the original area
If the original mmap failed, it's likely the second one did too. If
both failed, we'd blindly try to munmap a NULL pointer, causing a
SIGSEGV. Instead, "goto fail" so we return before trying to mmap the
alias.
This issue can be hit "in real life" by forgetting to set
/proc/sys/vm/nr_hugepages (leaving it at 0), and then trying to run the
hugetlb_shared test.
Another small improvement is, when the original mmap fails, don't just
print "it failed": perror(), so we can see *why*. :)
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Alan Gilbert <dgilbert@redhat.com>
Link: https://lkml.kernel.org/r/20201204203443.2714693-1-axelrasmussen@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only x86 and PowerPC implement the pkey-xxx.h, and an error was reported
when compiling protection_keys.c.
Add a Arch judgment to compile "protection_keys" in the Makefile.
If other arch implement this, add the arch name to the Makefile.
eg:
ifneq (,$(findstring $(ARCH),powerpc mips ... ))
Following build errors:
pkey-helpers.h:93:2: error: #error Architecture not supported
#error Architecture not supported
pkey-helpers.h:96:20: error: `PKEY_DISABLE_ACCESS' undeclared
#define PKEY_MASK (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
^
protection_keys.c:218:45: error: `PKEY_DISABLE_WRITE' undeclared
pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
^
Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Link: https://lkml.kernel.org/r/1606826876-30656-1-git-send-email-suxingxing@loongson.cn
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper check must
be
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes.
Introduce a for_each_insn_prefix() macro for this purpose. Debugged by
Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message, sync with the respective header in tools/
and drop "we". ]
Fixes: 2b14449835 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160697103739.3146288.7437620795200799020.stgit@devnote2
Depending on the order of the routes to fe80::/64 are installed on the
VRF table, the NS for the source link-local address of the originator
might be sent to the wrong interface.
This patch ensures that packets with link-local addr source is doing a
lookup with the orig_iif when the destination addr indicates that it
is strict.
Add the reproducer as a use case in self test script fcnal-test.sh.
Fixes: b4869aa2f8 ("net: vrf: ipv6 support for local traffic to local addresses")
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201204030604.18828-1-ssuryaextr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current memory region move test correctly handles the situation that
the second (realigning) memslot move operation would temporarily trigger
MMIO until it completes, however it does not handle the case in which the
first (misaligning) move operation does this, too.
This results in false test assertions in case it does so.
Fix this by handling temporary MMIO from the first memslot move operation
in the test guest code, too.
Fixes: 8a0639fe92 ("KVM: sefltests: Add explicit synchronization to move mem region test")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <0fdddb94bb0e31b7da129a809a308d91c10c0b5e.1606941224.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In case the bootconfig is created on one kind of endian machine, and then
read on the other kind of endian kernel, the size and checksum will be
incorrect. Instead, have both the size and checksum always be little
endian and have the tool and the kernel convert it from little endian to
or from the host endian.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX8brThQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qiBMAQDe1vsp/SyHO9H5pnsepdmk4fERn0bC
Q0qtCoYp1xUKOQEAjnOJKdCE1O6n24u+b+3jw3BHswQLyUKOFaPcIM7jSgM=
=Z6kA
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.10-rc6-bootconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull bootconfig fixes from Steven Rostedt:
"Have bootconfig size and checksum be little endian
In case the bootconfig is created on one kind of endian machine, and
then read on the other kind of endian kernel, the size and checksum
will be incorrect. Instead, have both the size and checksum always be
little endian and have the tool and the kernel convert it from little
endian to or from the host endian"
* tag 'trace-v5.10-rc6-bootconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
docs: bootconfig: Add the endianness of fields
tools/bootconfig: Store size and checksum in footer as le32
bootconfig: Load size and checksum in the footer as le32
Avoid occasional test failures due to the last sample being delayed to
another ring_buffer__poll() call. Instead, drain samples completely with
ring_buffer__consume(). This is supposed to fix a rare and non-deterministic
test failure in libbpf CI.
Fixes: cb1c9ddd55 ("selftests/bpf: Add BPF ringbuf selftests")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201130223336.904192-2-andrii@kernel.org
Fix ring_buffer__poll() to return the number of non-discarded records
consumed, just like its documentation states. It's also consistent with
ring_buffer__consume() return. Fix up selftests with wrong expected results.
Fixes: bf99c936f9 ("libbpf: Add BPF ring buffer support")
Fixes: cb1c9ddd55 ("selftests/bpf: Add BPF ringbuf selftests")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201130223336.904192-1-andrii@kernel.org
- Use correct timestamp variable for ring buffer write stamp update
- Fix up before stamp and write stamp when crossing ring buffer sub
buffers
- Keep a zero delta in ring buffer in slow path if cmpxchg fails
- Fix trace_printk static buffer for archs that care
- Fix ftrace record accounting for ftrace ops with trampolines
- Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
- Remove WARN_ON in hwlat tracer that triggers on something that is OK
- Make "my_tramp" trampoline in ftrace direct sample code global
- Fixes in the bootconfig tool for better alignment management
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX8ZzghQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qg0JAQCII1bDQyF3APLlNFRqfHf3bTo7Zl5z
WaUd1Cd7JkY+WAD/eF1dWjN0JRtfU+oRlk6UZ4oNmp8WMJvQ7oV26ub2egE=
=lts8
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Use correct timestamp variable for ring buffer write stamp update
- Fix up before stamp and write stamp when crossing ring buffer sub
buffers
- Keep a zero delta in ring buffer in slow path if cmpxchg fails
- Fix trace_printk static buffer for archs that care
- Fix ftrace record accounting for ftrace ops with trampolines
- Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
- Remove WARN_ON in hwlat tracer that triggers on something that is OK
- Make "my_tramp" trampoline in ftrace direct sample code global
- Fixes in the bootconfig tool for better alignment management
* tag 'trace-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Always check to put back before stamp when crossing pages
ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
ftrace: Fix updating FTRACE_FL_TRAMP
tracing: Fix alignment of static buffer
tracing: Remove WARN_ON in start_thread()
samples/ftrace: Mark my_tramp[12]? global
ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next()
ring-buffer: Update write stamp with the correct ts
docs: bootconfig: Update file format on initrd image
tools/bootconfig: Align the bootconfig applied initrd image size to 4
tools/bootconfig: Fix to check the write failure correctly
tools/bootconfig: Fix errno reference after printf()
Store the size and the checksum fields in the footer as le32
instead of u32. This will allow us to apply bootconfig to the
cross build initrd without caring the endianness.
Link: https://lkml.kernel.org/r/160583935332.547349.5897811300636587426.stgit@devnote2
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This issue was first noticed when I was testing different kernels on
Oracle Linux 8 which as Fedora 30+ adopts BLS as default. Even though a
kernel entry was added successfully and the index of that kernel entry was
retrieved correctly, ktest still wouldn't reboot the system into
user-specified kernel.
The bug was spotted in subroutine reboot_to where the if-statement never
checks for REBOOT_TYPE "grub2bls", therefore the desired entry will not be
set for the next boot.
Add a check for "grub2bls" so that $grub_reboot $grub_number can
be run before a reboot if REBOOT_TYPE is "grub2bls" then we can boot to
the correct kernel.
Link: https://lkml.kernel.org/r/20201121021243.1532477-1-libo.chen@oracle.com
Cc: stable@vger.kernel.org
Fixes: ac2466456e ("ktest: introduce grub2bls REBOOT_TYPE option")
Signed-off-by: Libo Chen <libo.chen@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
a proper kernel configuration for running kselftest can be obtained with:
$ yes | make kselftest-merge
enable compile support for the 'red' qdisc: otherwise, tdc kselftest fail
when trying to run tdc test items contained in red.json.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/cfa23f2d4f672401e6cebca3a321dd1901a9ff07.1606416464.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2020-11-28
1) Do not reference the skb for xsk's generic TX side since when looped
back into RX it might crash in generic XDP, from Björn Töpel.
2) Fix umem cleanup on a partially set up xsk socket when being destroyed,
from Magnus Karlsson.
3) Fix an incorrect netdev reference count when failing xsk_bind() operation,
from Marek Majtyka.
4) Fix bpftool to set an error code on failed calloc() in build_btf_type_table(),
from Zhen Lei.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add MAINTAINERS entry for BPF LSM
bpftool: Fix error return value in build_btf_type_table
net, xsk: Avoid taking multiple skbuff references
xsk: Fix incorrect netdev reference count
xsk: Fix umem cleanup bug at socket destruct
MAINTAINERS: Update XDP and AF_XDP entries
====================
Link: https://lore.kernel.org/r/20201128005104.1205-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since some gcc generates a broken DWARF which lacks DW_AT_declaration
attribute from the subprogram DIE of function prototype.
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060)
So, in addition to the DW_AT_declaration check, we also check the
subprogram DIE has DW_AT_inline or actual entry pc.
Committer testing:
# cat /etc/fedora-release
Fedora release 33 (Thirty Three)
#
Before:
# perf test vfs_getname
78: Use vfs_getname probe to get syscall args filenames : FAILED!
79: Check open filename arg using perf trace + vfs_getname : FAILED!
81: Add vfs_getname probe to get syscall args filenames : FAILED!
#
After:
# perf test vfs_getname
78: Use vfs_getname probe to get syscall args filenames : Ok
79: Check open filename arg using perf trace + vfs_getname : Ok
81: Add vfs_getname probe to get syscall args filenames : Ok
#
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: http://lore.kernel.org/lkml/160645613571.2824037.7441351537890235895.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix die_entrypc() to return error correctly if the DIE has no
DW_AT_ranges attribute. Since dwarf_ranges() will treat the case as an
empty ranges and return 0, we have to check it by ourselves.
Fixes: 91e2f539ee ("perf probe: Fix to show function entry line as probe-able")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lore.kernel.org/lkml/160645612634.2824037.5284932731175079426.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently perf stat shows some metrics (like IPC) for defined events.
But when no aggregation mode is used (-A option), it shows incorrect
values since it used a value from a different cpu.
Before:
$ perf stat -aA -e cycles,instructions sleep 1
Performance counter stats for 'system wide':
CPU0 116,057,380 cycles
CPU1 86,084,722 cycles
CPU2 99,423,125 cycles
CPU3 98,272,994 cycles
CPU0 53,369,217 instructions # 0.46 insn per cycle
CPU1 33,378,058 instructions # 0.29 insn per cycle
CPU2 58,150,086 instructions # 0.50 insn per cycle
CPU3 40,029,703 instructions # 0.34 insn per cycle
1.001816971 seconds time elapsed
So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
but it was 0.29 (= 33,378,058 / 116,057,380) and so on.
After:
$ perf stat -aA -e cycles,instructions sleep 1
Performance counter stats for 'system wide':
CPU0 109,621,384 cycles
CPU1 159,026,454 cycles
CPU2 99,460,366 cycles
CPU3 124,144,142 cycles
CPU0 44,396,706 instructions # 0.41 insn per cycle
CPU1 120,195,425 instructions # 0.76 insn per cycle
CPU2 44,763,978 instructions # 0.45 insn per cycle
CPU3 69,049,079 instructions # 0.56 insn per cycle
1.001910444 seconds time elapsed
Fixes: 44d49a6002 ("perf stat: Support metrics in --per-core/socket mode")
Reported-by: Sam Xi <xyzsam@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201127041404.390276-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It didn't check the tool->cgroup_events bit which is set when the
--all-cgroups option is given. Without it, samples will not have cgroup
info so no reason to synthesize.
We can check the PERF_RECORD_CGROUP records after running perf record
*WITHOUT* the --all-cgroups option:
Before:
$ perf report -D | grep CGROUP
0 0 0x8430 [0x38]: PERF_RECORD_CGROUP cgroup: 1 /
CGROUP events: 1
CGROUP events: 0
CGROUP events: 0
After:
$ perf report -D | grep CGROUP
CGROUP events: 0
CGROUP events: 0
CGROUP events: 0
Committer testing:
Before:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.208 MB perf.data (10003 samples) ]
# perf report -D | grep "CGROUP events"
CGROUP events: 146
CGROUP events: 0
CGROUP events: 0
#
After:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.208 MB perf.data (10448 samples) ]
# perf report -D | grep "CGROUP events"
CGROUP events: 0
CGROUP events: 0
CGROUP events: 0
#
With all-cgroups:
# perf record --all-cgroups -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.374 MB perf.data (11526 samples) ]
# perf report -D | grep "CGROUP events"
CGROUP events: 146
CGROUP events: 0
CGROUP events: 0
#
Fixes: 8fb4b67939 ("perf record: Add --all-cgroups option")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201127054356.405481-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An appropriate return value should be set on the failed path.
Fixes: 2a09a84c72 ("perf diff: Support hot streams comparison")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201124103652.438-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
7a078d2d18 ("libbpf, hashmap: Fix undefined behavior in hash_bits")
That don't entail any changes in tools/perf.
This addresses this perf build warning:
Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h
Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An appropriate return value should be set on the failed path.
Fixes: 4d374ba0bf ("tools: bpftool: implement "bpftool btf show|list"")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201124104100.491-1-thunder.leizhen@huawei.com
- Fix typos in seccomp selftests on powerpc and sh (Kees Cook)
- Fix PF_SUPERPRIV audit marking in seccomp and ptrace (Mickaël Salaün)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl+4FQgACgkQiXL039xt
wCYERA/8Cwb8p8PWbtOq6uUKsZ4kJQuMZee3crP+LM0B1a427hOCSuezoDtOY1wO
N7IEOr3ipHxoxZ1Zf2Ln4lGyktxlwzA8pqlGZyqR5zFIl+0HArbXdAfcRMcaeqUp
eSV30CNBD5XfGLc2R5s1qHnrshVBJFebCpgMfSCOQQWMpZ51nnaoFN8N8iSEx6PN
kYHC1C1WX5g3vtKo29xS2Y7KCThMOXvcNI7eFpVD0C4ZwEr8lywbTTzBBhXUIGBX
6NoNOV7kVxIjNLQ8x17F1OacrC6h4ZzNTl4MEYnMZ/Mw0NVB3MvoHQohwW+Y98Rf
97sPQPZjYeJ6xURolRsWX+kvXC7PyLYvfldsQi00QDfdc6RGu0pnsG4UuivsldlY
OhswE9Q/KKHmzXiHnZBmcw4NcSyhZiL3LYB1VZl3jDobeOhVKyHw72vo8Zrhhz8A
ksCDg3vNvOo/x2iH9GSUG4Fjk8coXRif8P6lH5Btw6V+x9ZlFiaW5WbSbP0G3PzJ
zS5nPu8PE6Sm70XlRn0BbRmIjV9AhEZqNNZoOsndrbR86klH6WolyCB4ifj2MKuR
ZwbeDblUrYyRne/Ll9XGQVDSFv8J5phxtDQM0phiGK0jOsvXqbl8RvlckCKqBwm0
7VgtEumU5vJTx01avrXw86Sj7B2IR4M1nTgpwWJ2EVs9U98Emew=
=1Mse
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"This gets the seccomp selftests running again on powerpc and sh, and
fixes an audit reporting oversight noticed in both seccomp and ptrace.
- Fix typos in seccomp selftests on powerpc and sh (Kees Cook)
- Fix PF_SUPERPRIV audit marking in seccomp and ptrace (Mickaël
Salaün)"
* tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: sh: Fix register names
selftests/seccomp: powerpc: Fix typo in macro variable name
seccomp: Set PF_SUPERPRIV when checking capability
ptrace: Set PF_SUPERPRIV when checking capability
It looks like the seccomp selftests was never actually built for sh.
This fixes it, though I don't have an environment to do a runtime test
of it yet.
Fixes: 0bb605c2c7 ("sh: Add SECCOMP_FILTER")
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/lkml/a36d7b48-6598-1642-e403-0c77a86f416d@physik.fu-berlin.de
Signed-off-by: Kees Cook <keescook@chromium.org>