Pull folio updates from Matthew Wilcox:
- Rewrite how munlock works to massively reduce the contention on
i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew
Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
mm/damon: minor cleanup for damon_pa_young
selftests/vm/transhuge-stress: Support file-backed PMD folios
mm/filemap: Support VM_HUGEPAGE for file mappings
mm/readahead: Switch to page_cache_ra_order
mm/readahead: Align file mappings for non-DAX
mm/readahead: Add large folio readahead
mm: Support arbitrary THP sizes
mm: Make large folios depend on THP
mm: Fix READ_ONLY_THP warning
mm/filemap: Allow large folios to be added to the page cache
mm: Turn can_split_huge_page() into can_split_folio()
mm/vmscan: Convert pageout() to take a folio
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Account large folios correctly
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Free non-shmem folios without splitting them
mm/rmap: Constify the rmap_walk_control argument
mm/rmap: Convert rmap_walk() to take a folio
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
...
Pull scheduler updates from Ingo Molnar:
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler
build, from the fast-headers tree - including placeholder *_api.h
headers for later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per
node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
sched/headers: ARM needs asm/paravirt_api_clock.h too
sched/numa: Fix boot crash on arm64 systems
headers/prep: Fix header to build standalone: <linux/psi.h>
sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
cgroup: Fix suspicious rcu_dereference_check() usage warning
sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
sched/deadline,rt: Remove unused functions for !CONFIG_SMP
sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
sched/deadline: Remove unused def_dl_bandwidth
sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
sched/tracing: Don't re-read p->state when emitting sched_switch event
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
sched/cpuacct: Remove redundant RCU read lock
sched/cpuacct: Optimize away RCU read lock
sched/cpuacct: Fix charge percpu cpuusage
sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
...
Pull execve updates from Kees Cook:
"Execve and binfmt updates.
Eric and I have stepped up to be the active maintainers of this area,
so here's our first collection. The bulk of the work was in coredump
handling fixes; additional details are noted below:
- Handle unusual AT_PHDR offsets (Akira Kawata)
- Fix initial mapping size when PT_LOADs are not ordered (Alexey
Dobriyan)
- Move more code under CONFIG_COREDUMP (Alexey Dobriyan)
- Fix missing mmap_lock in file_files_note (Eric W. Biederman)
- Remove a.out support for alpha and m68k (Eric W. Biederman)
- Include first pages of non-exec ELF libraries in coredump (Jann
Horn)
- Don't write past end of notes for regset gap in coredump (Rick
Edgecombe)
- Comment clean-ups (Tom Rix)
- Force single empty string when argv is empty (Kees Cook)
- Add NULL argv selftest (Kees Cook)
- Properly redefine PT_GNU_* in terms of PT_LOOS (Kees Cook)
- MAINTAINERS: Update execve entry with tree (Kees Cook)
- Introduce initial KUnit testing for binfmt_elf (Kees Cook)"
* tag 'execve-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf: Don't write past end of notes for regset gap
a.out: Stop building a.out/osf1 support on alpha and m68k
coredump: Don't compile flat_core_dump when coredumps are disabled
coredump: Use the vma snapshot in fill_files_note
coredump/elf: Pass coredump_params into fill_note_info
coredump: Remove the WARN_ON in dump_vma_snapshot
coredump: Snapshot the vmas in do_coredump
coredump: Move definition of struct coredump_params into coredump.h
binfmt_elf: Introduce KUnit test
ELF: Properly redefine PT_GNU_* in terms of PT_LOOS
MAINTAINERS: Update execve entry with more details
exec: cleanup comments
fs/binfmt_elf: Refactor load_elf_binary function
fs/binfmt_elf: Fix AT_PHDR for unusual ELF files
binfmt: move more stuff undef CONFIG_COREDUMP
selftests/exec: Test for empty string on NULL argv
exec: Force single empty string when argv is empty
coredump: Also dump first pages of non-executable ELF libraries
ELF: fix overflow in total mapping size calculation
Pull RCU updates from Paul McKenney:
- Fix idle detection (Neeraj Upadhyay) and missing access marking
detected by KCSAN.
- Reduce coupling between rcu_barrier() and CPU-hotplug operations, so
that rcu_barrier() no longer needs to do cpus_read_lock(). This may
also someday allow system boot to bring CPUs online concurrently.
- Enable more aggressive movement to per-CPU queueing when reacting to
excessive lock contention due to workloads placing heavy update-side
stress on RCU tasks.
- Improvements to RCU priority boosting, including changes from Neeraj
Upadhyay, Zqiang, and Alison Chaiken.
- Various fixes improving test robustness and debug information.
- Add tests for SRCU size transitions, further compress torture.sh
build products, and improve debug output.
- Miscellaneous fixes.
* tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
rcu: Replace cpumask_weight with cpumask_empty where appropriate
rcu: Remove __read_mostly annotations from rcu_scheduler_active externs
rcu: Uninline multi-use function: finish_rcuwait()
rcu: Mark writes to the rcu_segcblist structure's ->flags field
kasan: Record work creation stack trace with interrupts enabled
rcu: Inline __call_rcu() into call_rcu()
rcu: Add mutex for rcu boost kthread spawning and affinity setting
rcu: Fix description of kvfree_rcu()
MAINTAINERS: Add Frederic and Neeraj to their RCU files
rcutorture: Provide non-power-of-two Tasks RCU scenarios
rcutorture: Test SRCU size transitions
torture: Make torture.sh help message match reality
rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention
rcu-tasks: Use order_base_2() instead of ilog2()
rcu: Create and use an rcu_rdp_cpu_online()
rcu: Make rcu_barrier() no longer block CPU-hotplug operations
rcu: Rework rcu_barrier() and callback-migration logic
rcu: Refactor rcu_barrier() empty-list handling
rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
torture: Change KVM environment variable to RCUTORTURE
...
Pull x86 SGX updates from Borislav Petkov:
- A couple of fixes and improvements to the SGX selftests
* tag 'x86_sgx_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/sgx: Treat CC as one argument
selftests/x86: Add validity check and allow field splitting
selftests/sgx: Remove extra newlines in test output
selftests/sgx: Ensure enclave data available during debug print
selftests/sgx: Do not attempt enclave build without valid enclave
selftests/sgx: Fix NULL-pointer-dereference upon early test failure
Pull arm64 updates from Will Deacon:
- Support for including MTE tags in ELF coredumps
- Instruction encoder updates, including fixes to 64-bit immediate
generation and support for the LSE atomic instructions
- Improvements to kselftests for MTE and fpsimd
- Symbol aliasing and linker script cleanups
- Reduce instruction cache maintenance performed for user mappings
created using contiguous PTEs
- Support for the new "asymmetric" MTE mode, where stores are checked
asynchronously but loads are checked synchronously
- Support for the latest pointer authentication algorithm ("QARMA3")
- Support for the DDR PMU present in the Marvell CN10K platform
- Support for the CPU PMU present in the Apple M1 platform
- Use the RNDR instruction for arch_get_random_{int,long}()
- Update our copy of the Arm optimised string routines for str{n}cmp()
- Fix signal frame generation for CPUs which have foolishly elected to
avoid building in support for the fpsimd instructions
- Workaround for Marvell GICv3 erratum #38545
- Clarification to our Documentation (booting reqs. and MTE prctl())
- Miscellanous cleanups and minor fixes
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
arm64/mte: Remove asymmetric mode from the prctl() interface
arm64: Add cavium_erratum_23154_cpus missing sentinel
perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver
arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
Documentation: vmcoreinfo: Fix htmldocs warning
kasan: fix a missing header include of static_keys.h
drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
drivers/perf: arm_pmu: Handle 47 bit counters
arm64: perf: Consistently make all event numbers as 16-bits
arm64: perf: Expose some Armv9 common events under sysfs
perf/marvell: cn10k DDR perf event core ownership
perf/marvell: cn10k DDR perfmon event overflow handling
perf/marvell: CN10k DDR performance monitor support
dt-bindings: perf: marvell: cn10k ddr performance monitor
arm64: clean up tools Makefile
perf/arm-cmn: Update watchpoint format
perf/arm-cmn: Hide XP PUB events for CMN-600
arm64: drop unused includes of <linux/personality.h>
arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
...
Pull tpm updates from Jarkko Sakkinen:
"In order to split the work a bit we've aligned with David Howells more
or less that I take more hardware/firmware aligned keyring patches,
and he takes care more of the framework aligned patches.
For TPM the patches worth of highlighting are the fixes for
refcounting provided by Lino Sanfilippo and James Bottomley.
Eric B. has done a bunch obvious (but important) fixes but there's one
a bit controversial: removal of asym_tpm. It was added in 2018 when
TPM1 was already declared as insecure and world had moved on to TPM2.
I don't know how this has passed all the filters but I did not have a
chance to see the patches when they were out. I simply cannot commit
to maintaining this because it was from all angles just wrong to take
it in the first place to the mainline kernel. Nobody should use this
module really for anything.
Finally, there is a new keyring '.machine' to hold MOK keys ('Machine
Owner Keys'). In the mok side MokListTrustedRT UEFI variable can be
set, from which kernel knows that MOK keys are kernel trusted keys and
they are populated to the machine keyring. This keyring linked to the
secondary trusted keyring, which means that can be used like any
kernel trusted keys. This keyring of course can be used to hold other
MOK'ish keys in other platforms in future"
* tag 'tpmdd-next-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (24 commits)
tpm: use try_get_ops() in tpm-space.c
KEYS: asymmetric: properly validate hash_algo and encoding
KEYS: asymmetric: enforce that sig algo matches key algo
KEYS: remove support for asym_tpm keys
tpm: fix reference counting for struct tpm_chip
integrity: Only use machine keyring when uefi_check_trust_mok_keys is true
integrity: Trust MOK keys if MokListTrustedRT found
efi/mokvar: move up init order
KEYS: Introduce link restriction for machine keys
KEYS: store reference to machine keyring
integrity: add new keyring handler for mok keys
integrity: Introduce a Linux keyring called machine
integrity: Fix warning about missing prototypes
KEYS: trusted: Avoid calling null function trusted_key_exit
KEYS: trusted: Fix trusted key backends when building as module
tpm: xen-tpmfront: Use struct_size() helper
KEYS: x509: remove dead code that set ->unsupported_sig
KEYS: x509: remove never-set ->unsupported_key flag
KEYS: x509: remove unused fields
KEYS: x509: clearly distinguish between key and signature algorithms
...
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, ipsec, and wireless.
A few last minute revert / disable and fix patches came down from our
sub-trees. We're not waiting for any fixes at this point.
Current release - regressions:
- Revert "netfilter: nat: force port remap to prevent shadowing
well-known ports", restore working conntrack on asymmetric paths
- Revert "ath10k: drop beacon and probe response which leak from
other channel", restore working AP and mesh mode on QCA9984
- eth: intel: fix hang during reboot/shutdown
Current release - new code bugs:
- netfilter: nf_tables: disable register tracking, it needs more work
to cover all corner cases
Previous releases - regressions:
- ipv6: fix skb_over_panic in __ip6_append_data when (admin-only)
extension headers get specified
- esp6: fix ESP over TCP/UDP, interpret ipv6_skip_exthdr's return
value more selectively
- bnx2x: fix driver load failure when FW not present in initrd
Previous releases - always broken:
- vsock: stop destroying unrelated sockets in nested virtualization
- packet: fix slab-out-of-bounds access in packet_recvmsg()
Misc:
- add Paolo Abeni to networking maintainers!"
* tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (26 commits)
iavf: Fix hang during reboot/shutdown
net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload
net: bcmgenet: skip invalid partial checksums
bnx2x: fix built-in kernel driver load failure
net: phy: mscc: Add MODULE_FIRMWARE macros
net: dsa: Add missing of_node_put() in dsa_port_parse_of
net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
Revert "ath10k: drop beacon and probe response which leak from other channel"
hv_netvsc: Add check for kvmalloc_array
iavf: Fix double free in iavf_reset_task
ice: destroy flow director filter mutex after releasing VSIs
ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()
Add Paolo Abeni to networking maintainers
atm: eni: Add check for dma_map_single
net/packet: fix slab-out-of-bounds access in packet_recvmsg()
net: mdio: mscc-miim: fix duplicate debugfs entry
net: phy: marvell: Fix invalid comparison in the resume and suspend functions
esp6: fix check on ipv6_skip_exthdr's return value
net: dsa: microchip: add spi_device_id tables
netfilter: nf_tables: disable register tracking
...
When building the vm selftests using clang, some errors are seen due to
having headers in the compilation command:
clang -Wall -I ../../../../usr/include -no-pie gup_test.c ../../../../mm/gup_test.h -lrt -lpthread -o .../tools/testing/selftests/vm/gup_test
clang: error: cannot specify -o when generating multiple output files
make[1]: *** [../lib.mk:146: .../tools/testing/selftests/vm/gup_test] Error 1
Rework to add the header files to LOCAL_HDRS before including ../lib.mk,
since the dependency is evaluated in '$(OUTPUT)/%:%.c $(LOCAL_HDRS)' in
file lib.mk.
Link: https://lkml.kernel.org/r/20220304000645.1888133-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net coming late
in the 5.17-rc process:
1) Revert port remap to mitigate shadowing service ports, this is causing
problems in existing setups and this mitigation can be achieved with
explicit ruleset, eg.
... tcp sport < 16386 tcp dport >= 32768 masquerade random
This patches provided a built-in policy similar to the one described above.
2) Disable register tracking infrastructure in nf_tables. Florian reported
two issues:
- Existing expressions with no implemented .reduce interface
that causes data-store on register should cancel the tracking.
- Register clobbering might be possible storing data on registers that
are larger than 32-bits.
This might lead to generating incorrect ruleset bytecode. These two
issues are scheduled to be addressed in the next release cycle.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: disable register tracking
Revert "netfilter: conntrack: tag conntracks picked up in local out hook"
Revert "netfilter: nat: force port remap to prevent shadowing well-known ports"
====================
Link: https://lore.kernel.org/r/20220312220315.64531-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, and ipsec.
Current release - regressions:
- Bluetooth: fix unbalanced unlock in set_device_flags()
- Bluetooth: fix not processing all entries on cmd_sync_work, make
connect with qualcomm and intel adapters reliable
- Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"
- xdp: xdp_mem_allocator can be NULL in trace_mem_connect()
- eth: ice: fix race condition and deadlock during interface enslave
Current release - new code bugs:
- tipc: fix incorrect order of state message data sanity check
Previous releases - regressions:
- esp: fix possible buffer overflow in ESP transformation
- dsa: unlock the rtnl_mutex when dsa_master_setup() fails
- phy: meson-gxl: fix interrupt handling in forced mode
- smsc95xx: ignore -ENODEV errors when device is unplugged
Previous releases - always broken:
- xfrm: fix tunnel mode fragmentation behavior
- esp: fix inter address family tunneling on GSO
- tipc: fix null-deref due to race when enabling bearer
- sctp: fix kernel-infoleak for SCTP sockets
- eth: macb: fix lost RX packet wakeup race in NAPI receive
- eth: intel stop disabling VFs due to PF error responses
- eth: bcmgenet: don't claim WOL when its not available"
* tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
xdp: xdp_mem_allocator can be NULL in trace_mem_connect().
ice: Fix race condition during interface enslave
net: phy: meson-gxl: improve link-up behavior
net: bcmgenet: Don't claim WOL when its not available
net: arc_emac: Fix use after free in arc_mdio_probe()
sctp: fix kernel-infoleak for SCTP sockets
net: phy: correct spelling error of media in documentation
net: phy: DP83822: clear MISR2 register to disable interrupts
gianfar: ethtool: Fix refcount leak in gfar_get_ts_info
selftests: pmtu.sh: Kill nettest processes launched in subshell.
selftests: pmtu.sh: Kill tcpdump processes launched by subshell.
NFC: port100: fix use-after-free in port100_send_complete
net/mlx5e: SHAMPO, reduce TIR indication
net/mlx5e: Lag, Only handle events from highest priority multipath entry
net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
net/mlx5: Fix a race on command flush flow
net/mlx5: Fix size field in bufferx_reg struct
ax25: Fix NULL pointer dereference in ax25_kill_by_device
net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr
net: ethernet: lpc_eth: Handle error for clk_enable
...
When using "run_cmd <command> &", then "$!" refers to the PID of the
subshell used to run <command>, not the command itself. Therefore
nettest_pids actually doesn't contain the list of the nettest commands
running in the background. So cleanup() can't kill them and the nettest
processes run until completion (fortunately they have a 5s timeout).
Fix this by defining a new command for running processes in the
background, for which "$!" really refers to the PID of the command run.
Also, double quote variables on the modified lines, to avoid shellcheck
warnings.
Fixes: ece1278a9b ("selftests: net: add ESP-in-UDP PMTU test")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cleanup() function takes care of killing processes launched by the
test functions. It relies on variables like ${tcpdump_pids} to get the
relevant PIDs. But tests are run in their own subshell, so updated
*_pids values are invisible to other shells. Therefore cleanup() never
sees any process to kill:
$ ./tools/testing/selftests/net/pmtu.sh -t pmtu_ipv4_exception
TEST: ipv4: PMTU exceptions [ OK ]
TEST: ipv4: PMTU exceptions - nexthop objects [ OK ]
$ pgrep -af tcpdump
6084 tcpdump -s 0 -i veth_A-R1 -w pmtu_ipv4_exception_veth_A-R1.pcap
6085 tcpdump -s 0 -i veth_R1-A -w pmtu_ipv4_exception_veth_R1-A.pcap
6086 tcpdump -s 0 -i veth_R1-B -w pmtu_ipv4_exception_veth_R1-B.pcap
6087 tcpdump -s 0 -i veth_B-R1 -w pmtu_ipv4_exception_veth_B-R1.pcap
6088 tcpdump -s 0 -i veth_A-R2 -w pmtu_ipv4_exception_veth_A-R2.pcap
6089 tcpdump -s 0 -i veth_R2-A -w pmtu_ipv4_exception_veth_R2-A.pcap
6090 tcpdump -s 0 -i veth_R2-B -w pmtu_ipv4_exception_veth_R2-B.pcap
6091 tcpdump -s 0 -i veth_B-R2 -w pmtu_ipv4_exception_veth_B-R2.pcap
6228 tcpdump -s 0 -i veth_A-R1 -w pmtu_ipv4_exception_veth_A-R1.pcap
6229 tcpdump -s 0 -i veth_R1-A -w pmtu_ipv4_exception_veth_R1-A.pcap
6230 tcpdump -s 0 -i veth_R1-B -w pmtu_ipv4_exception_veth_R1-B.pcap
6231 tcpdump -s 0 -i veth_B-R1 -w pmtu_ipv4_exception_veth_B-R1.pcap
6232 tcpdump -s 0 -i veth_A-R2 -w pmtu_ipv4_exception_veth_A-R2.pcap
6233 tcpdump -s 0 -i veth_R2-A -w pmtu_ipv4_exception_veth_R2-A.pcap
6234 tcpdump -s 0 -i veth_R2-B -w pmtu_ipv4_exception_veth_R2-B.pcap
6235 tcpdump -s 0 -i veth_B-R2 -w pmtu_ipv4_exception_veth_B-R2.pcap
Fix this by running cleanup() in the context of the test subshell.
Now that each test cleans the environment after completion, there's no
need for calling cleanup() again when the next test starts. So let's
drop it from the setup() function. This is okay because cleanup() is
also called when pmtu.sh starts, so even the first test starts in a
clean environment.
Also, use tcpdump's immediate mode. Otherwise it might not have time to
process buffered packets, resulting in missing packets or even empty
pcap files for short tests.
Note: PAUSE_ON_FAIL is still evaluated before cleanup(), so one can
still inspect the test environment upon failure when using -p.
Fixes: a92a0a7b8e ("selftests: pmtu: Simplify cleanup and namespace names")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 878aed8db3.
This change breaks existing setups where conntrack is used with
asymmetric paths.
In these cases, the NAT transformation occurs on the syn-ack instead of
the syn:
1. SYN x:12345 -> y -> 443 // sent by initiator, receiverd by responder
2. SYNACK y:443 -> x:12345 // First packet seen by conntrack, as sent by responder
3. tuple_force_port_remap() gets called, sees:
'tcp from 443 to port 12345 NAT' -> pick a new source port, inititor receives
4. SYNACK y:$RANDOM -> x:12345 // connection is never established
While its possible to avoid the breakage with NOTRACK rules, a kernel
update should not break working setups.
An alternative to the revert is to augment conntrack to tag
mid-stream connections plus more code in the nat core to skip NAT
for such connections, however, this leads to more interaction/integration
between conntrack and NAT.
Therefore, revert, users will need to add explicit nat rules to avoid
port shadowing.
Link: https://lore.kernel.org/netfilter-devel/20220302105908.GA5852@breakpoint.cc/#R
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2051413
Signed-off-by: Florian Westphal <fw@strlen.de>
Determine an available PCR bank to be used by a test case by querying the
capability TPM2_GET_CAP. The TPM2 returns TPML_PCR_SELECTIONS that
contains an array of TPMS_PCR_SELECTIONs indicating available PCR banks
and the bitmasks that show which PCRs are enabled in each bank. Collect
the data in a dictionary. From the dictionary determine the PCR bank that
has the PCRs enabled that the test needs. This avoids test failures with
TPM2's that either to not have a SHA-1 bank or whose SHA-1 bank is
disabled.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
If the test triggers a problem it may well result in a log message from
the kernel such as a WARN() or BUG(). If these include a PID it can help
with debugging to know if it was the parent or child process that triggered
the issue, since the test is just creating a new thread the process name
will be the same either way. Print the PIDs of the parent and child on
startup so users have this information to hand should it be needed.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220303192817.2732509-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The error message when I build vm tests on debian10 (GLIBC 2.28):
userfaultfd.c: In function `userfaultfd_pagemap_test':
userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use
in this function); did you mean `MADV_RANDOM'?
if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
^~~~~~~~~~~~
MADV_RANDOM
This patch includes these newer definitions from UAPI linux/mman.h, is
useful to fix tests build on systems without these definitions in glibc
sys/mman.h.
Link: https://lkml.kernel.org/r/20220227055330.43087-2-zhouchengming@bytedance.com
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, xfrm, wifi, bluetooth, and netfilter.
Lots of various size fixes, the length of the tag speaks for itself.
Most of the 5.17-relevant stuff comes from xfrm, wifi and bt trees
which had been lagging as you pointed out previously. But there's also
a larger than we'd like portion of fixes for bugs from previous
releases.
Three more fixes still under discussion, including and xfrm revert for
uAPI error.
Current release - regressions:
- iwlwifi: don't advertise TWT support, prevent FW crash
- xfrm: fix the if_id check in changelink
- xen/netfront: destroy queues before real_num_tx_queues is zeroed
- bluetooth: fix not checking MGMT cmd pending queue, make scanning
work again
Current release - new code bugs:
- mptcp: make SIOCOUTQ accurate for fallback socket
- bluetooth: access skb->len after null check
- bluetooth: hci_sync: fix not using conn_timeout
- smc: fix cleanup when register ULP fails
- dsa: restore error path of dsa_tree_change_tag_proto
- iwlwifi: fix build error for IWLMEI
- iwlwifi: mvm: propagate error from request_ownership to the user
Previous releases - regressions:
- xfrm: fix pMTU regression when reported pMTU is too small
- xfrm: fix TCP MSS calculation when pMTU is close to 1280
- bluetooth: fix bt_skb_sendmmsg not allocating partial chunks
- ipv6: ensure we call ipv6_mc_down() at most once, prevent leaks
- ipv6: prevent leaks in igmp6 when input queues get full
- fix up skbs delta_truesize in UDP GRO frag_list
- eth: e1000e: fix possible HW unit hang after an s0ix exit
- eth: e1000e: correct NVM checksum verification flow
- ptp: ocp: fix large time adjustments
Previous releases - always broken:
- tcp: make tcp_read_sock() more robust in presence of urgent data
- xfrm: distinguishing SAs and SPs by if_id in xfrm_migrate
- xfrm: fix xfrm_migrate issues when address family changes
- dcb: flush lingering app table entries for unregistered devices
- smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
- mac80211: fix EAPoL rekey fail in 802.3 rx path
- mac80211: fix forwarded mesh frames AC & queue selection
- netfilter: nf_queue: fix socket access races and bugs
- batman-adv: fix ToCToU iflink problems and check the result belongs
to the expected net namespace
- can: gs_usb, etas_es58x: fix opened_channel_cnt's accounting
- can: rcar_canfd: register the CAN device when fully ready
- eth: igb, igc: phy: drop premature return leaking HW semaphore
- eth: ixgbe: xsk: change !netif_carrier_ok() handling in
ixgbe_xmit_zc(), prevent live lock when link goes down
- eth: stmmac: only enable DMA interrupts when ready
- eth: sparx5: move vlan checks before any changes are made
- eth: iavf: fix races around init, removal, resets and vlan ops
- ibmvnic: more reset flow fixes
Misc:
- eth: fix return value of __setup handlers"
* tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
selftests: mlxsw: resource_scale: Fix return value
selftests: mlxsw: tc_police_scale: Make test more robust
net: dcb: disable softirqs in dcbnl_flush_dev()
bnx2: Fix an error message
sfc: extend the locking on mcdi->seqno
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
tcp: make tcp_read_sock() more robust
bpf, sockmap: Do not ignore orig_len parameter
net: ipa: add an interconnect dependency
net: fix up skbs delta_truesize in UDP GRO frag_list
iwlwifi: mvm: return value for request_ownership
nl80211: Update bss channel on channel switch for P2P_CLIENT
iwlwifi: fix build error for IWLMEI
ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
batman-adv: Don't expect inter-netns unique iflink indices
...
The test runs several test cases and is supposed to return an error in
case at least one of them failed.
Currently, the check of the return value of each test case is in the
wrong place, which can result in the wrong return value. For example:
# TESTS='tc_police' ./resource_scale.sh
TEST: 'tc_police' [default] 968 [FAIL]
tc police offload count failed
Error: mlxsw_spectrum: Failed to allocate policer index.
We have an error talking to the kernel
Command failed /tmp/tmp.i7Oc5HwmXY:969
TEST: 'tc_police' [default] overflow 969 [ OK ]
...
TEST: 'tc_police' [ipv4_max] overflow 969 [ OK ]
$ echo $?
0
Fix this by moving the check to be done after each test case.
Fixes: 059b18e21c ("selftests: mlxsw: Return correct error code in resource scale test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test adds tc filters and checks how many of them were offloaded by
grepping for 'in_hw'.
iproute2 commit f4cd4f127047 ("tc: add skip_hw and skip_sw to control
action offload") added offload indication to tc actions, producing the
following output:
$ tc filter show dev swp2 ingress
...
filter protocol ipv6 pref 1000 flower chain 0 handle 0x7c0
eth_type ipv6
dst_ip 2001:db8:1::7bf
skip_sw
in_hw in_hw_count 1
action order 1: police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b
ref 1 bind 1
not_in_hw
used_hw_stats immediate
The current grep expression matches on both 'in_hw' and 'not_in_hw',
resulting in incorrect results.
Fix that by using JSON output instead.
Fixes: 5061e77326 ("selftests: mlxsw: Add scale test for tc-police")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Use kfree_rcu(ptr, rcu) variant, using kfree_rcu(ptr) was not
intentional. From Eric Dumazet.
2) Use-after-free in netfilter hook core, from Eric Dumazet.
3) Missing rcu read lock side for netfilter egress hook,
from Florian Westphal.
4) nf_queue assume state->sk is full socket while it might not be.
Invoke sock_gen_put(), from Florian Westphal.
5) Add selftest to exercise the reported KASAN splat in 4)
6) Fix possible use-after-free in nf_queue in case sk_refcnt is 0.
Also from Florian.
7) Use input interface index only for hardware offload, not for
the software plane. This breaks tc ct action. Patch from Paul Blakey.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
net/sched: act_ct: Fix flow table lookup failure with no originating ifindex
netfilter: nf_queue: handle socket prefetch
netfilter: nf_queue: fix possible use-after-free
selftests: netfilter: add nfqueue TCP_NEW_SYN_RECV socket race test
netfilter: nf_queue: don't assume sk is full socket
netfilter: egress: silence egress hook lockdep splats
netfilter: fix use-after-free in __nf_register_net_hook()
netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant
====================
Link: https://lore.kernel.org/r/20220301215337.378405-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull kvm fixes from Paolo Bonzini:
"The bigger part of the change is a revert for x86 hosts. Here the
second patch was supposed to fix the first, but in reality it was just
as broken, so both have to go.
x86 host:
- Revert incorrect assumption that cr3 changes come with preempt
notifier callbacks (they don't when static branches are changed,
for example)
ARM host:
- Correctly synchronise PMR and co on PSCI CPU_SUSPEND
- Skip tests that depend on GICv3 when the HW isn't available"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: aarch64: Skip tests if we can't create a vgic-v3
Revert "KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()"
Revert "KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()"
KVM: arm64: Don't miss pending interrupts for suspended vCPU
causes:
BUG: KASAN: slab-out-of-bounds in sk_free+0x25/0x80
Write of size 4 at addr ffff888106df0284 by task nf-queue/1459
sk_free+0x25/0x80
nf_queue_entry_release_refs+0x143/0x1a0
nf_reinject+0x233/0x770
... without 'netfilter: nf_queue: don't assume sk is full socket'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Running the memfd script ./run_hugetlbfs_test.sh will often end in error
as follows:
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
memfd-hugetlb: SEAL-SHRINK
fallocate(ALLOC) failed: No space left on device
./run_hugetlbfs_test.sh: line 60: 166855 Aborted (core dumped) ./memfd_test hugetlbfs
opening: ./mnt/memfd
fuse: DONE
If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will
allocate 'just enough' pages to run the test. In the SEAL-FUTURE-WRITE
test the mfd_fail_write routine maps the file, but does not unmap. As a
result, two hugetlb pages remain reserved for the mapping. When the
fallocate call in the SEAL-SHRINK test attempts allocate all hugetlb
pages, it is short by the two reserved pages.
Fix by making sure to unmap in mfd_fail_write.
Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The arch_timer and vgic_irq kselftests assume that they can create a
vgic-v3, using the library function vgic_v3_setup() which aborts with a
test failure if it is not possible to do so. Since vgic-v3 can only be
instantiated on systems where the host has GICv3 this leads to false
positives on older systems where that is not the case.
Fix this by changing vgic_v3_setup() to return an error if the vgic can't
be instantiated and have the callers skip if this happens. We could also
exit flagging a skip in vgic_v3_setup() but this would prevent future test
cases conditionally deciding which GIC to use or generally doing more
complex output.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Tested-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220223131624.1830351-1-broonie@kernel.org
After commit 05be5e273c ("selftests: mptcp: add disconnect tests")
the mptcp selftests leave behind a couple of tmp files after
each run. run_tests_disconnect() misnames a few variables used to
track them. Address the issue setting the appropriate global variables
Fixes: 05be5e273c ("selftests: mptcp: add disconnect tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
Current release - regressions:
- bpf: fix crash due to out of bounds access into reg2btf_ids
- mvpp2: always set port pcs ops, avoid null-deref
- eth: marvell: fix driver load from initrd
- eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"
Current release - new code bugs:
- mptcp: fix race in overlapping signal events
Previous releases - regressions:
- xen-netback: revert hotplug-status changes causing devices to not
be configured
- dsa:
- avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
held
- fix panic when removing unoffloaded port from bridge
- dsa: microchip: fix bridging with more than two member ports
Previous releases - always broken:
- bpf:
- fix crash due to incorrect copy_map_value when both spin lock
and timer are present in a single value
- fix a bpf_timer initialization issue with clang
- do not try bpf_msg_push_data with len 0
- add schedule points in batch ops
- nf_tables:
- unregister flowtable hooks on netns exit
- correct flow offload action array size
- fix a couple of memory leaks
- vsock: don't check owner in vhost_vsock_stop() while releasing
- gso: do not skip outer ip header in case of ipip and net_failover
- smc: use a mutex for locking "struct smc_pnettable"
- openvswitch: fix setting ipv6 fields causing hw csum failure
- mptcp: fix race in incoming ADD_ADDR option processing
- sysfs: add check for netdevice being present to speed_show
- sched: act_ct: fix flow table lookup after ct clear or switching
zones
- eth: intel: fixes for SR-IOV forwarding offloads
- eth: broadcom: fixes for selftests and error recovery
- eth: mellanox: flow steering and SR-IOV forwarding fixes
Misc:
- make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
friends not report freed skbs as drops
- force inlining of checksum functions in net/checksum.h"
* tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: mv643xx_eth: process retval from of_get_mac_address
ping: remove pr_err from ping_lookup
Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
openvswitch: Fix setting ipv6 fields causing hw csum failure
ipv6: prevent a possible race condition with lifetimes
net/smc: Use a mutex for locking "struct smc_pnettable"
bnx2x: fix driver load from initrd
Revert "xen-netback: Check for hotplug-status existence before watching"
Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
net/mlx5e: Fix VF min/max rate parameters interchange mistake
net/mlx5e: Add missing increment of count
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
net/mlx5e: Add feature check for set fec counters
net/mlx5e: TC, Skip redundant ct clear actions
net/mlx5e: TC, Reject rules with forward and drop actions
net/mlx5e: TC, Reject rules with drop and modify hdr action
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
net/mlx5: Fix possible deadlock on rule deletion
...
New conflicts in sched/core due to the following upstream fixes:
44585f7bc0 ("psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n")
a06247c680 ("psi: Fix uaf issue when psi trigger is destroyed while being polled")
Conflicts:
include/linux/psi_types.h
kernel/sched/psi.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull mount_setattr test/doc fixes from Christian Brauner:
"This contains a fix for one of the selftests for the mount_setattr
syscall to create idmapped mounts, an entry for idmapped mounts for
maintainers, and missing kernel documentation for the helper we split
out some time ago to get and yield write access to a mount when
changing mount properties"
* tag 'fs.mount_setattr.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: add kernel doc for mnt_{hold,unhold}_writers()
MAINTAINERS: add entry for idmapped mounts
tests: fix idmapped mount_setattr test
Since commit 2843ff6f36 ("mptcp: remote addresses fullmesh"), an
MPTCP client can attempt creating multiple MPJ subflow simultaneusly.
In such scenario the server, when syncookies are enabled, could end-up
accepting incoming MPJ syn even above the configured subflow limit, as
the such limit can be enforced in a reliable way only after the subflow
creation. In case of syncookie, only after the 3rd ack reception.
As a consequence the related self-tests case sporadically fails, as it
verify that the server always accept the expected number of MPJ syn.
Address the issues relaxing the MPJ syn number constrain. Note that the
check on the accepted number of MPJ 3rd ack still remains intact.
Fixes: 2843ff6f36 ("mptcp: remote addresses fullmesh")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of waiting for an arbitrary amount of time for the MPTCP
MP_CAPABLE handshake to complete, explicitly wait for the relevant
socket to enter into the established status.
Additionally let the data transfer application use the slowest
transfer mode available (-r), to cope with very slow host, or
high jitter caused by hosting VMs.
Fixes: df62f2ec3d ("selftests/mptcp: add diag interface tests")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/258
Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Kselftest fixes from Shuah Khan:
"Fixes to ftrace, exec, and seccomp tests build, run-time and install
bugs. These bugs are in the way of running the tests"
* tag 'linux-kselftest-fixes-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT
selftests/seccomp: Fix seccomp failure by adding missing headers
selftests/exec: Add non-regular to TEST_GEN_PROGS
Alexei Starovoitov says:
====================
pull-request: bpf 2022-02-17
We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 8 files changed, 119 insertions(+), 15 deletions(-).
The main changes are:
1) Add schedule points in map batch ops, from Eric.
2) Fix bpf_msg_push_data with len 0, from Felix.
3) Fix crash due to incorrect copy_map_value, from Kumar.
4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.
5) Fix a bpf_timer initialization issue with clang, from Yonghong.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add schedule points in batch ops
bpf: Fix crash due to out of bounds access into reg2btf_ids.
selftests: bpf: Check bpf_msg_push_data return value
bpf: Fix a bpf_timer initialization issue
bpf: Emit bpf_timer in vmlinux BTF
selftests/bpf: Add test for bpf_timer overwriting crash
bpf: Fix crash due to incorrect copy_map_value
bpf: Do not try bpf_msg_push_data with len 0
====================
Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>