- Have reading of event format files test if the meta data still exists.
When a event is freed, a flag (EVENT_FILE_FL_FREED) in the meta data is
set to state that it is to prevent any new references to it from happening
while waiting for existing references to close. When the last reference
closes, the meta data is freed. But the "format" was missing a check to
this flag (along with some other files) that allowed new references to
happen, and a use-after-free bug to occur.
- Have the trace event meta data use the refcount infrastructure instead
of relying on its own atomic counters.
- Have tracefs inodes use alloc_inode_sb() for allocation instead of
using kmem_cache_alloc() directly.
- Have eventfs_create_dir() return an ERR_PTR instead of NULL as
the callers expect a real object or an ERR_PTR.
- Have release_ei() use call_srcu() and not call_rcu() as all the
protection is on SRCU and not RCU.
- Fix ftrace_graph_ret_addr() to use the task passed in and not current.
- Fix overflow bug in get_free_elt() where the counter can overflow
the integer and cause an infinite loop.
- Remove unused function ring_buffer_nr_pages()
- Have tracefs freeing use the inode RCU infrastructure instead of
creating its own. When the kernel had randomize structure fields
enabled, the rcu field of the tracefs_inode was overlapping the
rcu field of the inode structure, and corrupting it. Instead,
use the destroy_inode() callback to do the initial cleanup of
the code, and then have free_inode() free it.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZrTvXxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qu39AP9ze6ELpShDrxbXhf0adbNqG2IXMepa
MMLqfq8tU8E/vAEAuZXJ6rKXeGvKeONa06ocvWJ0dpb2cy/n4hmx+KtM5gI=
=Pkh4
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Have reading of event format files test if the metadata still exists.
When a event is freed, a flag (EVENT_FILE_FL_FREED) in the metadata
is set to state that it is to prevent any new references to it from
happening while waiting for existing references to close. When the
last reference closes, the metadata is freed. But the "format" was
missing a check to this flag (along with some other files) that
allowed new references to happen, and a use-after-free bug to occur.
- Have the trace event meta data use the refcount infrastructure
instead of relying on its own atomic counters.
- Have tracefs inodes use alloc_inode_sb() for allocation instead of
using kmem_cache_alloc() directly.
- Have eventfs_create_dir() return an ERR_PTR instead of NULL as the
callers expect a real object or an ERR_PTR.
- Have release_ei() use call_srcu() and not call_rcu() as all the
protection is on SRCU and not RCU.
- Fix ftrace_graph_ret_addr() to use the task passed in and not
current.
- Fix overflow bug in get_free_elt() where the counter can overflow the
integer and cause an infinite loop.
- Remove unused function ring_buffer_nr_pages()
- Have tracefs freeing use the inode RCU infrastructure instead of
creating its own.
When the kernel had randomize structure fields enabled, the rcu field
of the tracefs_inode was overlapping the rcu field of the inode
structure, and corrupting it. Instead, use the destroy_inode()
callback to do the initial cleanup of the code, and then have
free_inode() free it.
* tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracefs: Use generic inode RCU for synchronizing freeing
ring-buffer: Remove unused function ring_buffer_nr_pages()
tracing: Fix overflow in get_free_elt()
function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()
eventfs: Use SRCU for freeing eventfs_inodes
eventfs: Don't return NULL in eventfs_create_dir()
tracefs: Fix inode allocation
tracing: Use refcount for trace_event_file reference counter
tracing: Have format file honor EVENT_FILE_FL_FREED
Assorted little stuff:
- lockdep fixup for lockdep_set_notrack_class()
- we can now remove a device when using erasure coding without
deadlocking, though we still hit other issues
- the "allocator stuck" timeout is now configurable, and messages are
ratelimited; default timeout has been increased from 10 seconds to 30
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAma05JwACgkQE6szbY3K
bnYE+Q//VJ6R/UxDoxjk8zgftRcdgwXod6U+/E0Kj3ZBKLYXkcGaWWmmGMkFafBp
eL7Y3wtHSKiMsHYX9KEdFUZFLe1KI4c16RgNIXk9nwkF+3/+8pEDHKPFuoGHJH3O
HComHGqwVg8Zx2jRNvEkvQ980iH7OBGhCjMFXhJ3xbMGLdw91TQQi49a+Q/vz7QT
y3Cl1dgX5xBl7fqKefsYa+X6mpWi4/6t60vJvatI+bvDfznjI6jN3qGVLlQye7tC
6VbJAjHsPPyNMlWa99UaHqDdaM325zR2ES0bsfHd8Up4iAwO8OgjzYQxpYTgi51i
6DTiGEOV2S8gF+Rnprnbzsnau0hEvrtQY2Ub85TCIGbZJa8b+aDIlq9k8jF36O2E
2CUTleQ/E129RxXpkZGsVRpNmemdCi6rHAcluaFEgezX4FJH8BVOwQQq2Xz7rd7E
3ZP5iAWmX0IgOL0VOCP/ZXl/lEMwSk0VAED3jEbT7f2K7rU9nXDO2bIEx1wXDCm1
b32kvmUi2FBjqLHSqvAPEb52tvvZuliMUY7z9dEx+AX9AVC9kGE+amGexosKb/LY
nWzey+D0cKHtgbkMFrCClkpg75Tnt9ISJbad53+5qhN8an/a71djdj8Zk0UQnQjv
6Amv4Ns1lDo3XGC1QtYkF5HqiWaupbUXAftptpS4Av4X1zZEQIc=
=q1dD
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Assorted little stuff:
- lockdep fixup for lockdep_set_notrack_class()
- we can now remove a device when using erasure coding without
deadlocking, though we still hit other issues
- the 'allocator stuck' timeout is now configurable, and messages are
ratelimited. The default timeout has been increased from 10 seconds
to 30"
* tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs:
bcachefs: Use bch2_wait_on_allocator() in btree node alloc path
bcachefs: Make allocator stuck timeout configurable, ratelimit messages
bcachefs: Add missing path_traverse() to btree_iter_next_node()
bcachefs: ec should not allocate from ro devs
bcachefs: Improved allocator debugging for ec
bcachefs: Add missing bch2_trans_begin() call
bcachefs: Add a comment for bucket helper types
bcachefs: Don't rely on implicit unsigned -> signed integer conversion
lockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STAT
bcachefs: Fix double free of ca->buckets_nouse
Russell King reported that the arm cbc(aes) crypto module hangs when
loaded, and Herbert Xu bisected it to commit 9b9879fc03 ("modules:
catch concurrent module loads, treat them as idempotent"), and noted:
"So what's happening here is that the first modprobe tries to load a
fallback CBC implementation, in doing so it triggers a load of the
exact same module due to module aliases.
IOW we're loading aes-arm-bs which provides cbc(aes). However, this
needs a fallback of cbc(aes) to operate, which is made out of the
generic cbc module + any implementation of aes, or ecb(aes). The
latter happens to also be provided by aes-arm-cb so that's why it
tries to load the same module again"
So loading the aes-arm-bs module ends up wanting to recursively load
itself, and the recursive load then ends up waiting for the original
module load to complete.
This is a regression, in that it used to be that we just tried to load
the module multiple times, and then as we went on to install it the
second time we would instead just error out because the module name
already existed.
That is actually also exactly what the original "catch concurrent loads"
patch did in commit 9828ed3f69 ("module: error out early on concurrent
load of the same module file"), but it turns out that it ends up being
racy, in that erroring out before the module has been fully initialized
will cause failures in dependent module loading.
See commit ac2263b588 (which was the revert of that "error out early")
commit for details about why erroring out before the module has been
initialized is actually fundamentally racy.
Now, for the actual recursive module load (as opposed to just
concurrently loading the same module twice), the race is not an issue.
At the same time it's hard for the kernel to see that this is recursion,
because the module load is always done from a usermode helper, so the
recursion is not some simple callchain within the kernel.
End result: this is not the real fix, but this at least adds a warning
for the situation (admittedly much too late for all the debugging pain
that Russell and Herbert went through) and if we can come to a
resolution on how to detect the recursion properly, this re-organizes
the code to make that easier.
Link: https://lore.kernel.org/all/ZrFHLqvFqhzykuYw@shell.armlinux.org.uk/
Reported-by: Russell King <linux@armlinux.org.uk>
Debugged-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmazQeUWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImepOhD/46XKoog/bxelKRiAbSbHslijJF
mF6VtME0zZ+f+nRU2j6q92gbE0RZsPokjNeWMhy5koj6Z7EkqMY1wa8FVYhO+OMm
p1Is7tp76iTTLXffVrL2nbA30Ak5rVBlF//sWE7yxaDbj2Y9JS2fq2Aia2bZ0G63
HOYEyXjL26H79aOEZ49HuRPrHDIIcBkjxvQqdnLm1kdD3GdSJe+20C8UWAdRbLjB
5auu64aHO6Tzu3Ka1rQddC1zGA8ZddHUmk8KXX2wPft70VWBeLehXNdrc8YjbVv3
lMEEKHpdSv//BdEXnzRfU9UaeoR2lHIo187CMKsZECZNHNTSE8mG/2uSNEqOO0Fm
mlhcTXMEcT52FqCEjmpHcRf1imI87P9celGqggMZsWwT1IAclxXNCFtLlWQqXAP0
HUdTjyn0I3e5iy02Kgf1RlfSWHaS68me5yDqdtdJByr5Gr0PI9yQ8Cc2r6EM3GtU
SqpuYTMGr8e9zoAwej9JOUTlEiMeO8ZRbgJubPtR7MRO3KvoMZ8DawUTpAzjyF87
Z/7Z/t5c8DqfPY+HaJdf55VQyzfuMS++SE4XsKXMvoqRdywmC/J+5Z6raa6bWD8c
cIzfPHdWg5I/dSYnWtFHNhz1F5qJWjjy2FCTH9Wl2PUtREMDDRzcX1jBYEc9cfmI
klySV7kOzeQJdtqRRg==
=aEeY
-----END PGP SIGNATURE-----
Merge tag 'loongarch-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Enable general EFI poweroff method to make poweroff usable on
hardwares which lack ACPI S5, use accessors to page table entries
instead of direct dereference to avoid potential problems, and two
trivial kvm cleanups"
* tag 'loongarch-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Remove undefined a6 argument comment for kvm_hypercall()
LoongArch: KVM: Remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS
LoongArch: Use accessors to page table entries instead of direct dereference
LoongArch: Enable general EFI poweroff method
aren't considered necessary for earlier kernels. 5 are MM and 4 are
non-MM. No identifiable theme here - please see the individual changelogs.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZrQhyAAKCRDdBJ7gKXxA
jvLLAP46sQ/HspAbx+5JoeKBTiX6XW4Hfd+MAk++EaTAyAhnxQD+Mfq7rPOIHm/G
wiXPVvLO8FEx0lbq06rnXvdotaWFrQg=
=mlE4
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-08-07-18-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Nine hotfixes. Five are cc:stable, the others either pertain to
post-6.10 material or aren't considered necessary for earlier kernels.
Five are MM and four are non-MM. No identifiable theme here - please
see the individual changelogs"
* tag 'mm-hotfixes-stable-2024-08-07-18-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
padata: Fix possible divide-by-0 panic in padata_mt_helper()
mailmap: update entry for David Heidelberg
memcg: protect concurrent access to mem_cgroup_idr
mm: shmem: fix incorrect aligned index when checking conflicts
mm: shmem: avoid allocating huge pages larger than MAX_PAGECACHE_ORDER for shmem
mm: list_lru: fix UAF for memory cgroup
kcov: properly check for softirq context
MAINTAINERS: Update LTP members and web
selftests: mm: add s390 to ARCH check
We are hit with a not easily reproducible divide-by-0 panic in padata.c at
bootup time.
[ 10.017908] Oops: divide error: 0000 1 PREEMPT SMP NOPTI
[ 10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
[ 10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
[ 10.017908] Workqueue: events_unbound padata_mt_helper
[ 10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
:
[ 10.017963] Call Trace:
[ 10.017968] <TASK>
[ 10.018004] ? padata_mt_helper+0x39/0xb0
[ 10.018084] process_one_work+0x174/0x330
[ 10.018093] worker_thread+0x266/0x3a0
[ 10.018111] kthread+0xcf/0x100
[ 10.018124] ret_from_fork+0x31/0x50
[ 10.018138] ret_from_fork_asm+0x1a/0x30
[ 10.018147] </TASK>
Looking at the padata_mt_helper() function, the only way a divide-by-0
panic can happen is when ps->chunk_size is 0. The way that chunk_size is
initialized in padata_do_multithreaded(), chunk_size can be 0 when the
min_chunk in the passed-in padata_mt_job structure is 0.
Fix this divide-by-0 panic by making sure that chunk_size will be at least
1 no matter what the input parameters are.
Link: https://lkml.kernel.org/r/20240806174647.1050398-1-longman@redhat.com
Fixes: 004ed42638 ("padata: add basic support for multithreaded jobs")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Waiman Long <longman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link my old gmail address to my active email.
Link: https://lkml.kernel.org/r/20240804054704.859503-1-david@ixit.cz
Signed-off-by: David Heidelberg <david@ixit.cz>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 73f576c04b ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications. However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different part of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code. Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object. The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success. No evidence were found for these cases.
Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them. So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove(). These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them. Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one.
Link: https://lkml.kernel.org/r/20240802235822.1830976-1-shakeel.butt@linux.dev
Fixes: 73f576c04b ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In the shmem_suitable_orders() function, xa_find() is used to check for
conflicts in the pagecache to select suitable huge orders. However, when
checking each huge order in every loop, the aligned index is calculated
from the previous iteration, which may cause suitable huge orders to be
missed.
We should use the original index each time in the loop to calculate a new
aligned index for checking conflicts to avoid this issue.
Link: https://lkml.kernel.org/r/07433b0f16a152bffb8cee34934a5c040e8e2ad6.1722404078.git.baolin.wang@linux.alibaba.com
Fixes: e7a2ab7b3b ("mm: shmem: add mTHP support for anonymous shmem")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Similar to commit d659b715e9 ("mm/huge_memory: avoid PMD-size page
cache if needed"), ARM64 can support 512MB PMD-sized THP when the base
page size is 64KB, which is larger than the maximum supported page cache
size MAX_PAGECACHE_ORDER.
This is not expected. To fix this issue, use THP_ORDERS_ALL_FILE_DEFAULT
for shmem to filter allowable huge orders.
[baolin.wang@linux.alibaba.com: remove comment, per Barry]
Link: https://lkml.kernel.org/r/c55d7ef7-78aa-4ed6-b897-c3e03a3f3ab7@linux.alibaba.com
[wangkefeng.wang@huawei.com: remove local `orders']
Link: https://lkml.kernel.org/r/87769ae8-b6c6-4454-925d-1864364af9c8@huawei.com
Link: https://lkml.kernel.org/r/117121665254442c3c7f585248296495e5e2b45c.1722404078.git.baolin.wang@linux.alibaba.com
Fixes: e7a2ab7b3b ("mm: shmem: add mTHP support for anonymous shmem")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The mem_cgroup_from_slab_obj() is supposed to be called under rcu lock or
cgroup_mutex or others which could prevent returned memcg from being
freed. Fix it by adding missing rcu read lock.
Found by code inspection.
[songmuchun@bytedance.com: only grab rcu lock when necessary, per Vlastimil]
Link: https://lkml.kernel.org/r/20240801024603.1865-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20240718083607.42068-1-songmuchun@bytedance.com
Fixes: 0a97c01cd2 ("list_lru: allow explicit memcg and NUMA node selection")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When collecting coverage from softirqs, KCOV uses in_serving_softirq() to
check whether the code is running in the softirq context. Unfortunately,
in_serving_softirq() is > 0 even when the code is running in the hardirq
or NMI context for hardirqs and NMIs that happened during a softirq.
As a result, if a softirq handler contains a remote coverage collection
section and a hardirq with another remote coverage collection section
happens during handling the softirq, KCOV incorrectly detects a nested
softirq coverate collection section and prints a WARNING, as reported by
syzbot.
This issue was exposed by commit a7f3813e58 ("usb: gadget: dummy_hcd:
Switch to hrtimer transfer scheduler"), which switched dummy_hcd to using
hrtimer and made the timer's callback be executed in the hardirq context.
Change the related checks in KCOV to account for this behavior of
in_serving_softirq() and make KCOV ignore remote coverage collection
sections in the hardirq and NMI contexts.
This prevents the WARNING printed by syzbot but does not fix the inability
of KCOV to collect coverage from the __usb_hcd_giveback_urb when dummy_hcd
is in use (caused by a7f3813e58); a separate patch is required for that.
Link: https://lkml.kernel.org/r/20240729022158.92059-1-andrey.konovalov@linux.dev
Fixes: 5ff3b30ab5 ("kcov: collect coverage from interrupts")
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Reported-by: syzbot+2388cdaeb6b10f0c13ac@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2388cdaeb6b10f0c13ac
Acked-by: Marco Elver <elver@google.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marcello Sylvester Bauer <sylv@sylv.io>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LTP project uses now readthedocs.org instance instead of GitHub wiki.
LTP maintainers are listed in alphabetical order.
Link: https://lkml.kernel.org/r/20240726072009.1021599-1-pvorel@suse.cz
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Li Wang <liwang@redhat.com>
Reviewed-by: Cyril Hrubis <chrubis@suse.cz>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Xiao Yang <yangx.jy@fujitsu.com>
Cc: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit 0518dbe97f ("selftests/mm: fix cross compilation with LLVM")
changed the env variable for the architecture from MACHINE to ARCH.
This is preventing 3 required TEST_GEN_FILES from being included when
cross compiling s390x and errors when trying to run the test suite. This
is due to the ARCH variable already being set and the arch folder name
being s390.
Add "s390" to the filtered list to cover this case and have the 3 files
included in the build.
Link: https://lkml.kernel.org/r/20240724213517.23918-1-npache@redhat.com
Fixes: 0518dbe97f ("selftests/mm: fix cross compilation with LLVM")
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Limit these messages to once every 2 minutes to avoid spamming logs;
with multiple devices the output can be quite significant.
Also, up the default timeout to 30 seconds from 10 seconds.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug exposed by the next path - we pop an assert in
path_set_should_be_locked().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Because ring_buffer_nr_pages() is not an inline function and user accesses
buffer->buffers[cpu]->nr_pages directly, the function ring_buffer_nr_pages
is removed.
Signed-off-by: Jianhui Zhou <912460177@qq.com>
Link: https://lore.kernel.org/tencent_F4A7E9AB337F44E0F4B858D07D19EF460708@qq.com
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.
Once it overflows, new elements can still be inserted into the tracing_map
even though the maximum number of elements (`max_elts`) has been reached.
Continuing to insert elements after the overflow could result in the
tracing_map containing "tracing_map->max_size" elements, leaving no empty
entries.
If any attempt is made to insert an element into a full tracing_map using
`__tracing_map_insert()`, it will cause an infinite loop with preemption
disabled, leading to a CPU hang problem.
Fix this by preventing any further increments to "tracing_map->next_elt"
once it reaches "tracing_map->max_elt".
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 08d43a5fa0 ("tracing: Add lock-free tracing_map")
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Link: https://lore.kernel.org/20240805055922.6277-1-Tze-nan.Wu@mediatek.com
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
To mirror the SRCU lock held in eventfs_iterate() when iterating over
eventfs inodes, use call_srcu() to free them too.
This was accidentally(?) degraded to RCU in commit 43aa6f97c2
("eventfs: Get rid of dentry pointers without refcounts").
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20240723210755.8970-1-minipli@grsecurity.net
Fixes: 43aa6f97c2 ("eventfs: Get rid of dentry pointers without refcounts")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Commit 77a06c33a2 ("eventfs: Test for ei->is_freed when accessing
ei->dentry") added another check, testing if the parent was freed after
we released the mutex. If so, the function returns NULL. However, all
callers expect it to either return a valid pointer or an error pointer,
at least since commit 5264a2f4bb ("tracing: Fix a NULL vs IS_ERR() bug
in event_subsystem_dir()"). Returning NULL will therefore fail the error
condition check in the caller.
Fix this by substituting the NULL return value with a fitting error
pointer.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Fixes: 77a06c33a2 ("eventfs: Test for ei->is_freed when accessing ei->dentry")
Link: https://lore.kernel.org/20240723122522.2724-1-minipli@grsecurity.net
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The leading comment above alloc_inode_sb() is pretty explicit about it:
/*
* This must be used for allocating filesystems specific inodes to set
* up the inode reclaim context correctly.
*/
Switch tracefs over to alloc_inode_sb() to make sure inodes are properly
linked.
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20240807115143.45927-2-minipli@grsecurity.net
Fixes: ba37ff75e0 ("eventfs: Implement tracefs_inode_cache")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Instead of using an atomic counter for the trace_event_file reference
counter, use the refcount interface. It has various checks to make sure
the reference counting is correct, and will warn if it detects an error
(like refcount_inc() on '0').
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20240726144208.687cce24@rorschach.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When eventfs was introduced, special care had to be done to coordinate the
freeing of the file meta data with the files that are exposed to user
space. The file meta data would have a ref count that is set when the file
is created and would be decremented and freed after the last user that
opened the file closed it. When the file meta data was to be freed, it
would set a flag (EVENT_FILE_FL_FREED) to denote that the file is freed,
and any new references made (like new opens or reads) would fail as it is
marked freed. This allowed other meta data to be freed after this flag was
set (under the event_mutex).
All the files that were dynamically created in the events directory had a
pointer to the file meta data and would call event_release() when the last
reference to the user space file was closed. This would be the time that it
is safe to free the file meta data.
A shortcut was made for the "format" file. It's i_private would point to
the "call" entry directly and not point to the file's meta data. This is
because all format files are the same for the same "call", so it was
thought there was no reason to differentiate them. The other files
maintain state (like the "enable", "trigger", etc). But this meant if the
file were to disappear, the "format" file would be unaware of it.
This caused a race that could be trigger via the user_events test (that
would create dynamic events and free them), and running a loop that would
read the user_events format files:
In one console run:
# cd tools/testing/selftests/user_events
# while true; do ./ftrace_test; done
And in another console run:
# cd /sys/kernel/tracing/
# while true; do cat events/user_events/__test_event/format; done 2>/dev/null
With KASAN memory checking, it would trigger a use-after-free bug report
(which was a real bug). This was because the format file was not checking
the file's meta data flag "EVENT_FILE_FL_FREED", so it would access the
event that the file meta data pointed to after the event was freed.
After inspection, there are other locations that were found to not check
the EVENT_FILE_FL_FREED flag when accessing the trace_event_file. Add a
new helper function: event_file_file() that will make sure that the
event_mutex is held, and will return NULL if the trace_event_file has the
EVENT_FILE_FL_FREED flag set. Have the first reference of the struct file
pointer use event_file_file() and check for NULL. Later uses can still use
the event_file_data() helper function if the event_mutex is still held and
was not released since the event_file_file() call.
Link: https://lore.kernel.org/all/20240719204701.1605950-1-minipli@grsecurity.net/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Ilkka Naulapää <digirigawa@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: Alexey Makhalov <alexey.makhalov@broadcom.com>
Cc: Vasavi Sirnapalli <vasavi.sirnapalli@broadcom.com>
Link: https://lore.kernel.org/20240730110657.3b69d3c1@gandalf.local.home
Fixes: b63db58e2f ("eventfs/tracing: Add callback for release of an eventfs_inode")
Reported-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmazbGYACgkQxWXV+ddt
WDsQAw//Z3XjjylTZPuHNk/AiMe5oochxB5T9ZracQOzG0o70gj1w/UQIZBkSzp1
66g8I4YdbwvEXKDg9Oi/GPDSON3GuhAiLXp+0Y/reeD/totgvrhROuJ3mIk5CZ0H
B4fIKH3xCKLQan26Opgju4qjum7+AR7ekFveM6GicxXXb3eAYALgoEFt63eZjZVu
7myak78gmBuK5QdGHH+onEhn+HfC57UTGBqu1bJsSOQC7dANkU+WzmgbH6FeOHqx
2T/lN/tu2tBBoF4zMvC472Zjmj4PnubNQnwwv0oJ8Z2Y0yIY95joZV0uXIXO72oz
BMQC6s0cltiTn1Tfe4iIWn+ZNjcfGAZO7aoD5NcJb2F/Mz5mZuMhtq3BND0wJ8/+
SuYk7PxX7tcOaFrDAn3Ne7XHsD7r5lLkFICXkzcNG4dqkBUxR3dN4Oi8KKMFrgCP
kTpf0/lAkYKYSrU86Yn1zhwRLaH8jm3fulVUY8i/p0tJnpsW8AmeME1Sk97Bhxbb
y6zt8+MPscPuQi3jPsBevaYd8Q8BIT34vVtU/jNmGAyqEv1wVYQRRK2Ma4sJJfNk
EEQV9i4VqXWLc17dcWVZrG6lEsVRIIBE/2Adhth6Myq73bkunpB9ZNNNZApf1T2j
Wn5gPzkuQd6FWrXe8V7oSN9rT1OPn9uHL6BmSUWhHUCuFeATgoc=
=03Gf
-----END PGP SIGNATURE-----
Merge tag 'for-6.11-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix double inode unlock for direct IO sync writes (reported by
syzbot)
- fix root tree id/name map definitions, don't use fixed size buffers
for name (reported by -Werror=unterminated-string-initialization)
- fix qgroup reserve leaks in bufferd write path
- update scrub status structure more often so it can be reported in
user space more accurately and let 'resume' not repeat work
- in preparation to remove space cache v1 in the future print a warning
if it's detected
* tag 'for-6.11-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: avoid using fixed char array size for tree names
btrfs: fix double inode unlock for direct IO sync writes
btrfs: emit a warning about space cache v1 being deprecated
btrfs: fix qgroup reserve leaks in cow_file_range
btrfs: implement launder_folio for clearing dirty page reserve
btrfs: scrub: update last_physical after scrubbing one stripe
btrfs: factor out stripe length calculation into a helper
We've had bugs in the past with incorrect integer conversions in disk
accounting code, which is why bucket helpers now always return s64s; add
a comment explaining this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
implicit integer conversion is a fertile source of bugs, and we really
would rather not have the min()/max() macros doing it implicitly.
bcachefs appears to be the only place in the kernel where this happens,
so let's fix it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The kvm_hypercall() set for LoongArch is limited to a1-a5. So the
mention of a6 in the comment is undefined that needs to be rectified.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Dandan Zhang <zhangdandan@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
As very well explained in commit 20a004e7b0 ("arm64: mm: Use
READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose
page table walker can modify the PTE in parallel must use READ_ONCE()/
WRITE_ONCE() macro to avoid any compiler transformation.
So apply that to LoongArch which is such an architecture, in order to
avoid potential problems.
Similar to commit edf9556472 ("riscv: Use accessors to page table
entries instead of direct dereference").
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
efi_shutdown_init() can register a general sys_off handler named
efi_power_off(). Enable this by providing efi_poweroff_required(),
like arm and x86. Since EFI poweroff is also supported on LoongArch,
and the enablement makes the poweroff function usable for hardwares
which lack ACPI S5.
We prefer ACPI poweroff rather than EFI poweroff (like x86), so we only
require EFI poweroff if acpi_gbl_reduced_hardware or acpi_no_s5 is true.
Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Fixes a single, long-standing issue with kick pass-through vdpa.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmayY60PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRphg0IAJUlLOlK8z1j1NIu2GtMbS2+MtPg5ZTm8+UX
4okBqPg20IJoB5mRYpAwzjAoqrDsITuJJj6cG7HyT6M06C48AhpIpgbnKjOhHPdO
BrkJvBRKM1xRUAa1CY6tVw93u0lUhBc71xgGZxhqeWDUNf0g1AhZT1ZULwo99OdT
S5VSRANov3wO06Ds3Y7IfomIxvb/bsPsxbKxt1+L2qyzr2kpJnIsMKkgq9+SNLsO
J8B/FhSsSxHJQZEvdGC+LIiuB2senbOjOUiocn8kgBtEfwWKPv30D/+H8cRsUezG
RGNWy3bQ5OhlIgLDhThr5OYe6i2GJxH79xD+BXE+UKAFVGK+BEo=
=fd3h
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"Fix a single, long-standing issue with kick pass-through vdpa"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost-vdpa: switch to use vmf_insert_pfn() in the fault handler
Fixes:
- intel-vbtn: ACPI notifier racing with itself.
- intel/ifs: Init local variable to cover a timeout corner case.
- WMI docs spelling
New HW Support:
- amd/{pmc,pmf}: AMD 1Ah model 60h series.
- amd/pmf: SPS quirk support for ASUS ROG Ally X
The following is an automated shortlog grouped by driver:
amd/pmc:
- Send OS_HINT command for new AMD platform
amd/pmf:
- Add new ACPI ID AMDI0107
amd: pmf:
- Add quirk for ROG Ally X
intel/ifs:
- Initialize union ifs_status to zero
intel-vbtn:
- Protect ACPI notify handler against recursion
msi-wmi-platform:
- Fix spelling mistakes
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCZrHFkgAKCRBZrE9hU+XO
Mb3RAP9M0BfrdTBGxgrnR1NNGkzNWUvFPcw1OkWD+ceCpQVhAQEAoPtXnlwvn+6C
/1pKcHGHy7zjyeQINTrKBoI1KsdoCQM=
=hL5i
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes:
- Fix ACPI notifier racing with itself (intel-vbtn)
- Initialize local variable to cover a timeout corner case
(intel/ifs)
- WMI docs spelling
New device IDs:
- amd/{pmc,pmf}: AMD 1Ah model 60h series.
- amd/pmf: SPS quirk support for ASUS ROG Ally X"
* tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel/ifs: Initialize union ifs_status to zero
platform/x86: msi-wmi-platform: Fix spelling mistakes
platform/x86/amd/pmf: Add new ACPI ID AMDI0107
platform/x86/amd/pmc: Send OS_HINT command for new AMD platform
platform/x86/amd: pmf: Add quirk for ROG Ally X
platform/x86: intel-vbtn: Protect ACPI notify handler against recursion
This kselftest fixes update consists of a single fix to the conditional
in ksft.py script which incorrectly flags a test suite failed when there
are skipped tests in the mix. The logic is fixed to take skipped tests
into account and report the test as passed.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmaxM04ACgkQCwJExA0N
Qxw3/RAAseNtt0ReE92Ts6SrTRGfhUxauv65b8dZ0eCiIMxLiMHmAjjJ27eyiQfP
JH77sQibF1/6B2v4j0L5Hgy4fD2JM4aaeUR5LIkczrTsPoeGtbVwAlrxzjb2m25S
Rz0ODU3Bgk+JeyHrmGMgy4trXzgw//6bQQtjOJqTq0nzWXqpoRuLujz+b4Vej67j
xtF1ELZOWnNLV/2gORMZs2AuU+RAQDeMA6bGAVmBrbpFFWZemYtD07zF+UxT+nC3
SP6FyAGh7l3Q2NtUhKW1CnU2OV8ChWPa5qPvA6N2tVo+TzCXHVB2cmkctH+g/qfR
KgaOG48DUzmFLwR0X5MT2daQGWi3E7IumZXCPOy0kNmPTOjRQz5cjWHyYcSLhvOv
4TJmPYHuhb8P0910r0XFBkPozpbqgA1QcL38P81wOY0K4DytNEcu5bdYj849IC/V
EqpP32PvJ4AguLTSXS2rDHLIm04J2+J2+rEx2YXD8YsaG1oF9nbpSufo/sk7achA
Y/Tu8Q5CmJg6rl+Cc0ZmQD/sAqLArPhwXLBdzHLLwyvdVLYK8jYmII6RbAipmNq7
UALGhH4QrGBcqvyjZtYsyYYLUote9Lkb8ZJ4rGHXc1fVj46MAKwvfsiJqufkVbu2
7NLbmW2H7IqRgFou/I7vf2mpUxsDOXqHJCjSzHiYXLSikDvrDkA=
=DeQe
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"A single fix to the conditional in ksft.py script which incorrectly
flags a test suite failed when there are skipped tests in the mix.
The logic is fixed to take skipped tests into account and report the
test as passed"
* tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: ksft: Fix finished() helper exit code on skipped tests
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmawnmwACgkQu+CwddJF
iJqHkAf/an9TIC3VOt1LXZBXNt5xGXK5azhRbhfCih2F11lH5MlaHpuJJI8iJdVN
4G+cifmn+e9f9k+6FKc96xStV5g4OvRoxPYfZrgvcTTDDs2jCU1qyG/aDqopsyeA
zh/lcH+jXUXCpX2Y0TUhUwOeaKf2qyb2eArpw+bqnJ7aCAEbqxPi5egwA9uEO+71
g1moNP8KF3PBiOvE295RnF/+A91fOBt/1kPjTRRxWQxtp04nptATKZNEfEVFrNw5
jPata6cK1x/Hce8P2RitQsUlVBE53lllNeunZR2KQ0Qu1LiO7Yo8iyVywKhk+4V9
f8NwZ+sL+s/YCQvd2W80yhQ+iTQkKg==
=sfE2
-----END PGP SIGNATURE-----
Merge tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
"Since v6.8 we've had a subtle breakage in SLUB with KFENCE enabled,
that can cause a crash. It hasn't been found earlier due to quite
specific conditions necessary (OOM during kmem_cache_alloc_bulk())"
* tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm, slub: do not call do_slab_free for kfence object
The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86dc ("sched: Add wrapper for get_wchan()
to keep task blocked")
Booting with the 'profile=sleep' kernel command line option added or
executing
# echo -n sleep > /sys/kernel/profiling
after boot causes the system to lock up.
Lockdep reports
kthreadd/3 is trying to acquire lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70
but task is already holding lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370
with the call trace being
lock_acquire+0xc8/0x2f0
get_wchan+0x32/0x70
__update_stats_enqueue_sleeper+0x151/0x430
enqueue_entity+0x4b0/0x520
enqueue_task_fair+0x92/0x6b0
ttwu_do_activate+0x73/0x140
try_to_wake_up+0x213/0x370
swake_up_locked+0x20/0x50
complete+0x2f/0x40
kthread+0xfb/0x180
However, since nobody noticed this regression for more than two years,
let's remove 'profile=sleep' support based on the assumption that nobody
needs this functionality.
Fixes: 42a20f86dc ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.
A recent change in the ACPI code which consolidated code pathes moved
the invocation of init_freq_invariance_cppc() to be moved to a CPU
hotplug handler. The first invocation on AMD CPUs ends up enabling a
static branch which dead locks because the static branch enable tries to
acquire cpu_hotplug_lock but that lock is already held write by the
hotplug machinery.
Use static_branch_enable_cpuslocked() instead and take the hotplug
lock read for the Intel code path which is invoked from the
architecture code outside of the CPU hotplug operations.
- Fix the number of reserved bits in the sev_config structure bit field
so that the bitfield does not exceed 64 bit.
- Add missing Zen5 model numbers
- Fix the alignment assumptions of pti_clone_pgtable() and
clone_entry_text() on 32-bit:
The code assumes PMD aligned code sections, but on 32-bit the kernel
entry text is not PMD aligned. So depending on the code size and
location, which is configuration and compiler dependent, entry text
can cross a PMD boundary. As the start is not PMD aligned adding PMD
size to the start address is larger than the end address which
results in partially mapped entry code for user space. That causes
endless recursion on the first entry from userspace (usually #PF).
Cure this by aligning the start address in the addition so it ends up
at the next PMD start address.
clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
eventually be PTE mapped, which causes a map fail because the PMD for
the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for
the clone() invocation which resolves to PTE on 32-bit and PMD on
64-bit.
- Zero the 8-byte case for get_user() on range check failure on 32-bit
The recend consolidation of the 8-byte get_user() case broke the
zeroing in the failure case again. Establish it by clearing ECX
before the range check and not afterwards as that obvioulsy can't be
reached when the range check fails
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmave5oTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofuHD/9AX+BeMOp1+qezoK/YAAfdeY413y9G
WVYbHEdukS4wULX5wBJm1eTGJs2seuJYJ18yO18xHog1cTBsYd8V9kdLGR629QWc
6nEcs2Wbda6NCqZcKigXDbwWHMyKdymvLgCs+ldc+fEOnflXr27ZRyT0fFl03alE
RsX9jlNLG289i6DKJlllC6TjEr+hN6hXUAqY8d5OoMaUuJMJ4HsSBlBSwKAnuvfw
J0/OYZ8cQBtSGMiL3jHG8UngsWt9ehFdWfr/ineDiHagFvFjwlKgAYZwNZ1WORIg
Wx2Ga07JD3ZB4eLCMK1/fHsCtWPw7QtTLYFaKg3QES3yWSPvDJp7YIdXFlFDLNDh
tm/hp6ArhFofpTa+k+EopppUcK5f/TwDyosbKii8FadYjdTFWX4NmBGwoX3wIhCh
M81LdkP4K5YKI+wmJTgTQlT4o6KuNXC7XkKcqrKk/5OBrPG5xgpyeHK1zgbY7p+F
Ez5lTIDEm293boB3WZGGGiImceftr4kZoXSAZjbMBnncrGVFFGBrW5KE8JVTMaKm
kkAVYZFXl+vMJQgAKAIIRgj9MTcV44Cnopq0NwRhM5hOPTFTYXibHuH3X6sUuHKL
P2X2w0HZIaEo1nFO9/pCtqIs/kNFcanP6VWiJggFcCu7ldVi4jgCBpv0UnAiCHwq
nmqq2QbTV1XAMg==
=wf31
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.
A recent change in the ACPI code which consolidated code pathes moved
the invocation of init_freq_invariance_cppc() to be moved to a CPU
hotplug handler. The first invocation on AMD CPUs ends up enabling a
static branch which dead locks because the static branch enable tries
to acquire cpu_hotplug_lock but that lock is already held write by
the hotplug machinery.
Use static_branch_enable_cpuslocked() instead and take the hotplug
lock read for the Intel code path which is invoked from the
architecture code outside of the CPU hotplug operations.
- Fix the number of reserved bits in the sev_config structure bit field
so that the bitfield does not exceed 64 bit.
- Add missing Zen5 model numbers
- Fix the alignment assumptions of pti_clone_pgtable() and
clone_entry_text() on 32-bit:
The code assumes PMD aligned code sections, but on 32-bit the kernel
entry text is not PMD aligned. So depending on the code size and
location, which is configuration and compiler dependent, entry text
can cross a PMD boundary. As the start is not PMD aligned adding PMD
size to the start address is larger than the end address which
results in partially mapped entry code for user space. That causes
endless recursion on the first entry from userspace (usually #PF).
Cure this by aligning the start address in the addition so it ends up
at the next PMD start address.
clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
eventually be PTE mapped, which causes a map fail because the PMD for
the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for
the clone() invocation which resolves to PTE on 32-bit and PMD on
64-bit.
- Zero the 8-byte case for get_user() on range check failure on 32-bit
The recend consolidation of the 8-byte get_user() case broke the
zeroing in the failure case again. Establish it by clearing ECX
before the range check and not afterwards as that obvioulsy can't be
reached when the range check fails
* tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
x86/mm: Fix pti_clone_entry_text() for i386
x86/mm: Fix pti_clone_pgtable() alignment assumption
x86/setup: Parse the builtin command line before merging
x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
x86/sev: Fix __reserved field in sev_config
x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
- The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed in testing as some compilers hoist the access into
the non-preemotible section where the pointer is actually used, but
obviously compilers can rightfully invoke it where the code put it.
Move it into the non-preemptible section right to the actual usage
side to cure it.
- The clocksource watchdog is supposed to emit a warning when the retry
count is greater than one and the number of retries reaches the
limit. The condition is backwards and warns always when the count is
greater than one. Fixup the condition to prevent spamming dmesg.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmavdtQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofrtD/95Ck3FgHRdxZWlnIwBptzW1ApfTjKa
fuwOHBAcFzpNx13DcSyKqclMIM2QxN2lAjAv3m5IeNFO7RN5Ru1aOskPpFMgQIzj
6UfKqvtuSZeCPIqspeN9/RAnqKTRYAFRZcSnE8FcFxuM6dU9zlnLjms1gstTyLS3
HeoQoUe1DT6IpKUDKdvMP8JiwU6/i+xHAeVizEkGZ5Rxo67+UDUqqfcpBr/pIAIN
W0KfekPVLGjvL19gvmaHrPBHi5tEjM1+7HiepKeyc9GjC1FHYGQjNLxXpri3CcI+
VZmya37ZVLKoOFjvHuqmCYzFWrEs1rEfgnBeCglV5lvwxfgPoOk9awpUAR4IWkEz
HMagUUre3kThEztoPzyf7apJmltVC7U++gRfW0i7p/gSfwF9AYAPgWAcg6VyDrxn
hIbKkQvqLNM1ldXWS0tG/scgEAKEM7yYG9BP03ac/mGdFNGa6yucG986ElHoVLSR
S8Dw1E7/F7G5KOqVK6i25JLFgN0ZJNRWMWbd95VBEuZcZ4fzIKug3intNeSUSjDc
zfvKfu65nRr2bHcaxs5MkPqkDqFOVytoQqgYYqstUZ4bRyeI6px/Rmu58VgJF2cP
DmUvf6gqNXbg3g6ijQmhOiCTqzjW67bxFbYDuX68oLfBDmdVPP8MjZOJHV7/ukvp
mjcFL/MYjg6/iw==
=lI2j
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes for the timer/clocksource code:
- The recent fix to make the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed in testing as some compilers hoist the access
into the non-preemotible section where the pointer is actually
used, but obviously compilers can rightfully invoke it where the
code put it.
Move it into the non-preemptible section right to the actual usage
side to cure it.
- The clocksource watchdog is supposed to emit a warning when the
retry count is greater than one and the number of retries reaches
the limit.
The condition is backwards and warns always when the count is
greater than one. Fixup the condition to prevent spamming dmesg"
* tag 'timers-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()
tick/broadcast: Move per CPU pointer access into the atomic section
- When stime is larger than rtime due to accounting imprecision, then
utime = rtime - stime becomes negative. As this is unsigned math, the
result becomes a huge positive number. Cure it by resetting stime to
rtime in that case, so utime becomes 0.
- Restore consistent state when sched_cpu_deactivate() fails.
When offlining a CPU fails in sched_cpu_deactivate() after the SMT
present counter has been decremented, then the function aborts but
fails to increment the SMT present counter and leaves it imbalanced.
Consecutive operations cause it to underflow. Add the missing fixup
for the error path.
As SMT accounting the runqueue needs to marked online again in the
error exit path to restore consistent state.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmavdT8THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodVrEACDLJdjkM2n3T7EL8YjuBjkCW3dGWAZ
umpJGjwMDsT9/oLIU7B1wgX/IdWppssQa+0yXxZy7cKQvfP5VTd4fueuub2k5sJc
yDy5J8N0xRYvOhA0lrnp6jyqhhCZzIGDmSn3G+lDuQuuffaqfFbPkeMwoXmewiyt
72ajFsjeo7q25pm8ALgBhrSKfO5FFV1HJoAyoYKEyT5E/pliKNWrzbcA+PWstMK3
DWmj8dgYk6g/dBwNl6wORlpmcxjcDO65icH5XPSsadwosHe7q1+quIJSqMDyXHNY
qQ5r5f9bvXdq5DPKRON0GJb9gfSQNX5yE/pKdyW75mqHMxJ/pnIIds6h6mLHBewt
eZ5M1a/v8o+QiqQcDogk5DUzZlI46bKZsYLqU9y6v/WgUqa5C4BaEJT7CrQk+6wp
xUB4g3j/+asih55Tq9HKo6PEY8NLj4ytKHgeh0EvEllDxGmnRYR+PEdzLBjuWlAY
ka2/1vaNr/r5grbpQhO6N4txUAASoKF6nx1hq05I/lY45KA+RgeU0mgEN07Pa6HZ
4873Q2CnVUlvMVFulOUkJogGNk7KTDb3e7/+BMsA9Lda/2KmqaOLEh5T6egdLZ0G
feb/UQ6hoYcCD0IAsj9MfEOS3IVhOvtkJSwwLi/j09ucmC+5Ar3v3/Aw1EtTHJHm
ObdoEXJC98RLFA==
=b/q0
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
- When stime is larger than rtime due to accounting imprecision, then
utime = rtime - stime becomes negative. As this is unsigned math, the
result becomes a huge positive number.
Cure it by resetting stime to rtime in that case, so utime becomes 0.
- Restore consistent state when sched_cpu_deactivate() fails.
When offlining a CPU fails in sched_cpu_deactivate() after the SMT
present counter has been decremented, then the function aborts but
fails to increment the SMT present counter and leaves it imbalanced.
Consecutive operations cause it to underflow. Add the missing fixup
for the error path.
For SMT accounting the runqueue needs to marked online again in the
error exit path to restore consistent state.
* tag 'sched-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate()
sched/core: Introduce sched_set_rq_on/offline() helper
sched/smt: Fix unbalance sched_smt_present dec/inc
sched/smt: Introduce sched_smt_present_inc/dec() helper
sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
- Move the smp_processor_id() invocation back into the non-preemtible
region, so that the result is valid to use.
- Add the missing package C2 residency counters for Sierra Forest CPUs
to make the newly added support actually useful
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmavcrATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYKrD/wKtioQzAPuZHu7XH4IiaTLAbGEuU7L
9B+RYhMJcYmvYgZFdCJIMhsnKeNH6U6tVYx5ajYb4ThjXOpjGnJO5x8kYFoIFRuY
CibYnGZSfjfVQtHgOXtwNt/SG0DkiTP7nS4HrX++zakXeBREVZG/gd4tAq8jWnaq
sxJS3e3WhSz0MqYqfxbkgqIPlREAVLon3pgtBsNQwu1J6y6MV9yYJYwHW4NPCMGq
fTpEEKUdWKOwayaljw+r+GfAiyub+t0IlZ9Cue7FaqNlbLKTnMKSnJgo/wWLdei8
SUs5EOh76w2gFamPz+qRv9LlndvY2mhVvb+aPb/py2EtOUIISAqa/bCNI28EldEr
pzlrybyXxU+sb8igGp7oBpa154DzSAOqIGx81pBDUeqdN9oThjAU6+qC3VgDHqLh
XNKEL+i2MIsyHKwwsjIJDcW10g5p0ngbi+4QmucqXeSCv0Ms9+64m2/xWmFGgnJM
KGu8Iv7e9k15E4wuUUdzsUcOo2UxzQb+HkYfXK0x39FZbRVR1nbjNUJsBrEe9gFA
OJ3CJHJfkVMml0uLlO0vxecDkaBkDuXN33tn7SlABtf5jlWNaeoIZC1ltp/BG8oB
vC3VTDHDpS5H3vN1wQs9UrdxXWQXbpQedv50jwotKgBcP2ibrMV5kSvxjE6LC9V0
jEaxVFfRYVkIVg==
=q4oe
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Thomas Gleixner:
- Move the smp_processor_id() invocation back into the non-preemtible
region, so that the result is valid to use
- Add the missing package C2 residency counters for Sierra Forest CPUs
to make the newly added support actually useful
* tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix smp_processor_id()-in-preemptible warnings
perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest
- Ensure to skip the clear register space in the MBIGEN driver when
calculating the node register index. Otherwise the clear register is
clobbered and the wrong node registers are accessed.
- Fix a signed/unsigned confusion in the loongarch CPU driver which
converts an error code to a huge "valid" interrupt number.
- Convert the mesion GPIO interrupt controller lock to a raw spinlock so
it works on RT.
- Add a missing static to a internal function in the pic32 EVIC driver.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmavcLQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYN3D/9kS+JGtkDdk1RazGtHKTMrPh+rRxKD
P3UF3ApI2/G1gyOemcJWhxp8Y2iI5PqOJOwe+t82hMMSwWeOCKVcDcH0nL/RKmL1
DxC2pcXAm+WUbwwdcMsj3k5oYE/3AlNVp/KYluTwmnZoXu4o2MFkXOZUn+sTLdi3
NU3q0uslc5InbC/Dqh7YSC0g/QqnQhPFgrfbgFX0mg2ixchvWqcu/tYqTsPj0Jgz
ZHaBUDQLdDL1ngCgAeiD2m9+qkFZRdjiYiV4xcRXc+kvCOWcRKo1CEX6hBmJh9YQ
fTjDzrFcXihKdp8ivWlZ34BaYyQobS14wbuZ5TcyD8XuO7h6/gkeeoaClpXHHZSN
T1O1vN2xxvn+wqge60HupPAAPBPmtX3cXW1bw1znSEDSSXhcYWD8J33kMZQZRSAw
xZXipqSi9vy0s8ZGhk7abNR+uEWAaCM9a4r4UEF15nAIgZjlLN9QwVTsZ12lwyte
aeH1HxV7Vv3775qy5SJkoRiWBMTSXy0G69ho/pjatE2GHmkJnncx5VGGh+SMg/Ri
xWOfwq36rv8YaTmoAWm/j3FsLnMDEqCju+sOO3J+5H24Zb5BUiLmeInkqi3j+jDq
NoFZyT2c7W94YiAUIL7nf5CJ1Sdlm7LIsyoKFtq6AkmTiZGI6henUc6c+3Q8Sihk
802+TD53qdu/Qw==
=WBFB
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A couple of fixes for interrupt chip drivers:
- Make sure to skip the clear register space in the MBIGEN driver
when calculating the node register index. Otherwise the clear
register is clobbered and the wrong node registers are accessed.
- Fix a signed/unsigned confusion in the loongarch CPU driver which
converts an error code to a huge "valid" interrupt number.
- Convert the mesion GPIO interrupt controller lock to a raw spinlock
so it works on RT.
- Add a missing static to a internal function in the pic32 EVIC
driver"
* tag 'irq-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mbigen: Fix mbigen node address layout
irqchip/meson-gpio: Convert meson_gpio_irq_controller::lock to 'raw_spinlock_t'
irqchip/irq-pic32-evic: Add missing 'static' to internal function
irqchip/loongarch-cpu: Fix return value of lpic_gsi_to_irq()
- Ensure that the atomic_cmpxchg() conditions are correct and evaluating
to true on any non-zero value except 1. The missing check of the
return value leads to inconsisted state of the jump label counter.
- Add a missing type conversion in the paravirt spinlock code which
makes loongson build again.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmavcdsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofABD/9AJA5feiAhwbidCafFihfiL1yxzUoy
PLvbZ3YME9N0dVen6LKQnGyA68eXAzHQFXunWSIqMHPxC/L35AJTJ2Qjx9d+vXZu
EigtYox9hkR19ZH1VH/yAOZcc2fvTYPvD/hQ8Wqd/5nEfa8nQq8k5i1/GOP9zZ/Y
LtgbPT98FGG3eUHFWxmINv2Ws3y4iNZLT6tmxUrhoTlcojMEuHvEPmdO0KYorOzS
ri0/OySk5j438LX59rucP53vIFr1yUg2uFqkrV8ru9PqGm0lHVmqG3YkVh9A1VWA
huuXZJba6ixjDtUFyeg9ksW0M5jJxAkl4XQ0suiLt8ySZPTA7LDKq0Py037RECfX
jnmY4gvOgv43mvZfgcThtgzqqxO/Jg8IATve8ljKQKOklhQ/A8B/wJgbzVhIMARQ
xczqB6iM1BuJmfwPUrwX8ibfgATo+HCSlyS+Sob3335Tap/XOjB6cdx1+V8OAWkn
VlTzTJpYXOlOq0JtkIC71pzEqSGmAqscPinwBPj+ZHaUd21lvFxpdXi6r+APjPiY
LsneKztQAfQk/DToYddbqisMcGdMxjcdifr4AtlW03XAdEd//G3NjE13TASaCL33
snRPkzARFUf56bW/isIsQFi/kEax3CZ438O7MN0nkelaURLcDtMYJOM+y3ncybmo
Pr3H/be2I/qmtg==
=mbEV
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two fixes for locking and jump labels:
- Ensure that the atomic_cmpxchg() conditions are correct and
evaluating to true on any non-zero value except 1. The missing
check of the return value leads to inconsisted state of the jump
label counter.
- Add a missing type conversion in the paravirt spinlock code which
makes loongson build again"
* tag 'locking-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jump_label: Fix the fix, brown paper bags galore
locking/pvqspinlock: Correct the type of "old" variable in pv_kick_node()