This reverts commit e61ef21e27.
uffd_poll_thread may be called by other tests that do not initialize the
pthread_barrier, so this approach is not correct. This will revert to
using atomic_bool instead.
Link: https://lkml.kernel.org/r/20241018171734.2315053-3-edliaw@google.com
Fixes: e61ef21e27 ("selftests/mm: replace atomic_bool with pthread_barrier_t")
Signed-off-by: Edward Liaw <edliaw@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: revert pthread_barrier change"
On Android arm, pthread_create followed by a fork caused a deadlock in
the case where the fork required work to be completed by the created
thread.
The previous patches incorrectly assumed that the parent would
always initialize the pthread_barrier for the child thread. This
reverts the change and replaces the fix for wp-fork-with-event with the
original use of atomic_bool.
This patch (of 3):
This reverts commit e142cc87ac.
fork_event_consumer may be called by other tests that do not initialize
the pthread_barrier, so this approach is not correct. The subsequent
patch will revert to using atomic_bool instead.
Link: https://lkml.kernel.org/r/20241018171734.2315053-1-edliaw@google.com
Link: https://lkml.kernel.org/r/20241018171734.2315053-2-edliaw@google.com
Fixes: e142cc87ac ("fix deadlock for fork after pthread_create on ARM")
Signed-off-by: Edward Liaw <edliaw@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a test to assert that VMG_FLAG_JUST_EXPAND functions as expected - that
is, when the VMA iterator is positioned at the previous VMA and no VMAs
proceed it, we observe an expansion with all state as expected.
Explicitly place a prior VMA that would otherwise fail this test if the
mode were not enabled (as it would traverse to the previous-previous VMA).
Link: https://lkml.kernel.org/r/d2f88330254a6448092412bf7dfe077a579ab0dc.1729174352.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "introduce VMA merge mode to improve brk() performance".
A ~5% performance regression was discovered on the
aim9.brk_test.ops_per_sec by the linux kernel test bot [0].
In the past to satisfy brk() performance we duplicated VMA expansion code
and special-cased do_brk_flags(). This is however horrid and undoes work
to abstract this logic, so in resolving the issue I have endeavoured to
avoid this.
Investigating further I was able to observe that the use of a
vma_iter_next_range() and vma_prev() pair, causing an unnecessary maple
tree walk. In addition there is work that we do that is simply
unnecessary for brk().
Therefore, add a special VMA merge mode VMG_FLAG_JUST_EXPAND to avoid
doing any of this - it assumes the VMA iterator is pointing at the
previous VMA and which skips logic that brk() does not require.
This mostly eliminates the performance regression reducing it to ~2% which
is in the realm of noise. In addition, the will-it-scale test brk2,
written to be more representative of real-world brk() usage, shows a
modest performance improvement - which gives me confidence that we are not
meaningfully regressing real workloads here.
This series includes a test asserting that the 'just expand' mode works as
expected.
With many thanks to Oliver Sang for helping with performance testing of
candidate patch sets!
[0]:https://lore.kernel.org/linux-mm/202409301043.629bea78-oliver.sang@intel.com
This patch (of 2):
We know in advance that do_brk_flags() wants only to perform a VMA
expansion (if the prior VMA is compatible), and that we assume no
mergeable VMA follows it.
These are the semantics of this function prior to the recent rewrite of
the VMA merging logic, however we are now doing more work than necessary -
positioning the VMA iterator at the prior VMA and performing tasks that
are not required.
Add a new field to the vmg struct to permit merge flags and add a new
merge flag VMG_FLAG_JUST_EXPAND which implies this behaviour, and have
do_brk_flags() use this.
This fixes a reported performance regression in a brk() benchmarking suite.
Link: https://lkml.kernel.org/r/cover.1729174352.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/4e65d4395e5841c5acf8470dbcb714016364fd39.1729174352.git.lorenzo.stoakes@oracle.com
Fixes: cacded5e42 ("mm: avoid using vma_merge() for new VMAs")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/linux-mm/202409301043.629bea78-oliver.sang@intel.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.
This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.
This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.
Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e269 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The acquired memory blocks for reserved may include blocks outside of
memory management. In this case, the nid variable is set to NUMA_NO_NODE
(-1), so an error occurs in node_set(). This adds a check using
numa_valid_node() to numa_clear_kernel_node_hotplug() that skips
node_set() when nid is set to NUMA_NO_NODE.
Link: https://lkml.kernel.org/r/1729070461-13576-1-git-send-email-nobuhiro1.iwamatsu@toshiba.co.jp
Fixes: 8748270821 ("mm: introduce numa_memblks")
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Suggested-by: Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Syzbot reported a kernel BUG in ocfs2_truncate_inline. There are two
reasons for this: first, the parameter value passed is greater than
ocfs2_max_inline_data_with_xattr, second, the start and end parameters of
ocfs2_truncate_inline are "unsigned int".
So, we need to add a sanity check for byte_start and byte_len right before
ocfs2_truncate_inline() in ocfs2_remove_inode_range(), if they are greater
than ocfs2_max_inline_data_with_xattr return -EINVAL.
Link: https://lkml.kernel.org/r/tencent_D48DB5122ADDAEDDD11918CFB68D93258C07@qq.com
Fixes: 1afc32b952 ("ocfs2: Write support for inline data")
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Reported-by: syzbot+81092778aac03460d6b7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=81092778aac03460d6b7
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
I got the following KCSAN report during syzbot testing:
==================================================================
BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current
write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1:
inode_set_ctime_to_ts include/linux/fs.h:1638 [inline]
inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626
shmem_mknod+0x117/0x180 mm/shmem.c:3443
shmem_create+0x34/0x40 mm/shmem.c:3497
lookup_open fs/namei.c:3578 [inline]
open_last_lookups fs/namei.c:3647 [inline]
path_openat+0xdbc/0x1f00 fs/namei.c:3883
do_filp_open+0xf7/0x200 fs/namei.c:3913
do_sys_openat2+0xab/0x120 fs/open.c:1416
do_sys_open fs/open.c:1431 [inline]
__do_sys_openat fs/open.c:1447 [inline]
__se_sys_openat fs/open.c:1442 [inline]
__x64_sys_openat+0xf3/0x120 fs/open.c:1442
x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0:
inode_get_ctime_nsec include/linux/fs.h:1623 [inline]
inode_get_ctime include/linux/fs.h:1629 [inline]
generic_fillattr+0x1dd/0x2f0 fs/stat.c:62
shmem_getattr+0x17b/0x200 mm/shmem.c:1157
vfs_getattr_nosec fs/stat.c:166 [inline]
vfs_getattr+0x19b/0x1e0 fs/stat.c:207
vfs_statx_path fs/stat.c:251 [inline]
vfs_statx+0x134/0x2f0 fs/stat.c:315
vfs_fstatat+0xec/0x110 fs/stat.c:341
__do_sys_newfstatat fs/stat.c:505 [inline]
__se_sys_newfstatat+0x58/0x260 fs/stat.c:499
__x64_sys_newfstatat+0x55/0x70 fs/stat.c:499
x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
value changed: 0x2755ae53 -> 0x27ee44d3
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
==================================================================
When calling generic_fillattr(), if you don't hold read lock, data-race
will occur in inode member variables, which can cause unexpected
behavior.
Since there is no special protection when shmem_getattr() calls
generic_fillattr(), data-race occurs by functions such as shmem_unlink()
or shmem_mknod(). This can cause unexpected results, so commenting it out
is not enough.
Therefore, when calling generic_fillattr() from shmem_getattr(), it is
appropriate to protect the inode using inode_lock_shared() and
inode_unlock_shared() to prevent data-race.
Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com
Fixes: 44a30220bc ("shmem: recalculate file inode when fstat")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reported-by: syzbot <syzkaller@googlegroup.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
vms_abort_munmap_vmas() is a recovery path where, on entry, some VMAs have
already been torn down halfway (in a way we can't undo) but are still
present in the maple tree.
At this point, we *must* remove the VMAs from the VMA tree, otherwise we
get UAF.
Because removing VMA tree nodes can require memory allocation, the
existing code has an error path which tries to handle this by reattaching
the VMAs; but that can't be done safely.
A nicer way to fix it would probably be to preallocate enough maple tree
nodes for the removal before the point of no return, or something like
that; but for now, fix it the easy and kinda ugly way, by marking this
allocation __GFP_NOFAIL.
Link: https://lkml.kernel.org/r/20241016-fix-munmap-abort-v1-1-601c94b2240d@google.com
Fixes: 4f87153e82 ("mm: change failure of MAP_FIXED to restoring the gap on failure")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During x86_64 kernel build with CONFIG_KMSAN, the objtool warns following:
AR built-in.a
AR vmlinux.a
LD vmlinux.o
vmlinux.o: warning: objtool: handle_bug+0x4: call to
kmsan_unpoison_entry_regs() leaves .noinstr.text section
OBJCOPY modules.builtin.modinfo
GEN modules.builtin
MODPOST Module.symvers
CC .vmlinux.export.o
Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes
the warning.
There is decode_bug(regs->ip, &imm) is left before KMSAN unpoisoining, but
it has the return condition and if we include it after
instrumentation_begin() it results the warning "return with
instrumentation enabled", hence, I'm concerned that regs will not be KMSAN
unpoisoned if `ud_type == BUG_NONE` is true.
Link: https://lkml.kernel.org/r/20241016152407.3149001-1-snovitoll@gmail.com
Fixes: ba54d194f8 ("x86/traps: avoid KMSAN bugs originating from handle_bug()")
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We want to use the functions (get_free_mem_region()) configured via
GET_FREE_REGION in resource kunit tests. However, GET_FREE_REGION
depends on SPARSEMEM now. This makes resource kunit tests cannot be
built on some architectures lacking SPARSEMEM, or causes config warning
as follows,
WARNING: unmet direct dependencies detected for GET_FREE_REGION
Depends on [n]: SPARSEMEM [=n]
Selected by [y]:
- RESOURCE_KUNIT_TEST [=y] && RUNTIME_TESTING_MENU [=y] && KUNIT [=y]
When get_free_mem_region() was introduced the only consumers were those
looking to pass the address range to memremap_pages(). That address
range needed to be mindful of the maximum addressable platform physical
address which at the time only SPARSMEM defined via MAX_PHYSMEM_BITS.
Given that memremap_pages() also depended on SPARSEMEM via ZONE_DEVICE,
it was easier to just depend on that definition than invent a general
MAX_PHYSMEM_BITS concept outside of SPARSEMEM.
Turns out that decision was buggy and did not account for KASAN
consumption of physical address space. That problem was resolved
recently with commit ea72ce5da2 ("x86/kaslr: Expose and use the end
of the physical memory address space"), and GET_FREE_REGION dropped its
MAX_PHYSMEM_BITS dependency.
Then commit 99185c10d5 ("resource, kunit: add test case for
region_intersects()"), went ahead and fixed up the only remaining
dependency on SPARSEMEM which was usage of the PA_SECTION_SHIFT macro
for setting the default alignment. A PAGE_SIZE fallback is fine in the
SPARSEMEM=n case.
With those build dependencies gone GET_FREE_REGION no longer depends on
SPARSEMEM. So, the patch removes dependency on SPARSEMEM from
GET_FREE_REGION to fix the build issues.
Link: https://lkml.kernel.org/r/20241016014730.339369-1-ying.huang@intel.com
Link: https://lore.kernel.org/lkml/20240922225041.603186-1-linux@roeck-us.net/
Link: https://lkml.kernel.org/r/20241015051554.294734-1-ying.huang@intel.com
Fixes: 99185c10d5 ("resource, kunit: add test case for region_intersects()")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org> # build
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Avoiding the zeroing of the vma tree in mmap_region() introduced a race
with truncate in the page table walk. To avoid any races, create a hole
in the rmap during the operation by clearing the pagetable entries earlier
under the mmap write lock and (critically) before the new vma is installed
into the vma tree. The result is that the old vma(s) are left in the vma
tree, but free_pgtables() removes them from the rmap and clears the ptes
while holding the necessary locks.
This change extends the fix required for hugetblfs and the call_mmap()
function by moving the cleanup higher in the function and running it
unconditionally.
Link: https://lkml.kernel.org/r/20241016013455.2241533-1-Liam.Howlett@oracle.com
Fixes: f8d112a4e6 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Under memory pressure it's possible for GFP_ATOMIC order-0 allocations to
fail even though free pages are available in the highatomic reserves.
GFP_ATOMIC allocations cannot trigger unreserve_highatomic_pageblock()
since it's only run from reclaim.
Given that such allocations will pass the watermarks in
__zone_watermark_unusable_free(), it makes sense to fallback to highatomic
reserves the same way that ALLOC_OOM can.
This fixes order-0 page allocation failures observed on Cloudflare's fleet
when handling network packets:
kswapd1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC),
nodemask=(null),cpuset=/,mems_allowed=0-7
CPU: 10 PID: 696 Comm: kswapd1 Kdump: loaded Tainted: G O 6.6.43-CUSTOM #1
Hardware name: MACHINE
Call Trace:
<IRQ>
dump_stack_lvl+0x3c/0x50
warn_alloc+0x13a/0x1c0
__alloc_pages_slowpath.constprop.0+0xc9d/0xd10
__alloc_pages+0x327/0x340
__napi_alloc_skb+0x16d/0x1f0
bnxt_rx_page_skb+0x96/0x1b0 [bnxt_en]
bnxt_rx_pkt+0x201/0x15e0 [bnxt_en]
__bnxt_poll_work+0x156/0x2b0 [bnxt_en]
bnxt_poll+0xd9/0x1c0 [bnxt_en]
__napi_poll+0x2b/0x1b0
bpf_trampoline_6442524138+0x7d/0x1000
__napi_poll+0x5/0x1b0
net_rx_action+0x342/0x740
handle_softirqs+0xcf/0x2b0
irq_exit_rcu+0x6c/0x90
sysvec_apic_timer_interrupt+0x72/0x90
</IRQ>
[mfleming@cloudflare.com: update comment]
Link: https://lkml.kernel.org/r/20241015125158.3597702-1-matt@readmodwrite.com
Link: https://lkml.kernel.org/r/20241011120737.3300370-1-matt@readmodwrite.com
Link: https://lore.kernel.org/all/CAGis_TWzSu=P7QJmjD58WWiu3zjMTVKSzdOwWE8ORaGytzWJwQ@mail.gmail.com/
Fixes: 1d91df85f3 ("mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs")
Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There is no reason to invoke these hooks early against an mm that is in an
incomplete state.
The change in commit d240629148 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.
Their placement early in dup_mmap() only appears to have been meaningful
for early error checking, and since functionally it'd require a very small
allocation to fail (in practice 'too small to fail') that'd only occur in
the most dire circumstances, meaning the fork would fail or be OOM'd in
any case.
Since both khugepaged and KSM tracking are there to provide optimisations
to memory performance rather than critical functionality, it doesn't
really matter all that much if, under such dire memory pressure, we fail
to register an mm with these.
As a result, we follow the example of commit d2081b2bf8 ("mm:
khugepaged: make khugepaged_enter() void function") and make ksm_fork() a
void function also.
We only expose the mm to these functions once we are done with them and
only if no error occurred in the fork operation.
Link: https://lkml.kernel.org/r/e0cb8b840c9d1d5a6e84d4f8eff5f3f2022aa10c.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d240629148 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "fork: do not expose incomplete mm on fork".
During fork we may place the virtual memory address space into an
inconsistent state before the fork operation is complete.
In addition, we may encounter an error during the fork operation that
indicates that the virtual memory address space is invalidated.
As a result, we should not be exposing it in any way to external machinery
that might interact with the mm or VMAs, machinery that is not designed to
deal with incomplete state.
We specifically update the fork logic to defer khugepaged and ksm to the
end of the operation and only to be invoked if no error arose, and
disallow uffd from observing fork events should an error have occurred.
This patch (of 2):
Currently on fork we expose the virtual address space of a process to
userland unconditionally if uffd is registered in VMAs, regardless of
whether an error arose in the fork.
This is performed in dup_userfaultfd_complete() which is invoked
unconditionally, and performs two duties - invoking registered handlers
for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down
userfaultfd_fork_ctx objects established in dup_userfaultfd().
This is problematic, because the virtual address space may not yet be
correctly initialised if an error arose.
The change in commit d240629148 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.
We address this by, on fork error, ensuring that we roll back state that
we would otherwise expect to clean up through the event being handled by
userland and perform the memory freeing duty otherwise performed by
dup_userfaultfd_complete().
We do this by implementing a new function, dup_userfaultfd_fail(), which
performs the same loop, only decrementing reference counts.
Note that we perform mmgrab() on the parent and child mm's, however
userfaultfd_ctx_put() will mmdrop() this once the reference count drops to
zero, so we will avoid memory leaks correctly here.
Link: https://lkml.kernel.org/r/cover.1729014377.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d240629148 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
pmd_leaf()/pud_leaf() only implies a pmd_present()/pud_present() check on
some architectures. We really should check for
pmd_present()/pud_present() first.
This should explain the report we got on ppc64 (which has
CONFIG_PGTABLE_HAS_HUGE_LEAVES set in the config) that triggered:
VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));
Likely we had a PMD migration entry for which pmd_leaf() did not trigger.
We raced with restoring the PMD migration entry, and suddenly saw a
pmd_leaf(). In this case, pte_offset_map_lock() saved us from more
trouble, because it rechecks the PMD value, but we would not have
processed the migration entry -- which is not too bad because the only
user of FW_MIGRATION is KSM for unsharing, and KSM only applies to small
folios.
Further, we shouldn't re-read the PMD/PUD value for our warning, the
primary purpose of the VM_WARN_ON_ONCE() is to find spurious use of
pmd_leaf()/pud_leaf() without CONFIG_PGTABLE_HAS_HUGE_LEAVES.
As a side note, we are currently not implementing FW_MIGRATION support for
PUD migration entries, which likely should exist due to hugetlb. Add a
TODO so this won't fall through the cracks if more FW_MIGRATION users get
added.
Was able to write a quick reproducer and verify that the issue no longer triggers with this fix.
https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/reproducers/move-pages-pmd-leaf.c
Without this fix after a couple of seconds in a VM with 2 NUMA nodes:
[ 54.333753] ------------[ cut here ]------------
[ 54.334901] WARNING: CPU: 20 PID: 1704 at mm/pagewalk.c:815 folio_walk_start+0x48f/0x6e0
[ 54.336455] Modules linked in: ...
[ 54.345009] CPU: 20 UID: 0 PID: 1704 Comm: move-pages-pmd- Not tainted 6.12.0-rc2+ #81
[ 54.346529] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 54.348191] RIP: 0010:folio_walk_start+0x48f/0x6e0
[ 54.349134] Code: b5 ad 48 8d 35 00 00 00 00 e8 6d 59 d7 ff e8 08 74 da ff e9 9c fe ff ff 4c 8b 7c 24 08 4c 89 ff e8 26 2b be 00 e9 8a fe ff ff <0f> 0b e9 ec fe ff ff f7 c2 ff 0f 00 00 0f 85 81 fe ff ff 48 8b 02
[ 54.352660] RSP: 0018:ffffb7e4c430bc78 EFLAGS: 00010282
[ 54.353679] RAX: 80000002a3e008e7 RBX: ffff9946039aa580 RCX: ffff994380000000
[ 54.355056] RDX: ffff994606aec000 RSI: 00007f004b000000 RDI: 0000000000000000
[ 54.356440] RBP: 00007f004b000000 R08: 0000000000000591 R09: 0000000000000001
[ 54.357820] R10: 0000000000000200 R11: 0000000000000001 R12: ffffb7e4c430bd10
[ 54.359198] R13: ffff994606aec2c0 R14: 0000000000000002 R15: ffff994604a89b00
[ 54.360564] FS: 00007f004ae006c0(0000) GS:ffff9947f7400000(0000) knlGS:0000000000000000
[ 54.362111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 54.363242] CR2: 00007f004adffe58 CR3: 0000000281e12005 CR4: 0000000000770ef0
[ 54.364615] PKRU: 55555554
[ 54.365153] Call Trace:
[ 54.365646] <TASK>
[ 54.366073] ? __warn.cold+0xb7/0x14d
[ 54.366796] ? folio_walk_start+0x48f/0x6e0
[ 54.367628] ? report_bug+0xff/0x140
[ 54.368324] ? handle_bug+0x58/0x90
[ 54.369019] ? exc_invalid_op+0x17/0x70
[ 54.369771] ? asm_exc_invalid_op+0x1a/0x20
[ 54.370606] ? folio_walk_start+0x48f/0x6e0
[ 54.371415] ? folio_walk_start+0x9e/0x6e0
[ 54.372227] do_pages_move+0x1c5/0x680
[ 54.372972] kernel_move_pages+0x1a1/0x2b0
[ 54.373804] __x64_sys_move_pages+0x25/0x30
Link: https://lkml.kernel.org/r/20241015111236.1290921-1-david@redhat.com
Fixes: aa39ca6940 ("mm/pagewalk: introduce folio_walk_start() + folio_walk_end()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+7d917f67c05066cec295@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/670d3248.050a0220.3e960.0064.GAE@google.com
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
PD3.1 spec ("8.3.3.3.3 PE_SNK_Wait_for_Capabilities State") mandates
that the policy engine perform a hard reset when SinkWaitCapTimer
expires. Instead the code explicitly does a GET_SOURCE_CAP when the
timer expires as part of SNK_WAIT_CAPABILITIES_TIMEOUT. Due to this the
following compliance test failures are reported by the compliance tester
(added excerpts from the PD Test Spec):
* COMMON.PROC.PD.2#1:
The Tester receives a Get_Source_Cap Message from the UUT. This
message is valid except the following conditions: [COMMON.PROC.PD.2#1]
a. The check fails if the UUT sends this message before the Tester
has established an Explicit Contract
...
* TEST.PD.PROT.SNK.4:
...
4. The check fails if the UUT does not send a Hard Reset between
tTypeCSinkWaitCap min and max. [TEST.PD.PROT.SNK.4#1] The delay is
between the VBUS present vSafe5V min and the time of the first bit
of Preamble of the Hard Reset sent by the UUT.
For the purpose of interoperability, restrict the quirk introduced in
https://lore.kernel.org/all/20240523171806.223727-1-sebastian.reichel@collabora.com/
to only non self-powered devices as battery powered devices will not
have the issue mentioned in that commit.
Cc: stable@vger.kernel.org
Fixes: 122968f8dd ("usb: typec: tcpm: avoid resets for missing source capability messages")
Reported-by: Badhri Jagan Sridharan <badhri@google.com>
Closes: https://lore.kernel.org/all/CAPTae5LAwsVugb0dxuKLHFqncjeZeJ785nkY4Jfd+M-tCjHSnQ@mail.gmail.com/
Signed-off-by: Amit Sunil Dhamne <amitsd@google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://lore.kernel.org/r/20241024022233.3276995-1-amitsd@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For devm_usb_put_phy(), its comment says it needs to invoke usb_put_phy()
to release the phy, but it does not do that actually, so it can not fully
undo what the API devm_usb_get_phy() does, that is wrong, fixed by using
devres_release() instead of devres_destroy() within the API.
Fixes: cedf860237 ("usb: phy: move bulk of otg/otg.c to phy/phy.c")
Cc: stable@vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20241020-usb_phy_fix-v1-1-7f79243b8e1e@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When running watchdog-test with 'make run_tests', the watchdog-test will
be terminated by a timeout signal(SIGTERM) due to the test timemout.
And then, a system reboot would happen due to watchdog not stop. see
the dmesg as below:
```
[ 1367.185172] watchdog: watchdog0: watchdog did not stop!
```
Fix it by registering more signals(including SIGTERM) in watchdog-test,
where its signal handler will stop the watchdog.
After that
# timeout 1 ./watchdog-test
Watchdog Ticking Away!
.
Stopping watchdog ticks...
Link: https://lore.kernel.org/all/20241029031324.482800-1-lizhijian@fujitsu.com/
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Use the __free() macro for 'altmodes_node' to automatically release the
node when it goes out of scope, removing the need for explicit calls to
fwnode_handle_put().
Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20241021-typec-class-fwnode_handle_put-v2-2-3281225d3d27@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 'altmodes_node' fwnode_handle is never released after it is no
longer required, which leaks the resource.
Add the required call to fwnode_handle_put() when 'altmodes_node' is no
longer required.
Cc: stable@vger.kernel.org
Fixes: 7b458a4c5d ("usb: typec: Add typec_port_register_altmodes()")
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://lore.kernel.org/r/20241021-typec-class-fwnode_handle_put-v2-1-3281225d3d27@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If drm_dp_hpd_bridge_register() fails, the probe function returns
without removing the fwnode via fwnode_handle_put(), leaking the
resource.
Jump to fwnode_remove if drm_dp_hpd_bridge_register() fails to remove
the fwnode acquired with device_get_named_child_node().
Cc: stable@vger.kernel.org
Fixes: 7d9f1b72b2 ("usb: typec: qcom-pmic-typec: switch to DRM_AUX_HPD_BRIDGE")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20241020-qcom_pmic_typec-fwnode_remove-v2-2-7054f3d2e215@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The right function to release a fwnode acquired via
device_get_named_child_node() is fwnode_handle_put(), and not
fwnode_remove_software_node(), as no software node is being handled.
Replace the calls to fwnode_remove_software_node() with
fwnode_handle_put() in qcom_pmic_typec_probe() and
qcom_pmic_typec_remove().
Cc: stable@vger.kernel.org
Fixes: a4422ff221 ("usb: typec: qcom: Add Qualcomm PMIC Type-C driver")
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Acked-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20241020-qcom_pmic_typec-fwnode_remove-v2-1-7054f3d2e215@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a boot hang issue triggered when a USB3 device is incorrectly assumed
to be tunneled over USB4, thus attempting to create a device link between
the USB3 "consumer" device and the USB4 "supplier" Host Interface before
the USB4 side is properly bound to a driver.
This could happen if xhci isn't capable of detecting tunneled devices,
but ACPI tables contain all info needed to assume device is tunneled.
i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.
It turns out that even for actual tunneled USB3 devices it can't be
assumed that the thunderbolt driver providing the tunnel is loaded
before the tunneled USB3 device is created.
The tunnel can be created by BIOS and remain in use by thunderbolt/USB4
host driver once it loads.
Solve this by making the device link "stateless", which doesn't create
a driver presence order dependency between the supplier and consumer
drivers.
It still guarantees correct suspend/resume and shutdown ordering.
cc: Mario Limonciello <mario.limonciello@amd.com>
Fixes: f1bfb4a6fe ("usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface")
Tested-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241024131355.3836538-1-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit d483f034f0 ("usb: dwc2: Skip clock gating on Broadcom SoCs")
introduced a regression on Raspberry Pi 3 B Plus, which prevents
enumeration of the onboard Microchip LAN7800 in case no external USB device
is connected during boot.
Fixes: d483f034f0 ("usb: dwc2: Skip clock gating on Broadcom SoCs")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://lore.kernel.org/r/20241025103621.4780-2-wahrenst@gmx.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During the aborting of a command, the software receives a command
completion event for the command ring stopped, with the TRB pointing
to the next TRB after the aborted command.
If the command we abort is located just before the Link TRB in the
command ring, then during the 'command ring stopped' completion event,
the xHC gives the Link TRB in the event's cmd DMA, which causes a
mismatch in handling command completion event.
To address this situation, move the 'command ring stopped' completion
event check slightly earlier, since the specific command it stopped
on isn't of significant concern.
Fixes: 7f84eef0da ("USB: xhci: No-op command queueing and irq handler.")
Cc: stable@vger.kernel.org
Signed-off-by: Faisal Hassan <quic_faisalh@quicinc.com>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241022155631.1185-1-quic_faisalh@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use pm_runtime_put in the remove function and pm_runtime_get to disable
RPM on platforms that don't support runtime D3, as re-enabling it through
sysfs auto power control may cause the controller to malfunction. This
can lead to issues such as hotplug devices not being detected due to
failed interrupt generation.
Fixes: a5d6264b63 ("xhci: Enable RPM on controllers that support low-power states")
Cc: stable <stable@kernel.org>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241024133718.723846-1-Basavaraj.Natikar@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Running "make kselftest TARGETS=intel_pstate" results in the
following errors:
- ./run.sh: line 89: cpupower: command not found
- ./run.sh: line 91: cpupower: command not found
if the cpupower is not installed.
Since the test depends on cpupower, this patch stops the test if the
cpupower is not installed.
Link: https://lore.kernel.org/all/cc01753c8dab0f33669a5a0fc162544078055bd1.1730141362.git.alessandro.zanni87@gmail.com/
Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Running "make kselftest TARGETS=intel_pstate" results in
the following errors:
- ./run.sh: line 90: / 1000: syntax error: operand expected
(error token is "/ 1000")
- ./run.sh: line 92: / 1000: syntax error: operand expected
(error token is "/ 1000")
This fix allows to have cross-platform compatibility when
using arithmetic expression with command substitutions.
Link: https://lore.kernel.org/r/f37df23888cd5ea6b3976f19d3e25796129dd090.1730141362.git.alessandro.zanni87@gmail.com
Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Test case idmap_mount_tree_invalid failed to run on the newer kernel
with the following output:
# RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
# mount_setattr_test.c:1428:idmap_mount_tree_invalid:Expected sys_mount_setattr(open_tree_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr)) (0) ! = 0 (0)
# idmap_mount_tree_invalid: Test terminated by assertion
This is because tmpfs is mounted at "/mnt/A", and tmpfs already
contains the flag FS_ALLOW_IDMAP after the commit 7a80e5b8c6 ("shmem:
support idmapped mounts for tmpfs"). So calling sys_mount_setattr here
returns 0 instead of -EINVAL as expected.
Ramfs does not support idmap mounts, so we can use it here to test invalid mounts,
which allows the test case to pass with the following output:
# Starting 1 tests from 1 test cases.
# RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
# OK mount_setattr_idmapped.idmap_mount_tree_invalid
ok 1 mount_setattr_idmapped.idmap_mount_tree_invalid
# PASSED: 1 / 1 tests passed.
Link: https://lore.kernel.org/all/20241028084132.3212598-1-zhouyuhang1010@163.com/
Signed-off-by: zhouyuhang <zhouyuhang@kylinos.cn>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This includes following USB4/Thunderbolt fixes for v6.12-rc5:
- Fix KASAN reported stack out-of-bounds read
- Honor Time Management Unit (TMU) requirements in the domain when
configuring TMU mode of a newly plugged router.
Both have been in linux-next with no reported issues.
-----BEGIN PGP SIGNATURE-----
iQJUBAABCgA+FiEEVTdhRGBbNzLrSUBaAP2fSd+ZWKAFAmcbK7IgHG1pa2Eud2Vz
dGVyYmVyZ0BsaW51eC5pbnRlbC5jb20ACgkQAP2fSd+ZWKCz+BAAgid8LtaemRXt
pIcCvvxrn4q9yoUnaAHo1qiTiXfWCnCdp+8IEEjkqba2E0/SUe4mrLQj3LAWN13Q
88b6o4oRj1vAMujuHmRzNG4yty1nZeEHw9xXoBZEstjiUmt1YsrTQGZ/etdZoTyF
YtV3yX91jk16DG682i22PUNwy5OXTgmvvI/IlYFUa86ob6Bt2+EJoj53+Foa0p7U
2DEvmyC48HVrJ/8dQXJs/xSdW5nE8j0z40g7I5Hz0QiT92QFCxPXtEbZq63SEqlD
yX0IgApKqPoCGI25TBa9pkgEANHKHH6akMCL5lYb0ejkclXIDj5jLJasLIScAlJ+
Rk2JMO5Yi3tlrE3nkVIxmfEIN1CPIZFO9JxrMap0TroOAeAc7aUuyJZ5/pk2g9kp
e8n/ySh0MuaJAI8ydMn3/PzXP7NW5slXyYUMBL0ZMFucNQGrBSj3bZyp1Xv8zOtj
kO6ZkH1Tr+1FM3ECCIgvMe22sLm8TZki/atnHW96BOCV+IOfgMCTjTOEDZhBOHiR
o3mPIhpSJQBrP/SbWAXbpDV5qimCN27mCLASm19BDf99TAC+OizKIoYrcdxLRTBh
tyJCGcYEMXohimACP4P/bMPNadjyzOCWUDYjlsIDWkXzdrhPRdP6KsXef53Z/1m2
o9TSZOAvcC2fQ84NnystB2Q7eoCmm+Y=
=QBZT
-----END PGP SIGNATURE-----
Merge tag 'thunderbolt-for-v6.12-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-linus
Mika writes:
thunderbolt: Fixes for v6.12-rc5
This includes following USB4/Thunderbolt fixes for v6.12-rc5:
- Fix KASAN reported stack out-of-bounds read
- Honor Time Management Unit (TMU) requirements in the domain when
configuring TMU mode of a newly plugged router.
Both have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.12-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Honor TMU requirements in the domain when setting TMU mode
thunderbolt: Fix KASAN reported stack out-of-bounds read in tb_retimer_scan()
Usual mixed back of fixes for ancient bugs and some more recently
introduced problems.
gts-helper module
- Memory leak fixes for this library code to handle complex gain cases.
adi,ad7124
- Fix a divide by zero that can be triggered from userspace.
adi,ad7380
- Various supply fixes. Includes some minor rework that simplifies the
fix though increases the apparent scale of the change.
adi,ad9832
- Avoid a potential divide by zero if clk_get_rate() returns 0.
adi,ltc2642
- Fix wrong Kconfig regmap dependency.
vishay,veml6030
- Fix a scaling problem with decimal part of processed channel.
Note that only the illuminance channel is fixed as a larger series
of cleanups not suitable for this point in the rc cycle removes
the intensity channel anyway.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmccxrARHGppYzIzQGtl
cm5lbC5vcmcACgkQVIU0mcT0Foj+mw/+L/0EfK5svAzsEApSE+xwKPo2yP8fp8D1
6VycANSRzEjMW1RdV3n6Z3I3hcnU+p3VJ6lz8kbYQGwmDEQWw+q6a6/ONyrwg8Zx
JM0qj3HABZf3tiPYdTgDHTVKUCE93noLxPfTgpRgtDV8FyirEw3/LL/jMxYBYggx
19xV3y6gQQb8Hs7YwcVAMp/nilMX3OpvCN8nPbM/ZEcuB7IDZomQ3qgIuAR4FPRc
I2wiqoQGgQAnaa91UQypsONkJRKoi7UzUWkGZ9SrlVrnehiCJlLbkOHg6Quzab8N
GxEL3r//XPFb5i/2i4/iSvKd62Rg5z435H2dAYq7A52SmC0+uErJ6eSayxuERy6V
x8V/GjkpTSq7TVSx4Qh0Q0MKoFZ60Kw2RjMbvjSyDN+45vEnXMUnXnAe/QU4Blcf
1t+fkKn0RKnJ8GLIkG5Ih0MGz2JfZO+7qfg+IhlK8DqwpbJZpxdaaIVRgsO+UhH+
rtsCDB+nKHWykZafFsF7lig/F2CFzANiNyEk4q4jbHWivm52+vEZYok0cFWYf+3v
N19sEKTrauuyx3E/Q5ualSG/KMyOoHk/rNOeS1l7ORtUWJoeMHP9FnVssFfSIZmI
DCNmk3p3ByasjVl2rnXITTandW8mRujmB4U3rlcgaFYZMxy6AKVQvoK6HLTuic7X
dx422m7ijoI=
=RFc+
-----END PGP SIGNATURE-----
Merge tag 'iio-fixes-for-6.12b' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
IIO: Fixes for 6.12, set 2
Usual mixed back of fixes for ancient bugs and some more recently
introduced problems.
gts-helper module
- Memory leak fixes for this library code to handle complex gain cases.
adi,ad7124
- Fix a divide by zero that can be triggered from userspace.
adi,ad7380
- Various supply fixes. Includes some minor rework that simplifies the
fix though increases the apparent scale of the change.
adi,ad9832
- Avoid a potential divide by zero if clk_get_rate() returns 0.
adi,ltc2642
- Fix wrong Kconfig regmap dependency.
vishay,veml6030
- Fix a scaling problem with decimal part of processed channel.
Note that only the illuminance channel is fixed as a larger series
of cleanups not suitable for this point in the rc cycle removes
the intensity channel anyway.
* tag 'iio-fixes-for-6.12b' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: dac: Kconfig: Fix build error for ltc2664
iio: adc: ad7124: fix division by zero in ad7124_set_channel_odr()
staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg()
docs: iio: ad7380: fix supply for ad7380-4
iio: adc: ad7380: fix supplies for ad7380-4
iio: adc: ad7380: add missing supplies
iio: adc: ad7380: use devm_regulator_get_enable_read_voltage()
dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply
iio: light: veml6030: fix microlux value calculation
iio: gts-helper: Fix memory leaks for the error path of iio_gts_build_avail_scale_table()
iio: gts-helper: Fix memory leaks in iio_gts_build_avail_scale_table()
Read buffer is allocated according to max message size, reported by
the firmware and may reach 64K in systems with pxp client.
Contiguous 64k allocation may fail under memory pressure.
Read buffer is used as in-driver message storage and not required
to be contiguous.
Use kvmalloc to allow kernel to allocate non-contiguous memory.
Fixes: 3030dc0564 ("mei: add wrapper for queuing control commands.")
Cc: stable <stable@kernel.org>
Reported-by: Rohit Agarwal <rohiagar@chromium.org>
Closes: https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@chromium.org/
Tested-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Acked-by: Tomas Winkler <tomasw@gmail.com>
Link: https://lore.kernel.org/r/20241015123157.2337026-1-alexander.usyskin@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following race condition could trigger a NULL pointer dereference:
sock_map_link_detach(): sock_map_link_update_prog():
mutex_lock(&sockmap_mutex);
...
sockmap_link->map = NULL;
mutex_unlock(&sockmap_mutex);
mutex_lock(&sockmap_mutex);
...
sock_map_prog_link_lookup(sockmap_link->map);
mutex_unlock(&sockmap_mutex);
<continue>
Fix it by adding a NULL pointer check. In this specific case, it makes
no sense to update a link which is being released.
Reported-by: Ruan Bonan <bonan.ruan@u.nus.edu>
Fixes: 699c23f02c ("bpf: Add bpf_link support for sk_msg and sk_skb progs")
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20241026185522.338562-1-xiyou.wangcong@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This reverts commit 15fffc6a56.
This commit causes a regression, so revert it for now until it can come
back in a way that works for everyone.
Link: https://lore.kernel.org/all/172790598832.1168608.4519484276671503678.stgit@dwillia2-xfh.jf.intel.com/
Fixes: 15fffc6a56 ("driver core: Fix uevent_show() vs driver detach race")
Cc: stable <stable@kernel.org>
Cc: Ashish Sangwan <a.sangwan@samsung.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Dirk Behme <dirk.behme@de.bosch.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The widgets array in the snd_soc_dapm_widget_list has a __counted_by
attribute attached to it, which points to the num_widgets variable. This
attribute is used in bounds checking, and if it is not set before the
array is filled, then the bounds sanitizer will issue a warning or a
kernel panic if CONFIG_UBSAN_TRAP is set.
This patch sets the size of the widgets list calculated with
list_for_each as the initial value for num_widgets as it is used for
allocating memory for the array. It is updated with the actual number of
added elements after the array is filled.
Signed-off-by: Aleksei Vetrov <vvvvvv@google.com>
Fixes: 80e698e2df ("ASoC: soc-dapm: Annotate struct snd_soc_dapm_widget_list with __counted_by")
Link: https://patch.msgid.link/20241028-soc-dapm-bounds-checker-fix-v1-1-262b0394e89e@google.com
Signed-off-by: Mark Brown <broonie@kernel.org>
This patch adds support for another Lenovo Mini dock 0x17EF:0x3098 to the
r8152 driver. The device has been tested on NixOS, hotplugging and sleep
included.
Signed-off-by: Benjamin Große <ste3ls@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241020174128.160898-1-ste3ls@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
KASAN reports the following UAF. The metadata_dst, which is used to
store the SCI value for macsec offload, is already freed by
metadata_dst_free() in macsec_free_netdev(), while driver still use it
for sending the packet.
To fix this issue, dst_release() is used instead to release
metadata_dst. So it is not freed instantly in macsec_free_netdev() if
still referenced by skb.
BUG: KASAN: slab-use-after-free in mlx5e_xmit+0x1e8f/0x4190 [mlx5_core]
Read of size 2 at addr ffff88813e42e038 by task kworker/7:2/714
[...]
Workqueue: mld mld_ifc_work
Call Trace:
<TASK>
dump_stack_lvl+0x51/0x60
print_report+0xc1/0x600
kasan_report+0xab/0xe0
mlx5e_xmit+0x1e8f/0x4190 [mlx5_core]
dev_hard_start_xmit+0x120/0x530
sch_direct_xmit+0x149/0x11e0
__qdisc_run+0x3ad/0x1730
__dev_queue_xmit+0x1196/0x2ed0
vlan_dev_hard_start_xmit+0x32e/0x510 [8021q]
dev_hard_start_xmit+0x120/0x530
__dev_queue_xmit+0x14a7/0x2ed0
macsec_start_xmit+0x13e9/0x2340
dev_hard_start_xmit+0x120/0x530
__dev_queue_xmit+0x14a7/0x2ed0
ip6_finish_output2+0x923/0x1a70
ip6_finish_output+0x2d7/0x970
ip6_output+0x1ce/0x3a0
NF_HOOK.constprop.0+0x15f/0x190
mld_sendpack+0x59a/0xbd0
mld_ifc_work+0x48a/0xa80
process_one_work+0x5aa/0xe50
worker_thread+0x79c/0x1290
kthread+0x28f/0x350
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x11/0x20
</TASK>
Allocated by task 3922:
kasan_save_stack+0x20/0x40
kasan_save_track+0x10/0x30
__kasan_kmalloc+0x77/0x90
__kmalloc_noprof+0x188/0x400
metadata_dst_alloc+0x1f/0x4e0
macsec_newlink+0x914/0x1410
__rtnl_newlink+0xe08/0x15b0
rtnl_newlink+0x5f/0x90
rtnetlink_rcv_msg+0x667/0xa80
netlink_rcv_skb+0x12c/0x360
netlink_unicast+0x551/0x770
netlink_sendmsg+0x72d/0xbd0
__sock_sendmsg+0xc5/0x190
____sys_sendmsg+0x52e/0x6a0
___sys_sendmsg+0xeb/0x170
__sys_sendmsg+0xb5/0x140
do_syscall_64+0x4c/0x100
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Freed by task 4011:
kasan_save_stack+0x20/0x40
kasan_save_track+0x10/0x30
kasan_save_free_info+0x37/0x50
poison_slab_object+0x10c/0x190
__kasan_slab_free+0x11/0x30
kfree+0xe0/0x290
macsec_free_netdev+0x3f/0x140
netdev_run_todo+0x450/0xc70
rtnetlink_rcv_msg+0x66f/0xa80
netlink_rcv_skb+0x12c/0x360
netlink_unicast+0x551/0x770
netlink_sendmsg+0x72d/0xbd0
__sock_sendmsg+0xc5/0x190
____sys_sendmsg+0x52e/0x6a0
___sys_sendmsg+0xeb/0x170
__sys_sendmsg+0xb5/0x140
do_syscall_64+0x4c/0x100
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Fixes: 0a28bfd497 ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20241021100309.234125-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: sched: fix some lock issues
Two small fixes related to the MPTCP packets scheduler:
- Patch 1: add missing rcu_read_(un)lock(). A fix for >= 6.6.
And some modifications in the MPTCP selftests:
- Patch 2: a small addition to the MPTCP selftests to cover more code.
====================
Link: https://patch.msgid.link/20241021-net-mptcp-sched-lock-v1-0-637759cf061c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Listing all the values linked to the MPTCP sysctl knobs was not
exercised in MPTCP test suite.
Let's do that to avoid any regressions, but also to have a kernel with a
debug kconfig verifying more assumptions. For the moment, we are not
interested by the output, only to avoid crashes and warnings.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021-net-mptcp-sched-lock-v1-3-637759cf061c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original link returns 404 now. This commit replaces the dead google
site link with archive.org link.
Signed-off-by: Levi Zim <rsworktech@outlook.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241021-packet_mmap_fix_link-v1-1-dffae4a174c0@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the allocation of chip->auth to tpm2_start_auth_session() so that this
field can be used as flag to tell whether auth session is active or not.
Instead of flushing and reloading the auth session for every transaction
separately, keep the session open unless /dev/tpm0 is used.
Reported-by: Pengyu Ma <mapengyu@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219229
Cc: stable@vger.kernel.org # v6.10+
Fixes: 7ca110f267 ("tpm: Address !chip->auth in tpm_buf_append_hmac_session*()")
Tested-by: Pengyu Ma <mapengyu@gmail.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
The following 3 commits landed in parallel:
commit d7d2688bf4 ("drm/amd/pm: update workload mask after the setting")
commit 7a1613e47e ("drm/amdgpu/smu13: always apply the powersave optimization")
commit 7c210ca5a2 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D")
While everything is set correctly, this caused the profile to be
reported incorrectly because both the powersave and fullscreen3d bits
were set in the mask and when the driver prints the profile, it looks
for the first bit set.
Fixes: d7d2688bf4 ("drm/amd/pm: update workload mask after the setting")
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ecfe9b2376)
Cc: stable@vger.kernel.org
A small collection of driver specific fixes for SPI, there's nothing
particularly remarkable about any of them.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmcf/nYACgkQJNaLcl1U
h9A9AAf/a2PFeRBT9OcgUHkI0/Ca2t/7Onb5C5ouvIwnlYZkgPCQUO5p545HemEn
kHiV579V5iIGQJ8mLzIPin32DjnKltYqkozR+AztGQcO0bULNMFP3++d6b2ZHTUT
l2zhJdwMHpn5psDjEdnHrIQk4goBMoF+CIIaVnPOfkaO7oOpM8NOFoXhodCsYrCg
GLxl16y8MkR4CgZsawCCb24wBZnqPH00UuiqXtrWwbyzSd0mHvJ2vecwWRu1iYk0
74srtk2tsdewPRoSlbUm72aJm0dPyNnk0StwXBnVF8O0g0jbDwWz1kePR9PfQuls
RGari/wB9PIjLEyM08pVdtaByhrjlw==
=u9Y1
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A small collection of driver specific fixes for SPI, there's nothing
particularly remarkable about any of them"
* tag 'spi-fix-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-fsl-dspi: Fix crash when not using GPIO chip select
spi: geni-qcom: Fix boot warning related to pm_runtime and devres
spi: mtk-snfi: fix kerneldoc for mtk_snand_is_page_ops()
spi: stm32: fix missing device mode capability in stm32mp25
KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:
[ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[ 33.861816] Tainted: [W]=WARN
[ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[ 33.861822] Call Trace:
[ 33.861826] <TASK>
[ 33.861829] dump_stack_lvl+0x66/0x90
[ 33.861838] print_report+0xce/0x620
[ 33.861853] kasan_report+0xda/0x110
[ 33.862794] kasan_check_range+0xfd/0x1a0
[ 33.862799] __asan_memset+0x23/0x40
[ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[ 33.867135] dev_attr_show+0x43/0xc0
[ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0
[ 33.867155] seq_read_iter+0x3f8/0x1140
[ 33.867173] vfs_read+0x76c/0xc50
[ 33.867198] ksys_read+0xfb/0x1d0
[ 33.867214] do_syscall_64+0x90/0x160
...
[ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[ 33.867358] kasan_save_stack+0x33/0x50
[ 33.867364] kasan_save_track+0x17/0x60
[ 33.867367] __kasan_kmalloc+0x87/0x90
[ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu]
[ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[ 33.869608] local_pci_probe+0xda/0x180
[ 33.869614] pci_device_probe+0x43f/0x6b0
Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.
Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.
The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.
v2:
* Drop impossible v3_0 case. (Mario)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 41cec40bc9 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Wenyou Yang <WenYou.Yang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0880f58f96)
Cc: stable@vger.kernel.org # v6.6+
This reverts
commit 9dad21f910 ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35")
[why & how]
The offending commit exposes a hang with lid close/open behavior.
Both issues seem to be related to ODM 2:1 mode switching, so there
is another issue generic to that sequence that needs to be
investigated.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 68bf95317e)
Cc: stable@vger.kernel.org