mirror of
https://github.com/torvalds/linux.git
synced 2024-11-26 06:02:05 +00:00
ffe1e78612
20683 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Peter Xu
|
ffe1e78612 |
mm/gup: cleanup next_page handling
The only path that doesn't use generic "**pages" handling is the gate vma. Make it use the same path, meanwhile tune the next_page label upper to cover "**pages" handling. This prepares for THP handling for "**pages". Link: https://lkml.kernel.org/r/20230628215310.73782-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Peter Xu
|
5502ea44f5 |
mm/hugetlb: add page_mask for hugetlb_follow_page_mask()
follow_page() doesn't need it, but we'll start to need it when unifying gup for hugetlb. Link: https://lkml.kernel.org/r/20230628215310.73782-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Peter Xu
|
458568c929 |
mm/hugetlb: prepare hugetlb_follow_page_mask() for FOLL_PIN
follow_page() doesn't use FOLL_PIN, meanwhile hugetlb seems to not be the target of FOLL_WRITE either. However add the checks. Namely, either the need to CoW due to missing write bit, or proper unsharing on !AnonExclusive pages over R/O pins to reject the follow page. That brings this function closer to follow_hugetlb_page(). So we don't care before, and also for now. But we'll care if we switch over slow-gup to use hugetlb_follow_page_mask(). We'll also care when to return -EMLINK properly, as that's the gup internal api to mean "we should unshare". Not really needed for follow page path, though. When at it, switching the try_grab_page() to use WARN_ON_ONCE(), to be clear that it just should never fail. When error happens, instead of setting page==NULL, capture the errno instead. Link: https://lkml.kernel.org/r/20230628215310.73782-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Peter Xu
|
dd767aaa2f |
mm/hugetlb: handle FOLL_DUMP well in follow_page_mask()
Patch series "mm/gup: Unify hugetlb, speed up thp", v4. Hugetlb has a special path for slow gup that follow_page_mask() is actually skipped completely along with faultin_page(). It's not only confusing, but also duplicating a lot of logics that generic gup already has, making hugetlb slightly special. This patchset tries to dedup the logic, by first touching up the slow gup code to be able to handle hugetlb pages correctly with the current follow page and faultin routines (where we're mostly there.. due to 10 years ago we did try to optimize thp, but half way done; more below), then at the last patch drop the special path, then the hugetlb gup will always go the generic routine too via faultin_page(). Note that hugetlb is still special for gup, mostly due to the pgtable walking (hugetlb_walk()) that we rely on which is currently per-arch. But this is still one small step forward, and the diffstat might be a proof too that this might be worthwhile. Then for the "speed up thp" side: as a side effect, when I'm looking at the chunk of code, I found that thp support is actually partially done. It doesn't mean that thp won't work for gup, but as long as **pages pointer passed over, the optimization will be skipped too. Patch 6 should address that, so for thp we now get full speed gup. For a quick number, "chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10" gives me 13992.50us -> 378.50us. Gup_test is an extreme case, but just to show how it affects thp gups. This patch (of 8): Firstly, the no_page_table() is meaningless for hugetlb which is a no-op there, because a hugetlb page always satisfies: - vma_is_anonymous() == false - vma->vm_ops->fault != NULL So we can already safely remove it in hugetlb_follow_page_mask(), alongside with the page* variable. Meanwhile, what we do in follow_hugetlb_page() actually makes sense for a dump: we try to fault in the page only if the page cache is already allocated. Let's do the same here for follow_page_mask() on hugetlb. It should so far has zero effect on real dumps, because that still goes into follow_hugetlb_page(). But this may start to influence a bit on follow_page() users who mimics a "dump page" scenario, but hopefully in a good way. This also paves way for unifying the hugetlb gup-slow. Link: https://lkml.kernel.org/r/20230628215310.73782-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20230628215310.73782-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A . Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Peter Collingbourne
|
b53e24c4f6 |
mm: call arch_swap_restore() from unuse_pte()
We would like to move away from requiring architectures to restore metadata from swap in the set_pte_at() implementation, as this is not only error-prone but adds complexity to the arch-specific code. This requires us to call arch_swap_restore() before calling swap_free() whenever pages are restored from swap. We are currently doing so everywhere except in unuse_pte(); do so there as well. Link: https://lkml.kernel.org/r/20230523004312.1807357-3-pcc@google.com Link: https://linux-review.googlesource.com/id/I68276653e612d64cde271ce1b5a99ae05d6bbc4f Signed-off-by: Peter Collingbourne <pcc@google.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: "Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Kefeng Wang
|
1279aa0656 |
mm: make show_free_areas() static
All callers of show_free_areas() pass 0 and NULL, so we can directly use show_mem() instead of show_free_areas(0, NULL), which could make show_free_areas() a static function. Link: https://lkml.kernel.org/r/20230630062253.189440-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Thomas Weißschuh
|
626e98cb03 |
mm: make MEMFD_CREATE into a selectable config option
The memfd_create() syscall, enabled by CONFIG_MEMFD_CREATE, is useful on its own even when not required by CONFIG_TMPFS or CONFIG_HUGETLBFS. Split it into its own proper bool option that can be enabled by users. Move that option into mm/ where the code itself also lies. Also add "select" statements to CONFIG_TMPFS and CONFIG_HUGETLBFS so they automatically enable CONFIG_MEMFD_CREATE as before. Link: https://lkml.kernel.org/r/20230630-config-memfd-v1-1-9acc3ae38b5a@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Tested-by: Zhangjin Wu <falcon@tinylab.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
ZhangPeng
|
fc1878ec70 |
mm: remove page_rmapping()
After converting the last user to folio_raw_mapping(), we can safely remove the function. Link: https://lkml.kernel.org/r/20230701032853.258697-3-zhangpeng362@huawei.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
ZhangPeng
|
15b4919a1e |
mm: use a folio in fault_dirty_shared_page()
We can replace four implicit calls to compound_head() with one by using folio. Link: https://lkml.kernel.org/r/20230701032853.258697-2-zhangpeng362@huawei.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Ma Wupeng
|
c70699e555 |
swap: stop add to avail list if swap is full
Our test finds a WARN_ON in add_to_avail_list. During add_to_avail_list, avail_lists is already in swap_avail_heads, while leads to this WARN_ON. Here is the simplified calltrace: ------------[ cut here ]------------ Call trace: add_to_avail_list+0xb8/0xc0 swap_range_free+0x110/0x138 swapcache_free_entries+0x100/0x1c0 free_swap_slot+0xbc/0xe0 put_swap_folio+0x1f0/0x2ec delete_from_swap_cache+0x6c/0xd0 folio_free_swap+0xa4/0xe4 __try_to_reclaim_swap+0x9c/0x190 free_swap_and_cache+0x84/0x88 unmap_page_range+0x31c/0x934 unmap_single_vma.isra.0+0x48/0x84 unmap_vmas+0x98/0x10c exit_mmap+0xa4/0x210 mmput+0x88/0x158 do_exit+0x284/0x970 do_group_exit+0x34/0x90 post_copy_siginfo_from_user32+0x0/0x1cc do_notify_resume+0x15c/0x470 el0_svc+0x74/0x84 el0t_64_sync_handler+0xb8/0xbc el0t_64_sync+0x190/0x194 During swapoff, try_to_unuse fails to alloc memory due to memory limit and this leads to the failure of swapoff and causes re-insertion of swap space back into swap_list. During _enable_swap_info, this swap device is added to avail list even this swap device if full. At the same time, one entry in this full swap device in released and we try to add this device into avail list and find it is already in the avail list. This causes this WARN_ON. To fix this. Don't add to avail list is swap is full. [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20230627120833.2230766-3-mawupeng1@huawei.com Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Ma Wupeng
|
67490031e8 |
swap: cleanup duplicated WARN_ON in add_to_avail_list
Patch series "fix WARN_ON in add_to_avail_list". Empty check for plist_node is checked in add_to_avail_list and plist_add. Drop the duplicate one in add_to_avail_list. Link: https://lkml.kernel.org/r/20230627120833.2230766-1-mawupeng1@huawei.com Link: https://lkml.kernel.org/r/20230627120833.2230766-2-mawupeng1@huawei.com Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sidhartha Kumar
|
87b11f8622 |
mm: increase usage of folio_next_index() helper
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using the existing helper folio_next_index(). Link: https://lkml.kernel.org/r/20230627174349.491803-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Miaohe Lin
|
3a29280afb |
mm/mm_init.c: update obsolete comment in get_pfn_range_for_nid()
Since commit
|
||
Charan Teja Kalla
|
20c897eadf |
mm: madvise: fix uneven accounting of psi
A folio turns into a Workingset during: 1) shrink_active_list() placing the folio from active to inactive list. 2) When a workingset transition is happening during the folio refault. And when Workingset is set on a folio, PSI for memory can be accounted during a) That folio is being reclaimed and b) Refault of that folio, for usual reclaims. This accounting of PSI for memory is not consistent for reclaim + refault operation between usual reclaim and madvise(COLD/PAGEOUT) which deactivate or proactively reclaim a folio: a) A folio started at inactive and moved to active as part of accesses. Workingset is absent on the folio thus refault of it when reclaimed through MADV_PAGEOUT operation doesn't account for PSI. b) When the same folio transition from inactive->active and then to inactive through shrink_active_list(). Workingset is set on the folio thus refault of it when reclaimed through MADV_PAGEOUT operation accounts for PSI. c) When the same folio is part of active list directly as a result of folio refault and this was a workingset folio prior to eviction. Workingset is set on the folio thus the refault of it when reclaimed through MADV_PAGEOUT/MADV_COLD operation accounts for PSI. d) MADV_COLD transfers the folio from active list to inactive list. Such folios may not have the Workingset thus refault operation on such folio doesn't account for PSI. As said above, refault operation caused because of MADV_PAGEOUT on a folio is accounts for memory PSI in b) and c) but not in a). Refault caused by the reclaim of a folio on which MADV_COLD is performed accounts memory PSI in c) but not in d). These behaviours are inconsistent w.r.t usual reclaim + refault operation. Make this PSI accounting always consistent by turning a folio into a workingset one whenever it is leaving the active list. Also, accounting of PSI on a folio whenever it leaves the active list as part of the MADV_COLD/PAGEOUT operation helps the users whether they are operating on proper folios[1]. [1] https://lore.kernel.org/all/20230605180013.GD221380@cmpxchg.org/ Link: https://lkml.kernel.org/r/1688393201-11135-1-git-send-email-quic_charante@quicinc.com Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com> Suggested-by: Suren Baghdasaryan <surenb@google.com> Reported-by: Sai Manobhiram Manapragada <quic_smanapra@quicinc.com> Reported-by: Pavan Kondeti <quic_pkondeti@quicinc.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Linus Torvalds
|
122e7943b2 |
11 hotfixes. Five are cc:stable and the remainder address post-6.4 issues
or aren't considered serious enough to justify backporting. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZMRGzwAKCRDdBJ7gKXxA jtWoAQDqD5yton3O/tPcCC2X7QbV5bsgghIqvQFo5yWvuiJdNwEAkKwLnXISAadg RmVCgsfQ+4CCsJgp7RpPlMS43m2AQgI= =Rib2 -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2023-07-28-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "11 hotfixes. Five are cc:stable and the remainder address post-6.4 issues or aren't considered serious enough to justify backporting" * tag 'mm-hotfixes-stable-2023-07-28-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/memory-failure: fix hardware poison check in unpoison_memory() proc/vmcore: fix signedness bug in read_from_oldmem() mailmap: update remaining active codeaurora.org email addresses mm: lock VMA in dup_anon_vma() before setting ->anon_vma mm: fix memory ordering for mm_lock_seq and vm_lock_seq scripts/spelling.txt: remove 'thead' as a typo mm/pagewalk: fix EFI_PGT_DUMP of espfix area shmem: minor fixes to splice-read implementation tmpfs: fix Documentation of noswap and huge mount options Revert "um: Use swap() to make code cleaner" mm/damon/core-test: initialise context before test in damon_test_set_attrs() |
||
Mike Rapoport (IBM)
|
c442a957b2 |
Revert "mm,memblock: reset memblock.reserved to system init state to prevent UAF"
This reverts commit
|
||
Jann Horn
|
6c21e066f9 |
mm/mempolicy: Take VMA lock before replacing policy
mbind() calls down into vma_replace_policy() without taking the per-VMA
locks, replaces the VMA's vma->vm_policy pointer, and frees the old
policy. That's bad; a concurrent page fault might still be using the
old policy (in vma_alloc_folio()), resulting in use-after-free.
Normally this will manifest as a use-after-free read first, but it can
result in memory corruption, including because vma_alloc_folio() can
call mpol_cond_put() on the freed policy, which conditionally changes
the policy's refcount member.
This bug is specific to CONFIG_NUMA, but it does also affect non-NUMA
systems as long as the kernel was built with CONFIG_NUMA.
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Fixes:
|
||
Sidhartha Kumar
|
6c54312f96 |
mm/memory-failure: fix hardware poison check in unpoison_memory()
It was pointed out[1] that using folio_test_hwpoison() is wrong as we need
to check the indiviual page that has poison. folio_test_hwpoison() only
checks the head page so go back to using PageHWPoison().
User-visible effects include existing hwpoison-inject tests possibly
failing as unpoisoning a single subpage could lead to unpoisoning an
entire folio. Memory unpoisoning could also not work as expected as
the function will break early due to only checking the head page and
not the actually poisoned subpage.
[1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
Link: https://lkml.kernel.org/r/20230717181812.167757-1-sidhartha.kumar@oracle.com
Fixes:
|
||
Jann Horn
|
d8ab9f7b64 |
mm: lock VMA in dup_anon_vma() before setting ->anon_vma
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA. This currently happens while `dst` is not write-locked.
This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock. This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.
This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.
This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.
I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.
For a kernel-assisted reproducer, see the notes section of the patch mail.
Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com
Fixes:
|
||
Hugh Dickins
|
8b1cb4a2e8 |
mm/pagewalk: fix EFI_PGT_DUMP of espfix area
Booting x86_64 with CONFIG_EFI_PGT_DUMP=y shows messages of the form "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)". EFI_PGT_DUMP dumps all of efi_mm, including the espfix area, which is set up with pmd entries which fit the pmd_bad() check: so |
||
Hugh Dickins
|
fa598952fa |
shmem: minor fixes to splice-read implementation
HWPoison: my reading of folio_test_hwpoison() is that it only tests the
head page of a large folio, whereas splice_folio_into_pipe() will splice
as much of the folio as it can: so for safety we should also check the
has_hwpoisoned flag, set if any of the folio's pages are hwpoisoned.
(Perhaps that ugliness can be improved at the mm end later.)
The call to splice_zeropage_into_pipe() risked overrunning past EOF: ask
it for "part" not "len".
Link: https://lkml.kernel.org/r/32c72c9c-72a8-115f-407d-f0148f368@google.com
Fixes:
|
||
Feng Tang
|
d1836a3b2a |
mm/damon/core-test: initialise context before test in damon_test_set_attrs()
Running kunit test for 6.5-rc1 hits one bug:
ok 10 damon_test_update_monitoring_result
general protection fault, probably for non-canonical address 0x1bffa5c419cfb81: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 110 Comm: kunit_try_catch Tainted: G N 6.5.0-rc2 #15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:damon_set_attrs+0xb9/0x120
Code: f8 00 00 00 4c 8d 58 e0 48 39 c3 74 ba 41 ba 59 17 b7 d1 49 8b 43 10 4d
8d 4b 10 48 8d 70 e0 49 39 c1 74 50 49 8b 40 08 31 d2 <69> 4e 18 10 27 00 00
49 f7 30 31 d2 48 89 c5 89 c8 f7 f5 31 d2 89
RSP: 0000:ffffc900005bfd40 EFLAGS: 00010246
RAX: ffffffff81159fc0 RBX: ffffc900005bfeb8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 01bffa5c419cfb69 RDI: ffffc900005bfd70
RBP: ffffc90000013c10 R08: ffffc900005bfdc0 R09: ffffffff81ff10ed
R10: 00000000d1b71759 R11: ffffffff81ff10dd R12: ffffc90000013a78
R13: ffff88810eb78180 R14: ffffffff818297c0 R15: ffffc90000013c28
FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002a1c001 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
damon_test_set_attrs+0x63/0x1f0
kunit_generic_run_threadfn_adapter+0x17/0x30
kthread+0xfd/0x130
The problem seems to be related with the damon_ctx was used without
being initialized. Fix it by adding the initialization.
Link: https://lkml.kernel.org/r/20230718052811.1065173-1-feng.tang@intel.com
Fixes:
|
||
Linus Torvalds
|
379e66711b |
memblock: reset memblock.reserved to system init state to prevent UAF
A call to memblock_free() or memblock_phys_free() issued after memblock data is discarded will result in use after free in memblock_isolate_range(). When CONFIG_KASAN is enabled, this will cause a panic early in boot. Without CONFIG_KASAN, there is a chance that memblock_isolate_range() might scribble on memory that is now in use by somebody else. Avoid those issues by making sure that memblock_discard points memblock.reserved.regions back at the static buffer. If memblock_free() or memblock_phys_free() is called after memblock memory is discarded, that will print a warning in memblock_remove_region(). -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmTB94cQHHJwcHRAa2Vy bmVsLm9yZwAKCRA5A4Ymyw79kesHB/4rNvGFGEI8LFxooARLt8glcv0Hn7oJ+z3L Xyczw1ZkglT3DEYsoY78bSriddWPqrV3wWkr+p2NYXPBJWgQZ6t3DRZviqzXcj2l Ew2XwLAfT6Vay1eqEFfJJvkGg27QLhnmJPnjDzCWweiXUaR5xOESwKCBmZBWeXUU t5EFJMIXLVEoBDLGW5kk+Q4RZDqhU/sJWDqf4ciWQ5vDS8OFTr56hfth7T8XoMxm BPlC21+cEJUWrbb1gAJUMbIERTzvYg8odZqSAESlHyNyDEtYjyLce5W6HA6zHK+H 2gqiti+Pd1OyHbJUc1lN7iRTE8FJ7DQcBr6H9sk81Po5af02Ky7m =FRx8 -----END PGP SIGNATURE----- Merge tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fix from Mike Rapoport: "A call to memblock_free() or memblock_phys_free() issued after memblock data is discarded will result in use after free in memblock_isolate_range(). Avoid those issues by making sure that memblock_discard points memblock.reserved.regions back at the static buffer" * tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: mm,memblock: reset memblock.reserved to system init state to prevent UAF |
||
Jann Horn
|
657b514695 |
mm: lock_vma_under_rcu() must check vma->anon_vma under vma lock
lock_vma_under_rcu() tries to guarantee that __anon_vma_prepare() can't
be called in the VMA-locked page fault path by ensuring that
vma->anon_vma is set.
However, this check happens before the VMA is locked, which means a
concurrent move_vma() can concurrently call unlink_anon_vmas(), which
disassociates the VMA's anon_vma.
This means we can get UAF in the following scenario:
THREAD 1 THREAD 2
======== ========
<page fault>
lock_vma_under_rcu()
rcu_read_lock()
mas_walk()
check vma->anon_vma
mremap() syscall
move_vma()
vma_start_write()
unlink_anon_vmas()
<syscall end>
handle_mm_fault()
__handle_mm_fault()
handle_pte_fault()
do_pte_missing()
do_anonymous_page()
anon_vma_prepare()
__anon_vma_prepare()
find_mergeable_anon_vma()
mas_walk() [looks up VMA X]
munmap() syscall (deletes VMA X)
reusable_anon_vma() [called on freed VMA X]
This is a security bug if you can hit it, although an attacker would
have to win two races at once where the first race window is only a few
instructions wide.
This patch is based on some previous discussion with Linus Torvalds on
the security list.
Cc: stable@vger.kernel.org
Fixes:
|
||
Rik van Riel
|
9e46e4dcd9 |
mm,memblock: reset memblock.reserved to system init state to prevent UAF
The memblock_discard function frees the memblock.reserved.regions array, which is good. However, if a subsequent memblock_free (or memblock_phys_free) comes in later, from for example ima_free_kexec_buffer, that will result in a use after free bug in memblock_isolate_range. When running a kernel with CONFIG_KASAN enabled, this will cause a kernel panic very early in boot. Without CONFIG_KASAN, there is a chance that memblock_isolate_range might scribble on memory that is now in use by somebody else. Avoid those issues by making sure that memblock_discard points memblock.reserved.regions back at the static buffer. If memblock_free is called after memblock memory is discarded, that will print a warning in memblock_remove_region. Signed-off-by: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20230719154137.732d8525@imladris.surriel.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> |
||
Liam R. Howlett
|
2658f94d67 |
mm/mlock: fix vma iterator conversion of apply_vma_lock_flags()
apply_vma_lock_flags() calls mlock_fixup(), which could merge the VMA
after where the vma iterator is located. Although this is not an issue,
the next iteration of the loop will check the start of the vma to be equal
to the locally saved 'tmp' variable and cause an incorrect failure
scenario. Fix the error by setting tmp to the end of the vma iterator
value before restarting the loop.
There is also a potential of the error code being overwritten when the
loop terminates early. Fix the return issue by directly returning when an
error is encountered since there is nothing to undo after the loop.
Link: https://lkml.kernel.org/r/20230711175020.4091336-1-Liam.Howlett@oracle.com
Fixes:
|
||
Hugh Dickins
|
1c7873e336 |
mm: lock newly mapped VMA with corrected ordering
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes:
|
||
Linus Torvalds
|
946c6b59c5 |
16 hotfixes. Six are cc:stable and the remainder address post-6.4 issues.
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZKmgXAAKCRDdBJ7gKXxA joqDAP0V520Jy0cyJrRMvaQRFMqtVeDOdTpAue7ZOQHSi/LZnAD9EEAxDpYF/V4x PO27ixXQ4Glm2iYgH7bDX7J73WiA3wg= =JsYW -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "16 hotfixes. Six are cc:stable and the remainder address post-6.4 issues" The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since it was all hopefully fixed in mainline. * tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: lib: dhry: fix sleeping allocations inside non-preemptable section kasan, slub: fix HW_TAGS zeroing with slub_debug kasan: fix type cast in memory_is_poisoned_n mailmap: add entries for Heiko Stuebner mailmap: update manpage link bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page MAINTAINERS: add linux-next info mailmap: add Markus Schneider-Pargmann writeback: account the number of pages written back mm: call arch_swap_restore() from do_swap_page() squashfs: fix cache race with migration mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison docs: update ocfs2-devel mailing list address MAINTAINERS: update ocfs2-devel mailing list address mm: disable CONFIG_PER_VMA_LOCK until its fixed fork: lock VMAs of the parent process when forking |
||
Suren Baghdasaryan
|
33313a747e |
mm: lock newly mapped VMA which can be modified after it becomes visible
mmap_region adds a newly created VMA into VMA tree and might modify it afterwards before dropping the mmap_lock. This poses a problem for page faults handled under per-VMA locks because they don't take the mmap_lock and can stumble on this VMA while it's still being modified. Currently this does not pose a problem since post-addition modifications are done only for file-backed VMAs, which are not handled under per-VMA lock. However, once support for handling file-backed page faults with per-VMA locks is added, this will become a race. Fix this by write-locking the VMA before inserting it into the VMA tree. Other places where a new VMA is added into VMA tree do not modify it after the insertion, so do not need the same locking. Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Suren Baghdasaryan
|
c137381f71 |
mm: lock a vma before stack expansion
With recent changes necessitating mmap_lock to be held for write while expanding a stack, per-VMA locks should follow the same rules and be write-locked to prevent page faults into the VMA being expanded. Add the necessary locking. Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrey Konovalov
|
fdb54d9660 |
kasan, slub: fix HW_TAGS zeroing with slub_debug
Commit |
||
Andrey Konovalov
|
05c56e7b43 |
kasan: fix type cast in memory_is_poisoned_n
Commit |
||
Matthew Wilcox (Oracle)
|
8344a3d44b |
writeback: account the number of pages written back
nr_to_write is a count of pages, so we need to decrease it by the number
of pages in the folio we just wrote, not by 1. Most callers specify
either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes() might
end up writing 512x as many pages as it asked for.
Dave added:
: XFS is the only filesystem this would affect, right? AFAIA, nothing
: else enables large folios and uses writeback through
: write_cache_pages() at this point...
:
: In which case, I'd be surprised if much difference, if any, gets
: noticed by anyone.
Link: https://lkml.kernel.org/r/20230628185548.981888-1-willy@infradead.org
Fixes:
|
||
Peter Collingbourne
|
6dca4ac6fc |
mm: call arch_swap_restore() from do_swap_page()
Commit |
||
John Hubbard
|
191fcdb6c9 |
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
The following crash happens for me when running the -mm selftests (below). Specifically, it happens while running the uffd-stress subtests: kernel BUG at mm/hugetlb.c:7249! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ #109 Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018 RIP: 0010:huge_pte_alloc+0x12c/0x1a0 ... Call Trace: <TASK> ? __die_body+0x63/0xb0 ? die+0x9f/0xc0 ? do_trap+0xab/0x180 ? huge_pte_alloc+0x12c/0x1a0 ? do_error_trap+0xc6/0x110 ? huge_pte_alloc+0x12c/0x1a0 ? handle_invalid_op+0x2c/0x40 ? huge_pte_alloc+0x12c/0x1a0 ? exc_invalid_op+0x33/0x50 ? asm_exc_invalid_op+0x16/0x20 ? __pfx_put_prev_task_idle+0x10/0x10 ? huge_pte_alloc+0x12c/0x1a0 hugetlb_fault+0x1a3/0x1120 ? finish_task_switch+0xb3/0x2a0 ? lock_is_held_type+0xdb/0x150 handle_mm_fault+0xb8a/0xd40 ? find_vma+0x5d/0xa0 do_user_addr_fault+0x257/0x5d0 exc_page_fault+0x7b/0x1f0 asm_exc_page_fault+0x22/0x30 That happens because a BUG() statement in huge_pte_alloc() attempts to check that a pte, if present, is a hugetlb pte, but it does so in a non-lockless-safe manner that leads to a false BUG() report. We got here due to a couple of bugs, each of which by itself was not quite enough to cause a problem: First of all, before commit c33c794828f2("mm: ptep_get() conversion"), the BUG() statement in huge_pte_alloc() was itself fragile: it relied upon compiler behavior to only read the pte once, despite using it twice in the same conditional. Next, commit |
||
Suren Baghdasaryan
|
f96c486703 |
mm: disable CONFIG_PER_VMA_LOCK until its fixed
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Disable per-VMA locks config to
prevent this issue until the fix is confirmed. This is expected to be a
temporary measure.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-3-surenb@google.com
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes:
|
||
Linus Torvalds
|
6cd06ab12d |
gup: make the stack expansion warning a bit more targeted
I added a warning about about GUP no longer expanding the stack in
commit
|
||
Linus Torvalds
|
b5641a5d8b |
mm: don't do validate_mm() unnecessarily and without mmap locking
This is an addition to commit |
||
Linus Torvalds
|
ae80b40419 |
mm: validate the mm before dropping the mmap lock
Commit |
||
Liam R. Howlett
|
408579cd62 |
mm: Update do_vmi_align_munmap() return semantics
Since do_vmi_align_munmap() will always honor the downgrade request on the success, the callers no longer have to deal with confusing return codes. Since all callers that request downgrade actually want the lock to be dropped, change the downgrade to an unlock request. Note that the lock still needs to be held in read mode during the page table clean up to avoid races with a map request. Update do_vmi_align_munmap() to return 0 for success. Clean up the callers and comments to always expect the unlock to be honored on the success path. The error path will always leave the lock untouched. As part of the cleanup, the wrapper function do_vmi_munmap() and callers to the wrapper are also updated. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/linux-mm/20230629191414.1215929-1-willy@infradead.org/ Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Matthew Wilcox (Oracle)
|
e4bd84c069 |
mm: Always downgrade mmap_lock if requested
Now that stack growth must always hold the mmap_lock for write, we can always downgrade the mmap_lock to read and safely unmap pages from the page table, even if we're next to a stack. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Max Filippov
|
03f889378f |
xtensa: fix lock_mm_and_find_vma in case VMA not found
MMU version of lock_mm_and_find_vma releases the mm lock before
returning when VMA is not found. Do the same in noMMU version.
This fixes hang on an attempt to handle protection fault.
Fixes:
|
||
Linus Torvalds
|
d85a143b69 |
xtensa: fix NOMMU build with lock_mm_and_find_vma() conversion
It turns out that xtensa has a really odd configuration situation: you
can do a no-MMU config, but still have the page fault code enabled.
Which doesn't sound all that sensible, but it turns out that xtensa can
have protection faults even without the MMU, and we have this:
config PFAULT
bool "Handle protection faults" if EXPERT && !MMU
default y
help
Handle protection faults. MMU configurations must enable it.
noMMU configurations may disable it if used memory map never
generates protection faults or faults are always fatal.
If unsure, say Y.
which completely violated my expectations of the page fault handling.
End result: Guenter reports that the xtensa no-MMU builds all fail with
arch/xtensa/mm/fault.c: In function ‘do_page_fault’:
arch/xtensa/mm/fault.c:133:8: error: implicit declaration of function ‘lock_mm_and_find_vma’
because I never exposed the new lock_mm_and_find_vma() function for the
no-MMU case.
Doing so is simple enough, and fixes the problem.
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes:
|
||
Linus Torvalds
|
075e333591 |
memblock: small updates for 6.5-rc1
* add test for memblock_alloc_node() * minor coding style fixes * add flags and nid info in memblock debugfs -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmSaoGYQHHJwcHRAa2Vy bmVsLm9yZwAKCRA5A4Ymyw79kfVhB/4sH73TJrPbNgo6ITQW0BpePgvJo/1aCuB9 5RYsTjHb98F5g9Bxq9D8XdJvWgPI9B8FWVYARws4g3+LCtiMXyibVYg+DF3wjUgc Y/1X6BjYoG7pVLlx5tZ8GE8SGj0q709u2dI6BmrzM+9WPBjNKwRw14wLtIyDxd0r j58p9YpWH/lZvZsTpUdjh/1L3G+88lHL5H8OnvlyM116kwd1DAXKAyq8dYd1Yly6 mPNvRAj/ZXGiPd7a1+ry677CFOKSXTm0O7qoXGw6KsOdqBhVgwrudwU4m/K3vAla m7N4Zgh20Ow8RuYg/uHr2S4/u0e1my99jxnndRtbpfKtlzK6kJVK =l43o -----END PGP SIGNATURE----- Merge tag 'memblock-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: - add test for memblock_alloc_node() - minor coding style fixes - add flags and nid info in memblock debugfs * tag 'memblock-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock: Update nid info in memblock debugfs memblock: Add flags and nid info in memblock debugfs Fix some coding style errors in memblock.c Add tests for memblock_alloc_node() |
||
Linus Torvalds
|
43ec8a620b |
Fix error return from do_vmi_align_munmap()
-----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEEMUsIrNDeSBEzpfKGm+mA/QrAFUQFAmSd8LASHGR3bXdAYW1h em9uLmNvLnVrAAoJEJvpgP0KwBVExK4P/RXQmDSDftYRoNiQV9X2K4O4NruST8L8 DoYrfa1vHIxVhpZQBHD8KmP1T7ZCPxwfw6ojLYugTZSzwstLEELm+eLjj1j0yPC9 xOYnsOQhlfjP4anV6scVI6WYWk8RogmpB288uXU5sjid6JmCcs4VHFsxvs4MjM52 +P/oGebOSjFQCTRXttFh+uuPlmb9bQ0OMdMkmzDuFIgyftupndFxEEVOW1SIMYLb c0nakgaRtoNQcuM8mR16BL33j7FZVMfqWoJAReX6/wkffNgSFr2jf8Xe5Y9pGgTx nO4fmc0cNDv5K85mIVDnRAFC6HLsctxnmWFt+h+HNIN8fP4rGcKL2XsFLy5p/mKj c8GK3TLp98W672MQTUcM9neMOjs3PSRE0BmXlpTLrx1VKWjc8eT8Sn6tlbEWcqhQ Kz3Zh2iSHIGoK0gjuCJ3N4kt6HBkkAzuGIslgBHyTWntExIH/KfDuQVA3XwoEdtf VpATTAv1CWcEDPp7iwAtIyKHjjsMz2Rp+WbsKE6O9TYJzn2fTF5Or0MND+5ntOxk n0TP8pIqrqY5fUFLpcJMrNqMPHqmlP2a0ZhsHRdT8xO94Wv0Z9gWSgzWNp4MY6w1 SWDfSI5cSJJg+pD1TlJXnE62Phyq9DH9+ATpDO6iFpTJe8QLCndr330w5EkScaWO Umkq3k789Jud =ePMY -----END PGP SIGNATURE----- Merge tag 'unmap-fix-20230629' of git://git.infradead.org/users/dwmw2/linux Pull mm fix from David Woodhouse: "Fix error return from do_vmi_align_munmap()" * tag 'unmap-fix-20230629' of git://git.infradead.org/users/dwmw2/linux: mm/mmap: Fix error return in do_vmi_align_munmap() |
||
Linus Torvalds
|
632f54b4d6 |
slab updates for 6.5
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmSZtjsACgkQu+CwddJF iJqCTwf/XVhmAD7zMOj6g1aak5oHNZDRG5jufM5UNXmiWjCWT3w4DpltrJkz0PPm mg3Ac5fjNUqesZ1SGtUbvoc363smroBrRudGEFrsUhqBcpR+S4fSneoDk+xqMypf VLXP/8kJlFEBGMiR7ouAWnR4+u6JgY4E8E8JIPNzao5KE/L1lD83nY+Usjc/01ek oqMyYVFRfncsGjGJXc5fOOTTCj768mRroF0sLmEegIonnwQkSHE7HWJ/nyaVraDV bomnTIgMdVIDqharin08ZPIM7qBIWM09Uifaf0lIs6fIA94pQP+5Ko3mum2P/S+U ON/qviSrlNgRXoHPJ3hvPHdfEU9cSg== =1d0v -----END PGP SIGNATURE----- Merge tag 'slab-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - SLAB deprecation: Following the discussion at LSF/MM 2023 [1] and no objections, the SLAB allocator is deprecated by renaming the config option (to make its users notice) to CONFIG_SLAB_DEPRECATED with updated help text. SLUB should be used instead. Existing defconfigs with CONFIG_SLAB are also updated. - SLAB_NO_MERGE kmem_cache flag (Jesper Dangaard Brouer): There are (very limited) cases where kmem_cache merging is undesirable, and existing ways to prevent it are hacky. Introduce a new flag to do that cleanly and convert the existing hacky users. Btrfs plans to use this for debug kernel builds (that use case is always fine), networking for performance reasons (that should be very rare). - Replace the usage of weak PRNGs (David Keisar Schmidt): In addition to using stronger RNGs for the security related features, the code is a bit cleaner. - Misc code cleanups (SeongJae Parki, Xiongwei Song, Zhen Lei, and zhaoxinchao) Link: https://lwn.net/Articles/932201/ [1] * tag 'slab-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab_common: use SLAB_NO_MERGE instead of negative refcount mm/slab: break up RCU readers on SLAB_TYPESAFE_BY_RCU example code mm/slab: add a missing semicolon on SLAB_TYPESAFE_BY_RCU example code mm/slab_common: reduce an if statement in create_cache() mm/slab: introduce kmem_cache flag SLAB_NO_MERGE mm/slab: rename CONFIG_SLAB to CONFIG_SLAB_DEPRECATED mm/slab: remove HAVE_HARDENED_USERCOPY_ALLOCATOR mm/slab_common: Replace invocation of weak PRNG mm/slab: Replace invocation of weak PRNG slub: Don't read nr_slabs and total_objects directly slub: Remove slabs_node() function slub: Remove CONFIG_SMP defined check slub: Put objects_show() into CONFIG_SLUB_DEBUG enabled block slub: Correct the error code when slab_kset is NULL mm/slab: correct return values in comment for _kmem_cache_create() |
||
Linus Torvalds
|
eee9c708cc |
gup: avoid stack expansion warning for known-good case
In commit
|
||
Hugh Dickins
|
e8c716bc68 |
mm/khugepaged: fix regression in collapse_file()
There is no xas_pause(&xas) in collapse_file()'s main loop, at the points
where it does xas_unlock_irq(&xas) and then continues.
That would explain why, once two weeks ago and twice yesterday, I have
hit the VM_BUG_ON_PAGE(page != xas_load(&xas), page) since "mm/khugepaged:
fix iteration in collapse_file" removed the xas_set(&xas, index) just
before it: xas.xa_node could be left pointing to a stale node, if there
was concurrent activity on the file which transformed its xarray.
I tried inserting xas_pause()s, but then even bootup crashed on that
VM_BUG_ON_PAGE(): there appears to be a subtle "nextness" implicit in
xas_pause().
xas_next() and xas_pause() are good for use in simple loops, but not in
this one: xas_set() worked well until now, so use xas_set(&xas, index)
explicitly at the head of the loop; and change that VM_BUG_ON_PAGE() not
to need its own xas_set(), and not to interfere with the xa_state (which
would probably stop the crashes from xas_pause(), but I trust that less).
The user-visible effects of this bug (if VM_BUG_ONs are configured out)
would be data loss and data leak - potentially - though in practice I
expect it is more likely that a subsequent check (e.g. on mapping or on
nr_none) would notice an inconsistency, and just abandon the collapse.
Link: https://lore.kernel.org/linux-mm/f18e4b64-3f88-a8ab-56cc-d1f5f9c58d4@google.com/
Fixes:
|
||
Linus Torvalds
|
9471f1f2f5 |
Merge branch 'expand-stack'
This modifies our user mode stack expansion code to always take the mmap_lock for writing before modifying the VM layout. It's actually something we always technically should have done, but because we didn't strictly need it, we were being lazy ("opportunistic" sounds so much better, doesn't it?) about things, and had this hack in place where we would extend the stack vma in-place without doing the proper locking. And it worked fine. We just needed to change vm_start (or, in the case of grow-up stacks, vm_end) and together with some special ad-hoc locking using the anon_vma lock and the mm->page_table_lock, it all was fairly straightforward. That is, it was all fine until Ruihan Li pointed out that now that the vma layout uses the maple tree code, we *really* don't just change vm_start and vm_end any more, and the locking really is broken. Oops. It's not actually all _that_ horrible to fix this once and for all, and do proper locking, but it's a bit painful. We have basically three different cases of stack expansion, and they all work just a bit differently: - the common and obvious case is the page fault handling. It's actually fairly simple and straightforward, except for the fact that we have something like 24 different versions of it, and you end up in a maze of twisty little passages, all alike. - the simplest case is the execve() code that creates a new stack. There are no real locking concerns because it's all in a private new VM that hasn't been exposed to anybody, but lockdep still can end up unhappy if you get it wrong. - and finally, we have GUP and page pinning, which shouldn't really be expanding the stack in the first place, but in addition to execve() we also use it for ptrace(). And debuggers do want to possibly access memory under the stack pointer and thus need to be able to expand the stack as a special case. None of these cases are exactly complicated, but the page fault case in particular is just repeated slightly differently many many times. And ia64 in particular has a fairly complicated situation where you can have both a regular grow-down stack _and_ a special grow-up stack for the register backing store. So to make this slightly more manageable, the bulk of this series is to first create a helper function for the most common page fault case, and convert all the straightforward architectures to it. Thus the new 'lock_mm_and_find_vma()' helper function, which ends up being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon, loongarch, nios2, sh, sparc32, and xtensa. So we not only convert more than half the architectures, we now have more shared code and avoid some of those twisty little passages. And largely due to this common helper function, the full diffstat of this series ends up deleting more lines than it adds. That still leaves eight architectures (ia64, m68k, microblaze, openrisc, parisc, s390, sparc64 and um) that end up doing 'expand_stack()' manually because they are doing something slightly different from the normal pattern. Along with the couple of special cases in execve() and GUP. So there's a couple of patches that first create 'locked' helper versions of the stack expansion functions, so that there's a obvious path forward in the conversion. The execve() case is then actually pretty simple, and is a nice cleanup from our old "grow-up stackls are special, because at execve time even they grow down". The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because it's just more straightforward to write out the stack expansion there manually, instead od having get_user_pages_remote() do it for us in some situations but not others and have to worry about locking rules for GUP. And the final step is then to just convert the remaining odd cases to a new world order where 'expand_stack()' is called with the mmap_lock held for reading, but where it might drop it and upgrade it to a write, only to return with it held for reading (in the success case) or with it completely dropped (in the failure case). In the process, we remove all the stack expansion from GUP (where dropping the lock wouldn't be ok without special rules anyway), and add it in manually to __access_remote_vm() for ptrace(). Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases. Everything else here felt pretty straightforward, but the ia64 rules for stack expansion are really quite odd and very different from everything else. Also thanks to Vegard Nossum who caught me getting one of those odd conditions entirely the wrong way around. Anyway, I think I want to actually move all the stack expansion code to a whole new file of its own, rather than have it split up between mm/mmap.c and mm/memory.c, but since this will have to be backported to the initial maple tree vma introduction anyway, I tried to keep the patches _fairly_ minimal. Also, while I don't think it's valid to expand the stack from GUP, the final patch in here is a "warn if some crazy GUP user wants to try to expand the stack" patch. That one will be reverted before the final release, but it's left to catch any odd cases during the merge window and release candidates. Reported-by: Ruihan Li <lrh2000@pku.edu.cn> * branch 'expand-stack': gup: add warning if some caller would seem to want stack expansion mm: always expand the stack with the mmap write lock held execve: expand new process stack manually ahead of time mm: make find_extend_vma() fail if write lock not held powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma() mm/fault: convert remaining simple cases to lock_mm_and_find_vma() arm/mm: Convert to using lock_mm_and_find_vma() riscv/mm: Convert to using lock_mm_and_find_vma() mips/mm: Convert to using lock_mm_and_find_vma() powerpc/mm: Convert to using lock_mm_and_find_vma() arm64/mm: Convert to using lock_mm_and_find_vma() mm: make the page fault mmap locking killable mm: introduce new 'lock_mm_and_find_vma()' page fault helper |
||
Linus Torvalds
|
3a8a670eee |
Networking changes for 6.5.
Core ---- - Rework the sendpage & splice implementations. Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is. Remove the MSG_SENDPAGE_NOTLAST flag completely. - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid. - Enable socket busy polling with CONFIG_RT. - Improve reliability and efficiency of reporting for ref_tracker. - Auto-generate a user space C library for various Netlink families. Protocols --------- - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2]. - Use per-VMA locking for "page-flipping" TCP receive zerocopy. - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags. - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative. - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO). - Avoid waking up applications using TLS sockets until we have a full record. - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring. - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address. - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch. - PPPoE: make number of hash bits configurable. - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig). - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge). - Support matching on Connectivity Fault Management (CFM) packets. - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug. - HSR: don't enable promiscuous mode if device offloads the proto. - Support active scanning in IEEE 802.15.4. - Continue work on Multi-Link Operation for WiFi 7. BPF --- - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators. - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer *should* be, without writing anything. - Accept dynptr memory as memory arguments passed to helpers. - Add routing table ID to bpf_fib_lookup BPF helper. - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands. - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only). - Show target_{obj,btf}_id in tracing link fdinfo. - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs Netfilter --------- - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value. - Increase ip_vs_conn_tab_bits range for 64BIT builds. - Allow updating size of a set. - Improve NAT tuple selection when connection is closing. Driver API ---------- - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out). - Support configuring rate selection pins of SFP modules. - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines. - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer. - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio). - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message. - Split devlink instance and devlink port operations. New hardware / drivers ---------------------- - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers ------- - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3 MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07 IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/ fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7 yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/ JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4= =9i4I -----END PGP SIGNATURE----- Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer *should* be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" * tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ... |