forked from Minki/linux
7d2c4385c3
1122273 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Zach O'Keefe
|
7d2c4385c3 |
mm/khugepaged: rename prefix of shared collapse functions
The following functions are shared between khugepaged and madvise collapse contexts. Replace the "khugepaged_" prefix with generic "hpage_collapse_" prefix in such cases: khugepaged_test_exit() -> hpage_collapse_test_exit() khugepaged_scan_abort() -> hpage_collapse_scan_abort() khugepaged_scan_pmd() -> hpage_collapse_scan_pmd() khugepaged_find_target_node() -> hpage_collapse_find_target_node() khugepaged_alloc_page() -> hpage_collapse_alloc_page() The kerenel ABI (e.g. huge_memory:mm_khugepaged_scan_pmd tracepoint) is unaltered. Link: https://lkml.kernel.org/r/20220706235936.2197195-11-zokeefe@google.com Signed-off-by: Zach O'Keefe <zokeefe@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Zach O'Keefe
|
7d8faaf155 |
mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
This idea was introduced by David Rientjes[1]. Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of memory at their own expense. The benefits of this approach are: * CPU is charged to the process that wants to spend the cycles for the THP * Avoid unpredictable timing of khugepaged collapse Semantics This call is independent of the system-wide THP sysfs settings, but will fail for memory marked VM_NOHUGEPAGE. If the ranges provided span multiple VMAs, the semantics of the collapse over each VMA is independent from the others. This implies a hugepage cannot cross a VMA boundary. If collapse of a given hugepage-aligned/sized region fails, the operation may continue to attempt collapsing the remainder of memory specified. The memory ranges provided must be page-aligned, but are not required to be hugepage-aligned. If the memory ranges are not hugepage-aligned, the start/end of the range will be clamped to the first/last hugepage-aligned address covered by said range. The memory ranges must span at least one hugepage-sized region. All non-resident pages covered by the range will first be swapped/faulted-in, before being internally copied onto a freshly allocated hugepage. Unmapped pages will have their data directly initialized to 0 in the new hugepage. However, for every eligible hugepage aligned/sized region to-be collapsed, at least one page must currently be backed by memory (a PMD covering the address range must already exist). Allocation for the new hugepage may enter direct reclaim and/or compaction, regardless of VMA flags. When the system has multiple NUMA nodes, the hugepage will be allocated from the node providing the most native pages. This operation operates on the current state of the specified process and makes no persistent changes or guarantees on how pages will be mapped, constructed, or faulted in the future Return Value If all hugepage-sized/aligned regions covered by the provided range were either successfully collapsed, or were already PMD-mapped THPs, this operation will be deemed successful. On success, process_madvise(2) returns the number of bytes advised, and madvise(2) returns 0. Else, -1 is returned and errno is set to indicate the error for the most-recently attempted hugepage collapse. Note that many failures might have occurred, since the operation may continue to collapse in the event a single hugepage-sized/aligned region fails. ENOMEM Memory allocation failed or VMA not found EBUSY Memcg charging failed EAGAIN Required resource temporarily unavailable. Try again might succeed. EINVAL Other error: No PMD found, subpage doesn't have Present bit set, "Special" page no backed by struct page, VMA incorrectly sized, address not page-aligned, ... Most notable here is ENOMEM and EBUSY (new to madvise) which are intended to provide the caller with actionable feedback so they may take an appropriate fallback measure. Use Cases An immediate user of this new functionality are malloc() implementations that manage memory in hugepage-sized chunks, but sometimes subrelease memory back to the system in native-sized chunks via MADV_DONTNEED; zapping the pmd. Later, when the memory is hot, the implementation could madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage coverage and dTLB performance. TCMalloc is such an implementation that could benefit from this[2]. Only privately-mapped anon memory is supported for now, but additional support for file, shmem, and HugeTLB high-granularity mappings[2] is expected. File and tmpfs/shmem support would permit: * Backing executable text by THPs. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevents page sharing and demand paging, both of which increase steady state memory footprint. With MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance and lower RAM footprints. * Backing guest memory by hugapages after the memory contents have been migrated in native-page-sized chunks to a new host, in a userfaultfd-based live-migration stack. [1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/ [2] https://github.com/google/tcmalloc/tree/master/tcmalloc [jrdr.linux@gmail.com: avoid possible memory leak in failure path] Link: https://lkml.kernel.org/r/20220713024109.62810-1-jrdr.linux@gmail.com [zokeefe@google.com add missing kfree() to madvise_collapse()] Link: https://lore.kernel.org/linux-mm/20220713024109.62810-1-jrdr.linux@gmail.com/ Link: https://lkml.kernel.org/r/20220713161851.1879439-1-zokeefe@google.com [zokeefe@google.com: delay computation of hpage boundaries until use]] Link: https://lkml.kernel.org/r/20220720140603.1958773-4-zokeefe@google.com Link: https://lkml.kernel.org/r/20220706235936.2197195-10-zokeefe@google.com Signed-off-by: Zach O'Keefe <zokeefe@google.com> Signed-off-by: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com> Suggested-by: David Rientjes <rientjes@google.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Hildenbrand <david@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Zach O'Keefe
|
5072280442 |
mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage
When scanning an anon pmd to see if it's eligible for collapse, return SCAN_PMD_MAPPED if the pmd already maps a hugepage. Note that SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the file-collapse path, since the latter might identify pte-mapped compound pages. This is required by MADV_COLLAPSE which necessarily needs to know what hugepage-aligned/sized regions are already pmd-mapped. In order to determine if a pmd already maps a hugepage, refactor mm_find_pmd(): Return mm_find_pmd() to it's pre-commit |
||
Zach O'Keefe
|
a7f4e6e4c4 |
mm/thp: add flag to enforce sysfs THP in hugepage_vma_check()
MADV_COLLAPSE is not coupled to the kernel-oriented sysfs THP settings[1]. hugepage_vma_check() is the authority on determining if a VMA is eligible for THP allocation/collapse, and currently enforces the sysfs THP settings. Add a flag to disable these checks. For now, only apply this arg to anon and file, which use /sys/kernel/transparent_hugepage/enabled. We can expand this to shmem, which uses /sys/kernel/transparent_hugepage/shmem_enabled, later. Use this flag in collapse_pte_mapped_thp() where previously the VMA flags passed to hugepage_vma_check() were OR'd with VM_HUGEPAGE to elide the VM_HUGEPAGE check in "madvise" THP mode. Prior to "mm: khugepaged: check THP flag in hugepage_vma_check()", this check also didn't check "never" THP mode. As such, this restores the previous behavior of collapse_pte_mapped_thp() where sysfs THP settings are ignored. See comment in code for justification why this is OK. [1] https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/ Link: https://lkml.kernel.org/r/20220706235936.2197195-8-zokeefe@google.com Signed-off-by: Zach O'Keefe <zokeefe@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Zach O'Keefe
|
d8ea7cc854 |
mm/khugepaged: add flag to predicate khugepaged-only behavior
Add .is_khugepaged flag to struct collapse_control so khugepaged-specific behavior can be elided by MADV_COLLAPSE context. Start by protecting khugepaged-specific heuristics by this flag. In MADV_COLLAPSE, the user presumably has reason to believe the collapse will be beneficial and khugepaged heuristics shouldn't prevent the user from doing so: 1) sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared] 2) requirement that some pages in region being collapsed be young or referenced [zokeefe@google.com: consistently order cc->is_khugepaged and pte_* checks] Link: https://lkml.kernel.org/r/20220720140603.1958773-3-zokeefe@google.com Link: https://lore.kernel.org/linux-mm/Ys2qJm6FaOQcxkha@google.com/ Link: https://lkml.kernel.org/r/20220706235936.2197195-7-zokeefe@google.com Signed-off-by: Zach O'Keefe <zokeefe@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Zach O'Keefe
|
50ad2f24b3 |
mm/khugepaged: propagate enum scan_result codes back to callers
Propagate enum scan_result codes back through return values of functions downstream of khugepaged_scan_file() and khugepaged_scan_pmd() to inform callers if the operation was successful, and if not, why. Since khugepaged_scan_pmd()'s return value already has a specific meaning (whether mmap_lock was unlocked or not), add a bool* argument to khugepaged_scan_pmd() to retrieve this information. Change khugepaged to take action based on the return values of khugepaged_scan_file() and khugepaged_scan_pmd() instead of acting deep within the collapsing functions themselves. hugepage_vma_revalidate() now returns SCAN_SUCCEED on success to be more consistent with enum scan_result propagation. Remove dependency on error pointers to communicate to khugepaged that allocation failed and it should sleep; instead just use the result of the scan (SCAN_ALLOC_HUGE_PAGE_FAIL if allocation fails). Link: https://lkml.kernel.org/r/20220706235936.2197195-6-zokeefe@google.com Signed-off-by: Zach O'Keefe <zokeefe@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Zach O'Keefe
|
9710a78ab2 |
mm/khugepaged: dedup and simplify hugepage alloc and charging
The following code is duplicated in collapse_huge_page() and collapse_file(): gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; new_page = khugepaged_alloc_page(hpage, gfp, node); if (!new_page) { result = SCAN_ALLOC_HUGE_PAGE_FAIL; goto out; } if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) { result = SCAN_CGROUP_CHARGE_FAIL; goto out; } count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC); Also, "node" is passed as an argument to both collapse_huge_page() and collapse_file() and obtained the same way, via khugepaged_find_target_node(). Move all this into a new helper, alloc_charge_hpage(), and remove the duplicate code from collapse_huge_page() and collapse_file(). Also, simplify khugepaged_alloc_page() by returning a bool indicating allocation success instead of a copy of the allocated struct page *. Link: https://lkml.kernel.org/r/20220706235936.2197195-5-zokeefe@google.com Signed-off-by: Zach O'Keefe <zokeefe@google.com> Suggested-by: Peter Xu <peterx@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Hildenbrand <david@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Zach O'Keefe
|
34d6b470ab |
mm/khugepaged: add struct collapse_control
Modularize hugepage collapse by introducing struct collapse_control. This structure serves to describe the properties of the requested collapse, as well as serve as a local scratch pad to use during the collapse itself. Start by moving global per-node khugepaged statistics into this new structure. Note that this structure is still statically allocated since CONFIG_NODES_SHIFT might be arbitrary large, and stack-allocating a MAX_NUMNODES-sized array could cause -Wframe-large-than= errors. [zokeefe@google.com: use minimal bits to store num page < HPAGE_PMD_NR] Link: https://lkml.kernel.org/r/20220720140603.1958773-2-zokeefe@google.com Link: https://lore.kernel.org/linux-mm/Ys2CeIm%2FQmQwWh9a@google.com/ [sfr@canb.auug.org.au: fix build] Link: https://lkml.kernel.org/r/20220721195508.15f1e07a@canb.auug.org.au [zokeefe@google.com: fix struct collapse_control load_node definition] Link: https://lore.kernel.org/linux-mm/202209021349.F73i5d6X-lkp@intel.com/ Link: https://lkml.kernel.org/r/20220903021221.1130021-1-zokeefe@google.com Link: https://lkml.kernel.org/r/20220706235936.2197195-4-zokeefe@google.com Signed-off-by: Zach O'Keefe <zokeefe@google.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com> Cc: SeongJae Park <sj@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Yang Shi
|
c6a7f445a2 |
mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA
Patch series "mm: userspace hugepage collapse", v7.
Introduction
--------------------------------
This series provides a mechanism for userspace to induce a collapse of
eligible ranges of memory into transparent hugepages in process context,
thus permitting users to more tightly control their own hugepage
utilization policy at their own expense.
This idea was introduced by David Rientjes[5].
Interface
--------------------------------
The proposed interface adds a new madvise(2) mode, MADV_COLLAPSE, and
leverages the new process_madvise(2) call.
process_madvise(2)
Performs a synchronous collapse of the native pages
mapped by the list of iovecs into transparent hugepages.
This operation is independent of the system THP sysfs settings,
but attempts to collapse VMAs marked VM_NOHUGEPAGE will still fail.
THP allocation may enter direct reclaim and/or compaction.
When a range spans multiple VMAs, the semantics of the collapse
over of each VMA is independent from the others.
Caller must have CAP_SYS_ADMIN if not acting on self.
Return value follows existing process_madvise(2) conventions. A
“success” indicates that all hugepage-sized/aligned regions
covered by the provided range were either successfully
collapsed, or were already pmd-mapped THPs.
madvise(2)
Equivalent to process_madvise(2) on self, with 0 returned on
“success”.
Current Use-Cases
--------------------------------
(1) Immediately back executable text by THPs. Current support provided
by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large
system which might impair services from serving at their full rated
load after (re)starting. Tricks like mremap(2)'ing text onto
anonymous memory to immediately realize iTLB performance prevents
page sharing and demand paging, both of which increase steady state
memory footprint. With MADV_COLLAPSE, we get the best of both
worlds: Peak upfront performance and lower RAM footprints. Note
that subsequent support for file-backed memory is required here.
(2) malloc() implementations that manage memory in hugepage-sized
chunks, but sometimes subrelease memory back to the system in
native-sized chunks via MADV_DONTNEED; zapping the pmd. Later,
when the memory is hot, the implementation could
madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain
hugepage coverage and dTLB performance. TCMalloc is such an
implementation that could benefit from this[6]. A prior study of
Google internal workloads during evaluation of Temeraire, a
hugepage-aware enhancement to TCMalloc, showed that nearly 20% of
all cpu cycles were spent in dTLB stalls, and that increasing
hugepage coverage by even small amount can help with that[7].
(3) userfaultfd-based live migration of virtual machines satisfy UFFD
faults by fetching native-sized pages over the network (to avoid
latency of transferring an entire hugepage). However, after guest
memory has been fully copied to the new host, MADV_COLLAPSE can
be used to immediately increase guest performance. Note that
subsequent support for file/shmem-backed memory is required here.
(4) HugeTLB high-granularity mapping allows HugeTLB a HugeTLB page to
be mapped at different levels in the page tables[8]. As it's not
"transparent" like THP, HugeTLB high-granularity mappings require
an explicit user API. It is intended that MADV_COLLAPSE be co-opted
for this use case[9]. Note that subsequent support for HugeTLB
memory is required here.
Future work
--------------------------------
Only private anonymous memory is supported by this series. File and
shmem memory support will be added later.
One possible user of this functionality is a userspace agent that
attempts to optimize THP utilization system-wide by allocating THPs
based on, for example, task priority, task performance requirements, or
heatmaps. For the latter, one idea that has already surfaced is using
DAMON to identify hot regions, and driving THP collapse through a new
DAMOS_COLLAPSE scheme[10].
This patch (of 17):
The khugepaged has optimization to reduce huge page allocation calls for
!CONFIG_NUMA by carrying the allocated but failed to collapse huge page to
the next loop. CONFIG_NUMA doesn't do so since the next loop may try to
collapse huge page from a different node, so it doesn't make too much
sense to carry it.
But when NUMA=n, the huge page is allocated by khugepaged_prealloc_page()
before scanning the address space, so it means huge page may be allocated
even though there is no suitable range for collapsing. Then the page
would be just freed if khugepaged already made enough progress. This
could make NUMA=n run have 5 times as much thp_collapse_alloc as NUMA=y
run. This problem actually makes things worse due to the way more
pointless THP allocations and makes the optimization pointless.
This could be fixed by carrying the huge page across scans, but it will
complicate the code further and the huge page may be carried indefinitely.
But if we take one step back, the optimization itself seems not worth
keeping nowadays since:
* Not too many users build NUMA=n kernel nowadays even though the kernel is
actually running on a non-NUMA machine. Some small devices may run NUMA=n
kernel, but I don't think they actually use THP.
* Since commit
|
||
Linus Torvalds
|
b90cb10531 | Linux 6.0-rc3 | ||
Linus Torvalds
|
b467192ec7 |
Seventeen hotfixes. Mostly memory management things. Ten patches are
cc:stable, addressing pre-6.0 issues. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYwvgrAAKCRDdBJ7gKXxA jlweAQC9dzE08Elxl4F7Uvxe+62JWVeflBRrT7sJ6jU1Gu3QcQEAhhI1Xit3/MGq pRytDBObGADxlA67c9eNq6J5pCT/7gE= =pD67 -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more hotfixes from Andrew Morton: "Seventeen hotfixes. Mostly memory management things. Ten patches are cc:stable, addressing pre-6.0 issues" * tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: .mailmap: update Luca Ceresoli's e-mail address mm/mprotect: only reference swap pfn page if type match squashfs: don't call kmalloc in decompressors mm/damon/dbgfs: avoid duplicate context directory creation mailmap: update email address for Colin King asm-generic: sections: refactor memory_intersects bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown Revert "memcg: cleanup racy sum avoidance code" mm/zsmalloc: do not attempt to free IS_ERR handle binder_alloc: add missing mmap_lock calls when using the VMA mm: re-allow pinning of zero pfns (again) vmcoreinfo: add kallsyms_num_syms symbol mailmap: update Guilherme G. Piccoli's email addresses writeback: avoid use-after-free after removing device shmem: update folio if shmem_replace_page() updates the page mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte |
||
Linus Torvalds
|
373eff576e |
bitmap fixes for v6.0-rc3
Hi Linus, Please pull (hopefully) the last portion of fixes from Sander for his UP rework series. The original series came from -mm tree, and it was not the latest version, that's why we need follow-ups. It fixes only a test introduced by that series. The test fails under certain configs. From Sander: This series fixes the reported issues, and implements the suggested improvements, for the version of the cpumask tests [1] that was merged with commit |
||
Luca Ceresoli
|
0ebafe2ea8 |
.mailmap: update Luca Ceresoli's e-mail address
My Bootlin address is preferred from now on. Link: https://lkml.kernel.org/r/20220826130515.3011951-1-luca.ceresoli@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Atish Patra <atishp@atishpatra.org> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Peter Xu
|
3d2f78f08c |
mm/mprotect: only reference swap pfn page if type match
Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:
kernel BUG at include/linux/swapops.h:117!
CPU: 46 PID: 5245 Comm: EventManager_De Tainted: G S O L 6.0.0-dbg-DEV #2
RIP: 0010:pfn_swap_entry_to_page+0x72/0xf0
Code: c6 48 8b 36 48 83 fe ff 74 53 48 01 d1 48 83 c1 08 48 8b 09 f6
c1 01 75 7b 66 90 48 89 c1 48 8b 09 f6 c1 01 74 74 5d c3 eb 9e <0f> 0b
48 ba ff ff ff ff 03 00 00 00 eb ae a9 ff 0f 00 00 75 13 48
RSP: 0018:ffffa59e73fabb80 EFLAGS: 00010282
RAX: 00000000ffffffe8 RBX: 0c00000000000000 RCX: ffffcd5440000000
RDX: 1ffffffffff7a80a RSI: 0000000000000000 RDI: 0c0000000000042b
RBP: ffffa59e73fabb80 R08: ffff9965ca6e8bb8 R09: 0000000000000000
R10: ffffffffa5a2f62d R11: 0000030b372e9fff R12: ffff997b79db5738
R13: 000000000000042b R14: 0c0000000000042b R15: 1ffffffffff7a80a
FS: 00007f549d1bb700(0000) GS:ffff99d3cf680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000440d035b3180 CR3: 0000002243176004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
change_pte_range+0x36e/0x880
change_p4d_range+0x2e8/0x670
change_protection_range+0x14e/0x2c0
mprotect_fixup+0x1ee/0x330
do_mprotect_pkey+0x34c/0x440
__x64_sys_mprotect+0x1d/0x30
It triggers because pfn_swap_entry_to_page() could be called upon e.g. a
genuine swap entry.
Fix it by only calling it when it's a write migration entry where the page*
is used.
[1] https://lore.kernel.org/lkml/CAOUHufaVC2Za-p8m0aiHw6YkheDcrO-C3wRGixwDS32VTS+k1w@mail.gmail.com/
Link: https://lkml.kernel.org/r/20220823221138.45602-1-peterx@redhat.com
Fixes:
|
||
Phillip Lougher
|
1f13dff09f |
squashfs: don't call kmalloc in decompressors
The decompressors may be called while in an atomic section. So move the kmalloc() out of this path, and into the "page actor" init function. This fixes a regression introduced by commit |
||
Badari Pulavarty
|
d26f607036 |
mm/damon/dbgfs: avoid duplicate context directory creation
When user tries to create a DAMON context via the DAMON debugfs interface
with a name of an already existing context, the context directory creation
fails but a new context is created and added in the internal data
structure, due to absence of the directory creation success check. As a
result, memory could leak and DAMON cannot be turned on. An example test
case is as below:
# cd /sys/kernel/debug/damon/
# echo "off" > monitor_on
# echo paddr > target_ids
# echo "abc" > mk_context
# echo "abc" > mk_context
# echo $$ > abc/target_ids
# echo "on" > monitor_on <<< fails
Return value of 'debugfs_create_dir()' is expected to be ignored in
general, but this is an exceptional case as DAMON feature is depending
on the debugfs functionality and it has the potential duplicate name
issue. This commit therefore fixes the issue by checking the directory
creation failure and immediately return the error in the case.
Link: https://lkml.kernel.org/r/20220821180853.2400-1-sj@kernel.org
Fixes:
|
||
Colin Ian King
|
ac733f6558 |
mailmap: update email address for Colin King
Colin King is working on kernel janitorial fixes in his spare time and using his Intel email is confusing. Use his gmail account as the default email address. Link: https://lkml.kernel.org/r/20220817212753.101109-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Quanyang Wang
|
0c7d7cc2b4 |
asm-generic: sections: refactor memory_intersects
There are two problems with the current code of memory_intersects: First, it doesn't check whether the region (begin, end) falls inside the region (virt, vend), that is (virt < begin && vend > end). The second problem is if vend is equal to begin, it will return true but this is wrong since vend (virt + size) is not the last address of the memory region but (virt + size -1) is. The wrong determination will trigger the misreporting when the function check_for_illegal_area calls memory_intersects to check if the dma region intersects with stext region. The misreporting is as below (stext is at 0x80100000): WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168 DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536] Modules linked in: CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5 Hardware name: Xilinx Zynq Platform unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from __warn+0xb0/0x198 __warn from warn_slowpath_fmt+0x80/0xb4 warn_slowpath_fmt from check_for_illegal_area+0x130/0x168 check_for_illegal_area from debug_dma_map_sg+0x94/0x368 debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128 __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24 dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4 usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214 usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118 usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70 usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360 usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440 usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238 usb_stor_control_thread from kthread+0xf8/0x104 kthread from ret_from_fork+0x14/0x2c Refactor memory_intersects to fix the two problems above. Before the |
||
Liu Shixin
|
dd0ff4d12d |
bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem
The vmemmap pages is marked by kmemleak when allocated from memblock.
Remove it from kmemleak when freeing the page. Otherwise, when we reuse
the page, kmemleak may report such an error and then stop working.
kmemleak: Cannot insert 0xffff98fb6eab3d40 into the object search tree (overlaps existing)
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff98fb6be00000 (size 335544320):
kmemleak: comm "swapper", pid 0, jiffies 4294892296
kmemleak: min_count = 0
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
Link: https://lkml.kernel.org/r/20220819094005.2928241-1-liushixin2@huawei.com
Fixes:
|
||
Heming Zhao
|
550842cc60 |
ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown
After commit |
||
Shakeel Butt
|
dbb16df644 |
Revert "memcg: cleanup racy sum avoidance code"
This reverts commit |
||
Sergey Senozhatsky
|
a5d2172180 |
mm/zsmalloc: do not attempt to free IS_ERR handle
zsmalloc() now returns ERR_PTR values as handles, which zram accidentally
can pass to zs_free(). Another bad scenario is when zcomp_compress()
fails - handle has default -ENOMEM value, and zs_free() will try to free
that "pointer value".
Add the missing check and make sure that zs_free() bails out when
ERR_PTR() is passed to it.
Link: https://lkml.kernel.org/r/20220816050906.2583956-1-senozhatsky@chromium.org
Fixes:
|
||
Liam Howlett
|
44e602b4e5 |
binder_alloc: add missing mmap_lock calls when using the VMA
Take the mmap_read_lock() when using the VMA in binder_alloc_print_pages()
and when checking for a VMA in binder_alloc_new_buf_locked().
It is worth noting binder_alloc_new_buf_locked() drops the VMA read lock
after it verifies a VMA exists, but may be taken again deeper in the call
stack, if necessary.
Link: https://lkml.kernel.org/r/20220810160209.1630707-1-Liam.Howlett@oracle.com
Fixes:
|
||
Alex Williamson
|
fcab34b433 |
mm: re-allow pinning of zero pfns (again)
The below referenced commit makes the same error as |
||
Stephen Brennan
|
f09bddbd86 |
vmcoreinfo: add kallsyms_num_syms symbol
The rest of the kallsyms symbols are useless without knowing the number of
symbols in the table. In an earlier patch, I somehow dropped the
kallsyms_num_syms symbol, so add it back in.
Link: https://lkml.kernel.org/r/20220808205410.18590-1-stephen.s.brennan@oracle.com
Fixes:
|
||
Guilherme G. Piccoli
|
6c26d17eea |
mailmap: update Guilherme G. Piccoli's email addresses
Both @canonical and @ibm email addresses are invalid now; use my personal address instead. Link: https://lkml.kernel.org/r/20220804202207.439427-1-gpiccoli@igalia.com Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Khazhismel Kumykov
|
f87904c075 |
writeback: avoid use-after-free after removing device
When a disk is removed, bdi_unregister gets called to stop further
writeback and wait for associated delayed work to complete. However,
wb_inode_writeback_end() may schedule bandwidth estimation dwork after
this has completed, which can result in the timer attempting to access the
just freed bdi_writeback.
Fix this by checking if the bdi_writeback is alive, similar to when
scheduling writeback work.
Since this requires wb->work_lock, and wb_inode_writeback_end() may get
called from interrupt, switch wb->work_lock to an irqsafe lock.
Link: https://lkml.kernel.org/r/20220801155034.3772543-1-khazhy@google.com
Fixes:
|
||
Matthew Wilcox (Oracle)
|
9dfb3b8d65 |
shmem: update folio if shmem_replace_page() updates the page
If we allocate a new page, we need to make sure that our folio matches
that new page.
If we do end up in this code path, we store the wrong page in the shmem
inode's page cache, and I would rather imagine that data corruption
ensues.
This will be solved by changing shmem_replace_page() to
shmem_replace_folio(), but this is the minimal fix.
Link: https://lkml.kernel.org/r/20220730042518.1264767-1-willy@infradead.org
Fixes:
|
||
Miaohe Lin
|
ab74ef708d |
mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
In MCOPY_ATOMIC_CONTINUE case with a non-shared VMA, pages in the page
cache are installed in the ptes. But hugepage_add_new_anon_rmap is called
for them mistakenly because they're not vm_shared. This will corrupt the
page->mapping used by page cache code.
Link: https://lkml.kernel.org/r/20220712130542.18836-1-linmiaohe@huawei.com
Fixes:
|
||
Linus Torvalds
|
8379c0b31f |
for-6.0-rc3-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmMLY9oACgkQxWXV+ddt WDue/w/8C3ZF8nLAI/sMrUpef2vSD62bvkKRRS45wzR2uod6yc0Fle9upzBssJQZ qO3mQ53+QV+imCq7dY5mmtmwCUJNmbV5gbiMoF1OoV9TYtpZb/NIDklSX8se2eJX drdAWQr2pYwU2M4duA4IEW08TvQ2TFh0JiqMi0aYM5apyL80uv3WniOu+xpRipA3 CMFAnDqayIgQ5OIsedqNy2MBLBopodUL5PZv/H7/g6KSKIuAZP9zgg1eKPfaz2t3 HO183ubmMbVtxgxeu+EnvCkg/iQ5hQiuGmyi0FLYMs/A6/NglwBnIJU5jCMQhcp6 HO5+FSUn6lHQetVzt2uHb9Lo+gX4FtCaHqVv1bXT62lnmDsZO1D7RVSg1Fra+CY+ jJmi8vvIbfbYlSZPZlJANoWe8ODOMVPk+pM4SFHlxOWGAY6HViX2RfHnIjNj5x9O iDSTGvH6++nBF1Wu2/Xja/VKZ1avxRyTu2srW8JOF62j/tTU/EoPJcO9rxXOBBmC Hi4UmJ690p3h5xZeeiyE8CmaSlPtfdCcnc/97FnusEjBao9O7THX0PCDVJX6VBkm hVk01Z6+az1UNcD18KecvCpKYF/At4WpjaUGgf7q+LBfJXuXA6jfzOVDJMKV3TFd n1yMFg+duGj90l8gT0aa/VQiBlUlnzQKz6ceqyKkPccwveNis6I= =p8YV -----END PGP SIGNATURE----- Merge tag 'for-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Fixes: - check that subvolume is writable when changing xattrs from security namespace - fix memory leak in device lookup helper - update generation of hole file extent item when merging holes - fix space cache corruption and potential double allocations; this is a rare bug but can be serious once it happens, stable backports and analysis tool will be provided - fix error handling when deleting root references - fix crash due to assert when attempting to cancel suspended device replace, add message what to do if mount fails due to missing replace item Regressions: - don't merge pages into bio if their page offset is not contiguous - don't allow large NOWAIT direct reads, this could lead to short reads eg. in io_uring" * tag 'for-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: add info when mount fails due to stale replace target btrfs: replace: drop assert for suspended replace btrfs: fix silent failure when deleting root reference btrfs: fix space cache corruption and potential double allocations btrfs: don't allow large NOWAIT direct reads btrfs: don't merge pages into bio if their page offset is not contiguous btrfs: update generation of hole file extent item when merging holes btrfs: fix possible memory leak in btrfs_get_dev_args_from_path() btrfs: check if root is readonly while setting security xattr |
||
Linus Torvalds
|
c7bb3fbc1b |
5 cifs/smb3 fixes, three for stable
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmMKi7cACgkQiiy9cAdy T1GV7gv/aQR1eQpk1wOnctwFWWaoRLKiiAtq18fg78JfGqjTyJaclxB+c1TV6PKY WPbQq0TqLgAi46QKTtgDNnxexXzIz7pSBZGsM02PVb4hGfeO7GRcG7PVuPSrRCvE nq/p2BT9FyxqyL0pXox6GLSZ86IYYnjmI1HPseE9WqD33hsNIgdwpLNlh1fWzv04 kFU/TZ1hboOky/plnhjkvyR4QyK9oUX44qj8kuwWN838gSruKI/+wVprIIKMuiFg 2bNwFdwQeM/7mWnCEHOdyhO9Upy0WCVQHjqq3+LkE52xgdUxd0my5wQKdO5LiTv3 AZGJ4XGsTc3aaWPZ/zVMMmt9JDFRe84JmQOdk0Ff2MxKTWmQYo+ZznArRUuYf8cc 6jPK1SefQEqqrtA4h/c+osNfAJJmpL3KxbQ4Hpj55cQZzY32iPuLsW2JyQKY+cUG 7j+8Av5wmM8iwJzH/B2DL3TIhY1xhPAsAsd6BXhF+iDzcCG7XgDx8VUTFPhp8QsD W8re2OMz =tss7 -----END PGP SIGNATURE----- Merge tag '6.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cfis fixes from Steve French: - two locking fixes (zero range, punch hole) - DFS 9 fix (padding), affecting some servers - three minor cleanup changes * tag '6.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Add helper function to check smb1+ server cifs: Use help macro to get the mid header size cifs: Use help macro to get the header preamble size cifs: skip extra NULL byte in filenames smb3: missing inode locks in punch hole smb3: missing inode locks in zero range |
||
Linus Torvalds
|
2f23a7c914 |
Misc fixes:
- Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMLgUERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hU1hAAj9bKO+D07gROpkRhXpLvbqm+mUqk6It8 qEuyLkC/xvD9N//Pya0ZPPE+l23EHQxM/L0tXCYwsc8Zlx3fr678FQktHZuxULaP qlGmwub8PvsyMesXBGtgHJ4RT8L07FelBR+E/TpqCSbv/geM7mcc0ojyOKo+OmbD vbsUTDNqsMYndzePUBrv/vJRqVLGlbd1a22DKE/YiYBbZu5hughNUB0bHYhYdsml mutcRsVTKRbZOi4wiuY/pnveMf/z9wAwKOWd9EBaDWILygVSHkBJJQyhHigBh3F8 H1hFn1q4ybULdrAUT4YND9Tjb6U8laU0fxg8Z43bay7bFXXElqoj3W2qka9NjAim QvWSFvTYQRTIWt6sCshVsUBWj6fBZdHxcJ8Fh+ucJJp+JWl9/aD0A41vK2Jx0LIt p8YzBRKEqd1Q/7QD855BArN7HNtQGgShNt4oPVZ7nPnjuQt0+lngu8NR+6X3RKpX r8rordyPvzgPL6W+1uV+8hnz1w+YU2xplAbK+zTijwgJVgyf8khSlZQNpFKULrou zjtzo/2nB+4C4bvfetNnaOGhi1/AdCHZHyZE35rotpd73SLHvdOrH0Ll9oCVfVrC UWbC1E67cHQw97Ni/4CrCsJRBULK01uyszVCxlEkSYInf0UsKlnmy+TqxizwsCVy reYQ0ePyWg0= =ZpCJ -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed * tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI: Mention retbleed vulnerability info file for sysfs x86/sev: Mark snp_abort() noreturn x86/sev: Don't use cc_platform_has() for early SEV-SNP calls x86/boot: Don't propagate uninitialized boot_params->cc_blob_address x86/cpu: Add new Raptor Lake CPU model number x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry x86/nospec: Fix i386 RSB stuffing x86/nospec: Unwreck the RSB stuffing x86/bugs: Add "unknown" reporting for MMIO Stale Data x86/entry: Fix entry_INT80_compat for Xen PV guests x86/PAT: Have pat_enabled() properly reflect state when running on Xen |
||
Linus Torvalds
|
4459d800f7 |
Misc fixes: an Arch-LBR fix, a PEBS enumeration fix, an Intel DS fix,
PEBS constraints fix on Alder Lake CPUs and an Intel uncore PMU fix. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmMLfEERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iUdxAAqWLRHp1JQlANJxbdwmJu/PMwjlhXLn63 w71UPXou172jEJWk6PxEkllMLfJBAe1hL0CW2VE1DlGFTfzOTwBtylLz8frhF5am 4smCwAppGzK/r6gOABwhgPG/rbU5TJRhjmRMkPmfeOmFZSD/L4DHcDu8HbG4ruz9 lhgKnB+TUmNBLyYQ7oqfnNsGNI5uuyJhzA8/kddPgEkK5XeebCxqZCXDRSp9LbUg 4BkGCB2R+MfiHlttCGzOKkW+dQafA+pUQMfoZHSFJ30lB7UuvpsVl5FTiXQ5cu+f TGkjyBIzkNqNJHrRebQ3kkLYY6rlTgJTvrk7QdnWY2sb1J6B0ktxqBd+DG47Pc9S IVOe66ikoVnV/Bws6mFxN8Kj/U4L38M+373hdUvyQd8kwuvy4c6fXGFZqfF8VCMf zHZQJR8eeOKP1EAOIE7tDb2eY+pWnherZlm3VsHYZLtOfhDepZH6bvRWO2wXbl3I R+Wr/PYZih2AzcvHj+CtpS8jKFAvBG6rbqlElWZ9ain0TrV0uMuI8I3HQ9WqMa5H sPukVqtOczxXSMgTCilHK5S+ymq2xOHdRgo2FZUIP/5SllMfiWpYFRdaS8oh2K/j M6zyc5kXajSeHdSKc/2O8imkcurNN2vgdXVDsIzZGqw6phFepMVmsXYeRD9msQDT aZTvVfOmvvQ= =I/UD -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Ingo Molnar: "Misc fixes: an Arch-LBR fix, a PEBS enumeration fix, an Intel DS fix, PEBS constraints fix on Alder Lake CPUs and an Intel uncore PMU fix" * tag 'perf-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU perf/x86/intel: Fix pebs event constraints for ADL perf/x86/intel/ds: Fix precise store latency handling perf/x86/core: Set pebs_capable and PMU_FL_PEBS_ALL for the Baseline perf/x86/lbr: Enable the branch type for the Arch LBR by default |
||
Linus Torvalds
|
611875d510 |
perf tools fixes for v6.0: 2nd batch
- Fixup setup of weak groups when using 'perf stat --repeat', add a 'perf test' for it. - Fix memory leaks in 'perf sched record' detected with -fsanitize=address. - Fix build when PYTHON_CONFIG is user supplied. - Capitalize topdown metrics' names in 'perf stat', so that the output, sometimes parsed, matches the Intel SDM docs. - Make sure the documentation for the save_type filter about Intel systems with Arch LBR support (12th-Gen+ client or 4th-Gen Xeon+ server) reflects recent related kernel changes. - Fix 'perf record' man page formatting of description of support to hybrid systems. - Update arm64´s KVM header from the kernel sources. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYwo57wAKCRCyPKLppCJ+ J4XKAP9nY5lAQak+i+Oce+Cd7TsNXJ9nubey4+dMkDoK4t4loAD7Bu+eDo0FF/F4 iGsOhtqgGD+jdguqhgemPRPNHFb5bQI= =ETh0 -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.0-2022-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fixup setup of weak groups when using 'perf stat --repeat', add a 'perf test' for it. - Fix memory leaks in 'perf sched record' detected with -fsanitize=address. - Fix build when PYTHON_CONFIG is user supplied. - Capitalize topdown metrics' names in 'perf stat', so that the output, sometimes parsed, matches the Intel SDM docs. - Make sure the documentation for the save_type filter about Intel systems with Arch LBR support (12th-Gen+ client or 4th-Gen Xeon+ server) reflects recent related kernel changes. - Fix 'perf record' man page formatting of description of support to hybrid systems. - Update arm64´s KVM header from the kernel sources. * tag 'perf-tools-fixes-for-v6.0-2022-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf stat: Capitalize topdown metrics' names perf docs: Update the documentation for the save_type filter perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=address perf record: Fix manpage formatting of description of support to hybrid systems perf test: Stat test for repeat with a weak group perf stat: Clear evsel->reset_group for each stat run tools kvm headers arm64: Update KVM header from the kernel sources perf python: Fix build when PYTHON_CONFIG is user supplied |
||
Linus Torvalds
|
10d4879f9e |
Thermal control fixes for 6.0-rc3
- Add missing EXPORT_SYMBOL_GPL in the thermal core and add back the required 'trips' property to the thermal zone DT bindings (Daniel Lezcano). - Prevent the int340x_thermal driver from crashing when a package with a buffer of 0 length is returned by an ACPI control method evaluated by it (Lee, Chun-Yi). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmMKHMYSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxEz8QAJAZ+T5o0tcfJLAq2b9suuUKx7V7NBwu ZAOV7i98NmohUGOnJtomgD+ajPfK09TH6T/ji4seBmRlWm4jgvc1gpIKCJxZ3Lco hX65oc5z03VzDqorgxDpWvVRagNi17QE249+w/tn25SFfXGjspxycciOxxKPXUb+ 34Szs8c8iFEQUrGXvouvVO9J38tWVcZB22cb++Sw4dZurlLM9uyuNP+d4DnC3Gg5 6dNisvVI3BGHz0JjqcC0wmMDZ04BIGXAVu3S/5Bvg2vZ8zJk/625wtVAZmKINyg/ K8LOaJ+cmG90L8+MmAooImBXdymB7uBecMka+m0McBMl+HG0SS0zwzgjD4rHJvIc 4W95CW4GuYOOg+FcCkMymlDRZ7RgJXMysz8s/uE64hzchgf6R1OenM4sG0n29Lao rqCWwAIPjwM6TI7bOhDm/GhpUOuq6bhokjsy5KMQ71IZO9JFookHN1y0VgTOPuTq I2e6gqIAuwPAOCq9ARMX3XNbmMMBc1k0Y0sgo7aH44hNFLPM9ef/hUt/1irSfto1 qJj9jhC/QlBBZ/dICm5FglPgBuDYbmQV7U3nCyIc5n2QvyiAecpCO50VBJN9GpZ5 m2yBr+hQKEEe6Ij09AwuZkvgtC70pz+3gV9c+e3qtu/5iljvSAoegwtiku5ISKkD nVkOKgZ0g7Tc =cpKx -----END PGP SIGNATURE----- Merge tag 'thermal-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "Fix two issues introduced recently and one driver problem leading to a NULL pointer dereference in some cases. Specifics: - Add missing EXPORT_SYMBOL_GPL in the thermal core and add back the required 'trips' property to the thermal zone DT bindings (Daniel Lezcano) - Prevent the int340x_thermal driver from crashing when a package with a buffer of 0 length is returned by an ACPI control method evaluated by it (Lee, Chun-Yi)" * tag 'thermal-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTR dt-bindings: thermal: Fix missing required property thermal/core: Add missing EXPORT_SYMBOL_GPL |
||
Linus Torvalds
|
b98f602df7 |
Power management fix for 6.0-rc3
Make __resolve_freq() check the presence of the frequency table instead of checking whether or not the ->target_index() callback is implemented by the driver, because that need not be the case when __resolve_freq() is used (Lukasz Luba). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmMKHDkSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxf8kQAIph/Z/5/OaTEsS5pGfbIsQpqzHB1VU8 mJKIsoxyVv2fxCbDIzzFp692PayZzMDAdRGFrrnAdYHhezCRWxvDrqe6IIK36zSK PbbfmJE3HMQpaIc4kKOlbADqrGAcAlyuZdhMwhb+TNb6eQtxZGyPtxEaK4ws7SUW CemozX8dCohErqQTxM1Sd/i9mNl47dAF9wXA+89lebnijr71NbqgCAlNb/8mjdUR 8MAMYlxwK76dOXnMOWIsYzVaUwJDP2w6d7YyouAmecDjpEValWH4OfR+hTQoikAJ T/ZfRT8ynd2O3TzwqBgaNR3hkbvUlVoV2WM8QoXR02VaGNPKkbLaKmLTRHhqez4u bzMft2rrETZ71OV5TG/gdfhTEN3vqSNLw3uF+Yi6AC5FgnxGnFo1P3S4ephbGZmh +vSZaR4fGFIJp8gjigPqhCwXphAct8C+hPDEMlE6G6Lq10F/Yd7f7j2dehRU+9DJ iWnjOZkr00+iuaOOblAm8EfUOqSrprqf+SnVSpLNVCms6KhassRLTugg8mBraLw9 pKzKlA6PId7CHgsmkDoUzqJ2MwQ2RHy/QsrEnp2rvgC/NQmrW7tVlfGz6gUYBGtH YiOHKonC+pgNgYRJq9Hc5BuCrXxE8O8hp9cbuDszGW1YNf6mjirsfBPyqMnKEzmB sid512ZflH7E =lrNa -----END PGP SIGNATURE----- Merge tag 'pm-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Make __resolve_freq() check the presence of the frequency table instead of checking whether or not the ->target_index() callback is implemented by the driver, because that need not be the case when __resolve_freq() is used (Lukasz Luba)" * tag 'pm-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: check only freq_table in __resolve_freq() |
||
Linus Torvalds
|
2b1ddb5950 |
ACPI fixes for 6.0-rc3
- Prevent acpi_thermal_cpufreq_exit() from attempting to remove the same frequency QoS request for multiple times (Riwen Lu). - Fix type detection for integer ACPI device properties (Stefan Binding). - Avoid emitting false-positive warnings when processing ACPI device properties and drop the useless default case from the acpi_copy_property_array_uint() macro (Sakari Ailus). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmMKG5YSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxliwP/jByqpCXxpDqiRwt13npBSTHT49YfXIS bLKW8LjeoYLH2PxTRRkSY3i0wY8iFOJCoq1QkQJFZatXCDTxUxwDNceBqJX2yuiS XnkrpDuf2FbrIuuipsisDkeLDE4AE/byorFiYqfmBrMpiBhsfDCKNXPyQZFOEprU QfkRzxQG244vl3nIw2aIiVmdMv4MaenZMNtKqdFUvMeSVjJmrQupNLYfiSAeJznN rRFSR9zpS6xQNfYQt3QsAcdc1s39Iig990+w352H2E1Ev5Hf3M+2z1sASGtaV6K8 TZgzZEYHhOMIfCJTX1Jijkpqf8VK1VWWnZYkzIASUSfAV6UnvO2NBpqmx8ygjt24 +KoAyH8nsDkCTs3XmCNZ5vg0kODzlWylGs5mzh2iM12DKiexdG7LLfbPbhEs+r/d 6s14zj2jWPW3PFZ42Ho1lzwiibJ+qkLnRvY0aYQTCuVrzu5RGfDF9XtdZM6TkdN2 eCg9mB+AobfJoWj9hQLsMlXuBL3eVEgSQXZNVyL+hwGfD8chwDq7cFQp2q70pOoW sesGIfFuVnSBvCXAba8FyCfGsFZdsCeB1iD2HxdlyanHQKDgrUurpBegnxhCgPtb aZroJhTqfK8J7l2Bhvo71ncVZ3AVuoaYRVRrAsddyy+nHulxd/c3YGZHuZznH5uA DlSVjhFPbA/6 =0doF -----END PGP SIGNATURE----- Merge tag 'acpi-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix issues introduced by recent changes related to the handling of ACPI device properties and a coding mistake in the exit path of the ACPI processor driver. Specifics: - Prevent acpi_thermal_cpufreq_exit() from attempting to remove the same frequency QoS request multiple times (Riwen Lu) - Fix type detection for integer ACPI device properties (Stefan Binding) - Avoid emitting false-positive warnings when processing ACPI device properties and drop the useless default case from the acpi_copy_property_array_uint() macro (Sakari Ailus)" * tag 'acpi-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: property: Remove default association from integer maximum values ACPI: property: Ignore already existing data node tags ACPI: property: Fix type detection of unified integer reading functions ACPI: processor: Remove freq Qos request for all CPUs |
||
Linus Torvalds
|
dee1873796 |
s390 updates for 6.0-rc3
- Fix double free of guarded storage and runtime instrumentation control blocks on fork() failure. - Fix triggering write fault when VMA does not allow VM_WRITE. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmMJ6sYACgkQjYWKoQLX FBiH2Af/YzyZO+MgkXEGpMDp7zp7G1H3m7OYfA9onnRruf5PKppWYR/0eQRRYbat C1+9LyIQO2PjMG0RoWroxX7DYG9HR027nt1D+J1xPk82UgGvegDkky3oH3c75wFb oxjjdSI+Rsb4VEo1yIaKfl+lEsxa+rqXB3wvSsrwMFs9OHJYQo4eK62WUAFwcDoq saZ5xu82GhACTETpzCFnVHdGyWdq4BfpQPV1xIQDt8V+8y/yK4/dnB8coGEJL6I1 cBVPUXltCwXeVoeMTVzzD6qUU0TK9nwHmJneSJ/Afj2w2XfAfJlNYpBuSEUSwHS4 pD9DzBKy8YQn9ecDmqnDN7zY5n1fFw== =wHSL -----END PGP SIGNATURE----- Merge tag 's390-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix double free of guarded storage and runtime instrumentation control blocks on fork() failure - Fix triggering write fault when VMA does not allow VM_WRITE * tag 's390-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: do not trigger write fault when vma does not allow VM_WRITE s390: fix double free of GS and RI CBs on fork() failure |
||
Linus Torvalds
|
05519f2480 |
xen: branch for v6.0-rc3
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYwnVogAKCRCAXGG7T9hj vkvCAP97gXbhilg8GkmhSgwsn8XAnR3PFWonR+PA/NBQwsrx4AD/by18f7ICyOCl /r2PHDjYSiyMjwBSjrdm5Uo/kNBaqAM= =cIk8 -----END PGP SIGNATURE----- Merge tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - two minor cleanups - a fix of the xen/privcmd driver avoiding a possible NULL dereference in an error case * tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/privcmd: fix error exit of privcmd_ioctl_dm_op() xen: move from strlcpy with unused retval to strscpy xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY |
||
Linus Torvalds
|
17b28d4267 |
audit/stable-6.0 PR 20220826
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmMJG04UHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOBtBAAkfUY6U8EtMvrPPu6kMyREPdU/9Zh wCBrKjY59fMWOl1RT8zYqyZaCZRvSc/Wd73XLvU2r0pf83N3i6sH7CozVhQyhM8H icNSzFRcZetaaOu2VKvfp5sSHR0ulLlYy26+zud6Syl/F7AJVwID0wsyHLVMuLs0 PVb+oOoOoHzLdAxY6GlwHFHww3NgDPuYTo2v/19AAQ9f9HHHbr8iMwso4kBPA3TX x6tS/0YNKdAKAEtzwBmLQ7d8rFsjuBVActzoIOHjSluH5hg7UrrY4OwSOK1tp0bY r+tnpa4M1bBBqxgNlHY9CHlpveNNzDtiDNjxOA/EsGHyNPrjkna017MEc9kGO7Bn uwu0ytGoLt/IWeWdn3edmlDJtg782JmGI5YS3ihCE6vrqjd1sDh6QUVGMMy29Cm2 dSPp1WY+I7IW9zTD1RzsdqDWdtnuN2XL591VxPW8WyvcU4QS5bBXQmUT+T8Ribkr jsZHiG4GqozF7bzuN38iw+MO2dV7TFvrzTQmqbji/8cDC68QANagdBaqUx8dGZ1w itW6UDZiUeSN8XUNJgDNX2b7jxnVPpEBQ1a0Ncbo6ykfZ4NKKujGE2kv7GMJ2d7x vYP/MxQdw15hQsSlT3vhmCQq6OpchpLUIywIsT3uTYATb5dMHDaWW7RtUg55/yNv xxiKWBMeALHGE9w= =j67g -----END PGP SIGNATURE----- Merge tag 'audit-pr-20220826' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "Another small audit patch, this time to fix a bug where the return codes were not properly set before the audit filters were run, potentially resulting in missed audit records" * tag 'audit-pr-20220826' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: move audit_return_fixup before the filters |
||
Linus Torvalds
|
89b749d855 |
fbdev fixes and updates for kernel 6.0-rc3
Major fixes: - Revert the changes for fbcon console when vc_resize() fails [Shigeru Yoshida] - Avoid a potential divide by zero error in fb_pm2fb [Letu Ren] Minor fixes: - Add missing pci_disable_device() in chipsfb_pci_init() [Yang Yingliang] - Fix tests for platform_get_irq() failure in omapfb [Yu Zhe] - Destroy mutex on freeing struct fb_info in fbsysfs [Shigeru Yoshida] Cleanups: - Move fbdev drivers from strlcpy to strscpy [Wolfram Sang] - Indenting fixes, comment fixes, ... [Jiapeng Chong & Jilin Yuan] -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYwkNcgAKCRD3ErUQojoP X9yzAQCgHdQ185Q9hMFNMbT/mAiZ54RMWjUmHB95TVht41F8dgD/ZNngl1psEixg 3QkoY3dAmK7lytp6XWe7vyXvHa+hNg4= =Zipi -----END PGP SIGNATURE----- Merge tag 'fbdev-for-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev fixes and updates from Helge Deller: "Mostly just small patches, with the exception of the bigger indenting cleanups in the sisfb and radeonfb drivers. Two patches should be mentioned though: A fix-up for fbdev if the screen resize fails (by Shigeru Yoshida), and a potential divide by zero fix in fb_pm2fb (by Letu Ren). Summary: Major fixes: - Revert the changes for fbcon console when vc_resize() fails [Shigeru Yoshida] - Avoid a potential divide by zero error in fb_pm2fb [Letu Ren] Minor fixes: - Add missing pci_disable_device() in chipsfb_pci_init() [Yang Yingliang] - Fix tests for platform_get_irq() failure in omapfb [Yu Zhe] - Destroy mutex on freeing struct fb_info in fbsysfs [Shigeru Yoshida] Cleanups: - Move fbdev drivers from strlcpy to strscpy [Wolfram Sang] - Indenting fixes, comment fixes, ... [Jiapeng Chong & Jilin Yuan]" * tag 'fbdev-for-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: fbcon: Properly revert changes when vc_resize() failed fbdev: Move fbdev drivers from strlcpy to strscpy fbdev: omap: Remove unnecessary print function dev_err() fbdev: chipsfb: Add missing pci_disable_device() in chipsfb_pci_init() fbdev: fbcon: Destroy mutex on freeing struct fb_info fbdev: radeon: Clean up some inconsistent indenting fbdev: sisfb: Clean up some inconsistent indenting fbdev: fb_pm2fb: Avoid potential divide by zero error fbdev: ssd1307fb: Fix repeated words in comments fbdev: omapfb: Fix tests for platform_get_irq() failure |
||
Mikulas Patocka
|
d6ffe6067a |
provide arch_test_bit_acquire for architectures that define test_bit
Some architectures define their own arch_test_bit and they also need
arch_test_bit_acquire, otherwise they won't compile. We also clean up
the code by using the generic test_bit if that is equivalent to the
arch-specific version.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes:
|
||
Zhengjun Xing
|
48648548ef |
perf stat: Capitalize topdown metrics' names
Capitalize topdown metrics' names to follow the intel SDM. Before: # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 228,094.05 msec cpu-clock # 225.026 CPUs utilized 842 context-switches # 3.691 /sec 224 cpu-migrations # 0.982 /sec 70 page-faults # 0.307 /sec 23,164,105 cycles # 0.000 GHz 29,403,446 instructions # 1.27 insn per cycle 5,268,185 branches # 23.097 K/sec 33,239 branch-misses # 0.63% of all branches 136,248,990 slots # 597.337 K/sec 32,976,450 topdown-retiring # 24.2% retiring 4,651,918 topdown-bad-spec # 3.4% bad speculation 26,148,695 topdown-fe-bound # 19.2% frontend bound 72,515,776 topdown-be-bound # 53.2% backend bound 6,008,540 topdown-heavy-ops # 4.4% heavy operations # 19.8% light operations 3,934,049 topdown-br-mispredict # 2.9% branch mispredict # 0.5% machine clears 16,655,439 topdown-fetch-lat # 12.2% fetch latency # 7.0% fetch bandwidth 41,635,972 topdown-mem-bound # 30.5% memory bound # 22.7% Core bound 1.013634593 seconds time elapsed After: # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 228,081.94 msec cpu-clock # 225.003 CPUs utilized 824 context-switches # 3.613 /sec 224 cpu-migrations # 0.982 /sec 67 page-faults # 0.294 /sec 22,647,423 cycles # 0.000 GHz 28,870,551 instructions # 1.27 insn per cycle 5,167,099 branches # 22.655 K/sec 32,383 branch-misses # 0.63% of all branches 133,411,074 slots # 584.926 K/sec 32,352,607 topdown-retiring # 24.3% Retiring 4,456,977 topdown-bad-spec # 3.3% Bad Speculation 25,626,487 topdown-fe-bound # 19.2% Frontend Bound 70,955,316 topdown-be-bound # 53.2% Backend Bound 5,834,844 topdown-heavy-ops # 4.4% Heavy Operations # 19.9% Light Operations 3,738,781 topdown-br-mispredict # 2.8% Branch Mispredict # 0.5% Machine Clears 16,286,803 topdown-fetch-lat # 12.2% Fetch Latency # 7.0% Fetch Bandwidth 40,802,069 topdown-mem-bound # 30.6% Memory Bound # 22.6% Core Bound 1.013683125 seconds time elapsed Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220825015458.3252239-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kan Liang
|
3126204ce3 |
perf docs: Update the documentation for the save_type filter
Update the documentation to reflect the kernel changes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220816125612.2042397-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
d72e5cf3cf |
perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=address
An array of strings is passed to cmd_record but not freed. As cmd_record modifies the array, add another array as a copy that can be mutated allowing the original array contents to all be freed. Detected with -fsanitize=address. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220824145733.409005-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Andi Kleen
|
e89eaa611c |
perf record: Fix manpage formatting of description of support to hybrid systems
The Intel hybrid description is written in a different style than the rest of the perf record man page. There were some new command line options added after it which resulted in very strange section ordering. Move the hybrid include last. Also the sub sections in the hybrid document don't fit the record manpage well (especially since it talks about all kinds of unrelated commands). I left this for now, but would be better to separate this properly in the different man pages. It would be better to use sub sections for the other sections, but these don't seem to be supported in AsciiDoc? Some of the examples are still misrendered in the manpage with an indented troff command, but I don't know how to fix that. In any case it's now better than before. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220818100127.249401-1-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
0c361c6eab |
perf test: Stat test for repeat with a weak group
Breaking a weak group requires multiple passes of an evlist, with multiple runs this can introduce bugs ultimately leading to segfaults. Add a test to cover this. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220822213352.75721-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
bf515f024e |
perf stat: Clear evsel->reset_group for each stat run
If a weak group is broken then the reset_group flag remains set for
the next run. Having reset_group set means the counter isn't created
and ultimately a segfault.
A simple reproduction of this is:
# perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W
which will be added as a test in the next patch.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
dbcfe5ec3f |
tools kvm headers arm64: Update KVM header from the kernel sources
To pick the changes from:
|
||
James Clark
|
bc9e7fe313 |
perf python: Fix build when PYTHON_CONFIG is user supplied
The previous change to Python autodetection had a small mistake where the auto value was used to determine the Python binary, rather than the user supplied value. The Python binary is only used for one part of the build process, rather than the final linking, so it was producing correct builds in most scenarios, especially when the auto detected value matched what the user wanted, or the system only had a valid set of Pythons. Change it so that the Python binary path is derived from either the PYTHON_CONFIG value or PYTHON value, depending on what is specified by the user. This was the original intention. This error was spotted in a build failure an odd cross compilation environment after commit |