linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-23 12:42:02 +00:00

Author	SHA1	Message	Date
Matthew Wilcox (Oracle)	ee0800c2f6	mm: convert page_add_anon_rmap() to use a folio internally The API for page_add_anon_rmap() needs to be page-based, because we can add mappings of individual pages. But inside the function, we want to only call compound_head() once and then use the folio APIs instead of the page APIs that each call compound_head(). Link: https://lkml.kernel.org/r/20230111142915.1001531-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:56 -08:00
Matthew Wilcox (Oracle)	62beb906ef	mm: convert page_remove_rmap() to use a folio internally The API for page_remove_rmap() needs to be page-based, because we can remove mappings of pages individually. But inside the function, we want to only call compound_head() once and then use the folio APIs instead of the page APIs that each call compound_head(). Link: https://lkml.kernel.org/r/20230111142915.1001531-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:55 -08:00
Matthew Wilcox (Oracle)	b14224fbea	mm: convert total_compound_mapcount() to folio_total_mapcount() Instead of enforcing that the argument must be a head page by naming, enforce it with the compiler by making it a folio. Also rename the counter in struct folio from _compound_mapcount to _entire_mapcount. Link: https://lkml.kernel.org/r/20230111142915.1001531-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:55 -08:00
Matthew Wilcox (Oracle)	6eee1a0062	doc: clarify refcount section by referring to folios & pages Include the rename of subpages_mapcount to _nr_pages_mapped. Link: https://lkml.kernel.org/r/20230111142915.1001531-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:55 -08:00
Matthew Wilcox (Oracle)	eec20426d4	mm: convert head_subpages_mapcount() into folio_nr_pages_mapped() Calling this 'mapcount' is confusing since mapcount is usually the number of times something is mapped; instead this is the number of mapped pages. It's also better to enforce that this is a folio rather than a head page. Move folio_nr_pages_mapped() into mm/internal.h since this is not something we want device drivers or filesystems poking at. Get rid of folio_subpages_mapcount_ptr() and use folio->_nr_pages_mapped directly. Link: https://lkml.kernel.org/r/20230111142915.1001531-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:55 -08:00
Matthew Wilcox (Oracle)	94688e8eb4	mm: remove folio_pincount_ptr() and head_compound_pincount() We can use folio->_pincount directly, since all users are guarded by tests of compound/large. Link: https://lkml.kernel.org/r/20230111142915.1001531-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:54 -08:00
Alistair Popple	7d4a8be0c4	mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export mmu_notifier_range_update_to_read_only() was originally introduced in commit `c6d23413f8` ("mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper") as an optimisation for device drivers that know a range has only been mapped read-only. However there are no users of this feature so remove it. As it is the only user of the struct mmu_notifier_range.vma field remove that also. Link: https://lkml.kernel.org/r/20230110025722.600912-1-apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:54 -08:00
Baolin Wang	9e5522715e	mm: compaction: avoid fragmentation score calculation for empty zones There is no need to calculate the fragmentation score for empty zones. Link: https://lkml.kernel.org/r/100331ad9d274a9725e687b00d85d75d7e4a17c7.1673342761.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:54 -08:00
Baolin Wang	8fff8b6f8d	mm: compaction: add missing kcompactd wakeup trace event Add missing kcompactd wakeup trace event for proactive compaction, meanwhile use order = -1 and the highest zone index of the pgdat for the kcompactd wakeup trace event by proactive compaction. Link: https://lkml.kernel.org/r/cbf8097a2d8a1b6800991f2a21575550d3613ce6.1673342761.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:54 -08:00
Baolin Wang	1bfb7684db	mm: compaction: count the migration scanned pages events for proactive compaction The proactive compaction will reuse per-node kcompactd threads, so we should also count the KCOMPACTD_MIGRATE_SCANNED and KCOMPACTD_FREE_SCANNED events for proactive compaction. Link: https://lkml.kernel.org/r/b7f1ece1adc17defa47e3667b5f9fd61f496517a.1673342761.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:53 -08:00
Baolin Wang	753ec50d97	mm: compaction: move list validation into compact_zone() Move the cc.freepages and cc.migratepages list validation into compact_zone() to remove some duplicate code. Link: https://lkml.kernel.org/r/15cf54f7d762e87b04ac3cc74536f7d1ebbcd8cd.1673342761.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:53 -08:00
Baolin Wang	c6835e8d86	mm: compaction: remove redundant VM_BUG_ON() in compact_zone() Patch series "Some small improvements for compaction". When I did some compaction testing, I found some small room for improvement as well as some code cleanups. This patch (of 5): The compaction_suitable() will never return values other than COMPACT_SUCCESS, COMPACT_SKIPPED and COMPACT_CONTINUE, so after validation of COMPACT_SUCCESS and COMPACT_SKIPPED, we will never hit other unexpected case. Thus remove the redundant VM_BUG_ON() validation for the return values of compaction_suitable(). Link: https://lkml.kernel.org/r/cover.1673342761.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/740a2396d9b98154dba76e326cba5e798b640ead.1673342761.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:53 -08:00
Vernon Yang	baabcfc93d	mm/mmap: fix typo in comment Replace "parital" with "partial". Link: https://lkml.kernel.org/r/20230110145353.1658435-1-vernon2gm@gmail.com Signed-off-by: Vernon Yang <vernon2gm@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:53 -08:00
Vernon Yang	c5d5546ea0	maple_tree: remove the parameter entry of mas_preallocate The parameter entry of mas_preallocate is not used, so drop it. Link: https://lkml.kernel.org/r/20230110154211.1758562-1-vernon2gm@gmail.com Signed-off-by: Vernon Yang <vernon2gm@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:52 -08:00
SeongJae Park	75cb348714	selftests/damon/debugfs_rm_non_contexts: hide expected write error messages A selftest case for DAMON debugfs interface has a test for expected failure. To make the test output clean, hide the expected failure error message. Link: https://lkml.kernel.org/r/20230110190400.119388-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:52 -08:00
SeongJae Park	16ddcb1549	selftests/damon/sysfs: hide expected write failures DAMON selftests for sysfs (sysfs.sh) tests if some writes to DAMON sysfs interface files fails as expected. It makes the test results noisy with the failure error message because it tests a number of such failures. Redirect the expected failure error messages to /dev/null to make the results clean. Link: https://lkml.kernel.org/r/20230110190400.119388-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:52 -08:00
SeongJae Park	2d2230efbc	MAINTAINERS/DAMON: link maintainer profile, git trees, and website Add links to below DAMON development related resource to DAMON section in MAINTAINERS file. - The basic policies and expectations of DAMON development, - DAMON development trees, and - DAMON introduction website. Link: https://lkml.kernel.org/r/20230110190400.119388-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:52 -08:00
SeongJae Park	e7366f3a2e	Docs/mm/damon: add a maintainer-profile for DAMON Document the basic policies and expectations for DAMON development. Link: https://lkml.kernel.org/r/20230110190400.119388-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:52 -08:00
SeongJae Park	9a47c411cc	Docs/admin-guide/mm/damon/usage: update DAMOS actions/filters supports of each DAMON operations set Supports of each DAMOS action and filters are up to DAMON operations set implementation, but it's not mentioned in detail on the documentation. Update the information on the usage document. Link: https://lkml.kernel.org/r/20230110190400.119388-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:51 -08:00
SeongJae Park	86834644e3	Docs/mm/damon/index: mention DAMOS on the intro What DAMON aims to do is not only access monitoring but efficient and effective access-aware system operations. And DAMon-based Operation Schemes (DAMOS) is the important feature of DAMON for the goal. Make the intro of DAMON documentation to emphasize the goal and mention DAMOS. Link: https://lkml.kernel.org/r/20230110190400.119388-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:51 -08:00
SeongJae Park	55901e89d2	mm/damon/core: update kernel-doc comments for DAMOS filters supports of each DAMON operations set Supports of each DAMOS filter type are up to DAMON operations set implementation in use, but not well mentioned on the kernel-doc comments. Add the comment. Link: https://lkml.kernel.org/r/20230110190400.119388-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:51 -08:00
SeongJae Park	fb6f026b83	mm/damon/core: update kernel-doc comments for DAMOS action supports of each DAMON operations set Patch series "mm/damon: trivial fixups". This patchset contains patches for trivial fixups of DAMON's documentation, MAINTAINERS section, and selftests. This patch (of 8): Supports of each DAMOS action are up to DAMON operations set implementation in use, but not well mentioned on the kernel-doc comments. Add the comment. Link: https://lkml.kernel.org/r/20230110190400.119388-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230110190400.119388-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:51 -08:00
Fabio M. De Francesco	92b64bd01f	mm/highmem: add notes about conversions from kmap{,_atomic}() kmap() and kmap_atomic() have been deprecated. kmap_local_page() should always be used in new code and the call sites of the two deprecated functions should be converted. This latter task can lead to errors if it is not carried out with the necessary attention to the context around and between the maps and unmaps. Therefore, add further information to the Highmem's documentation for the purpose to make it clearer that (1) kmap() and kmap_atomic() must not any longer be called in new code and (2) developers doing conversions from kmap() amd kmap_atomic() are expected to take care of the context around and between the maps and unmaps, in order to not break the code. Relevant parts of this patch have been taken from messages exchanged privately with Ira Weiny (thanks!). [fmdefrancesco@gmail.com: merge two sentences into one, per Bagas] Link: https://lkml.kernel.org/r/20230119123945.10471-1-fmdefrancesco@gmail.com Link: https://lkml.kernel.org/r/20221207225308.8290-1-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-02-02 22:32:50 -08:00
Andrew Morton	5ab0fc155d	Sync mm-stable with mm-hotfixes-stable to pick up dependent patches Merge branch 'mm-hotfixes-stable' into mm-stable	2023-01-31 17:25:17 -08:00
Kefeng Wang	ac86f547ca	mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath() As commit `18365225f0` ("hwpoison, memcg: forcibly uncharge LRU pages"), hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could occurs a NULL pointer dereference, let's do not record the foreign writebacks for folio memcg is null in mem_cgroup_track_foreign_dirty() to fix it. Link: https://lkml.kernel.org/r/20230129040945.180629-1-wangkefeng.wang@huawei.com Fixes: `97b27821b4` ("writeback, memcg: Implement foreign dirty flushing") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: Ma Wupeng <mawupeng1@huawei.com> Tested-by: Miko Larsson <mikoxyzzz@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Ma Wupeng <mawupeng1@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:10 -08:00
ye xingchen	1e90e35b62	Kconfig.debug: fix the help description in SCHED_DEBUG The correct file path for SCHED_DEBUG is /sys/kernel/debug/sched. Link: https://lkml.kernel.org/r/202301291013573466558@zte.com.cn Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:10 -08:00
Longlong Xia	7717fc1a12	mm/swapfile: add cond_resched() in get_swap_pages() The softlockup still occurs in get_swap_pages() under memory pressure. 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB with same priority as si. Use the stress-ng tool to increase memory pressure, causing the system to oom frequently. The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens of thousands of times to find available space (extreme case: cond_resched() is not called in scan_swap_map_slots()). Let's add cond_resched() into get_swap_pages() when failed to find available space to avoid softlockup. Link: https://lkml.kernel.org/r/20230128094757.1060525-1-xialonglong1@huawei.com Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:10 -08:00
Zhaoyang Huang	993f57e027	mm: use stack_depot_early_init for kmemleak Mirsad report the below error which is caused by stack_depot_init() failure in kvcalloc. Solve this by having stackdepot use stack_depot_early_init(). On 1/4/23 17:08, Mirsad Goran Todorovac wrote: I hate to bring bad news again, but there seems to be a problem with the output of /sys/kernel/debug/kmemleak: [root@pc-mtodorov ~]# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff951c118568b0 (size 16): comm "kworker/u12:2", pid 56, jiffies 4294893952 (age 4356.548s) hex dump (first 16 bytes): 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0....... backtrace: [root@pc-mtodorov ~]# Apparently, backtrace of called functions on the stack is no longer printed with the list of memory leaks. This appeared on Lenovo desktop 10TX000VCR, with AlmaLinux 8.7 and BIOS version M22KT49A (11/10/2022) and 6.2-rc1 and 6.2-rc2 builds. This worked on 6.1 with the same CONFIG_KMEMLEAK=y and MGLRU enabled on a vanilla mainstream kernel from Mr. Torvalds' tree. I don't know if this is deliberate feature for some reason or a bug. Please find attached the config, lshw and kmemleak output. [vbabka@suse.cz: remove stack_depot_init() call] Link: https://lore.kernel.org/all/5272a819-ef74-65ff-be61-4d2d567337de@alu.unizg.hr/ Link: https://lkml.kernel.org/r/1674091345-14799-2-git-send-email-zhaoyang.huang@unisoc.com Fixes: `56a61617dd` ("mm: use stack_depot for recording kmemleak's backtrace") Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: ke.wang <ke.wang@unisoc.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:10 -08:00
Phillip Lougher	f65c4bbbd6	Squashfs: fix handling and sanity checking of xattr_ids count A Sysbot [1] corrupted filesystem exposes two flaws in the handling and sanity checking of the xattr_ids count in the filesystem. Both of these flaws cause computation overflow due to incorrect typing. In the corrupted filesystem the xattr_ids value is 4294967071, which stored in a signed variable becomes the negative number -225. Flaw 1 (64-bit systems only): The signed integer xattr_ids variable causes sign extension. This causes variable overflow in the SQUASHFS_XATTR_(A) macros. The variable is first multiplied by sizeof(struct squashfs_xattr_id) where the type of the sizeof operator is "unsigned long". On a 64-bit system this is 64-bits in size, and causes the negative number to be sign extended and widened to 64-bits and then become unsigned. This produces the very large number 18446744073709548016 or 2^64 - 3600. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0 (stored in len). Flaw 2 (32-bit systems only): On a 32-bit system the integer variable is not widened by the unsigned long type of the sizeof operator (32-bits), and the signedness of the variable has no effect due it always being treated as unsigned. The above corrupted xattr_ids value of 4294967071, when multiplied overflows and produces the number 4294963696 or 2^32 - 3400. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows again and produces a length of 0. The effect of the 0 length computation: In conjunction with the corrupted xattr_ids field, the filesystem also has a corrupted xattr_table_start value, where it matches the end of filesystem value of 850. This causes the following sanity check code to fail because the incorrectly computed len of 0 matches the incorrect size of the table reported by the superblock (0 bytes). len = SQUASHFS_XATTR_BLOCK_BYTES(xattr_ids); indexes = SQUASHFS_XATTR_BLOCKS(xattr_ids); / * The computed size of the index table (len bytes) should exactly * match the table start and end points / start = table_start + sizeof(id_table); end = msblk->bytes_used; if (len != (end - start)) return ERR_PTR(-EINVAL); Changing the xattr_ids variable to be "usigned int" fixes the flaw on a 64-bit system. This relies on the fact the computation is widened by the unsigned long type of the sizeof operator. Casting the variable to u64 in the above macro fixes this flaw on a 32-bit system. It also means 64-bit systems do not implicitly rely on the type of the sizeof operator to widen the computation. [1] https://lore.kernel.org/lkml/000000000000cd44f005f1a0f17f@google.com/ Link: https://lkml.kernel.org/r/20230127061842.10965-1-phillip@squashfs.org.uk Fixes: `506220d2ba` ("squashfs: add more sanity checks in xattr id lookup") Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: <syzbot+082fa4af80a5bb1a9843@syzkaller.appspotmail.com> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Fedor Pchelkin <pchelkin@ispras.ru> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:10 -08:00
Tom Saeger	c1c551bebf	sh: define RUNTIME_DISCARD_EXIT sh vmlinux fails to link with GNU ld < 2.40 (likely < 2.36) since commit `99cb0d917f` ("arch: fix broken BuildID for arm64 and riscv"). This is similar to fixes for powerpc and s390: commit `4b9880dbf3` ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT"). commit `a494398bde` ("s390: define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36"). $ sh4-linux-gnu-ld --version \| head -n1 GNU ld (GNU Binutils for Debian) 2.35.2 $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- microdev_defconfig $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- `.exit.text' referenced in section `__bug_table' of crypto/algboss.o: defined in discarded section `.exit.text' of crypto/algboss.o `.exit.text' referenced in section `__bug_table' of drivers/char/hw_random/core.o: defined in discarded section `.exit.text' of drivers/char/hw_random/core.o make[2]: * [scripts/Makefile.vmlinux:34: vmlinux] Error 1 make[1]: * [Makefile:1252: vmlinux] Error 2 arch/sh/kernel/vmlinux.lds.S keeps EXIT_TEXT: /* * .exit.text is discarded at runtime, not link time, to deal with * references from __bug_table */ .exit.text : AT(ADDR(.exit.text)) { EXIT_TEXT } However, EXIT_TEXT is thrown away by DISCARD(include/asm-generic/vmlinux.lds.h) because sh does not define RUNTIME_DISCARD_EXIT. GNU ld 2.40 does not have this issue and builds fine. This corresponds with Masahiro's comments in `a494398bde`: "Nathan [Chancellor] also found that binutils commit 21401fc7bf67 ("Duplicate output sections in scripts") cured this issue, so we cannot reproduce it with binutils 2.36+, but it is better to not rely on it." Link: https://lkml.kernel.org/r/9166a8abdc0f979e50377e61780a4bba1dfa2f52.1674518464.git.tom.saeger@oracle.com Fixes: `99cb0d917f` ("arch: fix broken BuildID for arm64 and riscv") Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/ Link: https://lore.kernel.org/all/20230123194218.47ssfzhrpnv3xfez@oracle.com/ Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dennis Gilmore <dennis@ausil.us> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:09 -08:00
Matthew Wilcox (Oracle)	88d7b12068	highmem: round down the address passed to kunmap_flush_on_unmap() We already round down the address in kunmap_local_indexed() which is the other implementation of __kunmap_local(). The only implementation of kunmap_flush_on_unmap() is PA-RISC which is expecting a page-aligned address. This may be causing PA-RISC to be flushing the wrong addresses currently. Link: https://lkml.kernel.org/r/20230126200727.1680362-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Fixes: `298fa1ad55` ("highmem: Provide generic variant of kmap_atomic*") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Helge Deller <deller@gmx.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: David Sterba <dsterba@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:09 -08:00
Mike Kravetz	73bdf65ea7	migrate: hugetlb: check for hugetlb shared PMD in node migration migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to move pages shared with another process to a different node. page_mapcount > 1 is being used to determine if a hugetlb page is shared. However, a hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. As a result, hugetlb pages shared by multiple processes and mapped with a shared PMD can be moved by a process without CAP_SYS_NICE. To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found consider the page shared. Link: https://lkml.kernel.org/r/20230126222721.222195-3-mike.kravetz@oracle.com Fixes: `e2d8cf4055` ("migrate: add hugepage migration code to migrate_pages()") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:09 -08:00
Mike Kravetz	3489dbb696	mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs". This issue of mapcount in hugetlb pages referenced by shared PMDs was discussed in [1]. The following two patches address user visible behavior caused by this issue. [1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/ This patch (of 2): A hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. This is because only the first process increases the map count, and subsequent processes just add the shared PMD page to their page table. page_mapcount is being used to decide if a hugetlb page is shared or private in /proc/PID/smaps. Pages referenced via a shared PMD were incorrectly being counted as private. To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found count the hugetlb page as shared. A new helper to check for a shared PMD is added. [akpm@linux-foundation.org: simplification, per David] [akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()] Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com Fixes: `25ee01a2fc` ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:09 -08:00
Zach O'Keefe	edb5d0cf55	mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups In commit `34488399fa` ("mm/madvise: add file and shmem support to MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none(): - if (!pmd_present(pmde)) - return SCAN_PMD_NULL; + if (pmd_none(pmde)) + return SCAN_PMD_NONE; This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE might identify a pte-mapped hugepage, only to have khugepaged race-in, free the pte table, and clear the pmd. Such codepaths include: A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER already in the pagecache. B) In retract_page_tables(), if we fail to grab mmap_lock for the target mm/address. In these cases, collapse_pte_mapped_thp() really does expect a none (not just !present) pmd, and we want to suitably identify that case separate from the case where no pmd is found, or it's a bad-pmd (of course, many things could happen once we drop mmap_lock, and the pmd could plausibly undergo multiple transitions due to intervening fault, split, etc). Regardless, the code is prepared install a huge-pmd only when the existing pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd. However, the commit introduces a logical hole; namely, that we've allowed !none- && !huge- && !bad-pmds to be classified as genuine pte-table-mapping-pmds. One such example that could leak through are swap entries. The pmd values aren't checked again before use in pte_offset_map_lock(), which is expecting nothing less than a genuine pte-table-mapping-pmd. We want to put back the !pmd_present() check (below the pmd_none() check), but need to be careful to deal with subtleties in pmd transitions and treatments by various arch. The issue is that __split_huge_pmd_locked() temporarily clears the present bit (or otherwise marks the entry as invalid), but pmd_present() and pmd_trans_huge() still need to return true while the pmd is in this transitory state. For example, x86's pmd_present() also checks the _PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also checks a PMD_PRESENT_INVALID bit. Covering all 4 cases for x86 (all checks done on the same pmd value): 1) pmd_present() && pmd_trans_huge() All we actually know here is that the PSE bit is set. Either: a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE is set. => huge-pmd b) We are currently racing with __split_huge_page(). The danger here is that we proceed as-if we have a huge-pmd, but really we are looking at a pte-mapping-pmd. So, what is the risk of this danger? The only relevant path is: madvise_collapse() -> collapse_pte_mapped_thp() Where we might just incorrectly report back "success", when really the memory isn't pmd-backed. This is fine, since split could happen immediately after (actually) successful madvise_collapse(). So, it should be safe to just assume huge-pmd here. 2) pmd_present() && !pmd_trans_huge() Either: a) PSE not set and either PRESENT or PROTNONE is. => pte-table-mapping pmd (or PROT_NONE) b) devmap. This routine can be called immediately after unlocking/locking mmap_lock -- or called with no locks held (see khugepaged_scan_mm_slot()), so previous VMA checks have since been invalidated. 3) !pmd_present() && pmd_trans_huge() Not possible. 4) !pmd_present() && !pmd_trans_huge() Neither PRESENT nor PROTNONE set => not present I've checked all archs that implement pmd_trans_huge() (arm64, riscv, powerpc, longarch, x86, mips, s390) and this logic roughly translates (though devmap treatment is unique to x86 and powerpc, and (3) doesn't necessarily hold in general -- but that doesn't matter since !pmd_present() always takes failure path). Also, add a comment above find_pmd_or_thp_or_none() to help future travelers reason about the validity of the code; namely, the possible mutations that might happen out from under us, depending on how mmap_lock is held (if at all). Link: https://lkml.kernel.org/r/20230125225358.2576151-1-zokeefe@google.com Fixes: `34488399fa` ("mm/madvise: add file and shmem support to MADV_COLLAPSE") Signed-off-by: Zach O'Keefe <zokeefe@google.com> Reported-by: Hugh Dickins <hughd@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:09 -08:00
Isaac J. Manjarres	8ef852f1cb	Revert "mm: kmemleak: alloc gray object for reserved region with direct map" This reverts commit `972fa3a7c1`. Kmemleak operates by periodically scanning memory regions for pointers to allocated memory blocks to determine if they are leaked or not. However, reserved memory regions can be used for DMA transactions between a device and a CPU, and thus, wouldn't contain pointers to allocated memory blocks, making them inappropriate for kmemleak to scan. Thus, revert this commit. Link: https://lkml.kernel.org/r/20230124230254.295589-1-isaacmanjarres@google.com Fixes: `972fa3a7c1` ("mm: kmemleak: alloc gray object for reserved region with direct map") Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Calvin Zhang <calvinzhang.cool@gmail.com> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: <stable@vger.kernel.org> [5.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:09 -08:00
Randy Dunlap	0d7866eace	freevxfs: Kconfig: fix spelling Fix a spello in freevxfs Kconfig. (reported by codespell) Link: https://lkml.kernel.org/r/20230124181638.15604-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:08 -08:00
Wei Yang	ab6ef70a8b	maple_tree: should get pivots boundary by type We should get pivots boundary by type. Fixes a potential overindexing of mt_pivots[]. Link: https://lkml.kernel.org/r/20221112234308.23823-1-richard.weiyang@gmail.com Fixes: `54a611b605` ("Maple Tree: add new data structure") Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:08 -08:00
Eugen Hristev	889a904fe3	.mailmap: update e-mail address for Eugen Hristev Update e-mail address. Link: https://lkml.kernel.org/r/20230119072229.99603-1-eugen.hristev@collabora.com Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:08 -08:00
Vlastimil Babka	d014cd7c1c	mm, mremap: fix mremap() expanding for vma's with vm_ops->close() Fabian has reported another regression in 6.1 due to `ca3d76b0aa` ("mm: add merging after mremap resize"). The problem is that vma_merge() can fail when vma has a vm_ops->close() method, causing is_mergeable_vma() test to be negative. This was happening for vma mapping a file from fuse-overlayfs, which does have the method. But when we are simply expanding the vma, we never remove it due to the "merge" with the added area, so the test should not prevent the expansion. As a quick fix, check for such vmas and expand them using vma_adjust() directly as was done before commit `ca3d76b0aa`. For a more robust long term solution we should try to limit the check for vma_ops->close only to cases that actually result in vma removal, so that no merge would be prevented unnecessarily. [akpm@linux-foundation.org: fix indenting whitespace, reflow comment] Link: https://lkml.kernel.org/r/20230117101939.9753-1-vbabka@suse.cz Fixes: `ca3d76b0aa` ("mm: add merging after mremap resize") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Fabian Vogt <fvogt@suse.com> Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359#c35 Tested-by: Fabian Vogt <fvogt@suse.com> Cc: Jakub Matěna <matenajakub@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:08 -08:00
Fedor Pchelkin	72e544b1b2	squashfs: harden sanity check in squashfs_read_xattr_id_table While mounting a corrupted filesystem, a signed integer '*xattr_ids' can become less than zero. This leads to the incorrect computation of 'len' and 'indexes' values which can cause null-ptr-deref in copy_bio_to_actor() or out-of-bounds accesses in the next sanity checks inside squashfs_read_xattr_id_table(). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Link: https://lkml.kernel.org/r/20230117105226.329303-2-pchelkin@ispras.ru Fixes: `506220d2ba` ("squashfs: add more sanity checks in xattr id lookup") Reported-by: <syzbot+082fa4af80a5bb1a9843@syzkaller.appspotmail.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:08 -08:00
James Morse	6f28a26134	ia64: fix build error due to switch case label appearing next to declaration Since commit `aa06a9bd85` ("ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency"), gcc 10.1.0 fails to build ia64 with the gnomic: \| ../arch/ia64/kernel/sys_ia64.c: In function 'ia64_clock_getres': \| ../arch/ia64/kernel/sys_ia64.c:189:3: error: a label can only be part of a statement and a declaration is not a statement \| 189 \| s64 tick_ns = DIV_ROUND_UP(NSEC_PER_SEC, local_cpu_data->itc_freq); This line appears immediately after a case label in a switch. Move the declarations out of the case, to the top of the function. Link: https://lkml.kernel.org/r/20230117151632.393836-1-james.morse@arm.com Fixes: `aa06a9bd85` ("ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency") Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Sergei Trofimovich <slyich@gmail.com> Cc: Émeric Maschino <emeric.maschino@gmail.com> Cc: matoro <matoro_mailinglist_kernel@matoro.tk> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:08 -08:00
Yu Zhao	de08eaa615	mm: multi-gen LRU: fix crash during cgroup migration lru_gen_migrate_mm() assumes lru_gen_add_mm() runs prior to itself. This isn't true for the following scenario: CPU 1 CPU 2 clone() cgroup_can_fork() cgroup_procs_write() cgroup_post_fork() task_lock() lru_gen_migrate_mm() task_unlock() task_lock() lru_gen_add_mm() task_unlock() And when the above happens, kernel crashes because of linked list corruption (mm_struct->lru_gen.list). Link: https://lore.kernel.org/r/20230115134651.30028-1-msizanoen@qtmlabs.xyz/ Link: https://lkml.kernel.org/r/20230116034405.2960276-1-yuzhao@google.com Fixes: `bd74fdaea1` ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: msizanoen <msizanoen@qtmlabs.xyz> Tested-by: msizanoen <msizanoen@qtmlabs.xyz> Cc: <stable@vger.kernel.org> [6.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:08 -08:00
Michal Hocko	55ab834a86	Revert "mm: add nodes= arg to memory.reclaim" This reverts commit `12a5d39552`. Although it is recognized that a finer grained pro-active reclaim is something we need and want the semantic of this implementation is really ambiguous. In a follow up discussion it became clear that there are two essential usecases here. One is to use memory.reclaim to pro-actively reclaim memory and expectation is that the requested and reported amount of memory is uncharged from the memcg. Another usecase focuses on pro-active demotion when the memory is merely shuffled around to demotion targets while the overall charged memory stays unchanged. The current implementation considers demoted pages as reclaimed and that break both usecases. [1] has tried to address the reporting part but there are more issues with that summarized in [2] and follow up emails. Let's revert the nodemask based extension of the memcg pro-active reclaim for now until we settle with a more robust semantic. [1] http://lkml.kernel.org/r/http://lkml.kernel.org/r/20221206023406.3182800-1-almasrymina@google.com [2] http://lkml.kernel.org/r/Y5bsmpCyeryu3Zz1@dhcp22.suse.cz Link: https://lkml.kernel.org/r/Y5xASNe1x8cusiTx@dhcp22.suse.cz Fixes: `12a5d39552` ("mm: add nodes= arg to memory.reclaim") Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mina Almasry <almasrymina@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: zefan li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:07 -08:00
Nhat Pham	85b325815b	zsmalloc: fix a race with deferred_handles storing Currently, there is a race between zs_free() and zs_reclaim_page(): zs_reclaim_page() finds a handle to an allocated object, but before the eviction happens, an independent zs_free() call to the same handle could come in and overwrite the object value stored at the handle with the last deferred handle. When zs_reclaim_page() finally gets to call the eviction handler, it will see an invalid object value (i.e the previous deferred handle instead of the original object value). This race happens quite infrequently. We only managed to produce it with out-of-tree developmental code that triggers zsmalloc writeback with a much higher frequency than usual. This patch fixes this race by storing the deferred handle in the object header instead. We differentiate the deferred handle from the other two cases (handle for allocated object, and linkage for free object) with a new tag. If zspage reclamation succeeds, we will free these deferred handles by walking through the zspage objects. On the other hand, if zspage reclamation fails, we reconstruct the zspage freelist (with the deferred handle tag and allocated tag) before trying again with the reclamation. [arnd@arndb.de: avoid unused-function warning] Link: https://lkml.kernel.org/r/20230117170507.2651972-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20230110231701.326724-1-nphamcs@gmail.com Fixes: `9997bc0175` ("zsmalloc: implement writeback mechanism for zsmalloc") Signed-off-by: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:07 -08:00
Jann Horn	023f47a825	mm/khugepaged: fix ->anon_vma race If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires it to be locked. Page table traversal is allowed under any one of the mmap lock, the anon_vma lock (if the VMA is associated with an anon_vma), and the mapping lock (if the VMA is associated with a mapping); and so to be able to remove page tables, we must hold all three of them. retract_page_tables() bails out if an ->anon_vma is attached, but does this check before holding the mmap lock (as the comment above the check explains). If we racily merged an existing ->anon_vma (shared with a child process) from a neighboring VMA, subsequent rmap traversals on pages belonging to the child will be able to see the page tables that we are concurrently removing while assuming that nothing else can access them. Repeat the ->anon_vma check once we hold the mmap lock to ensure that there really is no concurrent page table access. Hitting this bug causes a lockdep warning in collapse_and_free_pmd(), in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)". It can also lead to use-after-free access. Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/ Link: https://lkml.kernel.org/r/20230111133351.807024-1-jannh@google.com Fixes: `f3f0e1d215` ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Jann Horn <jannh@google.com> Reported-by: Zach O'Keefe <zokeefe@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@intel.linux.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:07 -08:00
Liam Howlett	7327e8111a	maple_tree: fix mas_empty_area_rev() lower bound validation mas_empty_area_rev() was not correctly validating the start of a gap against the lower limit. This could lead to the range starting lower than the requested minimum. Fix the issue by better validating a gap once one is found. This commit also adds tests to the maple tree test suite for this issue and tests the mas_empty_area() function for similar bound checking. Link: https://lkml.kernel.org/r/20230111200136.1851322-1-Liam.Howlett@oracle.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216911 Fixes: `54a611b605` ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: <amanieu@gmail.com> Link: https://lore.kernel.org/linux-mm/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/ Tested-by: Holger Hoffsttte <holger@applied-asynchrony.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-31 16:44:07 -08:00
Pengfei Xu	24b5308cf5	selftests/filesystems: grant executable permission to run_fat_tests.sh When use tools/testing/selftests/kselftest_install.sh to make the kselftest-list.txt under tools/testing/selftests/kselftest_install. Then use tools/testing/selftests/kselftest_install/run_kselftest.sh to run all the kselftests in kselftest-list.txt, it will be blocked by case "filesystems/fat: run_fat_tests.sh" with "Warning: file run_fat_tests.sh is not executable", so grant executable permission to run_fat_tests.sh to fix this issue. Link: https://lkml.kernel.org/r/dfdbba6df8a1ab34bb1e81cd8bd7ca3f9ed5c369.1673424747.git.pengfei.xu@intel.com Fixes: `dd7c9be330` ("selftests/filesystems: add a vfat RENAME_EXCHANGE test") Signed-off-by: Pengfei Xu <pengfei.xu@intel.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-19 17:27:25 -08:00
Peter Xu	0ca2c535f5	selftests/vm: remove __USE_GNU in hugetlb-madvise.c __USE_GNU should be an internal macro only used inside glibc. Either memfd_create() or fallocate() requires _GNU_SOURCE per man page, where __USE_GNU will further be defined by glibc headers include/features.h: #ifdef _GNU_SOURCE # define __USE_GNU 1 #endif This fixes: >> hugetlb-madvise.c:20: warning: "__USE_GNU" redefined 20 \| #define __USE_GNU \| In file included from /usr/include/x86_64-linux-gnu/bits/libc-header-start.h:33, from /usr/include/stdlib.h:26, from hugetlb-madvise.c:16: /usr/include/features.h:407: note: this is the location of the previous definition 407 \| # define __USE_GNU 1 \| Link: https://lkml.kernel.org/r/Y8V9z+z6Tk7NetI3@x1n Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:20:52 -08:00
Björn Töpel	9a3f21fe5c	selftests: vm: enable cross-compilation Selftests vm builds break when doing cross-compilation. The Makefile MACHINE variable incorrectly picks upp the host machine architecture. If the CROSS_COMPILE variable is set, dig out the target host architecture from CROSS_COMPILE, instead of calling uname. Link: https://lkml.kernel.org/r/20230109114251.3349638-1-bjorn@kernel.org Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:59 -08:00
Alexander Pantyukhin	d526643f15	tools:cgroup:memcg_shrinker remove redundant import Remove redundant import of the sys module. Also use the sort function instead of sorted. It sorts the direct array without create the new one in memory. Link: https://lkml.kernel.org/r/20230108105023.4289-1-apantykhin@gmail.com Signed-off-by: Alexander Pantyukhin <apantykhin@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:59 -08:00

1 2 3 4 5 ...

1154435 Commits