Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390".

 - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" and "mm: convert mm counter to take a folio".

 - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree".

 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list", which provides measurable runtime benefits in some swap-intensive situations.

 - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest.

 - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()".

 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory.

 - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that.

 - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable", "selftests/damon: add more tests for core functionalities and corner cases", "Docs/mm/damon: misc readability improvements", and "mm/damon: let DAMOS feeds and tame/auto-tune itself".

 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension", Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL.

 - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".

 - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests".

 - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process our selftesting results.

 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios.

 - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice.

 - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work.

 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code.

 - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims.

 - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring".

 - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed.

 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on architectures which have virtually aliasing data caches. The arm architecture is the main beneficiary.

 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations.

 - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" and "page_owner: Fixup and cleanup".

 - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark.

 - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items".

 - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" and "mm/zsmalloc: some cleanup for get/set_zspage_mapping()".

 - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This is a step along the path to implementation of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction".

 - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator".

 - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock".

 - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios".

 - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup.

 - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing".

 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages.

 - Matthew Wilcox's series "PageFlags cleanups" does that.

 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected.

 - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()".

 - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests".

 - Also, of course, many singleton patches to many things. Please see the individual changelogs for details.

* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  memtest: use {READ,WRITE}_ONCE in memory scanning
  mm: prohibit the last subpage from reusing the entire large folio
  mm: recover pud_leaf() definitions in nopmd case
  selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
  selftests/mm: skip uffd hugetlb tests with insufficient hugepages
  selftests/mm: dont fail testsuite due to a lack of hugepages
  mm/huge_memory: skip invalid debugfs new_order input for folio split
  mm/huge_memory: check new folio order when split a folio
  mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
  mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
  mm: fix list corruption in put_pages_list
  mm: remove folio from deferred split list before uncharging it
  filemap: avoid unnecessary major faults in filemap_fault()
  mm,page_owner: drop unnecessary check
  mm,page_owner: check for null stack_record before bumping its refcount
  mm: swap: fix race between free_swap_and_cache() and swapoff()
  mm/treewide: align up pXd_leaf() retval across archs
  mm/treewide: drop pXd_large()
  ...
Commit: 902861e34c

Documentation/ABI/testing/sysfs-bus-dax (new file, 153 lines)
@@ -0,0 +1,153 @@
What:		/sys/bus/dax/devices/daxX.Y/align
Date:		October, 2020
KernelVersion:	v5.10
Contact:	nvdimm@lists.linux.dev
Description:
		(RW) Provides a way to specify an alignment for a dax device.
		Values allowed are constrained by the physical address ranges
		that back the dax device, and also by arch requirements.

What:		/sys/bus/dax/devices/daxX.Y/mapping
Date:		October, 2020
KernelVersion:	v5.10
Contact:	nvdimm@lists.linux.dev
Description:
		(WO) Provides a way to allocate a mapping range under a dax
		device. Specified in the format <start>-<end>.

What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
What:		/sys/bus/dax/devices/daxX.Y/mapping[0..N]/page_offset
Date:		October, 2020
KernelVersion:	v5.10
Contact:	nvdimm@lists.linux.dev
Description:
		(RO) A dax device may have multiple constituent discontiguous
		address ranges. These are represented by the different
		'mappingX' subdirectories. The 'start' attribute indicates the
		start physical address for the given range. The 'end' attribute
		indicates the end physical address for the given range. The
		'page_offset' attribute indicates the offset of the current
		range in the dax device.

What:		/sys/bus/dax/devices/daxX.Y/resource
Date:		June, 2019
KernelVersion:	v5.3
Contact:	nvdimm@lists.linux.dev
Description:
		(RO) The resource attribute indicates the starting physical
		address of a dax device. In case of a device with multiple
		constituent ranges, it indicates the starting address of the
		first range.

What:		/sys/bus/dax/devices/daxX.Y/size
Date:		October, 2020
KernelVersion:	v5.10
Contact:	nvdimm@lists.linux.dev
Description:
		(RW) The size attribute indicates the total size of a dax
		device. For creating subdivided dax devices, or for resizing
		an existing device, the new size can be written to this as
		part of the reconfiguration process.

What:		/sys/bus/dax/devices/daxX.Y/numa_node
Date:		November, 2019
KernelVersion:	v5.5
Contact:	nvdimm@lists.linux.dev
Description:
		(RO) If NUMA is enabled and the platform has affinitized the
		backing device for this dax device, emit the CPU node
		affinity for this device.

What:		/sys/bus/dax/devices/daxX.Y/target_node
Date:		February, 2019
KernelVersion:	v5.1
Contact:	nvdimm@lists.linux.dev
Description:
		(RO) The target-node attribute is the Linux numa-node that a
		device-dax instance may create when it is online. Prior to
		being online the device's 'numa_node' property reflects the
		closest online cpu node which is the typical expectation of a
		device 'numa_node'. Once it is online it becomes its own
		distinct numa node.

What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/available_size
Date:		October, 2020
KernelVersion:	v5.10
Contact:	nvdimm@lists.linux.dev
Description:
		(RO) The available_size attribute tracks available dax region
		capacity. This only applies to volatile hmem devices, not pmem
		devices, since pmem devices are defined by nvdimm namespace
		boundaries.

What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/size
Date:		July, 2017
KernelVersion:	v5.1
Contact:	nvdimm@lists.linux.dev
Description:
		(RO) The size attribute indicates the size of a given dax region
		in bytes.

What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/align
Date:		October, 2020
KernelVersion:	v5.10
Contact:	nvdimm@lists.linux.dev
Description:
		(RO) The align attribute indicates alignment of the dax region.
		Changes on align may not always be valid, when say certain
		mappings were created with 2M and then we switch to 1G. This
		validates all ranges against the new value being attempted, post
		resizing.

What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/seed
Date:		October, 2020
KernelVersion:	v5.10
Contact:	nvdimm@lists.linux.dev
Description:
		(RO) The seed device is a concept for dynamic dax regions to be
		able to split the region amongst multiple sub-instances. The
		seed device, similar to libnvdimm seed devices, is a device
		that starts with zero capacity allocated and unbound to a
		driver.

What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/create
Date:		October, 2020
KernelVersion:	v5.10
Contact:	nvdimm@lists.linux.dev
Description:
		(RW) The create interface to the dax region provides a way to
		create a new unconfigured dax device under the given region, which
		can then be configured (with a size etc.) and then probed.

What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/delete
Date:		October, 2020
KernelVersion:	v5.10
Contact:	nvdimm@lists.linux.dev
Description:
		(WO) The delete interface for a dax region provides for deletion
		of any 0-sized and idle dax devices.

What:		$(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/id
Date:		July, 2017
KernelVersion:	v5.1
Contact:	nvdimm@lists.linux.dev
Description:
		(RO) The id attribute indicates the region id of a dax region.

What:		/sys/bus/dax/devices/daxX.Y/memmap_on_memory
Date:		January, 2024
KernelVersion:	v6.8
Contact:	nvdimm@lists.linux.dev
Description:
		(RW) Control the memmap_on_memory setting if the dax device
		were to be hotplugged as system memory. This determines whether
		the 'altmap' for the hotplugged memory will be placed on the
		device being hotplugged (memmap_on_memory=1) or if it will be
		placed on regular memory (memmap_on_memory=0). This attribute
		must be set before the device is handed over to the 'kmem'
		driver (i.e. hotplugged into system-ram). Additionally, this
		depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled
		memmap_on_memory parameter for memory_hotplug. This is
		typically set on the kernel command line -
		memory_hotplug.memmap_on_memory set to 'true' or 'force'.
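As a quick, hypothetical illustration of the new attribute above (the device name dax0.0 is made up, and the surrounding reconfiguration steps are omitted), memmap_on_memory is a plain 0/1 sysfs file that has to be set before the device is handed over to the 'kmem' driver::

    # echo 1 > /sys/bus/dax/devices/dax0.0/memmap_on_memory
    # cat /sys/bus/dax/devices/dax0.0/memmap_on_memory
    1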
@@ -23,3 +23,9 @@ Date: Feb 2021
 Contact:	Minchan Kim <minchan@kernel.org>
 Description:
		the number of pages CMA API failed to allocate
+
+What:		/sys/kernel/mm/cma/<cma-heap-name>/release_pages_success
+Date:		Feb 2024
+Contact:	Anshuman Khandual <anshuman.khandual@arm.com>
+Description:
+		the number of pages CMA API succeeded to release
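For illustration only (the heap name placeholder matches the ABI path above), the new counter can simply be read::

    # cat /sys/kernel/mm/cma/<cma-heap-name>/release_pages_success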
@@ -34,7 +34,9 @@ Description: Writing 'on' or 'off' to this file makes the kdamond starts or
 		kdamond.  Writing 'update_schemes_tried_bytes' to the file
 		updates only '.../tried_regions/total_bytes' files of this
 		kdamond.  Writing 'clear_schemes_tried_regions' to the file
-		removes contents of the 'tried_regions' directory.
+		removes contents of the 'tried_regions' directory.  Writing
+		'update_schemes_effective_quotas' to the file updates
+		'.../quotas/effective_bytes' files of this kdamond.

 What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/pid
 Date:		Mar 2022
@@ -208,6 +210,12 @@ Contact: SeongJae Park <sj@kernel.org>
 Description:	Writing to and reading from this file sets and gets the size
		quota of the scheme in bytes.

+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/effective_bytes
+Date:		Feb 2024
+Contact:	SeongJae Park <sj@kernel.org>
+Description:	Reading from this file gets the effective size quota of the
+		scheme in bytes, which is adjusted for the time quota and goals.
+
 What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/reset_interval_ms
 Date:		Mar 2022
 Contact:	SeongJae Park <sj@kernel.org>
@@ -221,6 +229,12 @@ Description: Writing a number 'N' to this file creates the number of
		directories for setting automatic tuning of the scheme's
		aggressiveness named '0' to 'N-1' under the goals/ directory.

+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/target_metric
+Date:		Feb 2024
+Contact:	SeongJae Park <sj@kernel.org>
+Description:	Writing to and reading from this file sets and gets the quota
+		auto-tuning goal metric.
+
 What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/target_value
 Date:		Nov 2023
 Contact:	SeongJae Park <sj@kernel.org>
Documentation/ABI/testing/sysfs-kernel-mm-mempolicy (new file, 4 lines)
@@ -0,0 +1,4 @@
What:		/sys/kernel/mm/mempolicy/
Date:		January 2024
Contact:	Linux memory management mailing list <linux-mm@kvack.org>
Description:	Interface for Mempolicy
@@ -0,0 +1,25 @@
What:		/sys/kernel/mm/mempolicy/weighted_interleave/
Date:		January 2024
Contact:	Linux memory management mailing list <linux-mm@kvack.org>
Description:	Configuration Interface for the Weighted Interleave policy

What:		/sys/kernel/mm/mempolicy/weighted_interleave/nodeN
Date:		January 2024
Contact:	Linux memory management mailing list <linux-mm@kvack.org>
Description:	Weight configuration interface for nodeN

		The interleave weight for a memory node (N). These weights are
		utilized by tasks which have set their mempolicy to
		MPOL_WEIGHTED_INTERLEAVE.

		These weights only affect new allocations, and changes at runtime
		will not cause migrations on already allocated pages.

		The minimum weight for a node is always 1.

		Minimum weight: 1
		Maximum weight: 255

		Writing an empty string or `0` will reset the weight to the
		system default. The system default may be set by the kernel
		or drivers at boot or during hotplug events.
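A minimal, hypothetical sketch of configuring these weights (the node numbers and the 5:2 ratio are illustrative, not taken from the commit)::

    # cd /sys/kernel/mm/mempolicy/weighted_interleave
    # echo 5 > node0
    # echo 2 > node1
    # echo "" > node0    # reset node0 back to the system default weight

Tasks that subsequently set their mempolicy to MPOL_WEIGHTED_INTERLEAVE would then allocate roughly five pages on node0 for every two on node1, per the description above.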
@@ -65,11 +65,11 @@ Defines the beginning of the text section. In general, _stext indicates
 the kernel start address. Used to convert a virtual address from the
 direct kernel map to a physical address.

-vmap_area_list
---------------
+VMALLOC_START
+-------------

-Stores the virtual area list. makedumpfile gets the vmalloc start value
-from this variable and its value is necessary for vmalloc translation.
+Stores the base address of vmalloc area. makedumpfile gets this value
+since it is necessary for vmalloc translation.

 mem_map
 -------
@@ -117,6 +117,33 @@ milliseconds.

 1 second by default.

+quota_mem_pressure_us
+---------------------
+
+Desired level of memory pressure-stall time in microseconds.
+
+While keeping the caps that are set by other quotas, DAMON_RECLAIM automatically
+increases and decreases the effective level of the quota, aiming for this level
+of memory pressure to be incurred.  System-wide ``some`` memory PSI in microseconds
+per quota reset interval (``quota_reset_interval_ms``) is collected and
+compared to this value to see if the aim is satisfied.  Value zero means
+disabling this auto-tuning feature.
+
+Disabled by default.
+
+quota_autotune_feedback
+-----------------------
+
+User-specifiable feedback for auto-tuning of the effective quota.
+
+While keeping the caps that are set by other quotas, DAMON_RECLAIM automatically
+increases and decreases the effective level of the quota, aiming to receive this
+feedback of value ``10,000`` from the user.  DAMON_RECLAIM assumes the feedback
+value and the quota are positively proportional.  Value zero means disabling
+this auto-tuning feature.
+
+Disabled by default.
+
 wmarks_interval
 ---------------
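Assuming DAMON_RECLAIM exposes these knobs through the usual module parameter interface under /sys/module/damon_reclaim/parameters/ (the 3000 microsecond target is an arbitrary example), usage could look like this sketch::

    # cd /sys/module/damon_reclaim/parameters
    # echo 3000 > quota_mem_pressure_us    # aim for ~3ms of 'some' memory PSI per reset interval
    # echo Y > enabled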
@@ -83,10 +83,10 @@ comma (",").
 │ │ │ │ │ │ │ │ sz/min,max
 │ │ │ │ │ │ │ │ nr_accesses/min,max
 │ │ │ │ │ │ │ │ age/min,max
-│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms
+│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms,effective_bytes
 │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
 │ │ │ │ │ │ │ │ :ref:`goals <sysfs_schemes_quota_goals>`/nr_goals
-│ │ │ │ │ │ │ │ │ 0/target_value,current_value
+│ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value
 │ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
 │ │ │ │ │ │ │ :ref:`filters <sysfs_filters>`/nr_filters
 │ │ │ │ │ │ │ │ 0/type,matching,memcg_id
@@ -153,6 +153,9 @@ Users can write below commands for the kdamond to the ``state`` file.
 - ``clear_schemes_tried_regions``: Clear the DAMON-based operating scheme
   action tried regions directory for each DAMON-based operation scheme of the
   kdamond.
+- ``update_schemes_effective_bytes``: Update the contents of
+  ``effective_bytes`` files for each DAMON-based operation scheme of the
+  kdamond.  For more details, refer to :ref:`quotas directory <sysfs_quotas>`.

 If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
@@ -180,19 +183,14 @@ In each context directory, two files (``avail_operations`` and ``operations``)
 and three directories (``monitoring_attrs``, ``targets``, and ``schemes``)
 exist.

-DAMON supports multiple types of monitoring operations, including those for
-virtual address space and the physical address space.  You can get the list of
-available monitoring operations set on the currently running kernel by reading
+DAMON supports multiple types of :ref:`monitoring operations
+<damon_design_configurable_operations_set>`, including those for virtual address
+space and the physical address space.  You can get the list of available
+monitoring operations set on the currently running kernel by reading
 ``avail_operations`` file.  Based on the kernel configuration, the file will
-list some or all of below keywords.
-
-- vaddr: Monitor virtual address spaces of specific processes
-- fvaddr: Monitor fixed virtual address ranges
-- paddr: Monitor the physical address space of the system
-
-Please refer to :ref:`regions sysfs directory <sysfs_regions>` for detailed
-differences between the operations sets in terms of the monitoring target
-regions.
+list different available operation sets.  Please refer to the :ref:`design
+<damon_operations_set>` for the list of all available operation sets and their
+brief explanations.

 You can set and get what type of monitoring operations DAMON will use for the
 context by writing one of the keywords listed in ``avail_operations`` file and
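As a short sketch of the operations-set selection described above (the kdamond and context indices are assumed to be 0), the keyword is picked from ``avail_operations`` and written to ``operations``::

    # cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0
    # cat avail_operations
    # echo vaddr > operations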
@@ -247,17 +245,11 @@ process to the ``pid_target`` file.
 targets/<N>/regions
 -------------------

-When ``vaddr`` monitoring operations set is being used (``vaddr`` is written to
-the ``contexts/<N>/operations`` file), DAMON automatically sets and updates the
-monitoring target regions so that entire memory mappings of target processes
-can be covered.  However, users could want to set the initial monitoring region
-to specific address ranges.
-
-In contrast, DAMON do not automatically sets and updates the monitoring target
-regions when ``fvaddr`` or ``paddr`` monitoring operations sets are being used
-(``fvaddr`` or ``paddr`` have written to the ``contexts/<N>/operations``).
-Therefore, users should set the monitoring target regions by themselves in the
-cases.
+In case of ``fvaddr`` or ``paddr`` monitoring operations sets, users are
+required to set the monitoring target address ranges.  In case of ``vaddr``
+operations set, it is not mandatory, but users can optionally set the initial
+monitoring region to specific address ranges.  Please refer to the :ref:`design
+<damon_design_vaddr_target_regions_construction>` for more details.

 For such cases, users can explicitly set the initial monitoring target regions
 as they want, by writing proper values to the files under this directory.
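A hypothetical sketch of setting one initial monitoring region under this directory (the directory indices and the address range are made up)::

    # cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/targets/0/regions
    # echo 1 > nr_regions
    # echo 0x100000000 > 0/start
    # echo 0x140000000 > 0/end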
@@ -302,27 +294,8 @@ In each scheme directory, five directories (``access_pattern``, ``quotas``,

 The ``action`` file is for setting and getting the scheme's :ref:`action
 <damon_design_damos_action>`.  The keywords that can be written to and read
-from the file and their meaning are as below.
-
-Note that support of each action depends on the running DAMON operations set
-:ref:`implementation <sysfs_context>`.
-
-- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
-  Supported by ``vaddr`` and ``fvaddr`` operations set.
-- ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
-  Supported by ``vaddr`` and ``fvaddr`` operations set.
-- ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
-  Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
-- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
-  Supported by ``vaddr`` and ``fvaddr`` operations set.
-- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
-  Supported by ``vaddr`` and ``fvaddr`` operations set.
-- ``lru_prio``: Prioritize the region on its LRU lists.
-  Supported by ``paddr`` operations set.
-- ``lru_deprio``: Deprioritize the region on its LRU lists.
-  Supported by ``paddr`` operations set.
-- ``stat``: Do nothing but count the statistics.
-  Supported by all operations sets.
+from the file and their meaning are same to those of the list on
+:ref:`design doc <damon_design_damos_action>`.

 The ``apply_interval_us`` file is for setting and getting the scheme's
 :ref:`apply_interval <damon_design_damos>` in microseconds.
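For example (the kdamond, context, and scheme indices of 0 are assumptions; the ``pageout`` keyword comes from the action list referenced above)::

    # cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0
    # echo pageout > action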
@@ -350,8 +323,9 @@ schemes/<N>/quotas/
 The directory for the :ref:`quotas <damon_design_damos_quotas>` of the given
 DAMON-based operation scheme.

-Under ``quotas`` directory, three files (``ms``, ``bytes``,
-``reset_interval_ms``) and two directories (``weights`` and ``goals``) exist.
+Under ``quotas`` directory, four files (``ms``, ``bytes``,
+``reset_interval_ms``, ``effective_bytes``) and two directories (``weights`` and
+``goals``) exist.

 You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
 ``reset interval`` in milliseconds by writing the values to the three files,
@@ -359,7 +333,17 @@ respectively.  Then, DAMON tries to use only up to ``time quota`` milliseconds
 for applying the ``action`` to memory regions of the ``access_pattern``, and to
 apply the action to only up to ``bytes`` bytes of memory regions within the
 ``reset_interval_ms``.  Setting both ``ms`` and ``bytes`` zero disables the
-quota limits.
+quota limits unless at least one :ref:`goal <sysfs_schemes_quota_goals>` is
+set.
+
+The time quota is internally transformed to a size quota.  Between the
+transformed size quota and user-specified size quota, smaller one is applied.
+Based on the user-specified :ref:`goal <sysfs_schemes_quota_goals>`, the
+effective size quota is further adjusted.  Reading ``effective_bytes`` returns
+the current effective size quota.  The file is not updated in real time, so
+users should ask DAMON sysfs interface to update the content of the file for
+the stats by writing a special keyword, ``update_schemes_effective_bytes`` to
+the relevant ``kdamonds/<N>/state`` file.

 Under ``weights`` directory, three files (``sz_permil``,
 ``nr_accesses_permil``, and ``age_permil``) exist.
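Putting the ``effective_bytes`` flow above together as a sketch (the directory indices of 0 are assumptions)::

    # echo update_schemes_effective_bytes > /sys/kernel/mm/damon/admin/kdamonds/0/state
    # cat /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/quotas/effective_bytes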
@@ -382,11 +366,11 @@ number (``N``) to the file creates the number of child directories named ``0``
 to ``N-1``.  Each directory represents each goal and current achievement.
 Among the multiple feedback, the best one is used.

-Each goal directory contains two files, namely ``target_value`` and
-``current_value``.  Users can set and get any number to those files to set the
-feedback.  User space main workload's latency or throughput, system metrics
-like free memory ratio or memory pressure stall time (PSI) could be example
-metrics for the values.  Note that users should write
+Each goal directory contains three files, namely ``target_metric``,
+``target_value`` and ``current_value``.  Users can set and get the three
+parameters for the quota auto-tuning goals that are specified on the :ref:`design
+doc <damon_design_damos_quotas_auto_tuning>` by writing to and reading from each
+of the files.  Note that users should further write
 ``commit_schemes_quota_goals`` to the ``state`` file of the :ref:`kdamond
 directory <sysfs_kdamond>` to pass the feedback to DAMON.
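A sketch of setting one auto-tuning goal for a scheme and committing it (the directory indices and the 3000 microsecond target are illustrative; ``some_mem_psi_us`` and ``commit_schemes_quota_goals`` are the keywords introduced by this patchset)::

    # cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/quotas/goals
    # echo 1 > nr_goals
    # echo some_mem_psi_us > 0/target_metric
    # echo 3000 > 0/target_value
    # echo commit_schemes_quota_goals > /sys/kernel/mm/damon/admin/kdamonds/0/state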
||||
|
||||
@ -579,11 +563,11 @@ monitoring results recording.
|
||||
While the monitoring is turned on, you could record the tracepoint events and
|
||||
show results using tracepoint supporting tools like ``perf``. For example::
|
||||
|
||||
# echo on > monitor_on
|
||||
# echo on > kdamonds/0/state
|
||||
# perf record -e damon:damon_aggregated &
|
||||
# sleep 5
|
||||
# kill 9 $(pidof perf)
|
||||
# echo off > monitor_on
|
||||
# echo off > kdamonds/0/state
|
||||
# perf script
|
||||
kdamond.0 46568 [027] 79357.842179: damon:damon_aggregated: target_id=0 nr_regions=11 122509119488-135708762112: 0 864
|
||||
[...]
|
||||
@ -628,9 +612,17 @@ debugfs Interface (DEPRECATED!)
|
||||
move, please report your usecase to damon@lists.linux.dev and
|
||||
linux-mm@kvack.org.
|
||||
|
||||
DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
|
||||
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
|
||||
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.
|
||||
DAMON exports nine files, ``DEPRECATED``, ``attrs``, ``target_ids``,
|
||||
``init_regions``, ``schemes``, ``monitor_on_DEPRECATED``, ``kdamond_pid``,
|
||||
``mk_contexts`` and ``rm_contexts`` under its debugfs directory,
|
||||
``<debugfs>/damon/``.
|
||||
|
||||
|
||||
``DEPRECATED`` is a read-only file for the DAMON debugfs interface deprecation
|
||||
notice. Reading it returns the deprecation notice, as below::
|
||||
|
||||
# cat DEPRECATED
|
||||
DAMON debugfs interface is deprecated, so users should move to DAMON_SYSFS. If you cannot, please report your usecase to damon@lists.linux.dev and linux-mm@kvack.org.
|
||||
|
||||
|
||||
Attributes
|
||||
@ -755,19 +747,17 @@ Action
|
||||
~~~~~~
|
||||
|
||||
The ``<action>`` is a predefined integer for memory management :ref:`actions
|
||||
<damon_design_damos_action>`. The supported numbers and their meanings are as
|
||||
below.
|
||||
<damon_design_damos_action>`. The mapping between the ``<action>`` values and
|
||||
the memory management actions is as below. For the detailed meaning of the
|
||||
action and DAMON operations set supporting each action, please refer to the
|
||||
list on :ref:`design doc <damon_design_damos_action>`.
|
||||
|
||||
- 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Ignored if
|
||||
``target`` is ``paddr``.
|
||||
- 1: Call ``madvise()`` for the region with ``MADV_COLD``. Ignored if
|
||||
``target`` is ``paddr``.
|
||||
- 2: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
|
||||
- 3: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``. Ignored if
|
||||
``target`` is ``paddr``.
|
||||
- 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``. Ignored if
|
||||
``target`` is ``paddr``.
|
||||
- 5: Do nothing but count the statistics
|
||||
- 0: ``willneed``
|
||||
- 1: ``cold``
|
||||
- 2: ``pageout``
|
||||
- 3: ``hugepage``
|
||||
- 4: ``nohugepage``
|
||||
- 5: ``stat``
|
||||
|
||||
Quota
|
||||
~~~~~
|
||||
@ -848,16 +838,16 @@ Turning On/Off
|
||||
|
||||
Setting the files as described above doesn't incur effect unless you explicitly
|
||||
start the monitoring. You can start, stop, and check the current status of the
|
||||
monitoring by writing to and reading from the ``monitor_on`` file. Writing
|
||||
``on`` to the file starts the monitoring of the targets with the attributes.
|
||||
Writing ``off`` to the file stops those. DAMON also stops if every target
|
||||
process is terminated. Below example commands turn on, off, and check the
|
||||
status of DAMON::
|
||||
monitoring by writing to and reading from the ``monitor_on_DEPRECATED`` file.
|
||||
Writing ``on`` to the file starts the monitoring of the targets with the
|
||||
attributes. Writing ``off`` to the file stops those. DAMON also stops if
|
||||
every target process is terminated. Below example commands turn on, off, and
|
||||
check the status of DAMON::
|
||||
|
||||
# cd <debugfs>/damon
|
||||
# echo on > monitor_on
|
||||
# echo off > monitor_on
|
||||
# cat monitor_on
|
||||
# echo on > monitor_on_DEPRECATED
|
||||
# echo off > monitor_on_DEPRECATED
|
||||
# cat monitor_on_DEPRECATED
|
||||
off
|
||||
|
||||
Please note that you cannot write to the above-mentioned debugfs files while
|
||||
@ -873,11 +863,11 @@ can get the pid of the thread by reading the ``kdamond_pid`` file. When the
|
||||
monitoring is turned off, reading the file returns ``none``. ::
|
||||
|
||||
# cd <debugfs>/damon
|
||||
# cat monitor_on
|
||||
# cat monitor_on_DEPRECATED
|
||||
off
|
||||
# cat kdamond_pid
|
||||
none
|
||||
# echo on > monitor_on
|
||||
# echo on > monitor_on_DEPRECATED
|
||||
# cat kdamond_pid
|
||||
18594
|
||||
|
||||
@ -907,5 +897,5 @@ directory by putting the name of the context to the ``rm_contexts`` file. ::
|
||||
# ls foo
|
||||
# ls: cannot access 'foo': No such file or directory
|
||||
|
||||
Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on`` files are in the
|
||||
root directory only.
|
||||
Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on_DEPRECATED`` files
|
||||
are in the root directory only.
|
||||
|
@ -250,6 +250,15 @@ MPOL_PREFERRED_MANY
|
||||
can fall back to all existing numa nodes. This is effectively
|
||||
MPOL_PREFERRED allowed for a mask rather than a single node.
|
||||
|
||||
MPOL_WEIGHTED_INTERLEAVE
|
||||
This mode operates the same as MPOL_INTERLEAVE, except that
|
||||
interleaving behavior is executed based on weights set in
|
||||
/sys/kernel/mm/mempolicy/weighted_interleave/
|
||||
|
||||
Weighted interleave allocates pages on nodes according to a
|
||||
weight. For example if nodes [0,1] are weighted [5,2], 5 pages
|
||||
will be allocated on node0 for every 2 pages allocated on node1.
|
||||
|
||||
NUMA memory policy supports the following optional mode flags:
|
||||
|
||||
MPOL_F_STATIC_NODES
|
||||
|
@ -169,7 +169,7 @@ Error reports
|
||||
A typical KASAN report looks like this::
|
||||
|
||||
==================================================================
|
||||
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
|
||||
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [kasan_test]
|
||||
Write of size 1 at addr ffff8801f44ec37b by task insmod/2760
|
||||
|
||||
CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
|
||||
@ -179,8 +179,8 @@ A typical KASAN report looks like this::
|
||||
print_address_description+0x73/0x280
|
||||
kasan_report+0x144/0x187
|
||||
__asan_report_store1_noabort+0x17/0x20
|
||||
kmalloc_oob_right+0xa8/0xbc [test_kasan]
|
||||
kmalloc_tests_init+0x16/0x700 [test_kasan]
|
||||
kmalloc_oob_right+0xa8/0xbc [kasan_test]
|
||||
kmalloc_tests_init+0x16/0x700 [kasan_test]
|
||||
do_one_initcall+0xa5/0x3ae
|
||||
do_init_module+0x1b6/0x547
|
||||
load_module+0x75df/0x8070
|
||||
@ -200,8 +200,8 @@ A typical KASAN report looks like this::
|
||||
save_stack+0x43/0xd0
|
||||
kasan_kmalloc+0xa7/0xd0
|
||||
kmem_cache_alloc_trace+0xe1/0x1b0
|
||||
kmalloc_oob_right+0x56/0xbc [test_kasan]
|
||||
kmalloc_tests_init+0x16/0x700 [test_kasan]
|
||||
kmalloc_oob_right+0x56/0xbc [kasan_test]
|
||||
kmalloc_tests_init+0x16/0x700 [kasan_test]
|
||||
do_one_initcall+0xa5/0x3ae
|
||||
do_init_module+0x1b6/0x547
|
||||
load_module+0x75df/0x8070
|
||||
@ -531,15 +531,15 @@ When a test passes::
|
||||
|
||||
When a test fails due to a failed ``kmalloc``::
|
||||
|
||||
# kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163
|
||||
# kmalloc_large_oob_right: ASSERTION FAILED at mm/kasan/kasan_test.c:245
|
||||
Expected ptr is not null, but is
|
||||
not ok 4 - kmalloc_large_oob_right
|
||||
not ok 5 - kmalloc_large_oob_right
|
||||
|
||||
When a test fails due to a missing KASAN report::
|
||||
|
||||
# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974
|
||||
# kmalloc_double_kzfree: EXPECTATION FAILED at mm/kasan/kasan_test.c:709
|
||||
KASAN failure expected in "kfree_sensitive(ptr)", but none occurred
|
||||
not ok 44 - kmalloc_double_kzfree
|
||||
not ok 28 - kmalloc_double_kzfree
|
||||
|
||||
|
||||
At the end the cumulative status of all KASAN tests is printed. On success::
|
||||
@ -555,7 +555,7 @@ There are a few ways to run KUnit-compatible KASAN tests.
|
||||
1. Loadable module
|
||||
|
||||
With ``CONFIG_KUNIT`` enabled, KASAN-KUnit tests can be built as a loadable
|
||||
module and run by loading ``test_kasan.ko`` with ``insmod`` or ``modprobe``.
|
||||
module and run by loading ``kasan_test.ko`` with ``insmod`` or ``modprobe``.
|
||||
|
||||
2. Built-In
|
||||
|
||||
|
@ -31,6 +31,8 @@ DAMON subsystem is configured with three layers including
|
||||
interfaces for the user space, on top of the core layer.
|
||||
|
||||
|
||||
.. _damon_design_configurable_operations_set:
|
||||
|
||||
Configurable Operations Set
|
||||
---------------------------
|
||||
|
||||
@ -63,6 +65,8 @@ modules that built on top of the core layer using the API, which can be easily
|
||||
used by the user space end users.
|
||||
|
||||
|
||||
.. _damon_operations_set:
|
||||
|
||||
Operations Set Layer
|
||||
====================
|
||||
|
||||
@ -71,16 +75,26 @@ The monitoring operations are defined in two parts:
|
||||
1. Identification of the monitoring target address range for the address space.
|
||||
2. Access check of specific address range in the target space.
|
||||
|
||||
DAMON currently provides the implementations of the operations for the physical
|
||||
and virtual address spaces. Below two subsections describe how those work.
|
||||
DAMON currently provides below three operation sets. Below two subsections
|
||||
describe how those work.
|
||||
|
||||
- vaddr: Monitor virtual address spaces of specific processes
|
||||
- fvaddr: Monitor fixed virtual address ranges
|
||||
- paddr: Monitor the physical address space of the system
|
||||
|
||||
|
||||
.. _damon_design_vaddr_target_regions_construction:
|
||||
|
||||
VMA-based Target Address Range Construction
|
||||
-------------------------------------------
|
||||
|
||||
This is only for the virtual address space monitoring operations
|
||||
implementation. That for the physical address space simply asks users to
|
||||
manually set the monitoring target address ranges.
|
||||
A mechanism of ``vaddr`` DAMON operations set that automatically initializes
|
||||
and updates the monitoring target address regions so that entire memory
|
||||
mappings of the target processes can be covered.
|
||||
|
||||
This mechanism is only for the ``vaddr`` operations set. In cases of
|
||||
``fvaddr`` and ``paddr`` operation sets, users are asked to manually set the
|
||||
monitoring target address ranges.
|
||||
|
||||
Only small parts in the super-huge virtual address space of the processes are
|
||||
mapped to the physical memory and accessed. Thus, tracking the unmapped
|
||||
@ -294,9 +308,29 @@ not mandated to support all actions of the list. Hence, the availability of
|
||||
specific DAMOS action depends on what operations set is selected to be used
|
||||
together.
|
||||
|
||||
Applying an action to a region is considered as changing the region's
|
||||
characteristics. Hence, DAMOS resets the age of regions when an action is
|
||||
applied to those.
|
||||
The list of the supported actions, their meaning, and DAMON operations sets
|
||||
that supports each action are as below.
|
||||
|
||||
- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
|
||||
Supported by ``vaddr`` and ``fvaddr`` operations set.
|
||||
- ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
|
||||
Supported by ``vaddr`` and ``fvaddr`` operations set.
|
||||
- ``pageout``: Reclaim the region.
|
||||
Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
|
||||
- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
|
||||
Supported by ``vaddr`` and ``fvaddr`` operations set.
|
||||
- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
|
||||
Supported by ``vaddr`` and ``fvaddr`` operations set.
|
||||
- ``lru_prio``: Prioritize the region on its LRU lists.
|
||||
Supported by ``paddr`` operations set.
|
||||
- ``lru_deprio``: Deprioritize the region on its LRU lists.
|
||||
Supported by ``paddr`` operations set.
|
||||
- ``stat``: Do nothing but count the statistics.
|
||||
Supported by all operations sets.
|
||||
|
||||
Applying the actions except ``stat`` to a region is considered as changing the
|
||||
region's characteristics. Hence, DAMOS resets the age of regions when any such
|
||||
actions are applied to those.
|
||||
|
||||
|
||||
.. _damon_design_damos_access_pattern:
|
||||
@ -364,12 +398,28 @@ Aim-oriented Feedback-driven Auto-tuning
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Automatic feedback-driven quota tuning. Instead of setting the absolute quota
|
||||
value, users can repeatedly provide numbers representing how much of their goal
|
||||
for the scheme is achieved as feedback. DAMOS then automatically tunes the
|
||||
value, users can specify the metric of their interest, and what target value
|
||||
they want the metric value to be. DAMOS then automatically tunes the
|
||||
aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
|
||||
is under achieving the goal, DAMOS automatically increases the quota. If DAMOS
|
||||
is over achieving the goal, it decreases the quota.
|
||||
|
||||
The goal can be specified with three parameters, namely ``target_metric``,
|
||||
``target_value``, and ``current_value``. The auto-tuning mechanism tries to
|
||||
make ``current_value`` of ``target_metric`` be same to ``target_value``.
|
||||
Currently, two ``target_metric`` are provided.
|
||||
|
||||
- ``user_input``: User-provided value. Users could use any metric that they
|
||||
has interest in for the value. Use space main workload's latency or
|
||||
throughput, system metrics like free memory ratio or memory pressure stall
|
||||
time (PSI) could be examples. Note that users should explicitly set
|
||||
``current_value`` on their own in this case. In other words, users should
|
||||
repeatedly provide the feedback.
|
||||
- ``some_mem_psi_us``: System-wide ``some`` memory pressure stall information
|
||||
in microseconds that measured from last quota reset to next quota reset.
|
||||
DAMOS does the measurement on its own, so only ``target_value`` need to be
|
||||
set by users at the initial time. In other words, DAMOS does self-feedback.
|
||||
|
||||
|
||||
.. _damon_design_damos_watermarks:
|
||||
|
||||
|
@ -21,8 +21,8 @@ be queued in mm-stable [3]_ , and finally pull-requested to the mainline by the
|
||||
memory management subsystem maintainer.
|
||||
|
||||
Note again the patches for review should be made against the mm-unstable
|
||||
tree[1] whenever possible. damon/next is only for preview of others' works in
|
||||
progress.
|
||||
tree [1]_ whenever possible. damon/next is only for preview of others' works
|
||||
in progress.
|
||||
|
||||
Submit checklist addendum
|
||||
-------------------------
|
||||
@ -41,8 +41,8 @@ Further doing below and putting the results will be helpful.
|
||||
Key cycle dates
|
||||
---------------
|
||||
|
||||
Patches can be sent anytime. Key cycle dates of the mm-unstable[1] and
|
||||
mm-stable[3] trees depend on the memory management subsystem maintainer.
|
||||
Patches can be sent anytime. Key cycle dates of the mm-unstable [1]_ and
|
||||
mm-stable [3]_ trees depend on the memory management subsystem maintainer.
|
||||
|
||||
Review cadence
|
||||
--------------
|
||||
|
@ -24,6 +24,11 @@ fragmentation statistics can be obtained through gfp flag information of
|
||||
each page. It is already implemented and activated if page owner is
|
||||
enabled. Other usages are more than welcome.
|
||||
|
||||
It can also be used to show all the stacks and their outstanding
|
||||
allocations, which gives us a quick overview of where the memory is going
|
||||
without the need to screen through all the pages and match the allocation
|
||||
and free operation.
|
||||
|
||||
page owner is disabled by default. So, if you'd like to use it, you need
|
||||
to add "page_owner=on" to your boot cmdline. If the kernel is built
|
||||
with page owner and page owner is disabled in runtime due to not enabling
|
||||
@ -68,6 +73,46 @@ Usage
|
||||
|
||||
4) Analyze information from page owner::
|
||||
|
||||
cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
|
||||
cat stacks.txt
|
||||
prep_new_page+0xa9/0x120
|
||||
get_page_from_freelist+0x7e6/0x2140
|
||||
__alloc_pages+0x18a/0x370
|
||||
new_slab+0xc8/0x580
|
||||
___slab_alloc+0x1f2/0xaf0
|
||||
__slab_alloc.isra.86+0x22/0x40
|
||||
kmem_cache_alloc+0x31b/0x350
|
||||
__khugepaged_enter+0x39/0x100
|
||||
dup_mmap+0x1c7/0x5ce
|
||||
copy_process+0x1afe/0x1c90
|
||||
kernel_clone+0x9a/0x3c0
|
||||
__do_sys_clone+0x66/0x90
|
||||
do_syscall_64+0x7f/0x160
|
||||
entry_SYSCALL_64_after_hwframe+0x6c/0x74
|
||||
stack_count: 234
|
||||
...
|
||||
...
|
||||
echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold
|
||||
cat /sys/kernel/debug/page_owner_stacks/show_stacks> stacks_7000.txt
|
||||
cat stacks_7000.txt
|
||||
prep_new_page+0xa9/0x120
|
||||
get_page_from_freelist+0x7e6/0x2140
|
||||
__alloc_pages+0x18a/0x370
|
||||
alloc_pages_mpol+0xdf/0x1e0
|
||||
folio_alloc+0x14/0x50
|
||||
filemap_alloc_folio+0xb0/0x100
|
||||
page_cache_ra_unbounded+0x97/0x180
|
||||
filemap_fault+0x4b4/0x1200
|
||||
__do_fault+0x2d/0x110
|
||||
do_pte_missing+0x4b0/0xa30
|
||||
__handle_mm_fault+0x7fa/0xb70
|
||||
handle_mm_fault+0x125/0x300
|
||||
do_user_addr_fault+0x3c9/0x840
|
||||
exc_page_fault+0x68/0x150
|
||||
asm_exc_page_fault+0x22/0x30
|
||||
stack_count: 8248
|
||||
...
|
||||
|
||||
cat /sys/kernel/debug/page_owner > page_owner_full.txt
|
||||
./page_owner_sort page_owner_full.txt sorted_page_owner.txt
|
||||
|
||||
|
@ -344,7 +344,7 @@ debugfs接口
|
||||
:ref:`sysfs接口<sysfs_interface>`。
|
||||
|
||||
DAMON导出了八个文件, ``attrs``, ``target_ids``, ``init_regions``,
|
||||
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` 和
|
||||
``schemes``, ``monitor_on_DEPRECATED``, ``kdamond_pid``, ``mk_contexts`` 和
|
||||
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.
|
||||
|
||||
|
||||
@ -521,15 +521,15 @@ DAMON导出了八个文件, ``attrs``, ``target_ids``, ``init_regions``,
|
||||
开关
|
||||
----
|
||||
|
||||
除非你明确地启动监测,否则如上所述的文件设置不会产生效果。你可以通过写入和读取 ``monitor_on``
|
||||
除非你明确地启动监测,否则如上所述的文件设置不会产生效果。你可以通过写入和读取 ``monitor_on_DEPRECATED``
|
||||
文件来启动、停止和检查监测的当前状态。写入 ``on`` 该文件可以启动对有属性的目标的监测。写入
|
||||
``off`` 该文件则停止这些目标。如果每个目标进程被终止,DAMON也会停止。下面的示例命令开启、关
|
||||
闭和检查DAMON的状态::
|
||||
|
||||
# cd <debugfs>/damon
|
||||
# echo on > monitor_on
|
||||
# echo off > monitor_on
|
||||
# cat monitor_on
|
||||
# echo on > monitor_on_DEPRECATED
|
||||
# echo off > monitor_on_DEPRECATED
|
||||
# cat monitor_on_DEPRECATED
|
||||
off
|
||||
|
||||
请注意,当监测开启时,你不能写到上述的debugfs文件。如果你在DAMON运行时写到这些文件,将会返
|
||||
@ -543,11 +543,11 @@ DAMON通过一个叫做kdamond的内核线程来进行请求监测。你可以
|
||||
得该线程的 ``pid`` 。当监测被 ``关闭`` 时,读取该文件不会返回任何信息::
|
||||
|
||||
# cd <debugfs>/damon
|
||||
# cat monitor_on
|
||||
# cat monitor_on_DEPRECATED
|
||||
off
|
||||
# cat kdamond_pid
|
||||
none
|
||||
# echo on > monitor_on
|
||||
# echo on > monitor_on_DEPRECATED
|
||||
# cat kdamond_pid
|
||||
18594
|
||||
|
||||
@ -574,7 +574,7 @@ DAMON通过一个叫做kdamond的内核线程来进行请求监测。你可以
|
||||
# ls foo
|
||||
# ls: cannot access 'foo': No such file or directory
|
||||
|
||||
注意, ``mk_contexts`` 、 ``rm_contexts`` 和 ``monitor_on`` 文件只在根目录下。
|
||||
注意, ``mk_contexts`` 、 ``rm_contexts`` 和 ``monitor_on_DEPRECATED`` 文件只在根目录下。
|
||||
|
||||
|
||||
监测结果的监测点
|
||||
@ -583,9 +583,9 @@ DAMON通过一个叫做kdamond的内核线程来进行请求监测。你可以
|
||||
DAMON通过一个tracepoint ``damon:damon_aggregated`` 提供监测结果. 当监测开启时,你可
|
||||
以记录追踪点事件,并使用追踪点支持工具如perf显示结果。比如说::
|
||||
|
||||
# echo on > monitor_on
|
||||
# echo on > monitor_on_DEPRECATED
|
||||
# perf record -e damon:damon_aggregated &
|
||||
# sleep 5
|
||||
# kill 9 $(pidof perf)
|
||||
# echo off > monitor_on
|
||||
# echo off > monitor_on_DEPRECATED
|
||||
# perf script
|
||||
|
@@ -137,7 +137,7 @@ KASAN受到通用 ``panic_on_warn`` 命令行参数的影响。当它被启用
典型的KASAN报告如下所示::

==================================================================
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [kasan_test]
Write of size 1 at addr ffff8801f44ec37b by task insmod/2760

CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
@@ -147,8 +147,8 @@ KASAN受到通用 ``panic_on_warn`` 命令行参数的影响。当它被启用
print_address_description+0x73/0x280
kasan_report+0x144/0x187
__asan_report_store1_noabort+0x17/0x20
kmalloc_oob_right+0xa8/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0xa8/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@@ -168,8 +168,8 @@ KASAN受到通用 ``panic_on_warn`` 命令行参数的影响。当它被启用
save_stack+0x43/0xd0
kasan_kmalloc+0xa7/0xd0
kmem_cache_alloc_trace+0xe1/0x1b0
kmalloc_oob_right+0x56/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0x56/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@@ -421,15 +421,15 @@ KASAN连接到vmap基础架构以懒清理未使用的影子内存。

当由于 ``kmalloc`` 失败而导致测试失败时::

# kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163
# kmalloc_large_oob_right: ASSERTION FAILED at mm/kasan/kasan_test.c:245
Expected ptr is not null, but is
not ok 4 - kmalloc_large_oob_right
not ok 5 - kmalloc_large_oob_right

当由于缺少KASAN报告而导致测试失败时::

# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974
# kmalloc_double_kzfree: EXPECTATION FAILED at mm/kasan/kasan_test.c:709
KASAN failure expected in "kfree_sensitive(ptr)", but none occurred
not ok 44 - kmalloc_double_kzfree
not ok 28 - kmalloc_double_kzfree


最后打印所有KASAN测试的累积状态。成功::
@@ -445,7 +445,7 @@ KASAN连接到vmap基础架构以懒清理未使用的影子内存。
1. 可加载模块

启用 ``CONFIG_KUNIT`` 后,KASAN-KUnit测试可以构建为可加载模块,并通过使用
``insmod`` 或 ``modprobe`` 加载 ``test_kasan.ko`` 来运行。
``insmod`` 或 ``modprobe`` 加载 ``kasan_test.ko`` 来运行。

2. 内置
@@ -344,7 +344,7 @@ debugfs接口
:ref:`sysfs接口<sysfs_interface>`。

DAMON導出了八個文件, ``attrs``, ``target_ids``, ``init_regions``,
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` 和
``schemes``, ``monitor_on_DEPRECATED``, ``kdamond_pid``, ``mk_contexts`` 和
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.

@@ -521,15 +521,15 @@ DAMON導出了八個文件, ``attrs``, ``target_ids``, ``init_regions``,
開關
----

除非你明確地啓動監測,否則如上所述的文件設置不會產生效果。你可以通過寫入和讀取 ``monitor_on``
除非你明確地啓動監測,否則如上所述的文件設置不會產生效果。你可以通過寫入和讀取 ``monitor_on_DEPRECATED``
文件來啓動、停止和檢查監測的當前狀態。寫入 ``on`` 該文件可以啓動對有屬性的目標的監測。寫入
``off`` 該文件則停止這些目標。如果每個目標進程被終止,DAMON也會停止。下面的示例命令開啓、關
閉和檢查DAMON的狀態::

# cd <debugfs>/damon
# echo on > monitor_on
# echo off > monitor_on
# cat monitor_on
# echo on > monitor_on_DEPRECATED
# echo off > monitor_on_DEPRECATED
# cat monitor_on_DEPRECATED
off

請注意,當監測開啓時,你不能寫到上述的debugfs文件。如果你在DAMON運行時寫到這些文件,將會返
@@ -543,11 +543,11 @@ DAMON通過一個叫做kdamond的內核線程來進行請求監測。你可以
得該線程的 ``pid`` 。當監測被 ``關閉`` 時,讀取該文件不會返回任何信息::

# cd <debugfs>/damon
# cat monitor_on
# cat monitor_on_DEPRECATED
off
# cat kdamond_pid
none
# echo on > monitor_on
# echo on > monitor_on_DEPRECATED
# cat kdamond_pid
18594

@@ -574,7 +574,7 @@ DAMON通過一個叫做kdamond的內核線程來進行請求監測。你可以
# ls foo
# ls: cannot access 'foo': No such file or directory

注意, ``mk_contexts`` 、 ``rm_contexts`` 和 ``monitor_on`` 文件只在根目錄下。
注意, ``mk_contexts`` 、 ``rm_contexts`` 和 ``monitor_on_DEPRECATED`` 文件只在根目錄下。


監測結果的監測點
@@ -583,10 +583,10 @@ DAMON通過一個叫做kdamond的內核線程來進行請求監測。你可以
DAMON通過一個tracepoint ``damon:damon_aggregated`` 提供監測結果. 當監測開啓時,你可
以記錄追蹤點事件,並使用追蹤點支持工具如perf顯示結果。比如說::

# echo on > monitor_on
# echo on > monitor_on_DEPRECATED
# perf record -e damon:damon_aggregated &
# sleep 5
# kill 9 $(pidof perf)
# echo off > monitor_on
# echo off > monitor_on_DEPRECATED
# perf script
@@ -137,7 +137,7 @@ KASAN受到通用 ``panic_on_warn`` 命令行參數的影響。當它被啓用
典型的KASAN報告如下所示::

==================================================================
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [kasan_test]
Write of size 1 at addr ffff8801f44ec37b by task insmod/2760

CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
@@ -147,8 +147,8 @@ KASAN受到通用 ``panic_on_warn`` 命令行參數的影響。當它被啓用
print_address_description+0x73/0x280
kasan_report+0x144/0x187
__asan_report_store1_noabort+0x17/0x20
kmalloc_oob_right+0xa8/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0xa8/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@@ -168,8 +168,8 @@ KASAN受到通用 ``panic_on_warn`` 命令行參數的影響。當它被啓用
save_stack+0x43/0xd0
kasan_kmalloc+0xa7/0xd0
kmem_cache_alloc_trace+0xe1/0x1b0
kmalloc_oob_right+0x56/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0x56/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@@ -421,15 +421,15 @@ KASAN連接到vmap基礎架構以懶清理未使用的影子內存。

當由於 ``kmalloc`` 失敗而導致測試失敗時::

# kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163
# kmalloc_large_oob_right: ASSERTION FAILED at mm/kasan/kasan_test.c:245
Expected ptr is not null, but is
not ok 4 - kmalloc_large_oob_right
not ok 5 - kmalloc_large_oob_right

當由於缺少KASAN報告而導致測試失敗時::

# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974
# kmalloc_double_kzfree: EXPECTATION FAILED at mm/kasan/kasan_test.c:709
KASAN failure expected in "kfree_sensitive(ptr)", but none occurred
not ok 44 - kmalloc_double_kzfree
not ok 28 - kmalloc_double_kzfree


最後打印所有KASAN測試的累積狀態。成功::
@@ -445,7 +445,7 @@ KASAN連接到vmap基礎架構以懶清理未使用的影子內存。
1. 可加載模塊

啓用 ``CONFIG_KUNIT`` 後,KASAN-KUnit測試可以構建爲可加載模塊,並通過使用
``insmod`` 或 ``modprobe`` 加載 ``test_kasan.ko`` 來運行。
``insmod`` 或 ``modprobe`` 加載 ``kasan_test.ko`` 來運行。

2. 內置
MAINTAINERS
@@ -5413,6 +5413,7 @@ R: Muchun Song <muchun.song@linux.dev>
L: cgroups@vger.kernel.org
L: linux-mm@kvack.org
S: Maintained
F: include/linux/memcontrol.h
F: mm/memcontrol.c
F: mm/swap_cgroup.c
F: samples/cgroup/*
@@ -14144,15 +14145,24 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
T: quilt git://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new
F: include/linux/gfp.h
F: include/linux/gfp_types.h
F: include/linux/memfd.h
F: include/linux/memory.h
F: include/linux/memory_hotplug.h
F: include/linux/memory-tiers.h
F: include/linux/mempolicy.h
F: include/linux/mempool.h
F: include/linux/memremap.h
F: include/linux/mm.h
F: include/linux/mm_*.h
F: include/linux/mmzone.h
F: include/linux/mmu_notifier.h
F: include/linux/pagewalk.h
F: include/linux/rmap.h
F: include/trace/events/ksm.h
F: mm/
F: tools/mm/
F: tools/testing/selftests/mm/
N: include/linux/page[-_]*

MEMORY MAPPING
M: Andrew Morton <akpm@linux-foundation.org>
@@ -24447,6 +24457,7 @@ ZSWAP COMPRESSED SWAP CACHING
M: Johannes Weiner <hannes@cmpxchg.org>
M: Yosry Ahmed <yosryahmed@google.com>
M: Nhat Pham <nphamcs@gmail.com>
R: Chengming Zhou <chengming.zhou@linux.dev>
L: linux-mm@kvack.org
S: Maintained
F: Documentation/admin-guide/mm/zswap.rst
@@ -24454,6 +24465,7 @@ F: include/linux/zpool.h
F: include/linux/zswap.h
F: mm/zpool.c
F: mm/zswap.c
F: tools/testing/selftests/cgroup/test_zswap.c

THE REST
M: Linus Torvalds <torvalds@linux-foundation.org>
@@ -6,6 +6,7 @@
config ARC
def_bool y
select ARC_TIMERS
select ARCH_HAS_CPU_CACHE_ALIASING
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DMA_PREP_COHERENT

arch/arc/include/asm/cachetype.h (new file)
@@ -0,0 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_ARC_CACHETYPE_H
#define __ASM_ARC_CACHETYPE_H

#include <linux/types.h>

#define cpu_dcache_is_aliasing() true

#endif
@ -5,6 +5,7 @@ config ARM
|
||||
select ARCH_32BIT_OFF_T
|
||||
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE if HAVE_KRETPROBES && FRAME_POINTER && !ARM_UNWIND
|
||||
select ARCH_HAS_BINFMT_FLAT
|
||||
select ARCH_HAS_CPU_CACHE_ALIASING
|
||||
select ARCH_HAS_CPU_FINALIZE_INIT if MMU
|
||||
select ARCH_HAS_CURRENT_STACK_POINTER
|
||||
select ARCH_HAS_DEBUG_VIRTUAL if MMU
|
||||
|
@ -17,7 +17,7 @@ config ARM_PTDUMP_DEBUGFS
|
||||
kernel.
|
||||
If in doubt, say "N"
|
||||
|
||||
config DEBUG_WX
|
||||
config ARM_DEBUG_WX
|
||||
bool "Warn on W+X mappings at boot"
|
||||
depends on MMU
|
||||
select ARM_PTDUMP_CORE
|
||||
|
@ -252,7 +252,7 @@ CONFIG_DEBUG_INFO_REDUCED=y
|
||||
CONFIG_GDB_SCRIPTS=y
|
||||
CONFIG_STRIP_ASM_SYMS=y
|
||||
CONFIG_DEBUG_FS=y
|
||||
CONFIG_DEBUG_WX=y
|
||||
CONFIG_ARM_DEBUG_WX=y
|
||||
CONFIG_SCHED_STACK_END_CHECK=y
|
||||
CONFIG_PANIC_ON_OOPS=y
|
||||
CONFIG_PANIC_TIMEOUT=-1
|
||||
|
@ -302,7 +302,7 @@ CONFIG_DEBUG_INFO_REDUCED=y
|
||||
CONFIG_GDB_SCRIPTS=y
|
||||
CONFIG_STRIP_ASM_SYMS=y
|
||||
CONFIG_DEBUG_FS=y
|
||||
CONFIG_DEBUG_WX=y
|
||||
CONFIG_ARM_DEBUG_WX=y
|
||||
CONFIG_SCHED_STACK_END_CHECK=y
|
||||
CONFIG_PANIC_ON_OOPS=y
|
||||
CONFIG_PANIC_TIMEOUT=-1
|
||||
|
@ -20,6 +20,8 @@ extern unsigned int cacheid;
|
||||
#define icache_is_vipt_aliasing() cacheid_is(CACHEID_VIPT_I_ALIASING)
|
||||
#define icache_is_pipt() cacheid_is(CACHEID_PIPT)
|
||||
|
||||
#define cpu_dcache_is_aliasing() (cache_is_vivt() || cache_is_vipt_aliasing())
|
||||
|
||||
/*
|
||||
* __LINUX_ARM_ARCH__ is the minimum supported CPU architecture
|
||||
* Mask out support which will never be present on newer CPUs.
|
||||
|
@ -213,7 +213,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
|
||||
|
||||
#define pmd_pfn(pmd) (__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
|
||||
|
||||
#define pmd_large(pmd) (pmd_val(pmd) & 2)
|
||||
#define pmd_leaf(pmd) (pmd_val(pmd) & 2)
|
||||
#define pmd_bad(pmd) (pmd_val(pmd) & 2)
|
||||
#define pmd_present(pmd) (pmd_val(pmd))
|
||||
|
@ -118,7 +118,6 @@
|
||||
PMD_TYPE_TABLE)
|
||||
#define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
|
||||
PMD_TYPE_SECT)
|
||||
#define pmd_large(pmd) pmd_sect(pmd)
|
||||
#define pmd_leaf(pmd) pmd_sect(pmd)
|
||||
|
||||
#define pud_clear(pudp) \
|
||||
|
@ -209,6 +209,8 @@ static inline void __sync_icache_dcache(pte_t pteval)
|
||||
extern void __sync_icache_dcache(pte_t pteval);
|
||||
#endif
|
||||
|
||||
#define PFN_PTE_SHIFT PAGE_SHIFT
|
||||
|
||||
void set_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pteval, unsigned int nr);
|
||||
#define set_ptes set_ptes
|
||||
|
@ -32,10 +32,10 @@ void ptdump_check_wx(void);
|
||||
|
||||
#endif /* CONFIG_ARM_PTDUMP_CORE */
|
||||
|
||||
#ifdef CONFIG_DEBUG_WX
|
||||
#define debug_checkwx() ptdump_check_wx()
|
||||
#ifdef CONFIG_ARM_DEBUG_WX
|
||||
#define arm_debug_checkwx() ptdump_check_wx()
|
||||
#else
|
||||
#define debug_checkwx() do { } while (0)
|
||||
#define arm_debug_checkwx() do { } while (0)
|
||||
#endif
|
||||
|
||||
#endif /* __ASM_PTDUMP_H */
|
||||
|
@ -60,6 +60,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o insn.o patch.o
|
||||
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o insn.o patch.o
|
||||
obj-$(CONFIG_JUMP_LABEL) += jump_label.o insn.o patch.o
|
||||
obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o relocate_kernel.o
|
||||
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
|
||||
# Main staffs in KPROBES are in arch/arm/probes/ .
|
||||
obj-$(CONFIG_KPROBES) += patch.o insn.o
|
||||
obj-$(CONFIG_OABI_COMPAT) += sys_oabi-compat.o
|
||||
|
@ -198,10 +198,3 @@ void machine_kexec(struct kimage *image)
|
||||
|
||||
soft_restart(reboot_entry_phys);
|
||||
}
|
||||
|
||||
void arch_crash_save_vmcoreinfo(void)
|
||||
{
|
||||
#ifdef CONFIG_ARM_LPAE
|
||||
VMCOREINFO_CONFIG(ARM_LPAE);
|
||||
#endif
|
||||
}
|
||||
|
@ -979,7 +979,7 @@ static int __init init_machine_late(void)
|
||||
}
|
||||
late_initcall(init_machine_late);
|
||||
|
||||
#ifdef CONFIG_KEXEC
|
||||
#ifdef CONFIG_CRASH_RESERVE
|
||||
/*
|
||||
* The crash region must be aligned to 128MB to avoid
|
||||
* zImage relocating below the reserved region.
|
||||
@ -1066,7 +1066,7 @@ static void __init reserve_crashkernel(void)
|
||||
}
|
||||
#else
|
||||
static inline void reserve_crashkernel(void) {}
|
||||
#endif /* CONFIG_KEXEC */
|
||||
#endif /* CONFIG_CRASH_RESERVE*/
|
||||
|
||||
void __init hyp_mode_check(void)
|
||||
{
|
||||
|
arch/arm/kernel/vmcore_info.c (new file)
@ -0,0 +1,10 @@
|
||||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
|
||||
#include <linux/vmcore_info.h>
|
||||
|
||||
void arch_crash_save_vmcoreinfo(void)
|
||||
{
|
||||
#ifdef CONFIG_ARM_LPAE
|
||||
VMCOREINFO_CONFIG(ARM_LPAE);
|
||||
#endif
|
||||
}
|
@ -349,12 +349,12 @@ static void walk_pmd(struct pg_state *st, pud_t *pud, unsigned long start)
|
||||
for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
|
||||
addr = start + i * PMD_SIZE;
|
||||
domain = get_domain_name(pmd);
|
||||
if (pmd_none(*pmd) || pmd_large(*pmd) || !pmd_present(*pmd))
|
||||
if (pmd_none(*pmd) || pmd_leaf(*pmd) || !pmd_present(*pmd))
|
||||
note_page(st, addr, 4, pmd_val(*pmd), domain);
|
||||
else
|
||||
walk_pte(st, pmd, addr, domain);
|
||||
|
||||
if (SECTION_SIZE < PMD_SIZE && pmd_large(pmd[1])) {
|
||||
if (SECTION_SIZE < PMD_SIZE && pmd_leaf(pmd[1])) {
|
||||
addr += SECTION_SIZE;
|
||||
pmd++;
|
||||
domain = get_domain_name(pmd);
|
||||
|
@ -458,7 +458,7 @@ static int __mark_rodata_ro(void *unused)
|
||||
void mark_rodata_ro(void)
|
||||
{
|
||||
stop_machine(__mark_rodata_ro, NULL, NULL);
|
||||
debug_checkwx();
|
||||
arm_debug_checkwx();
|
||||
}
|
||||
|
||||
#else
|
||||
|
@ -1814,6 +1814,6 @@ void set_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
if (--nr == 0)
|
||||
break;
|
||||
ptep++;
|
||||
pte_val(pteval) += PAGE_SIZE;
|
||||
pteval = pte_next_pfn(pteval);
|
||||
}
|
||||
}
|
||||
|
@ -1519,7 +1519,7 @@ config ARCH_SUPPORTS_CRASH_DUMP
|
||||
def_bool y
|
||||
|
||||
config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
|
||||
def_bool CRASH_CORE
|
||||
def_bool CRASH_RESERVE
|
||||
|
||||
config TRANS_TABLE
|
||||
def_bool y
|
||||
@ -2229,6 +2229,15 @@ config UNWIND_PATCH_PAC_INTO_SCS
|
||||
select UNWIND_TABLES
|
||||
select DYNAMIC_SCS
|
||||
|
||||
config ARM64_CONTPTE
|
||||
bool "Contiguous PTE mappings for user memory" if EXPERT
|
||||
depends on TRANSPARENT_HUGEPAGE
|
||||
default y
|
||||
help
|
||||
When enabled, user mappings are configured using the PTE contiguous
|
||||
bit, for any mappings that meet the size and alignment requirements.
|
||||
This reduces TLB pressure and improves performance.
|
||||
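As a rough illustration of the "size and alignment requirements" the help text refers to, the sketch below shows the alignment test that decides whether a run of base pages can share one contiguous-bit entry. It is not kernel code; PAGE_SHIFT and CONT_PTES are assumed values for 4K pages::

  #include <stdbool.h>
  #include <stdint.h>

  #define PAGE_SHIFT 12          /* assumed: 4K base pages */
  #define CONT_PTES  16          /* assumed: 16 PTEs per contpte block (64K) */

  /* A range is contpte-eligible only if both the virtual page index and the
   * physical frame number start on a CONT_PTES boundary, so the whole block
   * can be described by a single contiguous-bit entry. */
  static bool contpte_eligible(uint64_t vaddr, uint64_t pfn)
  {
          uint64_t mask = CONT_PTES - 1;

          return (((vaddr >> PAGE_SHIFT) & mask) == 0) && ((pfn & mask) == 0);
  }

The same alignment reasoning shows up later in contpte_try_fold() and contpte_align_down() in the pgtable.h and contpte.c hunks below.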
|
||||
endmenu # "Kernel Features"
|
||||
|
||||
menu "Boot options"
|
||||
|
@ -1,6 +1,6 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0-only */
|
||||
#ifndef _ARM64_CRASH_CORE_H
|
||||
#define _ARM64_CRASH_CORE_H
|
||||
#ifndef _ARM64_CRASH_RESERVE_H
|
||||
#define _ARM64_CRASH_RESERVE_H
|
||||
|
||||
/* Current arm64 boot protocol requires 2MB alignment */
|
||||
#define CRASH_ALIGN SZ_2M
|
@ -80,7 +80,7 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
|
||||
}
|
||||
}
|
||||
|
||||
#if defined(CONFIG_KEXEC_CORE) && defined(CONFIG_HIBERNATION)
|
||||
#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_HIBERNATION)
|
||||
extern bool crash_is_nosave(unsigned long pfn);
|
||||
extern void crash_prepare_suspend(void);
|
||||
extern void crash_post_resume(void);
|
||||
|
@ -98,7 +98,8 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
|
||||
__pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
|
||||
|
||||
#define pte_none(pte) (!pte_val(pte))
|
||||
#define pte_clear(mm,addr,ptep) set_pte(ptep, __pte(0))
|
||||
#define __pte_clear(mm, addr, ptep) \
|
||||
__set_pte(ptep, __pte(0))
|
||||
#define pte_page(pte) (pfn_to_page(pte_pfn(pte)))
|
||||
|
||||
/*
|
||||
@ -137,12 +138,16 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
|
||||
*/
|
||||
#define pte_valid_not_user(pte) \
|
||||
((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == (PTE_VALID | PTE_UXN))
|
||||
/*
|
||||
* Returns true if the pte is valid and has the contiguous bit set.
|
||||
*/
|
||||
#define pte_valid_cont(pte) (pte_valid(pte) && pte_cont(pte))
|
||||
/*
|
||||
* Could the pte be present in the TLB? We must check mm_tlb_flush_pending
|
||||
* so that we don't erroneously return false for pages that have been
|
||||
* remapped as PROT_NONE but are yet to be flushed from the TLB.
|
||||
* Note that we can't make any assumptions based on the state of the access
|
||||
* flag, since ptep_clear_flush_young() elides a DSB when invalidating the
|
||||
* flag, since __ptep_clear_flush_young() elides a DSB when invalidating the
|
||||
* TLB.
|
||||
*/
|
||||
#define pte_accessible(mm, pte) \
|
||||
@ -266,7 +271,7 @@ static inline pte_t pte_mkdevmap(pte_t pte)
|
||||
return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
|
||||
}
|
||||
|
||||
static inline void set_pte(pte_t *ptep, pte_t pte)
|
||||
static inline void __set_pte(pte_t *ptep, pte_t pte)
|
||||
{
|
||||
WRITE_ONCE(*ptep, pte);
|
||||
|
||||
@ -280,6 +285,11 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
|
||||
}
|
||||
}
|
||||
|
||||
static inline pte_t __ptep_get(pte_t *ptep)
|
||||
{
|
||||
return READ_ONCE(*ptep);
|
||||
}
|
||||
|
||||
extern void __sync_icache_dcache(pte_t pteval);
|
||||
bool pgattr_change_is_safe(u64 old, u64 new);
|
||||
|
||||
@ -307,7 +317,7 @@ static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep,
|
||||
if (!IS_ENABLED(CONFIG_DEBUG_VM))
|
||||
return;
|
||||
|
||||
old_pte = READ_ONCE(*ptep);
|
||||
old_pte = __ptep_get(ptep);
|
||||
|
||||
if (!pte_valid(old_pte) || !pte_valid(pte))
|
||||
return;
|
||||
@ -316,7 +326,7 @@ static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep,
|
||||
|
||||
/*
|
||||
* Check for potential race with hardware updates of the pte
|
||||
* (ptep_set_access_flags safely changes valid ptes without going
|
||||
* (__ptep_set_access_flags safely changes valid ptes without going
|
||||
* through an invalid entry).
|
||||
*/
|
||||
VM_WARN_ONCE(!pte_young(pte),
|
||||
@ -346,23 +356,38 @@ static inline void __sync_cache_and_tags(pte_t pte, unsigned int nr_pages)
|
||||
mte_sync_tags(pte, nr_pages);
|
||||
}
|
||||
|
||||
static inline void set_ptes(struct mm_struct *mm,
|
||||
unsigned long __always_unused addr,
|
||||
pte_t *ptep, pte_t pte, unsigned int nr)
|
||||
/*
|
||||
* Select all bits except the pfn
|
||||
*/
|
||||
static inline pgprot_t pte_pgprot(pte_t pte)
|
||||
{
|
||||
unsigned long pfn = pte_pfn(pte);
|
||||
|
||||
return __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte));
|
||||
}
|
||||
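pte_pgprot() above recovers the protection bits by rebuilding a pte from the pfn alone and XOR-ing it against the full value, so everything derived from the pfn cancels out and only the attribute bits remain. A standalone toy model of that trick, with an invented bit layout rather than the arm64 one::

  #include <stdint.h>
  #include <stdio.h>

  #define DEMO_PFN_SHIFT 12              /* invented layout: pfn above bit 12 */

  static uint64_t demo_pfn_pte(uint64_t pfn, uint64_t prot)
  {
          return (pfn << DEMO_PFN_SHIFT) | prot;
  }

  int main(void)
  {
          uint64_t pte  = demo_pfn_pte(0x1234, 0x7c3);   /* pfn plus prot bits */
          uint64_t pfn  = pte >> DEMO_PFN_SHIFT;
          uint64_t prot = demo_pfn_pte(pfn, 0) ^ pte;    /* pfn bits cancel out */

          printf("prot = 0x%llx\n", (unsigned long long)prot);  /* prints 0x7c3 */
          return 0;
  }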
|
||||
#define pte_advance_pfn pte_advance_pfn
|
||||
static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
|
||||
{
|
||||
return pfn_pte(pte_pfn(pte) + nr, pte_pgprot(pte));
|
||||
}
|
||||
|
||||
static inline void __set_ptes(struct mm_struct *mm,
|
||||
unsigned long __always_unused addr,
|
||||
pte_t *ptep, pte_t pte, unsigned int nr)
|
||||
{
|
||||
page_table_check_ptes_set(mm, ptep, pte, nr);
|
||||
__sync_cache_and_tags(pte, nr);
|
||||
|
||||
for (;;) {
|
||||
__check_safe_pte_update(mm, ptep, pte);
|
||||
set_pte(ptep, pte);
|
||||
__set_pte(ptep, pte);
|
||||
if (--nr == 0)
|
||||
break;
|
||||
ptep++;
|
||||
pte_val(pte) += PAGE_SIZE;
|
||||
pte = pte_advance_pfn(pte, 1);
|
||||
}
|
||||
}
|
||||
#define set_ptes set_ptes
|
||||
|
||||
/*
|
||||
* Huge pte definitions.
|
||||
@ -438,16 +463,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
|
||||
return clear_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
|
||||
}
|
||||
|
||||
/*
|
||||
* Select all bits except the pfn
|
||||
*/
|
||||
static inline pgprot_t pte_pgprot(pte_t pte)
|
||||
{
|
||||
unsigned long pfn = pte_pfn(pte);
|
||||
|
||||
return __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte));
|
||||
}
|
||||
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
/*
|
||||
* See the comment in include/linux/pgtable.h
|
||||
@ -539,7 +554,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
|
||||
{
|
||||
__sync_cache_and_tags(pte, nr);
|
||||
__check_safe_pte_update(mm, ptep, pte);
|
||||
set_pte(ptep, pte);
|
||||
__set_pte(ptep, pte);
|
||||
}
|
||||
|
||||
static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
|
||||
@ -1033,8 +1048,7 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
|
||||
return pte_pmd(pte_modify(pmd_pte(pmd), newprot));
|
||||
}
|
||||
|
||||
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
|
||||
extern int ptep_set_access_flags(struct vm_area_struct *vma,
|
||||
extern int __ptep_set_access_flags(struct vm_area_struct *vma,
|
||||
unsigned long address, pte_t *ptep,
|
||||
pte_t entry, int dirty);
|
||||
|
||||
@ -1044,7 +1058,8 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
|
||||
unsigned long address, pmd_t *pmdp,
|
||||
pmd_t entry, int dirty)
|
||||
{
|
||||
return ptep_set_access_flags(vma, address, (pte_t *)pmdp, pmd_pte(entry), dirty);
|
||||
return __ptep_set_access_flags(vma, address, (pte_t *)pmdp,
|
||||
pmd_pte(entry), dirty);
|
||||
}
|
||||
|
||||
static inline int pud_devmap(pud_t pud)
|
||||
@ -1078,12 +1093,13 @@ static inline bool pud_user_accessible_page(pud_t pud)
|
||||
/*
|
||||
* Atomic pte/pmd modifications.
|
||||
*/
|
||||
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
|
||||
static inline int __ptep_test_and_clear_young(pte_t *ptep)
|
||||
static inline int __ptep_test_and_clear_young(struct vm_area_struct *vma,
|
||||
unsigned long address,
|
||||
pte_t *ptep)
|
||||
{
|
||||
pte_t old_pte, pte;
|
||||
|
||||
pte = READ_ONCE(*ptep);
|
||||
pte = __ptep_get(ptep);
|
||||
do {
|
||||
old_pte = pte;
|
||||
pte = pte_mkold(pte);
|
||||
@ -1094,18 +1110,10 @@ static inline int __ptep_test_and_clear_young(pte_t *ptep)
|
||||
return pte_young(pte);
|
||||
}
|
||||
|
||||
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
|
||||
unsigned long address,
|
||||
pte_t *ptep)
|
||||
{
|
||||
return __ptep_test_and_clear_young(ptep);
|
||||
}
|
||||
|
||||
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
|
||||
static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
|
||||
static inline int __ptep_clear_flush_young(struct vm_area_struct *vma,
|
||||
unsigned long address, pte_t *ptep)
|
||||
{
|
||||
int young = ptep_test_and_clear_young(vma, address, ptep);
|
||||
int young = __ptep_test_and_clear_young(vma, address, ptep);
|
||||
|
||||
if (young) {
|
||||
/*
|
||||
@ -1128,12 +1136,11 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
|
||||
unsigned long address,
|
||||
pmd_t *pmdp)
|
||||
{
|
||||
return ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
|
||||
return __ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
|
||||
}
|
||||
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
|
||||
|
||||
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
|
||||
static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
|
||||
static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
|
||||
unsigned long address, pte_t *ptep)
|
||||
{
|
||||
pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
|
||||
@ -1143,6 +1150,37 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
|
||||
return pte;
|
||||
}
|
||||
|
||||
static inline void __clear_full_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, unsigned int nr, int full)
|
||||
{
|
||||
for (;;) {
|
||||
__ptep_get_and_clear(mm, addr, ptep);
|
||||
if (--nr == 0)
|
||||
break;
|
||||
ptep++;
|
||||
addr += PAGE_SIZE;
|
||||
}
|
||||
}
|
||||
|
||||
static inline pte_t __get_and_clear_full_ptes(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep,
|
||||
unsigned int nr, int full)
|
||||
{
|
||||
pte_t pte, tmp_pte;
|
||||
|
||||
pte = __ptep_get_and_clear(mm, addr, ptep);
|
||||
while (--nr) {
|
||||
ptep++;
|
||||
addr += PAGE_SIZE;
|
||||
tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
|
||||
if (pte_dirty(tmp_pte))
|
||||
pte = pte_mkdirty(pte);
|
||||
if (pte_young(tmp_pte))
|
||||
pte = pte_mkyoung(pte);
|
||||
}
|
||||
return pte;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
||||
#define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
|
||||
static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
|
||||
@ -1156,16 +1194,12 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
|
||||
}
|
||||
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
|
||||
|
||||
/*
|
||||
* ptep_set_wrprotect - mark read-only while transferring potential hardware
|
||||
* dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit.
|
||||
*/
|
||||
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
|
||||
static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
|
||||
static inline void ___ptep_set_wrprotect(struct mm_struct *mm,
|
||||
unsigned long address, pte_t *ptep,
|
||||
pte_t pte)
|
||||
{
|
||||
pte_t old_pte, pte;
|
||||
pte_t old_pte;
|
||||
|
||||
pte = READ_ONCE(*ptep);
|
||||
do {
|
||||
old_pte = pte;
|
||||
pte = pte_wrprotect(pte);
|
||||
@ -1174,12 +1208,31 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
|
||||
} while (pte_val(pte) != pte_val(old_pte));
|
||||
}
|
||||
|
||||
/*
|
||||
* __ptep_set_wrprotect - mark read-only while transferring potential hardware
|
||||
* dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit.
|
||||
*/
|
||||
static inline void __ptep_set_wrprotect(struct mm_struct *mm,
|
||||
unsigned long address, pte_t *ptep)
|
||||
{
|
||||
___ptep_set_wrprotect(mm, address, ptep, __ptep_get(ptep));
|
||||
}
|
||||
|
||||
static inline void __wrprotect_ptes(struct mm_struct *mm, unsigned long address,
|
||||
pte_t *ptep, unsigned int nr)
|
||||
{
|
||||
unsigned int i;
|
||||
|
||||
for (i = 0; i < nr; i++, address += PAGE_SIZE, ptep++)
|
||||
__ptep_set_wrprotect(mm, address, ptep);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
||||
#define __HAVE_ARCH_PMDP_SET_WRPROTECT
|
||||
static inline void pmdp_set_wrprotect(struct mm_struct *mm,
|
||||
unsigned long address, pmd_t *pmdp)
|
||||
{
|
||||
ptep_set_wrprotect(mm, address, (pte_t *)pmdp);
|
||||
__ptep_set_wrprotect(mm, address, (pte_t *)pmdp);
|
||||
}
|
||||
|
||||
#define pmdp_establish pmdp_establish
|
||||
@ -1257,7 +1310,7 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
|
||||
#endif /* CONFIG_ARM64_MTE */
|
||||
|
||||
/*
|
||||
* On AArch64, the cache coherency is handled via the set_pte_at() function.
|
||||
* On AArch64, the cache coherency is handled via the __set_ptes() function.
|
||||
*/
|
||||
static inline void update_mmu_cache_range(struct vm_fault *vmf,
|
||||
struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
|
||||
@ -1309,6 +1362,282 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
|
||||
extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep,
|
||||
pte_t old_pte, pte_t new_pte);
|
||||
|
||||
#ifdef CONFIG_ARM64_CONTPTE
|
||||
|
||||
/*
|
||||
* The contpte APIs are used to transparently manage the contiguous bit in ptes
|
||||
* where it is possible and makes sense to do so. The PTE_CONT bit is considered
|
||||
* a private implementation detail of the public ptep API (see below).
|
||||
*/
|
||||
extern void __contpte_try_fold(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pte);
|
||||
extern void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pte);
|
||||
extern pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte);
|
||||
extern pte_t contpte_ptep_get_lockless(pte_t *orig_ptep);
|
||||
extern void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pte, unsigned int nr);
|
||||
extern void contpte_clear_full_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, unsigned int nr, int full);
|
||||
extern pte_t contpte_get_and_clear_full_ptes(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep,
|
||||
unsigned int nr, int full);
|
||||
extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep);
|
||||
extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep);
|
||||
extern void contpte_wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, unsigned int nr);
|
||||
extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep,
|
||||
pte_t entry, int dirty);
|
||||
|
||||
static __always_inline void contpte_try_fold(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep, pte_t pte)
|
||||
{
|
||||
/*
|
||||
* Only bother trying if both the virtual and physical addresses are
|
||||
* aligned and correspond to the last entry in a contig range. The core
|
||||
* code mostly modifies ranges from low to high, so this is likely
|
||||
* the last modification in the contig range, so a good time to fold.
|
||||
* We can't fold special mappings, because there is no associated folio.
|
||||
*/
|
||||
|
||||
const unsigned long contmask = CONT_PTES - 1;
|
||||
bool valign = ((addr >> PAGE_SHIFT) & contmask) == contmask;
|
||||
|
||||
if (unlikely(valign)) {
|
||||
bool palign = (pte_pfn(pte) & contmask) == contmask;
|
||||
|
||||
if (unlikely(palign &&
|
||||
pte_valid(pte) && !pte_cont(pte) && !pte_special(pte)))
|
||||
__contpte_try_fold(mm, addr, ptep, pte);
|
||||
}
|
||||
}
|
||||
|
||||
static __always_inline void contpte_try_unfold(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep, pte_t pte)
|
||||
{
|
||||
if (unlikely(pte_valid_cont(pte)))
|
||||
__contpte_try_unfold(mm, addr, ptep, pte);
|
||||
}
|
||||
|
||||
#define pte_batch_hint pte_batch_hint
|
||||
static inline unsigned int pte_batch_hint(pte_t *ptep, pte_t pte)
|
||||
{
|
||||
if (!pte_valid_cont(pte))
|
||||
return 1;
|
||||
|
||||
return CONT_PTES - (((unsigned long)ptep >> 3) & (CONT_PTES - 1));
|
||||
}
|
||||
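pte_batch_hint() turns the pointer itself into a position: with 8-byte PTEs, ptep >> 3 is the PTE index, masking with CONT_PTES - 1 gives the offset inside the block, and the hint is the number of entries left until the block boundary. A worked example of that arithmetic, assuming CONT_PTES = 16 and a hypothetical table address::

  #include <stdint.h>
  #include <stdio.h>

  #define CONT_PTES 16                   /* assumed contpte block size */

  int main(void)
  {
          /* Hypothetical page-table address: the 6th entry of a block. */
          uint64_t ptep = 0xffff000012345000ULL + 5 * 8;

          unsigned int pos  = (ptep >> 3) & (CONT_PTES - 1);  /* 5 */
          unsigned int hint = CONT_PTES - pos;                /* 11 entries left */

          printf("position %u, batch hint %u\n", pos, hint);
          return 0;
  }

CONT_PTES being a power of two is what lets the mask stand in for a modulo here.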
|
||||
/*
|
||||
* The below functions constitute the public API that arm64 presents to the
|
||||
* core-mm to manipulate PTE entries within their page tables (or at least this
|
||||
* is the subset of the API that arm64 needs to implement). These public
|
||||
* versions will automatically and transparently apply the contiguous bit where
|
||||
* it makes sense to do so. Therefore any users that are contig-aware (e.g.
|
||||
* hugetlb, kernel mapper) should NOT use these APIs, but instead use the
|
||||
* private versions, which are prefixed with double underscore. All of these
|
||||
* APIs except for ptep_get_lockless() are expected to be called with the PTL
|
||||
* held. Although the contiguous bit is considered private to the
|
||||
* implementation, it is deliberately allowed to leak through the getters (e.g.
|
||||
* ptep_get()), back to core code. This is required so that pte_leaf_size() can
|
||||
* provide an accurate size for perf_get_pgtable_size(). But this leakage means
|
||||
* its possible a pte will be passed to a setter with the contiguous bit set, so
|
||||
* we explicitly clear the contiguous bit in those cases to prevent accidentally
|
||||
* setting it in the pgtable.
|
||||
*/
|
||||
|
||||
#define ptep_get ptep_get
|
||||
static inline pte_t ptep_get(pte_t *ptep)
|
||||
{
|
||||
pte_t pte = __ptep_get(ptep);
|
||||
|
||||
if (likely(!pte_valid_cont(pte)))
|
||||
return pte;
|
||||
|
||||
return contpte_ptep_get(ptep, pte);
|
||||
}
|
||||
|
||||
#define ptep_get_lockless ptep_get_lockless
|
||||
static inline pte_t ptep_get_lockless(pte_t *ptep)
|
||||
{
|
||||
pte_t pte = __ptep_get(ptep);
|
||||
|
||||
if (likely(!pte_valid_cont(pte)))
|
||||
return pte;
|
||||
|
||||
return contpte_ptep_get_lockless(ptep);
|
||||
}
|
||||
|
||||
static inline void set_pte(pte_t *ptep, pte_t pte)
|
||||
{
|
||||
/*
|
||||
* We don't have the mm or vaddr so cannot unfold contig entries (since
|
||||
* it requires tlb maintenance). set_pte() is not used in core code, so
|
||||
* this should never even be called. Regardless do our best to service
|
||||
* any call and emit a warning if there is any attempt to set a pte on
|
||||
* top of an existing contig range.
|
||||
*/
|
||||
pte_t orig_pte = __ptep_get(ptep);
|
||||
|
||||
WARN_ON_ONCE(pte_valid_cont(orig_pte));
|
||||
__set_pte(ptep, pte_mknoncont(pte));
|
||||
}
|
||||
|
||||
#define set_ptes set_ptes
|
||||
static __always_inline void set_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pte, unsigned int nr)
|
||||
{
|
||||
pte = pte_mknoncont(pte);
|
||||
|
||||
if (likely(nr == 1)) {
|
||||
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
|
||||
__set_ptes(mm, addr, ptep, pte, 1);
|
||||
contpte_try_fold(mm, addr, ptep, pte);
|
||||
} else {
|
||||
contpte_set_ptes(mm, addr, ptep, pte, nr);
|
||||
}
|
||||
}
|
||||
|
||||
static inline void pte_clear(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep)
|
||||
{
|
||||
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
|
||||
__pte_clear(mm, addr, ptep);
|
||||
}
|
||||
|
||||
#define clear_full_ptes clear_full_ptes
|
||||
static inline void clear_full_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, unsigned int nr, int full)
|
||||
{
|
||||
if (likely(nr == 1)) {
|
||||
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
|
||||
__clear_full_ptes(mm, addr, ptep, nr, full);
|
||||
} else {
|
||||
contpte_clear_full_ptes(mm, addr, ptep, nr, full);
|
||||
}
|
||||
}
|
||||
|
||||
#define get_and_clear_full_ptes get_and_clear_full_ptes
|
||||
static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep,
|
||||
unsigned int nr, int full)
|
||||
{
|
||||
pte_t pte;
|
||||
|
||||
if (likely(nr == 1)) {
|
||||
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
|
||||
pte = __get_and_clear_full_ptes(mm, addr, ptep, nr, full);
|
||||
} else {
|
||||
pte = contpte_get_and_clear_full_ptes(mm, addr, ptep, nr, full);
|
||||
}
|
||||
|
||||
return pte;
|
||||
}
|
||||
|
||||
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
|
||||
static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep)
|
||||
{
|
||||
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
|
||||
return __ptep_get_and_clear(mm, addr, ptep);
|
||||
}
|
||||
|
||||
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
|
||||
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep)
|
||||
{
|
||||
pte_t orig_pte = __ptep_get(ptep);
|
||||
|
||||
if (likely(!pte_valid_cont(orig_pte)))
|
||||
return __ptep_test_and_clear_young(vma, addr, ptep);
|
||||
|
||||
return contpte_ptep_test_and_clear_young(vma, addr, ptep);
|
||||
}
|
||||
|
||||
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
|
||||
static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep)
|
||||
{
|
||||
pte_t orig_pte = __ptep_get(ptep);
|
||||
|
||||
if (likely(!pte_valid_cont(orig_pte)))
|
||||
return __ptep_clear_flush_young(vma, addr, ptep);
|
||||
|
||||
return contpte_ptep_clear_flush_young(vma, addr, ptep);
|
||||
}
|
||||
|
||||
#define wrprotect_ptes wrprotect_ptes
|
||||
static __always_inline void wrprotect_ptes(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep, unsigned int nr)
|
||||
{
|
||||
if (likely(nr == 1)) {
|
||||
/*
|
||||
* Optimization: wrprotect_ptes() can only be called for present
|
||||
* ptes so we only need to check contig bit as condition for
|
||||
* unfold, and we can remove the contig bit from the pte we read
|
||||
* to avoid re-reading. This speeds up fork() which is sensitive
|
||||
* for order-0 folios. Equivalent to contpte_try_unfold().
|
||||
*/
|
||||
pte_t orig_pte = __ptep_get(ptep);
|
||||
|
||||
if (unlikely(pte_cont(orig_pte))) {
|
||||
__contpte_try_unfold(mm, addr, ptep, orig_pte);
|
||||
orig_pte = pte_mknoncont(orig_pte);
|
||||
}
|
||||
___ptep_set_wrprotect(mm, addr, ptep, orig_pte);
|
||||
} else {
|
||||
contpte_wrprotect_ptes(mm, addr, ptep, nr);
|
||||
}
|
||||
}
|
||||
|
||||
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
|
||||
static inline void ptep_set_wrprotect(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep)
|
||||
{
|
||||
wrprotect_ptes(mm, addr, ptep, 1);
|
||||
}
|
||||
|
||||
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
|
||||
static inline int ptep_set_access_flags(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep,
|
||||
pte_t entry, int dirty)
|
||||
{
|
||||
pte_t orig_pte = __ptep_get(ptep);
|
||||
|
||||
entry = pte_mknoncont(entry);
|
||||
|
||||
if (likely(!pte_valid_cont(orig_pte)))
|
||||
return __ptep_set_access_flags(vma, addr, ptep, entry, dirty);
|
||||
|
||||
return contpte_ptep_set_access_flags(vma, addr, ptep, entry, dirty);
|
||||
}
|
||||
|
||||
#else /* CONFIG_ARM64_CONTPTE */
|
||||
|
||||
#define ptep_get __ptep_get
|
||||
#define set_pte __set_pte
|
||||
#define set_ptes __set_ptes
|
||||
#define pte_clear __pte_clear
|
||||
#define clear_full_ptes __clear_full_ptes
|
||||
#define get_and_clear_full_ptes __get_and_clear_full_ptes
|
||||
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
|
||||
#define ptep_get_and_clear __ptep_get_and_clear
|
||||
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
|
||||
#define ptep_test_and_clear_young __ptep_test_and_clear_young
|
||||
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
|
||||
#define ptep_clear_flush_young __ptep_clear_flush_young
|
||||
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
|
||||
#define ptep_set_wrprotect __ptep_set_wrprotect
|
||||
#define wrprotect_ptes __wrprotect_ptes
|
||||
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
|
||||
#define ptep_set_access_flags __ptep_set_access_flags
|
||||
|
||||
#endif /* CONFIG_ARM64_CONTPTE */
|
||||
|
||||
#endif /* !__ASSEMBLY__ */
|
||||
|
||||
#endif /* __ASM_PGTABLE_H */
|
||||
|
@ -29,13 +29,6 @@ void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name);
|
||||
static inline void ptdump_debugfs_register(struct ptdump_info *info,
|
||||
const char *name) { }
|
||||
#endif
|
||||
void ptdump_check_wx(void);
|
||||
#endif /* CONFIG_PTDUMP_CORE */
|
||||
|
||||
#ifdef CONFIG_DEBUG_WX
|
||||
#define debug_checkwx() ptdump_check_wx()
|
||||
#else
|
||||
#define debug_checkwx() do { } while (0)
|
||||
#endif
|
||||
|
||||
#endif /* __ASM_PTDUMP_H */
|
||||
|
@ -422,7 +422,7 @@ do { \
|
||||
#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \
|
||||
__flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false, kvm_lpa2_is_enabled());
|
||||
|
||||
static inline void __flush_tlb_range(struct vm_area_struct *vma,
|
||||
static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
|
||||
unsigned long start, unsigned long end,
|
||||
unsigned long stride, bool last_level,
|
||||
int tlb_level)
|
||||
@ -456,10 +456,19 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
|
||||
__flush_tlb_range_op(vae1is, start, pages, stride, asid,
|
||||
tlb_level, true, lpa2_is_enabled());
|
||||
|
||||
dsb(ish);
|
||||
mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
|
||||
}
|
||||
|
||||
static inline void __flush_tlb_range(struct vm_area_struct *vma,
|
||||
unsigned long start, unsigned long end,
|
||||
unsigned long stride, bool last_level,
|
||||
int tlb_level)
|
||||
{
|
||||
__flush_tlb_range_nosync(vma, start, end, stride,
|
||||
last_level, tlb_level);
|
||||
dsb(ish);
|
||||
}
|
||||
|
||||
static inline void flush_tlb_range(struct vm_area_struct *vma,
|
||||
unsigned long start, unsigned long end)
|
||||
{
|
||||
|
@ -65,7 +65,7 @@ obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file.o kexec_image.o
|
||||
obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test.o
|
||||
arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
|
||||
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
|
||||
obj-$(CONFIG_CRASH_CORE) += crash_core.o
|
||||
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
|
||||
obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
|
||||
obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
|
||||
obj-$(CONFIG_ARM64_MTE) += mte.o
|
||||
|
@ -103,7 +103,7 @@ static int __init set_permissions(pte_t *ptep, unsigned long addr, void *data)
|
||||
{
|
||||
struct set_perm_data *spd = data;
|
||||
const efi_memory_desc_t *md = spd->md;
|
||||
pte_t pte = READ_ONCE(*ptep);
|
||||
pte_t pte = __ptep_get(ptep);
|
||||
|
||||
if (md->attribute & EFI_MEMORY_RO)
|
||||
pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
|
||||
@ -111,7 +111,7 @@ static int __init set_permissions(pte_t *ptep, unsigned long addr, void *data)
|
||||
pte = set_pte_bit(pte, __pgprot(PTE_PXN));
|
||||
else if (system_supports_bti_kernel() && spd->has_bti)
|
||||
pte = set_pte_bit(pte, __pgprot(PTE_GP));
|
||||
set_pte(ptep, pte);
|
||||
__set_pte(ptep, pte);
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
@ -255,7 +255,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
|
||||
pr_info("Starting crashdump kernel...\n");
|
||||
}
|
||||
|
||||
#ifdef CONFIG_HIBERNATION
|
||||
#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_HIBERNATION)
|
||||
/*
|
||||
* To preserve the crash dump kernel image, the relevant memory segments
|
||||
* should be mapped again around the hibernation.
|
||||
|
@ -39,6 +39,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
|
||||
return kexec_image_post_load_cleanup_default(image);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_CRASH_DUMP
|
||||
static int prepare_elf_headers(void **addr, unsigned long *sz)
|
||||
{
|
||||
struct crash_mem *cmem;
|
||||
@ -80,6 +81,7 @@ out:
|
||||
kfree(cmem);
|
||||
return ret;
|
||||
}
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Tries to add the initrd and DTB to the image. If it is not possible to find
|
||||
@ -93,8 +95,8 @@ int load_other_segments(struct kimage *image,
|
||||
char *cmdline)
|
||||
{
|
||||
struct kexec_buf kbuf;
|
||||
void *headers, *dtb = NULL;
|
||||
unsigned long headers_sz, initrd_load_addr = 0, dtb_len,
|
||||
void *dtb = NULL;
|
||||
unsigned long initrd_load_addr = 0, dtb_len,
|
||||
orig_segments = image->nr_segments;
|
||||
int ret = 0;
|
||||
|
||||
@ -102,7 +104,10 @@ int load_other_segments(struct kimage *image,
|
||||
/* not allocate anything below the kernel */
|
||||
kbuf.buf_min = kernel_load_addr + kernel_size;
|
||||
|
||||
#ifdef CONFIG_CRASH_DUMP
|
||||
/* load elf core header */
|
||||
void *headers;
|
||||
unsigned long headers_sz;
|
||||
if (image->type == KEXEC_TYPE_CRASH) {
|
||||
ret = prepare_elf_headers(&headers, &headers_sz);
|
||||
if (ret) {
|
||||
@ -130,6 +135,7 @@ int load_other_segments(struct kimage *image,
|
||||
kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
|
||||
image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
|
||||
}
|
||||
#endif
|
||||
|
||||
/* load initrd */
|
||||
if (initrd) {
|
||||
|
@ -67,7 +67,7 @@ int memcmp_pages(struct page *page1, struct page *page2)
|
||||
/*
|
||||
* If the page content is identical but at least one of the pages is
|
||||
* tagged, return non-zero to avoid KSM merging. If only one of the
|
||||
* pages is tagged, set_pte_at() may zero or change the tags of the
|
||||
* pages is tagged, __set_ptes() may zero or change the tags of the
|
||||
* other page via mte_sync_tags().
|
||||
*/
|
||||
if (page_mte_tagged(page1) || page_mte_tagged(page2))
|
||||
|
@ -4,7 +4,7 @@
|
||||
* Copyright (C) Huawei Futurewei Technologies.
|
||||
*/
|
||||
|
||||
#include <linux/crash_core.h>
|
||||
#include <linux/vmcore_info.h>
|
||||
#include <asm/cpufeature.h>
|
||||
#include <asm/memory.h>
|
||||
#include <asm/pgtable-hwdef.h>
|
||||
@ -23,7 +23,6 @@ void arch_crash_save_vmcoreinfo(void)
|
||||
/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
|
||||
vmcoreinfo_append_str("NUMBER(MODULES_VADDR)=0x%lx\n", MODULES_VADDR);
|
||||
vmcoreinfo_append_str("NUMBER(MODULES_END)=0x%lx\n", MODULES_END);
|
||||
vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START);
|
||||
vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%lx\n", VMALLOC_END);
|
||||
vmcoreinfo_append_str("NUMBER(VMEMMAP_START)=0x%lx\n", VMEMMAP_START);
|
||||
vmcoreinfo_append_str("NUMBER(VMEMMAP_END)=0x%lx\n", VMEMMAP_END);
|
@ -1072,7 +1072,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
|
||||
} else {
|
||||
/*
|
||||
* Only locking to serialise with a concurrent
|
||||
* set_pte_at() in the VMM but still overriding the
|
||||
* __set_ptes() in the VMM but still overriding the
|
||||
* tags, hence ignoring the return value.
|
||||
*/
|
||||
try_page_mte_tagging(page);
|
||||
|
@ -3,6 +3,7 @@ obj-y := dma-mapping.o extable.o fault.o init.o \
|
||||
cache.o copypage.o flush.o \
|
||||
ioremap.o mmap.o pgd.o mmu.o \
|
||||
context.o proc.o pageattr.o fixmap.o
|
||||
obj-$(CONFIG_ARM64_CONTPTE) += contpte.o
|
||||
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
|
||||
obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
|
||||
obj-$(CONFIG_PTDUMP_DEBUGFS) += ptdump_debugfs.o
|
||||
|
arch/arm64/mm/contpte.c (new file)
@ -0,0 +1,408 @@
|
||||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
/*
|
||||
* Copyright (C) 2023 ARM Ltd.
|
||||
*/
|
||||
|
||||
#include <linux/mm.h>
|
||||
#include <linux/efi.h>
|
||||
#include <linux/export.h>
|
||||
#include <asm/tlbflush.h>
|
||||
|
||||
static inline bool mm_is_user(struct mm_struct *mm)
|
||||
{
|
||||
/*
|
||||
* Don't attempt to apply the contig bit to kernel mappings, because
|
||||
* dynamically adding/removing the contig bit can cause page faults.
|
||||
* These racing faults are ok for user space, since they get serialized
|
||||
* on the PTL. But kernel mappings can't tolerate faults.
|
||||
*/
|
||||
if (unlikely(mm_is_efi(mm)))
|
||||
return false;
|
||||
return mm != &init_mm;
|
||||
}
|
||||
|
||||
static inline pte_t *contpte_align_down(pte_t *ptep)
|
||||
{
|
||||
return PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * CONT_PTES);
|
||||
}
|
||||
|
||||
static void contpte_try_unfold_partial(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, unsigned int nr)
|
||||
{
|
||||
/*
|
||||
* Unfold any partially covered contpte block at the beginning and end
|
||||
* of the range.
|
||||
*/
|
||||
|
||||
if (ptep != contpte_align_down(ptep) || nr < CONT_PTES)
|
||||
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
|
||||
|
||||
if (ptep + nr != contpte_align_down(ptep + nr)) {
|
||||
unsigned long last_addr = addr + PAGE_SIZE * (nr - 1);
|
||||
pte_t *last_ptep = ptep + nr - 1;
|
||||
|
||||
contpte_try_unfold(mm, last_addr, last_ptep,
|
||||
__ptep_get(last_ptep));
|
||||
}
|
||||
}
|
||||
|
||||
static void contpte_convert(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pte)
|
||||
{
|
||||
struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
|
||||
unsigned long start_addr;
|
||||
pte_t *start_ptep;
|
||||
int i;
|
||||
|
||||
start_ptep = ptep = contpte_align_down(ptep);
|
||||
start_addr = addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
|
||||
pte = pfn_pte(ALIGN_DOWN(pte_pfn(pte), CONT_PTES), pte_pgprot(pte));
|
||||
|
||||
for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) {
|
||||
pte_t ptent = __ptep_get_and_clear(mm, addr, ptep);
|
||||
|
||||
if (pte_dirty(ptent))
|
||||
pte = pte_mkdirty(pte);
|
||||
|
||||
if (pte_young(ptent))
|
||||
pte = pte_mkyoung(pte);
|
||||
}
|
||||
|
||||
__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
|
||||
|
||||
__set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
|
||||
}
|
||||
|
||||
void __contpte_try_fold(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pte)
|
||||
{
|
||||
/*
|
||||
* We have already checked that the virtual and physical addresses are
|
||||
* correctly aligned for a contpte mapping in contpte_try_fold() so the
|
||||
* remaining checks are to ensure that the contpte range is fully
|
||||
* covered by a single folio, and ensure that all the ptes are valid
|
||||
* with contiguous PFNs and matching prots. We ignore the state of the
|
||||
* access and dirty bits for the purpose of deciding if its a contiguous
|
||||
* range; the folding process will generate a single contpte entry which
|
||||
* has a single access and dirty bit. Those 2 bits are the logical OR of
|
||||
* their respective bits in the constituent pte entries. In order to
|
||||
* ensure the contpte range is covered by a single folio, we must
|
||||
* recover the folio from the pfn, but special mappings don't have a
|
||||
* folio backing them. Fortunately contpte_try_fold() already checked
|
||||
* that the pte is not special - we never try to fold special mappings.
|
||||
* Note we can't use vm_normal_page() for this since we don't have the
|
||||
* vma.
|
||||
*/
|
||||
|
||||
unsigned long folio_start, folio_end;
|
||||
unsigned long cont_start, cont_end;
|
||||
pte_t expected_pte, subpte;
|
||||
struct folio *folio;
|
||||
struct page *page;
|
||||
unsigned long pfn;
|
||||
pte_t *orig_ptep;
|
||||
pgprot_t prot;
|
||||
|
||||
int i;
|
||||
|
||||
if (!mm_is_user(mm))
|
||||
return;
|
||||
|
||||
page = pte_page(pte);
|
||||
folio = page_folio(page);
|
||||
folio_start = addr - (page - &folio->page) * PAGE_SIZE;
|
||||
folio_end = folio_start + folio_nr_pages(folio) * PAGE_SIZE;
|
||||
cont_start = ALIGN_DOWN(addr, CONT_PTE_SIZE);
|
||||
cont_end = cont_start + CONT_PTE_SIZE;
|
||||
|
||||
if (folio_start > cont_start || folio_end < cont_end)
|
||||
return;
|
||||
|
||||
pfn = ALIGN_DOWN(pte_pfn(pte), CONT_PTES);
|
||||
prot = pte_pgprot(pte_mkold(pte_mkclean(pte)));
|
||||
expected_pte = pfn_pte(pfn, prot);
|
||||
orig_ptep = ptep;
|
||||
ptep = contpte_align_down(ptep);
|
||||
|
||||
for (i = 0; i < CONT_PTES; i++) {
|
||||
subpte = pte_mkold(pte_mkclean(__ptep_get(ptep)));
|
||||
if (!pte_same(subpte, expected_pte))
|
||||
return;
|
||||
expected_pte = pte_advance_pfn(expected_pte, 1);
|
||||
ptep++;
|
||||
}
|
||||
|
||||
pte = pte_mkcont(pte);
|
||||
contpte_convert(mm, addr, orig_ptep, pte);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__contpte_try_fold);
|
||||
|
||||
void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pte)
|
||||
{
|
||||
/*
|
||||
* We have already checked that the ptes are contiguous in
|
||||
* contpte_try_unfold(), so just check that the mm is user space.
|
||||
*/
|
||||
if (!mm_is_user(mm))
|
||||
return;
|
||||
|
||||
pte = pte_mknoncont(pte);
|
||||
contpte_convert(mm, addr, ptep, pte);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__contpte_try_unfold);
|
||||
|
||||
pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
|
||||
{
|
||||
/*
|
||||
* Gather access/dirty bits, which may be populated in any of the ptes
|
||||
* of the contig range. We are guaranteed to be holding the PTL, so any
|
||||
* contiguous range cannot be unfolded or otherwise modified under our
|
||||
* feet.
|
||||
*/
|
||||
|
||||
pte_t pte;
|
||||
int i;
|
||||
|
||||
ptep = contpte_align_down(ptep);
|
||||
|
||||
for (i = 0; i < CONT_PTES; i++, ptep++) {
|
||||
pte = __ptep_get(ptep);
|
||||
|
||||
if (pte_dirty(pte))
|
||||
orig_pte = pte_mkdirty(orig_pte);
|
||||
|
||||
if (pte_young(pte))
|
||||
orig_pte = pte_mkyoung(orig_pte);
|
||||
}
|
||||
|
||||
return orig_pte;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(contpte_ptep_get);
|
||||
|
||||
pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
|
||||
{
|
||||
/*
|
||||
* The ptep_get_lockless() API requires us to read and return *orig_ptep
|
||||
* so that it is self-consistent, without the PTL held, so we may be
|
||||
* racing with other threads modifying the pte. Usually a READ_ONCE()
|
||||
* would suffice, but for the contpte case, we also need to gather the
|
||||
* access and dirty bits from across all ptes in the contiguous block,
|
||||
* and we can't read all of those neighbouring ptes atomically, so any
|
||||
* contiguous range may be unfolded/modified/refolded under our feet.
|
||||
* Therefore we ensure we read a _consistent_ contpte range by checking
|
||||
* that all ptes in the range are valid and have CONT_PTE set, that all
|
||||
* pfns are contiguous and that all pgprots are the same (ignoring
|
||||
* access/dirty). If we find a pte that is not consistent, then we must
|
||||
* be racing with an update so start again. If the target pte does not
|
||||
* have CONT_PTE set then that is considered consistent on its own
|
||||
* because it is not part of a contpte range.
|
||||
*/
|
||||
|
||||
pgprot_t orig_prot;
|
||||
unsigned long pfn;
|
||||
pte_t orig_pte;
|
||||
pgprot_t prot;
|
||||
pte_t *ptep;
|
||||
pte_t pte;
|
||||
int i;
|
||||
|
||||
retry:
|
||||
orig_pte = __ptep_get(orig_ptep);
|
||||
|
||||
if (!pte_valid_cont(orig_pte))
|
||||
return orig_pte;
|
||||
|
||||
orig_prot = pte_pgprot(pte_mkold(pte_mkclean(orig_pte)));
|
||||
ptep = contpte_align_down(orig_ptep);
|
||||
pfn = pte_pfn(orig_pte) - (orig_ptep - ptep);
|
||||
|
||||
for (i = 0; i < CONT_PTES; i++, ptep++, pfn++) {
|
||||
pte = __ptep_get(ptep);
|
||||
prot = pte_pgprot(pte_mkold(pte_mkclean(pte)));
|
||||
|
||||
if (!pte_valid_cont(pte) ||
|
||||
pte_pfn(pte) != pfn ||
|
||||
pgprot_val(prot) != pgprot_val(orig_prot))
|
||||
goto retry;
|
||||
|
||||
if (pte_dirty(pte))
|
||||
orig_pte = pte_mkdirty(orig_pte);
|
||||
|
||||
if (pte_young(pte))
|
||||
orig_pte = pte_mkyoung(orig_pte);
|
||||
}
|
||||
|
||||
return orig_pte;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(contpte_ptep_get_lockless);
|
||||
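contpte_ptep_get_lockless() above follows a read-validate-retry discipline: take a snapshot, walk the block, and restart as soon as any neighbour fails the consistency checks. A minimal generic sketch of that pattern, using an invented two-field record that a writer always updates as a unit::

  #include <stdatomic.h>

  struct demo_rec {
          _Atomic unsigned long a;
          _Atomic unsigned long b;   /* writers keep a and b equal */
  };

  /* Loop until the two loads observe a matching pair, i.e. we did not race
   * with a writer mid-update; this mirrors the retry in the kernel routine,
   * which re-reads the whole contpte block on any mismatch. */
  static unsigned long read_consistent(struct demo_rec *r)
  {
          unsigned long va, vb;

          do {
                  va = atomic_load(&r->a);
                  vb = atomic_load(&r->b);
          } while (va != vb);

          return va;
  }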
|
||||
void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pte, unsigned int nr)
|
||||
{
|
||||
unsigned long next;
|
||||
unsigned long end;
|
||||
unsigned long pfn;
|
||||
pgprot_t prot;
|
||||
|
||||
/*
|
||||
* The set_ptes() spec guarantees that when nr > 1, the initial state of
|
||||
* all ptes is not-present. Therefore we never need to unfold or
|
||||
* otherwise invalidate a range before we set the new ptes.
|
||||
* contpte_set_ptes() should never be called for nr < 2.
|
||||
*/
|
||||
VM_WARN_ON(nr == 1);
|
||||
|
||||
if (!mm_is_user(mm))
|
||||
return __set_ptes(mm, addr, ptep, pte, nr);
|
||||
|
||||
end = addr + (nr << PAGE_SHIFT);
|
||||
pfn = pte_pfn(pte);
|
||||
prot = pte_pgprot(pte);
|
||||
|
||||
do {
|
||||
next = pte_cont_addr_end(addr, end);
|
||||
nr = (next - addr) >> PAGE_SHIFT;
|
||||
pte = pfn_pte(pfn, prot);
|
||||
|
||||
if (((addr | next | (pfn << PAGE_SHIFT)) & ~CONT_PTE_MASK) == 0)
|
||||
pte = pte_mkcont(pte);
|
||||
else
|
||||
pte = pte_mknoncont(pte);
|
||||
|
||||
__set_ptes(mm, addr, ptep, pte, nr);
|
||||
|
||||
addr = next;
|
||||
ptep += nr;
|
||||
pfn += nr;
|
||||
|
||||
} while (addr != end);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(contpte_set_ptes);
|
||||
|
||||
void contpte_clear_full_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, unsigned int nr, int full)
|
||||
{
|
||||
contpte_try_unfold_partial(mm, addr, ptep, nr);
|
||||
__clear_full_ptes(mm, addr, ptep, nr, full);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(contpte_clear_full_ptes);
|
||||
|
||||
pte_t contpte_get_and_clear_full_ptes(struct mm_struct *mm,
|
||||
unsigned long addr, pte_t *ptep,
|
||||
unsigned int nr, int full)
|
||||
{
|
||||
contpte_try_unfold_partial(mm, addr, ptep, nr);
|
||||
return __get_and_clear_full_ptes(mm, addr, ptep, nr, full);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(contpte_get_and_clear_full_ptes);
|
||||
|
||||
int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep)
|
||||
{
|
||||
/*
|
||||
* ptep_clear_flush_young() technically requires us to clear the access
|
||||
* flag for a _single_ pte. However, the core-mm code actually tracks
|
||||
* access/dirty per folio, not per page. And since we only create a
|
||||
* contig range when the range is covered by a single folio, we can get
|
||||
* away with clearing young for the whole contig range here, so we avoid
|
||||
* having to unfold.
|
||||
*/
|
||||
|
||||
int young = 0;
|
||||
int i;
|
||||
|
||||
ptep = contpte_align_down(ptep);
|
||||
addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
|
||||
|
||||
for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE)
|
||||
young |= __ptep_test_and_clear_young(vma, addr, ptep);
|
||||
|
||||
return young;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(contpte_ptep_test_and_clear_young);
|
||||
|
||||
int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep)
|
||||
{
|
||||
int young;
|
||||
|
||||
young = contpte_ptep_test_and_clear_young(vma, addr, ptep);
|
||||
|
||||
if (young) {
|
||||
/*
|
||||
* See comment in __ptep_clear_flush_young(); same rationale for
|
||||
* eliding the trailing DSB applies here.
|
||||
*/
|
||||
addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
|
||||
__flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE,
|
||||
PAGE_SIZE, true, 3);
|
||||
}
|
||||
|
||||
return young;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(contpte_ptep_clear_flush_young);
|
||||
|
||||
void contpte_wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, unsigned int nr)
|
||||
{
|
||||
/*
|
||||
* If wrprotecting an entire contig range, we can avoid unfolding. Just
|
||||
* set wrprotect and wait for the later mmu_gather flush to invalidate
|
||||
* the tlb. Until the flush, the page may or may not be wrprotected.
|
||||
* After the flush, it is guaranteed wrprotected. If it's a partial
|
||||
* range though, we must unfold, because we can't have a case where
|
||||
* CONT_PTE is set but wrprotect applies to a subset of the PTEs; this
|
||||
* would cause it to continue to be unpredictable after the flush.
|
||||
*/
|
||||
|
||||
contpte_try_unfold_partial(mm, addr, ptep, nr);
|
||||
__wrprotect_ptes(mm, addr, ptep, nr);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(contpte_wrprotect_ptes);
|
||||
|
||||
int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
|
||||
unsigned long addr, pte_t *ptep,
|
||||
pte_t entry, int dirty)
|
||||
{
|
||||
unsigned long start_addr;
|
||||
pte_t orig_pte;
|
||||
int i;
|
||||
|
||||
/*
|
||||
* Gather the access/dirty bits for the contiguous range. If nothing has
|
||||
* changed, its a noop.
|
||||
*/
|
||||
orig_pte = pte_mknoncont(ptep_get(ptep));
|
||||
if (pte_val(orig_pte) == pte_val(entry))
|
||||
return 0;
|
||||
|
||||
/*
|
||||
* We can fix up access/dirty bits without having to unfold the contig
|
||||
* range. But if the write bit is changing, we must unfold.
|
||||
*/
|
||||
if (pte_write(orig_pte) == pte_write(entry)) {
|
||||
/*
|
||||
* For HW access management, we technically only need to update
|
||||
* the flag on a single pte in the range. But for SW access
|
||||
* management, we need to update all the ptes to prevent extra
|
||||
* faults. Avoid per-page tlb flush in __ptep_set_access_flags()
|
||||
* and instead flush the whole range at the end.
|
||||
*/
|
||||
ptep = contpte_align_down(ptep);
|
||||
start_addr = addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
|
||||
|
||||
for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE)
|
||||
__ptep_set_access_flags(vma, addr, ptep, entry, 0);
|
||||
|
||||
if (dirty)
|
||||
__flush_tlb_range(vma, start_addr, addr,
|
||||
PAGE_SIZE, true, 3);
|
||||
} else {
|
||||
__contpte_try_unfold(vma->vm_mm, addr, ptep, orig_pte);
|
||||
__ptep_set_access_flags(vma, addr, ptep, entry, dirty);
|
||||
}
|
||||
|
||||
return 1;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(contpte_ptep_set_access_flags);
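For reference only, and not part of the patch above: a minimal user-space sketch of the fold-eligibility test that contpte_set_ptes() applies to each chunk. A batch is marked CONT_PTE only when the virtual range and the backing physical range both line up on a whole contpte block. The CONT_PTES and PAGE_SHIFT values below are assumptions matching a 4K-granule arm64 configuration, not values taken from the diff.

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT    12
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define CONT_PTES     16                        /* assumed: 4K granule */
#define CONT_PTE_SIZE (CONT_PTES * PAGE_SIZE)
#define CONT_PTE_MASK (~(CONT_PTE_SIZE - 1))

/* Mirrors the test in contpte_set_ptes(): the [addr, next) block and the
 * physical range it maps must both be CONT_PTE_SIZE aligned. */
static bool can_fold(unsigned long addr, unsigned long next, unsigned long pfn)
{
        return ((addr | next | (pfn << PAGE_SHIFT)) & ~CONT_PTE_MASK) == 0;
}

int main(void)
{
        /* 64K-aligned virtual block backed by a 64K-aligned physical block */
        printf("%d\n", can_fold(0x10000, 0x20000, 0x10));      /* 1: foldable */
        /* same virtual block, but the physical start is not block aligned */
        printf("%d\n", can_fold(0x10000, 0x20000, 0x11));      /* 0: stays non-cont */
        return 0;
}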
@@ -191,7 +191,7 @@ static void show_pte(unsigned long addr)
                if (!ptep)
                        break;

-               pte = READ_ONCE(*ptep);
+               pte = __ptep_get(ptep);
                pr_cont(", pte=%016llx", pte_val(pte));
                pte_unmap(ptep);
        } while(0);
@@ -205,16 +205,16 @@ static void show_pte(unsigned long addr)
  *
  * It needs to cope with hardware update of the accessed/dirty state by other
  * agents in the system and can safely skip the __sync_icache_dcache() call as,
- * like set_pte_at(), the PTE is never changed from no-exec to exec here.
+ * like __set_ptes(), the PTE is never changed from no-exec to exec here.
  *
  * Returns whether or not the PTE actually changed.
  */
-int ptep_set_access_flags(struct vm_area_struct *vma,
-                          unsigned long address, pte_t *ptep,
-                          pte_t entry, int dirty)
+int __ptep_set_access_flags(struct vm_area_struct *vma,
+                            unsigned long address, pte_t *ptep,
+                            pte_t entry, int dirty)
 {
        pteval_t old_pteval, pteval;
-       pte_t pte = READ_ONCE(*ptep);
+       pte_t pte = __ptep_get(ptep);

        if (pte_same(pte, entry))
                return 0;
@@ -124,9 +124,9 @@ void __set_fixmap(enum fixed_addresses idx,
        ptep = fixmap_pte(addr);

        if (pgprot_val(flags)) {
-               set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
+               __set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
        } else {
-               pte_clear(&init_mm, addr, ptep);
+               __pte_clear(&init_mm, addr, ptep);
                flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
        }
 }
@@ -45,13 +45,6 @@ void __init arm64_hugetlb_cma_reserve(void)
        else
                order = CONT_PMD_SHIFT - PAGE_SHIFT;

-       /*
-        * HugeTLB CMA reservation is required for gigantic
-        * huge pages which could not be allocated via the
-        * page allocator. Just warn if there is any change
-        * breaking this assumption.
-        */
-       WARN_ON(order <= MAX_PAGE_ORDER);
        hugetlb_cma_reserve(order);
 }
 #endif /* CONFIG_CMA */
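As context for the WARN_ON removed above, which appears to have been consolidated into the generic hugetlb CMA reservation path rather than dropped outright: a small stand-alone sketch of why gigantic pages need CMA at all. The PAGE_SHIFT, PUD_SHIFT and MAX_PAGE_ORDER values are assumptions for a typical arm64 4K-page configuration, not values taken from the patch.

#include <stdio.h>

#define PAGE_SHIFT     12       /* assumed 4K pages */
#define PUD_SHIFT      30       /* assumed: 1G PUD-sized gigantic page */
#define MAX_PAGE_ORDER 10       /* assumed buddy-allocator limit */

int main(void)
{
        unsigned int order = PUD_SHIFT - PAGE_SHIFT;    /* 18 -> 1G pages */

        /* Orders above MAX_PAGE_ORDER cannot be satisfied by the buddy
         * allocator, which is why arm64_hugetlb_cma_reserve() sets aside a
         * CMA area for them at boot. */
        printf("gigantic order %u %s the buddy limit of %u\n", order,
               order > MAX_PAGE_ORDER ? "exceeds" : "fits within",
               MAX_PAGE_ORDER);
        return 0;
}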
|
||||
@ -152,14 +145,14 @@ pte_t huge_ptep_get(pte_t *ptep)
|
||||
{
|
||||
int ncontig, i;
|
||||
size_t pgsize;
|
||||
pte_t orig_pte = ptep_get(ptep);
|
||||
pte_t orig_pte = __ptep_get(ptep);
|
||||
|
||||
if (!pte_present(orig_pte) || !pte_cont(orig_pte))
|
||||
return orig_pte;
|
||||
|
||||
ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize);
|
||||
for (i = 0; i < ncontig; i++, ptep++) {
|
||||
pte_t pte = ptep_get(ptep);
|
||||
pte_t pte = __ptep_get(ptep);
|
||||
|
||||
if (pte_dirty(pte))
|
||||
orig_pte = pte_mkdirty(orig_pte);
|
||||
@ -184,11 +177,11 @@ static pte_t get_clear_contig(struct mm_struct *mm,
|
||||
unsigned long pgsize,
|
||||
unsigned long ncontig)
|
||||
{
|
||||
pte_t orig_pte = ptep_get(ptep);
|
||||
pte_t orig_pte = __ptep_get(ptep);
|
||||
unsigned long i;
|
||||
|
||||
for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
|
||||
pte_t pte = ptep_get_and_clear(mm, addr, ptep);
|
||||
pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
|
||||
|
||||
/*
|
||||
* If HW_AFDBM is enabled, then the HW could turn on
|
||||
@ -236,7 +229,7 @@ static void clear_flush(struct mm_struct *mm,
|
||||
unsigned long i, saddr = addr;
|
||||
|
||||
for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
|
||||
ptep_clear(mm, addr, ptep);
|
||||
__ptep_get_and_clear(mm, addr, ptep);
|
||||
|
||||
flush_tlb_range(&vma, saddr, addr);
|
||||
}
|
||||
@ -254,12 +247,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
|
||||
|
||||
if (!pte_present(pte)) {
|
||||
for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
|
||||
set_pte_at(mm, addr, ptep, pte);
|
||||
__set_ptes(mm, addr, ptep, pte, 1);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!pte_cont(pte)) {
|
||||
set_pte_at(mm, addr, ptep, pte);
|
||||
__set_ptes(mm, addr, ptep, pte, 1);
|
||||
return;
|
||||
}
|
||||
|
||||
@ -270,7 +263,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
|
||||
clear_flush(mm, addr, ptep, pgsize, ncontig);
|
||||
|
||||
for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
|
||||
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
|
||||
__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
|
||||
}
|
||||
|
||||
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
|
||||
@ -400,7 +393,7 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
|
||||
ncontig = num_contig_ptes(sz, &pgsize);
|
||||
|
||||
for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
|
||||
pte_clear(mm, addr, ptep);
|
||||
__pte_clear(mm, addr, ptep);
|
||||
}
|
||||
|
||||
pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
|
||||
@ -408,10 +401,10 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
|
||||
{
|
||||
int ncontig;
|
||||
size_t pgsize;
|
||||
pte_t orig_pte = ptep_get(ptep);
|
||||
pte_t orig_pte = __ptep_get(ptep);
|
||||
|
||||
if (!pte_cont(orig_pte))
|
||||
return ptep_get_and_clear(mm, addr, ptep);
|
||||
return __ptep_get_and_clear(mm, addr, ptep);
|
||||
|
||||
ncontig = find_num_contig(mm, addr, ptep, &pgsize);
|
||||
|
||||
@ -431,11 +424,11 @@ static int __cont_access_flags_changed(pte_t *ptep, pte_t pte, int ncontig)
|
||||
{
|
||||
int i;
|
||||
|
||||
if (pte_write(pte) != pte_write(ptep_get(ptep)))
|
||||
if (pte_write(pte) != pte_write(__ptep_get(ptep)))
|
||||
return 1;
|
||||
|
||||
for (i = 0; i < ncontig; i++) {
|
||||
pte_t orig_pte = ptep_get(ptep + i);
|
||||
pte_t orig_pte = __ptep_get(ptep + i);
|
||||
|
||||
if (pte_dirty(pte) != pte_dirty(orig_pte))
|
||||
return 1;
|
||||
@ -459,7 +452,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
|
||||
pte_t orig_pte;
|
||||
|
||||
if (!pte_cont(pte))
|
||||
return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
|
||||
return __ptep_set_access_flags(vma, addr, ptep, pte, dirty);
|
||||
|
||||
ncontig = find_num_contig(mm, addr, ptep, &pgsize);
|
||||
dpfn = pgsize >> PAGE_SHIFT;
|
||||
@ -478,7 +471,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
|
||||
|
||||
hugeprot = pte_pgprot(pte);
|
||||
for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
|
||||
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
|
||||
__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
|
||||
|
||||
return 1;
|
||||
}
|
||||
@ -492,8 +485,8 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
|
||||
size_t pgsize;
|
||||
pte_t pte;
|
||||
|
||||
if (!pte_cont(READ_ONCE(*ptep))) {
|
||||
ptep_set_wrprotect(mm, addr, ptep);
|
||||
if (!pte_cont(__ptep_get(ptep))) {
|
||||
__ptep_set_wrprotect(mm, addr, ptep);
|
||||
return;
|
||||
}
|
||||
|
||||
@ -507,7 +500,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
|
||||
pfn = pte_pfn(pte);
|
||||
|
||||
for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
|
||||
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
|
||||
__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
|
||||
}
|
||||
|
||||
pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
|
||||
@ -517,7 +510,7 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
|
||||
size_t pgsize;
|
||||
int ncontig;
|
||||
|
||||
if (!pte_cont(READ_ONCE(*ptep)))
|
||||
if (!pte_cont(__ptep_get(ptep)))
|
||||
return ptep_clear_flush(vma, addr, ptep);
|
||||
|
||||
ncontig = find_num_contig(mm, addr, ptep, &pgsize);
|
||||
@ -550,7 +543,7 @@ pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr
|
||||
* when the permission changes from executable to non-executable
|
||||
* in cases where cpu is affected with errata #2645198.
|
||||
*/
|
||||
if (pte_user_exec(READ_ONCE(*ptep)))
|
||||
if (pte_user_exec(__ptep_get(ptep)))
|
||||
return huge_ptep_clear_flush(vma, addr, ptep);
|
||||
}
|
||||
return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
|
||||
|
@ -100,7 +100,7 @@ static void __init arch_reserve_crashkernel(void)
|
||||
bool high = false;
|
||||
int ret;
|
||||
|
||||
if (!IS_ENABLED(CONFIG_KEXEC_CORE))
|
||||
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
|
||||
return;
|
||||
|
||||
ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
|
||||
|
@ -125,8 +125,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
|
||||
if (!early)
|
||||
memset(__va(page_phys), KASAN_SHADOW_INIT, PAGE_SIZE);
|
||||
next = addr + PAGE_SIZE;
|
||||
set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
|
||||
} while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep)));
|
||||
__set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
|
||||
} while (ptep++, addr = next, addr != end && pte_none(__ptep_get(ptep)));
|
||||
}
|
||||
|
||||
static void __init kasan_pmd_populate(pud_t *pudp, unsigned long addr,
|
||||
@ -366,7 +366,7 @@ static void __init kasan_init_shadow(void)
|
||||
* so we should make sure that it maps the zero page read-only.
|
||||
*/
|
||||
for (i = 0; i < PTRS_PER_PTE; i++)
|
||||
set_pte(&kasan_early_shadow_pte[i],
|
||||
__set_pte(&kasan_early_shadow_pte[i],
|
||||
pfn_pte(sym_to_pfn(kasan_early_shadow_page),
|
||||
PAGE_KERNEL_RO));
|
||||
|
||||
|
@ -179,16 +179,16 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
|
||||
|
||||
ptep = pte_set_fixmap_offset(pmdp, addr);
|
||||
do {
|
||||
pte_t old_pte = READ_ONCE(*ptep);
|
||||
pte_t old_pte = __ptep_get(ptep);
|
||||
|
||||
set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
|
||||
__set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
|
||||
|
||||
/*
|
||||
* After the PTE entry has been populated once, we
|
||||
* only allow updates to the permission attributes.
|
||||
*/
|
||||
BUG_ON(!pgattr_change_is_safe(pte_val(old_pte),
|
||||
READ_ONCE(pte_val(*ptep))));
|
||||
pte_val(__ptep_get(ptep))));
|
||||
|
||||
phys += PAGE_SIZE;
|
||||
} while (ptep++, addr += PAGE_SIZE, addr != end);
|
||||
@ -682,8 +682,6 @@ void mark_rodata_ro(void)
|
||||
WRITE_ONCE(rodata_is_rw, false);
|
||||
update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
|
||||
section_size, PAGE_KERNEL_RO);
|
||||
|
||||
debug_checkwx();
|
||||
}
|
||||
|
||||
static void __init declare_vma(struct vm_struct *vma,
|
||||
@ -846,12 +844,12 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
|
||||
|
||||
do {
|
||||
ptep = pte_offset_kernel(pmdp, addr);
|
||||
pte = READ_ONCE(*ptep);
|
||||
pte = __ptep_get(ptep);
|
||||
if (pte_none(pte))
|
||||
continue;
|
||||
|
||||
WARN_ON(!pte_present(pte));
|
||||
pte_clear(&init_mm, addr, ptep);
|
||||
__pte_clear(&init_mm, addr, ptep);
|
||||
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
|
||||
if (free_mapped)
|
||||
free_hotplug_page_range(pte_page(pte),
|
||||
@ -979,7 +977,7 @@ static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
|
||||
|
||||
do {
|
||||
ptep = pte_offset_kernel(pmdp, addr);
|
||||
pte = READ_ONCE(*ptep);
|
||||
pte = __ptep_get(ptep);
|
||||
|
||||
/*
|
||||
* This is just a sanity check here which verifies that
|
||||
@ -998,7 +996,7 @@ static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
|
||||
*/
|
||||
ptep = pte_offset_kernel(pmdp, 0UL);
|
||||
for (i = 0; i < PTRS_PER_PTE; i++) {
|
||||
if (!pte_none(READ_ONCE(ptep[i])))
|
||||
if (!pte_none(__ptep_get(&ptep[i])))
|
||||
return;
|
||||
}
|
||||
|
||||
@ -1494,7 +1492,7 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte
|
||||
* when the permission changes from executable to non-executable
|
||||
* in cases where cpu is affected with errata #2645198.
|
||||
*/
|
||||
if (pte_user_exec(READ_ONCE(*ptep)))
|
||||
if (pte_user_exec(ptep_get(ptep)))
|
||||
return ptep_clear_flush(vma, addr, ptep);
|
||||
}
|
||||
return ptep_get_and_clear(vma->vm_mm, addr, ptep);
|
||||
|
@ -36,12 +36,12 @@ bool can_set_direct_map(void)
|
||||
static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
|
||||
{
|
||||
struct page_change_data *cdata = data;
|
||||
pte_t pte = READ_ONCE(*ptep);
|
||||
pte_t pte = __ptep_get(ptep);
|
||||
|
||||
pte = clear_pte_bit(pte, cdata->clear_mask);
|
||||
pte = set_pte_bit(pte, cdata->set_mask);
|
||||
|
||||
set_pte(ptep, pte);
|
||||
__set_pte(ptep, pte);
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -245,5 +245,5 @@ bool kernel_page_present(struct page *page)
|
||||
return true;
|
||||
|
||||
ptep = pte_offset_kernel(pmdp, addr);
|
||||
return pte_valid(READ_ONCE(*ptep));
|
||||
return pte_valid(__ptep_get(ptep));
|
||||
}
|
||||
|
@@ -322,7 +322,7 @@ static struct ptdump_info kernel_ptdump_info __ro_after_init = {
        .mm             = &init_mm,
 };

-void ptdump_check_wx(void)
+bool ptdump_check_wx(void)
 {
        struct pg_state st = {
                .seq = NULL,
@@ -343,11 +343,16 @@ void ptdump_check_wx(void)

        ptdump_walk_pgd(&st.ptdump, &init_mm, NULL);

-       if (st.wx_pages || st.uxn_pages)
+       if (st.wx_pages || st.uxn_pages) {
                pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found, %lu non-UXN pages found\n",
                        st.wx_pages, st.uxn_pages);
-       else
+
+               return false;
+       } else {
                pr_info("Checked W+X mappings: passed, no W+X pages found\n");
+
+               return true;
+       }
 }

 static int __init ptdump_init(void)
@ -33,7 +33,7 @@ static void *trans_alloc(struct trans_pgd_info *info)
|
||||
|
||||
static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
|
||||
{
|
||||
pte_t pte = READ_ONCE(*src_ptep);
|
||||
pte_t pte = __ptep_get(src_ptep);
|
||||
|
||||
if (pte_valid(pte)) {
|
||||
/*
|
||||
@ -41,7 +41,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
|
||||
* read only (code, rodata). Clear the RDONLY bit from
|
||||
* the temporary mappings we use during restore.
|
||||
*/
|
||||
set_pte(dst_ptep, pte_mkwrite_novma(pte));
|
||||
__set_pte(dst_ptep, pte_mkwrite_novma(pte));
|
||||
} else if ((debug_pagealloc_enabled() ||
|
||||
is_kfence_address((void *)addr)) && !pte_none(pte)) {
|
||||
/*
|
||||
@ -55,7 +55,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
|
||||
*/
|
||||
BUG_ON(!pfn_valid(pte_pfn(pte)));
|
||||
|
||||
set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_novma(pte)));
|
||||
__set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_novma(pte)));
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -2,6 +2,7 @@
|
||||
config CSKY
|
||||
def_bool y
|
||||
select ARCH_32BIT_OFF_T
|
||||
select ARCH_HAS_CPU_CACHE_ALIASING
|
||||
select ARCH_HAS_DMA_PREP_COHERENT
|
||||
select ARCH_HAS_GCOV_PROFILE_ALL
|
||||
select ARCH_HAS_SYNC_DMA_FOR_CPU
|
||||
|
arch/csky/include/asm/cachetype.h (new file, 9 lines)
@ -0,0 +1,9 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef __ASM_CSKY_CACHETYPE_H
|
||||
#define __ASM_CSKY_CACHETYPE_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
#define cpu_dcache_is_aliasing() true
|
||||
|
||||
#endif
|
@ -260,7 +260,7 @@ static void __init arch_reserve_crashkernel(void)
|
||||
char *cmdline = boot_command_line;
|
||||
bool high = false;
|
||||
|
||||
if (!IS_ENABLED(CONFIG_KEXEC_CORE))
|
||||
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
|
||||
return;
|
||||
|
||||
ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
|
||||
|
@ -723,7 +723,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
|
||||
/*
|
||||
* Read each entry once. As above, a non-leaf entry can be promoted to
|
||||
* a huge page _during_ this walk. Re-reading the entry could send the
|
||||
* walk into the weeks, e.g. p*d_large() returns false (sees the old
|
||||
* walk into the weeks, e.g. p*d_leaf() returns false (sees the old
|
||||
* value) and then p*d_offset() walks into the target huge page instead
|
||||
* of the old page table (sees the new value).
|
||||
*/
|
||||
|
@ -3,6 +3,7 @@ config M68K
|
||||
bool
|
||||
default y
|
||||
select ARCH_32BIT_OFF_T
|
||||
select ARCH_HAS_CPU_CACHE_ALIASING
|
||||
select ARCH_HAS_BINFMT_FLAT
|
||||
select ARCH_HAS_CPU_FINALIZE_INIT if MMU
|
||||
select ARCH_HAS_CURRENT_STACK_POINTER
|
||||
|
arch/m68k/include/asm/cachetype.h (new file, 9 lines)
@ -0,0 +1,9 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef __ASM_M68K_CACHETYPE_H
|
||||
#define __ASM_M68K_CACHETYPE_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
#define cpu_dcache_is_aliasing() true
|
||||
|
||||
#endif
|
@ -4,6 +4,7 @@ config MIPS
|
||||
default y
|
||||
select ARCH_32BIT_OFF_T if !64BIT
|
||||
select ARCH_BINFMT_ELF_STATE if MIPS_FP_SUPPORT
|
||||
select ARCH_HAS_CPU_CACHE_ALIASING
|
||||
select ARCH_HAS_CPU_FINALIZE_INIT
|
||||
select ARCH_HAS_CURRENT_STACK_POINTER if !CC_IS_CLANG || CLANG_VERSION >= 140000
|
||||
select ARCH_HAS_DEBUG_VIRTUAL if !64BIT
|
||||
|
arch/mips/include/asm/cachetype.h (new file, 9 lines)
@ -0,0 +1,9 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef __ASM_MIPS_CACHETYPE_H
|
||||
#define __ASM_MIPS_CACHETYPE_H
|
||||
|
||||
#include <asm/cpu-features.h>
|
||||
|
||||
#define cpu_dcache_is_aliasing() cpu_has_dc_aliases
|
||||
|
||||
#endif
|
@ -442,8 +442,6 @@ static void __init mips_reserve_vmcore(void)
|
||||
#endif
|
||||
}
|
||||
|
||||
#ifdef CONFIG_KEXEC
|
||||
|
||||
/* 64M alignment for crash kernel regions */
|
||||
#define CRASH_ALIGN SZ_64M
|
||||
#define CRASH_ADDR_MAX SZ_512M
|
||||
@ -454,6 +452,9 @@ static void __init mips_parse_crashkernel(void)
|
||||
unsigned long long crash_size, crash_base;
|
||||
int ret;
|
||||
|
||||
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
|
||||
return;
|
||||
|
||||
total_mem = memblock_phys_mem_size();
|
||||
ret = parse_crashkernel(boot_command_line, total_mem,
|
||||
&crash_size, &crash_base,
|
||||
@ -489,6 +490,9 @@ static void __init request_crashkernel(struct resource *res)
|
||||
{
|
||||
int ret;
|
||||
|
||||
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
|
||||
return;
|
||||
|
||||
if (crashk_res.start == crashk_res.end)
|
||||
return;
|
||||
|
||||
@ -498,15 +502,6 @@ static void __init request_crashkernel(struct resource *res)
|
||||
(unsigned long)(resource_size(&crashk_res) >> 20),
|
||||
(unsigned long)(crashk_res.start >> 20));
|
||||
}
|
||||
#else /* !defined(CONFIG_KEXEC) */
|
||||
static void __init mips_parse_crashkernel(void)
|
||||
{
|
||||
}
|
||||
|
||||
static void __init request_crashkernel(struct resource *res)
|
||||
{
|
||||
}
|
||||
#endif /* !defined(CONFIG_KEXEC) */
|
||||
|
||||
static void __init check_kernel_sections_mem(void)
|
||||
{
|
||||
|
@ -2,6 +2,7 @@
|
||||
config NIOS2
|
||||
def_bool y
|
||||
select ARCH_32BIT_OFF_T
|
||||
select ARCH_HAS_CPU_CACHE_ALIASING
|
||||
select ARCH_HAS_DMA_PREP_COHERENT
|
||||
select ARCH_HAS_SYNC_DMA_FOR_CPU
|
||||
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
|
||||
|
arch/nios2/include/asm/cachetype.h (new file, 10 lines)
@ -0,0 +1,10 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef __ASM_NIOS2_CACHETYPE_H
|
||||
#define __ASM_NIOS2_CACHETYPE_H
|
||||
|
||||
#include <asm/page.h>
|
||||
#include <asm/cache.h>
|
||||
|
||||
#define cpu_dcache_is_aliasing() (NIOS2_DCACHE_SIZE > PAGE_SIZE)
|
||||
|
||||
#endif
|
@ -178,6 +178,8 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
|
||||
*ptep = pteval;
|
||||
}
|
||||
|
||||
#define PFN_PTE_SHIFT 0
|
||||
|
||||
static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pte, unsigned int nr)
|
||||
{
|
||||
|
@ -8,6 +8,7 @@ config PARISC
|
||||
select HAVE_FUNCTION_GRAPH_TRACER
|
||||
select HAVE_SYSCALL_TRACEPOINTS
|
||||
select ARCH_WANT_FRAME_POINTERS
|
||||
select ARCH_HAS_CPU_CACHE_ALIASING
|
||||
select ARCH_HAS_DMA_ALLOC if PA11
|
||||
select ARCH_HAS_ELF_RANDOMIZE
|
||||
select ARCH_HAS_STRICT_KERNEL_RWX
|
||||
|
arch/parisc/include/asm/cachetype.h (new file, 9 lines)
@ -0,0 +1,9 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef __ASM_PARISC_CACHETYPE_H
|
||||
#define __ASM_PARISC_CACHETYPE_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
#define cpu_dcache_is_aliasing() true
|
||||
|
||||
#endif
|
@ -608,6 +608,11 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
|
||||
config ARCH_SUPPORTS_KEXEC
|
||||
def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
|
||||
|
||||
config ARCH_SELECTS_KEXEC
|
||||
def_bool y
|
||||
depends on KEXEC
|
||||
select CRASH_DUMP
|
||||
|
||||
config ARCH_SUPPORTS_KEXEC_FILE
|
||||
def_bool PPC64
|
||||
|
||||
@ -618,6 +623,7 @@ config ARCH_SELECTS_KEXEC_FILE
|
||||
def_bool y
|
||||
depends on KEXEC_FILE
|
||||
select KEXEC_ELF
|
||||
select CRASH_DUMP
|
||||
select HAVE_IMA_KEXEC if IMA
|
||||
|
||||
config PPC64_BIG_ENDIAN_ELF_ABI_V2
|
||||
@ -690,7 +696,6 @@ config ARCH_SELECTS_CRASH_DUMP
|
||||
config FA_DUMP
|
||||
bool "Firmware-assisted dump"
|
||||
depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
|
||||
select CRASH_CORE
|
||||
select CRASH_DUMP
|
||||
help
|
||||
A robust mechanism to get reliable kernel crash dump with
|
||||
|
@ -1157,20 +1157,6 @@ pud_hugepage_update(struct mm_struct *mm, unsigned long addr, pud_t *pudp,
|
||||
return pud_val(*pudp);
|
||||
}
|
||||
|
||||
/*
|
||||
* returns true for pmd migration entries, THP, devmap, hugetlb
|
||||
* But compile time dependent on THP config
|
||||
*/
|
||||
static inline int pmd_large(pmd_t pmd)
|
||||
{
|
||||
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
|
||||
}
|
||||
|
||||
static inline int pud_large(pud_t pud)
|
||||
{
|
||||
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
|
||||
}
|
||||
|
||||
/*
|
||||
* For radix we should always find H_PAGE_HASHPTE zero. Hence
|
||||
* the below will work for radix too
|
||||
@ -1451,18 +1437,16 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_va
|
||||
}
|
||||
|
||||
/*
|
||||
* Like pmd_huge() and pmd_large(), but works regardless of config options
|
||||
* Like pmd_huge(), but works regardless of config options
|
||||
*/
|
||||
#define pmd_is_leaf pmd_is_leaf
|
||||
#define pmd_leaf pmd_is_leaf
|
||||
static inline bool pmd_is_leaf(pmd_t pmd)
|
||||
#define pmd_leaf pmd_leaf
|
||||
static inline bool pmd_leaf(pmd_t pmd)
|
||||
{
|
||||
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
|
||||
}
|
||||
|
||||
#define pud_is_leaf pud_is_leaf
|
||||
#define pud_leaf pud_is_leaf
|
||||
static inline bool pud_is_leaf(pud_t pud)
|
||||
#define pud_leaf pud_leaf
|
||||
static inline bool pud_leaf(pud_t pud)
|
||||
{
|
||||
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
|
||||
}
|
||||
|
@ -41,6 +41,8 @@ struct mm_struct;
|
||||
|
||||
#ifndef __ASSEMBLY__
|
||||
|
||||
#define PFN_PTE_SHIFT PTE_RPN_SHIFT
|
||||
|
||||
void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
|
||||
pte_t pte, unsigned int nr);
|
||||
#define set_ptes set_ptes
|
||||
@ -99,10 +101,6 @@ void poking_init(void);
|
||||
extern unsigned long ioremap_bot;
|
||||
extern const pgprot_t protection_map[16];
|
||||
|
||||
#ifndef CONFIG_TRANSPARENT_HUGEPAGE
|
||||
#define pmd_large(pmd) 0
|
||||
#endif
|
||||
|
||||
/* can we use this in kvm */
|
||||
unsigned long vmalloc_to_phys(void *vmalloc_addr);
|
||||
|
||||
@ -180,30 +178,6 @@ static inline void pte_frag_set(mm_context_t *ctx, void *p)
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifndef pmd_is_leaf
|
||||
#define pmd_is_leaf pmd_is_leaf
|
||||
static inline bool pmd_is_leaf(pmd_t pmd)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifndef pud_is_leaf
|
||||
#define pud_is_leaf pud_is_leaf
|
||||
static inline bool pud_is_leaf(pud_t pud)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifndef p4d_is_leaf
|
||||
#define p4d_is_leaf p4d_is_leaf
|
||||
static inline bool p4d_is_leaf(p4d_t p4d)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
#endif
|
||||
|
||||
#define pmd_pgtable pmd_pgtable
|
||||
static inline pgtable_t pmd_pgtable(pmd_t pmd)
|
||||
{
|
||||
|
@ -19,6 +19,8 @@
|
||||
|
||||
#include <linux/pagemap.h>
|
||||
|
||||
static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
|
||||
unsigned long address);
|
||||
#define __tlb_remove_tlb_entry __tlb_remove_tlb_entry
|
||||
|
||||
#define tlb_flush tlb_flush
|
||||
|
@ -109,7 +109,7 @@ int ppc_do_canonicalize_irqs;
|
||||
EXPORT_SYMBOL(ppc_do_canonicalize_irqs);
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_CRASH_CORE
|
||||
#ifdef CONFIG_VMCORE_INFO
|
||||
/* This keeps a track of which one is the crashing cpu. */
|
||||
int crashing_cpu = -1;
|
||||
#endif
|
||||
|
@ -8,6 +8,7 @@ obj-y += core.o crash.o core_$(BITS).o
|
||||
obj-$(CONFIG_PPC32) += relocate_32.o
|
||||
|
||||
obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o
|
||||
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
|
||||
|
||||
# Disable GCOV, KCOV & sanitizers in odd or sensitive code
|
||||
GCOV_PROFILE_core_$(BITS).o := n
|
||||
|
@ -53,34 +53,6 @@ void machine_kexec_cleanup(struct kimage *image)
|
||||
{
|
||||
}
|
||||
|
||||
void arch_crash_save_vmcoreinfo(void)
|
||||
{
|
||||
|
||||
#ifdef CONFIG_NUMA
|
||||
VMCOREINFO_SYMBOL(node_data);
|
||||
VMCOREINFO_LENGTH(node_data, MAX_NUMNODES);
|
||||
#endif
|
||||
#ifndef CONFIG_NUMA
|
||||
VMCOREINFO_SYMBOL(contig_page_data);
|
||||
#endif
|
||||
#if defined(CONFIG_PPC64) && defined(CONFIG_SPARSEMEM_VMEMMAP)
|
||||
VMCOREINFO_SYMBOL(vmemmap_list);
|
||||
VMCOREINFO_SYMBOL(mmu_vmemmap_psize);
|
||||
VMCOREINFO_SYMBOL(mmu_psize_defs);
|
||||
VMCOREINFO_STRUCT_SIZE(vmemmap_backing);
|
||||
VMCOREINFO_OFFSET(vmemmap_backing, list);
|
||||
VMCOREINFO_OFFSET(vmemmap_backing, phys);
|
||||
VMCOREINFO_OFFSET(vmemmap_backing, virt_addr);
|
||||
VMCOREINFO_STRUCT_SIZE(mmu_psize_def);
|
||||
VMCOREINFO_OFFSET(mmu_psize_def, shift);
|
||||
#endif
|
||||
VMCOREINFO_SYMBOL(cur_cpu_spec);
|
||||
VMCOREINFO_OFFSET(cpu_spec, cpu_features);
|
||||
VMCOREINFO_OFFSET(cpu_spec, mmu_features);
|
||||
vmcoreinfo_append_str("NUMBER(RADIX_MMU)=%d\n", early_radix_enabled());
|
||||
vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset());
|
||||
}
|
||||
|
||||
/*
|
||||
* Do not allocate memory (or fail in any way) in machine_kexec().
|
||||
* We are past the point of no return, committed to rebooting now.
|
||||
|
arch/powerpc/kexec/vmcore_info.c (new file, 32 lines)
@ -0,0 +1,32 @@
|
||||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
|
||||
#include <linux/vmcore_info.h>
|
||||
#include <asm/pgalloc.h>
|
||||
|
||||
void arch_crash_save_vmcoreinfo(void)
|
||||
{
|
||||
|
||||
#ifdef CONFIG_NUMA
|
||||
VMCOREINFO_SYMBOL(node_data);
|
||||
VMCOREINFO_LENGTH(node_data, MAX_NUMNODES);
|
||||
#endif
|
||||
#ifndef CONFIG_NUMA
|
||||
VMCOREINFO_SYMBOL(contig_page_data);
|
||||
#endif
|
||||
#if defined(CONFIG_PPC64) && defined(CONFIG_SPARSEMEM_VMEMMAP)
|
||||
VMCOREINFO_SYMBOL(vmemmap_list);
|
||||
VMCOREINFO_SYMBOL(mmu_vmemmap_psize);
|
||||
VMCOREINFO_SYMBOL(mmu_psize_defs);
|
||||
VMCOREINFO_STRUCT_SIZE(vmemmap_backing);
|
||||
VMCOREINFO_OFFSET(vmemmap_backing, list);
|
||||
VMCOREINFO_OFFSET(vmemmap_backing, phys);
|
||||
VMCOREINFO_OFFSET(vmemmap_backing, virt_addr);
|
||||
VMCOREINFO_STRUCT_SIZE(mmu_psize_def);
|
||||
VMCOREINFO_OFFSET(mmu_psize_def, shift);
|
||||
#endif
|
||||
VMCOREINFO_SYMBOL(cur_cpu_spec);
|
||||
VMCOREINFO_OFFSET(cpu_spec, cpu_features);
|
||||
VMCOREINFO_OFFSET(cpu_spec, mmu_features);
|
||||
vmcoreinfo_append_str("NUMBER(RADIX_MMU)=%d\n", early_radix_enabled());
|
||||
vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset());
|
||||
}
|
@ -503,7 +503,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t *pmd, bool full,
|
||||
for (im = 0; im < PTRS_PER_PMD; ++im, ++p) {
|
||||
if (!pmd_present(*p))
|
||||
continue;
|
||||
if (pmd_is_leaf(*p)) {
|
||||
if (pmd_leaf(*p)) {
|
||||
if (full) {
|
||||
pmd_clear(p);
|
||||
} else {
|
||||
@ -532,7 +532,7 @@ static void kvmppc_unmap_free_pud(struct kvm *kvm, pud_t *pud,
|
||||
for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++p) {
|
||||
if (!pud_present(*p))
|
||||
continue;
|
||||
if (pud_is_leaf(*p)) {
|
||||
if (pud_leaf(*p)) {
|
||||
pud_clear(p);
|
||||
} else {
|
||||
pmd_t *pmd;
|
||||
@ -635,12 +635,12 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
|
||||
new_pud = pud_alloc_one(kvm->mm, gpa);
|
||||
|
||||
pmd = NULL;
|
||||
if (pud && pud_present(*pud) && !pud_is_leaf(*pud))
|
||||
if (pud && pud_present(*pud) && !pud_leaf(*pud))
|
||||
pmd = pmd_offset(pud, gpa);
|
||||
else if (level <= 1)
|
||||
new_pmd = kvmppc_pmd_alloc();
|
||||
|
||||
if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
|
||||
if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_leaf(*pmd)))
|
||||
new_ptep = kvmppc_pte_alloc();
|
||||
|
||||
/* Check if we might have been invalidated; let the guest retry if so */
|
||||
@ -658,7 +658,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
|
||||
new_pud = NULL;
|
||||
}
|
||||
pud = pud_offset(p4d, gpa);
|
||||
if (pud_is_leaf(*pud)) {
|
||||
if (pud_leaf(*pud)) {
|
||||
unsigned long hgpa = gpa & PUD_MASK;
|
||||
|
||||
/* Check if we raced and someone else has set the same thing */
|
||||
@ -709,7 +709,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
|
||||
new_pmd = NULL;
|
||||
}
|
||||
pmd = pmd_offset(pud, gpa);
|
||||
if (pmd_is_leaf(*pmd)) {
|
||||
if (pmd_leaf(*pmd)) {
|
||||
unsigned long lgpa = gpa & PMD_MASK;
|
||||
|
||||
/* Check if we raced and someone else has set the same thing */
|
||||
|
@ -113,7 +113,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
|
||||
|
||||
WARN_ON(pte_hw_valid(pmd_pte(*pmdp)) && !pte_protnone(pmd_pte(*pmdp)));
|
||||
assert_spin_locked(pmd_lockptr(mm, pmdp));
|
||||
WARN_ON(!(pmd_large(pmd)));
|
||||
WARN_ON(!(pmd_leaf(pmd)));
|
||||
#endif
|
||||
trace_hugepage_set_pmd(addr, pmd_val(pmd));
|
||||
return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
|
||||
@ -130,7 +130,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
|
||||
|
||||
WARN_ON(pte_hw_valid(pud_pte(*pudp)));
|
||||
assert_spin_locked(pud_lockptr(mm, pudp));
|
||||
WARN_ON(!(pud_large(pud)));
|
||||
WARN_ON(!(pud_leaf(pud)));
|
||||
#endif
|
||||
trace_hugepage_set_pud(addr, pud_val(pud));
|
||||
return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
|
||||
|
@ -204,14 +204,14 @@ static void radix__change_memory_range(unsigned long start, unsigned long end,
|
||||
pudp = pud_alloc(&init_mm, p4dp, idx);
|
||||
if (!pudp)
|
||||
continue;
|
||||
if (pud_is_leaf(*pudp)) {
|
||||
if (pud_leaf(*pudp)) {
|
||||
ptep = (pte_t *)pudp;
|
||||
goto update_the_pte;
|
||||
}
|
||||
pmdp = pmd_alloc(&init_mm, pudp, idx);
|
||||
if (!pmdp)
|
||||
continue;
|
||||
if (pmd_is_leaf(*pmdp)) {
|
||||
if (pmd_leaf(*pmdp)) {
|
||||
ptep = pmdp_ptep(pmdp);
|
||||
goto update_the_pte;
|
||||
}
|
||||
@ -767,7 +767,7 @@ static void __meminit remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
|
||||
if (!pmd_present(*pmd))
|
||||
continue;
|
||||
|
||||
if (pmd_is_leaf(*pmd)) {
|
||||
if (pmd_leaf(*pmd)) {
|
||||
if (IS_ALIGNED(addr, PMD_SIZE) &&
|
||||
IS_ALIGNED(next, PMD_SIZE)) {
|
||||
if (!direct)
|
||||
@ -807,7 +807,7 @@ static void __meminit remove_pud_table(pud_t *pud_start, unsigned long addr,
|
||||
if (!pud_present(*pud))
|
||||
continue;
|
||||
|
||||
if (pud_is_leaf(*pud)) {
|
||||
if (pud_leaf(*pud)) {
|
||||
if (!IS_ALIGNED(addr, PUD_SIZE) ||
|
||||
!IS_ALIGNED(next, PUD_SIZE)) {
|
||||
WARN_ONCE(1, "%s: unaligned range\n", __func__);
|
||||
@ -845,7 +845,7 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct,
|
||||
if (!p4d_present(*p4d))
|
||||
continue;
|
||||
|
||||
if (p4d_is_leaf(*p4d)) {
|
||||
if (p4d_leaf(*p4d)) {
|
||||
if (!IS_ALIGNED(addr, P4D_SIZE) ||
|
||||
!IS_ALIGNED(next, P4D_SIZE)) {
|
||||
WARN_ONCE(1, "%s: unaligned range\n", __func__);
|
||||
@ -924,7 +924,7 @@ bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
|
||||
int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
|
||||
unsigned long addr, unsigned long next)
|
||||
{
|
||||
int large = pmd_large(*pmdp);
|
||||
int large = pmd_leaf(*pmdp);
|
||||
|
||||
if (large)
|
||||
vmemmap_verify(pmdp_ptep(pmdp), node, addr, next);
|
||||
@ -1554,7 +1554,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
|
||||
|
||||
int pud_clear_huge(pud_t *pud)
|
||||
{
|
||||
if (pud_is_leaf(*pud)) {
|
||||
if (pud_leaf(*pud)) {
|
||||
pud_clear(pud);
|
||||
return 1;
|
||||
}
|
||||
@ -1601,7 +1601,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
|
||||
|
||||
int pmd_clear_huge(pmd_t *pmd)
|
||||
{
|
||||
if (pmd_is_leaf(*pmd)) {
|
||||
if (pmd_leaf(*pmd)) {
|
||||
pmd_clear(pmd);
|
||||
return 1;
|
||||
}
|
||||
|
@ -226,7 +226,7 @@ static int __init pseries_alloc_bootmem_huge_page(struct hstate *hstate)
|
||||
return 0;
|
||||
m = phys_to_virt(gpage_freearray[--nr_gpages]);
|
||||
gpage_freearray[nr_gpages] = 0;
|
||||
list_add(&m->list, &huge_boot_pages);
|
||||
list_add(&m->list, &huge_boot_pages[0]);
|
||||
m->hstate = hstate;
|
||||
return 1;
|
||||
}
|
||||
@ -614,8 +614,6 @@ void __init gigantic_hugetlb_cma_reserve(void)
|
||||
*/
|
||||
order = mmu_psize_to_shift(MMU_PAGE_16G) - PAGE_SHIFT;
|
||||
|
||||
if (order) {
|
||||
VM_WARN_ON(order <= MAX_PAGE_ORDER);
|
||||
if (order)
|
||||
hugetlb_cma_reserve(order);
|
||||
}
|
||||
}
|
||||
|
@ -171,12 +171,6 @@ static inline void mmu_mark_rodata_ro(void) { }
|
||||
void __init mmu_mapin_immr(void);
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_DEBUG_WX
|
||||
void ptdump_check_wx(void);
|
||||
#else
|
||||
static inline void ptdump_check_wx(void) { }
|
||||
#endif
|
||||
|
||||
static inline bool debug_pagealloc_enabled_or_kfence(void)
|
||||
{
|
||||
return IS_ENABLED(CONFIG_KFENCE) || debug_pagealloc_enabled();
|
||||
|
@ -13,7 +13,7 @@
|
||||
#include <linux/delay.h>
|
||||
#include <linux/memblock.h>
|
||||
#include <linux/libfdt.h>
|
||||
#include <linux/crash_core.h>
|
||||
#include <linux/crash_reserve.h>
|
||||
#include <linux/of.h>
|
||||
#include <linux/of_fdt.h>
|
||||
#include <asm/cacheflush.h>
|
||||
@ -173,7 +173,7 @@ static __init bool overlaps_region(const void *fdt, u32 start,
|
||||
|
||||
static void __init get_crash_kernel(void *fdt, unsigned long size)
|
||||
{
|
||||
#ifdef CONFIG_CRASH_CORE
|
||||
#ifdef CONFIG_CRASH_RESERVE
|
||||
unsigned long long crash_size, crash_base;
|
||||
int ret;
|
||||
|
||||
|
@ -220,10 +220,7 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
|
||||
break;
|
||||
ptep++;
|
||||
addr += PAGE_SIZE;
|
||||
/*
|
||||
* increment the pfn.
|
||||
*/
|
||||
pte = pfn_pte(pte_pfn(pte) + 1, pte_pgprot((pte)));
|
||||
pte = pte_next_pfn(pte);
|
||||
}
|
||||
}
|
||||
|
||||
@ -413,7 +410,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
|
||||
if (p4d_none(p4d))
|
||||
return NULL;
|
||||
|
||||
if (p4d_is_leaf(p4d)) {
|
||||
if (p4d_leaf(p4d)) {
|
||||
ret_pte = (pte_t *)p4dp;
|
||||
goto out;
|
||||
}
|
||||
@ -435,7 +432,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
|
||||
if (pud_none(pud))
|
||||
return NULL;
|
||||
|
||||
if (pud_is_leaf(pud)) {
|
||||
if (pud_leaf(pud)) {
|
||||
ret_pte = (pte_t *)pudp;
|
||||
goto out;
|
||||
}
|
||||
@ -474,7 +471,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (pmd_is_leaf(pmd)) {
|
||||
if (pmd_leaf(pmd)) {
|
||||
ret_pte = (pte_t *)pmdp;
|
||||
goto out;
|
||||
}
|
||||
|
@ -153,7 +153,6 @@ void mark_rodata_ro(void)
|
||||
|
||||
if (v_block_mapped((unsigned long)_stext + 1)) {
|
||||
mmu_mark_rodata_ro();
|
||||
ptdump_check_wx();
|
||||
return;
|
||||
}
|
||||
|
||||
@ -166,9 +165,6 @@ void mark_rodata_ro(void)
|
||||
PFN_DOWN((unsigned long)_stext);
|
||||
|
||||
set_memory_ro((unsigned long)_stext, numpages);
|
||||
|
||||
// mark_initmem_nx() should have already run by now
|
||||
ptdump_check_wx();
|
||||
}
|
||||
#endif
|
||||
|
||||
|
@ -100,7 +100,7 @@ EXPORT_SYMBOL(__pte_frag_size_shift);
|
||||
/* 4 level page table */
|
||||
struct page *p4d_page(p4d_t p4d)
|
||||
{
|
||||
if (p4d_is_leaf(p4d)) {
|
||||
if (p4d_leaf(p4d)) {
|
||||
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
|
||||
VM_WARN_ON(!p4d_huge(p4d));
|
||||
return pte_page(p4d_pte(p4d));
|
||||
@ -111,7 +111,7 @@ struct page *p4d_page(p4d_t p4d)
|
||||
|
||||
struct page *pud_page(pud_t pud)
|
||||
{
|
||||
if (pud_is_leaf(pud)) {
|
||||
if (pud_leaf(pud)) {
|
||||
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
|
||||
VM_WARN_ON(!pud_huge(pud));
|
||||
return pte_page(pud_pte(pud));
|
||||
@ -125,14 +125,14 @@ struct page *pud_page(pud_t pud)
|
||||
*/
|
||||
struct page *pmd_page(pmd_t pmd)
|
||||
{
|
||||
if (pmd_is_leaf(pmd)) {
|
||||
if (pmd_leaf(pmd)) {
|
||||
/*
|
||||
* vmalloc_to_page may be called on any vmap address (not only
|
||||
* vmalloc), and it uses pmd_page() etc., when huge vmap is
|
||||
* enabled so these checks can't be used.
|
||||
*/
|
||||
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
|
||||
VM_WARN_ON(!(pmd_large(pmd) || pmd_huge(pmd)));
|
||||
VM_WARN_ON(!(pmd_leaf(pmd) || pmd_huge(pmd)));
|
||||
return pte_page(pmd_pte(pmd));
|
||||
}
|
||||
return virt_to_page(pmd_page_vaddr(pmd));
|
||||
@ -150,9 +150,6 @@ void mark_rodata_ro(void)
|
||||
radix__mark_rodata_ro();
|
||||
else
|
||||
hash__mark_rodata_ro();
|
||||
|
||||
// mark_initmem_nx() should have already run by now
|
||||
ptdump_check_wx();
|
||||
}
|
||||
|
||||
void mark_initmem_nx(void)
|
||||
|
@ -184,13 +184,14 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr)
|
||||
{
|
||||
pte_t pte = __pte(st->current_flags);
|
||||
|
||||
if (!IS_ENABLED(CONFIG_DEBUG_WX) || !st->check_wx)
|
||||
if (!st->check_wx)
|
||||
return;
|
||||
|
||||
if (!pte_write(pte) || !pte_exec(pte))
|
||||
return;
|
||||
|
||||
WARN_ONCE(1, "powerpc/mm: Found insecure W+X mapping at address %p/%pS\n",
|
||||
WARN_ONCE(IS_ENABLED(CONFIG_DEBUG_WX),
|
||||
"powerpc/mm: Found insecure W+X mapping at address %p/%pS\n",
|
||||
(void *)st->start_address, (void *)st->start_address);
|
||||
|
||||
st->wx_pages += (addr - st->start_address) / PAGE_SIZE;
|
||||
@ -326,8 +327,7 @@ static void __init build_pgtable_complete_mask(void)
|
||||
pg_level[i].mask |= pg_level[i].flag[j].mask;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_DEBUG_WX
|
||||
void ptdump_check_wx(void)
|
||||
bool ptdump_check_wx(void)
|
||||
{
|
||||
struct pg_state st = {
|
||||
.seq = NULL,
|
||||
@ -343,15 +343,22 @@ void ptdump_check_wx(void)
|
||||
}
|
||||
};
|
||||
|
||||
if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !mmu_has_feature(MMU_FTR_KERNEL_RO))
|
||||
return true;
|
||||
|
||||
ptdump_walk_pgd(&st.ptdump, &init_mm, NULL);
|
||||
|
||||
if (st.wx_pages)
|
||||
if (st.wx_pages) {
|
||||
pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found\n",
|
||||
st.wx_pages);
|
||||
else
|
||||
|
||||
return false;
|
||||
} else {
|
||||
pr_info("Checked W+X mappings: passed, no W+X pages found\n");
|
||||
|
||||
return true;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
static int __init ptdump_init(void)
|
||||
{
|
||||
|
@ -16,7 +16,7 @@
|
||||
#include <linux/kobject.h>
|
||||
#include <linux/sysfs.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/crash_core.h>
|
||||
#include <linux/vmcore_info.h>
|
||||
#include <linux/of.h>
|
||||
|
||||
#include <asm/page.h>
|
||||
|
@ -3342,7 +3342,7 @@ static void show_pte(unsigned long addr)
|
||||
return;
|
||||
}
|
||||
|
||||
if (p4d_is_leaf(*p4dp)) {
|
||||
if (p4d_leaf(*p4dp)) {
|
||||
format_pte(p4dp, p4d_val(*p4dp));
|
||||
return;
|
||||
}
|
||||
@ -3356,7 +3356,7 @@ static void show_pte(unsigned long addr)
|
||||
return;
|
||||
}
|
||||
|
||||
if (pud_is_leaf(*pudp)) {
|
||||
if (pud_leaf(*pudp)) {
|
||||
format_pte(pudp, pud_val(*pudp));
|
||||
return;
|
||||
}
|
||||
@ -3370,7 +3370,7 @@ static void show_pte(unsigned long addr)
|
||||
return;
|
||||
}
|
||||
|
||||
if (pmd_is_leaf(*pmdp)) {
|
||||
if (pmd_leaf(*pmdp)) {
|
||||
format_pte(pmdp, pmd_val(*pmdp));
|
||||
return;
|
||||
}
|
||||
|
@ -767,7 +767,7 @@ config ARCH_SUPPORTS_CRASH_DUMP
|
||||
def_bool y
|
||||
|
||||
config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
|
||||
def_bool CRASH_CORE
|
||||
def_bool CRASH_RESERVE
|
||||
|
||||
config COMPAT
|
||||
bool "Kernel support for 32-bit U-mode"
|
||||
|
@ -1,6 +1,6 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0-only */
|
||||
#ifndef _RISCV_CRASH_CORE_H
|
||||
#define _RISCV_CRASH_CORE_H
|
||||
#ifndef _RISCV_CRASH_RESERVE_H
|
||||
#define _RISCV_CRASH_RESERVE_H
|
||||
|
||||
#define CRASH_ALIGN PMD_SIZE
|
||||
|
@ -190,7 +190,7 @@ static inline int pud_bad(pud_t pud)
|
||||
}
|
||||
|
||||
#define pud_leaf pud_leaf
|
||||
static inline int pud_leaf(pud_t pud)
|
||||
static inline bool pud_leaf(pud_t pud)
|
||||
{
|
||||
return pud_present(pud) && (pud_val(pud) & _PAGE_LEAF);
|
||||
}
|
||||
|
@ -241,7 +241,7 @@ static inline int pmd_bad(pmd_t pmd)
|
||||
}
|
||||
|
||||
#define pmd_leaf pmd_leaf
|
||||
static inline int pmd_leaf(pmd_t pmd)
|
||||
static inline bool pmd_leaf(pmd_t pmd)
|
||||
{
|
||||
return pmd_present(pmd) && (pmd_val(pmd) & _PAGE_LEAF);
|
||||
}
|
||||
@ -527,6 +527,8 @@ static inline void __set_pte_at(pte_t *ptep, pte_t pteval)
|
||||
set_pte(ptep, pteval);
|
||||
}
|
||||
|
||||
#define PFN_PTE_SHIFT _PAGE_PFN_SHIFT
|
||||
|
||||
static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
|
||||
pte_t *ptep, pte_t pteval, unsigned int nr)
|
||||
{
|
||||
|
@ -1,22 +0,0 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
/*
|
||||
* Copyright (C) 2019 SiFive
|
||||
*/
|
||||
|
||||
#ifndef _ASM_RISCV_PTDUMP_H
|
||||
#define _ASM_RISCV_PTDUMP_H
|
||||
|
||||
void ptdump_check_wx(void);
|
||||
|
||||
#ifdef CONFIG_DEBUG_WX
|
||||
static inline void debug_checkwx(void)
|
||||
{
|
||||
ptdump_check_wx();
|
||||
}
|
||||
#else
|
||||
static inline void debug_checkwx(void)
|
||||
{
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif /* _ASM_RISCV_PTDUMP_H */
|
@ -94,7 +94,7 @@ obj-$(CONFIG_KGDB) += kgdb.o
|
||||
obj-$(CONFIG_KEXEC_CORE) += kexec_relocate.o crash_save_regs.o machine_kexec.o
|
||||
obj-$(CONFIG_KEXEC_FILE) += elf_kexec.o machine_kexec_file.o
|
||||
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
|
||||
obj-$(CONFIG_CRASH_CORE) += crash_core.o
|
||||
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
|
||||
|
||||
obj-$(CONFIG_JUMP_LABEL) += jump_label.o
Some files were not shown because too many files have changed in this diff.