The set_memory_{ro/rw/nx/x}() functions are required for
STRICT_MODULE_RWX, and are generally useful primitives to have. This
implementation is designed to be generic across powerpc's many MMUs.
It's possible that this could be optimised to be faster for specific
MMUs.
This implementation does not handle cases where the caller is attempting
to change the mapping of the page it is executing from, or if another
CPU is concurrently using the page being altered. These cases likely
shouldn't happen, but a more complex implementation with MMU-specific code
could safely handle them.
On hash, the linear mapping is not kept in the linux pagetable, so this
will not change the protection if used on that range. Currently these
functions are not used on the linear map so just WARN for now.
apply_to_existing_page_range() does not work on huge pages so for now
disallow changing the protection of huge pages.
[jpn: - Allow set memory functions to be used without Strict RWX
- Hash: Disallow certain regions
- Have change_page_attr() take function pointers to manipulate ptes
- Radix: Add ptesync after set_pte_at()]
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210609013431.9805-2-jniethe5@gmail.com
Merge some powerpc KVM patches from our topic branch.
In particular this brings in Nick's big series rewriting parts of the
guest entry/exit path in C.
Conflicts:
arch/powerpc/kernel/security.c
arch/powerpc/kvm/book3s_hv_rmhandlers.S
Update _tlbiel_pid() such that we can avoid build errors like below when
using this function in other places.
arch/powerpc/mm/book3s64/radix_tlb.c: In function ‘__radix__flush_tlb_range_psize’:
arch/powerpc/mm/book3s64/radix_tlb.c:114:2: warning: ‘asm’ operand 3 probably does not match constraints
114 | asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
| ^~~
arch/powerpc/mm/book3s64/radix_tlb.c:114:2: error: impossible constraint in ‘asm’
make[4]: *** [scripts/Makefile.build:271: arch/powerpc/mm/book3s64/radix_tlb.o] Error 1
m
With this fix, we can also drop the __always_inline in __radix_flush_tlb_range_psize
which was added by commit e12d6d7d46 ("powerpc/mm/radix: mark __radix__flush_tlb_range_psize() as __always_inline")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210610083639.387365-1-aneesh.kumar@linux.ibm.com
KUEP implements the update of user segment registers.
Move it into mmu-hash.h in order to use it from other places.
And inline kuep_lock() and kuep_unlock(). Inlining kuep_lock() is
important for system_call_exception(), otherwise system_call_exception()
has to save into stack the system call parameters that are used just
after, and doing that takes more instructions than kuep_lock() itself.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24591ca480d14a62ef910e38a5273d551262c4a2.1622708530.git.christophe.leroy@csgroup.eu
'struct ppc_inst' is an internal representation of an instruction, but
in-memory instructions are and will remain a table of 'u32' forever.
Replace all 'struct ppc_inst *' used for locating an instruction in
memory by 'u32 *'. This removes a lot of undue casts to 'struct
ppc_inst *'.
It also helps locating ab-use of 'struct ppc_inst' dereference.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix ppc_inst_next(), use u32 instead of unsigned int]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu
Rather than partition the guest PID space + flush a rogue guest PID to
work around this problem, instead fix it by always disabling the MMU when
switching in or out of guest MMU context in HV mode.
This may be a bit less efficient, but it is a lot less complicated and
allows the P9 path to trivally implement the workaround too. Newer CPUs
are not subject to this issue.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-22-npiggin@gmail.com
A bunch of PPC files are missing the inclusion of linux/of.h and
linux/irqdomain.h, relying on transitive inclusion from another
file.
As we are about to break this dependency, make sure these dependencies
are explicit.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Merge more updates from Andrew Morton:
"The remainder of the main mm/ queue.
143 patches.
Subsystems affected by this patch series (all mm): pagecache, hugetlb,
userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap,
kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and
kfence"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits)
kfence: use power-efficient work queue to run delayed work
kfence: maximize allocation wait timeout duration
kfence: await for allocation using wait_event
kfence: zero guard page after out-of-bounds access
mm/process_vm_access.c: remove duplicate include
mm/mempool: minor coding style tweaks
mm/highmem.c: fix coding style issue
btrfs: use memzero_page() instead of open coded kmap pattern
iov_iter: lift memzero_page() to highmem.h
mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
mm/zswap.c: switch from strlcpy to strscpy
arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
mm,memory_hotplug: allocate memmap from the added memory range
mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
mm,memory_hotplug: relax fully spanned sections check
drivers/base/memory: introduce memory_block_{online,offline}
mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
...
Merge misc updates from Andrew Morton:
"A few misc subsystems and some of MM.
175 patches.
Subsystems affected by this patch series: ia64, kbuild, scripts, sh,
ocfs2, kfifo, vfs, kernel/watchdog, and mm (slab-generic, slub,
kmemleak, debug, pagecache, msync, gup, memremap, memcg, pagemap,
mremap, dma, sparsemem, vmalloc, documentation, kasan, initialization,
pagealloc, and memory-failure)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (175 commits)
mm/memory-failure: unnecessary amount of unmapping
mm/mmzone.h: fix existing kernel-doc comments and link them to core-api
mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1
net: page_pool: use alloc_pages_bulk in refill code path
net: page_pool: refactor dma_map into own function page_pool_dma_map
SUNRPC: refresh rq_pages using a bulk page allocator
SUNRPC: set rq_page_end differently
mm/page_alloc: inline __rmqueue_pcplist
mm/page_alloc: optimize code layout for __alloc_pages_bulk
mm/page_alloc: add an array-based interface to the bulk page allocator
mm/page_alloc: add a bulk page allocator
mm/page_alloc: rename alloced to allocated
mm/page_alloc: duplicate include linux/vmalloc.h
mm, page_alloc: avoid page_to_pfn() in move_freepages()
mm/Kconfig: remove default DISCONTIGMEM_MANUAL
mm: page_alloc: dump migrate-failed pages
mm/mempolicy: fix mpol_misplaced kernel-doc
mm/mempolicy: rewrite alloc_pages_vma documentation
mm/mempolicy: rewrite alloc_pages documentation
mm/mempolicy: rename alloc_pages_current to alloc_pages
...
The memory ordering comment no longer applies, because mm_ctx_id is
no longer used anywhere. At best always been difficult to follow.
It's better to consider the load on which the slbmte depends on, which
the MMU depends on before it can start loading TLBs, rather than a
store which may or may not have a subsequent dependency chain to the
slbmte.
So update the comment and we use the load of the mm's user context ID.
This is much more analogous the radix ordering too, which is good.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210421151733.212858-1-npiggin@gmail.com
Define macros to list ppc interrupt types in interttupt.h, replace the
reference of the trap hex values with these macros.
Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S,
arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S,
arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h.
Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
The lkp bot pointed out that with W=1 we get:
arch/powerpc/mm/book3s64/radix_pgtable.c:183:6: error: no previous
prototype for 'radix__change_memory_range'
Which is really saying that it could be static, make it so.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
flush_dcache_page() is only a few lines, it is worth
inlining.
ia64, csky, mips, openrisc and riscv have a similar
flush_dcache_page() and inline it.
On pmac32_defconfig, we get a small size reduction.
On ppc64_defconfig, we get a very small size increase.
In both case that's in the noise (less than 0.1%).
text data bss dec hex filename
18991155 5934744 1497624 26423523 19330e3 vmlinux64.before
18994829 59367321497624 26429185 1934701 vmlinux64.after
9150963 2467502 184548 11803013 b41985 vmlinux32.before
9149689 2467302 184548 11801539 b413c3 vmlinux32.after
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/21c417488b70b7629dae316539fb7bb8bdef4fdd.1617895813.git.christophe.leroy@csgroup.eu
__flush_dcache_icache() is usable for non HIGHMEM pages on
every platform.
It is only for HIGHMEM pages that BOOKE needs kmap() and
BOOK3S needs flush_dcache_icache_phys().
So make flush_dcache_icache_phys() dependent on CONFIG_HIGHMEM and
call it only when it is a HIGHMEM page.
We could make flush_dcache_icache_phys() available at all time,
but as it is declared NOKPROBE_SYMBOL(), GCC doesn't optimise
it out when it is not used.
So define a stub for !CONFIG_HIGHMEM in order to remove the #ifdef in
flush_dcache_icache_page() and use IS_ENABLED() instead.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/79ed5d7914f497cd5fcd681ca2f4d50a91719455.1617895813.git.christophe.leroy@csgroup.eu
flush_dcache_icache_hugepage() is a static function, with
only one caller. That caller calls it when PageCompound() is true,
so bugging on !PageCompound() is useless if we can trust the
compiler a little. Remove the BUG_ON(!PageCompound()).
The number of elements of a page won't change over time, but
GCC doesn't know about it, so it gets the value at every iteration.
To avoid that, call compound_nr() outside the loop and save it in
a local variable.
Whether the page is a HIGHMEM page or not doesn't change over time.
But GCC doesn't know it so it does the test on every iteration.
Do the test outside the loop.
When the page is not a HIGHMEM page, page_address() will fallback on
lowmem_page_address(), so call lowmem_page_address() directly and
don't suffer the call to page_address() on every iteration.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ab03712b70105fccfceef095aa03007de9295a40.1617895813.git.christophe.leroy@csgroup.eu