linux

Author	SHA1	Message	Date
Bharata B Rao	55548a86eb	powerpc/mm: Limit resize_hpt_for_hotplug() call to hash guests only During memory hotplug and unplug, resize_hpt_for_hotplug() gets called for both hash and radix guests but it should be called only for hash guests. Though the call does nothing in the radix guest case, it is cleaner to push this call into hash specific memory hotplug routines. Reported-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727095704.1432916-1-bharata@linux.ibm.com	2020-07-29 21:02:12 +10:00
Michael Ellerman	773b3e53df	powerpc/mm: Remove custom stack expansion checking We have powerpc specific logic in our page fault handling to decide if an access to an unmapped address below the stack pointer should expand the stack VMA. The logic aims to prevent userspace from doing bad accesses below the stack pointer. However as long as the stack is < 1MB in size, we allow all accesses without further checks. Adding some debug I see that I can do a full kernel build and LTP run, and not a single process has used more than 1MB of stack. So for the majority of processes the logic never even fires. We also recently found a nasty bug in this code which could cause userspace programs to be killed during signal delivery. It went unnoticed presumably because most processes use < 1MB of stack. The generic mm code has also grown support for stack guard pages since this code was originally written, so the most heinous case of the stack expanding into other mappings is now handled for us. Finally although some other arches have special logic in this path, from what I can tell none of x86, arm64, arm and s390 impose any extra checks other than those in expand_stack(). So drop our complicated logic and like other architectures just let the stack expand as long as its within the rlimit. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/20200724092528.1578671-4-mpe@ellerman.id.au	2020-07-29 21:02:12 +10:00
Michael Ellerman	63dee5df43	powerpc: Allow 4224 bytes of stack expansion for the signal frame We have powerpc specific logic in our page fault handling to decide if an access to an unmapped address below the stack pointer should expand the stack VMA. The code was originally added in 2004 "ported from 2.4". The rough logic is that the stack is allowed to grow to 1MB with no extra checking. Over 1MB the access must be within 2048 bytes of the stack pointer, or be from a user instruction that updates the stack pointer. The 2048 byte allowance below the stack pointer is there to cover the 288 byte "red zone" as well as the "about 1.5kB" needed by the signal delivery code. Unfortunately since then the signal frame has expanded, and is now 4224 bytes on 64-bit kernels with transactional memory enabled. This means if a process has consumed more than 1MB of stack, and its stack pointer lies less than 4224 bytes from the next page boundary, signal delivery will fault when trying to expand the stack and the process will see a SEGV. The total size of the signal frame is the size of struct rt_sigframe (which includes the red zone) plus __SIGNAL_FRAMESIZE (128 bytes on 64-bit). The 2048 byte allowance was correct until 2008 as the signal frame was: struct rt_sigframe { struct ucontext uc; /* 0 1440 / / --- cacheline 11 boundary (1408 bytes) was 32 bytes ago --- / long unsigned int _unused[2]; / 1440 16 / unsigned int tramp[6]; / 1456 24 / struct siginfo pinfo; /* 1480 8 / void puc; /* 1488 8 / struct siginfo info; / 1496 128 / / --- cacheline 12 boundary (1536 bytes) was 88 bytes ago --- / char abigap[288]; / 1624 288 / / size: 1920, cachelines: 15, members: 7 / / padding: 8 / }; 1920 + 128 = 2048 Then in commit `ce48b21007` ("powerpc: Add VSX context save/restore, ptrace and signal support") (Jul 2008) the signal frame expanded to 2304 bytes: struct rt_sigframe { struct ucontext uc; / 0 1696 / <-- / --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- / long unsigned int _unused[2]; / 1696 16 / unsigned int tramp[6]; / 1712 24 / struct siginfo pinfo; /* 1736 8 / void puc; /* 1744 8 / struct siginfo info; / 1752 128 / / --- cacheline 14 boundary (1792 bytes) was 88 bytes ago --- / char abigap[288]; / 1880 288 / / size: 2176, cachelines: 17, members: 7 / / padding: 8 / }; 2176 + 128 = 2304 At this point we should have been exposed to the bug, though as far as I know it was never reported. I no longer have a system old enough to easily test on. Then in 2010 commit `320b2b8de1` ("mm: keep a guard page below a grow-down stack segment") caused our stack expansion code to never trigger, as there was always a VMA found for a write up to PAGE_SIZE below r1. That meant the bug was hidden as we continued to expand the signal frame in commit `2b0a576d15` ("powerpc: Add new transactional memory state to the signal context") (Feb 2013): struct rt_sigframe { struct ucontext uc; / 0 1696 / / --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- / struct ucontext uc_transact; / 1696 1696 / <-- / --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- / long unsigned int _unused[2]; / 3392 16 / unsigned int tramp[6]; / 3408 24 / struct siginfo pinfo; /* 3432 8 / void puc; /* 3440 8 / struct siginfo info; / 3448 128 / / --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- / char abigap[288]; / 3576 288 / / size: 3872, cachelines: 31, members: 8 / / padding: 8 / / last cacheline: 32 bytes / }; 3872 + 128 = 4000 And commit `573ebfa660` ("powerpc: Increase stack redzone for 64-bit userspace to 512 bytes") (Feb 2014): struct rt_sigframe { struct ucontext uc; / 0 1696 / / --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- / struct ucontext uc_transact; / 1696 1696 / / --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- / long unsigned int _unused[2]; / 3392 16 / unsigned int tramp[6]; / 3408 24 / struct siginfo pinfo; /* 3432 8 / void puc; /* 3440 8 / struct siginfo info; / 3448 128 / / --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- / char abigap[512]; / 3576 512 / <-- / size: 4096, cachelines: 32, members: 8 / / padding: 8 */ }; 4096 + 128 = 4224 Then finally in 2017, commit `1be7107fbe` ("mm: larger stack guard gap, between vmas") exposed us to the existing bug, because it changed the stack VMA to be the correct/real size, meaning our stack expansion code is now triggered. Fix it by increasing the allowance to 4224 bytes. Hard-coding 4224 is obviously unsafe against future expansions of the signal frame in the same way as the existing code. We can't easily use sizeof() because the signal frame structure is not in a header. We will either fix that, or rip out all the custom stack expansion checking logic entirely. Fixes: `ce48b21007` ("powerpc: Add VSX context save/restore, ptrace and signal support") Cc: stable@vger.kernel.org # v2.6.27+ Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724092528.1578671-2-mpe@ellerman.id.au	2020-07-29 21:02:12 +10:00
Christophe Leroy	e54e30bca4	powerpc/ptdump: Refactor update of pg_state In note_page(), the pg_state is updated the same way in two places. Add note_page_update_state() to do it. Also include the display of boundary markers there as it is missing "no level" leg, leading to a mismatch when the first two markers are at the same address and the first displayed area uses that address. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a284a809f01c705bbaab303b06fda216f147a99a.1593429426.git.christophe.leroy@csgroup.eu	2020-07-27 00:01:31 +10:00
Christophe Leroy	846feeace5	powerpc/ptdump: Refactor update of st->last_pa st->last_pa is always updated in note_page() so it can be done outside the if/elseif/else block. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/610d6b1a60ad0bedef865a90153c1110cfaa507e.1593429426.git.christophe.leroy@csgroup.eu	2020-07-27 00:01:30 +10:00
Christophe Leroy	6ca055322d	powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX When STRICT_KERNEL_RWX is set, we want to set NX bit on vmalloc segments. But modules require exec. Use a dedicated segment for modules. There is not much space above kernel, and we don't waste vmalloc space to do alignment. Therefore, we take the segment before PAGE_OFFSET for modules. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eb8faba9148b6cf17c696ba776b4e8ee2f6313bf.1593428200.git.christophe.leroy@csgroup.eu	2020-07-27 00:01:30 +10:00
Christophe Leroy	f1a1f7a15e	powerpc/32s: Kernel space starts at TASK_SIZE Kernel space starts at TASK_SIZE. Select kernel page table when address is over TASK_SIZE. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/893425e32cd0a003539573b2d115e0ffa98bc26c.1593428200.git.christophe.leroy@csgroup.eu	2020-07-27 00:01:30 +10:00
Christophe Leroy	b6be1bb7f7	powerpc/32: Set user/kernel boundary at TASK_SIZE instead of PAGE_OFFSET User space stops at TASK_SIZE. At the moment, kernel space starts at PAGE_OFFSET. In order to use space between TASK_SIZE and PAGE_OFFSET for modules, make TASK_SIZE the limit between user and kernel space. Note that fault.c already considers TASK_SIZE as the boundary between user and kernel space. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b38b52cd8dabbb56fbd6f9219d6f3cdccbb43b44.1593428200.git.christophe.leroy@csgroup.eu	2020-07-27 00:01:30 +10:00
Christophe Leroy	c496433197	powerpc/32s: Only leave NX unset on segments used for modules Instead of leaving NX unset on all segments above the start of vmalloc space, only leave NX unset on segments used for modules. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7172c0f5253419315e434a1816ee3d6ed6505bc0.1593428200.git.christophe.leroy@csgroup.eu	2020-07-27 00:01:30 +10:00
Christophe Leroy	7fbc22ce29	powerpc: Use MODULES_VADDR if defined In order to allow allocation of modules outside of vmalloc space, use MODULES_VADDR and MODULES_END when MODULES_VADDR is defined. Redefine module_alloc() when MODULES_VADDR defined. Unmap corresponding KASAN shadow memory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7ecf5fff1eef67d450e73fc412b6ec3818483d75.1593428200.git.christophe.leroy@csgroup.eu	2020-07-27 00:01:30 +10:00
Srikar Dronamraju	dbce456280	powerpc/numa: Limit possible nodes to within num_possible_nodes MAX_NUMNODES is a theoretical maximum number of nodes thats is supported by the kernel. Device tree properties exposes the number of possible nodes on the current platform. The kernel would detected this and would use it for most of its resource allocations. If the platform now increases the nodes to over what was already exposed, then it may lead to inconsistencies. Hence limit it to the already exposed nodes. Suggested-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724105809.24733-1-srikar@linux.vnet.ibm.com	2020-07-26 23:34:25 +10:00
Aneesh Kumar K.V	269e829f48	powerpc/book3s64/pkey: Disable pkey on POWER6 and before POWER6 only supports AMR update via privileged mode (MSR[PR] = 0, SPRN_AMR=29) The PR=1 (userspace) alias for that SPR (SPRN_AMR=13) was only supported from POWER7. Since we don't allow userspace modifying of AMR value we should disable pkey support on P6 and before. The hypervisor will still report pkey support via "ibm,processor-storage-keys". Hence also check for P7 CPU_FTR bit to decide on pkey support. Fixes: `f491fe3fb4` ("powerpc/book3s64/pkeys: Simplify the key initialization") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200726132517.399076-1-aneesh.kumar@linux.ibm.com	2020-07-26 23:34:18 +10:00
Santosh Sivaraj	69507b984d	powerpc/mm/hash64: Remove comment that is no longer valid hash_low_64.S was removed in commit `a43c0eb836` ("powerpc/mm: Convert 4k insert from asm to C") and flush_hash_page() is no longer called from any assembly routine. Signed-off-by: Santosh Sivaraj <santosh@fossix.org> [mpe: Tweak comment wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200721091915.205006-1-santosh@fossix.org	2020-07-23 17:43:35 +10:00
Nicholas Piggin	5c9fa16e8a	powerpc/64s: Remove PROT_SAO support ISA v3.1 does not support the SAO storage control attribute required to implement PROT_SAO. PROT_SAO was used by specialised system software (Lx86) that has been discontinued for about 7 years, and is not thought to be used elsewhere, so removal should not cause problems. We rather remove it than keep support for older processors, because live migrating guest partitions to newer processors may not be possible if SAO is in use (or worse allowed with silent races). - PROT_SAO stays in the uapi header so code using it would still build. - arch_validate_prot() is removed, the generic version rejects PROT_SAO so applications would get a failure at mmap() time. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Drop KVM change for the time being] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703011958.1166620-3-npiggin@gmail.com	2020-07-22 00:01:25 +10:00
Aneesh Kumar K.V	482b9b3948	powerpc/book3s64/pkeys: Remove is_pkey_enabled() There is only one caller to this function and the function is wrongly named. Avoid further confusion w.r.t name and open code this at the only call site. Also remove read_uamor(). There are no users for the same after this. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-24-aneesh.kumar@linux.ibm.com	2020-07-22 00:01:22 +10:00
Aneesh Kumar K.V	e0d8e991be	powerpc/book3s64/kuap: Move UAMOR setup to key init function UAMOR values are not application-specific. The kernel initializes its value based on different reserved keys. Remove the thread-specific UAMOR value and don't switch the UAMOR on context switch. Move UAMOR initialization to key initialization code and remove thread_struct.uamor because it is not used anymore. Before commit: `4a4a5e5d2a` ("powerpc/pkeys: key allocation/deallocation must not change pkey registers") we used to update uamor based on key allocation and free. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-20-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:59 +10:00
Aneesh Kumar K.V	000a42b35a	powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec As we kexec across kernels that use AMR/IAMR for different purposes we need to ensure that new kernels get kexec'd with a reset value of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and the old AMR value prevents access to key 0. This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of AMOR is not needed and the IAMR reset is partial (it doesn't do the reset on secondary cpus) and is redundant with this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-19-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:59 +10:00
Aneesh Kumar K.V	7cdd3745f2	powerpc/book3s64/keys: Print information during boot. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-18-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:59 +10:00
Aneesh Kumar K.V	f7045a4511	powerpc/book3s64/pkeys: Use MMU_FTR_PKEY instead of pkey_disabled static key Instead of pkey_disabled static key use mmu feature MMU_FTR_PKEY. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-17-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:59 +10:00
Aneesh Kumar K.V	2daf298de7	powerpc/book3s64/pkeys: Use pkey_execute_disable_supported Use pkey_execute_disable_supported to check for execute key support instead of pkey_disabled. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-16-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:59 +10:00
Aneesh Kumar K.V	e10cc8715d	powerpc/book3s64/kuep: Add MMU_FTR_KUEP This will be used to enable/disable Kernel Userspace Execution Prevention (KUEP). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-15-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V	d3cd91fb8d	powerpc/book3s64/pkeys: Add MMU_FTR_PKEY Parse storage keys related device tree entry in early_init_devtree and enable MMU feature MMU_FTR_PKEY if pkeys are supported. MMU feature is used instead of CPU feature because this enables us to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-14-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V	3e4352aeb8	powerpc/book3s64/pkeys: Mark all the pkeys above max pkey as reserved The hypervisor can return less than max allowed pkey (for ex: 31) instead of 32. We should mark all the pkeys above max allowed as reserved so that we avoid the allocation of the wrong pkey(for ex: key 31 in the above case) by userspace. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-13-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V	3c8ab47362	powerpc/book3s64/pkeys: Make initial_allocation_mask static initial_allocation_mask is not used outside this file. Also mark reserved_allocation_mask and initial_allocation_mask __ro_after_init; Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-12-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V	c529afd7cb	powerpc/book3s64/pkeys: Convert pkey_total to num_pkey num_pkey now represents max number of keys supported such that we return to userspace 0 - num_pkey - 1. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-11-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V	a4678d4b47	powerpc/book3s64/pkeys: Simplify pkey disable branch Make the default value FALSE (pkey enabled) and set to TRUE when we find the total number of keys supported to be zero. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-10-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V	718d9b3801	powerpc/book3s64/pkeys: Prevent key 1 modification from userspace. Key 1 is marked reserved by ISA. Setup uamor to prevent userspace modification of the same. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-8-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:58 +10:00
Aneesh Kumar K.V	f491fe3fb4	powerpc/book3s64/pkeys: Simplify the key initialization Add documentation explaining the execute_only_key. The reservation and initialization mask details are also explained in this patch. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-7-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:57 +10:00
Aneesh Kumar K.V	1f404058e2	powerpc/book3s64/pkeys: Explain key 1 reservation details This explains the details w.r.t key 1. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-6-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:57 +10:00
Aneesh Kumar K.V	d79e7a5f26	powerpc/book3s64/pkeys: Use PVR check instead of cpu feature We are wrongly using CPU_FTRS_POWER8 to check for P8 support. Instead, we should use PVR value. Now considering we are using CPU_FTRS_POWER8, that implies we returned true for P9 with older firmware. Keep the same behavior by checking for P9 PVR value. Fixes: `cf43d3b264` ("powerpc: Enable pkey subsystem") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-2-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:57 +10:00
Aneesh Kumar K.V	af9d00e93a	powerpc/mm/radix: Create separate mappings for hot-plugged memory To enable memory unplug without splitting kernel page table mapping, we force the max mapping size to the LMB size. LMB size is the unit in which hypervisor will do memory add/remove operation. Pseries systems supports max LMB size of 256MB. Hence on pseries, we now end up mapping memory with 2M page size instead of 1G. To improve that we want hypervisor to hint the kernel about the hotplug memory range. That was added that as part of commit `b6eca183e2` ("powerpc/kernel: Enables memory hot-remove after reboot on pseries guests") But PowerVM doesn't provide that hint yet. Once we get PowerVM updated, we can then force the 2M mapping only to hot-pluggable memory region using memblock_is_hotpluggable(). Till then let's depend on LMB size for finding the mapping page size for linear range. With this change KVM guest will also be doing linear mapping with 2M page size. The actual TLB benefit of mapping guest page table entries with hugepage size can only be materialized if the partition scoped entries are also using the same or higher page size. A guest using 1G hugetlbfs backing guest memory can have a performance impact with the above change. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Fold in fix from Aneesh spotted by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709131925.922266-5-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:56 +10:00
Bharata B Rao	d6d6ebfc5d	powerpc/mm/radix: Remove split_kernel_mapping() We split the page table mapping on memory unplug if the linear range was mapped with huge page mapping (for ex: 1G) The page table splitting code has a few issues: 1. Recursive locking -------------------- Memory unplug path takes cpu_hotplug_lock and calls stop_machine() for splitting the mappings. However stop_machine() takes cpu_hotplug_lock again causing deadlock. 2. BUG: sleeping function called from in_atomic() context --------------------------------------------------------- Memory unplug path (remove_pagetable) takes init_mm.page_table_lock spinlock and later calls stop_machine() which does wait_for_completion() 3. Bad unlock unbalance ----------------------- Memory unplug path takes init_mm.page_table_lock spinlock and calls stop_machine(). The stop_machine thread function runs in a different thread context (migration thread) which tries to release and reaquire ptl. Releasing ptl from a different thread than which acquired it causes bad unlock unbalance. These problems can be avoided if we avoid mapping hot-plugged memory with 1G mapping, thereby removing the need for splitting them during unplug. The kernel always make sure the minimum unplug request is SUBSECTION_SIZE for device memory and SECTION_SIZE for regular memory. In preparation for such a change remove page table splitting support. This essentially is a revert of commit `4dd5f8a99e` ("powerpc/mm/radix: Split linear mapping on hot-unplug") Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709131925.922266-4-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:56 +10:00
Bharata B Rao	9ce8853b4a	powerpc/mm/radix: Free PUD table when freeing pagetable remove_pagetable() isn't freeing PUD table. This causes memory leak during memory unplug. Fix this. Fixes: `4b5d62ca17` ("powerpc/mm: add radix__remove_section_mapping()") Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709131925.922266-3-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:56 +10:00
Aneesh Kumar K.V	645d5ce2f7	powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings We can hit the following BUG_ON during memory unplug: kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:342! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries NIP [c000000000093308] pmd_fragment_free+0x48/0xc0 LR [c00000000147bfec] remove_pagetable+0x578/0x60c Call Trace: 0xc000008050000000 (unreliable) remove_pagetable+0x384/0x60c radix__remove_section_mapping+0x18/0x2c remove_section_mapping+0x1c/0x3c arch_remove_memory+0x11c/0x180 try_remove_memory+0x120/0x1b0 __remove_memory+0x20/0x40 dlpar_remove_lmb+0xc0/0x114 dlpar_memory+0x8b0/0xb20 handle_dlpar_errorlog+0xc0/0x190 pseries_hp_work_fn+0x2c/0x60 process_one_work+0x30c/0x810 worker_thread+0x98/0x540 kthread+0x1c4/0x1d0 ret_from_kernel_thread+0x5c/0x74 This occurs when unplug is attempted for such memory which has been mapped using memblock pages as part of early kernel page table setup. We wouldn't have initialized the PMD or PTE fragment count for those PMD or PTE pages. This can be fixed by allocating memory in PAGE_SIZE granularity during early page table allocation. This makes sure a specific page is not shared for another memblock allocation and we can free them correctly on removing page-table pages. Since we now do PAGE_SIZE allocations for both PUD table and PMD table (Note that PTE table allocation is already of PAGE_SIZE), we end up allocating more memory for the same amount of system RAM. Here is a comparision of how much more we need for a 64T and 2G system after this patch: 1. 64T system ------------- 64T RAM would need 64G for vmemmap with struct page size being 64B. 128 PUD tables for 64T memory (1G mappings) 1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings) With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)4K=772K With PAGE_SIZE(64K) table allocations, (128+1+64)64K=12352K 2. 2G system ------------ 2G RAM would need 2M for vmemmap with struct page size being 64B. 1 PUD table for 2G memory (1G mapping) 1 PUD table and 1 PMD table for 2M vmemmap (2M mappings) With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)4K=12K With new PAGE_SIZE(64K) table allocations, (1+1+1)64K=192K Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709131925.922266-2-aneesh.kumar@linux.ibm.com	2020-07-20 22:57:56 +10:00
Michael Ellerman	ef9f7cfaa5	Merge branch 'fixes' into next Merge our fixes branch, primarily to bring in the ebb selftests build fix and the pkey fix, which is a dependency for some future work.	2020-07-18 22:43:55 +10:00
Nathan Lynch	cdf082c457	powerpc/numa: remove arch_update_cpu_topology Since arch_update_cpu_topology() doesn't do anything on powerpc now, remove it and associated dead code. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-15-nathanl@linux.ibm.com	2020-07-16 13:12:39 +10:00
Nathan Lynch	042ef7cc43	powerpc/numa: remove prrn_is_enabled() All users of this prrn_is_enabled() are gone; remove it. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-14-nathanl@linux.ibm.com	2020-07-16 13:12:39 +10:00
Nathan Lynch	1835303e56	powerpc/numa: remove start/stop_topology_update() These APIs have become no-ops, so remove them and all call sites. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-12-nathanl@linux.ibm.com	2020-07-16 13:12:38 +10:00
Nathan Lynch	b1815aeac7	powerpc/numa: remove timed_topology_update() timed_topology_update is a no-op now, so remove it and all call sites. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-11-nathanl@linux.ibm.com	2020-07-16 13:12:37 +10:00
Nathan Lynch	893ec6461f	powerpc/numa: stub out numa_update_cpu_topology() Previous changes have removed the code which sets bits in cpu_associativity_changes_mask and thus it is never modifed at runtime. From this we can reason that numa_update_cpu_topology() always returns 0 without doing anything. Remove the body of numa_update_cpu_topology() and remove all code which becomes unreachable as a result. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-10-nathanl@linux.ibm.com	2020-07-16 13:12:37 +10:00
Nathan Lynch	9fb8b5fd1b	powerpc/numa: remove vphn_enabled and prrn_enabled internal flags These flags are always zero now; remove them and suitably adjust the remaining references to them. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-9-nathanl@linux.ibm.com	2020-07-16 13:12:37 +10:00
Nathan Lynch	6325cb4a4e	powerpc/numa: remove unreachable topology workqueue code Since vphn_enabled is always 0, we can remove the call to topology_schedule_update() and remove the code which becomes unreachable as a result. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-8-nathanl@linux.ibm.com	2020-07-16 13:12:36 +10:00
Nathan Lynch	50e0cf3742	powerpc/numa: remove unreachable topology timer code Since vphn_enabled is always 0, we can stub out timed_topology_update() and remove the code which becomes unreachable. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-7-nathanl@linux.ibm.com	2020-07-16 13:12:36 +10:00
Nathan Lynch	e6eacf8eb4	powerpc/numa: make vphn_enabled, prrn_enabled flags const Previous changes have made it so these flags are never changed; enforce this by making them const. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-6-nathanl@linux.ibm.com	2020-07-16 13:12:36 +10:00
Nathan Lynch	7d35bef96a	powerpc/numa: remove unreachable topology update code Since the topology_updates_enabled flag is now always false, remove it and the code which has become unreachable. This is the minimum change that prevents 'defined but unused' warnings emitted by the compiler after stubbing out the start/stop_topology_updates() functions. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-5-nathanl@linux.ibm.com	2020-07-16 13:12:35 +10:00
Nathan Lynch	c30f931e89	powerpc/numa: remove ability to enable topology updates Remove the /proc/powerpc/topology_updates interface and the topology_updates=on/off command line argument. The internal topology_updates_enabled flag remains for now, but always false. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-4-nathanl@linux.ibm.com	2020-07-16 13:12:35 +10:00
Nicholas Piggin	dd3d9aa558	powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when !GTSE When platform doesn't support GTSE, let TLB invalidation requests for radix guests be off-loaded to the host using H_RPT_INVALIDATE hcall. [hcall wrapper, error path handling and renames] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703053608.12884-4-bharata@linux.ibm.com	2020-07-16 13:00:21 +10:00
Bharata B Rao	029ab30b4c	powerpc/mm: Enable radix GTSE only if supported. Make GTSE an MMU feature and enable it by default for radix. However for guest, conditionally enable it if hypervisor supports it via OV5 vector. Let prom_init ask for radix GTSE only if the support exists. Having GTSE as an MMU feature will make it easy to enable radix without GTSE. Currently radix assumes GTSE is enabled by default. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703053608.12884-2-bharata@linux.ibm.com	2020-07-16 13:00:21 +10:00
Christophe Leroy	41ea93cf7b	powerpc/kasan: Fix shadow pages allocation failure Doing kasan pages allocation in MMU_init is too early, kernel doesn't have access yet to the entire memory space and memblock_alloc() fails when the kernel is a bit big. Do it from kasan_init() instead. Fixes: `2edb16efc8` ("powerpc/32: Add KASAN support") Fixes: `d2a91cef9b` ("powerpc/kasan: Fix shadow pages allocation failure") Cc: stable@vger.kernel.org Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181 Link: https://lore.kernel.org/r/63048fcea8a1c02f75429ba3152f80f7853f87fc.1593690707.git.christophe.leroy@csgroup.eu	2020-07-15 12:04:39 +10:00
Christophe Leroy	b506923ee4	Revert "powerpc/kasan: Fix shadow pages allocation failure" This reverts commit `d2a91cef9b`. This commit moved too much work in kasan_init(). The allocation of shadow pages has to be moved for the reason explained in that patch, but the allocation of page tables still need to be done before switching to the final hash table. First revert the incorrect commit, following patch redoes it properly. Fixes: `d2a91cef9b` ("powerpc/kasan: Fix shadow pages allocation failure") Cc: stable@vger.kernel.org Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181 Link: https://lore.kernel.org/r/3667deb0911affbf999b99f87c31c77d5e870cd2.1593690707.git.christophe.leroy@csgroup.eu	2020-07-15 12:04:39 +10:00

1 2 3 4 5 ...

2469 Commits