linux

Author	SHA1	Message	Date
Oliver O'Halloran	42de19d5ef	powerpc/pseries/eeh: Allow zero to be a valid PE configuration address There's no real reason why zero can't be a valid PE configuration address. Under qemu each sPAPR PHB (i.e. EEH supporting) has the passed-though devices on bus zero, so the PE address of bus <dddd>:00 should be zero. However, all previous versions of Linux will reject that, so Qemu at least goes out of it's way to avoid it. The Qemu implementation of ibm,get-config-addr-info2 RTAS has the following comment: > /* > * We always have PE address of form "00BB0001". "BB" > * represents the bus number of PE's primary bus. > */ So qemu puts a one into the register portion of the PE's config_addr to avoid it being zero. The whole is pretty silly considering that RTAS will return a negative error code if it can't map the device's config_addr to a PE. This patch fixes Linux to treat zero as a valid PE address. This shouldn't have any real effects due to the Qemu hack mentioned above. And the fact that Linux EEH has worked historically on PowerVM means they never pass through devices on bus zero so we would never see the problem there either. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200918093050.37344-8-oohall@gmail.com	2020-10-06 23:22:25 +11:00
Oliver O'Halloran	98ba956f6a	powerpc/pseries/eeh: Rework device EEH PE determination The process Linux uses for determining if a device supports EEH or not appears to be at odds with what PAPR says the OS should be doing. The current flow is something like: 1. Assume pe_config_addr is equal the the device's config_addr. 2. Attempt to enable EEH on that PE 3. Verify EEH was enabled (POWER4 bug workaround) 4. Try find the pe_config_addr using the ibm,get-config-addr-info2 RTAS call. 5. If that fails walk the pci_dn tree upwards trying to find a parent device with EEH support. If we find one then add the device to that PE. The first major problem with this process is that we need the PE config address in step 2) since its needs to be passed to the ibm,set-eeh-option RTAS call when enabling EEH for th PE. We hack around this requirement in by making the assumption in 1) and delay finding the actual PE address until 4). This is fine if: a) The PCI device is the 0th function, and b) The device is on the PE's root bus. Granted, the current sequence does appear to work on most systems even when these conditions are false. At a guess PowerVM's RTAS has workarounds to accommodate Linux's quirks or the RTAS call to enable EEH is treated as no-op on most platforms since EEH is usually enabled by default. However, what is currently implemented is a bit sketch and is downright confusing since it doesn't match up with what what PAPR suggests we should be doing. This patch re-works how we handle EEH init so that we find the PE config address using the ibm,get-config-addr-info2 RTAS call first, then use the found address to finish the EEH init process. It also drops the Power4 workaround since as of commit `471d7ff8b5` ("powerpc/64s: Remove POWER4 support") the kernel does not support running on a Power4 CPU so there's no need to support the Power4 platform's quirks either. With the patch applied the sequence is now: 1. Find the pe_config_addr from the device using the RTAS call. 2. Enable the PE. 3. Insert the edev into the tree and create an eeh_pe if needed. The other change made here is ignoring unsupported devices entirely. Currently the device's BARs are saved to the eeh_dev even if the device is not part of an EEH PE. Not being part of a PE means that an EEH recovery pass will never see that device so the saving the BARs is pointless. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200918093050.37344-7-oohall@gmail.com	2020-10-06 23:22:25 +11:00
Oliver O'Halloran	f61c859feb	powerpc/pseries/eeh: Clean up pe_config_addr lookups De-duplicate, and fix up the comments, and make the prototype just take a pci_dn since the job of the function is to return the pe_config_addr of the PE which contains a given device. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200918093050.37344-6-oohall@gmail.com	2020-10-06 23:22:25 +11:00
Oliver O'Halloran	395ee2a2a1	powerpc/eeh: Move EEH initialisation to an arch initcall The initialisation of EEH mostly happens in a core_initcall_sync initcall, followed by registering a bus notifier later on in an arch_initcall. Anything involving initcall dependecies is mostly incomprehensible unless you've spent a while staring at code so here's the full sequence: ppc_md.setup_arch <-- pci_controllers are created here ...time passes... core_initcall <-- pci_dns are created from DT nodes core_initcall_sync <-- platforms call eeh_init() postcore_initcall <-- PCI bus type is registered postcore_initcall_sync arch_initcall <-- EEH pci_bus notifier registered subsys_initcall <-- PHBs are scanned here There's no real requirement to do the EEH setup at the core_initcall_sync level. It just needs to be done after pci_dn's are created and before we start scanning PHBs. Simplify the flow a bit by moving the platform EEH inititalisation to an arch_initcall so we can fold the bus notifier registration into eeh_init(). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200918093050.37344-5-oohall@gmail.com	2020-10-06 23:22:25 +11:00
Oliver O'Halloran	5d69e46a21	powerpc/eeh: Delete eeh_ops->init No longer used since the platforms perform their EEH initialisation before calling eeh_init(). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200918093050.37344-4-oohall@gmail.com	2020-10-06 23:22:25 +11:00
Oliver O'Halloran	1f8fa0cd6a	powerpc/pseries: Stop using eeh_ops->init() Fold pseries_eeh_init() into eeh_pseries_init() rather than having eeh_init() call it via eeh_ops->init(). It's simpler and it'll let us delete eeh_ops.init. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200918093050.37344-3-oohall@gmail.com	2020-10-06 23:22:24 +11:00
Oliver O'Halloran	82a1ea21f1	powerpc/powernv: Stop using eeh_ops->init() Fold pnv_eeh_init() into eeh_powernv_init() rather than having eeh_init() call it via eeh_ops->init(). It's simpler and it'll let us delete eeh_ops.init. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200918093050.37344-2-oohall@gmail.com	2020-10-06 23:22:24 +11:00
Oliver O'Halloran	d125aedb40	powerpc/eeh: Rework EEH initialisation Drop the EEH register / unregister ops thing and have the platform pass the ops structure into eeh_init() directly. This takes one initcall out of the EEH setup path and it means we're only doing EEH setup on the platforms which actually support it. It's also less code and generally easier to follow. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200918093050.37344-1-oohall@gmail.com	2020-10-06 23:22:24 +11:00
Christoph Hellwig	874dc62f54	powerpc: switch 85xx defconfigs from legacy ide to libata Switch the 85xx defconfigs from the soon to be removed legacy ide driver to libata. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200924041310.520970-1-hch@lst.de	2020-10-06 23:22:24 +11:00
Daniel Axtens	5c5e46dad9	powerpc: PPC_SECURE_BOOT should not require PowerNV In commit `61f879d97c` ("powerpc/pseries: Detect secure and trusted boot state of the system.") we taught the kernel how to understand the secure-boot parameters used by a pseries guest. However, CONFIG_PPC_SECURE_BOOT still requires PowerNV. I didn't catch this because pseries_le_defconfig includes support for PowerNV and so everything still worked. Indeed, most configs will. Nonetheless, technically PPC_SECURE_BOOT doesn't require PowerNV any more. The secure variables support (PPC_SECVAR_SYSFS) doesn't do anything on pSeries yet, but I don't think it's worth adding a new condition - at some stage we'll want to add a backend for pSeries anyway. Fixes: `61f879d97c` ("powerpc/pseries: Detect secure and trusted boot state of the system.") Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200924014922.172914-1-dja@axtens.net	2020-10-06 23:22:24 +11:00
Wang Wensheng	4366337490	powerpc/papr_scm: Fix warnings about undeclared variable Build the kernel with 'make C=2': arch/powerpc/platforms/pseries/papr_scm.c:825:1: warning: symbol 'dev_attr_perf_stats' was not declared. Should it be static? Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200918085951.44983-1-wangwensheng4@huawei.com	2020-10-06 23:22:24 +11:00
Nicholas Piggin	455575533c	powerpc/64: make restore_interrupts 64e only This is not used by 64s. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200915114650.3980244-5-npiggin@gmail.com	2020-10-06 23:22:24 +11:00
Nicholas Piggin	903dd1ff45	powerpc/64e: remove 64s specific interrupt soft-mask code Since the assembly soft-masking code was moved to 64e specific, there are some 64s specific interrupt types still there. Remove them. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200915114650.3980244-4-npiggin@gmail.com	2020-10-06 23:22:23 +11:00
Nicholas Piggin	012a9a97a8	powerpc/64e: remove PACA_IRQ_EE_EDGE This is not used anywhere. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200915114650.3980244-3-npiggin@gmail.com	2020-10-06 23:22:23 +11:00
Nicholas Piggin	2b48e96be2	powerpc/64: fix irq replay pt_regs->softe value Replayed interrupts get an "artificial" struct pt_regs constructed to pass to interrupt handler functions. This did not get the softe field set correctly, it's as though the interrupt has hit while irqs are disabled. It should be IRQS_ENABLED. This is possibly harmless, asynchronous handlers should not be testing if irqs were disabled, but it might be possible for example some code is shared with synchronous or NMI handlers, and it makes more sense if debug output looks at this. Fixes: `3282a3da25` ("powerpc/64: Implement soft interrupt replay in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200915114650.3980244-2-npiggin@gmail.com	2020-10-06 23:22:23 +11:00
Nicholas Piggin	903fd31d32	powerpc/64: fix irq replay missing preempt Prior to commit `3282a3da25` ("powerpc/64: Implement soft interrupt replay in C"), replayed interrupts returned by the regular interrupt exit code, which performs preemption in case an interrupt had set need_resched. This logic was missed by the conversion. Adding preempt_disable/enable around the interrupt replay and final irq enable will reschedule if needed. Fixes: `3282a3da25` ("powerpc/64: Implement soft interrupt replay in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200915114650.3980244-1-npiggin@gmail.com	2020-10-06 23:22:23 +11:00
Nicholas Piggin	cdb1ea0276	powerpc/pseries: add new branch prediction security bits for link stack The hypervisor interface has defined branch prediction security bits for handling the link stack. Wire them up. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200825075612.224656-1-npiggin@gmail.com	2020-10-06 23:22:23 +11:00
Nicholas Piggin	05504b4256	powerpc/64s: Add cp_abort after tlbiel to invalidate copy-buffer address The copy buffer is implemented as a real address in the nest which is translated from EA by copy, and used for memory access by paste. This requires that it be invalidated by TLB invalidation. TLBIE does invalidate the copy buffer, but TLBIEL does not. Add cp_abort to the tlbiel sequence. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup whitespace and comment formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200916030234.4110379-2-npiggin@gmail.com	2020-10-06 23:22:23 +11:00
Nicholas Piggin	9983efa83b	powerpc: untangle cputable mce include Having cputable.h include mce.h means it pulls in a bunch of low level headers (e.g., synch.h) which then can't use CPU_FTR_ definitions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200916030234.4110379-1-npiggin@gmail.com	2020-10-06 23:22:22 +11:00
Mahesh Salgaonkar	aea948bb80	powerpc/powernv/elog: Fix race while processing OPAL error log event. Every error log reported by OPAL is exported to userspace through a sysfs interface and notified using kobject_uevent(). The userspace daemon (opal_errd) then reads the error log and acknowledges the error log is saved safely to disk. Once acknowledged the kernel removes the respective sysfs file entry causing respective resources to be released including kobject. However it's possible the userspace daemon may already be scanning elog entries when a new sysfs elog entry is created by the kernel. User daemon may read this new entry and ack it even before kernel can notify userspace about it through kobject_uevent() call. If that happens then we have a potential race between elog_ack_store->kobject_put() and kobject_uevent which can lead to use-after-free of a kernfs object resulting in a kernel crash. eg: BUG: Unable to handle kernel data access on read at 0x6b6b6b6b6b6b6bfb Faulting instruction address: 0xc0000000008ff2a0 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV CPU: 27 PID: 805 Comm: irq/29-opal-elo Not tainted 5.9.0-rc2-gcc-8.2.0-00214-g6f56a67bcbb5-dirty #363 ... NIP kobject_uevent_env+0xa0/0x910 LR elog_event+0x1f4/0x2d0 Call Trace: 0x5deadbeef0000122 (unreliable) elog_event+0x1f4/0x2d0 irq_thread_fn+0x4c/0xc0 irq_thread+0x1c0/0x2b0 kthread+0x1c4/0x1d0 ret_from_kernel_thread+0x5c/0x6c This patch fixes this race by protecting the sysfs file creation/notification by holding a reference count on kobject until we safely send kobject_uevent(). The function create_elog_obj() returns the elog object which if used by caller function will end up in use-after-free problem again. However, the return value of create_elog_obj() function isn't being used today and there is no need as well. Hence change it to return void to make this fix complete. Fixes: `774fea1a38` ("powerpc/powernv: Read OPAL error log and export it through sysfs") Cc: stable@vger.kernel.org # v3.15+ Reported-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [mpe: Rework the logic to use a single return, reword comments, add oops] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201006122051.190176-1-mpe@ellerman.id.au	2020-10-06 23:22:22 +11:00
Dan Williams	ec6347bb43	x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com	2020-10-06 11:18:04 +02:00
Christoph Hellwig	9f4df96b87	dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h> Move more nitty gritty DMA implementation details into the common internal header. Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-10-06 07:07:06 +02:00
Christoph Hellwig	0a0f0d8be7	dma-mapping: split <linux/dma-mapping.h> Split out all the bits that are purely for dma_map_ops implementations and related code into a new <linux/dma-map-ops.h> header so that they don't get pulled into all the drivers. That also means the architecture specific <asm/dma-mapping.h> is not pulled in by <linux/dma-mapping.h> any more, which leads to a missing includes that were pulled in by the x86 or arm versions in a few not overly portable drivers. Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-10-06 07:07:03 +02:00
David S. Miller	8b0308fe31	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-05 18:40:01 -07:00
Vladimir Oltean	e69eb0824d	powerpc: dts: t1040rdb: add ports for Seville Ethernet switch Define the network interface names for the switch ports and hook them up to the 2 QSGMII PHYs that are onboard. A conscious decision was taken to go along with the numbers that are written on the front panel of the board and not with the hardware numbers of the switch chip ports. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 17:02:42 -07:00
Vladimir Oltean	aa3098676c	powerpc: dts: t1040: add bindings for Seville Ethernet switch Add the description of the embedded L2 switch inside the SoC dtsi file for NXP T1040. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-03 17:02:42 -07:00
Christoph Hellwig	c3973b401e	mm: remove compat_process_vm_{readv,writev} Now that import_iovec handles compat iovecs, the native syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-10-03 00:02:15 -04:00
Christoph Hellwig	598b3cec83	fs: remove compat_sys_vmsplice Now that import_iovec handles compat iovecs, the native vmsplice syscall can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-10-03 00:02:15 -04:00
Christoph Hellwig	5f764d624a	fs: remove the compat readv/writev syscalls Now that import_iovec handles compat iovecs, the native readv and writev syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-10-03 00:02:14 -04:00
David S. Miller	1f25c9bbfd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2020-09-29 The following pull-request contains BPF updates for your net tree. We've added 7 non-merge commits during the last 14 day(s) which contain a total of 7 files changed, 28 insertions(+), 8 deletions(-). The main changes are: 1) fix xdp loading regression in libbpf for old kernels, from Andrii. 2) Do not discard packet when NETDEV_TX_BUSY, from Magnus. 3) Fix corner cases in libbpf related to endianness and kconfig, from Tony. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 01:49:20 -07:00
He Zhe	9cf51446e6	bpf, powerpc: Fix misuse of fallthrough in bpf_jit_comp() The user defined label following "fallthrough" is not considered by GCC and causes build failure. kernel-source/include/linux/compiler_attributes.h:208:41: error: attribute 'fallthrough' not preceding a case label or default label [-Werror] 208 define fallthrough _attribute((fallthrough_)) ^~~~~~~~~~~~~ Fixes: `df561f6688` ("treewide: Use fallthrough pseudo-keyword") Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/bpf/20200928090023.38117-1-zhe.he@windriver.com	2020-09-29 16:39:11 +02:00
Thomas Gleixner	981aa1d366	PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS The unconditional selection of PCI_MSI_ARCH_FALLBACKS has an unmet dependency because PCI_MSI_ARCH_FALLBACKS is defined in a 'if PCI' clause. As it is only relevant when PCI_MSI is enabled, update the affected architecture Kconfigs to make the selection of PCI_MSI_ARCH_FALLBACKS depend on 'if PCI_MSI'. Fixes: `077ee78e39` ("PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable") Reported-by: Qian Cai <cai@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Links: https://lore.kernel.org/r/cdfd63305caa57785b0925dd24c0711ea02c8527.camel@redhat.com	2020-09-28 16:03:10 +02:00
Christoph Hellwig	efa70f2fdc	dma-mapping: add a new dma_alloc_pages API This API is the equivalent of alloc_pages, except that the returned memory is guaranteed to be DMA addressable by the passed in device. The implementation will also be used to provide a more sensible replacement for DMA_ATTR_NON_CONSISTENT flag. Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages as its backend. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> (MIPS part)	2020-09-25 06:20:47 +02:00
Christoph Hellwig	8c1c6c7588	Merge branch 'master' of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into dma-mapping-for-next Pull in the latest 5.9 tree for the commit to revert the V4L2_FLAG_MEMORY_NON_CONSISTENT uapi addition.	2020-09-25 06:19:19 +02:00
Masahiro Yamada	596b0474d3	kbuild: preprocess module linker script There was a request to preprocess the module linker script like we do for the vmlinux one. (https://lkml.org/lkml/2020/8/21/512) The difference between vmlinux.lds and module.lds is that the latter is needed for external module builds, thus must be cleaned up by 'make mrproper' instead of 'make clean'. Also, it must be created by 'make modules_prepare'. You cannot put it in arch/$(SRCARCH)/kernel/, which is cleaned up by 'make clean'. I moved arch/$(SRCARCH)/kernel/module.lds to arch/$(SRCARCH)/include/asm/module.lds.h, which is included from scripts/module.lds.S. scripts/module.lds is fine because 'make clean' keeps all the build artifacts under scripts/. You can add arch-specific sections in <asm/module.lds.h>. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Jessica Yu <jeyu@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Jessica Yu <jeyu@kernel.org>	2020-09-25 00:36:41 +09:00
Christoph Hellwig	028abd9222	fs: remove compat_sys_mount compat_sys_mount is identical to the regular sys_mount now, so remove it and use the native version everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-09-22 23:45:57 -04:00
Paolo Bonzini	2e3df760cd	PPC KVM update for 5.10 - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJfaXWyAAoJEJ2a6ncsY3GfcBwH/A/8qEXwzRZifASJMCVNyDho iGU3ku2I6Ui9zC1PpcbAJ0Eu7ZAdUXpqrdyKOJLHquZGhKH4Jl7Cjq5ZpEXzGP4v 22QJA8ek9HWAbaeV+N9Q1zpWUCRBGR+Onm5g9KE7BG5/eUaQDukpHLpDNXl3nT95 zlmaiMYYYYhgSQKBh3HQp5nhYMVpwToq14EsJV6sZ99nJhrjtXjx3MsoCU03+h+k 9y1FwBVkS2XQ0deYQuFYSgVNCF1gmK8lBbKL1Zly2MYJhQDZbiX/VJrtGW/ls5kl KbzWYxQHZ46NH6SSfwXLbBZa5reS+s1va/Q9RJL9/lCncn5iYhcCJ1sVBHLIq4U= =nYru -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-next-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM update for 5.10 - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes	2020-09-22 08:18:16 -04:00
Wang Wensheng	cf59eb13e1	KVM: PPC: Book3S: Fix symbol undeclared warnings Build the kernel with `C=2`: arch/powerpc/kvm/book3s_hv_nested.c:572:25: warning: symbol 'kvmhv_alloc_nested' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_mmu_radix.c:350:6: warning: symbol 'kvmppc_radix_set_pte_at' was not declared. Should it be static? arch/powerpc/kvm/book3s_hv.c:3568:5: warning: symbol 'kvmhv_p9_guest_entry' was not declared. Should it be static? arch/powerpc/kvm/book3s_hv_rm_xics.c:767:15: warning: symbol 'eoi_rc' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_vio_hv.c:240:13: warning: symbol 'iommu_tce_kill_rm' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_vio.c:492:6: warning: symbol 'kvmppc_tce_iommu_do_map' was not declared. Should it be static? arch/powerpc/kvm/book3s_pr.c:572:6: warning: symbol 'kvmppc_set_pvr_pr' was not declared. Should it be static? Those symbols are used only in the files that define them so make them static to fix the warnings. Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-22 11:53:55 +10:00
Jing Xiangfeng	eb173559c9	KVM: PPC: Book3S: Remove redundant initialization of variable ret The variable ret is being initialized with '-ENOMEM' that is meaningless. So remove it. Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-22 11:53:55 +10:00
Qinglang Miao	4517076608	KVM: PPC: Book3S HV: XIVE: Convert to DEFINE_SHOW_ATTRIBUTE Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-22 11:53:55 +10:00
Linus Torvalds	5a55d36f71	powerpc fixes for 5.9 #5 Opt us out of the DEBUG_VM_PGTABLE support for now as it's causing crashes. Fix a long standing bug in our DMA mask handling that was hidden until recently, and which caused problems with some drivers. Fix a boot failure on systems with large amounts of RAM, and no hugepage support and using Radix MMU, only seen in the lab. A few other minor fixes. Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Gautham R. Shenoy, Hari Bathini, Ira Weiny, Nick Desaulniers, Shirisha Ganta, Vaibhav Jain, Vaidyanathan Srinivasan. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl9kk4UTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLM5D/42Wuuq6hOpGEfE2XjBjsOxCR07SCw9 CQ/c72jw/1tDLe0YclPVYAZc8BmT8uLo6tE2Ot+0vI+1Y06rRvC5g5uQBwp1zD/t MOwC0d4zf8a7WzBCcVaBv9HMHVOaKaTBwQc2R4k6NzYtARIf5m0evMPOWINioRsv /x4+Np8aeQd1WiVn6PBdqL8w1yRhk8LsVDvX35lFzQlgZvH09umXSGjw9K442xdE lr1PrV9GKd4DeudwLHPkMNs8Ul1QTxmY5vKIAklsJ5g3dBySfM7+GMrTzYOHYkWr aqGfGH6ojdFSQZRo7QwFhO52Kni7JN7AIoUPEBDqLb1fR10w8wesdjCs8JQMXIMc 8Eo210EbiSsq6kG/LzqZuStLUAup3rQd20+wWua7jo8HbcZOLDH7pPGwNtJInPTr gwH7sALYhTzFUAO4LzqVVE+yA8wndFPHoz+QSkO6LZJKuszON2LJ0r+IEx1l9Wmr ClZYujK36/N+ih42xBFcBZi2lL0VKzlkn7u7NDeZQBONgoJMCoWkGDkEHnMI9zdT 1iYcVZyY+IpJLcwxH+NP8weJKHIZbP4kvuFXR/QIpHdrDWKaTT95BvBhAK1KMYBo lLo81zMTzJCz2wWDg/+3sLuEzHdGr//uxcVXxjqE2vyEckeuZjiCyz0dRNEY4Sw3 vB2p3Zl7L0J4pQ== =cj6B -----END PGP SIGNATURE----- Merge tag 'powerpc-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 5.9: - Opt us out of the DEBUG_VM_PGTABLE support for now as it's causing crashes. - Fix a long standing bug in our DMA mask handling that was hidden until recently, and which caused problems with some drivers. - Fix a boot failure on systems with large amounts of RAM, and no hugepage support and using Radix MMU, only seen in the lab. - A few other minor fixes. Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Gautham R. Shenoy, Hari Bathini, Ira Weiny, Nick Desaulniers, Shirisha Ganta, Vaibhav Jain, and Vaidyanathan Srinivasan" * tag 'powerpc-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute cpuidle: pseries: Fix CEDE latency conversion from tb to us powerpc/dma: Fix dma_map_ops::get_required_mask Revert "powerpc/build: vdso linker warning for orphan sections" powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc selftests/powerpc: Skip PROT_SAO test in guests/LPARS powerpc/book3s64/radix: Fix boot failure with large amount of guest memory	2020-09-18 11:48:25 -07:00
Cédric Le Goater	ebbfeef0d8	powerpc/32: Declare stack_overflow_exception() prototype This fixes a compile error with W=1. CC arch/powerpc/kernel/traps.o ../arch/powerpc/kernel/traps.c:1663:6: error: no previous prototype for ‘stack_overflow_exception’ [-Werror=missing-prototypes] void stack_overflow_exception(struct pt_regs *regs) ^~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `3978eb7851` ("powerpc/32: Add early stack overflow detection with VMAP stack.") Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914211007.2285999-8-clg@kaod.org	2020-09-18 20:05:25 +10:00
Cédric Le Goater	2228f19cf9	powerpc/xive: Make debug routines static This fixes a compile error with W=1. CC arch/powerpc/sysdev/xive/common.o ../arch/powerpc/sysdev/xive/common.c:1568:6: error: no previous prototype for ‘xive_debug_show_cpu’ [-Werror=missing-prototypes] void xive_debug_show_cpu(struct seq_file m, int cpu) ^~~~~~~~~~~~~~~~~~~ ../arch/powerpc/sysdev/xive/common.c:1602:6: error: no previous prototype for ‘xive_debug_show_irq’ [-Werror=missing-prototypes] void xive_debug_show_irq(struct seq_file m, u32 hw_irq, struct irq_data *d) ^~~~~~~~~~~~~~~~~~~ Fixes: `930914b7d5` ("powerpc/xive: Add a debugfs file to dump internal XIVE state") Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914211007.2285999-5-clg@kaod.org	2020-09-18 20:05:25 +10:00
Cédric Le Goater	5ab187e01a	powerpc/sstep: Remove empty if statement checking for invalid form The check should be performed by the caller. This fixes a compile error with W=1. ../arch/powerpc/lib/sstep.c: In function ‘mlsd_8lsd_ea’: ../arch/powerpc/lib/sstep.c:225:3: error: suggest braces around empty body in an ‘if’ statement [-Werror=empty-body] ; /* Invalid form. Should already be checked for by caller! */ ^ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914211007.2285999-4-clg@kaod.org	2020-09-18 20:05:24 +10:00
Cédric Le Goater	7b2aab5f22	powerpc/sysfs: Remove unused 'err' variable in sysfs_create_dscr_default() This fixes a compile error with W=1. arch/powerpc/kernel/sysfs.c: In function ‘sysfs_create_dscr_default’: arch/powerpc/kernel/sysfs.c:228:7: error: variable ‘err’ set but not used [-Werror=unused-but-set-variable] int err = 0; ^~~ cc1: all warnings being treated as errors Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914211007.2285999-2-clg@kaod.org	2020-09-18 20:05:24 +10:00
Qinglang Miao	8ec5cb12cd	powerpc/powernv: fix wrong warning message in opalcore_config_init() The logic of the warn output is incorrect. The two args should be exchanged. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200916062129.190864-1-miaoqinglang@huawei.com	2020-09-18 19:59:45 +10:00
Michael Ellerman	6c71cfcc01	powerpc/prom_init: Check display props exist before enabling btext It's possible to enable CONFIG_PPC_EARLY_DEBUG_BOOTX for a pseries kernel (maybe it shouldn't be), which is then booted with qemu/slof. But if you do that the kernel crashes in draw_byte(), with a DAR pointing somewhere near INT_MAX. Adding some debug to prom_init we see that we're not able to read the "address" property from OF, so we're just using whatever junk value was on the stack. So check the properties can be read properly from OF, if not we bail out before initialising btext, which avoids the crash. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Link: https://lore.kernel.org/r/20200821103407.3362149-1-mpe@ellerman.id.au	2020-09-18 19:59:44 +10:00
Michael Ellerman	39f8756145	powerpc/smp: Move ppc_md.cpu_die() to smp_ops.cpu_offline_self() We have smp_ops->cpu_die() and ppc_md.cpu_die(). One of them offlines the current CPU and one offlines another CPU, can you guess which is which? Also one is in smp_ops and one is in ppc_md? So rename ppc_md.cpu_die(), to cpu_offline_self(), because that's what it does. And move it into smp_ops where it belongs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200819015634.1974478-3-mpe@ellerman.id.au	2020-09-18 19:59:43 +10:00
Michael Ellerman	bf3c1464db	powerpc/smp: Fold cpu_die() into its only caller Avoid the eternal confusion between cpu_die() and __cpu_die() by removing the former, folding it into its only caller. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200819015634.1974478-2-mpe@ellerman.id.au	2020-09-18 19:59:43 +10:00
Michael Ellerman	1ea21ba231	powerpc: Move arch_cpu_idle_dead() into smp.c arch_cpu_idle_dead() is in idle.c, which makes sense, but it's inside a CONFIG_HOTPLUG_CPU block. It would be more at home in smp.c, inside the existing CONFIG_HOTPLUG_CPU block. Note that CONFIG_HOTPLUG_CPU depends on CONFIG_SMP so even though smp.c is not built for SMP=n builds, that's fine. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200819015634.1974478-1-mpe@ellerman.id.au	2020-09-18 19:59:43 +10:00
Michael Ellerman	d10ebe79df	powerpc/perf: Add declarations to fix sparse warnings Sparse warns about all the init functions: symbol init_ppc970_pmu was not declared. Should it be static? symbol init_power5p_pmu was not declared. Should it be static? symbol init_power5_pmu was not declared. Should it be static? symbol init_power6_pmu was not declared. Should it be static? symbol init_power7_pmu was not declared. Should it be static? symbol init_power9_pmu was not declared. Should it be static? symbol init_power8_pmu was not declared. Should it be static? symbol init_generic_compat_pmu was not declared. Should it be static? They're already declared in internal.h, so just make sure all the C files include that directly or indirectly. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://lore.kernel.org/r/20200916115637.3100484-2-mpe@ellerman.id.au	2020-09-18 19:59:43 +10:00
Michael Ellerman	ef1edbba52	powerpc/mm/64s: Fix slb_setup_new_exec() sparse warning Sparse says: symbol slb_setup_new_exec was not declared. Should it be static? No, it should have a declaration in a header, add one. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200916115637.3100484-1-mpe@ellerman.id.au	2020-09-18 19:59:43 +10:00
Liu Shixin	96543e7352	powerpc/pseries: convert to use DEFINE_SEQ_ATTRIBUTE macro Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200916025026.3992835-1-liushixin2@huawei.com	2020-09-18 19:59:43 +10:00
Yang Yingliang	bda7673d64	powerpc/book3s64: fix link error with CONFIG_PPC_RADIX_MMU=n Fix link error when CONFIG_PPC_RADIX_MMU is disabled: powerpc64-linux-gnu-ld: arch/powerpc/platforms/pseries/lpar.o:(.toc+0x0): undefined reference to `mmu_pid_bits' Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200917020643.90375-1-yangyingliang@huawei.com	2020-09-18 19:59:43 +10:00
Michael Ellerman	0b30191b27	Merge branch 'topic/irqs-off-activate-mm' into next Merge Nick's series to add ARCH_WANT_IRQS_OFF_ACTIVATE_MM.	2020-09-18 18:14:06 +10:00
Michael Ellerman	d208e13c6a	powerpc/process: Fix uninitialised variable error Clang, and GCC with -Wmaybe-uninitialized, can't see that val is unused in get_fpexec_mode(): arch/powerpc/kernel/process.c:1940:7: error: variable 'val' is used uninitialized whenever 'if' condition is true if (cpu_has_feature(CPU_FTR_SPE)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ We know that CPU_FTR_SPE will only be true iff CONFIG_SPE is also true, but the compiler doesn't. Avoid it by initialising val to zero. Reported-by: kernel test robot <lkp@intel.com> Fixes: `532ed1900d` ("powerpc/process: Remove useless #ifdef CONFIG_SPE") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20200917024509.3253837-1-mpe@ellerman.id.au	2020-09-18 18:12:46 +10:00
Christoph Hellwig	cc7886d25b	compat: lift compat_s64 and compat_u64 to <asm-generic/compat.h> lift the compat_s64 and compat_u64 definitions into common code using the COMPAT_FOR_U64_ALIGNMENT symbol for the x86 special case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-09-17 13:00:46 -04:00
Paul Mackerras	35dfb43c24	KVM: PPC: Book3S HV: Set LPCR[HDICE] before writing HDEC POWER8 and POWER9 machines have a hardware deviation where generation of a hypervisor decrementer exception is suppressed if the HDICE bit in the LPCR register is 0 at the time when the HDEC register decrements from 0 to -1. When entering a guest, KVM first writes the HDEC register with the time until it wants the CPU to exit the guest, and then writes the LPCR with the guest value, which includes HDICE = 1. If HDEC decrements from 0 to -1 during the interval between those two events, it is possible that we can enter the guest with HDEC already negative but no HDEC exception pending, meaning that no HDEC interrupt will occur while the CPU is in the guest, or at least not until HDEC wraps around. Thus it is possible for the CPU to keep executing in the guest for a long time; up to about 4 seconds on POWER8, or about 4.46 years on POWER9 (except that the host kernel hard lockup detector will fire first). To fix this, we set the LPCR[HDICE] bit before writing HDEC on guest entry. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-17 11:38:17 +10:00
Fabiano Rosas	05e6295dc7	KVM: PPC: Book3S HV: Do not allocate HPT for a nested guest The current nested KVM code does not support HPT guests. This is informed/enforced in some ways: - Hosts < P9 will not be able to enable the nested HV feature; - The nested hypervisor MMU capabilities will not contain KVM_CAP_PPC_MMU_HASH_V3; - QEMU reflects the MMU capabilities in the 'ibm,arch-vec-5-platform-support' device-tree property; - The nested guest, at 'prom_parse_mmu_model' ignores the 'disable_radix' kernel command line option if HPT is not supported; - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT. There is, however, still a way to start a HPT guest by using max-compat-cpu=power8 at the QEMU machine options. This leads to the guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB ioctl. With the guest set to hash, the nested hypervisor goes through the entry path that has no knowledge of nesting (kvmppc_run_vcpu) and crashes when it tries to execute an hypervisor-privileged (mtspr HDEC) instruction at __kvmppc_vcore_entry: root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ... <snip> [ 538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1 [ 538.543355] NIP: c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0 [ 538.543417] REGS: c0000013e91e33b0 TRAP: 0700 Not tainted (5.9.0-rc4) [ 538.543470] MSR: 8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 22422882 XER: 20040000 [ 538.543546] CFAR: c00800000753f4b0 IRQMASK: 3 GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000 GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72 GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0 GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008 GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8 GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000 GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001 GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001 [ 538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv] [ 538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv] [ 538.544173] Call Trace: [ 538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable) [ 538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv] [ 538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv] [ 538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm] [ 538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm] [ 538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm] [ 538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90 [ 538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0 [ 538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c [ 538.544787] Instruction dump: [ 538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002 [ 538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd [ 538.544953] ---[ end trace 74423e2b948c2e0c ]--- This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in the nested hypervisor, causing QEMU to abort. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-17 11:38:17 +10:00
Greg Kurz	4e1b2ab7e6	KVM: PPC: Don't return -ENOTSUPP to userspace in ioctls ENOTSUPP is a linux only thingy, the value of which is unknown to userspace, not to be confused with ENOTSUP which linux maps to EOPNOTSUPP, as permitted by POSIX [1]: [EOPNOTSUPP] Operation not supported on socket. The type of socket (address family or protocol) does not support the requested operation. A conforming implementation may assign the same values for [EOPNOTSUPP] and [ENOTSUP]. Return -EOPNOTSUPP instead of -ENOTSUPP for the following ioctls: - KVM_GET_FPU for Book3s and BookE - KVM_SET_FPU for Book3s and BookE - KVM_GET_DIRTY_LOG for BookE This doesn't affect QEMU which doesn't call the KVM_GET_FPU and KVM_SET_FPU ioctls on POWER anyway since they are not supported, and _buggily_ ignores anything but -EPERM for KVM_GET_DIRTY_LOG. [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-17 11:38:17 +10:00
Thomas Gleixner	077ee78e39	PCI/MSI: Make arch_._msi_irq[s] fallbacks selectable The arch_._msi_irq[s] fallbacks are compiled in whether an architecture requires them or not. Architectures which are fully utilizing hierarchical irq domains should never call into that code. It's not only architectures which depend on that by implementing one or more of the weak functions, there is also a bunch of drivers which relies on the weak functions which invoke msi_controller::setup_irq[s] and msi_controller::teardown_irq. Make the architectures and drivers which rely on them select them in Kconfig and if not selected replace them by stub functions which emit a warning and fail the PCI/MSI interrupt allocation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112333.992429909@linutronix.de	2020-09-16 16:52:37 +02:00
Srikar Dronamraju	fa35e868f9	powerpc/smp: Implement cpu_to_coregroup_id Lookup the coregroup id from the associativity array. If unable to detect the coregroup id, fallback on the core id. This way, ensure sched_domain degenerates and an extra sched domain is not created. Ideally this function should have been implemented in arch/powerpc/kernel/smp.c. However if its implemented in mm/numa.c, we don't need to find the primary domain again. If the device-tree mentions more than one coregroup, then kernel implements only the last or the smallest coregroup, which currently corresponds to the penultimate domain in the device-tree. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200810071834.92514-11-srikar@linux.vnet.ibm.com	2020-09-16 22:13:32 +10:00
Srikar Dronamraju	72730bfc2a	powerpc/smp: Create coregroup domain Add percpu coregroup maps and masks to create coregroup domain. If a coregroup doesn't exist, the coregroup domain will be degenerated in favour of SMT/CACHE domain. Do note this patch is only creating stubs for cpu_to_coregroup_id. The actual cpu_to_coregroup_id implementation would be in a subsequent patch. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200810071834.92514-10-srikar@linux.vnet.ibm.com	2020-09-16 22:13:32 +10:00
Srikar Dronamraju	6e08630281	powerpc/smp: Allocate cpumask only after searching thread group If allocated earlier and the search fails, then cpu_l1_cache_map cpumask is unnecessarily cleared. However cpu_l1_cache_map can be allocated / cleared after we search thread group. Please note CONFIG_CPUMASK_OFFSTACK is not set on Powerpc. Hence cpumask allocated by zalloc_cpumask_var_node is never freed. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200810071834.92514-9-srikar@linux.vnet.ibm.com	2020-09-16 22:13:32 +10:00
Srikar Dronamraju	f9f130ff2e	powerpc/numa: Detect support for coregroup Add support for grouping cores based on the device-tree classification. - The last domain in the associativity domains always refers to the core. - If primary reference domain happens to be the penultimate domain in the associativity domains device-tree property, then there are no coregroups. However if its not a penultimate domain, then there are coregroups. There can be more than one coregroup. For now we would be interested in the last or the smallest coregroups, i.e one sub-group per DIE. Currently there are no firmwares that are exposing this grouping. Hence allow the basis for grouping to be abstract. Once the firmware starts using this grouping, code would be added to detect the type of grouping and adjust the sd domain flags accordingly. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200810071834.92514-8-srikar@linux.vnet.ibm.com	2020-09-16 22:13:31 +10:00
Srikar Dronamraju	caa8e29da5	powerpc/smp: Optimize start_secondary In start_secondary, even if shared_cache was already set, system does a redundant match for cpumask. This redundant check can be removed by checking if shared_cache is already set. While here, localize the sibling_mask variable to within the if condition. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200810071834.92514-7-srikar@linux.vnet.ibm.com	2020-09-16 22:13:31 +10:00
Srikar Dronamraju	f6606cfdfb	powerpc/smp: Dont assume l2-cache to be superset of sibling Current code assumes that cpumask of cpus sharing a l2-cache mask will always be a superset of cpu_sibling_mask. Lets stop that assumption. cpu_l2_cache_mask is a superset of cpu_sibling_mask if and only if shared_caches is set. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200913171038.GB11808@linux.vnet.ibm.com	2020-09-16 22:13:31 +10:00
Srikar Dronamraju	3c6032a8fe	powerpc/smp: Move topology fixups into a new function Move topology fixup based on the platform attributes into its own function which is called just before set_sched_topology. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200810071834.92514-5-srikar@linux.vnet.ibm.com	2020-09-16 22:13:31 +10:00
Srikar Dronamraju	5e93f16ae4	powerpc/smp: Move powerpc_topology above Just moving the powerpc_topology description above. This will help in using functions in this file and avoid declarations. No other functional changes Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200810071834.92514-4-srikar@linux.vnet.ibm.com	2020-09-16 22:13:31 +10:00
Srikar Dronamraju	2ef0ca54d9	powerpc/smp: Merge Power9 topology with Power topology A new sched_domain_topology_level was added just for Power9. However the same can be achieved by merging powerpc_topology with power9_topology and makes the code more simpler especially when adding a new sched domain. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200810071834.92514-3-srikar@linux.vnet.ibm.com	2020-09-16 22:13:31 +10:00
Srikar Dronamraju	d0fd24bbd2	powerpc/smp: Fix a warning under !NEED_MULTIPLE_NODES Fix a build warning in a non CONFIG_NEED_MULTIPLE_NODES "error: _numa_cpu_lookup_table_ undeclared" Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200810071834.92514-2-srikar@linux.vnet.ibm.com	2020-09-16 22:13:30 +10:00
Srikar Dronamraju	e75130f20b	powerpc/numa: Offline memoryless cpuless node 0 Currently Linux kernel with CONFIG_NUMA on a system with multiple possible nodes, marks node 0 as online at boot. However in practice, there are systems which have node 0 as memoryless and cpuless. This can cause numa_balancing to be enabled on systems with only one node with memory and CPUs. The existence of this dummy node which is cpuless and memoryless node can confuse users/scripts looking at output of lscpu / numactl. By marking, node 0 as offline, lets stop assuming that node 0 is always online. If node 0 has CPU or memory that are online, node 0 will again be set as online. v5.8 available: 2 nodes (0,2) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31490 MB node distances: node 0 2 0: 10 20 2: 20 10 proc and sys files ------------------ /sys/devices/system/node/online: 0,2 /proc/sys/kernel/numa_balancing: 1 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory: 2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 v5.8 + patch ------------------ available: 1 nodes (2) node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31487 MB node distances: node 2 2: 10 proc and sys files ------------------ /sys/devices/system/node/online: 2 /proc/sys/kernel/numa_balancing: 0 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory: 2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 Example of a node with online CPUs/memory on node 0. (Same o/p with and without patch) numactl -H available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 node 0 size: 32482 MB node 0 free: 22994 MB node 1 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 node 1 size: 0 MB node 1 free: 0 MB node 2 cpus: 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 node 2 size: 0 MB node 2 free: 0 MB node 3 cpus: 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 node 3 size: 0 MB node 3 free: 0 MB node distances: node 0 1 2 3 0: 10 20 40 40 1: 20 10 40 40 2: 40 40 10 20 3: 40 40 20 10 Note: On Powerpc, cpu_to_node of possible but not present cpus would previously return 0. Hence this commit depends on commit ("powerpc/numa: Set numa_node for all possible cpus") and commit ("powerpc/numa: Prefer node id queried from vphn"). Without the 2 commits, Powerpc system might crash. 1. User space applications like Numactl, lscpu, that parse the sysfs tend to believe there is an extra online node. This tends to confuse users and applications. Other user space applications start believing that system was not able to use all the resources (i.e missing resources) or the system was not setup correctly. 2. Also existence of dummy node also leads to inconsistent information. The number of online nodes is inconsistent with the information in the device-tree and resource-dump 3. When the dummy node is present, single node non-Numa systems end up showing up as NUMA systems and numa_balancing gets enabled. This will mean we take the hit from the unnecessary numa hinting faults. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200818081104.57888-4-srikar@linux.vnet.ibm.com	2020-09-16 22:05:20 +10:00
Srikar Dronamraju	6398eaa268	powerpc/numa: Prefer node id queried from vphn Node id queried from the static device tree may not be correct. For example: it may always show 0 on a shared processor. Hence prefer the node id queried from vphn and fallback on the device tree based node id if vphn query fails. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200818081104.57888-3-srikar@linux.vnet.ibm.com	2020-09-16 22:05:19 +10:00
Srikar Dronamraju	a874f1005e	powerpc/numa: Set numa_node for all possible cpus A Powerpc system with multiple possible nodes and with CONFIG_NUMA enabled always used to have a node 0, even if node 0 does not any cpus or memory attached to it. As per PAPR, node affinity of a cpu is only available once its present / online. For all cpus that are possible but not present, cpu_to_node() would point to node 0. To ensure a cpuless, memoryless dummy node is not online, powerpc need to make sure all possible but not present cpu_to_node are set to a proper node. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200818081104.57888-2-srikar@linux.vnet.ibm.com	2020-09-16 22:05:19 +10:00
Srikar Dronamraju	67df77845c	powerpc/numa: Restrict possible nodes based on platform As per draft LoPAPR (Revision 2.9_pre7), section B.5.3 "Run Time Abstraction Services (RTAS) Node" available at: https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf ... there are 2 device tree properties: "ibm,max-associativity-domains" which defines the maximum number of domains that the firmware i.e PowerVM can support. and: "ibm,current-associativity-domains" which defines the maximum number of domains that the current platform can support. The value of "ibm,max-associativity-domains" is always greater than or equal to "ibm,current-associativity-domains" property. If the latter property is not available, use "ibm,max-associativity-domain" as a fallback. In this yet to be released LoPAPR, "ibm,current-associativity-domains" is mentioned in page 833 / B.5.3 which is covered under under "Appendix B. System Binding" section Currently powerpc uses the "ibm,max-associativity-domains" property while setting the possible number of nodes. This is currently set at 32. However the possible number of nodes for a platform may be significantly less. Hence set the possible number of nodes based on "ibm,current-associativity-domains" property. Nathan Lynch had raised a valid concern that post LPM (Live Partition Migration), a user could DLPAR add processors and memory after LPM with "new" associativity properties: https://lore.kernel.org/linuxppc-dev/871rljfet9.fsf@linux.ibm.com/t/#u He also pointed out that "ibm,max-associativity-domains" has the same contents on all currently available PowerVM systems, unlike "ibm,current-associativity-domains" and hence may be better able to handle the new NUMA associativity properties. However with the recent commit `dbce456280` ("powerpc/numa: Limit possible nodes to within num_possible_nodes"), all new NUMA associativity properties are capped to initially set nr_node_ids. Hence this commit should be safe with any new DLPAR add post LPM. $ lsprop /proc/device-tree/rtas/ibm,associ-domains /proc/device-tree/rtas/ibm,current-associativity-domains 00000005 00000001 00000002 00000002 00000002 00000010 /proc/device-tree/rtas/ibm,max-associativity-domains 00000005 00000001 00000008 00000020 00000020 00000100 $ cat /sys/devices/system/node/possible ##Before patch 0-31 $ cat /sys/devices/system/node/possible ##After patch 0-1 Note the maximum nodes this platform can support is only 2 but the possible nodes is set to 32. This is important because lot of kernel and user space code allocate structures for all possible nodes leading to a lot of memory that is allocated but not used. I ran a simple experiment to create and destroy 100 memory cgroups on boot on a 8 node machine (Power8 Alpine). Before patch: free -k at boot total used free shared buff/cache available Mem: 523498176 4106816 518820608 22272 570752 516606720 Swap: 4194240 0 4194240 free -k after creating 100 memory cgroups total used free shared buff/cache available Mem: 523498176 4628416 518246464 22336 623296 516058688 Swap: 4194240 0 4194240 free -k after destroying 100 memory cgroups total used free shared buff/cache available Mem: 523498176 4697408 518173760 22400 627008 515987904 Swap: 4194240 0 4194240 After patch: free -k at boot total used free shared buff/cache available Mem: 523498176 3969472 518933888 22272 594816 516731776 Swap: 4194240 0 4194240 free -k after creating 100 memory cgroups total used free shared buff/cache available Mem: 523498176 4181888 518676096 22208 640192 516496448 Swap: 4194240 0 4194240 free -k after destroying 100 memory cgroups total used free shared buff/cache available Mem: 523498176 4232320 518619904 22272 645952 516443264 Swap: 4194240 0 4194240 Observations: Fixed kernel takes 137344 kb (4106816-3969472) less to boot. Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 memcgs. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> [mpe: Reformat change log a bit for readability] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200817055257.110873-1-srikar@linux.vnet.ibm.com	2020-09-16 22:05:19 +10:00
Srikar Dronamraju	f3232321db	powerpc/topology: Override cpu_smt_mask On Power9, a pair of SMT4 cores can be presented by the firmware as a SMT8 core for backward compatibility reasons, with the fusion of two SMT4 cores. Powerpc allows LPARs to be live migrated from Power8 to Power9. Existing software developed/configured for Power8, expects to see a SMT8 core. In order to maintain userspace backward compatibility (with Power8 chips in case of Power9) in enterprise Linux systems, the topology_sibling_cpumask has to be set to SMT8 core. cpu_smt_mask() should generally point to the cpu mask of the SMT4 core. Hence override the default cpu_smt_mask() to be powerpc specific allowing for better scheduling behaviour on Power. schbench (latency measured in usecs, so lesser is better) Without patch With patch Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 34 50.0000th: 38 75.0000th: 47 75.0000th: 52 90.0000th: 54 90.0000th: 60 95.0000th: 57 95.0000th: 64 99.0000th: 62 99.0000th: 72 99.5000th: 65 99.5000th: 75 99.9000th: 76 99.9000th: 3452 min=0, max=9205 min=0, max=9344 schbench (With Cede disabled) Without patch With patch Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 20 50.0000th: 21 75.0000th: 28 75.0000th: 29 90.0000th: 33 90.0000th: 34 95.0000th: 35 95.0000th: 37 99.0000th: 40 99.0000th: 40 99.5000th: 48 99.5000th: 42 99.9000th: 94 99.9000th: 79 min=0, max=791 min=0, max=791 perf bench sched pipe usec/ops : lesser is better Without patch N Min Max Median Avg Stddev 101 5.095113 5.595269 5.204842 5.2298776 0.10762713 5.10 - 5.15 : ################################################## 23% (24) 5.15 - 5.20 : ############################################# 21% (22) 5.20 - 5.25 : ################################################## 23% (24) 5.25 - 5.30 : ######################### 11% (12) 5.30 - 5.35 : ########## 4% (5) 5.35 - 5.40 : ######## 3% (4) 5.40 - 5.45 : ######## 3% (4) 5.45 - 5.50 : #### 1% (2) 5.50 - 5.55 : ## 0% (1) 5.55 - 5.60 : #### 1% (2) With patch N Min Max Median Avg Stddev 101 5.134675 8.524719 5.207658 5.2780985 0.34911969 5.1 - 5.5 : ################################################## 94% (95) 5.5 - 5.8 : ## 3% (4) 5.8 - 6.2 : 0% (1) 6.2 - 6.5 : 6.5 - 6.8 : 6.8 - 7.2 : 7.2 - 7.5 : 7.5 - 7.8 : 7.8 - 8.2 : 8.2 - 8.5 : perf bench sched pipe (cede disabled) usec/ops : lesser is better Without patch N Min Max Median Avg Stddev 101 7.884227 12.576538 7.956474 8.0170722 0.46159054 7.9 - 8.4 : ################################################## 99% (100) 8.4 - 8.8 : 8.8 - 9.3 : 9.3 - 9.8 : 9.8 - 10.2 : 10.2 - 10.7 : 10.7 - 11.2 : 11.2 - 11.6 : 11.6 - 12.1 : 12.1 - 12.6 : With patch N Min Max Median Avg Stddev 101 7.956021 8.217284 8.015615 8.0283866 0.049844967 7.96 - 7.98 : ###################### 12% (13) 7.98 - 8.01 : ################################################## 28% (29) 8.01 - 8.03 : #################################### 20% (21) 8.03 - 8.06 : ######################### 14% (15) 8.06 - 8.09 : ###################### 12% (13) 8.09 - 8.11 : ###### 3% (4) 8.11 - 8.14 : ### 1% (2) 8.14 - 8.17 : ### 1% (2) 8.17 - 8.19 : 8.19 - 8.22 : # 0% (1) Observations: With the patch, the initial run/iteration takes a slight longer time. This can be attributed to the fact that now we pick a CPU from a idle core which could be sleep mode. Once we remove the cede, state the numbers improve in favour of the patch. ebizzy: transactions per second (higher is better) without patch N Min Max Median Avg Stddev 100 1018433 1304470 1193208 1182315.7 60018.733 1018433 - 1047037 : ###### 3% (3) 1047037 - 1075640 : ######## 4% (4) 1075640 - 1104244 : ######## 4% (4) 1104244 - 1132848 : ############### 7% (7) 1132848 - 1161452 : #################################### 17% (17) 1161452 - 1190055 : ########################## 12% (12) 1190055 - 1218659 : ############################################# 21% (21) 1218659 - 1247263 : ################################################## 23% (23) 1247263 - 1275866 : ######## 4% (4) 1275866 - 1304470 : ######## 4% (4) with patch N Min Max Median Avg Stddev 100 967014 1292938 1208819 1185281.8 69815.851 967014 - 999606 : ## 1% (1) 999606 - 1032199 : ## 1% (1) 1032199 - 1064791 : ############ 6% (6) 1064791 - 1097384 : ########## 5% (5) 1097384 - 1129976 : ################## 9% (9) 1129976 - 1162568 : #################### 10% (10) 1162568 - 1195161 : ########################## 13% (13) 1195161 - 1227753 : ############################################ 22% (22) 1227753 - 1260346 : ################################################## 25% (25) 1260346 - 1292938 : ############## 7% (7) Observations: Not much changes, ebizzy is not much impacted. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200807074517.27957-2-srikar@linux.vnet.ibm.com	2020-09-16 22:05:19 +10:00
Vaibhav Jain	ca78ef2f08	powerpc/papr_scm: Fix warning triggered by perf_stats_show() A warning is reported by the kernel in case perf_stats_show() returns an error code. The warning is of the form below: papr_scm ibm,persistent-memory:ibm,pmemory@44100001: Failed to query performance stats, Err:-10 dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count On investigation it looks like that the compiler is silently truncating the return value of drc_pmem_query_stats() from 'long' to 'int', since the variable used to store the return code 'rc' is an 'int'. This truncated value is then returned back as a 'ssize_t' back from perf_stats_show() to 'dev_attr_show()' which thinks of it as a large unsigned number and triggers this warning.. To fix this we update the type of variable 'rc' from 'int' to 'ssize_t' that prevents the compiler from truncating the return value of drc_pmem_query_stats() and returning correct signed value back from perf_stats_show(). Fixes: `2d02bf835e` ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP") Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200912081451.66225-1-vaibhav@linux.ibm.com	2020-09-16 22:05:03 +10:00
Nicholas Piggin	a665eec0a2	powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm Commit `0cef77c779` ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") added a mechanism to trim the mm_cpumask of a process under certain conditions. One of the assumptions is that mm_users would not be incremented via a reference outside the process context with mmget_not_zero() then go on to kthread_use_mm() via that reference. That invariant was broken by io_uring code (see previous sparc64 fix), but I'll point Fixes: to the original powerpc commit because we are changing that assumption going forward, so this will make backports match up. Fix this by no longer relying on that assumption, but by having each CPU check the mm is not being used, and clearing their own bit from the mask only if it hasn't been switched-to by the time the IPI is processed. This relies on commit `38cf307c1f` ("mm: fix kthread_use_mm() vs TLB invalidate") and ARCH_WANT_IRQS_OFF_ACTIVATE_MM to disable irqs over mm switch sequences. Fixes: `0cef77c779` ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Depends-on: `38cf307c1f` ("mm: fix kthread_use_mm() vs TLB invalidate") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914045219.3736466-5-npiggin@gmail.com	2020-09-16 12:24:37 +10:00
Nicholas Piggin	66acd46080	powerpc: select ARCH_WANT_IRQS_OFF_ACTIVATE_MM powerpc uses IPIs in some situations to switch a kernel thread away from a lazy tlb mm, which is subject to the TLB flushing race described in the changelog introducing ARCH_WANT_IRQS_OFF_ACTIVATE_MM. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914045219.3736466-3-npiggin@gmail.com	2020-09-16 12:24:37 +10:00
Cédric Le Goater	3a3181e16f	powerpc/pci: unmap legacy INTx interrupts when a PHB is removed When a passthrough IO adapter is removed from a pseries machine using hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS to clear all page table entries related to the adapter. If some are still present, the RTAS call which isolates the PCI slot returns error 9001 "valid outstanding translations" and the removal of the IO adapter fails. This is because when the PHBs are scanned, Linux maps automatically the INTx interrupts in the Linux interrupt number space but these are never removed. To solve this problem, we introduce a PPC platform specific pcibios_remove_bus() routine which clears all interrupt mappings when the bus is removed. This also clears the associated page table entries of the ESB pages when using XIVE. For this purpose, we record the logical interrupt numbers of the mapped interrupt under the PHB structure and let pcibios_remove_bus() do the clean up. Since some PCI adapters, like GPUs, use the "interrupt-map" property to describe interrupt mappings other than the legacy INTx interrupts, we can not restrict the size of the mapping array to PCI_NUM_INTX. The number of interrupt mappings is computed from the "interrupt-map" property and the mapping array is allocated accordingly. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200807101854.844619-1-clg@kaod.org	2020-09-15 22:13:39 +10:00
Nicholas Piggin	ffd2961bb4	powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10 This driver does not restore stop > 3 state, so it limits itself to states which do not lose full state or TB. The POWER10 SPRs are sufficiently different from P9 that it seems easier to split out the P10 code. The POWER10 deep sleep code (e.g., the BHRB restore) has been taken out, but it can be re-added when stop > 3 support is added. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Pratik Rajesh Sampat<psampat@linux.ibm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Reviewed-by: Pratik Rajesh Sampat<psampat@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200819094700.493399-1-npiggin@gmail.com	2020-09-15 22:13:38 +10:00
Aneesh Kumar K.V	79b123cdf9	powerepc/book3s64/hash: Align start/end address correctly with bolt mapping This ensures we don't do a partial mapping of memory. With nvdimm, when creating namespaces with size not aligned to 16MB, the kernel ends up partially mapping the pages. This can result in kernel adding multiple hash page table entries for the same range. A new namespace will result in create_section_mapping() with start and end overlapping an already existing bolted hash page table entry. commit: `6acd7d5ef2` ("libnvdimm/namespace: Enforce memremap_compat_align()") made sure that we always create namespaces aligned to 16MB. But we can do better by avoiding mapping pages that are not aligned. This helps to catch access to these partially mapped pages early. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200907072539.67310-1-aneesh.kumar@linux.ibm.com	2020-09-15 22:13:38 +10:00
Jason Yan	bbc4f40b53	powerpc/ps3: make two symbols static This addresses the following sparse warning: arch/powerpc/platforms/ps3/spu.c:451:33: warning: symbol 'spu_management_ps3_ops' was not declared. Should it be static? arch/powerpc/platforms/ps3/spu.c:592:28: warning: symbol 'spu_priv1_ps3_ops' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200911020121.1464585-1-yanaijie@huawei.com	2020-09-15 22:13:38 +10:00
Christophe Leroy	4c42dc5c69	powerpc/kasan: Fix CONFIG_KASAN_VMALLOC for 8xx Before the commit identified below, pages tables allocation was performed after the allocation of final shadow area for linear memory. But that commit switched the order, leading to page tables being already allocated at the time 8xx kasan_init_shadow_8M() is called. Due to this, kasan_init_shadow_8M() doesn't map the needed shadow entries because there are already page tables. kasan_init_shadow_8M() installs huge PMD entries instead of page tables. We could at that time free the page tables, but there is no point in creating page tables that get freed before being used. Only book3s/32 hash needs early allocation of page tables. For other variants, we can keep the initial order and create remaining page tables after the allocation of final shadow memory for linear mem. Move back the allocation of shadow page tables for CONFIG_KASAN_VMALLOC into kasan_init() after the loop which creates final shadow memory for linear mem. Fixes: `41ea93cf7b` ("powerpc/kasan: Fix shadow pages allocation failure") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8ae4554357da4882612644a74387ae05525b2aaa.1599800716.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:37 +10:00
Christophe Leroy	2c637d2df4	powerpc/powermac: Fix low_sleep_handler with KUAP and KUEP low_sleep_handler() has an hardcoded restore of segment registers that doesn't take KUAP and KUEP into account. Use head_32's load_segment_registers() routine instead. Fixes: `a68c31fc01` ("powerpc/32s: Implement Kernel Userspace Access Protection") Fixes: `31ed2b13c4` ("powerpc/32s: Implement Kernel Userspace Execution Prevention.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/21b05f7298c1b18f73e6e5b4cd5005aafa24b6da.1599820109.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:37 +10:00
Christophe Leroy	c83c192a6f	powerpc/process: Remove useless #ifdef CONFIG_PPC_FPU Add a stub for __giveup_fpu() when CONFIG_PPC_FPU is not selected, as done for CONFIG_SPE and CONFIG_ALTIVEC. This allows to remove some #ifdef CONFIG_PPC_FPU. Also change one to IS_ENABLED(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/69c8b7954ceeccc6b849e52e1fa41b3a0f10f6c1.1597643221.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:36 +10:00
Christophe Leroy	532ed1900d	powerpc/process: Remove useless #ifdef CONFIG_SPE cpu_has_feature(CPU_FTR_SPE) returns false when CONFIG_SPE is not set. There is no need to enclose the test in an #ifdef CONFIG_SPE. Remove it. CPU_FTR_SPE only exists on 32 bits. Define it as 0 on 64 bits. We have a couple of places like: #ifdef CONFIG_SPE if (cpu_has_feature(CPU_FTR_SPE)) { do_something_that_requires_CONFIG_SPE } else { return -EINVAL; } #else return -EINVAL; #endif Replace them by a cleaner version: if (cpu_has_feature(CPU_FTR_SPE)) { #ifdef CONFIG_SPE do_something_that_requires_CONFIG_SPE #endif } else { return -EINVAL; } When CONFIG_SPE is not set, this resolves to an unconditional return of -EINVAL Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/698df8387555765b70ea42e4a7fa48141c309c1f.1597643221.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:36 +10:00
Christophe Leroy	e3667ee427	powerpc/process: Remove useless #ifdef CONFIG_ALTIVEC cpu_has_feature(CPU_FTR_ALTIVEC) returns false when CONFIG_ALTIVEC is not set. There is no need to enclose the test in an #ifdef CONFIG_ALTIVEC. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/03ba6b52344ca7c336df2bc6e3d31d736c804ae2.1597643221.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:36 +10:00
Christophe Leroy	80739c2bd2	powerpc/process: Remove useless #ifdef CONFIG_VSX cpu_has_feature(CPU_FTR_VSX) returns false when CONFIG_VSX is not set. There is no need to enclose the test in an #ifdef CONFIG_VSX. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0eb61cf0dc66d781d47deb2228498cd61d03a754.1597643221.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:35 +10:00
Christophe Leroy	60d62bfd24	powerpc/process: Tag an #endif to help locate the matching #ifdef. That #endif is more than 100 lines after the matching #ifdef, and there are several #ifdef/#else/#endif inbetween. Tag it as /* CONFIG_PPC_BOOK3S_64 */ to help locate the matching #ifdef. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3612a8f8aaca16de3fc414a7e66293319d6e213c.1597643147.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:35 +10:00
Christophe Leroy	8f020c7ca3	powerpc/process: Replace #ifdef CONFIG_KALLSYMS by IS_ENABLED() The #ifdef CONFIG_KALLSYMS encloses some printk which can compile in all cases. Replace by IS_ENABLED(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2d89732a9062b2cf2651728804e4b8f6c9b9358e.1597643164.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:34 +10:00
Christophe Leroy	2ec42996f5	powerpc/process: Replace an #if defined(CONFIG_4xx) \|\| defined(CONFIG_BOOKE) by IS_ENABLED() The #if defined(CONFIG_4xx) \|\| defined(CONFIG_BOOKE) encloses some printk which can be compiled in all cases. Replace by IS_ENABLED(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a1b6ef3d657c8f249193442f56868fc358ea5b6c.1597643160.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:34 +10:00
Christophe Leroy	bfac279930	powerpc/process: Replace an #ifdef CONFIG_PPC_BOOK3S_64 by IS_ENABLED() This #ifdef CONFIG_PPC_BOOK3S_64 calls preload_new_slb_context() when radix is not enabled. radix_enabled() is always defined, and the prototype for preload_new_slb_context() is always present, so the #ifdef is unneeded. Replace it by IS_ENABLED(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d31506ca9bac9def68cf7424eded63fdc4fb6660.1597643167.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:34 +10:00
Christophe Leroy	04d476bfbb	powerpc/process: Replace an #ifdef CONFIG_PPC_47x by IS_ENABLED() isync() is always defined, no need for an #ifdef. Replace it by IS_ENABLED(CONFIG_PPC_47x). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ac8da0e3baa91dda805e1e492fd65aecd90c1fb5.1597643156.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:33 +10:00
Christophe Leroy	da7bb43ab9	powerpc/32: Fix vmap stack - Properly set r1 before activating MMU We need r1 to be properly set before activating MMU, otherwise any new exception taken while saving registers into the stack in exception prologs will use the user stack, which is wrong and will even lockup or crash when KUAP is selected. Do that by switching the meaning of r11 and r1 until we have saved r1 to the stack: copy r1 into r11 and setup the new stack pointer in r1. To avoid complicating and impacting all generic and specific prolog code (and more), copy back r1 into r11 once r11 is save onto the stack. We could get rid of copying r1 back and forth at the cost of rewriting everything to use r1 instead of r11 all the way when CONFIG_VMAP_STACK is set, but the effort is probably not worth it. Fixes: `028474876f` ("powerpc/32: prepare for CONFIG_VMAP_STACK") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8f85e8752ac5af602db7237ef53d634f4f3d3892.1599486108.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:33 +10:00
Christophe Leroy	c118c7303a	powerpc/32: Fix vmap stack - Do not activate MMU before reading task struct We need r1 to be properly set before activating MMU, so reading task_struct->stack must be done with MMU off. This means we need an additional register to play with MSR bits while r11 now points to the stack. For that, move r10 back to CR (As is already done for hash MMU) and use r10. We still don't have r1 correct yet when we activate MMU. It is done in following patch. Fixes: `028474876f` ("powerpc/32: prepare for CONFIG_VMAP_STACK") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a027d447022a006c9c4958ac734128e577a3c5c1.1599486108.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:33 +10:00
Christophe Leroy	7fdf966bed	powerpc/uaccess: Remove __put_user_asm() and __put_user_asm2() __put_user_asm() and __put_user_asm2() are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d66c4a372738d2fbd81f433ca86e4295871ace6a.1599216721.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:32 +10:00
Christophe Leroy	e64ac41ab0	powerpc/uaccess: Switch __patch_instruction() to __put_user_asm_goto() __patch_instruction() is the only user of __put_user_asm() outside of asm/uaccess.h Switch to the new __put_user_asm_goto() to enable retirement of __put_user_asm() in a later patch. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b9745b122f4a9ae72cef445c61320022ab8b77b7.1599216721.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:32 +10:00
Christophe Leroy	ee0a49a687	powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto() __put_user_asm_goto() provides more flexibility to GCC and avoids using a local variable to tell if the write succeeded or not. GCC can then avoid implementing a cmp in the fast path. See the difference for a small function like the PPC64 version of save_general_regs() in arch/powerpc/kernel/signal_32.c: Before the patch (unreachable nop removed): 0000000000000c10 <.save_general_regs>: c10: 39 20 00 2c li r9,44 c14: 39 40 00 00 li r10,0 c18: 7d 29 03 a6 mtctr r9 c1c: 38 c0 00 00 li r6,0 c20: 48 00 00 14 b c34 <.save_general_regs+0x24> c30: 42 40 00 40 bdz c70 <.save_general_regs+0x60> c34: 28 2a 00 27 cmpldi r10,39 c38: 7c c8 33 78 mr r8,r6 c3c: 79 47 1f 24 rldicr r7,r10,3,60 c40: 39 20 00 01 li r9,1 c44: 41 82 00 0c beq c50 <.save_general_regs+0x40> c48: 7d 23 38 2a ldx r9,r3,r7 c4c: 79 29 00 20 clrldi r9,r9,32 c50: 91 24 00 00 stw r9,0(r4) c54: 2c 28 00 00 cmpdi r8,0 c58: 39 4a 00 01 addi r10,r10,1 c5c: 38 84 00 04 addi r4,r4,4 c60: 41 82 ff d0 beq c30 <.save_general_regs+0x20> c64: 38 60 ff f2 li r3,-14 c68: 4e 80 00 20 blr c70: 38 60 00 00 li r3,0 c74: 4e 80 00 20 blr 0000000000000000 <.fixup>: cc: 39 00 ff f2 li r8,-14 d0: 48 00 00 00 b d0 <.fixup+0xd0> d0: R_PPC64_REL24 .text+0xc54 After the patch: 0000000000001490 <.save_general_regs>: 1490: 39 20 00 2c li r9,44 1494: 39 40 00 00 li r10,0 1498: 7d 29 03 a6 mtctr r9 149c: 60 00 00 00 nop 14a0: 28 2a 00 27 cmpldi r10,39 14a4: 79 48 1f 24 rldicr r8,r10,3,60 14a8: 39 20 00 01 li r9,1 14ac: 41 82 00 0c beq 14b8 <.save_general_regs+0x28> 14b0: 7d 23 40 2a ldx r9,r3,r8 14b4: 79 29 00 20 clrldi r9,r9,32 14b8: 91 24 00 00 stw r9,0(r4) 14bc: 39 4a 00 01 addi r10,r10,1 14c0: 38 84 00 04 addi r4,r4,4 14c4: 42 00 ff dc bdnz 14a0 <.save_general_regs+0x10> 14c8: 38 60 00 00 li r3,0 14cc: 4e 80 00 20 blr 14d0: 38 60 ff f2 li r3,-14 14d4: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/94ba5a5138f99522e1562dbcdb38d31aa790dc89.1599216721.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:32 +10:00
Christophe Leroy	fcf1f26895	powerpc/uaccess: Add pre-update addressing to __put_user_asm_goto() Enable pre-update addressing mode in __put_user_asm_goto() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/346f65d677adb11865f7762c25a1ca3c64404ba5.1599216023.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:31 +10:00
Christophe Leroy	e47168f3d1	powerpc/8xx: Support 16k hugepages with 4k pages The 8xx has 4 page sizes: 4k, 16k, 512k and 8M 4k and 16k can be selected at build time as standard page sizes, and 512k and 8M are hugepages. When 4k standard pages are selected, 16k pages are not available. Allow 16k pages as hugepages when 4k pages are used. To allow that, implement arch_make_huge_pte() which receives the necessary arguments to allow setting the PTE in accordance with the page size: - 512 k pages must have _PAGE_HUGE and _PAGE_SPS. They are set by pte_mkhuge(). arch_make_huge_pte() does nothing. - 16 k pages must have only _PAGE_SPS. arch_make_huge_pte() clears _PAGE_HUGE. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a518abc29266a708dfbccc8fce9ae6694fe4c2c6.1598862623.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:31 +10:00
Christophe Leroy	175a999915	powerpc/8xx: Refactor calculation of number of entries per PTE in page tables On 8xx, the number of entries occupied by a PTE in the page tables depends on the size of the page. At the time being, this calculation is done in two places: in pte_update() and in set_huge_pte_at() Refactor this calculation into a helper called number_of_cells_per_pte(). For the time being, the val param is unused. It will be used by following patch. Instead of opencoding is_hugepd(), use hugepd_ok() with a forward declaration. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f6ea2483c2c389567b007945948f704d18cfaeea.1598862623.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:31 +10:00
Christophe Leroy	542db12a9c	powerpc: Fix random segfault when freeing hugetlb range The following random segfault is observed from time to time with map_hugetlb selftest: root@localhost:~# ./map_hugetlb 1 19 524288 kB hugepages Mapping 1 Mbytes Segmentation fault [ 31.219972] map_hugetlb[365]: segfault (11) at 117 nip 77974f8c lr 779a6834 code 1 in ld-2.23.so[77966000+21000] [ 31.220192] map_hugetlb[365]: code: 9421ffc0 480318d1 93410028 90010044 9361002c 93810030 93a10034 93c10038 [ 31.220307] map_hugetlb[365]: code: 93e1003c 93210024 8123007c 81430038 <80e90004> 814a0004 7f443a14 813a0004 [ 31.221911] BUG: Bad rss-counter state mm:(ptrval) type:MM_FILEPAGES val:33 [ 31.229362] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:5 This fault is due to hugetlb_free_pgd_range() freeing page tables that are also used by regular pages. As explain in the comment at the beginning of hugetlb_free_pgd_range(), the verification done in free_pgd_range() on floor and ceiling is not done here, which means hugetlb_free_pte_range() can free outside the expected range. As the verification cannot be done in hugetlb_free_pgd_range(), it must be done in hugetlb_free_pte_range(). Fixes: `b250c8c08c` ("powerpc/8xx: Manage 512k huge pages as standard pages.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f0cb2a5477cd87d1eaadb128042e20aeb2bc2859.1598860677.git.christophe.leroy@csgroup.eu	2020-09-15 22:13:30 +10:00
Finn Thain	e63d6fb563	powerpc/tau: Disable TAU between measurements Enabling CONFIG_TAU_INT causes random crashes: Unrecoverable exception 1700 at c0009414 (msr=1000) Oops: Unrecoverable exception, sig: 6 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac-00043-gd5f545e1a8593 #5 NIP: c0009414 LR: c0009414 CTR: c00116fc REGS: c0799eb8 TRAP: 1700 Not tainted (5.7.0-pmac-00043-gd5f545e1a8593) MSR: 00001000 <ME> CR: 22000228 XER: 00000100 GPR00: 00000000 c0799f70 c076e300 00800000 0291c0ac 00e00000 c076e300 00049032 GPR08: 00000001 c00116fc 00000000 dfbd3200 ffffffff 007f80a8 00000000 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c075ce04 GPR24: c075ce04 dfff8880 c07b0000 c075ce04 00080000 00000001 c079ef98 c079ef5c NIP [c0009414] arch_cpu_idle+0x24/0x6c LR [c0009414] arch_cpu_idle+0x24/0x6c Call Trace: [c0799f70] [00000001] 0x1 (unreliable) [c0799f80] [c0060990] do_idle+0xd8/0x17c [c0799fa0] [c0060ba4] cpu_startup_entry+0x20/0x28 [c0799fb0] [c072d220] start_kernel+0x434/0x44c [c0799ff0] [00003860] 0x3860 Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX 3d20c07b XXXXXXXX XXXXXXXX XXXXXXXX 7c0802a6 XXXXXXXX XXXXXXXX XXXXXXXX 4e800421 XXXXXXXX XXXXXXXX XXXXXXXX 7d2000a6 ---[ end trace 3a0c9b5cb216db6b ]--- Resolve this problem by disabling each THRMn comparator when handling the associated THRMn interrupt and by disabling the TAU entirely when updating THRMn thresholds. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5a0ba3dc5612c7aac596727331284a3676c08472.1599260540.git.fthain@telegraphics.com.au	2020-09-15 22:13:30 +10:00
Finn Thain	5e3119e15f	powerpc/tau: Check processor type before enabling TAU interrupt According to Freescale's documentation, MPC74XX processors have an erratum that prevents the TAU interrupt from working, so don't try to use it when running on those processors. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c281611544768e758bd58fe812cf702a5bd2d042.1599260540.git.fthain@telegraphics.com.au	2020-09-15 22:13:28 +10:00
Finn Thain	420ab2bc75	powerpc/tau: Remove duplicated set_thresholds() call The commentary at the call site seems to disagree with the code. The conditional prevents calling set_thresholds() via the exception handler, which appears to crash. Perhaps that's because it immediately triggers another TAU exception. Anyway, calling set_thresholds() from TAUupdate() is redundant because tau_timeout() does so. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d7c7ee33232cf72a6a6bbb6ef05838b2e2b113c0.1599260540.git.fthain@telegraphics.com.au	2020-09-15 22:13:27 +10:00
Finn Thain	b1c6a0a10b	powerpc/tau: Convert from timer to workqueue Since commit `19dbdcb803` ("smp: Warn on function calls from softirq context") the Thermal Assist Unit driver causes a warning like the following when CONFIG_SMP is enabled. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/smp.c:428 smp_call_function_many_cond+0xf4/0x38c Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac #3 NIP: c00b37a8 LR: c00b3abc CTR: c001218c REGS: c0799c60 TRAP: 0700 Not tainted (5.7.0-pmac) MSR: 00029032 <EE,ME,IR,DR,RI> CR: 42000224 XER: 00000000 GPR00: c00b3abc c0799d18 c076e300 c079ef5c c0011fec 00000000 00000000 00000000 GPR08: 00000100 00000100 00008000 ffffffff 42000224 00000000 c079d040 c079d044 GPR16: 00000001 00000000 00000004 c0799da0 c079f054 c07a0000 c07a0000 00000000 GPR24: c0011fec 00000000 c079ef5c c079ef5c 00000000 00000000 00000000 00000000 NIP [c00b37a8] smp_call_function_many_cond+0xf4/0x38c LR [c00b3abc] on_each_cpu+0x38/0x68 Call Trace: [c0799d18] [ffffffff] 0xffffffff (unreliable) [c0799d68] [c00b3abc] on_each_cpu+0x38/0x68 [c0799d88] [c0096704] call_timer_fn.isra.26+0x20/0x7c [c0799d98] [c0096b40] run_timer_softirq+0x1d4/0x3fc [c0799df8] [c05b4368] __do_softirq+0x118/0x240 [c0799e58] [c0039c44] irq_exit+0xc4/0xcc [c0799e68] [c000ade8] timer_interrupt+0x1b0/0x230 [c0799ea8] [c0013520] ret_from_except+0x0/0x14 --- interrupt: 901 at arch_cpu_idle+0x24/0x6c LR = arch_cpu_idle+0x24/0x6c [c0799f70] [00000001] 0x1 (unreliable) [c0799f80] [c0060990] do_idle+0xd8/0x17c [c0799fa0] [c0060ba8] cpu_startup_entry+0x24/0x28 [c0799fb0] [c072d220] start_kernel+0x434/0x44c [c0799ff0] [00003860] 0x3860 Instruction dump: 8129f204 2f890000 40beff98 3d20c07a 8929eec4 2f890000 40beff88 0fe00000 81220000 552805de 550802ef 4182ff84 <0fe00000> 3860ffff 7f65db78 7f44d378 ---[ end trace 34a886e47819c2eb ]--- Don't call on_each_cpu() from a timer callback, call it from a worker thread instead. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bb61650bea4f4c91fb8e24b9a6f130a1438651a7.1599260540.git.fthain@telegraphics.com.au	2020-09-15 22:13:26 +10:00
Finn Thain	66943005cc	powerpc/tau: Use appropriate temperature sample interval According to the MPC750 Users Manual, the SITV value in Thermal Management Register 3 is 13 bits long. The present code calculates the SITV value as 60 * 500 cycles. This would overflow to give 10 us on a 500 MHz CPU rather than the intended 60 us. (But according to the Microprocessor Datasheet, there is also a factor of 266 that has to be applied to this value on certain parts i.e. speed sort above 266 MHz.) Always use the maximum cycle count, as recommended by the Datasheet. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/896f542e5f0f1d6cf8218524c2b67d79f3d69b3c.1599260540.git.fthain@telegraphics.com.au	2020-09-15 22:13:24 +10:00
Aneesh Kumar K.V	b32d5d7e92	powerpc/mm/book3s: Split radix and hash MAX_PHYSMEM limit MAX_PHYSMEM #define is used along with sparsemem to determine the SECTION_SHIFT value. Powerpc also uses the same value to limit the max memory enabled on the system. With 4K PAGE_SIZE and hash translation mode, we want to limit the max memory enabled to 64TB due to page table size restrictions. However, with radix translation, we don't have these restrictions. Hence split the radix and hash MA_PHYSMEM limit and use different limit for each of them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200608070904.387440-4-aneesh.kumar@linux.ibm.com	2020-09-15 22:13:22 +10:00
Aneesh Kumar K.V	7746406baa	powerpc/book3s64/hash/4k: Support large linear mapping range with 4K With commit: `0034d395f8` ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range"), we now split the 64TB address range into 4 contexts each of 16TB. That implies we can do only 16TB linear mapping. On some systems, eg. Power9, memory attached to nodes > 0 will appear above 16TB in the linear mapping. This resulted in kernel crash when we boot such systems in hash translation mode with 4K PAGE_SIZE. This patch updates the kernel mapping such that we now start supporting upto 61TB of memory with 4K. The kernel mapping now looks like below 4K PAGE_SIZE and hash translation. vmalloc start = 0xc0003d0000000000 IO start = 0xc0003e0000000000 vmemmap start = 0xc0003f0000000000 Our MAX_PHYSMEM_BITS for 4K is still 64TB even though we can only map 61TB. We prevent bolt mapping anything outside 61TB range by checking against H_VMALLOC_START. Fixes: `0034d395f8` ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") Reported-by: Cameron Berkenpas <cam@neo-zeon.de> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200608070904.387440-3-aneesh.kumar@linux.ibm.com	2020-09-15 22:13:22 +10:00
Aneesh Kumar K.V	eb553f1697	powerpc/64/mm: implement page mapping percpu first chunk allocator Implement page mapping percpu first chunk allocator as a fallback to the embedding allocator. With 4K hash translation we limit our page table range to 64TB and commit: `0034d395f8` ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") moved all kernel mapping to that 64TB range. In-order to support sparse memory layout we need to increase our linear mapping space and reduce other mappings. With such a layout percpu embedded first chunk allocator will fail because of small vmalloc range. Add a fallback to page mapping percpu first chunk allocator for such failures. The below dmesg output can be observed in such case. percpu: max_distance=0x1ffffef00000 too large for vmalloc space 0x10000000000 PERCPU: auto allocator failed (-22), falling back to page size percpu: 40 4K pages/cpu s148816 r0 d15024 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200608070904.387440-2-aneesh.kumar@linux.ibm.com	2020-09-15 22:13:22 +10:00
Aneesh Kumar K.V	2a32abac88	powerpc/percpu: Update percpu bootmem allocator This update the ppc64 version to be closer to x86/sparc. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200608070904.387440-1-aneesh.kumar@linux.ibm.com	2020-09-15 22:13:21 +10:00
Ravi Bangoria	fa725cc53d	powerpc/watchpoint/ptrace: Introduce PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 can be used to determine whether we are running on an ISA 3.1 compliant machine. Which is needed to determine DAR behaviour, 512 byte boundary limit etc. This was requested by Pedro Miraglia Franco de Carvalho for extending watchpoint features in gdb. Note that availability of 2nd DAWR is independent of this flag and should be checked using ppc_debug_info->num_data_bps. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902042945.129369-8-ravi.bangoria@linux.ibm.com	2020-09-15 22:13:20 +10:00
Ravi Bangoria	58da5984d2	powerpc/watchpoint: Add hw_len wherever missing There are couple of places where we set len but not hw_len. For ptrace/perf watchpoints, when CONFIG_HAVE_HW_BREAKPOINT=Y, hw_len will be calculated and set internally while parsing watchpoint. But when CONFIG_HAVE_HW_BREAKPOINT=N, we need to manually set 'hw_len'. Similarly for xmon as well, hw_len needs to be set directly. Fixes: `b57aeab811` ("powerpc/watchpoint: Fix length calculation for unaligned target") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902042945.129369-7-ravi.bangoria@linux.ibm.com	2020-09-15 22:13:20 +10:00
Ravi Bangoria	5b905d7798	powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N On powerpc, ptrace watchpoint works in one-shot mode. i.e. kernel disables event every time it fires and user has to re-enable it. Also, in case of ptrace watchpoint, kernel notifies ptrace user before executing instruction. With CONFIG_HAVE_HW_BREAKPOINT=N, kernel is missing to disable ptrace event and thus it's causing infinite loop of exceptions. This is especially harmful when user watches on a data which is also read/written by kernel, eg syscall parameters. In such case, infinite exceptions happens in kernel mode which causes soft-lockup. Fixes: `9422de3e95` ("powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers") Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@linux.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902042945.129369-6-ravi.bangoria@linux.ibm.com	2020-09-15 22:13:20 +10:00
Ravi Bangoria	edc8dd99b2	powerpc/watchpoint: Move DAWR detection logic outside of hw_breakpoint.c Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused the exception. So we have a sw logic to detect that in hw_breakpoint.c. But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y. Move DAWR detection logic outside of hw_breakpoint.c so that it can be reused when CONFIG_HAVE_HW_BREAKPOINT is not set. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902042945.129369-5-ravi.bangoria@linux.ibm.com	2020-09-15 22:13:19 +10:00
Ravi Bangoria	9b6b7c680c	powerpc/watchpoint/ptrace: Fix SETHWDEBUG when CONFIG_HAVE_HW_BREAKPOINT=N When kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, user can still create watchpoint using PPC_PTRACE_SETHWDEBUG, with limited functionalities. But, such watchpoints are never firing because of the missing privilege settings. Fix that. It's safe to set HW_BRK_TYPE_PRIV_ALL because we don't really leak any kernel address in signal info. Setting HW_BRK_TYPE_PRIV_ALL will also help to find scenarios when kernel accesses user memory. Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@linux.ibm.com> Suggested-by: Pedro Miraglia Franco de Carvalho <pedromfc@linux.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902042945.129369-4-ravi.bangoria@linux.ibm.com	2020-09-15 22:13:18 +10:00
Ravi Bangoria	4441eb0233	powerpc/watchpoint: Fix handling of vector instructions Vector load/store instructions are special because they are always aligned. Thus unaligned EA needs to be aligned down before comparing it with watch ranges. Otherwise we might consider valid event as invalid. Fixes: `74c6881019` ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902042945.129369-3-ravi.bangoria@linux.ibm.com	2020-09-15 22:13:18 +10:00
Ravi Bangoria	4759c11ed2	powerpc/watchpoint: Fix quadword instruction handling on p10 predecessors On p10 predecessors, watchpoint with quadword access is compared at quadword length. If the watch range is doubleword or less than that in a first half of quadword aligned 16 bytes, and if there is any unaligned quadword access which will access only the 2nd half, the handler should consider it as extraneous and emulate/single-step it before continuing. Fixes: `74c6881019` ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@linux.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902042945.129369-2-ravi.bangoria@linux.ibm.com	2020-09-15 22:12:25 +10:00
Linus Torvalds	973c096f6a	vgacon: remove software scrollback support Yunhai Zhang recently fixed a VGA software scrollback bug in commit `ebfdfeeae8` ("vgacon: Fix for missing check in scrollback handling"), but that then made people look more closely at some of this code, and there were more problems on the vgacon side, but also the fbcon software scrollback. We don't really have anybody who maintains this code - probably because nobody actually _uses_ it any more. Sure, people still use both VGA and the framebuffer consoles, but they are no longer the main user interfaces to the kernel, and haven't been for decades, so these kinds of extra features end up bitrotting and not really being used. So rather than try to maintain a likely unused set of code, I'll just aggressively remove it, and see if anybody even notices. Maybe there are people who haven't jumped on the whole GUI badnwagon yet, and think it's just a fad. And maybe those people use the scrollback code. If that turns out to be the case, we can resurrect this again, once we've found the sucker^Wmaintainer for it who actually uses it. Reported-by: NopNop Nop <nopitydays@gmail.com> Tested-by: Willy Tarreau <w@1wt.eu> Cc: 张云海 <zhangyunhai@nsfocus.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Willy Tarreau <w@1wt.eu> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-14 10:06:15 -07:00
Thiago Jung Bauermann	eae9eec476	powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory POWER secure guests (i.e., guests which use the Protected Execution Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but they don't need the SWIOTLB memory to be in low addresses since the hypervisor doesn't have any addressing limitation. This solves a SWIOTLB initialization problem we are seeing in secure guests with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved memory, which leaves no space for SWIOTLB in low addresses. To do this, we use mostly the same code as swiotlb_init(), but allocate the buffer using memblock_alloc() instead of memblock_alloc_low(). Fixes: `2efbc58f15` ("powerpc/pseries/svm: Force SWIOTLB for secure guests") Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200818221126.391073-1-bauerman@linux.ibm.com	2020-09-14 23:07:14 +10:00
Michael Ellerman	231b232df8	powerpc/64: Make VDSO32 track COMPAT on 64-bit When we added the VDSO32 kconfig symbol, which controls building of the 32-bit VDSO, we made it depend on CPU_BIG_ENDIAN (for 64-bit). That was because back then COMPAT was always enabled for 64-bit, so depending on it would have left the 32-bit VDSO always enabled, which we didn't want. But since then we have made COMPAT selectable, and off by default for ppc64le, so VDSO32 should really depend on that. For most people this makes no difference, none of the defconfigs change, it's only if someone is building ppc64le with COMPAT=y, they will now also get VDSO32. If they've enabled COMPAT in order to run 32-bit binaries they presumably also want the 32-bit VDSO. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/20200908125850.407939-1-mpe@ellerman.id.au	2020-09-14 23:07:04 +10:00
Michael Ellerman	960e370813	Merge branch 'fixes' into next Bring in our fixes branch for this cycle which avoids some small conflicts with upcoming commits.	2020-09-14 22:57:18 +10:00
Christoph Hellwig	5ceda74093	dma-direct: rename and cleanup __phys_to_dma The __phys_to_dma vs phys_to_dma distinction isn't exactly obvious. Try to improve the situation by renaming __phys_to_dma to phys_to_dma_unencryped, and not forcing architectures that want to override phys_to_dma to actually provide __phys_to_dma. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>	2020-09-11 09:14:43 +02:00
Christoph Hellwig	7bc5c428a6	dma-direct: remove __dma_to_phys There is no harm in just always clearing the SME encryption bit, while significantly simplifying the interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>	2020-09-11 09:14:25 +02:00
Vaibhav Jain	0460534b53	powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute The newly introduced 'perf_stats' attribute uses the default access mode of 0444, allowing non-root users to access performance stats of an nvdimm and potentially force the kernel into issuing a large number of expensive hypercalls. Since the information exposed by this attribute cannot be cached it is better to ward off access to this attribute from users who don't need to access to these performance statistics. Hence update the access mode of 'perf_stats' attribute to be only readable by root users. Fixes: `2d02bf835e` ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP") Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200907110540.21349-1-vaibhav@linux.ibm.com	2020-09-09 14:44:38 +10:00
Christoph Hellwig	5ae4998b5d	powerpc: remove address space overrides using set_fs() Stop providing the possibility to override the address space using set_fs() now that there is no need for that any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-09-08 22:21:37 -04:00
Christoph Hellwig	c331652534	powerpc: use non-set_fs based maccess routines Provide __get_kernel_nofault and __put_kernel_nofault routines to implement the maccess routines without messing with set_fs and without opening up access to user space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-09-08 22:21:36 -04:00
Christoph Hellwig	5e6e9852d6	uaccess: add infrastructure for kernel builds with set_fs() Add a CONFIG_SET_FS option that is selected by architecturess that implement set_fs, which is all of them initially. If the option is not set stubs for routines related to overriding the address space are provided so that architectures can start to opt out of providing set_fs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-09-08 22:21:32 -04:00
Nicholas Piggin	dc462267d2	powerpc/64s: handle ISA v3.1 local copy-paste context switches The ISA v3.1 the copy-paste facility has a new memory move functionality which allows the copy buffer to be pasted to domestic memory (RAM) as opposed to foreign memory (accelerator). This means the POWER9 trick of avoiding the cp_abort on context switch if the process had not mapped foreign memory does not work on POWER10. Do the cp_abort unconditionally there. KVM must also cp_abort on guest exit to prevent copy buffer state leaking between contexts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200825075535.224536-1-npiggin@gmail.com	2020-09-08 22:57:12 +10:00
Joel Stanley	a02f6d4235	powerpc: Warn about use of smt_snooze_delay It's not done anything for a long time. Save the percpu variable, and emit a warning to remind users to not expect it to do anything. This uses pr_warn_once instead of pr_warn_ratelimit as testing 'ppc64_cpu --smt=off' on a 24 core / 4 SMT system showed the warning to be noisy, as the online/offline loop is slow. Fixes: `3fa8cad82b` ("powerpc/pseries/cpuidle: smt-snooze-delay cleanup.") Cc: stable@vger.kernel.org # v3.14 Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902000012.3440389-1-joel@jms.id.au	2020-09-08 22:57:12 +10:00
Joel Stanley	8f55984f53	powerpc/powernv: Print helpful message when cores guarded Often the firmware will guard out cores after a crash. This often undesirable, and is not immediately noticeable. This adds an informative message when a CPU device tree nodes are marked bad in the device tree. Signed-off-by: Joel Stanley <joel@jms.id.au> [mpe: Use an eye-catcher that's less likely to get us in trouble] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190801051630.5804-1-joel@jms.id.au	2020-09-08 22:57:11 +10:00
Leonardo Bras	8c0d51592f	powerpc/pseries/iommu: Allow bigger 64bit window by removing default DMA window On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the default DMA window for the device, before attempting to configure a DDW, in order to make the maximum resources available for the next DDW to be created. This is a requirement for using DDW on devices in which hypervisor allows only one DMA window. If setting up a new DDW fails anywhere after the removal of this default DMA window, it's needed to restore the default DMA window. For this, an implementation of ibm,reset-pe-dma-windows rtas call is needed: Platforms supporting the DDW option starting with LoPAR level 2.7 implement ibm,ddw-extensions. The first extension available (index 2) carries the token for ibm,reset-pe-dma-windows rtas call, which is used to restore the default DMA window for a device, if it has been deleted. It does so by resetting the TCE table allocation for the PE to it's boot time value, available in "ibm,dma-window" device tree node. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Tested-by: David Dai <zdai@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200805030455.123024-5-leobras.c@gmail.com	2020-09-08 22:57:11 +10:00
Leonardo Bras	74d0b3994e	powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window Move the window-removing part of remove_ddw into a new function (remove_dma_window), so it can be used to remove other DMA windows. It's useful for removing DMA windows that don't create DIRECT64_PROPNAME property, like the default DMA window from the device, which uses "ibm,dma-window". Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Tested-by: David Dai <zdai@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200805030455.123024-4-leobras.c@gmail.com	2020-09-08 22:57:11 +10:00
Leonardo Bras	80f0251231	powerpc/pseries/iommu: Update call to ibm, query-pe-dma-windows >From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of outputs from "ibm,query-pe-dma-windows" go from 5 to 6. This change of output size is meant to expand the address size of largest_available_block PE TCE from 32-bit to 64-bit, which ends up shifting page_size and migration_capable. This ends up requiring the update of ddw_query_response->largest_available_block from u32 to u64, and manually assigning the values from the buffer into this struct, according to output size. Also, a routine was created for helping reading the ddw extensions as suggested by LoPAR: First reading the size of the extension array from index 0, checking if the property exists, and then returning it's value. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Tested-by: David Dai <zdai@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200805030455.123024-3-leobras.c@gmail.com	2020-09-08 22:57:11 +10:00
Leonardo Bras	cac3e62908	powerpc/pseries/iommu: Create defines for operations in ibm, ddw-applicable Create defines to help handling ibm,ddw-applicable values, avoiding confusion about the index of given operations. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Tested-by: David Dai <zdai@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200805030455.123024-2-leobras.c@gmail.com	2020-09-08 22:57:11 +10:00
Russell Currey	0fb4871bcc	powerpc/tools: Remove 90 line limit in checkpatch script As of commit `bdc48fa11e`, scripts/checkpatch.pl now has a default line length warning of 100 characters. The powerpc wrapper script was using a length of 90 instead of 80 in order to make checkpatch less restrictive, but now it's making it more restrictive instead. I think it makes sense to just use the default value now. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200828020542.393022-1-ruscur@russell.cc	2020-09-08 22:57:11 +10:00
Jordan Niethe	364b236a0b	powerpc/boot: Update Makefile comment for 64bit wrapper As of commit `147c05168f` ("powerpc/boot: Add support for 64bit little endian wrapper") the comment in the Makefile is misleading. The wrapper packaging 64bit kernel may built as a 32 or 64 bit elf. Update the comment to reflect this. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200825035147.3239-1-jniethe5@gmail.com	2020-09-08 22:24:19 +10:00
Michael Ellerman	529d2bd56a	powerpc/64: Remove unused generic_secondary_thread_init() The last caller was removed in 2014 in commit `fb5a515704` ("powerpc: Remove platforms/wsp and associated pieces"). As Jordan noticed even though there are no callers, the code above in fsl_secondary_thread_init() falls through into generic_secondary_thread_init(). So we can remove the _GLOBAL but not the body of the function. However because fsl_secondary_thread_init() is inside #ifdef CONFIG_PPC_BOOK3E, we can never reach the body of generic_secondary_thread_init() unless CONFIG_PPC_BOOK3E is enabled, so we can wrap the whole thing in a single #ifdef. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200819015704.1976364-1-mpe@ellerman.id.au	2020-09-08 22:24:17 +10:00
Oliver O'Halloran	10bf59d923	powerpc/pseries/eeh: Fix dumb linebreaks These annoy me every time I see them. Why are they here? They're not even needed for 80cols compliance. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200818044557.135497-1-oohall@gmail.com	2020-09-08 22:23:42 +10:00
Christophe Leroy	353bce211e	powerpc/process: Remove unnecessary #ifdef CONFIG_FUNCTION_GRAPH_TRACER ftrace_graph_ret_addr() is always defined and returns 'ip' when CONFIG_FUNCTION GRAPH_TRACER is not set. So the #ifdef is not needed, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9d11143d4e27ba8274369a926968756917584868.1597643153.git.christophe.leroy@csgroup.eu	2020-09-08 22:23:42 +10:00
Christophe Leroy	2f279eeb68	powerpc/uaccess: Add pre-update addressing to __get_user_asm() and __put_user_asm() Enable pre-update addressing mode in __get_user_asm() and __put_user_asm() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/13041c7df39e89ddf574ea0cdc6dedfdd9734140.1597235091.git.christophe.leroy@csgroup.eu	2020-09-08 22:23:22 +10:00
Masami Hiramatsu	b6c5a58dd8	powerpc: kprobes: Use generic kretprobe trampoline handler Use the generic kretprobe trampoline handler. Don't use framepointer verification. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/159870610825.1229682.2090635992093223399.stgit@devnote2	2020-09-08 11:52:34 +02:00
Alexey Kardashevskiy	437ef802e0	powerpc/dma: Fix dma_map_ops::get_required_mask There are 2 problems with it: 1. "<" vs expected "<<" 2. the shift number is an IOMMU page number mask, not an address mask as the IOMMU page shift is missing. This did not hit us before `f1565c24b5` ("powerpc: use the generic dma_ops_bypass mode") because we had additional code to handle bypass mask so this chunk (almost?) never executed.However there were reports that aacraid does not work with "iommu=nobypass". After `f1565c24b5`, aacraid (and probably others which call dma_get_required_mask() before setting the mask) was unable to enable 64bit DMA and fall back to using IOMMU which was known not to work, one of the problems is double free of an IOMMU page. This fixes DMA for aacraid, both with and without "iommu=nobypass" in the kernel command line. Verified with "stress-ng -d 4". Fixes: `6a5c7be5e4` ("powerpc: Override dma_get_required_mask by platform hook and ops") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200908015106.79661-1-aik@ozlabs.ru	2020-09-08 14:20:55 +10:00
Masahiro Yamada	887af6d7c9	arch: vdso: add vdso linker script to 'targets' instead of extra-y The vdso linker script is preprocessed on demand. Adding it to 'targets' is enough to include the .cmd file. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Greentime Hu <green.hu@gmail.com>	2020-09-07 21:41:27 +09:00
Herbert Xu	d08d387b73	crypto: powerpc/crc-vpmsum_test - Fix sparse endianness warning This patch fixes a sparse endianness warning by changing crc32 to __le32 instead of u32: CHECK ../arch/powerpc/crypto/crc-vpmsum_test.c ../arch/powerpc/crypto/crc-vpmsum_test.c:102:39: warning: cast from restricted __le32 Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-09-04 17:57:15 +10:00
Nicolin Chen	1e9d90dbed	dma-mapping: introduce dma_get_seg_boundary_nr_pages() We found that callers of dma_get_seg_boundary mostly do an ALIGN with page mask and then do a page shift to get number of pages: ALIGN(boundary + 1, 1 << shift) >> shift However, the boundary might be as large as ULONG_MAX, which means that a device has no specific boundary limit. So either "+ 1" or passing it to ALIGN() would potentially overflow. According to kernel defines: #define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask)) #define ALIGN(x, a) ALIGN_MASK(x, (typeof(x))(a) - 1) We can simplify the logic here into a helper function doing: ALIGN(boundary + 1, 1 << shift) >> shift = ALIGN_MASK(b + 1, (1 << s) - 1) >> s = {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s = [b + 1 + (1 << s) - 1] >> s = [b + (1 << s)] >> s = (b >> s) + 1 This patch introduces and applies dma_get_seg_boundary_nr_pages() as an overflow-free helper for the dma_get_seg_boundary() callers to get numbers of pages. It also takes care of the NULL dev case for non-DMA API callers. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-09-03 18:12:15 +02:00
Michael Ellerman	4c62285439	Revert "powerpc/build: vdso linker warning for orphan sections" This reverts commit `f2af201002`. It added a usage of cc-ldoption, but cc-ldoption was removed in commit `055efab312` ("kbuild: drop support for cc-ldoption"). Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2020-09-03 15:42:26 +10:00
Greg Kurz	5706d14d2a	KVM: PPC: Book3S HV: XICS: Replace the 'destroy' method by a 'release' method Similarly to what was done with XICS-on-XIVE and XIVE native KVM devices with commit `5422e95103` ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method"), convert the historical XICS KVM device to implement the 'release' method. This is needed to run nested guests with an in-kernel IRQ chip. A typical POWER9 guest can select XICS or XIVE during boot, which requires to be able to destroy and to re-create the KVM device. Only the historical XICS KVM device is available under pseries at the current time and it still uses the legacy 'destroy' method. Switching to 'release' means that vCPUs might still be running when the device is destroyed. In order to avoid potential use-after-free, the kvmppc_xics structure is allocated on first usage and kept around until the VM exits. The same pointer is used each time a KVM XICS device is being created, but this is okay since we only have one per VM. Clear the ICP of each vCPU with vcpu->mutex held. This ensures that the next time the vCPU resumes execution, it won't be going into the XICS code anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2020-09-03 14:12:48 +10:00
Aneesh Kumar K.V	675bceb097	powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc The test is broken w.r.t page table update rules and results in kernel crash as below. Disable the support until we get the tests updated. [ 21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304! cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0] pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380 lr: c0000000005eeeec: pte_update+0x11c/0x190 sp: c000000c6d1e7950 msr: 8000000002029033 current = 0xc000000c6d172c80 paca = 0xc000000003ba0000 irqmask: 0x03 irq_happened: 0x01 pid = 1, comm = swapper/0 kernel BUG at arch/powerpc/mm/pgtable.c:304! [link register ] c0000000005eeeec pte_update+0x11c/0x190 [c000000c6d1e7950] 0000000000000001 (unreliable) [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190 [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8 [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338 [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0 [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4 [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160 [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c With DEBUG_VM disabled [ 20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000 [ 20.530183] Faulting instruction address: 0xc0000000000df330 cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700] pc: c0000000000df330: memset+0x68/0x104 lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0 sp: c000000c6d19f990 msr: 8000000002009033 dar: 0 current = 0xc000000c6d177480 paca = 0xc00000001ec4f400 irqmask: 0x03 irq_happened: 0x01 pid = 1, comm = swapper/0 [link register ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0 [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable) [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378 [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244 [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0 [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4 [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160 [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c 33:mon> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902040122.136414-1-aneesh.kumar@linux.ibm.com	2020-09-02 22:14:26 +10:00

1 2 3 4 5 ...

22530 Commits