linux

Author	SHA1	Message	Date
Ram Pai	256ba2c168	powerpc/pseries/svm: Unshare all pages before kexecing a new kernel A new kernel deserves a clean slate. Any pages shared with the hypervisor is unshared before invoking the new kernel. However there are exceptions. If the new kernel is invoked to dump the current kernel, or if there is a explicit request to preserve the state of the current kernel, unsharing of pages is skipped. NOTE: While testing crashkernel, make sure at least 256M is reserved for crashkernel. Otherwise SWIOTLB allocation will fail and crash kernel will fail to boot. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-11-bauerman@linux.ibm.com	2019-08-30 09:55:40 +10:00
Anshuman Khandual	d5394c059d	powerpc/pseries/svm: Use shared memory for Debug Trace Log (DTL) Secure guests need to share the DTL buffers with the hypervisor. To that end, use a kmem_cache constructor which converts the underlying buddy allocated SLUB cache pages into shared memory. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-10-bauerman@linux.ibm.com	2019-08-30 09:55:40 +10:00
Anshuman Khandual	bd104e6db6	powerpc/pseries/svm: Use shared memory for LPPACA structures LPPACA structures need to be shared with the host. Hence they need to be in shared memory. Instead of allocating individual chunks of memory for a given structure from memblock, a contiguous chunk of memory is allocated and then converted into shared memory. Subsequent allocation requests will come from the contiguous chunk which will be always shared memory for all structures. While we are able to use a kmem_cache constructor for the Debug Trace Log, LPPACAs are allocated very early in the boot process (before SLUB is available) so we need to use a simpler scheme here. Introduce helper is_svm_platform() which uses the S bit of the MSR to tell whether we're running as a secure guest. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-9-bauerman@linux.ibm.com	2019-08-30 09:55:40 +10:00
Sukadev Bhattiprolu	7f70c3815a	powerpc: Introduce the MSR_S bit Protected Execution Facility (PEF) is an architectural change for POWER 9 that enables Secure Virtual Machines (SVMs). When enabled, PEF adds a new higher privileged mode, called Ultravisor mode, to POWER architecture. The hardware changes include the following: * There is a new bit in the MSR that determines whether the current process is running in secure mode, MSR(S) bit 41. MSR(S)=1, process is in secure mode, MSR(s)=0 process is in normal mode. * The MSR(S) bit can only be set by the Ultravisor. * HRFID cannot be used to set the MSR(S) bit. If the hypervisor needs to return to a SVM it must use an ultracall. It can determine if the VM it is returning to is secure. * The privilege of a process is now determined by three MSR bits, MSR(S, HV, PR). In each of the tables below the modes are listed from least privilege to highest privilege. The higher privilege modes can access all the resources of the lower privilege modes. Secure Mode MSR Settings +---+---+---+---------------+ \| S \| HV\| PR\|Privilege \| +===+===+===+===============+ \| 1 \| 0 \| 1 \| Problem \| +---+---+---+---------------+ \| 1 \| 0 \| 0 \| Privileged(OS)\| +---+---+---+---------------+ \| 1 \| 1 \| 0 \| Ultravisor \| +---+---+---+---------------+ \| 1 \| 1 \| 1 \| Reserved \| +---+---+---+---------------+ Normal Mode MSR Settings +---+---+---+---------------+ \| S \| HV\| PR\|Privilege \| +===+===+===+===============+ \| 0 \| 0 \| 1 \| Problem \| +---+---+---+---------------+ \| 0 \| 0 \| 0 \| Privileged(OS)\| +---+---+---+---------------+ \| 0 \| 1 \| 0 \| Hypervisor \| +---+---+---+---------------+ \| 0 \| 1 \| 1 \| Problem (HV) \| +---+---+---+---------------+ Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> [ cclaudio: Update the commit message ] Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-7-bauerman@linux.ibm.com	2019-08-30 09:55:40 +10:00
Ram Pai	f7777e008c	powerpc/pseries/svm: Add helpers for UV_SHARE_PAGE and UV_UNSHARE_PAGE These functions are used when the guest wants to grant the hypervisor access to certain pages. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-6-bauerman@linux.ibm.com	2019-08-30 09:55:40 +10:00
Ram Pai	6a9c930bd7	powerpc/prom_init: Add the ESM call to prom_init Make the Enter-Secure-Mode (ESM) ultravisor call to switch the VM to secure mode. Pass kernel base address and FDT address so that the Ultravisor is able to verify the integrity of the VM using information from the ESM blob. Add "svm=" command line option to turn on switching to secure mode. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [ andmike: Generate an RTAS os-term hcall when the ESM ucall fails. ] Signed-off-by: Michael Anderson <andmike@linux.ibm.com> [ bauerman: Cleaned up the code a bit. ] Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-5-bauerman@linux.ibm.com	2019-08-30 09:54:35 +10:00
Thiago Jung Bauermann	136bc0397a	powerpc/pseries: Introduce option to build secure virtual machines Introduce CONFIG_PPC_SVM to control support for secure guests and include Ultravisor-related helpers when it is selected Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-3-bauerman@linux.ibm.com	2019-08-30 09:53:28 +10:00
Michael Ellerman	9044adca78	Merge branch 'topic/ppc-kvm' into next Merge our ppc-kvm topic branch to bring in the Ultravisor support patches.	2019-08-30 09:52:57 +10:00
Sukadev Bhattiprolu	6c85b7bc63	powerpc/kvm: Use UV_RETURN ucall to return to ultravisor When an SVM makes an hypercall or incurs some other exception, the Ultravisor usually forwards (a.k.a. reflects) the exceptions to the Hypervisor. After processing the exception, Hypervisor uses the UV_RETURN ultracall to return control back to the SVM. The expected register state on entry to this ultracall is: * Non-volatile registers are restored to their original values. * If returning from an hypercall, register R0 contains the return value (unlike other ultracalls) and, registers R4 through R12 contain any output values of the hypercall. * R3 contains the ultracall number, i.e UV_RETURN. * If returning with a synthesized interrupt, R2 contains the synthesized interrupt number. Thanks to input from Paul Mackerras, Ram Pai and Mike Anderson. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-8-cclaudio@linux.ibm.com	2019-08-30 09:40:16 +10:00
Claudio Carvalho	5223134029	powerpc/mm: Write to PTCR only if ultravisor disabled In ultravisor enabled systems, PTCR becomes ultravisor privileged only for writing and an attempt to write to it will cause a Hypervisor Emulation Assitance interrupt. This patch uses the set_ptcr_when_no_uv() function to restrict PTCR writing to only when ultravisor is disabled. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-6-cclaudio@linux.ibm.com	2019-08-30 09:40:16 +10:00
Michael Anderson	139a1d2842	powerpc/mm: Use UV_WRITE_PATE ucall to register a PATE When Ultravisor (UV) is enabled, the partition table is stored in secure memory and can only be accessed via the UV. The Hypervisor (HV) however maintains a copy of the partition table in normal memory to allow Nest MMU translations to occur (for normal VMs). The HV copy includes partition table entries (PATE)s for secure VMs which would currently be unused (Nest MMU translations cannot access secure memory) but they would be needed as we add functionality. This patch adds the UV_WRITE_PATE ucall which is used to update the PATE for a VM (both normal and secure) when Ultravisor is enabled. Signed-off-by: Michael Anderson <andmike@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> [ cclaudio: Write the PATE in HV's table before doing that in UV's ] Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Reviewed-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-5-cclaudio@linux.ibm.com	2019-08-30 09:40:15 +10:00
Claudio Carvalho	bb04ffe85e	powerpc/powernv: Introduce FW_FEATURE_ULTRAVISOR In PEF enabled systems, some of the resources which were previously hypervisor privileged are now ultravisor privileged and controlled by the ultravisor firmware. This adds FW_FEATURE_ULTRAVISOR to indicate if PEF is enabled. The host kernel can use FW_FEATURE_ULTRAVISOR, for instance, to skip accessing resources (e.g. PTCR and LDBAR) in case PEF is enabled. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> [ andmike: Device node name to "ibm,ultravisor" ] Signed-off-by: Michael Anderson <andmike@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-4-cclaudio@linux.ibm.com	2019-08-30 09:40:15 +10:00
Claudio Carvalho	a49dddbdb0	powerpc/kernel: Add ucall_norets() ultravisor call handler The ultracalls (ucalls for short) allow the Secure Virtual Machines (SVM)s and hypervisor to request services from the ultravisor such as accessing a register or memory region that can only be accessed when running in ultravisor-privileged mode. This patch adds the ucall_norets() ultravisor call handler. The specific service needed from an ucall is specified in register R3 (the first parameter to the ucall). Other parameters to the ucall, if any, are specified in registers R4 through R12. Return value of all ucalls is in register R3. Other output values from the ucall, if any, are returned in registers R4 through R12. Each ucall returns specific error codes, applicable in the context of the ucall. However, like with the PowerPC Architecture Platform Reference (PAPR), if no specific error code is defined for a particular situation, then the ucall will fallback to an erroneous parameter-position based code. i.e U_PARAMETER, U_P2, U_P3 etc depending on the ucall parameter that may have caused the error. Every host kernel (powernv) needs to be able to do ucalls in case it ends up being run in a machine with ultravisor enabled. Otherwise, the kernel may crash early in boot trying to access ultravisor resources, for instance, trying to set the partition table entry 0. Secure guests also need to be able to do ucalls and its kernel may not have CONFIG_PPC_POWERNV=y. For that reason, the ucall.S file is placed under arch/powerpc/kernel. If ultravisor is not enabled, the ucalls will be redirected to the hypervisor which must handle/fail the call. Thanks to inputs from Ram Pai and Michael Anderson. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-3-cclaudio@linux.ibm.com	2019-08-30 09:40:15 +10:00
Claudio Carvalho	70ed86f4de	powerpc: Add PowerPC Capabilities ELF note Add the PowerPC name and the PPC_ELFNOTE_CAPABILITIES type in the kernel binary ELF note. This type is a bitmap that can be used to advertise kernel capabilities to userland. This patch also defines PPCCAP_ULTRAVISOR_BIT as being the bit zero. Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> [ maxiwell: Define the 'PowerPC' type in the elfnote.h ] Signed-off-by: Maxiwell S. Garcia <maxiwell@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829155021.2915-2-maxiwell@linux.ibm.com	2019-08-30 09:40:15 +10:00
Alexey Kardashevskiy	a102f139aa	powerpc/powernv/ioda: Remove obsolete iommu_table_ops::exchange callbacks As now we have xchg_no_kill/tce_kill, these are not used anymore so remove them. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829085252.72370-6-aik@ozlabs.ru	2019-08-30 09:40:15 +10:00
Alexey Kardashevskiy	35872480da	powerpc/powernv/ioda: Split out TCE invalidation from TCE updates At the moment updates in a TCE table are made by iommu_table_ops::exchange which update one TCE and invalidates an entry in the PHB/NPU TCE cache via set of registers called "TCE Kill" (hence the naming). Writing a TCE is a simple xchg() but invalidating the TCE cache is a relatively expensive OPAL call. Mapping a 100GB guest with PCI+NPU passed through devices takes about 20s. Thankfully we can do better. Since such big mappings happen at the boot time and when memory is plugged/onlined (i.e. not often), these requests come in 512 pages so we call call OPAL 512 times less which brings 20s from the above to less than 10s. Also, since TCE caches can be flushed entirely, calling OPAL for 512 TCEs helps skiboot [1] to decide whether to flush the entire cache or not. This implements 2 new iommu_table_ops callbacks: - xchg_no_kill() to update a single TCE with no TCE invalidation; - tce_kill() to invalidate multiple TCEs. This uses the same xchg_no_kill() callback for IODA1/2. This implements 2 new wrappers on top of the new callbacks similar to the existing iommu_tce_xchg(). This does not use the new callbacks yet, the next patches will; so this should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829085252.72370-2-aik@ozlabs.ru	2019-08-30 09:40:14 +10:00
Christoph Hellwig	f2902a2fb4	powerpc: use the generic dma coherent remap allocator This switches to using common code for the DMA allocations, including potential use of the CMA allocator if configured. Switching to the generic code enables DMA allocations from atomic context, which is required by the DMA API documentation, and also adds various other minor features drivers start relying upon. It also makes sure we have on tested code base for all architectures that require uncached pte bits for coherent DMA allocations. Another advantage is that consistent memory allocations now share the general vmalloc pool instead of needing an explicit careout from it. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # tested on 8xx Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190814132230.31874-2-hch@lst.de	2019-08-28 23:19:34 +10:00
Christophe Leroy	e0291f1dec	powerpc/32: drop CPU_FTR_UNIFIED_ID_CACHE Only 601 and e200 have unified I/D cache. Drop the feature and use CONFIG_PPC_BOOK3S_601 and CONFIG_E200. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b5902144266d2f4eed1ffea53915bd0245841e02.1566834712.git.christophe.leroy@c-s.fr	2019-08-28 23:19:33 +10:00
Christophe Leroy	88fb309409	powerpc/32s: drop CPU_FTR_USE_RTC feature CPU_FTR_USE_RTC feature only applies to powerpc601. Drop this feature and replace it with tests on CONFIG_PPC_BOOK3S_601. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/170411e2360861f4a95c21faad43519a08bc4040.1566834712.git.christophe.leroy@c-s.fr	2019-08-28 23:19:33 +10:00
Christophe Leroy	12c3f1fd87	powerpc/32s: get rid of CPU_FTR_601 feature Now that 601 is exclusive from other 6xx, CPU_FTR_601 and associated fixups are useless. Drop this feature and use #ifdefs instead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ecdb7194a17dbfa01865df6a82979533adc2c70b.1566834712.git.christophe.leroy@c-s.fr	2019-08-28 23:19:33 +10:00
Christophe Leroy	63ce271b5e	powerpc/prom: convert PROM_BUG() to standard trap Prior to commit 1bd98d7fbaf5 ("ppc64: Update BUG handling based on ppc32"), BUG() family was using BUG_ILLEGAL_INSTRUCTION which was an invalid instruction opcode to trap into program check exception. That commit converted them to using standard trap instructions, but prom/prom_init and their PROM_BUG() macro were left over. head_64.S and exception-64s.S were left aside as well. Convert them to using the standard BUG infrastructure. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cdaf4bbbb64c288a077845846f04b12683f8875a.1566817807.git.christophe.leroy@c-s.fr	2019-08-28 11:31:18 +10:00
Christopher M. Riedl	405efc5980	powerpc/spinlocks: Fix oops in __spin_yield() on bare metal Booting w/ppc64le_defconfig + CONFIG_PREEMPT on bare metal results in the oops below due to calling into __spin_yield() when not running in an SPLPAR, which means lppaca pointers are NULL. We fixed a similar case previously in commit `a6201da34f` ("powerpc: Fix oops due to bad access of lppaca on bare metal"), by adding SPLPAR checks in lppaca_shared_proc(). However when PREEMPT is enabled we can call __spin_yield() directly from arch_spin_yield(). To fix it add spin_yield() and rw_yield() which check that shared-processor LPAR is enabled before calling the SPLPAR-only implementation of each. BUG: Kernel NULL pointer dereference at 0x00000100 Faulting instruction address: 0xc000000000097f88 Oops: Kernel access of bad area, sig: 7 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.2.0-rc6-00491-g249155c20f9b #28 NIP: c000000000097f88 LR: c000000000c07a88 CTR: c00000000015ca10 REGS: c0000000727079f0 TRAP: 0300 Not tainted (5.2.0-rc6-00491-g249155c20f9b) MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 84000424 XER: 20040000 CFAR: c000000000c07a84 DAR: 0000000000000100 DSISR: 00080000 IRQMASK: 1 GPR00: c000000000c07a88 c000000072707c80 c000000001546300 c00000007be38a80 GPR04: c0000000726f0c00 0000000000000002 c00000007279c980 0000000000000100 GPR08: c000000001581b78 0000000080000001 0000000000000008 c00000007279c9b0 GPR12: 0000000000000000 c000000001730000 c000000000142558 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: c00000007be38a80 c000000000c002f4 0000000000000000 0000000000000000 GPR28: c000000072221a00 c0000000726c2600 c00000007be38a80 c00000007be38a80 NIP [c000000000097f88] __spin_yield+0x48/0xa0 LR [c000000000c07a88] __raw_spin_lock+0xb8/0xc0 Call Trace: [c000000072707c80] [c000000072221a00] 0xc000000072221a00 (unreliable) [c000000072707cb0] [c000000000bffb0c] __schedule+0xbc/0x850 [c000000072707d70] [c000000000c002f4] schedule+0x54/0x130 [c000000072707da0] [c0000000001427dc] kthreadd+0x28c/0x2b0 [c000000072707e20] [c00000000000c1cc] ret_from_kernel_thread+0x5c/0x70 Instruction dump: 4d9e0020 552a043e 210a07ff 79080fe0 0b080000 3d020004 3908b878 794a1f24 e8e80000 7ce7502a e8e70000 38e70100 <7ca03c2c> 70a70001 78a50020 4d820020 ---[ end trace 474d6b2b8fc5cb7e ]--- Fixes: `499dcd4137` ("powerpc/64s: Allocate LPPACAs individually") Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> [mpe: Reword change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813031314.1828-4-cmr@informatik.wtf	2019-08-27 21:34:34 +10:00
Christopher M. Riedl	31391ff7ea	powerpc/spinlocks: Rename SPLPAR-only spinlocks The __rw_yield and __spin_yield locks only pertain to SPLPAR mode. Rename them to make this relationship obvious. Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813031314.1828-3-cmr@informatik.wtf	2019-08-27 13:03:36 +10:00
Christopher M. Riedl	d57b78353a	powerpc/spinlocks: Refactor SHARED_PROCESSOR Determining if a processor is in shared processor mode is not a constant so don't hide it behind a #define. Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813031314.1828-2-cmr@informatik.wtf	2019-08-27 13:03:36 +10:00
Christophe Leroy	d7fb5b18a5	powerpc/64: optimise LOAD_REG_IMMEDIATE_SYM() Optimise LOAD_REG_IMMEDIATE_SYM() using a temporary register to parallelise operations. It reduces the path from 5 to 3 instructions. Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bad41ed02531bb0382420cbab50a0d7153b71767.1566311636.git.christophe.leroy@c-s.fr	2019-08-27 13:03:36 +10:00
Christophe Leroy	c691b4b83b	powerpc: rewrite LOAD_REG_IMMEDIATE() as an intelligent macro Today LOAD_REG_IMMEDIATE() is a basic #define which loads all parts on a value into a register, including the parts that are NUL. This means always 2 instructions on PPC32 and always 5 instructions on PPC64. And those instructions cannot run in parallele as they are updating the same register. Ex: LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in: 3c 20 00 00 lis r1,0 60 21 00 00 ori r1,r1,0 78 21 07 c6 rldicr r1,r1,32,31 64 21 00 00 oris r1,r1,0 60 21 40 00 ori r1,r1,16384 Rewrite LOAD_REG_IMMEDIATE() with GAS macro in order to skip the parts that are NUL. Rename existing LOAD_REG_IMMEDIATE() as LOAD_REG_IMMEDIATE_SYM() and use that one for loading value of symbols which are not known at compile time. Now LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in: 38 20 40 00 li r1,16384 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d60ce8dd3a383c7adbfc322bf1d53d81724a6000.1566311636.git.christophe.leroy@c-s.fr	2019-08-27 13:03:36 +10:00
Christophe Leroy	163918fc57	powerpc/mm: split out early ioremap path. ioremap does things differently depending on whether SLAB is available or not at different levels. Try to separate the early path from the beginning. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3acd2dbe04b04f111475e7a59f2b6f2ab9b95ab6.1566309263.git.christophe.leroy@c-s.fr	2019-08-27 13:03:35 +10:00
Christophe Leroy	4a45b7460c	powerpc/mm: refactor ioremap vm area setup. PPC32 and PPC64 are doing the same once SLAB is available. Create a do_ioremap() function that calls get_vm_area and do the mapping. For PPC64, we add the 4K PFN hack sanity check to __ioremap_caller() in order to avoid using __ioremap_at(). Other checks in __ioremap_at() are irrelevant for __ioremap_caller(). On PPC64, VM area is allocated in the range [ioremap_bot ; IOREMAP_END] On PPC32, VM area is allocated in the range [VMALLOC_START ; VMALLOC_END] Lets define IOREMAP_START is ioremap_bot for PPC64, and alias IOREMAP_START/END to VMALLOC_START/END on PPC32 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/42e7e36ad32e0fdf76692426cc642799c9f689b8.1566309263.git.christophe.leroy@c-s.fr	2019-08-27 13:03:35 +10:00
Christophe Leroy	191e42063a	powerpc/mm: refactor ioremap_range() and use ioremap_page_range() book3s64's ioremap_range() is almost same as fallback ioremap_range(), except that it calls radix__ioremap_range() when radix is enabled. radix__ioremap_range() is also very similar to the other ones, expect that it calls ioremap_page_range when slab is available. PPC32 __ioremap_caller() have a loop doing the same thing as ioremap_range() so use it on PPC32 as well. Lets keep only one version of ioremap_range() which calls ioremap_page_range() on all platforms when slab is available. At the same time, drop the nid parameter which is not used. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4b1dca7096b01823b101be7338983578641547f1.1566309263.git.christophe.leroy@c-s.fr	2019-08-27 13:03:35 +10:00
Christophe Leroy	7cd9b317b6	powerpc/mm: make ioremap_bot common to all Drop multiple definitions of ioremap_bot and make one common to all subarches. Only CONFIG_PPC_BOOK3E_64 had a global static init value for ioremap_bot. Now ioremap_bot is set in early_init_mmu_global(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/920eebfd9f36f14c79d1755847f5bf7c83703bdd.1566309262.git.christophe.leroy@c-s.fr	2019-08-27 13:03:34 +10:00
Christophe Leroy	14b4d97669	powerpc/mm: rework io-workaround invocation. ppc_md.ioremap() is only used for I/O workaround on CELL platform, so indirect function call can be avoided. This patch reworks the io-workaround and ioremap() functions to use the global 'io_workaround_inited' flag for the activation of io-workaround. When CONFIG_PPC_IO_WORKAROUNDS or CONFIG_PPC_INDIRECT_MMIO are not selected, the I/O workaround ioremap() voids and the global flag is not used. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5fa3ef069fbd0f152512afaae19e7a60161454cf.1566309262.git.christophe.leroy@c-s.fr	2019-08-27 13:03:34 +10:00
Christophe Leroy	492643e81e	powerpc/mm: drop function __ioremap() __ioremap() is not used anymore, drop it. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ccc439f481a0884e00a6be1bab44bab2a4477fea.1566309262.git.christophe.leroy@c-s.fr	2019-08-27 13:03:33 +10:00
Christophe Leroy	8aee077292	powerpc/mm: drop ppc_md.iounmap() and __iounmap() ppc_md.iounmap() is never set, drop it. Once ppc_md.iounmap() is gone, iounmap() remains the only user of __iounmap() and iounmap() does nothing else than calling __iounmap(). So drop iounmap() and make __iounmap() the new iounmap(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d73ba92bb7a387cc58cc34666d7f5158a45851b0.1566309262.git.christophe.leroy@c-s.fr	2019-08-27 13:03:33 +10:00
Christoph Hellwig	f0f8d7ae39	powerpc: remove the ppc44x ocm.c file The on chip memory allocator is entirely unused in the kernel tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7b1668941ad1041d08b19167030868de5840b153.1566309262.git.christophe.leroy@c-s.fr	2019-08-27 13:03:33 +10:00
Sam Bobroff	cef50c67c1	powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse() There are no users of the early-out return value from eeh_pe_dev_traverse(), so remove it. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c648070f5b28fe8ca1880b48e64b267959ffd369.1565930772.git.sbobroff@linux.ibm.com	2019-08-22 23:12:47 +10:00
Sam Bobroff	1ff8f36fc7	powerpc/eeh: Convert log messages to eeh_edev_* macros Convert existing messages, where appropriate, to use the eeh_edev_* logging macros. The only effect should be minor adjustments to the log messages, apart from: - A new message in pseries_eeh_probe() "Probing device" to match the powernv case. - The "Probing device" message in pnv_eeh_probe() is now generated slightly later, which will mean that it is no longer emitted for devices that aren't probed due to the initial checks. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ce505a0a7a4a5b0367f0f40f8b26e7c0a9cf4cb7.1565930772.git.sbobroff@linux.ibm.com	2019-08-22 23:12:47 +10:00
Sam Bobroff	b093f2cbed	powerpc/eeh: Introduce EEH edev logging macros Now that struct eeh_dev includes the BDFN of it's PCI device, make use of it to replace eeh_edev_info() with a set of dev_dbg()-style macros that only need a struct edev. With the BDFN available without the struct pci_dev, eeh_pci_name() is now unnecessary, so remove it. While only the "info" level function is used here, the others will be used in followup work. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f90ae9a53d762be7b0ccbad79e62b5a1b4f4996e.1565930772.git.sbobroff@linux.ibm.com	2019-08-22 23:12:46 +10:00
Oliver O'Halloran	7c33a994d3	powerpc/eeh: Add bdfn field to eeh_dev Preparation for removing pci_dn from the powernv EEH code. The only thing we really use pci_dn for is to get the bdfn of the device for config space accesses, so adding that information to eeh_dev reduces the need to carry around the pci_dn. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [SB: Re-wrapped commit message, fixed whitespace damage.] Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e458eb69a1f591d8a120782f23a8506b15d3c654.1565930772.git.sbobroff@linux.ibm.com	2019-08-22 23:12:46 +10:00
Sam Bobroff	c44e4ccada	powerpc/eeh: Refactor around eeh_probe_devices() Now that EEH support for all devices (on PowerNV and pSeries) is provided by the pcibios bus add device hooks, eeh_probe_devices() and eeh_addr_cache_build() are redundant and can be removed. Move the EEH enabled message into it's own function so that it can be called from multiple places. Note that previously on pSeries, useless EEH sysfs files were created for some devices that did not have EEH support and this change prevents them from being created. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/33b0a6339d5ac88693de092d6fba984f2a5add66.1565930772.git.sbobroff@linux.ibm.com	2019-08-22 23:12:46 +10:00
Sam Bobroff	685a0bc00a	powerpc/eeh: Initialize EEH address cache earlier The EEH address cache is currently initialized and populated by a single function: eeh_addr_cache_build(). While the initial population of the cache can only be done once resources are allocated, initialization (just setting up a spinlock) could be done much earlier. So move the initialization step into a separate function and call it from a core_initcall (rather than a subsys initcall). This will allow future work to make use of the cache during boot time PCI scanning. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0557206741bffee76cdfff042f65321f6f7a5b41.1565930772.git.sbobroff@linux.ibm.com	2019-08-22 23:11:48 +10:00
Santosh Sivaraj	42ac26d253	powerpc: add machine check safe copy_to_user Use memcpy_mcsafe() implementation to define copy_to_user_mcsafe() Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820081352.8641-8-santosh@fossix.org	2019-08-21 22:23:49 +10:00
Balbir Singh	4d4a273854	powerpc/memcpy: Add memcpy_mcsafe for pmem The pmem infrastructure uses memcpy_mcsafe in the pmem layer so as to convert machine check exceptions into a return value on failure in case a machine check exception is encountered during the memcpy. The return value is the number of bytes remaining to be copied. This patch largely borrows from the copyuser_power7 logic and does not add the VMX optimizations, largely to keep the patch simple. If needed those optimizations can be folded in. Signed-off-by: Balbir Singh <bsingharora@gmail.com> [arbab@linux.ibm.com: Added symbol export] Co-developed-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820081352.8641-7-santosh@fossix.org	2019-08-21 22:23:48 +10:00
Balbir Singh	895e3dceeb	powerpc/mce: Handle UE event for memcpy_mcsafe If we take a UE on one of the instructions with a fixup entry, set nip to continue execution at the fixup entry. Stop processing the event further or print it. Co-developed-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820081352.8641-6-santosh@fossix.org	2019-08-21 22:23:48 +10:00
Nicholas Piggin	6bb25170d7	powerpc/64s/radix: Remove redundant pfn_pte bitop, add VM_BUG_ON pfn_pte is never given a pte above the addressable physical memory limit, so the masking is redundant. In case of a software bug, it is not obviously better to silently truncate the pfn than to corrupt the pte (either one will result in memory corruption or crashes), so there is no reason to add this to the fast path. Add VM_BUG_ON to catch cases where the pfn is invalid. These would catch the create_section_mapping bug fixed by a previous commit. [16885.256466] ------------[ cut here ]------------ [16885.256492] kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! cpu 0x0: Vector: 700 (Program Check) at [c0000000ee0a36d0] pc: c000000000080738: __map_kernel_page+0x248/0x6f0 lr: c000000000080ac0: __map_kernel_page+0x5d0/0x6f0 sp: c0000000ee0a3960 msr: 9000000000029033 current = 0xc0000000ec63b400 paca = 0xc0000000017f0000 irqmask: 0x03 irq_happened: 0x01 pid = 85, comm = sh kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! Linux version 5.3.0-rc1-00001-g0fe93e5f3394 enter ? for help [c0000000ee0a3a00] c000000000d37378 create_physical_mapping+0x260/0x360 [c0000000ee0a3b10] c000000000d370bc create_section_mapping+0x1c/0x3c [c0000000ee0a3b30] c000000000071f54 arch_add_memory+0x74/0x130 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190724084638.24982-5-npiggin@gmail.com	2019-08-20 21:22:20 +10:00
Nicholas Piggin	4dd7554a64	powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addresses Ensure __va is given a physical address below PAGE_OFFSET, and __pa is given a virtual address above PAGE_OFFSET. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190724084638.24982-4-npiggin@gmail.com	2019-08-20 21:22:20 +10:00
Nicholas Piggin	e354d7dc81	powerpc/64: allow compiler to cache 'current' current may be cached by the compiler, so remove the volatile asm restriction. This results in better generated code, as well as being smaller and fewer dependent loads, it can avoid store-hit-load flushes like this one that shows up in irq_exit(): preempt_count_sub(HARDIRQ_OFFSET); if (!in_interrupt() && ...) Which ends up as: ((struct thread_info )current)->preempt_count -= HARDIRQ_OFFSET; if (((struct thread_info )current)->preempt_count ... Evaluating current twice presently means it has to be loaded twice, and here gcc happens to pick a different register each time, then preempt_count is accessed via that base register: 1058: ld r10,2392(r13) <-- current 105c: lwz r9,0(r10) <-- preempt_count 1060: addis r9,r9,-1 1064: stw r9,0(r10) <-- preempt_count 1068: ld r9,2392(r13) <-- current 106c: lwz r9,0(r9) <-- preempt_count 1070: rlwinm. r9,r9,0,11,23 1074: bne 1090 <irq_exit+0x60> This can frustrate store-hit-load detection heuristics and cause flushes. Allowing the compiler to cache current in a reigster with this patch results in the same base register being used for all accesses, which is more likely to be detected as an alias: 1058: ld r31,2392(r13) ... 1070: lwz r9,0(r31) 1074: addis r9,r9,-1 1078: stw r9,0(r31) 107c: lwz r9,0(r31) 1080: rlwinm. r9,r9,0,11,23 1084: bne 10a0 <irq_exit+0x60> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190612140317.24490-1-npiggin@gmail.com	2019-08-20 21:22:15 +10:00
Christophe Leroy	7ab0b7cb89	powerpc/32: Add warning on misaligned copy_page() or clear_page() copy_page() and clear_page() expect page aligned destination, and use dcbz instruction to clear entire cache lines based on the assumption that the destination is cache aligned. As shown during analysis of a bug in BTRFS filesystem, a misaligned copy_page() can create bugs that are difficult to locate (see Link). Add an explicit WARNING when copy_page() or clear_page() are called with misaligned destination. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=204371 Link: https://lore.kernel.org/r/c6cea38f90480268d439ca44a645647e260fff09.1565941808.git.christophe.leroy@c-s.fr	2019-08-20 21:22:15 +10:00
Christophe Leroy	4c1616ef03	powerpc/mm: move FSL_BOOK3 version of update_mmu_cache() Move FSL_BOOK3E version of update_mmu_cache() at the same place as book3e_hugetlb_preload() as update_mmu_cache() is the only user of book3e_hugetlb_preload(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4d69fdc86df9c74adc71a60331a86f6afb8b5e9e.1565933217.git.christophe.leroy@c-s.fr	2019-08-20 21:22:14 +10:00
Christophe Leroy	d964211791	powerpc/mm: define empty update_mmu_cache() as static inline Only BOOK3S and FSL_BOOK3E have a usefull update_mmu_cache(). For the others, just define it static inline. In the meantime, simplify the FSL_BOOK3E related ifdef as book3e_hugetlb_preload() only exists when CONFIG_PPC_FSL_BOOK3E is selected. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/668aba4db6b9af6d8a151174e11a4289f1a6bbcd.1565933217.git.christophe.leroy@c-s.fr	2019-08-20 21:22:14 +10:00
Christophe Leroy	38a0d0cdb4	powerpc/futex: Fix warning: 'oldval' may be used uninitialized in this function We see warnings such as: kernel/futex.c: In function 'do_futex': kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] return oldval == cmparg; ^ kernel/futex.c:1651:6: note: 'oldval' was declared here int oldval, ret; ^ This is because arch_futex_atomic_op_inuser() only sets oval if ret is 0 and GCC doesn't see that it will only use it when ret is 0. Anyway, the non-zero ret path is an error path that won't suffer from setting oval, and as *oval is a local var in futex_atomic_op_inuser() it will have no impact. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: reword change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/86b72f0c134367b214910b27b9a6dd3321af93bb.1565774657.git.christophe.leroy@c-s.fr	2019-08-20 21:22:12 +10:00

1 2 3 4 5 ...

5250 Commits