linux

Author	SHA1	Message	Date
Guilherme G. Piccoli	8445a87f70	powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism Commit `39baadbf36` ("powerpc/eeh: Remove eeh information from pci_dn") changed the pci_dn struct by removing its EEH-related members. As part of this clean-up, DDW mechanism was modified to read the device configuration address from eeh_dev struct. As a consequence, now if we disable EEH mechanism on kernel command-line for example, the DDW mechanism will fail, generating a kernel oops by dereferencing a NULL pointer (which turns to be the eeh_dev pointer). This patch just changes the configuration address calculation on DDW functions to a manual calculation based on pci_dn members instead of using eeh_dev-based address. No functional changes were made. This was tested on pSeries, both in PHyp and qemu guest. Fixes: `39baadbf36` ("powerpc/eeh: Remove eeh information from pci_dn") Cc: stable@vger.kernel.org # v3.4+ Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-12 19:52:21 +10:00
Guilherme G. Piccoli	c2078d9ef6	Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" This reverts commit `89a51df5ab`. The function eeh_add_device_early() is used to perform EEH initialization in devices added later on the system, like in hotplug/DLPAR scenarios. Since the commit `89a51df5ab` ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") a new check was introduced in this function - Cell has no EEH capabilities which led to kernel oops if hotplug was performed, so checking for eeh_enabled() was introduced to avoid the issue. However, in architectures that EEH is present like pSeries or PowerNV, we might reach a case in which no PCI devices are present on boot time and so EEH is not initialized. Then, if a device is added via DLPAR for example, eeh_add_device_early() fails because eeh_enabled() is false, and EEH end up not being enabled at all. This reverts the aforementioned patch since a new verification was introduced by the commit `d91dafc02f` ("powerpc/eeh: Delay probing EEH device during hotplug") and so the original Cell issue does not happen anymore. Cc: stable@vger.kernel.org # v4.1+ Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-12 19:52:21 +10:00
Gavin Shan	d6d63d720d	powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner() The label "reset" in eeh_pe_change_owner() is used only for once. No need to keep it and just drop it. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-12 19:52:20 +10:00
Gavin Shan	2efc771f24	powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover() The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrough device are transferred to guest and backwards, meaning the device's driver is vfio-pci or none. In both cases, the handlers triggered by eeh_report_reset() and eeh_report_resume() shouldn't be called. This ignores the error handlers from eeh_report_reset() and eeh_report_resume(). Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-12 19:52:20 +10:00
Gavin Shan	5a0cdbfd17	powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover() The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrou device are transferred to guest and backwards. The content in the device's config space will be lost on PE reset issued in the middle of the recovery. The function saves/restores it before/after the reset. However, config access to some adapters like Broadcom BCM5719 at this point will causes fenced PHB. The config space is always blocked and we save 0xFF's that are restored at late point. The memory BARs are totally corrupted, causing another EEH error upon access to one of the memory BARs. This restores the config space on those adapters like BCM5719 from the content saved to the EEH device when it's populated, to resolve above issue. Fixes: `5cfb20b9` ("powerpc/eeh: Emulate EEH recovery for VFIO devices") Cc: stable@vger.kernel.org #v3.18+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-12 19:52:20 +10:00
Gavin Shan	affeb0f2d3	powerpc/eeh: Don't report error in eeh_pe_reset_and_recover() The function eeh_pe_reset_and_recover() is used to recover EEH error when the passthrough device are transferred to guest and backwards, meaning the device's driver is vfio-pci or none. When the driver is vfio-pci that provides error_detected() error handler only, the handler simply stops the guest and it's not expected behaviour. On the other hand, no error handlers will be called if we don't have a bound driver. This ignores the error handler in eeh_pe_reset_and_recover() that reports the error to device driver to avoid the exceptional behaviour. Fixes: `5cfb20b9` ("powerpc/eeh: Emulate EEH recovery for VFIO devices") Cc: stable@vger.kernel.org #v3.18+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-12 19:52:20 +10:00
Michael Ellerman	848912e547	Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()" This reverts commit `c8ceacc22b`. Gavin says: I missed the fact that it affects the PCI passthrou path as reported by Alexey: When passing GPU (0003:01:00.0) which seats behind the root port, the reset request is routed to skiboot in original code. In skiboot, the link bouncing events are masked during the reset. So we don't see EEH (freeze all) error even link bouncing happens. With the changes included, the reset is done by kernel and the link bouncing events aren't masked by altering content of PHB3 (or P7IOC) specific hardware registers which are invisible to kernel (skiboot hides the hardware specific). It means the link bouncing is seen by the root port and it causes a EEH (freeze all) error. The PCI passthrough on GPU device cannot work. Requested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Requested-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-12 19:43:37 +10:00
Ingo Molnar	4eb8676517	Merge branch 'smp/hotplug' into sched/core, to resolve conflicts Conflicts: kernel/sched/core.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-12 09:51:36 +02:00
Paul Mackerras	b1a4286b8f	KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts Commit `c9a5eccac1` ("kvm/eventfd: add arch-specific set_irq", 2015-10-16) added the possibility for architecture-specific code to handle the generation of virtual interrupts in atomic context where possible, without having to schedule a work function. Since we can easily generate virtual interrupts on XICS without having to do anything worse than take a spinlock, we define a kvm_arch_set_irq_inatomic() for XICS. We also remove kvm_set_msi() since it is not used any more. The one slightly tricky thing is that with the new interface, we don't get told whether the interrupt is an MSI (or other edge sensitive interrupt) vs. level-sensitive. The difference as far as interrupt generation is concerned is that for LSIs we have to set the asserted flag so it will continue to fire until it is explicitly cleared. In fact the XICS code gets told which interrupts are LSIs by userspace when it configures the interrupt via the KVM_DEV_XICS_GRP_SOURCES attribute group on the XICS device. To store this information, we add a new "lsi" field to struct ics_irq_state. With that we can also do a better job of returning accurate values when reading the attribute group. Signed-off-by: Paul Mackerras <paulus@samba.org>	2016-05-12 16:40:55 +10:00
Greg Kurz	0b1b1dfd52	kvm: introduce KVM_MAX_VCPU_ID The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and also the upper limit for vCPU ids. This is okay for all archs except PowerPC which can have higher ids, depending on the cpu/core/thread topology. In the worst case (single threaded guest, host with 8 threads per core), it limits the maximum number of vCPUS to KVM_MAX_VCPUS / 8. This patch separates the vCPU numbering from the total number of vCPUs, with the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids plus one. The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids before passing them to KVM_CREATE_VCPU. This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC. Other archs continue to return KVM_MAX_VCPUS instead. Suggested-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-11 22:37:54 +02:00
Ingo Molnar	d2950158d0	Merge branch 'perf/urgent' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-11 16:56:38 +02:00
Alexey Kardashevskiy	b5cb9ab1a0	powerpc/powernv/npu: Enable NVLink pass through IBM POWER8 NVlink systems come with Tesla K40-ish GPUs each of which also has a couple of fast speed links (NVLink). The interface to links is exposed as an emulated PCI bridge which is included into the same IOMMU group as the corresponding GPU. In the kernel, NPUs get a separate PHB of the PNV_PHB_NPU type and a PE which behave pretty much as the standard IODA2 PHB except NPU PHB has just a single TVE in the hardware which means it can have either 32bit window or 64bit window or DMA bypass but never two of these. In order to make these links work when GPU is passed to the guest, these bridges need to be passed as well; otherwise performance will degrade. This implements and exports API to manage NPU state in regard to VFIO; it replicates iommu_table_group_ops. This defines a new pnv_pci_ioda2_npu_ops which is assigned to the IODA2 bridge if there are NPUs for a GPU on the bridge. The new callbacks call the default IODA2 callbacks plus new NPU API. This adds a gpe_table_group_to_npe() helper to find NPU PE for the IODA2 table_group, it is not expected to fail as the helper is only called from the pnv_pci_ioda2_npu_ops. This does not define NPU-specific .release_ownership() so after VFIO is finished, DMA on NPU is disabled which is ok as the nvidia driver sets DMA mask when probing which enable 32 or 64bit DMA on NPU. This adds a pnv_pci_npu_setup_iommu() helper which adds NPUs to the GPU group if any found. The helper uses helpers to look for the "ibm,gpu" property in the device tree which is a phandle of the corresponding GPU. This adds an additional loop over PEs in pnv_ioda_setup_dma() as the main loop skips NPU PEs as they do not have 32bit DMA segments. As pnv_npu_set_window() and pnv_npu_unset_window() are started being used by the new IODA2-NPU IOMMU group, this makes the helpers public and adds the DMA window number parameter. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-By: Alistair Popple <alistair@popple.id.au> [mpe: Add pnv_pci_ioda_setup_iommu_api() to fix build with IOMMU_API=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:31 +10:00
Alexey Kardashevskiy	85674868ce	powerpc/powernv/npu: Rework TCE Kill handling The pnv_ioda_pe struct keeps an array of peers. At the moment it is only used to link GPU and NPU for 2 purposes: 1. Access NPU quickly when configuring DMA for GPU - this was addressed in the previos patch by removing use of it as DMA setup is not what the kernel would constantly do. 2. Invalidate TCE cache for NPU when it is invalidated for GPU. GPU and NPU are in different PE. There is already a mechanism to attach multiple iommu_table_group to the same iommu_table (used for VFIO), we can reuse it here so does this patch. This gets rid of peers[] array and PNV_IODA_PE_PEER flag as they are not needed anymore. While we are here, add TCE cache invalidation after enabling bypass. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:31 +10:00
Alexey Kardashevskiy	b575c731fe	powerpc/powernv/npu: Add set/unset window helpers The upcoming NVLink passthrough support will require NPU code to cope with two DMA windows. This adds a pnv_npu_set_window() helper which programs 32bit window to the hardware. This also adds multilevel TCE support. This adds a pnv_npu_unset_window() helper which removes the DMA window from the hardware. This does not make difference now as the caller - pnv_npu_dma_set_bypass() - enables bypass in the hardware but the next patch will use it to manage TCE table lists for TCE Kill handling. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:30 +10:00
Alexey Kardashevskiy	7d623e4256	powerpc/powernv/ioda2: Export debug helper pe_level_printk() This exports debugging helper pe_level_printk() and corresponding macroses so they can be used in npu-dma.c. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:30 +10:00
Alexey Kardashevskiy	f9f8345674	powerpc/powernv/npu: Simplify DMA setup NPU devices are emulated in firmware and mainly used for NPU NVLink training; one NPU device is per a hardware link. Their DMA/TCE setup must match the GPU which is connected via PCIe and NVLink so any changes to the DMA/TCE setup on the GPU PCIe device need to be propagated to the NVLink device as this is what device drivers expect and it doesn't make much sense to do anything else. This makes NPU DMA setup explicit. pnv_npu_ioda_controller_ops::pnv_npu_dma_set_mask is moved to pci-ioda, made static and prints warning as dma_set_mask() should never be called on this function as in any case it will not configure GPU; so we make this explicit. Instead of using PNV_IODA_PE_PEER and peers[] (which the next patch will remove), we test every PCI device if there are corresponding NVLink devices. If there are any, we propagate bypass mode to just found NPU devices by calling the setup helper directly (which takes @bypass) and avoid guessing (i.e. calculating from DMA mask) whether we need bypass or not on NPU devices. Since DMA setup happens in very rare occasion, this will not slow down booting or VFIO start/stop much. This renames pnv_npu_disable_bypass to pnv_npu_dma_set_32 to make it more clear what the function really does which is programming 32bit table address to the TVT ("disabling bypass" means writing zeroes to the TVT). This removes pnv_npu_dma_set_bypass() from pnv_npu_ioda_fixup() as the DMA configuration on NPU does not matter until dma_set_mask() is called on GPU and that will do the NPU DMA configuration. This removes phb->dma_dev_setup initialization for NPU as pnv_pci_ioda_dma_dev_setup is no-op for it anyway. This stops using npe->tce_bypass_base as it never changes and values other than zero are not supported. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:29 +10:00
Alexey Kardashevskiy	6969af7352	powerpc/powernv/npu: Use the correct IOMMU page size This uses the page size from iommu_table instead of hard-coded 4K. This should cause no change in behavior. While we are here, move bits around to prepare for further rework which will define and use iommu_table_group_ops. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:29 +10:00
Alexey Kardashevskiy	0bbcdb437d	powerpc/powernv/npu: TCE Kill helpers cleanup NPU PHB TCE Kill register is exactly the same as in the rest of POWER8 so let's reuse the existing code for NPU. The only bit missing is a helper to reset the entire TCE cache so this moves such a helper from NPU code and renames it. Since pnv_npu_tce_invalidate() does really invalidate the entire cache, this uses pnv_pci_ioda2_tce_invalidate_entire() directly for NPU. This adds an explicit comment for workaround for invalidating NPU TCE cache. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:29 +10:00
Alexey Kardashevskiy	bef9253f55	powerpc/powernv: Define TCE Kill flags This replaces magic constants for TCE Kill IODA2 register with macros. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:28 +10:00
Alexey Kardashevskiy	a7cf13caad	powerpc/powernv: Rename pnv_pci_ioda2_tce_invalidate_entire As in fact pnv_pci_ioda2_tce_invalidate_entire() invalidates TCEs for the specific PE rather than the entire cache, rename it to pnv_pci_ioda2_tce_invalidate_pe(). In later patches we will add a proper pnv_pci_ioda2_tce_invalidate_entire(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:28 +10:00
Gavin Shan	c8ceacc22b	powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus() The function pnv_pci_reset_secondary_bus() is called like below. It's impossible for call the function on root bus. So it's safe to remove the root bus case in the function. No functional changes introduced. pci_parent_bus_reset() / pci_bus_reset() / pci_try_reset_bus() pci_reset_bridge_secondary_bus() pcibios_reset_secondary_bus() pnv_pci_reset_secondary_bus() Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:27 +10:00
Gavin Shan	4fad494321	powerpc/powernv: Simplify pnv_eeh_reset() This drops unnecessary nested if statements in pnv_eeh_reset() to improve the code readability. After the changes, the unused local variable "ret" is dropped as well. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:26 +10:00
Gavin Shan	4a5954ed77	powerpc/pci: Don't scan empty slot In hotplug case, function pci_add_pci_devices() is called to rescan the specified PCI bus, which might not have any child devices. Access to the PCI bus's child device node will cause kernel crash without exception. This adds one more check to skip scanning PCI bus that doesn't have any subordinate devices from device-tree, in order to avoid kernel crash. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:26 +10:00
Gavin Shan	cdddc577d9	powerpc/pci: Export pci_traverse_device_nodes() This renames traverse_pci_devices() to pci_traverse_device_nodes(). The function traverses all subordinate device nodes of the specified one. Also, below cleanup applied to the function. No logical changes introduced. * Rename "pre" to "fn". * Avoid assignment in if condition reported from checkpatch.pl. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:25 +10:00
Gavin Shan	de5a28ac5a	powerpc/pci: Introduce pci_remove_device_node_info() This implements and exports pci_remove_device_node_info(). It's used to remove the pdn (struct pci_dn) for the indicated device node. The function is going to be used by PowerNV PCI hotplug driver. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:25 +10:00
Gavin Shan	d8f66f411e	powerpc/pci: Export pci_add_device_node_info() This renames update_dn_pci_info() to pci_add_device_node_info() with corresponding adjustment on the parameter type and exports it. The function is used to create pdn (struct pci_dn) for the indicated device node. Another function add_pdn(), almost wrapper of pci_add_device_node_info(), to be used in traverse_pci_devices(). No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:24 +10:00
Gavin Shan	6384d97780	powerpc/pci: Move pci_find_bus_by_node() around This moves pci_find_bus_by_node() from arch/powerpc/platforms/ pseries/pci_dlpar.c to arch/powerpc/kernel/pci-hotplug.c so that the function can be used by pSeries and PowerNV platform at the same time. Also, below cleanup applied. No functional changes introduced. * Remove variable "busdn" in find_bus_among_children() * Use PCI_DN() to convert device node to pci_dn Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:24 +10:00
Gavin Shan	3773dd258e	powerpc/pci: Rename pcibios_find_pci_bus() This renames pcibios_find_pci_bus() to pci_find_bus_by_node() to avoid conflicts with those PCI subsystem weak function names, which have prefix "pcibios". No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:24 +10:00
Gavin Shan	bd251b893d	powerpc/pci: Rename pcibios_{add, remove}_pci_devices() This renames pcibios_{add,remove}_pci_devices() to avoid conflicts with names of the weak functions in PCI subsystem, which have the prefix "pcibios". No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:23 +10:00
Gavin Shan	1e9167726c	powerpc/powernv: Use PE instead of number during setup and release In current implementation, the PEs that are allocated or picked from the reserved list are identified by PE number. The PE instance has to be picked according to the PE number eventually. We have same issue when PE is released. For pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe(), this returns PE instance so that pnv_ioda_setup_bus_PE() can use the allocated or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE() returns the reserved/allocated PE instance to be used in subsequent patches. On the other hand, pnv_ioda_free_pe() uses PE instance (not number) as its argument. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:23 +10:00
Gavin Shan	2b923ed1bd	powerpc/powernv/ioda1: Improve DMA32 segment track In current implementation, the DMA32 segments required by one specific PE isn't calculated with the information hold in the PE independently. It conflicts with the PCI hotplug design: PE centralized, meaning the PE's DMA32 segments should be calculated from the information hold in the PE independently. This introduces an array (@dma32_segmap) for every PHB to track the DMA32 segmeng usage. Besides, this moves the logic calculating PE's consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that PE's DMA32 segments are calculated/allocated from the information hold in the PE (DMA32 weight). Also the logic is improved: we try to allocate as much DMA32 segments as we can. It's acceptable that number of DMA32 segments less than the expected number are allocated. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:22 +10:00
Gavin Shan	801846d1de	powerpc/powernv: Remove DMA32 PE list PEs are put into PHB DMA32 list (phb->ioda.pe_dma_list) according to their DMA32 weight. The PEs on the list are iterated to setup their TCE32 tables at system booting time. The list is used for once at boot time and no need to keep it. This moves the logic calculating DMA32 weight of PHB and PE to pnv_ioda_setup_dma() to drop PHB's DMA32 list. Also, every PE traces the consumed DMA32 segment by @tce32_seg and @tce32_segcount are useless and they're removed. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:22 +10:00
Gavin Shan	acce971c0e	powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE Currently, there is one macro (TCE32_TABLE_SIZE) representing the TCE table size for one DMA32 segment. The constant representing the DMA32 segment size (1 << 28) is still used in the code. This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32 segment size. the TCE table size can be calcualted when the page has fixed 4KB size. So all the related calculation depends on one macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:21 +10:00
Gavin Shan	b30d936f6f	powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe() This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe() as it's the counter-part of IODA2's pnv_pci_ioda2_setup_dma_pe(). No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:21 +10:00
Gavin Shan	9945155143	powerpc/powernv/ioda1: M64 support on P7IOC This enables M64 window on P7IOC, which has been enabled on PHB3. Different from PHB3 where 16 M64 BARs are supported and each of them can be owned by one particular PE# exclusively or divided evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each of them are divided to 8 segments. So every P7IOC PHB supports 128 M64 segments in total. P7IOC has M64DT, which helps mapping one particular M64 segment# to arbitrary PE#. PHB3 doesn't have M64DT, indicating that one M64 segment can only be pinned to the fixed PE#. In order to unified M64 support M64 on P7IOC and PHB3, we just provide 128 M64 segments on every P7IOC PHB and each of them is pinned to the fixed PE# by bypassing the function of M64DT. In turn, we just need different phb->init_m64() for P7IOC and PHB3 and maps M64 segment in pnv_ioda_reserve_m64_pe() for P7IOC, most of the code are shared by them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:20 +10:00
Gavin Shan	c430670ad1	powerpc/powernv: Rename M64 related functions This renames those functions picking PE number based on consumed M64 segments, mapping M64 segments to PEs as those functions are going to be shared by IODA1/IODA2 in next patch. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:20 +10:00
Gavin Shan	93289d8c08	powerpc/powernv: Track M64 segment consumption When unplugging PCI devices, their parent PEs might be offline. The consumed M64 resource by the PEs should be released at that time. As we track M32 segment consumption, this introduces an array to the PHB to track the mapping between M64 segment and PE number. Note: M64 mapping isn't covered by pnv_ioda_setup_pe_seg() as IODA2 doesn't support the mapping explicitly while it's supported on IODA1. Until now, no M64 is supported on IODA1 in software. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:19 +10:00
Gavin Shan	69d733e72d	powerpc/powernv: IO and M32 mapping based on PCI device resources Currently, the IO and M32 segments are mapped to the corresponding PE based on the windows of the parent bridge of PE's primary bus. It's not going to work when the windows of root port or upstream port of the PCIe switch behind root port are extended to PHB's apertures in order to support hotplug in subsequent patch. This fixes the issue by mapping IO and M32 segments based on the resources of the PCI devices included in the PE, instead of the windows of the parent bridge of the PE's primary bus. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:19 +10:00
Gavin Shan	23e79425fe	powerpc/powernv: Simplify pnv_ioda_setup_pe_seg() pnv_ioda_setup_pe_seg() associates the IO and M32 segments with the owner PE. The code mapping segments should be fixed and immune from logic changes introduced to pnv_ioda_setup_pe_seg(). This moves the code mapping segments to helper pnv_ioda_setup_pe_res(). The data type for @rc is changed to "int64_t". Also, argument @hose is removed from pnv_ioda_setup_pe() as it can be got from @pe. No functional changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:18 +10:00
Gavin Shan	3fa23ff8ff	powerpc/powernv: Fix initial IO and M32 segmap There are two arrays for IO and M32 segment maps on every PHB. The index of the arrays are segment number and the value stored in the corresponding element is PE number, indicating the segment is assigned to the PE. Initially, all elements in those two arrays are zeroes, meaning all segments are assigned to PE#0. It's wrong. This fixes the initial values in the elements of those two arrays to IODA_INVALID_PE, meaning all segments aren't assigned to any PE. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:18 +10:00
Gavin Shan	689ee8c95f	powerpc/powernv: Data type unsigned int for PE number This changes the data type of PE number from "int" to "unsigned int" in order to match the fact PE number is never negative: * The number of PE to which the specified PCI device is attached. * The PE number map for SRIOV VFs. * The returned PE number from pnv_ioda_alloc_pe(). * The returned PE number from pnv_ioda2_pick_m64_pe(). Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:17 +10:00
Gavin Shan	92b8f137b3	powerpc/powernv: Rename PE# fields in struct pnv_phb This renames the fields related to PE number in "struct pnv_phb" for better reflecting of their usages as Alexey suggested. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:17 +10:00
Gavin Shan	13ce7598b6	powerpc/powernv: Reorder fields in struct pnv_phb This moves those fields in struct pnv_phb that are related to PE allocation around. No logical change. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:16 +10:00
Gavin Shan	475d92c27f	powerpc/powernv: Drop phb->bdfn_to_pe() The last usage of pnv_phb::bdfn_to_pe() was removed in `ff57b454dd` ("powerpc/eeh: Do probe on pci_dn"), so drop it. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:16 +10:00
Gavin Shan	cb4224c501	powerpc/powernv: Cleanup on pci_controller_ops instances This cleans up on below data struct instances to use tab instead of space indent of statement to avoid complains from scripts/checkpatch.pl. No logical changes introduced. @pnv_pci_ioda_controller_ops @pnv_npu_ioda_controller_ops Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:15 +10:00
Gavin Shan	062b26ba3e	powerpc/pci: Cleanup on struct pci_controller_ops Each PHB has one instance of "struct pci_controller_ops" that includes various callbacks called by PCI subsystem. In the definition of this struct, some callbacks have explicit names for its arguments, but the left don't have. This adds all explicit names of the arguments to the callbacks in "struct pci_controller_ops" so that the code looks consistent. Also, argument name @dev is replaced by @pdev as the later one is the preferred name for PCI device. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:15 +10:00
Mahesh Salgaonkar	2513767d22	powerpc/powernv: Rename machine_check_pSeries_early() to powernv The routine machine_check_pSeries_early() is only used on powernv, not pseries. Hence rename machine_check_pSeries_early() to machine_check_powernv_early(). Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:11 +10:00
Gavin Shan	171cb719da	powerpc/mm: Improve readability of update_mmu_cache() The function is used to update the MMU with software PTE. It can be called by data access exception handler (0x300) or instruction access exception handler (0x400). If the function is called by 0x400 handler, the local variable @access is set to _PAGE_EXEC to indicate the software PTE should have that flag set. When the function is called by 0x300 handler, @access is set to zero. This improves the readability of the function by replacing if statements with switch. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:09 +10:00
Oliver O'Halloran	dd0b52c47a	powerpc/mm: define TOP_ZONE as a constant The zone that contains the top of memory will be either ZONE_NORMAL or ZONE_HIGHMEM depending on the kernel config. There are two functions that require this information and both of them use an #ifdef to set a local variable (top_zone). This is a little silly so lets just make it a constant. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Cc: linux-mm@kvack.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:08 +10:00
Oliver O'Halloran	6670783606	powerpc/sstep: Fix emulation fall-through There is a switch fallthough in instr_analyze() which can cause an invalid instruction to be emulated as a different, valid, instruction. The rld* (opcode 30) case extracts a sub-opcode from bits 3:1 of the instruction word. However, the only valid values of this field are 001 and 000. These cases are correctly handled, but the others are not which causes execution to fall through into case 31. Breaking out of the switch causes the instruction to be marked as unknown and allows the caller to deal with the invalid instruction in a manner consistent with other invalid instructions. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:08 +10:00
Lennart Sorensen	dd21731022	powerpc/sstep: Fix sstep.c compile on powerpcspe Commit `be96f63375` ("powerpc: Split out instruction analysis part of emulate_step()") introduced ldarx and stdcx into the instructions in sstep.c, which are not accepted by the assembler on powerpcspe, but does seem to be accepted by the normal powerpc assembler even in 32 bit mode. Wrap these two instructions in a __powerpc64__ check like it is everywhere else in the file. Fixes: `be96f63375` ("powerpc: Split out instruction analysis part of emulate_step()") Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:07 +10:00
Paul Mackerras	31cdd0c39c	powerpc/xmon: Fix SPR read/write commands and add command to dump SPRs xmon has commands for reading and writing SPRs, but they don't work currently for several reasons. They attempt to synthesize a small function containing an mfspr or mtspr instruction and call it. However, the instructions are on the stack, which is usually not executable. Also, for 64-bit we set up a procedure descriptor, which is fine for the big-endian ABIv1, but not correct for ABIv2. Finally, the code uses the infrastructure for catching memory errors, but that only catches data storage interrupts and machine check interrupts, but a failed mfspr/mtspr can generate a program interrupt or a hypervisor emulation assist interrupt, or be a no-op. Instead of trying to synthesize a function on the fly, this adds two new functions, xmon_mfspr() and xmon_mtspr(), which take an SPR number as an argument and read or write the SPR. Because there is no Power ISA instruction which takes an SPR number in a register, we have to generate one of each possible mfspr and mtspr instruction, for all 1024 possible SPRs. Thus we get just over 8k bytes of code for each of xmon_mfspr() and xmon_mtspr(). However, this 16kB of code pales in comparison to the > 130kB of PPC opcode tables used by the xmon disassembler. To catch interrupts caused by the mfspr/mtspr instructions, we add a new 'catch_spr_faults' flag. If an interrupt occurs while it is set, we come back into xmon() via program_check_interrupt(), _exception() and die(), see that catch_spr_faults is set and do a longjmp to bus_error_jmp, back into read_spr() or write_spr(). This adds a couple of other nice features: first, a "Sa" command that attempts to read and print out the value of all 1024 SPRs. If any mfspr instruction acts as a no-op, then the SPR is not implemented and not printed. Secondly, the Sr and Sw commands detect when an SPR is not implemented (i.e. mfspr is a no-op) and print a message to that effect rather than printing a bogus value. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:07 +10:00
Chandan Kumar	17ed7c3842	powerpc: Add HAVE_PERF_USER_STACK_DUMP support With perf regs support enabled for powerpc, in commit `ed4a4ef85c` ("powerpc/perf: Add support for sampling interrupt register state"), the support for obtaining perf user stack dump is already enabled. This patch declares the support for same and also updates documentation to mark the support for perf-regs and perf-stackdump. Signed-off-by: Chandan Kumar <chandan.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:05 +10:00
Michael Ellerman	aac55d7573	powerpc/mm/hash64: Fix subpage protection with 4K HPTE config With Linux page size of 64K and hardware only supporting 4K HPTE, if we use subpage protection, we always fail for the subpage 0 as shown below (using the selftest subpage_prot test): 520175565: (4520111850): Failed at 0x3fffad4b0000 (p=13,sp=0,w=0), want=fault, got=pass ! 4520890210: (4520826495): Failed at 0x3fffad5b0000 (p=29,sp=0,w=0), want=fault, got=pass ! 4521574251: (4521510536): Failed at 0x3fffad6b0000 (p=45,sp=0,w=0), want=fault, got=pass ! 4522258324: (4522194609): Failed at 0x3fffad7b0000 (p=61,sp=0,w=0), want=fault, got=pass ! This is because hash preload wrongly inserts the HPTE entry for subpage 0 without looking at the subpage protection information. Fix it by teaching should_hash_preload() not to preload if we have subpage protection configured for that range. It appears this has been broken since it was introduced in 2008. Fixes: `fa28237cfc` ("[POWERPC] Provide a way to protect 4k subpages when using 64k pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Rework into should_hash_preload() to avoid build fails w/SLICES=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:05 +10:00
Michael Ellerman	8bbc9b7b00	powerpc/mm/hash64: Factor out hash preload psize check Currently we have a check in hash_preload() against the psize, which is only included when CONFIG_PPC_MM_SLICES is enabled. We want to expand this check in a subsequent patch, so factor it out to allow that. As a bonus it removes the #ifdef in the C code. Unfortunately we can't put this in the existing CONFIG_PPC_MM_SLICES block because it would require a forward declaration. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:04 +10:00
Suraj Jitindar Singh	925e2d1ded	powerpc: Update of_remove_property() call sites to remove null checking After obtaining a property from of_find_property() and before calling of_remove_property() most code checks to ensure that the property returned from of_find_property() is not null. The previous patch moved this check to the start of the function of_remove_property() in order to avoid the case where this check isn't done and a null value is passed. This ensures the check is always conducted before taking locks and attempting to remove the property. Thus it is no longer necessary to perform a check for null values before invoking of_remove_property(). Update of_remove_property() call sites in order to remove redundant checking for null property value as check is now performed within the of_remove_property function(). Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> [mpe: Unbreak some lines which are just >80 chars for readability] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:04 +10:00
Suraj Jitindar Singh	b2ed059642	powerpc/pseries: Add null property check to pseries_discover_pic() The return value of of_get_property() isn't checked before it is passed to the strstr() function, if it happens that the return value is null then this will result in a null pointer being dereferenced. Add a check to see if the return value of of_get_property() is null and if it is continue straight on to the next node. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Chris Smart <chris@distroguy.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:03 +10:00
Alexey Kardashevskiy	9e44754755	powerpc/powernv/pci: Fix cfg_dbg() & replace with pr_devel() When cfg_dbg() is enabled (i.e. mapped to printk()), gcc produces errors as the __func__ parameter is missing (pnv_pci_cfg_read() has one); this adds the missing parameter. cfg_dbg() is just an inferior version of pr_devel() so use the latter instead. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:02 +10:00
Chris Smart	e44c1b15cf	powerpc: Remove unnecessary CONFIG_SMP #ifdefs The code in machine_restart/power_off/halt() includes #ifdefs around calls to smp_send_stop(), however these are not required as include/linux/smp.h includes an empty version of this function for CONFIG_SMP=n builds. Signed-off-by: Chris Smart <chris@distroguy.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:01 +10:00
Rashmica Gupta	c415c9cb8a	powerpc: Remove unused remnants from A2 cpu Support for the A2 cpu was removed in commit `fb5a515704` ("powerpc: Remove platforms/wsp and associated pieces"), and the externs: __setup_cpu_a2 and __restore_cpu_a2 are still around and unused, so remove them. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:00 +10:00
Aneesh Kumar K.V	62ccf5bf1f	powerpc/mm/slice: Remove slice_mm_new_context() The usage in mm mmu_context_nohash.c is bogus, because we set the context.id value to MMU_NO_CONTEXT 4 lines previously in the same function, meaning slice_mm_new_context() will always be true. The book3s 64 usage was removed in the previous commit. So remove it as unused. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:54:00 +10:00
Aneesh Kumar K.V	2d566537dd	powerpc/mm/subpage: Initialise user psize correctly As part of the radix support we switched Book3s64 to use a value of ~0 for MMU_NO_CONTEXT. That is because id 0 is special on radix. However that broke the logic in init_new_context(). The code there needs to differentiate between a newly allocated context and one inherited via fork. Previously it worked because a newly allocated context has an id of zero (because it was just memset() to zero), which used to match MMU_NO_CONTEXT, and therefore slice_mm_new_context() did the right thing. Instead check against a context.id value of zero instead of using slice_mm_new_context(). Without this patch we never call slice_set_user_psize(), and end up with a slice psize value of zero and we always end up using 4K HPTE. Fixes: `1a472c9dba` ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:59 +10:00
Valentin Rothberg	bb03efe2b7	powerpc/mm/radix: Fix CONFIG_PPC_MMU_STD_64 typo It's CONFIG_PPC_STD_MMU_64 not ... CONFIG_PPC_MMU_STD_64. Fixes: 11ffc1cfa4c2 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code") Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:59 +10:00
Aneesh Kumar K.V	69dfbaeb65	powerpc/mm/radix: Document software bits for radix Add #defines for Power ISA 3.0 software defined bits. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:58 +10:00
Aneesh Kumar K.V	17a3dd2f5f	powerpc/mm/radix: Use firmware feature to enable Radix MMU We use the existing "ibm,pa-features" device-tree property to enable Radix MMU mode. This means we default to hash mode unless firmware tells us it's OK to start using Radix mode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:58 +10:00
Aneesh Kumar K.V	ab62476240	powerpc/mm/radix: Add THP support for 4K linux page size This adds THP support for 4K Linux page size config with radix. We still don't do THP with 4K Linux page size and hash page table. Hash page table needs a 16MB hugepage and we can't do THP with 16MM hugepage and 4K Linux page size. We add missing functions to 4K hash config to get it to build and hash__has_transparent_hugepage() makes sure we don't enable THP for 4K hash config. To catch wrong usage of THP related with 4K config, we add BUG() in those dummy functions we added to get it compile. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:58 +10:00
Aneesh Kumar K.V	bde3eb6222	powerpc/mm/radix: Add radix THP callbacks The deposited pgtable_t is a pte fragment hence we cannot use page->lru for linking then together. We use the first two 64 bits for pte fragment as list_head type to link all deposited fragments together. On withdraw we properly zero then out. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:57 +10:00
Aneesh Kumar K.V	3df33f12be	powerpc/mm/thp: Abstraction for THP functions Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:57 +10:00
Aneesh Kumar K.V	6a1ea36260	powerpc/mm: THP is only available on hash64 as of now Only code movement in this patch. No functionality change. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:56 +10:00
Aneesh Kumar K.V	c0a6c719d2	powerpc/mm/radix: Add hugetlb support 4K page size We have hugepage at the pmd level with 4K radix config. Hence we don't need to use hugepd format with radix. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:56 +10:00
Aneesh Kumar K.V	43a5c68427	powerpc/mm/radix: Make sure swapper pgdir is properly aligned With 4K page size radix config our level 1 page table size is 64K and it should be naturally aligned. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:55 +10:00
Aneesh Kumar K.V	484837601d	powerpc/mm: Add radix support for hugetlb Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:55 +10:00
Aneesh Kumar K.V	2f5f0dfd1e	powerpc/mm: Fix vma_mmu_pagesize() for radix Radix doesn't use the slice framework to find the page size. Hence use vma to find the page size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:54 +10:00
Aneesh Kumar K.V	5ed7ecd08a	powerpc/mm: pte_frag abstraction In this patch we make the number of pte fragments per level 4 page table page a variable. Radix level 4 table size is 256 bytes and hence we can have 256 fragments per level 4 page. We don't update the fragment count in this patch. We need to do performance measurements to find the right value for fragment count. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:54 +10:00
Aneesh Kumar K.V	a3dece6d69	powerpc/radix: Update MMU cache With radix there is no MMU cache. Hence we don't need to do anything in update_mmu_cache(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:53 +10:00
Aneesh Kumar K.V	d6a9996e84	powerpc/mm: vmalloc abstraction in preparation for radix The vmalloc range differs between hash and radix config. Hence make VMALLOC_START and related constants a variable which will be runtime initialized depending on whether hash or radix mode is active. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fix missing init of ioremap_bot in pgtable_64.c for ppc64e] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:53 +10:00
Aneesh Kumar K.V	4dfb88ca9b	powerpc/mm: Update pte filter for radix Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:52 +10:00
Aneesh Kumar K.V	a2f41eb992	powerpc/mm: Add radix pgalloc details Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:51 +10:00
Aneesh Kumar K.V	934828edfa	powerpc/mm: Make 4K and 64K use pte_t for pgtable_t This patch switches 4K Linux page size config to use pte_t * type instead of struct page * for pgtable_t. This simplifies the code a lot and helps in consolidating both 64K and 4K page allocator routines. The changes should not have any impact, because we already store physical address in the upper level page table tree and that implies we already do struct page * to physical address conversion. One change to note here is we move the pgtable_page_dtor() call for nohash to pte_fragment_free_mm(). The nohash related change is due to the related changes in pgtable_64.c. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:51 +10:00
Aneesh Kumar K.V	74701d5947	powerpc/mm: Rename function to indicate we are allocating fragments Only code cleanup. No functionality change. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:50 +10:00
Aneesh Kumar K.V	bcbe7f777e	powerpc/mm: Simplify the code dropping 4-level table #ifdef Simplify the code by dropping 4-level page table #ifdef. We are always 4-level now. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:50 +10:00
Aneesh Kumar K.V	27209206a6	powerpc/mm: Revert changes made to nohash pgalloc-64.h This reverts pgalloc related changes WRT implementing 4-level page table for 64K Linux page size and storing of physical address in higher level page tables since they are only applicable to book3s64 variant and we now have a separate copy for book3s64. This helps to keep these headers simpler. Cc: Scott Wood <scottwood@freescale.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:49 +10:00
Aneesh Kumar K.V	75a9b8a6c2	powerpc/mm: Copy pgalloc (part 2) This moves the nohash variant of pgalloc headers to nohash/ directory Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:49 +10:00
Aneesh Kumar K.V	101ad5c65e	powerpc/mm: Make a copy of pgalloc.h for 32 and 64 book3s This patch start to make a book3s variant for pgalloc headers. We have multiple book3s specific changes such as: * 4 level page table * store physical address in higher level table * use pte_t * for pgtable_t Having a book3s64 specific variant helps to keep code simpler and remove lots of #ifdef around code. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:48 +10:00
Aneesh Kumar K.V	b5dcc60969	powerpc/mm/radix: Update PTCR on secondary CPUs Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:48 +10:00
Aneesh Kumar K.V	7a0eedeedd	powerpc/mm/radix: Pick the address layout for radix config Hash needs special get_unmapped_area() handling because of limitations around base page size, so we have to set HAVE_ARCH_UNMAPPED_AREA. With radix we don't have such restrictions, so we could use the generic code. But because we've set HAVE_ARCH_UNMAPPED_AREA (for hash), we have to re-implement the same logic as the generic code. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:47 +10:00
Aneesh Kumar K.V	177ba7c647	powerpc/mm/radix: Limit paca allocation in radix On return from RTAS we access the paca variables and we have 64 bit disabled. This requires us to limit paca in 32 bit range. Fix this by setting ppc64_rma_size to first_memblock_size/1G range. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:47 +10:00
Aneesh Kumar K.V	764041e0f4	powerpc/mm/radix: Add checks in slice code to catch radix usage Radix doesn't need slice support. Catch incorrect usage of slice code when radix is enabled. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:46 +10:00
Aneesh Kumar K.V	d8c476eeb6	powerpc/mm/radix: Isolate hash table function from pseries guest code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:46 +10:00
Aneesh Kumar K.V	caca285e5a	powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code We also use MMU_FTR_RADIX to branch out from code path specific to hash. No functionality change. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:53:45 +10:00
Aneesh Kumar K.V	a8ed87c92a	powerpc/mm/radix: Add MMU_FTR_RADIX We are going to add asm changes in the follow up patches. Add the feature bit now so that we can get it all build. mpe: When CONFIG_PPC_RADIX_MMU=n we omit MMU_FTR_RADIX from the MMU_FTRS_POSSIBLE mask. This allows the compiler to work out that those checks will always be false and so the code can be elided completely. Note we do not define MMU_FTR_RADIX to 0 in the RADIX_MMU=n case, because that doesn't work with the ASM_FTR patching. In particular an IF_SET section will result in a mask and value of zero, which is always true, meaning the section won't be patched, which is the opposite of what we want. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:49:25 +10:00
Michael Ellerman	773edeadf6	powerpc/mm: Add mask of possible MMU features Follow the example of the cpu feature code, and add a mask of possible MMU features, MMU_FTRS_POSSIBLE. This is used in mmu_has_feature(), which allows the possible mask to act as a shortcut for any features that are not possible, but still allows the feature bit itself to be defined. We will use this in the next commit to allow MMU_FTR_RADIX checks to be elided when MMU_FTR_RADIX is not possible. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-11 21:46:02 +10:00
Gavin Shan	07f8ab255f	KVM: PPC: Book3S HV: Fix build error in book3s_hv.c When CONFIG_KVM_XICS is enabled, CPU_UP_PREPARE and other macros for CPU states in linux/cpu.h are needed by arch/powerpc/kvm/book3s_hv.c. Otherwise, build error as below is seen: gwshan@gwshan:~/sandbox/l$ make arch/powerpc/kvm/book3s_hv.o : CC arch/powerpc/kvm/book3s_hv.o arch/powerpc/kvm/book3s_hv.c: In function ‘kvmppc_cpu_notify’: arch/powerpc/kvm/book3s_hv.c:3072:7: error: ‘CPU_UP_PREPARE’ \ undeclared (first use in this function) This fixes the issue introduced by commit <6f3bb80944> ("KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs"). Fixes: `6f3bb80944` Cc: stable@vger.kernel.org # v4.6 Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2016-05-11 21:19:10 +10:00
Paul Mackerras	eb8b056016	KVM: PPC: Fix emulated MMIO sign-extension When the guest does a sign-extending load instruction (such as lha or lwa) to an emulated MMIO location, it results in a call to kvmppc_handle_loads() in the host. That function sets the vcpu->arch.mmio_sign_extend flag and calls kvmppc_handle_load() to do the rest of the work. However, kvmppc_handle_load() sets the mmio_sign_extend flag to 0 unconditionally, so the sign extension never gets done. To fix this, we rename kvmppc_handle_load to __kvmppc_handle_load and add an explicit parameter to indicate whether sign extension is required. kvmppc_handle_load() and kvmppc_handle_loads() then become 1-line functions that just call __kvmppc_handle_load() with the extra parameter. Reported-by: Bin Lu <lblulb@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2016-05-11 21:19:10 +10:00
Alexey Kardashevskiy	ade3ac660a	KVM: PPC: Fix debug macros When XICS_DBG is enabled, gcc produces format errors. This fixes formats to match passed values types. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2016-05-11 21:19:10 +10:00
Laurent Vivier	11dd6ac025	KVM: PPC: Book3S PR: Manage single-step mode Until now, when we connect gdb to the QEMU gdb-server, the single-step mode is not managed. This patch adds this, only for kvm-pr: If KVM_GUESTDBG_SINGLESTEP is set, we enable single-step trace bit in the MSR (MSR_SE) just before the __kvmppc_vcpu_run(), and disable it just after. In kvmppc_handle_exit_pr, instead of routing the interrupt to the guest, we return to host, with KVM_EXIT_DEBUG reason. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2016-05-11 21:19:10 +10:00
Linus Torvalds	4883d11e06	powerpc fixes for 4.6 #4 - Fix bad inline asm constraint in create_zero_mask() from Anton Blanchard -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXLEx6AAoJEFHr6jzI4aWAQskQAL9rtQtxhJQnCE97832iIgWn +YEef53mSTX7CEpoQqGY/2PUAOeQ8MPteewNaO5spZC2Vv34bXuHtv0q1uhiVJ6K Q0t1LSbm60rv9BIuO8rDSJmSGnSljI1bczd0WVsTHuYLyFrVPtRZaDBJT9Bh2JTQ +K/GE3o4R1X5pnUlZ4UqCcN7KwLrqHkQrmEcsSdUQMJ4s4jZaxS1KP9Kzq/M+mjy L9K4SyXneMyCgiUCIdu3hN5A7Vb+2NauGmCP/eV6x4+M/Bq5JC4vNDUcCKE0IBLF bNP5nvMKBBZOHlwxVWo9d2UWPeyEoIz4sKiZ26AE+urCYGhRMHb4M/Mh0qUbru13 CwZ1BwmZnyYs4pY8INbZJ+TvLcFgROfxxeUlT0KHStlnOrIbmgeKPWI/nymsL6n8 uHgSc8SnhskNW0n4NhWKVPD5iKqeWV2spFr5aTQ2tLDi7s2rT2oZjl5/GvKSIt+o NlmkL2LpbIOftRQ2j+EhHPq/acV8+d20yjN8Zul0tSTNuFeSk6rXyXhEiYXfU+Ao lPjkheNJZ9Oq0MAW0dvqWK4DrK31lhnZYcauO84cSA39sGa9BhHFe1E9XXMDSDwf YFs7ihMcbap1pejcf7waJJ5axCwN1SgxUGH1lCuGmOAypftbQT5TpB7m52ctAS5J vbsUZ8Z0W6OwsfSiKc/G =IkNh -----END PGP SIGNATURE----- Merge tag 'powerpc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Fix bad inline asm constraint in create_zero_mask() from Anton Blanchard" * tag 'powerpc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix bad inline asm constraint in create_zero_mask()	2016-05-06 11:05:07 -07:00
Peter Zijlstra (Intel)	e9d867a67f	sched: Allow per-cpu kernel threads to run on online && !active In order to enable symmetric hotplug, we must mirror the online && !active state of cpu-down on the cpu-up side. However, to retain sanity, limit this state to per-cpu kthreads. Aside from the change to set_cpus_allowed_ptr(), which allow moving the per-cpu kthreads on, the other critical piece is the cpu selection for pinned tasks in select_task_rq(). This avoids dropping into select_fallback_rq(). select_fallback_rq() cannot be allowed to select !active cpus because its used to migrate user tasks away. And we do not want to move user tasks onto cpus that are in transition. Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160301152303.GV6356@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-05-06 14:58:22 +02:00
Ingo Molnar	1a618c2cfe	Merge branch 'perf/urgent' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 10:12:37 +02:00
Al Viro	4e82901cd6	dcache_{readdir,dir_lseek}() users: switch to ->iterate_shared no need to lock directory in dcache_dir_lseek(), while we are at it - per-struct file exclusion is enough. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-02 19:49:32 -04:00

1 2 3 4 5 ...

14938 Commits