For some SR-IOV devices, the number of available virtual functions, i.e.,
TotalVFs, increases after setting the ARI Capable Hierarchy bit in the
SR-IOV Control register. This violates the SR-IOV spec, r1.1, sec 3.3.6,
which says TotalVFs is HwInit, but we don't need TotalVFs before setting
the ARI Capable bit anyway.
Set the ARI Capable Hierarchy bit (if ARI is enabled in the upstream
bridge) before reading TotalVFs.
[bhelgaas: changelog]
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Per the SR-IOV spec r1.1, sec 3.3.14, the required alignment of a PF's IOV
BAR is the size of an individual VF BAR, and the size consumed is the
individual VF BAR size times NumVFs.
The PowerNV platform has additional alignment requirements to help support
its Partitionable Endpoint device isolation feature (see
Documentation/powerpc/pci_iov_resource_on_powernv.txt).
Add a pcibios_iov_resource_alignment() interface to allow platforms to
request additional alignment.
[bhelgaas: changelog, adapt to reworked pci_sriov_resource_alignment(),
drop "align" parameter]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
VFs are dynamically created when a driver enables them. On some platforms,
like PowerNV, special resources are necessary to enable VFs.
Add platform hooks for enabling and disabling VFs.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On PowerNV, some resource reservation is needed for SR-IOV VFs that don't
exist at the bootup stage. To do the match between resources and VFs, the
code need to get the VF's BDF in advance.
Rename virtfn_bus() and virtfn_devfn() to pci_iov_virtfn_bus() and
pci_iov_virtfn_devfn() and export them.
[bhelgaas: changelog, make "busnr" int]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
An SR-IOV device can change its First VF Offset and VF Stride based on the
values of ARI Capable Hierarchy and NumVFs. The number of buses required
for all VFs is determined by NumVFs, First VF Offset, and VF Stride (see
SR-IOV spec r1.1, sec 2.1.2).
Previously pci_iov_bus_range() computed how many buses would be required by
TotalVFs, but this was based on a single NumVFs value and may not have been
the maximum for all NumVFs configurations.
Iterate over all valid NumVFs and calculate the maximum number of bus
numbers that could ever be required for VFs of this device.
[bhelgaas: changelog, compute busnr of NumVFs, not TotalVFs, remove
kerenl-doc comment marker]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The First VF Offset and VF Stride fields depend on the NumVFs setting, so
refresh the cached fields in struct pci_sriov when updating NumVFs. See
the SR-IOV spec r1.1, sec 3.3.9 and 3.3.10.
[bhelgaas: changelog, remove kernel-doc comment marker]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Most of PCI uses "res = &dev->resource[i]", not "res = dev->resource + i".
Use that style in iov.c also.
No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we don't store the individual VF BAR size. We calculate it when
needed by dividing the PF's IOV resource size (which contains space for
*all* the VFs) by total_VFs or by reading the BAR in the SR-IOV capability
again.
Keep the individual VF BAR size in struct pci_sriov.barsz[], add
pci_iov_resource_size() to retrieve it, and use that instead of doing the
division or reading the SR-IOV capability BAR.
[bhelgaas: rename to "barsz[]", simplify barsz[] index computation, remove
SR-IOV capability BAR sizing]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we size VF BAR0, VF BAR1, etc., from the SR-IOV Capability of a PF, we
learn the alignment requirement and amount of space consumed by a single
VF. But when VFs are enabled, *each* of the NumVFs consumes that amount of
space, so the total size of the PF resource is "VF BAR size * NumVFs".
Add a printk of the total space consumed by the VFs corresponding to what
we already do for normal non-IOV BARs.
No functional change; new message only.
[bhelgaas: split out into its own patch]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we don't have space for all the bus numbers required to enable VFs,
print the largest bus number required and the range available.
No functional change; improved error message only.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pci_iov_resource_bar() always sets its 'pci_bar_type' parameter to
'pci_bar_unknown'. Drop the parameter and just use 'pci_bar_unknown'
directly in the callers.
No functional change intended.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Chris Wright <chrisw@sous-sol.org>
CC: Yu Zhao <yuzhao@google.com>
Use PCI device flag helper functions when checking whether a device is
assigned. No functional change.
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_bus_add_device() always returns 0, so there's no point in returning
anything at all. Make it a void function and remove the tests of the
return value from the callers.
[bhelgaas: changelog, remove unused "err" from i82875p_setup_overfl_dev()]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This reverts commit 74bb1bcc7d ("PCI: handle SR-IOV Virtual Function
Migration"), removing this exported interface:
pci_sriov_migration()
Since pci_sriov_migration() is unused, it is impossible to schedule
sriov_migration_task() or use any of the other migration infrastructure.
This is based on Stephen Hemminger's patch (see link below), but goes a bit
further.
Link: http://lkml.kernel.org/r/20131227132710.7190647c@nehalam.linuxnetplumber.net
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
Per the SR-IOV spec rev 1.1:
3.4.1.9 Header Type (Offset 0Eh)
"... For VFs, this register must be RO Zero."
Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb NIC.
When they do it makes us handle ACS testing and therefore IOMMU groups as
if they were actual multifunction devices and require ACS capabilities to
make sure there's no peer-to-peer between functions. VFs are never
traditional multifunction devices, so simply clear this bit before we get
any further into setup.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=68431
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When SR-IOV is disabled (VF Enable is cleared), NumVFs is not very useful,
so this patch clears it out to prevent confusing lspci output like that
below. We already clear NumVFs in sriov_disable(), and this does the same
when we disable SR-IOV as part of parsing the SR-IOV capability.
$ lspci -vvv -s 13:00.0
13:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
Initial VFs: 64, Total VFs: 64, Number of VFs: 64, ...
[bhelgaas: changelog]
Signed-off-by: ethan.zhao <ethan.kernel@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Fix whitespace, capitalization, and spelling errors. No functional change.
I know "busses" is not an error, but "buses" was more common, so I used it
consistently.
Signed-off-by: Marta Rybczynska <rybczynska@gmail.com> (pci_reset_bridge_secondary_bus())
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change the return value to -ENOSYS if a device is not an SR-IOV PF.
Previously we returned either -ENODEV or -EINVAL.
Also have pci_sriov_get_totalvfs() return 0 in the error case to make the
behaviour consistent whether CONFIG_PCI_IOV is enabled or not.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Currently, we only update NumVFs register during sriov_enable().
This register should also be updated during sriov_disable() and when
sriov_enable() fails. Otherwise, we will get the stale "Number of VFs"
info from lspci.
[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Trivial changes to IOV:
1) use new PCI interfaces to simplify IOV implementation
2) fix some reference count related race windows
[bhelgaas: fix virtfn_add() add bus/alloc dev error paths]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
The flag pci_bus->is_added is used to guard invocation of
pcibios_fixup_bus(pci_bus). When virtfn_add_bus() is called, the
pci_bus->is_added flag has already been set, so remove the redundant
bus->is_added = 1;
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Use the new pci_alloc_dev(bus) to replace the existing using of
alloc_pci_dev(void).
[bhelgaas: drop pci_bus ref later in pci_release_dev()]
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Neela Syam Kolli <megaraidlinux@lsi.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Commit 4f535093cf "PCI: Put pci_dev in device tree as early as possible"
moves device registering from pci_bus_add_devices() to pci_device_add().
That causes problems for virtual functions because device_add(&virtfn->dev)
is called before setting the virtfn->is_virtfn flag, which then causes Xen
to report PCI virtual functions as PCI physical functions.
Fix it by setting virtfn->is_virtfn before calling pci_device_add().
[Jiang Liu]: Move the setting of virtfn->is_virtfn ahead further for better
readability and modify changelog.
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v3.9+
This function is meant to add a helper function that will determine if a PF
has any VFs that are currently assigned to a guest. We currently have been
implementing this function per driver, and going forward I would like to avoid
that by making this function generic and using this helper.
v2: Removed extern from declaration of pci_vfs_assigned in pci.h and return
0 if SR-IOV is disabled with is inline with other PCI SRIOV functions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Host bridge hotplug
- Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu)
- Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu)
- Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu)
- Stop caching _PRT and make independent of bus numbers (Yinghai Lu)
PCI device hotplug
- Clean up cpqphp dead code (Sasha Levin)
- Disable ARI unless device and upstream bridge support it (Yijing Wang)
- Initialize all hot-added devices (not functions 0-7) (Yijing Wang)
Power management
- Don't touch ASPM if disabled (Joe Lawrence)
- Fix ASPM link state management (Myron Stowe)
Miscellaneous
- Fix PCI_EXP_FLAGS accessor (Alex Williamson)
- Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov)
- Document hotplug resource and MPS parameters (Yijing Wang)
- Add accessor for PCIe capabilities (Myron Stowe)
- Drop pciehp suspend/resume messages (Paul Bolle)
- Make pci_slot built-in only (not a module) (Jiang Liu)
- Remove unused PCI/ACPI bind ops (Jiang Liu)
- Removed used pci_root_bus (Bjorn Helgaas)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRKS3hAAoJEFmIoMA60/r8xxoP/j1CS4oCZAnBIVT9fKBkis+/
CENcfHIUKj6J9iMfJEVvqBELvqaLqtpeNwAGMcGPxV7VuT3K1QumChfaTpRDP0HC
VDRmrjcmfenEK+YPOG7acsrTyqk2wjpLOyu9MKRxtC5u7tF6376KQpkEFpO4haL4
eUHTxfE76OkrPBSvx3+PUSf6jqrvrNbjX8K6HdDVVlm3sVAQKmYJU/Wphv2NPOqa
CAMyCzEGybFjr8hDRwvWgr+06c718GMwQUbnrPdHXAe7lMNMrN/XVBmU9ABN3Aas
icd3lrDs+yPObgcO/gT8+sAZErCtdJ9zuHGYHdYpRbIQj/5JT4TMk7tw/Bj7vKY9
Mqmho9GR5YmYTRN9f1r+2n5AQ/KYWXJDrRNOnt5/ys5BOM3vwJ7WJ902zpSwtFQp
nLX+oD/hLfzpnoIQGDuBAoAXp2Kam3XWRgVvG78buRNrPj+kUzimk14a8qQeY+CB
El6UKuwi5Uv/qgs1gAqqjmZmsAkon2DnsRZa6Fl8NTkDlis7LY4gp9OU38ySFpB+
PhCmRyCZmDDqTVtwj6XzR3nPQ5LBSbvsTfgMxYMIUSXHa06tyb2q5p4mEIas0OmU
RKaP5xQqZuTgD8fbdYrx0xgSrn7JHt/j/X//Qs6unlLCWhlpm3LjJZKxyw2FwBGr
o4Lci+PiBh3MowCrju9D
=ER3b
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
"Host bridge hotplug
- Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu)
- Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu)
- Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu)
- Stop caching _PRT and make independent of bus numbers (Yinghai Lu)
PCI device hotplug
- Clean up cpqphp dead code (Sasha Levin)
- Disable ARI unless device and upstream bridge support it (Yijing Wang)
- Initialize all hot-added devices (not functions 0-7) (Yijing Wang)
Power management
- Don't touch ASPM if disabled (Joe Lawrence)
- Fix ASPM link state management (Myron Stowe)
Miscellaneous
- Fix PCI_EXP_FLAGS accessor (Alex Williamson)
- Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov)
- Document hotplug resource and MPS parameters (Yijing Wang)
- Add accessor for PCIe capabilities (Myron Stowe)
- Drop pciehp suspend/resume messages (Paul Bolle)
- Make pci_slot built-in only (not a module) (Jiang Liu)
- Remove unused PCI/ACPI bind ops (Jiang Liu)
- Removed used pci_root_bus (Bjorn Helgaas)"
* tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (51 commits)
PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers
PCI: Fix PCI Express Capability accessors for PCI_EXP_FLAGS
ACPI / PCI: Make pci_slot built-in only, not a module
PCI/PM: Clear state_saved during suspend
PCI: Use atomic_inc_return() rather than atomic_add_return()
PCI: Catch attempts to disable already-disabled devices
PCI: Disable Bus Master unconditionally in pci_device_shutdown()
PCI: acpiphp: Remove dead code for PCI host bridge hotplug
PCI: acpiphp: Create companion ACPI devices before creating PCI devices
PCI: Remove unused "rc" in virtfn_add_bus()
PCI: pciehp: Drop suspend/resume ENTRY messages
PCI/ASPM: Don't touch ASPM if forcibly disabled
PCI/ASPM: Deallocate upstream link state even if device is not PCIe
PCI: Document MPS parameters pci=pcie_bus_safe, pci=pcie_bus_perf, etc
PCI: Document hpiosize= and hpmemsize= resource reservation parameters
PCI: Use PCI Express Capability accessor
PCI: Introduce accessor to retrieve PCIe Capabilities Register
PCI: Put pci_dev in device tree as early as possible
PCI: Skip attaching driver in device_add()
PCI: acpiphp: Keep driver loaded even if no slots found
...
We want to put pci_dev structs in the device tree as soon as possible so
for_each_pci_dev() iteration will not miss them, but driver attachment
needs to be delayed until after pci_assign_unassigned_resources() to make
sure all devices have resources assigned first.
This patch moves device registering from pci_bus_add_devices() to
pci_device_add(), which happens earlier, leaving driver attachment in
pci_bus_add_devices().
It also removes unattached child bus handling in pci_bus_add_devices().
That's not needed because child bus via pci_add_new_bus() is already
in parent bus children list.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix kernel-doc warning in iov.c:
Warning(drivers/pci/iov.c:752): No description found for parameter 'numvfs'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Sorry-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No need to check "!dev" when the caller should always supply a valid
pointer. If the caller *doesn't* supply a valid pointer, it probably
won't check for a failure return either. This way we'll oops and get a
backtrace.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Some implementations of SRIOV provide a capability structure
value of TotalVFs that is greater than what the software can support.
Provide a method to reduce the capability structure reported value
to the value the driver can support.
This ensures sysfs reports the current capability of the system,
hardware and software.
Example for its use: igb & ixgbe -- report 8 & 64 as TotalVFs,
but drivers only support 7 & 63 maximum.
Signed-off-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This reverts commit 433efd2247.
When we remove an SR-IOV device, we have this call chain:
driver .remove() method
pci_disable_sriov()
sriov_disable()
virtfn_remove()
pci_get_domain_bus_and_slot()
sriov_disable() is only called for PFs, not for VFs. When it's called
for a PF, it loops through all the VFs and calls virtfn_remove() for
each. But we stop and remove VFs before PFs, so by the time we get
to virtfn_remove(), the VFs have already been stopped and deleted
from the device list. Now pci_get_domain_bus_and_slot(), which uses
bus_find_device() and relies on that device list, doesn't find the
VFs, so the VF references aren't released correctly.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/jiang-get-domain-bus-slot:
xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot()
PCI: Use hotplug-safe pci_get_domain_bus_and_slot()
PCI/cpcihp: Use hotplug-safe pci_get_domain_bus_and_slot()
PCI/vga: Use hotplug-safe pci_get_domain_bus_and_slot()
ia64/PCI: Use hotplug-safe pci_get_domain_bus_and_slot()
Following code has a race window between pci_find_bus() and pci_get_slot()
if PCI hotplug operation happens between them which removes the pci_bus.
So use PCI hotplug safe interface pci_get_domain_bus_and_slot() instead,
which also reduces code complexity.
struct pci_bus *pci_bus = pci_find_bus(domain, busno);
struct pci_dev *pci_dev = pci_get_slot(pci_bus, devfn);
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Introduce an inline function pci_pcie_type(dev) to extract PCIe
device type from pci_dev->pcie_flags_reg field, and prepare for
removing pci_dev->pcie_type.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Replace the struct pci_bus secondary/subordinate members with the
struct resource busn_res. Later we'll build a resource tree of these
bus numbers.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The old pci_remove_bus_device actually did stop and remove.
Make the name reflect that to reduce confusion.
This patch is done by sed scripts and changes back some incorrect
__pci_remove_bus_device changes.
Suggested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
sysfs is a bit stricter now and emits warnings in more cases.
For SRIOV hotplug, we are calling pci_stop_dev() for each VF first
(after we update pci_stop_bus_devices) which remove each VF subdir. So
double check the VF dir in /sys before trying to remove the physfn link.
Signed-of-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For an SRIOV device, PCI_SRIOV_SYS_PGSIZE should be set before
the PCI_SRIOV_BAR are queried. The sys pagesize defaults to 4k,
so this change is required on powerpc box with 64k base page size.
This is a regression caused due to moving SRIOV init to sriov_enable().
| commit afd24ece5c
| Author: Ram Pai <linuxram@us.ibm.com>
| PCI: delay configuration of SRIOV capability
| The SRIOV capability, namely page size and total_vfs of a device are
| configured during enumeration phase of the device. This can potentially
| interfere with the PCI operations of the platform, if the IOV capability
| of the device is not enabled.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The SRIOV capability, namely page size and total_vfs of a device are
configured during enumeration phase of the device. This can potentially
interfere with the PCI operations of the platform, if the IOV capability
of the device is not enabled.
The following patch postpones the configuration of the IOV capability of
the device to a later point, when the IOV capability is explicitly
enabled by the device driver.
The patch is tested on x86 and power platform.
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_block_user_cfg_access was designed for the use case that a single
context, the IPR driver, temporarily delays user space accesses to the
config space via sysfs. This assumption became invalid by the time
pci_dev_reset was added as locking instance. Today, if you run two loops
in parallel that reset the same device via sysfs, you end up with a
kernel BUG as pci_block_user_cfg_access detect the broken assumption.
This reworks the pci_block_user_cfg_access to a sleeping service
pci_cfg_access_lock and an atomic-compatible variant called
pci_cfg_access_trylock. The former not only blocks user space access as
before but also waits if access was already locked. The latter service
just returns false in this case, allowing the caller to resolve the
conflict instead of raising a BUG.
Adaptions of the ipr driver were originally written by Brian King.
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
All the PCI BARs of a device are enabled when the device is enabled
using pci_enable_device(). This unnecessarily enables SRIOV BARs of the
device.
On some platforms, which do not support SRIOV as yet, the
pci_enable_device() fails to enable the device if its SRIOV BARs are not
allocated resources correctly.
The following patch fixes the above problem. The SRIOV BARs are now
enabled when IOV capability of the device is enabled in sriov_enable().
NOTE: Note, there is subtle change in the pci_enable_device() API. Any
driver that depends on SRIOV BARS to be enabled in pci_enable_device()
can fail.
The patch has been touch tested on power and x86 platform.
Tested-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
They were implicitly getting it from device.h --> module.h but
we want to clean that up. So add the minimal header for these
macros.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
ATS does not depend on IOV support, so move the code into
its own file. This file will also include support for the
PRI and PASID capabilities later.
Also give ATS its own Kconfig variable to allow selecting it
without IOV support.
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch moves the relevant declarations from the local
header file in drivers/pci to a more accessible locations so
that it can be used by the AMD IOMMU driver too.
The file is named pci-ats.h because support for the PCI PRI
capability will also be added there in a later patch-set.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This fixes the prototype for both pci_resource_alignment() and
pci_sriov_resource_alignment().
Patch started as debugging effort from Cam Macdonell.
Cc: Cam Macdonell <cam@cs.ualberta.ca>
Cc: Avi Kivity <avi@redhat.com>
[chrisw: add iov bits]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>