The model for IOMMU passthrough is that decent devices that can cope
with DMA to all of memory get passthrough; crappy devices with a limited
dma_mask don't -- they get to use the IOMMU anyway.
This is done on the basis that IOMMU passthrough is usually wanted for
performance reasons, and it's only the decent PCI devices that you
really care about performance for, while the crappy 32-bit ones like
your USB controller can just use the IOMMU and you won't really care.
Unfortunately, the check for this was only looking at dev->dma_mask, not
at dev->coherent_dma_mask. And some devices have a 32-bit
coherent_dma_mask even though they have a full 64-bit dma_mask.
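As a rough illustration of the oversight, the two forms of the check can be sketched in plain C. This is not the intel-iommu code: struct fake_device, bit_mask(), and both helper functions are hypothetical names, and required_mask stands in for "all of memory".

```c
#include <stdint.h>

/* Hypothetical stand-in for the two DMA masks a device carries. */
struct fake_device {
    uint64_t dma_mask;           /* streaming DMA mask */
    uint64_t coherent_dma_mask;  /* coherent allocation mask */
};

/* Mask with the low 'bits' bits set; 64 or more yields all ones. */
uint64_t bit_mask(unsigned int bits)
{
    return bits >= 64 ? ~0ULL : ((1ULL << bits) - 1);
}

/* The check as described above: only the streaming mask is consulted. */
int passthrough_ok_streaming_only(const struct fake_device *dev,
                                  uint64_t required_mask)
{
    return dev->dma_mask >= required_mask;
}

/* A check that also looks at the coherent mask. */
int passthrough_ok_both_masks(const struct fake_device *dev,
                              uint64_t required_mask)
{
    return dev->dma_mask >= required_mask &&
           dev->coherent_dma_mask >= required_mask;
}
```

A device with a 64-bit dma_mask and a 32-bit coherent_dma_mask passes the first test but not the second.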
Even more unfortunately, fixing that simple oversight would upset
certain broken HP devices. Not only do they have a 32-bit
coherent_dma_mask, but they also have a tendency to do stray DMA to
unmapped addresses. And then they die when they take the DMA fault they
so richly deserve.
So if we do the 'correct' fix, it'll mean that affected users have to
disable IOMMU support completely on "a large percentage of servers from
a major vendor."
Personally, I have little sympathy -- given that this is the _same_
'major vendor' who is shipping machines which claim to have IOMMU
support but have obviously never _once_ booted a VT-d capable OS to do
any form of QA. But strictly speaking, it _would_ be a regression even
though it only ever worked by fluke.
For 2.6.33, we'll come up with a quirk which gives swiotlb support
for this particular device, and other devices with an inadequate
coherent_dma_mask will just get normal IOMMU mapping.
The simplest fix for 2.6.32, though, is just to jump through some hoops
to try to allocate coherent DMA memory for such devices in a place that
they can reach. We'd use dma_generic_alloc_coherent() for this if it
existed on IA64.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Chris Wright has some patches which let us fall back to swiotlb nicely
if IOMMU initialisation fails. But those are a bit much for 2.6.32.
Instead, let's shift the check for the biggest problem, the HP and Acer
BIOS bug which reports a DMAR at physical address zero. That one can
actually be checked much earlier -- before we even admit to having
detected an IOMMU in the first place. So the swiotlb init goes ahead as
we want.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This reverts commit 308cf8e13f. This
patch had trouble with transparent bridges, among other things. A more
readable and correct version should land in 2.6.33.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit d43c36dc6b ("headers: remove
sched.h from interrupt.h") left some build errors in some configurations
due to drivers having depended on getting header files "accidentally".
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Combined several one-liners from Ingo into one single patch - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.infradead.org/~dwmw2/iommu-2.6.32:
x86: Move pci_iommu_init to rootfs_initcall()
Run pci_apply_final_quirks() sooner.
Mark pci_apply_final_quirks() __init rather than __devinit
Rename pci_init() to pci_apply_final_quirks(), move it to quirks.c
intel-iommu: Yet another BIOS workaround: Isoch DMAR unit with no TLB space
intel-iommu: Decode (and ignore) RHSA entries
intel-iommu: Make "Unknown DMAR structure" message more informative
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: Prevent AER driver from being loaded on non-root port PCIE devices
PCI: get larger bridge ranges when space is available
PCI: pci.c: fix kernel-doc notation
PCI quirk: TI XIO200a erroneously reports support for fast b2b transfers
PCI PM: Read device power state from register after updating it
PCI: remove pci_assign_resource_fixed()
PCI: PCIe portdrv: remove "-driver" from driver name
Having this as a device_initcall() means that some real device drivers
can actually initialise _before_ the quirks are run, which is wrong.
We want it to run _before_ device_initcall(), but _after_ fs_initcall(),
since some arch-specific PCI initialisation like pcibios_assign_resources()
is done at fs_initcall().
We could use rootfs_initcall() but I actually want to use that for the
IOMMU initialisation, which has to come after the quirks, but still
before the real devices. So use fs_initcall_sync() instead -- since this
is entirely synchronous, it doesn't hurt that it'll escape the
synchronisation.
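The intended ordering can be modelled with a small user-space sketch. It only simulates the relative order of the four initcall levels mentioned here; the enum, the table, and run_order() are hypothetical and are not the kernel's initcall machinery.

```c
#include <stddef.h>
#include <string.h>

/* Only the levels discussed above, in the order the kernel runs them. */
enum level { LEVEL_FS = 0, LEVEL_FS_SYNC, LEVEL_ROOTFS, LEVEL_DEVICE, LEVEL_COUNT };

struct call {
    enum level level;
    const char *name;
};

/* Registration order is deliberately scrambled; the level decides. */
static const struct call calls[] = {
    { LEVEL_DEVICE,  "device drivers" },
    { LEVEL_ROOTFS,  "pci_iommu_init" },
    { LEVEL_FS_SYNC, "pci_apply_final_quirks" },
    { LEVEL_FS,      "pcibios_assign_resources" },
};

/* Write the names into out (size len > 0) in run order, joined by " -> ". */
void run_order(char *out, size_t len)
{
    out[0] = '\0';
    for (int lvl = 0; lvl < LEVEL_COUNT; lvl++) {
        for (size_t i = 0; i < sizeof(calls) / sizeof(calls[0]); i++) {
            if ((int)calls[i].level != lvl)
                continue;
            if (out[0])
                strncat(out, " -> ", len - strlen(out) - 1);
            strncat(out, calls[i].name, len - strlen(out) - 1);
        }
    }
}
```

The quirks land after the arch-specific resource assignment and before both the IOMMU initialisation and the real device drivers.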
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This function may have done more in the past, but all it does now is
apply the PCI_FIXUP_FINAL quirks. So name it sensibly and put it where
it belongs.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Now that m68k's task_thread_info() no longer refers to current,
it's possible to remove sched.h from interrupt.h without breaking m68k!
Many thanks to Heiko Carstens for allowing this.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
A bug was seen on boards using a PLX 8518 switch device which advertises
AER on each of its transparent bridges. The AER driver was loaded for
each bridge and this driver tried to access the AER source ID register
whenever an interrupt occurred on the shared PCI INTX lines. The source
ID register does not exist on non-root-port PCIE devices which
advertise AER, and trying to access this register causes an unsupported
request error on the bridge. Thus, when the next interrupt occurs,
another error is found and the nonexistent source ID register is
accessed again, and so it goes on.
The result is a spammed dmesg with unsupported request PCI express
errors on the bridge device that the AER driver is loaded against.
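The kind of port-type test this calls for might look like the sketch below. The field layout (Device/Port Type in bits 7:4 of the PCIe Capabilities register, root port encoded as 4) comes from the PCIe capability definition, not from this message; aer_should_bind() and pcie_port_type() are hypothetical names and not the actual driver change.

```c
#include <stdint.h>

#define EXP_FLAGS_TYPE       0x00f0u  /* Device/Port Type, bits 7:4 */
#define EXP_TYPE_ROOT_PORT   0x4u
#define EXP_TYPE_UPSTREAM    0x5u     /* upstream port of a switch */
#define EXP_TYPE_DOWNSTREAM  0x6u     /* downstream port of a switch */

/* Extract the Device/Port Type from the PCIe Capabilities register value. */
unsigned int pcie_port_type(uint16_t exp_flags)
{
    return (exp_flags & EXP_FLAGS_TYPE) >> 4;
}

/*
 * Bind only to root ports: switch ports may advertise AER too, but they
 * have no source ID register to read.
 */
int aer_should_bind(uint16_t exp_flags, int advertises_aer)
{
    return advertises_aer &&
           pcie_port_type(exp_flags) == EXP_TYPE_ROOT_PORT;
}
```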
Reported-by: Malcolm Crossley <malcolm.crossley2@gefanuc.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Tested-by: Malcolm Crossley <malcolm.crossley2@gefanuc.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Found on one system:
[ 71.120590] pci 0000:40:05.0: scanning behind bridge, config 4f4a40, pass 0
[ 71.138283] PCI: Scanning bus 0000:4a
[ 71.140341] pci 0000:4a:00.0: found [15b3:6278] class 000c06 header type 00
[ 71.157173] pci 0000:4a:00.0: reg 10 64bit mmio: [0x000000-0x0fffff]
[ 71.161697] pci 0000:4a:00.0: reg 18 64bit mmio pref: [0x000000-0x7fffff]
[ 71.179403] pci 0000:4a:00.0: reg 20 64bit mmio pref: [0x000000-0xfffffff]
[ 71.185366] pci 0000:4a:00.0: calling quirk_resource_alignment+0x0/0x1dd
[ 71.200846] pci 0000:4a:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force'
[ 71.219623] PCI: Fixups for bus 0000:4a
[ 71.222194] pci 0000:40:05.0: bridge 32bit mmio: [0xcf000000-0xcf0fffff]
[ 71.238662] pci 0000:40:05.0: bridge 64bit mmio pref: [0xcd800000-0xcdffffff]
[ 71.255793] PCI: Bus scan for 0000:4a returning with max=4a
The device needs a big prefetchable MMIO range, but the BIOS allocates
only a small MMIO range for it. Later, the kernel cannot allocate that
resource to the device:
[ 99.574030] pci 0000:4a:00.0: BAR 4: can't allocate mem resource [0xd0000000-0xcdffffff]
[ 99.580102] pci 0000:4a:00.0: BAR 2: got res [0xcd800000-0xcdffffff] bus [0xcd800000-0xcdffffff] flags 0x12120c
[ 99.602307] pci 0000:4a:00.0: BAR 2: moved to bus [0xcd800000-0xcdffffff] flags 0x12120c
[ 99.615991] pci 0000:4a:00.0: BAR 0: got res [0xcf000000-0xcf0fffff] bus [0xcf000000-0xcf0fffff] flags 0x120204
[ 99.634499] pci 0000:4a:00.0: BAR 0: moved to bus [0xcf000000-0xcf0fffff] flags 0x120204
[ 99.654318] pci 0000:40:05.0: PCI bridge, secondary bus 0000:4a
[ 99.658766] pci 0000:40:05.0: IO window: disabled
[ 99.675478] pci 0000:40:05.0: MEM window: 0xcf000000-0xcf0fffff
[ 99.681663] pci 0000:40:05.0: PREFETCH window: 0x000000cd800000-0x000000cdffffff
So try to get a big range in the pci bridge if there is no child using
that range. With the patch we get:
[ 99.104525] pci 0000:4a:00.0: BAR 4: got res [0xfc080000000-0xfc08fffffff] bus [0xfc080000000-0xfc08fffffff] flags 0x12120c
[ 99.123624] pci 0000:4a:00.0: BAR 4: moved to bus [0xfc080000000-0xfc08fffffff] flags 0x12120c
[ 99.131977] pci 0000:4a:00.0: BAR 2: got res [0xfc090000000-0xfc0907fffff] bus [0xfc090000000-0xfc0907fffff] flags 0x12120c
[ 99.149788] pci 0000:4a:00.0: BAR 2: moved to bus [0xfc090000000-0xfc0907fffff] flags 0x12120c
[ 99.169248] pci 0000:4a:00.0: BAR 0: got res [0xc0200000-0xc02fffff] bus [0xc0200000-0xc02fffff] flags 0x120204
[ 99.189508] pci 0000:4a:00.0: BAR 0: moved to bus [0xc0200000-0xc02fffff] flags 0x120204
[ 99.206402] pci 0000:40:05.0: PCI bridge, secondary bus 0000:4a
[ 99.210637] pci 0000:40:05.0: IO window: disabled
[ 99.224856] pci 0000:40:05.0: MEM window: 0xc0200000-0xc03fffff
[ 99.230019] pci 0000:40:05.0: PREFETCH window: 0x000fc080000000-0x000fc097ffffff
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This quirk will disable fast back to back transfer on the secondary bus
segment of the TI Bridge.
Signed-off-by: Gabe Black <gabe.black@ni.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After attempting to change the power state of a PCI device,
pci_raw_set_power_state() doesn't check if the value it wrote into
the device's PCI_PM_CTRL register has been stored in there, but
unconditionally modifies the device's current_state field to reflect
the change. This may cause problems if the power state of the device
hasn't in fact changed, because the PCI PM core will then act on a
wrong assumption.
To prevent such situations, modify
pci_raw_set_power_state() so that it reads the device's PCI_PM_CTRL
register after writing into it and uses the value read from the
register to update the device's current_state field. Also make it
print a message saying that the device refused to change its power
state as requested (returning an error code in such cases would cause
suspend regressions to appear on some systems, where device drivers'
suspend routines return error codes if pci_set_power_state() fails).
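The read-back pattern can be sketched against a simulated register. Everything here is illustrative: struct fake_dev and the fake_* accessors are hypothetical stand-ins for a PCI device and its config space, and only the low two state bits of the register are modelled.

```c
#include <stdint.h>
#include <stdio.h>

#define PM_CTRL_STATE_MASK 0x0003u  /* low two bits select D0..D3hot */

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct fake_dev {
    uint16_t pm_ctrl;            /* simulated PCI_PM_CTRL register */
    int ignores_writes;          /* simulate a device that refuses the change */
    unsigned int current_state;  /* what the PM core believes */
};

static void fake_write_pm_ctrl(struct fake_dev *dev, uint16_t val)
{
    if (!dev->ignores_writes)
        dev->pm_ctrl = val;
}

static uint16_t fake_read_pm_ctrl(const struct fake_dev *dev)
{
    return dev->pm_ctrl;
}

int set_power_state(struct fake_dev *dev, unsigned int state)
{
    uint16_t pmcsr = fake_read_pm_ctrl(dev);

    pmcsr = (uint16_t)((pmcsr & ~PM_CTRL_STATE_MASK) |
                       (state & PM_CTRL_STATE_MASK));
    fake_write_pm_ctrl(dev, pmcsr);

    /* Trust what the register says, not what we meant to write. */
    pmcsr = fake_read_pm_ctrl(dev);
    dev->current_state = pmcsr & PM_CTRL_STATE_MASK;
    if (dev->current_state != state)
        fprintf(stderr, "refused to change power state, currently in D%u\n",
                dev->current_state);

    return 0;  /* a message only, no error code, per the reasoning above */
}
```

With a device that ignores the write, current_state stays at the value the register actually holds instead of the requested one.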
Reviewed-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Adrian commented out this function in 2baad5f96b, but I don't think
it's even worth cluttering the file with the unused code.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
No need to include "-driver" in the driver name.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
CC: Tom Long Nguyen <tom.l.nguyen@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Asus decided to ship a BIOS which configures sound DMA to go via the
dedicated IOMMU unit, but assigns precisely zero TLB entries to that
unit. Which causes the whole thing to deadlock, including the DMA
traffic on the _other_ IOMMU units. Nice one.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Commit 15b8dd53f5 changed info->hardware_id from a static array to
a pointer. If hardware_id is non-NULL, it points to a NULL-terminated
string, so we don't need to terminate it explicitly. However, it may
be NULL; in that case, we *can't* add a NULL terminator.
This causes a NULL pointer dereference oops for devices without _HID.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
CC: Lin Ming <ming.m.lin@intel.com>
CC: Bob Moore <robert.moore@intel.com>
CC: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
I recently got a system where the DMAR table included a couple of RHSA
(remapping hardware static affinity) entries. Rather than printing a
message about an "Unknown DMAR structure," it would probably be more
useful to dump the RHSA structure (as other DMAR structures are dumped).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We might as well print the type of the DMAR structure we don't know how
to handle when skipping it. Then someone getting this message has a
chance of telling whether the structure is just bogus, or if there
really is something valid that the kernel doesn't know how to handle.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* git://git.infradead.org/iommu-2.6: (23 commits)
intel-iommu: Disable PMRs after we enable translation, not before
intel-iommu: Kill DMAR_BROKEN_GFX_WA option.
intel-iommu: Fix integer wrap on 32 bit kernels
intel-iommu: Fix integer overflow in dma_pte_{clear_range,free_pagetable}()
intel-iommu: Limit DOMAIN_MAX_PFN to fit in an 'unsigned long'
intel-iommu: Fix kernel hang if interrupt remapping disabled in BIOS
intel-iommu: Disallow interrupt remapping if not all ioapics covered
intel-iommu: include linux/dmi.h to use dmi_ routines
pci/dmar: correct off-by-one error in dmar_fault()
intel-iommu: Cope with yet another BIOS screwup causing crashes
intel-iommu: iommu init error path bug fixes
intel-iommu: Mark functions with __init
USB: Work around BIOS bugs by quiescing USB controllers earlier
ia64: IOMMU passthrough mode shouldn't trigger swiotlb init
intel-iommu: make domain_add_dev_info() call domain_context_mapping()
intel-iommu: Unify hardware and software passthrough support
intel-iommu: Cope with broken HP DC7900 BIOS
iommu=pt is a valid early param
intel-iommu: double kfree()
intel-iommu: Kill pointless intel_unmap_single() function
...
Fixed up trivial include lines conflict in drivers/pci/intel-iommu.c
The following 64 bit promotions are necessary to handle memory above the
4GiB boundary correctly.
[dwmw2: Fix the second part not to need 64-bit arithmetic at all]
Signed-off-by: Benjamin LaHaise <ben.lahaise@neterion.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
If end_pfn is equal to (unsigned long)-1, then the loop will never end.
Seen on 32-bit kernel, but could have happened on 64-bit too once we get
hardware that supports 64-bit guest addresses.
Change both functions to a 'do {} while' loop with the test at the end,
and check for the PFN having wrapped round to zero.
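The loop shape can be sketched in plain C as follows. This is an illustration only, not the driver code: count_pfns() is a hypothetical helper whose body merely counts iterations.

```c
#include <limits.h>

/*
 * Hypothetical helper: visit every PFN in [start_pfn, end_pfn] inclusive
 * and return how many were visited. Callers are assumed to pass
 * start_pfn <= end_pfn.
 *
 * A 'for (pfn = start_pfn; pfn <= end_pfn; pfn++)' loop never terminates
 * when end_pfn == ULONG_MAX: pfn wraps to 0, which is still <= end_pfn.
 * Testing at the bottom and checking for the wrap to zero avoids that.
 */
unsigned long count_pfns(unsigned long start_pfn, unsigned long end_pfn)
{
    unsigned long pfn = start_pfn;
    unsigned long visited = 0;

    do {
        visited++;  /* real code would clear or free the PTE here */
        pfn++;
    } while (pfn && pfn <= end_pfn);

    return visited;
}
```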
Reported-by: Benjamin LaHaise <ben.lahaise@neterion.com>
Tested-by: Benjamin LaHaise <ben.lahaise@neterion.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This means we're limited to 44-bit addresses on 32-bit kernels, and
makes it sane for us to use 'unsigned long' for PFNs throughout.
Which is just as well, really, since we already do that.
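The 44-bit figure follows from a 32-bit PFN plus a 12-bit page offset. A small sketch of the clamping, assuming 4KiB pages; the helper names and the pfn_bits parameter are hypothetical and exist only so that the 32-bit case can be evaluated on any host:

```c
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12  /* 4KiB pages; an assumption of this sketch */

/* Highest PFN for an address width of gaw bits (12 < gaw <= 64). */
uint64_t max_pfn_for_width(unsigned int gaw)
{
    return (((uint64_t)1) << (gaw - PAGE_SHIFT_ASSUMED)) - 1;
}

/* Clamp that PFN to what fits in a PFN type pfn_bits wide (32 or 64). */
uint64_t clamp_max_pfn(unsigned int gaw, unsigned int pfn_bits)
{
    uint64_t limit = (pfn_bits >= 64) ? UINT64_MAX
                                      : ((((uint64_t)1) << pfn_bits) - 1);
    uint64_t max_pfn = max_pfn_for_width(gaw);

    return max_pfn < limit ? max_pfn : limit;
}
```

With a 32-bit PFN type, a 48-bit address width clamps to PFN 0xffffffff, the same value a 44-bit width gives unclamped.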
Reported-by: Benjamin LaHaise <ben.lahaise@neterion.com>
Tested-by: Benjamin LaHaise <ben.lahaise@neterion.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Since the slot_cap field in struct controller contains the physical slot
number, we don't need the number field in struct slot.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
struct hpc_ops appears to be a set of hooks to controller-specific
routines, but it is meaningless because no hotplug controller driver
follows this framework.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since we have a pointer to pcie_device in struct controller, we don't
need a pointer to pci_dev.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The crit_sect mutex defined in struct controller is to serialize
hot-plug operations against multiple slots under the same bus. But,
since a PCIe downstream port has at most one slot, it is
meaningless and we don't need it.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The slot number can be calculated only by physical slot number field
in the slot capabilities register. So the first_slot field in struct
controller is meaningless and we don't need it.
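For reference, extracting the slot number from a Slot Capabilities value could look like this. The field position (Physical Slot Number in bits 31:19) comes from the PCIe Slot Capabilities register layout rather than from this message, and physical_slot_number() is a hypothetical name.

```c
#include <stdint.h>

#define SLTCAP_PSN_MASK   0xfff80000u  /* Physical Slot Number, bits 31:19 */
#define SLTCAP_PSN_SHIFT  19

/* Return the physical slot number encoded in a Slot Capabilities value. */
unsigned int physical_slot_number(uint32_t slot_cap)
{
    return (slot_cap & SLTCAP_PSN_MASK) >> SLTCAP_PSN_SHIFT;
}
```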
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since the device number of the hot-slot under the PCIe downstream port
is always 0, the slot_device_offset field in the slot is meaningless
and we don't need it.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The hp_slot field is to identify the slot under the same
controller. But, since a PCIe downstream port has at most one slot,
it is meaningless and we don't need it.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The device field in the struct slot is not necessary because it is
always 0 in pciehp driver.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The bus field in struct slot is not necessary.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The slot_num_inc field in struct controller is unused and meaningless
in pciehp driver.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since a PCIe downstream port has at most one slot, we don't need the
num_slots field in struct controller. Note that struct controller
itself doesn't exist if the PCIe downstream port has no slot.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since a PCIe downstream port has at most one slot, we don't need the
'slot_list' linked list to manage multiple slots under the port.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When booting with pci=nomsi, AER causes lost interrupts and
lockdep inversions.
So check that MSIs are not disabled before initializing the AER
driver.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The definition of the ASPM support field in the Link Capabilities
Register had been changed by the "ASPM optionality ECN" as follows:
  <Before>
    00b  Reserved
    01b  L0s Supported
    10b  Reserved
    11b  L0s and L1 Supported
  <After>
    00b  No ASPM Support
    01b  L0s Supported
    10b  L1 Supported
    11b  L0s and L1 Supported
The current Linux ASPM driver doesn't enable ASPM if the support field
is 00b or 10b, so the new meaning of 00b changes nothing. But it also
means the driver doesn't enable L1 if the support field is 10b. With
this patch, 10b (L1 Supported) is handled properly.
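Under the post-ECN encoding each bit of the field can be tested on its own, which a decoder might do roughly as below. The field position (bits 11:10 of the Link Capabilities register) comes from the PCIe register layout rather than from this message, and the helper names are hypothetical.

```c
#include <stdint.h>

#define LNKCAP_ASPMS_MASK   0x00000c00u  /* ASPM Support, bits 11:10 */
#define LNKCAP_ASPMS_SHIFT  10

#define ASPM_L0S  0x1u  /* bit 0 of the field: L0s supported */
#define ASPM_L1   0x2u  /* bit 1 of the field: L1 supported */

/* Extract the two-bit ASPM Support field from a Link Capabilities value. */
unsigned int aspm_support(uint32_t lnkcap)
{
    return (lnkcap & LNKCAP_ASPMS_MASK) >> LNKCAP_ASPMS_SHIFT;
}

int supports_l0s(uint32_t lnkcap)
{
    return (aspm_support(lnkcap) & ASPM_L0S) != 0;
}

/* True for both 10b and 11b, so an L1-only link is no longer skipped. */
int supports_l1(uint32_t lnkcap)
{
    return (aspm_support(lnkcap) & ASPM_L1) != 0;
}
```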
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
PCI hotplug: clean up acpi_run_hpp()
PCI hotplug: acpiphp: use generic pci_configure_slot()
PCI hotplug: shpchp: use generic pci_configure_slot()
PCI hotplug: pciehp: use generic pci_configure_slot()
PCI hotplug: add pci_configure_slot()
PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
PCI: Clear saved_state after the state has been restored
PCI PM: Return error codes from pci_pm_resume()
PCI: use dev_printk in quirk messages
PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
PCI Hotplug: acpiphp: find bridges the easy way
PCI: pcie portdrv: remove unused variable
PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
ACPI PM: Replace wakeup.prepared with reference counter
PCI PM: Introduce device flag wakeup_prepared
PCI / ACPI PM: Rework some debug messages
PCI PM: Simplify PCI wake-up code
...
Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases. The
'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.