This seems to be required on some X58 chipsets on systems
with more than one IOMMU. QI does not work until it is
enabled on all IOMMUs in the system.
Reported-by: Dheeraj CVR <cvr.dheeraj@gmail.com>
Tested-by: Dheeraj CVR <cvr.dheeraj@gmail.com>
Fixes: 5f0a7f7614 ('iommu/vt-d: Make root entry visible for hardware right after allocation')
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In commit <8bf478163e69> ("iommu/vt-d: Split up iommu->domains array"), it
it splits iommu->domains in two levels. Each first level contains 256
entries of second level. In case of the ndomains is exact a multiple of
256, it would have one more extra first level entry for current
implementation.
This patch refines this calculation to reduce the extra first level entry.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull intel IOMMU updates from David Woodhouse:
"This patchset improves the scalability of the Intel IOMMU code by
resolving two spinlock bottlenecks and eliminating the linearity of
the IOVA allocator, yielding up to ~5x performance improvement and
approaching 'iommu=off' performance"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: Use per-cpu IOVA caching
iommu/iova: introduce per-cpu caching to iova allocation
iommu/vt-d: change intel-iommu to use IOVA frame numbers
iommu/vt-d: avoid dev iotlb logic for domains with no dev iotlbs
iommu/vt-d: only unmap mapped entries
iommu/vt-d: correct flush_unmaps pfn usage
iommu/vt-d: per-cpu deferred invalidation queues
iommu/vt-d: refactoring of deferred flush entries
Commit 9257b4a2 ('iommu/iova: introduce per-cpu caching to iova allocation')
introduced per-CPU IOVA caches to massively improve scalability. Use them.
Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased, cleaned up and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
[dwmw2: split out VT-d part into a separate patch]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Make intel-iommu map/unmap/invalidate work with IOVA pfns instead of
pointers to "struct iova". This avoids using the iova struct from the IOVA
red-black tree and the resulting explicit find_iova() on unmap.
This patch will allow us to cache IOVAs in the next patch, in order to
avoid rbtree operations for the majority of map/unmap operations.
Note: In eliminating the find_iova() operation, we have also eliminated
the sanity check previously done in the unmap flow. Arguably, this was
overhead that is better avoided in production code, but it could be
brought back as a debug option for driver development.
Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased, fixed to not break iova api, and reworded
the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch avoids taking the device_domain_lock in iommu_flush_dev_iotlb()
for domains with no dev iotlb devices.
Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[gvdl@google.com: fixed locking issues]
Signed-off-by: Godfrey van der Linden <gvdl@google.com>
[mad@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Current unmap implementation unmaps the entire area covered by the IOVA
range, which is a power-of-2 aligned region. The corresponding map,
however, only maps those pages originally mapped by the user. This
discrepancy can lead to unmapping of already unmapped entries, which is
unneeded work.
With this patch, only mapped pages are unmapped. This is also a baseline
for a map/unmap implementation based on IOVAs and not iova structures,
which will allow caching.
Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Change flush_unmaps() to correctly pass iommu_flush_iotlb_psi()
dma addresses. (x86_64 mm and dma have the same size for pages
at the moment, but this usage improves consistency.)
Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The IOMMU's IOTLB invalidation is a costly process. When iommu mode
is not set to "strict", it is done asynchronously. Current code
amortizes the cost of invalidating IOTLB entries by batching all the
invalidations in the system and performing a single global invalidation
instead. The code queues pending invalidations in a global queue that
is accessed under the global "async_umap_flush_lock" spinlock, which
can result is significant spinlock contention.
This patch splits this deferred queue into multiple per-cpu deferred
queues, and thus gets rid of the "async_umap_flush_lock" and its
contention. To keep existing deferred invalidation behavior, it still
invalidates the pending invalidations of all CPUs whenever a CPU
reaches its watermark or a timeout occurs.
Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased, cleaned up and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Currently, deferred flushes' info is striped between several lists in
the flush tables. Instead, move all information about a specific flush
to a single entry in this table.
This patch does not introduce any functional change.
Signed-off-by: Omer Peleg <omer@cs.technion.ac.il>
[mad@cs.technion.ac.il: rebased and reworded the commit message]
Signed-off-by: Adam Morrison <mad@cs.technion.ac.il>
Reviewed-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ben Serebrin <serebrin@google.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
My static checker complains that "dma_alias" is uninitialized unless we
are dealing with a pci device. This is true but harmless. Anyway, we
can flip the condition around to silence the warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
dma_pte_free_pagetable no longer depends on last level ptes
being clear, it clears them itself. Fix up the comment to
match.
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In the PCI hotplug path of the Intel IOMMU driver, replace
the usage of the BUS_NOTIFY_DEL_DEVICE notifier, which is
executed before the driver is unbound from the device, with
BUS_NOTIFY_REMOVED_DEVICE, which runs after that.
This fixes a kernel BUG being triggered in the VT-d code
when the device driver tries to unmap DMA buffers and the
VT-d driver already destroyed all mappings.
Reported-by: Stefani Seibold <stefani@seibold.net>
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
commit db0fa0cb01 "scatterlist: use sg_phys()" did replacements of
the form:
phys_addr_t phys = page_to_phys(sg_page(s));
phys_addr_t phys = sg_phys(s) & PAGE_MASK;
However, this breaks platforms where sizeof(phys_addr_t) >
sizeof(unsigned long). Revert for 4.3 and 4.4 to make room for a
combined helper in 4.5.
Cc: <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: db0fa0cb01 ("scatterlist: use sg_phys()")
Suggested-by: Joerg Roedel <joro@8bytes.org>
Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This time including:
* A new IOMMU driver for s390 pci devices
* Common dma-ops support based on iommu-api for ARM64. The plan is to
use this as a basis for ARM32 and hopefully other architectures as
well in the future.
* MSI support for ARM-SMMUv3
* Cleanups and dead code removal in the AMD IOMMU driver
* Better RMRR handling for the Intel VT-d driver
* Various other cleanups and small fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWOz7hAAoJECvwRC2XARrjbvYQALwtITTA5iTm0y/ApwNMxI7n
pZpjZVPoBPNsGBc4t/MT8pVhUSdmpBOljbV4Y4CayL1mSSB6Bl2gooZjd66m7Z81
qMJYEVWhFQqVsIKkCSNOgaO7W5y+xt3rTgqN6vCu86/CCDfKrTPP/+CRl1T/z9bo
1J8ioM3KnZG9KzG8JuXYFg5wwbKToaBh6swSmj+O4U9hru7zV/ILP7ikcc9pyMji
12WbzCqchRORsJZD65xMRYAqRaPNN/3IlDejs00TOFhY3qpWgEgFUucyeRJBJ/+q
K4U8T5vZsnr1a04l7/BeYbLmP7y/9Qv0N0xMGtTyoy/w/BieGqRWu4hHhqf/44NO
EhCSXcEThMNCGTjP2VWC4dnQ/s7Y8OmSW9nCreUcFVxHoE5LfDoh8RngA2fpeNuS
ixb3OwP+YXHN9Ck+1BQqQCeBznsPTLuDxlhRjCJsWntIfMSkXebOkz83YxyZ9b0Q
gFvptfuknU7cotUwWa3dg8RiUB8kNlKJyEEByaVpWEbEOabnONKEMkstvuBx6Ots
kA63wbe7QcPgbUYuq7g0nijDw6E2aEtf0nx2Xx4ZDL932qjg/xUkiBpmbDXHw4Gu
nimNXVQtbCzF74SyTvxEtupiijOTm5eHtoKtg0mYnqPZ+V9eOwEvW8IHaFFf8XHD
SecikoTtH1Q4RVtqOcAQ
=jLlB
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"This time including:
- A new IOMMU driver for s390 pci devices
- Common dma-ops support based on iommu-api for ARM64. The plan is
to use this as a basis for ARM32 and hopefully other architectures
as well in the future.
- MSI support for ARM-SMMUv3
- Cleanups and dead code removal in the AMD IOMMU driver
- Better RMRR handling for the Intel VT-d driver
- Various other cleanups and small fixes"
* tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
iommu/vt-d: Fix return value check of parse_ioapics_under_ir()
iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope()
iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir
iommu: Move default domain allocation to iommu_group_get_for_dev()
iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev
iommu/arm-smmu: Switch to device_group call-back
iommu/fsl: Convert to device_group call-back
iommu: Add device_group call-back to x86 iommu drivers
iommu: Add generic_device_group() function
iommu: Export and rename iommu_group_get_for_pci_dev()
iommu: Revive device_group iommu-ops call-back
iommu/amd: Remove find_last_devid_on_pci()
iommu/amd: Remove first/last_device handling
iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL
iommu/amd: Cleanup buffer allocation
iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu
iommu/amd: Align DTE flag definitions
iommu/amd: Remove old alias handling code
iommu/amd: Set alias DTE in do_attach/do_detach
iommu/amd: WARN when __[attach|detach]_device are called with irqs enabled
...
Pull intel iommu updates from David Woodhouse:
"This adds "Shared Virtual Memory" (aka PASID support) for the Intel
IOMMU. This allows devices to do DMA using process address space,
translated through the normal CPU page tables for the relevant mm.
With corresponding support added to the i915 driver, this has been
tested with the graphics device on Skylake. We don't have the
required TLP support in our PCIe root ports for supporting discrete
devices yet, so it's only integrated devices that can do it so far"
* git://git.infradead.org/intel-iommu: (23 commits)
iommu/vt-d: Fix rwxp flags in SVM device fault callback
iommu/vt-d: Expose struct svm_dev_ops without CONFIG_INTEL_IOMMU_SVM
iommu/vt-d: Clean up pasid_enabled() and ecs_enabled() dependencies
iommu/vt-d: Handle Caching Mode implementations of SVM
iommu/vt-d: Fix SVM IOTLB flush handling
iommu/vt-d: Use dev_err(..) in intel_svm_device_to_iommu(..)
iommu/vt-d: fix a loop in prq_event_thread()
iommu/vt-d: Fix IOTLB flushing for global pages
iommu/vt-d: Fix address shifting in page request handler
iommu/vt-d: shift wrapping bug in prq_event_thread()
iommu/vt-d: Fix NULL pointer dereference in page request error case
iommu/vt-d: Implement SVM_FLAG_SUPERVISOR_MODE for kernel access
iommu/vt-d: Implement SVM_FLAG_PRIVATE_PASID to allocate unique PASIDs
iommu/vt-d: Add callback to device driver on page faults
iommu/vt-d: Implement page request handling
iommu/vt-d: Generalise DMAR MSI setup to allow for page request events
iommu/vt-d: Implement deferred invalidate for SVM
iommu/vt-d: Add basic SVM PASID support
iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS
iommu/vt-d: Add initial support for PASID tables
...
When booted with intel_iommu=ecs_off we were still allocating the PASID
tables even though we couldn't actually use them. We really want to make
the pasid_enabled() macro depend on ecs_enabled().
Which is unfortunate, because currently they're the other way round to
cope with the Broadwell/Skylake problems with ECS.
Instead of having ecs_enabled() depend on pasid_enabled(), which was never
something that made me happy anyway, make it depend in the normal case
on the "broken PASID" bit 28 *not* being set.
Then pasid_enabled() can depend on ecs_enabled() as it should. And we also
don't need to mess with it if we ever see an implementation that has some
features requiring ECS (like PRI) but which *doesn't* have PASID support.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Pull intel-iommu bugfix from David Woodhouse:
"This contains a single fix, for when the IOMMU API is used to overlay
an existing mapping comprised of 4KiB pages, with a mapping that can
use superpages.
For the *first* superpage in the new mapping, we were correctly¹
freeing the old bottom-level page table page and clearing the link to
it, before installing the superpage. For subsequent superpages,
however, we weren't. This causes a memory leak, and a warning about
setting a PTE which is already set.
¹ Well, not *entirely* correctly. We just free the page table pages
right there and then, which is wrong. In fact they should only be
freed *after* the IOTLB is flushed so we know the hardware will no
longer be looking at them.... and in fact I note that the IOTLB
flush is completely missing from the intel_iommu_map() code path,
although it needs to be there if it's permitted to overwrite
existing mappings.
Fixing those is somewhat more intrusive though, and will probably
need to wait for 4.4 at this point"
* tag 'for-linus-20151021' of git://git.infradead.org/intel-iommu:
iommu/vt-d: fix range computation when making room for large pages
This will give a little bit of assistance to those developing drivers
using SVM. It might cause a slight annoyance to end-users whose kernel
disables the IOMMU when drivers are trying to use it. But the fix there
is to fix the kernel to enable the IOMMU.
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This provides basic PASID support for endpoint devices, tested with a
version of the i915 driver.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The behaviour if you enable PASID support after ATS is undefined. So we
have to enable it first, even if we don't know whether we'll need it.
This is safe enough; unless we set up a context that permits it, the device
can't actually *do* anything with it.
Also shift the feature detction to dmar_insert_one_dev_info() as it only
needs to happen once.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
As long as we use an identity mapping to work around the worst of the
hardware bugs which caused us to defeature it and change the definition
of the capability bit, we *can* use PASID support on the devices which
advertised it in bit 28 of the Extended Capability Register.
Allow people to do so with 'intel_iommu=pasid28' on the command line.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The VT-d specification says that "Software must enable ATS on endpoint
devices behind a Root Port only if the Root Port is reported as
supporting ATS transactions."
We walk up the tree to find a Root Port, but for integrated devices we
don't find one — we get to the host bridge. In that case we *should*
allow ATS. Currently we don't, which means that we are incorrectly
failing to use ATS for the integrated graphics. Fix that.
We should never break out of this loop "naturally" with bus==NULL,
since we'll always find bridge==NULL in that case (and now return 1).
So remove the check for (!bridge) after the loop, since it can never
happen. If it did, it would be worthy of a BUG_ON(!bridge). But since
it'll oops anyway in that case, that'll do just as well.
Cc: stable@vger.kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In preparation for deprecating ioremap_cache() convert its usage in
intel-iommu to memremap. This also eliminates the mishandling of the
__iomem annotation in the implementation.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In preparation for the installation of a large page, any small page
tables that may still exist in the target IOV address range are
removed. However, if a scatter/gather list entry is large enough to
fit more than one large page, the address space for any subsequent
large pages is not cleared of conflicting small page tables.
This can cause legitimate mapping requests to fail with errors of the
form below, potentially followed by a series of IOMMU faults:
ERROR: DMA PTE for vPFN 0xfde00 already set (to 7f83a4003 not 7e9e00083)
In this example, a 4MiB scatter/gather list entry resulted in the
successful installation of a large page @ vPFN 0xfdc00, followed by
a failed attempt to install another large page @ vPFN 0xfde00, due to
the presence of a pointer to a small page table @ 0x7f83a4000.
To address this problem, compute the number of large pages that fit
into a given scatter/gather list entry, and use it to derive the
last vPFN covered by the large page(s).
Cc: stable@vger.kernel.org
Signed-off-by: Christian Zander <christian@nervanasys.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
A few fixes piled up:
* Fix for a suspend/resume issue where PCI probing code overwrote
dev->irq for the MSI irq of the AMD IOMMU.
* Fix for a kernel crash when a 32 bit PCI device was assigned to a KVM
guest.
* Fix for a possible memory leak in the VT-d driver
* A couple of fixes for the ARM-SMMU driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWHNdbAAoJECvwRC2XARrjB/YQAKouJaRMjBaehx6kbaZMhMJy
hXDsh8Xl6TtCe6kLD2uXrvjLZAdu32kjrtzhhcM21EO5Ms2Weq6A60/98LwnJ4Eg
AqftjfxQsIwf2G1PvHb+xepgcFxIAhW6a3nORzx6d2AGrNWmMtUhbLTSncYjmojf
Td4dscuRmRPenJUV1JhcJQBR62QonknIHV99QmevaCSAoUdyuMH+t5kQVEgPjx7C
GlMPNEZZmGl7J3NXSWRtDSkUxFZ1OU8MTKc1LmPPHHAOZk37wbePihQbLLySlHPH
v4G1R05e2hG7C66yu959fyOleL87lDToUXhwQNFJMqEc+e7IzBzZsB3ANEHjpLQH
UJC9COU+sf8mPafja4ge/KbyGDmgDg/OMQJDhU6+DSXUflwymeWJmXr7sLFQex6O
nZO/SVzkbKj+PKxV7UnGD0sTeAAk0X6vfhFCL0l/acPpQg0T6Fpky5D5fUMv5dWS
xxxvxfwBcDoI44fxWBhfPYvmLFT9f5da+bpbzeeGjVSNezOkPJ65AJcVk5An4kQu
PRzJGoq3XpZHOeg5+O7IKzeuJ+3qc7Tz4wAzMxcaNFpVBl2qp1RUkTbmS9/YV1b5
ZOcIFBMLuUROE1ExsU19c5Uo0j1Bvh9jtdy6lNFCagQYzihtA0Jk19ucllx1jIjD
sdv2hgDIauRToKF1d9xz
=v5G4
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"A few fixes piled up:
- Fix for a suspend/resume issue where PCI probing code overwrote
dev->irq for the MSI irq of the AMD IOMMU.
- Fix for a kernel crash when a 32 bit PCI device was assigned to a
KVM guest.
- Fix for a possible memory leak in the VT-d driver
- A couple of fixes for the ARM-SMMU driver"
* tag 'iommu-fixes-v4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix NULL pointer deref on device detach
iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices
iommu/vt-d: Fix memory leak in dmar_insert_one_dev_info()
iommu/arm-smmu: Use correct address mask for CMD_TLBI_S2_IPA
iommu/arm-smmu: Ensure IAS is set correctly for AArch32-capable SMMUs
iommu/io-pgtable-arm: Don't use dma_to_phys()
Currently the RMRR entries are created only at boot time.
This means they will vanish when the domain allocated at
boot time is destroyed.
This patch makes sure that also newly allocated domains will
get RMRR mappings.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Split the part of the function that fetches the domain out
and put the rest into into a domain_prepare_identity_map, so
that the code can also be used with when the domain is
already known.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull IOVA fixes from David Woodhouse:
"The main fix here is the first one, fixing the over-allocation of
size-aligned requests. The other patches simply make the existing
IOVA code available to users other than the Intel VT-d driver, with no
functional change.
I concede the latter really *should* have been submitted during the
merge window, but since it's basically risk-free and people are
waiting to build on top of it and it's my fault I didn't get it in, I
(and they) would be grateful if you'd take it"
* git://git.infradead.org/intel-iommu:
iommu: Make the iova library a module
iommu: iova: Export symbols
iommu: iova: Move iova cache management to the iova library
iommu/iova: Avoid over-allocating when size-aligned
We are returning NULL if we are not able to attach the iommu
to the domain but while returning we missed freeing info.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This time the IOMMU updates are mostly cleanups or fixes. No big new
features or drivers this time. In particular the changes include:
* Bigger cleanup of the Domain<->IOMMU data structures and the
code that manages them in the Intel VT-d driver. This makes
the code easier to understand and maintain, and also easier to
keep the data structures in sync. It is also a preparation
step to make use of default domains from the IOMMU core in the
Intel VT-d driver.
* Fixes for a couple of DMA-API misuses in ARM IOMMU drivers,
namely in the ARM and Tegra SMMU drivers.
* Fix for a potential buffer overflow in the OMAP iommu driver's
debug code
* A couple of smaller fixes and cleanups in various drivers
* One small new feature: Report domain-id usage in the Intel
VT-d driver to easier detect bugs where these are leaked.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJV7sCEAAoJECvwRC2XARrjz3YP/Au4IIfqykfPvmI0cmPhVnAV
Q72tltwkbK2u2iP+pHheveaMngJtAshsZrnhBon4KJRIt/KTLZQvsFplHDaRhPfY
yw3LIxhC5kLG/S6irY9Ozb0+uTMdQ3BU2uS23pyoFVfCz+RngBrAwDBcTKqZDCDG
8dNd+T21XlzxuyeGr58h9upz2VFtq6feoGFhLU5PNxTlf4JWZe77D7NlbSvx6Nwy
7Ai8dVRgpV9ciUP7w8FXrCUvbMZQDIoTMiWGNSlogVMgA0dllGES91UZYhWf3pil
abuX6DeFul/cOhEOnH2xa+j5zz2O/upe9stU4wAFw6IhPiAELTHc2NKlWAhwb0SY
bpDRf7dgLnUfqpmZLpWjTwN4jllc0qS2MIHj+eUu0uhdFi4Z0BuH2wSCdbR7xkqk
u5u0Jq7hDNKs5FmQTSsWSiAdjakMsRjIN7jMrBbOeZnBSmUnLx74KGPLTb63ncR3
WIOi4Iyu+LSXBIvZDiLu3lIIh7Atzd+y7IDnb8KXdyqfy+h53OZZOJNbP/qTWHgT
ZUdm/qrqjIQpTQfleOEadC7vY/y3fR5sBtOQHUamfntni3oYCc4AMRlNdf3eV9lb
Tyss6F699mU7d/vennTaIToBgVwaXdLYtmvGWjnoT/kqOMclyDf3cIUtZGtp2rJR
ddmzDA3vBUC5pGj8Hd8R
=yoGE
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates for from Joerg Roedel:
"This time the IOMMU updates are mostly cleanups or fixes. No big new
features or drivers this time. In particular the changes include:
- Bigger cleanup of the Domain<->IOMMU data structures and the code
that manages them in the Intel VT-d driver. This makes the code
easier to understand and maintain, and also easier to keep the data
structures in sync. It is also a preparation step to make use of
default domains from the IOMMU core in the Intel VT-d driver.
- Fixes for a couple of DMA-API misuses in ARM IOMMU drivers, namely
in the ARM and Tegra SMMU drivers.
- Fix for a potential buffer overflow in the OMAP iommu driver's
debug code
- A couple of smaller fixes and cleanups in various drivers
- One small new feature: Report domain-id usage in the Intel VT-d
driver to easier detect bugs where these are leaked"
* tag 'iommu-updates-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (83 commits)
iommu/vt-d: Really use upper context table when necessary
x86/vt-d: Fix documentation of DRHD
iommu/fsl: Really fix init section(s) content
iommu/io-pgtable-arm: Unmap and free table when overwriting with block
iommu/io-pgtable-arm: Move init-fn declarations to io-pgtable.h
iommu/msm: Use BUG_ON instead of if () BUG()
iommu/vt-d: Access iomem correctly
iommu/vt-d: Make two functions static
iommu/vt-d: Use BUG_ON instead of if () BUG()
iommu/vt-d: Return false instead of 0 in irq_remapping_cap()
iommu/amd: Use BUG_ON instead of if () BUG()
iommu/amd: Make a symbol static
iommu/amd: Simplify allocation in irq_remapping_alloc()
iommu/tegra-smmu: Parameterize number of TLB lines
iommu/tegra-smmu: Factor out tegra_smmu_set_pde()
iommu/tegra-smmu: Extract tegra_smmu_pte_get_use()
iommu/tegra-smmu: Use __GFP_ZERO to allocate zeroed pages
iommu/tegra-smmu: Remove PageReserved manipulation
iommu/tegra-smmu: Convert to use DMA API
iommu/tegra-smmu: smmu_flush_ptc() wants device addresses
...
Pull SG updates from Jens Axboe:
"This contains a set of scatter-gather related changes/fixes for 4.3:
- Add support for limited chaining of sg tables even for
architectures that do not set ARCH_HAS_SG_CHAIN. From Christoph.
- Add sg chain support to target_rd. From Christoph.
- Fixup open coded sg->page_link in crypto/omap-sham. From
Christoph.
- Fixup open coded crypto ->page_link manipulation. From Dan.
- Also from Dan, automated fixup of manual sg_unmark_end()
manipulations.
- Also from Dan, automated fixup of open coded sg_phys()
implementations.
- From Robert Jarzmik, addition of an sg table splitting helper that
drivers can use"
* 'for-4.3/sg' of git://git.kernel.dk/linux-block:
lib: scatterlist: add sg splitting function
scatterlist: use sg_phys()
crypto/omap-sham: remove an open coded access to ->page_link
scatterlist: remove open coded sg_unmark_end instances
crypto: replace scatterwalk_sg_chain with sg_chain
target/rd: always chain S/G list
scatterlist: allow limited chaining without ARCH_HAS_SG_CHAIN
There is a bug in iommu_context_addr() which will always use
the lower context table, even when the upper context table
needs to be used. Fix this issue.
Fixes: 03ecc32c52 ("iommu/vt-d: support extended root and context entries")
Reported-by: Xiao, Nan <nan.xiao@hp.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When a 'struct device_domain_info' is created as an alias
for another device, this struct will not be re-used when the
real device is encountered. Fix that to avoid duplicate
device_domain_info structures being added.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For devices without an PCI alias there will be two
device_domain_info structures added. Prevent that by
checking if the alias is different from the device.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This struct contains all necessary information for the
function already. Also handle the info->dev == NULL case
while at it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The code in the locked section does not touch anything
protected by the dmar_global_lock. Remove it from there.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When this lock is held the device_domain_lock is also
required to make sure the device_domain_info does not vanish
while in use. So this lock can be removed as it gives no
additional protection.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move the code to attach/detach domains to iommus and vice
verce into a single function to make sure there are no
dangling references.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This makes domain attachment more synchronous with domain
deattachment. The domain<->iommu link is released in
dmar_remove_one_dev_info.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename this function and the ones further down its
call-chain to domain_context_clear_*. In particular this
means:
iommu_detach_dependent_devices -> domain_context_clear
iommu_detach_dev_cb -> domain_context_clear_one_cb
iommu_detach_dev -> domain_context_clear_one
These names match a lot better with its
domain_context_mapping counterparts.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename the function to dmar_remove_one_dev_info to match is
name better with its dmar_insert_one_dev_info counterpart.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename this function to dmar_insert_one_dev_info() to match
the name better with its counter part function
domain_remove_one_dev_info().
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Do the context-mapping of devices from a single place in the
call-path and clean up the other call-sites.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Just call domain_remove_one_dev_info() for all devices in
the domain instead of reimplementing the functionality.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We don't need to do an expensive search for domain-ids
anymore, as we keep track of per-iommu domain-ids.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This replaces the dmar_domain->iommu_bmp with a similar
reference count array. This allows us to keep track of how
many devices behind each iommu are attached to the domain.
This is necessary for further simplifications and
optimizations to the iommu<->domain attachment code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This field is now obsolete because all places use the
per-iommu domain-ids. Kill the remaining uses of this field
and remove it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There is no reason for this special handling of the
si_domain. The per-iommu domain-id can be allocated
on-demand like for any other domain. So remove the
pre-allocation code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This function can figure out the domain-id to use itself
from the iommu_did array. This is more reliable over
different domain types and brings us one step further to
remove the domain->id field.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Get rid of the special cases for VM domains vs. non-VM
domains and simplify the code further to just handle the
hardware passthrough vs. page-table case.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There is no reason to pass the translation type through
multiple layers. It can also be determined in the
domain_context_mapping_one function directly.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The special case for VM domains is not needed, as other
domains could be attached to the iommu in the same way. So
get rid of this special case.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This array is indexed by the domain-id and contains the
pointers to the domains attached to this iommu. Modern
systems support 65536 domain ids, so that this array has a
size of 512kb, per iommu.
This is a huge waste of space, as the array is usually
sparsely populated. This patch makes the array
two-dimensional and allocates the memory for the domain
pointers on-demand.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Instead of searching in the domain array for already
allocated domain ids, keep track of them explicitly.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Debugging domain ID leakage typically requires long running tests in
order to exhaust the domain ID space or kernel instrumentation to
track the setting and clearing of bits. A couple trivial intel-iommu
specific sysfs extensions make it much easier to expose the IOMMU
capabilities and current usage.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This makes sure it won't be possible to accidentally leak format
strings into iommu device names. Current name allocations are safe,
but this makes the "%s" explicit.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This is necessary to separate intel-iommu from the iova library.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Currently, allocating a size-aligned IOVA region quietly adjusts the
actual allocation size in the process, returning a rounded-up
power-of-two-sized allocation. This results in mismatched behaviour in
the IOMMU driver if the original size was not a power of two, where the
original size is mapped, but the rounded-up IOVA size is unmapped.
Whilst some IOMMUs will happily unmap already-unmapped pages, others
consider this an error, so fix it by computing the necessary alignment
padding without altering the actual allocation size. Also clean up by
making pad_size unsigned, since its callers always pass unsigned values
and negative padding makes little sense here anyway.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This continues the attempt to fix commit fb170fb4c5 ("iommu/vt-d:
Introduce helper functions to make code symmetric for readability").
The previous attempt in commit 7168440690 ("iommu/vt-d: Detach
domain *only* from attached iommus") overlooked the fact that
dmar_domain.iommu_bmp gets cleared for VM domains when devices are
detached:
intel_iommu_detach_device
domain_remove_one_dev_info
domain_detach_iommu
The domain is detached from the iommu, but the iommu is still attached
to the domain, for whatever reason. Thus when we get to domain_exit(),
we can't rely on iommu_bmp for VM domains to find the active iommus,
we must check them all. Without that, the corresponding bit in
intel_iommu.domain_ids doesn't get cleared and repeated VM domain
creation and destruction will run out of domain IDs. Meanwhile we
still can't call iommu_detach_domain() on arbitrary non-VM domains or
we risk clearing in-use domain IDs, as 7168440690 attempted to
address.
It's tempting to modify iommu_detach_domain() to test the domain
iommu_bmp, but the call ordering from domain_remove_one_dev_info()
prevents it being able to work as fb170fb4c5 seems to have intended.
Caching of unused VM domains on the iommu object seems to be the root
of the problem, but this code is far too fragile for that kind of
rework to be proposed for stable, so we simply revert this chunk to
its state prior to fb170fb4c5.
Fixes: fb170fb4c5 ("iommu/vt-d: Introduce helper functions to make
code symmetric for readability")
Fixes: 7168440690 ("iommu/vt-d: Detach domain *only* from attached
iommus")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We check the ATS state (enabled/disabled) and fetch the PCI ATS Invalidate
Queue Depth in performance-sensitive paths. It's easy to cache these,
which removes dependencies on PCI.
Remember the ATS enabled state. When enabling, read the queue depth once
and cache it in the device_domain_info struct. This is similar to what
amd_iommu.c does.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Do not touch the TE bit unless we know translation is
disabled.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For all the copy-translation code to run, we have to keep
translation enabled in intel_iommu_init(). So remove the
code disabling it.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
We can't change the RTT bit when translation is enabled, so
don't copy translation tables when we would change the bit
with our new root entry.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When we copied over context tables from an old kernel, we
need to defer assignment of devices to domains until the
device driver takes over. So skip this part of
initialization when we copied over translation tables from
the old kernel.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This seperates the allocation of the si_domain from its
assignment to devices. It makes sure that the iommu=pt case
still works in the kdump kernel, when we have to defer the
assignment of devices to domains to device driver
initialization time.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Mark the context entries we copied over from the old kernel,
so that we don't detect them as present in other code paths.
This makes sure we safely overwrite old context entries when
a new domain is assigned.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Mark all domain-ids we find as reserved, so that there could
be no collision between domains from the previous kernel and
our domains in the IOMMU TLB.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If we are in a kdump kernel and find translation enabled in
the iommu, try to copy the translation tables from the old
kernel to preserve the mappings until the device driver
takes over.
This supports old and the extended root-entry and
context-table formats.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add code to detect whether translation is already enabled in
the IOMMU. Save this state in a flags field added to
struct intel_iommu.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In case there was an old root entry, make our new one
visible immediately after it was allocated.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
QI needs to be available when we write the root entry into
hardware because flushes might be necessary after this.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Give them a common prefix that can be grepped for and
improve the wording here and there.
Tested-by: ZhenHua Li <zhen-hual@hp.com>
Tested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Although the extended tables are theoretically a completely orthogonal
feature to PASID and anything else that *uses* the newly-available bits,
some of the early hardware has problems even when all we do is enable
them and use only the same bits that were in the old context tables.
For now, there's no motivation to support extended tables unless we're
going to use PASID support to do SVM. So just don't use them unless
PASID support is advertised too. Also add a command-line bailout just in
case later chips also have issues.
The equivalent problem for PASID support has already been fixed with the
upcoming VT-d spec update and commit bd00c606a ("iommu/vt-d: Change
PASID support to bit 40 of Extended Capability Register"), because the
problematic platforms use the old definition of the PASID-capable bit,
which is now marked as reserved and meaningless.
So with this change, we'll magically start using ECS again only when we
see the new hardware advertising "hey, we have PASID support and we
actually tested it this time" on bit 40.
The VT-d hardware architect has promised that we are not going to have
any reason to support ECS *without* PASID any time soon, and he'll make
sure he checks with us before changing that.
In the future, if hypothetical new features also use new bits in the
context tables and can be seen on implementations *without* PASID support,
we might need to add their feature bits to the ecs_enabled() macro.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When we use 'intel_iommu=igfx_off' to disable translation for the
graphics, and when we discover that the BIOS has misconfigured the DMAR
setup for I/OAT, we use a special DUMMY_DEVICE_DOMAIN_INFO value in
dev->archdata.iommu to indicate that translation is disabled.
With passthrough mode, we were attempting to dereference that as a
normal pointer to a struct device_domain_info when setting up an
identity mapping for the affected device.
This fixes the problem by making device_to_iommu() explicitly check for
the special value and indicate that no IOMMU was found to handle the
devices in question.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org (which means you can pick up 18436afdc now too)
Pull intel iommu updates from David Woodhouse:
"This lays a little of the groundwork for upcoming Shared Virtual
Memory support — fixing some bogus #defines for capability bits and
adding the new ones, and starting to use the new wider page tables
where we can, in anticipation of actually filling in the new fields
therein.
It also allows graphics devices to be assigned to VM guests again.
This got broken in 3.17 by disallowing assignment of RMRR-afflicted
devices. Like USB, we do understand why there's an RMRR for graphics
devices — and unlike USB, it's actually sane. So we can make an
exception for graphics devices, just as we do USB controllers.
Finally, tone down the warning about the X2APIC_OPT_OUT bit, due to
persistent requests. X2APIC_OPT_OUT was added to the spec as a nasty
hack to allow broken BIOSes to forbid us from using X2APIC when they
do stupid and invasive things and would break if we did.
Someone noticed that since Windows doesn't have full IOMMU support for
DMA protection, setting the X2APIC_OPT_OUT bit made Windows avoid
initialising the IOMMU on the graphics unit altogether.
This means that it would be available for use in "driver mode", where
the IOMMU registers are made available through a BAR of the graphics
device and the graphics driver can do SVM all for itself.
So they started setting the X2APIC_OPT_OUT bit on *all* platforms with
SVM capabilities. And even the platforms which *might*, if the
planets had been aligned correctly, possibly have had SVM capability
but which in practice actually don't"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: support extended root and context entries
iommu/vt-d: Add new extended capabilities from v2.3 VT-d specification
iommu/vt-d: Allow RMRR on graphics devices too
iommu/vt-d: Print x2apic opt out info instead of printing a warning
iommu/vt-d: kill bogus ecap_niotlb_iunits()
Not much this time, but the changes include:
* Moving domain allocation into the iommu drivers to prepare for
the introduction of default domains for devices
* Fixing the IO page-table code in the AMD IOMMU driver to
correctly encode large page sizes
* Extension of the PCI support in the ARM-SMMU driver
* Various fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVNFIPAAoJECvwRC2XARrj4v8QAMVsPJ+kmnLvqGDkO9v2i9z6
sFX27h55HhK3Pgb5aEmEhvZd0Eec22KtuADr92LsRSjskgA4FgrzzSlo8w7+MbwM
dtowij+5Bzx/jEeexM5gog0ZA9Brl725KSYBmwJIAroKAtl3YXsIA4TO7X/JtXJm
0qWbCxLs9CX5uWyJawkeDl8UAaZYb8AHKv1UhJt8Z5yajM/qITMULi51g2Bgh8kx
YaRHeZNj+mFQqb6IlNkmOhILN+dbTdxQREp+aJs1alGdkBGlJyfo6eK4weNOpA4x
gc8EXUWZzj1GEPyWMpA/ZMzPzCbj9M6wTeXqRiTq31AMV10zcy545uYcLWks680M
CYvWTmjeCvwsbuaj9cn+efa47foH2UoeXxBmXWOJDv4WxcjE1ejmlmSd8WYfwkh9
hIkMzD8tW2iZf3ssnjCeQLa7f6ydL2P4cpnK2JH+N7hN9VOASAlciezroFxtCjU+
18T7ozgUTbOXZZomBX7OcGQ8ElXMiHB/uaCyNO64yVzApsUnQfpHzcRI5OavOYn5
dznjrzvNLCwHs3QFI4R7rsmIfPkOM0g5nY5drGwJ23+F+rVpLmpWVPR5hqT7a1HM
tJVmzces6HzOu7P1Mo0IwvNbZEmNBGTHYjGtWs6e79MQxdriFT4I+DwvFOy7GUq/
Is2b+HPwWhiWJHQXLTT2
=FMxH
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"Not much this time, but the changes include:
- moving domain allocation into the iommu drivers to prepare for the
introduction of default domains for devices
- fixing the IO page-table code in the AMD IOMMU driver to correctly
encode large page sizes
- extension of the PCI support in the ARM-SMMU driver
- various fixes and cleanups"
* tag 'iommu-updates-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (34 commits)
iommu/amd: Correctly encode huge pages in iommu page tables
iommu/amd: Optimize amd_iommu_iova_to_phys for new fetch_pte interface
iommu/amd: Optimize alloc_new_range for new fetch_pte interface
iommu/amd: Optimize iommu_unmap_page for new fetch_pte interface
iommu/amd: Return the pte page-size in fetch_pte
iommu/amd: Add support for contiguous dma allocator
iommu/amd: Don't allocate with __GFP_ZERO in alloc_coherent
iommu/amd: Ignore BUS_NOTIFY_UNBOUND_DRIVER event
iommu/amd: Use BUS_NOTIFY_REMOVED_DEVICE
iommu/tegra: smmu: Compute PFN mask at runtime
iommu/tegra: gart: Set aperture at domain initialization time
iommu/tegra: Setup aperture
iommu: Remove domain_init and domain_free iommu_ops
iommu/fsl: Make use of domain_alloc and domain_free
iommu/rockchip: Make use of domain_alloc and domain_free
iommu/ipmmu-vmsa: Make use of domain_alloc and domain_free
iommu/shmobile: Make use of domain_alloc and domain_free
iommu/msm: Make use of domain_alloc and domain_free
iommu/tegra-gart: Make use of domain_alloc and domain_free
iommu/tegra-smmu: Make use of domain_alloc and domain_free
...
* device-properties:
device property: Introduce firmware node type for platform data
device property: Make it possible to use secondary firmware nodes
driver core: Implement device property accessors through fwnode ones
driver core: property: Update fwnode_property_read_string_array()
driver core: Add comments about returning array counts
ACPI: Introduce has_acpi_companion()
driver core / ACPI: Represent ACPI companions using fwnode_handle
Get rid of domain_init and domain_destroy and implement
domain_alloc/domain_free instead.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a new function iommu_context_addr() which takes care of the
differences and returns a pointer to a context entry which may be
in either format. The formats are binary compatible for all the old
fields anyway; the new one is just larger and some of the reserved
bits in the original 128 are now meaningful.
So far, nothing actually uses the new fields in the extended context
entry. Modulo hardware bugs with interpreting the new-style tables,
this should basically be a no-op.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API
domains") prevents certain options for devices with RMRRs. This even
prevents those devices from getting a 1:1 mapping with 'iommu=pt',
because we don't have the code to handle *preserving* the RMRR regions
when moving the device between domains.
There's already an exclusion for USB devices, because we know the only
reason for RMRRs there is a misguided desire to keep legacy
keyboard/mouse emulation running in some theoretical OS which doesn't
have support for USB in its own right... but which *does* enable the
IOMMU.
Add an exclusion for graphics devices too, so that 'iommu=pt' works
there. We should be able to successfully assign graphics devices to
guests too, as long as the initial handling of stolen memory is
reconfigured appropriately. This has certainly worked in the past.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
Device domains never span IOMMU hardware units, which allows the
domain ID space for each IOMMU to be an independent address space.
Therefore we can have multiple, independent domains, each with the
same domain->id, but attached to different hardware units. This is
also why we need to do a heavy-weight search for VM domains since
they can span multiple IOMMUs hardware units and we don't require a
single global ID to use for all hardware units.
Therefore, if we call iommu_detach_domain() across all active IOMMU
hardware units for a non-VM domain, the result is that we clear domain
IDs that are not associated with our domain, allowing them to be
re-allocated and causing apparent coherency issues when the device
cannot access IOVAs for the intended domain.
This bug was introduced in commit fb170fb4c5 ("iommu/vt-d: Introduce
helper functions to make code symmetric for readability"), but is
significantly exacerbated by the more recent commit 62c22167dd
("iommu/vt-d: Fix dmar_domain leak in iommu_attach_device") which calls
domain_exit() more frequently to resolve a domain leak.
Fixes: fb170fb4c5 ("iommu/vt-d: Introduce helper functions to make code symmetric for readability")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that the ACPI companions of devices are represented by pointers
to struct fwnode_handle, it is not quite efficient to check whether
or not an ACPI companion of a device is present by evaluating the
ACPI_COMPANION() macro.
For this reason, introduce a special static inline routine for that,
has_acpi_companion(), and update the code to use it where applicable.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Systems may contain heterogeneous IOMMUs supporting differing minimum
page sizes, which may also not be common with the CPU page size.
Thus it is practical to have an explicit notion of IOVA granularity
to simplify handling of mapping and allocation constraints.
As an initial step, move the IOVA page granularity from an implicit
compile-time constant to a per-domain property so we can make use
of it in IOVA domain context at runtime. To keep the abstraction tidy,
extend the little API of inline iova_* helpers to parallel some of the
equivalent PAGE_* macros.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
To share the IOVA allocator with other architectures, it needs to
accommodate more general aperture restrictions; move the lower limit
from a compile-time constant to a runtime domain property to allow
IOVA domains with different requirements to co-exist.
Also reword the slightly unclear description of alloc_iova since we're
touching it anyway.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In order to share the IOVA allocator with other architectures, break
the unnecssary dependency on the Intel IOMMU driver and move the
remaining IOVA internals to iova.c
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since commit 1196c2f a domain is only destroyed in the
notifier path if it is hot-unplugged. This caused a
domain leakage in iommu_attach_device when a driver was
unbound from the device and bound to VFIO. In this case the
device is attached to a new domain and unlinked from the old
domain. At this point nothing points to the old domain
anymore and its memory is leaked.
Fix this by explicitly freeing the old domain in
iommu_attach_domain.
Fixes: 1196c2f (iommu/vt-d: Fix dmar_domain leak in iommu_attach_device)
Cc: stable@vger.kernel.org # v3.18
Tested-by: Jerry Hoemann <jerry.hoemann@hp.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There's an off-by-one bug in function __domain_mapping(), which may
trigger the BUG_ON(nr_pages < lvl_pages) when
(nr_pages + 1) & superpage_mask == 0
The issue was introduced by commit 9051aa0268 "intel-iommu: Combine
domain_pfn_mapping() and domain_sg_mapping()", which sets sg_res to
"nr_pages + 1" to avoid some of the 'sg_res==0' code paths.
It's safe to remove extra "+1" because sg_res is only used to calculate
page size now.
Reported-And-Tested-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: <stable@vger.kernel.org> # >= 3.0
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Implement required callback functions for intel-iommu driver
to support DMAR unit hotplug.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
On Intel platforms, an IO Hub (PCI/PCIe host bridge) may contain DMAR
units, so we need to support DMAR hotplug when supporting PCI host
bridge hotplug on Intel platforms.
According to Section 8.8 "Remapping Hardware Unit Hot Plug" in "Intel
Virtualization Technology for Directed IO Architecture Specification
Rev 2.2", ACPI BIOS should implement ACPI _DSM method under the ACPI
object for the PCI host bridge to support DMAR hotplug.
This patch introduces interfaces to parse ACPI _DSM method for
DMAR unit hotplug. It also implements state machines for DMAR unit
hot-addition and hot-removal.
The PCI host bridge hotplug driver should call dmar_hotplug_hotplug()
before scanning PCI devices connected for hot-addition and after
destroying all PCI devices for hot-removal.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce functions to support dynamic IOMMU seq_id allocating and
releasing, which will be used to support DMAR hotplug.
Also rename IOMMU_UNITS_SUPPORTED as DMAR_UNITS_SUPPORTED.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce helper function dmar_walk_resources to walk resource entries
in DMAR table and ACPI buffer object returned by ACPI _DSM method
for IOMMU hot-plug.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The function context_set_address_root() and set_root_value are setting new
address in a wrong way, and this patch is trying to fix this problem.
According to Intel Vt-d specs(Feb 2011, Revision 1.3), Chapter 9.1 and 9.2,
field ctp in root entry is using bits 12:63, field asr in context entry is
using bits 12:63.
To set these fields, the following functions are used:
static inline void context_set_address_root(struct context_entry *context,
unsigned long value);
and
static inline void set_root_value(struct root_entry *root, unsigned long value)
But they are using an invalid method to set these fields, in current code, only
a '|' operator is used to set it. This will not set the asr to the expected
value if it has an old value.
For example:
Before calling this function,
context->lo = 0x3456789012111;
value = 0x123456789abcef12;
After we call context_set_address_root(context, value), expected result is
context->lo == 0x123456789abce111;
But the actual result is:
context->lo == 0x1237577f9bbde111;
So we need to clear bits 12:63 before setting the new value, this will fix
this problem.
Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Mapping and unmapping are more often than not in the critical path.
map_sg allows IOMMU driver implementations to optimize the process
of mapping buffers into the IOMMU page tables.
Instead of mapping a buffer one page at a time and requiring potentially
expensive TLB operations for each page, this function allows the driver
to map all pages in one go and defer TLB maintenance until after all
pages have been mapped.
Additionally, the mapping operation would be faster in general since
clients does not have to keep calling map API over and over again for
each physically contiguous chunk of memory that needs to be mapped to a
virtually contiguous region.
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This makes sure any RMRR mappings stay in place when the
driver is unbound from the device.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Jerry Hoemann <jerry.hoemann@hp.com>
When the BUS_NOTIFY_DEL_DEVICE event is received the device
might still be attached to a driver. In this case the domain
can't be released as the mappings might still be in use.
Defer the domain removal in this case until we receivce the
BUS_NOTIFY_UNBOUND_DRIVER event.
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org # v3.15, v3.16
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The user of the IOMMU API domain expects to have full control of
the IOVA space for the domain. RMRRs are fundamentally incompatible
with that idea. We can neither map the RMRR into the IOMMU API
domain, nor can we guarantee that the device won't continue DMA with
the area described by the RMRR as part of the new domain. Therefore
we must prevent such devices from being used by the IOMMU API.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
IOMMU units may dynamically attached to/detached from domains,
so we should scan all active IOMMU units when computing iommu_snooping
flag for a domain instead of only scanning IOMMU units associated
with the domain.
Also check snooping and superpage capabilities when hot-adding DMAR units.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce helper function domain_pfn_within_range() to simplify code
and improve readability.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce intel_unmap() to reduce duplicated code in intel_unmap_sg()
and intel_unmap_page().
Also let dma_pte_free_pagetable() to call dma_pte_clear_range() directly,
so caller only needs to call dma_pte_free_pagetable().
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Virtual machine domains are created by intel_iommu_domain_init() and
should be destroyed by intel_iommu_domain_destroy(). So avoid freeing
virtual machine domain data structure in free_dmar_iommu() when
doamin->iommu_count reaches zero, otherwise it may cause invalid
memory access because the IOMMU framework still holds references
to the domain structure.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Static identity and virtual machine domains may be cached in
iommu->domain_ids array after corresponding IOMMUs have been removed
from domain->iommu_bmp. So we should check domain->iommu_bmp before
decreasing domain->iommu_count in function free_dmar_iommu(), otherwise
it may cause free of inuse domain data structure.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Check the same domain id is allocated for si_domain on each IOMMU,
otherwise the IOTLB flush for si_domain will fail.
Now the rules to allocate and manage domain id are:
1) For normal and static identity domains, domain id is allocated
when creating domain structure. And this id will be written into
context entry.
2) For virtual machine domain, a virtual id is allocated when creating
domain. And when binding virtual machine domain to an iommu, a real
domain id is allocated on demand and this domain id will be written
into context entry. So domain->id for virtual machine domain may be
different from the domain id written into context entry(used by
hardware).
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce domain_attach_iommu()/domain_detach_iommu() and refine
iommu_attach_domain()/iommu_detach_domain() to make code symmetric
and improve readability.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For virtual machine domains, domain->id is a virtual id, and the real
domain id written into context entry is dynamically allocated.
So use the real domain id instead of domain->id when flushing iotlbs
for virtual machine domains.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For virtual machine and static identity domains, there may be devices
from different PCI segments associated with the same domain.
So function iommu_support_dev_iotlb() should also match PCI segment
number (iommu unit) when searching for dev_iotlb capable devices.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This structure is read-only data and should never be modified.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Register our DRHD IOMMUs, cross link devices, and provide a base set
of attributes for the IOMMU. Note that IRQ remapping support parses
the DMAR table very early in boot, well before the iommu_class can
reasonably be setup, so our registration is split between
intel_iommu_init(), which occurs later, and alloc_iommu(), which
typically occurs much earlier, but may happen at any time later
with IOMMU hot-add support.
On a typical desktop system, this provides the following (pruned):
$ find /sys | grep dmar
/sys/devices/virtual/iommu/dmar0
/sys/devices/virtual/iommu/dmar0/devices
/sys/devices/virtual/iommu/dmar0/devices/0000:00:02.0
/sys/devices/virtual/iommu/dmar0/intel-iommu
/sys/devices/virtual/iommu/dmar0/intel-iommu/cap
/sys/devices/virtual/iommu/dmar0/intel-iommu/ecap
/sys/devices/virtual/iommu/dmar0/intel-iommu/address
/sys/devices/virtual/iommu/dmar0/intel-iommu/version
/sys/devices/virtual/iommu/dmar1
/sys/devices/virtual/iommu/dmar1/devices
/sys/devices/virtual/iommu/dmar1/devices/0000:00:00.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:01.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:16.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:1a.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:1b.0
/sys/devices/virtual/iommu/dmar1/devices/0000:00:1c.0
...
/sys/devices/virtual/iommu/dmar1/intel-iommu
/sys/devices/virtual/iommu/dmar1/intel-iommu/cap
/sys/devices/virtual/iommu/dmar1/intel-iommu/ecap
/sys/devices/virtual/iommu/dmar1/intel-iommu/address
/sys/devices/virtual/iommu/dmar1/intel-iommu/version
/sys/class/iommu/dmar0
/sys/class/iommu/dmar1
(devices also link back to the dmar units)
This makes address, version, capabilities, and extended capabilities
available, just like printed on boot. I've tried not to duplicate
data that can be found in the DMAR table, with the exception of the
address, which provides an easy way to associate the sysfs device with
a DRHD entry in the DMAR. It's tempting to add scopes and RMRR data
here, but the full DMAR table is already exposed under /sys/firmware/
and therefore already provides a way for userspace to learn such
details.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
VT-d code currently makes use of pci_find_upstream_pcie_bridge() in
order to find the topology based alias of a device. This function has
a few problems. First, it doesn't check the entire alias path of the
device to the root bus, therefore if a PCIe device is masked upstream,
the wrong result is produced. Also, it's known to get confused and
give up when it crosses a bridge from a conventional PCI bus to a PCIe
bus that lacks a PCIe capability. The PCI-core provided DMA alias
support solves both of these problems and additionally adds support
for DMA function quirks allowing VT-d to work with devices like
Marvell and Ricoh with known broken requester IDs.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The IOMMU code now provides a common interface for finding or
creating an IOMMU group for a device on PCI buses. Make use of it
and remove piles of code.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
suppress compiler warnings:
drivers/iommu/intel-iommu.c: In function ‘device_to_iommu’:
drivers/iommu/intel-iommu.c:673: warning: ‘segment’ may be used uninitialized in this function
drivers/iommu/intel-iommu.c: In function ‘get_domain_for_dev.clone.3’:
drivers/iommu/intel-iommu.c:2217: warning: ‘bridge_bus’ may be used uninitialized in this function
drivers/iommu/intel-iommu.c:2217: warning: ‘bridge_devfn’ may be used uninitialized in this function
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use inline function dma_pte_superpage() instead of macro for
better readability.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Alloc_domain() will initialize domain->nid to -1. So the
initialization for domain->nid in md_domain_init() is redundant,
clear it.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use list_for_each_entry_safe() instead of list_entry()
to simplify code.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Function dmar_iommu_notify_scope_dev() makes a wrong assumption that
there's one RMRR for each PCI device at most, which causes DMA failure
on some HP platforms. So enhance dmar_iommu_notify_scope_dev() to
handle multiple RMRRs for the same PCI device.
Fixbug: https://bugzilla.novell.com/show_bug.cgi?id=879482
Cc: <stable@vger.kernel.org> # 3.15
Reported-by: Tom Mingarelli <thomas.mingarelli@hp.com>
Tested-by: Linda Knippers <linda.knippers@hp.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This adds support for the DMA Contiguous Memory Allocator for
intel-iommu. This change enables dma_alloc_coherent() to allocate big
contiguous memory.
It is achieved in the same way as nommu_dma_ops currently does, i.e.
trying to allocate memory by dma_alloc_from_contiguous() and
alloc_pages() is used as a fallback.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 146922ec79 ("iommu/vt-d: Make get_domain_for_dev() take struct
device") introduced new variables bridge_bus and bridge_devfn to
identify the upstream PCIe to PCI bridge responsible for the given
target device. Leaving the original bus/devfn variables to identify
the target device itself, now that it is no longer assumed to be PCI
and we can no longer trivially find that information.
However, the patch failed to correctly use the new variables in all
cases; instead using the as-yet-uninitialised 'bus' and 'devfn'
variables.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Commit ea8ea46 "iommu/vt-d: Clean up and fix page table clear/free
behaviour" introduces possible leakage of DMA page tables due to:
for (pte = page_address(pg); !first_pte_in_page(pte); pte++) {
if (dma_pte_present(pte) && !dma_pte_superpage(pte))
freelist = dma_pte_list_pagetables(domain, level - 1,
pte, freelist);
}
For the first pte in a page, first_pte_in_page(pte) will always be true,
thus dma_pte_list_pagetables() will never be called and leak DMA page
tables if level is bigger than 1.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
If we hit this error condition then we want to return a NULL pointer and
not a freed variable.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Mostly made redundant by using dev_name() instead of pci_name(), and one
instance of using *dev->dma_mask instead of pdev->dma_mask.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Should hopefully never happen (RMRRs are an abomination) but while we're
busy eliminating all the PCI assumptions, we might as well do it.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Pass the struct device to it, and also make it return the bus/devfn to use,
since that is also stored in the DMAR table.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This was problematic because it works by domain/bus/devfn and we want
to make device_to_iommu() use only a struct device * (for handling non-PCI
devices). Now that the iommu pointer is reliably stored in the
device_domain_info, we don't need to look it up.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
By moving this into get_domain_for_dev() we can make dmar_insert_dev_info()
suitable for use with "special" domains such as the si_domain, which
currently use domain_add_dev_info().
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
It's not only for PCI devices any more, and the scope information for an
ACPI device provides the bus and devfn so that has to be stored here too.
It is the device pointer itself which needs to be protected with RCU,
so the __rcu annotation follows it into the definition of struct
dmar_dev_scope, since we're no longer just passing arrays of device
pointers around.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In commit 2e12bc29 ("intel-iommu: Default to non-coherent for domains
unattached to iommus") we decided to err on the side of caution and
always assume that it's possible that a device will be attached which is
behind a non-coherent IOMMU.
In some cases, however, that just *cannot* happen. If there *are* no
IOMMUs in the system which are non-coherent, then we don't need to do
it. And flushing the dcache is a *significant* performance hit.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There is a race condition between the existing clear/free code and the
hardware. The IOMMU is actually permitted to cache the intermediate
levels of the page tables, and doesn't need to walk the table from the
very top of the PGD each time. So the existing back-to-back calls to
dma_pte_clear_range() and dma_pte_free_pagetable() can lead to a
use-after-free where the IOMMU reads from a freed page table.
When freeing page tables we actually need to do the IOTLB flush, with
the 'invalidation hint' bit clear to indicate that it's not just a
leaf-node flush, after unlinking each page table page from the next level
up but before actually freeing it.
So in the rewritten domain_unmap() we just return a list of pages (using
pg->freelist to make a list of them), and then the caller is expected to
do the appropriate IOTLB flush (or tear down the domain completely,
whatever), before finally calling dma_free_pagelist() to free the pages.
As an added bonus, we no longer need to flush the CPU's data cache for
pages which are about to be *removed* from the page table hierarchy anyway,
in the non-cache-coherent case. This drastically improves the performance
of large unmaps.
As a side-effect of all these changes, this also fixes the fact that
intel_iommu_unmap() was neglecting to free the page tables for the range
in question after clearing them.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We have this horrid API where iommu_unmap() can unmap more than it's asked
to, if the IOVA in question happens to be mapped with a large page.
Instead of propagating this nonsense to the point where we end up returning
the page order from dma_pte_clear_range(), let's just do it once and adjust
the 'size' parameter accordingly.
Augment pfn_to_dma_pte() to return the level at which the PTE was found,
which will also be useful later if we end up changing the API for
iommu_iova_to_phys() to behave the same way as is being discussed upstream.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Now we have a PCI bus notification based mechanism to update DMAR
device scope array, we could extend the mechanism to support boot
time initialization too, which will help to unify and simplify
the implementation.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Current Intel DMAR/IOMMU driver assumes that all PCI devices associated
with DMAR/RMRR/ATSR device scope arrays are created at boot time and
won't change at runtime, so it caches pointers of associated PCI device
object. That assumption may be wrong now due to:
1) introduction of PCI host bridge hotplug
2) PCI device hotplug through sysfs interfaces.
Wang Yijing has tried to solve this issue by caching <bus, dev, func>
tupple instead of the PCI device object pointer, but that's still
unreliable because PCI bus number may change in case of hotplug.
Please refer to http://lkml.org/lkml/2013/11/5/64
Message from Yingjing's mail:
after remove and rescan a pci device
[ 611.857095] dmar: DRHD: handling fault status reg 2
[ 611.857109] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff7000
[ 611.857109] DMAR:[fault reason 02] Present bit in context entry is clear
[ 611.857524] dmar: DRHD: handling fault status reg 102
[ 611.857534] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff6000
[ 611.857534] DMAR:[fault reason 02] Present bit in context entry is clear
[ 611.857936] dmar: DRHD: handling fault status reg 202
[ 611.857947] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff5000
[ 611.857947] DMAR:[fault reason 02] Present bit in context entry is clear
[ 611.858351] dmar: DRHD: handling fault status reg 302
[ 611.858362] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff4000
[ 611.858362] DMAR:[fault reason 02] Present bit in context entry is clear
[ 611.860819] IPv6: ADDRCONF(NETDEV_UP): eth3: link is not ready
[ 611.860983] dmar: DRHD: handling fault status reg 402
[ 611.860995] dmar: INTR-REMAP: Request device [[86:00.3] fault index a4
[ 611.860995] INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
This patch introduces a new mechanism to update the DRHD/RMRR/ATSR device scope
caches by hooking PCI bus notification.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Global DMA and interrupt remapping resources may be accessed in
interrupt context, so use RCU instead of rwsem to protect them
in such cases.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Introduce a global rwsem dmar_global_lock, which will be used to
protect DMAR related global data structures from DMAR/PCI/memory
device hotplug operations in process context.
DMA and interrupt remapping related data structures are read most,
and only change when memory/PCI/DMAR hotplug event happens.
So a global rwsem solution is adopted for balance between simplicity
and performance.
For interrupt remapping driver, function intel_irq_remapping_supported(),
dmar_table_init(), intel_enable_irq_remapping(), disable_irq_remapping(),
reenable_irq_remapping() and enable_drhd_fault_handling() etc
are called during booting, suspending and resuming with interrupt
disabled, so no need to take the global lock.
For interrupt remapping entry allocation, the locking model is:
down_read(&dmar_global_lock);
/* Find corresponding iommu */
iommu = map_hpet_to_ir(id);
if (iommu)
/*
* Allocate remapping entry and mark entry busy,
* the IOMMU won't be hot-removed until the
* allocated entry has been released.
*/
index = alloc_irte(iommu, irq, 1);
up_read(&dmar_global_lock);
For DMA remmaping driver, we only uses the dmar_global_lock rwsem to
protect functions which are only called in process context. For any
function which may be called in interrupt context, we will use RCU
to protect them in following patches.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Introduce for_each_dev_scope()/for_each_active_dev_scope() to walk
{active} device scope entries. This will help following RCU lock
related patches.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Reduce duplicated code to handle virtual machine domains, there's no
functionality changes. It also improves code readability.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Enhance function get_domain_for_dev() to release allocated resources
if failed to create domain for PCIe endpoint, otherwise the allocated
resources will get lost.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Function get_domain_for_dev() is a little complex, simplify it
by factoring out dmar_search_domain_by_dev_info() and
dmar_insert_dev_info().
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Move private structures and variables into intel-iommu.c, which will
help to simplify locking policy for hotplug. Also delete redundant
declarations.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Function device_notifier() in intel-iommu.c only remove domain_device_info
data structure associated with a PCI device when handling PCI device
driver unbinding events. If a PCI device has never been bound to a PCI
device driver, there won't be BUS_NOTIFY_UNBOUND_DRIVER event when
hot-removing the PCI device. So associated domain_device_info data
structure may get lost.
On the other hand, if iommu_pass_through is enabled, function
iommu_prepare_static_indentify_mapping() will create domain_device_info
data structure for each PCIe to PCIe bridge and PCIe endpoint,
no matter whether there are drivers associated with those PCIe devices
or not. So those domain_device_info data structures will get lost when
hot-removing the assocated PCIe devices if they have never bound to
any PCI device driver.
To be even worse, it's not only an memory leak issue, but also an
caching of stale information bug because the memory are kept in
device_domain_list and domain->devices lists.
Fix the bug by trying to remove domain_device_info data structure when
handling BUS_NOTIFY_DEL_DEVICE event.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Function device_notifier() in intel-iommu.c fails to remove
device_domain_info data structures for PCI devices if they are
associated with si_domain because iommu_no_mapping() returns true
for those PCI devices. This will cause memory leak and caching of
stale information in domain->devices list.
So fix the issue by not calling iommu_no_mapping() and skipping check
of iommu_pass_through.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
A few patches have been queued up for this merge window:
* Improvements for the ARM-SMMU driver
(IOMMU_EXEC support, IOMMU group support)
* Updates and fixes for the shmobile IOMMU driver
* Various fixes to generic IOMMU code and the
Intel IOMMU driver
* Some cleanups in IOMMU drivers (dev_is_pci() usage)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJS6XrCAAoJECvwRC2XARrjgy4P/itemtg2U+603Ldje8WcPo0E
OCO/0VVSmCTYKUJDZY0hiVwqmhe5gFL3Hm/gGwkS0+UJenFXMmi+aVaPp4pCpgH+
dL2HD3dIEvi14bisrdxG/8MdR6mIx0qzKtnZLkKSR4LXwucyLHvC/DaCoOytb7Yk
7s+eEuo0hj0jAkiqSG/zLEtKElTEnoAAkLOjMy46orecJ5q4HusPZekLtWZs2ETe
x3NS63Unb9g1iSQJWIA7HnQlxWIr2+iynoamHHJRiVFzqRF0W0sGvQY3auG0DSCn
70WRNE1rKfEkfXMJxosRQ4394YUQdAkt8MBENNcJcC6E1n5PBi0cEZXH6mCnEIlG
jXzIKUY9fz68ZboaqIxXv4Hb+JLlPXCvPBvQzIQiKRgVxd8nncEjn5I9MHdf+je5
BmJlzJLJvP4cFvW8Hc8k2Oq101b1kEcSCLARWWvE9/bk9xIUyrqBkR4XjC0vb6qq
1HbKVdZ7KFKCkBHy9xMpr7CUjKiDiiLeUmqlhyjcK9spicuNIZQnC11HemL6/USP
oR6Ext9RGhvz+ch656+5+L6f6FURVP8/ywKiJ3RjmvXV5/fCYo3WMitOB2qzlWCy
SYXAczAOMOdOo+1Dxbghrr+7HzUWPqgfPmntZEPGMZhfuZ6xXr+7pGLjAhHb4vcR
SZxqkDo1cprqrR9KFAWC
=YKLk
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU Updates from Joerg Roedel:
"A few patches have been queued up for this merge window:
- improvements for the ARM-SMMU driver (IOMMU_EXEC support, IOMMU
group support)
- updates and fixes for the shmobile IOMMU driver
- various fixes to generic IOMMU code and the Intel IOMMU driver
- some cleanups in IOMMU drivers (dev_is_pci() usage)"
* tag 'iommu-updates-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits)
iommu/vt-d: Fix signedness bug in alloc_irte()
iommu/vt-d: free all resources if failed to initialize DMARs
iommu/vt-d, trivial: clean sparse warnings
iommu/vt-d: fix wrong return value of dmar_table_init()
iommu/vt-d: release invalidation queue when destroying IOMMU unit
iommu/vt-d: fix access after free issue in function free_dmar_iommu()
iommu/vt-d: keep shared resources when failed to initialize iommu devices
iommu/vt-d: fix invalid memory access when freeing DMAR irq
iommu/vt-d, trivial: simplify code with existing macros
iommu/vt-d, trivial: use defined macro instead of hardcoding
iommu/vt-d: mark internal functions as static
iommu/vt-d, trivial: clean up unused code
iommu/vt-d, trivial: check suitable flag in function detect_intel_iommu()
iommu/vt-d, trivial: print correct domain id of static identity domain
iommu/vt-d, trivial: refine support of 64bit guest address
iommu/vt-d: fix resource leakage on error recovery path in iommu_init_domains()
iommu/vt-d: fix a race window in allocating domain ID for virtual machines
iommu/vt-d: fix PCI device reference leakage on error recovery path
drm/msm: Fix link error with !MSM_IOMMU
iommu/vt-d: use dedicated bitmap to track remapping entry allocation status
...
dma_pte_free_level() has an off-by-one error when checking whether a pte
is completely covered by a range. Take for example the case of
attempting to free pfn 0x0 - 0x1ff, ie. 512 entries covering the first
2M superpage.
The level_size() is 0x200 and we test:
static void dma_pte_free_level(...
...
if (!(0 > 0 || 0x1ff < 0 + 0x200)) {
...
}
Clearly the 2nd test is true, which means we fail to take the branch to
clear and free the pagetable entry. As a result, we're leaking
pagetables and failing to install new pages over the range.
This was found with a PCI device assigned to a QEMU guest using vfio-pci
without a VGA device present. The first 1M of guest address space is
mapped with various combinations of 4K pages, but eventually the range
is entirely freed and replaced with a 2M contiguous mapping.
intel-iommu errors out with something like:
ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083)
In this case 5c2b8003 is the pointer to the previous leaf page that was
neither freed nor cleared and 849c00083 is the superpage entry that
we're trying to replace it with.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enhance intel_iommu_init() to free all resources if failed to
initialize DMAR hardware.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Clean up most sparse warnings in Intel DMA and interrupt remapping
drivers.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Simplify vt-d related code with existing macros and introduce a new
macro for_each_active_drhd_unit() to enumerate all active DRHD unit.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Remove dead code from VT-d related files.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Conflicts:
drivers/iommu/dmar.c
Field si_domain->id is set by iommu_attach_domain(), so we should only
print domain id for static identity domain after calling
iommu_attach_domain(si_domain, iommu), otherwise it's always zero.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
In Intel IOMMU driver, it calculate page table level from adjusted guest
address width as 'level = (agaw - 30) / 9', which assumes (agaw -30)
could be divided by 9. On the other hand, 64bit is a valid agaw and
(64 - 30) can't be divided by 9, so it needs special handling.
This patch enhances Intel IOMMU driver to correctly handle 64bit agaw.
It's mainly for code readability because there's no hardware supporting
64bit agaw yet.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Release allocated resources on error recovery path in function
iommu_init_domains().
Also improve printk messages in iommu_init_domains().
Acked-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Function intel_iommu_domain_init() may be concurrently called by upper
layer without serialization, so use atomic_t to protect domain id
allocation.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Use PCI standard marco dev_is_pci() instead of directly compare
pci_bus_type to check whether it is pci device.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
The BUG_ON in drivers/iommu/intel-iommu.c:785 can be triggered from userspace via
VFIO by calling the VFIO_IOMMU_MAP_DMA ioctl on a vfio device with any address
beyond the addressing capabilities of the IOMMU. The problem is that the ioctl code
calls iommu_iova_to_phys before it calls iommu_map. iommu_map handles the case that
it gets addresses beyond the addressing capabilities of its IOMMU.
intel_iommu_iova_to_phys does not.
This patch fixes iommu_iova_to_phys to return NULL for addresses beyond what the
IOMMU can handle. This in turn causes the ioctl call to fail in iommu_map and
(correctly) return EFAULT to the user with a helpful warning message in the kernel
log.
Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
At best the current code only seems to free the leaf pagetables and
the root. If you're unlucky enough to have a large gap (like any
QEMU guest with more than 3G of memory), only the first chunk of leaf
pagetables are freed (plus the root). This is a massive memory leak.
This patch re-writes the pagetable freeing function to use a
recursive algorithm and manages to not only free all the pagetables,
but does it without any apparent performance loss versus the current
broken version.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
If a device is multifunction and does not have ACS enabled then we
assume that the entire package lacks ACS and use function 0 as the
base of the group. The PCIe spec however states that components are
permitted to implement ACS on some, none, or all of their applicable
functions. It's therefore conceivable that function 0 may be fully
independent and support ACS while other functions do not. Instead
use the lowest function of the slot that does not have ACS enabled
as the base of the group. This may be the current device, which is
intentional. So long as we use a consistent algorithm, all the
non-ACS functions will be grouped together and ACS functions will
get separate groups.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
The swap_pci_ref function is used by the IOMMU API code for
swapping pci device pointers, while determining the iommu
group for the device.
Currently this function was being implemented for different
IOMMU drivers. This patch moves the function to a new file,
drivers/iommu/pci.h so that the implementation can be
shared across various IOMMU drivers.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
This patch disables translation(dma-remapping) before its initialization
if it is already enabled.
This is needed for kexec/kdump boot. If dma-remapping is enabled in the
first kernel, it need to be disabled before initializing its page table
during second kernel boot. Wei Hu also reported that this is needed
when second kernel boots with intel_iommu=off.
Basically iommu->gcmd is used to know whether translation is enabled or
disabled, but it is always zero at boot time even when translation is
enabled since iommu->gcmd is initialized without considering such a
case. Therefor this patch synchronizes iommu->gcmd value with global
command register when iommu structure is allocated.
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
This is required in case of PAMU, as it can support a window size of up
to 64G (even on 32bit).
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Pull drm merge from Dave Airlie:
"Highlights:
- TI LCD controller KMS driver
- TI OMAP KMS driver merged from staging
- drop gma500 stub driver
- the fbcon locking fixes
- the vgacon dirty like zebra fix.
- open firmware videomode and hdmi common code helpers
- major locking rework for kms object handling - pageflip/cursor
won't block on polling anymore!
- fbcon helper and prime helper cleanups
- i915: all over the map, haswell power well enhancements, valleyview
macro horrors cleaned up, killing lots of legacy GTT code,
- radeon: CS ioctl unification, deprecated UMS support, gpu reset
rework, VM fixes
- nouveau: reworked thermal code, external dp/tmds encoder support
(anx9805), fences sleep instead of polling,
- exynos: all over the driver fixes."
Lovely conflict in radeon/evergreen_cs.c between commit de0babd60d
("drm/radeon: enforce use of radeon_get_ib_value when reading user cmd")
and the new changes that modified that evergreen_dma_cs_parse()
function.
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (508 commits)
drm/tilcdc: only build on arm
drm/i915: Revert hdmi HDP pin checks
drm/tegra: Add list of framebuffers to debugfs
drm/tegra: Fix color expansion
drm/tegra: Split DC_CMD_STATE_CONTROL register write
drm/tegra: Implement page-flipping support
drm/tegra: Implement VBLANK support
drm/tegra: Implement .mode_set_base()
drm/tegra: Add plane support
drm/tegra: Remove bogus tegra_framebuffer structure
drm: Add consistency check for page-flipping
drm/radeon: Use generic HDMI infoframe helpers
drm/tegra: Use generic HDMI infoframe helpers
drm: Add EDID helper documentation
drm: Add HDMI infoframe helpers
video: Add generic HDMI infoframe helpers
drm: Add some missing forward declarations
drm: Move mode tables to drm_edid.c
drm: Remove duplicate drm_mode_cea_vic()
gma500: Fix n, m1 and m2 clock limits for sdvo and lvds
...
Pull x86/apic changes from Ingo Molnar:
"Main changes:
- Multiple MSI support added to the APIC, PCI and AHCI code - acked
by all relevant maintainers, by Alexander Gordeev.
The advantage is that multiple AHCI ports can have multiple MSI
irqs assigned, and can thus spread to multiple CPUs.
[ Drivers can make use of this new facility via the
pci_enable_msi_block_auto() method ]
- x86 IOAPIC code from interrupt remapping cleanups from Joerg
Roedel:
These patches move all interrupt remapping specific checks out of
the x86 core code and replaces the respective call-sites with
function pointers. As a result the interrupt remapping code is
better abstraced from x86 core interrupt handling code.
- Various smaller improvements, fixes and cleanups."
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess
x86, kvm: Fix intialization warnings in kvm.c
x86, irq: Move irq_remapped out of x86 core code
x86, io_apic: Introduce eoi_ioapic_pin call-back
x86, msi: Introduce x86_msi.compose_msi_msg call-back
x86, irq: Introduce setup_remapped_irq()
x86, irq: Move irq_remapped() check into free_remapped_irq
x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq()
x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 core
x86, irq: Add data structure to keep AMD specific irq remapping information
x86, irq: Move irq_remapping_enabled declaration to iommu code
x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pin
x86, io_apic: Move irq_remapping_enabled checks out of check_timer()
x86, io_apic: Convert setup_ioapic_entry to function pointer
x86, io_apic: Introduce set_affinity function pointer
x86, msi: Use IRQ remapping specific setup_msi_irqs routine
x86, hpet: Introduce x86_msi_ops.setup_hpet_msi
x86, io_apic: Introduce x86_io_apic_ops.print_entries for debugging
x86, io_apic: Introduce x86_io_apic_ops.disable()
x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resume
...
We already have the quirk entry for the mobile platform, but also
reports on some desktop versions. So be paranoid and set it
everywhere.
References: http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg33138.html
Cc: stable@vger.kernel.org
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "Sankaran, Rajesh" <rajesh.sankaran@intel.com>
Reported-and-tested-by: Mihai Moldovan <ionic@ionic.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Remove the last left-over from this flag from x86 code.
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
DMAR support on g4x/gm45 integrated gpus seems to be totally busted.
So don't bother, but instead disable it by default to allow distros to
unconditionally enable DMAR support.
v2: Actually wire up the right quirk entry, spotted by Adam Jackson.
Note that according to intel marketing materials only g45 and gm45
support DMAR/VT-d. So we have reports for all relevant gen4 pci ids by
now. Still, keep all the other gen4 ids in the quirk table in case the
marketing stuff confused me again, which would not be the first time.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51921
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=538163
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=538163
Cc: Adam Jackson <ajax@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Tested-by: stathis <stathis@npcglib.org>
Tested-by: Mihai Moldovan <ionic@ionic.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, __devinitdata,
and __devexit from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Omar Ramirez Luna <omar.luna@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Hiroshi Doyu <hdoyu@nvidia.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Bharat Nihalani <bnihalani@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A few new features this merge-window. The most important one is
probably, that dma-debug now warns if a dma-handle is not checked with
dma_mapping_error by the device driver. This requires minor changes to
some architectures which make use of dma-debug. Most of these changes
have the respective Acks by the Arch-Maintainers.
Besides that there are updates to the AMD IOMMU driver for refactor the
IOMMU-Groups support and to make sure it does not trigger a hardware
erratum.
The OMAP changes (for which I pulled in a branch from Tony Lindgren's
tree) have a conflict in linux-next with the arm-soc tree. The conflict
is in the file arch/arm/mach-omap2/clock44xx_data.c which is deleted in
the arm-soc tree. It is safe to delete the file too so solve the
conflict. Similar changes are done in the arm-soc tree in the common
clock framework migration. A missing hunk from the patch in the IOMMU
tree will be submitted as a seperate patch when the merge-window is
closed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQzbQQAAoJECvwRC2XARrjXCIP/2RxBzbVOiaPOorl+ZWbsZ41
lzWiXsCHJkh4BK4/qGsVeKhiNd9LcbQUlhywnBbhWxym3spzmjGtvU2Hcg8QiO/M
R83r9S4e8Z6DnF9Gcats1Ns9BufgpyhLXg3XoXPxtyHOgRS59fvYi6xXOxyX30Dy
uhbj+WL6UD0zvOMNztEnM1p6UhX+XlpvzKDTR5+G5xKdVPkcgeiaKSwqz739caTn
QE2NpqIh+8Mwuu1nIapk8h07xhUYU5eGMXa38u1LvDwSHsrsCMLC+lXIjtInn7Gw
Bv+XcCHgtOaoPQwwk/xd2HVwJQxO9HNb5YX51EIjwP0C5S/3yW9Ji1RgqFb6Ewqq
jIkF6ckwUheLWsBGkw5UknI/f7RX3MDiTWkziYLIniYKKewm+ymGfgIqPt2TzLIO
tMZZiIssKvy7wOXQ5JjpYJg5Xmrau6opNwdEguC8pWkJT7qsn+3SeLjMt0Lh9IoY
+37DOgOLb3O3/vnZJ3i0KMRZBfVeaRj5HaGmlxFCYUZCNQymIPTih9Jtqm+WuVcu
YaGQCTtynsQ0JVh8YEekLzSfgd3OODP68fyCg1CQNixEgvUi2hd/toX2/Z1wkkSA
JC9bZarcoPkSWqaTAA2HvmaaxvRR+0UbhFPopFTQarVV0MVLZWBxoyuKy/nMrmMd
UgTzrDYy74UKdrSTwIXg
=pPHZ
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"A few new features this merge-window. The most important one is
probably, that dma-debug now warns if a dma-handle is not checked with
dma_mapping_error by the device driver. This requires minor changes
to some architectures which make use of dma-debug. Most of these
changes have the respective Acks by the Arch-Maintainers.
Besides that there are updates to the AMD IOMMU driver for refactor
the IOMMU-Groups support and to make sure it does not trigger a
hardware erratum.
The OMAP changes (for which I pulled in a branch from Tony Lindgren's
tree) have a conflict in linux-next with the arm-soc tree. The
conflict is in the file arch/arm/mach-omap2/clock44xx_data.c which is
deleted in the arm-soc tree. It is safe to delete the file too so
solve the conflict. Similar changes are done in the arm-soc tree in
the common clock framework migration. A missing hunk from the patch
in the IOMMU tree will be submitted as a seperate patch when the
merge-window is closed."
* tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (29 commits)
ARM: dma-mapping: support debug_dma_mapping_error
ARM: OMAP4: hwmod data: ipu and dsp to use parent clocks instead of leaf clocks
iommu/omap: Adapt to runtime pm
iommu/omap: Migrate to hwmod framework
iommu/omap: Keep mmu enabled when requested
iommu/omap: Remove redundant clock handling on ISR
iommu/amd: Remove obsolete comment
iommu/amd: Don't use 512GB pages
iommu/tegra: smmu: Move bus_set_iommu after probe for multi arch
iommu/tegra: gart: Move bus_set_iommu after probe for multi arch
iommu/tegra: smmu: Remove unnecessary PTC/TLB flush all
tile: dma_debug: add debug_dma_mapping_error support
sh: dma_debug: add debug_dma_mapping_error support
powerpc: dma_debug: add debug_dma_mapping_error support
mips: dma_debug: add debug_dma_mapping_error support
microblaze: dma-mapping: support debug_dma_mapping_error
ia64: dma_debug: add debug_dma_mapping_error support
c6x: dma_debug: add debug_dma_mapping_error support
ARM64: dma_debug: add debug_dma_mapping_error support
intel-iommu: Prevent devices with RMRRs from being placed into SI Domain
...
The dma_pte_free_pagetable() function will only free a page table page
if it is asked to free the *entire* 2MiB range that it covers. So if a
page table page was used for one or more small mappings, it's likely to
end up still present in the page tables... but with no valid PTEs.
This was fine when we'd only be repopulating it with 4KiB PTEs anyway
but the same virtual address range can end up being reused for a
*large-page* mapping. And in that case were were trying to insert the
large page into the second-level page table, and getting a complaint
from the sanity check in __domain_mapping() because there was already a
corresponding entry. This was *relatively* harmless; it led to a memory
leak of the old page table page, but no other ill-effects.
Fix it by calling dma_pte_clear_range (hopefully redundant) and
dma_pte_free_pagetable() before setting up the new large page.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Tested-by: Ravi Murty <Ravi.Murty@intel.com>
Tested-by: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: stable@kernel.org [3.0+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is to prevent non-USB devices that have RMRRs associated with them from
being placed into the SI Domain during init. This fixes the issue where the RMRR info
for devices being placed in and out of the SI Domain gets lost.
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Tested-by: Shuah Khan <shuah.khan@hp.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
We can't assume this device exists, fall back to the bridge itself.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Matthew Thode <prometheanfire@gentoo.org>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joro@8bytes.org>
This time the IOMMU updates contain a bunch of fixes and cleanups to
various IOMMU drivers and the DMA debug code. New features are the
code for IRQ remapping support with the AMD IOMMU (preperation for that
was already merged in the last release) and a debugfs interface to
export some statistics in the NVidia Tegra IOMMU driver.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIbBAABAgAGBQJQbvi2AAoJECvwRC2XARrjIOcP92esm3auz49N2+rq5TAto6hX
4MbpUFgkpKQRdFUSgYOsuzjh+I46ZpWEVajCxM+9oJz4I3PalDH0hvN7VkcZTjvR
bENRaLOZNR39rtIPrNVlne2RX7DXjKxJF5+K7jD6uMGJdItPfXk1Vo0VrLw30+XD
E3pWNFCS+qJkWl7d0UGUJ6CzS1BZuHZdAx+XdHTa9DIV88jANiUefIV35cAVuqrL
xVDDDMs7ITIYn6vTpEj7/zXCHNcFS7Ki4FQxAQ4DFvpPdrvLtPMeXnVSBMg0LPpe
Gb2r8tjywo83yjZMoDxca6w20SJPkQHoWwlansGlg5j0KwqPhvD1M/OXwzbq6avd
fqL69kZTXttLY+qtnlM5YfAqySXevGmGusgmqEXT7GvbAH6dSqMDdPU2W6lt4zBd
80LyAdcwP5X5hT91KiOfGGpFY4np97dUC7QFbwAdgDO7t6R+c2OfSte6azzWkYm4
2eYCsIKy+5U6XLSNabp7QNAxnic3mt6/AVMydZsVLyjeQZ8ybCwslQjPxSNwQrxc
bx8z8KFY37UuFUs77dYFIChM8BJP6erKL0W1BOt0Ve2PWgeOaEl3y5n3mwBe+GXa
JU9SPqy0juNZIYJWU1ZE/90ULo6ZpB3wM7QpMgX1pJ7KTd1gYDjSXhAEJi5Eje8r
GNHlYwjxsQZQcvR19g8=
=Q5kj
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"This time the IOMMU updates contain a bunch of fixes and cleanups to
various IOMMU drivers and the DMA debug code. New features are the
code for IRQ remapping support with the AMD IOMMU (preperation for
that was already merged in the last release) and a debugfs interface
to export some statistics in the NVidia Tegra IOMMU driver."
* tag 'iommu-updates-v3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (27 commits)
iommu/amd: Remove obsolete comment line
dma-debug: Remove local BUS_NOTIFY_UNBOUND_DRIVER define
iommu/amd: Fix possible use after free in get_irq_table()
iommu/amd: Report irq remapping through IOMMU-API
iommu/amd: Print message to system log when irq remapping is enabled
iommu/irq: Use amd_iommu_irq_ops if supported
iommu/amd: Make sure irq remapping still works on dma init failure
iommu/amd: Add initialization routines for AMD interrupt remapping
iommu/amd: Add call-back routine for HPET MSI
iommu/amd: Implement MSI routines for interrupt remapping
iommu/amd: Add IOAPIC remapping routines
iommu/amd: Add routines to manage irq remapping tables
iommu/amd: Add IRTE invalidation routine
iommu/amd: Make sure IOMMU is not considered to translate itself
iommu/amd: Split device table initialization into irq and dma part
iommu/amd: Check if IOAPIC information is correct
iommu/amd: Allocate data structures to keep track of irq remapping tables
iommu/amd: Add slab-cache for irq remapping tables
iommu/amd: Keep track of HPET and IOAPIC device ids
iommu/amd: Fix features reporting
...
domain_update_iommu_coherency() currently defaults to setting domains
as coherent when the domain is not attached to any iommus. This
allows for a window in domain_context_mapping_one() where such a
domain can update context entries non-coherently, and only after
update the domain capability to clear iommu_coherency.
This can be seen using KVM device assignment on VT-d systems that
do not support coherency in the ecap register. When a device is
added to a guest, a domain is created (iommu_coherency = 0), the
device is attached, and ranges are mapped. If we then hot unplug
the device, the coherency is updated and set to the default (1)
since no iommus are attached to the domain. A subsequent attach
of a device makes use of the same dmar domain (now marked coherent)
updates context entries with coherency enabled, and only disables
coherency as the last step in the process.
To fix this, switch domain_update_iommu_coherency() to use the
safer, non-coherent default for domains not attached to iommus.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Acked-by: Donald Dutile <ddutile@redhat.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* commit 'v3.6-rc5': (1098 commits)
Linux 3.6-rc5
HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
Remove user-triggerable BUG from mpol_to_str
xen/pciback: Fix proper FLR steps.
uml: fix compile error in deliver_alarm()
dj: memory scribble in logi_dj
Fix order of arguments to compat_put_time[spec|val]
xen: Use correct masking in xen_swiotlb_alloc_coherent.
xen: fix logical error in tlb flushing
xen/p2m: Fix one-off error in checking the P2M tree directory.
powerpc: Don't use __put_user() in patch_instruction
powerpc: Make sure IPI handlers see data written by IPI senders
powerpc: Restore correct DSCR in context switch
powerpc: Fix DSCR inheritance in copy_thread()
powerpc: Keep thread.dscr and thread.dscr_inherit in sync
powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
powerpc/powernv: Always go into nap mode when CPU is offline
powerpc: Give hypervisor decrementer interrupts their own handler
powerpc/vphn: Fix arch_update_cpu_topology() return value
ARM: gemini: fix the gemini build
...
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/rapidio/devices/tsi721.c
Introduce an inline function pci_pcie_type(dev) to extract PCIe
device type from pci_dev->pcie_flags_reg field, and prepare for
removing pci_dev->pcie_type.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
SR-IOV can create buses without a bridge. There may be other cases
where this happens as well. In these cases skip to the parent bus
and continue testing devices there.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add missing free_domain_mem on failure path after alloc_domain.
A simplified version of the semantic match that finds this
problem is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@km exists@
local idexpression e;
expression e1,e2,e3;
type T,T1;
identifier f;
@@
* e = alloc_domain(...)
... when any
when != e = e1
when != e1 = (T)e
when != e1(...,(T)e,...)
when != &e->f
if(...) { ... when != e2(...,(T1)e,...)
when != e3 = e
when forall
(
return <+...e...+>;
|
* return ...;
) }
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The most important part of these updates is the IOMMU groups code
enhancement written by Alex Williamson. It abstracts the problem that a
given hardware IOMMU can't isolate any given device from any other
device (e.g. 32 bit PCI devices can't usually be isolated). Devices that
can't be isolated are grouped together. This code is required for the
upcoming VFIO framework.
Another IOMMU-API change written by be is the introduction of domain
attributes. This makes it easier to handle GART-like IOMMUs with the
IOMMU-API because now the start-address and the size of the domain
address space can be queried.
Besides that there are a few cleanups and fixes for the NVidia Tegra
IOMMU drivers and the reworked init-code for the AMD IOMMU. The later is
from my patch-set to support interrupt remapping. The rest of this
patch-set requires x86 changes which are not mergabe yet. So full
support for interrupt remapping with AMD IOMMUs will come in a future
merge window.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQDV/MAAoJECvwRC2XARrjSDcP+gJbtSHDMyZ71zyfQfAZcxJt
rTqLbdZRtIjrjgtKSEDp8u5Bo5TK9dAYoZVuJMOZewFzwI/fSfbRsWp1PU0I88Fr
ZzM+/o1N9MLvf1e3kRVOzNzUfku+jTQgUBD4txsbtQzc/IeGHe9qP1Bqzs/xg4Pk
SjWu7pLNYxaER10z76nRodNn6zGjsc7GFdOW8cJu2HOAHhisIAR291jSQgd6Rz9r
zWqSTsXIEzYt2CtU3G2/tFJ554Mp8v5F80gHo+0Ldw8aNxlD6nGtbqGNt+KI8qTv
MUL8KJ0TNms9CZdti1CSlSNp51VgJi2GaWKCaDAkYuuER2IbC/8Yp/p2DIIA0GNp
HpziIs+dauZPWfZHc6oU7lJAClGAG4MUx7CysVIOzl7ML/Bf4mjYv0faGf5YQfyE
weOR+OPPIWDUwgjzHKMAboA4ijkE/v+EKjOaN/S9rEqFEMKC99fwGkf9wUcpZTne
8lzdI2JrgYNDWMVNYlomeLD4lBAbxb/QsnRUa33igjr0MclvMDkp5HaO631Z1+Zx
be2z8Rl1CtMwS4qeaOXoeaoNWHU26+oJRZNtCGi/Fw4aKqYXP1dnE/m0GtqEP9Yi
+CU2rKbZn3j0+ZcQjCQop8FREPrZ2/Uaji70b6G7WZ2ApcqBxzBffpbMKOmd6T1D
HIzGh0fpdYNDuwn6Txit
=MbAC
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The most important part of these updates is the IOMMU groups code
enhancement written by Alex Williamson. It abstracts the problem that
a given hardware IOMMU can't isolate any given device from any other
device (e.g. 32 bit PCI devices can't usually be isolated). Devices
that can't be isolated are grouped together. This code is required
for the upcoming VFIO framework.
Another IOMMU-API change written by me is the introduction of domain
attributes. This makes it easier to handle GART-like IOMMUs with the
IOMMU-API because now the start-address and the size of the domain
address space can be queried.
Besides that there are a few cleanups and fixes for the NVidia Tegra
IOMMU drivers and the reworked init-code for the AMD IOMMU. The
latter is from my patch-set to support interrupt remapping. The rest
of this patch-set requires x86 changes which are not mergabe yet. So
full support for interrupt remapping with AMD IOMMUs will come in a
future merge window."
* tag 'iommu-updates-v3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits)
iommu/amd: Fix hotplug with iommu=pt
iommu/amd: Add missing spin_lock initialization
iommu/amd: Convert iommu initialization to state machine
iommu/amd: Introduce amd_iommu_init_dma routine
iommu/amd: Move unmap_flush message to amd_iommu_init_dma_ops()
iommu/amd: Split enable_iommus() routine
iommu/amd: Introduce early_amd_iommu_init routine
iommu/amd: Move informational prinks out of iommu_enable
iommu/amd: Split out PCI related parts of IOMMU initialization
iommu/amd: Use acpi_get_table instead of acpi_table_parse
iommu/amd: Fix sparse warnings
iommu/tegra: Don't call alloc_pdir with as->lock
iommu/tegra: smmu: Fix unsleepable memory allocation at alloc_pdir()
iommu/tegra: smmu: Remove unnecessary sanity check at alloc_pdir()
iommu/exynos: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/tegra: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/msm: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/omap: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/vt-d: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/amd: Implement DOMAIN_ATTR_GEOMETRY attribute
...
Work around broken devices and adhere to ACS support when determining
IOMMU grouping.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Add IOMMU group support to Intel VT-d code. This driver sets up
devices ondemand, so make use of the add_device/remove_device
callbacks in IOMMU API to manage setting up the groups.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
IOMMU device groups are currently a rather vague associative notion
with assembly required by the user or user level driver provider to
do anything useful. This patch intends to grow the IOMMU group concept
into something a bit more consumable.
To do this, we first create an object representing the group, struct
iommu_group. This structure is allocated (iommu_group_alloc) and
filled (iommu_group_add_device) by the iommu driver. The iommu driver
is free to add devices to the group using it's own set of policies.
This allows inclusion of devices based on physical hardware or topology
limitations of the platform, as well as soft requirements, such as
multi-function trust levels or peer-to-peer protection of the
interconnects. Each device may only belong to a single iommu group,
which is linked from struct device.iommu_group. IOMMU groups are
maintained using kobject reference counting, allowing for automatic
removal of empty, unreferenced groups. It is the responsibility of
the iommu driver to remove devices from the group
(iommu_group_remove_device).
IOMMU groups also include a userspace representation in sysfs under
/sys/kernel/iommu_groups. When allocated, each group is given a
dynamically assign ID (int). The ID is managed by the core IOMMU group
code to support multiple heterogeneous iommu drivers, which could
potentially collide in group naming/numbering. This also keeps group
IDs to small, easily managed values. A directory is created under
/sys/kernel/iommu_groups for each group. A further subdirectory named
"devices" contains links to each device within the group. The iommu_group
file in the device's sysfs directory, which formerly contained a group
number when read, is now a link to the iommu group. Example:
$ ls -l /sys/kernel/iommu_groups/26/devices/
total 0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 ->
../../../../devices/pci0000:00/0000:00:1e.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 ->
../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 ->
../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1
$ ls -l /sys/kernel/iommu_groups/26/devices/*/iommu_group
[truncating perms/owner/timestamp]
/sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group ->
../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group ->
../../../../kernel/iommu_groups/26
/sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group ->
../../../../kernel/iommu_groups/26
Groups also include several exported functions for use by user level
driver providers, for example VFIO. These include:
iommu_group_get(): Acquires a reference to a group from a device
iommu_group_put(): Releases reference
iommu_group_for_each_dev(): Iterates over group devices using callback
iommu_group_[un]register_notifier(): Allows notification of device add
and remove operations relevant to the group
iommu_group_id(): Return the group number
This patch also extends the IOMMU API to allow attaching groups to
domains. This is currently a simple wrapper for iterating through
devices within a group, but it's expected that the IOMMU API may
eventually make groups a more integral part of domains.
Groups intentionally do not try to manage group ownership. A user
level driver provider must independently acquire ownership for each
device within a group before making use of the group as a whole.
This may change in the future if group usage becomes more pervasive
across both DMA and IOMMU ops.
Groups intentionally do not provide a mechanism for driver locking
or otherwise manipulating driver matching/probing of devices within
the group. Such interfaces are generic to devices and beyond the
scope of IOMMU groups. If implemented, user level providers have
ready access via iommu_group_for_each_dev and group notifiers.
iommu_device_group() is removed here as it has no users. The
replacement is:
group = iommu_group_get(dev);
id = iommu_group_id(group);
iommu_group_put(group);
AMD-Vi & Intel VT-d support re-added in following patches.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Replace the struct pci_bus secondary/subordinate members with the
struct resource busn_res. Later we'll build a resource tree of these
bus numbers.
[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Now we have four copies of this code, Linus "suggested" it was about time
we stopped copying it and turned it into a helper.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add device info into list before doing context mapping, because device
info will be used by iommu_enable_dev_iotlb(). Without it, ATS won't get
enabled as it should be.
ATS, while a dubious decision from a security point of view, can be very
important for performance.
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the file names consistent with the naming conventions of irq subsystem.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Make the code consistent with the naming conventions of irq subsystem.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch introduces irq_remap_ops to hold implementation
specific function pointer to handle interrupt remapping. As
the first part the initialization functions for VT-d are
converted to these ops.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Pull DMA mapping branch from Marek Szyprowski:
"Short summary for the whole series:
A few limitations have been identified in the current dma-mapping
design and its implementations for various architectures. There exist
more than one function for allocating and freeing the buffers:
currently these 3 are used dma_{alloc, free}_coherent,
dma_{alloc,free}_writecombine, dma_{alloc,free}_noncoherent.
For most of the systems these calls are almost equivalent and can be
interchanged. For others, especially the truly non-coherent ones
(like ARM), the difference can be easily noticed in overall driver
performance. Sadly not all architectures provide implementations for
all of them, so the drivers might need to be adapted and cannot be
easily shared between different architectures. The provided patches
unify all these functions and hide the differences under the already
existing dma attributes concept. The thread with more references is
available here:
http://www.spinics.net/lists/linux-sh/msg09777.html
These patches are also a prerequisite for unifying DMA-mapping
implementation on ARM architecture with the common one provided by
dma_map_ops structure and extending it with IOMMU support. More
information is available in the following thread:
http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819
More works on dma-mapping framework are planned, especially in the
area of buffer sharing and managing the shared mappings (together with
the recently introduced dma_buf interface: commit d15bd7ee44
"dma-buf: Introduce dma buffer sharing mechanism").
The patches in the current set introduce a new alloc/free methods
(with support for memory attributes) in dma_map_ops structure, which
will later replace dma_alloc_coherent and dma_alloc_writecombine
functions."
People finally started piping up with support for merging this, so I'm
merging it as the last of the pending stuff from the merge window.
Looks like pohmelfs is going to wait for 3.5 and more external support
for merging.
* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
common: DMA-mapping: add NON-CONSISTENT attribute
common: DMA-mapping: add WRITE_COMBINE attribute
common: dma-mapping: introduce mmap method
common: dma-mapping: remove old alloc_coherent and free_coherent methods
Hexagon: adapt for dma_map_ops changes
Unicore32: adapt for dma_map_ops changes
Microblaze: adapt for dma_map_ops changes
SH: adapt for dma_map_ops changes
Alpha: adapt for dma_map_ops changes
SPARC: adapt for dma_map_ops changes
PowerPC: adapt for dma_map_ops changes
MIPS: adapt for dma_map_ops changes
X86 & IA64: adapt for dma_map_ops changes
common: dma-mapping: introduce generic alloc() and free() methods