* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
rtmutex: Add missing rcu_read_unlock() in debug_rt_mutex_print_deadlock()
lockdep: Comment all warnings
lib: atomic64: Change the type of local lock to raw_spinlock_t
locking, lib/atomic64: Annotate atomic64_lock::lock as raw
locking, x86, iommu: Annotate qi->q_lock as raw
locking, x86, iommu: Annotate irq_2_ir_lock as raw
locking, x86, iommu: Annotate iommu->register_lock as raw
locking, dma, ipu: Annotate bank_lock as raw
locking, ARM: Annotate low level hw locks as raw
locking, drivers/dca: Annotate dca_lock as raw
locking, powerpc: Annotate uic->lock as raw
locking, x86: mce: Annotate cmci_discover_lock as raw
locking, ACPI: Annotate c3_lock as raw
locking, oprofile: Annotate oprofilefs lock as raw
locking, video: Annotate vga console lock as raw
locking, latencytop: Annotate latency_lock as raw
locking, timer_stats: Annotate table_lock as raw
locking, rwsem: Annotate inner lock as raw
locking, semaphores: Annotate inner lock as raw
locking, sched: Annotate thread_group_cputimer as raw
...
Fix up conflicts in kernel/posix-cpu-timers.c manually: making
cputimer->cputime a raw lock conflicted with the ABBA fix in commit
bcd5cff721 ("cputimer: Cure lock inversion").
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, ioapic: Consolidate the explicit EOI code
x86, ioapic: Restore the mask bit correctly in eoi_ioapic_irq()
x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC
iommu: Rename the DMAR and INTR_REMAP config options
x86, ioapic: Define irq_remap_modify_chip_defaults()
x86, msi, intr-remap: Use the ioapic set affinity routine
iommu: Cleanup ifdefs in detect_intel_iommu()
iommu: No need to set dmar_disabled in check_zero_address()
iommu: Move IOMMU specific code to intel-iommu.c
intr_remap: Call dmar_dev_scope_init() explicitly
x86, x2apic: Enable the bios request for x2apic optout
If target_level == 0, current code breaks out of the while-loop if
SUPERPAGE bit is set. We should also break out if PTE is not present.
If we don't do this, KVM calls to iommu_iova_to_phys() will cause
pfn_to_dma_pte() to create mapping for 4KiB pages.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
set dmar->iommu_superpage field to the smallest common denominator
of super page sizes supported by all active VT-d engines. Initialize
this field in intel_iommu_domain_init() API so intel_iommu_map() API
will be able to use iommu_superpage field to determine the appropriate
super page size to use.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
iommu_unmap() API expects IOMMU drivers to return the actual page order
of the address being unmapped. Previous code was just returning page
order passed in from the caller. This patch fixes this problem.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We really don't want this to work in the general case; device drivers
*shouldn't* care whether they are behind an IOMMU or not. But the
integrated graphics is a special case, because the IOMMU and the GTT are
all kind of smashed into one and generally horrifically buggy, so it's
reasonable for the graphics driver to want to know when the IOMMU is
active for the graphics hardware.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
To work around a hardware issue, we have to submit IOTLB flushes while
the graphics engine is idle. The graphics driver will (we hope) go to
great lengths to ensure that it gets that right on the affected
chipset(s)... so let's not screw it over by deferring the unmap and
doing it later. That wouldn't be very helpful.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
When unbinding a device so that I could pass it through to a KVM VM, I
got the lockdep report below. It looks like a legitimate lock
ordering problem:
- domain_context_mapping_one() takes iommu->lock and calls
iommu_support_dev_iotlb(), which takes device_domain_lock (inside
iommu->lock).
- domain_remove_one_dev_info() starts by taking device_domain_lock
then takes iommu->lock inside it (near the end of the function).
So this is the classic AB-BA deadlock. It looks like a safe fix is to
simply release device_domain_lock a bit earlier, since as far as I can
tell, it doesn't protect any of the stuff accessed at the end of
domain_remove_one_dev_info() anyway.
BTW, the use of device_domain_lock looks a bit unsafe to me... it's
at least not obvious to me why we aren't vulnerable to the race below:
iommu_support_dev_iotlb()
domain_remove_dev_info()
lock device_domain_lock
find info
unlock device_domain_lock
lock device_domain_lock
find same info
unlock device_domain_lock
free_devinfo_mem(info)
do stuff with info after it's free
However I don't understand the locking here well enough to know if
this is a real problem, let alone what the best fix is.
Anyway here's the full lockdep output that prompted all of this:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.39.1+ #1
-------------------------------------------------------
bash/13954 is trying to acquire lock:
(&(&iommu->lock)->rlock){......}, at: [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
but task is already holding lock:
(device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (device_domain_lock){-.-...}:
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f8350>] domain_context_mapping_one+0x600/0x750
[<ffffffff812f84df>] domain_context_mapping+0x3f/0x120
[<ffffffff812f9175>] iommu_prepare_identity_map+0x1c5/0x1e0
[<ffffffff81ccf1ca>] intel_iommu_init+0x88e/0xb5e
[<ffffffff81cab204>] pci_iommu_init+0x16/0x41
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff81ca3d3f>] kernel_init+0xe3/0x168
[<ffffffff8157ac24>] kernel_thread_helper+0x4/0x10
-> #0 (&(&iommu->lock)->rlock){......}:
[<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
[<ffffffff812f8b42>] device_notifier+0x72/0x90
[<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
[<ffffffff81373ccf>] device_release_driver+0x2f/0x50
[<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
[<ffffffff813724ac>] drv_attr_store+0x2c/0x30
[<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
[<ffffffff8117569e>] vfs_write+0xce/0x190
[<ffffffff811759e4>] sys_write+0x54/0xa0
[<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
6 locks held by bash/13954:
#0: (&buffer->mutex){+.+.+.}, at: [<ffffffff811e4464>] sysfs_write_file+0x44/0x170
#1: (s_active#3){++++.+}, at: [<ffffffff811e44ed>] sysfs_write_file+0xcd/0x170
#2: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81372edb>] driver_unbind+0x9b/0xc0
#3: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81373cc7>] device_release_driver+0x27/0x50
#4: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [<ffffffff8108974f>] __blocking_notifier_call_chain+0x5f/0xb0
#5: (device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
stack backtrace:
Pid: 13954, comm: bash Not tainted 2.6.39.1+ #1
Call Trace:
[<ffffffff810993a7>] print_circular_bug+0xf7/0x100
[<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
[<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8109d57d>] ? trace_hardirqs_on_caller+0x13d/0x180
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
[<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
[<ffffffff812f8b42>] device_notifier+0x72/0x90
[<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
[<ffffffff81373ccf>] device_release_driver+0x2f/0x50
[<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
[<ffffffff813724ac>] drv_attr_store+0x2c/0x30
[<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
[<ffffffff8117569e>] vfs_write+0xce/0x190
[<ffffffff811759e4>] sys_write+0x54/0xa0
[<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Before the restructruing of the x86 IOMMU code,
intel_iommu_init() was getting called directly from
pci_iommu_init() and hence needed to explicitly set
dmar_disabled to 1 for the failure conditions of
check_zero_address().
Recent changes don't call intel_iommu_init() if the intel iommu
detection fails as a result of failure in check_zero_address().
So no need for this ifdef and the code inside it.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: youquan.song@intel.com
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.334878686@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Both DMA-remapping aswell as Interrupt-remapping depend on the
dmar dev scope to be initialized. When both DMA and
IRQ-remapping are enabled, we depend on DMA-remapping init code
to call dmar_dev_scope_init(). This resulted in not doing this
init when DMA-remapping was turned off but interrupt-remapping
turned on in the kernel config.
This caused interrupt routing to break with CONFIG_INTR_REMAP=y
and CONFIG_DMAR=n.
This issue was introduced by this commit:
| commit 9d5ce73a64
| Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
| Date: Tue Nov 10 19:46:16 2009 +0900
|
| x86: intel-iommu: Convert detect_intel_iommu to use iommu_init hook
Fix this by calling dmar_dev_scope_init() explicitly from the
interrupt remapping code too.
Reported-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: youquan.song@intel.com
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.229207526@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On the platforms which are x2apic and interrupt-remapping
capable, Linux kernel is enabling x2apic even if the BIOS
doesn't. This is to take advantage of the features that x2apic
brings in.
Some of the OEM platforms are running into issues because of
this, as their bios is not x2apic aware. For example, this was
resulting in interrupt migration issues on one of the platforms.
Also if the BIOS SMI handling uses APIC interface to send SMI's,
then the BIOS need to be aware of x2apic mode that OS has
enabled.
On some of these platforms, BIOS doesn't have a HW mechanism to
turnoff the x2apic feature to prevent OS from enabling it.
To resolve this mess, recent changes to the VT-d2 specification:
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf
includes a mechanism that provides BIOS a way to request system
software to opt out of enabling x2apic mode.
Look at the x2apic optout flag in the DMAR tables before
enabling the x2apic mode in the platform. Also print a warning
that we have disabled x2apic based on the BIOS request.
Kernel boot parameter "intremap=no_x2apic_optout" can be used to
override the BIOS x2apic optout request.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.171766616@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mark this lowlevel IRQ handler as non-threaded. This prevents a boot
crash when "threadirqs" is on the kernel commandline. Also the
interrupt handler is handling hardware critical events which should
not be delayed into a thread.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The qi->q_lock lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The irq_2_ir_lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The iommu->register_lock can be taken in atomic context and therefore
must not be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The domain_flush_devices() function takes the domain->lock.
But this function is only called from update_domain() which
itself is already called unter the domain->lock. This causes
a deadlock situation when the dma-address-space of a domain
grows larger than 1GB.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The value is only set to true but never set back to false,
which causes to many completion-wait commands to be sent to
hardware. Fix it with this patch.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Reserve the MSI address range in the address allocator so
that MSI addresses are not handed out as dma handles.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
A few parts of the driver were missing in drivers/iommu.
Move them there to have the complete driver in that
directory.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This should ease finding similarities with different platforms,
with the intention of solving problems once in a generic framework
which everyone can use.
Note: to move intel-iommu.c, the declaration of pci_find_upstream_pcie_bridge()
has to move from drivers/pci/pci.h to include/linux/pci.h. This is handled
in this patch, too.
As suggested, also drop DMAR's EXPERIMENTAL tag while we're at it.
Compile-tested on x86_64.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This should ease finding similarities with different platforms,
with the intention of solving problems once in a generic framework
which everyone can use.
Compile-tested on x86_64.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This should ease finding similarities with different platforms,
with the intention of solving problems once in a generic framework
which everyone can use.
Compile-tested for MSM8X60.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Acked-by: David Brown <davidb@codeaurora.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Create a dedicated folder for iommu drivers, and move the base
iommu implementation over there.
Grouping the various iommu drivers in a single location will help
finding similar problems shared by different platforms, so they
could be solved once, in the iommu framework, instead of solved
differently (or duplicated) in each driver.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>