If HW IOMMU initialization fails (Intel VT-d often does this,
typically due to BIOS bugs), we fall back to nommu. It doesn't
work for the majority since nowadays we have more than 4GB
memory so we must use swiotlb instead of nommu.
The problem is that it's too late to initialize swiotlb when HW
IOMMU initialization fails. We need to allocate swiotlb memory
earlier from bootmem allocator. Chris explained the issue in
detail:
http://marc.info/?l=linux-kernel&m=125657444317079&w=2
The current x86 IOMMU initialization sequence is too complicated
and handling the above issue makes it more hacky.
This patch changes x86 IOMMU initialization sequence to handle
the above issue cleanly.
The new x86 IOMMU initialization sequence are:
1. we initialize the swiotlb (and setting swiotlb to 1) in the case
of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
swiotlb_dma_ops or nommu_dma_ops. if swiotlb usage is forced by
the boot option, we finish here.
2. we call the detection functions of all the IOMMUs
3. the detection function sets x86_init.iommu.iommu_init to the
IOMMU initialization function (so we can avoid calling the
initialization functions of all the IOMMUs needlessly).
4. if the IOMMU initialization function doesn't need to swiotlb
then sets swiotlb to zero (e.g. the initialization is
sucessful).
5. if we find that swiotlb is set to zero, we free swiotlb
resource.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-10-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This enables us to avoid printing swiotlb memory info when we
initialize swiotlb. After swiotlb initialization, we could find
that we don't need swiotlb.
This patch removes the code to print swiotlb memory info in
swiotlb_init() and exports the function to do that.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
Cc: tony.luck@intel.com
Cc: benh@kernel.crashing.org
LKML-Reference: <1257849980-22640-9-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: merge up conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
swiotlb_free() function frees all allocated memory for swiotlb.
We need to initialize swiotlb before IOMMU initialization (x86
and powerpc needs to allocate memory from bootmem allocator). If
IOMMU initialization is successful, we need to free swiotlb
resource (don't want to waste 64MB).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-8-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: build fix for the !CONFIG_SWIOTLB case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a new function for freeing bootmem after the bootmem
allocator has been released and the unreserved pages given to
the page allocator.
This allows us to reserve bootmem and then release it if we
later discover it was not needed.
( This new API will be used by the swiotlb code to recover
a significant amount of RAM (64MB). )
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
Cc: hannes@cmpxchg.org
Cc: tj@kernel.org
Cc: akpm@linux-foundation.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1257849980-22640-7-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes detect_intel_iommu() to set intel_iommu_init() to
iommu_init hook if detect_intel_iommu() finds the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-6-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: build fix for the !CONFIG_DMAR case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes amd_iommu_detect() to set amd_iommu_init to
iommu_init hook if amd_iommu_detect() finds the AMD IOMMU.
We can kill the code to check if we found the IOMMU in
amd_iommu_init() since amd_iommu_detect() sets amd_iommu_init()
only when it found the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-5-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes gart_iommu_hole_init() to set gart_iommu_init() to
iommu_init hook if gart_iommu_hole_init() finds the GART IOMMU.
We can kill the code to check if we found the IOMMU in
gart_iommu_init() since gart_iommu_hole_init() sets
gart_iommu_init() only when it found the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-4-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes detect_calgary() to set init_calgary() to
iommu_init hook if detect_calgary() finds the Calgary IOMMU.
We can kill the code to check if we found the IOMMU in
init_calgary() since detect_calgary() sets init_calgary() only
when it found the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
LKML-Reference: <1257849980-22640-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We call the detections functions of all the IOMMUs then all
their initialization functions. The latter is pointless since we
don't detect multiple different IOMMUs. What we need to do is
calling the initialization function of the detected IOMMU.
This adds iommu_init hook to x86_init_ops so if an IOMMU
detection function can set its initialization function to the
hook.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch cleans up pci_iommu_shutdown() a bit to use
x86_platform (similar to how IA64 initializes an IOMMU driver).
This adds iommu_shutdown() to x86_platform to avoid calling
every IOMMUs' shutdown functions in pci_iommu_shutdown() in
order. The IOMMU shutdown functions are platform specific (we
don't have multiple different IOMMU hardware) so the current way
is pointless.
An IOMMU driver sets x86_platform.iommu_shutdown to the shutdown
function if necessary.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: joerg.roedel@amd.com
LKML-Reference: <20091027163358F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: get_tss_base_addr() should return a gpa_t
KVM: x86: Catch potential overrun in MCE setup
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: invalidate target of rename
fuse: fix kunmap in fuse_ioctl_copy_user
fuse: prevent fuse_put_request on invalid pointer
* git://git.infradead.org/users/dwmw2/mtd-2.6.32:
mtd/maps: gpio-addr-flash: depend on GPIO arch support
mtd/maps: gpio-addr-flash: pull in linux/ headers rather than asm/
mtd: nand: fix htmldocs warnings
The current implementation of get_user_desc() sign extends the return
value because of integer promotion rules. For the most part, this
doesn't matter, because the top bit of base2 is usually 0. If, however,
that bit is 1, then the entire value will be 0xffff... which is
probably not what the caller intended.
This patch casts the entire thing to unsigned before returning, which
generates almost the same assembly as the current code but replaces the
final "cltq" (sign extend) with a "mov %eax %eax" (zero-extend). This
fixes booting certain guests under KVM.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
xen: mask extended topology info in cpuid
xen/hvc: make sure console output is always emitted, with explicit polling
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c
sched: Disable SD_PREFER_LOCAL at node level
sched: Fix boot crash by zalloc()ing most of the cpu masks
sched: Strengthen buddies and mitigate buddy induced latencies
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, fs: Fix x86 procfs stack information for threads on 64-bit
x86: Add reboot quirk for 3 series Mac mini
x86: Fix printk message typo in mtrr cleanup code
dma-debug: Fix compile warning with PAE enabled
x86/amd-iommu: Un__init function required on shutdown
x86/amd-iommu: Workaround for erratum 63
* 'for-linus' of git://www.linux-m32r.org/git/takata/linux-2.6_dev:
m32r: Should index be positive?
m32r: bzip2/lzma kernel compression support
m32r: add NOTES to vmlinux.lds.S to remove .note.gnu.build-id section
arch/m32r: Use DIV_ROUND_CLOSEST
Not a single line of actual code in the function was really
fundamentally correct.
Problems ranged from lack of proper range checking, to removing the last
character written (which admittedly is usually '\n'), to not accepting
hex numbers even though the 'show' routine would show the data in that
format.
This tries to do better.
Acked-by: Michael Buesch <mb@bu3sch.de>
Tested-and-acked-by: Jack Steiner <steiner@sgi.com>
Cc: stable@kernel.org
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Michael Gilbert <michael.s.gilbert@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch f598282f51 exposed a problem in
powerpc MSI-X functionality, making network interfaces such as ixgbe
and cxgb3 stop to work when MSI-X is enabled. RX interrupts were not
being generated.
The problem was caused because MSI irq was not being effectively
unmasked after device initialization.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Doing so causes xtime to be negative which crashes the timekeeping
code in funny ways when doing suspend/resume
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I inadvertently left that debug code enabled, causing the number of
contexts to be clamped to 31 which is going to slow things down on
4xx and just plain breaks 8xx
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While refreshing my sysfs patches I noticed a leak in the secdata
implementation. We don't free the secdata when we free the
sysfs dirent.
This is a bug in 2.6.32-rc5 that we really should close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: Ironlake suspend/resume support
drm/i915: kill warning in intel_find_pll_g4x_dp
drm/i915: update watermarks before enabling PLLs
drm/i915: add FIFO watermark support for G4x
drm/i915: quiet DP i2c init
drm/i915: fix panel fitting filter coefficient select for Ironlake
drm/i915: fix to setup display reference clock control on Ironlake
drm/i915: Install a fence register for fbc on g4x
drm/i915: save/restore BLC histogram control reg across suspend/resume
drm/i915: Fix FDI M/N setting according with correct color depth
drm/i915: disable powersave feature for Ironlake currently
drm/i915: Fix render reclock availability detection.
drm/i915: Save and restore the GM45 FBC regs on suspend and resume.
drm/i915: Set the LVDS_BORDER when using LVDS scaling mode
drm/i915: disable FBC for Pineview, fixing a boot hang.
If TSS we are switching to resides in high memory task switch will fail
since address will be truncated. Windows2k3 does this sometimes when
running with more then 4G
Cc: stable@kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We only allocate memory for 32 MCE banks (KVM_MAX_MCE_BANKS) but we
allow user space to fill up to 255 on setup (mcg_cap & 0xff), corrupting
kernel memory. Catch these overflows.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is
called before pci_register_driver. If it fails, should exit with err
directly.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This patch fixes two issues in the procfs stack information on
x86-64 linux.
The 32 bit loader compat_do_execve did not store stack
start. (this was figured out by Alexey Dobriyan).
The stack information on a x64_64 kernel always shows 0 kbyte
stack usage, because of a missing implementation of the KSTK_ESP
macro which always returned -1.
The new implementation now returns the right value.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1257240160.4889.24.camel@wall-e>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Invalidate the target's attributes, which may have changed (such as
nlink, change time) so that they are refreshed on the next getattr().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Looks like another victim of the confusing kmap() vs kmap_atomic() API
differences.
Reported-by: Todor Gyumyushev <yodor1@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
fuse_direct_io() has a loop where requests are allocated in each
iteration. if allocation fails, the loop is broken out and follows
into an unconditional fuse_put_request() on that invalid pointer.
Signed-off-by: Anand V. Avati <avati@gluster.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@kernel.org
When a command is passed to the set_ftrace_filter, then
the ftrace_regex_lock is still held going back to user space.
# echo 'do_open : foo' > set_ftrace_filter
(still holding ftrace_regex_lock when returning to user space!)
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AEF7F8A.3080300@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
We got a sudden panic when we reduced the size of the
ringbuffer.
We can reproduce the panic by the following steps:
echo 1 > events/sched/enable
cat trace_pipe > /dev/null &
while ((1))
do
echo 12000 > buffer_size_kb
echo 512 > buffer_size_kb
done
(not more than 5 seconds, panic ...)
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4AF01735.9060409@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: limit coop preemption
cfq-iosched: fix bad return value cfq_should_preempt()
backing-dev: bdi sb prune should be in the unregister path, not destroy
Fix bio_alloc() and bio_kmalloc() documentation
bio_put(): add bio_clone() to the list of functions in the comment
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
sata_via: Remove redundant device ID for VIA VT8261
drivers/ata/libata: Move dereference after NULL test
ahci: Enable SB600 64bit DMA on MSI K9A2 Platinum v2
Index `ipi_num' is signed, test whether it is negative to
make sure we don't get a negative array element.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
- Support bzip2 and lzma kernel compression for m32r.
- Clean up arch/m32r/boot/compressed/misc.c.
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Building with --build-id option, .note.gnu.build-id section is added
to vmlinux.bin. But some old buggy binutils creates a huge vmlinux.bin,
and a bootloader fails to boot its zImage as well.
This patch adds a NOTES macro to a linker script vmlinux.ld.S to put
.note.gnu.build-id section into .note section.
Then, the .note section will be removed, because "-R .note" option is
specified in OBJCOPYFLAGS to make a vmlinux.bin binary.
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
The kernel.h macro DIV_ROUND_CLOSEST performs the computation (x + d/2)/d
but is perhaps more readable.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@haskernel@
@@
@depends on haskernel@
expression x,__divisor;
@@
- (((x) + ((__divisor) / 2)) / (__divisor))
+ DIV_ROUND_CLOSEST(x,__divisor)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Just remove redundant device ID for VIA VT8261.
The device ID 0x9000 and 0x9040 are redundant (for VT8261).
The 0x9040 is reserved for other usage.
Signed-off-by: Joseph Chan <josephchan@via.com.tw>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
In each case, if the NULL test on qc is needed, then the derefernce
should be after the NULL test.
A simplified version of the semantic match that detects this problem is as
follows (http://coccinelle.lip6.fr/):
// <smpl>
@match exists@
expression x, E;
identifier fld;
@@
* x->fld
... when != \(x = E\|&x\)
* x == NULL
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>