* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: only save/restore existent registers in the PCIe capability
x86/PCI: don't bother with root quirks if _CRS is used
docbooks: add/fix PCI kernel-doc
PCI: cleanup debug output resources
x86/PCI: set_pci_bus_resources_arch_default cleanups
x86/PCI: Move set_pci_bus_resources_arch_default into arch/x86
x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case
PCI quirk: disable MSI on VIA VT3364 chipsets
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, hpet: Stop soliciting hpet=force users on ICH4M
x86: check boundary in setup_node_bootmem()
uv_time: add parameter to uv_read_rtc()
x86: hpet: fix periodic mode programming on AMD 81xx
x86: more than 8 32-bit CPUs requires X86_BIGSMP
x86: avoid theoretical spurious NMI backtraces with CONFIG_CPUMASK_OFFSTACK=y
x86: fix boot crash in NMI watchdog with CONFIG_CPUMASK_OFFSTACK=y and flat APIC
x86-64: fix FPU corruption with signals and preemption
x86/uv: fix for no memory at paddr 0
docs, x86: add nox2apic back to kernel-parameters.txt
x86: mm/numa_32.c calculate_numa_remap_pages should use __init
x86, kbuild: make "make install" not depend on vmlinux
x86/uv: fix init of cpu-less nodes
x86/uv: fix init of memory-less nodes
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/irq: mark NUMA_MIGRATE_IRQ_DESC broken
x86, irq: Remove IRQ_DISABLED check in process context IRQ move
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (34 commits)
ACPI, i915: Register ACPI video even when not modesetting
Revert "ACPICA: delete check for AML access to port 0x81-83"
I/O port protection: update for windows compatibility.
sony-laptop: always try to unblock rfkill on load
sony-laptop: fix bogus error message display on resume
ACPI: EC: Fix ACPI EC resume non-query interrupt message
sony-laptop: SNC input event 38 fix
sony-laptop: SNC 127 Initialization Fix
sony-laptop: Duplicate SNC 127 Event Fix
ACPI: prevent processor.max_cstate=0 boot crash
ACPI/hpet: prevent boot hang when hpet=force used on ICH-4M
ACPI: delete obsolete "bus master activity" proc field
ACPI: idle: mark_tsc_unstable() at init-time, not run-time
ACPI: add /sys/firmware/acpi/interrupts/sci_not counter
ACPI video: fix an error when the brightness levels on AC and on Battery are same
acpi-cpufreq: Do not let get_measured perf depend on internal variable
acpi-cpufreq: style-only: add parens to math expression
acpi-cpufreq: Cleanup: Use printk_once
x86, acpi_cpufreq: Fix the NULL pointer dereference in get_measured_perf
thinkpad-acpi: bump up version to 0.23
...
The HPET in the ICH4M is not documented in the data sheet
because it was not officially validated.
While it is fine for hackers to continue to use "hpet=force"
to enable the hardware that they have, it is not prudent to
solicit additional "hpet=force" users on this hardware.
[ Impact: remove hpet=force syslog message on old-ICH systems ]
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <alpine.LFD.2.00.0904231918510.15843@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit dc09855 ("x86/uv: fix init of memory-less nodes") causes a
two sockets system (where node-1 doesn't have RAM installed) to crash.
That commit makes node_possible include cpu nodes that do not have memory.
So check boundary in setup_node_bootmem().
[ Impact: fix boot crash on RAM-less NUMA node system ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <49EF89DF.9090404@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It will be overwriten later if _CRS is used, so don't bother to set it.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Rename set_pci_bus_resources_arch_default to x86_pci_root_bus_res_quirks, move
the weak version from common.c to i386.c, and before calling, make sure it's a
root bus.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit 30a18d6c3f introduced a new
function to set the PCI bus resources. Unfortunately, neither the
author, nor the committers seemed to know that we already have somewhere
to do that -- pcibios_fixup_bus(). This patch moves the hook (used only
by the K8 code) into x86-specific code where it should have been in the
first place.
Cc: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
uv_read_rtc() is referenced by read member of struct clocksource clocksource_uv.
In include/linux/clocksource.h, read of struct clocksource is declared as:
cycle_t (*read)(struct clocksource *cs)
This got introduced recently in:
8e19608: clocksource: pass clocksource to read() callback
But arch/x86/kernel/uv_time.c was not properly converted by that pach.
This patch adds a dummy parameter (struct clocksource type) to uv_read_rtc() to
fix the incompatible reference in clocksource_uv, and add a NULL parameter in
all places where uv_read_rtc() gets called.
[ Impact: cleanup, address compiler warning ]
Signed-off-by: Coly Li <coly.li@suse.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Magnus Damm <damm@igel.co.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>
LKML-Reference: <49EF3614.1050806@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dimitri Sivanich <sivanich@sgi.com>
(See http://bugzilla.kernel.org/show_bug.cgi?id=12961)
It partially reverts commit c23e253e67
(x86: hpet: stop HPET_COUNTER when programming periodic mode)
HPET on AMD 81xx chipset needs a second write (with HPET_TN_SETVAL
cleared) to T0_CMP register to set the period in periodic mode.
With this patch HPET_COUNTER is still stopped but not reset when HPET
is programmed in periodic mode. This should help to avoid races when
HPET is programmed in periodic mode and fixes a boot time hang that
I've observed on a machine when using 1000HZ.
[ Impact: fix boot time hang on machines with AMD 81xx chipset ]
Reported-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Jeff Mahoney <jeffm@suse.com>
LKML-Reference: <20090421180037.GA2763@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Properly unregister cpufreq notifier on onload if it was registered
during init.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Not releasing the time_page causes a leak of that page or the compound
page it is situated in.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
does not agree with that specified by DEFINE_PER_CPU(). This means that
architectures that have a small data section references relative to a base
register may throw up linkage errors due to too great a displacement between
where the base register points and the per-CPU variable.
On FRV, the .h declaration says that the variable is in the .sdata section, but
the .c definition says it's actually in the .data section. The linker throws
up the following errors:
kernel/built-in.o: In function `release_task':
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
as does DEFINE_PER_CPU(). However, this is made slightly more complex by
virtue of the fact that there are several variants on DEFINE, so these need to
be matched by variants on DECLARE.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
$ cat x86-more-than-8-cpus-requires-bigsmp.patch
Enforce NR_CPUS <= 8 limitation if X86_BIGSMP not set
Configuring more than 8 logical CPUs on 32-bit x86 requires
X86_BIGSMP to be set in order to boot successfully, if more than 8
logical CPUs are actually found at boot time. The X86_BIGSMP help
text describes that it is required to be set if more than 8 CPUs
are configured, but this was previously not enforced.
This configuration error has affected multiple distributions:
https://bugzilla.redhat.com/show_bug.cgi?id=480844https://issues.rpath.com/browse/RPL-3022
Signed-off-by: Michael K Johnson <johnsonm@rpath.com>
LKML-Reference: <20090422014448.GB32541@logo.rdu.rpath.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Pass clocksource pointer to the read() callback for clocksources. This
allows us to share the callback between multiple instances.
[hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In theory (though not shown in practice) alloc_cpumask_var() doesn't zero
memory, so CPUs might print an "NMI backtrace for cpu %d" once on boot.
(Bug introduced in fcef8576d8).
[ Impact: avoid theoretical syslog noise in rare configs ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <alpine.DEB.2.00.0904202113520.10097@gandalf.stny.rr.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fcef8576d8 converted backtrace_mask to a
cpumask_var_t, and assumed check_nmi_watchdog was called before
nmi_watchdog_tick was ever called. Steven's oops shows I was wrong.
This is something of a bandaid: I'm not sure we *should* be calling
nmi_watchdog_tick before check_nmi_watchdog. Note that gcc eliminates
this test for the CONFIG_CPUMASK_OFFSTACK=n case.
[ Impact: fix boot crash in rare configs ]
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <alpine.DEB.2.00.0904202113520.10097@gandalf.stny.rr.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is a version incorporating Christoph's suggestion.
Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In 64bit signal delivery path, clear_used_math() was happening before saving
the current active FPU state on to the user stack for signal handling. Between
clear_used_math() and the state store on to the user stack, potentially we
can get a page fault for the user address and can block. Infact, while testing
we were hitting the might_fault() in __clear_user() which can do a schedule().
At a later point in time, we will schedule back into this process and
resume the save state (using "xsave/fxsave" instruction) which can lead
to DNA fault. And as used_math was cleared before, we will reinit the FP state
in the DNA fault and continue. This reinit will result in loosing the
FPU state of the process.
Move clear_used_math() to a point after the FPU state has been stored
onto the user stack.
This issue is present from a long time (even before the xsave changes
and the x86 merge). But it can easily be exposed in 2.6.28.x and 2.6.29.x
series because of the __clear_user() in this path, which has an explicit
__cond_resched() leading to a context switch with CONFIG_PREEMPT_VOLUNTARY.
[ Impact: fix FPU state corruption ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <stable@kernel.org> [2.6.28.x, 2.6.29.x]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Fix endcase where the memory at physical address 0 does not really
exist AND one of the sockets on blade 0 has no active cpus.
The memory that _appears_ to be at physical address 0 is actually
memory that located at a different address but has been remapped by
the chipset so that it appears to be at physical address 0.
When determining the UV pnode, the algorithm for determining the pnode
incorrectly used the relocated physical address instead of the actual
(global) address.
[ Impact: boot failure on partitioned systems ]
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20090420132530.GA23156@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Take already available policy->cpuinfo.max_freq and get rid of acpi-cpufreq
specific max_freq variable.
This implies that P0 is always the highest frequency which should always
be true as ACPI spec says:
As a result, the zeroth entry describes the highest performance state
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Fixes guest crash 'lguest: bad read address 0x4800000 len 256'
The new per-cpu allocator ends up handing a non-linear address to
write_gdt_entry. We do __pa() on it, and hand it to the host, which
kills us.
I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT
code, but had no pressing reason until now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: lguest@ozlabs.org
I hit the check_flags error of lockdep:
WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
[...]
hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
The check_flags warning of lockdep tells me that lockdep thought interrupts
were disabled, but they were really enabled.
The numbers in the above parenthesis show the order of events:
12566: softirqs last enabled: lock_sock_nested
12567: hardirqs last enabled: local_bh_enable
12568: softirqs last disabled: tcp_prequeue_process
12566: hardirqs last disabled: int3
int3 is a breakpoint!
Examining this further, I have CONFIG_NET_TCPPROBE enabled which adds
break points into the kernel.
The paranoid_exit of the return of int3 does not account for enabling
interrupts on return to kernel. This code is a bit tricky since it
is also used by the nmi handler (when lockdep is off), and we must be
careful about the swapgs. We can not call kernel code after the swapgs
has been performed.
[ Impact: fix lockdep check_flags warning + self-turn-off ]
Acked-by: Peter Zijlsta <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
calculate_numa_remap_pages() is called only by __init initmem_init()
further calculate_numa_remap_pages is calling:
__init find_e820_area() and __init reserve_early()
So calculate_numa_remap_pages() should be __init calculate_numa_remap_pages().
WARNING: arch/x86/built-in.o(.text+0x82ea3): Section mismatch in reference from the function calculate_numa_remap_pages() to the function .init.text:find_e820_area()
The function calculate_numa_remap_pages() references
the function __init find_e820_area().
This is often because calculate_numa_remap_pages lacks a __init
annotation or the annotation of find_e820_area is wrong.
WARNING: arch/x86/built-in.o(.text+0x82f5f): Section mismatch in reference from the function calculate_numa_remap_pages() to the function .init.text:reserve_early()
The function calculate_numa_remap_pages() references
the function __init reserve_early().
This is often because calculate_numa_remap_pages lacks a __init
annotation or the annotation of reserve_early is wrong.
[ Impact: save memory, address Section mismatch warning ]
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1239991281.3153.4.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is common to use "make install" in restricted environments which
differ from the one which was actually used to build the kernel. In
such environments it is highly undesirable to trigger a rebuild of any
part of the system. Worse, the rebuild may be spurious, triggered by
differences in the environment.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <20090415234642.GA28531@uranus.ravnborg.org>
Fix an endcase in the UV initialization code for the "UV large system mode"
of apicids. If node zero contains no cpus, cpus on another node will be the
boot cpu. The percpu data that contains the extra apicid bits was not
being initialized early enough.
[ Impact: fix potential boot crash on cpu-less UV nodes ]
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20090417142447.GA23759@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add support for nodes that have cpus but no memory.
The current code was failing to add these nodes
to the nodes_present_map.
v2: Fixes case caught by David Rientjes - missed support
for the x2apic SRAT table.
[ Impact: fix potential boot crash on memory-less UV nodes. ]
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20090417142242.GA23743@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix microcode driver newly spewing warnings
x86, PAT: Remove page granularity tracking for vm_insert_pfn maps
x86: disable X86_PTRACE_BTS for now
x86, documentation: kernel-parameters replace X86-32,X86-64 with X86
x86: pci-swiotlb.c swiotlb_dma_ops should be static
x86, PAT: Remove duplicate memtype reserve in devmem mmap
x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype()
x86, PAT: Changing memtype to WC ensuring no WB alias
x86, PAT: Handle faults cleanly in set_memory_ APIs
x86, PAT: Change order of cpa and free in set_memory_wb
x86, CPA: Change idmap attribute before ioremap attribute setup
It causes crash on system with lots of cards with MSI-X
when irq_balancer enabled...
The patches fixing it were both complex and fragile, according
to Eric they were also doing quite dangerous things to the
hardware.
Instead we now have patches that solve this problem via static
NUMA node mappings - not dynamic allocation and balancing.
The patches are much simpler than this method but are still too
large outside of the merge window, so we mark the dynamic balancer
as broken for now, and queue up the new approach for v2.6.31.
[ Impact: deactivate broken kernel feature ]
Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49E68C41.4020801@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86/uv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: UV BAU distribution and payload MMRs
x86: UV: BAU partition-relative distribution map
x86, uv: add Kconfig dependency on NUMA for UV systems
x86: prevent /sys/firmware/sgi_uv from being created on non-uv systems
x86, UV: Fix for nodes with memory and no cpus
x86, UV: system table in bios accessed after unmap
x86: UV BAU messaging timeouts
x86: UV BAU and nodes with no memory
Jeff Garzik reported this WARN_ON() noise:
> Kernel: 2.6.30-rc1-00306-g8371f87
> Hardware: ICH10 x86-64
>
> This is a regression from 2.6.29. Microcode spews the following WARNING
> multiple times during boot:
>
> ------------[ cut here ]------------
> WARNING: at fs/sysfs/group.c:138 sysfs_remove_group+0xeb/0xf0()
> Hardware name: sysfs group ffffffffa0209700 not found for
> kobject 'cpu0'
Keep sysfs files around for cpus even when we failed to locate
microcode for them at the moment of module loading. The appropriate
microcode firmware can become available later on.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This change resolves the problem of too many single page entries
in pat_memtype_list and "freeing invalid memtype" errors with i915,
reported here:
http://marc.info/?l=linux-kernel&m=123845244713183&w=2
Remove page level granularity track and untrack of vm_insert_pfn.
memtype tracking at page granularity does not scale and cleaner
approach would be for the driver to request a type for a bigger
IO address range or PCI io memory range for that device, either at
mmap time or driver init time and just use that type during
vm_insert_pfn.
This patch just removes the track/untrack of vm_insert_pfn. That
means we will be in same state as 2.6.28, with respect to these APIs.
Newer APIs for the drivers to request a memtype for a bigger region
is coming soon.
[ Impact: fix Xorg startup warnings and hangs ]
Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Tested-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <20090408223716.GC3493@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch correctly sets BAU memory mapped registers to point
to the sending activation descriptor table and target payload table.
The "Broadcast Assist Unit" is used for TLB shootdown in UV.
The memory mapped registers that point to sending and receiving
memory structures contain node numbers.
In one case the __pa() function did not provide the node id of
memory on blade zero in configurations where that id is nonzero.
In another case, it was assumed that memory was allocated on
the local node. That assumption is not true in a configuration
in which the node has no memory.
Tested on the UV hardware simulator.
[ Impact: fix possible runtime crash due to incorrect TLB logic ]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1LuR5Z-0007An-B8@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Oleg Nesterov found a couple of races in the ptrace-bts code
and fixes are queued up for it but they did not get ready in time
for the merge window. We'll merge them in v2.6.31 - until then
mark the feature as CONFIG_BROKEN. There's no user-space yet
making use of this so it's not a big issue.
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It turns out that 'smp_call_function_many()' doesn't work at all like
'smp_call_function_single()', and my change to Andrew's patch to use it
rather than a loop over all CPU's acpi-cpufreq doesn't work.
My bad.
'smp_call_function_many()' has two "features" (aka "documented bugs"):
(a) it needs to be called with preemption disabled, because it uses
smp_processor_id() without guarding the CPU lookup with 'get_cpu()'
and 'put_cpu()' like the 'single' variant does.
(b) even if the current CPU is part of the CPU mask, it won't do the
call on that CPU.
Still, we're better off trying to use 'smp_call_function_many()' than
looping over CPU's, since it at least in theory allows us to use a
broadcast IPI and do it all in parallel. So let's just work around the
silly semantic bugs in that function.
Reported-and-tested-by: Ali Gholami Rudi <ali@rudi.ir>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert part of af5c820a31 ("x86: cpumask:
use work_on_cpu in arch/x86/kernel/microcode_core.c")
That change is causing only one Intel CPU's microcode to be updated e.g.
microcode: CPU3 updated from revision 0x9 to 0x17, date = 2005-04-22
where before it announced that also for CPU0 and CPU1 and CPU2.
We cannot use work_on_cpu() in the CONFIG_MICROCODE_OLD_INTERFACE code,
because Intel's request_microcode_user() involves a copy_from_user() from
/sbin/microcode_ctl, which therefore needs to be on that CPU at the time.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch enables each partition's BAU distribution bit map
to be partition-relative.
The distribution bitmap had been constructed assuming 0 as the base
node number. That construct would not have allowed a total system of
greater than 256 nodes.
It also corrects an error that occurred when the first blade's nasid
was not zero. That nasid was stored as the base node.
The base node number gets added by hardware to the node numbers implied
in the distribution bitmap, resulting in invalid target nasids.
Tested on the UV hardware simulator.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1Ltl0C-0004Ob-37@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As discussed in the thread here:
http://marc.info/?l=linux-kernel&m=123964468521142&w=2
Eric W. Biederman observed:
> It looks like some additional bugs have slipped in since last I looked.
>
> set_irq_affinity does this:
> ifdef CONFIG_GENERIC_PENDING_IRQ
> if (desc->status & IRQ_MOVE_PCNTXT || desc->status & IRQ_DISABLED) {
> cpumask_copy(desc->affinity, cpumask);
> desc->chip->set_affinity(irq, cpumask);
> } else {
> desc->status |= IRQ_MOVE_PENDING;
> cpumask_copy(desc->pending_mask, cpumask);
> }
> #else
>
> That IRQ_DISABLED case is a software state and as such it has nothing to
> do with how safe it is to move an irq in process context.
[...]
>
> The only reason we migrate MSIs in interrupt context today is that there
> wasn't infrastructure for support migration both in interrupt context
> and outside of it.
Yes. The idea here was to force the MSI migration to happen in process
context. One of the patches in the series did
disable_irq(dev->irq);
irq_set_affinity(dev->irq, cpumask_of(dev->cpu));
enable_irq(dev->irq);
with the above patch adding irq/manage code check for interrupt disabled
and moving the interrupt in process context.
IIRC, there was no IRQ_MOVE_PCNTXT when we were developing this HPET
code and we ended up having this ugly hack. IRQ_MOVE_PCNTXT was there
when we eventually submitted the patch upstream. But, looks like I did a
blind rebasing instead of using IRQ_MOVE_PCNTXT in hpet MSI code.
Below patch fixes this. i.e., revert commit 932775a4ab
and add PCNTXT to HPET MSI setup. Also removes copying of desc->affinity
in generic code as set_affinity routines are doing it internally.
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Li Shaohua" <shaohua.li@intel.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: "lcm@us.ibm.com" <lcm@us.ibm.com>
Cc: suresh.b.siddha@intel.com
LKML-Reference: <20090413222058.GB8211@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We ended up incorrectly using '&cur' instead of '&readin' in the
work_on_cpu() -> smp_call_function_single() transformation in commit
01599fca67 ("cpufreq: use
smp_call_function_[single|many]() in acpi-cpufreq.c").
Andrew explains:
"OK, the acpi tree went and had conflicting changes merged into it after
I'd written the patch and it appears that I incorrectly reverted part
of 18b2646fe3 while fixing the resulting
rejects.
Switching it to `readin' looks correct."
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: reduce kernel size a bit, address sparse warning
Addresses the problem pointed out by this sparse warning:
arch/x86/kernel/pci-swiotlb.c:53:20: warning: symbol 'swiotlb_dma_ops' was not declared. Should it be static?
For x86: swiotlb_dma_ops can be static, because it's not used outside
of pci-swiotlb.c
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1239558861.3938.2.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>