Check CONFIG_FREEZER instead of CONFIG_PM because kprobe booster
depends on freeze_processes() and thaw_processes() when CONFIG_PREEMPT=y.
This fixes a linkage error which occurs when CONFIG_PREEMPT=y, CONFIG_PM=y
and CONFIG_FREEZER=n.
Reported-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Len Brown <len.brown@intel.com>
Often the cause of kernel unaligned access warnings is not
obvious from just the ip displayed in the warning. This adds
the option via proc to dump the stack in addition to the warning.
The default is off (just display the 1 line warning). To enable
the stack to be shown: echo 1 > /proc/sys/kernel/unaligned-dump-stack
Signed-off-by: Doug Chapman <doug.chapman@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Impact: fix build errors
Since the SPARSE IRQS changes redefined how the kstat irqs are
organized, arch's must use the new accessor function:
kstat_incr_irqs_this_cpu(irq, DESC);
If CONFIG_SPARSE_IRQS is set, then DESC is a pointer to the
irq_desc which has a pointer to the kstat_irqs. If not, then
the .irqs field of struct kernel_stat is used instead.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove __attribute__((weak)) from common code sys_pipe implemantation.
IA64, ALPHA, SUPERH (32bit) and SPARC (32bit) have own implemantations
with the same name. Just rename them.
For sys_pipe2 there is no architecture specific implementation.
Cc: Richard Henderson <rth@twiddle.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
IA64 dynamic ftrace support.
The original _mcount stub for each function is like:
alloc r40=ar.pfs,12,8,0
mov r43=r0;;
mov r42=b0
mov r41=r1
nop.i 0x0
br.call.sptk.many b0 = _mcount;;
The patch convert it to below for nop:
[MII] nop.m 0x0
mov r3=ip
nop.i 0x0
[MLX] nop.m 0x0
nop.x 0x0;;
This isn't completely nop, as there is one instuction 'mov r3=ip', but
it should be light and harmless for code follow it.
And below is for call
[MII] nop.m 0x0
mov r3=ip
nop.i 0x0
[MLX] nop.m 0x0
brl.many .;;
In this way, only one instruction is changed to convert code between nop
and call. This should meet dyn-ftrace's requirement.
But this requires CPU support brl instruction, so dyn-ftrace isn't
supported for old Itanium system. Assume there are quite few such old
system running.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
IA64 ftrace suppport. In IA64, below code will be added in each function
if -pg is enabled.
alloc r40=ar.pfs,12,8,0
mov r43=r0;;
mov r42=b0
mov r41=r1
nop.i 0x0
br.call.sptk.many b0 = _mcount;;
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, update to new cpumask API
Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's
so access to them should be using the new cpumask API.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: build fix
Ingo Molnar wrote:
> tip/arch/blackfin/kernel/irqchip.c: In function 'show_interrupts':
> tip/arch/blackfin/kernel/irqchip.c:85: error: 'struct kernel_stat' has no member named 'irqs'
> make[2]: *** [arch/blackfin/kernel/irqchip.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
>
So could move kstat_irqs array to irq_desc struct.
(s390, m68k, sparc) are not touched yet, because they don't support genirq
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
[IA64] fix typo in cpumask_of_pcibus()
x86: fix x86_32 builds for summit and es7000 arch's
cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
cpumask: use cpumask_var_t in acpi-cpufreq.c
cpumask: use work_on_cpu in acpi/cstate.c
cpumask: convert struct cpufreq_policy to cpumask_var_t
cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
x86: cleanup remaining cpumask_t ops in smpboot code
cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
cpumask: update local_cpus_show to use new cpumask API
ia64: cpumask fix for is_affinity_mask_valid()
On some boxes there exist both RSDT and XSDT table. But unfortunately
sometimes there exists the following error when XSDT table is used:
a. 32/64X address mismatch
b. The 32/64X FACS address mismatch
In such case the boot option of "acpi=rsdt" is provided so that
RSDT is tried instead of XSDT table when the system can't work well.
http://bugzilla.kernel.org/show_bug.cgi?id=8246
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
cc:Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
kprobe_mutex from architecture dependent code.
This allows us to call arch_remove_kprobe() (and free_insn_slot) while
holding kprobe_mutex.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: Section fix
WARNING: vmlinux.o(.text+0x596d2): Section mismatch in
reference from the function swiotlb_dma_init() to the function
.init.text:swiotlb_init()
The function swiotlb_dma_init() references
the function __init swiotlb_init().
This is often because swiotlb_dma_init lacks a __init
annotation or the annotation of swiotlb_init is wrong.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This adds swiotlb_map_page and swiotlb_unmap_page to lib/swiotlb.c and
remove IA64 and X86's swiotlb_map_page and swiotlb_unmap_page.
This also removes unnecessary swiotlb_map_single, swiotlb_map_single_attrs,
swiotlb_unmap_single and swiotlb_unmap_single_attrs.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This converts X86 and IA64 to use include/linux/dma-mapping.h.
It's a bit large but pretty boring. The major change for X86 is
converting 'int dir' to 'enum dma_data_direction dir' in DMA mapping
operations. The major changes for IA64 is using map_page and
unmap_page instead of map_single and unmap_single.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This adds dma_get_ops hook to struct ia64_machine_vector. We use
dma_get_ops() in arch/ia64/kernel/dma-mapping.c, which simply returns
the global dma_ops. This is for removing hwsw_dma_ops.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch introduces a global pointer, dma_ops, which points to an
appropriate dma_mapping_ops that the kernel should use. This is a
common way to handle multiple dma_mapping_ops (X86, POWER, and SPARC).
dma_ops is set in platform_dma_init. We also set it by hand where
machvec_init is callev via subsys_initcall.
- IA64_DIG_VTD uses vtd_dma_ops.
- IA64_HP_ZX1 uses sba_dma_ops.
- IA64_HP_ZX1_SWIOTLB uses hwsw_dma_ops.
- IA64_SGI_SN2 uses sn_dma_ops.
- The rest use swiotlb_dma_ops.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is already dma_mapping_ops for SWIOTLB but there are some
missing hooks.
This is for IA64_DIG_VTD, IA64_HP_ZX1_SWIOTLB, IA64_SGI_UV,
IA64_HP_SIM, IA64_XEN_GUEST and IA64_GENERIC.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The function prototype should use 'struct cpumask *' to declare
cpumask arguments (instead of cpumask_var_t).
Note: arch/ia64/kernel/irq.c still had the following "old cpumask_t" usages:
105: cpumask_t mask = CPU_MASK_NONE;
107: cpu_set(cpu_logical_id(hwid), mask);
110: irq_desc[irq].affinity = mask;
... replaced with a simple "cpumask_of(cpu_logical_id(hwid))".
161: new_cpu = any_online_cpu(cpu_online_map);
194: time_keeper_id = first_cpu(cpu_online_map);
... replaced with cpu_online_mask refs.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
x86: setup_per_cpu_areas() cleanup
cpumask: fix compile error when CONFIG_NR_CPUS is not defined
cpumask: use alloc_cpumask_var_node where appropriate
cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
x86: use cpumask_var_t in acpi/boot.c
x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
sched: put back some stack hog changes that were undone in kernel/sched.c
x86: enable cpus display of kernel_max and offlined cpus
ia64: cpumask fix for is_affinity_mask_valid()
cpumask: convert RCU implementations, fix
xtensa: define __fls
mn10300: define __fls
m32r: define __fls
h8300: define __fls
frv: define __fls
cris: define __fls
cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
cpumask: zero extra bits in alloc_cpumask_var_node
cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
cpumask: convert mm/
...
Impact: build fix on ia64
ia64's default_affinity_write() still had old cpumask_t usage:
/home/mingo/tip/kernel/irq/proc.c: In function `default_affinity_write':
/home/mingo/tip/kernel/irq/proc.c:114: error: incompatible type for argument 1 of `is_affinity_mask_valid'
make[3]: *** [kernel/irq/proc.o] Error 1
make[3]: *** Waiting for unfinished jobs....
update it to cpumask_var_t.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
The cpu time spent by the idle process actually doing something is
currently accounted as idle time. This is plain wrong, the architectures
that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
time spent doing nothing and the time spent by idle doing work. The first
is accounted with account_idle_time and the second with account_system_time.
The architectures that use the account_xxx_time interface directly and not
the account_xxx_ticks interface now need to do the check for the idle
process in their arch code. In particular to improve the system vs true
idle time accounting the arch code needs to measure the true idle time
instead of just testing for the idle process.
To improve the tick based accounting as well we would need an architecture
primitive that can tell us if the pt_regs of the interrupted context
points to the magic instruction that halts the cpu.
In addition idle time is no more added to the stime of the idle process.
This field now contains the system time of the idle process as it should
be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
every tick that occurs while idle is running will be accounted as idle
time.
This patch contains the necessary common code changes to be able to
distinguish idle system time and true idle time. The architectures with
support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The utimescaled / stimescaled fields in the task structure and the
global cpustat should be set on all architectures. On s390 the calls
to account_user_time_scaled and account_system_time_scaled never have
been added. In addition system time that is accounted as guest time
to the user time of a process is accounted to the scaled system time
instead of the scaled user time.
To fix the bugs and to prevent future forgetfulness this patch merges
account_system_time_scaled into account_system_time and
account_user_time_scaled into account_user_time.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Michael Neuling <mikey@neuling.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Impact: New APIs
The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask. Part of removing cpumasks from
the stack.
We can also use the new for_each_cpu_and() to avoid a temporary cpumask,
and a gratuitous test in sn_topology_show.
(Includes fix from KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Impact: change existing irq_chip API
Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.
Fortunately, not widely used code, but hits a few architectures.
Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly. Ingo, does this break anything?
(Folded in fix from KOSAKI Motohiro)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Impact: change calling convention of existing cpumask APIs
Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.
These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
Impact: cleanup
Each SMP arch defines these themselves. Move them to a central
location.
Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
Those archs simply have phys_cpu_present_map replaced everywhere.
3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
so I just manipulate them both in sync.
4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
declarations.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
The generic_defconfig has three section mismatches. This clears
arch_unregister_cpu()
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
pv_cpu_ops.getreg(_IA64_REG_IP) returned constant.
But the returned ip valued should be the one in the caller, not of the callee.
This patch fixes that.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/kernel/pci-dma.c only needs to include iommu once.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Using printk from MCA/INIT context is unsafe since it can cause deadlock.
The ia64_mca_modify_original_stack is called from both of mca handler and
init handler, so it should use mprintk instead of printk.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Itanium processors can handle some misaligned data accesses. They
also provide a mode where all such accesses are forced to trap. The
kernel was schizophrenic about use of this mode:
* Base kernel code ran in permissive mode where the only traps
generated were from those cases that the h/w could not handle.
* Interrupt, syscall and trap code ran in strict mode where all
unaligned accesses caused traps to the 0x5a00 unaligned reference
vector.
Use strict alignment checking throughout the kernel, but make
sure that we continue to let user mode use more relaxed mode
as the default.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
IA64 kdump kernel failed to initialize /proc/vmcore in 2.6.28-rc2.
A bug was introduced in this patch commit:
d9a9855d0b
always reserve elfcore header memory in crash kernel
The problem was that the call to reserve_elfcorehdr() should be placed
in CONFIG_CRASH_DUMP rather than in CONFIG_CRASH_KERNEL, which does
not exist.
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Simon Hormon <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes a regression introduced by 2c6e6db41f
"Minimize per_cpu reservations." That patch incorrectly used information about
what CPUs are possible that was not yet initialized by ACPI. The end result
was that per_cpu structures for offline CPUs were not initialized causing a
NULL pointer reference.
Since we cannot do the full acpi_boot_init() call any earlier, the simplest
fix is to just parse the MADT for SAPIC entries early to find the CPU
info. This should also allow for some cleanup of the code added by the
"Minimize per_cpu reservations". This patch just fixes the regressions, the
cleanup will come in a later patch.
Signed-off-by: Doug Chapman <doug.chapman@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
CC: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.
So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On Thu, Oct 23, 2008 at 04:09:52PM -0700, Alexander Beregalov wrote:
> arch/x86/kernel/built-in.o: In function `iommu_setup':
> pci-dma.c:(.init.text+0x36ad): undefined reference to `forbid_dac'
> pci-dma.c:(.init.text+0x36cc): undefined reference to `forbid_dac'
> pci-dma.c:(.init.text+0x3711): undefined reference to `forbid_dac
This patch partially reverts a patch to add IOMMU support to ia64. The
forbid_dac variable was incorrectly moved to quirks.c, which isn't built
when PCI is disabled.
Tested-by: "Alexander Beregalov" <a.beregalov@gmail.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: (41 commits)
[IA64] Fix annoying IA64_TR_ALLOC_MAX message.
[IA64] kill sys32_pipe
[IA64] remove sys32_pause
[IA64] Add Variable Page Size and IA64 Support in Intel IOMMU
ia64/pv_ops: paravirtualized instruction checker.
ia64/xen: a recipe for using xen/ia64 with pv_ops.
ia64/pv_ops: update Kconfig for paravirtualized guest and xen.
ia64/xen: preliminary support for save/restore.
ia64/xen: define xen machine vector for domU.
ia64/pv_ops/xen: implement xen pv_time_ops.
ia64/pv_ops/xen: implement xen pv_irq_ops.
ia64/pv_ops/xen: define the nubmer of irqs which xen needs.
ia64/pv_ops/xen: implement xen pv_iosapic_ops.
ia64/pv_ops/xen: paravirtualize entry.S for ia64/xen.
ia64/pv_ops/xen: paravirtualize ivt.S for xen.
ia64/pv_ops/xen: paravirtualize DO_SAVE_MIN for xen.
ia64/pv_ops/xen: define xen paravirtualized instructions for hand written assembly code
ia64/pv_ops/xen: define xen pv_cpu_ops.
ia64/pv_ops/xen: define xen pv_init_ops for various xen initialization.
ia64/pv_ops/xen: elf note based xen startup.
...
elfcore header memory needs to be reserved in a crash kernel. This means
that the relevant code should be protected by CONFIG_CRASH_DUMP rather
than CONFIG_PROC_VMCORE.
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The usage of elfcorehdr_addr has changed recently such that being set to
ELFCORE_ADDR_MAX is used by is_kdump_kernel() to indicate if the code is
executing in a kernel executed as a crash kernel.
However, arch/ia64/kernel/setup.c:reserve_elfcorehdr will rest
elfcorehdr_addr to ELFCORE_ADDR_MAX on error, which means any subsequent
calls to is_kdump_kernel() will return 0, even though they should return
1.
Ok, at this point in time there are no subsequent calls, but I think its
fair to say that there is ample scope for error or at the very least
confusion.
This patch add an extra state, ELFCORE_ADDR_ERR, which indicates that
elfcorehdr_addr was passed on the command line, and thus execution is
taking place in a crashdump kernel, but vmcore can't be used for some
reason. This is tested for using is_vmcore_usable() and set using
vmcore_unusable(). A subsequent patch makes use of this new code.
To summarise, the states that elfcorehdr_addr can now be in are as follows:
ELFCORE_ADDR_MAX: not a crashdump kernel
ELFCORE_ADDR_ERR: crashdump kernel but vmcore is unusable
any other value: crash dump kernel and vmcore is usable
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o elfcorehdr_addr is used by not only the code under CONFIG_PROC_VMCORE
but also by the code which is not inside CONFIG_PROC_VMCORE. For
example, is_kdump_kernel() is used by powerpc code to determine if
kernel is booting after a panic then use previous kernel's TCE table.
So even if CONFIG_PROC_VMCORE is not set in second kernel, one should be
able to correctly determine that we are booting after a panic and setup
calgary iommu accordingly.
o So remove the assumption that elfcorehdr_addr is under
CONFIG_PROC_VMCORE.
o Move definition of elfcorehdr_addr to arch dependent crash files.
(Unfortunately crash dump does not have an arch independent file
otherwise that would have been the best place).
o kexec.c is not the right place as one can Have CRASH_DUMP enabled in
second kernel without KEXEC being enabled.
o I don't see sh setup code parsing the command line for
elfcorehdr_addr. I am wondering how does vmcore interface work on sh.
Anyway, I am atleast defining elfcoredhr_addr so that compilation is not
broken on sh.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch contains Intel IOMMU IA64 specific code. It defines new
machvec dig_vtd, hooks for IOMMU, DMAR table detection, cache line flush
function, etc.
For a generic kernel with CONFIG_DMAR=y, if Intel IOMMU is detected,
dig_vtd is used for machinve vector. Otherwise, kernel falls back to
dig machine vector. Kernel parameter "machvec=dig" or "intel_iommu=off"
can be used to force kernel to boot dig machine vector.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch implements a checker to detect instructions which
should be paravirtualized instead of direct writing raw instruction.
This patch does rough check so that it doesn't fully cover all cases,
but it can detects most cases of paravirtualization breakage of hand
written assembly codes.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
define arch/ia64/include/asm/xen/irq.h to define the number of
irqs which xen needs.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch enables elf note based xen startup for IA-64, which gives the
kernel an early hint for running on xen like x86 case.
In order to avoid the multi entry point, presumably extending booting
protocol(i.e. extending struct ia64_boot_param) would be necessary.
It probably means that elilo also needs modification.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
eliminate the function declaration ia64_cpu_local_tick() in
process.c by defining in arch/ia64/include/asm/timex.h
The same function will be used in a different .c file later.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The macro get_irq_chip() is defined in linux/include/linux/irq.h
which cause name conflict with one in linux/arch/ia64/include/asm/paravirt.h.
rename the latter to __get_irq_chip().
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When CONFIG_SMP=n, three instruction in ivt.S were missed to paravirtualize.
paravirtualize them.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Initial fix for making sure that we can access percpu variables
in all C code (commit: 10617bbe84)
inadvertantly allocated the memory in the "percpu" section of
the vmlinux ELF executable. This confused kexec/dump.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently a memory segment in memory map with attribute of EFI_MEMORY_UC
is denoted as "System RAM" in /proc/iomem, while memory of attribute
(EFI_MEMORY_WB|EFI_MEMORY_UC) is also labeled the same.
The kexec utility then includes uncached memory as part of vmcore. The
kdump kernel MCA'ed when it tries to save the vmcore to a disk. A normal
"cached" access may cause MCAs.
This patch would label memory with attribute of EFI_MEMORY_UC only as
"Uncached RAM" so that kexec would know not to include it in the vmcore.
I will submit a separate kexec-tools patch to the kexec list.
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Peter Chubb reported that commit 3463a93def
(Update check_sal_cache_flush to use platform_send_ipi()) broke
Ski because it does not implement IPIs.
Tony Luck suggested we just #ifndef out the call (since the simulator
does not have the SAL bug that this code is attempting to detect and
workaround)
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make ia64 refrain from clearing a given to-be-offlined CPU's bit in the
cpu_online_mask until it has processed pending irqs. This change
prevents other CPUs from being blindsided by an apparently offline CPU
nevertheless changing globally visible state. Also remove the existing
redundant cpu_clear(cpu, cpu_online_map).
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Broke the non modular builds by moving an essential function into
modules.c. Fix this by moving it out again and into asm/sections.h as
an inline. To do this, the definitions of struct fdesc and struct
got_val have been lifted out of modules.c and put in asm/elf.h where
they belong.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It was introduced by "vsprintf: add support for '%pS' and '%pF' pointer
formats" in commit 0fe1ef24f7. However,
the current way its coded doesn't work on parisc64. For two reasons: 1)
parisc isn't in the #ifdef and 2) parisc has a different format for
function descriptors
Make dereference_function_descriptor() more accommodating by allowing
architecture overrides. I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now, there is no notifier that is called on a new cpu, before the new
cpu begins processing interrupts/softirqs.
Various kernel function would need that notification, e.g. kvm works around
by calling smp_call_function_single(), rcu polls cpu_online_map.
The patch adds a CPU_STARTING notification. It also adds a helper function
that sends the message to all cpu_chain handlers.
Tested on x86-64.
All other archs are untested. Especially on sparc, I'm not sure if I got
it right.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CONFIG_SFC=m uses topology_core_siblings() which, for ia64, expects
cpu_core_map to be exported. It is not. This patch exports the needed
symbol.
Maintainers note: This really looks like the wrong thing to do ... it
would be much better for the kernel to export an API to provide
drivers like this with data they need (which in the case of this
driver seems to be an estimate of the effective parallelism available
on the platform). But x86 has exported this forever ... so go with
the flow until such an API is defined.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Making allmodconfig will break the current build. This patch shrinks
the per_cpu__shadow_flush_counts from 16k to 8k which frees enough space
to allow allmodconfig to successfully complete.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=11338
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64 handles per-cpu variables a litle differently from other architectures
in that it maps the physical memory allocated for each cpu at a constant
virtual address (0xffffffffffff0000). This mapping is not enabled until
the architecture specific cpu_init() function is run, which causes problems
since some generic code is run before this point. In particular when
CONFIG_PRINTK_TIME is enabled, the boot cpu will trap on the access to
per-cpu memory at the first printk() call so the boot will fail without
the kernel printing anything to the console.
Fix this by allocating percpu memory for cpu0 in the kernel data section
and doing all initialization to enable percpu access in head.S before
calling any generic code.
Other cpus must take care not to access per-cpu variables too early, but
their code path from start_secondary() to cpu_init() is all in arch/ia64
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/kernel/vmlinux.lds is a generated file. Tell
git to ignore it.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Recent kernels are not booting on some HP systems (though
it does boot on others). James and Willy reported the
problem. James did the bisection to find the commit
that caused the problem:
498c517047.
[IA64] pvops: paravirtualize ivt.S
Two instructions were wrongly paravirtualized such that
_FROM_ macro had been used where _TO_ was intended
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Wilcox, Matthew R" <matthew.r.wilcox@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
After moving the the include files there were a few clean-ups:
1) Some files used #include <asm-ia64/xyz.h>, changed to <asm/xyz.h>
2) Some comments alerted maintainers to look at various header files to
make matching updates if certain code were to be changed. Updated these
comments to use the new include paths.
3) Some header files mentioned their own names in initial comments. Just
deleted these self references.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This extends wait_task_inactive() with a new argument so it can be used in
a "soft" mode where it will check for the task changing state unexpectedly
and back off. There is no change to existing callers. This lays the
groundwork to allow robust, noninvasive tracing that can try to sample a
blocked thread but back off safely if it wakes up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table. We have one
global kretprobe lock to serialise the access to these lists. This causes
only one kretprobe handler to execute at a time. Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).
Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.
Solution:
1) Instead of having one global lock to protect kretprobe instances
present in kretprobe object and kretprobe hash table. We will have
two locks, one lock for protecting kretprobe hash table and another
lock for kretporbe object.
2) We hold lock present in kretprobe object while we modify kretprobe
instance in kretprobe object and we hold per-hash-list lock while
modifying kretprobe instances present in that hash list. To prevent
deadlock, we never grab a per-hash-list lock while holding a kretprobe
lock.
3) We can remove used_instances from struct kretprobe, as we can
track used instances of kretprobe instances using kretprobe hash
table.
Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.
cacheline non-cacheline Un-patched kernel
aligned patch aligned patch
===============================================================================
real 9m46.784s 9m54.412s 10m2.450s
user 40m5.715s 40m7.142s 40m4.273s
sys 2m57.754s 2m58.583s 3m17.430s
===========================================================
Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real 9m26.389s
user 40m8.775s
sys 2m7.283s
=========================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value. This patch implements
the handling of O_CLOEXEC for the flag. I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation. I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.
The implementation introduces do_pipe_flags. I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code. To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags. Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_pipe2
# ifdef __x86_64__
# define __NR_pipe2 293
# elif defined __i386__
# define __NR_pipe2 331
# else
# error "need __NR_pipe2"
# endif
#endif
int
main (void)
{
int fd[2];
if (syscall (__NR_pipe2, fd, 0) != 0)
{
puts ("pipe2(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
{
puts ("pipe2(O_CLOEXEC) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
acpi_map_lsapic tries to stuff a long into ia64_cpu_to_sapicid[],
which can only hold ints, so let's fix that.
We need to update the signature of acpi_map_cpu2node() too.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
module_free() refers the first parameter before checking.
But it is called like below(in kernel/kprobes). The first parameter is always NULL.
This happens when many probe points(>1024) are set by kprobes.
I encountered this with using SystemTap. It can set many probes easily.
static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
{
...
if (kip->nused == 0) {
hlist_del(&kip->hlist);
if (hlist_empty(&kprobe_insn_pages)) {
...
} else {
module_free(NULL, kip->insns); //<<< 1st param always NULL
kfree(kip);
}
return 1;
}
return 0;
}
Signed-off-by: Akiyama, Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When dprintk is enabled the following warnings are generated:
arch/ia64/kernel/cpufreq/acpi-cpufreq.c: In function 'processor_set_pstate':
arch/ia64/kernel/cpufreq/acpi-cpufreq.c:54: warning: format '%x' expects type 'unsigned int', but argumen
t 3 has type 's64'
arch/ia64/kernel/cpufreq/acpi-cpufreq.c: In function 'processor_get_pstate':
arch/ia64/kernel/cpufreq/acpi-cpufreq.c:76: warning: format '%x' expects type 'unsigned int', but argumen
t 2 has type 's64'
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
"idle=nomwait" disables the use of the MWAIT
instruction from both C1 (C1_FFH) and deeper (C2C3_FFH)
C-states.
When MWAIT is unavailable, the BIOS and OS generally
negotiate to use the HALT instruction for C1,
and use IO accesses for deeper C-states.
This option is useful for power and performance
comparisons, and also to work around BIOS bugs
where broken MWAIT support is advertised.
http://bugzilla.kernel.org/show_bug.cgi?id=10807http://bugzilla.kernel.org/show_bug.cgi?id=10914
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
"idle=halt" limits the idle loop to using
the halt instruction. No MWAIT, no IO accesses,
no C-states deeper than C1.
If something is broken in the idle code,
"idle=halt" is a less severe workaround
than "idle=poll" which disables all power savings.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
The symbol account_system_vtime is used by the kvm module but
not exported. This breaks building with CONFIG_VIRT_CPU_ACCOUNTING
and CONFIG_KVM=m.
Signed-off-by: Doug Chapman <doug.chapman@hp.com>
Acked-by: Hidetosho Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On a system where there are no hot pluggable cpus "additional_cpus"
is still set to -1 at the point where we call per_cpu_scan_finalize().
If we didn't find an SRAT table and so pick the default "32" for the
number of cpus, when we get to:
high_cpu = min(high_cpu + reserve_cpus, NR_CPUS);
we will end up initializing for just 31 cpus ... and so we will
die horribly when bringing up cpu#32.
Problem introduced by: 2c6e6db41f
"Minimize per_cpu reservations."
Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This converts ia64 to use the new helpers for smp_call_function() and
friends, and adds support for smp_call_function_single().
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
As noted by Akinobu Mita alloc_bootmem and related functions never return
NULL and always return a zeroed region of memory. Thus a NULL test or
memset after calls to these functions is unnecessary.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Call check_sal_cache_flush() after platform_setup() as
check_sal_cache_flush() now relies on being able to call platform
vector code.
Problem was introduced by: 3463a93def
"Update check_sal_cache_flush to use platform_send_ipi()"
Signed-off-by: Jes Sorensen <jes@sgi.com>
Tested-by: Alex Chiang: <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
check_sal_cache_flush is used to detect broken firmware that drops
pending interrupts.
The old implementation schedules a timer interrupt for itself in
the future by getting the current value of the Interval Timer
Counter + 1000 cycles, waits for the interrupt to be pended, calls
SAL_CACHE_FLUSH, and finally checks to see if the interrupt is
still pending.
This implementation can cause problems for virtual machine code if
the process of scheduling the timer interrupt takes more than 1000
cycles; the virtual machine can end up sleeping for several hundred
years while waiting for the ITC to wrap around.
The fix is to use platform_send_ipi. The processor will still send
an interrupt to itself, using the IA64_IPI_DM_INT delivery mode,
which causes the IPI to look like an external interrupt. The rest
of the SAL_CACHE_FLUSH + checking to see if the interrupt is still
pending remains unchanged.
This fix has been boot tested successfully on:
- intel tiger2
- hp rx6600
- hp rx5670
The rx5670 has known buggy firmware, where SAL_CACHE_FLUSH drops
pending interrupts. A boot test on this machine showed this message
on the console:
SAL: SAL_CACHE_FLUSH drops interrupts; PAL_CACHE_FLUSH will be used instead
Which proves that the self-inflicted IPI approach is viable. And
as expected, the other tested platforms correctly did not display
the warning.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is a SLIT sanity checking patch. It moves slit_valid() function to
generic ACPI code and does sanity checking for both x86 and ia64. It sets up
node_distance with LOCAL_DISTANCE and REMOTE_DISTANCE when hitting invalid
SLIT table on ia64. It also cleans up unused variable localities in
acpi_parse_slit() on x86.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Move the cleanup of the async queue to the close callback from the flush
callback. This avoids losing asynchronous overflow notifications when
the file descriptor is shared by multiple processes and one terminates.
Signed-off-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
move interrupt, page_fault, non_syscall, dispatch_unaligned_handler and
dispatch_to_fault_handler to avoid lack of instructin space.
The change set 4dcc29e157 bloated
SAVE_MIN_WITH_COVER, SAVE_MIN_WITH_COVER_R19 so that it bloated the
functions which uses those macros.
In the native case, only dispatch_illegal_op_fault had to be moved.
When paravirtualized case the all functions which use the macros need
to be moved to avoid the lack of space.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Introduce pv_time_ops which adds hook to steal time accounting.
On virtualized environment, cpus are shared by many guests and
steal time is the time which is used for other guests.
On virtualized environtment, streal time should be accounted.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_irq_ops which adds hooks to paravirtualize irq related
operations.
On virtualized environment, interruption may be replaced by something
virtualization friendly. So the irq related operation also may need
paravirtualization.
This patch adds necessary hooks to paravirtualize irq related operations.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
add hooks to paravirtualize iosapic which is a real hardware resource.
On virtualized environment it may be replaced something virtualized
friendly.
Define pv_iosapic_ops and add the hooks.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
define pv_init_ops hooks which represents various initialization
hooks for paravirtualized environment. and add hooks.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make NR_IRQ overridable by each pv instances.
Pv instance may need each own number of irqs so that
NR_IRQS should be the maximum number of nr_irqs each
pv instances need.
Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize ia64_swtich_to, ia64_leave_syscall and ia64_leave_kernel.
They include sensitive or performance critical privileged instructions
so that they need paravirtualization.
To paravirtualize them by single source and multi compile
they are converted into indirect jump. And define each pv instances.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: "Dong, Eddie" <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize ivt.S which implements fault handler in hand written
assembly code.
They includes sensitive or performance critical privileged instructions.
So they need paravirtualization.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: tgingold@free.fr
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize minstate.h which are hand written assembly code.
They include sensitive or performance critical privileged
instructions. So that they are appropriate for paravirtualization.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Preparation for paravirtualization of hand written assembly code.
They are paravirtualized by single source code and compiled multi times.
To tell those files for target (including native), add one defines.
Cc: "Dong, Eddie" <eddie.dong@intel.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: tgingold@free.fr
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_cpu_ops to paravirtualize privleged instructions
which are defined by ia64 intrinsics.
make them indirect C function calls by introducing function
tables, pv_cpu_ops.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds a setup hook in the very early boot sequence
before start_kernel() to initialize paravirtualization stuff.
The hook will be set by each pv loader code or by using multi entry point.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_info which describes some randome info about
underlying execution environment.
Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Move the LOAD_OFFSET definition from vmlinux.lds.S into system.h.
On paravirtualized environments, it is necessary to detect the
execution environment. One of the solutions is the multi entry point.
The multi entry point allows a boot loader to start the kernel execution
from the entry point which is different from the ELF entry point.
The non standard entry point will defined as the specialized elf note
which contains the LMA of the entry point symbol.
The constant, LOAD_OFFSET, is necessary to calculate the symbol's LMA.
Move the definition into the public header file to make it available
to the multi entry point support.
Cc: "He, Qing" <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
remove extern declaration of handle_IPI() in irq_ia64.c.
Instead, declare it in asm-ia64/smp.h.
Later handle_IPI() will be referenced from another file.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Problem: An application violating the architectural rules regarding
operation dependencies and having specific Register Stack Engine (RSE)
state at the time of the violation, may result in an illegal operation
fault and invalid RSE state. Such faults may initiate a cascade of
repeated illegal operation faults within OS interruption handlers.
The specific behavior is OS dependent.
Implication: An application causing an illegal operation fault with
specific RSE state may result in a series of illegal operation faults
and an eventual OS stack overflow condition.
Workaround: OS interruption handlers that switch to kernel backing
store implement a check for invalid RSE state to avoid the series
of illegal operation faults.
The core of the workaround is the RSE_WORKAROUND code sequence
inserted into each invocation of the SAVE_MIN_WITH_COVER and
SAVE_MIN_WITH_COVER_R19 macros. This sequence includes hard-coded
constants that depend on the number of stacked physical registers
being 96. The rest of this patch consists of code to disable this
workaround should this not be the case (with the presumption that
if a future Itanium processor increases the number of registers, it
would also remove the need for this patch).
Move the start of the RBS up to a mod32 boundary to avoid some
corner cases.
The dispatch_illegal_op_fault code outgrew the spot it was
squatting in when built with this patch and CONFIG_VIRT_CPU_ACCOUNTING=y
Move it out to the end of the ivt.
Signed-off-by: Tony Luck <tony.luck@intel.com>
acpi_unregister_gsi() should "undo" what acpi_register_gsi() does.
On systems that have legacy interrupts, acpi_unregister_gsi erroneously calls
iosapci_unregister_intr() which is wrong to do and causes a loud warning.
acpi_unregister_gsi() should just return in these cases.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is only palinfo_handle_smp as (indirect) user of palinfo_smp_call (by
way of smp_call_function_single) and surely palinfo_handle_smp never pass
NULL as parameter for info.
Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix a typo, and coding style cleanups for pfm_handle_work().
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch does:
- make comment at next to resched check more robust
- move "re-check" comments to next to where change predicate regs
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
[Bug-fix for "[BUG?][2.6.25-mm1] sleeping during IRQ disabled"]
This patch does:
- enable interrupts before calling schedule() as same as others, ex. x86
- enable interrupts during ia64_do_signal() and ia64_sync_krbs()
- do_notify_resume_user() is still called with interrupts disabled, since
we can take short path of fsys_mode if-statement quickly.
- pfm_handle_work() is also called with interrupts disabled, since
it can deal interrupt mask within itself.
- fix/add some comments/notes
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The sequence executed in check_sal_cache_flush:
- pend a timer interrupt
- call SAL_CACHE_FLUSH
- see if interrupt is still pending
can hang HP machines with buggy SAL_CACHE_FLUSH implementations.
Provide a kernel command-line argument to allow users skip this
check if desired. Using this parameter will force ia64_sal_cache_flush
to call ia64_pal_cache_flush() instead of SAL_CACHE_FLUSH.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some IA64 machines map all cell-local memory above 4 GB (32 bit limit).
However, in most cases, the kernel needs some memory below that limit that is
DMA-capable. So in this machine configuration, the crashkernel will be reserved
above 4 GB.
For machines that use SWIOTLB implementation because they lack an I/O MMU
the low memory is required by the SWIOTLB implementation. In that case,
it doesn't make sense to reserve the crashkernel at all because it's unusable
for kdump.
A special case is the "hpzx1" machine vector. In theory, it has a I/O MMU, so
it can be booted above 4 GB. However, in the kdump case that is not possible
because of changeset 51b58e3e26:
On HP zx1 machines, the 'machvec=dig' parameter is needed for the kdump
kernel to avoid problems with the HP sba iommu. The problem is that during
the boot of the kdump kernel, the iommu is re-initialized, so in-flight DMA
from improperly shutdown drivers causes an IOTLB miss which leads to an
MCA. With kdump, the idea is to get into the kdump kernel with as little
code as we can, so shutting down drivers properly is not an option.
The workaround is to add 'machvec=dig' to the kdump kernel boot parameters.
This makes the kdump kernel avoid using the sba iommu altogether, leaving
the IOTLB intact. Any ongoing DMA falls harmlessly outside the kdump
kernel. After the kdump kernel reboots, all devices will have been
shutdown properly and DMA stopped.
This patch pushes that functionality into the sba iommu initialization
code, so that users won't have to find the obscure documentation telling
them about 'machvec=dig'.
This means that also for hpzx1 it's not possible to boot when all
memory is above the 4 GB limit. So the only machine vectors that can handle
this case are "sn2" and "uv".
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds the basic IA64 machvec infrastructure to support
the SGI "UV" platform.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Races galore... General rule: as soon as it's in descriptor table,
it's over; another thread might have started IO on it/dup2() it
elsewhere/dup2() something *over* it/etc. fd_install() is the very
last step one should take - it's a point of no return.
Besides, the damn thing leaked on failure exits...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define
our own set_restore_sigmask() function. This saves the costly
SMP-safe set_bit operation, which we do not need for the sigmask
flag since TIF_SIGPENDING always has to be set too.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix indenting of switch statement to follow CodingStyle, and
pull out handling of call_data into an inlined function.
I confirmed that applying this fix doesn't affect assembled code.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch silences:
WARNING: vmlinux.o(.text+0x44672): Section mismatch in
reference from the function arch_register_cpu() to the
function .cpuinit.text:register_cpu()
Changes are based on codes in arch/x86/kernel/topology.c
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes following warning:
WARNING: vmlinux.o(.exit.text+0xb1): Section mismatch in
reference from the function palinfo_exit() to the variable
.cpuinit.data:palinfo_cpu_notifier
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch shuts up the following:
WARNING: vmlinux.o(.text+0x7102): Section mismatch in
reference from the function fixup_irqs() to the function
.devinit.text:ia64_disable_timer()
Removing ia64_disable_timer() is safe because there are no functions
calling it other than the fixup_irqs(),
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch kills:
WARNING: vmlinux.o(.text+0x1702): Section mismatch in
reference from the function acpi_register_ioapic() to the
function .devinit.text:iosapic_init()
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks.
Those low bits are scarce. Renumber TIF_RESTORE_SIGMASK to free one up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Legacy HP ia64 platforms currently cannot provide
/proc/cpuinfo/physical_id due to legacy SAL/PAL implementations.
However, that physical topology information can be obtained
via ACPI.
Provide an interface that gives ACPI one last chance to provide
physical_id for these legacy platforms. This logic only comes
into play iff:
- ACPI actually provides slot information for the CPU
- we lack a valid socket_id
Otherwise, we don't do anything.
Since x86 uses the ACPI processor driver as well, we provide a nop
stub function for arch_fix_phys_package_id() in asm-x86/topology.h
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Commit 113134fcbc changed the flow of
control when calling PAL_LOGICAL_TO_PHYSICAL and SAL_PHYSICAL_ID_INFO.
With the change, if a platform did not implement the latter, a useless
printk would appear in the boot log:
ia64_sal_pltid failed with -1
So let's check the return code and only printk on a true error, and do
not print anything in the unimplemented case. While we're in there,
clean up some stylistic issues too.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Enable the uncached allocator to allocate multiple pages of contiguous
uncached memory.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- remove unused 'irq' argument from pfm_do_interrupt_handler()
- remove pointless cast to void*
- add KERN_xxx prefix to printk()
- remove braces around singleton C statement
- in tioce_provider.c, start tioce_dma_consistent() and
tioce_error_intr_handler() function declarations in column 0
This change's main purpose is to prepare for the patchset in
jgarzik/misc-2.6.git#irq-remove, that explores removal of the
never-used 'irq' argument in each interrupt handler.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are many notify_die() and almost all take same style with
ia64_mca_spin(). This patch defines macros and replace them all,
to reduce lines and to improve readability.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are 3 hooks in MCA handler, but this DIE_MCA_MONARCH_PROCESS
event does not notified other than for the first monarch.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While testing with CONFIG_VIRT_CPU_ACCOUNTING=y, I found that
I occasionally get very huge system time in some threads.
So I dug the issue and finally noticed that it was caused
because of an interrupt which interrupt in the following window:
> [arch/ia64/kernel/entry.S: (!CONFIG_PREEMPT && CONFIG_VIRT_CPU_ACCOUNTING)]
>
> ENTRY(ia64_leave_syscall)
> :
> (pUStk) rsm psr.i
> cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall
> (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk
> .work_processed_syscall:
> adds r2=PT(LOADRS)+16,r12
> (pUStk) mov.m r22=ar.itc // fetch time at leave
> adds r18=TI_FLAGS+IA64_TASK_SIZE,r13
> ;;
> <<< window: from here >>>
> (p6) ld4 r31=[r18] // load current_thread_info()->flags
> ld8 r19=[r2],PT(B6)-PT(LOADRS)
> adds r3=PT(AR_BSPSTORE)+16,r12
> ;;
> mov r16=ar.bsp
> ld8 r18=[r2],PT(R9)-PT(B6)
> (p6) and r15=TIF_WORK_MASK,r31 // any work other than TIF_SYSCALL_TRACE?
> ;;
> ld8 r23=[r3],PT(R11)-PT(AR_BSPSTORE)
> (p6) cmp4.ne.unc p6,p0=r15, r0 // any special work pending?
> (p6) br.cond.spnt .work_pending_syscall
> ;;
> ld8 r9=[r2],PT(CR_IPSR)-PT(R9)
> ld8 r11=[r3],PT(CR_IIP)-PT(R11)
> (pNonSys) break 0 // bug check: we shouldn't be here if pNonSys is TRUE!
> ;;
> invala
> <<< window: to here >>>
> rsm psr.i | psr.ic // turn off interrupts and interruption collection
If pUStk is true, it means we are going to return user mode, hence we fetch
ar.itc to get time at leave from system.
It seems that it is not possible to interrupt the window if pUStk is true,
because interrupts are disabled early. And also disabling interrupt makes
sense because it is safe for referring current_thread_info()->flags.
However interrupting the window while pUStk is true was possible.
The route was:
ia64_trace_syscall
-> .work_pending_syscall_end
-> .work_processed_syscall
Only in case entering the window from this route, interrupts are enabled
during in the window even if pUStk is true. I suppose interrupts must be
disabled here anyway if pUStk is true.
I'm not sure but afraid that what kind of bad effect were there, other
than crazy system time which I found.
FYI, there was a commit 6f6d75825d that
points out a bug at same point(exit of ia64_trace_syscall) in 2006.
It can be said that there was an another bug.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
This patch fixes the problem that kdump by INIT does not work if we use
makedumpfile. The problem is that after INIT is issued, 2nd kernel
starts and makedumpfile fails with the following error message.
/proc/vmcore doesn't contain vmcoreinfo.
'-x' or '-i' must be specified.
makedumpfile Failed.
The cause of this problem is that kernel does not call
crash_save_vmcoreinfo. When kdump starts by panic or sysrq-trigger,
crash_save_vmcoreinfo is called by crash_kexec. But this function is not
called when kdump starts by INIT. The Attached patch fixes this.
This patch just adds crash_save_vmcoreinfo into machine_kdump_on_init so
that crash_save_vmcoreinfo can be called when kdump starts by INIT.
I tested this patch with linux-2.6.25-rc9 and I confirmed it worked.
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is a NUMA memory configuration issue in 2.6.24:
A 2-node machine of ours has got the following memory layout:
Node 0: 0 - 2 Gbytes
Node 0: 4 - 8 Gbytes
Node 1: 8 - 16 Gbytes
Node 0: 16 - 18 Gbytes
"efi_memmap_init()" merges the three last ranges into one.
"register_active_ranges()" is called as follows:
efi_memmap_walk(register_active_ranges, NULL);
i.e. once for the 4 - 18 Gbytes range. It picks up the node
number from the start address, and registers all the memory for
the node #0.
"register_active_ranges()" should be called as follows to
make sure there is no merged address range at its entry:
efi_memmap_walk(filter_memory, register_active_ranges);
"filter_memory()" is similar to "filter_rsvd_memory()",
but the reserved memory ranges are not filtered out.
Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The functions time_before, time_before_eq, time_after, and time_after_eq are
more robust for comparing jiffies against other values.
So use the time_after() & time_before() macros, defined at linux/jiffies.h,
which deal with wrapping correctly
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add kprobe-booster support on ia64.
Kprobe-booster improves the performance of kprobes by eliminating single-step,
where possible. Currently, kprobe-booster is implemented on x86 and x86-64.
This is an ia64 port.
On ia64, kprobe-booster executes a copied bundle directly, instead of single
stepping. Bundles which have B or X unit and which may cause an exception
(including break) are not executed directly. And also, to prevent hitting
break exceptions on the copied bundle, only the hindmost kprobe is executed
directly if several kprobes share a bundle and are placed in different slots.
Note: set_brl_inst() is used for preparing an instruction buffer(it does not
modify any active code), so it does not need any atomic operation.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: bibo,mao <bibo.mao@intel.com>
Cc: Rusty Lynch <rusty.lynch@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The sys_getpid() and sys_set_tid_address() behavior changed from
return current->tgid
to
struct pid *pid;
pid = current->pids[PIDTYPE_PID].pid;
return pid->numbers[pid->level].nr;
But the fast system calls on ia64 still operate the old way. Patch them
appropriately to let ia64 work with pid namespaces. Besides, this is one more
step in deprecating of pid and tgid on task_struct.
The fsys_getppid() is to be patched as well, but its logic is much
more complex now, so I will make it later.
One thing I'm not 100% sure is the trick with the IA64_UPID_SHIFT. On order
to access the pid->level's element of an array I have to perform the following
calculations
pid + sizeof(struct upid) * pid->level
The problem is that ia64 can only multiply float point registers, while all
the offsets I have in code are in rXX ones. Fortunately, the sizeof(struct
upid) is 32 bytes on ia64 (and is very unlikely to ever change), so the
calculations get simpler:
pid + pid->level << 5
So, I introduce the IA64_UPID_SHIFT and use the shl instruction. I also
looked at how gcc compiles the similar place and found that it makes it with
shift as well. Is this OK to do so?
Tested with ski emulator with 2.6.24 kernel, but fits 2.6.25-rc4 and
2.6.25-rc4-mm1 as well.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: David Mosberger-Tang <davidm@hpl.hp.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
do_each_thread/while_each_thread is a double loop, so
should use 'goto' rather than 'break' to break out
the loop.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
One should normally unlock in the reverse order of the lock calls,
and in this case there certainly is no reason not to.
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While it is convenient that we can invoke kdump by asserting INIT
via button on chassis etc., there are some situations that invoking
kdump on fatal MCA is not welcomed rather than rebooting fast without
dump.
This patch adds a new flag 'kdump_on_fatal_mca' that is independent
from 'kdump_on_init' currently available. Adding this flag enable
us to turning on/off of kdump depend on the event, INIT and/or fatal
MCA. Default for this flag is to take the dump.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This attached patch significantly shrinks boot memory allocation on ia64.
It does this by not allocating per_cpu areas for cpus that can never
exist.
In the case where acpi does not have any numa node description of the
cpus, I defaulted to assigning the first 32 round-robin on the known
nodes.. For the !CONFIG_ACPI I used for_each_possible_cpu().
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The patch defines kernel parameter "nptcg=". The parameter overrides max number
of concurrent global TLB purges which is reported from either PAL_VM_SUMMARY or
SAL PALO.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
According to SDM2.2, Itanium supports multiple outstanding ptc.g instructions.
But current kernel function ia64_global_tlb_purge() uses a spinlock to serialize
ptc.g instructions issued by multiple processors. This serialization might have
scalability issue on a big SMP machine where many processors could purge TLB
in parallel.
The patch fixes this problem by issuing multiple ptc.g instructions in
ia64_global_tlb_purge(). It also adds support for the "PALO" table to get
a platform view of the max number of outstanding ptc.g instructions (which
may be different from the processor view found from PAL_VM_SUMMARY).
PALO specification can be found at: http://www.dig64.org/home/DIG64_PALO_R1_0.pdf
spinaphore implementation by Matthew Wilcox.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This interface provides more flexible functionality for smp
infrastructure ... e.g. KVM frequently needs to operate on
a subset of cpus.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Dynamic TR resource should be managed in the uniform way.
Add two interfaces for kernel:
ia64_itr_entry: Allocate a (pair of) TR for caller.
ia64_ptr_entry: Purge a (pair of ) TR by caller.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We have duplicate code to access registers (access_uarea and regset
way). They just have different layout, so remove duplicate code.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
After we have regset support, we can use CORE_DUMP_USE_REGSET.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is the 32-bit regset implementation under IA64. Basically register
read/write, which is derived from current ptrace register read/write.
This version added TLS support.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is the 64-bit regset implementation under IA64. Basically register
read/write, which is derived from current ptrace register read/write.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch does:
- Remove outdated comments (which someday I marked with "?").
- Reassemble instructions to fit them in fewer bundles.
- If McKinley Errata 9 workaround is not needed, the workaround
bundles will be patched out with NOPs. However it also not
needed to have a totally NOP bundle (nop * 3) before branch.
As a result, this makes the code path 3 (or 2) bundles shorter
(and remove 1 unnecessary stop bit). It seems to be 1% faster.
(10sec loop test, with nojitter @ Madison 1.5GHz x 4)
Before:
CPU 0: 0.14 (usecs) (0 errors / 69598875 iterations)
CPU 1: 0.14 (usecs) (0 errors / 69630721 iterations)
CPU 2: 0.14 (usecs) (0 errors / 69607850 iterations)
CPU 3: 0.14 (usecs) (0 errors / 69619832 iterations)
After:
CPU 0: 0.14 (usecs) (0 errors / 70257728 iterations)
CPU 1: 0.14 (usecs) (0 errors / 70309498 iterations)
CPU 2: 0.14 (usecs) (0 errors / 70280639 iterations)
CPU 3: 0.14 (usecs) (0 errors / 70260682 iterations)
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64 named their handler kprobes_fault_handler while all other
arches used kprobe_fault_handler. Change the function definition
and header declaration.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When EFI_DEBUG is defined to a non-zero value in arch/ia64/kernel/efi.c,
the efi memory regions are displayed. This patch enhances the
display code in a few ways:
1. Use TB, GB and MB as well as KB as units.
Although this introduces rounding errors (KB doesn't as
size is always a multiple of 4Kb), it does make
things a lot more readable.
Also as the range is also shown, it is possible to note the exact size
if it is important. In my experience, the size field is mostly useful
for getting a general idea of the size of a region.
On the rx2620 that I use, there actually is an 8TB region (though not
backed by physical memory, and 8TB really is a lot more readable than
8589934592KB.
2. pad the size field with leading spaces to further improve readability
...
... ( 8MB)
... ( 928MB)
... ( 3MB)
...
vs
...
... (8MB)
... (928MB)
... (3MB)
...
3. Pad the attr field out to 64bits using leading zeros,
to further improve readability.
...
mem05: type= 2, attr=0x0000000000000008, range=[0x0000000004000000-0x000000000481f000) ( 8MB)
mem06: type= 7, attr=0x0000000000000008, range=[0x000000000481f000-0x000000003e876000) ( 928MB)
mem07: type= 5, attr=0x8000000000000008, range=[0x000000003e876000-0x000000003eb8e000) ( 3MB)
mem08: type= 4, attr=0x0000000000000008, range=[0x000000003eb8e000-0x000000003ee7a000) ( 2MB)
...
...
mem05: type= 2, attr=0x8, range=[0x0000000004000000-0x000000000481f000) ( 8MB)
mem06: type= 7, attr=0x8, range=[0x000000000481f000-0x000000003e876000) ( 928MB)
mem07: type= 5, attr=0x8000000000000008, range=[0x000000003e876000-0x000000003eb8e000) ( 3MB)
mem08: type= 4, attr=0x8, range=[0x000000003eb8e000-0x000000003ee7a000) ( 2MB)
...
4. Use %d instead of %u for the index field, as i is a signed int.
N.B: This code is not compiled unless EFI_DEBUG is non 0.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
__FUNCTION__ is gcc-specific, use __func__
Long lines have been kept where they exist, some small spacing changes
have been done.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When !CONFIG_SMP, cpu_physical_id() is ia64_get_lid(), which is
functionally identical to
(ia64_getreg(_IA64_REG_CR_LID) >> 16) & 0xffff
so there's no need for two versions of this code.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove all code which does exactly the same thing as ptrace_request().
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
find_thread_for_addr() is no longer needed. It was only used to find
the correct kernel RBS for a given memory address, but since the kernel
RBS is not needed any longer, this function can go away.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Syncing is no longer needed, because user RBS is already
up-to-date. Actually, if a debugger modified the contents
of the original RBS prior to changing PT_AR_BSP, the
modifications would get overwritten.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Because the user RBS of a process is now completely stored in
user-mode when the process is ptrace-stopped, accesses to the
RBS should no longer augment any part of the kernel RBS.
This means we can get rid of most ia64_peek() and ia64_poke()
calls.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes the following compile error with a recent gcc:
CC kernel/kprobes.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/kprobes.c:1066: error: __ksymtab_jprobe_return causes a section type conflict
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes regression introduced in 113134fcbc
Intel Tiger platforms hang when calling SAL_GET_PHYSICAL_ID_INFO
instead of properly returning -1 for unimplemented, so add a
version check.
SGI Altix platforms have an incorrect SAL version hard-coded into
their prom -- they encode 2.9, but actually implement 3.2 -- so
fix it up and allow ia64_sal_get_physical_id_info to keep
working.
Signed-off-by: Alex Chiang <achiang@hp.com>
Acked-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that the following error message is sometimes displayed
at irq migration when vector domain is enabled.
"Unexpected interrupt vector %d on CPU %d is not mapped to any IRQ!"
The cause of this problem is an interrupt is sent to the previous
target CPU after cleaning up vector to irq mapping table. To clean up
vector to irq map on the previous target CPU safty, change the irq
migration in multiple vector domain as follows. The original idea is
from x86 interrupt management code.
- Delay vector to irq table cleanup until the interrupts are sent
to new target CPUs. By this, it is ensured that target CPU is
completely changed on the interrupt controller side.
- Even after the interrupts are sent to new target CPUs, there can
be pended interrupts remaining on the previous target CPU. So we
need to delay clearning up vector to irq table until the pended
interrupt is handled. For this, send IPI to the previous target
CPU with lower priority vector and clean up vector to irq table
in its handler.
This patch affects only to irq migration code with multiple vector
domain is enabled.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The similar check has been added to x86_32(i386) in commit
id 83bd01024b.
So we add this check to ia64 and improve it a liitle bit in that
we need to check for stack overflow only when the signal is on stack.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch implements VIRT_CPU_ACCOUNTING for ia64,
which enable us to use more accurate cpu time accounting.
The VIRT_CPU_ACCOUNTING is an item of kernel config, which s390
and powerpc arch have. By turning this config on, these archs
change the mechanism of cpu time accounting from tick-sampling
based one to state-transition based one.
The state-transition based accounting is done by checking time
(cycle counter in processor) at every state-transition point,
such as entrance/exit of kernel, interrupt, softirq etc.
The difference between point to point is the actual time consumed
during in the state. There is no doubt about that this value is
more accurate than that of tick-sampling based accounting.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Fix large MCA bootmem allocation
[IA64] Simplify cpu_idle_wait
[IA64] Synchronize RBS on PTRACE_ATTACH
[IA64] Synchronize kernel RSE to user-space and back
[IA64] Rename TIF_PERFMON_WORK back to TIF_NOTIFY_RESUME
[IA64] Wire up timerfd_{create,settime,gettime} syscalls
The MCA code allocates bootmem memory for NR_CPUS, regardless
of how many cpus the system actually has. This change allocates
memory only for cpus that actually exist.
On my test system with NR_CPUS = 1024, reserved memory was reduced by 130944k.
Before: Memory: 27886976k/28111168k available (8282k code, 242304k reserved, 5928k data, 1792k init)
After: Memory: 28017920k/28111168k available (8282k code, 111360k reserved, 5928k data, 1792k init)
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is just Venki's patch[*] for x86 ported to ia64.
* http://marc.info/?l=linux-kernel&m=120249201318159&w=2
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When attaching to a stopped process, the RSE must be explicitly
synced to user-space, so the debugger can read the correct values.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
CC: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is base kernel patch for ptrace RSE bug. It's basically a backport
from the utrace RSE patch I sent out several weeks ago. please review.
when a thread is stopped (ptraced), debugger might change thread's user
stack (change memory directly), and we must avoid the RSE stored in
kernel to override user stack (user space's RSE is newer than kernel's
in the case). To workaround the issue, we copy kernel RSE to user RSE
before the task is stopped, so user RSE has updated data. we then copy
user RSE to kernel after the task is resummed from traced stop and
kernel will use the newer RSE to return to user.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
CC: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since the RSE synchronization will need a TIF_ flag, but all
work-to-be-done bits are already used, so we have to multiplex
TIF_NOTIFY_RESUME again.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix typo in comments.
BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise
checkpatch.pl will be complaining.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the configuration dependencies in the vmcoreinfo data.
i386's "node_data" is defined in arch/x86/mm/discontig_32.c,
and x86_64's one is defined in arch/x86/mm/numa_64.c.
They depend on CONFIG_NUMA:
arch/x86/mm/Makefile_32:7
obj-$(CONFIG_NUMA) += discontig_32.o
arch/x86/mm/Makefile_64:7
obj-$(CONFIG_NUMA) += numa_64.o
ia64's "pgdat_list" is defined in arch/ia64/mm/discontig.c,
and it depends on CONFIG_DISCONTIGMEM and CONFIG_SPARSEMEM:
arch/ia64/mm/Makefile:9-10
obj-$(CONFIG_DISCONTIGMEM) += discontig.o
obj-$(CONFIG_SPARSEMEM) += discontig.o
ia64's "node_memblk" is defined in arch/ia64/mm/numa.c,
and it depends on CONFIG_NUMA:
arch/ia64/mm/Makefile:8
obj-$(CONFIG_NUMA) += numa.o
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset is for the vmcoreinfo data.
The vmcoreinfo data has the minimum debugging information only for dump
filtering. makedumpfile (dump filtering command) gets it to distinguish
unnecessary pages, and makedumpfile creates a small dumpfile.
This patch:
VMCOREINFO_SIZE() should be renamed VMCOREINFO_STRUCT_SIZE() since it's always
returning the size of the struct with a given name. This change would allow
VMCOREINFO_TYPEDEF_SIZE() to simply become VMCOREINFO_SIZE() since it need not
be used exclusively for typedefs.
This discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0582.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
calibrate_delay() must be __cpuinit, not __{dev,}init.
I've verified that this is correct for all users.
While doing the latter, I also did the following cleanups:
- remove pointless additional prototypes in C files
- ensure all users #include <linux/delay.h>
This fixes the following section mismatches with CONFIG_HOTPLUG=n,
CONFIG_HOTPLUG_CPU=y:
WARNING: vmlinux.o(.text+0x1128d): Section mismatch: reference to .init.text.1:calibrate_delay (between 'check_cx686_slop' and 'set_cx86_reorder')
WARNING: vmlinux.o(.text+0x25102): Section mismatch: reference to .init.text.1:calibrate_delay (between 'smp_callin' and 'cpu_coregroup_map')
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christian Zankel <chris@zankel.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] make pfm_get_task work with virtual pids
[IA64] honor notify_die() returning NOTIFY_STOP
[IA64] remove dead code: __cpu_{down,die} from !HOTPLUG_CPU
[IA64] Appoint kvm/ia64 Maintainers
[IA64] ia64_set_psr should use srlz.i
[IA64] Export three symbols for module use
[IA64] mca style cleanup
[IA64] sn_hwperf semaphore to mutex
[IA64] generalize attribute of fsyscall_gtod_data
[IA64] efi.c Add /* never reached */ annotation
[IA64] efi.c Spelling/punctuation fixes
[IA64] Make efi.c mostly fit in 80 columns
[IA64] aliasing-test: fix gcc warnings on non-ia64
[IA64] Slim-down __clear_bit_unlock
[IA64] Fix the order of atomic operations in restore_previous_kprobes on ia64
[IA64] constify function pointer tables
[IA64] fix userspace compile error in gcc_intrin.h
This is the new timerfd API as it is implemented by the following patch:
int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);
The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).
The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:
http://www.xmailserver.org/timerfd-test2.c
[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This pid comes from user space, so treat it accordingly.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This requires making die() and die_if_kernel() return a value, and their
callers to honor this (and be prepared that it returns).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Neither __cpu_down() nor __cpu_die() are being referenced without
CONFIG_HOTPLUG_CPU.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The only in kernel use of ia64_set_psr() needs to follow
it with a srlz.i (since it is changing state for PSR.ic).
So it is pointless to issue srlz.d inside this function.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since kvm/module needs to use some unexported functions in kernel,
so export them with this patch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In an ordinary way,
> } __attribute__ ((aligned (L1_CACHE_BYTES)));
should be
> } ____cacheline_aligned;
to save some bytes on an uni-processor.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As written, this loop could be for (;;) instead of do while (md). The tests
inside the loop always result in a return so the loop never terminates normally.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Incorporates the suggestions from Peter Chubb the last time I submitted
this. This called for using the same verb tense in the couple of preceding
comments as well.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is purely whitespace changes to make the code fit in 80
columns, plus fix some inconsistent indentation. The efi_guidcmp()
tests remain wider than 80-columns since that seems to be the most
clear.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the order of atomic operations to prevent overwriting prev_kprobe[0].
To pop values from stack, we must decrement stack index right AFTER
reading values.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ACPI_PDC_SMP_T_SWCOORD bit is set by and OS that is capable of
native ACPI throttling software coordination for mutli-processors
using the _TSD information.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
Remove commented-out code copied from NFS
NFS: Switch from intr mount option to TASK_KILLABLE
Add wait_for_completion_killable
Add wait_event_killable
Add schedule_timeout_killable
Use mutex_lock_killable in vfs_readdir
Add mutex_lock_killable
Use lock_page_killable
Add lock_page_killable
Add fatal_signal_pending
Add TASK_WAKEKILL
exit: Use task_is_*
signal: Use task_is_*
sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
ptrace: Use task_is_*
power: Use task_is_*
wait: Use TASK_NORMAL
proc/base.c: Use task_is_*
proc/array.c: Use TASK_REPORT
perfmon: Use task_is_*
...
Fixed up conflicts in NFS/sunrpc manually..
percpu_modcopy() is defined multiple times in arch files. However, the only
user is module.c. Put a static definition into module.c and remove
the definitions from the arch files.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- add support for PER_CPU_ATTRIBUTES
- fix generic smp percpu_modcopy to use per_cpu_offset() macro.
Add the ability to use generic/percpu even if the arch needs to override
several aspects of its operations. This will enable the use of generic
percpu.h for all arches.
An arch may define:
__per_cpu_offset Do not use the generic pointer array. Arch must
define per_cpu_offset(cpu) (used by x86_64, s390).
__my_cpu_offset Can be defined to provide an optimized way to determine
the offset for variables of the currently executing
processor. Used by ia64, x86_64, x86_32, sparc64, s/390.
SHIFT_PTR(ptr, offset) If an arch defines it then special handling
of pointer arithmentic may be implemented. Used
by s/390.
(Some of these special percpu arch implementations may be later consolidated
so that there are less cases to deal with.)
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch consolidate all definitions of .init.text, .init.data
and .exit.text, .exit.data section definitions in
the generic vmlinux.lds.h.
This is a preparational patch - alone it does not buy
us much good.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The compiler team did the hard work for this distilling a problem in
large fortran application which showed up when applied to a 290MB input
data set down to this instruction:
ldfd f34=[r17],-8
Which they noticed incremented r17 by 0x10 rather than decrementing it
by 8 when the value in r17 caused an unaligned data fault. I tracked
it down to some bad instruction decoding in unaligned.c. The code
assumes that the 'x' bit can determine whether the instruction is
an "ldf" or "ldfp" ... which it is for opcode=6 (see table 4-29 on
page 3:302 of the SDM). But for opcode=7 the 'x' bit is irrelevent,
all variants are "ldf" instructions (see table 4-36 on page 3:306).
Note also that interpreting the instruction as "ldfp" means that the
"paired" floating point register (f35 in the example here) will also
be corrupted.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently CMCI mask of hot-added CPU is always disabled after CPU hotplug.
We should adjust this mask depending on CMC polling state.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes an unused variable warning in mm/vmalloc.c.
Tony: also fix resulting fallout in uncached.c with a
typo in args to flush_tlb_kernel_range().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes the following assembler warning messages.
AS arch/ia64/kernel/head.o
arch/ia64/kernel/head.S: Assembler messages:
arch/ia64/kernel/head.S:1179: Warning: Use of 'ld8' violates RAW dependency 'CR[PTA]' (data)
arch/ia64/kernel/head.S:1179: Warning: Only the first path encountering the conflict is reported
arch/ia64/kernel/head.S:1178: Warning: This is the location of the conflicting usage
arch/ia64/kernel/head.S:1180: Warning: Use of 'ld8' violates RAW dependency 'CR[PTA]' (data)
arch/ia64/kernel/head.S:1180: Warning: Only the first path encountering the conflict is reported
arch/ia64/kernel/head.S:1178: Warning: This is the location of the conflicting usage
:
arch/ia64/kernel/head.S:1213: Warning: Use of 'ldf.fill.nta' violates RAW dependency 'CR[PTA]' (data)
arch/ia64/kernel/head.S:1213: Warning: Only the first path encountering the conflict is reported
arch/ia64/kernel/head.S:1178: Warning: This is the location of the conflicting usage
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes the following compiler warning messages.
CC arch/ia64/kernel/irq_ia64.o
arch/ia64/kernel/irq_ia64.c: In function 'create_irq':
arch/ia64/kernel/irq_ia64.c:343: warning: 'domain.bits[0u]' may be used uninitialized in this function
arch/ia64/kernel/irq_ia64.c: In function 'assign_irq_vector':
arch/ia64/kernel/irq_ia64.c:203: warning: 'domain.bits[0u]' may be used uninitialized in this function
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I tried to upgrade an IA32 chroot on my IA64 to a new glibc with TLS.
It kept dying because set_thread_area was returning -ESRCH
(bugs.debian.org/451939).
I instrumented arch/ia64/ia32/sys_ia32.c:get_free_idx() and ended up
seeing output like
[pid] idx desc->a desc->b
-----------------------------
[2710] 0 -> c6b0ffff 40dff31b
[2710] 1 -> 0 0
[2710] 2 -> 0 0
[2710] 0 -> c6b0ffff 40dff31b
[2710] 1 -> c6b0ffff 40dff31b
[2710] 2 -> 0 0
[2711] 0 -> c6b0ffff 40dff31b
[2711] 1 -> c6b0ffff 40dff31b
[2711] 2 -> 48c0ffff 40dff317
which suggested to me that TLS pointers were surviving exec() calls,
leading to GDT pointers filling up and the eventual failure of
get_free_idx().
I think the solution is flushing the tls array on exec.
Signed-Off-By: Ian Wienand <ianw@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ia64 oops message doesn't include the kernel version, which
makes it hard to automatically categorize oops messages scraped
from mailing lists and bug databases.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes some redundant code in the function setup_sigcontext().
The registers ar.ccv,b7,r14,ar.csd,ar.ssd,r2-r3 and r16-r31 are not
restored in restore_sigcontext() when (flags & IA64_SC_FLAG_IN_SYSCALL) is
true. So we don't need to zero those variables in setup_sigcontext().
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ACPI tables follow a tree structure in memory.
The root of the tree is the RSDP (Root System Description Pointer).
To find the RSDP, the OS searches for the signature "RSD PTR "
in well known physical memory locations. Then the OS computes
a table checksum to verify that the signature is really part
of a valid table header.
Some systems have a proper signature but an invalid checksum;
followed elsewhere by a proper signature with valid checksum.
http://bugzilla.kernel.org/show_bug.cgi?id=9444
The Linux RSDP scanning code bailed out on those systems
and as a result they booted with ACPI disabled.
Fix this by deleting the Linux RSDP scanning code and
plugging in the ACPICA RSDP scanning code.
Signed-off-by: Len Brown <len.brown@intel.com>
.. as it it used only during early boot.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
arch/ia64/kernel/acpi.c | 2 +-
arch/x86/kernel/acpi/boot.c | 4 ++--
drivers/acpi/osl.c | 3 ++-
3 files changed, 5 insertions(+), 4 deletions(-)
Signed-off-by: Len Brown <len.brown@intel.com>
If "CPEI Processor Override" bit is not set in "Platform Interrupt
Source Flags" in "Platform Interrupt Sources Structure" in ACPI MADT,
the target processor of CPEI is restricted to a specific CPU. Because
of this, the delivery mode for CPEI should be IOSAPIC_FIXED.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Restore regs->ccr_iip before kreturn probe handler runs. In this way, if
probe handler does unwind, unwind can correctly get the stack trace.
Fixes: http://sourceware.org/bugzilla/show_bug.cgi?id=5051
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
'!' has a higher priority than '&', so as was
this won't test the first bit, but rather evaluates to false for any non-zero
lsapic->lapic_flags.
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Macro efi_md_size is defined in efi.c, and here we apply it throughout
efi.c.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Rename _bss to __bss_start as on other architectures. That makes it
possible to use the <linux/sections.h> instead of own declarations. Also
add __bss_stop because that symbol exists on other architectures.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make some IOSAPIC functions static and remove one that is unused.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Not all the return value of __copy_from_user and
__put_user is checked.This patch fixed it.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
With the unionfs patch applied I get
ERROR: "copy_page" [fs/unionfs/unionfs.ko] undefined!
the other architectures (some, at least) export copy_page() so I guess ia64
should also do so.
To do this we need to move the copy_page() functions out of lib.a and into
built-in.o and add the EXPORT_SYMBOL().
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
/opt/crosstool/gcc-3.4.5-glibc-2.3.6/ia64-unknown-linux-gnu/lib/gcc/ia64-unknown-linux-gnu/3.4.5/../../../../ia64-unknown-linux-gnu/bin/ld: section .data.patch [a000000000000500 -> a000000000000507] overlaps section .dynamic [a0000000000003c8 -> a000000000000507]
This only appears to be a problem with strangely configured
cross-compilation ... native compilers don't have this issue.
But in the interests of helping others at least compile for
ia64, this can go in. -Tony
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
i386 and x86-64 registers System RAM as IORESOURCE_MEM | IORESOURCE_BUSY.
But ia64 registers it as IORESOURCE_MEM only.
In addition, memory hotplug code registers new memory as IORESOURCE_MEM too.
This difference causes a failure of memory unplug of x86-64. This patch
fixes it.
This patch adds IORESOURCE_BUSY to avoid potential overlap mapping by PCI
device.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Luck, Tony" <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On Altix (sn2) machines the "Error parsing MADT" message is
misleading because the lack of IOSAPIC entries is expected.
Since I am sure someone will ask, I have been told that
the chance of this changing anytime soon is close to nil.
Signed-off-by: George Beshers <gbeshers@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Newer Itanium versions have added additional processor feature set
bits. This patch prints all the implemented feature set bits. Some
bit descriptions have not been made public. For those bits, a generic
"Feature set X bit Y" message is printed. Bits that are not implemented
will no longer be printed.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that redirect hit bit in I/O SAPIC RTE is set even
when it must be disabled (e.g. nointroute boot option is set, CPU
hotplug is enabled or percpu vector is enabled).
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Clean up /proc/interrupts output on the system that has 10 or more
CPUs.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When the CPE handler encounters too many CPEs (such as a solid single
bit memory error), it sets up a polling timer and disables the CPE
interrupt (to avoid excessive overhead logging the stream of single
bit errors). disable_irq_nosync() calls chip->disable() to provide
a chipset specifiec interface for disabling the interrupt. This patch
adds the Altix specific support to disable and re-enable the CPE interrupt.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
No need to print "McKinley Errata 9 workaround not needed; disabling it"
on every non-McKinley Itanium, which at this point is almost all of them.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
If you build the kernel `in-place' then do a git update, git
complains about arch/ia64/kernel/gate.lds being modified and
untracked.
Add that (generated) file to a .gitignore file.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Not sizeof(ptr) ... we meant to say sizeof(*ptr).
Also moved the memset to the error path (the normal path overwrites
every field in the structure anyway) -Tony
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
New sanity checks in sysctl_check_table() complain about a couple
of mode 0755 that should be 0555 in the perfmon code:
sysctl table check failed: /kernel .1 Writable sysctl directory
sysctl table check failed: /kernel/perfmon Writable sysctl directory
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that pci_enable_msi() fails on ia64 platform. The cause of
this problem is incorrect return value of ia64_setup_msi_irq(). It must
return 0 on success, instead of irq number.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Clean up the process for presenting the "physical id" field in
/proc/cpuinfo.
- remove global smp_num_cpucores, as it is mostly useless
- remove check_for_logical_procs(), since we do the same
functionality in identify_siblings()
- reflow logic in identify_siblings(). If an older CPU
does not implement PAL_LOGICAL_TO_PHYSICAL, we may still
be able to get useful information from SAL_PHYSICAL_ID_INFO
- in identify_siblings(), threads/cores are a property of
the CPU, not the platform
- remove useless printk's about multi-core / thread
capability in identify_siblings(), as that information
is readily available in /proc/cpuinfo, and printing for
the BSP only adds little value
- smp_num_siblings is now meaningful if any CPU in the
system supports threads, not just the BSP
- expose "physical id" field, even on CPUs that are not
multi-core / multi-threaded (as long as we have a valid
value). Now we know what sockets Madisons live in too.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When gcc uses --build-id by default, the gate.lds.S linker script runs afoul
of the new note section and produces a bad DSO image. This fixes it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
vmcore_find_descriptor_size() is only called by
reserve_elfcorehdr(), which is in __init, so it seems to me that
vmcore_find_descriptor_size() should be there too.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add the BSS to the resource tree just as kernel text and kernel data are in
the resource tree. The main reason behind this is to avoid crashkernel
reservation in that area.
While it's not strictly necessary to have the BSS in the resource tree (the
actual collision detection is done in the reserve_bootmem() function before),
the usage of the BSS resource should be presented to the user in /proc/iomem
just as Kernel data and Kernel code.
Note: The patch currently is only implemented for x86 and ia64 (because
efi_initialize_iomem_resources() has the same signature on i386 and ia64).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adapts IA64 to use the generic parse_crashkernel() function instead
of its own parsing for the crashkernel command line.
Because the total amount of System RAM must be known when calling this
function, efi_memmap_init() is modified to return its accumulated total_memory
variable.
Also, the crashkernel handling is moved in an own function in
arch/ia64/kernel/setup.c to make the code more readable.
[kamalesh@linux.vnet.ibm.com: build fix]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the easiest things to isolate is the pid printed in kernel log.
There was a patch, that made this for arch-independent code, this one makes
so for arch/xxx files.
It took some time to cross-compile it, but hopefully these are all the
printks in arch code.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.
The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On platforms that copy sys_tz into the vdso (currently only x86_64, soon to
include powerpc), it is possible for the vdso to get out of sync if a user
calls (admittedly unusual) settimeofday(NULL, ptr).
This patch adds a hook for architectures that set
CONFIG_GENERIC_TIME_VSYSCALL to ensure when sys_tz is updated they can also
updatee their copy in the vdso.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/ia64/kernel/machine_kexec.c: In function `arch_crash_save_vmcoreinfo':
arch/ia64/kernel/machine_kexec.c:131: error: `pgdat_list' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:131: error: (Each undeclared identifier is reported only once
arch/ia64/kernel/machine_kexec.c:131: error: for each function it appears in.)
arch/ia64/kernel/machine_kexec.c:134: error: `node_memblk' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:135: error: `NR_NODE_MEMBLKS' undeclared (first use in this function)
arch/ia64/kernel/machine_kexec.c:136: error: invalid application of `sizeof' to incomplete type `node_memblk_s'
arch/ia64/kernel/machine_kexec.c:137: error: dereferencing pointer to incomplete type
arch/ia64/kernel/machine_kexec.c:138: error: dereferencing pointer to incomplete type
make[1]: *** [arch/ia64/kernel/machine_kexec.o] Error 1
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. Old vmcoreinfo macros
were defined as generic names SYMBOL/SIZE/OFFSET /LENGTH/CONFIG, and it is
impossible to grep for them. So these names should be changed. This
discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch set frees the restriction that makedumpfile users should install a
vmlinux file (including the debugging information) into each system.
makedumpfile command is the dump filtering feature for kdump. It creates a
small dumpfile by filtering unnecessary pages for the analysis. To
distinguish unnecessary pages, it needs a vmlinux file including the debugging
information. These days, the debugging package becomes a huge file, and it is
hard to install it into each system.
To solve the problem, kdump developers discussed it at lkml and kexec-ml. As
the result, we reached the conclusion that necessary information for dump
filtering (called "vmcoreinfo") should be embedded into the first kernel file
and it should be accessed through /proc/vmcore during the second kernel.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html)
Dan Aloni created the patch set for the above implementation.
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html)
And I updated it for multi architectures and memory models.
(http://lists.infradead.org/pipermail/kexec/2007-August/000479.html)
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d5a7430ddc missed a spot where we
use cpu_sibling_map and cpu_core_map. These don't exist on a
uni-processor build. Wrap #ifdef CONFIG_SMP ... #endif around it.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This cleans up the formatting in the vDSO linker script, mostly just the
use of whitespace. It's intended to approximate the kernel standard
conventions for indenting C, treating elements of the linker script about
like initialized variable definitions.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce architecture dependent kretprobe blacklists to prohibit users
from inserting return probes on the function in which kprobes can be
inserted but kretprobes can not.
This patch also removes "__kprobes" mark from "__switch_to" on x86_64 and
registers "__switch_to" to the blacklist on x86-64, because that mark is to
prohibit user from inserting only kretprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The checks for node_online in the uncached allocator are made to make sure
that memory is available on these nodes. Thus switch all the checks to use
N_HIGH_MEMORY and to N_ONLINE.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Bob Picco <bob.picco@hp.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly
from startup and CPU HOTPLUG functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the problem that kdump on INIT hung up if kdump kernel image is
not configured.
The kdump_init_notifier() on monarch CPU stops its operation at
DIE_INIT_MONARCH_LEAVE time if the kdump kernel image is not
configured. On the other hand, kdump_init_notifier() on non-monarch
CPUs get into spin because they don't know the fact the monarch stops
its operation. This is the cause of this problem. To fix this problem,
we need to check the kdump kernel image at the top of the
kdump_init_notifier() function.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that kdump on INIT causes a kernel panic if kdump
kernel image is not configured. The cause of this problem is
machine_kexec_on_init() is using printk in INIT context. It should
use ia64_mca_printk() instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The use of vector in ia64_machine_kexec() seems spurious,
and removing it simplifies the code slightly.
As suggested by Alex Williamson <alex.williamson@hp.com>
Cc: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Additional testing uncovered a situation where the MCA recovery code could
hang due to a race condition.
According to the SAL spec, SAL sends a rendezvous interrupt to all but the first
CPU that goes into MCA. This includes other CPUs that go into MCA at the same
time. Those other CPUs will go into the linux MCA handler (rather than the
slave loop) with the rendezvous interrupt pending. When all the CPUs have
completed MCA processing and the last monarch completes, freeing all the CPUs,
the CPUs with the pended rendezvous interrupt then go into the
ia64_mca_rendez_int_handler(). In ia64_mca_rendez_int_handler() the CPUs
get marked as rendezvoused, but then leave the handler (due to no MCA).
That leaves the CPUs marked as rendezvoused _before_ the next MCA event.
When the next MCA hits, the monarch will mistakenly believe that all the CPUs
are rendezvoused when they are not, opening up a window where a CPU can get
stuck in the slave loop.
This patch avoids leaving CPUs marked as rendezvoused when they are not.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While testing the MCA recovery code, noticed that some machines would have a
five second delay rendezvousing cpus. What was happening is that
ia64_wait_for_slaves() would check to see if all the slave CPUs had
rendezvoused. If any had not, it would wait 1 millisecond then check again.
If any CPUs had still not rendezvoused, it would wait 5 seconds before
checking again.
On some configs the rendezvous takes more than 1 millisecond, causing the code
to wait the full 5 seconds, even though the last CPU rendezvoused after only
a few milliseconds.
The fix is to check every 1 millisecond to see if all the cpus have
rendezvoused. After 5 seconds the code concludes the CPUs will never
rendezvous (same as before).
The MCA code is, by definition, not performance critical, but a needless
delay of 5 seconds is senseless. The 5 seconds also adds up quickly
when running the error injection code in a loop.
This patch both simplifies the code and removes the needless delay.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Because it is dead code and not referenced by anybody else (that file cannot
be built modular).
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* palinfo.c:
palinfo_cpu_notifier is a CPU hotplug notifier_block, and can be
marked __cpuinitdata, and the callback function palinfo_cpu_callback()
itself can be marked __cpuinit. create_palinfo_proc_entries() is only
called from __cpuinit callback or general __init code, therefore a
candidate for __cpuinit itself. remove_palinfo_proc_entries() is only
called from __cpuinit callback or general __exit code, therefore a
candidate for __cpuexit.
* salinfo.c:
The CPU hotplug notifier_block can be __cpuinitdata. The callback
salinfo_cpu_callback() is incorrectly marked __devinit -- it must
be __cpuinit instead.
* topology.c:
cache_sysfs_init() is only called at device_initcall() time so marking
it as __cpuinit is wrong and wasteful. It should be unconditionally
__init. Also cleanup reference to hotplug notifier callback function
from this function and replace with cache_add_dev(), which could also
enable us to use other tricks to replace __cpuinit{data} annotations,
as recently discussed on this list.
cache_shared_cpu_map_setup() is only ever called from __cpuinit-marked
functions hence both its definitions (SMP or !SMP) are candidates for
__cpuinit itself. Also all_cpu_cache_info can be __cpuinitdata because
only referenced from __cpuinit code.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
When PTRACE_SYSCALL was used and then PTRACE_DETACH is used, the
TIF_SYSCALL_TRACE flag is left set on the formerly-traced task. This
means that when a new tracer comes along and does PTRACE_ATTACH, it's
possible he gets a syscall tracing stop even though he's never used
PTRACE_SYSCALL. This happens if the task was in the middle of a system
call when the second PTRACE_ATTACH was done. The symptom is an
unexpected SIGTRAP when the tracer thinks that only SIGSTOP should have
been provoked by his ptrace calls so far.
A few machines already fixed this in ptrace_disable (i386, ia64, m68k).
But all other machines do not, and still have this bug. On x86_64, this
constitutes a regression in IA32 compatibility support.
Since all machines now use TIF_SYSCALL_TRACE for this, I put the
clearing of TIF_SYSCALL_TRACE in the generic ptrace_detach code rather
than adding it to every other machine's ptrace_disable.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch cleans up the `enable early console for SKI' patch
(471e7a4484), and
1. potentially allows the gensparse_defconfig to work again.
(there are other problems running a generic kernel on Ski)
2. fixes the `console registered twice' problem.
3. Cleans up the code by moving the `extern hpsim_cons' declaration to
a new asm/hpsim.h file.
Thanks to Jes for comments.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add additional support for CPU disable on SN platforms.
Correctly setup the smp_affinity mask for I/O error IRQs.
Restrict the use of the feature to Altix 4000 and 450 systems
running with a CPU disable capable PROM, and do not allow disabling
of CPU 0.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The pending interrupts can be remaining at boot up time on some
platform. This will cause spurious interrupts when interrupt is
enabled for the first time. This patch clears IVR at the CPU
initialization to eliminate such spurious interrupts.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix handling for spurious interrupts not being mapped to any IRQs.
Currently, spurious interrupts that are not mapped to any IRQs are
handled as IRQ 15 (== IA64_SPURIOUS_VECTOR). But it is not proper
because vector != irq. We need special handlings for such spurious
interrupts not being mapped to any IRQs.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When using Ski to debug early startup, it's a bit of a pain not to
have printk.
This patch enables the simulated console very early.
It may be worth conditionalising on the command line... but this is
enough for now.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The "ri" field in the processor status register only has defined
values of 0, 1, 2. Do not let ptrace set this to 3. As with
other reserved fields in registers we silently discard the value.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The core cpufreq code doesn't appear to understand returning -EAGAIN
for the get() function of the cpufreq_driver. If PAL_GET_PSTATE returns
-1, such as when running on Xen, scaling_cur_freq is happy to return
4294967285 kHz (ie. (unsigned)-11). The other drivers appear to return
0 for a failure, and doing so gives me the max frequency from
scaling_cur_frequency and "<unknown>" from cpuinfo_cur_frequency. I
believe that's the desired behavior.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Explicitly put the unwind section into its own program-header. This
used to be unnecessary (probably because binutils did it for us), but
with current binutils (e.g., v2.17.50.20070804) we won't get
the PT_IA_64_UNWIND header without this patch which will break
unwinding in a debugger and simulators such as Ski.
Signed-off-by: David Mosberger-Tang <dmosberger@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add NOTES to linker script such that the kernel can be built with
recent versions of binutils. Without this patch, final link fails
with this error:
ld: .tmp_vmlinux1: section `.text' can't be allocated in segment 0
ld: final link failed: Bad value
This error is due to the fact that the --build-id option is used
with newer linkers to include a .notes section on the kernel, but
without the NOTES macro, that section won't be included in the kernel
which then leads to the above error message.
Signed-off-by: David Mosberger-Tang <dmosberger@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use local_vector_to_irq() instead of looping through all NR_IRQS.
This avoids registering the CPE handler on multiple irqs. Only
register if the irq is valid. If no valid irq is found, print an
error message and set up polling.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add base support for implementing platform_irq_to_vector(), and
then use it on SN2.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While sending interrupts to a cpu to repeatedly wake a thread, on occasion
that thread will take a full timer tick cycle (4002 usec in my case)
to wakeup.
The problem concerns a race condition in the code around the safe_halt()
call in the default_idle() routine. Setting 'nohalt' on the kernel
command line causes the long wakeups to disappear.
void
default_idle (void)
{
local_irq_enable();
while (!need_resched()) {
--> if (can_do_pal_halt)
--> safe_halt();
else
A timer tick could arrive between the check for !need_resched and the
actual call to safe_halt() (which does a pal call to PAL_HALT_LIGHT).
By the time the timer tick completes, a thread that might now need to run
could get held up for as long as a timer tick waiting for the halted cpu.
I'm proposing that we disable irq's and check need_resched again before
calling safe_halt(). Does anyone see any problem with this approach?
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] ITC: Reduce rating for ITC clock if ITCs are drifty
[IA64] SN2: Fix up sn2_rtc clock
[IA64] Fix wrong access to irq_desc[] in iosapic_register_intr().
[IA64] Fix possible race in destroy_and_reserve_irq()
[IA64] Fix registered interrupt check
[IA64] Remove a few duplicate includes
[IA64] Allow smp_call_function_single() to current cpu
[IA64] fix a few section mismatch warnings
Make sure to reduce the rating of the ITC clock if ITCs are drifty. If they
are drifting then we have not synchronized the ITC values, nor are we doing
the jitter compensation (useless since drift may increase the differentials
arbitrarily).
Without this patch it is possible that the ITC clock becomes selected as
the system clock on systems with drifty ITCs which will result in
nanosleep hanging.
One can still select the itc clock manually on such systems via
clocksource=itc
(Produces nice hangs on SGI Altix.)
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In error path we must unlock irq_desc[irq].lock before we change
'irq'.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove unused TIF_NOTIFY_RESUME flag for all processor architectures. The
flag was not used excecpt on IA-64 where the patch replaces it with
TIF_PERFMON_WORK.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, destroy_and_reserve_irq() sets irq_status[irq] UNUSED using
clear_irq_vector() and sets irq_status[irq] RSVD using reserve_irq().
But there is a race window because vector_lock is once released between
them. This patch fixes this race window.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that interrupts are not initialized correctly at PCI
hotplug or driver reloading time.
By vector domain change, the iosapic_rte_info structure was changed to
be on the iosapic_intr_info[irq].rtes list even after the interrupts
are unregistered. So iosapic_intr_info[irq].rtes list must not be
checked to see if there are registered interrupts (RTEs) on the
irq. We must check iosapic_intr_info[irq].count counter instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes a few duplicate includes from arch/ia64/
Acked-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This removes the requirement for callers to get_cpu() to check in simple
cases. i386 and x86_64 already received a similar treatment.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the following section mismatch warnings:
WARNING: vmlinux.o(.text+0x41902): Section mismatch: reference to .init.text:__alloc_bootmem (between 'ia64_mca_cpu_init' and 'ia64_do_tlb_purge')
WARNING: vmlinux.o(.text+0x49222): Section mismatch: reference to .init.text:__alloc_bootmem (between 'register_intr' and 'iosapic_register_intr')
WARNING: vmlinux.o(.text+0x62beb2): Section mismatch: reference to .init.text:__alloc_bootmem_node (between 'hubdev_init_node' and 'cnodeid_get_geoid')
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix wrong return value in parse_vector_domain().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ia64's acpi_gsi_to_irq() function assumes irq == vector. But in
fact irq can be different from vector. This patch fix this wrong
assumption.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add some sanity checks into __bind_irq_vector().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
u32* volatile cyclone_timer means volatile auto pointer to u32,
which is clearly not what had been intended (we never even take
the address of that variable, let alone pass it to something that
could change it behind our back). u32 volatile * is what the
authors apparently wanted to say, but in reality we don't need that
qualifier there at all - it's (properly) only passed to iomem helpers
which takes care of that stuff just fine.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Nail two more simple section mismatch errors
[IA64] fix section mismatch warnings
[IA64] rename partial_page
[IA64] Ensure that machvec is set up takes place before serial console
[IA64] vector-domain - fix vector_table
[IA64] vector-domain - handle assign_irq_vector(AUTO_ASSIGN)
pcibios_setup (between 'pci_setup' and 'quirk_mellanox_tavor')
setup_profiling_timer (between 'write_profile' and 'delayed_put_task_struct')
Signed-off-by: Tony Luck <tony.luck@intel.com>
In 741f98fe29 Sam added full
checking across the entire vmlinux image. This flushed out
a dozen new section mismatch warnings. Start the whack-a-mole
game again to stomp them out.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Jens has added a partial_page thing in splice whcih conflicts with the ia64
one. Rename ia64 out of the way. (ia64 chose poorly).
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Parse the machvec command line option outside of the early_param()
so that ia64_mv is set before any console intialisation that
may result from early_param parsing.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This change fixes a panic when assign_irq_vector(irq) is called with
irq = AUTO_ASSIGN.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As it was a synonym for (CONFIG_ACPI && CONFIG_X86),
the ifdefs for it were more clutter than they were worth.
For ia64, just add a few stubs in anticipation of future
S3 or S4 support.
Signed-off-by: Len Brown <len.brown@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Prevent people from directly including <asm/rwsem.h>.
[IA64] remove time interpolator
[IA64] Convert to generic timekeeping/clocksource
[IA64] refresh some config files for 64K pagesize
[IA64] Delete iosapic_free_rte()
[IA64] fallocate system call
[IA64] Enable percpu vector domain for IA64_DIG
[IA64] Enable percpu vector domain for IA64_GENERIC
[IA64] Support irq migration across domain
[IA64] Add support for vector domain
[IA64] Add mapping table between irq and vector
[IA64] Check if irq is sharable
[IA64] Fix invalid irq vector assumption for iosapic
[IA64] Use dynamic irq for iosapic interrupts
[IA64] Use per iosapic lock for indirect iosapic register access
[IA64] Cleanup lock order in iosapic_register_intr
[IA64] Remove duplicated members in iosapic_rte_info
[IA64] Remove block structure for locking in iosapic.c
This is a merge of Peter Keilty's initial patch (which was
revived by Bob Picco) for this with Hidetoshi Seto's fixes
and scaling improvements.
Acked-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
> arch/ia64/kernel/iosapic.c:597: warning: 'iosapic_free_rte' defined but not used
>
> This isn't spurious, the only call to iosapic_free_rte() has been removed, but there
> is still a call to iosapic_alloc_rte() ... which means we must have a memory leak.
I did it on purpose (and gave the warning a miss...) and I consider
iosapic_free_rte() is no longer needed.
I decided to remain iosapic_rte_info to keep gsi-to-irq binding
after device disable. Indeed it needs some extra memory, but it
is only "sizeof(iosapic_rte_info) * <the number of removed devices>"
bytes and has no memory leak becasue re-enabled devices use the
iosapic_rte_info which they used before disabling.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
sys_fallocate for ia64. This uses an empty slot #1303 erroneously
marked as reserved for move_pages (which had already been allocated
as syscall #1276)
Signed-Off-By: Dave Chinner <dgc@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since Ingo's recent scheduler rewrite which was merged as commit
0437e109e1 sched_cacheflush is unused.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently most of the per cpu data, which is accessed by different cpus,
has a ____cacheline_aligned_in_smp attribute. Move all this data to the
new per cpu shared data section: .data.percpu.shared_aligned.
This will seperate the percpu data which is referenced frequently by other
cpus from the local only percpu data.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
per cpu data section contains two types of data. One set which is
exclusively accessed by the local cpu and the other set which is per cpu,
but also shared by remote cpus. In the current kernel, these two sets are
not clearely separated out. This can potentially cause the same data
cacheline shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.
One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end. Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.
This patch:
Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. And as such, this differentiates the local
only data and remotely accessed data cleanly.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I realise jprobes are a razor-blades-included type of interface, but that
doesn't mean we can't try and make them safer to use. This guy I know once
wrote code like this:
struct jprobe jp = { .kp.symbol_name = "foo", .entry = "jprobe_foo" };
And then his kernel exploded. Oops.
This patch adds an arch hook, arch_deref_entry_point() (I don't like it
either) which takes the void * in a struct jprobe, and gives back the text
address that it represents.
We can then use that in register_jprobe() to check that the entry point we're
passed is actually in the kernel text, rather than just some random value.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Clean away some code inside some non-existent CONFIG ifdefs
[IA64] ar.itc access must really be after xtime_lock.sequence has been read
[IA64] correctly count CPU objects in the ia64/sn hwperf interface
[IA64] arbitary speed tty ioctl support
[IA64] use machvec=dig on hpzx1 platforms
If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the kernelcore= parameter for x86.
Once all patches are applied, a new command-line parameter exist and a new
sysctl. This patch adds the necessary documentation.
From: Yasunori Goto <y-goto@jp.fujitsu.com>
When "kernelcore" boot option is specified, kernel can't boot up on ia64
because of an infinite loop. In addition, the parsing code can be handled
in an architecture-independent manner.
This patch uses common code to handle the kernelcore= parameter. It is
only available to architectures that support arch-independent zone-sizing
(i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will
ignore the boot parameter.
[bunk@stusta.de: make cmdline_parse_kernelcore() static]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add per-CPU vector domain support for IA64_DIG. It is enabled by
adding the "vector=percpu" boot option.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add per-CPU vector domain support for IA64_GENERIC. It is enabled by
adding the "vector=percpu" boot option.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add support for IRQ migration across vector domain.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add fundamental support for multiple vector domain. There still exists
only one vector domain even with this patch. IRQ migration across
domain is not supported yet by this patch.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add mapping tables between irqs and vectors, and its management code.
This is necessary for supporting multiple vector domain because 1:1
mapping between irq and vector will be changed to n:1.
The irq == vector relationship between irqs and vectors is explicitly
remained for percpu interrupts, platform interrupts, isa IRQs and
vectors assigned using assign_irq_vector() because some programs might
depend on it.
And I should consider the following problem.
When pci drivers enabled/disabled devices dynamically, its irq number
is changed to the different one. Therefore, suspend/resume code may
happen problem.
To fix this problem, I bound gsi to irq.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Need to check if irq is sharable amoung handlers when searching
sharable IOSAPIC irq.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Many of IOSAPIC codes depends on the flollowing assumptions, but these
would become invalid when multiple vector domain will be supported in
the future.
- 1:1 mapping between IRQ and vector
- IRQ == vector
To fix those invalid assumptions, this patch changes iosapic_intr_info[]
to be indexed by irq number instead of vector.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cleanup order of irq_desc.lock and iosapic_lock in
iosapic_register_intr() and iosapic_unregister_intr().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove duplicated members in iosapic_rte_info in iosapic.c. This patch
has no functional changes.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove unnecessary indent between spin_lock() and spin_unlock() in
iosapic.c. This has no functional changes.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Beacuse SERIAL_PORT_DFNS is removed from include/asm-i386/serial.h and
include/asm-x86_64/serial.h. the serial8250_ports need to be probed late in
serial initializing stage. the console_init=>serial8250_console_init=>
register_console=>serial8250_console_setup will return -ENDEV, and console
ttyS0 can not be enabled at that time. need to wait till uart_add_one_port in
drivers/serial/serial_core.c to call register_console to get console ttyS0.
that is too late.
Make early_uart to use early_param, so uart console can be used earlier. Make
it to be bootconsole with CON_BOOT flag, so can use console handover feature.
and it will switch to corresponding normal serial console automatically.
new command line will be:
console=uart8250,io,0x3f8,9600n8
console=uart8250,mmio,0xff5e0000,115200n8
or
earlycon=uart8250,io,0x3f8,9600n8
earlycon=uart8250,mmio,0xff5e0000,115200n8
it will print in very early stage:
Early serial console at I/O port 0x3f8 (options '9600n8')
console [uart0] enabled
later for console it will print:
console handover: boot [uart0] -> real [ttyS0]
Signed-off-by: <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ".acq" semantics of the load only apply w.r.t. other data access.
Reading the clock (ar.itc) isn't a data access so strange things can
happen here. Specifically the read of ar.itc can be launched as soon
as the read of xtime_lock.sequence is ISSUED. Since this may cache
miss, and that might cause a thread switch, and there may be cache
contention for the line containing xtime_lock, it may be a long time
before the actual value is returned, so the ar.itc value may be very
stale.
Move the consumption of r28 up before the read of ar.itc to make sure
that we really have got the current value of xtime_lock.sequence
before look at ar.itc.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Support multiple CPUs going through OS_MCA
[IA64] silence GCC ia64 unused variable warnings
[IA64] prevent MCA when performing MMIO mmap to PCI config space
[IA64] add sn_register_pmi_handler oemcall
[IA64] Stop bit for brl instruction
[IA64] SN: Correct ROM resource length for BIOS copy
[IA64] Don't set psr.ic and psr.i simultaneously
Linux does not gracefully deal with multiple processors going
through OS_MCA aa part of the same MCA event. The first cpu
into OS_MCA grabs the ia64_mca_serialize lock. Subsequent
cpus wait for that lock, preventing them from reporting in as
rendezvoused. The first cpu waits 5 seconds then complains
that all the cpus have not rendezvoused. The first cpu then
handles its MCA and frees up all the rendezvoused cpus and
releases the ia64_mca_serialize lock. One of the subsequent
cpus going thought OS_MCA then gets the ia64_mca_serialize
lock, waits another 5 seconds and then complains that none of
the other cpus have rendezvoused.
This patch allows multiple CPUs to gracefully go through OS_MCA.
The first CPU into ia64_mca_handler() grabs a mca_count lock.
Subsequent CPUs into ia64_mca_handler() are added to a list of cpus
that need to go through OS_MCA (a bit set in mca_cpu), and report
in as rendezvoused, and but spin waiting their turn.
The first CPU sees everyone rendezvous, handles his MCA, wakes up
one of the other CPUs waiting to process their MCA (by clearing
one mca_cpu bit), and then waits for the other cpus to complete
their MCA handling. The next CPU handles his MCA and the process
repeats until all the CPUs have handled their MCA. When the last
CPU has handled it's MCA, it sets monarch_cpu to -1, releasing all
the CPUs.
In testing this works more reliably and faster.
Thanks to Keith Owens for suggesting numerous improvements
to this code.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tell GCC to stop spewing out unnecessary warnings for unused variables
passed to functions as pointers for ia64 files.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
SDM says that brl instruction must be followed by a stop bit.
Fix instance in BRL_COND_FSYS_BUBBLE_DOWN where it isn't.
Signed-off-by: Christian Kandeler <christian.kandeler@hob.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It's not a good idea to use "ssm psr.ic | psr.i" to simultaneously
enable interrupts and interrupt state collection, the two bits can
take effect asynchronously, so it is possible for an interrupt to
be serviced while psr.ic is still zero.
Signed-off-by: Tony Luck <tony.luck@intel.com>
the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.
this code is fundamental fragile: the boot-time migration cost detector
doesnt really work on systems that had large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty undeterministic as well.
(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)
under CFS the same purpose of cache affinity can be achieved without
any special cache-hot special-case: tasks are sorted in the 'timeline'
tree and the SMP balancer picks tasks from the left side of the
tree, thus the most cache-cold task is balanced automatically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove duplicate header include line from arch/ia64/kernel/time.c.
Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Both rp_loc and pfs_loc can be in the register stack area _or_ they can
be in the memory stack area, the latter occurs when a struct pt_regs is
pushed. Correct the validation check on these fields to check for both
stack areas. Not allowing for memory stack locations means no
backtrace past ia64_leave_kernel, or any other code that uses
PT_REGS_UNWIND_INFO.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Section mismatch: reference to .init.text:acpi_find_rsdp
(between 'acpi_get_sysname' and 'acpi_request_vector')
acpi_get_sysname() needs to call the __init function acpi_find_rsdp, but it
doesn't have the __init attribute itself, hence the warning. Luckily it is
only called from machvec_init() which has __init attribute, so the fix
is to define acpi_get_sysname() as __init too.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Silly bug in _PDC data setup. Haven't seen any real side-effects of this one
yet. But, needs fixing regardless.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] fix kmalloc(0) in arch/ia64/pci/pci.c
[IA64] Only unwind non-running tasks.
[IA64] Improve unwind checking.
[IA64] Yet another section mismatch warning
[IA64] Fix bogus messages about system calls not implemented.
Unwinding a running task has proven problematic.
In one instance, the running task was attempting to unwind itself and
received an interrupt between when get_wchan allocated local variables on
the stack and when unw_init_from_blocked_task was called which resulted
in unw_init_frame_info to place this tasks task_struct pointer over the
switch stack's ar_bspstore entry.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds some sanity checks to keep register and memory stack
pointers in the unw_frame_info structure within the tasks stack address
range.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Get rid of the notifier list and call the kprobes code directly
if compiled in. This mirrors the changes that recently went
into powerpc, s390 and sparc64.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Building with GCC 4.2, I get the following error:
CC arch/ia64/kernel/mca.o
arch/ia64/kernel/mca.c:275: error: __ksymtab_ia64_mlogbuf_finish causes a
section type conflict
This is because ia64_mlogbuf_finish is both declared static and exported.
Fix by removing the export (which is unneeded now).
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Previous spelling patch from Simon Arlott broke one spot that
didn't need fixing (reported by Simon within 35 minutes of the
patch ... but not until after I'd applied to GIT and pushed :-(
Signed-off-by: Tony Luck <tony.luck@intel.com>
The current implementation of kdump on INIT events would enter
kdump processing on DIE_INIT_MONARCH_ENTER and DIE_INIT_SLAVE_ENTER
events. Thus, the monarch cpu would go ahead and boot up the kdump
On SN shub2 systems, this out-of-sync situation causes some slave
cpus on different nodes to enter POD.
This patch moves kdump entry points to DIE_INIT_MONARCH_LEAVE and
DIE_INIT_SLAVE_LEAVE. It also sets kdump_in_progress variable in
the DIE_INIT_MONARCH_PROCESS event to not dump all active stack
traces to the console in the case of kdump.
I have tested this patch on an SN machine and a HP RX2600.
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Quicklist support for IA64
[IA64] fix Kprobes reentrancy
[IA64] SN: validate smp_affinity mask on intr redirect
[IA64] drivers/char/snsc_event.c:206: warning: unused variable `p'
[IA64] mca.c:121: warning: 'cpe_poll_timer' defined but not used
[IA64] Fix - Section mismatch: reference to .init.data:mvec_name
[IA64] more warning cleanups
[IA64] Wire up epoll_pwait and utimensat
[IA64] Fix warnings resulting from type-checking in dev_dbg()
[IA64] typo s/kenrel/kernel/
In case of reentrance i.e when a probe handler calls a functions which
inturn has a probe, we save a previous kprobe information and just single
step the reentrant probe without calling the actual probe handler. During
this reentracy period, if an interrupt occurs and if probe happens to
trigger in the inturrupt path, then we were corrupting the previous kprobe(
as we were overriding the previous kprobe info) info their by crashing the
system. This patch fixes this issues by having a an array of previous
kprobe info struct(with the array size of 2).
This similar technique is not needed on i386 and x86_64 because by default
interrupts are turn off in the break/int3 exception handler.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On SN, only allow one bit to be set in the smp_affinty mask when
redirecting an interrupt. Currently setting multiple bits is allowed, but
only the first bit is used in determining the CPU to redirect to. This has
caused confusion among some customers.
[akpm@linux-foundation.org: fixes]
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When auditing syscalls that send signals, log the pid and security
context for each target process. Optimize the data collection by
adding a counter for signal-related rules, and avoiding allocating an
aux struct unless we have more than one target process. For process
groups, collect pid/context data in blocks of 16. Move the
audit_signal_info() hook up in check_kill_permission() so we audit
attempts where permission is denied.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only shows up while building sim_defconfig because CONFIG_ACPI=n
there, and all of the uses of cpe_poll_timer are inside #ifdef CONFIG_ACPI.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Seems more than just deprecated, we can't build using SA_INTERUPT.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] wire up pselect, ppoll
[IA64] Add TIF_RESTORE_SIGMASK
[IA64] unwind did not work for processes born with CLONE_STOPPED
[IA64] Optional method to purge the TLB on SN systems
[IA64] SPIN_LOCK_UNLOCKED macro cleanup in arch/ia64
[IA64-SN2][KJ] mmtimer.c-kzalloc
[IA64] fix stack alignment for ia32 signal handlers
[IA64] - Altix: hotplug after intr redirect can crash system
[IA64] save and restore cpus_allowed in cpu_idle_wait
[IA64] Removal of percpu TR cleanup in kexec code
[IA64] Fix some section mismatch errors
This finally renames the thread_info field in task structure to stack, so that
the assumptions about this field are gone and archs have more freedom about
placing the thread_info structure.
Nonbroken archs which have a proper thread pointer can do the access to both
current thread and task structure via a single pointer.
It'll allow for a few more cleanups of the fork code, from which e.g. ia64
could benefit.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
[akpm@linux-foundation.org: build fix]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).
[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minor problem for mainstream. Big problem for checkpoint/restore,
because all the stopped/traced processes are born in this state,
hence they cannot be checkpointed again due to failing unwind.
The problem was identified as assumption in kernel unwind library
that top level frame is different of syscall frame. It is the case
unless process was born with CLONE_STOPPED.
Author: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-Off-By: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds an optional method for purging the TLB on SN IA64 systems.
The change should not affect any non-SN system.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch provides a debugfs knob to turn kprobes on/off
o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
not (default enabled)
o Echoing 0 to this file will disarm all installed probes
o Any new probe registration when disabled will register the probe but
not arm it. A message will be printed out in such a case.
o When a value 1 is echoed to the file, all probes (including ones
registered in the intervening period) will be enabled
o Unregistration will happen irrespective of whether probes are globally
enabled or not.
o Update Documentation/kprobes.txt to reflect these changes. While there
also update the doc to make it current.
We are also looking at providing sysrq key support to tie to the disabling
feature provided by this patch.
[akpm@linux-foundation.org: Use bool like a bool!]
[akpm@linux-foundation.org: add printk facility levels]
[cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390]
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- consolidate duplicate code in all arch_prepare_kretprobe instances
into common code
- replace various odd helpers that use hlist_for_each_entry to get
the first elemenet of a list with either a hlist_for_each_entry_save
or an opencoded access to the first element in the caller
- inline add_rp_inst into it's only remaining caller
- use kretprobe_inst_table_head instead of opencoding it
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We used to warn unless the EFI system table major revision was exactly 1.
But EFI 2.00 firmware is starting to appear, and the 2.00 changes don't
affect anything in Linux.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In certain cases like when the real return address can't be found or when
the number of tracked calls to a kretprobed function is less than the
number of returns, we may not be able to find the correct return address
after processing a kretprobe. Currently we just do a BUG_ON, but no
information is provided about the actual failing kretprobe.
Print out details of the kretprobe before calling BUG().
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the size of the per-cpu region reserved to save crash notes is
set by the per-architecture value MAX_NOTE_BYTES. Which in turn is
currently set to 1024 on all supported architectures.
While testing ia64 I recently discovered that this value is in fact too
small. The particular setup I was using actually needs 1172 bytes. This
lead to very tedious failure mode where the tail of one elf note would
overwrite the head of another if they ended up being alocated sequentially
by kmalloc, which was often the case.
It seems to me that a far better approach is to caclculate the size that
the area needs to be. This patch does just that.
If a simpler stop-gap patch for ia64 to be squeezed into 2.6.21(.X) is
needed then this should be as easy as making MAX_NOTE_BYTES larger in
arch/asm-ia64/kexec.h. Perhaps 2048 would be a good choice. However, I
think that the approach in this patch is a much more robust idea.
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the die notifier handling to common code. Previous
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)
arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Save and Restore the task's cpus_allowed mask, across the set_cpus_allowed()
in cpu_idle_wait(). Without this, we will endup corrupting task's cpu affinity.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The kexec code wasn't in the kernel when Ken Chen created
this patch (00b65985fb) to
change the mapping of percpu area from a TR to a TC.
Zou Nanhai spotted this extra piece, and Simon Horman
concurred.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Section mismatch: reference to ...
.init.text:prefill_possible_map from .text between 'setup_per_cpu_areas' and 'cpu_init'
.init.text:iosapic_override_isa_irq from .text between 'iosapic_init' and 'iosapic_remove'
Signed-off-by: Tony Luck <tony.luck@intel.com>
Handle MAP_FIXED in ia64 arch_get_unmapped_area and
hugetlb_get_unmapped_area(), just call prepare_hugepage_range in the later and
is_hugepage_only_range() in the former.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
My patch: git commit=95235ca2c20ac0b31a8eb39e2d599bcc3e9c9a10 introduced a bug
in IA64 cpuinfo output.
Patch changed the proc_freq from 1HZ resolution to 1KHz resolution, but left
format string unchanged at " %lu.%06lu". Below is the fix.
Thanks to Bjorn for catching this.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Example memory map (from HP sx1000 with VGA enabled):
0x00000 - 0x9FFFF supports only WB (cacheable) access
0xA0000 - 0xBFFFF supports only UC (uncacheable) access
0xC0000 - 0xFFFFF supports only WB (cacheable) access
Some versions of X map the entire 0x00000-0xFFFFF area at once. With the
example above, this mmap must fail because there's no memory attribute that's
safe for the entire area.
Prior to this patch, we performed the mmap with a UC mapping. When X
accessed the WB memory at 0xC0000, it caused an MCA. The crash can happen
when mapping 0xC0000 from either /dev/mem or a /sys/.../legacy_mem file.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Skip clock calibration if cpu being brought online is exactly the same
speed, stepping, etc., as the previous cpu. This significantly reduces
the time to boot very large systems.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The following 'if' statement in ia64_setup_msi_irq() always fails even
if create_irq() returns <0 value, because variable 'irq' is defined as
unsigned int. It would cause invalid memory access.
irq = create_irq();
if (irq < 0)
return irq;
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Clearly should be checking for "val == DIE_INIT_SLAVE_ENTER".
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If a system consists of mixed processor types, kmalloc()
can be called before the per-cpu data page is initialized.
If the slab contains sufficient memory, then kmalloc() works
ok. However, if the slabs are empty, slab calls the memory
allocator. This requires per-cpu data (NODE_DATA()) & the
cpu dies.
Also noted by Russ Anderson who had a very similar patch.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
kdump_find_rsvd_region() is only called by
reserve_memory() which is in __init, so it seems that
kdump_find_rsvd_region() should also be in there.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ptrace misses clearing the syscall trace flag.
The increased syscall overhead is retained after the trace is finished.
This case happens when strace is terminated by force.
Signed-off-by: Akiyama, Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Grammatical fixes (s/freezed/frozen/)
Make some variables static
Change a C++ "//" comment to "/* ... */"
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Similar to memory error recovery, when a cache error is consumed
by a user process terminate the user instead of crashing the system.
Signed-off-by: Russ Anderson (rja@sgi.com)
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Jack Steiner noticed that duplicate TLB DTC entries do not cause a
linux panic. See discussion:
http://www.gelato.unsw.edu.au/archives/linux-ia64/0307/6108.html
The current TLB recovery code is recovering from the duplicate itr.d
dropins, masking the underlying problem. This change modifies
the MCA recovery code to look for the TLB check signature of the
duplicate TLB entry and panic in that case.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
On 1.6GHz Montectio Tiger4, the following performance data is measured with
kernel built with defconfig which has NUMA configured:
Fastest sys_getcpu: 502 itc counts.
Fastest fsys_getcpu: 28 itc counts.
fsys_getcpu performance is largly impacted by whether data (node_to_cpu_map
etc) is in cache. It can take fsys_getcpu up to ~150 itc counts in cold
cache case.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
efi_initialize_iomem_resources() is declared in both include/linux/efi.h
and arch/ia64/kernel/setup.c. This patch removes the latter.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Berhhard Walle noted that on his HP rx8640 he ended up with saved_max_pfn
smaller than the highest address of system ram in /proc/iomem and proposed
a patch to base the address on the unrounded and unfiltered EFI memory
map address. Simon Horman and Magnus Damm suggested that the whole test
be moved earlier in the function. This is the combination of both of
these patches.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes boot failure because irq_desc->mask() is NULL.
- Added mask/unmask functions to ia64's irq desc function table.
- rename hw_interrupt_type to irq_chip. hw_interrupt_type is old name.
- Tony: Added same change to arch/ia64/sn/kernel/irq.c as pointed out
by Eric Biederman ... mask/unmask functions there can be no-op.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The address where the ELF core header is stored is passed to the secondary
kernel as a kernel command line option. The memory area for this header is
also marked as a separate EFI memory descriptor on ia64.
The separate EFI memory descriptor is at the moment of the type
EFI_UNUSABLE_MEMORY. With such a type the secondary kernel skips over the
entire memory granule (config option, 16M or 64M) when detecting memory.
If we are lucky we will just lose some memory, but if we happen to have
data in the same granule (such as an initramfs image), then this data will
never get mapped and the kernel bombs out when trying to access it.
So this is an attempt to fix this by changing the EFI memory descriptor
type into EFI_LOADER_DATA. This type is the same type used for the kernel
data and for initramfs. In the secondary kernel we then handle the ELF
core header data the same way as we handle the initramfs image.
This patch contains the kernel changes to make this happen. Pretty
straightforward, we reserve the area in reserve_memory(). The address for
the area comes from the kernel command line and the size comes from the
specialized EFI parsing function vmcore_find_descriptor_size().
The kexec-tools-testing code for this can be found here:
http://lists.osdl.org/pipermail/fastboot/2007-February/005983.html
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Simon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Perfmon associates vmalloc()ed memory with a file descriptor, and installs
a vma mapping that memory. Unfortunately, the vm_file field is not filled
in, so processes with mappings to that memory do not prevent the file from
being closed and the memory freed. This results in use-after-free bugs and
multiple freeing of pages, etc.
I saw this bug on an Altix on SLES9. Haven't reproduced upstream but it
looks like the same issue is there.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make saved_max_pfn point to max_pfn of entire system.
Without this patch is so that vmcore is zero length on ia64. This is
because saved_max_pfn was wrongly being set to the max_pfn of the crash
kernel's address space, rather than the max_pfg on the physical memory of
the machine - the whole purpose of vmcore is to access physical memory that
is not part of the crash kernel's addresss space.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Sort-Of-Acked-By: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch replaces all instances of "set_native_irq_info(irq, mask)"
with "irq_desc[irq].affinity = mask". The latter form is clearer
uses fewer abstractions, and makes access to this field uniform
accross different architectures.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
Documentation/kernel-docs.txt update.
arch/cris: typo in KERN_INFO
Storage class should be before const qualifier
kernel/printk.c: comment fix
update I/O sched Kconfig help texts - CFQ is now default, not AS.
Remove duplicate listing of Cris arch from README
kbuild: more doc. cleanups
doc: make doc. for maxcpus= more visible
drivers/net/eexpress.c: remove duplicate comment
add a help text for BLK_DEV_GENERIC
correct a dead URL in the IP_MULTICAST help text
fix the BAYCOM_SER_HDX help text
fix SCSI_SCAN_ASYNC help text
trivial documentation patch for platform.txt
Fix typos concerning hierarchy
Fix comment typo "spin_lock_irqrestore".
Fix misspellings of "agressive".
drivers/scsi/a100u2w.c: trivial typo patch
Correct trivial typo in log2.h.
Remove useless FIND_FIRST_BIT() macro from cardbus.c.
...
Fix "spin_lock_irqrestore" to "spin_unlock_irqrestore."
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
acpi_boot_init() is making a bad check on the return
status from acpi_table_parse(). acpi_table_parse() now
returns zero on success, one on failure.
Signed-off-by: Aaron Young <ayoung@sgi.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.
I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.
So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This convters the sysctl ctl_tables to use C99 initializers. While I was
looking at it I discovered it was using a portion of the sysctl binary
addresses space under CTL_KERN KERN_OSTYPE which was completely inappropriate.
So I completely removed all of the sysctl binary names, to remove and avoid
the ABI conflict.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove in-source externs, linux/init.h is included in all cases.
This is a fixups for "Dynamic kernel command-line" patch.
It also includes some uml __init fixups so that we can __initdata also its
command_line.
Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1. Rename saved_command_line into boot_command_line.
2. Set command_line as __initdata.
[akpm@osdl.org: move some declarations to the right place]
Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Part of long forgotten patch
http://groups.google.com/group/fa.linux.kernel/msg/e98e941ce1cf29f6?dmode=source
Since then, m32r grabbed two copies.
Leave s390 copy because of important absence of CONFIG_VT, but remove
references to non-existent timerlist_lock. ia64 also loses timerlist_lock.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update all arch/*/kernel/vmlinux.lds.S to not include space for initramfs
when CONFIG_BLK_DEV_INITRAMFS is not selected. This saves another 4 kbytes
on most platfoms (some reserve PAGE_SIZE for initramfs).
Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The arch hooks arch_setup_msi_irq and arch_teardown_msi_irq are now
responsible for allocating and freeing the linux irq in addition to
setting up the the linux irq to work with the interrupt.
arch_setup_msi_irq now takes a pci_device and a msi_desc and returns
an irq.
With this change in place this code should be useable by all platforms
except those that won't let the OS touch the hardware like ppc RTAS.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (140 commits)
ACPICA: reduce table header messages to fit within 80 columns
asus-laptop: merge with ACPICA table update
ACPI: bay: Convert ACPI Bay driver to be compatible with sysfs update.
ACPI: bay: new driver is EXPERIMENTAL
ACPI: bay: make drive_bays static
ACPI: bay: make bay a platform driver
ACPI: bay: remove prototype procfs code
ACPI: bay: delete unused variable
ACPI: bay: new driver adding removable drive bay support
ACPI: dock: check if parent is on dock
ACPICA: fix gcc build warnings
Altix: Add ACPI SSDT PCI device support (hotplug)
Altix: ACPI SSDT PCI device support
ACPICA: reduce conflicts with Altix patch series
ACPI_NUMA: fix HP IA64 simulator issue with extended memory domain
ACPI: fix HP RX2600 IA64 boot
ACPI: build fix for IBM x440 - CONFIG_X86_SUMMIT
ACPICA: Update version to 20070126
ACPICA: Fix for incorrect parameter passed to AcpiTbDeleteTable during table load.
ACPICA: Update copyright to 2007.
...
Instead of pinning per-cpu TLB into a DTR, use DTC. This will free up
one TLB entry for application, or even kernel if access pattern to
per-cpu data area has high temporal locality.
Since per-cpu is mapped at the top of region 7 address, we just need to
add special case in alt_dtlb_miss. The physical address of per-cpu data
is already conveniently stored in IA64_KR(PER_CPU_DATA). Latency for
alt_dtlb_miss is not affected as we can hide all the latency. It was
measured that alt_dtlb_miss handler has 23 cycles latency before and
after the patch.
The performance effect is massive for applications that put lots of tlb
pressure on CPU. Workload environment like database online transaction
processing or application uses tera-byte of memory would benefit the most.
Measurement with industry standard database benchmark shown an upward
of 1.6% gain. While smaller workloads like cpu, java also showing small
improvement.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It's not efficient to use a per-cpu variable just to store
how many physical stack register a cpu has. Ever since the
incarnation of ia64 up till upcoming Montecito processor, that
variable has "glued" to 96. Having a variable in memory means
that the kernel is burning an extra cacheline access on every
syscall and kernel exit path. Such "static" value is better
served with the instruction patching utility exists today.
Convert ia64_phys_stacked_size_p8 into dynamic insn patching.
This also has a pleasant side effect of eliminating access to
per-cpu area while psr.ic=0 in the kernel exit path. (fixable
for per-cpu DTC work, but why bother?)
There are some concerns with the default value that the instruc-
tion encoded in the kernel image. It shouldn't be concerned.
The reasons are:
(1) cpu_init() is called at CPU initialization. In there, we
find out physical stack register size from PAL and patch
two instructions in kernel exit code. The code in question
can not be executed before the patching is done.
(2) current implementation stores zero in ia64_phys_stacked_size_p8,
and that's what the current kernel exit path loads the value with.
With the new code, it is equivalent that we store reg size 96
in ia64_phys_stacked_size_p8, thus creating a better safety net.
Given (1) above can never fail, having (2) is just a bonus.
All in all, this patch allow one less memory reference in the kernel
exit path, thus reducing syscall and interrupt return latency; and
avoid polluting potential useful data in the CPU cache.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
getcpu system call returns cpu# and node# on which this system call and
its caller are running. This patch hooks up its implementation on IA64.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Occasionally the FSYS_RETURN patch list can have an odd length, causing other
data structures to get out of alignment. In OpenVZ it is odd and we get
misaligned kernel image, which does not boot.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When we offline a CPU, migrate_irqs() tries to determine whether the
affinity bits of the IRQ descriptor match any of the remaining online
CPUs. If not, it fixes up the interrupt to point somewhere else.
Unfortunately, if an IRQ is unregistered the IRQ descriptor may still
have affinity to the CPU being offlined, but the no_irq_chip handler
doesn't provide a set_affinity function. This causes us to hit the
WARN_ON in migrate_irqs().
The easiest solution seems to be setting all the bits in the affinity
mask when the last interrupt is removed from the vector. I hit this on
an older kernel with Xen/ia64 using driver domains (so it probably needs
more testing on upstream). Xen essentially uses the bind/unbind
interface in sysfs to unregister a device from a driver and thus
unregister the interrupt.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes a NULL-pointer dereference in ia64_machine_kexec().
The variable ia64_kimage is set in machine_kexec_prepare() which is
called from sys_kexec_load(). If kdump wasn't configured before,
ia64_kimage is NULL. machine_kdump_on_init() passes ia64_kimage() to
machine_kexec() which assumes a valid value.
The patch also adds a few sanity checks for the image to simplify
debugging of similar problems in future.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I encountered one problem when running ptrace test case the situation
is this: traced process's syscall parameter needs to be accessed, but
for sys_clone system call with clone_flag (CLONE_VFORK | CLONE_VM |
SIGCHLD) parameter. This syscall's parameter accessing result is wrong.
The reason is that vforked child process mm point is the same, but
tgid is different. Without this patch find_thread_for_addr will return
vforked process if vforked process is also stopped, but not the thread
which calls vfork syscall.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some patches have turned up on xen-devel recently to convert strcpy()
to safer alternatives and so forth. While reviewing those patches
I noticed that the features string building could be cleaned up.
This patch uses snprintf() instead of strcpy() and direct character
pointer manipulation. It makes the features string building safe and
gets rid of the special case for features output in show_cpuinfo()
Additionally I removed the (int) cast of ARRAY_SIZE, which seems to
serve no purpose.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As is pointed out in
http://www.gelato.org/community/view_linear.php?id=1_1036&from=authors&value=Ian%20Wienand#1_1039,
if single step on break instruction, the break fault has higher
priority than the single-step trap. When the break fault handler
is entered, it advances the IP by 1 instruction so break instruction
single-stepping is skipped, actually it is next instruction which
is single stepped.
This patch modifies this, it adds TIF_SINGLESTEP bit for thread
flags, and generate a fake sigtrap when single stepping break
instruction. Test case in attachment can verify this. Any comments
is welcome.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This moves the ia64 implementation of machine_shutdown() from
machine_kexec.c to process.c, which is in keeping with the implelmentation
on other architectures, and seems like a much more appropriate home for it.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove the Remove inline declaration of efi_get_pal_addr() as it is
declared in linux/efi.h.
Signed-Off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
linux/uaccess.h was being included, but it seems that
really the following includes are needed.
asm/page.h: for __va() and PAGE_SHIFT
asm/uaccess.h: for copy_to_user()
I guess that linux/uaccess.h pulls in both asm/page.h and asm/uaccess.h.
I notices this while backporting the code to xen's linux-2.6.16.33,
which does not have linux/uaccess.h. I'm posting it as I think it is a
correct, though somewhat cosmetic fix.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Kexec support for 2.6.20 on ia64 does not build properly using a config
made up by CONFIG_SMP=n and CONFIG_HOTPLUG_CPU=n:
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The SN Altix platform does not conform to the IOSAPIC IRQ routing model.
Add code in acpi_unregister_gsi() to check if (acpi_irq_model ==
ACPI_IRQ_MODEL_PLATFORM) and return.
Due to an oversight, this code was not added previously when
similar code was added to acpi_register_gsi().
http://marc.theaimsgroup.com/?l=linux-acpi&m=116680983430121&w=2
Signed-off-by: John Keller <jpk@sgi.com>
Acked-by: Len Brown <lenb@kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes up ia64 kexec support for HP rx2620 hardware. It does
this by skipping migration of already disabled irqs. This is most likely a
problem on other ia64 platforms as well, but I've only been able to
reproduce it on one machine so far.
The full story is that handle_bad_irq() gets invoked before starting the
new kernel without this patch. This seems to happen when fixup_irqs()
calls generic_handle_irq() on already migrated (and disabled) irqs. So by
avoiding migration of disabled irqs we stay away of handle_bad_irq().
The code has been tested on three different ia64 machines, all with good
results. It is possible to trigger the same bug by offlining a processor
using echo 0 > /sys/devices/system/cpu/cpuX/online.
More detailed information is available in the following mail thread:
http://lists.osdl.org/pipermail/fastboot/2007-January/thread.html#5774
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Zou, Nanhai <nanhai.zou@intel.com>
Acked-by: Jay Lan <jlan@sgi.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ACPI 3.0 incorporated the SRAT spec, upping the table version to 2,
and extending the size of the proximity domain from 1-byte to 4-bytes.
This extension was into a reserved field that firmware should
set to 0, but the HP simulator had non-zero values there
resulting in unexpected huge numbers.
So mask the domain down to 8-bits for now.
A more general fix will be to check the table version
supplied by firmware and get paranoid about reserved fields.
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This kernel driver patch provides sysfs interface for user application to
call pal_mc_error_inject() procedure.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Provide ACPI _PRT support for SN Altix systems.
The SN Altix platform does not conform to the
IOSAPIC IRQ routing model, so a new acpi_irq_model
(ACPI_IRQ_MODEL_PLATFORM) has been defined. The SN
platform specific code sets acpi_irq_model to
this new value, and keys off of it in acpi_register_gsi()
to avoid the iosapic code path.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (68 commits)
ACPI: replace kmalloc+memset with kzalloc
ACPI: Add support for acpi_load_table/acpi_unload_table_id
fbdev: update after backlight argument change
ACPI: video: Add dev argument for backlight_device_register
ACPI: Implement acpi_video_get_next_level()
ACPI: Kconfig - depend on PM rather than selecting it
ACPI: fix NULL check in drivers/acpi/osl.c
ACPI: make drivers/acpi/ec.c:ec_ecdt static
ACPI: prevent processor module from loading on failures
ACPI: fix single linked list manipulation
ACPI: ibm_acpi: allow clean removal
ACPI: fix git automerge failure
ACPI: ibm_acpi: respond to workqueue update
ACPI: dock: add uevent to indicate change in device status
ACPI: ec: Lindent once again
ACPI: ec: Change #define to enums there possible.
ACPI: ec: Style changes.
ACPI: ec: Acquire Global Lock under EC mutex.
ACPI: ec: Drop udelay() from poll mode. Loop by reading status field instead.
ACPI: ec: Rename gpe_bit to gpe
...
Fernando Lopez-Lezcano reported frequent scheduling latencies and audio
xruns starting at the 2.6.18-rt kernel, and those problems persisted all
until current -rt kernels. The latencies were serious and unjustified by
system load, often in the milliseconds range.
After a patient and heroic multi-month effort of Fernando, where he
tested dozens of kernels, tried various configs, boot options,
test-patches of mine and provided latency traces of those incidents, the
following 'smoking gun' trace was captured by him:
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| /
||||| delay
cmd pid ||||| time | caller
\ / ||||| \ | /
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (try_to_wake_up)
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
IRQ_19-1479 1D..1 0us : __trace_start_sched_wakeup (c01262ba 0 0)
IRQ_19-1479 1D..1 0us : resched_task (try_to_wake_up)
IRQ_19-1479 1D..1 0us : __spin_unlock_irqrestore (try_to_wake_up)
...
<idle>-0 1...1 11us!: default_idle (cpu_idle)
...
<idle>-0 0Dn.1 602us : smp_apic_timer_interrupt (c0103baf 1 0)
...
<...>-5856 0D..2 618us : __switch_to (__schedule)
<...>-5856 0D..2 618us : __schedule <<idle>-0> (20 162)
<...>-5856 0D..2 619us : __spin_unlock_irq (__schedule)
<...>-5856 0...1 619us : trace_stop_sched_switched (__schedule)
<...>-5856 0D..1 619us : trace_stop_sched_switched <<...>-5856> (37 0)
what is visible in this trace is that CPU#1 ran try_to_wake_up() for
PID:5856, it placed PID:5856 on CPU#0's runqueue and ran resched_task()
for CPU#0. But it decided to not send an IPI that no CPU - due to
TS_POLLING. But CPU#0 never woke up after its NEED_RESCHED bit was set,
and only rescheduled to PID:5856 upon the next lapic timer IRQ. The
result was a 600+ usecs latency and a missed wakeup!
the bug turned out to be an idle-wakeup bug introduced into the mainline
kernel this summer via an optimization in the x86_64 tree:
commit 495ab9c045
Author: Andi Kleen <ak@suse.de>
Date: Mon Jun 26 13:59:11 2006 +0200
[PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
During some profiling I noticed that default_idle causes a lot of
memory traffic. I think that is caused by the atomic operations
to clear/set the polling flag in thread_info. There is actually
no reason to make this atomic - only the idle thread does it
to itself, other CPUs only read it. So I moved it into ti->status.
the problem is this type of change:
if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
- clear_thread_flag(TIF_POLLING_NRFLAG);
+ current_thread_info()->status &= ~TS_POLLING;
smp_mb__after_clear_bit();
while (!need_resched()) {
local_irq_disable();
this changes clear_thread_flag() to an explicit clearing of TS_POLLING.
clear_thread_flag() is defined as:
clear_bit(flag, &ti->flags);
and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms:
static inline void clear_bit(int nr, volatile unsigned long * addr)
{
__asm__ __volatile__( LOCK_PREFIX
"btrl %1,%0"
hence smp_mb__after_clear_bit() is defined as a simple compile barrier:
#define smp_mb__after_clear_bit() barrier()
but the explicit TS_POLLING clearing introduced by the patch:
+ current_thread_info()->status &= ~TS_POLLING;
is not an atomic op! So the clearing of the TS_POLLING bit is freely
reorderable with the reading of the NEED_RESCHED bit - and both now
reside in different memory addresses.
CPU idle wakeup very much depends on ordered memory ops, the clearing of
the TS_POLLING flag must always be done before we test need_resched()
and hit the idle instruction(s). [Symmetrically, the wakeup code needs
to set NEED_RESCHED before it tests the TS_POLLING flag, so memory
ordering is paramount.]
Fernando's dual-core Athlon64 system has a sufficiently advanced memory
ordering model so that it triggered this scenario very often.
( And it also turned out that the reason why these latencies never
triggered on my testsystems is that i routinely use idle=poll, which
was the only idle variant not affected by this bug. )
The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to
act as an absolute barrier between the TS_POLLING write and the
NEED_RESCHED read. This affects almost all idling methods (default,
ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On IA64 there exists some special instructions which
always need to be executed regradless of qp bits, such
as com.crel.unc, tbit.trel.unc etc.
This patch clears qp bits when inserting kprobe trap code
and disables probepoint on slot 1 for these special
instructions.
Signed-off-by: bibo,mao <bibo.mao@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Because slot 1 of one instr bundle crosses border of two consecutive
8-bytes, kprobe on slot 1 is disabled. This patch enables kprobe on
slot1, it only replaces higher 8-bytes of the instruction bundle and
changes the exception code to ignore the low 12 bits of the break
number (which is across the border in the lower 8-bytes of the bundle).
For those instructions which must execute regardless qp bits,
kprobe on slot 1 is still disabled.
Signed-off-by: bibo,mao <bibo.mao@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Stephane thought he saw a problem here (but was just confused
by the return value from ia64_pal_get_brand_info()). But we
should be more defensive here in case an prototype PAL for
a future processor doesn't implement this PAL call.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Improve the scalability of the fpswa code that rate-limits
logging of messages.
There are 2 distinctly different problems in this code.
1) If prctl is used to disable logging, last_time is never
updated. The result is that fpu_swa_count is zeroed out on
EVERY fp fault. This causes a very very hot cache line.
The fix reduces the wallclock time of a 1024p FP exception test
from 28734 sec to 19 sec!!!
2) On VERY large systems, excessive messages are logged because
multiple cpus can each reset or increment fpu_swa_count at
about the same time. The result is that hundreds of messages
are logged each second. The fixes reduces the logging rate
to ~1 per second.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* Make NORET_TYPE and ATTRIB_NORET in line with the
declaration for other architectures
* Add parameter names
Signed-Off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There seems to be a value in both allowing the kernel to determine
the base offset of the crashkernel automatically and allowing
users's to sepcify it.
The old behaviour on ia64, which is still the current behaviour on
most architectures is for the user to always specify the address.
Recently ia64 was changed so that it is always automatically determined.
With this patch the kernel automatically determines the offset if
the supplied value is 0, otherwise it uses the value provided.
This should probably be backed by a documentation change.
Signed-Off-By: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Actually, on reflection I think that there is a good case for
keeping the options separate. I am thinking particularly of people
who want a very small crashdump kernel and thus don't want to compile
in kexec.
The patch below should fix things up so that all valid combinations of
KEXEC, CRASH_DUMP and VMCORE compile cleanly - VMCORE depends on
CRASH_DUMP which is why I said valid combinations. In a nutshell
it just untangles unrelated code and switches around a few defines.
Please note that it creats a new file, arch/ia64/kernel/crash_dump.c
This is in keeping with the i386 implementation.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] replace kmalloc+memset with kzalloc
[IA64] resolve name clash by renaming is_available_memory()
[IA64] Need export for csum_ipv6_magic
[IA64] Fix DISCONTIGMEM without VIRTUAL_MEM_MAP
[PATCH] Add support for type argument in PAL_GET_PSTATE
[IA64] tidy up return value of ip_fast_csum
[IA64] implement csum_ipv6_magic for ia64.
[IA64] More Itanium PAL spec updates
[IA64] Update processor_info features
[IA64] Add se bit to Processor State Parameter structure
[IA64] Add dp bit to cache and bus check structs
[IA64] SN: Correctly update smp_affinty mask
[IA64] sparse cleanups
[IA64] IA64 Kexec/kdump
Replace kmalloc+memset with kzalloc
Signed-off-by: Yan Burman <burman.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is a name clash with ia64 arch code in Andrew's tree. Rename
is_avialable_memory() to is_memory_available() to avoid the clash.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Now we have our own highly optimized assembly code version of
this routine (Thanks Ken!) we should export it so that it can
be used.
Signed-off-by: Tony Luck <tony.luck@intel.com>
PAL_GET_PSTATE accepts a type argument to return different kinds of
frequency information.
Refer: Intel ItaniumArchitecture Software Developer's Manual -
Volume 2: System Architecture, Revision 2.2
(http://developer.intel.com/design/itanium/manuals/245318.htm)
Add the support for type argument and use Instantaneous frequency
in the acpi driver.
Also fix a bug, where in return value of PAL_GET_PSTATE was getting compared
with 'control' bits instead of 'status' bits.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Changes and updates.
1. Remove fake rendz path and related code according to discuss with Khalid Aziz.
2. fc.i offset fix in relocate_kernel.S.
3. iospic shutdown code eoi and mask race fix from Fujitsu.
4. Warm boot hook in machine_kexec to SN SAL code from Jack Steiner.
5. Send slave to SAL slave loop patch from Jay Lan.
6. Kdump on non-recoverable MCA event patch from Jay Lan
7. Use CTL_UNNUMBERED in kdump_on_init sysctl.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.
the compiler can skip truly unused functions just fine:
text data bss dec hex filename
1624412 728710 3674856 6027978 5bfaca vmlinux.before
1624412 728710 3674856 6027978 5bfaca vmlinux.after
[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we are unregistering a kprobe-booster, we can't release its
instruction buffer immediately on the preemptive kernel, because some
processes might be preempted on the buffer. The freeze_processes() and
thaw_processes() functions can clean most of processes up from the buffer.
There are still some non-frozen threads who have the PF_NOFREEZE flag. If
those threads are sleeping (not preempted) at the known place outside the
buffer, we can ensure safety of freeing.
However, the processing of this check routine takes a long time. So, this
patch introduces the garbage collection mechanism of insn_slot. It also
introduces the "dirty" flag to free_insn_slot because of efficiency.
The "clean" instruction slots (dirty flag is cleared) are released
immediately. But the "dirty" slots which are used by boosted kprobes, are
marked as garbages. collect_garbage_slots() will be invoked to release
"dirty" slots if there are more than INSNS_PER_PAGE garbage slots or if
there are no unused slots.
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "bibo,mao" <bibo.mao@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Yumiko Sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi Oshima <soshima@redhat.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the 'no_control' field in the cpu struct to a more positive
and better term 'hotpluggable'. And change(/cleanup) the logic accordingly.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Fix up arch-specific work items where possible to use the new work_struct and
delayed_work structs.
Three places that enqueue bits of their stack and then return have been marked
with #error as this is not permitted.
Signed-Off-By: David Howells <dhowells@redhat.com>
Use generic_handle_irq() to handle mixed-type irq handling.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
`typename' is going away and is usually uninitialised anwyay.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Correct definition of handle_IPI
[IA64] move SAL_CACHE_FLUSH check later in boot
[IA64] MCA recovery: Montecito support
[IA64] cpu-hotplug: Fixing confliction between CPU hot-add and IPI
[IA64] don't double >> PAGE_SHIFT pointer for /dev/kmem access
The declaration of handle_IPI in arch/ia64/kernel/smp.c was changed but
not the definition of this function. Remove struct pt_regs from
handle_IPI().
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The check to see if the firmware drops interrupts during a
SAL_CACHE_FLUSH is done to early in the boot. SAL_CACHE_FLUSH expects
to be able to make PAL calls in virtual mode, on some cell based
machines a fault occurs causing a MCA. This patch moves the check
after mmu_context_init so the TLB and VHPT are properly setup.
Signed-off-by Troy Heber <troy.heber@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The information in MCA records is filled in slightly differently on
Montecito than on Madison/McKinley. Usually, the cache check and bus
check target identifiers have the same address. On Montecito the
cache check and bus check target identifiers can be different if
a corrected error (ie SBE or unconsumed poison data) was encountered and
then an uncorrected error (ie DBE) was consumed. In that case, the
cache check target identifier is the physical address of the DBE (that
caused the MCA to surface) while the bus check target identifier is the
physical address of the SBE. This patch correctly finds the target
identifier that triggered the MCA.
If there are multiple valid cache target identifiers in the same
error record then use the one with the lowest cache level.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.
This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Count the number of "resched" interrupts that each cpu receives.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reformat to fit in 80 columns. Fix a couple typos. Remove
a couple unused labels.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linux maps PAL instructions with an ITR, but uses a DTC for PAL data.
Section 11.10.2.1.3, "Making PAL Procedures Calls in Physical or Virtual
Mode," of the SDM (rev 2.2), says we must therefore make all PAL calls
with PSR.ic = 1 so that Linux can handle any TLB faults.
PAL_CALL_IC_OFF is currently unused, and as long as we use the ITR + DTC
strategy, we can't use it. So remove it. I also removed the code in
ia64_pal_call_static() that conditionally cleared PSR.ic.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Allow pending IPIs to interrupt a timer interrupt that is looping
in the do_timer() "while" loop in timer_interrupt(). (Interrupts are
allowed at only 1 spot in the code).
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Missed one piece of ia64 fallout from the global IRQ patch
7d12e780e0
Perfmon interrupt handler needs to use get_irq_regs() too.
Acked-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
This is just a few makefile tweaks and some file renames.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Many files include the filename at the beginning, serveral used a wrong one.
Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The last in-kernel user of errno is gone, so we should remove the definition
and everything referring to it. This also removes the now-unused lib/execve.c
file that was introduced earlier.
Also remove every trace of __KERNEL_SYSCALLS__ that still remained in the
kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some architectures provide an execve function that does not set errno, but
instead returns the result code directly. Rename these to kernel_execve to
get the right semantics there. Moreover, there is no reasone for any of these
architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
remove these right away.
[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c. This
avoids all arches having to be updated. Compiles and boots on s390.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a nsproxy structure to the task struct. Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.
The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.
[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpumask: ensure that node_to_cpumask() is available to modules for all
supported combinations of architecture and CONFIG_NUMA.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kprobe_flush_task() possibly calls kfree function during holding
kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
incur spinlock deadlock. This patch moves kfree function out scope of
kretprobe_lock.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Whitespace is used to indent, this patch cleans up these sentences by
kernel coding style.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
asm/serial.h is supposed to contain the definitions for the architecture
specific 8250 ports for the 8250 driver. It may also define BASE_BAUD,
but this is the base baud for the architecture specific ports _only_.
Therefore, nothing other than the 8250 driver should be including this
header file. In order to move towards this goal, here is a patch which
removes some of the more obvious incorrect includes of the file.
Acked-by: Paul Fulghum <paulkf@microgate.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
So we can kill wall_jiffies completely.
This is just a cleanup and logically should not change any real behavior
except for one thing: RTC updating code in (old) ppc and xtensa use a
condition "jiffies - wall_jiffies == 1". This condition is never met so I
suppose it is just a bug. I just remove that condition only instead of
kill the whole "if" block.
[heiko.carstens@de.ibm.com: s390 build fix and cleanup]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
timer interrupt handler with this change.
Currently update_times() calculates ticks by "jiffies - wall_jiffies", but
callers of do_timer() should know how many ticks to update. Passing ticks
get rid of this redundant calculation. Also there are another redundancy
pointed out by Martin Schwidefsky.
This cleanup make a barrier added by
5aee405c66 needless. So this patch removes
it.
As a bonus, this cleanup make wall_jiffies can be removed easily, since now
wall_jiffies is always synced with jiffies. (This patch does not really
remove wall_jiffies. It would be another cleanup patch)
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] minor reformatting to vmlinux.lds.S
[IA64] CMC/CPE: Reverse the order of fetching log and checking poll threshold
[IA64] PAL calls need physical mode, stacked
[IA64] ar.fpsr not set on MCA/INIT kernel entry
[IA64] printing support for MCA/INIT
[IA64] trim output of show_mem()
[IA64] show_mem() printk levels
[IA64] Make gp value point to Region 5 in mca handler
Revert "[IA64] Unwire set/get_robust_list"
[IA64] Implement futex primitives
[IA64-SGI] Do not request DMA memory for BTE
[IA64] Move perfmon tables from thread_struct to pfm_context
[IA64] Add interface so modules can discover whether multithreading is on.
[IA64] kprobes: fixup the pagefault exception caused by probehandlers
[IA64] kprobe opcode 16 bytes alignment on IA64
[IA64] esi-support
[IA64] Add "model name" to /proc/cpuinfo
Fix build error introduced by 3212fe1594
Non-NUMA case should be handled.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor reformatting to vmlinux.lds.S to make it 80-column usable,
in accordance with Linux coding style.
Signed-off-by: Al Stone <ahs3@fc.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch reverses the order of fetching log from SAL and
checking poll threshold. This will fix following trivial issues:
- If SAL_GET_SATE_INFO is unbelievably slow (due to huge system
or just its silly implementation) and if it takes more than
1/5 sec, CMCI/CPEI will never switch to CMCP/CPEP.
- Assuming terrible flood of interrupt (continuous corrected
errors let all CPUs enter to handler at once and bind them
in it), CPUs will be serialized by IA64_LOG_LOCK(*).
Now we check the poll threshold after the lock and log fetch,
so we need to call SAL_GET_STATE_INFO (num_online_cpus() + 4)
times in the worst case.
if we can check the threshold before the lock, we can shut up
interrupts quickly without waiting preceding log fetches, and
the number of times will be reduced to (num_online_cpus()) in
the same situation.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When entering the kernel due to an MCA or INIT, ar.fpsr (ar40)
was not getting set to the kernel default value (remaining
at the user value). The effect depends on the user setting
of ar.fpsr. In the test case, the effect was addresses
printing with strange hex values.
Setting ar.fpsr in ia64_set_kernel_registers sets it for both
the MCA and INIT paths. The user value of ar.fpsr is correctly
saved (in ia64_state_save) and restored (in ia64_state_restore).
Below is an example of output with very strange hex values.
Anyone know the value of hex 'g'? :-)
Processes interrupted by INIT - 0 (cpu 14 task 0xdfffg55g7a4c6gA)
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Printing message to console from MCA/INIT handler is useful,
however doing oops_in_progress = 1 in them exactly makes
something in kernel wrong. Especially it sounds ugly if
system goes wrong after returning from recoverable MCA.
This patch adds ia64_mca_printk() function that collects
messages into temporary-not-so-large message buffer during
in MCA/INIT environment and print them out later, after
returning to normal context or when handlers determine to
down the system.
Also this print function is exported for use in extensional
MCA handler. It would be useful to describe detail about
recovery.
NOTE:
I don't think it is sane thing if temporary message buffer
is enlarged enough to hold whole stack dumps from INIT, so
buffering is disabled during stack dump from INIT-monarch
(= default_monarch_init_process). please fix it in future.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
MCA dispatch code take physical address of GP passed from SAL, then call
DATA_PA_TO_VA twice on GP before call into C code. The first time is
in ia64_set_kernel_register, the second time is in VIRTUAL_MODE_ENTER.
The gp is changed to a virtual address in region 7 because DATA_PA_TO_VA
is implemented by dep instruction.
However when notify blocks were called from MCA handler code, because
notify blocks are supported by callback function pointers, gp value
value was switched to region 5 again.
The patch set gp register to kernel gp of region 5 at entry of MCA
dispatch.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This reverts commit 2636255488.
Jakub Jelinek provided the missing futex_atomic_cmpxchg_inatomic()
function, so now it should be safe to re-enable these syscalls.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch renders thread_struct->pmcs[] and thread_struct->pmds[]
OBSOLETE. The actual table is moved to pfm_context structure which
saves space in thread_struct (in turn saving space in task_struct
which frees up more space for kernel stacks).
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add is_multithreading_enabled() to check whether multi-threading
is enabled independently of which cpu is currently online
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If the user-specified kprobe handler causes the page fault when accessing
user space address, fixup this fault since do_page_fault() should not
continue as the kprobe handler are run with preemption disabled.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On IA64 instruction opcode must be 16 bytes alignment, in kprobe structure
there is one element to save original instruction, currently saved opcode
is not statically allocated in kprobe structure, that can not assure
16 bytes alignment. This patch dynamically allocated kprobe instruction
opcode to assure 16 bytes alignment.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If we're going to implement smp_call_function_single() on three architecture
with the same prototype then it should have a declaration in a
non-arch-specific header file.
Move it into <linux/smp.h>.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In many places we will need to use the same combination of flags. Specify
a single GFP_THISNODE definition for ease of use in gfp.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The uncached allocator manages per node pools. Specify __GFP_THISNODE in
order to force allocation on the indicated node or fail. The uncached
allocator has already logic to deal with failing allocations.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Assume that a cpu is *physically* offlined at boot time...
Because smpboot.c::smp_boot_cpu_map() canoot find cpu's sapicid,
numa.c::build_cpu_to_node_map() cannot build cpu<->node map for
offlined cpu.
For such cpus, cpu_to_node map should be fixed at cpu-hot-add.
This mapping should be done before cpu onlining.
This patch also handles cpu hotremove case.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Problem description:
We have additional_cpus= option for allocating possible_cpus. But nid
for possible cpus are not fixed at boot time. cpus which is offlined at
boot or cpus which is not on SRAT is not tied to its node. This will
cause panic at cpu onlining.
Usually, pxm_to_nid() mapping is fixed at boot time by SRAT.
But, unfortunately, some system (my system!) do not include
full SRAT table for possible cpus. (Then, I use
additiona_cpus= option.)
For such possible cpus, pxm<->nid should be fixed at
hot-add. We now have acpi_map_pxm_to_node() which is also
used at boot. It's suitable here.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The SN PROM uses the register stack in the slave loop. The contents
must be preserved for the OS to return to the slave loop via offlining
a cpu or for kexec. A 'flushrs" is needed to force the stack to be written
to memory prior to changing bspstore.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The syscalls set/get_robust_list must not be wired up until
futex_atomic_cmpxchg_inatomic is implemented. Otherwise the kernel will
hang in handle_futex_death.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix a bug in sys_perfmonctl() whereby it was not correctly
decrementing the file descriptor reference count.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This prevents cross-region mappings on IA64 and SPARC which could lead
to system crash. They were correctly trapped for normal mmap() calls,
but not for the kernel internal calls generated by executable loading.
This code just moves the architecture-specific cross-region checks into
an arch-specific "arch_mmap_check()" macro, and defines that for the
architectures that needed it (ia64, sparc and sparc64).
Architectures that don't have any special requirements can just ignore
the new cross-region check, since the mmap() code will just notice on
its own when the macro isn't defined.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Cleaned up to not affect architectures that don't need it ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Increase default nodes shift to 10, nr_cpus to 1024
[IA64] remove redundant local_irq_save() calls from sn_sal.h
[IA64] panic if topology_init kzalloc fails
[IA64-SGI] Silent data corruption caused by XPC V2.
There really is no sense trying to continue if the kzalloc of sysfs_cpus[]
fails in ia64 topology_init. The code calling into here doesn't check
errors very well, and one ends up with a nonobvious boot failure that
wastes peoples time debugging.
See for example the lkml thread at:
http://lkml.org/lkml/2006/3/2/215
Since the system is totally dead when this kzalloc fails, not having yet
even booted, might as well announce one's death boldly and plainly.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ACPI 3.0 appended a variable length UID string to the LAPIC structure
as part of support for > 256 processors. So the BAD_MADT_ENTRY() sanity
check can no longer compare for equality with a fixed structure length.
Signed-off-by: Alexey Y Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Previously the message was "Fatal exception: panic_on_oops", as introduced
in a recent patch whith removed a somewhat dangerous call to ssleep() in
the panic_on_oops path. However, Paul Mackerras suggested that this was
somewhat confusing, leadind people to believe that it was panic_on_oops
that was the root cause of the fatal exception. On his suggestion, this
patch changes the message to simply "Fatal exception". A suitable oops
message should already have been displayed.
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The uncached allocator has a function, uncached_get_new_chunk(), that needs
to be serialized on a per node basis. It also has a global variable,
allocated_granules, which should be defined on a per node basis and protected
by that serialization. Additionally, all error returns from functions called
(like ia64_pal_mc_drain()) should be handled appropriately.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Acked-by: Jes Sorenson <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
CONFIG_MD_RAID5 became CONFIG_MD_RAID456 in drivers/md/Kconfig. Make
the same change in arch/ia64
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Aron Griffis <aron@hp.com>
Acked-by: Jes Sorenson <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I think ia64_switch_mode_phys and ia64_switch_mode_virt
does not need to alloc an empty frame.
An empty frame is required by loadrs but flushrs
does not need that.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We found an issue in pal.S.
According to the software runtime SPEC,
The caller's output registers do not need to be preserved for
caller. The callee may reuse input registers for any other
purpose within the procedure.
in ia64_pal_call_phys_stacked,
input registers are copied to output registers before call
into ia64_switch_mode_phys, then used to call into PAL. This
assumes output registers are preserved in ia64_switch_mode_phys,
which may not be true.
In this particular case, ia64_switch_mode_phys alloc a null frame
, and mask off psr.i.
If an interrupt comes at this small window,
or an MCA comes inside the procedure, output registers
maybe changed,
then the pal call may got some staled input registers.
This patch moves the copies from input to output
after ia64_switch_mode_phys to follow the software
runtime convention.
It also removed some unused labels in
ia64_pal_call_phys_stacked.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix some sparse warnings on ia64. Large constants that should be long
instead of int. Use NULL instead of 0. Add some missing __iomem
casts. Replace a non-C99 structure assignment.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The latest toolchains can produce a new ELF section in DSOs and
dynamically-linked executables. The new section ".gnu.hash" replaces
".hash", and allows for more efficient runtime symbol lookups by the
dynamic linker. The new ld option --hash-style={sysv|gnu|both} controls
whether to produce the old ".hash", the new ".gnu.hash", or both. In some
new systems such as Fedora Core 6, gcc by default passes --hash-style=gnu
to the linker, so that a standard invocation of "gcc -shared" results in
producing a DSO with only ".gnu.hash". The new ".gnu.hash" sections need
to be dealt with the same way as ".hash" sections in all respects; only the
dynamic linker cares about their contents. To work with older dynamic
linkers (i.e. preexisting releases of glibc), a binary must have the old
".hash" section. The --hash-style=both option produces binaries that a new
dynamic linker can use more efficiently, but an old dynamic linker can
still handle.
The new section runs afoul of the custom linker scripts used to build vDSO
images for the kernel. On ia64, the failure mode for this is a boot-time
panic because the vDSO's PT_IA_64_UNWIND segment winds up ill-formed.
This patch addresses the problem in two ways.
First, it mentions ".gnu.hash" in all the linker scripts alongside ".hash".
This produces correct vDSO images with --hash-style=sysv (or old tools),
with --hash-style=gnu, or with --hash-style=both.
Second, it passes the --hash-style=sysv option when building the vDSO
images, so that ".gnu.hash" is not actually produced. This is the most
conservative choice for compatibility with any old userland. There is some
concern that some ancient glibc builds (though not any known old production
system) might choke on --hash-style=both binaries. The optimizations
provided by the new style of hash section do not really matter for a DSO
with a tiny number of symbols, as the vDSO has. If someone wants to use
=gnu or =both for their vDSO builds and worry less about that
compatibility, just change the option and the linker script changes will
make any choice work fine.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use hotplug version of register_cpu_notifier in late init functions.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is part of an effort to unify the panic_on_oops behaviour across
all architectures that implement it.
It was pointed out to me by Andi Kleen that if an oops has occured in
interrupt context, then calling sleep() in the oops path will only cause a
panic, and that it would be really better for it not to be in the path at
all.
This patch removes the ssleep() call and reworks the console message
accordinly. I have a slght concern that the resulting console message is
too long, feedback welcome.
For powerpc it also unifies the 32bit and 64bit behaviour.
Fror x86_64, this patch only updates the console message, as ssleep() is
already not present.
Signed-off-by: Horms <horms@verge.net.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kprobe inserts breakpoint instruction in probepoint and then jumps to
instruction slot when breakpoint is hit, the instruction slot icache must
be consistent with dcache. Here is the patch which invalidates instruction
slot icache area.
Without this patch, in some machines there will be fault when executing
instruction slot where icache content is inconsistent with dcache.
Signed-off-by: bibo,mao <bibo.mao@intel.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
/proc/pal/*/version_info is a bit confusing. HP firmware, at least,
reports 07.31 instead of 0.7.31. Also, the comment is out of place;
it's an internal detail about the implementation of ia64_pal_version.
Since the 2.2 revision of the SDM still states that PAL_VERSION can
be called in virtual mode, correct the comment to be more accurate.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Newer ARMs have a 40 bit physical address space, but mapping physical
memory above 4G needs a special page table format which we (currently?) do
not use for userspace mappings, so what happens instead is that mapping an
address >= 4G will happily discard the upper bits and wrap.
There is a valid_mmap_phys_addr_range() arch hook where we could check for
>= 4G addresses and deny the mapping, but this hook takes an unsigned long
address:
static inline int valid_mmap_phys_addr_range(unsigned long addr, size_t size);
And drivers/char/mem.c:mmap_mem() calls it like this:
static int mmap_mem(struct file * file, struct vm_area_struct * vma)
{
size_t size = vma->vm_end - vma->vm_start;
if (!valid_mmap_phys_addr_range(vma->vm_pgoff << PAGE_SHIFT, size))
So that's not much help either.
This patch makes the hook take a pfn instead of a phys address.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
screen_info.h doesn't have anything to do with the tty layer and shouldn't be
included by tty.h. This patches removes the include and modifies all users to
directly include screen_info.h. struct screen_info is mainly used to
communicate with the console drivers in drivers/video/console. Note that this
patch touches every arch and I have no way of testing it. If there is a
mistake the worst thing that will happen is a compile error.
[akpm@osdl.org: fix arm build]
[akpm@osdl.org: fix alpha build]
Signed-off-by: Jon Smirl <jonsmir@gmail.com>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.
Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the new IRQF_ constants and remove the SA_INTERRUPT define
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow to tie upper bits of syscall bitmap in audit rules to kernel-defined
sets of syscalls. Infrastructure, a couple of classes (with 32bit counterparts
for biarch targets) and actual tie-in on i386, amd64 and ia64.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add ->retrigger() irq op to consolidate hw_irq_resend() implementations.
(Most architectures had it defined to NOP anyway.)
NOTE: ia64 needs testing. i386 and x86_64 tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cleanup: remove irq_descp() - explicit use of irq_desc[] is shorter and more
readable.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidation: remove the irq_affinity[NR_IRQS] array and move it into the
irq_desc[NR_IRQS].affinity field.
[akpm@osdl.org: sparc64 build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch-queue improves the generic IRQ layer to be truly generic, by adding
various abstractions and features to it, without impacting existing
functionality.
While the queue can be best described as "fix and improve everything in the
generic IRQ layer that we could think of", and thus it consists of many
smaller features and lots of cleanups, the one feature that stands out most is
the new 'irq chip' abstraction.
The irq-chip abstraction is about describing and coding and IRQ controller
driver by mapping its raw hardware capabilities [and quirks, if needed] in a
straightforward way, without having to think about "IRQ flow"
(level/edge/etc.) type of details.
This stands in contrast with the current 'irq-type' model of genirq
architectures, which 'mixes' raw hardware capabilities with 'flow' details.
The patchset supports both types of irq controller designs at once, and
converts i386 and x86_64 to the new irq-chip design.
As a bonus side-effect of the irq-chip approach, chained interrupt controllers
(master/slave PIC constructs, etc.) are now supported by design as well.
The end result of this patchset intends to be simpler architecture-level code
and more consolidation between architectures.
We reused many bits of code and many concepts from Russell King's ARM IRQ
layer, the merging of which was one of the motivations for this patchset.
This patch:
rename desc->handler to desc->chip.
Originally i did not want to do this, because it's a big patch. But having
both "desc->handler", "desc->handle_irq" and "action->handler" caused a
large degree of confusion and made the code appear alot less clean than it
truly is.
I have also attempted a dual approach as well by introducing a
desc->chip alias - but that just wasnt robust enough and broke
frequently.
So lets get over with this quickly. The conversion was done automatically
via scripts and converts all the code in the kernel.
This renaming patch is the first one amongst the patches, so that the
remaining patches can stay flexible and can be merged and split up
without having some big monolithic patch act as a merge barrier.
[akpm@osdl.org: build fix]
[akpm@osdl.org: another build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chandra Seetharaman missed one place in commit:
65edc68c34
[but it only shows up when building the ski simulator configuration
of ia64, so thats understandable]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make use the of newly defined hotplug version of cpu_notifier functionality
wherever appropriate.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make notifier_blocks associated with cpu_notifier as __cpuinitdata.
__cpuinitdata makes sure that the data is init time only unless
CONFIG_HOTPLUG_CPU is defined.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
band-aid solution to solve that problem. In the process, i undid all the
changes you both were making to ensure that these notifiers were available
only at init time (unless CONFIG_HOTPLUG_CPU is defined).
We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
XFS problem cleanly and makes the cpu notifiers available only at init time
(unless CONFIG_HOTPLUG_CPU is defined).
If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
time.
This patch reverts the notifier_call changes made in 2.6.17
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With Goto-san's patch, we can add new pgdat/node at runtime. I'm now
considering node-hot-add with cpu + memory on ACPI.
I found acpi container, which describes node, could evaluate cpu before
memory. This means cpu-hot-add occurs before memory hot add.
In most part, cpu-hot-add doesn't depend on node hot add. But register_cpu(),
which creates symbolic link from node to cpu, requires that node should be
onlined before register_cpu(). When a node is onlined, its pgdat should be
there.
This patch-set holds off creating symbolic link from node to cpu
until node is onlined.
This removes node arguments from register_cpu().
Now, register_cpu() requires 'struct node' as its argument. But the array of
struct node is now unified in driver/base/node.c now (By Goto's node hotplug
patch). We can get struct node in generic way. So, this argument is not
necessary now.
This patch also guarantees add cpu under node only when node is onlined. It
is necessary for node-hot-add vs. cpu-hot-add patch following this.
Moreover, register_cpu calculates cpu->node_id by cpu_to_node() without regard
to its 'struct node *root' argument. This patch removes it.
Also modify callers of register_cpu()/unregister_cpu, whose args are changed
by register-cpu-remove-node-struct patch.
[Brice.Goglin@ens-lyon.org: fix it]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When new node becomes enable by hot-add, new sysfs file must be created for
new node. So, if new node is enabled by add_memory(), register_one_node() is
called to create it. In addition, I386's arch_register_node() and a part of
register_nodes() of powerpc are consolidated to register_one_node() as a
generic_code().
This is tested by Tiger4(IPF) with node hot-plug emulation.
Signed-off-by: Keiichiro Tokunaga <tokuanga.keiich@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
During some profiling I noticed that default_idle causes a lot of
memory traffic. I think that is caused by the atomic operations
to clear/set the polling flag in thread_info. There is actually
no reason to make this atomic - only the idle thread does it
to itself, other CPUs only read it. So I moved it into ti->status.
Converted i386/x86-64/ia64 for now because that was the easiest
way to fix ACPI which also manipulates these flags in its idle
function.
Cc: Nick Piggin <npiggin@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert a few stragglers over to for_each_possible_cpu(), remove
for_each_cpu().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (65 commits)
ACPI: suppress power button event on S3 resume
ACPI: resolve merge conflict between sem2mutex and processor_perflib.c
ACPI: use for_each_possible_cpu() instead of for_each_cpu()
ACPI: delete newly added debugging macros in processor_perflib.c
ACPI: UP build fix for bugzilla-5737
Enable P-state software coordination via _PDC
P-state software coordination for speedstep-centrino
P-state software coordination for acpi-cpufreq
P-state software coordination for ACPI core
ACPI: create acpi_thermal_resume()
ACPI: create acpi_fan_suspend()/acpi_fan_resume()
ACPI: pass pm_message_t from acpi_device_suspend() to root_suspend()
ACPI: create acpi_device_suspend()/acpi_device_resume()
ACPI: replace spin_lock_irq with mutex for ec poll mode
ACPI: Allow a WAN module enable/disable on a Thinkpad X60.
sem2mutex: acpi, acpi_link_lock
ACPI: delete unused acpi_bus_drivers_lock
sem2mutex: drivers/acpi/processor_perflib.c
ACPI add ia64 exports to build acpi_memhotplug as a module
ACPI: asus_acpi_init(): propagate correct return value
...
Manual resolve of conflicts in:
arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
include/acpi/processor.h
Pass the POSIX lock owner ID to the flush operation.
This is useful for filesystems which don't want to store any locking state
in inode->i_flock but want to handle locking/unlocking POSIX locks
internally. FUSE is one such filesystem but I think it possible that some
network filesystems would need this also.
Also add a flag to indicate that a POSIX locking request was generated by
close(), so filesystems using the above feature won't send an extra locking
request in this case.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
move_pages() is used to move individual pages of a process. The function can
be used to determine the location of pages and to move them onto the desired
node. move_pages() returns status information for each page.
long move_pages(pid, number_of_pages_to_move,
addresses_of_pages[],
nodes[] or NULL,
status[],
flags);
The addresses of pages is an array of void * pointing to the
pages to be moved.
The nodes array contains the node numbers that the pages should be moved
to. If a NULL is passed instead of an array then no pages are moved but
the status array is updated. The status request may be used to determine
the page state before issuing another move_pages() to move pages.
The status array will contain the state of all individual page migration
attempts when the function terminates. The status array is only valid if
move_pages() completed successfullly.
Possible page states in status[]:
0..MAX_NUMNODES The page is now on the indicated node.
-ENOENT Page is not present
-EACCES Page is mapped by multiple processes and can only
be moved if MPOL_MF_MOVE_ALL is specified.
-EPERM The page has been mlocked by a process/driver and
cannot be moved.
-EBUSY Page is busy and cannot be moved. Try again later.
-EFAULT Invalid address (no VMA or zero page).
-ENOMEM Unable to allocate memory on target node.
-EIO Unable to write back page. The page must be written
back in order to move it since the page is dirty and the
filesystem does not provide a migration function that
would allow the moving of dirty pages.
-EINVAL A dirty page cannot be moved. The filesystem does not provide
a migration function and has no ability to write back pages.
The flags parameter indicates what types of pages to move:
MPOL_MF_MOVE Move pages that are only mapped by the process.
MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
Requires sufficient capabilities.
Possible return codes from move_pages()
-ENOENT No pages found that would require moving. All pages
are either already on the target node, not present, had an
invalid address or could not be moved because they were
mapped by multiple processes.
-EINVAL Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
to migrate pages in a kernel thread.
-EPERM MPOL_MF_MOVE_ALL specified without sufficient priviledges.
or an attempt to move a process belonging to another user.
-EACCES One of the target nodes is not allowed by the current cpuset.
-ENODEV One of the target nodes is not online.
-ESRCH Process does not exist.
-E2BIG Too many pages to move.
-ENOMEM Not enough memory to allocate control array.
-EFAULT Parameters could not be accessed.
A test program for move_pages() may be found with the patches
on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3
From: Christoph Lameter <clameter@sgi.com>
Detailed results for sys_move_pages()
Pass a pointer to an integer to get_new_page() that may be used to
indicate where the completion status of a migration operation should be
placed. This allows sys_move_pags() to report back exactly what happened to
each page.
Wish there would be a better way to do this. Looks a bit hacky.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify the gen_pool allocator (lib/genalloc.c) to utilize a bitmap scheme
instead of the buddy scheme. The purpose of this change is to eliminate
the touching of the actual memory being allocated.
Since the change modifies the interface, a change to the uncached allocator
(arch/ia64/kernel/uncached.c) is also required.
Both Andrey Volkov and Jes Sorenson have expressed a desire that the
gen_pool allocator not write to the memory being managed. See the
following:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113518602713125&w=2http://marc.theaimsgroup.com/?l=linux-kernel&m=113533568827916&w=2
Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: Andrey Volkov <avolkov@varma-el.com>
Acked-by: Jes Sorensen <jes@trained-monkey.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidate the various arch-specific implementations of pxm_to_node() and
node_to_pxm() into a single generic version.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sorry I didn't notice earlier, but that BUG_ON triggers for me on the
simulator. AFAICS the mask for itv is set in cpu_init(), which comes
after sal_init(). Consequently on the simulator the itv still has its
start value of zero. I've probably missed something, but I wonder why
at this stage of the boot you even need to save and restore the itv?
Signed-Off-By: Ian Wienand <ianw@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
struct ia64_sal_os_state has three semi-independent sections. The code
in mca_asm.S assumes that these three sections are contiguous, which
makes it very awkward to add new data to this structure. Remove the
assumption that the sections are contiguous. Define a macro to shorten
references to offsets in ia64_sal_os_state.
This patch does not change the way that the code behaves. It just
makes it easier to update the code in future and to add fields to
ia64_sal_os_state when debugging the MCA/INIT handlers.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
One more trivial, stand-alone patch from the Xen/ia64 review. Sanity
check usage of the reserved region numbers.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Abstract IA64_FIRST_DEVICE_VECTOR/IA64_LAST_DEVICE_VECTOR since SN platforms
use a subset of the IA64 range. Implement this by making the above macros
global variables which the platform can override in it setup code.
Also add a reserve_irq_vector() routine used by SN to mark a vector's as
in-use when that weren't allocated through assign_irq_vector().
Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add support for making ESI calls [1]. ESI stands for "Extensible SAL
specification" and is basically a way for invoking firmware
subroutines which are identified by a GUID. I don't know whether ESI
is used by vendors other than HP (if you do, please let me know) but
as firmware "backdoors" go, this seems one of the cleaner methods, so
it seems reasonable to support it, even though I'm not aware of any
publicly documented ESI calls. I'd have liked to make the ESI module
completely stand-alone, but unfortunately that is not easily (or not
at all) possible because in order to make ESI calls in physical mode,
a small stub similar to the EFI stub is needed in the kernel proper.
I did try to create a stub that would work in user-level, but it
quickly got ugly beyond recognition (e.g., the stub had to make
assumptions about how the module-loader generated call-stubs work) and
I didn't even get it to work (that's probably fixable, but I didn't
bother because I concluded it was too ugly anyhow). While it's not
terribly elegant to have kernel code which isn't actively used in the
kernel proper, I think it might be worth making an exception here for
two reasons: the code is trivially small (all that's really needed is
esi_stub.S) and by including it in the normal kernel distro, it might
encourage other OEMs to also use ESI, which I think would be far
better than each inventing their own firmware "backdoor".
The code was originally written by Alex. I just massaged and packaged
it a bit (and perhaps messed up some things along the way...).
Changes since first version of patch that was posted to mailing list:
* Export ia64_esi_call and ia64_esi_call_phys() as GPL symbols.
* Disallow building esi.c as a module for now. Building as a module
would currently lead to an unresolved reference to "sal_lock" on SMP kernels
because that symbol doesn't get exported.
* Export esi_call_phys() only if ESI is enabled.
* Remove internal stuff from esi.h and add a "proc_type" argument to
ia64_esi_call() such that serialization-requirements can be expressed (ESI
follows SAL here, where procedure calls may have to be serialized, are
MP-safe, or MP-safe andr reentrant).
[1] h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,919,00.html
Signed-off-by: David Mosberger <David.Mosberger@acm.org>
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linux ia64 port tried to decode the processor family number
to something human-readable, but Intel brandnames don't change
synchronously with updates to the family number. Adopt a more
i386-like approach and just print the family number in decimal.
Add a new field "model name" that uses PAL_BRAND_INFO to find
the official name for the cpu, or on older systems, falls back
to using the well-known codenames (Merced, McKinley, Madison).
Signed-off-by: Tony Luck <tony.luck@intel.com>
Calls to set_irq_info in set_irq_affinity_info() is redundant because
irq_affinity mask was set just one line immediately above it. Remove
that duplicate call.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When CONFIG_PCI_MSI is set, move_irq() is an empty function, causing
grief when sys admin tries to bind interrupt to CPU.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When building sim_defconfig, which does not define CONFIG_ACPI
arch/ia64/kernel/acpi.c:71: warning: 'acpi_madt_rev' defined but not used
really acpi.c should not be built when CONFIG_ACPI=n...
Signed-off-by: Len Brown <len.brown@intel.com>
This closes a couple holes in our attribute aliasing avoidance scheme:
- The current kernel fails mmaps of some /dev/mem MMIO regions because
they don't appear in the EFI memory map. This keeps X from working
on the Intel Tiger box.
- The current kernel allows UC mmap of the 0-1MB region of
/sys/.../legacy_mem even when the chipset doesn't support UC
access. This causes an MCA when starting X on HP rx7620 and rx8620
boxes in the default configuration.
There's more detail in the Documentation/ia64/aliasing.txt file this
adds, but the general idea is that if a region might be covered by
a granule-sized kernel identity mapping, any access via /dev/mem or
mmap must use the same attribute as the identity mapping.
Otherwise, we fall back to using an attribute that is supported
according to the EFI memory map, or to using UC if the EFI memory
map doesn't mention the region.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] update sn2 defconfig
[IA64] Add mca recovery failure messages
[IA64-SGI] fix SGI Altix tioce_reserve_m32() bug
[IA64] enable dumps to capture second page of kernel stack
[IA64-SGI] - Reduce overhead of reading sn_topology
[IA64-SGI] - Fix discover of nearest cpu node to IO node
[IA64] IOC4 config option ordering
[IA64] Setup an IA64 specific reclaim distance
[IA64] eliminate compile time warnings
[IA64] eliminate compile time warnings
[IA64-SGI] SN SAL call to inject memory errors
[IA64] - Fix MAX_PXM_DOMAINS for systems with > 256 nodes
[IA64] Remove unused variable in sn_sal.h
[IA64] Remove redundant NULL checks before kfree
[IA64] wire up compat_sys_adjtimex()
When the mca recovery code encounters a condition that makes
the MCA non-recoverable, print the reason it could not recover.
This will make it easier to identify why the recovery code did
not recover.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Few of the notifier_chain_register() callers use __init in the definition
of notifier_call. It is incorrect as the function definition should be
available after the initializations (they do not unregister them during
initializations).
This patch fixes all such usages to _not_ have the notifier_call __init
section.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.
This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.
Signed-off-by: Jens Axboe <axboe@suse.de>
Andrew Morton pointed out that compiler might not inline the functions
marked for inline in kprobes. There-by allowing the insertion of probes
on these kprobes routines, which might cause recursion.
This patch removes all such inline and adds them to kprobes section
there by disallowing probes on all such routines. Some of the routines
can even still be inlined, since these routines gets executed after the
kprobes had done necessay setup for reentrancy.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dmi_scan.c is arch-independent and is used by i386, x86_64, and ia64.
Currently all three arches compile it from arch/i386, which means that ia64
and x86_64 depend on things in arch/i386 that they wouldn't otherwise care
about.
This is simply "mv arch/i386/kernel/dmi_scan.c drivers/firmware/" (removing
trailing whitespace) and the associated Makefile changes. All three
architectures already set CONFIG_DMI in their top-level Kconfig files.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andrey Panin <pazke@orbita1.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'tee' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] splice: add support for sys_tee()
[PATCH] splice: pass offset around for ->splice_read() and ->splice_write()
ia64_wait_for_slaves() was changed in 2.6.17-rc1 to report the slave
state. It incorrectly assumes that all slaves are for MCA, but
ia64_wait_for_slaves() is also called from the INIT monarch handler.
The existing message is very misleading, so correct it.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.
Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.
Signed-off-by: Jens Axboe <axboe@suse.de>
The OS INIT handler is loading incorrect values into cr.ifa on exit.
This shows up as a hang when resuming after an INIT that is delivered
while a cpu is in user space. Correct the value loaded into cr.ifa.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The MCA/INIT handlers maintain important state in the SAL to OS (sos)
area and in the monarch_cpu flag. Kernel debuggers (such as KDB) need
this data, and may need to adjust the monarch_cpu field so make the
data available to the notify_die hooks. Define two more events for
calling the functions on the notify_die chain.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.
This patch replaces for_each_cpu with for_each_possible_cpu under
arch/ia64/kernel/.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fjitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Get rid of the manual search of _CRS, in favor of
acpi_get_vendor_resource() which is now provided by the ACPI CA. And fall
back to searching for a consumer-only address space descriptor if no
vendor-defined resource is found.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
gcc3 thinks that a 32-bit field of a u64 type is itself a u64, so
should be printed with "%ld". gcc4 thinks it needs just "%d".
Make both versions happy by avoiding this construct.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] ioremap() should prefer WB over UC
[IA64] Add __mca_table to the DISCARD list in gate.lds
[IA64] Move __mca_table out of the __init section
[IA64] simplify some condition checks in iosapic_check_gsi_range
[IA64] correct some messages and fixes some minor things
[IA64-SGI] fix for-loop in sn_hwperf_geoid_to_cnode()
[IA64-SGI] sn_hwperf use of num_online_cpus()
[IA64] optimize flush_tlb_range on large numa box
[IA64] lazy_mmu_prot_update needs to be aware of huge pages
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).
From the splice.c comments:
"splice": joining two ropes together by interweaving their strands.
This is the "extended pipe" functionality, where a pipe is used as
an arbitrary in-memory buffer. Think of a pipe as a small kernel
buffer that you can use to transfer data from one end to the other.
The traditional unix read/write is extended with a "splice()" operation
that transfers data buffers to or from a pipe buffer.
Named by Larry McVoy, original implementation from Linus, extended by
Jens to support splicing to files and fixing the initial implementation
bugs.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add __mca_table to the DISCARD list for the gate.lds linker script to
avoid broken linker references when linking the final vmlinux file.
Also add comment to include/asm-ia64/asmmacros.h to avoid anyone else
hitting this problem in the future.
Credits to James Bottomley <James.Bottomley@SteelEye.com> for spotting
the DISCARD list in gate.lds.S
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some condition checks on iosapic_check_gsi_range() can be omitted
because always `base <= end' is assured. This patch simplifies those
checks.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch corrects some wrong comments and a printk message.
It also fixes some minor things, and makes all lines fit in
80 columns.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide proper kprobes fault handling, if a user-specified pre/post handlers
tries to access user address space, through copy_from_user(), get_user() etc.
The user-specified fault handler gets called only if the fault occurs while
executing user-specified handlers. In such a case user-specified handler is
allowed to fix it first, later if the user-specifed fault handler does not fix
it, we try to fix it by calling fix_exception().
The user-specified handler will not be called if the fault happens when single
stepping the original instruction, instead we reset the current probe and
allow the system page fault handler to fix it up.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Anil S Keshavamurthy<anil.s.keshavamurthy@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently kprobe handler traps only happen in kernel space, so function
kprobe_exceptions_notify should skip traps which happen in user space.
This patch modifies this, and it is based on 2.6.16-rc4.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: <hiramatu@sdl.hitachi.co.jp>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When kretprobe probes the schedule() function, if the probed process exits
then schedule() will never return, so some kretprobe instances will never
be recycled.
In this patch the parent process will recycle retprobe instances of the
probed function and there will be no memory leak of kretprobe instances.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Almost all users of the table addresses from the EFI system table want
physical addresses. So rather than doing the pa->va->pa conversion, just keep
physical addresses in struct efi.
This fixes a DMI bug: the efi structure contained the physical SMBIOS address
on x86 but the virtual address on ia64, so dmi_scan_machine() used ioremap()
on a virtual address on ia64.
This is essentially the same as an earlier patch by Matt Tolentino:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112130292316281&w=2
except that this changes all table addresses, not just ACPI addresses.
Matt's original patch was backed out because it caused MCAs on HP sx1000
systems. That problem is resolved by the ioremap() attribute checking added
for ia64.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the size, not a pointer to the size, to efi_mem_attribute_range().
This function validates memory regions for the /dev/mem read/write/mmap paths.
The pointer allows arches to reduce the size of the range, but I think that's
unnecessary complexity. Simplifying it will let me use
efi_mem_attribute_range() to improve the ia64 ioremap() implementation.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enable DMI table parsing on ia64.
Andi Kleen has a patch in his x86_64 tree which enables the use of i386
dmi_scan.c on x86_64. dmi_scan.c functions are being used by the
drivers/char/ipmi/ipmi_si_intf.c driver for autodetecting the ports or
memory spaces where the IPMI controllers may be found.
This patch adds equivalent changes for ia64 as to what is in the x86_64
tree. In addition, I reworked the DMI detection, such that on EFI-capable
systems, it uses the efi.smbios pointer to find the table, rather than
brute-force searching from 0xF0000. On non-EFI systems, it continues the
brute-force search.
My test system, an Intel S870BN4 'Tiger4', aka Dell PowerEdge 7250, with
latest BIOS, does not list the IPMI controller in the ACPI namespace, nor
does it have an ACPI SPMI table. Also note, currently shipping Dell x8xx
EM64T servers don't have these either, so DMI is the only method for
obtaining the address of the IPMI controller.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
[PATCH] fix audit_init failure path
[PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
[PATCH] sem2mutex: audit_netlink_sem
[PATCH] simplify audit_free() locking
[PATCH] Fix audit operators
[PATCH] promiscuous mode
[PATCH] Add tty to syscall audit records
[PATCH] add/remove rule update
[PATCH] audit string fields interface + consumer
[PATCH] SE Linux audit events
[PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
[PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
[PATCH] Fix IA64 success/failure indication in syscall auditing.
[PATCH] Miscellaneous bug and warning fixes
[PATCH] Capture selinux subject/object context information.
[PATCH] Exclude messages by message type
[PATCH] Collect more inode information during syscall processing.
[PATCH] Pass dentry, not just name, in fsnotify creation hooks.
[PATCH] Define new range of userspace messages.
[PATCH] Filter rule comparators
...
Fixed trivial conflict in security/selinux/hooks.c
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] New IA64 core/thread detection patch
[IA64] Increase max node count on SN platforms
[IA64] Increase max node count on SN platforms
[IA64] Increase max node count on SN platforms
[IA64] Increase max node count on SN platforms
[IA64] Tollhouse HP: IA64 arch changes
[IA64] cleanup dig_irq_init
[IA64] MCA recovery: kernel context recovery table
IA64: Use early_parm to handle mvec_name and nomca
[IA64] move patchlist and machvec into init section
[IA64] add init declaration - nolwsys
[IA64] add init declaration - gate page functions
[IA64] add init declaration to memory initialization functions
[IA64] add init declaration to cpu initialization functions
[IA64] add __init declaration to mca functions
[IA64] Ignore disabled Local SAPIC Affinity Structure in SRAT
[IA64] sn_check_intr: use ia64_get_irr()
[IA64] fix ia64 is_hugepage_only_range
IPF SDM 2.2 changes definition of PAL_LOGICAL_TO_PHYSICAL to add
proc_number=-1 to get core/thread mapping info on the running processer.
Based on this change, we had better to update existing core/thread
detection in IA64 kernel correspondingly. The attached patch implements
this change. It simplifies detection code and eliminates potential race
condition. It also runs a bit faster and has better scalability especially
when cores and threads number grows up in one package.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Node number are kept in the cpu_to_node_map which is
currently defined as u8. Change to u16 to accomodate
larger node numbers.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add support in IA64 acpi for platforms that support more than
256 nodes. Currently, ACPI is limited to 256 nodes because the
proximity domain number is 8-bits.
Long term, we expect to use ACPI3.0 to support >256 nodes.
This patch is an interim solution that works with platforms
that pass the high order bits of the proximity domain in
"reserved" fields of the ACPI tables. This code is enabled
ONLY on SN platforms.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Memory errors encountered by user applications may surface
when the CPU is running in kernel context. The current code
will not attempt recovery if the MCA surfaces in kernel
context (privilage mode 0). This patch adds a check for cases
where the user initiated the load that surfaces in kernel
interrupt code.
An example is a user process lauching a load from memory
and the data in memory had bad ECC. Before the bad data
gets to the CPU register, and interrupt comes in. The
code jumps to the IVT interrupt entry point and begins
execution in kernel context. The process of saving the
user registers (SAVE_REST) causes the bad data to be loaded
into a CPU register, triggering the MCA. The MCA surfaces in
kernel context, even though the load was initiated from
user context.
As suggested by David and Tony, this patch uses an exception
table like approach, puting the tagged recovery addresses in
a searchable table. One difference from the exception table
is that MCAs do not surface in precise places (such as with
a TLB miss), so instead of tagging specific instructions,
address ranges are registers. A single macro is used to do
the tagging, with the input parameter being the label
of the starting address and the macro being the ending
address. This limits clutter in the code.
This patch only tags one spot, the interrupt ivt entry.
Testing showed that spot to be a "heavy hitter" with
MCAs surfacing while saving user registers. Other spots
can be added as needed by adding a single macro.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.
This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
- parisc
- sparc64
- make the needlessly global function static:
- arm
- h8300
- m68k
- m68knommu
- s390
- v850
- x86_64
- add a prototype in asm/system.h:
- cris
- i386
- ia64
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm not sure of the worthiness of this idea, so please consider it an RFC.
Its key merits are:
* Reuse existing infrastructure
* Greatly tightens up the parsing of nomca
* Greatly simplifies the parsing of machvec
Addition cleanup (moving setup_mvec() to machvec.c) by Ken Chen.
Signed-Off-By: Horms <horms@verge.net.au>
Signed-Off-By: Tony Luck <tony.luck@intel.com>
ia64_mv is initialized based on platform detected or specified.
However, there is one instantiation of each platform type. We
don't expect to switch platform vector during run time. Move
those platform specific type into init section since a copy is
made into global ia64_mv at initialization.
Also move instruction patch list into init section as well.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add init declaration to bunch of patch functions and gate
page setup function.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add init declaration to variables/functions used for memory
initialization. I don't think they would clash with memory
hotplug. If they do, please yell.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add init declaration to cpu initialization functions.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Mark init related variable and functions with appropriate
__init* declaration to mca functions.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
According to the ACPI spec, the OSPM must ignore the contents of the
Processor Local APIC/SAPIC Affinity Structure in System Resource
Affinity Table (SRAT), if its enable flag is cleared. However, ia64
linux refers all of the Processor Local APIC/SAPIC Affinity Structures
in SRAT regardless of the enable flag. This is obviously against the
ACPI spec. This patch fixes this bug.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Original 2.6.9 patch and explanation from somewhere within HP via
bugzilla...
ia64 stores a success/failure code in r10, and the return value (normal
return, or *positive* errno) in r8. The patch also sets the exit code to
negative errno if it's a failure result for consistency with other
architectures.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
A pte may be zapped by the swapper, exiting process, unmapping or page
migration while the accessed or dirty bit handers are about to run. In that
case the accessed bit or dirty is set on an zeroed pte which leads the VM to
conclude that this is a swap pte. This may lead to
- Messages from the vm like
swap_free: Bad swap file entry 4000000000000000
- Processes being aborted
swap_dup: Bad swap file entry 4000000000000000
VM: killing process ....
Page migration is particular suitable for the creation of this race since
it needs to remove and restore page table entries.
The fix here is to check for the present bit and simply not update
the pte if the page is not present anymore. If the page is not present
then the fault handler should run next which will take care of the problem
by bringing the page back and then mark the page dirty or move it onto the
active list.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When there is no bus check, the return code should be failure, not success.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
This stuff is all in the generic ia64 kernel, and the new initcall error
reporting complains about them.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The MCA recovery messages are currently KERN_DEBUG,
so they don't show up in /var/log/messages (by default).
Increase the severity to KERN_ERR, for the initial
message (and also add the physical address to this
message). Leave the successful isolation message as
KERN_DEBUG, but increase the severity when isolation
fails to KERN_CRIT.
[Russ' patch made these all KERN_CRIT]
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Allow sysadmin to disable all warnings about userland apps
making unaligned accesses by using:
# echo 1 > /proc/sys/kernel/ignore-unaligned-usertrap
Rather than having to use prctl on a process by process basis.
Default behaivour leaves the warnings enabled.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
beautify coding style for zeroing end of fsyscall_table entries.
Remove misleading __NR_syscall_last and add more comments.
Drop (now unneeded) "guard against failure to increase NR_syscalls"
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/kernel/unaligned.c erroneously marked die_if_kernel()
with a "noreturn" attribute ... which is silly (it returns whenever
the argument regs say that the fault happened in user mode, as one
might expect given the "if_kernel" part of its name!). Thanks to
Alan and Gareth for pointing this out.
Signed-off-by: Tony Luck <tony.luck@intel.com>
unaligned_access does fetch cr.ipsr, then calls
dispatch_unaligned_handler, but dispatch_unaligned_handler fetches
cr.ipsr again, so delete the first one.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Minor updates to earlier patch.
- Added to documentation to add ia64 as well.
- Minor clarification on how to use disabled cpus
- used plain max instead of max_t per Andew Morton.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It appears that if auditing is enabled, the kernel fails to
check for pending signals before returning to user mode.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Trivial port of this feature from i386
As it stands, panic_on_oops but does nothing on ia64
Signed-Off-By: Horms <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The original ia64 udelay() was simple, but flawed for platforms without
synchronized ITCs: a preemption and migration to another CPU during the
while-loop likely resulted in too-early termination or very, very
lengthy looping.
The first fix (now in 2.6.15) broke the delay loop into smaller,
non-preemptible chunks, reenabling preemption between the chunks. This
fix is flawed in that the total udelay is computed to be the sum of just
the non-premptible while-loop pieces, i.e., not counting the time spent
in the interim preemptible periods. If an interrupt or a migration
occurs during one of these interim periods, then that time is invisible
and only serves to lengthen the effective udelay().
This new fix backs out the current flawed fix and returns to a simple
udelay(), fully preemptible and interruptible. It implements two simple
alternative udelay() routines: one a default generic version that uses
ia64_get_itc(), and the other an sn-specific version that uses that
platform's RTC.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove symbol exports from ia64_ksyms.c that are already exported in
lib/string.c.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Have a facility to account for potentially hot-pluggable CPUs. ACPI doesnt
give a determinstic method to find hot-pluggable CPUs. Hence we use 2 methods
to assist.
- BIOS can mark potentially hot-pluggable CPUs as disabled in the MADT tables.
- User can specify the number of hot-pluggable CPUs via parameter
additional_cpus=X
The option is enabled only if ACPI_CONFIG_HOTPLUG_CPU=y which enables the
physical hotplug option. Without which user can still use logical onlining
and offlining of CPUs by enabling CONFIG_HOTPLUG_CPU=y
Adds more bits to cpu_possible_map for potentially hot-pluggable cpus.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Do not set cpu_possible_map for NR_CPUS when ACPI_CONFIG_HOTPLUG_CPU is set.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
MCA driver can cause panic if kernel gets a state info with no minstate.
This patch adds minstate validation before handling it.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Registers system call for the ia64 architecture.
Reserves space for ppoll and pselect, and adds unshare at system
call number 1296.
Signed-off-by: Janak Desai <janak@us.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
No platform in the community tree uses PLATFORM_MCA_HANDLERS, remove
the references.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Update the comm field on the MCA handler for user tasks as well as for
verified kernel tasks. This helps to identify the task that was
running when the MCA occurred.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Print a message identifying the monarch MCA handler. Print a summary
of the status of the slave MCA cpus.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There were two problems with enabling the PRINTK_TIME config
option:
1) The first calls to printk() occur before per-cpu data virtual
address is pinned into the TLB, so sched_clock() can fault.
2) sched_clock() is based on ar.itc, which may not be synchronized
across cpus.
Ken Chen started this patch, Tony Luck tinkered with it, and Jes
Sorensen perfected it.
Signed-off-by: Tony Luck <tony.luck@intel.com>
The check of (end != cp) after memparse in efi.c looks wrong to me.
The result is that we can't use mem= and max_addr= kernel parameter at
the same time.
The following patch removed the check just like other arches do.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The patch implements cpu topology exportation by sysfs.
Items (attributes) are similar to /proc/cpuinfo.
1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
represent the physical package id of cpu X;
2) /sys/devices/system/cpu/cpuX/topology/core_id:
represent the cpu core id to cpu X;
3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
represent the thread siblings to cpu X in the same core;
4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
represent the thread siblings to cpu X in the same physical package;
To implement it in an architecture-neutral way, a new source file,
driver/base/topology.c, is to export the 5 attributes.
If one architecture wants to support this feature, it just needs to
implement 4 defines, typically in file include/asm-XXX/topology.h.
The 4 defines are:
#define topology_physical_package_id(cpu)
#define topology_core_id(cpu)
#define topology_thread_siblings(cpu)
#define topology_core_siblings(cpu)
The type of **_id is int.
The type of siblings is cpumask_t.
To be consistent on all architectures, the 4 attributes should have
deafult values if their values are unavailable. Below is the rule.
1) physical_package_id: If cpu has no physical package id, -1 is the
default value.
2) core_id: If cpu doesn't support multi-core, its core id is 0.
3) thread_siblings: Just include itself, if the cpu doesn't support
HT/multi-thread.
4) core_siblings: Just include itself, if the cpu doesn't support
multi-core and HT/Multi-thread.
So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
If an attribute isn't defined on an architecture, it won't be exported.
Thank Nathan, Greg, Andi, Paul and Venki.
The patch provides defines for i386/x86_64/ia64.
Signed-off-by: Zhang, Yanmin <yanmin.zhang@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If SAL_CACHE_FLUSH drops interrupts, complain about it and fall back to
using PAL_CACHE_FLUSH instead.
This is to work around a defect in HP rx5670 firmware: when an interrupt
occurs during SAL_CACHE_FLUSH, SAL drops the interrupt but leaves it marked
"in-service", which leaves the interrupt (and others of equal or lower
priority) masked.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The only user of the MCA/INIT sigdelayed code (SGI's I/O probing) has
moved from the kernel into SAL. Delete the MCA/INIT sigdelayed code.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>