The cpu time spent by the idle process actually doing something is
currently accounted as idle time. This is plain wrong, the architectures
that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
time spent doing nothing and the time spent by idle doing work. The first
is accounted with account_idle_time and the second with account_system_time.
The architectures that use the account_xxx_time interface directly and not
the account_xxx_ticks interface now need to do the check for the idle
process in their arch code. In particular to improve the system vs true
idle time accounting the arch code needs to measure the true idle time
instead of just testing for the idle process.
To improve the tick based accounting as well we would need an architecture
primitive that can tell us if the pt_regs of the interrupted context
points to the magic instruction that halts the cpu.
In addition idle time is no more added to the stime of the idle process.
This field now contains the system time of the idle process as it should
be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
every tick that occurs while idle is running will be accounted as idle
time.
This patch contains the necessary common code changes to be able to
distinguish idle system time and true idle time. The architectures with
support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The utimescaled / stimescaled fields in the task structure and the
global cpustat should be set on all architectures. On s390 the calls
to account_user_time_scaled and account_system_time_scaled never have
been added. In addition system time that is accounted as guest time
to the user time of a process is accounted to the scaled system time
instead of the scaled user time.
To fix the bugs and to prevent future forgetfulness this patch merges
account_system_time_scaled into account_system_time and
account_user_time_scaled into account_user_time.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Michael Neuling <mikey@neuling.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
External driver files should not include any private acpica headers.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
ACPI_SIG_DSDT is acpica internal used only.
call acpi_get_table to avoid using ACPI_SIG_DSDT.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
acpi_ns_print_node_pathname is internal used only
use acpi_get_name instead to get node fullname
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
Impact: New APIs
The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask. Part of removing cpumasks from
the stack.
We can also use the new for_each_cpu_and() to avoid a temporary cpumask,
and a gratuitous test in sn_topology_show.
(Includes fix from KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
Phonet: keep TX queue disabled when the device is off
SCHED: netem: Correct documentation comment in code.
netfilter: update rwlock initialization for nat_table
netlabel: Compiler warning and NULL pointer dereference fix
e1000e: fix double release of mutex
IA64: HP_SIMETH needs to depend upon NET
netpoll: fix race on poll_list resulting in garbage entry
ipv6: silence log messages for locally generated multicast
sungem: improve ethtool output with internal pcs and serdes
tcp: tcp_vegas cong avoid fix
sungem: Make PCS PHY support partially work again.
Impact: change existing irq_chip API
Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.
Fortunately, not widely used code, but hits a few architectures.
Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly. Ingo, does this break anything?
(Folded in fix from KOSAKI Motohiro)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Impact: change calling convention of existing cpumask APIs
Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.
These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
Impact: cleanup
Each SMP arch defines these themselves. Move them to a central
location.
Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
Those archs simply have phys_cpu_present_map replaced everywhere.
3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
so I just manipulate them both in sync.
4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
declarations.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
With the introduction of the generic affinity autoselector,
irq_select_affinity(), IRQs are now being retargetted,
using a default mask, via the request_irq() path.
This results in all IRQs targetted at CPU 0.
SN Altix assigns affinity in the SN PROM, and does not
expect that to be changed as part of request_irq().
Set the IRQ_AFFINITY_SET flag to prevent
request_irq() from resetting affinity.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The generic_defconfig has three section mismatches. This clears
arch_unregister_cpu()
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The generic_defconfig has three section mismatches. This clears up
sn_check_wars().
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The AUTOFS=y and AUTOFS4=y causes problems with some distros versions of
automount. I turned both of those to =m and then followed the default
prompts for everything else. I did notice that CONFIG_PNP_DEBUG got
changed to CONFIG_PNP_DEBUG_MESSAGES and the default was a =y so I turned
that back to a =n.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As noted by Akinobu Mita in patch b1fceac2b9,
alloc_bootmem and related functions never return NULL and always return a
zeroed region of memory. Thus a NULL test or memset after calls to these
functions is unnecessary.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
CC arch/ia64/kernel/asm-offsets.s
In file included from include/linux/bitops.h:17,
from include/linux/kernel.h:15,
from include/linux/sched.h:52,
from arch/ia64/kernel/asm-offsets.c:9:
arch/ia64/include/asm/bitops.h: In function 'set_bit':
arch/ia64/include/asm/bitops.h:47: error: implicit declaration of function 'BUILD_BUG_ON'
Obvious inclusion of kernel.h doesn't fix it, because of circular dependencies
involving fls.h and log2(). Fixing the latter requires some serious header surgery,
it seems, so just remove BUILD_BUG_ON for now.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Simply replace netdev->priv with netdev_priv().
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: MMU: avoid creation of unreachable pages in the shadow
KVM: ppc: stop leaking host memory on VM exit
KVM: MMU: fix sync of ptes addressed at owner pagetable
KVM: ia64: Fix: Use correct calling convention for PAL_VPS_RESUME_HANDLER
KVM: ia64: Fix incorrect kbuild CFLAGS override
KVM: VMX: Fix interrupt loss during race with NMI
KVM: s390: Fix problem state handling in guest sigp handler
All architectures now use the generic compat_sys_ptrace, as should every
new architecture that needs 32bit compat (if we'll ever get another).
Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also
kill a comment about __ARCH_SYS_PTRACE that was added after
__ARCH_SYS_PTRACE was already gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathieu Desnoyers reported this build failure on powerpc:
kernel/sched.c: In function 'sd_init_NODE':
kernel/sched.c:7319: error: non-static initialization of a flexible array member
kernel/sched.c:7319: error: (near initialization for '(anonymous)')
this happens because .span changed to cpumask_var_t, hence
the static CPU_MASK_NONE initializers in the SD_*_INIT
templates are not type-correct anymore.
Remove them, as they default to empty anyway.
Also remove them from IA64, MIPS and SH.
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
PAL_VPS_RESUME_HANDLER should use r26 to hold vac fields according to SDM.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Use CFLAGS_vcpu.o, not EXTRA_CFLAGS, to provide fixed register information
to the compiler.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
fix xen_get_eflags. It doesn't take any argument.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
pv_cpu_ops.getreg(_IA64_REG_IP) returned constant.
But the returned ip valued should be the one in the caller, not of the callee.
This patch fixes that.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/kernel/pci-dma.c only needs to include iommu once.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Using printk from MCA/INIT context is unsafe since it can cause deadlock.
The ia64_mca_modify_original_stack is called from both of mca handler and
init handler, so it should use mprintk instead of printk.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Itanium processors can handle some misaligned data accesses. They
also provide a mode where all such accesses are forced to trap. The
kernel was schizophrenic about use of this mode:
* Base kernel code ran in permissive mode where the only traps
generated were from those cases that the h/w could not handle.
* Interrupt, syscall and trap code ran in strict mode where all
unaligned accesses caused traps to the 0x5a00 unaligned reference
vector.
Use strict alignment checking throughout the kernel, but make
sure that we continue to let user mode use more relaxed mode
as the default.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap current->cred and a few other accessors to hide their actual
implementation.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.
Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.
With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
In the case of !CONFIG_SMP, raw_spinlock_t is empty and the spinlock functions
don't build. Fix by defining spinlock functions for the uniprocessor case.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Before a vcpu blocks, it should switch to the guest signal mask to allow
signals to unblock it.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Impact: cleanup, change .config option name
We had this ugly config name for a long time for hysteric raisons.
Rename it to a saner name.
We still cannot get rid of it completely, until /proc/<pid>/stack
usage replaces WCHAN usage for good.
We'll be able to do that in the v2.6.29/v2.6.30 timeframe.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
IA64 kdump kernel failed to initialize /proc/vmcore in 2.6.28-rc2.
A bug was introduced in this patch commit:
d9a9855d0b
always reserve elfcore header memory in crash kernel
The problem was that the call to reserve_elfcorehdr() should be placed
in CONFIG_CRASH_DUMP rather than in CONFIG_CRASH_KERNEL, which does
not exist.
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Simon Hormon <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes a regression introduced by 2c6e6db41f
"Minimize per_cpu reservations." That patch incorrectly used information about
what CPUs are possible that was not yet initialized by ACPI. The end result
was that per_cpu structures for offline CPUs were not initialized causing a
NULL pointer reference.
Since we cannot do the full acpi_boot_init() call any earlier, the simplest
fix is to just parse the MADT for SAPIC entries early to find the CPU
info. This should also allow for some cleanup of the code added by the
"Minimize per_cpu reservations". This patch just fixes the regressions, the
cleanup will come in a later patch.
Signed-off-by: Doug Chapman <doug.chapman@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
CC: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
No functional change, just reorder some config options and update
the "Power management and ACPI" label to match the defacto x86
standard.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The block layer dropped the virtual merge feature
(b8b3e16cfe). BIO_VMERGE_BOUNDARY
definition is meaningless now (For IA64, BIO_VMERGE_BOUNDARY has been
meaningless for a long time since IA64 disables the virtual merge
feature).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove the swiotlb prototypes from the architecture code and use the
common header file instead.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
makedumpfile[1] cannot run on ia64 discontigmem kernel, because the member
node_mem_map of struct pgdat_list has invalid value. This patch fixes it.
node_start_pfn shows the start pfn of each node, and node_mem_map should
point 'struct page' of each node's node_start_pfn. On my machine, node0's
node_start_pfn shows 0x400 and its node_mem_map points 0xa0007fffbf000000.
This address is the same as vmem_map, so the node_mem_map points 'struct
page' of pfn 0, even if its node_start_pfn shows 0x400.
The cause is due to the round down of min_pfn in count_node_pages() and
node0's node_mem_map points 'struct page' of inactive pfn (0x0). This
patch fixes it.
makedumpfile[1]: dump filtering command
https://sourceforge.net/projects/makedumpfile/
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Cc: Bernhard Walle <bwalle@suse.de>
Cc: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add the error_recovery_info field to the SAL section header,
as defined in the SAL Spec.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add partition id, coherence id, and region size to UV to
make life simpler for drivers shared between sn2 & uv.
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.
So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Common halt logic was changed by x86 and did not update ia64. This patch
updates halt for ia64.
Fixes a regression causing guests to hang with more than 2 vcpus.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Every call of kvm_set_irq() should offer an irq_source_id, which is
allocated by kvm_request_irq_source_id(). Based on irq_source_id, we
identify the irq source and implement logical OR for shared level
interrupts.
The allocated irq_source_id can be freed by kvm_free_irq_source_id().
Currently, we support at most sizeof(unsigned long) different irq sources.
[Amit: - rebase to kvm.git HEAD
- move definition of KVM_USERSPACE_IRQ_SOURCE_ID to common file
- move kvm_request_irq_source_id to the update_irq ioctl]
[Xiantao: - Add kvm/ia64 stuff and make it work for kvm/ia64 guests]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
On Thu, Oct 23, 2008 at 04:09:52PM -0700, Alexander Beregalov wrote:
> arch/x86/kernel/built-in.o: In function `iommu_setup':
> pci-dma.c:(.init.text+0x36ad): undefined reference to `forbid_dac'
> pci-dma.c:(.init.text+0x36cc): undefined reference to `forbid_dac'
> pci-dma.c:(.init.text+0x3711): undefined reference to `forbid_dac
This patch partially reverts a patch to add IOMMU support to ia64. The
forbid_dac variable was incorrectly moved to quirks.c, which isn't built
when PCI is disabled.
Tested-by: "Alexander Beregalov" <a.beregalov@gmail.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
hrtimers: add missing docbook comments to struct hrtimer
hrtimers: simplify hrtimer_peek_ahead_timers()
hrtimers: fix docbook comments
DECLARE_PER_CPU needs linux/percpu.h
hrtimers: fix typo
rangetimers: fix the bug reported by Ingo for real
rangetimer: fix BUG_ON reported by Ingo
rangetimer: fix x86 build failure for the !HRTIMERS case
select: fix alpha OSF wrapper
select: fix alpha OSF wrapper
hrtimer: peek at the timer queue just before going idle
hrtimer: make the futex() system call use the per process slack value
hrtimer: make the nanosleep() syscall use the per process slack
hrtimer: fix signed/unsigned bug in slack estimator
hrtimer: show the timer ranges in /proc/timer_list
hrtimer: incorporate feedback from Peter Zijlstra
hrtimer: add a hrtimer_start_range() function
hrtimer: another build fix
hrtimer: fix build bug found by Ingo
hrtimer: make select() and poll() use the hrtimer range feature
...
* 'x86/um-header' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
x86: canonicalize remaining header guards
x86: drop double underscores from header guards
x86: Fix ASM_X86__ header guards
x86, um: get rid of uml-config.h
x86, um: get rid of arch/um/Kconfig.arch
x86, um: get rid of arch/um/os symlink
x86, um: get rid of excessive includes of uml-config.h
x86, um: get rid of header symlinks
x86, um: merge Kconfig.i386 and Kconfig.x86_64
x86, um: get rid of sysdep symlink
x86, um: trim the junk from uml ptrace-*.h
x86, um: take vm-flags.h to sysdep
x86, um: get rid of uml asm/arch
x86, um: get rid of uml highmem.h
x86, um: get rid of uml unistd.h
x86, um: get rid of system.h -> system.h include
x86, um: uml atomic.h is not needed anymore
x86, um: untangle uml ldt.h
x86, um: get rid of more uml asm/arch uses
x86, um: remove dead header (uml module-generic.h; never used these days)
...
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (123 commits)
dock: make dock driver not a module
ACPI: fix ia64 build warning
ACPI: hack around sysfs warning with link order
ACPI suspend: fix build warning when CONFIG_ACPI_SLEEP=n
intel_menlo: fix build warning
panasonic-laptop: fix build
ACPICA: Update version to 20080926
ACPICA: Add support for zero-length buffer-to-string conversions
ACPICA: New: Validation for predefined ACPI methods/objects
ACPICA: Fix for implicit return compatibility
ACPICA: Fixed a couple memory leaks associated with "implicit return"
ACPICA: Optimize buffer allocation procedure
ACPICA: Fix possible memory leak, error exit path
ACPICA: Fix fault after mem allocation failure in AML parser
ACPICA: Remove unused ACPI register bit definition
ACPICA: Update version to 20080829
ACPICA: Fix possible memory leak in acpi_ns_get_external_pathname
ACPICA: Cleanup for internal Reference Object
ACPICA: Update comments - no functional changes
ACPICA: Update for Reference ACPI_OPERAND_OBJECT
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile: (21 commits)
OProfile: Fix buffer synchronization for IBS
oprofile: hotplug cpu fix
oprofile: fixing whitespaces in arch/x86/oprofile/*
oprofile: fixing whitespaces in arch/x86/oprofile/*
oprofile: fixing whitespaces in drivers/oprofile/*
x86/oprofile: add the logic for enabling additional IBS bits
x86/oprofile: reordering functions in nmi_int.c
x86/oprofile: removing unused function parameter in add_ibs_begin()
oprofile: more whitespace fixes
oprofile: whitespace fixes
OProfile: Rename IBS sysfs dir into "ibs_op"
OProfile: Rework string handling in setup_ibs_files()
OProfile: Rework oprofile_add_ibs_sample() function
oprofile: discover counters for op ppro too
oprofile: Implement Intel architectural perfmon support
oprofile: Don't report Nehalem as core_2
oprofile: drop const in num counters field
Revert "Oprofile Multiplexing Patch"
x86, oprofile: BUG: using smp_processor_id() in preemptible code
x86/oprofile: fix on_each_cpu build error
...
Manually fixed trivial conflicts in
drivers/oprofile/{cpu_buffer.c,event_buffer.h}
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: (41 commits)
[IA64] Fix annoying IA64_TR_ALLOC_MAX message.
[IA64] kill sys32_pipe
[IA64] remove sys32_pause
[IA64] Add Variable Page Size and IA64 Support in Intel IOMMU
ia64/pv_ops: paravirtualized instruction checker.
ia64/xen: a recipe for using xen/ia64 with pv_ops.
ia64/pv_ops: update Kconfig for paravirtualized guest and xen.
ia64/xen: preliminary support for save/restore.
ia64/xen: define xen machine vector for domU.
ia64/pv_ops/xen: implement xen pv_time_ops.
ia64/pv_ops/xen: implement xen pv_irq_ops.
ia64/pv_ops/xen: define the nubmer of irqs which xen needs.
ia64/pv_ops/xen: implement xen pv_iosapic_ops.
ia64/pv_ops/xen: paravirtualize entry.S for ia64/xen.
ia64/pv_ops/xen: paravirtualize ivt.S for xen.
ia64/pv_ops/xen: paravirtualize DO_SAVE_MIN for xen.
ia64/pv_ops/xen: define xen paravirtualized instructions for hand written assembly code
ia64/pv_ops/xen: define xen pv_cpu_ops.
ia64/pv_ops/xen: define xen pv_init_ops for various xen initialization.
ia64/pv_ops/xen: elf note based xen startup.
...
arch/ia64/sn/kernel/io_acpi_init.c:361: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘long long unsigned int’
Signed-off-by: Len Brown <len.brown@intel.com>
This adds the ability to mmap legacy IO space to the legacy_io files
in sysfs on platforms that support it. This will allow to clean up
X to use this instead of /dev/mem for legacy IO accesses such as
those performed by Int10.
While at it I moved pci_create/remove_legacy_files() to pci-sysfs.c
where I think they belong, thus making more things statis in there
and cleaned up some spurrious prototypes in the ia64 pci.h file
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
elfcore header memory needs to be reserved in a crash kernel. This means
that the relevant code should be protected by CONFIG_CRASH_DUMP rather
than CONFIG_PROC_VMCORE.
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The usage of elfcorehdr_addr has changed recently such that being set to
ELFCORE_ADDR_MAX is used by is_kdump_kernel() to indicate if the code is
executing in a kernel executed as a crash kernel.
However, arch/ia64/kernel/setup.c:reserve_elfcorehdr will rest
elfcorehdr_addr to ELFCORE_ADDR_MAX on error, which means any subsequent
calls to is_kdump_kernel() will return 0, even though they should return
1.
Ok, at this point in time there are no subsequent calls, but I think its
fair to say that there is ample scope for error or at the very least
confusion.
This patch add an extra state, ELFCORE_ADDR_ERR, which indicates that
elfcorehdr_addr was passed on the command line, and thus execution is
taking place in a crashdump kernel, but vmcore can't be used for some
reason. This is tested for using is_vmcore_usable() and set using
vmcore_unusable(). A subsequent patch makes use of this new code.
To summarise, the states that elfcorehdr_addr can now be in are as follows:
ELFCORE_ADDR_MAX: not a crashdump kernel
ELFCORE_ADDR_ERR: crashdump kernel but vmcore is unusable
any other value: crash dump kernel and vmcore is usable
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o Make use of is_kdump_kernel() rather than checking elfcorehdr_addr directly.
o Remove CONFIG_CRASH_DUMP as is_kdump_kernel() is safe to call anywhere
o Remove CONFIG_PROC_FS as it is bogus, the check
should occur regardless of if CONFIG_PROC_FS is set or not.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o elfcorehdr_addr is used by not only the code under CONFIG_PROC_VMCORE
but also by the code which is not inside CONFIG_PROC_VMCORE. For
example, is_kdump_kernel() is used by powerpc code to determine if
kernel is booting after a panic then use previous kernel's TCE table.
So even if CONFIG_PROC_VMCORE is not set in second kernel, one should be
able to correctly determine that we are booting after a panic and setup
calgary iommu accordingly.
o So remove the assumption that elfcorehdr_addr is under
CONFIG_PROC_VMCORE.
o Move definition of elfcorehdr_addr to arch dependent crash files.
(Unfortunately crash dump does not have an arch independent file
otherwise that would have been the best place).
o kexec.c is not the right place as one can Have CRASH_DUMP enabled in
second kernel without KEXEC being enabled.
o I don't see sh setup code parsing the command line for
elfcorehdr_addr. I am wondering how does vmcore interface work on sh.
Anyway, I am atleast defining elfcoredhr_addr so that compilation is not
broken on sh.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch implements a new freezer subsystem in the control groups
framework. It provides a way to stop and resume execution of all tasks in
a cgroup by writing in the cgroup filesystem.
The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks
in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup. Reading will return the current state.
* Examples of usage :
# mkdir /containers/freezer
# mount -t cgroup -ofreezer freezer /containers
# mkdir /containers/0
# echo $some_pid > /containers/0/tasks
to get status of the freezer subsystem :
# cat /containers/0/freezer.state
RUNNING
to freeze all tasks in the container :
# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
FREEZING
# cat /containers/0/freezer.state
FROZEN
to unfreeze all tasks in the container :
# echo RUNNING > /containers/0/freezer.state
# cat /containers/0/freezer.state
RUNNING
This is the basic mechanism which should do the right thing for user space
task in a simple scenario.
It's important to note that freezing can be incomplete. In that case we
return EBUSY. This means that some tasks in the cgroup are busy doing
something that prevents us from completely freezing the cgroup at this
time. After EBUSY, the cgroup will remain partially frozen -- reflected
by freezer.state reporting "FREEZING" when read. The state will remain
"FREEZING" until one of these things happens:
1) Userspace cancels the freezing operation by writing "RUNNING" to
the freezer.state file
2) Userspace retries the freezing operation by writing "FROZEN" to
the freezer.state file (writing "FREEZING" is not legal
and returns EIO)
3) The tasks that blocked the cgroup from entering the "FROZEN"
state disappear from the cgroup's set of tasks.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is nothing architecture specific about remove_memory().
remove_memory() function is common for all architectures which support
hotplug memory remove. Instead of duplicating it in every architecture,
collapse them into arch neutral function.
[akpm@linux-foundation.org: fix the export]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Madison cpus support 64 TR registers. Increase IA64_TR_ALLOC_MAX
to 64. Also fixup the messages that get printed when this limit
is exceeded. Repeating for every cpu is too noisy.
Signed-off-by: Tony Luck <tony.luck@intel.com>
It's just a duplicate of the generic sys_pipe that still lacks the
recently added error handling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It's just a duplicate of the native sys_pause, which we can use after
defining __ARCH_WANT_SYS_PAUSE.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The patch contains Intel IOMMU IA64 specific code. It defines new
machvec dig_vtd, hooks for IOMMU, DMAR table detection, cache line flush
function, etc.
For a generic kernel with CONFIG_DMAR=y, if Intel IOMMU is detected,
dig_vtd is used for machinve vector. Otherwise, kernel falls back to
dig machine vector. Kernel parameter "machvec=dig" or "intel_iommu=off"
can be used to force kernel to boot dig machine vector.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch implements a checker to detect instructions which
should be paravirtualized instead of direct writing raw instruction.
This patch does rough check so that it doesn't fully cover all cases,
but it can detects most cases of paravirtualization breakage of hand
written assembly codes.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
preliminary support for save/restore.
Although Save/restore isn't fully working yet, this patch is necessary
to compile.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
define arch/ia64/include/asm/xen/irq.h to define the number of
irqs which xen needs.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize entry.S for ia64/xen by multi compile.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch implements xen version of pv_init_ops to do various
xen initialization.
This patch also includes ia64 counter part of x86 xen early printk support
patches.
Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch enables elf note based xen startup for IA-64, which gives the
kernel an early hint for running on xen like x86 case.
In order to avoid the multi entry point, presumably extending booting
protocol(i.e. extending struct ia64_boot_param) would be necessary.
It probably means that elilo also needs modification.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Xen paravirtualizes interrupt as event channel.
This patch defines arch specific part of xen event channel.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Xen implements grant tables which is for sharing pages with
guest domains.
This patch implements arch specific part of grant table initialization.
and xen_alloc_vm_area()/xen_free_vm_area() which are helper functions
for xen grant table.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On ia64/xen, pointer arguments for hypercall is passed
by pseudo physical address(guest physical address.)
So such hypercalls needs address conversion functions.
This patch implements concrete conversion functions for
such hypercalls.
Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On ia64/xen, pointer argument for the hypercall is passed
by pseudo physical address (guest physical address.)
So it is necessary to convert virtual address into pseudo physical
address right before issuing hypercall. The frame work is called
xencomm. This patch implements arch specific part.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Create include/asm-ia64/pvclock-abi.h to compile which contains
the same definitions of include/asm-x86/pvclock-abi.h because ia64/xen
uses same structure.
Hopefully include/asm-x86/pvclock-abi.h would be moved to somewhere
more generic.
Another approach is to include include/asm-x86/pvclock-abi.h
from include/asm-ia64/pvclock-abi.h. But this would break
if/when x86 header files are moved under arch/x86.
So for now, same definitions are duplicated as suggested by Tony.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
import arch/ia64/include/asm/xen/interface.h to introduce
definitions necessary for ia64/xen hypercalls.
They are basic structures to communicate with xen hypervisor and
will be used later.
Cc: Robin Holt <holt@sgi.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Xenlinux/ia64 needs to reserve one more region passed from xen hypervisor
as start info.
Cc: Robin Holt <holt@sgi.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
define sync bitops which is necessary for ia64/xen.
This bit operation is used to communicate with VMM or other guest kernel
Even when this kernel is built for UP, VMM might be SMP so that those operation
must always use atomic operation.
Cc: Robin Holt <holt@sgi.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
reserve "break" numbers used for xen hypercalls to avoid
reuse for something else.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
eliminate the function declaration ia64_cpu_local_tick() in
process.c by defining in arch/ia64/include/asm/timex.h
The same function will be used in a different .c file later.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The macro get_irq_chip() is defined in linux/include/linux/irq.h
which cause name conflict with one in linux/arch/ia64/include/asm/paravirt.h.
rename the latter to __get_irq_chip().
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When CONFIG_SMP=n, three instruction in ivt.S were missed to paravirtualize.
paravirtualize them.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (134 commits)
KVM: ia64: Add intel iommu support for guests.
KVM: ia64: add directed mmio range support for kvm guests
KVM: ia64: Make pmt table be able to hold physical mmio entries.
KVM: Move irqchip_in_kernel() from ioapic.h to irq.h
KVM: Separate irq ack notification out of arch/x86/kvm/irq.c
KVM: Change is_mmio_pfn to kvm_is_mmio_pfn, and make it common for all archs
KVM: Move device assignment logic to common code
KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/
KVM: VMX: enable invlpg exiting if EPT is disabled
KVM: x86: Silence various LAPIC-related host kernel messages
KVM: Device Assignment: Map mmio pages into VT-d page table
KVM: PIC: enhance IPI avoidance
KVM: MMU: add "oos_shadow" parameter to disable oos
KVM: MMU: speed up mmu_unsync_walk
KVM: MMU: out of sync shadow core
KVM: MMU: mmu_convert_notrap helper
KVM: MMU: awareness of new kvm_mmu_zap_page behaviour
KVM: MMU: mmu_parent_walk
KVM: x86: trap invlpg
KVM: MMU: sync roots on mmu reload
...
Nothing arch specific in get/settimeofday. The details of the timeval
conversion varied a little from arch to arch, but all with the same
results.
Also add an extern declaration for sys_tz to linux/time.h because externs
in .c files are fowned upon. I'll kill the externs in various other files
in a sparate patch.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ]
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct stat / compat_stat is the same on all architectures, so
cp_compat_stat should be, too.
Turns out it is, except that various architectures have slightly and some
high2lowuid/high2lowgid or the direct assignment instead of the
SET_UID/SET_GID that expands to the correct one anyway.
This patch replaces the arch-specific cp_compat_stat implementations with
a common one based on the x86-64 one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ]
Acked-by: Kyle McMartin <kyle@mcmartin.ca> [ parisc bits ]
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using "def_bool n" is pointless, simply using bool here appears more
appropriate.
Further, retaining such options that don't have a prompt and aren't
selected by anything seems also at least questionable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The SET_PERSONALITY macro is always called with a second argument of 0.
Remove the ibcs argument and the various tests to set the PER_SVR4
personality.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With intel iommu hardware, we can assign devices to kvm/ia64 guests.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Using vt-d, kvm guests can be assigned physcial devices, so
this patch introduce a new mmio type (directed mmio)
to handle its mmio access.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Don't try to do put_page once the entries are mmio.
Set the tag to indicate the mmio space for vmm setting
TLB's memory attribute.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Moving irq ack notification logic as common, and make
it shared with ia64 side.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
In Tukwila processor, VT-i has been enhanced in its
implementation, it is often called VT-i2 techonology.
With VTi-2 support, virtulization performance should be
improved. In this patch, we added the related stuff to
support kvm/ia64 for Tukwila processors.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
An uniform entry kvm_vps_entry is added for
vps_sync_write/read, vps_resume_handler/guest,
and branches to differnt PAL service according to the offset.
Singed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Two ioctl arch functions are added to set vcpu's smp state.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm/ia64 uses the virtio drivers to optimize its I/O subsytem.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Before enabling notify_acked_irq for ia64, leave the related APIs as
nop-op first.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* 'x86-v28-for-linus-phase3-B' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits)
AMD IOMMU: use iommu_device_max_index, fix
AMD IOMMU: use iommu_device_max_index
x86: add PCI IDs for AMD Barcelona PCI devices
x86/iommu: use __GFP_ZERO instead of memset for GART
x86/iommu: convert GART need_flush to bool
x86/iommu: make GART driver checkpatch clean
x86 gart: remove unnecessary initialization
x86: restore old GART alloc_coherent behavior
revert "x86: make GART to respect device's dma_mask about virtual mappings"
x86: export pci-nommu's alloc_coherent
iommu: remove fullflush and nofullflush in IOMMU generic option
x86: remove set_bit_string()
iommu: export iommu_area_reserve helper function
AMD IOMMU: use coherent_dma_mask in alloc_coherent
add AMD IOMMU tree to MAINTAINERS file
AMD IOMMU: use cmd_buf_size when freeing the command buffer
AMD IOMMU: calculate IVHD size with a function
AMD IOMMU: remove unnecessary cast to u64 in the init code
AMD IOMMU: free domain bitmap with its allocation order
AMD IOMMU: simplify dma_mask_to_pages
...
As of version 2.0, ACPI can return 64-bit integers. The current
acpi_evaluate_integer only supports 64-bit integers on 64-bit platforms.
Change the argument to take a pointer to an acpi_integer so we support
64-bit integers on all platforms.
lenb: replaced use of "acpi_integer" with "unsigned long long"
lenb: fixed bug in acpi_thermal_trips_update()
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Initial fix for making sure that we can access percpu variables
in all C code (commit: 10617bbe84)
inadvertantly allocated the memory in the "percpu" section of
the vmlinux ELF executable. This confused kexec/dump.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently a SIGTRAP can denote any one of below reasons.
- Breakpoint hit
- H/W debug register hit
- Single step
- Signal sent through kill() or rasie()
Architectures like powerpc/parisc provides infrastructure to demultiplex
SIGTRAP signal by passing down the information for receiving SIGTRAP through
si_code of siginfot_t structure. Here is an attempt is generalise this
infrastructure by extending it to x86 and x86_64 archs.
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: akpm@linux-foundation.org
Cc: paulus@samba.org
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently a memory segment in memory map with attribute of EFI_MEMORY_UC
is denoted as "System RAM" in /proc/iomem, while memory of attribute
(EFI_MEMORY_WB|EFI_MEMORY_UC) is also labeled the same.
The kexec utility then includes uncached memory as part of vmcore. The
kdump kernel MCA'ed when it tries to save the vmcore to a disk. A normal
"cached" access may cause MCAs.
This patch would label memory with attribute of EFI_MEMORY_UC only as
"Uncached RAM" so that kexec would know not to include it in the vmcore.
I will submit a separate kexec-tools patch to the kexec list.
Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Peter Chubb reported that commit 3463a93def
(Update check_sal_cache_flush to use platform_send_ipi()) broke
Ski because it does not implement IPIs.
Tony Luck suggested we just #ifndef out the call (since the simulator
does not have the SAL bug that this code is attempting to detect and
workaround)
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Commit 4611a77 ("[IA64] fix compile failure with non modular builds")
introduced struct fdesc into asm/elf.h, which duplicates KVM's definition.
Remove the latter to avoid the build error.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Make ia64 refrain from clearing a given to-be-offlined CPU's bit in the
cpu_online_mask until it has processed pending irqs. This change
prevents other CPUs from being blindsided by an apparently offline CPU
nevertheless changing globally visible state. Also remove the existing
redundant cpu_clear(cpu, cpu_online_map).
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Error handling code following a kmalloc should free the allocated data.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
bte.h expects a #define of L1_CACHE_MASK which is currently only
in bte.c. This small patch gets bte.h to include cleanly and makes
BTE_UNALIGNED_COPY not report errors.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Broke the non modular builds by moving an essential function into
modules.c. Fix this by moving it out again and into asm/sections.h as
an inline. To do this, the definitions of struct fdesc and struct
got_val have been lifted out of modules.c and put in asm/elf.h where
they belong.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It was introduced by "vsprintf: add support for '%pS' and '%pF' pointer
formats" in commit 0fe1ef24f7. However,
the current way its coded doesn't work on parisc64. For two reasons: 1)
parisc isn't in the #ifdef and 2) parisc has a different format for
function descriptors
Make dereference_function_descriptor() more accommodating by allowing
architecture overrides. I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now, there is no notifier that is called on a new cpu, before the new
cpu begins processing interrupts/softirqs.
Various kernel function would need that notification, e.g. kvm works around
by calling smp_call_function_single(), rcu polls cpu_online_map.
The patch adds a CPU_STARTING notification. It also adds a helper function
that sends the message to all cpu_chain handlers.
Tested on x86-64.
All other archs are untested. Especially on sparc, I'm not sure if I got
it right.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes dma_alloc_coherent use GFP_DMA at all times. This is
necessary for swiotlb, which requires the callers to set up the gfp
flags properly.
swiotlb_alloc_coherent tries to allocate pages with the gfp flags. If
the allocated memory isn't fit for dev->coherent_dma_mask,
swiotlb_alloc_coherent reserves some of the swiotlb memory area, which
is precious resource. So the callers need to set up the gfp flags
properly.
This patch means that other IA64 IOMMUs' dma_alloc_coherent also use
GFP_DMA. These IOMMUs (e.g. SBA IOMMU) don't need GFP_DMA since they
can map a memory to any address. But IA64's GFP_DMA is large,
generally drivers allocate small memory with dma_alloc_coherent only
at startup. So I chose the simplest way to set up the gfp flags for
swiotlb.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch also includes the required removal of (unused) inclusion of
<asm/a.out.h> <linux/a.out.h>'s in the arch/ code for these
architectures.
[dwmw2: updated for 2.6.27-rc]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In order to be able to do range hrtimers we need to use accessor functions
to the "expire" member of the hrtimer struct.
This patch converts KVM-ia64 to these accessors.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Some ia64 systems produce several repeats of kernel messages like this:
kernel unaligned access to 0xe000000644220466, ip=0xa000000100516fa1
This was tracked to ide code using the __cmd[] field in "struct request"
via the __outsw() function. __cmd[] is a char array, so is not guaranteed
to be properly aligned when accessed as words.
Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
CONFIG_SFC=m uses topology_core_siblings() which, for ia64, expects
cpu_core_map to be exported. It is not. This patch exports the needed
symbol.
Maintainers note: This really looks like the wrong thing to do ... it
would be much better for the kernel to export an API to provide
drivers like this with data they need (which in the case of this
driver seems to be an estimate of the effective parallelism available
on the platform). But x86 has exported this forever ... so go with
the flow until such an API is defined.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Switch ia64 to the generic compat_sys_old_readdir which is identical
except for slightly better error handling. Also remove sys32_getdents
which already isn't wired up to the syscall table anymore in favour of
compat_sys_getdents.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The code walks all the acpi _CRS methods to see how many windows
to allocate. It then scans them all again to insert_resource()
for each *even if the first scan found that there were none*.
Move the second scan inside the "if (windows)" clause.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Making allmodconfig will break the current build. This patch shrinks
the per_cpu__shadow_flush_counts from 16k to 8k which frees enough space
to allow allmodconfig to successfully complete.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=11338
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Not really a patch as much as a remove this file request. Now that
generic_defconfig supports all the configurations SGI currently supports
and has NR_CPUS and NR_NODES at our largest configurations, we have no
reason to maintain the extra defconfig file.
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform. For example in kexec
jump, it is used for data and stack too.
[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch changes ia64 to use the new bcd2bin/bin2bcd functions instead
of the obsolete BCD2BIN/BIN2BCD macros.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64 handles per-cpu variables a litle differently from other architectures
in that it maps the physical memory allocated for each cpu at a constant
virtual address (0xffffffffffff0000). This mapping is not enabled until
the architecture specific cpu_init() function is run, which causes problems
since some generic code is run before this point. In particular when
CONFIG_PRINTK_TIME is enabled, the boot cpu will trap on the access to
per-cpu memory at the first printk() call so the boot will fail without
the kernel printing anything to the console.
Fix this by allocating percpu memory for cpu0 in the kernel data section
and doing all initialization to enable percpu access in head.S before
calling any generic code.
Other cpus must take care not to access per-cpu variables too early, but
their code path from start_secondary() to cpu_init() is all in arch/ia64
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch changes the generic_defconfig so it works on all sn2
platforms I have access to. There is only one support configuration
which was not tested and that configuration is only a combination of two
tested configurations. With this patchset applied, a generic kernel can
be booted on either a RHEL 5.2, RHEL5.3, or SLES10 SP1 root and operate.
All features needed by SGI's ProPack are also working. I have not
tested all features of RHEL or SLES, but they do at least boot.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch updates the generic_defconfig for 2.6.27-rc1 by simply doing
a make oldconfig and holding down the carriage return.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64 has compiled with NR_CPUS=4096 for a couple releases, just forgot
to update Kconfig to allow it.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
arch/ia64/kernel/vmlinux.lds is a generated file. Tell
git to ignore it.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Recent kernels are not booting on some HP systems (though
it does boot on others). James and Willy reported the
problem. James did the bisection to find the commit
that caused the problem:
498c517047.
[IA64] pvops: paravirtualize ivt.S
Two instructions were wrongly paravirtualized such that
_FROM_ macro had been used where _TO_ was intended
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Wilcox, Matthew R" <matthew.r.wilcox@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
After moving the the include files there were a few clean-ups:
1) Some files used #include <asm-ia64/xyz.h>, changed to <asm/xyz.h>
2) Some comments alerted maintainers to look at various header files to
make matching updates if certain code were to be changed. Updated these
comments to use the new include paths.
3) Some header files mentioned their own names in initial comments. Just
deleted these self references.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This series of patches adds a driver for the SGI UV GRU. The driver is
still in development but it currently compiles for both x86_64 & IA64.
All simple regression tests pass on IA64. Although features remain to be
added, I'd like to start the process of getting the driver into the
kernel. Additional kernel drivers will depend on services provide by the
GRU driver.
The GRU is a hardware resource located in the system chipset. The GRU
contains memory that is mmaped into the user address space. This memory
is used to communicate with the GRU to perform functions such as
load/store, scatter/gather, bcopy, AMOs, etc. The GRU is directly
accessed by user instructions using user virtual addresses. GRU
instructions (ex., bcopy) use user virtual addresses for operands.
The GRU contains a large TLB that is functionally very similar to
processor TLBs. Because the external contains a TLB with user virtual
address, it requires callouts from the core VM system when certain types
of changes are made to the process page tables. There are several MMUOPS
patches currently being discussed but none has been accepted into the
kernel. The GRU driver is built using version V18 from Andrea Arcangeli.
This patch:
Contains the definitions of the hardware GRU data structures that are used
by the driver to manage the GRU.
[akpm@linux-foundation;org: export hpage_shift]
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a call to local_irq_restore in the normal exit case, so it would
seem that there should be one on an error return as well.
The semantic patch that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression l;
expression E,E1,E2;
@@
local_irq_save(l);
... when != local_irq_restore(l)
when != spin_unlock_irqrestore(E,l)
when any
when strict
(
if (...) { ... when != local_irq_restore(l)
when != spin_unlock_irqrestore(E1,l)
+ local_irq_restore(l);
return ...;
}
|
if (...)
+ {local_irq_restore(l);
return ...;
+ }
|
spin_unlock_irqrestore(E2,l);
|
local_irq_restore(l);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This extends wait_task_inactive() with a new argument so it can be used in
a "soft" mode where it will check for the task changing state unexpectedly
and back off. There is no change to existing callers. This lays the
groundwork to allow robust, noninvasive tracing that can try to sample a
blocked thread but back off safely if it wakes up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table. We have one
global kretprobe lock to serialise the access to these lists. This causes
only one kretprobe handler to execute at a time. Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).
Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.
Solution:
1) Instead of having one global lock to protect kretprobe instances
present in kretprobe object and kretprobe hash table. We will have
two locks, one lock for protecting kretprobe hash table and another
lock for kretporbe object.
2) We hold lock present in kretprobe object while we modify kretprobe
instance in kretprobe object and we hold per-hash-list lock while
modifying kretprobe instances present in that hash list. To prevent
deadlock, we never grab a per-hash-list lock while holding a kretprobe
lock.
3) We can remove used_instances from struct kretprobe, as we can
track used instances of kretprobe instances using kretprobe hash
table.
Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.
cacheline non-cacheline Un-patched kernel
aligned patch aligned patch
===============================================================================
real 9m46.784s 9m54.412s 10m2.450s
user 40m5.715s 40m7.142s 40m4.273s
sys 2m57.754s 2m58.583s 3m17.430s
===========================================================
Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real 9m26.389s
user 40m8.775s
sys 2m7.283s
=========================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value. This patch implements
the handling of O_CLOEXEC for the flag. I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation. I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.
The implementation introduces do_pipe_flags. I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code. To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags. Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_pipe2
# ifdef __x86_64__
# define __NR_pipe2 293
# elif defined __i386__
# define __NR_pipe2 331
# else
# error "need __NR_pipe2"
# endif
#endif
int
main (void)
{
int fd[2];
if (syscall (__NR_pipe2, fd, 0) != 0)
{
puts ("pipe2(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
{
puts ("pipe2(O_CLOEXEC) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Almost all users of this field need a PFN instead of a physical address,
so replace node_boot_start with node_min_pfn.
[Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()]
Signed-off-by: Johannes Weiner <hannes@saeureba.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Straight forward extensions for huge pages located in the PUD instead of
PMDs.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The goal of this patchset is to support multiple hugetlb page sizes. This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (eg. huge page
size, number of huge pages currently allocated, etc).
The hstate structure is then passed around the code which requires these
fields, they will do the right thing regardless of the exact hstate they
are operating on.
This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.
Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.
[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The double indirection here is not needed anywhere and hence (at least)
confusing.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a lot of places that define either a single bootmem descriptor or an
array of them. Use only one central array with MAX_NUMNODES items instead.
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: (25 commits)
mmtimer: Push BKL down into the ioctl handler
[IA64] Remove experimental status of kdump
[IA64] Update ia64 mmr list for SGI uv
[IA64] Avoid overflowing ia64_cpu_to_sapicid in acpi_map_lsapic()
[IA64] adding parameter check to module_free()
[IA64] improper printk format in acpi-cpufreq
[IA64] pv_ops: move some functions in ivt.S to avoid lack of space.
[IA64] pvops: documentation on ia64/pv_ops
[IA64] pvops: add to hooks, pv_time_ops, for steal time accounting.
[IA64] pvops: add hooks, pv_irq_ops, to paravirtualized irq related operations.
[IA64] pvops: add hooks, pv_iosapic_ops, to paravirtualize iosapic.
[IA64] pvops: define initialization hooks, pv_init_ops, for paravirtualized environment.
[IA64] pvops: paravirtualize NR_IRQS
[IA64] pvops: paravirtualize entry.S
[IA64] pvops: paravirtualize ivt.S
[IA64] pvops: paravirtualize minstate.h.
[IA64] pvops: define paravirtualized instructions for native.
[IA64] pvops: preparation for paravirtulization of hand written assembly code.
[IA64] pvops: introduce pv_cpu_ops to paravirtualize privileged instructions.
[IA64] pvops: add an early setup hook for pv_ops.
...
* 'kvm-updates-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (70 commits)
KVM: Adjust smp_call_function_mask() callers to new requirements
KVM: MMU: Fix potential race setting upper shadow ptes on nonpae hosts
KVM: x86 emulator: emulate clflush
KVM: MMU: improve invalid shadow root page handling
KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destruction
KVM: Prefix some x86 low level function with kvm_, to avoid namespace issues
KVM: check injected pic irq within valid pic irqs
KVM: x86 emulator: Fix HLT instruction
KVM: Apply the kernel sigmask to vcpus blocked due to being uninitialized
KVM: VMX: Add ept_sync_context in flush_tlb
KVM: mmu_shrink: kvm_mmu_zap_page requires slots_lock to be held
x86: KVM guest: make kvm_smp_prepare_boot_cpu() static
KVM: SVM: fix suspend/resume support
KVM: s390: rename private structures
KVM: s390: Set guest storage limit and offset to sane values
KVM: Fix memory leak on guest exit
KVM: s390: dont allocate dirty bitmap
KVM: move slots_lock acquision down to vapic_exit
KVM: VMX: Fake emulate Intel perfctr MSRs
KVM: VMX: Fix a wrong usage of vmcs_config
...
Noted by Tony Luck although I've done the patches differently and also
removed some other bogus oddments.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Flush the shadow mmu before removing regions to avoid stale entries.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch enables coalesced MMIO for ia64 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.
[akpm: fix compile error on ia64]
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Modify member in_range() of structure kvm_io_device to pass length and the type
of the I/O (write or read).
This modification allows to use kvm_io_device with coalesced MMIO.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch removes the experimental status of kdump on IA64. kdump is on IA64
now since more than one year and it has proven to be stable.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
acpi_map_lsapic tries to stuff a long into ia64_cpu_to_sapicid[],
which can only hold ints, so let's fix that.
We need to update the signature of acpi_map_cpu2node() too.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
module_free() refers the first parameter before checking.
But it is called like below(in kernel/kprobes). The first parameter is always NULL.
This happens when many probe points(>1024) are set by kprobes.
I encountered this with using SystemTap. It can set many probes easily.
static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
{
...
if (kip->nused == 0) {
hlist_del(&kip->hlist);
if (hlist_empty(&kprobe_insn_pages)) {
...
} else {
module_free(NULL, kip->insns); //<<< 1st param always NULL
kfree(kip);
}
return 1;
}
return 0;
}
Signed-off-by: Akiyama, Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When dprintk is enabled the following warnings are generated:
arch/ia64/kernel/cpufreq/acpi-cpufreq.c: In function 'processor_set_pstate':
arch/ia64/kernel/cpufreq/acpi-cpufreq.c:54: warning: format '%x' expects type 'unsigned int', but argumen
t 3 has type 's64'
arch/ia64/kernel/cpufreq/acpi-cpufreq.c: In function 'processor_get_pstate':
arch/ia64/kernel/cpufreq/acpi-cpufreq.c:76: warning: format '%x' expects type 'unsigned int', but argumen
t 2 has type 's64'
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix calls of smp_call_function*() in arch/ia64/kvm for recent API
changes.
CC [M] arch/ia64/kvm/kvm-ia64.o
arch/ia64/kvm/kvm-ia64.c: In function 'handle_global_purge':
arch/ia64/kvm/kvm-ia64.c:398: error: too many arguments to function 'smp_call_function_single'
arch/ia64/kvm/kvm-ia64.c: In function 'kvm_vcpu_kick':
arch/ia64/kvm/kvm-ia64.c:1696: error: too many arguments to function 'smp_call_function_single'
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"idle=nomwait" disables the use of the MWAIT
instruction from both C1 (C1_FFH) and deeper (C2C3_FFH)
C-states.
When MWAIT is unavailable, the BIOS and OS generally
negotiate to use the HALT instruction for C1,
and use IO accesses for deeper C-states.
This option is useful for power and performance
comparisons, and also to work around BIOS bugs
where broken MWAIT support is advertised.
http://bugzilla.kernel.org/show_bug.cgi?id=10807http://bugzilla.kernel.org/show_bug.cgi?id=10914
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
"idle=halt" limits the idle loop to using
the halt instruction. No MWAIT, no IO accesses,
no C-states deeper than C1.
If something is broken in the idle code,
"idle=halt" is a less severe workaround
than "idle=poll" which disables all power savings.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
* 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
generic-ipi: more merge fallout
generic-ipi: merge fix
x86, visws: use mach-default/entry_arch.h
x86, visws: fix generic-ipi build
generic-ipi: fixlet
generic-ipi: fix s390 build bug
generic-ipi: fix linux-next tree build failure
fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
fix "smp_call_function: get rid of the unused nonatomic/retry argument"
on_each_cpu(): kill unused 'retry' parameter
smp_call_function: get rid of the unused nonatomic/retry argument
sh: convert to generic helpers for IPI function calls
parisc: convert to generic helpers for IPI function calls
mips: convert to generic helpers for IPI function calls
m32r: convert to generic helpers for IPI function calls
arm: convert to generic helpers for IPI function calls
alpha: convert to generic helpers for IPI function calls
ia64: convert to generic helpers for IPI function calls
powerpc: convert to generic helpers for IPI function calls
...
Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually
The symbol account_system_vtime is used by the kvm module but
not exported. This breaks building with CONFIG_VIRT_CPU_ACCOUNTING
and CONFIG_KVM=m.
Signed-off-by: Doug Chapman <doug.chapman@hp.com>
Acked-by: Hidetosho Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On a system where there are no hot pluggable cpus "additional_cpus"
is still set to -1 at the point where we call per_cpu_scan_finalize().
If we didn't find an SRAT table and so pick the default "32" for the
number of cpus, when we get to:
high_cpu = min(high_cpu + reserve_cpus, NR_CPUS);
we will end up initializing for just 31 cpus ... and so we will
die horribly when bringing up cpu#32.
Problem introduced by: 2c6e6db41f
"Minimize per_cpu reservations."
Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This converts ia64 to use the new helpers for smp_call_function() and
friends, and adds support for smp_call_function_single().
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
As noted by Akinobu Mita alloc_bootmem and related functions never return
NULL and always return a zeroed region of memory. Thus a NULL test or
memset after calls to these functions is unnecessary.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The fix applied in e0c6d97c65
"security hole in sn2_ptc_proc_write" didn't take into account
the case where count==0 (which results in a buffer underrun
when adding the trailing '\0'). Thanks to Andi Kleen for
pointing this out.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Call check_sal_cache_flush() after platform_setup() as
check_sal_cache_flush() now relies on being able to call platform
vector code.
Problem was introduced by: 3463a93def
"Update check_sal_cache_flush to use platform_send_ipi()"
Signed-off-by: Jes Sorensen <jes@sgi.com>
Tested-by: Alex Chiang: <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Security hole in sn2_ptc_proc_write
It is possible to overrun a buffer with a write to this /proc file.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix build error in CONFIG_IA64_SGI_UV config. (GENERIC builds
are ok).
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
check_sal_cache_flush is used to detect broken firmware that drops
pending interrupts.
The old implementation schedules a timer interrupt for itself in
the future by getting the current value of the Interval Timer
Counter + 1000 cycles, waits for the interrupt to be pended, calls
SAL_CACHE_FLUSH, and finally checks to see if the interrupt is
still pending.
This implementation can cause problems for virtual machine code if
the process of scheduling the timer interrupt takes more than 1000
cycles; the virtual machine can end up sleeping for several hundred
years while waiting for the ITC to wrap around.
The fix is to use platform_send_ipi. The processor will still send
an interrupt to itself, using the IA64_IPI_DM_INT delivery mode,
which causes the IPI to look like an external interrupt. The rest
of the SAL_CACHE_FLUSH + checking to see if the interrupt is still
pending remains unchanged.
This fix has been boot tested successfully on:
- intel tiger2
- hp rx6600
- hp rx5670
The rx5670 has known buggy firmware, where SAL_CACHE_FLUSH drops
pending interrupts. A boot test on this machine showed this message
on the console:
SAL: SAL_CACHE_FLUSH drops interrupts; PAL_CACHE_FLUSH will be used instead
Which proves that the self-inflicted IPI approach is viable. And
as expected, the other tested platforms correctly did not display
the warning.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is a SLIT sanity checking patch. It moves slit_valid() function to
generic ACPI code and does sanity checking for both x86 and ia64. It sets up
node_distance with LOCAL_DISTANCE and REMOTE_DISTANCE when hitting invalid
SLIT table on ia64. It also cleans up unused variable localities in
acpi_parse_slit() on x86.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Move the cleanup of the async queue to the close callback from the flush
callback. This avoids losing asynchronous overflow notifications when
the file descriptor is shared by multiple processes and one terminates.
Signed-off-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Only copy in the data actually requested by the instruction emulation
and zero pad the destination register first. This avoids the problem
where emulated mmio access got garbled data from ld2.acq instructions
in the vga console driver.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
move interrupt, page_fault, non_syscall, dispatch_unaligned_handler and
dispatch_to_fault_handler to avoid lack of instructin space.
The change set 4dcc29e157 bloated
SAVE_MIN_WITH_COVER, SAVE_MIN_WITH_COVER_R19 so that it bloated the
functions which uses those macros.
In the native case, only dispatch_illegal_op_fault had to be moved.
When paravirtualized case the all functions which use the macros need
to be moved to avoid the lack of space.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Introduce pv_time_ops which adds hook to steal time accounting.
On virtualized environment, cpus are shared by many guests and
steal time is the time which is used for other guests.
On virtualized environtment, streal time should be accounted.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_irq_ops which adds hooks to paravirtualize irq related
operations.
On virtualized environment, interruption may be replaced by something
virtualization friendly. So the irq related operation also may need
paravirtualization.
This patch adds necessary hooks to paravirtualize irq related operations.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
add hooks to paravirtualize iosapic which is a real hardware resource.
On virtualized environment it may be replaced something virtualized
friendly.
Define pv_iosapic_ops and add the hooks.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
define pv_init_ops hooks which represents various initialization
hooks for paravirtualized environment. and add hooks.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make NR_IRQ overridable by each pv instances.
Pv instance may need each own number of irqs so that
NR_IRQS should be the maximum number of nr_irqs each
pv instances need.
Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize ia64_swtich_to, ia64_leave_syscall and ia64_leave_kernel.
They include sensitive or performance critical privileged instructions
so that they need paravirtualization.
To paravirtualize them by single source and multi compile
they are converted into indirect jump. And define each pv instances.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: "Dong, Eddie" <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize ivt.S which implements fault handler in hand written
assembly code.
They includes sensitive or performance critical privileged instructions.
So they need paravirtualization.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: tgingold@free.fr
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize minstate.h which are hand written assembly code.
They include sensitive or performance critical privileged
instructions. So that they are appropriate for paravirtualization.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Preparation for paravirtualization of hand written assembly code.
They are paravirtualized by single source code and compiled multi times.
To tell those files for target (including native), add one defines.
Cc: "Dong, Eddie" <eddie.dong@intel.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: tgingold@free.fr
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_cpu_ops to paravirtualize privleged instructions
which are defined by ia64 intrinsics.
make them indirect C function calls by introducing function
tables, pv_cpu_ops.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds a setup hook in the very early boot sequence
before start_kernel() to initialize paravirtualization stuff.
The hook will be set by each pv loader code or by using multi entry point.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_info which describes some randome info about
underlying execution environment.
Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Move the LOAD_OFFSET definition from vmlinux.lds.S into system.h.
On paravirtualized environments, it is necessary to detect the
execution environment. One of the solutions is the multi entry point.
The multi entry point allows a boot loader to start the kernel execution
from the entry point which is different from the ELF entry point.
The non standard entry point will defined as the specialized elf note
which contains the LMA of the entry point symbol.
The constant, LOAD_OFFSET, is necessary to calculate the symbol's LMA.
Move the definition into the public header file to make it available
to the multi entry point support.
Cc: "He, Qing" <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
remove extern declaration of handle_IPI() in irq_ia64.c.
Instead, declare it in asm-ia64/smp.h.
Later handle_IPI() will be referenced from another file.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Problem: An application violating the architectural rules regarding
operation dependencies and having specific Register Stack Engine (RSE)
state at the time of the violation, may result in an illegal operation
fault and invalid RSE state. Such faults may initiate a cascade of
repeated illegal operation faults within OS interruption handlers.
The specific behavior is OS dependent.
Implication: An application causing an illegal operation fault with
specific RSE state may result in a series of illegal operation faults
and an eventual OS stack overflow condition.
Workaround: OS interruption handlers that switch to kernel backing
store implement a check for invalid RSE state to avoid the series
of illegal operation faults.
The core of the workaround is the RSE_WORKAROUND code sequence
inserted into each invocation of the SAVE_MIN_WITH_COVER and
SAVE_MIN_WITH_COVER_R19 macros. This sequence includes hard-coded
constants that depend on the number of stacked physical registers
being 96. The rest of this patch consists of code to disable this
workaround should this not be the case (with the presumption that
if a future Itanium processor increases the number of registers, it
would also remove the need for this patch).
Move the start of the RBS up to a mod32 boundary to avoid some
corner cases.
The dispatch_illegal_op_fault code outgrew the spot it was
squatting in when built with this patch and CONFIG_VIRT_CPU_ACCOUNTING=y
Move it out to the end of the ivt.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] return to old errno choice in mkdir() et.al.
[Patch] fs/binfmt_elf.c: fix wrong return values
[PATCH] get rid of leak in compat_execve()
[Patch] fs/binfmt_elf.c: fix a wrong free
[PATCH] avoid multiplication overflows and signedness issues for max_fds
[PATCH] dup_fd() part 4 - race fix
[PATCH] dup_fd() - part 3
[PATCH] dup_fd() part 2
[PATCH] dup_fd() fixes, part 1
[PATCH] take init_files to fs/file.c
Move rcu-protected lists from list.h into a new header file rculist.h.
This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.
For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h. It actually compiles because users of
rcu_dereference() are macros. Others RCU functions could be used too but
aren't probably because of this.
Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The GVMM module is position independent since it is relocated to the guest
address space.
Commit ea696f9cf ("ia64 kvm fixes for O=... builds") broke this by linking
GVMM with non-PIC objects.
Fix by creating two files: memset.S and memcpy.S which just include the files
under arch/ia64/lib/{memset.S, memcpy.S} respectively.
[akpm: don't delete files which we need]
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The patch aims to fix a performance issue for the syscall
personality(PER_LINUX32).
On IA-64 box, the syscall personality (PER_LINUX32) has poor performance
because it failed to find the Linux/x86 execution domain. Then it tried
to load the kernel module however it failed always and it used the default
execution domain PER_LINUX instead. Requesting kernel modules is very
expensive. It caused the performance issue. (see the function
lookup_exec_domain in kernel/exec_domain.c).
To resolve the issue, execution domain Linux/x86 is always registered in
initialization time for IA-64 architecture.
Signed-off-by: Xiaolan Huang <xiaolan.huang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
acpi_unregister_gsi() should "undo" what acpi_register_gsi() does.
On systems that have legacy interrupts, acpi_unregister_gsi erroneously calls
iosapci_unregister_intr() which is wrong to do and causes a loud warning.
acpi_unregister_gsi() should just return in these cases.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is only palinfo_handle_smp as (indirect) user of palinfo_smp_call (by
way of smp_call_function_single) and surely palinfo_handle_smp never pass
NULL as parameter for info.
Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix a typo, and coding style cleanups for pfm_handle_work().
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch does:
- make comment at next to resched check more robust
- move "re-check" comments to next to where change predicate regs
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
[Bug-fix for "[BUG?][2.6.25-mm1] sleeping during IRQ disabled"]
This patch does:
- enable interrupts before calling schedule() as same as others, ex. x86
- enable interrupts during ia64_do_signal() and ia64_sync_krbs()
- do_notify_resume_user() is still called with interrupts disabled, since
we can take short path of fsys_mode if-statement quickly.
- pfm_handle_work() is also called with interrupts disabled, since
it can deal interrupt mask within itself.
- fix/add some comments/notes
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The sequence executed in check_sal_cache_flush:
- pend a timer interrupt
- call SAL_CACHE_FLUSH
- see if interrupt is still pending
can hang HP machines with buggy SAL_CACHE_FLUSH implementations.
Provide a kernel command-line argument to allow users skip this
check if desired. Using this parameter will force ia64_sal_cache_flush
to call ia64_pal_cache_flush() instead of SAL_CACHE_FLUSH.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some IA64 machines map all cell-local memory above 4 GB (32 bit limit).
However, in most cases, the kernel needs some memory below that limit that is
DMA-capable. So in this machine configuration, the crashkernel will be reserved
above 4 GB.
For machines that use SWIOTLB implementation because they lack an I/O MMU
the low memory is required by the SWIOTLB implementation. In that case,
it doesn't make sense to reserve the crashkernel at all because it's unusable
for kdump.
A special case is the "hpzx1" machine vector. In theory, it has a I/O MMU, so
it can be booted above 4 GB. However, in the kdump case that is not possible
because of changeset 51b58e3e26:
On HP zx1 machines, the 'machvec=dig' parameter is needed for the kdump
kernel to avoid problems with the HP sba iommu. The problem is that during
the boot of the kdump kernel, the iommu is re-initialized, so in-flight DMA
from improperly shutdown drivers causes an IOTLB miss which leads to an
MCA. With kdump, the idea is to get into the kdump kernel with as little
code as we can, so shutting down drivers properly is not an option.
The workaround is to add 'machvec=dig' to the kdump kernel boot parameters.
This makes the kdump kernel avoid using the sba iommu altogether, leaving
the IOTLB intact. Any ongoing DMA falls harmlessly outside the kdump
kernel. After the kdump kernel reboots, all devices will have been
shutdown properly and DMA stopped.
This patch pushes that functionality into the sba iommu initialization
code, so that users won't have to find the obscure documentation telling
them about 'machvec=dig'.
This means that also for hpzx1 it's not possible to boot when all
memory is above the 4 GB limit. So the only machine vectors that can handle
this case are "sn2" and "uv".
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds the basic IA64 machvec infrastructure to support
the SGI "UV" platform.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Races galore... General rule: as soon as it's in descriptor table,
it's over; another thread might have started IO on it/dup2() it
elsewhere/dup2() something *over* it/etc. fd_install() is the very
last step one should take - it's a point of no return.
Besides, the damn thing leaked on failure exits...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define
our own set_restore_sigmask() function. This saves the costly
SMP-safe set_bit operation, which we do not need for the sigmask
flag since TIF_SIGPENDING always has to be set too.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix indenting of switch statement to follow CodingStyle, and
pull out handling of call_data into an inlined function.
I confirmed that applying this fix doesn't affect assembled code.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide. Move its definition
to math64.h as currently no architecture overrides the generic implementation.
They can still override it of course, but the duplicated declarations are
avoided.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch silences:
WARNING: vmlinux.o(.text+0x44672): Section mismatch in
reference from the function arch_register_cpu() to the
function .cpuinit.text:register_cpu()
Changes are based on codes in arch/x86/kernel/topology.c
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes following warning:
WARNING: vmlinux.o(.exit.text+0xb1): Section mismatch in
reference from the function palinfo_exit() to the variable
.cpuinit.data:palinfo_cpu_notifier
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch shuts up the following:
WARNING: vmlinux.o(.text+0x7102): Section mismatch in
reference from the function fixup_irqs() to the function
.devinit.text:ia64_disable_timer()
Removing ia64_disable_timer() is safe because there are no functions
calling it other than the fixup_irqs(),
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch kills:
WARNING: vmlinux.o(.text+0x1702): Section mismatch in
reference from the function acpi_register_ioapic() to the
function .devinit.text:iosapic_init()
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
- Operations are now a shared const function block as with most other Linux
objects
- Introduce wrappers for some optional functions to get consistent behaviour
- Wrap put_char which used to be patched by the tty layer
- Document which functions are needed/optional
- Make put_char report success/fail
- Cache the driver->ops pointer in the tty as tty->ops
- Remove various surplus lock calls we no longer need
- Remove proc_write method as noted by Alexey Dobriyan
- Introduce some missing sanity checks where certain driver/ldisc
combinations would oops as they didn't check needed methods were present
[akpm@linux-foundation.org: fix fs/compat_ioctl.c build]
[akpm@linux-foundation.org: fix isicom]
[akpm@linux-foundation.org: fix arch/ia64/hp/sim/simserial.c build]
[akpm@linux-foundation.org: fix kgdb]
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks.
Those low bits are scarce. Renumber TIF_RESTORE_SIGMASK to free one up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Legacy HP ia64 platforms currently cannot provide
/proc/cpuinfo/physical_id due to legacy SAL/PAL implementations.
However, that physical topology information can be obtained
via ACPI.
Provide an interface that gives ACPI one last chance to provide
physical_id for these legacy platforms. This logic only comes
into play iff:
- ACPI actually provides slot information for the CPU
- we lack a valid socket_id
Otherwise, we don't do anything.
Since x86 uses the ACPI processor driver as well, we provide a nop
stub function for arch_fix_phys_package_id() in asm-x86/topology.h
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Commit 113134fcbc changed the flow of
control when calling PAL_LOGICAL_TO_PHYSICAL and SAL_PHYSICAL_ID_INFO.
With the change, if a platform did not implement the latter, a useless
printk would appear in the boot log:
ia64_sal_pltid failed with -1
So let's check the return code and only printk on a true error, and do
not print anything in the unimplemented case. While we're in there,
clean up some stylistic issues too.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Enable the uncached allocator to allocate multiple pages of contiguous
uncached memory.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If "max_purges" from PAL is 0, it actually means 1.
However it was not handled later when a hot-added cpu pass the
max_purges from PAL. This makes systems easy to go BUG_ON().
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change all ia64 machvecs to use the new dma_*map*_attrs() interfaces.
Implement the old dma_*map_*() interfaces in terms of the corresponding new
interfaces. For ia64/sn, make use of one dma attribute,
DMA_ATTR_WRITE_BARRIER. Introduce swiotlb_*map*_attrs() functions.
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce new interfaces, dma_*map*_attrs(), for passing architecture-specific
attributes when memory is mapped and unmapped for DMA. Give the interfaces
default implementations which ignore attributes. Also introduce the
dma_{set|get}_attr() interfaces for setting and retrieving individual
attributes. Define one attribute, DMA_ATTR_WRITE_BARRIER, in anticipation of
its use by ia64/sn. Select whether architectures implement arch-specific
versions of the dma_*map*_attrs() interfaces via HAVE_DMA_ATTRS in Kconfig.
[markn@au1.ibm.com: dma_{set,get}_attr() have to be static inline]
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
(commit 3715863aa1). SWIOTLB can use it instead
of the homegrown function.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* EXTRA_CFLAGS do not apply for *.S
* don't bother with symlinks to ../lib/mem*.S, just add ../lib/mem*.o
to object list
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All architectures use an effectively identical definition of online_page(), so
just make it common code. x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.
x86-32's differences arise because it puts its hotplug pages in the highmem
zone. We can handle this in the generic code by inspecting the page to see if
its in highmem, and update the totalhigh_pages count appropriately. This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.
I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.
[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So userspace can save/restore the mpstate during migration.
[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
ignored, possibly resulting in hangs.
Also make sure that atomic_inc and waitqueue_active tests happen in the
specified order, otherwise the following race is open:
CPU0 CPU1
if (waitqueue_active(wq))
add_wait_queue()
if (!atomic_read(pit_timer->pending))
schedule()
atomic_inc(pit_timer->pending)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Update the related Makefile and KConfig for kvm build
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Some sal/pal calls would be traped to kvm for virtulization
from guest firmware.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
process.c mainly handle interruption injection, and some faults handling.
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
asm-offsets.c will generate offset values used for assembly code
for some fileds of special structures.
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
optvfault.S Add optimization for some performance-critical
virtualization faults.
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
trampoline code targets for guest/host world switch.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
mmio.c includes mmio decoder, and related mmio logics.
Signed-off-by: Anthony Xu <Anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
vmm_ivt.S includes an ivt for vmm use.
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
vmm.c adds the interfaces with kvm/module, and initialize global data area.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm_minstate.h : Marcos about Min save routines.
lapic.h: apic structure definition.
vcpu.h : routions related to vcpu virtualization.
vti.h : Some macros or routines for VT support on Itanium.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm_ia64.c is created to handle kvm ia64-specific core logic.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move XPC and XPNET from arch/ia64/sn/kernel to drivers/misc/sgi-xp.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
- remove unused 'irq' argument from pfm_do_interrupt_handler()
- remove pointless cast to void*
- add KERN_xxx prefix to printk()
- remove braces around singleton C statement
- in tioce_provider.c, start tioce_dma_consistent() and
tioce_error_intr_handler() function declarations in column 0
This change's main purpose is to prepare for the patchset in
jgarzik/misc-2.6.git#irq-remove, that explores removal of the
never-used 'irq' argument in each interrupt handler.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are many notify_die() and almost all take same style with
ia64_mca_spin(). This patch defines macros and replace them all,
to reduce lines and to improve readability.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are 3 hooks in MCA handler, but this DIE_MCA_MONARCH_PROCESS
event does not notified other than for the first monarch.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While testing with CONFIG_VIRT_CPU_ACCOUNTING=y, I found that
I occasionally get very huge system time in some threads.
So I dug the issue and finally noticed that it was caused
because of an interrupt which interrupt in the following window:
> [arch/ia64/kernel/entry.S: (!CONFIG_PREEMPT && CONFIG_VIRT_CPU_ACCOUNTING)]
>
> ENTRY(ia64_leave_syscall)
> :
> (pUStk) rsm psr.i
> cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall
> (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk
> .work_processed_syscall:
> adds r2=PT(LOADRS)+16,r12
> (pUStk) mov.m r22=ar.itc // fetch time at leave
> adds r18=TI_FLAGS+IA64_TASK_SIZE,r13
> ;;
> <<< window: from here >>>
> (p6) ld4 r31=[r18] // load current_thread_info()->flags
> ld8 r19=[r2],PT(B6)-PT(LOADRS)
> adds r3=PT(AR_BSPSTORE)+16,r12
> ;;
> mov r16=ar.bsp
> ld8 r18=[r2],PT(R9)-PT(B6)
> (p6) and r15=TIF_WORK_MASK,r31 // any work other than TIF_SYSCALL_TRACE?
> ;;
> ld8 r23=[r3],PT(R11)-PT(AR_BSPSTORE)
> (p6) cmp4.ne.unc p6,p0=r15, r0 // any special work pending?
> (p6) br.cond.spnt .work_pending_syscall
> ;;
> ld8 r9=[r2],PT(CR_IPSR)-PT(R9)
> ld8 r11=[r3],PT(CR_IIP)-PT(R11)
> (pNonSys) break 0 // bug check: we shouldn't be here if pNonSys is TRUE!
> ;;
> invala
> <<< window: to here >>>
> rsm psr.i | psr.ic // turn off interrupts and interruption collection
If pUStk is true, it means we are going to return user mode, hence we fetch
ar.itc to get time at leave from system.
It seems that it is not possible to interrupt the window if pUStk is true,
because interrupts are disabled early. And also disabling interrupt makes
sense because it is safe for referring current_thread_info()->flags.
However interrupting the window while pUStk is true was possible.
The route was:
ia64_trace_syscall
-> .work_pending_syscall_end
-> .work_processed_syscall
Only in case entering the window from this route, interrupts are enabled
during in the window even if pUStk is true. I suppose interrupts must be
disabled here anyway if pUStk is true.
I'm not sure but afraid that what kind of bad effect were there, other
than crazy system time which I found.
FYI, there was a commit 6f6d75825d that
points out a bug at same point(exit of ia64_trace_syscall) in 2006.
It can be said that there was an another bug.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use the generic pci_enable_resources() instead of the arch-specific code.
Unlike this arch-specific code, the generic version:
- does not check for a NULL dev pointer
- skips resources that have neither IORESOURCE_IO nor IORESOURCE_MEM set
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
[NET]: Fix and allocate less memory for ->priv'less netdevices
[IPV6]: Fix dangling references on error in fib6_add().
[NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
[PKT_SCHED]: Fix datalen check in tcf_simp_init().
[INET]: Uninline the __inet_inherit_port call.
[INET]: Drop the inet_inherit_port() call.
SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
[netdrvr] forcedeth: internal simplifications; changelog removal
phylib: factor out get_phy_id from within get_phy_device
PHY: add BCM5464 support to broadcom PHY driver
cxgb3: Fix __must_check warning with dev_dbg.
tc35815: Statistics cleanup
natsemi: fix MMIO for PPC 44x platforms
[TIPC]: Cleanup of TIPC reference table code
[TIPC]: Optimized initialization of TIPC reference table
[TIPC]: Remove inlining of reference table locking routines
e1000: convert uint16_t style integers to u16
ixgb: convert uint16_t style integers to u16
sb1000.c: make const arrays static
sb1000.c: stop inlining largish static functions
...
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (137 commits)
[SCSI] iscsi: bidi support for iscsi_tcp
[SCSI] iscsi: bidi support at the generic libiscsi level
[SCSI] iscsi: extended cdb support
[SCSI] zfcp: Fix error handling for blocked unit for send FCP command
[SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
[SCSI] zfcp: fix 31 bit compile warnings
[SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
[SCSI] bsg: remove minor in struct bsg_device
[SCSI] bsg: use better helper list functions
[SCSI] bsg: replace kobject_get with blk_get_queue
[SCSI] bsg: takes a ref to struct device in fops->open
[SCSI] qla1280: remove version check
[SCSI] libsas: fix endianness bug in sas_ata
[SCSI] zfcp: fix compiler warning caused by poking inside new semaphore (linux-next)
[SCSI] aacraid: Do not describe check_reset parameter with its value
[SCSI] aacraid: Fix down_interruptible() to check the return value
[SCSI] sun3_scsi_vme: add MODULE_LICENSE
[SCSI] st: rename flush_write_buffer()
[SCSI] tgt: use KMEM_CACHE macro
[SCSI] initio: fix big endian problems for auto request sense
...
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
http://bugzilla.kernel.org/show_bug.cgi?id=10124
this change:
commit 08f1c192c3
Author: Muli Ben-Yehuda <muli@il.ibm.com>
Date: Sun Jul 22 00:23:39 2007 +0300
x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata
This patch introduces struct pci_sysdata to x86 and x86-64, and
converts the existing two users (NUMA, Calgary) to use it.
This lays the groundwork for having other users of sysdata, such as
the PCI domains work.
The Calgary bits are tested, the NUMA bits just look ok.
replaces pcibios_scan_root by pci_scan_bus_parented...
but in pcibios_scan_root we have a check about scanned busses.
Cc: <yakui.zhao@intel.com>
Cc: Stian Jordet <stian@jordet.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Yinghai Lu" <yhlu.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the problem that kdump by INIT does not work if we use
makedumpfile. The problem is that after INIT is issued, 2nd kernel
starts and makedumpfile fails with the following error message.
/proc/vmcore doesn't contain vmcoreinfo.
'-x' or '-i' must be specified.
makedumpfile Failed.
The cause of this problem is that kernel does not call
crash_save_vmcoreinfo. When kdump starts by panic or sysrq-trigger,
crash_save_vmcoreinfo is called by crash_kexec. But this function is not
called when kdump starts by INIT. The Attached patch fixes this.
This patch just adds crash_save_vmcoreinfo into machine_kdump_on_init so
that crash_save_vmcoreinfo can be called when kdump starts by INIT.
I tested this patch with linux-2.6.25-rc9 and I confirmed it worked.
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is a NUMA memory configuration issue in 2.6.24:
A 2-node machine of ours has got the following memory layout:
Node 0: 0 - 2 Gbytes
Node 0: 4 - 8 Gbytes
Node 1: 8 - 16 Gbytes
Node 0: 16 - 18 Gbytes
"efi_memmap_init()" merges the three last ranges into one.
"register_active_ranges()" is called as follows:
efi_memmap_walk(register_active_ranges, NULL);
i.e. once for the 4 - 18 Gbytes range. It picks up the node
number from the start address, and registers all the memory for
the node #0.
"register_active_ranges()" should be called as follows to
make sure there is no merged address range at its entry:
efi_memmap_walk(filter_memory, register_active_ranges);
"filter_memory()" is similar to "filter_rsvd_memory()",
but the reserved memory ranges are not filtered out.
Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Untangle the chaos of page size determination in this function by
simply using PAGE_SIZE << compound_order().
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The functions time_before, time_before_eq, time_after, and time_after_eq are
more robust for comparing jiffies against other values.
So use the time_after() & time_before() macros, defined at linux/jiffies.h,
which deal with wrapping correctly
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
show_mem() has no need to print the amount of free swap space manually because
show_free_areas() does this already and is called by the former.
The two outputs only differ in text formatting:
printk("Free swap = %lukB\n", ...);
printk("Free swap: %6ldkB\n", ...);
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
IA64's IOMMU implementation allocates memory areas spanning LLD's segment
boundary limit. It forces low level drivers to have a workaround to adjust
scatter lists that the IOMMU builds.
We are in the process of making all the IOMMUs respect the segment boundary
limits to remove such work around in LLDs. This patch is for IA64's IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add kprobe-booster support on ia64.
Kprobe-booster improves the performance of kprobes by eliminating single-step,
where possible. Currently, kprobe-booster is implemented on x86 and x86-64.
This is an ia64 port.
On ia64, kprobe-booster executes a copied bundle directly, instead of single
stepping. Bundles which have B or X unit and which may cause an exception
(including break) are not executed directly. And also, to prevent hitting
break exceptions on the copied bundle, only the hindmost kprobe is executed
directly if several kprobes share a bundle and are placed in different slots.
Note: set_brl_inst() is used for preparing an instruction buffer(it does not
modify any active code), so it does not need any atomic operation.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: bibo,mao <bibo.mao@intel.com>
Cc: Rusty Lynch <rusty.lynch@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The sys_getpid() and sys_set_tid_address() behavior changed from
return current->tgid
to
struct pid *pid;
pid = current->pids[PIDTYPE_PID].pid;
return pid->numbers[pid->level].nr;
But the fast system calls on ia64 still operate the old way. Patch them
appropriately to let ia64 work with pid namespaces. Besides, this is one more
step in deprecating of pid and tgid on task_struct.
The fsys_getppid() is to be patched as well, but its logic is much
more complex now, so I will make it later.
One thing I'm not 100% sure is the trick with the IA64_UPID_SHIFT. On order
to access the pid->level's element of an array I have to perform the following
calculations
pid + sizeof(struct upid) * pid->level
The problem is that ia64 can only multiply float point registers, while all
the offsets I have in code are in rXX ones. Fortunately, the sizeof(struct
upid) is 32 bytes on ia64 (and is very unlikely to ever change), so the
calculations get simpler:
pid + pid->level << 5
So, I introduce the IA64_UPID_SHIFT and use the shl instruction. I also
looked at how gcc compiles the similar place and found that it makes it with
shift as well. Is this OK to do so?
Tested with ski emulator with 2.6.24 kernel, but fits 2.6.25-rc4 and
2.6.25-rc4-mm1 as well.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: David Mosberger-Tang <davidm@hpl.hp.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
do_each_thread/while_each_thread is a double loop, so
should use 'goto' rather than 'break' to break out
the loop.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
One should normally unlock in the reverse order of the lock calls,
and in this case there certainly is no reason not to.
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While it is convenient that we can invoke kdump by asserting INIT
via button on chassis etc., there are some situations that invoking
kdump on fatal MCA is not welcomed rather than rebooting fast without
dump.
This patch adds a new flag 'kdump_on_fatal_mca' that is independent
from 'kdump_on_init' currently available. Adding this flag enable
us to turning on/off of kdump depend on the event, INIT and/or fatal
MCA. Default for this flag is to take the dump.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This attached patch significantly shrinks boot memory allocation on ia64.
It does this by not allocating per_cpu areas for cpus that can never
exist.
In the case where acpi does not have any numa node description of the
cpus, I defaulted to assigning the first 32 round-robin on the known
nodes.. For the !CONFIG_ACPI I used for_each_possible_cpu().
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
A simple fix. The existing pernodesize reservation is not taking into
account a second array of pg_data_t structures. This is normally not
important because the PAGE_ALIGN macro reserves adequate space.
I made the compute_pernodesize steps in the same order as the fill_pernode
steps to make the correlation more clear.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This replaces simscsi_fillresult with scsi_sg_copy_from_buffer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The patch defines kernel parameter "nptcg=". The parameter overrides max number
of concurrent global TLB purges which is reported from either PAL_VM_SUMMARY or
SAL PALO.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
According to SDM2.2, Itanium supports multiple outstanding ptc.g instructions.
But current kernel function ia64_global_tlb_purge() uses a spinlock to serialize
ptc.g instructions issued by multiple processors. This serialization might have
scalability issue on a big SMP machine where many processors could purge TLB
in parallel.
The patch fixes this problem by issuing multiple ptc.g instructions in
ia64_global_tlb_purge(). It also adds support for the "PALO" table to get
a platform view of the max number of outstanding ptc.g instructions (which
may be different from the processor view found from PAL_VM_SUMMARY).
PALO specification can be found at: http://www.dig64.org/home/DIG64_PALO_R1_0.pdf
spinaphore implementation by Matthew Wilcox.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>