Commit Graph

5163 Commits

Author SHA1 Message Date
Linus Torvalds
808c783e9b Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus 2006-02-21 10:13:22 -08:00
Linus Torvalds
52aa536f5a Merge branch 'upstream-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev 2006-02-21 10:11:32 -08:00
Ralf Baechle
1242737735 [MIPS] Follow Uli's latest *at syscall changes.
(This really is only the half of the patch which was forgotten in
326a625748 ...)
    
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-02-21 16:58:23 +00:00
Atsushi Nemoto
8ecbbcaf08 [MIPS] Fixes for uaccess.h with gcc >= 4.0.1
It seems current get_user() incorrectly sign-extend an unsigned int
value on 64bit kernel.  I think this is because '(__typeof__(val))'
cast in final assignment.  I suppose the cast should be
'(__typeof__(*(addr))'.
    
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-02-21 16:58:22 +00:00
Linus Torvalds
cf70a6f264 Merge branch 'fixes.b8' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bird 2006-02-20 20:09:44 -08:00
Linus Torvalds
6bd25e7821 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge 2006-02-20 20:05:45 -08:00
Hirokazu Takata
b04ec261bd [PATCH] m32r: __cmpxchg_u32 fix
This patch fixes a bug of include/asm-m32r/system.h:__cmpxchg_u32().

  static __inline__ unsigned long
  __cmpxchg_u32(volatile unsigned int *p, unsigned int old, unsigned int new);

In __cmpxchg_u32(), the "old" value must not be changed to the previous "*p"
value.  But the former code modifies the previous "*p" value.

A deadlock at _atomic_dec_and_lock sometimes happened due to this bug.

Signed-off-by: Hayato Fujiwara <fujiwara@linux-m32r.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20 20:00:12 -08:00
Heiko Carstens
49d9c81a69 [PATCH] s390: revert dasd eer module
Revert dasd eer module until we have a common understanding of how the
interface should be.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20 20:00:12 -08:00
Stephen Rothwell
7fd105e758 [PATCH] Fix compile for CONFIG_SYSVIPC=n or CONFIG_SYSCTL=n
The compat syscalls are added to sys_ni.c since they are not defined if the
above CONFIG options are off.  Also, nfs would not build with CONFIG_SYSCTL
off.

Noticed by Arthur Othieno.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20 20:00:11 -08:00
Luke Yang
7a9166e3b0 [PATCH] Fix undefined symbols for nommu architecture
Signed-off-by: Luke Yang <luke.adi@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20 20:00:11 -08:00
Pavel Machek
c255d844dd [PATCH] suspend-to-ram: allow video options to be set at runtime
Currently, acpi video options can only be set on kernel command line.  That's
little inflexible; I'd like userland s2ram application that just works, and
modifying kernel command line according to whitelist is not fun.  It is better
to just allow s2ram application to set video options just before suspend
(according to the whitelist).

This implements sysctl to allow setting suspend video options without reboot.

(akpm: Documentation updates for this new sysctl are pending..)

Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20 20:00:10 -08:00
Christoph Lameter
9b0f8b040a [PATCH] Terminate process that fails on a constrained allocation
Some allocations are restricted to a limited set of nodes (due to memory
policies or cpuset constraints).  If the page allocator is not able to find
enough memory then that does not mean that overall system memory is low.

In particular going postal and more or less randomly shooting at processes
is not likely going to help the situation but may just lead to suicide (the
whole system coming down).

It is better to signal to the process that no memory exists given the
constraints that the process (or the configuration of the process) has
placed on the allocation behavior.  The process may be killed but then the
sysadmin or developer can investigate the situation.  The solution is
similar to what we do when running out of hugepages.

This patch adds a check before we kill processes.  At that point
performance considerations do not matter much so we just scan the zonelist
and reconstruct a list of nodes.  If the list of nodes does not contain all
online nodes then this is a constrained allocation and we should kill the
current process.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20 20:00:09 -08:00
Tejun Heo
cc1887f3d8 [PATCH] libata: fix qc->n_elem == 0 case handling in ata_qc_next_sg
This patch makes ata_for_each_sg() start with pad_sgent when
qc->n_elem is zero.  Previously, ata_for_each_sg() unconditionally
started with qc->__sg, handling the first sg to fill_sg() routines
even when the entry was invalid.  And while at it, unwind ?: in
ata_qc_next_sg() into if statement.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-02-20 16:48:18 -05:00
Olaf Hering
0728a2f99e [PATCH] powerpc: remove duplicate exports
A few symbols are exported twice, remove them from ppc_ksyms.c
Remove users of sys_ctrler in arch/ppc/

WARNING: vmlinux: duplicate symbol '__delay' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__up' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__down' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__down_interruptible' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'sys_ctrler' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strncat' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strncmp' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strchr' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strrchr' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strnlen' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strpbrk' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'memscan' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strstr' previous definition was in vmlinux

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-20 10:44:31 +11:00
Al Viro
ad6b97fc92 [PATCH] iomap_copy fallout (m68k)
added __raw_writel(), sanitized include order in iomap_copy.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-18 16:30:40 -05:00
Al Viro
00fc00df9e [PATCH] m68k: restore disable_irq_nosync()
Patch claiming to remove enable_irq_nosync() had left it alive but killed
disable_irq_nosync() instead...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-18 15:59:06 -05:00
David Gibson
200a4552af [PATCH] powerpc: Fix accidentally-working typo in __pud_free_tlb
One of the parameters to the __pud_free_tlb() macro for powerpc is
incorrect (see patch) .  We get away with it by accident, because the one
place the macro is called, the second parameter is a variable named "pud".

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 13:59:27 -08:00
Heiko Carstens
255acee706 [PATCH] s390: additional_cpus parameter
Introduce additional_cpus command line option.  By default no additional cpu
can be attached to the system anymore.  Only the cpus present at IPL time can
be switched on/off.  If it is desired that additional cpus can be attached to
the system the maximum number of additional cpus needs to be specified with
this option.

This change is necessary in order to limit the waste of per_cpu data
structures.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 13:59:26 -08:00
Chuck Ebbert
cfe91f9ce2 [PATCH] i386: fix singlestepping though a syscall
Do not mask TIF_SINGLESTEP bit in _TIF_WORK_MASK. Masking this stopped
do_notify_resume() from being called when it should have been.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 08:55:21 -08:00
Paul Mackerras
726c14bf49 [PATCH] Provide an interface for getting the current tick length
This provides an interface for arch code to find out how many
nanoseconds are going to be added on to xtime by the next call to
do_timer.  The value returned is a fixed-point number in 52.12 format
in nanoseconds.  The reason for this format is that it gives the
full precision that the timekeeping code is using internally.

The motivation for this is to fix a problem that has arisen on 32-bit
powerpc in that the value returned by do_gettimeofday drifts apart
from xtime if NTP is being used.  PowerPC is now using a lockless
do_gettimeofday based on reading the timebase register and performing
some simple arithmetic.  (This method of getting the time is also
exported to userspace via the VDSO.)  However, the factor and offset
it uses were calculated based on the nominal tick length and weren't
being adjusted when NTP varied the tick length.

Note that 64-bit powerpc has had the lockless do_gettimeofday for a
long time now.  It also had an extremely hairy routine that got called
from the 32-bit compat routine for adjtimex, which adjusted the
factor and offset according to what it thought the timekeeping code
was going to do.  Not only was this only called if a 32-bit task did
adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
duplicating computations from kernel/timer.c and it wasn't clear that
it was (still) correct.

The simple solution is to ask the timekeeping code how long the
current jiffy will be on each timer interrupt, after calling
do_timer.  If this jiffy will be a different length from the last one,
we then need to compute new values for the factor and offset used in
the lockless do_gettimeofday.  In this way we can keep xtime and
do_gettimeofday in sync, even when NTP is varying the tick length.

Note that when adjtimex varies the tick length, it almost always
introduces the variation from the next tick on.  The only case I could
see where adjtimex would vary the length of the current tick is when
an old-style adjtime adjustment is being cancelled.  (It's not clear
to me why the adjustment has to be cancelled immediately rather than
from the next tick on.)  Thus I don't see any real need for a hook in
adjtimex; the rare case of an old-style adjustment being cancelled can
be fixed up at the next tick.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: john stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 08:24:29 -08:00
Linus Torvalds
759b650f54 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 2006-02-17 08:16:35 -08:00
Linus Torvalds
a5222049f3 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2006-02-17 08:13:11 -08:00
Andi Kleen
7fd67843b9 [PATCH] x86_64: Disable tsc when apicpmtimer is active
Otherwise it has no effect anyways.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 08:00:40 -08:00
Andi Kleen
a62eaf151d [PATCH] x86_64: Add boot option to disable randomized mappings and cleanup
AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.

Also I moved the variable to a more appropiate place and make
it independent from sysctl

And marked __read_mostly which it is.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 08:00:40 -08:00
Nicolas Pitre
d9db950cfa [ARM] 3339/1: ARM EABI: make unmuxed syscalls visible
Patch from Nicolas Pitre

With EABI the multiplex sys_ipc and sys_socketcall syscalls are
unavailable and their support code even removed from the compiled
kernel, and the new unmuxed syscalls must be used instead.

Make those syscall numbers visible.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-02-16 22:36:15 +00:00
Linus Torvalds
26d451b603 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 2006-02-16 12:47:44 -08:00
Russell King
7bbb794031 [ARM] Fix SMP initialisation oops
A change to the SMP initialisation caused the following oops:

 CPU1: Booted secondary processor
 CPU1: D VIPT write-back cache
 CPU1: I cache: 32768 bytes, associativity 4, 32 byte lines, 256 sets
 CPU1: D cache: 32768 bytes, associativity 4, 32 byte lines, 256 sets
 <7>Calibrating delay loop... 83.14 BogoMIPS (lpj=415744)
 <1>Unable to handle kernel NULL pointer dereference at virtual address 0000001c
 ...
 PC is at enqueue_task+0x1c/0x64
 LR is at activate_task+0xcc/0xe4

SMP initialisation now requires cpu_possible_map to be initialised in
setup_arch().  Move this from smp_prepare_cpus() to smp_init_cpus()
and call it from our setup_arch() if CONFIG_SMP is enabled.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-02-16 11:08:09 +00:00
Linus Torvalds
0b60afba53 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2006-02-15 19:56:33 -08:00
Roman Zippel
b2ee9dbfad [PATCH] hrtimer: fix multiple macro argument expansion
For two macros the arguments were expanded twice, change them to inline
functions to avoid it.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-15 15:32:22 -08:00
Michael S. Tsirkin
5f6164f309 [PATCH] add asm-generic/mman.h
Make new MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK consistent across all
arches.  The idea is to make it possible to use them portably even before
distros include them in libc headers.

Move common flags to asm-generic/mman.h

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-15 15:32:22 -08:00
Christian Trefzer
9f672004ab [PATCH] neofb: avoid resetting display config on unblank (v2)
There were two mistakes in the register-read-on-(un)blank approach.

- First, without proper register (un)locking the value read back will always
  be zero, and this is what I missed entirely until just now.  Due to this,
  the logic could not be verified at all and I tried some bogus checks which
  are completely stupid.

- Second, the LCD status bit will always be set to zero when the backlight
  has been turned off.  Reading the value back during unblank will disable the
  LCD unconditionally, regardless of the state it is supposed to be in, since
  we set it to zero beforehand.

So this is what we do now:

- create a new variable in struct neofb_par, and use that to determine
  whether to read back registers (initialized to true)

- before actually blanking the screen, read back the register to sense any
  possible change made through Fn key combo

- use proper neoUnlock() / neoLock() to actually read something

- every call to neofb_blank() determines if we read back next time: blanking
  disables readback, unblanking (FB_BLANK_UNBLANK) enables it

This should give us a nice and clean state machine.  Has been thoroughly
tested on a Dell Latitude CPiA / NM220 Chip docked to a C/Dock2 with attached
CRT in all possible combinations of LCD/CRT on/off.  I changed the config via
Fn key, let the console blank, unblanked by keypress - works flawlessly.

Signed-off-by: Christian Trefzer <ctrefzer@gmx.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-15 15:32:21 -08:00
Patrick McHardy
9c92d34864 [NETFILTER]: Don't invoke okfn in CONFIG_NETFILTER=n variant of nf_hook()
nf_hook() is supposed to call the netfilter hook and return control of the
packet back to the caller in case it may pass, the okfn is only used for
queueing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 15:18:19 -08:00
Tony Luck
72166c35f0 Pull fix-cpu-possible-map into release branch 2006-02-15 15:17:57 -08:00
Patrick McHardy
48d5cad87c [XFRM]: Fix SNAT-related crash in xfrm4_output_finish
When a packet matching an IPsec policy is SNATed so it doesn't match any
policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish
crash because of a NULL pointer dereference.

This patch directs these packets to the original output path instead. Since
the packets have already passed the POST_ROUTING hook, but need to start at
the beginning of the original output path which includes another
POST_ROUTING invocation, a flag is added to the IPCB to indicate that the
packet was rerouted and doesn't need to pass the POST_ROUTING hook again.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 15:10:22 -08:00
hawkes@sgi.com
defbb2c929 [IA64] ia64: simplify and fix udelay()
The original ia64 udelay() was simple, but flawed for platforms without
synchronized ITCs:  a preemption and migration to another CPU during the
while-loop likely resulted in too-early termination or very, very
lengthy looping.

The first fix (now in 2.6.15) broke the delay loop into smaller,
non-preemptible chunks, reenabling preemption between the chunks.  This
fix is flawed in that the total udelay is computed to be the sum of just
the non-premptible while-loop pieces, i.e., not counting the time spent
in the interim preemptible periods.  If an interrupt or a migration
occurs during one of these interim periods, then that time is invisible
and only serves to lengthen the effective udelay().

This new fix backs out the current flawed fix and returns to a simple
udelay(), fully preemptible and interruptible.  It implements two simple
alternative udelay() routines:  one a default generic version that uses
ia64_get_itc(), and the other an sn-specific version that uses that
platform's RTC.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-15 13:37:04 -08:00
Dean Nelson
4c2cd96696 [IA64-SGI] enforce proper ordering of callouts by XPC
Fix XPC so that it does not deliver any messages until the connected
callout has returned, as well as, prevent the disconnected callout to
occur before the disconnecting callout has returned.

Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-15 13:35:03 -08:00
Dean Roe
c2a4969ba1 [IA64-SGI] fix the size of __sn_cnodeid_to_nasid
The __sn_cnodeid_to_nasid array was incorrectly sized at MAX_NUMNODES.
On a large system, this array could overflow.  The following patch
corrects this by defining it to MAX_COMPACT_NODES.

Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-15 13:33:48 -08:00
Jes Sorensen
d3454344b3 [IA64] remove obsolete corporate address
Remove obsolete SGI address

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-15 13:25:37 -08:00
Jes Sorensen
8ed9b2c7a8 [IA64-SGI] sn2 minor fixes and cleanups
General SN2 code cleanup:
 - Do not initialize global variables to zero
 - Use kzalloc instead of kmalloc+memset
 - Check kmalloc return values
 - Do not obfuscate spin lock calls
 - Remove some unused code
 - Various formatting cleanups

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-15 13:24:45 -08:00
Oleg Nesterov
5ecfbae093 [PATCH] fix zap_thread's ptrace related problems
1. The tracee can go from ptrace_stop() to do_signal_stop()
   after __ptrace_unlink(p).

2. It is unsafe to __ptrace_unlink(p) while p->parent may wait
   for tasklist_lock in ptrace_detach().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-15 11:05:43 -08:00
Patrick McHardy
ee68cea2c2 [NETFILTER]: Fix xfrm lookup after SNAT
To find out if a packet needs to be handled by IPsec after SNAT, packets
are currently rerouted in POST_ROUTING and a new xfrm lookup is done. This
breaks SNAT of non-unicast packets to non-local addresses because the
packet is routed as incoming packet and no neighbour entry is bound to the
dst_entry. In general, it seems to be a bad idea to replace the dst_entry
after the packet was already sent to the output routine because its state
might not match what's expected.

This patch changes the xfrm lookup in POST_ROUTING to re-use the original
dst_entry without routing the packet again. This means no policy routing
can be used for transport mode transforms (which keep the original route)
when packets are SNATed to match the policy, but it looks like the best
we can do for now.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 01:34:23 -08:00
David Howells
28baebae73 [PATCH] FRV: Use virtual interrupt disablement
Make the FRV arch use virtual interrupt disablement because accesses to the
processor status register (PSR) are relatively slow and because we will
soon have the need to deal with multiple interrupt controls at the same
time (separate h/w and inter-core interrupts).

The way this is done is to dedicate one of the four integer condition code
registers (ICC2) to maintaining a virtual interrupt disablement state
whilst inside the kernel.  This uses the ICC2.Z flag (Zero) to indicate
whether the interrupts are virtually disabled and the ICC2.C flag (Carry)
to indicate whether the interrupts are physically disabled.

ICC2.Z is set to indicate interrupts are virtually disabled.  ICC2.C is set
to indicate interrupts are physically enabled.  Under normal running
conditions Z==0 and C==1.

Disabling interrupts with local_irq_disable() doesn't then actually
physically disable interrupts - it merely sets ICC2.Z to 1.  Should an
interrupt then happen, the exception prologue will note ICC2.Z is set and
branch out of line using one instruction (an unlikely BEQ).  Here it will
physically disable interrupts and clear ICC2.C.

When it comes time to enable interrupts (local_irq_enable()), this simply
clears the ICC2.Z flag and invokes a trap #2 if both Z and C flags are
clear (the HI integer condition).  This can be done with the TIHI
conditional trap instruction.

The trap then physically reenables interrupts and sets ICC2.C again.  Upon
returning the interrupt will be taken as interrupts will then be enabled.
Note that whilst processing the trap, the whole exceptions system is
disabled, and so an interrupt can't happen till it returns.

If no pending interrupt had happened, ICC2.C would still be set, the HI
condition would not be fulfilled, and no trap will happen.

Saving interrupts (local_irq_save) is simply a matter of pulling the ICC2.Z
flag out of the CCR register, shifting it down and masking it off.  This
gives a result of 0 if interrupts were enabled and 1 if they weren't.

Restoring interrupts (local_irq_restore) is then a matter of taking the
saved value mentioned previously and XOR'ing it against 1.  If it was one,
the result will be zero, and if it was zero the result will be non-zero.
This result is then used to affect the ICC2.Z flag directly (it is a
condition code flag after all).  An XOR instruction does not affect the
Carry flag, and so that bit of state is unchanged.  The two flags can then
be sampled to see if they're both zero using the trap (TIHI) as for the
unconditional reenablement (local_irq_enable).

This patch also:

 (1) Modifies the debugging stub (break.S) to handle single-stepping crossing
     into the trap #2 handler and into virtually disabled interrupts.

 (2) Removes superseded fixup pointers from the second instructions in the trap
     tables (there's no a separate fixup table for this).

 (3) Declares the trap #3 vector for use in .org directives in the trap table.

 (4) Moves irq_enter() and irq_exit() in do_IRQ() to avoid problems with
     virtual interrupt handling, and removes the duplicate code that has now
     been folded into irq_exit() (softirq and preemption handling).

 (5) Tells the compiler in the arch Makefile that ICC2 is now reserved.

 (6) Documents the in-kernel ABI, including the virtual interrupts.

 (7) Renames the old irq management functions to different names.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14 16:09:35 -08:00
David Howells
68f624fc8b [PATCH] FRV: Miscellaneous fixes
Make various alterations and fixes to the FRV arch:

 (1) Resyncs the FRV system call collection with the i386 arch.

 (2) Discards __iounmap() as it's not used.

 (3) Fixes the use of the SWAP/SWAPI instruction to get the arguments the right
     way around in atomic.h, and also to get the asm constraints correct.

 (4) Moves copy_to/from_user_page() to asm/cacheflush.h to be consistent with
     other archs.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14 16:09:35 -08:00
Chen, Kenneth W
d6077cb80c [PATCH] sched: revert "filter affine wakeups"
Revert commit d7102e95b7:

    [PATCH] sched: filter affine wakeups

Apparently caused more than 10% performance regression for aim7 benchmark.
The setup in use is 16-cpu HP rx8620, 64Gb of memory and 12 MSA1000s with 144
disks.  Each disk is 72Gb with a single ext3 filesystem (courtesy of HP, who
supplied benchmark results).

The problem is, for aim7, the wake-up pattern is random, but it still needs
load balancing action in the wake-up path to achieve best performance.  With
the above commit, lack of load balancing hurts that workload.

However, for workloads like database transaction processing, the requirement
is exactly opposite.  In the wake up path, best performance is achieved with
absolutely zero load balancing.  We simply wake up the process on the CPU that
it was previously run.  Worst performance is obtained when we do load
balancing at wake up.

There isn't an easy way to auto detect the workload characteristics.  Ingo's
earlier patch that detects idle CPU and decide whether to load balance or not
doesn't perform with aim7 either since all CPUs are busy (it causes even
bigger perf.  regression).

Revert commit d7102e95b7, which causes more
than 10% performance regression with aim7.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14 16:09:34 -08:00
Michael S. Tsirkin
f822566165 [PATCH] madvise MADV_DONTFORK/MADV_DOFORK
Currently, copy-on-write may change the physical address of a page even if the
user requested that the page is pinned in memory (either by mlock or by
get_user_pages).  This happens if the process forks meanwhile, and the parent
writes to that page.  As a result, the page is orphaned: in case of
get_user_pages, the application will never see any data hardware DMA's into
this page after the COW.  In case of mlock'd memory, the parent is not getting
the realtime/security benefits of mlock.

In particular, this affects the Infiniband modules which do DMA from and into
user pages all the time.

This patch adds madvise options to control whether memory range is inherited
across fork.  Useful e.g.  for when hardware is doing DMA from/into these
pages.  Could also be useful to an application wanting to speed up its forks
by cutting large areas out of consideration.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14 16:09:34 -08:00
James Bottomley
8b09fb3451 [PATCH] fix x86 topology export in sysfs for subarchitectures
The correct way to export hyperthreading based functions is to predicate
them on CONFIG_X86_HT.  Without this, the topology exporting patch breaks
the build on all non-PC x86 subarchitectures.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14 16:09:34 -08:00
Trond Myklebust
5ac5f9d1ce [PATCH] NLM: Fix the NLM_GRANTED callback checks
If 2 threads attached to the same process are blocking on different locks on
different files (maybe even on different servers) but have the same lock
arguments (i.e.  same offset+length - actually quite common, since most
processes try to lock the entire file) then the first GRANTED call that wakes
one up will also wake the other.

Currently when the NLM_GRANTED callback comes in, lockd walks the list of
blocked locks in search of a match to the lock that the NLM server has
granted.  Although it checks the lock pid, start and end, it fails to check
the filehandle and the server address.

By checking the filehandle and server IP address, we ensure that this only
happens if the locks truly are referencing the same file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14 16:09:34 -08:00
Mark Fasheh
7c8903f637 [PATCH] jbd: revert checkpoint list changes
This patch reverts commit f93ea411b7:
  [PATCH] jbd: split checkpoint lists

This broke journal_flush() for OCFS2, which is its method of being sure
that metadata is sent to disk for another node.

And two related commits 8d3c7fce2d and
43c3e6f5ab with the subjects:
  [PATCH] jbd: log_do_checkpoint fix
  [PATCH] jbd: remove_transaction fix

These seem to be incremental bugfixes on the original patch and as such are
no longer needed.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14 16:09:34 -08:00
Ashok Raj
a6b14fa6fd [IA64] Count disabled cpus as potential hot-pluggable CPUs
Have a facility to account for potentially hot-pluggable CPUs. ACPI doesnt
give a determinstic method to find hot-pluggable CPUs. Hence we use 2 methods
to assist.

- BIOS can mark potentially hot-pluggable CPUs as disabled in the MADT tables.
- User can specify the number of hot-pluggable CPUs via parameter
  additional_cpus=X

The option is enabled only if ACPI_CONFIG_HOTPLUG_CPU=y which enables the
physical hotplug option. Without which user can still use logical onlining
and offlining of CPUs by enabling CONFIG_HOTPLUG_CPU=y

Adds more bits to cpu_possible_map for potentially hot-pluggable cpus.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-14 15:37:58 -08:00
Maciej W. Rozycki
9cf8ff9644 [MIPS] Fix CPU type bitmasks for MIPS III, IV and V.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-02-14 19:13:25 +00:00