Commit Graph

13798 Commits

Author SHA1 Message Date
Linus Torvalds
9a9136e270 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
  sound: convert "sound" subdirectory to UTF-8
  MAINTAINERS: Add cxacru website/mailing list
  include files: convert "include" subdirectory to UTF-8
  general: convert "kernel" subdirectory to UTF-8
  documentation: convert the Documentation directory to UTF-8
  Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
  remove broken URLs from net drivers' output
  Magic number prefix consistency change to Documentation/magic-number.txt
  trivial: s/i_sem /i_mutex/
  fix file specification in comments
  drivers/base/platform.c: fix small typo in doc
  misc doc and kconfig typos
  Remove obsolete fat_cvf help text
  Fix occurrences of "the the "
  Fix minor typoes in kernel/module.c
  Kconfig: Remove reference to external mqueue library
  Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
  Correct comments in genrtc.c to refer to correct /proc file.
  Fix more "deprecated" spellos.
  Fix "deprecated" typoes.
  ...

Fix trivial comment conflict in kernel/relay.c.
2007-05-09 12:54:17 -07:00
Linus Torvalds
3960208f9c Merge branch 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32
* 'for-linus' of git://www.atmel.no/~hskinnemoen/linux/kernel/avr32:
  [AVR32] Wire up sys_utimensat
  [AVR32] Fix section mismatch .taglist -> .init.text
  [AVR32] Implement dma_{alloc,free}_writecombine()
  AVR32: Spinlock initializer cleanup
  [AVR32] Use correct config symbol when setting cpuflags
2007-05-09 12:50:25 -07:00
H. Peter Anvin
b0b73cb41d i386: msr.h: be paranoid about types and parentheses
When implementing things as macros, make sure we use typecasts and
parentheses where needed.  The macros as defined were vulnerable to
surreptitious promotion causing problems.

Avoid macros where practical; e.g. wrmsr() can be an inline instead.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:49:33 -07:00
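Note on b0b73cb41d: the message does not quote the macros it fixes, so the following is only a sketch of the general hazard it describes. The names COMBINE_SKETCH and combine_inline are hypothetical and do not come from msr.h. It shows a macro that casts and parenthesizes its arguments, and the inline-function alternative the message recommends.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fragile shape, shown only as a comment:
 *     #define COMBINE(lo, hi)  lo | hi << 32
 * A 32-bit 'hi' would be shifted in 32-bit arithmetic, and without
 * parentheses an argument such as 'a | b' regroups with its neighbours.
 */

/* Hypothetical macro: every argument is parenthesized and cast first. */
#define COMBINE_SKETCH(lo, hi) \
	(((uint64_t)(hi) << 32) | (uint64_t)(uint32_t)(lo))

/* Inline alternative: parameter types are checked and converted once. */
static inline uint64_t combine_inline(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

The inline version needs no defensive casts at the call site, which is presumably why the message prefers it where practical.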
H. Peter Anvin
29bd443377 i386: remove unused rdtsc() macro
All users of the two-part rdtsc() macro have already switched to using
rdtscl() or rdtscll().  Remove the now-obsolete macro.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:49:33 -07:00
NeilBrown
5b479c91da md: improve partition detection in md array
md currently uses ->media_changed to make sure rescan_partitions
is called on md arrays after they are assembled.

However that doesn't happen until the array is opened, which is later
than some people would like.

So use blkdev_ioctl to do the rescan as soon as the
array has been assembled.

This means we can remove all the ->change infrastructure as it was only used
to trigger a partition rescan.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
Haavard Skinnemoen
880169dd2e fbdev: add support for AVR32
Provide framebuffer page protection flags and definitions of
fb_readl/fb_writel for AVR32.

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
Antonino A. Daplas
5a87ede945 svgalib: move fb_get_caps to svgalib
Move fb_get_caps() method to svgalib.c as svga_get_caps() so it can be used by
s3fb, arkfb and vt8623fb.

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
David Rientjes
c3c117f06e i386 mmzone: use __maybe_unused
Replace automatic variable instances of __attribute__ ((unused)) with
__maybe_unused.

Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
David Rientjes
d16aaffa75 sh: dma: use __maybe_unused
There is no such thing as labeling a variable as __attribute__((used)).  Since
ts_shift is not referenced in inline assembly, we assume that we're simply
suppressing a warning here if the variable is declared but unreferenced.

Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:57 -07:00
David Rientjes
0d7ebbbc6e compiler: introduce __used and __maybe_unused
__used is defined to be __attribute__((unused)) for all pre-3.3 gcc
compilers to suppress warnings for unused functions because perhaps they
are referenced only in inline assembly.  It is defined to be
__attribute__((used)) for gcc 3.3 and later so that the code is still
emitted for such functions.

__maybe_unused is defined to be __attribute__((unused)) for both function
and variable use if it could possibly be unreferenced due to the evaluation
of preprocessor macros.  Function prototypes shall be marked with
__maybe_unused if the actual definition of the function is dependent on
preprocessor macros.

No update to compiler-intel.h is necessary because ICC supports both
__attribute__((used)) and __attribute__((unused)) as specified by the gcc
manual.

__attribute_used__ is deprecated and will be removed once all current
code is converted to using __used.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
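Note on 0d7ebbbc6e: the version split described in the message can be sketched in user space as below. This assumes a GCC-compatible compiler. EX_USED, EX_MAYBE_UNUSED and EX_DEBUG are hypothetical stand-ins chosen to avoid clashing with reserved names, not the kernel's own definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for __used: 'used' from gcc 3.3 on, else 'unused'. */
#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 3)
#define EX_USED __attribute__((used))
#else
#define EX_USED __attribute__((unused))
#endif

/* Hypothetical stand-in for __maybe_unused. */
#define EX_MAYBE_UNUSED __attribute__((unused))

/* No C code calls this; 'used' keeps it emitted, e.g. for inline asm. */
static int EX_USED only_referenced_from_asm(void)
{
	return 1;
}

/* 'verbose' is referenced only when EX_DEBUG is defined. */
int report_level(int debug)
{
	int EX_MAYBE_UNUSED verbose = debug;

	assert(debug == 0 || debug == 1);
#ifdef EX_DEBUG
	return verbose ? 2 : 1;
#else
	return 1;
#endif
}
```

Without the attribute, building with EX_DEBUG undefined would typically draw an unused-variable warning for verbose.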
Roman Zippel
f7e4217b00 rename thread_info to stack
This finally renames the thread_info field in task structure to stack, so that
the assumptions about this field are gone and archs have more freedom about
placing the thread_info structure.

Nonbroken archs which have a proper thread pointer can do the access to both
current thread and task structure via a single pointer.

It'll allow for a few more cleanups of the fork code, from which e.g.  ia64
could benefit.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
[akpm@linux-foundation.org: build fix]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Roman Zippel
c9f4f06d31 wrap access to thread_info
Recently a few direct accesses to the thread_info in the task structure snuck
back, so this wraps them with the appropriate wrapper.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Roman Zippel
e61a1c1c4f Allow arch to initialize arch field of the module structure
This will later allow an arch to add module specific information via linker
generated tables instead of poking directly in the module object structure.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Thomas Gleixner
b52f52a093 clocksource: fix resume logic
We need to make sure that the clocksources are resumed, when timekeeping is
resumed.  The current resume logic does not guarantee this.

Add a resume function pointer to the clocksource struct, so clocksource
drivers which need to reinitialize the clocksource can provide a resume
function.

Add a resume function, which calls the clocksource resume functions where
they are provided and resets the watchdog function, so a stable TSC can be
used across suspend/resume.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
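Note on b52f52a093: the optional resume pointer described in the message might look roughly like the sketch below. The struct layout and the names clocksource_sketch, resume_all_sketch and demo_resume are assumptions made for illustration; the watchdog reset the message mentions is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced clocksource: only the fields this sketch needs. */
struct clocksource_sketch {
	const char *name;
	void (*resume)(void);	/* optional; NULL when nothing to reinitialize */
};

/* Demo hook that records how often it ran. */
int demo_resume_count;

void demo_resume(void)
{
	demo_resume_count++;
}

/* Call each resume hook that is present; return how many were called. */
size_t resume_all_sketch(const struct clocksource_sketch *cs, size_t n)
{
	size_t called = 0;

	assert(cs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (cs[i].resume) {
			cs[i].resume();
			called++;
		}
	}
	return called;
}
```

Leaving the pointer NULL keeps drivers that need no reinitialization unchanged, which matches the "maybe available" wording of the message.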
Christoph Lameter
4037d45220 Move remote node draining out of slab allocators
Currently the slab allocators contain callbacks into the page allocator to
perform the draining of pagesets on remote nodes.  This requires SLUB to have
a whole subsystem in order to be compatible with SLAB.  Moving node draining
out of the slab allocators avoids a section of code in SLUB.

Move the node draining so that it is done when the vm statistics are updated.
At that point we are already touching all the cachelines with the pagesets of
a processor.

Add an expire counter there.  If we have to update per zone or global vm
statistics then assume that the pageset will require subsequent draining.

The expire counter will be decremented on each vm stats update pass until it
reaches zero.  Then we will drain one batch from the pageset.  The draining
will cause vm counter updates which will then cause another expiration until
the pcp is empty.  So we will drain a batch every 3 seconds.

Note that remote node draining is a somewhat esoteric feature that is required
on large NUMA systems because otherwise significant portions of system memory
can become trapped in pcp queues.  The number of pcp is determined by the
number of processors and nodes in a system.  A system with 4 processors and 2
nodes has 8 pcps which is okay.  But a system with 1024 processors and 512
nodes has 512k pcps with a high potential for large amount of memory being
caught in them.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
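Note on 4037d45220: the pcp arithmetic and the expire countdown from the message can be sketched as follows. The countdown length of 3 passes and the batch size of 16 pages are assumptions picked to match the "every 3 seconds" remark with one pass per second. All names are hypothetical, and the re-arm after a drain is a simplification of "the draining will cause vm counter updates".

```c
#include <assert.h>

#define EXPIRE_PASSES 3		/* assumption: stats passes between drains */
#define BATCH_PAGES 16		/* assumption: pages drained per batch */

/* One pcp per processor per node, as in the message's examples. */
unsigned long pcp_count_sketch(unsigned long cpus, unsigned long nodes)
{
	return cpus * nodes;
}

struct pageset_sketch {
	int expire;	/* passes left before the next drain; 0 means idle */
	int pages;	/* pages currently held in the pageset */
};

/* One vm stats pass; returns the number of pages drained in this pass. */
int stats_pass_sketch(struct pageset_sketch *p, int stats_changed)
{
	int drained;

	assert(p->expire >= 0 && p->pages >= 0);
	if (stats_changed) {
		/* Counters moved: assume the pageset needs draining later. */
		p->expire = EXPIRE_PASSES;
		return 0;
	}
	if (p->expire == 0 || --p->expire > 0)
		return 0;
	drained = p->pages < BATCH_PAGES ? p->pages : BATCH_PAGES;
	p->pages -= drained;
	if (drained > 0)
		p->expire = EXPIRE_PASSES;	/* draining changes counters again */
	return drained;
}
```

With these assumptions a pageset holding 40 pages empties in three drains spaced three passes apart.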
Christoph Lameter
d1187ed210 vmstat: use our own timer events
vmstat is currently using the cache reaper to periodically bring the
statistics up to date.  The cache reaper only exists in SLUB as a way to
provide compatibility with SLAB.  This patch removes the vmstat calls from the
slab allocators and provides its own handling.

The advantage is also that we can use a different frequency for the updates.
Refreshing vm stats is a pretty fast job so we can run this every second and
stagger this by only one tick.  This will lead to some overlap in large
systems.  For example, a system running at 250 HZ with 1024 processors will have 4 vm
updates occurring at once.

However, the vm stats update only accesses per node information.  It is only
necessary to stagger the vm statistics updates per processor in each node.  Vm
counter updates occurring on distant nodes will not cause cacheline
contention.

We could implement an alternate approach that runs the first processor on each
node at the second and then each of the other processors on a node on a
subsequent tick.  That may be useful to keep a large amount of the second free
of timer activity.  Maybe the timer folks will have some feedback on this one?

[jirislaby@gmail.com: add missing break]
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
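Note on d1187ed210: the overlap figure in the message follows from simple arithmetic. The sketch below assumes CPU n runs its once-per-second update n ticks into the second, so CPUs n and n + HZ land on the same tick. That offset rule and the function name are assumptions, not taken from the patch.

```c
#include <assert.h>

/*
 * Number of CPUs whose update lands on a given tick of the second,
 * assuming CPU n is offset by n ticks (modulo hz).
 */
unsigned int cpus_sharing_tick_sketch(unsigned int cpus, unsigned int hz,
				      unsigned int tick)
{
	assert(hz > 0 && tick < hz);
	return cpus / hz + (tick < cpus % hz ? 1u : 0u);
}
```

For 1024 processors at 250 HZ this gives four updates on most ticks and five on the first 24, in line with the "4 vm updates occurring at once" estimate.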
Rafael J. Wysocki
8bb7844286 Add suspend-related notifications for CPU hotplug
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress.  This
patch introduces such notifications and causes them to be used during
suspend and resume transitions.  It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Nate Diller
f37bc2712b fs: deprecate memclear_highpage_flush
Now that all the in-tree users are converted over to zero_user_page(),
deprecate the old memclear_highpage_flush() call.

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:56 -07:00
Nate Diller
01f2705daf fs: convert core functions to zero_user_page
It's very common for file systems to need to zero part or all of a page;
the simplest way is just to use kmap_atomic() and memset().  There's
actually a library function in include/linux/highmem.h that does exactly
that, but it's confusingly named memclear_highpage_flush(), which is
descriptive of *how* it does the work rather than what the *purpose* is.
So this patchset renames the function to zero_user_page(), and calls it
from the various places that currently open code it.

This first patch introduces the new function call, and converts all the
core kernel callsites, both the open-coded ones and the old
memclear_highpage_flush() ones.  Following this patch is a series of
conversions for each file system individually, per AKPM, and finally a
patch deprecating the old call.  The diffstat below shows the entire
patchset.

[akpm@linux-foundation.org: fix a few things]
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:55 -07:00
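Note on 01f2705daf: a user-space analogue of "zero part of a page" is sketched below. It is not the kernel helper: there is no kmap_atomic() or cache flushing here, and the page size, function name and signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Zero 'size' bytes starting 'offset' bytes into a page-sized buffer. */
void zero_page_range_sketch(unsigned char *page, size_t offset, size_t size)
{
	/* The range must lie entirely inside the page. */
	assert(offset <= SKETCH_PAGE_SIZE && size <= SKETCH_PAGE_SIZE - offset);
	memset(page + offset, 0, size);
}
```

Naming the helper after its purpose rather than its mechanism is the point the message makes about memclear_highpage_flush().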
Eric Dumazet
34f01cc1f5 FUTEX: new PRIVATE futexes
Analysis of current linux futex code :
  --------------------------------------

A central hash table futex_queues[] holds all contexts (futex_q) of waiting
threads.

Each futex_wait()/futex_wake() has to obtain a spinlock on a hash slot to
perform lookups or insert/deletion of a futex_q.

When a futex_wait() is done, calling thread has to :

1) - Obtain a read lock on mmap_sem to be able to validate the user pointer
     (calling find_vma()). This validation tells us if the futex uses
     an inode based store (mapped file), or mm based store (anonymous mem)

2) - compute a hash key

3) - Atomic increment of reference counter on an inode or a mm_struct

4) - lock part of futex_queues[] hash table

5) - perform the test on value of futex.
	(rollback if value != expected_value, returns EWOULDBLOCK)
	(various loops if test triggers mm faults)

6) queue the context into hash table, release the lock got in 4)

7) - release the read_lock on mmap_sem

   <block>

8) Eventually unqueue the context (but rarely, as this part  may be done
   by the futex_wake())

Futexes were designed to improve scalability but current implementation has
various problems :

- Central hashtable :

  This means scalability problems if many processes/threads want to use
  futexes at the same time.
  This means NUMA unbalance because this hashtable is located on one node.

- Using mmap_sem on every futex() syscall :

  Even if mmap_sem is a rw_semaphore, up_read()/down_read() are doing atomic
  ops on mmap_sem, dirtying cache line :
    - lot of cache line ping pongs on SMP configurations.

  mmap_sem is also extensively used by mm code (page faults, mmap()/munmap())
  Highly threaded processes might suffer from mmap_sem contention.

  mmap_sem is also used by oprofile code. Enabling oprofile hurts threaded
  programs because of contention on the mmap_sem cache line.

- Using an atomic_inc()/atomic_dec() on inode ref counter or mm ref counter:
  It's also a cache line ping pong on SMP. It also increases mmap_sem hold time
  because of cache misses.

Most of these scalability problems come from the fact that futexes are in
one global namespace.  As we use a central hash table, we must make sure
they are all using the same reference (given by the mm subsystem).  We
chose to force all futexes to be 'shared'.  This has a cost.

But fact is POSIX defined PRIVATE and SHARED, allowing clear separation,
and optimal performance if carefully implemented.  Time has come for linux
to have better threading performance.

The goal is to permit new futex commands to avoid :
 - Taking the mmap_sem semaphore, conflicting with other subsystems.
 - Modifying a ref_count on mm or an inode, still conflicting with mm or fs.

This is possible because, for one process using PTHREAD_PROCESS_PRIVATE
futexes, we only need to distinguish futexes by their virtual address, no
matter what the underlying mm storage is.

If glibc wants to exploit this new infrastructure, it should use new
_PRIVATE futex subcommands for PTHREAD_PROCESS_PRIVATE futexes.  And be
prepared to fallback on old subcommands for old kernels.  Using one global
variable with the FUTEX_PRIVATE_FLAG or 0 value should be OK.

PTHREAD_PROCESS_SHARED futexes should still use the old subcommands.

Compatibility with old applications is preserved, they still hit the
scalability problems, but new applications can fly :)

Note : the same SHARED futex (mapped on a file) can be used by old binaries
*and* new binaries, because both binaries will use the old subcommands.

Note : Vast majority of futexes should be using PROCESS_PRIVATE semantic,
as this is the default semantic. Almost all applications should benefit
from these changes (new kernel and updated libc).

Some bench results on a Pentium M 1.6 GHz (SMP kernel on a UP machine)

/* calling futex_wait(addr, value) with value != *addr */
433 cycles per futex(FUTEX_WAIT) call (mixing 2 futexes)
424 cycles per futex(FUTEX_WAIT) call (using one futex)
334 cycles per futex(FUTEX_WAIT_PRIVATE) call (mixing 2 futexes)
334 cycles per futex(FUTEX_WAIT_PRIVATE) call (using one futex)
For reference :
187 cycles per getppid() call
188 cycles per umask() call
181 cycles per ni_syscall() call

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Pierre Peiffer <pierre.peiffer@bull.net>
Cc: "Ulrich Drepper" <drepper@gmail.com>
Cc: "Nick Piggin" <nickpiggin@yahoo.com.au>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:55 -07:00
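Note on 34f01cc1f5: the fallback scheme suggested for libc ("one global variable with the FUTEX_PRIVATE_FLAG or 0 value") could look roughly like this sketch. It is Linux-only and assumes the constants from <linux/futex.h>. The function and variable names are hypothetical and this is not glibc's code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Either FUTEX_PRIVATE_FLAG or 0; cleared once if the kernel is too old. */
static int futex_private_flag_sketch = FUTEX_PRIVATE_FLAG;

/*
 * Wait while *addr == expected, using the private variant when available.
 * Returns 0 when woken, or -1 with errno set (EAGAIN if *addr != expected).
 */
int futex_wait_sketch(uint32_t *addr, uint32_t expected)
{
	long rc = syscall(SYS_futex, addr,
			  FUTEX_WAIT | futex_private_flag_sketch,
			  expected, NULL, NULL, 0);

	if (rc == -1 && errno == ENOSYS && futex_private_flag_sketch != 0) {
		/* Old kernel without _PRIVATE commands: use the shared op. */
		futex_private_flag_sketch = 0;
		rc = syscall(SYS_futex, addr, FUTEX_WAIT,
			     expected, NULL, NULL, 0);
	}
	return (int)rc;
}
```

Calling it with an expected value that differs from the word's contents returns immediately, which is the same case the benchmark in the message measures.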
Pierre Peiffer
d0aa7a70bf futex_requeue_pi optimization
This patch provides the futex_requeue_pi functionality, which allows some
threads waiting on a normal futex to be requeued on the wait-queue of a
PI-futex.

This provides an optimization, already used for (normal) futexes, to be used
with the PI-futexes.

This optimization is currently used by the glibc in pthread_cond_broadcast, when
using "normal" mutexes.  With futex_requeue_pi, it can be used with
PRIO_INHERIT mutexes too.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:55 -07:00
Pierre Peiffer
c19384b5b2 Make futex_wait() use an hrtimer for timeout
This patch modifies futex_wait() to use an hrtimer + schedule() in place of
schedule_timeout().

schedule_timeout() is tick based, therefore the timeout granularity is the
tick (1 ms, 4 ms or 10 ms depending on HZ).  By using a high resolution timer
for timeout wakeup, we can attain a much finer timeout granularity (in the
microsecond range).  This parallels what is already done for futex_lock_pi().

The timeout passed to the syscall is no longer converted to jiffies and is
therefore passed to do_futex() and futex_wait() as an absolute ktime_t
therefore keeping nanosecond resolution.

Also this removes the need to pass the nanoseconds timeout part to
futex_lock_pi() in val2.

In futex_wait(), if there is no timeout then a regular schedule() is
performed.  Otherwise, an hrtimer is fired before schedule() is called.

[akpm@linux-foundation.org: fix `make headers_check']
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:55 -07:00
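Note on c19384b5b2: the granularity figures in the message (1 ms, 4 ms or 10 ms depending on HZ) follow from the tick length. The sketch below is simplified arithmetic with hypothetical names, not the kernel's jiffies conversion, which may round differently.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Length of one scheduler tick in nanoseconds for a given HZ. */
uint64_t tick_ns_sketch(unsigned int hz)
{
	assert(hz > 0);
	return NSEC_PER_SEC_SKETCH / hz;
}

/* A tick-based timeout can only expire on a tick boundary: round up. */
uint64_t tick_rounded_timeout_ns_sketch(uint64_t timeout_ns, unsigned int hz)
{
	uint64_t tick = tick_ns_sketch(hz);

	return (timeout_ns + tick - 1) / tick * tick;
}
```

Under this model a 1.5 ms request becomes 4 ms at HZ=250, whereas a high-resolution timer can honor the requested value much more closely.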
Andrew Morton
f34c506b03 declare struct ktime
Some smarty went and inflicted ktime_t as a typedef upon us, so we cannot
forward declare it.

Create a new `union ktime', map ktime_t onto that.  Now we need to kill off
this ktime_t thing.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
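Note on f34c506b03: the forward-declaration problem is a general C property. A typedef of an untagged union cannot be named before its definition, while a tagged union can be declared as an incomplete type first. The sketch uses hypothetical names (union ktime_sketch, sketch_to_ns) and a single member rather than the real ktime layout.

```c
#include <assert.h>
#include <stddef.h>

/* A tagged union can be forward declared as an incomplete type... */
union ktime_sketch;

/* ...so headers can already declare functions that take pointers to it. */
long long sketch_to_ns(const union ktime_sketch *kt);

/* Full definition, which would normally live in a separate header. */
union ktime_sketch {
	long long tv64;
};

/* The typedef is then mapped onto the tagged union. */
typedef union ktime_sketch ktime_sketch_t;

long long sketch_to_ns(const union ktime_sketch *kt)
{
	assert(kt != NULL);
	return kt->tv64;
}
```

Code that only passes pointers around needs just the first declaration, which is what makes the tag useful in headers.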
Andrew Morton
b8522ead35 aio is unlikely
Stick an unlikely() around is_aio(): I assert that most IO is synchronous.

Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
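Note on b8522ead35: a branch hint of this kind is commonly built on the GCC/Clang builtin __builtin_expect. The sketch below assumes such a compiler. The macro name, the struct and the function are hypothetical and stand in for the real is_aio() check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical equivalent of unlikely(): hint that x is usually false. */
#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

struct request_sketch {
	int is_aio;
};

/* Returns 1 for the rare asynchronous path, 0 for the common one. */
int submit_sketch(const struct request_sketch *req)
{
	assert(req != NULL);
	if (unlikely_sketch(req->is_aio))
		return 1;	/* asynchronous: expected to be rare */
	return 0;		/* synchronous: expected fast path */
}
```

The hint only affects code layout and prediction; both branches behave the same as they would without it.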
Jeff Layton
cd123012d9 RPC: add wrapper for svc_reserve to account for checksum
When the kernel calls svc_reserve to downsize the expected size of an RPC
reply, it fails to account for the possibility of a checksum at the end of
the packet.  If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O
then you'll generally see messages similar to this in the server's ring
buffer:

RPC request reserved 164 but used 208

While I was never able to verify it, I suspect that this problem is also
the root cause of some oopses I've seen under these conditions:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726

This is probably also a problem for other sec= types and for NFSv4.  The
large reserved size for NFSv4 compound packets seems to generally paper
over the problem, however.

This patch adds a wrapper for svc_reserve that accounts for the possibility
of a checksum.  It also fixes up the appropriate callers of svc_reserve to
call the wrapper.  For now, it just uses a hardcoded value that I
determined via testing.  That value may need to be revised upward as things
change, or we may want to eventually add a new auth_op that attempts to
calculate this somehow.

Unfortunately, there doesn't seem to be a good way to reliably determine
the expected checksum length prior to actually calculating it, particularly
with schemes like spkm3.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
NeilBrown
7ac1bea550 knfsd: rename sk_defer_lock to sk_lock
Now that sk_defer_lock protects two different things, make the name more
generic.

Also don't bother with disabling _bh as the lock is only ever taken from
process context.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
Adrian Bunk
8842c9655b remove nfs4_acl_add_ace()
nfs4_acl_add_ace() can now be removed.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:54 -07:00
Oleg Nesterov
10ab825bde change kernel threads to ignore signals instead of blocking them
Currently kernel threads use sigprocmask(SIG_BLOCK) to protect against
signals.  This doesn't prevent the signal delivery, this only blocks
signal_wake_up().  Every "killall -33 kthreadd" means a "struct siginfo"
leak.

Change kthreadd_setup() to set all handlers to SIG_IGN instead of blocking
them (make a new helper ignore_signals() for that).  If the kernel thread
needs some signal, it should use allow_signal() anyway, and in that case it
should not use CLONE_SIGHAND.

Note that we can't change daemonize() (should die!) in the same way,
because it can be used along with CLONE_SIGHAND.  This means that
allow_signal() still should unblock the signal to work correctly with
daemonize()ed threads.

However, disallow_signal() doesn't block the signal any longer but ignores
it.

NOTE: with or without this patch the kernel threads are not protected from
handle_stop_signal(), this seems harmless, but not good.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:53 -07:00
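Note on 10ab825bde: the difference between blocking and ignoring a signal can be observed from user space. The sketch below is only an analogue using POSIX calls, not kernel-thread code. A blocked signal stays queued as pending, while a signal that is unblocked and set to SIG_IGN is discarded. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Raise SIGUSR1 while it is blocked; returns 1 if it is left pending. */
int pending_when_blocked(void)
{
	sigset_t set, old, pend;
	int pending;

	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	sigprocmask(SIG_BLOCK, &set, &old);
	raise(SIGUSR1);
	sigpending(&pend);
	pending = sigismember(&pend, SIGUSR1);

	/* Clean up: switching to SIG_IGN discards the pending instance. */
	signal(SIGUSR1, SIG_IGN);
	sigprocmask(SIG_SETMASK, &old, NULL);
	signal(SIGUSR1, SIG_DFL);
	return pending;
}

/* Raise SIGUSR1 while it is unblocked but ignored; returns 1 if pending. */
int pending_when_ignored(void)
{
	sigset_t set, old, pend;
	int pending;

	signal(SIGUSR1, SIG_IGN);
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	sigprocmask(SIG_UNBLOCK, &set, &old);
	raise(SIGUSR1);
	sigpending(&pend);
	pending = sigismember(&pend, SIGUSR1);

	sigprocmask(SIG_SETMASK, &old, NULL);
	signal(SIGUSR1, SIG_DFL);
	return pending;
}
```

The queued-but-never-delivered case in the first function corresponds to the leak the message describes for signals sent to kthreadd.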
Eric W. Biederman
73c279927f kthread: don't depend on work queues
Currently there is a circular reference between work queue initialization
and kthread initialization.  This prevents the kthread infrastructure from
initializing until after work queues have been initialized.

We want the properties of tasks created with kthread_create to be as close
as possible to the init_task and to not be contaminated by user processes.
The later we start our kthreadd that creates these tasks the harder it is
to avoid contamination from user processes and the more of a mess we have
to clean up because the defaults have changed on us.

So this patch modifies the kthread support to not use work queues but to
instead use a simple list of structures, and to have kthreadd start from
init_task immediately after our kernel thread that execs /sbin/init.

By being a true child of init_task we only have to change those process
settings that we want to have different from init_task, such as our process
name, the cpus that are allowed, blocking all signals and setting SIGCHLD
to SIG_IGN so that all of our children are reaped automatically.

By being a true child of init_task we also naturally get our ppid set to 0
and do not wind up as a child of PID == 1.  Ensuring that tasks generated
by kthread_create will not slow down the functioning of the wait family of
functions.

[akpm@linux-foundation.org: use interruptible sleeps]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:53 -07:00
Oleg Nesterov
28e53bddf8 unify flush_work/flush_work_keventd and rename it to cancel_work_sync
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq
(this was possible from the very beginning, I missed this).  So we can unify
flush_work_keventd and flush_work.

Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.

(akpm: this is why the earlier patches bypassed maintainers)

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:53 -07:00
Oleg Nesterov
23b2e5991a workqueue: kill NOAUTOREL works
We don't have any users, and it is not so trivial to use NOAUTOREL works
correctly.  It is better to simplify API.

Delete NOAUTOREL support and rename work_release to work_clear_pending to
avoid confusion.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:52 -07:00
Oleg Nesterov
1634c48f8b make cancel_rearming_delayed_work() work on any workqueue, not just keventd_wq
cancel_rearming_delayed_workqueue(wq, dwork) doesn't need the first
parameter.  We don't hang on un-queued dwork any longer, and work->data
doesn't change its type.  This means we can always figure out "wq" from
dwork when it is needed.

Remove this parameter, and rename the function to
cancel_rearming_delayed_work().  Re-create an inline "obsolete"
cancel_rearming_delayed_workqueue(wq) which just calls
cancel_rearming_delayed_work().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:52 -07:00
Oleg Nesterov
7097a87afe workqueue: kill run_scheduled_work()
Because it has no callers.

Actually, I think the whole idea of run_scheduled_work() was not right, not
good to mix "unqueue this work and execute its ->func()" in one function.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:52 -07:00
Gautham R Shenoy
baaca49f41 Define and use new events, CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE
This is an attempt to provide an alternate mechanism for postponing
a hotplug event instead of using a global mechanism like lock_cpu_hotplug.

The proposal is to add two new events namely CPU_LOCK_ACQUIRE and
CPU_LOCK_RELEASE. The notification for these two events would be sent
out before and after a cpu_hotplug event respectively.

During the CPU_LOCK_ACQUIRE event, a cpu-hotplug-aware subsystem is
supposed to acquire any per-subsystem hotcpu mutex (e.g. workqueue_mutex
in kernel/workqueue.c).

During the CPU_LOCK_RELEASE event, the cpu-hotplug-aware subsystem
is supposed to release the per-subsystem hotcpu mutex.

The reasons for defining new events as opposed to reusing the existing events
like CPU_UP_PREPARE/CPU_UP_FAILED/CPU_ONLINE for locking/unlocking of
per-subsystem hotcpu mutexes are as follows:

	- CPU_LOCK_ACQUIRE: All hotcpu mutexes are taken before subsystems
	start handling pre-hotplug events like CPU_UP_PREPARE/CPU_DOWN_PREPARE
	etc, thus ensuring a clean handling of these events.

	- CPU_LOCK_RELEASE: The hotcpu mutexes will be released only after
	all subsystems have handled post-hotplug events like CPU_DOWN_FAILED,
	CPU_DEAD, CPU_ONLINE etc, thereby ensuring that there are no subsequent
	clashes amongst the interdependent subsystems after a cpu hotplug.

This patch also uses __raw_notifier_call_chain in _cpu_up to take care
of the dependency between the two consecutive calls to
raw_notifier_call_chain.

[akpm@linux-foundation.org: fix a bug]
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:51 -07:00
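
The locking protocol from the CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE commit above might look like this from a subsystem's point of view. This is a single-threaded userspace sketch under stated assumptions: the enum values, the `toy_mutex` type and the `subsys_*` names are hypothetical, and only the event names come from the commit message.

```c
#include <assert.h>

/* Hypothetical event codes; the real values are defined by the kernel. */
enum cpu_event {
	CPU_LOCK_ACQUIRE,
	CPU_UP_PREPARE,
	CPU_ONLINE,
	CPU_LOCK_RELEASE
};

/* Single-threaded stand-in for a kernel mutex (assumption). */
struct toy_mutex { int locked; };

static void toy_mutex_lock(struct toy_mutex *m)
{
	assert(!m->locked);
	m->locked = 1;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	assert(m->locked);
	m->locked = 0;
}

static struct toy_mutex subsys_hotcpu_mutex;
static int subsys_online_cpus;

/* Per-subsystem hotcpu callback: lock first, unlock last. */
static void subsys_cpu_callback(enum cpu_event event)
{
	switch (event) {
	case CPU_LOCK_ACQUIRE:
		toy_mutex_lock(&subsys_hotcpu_mutex);
		break;
	case CPU_UP_PREPARE:
		/* pre-hotplug work runs with the mutex already held */
		assert(subsys_hotcpu_mutex.locked);
		break;
	case CPU_ONLINE:
		/* post-hotplug work still runs under the mutex */
		assert(subsys_hotcpu_mutex.locked);
		subsys_online_cpus++;
		break;
	case CPU_LOCK_RELEASE:
		toy_mutex_unlock(&subsys_hotcpu_mutex);
		break;
	}
}
```

Delivering the events in the order ACQUIRE, UP_PREPARE, ONLINE, RELEASE leaves the mutex unlocked and one CPU counted as online.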
Gautham R Shenoy
6f7cc11aa6 Extend notifier_call_chain to count nr_calls made
Since 2.6.18-something, the community has been struggling to provide a
clean and stable mechanism to postpone a cpu-hotplug event, as
lock_cpu_hotplug was badly broken.

This is another proposal towards solving that problem.  This one is along the
lines of the solution provided in kernel/workqueue.c

Instead of having a global mechanism like lock_cpu_hotplug, we allow the
subsystems to define their own per-subsystem hot cpu mutexes.  These would be
taken (released) wherever we are currently calling
lock_cpu_hotplug (unlock_cpu_hotplug).

Also, in the per-subsystem hotcpu callback function, we take this mutex before
we handle any pre-cpu-hotplug events and release it once we finish handling
the post-cpu-hotplug events.  A standard means for doing this has been
provided in [PATCH 2/4] and demonstrated in [PATCH 3/4].

The ordering of these per-subsystem mutexes might still prove to be a
problem, but hopefully lockdep should help us get out of that muddle.

The patch set to be applied against linux-2.6.19-rc5 is as follows:

[PATCH 1/4] :	Extend notifier_call_chain with an option to specify the
		number of notifications to be sent and also count the
		number of notifications actually sent.

[PATCH 2/4] :	Define events CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE
		and send out notifications for these in _cpu_up and
		_cpu_down. This would help us standardise the acquire and
		release of the subsystem locks in the hotcpu
		callback functions of these subsystems.

[PATCH 3/4] :	Eliminate lock_cpu_hotplug from kernel/sched.c.

[PATCH 4/4] :	In workqueue_cpu_callback function, acquire(release) the
		workqueue_mutex while handling
		CPU_LOCK_ACQUIRE(CPU_LOCK_RELEASE).

If the per-subsystem-locking approach survives the test of time, we can expect
a slow phasing out of lock_cpu_hotplug, which has not yet been eliminated in
these patches :)

This patch:

Provide notifier_call_chain with an option to call only a specified number of
notifiers and also record the number of calls to notifiers made.

The need for this enhancement was identified in the post entitled
"Slab - Eliminate lock_cpu_hotplug from slab"
(http://lkml.org/lkml/2006/10/28/92) by Ravikiran G Thirumalai and
Andrew Morton.

This patch adds two additional parameters to notifier_call_chain API namely
 - int nr_to_calls : Number of notifier_functions to be called.
 		     The don't care value is -1.

 - unsigned int *nr_calls : Records the total number of notifier_functions
			    called by notifier_call_chain. The don't care
			    value is NULL.

[michal.k.k.piotrowski@gmail.com: build fix]
Credit: Andrew Morton <akpm@osdl.org>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:51 -07:00
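
The two parameters added by the notifier_call_chain commit above can be illustrated with a small userspace model. This is a sketch, not the kernel implementation: the struct layout, the `SKETCH_NOTIFY_*` codes and the `count_event` callback are assumptions made for the example. Only the parameter names and their "don't care" values (-1 and NULL) are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; the kernel defines its own NOTIFY_* values. */
enum { SKETCH_NOTIFY_OK = 0, SKETCH_NOTIFY_STOP = 1 };

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long event, void *data);
	struct notifier_block *next;
};

/*
 * Call at most nr_to_calls notifiers (-1 means "no limit") and, if
 * nr_calls is not NULL, add the number of notifiers actually called.
 */
static int notifier_call_chain(struct notifier_block *head,
			       unsigned long event, void *data,
			       int nr_to_calls, unsigned int *nr_calls)
{
	int ret = SKETCH_NOTIFY_OK;
	struct notifier_block *nb;

	assert(nr_to_calls >= -1);
	for (nb = head; nb != NULL && nr_to_calls != 0; nb = nb->next) {
		ret = nb->notifier_call(nb, event, data);
		if (nr_calls != NULL)
			(*nr_calls)++;
		if (ret == SKETCH_NOTIFY_STOP)
			break;
		if (nr_to_calls > 0)
			nr_to_calls--;
	}
	return ret;
}

/* Example notifier: counts its invocations through the data pointer. */
static int count_event(struct notifier_block *nb,
		       unsigned long event, void *data)
{
	(void)nb;
	(void)event;
	(*(int *)data)++;
	return SKETCH_NOTIFY_OK;
}
```

A caller that records nr_calls on the way in can later pass that count as nr_to_calls, so that a failure notification reaches only the notifiers that saw the original event. That pairing is one plausible reading of the dependency between the two calls in _cpu_up mentioned in the series.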
Tom Zanussi
7c9cb38302 relay: use plain timer instead of delayed work
relay doesn't need to use schedule_delayed_work() for waking readers
when a simple timer will do.

Signed-off-by: Tom Zanussi <zanussi@comcast.net>
Cc: Satyam Sharma <satyam.sharma@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:51 -07:00
Andrew Morton
19a75d83ff kblockd: use flush_work
Switch the kblockd flushing from a global flush to a more specific
flush_work().

(akpm: bypassed maintainers, sorry.  There are other patches which depend on
this)

Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:51 -07:00
Oleg Nesterov
b89deed32c implement flush_work()
A basic problem with flush_scheduled_work() is that it blocks behind _all_
presently-queued works, rather than just the work which the caller wants to
flush.  If the caller holds some lock, and if one of the queued works happens
to want that lock as well then accidental deadlocks can occur.

One example of this is the phy layer: it wants to flush work while holding
rtnl_lock().  But if a linkwatch event happens to be queued, the phy code will
deadlock because the linkwatch callback function takes rtnl_lock.

So we implement a new function which will flush a *single* work - just the one
which the caller wants to free up.  Thus we avoid the accidental deadlocks
which can arise from unrelated subsystems' callbacks taking shared locks.

flush_work() non-blockingly dequeues the work_struct which we want to kill,
then it waits for its handler to complete on all CPUs.

Add ->current_work to the "struct cpu_workqueue_struct", it points to
currently running "struct work_struct". When flush_work(work) detects
->current_work == work, it inserts a barrier at the _head_ of ->worklist
(and thus right _after_ that work) and waits for completion. This means
that the next work fired on that CPU will be this barrier, or another
barrier queued by concurrent flush_work(), so the caller of flush_work()
will be woken before any "regular" work has a chance to run.

When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect
against CPU hotplug), CPU may go away. But in that case take_over_work() will
move a barrier we queued to another CPU, it will be fired sometime, and
wait_on_work() will be woken.

Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
take_over_work(), so cwq->thread should complete its ->worklist (and thus
the barrier), because currently we don't check kthread_should_stop() in
run_workqueue(). But even if we did, everything should be ok.

[akpm@osdl.org: cleanup]
[akpm@osdl.org: add flush_work_keventd() wrapper]
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:50 -07:00
Haavard Skinnemoen
e7498281d3 Use common cpu_is_xxx() macros on AT91 and AVR32
Several drivers shared between AT91 and AVR32 chips use cpu_is_xxx()
to handle CPU-specific differences. Currently, such code needs to be
inside #ifdef CONFIG_ARCH_AT91 because the macros don't exist on AVR32.

By defining the same macros on both AT91 and AVR32, these #ifdefs can
be eliminated. Since the macros will evaluate to a constant value for
CPUs that aren't supported by the current architecture, any code that
is only needed on AT91 will be optimized away on AVR32 and vice versa.

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Andrew Victor <andrew@sanpeople.com>
Cc: Nicolas Ferre <nicolas.ferre@rfo.atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:50 -07:00
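
The effect described in the cpu_is_xxx() commit above, where macros for unsupported CPUs collapse to constants so the compiler can drop the dead branch, can be sketched like this. Everything here is illustrative: `SKETCH_ARCH_AT91` is a made-up build switch, the two macro names are used only as examples of the cpu_is_xxx() pattern, and the FIFO depths are invented numbers.

```c
#include <assert.h>

/* Hypothetical build-time switch standing in for the kernel config symbol. */
#ifdef SKETCH_ARCH_AT91
#define cpu_is_at91rm9200()	(1)
#define cpu_is_at32ap7000()	(0)	/* constant: not supported here */
#else
#define cpu_is_at91rm9200()	(0)	/* constant: not supported here */
#define cpu_is_at32ap7000()	(1)
#endif

/* Shared driver code: no #ifdef needed around the CPU-specific branches. */
static unsigned int shared_driver_fifo_depth(void)
{
	assert(cpu_is_at91rm9200() != cpu_is_at32ap7000());

	if (cpu_is_at91rm9200())
		return 16;	/* made-up value; dead code on the other build */
	if (cpu_is_at32ap7000())
		return 32;	/* made-up value */
	return 0;
}
```

Because the condition is a compile-time constant on each build, an optimizing compiler is free to discard the branch that cannot be taken.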
Andrew Morton
18d8362d51 mutex_lock_interruptible(): add __must_check
It's not sane to use mutex_lock_interruptible() and to then ignore the result.

Ditto down_interruptible(), but I'm lazy.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:49 -07:00
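
What the __must_check annotation in the commit above buys can be shown in userspace with the GCC/Clang `warn_unused_result` attribute. This is a sketch under assumptions: `sketch_must_check`, `toy_mutex` and the `signal_pending` argument are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-in for the kernel's __must_check (GCC/Clang attribute). */
#define sketch_must_check __attribute__((warn_unused_result))

struct toy_mutex { int locked; };

/*
 * Returns 0 when the lock was taken, -EINTR when the wait was interrupted.
 * Ignoring the result would mean touching protected data without the lock.
 */
static sketch_must_check int
toy_mutex_lock_interruptible(struct toy_mutex *m, int signal_pending)
{
	if (signal_pending)
		return -EINTR;
	m->locked = 1;
	return 0;
}
```

With the attribute in place, a bare call such as `toy_mutex_lock_interruptible(&m, 0);` draws a compiler warning, while `if (toy_mutex_lock_interruptible(&m, 0)) return -EINTR;` compiles cleanly.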
Roland McGrath
55c0d1f83e Move sig_kernel_* et al macros to linux/signal.h
This patch moves the sig_kernel_* and related macros from kernel/signal.c
to linux/signal.h, and cleans them up slightly.  I need the sig_kernel_*
macros for default signal behavior in the utrace code, and want to avoid
duplication or overhead to share the knowledge.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:49 -07:00
James Bottomley
8813d1c00c mca: add integrated device bus matching
The MCA bus has a few "integrated" functions, which are effectively virtual
slots on the bus.  The problem is that these special functions don't have
dedicated pos IDs, so we have to manufacture ids for them outside the pos
space ...  and these ids can't be matched by the standard matching function,
so add a special registration that requests a list of pos ids or a particular
integrated function.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:49 -07:00
Fernando Luis Vazquez Cao
818563dcec Always ask the hardware to obtain hardware processor id - ia64
Always ask the hardware to determine the hardware processor id in both UP and
SMP kernels.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:49 -07:00
Fernando Luis Vazquez Cao
dd988528f4 Use the APIC to determine the hardware processor id - x86_64
hard_smp_processor_id used to be just a macro that hard-coded
hard_smp_processor_id to 0 in the non SMP case.  When booting non SMP kernels
on hardware where the boot ioapic id is not 0 this turns out to be a problem.
This happens frequently in the case of kdump and once in a great while in
the case of real hardware.

Use the APIC to determine the hardware processor id in both UP and SMP kernels
to fix this issue.

Notice that hard_smp_processor_id is only used by SMP code or by code that
works with apics so we do not need to handle the case when apics are not
present and hard_smp_processor_id should never be called there.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:48 -07:00
Fernando Luis Vazquez Cao
a36166c6ef Use the APIC to determine the hardware processor id - i386
hard_smp_processor_id used to be just a macro that hard-coded
hard_smp_processor_id to 0 in the non SMP case.  When booting non SMP kernels
on hardware where the boot ioapic id is not 0 this turns out to be a problem.
This happens frequently in the case of kdump and once in a great while in
the case of real hardware.

Use the APIC to determine the hardware processor id in both UP and SMP kernels
to fix this issue.

Notice that hard_smp_processor_id is only used by SMP code or by code that
works with apics so we do not need to handle the case when apics are not
present and hard_smp_processor_id should never be called there.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:48 -07:00
Fernando Luis Vazquez Cao
2f4dfe206a Remove hardcoding of hard_smp_processor_id on UP systems
With the advent of kdump, the assumption that the boot CPU when booting an UP
kernel is always the CPU with a particular hardware ID (often 0) (usually
referred to as BSP on some architectures) is not valid anymore.  The reason
being that the dump capture kernel boots on the crashed CPU (the CPU that
invoked crash_kexec), which may be or may not be that particular CPU.

Move definition of hard_smp_processor_id for the UP case to
architecture-specific code ("asm/smp.h") where it belongs, so that each
architecture can provide its own implementation.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:48 -07:00
Dave Gilbert
dd2a345f8f Display all possible partitions when the root filesystem failed to mount
Display all possible partitions when the root filesystem is not mounted.
This helps to track spell'o's and missing drivers.

Updated to work with newer kernels.

Example output:

VFS: Cannot open root device "foobar" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
0800    8388608 sda driver: sd
  0801     192748 sda1
  0802    8193150 sda2
0810    4194304 sdb driver: sd
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

[akpm@linux-foundation.org: cleanups, fix printk warnings]
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: Dave Gilbert <linux@treblig.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:48 -07:00
Jeff Dike
1e0cb0c3bf uml: fix build breakage
UML now needs required-features.h to build - an empty one suffices.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:48 -07:00
Rafael J. Wysocki
a3d25c275d PM: Separate hibernation code from suspend code
[ With Johannes Berg <johannes@sipsolutions.net> ]

Separate the hibernation (aka suspend to disk) code from the other suspend
code.  In particular:

 * Remove the definitions related to hibernation from include/linux/pm.h
 * Introduce struct hibernation_ops and a new hibernate() function to hibernate
   the system, defined in include/linux/suspend.h
 * Separate suspend code in kernel/power/main.c from hibernation-related code
   in kernel/power/disk.c and kernel/power/user.c (with the help of
   hibernation_ops)
 * Switch ACPI (the only user of pm_ops.pm_disk_mode) to hibernation_ops

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:48 -07:00
Christoph Lameter
8defab3377 FRV: Replace pgd management via slabs through quicklists
This is done in order to be able to run SLUB which expects no modifications
to its page structs.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:46 -07:00
Stephen Rothwell
97416ce82e Declare {compat_}sys_utimensat
This is needed before Powerpc can wire up the syscall.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:44 -07:00
Jon Burgess
87c3019d7b V4L/DVB (5592): DMA: Correctly free resources on error, sync PCI streamed data
I added saa7146_vmalloc_destroy_pgtable() which frees the resources
allocated by saa7146_vmalloc_build_pgtable() and updated the callers in
budget-core.c and av7110.c. I have also been through the updated
functions and updated the error paths to ensure they free all allocated
resources on error.
I also realised that there are other callers to saa7146_pgtable_free()
which did not have any sg DMA mapped so it seems wrong to add the
pci_unmap_sg() into that function. Instead I created
saa7146_vmalloc_destroy_pgtable() to do this.
Also included in this patch are the previous fixes for pci_unmap_sg()
and syncing the PCI streamed data to work with a SWIOTLB and match the
requirements documented in DMA-API.txt.

Signed-off-by: Jon Burgess <jburgess777@googlemail.com>
Signed-off-by: Oliver Endriss <o.endriss@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2007-05-09 10:12:42 -03:00
Adrian Bunk
32a1db4248 V4L/DVB (5591): Saa7146: proper prototype for saa7146_video_do_ioctl()
This patch adds a proper prototype for saa7146_video_do_ioctl() in
include/media/saa7146_vv.h.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Oliver Endriss <o.endriss@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2007-05-09 10:12:41 -03:00
Robert P. J. Day
42f209d3c9 [MTD] Delete allegedly obsolete "bank_size" field of mtd_info.
Delete the allegedly obsolete "bank_size" member of struct mtd_info.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-09 13:26:52 +01:00
Robert P. J. Day
36200b7600 [MTD] Remove unnecessary user space check from mtd.h.
Since the header file include/linux/mtd/mtd.h is not exported to user
space, remove the user space check and error.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-09 13:24:37 +01:00
Russell King
805f53f085 Merge branches 'armv7', 'at91', 'misc' and 'omap' into devel 2007-05-09 10:41:28 +01:00
Imre Deak
b7cc6d46b4 ARM: OMAP: FB sync with N800 tree (support for dynamic SRAM allocations)
- in addition to fixed FB regions - as passed by the bootloader -
  allow dynamic allocations
- do some more checking against overlapping / reserved regions
- move the FB specific parts out from sram.c to fb.c

Signed-off-by: Imre Deak <imre.deak@solidboot.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-09 10:39:14 +01:00
Kai Svahn
b01d96d653 ARM: OMAP: Sync framebuffer headers with N800 tree
This patch syncs framebuffer headers with N800 tree.

Signed-off-by: Kai Svahn <kai.svahn@nokia.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-09 10:39:03 +01:00
Tony Lindgren
39b8e69867 ARM: OMAP: Fix gpmc header
Fix gpmc header

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-09 10:37:25 +01:00
Hiroshi DOYU
340a614ac6 ARM: OMAP: Add mailbox support for IVA
This patch adds a generic mailbox interface for DSP and IVA
(Image Video Accelerator). This patch itself doesn't contain
any IVA driver.

Signed-off-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com>
Signed-off-by: Juha Yrjola <juha.yrjola@solidboot.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-09 10:37:10 +01:00
Catalin Marinas
065cf519c3 [ARM] armv7: add support for asid-tagged VIVT I-cache
ARMv7 can have VIPT, PIPT or ASID-tagged VIVT I-cache. This patch
adds the necessary invalidation of the I-cache when the ASID numbers
are re-used.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-09 09:50:23 +01:00
Manuel Lauss
fc467a2623 sh: SH7760 DMABRG support.
The DMABRG is a special DMA unit within the SH7760 which does data
transfers from main memory to Audio units and USB shared memory.
It has 3 IRQ lines which generate 10 events, which have to be masked,
unmasked and acked in a single 32bit register. It works independently
from the traditional SH DMAC, but blocks usage of DMAC channel 0.

This patch adds 2 functions to associate callbacks with DMABRG events
and initialization.

Signed-off-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 17:36:15 +09:00
Paul Mundt
57be2b484a sh: clockevent/clocksource/hrtimers/nohz TMU support.
This adds basic support for clockevents and clocksources,
presently only implemented for TMU-based systems (which
are the majority of SH-3 and SH-4 systems).

The old NO_IDLE_HZ implementation is also dropped completely,
the only users of this were on TMU-based systems anyways.

More work needs to be done to generalize the TMU handling,
in that the current implementation is rather tied to the
notion of TMU0 and TMU1 utilization.

Additionally, as more SH timers switch over to this scheme,
we'll be able to gut most of the remaining system timer
infrastructure that existed before.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 17:33:24 +09:00
Haavard Skinnemoen
47cc3e7804 [AVR32] Wire up sys_utimensat
Tested with a slightly hacked version of the test case included with
the original utimensat patch. All OK.

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
2007-05-09 10:23:11 +02:00
Haavard Skinnemoen
b3cfe0cb37 [AVR32] Fix section mismatch .taglist -> .init.text
Rename .taglist to .taglist.init to silence section mismatch warnings.
The .taglist.init section was already placed in the .init output
section along with .init.text, so the warning didn't indicate any real
problems.

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
2007-05-09 09:26:18 +02:00
John Anthony Kazos Jr
121e70b69a include files: convert "include" subdirectory to UTF-8
Convert the "include" subdirectory to UTF-8.

Signed-off-by: John Anthony Kazos Jr. <jakj@j-a-k-j.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:58:21 +02:00
Uwe Kleine-König
5886269962 fix file specification in comments
Many files include the filename at the beginning, several used a wrong one.

Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:58:16 +02:00
Michael Opdenacker
59c51591a0 Fix occurrences of "the the "
Signed-off-by: Michael Opdenacker <michael@free-electrons.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:57:56 +02:00
Hugh Dickins
517e22638c [POWERPC] Don't use SLAB/SLUB for PTE pages
The SLUB allocator relies on struct page fields first_page and slab,
overwritten by ptl when SPLIT_PTLOCK: so the SLUB allocator cannot then
be used for the lowest level of pagetable pages.  This was obstructing
SLUB on PowerPC, which uses kmem_caches for its pagetables.  So convert
its pte level to use normal gfp pages (whereas pmd, pud and 64k-page pgd
want partpages, so continue to use kmem_caches for pmd, pud and pgd).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:35:00 +10:00
Benjamin Herrenschmidt
f1fa74f4af [POWERPC] Spufs support for 64K LS mappings on 4K kernels
This adds an option to spufs when the kernel is configured for
4K page to give it the ability to use 64K pages for SPE local store
mappings.

Currently, we are optimistic and try order 4 allocations when creating
contexts. If that fails, the code will fallback to 4K automatically.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:35:00 +10:00
Benjamin Herrenschmidt
16c2d47623 [POWERPC] Add ability to 4K kernel to hash in 64K pages
This adds the ability for a kernel compiled with 4K page size
to have special slices containing 64K pages and hash the right type
of hash PTEs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:35:00 +10:00
Benjamin Herrenschmidt
d0f13e3c20 [POWERPC] Introduce address space "slices"
The basic issue is to be able to do what hugetlbfs does but with
different page sizes for some other special filesystems; more
specifically, my need is:

 - Huge pages

 - SPE local store mappings using 64K pages on a 4K base page size
kernel on Cell

 - Some special 4K segments in 64K-page kernels for mapping a dodgy
type of powerpc-specific infiniband hardware that requires 4K MMU
mappings for various reasons I won't explain here.

The main issues are:

 - To maintain/keep track of the page size per "segment" (as we can
only have one page size per segment on powerpc, which are 256MB
divisions of the address space).

 - To make sure special mappings stay within their allotted
"segments" (including MAP_FIXED crap)

 - To make sure everybody else doesn't mmap/brk/grow_stack into a
"segment" that is used for a special mapping

Some of the necessary mechanisms to handle that were present in the
hugetlbfs code, but mostly in ways not suitable for anything else.

The patch relies on some changes to the generic get_unmapped_area()
that just got merged.  It still hijacks hugetlb callbacks here or
there as the generic code hasn't been entirely cleaned up yet but
that shouldn't be a problem.

So what is a slice ?  Well, I re-used the mechanism used formerly by our
hugetlbfs implementation which divides the address space in
"meta-segments" which I called "slices".  The division is done using
256MB slices below 4G, and 1T slices above.  Thus the address space is
divided currently into 16 "low" slices and 16 "high" slices.  (Special
case: high slice 0 is the area between 4G and 1T).

Doing so simplifies significantly the tracking of segments and avoids
having to keep track of all the 256MB segments in the address space.

While I used the "concepts" of hugetlbfs, I mostly re-implemented
everything in a more generic way and "ported" hugetlbfs to it.

Slices can have an associated page size, which is encoded in the mmu
context and used by the SLB miss handler to set the segment sizes.  The
hash code currently doesn't care, it has a specific check for hugepages,
though I might add a mechanism to provide per-slice hash mapping
functions in the future.

The slice code provides a pair of "generic" get_unmapped_area() (bottomup
and topdown) functions that should work with any slice size.  There is
some trickiness here so I would appreciate it if people had a look at the
implementation of these and let me know if I got something wrong.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:35:00 +10:00
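
The slice layout described in the commit above (256MB slices below 4G, 1T slices above, 16 of each, with high slice 0 covering the area between 4G and 1T) can be turned into a small address-to-slice calculation. This is a userspace sketch: the `SKETCH_*` constants, `struct slice_id` and `addr_to_slice()` are names chosen for illustration and are not claimed to match the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LOW_SHIFT	28			/* 256MB low slices */
#define SKETCH_HIGH_SHIFT	40			/* 1T high slices */
#define SKETCH_LOW_TOP		(1ULL << 32)		/* 4G boundary */
#define SKETCH_NUM_HIGH		16ULL

struct slice_id {
	int is_high;		/* 0: low slice, 1: high slice */
	unsigned int index;	/* 0..15 within its group */
};

static struct slice_id addr_to_slice(uint64_t addr)
{
	struct slice_id id;

	/* 16 high slices of 1T each bound the address space at 16T. */
	assert(addr < (SKETCH_NUM_HIGH << SKETCH_HIGH_SHIFT));

	if (addr < SKETCH_LOW_TOP) {
		id.is_high = 0;
		id.index = (unsigned int)(addr >> SKETCH_LOW_SHIFT);
	} else {
		/* 4G..1T lands in high slice 0, the special case. */
		id.is_high = 1;
		id.index = (unsigned int)(addr >> SKETCH_HIGH_SHIFT);
	}
	return id;
}
```

Tracking 32 slices rather than every 256MB segment is what keeps the per-context bookkeeping small, as the commit message notes.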
Johannes Berg
940d67f6b9 [POWERPC] swsusp: Introduce register_nosave_region_late
This patch introduces a new register_nosave_region_late function that
can be called from initcalls when register_nosave_region can no longer
be used because it uses bootmem.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09 16:34:56 +10:00
Robert P. J. Day
1591275cb5 Fix "deprecated" typoes.
Fix remaining misspellings of "depreciated" to "deprecated."

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:18:01 +02:00
Robert P. J. Day
beb7dd86a1 Fix misspellings collected by members of KJ list.
Fix the misspellings of "propogate", "writting" and (oh, the shame
:-) "kenrel" in the source tree.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:14:03 +02:00
Geert Uytterhoeven
b590d2baf1 m68k: <asm/scatterlist.h> needs <linux/types.h>
The recent <linux/pci.h> cleanup uncovered that include/asm-m68k/scatterlist.h
needs to include <linux/types.h>

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 20:41:14 -07:00
David Howells
5616df204e FRV: Miscellaneous fixes
Miscellaneous fixes to bring FRV up to date:

 (1) Copy the new syscall numbers from i386 to asm-frv/unistd.h and fill out
     the syscall table in entry.S too.

 (2) Mark __frv_uart0 and __frv_uart1 __pminitdata rather than __initdata so
     that determine_clocks() can access them when CONFIG_PM=y.

 (3) Make arch/frv/mm/elf-fdpic.c include asm/mman.h so that MAP_FIXED is
     available (fixes commit 2fd3bebaad).

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 20:41:14 -07:00
Linus Torvalds
0c23664ee8 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Optimize fault kprobe handling just like powerpc.
  [SPARC]: Wire up utimensat syscall.
  [SPARC64]: Fix request_irq() ignored result warnings in PCI controller code.
  [SPARC64]: Kill asm-sparc64/pbm.h
  [ATYFB]: Fix sparc includes.
  [QLA2XXX]: Fix build on sparc.
  [SPARC64]: Removal of trivial pci_controller_info uses.
  [SPARC64]: Move index info pci_pbm_info.
  [SPARC64]: Move {setup,teardown}_msi_irq into pci_pbm_info.
  [SPARC64]: Move pci_ops into pci_pbm_info.
  [SPARC64] SBUS: Error interrupt registry cleanups.
  [SPARC64] PCI: Use root list of pbm's instead of pci_controller_info's
  [SPARC64] PCI: Kill PROM_PCIRNG_MAX and PROM_PCIIMAP_MAX.
  [SPARC64] PCI: Use common routine to fetch PBM properties.
2007-05-08 20:32:43 -07:00
Linus Torvalds
6ec129c3a2 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (58 commits)
  [SCSI] zfcp: clear boxed flag on unit reopen.
  [SCSI] zfcp: clear adapter failed flag if an fsf request times out.
  [SCSI] zfcp: rework request ID management.
  [SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI
  [SCSI] zfcp: Locking for req_no and req_seq_no
  [SCSI] zfcp: print S_ID and D_ID with 3 bytes
  [SCSI] ipr: Use PCI-E reset API for new ipr adapter
  [SCSI] qla2xxx: Update version number to 8.01.07-k7.
  [SCSI] qla2xxx: Add MSI support.
  [SCSI] qla2xxx: Correct pci_set_msi() usage semantics.
  [SCSI] qla2xxx: Attempt to stop firmware only if it had been previously executed.
  [SCSI] qla2xxx: Honor NVRAM port-down-retry-count settings.
  [SCSI] qla2xxx: Error-out during probe() if we're unable to complete HBA initialization.
  [SCSI] zfcp: Stop system after memory corruption
  [SCSI] mesh: cleanup variable usage in interrupt handler
  [SCSI] megaraid: replace yield() with cond_resched()
  [SCSI] megaraid: fix warnings when CONFIG_PROC_FS=n
  [SCSI] aacraid: correct SUN products to README
  [SCSI] aacraid: superfluous adapter reset for IBM 8 series ServeRAID controllers
  [SCSI] aacraid: kexec fix (reset interrupt handler)
  ...
2007-05-08 20:32:16 -07:00
Paul Mundt
b118ca572d sh: Convert to common die chain.
This went in immediately after SH added the die chain notifiers,
so move over to that instead..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 10:55:38 +09:00
Paul Mundt
21ec4c6453 sh: Wire up utimensat syscall.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 10:42:48 +09:00
Paul Mundt
074f98df05 sh: Add 32-bit opcode feature CPU flag.
Add a CPU flag for the CPUs that support 32-bit opcodes, which
gets passed down to userspace.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 01:35:01 +00:00
Paul Mundt
bd0799977c sh: Support for SH-2A 32-bit opcodes.
SH-2A supports both 16- and 32-bit instructions. Add a simple helper
for figuring out the instruction size in the places where there are
hardcoded 16-bit assumptions.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 01:35:01 +00:00
Paul Mundt
44530c696b sh: Always define TRAPA_BUG_OPCODE.
Previously this was only set when CONFIG_BUG=y. While we rely
on that for handle_BUG() dispatch, we still want to hand the
opcode off to the die chain notifier for determining the trap
value.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 01:35:01 +00:00
Paul Mundt
1039b9a9d8 sh: __GFP_REPEAT for pte allocations, too.
This got dropped in the quicklist conversion, add it back in..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 01:35:01 +00:00
Paul Mundt
5f8c9908f2 sh: generic quicklist support.
This moves SH over to the generic quicklists. As per x86_64,
we have special mappings for the PGDs, so these go on their
own list..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-05-09 01:35:00 +00:00
David S. Miller
127cda1e8c [SPARC64]: Optimize fault kprobe handling just like powerpc.
And eliminate DIE_GPF while we're at it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-08 18:25:14 -07:00
Roland Dreier
225c7b1fee IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters
Add an InfiniBand driver for Mellanox ConnectX adapters.  Because
these adapters can also be used as ethernet NICs and Fibre Channel
HBAs, the driver is split into two modules:

  mlx4_core: Handles low-level things like device initialization and
    processing firmware commands.  Also controls resource allocation
    so that the InfiniBand, ethernet and FC functions can share a
    device without stepping on each other.

  mlx4_ib: Handles InfiniBand-specific things; plugs into the
    InfiniBand midlayer.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:38 -07:00
Roland Dreier
1bf66a3042 IB: Put rlimit accounting struct in struct ib_umem
When memory pinned with ib_umem_get() is released, ib_umem_release()
needs to subtract the amount of memory being unpinned from
mm->locked_vm.  However, ib_umem_release() may be called with
mm->mmap_sem already held for writing if the memory is being released
as part of an munmap() call, so it is sometimes necessary to defer
this accounting into a workqueue.

However, the work struct used to defer this accounting is dynamically
allocated before it is queued, so there is the possibility of failing
that allocation.  If the allocation fails, then ib_umem_release has no
choice except to bail out and leave the process with a permanently
elevated locked_vm.

Fix this by allocating the structure to defer accounting as part of
the original struct ib_umem, so there's no possibility of failing a
later allocation if creating the struct ib_umem and pinning memory
succeeds.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
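Note on 1bf66a3042 (rlimit accounting in struct ib_umem): the idea of
embedding the deferred-work item in the object, so that release never has
to allocate, can be sketched in userspace C roughly as follows.  This is
only an illustration under stated assumptions: struct deferred_work,
struct umem_sketch, container_of_sketch() and the umem_*() helpers are
hypothetical stand-ins, not the kernel's work_struct or ib_umem API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's work item. */
struct deferred_work {
	void (*fn)(struct deferred_work *w);
};

/* Hypothetical, simplified view of a pinned memory region. */
struct umem_sketch {
	unsigned long npages;       /* pages pinned by this region */
	unsigned long *locked_vm;   /* counter to decrement on release */
	struct deferred_work work;  /* embedded: allocated with the region */
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Deferred accounting: runs later, needs no further allocation. */
static void umem_account(struct deferred_work *w)
{
	struct umem_sketch *u =
		container_of_sketch(w, struct umem_sketch, work);

	*u->locked_vm -= u->npages;
	free(u);
}

/* The only place that can fail with an allocation error. */
struct umem_sketch *umem_get(unsigned long *locked_vm, unsigned long npages)
{
	struct umem_sketch *u = malloc(sizeof(*u));

	if (!u)
		return NULL;
	u->npages = npages;
	u->locked_vm = locked_vm;
	u->work.fn = umem_account;
	*locked_vm += npages;
	return u;
}

/* Release hands back the embedded work item for the caller to queue. */
struct deferred_work *umem_release_deferred(struct umem_sketch *u)
{
	return &u->work;
}
```

Because the work item lives inside the region, a successful umem_get()
guarantees that the later accounting step can always be scheduled.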
Roland Dreier
f7c6a7b5d5 IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_umem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Jiri Kosina
66da876962 USB HID: report descriptor of Cypress USB barcode readers needs fixup
Certain versions of Cypress USB barcode readers (this problem is known to
happen at least with PIDs 0xde61 and 0xde64) have a report descriptor in which
the usage min and usage max tags are swapped. This causes the HID parser to
fail on the report descriptor of these devices, as it (wrongly) requires
allocating more usages than HID_MAX_USAGES.

Solve this by walking through the report descriptor for such devices and
swapping the usage min and usage max items (and their values) into proper order.

Reported-by: Bret Towe <magnade@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2007-05-09 02:52:51 +02:00
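Note on 66da876962 (Cypress report descriptor fixup): a minimal sketch of
such a walk, assuming both items use the one-byte-data short form (prefix
0x19 for Usage Minimum, 0x29 for Usage Maximum) and sit directly next to
each other.  The function name and the plain byte scan are assumptions
made for illustration; the scan does not parse items, so it could also
match data bytes that happen to look like the pattern.

```c
#include <stddef.h>

#define HID_USAGE_MIN 0x19  /* Usage Minimum, 1 byte of data */
#define HID_USAGE_MAX 0x29  /* Usage Maximum, 1 byte of data */

/*
 * Reorder "Usage Maximum x, Usage Minimum y" into
 * "Usage Minimum y, Usage Maximum x" in place.
 * Returns the number of pairs that were reordered.
 */
int swap_usage_min_max(unsigned char *rdesc, size_t rsize)
{
	int fixed = 0;
	size_t i;

	for (i = 0; i + 3 < rsize; i++) {
		if (rdesc[i] == HID_USAGE_MAX &&
		    rdesc[i + 2] == HID_USAGE_MIN) {
			unsigned char tmp = rdesc[i + 1];

			/* swap the tags ... */
			rdesc[i] = HID_USAGE_MIN;
			rdesc[i + 2] = HID_USAGE_MAX;
			/* ... and the values that belong to them */
			rdesc[i + 1] = rdesc[i + 3];
			rdesc[i + 3] = tmp;
			fixed++;
		}
	}
	return fixed;
}
```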
David S. Miller
6c1142602c [SPARC]: Wire up utimensat syscall.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-08 17:50:14 -07:00
David S. Miller
c57c2ffb15 [SPARC64]: Kill asm-sparc64/pbm.h
Everything it contains can be hidden in pci_impl.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-08 16:43:08 -07:00
David S. Miller
6c108f1299 [SPARC64]: Move index info pci_pbm_info.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-08 16:41:40 -07:00
David S. Miller
e9870c4c0a [SPARC64]: Move {setup,teardown}_msi_irq into pci_pbm_info.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-08 16:41:36 -07:00
David S. Miller
f1cd8de2c9 [SPARC64]: Move pci_ops into pci_pbm_info.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-08 16:41:32 -07:00
David S. Miller
34768bc832 [SPARC64] PCI: Use root list of pbm's instead of pci_controller_info's
The idea is to move more and more things into the pbm,
with the eventual goal of eliminating the pci_controller_info
entirely as there really isn't any need for it.

This stage of the transformations requires some reworking of
the PCI error interrupt handling.

It might be tricky to get rid of the pci_controller_info parenting for
a few reasons:

1) When we get an uncorrectable or correctable error we want
   to interrogate the IOMMU and streaming cache of both
   PBMs for error status.  These errors come from the UPA
   front-end which is shared between the two PBM PCI bus
   segments.

   Historically speaking this is why I chose the data structure
   hierarchy of pci_controller_info-->pci_pbm_info

2) The probing does a portid/devhandle match to look for the
   'other' pbm, but this is entirely an artifact and can be
   eliminated trivially.

What we could do to solve #1 is to have a "buddy" pointer from one pbm
to another.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-08 16:41:24 -07:00
David S. Miller
5a4a3e592d [SPARC64] PCI: Kill PROM_PCIRNG_MAX and PROM_PCIIMAP_MAX.
They are totally unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-08 16:41:20 -07:00
David S. Miller
cfa0652c4e [SPARC64] PCI: Use common routine to fetch PBM properties.
Namely bus-range and ino-bitmap.

This allows us also to eliminate pci_controller_info's
pci_{first,last}_busno fields as only the pbm ones are
used now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-08 16:41:12 -07:00
Alexey Kuznetsov
e180583b85 [IA64] wire up pselect, ppoll
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-05-08 15:57:59 -07:00
Catalin Marinas
56163fcf19 [ARM] armv7: add dedicated ARMv7 barrier instructions
Starting with ARMv7, there are dedicated instructions for the ISB, DSB
and DMB barriers and there is no need to execute them as CP15
operations.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-08 22:55:57 +01:00
Catalin Marinas
aaf83acba9 [ARM] armv7: Add ARMv7 cacheid macros
This patch renames the old __cacheid_* macros to __cacheid_*_prev7 and adds
support for the new format.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-08 22:55:54 +01:00
Catalin Marinas
bbe888864e [ARM] armv7: add support for ARMv7 cores.
This patch adds support for the ARMv7 cores.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-08 22:55:53 +01:00
Alexey Dobriyan
4a177cbf84 [IA64] Add TIF_RESTORE_SIGMASK
Preparation for pselect and ppoll.
ia32 compat code not tested. :-(

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-05-08 14:51:59 -07:00
Jack Steiner
3be44b9cc3 [IA64] Optional method to purge the TLB on SN systems
This patch adds an optional method for purging the TLB on SN IA64 systems.
The change should not affect any non-SN system.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-05-08 14:50:43 -07:00
Alex Dubov
055b822414 disable socket power in adapter driver instead of media one
Socket power must be fully controlled by adapter driver. This also prevents
unnecessary power-off of the socket when media driver is unloaded, yet
media remains in the socket.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2007-05-08 22:41:47 +02:00
Andrew Victor
b85fe92766 [ARM] 4363/1: AT91: Remove legacy PIO definitions
Remove the legacy PIO pin definitions for the AT91 processors.
The standard (and portable between the different AT91 processors) method
is to use the AT91_PIN_* defines and the GPIO API.

Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-08 20:45:21 +01:00
Andrew Victor
8eef3896b3 [ARM] 4361/1: AT91: Build error
Fix a build error due to a missing semicolon.

Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-08 20:45:14 +01:00
Tony Lindgren
f4e4c324a5 ARM: OMAP: Sync headers with linux-omap
This patch syncs omap specific headers with linux-omap.
Most of the changes are needed because of bitrot caused by
driver changes in the linux-omap tree. Integrating this
is needed for adding support for various omap drivers.

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-08 20:36:25 +01:00
Imre Deak
771af222eb ARM: OMAP: FB: add controller platform data
Add controller platform data

Signed-off-by: Imre Deak <imre.deak@solidboot.com>
Signed-off-by: Juha Yrjola <juha.yrjola@solidboot.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-08 20:35:08 +01:00
Linus Torvalds
36f021b579 Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: (32 commits)
  Use menuconfig objects - hwmon
  hwmon/smsc47b397: Use dynamic sysfs callbacks
  hwmon/smsc47b397: Convert to a platform driver
  hwmon/w83781d: Deprecate W83627HF support
  hwmon/w83781d: Use dynamic sysfs callbacks
  hwmon/w83781d: Be less i2c_client-centric
  hwmon/w83781d: Clean up conversion macros
  hwmon/w83781d: No longer use i2c-isa
  hwmon/ams: Do not print error on systems without apple motion sensor
  hwmon/ams: Fix I2C read retry logic
  hwmon: New AD7416, AD7417 and AD7418 driver
  hwmon/coretemp: Add documentation
  hwmon: New coretemp driver
  i386: Use functions from library in msr driver
  i386: Add safe variants of rdmsr_on_cpu and wrmsr_on_cpu
  hwmon/lm75: Use dynamic sysfs callbacks
  hwmon/lm78: Use dynamic sysfs callbacks
  hwmon/lm78: Be less i2c_client-centric
  hwmon/lm78: No longer use i2c-isa
  hwmon: New max6650 driver
  ...
2007-05-08 12:07:28 -07:00
Russell King
8678c1f042 [ARM] Fix ASID version switch
Close a hole in the ASID version switch, particularly the following
scenario:

CPU0 MM PID			CPU1 MM PID
	idle
				  A	pid(A)
				  A	idle(lazy tlb)
		* new asid version triggered by B *
  B	pid(B)
  A	pid(A)
		* MM A gets new asid version *
  A	idle(lazy tlb)
				  A	pid(A)
		* CPU1 doesn't see the new ASID *

The result is that CPU1 continues running with the hardware set
for the original (stale) ASID value, but mm->context.id contains
the new ASID value.  The result is that the next MM fault on CPU1
updates the page table entries, but flush_tlb_page() fails due to
wrong ASID.

There is a related case where a threaded application is allocated
a new ASID on one CPU while another of its threads is running on
some different CPU.  This scenario is not fixed by this commit.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-05-08 20:03:09 +01:00
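Note on 8678c1f042 (ASID version switch): the scenario relies on an ASID
value that carries a version in its upper bits.  The sketch below only
illustrates that versioning scheme, not the fix itself.  ASID_BITS = 8,
the reserved ASID 0 and both helper names are assumptions made for the
example.

```c
#include <stdint.h>

#define ASID_BITS 8u                          /* assumed hardware ASID width */
#define ASID_MASK ((1u << ASID_BITS) - 1u)

/* A context is stale when its version (upper bits) differs. */
static inline int asid_version_stale(uint32_t context_id, uint32_t last_asid)
{
	return ((context_id ^ last_asid) >> ASID_BITS) != 0;
}

/*
 * Hand out the next ASID.  When the low bits wrap, the version in the
 * upper bits is bumped; hardware ASID 0 is treated as reserved here.
 */
uint32_t asid_alloc(uint32_t *last_asid)
{
	uint32_t asid = ++*last_asid;

	if ((asid & ASID_MASK) == 0)
		asid = ++*last_asid;
	return asid;
}
```

In the table above, CPU1 keeps running with the hardware ASID of the old
version while mm->context.id already holds a value from the new one.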
Linus Torvalds
4750def52c Merge branch 'reset-seq' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'reset-seq' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  [libata reset-seq] build and merge fixes
  libata: reimplement reset sequencing
  libata: improve ata_std_prereset()
  libata: improve 0xff status handling
  libata: add deadline support to prereset and reset methods
2007-05-08 11:58:20 -07:00
Linus Torvalds
9028780a3e Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (40 commits)
  [netdrvr] atl1: fix build
  pasemi_mac: Use local-mac-address instead of mac-address if available
  pasemi_mac: PHY support
  pasemi_mac: Add msglevel support and "debug" module param
  pasemi_mac: Logic cleanup / rx performance improvements
  pasemi_mac: Minor cleanup / define fixes
  pasemi_mac: Add SKB reuse / copy-break
  pasemi_mac: Timer and interrupt fixes
  pasemi_mac: Abstract and fix up interrupt restart routines
  pasemi_mac: Move the IRQ mapping from the PCI layer to the driver
  tc35815: Remove unnecessary skb->dev assignment
  drivers/net/dm9000: Convert to generic boolean
  AT91RM9200 Ethernet: Fix multicast addressing
  AT91RM9200 Ethernet: Support additional PHYs
  PCMCIA-NETDEV : xirc2ps_cs: bugfix of multicast code
  sky2: re-enable 88E8056 for most motherboards
  MIPS: Drop unnecessary CONFIG_ISA from RBTX49XX
  ne: MIPS: Use platform_driver for ne on RBTX49XX
  ne: Add NEEDS_PORTLIST to control ISA auto-probe
  ne: Misc fixes for platform driver.
  ...

Fix conflict in drivers/net/pasemi_mac.c (get_property() got renamed to
of_get_property()) manually.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:57:17 -07:00
Linus Torvalds
393bfca19e Merge master.kernel.org:/pub/scm/linux/kernel/git/dtor/input
* master.kernel.org:/pub/scm/linux/kernel/git/dtor/input:
  Input: move USB miscellaneous devices under drivers/input/misc
  Input: move USB mice under drivers/input/mouse
  Input: move USB gamepads under drivers/input/joystick
  Input: move USB touchscreens under drivers/input/touchscreen
  Input: move USB tablets under drivers/input/tablet
  Input: i8042 - fix AUX port detection with some chips
  Input: aaed2000_kbd - convert to use polldev library
  Input: drivers/usb/input - usb_buffer_free() cleanup
  Input: synaptics - don't complain about failed resets
  Input: pull input.h into uinpit.h
  Input: drivers/usb/input - fix sparse warnings (signedness)
  Input: evdev - fix some sparse warnings (signedness, shadowing)
  Input: drivers/joystick - fix various sparse warnings
  Input: force feedback - make sure effect is present before playing
2007-05-08 11:51:43 -07:00
Linus Torvalds
df6d3916f3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (77 commits)
  [POWERPC] Abolish powerpc_flash_init()
  [POWERPC] Early serial debug support for PPC44x
  [POWERPC] Support for the Ebony 440GP reference board in arch/powerpc
  [POWERPC] Add device tree for Ebony
  [POWERPC] Add powerpc/platforms/44x, disable platforms/4xx for now
  [POWERPC] MPIC U3/U4 MSI backend
  [POWERPC] MPIC MSI allocator
  [POWERPC] Enable MSI mappings for MPIC
  [POWERPC] Tell Phyp we support MSI
  [POWERPC] RTAS MSI implementation
  [POWERPC] PowerPC MSI infrastructure
  [POWERPC] Rip out the existing powerpc msi stubs
  [POWERPC] Remove use of 4level-fixup.h for ppc32
  [POWERPC] Add powerpc PCI-E reset API implementation
  [POWERPC] Holly bootwrapper
  [POWERPC] Holly DTS
  [POWERPC] Holly defconfig
  [POWERPC] Add support for 750CL Holly board
  [POWERPC] Generalize tsi108 PCI setup
  [POWERPC] Generalize tsi108 PHY types
  ...

Fixed conflict in include/asm-powerpc/kdebug.h manually

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:50:19 -07:00
Ondrej Zajicek
34ed25f50b s3fb: updates
Move s3fb_get_tilemax to svgalib.c as svga_get_tilemax, because it reports
limitation of other code from svgalib (svga_settile, svga_tilecopy, ...)

Limit font width to 8 pixels in 4 bpp mode.

Signed-off-by: Ondrej Zajicek <santiago@crfreenet.org>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:33 -07:00
Matthias Kaehlcke
c831c338f0 use mutex instead of semaphore in virtual console driver
The virtual console driver uses a semaphore as mutex.  Use the mutex API
instead of the (binary) semaphore.

Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:33 -07:00
Ville Syrjala
159dde9369 atyfb: halve XCLK with Mobility and 32bit memory
Laptops with Rage Mobility and 32bit memory interface seem to require halved
XCLK to operate correctly.

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:32 -07:00
Antonino A. Daplas
38a3dc5185 fbdev: fbcon: check if mode can handle new screen
Check if the mode can properly display the screen.  This will be needed by
drivers where the capability is not constant with each mode.  The function
fb_set_var() will query fbcon for its requirement, then it will query the
driver (via a new hook fb_get_caps()) for its capability.  If the driver's
capability cannot handle fbcon's requirement, then fb_set_var() will fail.

For example, if a particular driver supports 2 modes where:

mode1 = can only display 8x16 bitmaps
mode2 = can display any bitmap

then if current mode = mode2 and current font = 12x22

fbset <mode1> /* mode1 cannot handle 12x22 */
fbset will fail

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:32 -07:00
Krzysztof Helt
e5d809d774 pm2fb: Permedia 2V memory clock setting
Permedia 2V uses its own registers to set a memory clock. The
patch adds these registers and uses them in the set_memclock()
function.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:32 -07:00
Antonino A. Daplas
11f11d522f fbdev: add tile operation to get the maximum length of the map
Add a tile method, fb_get_tilemax(), that returns the maximum length of
the tile map (or font map).  This is needed by s3fb which can only handle
256 characters.

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:31 -07:00
Antonino A. Daplas
2d2699d984 fbcon: font setting should check limitation of driver
fbcon_set_font() will now check if the new font dimensions can be drawn by the
driver (by checking pixmap.blit_x and blit_y).  Similarly, add 2 new
parameters to get_default_font(), font_w and font_h, to further aid in the
font selection process.

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:31 -07:00
Antonino A. Daplas
bf26ad72a6 fbdev: advertise limitation of drawing engine
A few drivers are not capable of blitting rectangles of any dimension.
vga16fb can only blit 8-pixel wide rectangles, while s3fb (in tileblitting
mode) can only blit 8x16 rectangles.  For example, loading a 12x22 font in
vga16fb will result in a corrupt display.

Advertise this limitation/capability in info->pixmap.blit_x and blit_y.  These
fields are 32-bit arrays (font max is 32x32 only), i.e., if bit 7 is set, then
a width/height of 7+1 is supported.

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:30 -07:00
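Note on bf26ad72a6 (blit_x/blit_y): the bit encoding described above, where
bit n-1 set means a width or height of n is supported, can be checked
roughly like this.  Both helper names are hypothetical, and the example
masks in the usage note are assumptions rather than values taken from any
driver.

```c
#include <stdint.h>

/* Bit (dim - 1) set in the mask means dimension dim (1..32) is supported. */
static inline int blit_dim_supported(uint32_t mask, unsigned int dim)
{
	if (dim < 1 || dim > 32)
		return 0;
	return (int)((mask >> (dim - 1)) & 1u);
}

/* A font fits only if both its width and its height are advertised. */
int font_fits(uint32_t blit_x, uint32_t blit_y,
	      unsigned int width, unsigned int height)
{
	return blit_dim_supported(blit_x, width) &&
	       blit_dim_supported(blit_y, height);
}
```

For a driver that can only blit 8-pixel-wide rectangles of any height,
blit_x would be 1 << 7 and blit_y all ones: an 8x16 font passes the check
and a 12x22 font does not.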
Antonino A. Daplas
09aaf268eb fbdev: add fb_read/fb_write functions for framebuffers in system RAM
The functions fb_read() and fb_write() in fbmem.c assume that the framebuffer
is in IO memory.  However, we have 3 drivers (hecubafb, arcfb, and vfb)
where the framebuffer is allocated from system RAM (via vmalloc). Using
__raw_read/__raw_write (fb_readl/fb_writel) for these drivers is
illegal, especially on other platforms.

Create file read and write methods for these types of drivers.  These are
named fb_sys_read() and fb_sys_write().

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:30 -07:00
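Note on 09aaf268eb (fb_sys_read()/fb_sys_write()): when the framebuffer
lives in ordinary RAM, a read is essentially a bounds-checked memory copy.
The userspace sketch below assumes a hypothetical fb_sys_read_sketch()
with a simplified signature; the real kernel methods copy to and from
user space and take different arguments.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to count bytes out of a RAM-backed framebuffer, starting at
 * *ppos.  The request is clamped to the end of the buffer, *ppos is
 * advanced, and the number of bytes actually copied is returned.
 */
size_t fb_sys_read_sketch(const unsigned char *fb, size_t fb_size,
			  unsigned char *dst, size_t count, size_t *ppos)
{
	size_t pos = *ppos;

	if (pos >= fb_size)
		return 0;                   /* at or past the end */
	if (count > fb_size - pos)
		count = fb_size - pos;      /* clamp to what is left */

	memcpy(dst, fb + pos, count);       /* plain RAM access, no IO accessors */
	*ppos = pos + count;
	return count;
}
```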
Antonino A. Daplas
3f9b0880e4 fbdev: pass struct fb_info to fb_read and fb_write
It is unnecessary to pass struct file to fb_read() and fb_write() in struct
fb_ops. For consistency with the other methods, pass struct fb_info instead.

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:30 -07:00
Antonino A. Daplas
68648ed1f5 fbdev: add drawing functions for framebuffers in system RAM
The generic drawing functions (cfbimgblt, cfbcopyarea, cfbfillrect) assume
that the framebuffer is in IO memory.  However, we have 3 drivers (hecubafb,
arcfb, and vfb) where the framebuffer is allocated from system RAM (via
vmalloc). Using __raw_read/write and family for these drivers (as used in
the cfb* functions) is illegal, especially on other platforms.

Create 3 new drawing functions, based almost entirely on the originals
except that the framebuffer memory is assumed to be in system RAM.
These are named as sysimgblt, syscopyarea, and sysfillrect.

Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:30 -07:00
Jan Engelhardt
fa6ce9ab5f vt: add color support to the "underline" and "italic" attributes
Add color support to the "underline" and "italic" attributes as in
OpenBSD/NetBSD-style (vt220) and xterm.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Acked-by: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:27 -07:00
Maciej W. Rozycki
86c6f7d08b tgafb: TURBOchannel support
This is support for the TC variations of the TGA boards (properly known as
SFB+ or Smart Frame Buffer Plus boards).  The 8-plane SFB+ board uses the
Bt459 RAMDAC (unlike its PCI TGA counterpart, which uses the Bt485), so
bits have been added to support this chip as well.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: James Simmons <jsimmons@infradead.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:27 -07:00
Paul Mundt
5e841b88d2 fb: fsync() method for deferred I/O flush.
There are cases when we do not want to wait on the delay for automatically
updating the "real" framebuffer. This implements a simple ->fsync() hook
for explicitly flushing the deferred I/O work.  The ->page_mkwrite()
handler will rearm the work queue normally.

(akpm: nuke unneeded ifdefs, forward-declare struct dentry)

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cc: Jaya Kumar <jayakumar.lkml@gmail.com>
Acked-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:27 -07:00
Jaya Kumar
60b59beafb fbdev: mm: Deferred IO support
This implements deferred IO support in fbdev.  Deferred IO is a way to delay
and repurpose IO.  This implementation is done using mm's page_mkwrite and
page_mkclean hooks in order to detect, delay and then rewrite IO.  This
functionality is used by hecubafb.

[adaplas]
This is useful for graphics hardware with no directly addressable/mappable
framebuffer. Implementing this will allow the "framebuffer" to be accessible
from user space via mmap().

Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:26 -07:00
James Simmons
2ee121631b fbdev: display class
Add the new display class.  This is meant to unite the various solutions to
display units, i.e. the ACPI output device, auxdisplay and the defunct lcd
class in the backlight directory.

Signed-off-by: James Simmons <jsimmons@infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:26 -07:00
Jiri Slaby
dd025c0c7a Char: cyclades, dynamic ports
and save thus approx. 160k of .bss

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:25 -07:00
Jiri Slaby
6a0aa67b17 Char: cyclades, remove unused timestamps
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:25 -07:00
Jiri Slaby
2c7fea9921 Char: cyclades, remove sleep_on
convert to wait_* and completion

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:25 -07:00
Jiri Slaby
875b206b5f Char: cyclades, make info->card a pointer
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:25 -07:00
Jiri Slaby
46039f8a64 Char: cyclades, remove useless fileds from cyclades_card
pde, ctl_phys and base_phys are useless -- they are never used.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:24 -07:00
Jiri Slaby
b81cc310f1 Char: cyclades, unexport struct cyclades_card
Do not export internal card data to userspace. cytune doesn't use this
anyway.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:24 -07:00
Bjorn Helgaas
7e92b4fc34 x86, serial: convert legacy COM ports to platform devices
Make x86 COM ports into platform devices and don't probe for them
if we have PNP.

This prevents double discovery, where a device was found both by
the legacy probe and by 8250_pnp, e.g.,

    serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A

This also means IRDA devices without a UART PNP ID will no longer be
claimed by the serial driver, which might require changes in IRDA
drivers and administration.

In addition to this patch, you may need to configure a setserial init
script, e.g., /etc/init.d/setserial, so it doesn't poke legacy UART
stuff back in.  On Debian, "dpkg-reconfigure setserial" with the "kernel"
option does this.

To force the old legacy probe behavior even when we have PNPBIOS or
ACPI, load the new legacy_serial module (or build 8250 static) with
the "legacy_serial.force" option.

[akpm@linux-foundation.org: fix makefiles]
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Jean Tourrilhes <jt@hpl.hp.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Russell King <rmk+serial@arm.linux.org.uk>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:23 -07:00
Bjorn Helgaas
8f81dd1498 PNP: notice whether we have PNP devices (PNPBIOS or PNPACPI)
This series converts i386 and x86_64 legacy serial ports to be platform
devices and prevents probing for them if we have PNP.

This prevents double discovery, where a device was found both by the legacy
probe and by 8250_pnp.

This also prevents the serial driver from claiming IRDA devices (unless they
have a UART PNP ID).  The serial legacy probe sometimes assumed the wrong IRQ,
so the user had to use "setserial" to fix it.

Removing the need for setserial to make IRDA devices work seems good, but it
does break some things.  In particular, you may need to keep setserial from
poking legacy UART stuff back in by doing something like "dpkg-reconfigure
setserial" with the "kernel" option.  Otherwise, the setserial-discovered
"UART" will claim resources and prevent the IRDA driver from loading.

This patch:

If we can discover devices using PNP, we can skip some legacy probes.  This
flag ("pnp_platform_devices") indicates that PNPBIOS or PNPACPI is enabled and
should tell us about builtin devices.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Jean Tourrilhes <jt@hpl.hp.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Russell King <rmk+serial@arm.linux.org.uk>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:23 -07:00
Jiri Slaby
db05c3b1dd Char: cyclades, cy_readX/writeX cleanup
cyclades, cy_readX/writeX cleanup

- cy_readX are placeholders for readX, remove them
- move cy_writeX macros into do {} while(0) to be safe
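
The second point refers to a common C idiom.  A minimal sketch of why a
multi-statement macro is wrapped in do { } while (0) follows; the macro and
variable names (SAFE_WRITE, fake_reg, write_count) are hypothetical and are
not the cyclades driver's actual definitions:

```c
#include <assert.h>

/* Hypothetical stand-ins for a device register and a write counter. */
static unsigned int fake_reg;
static unsigned int write_count;

/*
 * Wrapped form: the two statements expand to a single statement, so the
 * macro can be used in an unbraced if/else and still ends with a ';'.
 */
#define SAFE_WRITE(val)			\
	do {				\
		fake_reg = (val);	\
		write_count++;		\
	} while (0)

/* Performs exactly one write, chosen by cond, and returns the write count. */
static unsigned int demo_write(int cond)
{
	write_count = 0;
	if (cond)
		SAFE_WRITE(0x55);
	else
		SAFE_WRITE(0xaa);
	assert(write_count == 1);
	return write_count;
}
```

Without the wrapper, the same if/else would not compile, because the first
expansion would end the if statement before the else is reached.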

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:22 -07:00
Bernhard Walle
d85a60d85e Add IRQF_IRQPOLL flag (common code)
irqpoll is broken on some architectures, such as IA64, that don't use IRQ 0
for the timer interrupt.  This patch adds an IRQF_IRQPOLL flag.

Each architecture is handled in a separate patch.  As I left irq == 0 as a
condition, this should not break existing architectures that use timer_irq ==
0 and that I didn't address with this patch (because I don't know them).

This patch:

This patch adds an IRQF_IRQPOLL flag that the interrupt registration code can
use for the interrupt it wants to use for IRQ polling.

Because this does not have to be the timer interrupt, an additional flag was
added instead of re-using the IRQF_TIMER constant.  Until all architectures
have an IRQF_IRQPOLL interrupt, irq == 0 will stay as an alternative, as it
should not break anything.

Also, note_interrupt() is called on CPU-specific interrupts to be used as
interrupt source for IRQ polling.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Grant Grundler <grundler@google.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:22 -07:00
Peter Zijlstra
277866a0e3 nfs: fix congestion control: use atomic_longs
Change the atomic_t in struct nfs_server to atomic_long_t in anticipation
of machines that can handle 8+TB of (4K) pages under writeback.
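
The 8 TB figure follows from the counter width: atomic_t wraps a 32-bit int,
so it can count at most 2^31 - 1 pages, and 2^31 pages of 4K each is 2^43
bytes.  A small sketch of that arithmetic (the helper names are hypothetical,
used only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Largest value a 32-bit signed counter can hold. */
static uint64_t int_counter_limit(void)
{
	return (uint64_t)INT32_MAX;
}

/* Bytes covered by 'pages' pages of 'page_size' bytes each. */
static uint64_t bytes_for_pages(uint64_t pages, uint64_t page_size)
{
	/* Guard against overflow of the 64-bit product. */
	assert(page_size != 0 && pages <= UINT64_MAX / page_size);
	return pages * page_size;
}
```

With 4096-byte pages, bytes_for_pages(int_counter_limit() + 1, 4096) is the
first amount of memory under writeback that no longer fits in the counter.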

However I suspect other things in NFS will start going *bang* by then.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:21 -07:00
Robert P. J. Day
cc38682f35 Some grammatical fixups and additions to atomic.h kernel-doc content
Tweak and add content for extractable documentation in asm-i386/atomic.h.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Jeff Dike
a436ed9c51 x86: create asm/cmpxchg.h
i386:

  Rearrange the cmpxchg code to allow atomic.h to get it without needing to
  include system.h.  This kills warnings in the UML build from atomic.h about
  implicit declarations of cmpxchg symbols.  The i386 build presumably isn't
  seeing this because a separate inclusion of system.h is covering it over.

  The cmpxchg stuff is moved to asm-i386/cmpxchg.h, with an include left in
  system.h for the benefit of generic code which expects cmpxchg there.

  Meanwhile, atomic.h includes cmpxchg.h.

  This causes no noticeable damage to the i386 build.

x86_64:

  Move cmpxchg into its own header.  atomic.h already included system.h, so
  this is changed to include cmpxchg.h.

  This is purely cleanup - it's not fixing any warnings - so if the x86_64
  system.h isn't considered as cleanup-worthy as i386, then this can be
  dropped.

  It causes no noticeable damage to the x86_64 build.

uml:

  The i386 and x86_64 cmpxchg patches require an asm-um/cmpxchg.h for the
  UML build.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Jeff Dike
5dc12ddee9 Remove tas()
tas() has no users, so get rid of it.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
c343c14aec local_t: x86_64 extension
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
469b50b622 local_t: sparc64 cleanup
sparc64 local_t cleanup : simply use asm-generic/local.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
6d8944a0d7 local_t: powerpc extension
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
14c846a4d8 local_t: parisc cleanup
parisc architecture local_t cleanup : use asm-generic/local.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
7232311ef1 local_t: mips extension
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
4431f46f5f local_t: ia64 extension
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
a075227948 local_t: i386 extension
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
f43f7b46eb local_t: alpha extension
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
5e97b9309b local_t: architecture independent extension
This series extends and standardises local_t operations on each architecture,
allowing a rich set of atomic operations to be done on per-cpu data with
minimal performance impact.  On architectures where there seems to be no
difference between the SMP and UP operation (same memory barriers, same
LOCKing), local.h simply includes asm-generic/local.h, which removes
duplicated code from the current kernel tree.

This patch:

local_t: architecture independent extension

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
2856f5e31c atomic.h: atomic_add_unless as inline. Remove system.h atomic.h circular dependency
atomic_add_unless as inline. Remove system.h atomic.h circular dependency.
I agree (with Andi Kleen) this typeof is not needed and more error
prone. All the original atomic.h code that uses cmpxchg (which includes
the atomic_add_unless) uses defines instead of inline functions,
probably to circumvent a circular dependency between system.h and
atomic.h on powerpc (which my patch addresses). Therefore, it makes
sense to use inline functions that will provide type checking.
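
As a rough userspace illustration of the inline-function form, here is a
sketch built on C11 atomics rather than the kernel's cmpxchg().  The name
sketch_add_unless is hypothetical and this is not the kernel implementation;
it only shows the compare-and-exchange loop with typed parameters:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add 'a' to *v unless *v currently equals 'u'.
 * Returns true if the addition was performed.
 * Being a function, the argument types are checked by the compiler.
 */
static inline bool sketch_add_unless(atomic_int *v, int a, int u)
{
	int old;

	assert(v);
	old = atomic_load(v);
	while (old != u) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}
```

A typical use is taking a reference only while a count is still non-zero:
sketch_add_unless(&refcount, 1, 0).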

atomic_add_unless as inline. Remove system.h atomic.h circular dependency.
Digging into the FRV architecture shows me that it is also affected by
such a circular dependency. Here is the diff applying this against the
rest of my atomic.h patches.

It applies over the atomic.h standardization patches.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:20 -07:00
Mathieu Desnoyers
79d365a306 atomic.h: add atomic64 cmpxchg, xchg and add_unless to x86_64
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Mathieu Desnoyers
2549c8589c atomic.h: add atomic64 cmpxchg, xchg and add_unless to sparc64
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Mathieu Desnoyers
f46e477ed9 atomic.h: add atomic64 cmpxchg, xchg and add_unless to powerpc
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Mathieu Desnoyers
8ffe9d0bff atomic.h: add atomic64 cmpxchg, xchg and add_unless to parisc
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Mathieu Desnoyers
e12f644bd0 atomic.h: add atomic64 cmpxchg, xchg and add_unless to mips
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Mathieu Desnoyers
819791319b atomic.h: add atomic64 cmpxchg, xchg and add_unless to ia64
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Mathieu Desnoyers
e656e245d5 atomic.h: i386 type safety fix
Remove an explicit cast to an integer type for the result returned by cmpxchg.
It is not per se a problem on the i386 architecture, because sizeof(int) ==
sizeof(long), but whenever this code is cut'n'pasted to accept an atomic64_t
value as parameter to cmpxchg, xchg and add_unless, the 64-bit inputs end up
cast to 32 bits.
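
The hazard can be shown in isolation.  This is a sketch with hypothetical
helper names, not the atomic.h code: a helper that casts its result to a
32-bit type silently drops the upper half of a 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint32_t) < sizeof(uint64_t),
	      "the example needs a narrower intermediate type");

/* Models a copied helper that still casts its result to 32 bits. */
static uint64_t with_u32_cast(uint64_t value)
{
	return (uint32_t)value;	/* upper 32 bits are discarded */
}

/* Models the same helper with the cast removed. */
static uint64_t without_cast(uint64_t value)
{
	return value;
}
```

For the input 0x100000001, the first helper returns 1 while the second
returns the value unchanged.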

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Mathieu Desnoyers
bb2382c3e4 atomic.h: complete atomic_long operations in asm-generic
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Mathieu Desnoyers
e96e699423 atomic.h: add atomic64 cmpxchg, xchg and add_unless to alpha
This series mainly adds support for missing 64 bits cmpxchg and 64 bits atomic
add unless.  Therefore, principally 64 bits architectures are targeted by
these patches.  It also adds the complete list of atomic operations on the
atomic_long type.

This patch:

atomic.h: add atomic64 cmpxchg, xchg and add_unless to alpha

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Ananth N Mavinakayanahalli
bf8f6e5b3e Kprobes: The ON/OFF knob thru debugfs
This patch provides a debugfs knob to turn kprobes on/off

o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
  not (default enabled)
o Echoing 0 to this file will disarm all installed probes
o Any new probe registration when disabled will register the probe but
  not arm it. A message will be printed out in such a case.
o When a value 1 is echoed to the file, all probes (including ones
  registered in the intervening period) will be enabled
o Unregistration will happen irrespective of whether probes are globally
  enabled or not.
o Update Documentation/kprobes.txt to reflect these changes. While there
  also update the doc to make it current.
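
Besides echo from a shell, the knob can be driven from a program.  A minimal
sketch, assuming debugfs is mounted so that the file appears at the path
given above; the function name set_kprobes_knob is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "1" or "0" to the knob file at 'path'
 * (for example "/debug/kprobes/enabled").
 * Returns 0 on success, -1 on failure.
 */
static int set_kprobes_knob(const char *path, int enable)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	/* fclose() flushes; a failed flush means the write did not land. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling set_kprobes_knob("/debug/kprobes/enabled", 0) would disarm all
installed probes, and passing 1 would re-arm them.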

We are also looking at providing sysrq key support to tie to the disabling
feature provided by this patch.

[akpm@linux-foundation.org: Use bool like a bool!]
[akpm@linux-foundation.org: add printk facility levels]
[cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390]
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Christoph Hellwig
4c4308cb93 kprobes: kretprobes simplifications
 - consolidate duplicate code in all arch_prepare_kretprobe instances
   into common code
 - replace various odd helpers that use hlist_for_each_entry to get
   the first element of a list with either a hlist_for_each_entry_safe
   or an opencoded access to the first element in the caller
 - inline add_rp_inst into its only remaining caller
 - use kretprobe_inst_table_head instead of opencoding it

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:19 -07:00
Andrew Morton
416ce32e70 revert "rtc: Add rtc_merge_alarm()"
David says "884b4aaaa242a2db8c8252796f0118164a680ab5 should be reverted.  It
added an rtc_merge_alarm() call to the 2.6.20 kernel, which hasn't yet been
used by any in-tree driver; this patch obviates the need for that call, and
uses a more robust approach."

Cc: Scott Wood <scottwood@freescale.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:18 -07:00
David Brownell
87ac84f42a rtc-cmos wakeup interface
I finally got around to testing the updated wakeup event hooks for rtc-cmos,
and they follow in two patches:

 - Interface update ... when a simple enable_irq_wake() doesn't suffice,
   the platform data can hold suspend/resume callback hooks.

 - ACPI implementation ... provides callback hooks to do ACPI magic, and
   eliminate the legacy /proc/acpi/alarm file.

The interface update could go into 2.6.21, but that's not essential; they
will be NOPs on most PCs, without the ACPI stuff.

I suspect the ACPI folk may have opinions about how to merge that second
patch, and how to obsolete that legacy procfs file.  I'd like to see that
merge into 2.6.22 if possible...

As for how to kick it in ... two ways:

 - The appended "rtcwake" program; updated since the last time it was
   posted, it deals much better with timezones and DST.

 - Write the /sys/class/rtc/.../wakealarm file, then go to sleep.

For some reason RTC wake from "swsusp" stopped working on a system where
it previously worked; the alarm setting appears to get clobbered.  But
on the bright side, RTC wake from "standby" worked on a system that had
never been able to resume from that state before ... IDEACPI is my guess
as to why it finally started to work.  It's the old "two steps forward,
one step back" dance, I guess.

- Dave

/* gcc -Wall -Os -o rtcwake rtcwake.c */

#include <stdio.h>
#include <getopt.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>

#include <sys/ioctl.h>
#include <sys/time.h>
#include <sys/types.h>

#include <linux/rtc.h>

/* constants from legacy PC/AT hardware */
#define	RTC_PF	0x40
#define	RTC_AF	0x20
#define	RTC_UF	0x10

/*
 * rtcwake -- enter a system sleep state until specified wakeup time.
 *
 * This uses cross-platform Linux interfaces to enter a system sleep state,
 * and leave it no later than a specified time.  It uses any RTC framework
 * driver that supports standard driver model wakeup flags.
 *
 * This is normally used like the old "apmsleep" utility, to wake from a
 * suspend state like ACPI S1 (standby) or S3 (suspend-to-RAM).  Most
 * platforms can implement those without analogues of BIOS, APM, or ACPI.
 *
 * On some systems, this can also be used like "nvram-wakeup", waking
 * from states like ACPI S4 (suspend to disk).  Not all systems have
 * persistent media that are appropriate for such suspend modes.
 *
 * The best way to set the system's RTC is so that it holds the current
 * time in UTC.  Use the "-l" flag to tell this program that the system
 * RTC uses a local timezone instead (maybe you dual-boot MS-Windows).
 */

static char		*progname;

#ifdef	DEBUG
#define	VERSION	"1.0 dev (" __DATE__ " " __TIME__ ")"
#else
#define	VERSION	"0.9"
#endif

static unsigned		verbose;
static int		rtc_is_utc = -1;

static int may_wakeup(const char *devname)
{
	char	buf[128], *s;
	FILE	*f;

	snprintf(buf, sizeof buf, "/sys/class/rtc/%s/device/power/wakeup",
			devname);
	f = fopen(buf, "r");
	if (!f) {
		perror(buf);
		return 0;
	}
	/* an empty or unreadable attribute means no wakeup support */
	if (!fgets(buf, sizeof buf, f)) {
		fclose(f);
		return 0;
	}
	fclose(f);

	s = strchr(buf, '\n');
	if (!s)
		return 0;
	*s = 0;

	/* wakeup events could be disabled or not supported */
	return strcmp(buf, "enabled") == 0;
}

/* all times should be in UTC */
static time_t	sys_time;
static time_t	rtc_time;

static int get_basetimes(int fd)
{
	struct tm	tm;
	struct rtc_time	rtc;

	/* this process works in RTC time, except when working
	 * with the system clock (which always uses UTC).
	 */
	if (rtc_is_utc)
		setenv("TZ", "UTC", 1);
	tzset();

	/* read rtc and system clocks "at the same time", or as
	 * precisely (+/- a second) as we can read them.
	 */
	if (ioctl(fd, RTC_RD_TIME, &rtc) < 0) {
		perror("read rtc time");
		return 0;
	}
	sys_time = time(0);
	if (sys_time == (time_t)-1) {
		perror("read system time");
		return 0;
	}

	/* convert rtc_time to normal arithmetic-friendly form,
	 * updating tm.tm_wday as used by asctime().
	 */
	memset(&tm, 0, sizeof tm);
	tm.tm_sec = rtc.tm_sec;
	tm.tm_min = rtc.tm_min;
	tm.tm_hour = rtc.tm_hour;
	tm.tm_mday = rtc.tm_mday;
	tm.tm_mon = rtc.tm_mon;
	tm.tm_year = rtc.tm_year;
	tm.tm_isdst = rtc.tm_isdst;	/* stays unspecified? */
	rtc_time = mktime(&tm);

	if (rtc_time == (time_t)-1) {
		perror("convert rtc time");
		return 0;
	}

	if (verbose) {
		if (!rtc_is_utc) {
			printf("\ttzone   = %ld\n", timezone);
			printf("\ttzname  = %s\n", tzname[daylight]);
			gmtime_r(&rtc_time, &tm);
		}
		printf("\tsystime = %ld, (UTC) %s",
				(long) sys_time, asctime(gmtime(&sys_time)));
		printf("\trtctime = %ld, (UTC) %s",
				(long) rtc_time, asctime(&tm));
	}

	return 1;
}

static int setup_alarm(int fd, time_t *wakeup)
{
	struct tm		*tm;
	struct rtc_wkalrm	wake;

	tm = gmtime(wakeup);

	wake.time.tm_sec = tm->tm_sec;
	wake.time.tm_min = tm->tm_min;
	wake.time.tm_hour = tm->tm_hour;
	wake.time.tm_mday = tm->tm_mday;
	wake.time.tm_mon = tm->tm_mon;
	wake.time.tm_year = tm->tm_year;
	wake.time.tm_wday = tm->tm_wday;
	wake.time.tm_yday = tm->tm_yday;
	wake.time.tm_isdst = tm->tm_isdst;

	/* many rtc alarms only support up to 24 hours from 'now' ... */
	if ((rtc_time + (24 * 60 * 60)) > *wakeup) {
		if (ioctl(fd, RTC_ALM_SET, &wake.time) < 0) {
			perror("set rtc alarm");
			return 0;
		}
		if (ioctl(fd, RTC_AIE_ON, 0) < 0) {
			perror("enable rtc alarm");
			return 0;
		}

	/* ... so use the "more than 24 hours" request only if we must */
	} else {
		/* avoid an extra AIE_ON call */
		wake.enabled = 1;

		if (ioctl(fd, RTC_WKALM_SET, &wake) < 0) {
			perror("set rtc wake alarm");
			return 0;
		}
	}

	return 1;
}

static void suspend_system(const char *suspend)
{
	FILE	*f = fopen("/sys/power/state", "w");

	if (!f) {
		perror("/sys/power/state");
		return;
	}

	fprintf(f, "%s\n", suspend);
	fflush(f);

	/* this executes after wake from suspend */
	fclose(f);
}

int main(int argc, char **argv)
{
	static char		*devname = "rtc0";
	static unsigned		seconds = 0;
	static char		*suspend = "standby";

	int		t;
	int		fd;
	time_t		alarm = 0;

	progname = strrchr(argv[0], '/');
	if (progname)
		progname++;
	else
		progname = argv[0];
	if (chdir("/dev/") < 0) {
		perror("chdir /dev");
		return 1;
	}

	while ((t = getopt(argc, argv, "d:lm:s:t:uVv")) != EOF) {
		switch (t) {

		case 'd':
			devname = optarg;
			break;

		case 'l':
			rtc_is_utc = 0;
			break;

		/* what system power mode to use?  for now handle only
		 * standardized mode names; eventually when systems define
		 * their own state names, parse /sys/power/state.
		 *
		 * "on" is used just to test the RTC alarm mechanism,
		 * bypassing all the wakeup-from-sleep infrastructure.
		 */
		case 'm':
			if (strcmp(optarg, "standby") == 0
					|| strcmp(optarg, "mem") == 0
					|| strcmp(optarg, "disk") == 0
					|| strcmp(optarg, "on") == 0
					) {
				suspend = optarg;
				break;
			}
			printf("%s: unrecognized suspend state '%s'\n",
					progname, optarg);
			goto usage;

		/* alarm time, seconds-to-sleep (relative) */
		case 's':
			t = atoi(optarg);
			if (t < 0) {
				printf("%s: illegal interval %s seconds\n",
						progname, optarg);
				goto usage;
			}
			seconds = t;
			break;

		/* alarm time, time_t (absolute, seconds since 1/1 1970 UTC) */
		case 't':
			t = atoi(optarg);
			if (t < 0) {
				printf("%s: illegal time_t value %s\n",
						progname, optarg);
				goto usage;
			}
			alarm = t;
			break;

		case 'u':
			rtc_is_utc = 1;
			break;

		case 'v':
			verbose++;
			break;

		case 'V':
			printf("%s: version %s\n", progname, VERSION);
			break;

		default:
usage:
			printf("usage: %s [options]"
				"\n\t"
				"-d rtc0|rtc1|...\t(select rtc)"
				"\n\t"
				"-l\t\t\t(RTC uses local timezone)"
				"\n\t"
				"-m standby|mem|...\t(sleep mode)"
				"\n\t"
				"-s seconds\t\t(seconds to sleep)"
				"\n\t"
				"-t time_t\t\t(time to wake)"
				"\n\t"
				"-u\t\t\t(RTC uses UTC)"
				"\n\t"
				"-v\t\t\t(verbose messages)"
				"\n\t"
				"-V\t\t\t(show version)"
				"\n",
				progname);
			return 1;
		}
	}

	if (!alarm && !seconds) {
		printf("%s: must provide wake time\n", progname);
		goto usage;
	}

	/* REVISIT:  if /etc/adjtime exists, read it to see what
	 * the util-linux version of hwclock assumes.
	 */
	if (rtc_is_utc == -1) {
		printf("%s: assuming RTC uses UTC ...\n", progname);
		rtc_is_utc = 1;
	}

	/* this RTC must exist and (if we'll sleep) be wakeup-enabled */
	fd = open(devname, O_RDONLY);
	if (fd < 0) {
		perror(devname);
		return 1;
	}
	if (strcmp(suspend, "on") != 0 && !may_wakeup(devname)) {
		printf("%s: %s not enabled for wakeup events\n",
				progname, devname);
		return 1;
	}

	/* relative or absolute alarm time, normalized to time_t */
	if (!get_basetimes(fd))
		return 1;
	if (verbose)
		printf("alarm %ld, sys_time %ld, rtc_time %ld, seconds %u\n",
				(long) alarm, (long) sys_time,
				(long) rtc_time, seconds);
	if (alarm) {
		if (alarm < sys_time) {
			printf("%s: time doesn't go backward to %s",
					progname, ctime(&alarm));
			return 1;
		}
		alarm += sys_time - rtc_time;
	} else
		alarm = rtc_time + seconds + 1;
	/* setup_alarm() returns 0 on failure, 1 on success */
	if (!setup_alarm(fd, &alarm))
		return 1;

	sync();
	printf("%s: wakeup from \"%s\" using %s at %s",
			progname, suspend, devname,
			ctime(&alarm));
	fflush(stdout);
	usleep(10 * 1000);

	if (strcmp(suspend, "on") != 0)
		suspend_system(suspend);
	else {
		unsigned long data;

		do {
			t = read(fd, &data, sizeof data);
			if (t < 0) {
				perror("rtc read");
				break;
			}
			if (verbose)
				printf("... %s: %03lx\n", devname, data);
		} while (!(data & RTC_AF));
	}

	if (ioctl(fd, RTC_AIE_OFF, 0) < 0)
		perror("disable rtc alarm interrupt");

	close(fd);
	return 0;
}

This patch:

Make rtc-cmos do the relevant magic so this RTC can wake the system from a
sleep state.  That magic comes in two basic flavors:

 - Straightforward:  enable_irq_wake(), the way it'd work on most SOC chips;
   or generally with system sleep states which don't disable core IRQ logic.

 - Roundabout, using non-IRQ platform hooks.  This is needed with ACPI and
   one almost-clone chip which uses a special wakeup-only alarm.  (That's
   the RTC used on Footbridge boards, FWIW, which don't do PM in Linux.)

A separate patch implements those hooks for ACPI platforms, so that rtc_cmos
can issue system wakeup events (and its sysfs "wakealarm" attribute works on
at least some systems).

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:18 -07:00
David Brownell
cd9662094e rtc: remove rest of class_device
Finish converting the RTC framework so it no longer uses class_device.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-By: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:18 -07:00
David Brownell
ab6a2d70d1 rtc: rtc interfaces don't use class_device
This patch removes class_device from the programming interface that the RTC
framework exposes to the rest of the kernel.  Now an rtc_device is passed,
which is more type-safe and streamlines all the relevant code.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-By: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:18 -07:00
David Brownell
5726fb2012 rtc: remove /sys/class/rtc-dev/*
This simplifies the /dev support by removing a superfluous class_device (the
/sys/class/rtc-dev stuff) and the class_interface that hooks it into the rtc
core.  Accordingly, if it's configured then /dev support is now part of the
RTC core, and is never a separate module.

It's another step towards being able to remove "struct class_device".

[bunk@stusta.de: drivers/rtc/rtc-dev.c should #include "rtc-core.h"]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-By: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:18 -07:00
Ulrich Drepper
1c710c896e utimensat implementation
Implement utimensat(2) which is an extension to futimesat(2) in that it

a) supports nano-second resolution for the timestamps
b) allows selectively ignoring the atime/mtime value
c) allows selectively using the current time for either atime or mtime
d) supports changing the atime/mtime of a symlink itself along the lines
   of the BSD lutimes(3) functions
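
Points a) and b) can be illustrated from userspace.  The sketch below assumes
a C library that provides a utimensat() wrapper and the UTIME_OMIT constant
(so no raw syscall number is needed); the helper names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Create (or truncate) an empty file; returns 0 on success. */
static int make_empty_file(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	return close(fd);
}

/* Set only the mtime of 'path' to 'sec' seconds; leave atime untouched. */
static int set_mtime_only(const char *path, time_t sec)
{
	struct timespec t[2];

	assert(path != NULL);
	t[0].tv_sec = 0;
	t[0].tv_nsec = UTIME_OMIT;	/* b) ignore the atime value */
	t[1].tv_sec = sec;
	t[1].tv_nsec = 0;		/* a) nanosecond field is available */
	return utimensat(AT_FDCWD, path, t, 0);
}
```

Passing UTIME_NOW in tv_nsec instead selects the current time (point c), and
the AT_SYMLINK_NOFOLLOW flag applies the change to a symlink itself (point d).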

For this change the internally used do_utimes() function was changed to
accept a timespec time value and an additional flags parameter.

Additionally the sys_utime function was changed to match compat_sys_utime,
which already uses do_utimes instead of duplicating the work.

Also, the completely missing futimensat() functionality is added.  We have
such a function in glibc but we have to resort to using /proc/self/fd/* which
not everybody likes (chroot etc).

Test application (the syscall number will need per-arch editing):

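#define _GNU_SOURCE	/* for stat64, fstat64 and lstat64 */
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <stddef.h>
#include <syscall.h>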

#define __NR_utimensat 280

#define UTIME_NOW       ((1l << 30) - 1l)
#define UTIME_OMIT      ((1l << 30) - 2l)

int
main(void)
{
  int status = 0;

  int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
  if (fd == -1)
    error (1, errno, "failed to create test file \"ttt\"");

  struct stat64 st1;
  if (fstat64 (fd, &st1) != 0)
    error (1, errno, "fstat failed");

  struct timespec t[2];
  t[0].tv_sec = 0;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 0;
  t[1].tv_nsec = 0;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  struct stat64 st2;
  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
    {
      puts ("atim not reset to zero");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim not reset to zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  t[0] = st1.st_atim;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_OMIT;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
      || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
    {
      puts ("atim not set");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim changed from zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_OMIT;
  t[1] = st1.st_mtim;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
      || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
    {
      puts ("atim changed from original time");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
      || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
    {
      puts ("mtim not set");
      status = 1;
    }
  if (status != 0)
    goto out;

  sleep (2);

  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_NOW;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_NOW;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  struct timeval tv;
  gettimeofday(&tv,NULL);

  if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
      || st2.st_atim.tv_sec > tv.tv_sec)
    {
      puts ("atim not set to NOW");
      status = 1;
    }
  if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
      || st2.st_mtim.tv_sec > tv.tv_sec)
    {
      puts ("mtim not set to NOW");
      status = 1;
    }

  if (symlink ("ttt", "tttsym") != 0)
    error (1, errno, "cannot create symlink");

  t[0].tv_sec = 0;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 0;
  t[1].tv_nsec = 0;
  if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
    error (1, errno, "utimensat failed");

  if (lstat64 ("tttsym", &st2) != 0)
    error (1, errno, "lstat failed");

  if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
    {
      puts ("symlink atim not reset to zero");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("symlink mtim not reset to zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  t[0].tv_sec = 1;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 1;
  t[1].tv_nsec = 0;
  if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
    {
      puts ("atim not reset to one");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim not reset to one");
      status = 1;
    }

  if (status == 0)
     puts ("all OK");

 out:
  close (fd);
  unlink ("ttt");
  unlink ("tttsym");

  return status;
}

[akpm@linux-foundation.org: add missing i386 syscall table entry]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:18 -07:00
Eric Dumazet
5517d86bea Speed up divides by cpu_power in scheduler
I noticed expensive divides done in try_to_wake_up() and
find_busiest_group() on a machine with two dual-core Opterons (4 cores in
total), moderately loaded (15,000 context switches per second).

oprofile numbers :

CPU: AMD64 processors, speed 2600.05 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 50000
samples  %        symbol name
...
613914    1.0498  try_to_wake_up
    834  0.0013 :ffffffff80227ae1:   div    %rcx
77513  0.1191 :ffffffff80227ae4:   mov    %rax,%r11

608893    1.0413  find_busiest_group
   1841  0.0031 :ffffffff802260bf:       div    %rdi
140109  0.2394 :ffffffff802260c2:       test   %sil,%sil

Some of these divides can use the reciprocal divides we introduced some
time ago (currently used in slab AFAIK)

We can assume a load will fit in a 32-bit number: with
SCHED_LOAD_SCALE=128, that still leaves a theoretical limit of 33554432.

When/if we reach this limit one day, probably cpus will have a fast
hardware divide and we can zap the reciprocal divide trick.
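
For illustration, the reciprocal divide idea can be sketched in userspace C
as below.  This is a minimal sketch of the technique, not the scheduler
patch itself; the helper names demo_reciprocal_value() and
demo_reciprocal_divide() are hypothetical.  The result approximates a / k
and can differ from exact division for very large dividends, which is one
reason the 32-bit load assumption above matters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: precompute R = ceil(2^32 / k) once, whenever the
 * divisor changes.  k must be at least 2, because for k == 1 the value
 * 2^32 does not fit in 32 bits.
 */
static inline uint32_t demo_reciprocal_value(uint32_t k)
{
	assert(k > 1);
	uint64_t val = (1ULL << 32) + (k - 1);
	return (uint32_t)(val / k);
}

/*
 * Hypothetical helper: approximate a / k with one multiply and one shift,
 * using the precomputed reciprocal r instead of a hardware divide.
 */
static inline uint32_t demo_reciprocal_divide(uint32_t a, uint32_t r)
{
	return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

The reciprocal is computed once per divisor change and reused on every
fast-path divide, so the cost of the real division is paid only when the
divisor is updated.  This is also why the divisor and its reciprocal have to
be kept in sync.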

Ingo suggested to rename cpu_power to __cpu_power to make clear it should
not be modified without changing its reciprocal value too.

I did not convert the divide in cpu_avg_load_per_task(), because tracking
nr_running changes may not be worth it?  We could use a static table of 32
reciprocal values, but it would add a conditional branch and a table lookup.

[akpm@linux-foundation.org: !SMP build fix]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:17 -07:00
Siddha, Suresh B
46cb4b7c88 sched: dynticks idle load balancing
Fix process idle load balancing in the presence of dynticks.  CPUs for
which ticks are stopped sleep until the next event wakes them up.
These sleeps can potentially last a long time, and today no periodic
idle load balancing is done during them.

This patch nominates an owner among the idle cpus, which does the idle load
balancing on behalf of the other idle cpus.  And once all the cpus are
completely idle, then we can stop this idle load balancing too.  Checks added
in fast path are minimized.  Whenever there are busy cpus in the system, there
will be an owner(idle cpu) doing the system wide idle load balancing.

Open items:
1. Intelligent owner selection (like an idle core in a busy package).
2. Merge with rcu's nohz_cpu_mask?

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:17 -07:00
Mike Frysinger
a7e27d5dd3 sanitize linux/isdn_divertif.h for userspace
The isdn_divertif.h header contains kernel-only references, so I've wrapped
them in __KERNEL__ and added proper #include statements.
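
A minimal sketch of the pattern, assuming a hypothetical header (every name
below is illustrative and is not taken from isdn_divertif.h): definitions
that userspace may use stay outside the guard, and anything that refers to
kernel-only types goes inside #ifdef __KERNEL__.

```c
/* demo_divertif.h: hypothetical header, all names are illustrative only */
#ifndef DEMO_DIVERTIF_H
#define DEMO_DIVERTIF_H

/* Visible to both kernel and userspace builds. */
#define DEMO_DIVERT_CMD_REG 0x00
#define DEMO_DIVERT_CMD_REL 0x01

#ifdef __KERNEL__
/* Only pulled in when building the kernel itself. */
#include <linux/types.h>

/* Kernel-only interface: userspace never sees this declaration. */
struct demo_divert_if {
	int if_magic;
	int (*ll_cmd)(int cmd);
};
#endif /* __KERNEL__ */

#endif /* DEMO_DIVERTIF_H */
```

A userspace program that includes such a header compiles without needing
any kernel-internal headers, because __KERNEL__ is not defined there.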

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Karsten Keil <kkeil@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:16 -07:00
Adrian Bunk
3a3a51d1f2 make drivers/isdn/capi/capiutil.c:cdebbuf_alloc() static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:16 -07:00
Jan Nikitenko
63bd23591e au1550 SPI controller driver
Here is a driver for the Alchemy au1550 PSC (Programmable Serial
Controller) in SPI master mode.

It supports dma transfers using the Alchemy descriptor based dma controller
for 4-8 bits per word SPI transfers.  For 9-24 bits per word transfers, pio
irq based mode is used to avoid setup of dma channels from scratch on each
number of bits per word change.

Tested with au1550; this may also work on other MIPS Alchemy cpus, like
au1200/au1210/au1250.  Used extensively with SD card connected via SPI;
this handles 8.1MHz SPI clock transfers using dma without any problem (the
highest SPI clock freq possible with au1550 running on 324MHz).

The driver supports sharing of SPI bus by multiple devices.  All features
of Alchemy SPI mode are supported (all SPI modes, msb/lsb first, bits per
word in 4-24 range).

As the SPI clock of the controller depends on main input clock that shall
be configured externally, platform data structure for au1550 SPI controller
driver contains mainclk_hz attribute to define the input clock rate.  From
this value, dividers of the controller for SPI clock are set up for
required frequency.

Signed-off-by: Jan Nikitenko <jan.nikitenko@gmail.com>

Whitespace and section fixups.  Remove partial workaround for platform
setup bug in dma_mask setup; it couldn't work with multiple controllers.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:16 -07:00
David Brownell
33e34dc6ee SPI kerneldoc
Various documentation updates for the SPI infrastructure, to clarify things
that may not have been clear, to cope with lack of editing, and fix
omissions.

Also, plug SPI into the kernel-api DocBook template, and fix all the
resulting glitches in document generation.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:16 -07:00
Andrea Paterniani
814a8d50eb /dev/spidevB.C interface
Add a filesystem API for <linux/spi/spi.h> stack.  The initial version of
this interface is purely synchronous.

dbrownell@users.sourceforge.net:

 Cleaned up, bugfixed; much simplified; added preliminary documentation.

 Works with mdev given CONFIG_SYSFS_DEPRECATED; and presumably udev.

 Updated SPI_IOC_MESSAGE ioctl to full spi_message semantics, supporting
 groups of one or more transfers (each of which may be full duplex if
 desired).

 This is marked as EXPERIMENTAL with an explicit disclaimer that the API
 (notably the ioctls) is subject to change.

Signed-off-by: Andrea Paterniani <a.paterniani@swapp-eng.it>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:15 -07:00
Sergei Shtylyov
ce0be1273d clockchips.h: kernel-doc fix
Fix misnamed fields of 'struct clock_event_device' in the kernel-doc
comment.  Convert the acronyms to uppercase, while at it...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:15 -07:00
Mike Frysinger
acd64b7375 hide spinlock in linux/quota.h behind __KERNEL__
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Jan Kara <jack@ucw.cz>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:15 -07:00
David Woodhouse
6d4d8c0aa2 Add taskstats.h to kbuild
Add taskstats.h to include/linux/Kbuild so that make headers_install
picks it up.  This needs to be done because taskstats.h is a user
interface header.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:15 -07:00
Jiri Slaby
cef2cf0727 Misc: add sensable phantom driver
Add sensable phantom driver

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:14 -07:00
akpm@linux-foundation.org
f19b121e21 Driver for the Maxim DS1WM, a 1-wire bus master ASIC core
Cc: Matt Reimer <mreimer@vpop.net>

[akpm@linux-foundation.org: kconfig update]
Signed-off-by: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:14 -07:00
Randy Dunlap
63f6564d35 x86_64: kill 19000+ sparse warnings
Eliminate 19439 (!!) sparse warnings like:
include/linux/mm.h:321:22: warning: constant 0xffff810000000000 is so big it is unsigned long

Eliminate 56 sparse warnings like:
arch/x86_64/kernel/setup.c:248:16: warning: constant 0xffffffff80000000 is so big it is unsigned long

Eliminate 5 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0xfffffffffff00000 is so big it is unsigned long

Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:551:37: warning: constant 0xffffc20000000000 is so big it is unsigned long

Eliminate 6 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0xffffffff88000000 is so big it is unsigned long

Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:552:6: warning: constant 0xffffe1ffffffffff is so big it is unsigned long

Eliminate 3 sparse warnings like:
arch/x86_64/kernel/e820.c:186:17: warning: constant 0x3fffffffffff is so big it is long

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:14 -07:00
Randy Dunlap
6df95fd7ad consolidate asm/const.h to linux/const.h
Make a global linux/const.h header file instead of having multiple,
per-arch files, and convert current users of asm/const.h to use
linux/const.h.

Built on x86_64 and sparc64.

[akpm@linux-foundation.org: fix include/asm-x86_64/Kbuild]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:13 -07:00
OGAWA Hirofumi
28ec039c21 fat: don't use free_clusters for fat32
It seems that recent Windows versions changed the specification, and the
change is undocumented.  Windows doesn't update ->free_clusters correctly.

This patch doesn't use ->free_clusters by default.  (Instead, it adds
"usefree" to force its use.)

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Juergen Beisert <juergen127@kreuzholzen.de>
Cc: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:13 -07:00
David Gibson
0bb5e19d63 Clean up mostly unused IOSPACE macros
Most architectures defined three macros, MK_IOSPACE_PFN(), GET_IOSPACE()
and GET_PFN() in pgtable.h.  However, the only callers of any of these
macros are in Sparc specific code, either in arch/sparc, arch/sparc64 or
drivers/sbus.

This patch removes the redundant macros from all architectures except
sparc and sparc64.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:13 -07:00
Jan Kara
28be5abb40 ext3: copy i_flags to inode flags on write
A patch that stores inode flags such as S_IMMUTABLE, S_APPEND, etc.  from
i_flags to EXT3_I(inode)->i_flags when inode is written to disk.  The same
thing is done on GETFLAGS ioctl.

Quota code changes these flags on quota files (to make it harder for
sysadmin to screw himself) and these changes were not correctly propagated
into the filesystem (especially, lsattr did not show them and users were
wondering...).

Propagate flags such as S_APPEND, S_IMMUTABLE, etc.  from i_flags into
ext3-specific i_flags.  Hence, when someone sets these flags via a
different interface than ioctl, they are stored correctly.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:12 -07:00
Michael Ellerman
d1ab824be4 Document SPIN_LOCK_UNLOCKED/RW_LOCK_UNLOCKED deprecation
Apparently it's not cool anymore to use SPIN/RW_LOCK_UNLOCKED.  There's
some mention of this in Documentation/spinlocks.txt, but that only talks
about dynamic initialisation.

A comment in the code mentioning the preferred usage would be good IMHO.

[akpm@linux-foundation.org: add reason for deprecation]
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:11 -07:00
Pavel Emelianov
b5e618181a Introduce a handy list_first_entry macro
There are many places in the kernel where a construction like

   foo = list_entry(head->next, struct foo_struct, list);

is used.
The code looks more descriptive and neat when using the macro

   list_first_entry(head, type, member) \
             list_entry((head)->next, type, member)

Here is the macro itself, along with examples of its usage in the generic
code.  If it turns out to be useful, I can prepare a set of patches to
introduce it into arch-specific code, drivers, networking, etc.
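
As a self-contained illustration, the macro can be exercised in userspace
with a stripped-down list implementation.  This is only a sketch: the
struct list_head, container_of() and list_add_tail() below are simplified
stand-ins for the kernel versions, and init_list_head() and demo_first_id()
are hypothetical helpers added for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Map a pointer to an embedded member back to its containing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* The macro from this patch: first element of a non-empty list. */
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)

static inline void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

struct foo_struct {
	int id;
	struct list_head list;
};

/* Hypothetical helper: id of the first element; list must not be empty. */
static int demo_first_id(struct list_head *head)
{
	assert(head->next != head);
	struct foo_struct *foo = list_first_entry(head, struct foo_struct, list);
	return foo->id;
}
```

Note that the macro does not check for an empty list: on an empty list
(head)->next is the head itself, so the caller has to rule that case out
first, as demo_first_id() does.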

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:11 -07:00
Milind Arun Choudhary
b32e41bb97 SPIN_LOCK_UNLOCKED cleanup in init_task.h
SPIN_LOCK_UNLOCKED cleanup,use __SPIN_LOCK_UNLOCKED instead

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:10 -07:00
Bjorn Helgaas
873ec74615 EFI: warn only for pre-1.00 system tables
We used to warn unless the EFI system table major revision was exactly 1.
But EFI 2.00 firmware is starting to appear, and the 2.00 changes don't
affect anything in Linux.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:10 -07:00
David Brownell
49a4ec188f fix hotplug for legacy platform drivers
We've had various reports of some legacy "probe the hardware" style
platform drivers having nasty problems with hotplug support.

The core issue is that those legacy drivers don't fully conform to the
driver model.  They assume a role that should be the responsibility of
infrastructure code: creating device nodes.

The "modprobe" step in hotplugging relies on drivers to have split those
roles into different modules.  The lack of this split causes the problems.
When a driver creates nodes for devices that don't exist (sending a hotplug
event), then exits (aborting one modprobe) before the "modprobe $MODALIAS"
step completes (by failing, since it's in the middle of a modprobe), the
result can be an endless loop of modprobe invocations ...  badness.

This fix uses the newish per-device flag controlling issuance of "add"
events.  (A previous version of this patch used a per-device "driver can
hotplug" flag, which only scrubbed $MODALIAS from the environment rather
than suppressing the entire hotplug event.) It also shrinks that flag to
one bit, saving a word in "struct device".

So the net of this patch is removing some nasty failures with legacy
drivers, while retaining hotplug capability for the majority of platform
drivers.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Greg KH <gregkh@suse.de>
Cc: Andres Salomon <dilinger@debian.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:10 -07:00
Christoph Hellwig
6272e26679 cleanup compat ioctl handling
Merge all compat ioctl handling into compat_ioctl.c instead of splitting it
over compat.c and compat_ioctl.c.  This also allows us to get rid of ioctl32.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks-good-to: Andi Kleen <ak@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Ravikiran G Thirumalai
e729aa16b1 Pad irq_desc to internode cacheline size
We noticed a drop in n/w performance due to the irq_desc being cacheline
aligned rather than internode aligned.  We see 50% of expected performance
when two e1000 nics local to two different nodes have consecutive irq
descriptors allocated, due to false sharing.

Note that this patch does away with cacheline padding for the UP case, as
it does not seem useful for UP configurations.
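
The effect of the padding can be sketched in userspace C.  This is an
illustration under stated assumptions, not the kernel change: struct
demo_irq_desc, its fields and demo_desc_distance() are hypothetical, and
DEMO_INTERNODE_CACHE_BYTES is an assumed value chosen for the example.

```c
#include <stddef.h>

/* Assumed internode cacheline size for this sketch only. */
#define DEMO_INTERNODE_CACHE_BYTES 4096

/*
 * Hypothetical descriptor.  The aligned attribute raises both the
 * alignment and the size of the struct to a multiple of the internode
 * cacheline size, so consecutive array elements never share one.
 */
struct demo_irq_desc {
	unsigned int status;
	unsigned int depth;
	unsigned long irq_count;
} __attribute__((aligned(DEMO_INTERNODE_CACHE_BYTES)));

static struct demo_irq_desc demo_descs[2];

/* Compile-time check that each descriptor fills whole internode lines. */
_Static_assert(sizeof(struct demo_irq_desc) % DEMO_INTERNODE_CACHE_BYTES == 0,
	       "descriptor must be padded to the internode cacheline size");

/* Byte distance between two consecutive descriptors. */
static size_t demo_desc_distance(void)
{
	return (size_t)((char *)&demo_descs[1] - (char *)&demo_descs[0]);
}
```

With this layout, two devices whose descriptors are adjacent in the array
touch different internode cachelines, which is what removes the false
sharing.  The cost is a larger array, which is why dropping the padding on
UP makes sense.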

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Christoph Hellwig
644fd4f5de merge compat_ioctl.h into compat_ioctl.c
Now that there is no arch-specific compat ioctl handling left, there is no
point in having a separate compat_ioctl.h, so merge it into compat_ioctl.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Alexey Dobriyan
7e80d0d0b6 i386: sched.h inclusion from module.h is baack
linux/module.h
  -> linux/elf.h
     -> asm-i386/elf.h
        -> linux/utsname.h
           -> linux/sched.h

Noticeably cut the number of files that are rebuilt upon touching sched.h
and cut down the junk pulled in by every module.h inclusion.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:08 -07:00
Thomas Gleixner
0e8638e2ac Deprecate SA_xxx interrupt flags -V2
The deprecation of the SA_xxx interrupt flags did not emit deprecated
warnings. Andrew said about the removal of the deprecated flag defines:

> This is going to break a lot of external stuff.  We should have found
> a way to make usage of SA_* emit deprecated warnings (or _some_
> warning) to warn people of impending doom.  But I can't immediately
> find a way of doing that. if we _can_ find a way of doing this, I
> suspect we'll need to do it, and give people another six months.  It's
> going to get ugly out there.  We shall see...

Define the deprecated flags as a call to a __deprecated inline function
so a warning is emitted at compile time.
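
The technique can be sketched as follows.  This is a minimal illustration
that assumes a GCC-compatible compiler; the names DEMO_IRQF_SHARED,
DEMO_SA_SHIRQ and demo_deprecated_irq_flag() are hypothetical stand-ins and
are not the identifiers used in the patch.

```c
/* New-style flag: a plain constant, no warning when used. */
#define DEMO_IRQF_SHARED 0x00000080UL

/*
 * Identity function marked deprecated.  It returns the flag unchanged,
 * but every call site makes the compiler emit a deprecation warning.
 */
static inline unsigned long __attribute__((deprecated))
demo_deprecated_irq_flag(unsigned long flag)
{
	return flag;
}

/* Old-style name: same value, but each use now warns at compile time. */
#define DEMO_SA_SHIRQ demo_deprecated_irq_flag(DEMO_IRQF_SHARED)
```

Code that still passes DEMO_SA_SHIRQ keeps building and behaves the same,
but the build log points at every remaining use.  One side effect of this
approach is that the old name expands to a function call, so it can no
longer be used where a constant expression is required, such as a static
initializer.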

Extend the reprieve for out-of-tree drivers to 9/2007.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:08 -07:00