Commit Graph

465 Commits

Author SHA1 Message Date
Maciej W. Rozycki
38e0e8c055 [PATCH] char/rtc: Handle memory-mapped chips properly
Handle memory-mapped chips properly, needed for example on DECstations.
This support was in Linux 2.4 but for some reason got lost in 2.6.  This
patch is taken directly from the linux-mips repository.

[akpm@osdl.org: cleanup]
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Cc: Paul Gortmaker <penguin@muskoka.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:25 -07:00
Linus Torvalds
6fa0cb1141 Merge git://git.infradead.org/hdrinstall-2.6
* git://git.infradead.org/hdrinstall-2.6:
  Remove export of include/linux/isdn/tpam.h
  Remove <linux/i2c-id.h> and <linux/i2c-algo-ite.h> from userspace export
  Restrict headers exported to userspace for SPARC and SPARC64
  Add empty Kbuild files for 'make headers_install' in remaining arches.
  Add Kbuild file for Alpha 'make headers_install'
  Add Kbuild file for SPARC 'make headers_install'
  Add Kbuild file for IA64 'make headers_install'
  Add Kbuild file for S390 'make headers_install'
  Add Kbuild file for i386 'make headers_install'
  Add Kbuild file for x86_64 'make headers_install'
  Add Kbuild file for PowerPC 'make headers_install'
  Add generic Kbuild files for 'make headers_install'
  Basic implementation of 'make headers_check'
  Basic implementation of 'make headers_install'
2006-07-04 12:55:45 -07:00
Thomas Gleixner
f40298fddc [PATCH] irq-flags: MIPS: Use the new IRQF_ constants
Use the new IRQF_ constants and remove the SA_INTERRUPT define

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-02 13:58:47 -07:00
Catherine Zhang
877ce7c1b3 [AF_UNIX]: Datagram getpeersec
This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket.  The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets.  Basically we build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).  To retrieve the security
context, the application first indicates to the kernel such desire by
setting the SO_PASSSEC option via getsockopt.  Then the application
retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_SOCKET &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
a server socket to receive security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications.  We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:58:06 -07:00
Linus Torvalds
8d231c11fd Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (33 commits)
  [MIPS] Add missing backslashes to macro definitions.
  [MIPS] Death list of board support to be removed after 2.6.18.
  [MIPS] Remove BSD and Sys V compat data types.
  [MIPS] ioc3.h: Uses u8, so include <linux/types.h>.
  [MIPS] 74K: Assume it will also have an AR bit in config7
  [MIPS] Treat CPUs with AR bit as physically indexed.
  [MIPS] Oprofile: Support VSMP on 34K.
  [MIPS] MIPS32/MIPS64 S-cache fix and cleanup
  [MIPS] excite: PCI makefile needs to use += if it wants a chance to work.
  [MIPS] excite: plat_setup -> plat_mem_setup.
  [MIPS] au1xxx: export dbdma functions
  [MIPS] au1xxx: dbdma, no sleeping under spin_lock
  [MIPS] au1xxx: fix PSC_SMBTXRX_RSR.
  [MIPS] Early printk for IP27.
  [MIPS] Fix handling of 0 length I & D caches.
  [MIPS] Typo fixes.
  [MIPS] MIPS32/MIPS64 secondary cache management
  [MIPS] Fix FIXADDR_TOP for TX39/TX49.
  [MIPS] Remove first timer interrupt setup in wrppmc_timer_setup()
  [MIPS] Fix configuration of R2 CPU features and multithreading.
  ...
2006-06-29 13:44:45 -07:00
Ralf Baechle
8db089c6b5 [MIPS] Add missing backslashes to macro definitions.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-29 21:10:55 +01:00
Ralf Baechle
fc103349bb [MIPS] Remove BSD and Sys V compat data types.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-29 21:10:54 +01:00
Ralf Baechle
89e22d1591 [MIPS] ioc3.h: Uses u8, so include <linux/types.h>.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-29 21:10:54 +01:00
Domen Puncer
38e9156147 [MIPS] au1xxx: fix PSC_SMBTXRX_RSR.
Signed-off-by: Domen Puncer <domen.puncer@ultra.si>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-29 21:10:53 +01:00
Atsushi Nemoto
f7a849153b [MIPS] Fix FIXADDR_TOP for TX39/TX49.
FIXADDR_TOP is used for HIGHMEM and for upper limit of vmalloc area on
32bit kernel.  TX39XX and TX49XX have "reserved" segment in CKSEG3
area.  0xff000000-0xff3fffff on TX49XX and 0xff000000-0xfffeffff on
TX39XX are reserved (unmapped, uncached) therefore can not be used as
mapped area.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-29 21:10:52 +01:00
Ralf Baechle
f41ae0b2b9 [MIPS] Fix configuration of R2 CPU features and multithreading.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-29 21:10:51 +01:00
Ralf Baechle
4277ff5ee5 [MIPS] Fix use of ehb instruction for non-R2 configurations.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-29 21:10:49 +01:00
Ralf Baechle
b4ab24e1c8 [MIPS] Define ARCH_HAS_IRQ_PER_CPU for all SMP systems.
Without SMTC on non-Malta will blow up.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-29 21:10:48 +01:00
Ralf Baechle
136d47d3e1 [MIPS] Wire up tee(2).
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-29 21:10:47 +01:00
Ingo Molnar
c0ad90a32f [PATCH] genirq: add ->retrigger() irq op to consolidate hw_irq_resend()
Add ->retrigger() irq op to consolidate hw_irq_resend() implementations.
(Most architectures had it defined to NOP anyway.)

NOTE: ia64 needs testing. i386 and x86_64 tested.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:23 -07:00
Ingo Molnar
0d7012a968 [PATCH] genirq: cleanup: turn ARCH_HAS_IRQ_PER_CPU into CONFIG_IRQ_PER_CPU
Cleanup: change ARCH_HAS_IRQ_PER_CPU into a Kconfig method.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29 10:26:23 -07:00
akpm@osdl.org
838cd153a5 [PATCH] N32 sigset and __COMPAT_ENDIAN_SWAP__
I'm testing glibc on MIPS64, little-endian, N32, O32 and N64 multilibs.

Among the NPTL test failures seen are some arising from sigsuspend problems
for N32: it blocks the wrong signals, so SIGCANCEL (SIGRTMIN) is blocked
despite glibc's carefully excluding it from sets of signals to block.
Specifically, testing suggests it blocks signal N^32 instead of signal N,
so (in the example tested) blocking SIGUSR1 (17) blocks signal 49 instead.

glibc's sigset_t uses an array of unsigned long, as does the kernel.
In both cases, signal N+1 is represented as
(1UL << (N % (8 * sizeof (unsigned long)))) in word number
(N / (8 * sizeof (unsigned long))).

Thus the N32 glibc uses an array of 32-bit words and the N64 kernel uses an
array of 64-bit words.  For little-endian, the layout is the same, with
signals 1-32 in the first 4 bytes, signals 33-64 in the second, etc.; for
big-endian, userspace has that layout while in the kernel each 8 bytes have
the two halves swapped from the userspace layout.

The N32 sigsuspend syscall uses sigset_from_compat to convert the userspace
sigset to kernel format.  If __COMPAT_ENDIAN_SWAP__ is *not* set, this uses
logic of the form

  set->sig[0] = compat->sig[0] | (((long)compat->sig[1]) << 32 )

to convert the userspace sigset to a kernel one.  This looks correct to me
for both big and little endian, given that in userspace compat->sig[1] will
represent signals 33-64, and so will the high 32 bits of set->sig[0] in the
kernel.  If however __COMPAT_ENDIAN_SWAP__ *is* set, as it is for
__MIPSEL__, it uses

  set->sig[0] = compat->sig[1] | (((long)compat->sig[0]) << 32 );

which seems incorrect for both big and little endian, and would
explain the observed symptoms.

This code is the only use of __COMPAT_ENDIAN_SWAP__, so if incorrect
then that macro serves no purpose, in which case something like the
following patch would seem appropriate to remove it.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:15 -07:00
Matt Mackall
afedfd016a [PATCH] random: remove SA_SAMPLE_RANDOM from floppy driver
The floppy driver is already calling add_disk_randomness as it should, so this
was redundant.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:00 -07:00
Sergei Shtylyov
f2c780c1fd [PATCH] Au1550/1200: add missing PSC #define's, make OSS driver use the proper ones
Add missing PSC #define's required for the drivers using PSC on DBAu1550
board (also fixing Au1550 PSC3 address) and all Au1200-based boards as
well.  Make the OSS driver use the correct PSC definitions fo each board.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:56 -07:00
Ralf Baechle
d501e62bc7 [PATCH] Delete unused definitions of kvaddr_to_nid
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:52 -07:00
Bjorn Helgaas
4f1bcaf094 [PATCH] vgacon: make VGA_MAP_MEM take size, remove extra use
VGA_MAP_MEM translates to ioremap() on some architectures.  It makes sense
to do this to vga_vram_base, because we're going to access memory between
vga_vram_base and vga_vram_end.

But it doesn't really make sense to map starting at vga_vram_end, because
we aren't going to access memory starting there.  On ia64, which always has
to be different, ioremapping vga_vram_end gives you something completely
incompatible with ioremapped vga_vram_start, so vga_vram_size ends up being
nonsense.

As a bonus, we often know the size up front, so we can use ioremap()
correctly, rather than giving it a zero size.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22 15:05:58 -07:00
Linus Torvalds
cee4cca740 Merge git://git.infradead.org/hdrcleanup-2.6
* git://git.infradead.org/hdrcleanup-2.6: (63 commits)
  [S390] __FD_foo definitions.
  Switch to __s32 types in joystick.h instead of C99 types for consistency.
  Add <sys/types.h> to headers included for userspace in <linux/input.h>
  Move inclusion of <linux/compat.h> out of user scope in asm-x86_64/mtrr.h
  Remove struct fddi_statistics from user view in <linux/if_fddi.h>
  Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
  Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
  Include <linux/types.h> and use __uXX types in <linux/cramfs_fs.h>
  Use __uXX types in <linux/i2o_dev.h>, include <linux/ioctl.h> too
  Remove private struct dx_hash_info from public view in <linux/ext3_fs.h>
  Include <linux/types.h> and use __uXX types in <linux/affs_hardblocks.h>
  Use __uXX types in <linux/divert.h> for struct divert_blk et al.
  Use __u32 for elf_addr_t in <asm-powerpc/elf.h>, not u32. It's user-visible.
  Remove PPP_FCS from user view in <linux/ppp_defs.h>, remove __P mess entirely
  Use __uXX types in user-visible structures in <linux/nbd.h>
  Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
  Use __uXX types for S390 DASD volume label definitions which are user-visible
  S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
  Remove unneeded inclusion of <linux/time.h> from <linux/ufs_fs.h>
  Fix private integer types used in V4L2 ioctls.
  ...

Manually resolve conflict in include/linux/mtd/physmap.h
2006-06-20 15:10:08 -07:00
Atsushi Nemoto
1723b4a34a [MIPS] Make timer interrupt frequency configurable from kconfig.
Make HZ configurable.  DECSTATION can select 128/256/1024 HZ, JAZZ can
only select 100 HZ, others can select 100/128/250/256/1000/1024 HZ if
not explicitly specified).  Also remove all mach-xxx/param.h files and
update all defconfigs according to current HZ value.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:27 +01:00
Kumba
c3b1c2de22 [MIPS] Fix R4K cache macro names
Several machines have the R4K cache macro name spelled incorrectly.  Namely,
they have cpu_has_4kcache defined instead of cpu_has_4k_cache.

Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:26 +01:00
Kumba
2493921c28 [MIPS] Add Missing R4K Cache Macros to IP27 & IP32
Keeping in accordance with other machines, IP27 and IP32 lack a few
macros.  IP27 lacks cpu_has_4kex & cpu_has_4k_cache macros while IP32
lacks just the cpu_has_4k_cache macro.

Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:26 +01:00
Ralf Baechle
35189fad3c [MIPS] Support for the RM9000-based Basler eXcite smart camera platform.
Signed-off-by: Thomas Koeller <thomas.koeller@baslerweb.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:26 +01:00
dmitry pervushin
355c471f2f [MIPS] Support for the R5500-based NEC EMMA2RH Mark-eins board
Signed-off-by: dmitry pervushin  <dpervushin@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:26 +01:00
Thomas Bogendoerfer
4a0312fca6 [MIPS] Support SNI RM200C SNI in big endian mode and R5000 processors.
Added support for RM200C machines with big endian firmware
Added support for RM200-C40 (R5000 support)
    
Signed-off-by: Florian Lohoff <flo@rfc822.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:24 +01:00
Ralf Baechle
b00f473e1a [MIPS] SN: include asm/sn/types.h for nasid_t.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:24 +01:00
Ralf Baechle
470b160364 [MIPS] Remove support for NEC DDB5476.
As warned several times before.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:24 +01:00
Ralf Baechle
eaff388874 [MIPS] Remove support for NEC DDB5074.
As warned several times before.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:24 +01:00
Ralf Baechle
2925aba422 [MIPS] Cleanup memory managment initialization.
Historically plat_mem_setup did the entire platform initialization.  This
was rather impractical because it meant plat_mem_setup had to get away
without any kind of memory allocator.  To keep old code from breaking
plat_setup was just renamed to plat_setup and a second platform
initialization hook for anything else was introduced.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:23 +01:00
Ralf Baechle
7ab2dc41d1 [MIPS] SN: Declare bridge_pci_ops.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:23 +01:00
Ralf Baechle
f456acae4f [MIPS] IP27: Cleanup N/M mode configuration.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:23 +01:00
Ralf Baechle
0986625861 [MIPS] IP27: Throw away old unused hacks.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:22 +01:00
Ralf Baechle
d9b8d0da40 [MIPS] Drop 0 definition for kern_addr_valid
kern_addr_valid is currently only being used in kmem_ptr_validate which
is making some vague attempt at verfying the validity of an address.
Only IA-64, PARISC and x86-64 actually make some actual effort to verify
the validity of the pointer.  Most architecture definitions of
kern_addr_valid() just define it as 1; the Alpha and CONFIG_DISCONTIGMEM
on i386 and MIPS even as 0; the 0-definition will result in
kmem_ptr_validate always failing which in turn will cause d_validate to
always fail.  d_validate's only two users are smbfs and ncpfs, so the
0 definition ended breaking those ...

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:22 +01:00
Ralf Baechle
e53639d8f3 [MIPS] Consolidate definitions of pfn_valid in one file.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:22 +01:00
Rodolfo Giometti
952fa954a6 [MIPS] APM emu support
Signed-off-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:22 +01:00
Ralf Baechle
aa9772e330 [MIPS] SN: Rename SGI_SN0_N_MODE -> SGI_SN_N_MODE.
It's not SN0-specific.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:22 +01:00
Ralf Baechle
bf5a312b26 [MIPS] SN: Move FRU header one level up; it is not SN0-specific.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:22 +01:00
Ralf Baechle
3e0ba410a5 [MIPS] IP27: Remove #if 0'ed code.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:21 +01:00
Ralf Baechle
8f2f360da9 [MIPS] IP27: Nuke leftovers of _STANDALONE
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:21 +01:00
Ralf Baechle
edc123d183 [MIPS] IP27: Remove leftovers of sable support.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:21 +01:00
Ralf Baechle
8dbd1d3e65 [MIPS] IP27: Nuke last leftovers from FRUTEST
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:21 +01:00
Ralf Baechle
b383f47ec7 [MIPS] IP27: Nuke last leftovers of CONFIG_SGI_IO.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:20 +01:00
Ralf Baechle
1bd5e16168 [MIPS] Cleanup __emt() a bit.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:20 +01:00
Yoichi Yuasa
c0589f1ece [MIPS] Remove unused definitions from addrspace.h.
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:19 +01:00
Thiemo Seufer
c583122c26 [MIPS] Qemu system shutdown support
Signed-off-by: Thiemo Seufer <ths@networkno.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:19 +01:00
Atsushi Nemoto
eae89076e6 [MIPS] Unify mips_fpu_soft_struct and mips_fpu_hard_structs.
The struct mips_fpu_soft_struct and mips_fpu_hard_struct are
completely same now and the kernel fpu emulator assumes that.  This
patch unifies them to mips_fpu_struct and get rid of mips_fpu_union.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:18 +01:00
Mark.Zhan
a240a46964 [MIPS] Wind River 4KC PPMC Eval Board Support
Support for the GT-64120-based Wind River 4KC PPMC Evaluation board.

Signed-off-by: Rongkai.Zhan <Rongkai.zhan@windriver.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:18 +01:00
Atsushi Nemoto
0307e8d024 [MIPS] Fix futex_atomic_op_inuser.
I found that NPTL's pthread_cond_signal() does not work properly on
kernels compiled by gcc 4.1.x.  I suppose inline asm for
__futex_atomic_op() was wrong.  I suppose:

1. "=&r" constraint should be used for oldval.
2. Instead of "r" (uaddr), "=R" (*uaddr) for output and "R" (*uaddr)
   for input should be used.
3. "memory" should be added to the clobber list.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:17 +01:00
Atsushi Nemoto
c138e12f3a [MIPS] Fix fpu_save_double on 64-bit.
> Without this fix, _save_fp() in 64-bit kernel is seriously broken.
>
> ffffffff8010bec0 <_save_fp>:
> ffffffff8010bec0:       400d6000        mfc0    t1,c0_status
> ffffffff8010bec4:       000c7140        sll     t2,t0,0x5
> ffffffff8010bec8:       05c10011        bgez    t2,ffffffff8010bf10 <_save_fp+0x50>
> ffffffff8010becc:       00000000        nop
> ffffffff8010bed0:       f4810328        sdc1    $f1,808(a0)
> ...

Fix register usage in fpu_save_double() and make fpu_restore_double()
more symmetric with fpu_save_double().

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-19 17:39:13 +01:00
David Woodhouse
ef4d04b87d Add empty Kbuild files for 'make headers_install' in remaining arches.
These include nothing more than the basic set of files listed in
asm-generic/Kbuild.asm. Any extra arch-specific files will need to be
added.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-06-18 12:58:53 +01:00
Chad Reese
b1c231f5a5 [MIPS] Fix sparsemem support.
Move memory_present() in arch/mips/kernel/setup.c. When using sparsemem
extreme, this function does an allocate for bootmem. This would always
fail since init_bootmem hasn't been called yet.
    
Move memory_present after free_bootmem. This only marks actual memory
ranges as present instead of the entire address space.
    
Signed-off-by: Chad Reese  <creese@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-06 00:15:20 +01:00
Ralf Baechle
e32b699335 [MIPS] Fix 64-bit build for RM7000.
RM7000 has 40-bit virtual / 36-bit physical address space.
    
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-06 00:15:19 +01:00
Sergei Shtylyov
7cb710c9a6 [MIPS] Fix non-linear memory mapping on MIPS
Fix the non-linear memory mapping done via remap_file_pages() -- it
didn't work on any MIPS CPU because the page offset clashing with
_PAGE_FILE and some other page protection bits which should have been left
zeros for this kind of pages.
    
Signed-off-by: Konstantin Baydarov <kbaidarov@ru.mvista.com>
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-06 00:15:18 +01:00
Sergei Shtylyov
6ebba0e2f5 [MIPS] Fix swap entry for MIPS32 36-bit physical address
With 64-bit physical address enabled, 'swapon' was causing kernel oops on
Alchemy CPUs (MIPS32) because of the swap entry type field corrupting the
_PAGE_FILE bit in 'pte_low' field. So, switch to storing the swap entry in
'pte_high' field using all its bits except _PAGE_GLOBAL and _PAGE_VALID which
gives 25 bits for the swap entry offset.
    
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-06 00:15:16 +01:00
Sergei Shtylyov
79e0bc3725 [MIPS] Fix mprotect() syscall for MIPS32 w/36-bit physical address support
Fix mprotect() syscall for MIPS32 CPUs with 36-bit physical address
support: pte_modify() macro didn't clear the hardware page protection bits
before modifying...
    
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-06 00:15:15 +01:00
Ralf Baechle
722ace9dfb [MIPS] Fix declaration of smp_prepare_cpus() platform hook.
A while ago prom_prepare_cpus was replaced by plat_prepare_cpus but
the declaration has stayed unchanged.
    
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-06 00:15:12 +01:00
Ralf Baechle
5ee823507b [MIPS] Fix instable BogoMIPS on multi-issue processors.
Increase alignment of BogoMIPS loop to 8 bytes.  Having the delay loop
overlap cache line boundaries may cause instable delays.
    
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-06 00:15:10 +01:00
Ralf Baechle
acf518cbba [MIPS] Remove duplicate declaration of cpu_online_map.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-06 00:15:09 +01:00
Linus Torvalds
48e49ead3e Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Fix D-cache corruption in mremap
  [SPARC64]: Make smp_processor_id() functional before start_kernel()
2006-06-02 16:02:22 -07:00
David S. Miller
0b0968a3e6 [SPARC64]: Fix D-cache corruption in mremap
If we move a mapping from one virtual address to another,
and this changes the virtual color of the mapping to those
pages, we can see corrupt data due to D-cache aliasing.

Check for and deal with this by overriding the move_pte()
macro.  Set things up so that other platforms can cleanly
override the move_pte() macro too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-01 17:47:25 -07:00
Kumba
44d921b246 [MIPS] Treat R14000 like R10000.
Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-01 00:28:35 +01:00
Thiemo Seufer
ca30225e9e [MIPS] Update/Fix instruction definitions
A small bugfix for up to now unused instruction definitions, and a
somewhat larger update to cover MIPS32R2 instructions.
    
Signed-off-by: Thiemo Seufer <ths@networkno.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-01 00:28:34 +01:00
Thiemo Seufer
3301edcbd7 [MIPS] DSP and MDMX share the same config flag bit.
Clarify comment.
    
Signed-off-by: Thiemo Seufer <ths@networkno.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-01 00:28:33 +01:00
Daniel Jacobowitz
1c0c1ae4f3 [MIPS] Update struct sigcontext member names
Rename the 64-bit sc_hi and sc_lo arrays to use the same names
as the 32-bit struct sigcontext (sc_mdhi, sc_hi1, et cetera).
    
Signed-off-by: Daniel Jacobowitz <dan@codesourcery.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-01 00:28:31 +01:00
Ralf Baechle
6ee1da94c5 [MIPS] Update/fix futex assembly
o Implement futex_atomic_op_inuser() operation
 o Don't use the R10000-ll/sc bug workaround version for every processor.
   branch likely is deprecated and some historic ll/sc processors don't
   implement it.  In any case it's slow.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-01 00:28:31 +01:00
Chris Dearman
c620953c32 [MIPS] Fix detection and handling of the 74K processor.
Nothing exciting; Linux just didn't know it yet so this is most adding
a value to a case statement.
    
Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-01 00:28:30 +01:00
Sergei Shtylyov
6e9538917c [MIPS] Fix marking buddy of pte global for MIPS32 w/36-bit physical address
In case of CONFIG_64BIT_PHYS_ADDR, set_pte() and pte_clear() functions
only set _PAGE_GLOBAL bit in the pte_low field of the buddy PTEs,
forgetting to propagate ito to pte_high. Thus, the both pages might not
really be made global for the CPU (since it AND's the G-bit of the
odd / even PTEs together to decide whether they're global or not). Thus,
if only a single page is allocated via vmalloc() or ioremap(), it's not
really global for CPU (and it must be, since this is kernel mapping),
and thus its ASID is compared against the current process' one -- so,
we'll get into trouble sooner or later...  Also, pte_none() will fail
on global pages because _PAGE_GLOBAL bit is set in both pte_low and
pte_high, and pte_val() will return u64 value consisting of those fields
concateneted.
    
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-06-01 00:28:30 +01:00
David Woodhouse
5614253686 Remove unneeded _syscallX macros from user view in asm-*/unistd.h
These aren't needed by glibc or klibc, and they're broken in some cases
anyway. The uClibc folks are apparently switching over to stop using
them too (now that we agreed that they should be dropped, at least).

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-29 01:51:47 +01:00
David Woodhouse
d6754b401a Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2006-04-29 01:42:26 +01:00
Chris Dearman
7a8341969f [MIPS] 24K LV: Add core card id.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-27 15:13:50 +01:00
Atsushi Nemoto
bc81824720 [MIPS] Fix bitops for MIPS32/MIPS64 CPUs.
With recent rewrite for generic bitops, fls() for 32bit kernel with
MIPS64_CPU is broken.  Also, ffs(), fls() should be defined the same
way as the libc and compiler built-in routines (returns int instead of
unsigned long).
    
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-27 15:13:49 +01:00
David Woodhouse
62c4f0a2d5 Don't include linux/config.h from anywhere else in include/
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-26 12:56:16 +01:00
Ralf Baechle
7e3bfc7cfc [MIPS] Handle IDE PIO cache aliases on SMP.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:29 +02:00
Ralf Baechle
b4ade4bf88 [MIPS] MIPS boards: Set HZ to 100.
1000Hz will bring an FPGA CPU down on it's knees and it's even worse on
multithreaded cores.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:29 +02:00
Ralf Baechle
f088fc84f9 [MIPS] FPU affinity for MT ASE.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:28 +02:00
Ralf Baechle
41c594ab65 [MIPS] MT: Improved multithreading support.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:28 +02:00
Ralf Baechle
2600990e64 [MIPS] kpsd and other AP/SP improvements.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:27 +02:00
Ralf Baechle
a682a24170 [MIPS] Fix genrtc compilation.
Signed-off-by: Ralf Roesch <ralf.roesch@rw-gmbh.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:22 +02:00
Ralf Baechle
675055bfb5 [MIPS] Use "R" constraint for cache_op.
Gcc might emit an absolute address for the the "m" constraint which
gas unfortunately does not permit.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:21 +02:00
Ralf Baechle
d35d473c25 [MIPS] Fix the crime against humanity that mipsIRQ.S is.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:21 +02:00
Ralf Baechle
ba8990f2ae [MIPS] JMR3927 build fixes for the RTC code.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:20 +02:00
Ralf Baechle
088cf96a69 [MIPS] MV6434x: Add prototype of interrupt dispatch function.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:20 +02:00
Ralf Baechle
ac2384a855 [MIPS] it8172: Fix build of serial driver.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:19 +02:00
Ralf Baechle
93373ed4d8 [MIPS] Rewrite spurious_interrupt from assembler to C.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:18 +02:00
Ralf Baechle
a8d587a71b [MIPS] Wire up sync_file_range(2).
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:14 +02:00
Ralf Baechle
f115da9cd6 [MIPS] Wire splice syscall.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:13 +02:00
Ralf Baechle
84ada9f856 [MIPS] More SHT_* and SHF_* ELF definitions.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:13 +02:00
Ralf Baechle
91b05e6776 [MIPS] Fix vectored interrupt support in TLB exception handler generator.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:13 +02:00
Ralf Baechle
15c4f67ab8 [MIPS] Provide access functions for c0_badvaddr.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:13 +02:00
Ralf Baechle
b4d05cb9cb [MIPS] Make set_vi_srs_handler static.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-04-19 04:14:12 +02:00
Yasunori Goto
c80d79d746 [PATCH] Configurable NODES_SHIFT
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch.  Its definition is sometimes configurable.  Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree.  But it looks a bit messy.

SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has
been changeable by config.  Suitable node's number may be changed in the
future even if it is other architecture.  So, I wrote configurable node's
number.

This patch set defines just default value for each arch which needs multi
nodes except ia64.  But, it is easy to change to configurable if necessary.

On ia64 the number of nodes can be already configured in generic ia64 and SN2
config.  But, NODES_SHIFT is defined for DIG64 and HP'S machine too.  So, I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT.  It
would be simpler.

See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:39 -07:00
Linus Torvalds
d4965b3e2f Merge master.kernel.org:/home/rmk/linux-2.6-serial
* master.kernel.org:/home/rmk/linux-2.6-serial:
  [SERIAL] Provide Cirrus EP93xx AMBA PL010 serial support.
  [SERIAL] amba-pl010: allow platforms to specify modem control method
  [SERIAL] Remove obsoleted au1x00_uart driver
  [SERIAL] Small time UART configuration fix for AU1100 processor
2006-03-28 13:52:37 -08:00
Matt Mackall
da2468b6a8 [PATCH] RTC: Remove RTC UIP synchronization on MIPS MC146818
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:00 -08:00
Yoichi Yuasa
d23ee8fe6e [PATCH] mips: fixed collision of rtc function name
Fix the collision of rtc function name.

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Ingo Molnar
8f17d3a504 [PATCH] lightweight robust futexes updates
- fix: initialize the robust list(s) to NULL in copy_process.

- doc update

- cleanup: rename _inuser to _inatomic

- __user cleanups and other small cleanups

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
e9056f13bf [PATCH] lightweight robust futexes: arch defaults
This patchset provides a new (written from scratch) implementation of robust
futexes, called "lightweight robust futexes".  We believe this new
implementation is faster and simpler than the vma-based robust futex solutions
presented before, and we'd like this patchset to be adopted in the upstream
kernel.  This is version 1 of the patchset.

  Background
  ----------

What are robust futexes?  To answer that, we first need to understand what
futexes are: normal futexes are special types of locks that in the
noncontended case can be acquired/released from userspace without having to
enter the kernel.

A futex is in essence a user-space address, e.g.  a 32-bit lock variable
field.  If userspace notices contention (the lock is already owned and someone
else wants to grab it too) then the lock is marked with a value that says
"there's a waiter pending", and the sys_futex(FUTEX_WAIT) syscall is used to
wait for the other guy to release it.  The kernel creates a 'futex queue'
internally, so that it can later on match up the waiter with the waker -
without them having to know about each other.  When the owner thread releases
the futex, it notices (via the variable value) that there were waiter(s)
pending, and does the sys_futex(FUTEX_WAKE) syscall to wake them up.  Once all
waiters have taken and released the lock, the futex is again back to
'uncontended' state, and there's no in-kernel state associated with it.  The
kernel completely forgets that there ever was a futex at that address.  This
method makes futexes very lightweight and scalable.

"Robustness" is about dealing with crashes while holding a lock: if a process
exits prematurely while holding a pthread_mutex_t lock that is also shared
with some other process (e.g.  yum segfaults while holding a pthread_mutex_t,
or yum is kill -9-ed), then waiters for that lock need to be notified that the
last owner of the lock exited in some irregular way.

To solve such types of problems, "robust mutex" userspace APIs were created:
pthread_mutex_lock() returns an error value if the owner exits prematurely -
and the new owner can decide whether the data protected by the lock can be
recovered safely.

There is a big conceptual problem with futex based mutexes though: it is the
kernel that destroys the owner task (e.g.  due to a SEGFAULT), but the kernel
cannot help with the cleanup: if there is no 'futex queue' (and in most cases
there is none, futexes being fast lightweight locks) then the kernel has no
information to clean up after the held lock!  Userspace has no chance to clean
up after the lock either - userspace is the one that crashes, so it has no
opportunity to clean up.  Catch-22.

In practice, when e.g.  yum is kill -9-ed (or segfaults), a system reboot is
needed to release that futex based lock.  This is one of the leading
bugreports against yum.

To solve this problem, 'Robust Futex' patches were created and presented on
lkml: the one written by Todd Kneisel and David Singleton is the most advanced
at the moment.  These patches all tried to extend the futex abstraction by
registering futex-based locks in the kernel - and thus give the kernel a
chance to clean up.

E.g.  in David Singleton's robust-futex-6.patch, there are 3 new syscall
variants to sys_futex(): FUTEX_REGISTER, FUTEX_DEREGISTER and FUTEX_RECOVER.
The kernel attaches such robust futexes to vmas (via
vma->vm_file->f_mapping->robust_head), and at do_exit() time, all vmas are
searched to see whether they have a robust_head set.

Lots of work went into the vma-based robust-futex patch, and recently it has
improved significantly, but unfortunately it still has two fundamental
problems left:

 - they have quite complex locking and race scenarios.  The vma-based
   patches had been pending for years, but they are still not completely
   reliable.

 - they have to scan _every_ vma at sys_exit() time, per thread!

The second disadvantage is a real killer: pthread_exit() takes around 1
microsecond on Linux, but with thousands (or tens of thousands) of vmas every
pthread_exit() takes a millisecond or more, also totally destroying the CPU's
L1 and L2 caches!

This is very much noticeable even for normal process sys_exit_group() calls:
the kernel has to do the vma scanning unconditionally!  (this is because the
kernel has no knowledge about how many robust futexes there are to be cleaned
up, because a robust futex might have been registered in another task, and the
futex variable might have been simply mmap()-ed into this process's address
space).

This huge overhead forced the creation of CONFIG_FUTEX_ROBUST, but worse than
that: the overhead makes robust futexes impractical for any type of generic
Linux distribution.

So it became clear to us, something had to be done.  Last week, when Thomas
Gleixner tried to fix up the vma-based robust futex patch in the -rt tree, he
found a handful of new races and we were talking about it and were analyzing
the situation.  At that point a fundamentally different solution occured to
me.  This patchset (written in the past couple of days) implements that new
solution.  Be warned though - the patchset does things we normally dont do in
Linux, so some might find the approach disturbing.  Parental advice
recommended ;-)

  New approach to robust futexes
  ------------------------------

At the heart of this new approach there is a per-thread private list of robust
locks that userspace is holding (maintained by glibc) - which userspace list
is registered with the kernel via a new syscall [this registration happens at
most once per thread lifetime].  At do_exit() time, the kernel checks this
user-space list: are there any robust futex locks to be cleaned up?

In the common case, at do_exit() time, there is no list registered, so the
cost of robust futexes is just a simple current->robust_list != NULL
comparison.  If the thread has registered a list, then normally the list is
empty.  If the thread/process crashed or terminated in some incorrect way then
the list might be non-empty: in this case the kernel carefully walks the list
[not trusting it], and marks all locks that are owned by this thread with the
FUTEX_OWNER_DEAD bit, and wakes up one waiter (if any).

The list is guaranteed to be private and per-thread, so it's lockless.  There
is one race possible though: since adding to and removing from the list is
done after the futex is acquired by glibc, there is a few instructions window
for the thread (or process) to die there, leaving the futex hung.  To protect
against this possibility, userspace (glibc) also maintains a simple per-thread
'list_op_pending' field, to allow the kernel to clean up if the thread dies
after acquiring the lock, but just before it could have added itself to the
list.  Glibc sets this list_op_pending field before it tries to acquire the
futex, and clears it after the list-add (or list-remove) has finished.

That's all that is needed - all the rest of robust-futex cleanup is done in
userspace [just like with the previous patches].

Ulrich Drepper has implemented the necessary glibc support for this new
mechanism, which fully enables robust mutexes.  (Ulrich plans to commit these
changes to glibc-HEAD later today.)

Key differences of this userspace-list based approach, compared to the vma
based method:

 - it's much, much faster: at thread exit time, there's no need to loop
   over every vma (!), which the VM-based method has to do.  Only a very
   simple 'is the list empty' op is done.

 - no VM changes are needed - 'struct address_space' is left alone.

 - no registration of individual locks is needed: robust mutexes dont need
   any extra per-lock syscalls.  Robust mutexes thus become a very lightweight
   primitive - so they dont force the application designer to do a hard choice
   between performance and robustness - robust mutexes are just as fast.

 - no per-lock kernel allocation happens.

 - no resource limits are needed.

 - no kernel-space recovery call (FUTEX_RECOVER) is needed.

 - the implementation and the locking is "obvious", and there are no
   interactions with the VM.

  Performance
  -----------

I have benchmarked the time needed for the kernel to process a list of 1
million (!) held locks, using the new method [on a 2GHz CPU]:

 - with FUTEX_WAIT set [contended mutex]: 130 msecs
 - without FUTEX_WAIT set [uncontended mutex]: 30 msecs

I have also measured an approach where glibc does the lock notification [which
it currently does for !pshared robust mutexes], and that took 256 msecs -
clearly slower, due to the 1 million FUTEX_WAKE syscalls userspace had to do.

(1 million held locks are unheard of - we expect at most a handful of locks to
be held at a time.  Nevertheless it's nice to know that this approach scales
nicely.)

  Implementation details
  ----------------------

The patch adds two new syscalls: one to register the userspace list, and one
to query the registered list pointer:

 asmlinkage long
 sys_set_robust_list(struct robust_list_head __user *head,
                     size_t len);

 asmlinkage long
 sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr,
                     size_t __user *len_ptr);

List registration is very fast: the pointer is simply stored in
current->robust_list.  [Note that in the future, if robust futexes become
widespread, we could extend sys_clone() to register a robust-list head for new
threads, without the need of another syscall.]

So there is virtually zero overhead for tasks not using robust futexes, and
even for robust futex users, there is only one extra syscall per thread
lifetime, and the cleanup operation, if it happens, is fast and
straightforward.  The kernel doesnt have any internal distinction between
robust and normal futexes.

If a futex is found to be held at exit time, the kernel sets the highest bit
of the futex word:

	#define FUTEX_OWNER_DIED        0x40000000

and wakes up the next futex waiter (if any). User-space does the rest of
the cleanup.

Otherwise, robust futexes are acquired by glibc by putting the TID into the
futex field atomically.  Waiters set the FUTEX_WAITERS bit:

	#define FUTEX_WAITERS           0x80000000

and the remaining bits are for the TID.

  Testing, architecture support
  -----------------------------

I've tested the new syscalls on x86 and x86_64, and have made sure the parsing
of the userspace list is robust [ ;-) ] even if the list is deliberately
corrupted.

i386 and x86_64 syscalls are wired up at the moment, and Ulrich has tested the
new glibc code (on x86_64 and i386), and it works for his robust-mutex
testcases.

All other architectures should build just fine too - but they wont have the
new syscalls yet.

Architectures need to implement the new futex_atomic_cmpxchg_inuser() inline
function before writing up the syscalls (that function returns -ENOSYS right
now).

This patch:

Add placeholder futex_atomic_cmpxchg_inuser() implementations to every
architecture that supports futexes.  It returns -ENOSYS.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
62ac285f3c [PATCH] mips: add ptr_to_compat()
Add ptr_to_compat() - needed by the new robust futex code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00