Commit Graph

121 Commits

Author SHA1 Message Date
Christoph Hellwig
6b2f3d1f76 vfs: Implement proper O_SYNC semantics
While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
since that day we had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
patches which are required before this patch it's actually surprisingly
simple, we just need to figure out when to set the datasync flag to
vfs_fsync_range and when not.

This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
numerical value to keep binary compatibility, and adds a new real O_SYNC
flag.  To guarantee backwards compatiblity it is defined as expanding to
both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
sure we are backwards-compatible when compiled against the new headers.

This also means that all places that don't care about the differences can
just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actuall care need to check __O_SYNC in addition.  Drivers and
network filesystems have been updated in a fail safe way to always do the
full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
lower layers are kept that way for now to stay failsafe.

We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
to make sure we always get these sane options.

Note that parisc really screwed up their headers as they already define a
O_DSYNC that has always been a no-op.  We try to repair it by using it for
the new O_DSYNC and redefinining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:50 +01:00
Linus Torvalds
4ef58d4e2a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
  tree-wide: fix misspelling of "definition" in comments
  reiserfs: fix misspelling of "journaled"
  doc: Fix a typo in slub.txt.
  inotify: remove superfluous return code check
  hdlc: spelling fix in find_pvc() comment
  doc: fix regulator docs cut-and-pasteism
  mtd: Fix comment in Kconfig
  doc: Fix IRQ chip docs
  tree-wide: fix assorted typos all over the place
  drivers/ata/libata-sff.c: comment spelling fixes
  fix typos/grammos in Documentation/edac.txt
  sysctl: add missing comments
  fs/debugfs/inode.c: fix comment typos
  sgivwfb: Make use of ARRAY_SIZE.
  sky2: fix sky2_link_down copy/paste comment error
  tree-wide: fix typos "couter" -> "counter"
  tree-wide: fix typos "offest" -> "offset"
  fix kerneldoc for set_irq_msi()
  spidev: fix double "of of" in comment
  comment typo fix: sybsystem -> subsystem
  ...
2009-12-09 19:43:33 -08:00
André Goddard Rosa
af901ca181 tree-wide: fix assorted typos all over the place
That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 15:39:55 +01:00
Frederic Weisbecker
205153aa40 mem_class: Drop the bkl from memory_open()
The generic open callback for the mem class devices is "protected" by
the bkl.

Let's look at the datas manipulated inside memory_open:

- inode and file: safe
- the devlist: safe because it is constant
- the memdev classes inside this array are safe too (constant)

After we find out which memdev file operation we need to use, we call
its open callback. Depending on the targeted memdev, we call either
open_port() that doesn't manipulate any racy data (just a capable()
check), or we call nothing.

So it's safe to remove the big kernel lock there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1255113062-5835-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-14 17:36:49 +02:00
Alexey Dobriyan
f0f37e2f77 const: mark struct vm_struct_operations
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 11:39:25 -07:00
Nikanth Karthikesan
bb521c5de0 /dev/zero: avoid repeated access_ok() checks
In read_zero, we check for access_ok() once for the count bytes.  It is
unnecessarily checked again in clear_user.  Use __clear_user, which does
not check for access_ok().

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:03 -07:00
Kay Sievers
e454cea20b Driver-Core: extend devnode callbacks to provide permissions
This allows subsytems to provide devtmpfs with non-default permissions
for the device node. Instead of the default mode of 0600, null, zero,
random, urandom, full, tty, ptmx now have a mode of 0666, which allows
non-privileged processes to access standard device nodes in case no
other userspace process applies the expected permissions.

This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-19 12:50:38 -07:00
Jin Dongming
162dd42124 mem_class: fix bug
When I build and boot -next on fedora 10, I can not login anymore.
When I input the user name and password, the system does not output
any message and requires user to input the user name and password
again and again.

I find the patch which caused this problem with "GIT BISECT" command.
And the patch is
    commit 7c4b7daa1878972ed0137c95f23569124bd6e2b1
    "mem_class: use minor as index instead of searching the array".

Though I don't know the real reason why user could not login, I
confirmed the patch I made as following could resolve the problem on
fedora 10.

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15 09:50:47 -07:00
Kay Sievers
389e0cb9a1 mem_class: use minor as index instead of searching the array
Declare the device list with the minor numbers as the index, which saves us from
searching for a matching list entry. Remove old devfs permissions declaration.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15 09:50:47 -07:00
Jens Axboe
d993831fa7 writeback: add name to backing_dev_info
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:26 +02:00
Adriano dos Santos Fernandes
d6f47befdd drivers/char/mem.c: memory_open() cleanup: lookup minor device number from devlist
memory_open() ignores devlist and does a switch for each item, duplicating
code and conditional definitions.

Clean it up by adding backing_dev_info to devlist and use it to lookup for
the minor device.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Adriano dos Santos Fernandes <adrianosf@uol.com.br>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:54 -07:00
Linus Torvalds
2b83868723 Make /dev/zero reads interruptible by signals
This helps with bad latencies for large reads from /dev/zero, but might
conceivably break some application that "knows" that a read of /dev/zero
cannot return early.  So do this early in the merge window to give us
maximal test coverage, even if the patch is totally trivial.

Obviously, no well-behaved application should ever depend on the read
being uninterruptible, but hey, bugs happen.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-09 20:40:25 -07:00
Salman Qazi
730c586ad5 drivers/char/mem.c: avoid OOM lockup during large reads from /dev/zero
While running 20 parallel instances of dd as follows:

  #!/bin/bash
  for i in `seq 1 20`; do
           dd if=/dev/zero of=/export/hda3/dd_$i bs=1073741824 count=1 &
  done
  wait

on a 16G machine, we noticed that rather than just killing the processes,
the entire kernel went down.  Stracing dd reveals that it first does an
mmap2, which makes 1GB worth of zero page mappings.  Then it performs a
read on those pages from /dev/zero, and finally it performs a write.

The machine died during the reads.  Looking at the code, it was noticed
that /dev/zero's read operation had been changed by
557ed1fa26 ("remove ZERO_PAGE") from giving
zero page mappings to actually zeroing the page.

The zeroing of the pages causes physical pages to be allocated to the
process.  But, when the process exhausts all the memory that it can, the
kernel cannot kill it, as it is still in the kernel mode allocating more
memory.  Consequently, the kernel eventually crashes.

To fix this, I propose that when a fatal signal is pending during
/dev/zero read operation, we simply return and let the user process die.

Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Modified error return and comment trivially.  - Linus]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-04 15:20:39 -07:00
Suresh Siddha
0c3c8a1836 x86, PAT: Remove duplicate memtype reserve in devmem mmap
/dev/mem mmap code was doing memtype reserve/free for a while now.
Recently we added memtype tracking in remap_pfn_range, and /dev/mem mmap
uses it indirectly. So, we don't need seperate tracking in /dev/mem code
any more. That means another ~100 lines of code removed :-).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <20090409212709.085210000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10 13:55:48 +02:00
KOSAKI Motohiro
69beeb1d34 mm: make vread() and vwrite() declaration
Sparse output following warnings.

mm/vmalloc.c:1436:6: warning: symbol 'vread' was not declared. Should it be static?
mm/vmalloc.c:1474:6: warning: symbol 'vwrite' was not declared. Should it be static?

However, it is used by /dev/kmem. fixed here.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:05 -08:00
Greg Kroah-Hartman
03457cd455 device create: char: convert device_create_drvdata to device_create
Now that device_create() has been audited, rename things back to the
original call to be sane.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16 09:24:42 -07:00
Rik van Riel
7ae8ed5053 use generic_access_phys for /dev/mem mappings
Use generic_access_phys as the access_process_vm access function for
/dev/mem mappings.  This makes it possible to debug the X server.

[akpm@linux-foundation.org: repair all the architectures which broke]
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Greg Kroah-Hartman
47aa5793f7 device create: char: convert device_create to device_create_drvdata
device_create() is race-prone, so use the race-free
device_create_drvdata() instead as device_create() is going away.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:41 -07:00
Ingo Molnar
d092633bff Subject: devmem, x86: fix rename of CONFIG_NONPROMISC_DEVMEM
From: Arjan van de Ven <arjan@infradead.org>
Date: Sat, 19 Jul 2008 15:47:17 -0700

CONFIG_NONPROMISC_DEVMEM was a rather confusing name - but renaming it
to CONFIG_PROMISC_DEVMEM causes problems on architectures that do not
support this feature; this patch renames it to CONFIG_STRICT_DEVMEM,
so that architectures can opt-in into it.

( the polarity of the option is still the same as it was originally; it
  needs to be for now to not break architectures that don't have the
  infastructure yet to support this feature)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: "V.Radhakrishnan" <rk@atr-labs.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
2008-07-20 08:35:55 +02:00
Ingo Molnar
64d206d896 x86: rename CONFIG_NONPROMISC_DEVMEM to CONFIG_PROMISC_DEVMEM
Linus observed:

> The real bug is that we shouldn't have "double negatives", and
> certainly not negative config options. Making that "promiscuous
> /dev/mem" option a negated thing as a config option was bad.

right ... lets rename this option. There should never be a negation
in config options.

[ that reminds me of CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER, but that
  is for another commit ;-) ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 00:28:57 +02:00
Jonathan Corbet
1f439647a4 mem: cdev lock_kernel() pushdown
It's really hard to tell if this is necessary - lots of weird
magic happens by way of map_devmem()

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:48 -06:00
Arjan van de Ven
b781ecb6a3 make /dev/kmem a config option
Make /dev/kmem a config option; /dev/kmem is VERY rarely used, and when
used, it's generally for no good (rootkits tend to be the most common
users).  With this config option, users have the choice to disable
/dev/kmem, saving some size as well.

A patch to disable /dev/kmem has been in the Fedora and RHEL kernels for
4+ years now without any known problems or legit users of /dev/kmem.

[akpm@linux-foundation.org: make CONFIG_DEVKMEM default to y]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
venkatesh.pallipadi@intel.com
e7f260a276 x86: PAT use reserve free memtype in mmap of /dev/mem
Use reserve_memtype and free_memtype wrappers for /dev/mem mmaps. The memtype
is slightly complicated here, given that we have to support existing X mappings.
We fallback on UC_MINUS for that.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com
f0970c13b6 x86: PAT phys_mem_access_prot_allowed for dev/mem mmap
Introduce phys_mem_access_prot_allowed(), which checks whether the mapping
is possible, without any conflicts and returns success or failure based on that.
phys_mem_access_prot() by itself does not allow failure case. This ability
to return error is needed for PAT where we may have aliasing conflicts.

x86 setup __HAVE_PHYS_MEM_ACCESS_PROT and move x86 specific code out of
/dev/mem into arch specific area.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
venkatesh.pallipadi@intel.com
e045fb2a98 x86: PAT avoid aliasing in /dev/mem read/write
Add xlate and unxlate around /dev/mem read/write. This sets up the mapping
that can be used for /dev/mem read and write without aliasing worries.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
Venki Pallipadi
e2beb3eae6 devmem: add range_is_allowed() check to mmap of /dev/mem
Earlier patch that introduced CONFIG_NONPROMISC_DEVMEM, did the
range_is_allowed() check only for read and write. Add range_is_allowed()
check to mmap of /dev/mem as well.

Changes the paramaters of range_is_allowed() to pfn and size to handle
more than 32 bits of physical address on 32 bit arch cleanly.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-24 23:40:47 +02:00
Arjan van de Ven
ae531c26c5 x86: introduce /dev/mem restrictions with a config option
This patch introduces a restriction on /dev/mem: Only non-memory can be
read or written unless the newly introduced config option is set.

The X server needs access to /dev/mem for the PCI space, but it doesn't need
access to memory; both the file permissions and SELinux permissions of /dev/mem
just make X effectively super-super powerful. With the exception of the
BIOS area, there's just no valid app that uses /dev/mem on actual memory.
Other popular users of /dev/mem are rootkits and the like.
(note: mmap access of memory via /dev/mem was already not allowed since
a really long time)

People who want to use /dev/mem for kernel debugging can enable the config
option.

The restrictions of this patch have been in the Fedora and RHEL kernels for
at least 4 years without any problems.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-24 23:40:47 +02:00
Al Viro
ca5cd877ae x86 merge fallout: uml
Don't undef __i386__/__x86_64__ in uml anymore, make sure that (few) places
that required adjusting the ifdefs got those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-29 07:41:32 -07:00
Peter Zijlstra
e0bf68ddec mm: bdi init hooks
provide BDI constructor/destructor hooks

[akpm@linux-foundation.org: compile fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Nick Piggin
557ed1fa26 remove ZERO_PAGE
The commit b5810039a5 contains the note

  A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
  (and thus mapcounted and count towards shared rss).  These writes to
  the struct page could cause excessive cacheline bouncing on big
  systems.  There are a number of ways this could be addressed if it is
  an issue.

And indeed this cacheline bouncing has shown up on large SGI systems.
There was a situation where an Altix system was essentially livelocked
tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
This situation can be avoided in userspace, but it does highlight the
potential scalability problem with refcounting ZERO_PAGE, and corner
cases where it can really hurt (we don't want the system to livelock!).

There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely

I will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see.

Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
false optimisation: if an application is performance critical, it would
not be doing many read faults of new memory, or at least it could be
expected to write to that memory soon afterwards. If cache or memory use
is critical, it should not be working with a significant number of
ZERO_PAGEs anyway (a more compact representation of zeroes should be
used).

As a sanity check -- mesuring on my desktop system, there are never many
mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
increase much without it.

When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second when running kbuild, while keeping it only saves
less than 1 page clearing operation per second. 1 page clear is cheaper
than a thousand faults, presumably, so there isn't an obvious loss.

Neither the logical argument nor these basic tests give a guarantee of no
regressions. However, this is a reasonable opportunity to try to remove
the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
we can reintroduce it and just avoid refcounting it.

The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
much use to them except on benchmarks.  All other users of ZERO_PAGE are
converted just to use ZERO_PAGE(0) for simplicity. We can look at
replacing them all and maybe ripping out ZERO_PAGE completely when we are
more satisfied with this solution.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:53 -07:00
Linus Torvalds
0f166396e7 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (62 commits)
  [MIPS] PNX8550: Cleanup proc code.
  [MIPS] WRPPMC: Fix build.
  [MIPS] Yosemite: Fix modpost warnings.
  [MIPS] Change names of local variables to silence sparse
  [MIPS] SB1: Fix modpost warning.
  [MIPS] PNX: Fix modpost warnings.
  [MIPS] Alchemy: Fix modpost warnings.
  [MIPS] Non-FPAFF: Fix warning.
  [MIPS] DEC: Fix modpost warning.
  [MIPS] MIPSsim: Enable MIPSsim virtual network driver.
  [MIPS] Delete Ocelot 3 support.
  [MIPS] remove LASAT Networks platforms support
  [MIPS] Early check for SMTC kernel on non-MT processor
  [MIPS] Add debugfs files to show fpuemu statistics
  [MIPS] Add some debugfs files to debug unaligned accesses
  [MIPS] rbtx4938: Fix secondary PCIC and glue internal NICs
  [MIPS] tc35815: Load MAC address via platform_device
  [MIPS] Move FPU affinity code into separate file.
  [MIPS] Make ioremap() work on TX39/49 special unmapped segment
  [MIPS] rbtx4938: Update and minimize defconfig
  ...
2007-07-10 14:48:43 -07:00
Ralf Baechle
24e9d0b96d [MIPS] Hook for platforms to define cachability of /dev/mem regions
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-07-10 17:32:56 +01:00
Jens Axboe
d6b29d7cee splice: divorce the splice structure/function definitions from the pipe header
We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Russell King
4f911d64e0 Make /dev/port conditional on config symbol
Instead of having /dev/port support dependent in multiple places on a
string of preprocessor symbols, define a new configuration directive for
it.  This ensures that all four places remain consistent with each other.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Randy Dunlap
e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Benjamin Herrenschmidt
8a93258ce3 fix bogon in /dev/mem mmap'ing on nommu
While digging through my MAP_FIXED changes, I found that rather obvious
bug in /dev/mem mmap implementation for nommu archs. get_unmapped_area()
is expected to return an address, not a pfn.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-17 16:36:27 -07:00
Linus Torvalds
6d3154cc11 Revert "[PATCH] Fix up mmap_kmem"
This reverts commit 99a10a60ba.

As per Hugh Dickins:

  "Nadia Derbey has reported that mmap of /dev/kmem no longer works with
   the kernel virtual address as offset, and Franck has confirmed that
   his patch came from a misunderstanding of what an offset means to
   /dev/kmem - whereas his patch description seems to say that he was
   correcting the offset on a few plaforms, there was no such problem to
   correct, and his patch was in fact changing its API on all platforms."

Suggested-by: Hugh Dickins <hugh@veritas.com>
Cc: Franck Bui-Huu <fbuihuu@gmail.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Andi Kleen <ak@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-22 08:53:24 -08:00
Hugh Dickins
5fcf7bb73f [PATCH] read_zero_pagealigned() locking fix
Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
bugzilla 7645.  Right: read_zero_pagealigned uses down_read of mmap_sem,
but another thread's racing read of /dev/zero, or a normal fault, can
easily set that pte again, in between zap_page_range and zeromap_page_range
getting there.  It's been wrong ever since 2.4.3.

The simple fix is to use down_write instead, but that would serialize reads
of /dev/zero more than at present: perhaps some app would be badly
affected.  So instead let zeromap_page_range return the error instead of
BUG_ON, and read_zero_pagealigned break to the slower clear_user loop in
that case - there's no need to optimize for it.

Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other user of
zeromap_page_range), though it really isn't interesting there.  And since
mmap_zero wants -EAGAIN for out-of-memory, the zeromaps better return that
than -ENOMEM.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Ramiro Voicu: <Ramiro.Voicu@cern.ch>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:55:39 -08:00
Josef Sipek
a7113a9662 [PATCH] struct path: convert char-drivers
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:44 -08:00
Greg Kroah-Hartman
ebf644c462 Driver core: change mem class_devices to be real devices
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-01 14:52:00 -08:00
Linus Torvalds
b8a3ad5b53 Include proper header file for PFN_DOWN()
The recent commit (99a10a60ba) to fix up
mmap_kmem() broke compiles because it used PFN_DOWN() without including
<linux/pfn.h>.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-13 08:42:10 -07:00
Franck Bui-Huu
99a10a60ba [PATCH] Fix up mmap_kmem
vma->vm_pgoff is an pfn _offset_ relatif to the begining
of the memory start. The previous code was doing at first:

	vma->vm_pgoff << PAGE_SHIFT

which results into a wrong physical address since some
platforms have a physical mem start that can be different
from 0. After that the previous call __pa() on this
wrong physical address, however __pa() is used to convert
a _virtual_ address into a physical one.

This patch rewrites this convertion. It calculates the
pfn of PAGE_OFFSET which is the pfn of the mem start
then it adds the vma->vm_pgoff to it.

It also uses virt_to_phys() instead of __pa() since the
latter shouldn't be used by drivers.

Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-13 08:35:38 -07:00
Geoff Levand
153dcc54df [PATCH] mem driver: fix conditional on isa i/o support
This change corrects the logic on the preprocessor conditionals that
include support for ISA port i/o (/dev/ioports) into the mem character
driver.

This fixes the following error when building for powerpc platforms with
CONFIG_PCI=n.

  drivers/built-in.o: undefined reference to `pci_io_base'

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Linas Vepstas <lins@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:22 -07:00
David Howells
5da6185bca [PATCH] NOMMU: Set BDI capabilities for /dev/mem and /dev/kmem
Set the backing device info capabilities for /dev/mem and /dev/kmem to
permit direct sharing under no-MMU conditions and full mapping capabilities
under MMU conditions.  Make the BDI used by these available to all directly
mappable character devices.

Also comment the capabilities for /dev/zero.

[akpm@osdl.org: ifdef reductions]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:14 -07:00
Lennert Buytenhek
06c67befee [PATCH] make valid_mmap_phys_addr_range() take a pfn
Newer ARMs have a 40 bit physical address space, but mapping physical
memory above 4G needs a special page table format which we (currently?) do
not use for userspace mappings, so what happens instead is that mapping an
address >= 4G will happily discard the upper bits and wrap.

There is a valid_mmap_phys_addr_range() arch hook where we could check for
>= 4G addresses and deny the mapping, but this hook takes an unsigned long
address:

	static inline int valid_mmap_phys_addr_range(unsigned long addr, size_t size);

And drivers/char/mem.c:mmap_mem() calls it like this:

	static int mmap_mem(struct file * file, struct vm_area_struct * vma)
	{
		size_t size = vma->vm_end - vma->vm_start;

		if (!valid_mmap_phys_addr_range(vma->vm_pgoff << PAGE_SHIFT, size))

So that's not much help either.

This patch makes the hook take a pfn instead of a phys address.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:25 -07:00
Arjan van de Ven
62322d2554 [PATCH] make more file_operation structs static
Mark the static struct file_operations in drivers/char as const.  Making
them const prevents accidental bugs, and moves them to the .rodata section
so that they no longer do any false sharing; in addition with the proper
debug option they are then protected against corruption..

[akpm@osdl.org: build fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 15:26:59 -07:00
Jörn Engel
6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Greg Kroah-Hartman
ff23eca3e8 [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
Also fixes up all files that #include it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman
7c69ef7974 [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree
Removes the devfs_mk_cdev() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:07 -07:00
Jens Axboe
1ebd32fc54 [PATCH] splice: add ->splice_write support for /dev/null
Useful for testing.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 14:40:08 +02:00
Arjan van de Ven
99ac48f54a [PATCH] mark f_ops const in the inode
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
Bjorn Helgaas
136939a2b5 [PATCH] EFI, /dev/mem: simplify efi_mem_attribute_range()
Pass the size, not a pointer to the size, to efi_mem_attribute_range().

This function validates memory regions for the /dev/mem read/write/mmap paths.
The pointer allows arches to reduce the size of the range, but I think that's
unnecessary complexity.  Simplifying it will let me use
efi_mem_attribute_range() to improve the ia64 ioremap() implementation.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Jan Beulich
c654d60e8f [PATCH] adjust /dev/{kmem,mem,port} write handlers
The /dev/mem and /dev/kmem write handlers weren't fully POSIX compliant in
that they wouldn't always force the file pointer to be updated when
returning success status.

The /dev/port write handler was inconsistent with the /dev/mem and
/dev/kmem handlers in that when encountering a -EFAULT condition after
already having written a number of items it would return -EFAULT rather
than the number of bytes written.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:55 -08:00
Stephen Rothwell
ee2cdecec4 [PATCH] powerpc: iSeries fixes for build with no PCI
This reverts part of "ppc64 iSeries: allow build with no PCI"
(145d01e428) which affected generic code
and applies a fix in the arch specific code.

Commit "partly merge iseries do_IRQ"
(5fee9b3b39eb55c7e3619a3b36ceeabffeb8f144) introduced iSeries_get_irq
which was only available if CONFIG_PCI is set.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-12 20:09:30 +11:00
Jes Sorensen
1b1dcc1b57 [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar <mingo@elte.hu>

(finished the conversion)

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2006-01-09 15:59:24 -08:00
Bjorn Helgaas
80851ef2a5 [PATCH] /dev/mem: validate mmap requests
Add a hook so architectures can validate /dev/mem mmap requests.

This is analogous to validation we already perform in the read/write
paths.

The identity mapping scheme used on ia64 requires that each 16MB or
64MB granule be accessed with exactly one attribute (write-back or
uncacheable).  This avoids "attribute aliasing", which can cause a
machine check.

Sample problem scenario:
  - Machine supports VGA, so it has uncacheable (UC) MMIO at 640K-768K
  - efi_memmap_init() discards any write-back (WB) memory in the first granule
  - Application (e.g., "hwinfo") mmaps /dev/mem, offset 0
  - hwinfo receives UC mapping (the default, since memmap says "no WB here")
  - Machine check abort (on chipsets that don't support UC access to WB
    memory, e.g., sx1000)

In the scenario above, the only choices are
  - Use WB for hwinfo mmap.  Can't do this because it causes attribute
    aliasing with the UC mapping for the VGA MMIO space.
  - Use UC for hwinfo mmap.  Can't do this because the chipset may not
    support UC for that region.
  - Disallow the hwinfo mmap with -EINVAL.  That's what this patch does.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:02 -08:00
Bjorn Helgaas
44ac841390 [PATCH] /dev/mem __HAVE_PHYS_MEM_ACCESS_PROT tidy-up
Tidy up __HAVE_PHYS_MEM_ACCESS_PROT usage to make mmap_mem() easier to read.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:02 -08:00
Guillaume Chazarain
cd140a5c1f [PATCH] kmsg_write: don't return printk return value
kmsg_write returns with printk, so some programs may be confused by a
successful write() with a return value different than the buffer length.

# /bin/echo something > /dev/kmsg
/bin/echo: write error: Inappropriate ioctl for device

The drawbacks is that the printk return value can no more be quickly
checked from userspace.

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:53 -08:00
Linus Torvalds
6aab341e0a mm: re-architect the VM_UNPAGED logic
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP.  It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.

Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way.  It just works.  As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.

Sparc update from David in the next commit.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:34:23 -08:00
Hugh Dickins
f57e88a8d8 [PATCH] unpaged: ZERO_PAGE in VM_UNPAGED
It's strange enough to be looking out for anonymous pages in VM_UNPAGED areas,
let's not insert the ZERO_PAGE there - though whether it would matter will
depend on what we decide about ZERO_PAGE refcounting.

But whereas do_anonymous_page may (exceptionally) be called on a VM_UNPAGED
area, do_no_page should never be: just BUG_ON.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:13:42 -08:00
Paul Mackerras
23fd07750a Merge ../linux-2.6 by hand 2005-10-31 13:37:12 +11:00
Roland Dreier
8b150478ae [PATCH] ppc: make phys_mem_access_prot() work with pfns instead of addresses
Change the phys_mem_access_prot() function to take a pfn instead of an
address.  This allows mmap64() to work on /dev/mem for addresses above 4G
on 32-bit architectures.  We start with a pfn in mmap_mem(), so there's no
need to convert to an address; in fact, it's actively bad, since the
conversion can overflow when the address is above 4G.

Similarly fix the ppc32 page_is_ram() function to avoid a conversion to an
address by directly comparing to max_pfn.  Working with max_pfn instead of
high_memory fixes page_is_ram() to give the right answer for highmem pages.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-10-29 14:25:49 +10:00
Greg Kroah-Hartman
53f4654272 [PATCH] Driver Core: fix up all callers of class_device_create()
The previous patch adding the ability to nest struct class_device
changed the paramaters to the call class_device_create().  This patch
fixes up all in-kernel users of the function.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-10-28 09:52:52 -07:00
Christoph Hellwig
edf83015fc [PATCH] remove a dead extern in mem.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:23 -07:00
Linus Torvalds
4bb82551e1 Fix up mmap of /dev/kmem
This leaves the issue of whether we should deprecate the whole thing (or
if we should check the whole mmap range, for that matter) open. Just do
the minimal fix for now.
2005-08-13 14:22:59 -07:00
Maneesh Soni
72414d3f1d [PATCH] kexec code cleanup
o Following patch provides purely cosmetic changes and corrects CodingStyle
  guide lines related certain issues like below in kexec related files

  o braces for one line "if" statements, "for" loops,
  o more than 80 column wide lines,
  o No space after "while", "for" and "switch" key words

o Changes:
  o take-2: Removed the extra tab before "case" key words.
  o take-3: Put operator at the end of line and space before "*/"

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:55 -07:00
Vivek Goyal
315c215c0a [PATCH] kdump: cleanups for dump file access in linear raw format
Removed the dependency on backup region.  Now all the information is encoded
in ELF format.  /dev/oldmem is a dummy interface.  User space tool need to be
intelligent enough to parse the elf headers and read the relevant memory areas
with the help of /dev/oldmem.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:54 -07:00
Vivek Goyal
50b1fdbd81 [PATCH] kdump: Accessing dump file in linear raw format (/dev/oldmem)
Hariprasad Nellitheertha <hari@in.ibm.com>

This patch contains the code that enables us to access the previous kernel's
memory as /dev/oldmem.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 16:24:54 -07:00
Stephen Rothwell
145d01e428 [PATCH] ppc64 iSeries: allow build with no PCI
This patch allows iSeries to build with CONFIG_PCI=n.  This is useful for
partitions that have only virtual I/O.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:31 -07:00
gregkh@suse.de
ca8eca6884 [PATCH] class: convert drivers/char/* to use the new class api instead of class_simple
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:08 -07:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00