Commit Graph

1309 Commits

Author SHA1 Message Date
Pekka Enberg
ad2c1604da [PATCH] fat: Remove duplicate directory scanning code
This patch removes duplicate directory scanning code from fs/fat/dir.c.  The
two functions that share identical code are fat_readdirx() and
fat_search_long().  This patch also renames fat_readdirx to __fat_readdir().

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:32 -08:00
OGAWA Hirofumi
9131dd4256 [PATCH] fat: remove the unneeded vfat_find() in vfat_rename()
Now, vfat_rename() is using vfat_find() for sanity check.  This removes that
sanity check, the cost of sanity check is too high.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:32 -08:00
OGAWA Hirofumi
451cbaa1c3 [PATCH] fat: cleanup and optimization of checksum
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:32 -08:00
Tim Schmielau
4e57b68178 [PATCH] fix missing includes
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.

In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch.  This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other.  So if any
hunk rejects or gets in the way of other patches, just drop it.  My scripts
will pick it up again in the next round.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:32 -08:00
Andrew Morton
a3e713b5fd [PATCH] __bread oops fix
If a filesystem passes an idiotic blocksize into bread(), __getblk_slow() will
warn and will return NULL.  We have a report (from Hubert Tonneau
<hubert.tonneau@fullpliant.org>) of isofs_fill_super() doing this (passing in
a silly block size) against an unplugged CDROM drive.

But a couple of __getblk_slow() callers forgot to check for the NULL bh, hence
oops.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:27 -08:00
Alexey Dobriyan
b3099b48da [PATCH] fs/attr.c: remove BUG()
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:27 -08:00
Glauber de Oliveira Costa
2973dfdb87 [PATCH] Test for sb_getblk return value
This patch adds tests for the return value of sb_getblk() in the ext2/3
filesystems.  In fs/buffer.c it is stated that the getblk() function never
fails.  However, it does can return NULL in some situations due to I/O
errors, which may lead us to NULL pointer dereferences

Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:26 -08:00
Andrea Arcangeli
7f04c26d71 [PATCH] fix nr_unused accounting, and avoid recursing in iput with I_WILL_FREE set
list_move(&inode->i_list, &inode_in_use);
 		} else {
 			list_move(&inode->i_list, &inode_unused);
+			inodes_stat.nr_unused++;
 		}
 	}
 	wake_up_inode(inode);

Are you sure the above diff is correct? It was added somewhere between
2.6.5 and 2.6.8. I think it's wrong.

The only way I can imagine the i_count to be zero in the above path, is
that I_WILL_FREE is set.  And if I_WILL_FREE is set, then we must not
increase nr_unused.  So I believe the above change is buggy and it will
definitely overstate the number of unused inodes and it should be backed
out.

Note that __writeback_single_inode before calling __sync_single_inode, can
drop the spinlock and we can have both the dirty and locked bitflags clear
here:

		spin_unlock(&inode_lock);
		__wait_on_inode(inode);
		iput(inode);
XXXXXXX
		spin_lock(&inode_lock);
	}
	use inode again here

a construct like the above makes zero sense from a reference counting
standpoint.

Either we don't ever use the inode again after the iput, or the
inode_lock should be taken _before_ executing the iput (i.e. a __iput
would be required). Taking the inode_lock after iput means the iget was
useless if we keep using the inode after the iput.

So the only chance the 2.6 was safe to call __writeback_single_inode
with the i_count == 0, is that I_WILL_FREE is set (I_WILL_FREE will
prevent the VM to free the inode in XXXXX).

Potentially calling the above iput with I_WILL_FREE was also wrong
because it would recurse in iput_final (the second mainline bug).

The below (untested) patch fixes the nr_unused accounting, avoids recursing
in iput when I_WILL_FREE is set and makes sure (with the BUG_ON) that we
don't corrupt memory and that all holders that don't set I_WILL_FREE, keeps
a reference on the inode!

Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:26 -08:00
Ben Dooks
381be25458 [PATCH] ext3: sparse fixes
Fix warnings from sparse due to un-declared functions that should either
have a header file or have been declared static

 fs/ext2/bitmap.c:14:15: warning: symbol 'ext2_count_free' was not declared. Should it be static?
 fs/ext2/namei.c:92:15: warning: symbol 'ext2_get_parent' was not declared. Should it be static?
 fs/ext3/bitmap.c:15:15: warning: symbol 'ext3_count_free' was not declared. Should it be static?
 fs/ext3/namei.c:1013:15: warning: symbol 'ext3_get_parent' was not declared. Should it be static?
 fs/ext3/xattr.c:214:1: warning: symbol 'ext3_xattr_block_get' was not declared. Should it be static?
 fs/ext3/xattr.c:358:1: warning: symbol 'ext3_xattr_block_list' was not declared. Should it be static?
 fs/ext3/xattr.c:630:1: warning: symbol 'ext3_xattr_block_find' was not declared. Should it be static?
 fs/ext3/xattr.c:863:1: warning: symbol 'ext3_xattr_ibody_find' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:25 -08:00
Oleg Nesterov
1291cf4163 [PATCH] fix de_thread() vs do_coredump() deadlock
de_thread() sends SIGKILL to all sub-threads and waits them to die in 'D'
state.  It is possible that one of the threads already dequeued coredump
signal.  When de_thread() unlocks ->sighand->lock that thread can enter
do_coredump()->coredump_wait() and cause a deadlock.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:25 -08:00
Miklos Szeredi
1779381dea [PATCH] fuse: spelling fixes
Correct some typos and inconsistent use of "initialise" vs "initialize" in
comments.  Reported by Ioannis Barkas.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:24 -08:00
Glauber de Oliveira Costa
5b11687924 [PATCH] Locking problems while EXT3FS_DEBUG on
I noticed some problems while running ext3 with the debug flag set on.
More precisely, I was unable to umount the filesystem.  Some investigation
took me to the patch that follows.

At a first glance , the lock/unlock I've taken out seems really not
necessary, as the main code (outside debug) does not lock the super.  The
only additional danger operations that debug code introduces seems to be
related to bitmap, but bitmap operations tends to be all atomic anyway.

I also took the opportunity to fix 2 spelling errors.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:23 -08:00
Oleg Nesterov
2384f55f8a [PATCH] coredump_wait() cleanup
This patch deletes pointless code from coredump_wait().

1. It does useless mm->core_waiters inc/dec under mm->mmap_sem,
   but any changes to ->core_waiters have no effect until we drop
   ->mmap_sem.

2. It calls yield() for absolutely unknown reason.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:23 -08:00
Kirill Korotaev
e954365971 [PATCH] proc: fix of error path in proc_get_inode()
This patch fixes incorrect error path in proc_get_inode(), when module
can't be get due to being unloaded.  When try_module_get() fails, this
function puts de(!) and still returns inode with non-getted de.

There are still unresolved known bugs in proc yet to be fixed:
- proc_dir_entry tree is managed without any serialization
- create_proc_entry() doesn't setup de->owner anyhow,
   so setting it later manually is inatomic.
- looks like almost all modules do not care whether
   it's de->owner is set...

Signed-Off-By: Denis Lunev <den@sw.ru>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:21 -08:00
Miklos Szeredi
f12ec44070 [PATCH] fuse: clean up dead code related to nfs exporting
Remove last remains of NFS exportability support.

The code is actually buggy (as reported by Akshat Aranya), since 'alias'
will be leaked if it's non-null and alias->d_flags has DCACHE_DISCONNECTED.

This is not an active bug, since there will never be any disconnected
dentries.  But it's better to get rid of the unnecessary complexity anyway.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:21 -08:00
Oleg Nesterov
932aeafbe8 [PATCH] fix de_thread vs it_real_fn() deadlock
de_thread() calls del_timer_sync(->real_timer) under ->sighand->siglock.
This is deadlockable, it_real_fn sends a signal and needs this lock too.

Also, delete unneeded ->real_timer.data assignment.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:19 -08:00
Eric Dumazet
2f51201662 [PATCH] reduce sizeof(struct file)
Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
in a memory location that is not anymore used at call_rcu(&f->f_rcuhead,
file_free_rcu) time, to reduce the size of this critical kernel object.

The trick I used is to move f_rcuhead and f_list in an union called f_u

The callers are changed so that f_rcuhead becomes f_u.fu_rcuhead and f_list
becomes f_u.f_list

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:19 -08:00
Miklos Szeredi
42e50a5a69 [PATCH] open: cleanup in lookup_flags()
lookup_flags() is only called from the non-create case, so it needn't check
for O_CREAT|O_EXCL.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:18 -08:00
Eric W. Biederman
a928972864 [PATCH] Don't uselessly export task_struct to userspace in core dumps
task_struct is an internal structure to the kernel with a lot of good
information, that is probably interesting in core dumps.  However there is
no way for user space to know what format that information is in making it
useless.

I grepped the GDB 6.3 source code and NT_TASKSTRUCT while defined is not
used anywhere else.  So I would be surprised if anyone notices it is
missing.

In addition exporting kernel pointers to all the interesting kernel data
structures sounds like the very definition of an information leak.  I
haven't a clue what someone with evil intentions could do with that
information, but in any attack against the kernel it looks like this is the
perfect tool for aiming that attack.

So since NT_TASKSTRUCT is useless as currently defined and is potentially
dangerous, let's just not export it.

(akpm: Daniel Jacobowitz <dan@debian.org> "would be amazed" if anything was
using NT_TASKSTRUCT).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:18 -08:00
Christoph Hellwig
9c0cbd54ce [PATCH] TIOC* compat ioctl handling
TIOCSTART and TIOCSTOP are defined in asm/ioctls.h and asm/termios.h by
various architectures but not actually implemented anywhere but in the IRIX
compatibility layer, so remove their COMPATIBLE_IOCTL from parisc, ppc64
and sparc64.

Move the TIOCSLTC COMPATIBLE_IOCTL to common code, guided by an ifdef to
only show up on architectures that support it (same as the code handling it
in tty_ioctl.c), aswell as it's brother TIOCGLTC that wasn't handled so
far.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:17 -08:00
Oleg Nesterov
9e4e23bccb [PATCH] little de_thread() cleanup
Trivial, saves one 'if' branch in de_thread().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:17 -08:00
James Lamanna
833d304b22 [PATCH] reiserfs: [kv]free() checking cleanup
Signed-off-by: James Lamanna <jlamanna@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:17 -08:00
Jan Kara
aaa4059bc2 [PATCH] ext3: Fix unmapped buffers in transaction's lists
Fix the problem (BUG 4964) with unmapped buffers in transaction's
t_sync_data list.  The problem is we need to call filesystem's own
invalidatepage() from block_write_full_page().

block_write_full_page() must call filesystem's invalidatepage().  Otherwise
following nasty race can happen:

   proc 1                                        proc 2
   ------                                        ------
- write some new data to 'offset'
  => bh gets to the transactions data list
                                              - starts truncate
                                                => i_size set to new size
- mpage_writepages()
  - ext3_ordered_writepage() to 'offset'
    - block_write_full_page()
      - page->index > end_index+1
        - block_invalidatepage()
          - discard_buffer()
            - clear_buffer_mapped()

- commit triggers and finds unmapped buffer - BOOM!

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:17 -08:00
James Morris
d381d8a9a0 [PATCH] SELinux: canonicalize getxattr()
This patch allows SELinux to canonicalize the value returned from
getxattr() via the security_inode_getsecurity() hook, which is called after
the fs level getxattr() function.

The purpose of this is to allow the in-core security context for an inode
to override the on-disk value.  This could happen in cases such as
upgrading a system to a different labeling form (e.g.  standard SELinux to
MLS) without needing to do a full relabel of the filesystem.

In such cases, we want getxattr() to return the canonical security context
that the kernel is using rather than what is stored on disk.

The implementation hooks into the inode_getsecurity(), adding another
parameter to indicate the result of the preceding fs-level getxattr() call,
so that SELinux knows whether to compare a value obtained from disk with
the kernel value.

We also now allow getxattr() to work for mountpoint labeled filesystems
(i.e.  mount with option context=foo_t), as we are able to return the
kernel value to the user.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:11 -08:00
Brian Gerst
0d078f6f96 [PATCH] CONFIG_IA32
Add CONFIG_X86_32 for i386.  This allows selecting options that only apply
to 32-bit systems.

(X86 && !X86_64) becomes X86_32
(X86 ||  X86_64) becomes X86

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:10 -08:00
Trond Myklebust
d3f8cf4899 [PATCH] NFS: Remove unbalanced spin_unlock() calls from nfs_refresh_inode()
Doh!

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 14:46:47 -08:00
Anton Altaparmakov
07b188ab77 Merge branch 'master' of /usr/src/ntfs-2.6/ 2005-10-30 21:00:04 +00:00
Adam Litke
2e9b367c22 [PATCH] hugetlb: overcommit accounting check
Basic overcommit checking for hugetlb_file_map() based on an implementation
used with demand faulting in SLES9.

Since demand faulting can't guarantee the availability of pages at mmap
time, this patch implements a basic sanity check to ensure that the number
of huge pages required to satisfy the mmap are currently available.
Despite the obvious race, I think it is a good start on doing proper
accounting.  I'd like to work towards an accounting system that mimics the
semantics of normal pages (especially for the MAP_PRIVATE/COW case).  That
work is underway and builds on what this patch starts.

Huge page shared memory segments are simpler and still maintain their
commit on shmget semantics.

Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:43 -07:00
Adam Litke
4c88726597 [PATCH] hugetlb: demand fault handler
Below is a patch to implement demand faulting for huge pages.  The main
motivation for changing from prefaulting to demand faulting is so that huge
page memory areas can be allocated according to NUMA policy.

Thanks to consolidated hugetlb code, switching the behavior requires changing
only one fault handler.  The bulk of the patch just moves the logic from
hugelb_prefault() to hugetlb_pte_fault() and find_get_huge_page().

Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:43 -07:00
Christoph Hellwig
0b1533f67c [PATCH] cleanup hugelbfs_forget_inode
Reformat hugelbfs_forget_inode and add the missing but harmless
write_inode_now call.  It looks the same as generic_forget_inode now except
for the call to truncate_hugepages instead of truncate_inode_pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:43 -07:00
Christoph Hellwig
6b09b9df05 [PATCH] kill hugelbfs_do_delete_inode
hugetlbfs_do_delete_inode is the same as generic_delete_inode now, so remove
it in favour of the latter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:43 -07:00
Christoph Hellwig
149f4211af [PATCH] hugetlbfs: clean up hugetlbfs_delete_inode
Make hugetlbfs looks the same as generic_detelte_inode, fixing a bunch of
missing updates to it at the same time.  Rename it to
hugetlbfs_do_delete_inode and add a real hugetlbfs_delete_inode that
implements ->delete_inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:43 -07:00
Christoph Hellwig
96527980d4 [PATCH] hugetlbfs: move free_inodes accounting
Move hugetlbfs accounting into ->alloc_inode / ->destroy_inode.  This keeps
the code simpler, fixes a loeak where a failing inode allocation wouldn't
decrement the counter and moves hugetlbfs_delete_inode and
hugetlbfs_forget_inode closer to their generic counterparts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:43 -07:00
Hugh Dickins
4c21e2f244 [PATCH] mm: split page table lock
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.

This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock.  (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)

In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.

Splitting the lock is not quite for free: another cacheline access.  Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS.  But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.

There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:42 -07:00
Hugh Dickins
deceb6cd17 [PATCH] mm: follow_page with inner ptlock
Final step in pushing down common core's page_table_lock.  follow_page no
longer wants caller to hold page_table_lock, uses pte_offset_map_lock itself;
and so no page_table_lock is taken in get_user_pages itself.

But get_user_pages (and get_futex_key) do then need follow_page to pin the
page for them: take Daniel's suggestion of bitflags to follow_page.

Need one for WRITE, another for TOUCH (it was the accessed flag before:
vanished along with check_user_page_readable, but surely get_numa_maps is
wrong to mark every page it finds as accessed), another for GET.

And another, ANON to dispose of untouched_anonymous_page: it seems silly for
that to descend a second time, let follow_page observe if there was no page
table and return ZERO_PAGE if so.  Fix minor bug in that: check VM_LOCKED -
make_pages_present ought to make readonly anonymous present.

Give get_numa_maps a cond_resched while we're there.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:41 -07:00
Hugh Dickins
508034a32b [PATCH] mm: unmap_vmas with inner ptlock
Remove the page_table_lock from around the calls to unmap_vmas, and replace
the pte_offset_map in zap_pte_range by pte_offset_map_lock: all callers are
now safe to descend without page_table_lock.

Don't attempt fancy locking for hugepages, just take page_table_lock in
unmap_hugepage_range.  Which makes zap_hugepage_range, and the hugetlb test in
zap_page_range, redundant: unmap_vmas calls unmap_hugepage_range anyway.  Nor
does unmap_vmas have much use for its mm arg now.

The tlb_start_vma and tlb_end_vma in unmap_page_range are now called without
page_table_lock: if they're implemented at all, they typically come down to
flush_cache_range (usually done outside page_table_lock) and flush_tlb_range
(which we already audited for the mprotect case).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:41 -07:00
Hugh Dickins
705e87c0c3 [PATCH] mm: pte_offset_map_lock loops
Convert those common loops using page_table_lock on the outside and
pte_offset_map within to use just pte_offset_map_lock within instead.

These all hold mmap_sem (some exclusively, some not), so at no level can a
page table be whipped away from beneath them.  But whereas pte_alloc loops
tested with the "atomic" pmd_present, these loops are testing with pmd_none,
which on i386 PAE tests both lower and upper halves.

That's now unsafe, so add a cast into pmd_none to test only the vital lower
half: we lose a little sensitivity to a corrupt middle directory, but not
enough to worry about.  It appears that i386 and UML were the only
architectures vulnerable in this way, and pgd and pud no problem.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:40 -07:00
Hugh Dickins
c74df32c72 [PATCH] mm: ptd_alloc take ptlock
Second step in pushing down the page_table_lock.  Remove the temporary
bridging hack from __pud_alloc, __pmd_alloc, __pte_alloc: expect callers not
to hold page_table_lock, whether it's on init_mm or a user mm; take
page_table_lock internally to check if a racing task already allocated.

Convert their callers from common code.  But avoid coming back to change them
again later: instead of moving the spin_lock(&mm->page_table_lock) down,
switch over to new macros pte_alloc_map_lock and pte_unmap_unlock, which
encapsulate the mapping+locking and unlocking+unmapping together, and in the
end may use alternatives to the mm page_table_lock itself.

These callers all hold mmap_sem (some exclusively, some not), so at no level
can a page table be whipped away from beneath them; and pte_alloc uses the
"atomic" pmd_present to test whether it needs to allocate.  It appears that on
all arches we can safely descend without page_table_lock.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:40 -07:00
Hugh Dickins
365e9c87a9 [PATCH] mm: update_hiwaters just in time
update_mem_hiwater has attracted various criticisms, in particular from those
concerned with mm scalability.  Originally it was called whenever rss or
total_vm got raised.  Then many of those callsites were replaced by a timer
tick call from account_system_time.  Now Frank van Maarseveen reports that to
be found inadequate.  How about this?  Works for Frank.

Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
update_hiwater_rss and update_hiwater_vm.  Don't attempt to keep
mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
by 1): those are hot paths.  Do the opposite, update only when about to lower
rss (usually by many), or just before final accounting in do_exit.  Handle
mm->hiwater_vm in the same way, though it's much less of an issue.  Demand
that whoever collects these hiwater statistics do the work of taking the
maximum with rss or total_vm.

And there has been no collector of these hiwater statistics in the tree.  The
new convention needs an example, so match Frank's usage by adding a VmPeak
line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
(High-Water-Mark or High-Water-Memory).

There was a particular anomaly during mremap move, that hiwater_vm might be
captured too high.  A fleeting such anomaly remains, but it's quickly
corrected now, whereas before it would stick.

What locking?  None: if the app is racy then these statistics will be racy,
it's not worth any overhead to make them exact.  But whenever it suits,
hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
page_table_lock (for now) or with preemption disabled (later on): without
going to any trouble, minimize the time between reading current values and
updating, to minimize those occasions when a racing thread bumps a count up
and back down in between.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:39 -07:00
Nick Piggin
b5810039a5 [PATCH] core remove PageReserved
Remove PageReserved() calls from core code by tightening VM_RESERVED
handling in mm/ to cover PageReserved functionality.

PageReserved special casing is removed from get_page and put_page.

All setting and clearing of PageReserved is retained, and it is now flagged
in the page_alloc checks to help ensure we don't introduce any refcount
based freeing of Reserved pages.

MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
deprecated.  We never completely handled it correctly anyway, and is be
reintroduced in future if required (Hugh has a proof of concept).

Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
be trivially removed.

Last real user of PageReserved is swsusp, which uses PageReserved to
determine whether a struct page points to valid memory or not.  This still
needs to be addressed (a generic page_is_ram() should work).

A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
thus mapcounted and count towards shared rss).  These writes to the struct
page could cause excessive cacheline bouncing on big systems.  There are a
number of ways this could be addressed if it is an issue.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Refcount bug fix for filemap_xip.c

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:39 -07:00
Hugh Dickins
4294621f41 [PATCH] mm: rss = file_rss + anon_rss
I was lazy when we added anon_rss, and chose to change as few places as
possible.  So currently each anonymous page has to be counted twice, in rss
and in anon_rss.  Which won't be so good if those are atomic counts in some
configurations.

Change that around: keep file_rss and anon_rss separately, and add them
together (with get_mm_rss macro) when the total is needed - reading two
atomics is much cheaper than updating two atomics.  And update anon_rss
upfront, typically in memory.c, not tucked away in page_add_anon_rmap.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:38 -07:00
Hugh Dickins
404351e67a [PATCH] mm: mm_init set_mm_counters
How is anon_rss initialized?  In dup_mmap, and by mm_alloc's memset; but
that's not so good if an mm_counter_t is a special type.  And how is rss
initialized?  By set_mm_counter, all over the place.  Come on, we just need to
initialize them both at once by set_mm_counter in mm_init (which follows the
memcpy when forking).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:38 -07:00
Andi Kleen
dfcd3c0dc4 [PATCH] Convert mempolicies to nodemask_t
The NUMA policy code predated nodemask_t so it used open coded bitmaps.
Convert everything to nodemask_t.  Big patch, but shouldn't have any actual
behaviour changes (except I removed one unnecessary check against
node_online_map and one unnecessary BUG_ON)

Signed-off-by: "Andi Kleen" <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 21:40:35 -07:00
Pete Zaitcev
c36fc889b5 [PATCH] usb: Patch for USBDEVFS_IOCTL from 32-bit programs
Dell supplied me with the following test:

#include<stdio.h>
#include<errno.h>
#include<sys/ioctl.h>
#include<fcntl.h>
#include<linux/usbdevice_fs.h>

main(int argc,char*argv[])
{
   struct usbdevfs_hub_portinfo hubPortInfo = {0};
   struct usbdevfs_ioctl command = {0};
   command.ifno = 0;
   command.ioctl_code = USBDEVFS_HUB_PORTINFO;
   command.data = (void*)&hubPortInfo;
   int fd, ret;
   if(argc != 2) {
     fprintf(stderr,"Usage: %s /proc/bus/usb/<BusNo>/<HubID>\n",argv[0]);
     fprintf(stderr,"Example: %s /proc/bus/usb/001/001\n",argv[0]);
     exit(1);
   }
   errno = 0;
   fd = open(argv[1],O_RDWR);
   if(fd < 0) {
     perror("open failed:");
     exit(errno);
   }
   errno = 0;
   ret = ioctl(fd,USBDEVFS_IOCTL,&command);
   printf("IOCTL return status:%d\n",ret);
   if(ret<0) {
     perror("IOCTL failed:");
     close(fd);
     exit(3);
   } else {
       printf("IOCTL passed:Num of ports %d\n",hubPortInfo.nports);
       close(fd);
       exit(0);
   }
   return 0;
}

I have verified that it breaks if built in 32 bit mode on x86_64 and that
the patch below fixes it.

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-10-28 16:47:46 -07:00
Peter Osterlund
6cd37cda7e [PATCH] Fix ext3 warning for unused var
Fix compile warning in ext3 quota code.

Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 13:57:57 -07:00
Linus Torvalds
84860bf064 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 2005-10-28 13:09:47 -07:00
Linus Torvalds
8caf89157d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 2005-10-28 12:44:24 -07:00
Dave Kleikamp
7038f1cbac JFS: make sure right-most xtree pages have header.next set to zero
The xtTruncate code was only doing this for leaf pages.  When a file is
horribly fragmented, we may truncate a file leaving an internal page with
an invalid head.next field, which may cause a stale page to be referenced.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-10-28 13:27:40 -05:00
Greg KH
6fbfddcb52 Merge ../bleed-2.6 2005-10-28 10:13:16 -07:00
Greg Kroah-Hartman
53f4654272 [PATCH] Driver Core: fix up all callers of class_device_create()
The previous patch adding the ability to nest struct class_device
changed the paramaters to the call class_device_create().  This patch
fixes up all in-kernel users of the function.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-10-28 09:52:52 -07:00
Kay Sievers
a7fd67062e [PATCH] add sysfs attr to re-emit device hotplug event
A "coldplug + udevstart" can be simple like this:
  for i in /sys/block/*/*/uevent; do echo 1 > $i; done
  for i in /sys/class/*/*/uevent; do echo 1 > $i; done
  for i in /sys/bus/*/devices/*/uevent; do echo 1 > $i; done

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-10-28 09:52:51 -07:00
Linus Torvalds
0ee40c6628 Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block 2005-10-28 08:53:00 -07:00
Al Viro
c4cdd03831 [PATCH] gfp_t: reiserfs mapping_set_gfp_mask() use
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:51 -07:00
Al Viro
27496a8c67 [PATCH] gfp_t: fs/*
- ->releasepage() annotated (s/int/gfp_t), instances updated
 - missing gfp_t in fs/* added
 - fixed misannotation from the original sweep caught by bitwise checks:
   XFS used __nocast both for gfp_t and for flags used by XFS allocator.
   The latter left with unsigned int __nocast; we might want to add a
   different type for those but for now let's leave them alone.  That,
   BTW, is a case when __nocast use had been actively confusing - it had
   been used in the same code for two different and similar types, with
   no way to catch misuses.  Switch of gfp_t to bitwise had caught that
   immediately...

One tricky bit is left alone to be dealt with later - mapping->flags is
a mix of gfp_t and error indications.  Left alone for now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:47 -07:00
Al Viro
af4ca457ea [PATCH] gfp_t: infrastructure
Beginning of gfp_t annotations:

 - -Wbitwise added to CHECKFLAGS
 - old __bitwise renamed to __bitwise__
 - __bitwise defined to either __bitwise__ or nothing, depending on
   __CHECK_ENDIAN__ being defined
 - gfp_t switched from __nocast to __bitwise__
 - force cast to gfp_t added to __GFP_... constants
 - new helper - gfp_zone(); extracts zone bits out of gfp_t value and casts
   the result to int

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:46 -07:00
Chen, Kenneth W
20e5c81fcf [patch] remove gendisk->stamp_idle field
struct gendisk has these two fields: stamp, stamp_idle.  Update to
stamp_idle is always in sync with stamp and they are always the same.
Therefore, it does not add any value in having two fields tracking
same timestamp.  Suggest to remove it.

Also, we should only update gendisk stats with non-zero value.
Advantage is that we don't have to needlessly calculate memory address,
and then add zero to the content.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
2005-10-28 08:15:30 +02:00
Paul Mackerras
4542437679 Merge in v2.6.14 by hand 2005-10-28 13:38:53 +10:00
Trond Myklebust
bec273b491 NFS: Allow files that are open for write to invalidate caches
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:45 -04:00
Trond Myklebust
16c32b71bc NFSv4: Convert unnecessary XDR warning messages into dprintk()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:45 -04:00
Trond Myklebust
4f9838c7ec NFSv4: Add post-op attributes to NFSv4 write and commit callbacks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:44 -04:00
Trond Myklebust
16e429596d NFSv4: Add post-op attributes to nfs4_proc_remove()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:44 -04:00
Trond Myklebust
6caf2c8276 NFSv4: Add post-op attributes to nfs4_proc_rename()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:43 -04:00
Trond Myklebust
91ba2eeec5 NFSv4: Add post-op attributes to nfs4_proc_link()
Optimise attribute revalidation when hardlinking. Add post-op attributes
 for the directory and the original inode.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:42 -04:00
Trond Myklebust
cf80955614 NFS: Ensure that nfs_link() instantiates the dentry correctly
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:42 -04:00
Trond Myklebust
516a6af641 NFS: Add optional post-op getattr instruction to the NFSv4 file close.
"Optional" means that the close call will not fail if the getattr
 at the end of the compound fails.
 If it does succeed, try to refresh inode attributes.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:41 -04:00
Trond Myklebust
3338c143b4 NFS: Optimise attribute revalidation on close().
Only force a getattr in nfs_file_flush() if the attribute
 cache is stale.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:41 -04:00
Trond Myklebust
56ae19f38f NFSv4: Add directory post-op attributes to the CREATE operations.
Since the directory attributes change every time we CREATE a file,
 we might as well pick up the new directory attributes in the same
 compound.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:40 -04:00
Chuck Lever
0c70b50150 NFS: nfs_lookup doesn't need to revalidate the parent directory's inode
nfs_lookup() used to consult a lookup cache before trying an actual wire
 lookup operation.  The lookup cache would be invalid, of course, if the
 parent directory's mtime had changed, so nfs_lookup performed an inode
 revalidation on the parent.

 Since nfs_lookup() doesn't use a cache anymore, the revalidation is no
 longer necessary.  There are cases where it will generate a lot of
 unnecessary GETATTR traffic.

 See http://bugzilla.linux-nfs.org/show_bug.cgi?id=9

 Test-plan:
 Use lndir and "rm -rf" and watch for excess GETATTR traffic or application
 level errors.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:40 -04:00
Trond Myklebust
decf491f30 NFS: Don't let nfs_end_data_update() clobber attribute update information
Since we almost always call nfs_end_data_update() after we called
 nfs_refresh_inode(), we now end up marking the inode metadata
 as needing revalidation immediately after having updated it.

 This patch rearranges things so that we mark the inode as needing
 revalidation _before_ we call nfs_refresh_inode() on those operations
 that need it.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:39 -04:00
Trond Myklebust
33801147a8 NFS: Optimise inode attribute cache updates
Allow nfs_refresh_inode() also to update attributes on the inode if the
 RPC call was sent after the last call to nfs_update_inode().

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:39 -04:00
Trond Myklebust
913a70fc17 NFS: Convert cache_change_attribute into a jiffy-based value
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:38 -04:00
Trond Myklebust
0e574af1be NFS: Cleanup initialisation of struct nfs_fattr
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:38 -04:00
Trond Myklebust
4c2cb58c55 Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6 2005-10-27 19:12:49 -04:00
Trond Myklebust
34123da66e NFS: Fix a bad cast in nfs3_read_done
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 19:10:09 -04:00
Steve French
0753ca7bc2 [CIFS] Change pragma pack(1) to attribute(packed) to allow cifs on arm to access
unaligned structures coming in off the wire

gcc on arm processors generates very odd code with pragma pack specified -
although it does pack the structures in some sense - it does not allow you
to access unaligned elements in nested structures at the right offset as other
architectures do.  Oddly enough though, specifying the structures as packed
the long way - one by one with the packed attribute does work.  Rather than
fighting over whether this is a gcc bug or some obscure side effect
of pragma pack, it is easier to do what most (all but 96 other places in
the kernel) do - and replace pragma pack with dozens of attribute(packed)
structure qualifiers.  Much more verbose ... but at least it works.

Signed-off-by: David Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>  CG: -----------------------------------------------------------------------
2005-10-27 13:55:12 -07:00
Steve French
04290949b3 Merge with /pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-10-27 12:53:03 -07:00
Peter Wainwright
94c1d31845 [PATCH] Fix HFS+ to free up the space when a file is deleted.
fsck_hfs reveals lots of temporary files accumulating in the hidden
directory "\000\000\000HFS+ Private Data".  According to the HFS+
documentation these are files which are unlinked while in use.  However,
there may be a bug in the Linux hfsplus implementation which causes this to
happen even when the files are not in use.  It looks like the "opencnt"
field is never initialized as (I think) it should be in hfsplus_read_inode.
 This means that a file can appear to be still in use when in fact it has
been closed.  This patch seems to fix it for me.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26 10:39:43 -07:00
Anton Altaparmakov
47c564e10f Merge branch 'master' of /usr/src/ntfs-2.6/ 2005-10-24 09:19:36 +01:00
Anton Altaparmakov
c9c2009a4e NTFS: Document extended attribute ($EA) NEED_EA flag. (Based on libntfs
patch by Yura Pakhuchiy.)

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-24 09:00:51 +01:00
Anton Altaparmakov
dda65b941f NTFS: Fix compilation warnings with gcc-4.0.2 on SUSE 10.0.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-24 08:57:59 +01:00
Anton Altaparmakov
d04bd1fb60 NTFS: Use %z for size_t to fix compilation warnings. (Andrew Morton)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-24 08:41:24 +01:00
Andrew Morton
8d3b35914a [PATCH] inotify/idr leak fix
Fix a bug which was reported and diagnosed by
Stefan Jones <stefan.jones@churchillrandoms.co.uk>

IDR trees include a cache of idr_layer objects.  There's no way to destroy
this cache, so when we discard an overall idr tree we end up leaking some
memory.

Add and use idr_destroy() for this.  v9fs and infiniband also need to use
idr_destroy() to avoid leaks.

Or, we make the cache global, like radix_tree_preload().  Which is probably
better.  Later.

Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-23 16:38:39 -07:00
Kostik Belousov
8766ce4101 [PATCH] aio syscalls are not checked by lsm
Another case of missing call to security_file_permission: aio functions
(namely, io_submit) does not check credentials with security modules.

Below is the simple patch to the problem.  It seems that it is enough to
check for rights at the request submission time.

Signed-off-by: Kostik Belousov <kostikbel@gmail.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-23 16:38:38 -07:00
Paul Mackerras
985990137e Merge changes from linux-2.6 by hand 2005-10-22 16:51:34 +10:00
Steve French
d6d3f5bc68 Merge with /pub/scm/linux/kernel/git/sfrench/cifs-2.6.git/ 2005-10-21 08:39:12 -07:00
Trond Myklebust
ec07342828 NFSv4: Fix up locking for nfs4_state_owner
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-20 14:22:47 -07:00
Trond Myklebust
4e51336a00 NFSv4: Final tweak to sequence id
Sacrifice queueing fairness for performance.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-20 14:22:41 -07:00
Steve French
23e7dd7d95 [CIFS] Defer close of file handle slightly if there are pending writes that
need to get in ahead of it that depend on that file handle. Fixes
occassional bad file handle errors on write with heavy use multiple process
cases.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-20 13:44:56 -07:00
Anton Altaparmakov
d5aeaef37d NTFS: Fix serious data corruption issue when writing.
Many thanks to Alberto Patino for testing and reporting the data
      corruption.  And many apologies for corrupting his partition.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-19 12:23:10 +01:00
Anton Altaparmakov
7d0ffdb279 NTFS: $EA attributes can be both resident non-resident.
Minor tidying.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-19 12:21:19 +01:00
Anton Altaparmakov
e087a412b4 Merge branch 'master' of /usr/src/ntfs-2.6/ 2005-10-19 12:13:58 +01:00
J. Bruce Fields
1d95db8e16 NFSv4: Fix acl buffer size
resp_len is passed in as buffer size to decode routine; make sure it's
 set right in case where userspace provides less than a page's worth of
 buffer.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:41 -07:00
J. Bruce Fields
8c233cf9c2 NFSv4: handle no acl attr
Stop handing garbage to userspace in the case where a weird server clears the
 acl bit in the getattr return (despite the fact that they've already claimed
 acl support.)

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:41 -07:00
Trond Myklebust
7f709a48fa NFSv4: Fix an oopsable condition in nfs_free_seqid
Storing a pointer to the struct rpc_task in the nfs_seqid is broken
 since the nfs_seqid may be freed well after the task has been destroyed.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:39 -07:00
Trond Myklebust
6fe43f9e37 NFS: Fix rename of directory onto empty directory
If someone tries to rename a directory onto an empty directory, we
 currently fail and return EBUSY.
 This patch ensures that we try the rename if both source and target
 are directories, and that we fail with a correct error of EISDIR if
 the source is not a directory.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:22 -07:00
Trond Myklebust
4c780a4688 Fix Connectathon locking test failure
We currently fail Connectathon test 6.10 in the case of 32-bit locks due
 to incorrect error checking.
 Also add support for l->l_len < 0 to 64-bit locks.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:21 -07:00
Trond Myklebust
550f57470c NFSv4: Ensure that we recover from the OPEN + OPEN_CONFIRM BAD_STATEID race
If the server is in the unconfirmed OPEN state for a given open owner
 and receives a second OPEN for the same open owner, it will cancel the
 state of the first request and set up an OPEN_CONFIRM for the second.

 This can cause a race that is discussed in rfc3530 on page 181.

 The following patch allows the client to recover by retrying the
 original open request.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:21 -07:00
Trond Myklebust
b8e5c4c297 NFSv4: If a delegated open fails, ensure that we return the delegation
Unless of course the open fails due to permission issues.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:20 -07:00
Trond Myklebust
642ac54923 NFSv4: Return delegations in case we're changing ACLs
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:19 -07:00
Trond Myklebust
cae7a073a4 NFSv4: Return delegation upon rename or removal of file.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:19 -07:00
Trond Myklebust
cdce5d6b94 VFS: Make link_path_walk set LOOKUP_CONTINUE before calling permission().
This will allow nfs_permission() to perform additional optimizations when
 walking the path, by folding the ACCESS(MAY_EXEC) call on the directory
 into the lookup revalidation.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:18 -07:00
Trond Myklebust
6f926b5ba7 [NFS]: Check that the server returns a valid regular file to our OPEN request
Since it appears that some servers don't...

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:18 -07:00
Trond Myklebust
02a913a73b NFSv4: Eliminate nfsv4 open race...
Make NFSv4 return the fully initialized file pointer with the
 stateid that it created in the lookup w/intent.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:17 -07:00
Trond Myklebust
834f2a4a15 VFS: Allow the filesystem to return a full file pointer on open intent
This is needed by NFSv4 for atomicity reasons: our open command is in
 fact a lookup+open, so we need to be able to propagate open context
 information from lookup() into the resulting struct file's
 private_data field.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:16 -07:00
Trond Myklebust
039c4d7a82 NFS: Fix up a race in the NFS implementation of GETLK
...and fix a memory corruption bug due to improper use of memcpy() on
 a struct file_lock.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:16 -07:00
Trond Myklebust
06735b3454 NFSv4: Fix up handling of open_to_lock sequence ids
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:15 -07:00
Trond Myklebust
faf5f49c2d NFSv4: Make NFS clean up byte range locks asynchronously
Currently we fail to do so if the process was signalled.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:15 -07:00
Trond Myklebust
0a8838f972 NFSv4: Add missing handling of OPEN_CONFIRM requests on CLAIM_DELEGATE_CUR.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:14 -07:00
Trond Myklebust
83c9d41e45 NFSv4: Remove nfs4_client->cl_sem from close() path
We no longer need to worry about collisions between close() and the state
 recovery code, since the new close will automatically recheck the
 file state once it is done waiting on its sequence slot.

 Ditto for the nfs4_proc_locku() procedure.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:13 -07:00
Trond Myklebust
e6dfa553cf NFSv4: Remove obsolete state_owner and lock_owner semaphores
OPEN, CLOSE, etc no longer need these semaphores to ensure ordering of
 requests.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:13 -07:00
Trond Myklebust
9512135df1 NFSv4: Fix a potential CLOSE race
Once the state_owner and lock_owner semaphores get removed, it will be
 possible for other OPEN requests to reopen the same file if they have
 lower sequence ids than our CLOSE call.
 This patch ensures that we recheck the file state once
 nfs_wait_on_sequence() has completed waiting.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:12 -07:00
Trond Myklebust
cee54fc944 NFSv4: Add functions to order RPC calls
NFSv4 file state-changing functions such as OPEN, CLOSE, LOCK,... are all
 labelled with "sequence identifiers" in order to prevent the server from
 reordering RPC requests, as this could cause its file state to
 become out of sync with the client.

 Currently the NFS client code enforces this ordering locally using
 semaphores to restrict access to structures until the RPC call is done.
 This, of course, only works with synchronous RPC calls, since the
 user process must first grab the semaphore.
 By dropping semaphores, and instead teaching the RPC engine to hold
 the RPC calls until they are ready to be sent, we can extend this
 process to work nicely with asynchronous RPC calls too.

 This patch adds a new list called "rpc_sequence" that defines the order
 of the RPC calls to be sent. We add one such list for each state_owner.
 When an RPC call is ready to be sent, it checks if it is top of the
 rpc_sequence list. If so, it proceeds. If not, it goes back to sleep,
 and loops until it hits top of the list.
 Once the RPC call has completed, it can then bump the sequence id counter,
 and remove itself from the rpc_sequence list, and then wake up the next
 sleeper.

 Note that the state_owner sequence ids and lock_owner sequence ids are
 all indexed to the same rpc_sequence list, so OPEN, LOCK,... requests
 are all ordered w.r.t. each other.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:12 -07:00
Trond Myklebust
cff6bf9709 Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6 2005-10-18 13:50:52 -07:00
Zach Brown
4faa528528 [PATCH] aio: revert lock_kiocb()
lock_kiocb() was introduced to serialize retrying and cancellation.  In the
process of doing so it tried to sleep waiting for KIF_LOCKED while holding
the ctx_lock spinlock.  Recent fixes have ensured that multiple concurrent
retries won't be attempted for a given iocb.  Cancel has other problems and
has no significant in-tree users that have been complaining about it.  So
for the immediate future we'll revert sleeping with the lock held and will
address proper cancellation and retry serialization in the future.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 17:03:57 -07:00
David McCullough
b65574fec5 [PATCH] output of /proc/maps on nommu systems is incomplete
Currently you do not get all the map entries on nommu systems because the
start function doesn't index into the list using the value of "pos".

Signed-off-by: David McCullough <davidm@snapgear.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 17:03:57 -07:00
Trond Myklebust
6ce969171d [PATCH] NFS: Fix Oopsable/unnecessary i_count manipulations in nfs_wait_on_inode()
Oopsable since nfs_wait_on_inode() can get called as part of iput_final().

Unnecessary since the caller had better be damned sure that the inode won't
disappear from underneath it anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 14:47:16 -07:00
Trond Myklebust
b3c52da33c [PATCH] NFS: Fix cache consistency races
If the data cache has been marked as potentially invalid by nfs_refresh_inode,
we should invalidate it rather than assume that changes are due to our own
activity.

Also ensure that we always start with a valid cache before declaring it
to be protected by a delegation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17 14:47:16 -07:00
Anton Altaparmakov
7946ada30b Merge branch 'master' of /usr/src/ntfs-2.6/ 2005-10-17 15:00:34 +01:00
Yoshinori Sato
63c6764ce4 [PATCH] nommu build error fix
"proc_smaps_operations" is not defined in case of "CONFIG_MMU=n".

Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-14 17:10:13 -07:00
Steve French
84d2f07e8e CIFS: cifs_writepages should not write beyond end of file
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-12 15:32:05 -07:00
Paul Mackerras
b6ec995a21 Merge from Linus' tree 2005-10-12 14:43:32 +10:00
Steve French
47c786e79b [CIFS] Add null malloc response check in notify experimental code
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-10-11 20:03:18 -07:00
Steve French
1047abc159 [CIFS] CIFS Stats improvements
New cifs_writepages routine was not updated bytes written in cifs stats.
Also added ability to clear /proc/fs/cifs/Stats by writing (0 or 1) to it.
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-11 19:58:06 -07:00
akpm@osdl.org
6de505173e [PATCH] binfmt_elf bss padding fix
Nir Tzachar <tzachar@cs.bgu.ac.il> points out that if an ELF file specifies a
zero-length bss at a whacky address, we cannot load that binary because
padzero() tries to zero out the end of the page at the whacky address, and
that may not be writeable.

See also http://bugzilla.kernel.org/show_bug.cgi?id=5411

So teach load_elf_binary() to skip the bss settng altogether if the elf file
has a zero-length bss segment.

Cc: Roland McGrath <roland@redhat.com>
Cc: Daniel Jacobowitz <dan@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-11 09:46:54 -07:00
Andreas Gruenbacher
22c1ea44f0 [PATCH] nfsacl: Solaris VxFS compatibility fix
Here is a compatibility fix between Linux and Solaris when used with VxFS
filesystems: Solaris usually accepts acl entries in any order, but with
VxFS it replies with NFSERR_INVAL when it sees a four-entry acl that is not
in canonical form.  It may also fail with other non-canonical acls -- I
can't tell, because that case never triggers: We only send non-canonical
acls when we fake up an ACL_MASK entry.

Instead of adding fake ACL_MASK entries at the end, inserting them in the
correct position makes Solaris+VxFS happy.  The Linux client and server
sides don't care about entry order.  The three-entry-acl special case in
which we need a fake ACL_MASK entry was handled in xdr_nfsace_encode.  The
patch moves this into nfsacl_encode.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-11 09:46:54 -07:00
Latchesar Ionkov
19cba8abd6 [PATCH] v9fs: remove additional buffer allocation from v9fs_file_read and v9fs_file_write
v9fs_file_read and v9fs_file_write use kmalloc to allocate buffers as big
as the data buffer received as parameter.  kmalloc cannot be used to
allocate buffers bigger than 128K, so reading/writing data in chunks bigger
than 128k fails.

This patch reorganizes v9fs_file_read and v9fs_file_write to allocate only
buffers as big as the maximum data that can be sent in one 9P message.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-11 09:46:54 -07:00
Anton Altaparmakov
98b270362b NTFS: The big ntfs write(2) rewrite has arrived. We now implement our own
file operations ->write(), ->aio_write(), and ->writev() for regular
      files.  This replaces the old use of generic_file_write(), et al and
      the address space operations ->prepare_write and ->commit_write.
      This means that both sparse and non-sparse (unencrypted and
      uncompressed) files can now be extended using the normal write(2)
      code path.  There are two limitations at present and these are that
      we never create sparse files and that we only have limited support
      for highly fragmented files, i.e. ones whose data attribute is split
      across multiple extents.   When such a case is encountered,
      EOPNOTSUPP is returned.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-11 15:40:40 +01:00
Dave Kleikamp
b6a47fd8ff JFS: Corrupted block map should not cause trap
Replace assert statements with better error handling.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-10-11 09:06:59 -05:00
Anton Altaparmakov
29f5f3c141 NTFS: Remove address space operations ->prepare_write and ->commit_write in
preparation for the big rewrite of write(2) support in ntfs.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-11 14:59:40 +01:00
Anton Altaparmakov
29b8990513 NTFS: In attrib.c::ntfs_attr_set() call balance_dirty_pages_ratelimited()
and cond_resched() in the main loop as we could be dirtying a lot of
      pages and this ensures we play nice with the VM and the system as a
      whole.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-11 14:54:42 +01:00
Anton Altaparmakov
29d8699ebb Merge branch 'master' of /usr/src/ntfs-2.6/ 2005-10-11 09:29:48 +01:00
Steve French
4ca9c190d9 [CIFS] Fix oops in experimental notify code (when CONFIG_CIFS_EXPERIMENTAL
was turned on).

Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-10 19:52:13 -07:00
Steve French
34210f3302 [CIFS] Still missing a line from previous fix
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-10-10 14:31:13 -07:00
Steve French
9e2e85f82f [CIFS] Fix minor build problem with previous changeset
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-10 14:28:38 -07:00
Steve French
b387eaeb66 [CIFS] Do not shrink tcp sndbuf/rcvbuf from their defaults
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-10 14:21:15 -07:00
Steve French
5e1253b501 [CIFS] Correct cifs tcp retry when some data sent before getting EAGAIN.
Continue implementation of cifs umount begin to allow force unmounts of
cifs mounts.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-10 14:06:37 -07:00
Steve French
02c37a6df5 [CIFS] Update cifs version to 1.38
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-10 11:49:37 -07:00
Steve French
190fdeb844 [CIFS] Fix byte range locking to Windows when Windows server returns
illegal RFC1001 length (which had caused the lock to block forever
until killed).
2005-10-10 11:48:26 -07:00
Steve French
0ae0efada3 [CIFS] Fix rsize calculation so that large readx flag is checked.
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-10-10 10:57:19 -07:00
Steve French
68058e7575 [CIFS] Reduce CIFS tcp congestion timeout (it was too long) and backoff
ever longer amounts (up to 15 seconds).  This improves performance
especially when using large wsize.

Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-10-10 10:34:22 -07:00
Tom Zanussi
1cc956e12a [PATCH] relayfs: fix bogus param value in call to vmap
The third param in this call to vmap shouldn't be GFP_KERNEL, which
makes no sense, but rather VM_MAP.  Thanks to Al Viro for spotting
this.

Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10 08:39:50 -07:00
Al Viro
dd0fc66fb3 [PATCH] gfp flags annotations - part 1
- added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08 15:00:57 -07:00
Steve French
131afd0b74 [CIFS] /proc/fs/cifs debug code cleanup and new stats2
These changes to debug code and new stats are helpful in
debugging potential tcp performance/configuration problems under cifs.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-07 09:51:05 -07:00
Linus Torvalds
8298411468 Avoid 'names_cache' memory leak with CONFIG_AUDITSYSCALL
The nameidata "last.name" is always allocated with "__getname()", and
should always be free'd with "__putname()".

Using "putname()" without the underscores will leak memory, because the
allocation will have been hidden from the AUDITSYSCALL code.

Arguably the real bug is that the AUDITSYSCALL code is really broken,
but in the meantime this fixes the problem people see.

Reported by Robert Derr, patch by Rick Lindsley.

Acked-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-06 21:54:21 -07:00
Steve French
dd99cd803d [CIFS] cleanup sparse and compile errors in previous fix
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-10-05 19:32:49 -07:00
Steve French
4a77118cd5 CIFS: Allow wsize to exceed CIFSMaxBufSize
This allows cifs_writepages to send data in larger chunks from the page
cache, without requiring larger memory allocations in other cases.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-05 15:14:33 -07:00
Steve French
37c0eb4677 CIFS: implement cifs_writepages to perform multi-page I/O
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-05 14:50:29 -07:00
Steve French
6148a742b2 CIFS: Create routine find_writable_file to reduce redundant code
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2005-10-05 12:23:19 -07:00
Al Viro
c2b513dfbb [PATCH] bfs iget() abuses
bfs_fill_super() walks the inode table to get the bitmap of free inodes
and collect stats.  It has no business using iget() for that - it's a
lot of extra work, extra icache pollution and more complex code.
Switched to walking the damn thing directly.

Note: that also allows to kill ->i_dsk_ino in there - separate patch if
Tigran can confirm that this field can be zero only for deleted inodes
(i.e.  something that could only be found during that scan and not by
normal lookups).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-04 13:22:01 -07:00
Alexey Dobriyan
ce0fe7e70a [PATCH] bfs endianness annotations
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-04 13:22:01 -07:00
Anton Altaparmakov
e9438250b6 NTFS: Enable ATTR_SIZE attribute changes in ntfs_setattr(). This completes
the initial implementation of file truncation.  Now both open(2)ing
      a file with the O_TRUNC flag and the {,f}truncate(2) system calls
      will resize a file appropriately.  The limitations are that only
      uncompressed and unencrypted files are supported.  Also, there is
      only very limited support for highly fragmented files (the ones whose
      $DATA attribute is split into multiple attribute extents).

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 16:01:06 +01:00
Anton Altaparmakov
dd072330d1 NTFS: Implement fs/ntfs/inode.[hc]::ntfs_truncate(). It only supports
uncompressed and unencrypted files.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 15:39:02 +01:00
Anton Altaparmakov
2d86829b84 NTFS: Add fs/ntfs/attrib.[hc]::ntfs_attr_extend_allocation(), a function to
extend the allocation of an attributes.  Optionally, the data size,
      but not the initialized size can be extended, too.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 15:18:56 +01:00
Anton Altaparmakov
2a6fc4e1b0 NTFS: Fix ntfs_attr_make_non_resident() to update the vfs inode i_blocks
which is zero for a resident attribute but should no longer be zero
      once the attribute is non-resident as it then has real clusters
      allocated.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 14:57:15 +01:00
Anton Altaparmakov
8925d4f0d3 NTFS: Change ntfs_attr_make_non_resident to take the attribute value size
as an extra parameter.  This is needed since we need to know the size
      before we can map the mft record and our callers always know it.  The
      reason we cannot simply read the size from the vfs inode i_size is
      that this is not necessarily uptodate.  This happens when
      ntfs_attr_make_non_resident() is called in the ->truncate call path.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 14:48:20 +01:00
Anton Altaparmakov
fc0fa7dc7d NTFS: - Change ntfs_cluster_alloc() to take an extra boolean parameter
specifying whether the cluster are being allocated to extend an
        attribute or to fill a hole.
      - Change ntfs_attr_make_non_resident() to call ntfs_cluster_alloc()
        with @is_extension set to TRUE and remove the runlist terminator
        fixup code as this is now done by ntfs_cluster_alloc().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 14:36:56 +01:00
Anton Altaparmakov
511bea5ea2 NTFS: - Change {__,}ntfs_cluster_free() to also take an optional attribute
search context as argument.  This allows calling it with the mft
        record mapped.  Update all callers.
      - Fix potential deadlock in ntfs_mft_data_extend_allocation_nolock()
	error handling by passing in the active search context when calling
	ntfs_cluster_free().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 14:24:21 +01:00
Anton Altaparmakov
69b41e3c02 NTFS: Change ntfs_attr_find_vcn_nolock() to also take an optional attribute
search context as argument.  This allows calling it with the mft
      record mapped.  Update all callers.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 14:01:14 +01:00
Anton Altaparmakov
fd9d63678d NTFS: Change ntfs_map_runlist_nolock() to also take an optional attribute
search context.  This allows calling it with the mft record mapped.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 13:44:48 +01:00
Anton Altaparmakov
c394e458b6 NTFS: Fix a 64-bitness bug where a left-shift could overflow a 32-bit variable
which we now cast to 64-bit first (fs/ntfs/mft.c::map_mft_record_page().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 13:08:53 +01:00
Anton Altaparmakov
18efefa935 NTFS: Fix a stupid bug in __ntfs_bitmap_set_bits_in_run() which caused the
count to become negative and hence we had a wild memset() scribbling
      all over the system's ram.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-04 13:06:00 +01:00
Steve French
04c08816d6 [CIFS] Missing parenthesis from error message in previous fix
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-10-03 19:33:15 -07:00
Steve French
8cc64c6ecf [CIFS] Allow SMBWrite2 to work to older servers
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-10-03 13:49:43 -07:00
Steve French
3e84469d01 [CIFS] Add writepages support to shrink memory usage on writes,
eliminate the double copy, and improve cifs write performance and
help the server by upping the typical write size from 4K to 16K
(or even larger if wsize set explicitly)  for servers which support this.
Part 1 of 2

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French  <sfrench@us.ibm.com>
2005-10-03 13:37:24 -07:00
Dave Kleikamp
ac17b8b570 JFS: make special inodes play nicely with page balancing
This patch fixes up a few problems with jfs's reserved inodes.

1. There is no need for the jfs code setting the I_DIRTY bits in i_state.
   I am ashamed that the code ever did this, and surprised it hasn't been
   noticed until now.

2. Make sure special inodes are on an inode hash list.  If the inodes are
   unhashed, __mark_inode_dirty will fail to put the inode on the
   superblock's dirty list, and the data will not be flushed under memory
   pressure.

3. Force writing journal data to disk when metapage_writepage is unable to
   write a metadata page due to pending journal I/O.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-10-03 15:32:11 -05:00
Miklos Szeredi
dd190d066b [PATCH] fuse: check O_DIRECT
Check O_DIRECT and return -EINVAL error in open.  dentry_open() also checks
this but only after the open method is called.  This patch optimizes away
the unnecessary upcalls in this case.

It could be a correctness issue too: if filesystem has open() with side
effect, then it should fail before doing the open, not after.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 12:41:18 -07:00
Paolo 'Blaisorblade' Giarrusso
daa35edc0a [PATCH] uml: remove empty hostfs_truncate method
Calling truncate() on hostfs spits a kernel warning "Something isn't
implemented here", but it still works fine.

Indeed, hostfs i_op->truncate doesn't do anything.  But hostfs_setattr() ->
set_attr() correctly detects ATTR_SIZE and calls truncate() on the host.  So
we should be safe (using ftruncate() may be better, in case the file is
unlinked on the host, but we aren't sure to have the file open for writing,
and reopening it would cause the same races; plus nobody should expect UML to
be so careful).

So, the warning is wrong, because the current implementation is working.  Al,
am I correct, and can the warning be therefore dropped?

CC: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 12:41:18 -07:00
Zach Brown
353fb07e20 [PATCH] aio: avoid extra aio_{read,write} call when ki_left == 0
Recently aio_p{read,write} changed to perform retries internally rather
than returning -EIOCBRETRY.  This inadvertantly resulted in always calling
aio_{read,write} with ki_left at 0 which would in turn immediately return
0.  Harmless, but we can avoid this call by checking in the caller.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 12:41:17 -07:00
Zach Brown
897f15fb58 [PATCH] aio: remove unlocked task_list test and resulting race
Only one of the run or kick path is supposed to put an iocb on the run
list.  If both of them do it than one of them can end up referencing a
freed iocb.  The kick path could delete the task_list item from the wait
queue before getting the ctx_lock and putting the iocb on the run list.
The run path was testing the task_list item outside the lock so that it
could catch ki_retry methods that return -EIOCBRETRY *without* putting the
iocb on a wait queue and promising to call kick_iocb.  This unlocked check
could then race with the kick path to cause both to try and put the iocb on
the run list.

The patch stops the run path from testing task_list by requring that any
ki_retry that returns -EIOCBRETRY *must* guarantee that kick_iocb() will be
called in the future.  aio_p{read,write}, the only in-tree -EIOCBRETRY
users, are updated.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 12:41:17 -07:00
Zach Brown
998765e558 [PATCH] aio: lock around kiocbTryKick()
Only one of the run or kick path is supposed to put an iocb on the run
list.  If both of them do it than one of them can end up referencing a
freed iocb.  The kick patch could set the Kicked bit before acquiring the
ctx_lock and putting the iocb on the run list.  The run path, while holding
the ctx_lock, could see this partial kick and mistake it for a kick that
was deferred while it was doing work with the run_list NULLed out.  It
would then race with the kick thread to add the iocb to the run list.

This patch moves the kick setting under the ctx_lock so that only one of
the kick or run path queues the iocb on the run list, as intended.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 12:41:17 -07:00
Al Viro
192eaa28ba [PATCH] missing ERR_PTR in 9fs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 08:42:23 -07:00
Kostik Belousov
411b67b4b6 [PATCH] readv/writev syscalls are not checked by lsm
it seems that readv(2)/writev(2) syscalls do not call
file_permission callback. Looks like this is overlook.

I have filled the issue into redhat bugzilla as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=169433
and got the recommendation to post this on lsm mailing list.

The following trivial patch solves the problem.

Signed-off-by: Kostik Belousov <kostikbel@gmail.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
2005-09-29 15:42:08 -07:00
Paul Mackerras
ab11d1ea28 Merge by hand from Linus' tree.
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-09-29 13:13:36 +10:00
Davide Libenzi
e3306dd5f7 [PATCH] epoll: handle timeout overflow
Handle the timeout upper boundary for epoll.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:41 -07:00
Latchesar Ionkov
0b8dd17762 [PATCH] v9fs: fix races in fid allocation
Fid management cleanup.  The patch attempts to fix the races in dentry's
fid management.

Dentries don't keep the opened fids anymore, they are moved to the file
structs.  Ideally there should be no more than one fid with fidcreate equal
to zero in the dentry's list of fids.

v9fs_fid_create initializes the important fields (fid, fidcreated) before
v9fs_fid is added to the list.  v9fs_fid_lookup returns only fids that are
not created by v9fs_create.  v9fs_fid_get_created returns the fid created
by the same process by v9fs_create (if any) and removes it from dentry's
list

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Chris Sykes
dc7b5fd6b0 [PATCH] Fix ext3_new_inode() failure paths
Fix failure paths in ext3_new_inode() and clean up duplicated code: -
DQUOT_DROP() was not being called if ext3_init_security() failed.

Signed-off-by: Chris Sykes <chris@sigsegv.plus.com>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Chris Sykes
9ed6c2fb34 [PATCH] Fix ext2_new_inode() failure paths
Fix failure paths in ext2_new_inode() and clean up duplicated code: -
DQUOT_DROP() was not being called if ext2_init_security() failed.

Signed-off-by: Chris Sykes <chris@sigsegv.plus.com>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Miklos Szeredi
ee4e52719c [PATCH] fuse: check reserved node ID values
This patch checks reserved node ID values returned by lookup and creation
operations.  In case one of the reserved values is sent, return -EIO.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Miklos Szeredi
909021ea7a [PATCH] fuse: add required version info
Add information about required version of the userspace library/utilities
to Documentation/Changes.  Also add pointer to this and to FUSE
documentation from Kconfig.

Thanks to Anton Altaparmakov for the reminder.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Anton Altaparmakov
e2fcc61ef0 NTFS: Re-fix sparse warnings in a more correct way, i.e. don't use an enum with
different types in it but #define the two constants instead.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-26 17:02:41 +01:00
Anton Altaparmakov
e8c2cd99a3 Merge branch 'master' of /home/src/linux-2.6/ 2005-09-26 10:50:29 +01:00
Anton Altaparmakov
5a8c0cc32b NTFS: More $LogFile handling fixes: when chkdsk has been run, it can leave the
restart pages in the journal without multi sector transfer protection
      fixups (i.e. the update sequence array is empty and in fact does not
      exist).

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-26 10:48:54 +01:00
Anton Altaparmakov
838bf9675a NTFS: Fix the definition of the CHKD ntfs record magic. It had an off by
two error causing it to be CHKB instead of CHKD.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-26 10:45:46 +01:00
Paul Mackerras
14cf11af6c powerpc: Merge enough to start building in arch/powerpc.
This creates the directory structure under arch/powerpc and a bunch
of Kconfig files.  It does a first-cut merge of arch/powerpc/mm,
arch/powerpc/lib and arch/powerpc/platforms/powermac.  This is enough
to build a 32-bit powermac kernel with ARCH=powerpc.

For now we are getting some unmerged files from arch/ppc/kernel and
arch/ppc/syslib, or arch/ppc64/kernel.  This makes some minor changes
to files in those directories and files outside arch/powerpc.

The boot directory is still not merged.  That's going to be interesting.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-09-26 16:04:21 +10:00
Steve French
ede1327ea4 [PATCH] cifs: Add support for suspend
cifsd had been preventing software suspend from completing.

Signed-off-by: pavel@suse.de
Signed-off-by: Steve French <sfrench@us.ibm.com>  lightly modified
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-23 11:37:53 -07:00
Trond Myklebust
f134585a73 Revert "[PATCH] RPC,NFS: new rpc_pipefs patch"
This reverts 17f4e6febca160a9f9dd4bdece9784577a2f4524 commit.
2005-09-23 12:39:00 -04:00
Trond Myklebust
3063d8a166 NFS: Make /proc/mounts display the protocol used by NFSv4
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:59 -04:00
Christoph Hellwig
278c995c8a [PATCH] RPC,NFS: new rpc_pipefs patch
Currently rpc_mkdir/rpc_rmdir and rpc_mkpipe/mk_unlink have an API that's
 a little unfortunate.  They take a path relative to the rpc_pipefs root and
 thus need to perform a full lookup.  If you look at debugfs or usbfs they
 always store the dentry for directories they created and thus can pass in
 a dentry + single pathname component pair into their equivalents of the
 above functions.

 And in fact rpc_pipefs actually stores a dentry for all but one component so
 this change not only simplifies the core rpc_pipe code but also the callers.

 Unfortuntately this code path is only used by the NFS4 idmapper and
 AUTH_GSSAPI for which I don't have a test enviroment.  Could someone give
 it a spin?  It's the last bit needed before we can rework the
 lookup_hash API

 Signed-off-by: Christoph Hellwig <hch@lst.de>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:57 -04:00
Chuck Lever
03bf4b707e [PATCH] RPC: parametrize various transport connect timeouts
Each transport implementation can now set unique bind, connect,
 reestablishment, and idle timeout values.  These are variables,
 allowing the values to be modified dynamically.  This permits
 exponential backoff of any of these values, for instance.

 As an example, we implement exponential backoff for the connection
 reestablishment timeout.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:53 -04:00
Chuck Lever
ed63c00370 [PATCH] RPC: remove xprt->nocong
Get rid of the "xprt->nocong" variable.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss with UDP mounts.
 Look for significant regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:47 -04:00
Chuck Lever
43118c29de [PATCH] RPC: get rid of xprt->stream
Now we can fix up the last few places that use the "xprt->stream"
 variable, and get rid of it from the rpc_xprt structure.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:35 -04:00
Chuck Lever
eab5c084b8 [PATCH] NFS: use a constant value for TCP retransmit timeouts
Implement a best practice: don't use exponential backoff when computing
 retransmit timeout values on TCP connections, but simply retransmit
 at regular intervals.

 This also fixes a bug introduced when xprt_reset_majortimeo() was added.

 Test-plan:
 Enable RPC debugging and watch timeout behavior on a NFS/TCP mount.

 Version: Thu, 11 Aug 2005 16:02:19 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:06 -04:00
Trond Myklebust
9aa48b7e27 NFS: Don't expose internal READDIR errors to userspace
Fixes a condition whereby the kernel is returning the non-POSIX error
 EBADCOOKIE to userspace.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:01 -04:00
Olaf Kirch
449231d6dd From: Olaf Kirch <okir@suse.de>
[PATCH] Fix miscompare in __posix_lock_file

 If an application requests the same lock twice, the
 kernel should just leave the existing lock in place.
 Currently, it will install a second lock of the same type.

 Signed-off-by: Olaf Kirch <okir@suse.de>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:37:59 -04:00
Trond Myklebust
20509f1bc5 NFS: Drop inode after rename
When doing a rename on top of an existing file that is not in use,
 the inode of the overwritten file will remain in the icache.

 The fix is to decrement i_nlink of the overwritten inode, like we
 do for unlink, rmdir etc already.

 Problem diagnosed by Olaf Kirch. This patch is a slight variation
 on his fix.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:37:58 -04:00
Linus Torvalds
bfab08c097 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/shaggy/jfs-2.6 2005-09-23 07:40:53 -07:00
Anton Altaparmakov
715dc636b6 NTFS: Change ntfs_cluster_free() to require a write locked runlist on entry
since we otherwise get into a lock reversal deadlock if a read locked
      runlist is passed in. In the process also change it to take an ntfs
      inode instead of a vfs inode as parameter.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-23 11:24:28 +01:00
Nick Wilson
10d2c46f94 [PATCH] NFS: fix client oops when debugging is on
nfs_readpage_release() causes an oops while accessing a file with NFS
debugging turned on (echo 32767 > /proc/sys/sunrpc/nfs_debug) and a kernel
built with CONFIG_DEBUG_SLAB.

This patch moves the debugging statement above nfs_release_request() to
avoid accessing freed memory.

Signed-off-by: Nick Wilson <njw@osdl.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:37 -07:00
Glauber de Oliveira Costa
8bdac5d1ed [PATCH] ext3: EXT3_DEBUG build fixes
Fix some warnings and a build error when EXT3_DEBUG is enabled.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:37 -07:00
OGAWA Hirofumi
275abf5b06 [PATCH] ext3: ext3_show_options fix
EXT3_MOUNT_DATA_FLAGS is not a boolean. This fixes it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:35 -07:00
Latchesar Ionkov
f71626a461 [PATCH] v9fs: don't free root dentry & inode if error occurs in v9fs_get_sb
If error occurs while in v9fs_get_sb after it calles sget, the dentry object
of the root and its inode may be freed twice -- once while handling the error
in v9fs_get_sb, and second time when v9fs_get_sb calles deactivate_super
(which in turn calls v9fs_kill_super)

The patch removes the unnecessary code that frees the root dentry and its
inode.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Latchesar Ionkov
a1f9d8d23f [PATCH] v9fs: replace strlen on newly allocated by __getname buffers to PATH_MAX
v9fs_vfs_readlink allocates space for the link using __getname and
errorneously uses strlen on the newly allocated buffer to check if the buffer
passed by the user is bigger than the one returned by __getname.

The patch replaces the strlen usage to PATH_MAX, which is the actual size of
the buffers returned by __getname.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Latchesar Ionkov
a8e63bff52 [PATCH] v9fs: make copy of the transport prototype instead of using it directly
When a new session is created it uses a template object of the specified
transport type to instantiate its own copy.  The code for the making a copy of
the template object was lost, and the object itself is attached to the v9fs
session.  This leads to many sessions using the same transport instead of
having their own copy.

The patch puts back the code that makes a copy of the template object.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Latchesar Ionkov
5b06767623 [PATCH] v9fs: allocate the Rwalk qid array from the right conv buffer
When v9fs_deserealize_fcall deserializes a Rwalk message, it incorrectly
allocates space for the qid array in the source instead of the destination
buffer.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Latchesar Ionkov
d06a8fb130 [PATCH] v9fs: make conv functions to check for conv buffer overflow
buf_check_size function checks if the conv buffer has enough space for the
performed operation, but it doesn't return the result back to the calling
function, only logs an error in the log.

The report-back-error functionality was lost when buf_check_size was
converted from macro to inline function. The return in the macro used to
exit from the functions that include it, after the conversion it just exits
from the inline function itself.

The patch makes buf_check_size to return flag and all functions that use
it check if they should perform the operation, or exit.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Andrew Morton
0678e5feaa [PATCH] proc_task_root_link c99 fix
fs/proc/base.c: In function `proc_task_root_link':
fs/proc/base.c:364: warning: ISO C90 forbids mixed declarations and code

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:33 -07:00
Steve French
70ca734a14 [CIFS] Various minor bigendian fixes and sparse level 2 warning message fixes
Most important of these fixes mapchars on bigendian and a few statfs fields

Signed-off-by: Shaggy (shaggy@austin.ibm.com)
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-09-22 16:32:06 -07:00
Anton Altaparmakov
91fbc6edfa NTFS: Fix sparse warnings that have crept in over time.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-22 13:26:44 +01:00
Steve French
2096243885 [CIFS] Add support for legacy servers part nine. statfs (df and du) is now
functional, and the length check is fixed so readdir does not throw a
warning message when windows me messes up the response to FindFirst
of an empty dir (with only . and ..).

Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-09-21 22:05:57 -07:00
Stephane Kardas
62a36c43c8 [PATCH] fat: fix adate
During a forensic analysis on the fat file system, I found than the result for
the last access date on this file system was different between the stat
command and the istat command (package tct-utils).

The istat command display a true date (the right windows date) but the stat
primitive (so stat, find, ls command) displays a wrong date.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-21 10:12:18 -07:00
Sripathi Kodi
66dcca0628 [PATCH] Fix invisible threads problem
When the main thread of a thread group has done pthread_exit() and died,
the other threads are still happily running, but will not be visible
under /proc because their leader is no longer accessible.

This fixes the access control so that we can see the sub-threads again.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-21 09:15:34 -07:00
Steve French
e30dcf3a19 [CIFS] Add support for legacy servers part eight. Write fixes for Windows
ME, and do not set ctime unless explicitly requested with atime and/or
mtime (it gets thrown away by most servers anyway as there is no way to set
this via posix).

Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-09-20 20:49:16 -07:00
Dave Kleikamp
438282d85d JFS: don't dereference tlck->ip from txUpdateMap
The inode pointer may no longer be valid

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-09-20 14:58:11 -05:00
Anton Altaparmakov
eed8b2dee7 NTFS: More runlist handling fixes from Richard Russon and myself.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-20 14:19:30 +01:00
Linus Torvalds
f805fbdaac Make fsnotify possibly work better for the inode removal case
Checking i_nlink is dubious, but the alternatives look even
less appetizing.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-19 19:54:29 -07:00
Anton Altaparmakov
044a500e46 Merge branch 'master' of /home/src/linux-2.6/ 2005-09-19 09:47:49 +01:00
Anton Altaparmakov
f6098cf449 NTFS: Fix ntfs_{read,write}page() to cope with concurrent truncates better.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-19 09:41:39 +01:00
Anton Altaparmakov
4e64c88693 NTFS: Fix handling of compressed directories that I broke in earlier changeset.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-19 09:38:41 +01:00
Anton Altaparmakov
5c9f6de3b8 NTFS: Fix various bugs in the runlist merging code. (Based on libntfs
changes by Richard Russon.)

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-19 09:33:40 +01:00
Steve French
3e87d80391 [CIFS] Add support for legacy servers part seven. Fix open for write,
begin implementation of Win9x style set file size via open then
write of zero bytes.

Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-09-18 20:49:21 -07:00
OGAWA Hirofumi
ef402268f7 [PATCH] FAT: miss-sync issues on sync mount (miss-sync on write)
This patch fixes miss-sync issue on write() system call.  This updates
inode attrs flags, mtime and ctime on every comit_write call, due to
locking.

Signed-off-by: Hiroyuki Machida <machida@sm.sony.co.jp>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
Dipankar Sarma
4fb3a53860 [PATCH] files: fix preemption issues
With the new fdtable locking rules, you have to protect fdtable with either
->file_lock or rcu_read_lock/unlock().  There are some places where we
aren't doing either.  This patch fixes those places.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
Zach Brown
a464adeb7e [PATCH] Add smp_mb__after_clear_bit() to unlock_kiocb()
Add smp_mb__after_clear_bit() to unlock_kiocb()

AIO's use of wait_on_bit_lock()/wake_up_bit() forgot to add a barrier
between clearing its lock bit and calling wake_up_bit() so wake_up_bit()'s
unlocked waitqueue_active() can race.  This puts AIO's use in line with the
others and the comment above wake_up_bit().

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
Davide Libenzi
53d2be79d5 [PATCH] epoll: fix delayed initialization bug
Al found a potential problem in epoll_create(), where the
file->private_data member was set after fd_install().  This is obviously
wrong since another thread might do a close() on that fd# before we set the
file->private_data member.  This goes over 2.6.13 and passes a few basic
tests I've done here.

(akpm: snuck in a kzalloc() cleanup too)

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
Steve French
f9f5c81769 [CIFS] Add support for legacy servers part six. Fix read syntax so
we do not request more than negotiated buffer size even if buffer
size is small (smaller than one page)

Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-09-15 23:06:38 -07:00
Steve French
eafe870121 [CIFS] Fix readdir caching when unlink removes file in current search
buffer, and this is followed by a rewind search to just before
the deleted entry.

Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-09-15 21:47:30 -07:00
Dave Kleikamp
6cb1269b96 JFS: Fix sparse warnings, including endian error
The fix in inode.c is a real bug.  It could result in undeleted, yet
unconnected files on big-endian hardware.

The others are trivial.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-09-15 23:25:41 -05:00
Steve French
ab2f218f4f [CIFS] Fix compiler warnings
Fix some compiler warnings noticed on x64 by me and ppc64 by Shaggy

Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-09-15 20:44:50 -07:00
David S. Miller
4a805e863d [COMPAT]: Fixup compat_do_execve()
Missing acct_update_integrals() and update_mem_hiwater() calls
compared to it's native counterpart.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-14 21:40:00 -07:00
Dipankar Sarma
0b175a7e68 [PATCH] Fix the fdtable freeing in the case of vmalloced fdset/arrays
Noted by David Miller:

  "The bug is that free_fd_array() takes a "num" argument, but when
   calling it from __free_fdtable() we're instead passing in the size in
   bytes (ie.  "num * sizeof(struct file *)")."

Yes it is a bug. I think I messed it up while merging newer
changes with an older version where I was using size in bytes
to optimize.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 12:38:26 -07:00
Hugh Dickins
2fd4ef85e0 [PATCH] error path in setup_arg_pages() misses vm_unacct_memory()
Pavel Emelianov and Kirill Korotaev observe that fs and arch users of
security_vm_enough_memory tend to forget to vm_unacct_memory when a
failure occurs further down (typically in setup_arg_pages variants).

These are all users of insert_vm_struct, and that reservation will only
be unaccounted on exit if the vma is marked VM_ACCOUNT: which in some
cases it is (hidden inside VM_STACK_FLAGS) and in some cases it isn't.

So x86_64 32-bit and ppc64 vDSO ELFs have been leaking memory into
Committed_AS each time they're run.  But don't add VM_ACCOUNT to them,
it's inappropriate to reserve against the very unlikely case that gdb
be used to COW a vDSO page - we ought to do something about that in
do_wp_page, but there are yet other inconsistencies to be resolved.

The safe and economical way to fix this is to let insert_vm_struct do
the security_vm_enough_memory check when it finds VM_ACCOUNT is set.

And the MIPS irix_brk has been calling security_vm_enough_memory before
calling do_brk which repeats it, doubly accounting and so also leaking.
Remove that, and all the fs and arch calls to security_vm_enough_memory:
give it a less misleading name later on.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 11:18:13 -07:00
Alexander Nyberg
fb085cf1d4 [PATCH] Fix fs/exec.c:788 (de_thread()) BUG_ON
It turns out that the BUG_ON() in fs/exec.c: de_thread() is unreliable
and can trigger due to the test itself being racy.

de_thread() does
 	while (atomic_read(&sig->count) > count) {
	}
	.....
	.....
	BUG_ON(!thread_group_empty(current));

but release_task does
	write_lock_irq(&tasklist_lock)
	__exit_signal
		(this is where atomic_dec(&sig->count) is run)
	__exit_sighand
	__unhash_process
		takes write lock on tasklist_lock
		remove itself out of PIDTYPE_TGID list
	write_unlock_irq(&tasklist_lock)

so there's a clear (although small) window between the
atomic_dec(&sig->count) and the actual PIDTYPE_TGID unhashing of the
thread.

And actually there is no need for all threads to have exited at this
point, so we simply kill the BUG_ON.

Big thanks to Marc Lehmann who provided the test-case.

Fixes Bug 5170 (http://bugme.osdl.org/show_bug.cgi?id=5170)

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 10:26:34 -07:00
Linus Torvalds
5d54e69c68 Merge master.kernel.org:/pub/scm/linux/kernel/git/dwmw2/audit-2.6 2005-09-13 09:47:30 -07:00
Neil Brown
73aea4ecd3 [PATCH] nfsd4: fix setclientid unlock of unlocked state lock
We could try to unlock the state lock here without having first locked it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:32 -07:00
Neil Brown
b59e3c0e17 [PATCH] nfsd4: fix open seqid incrementing in lock
In the case of a lock which introduces a new lockowner, the openowner's
sequence id should be incremented, even when the operation fails, if the
error is a sequence-id-mutating error.  The current code fails to do that
in some cases.  Fix this by using the same sequence-id-incrementing
mechanism that all other such operations use.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:32 -07:00
Neil Brown
f2327d9adb [PATCH] nfsd4: move replay_owner
It seems more natural to move the setting of the replay_owner into the
relevant procedure instead of doing it in nfsv4_proc_compound.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:31 -07:00
Neil Brown
849823c52d [PATCH] nfsd4: printk reduction
Demote some printk's that look like they could be triggered by non-buggy
clients to dprintk's.  (For example, stale clientid's are normal
occurrences on reboot, and on a server with a lot of clients these messages
could become annoying.)

Also remove some redundant dprintk's (e.g. no need for both STALE_CLIENTID
and its callers to do dprintks).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:31 -07:00
Chris Mason
9f03783ce5 [PATCH] reiserfs: use mark_inode_dirty instead of reiserfs_update_sd
reiserfs should use mark_inode_dirty during reiserfs_file_write and
reiserfs_commit_write.  This makes sure the inode is properly flagged as
dirty, which is used during O_SYNC to decide when to trigger log commits.

This patch also removes the O_SYNC check from reiserfs_commit_write, since
that gets dealt with properly at higher layers once we start using
mark_inode_dirty.

Thanks to Hifumi Hisashi <hifumi.hisashi@lab.ntt.co.jp> for catching this.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:29 -07:00
Peter Staubach
a1a5b3d93c [PATCH] open returns ENFILE but creates file anyway
When open(O_CREAT) is called and the error, ENFILE, is returned, the file
may be created anyway.  This is counter intuitive, against the SUS V3
specification, and may cause applications to misbehave if they are not
coded correctly to handle this semantic.  The SUS V3 specification
explicitly states "No files shall be created or modified if the function
returns -1.".

The error, ENFILE, is used to indicate the system wide open file table is
full and no more file structs can be allocated.

This is due to an ordering problem.  The entry in the directory is created
before the file struct is allocated.  If the allocation for the file struct
fails, then the system call must return an error, but the directory entry
was already created and can not be safely removed.

The solution to this situation is relatively easy.  The file struct should
be allocated before the directory entry is created.  If the allocation
fails, then the error can be returned directly.  If the creation of the
directory entry fails, then the file struct can be easily freed.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:28 -07:00
Anton Altaparmakov
89ecf38c7a NTFS: Mask out __GFP_HIGHMEM when doing kmalloc() in __ntfs_malloc() as it
otherwise causes a BUG().

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-12 15:43:03 +01:00
Anton Altaparmakov
5d46770f5f NTFS: Change the mount options {u,f,d}mask to always parse the number as
an octal number to conform to how chmod(1) works, too.  Thanks to
      Giuseppe Bilotta and Horst von Brand for pointing out the errors of
      my ways.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-12 14:33:47 +01:00
Linus Torvalds
32983696a4 Merge branch 'for-linus' from kernel.org:/.../shaggy/jfs-2.6 manually
Clash due to new delete_inode behavior (the filesystem now needs to do
the truncate_inode_pages() call itself).

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-11 10:14:54 -07:00
Nishanth Aravamudan
041e0e3b19 [PATCH] fs: fix-up schedule_timeout() usage
Use schedule_timeout_{,un}interruptible() instead of
set_current_state()/schedule_timeout() to reduce kernel size.  Also use helper
functions to convert between human time units and jiffies rather than constant
HZ division to avoid rounding errors.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:36 -07:00
Adrian Bunk
e711700a0e [PATCH] fs/cramfs/uncompress.c should #include <linux/cramfs_fs.h>
Every file should #include the header with the prototypes of the global
functions it is offering.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:35 -07:00
James Lamanna
ea0e0a4f53 [PATCH] janitor: reiserfs: super.c - vfree() checking cleanups
super.c vfree() checking cleanups.

Signed-off by: James Lamanna <jlamanna@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:34 -07:00
Domen Puncer
0cdca3f980 [PATCH] janitor: fs/dcache.c: list_for_each*
First one is list_for_each_entry (thanks maks), second 2 list_for_each_safe.

Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:32 -07:00
Domen Puncer
fdadd65fbc [PATCH] janitor: fs/namespace.c: list_for_each_entry
Make code more readable with list_for_each_entry.

Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:32 -07:00
Domen Puncer
216d81bb35 [PATCH] janitor: jffs/intrep: list_for_each_entry
Use list_for_each_entry to make code more readable.

Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Cc: <jffs-dev@axis.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:32 -07:00
Ingo Molnar
d79fc0fc66 [PATCH] sched: TASK_NONINTERACTIVE
This patch implements a task state bit (TASK_NONINTERACTIVE), which can be
used by blocking points to mark the task's wait as "non-interactive".  This
does not mean the task will be considered a CPU-hog - the wait will simply
not have an effect on the waiting task's priority - positive or negative
alike.  Right now only pipe_wait() will make use of it, because it's a
common source of not-so-interactive waits (kernel compilation jobs, etc.).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:22 -07:00
Ingo Molnar
fb1c8f93d8 [PATCH] spinlock consolidation
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code.  It does the following
things:

 - consolidates and enhances the spinlock/rwlock debugging code

 - simplifies the asm/spinlock.h files

 - encapsulates the raw spinlock type and moves generic spinlock
   features (such as ->break_lock) into the generic code.

 - cleans up the spinlock code hierarchy to get rid of the spaghetti.

Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c.  (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)

Also, i've enhanced the rwlock debugging facility, it will now track
write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.

The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:

 include/asm-i386/spinlock_types.h       |   16
 include/asm-x86_64/spinlock_types.h     |   16

I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:

   SMP                         |  UP
   ----------------------------|-----------------------------------
   asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
   linux/spinlock_types.h      |  linux/spinlock_types.h
   asm/spinlock_smp.h          |  linux/spinlock_up.h
   linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
   linux/spinlock.h            |  linux/spinlock.h

/*
 * here's the role of the various spinlock/rwlock related include files:
 *
 * on SMP builds:
 *
 *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
 *                        initializers
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
 *                        implementations, mostly inline assembly code
 *
 *   (also included on UP-debug builds:)
 *
 *  linux/spinlock_api_smp.h:
 *                        contains the prototypes for the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 *
 * on UP builds:
 *
 *  linux/spinlock_type_up.h:
 *                        contains the generic, simplified UP spinlock type.
 *                        (which is an empty structure on non-debug builds)
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  linux/spinlock_up.h:
 *                        contains the __raw_spin_*()/etc. version of UP
 *                        builds. (which are NOPs on non-debug, non-preempt
 *                        builds)
 *
 *   (included on UP-non-debug builds:)
 *
 *  linux/spinlock_api_up.h:
 *                        builds the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 */

All SMP and UP architectures are converted by this patch.

arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.

From: Grant Grundler <grundler@parisc-linux.org>

  Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
  Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
  non-SMP kernels.  That should be trivial to fix up later if necessary.

  I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
  some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
  are well tested and contained entirely inside arch specific code.  I do NOT
  expect any new issues to arise with them.

 If someone does ever need to use debug/metrics with them, then they will
  need to unravel this hairball between spinlocks, atomic ops, and bit ops
  that exist only because parisc has exactly one atomic instruction: LDCW
  (load and clear word).

From: "Luck, Tony" <tony.luck@intel.com>

   ia64 fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjanv@infradead.org>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:21 -07:00